WO2023010502A1

WO2023010502A1 - Method and apparatus for anomaly detection on graph

Info

Publication number: WO2023010502A1
Application number: PCT/CN2021/111102
Authority: WO
Inventors: Evgeny Kharlamov; Jie Tang; Bo Chen
Original assignee: Tsinghua University; Robert Bosch GmbH
Current assignee: Tsinghua University; Robert Bosch GmbH
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2023-02-09
Anticipated expiration: 2024-02-06
Also published as: CN118020076A; DE112021007660T5

Abstract

A method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.. The method further comprising edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated. Numerous other aspects are provided.

Description

[Title established by the ISA under Rule 37.2] METHOD AND APPARATUS FOR ANOMALY DETECTION ON GRAPH

FIELD

Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method and an apparatus provided for anomaly detection on a graph.

BACKGROUND

Anomaly detection has profound impacts on preventing malicious activities in various applications, such as the detection of online review spams, financial frauds, fake users, and misinformation. The most promising developments have been the utilization of graph structures in machine learning models, as graphs can be used for naturally modeling the structural dependencies underlying the data.

A significant line of works has been focused on discovering and incorporating the structural patterns of anomalies into learning models. Recently, the advances of graph neural networks (GNNs) have inspired and empowered various attempts to adopt GNNs for detecting anomalies. Existing attempts to address the graph-based anomaly detection have focused on structural feature engineering or learning. The main idea of GNN-based anomaly detection is to leverage the power of GNNs to learn expressive node representations with the goal of distinguishing abnormal and normal nodes in the embedding space, producing promising results.

Contrastive learning is a widely used method to train an encoder to learn to produce vector representations of inputs such that representations of inputs in the same class will be more similar compared to representations of inputs in different classes. It is proven to be very useful for the downstream classification task, but is not specifically proposed to solve the anomaly detection problem.

However, it may be not sufficient to only make inputs in the same class more similar, it could be more efficient to make inputs in different classes more distinguishing at the same time. It is promising to combine GNNs and contrastive learning for anomaly detection, with a finely designed objective.

SUMMARY

The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

In light of the contrastive learning, it is proposed in the present disclosure to contrast each node with the global context of the input graph. For the purpose of contrasting abnormal nodes with normal ones in terms of their distances to the global context, a context aware graph contrastive learning (CoGCL) model is disclosed based on graph contrastive learning and GNN. The underlying assumption is that abnormal nodes tend to be more distant from the majority of the nodes than normal ones.

The present disclosure provides the context aware graph contrastive loss function in a supervised manner, i.e., labeled normal and abnormal nodes as positive and negative keys respectively. To achieve the contrastive objective, a graph neural network encoder is disclosed to learn the nodes' representations as well as learn the global context of the input graph during message passing. The graph neural network encoder is further disclosed to infer and remove suspicious links before learning the nodes' representations and learning the global context of the input graph. The disclosed encoder has corresponding modules for edge update, node update and graph update.

In addition to the supervised model, it is also extended to an unsupervised model for handling cases with scarce labels. Straightforwardly, node labels that can be directly used to replace the ground-truth labels in the supervised contrastive loss function of CoGCL need to be synthesized.

According to an aspect, a method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.

According to a further aspect, a portion of nodes is identified as abnormal nodes before input to the GNN based at least on ground-truth node labels.

According to a further aspect, a portion of nodes is identified as abnormal nodes before input to the GNN based at least on generating the graph by: breaking a first graph into a plurality of sub-graphs; and injecting a set of nodes from sub-graphs other than a sub-graph which is to be used as the graph for inputting into the GNN, into the sub-graph as abnormal nodes.

According to a further aspect, the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.

According to a further aspect, edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.

According to a further aspect, the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially the suspicious probability of the edge.

According to a further aspect, the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a node based at least on the edges of current layer; and combining the aggregated embedding with the embedding for the node of last layer.

According to a further aspect, the embedding for the entire graph is updated at each layer by calculating a similarity between the embedding for each node of current layer and the embedding for the entire graph of last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregating the embedding for each node of the current layer based on the corresponding similarities.

A method for using a Graph Neural Network (GNN) trained with the method in the present disclosure for anomaly detection is disclosed. The method comprising obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.

According to an aspect, a method for training an online acadamic system including a Graph Neural Network (GNN) with multiple layers for wrongly-assigned publication detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.

According to further aspect, the graph is constructed based on data of an online academic system, and wherein each node of the graph represents a researcher of the online academic system, and an edge exists if two publications have a certain common attribute, such as coauthors, venues or organizations.

According to an aspect, a method for training an online spams recognition system including a Graph Neural Network (GNN) with multiple layers for spams detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.

According to further aspect, the graph is constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.

According to an aspect, a method for training a frauds recognition system including a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.

According to further aspect, the graph is constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name.

According to an aspect, a method for training a rumor prediction system including a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.

According to further aspect, the graph is constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.

The present disclosure can be used in a variety of applications, for example, in any scenario that inputs can be learned as node representations and connections between the nodes can be learned as edges, including but not limited to dataset preprocessing, detection of online review spams, financial frauds, insurance frauds, bot frauds, fake users and rumors in social media, fake news, etc..

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.

Fig. 1a illustrates an exemplary anomaly detection scenario, in accordance with various aspects of the present disclosure.

Fig. 1b illustrates exemplary similarity between normal nodes and global context versus abnormal nodes and the global context, in accordance with various aspects of the present disclosure.

Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure.

Fig. 3 illustrates an exemplary framework of CoGCL, in accordance with various aspects of the present disclosure.

Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure.

Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure.

Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure.

Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure.

DETAILED DESCRIPTION

The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.

Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and embodiments are for illustrative purposes, and are not intended to limit the scope of the disclosure.

An important issue that data scientists are working to solve recently is anomaly detection, from network security to financial fraud, and further to misinformation propagated on social network, anomaly detection helps protect business, individuals and online communities. To this end, with mainly focused on structural features, graph-based anomaly detection has been widely used for detecting malicious activities in real-world applications.

An anomaly detection problem is motivated by news reported in 2012 that a researcher named “Jun Lu" from Beijing University of Chemical Technology cheated the awards using the papers of another author who is also named “Jun Lu" but from Yale University. This event caused by the wrongly-assigned papers of the same author name implies the importance of anomaly detection.

To better understand the behavior of anomalies, a similar example is taken as a case study, by building a graph with papers assigned to a certain author as nodes, and connecting two papers with an edge if they share the same coauthors or venues. The goal in this example is to detect the wrongly-assigned papers (anomalies) , i.e., those that are not authored by the certain author. As shown in Fig. 1a, blank nodes represent papers authored by the certain author, and black nodes represent papers actually from other authors but wrongly-assigned.

As another example, a graph can be constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts. As yet another example, a graph can be constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name. And as other example, a graph can be constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.

Fig. 1a shows the t-SNE2 projection of each node’s input features-the BERT embedding of its title, with blank as normal nodes and black as abnormal ones. It can be observed that both abnormal and normal nodes are distributed diversely, wherein abnormal ones being relatively more diverse. Intuitively, to quantify this observation, the similarity between each node and the global context, which is the average of all node features, is computed, which is showed in Figure 1b.

Fig. 1b illustrates exemplary similarity between normal nodes and global context versus abnormal nodes and the global context, in accordance with various aspects of the present disclosure. The similarity between normal nodes and global context is computed as a cosine similarity herein, and shown in the blank block. The similarity between abnormal nodes and the global context is computed as a cosine similarity herein, and shown with a shadowed block. On the left-side, it is the cosine similarity calculated based on the original input features, it suggests that though being relatively closed (y∈ [0.97, 1] ) , the two similarity distributions can be clearly distinguished. Inspired by these observations, it is desired a straightforward way to capture the differences for distinguishing abnormal and normal nodes. On the right-side, it is the cosine similarity calculated based on embeddings generated by CoGCL in the present disclosure, the two similarity distributions are more distant and more distinguishable as shown, illustrating that CoGCL is an ideal way for distinguishing the anomalies from the graph.

A graph-based anomaly detection task can be formalized as following problem. With a graph denoted as G= (V, X, A) , where V is the set of N nodes,

denotes the adjacency matrix, and X is the corresponding feature vectors with

denoting the d-dimensional feature vector of v _i∈V. In consideration of generality, G is considered into an unweighted, undirected and single-relation graph, i.e., A _ij=1 if an edge exists between v _i and v _j, and A _ij=0 otherwise.

Given a labeled graph G= (V, X, A, Y) , Y is the set of labels on nodes with y _i∈Y takes value 1 if v _i is abnormal and 0 otherwise. In an example, the goal is to learn a classifier

to determine whether a given node is abnormal (1) or normal (0) .

To address this problem, most existing GNN-based models directly instantiate g as a binary classifier to predict the abnormal labels. Inspired by the observations about the distinguished distances to the global context (Fig. 1b) , contrastive learning is leveraged in the present disclosure for detecting the anomalies.

In a first example, a CoGCL model is discussed in a supervised framework.

The basic idea of the CoGCL model is to determine a node’s abnormality by contrasting it with the global context (e.g., the average of all nodes) of the entire graph. This is motivated by the discovery that there exists a significant difference of distance to the global context between the normal and the abnormal nodes, that is, a node is more abnormal if it deviates farther away from the majority of nodes in the feature space.

In view of this, the CoGCL model is disclosed to distinguish abnormal and normal nodes in the embedding space by leveraging graph contrastive learning. Specifically, given a graph G, our objective is to learn a GNN encoder that can output an embedding h _i for each node v _i and also embedding q for the entire graph (global context) , i.e., (H, q) =f _GNN (X, A) with

One of the main ideas of the model is to take the graph embedding q as the query, a normal node's embedding as the positive key that matches with q, and the embeddings of all the abnormal nodes as the negative keys that don't match with q. These representations in the embedding space may then be used for downstream tasks, such as node classification, anomaly detection, etc.

In an embodiment, a concrete loss function is designed as equation (1) :

where τ is the temperature hyperparameter. The objective function is to enforce maximizing the consistency between the positive pairs (normal nodes and global context) compared with the negative pairs (abnormal node, global context) .

Generally, for the purpose of optimizing an objective, a GNN model contains a message passing process and a neighbor aggregation process. Specifically, to optimize the context aware contrastive objective in equation (1) , the CoGCL model contains several steps. For example, the CoGCL model contains a node update process and graph update process. For another example, the CoGCL model contains an edge update process, a node update process and graph update process. As a result, the CoGCL's GNN encoder f _GNN contains modules for corresponding processes.

Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 2 indicate optional modules.

A graph's representation in embedding space is input into the GNN. In an aspect, the graph is a labeled graph with a set of labels on nodes. In an aspect the graph is input into the GNN at least in the form of an embedding for each node and an embedding for the entire graph. In an aspect, the graph is input into the GNN with an adjacent matrix representing edges.

As shown in Fig. 2, the encoder f _GNN contains several modules to optimize the context aware contrastive objective in equation (1) . In an aspect, the encoder f _GNN contains two modules, a node update module 202 and a graph update module 203. At each layer of GNN, the two modules are performed successively. In another aspect, the encoder f _GNN contains three modules, an edge update module 201, a node update module 202 and a graph update module 203. At each layer of GNN, the three modules are performed successively.

An edge update module 201 is to estimate the suspicious probability of an edge and modify the adjacent matrix according to the probabilities, wherein

A ^(updated) =f _edge (H ^(input) , A ^(input) , q ^(input) ) (2)

The suspicious edges are removed to reduce the negative influence from them thoroughly.

An node update module 202 is then performed after the edge update module, to aggregate the neighboring information based on the modified adjacent matrix A ^(updated) and then combine the aggregated result with the concerned node, wherein

H ^(updated) =f _node (H ^(input) , A ^(updated) ) (3)

An graph update module 203 is then performed after the node update module, to update the context of the whole graph based on the updated node embeddings H ^(updated) and the input graph embedding q ^(input) via a readout function:

q ^(updated) =f _graph (H ^(updated) , q ^(input) ) (4)

After all the layer convolutions, the updated node embedding H ^(updated) , graph embedding q ^(updated) and graph edges A ^(updated) are output for further processing.

Fig. 3 illustrates an exemplary framework of CoGCL model, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 3 indicate optional modules.

As shown in Fig. 3, a graph G is input into the CoGCL model at 301. In an aspect, the graph G is a labeled graph with a set of labels Y on nodes. In an aspect, the graph G is input into the CoGCL model with an embedding for each node

and an embedding for the entire graph q. The embedding for the entire graph q is obtained based at least on embeddings for each node

via a readout function. In an aspect, the graph G is input into the CoGCL model with an adjacent matrix A representing edges. As in Fig. 3, the blank nodes are shown as normal nodes, while the black nodes are shown as abnormal nodes.

It is illustrated in Fig. 3 that the CoGCL model includes a plurality of repeat layers, with each layer at least includes corresponding modules of the encoder f _GNN as discussed with Fig. 2. It is noted that the number of layers would not limit the scope of the invention.

For example, layer 302-l includes two modules for node update and graph update respectively. For another example, layer 302-l includes three modules for edge update, node update and graph update respectively. At each layer, edge update as the first step, which is optional, is to estimate the likelihood of each link being a suspicious one and then remove the most suspicious ones. Then the node update is to update the node embeddings by message passing on the updated adjacency matrix. Finally, the global context is updated via graph update. After L-layer graph convolutions, the context aware contrastive loss function in equation (1) would be optimized, and the updated nodes' embeddings and graph embedding are output to 303 for further processing, such as calculating loss function for training, and calculating cosine similarity when for anomaly prediction, etc.

In an embodiment, the encoder f _GNN contains an edge update module 302-l-1 in layer 302-l to remove suspicious links. Suspicious links are the ones that connect nodes with opposite labels. For example, almost every fraudster in business website can be linked to some benign users by the homophily assumption between neighbors –two neighboring nodes tend to have similar features and same labels –and impact the performance of the traditional message-passing along with the links.

To predict the likelihood of a link, instead of based on the node features regardless of the node labels, both the node features h and one's distance to the global context h-q in the feature space are considered. In the present disclosure, the distance can serve as implicit labels for guiding the model to remove suspicious links –those connected by the nodes with distinguished distance to the global context.

In an aspect, a context aware link predictor is provided in the present disclosure to estimate the suspicious linkage likelihood between two nodes. The link predictor accepts the representations of two nodes as the input, where each node is represented by its own embedding and its relative distance to the global context embedding, i.e.,

Where

is the concatenation operator and MLP is a projection layer.

the distance from node v _i to the global context, indicates that two nodes can be highly probably linked given their relative positions to the global context are similar, even if their local features are slightly different. In an aspect, the linkage likelihood between node v _i and v _j can be calculated as:

Where ReLU normalizes the cosine similarities into [0, 1] .

In an aspect, the suspicious links are removed to reduce the negative influence from them thoroughly by employing Bernoulli approximation, the binary case of the Gumbel-Softmax reparameterization trick. Specifically, for each link

a Gumbel noise ε～ Gumbel (0, 1) is first sampled, and then is added to the logarithm of the linkage likelihood

and is applied a sigmoid function. Then, the result is rounded as 1 if it is larger than 0.5 and 0 otherwise to obtain the updated

i.e.,

Where λ indicates the temperature of Gumbel-Softmax distribution. In an aspect, according to straight-through gradient estimator, the gradients is passed to the relaxed value

instead of the discrete value

during the backward procedure.

In another aspect, to accelerate the optimization of the link,

is also added, which is the cross-entropy loss between the estimated likelihood

and the observation

of the last layer, as the constraint of the loss function in equation (1) .

In an embodiment, the encoder f _GNN in layer 302-l contains a node update module 302-l-2 for updating node embeddings. In an aspect, the node embeddings are updated based on the updated edges. In another aspect, the node embeddings are updated without updating the edges of the graph.

In an aspect, the operation in equation (3) is divided into an AGGREGATION and a COMBINE operations as:

Where equation (8) is to aggregate the neighboring information of the last layer based on the updated adjacency matrix and equation (9) is to combine the aggregated neighboring information with the concerned node. In an aspect, the AGGREGATION operation is the sum, the same as GIN. In an aspect, the COMBINE operation is concatenation.

In an embodiment, the encoder f _GNN in layer 302-l contains a graph update module 302-l-2. The graph embedding q which is the global context, needs to be updated at each layer after all nodes' embeddings are updated.

In an aspect, the graph embedding q can be obtained by straightforwardly performing one of sum pooling, max pooling or average pooling of all nodes in the graph. However, these kinds of operations do not distinguish the normal and abnormal nodes, which result in the inaccurate global context being deviated from the normal pattern.

In another aspect, a memory buffer m is introduced to register the global context q ^(l-1) of the last layer, based on which the contribution of each node to the global context is calculated. The main idea here is a node which is deviated from the current global context should be weakened in the next context update step. The concrete strategy can be formulated as:

The memory are read out and the attention

for each node v _i is calculated based at least on its cosine similarity with the memory vector, which value is the global context q ^(l-1) of the last layer. Then the global context q ^(l) is updated by weighted aggregating different nodes, and the updated global context is written into the memory to update the memory vector m.

After L-layer graph convolutions, the updated nodes' embeddings H ^(L) and graph embedding q ^(L) are output to 303 for further processing.

In an embodiment, for the graphs used for training, at 303, the loss function is calculated based at least on the updated nodes' embeddings H ^(L) and graph embedding q ^(L) .

In another embodiment, for the graphs used for test, at 303, the updated nodes' embeddings H ^(L) and graph embedding q ^(L) are output to 303 for anomaly prediction.

In an aspect, for node v _i∈V to be predicted, a binary classifier is used at 303 to determine whether a given node is abnormal or normal with a binary label indicating 1 or 0.

In another aspect, instead of directly predicting a binary label for each node, for node v _i∈V to be predicted, with its embedding

and the graph embedding q ^(L) of the final layer, the cosine similarity between them is calculated as the abnormal extent of v _i. The lower the cosine similarity is, the more anomalous the v _i is predicted to be.

In the above example, the CoGCL model is discussed in a supervised framework. While sufficient labels of anomalies are often arduously expensive to obtain, the CoGCL model can be extended as an unsupervised model to handle real-world applications with scarce or even no labels.

In a second example, the CoGCL model is discussed in an unsupervised framework, especially focus on a pre-training strategy.

Inspired by the idea of the context aware contrastive objective, it is proposed to construct pseudo anomalies via corrupting the original graph. Considering one small part of the original graph, nodes outside of this small part can be injected into it to act as pseudo anomalies. The underlying assumption is that as nodes in different parts of the graph follow different feature distributions, they can serve as the pseudo anomalies to the current original graph context.

A corrupt graph is defined as follows. Given a graph G= (V, X, A) , which is broken into M parts

For each part G _i, a corrupt graph

is created by injecting a set of nodes

from the parts except G _i, i.e.,

thus the corrupt nodes

are the union of V _i and

i.e.,

The corrupt adjacency matrix

of

is obtained by slicing A using the indexes in

Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure. Firstly, the graph is broken into several sub-graphs. There are different ways for breaking the graph into multiple parts, such as using community detection or clustering algorithms. In an aspect, the hierarchical agglomerative clustering algorithms can be used to cluster nodes into sub-graphs based on the initial features X and determine the size of the clusters via finding the optimal Silhouette Coefficient, which is a well-adopted clustering metric.

Then the sub-graphs are injected into nodes from other sub-graphs as abnormal nodes. In an aspect, each corrupt sub-graph is input into the CoGCL model respectively.

In an aspect, given the corrupt sub-graphs with injected nodes as pseudo anomalies and original nodes as normal ones, the same context aware contrastive loss function defined in equation (1) can be directly optimized to train f _GNN.

In another aspect, the context aware contrastive loss function defined in equation (1) can be further fine-tuned on the target graph if the ground truth labels are also available.

Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below. It should be understood that operations shown with dashed lines in Fig. 5 indicate optional operations.

The method begins at 501, with a graph G is given. The graph G is a graph with labels, which is denoted as G= (V, X, A, Y) , where V is the set of N nodes,

denotes the adjacency matrix, and X is the corresponding feature vectors with

denoting the d-dimensional feature vector of v _i∈V, Y is the set of labels on nodes with y _i∈Y takes value 1 if v _i is abnormal and 0 otherwise.

In an aspect, the set of labels Y is a set of ground-truth labels. In another aspect, the set of labels Y is obtained through a pre-training procedure, which would be discussed with the optional operation 500. In other aspect, the set of labels Y is a set of labels including both ground-truth labels and pre-trained labels.

As shown in Fig. 5, operations in 500 are optional. At 500, a graph G is generated by breaking a large graph G′ into M parts

Then for each part G _i, injecting a set of nodes

from the parts except G _i to create corresponding corrupt graph

The corresponding corrupt graphs

are input into the model respectively, each of the corrupt graphs is treated as an independent graph G.

The method proceeds to 502 from 501, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.

The method proceeds to 503, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.

In an aspect, at each layer of the GNN, the embedding for each node and the embedding for the entire graph are updated successively.

As an example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of current layer; and combining the aggregated embedding with the embedding for the concerned node of last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

As a further example, the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer. In an embodiment, the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph. In another embodiment, the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of current layer and the embedding for the entire graph of last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregating the embedding for each node of the current layer based on the corresponding similarities. In a further embodiment, the similarity is cosine similarity, and the aggregating operation comprises summation.

In another aspect, at each layer of the GNN, edges of the graph, the embedding for each node and the embedding for the entire graph are updated successively.

As an example, the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially the suspicious probability of the edge.

As a further example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer after the edges is updated at current layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of current layer; and combining the aggregated embedding with the embedding for the concerned node of last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

After all layer graph convolution, the method proceeds to 504, with updating the GNN based on a loss function.

In an aspect, the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.

Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in the Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.

The method begins at 601, with a graph G is given for anomaly detection. The graph G is denoted as G= (V, X, A) , where V is the set of N nodes,

denotes the adjacency matrix, and X is the corresponding feature vectors with

denoting the d-dimensional feature vector of v _i∈V. In a further aspect, the graph G could be given with ground-truth labels as G= (V, X, A, Y) , wherein Y is the set of labels on nodes with y _i∈Y takes value 1 if v _i is abnormal and 0 otherwise.

The method proceeds to 602, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.

The method proceeds to 603, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.

After all layer graph convolution, the method proceeds to 604, with outputting indication for determining whether a node is anomaly.

In an aspect, the indication is a binary label for the node to determine whether a given node is abnormal (1) or normal (0) .

In another aspect, the indication is a similarity between the updated embedding for the node and the updated embedding for the entire graph as an abnormal extent. For example, the similarity can be a cosine similarity.

Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure. The computing system 700 may comprise at least one processor 710. The computing system 700 may further comprise at least one storage device 720. In an aspect, the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function. In a further aspect, the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for using a Graph Neural Network (GNN) trained with the present disclosure for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.

It should be appreciated that the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.

The embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.

The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.

It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.

It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims

A method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection, the method comprising:

obtaining an embedding for each node of a graph and an embedding for the entire graph;

inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and

updating the GNN based on a loss function.
The method of claim 1, wherein a portion of nodes is identified as abnormal nodes before input to the GNN based at least on ground-truth node labels.
The method of claim 1, wherein a portion of nodes is identified as abnormal nodes before input to the GNN based at least on generating the graph by:

breaking a first graph into a plurality of sub-graphs; and

injecting a set of nodes from sub-graphs other than a sub-graph which is to be used as the graph for inputting into the GNN, into the sub-graph as abnormal nodes.
The method of claims 2 or 3, wherein the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
The method of claim 1, the method further comprising:

edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
The method of claim 5, wherein the edges of the graph are updated at each layer by:

estimating a suspicious probability of an edge between two nodes; and

removing the edge based at least partially the suspicious probability of the edge.
The method of claim 6, wherein estimating the suspicious probability of the edge between two nodes comprising:

estimating the suspicious probability of the edge between two nodes based at least on a similarity of the two nodes, wherein each node is represented by its own node embedding and its distance to the embedding for the entire graph of the last layer.
The method of claim 5, wherein the embedding for each node is updated at each layer by:

aggregating embeddings of all neighboring nodes of the last layer for a node based at least on the edges of current layer; and

combining the aggregated embedding with the embedding for the node of last layer.
The method of claim 8, wherein the aggregating operation comprises summation, and the combining operation comprises concatenation.
The method of claim 5, wherein the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
The method of claim 5, wherein the embedding for the entire graph is updated at each layer by:

calculating a similarity between the embedding for each node of current layer and the embedding for the entire graph of last layer;

calculating the embedding for the entire graph of the current layer by weighted aggregating the embedding for each node of the current layer based on the corresponding similarities.
The method of claim 1, wherein the graph is constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.
The method of claim 1, wherein the graph is constructed based on data of an online academic system, and wherein each node of the graph represents a publication of a researcher, and an edge exists if two publications have a certain common attribute.
A method for using a Graph Neural Network (GNN) trained with one of claims 1-13 for anomaly detection, comprising:

obtaining an embedding for each node of a graph and an embedding for the entire graph;

inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and

outputting, after final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
A computer system, comprising:

one or more processors; and

one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-14.
One or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-14.
A computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-14.