
WO2023010502A1 - Method and apparatus for anomaly detection on graph - Google Patents

Method and apparatus for anomaly detection on graph

Info

Publication number
WO2023010502A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
embedding
node
gnn
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/111102
Other languages
French (fr)
Inventor
Evgeny Kharlamov
Jie Tang
Bo Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Robert Bosch GmbH
Original Assignee
Tsinghua University
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Robert Bosch GmbH filed Critical Tsinghua University
Priority to CN202180101403.3A priority Critical patent/CN118020076A/en
Priority to PCT/CN2021/111102 priority patent/WO2023010502A1/en
Priority to DE112021007660.4T priority patent/DE112021007660T5/en
Publication of WO2023010502A1 publication Critical patent/WO2023010502A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • aspects of the present disclosure relate generally to artificial intelligence, and more particularly to a method and an apparatus for anomaly detection on a graph.
  • Anomaly detection has profound impacts on preventing malicious activities in various applications, such as the detection of online review spams, financial frauds, fake users, and misinformation.
  • the most promising developments have been the utilization of graph structures in machine learning models, as graphs can be used for naturally modeling the structural dependencies underlying the data.
  • Contrastive learning is a widely used method to train an encoder to learn to produce vector representations of inputs such that representations of inputs in the same class will be more similar compared to representations of inputs in different classes. It is proven to be very useful for the downstream classification task, but is not specifically proposed to solve the anomaly detection problem.
  • based on contrastive learning, it is proposed in the present disclosure to contrast each node with the global context of the input graph.
  • a context aware graph contrastive learning (CoGCL) model is disclosed based on graph contrastive learning and GNN.
  • the underlying assumption is that abnormal nodes tend to be more distant from the majority of the nodes than normal ones.
  • the present disclosure provides the context aware graph contrastive loss function in a supervised manner, i.e., labeled normal and abnormal nodes as positive and negative keys respectively.
  • a graph neural network encoder is disclosed to learn the nodes' representations as well as learn the global context of the input graph during message passing.
  • the graph neural network encoder is further disclosed to infer and remove suspicious links before learning the nodes' representations and learning the global context of the input graph.
  • the disclosed encoder has corresponding modules for edge update, node update and graph update.
  • a method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • a portion of nodes is identified as abnormal nodes before input to the GNN based at least on ground-truth node labels.
  • a portion of nodes is identified as abnormal nodes before input to the GNN based at least on generating the graph by: breaking a first graph into a plurality of sub-graphs; and injecting a set of nodes from sub-graphs other than a sub-graph which is to be used as the graph for inputting into the GNN, into the sub-graph as abnormal nodes.
  • the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
  • edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
  • the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.
  • the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the node of the last layer.
  • the embedding for the entire graph is updated at each layer by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embedding for each node of the current layer based on the corresponding similarities.
  • a method for using a Graph Neural Network (GNN) trained with the method in the present disclosure for anomaly detection comprising obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after the final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
  • edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
  • a method for training an online academic system including a Graph Neural Network (GNN) with multiple layers for wrongly-assigned publication detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • the graph is constructed based on data of an online academic system, and wherein each node of the graph represents a publication assigned to a researcher of the online academic system, and an edge exists if two publications have a certain common attribute, such as coauthors, venues or organizations.
  • a method for training an online spam recognition system including a Graph Neural Network (GNN) with multiple layers for spam detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • the graph is constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.
  • a method for training a fraud recognition system including a Graph Neural Network (GNN) with multiple layers for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • the graph is constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name.
  • a method for training a rumor prediction system including a Graph Neural Network (GNN) with multiple layers for anomaly detection comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • the graph is constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.
  • the present disclosure can be used in a variety of applications, for example, in any scenario where inputs can be learned as node representations and connections between the nodes can be learned as edges, including but not limited to dataset preprocessing, detection of online review spam, financial frauds, insurance frauds, bot frauds, fake users and rumors in social media, fake news, etc.
  • Fig. 1a illustrates an exemplary anomaly detection scenario, in accordance with various aspects of the present disclosure.
  • Fig. 1b illustrates exemplary similarity between normal nodes and global context versus abnormal nodes and the global context, in accordance with various aspects of the present disclosure.
  • Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure.
  • Fig. 3 illustrates an exemplary framework of CoGCL, in accordance with various aspects of the present disclosure.
  • Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure.
  • Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure.
  • Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure.
  • Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure.
  • anomaly detection helps protect business, individuals and online communities.
  • graph-based anomaly detection has been widely used for detecting malicious activities in real-world applications.
  • Fig. 1a illustrates an exemplary anomaly detection scenario, in accordance with various aspects of the present disclosure.
  • a similar example is taken as a case study, by building a graph with papers assigned to a certain author as nodes, and connecting two papers with an edge if they share the same coauthors or venues.
  • the goal in this example is to detect the wrongly-assigned papers (anomalies) , i.e., those that are not authored by the certain author.
  • blank nodes represent papers authored by the certain author
  • black nodes represent papers actually from other authors but wrongly-assigned.
  • a graph can be constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.
  • a graph can be constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name.
  • a graph can be constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.
  • Fig. 1a shows the t-SNE projection of each node's input features (the BERT embedding of its title), with blank nodes as normal and black nodes as abnormal. It can be observed that both abnormal and normal nodes are distributed diversely, with the abnormal ones being relatively more diverse. Intuitively, to quantify this observation, the similarity between each node and the global context, which is the average of all node features, is computed, as shown in Fig. 1b.
  • Fig. 1b illustrates exemplary similarity between normal nodes and global context versus abnormal nodes and the global context, in accordance with various aspects of the present disclosure.
  • the similarity between normal nodes and global context is computed as a cosine similarity herein, and shown in the blank block.
  • the similarity between abnormal nodes and the global context is computed as a cosine similarity herein, and shown with a shadowed block.
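The statistic behind Figs. 1a and 1b can be sketched as follows: compute the global context as the element-wise mean of all node feature vectors, then score each node by its cosine similarity to that context. This is an illustrative sketch, not the disclosed implementation; the function names and the toy feature vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def context_similarities(features):
    """Similarity of every node feature vector to the global context,
    taken here as the element-wise mean of all node features."""
    n, d = len(features), len(features[0])
    context = [sum(f[j] for f in features) / n for j in range(d)]
    return [cosine(f, context) for f in features]

# Toy example: three tightly clustered "normal" nodes and one outlier.
sims = context_similarities([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [-1.0, 2.0]])
```

On this toy input the outlier scores lower than every normal node, mirroring the gap observed in Fig. 1b.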
  • a graph-based anomaly detection task can be formalized as the following problem.
  • G = (V, X, A) denotes the input graph, where:
  • V is the set of N nodes, and A ∈ {0, 1}^(N×N) denotes the adjacency matrix;
  • X is the set of corresponding feature vectors, with x_i ∈ R^d denoting the d-dimensional feature vector of v_i ∈ V;
  • Y is the set of labels on nodes, where y_i ∈ Y takes value 1 if v_i is abnormal and 0 otherwise.
  • the goal is to learn a classifier to determine whether a given node is abnormal (1) or normal (0).
  • a CoGCL model is discussed in a supervised framework.
  • the basic idea of the CoGCL model is to determine a node’s abnormality by contrasting it with the global context (e.g., the average of all nodes) of the entire graph. This is motivated by the discovery that there exists a significant difference of distance to the global context between the normal and the abnormal nodes, that is, a node is more abnormal if it deviates farther away from the majority of nodes in the feature space.
  • the CoGCL model is disclosed to distinguish abnormal and normal nodes in the embedding space by leveraging graph contrastive learning.
  • One of the main ideas of the model is to take the graph embedding q as the query, a normal node's embedding as the positive key that matches with q, and the embeddings of all the abnormal nodes as the negative keys that don't match with q.
  • These representations in the embedding space may then be used for downstream tasks, such as node classification, anomaly detection, etc.
  • a concrete loss function is designed as equation (1), contrasting the graph embedding q (the query) with the normal-node embeddings h_i (positive keys) against the abnormal-node embeddings h_j (negative keys):
  • L = − Σ_{i: y_i = 0} log [ exp(sim(q, h_i)/τ) / (exp(sim(q, h_i)/τ) + Σ_{j: y_j = 1} exp(sim(q, h_j)/τ)) ]  (1)
  • where τ is the temperature hyperparameter.
  • the objective function is to enforce maximizing the consistency between the positive pairs (normal nodes and global context) compared with the negative pairs (abnormal node, global context) .
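The objective above can be sketched as an InfoNCE-style loss: the graph embedding q is the query, each normal node supplies a positive key, and the abnormal nodes supply the negative keys. This is a minimal illustration with hypothetical names and cosine similarity as the score function, not a reproduction of the disclosed implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def context_contrastive_loss(q, positives, negatives, tau=0.5):
    """InfoNCE-style sketch of equation (1): pull normal-node embeddings
    (positives) toward the graph embedding q and push abnormal-node
    embeddings (negatives) away, averaged over the positive keys."""
    neg_sum = sum(math.exp(cosine(q, h) / tau) for h in negatives)
    loss = 0.0
    for h in positives:
        pos = math.exp(cosine(q, h) / tau)
        loss += -math.log(pos / (pos + neg_sum))
    return loss / len(positives)
```

With positives aligned to q and negatives pointing away, the loss is small; swapping the roles makes it large, which is the contrast direction the training enforces.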
  • a GNN model contains a message passing process and a neighbor aggregation process.
  • the CoGCL model contains several steps.
  • the CoGCL model contains a node update process and graph update process.
  • the CoGCL model contains an edge update process, a node update process and graph update process.
  • the CoGCL's GNN encoder f GNN contains modules for corresponding processes.
  • Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 2 indicate optional modules.
  • a graph's representation in embedding space is input into the GNN.
  • the graph is a labeled graph with a set of labels on nodes.
  • the graph is input into the GNN at least in the form of an embedding for each node and an embedding for the entire graph.
  • the graph is input into the GNN with an adjacency matrix representing edges.
  • the encoder f GNN contains several modules to optimize the context aware contrastive objective in equation (1) .
  • the encoder f_GNN contains two modules, a node update module 202 and a graph update module 203. At each layer of the GNN, the two modules are performed successively.
  • the encoder f_GNN contains three modules, an edge update module 201, a node update module 202 and a graph update module 203. At each layer of the GNN, the three modules are performed successively.
  • An edge update module 201 is to estimate the suspicious probability of an edge and modify the adjacency matrix according to the probabilities, wherein
  • A^(updated) = f_edge(H^(input), A^(input), q^(input))  (2)
  • A node update module 202 is then performed after the edge update module, to aggregate the neighboring information based on the modified adjacency matrix A^(updated) and then combine the aggregated result with the concerned node, wherein
  • H^(updated) = f_node(H^(input), A^(updated))  (3)
  • A graph update module 203 is then performed after the node update module, to update the context of the whole graph based on the updated node embeddings H^(updated) and the input graph embedding q^(input) via a readout function:
  • q^(updated) = f_graph(H^(updated), q^(input))  (4)
  • the updated node embedding H (updated) , graph embedding q (updated) and graph edges A (updated) are output for further processing.
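The per-layer flow just described can be sketched as a composition of three callables, where f_edge, f_node and f_graph stand in for the learned modules 201 to 203 and the optional flag mirrors the two-module versus three-module variants. The function name and signature are illustrative assumptions.

```python
def cogcl_layer(H, A, q, f_edge, f_node, f_graph, use_edge_update=True):
    """One sketched encoder layer: optionally update edges from the node
    embeddings H, adjacency A and graph embedding q (edge update), then
    update node embeddings by message passing on the new adjacency (node
    update), then update the global context from the new node embeddings
    (graph update)."""
    if use_edge_update:
        A = f_edge(H, A, q)   # A_updated = f_edge(H_input, A_input, q_input)
    H = f_node(H, A)          # H_updated = f_node(H_input, A_updated)
    q = f_graph(H, q)         # q_updated = f_graph(H_updated, q_input)
    return H, A, q
```

The ordering matters: the node update consumes the already-updated adjacency, and the graph update consumes the already-updated node embeddings.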
  • Fig. 3 illustrates an exemplary framework of CoGCL model, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 3 indicate optional modules.
  • a graph G is input into the CoGCL model at 301.
  • the graph G is a labeled graph with a set of labels Y on nodes.
  • the graph G is input into the CoGCL model with an embedding for each node and an embedding for the entire graph q. The embedding for the entire graph q is obtained based at least on embeddings for each node via a readout function.
  • the graph G is input into the CoGCL model with an adjacency matrix A representing edges. As shown in Fig. 3, the blank nodes are normal nodes, while the black nodes are abnormal nodes.
  • the CoGCL model includes a plurality of repeated layers, each of which at least includes the corresponding modules of the encoder f_GNN as discussed with Fig. 2. It is noted that the number of layers does not limit the scope of the invention.
  • layer 302-l includes two modules for node update and graph update respectively.
  • layer 302-l includes three modules for edge update, node update and graph update respectively.
  • edge update as the first step, which is optional, is to estimate the likelihood of each link being a suspicious one and then remove the most suspicious ones.
  • the node update is to update the node embeddings by message passing on the updated adjacency matrix.
  • the global context is updated via graph update.
  • the context aware contrastive loss function in equation (1) is optimized, and the updated node embeddings and graph embedding are output to 303 for further processing, such as calculating the loss function for training, or calculating the cosine similarity for anomaly prediction.
  • the encoder f GNN contains an edge update module 302-l-1 in layer 302-l to remove suspicious links.
  • Suspicious links are the ones that connect nodes with opposite labels. For example, almost every fraudster on a business website can be linked to some benign users, which, by the homophily assumption between neighbors (two neighboring nodes tend to have similar features and the same labels), impacts the performance of traditional message passing along the links.
  • both the node features h and a node's distance h - q to the global context in the feature space are considered.
  • the distance can serve as implicit labels for guiding the model to remove suspicious links –those connected by the nodes with distinguished distance to the global context.
  • a context aware link predictor is provided in the present disclosure to estimate the suspicious linkage likelihood between two nodes.
  • the link predictor accepts the representations of two nodes as the input, where each node v_i is represented by its own embedding and its relative distance to the global context embedding, i.e., z_i = [h_i; h_i - q]  (5)
  • the linkage likelihood between node v_i and v_j can be calculated as p_ij = σ(f_link(z_i, z_j))  (6), where f_link is a learnable scoring function.
  • the suspicious links are removed, to thoroughly reduce their negative influence, by employing the Bernoulli approximation, the binary case of the Gumbel-Softmax reparameterization trick. Specifically, for each link a Gumbel noise g ~ Gumbel(0, 1) is first sampled, added to the logarithm of the linkage likelihood, and passed through a sigmoid function. The result is then rounded to 1 if it is larger than 0.5 and to 0 otherwise to obtain the updated adjacency entry, i.e.,
  • a_ij^(updated) = round(sigmoid((log p_ij + g) / τ_g))  (7)
  • where τ_g indicates the temperature of the Gumbel-Softmax distribution.
  • the gradients are passed through the relaxed value instead of the discrete value during the backward procedure.
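The sampling-and-rounding procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions (a scalar likelihood, Python's random module, no gradient machinery); a real implementation would keep the relaxed value for backpropagation.

```python
import math
import random

def gumbel_keep_edge(link_likelihood, temperature=1.0, rng=random):
    """Binary (Bernoulli) Gumbel-Softmax relaxation as described: sample
    Gumbel(0, 1) noise, add it to the log-likelihood, squash with a
    temperature-scaled sigmoid, and round to a hard 0/1 keep decision.
    Returns (hard_decision, relaxed_value)."""
    u = rng.random()
    g = -math.log(-math.log(u))  # Gumbel(0, 1) sample
    relaxed = 1.0 / (1.0 + math.exp(-(math.log(link_likelihood) + g) / temperature))
    hard = 1 if relaxed > 0.5 else 0
    return hard, relaxed
```

The hard decision is what edits the adjacency matrix; the relaxed value in (0, 1) is what gradients would flow through during training.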
  • the encoder f GNN in layer 302-l contains a node update module 302-l-2 for updating node embeddings.
  • the node embeddings are updated based on the updated edges. In another aspect, the node embeddings are updated without updating the edges of the graph.
  • equation (3) is divided into an AGGREGATION operation and a COMBINE operation as:
  • m_i^(l) = AGGREGATION({h_j^(l-1) : a_ij^(updated) = 1})  (8)
  • h_i^(l) = COMBINE(h_i^(l-1), m_i^(l))  (9)
  • equation (8) aggregates the neighboring information of the last layer based on the updated adjacency matrix, and equation (9) combines the aggregated neighboring information with the concerned node.
  • the AGGREGATION operation is the sum, the same as GIN.
  • the COMBINE operation is concatenation.
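The sum-then-concatenate node update can be sketched on plain Python lists; the function name, the 0/1 adjacency format, and the absence of learned weights are all simplifying assumptions of this illustration.

```python
def node_update(H, A):
    """Sketched node update (eqs. 8-9): for each node, AGGREGATE the
    previous-layer embeddings of its neighbors by summation (as in GIN)
    over the updated adjacency A, then COMBINE by concatenating the
    aggregate with the node's own previous embedding.  H is a list of
    feature lists, A a 0/1 adjacency matrix."""
    updated = []
    for i, h_i in enumerate(H):
        d = len(h_i)
        agg = [0.0] * d
        for j, h_j in enumerate(H):
            if A[i][j]:  # j is a neighbor of node i
                agg = [a + x for a, x in zip(agg, h_j)]
        updated.append(agg + h_i)  # concatenation doubles the width
    return updated
```

Note that concatenation (rather than addition) as the COMBINE step preserves the node's own previous embedding alongside the neighborhood aggregate.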
  • the encoder f_GNN in layer 302-l contains a graph update module 302-l-3.
  • the graph embedding q, which is the global context, needs to be updated at each layer after all nodes' embeddings are updated.
  • the graph embedding q can be obtained by straightforwardly performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  • these kinds of operations do not distinguish the normal and abnormal nodes, which results in an inaccurate global context that deviates from the normal pattern.
  • a memory buffer m is introduced to register the global context q (l-1) of the last layer, based on which the contribution of each node to the global context is calculated.
  • the main idea here is a node which is deviated from the current global context should be weakened in the next context update step.
  • the concrete strategy can be formulated as: α_i^(l) = sim(h_i^(l), m), q^(l) = Σ_i w_i h_i^(l) with the weight w_i derived from α_i^(l), and m ← q^(l)  (10)
  • the memory is read out and the attention for each node v_i is calculated based at least on its cosine similarity with the memory vector, whose value is the global context q^(l-1) of the last layer. Then the global context q^(l) is updated by weighted aggregation of the different nodes, and the updated global context is written into the memory to update the memory vector m.
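A sketch of the memory-guided readout: each node is weighted by its cosine similarity to the memory vector and the new context is the weighted sum. The softmax normalization of the similarities is an assumption of this sketch, since the disclosure only states that the attention is based at least on cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def graph_update(H, memory):
    """Sketched graph update: weight each node by its (softmax-normalized,
    assumed) cosine similarity to the memory vector holding the previous
    layer's global context, then aggregate the nodes with those weights.
    The returned context would overwrite the memory for the next layer."""
    sims = [cosine(h, memory) for h in H]
    exps = [math.exp(s) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    d = len(memory)
    return [sum(w * h[j] for w, h in zip(weights, H)) for j in range(d)]
```

A node far from the current context gets a small weight, so outliers are weakened in the next context update, which is exactly the stated design goal.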
  • the updated nodes' embeddings H (L) and graph embedding q (L) are output to 303 for further processing.
  • the loss function is calculated based at least on the updated nodes' embeddings H (L) and graph embedding q (L) .
  • the updated nodes' embeddings H (L) and graph embedding q (L) are output to 303 for anomaly prediction.
  • a binary classifier is used at 303 to determine whether a given node is abnormal or normal with a binary label indicating 1 or 0.
  • the cosine similarity between the updated embedding of node v_i and the updated graph embedding is calculated as the abnormal extent of v_i.
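The prediction step at 303 can be sketched as scoring each node by cosine similarity to the final graph embedding and thresholding. The threshold value is hypothetical, and lower similarity is read as more abnormal, matching the observation of Fig. 1b.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def predict_anomalies(H_final, q_final, threshold=0.5):
    """Sketched inference: score each node by the cosine similarity of its
    final-layer embedding to the final graph embedding, and label nodes
    whose similarity falls below the (hypothetical) threshold as abnormal
    (1), the rest as normal (0)."""
    scores = [cosine(h, q_final) for h in H_final]
    labels = [1 if s < threshold else 0 for s in scores]
    return scores, labels
```

In practice the threshold would be tuned on validation data or replaced by a learned binary classifier, as the disclosure suggests.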
  • the CoGCL model is discussed in a supervised framework. Since sufficient labels of anomalies are often prohibitively expensive to obtain, the CoGCL model can be extended to an unsupervised model to handle real-world applications with scarce or even no labels.
  • the CoGCL model is discussed in an unsupervised framework, with a particular focus on a pre-training strategy.
  • Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure.
  • the graph is broken into several sub-graphs.
  • a hierarchical agglomerative clustering algorithm can be used to cluster nodes into sub-graphs based on the initial features X, and to determine the size of the clusters by finding the optimal Silhouette Coefficient, which is a well-adopted clustering metric.
  • each corrupt sub-graph is input into the CoGCL model respectively.
  • the same context aware contrastive loss function defined in equation (1) can be directly optimized to train f GNN .
  • the context aware contrastive loss function defined in equation (1) can be further fine-tuned on the target graph if the ground truth labels are also available.
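The corruption scheme of Fig. 4 can be sketched as follows, assuming node clusters are already available (the disclosure obtains them via hierarchical agglomerative clustering selected by the Silhouette Coefficient). The deterministic pick of injected nodes below stands in for random sampling, and the function name is hypothetical.

```python
def make_corrupt_graphs(clusters, n_inject=1):
    """Sketch of the pre-training corruption step (Fig. 4): given node ids
    grouped into sub-graphs (clusters), build one corrupt graph per
    sub-graph by injecting nodes drawn from the other sub-graphs; injected
    nodes get pseudo-label 1 (abnormal), native nodes get 0 (normal)."""
    corrupt = []
    for i, own in enumerate(clusters):
        others = [v for j, c in enumerate(clusters) if j != i for v in c]
        injected = others[:n_inject]  # deterministic pick for the sketch
        nodes = list(own) + injected
        labels = [0] * len(own) + [1] * len(injected)
        corrupt.append((nodes, labels))
    return corrupt
```

Each corrupt graph then serves as an independent training graph with free pseudo-labels, so equation (1) can be optimized without any ground-truth anomaly annotations.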
  • Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below. It should be understood that operations shown with dashed lines in Fig. 5 indicate optional operations.
  • the method begins at 501, where a graph G is given.
  • the set of labels Y is a set of ground-truth labels. In another aspect, the set of labels Y is obtained through a pre-training procedure, which is discussed with the optional operation 500. In a further aspect, the set of labels Y includes both ground-truth labels and pre-trained labels.
  • a graph G is generated by breaking a large graph G′ into M parts. Then, for each part G_i, a set of nodes from the parts other than G_i is injected to create a corresponding corrupt graph. The corrupt graphs are input into the model respectively, and each of the corrupt graphs is treated as an independent graph G.
  • the method proceeds to 502 from 501, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.
  • the method proceeds to 503, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.
  • the embedding for each node and the embedding for the entire graph are updated successively.
  • the embedding for each node is updated in a message passing and neighbor aggregation process.
  • the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer.
  • the aggregating operation comprises summation
  • the combining operation comprises concatenation.
  • the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer.
  • the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  • the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embedding for each node of the current layer based on the corresponding similarities.
  • the similarity is cosine similarity
  • the aggregating operation comprises summation.
  • edges of the graph, the embedding for each node and the embedding for the entire graph are updated successively.
  • the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.
  • the embedding for each node is updated in a message passing and neighbor aggregation process.
  • the embedding for each node is updated at each layer, after the edges are updated at the current layer, by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer.
  • the aggregating operation comprises summation
  • the combining operation comprises concatenation.
  • the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer.
  • the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  • the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embedding for each node of the current layer based on the corresponding similarities.
  • the similarity is cosine similarity
  • the aggregating operation comprises summation.
  • the method proceeds to 504, with updating the GNN based on a loss function.
  • the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
  • Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in the Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure.
  • some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments.
  • the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
  • the method begins at 601, with a graph G being given for anomaly detection.
  • the method proceeds to 602, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.
  • the method proceeds to 603, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.
  • the embedding for each node and the embedding for the entire graph are updated successively.
  • the embedding for each node is updated in a message passing and neighbor aggregation process.
  • the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer.
  • the aggregating operation comprises summation
  • the combining operation comprises concatenation.
  • the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer.
  • the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  • the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities.
  • the similarity is cosine similarity
  • the aggregating operation comprises summation.
  • edges of the graph, the embedding for each node and the embedding for the entire graph are updated successively.
  • the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.
  • the embedding for each node is updated in a message passing and neighbor aggregation process.
  • the embedding for each node is updated at each layer, after the edges are updated at the current layer, by aggregating embeddings of all neighboring nodes of the last layer for a concerned node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer.
  • the aggregating operation comprises summation
  • the combining operation comprises concatenation.
  • the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer.
  • the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  • the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities.
  • the similarity is cosine similarity
  • the aggregating operation comprises summation.
  • the method proceeds to 604, with outputting an indication for determining whether a node is an anomaly.
  • the indication is a binary label for the node to determine whether a given node is abnormal (1) or normal (0) .
  • the indication is a similarity between the updated embedding for the node and the updated embedding for the entire graph as an abnormal extent.
  • the similarity can be a cosine similarity.
  • Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure.
  • the computing system 700 may comprise at least one processor 710.
  • the computing system 700 may further comprise at least one storage device 720.
  • the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
  • the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for using a Graph Neural Network (GNN) trained with the present disclosure for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
  • the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • the embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
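As a non-limiting sketch of the output step described above (a cosine similarity between an updated node embedding and the updated graph embedding as an abnormal extent, optionally thresholded into a binary abnormal/normal label), consider the following; the toy embeddings and the 0.5 threshold are illustrative assumptions, not part of the disclosure.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = <u, v> / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def anomaly_indication(node_emb, graph_emb, threshold=0.5):
    """Return (similarity, binary_label); label 1 marks an abnormal node.

    A node whose embedding is far from the global context (low cosine
    similarity) is treated as abnormal, per the disclosure's assumption.
    The 0.5 threshold is an illustrative choice only.
    """
    sim = cosine_similarity(node_emb, graph_emb)
    label = 1 if sim < threshold else 0
    return sim, label

graph_emb = [1.0, 1.0]
normal_node = [0.9, 1.1]     # close to the global context
abnormal_node = [1.0, -1.0]  # nearly orthogonal to the global context

sim_n, y_n = anomaly_indication(normal_node, graph_emb)
sim_a, y_a = anomaly_indication(abnormal_node, graph_emb)
```

Lower similarity to the global context marks a node as more abnormal, consistent with the assumption that abnormal nodes deviate farther from the majority of nodes.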


Abstract

A method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprises obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function. The method further comprises updating edges of the graph at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated. Numerous other aspects are provided.

Description

[Title established by the ISA under Rule 37.2] METHOD AND APPARATUS FOR ANOMALY DETECTION ON GRAPH

FIELD
Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method and an apparatus provided for anomaly detection on a graph.
BACKGROUND
Anomaly detection has profound impacts on preventing malicious activities in various applications, such as the detection of online review spams, financial frauds, fake users, and misinformation. The most promising developments have been the utilization of graph structures in machine learning models, as graphs can be used for naturally modeling the structural dependencies underlying the data.
A significant line of works has been focused on discovering and incorporating the structural patterns of anomalies into learning models. Recently, the advances of graph neural networks (GNNs) have inspired and empowered various attempts to adopt GNNs for detecting anomalies. Existing attempts to address the graph-based anomaly detection have focused on structural feature engineering or learning. The main idea of GNN-based anomaly detection is to leverage the power of GNNs to learn expressive node representations with the goal of distinguishing abnormal and normal nodes in the embedding space, producing promising results.
Contrastive learning is a widely used method to train an encoder to learn to produce vector representations of inputs such that representations of inputs in the same class will be more similar compared to representations of inputs in different classes. It is proven to be very useful for the downstream classification task, but is not specifically proposed to solve the anomaly detection problem.
However, it may not be sufficient to only make inputs in the same class more similar; it could be more efficient to also make inputs in different classes more distinguishable at the same time. It is promising to combine GNNs and contrastive learning for anomaly detection, with a finely designed objective.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In light of the contrastive learning, it is proposed in the present disclosure to contrast each node with the global context of the input graph. For the purpose of contrasting abnormal nodes with normal ones in terms of their distances to the global context, a context aware graph contrastive learning (CoGCL) model is disclosed based on graph contrastive learning and GNN. The underlying assumption is that abnormal nodes tend to be more distant from the majority of the nodes than normal ones.
The present disclosure provides the context aware graph contrastive loss function in a supervised manner, i.e., labeled normal and abnormal nodes as positive and negative keys respectively. To achieve the contrastive objective, a graph neural network encoder is disclosed to learn the nodes' representations as well as learn the global context of the input graph during message passing. The graph neural network encoder is further disclosed to infer and remove suspicious links before learning the nodes' representations and learning the global context of the input graph. The disclosed encoder has corresponding modules for edge update, node update and graph update.
In addition to the supervised model, it is also extended to an unsupervised model for handling cases with scarce labels. To this end, node labels need to be synthesized that can be directly used to replace the ground-truth labels in the supervised contrastive loss function of CoGCL.
According to an aspect, a method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
According to a further aspect, a portion of nodes is identified as abnormal nodes before input to the GNN based at least on ground-truth node labels.
According to a further aspect, a portion of nodes is identified as abnormal nodes before input to the GNN based at least on generating the graph by: breaking a first graph into a plurality of sub-graphs; and injecting a set of nodes from sub-graphs other than a sub-graph which is to be used as the graph for inputting into the GNN, into the sub-graph as abnormal nodes.
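The graph-corruption procedure of this aspect (breaking a graph into sub-graphs and injecting nodes from the other sub-graphs as pseudo-abnormal nodes) can be sketched as follows; the round-robin partitioning, the uniform sampling and all names are illustrative assumptions, not the disclosed procedure.

```python
import random

def make_corrupt_graph(nodes, num_parts, num_injected, seed=0):
    """Split `nodes` into `num_parts` sub-graphs, keep the first one,
    and inject `num_injected` nodes drawn from the other sub-graphs as
    pseudo-abnormal nodes (label 1).  Round-robin partitioning and
    uniform sampling are illustrative choices only."""
    rng = random.Random(seed)
    parts = [nodes[i::num_parts] for i in range(num_parts)]
    kept = parts[0]                                  # sub-graph used as input
    outside = [v for part in parts[1:] for v in part]
    injected = rng.sample(outside, num_injected)     # pseudo-abnormal nodes
    labels = {v: 0 for v in kept}
    labels.update({v: 1 for v in injected})
    return kept + injected, labels

nodes = list(range(12))
corrupt_nodes, labels = make_corrupt_graph(nodes, num_parts=3, num_injected=2)
```

The synthesized labels can then stand in for ground-truth labels in the supervised contrastive loss when no annotations are available.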
According to a further aspect, the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
According to a further aspect, edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
According to a further aspect, the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.
According to a further aspect, the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a node based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the node of the last layer.
According to a further aspect, the embedding for the entire graph is updated at each layer by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities.
A method for using a Graph Neural Network (GNN) trained with the method in the present disclosure for anomaly detection is disclosed. The method comprising obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
According to a further aspect, edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
According to an aspect, a method for training an online academic system including a Graph Neural Network (GNN) with multiple layers for wrongly-assigned publication detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
According to a further aspect, the graph is constructed based on data of an online academic system, and wherein each node of the graph represents a publication assigned to a researcher of the online academic system, and an edge exists if two publications have a certain common attribute, such as coauthors, venues or organizations.
According to an aspect, a method for training an online spam recognition system including a Graph Neural Network (GNN) with multiple layers for spam detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
According to a further aspect, the graph is constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.
According to an aspect, a method for training a frauds recognition system including a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
According to a further aspect, the graph is constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name.
According to an aspect, a method for training a rumor prediction system including a Graph Neural Network (GNN) with multiple layers for anomaly detection is disclosed. The method comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function.
According to a further aspect, the graph is constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.
The present disclosure can be used in a variety of applications, for example, in any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges, including but not limited to dataset preprocessing, detection of online review spams, financial frauds, insurance frauds, bot frauds, fake users and rumors in social media, fake news, etc.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
Fig. 1a illustrates an exemplary anomaly detection scenario, in accordance with various aspects of the present disclosure.
Fig. 1b illustrates exemplary similarity between normal nodes and global context versus abnormal nodes and the global context, in accordance with various aspects of the present disclosure.
Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure.
Fig. 3 illustrates an exemplary framework of CoGCL, in accordance with various aspects of the present disclosure.
Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure.
Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure.
Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure.
Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and embodiments are for illustrative purposes, and are not intended to limit the scope of the disclosure.
An important issue that data scientists have recently been working to solve is anomaly detection: from network security to financial fraud, and further to misinformation propagated on social networks, anomaly detection helps protect businesses, individuals and online communities. To this end, graph-based anomaly detection, which mainly focuses on structural features, has been widely used for detecting malicious activities in real-world applications.
Fig. 1a illustrates an exemplary anomaly detection scenario, in accordance with various aspects of the present disclosure.
An anomaly detection problem is motivated by news reported in 2012 that a researcher named "Jun Lu" from Beijing University of Chemical Technology obtained awards using the papers of another researcher who is also named "Jun Lu" but from Yale University. This event, caused by papers wrongly assigned under the same author name, illustrates the importance of anomaly detection.
To better understand the behavior of anomalies, a similar example is taken as a case study, by building a graph with papers assigned to a certain author as nodes, and connecting two papers with an edge if they share the same coauthors or venues. The goal in this example is to detect the wrongly-assigned papers (anomalies), i.e., those that are not authored by the certain author. As shown in Fig. 1a, blank nodes represent papers authored by the certain author, and black nodes represent papers actually from other authors but wrongly assigned.
As another example, a graph can be constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts. As yet another example, a graph can be constructed based on data of a bank system, and wherein each node of the graph represents an account of the bank system, and an edge exists if any transfer happens between two accounts or the two accounts have a certain common attribute, such as account name. And as other example, a graph can be constructed based on data of a social network system, and wherein each node of the graph represents an account of the social network system, and an edge exists if any information flow happens between two accounts.
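As a non-limiting illustration of such graph constructions (accounts as nodes, an edge whenever an interaction links two accounts), the following sketch builds the unweighted, undirected adjacency matrix from hypothetical interaction records; the record format and the function name are illustrative assumptions.

```python
def build_adjacency(accounts, interactions):
    """Build a symmetric 0/1 adjacency matrix: A[i][j] = 1 if any
    interaction (e.g. transfer, e-mail exchange, information flow)
    links the two accounts.  The record format is an assumption."""
    index = {acc: i for i, acc in enumerate(accounts)}
    n = len(accounts)
    A = [[0] * n for _ in range(n)]
    for src, dst in interactions:
        i, j = index[src], index[dst]
        if i != j:
            A[i][j] = A[j][i] = 1  # unweighted, undirected, single-relation
    return A

accounts = ["alice", "bob", "carol"]
interactions = [("alice", "bob"), ("bob", "alice"), ("bob", "carol")]
A = build_adjacency(accounts, interactions)
```

Duplicate or reversed interactions collapse into a single undirected edge, matching the single-relation graph assumed later in the formalization.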
Fig. 1a shows the t-SNE projection of each node's input features (the BERT embedding of its title), with blank nodes as normal nodes and black nodes as abnormal ones. It can be observed that both abnormal and normal nodes are distributed diversely, with the abnormal ones being relatively more diverse. To quantify this observation, the similarity between each node and the global context, which is the average of all node features, is computed, as shown in Fig. 1b.
Fig. 1b illustrates the exemplary similarity between normal nodes and the global context versus that between abnormal nodes and the global context, in accordance with various aspects of the present disclosure. The similarity between normal nodes and the global context is computed herein as a cosine similarity and shown in the blank block; the similarity between abnormal nodes and the global context is likewise computed as a cosine similarity and shown in the shadowed block. On the left side is the cosine similarity calculated based on the original input features. It suggests that, though the values are relatively close (y∈ [0.97, 1] ), the two similarity distributions can be clearly distinguished. Inspired by these observations, a straightforward way to capture the differences for distinguishing abnormal and normal nodes is desired. On the right side is the cosine similarity calculated based on embeddings generated by CoGCL in the present disclosure; the two similarity distributions are more distant and more distinguishable as shown, illustrating that CoGCL is an ideal way for distinguishing the anomalies in the graph.
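The observation above (normal nodes lie closer to the global context, taken here as the average of all node features, than abnormal ones) can be reproduced on toy data as follows; the feature vectors are illustrative assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def global_context(features):
    # The global context is the average of all node feature vectors.
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]

# Toy features: three clustered "normal" nodes and one outlying node.
features = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.1], [-1.0, 1.0]]
q = global_context(features)
sims = [cosine(f, q) for f in features]
```

Even on this two-dimensional toy example, the outlying node's similarity to the global context is clearly lower than that of the clustered nodes.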
A graph-based anomaly detection task can be formalized as the following problem. A graph is denoted as G = (V, X, A), where V is the set of N nodes, A ∈ {0, 1}^(N×N) denotes the adjacency matrix, and X is the corresponding feature vectors with x_i ∈ R^d denoting the d-dimensional feature vector of v_i ∈ V. In consideration of generality, G is considered to be an unweighted, undirected and single-relation graph, i.e., A_ij = 1 if an edge exists between v_i and v_j, and A_ij = 0 otherwise.
Given a labeled graph G = (V, X, A, Y), Y is the set of labels on nodes, where y_i ∈ Y takes value 1 if v_i is abnormal and 0 otherwise. In an example, the goal is to learn a classifier g: V → {0, 1} to determine whether a given node is abnormal (1) or normal (0).
To address this problem, most existing GNN-based models directly instantiate g as a binary classifier to predict the abnormal labels. Inspired by the observations about the distinguished distances to the global context (Fig. 1b) , contrastive learning is leveraged in the present disclosure for detecting the anomalies.
In a first example, a CoGCL model is discussed in a supervised framework.
The basic idea of the CoGCL model is to determine a node’s abnormality by contrasting it with the global context (e.g., the average of all nodes) of the entire graph. This is motivated by the discovery that there exists a significant difference of distance to the global context between the normal and the abnormal nodes, that is, a node is more abnormal if it deviates farther away from the majority of nodes in the feature space.
In view of this, the CoGCL model is disclosed to distinguish abnormal and normal nodes in the embedding space by leveraging graph contrastive learning. Specifically, given a graph G, the objective is to learn a GNN encoder that can output an embedding h_i for each node v_i and also an embedding q for the entire graph (global context), i.e., (H, q) = f_GNN(X, A), where H = {h_1, ..., h_N} collects the node embeddings.
One of the main ideas of the model is to take the graph embedding q as the query, a normal node's embedding as the positive key that matches with q, and the embeddings of all the abnormal nodes as the negative keys that don't match with q. These representations in the embedding space may then be used for downstream tasks, such as node classification, anomaly detection, etc.
In an embodiment, a concrete loss function is designed as equation (1):

L = −log ( Σ_{i: y_i = 0} exp(h_i·q / τ) / Σ_{j ∈ V} exp(h_j·q / τ) )        (1)
where τ is the temperature hyperparameter. The objective function is to enforce maximizing the consistency between the positive pairs (normal nodes and global context) compared with the negative pairs (abnormal node, global context) .
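As a non-limiting sketch, one plausible InfoNCE-style instantiation of this objective (normal nodes as positive keys, abnormal nodes as negative keys, temperature τ) is shown below; the exact functional form and the toy embeddings are assumptions consistent with the description, not necessarily the precise disclosed formula.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(H, q, labels, tau=0.1):
    """InfoNCE-style loss: pull normal-node embeddings (label 0) towards
    the graph embedding q while pushing abnormal ones (label 1) away.
    The loss shrinks as the normal nodes dominate the softmax mass."""
    scores = [math.exp(cosine(h, q) / tau) for h in H]
    pos = sum(s for s, y in zip(scores, labels) if y == 0)
    return -math.log(pos / sum(scores))

q = [1.0, 1.0]
H_good = [[1.0, 1.0], [0.9, 1.1], [-1.0, -1.0]]  # abnormal node far from q
H_bad  = [[1.0, 1.0], [0.9, 1.1], [1.0, 1.0]]    # abnormal node close to q
labels = [0, 0, 1]

loss_good = contrastive_loss(H_good, q, labels)
loss_bad = contrastive_loss(H_bad, q, labels)
```

An encoder that keeps abnormal embeddings distant from the global context yields a lower loss than one that does not, which is exactly the contrast the objective enforces.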
Generally, for the purpose of optimizing an objective, a GNN model contains a message passing process and a neighbor aggregation process. Specifically, to optimize the context aware contrastive objective in equation (1) , the CoGCL model contains several steps. For example, the CoGCL model contains a node update process and graph update process. For another example, the CoGCL model contains an edge update process, a node update process and graph update process. As a result, the CoGCL's GNN encoder f GNN contains modules for corresponding processes.
Fig. 2 illustrates an exemplary module diagram of the GNN encoder for CoGCL at each layer, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 2 indicate optional modules.
A graph's representation in embedding space is input into the GNN. In an aspect, the graph is a labeled graph with a set of labels on nodes. In an aspect the graph is input into the GNN at least in the form of an embedding for each node and an embedding for the entire graph. In an aspect, the graph is input into the GNN with an adjacent matrix representing edges.
As shown in Fig. 2, the encoder f GNN contains several modules to optimize the context aware contrastive objective in equation (1) . In an aspect, the encoder f GNN contains two modules, a node update module 202 and a graph update module 203. At each layer of GNN, the two modules are performed successively. In another aspect, the encoder f GNN contains three modules, an edge update module 201, a node update module 202 and a graph update module 203. At each layer of GNN, the three modules are performed successively.
An edge update module 201 is to estimate the suspicious probability of an edge and modify the adjacent matrix according to the probabilities, wherein
A^(updated) = f_edge(H^(input), A^(input), q^(input))              (2)
The suspicious edges are removed to reduce the negative influence from them thoroughly.
A node update module 202 is then performed after the edge update module, to aggregate the neighboring information based on the modified adjacent matrix A^(updated) and then combine the aggregated result with the concerned node, wherein

H^(updated) = f_node(H^(input), A^(updated))                (3)
A graph update module 203 is then performed after the node update module, to update the context of the whole graph based on the updated node embeddings H^(updated) and the input graph embedding q^(input) via a readout function:

q^(updated) = f_graph(H^(updated), q^(input))                (4)
After all the layer convolutions, the updated node embeddings H^(updated), the graph embedding q^(updated) and the graph edges A^(updated) are output for further processing.
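The per-layer flow of the edge update, node update and graph update modules can be sketched as follows; the concrete choices (a sign test against the global context for suspicious edges, sum aggregation with element-wise combination, and a cosine-weighted readout) are illustrative stand-ins for the learned modules, not the disclosed implementation.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def edge_update(H, A, q, threshold=0.0):
    """Eq. (2) stand-in: drop edges whose endpoints sit on opposite sides
    of the global context (one close to q, one far)."""
    n = len(H)
    A_new = [row[:] for row in A]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] and (cosine(H[i], q) - threshold) * (cosine(H[j], q) - threshold) < 0:
                A_new[i][j] = A_new[j][i] = 0
    return A_new

def node_update(H, A):
    """Eq. (3) stand-in: sum-aggregate neighbor embeddings and combine
    them with the node's own embedding by element-wise addition."""
    n, d = len(H), len(H[0])
    H_new = []
    for i in range(n):
        agg = [sum(H[j][k] for j in range(n) if A[i][j]) for k in range(d)]
        H_new.append([H[i][k] + agg[k] for k in range(d)])
    return H_new

def graph_update(H, q):
    """Eq. (4) stand-in: readout weighted by each node's cosine
    similarity to the previous graph embedding q."""
    w = [cosine(h, q) for h in H]
    return [sum(w[i] * H[i][k] for i in range(len(H))) for k in range(len(q))]

# One layer on a toy 3-node graph: node 2 is distant from the context q.
H = [[1.0, 1.0], [1.0, 0.8], [-1.0, -1.0]]
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
q = [1.0, 1.0]

A1 = edge_update(H, A, q)   # the suspicious edge (0, 2) is removed
H1 = node_update(H, A1)     # message passing on the updated edges
q1 = graph_update(H1, q)    # cosine-weighted global context
```

Note how removing the suspicious edge keeps the outlying node 2 from contaminating node 0's aggregated message in the same layer.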
Fig. 3 illustrates an exemplary framework of CoGCL model, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. It should be understood that modules shown with dashed lines in Fig. 3 indicate optional modules.
As shown in Fig. 3, a graph G is input into the CoGCL model at 301. In an aspect, the graph G is a labeled graph with a set of labels Y on nodes. In an aspect, the graph G is input into the CoGCL model with an embedding x_i for each node and an embedding q for the entire graph. The embedding q for the entire graph is obtained based at least on the embeddings for each node via a readout function. In an aspect, the graph G is input into the CoGCL model with an adjacent matrix A representing edges. As in Fig. 3, the blank nodes are shown as normal nodes, while the black nodes are shown as abnormal nodes.
It is illustrated in Fig. 3 that the CoGCL model includes a plurality of repeated layers, each of which includes at least the corresponding modules of the encoder f_GNN as discussed with Fig. 2. It is noted that the number of layers does not limit the scope of the invention.
For example, layer 302-l includes two modules, for node update and graph update respectively. For another example, layer 302-l includes three modules, for edge update, node update and graph update respectively. At each layer, the edge update, an optional first step, estimates the likelihood of each link being suspicious and removes the most suspicious ones. The node update then updates the node embeddings by message passing on the updated adjacency matrix. Finally, the global context is updated via the graph update. After L layers of graph convolution, the context aware contrastive loss function in equation (1) is optimized, and the updated node embeddings and graph embedding are output to 303 for further processing, such as calculating the loss function during training, or calculating cosine similarities for anomaly prediction.
In an embodiment, the encoder f_GNN contains an edge update module 302-l-1 in layer 302-l to remove suspicious links. Suspicious links are those that connect nodes with opposite labels. For example, almost every fraudster on a business website can be linked to some benign users; such links violate the homophily assumption between neighbors – two neighboring nodes tend to have similar features and the same labels – and impair the performance of traditional message passing along the links.
To predict the likelihood of a link, rather than relying on node features alone regardless of node labels, both the node features h and a node's distance to the global context, h − q, in the feature space are considered. In the present disclosure, this distance can serve as an implicit label for guiding the model to remove suspicious links – those connecting nodes with distinctly different distances to the global context.
In an aspect, a context aware link predictor is provided in the present disclosure to estimate the suspicious linkage likelihood between two nodes. The link predictor accepts the representations of two nodes as the input, where each node is represented by its own embedding and its relative distance to the global context embedding, i.e.,

z_i = MLP (h_i ⊕ (h_i − q))                (5)

where ⊕ is the concatenation operator and MLP is a projection layer. The term h_i − q, the distance from node v_i to the global context, indicates that two nodes can be linked with high probability given that their relative positions to the global context are similar, even if their local features are slightly different. In an aspect, the linkage likelihood between nodes v_i and v_j can be calculated as:
p_ij = ReLU (cos (z_i, z_j))                (6)

where ReLU normalizes the cosine similarity into [0, 1] .
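A sketch of the context aware link predictor described above. A single random weight matrix W stands in for the learned MLP projection; the dimensions and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 4, 8
W = rng.normal(size=(2 * d, hidden))  # stand-in for the learned MLP projection

def link_likelihood(h_i, h_j, q):
    """Project each node from its own embedding concatenated with its
    offset to the global context q, then take the ReLU-clipped cosine
    similarity of the two projections as the linkage likelihood."""
    z_i = np.maximum(np.concatenate([h_i, h_i - q]) @ W, 0.0)
    z_j = np.maximum(np.concatenate([h_j, h_j - q]) @ W, 0.0)
    c = z_i @ z_j / (np.linalg.norm(z_i) * np.linalg.norm(z_j) + 1e-8)
    return max(c, 0.0)  # ReLU normalizes into [0, 1]
```

The predictor is symmetric in its two node arguments and always returns a value in [0, 1].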
In an aspect, the suspicious links are removed, to thoroughly reduce their negative influence, by employing the Bernoulli approximation, the binary case of the Gumbel-Softmax reparameterization trick. Specifically, for each link a_ij ∈ A, a Gumbel noise ε ~ Gumbel (0, 1) is first sampled and added to the logarithm of the linkage likelihood p_ij, and a sigmoid function is applied. The result is then rounded to 1 if it is larger than 0.5 and to 0 otherwise, to obtain the updated a_ij, i.e.,

ã_ij = sigmoid ((log p_ij + ε) / λ) ,  a_ij = round (ã_ij)                (7)

where λ indicates the temperature of the Gumbel-Softmax distribution. In an aspect, according to the straight-through gradient estimator, the gradients are passed to the relaxed value ã_ij instead of the discrete value a_ij during the backward procedure.
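The Bernoulli relaxation described above can be sketched as follows (forward pass only; in training the gradient would flow through the relaxed value via the straight-through estimator).

```python
import numpy as np

def gumbel_bernoulli_edge(p_ij, lam=1.0, rng=None):
    """Add Gumbel(0, 1) noise to log p_ij, squash with a sigmoid at
    temperature lam, then round to a discrete edge decision.
    Returns (hard 0/1 decision, relaxed value used for gradients)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.gumbel(0.0, 1.0)
    a_soft = 1.0 / (1.0 + np.exp(-(np.log(p_ij + 1e-12) + eps) / lam))
    a_hard = 1.0 if a_soft > 0.5 else 0.0
    return a_hard, a_soft
```

Over many samples, links with high estimated likelihood are kept far more often than links with low likelihood, while each individual decision remains stochastic.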
In another aspect, to accelerate the optimization of the link predictor, a cross-entropy loss between the estimated likelihood p_ij and the observed linkage a_ij^(l−1) of the last layer is also added as a constraint to the loss function in equation (1) .
In an embodiment, the encoder f_GNN in layer 302-l contains a node update module 302-l-2 for updating node embeddings. In an aspect, the node embeddings are updated based on the updated edges. In another aspect, the node embeddings are updated without updating the edges of the graph.
In an aspect, the operation in equation (3) is divided into an AGGREGATE operation and a COMBINE operation as:

m_i^(l) = AGGREGATE ({h_j^(l−1) : a_ij^(l) = 1})                (8)

h_i^(l) = COMBINE (h_i^(l−1), m_i^(l))                (9)

where equation (8) aggregates the neighboring information of the last layer based on the updated adjacency matrix and equation (9) combines the aggregated neighboring information with the concerned node. In an aspect, the AGGREGATE operation is summation, the same as in GIN. In an aspect, the COMBINE operation is concatenation.
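The two operations in equations (8) and (9) can be sketched directly, with summation over neighbours and concatenation with the node's own embedding, matching the aspects stated above.

```python
import numpy as np

def node_update(H_prev, A_curr):
    """Equations (8)-(9): sum-aggregate neighbour embeddings over the
    updated adjacency matrix (as in GIN), then COMBINE by concatenating
    the aggregated message with the node's previous embedding."""
    M = A_curr @ H_prev                          # (8): sum over neighbours
    return np.concatenate([H_prev, M], axis=1)   # (9): COMBINE = concat
```

Note that concatenation doubles the embedding width at each layer, so in practice a learned projection typically follows.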
In an embodiment, the encoder f_GNN in layer 302-l contains a graph update module 302-l-3. The graph embedding q, which is the global context, needs to be updated at each layer after all node embeddings are updated.
In an aspect, the graph embedding q can be obtained by straightforwardly performing one of sum pooling, max pooling or average pooling over all nodes in the graph. However, such operations do not distinguish between normal and abnormal nodes, which results in an inaccurate global context that deviates from the normal pattern.
In another aspect, a memory buffer m is introduced to register the global context q^(l−1) of the last layer, based on which the contribution of each node to the global context is calculated. The main idea is that a node deviating from the current global context should be weakened in the next context update step. The concrete strategy can be formulated as:

α_i^(l) = cos (h_i^(l), m) ,  q^(l) = Σ_i α_i^(l) h_i^(l) ,  m ← q^(l)                (10)

The memory is read out, and the attention α_i^(l) for each node v_i is calculated based at least on its cosine similarity with the memory vector, whose value is the global context q^(l−1) of the last layer. Then the global context q^(l) is updated by weighted aggregation over the nodes, and the updated global context is written into the memory to update the memory vector m.
After L layers of graph convolution, the updated node embeddings H^(L) and graph embedding q^(L) are output to 303 for further processing.
In an embodiment, for the graphs used for training, at 303, the loss function is calculated based at least on the updated node embeddings H^(L) and graph embedding q^(L).

In another embodiment, for the graphs used for testing, the updated node embeddings H^(L) and graph embedding q^(L) are used at 303 for anomaly prediction.
In an aspect, for a node v_i ∈ V to be predicted, a binary classifier is used at 303 to determine whether the given node is abnormal or normal, with a binary label indicating 1 or 0.
In another aspect, instead of directly predicting a binary label for each node, for a node v_i ∈ V to be predicted, with its embedding h_i^(L) and the graph embedding q^(L) of the final layer, the cosine similarity between them is calculated as the abnormal extent of v_i. The lower the cosine similarity, the more anomalous v_i is predicted to be.
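The similarity-based scoring described above can be sketched as:

```python
import numpy as np

def anomaly_scores(H_final, q_final):
    """Cosine similarity of each final node embedding to the final graph
    embedding; a lower similarity means a more anomalous node."""
    return H_final @ q_final / (
        np.linalg.norm(H_final, axis=1) * np.linalg.norm(q_final) + 1e-8)
```

The scores can be ranked to surface the most anomalous nodes, or thresholded for a hard decision.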
In the above example, the CoGCL model is discussed in a supervised framework. Since sufficient anomaly labels are often prohibitively expensive to obtain, the CoGCL model can also be extended to an unsupervised model to handle real-world applications with scarce or even no labels.
In a second example, the CoGCL model is discussed in an unsupervised framework, with a particular focus on a pre-training strategy.
Inspired by the idea of the context aware contrastive objective, it is proposed to construct pseudo anomalies via corrupting the original graph. Considering one small part of the original graph, nodes outside of this small part can be injected into it to act as pseudo anomalies. The underlying assumption is that as nodes in different parts of the graph follow different feature distributions, they can serve as the pseudo anomalies to the current original graph context.
A corrupt graph is defined as follows. Given a graph G = (V, X, A) that is broken into M parts G_1, ..., G_M, for each part G_i a corrupt graph Ĝ_i is created by injecting a set of nodes Ṽ_i drawn from the parts other than G_i, i.e., Ṽ_i ⊂ V \ V_i. Thus the corrupt node set V̂_i is the union of V_i and Ṽ_i, i.e., V̂_i = V_i ∪ Ṽ_i. The corrupt adjacency matrix Â_i of Ĝ_i is obtained by slicing A using the indexes in V̂_i.
Fig. 4 illustrates an exemplary process of creating corrupt graphs, in accordance with various aspects of the present disclosure. First, the graph is broken into several sub-graphs. There are different ways to break the graph into multiple parts, such as community detection or clustering algorithms. In an aspect, a hierarchical agglomerative clustering algorithm can be used to cluster nodes into sub-graphs based on the initial features X, with the cluster sizes determined by finding the optimal Silhouette Coefficient, a well-adopted clustering metric.
Then nodes from other sub-graphs are injected into each sub-graph as abnormal nodes. In an aspect, each corrupt sub-graph is input into the CoGCL model respectively.
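A sketch of the corrupt-graph construction described above. The clustering itself is abstracted into a precomputed part_ids array (in the disclosure it may come from hierarchical agglomerative clustering with the Silhouette Coefficient); the function name and the choice to return pseudo labels alongside the sliced graph are illustrative assumptions.

```python
import numpy as np

def make_corrupt_graph(X, A, part_ids, i, n_inject, rng=None):
    """Create the corrupt graph for part G_i: keep G_i's own nodes as
    normal (label 0), inject n_inject nodes from the other parts as
    pseudo anomalies (label 1), and slice A to the kept node set."""
    rng = rng or np.random.default_rng(0)
    own = np.flatnonzero(part_ids == i)       # nodes of G_i
    others = np.flatnonzero(part_ids != i)    # nodes of the other parts
    injected = rng.choice(others, size=n_inject, replace=False)
    keep = np.concatenate([own, injected])    # corrupt node set
    labels = np.concatenate([np.zeros(len(own), int),
                             np.ones(n_inject, int)])
    return X[keep], A[np.ix_(keep, keep)], labels
```

Each corrupt sub-graph produced this way can then be fed to the model as an independent training graph with free pseudo labels.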
In an aspect, given the corrupt sub-graphs with injected nodes as pseudo anomalies and original nodes as normal ones, the same context aware contrastive loss function defined in equation (1) can be directly optimized to train f GNN.
In another aspect, if ground-truth labels are also available, the model can be further fine-tuned on the target graph with the context aware contrastive loss function defined in equation (1) .
Fig. 5 illustrates an exemplary flow chart of training a GNN for anomaly detection, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below. It should be understood that operations shown with dashed lines in Fig. 5 indicate optional operations.
The method begins at 501, with a graph G being given. The graph G is a labeled graph, denoted as G = (V, X, A, Y), where V is the set of N nodes, A ∈ {0, 1}^(N×N) denotes the adjacency matrix, X is the set of corresponding feature vectors with x_i ∈ R^d denoting the d-dimensional feature vector of v_i ∈ V, and Y is the set of labels on nodes, with y_i ∈ Y taking value 1 if v_i is abnormal and 0 otherwise.
In an aspect, the set of labels Y is a set of ground-truth labels. In another aspect, the set of labels Y is obtained through a pre-training procedure, which is discussed with the optional operation 500. In a further aspect, the set of labels Y includes both ground-truth labels and pre-trained labels.
As shown in Fig. 5, the operations in 500 are optional. At 500, a graph G is generated by breaking a large graph G′ into M parts G_1, ..., G_M. Then, for each part G_i, a set of nodes Ṽ_i from the parts other than G_i is injected to create the corresponding corrupt graph Ĝ_i. The corresponding corrupt graphs Ĝ_1, ..., Ĝ_M are input into the model respectively, each of the corrupt graphs being treated as an independent graph G.
The method proceeds to 502 from 501, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.
The method proceeds to 503, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.
In an aspect, at each layer of the GNN, the embedding for each node and the embedding for the entire graph are updated successively.
As an example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node, based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

As a further example, the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer. In an embodiment, the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph. In another embodiment, the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities. In a further embodiment, the similarity is cosine similarity, and the aggregating operation comprises summation.
In another aspect, at each layer of the GNN, edges of the graph, the embedding for each node and the embedding for the entire graph are updated successively.
As an example, the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.

As a further example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer, after the edges are updated at the current layer, by aggregating embeddings of all neighboring nodes of the last layer for a concerned node, based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

As a further example, the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer. In an embodiment, the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph. In another embodiment, the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities. In a further embodiment, the similarity is cosine similarity, and the aggregating operation comprises summation.
After all layers of graph convolution, the method proceeds to 504, with updating the GNN based on a loss function.
In an aspect, the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
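Where a concrete form helps, the loss described above can be sketched as follows. The InfoNCE-style formulation, the temperature tau and the use of cosine similarity are assumptions; equation (1) itself is not reproduced in this excerpt.

```python
import numpy as np

def context_contrastive_loss(H, q, y, tau=0.5):
    """Hedged sketch of the stated objective: reward high similarity
    between normal nodes (y == 0) and the graph context q, relative to
    abnormal nodes (y == 1), via a softmax over cosine similarities."""
    sims = H @ q / (np.linalg.norm(H, axis=1) * np.linalg.norm(q) + 1e-8)
    weights = np.exp(sims / tau)
    # Fraction of total weight carried by the normal nodes.
    return float(-np.log(weights[y == 0].sum() / (weights.sum() + 1e-12)
                         + 1e-12))
```

The loss is lower when abnormal nodes are far from the context than when they sit close to it, which is the behavior the objective is meant to enforce.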
Fig. 6 illustrates an exemplary flow chart of using the GNN trained with the method in the Fig. 5 for anomaly detection, in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
The method begins at 601, with a graph G being given for anomaly detection. The graph G is denoted as G = (V, X, A), where V is the set of N nodes, A ∈ {0, 1}^(N×N) denotes the adjacency matrix, and X is the set of corresponding feature vectors with x_i ∈ R^d denoting the d-dimensional feature vector of v_i ∈ V. In a further aspect, the graph G could be given with ground-truth labels as G = (V, X, A, Y), wherein Y is the set of labels on nodes, with y_i ∈ Y taking value 1 if v_i is abnormal and 0 otherwise.
The method proceeds to 602, with obtaining an embedding for each node of the graph G and an embedding for the entire graph.
The method proceeds to 603, with inputting the embedding for each node and the embedding for the entire graph to a Graph Neural Network (GNN) with multiple layers.
In an aspect, at each layer of the GNN, the embedding for each node and the embedding for the entire graph are updated successively.
As an example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer by aggregating embeddings of all neighboring nodes of the last layer for a concerned node, based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

As a further example, the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer. In an embodiment, the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph. In another embodiment, the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities. In a further embodiment, the similarity is cosine similarity, and the aggregating operation comprises summation.
In another aspect, at each layer of the GNN, edges of the graph, the embedding for each node and the embedding for the entire graph are updated successively.
As an example, the edges of the graph are updated at each layer by estimating a suspicious probability of an edge between two nodes; and removing the edge based at least partially on the suspicious probability of the edge.

As a further example, the embedding for each node is updated in a message passing and neighbor aggregation process. In an embodiment, the embedding for each node is updated at each layer, after the edges are updated at the current layer, by aggregating embeddings of all neighboring nodes of the last layer for a concerned node, based at least on the edges of the current layer; and combining the aggregated embedding with the embedding for the concerned node of the last layer. In an embodiment, the aggregating operation comprises summation, and the combining operation comprises concatenation.

As a further example, the embedding for the entire graph is updated at each layer after the embedding for each node is updated at the current layer. In an embodiment, the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph. In another embodiment, the embedding for the entire graph is updated by calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and calculating the embedding for the entire graph of the current layer by weighted aggregation of the embeddings for each node of the current layer based on the corresponding similarities. In a further embodiment, the similarity is cosine similarity, and the aggregating operation comprises summation.
After all layers of graph convolution, the method proceeds to 604, with outputting an indication for determining whether a node is anomalous.
In an aspect, the indication is a binary label for the node to determine whether a given node is abnormal (1) or normal (0) .
In another aspect, the indication is a similarity between the updated embedding for the node and the updated embedding for the entire graph as an abnormal extent. For example, the similarity can be a cosine similarity.
Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure. The computing system 700 may comprise at least one processor 710. The computing system 700 may further comprise at least one storage device 720. In an aspect, the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and updating the GNN based on a loss function. In a further aspect, the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform the method for using a Graph Neural Network (GNN) trained with the present disclosure for anomaly detection, comprising: obtaining an embedding for each node of a graph and an embedding for the entire graph; inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and outputting, after the final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
It should be appreciated that the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the  methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims (17)

  1. A method for training a Graph Neural Network (GNN) with multiple layers for anomaly detection, the method comprising:
    obtaining an embedding for each node of a graph and an embedding for the entire graph;
    inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and
    updating the GNN based on a loss function.
  2. The method of claim 1, wherein a portion of nodes is identified as abnormal nodes before input to the GNN based at least on ground-truth node labels.
  3. The method of claim 1, wherein a portion of nodes is identified as abnormal nodes before input to the GNN based at least on generating the graph by:
    breaking a first graph into a plurality of sub-graphs; and
    injecting a set of nodes from sub-graphs other than a sub-graph which is to be used as the graph for inputting into the GNN, into the sub-graph as abnormal nodes.
  4. The method of claim 2 or 3, wherein the loss function is to enforce maximizing the consistency between embeddings of normal nodes of the graph and the embedding for the entire graph compared with the consistency between embeddings of abnormal nodes of the graph and the embedding for the entire graph.
  5. The method of claim 1, wherein edges of the graph are updated at each layer of the GNN before the embedding for each node and the embedding for the entire graph are updated.
  6. The method of claim 5, wherein the edges of the graph are updated at each layer by:
    estimating a suspicious probability of an edge between two nodes; and
    removing the edge based at least partially on the suspicious probability of the edge.
  7. The method of claim 6, wherein estimating the suspicious probability of the edge between two nodes comprises:
    estimating the suspicious probability of the edge between two nodes based at least on a similarity of the two nodes, wherein each node is represented by its own node embedding and its distance to the embedding for the entire graph of the last layer.
  8. The method of claim 5, wherein the embedding for each node is updated at each layer by:
    aggregating embeddings of all neighboring nodes of the last layer for a node, based at least on the edges of the current layer; and
    combining the aggregated embedding with the embedding for the node of the last layer.
  9. The method of claim 8, wherein the aggregating operation comprises summation, and the combining operation comprises concatenation.
  10. The method of claim 5, wherein the embedding for the entire graph is obtained by performing one of sum pooling, max pooling or average pooling of all nodes in the graph.
  11. The method of claim 5, wherein the embedding for the entire graph is updated at each layer by:
    calculating a similarity between the embedding for each node of the current layer and the embedding for the entire graph of the last layer; and
    calculating the embedding for the entire graph of the current layer by weighted aggregating the embedding for each node of the current layer based on the corresponding similarities.
  12. The method of claim 1, wherein the graph is constructed based on data of an online e-mail system, and wherein each node of the graph represents an account of the online e-mail system, and an edge exists if any e-mail exchange happens between two accounts.
  13. The method of claim 1, wherein the graph is constructed based on data of an online academic system, and wherein each node of the graph represents a publication of a researcher, and an edge exists if two publications have a certain common attribute.
  14. A method for using a Graph Neural Network (GNN) trained with the method of one of claims 1-13 for anomaly detection, comprising:
    obtaining an embedding for each node of a graph and an embedding for the entire graph;
    inputting the embedding for each node and the embedding for the entire graph to the GNN, wherein at least the embedding for each node and the embedding for the entire graph are updated successively at each layer of the GNN; and
    outputting, after the final layer of the GNN, a cosine similarity between the updated embedding for a node and the updated embedding for the entire graph as an abnormal extent.
  15. A computer system, comprising:
    one or more processors; and
    one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-14.
  16. One or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-14.
  17. A computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-14.
PCT/CN2021/111102 2021-08-06 2021-08-06 Method and apparatus for anomaly detection on graph Ceased WO2023010502A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202180101403.3A CN118020076A (en) 2021-08-06 2021-08-06 Method and apparatus for anomaly detection on a graph
PCT/CN2021/111102 WO2023010502A1 (en) 2021-08-06 2021-08-06 Method and apparatus for anomaly detection on graph
DE112021007660.4T DE112021007660T5 (en) 2021-08-06 2021-08-06 METHOD AND DEVICE FOR ANORMAL DETECTION ON A GRAPH

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/111102 WO2023010502A1 (en) 2021-08-06 2021-08-06 Method and apparatus for anomaly detection on graph

Publications (1)

Publication Number Publication Date
WO2023010502A1 true WO2023010502A1 (en) 2023-02-09

Family

ID=85154107

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/111102 Ceased WO2023010502A1 (en) 2021-08-06 2021-08-06 Method and apparatus for anomaly detection on graph

Country Status (3)

Country Link
CN (1) CN118020076A (en)
DE (1) DE112021007660T5 (en)
WO (1) WO2023010502A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210004210A1 (en) * 2019-07-01 2021-01-07 X Development Llc Learning and using programming styles
US20210067549A1 (en) * 2019-08-29 2021-03-04 Nec Laboratories America, Inc. Anomaly detection with graph adversarial training in computer systems
US20210081717A1 (en) * 2018-05-18 2021-03-18 Benevolentai Technology Limited Graph neutral networks with attention
US20210103797A1 (en) * 2019-10-04 2021-04-08 Lunit Inc. Method and system for analyzing image
US20210232918A1 (en) * 2020-01-29 2021-07-29 Nec Laboratories America, Inc. Node aggregation with graph neural networks

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304367A (en) * 2023-02-24 2023-06-23 河北师范大学 Algorithm and device for community detection based on graph autoencoder self-supervised training
CN116304367B (en) * 2023-02-24 2023-12-01 河北师范大学 Algorithm and device for community detection based on graph autoencoder self-supervised training
CN116628496A (en) * 2023-05-21 2023-08-22 北京工业大学 A False Complaint Detection Method Based on Dual-Channel Feature Contrastive Learning
CN117093928A (en) * 2023-10-18 2023-11-21 南开大学 Adaptive graph node anomaly detection method based on spectral domain graph neural network
CN117407697B (en) * 2023-12-14 2024-04-02 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on autoencoder and attention mechanism
CN117407697A (en) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on autoencoder and attention mechanism
CN117633635A (en) * 2024-01-23 2024-03-01 南京信息工程大学 Dynamic rumor detection method based on a spatio-temporal propagation graph
CN117633635B (en) * 2024-01-23 2024-04-16 南京信息工程大学 Dynamic rumor detection method based on a spatio-temporal propagation graph
WO2025010053A3 (en) * 2024-08-12 2025-06-26 Bts Kurumsal Bilişim Teknolojileri Anonim Şirketi Cyber threat prediction system and method thereof
CN118916820A (en) * 2024-10-08 2024-11-08 国网江苏省电力有限公司营销服务中心 Multi-view multi-scale graph anomaly detection method and system based on dynamic node screening
CN119646312A (en) * 2024-12-18 2025-03-18 同济大学 A graph contrastive learning method based on hierarchical neighbor enhancement for recommender systems
CN119646312B (en) * 2024-12-18 2025-09-26 同济大学 A graph contrastive learning method based on hierarchical neighbor enhancement for recommender systems
CN119728295A (en) * 2025-02-25 2025-03-28 贵州航天计量测试技术研究所 A network malicious attack detection method and system based on multi-view feature fusion
CN119728295B (en) * 2025-02-25 2025-07-15 贵州航天计量测试技术研究所 Network malicious attack detection method and system based on multi-view feature fusion
CN120257029A (en) * 2025-06-06 2025-07-04 浙江大学 A graph anomaly detection method based on multi-view graph embedding and node prototype comparison

Also Published As

Publication number Publication date
CN118020076A (en) 2024-05-10
DE112021007660T5 (en) 2024-03-14

Similar Documents

Publication Publication Date Title
WO2023010502A1 (en) Method and apparatus for anomaly detection on graph
Habibpour et al. Uncertainty-aware credit card fraud detection using deep learning
Choi et al. An artificial intelligence approach to financial fraud detection under IoT environment: A survey and implementation
US20220067738A1 (en) System and Method for Blockchain Automatic Tracing of Money Flow Using Artificial Intelligence
Ma et al. A deep learning-based DDoS detection framework for Internet of Things
CN113536383A (en) Method and device for training neural network based on privacy protection
US20200372327A1 (en) Quantifying the Predictive Uncertainty of Neural Networks Via Residual Estimation With I/O Kernel
CN114255121A (en) Training Method of Credit Risk Prediction Model and Credit Risk Prediction Method
Pang et al. ADI: Adversarial dominating inputs in vertical federated learning systems
CN116599683A (en) A malicious traffic detection method, system, device and storage medium
Singh Yadav et al. Unsupervised learning for financial statement fraud detection using manta ray foraging based convolutional neural network
Karim et al. Catch me if you can: Semi-supervised graph learning for spotting money laundering
Lin et al. Tracking phishing on Ethereum: Transaction network embedding approach for accounts representation learning
Wang et al. Data complexity-based batch sanitization method against poison in distributed learning
Fan et al. Deep joint adversarial learning for anomaly detection on attribute networks
Singh et al. Integrating machine learning in business decision making: Application and future directions
CN114912927A (en) Block chain anti-fraud analysis method and system
Yin et al. Behavioral graph fraud detection in E-commerce
Kadam Financial fraud detection using jump-attentive graph neural networks
Ma et al. Parallel Graph Learning with Temporal Stamp Encoding for Fraudulent Transactions Detections
Devi et al. IoT device security for smart card fraud detection for credit cards
Li et al. Research on Financial Fraud Detection based on Deep Graph Neural Network
Roy et al. GAD-EBM: Graph Anomaly Detection using Energy-Based Models
Dairi et al. Auto-configured explainable graph neural networks for multi-site pollution prediction
Han et al. DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase
Ref document number: 112021007660
Country of ref document: DE

WWE Wipo information: entry into national phase
Ref document number: 202180101403.3
Country of ref document: CN

122 Ep: pct application non-entry in european phase
Ref document number: 21952379
Country of ref document: EP
Kind code of ref document: A1