CN118540239A - Heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection

Info

Publication number: CN118540239A
Application number: CN202410621742.XA
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: node, neighborhood, sampling, nodes, graph
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Inventors: 宫继兵, 林宇庭, 彭吉全, 赵金烨, 张锦
Current Assignee: Yanshan University
Original Assignee: Yanshan University
Application filed by Yanshan University
Priority to CN202410621742.XA
Publication of CN118540239A

Classifications

    • H04L43/00 Arrangements for monitoring or testing data switching networks → H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L45/00 Routing or path finding of packets in data switching networks → H04L45/02 Topology update or discovery → H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • H04L45/00 Routing or path finding of packets in data switching networks → H04L45/24 Multipath → H04L45/245 Link aggregation, e.g. trunking

Abstract

The present invention discloses a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes, belonging to the technical field of deep learning and graph data mining. The method comprises the following steps: given meta-path patterns, constructing a multi-dimensional graph with multi-dimensional semantic information; expanding the sampling range of the multi-path neighborhood, dynamically adjusting the sampling probability of neighborhood nodes, and sampling an appropriate number of neighborhood nodes; simulating the basic environmental elements of a reinforcement learning task according to the dependency, dynamics and complexity of the sampled node sequences; deploying multiple agents in the multi-dimensional graph to adaptively and sequentially select assortative nodes from the multi-dimensional sampled node sequences; learning comprehensive node representation embeddings with relation-aware neighborhood aggregation; and passing the representation embedding of the target node to downstream tasks for heterogeneous graph representation learning. The present invention achieves fine-grained selection of neighborhood nodes in heterogeneous graphs, which not only enhances the reliability of the central node during information aggregation but also improves the performance of heterogeneous graphs in representation learning.

Description

Heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes

Technical Field

The present invention relates to the technical field of deep learning and graph data mining, and in particular to a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes.

Background Art

Network data in real-world scenarios can be modeled as heterogeneous graphs composed of multiple types of entities and relations, and such graphs have been deeply mined and widely applied in many research fields. In the new Internet era, the emergence of massive knowledge data is undoubtedly a huge challenge. When faced with large-scale graph data, people often encounter incomplete and unreliable information, which makes it challenging to effectively select and process these uncertain knowledge resources.

Although graph neural networks perform well on various kinds of structured graph data, their representation learning process is usually built on the assortativity (homophily) assumption, which means that neighborhood nodes in the graph tend to have features or categories similar to those of the central node. This phenomenon has been widely observed and substantially verified in real-world graph data. However, the quality of graph data varies, and challenges such as outliers, noise, sparsity, bias and long-tail distributions may be encountered during graph modeling. Such abnormal data may break the assortativity assumption and make neighborhood-based information untrustworthy. Aggregating information from such a neighborhood may introduce misleading factors, degrading the performance of graph learning algorithms and distorting the learned graph representations. Even in homogeneous graphs, assortativity is not universal, and heterophilic patterns are in fact more common. In heterogeneous graphs, homogeneous peer nodes can only be connected through multi-hop relation paths, which further increases the heterophily between nodes.

Recent studies have pointed out that, in typical real-world business applications of graph neural networks, not all neighbor nodes contribute positively to message-passing aggregation; in fact, some neighbor nodes may pass interference information that negatively affects the representation learning of the central node. These studies mitigate the negative impact of harmful nodes on the target central node by manually or heuristically specifying the number or threshold of beneficial neighbors to retain. However, they only coarsely judge the quality of nodes by distance metrics between neighbor nodes and the central node, and do not adaptively detect, in a finer-grained manner, anomalous nodes that are spatially and semantically adjacent to the central node.

Therefore, it is necessary to propose a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes, to help graph neural networks capture the multi-dimensional structural and feature information in graph data more effectively and thus accomplish high-quality heterogeneous graph representation learning.

Summary of the Invention

Aiming at the problem that anomalies among high-order homogeneous peer nodes in heterogeneous graphs may make the observed neighborhood unreliable and lead to misleading information aggregation in node representation learning, the present invention provides a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes. The method achieves fine-grained selection of neighborhood nodes in heterogeneous graphs, which not only enhances the reliability of the central node during information aggregation but also improves the performance of heterogeneous graphs in representation learning.

In order to solve the above technical problems, the technical solution adopted by the present invention is as follows:

A heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes comprises the following steps:

S1, given meta-path patterns, construct a multi-dimensional graph with multi-dimensional semantic information;

S2, expand the sampling range of the multi-path neighborhood, dynamically adjust the sampling probability of neighborhood nodes, and sample an appropriate number of neighborhood nodes;

S3, simulate the basic environmental elements of a reinforcement learning task according to the dependency, dynamics and complexity of the sampled node sequences;

S4, deploy multiple agents in the multi-dimensional graph, and adaptively and sequentially select assortative nodes from the multi-dimensional sampled node sequences;

S5, learn comprehensive node representation embeddings with relation-aware neighborhood aggregation;

S6, pass the representation embedding of the target node to downstream tasks for heterogeneous graph representation learning.

A further improvement of the technical solution of the present invention is that in S1, specifically, given n different meta-path patterns, a corresponding sequence of matrix multiplications is performed for each meta-path pattern to construct a multi-dimensional graph carrying n-dimensional relation-path semantic information. The process comprises the following steps:

S11, different downstream graph mining tasks may have specific requirements on the types and number of target nodes, and a multi-dimensional graph is constructed for each target node type;

S12, the adjacency matrix A_Φ of the meta-path-based graph is constructed by successively multiplying the adjacency matrices corresponding to all relations along the pre-designed meta-path Φ = e_1 → e_2 → ... → e_L, expressed as follows:

A_Φ = A_{<e_1,e_2>} A_{<e_2,e_3>} ... A_{<e_{L-1},e_L>}    (1)

where e_l denotes a node type and A_{<e_l,e_{l+1}>} is the adjacency matrix corresponding to the relation <e_l, e_{l+1}>; performing similar computations for the n different meta-path patterns constructs a set of adjacency matrices of homogeneous graphs, converting the heterogeneous graph into an n-dimensional multi-dimensional graph.
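
As an illustration of S12, the following Python sketch chains sparse relation adjacency matrices into a meta-path adjacency matrix; the author-paper-author meta-path and all matrix values are hypothetical toy data, not taken from the patent.

```python
import numpy as np
from scipy.sparse import csr_matrix

def metapath_adjacency(relation_mats):
    """Multiply the relation adjacency matrices along a meta-path, as in Eq. (1)."""
    A = relation_mats[0]
    for M in relation_mats[1:]:
        A = A @ M                      # successive sparse matrix products
    A.data[:] = 1.0                    # keep only connectivity (binarize)
    return A

# Hypothetical toy example: meta-path A-P-A (author-paper-author).
A_ap = csr_matrix(np.array([[1, 0, 1],
                            [0, 1, 0]]))   # 2 authors x 3 papers
A_pa = A_ap.T                              # paper-author relation
A_apa = metapath_adjacency([A_ap, A_pa])   # 2 x 2 author-author homogeneous graph
print(A_apa.toarray())
```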

A further improvement of the technical solution of the present invention is that in S2, specifically, the sampling range of node neighborhoods is expanded from the perspective of multi-way sampling heuristics; a contextual multi-armed bandit dynamically adjusts the sampling probability of neighborhood nodes under each dimensional relation path, and T neighborhood nodes are sampled for each node from each dimension according to these probabilities, so as to construct a neighborhood sampling pool NSP. The process comprises the following steps:

S21, for a node v with d_Φ(v)-dimensional node attributes X_v, a linear transformation matrix W_Φ(v) and a bias vector b_Φ(v) are used to transform the node attributes into a specific d-dimensional latent embedding space, obtaining the initial representation embedding H_v of the node, expressed as follows:

H_v = W_Φ(v) X_v + b_Φ(v)    (2)

S22, for the central node of each dimension, multiple representative neighborhood sampling heuristic strategies are adopted;

S23, let Q^k denote the probability matrix of the k-th neighborhood sampling heuristic strategy; in episode ep, the sampling probabilities of the K neighborhood sampling heuristics are combined to compute the weighted multi-way sampling probability of the central node v_i with respect to the neighborhood node v_j:

S^ep(v_i, v_j) = Σ_{k=1}^{K} p_k^ep · Q^k(v_i, v_j)    (3)

where p_k^ep is the strategy-granularity probability of the neighborhood sampling heuristic, and Q^k satisfies row normalization;

S24, the strategy-granularity probability p_k^ep is computed as the weight of the probability matrix Q^k of the k-th neighborhood sampling heuristic. The joint selection of the neighborhood sampling heuristics can be regarded as an adversarial problem in a contextual multi-armed bandit: the strategy-granularity probabilities act as the bandit actions of multi-way neighborhood sampling, adaptively adjusting the sampling probability of each way and being updated over training episodes. The initial value of p_k is set before training, and p_min ≥ 0 is a controllable constant that constrains the lower bound of the sampling probability; w_k is the weight of the k-th strategy and, following a contextual multi-armed bandit reinforcement learning algorithm with supervised learning guarantees, is updated iteratively at a fixed update period with an adjustable hyperparameter representing the regret rate of the bandit when selecting actions. The update uses the weighted reward value of the sampled nodes under the k-th strategy in episode ep, which is computed over the set of neighborhood nodes selected by node v_i from the average multi-head attention scores between nodes v_i and v_j;

S25, after each episode, for each node v and each dimensional relation path, T neighborhood nodes are sampled from the candidate node set according to the weighted multi-way sampling probability, and these pairs of central node and neighborhood node are added to the neighborhood sampling pool NSP.
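
The following is a minimal Python sketch of the bandit-style mixing described in S23-S25, assuming a multiplicative-weights update; the probability matrices, rewards, the p_min floor and the learning rate are illustrative assumptions and do not reproduce the patent's exact update rule.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, T = 4, 6, 3          # strategies, nodes, samples per node (illustrative)
p_min, eta = 0.05, 0.1     # probability floor and assumed learning rate
w = np.ones(K)             # strategy weights

# Hypothetical row-normalized probability matrices Q^k for each heuristic.
Q = rng.random((K, N, N))
Q /= Q.sum(axis=2, keepdims=True)

def strategy_probs(w):
    # Mix normalized weights with a uniform floor so every strategy keeps p >= p_min.
    return (1.0 - K * p_min) * w / w.sum() + p_min

for episode in range(5):
    p = strategy_probs(w)                       # strategy-granularity probabilities
    S = np.tensordot(p, Q, axes=1)              # weighted multi-way sampling probability (Eq. (3))
    sampled = {i: rng.choice(N, size=T, replace=False, p=S[i] / S[i].sum())
               for i in range(N)}               # T neighborhood nodes per centre node -> NSP
    reward = rng.random(K)                      # placeholder for the attention-based reward
    w *= np.exp(eta * reward)                   # multiplicative-weights style update (assumed form)
```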

A further improvement of the technical solution of the present invention is that in S22, the multiple representative neighborhood sampling heuristic strategies include a one-hop neighbor strategy, a two-hop neighbor strategy, a nearest-neighbor strategy and a random-walk neighbor strategy; specifically:

The one-hop neighbor strategy: directly connected nodes are sampled from the normalized adjacency matrix, where Ã is the adjacency matrix with self-loops added and D̃ is the diagonal degree matrix satisfying D̃_ii = Σ_j Ã_ij;

The two-hop neighbor strategy: indirectly connected nodes are sampled from the power of the normalized adjacency matrix;

The nearest-neighbor strategy: nodes are sampled from a probability matrix S_KNN based on the cosine similarity of node attributes, where the similarity score between nodes v_i and v_j is the cosine similarity of X_{v_i} and X_{v_j};

The random-walk neighbor strategy: nodes are sampled from a random-walk weight matrix S_PPR, where a personalized PageRank algorithm generates the walk weights with a scalar coefficient c ∈ [0,1] and a column-normalized adjacency matrix.
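
To make the four heuristics concrete, the sketch below computes one plausible probability matrix for each strategy with dense NumPy operations; the normalizations and the teleport coefficient c are assumptions rather than the patent's exact formulas.

```python
import numpy as np

def sampling_heuristics(A, X, c=0.15):
    """Illustrative probability matrices for the four sampling heuristics of S22.

    A: dense adjacency matrix (n x n), X: node attribute matrix (n x d).
    """
    n = A.shape[0]
    A_tilde = A + np.eye(n)                         # add self-loops
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))      # inverse degree matrix
    S_1hop = D_inv @ A_tilde                        # one-hop: row-normalized adjacency
    S_2hop = S_1hop @ S_1hop                        # two-hop: power of the normalized adjacency

    X_norm = X / np.linalg.norm(X, axis=1, keepdims=True)
    S_knn = X_norm @ X_norm.T                       # nearest-neighbor: cosine similarity of attributes

    A_col = A_tilde / A_tilde.sum(axis=0, keepdims=True)
    S_ppr = c * np.linalg.inv(np.eye(n) - (1 - c) * A_col)  # personalized PageRank weights

    return S_1hop, S_2hop, S_knn, S_ppr

# Toy usage on a 3-node path graph with 2-dimensional attributes.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
S1, S2, Sk, Sp = sampling_heuristics(A, X)
```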

A further improvement of the technical solution of the present invention is that in S3, specifically, the set of sampled neighborhood nodes of each central node in the neighborhood sampling pool is ordered according to their sampling probabilities, and the basic environmental elements of a reinforcement learning task are simulated according to the dependency, dynamic characteristics and complexity of the sampled neighborhood node sequence. The process comprises the following steps:

S31, to satisfy the dependency among the reinforcement environment elements, when the neighborhood nodes are selected sequentially, they are ordered by their sampling probabilities; neighborhood nodes that are ranked close to each other are semantically related, and as the agent advances along the neighborhood node sequence over time steps, nearby neighborhood nodes induce a sequential dependency. The dependency process is expressed as follows:

o_{t+1} ← τ o_t + (1 - τ) o_{t+1}    (8)

where o_t is the environment state at time step t, and τ is a hyperparameter with a small value;

S32, to satisfy the dynamics of the reinforcement environment elements, a skip rate p_s is set for the neighborhood node sequence; the agent randomly skips the evaluation of a neighborhood node with probability p_s and moves directly to the next neighborhood node, so that the length and content of the state sequence change noticeably in each episode;

S33, to satisfy the complexity of the reinforcement environment elements, the neighborhood nodes are arranged in descending order of their sampling probabilities; the agent starts its selection decisions from the easy neighborhood nodes with higher sampling probabilities and uses a higher skip rate to avoid the more difficult neighborhood nodes. The skip rate p_s is updated over training episodes, where p_s is the base skip rate and a minimum skip rate ensures the dynamic property; d_t denotes the normalized complexity measure of the neighborhood node v_t at time step t, defined with respect to the node, among the T sampled nodes, having the largest multi-way sampling probability; l_t ∈ [0,1] is the normalized confidence that nodes v and v_t are assortative, which is related to the reward value of the selection decision in the agent's behavior; κ is a hyperparameter that balances the assortative and heterophilic characteristics between the central node v and the neighborhood node v_t.
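
A minimal sketch of the environment behaviour described in S31-S33: the ordered neighborhood sequence is walked with exponential state smoothing in the spirit of Eq. (8) and a probabilistic skip; the fixed skip rate used here is a simplified placeholder for the patent's adaptive schedule.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_episode(neighbor_states, tau=0.1, p_skip=0.2):
    """Walk an ordered neighborhood sequence, smoothing states and skipping nodes."""
    visited, o = [], None
    for o_next in neighbor_states:               # nodes sorted by sampling probability (descending)
        if rng.random() < p_skip:                # dynamics: randomly skip a node's evaluation
            continue
        # dependency: exponential smoothing of consecutive observations (Eq. (8))
        o = o_next if o is None else tau * o + (1 - tau) * o_next
        visited.append(o)
    return visited

states = [rng.random(8) for _ in range(5)]       # toy observations for 5 sampled neighbors
trajectory = run_episode(states)
print(len(trajectory), "states evaluated this episode")
```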

A further improvement of the technical solution of the present invention is that in S4, specifically, within one episode the agent judges the heterophily between the central node and its corresponding at most T sampled neighborhood nodes in turn, and this process is regarded as a T-round decision process; in the multi-dimensional graph, n kinds of agents are deployed on the homogeneous graphs constructed for the n-dimensional relation paths, and they learn, in a Markov game over the semantics of the n-dimensional relation paths, how to select assortative nodes so as to achieve a cooperative equilibrium of multi-dimensional selection, thereby constructing a neighborhood selection pool NCP. The process comprises the following steps:

S41, the latest central-node representation H_v, the neighborhood-node representation H_{v_t} of the current time step and the sampled neighborhood node set NSP(v) together serve as the observable state o_t at the current time step t, expressed as o_t(v) = [H_v, H_{v_t}, H_{NSP(v)}];

S42, the action space of each agent is defined as a discrete scheme consisting of two actions, selecting or not selecting an assortative node;

S43, at time step t, a linear layer combines the central-node representation H_v and the neighborhood-node representation H_{v_t} to obtain an information estimate that integrates the central node with the neighborhood node of the current time step, where σ(·) is an activation function, (·||·) is the concatenation operation, and W_h and b_h are the learnable weight matrix and bias vector of the linear mapping, respectively. Drawing on the theory of contrastive predictive coding, the mutual information between the central node and the set of neighborhood nodes is estimated to obtain a higher-order information estimate, where W_d is a learnable linear transformation matrix. The two estimates are then concatenated and passed through a learnable linear transformation W_p and a Softmax output-normalization step to obtain the action probability distribution;

S44, at time step t, according to the data characteristics of the graph structure, a reward r_{t+1}(v) is assigned to the agent's action in an unsupervised manner, where σ(·) is the Sigmoid activation function, Ω is the set of positive node pairs observable on the heterogeneous graph, Ω⁻ is the set of negative node pairs sampled from all unobserved node pairs, and ω_1 and ω_2 are adjustable hyperparameters representing the reward weight of positive samples and the penalty weight of negative samples, respectively;

S45, at time step t, agent α observes the node state o_t, executes the action u_t according to its policy, obtains the feedback r_{t+1}, records the completion flag done, and enters the next time step t+1 to observe the new node state o_{t+1}; an experience replay buffer is maintained for each agent α, into which the experience trajectory τ_α = (o_t, u_t, r_{t+1}, o_{t+1}, done) is stored at each time step;

S46, a multi-actor attention critic algorithm is adopted to learn and optimize the above multi-agent behavior; in the multi-agent reinforcement learning environment, decentralized policies are trained, a multi-head attention mechanism dynamically selects the important information of each agent, and the evaluation of the policies is learned centrally;

S47, after each episode, for the central node v under each dimensional relation path, the agent iteratively selects candidate nodes from the neighborhood sampling pool along the time sequence and adds these pairs of central node and selected node to the neighborhood selection pool NCP.

A further improvement of the technical solution of the present invention is that in S46 the specific process comprises the following steps:

S461, the state-action value function of agent α_i is expressed through g_i(·), a multi-layer perceptron embedding function oriented to different modules, and W_s, a learnable linear transformation matrix shared in the contribution functions g^{ou}(o, u) of the other agents α_j, together with attention weights on the contribution values;

S462, the value-function optimization of all agents can be approximately regarded as a multi-task regression problem with a joint regression loss, where θ and θ̄ are the parameters of the critic network and of the target critic network, respectively, and the regression target is a target value function in which γ is the discount factor, τ is the temperature hyperparameter balancing entropy maximization against reward, and the target policy network supplies its own parameters;

S463, the policy network of each agent is updated by gradient ascent, where A_i(o, u) = Q_i(o, u; θ) - b_i(o, u) is the advantage function, b_i(o, u) is a counterfactual baseline, φ denotes the parameters of the policy network, and ρ is a learnable logarithmic adjustment coefficient of the policy; the parameters of all target networks are synchronized through a soft-update mechanism.
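
A brief sketch of two mechanical pieces mentioned in S461-S463, namely the soft update that synchronizes target-network parameters and an advantage computed against a policy-expectation baseline; the update rate, tensor shapes and the choice of baseline are illustrative assumptions.

```python
import torch

def soft_update(target_net, online_net, rho=0.005):
    """Polyak-style synchronization of target-network parameters (S463)."""
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(1.0 - rho).add_(rho * p)

def advantage(q_values, action_probs, actions):
    """A_i = Q_i(o, u) - b_i(o), with the expected Q under the policy as a simple baseline."""
    baseline = (action_probs * q_values).sum(dim=-1)              # counterfactual-style baseline
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return q_taken - baseline
```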

A further improvement of the technical solution of the present invention is that in S5, specifically, in order to obtain comprehensive node representation embeddings, relation-aware neighborhood aggregation is adopted, and the neighborhood aggregation is divided into intra-relation-path aggregation and inter-relation-path aggregation. The process comprises the following steps:

S51, in the intra-relation-path aggregation, an attention mechanism dynamically adjusts the weights of the selected neighborhood to obtain the intra-relation-path aggregated representation embedding, where W_h and W_t are the linear transformation matrices for the central node and the neighborhood nodes, respectively, shared across all relation paths; a_{v,u} denotes the importance score of neighborhood node u with respect to central node v, in which W_a is the attention transformation matrix and the tanh activation function further distinguishes the anomalous information that neighborhood nodes bring to the central node;

S52, in the inter-relation-path aggregation, the semantic representation embeddings under all relation paths are concatenated for joint learning to obtain the inter-relation-path aggregated representation embedding Z_v, where MLP(·) is a multi-layer perceptron oriented to the final graph task.

A further improvement of the technical solution of the present invention is that in S6, specifically, the final representation embedding Z of the target central node is taken as output and passed to downstream tasks for representation learning of the heterogeneous graph. The process comprises the following steps:

S61, during training, for the graph node classification task, the output-layer dimension of the graph neural network is first set to the number of node categories; for the single-label classification scenario, the outputs are normalized by a Softmax function and the algorithm is optimized by minimizing the cross-entropy loss; for the multi-label classification scenario, a Sigmoid function handles the independent binary classification of each label and the algorithm is optimized by minimizing the binary cross-entropy loss;

S62, during training, for the graph link prediction task, the predicted node pairs are first processed by an inner-product operation or by a DistMult decoder, and the algorithm is then optimized by minimizing the binary cross-entropy loss;

S63, the model is learned and optimized with the label supervision signal of the downstream task; the loss function of the heterogeneous graph neural network consists of the downstream-task loss and the loss term of the relation-aware neighborhood aggregation module, together with a regularization term, where λ_* is the regularization weight, θ is the model parameter set, and ||θ||_2 is the L2 regularization norm of the model parameters.

Due to the adoption of the above technical solution, the technical progress achieved by the present invention is as follows:

The heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes designed by the present invention addresses the problem that heterophily among high-order homogeneous peer nodes in heterogeneous graphs causes heterogeneous graph neural networks to aggregate anomalous information. It first uses a contextual multi-armed bandit to dynamically expand the range of neighborhood node sampling, then uses multi-agent reinforcement learning to adaptively select neighborhood nodes that are assortative with the target node, filtering out anomalous neighborhood nodes to the greatest extent and achieving fine-grained selection of neighborhood nodes in heterogeneous graphs. Finally, a relation-aware neighborhood aggregation module fuses the selected neighborhood information under multiple relation paths, further reducing the impact of potential anomalous information on the heterogeneous graph neural network. The present invention realizes an adaptive anomaly detection method for heterogeneous graphs and improves the performance of heterogeneous graph neural networks on graph node classification and graph link prediction tasks.

Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a flow chart of a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes provided in an embodiment of the present invention;

FIG. 2 is an architecture diagram of a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes provided in an embodiment of the present invention.

Detailed Description

It should be noted that the terms "comprising" and "having" and any variations thereof in the specification, claims and the above drawings of the present invention are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.

Explanation of some technical terms used in the present invention:

Heterogeneous graph: a heterogeneous graph is defined as G = (V, ε, E, R, χ), where V is the node set, ε is the edge set, E is the entity-type set, R is the relation-type set, and χ is the entity-attribute set; in addition, two mapping functions are defined, namely Φ(·), the mapping function from nodes to entity types, and Ψ(·), the mapping function from edges to relation types;

Multi-dimensional graph: a multi-dimensional graph is a special case of a heterogeneous graph that satisfies |E| = 1 and |R| > 1, where |E| denotes the number of entity types and |R| denotes the number of relation types;

Heterogeneous graph neural network: a heterogeneous graph neural network embeds graph-structured data by aggregating, within each relation separately, the information of the neighbor nodes of the central node, and then fuses the graph information across relations; the representation embedding of the central node v at the l-th layer is H_v^(l) = σ(H_v^(l-1) ⊕ AGG_{r∈R}({H_u^(l-1) : u ∈ N_r(v)})), where σ is an activation function, ⊕ is an operator that combines the information of node v with its neighborhood information by concatenation or summation, and AGG(·) is an aggregation function that maps neighborhood information from different relations into a vector;
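
The following is a minimal sketch of the aggregate-within-each-relation-then-fuse layer described above, assuming mean aggregation and summation as the combination operator; the relation names and dimensions are toy assumptions, and the code is illustrative rather than the patent's network.

```python
import torch
import torch.nn as nn

class SimpleHeteroLayer(nn.Module):
    """One layer: aggregate neighbors within each relation, then fuse across relations."""
    def __init__(self, dim, relations):
        super().__init__()
        self.rel_weights = nn.ModuleDict({r: nn.Linear(dim, dim) for r in relations})

    def forward(self, h, adj_by_rel):
        # h: (n, dim) node embeddings; adj_by_rel: dict relation -> (n, n) row-normalized adjacency
        out = h.clone()                                   # keep the node's own information
        for r, adj in adj_by_rel.items():
            out = out + self.rel_weights[r](adj @ h)      # mean-aggregate neighbors, per relation
        return torch.relu(out)

# Toy usage with hypothetical relations "writes" and "cites".
h = torch.randn(5, 8)
adj = {"writes": torch.eye(5), "cites": torch.eye(5)}
layer = SimpleHeteroLayer(8, ["writes", "cites"])
print(layer(h, adj).shape)
```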

Heterophilic node: given an attributed graph containing n node instances, there are b nodes that differ from regular nodes (b << n); their attributes, connections or behaviors differ from those of most other normal nodes. The goal of heterophilic node detection is to learn a model that can identify and distinguish anomalous nodes that differ significantly from the central node.

The present invention is further described in detail below with reference to the accompanying drawings and embodiments.

As shown in FIG. 1, a heterogeneous graph anomaly detection method based on adaptive selection of multi-path neighborhood nodes comprises the following steps:

S1, given n different meta-path patterns, a corresponding sequence of matrix multiplications is performed for each meta-path pattern to construct a multi-dimensional graph carrying n-dimensional relation-path semantic information.

The specific process is as follows:

S11, different downstream graph mining tasks may have specific requirements on the types and number of target nodes; normally, a multi-dimensional graph is constructed for each target node type;

S12, the adjacency matrix A_Φ of the meta-path-based graph is constructed by successively multiplying the adjacency matrices corresponding to all relations along the pre-designed meta-path Φ = e_1 → e_2 → ... → e_L, which is expressed as follows:

A_Φ = A_{<e_1,e_2>} A_{<e_2,e_3>} ... A_{<e_{L-1},e_L>}    (1)

where e_l denotes a node type and A_{<e_l,e_{l+1}>} is the adjacency matrix corresponding to the relation <e_l, e_{l+1}>; performing similar computations for the n different meta-path patterns constructs a set of adjacency matrices of homogeneous graphs, converting the heterogeneous graph into an n-dimensional multi-dimensional graph;

S2, the sampling range of node neighborhoods is expanded from the perspective of multi-way sampling heuristics; a contextual multi-armed bandit dynamically adjusts the sampling probability of neighborhood nodes under each dimensional relation path, and an appropriate number T of neighborhood nodes is sampled for each node from each dimension according to these probabilities, thereby constructing the neighborhood sampling pool NSP.

The specific process is as follows:

S21, for a node v with d_Φ(v)-dimensional node attributes X_v, a linear transformation matrix W_Φ(v) and a bias vector b_Φ(v) transform the node attributes into a specific d-dimensional latent embedding space, giving the initial representation embedding H_v of the node, expressed as follows:

H_v = W_Φ(v) X_v + b_Φ(v)    (2)

S22, for the central node of each dimension, multiple representative neighborhood sampling heuristic strategies are adopted, including a one-hop neighbor strategy, a two-hop neighbor strategy, a nearest-neighbor strategy and a random-walk neighbor strategy;

Specifically:

One-hop neighbor strategy: directly connected nodes are sampled from the normalized adjacency matrix, where Ã is the adjacency matrix with self-loops added and D̃ is the diagonal degree matrix satisfying D̃_ii = Σ_j Ã_ij;

Two-hop neighbor strategy: indirectly connected nodes are sampled from the power of the normalized adjacency matrix;

Nearest-neighbor strategy: nodes are sampled from a probability matrix S_KNN based on the cosine similarity of node attributes, where the similarity score between nodes v_i and v_j is the cosine similarity of X_{v_i} and X_{v_j};

Random-walk neighbor strategy: nodes are sampled from a random-walk weight matrix S_PPR, where a personalized PageRank algorithm generates the walk weights with a scalar coefficient c ∈ [0,1] and a column-normalized adjacency matrix;

S23, let Q^k denote the probability matrix of the k-th neighborhood sampling heuristic strategy; in episode ep, the sampling probabilities of the K neighborhood sampling heuristics are combined to compute the weighted multi-way sampling probability of the central node v_i with respect to the neighborhood node v_j:

S^ep(v_i, v_j) = Σ_{k=1}^{K} p_k^ep · Q^k(v_i, v_j)    (3)

where p_k^ep is the strategy-granularity probability of the neighborhood sampling heuristic, and Q^k satisfies row normalization;

S24, the strategy-granularity probability p_k^ep is computed as the weight of the probability matrix Q^k of the k-th neighborhood sampling heuristic. The joint selection of the multi-way sampling heuristics can be regarded as an adversarial problem in a contextual multi-armed bandit: the strategy-granularity probabilities act as the bandit actions of multi-way neighborhood sampling, adaptively adjusting the sampling probability of each way and being updated over training episodes. The initial value of p_k is set before training, and p_min ≥ 0 is a controllable constant that constrains the lower bound of the sampling probability; w_k is the weight of the k-th strategy and, following a contextual multi-armed bandit reinforcement learning algorithm with supervised learning guarantees, is updated iteratively at a fixed update period with an adjustable hyperparameter representing the regret rate of the bandit when selecting actions. The update uses the weighted reward value of the sampled nodes under the k-th strategy in episode ep, which is computed over the set of neighborhood nodes selected by node v_i from the average multi-head attention scores between nodes v_i and v_j;

S25, after each episode, for each node v and each dimensional relation path, T neighborhood nodes are sampled from the candidate node set according to the weighted multi-way sampling probability, and these pairs of central node and neighborhood node are added to the neighborhood sampling pool NSP;

S3, the set of sampled neighborhood nodes of each central node in the neighborhood sampling pool is ordered according to its sampling probabilities, and the basic environmental elements of a reinforcement learning task are simulated according to the dependency, dynamic characteristics and complexity of the sampled neighborhood node sequence.

The specific process is as follows:

S31, to satisfy the dependency among the reinforcement environment elements, when the neighborhood nodes are selected sequentially, they are ordered by their sampling probabilities; neighborhood nodes that are ranked close to each other are semantically related, and as the agent advances along the neighborhood node sequence over time steps, nearby neighborhood nodes induce a sequential dependency. The dependency process is expressed as follows:

o_{t+1} ← τ o_t + (1 - τ) o_{t+1}    (8)

where o_t is the environment state at time step t, and τ is a hyperparameter with a small value;

S32, to satisfy the dynamics of the reinforcement environment elements, a skip rate p_s is set for the neighborhood node sequence; the agent randomly skips the evaluation of a neighborhood node with probability p_s and moves directly to the next neighborhood node, so that the length and content of the state sequence change noticeably in each episode;

S33, to satisfy the complexity of the reinforcement environment elements, the neighborhood nodes are arranged in descending order of their sampling probabilities; the agent starts its selection decisions from the easy neighborhood nodes with higher sampling probabilities and uses a higher skip rate to avoid the more difficult neighborhood nodes. The skip rate p_s is updated over training episodes, where p_s is the base skip rate and a minimum skip rate ensures the dynamic property; d_t denotes the normalized complexity measure of the neighborhood node v_t at time step t, defined with respect to the node, among the T sampled nodes, having the largest multi-way sampling probability; l_t ∈ [0,1] is the normalized confidence that nodes v and v_t are assortative, which is related to the reward value of the selection decision in the agent's behavior; κ is a hyperparameter that balances the assortative and heterophilic characteristics between the central node v and the neighborhood node v_t;

S4, within one episode, the agent judges the heterophily between the central node and its corresponding at most T sampled neighborhood nodes in turn, and this process is regarded as a T-round decision process; in the multi-dimensional graph, n kinds of agents are deployed on the homogeneous graphs constructed for the n-dimensional relation paths, and they learn, in a Markov game over the semantics of the n-dimensional relation paths, how to select assortative nodes so as to achieve a cooperative equilibrium of multi-dimensional selection, thereby constructing the neighborhood selection pool NCP.

The specific process is as follows:

S41, the latest central-node representation H_v, the neighborhood-node representation H_{v_t} of the current time step and the sampled neighborhood node set NSP(v) together serve as the observable state o_t at the current time step t, which is expressed as follows:

o_t(v) = [H_v, H_{v_t}, H_{NSP(v)}]    (11)

S42, the action space of each agent is defined as a discrete scheme consisting of two actions, selecting (select) or not selecting (unselect) an assortative node;

S43, at time step t, a linear layer combines the central-node representation H_v and the neighborhood-node representation H_{v_t} to obtain an information estimate that integrates the central node with the neighborhood node of the current time step, where σ(·) is an activation function, (·||·) is the concatenation operation, and W_h and b_h are the learnable weight matrix and bias vector of the linear mapping, respectively; drawing on the theory of contrastive predictive coding, the mutual information between the central node and the set of neighborhood nodes is estimated to obtain a higher-order information estimate, where W_d is a learnable linear transformation matrix; the two estimates are then concatenated and passed through a learnable linear transformation W_p and a Softmax output-normalization step to obtain the action probability distribution;
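
A sketch of an action head in the spirit of S43: the centre and neighborhood embeddings are fused by a linear layer, a bilinear-style score against the sampled pool stands in for the mutual-information estimate, and a Softmax yields the two-way select/unselect distribution; all layer sizes and the exact scoring form are assumptions.

```python
import torch
import torch.nn as nn

class SelectionHead(nn.Module):
    """Scores one (centre, neighbor) pair and outputs P(select), P(unselect)."""
    def __init__(self, dim):
        super().__init__()
        self.pair_proj = nn.Linear(2 * dim, dim)          # W_h, b_h: fuse centre and neighbor
        self.bilinear = nn.Linear(dim, dim, bias=False)   # W_d: bilinear-style information score
        self.policy = nn.Linear(dim + 1, 2)               # W_p: map to the 2-action distribution

    def forward(self, h_center, h_neighbor, h_pool):
        pair = torch.relu(self.pair_proj(torch.cat([h_center, h_neighbor], dim=-1)))
        info = (self.bilinear(h_neighbor) * h_pool.mean(dim=0)).sum(dim=-1, keepdim=True)
        logits = self.policy(torch.cat([pair, info], dim=-1))
        return torch.softmax(logits, dim=-1)              # probabilities over {select, unselect}

head = SelectionHead(dim=16)
probs = head(torch.randn(16), torch.randn(16), torch.randn(5, 16))
print(probs)
```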

S44, at time step t, according to the data characteristics of the graph structure, a reward r_{t+1}(v) is assigned to the agent's action in an unsupervised manner, where σ(·) is the Sigmoid activation function, Ω is the set of positive node pairs observable on the heterogeneous graph, Ω⁻ is the set of negative node pairs sampled from all unobserved node pairs, and ω_1 and ω_2 are adjustable hyperparameters representing the reward weight of positive samples and the penalty weight of negative samples, respectively;
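
The sketch below shows one plausible form of the unsupervised reward in S44: a sigmoid of the embedding dot product is rewarded on observed (positive) pairs and penalized on sampled negative pairs; the averaging and the toy pairs are assumptions, and w1, w2 stand in for the weights ω_1, ω_2.

```python
import torch

def unsupervised_reward(h, pos_pairs, neg_pairs, w1=1.0, w2=1.0):
    """h: (n, d) node embeddings; pos/neg_pairs: lists of (i, j) index tuples."""
    def mean_score(pairs):
        if not pairs:
            return torch.tensor(0.0)
        i, j = zip(*pairs)
        return torch.sigmoid((h[list(i)] * h[list(j)]).sum(dim=-1)).mean()
    # High score on observed pairs is rewarded, high score on unobserved pairs is penalized.
    return w1 * mean_score(pos_pairs) - w2 * mean_score(neg_pairs)

h = torch.randn(6, 8)
r = unsupervised_reward(h, pos_pairs=[(0, 1), (2, 3)], neg_pairs=[(0, 5)])
print(float(r))
```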

S45, at time step t, agent α observes the node state o_t, executes the action u_t according to its policy, obtains the feedback r_{t+1}, records the completion flag done, and enters the next time step t+1 to observe the new node state o_{t+1}; an experience replay buffer is maintained for each agent α, into which the experience trajectory τ_α = (o_t, u_t, r_{t+1}, o_{t+1}, done) is stored at each time step;

S46, a multi-actor attention critic algorithm is adopted to learn and optimize the above multi-agent behavior; in the multi-agent reinforcement learning environment, decentralized policies are trained, a multi-head attention mechanism dynamically selects the important information of each agent, and the evaluation of the policies is learned centrally;

The specific process is as follows:

S461, the state-action value function of agent α_i is expressed through g_i(·), a multi-layer perceptron embedding function oriented to different modules, and W_s, a learnable linear transformation matrix shared in the contribution functions g^{ou}(o, u) of the other agents α_j, together with attention weights on the contribution values;

S462, the value-function optimization of all agents can be approximately regarded as a multi-task regression problem with a joint regression loss, where θ and θ̄ are the parameters of the critic network and of the target critic network, respectively, and the regression target is a target value function in which γ is the discount factor, τ is the temperature hyperparameter balancing entropy maximization against reward, and the target policy network supplies its own parameters;

S463, the policy network of each agent is updated by gradient ascent, where A_i(o, u) = Q_i(o, u; θ) - b_i(o, u) is the advantage function, b_i(o, u) is a counterfactual baseline, φ denotes the parameters of the policy network, and ρ is a learnable logarithmic adjustment coefficient of the policy; the parameters of all target networks are synchronized through a soft-update mechanism;
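
A small sketch of the per-agent experience replay buffer mentioned in S45, storing (o_t, u_t, r_{t+1}, o_{t+1}, done) tuples and sampling mini-batches for later critic updates; the capacity, batch size and number of agents are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Per-agent buffer of experience trajectories tau = (o, u, r, o_next, done)."""
    def __init__(self, capacity=10_000):
        self.storage = deque(maxlen=capacity)

    def push(self, o, u, r, o_next, done):
        self.storage.append((o, u, r, o_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.storage, min(batch_size, len(self.storage)))
        return list(zip(*batch))        # tuples of (obs, actions, rewards, next_obs, dones)

buffers = {f"agent_{i}": ReplayBuffer() for i in range(3)}   # one buffer per agent (n = 3 assumed)
buffers["agent_0"].push([0.1, 0.2], 1, 0.5, [0.2, 0.3], False)
obs, acts, rews, next_obs, dones = buffers["agent_0"].sample(batch_size=4)
```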

S47, after each episode, for the central node v under each dimensional relation path, the agent iteratively selects candidate nodes from the neighborhood sampling pool along the time sequence and adds these pairs of central node and selected node to the neighborhood selection pool NCP;

S5, in order to obtain comprehensive node representation embeddings, relation-aware neighborhood aggregation is adopted, and the neighborhood aggregation is divided into intra-relation-path aggregation and inter-relation-path aggregation.

The specific process is as follows:

S51, in the intra-relation-path aggregation, an attention mechanism dynamically adjusts the weights of the selected neighborhood to obtain the intra-relation-path aggregated representation embedding, where W_h and W_t are the linear transformation matrices for the central node and the neighborhood nodes, respectively, shared across all relation paths; a_{v,u} denotes the importance score of neighborhood node u with respect to central node v, in which W_a is the attention transformation matrix and the tanh activation function further distinguishes the anomalous information that neighborhood nodes bring to the central node;

S52, in the inter-relation-path aggregation, the semantic representation embeddings under all relation paths are concatenated for joint learning to obtain the inter-relation-path aggregated representation embedding Z_v, where MLP(·) is a multi-layer perceptron oriented to the final graph task;
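
A compact sketch of the two-stage aggregation in S51-S52: attention-weighted aggregation of the selected neighbors within one relation path, followed by concatenation and an MLP across paths; the dimensions, the tanh-based scoring and the MLP depth are assumptions.

```python
import torch
import torch.nn as nn

class RelationAwareAggregation(nn.Module):
    def __init__(self, dim, n_paths, out_dim):
        super().__init__()
        self.W_h = nn.Linear(dim, dim, bias=False)      # transform for the centre node
        self.W_t = nn.Linear(dim, dim, bias=False)      # transform for neighborhood nodes
        self.W_a = nn.Linear(2 * dim, 1, bias=False)    # attention scoring
        self.mlp = nn.Sequential(nn.Linear(n_paths * dim, dim), nn.ReLU(), nn.Linear(dim, out_dim))

    def intra_path(self, h_v, h_neighbors):
        # Attention over the selected neighbors of one relation path (S51).
        scores = torch.tanh(self.W_a(torch.cat(
            [self.W_h(h_v).expand_as(h_neighbors), self.W_t(h_neighbors)], dim=-1)))
        alpha = torch.softmax(scores, dim=0)            # importance of each neighbor
        return (alpha * self.W_t(h_neighbors)).sum(dim=0)

    def forward(self, h_v, neighbors_per_path):
        # Concatenate per-path embeddings and fuse them with an MLP (S52).
        z_paths = [self.intra_path(h_v, h_n) for h_n in neighbors_per_path]
        return self.mlp(torch.cat(z_paths, dim=-1))

agg = RelationAwareAggregation(dim=8, n_paths=2, out_dim=4)
z_v = agg(torch.randn(8), [torch.randn(3, 8), torch.randn(5, 8)])
print(z_v.shape)
```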

S6, the final representation embedding Z of the target central node is taken as output and passed to downstream tasks for representation learning of the heterogeneous graph.

The specific process is as follows:

S61, during training, for the graph node classification task, the output-layer dimension of the graph neural network is first set to the number of node categories; for the single-label classification scenario, the outputs are normalized by a Softmax function and the algorithm is optimized by minimizing the cross-entropy loss; for the multi-label classification scenario, a Sigmoid function handles the independent binary classification of each label and the algorithm is optimized by minimizing the binary cross-entropy loss;

S62, during training, for the graph link prediction task, the predicted node pairs are first processed by an inner-product operation or by a DistMult decoder, and the algorithm is then optimized by minimizing the binary cross-entropy loss;
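
To illustrate the link-prediction decoding in S62, the sketch below scores node pairs either by an inner product or by a DistMult-style diagonal relation vector and trains with a binary cross-entropy loss; the embeddings, relation parameters and labels are toy assumptions.

```python
import torch
import torch.nn.functional as F

def inner_product_score(z_src, z_dst):
    return (z_src * z_dst).sum(dim=-1)

def distmult_score(z_src, z_dst, rel_diag):
    # DistMult: z_src^T diag(rel) z_dst, with one learnable diagonal vector per relation.
    return (z_src * rel_diag * z_dst).sum(dim=-1)

z_src, z_dst = torch.randn(4, 8), torch.randn(4, 8)      # embeddings of 4 candidate pairs
rel_diag = torch.randn(8, requires_grad=True)            # toy relation parameters
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])              # 1 = observed edge, 0 = negative sample

logits = distmult_score(z_src, z_dst, rel_diag)
loss = F.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                                          # optimize with the BCE loss as in S62
print(float(loss))
```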

S63, the model is learned and optimized with the label supervision signal of the downstream task; the loss function of the heterogeneous graph neural network consists of the downstream-task loss and the loss term of the relation-aware neighborhood aggregation module, together with a regularization term, where λ_* is the regularization weight, θ is the model parameter set, and ||θ||_2 is the L2 regularization norm of the model parameters.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention rather than to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some or all of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection is characterized by comprising the following steps of: the method comprises the following steps:
S1, given meta-path patterns, constructing a multi-dimensional graph with multi-dimensional semantic information;
S2, expanding the sampling range of the multipath neighborhood, dynamically adjusting the sampling probability of the neighborhood nodes, and sampling a proper amount of neighborhood nodes;
S3, simulating the basic environment elements of a reinforcement learning task according to the dependency, dynamics and complexity of the sampled node sequences;
S4, deploying multiple agents in the multi-dimensional graph, and adaptively selecting homophilous nodes in sequence from the multi-dimensional sampled node sequences;
S5, learning comprehensive node representation embeddings by relation-aware neighborhood aggregation;
S6, passing the representation embedding of the target node to downstream tasks for heterogeneous graph representation learning.
2. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S1, given n different meta-path patterns, the corresponding successive matrix multiplications are performed for each meta-path pattern to construct a multidimensional graph with n-dimensional relation-path semantic information; the process comprises the following steps:
S11, different downstream graph mining tasks may have specific requirements on the types and number of target nodes, and a multidimensional graph is constructed for each target node type;
S12, along a preset meta-path, successively multiplying the adjacency matrices corresponding to all relations on the path to construct the adjacency matrix of the meta-path-based graph, expressed as follows:
wherein e_l denotes a node type and the factor for the relation <e_l, e_l+1> is its corresponding adjacency matrix; a similar calculation is performed for the n different meta-path patterns to construct a group of adjacency matrices of the corresponding homogeneous graphs, so that the heterogeneous graph is converted into an n-dimensional multidimensional graph.
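By way of illustration only (not part of the claimed method), the successive matrix multiplication of S12 can be sketched as follows, using a toy Author-Paper-Author meta-path; the matrix names and data are assumptions.

```python
import numpy as np

# Hedged sketch of building one meta-path adjacency matrix (claim 2, S12) by
# successively multiplying the relation adjacency matrices along the path.
def metapath_adjacency(relation_adjs):
    """relation_adjs: adjacency matrices of the relations <e_l, e_{l+1}> along one meta-path."""
    A = relation_adjs[0]
    for A_rel in relation_adjs[1:]:
        A = A @ A_rel                            # successive multiplication along the path
    return A

A_ap = np.random.randint(0, 2, size=(4, 6))      # toy author-paper relation
A_pa = A_ap.T                                    # paper-author relation
A_apa = metapath_adjacency([A_ap, A_pa])         # one dimension of the multidimensional graph
```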
3. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S2, specifically, the sampling range of the node neighborhood is expanded from the perspective of multi-path sampling heuristics, the sampling probability of the neighborhood nodes under each dimensional relation path is dynamically adjusted with a contextual multi-armed bandit, and T neighborhood nodes are sampled for each node from each dimension according to that probability, so as to construct a neighborhood sampling pool NSP; the process comprises the following steps:
S21, for a node v with a d_Φ(v)-dimensional node attribute X_v, a linear transformation matrix W_Φ(v) and a bias vector b_Φ(v) are adopted to transform the node attribute into a d-dimensional hidden embedding space, obtaining the initial representation embedding H_v of the node, expressed as follows:
H_v = W_Φ(v) X_v + b_Φ(v)    (2)
S22, adopting a multipath representative neighborhood sampling heuristic strategy for the central node of each dimension;
S23, using Q_k to denote the probability matrix of the k-th neighborhood sampling heuristic, in round ep the weighted multi-path sampling probability from central node v_i to neighborhood node v_j is calculated by combining the sampling probabilities of the K neighborhood sampling heuristics:
wherein the granularity probability of each neighborhood sampling heuristic strategy acts as the mixing weight, and Q_k satisfies row normalization;
S24, calculating the granularity probability of each strategy, which is the weight of the probability matrix Q_k of the k-th neighborhood sampling heuristic; the comprehensive selection of neighborhood sampling heuristics can be seen as an adversarial problem of a contextual multi-armed bandit, with the policy granularity probability as the bandit action of multi-path neighborhood sampling, adaptively adjusting the sampling probability of each path and updating it with the iterations of the model training rounds:
wherein an initial value is given, and p_min ≥ 0 is a controllable constant restricting the lower limit of the sampling probability; w_k is the weight of the k-th path strategy, and according to the contextual multi-armed bandit reinforcement learning algorithm with supervised learning guarantees, the iterative update rule is as follows:
wherein the update period and two adjustable hyper-parameters representing the regret rate of the contextual multi-armed bandit in selecting actions are involved; the weighted reward value of the sampling nodes under the k-th policy in round ep is expressed as follows:
wherein the first set is the set of neighborhood nodes selected by node v_i, and the second term is the average multi-head attention score of nodes v_i and v_j;
S25, after each round is finished, for each node v and each dimensional relation path, T neighborhood nodes are sampled from the candidate node set according to the sampling probability, and the central node and neighborhood node pairs are added into the neighborhood sampling pool:
wherein the weighting factor is the sampling probability.
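By way of illustration only, a minimal sketch of the weighted multi-path sampling in claim 3 follows. The K heuristic probability matrices are mixed by per-strategy granularity probabilities, which are held fixed at a uniform value here, whereas the claim updates them with a contextual multi-armed bandit rule; all shapes and data are toy assumptions.

```python
import numpy as np

# Hedged sketch of weighted multi-path neighborhood sampling (claim 3).
def mixed_sampling_probs(Q_list, p_gran):
    """Q_list: K row-normalized [N, N] heuristic matrices; p_gran: length-K weights."""
    return sum(p * Q for p, Q in zip(p_gran, Q_list))

def sample_neighborhood(P, v, T, rng):
    """Draw up to T distinct neighborhood nodes for center node v from row P[v]."""
    probs = P[v] / P[v].sum()
    size = min(T, int(np.count_nonzero(probs)))
    return rng.choice(len(probs), size=size, replace=False, p=probs)

rng = np.random.default_rng(0)
N, K, T = 8, 3, 4
Q_list = [rng.dirichlet(np.ones(N), size=N) for _ in range(K)]  # toy heuristic matrices
p_gran = np.ones(K) / K                                         # uniform granularity probabilities
P = mixed_sampling_probs(Q_list, p_gran)
neighborhood = sample_neighborhood(P, v=0, T=T, rng=rng)        # indices of sampled neighbors
```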
4. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 3, characterized in that: in S22, the multi-path representative neighborhood sampling heuristics include a one-hop neighbor strategy, a two-hop neighbor strategy, a nearest neighbor strategy and a random walk neighbor strategy; specifically:
the one-hop neighbor strategy: directly connected nodes are sampled from the normalized adjacency matrix, wherein the adjacency matrix has self-loops added and is normalized by the corresponding diagonal degree matrix;
the two-hop neighbor strategy: indirectly connected nodes are sampled from the power of the normalized adjacency matrix;
the nearest neighbor strategy: nodes are sampled from a probability matrix S_KNN based on the cosine similarity of node attributes, which gives the similarity score between nodes v_i and v_j;
the random walk neighbor strategy: nodes are sampled from the random walk weight matrix S_PPR, wherein the personalized PageRank algorithm with scalar coefficient c ∈ [0,1] is applied to a column-normalized adjacency matrix to generate the walk weights.
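By way of illustration only, the four heuristics of claim 4 can be sketched as follows; the symmetric normalization and the closed-form personalized PageRank diffusion are common choices and are assumptions rather than the patent's exact formulas.

```python
import numpy as np

# Hedged sketch of the four neighborhood sampling heuristics in claim 4.
def normalized_adj(A):
    """One-hop matrix: add self-loops, then symmetrically normalize (an assumed variant)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def two_hop(A_hat):
    """Two-hop matrix: power of the normalized adjacency matrix."""
    return A_hat @ A_hat

def knn_similarity(X):
    """S_KNN: cosine similarity of node attribute vectors."""
    X_norm = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X_norm @ X_norm.T

def personalized_pagerank(A, c=0.15):
    """S_PPR: closed-form personalized PageRank on a column-normalized adjacency matrix."""
    A_col = A / np.maximum(A.sum(axis=0, keepdims=True), 1e-12)
    n = A.shape[0]
    return c * np.linalg.inv(np.eye(n) - (1 - c) * A_col)
```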
5. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S3, specifically, for each central node in the neighborhood sampling pool, the sampled neighborhood node set is arranged in order of sampling probability, and the basic environment elements of the reinforcement learning task are simulated according to the dependency, dynamic characteristics and complexity of the sampled neighborhood node sequences; the process comprises the following steps:
S31, to enhance the dependency among the environment elements, when the neighborhood nodes are selected sequentially they are ordered by their sampling probabilities, so that semantically closely related neighborhood nodes are arranged adjacently; as the agent advances along the neighborhood node sequence over time steps, adjacent neighborhood nodes produce a sequential dependency, expressed as follows:
o_{t+1} ← τ·o_t + (1 − τ)·o_{t+1}    (8)
where o_t is the environment state at time step t and τ is a hyper-parameter with a small value;
S32, to enhance the dynamics of the environment elements, a skip rate p_s is set for the neighborhood node sequence; the agent randomly skips the evaluation of a neighborhood node with probability p_s and moves directly to the next one, so that the length and content of the state sequence change noticeably in each round;
S33, to enhance the complexity of the environment elements, the neighborhood nodes are arranged in descending order of sampling probability; the agent starts its selection decisions from the simple neighborhood nodes with higher sampling probability and avoids difficult neighborhood nodes with a higher skip rate, where the skip rate p_s is updated with the iterations of the model training rounds:
where p_s is the basic skip rate and a minimum skip rate guarantees the dynamic characteristics; d_t denotes the normalized complexity measure of the neighborhood node v_t at time step t, expressed as follows:
wherein the index term selects the node with the maximum multi-path sampling probability among the T sampled nodes; l_t ∈ [0,1] is the normalized confidence that node v is homophilous with v_t, and is related to the reward value of the selection decision in the agent behavior; κ is a hyper-parameter that balances the homophily and heterophily characteristics between the central node v and the neighborhood node v_t.
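By way of illustration only, the environment mechanics of claim 5 can be sketched as follows; the state smoothing of S31 and the probability-scaled skip decision of S32/S33 are shown, while the complexity measure d_t is treated as a given scalar.

```python
import numpy as np

# Hedged sketch of the sequential environment in claim 5; hyper-parameter
# values are illustrative assumptions.
rng = np.random.default_rng(0)
TAU, P_SKIP_BASE, P_SKIP_MIN = 0.1, 0.2, 0.05

def smooth_state(o_t, o_next, tau=TAU):
    """S31: o_{t+1} <- tau * o_t + (1 - tau) * o_{t+1} (sequential dependency)."""
    return tau * o_t + (1 - tau) * o_next

def should_skip(d_t, p_base=P_SKIP_BASE, p_min=P_SKIP_MIN):
    """S32/S33: skip harder nodes (larger normalized complexity d_t) more often."""
    p = max(p_min, p_base * d_t)
    return rng.random() < p
```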
6. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S4, specifically, in one round the agent sequentially performs a homophily judgment between the central node and its at most T sampled neighborhood nodes, and this process is regarded as a T-step decision process; n types of agents are deployed for the homogeneous graphs constructed from the n-dimensional relation paths of the multi-dimensional graph, and they learn how to select homophilous nodes in a Markov game over the semantics of the n-dimensional relation paths, so as to achieve a cooperative balance of multi-dimensional selection and thereby construct a neighborhood selection pool NCP; the process comprises the following steps:
S41, the latest central node representation H_v, the neighborhood node representation H_vt of the current time step and the sampled neighborhood node set NSP(v) together serve as the observable state o_t at the current time t, expressed as follows:
S42, the action space of each agent is defined as a discrete scheme consisting of selecting or not selecting the node as a homophilous node, expressed as follows:
S43, at time step t, a linear layer combines the central node representation H_v and the neighborhood node representation of the current time step to obtain an estimate of the information integrating the central node and the neighborhood node, expressed as follows:
wherein σ(·) is an activation function, (·||·) is a concatenation operation, and W_h and b_h are the learnable weight matrix and bias vector of the linear mapping, respectively; drawing on the theory of contrastive predictive coding, the mutual information between the central node and the neighborhood node set is estimated to obtain a higher-order information quantity, expressed as follows:
wherein W_d is a learnable linear transformation matrix; the two estimates are concatenated and passed through a learnable linear transformation W_p and a Softmax output normalization layer to obtain the action probability distribution, expressed as follows:
S44, at time step t, according to the data characteristics of the graph structure, a reward r_{t+1}(v) is assigned to the agent's action in an unsupervised manner, expressed as follows:
wherein σ(·) is the Sigmoid activation function, Ω is the set of positive node pairs observable on the heterogeneous graph, Ω⁻ is the set of negative node pairs sampled from all unobserved node pairs, and ω_1 and ω_2 are adjustable hyper-parameters representing the reward weight of positive samples and the penalty weight of negative samples, respectively;
S45, at time step t, the agent α observes the node state o_t and, according to its policy, executes action u_t to obtain feedback r_{t+1}, records the completion state done, enters the next time step t+1 and continues to observe the new node state o_{t+1}; an experience replay buffer is built for each agent α, and the experience trace τ_α = (o_t, u_t, r_{t+1}, o_{t+1}, done) is stored at each time step;
S46, the behavior of the multiple agents is learned and optimized with a multi-actor attention critic algorithm: decentralized policies are trained in the multi-agent reinforcement learning environment, important information of each agent is dynamically selected through a multi-head attention mechanism, and the evaluation of the learned policies is centralized;
S47, after each round is finished, for each central node v under each dimensional relation path, the agent iteratively selects candidate nodes from the neighborhood sampling pool along the time sequence, and adds the central node and selected node pairs into the neighborhood selection pool:
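By way of illustration only, the agent's action head of S43 can be sketched as follows, combining a linear fusion of the center and neighbor representations with a bilinear, contrastive-predictive-coding-style term before Softmax normalization; layer names and sizes are assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of the 2-way "select / do not select" action head in claim 6 (S43).
class SelectionHead(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.fuse = nn.Linear(2 * d, d)          # W_h, b_h over the concatenated pair
        self.W_d = nn.Linear(d, d, bias=False)   # bilinear (CPC-style) mutual-information term
        self.W_p = nn.Linear(2 * d, 2)           # maps both estimates to the 2-way action

    def forward(self, h_v, h_vt, h_pool_mean):   # all shaped [B, d]
        pair = torch.relu(self.fuse(torch.cat([h_v, h_vt], dim=-1)))
        mi = self.W_d(h_pool_mean) * h_vt        # element-wise score against the pool summary
        logits = self.W_p(torch.cat([pair, mi], dim=-1))
        return torch.softmax(logits, dim=-1)     # P(select), P(not select)
```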
7. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 6, characterized in that: in S46, the specific process comprises the following steps:
S461, the state-action value function of agent α_i is expressed as follows:
wherein g_i(·) is a multi-layer perceptron embedding function for the different modules, W_s is a learnable linear transformation matrix shared among the contribution functions g_ou(o, u) of the other agents α_j, and the attention weights weigh the contribution values;
S462, the value function optimization of all agents can be approximated as a multi-task regression problem, and their joint regression loss function is as follows:
wherein θ and the target parameters are the parameters of the critic network and the target critic network, respectively; the target value function is expressed as follows:
where γ is the discount factor, τ is the temperature hyper-parameter balancing entropy maximization and reward, and the remaining parameters belong to the target policy network;
S463, the policy network of each agent is updated by gradient ascent, and their respective gradients are expressed as follows:
wherein A_i(o, u) = Q_i(o, u; θ) − b_i(o, u) is the advantage function, b_i(o, u) is the counterfactual baseline, φ is a parameter of the policy network, and ρ is a learnable policy log adjustment coefficient.
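By way of illustration only, the soft critic target and the counterfactual advantage of claim 7 can be sketched as follows; the attention-based critic network itself is omitted, and the counterfactual baseline is approximated by the expectation of Q over the agent's own policy, which is an assumption about the variant used.

```python
import torch

# Hedged sketch of the target value and advantage used to train the critics and
# actors in claim 7; network architectures are omitted.
def td_target(r, q_next, log_pi_next, gamma=0.99, temp=0.05):
    """y = r + gamma * (Q_target(o', u') - temp * log pi(u'|o'))."""
    return r + gamma * (q_next - temp * log_pi_next)

def advantage(q_all_actions, pi_probs, action):
    """A(o, u) = Q(o, u) - sum_u' pi(u'|o) * Q(o, u') (counterfactual baseline)."""
    baseline = (pi_probs * q_all_actions).sum(dim=-1)
    q_taken = q_all_actions.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    return q_taken - baseline
```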
8. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S5, specifically, in order to obtain comprehensive node representation embeddings, relation-aware neighborhood aggregation is adopted and divided into intra-relation-path aggregation and inter-relation-path aggregation; the process comprises the following steps:
S51, in the intra-relation-path aggregation, an attention mechanism dynamically adjusts the weights of the selected neighborhood to obtain the intra-path aggregated representation embedding, expressed as follows:
wherein W_h and W_t are the linear transformation matrices for the central node and the neighborhood nodes, respectively, which are shared across all relation paths; a_{v,u} denotes the importance score of a neighborhood node to the central node, expressed as follows:
wherein W_a is the attention transformation matrix, and the tanh activation function further distinguishes the anomalous information brought to the central node by the neighborhood nodes;
S52, in the inter-relation-path aggregation, the semantic representation embeddings under all relation paths are concatenated for joint learning, obtaining the inter-path aggregated representation embedding Z_v:
wherein MLP(·) is the multi-layer perceptron oriented to the final graph task.
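By way of illustration only, the intra-relation-path attention aggregation of S51 can be sketched as follows; the single-head form and tensor shapes are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of intra-relation-path attention aggregation (claim 8, S51):
# W_h / W_t transform center and neighbor nodes, a tanh-activated attention
# vector scores each neighbor, and scores are softmax-normalized before summing.
class IntraPathAttention(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.W_h = nn.Linear(d, d, bias=False)   # center-node transform (shared across paths)
        self.W_t = nn.Linear(d, d, bias=False)   # neighbor transform (shared across paths)
        self.a = nn.Linear(2 * d, 1, bias=False) # attention transform W_a

    def forward(self, h_v, h_neigh):             # h_v: [d], h_neigh: [T, d]
        ht = self.W_t(h_neigh)                   # transformed neighbors, [T, d]
        hq = self.W_h(h_v).expand_as(ht)         # transformed center, broadcast to [T, d]
        scores = torch.tanh(self.a(torch.cat([hq, ht], dim=-1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=0)     # importance of each selected neighbor
        return (alpha.unsqueeze(-1) * ht).sum(dim=0)   # aggregated embedding, [d]
```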
9. The heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection according to claim 1, characterized in that: in S6, the final representation embedding Z of the target central node is taken as output and passed to downstream tasks for heterogeneous graph representation learning; the process comprises the following steps:
S61, during training, for the graph node classification task, the output layer dimension of the graph neural network is first set to the number of node classes; for the single-label classification scenario, normalization is performed with a Softmax function and the algorithm is optimized by minimizing the cross-entropy loss; for the multi-label classification scenario, a Sigmoid function handles the independent binary classification problem of each label and the algorithm is optimized by minimizing the binary cross-entropy loss;
S62, during training, for the graph link prediction task, the predicted node pairs are first scored by an inner product or by a DistMult decoder, and the algorithm is optimized by minimizing the binary cross-entropy loss;
S63, the model is optimized based on the label supervision signal of the downstream task, and the loss function of the heterogeneous graph neural network is as follows:
wherein the aggregation loss term comes from the relation-aware neighborhood aggregation module, λ_* is the regularization weight, θ is the model parameter set, and ||θ||_2 is the L2 regularization norm of the model parameters.
CN202410621742.XA 2024-05-20 2024-05-20 Heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection Pending CN118540239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410621742.XA CN118540239A (en) 2024-05-20 2024-05-20 Heterogeneous graph anomaly detection method based on multipath neighborhood node self-adaptive selection


Publications (1)

Publication Number Publication Date
CN118540239A true CN118540239A (en) 2024-08-23



Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119477961A (en) * 2024-10-17 2025-02-18 山东大学 Self-supervised medical image segmentation method and system based on random walk path
CN119130652A (en) * 2024-11-07 2024-12-13 浙江大学 Virtual currency wallet address anomaly detection method and system based on adaptive heterogeneous message aggregation
CN119130652B (en) * 2024-11-07 2025-06-24 浙江大学 Virtual coin wallet address anomaly detection method and system based on self-adaptive heterogeneous message aggregation
CN119512759A (en) * 2024-11-23 2025-02-25 暨南大学 A load-aware elastic scaling method for multiple microservice replicas in computing power networks
CN119512759B (en) * 2024-11-23 2025-07-18 暨南大学 A load-aware elastic scaling method for multiple microservice replicas in computing power networks


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination