WO2022267976A1

WO2022267976A1 - Entity alignment method and apparatus for multi-modal knowledge graphs, and storage medium

Info

Publication number: WO2022267976A1
Application number: PCT/CN2022/099188
Authority: WO
Inventors: 朱佳
Original assignee: Zhejiang Normal University CJNU
Current assignee: Zhejiang Normal University CJNU
Priority date: 2021-06-21
Filing date: 2022-06-16
Publication date: 2022-12-29
Anticipated expiration: 2023-12-21
Also published as: LU503448B1; CN113360673B; CN113360673A

Abstract

Disclosed are an entity alignment method and apparatus for multi-modal knowledge graphs, and a storage medium. The present invention comprises: acquiring data of a first multi-modal knowledge graph and a second multi-modal knowledge graph, and extracting from them the entities that require alignment; processing the multi-modal data of the entities to obtain a vector for each modality, and performing early fusion and late fusion on these modality vectors; combining the result of the early fusion with the result of the late fusion to obtain a multi-modal embedding vector; and performing entity alignment according to the multi-modal embedding vector. The method of the present invention thus achieves entity alignment for multi-modal knowledge graphs and resolves the inconsistency between multi-modal knowledge representations. The present invention can be widely applied in the technical field of knowledge graphs.

Description

Entity alignment method, device and storage medium for multimodal knowledge graph

Technical field

The present invention relates to the technical field of knowledge graphs, and in particular to an entity alignment method, device and storage medium for multimodal knowledge graphs.

Background

Most knowledge graphs are built for specific purposes and in a monolingual setting, so the same concept may be expressed differently in different knowledge graphs. The purpose of entity alignment is to identify entities in two knowledge graphs that are expressed differently but refer to the same thing, so that the knowledge graphs can be integrated.

Because knowledge takes many forms, current embedding techniques cannot yet handle multimodal knowledge well. To overcome this challenge, researchers have recently proposed various models that fuse the multimodal information in knowledge graphs into a joint embedding, allowing the alignment model to adjust the modality weights automatically. However, these studies do not consider modality correlation at the feature level, so when the correlation between modalities is relatively strong, the results are likely to be unsatisfactory. These problems in the prior art urgently need to be solved.

Summary of the invention

The purpose of the present invention is to solve, at least to some extent, one of the technical problems in the prior art.

To this end, one object of the embodiments of the present invention is to provide an entity alignment method, device and medium for multimodal knowledge graphs that perform early fusion and late fusion on multimodal knowledge graphs to achieve entity alignment, thereby resolving the inconsistency between multimodal knowledge representations.

To achieve the above technical objectives, the embodiments of the present invention adopt the following technical solutions:

In a first aspect, an embodiment of the present invention provides an entity alignment method for multimodal knowledge graphs, comprising the following steps:

The entity alignment method for multimodal knowledge graphs is characterized in that it comprises the following steps:

acquiring data of a first multimodal knowledge graph and a second multimodal knowledge graph;

extracting the entities to be aligned from the first multimodal knowledge graph and the second multimodal knowledge graph, respectively;

processing the multimodal data of the entities to obtain the modality vectors of the entities, wherein the multimodal data comprise image data, relation data, attribute data and knowledge graph structure data, and the modality vectors comprise an image embedding vector, a relation embedding vector, an attribute embedding vector and a knowledge graph structure vector;

performing early fusion on the modality vectors through a fully connected neural network model;

performing late fusion on the modality vectors through a low-rank multimodal model;

combining the result of the early fusion and the result of the late fusion to obtain a multimodal embedding vector;

performing entity alignment based on the multimodal embedding vector.

Further, the step of processing the image data of the entity to obtain the image embedding vector of the entity specifically comprises:

performing feature extraction on the acquired image data with a pre-trained RESNET model;

processing the extracted features with a first preset function to obtain the image embedding vector.

Further, the step of processing the relation data of the entity to obtain the relation embedding vector of the entity specifically comprises:

converting the acquired relation data into translation vectors through a TransE model;

computing the structural similarity of the translation vectors with a second preset function to obtain a logistic regression loss function;

obtaining the relation embedding vector by training the logistic regression loss function to convergence.

Further, the step of processing the attribute data of the entity to obtain the attribute embedding vector of the entity specifically comprises:

mapping the acquired attribute data to a low-dimensional space through a feedforward network to obtain the attribute embedding vector.

Further, the step of processing the knowledge graph structure data of the entity to obtain the structure embedding vector of the entity specifically comprises:

building a semi-supervised embedding model based on a graph convolutional network;

setting relation vertices;

processing the relation vertices through the semi-supervised embedding model to obtain the structure embedding vector.

Further, the early fusion specifically comprises:

building a fully connected neural network model;

fusing all the features extracted by the RESNET model through the fully connected neural network model.

Further, the late fusion specifically comprises:

simplifying the vector representation of the multimodal fusion through a low-rank multimodal fusion model;

simplifying the vector representation in a preset manner.

Further, the step of combining the early fusion and the late fusion specifically comprises:

combining the early fusion and the late fusion through co-training according to a preset loss function.

In a second aspect, an embodiment of the present invention provides an entity alignment device for multimodal knowledge graphs, comprising:

at least one processor;

at least one memory for storing at least one program;

wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the entity alignment method for multimodal knowledge graphs described above.

In a third aspect, an embodiment of the present invention provides a storage medium storing processor-executable instructions which, when executed by a processor, implement the entity alignment method for multimodal knowledge graphs described above.

The present invention discloses an entity alignment method for multimodal knowledge graphs with the following beneficial effects:

The present invention acquires the data of the first and second multimodal knowledge graphs and extracts from them the entities to be aligned. It then processes the multimodal entity data, consisting of image data, relation data, attribute data and knowledge graph structure data, to obtain modality vectors consisting of an image embedding vector, a relation embedding vector, an attribute embedding vector and a knowledge graph structure vector, and performs early fusion and late fusion on these modality vectors. Next, it combines the results of the early and late fusion to obtain a multimodal embedding vector. Finally, it performs entity alignment based on the multimodal embedding vector. The method of the present invention thus achieves entity alignment for multimodal knowledge graphs and resolves the inconsistency between multimodal knowledge representations.

Description of drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings of the embodiments or of the related prior-art solutions are introduced below. It should be understood that the drawings described below show only some embodiments of the technical solutions of the present invention, for clarity and convenience; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow diagram of an entity alignment method for a multimodal knowledge graph according to a specific embodiment of the present invention;

FIG. 2 is a flowchart of the entity alignment method for a multimodal knowledge graph in application, according to a specific embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an entity alignment device for a multimodal knowledge graph according to a specific embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of them are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it. The step numbers in the following embodiments are assigned only for convenience of description and impose no order on the steps; the execution order of the steps may be adjusted as understood by those skilled in the art.

Entity alignment, which matches entities from different knowledge graphs that refer to the same real-world object, is a key task in integrating knowledge graphs. Because most knowledge graphs are built for a specific purpose and in a monolingual setting, different knowledge graphs often describe even the same concept differently.

Early research on entity alignment focused mostly on attribute similarity and was often hampered by attribute heterogeneity, which makes entity alignment error-prone. More recently, with the rapid development of knowledge graph embedding, many researchers have applied embedding techniques in various models for the entity alignment problem. However, these embedding techniques cannot yet handle multimodal knowledge well, because knowledge takes many forms, such as relational triples and images, and these forms of knowledge are at the same time highly useful for entity alignment.

The impact of multimodal knowledge on the entity alignment problem is not trivial, because the inevitable heterogeneity across modalities makes it difficult to learn and fuse knowledge representations from different modalities. For example, the same target is not easy to identify with traditional techniques that use only image information or only text information. To overcome this challenge, researchers have recently proposed various models that fuse the multimodal information in knowledge graphs into a joint embedding, allowing the alignment model to adjust modality weights automatically. However, these studies do not consider modality correlation at the feature level, so when the correlation between modalities is relatively strong, the results are likely to be unsatisfactory.

To address these problems, this solution proposes an entity alignment method for multimodal knowledge graphs. It first processes the multimodal data of an entity, consisting of image data, relation data, attribute data and knowledge graph structure data, to obtain modality vectors consisting of an image embedding vector, a relation embedding vector, an attribute embedding vector and a knowledge graph structure vector. It then performs early fusion and late fusion separately on these modality vectors and combines the two results into a multimodal embedding vector. This mitigates the effect of strong correlation between modalities and improves the accuracy of the entity alignment results.

Specifically, referring to FIG. 1 and FIG. 2, the entity alignment method for multimodal knowledge graphs provided by the embodiment of the present invention comprises the following steps:

Step 101: acquire the data of a first multimodal knowledge graph and a second multimodal knowledge graph. A knowledge graph (knowledge map) combines theories and methods from applied mathematics, graphics, information visualization, information science and other disciplines with bibliometric methods such as citation analysis and co-occurrence analysis, and uses visual maps to display the core structure, development history, frontier areas and overall knowledge architecture of a discipline, serving the goal of multidisciplinary integration. The main difference between a multimodal knowledge graph and a traditional knowledge graph is that a traditional knowledge graph focuses mainly on entities and relations drawn from text and databases, whereas a multimodal knowledge graph builds on it by constructing entities in multiple modalities and multimodal semantic relations between entities of different modalities.

Step 102: extract the entities to be aligned from the multimodal knowledge graphs, that is, from the first and second multimodal knowledge graphs of step 101, respectively. An entity is something that exists objectively and can be distinguished from other things; the term often refers to a collection of things of a certain type.

Step 103: process the multimodal data of the entities to obtain their modality vectors. The multimodal data comprise image data, relation data, attribute data and knowledge graph structure data; the modality vectors comprise an image embedding vector, a relation embedding vector, an attribute embedding vector and a knowledge graph structure vector.

Image embedding: a pre-trained RESNET model is used as the image feature extractor, and the output of its last layer is taken as the image representation. The extracted features are then processed by the first preset function to obtain the image embedding vector emb_I. The RESNET model is a residual network, a type of convolutional neural network. It is easy to optimize and can improve accuracy by adding considerable depth. Its residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing the depth of deep neural networks. Compared with VGG16, another classic convolutional neural network model, RESNET addresses the degradation problem in deep networks.

The first preset function is as follows:

$$\mathrm{emb}_I = W_I \cdot \mathrm{RESNET}(I) + b_I$$

where W_I is the weight matrix, b_I is the bias vector, and I denotes the image.
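A minimal sketch of the first preset function, assuming the RESNET features have already been extracted (for example, the pooled output of the last layer of a pre-trained ResNet) and are passed in as a plain list of floats. The toy sizes and the values of `W_I` and `b_I` are hypothetical; in practice they would be learned.

```python
def affine(W, x, b):
    """Return W @ x + b for a weight matrix W given as a list of rows."""
    if any(len(row) != len(x) for row in W) or len(W) != len(b):
        raise ValueError("shape mismatch between W, x and b")
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def image_embedding(resnet_features, W_I, b_I):
    """emb_I = W_I * RESNET(I) + b_I, with RESNET(I) precomputed by a pre-trained ResNet."""
    return affine(W_I, resnet_features, b_I)


# Hypothetical toy sizes: a 4-d "ResNet" feature projected to a 2-d image embedding.
features = [1.0, 0.5, -1.0, 2.0]
W_I = [[0.1, 0.2, 0.0, 0.5],
       [0.0, -1.0, 1.0, 0.0]]
b_I = [0.0, 1.0]
emb_I = image_embedding(features, W_I, b_I)
print(emb_I)
```

With a real ResNet the feature vector would be much longer (ResNet-50's pooled output has 2048 entries), but the projection itself is the same single affine map.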

Relation embedding: the TransE model represents all entities and relations in the multimodal knowledge graph as low-dimensional vectors; that is, it translates triples into embedding vectors. A triple has the form (head entity, relation, tail entity), and head and tail entities are collectively called entities. For simplicity, a triple is written as (h, r, t), where h is the head entity, t is the tail entity, and r is the relation between h and t. A second preset function then measures the structural similarity.

The second preset function is as follows:

$$f_{rel}(h,r,t) = -\left\lVert h^{(2)} + r - t^{(2)} \right\rVert$$

where f_rel(h, r, t) is a function that measures the similarity between entity h and entity t.

This score is used in a logistic regression loss, and the relation embedding vector emb_r is obtained by minimizing that loss until it converges.

In this loss, a is the label of f_rel(h, r, t) and takes the value 1 or -1; X⁺ denotes the positive facts in the source and target knowledge graphs, and X⁻ denotes a set of negative samples generated by replacing the head or tail entity of a positive fact.
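The formula of the logistic loss is not reproduced in this text. The sketch below assumes the common form L = Σ log(1 + exp(−a · f_rel(h, r, t))) summed over X⁺ (a = 1) and X⁻ (a = −1), and treats h and t as plain entity embeddings, since the (2) superscripts in the score are not defined in the source. The entity names, embeddings and the `corrupt` helper are hypothetical.

```python
import math
import random


def f_rel(h, r, t):
    """TransE-style score -||h + r - t|| (L2 norm); a higher score means a more plausible triple."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def corrupt(triple, entities, rng):
    """Build a negative sample by replacing the head or the tail with a different entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)


def logistic_loss(positives, negatives, ent_emb, rel_emb):
    """Assumed loss: sum of log(1 + exp(-a * f_rel)), with a = 1 for X+ and a = -1 for X-."""
    loss = 0.0
    for a, triples in ((1.0, positives), (-1.0, negatives)):
        for h, r, t in triples:
            loss += softplus(-a * f_rel(ent_emb[h], rel_emb[r], ent_emb[t]))
    return loss


# Hypothetical 2-d embeddings and entity names.
ent_emb = {"paris": [0.0, 0.0], "france": [1.0, 0.0], "berlin": [5.0, 5.0]}
rel_emb = {"capital_of": [1.0, 0.0]}
positives = [("paris", "capital_of", "france")]
rng = random.Random(0)
negatives = [corrupt(triple, list(ent_emb), rng) for triple in positives]
print(logistic_loss(positives, negatives, ent_emb, rel_emb))
```

Minimizing this loss pushes f_rel towards 0 (h + r ≈ t) for observed facts and towards large negative values for corrupted ones; a training loop with gradients is omitted here.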

Attribute embedding: because of noise from neighboring nodes, a deep neural network model performs poorly on attribute embedding, so a simple feedforward network maps the attribute features into a low-dimensional space to obtain the attribute embedding vector:

$$\mathrm{emb}_A = W_A \cdot A + b_A$$

where emb_A is the attribute embedding vector, W_A is the weight matrix, b_A is the bias vector, and A is the set of attributes.
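A minimal sketch of the attribute feedforward map. The source only says that A is the set of attributes; the sketch assumes A is encoded as a multi-hot vector over a fixed attribute vocabulary before the single affine layer is applied. The vocabulary and weights are hypothetical.

```python
def multi_hot(attributes, vocabulary):
    """Encode a set of attribute names as a multi-hot vector over a fixed vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for name in attributes:
        if name in index:  # attributes outside the vocabulary are ignored
            vec[index[name]] = 1.0
    return vec


def attribute_embedding(attributes, vocabulary, W_A, b_A):
    """emb_A = W_A * A + b_A, where A is the multi-hot encoding of the attribute set."""
    A = multi_hot(attributes, vocabulary)
    return [sum(w * a for w, a in zip(row, A)) + b for row, b in zip(W_A, b_A)]


# Hypothetical attribute vocabulary and weights.
vocabulary = ["birthDate", "population", "area"]
W_A = [[1.0, 2.0, 3.0],
       [0.0, 1.0, -1.0]]
b_A = [0.5, 0.0]
print(attribute_embedding({"population", "area", "motto"}, vocabulary, W_A, b_A))
# [5.5, 0.0]
```

A single linear layer keeps the number of parameters small, which matches the stated motivation of avoiding a deep model on noisy attribute data.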

Knowledge graph structure embedding: a semi-supervised embedding model based on a graph convolutional network (GCN) is built, and the knowledge graph is converted into an undirected graph, reconstructing the structure of the original knowledge graph. For example, given a triple (e1, r, e2), where e1 and e2 are entities and r is the relation between them, the semi-supervised embedding model in this embodiment assigns the triple two distinct relation vertices r1 and r2, forming (e1, r1) and (e2, r2). Each relation vertex has a unique one-hot representation.

On this new undirected graph, the DeepWalk algorithm is used to obtain a feature vector for each entity vertex, and the unique one-hot representation of each relation vertex is fed into the GCN. These relation vertices indicate the total number of neighbors that share the same relation information between two entity vertices. After encoding by the convolutional layers, representations of both the entity vertices and the relation vertices in the graph are obtained. Each layer of the GCN can be written as a nonlinear function:

$$H^{(l+1)} = f\left(H^{(l)}, M\right)$$

where H^(l) is the input matrix of layer l, H^(l+1) is its output matrix, l is the layer index, and M is the adjacency matrix of the knowledge graph. The propagation rule is then set as follows:

$$f\left(H^{(l)}, M\right) = \mathrm{ReLU}\left(M H^{(l)} W^{(l)}\right)$$

where W^(l) is the weight matrix of layer l and ReLU is the activation function. Note that multiplying by M only sums the features of all adjacent vertices, not those of the vertex itself. The identity matrix I is therefore added to M, and the propagation rule is updated accordingly.

In the updated rule, M̂ = M + I is the adjacency matrix with self-loops, and D is the diagonal (degree) matrix of M̂. This embodiment takes the output of the last layer as the structure embedding vector emb_kg of the knowledge graph.
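The updated propagation rule is not reproduced in this text. Given M̂ = M + I and its diagonal matrix D, the sketch below assumes the widely used symmetric normalization ReLU(D^(-1/2) M̂ D^(-1/2) H^(l) W^(l)) from Kipf and Welling's GCN. The toy graph, the one-hot input features (standing in for DeepWalk vectors) and the weights are hypothetical.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn_layer(M, H, W):
    """One GCN layer: ReLU(D^-1/2 (M + I) D^-1/2 H W), where D is the degree matrix of M + I."""
    n = len(M)
    M_hat = [[M[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in M_hat]  # diagonal entries of D
    norm = [[M_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# Hypothetical 3-vertex path graph 0 - 1 - 2 with one-hot input features.
M = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
H0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W0 = [[1.0], [1.0], [1.0]]
H1 = gcn_layer(M, H0, W0)
print(H1)
```

Stacking several calls to `gcn_layer` and keeping the last output corresponds to taking the last layer as emb_kg; the self-loop ensures every vertex keeps part of its own features.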

Step 104: perform early fusion through a fully connected neural network model. Early fusion combines features before the data are fed into the model, so that the relations among the features are captured better. This solution uses a standard early fusion technique to fuse multiple features taken from different data modalities: in this embodiment, all the features of each modality are concatenated and processed by a simple fully connected neural network model.
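A minimal sketch of the early fusion step: concatenate the per-modality vectors, then apply one fully connected layer to produce h_e. The layer width, the identity activation (the source does not name one) and all values are hypothetical.

```python
def early_fusion(modal_vectors, W, b, activation=lambda z: z):
    """Concatenate the per-modality vectors, then apply one fully connected layer."""
    x = [v for vec in modal_vectors for v in vec]  # feature-level concatenation
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must match the concatenated feature length")
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


# Hypothetical tiny modality vectors: image (2-d), relation, attribute and structure (1-d each).
emb_I, emb_r, emb_A, emb_kg = [1.0, 2.0], [3.0], [0.5], [-1.0]
W = [[1.0, 0.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 1.0, 0.0, 1.0]]
b = [0.0, -1.0]
h_e = early_fusion([emb_I, emb_r, emb_A, emb_kg], W, b)
print(h_e)  # [2.0, 3.0]
```

Because the weight matrix sees every concatenated feature at once, it can model feature-level interactions across modalities, which is the point of fusing before the model rather than after.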

Step 105: perform late fusion through a low-rank multimodal model. Given the encodings of the unimodal information from M different modalities, the goal of multimodal fusion is to integrate these unimodal representations into a compact multimodal representation. Tensor representations are considered an effective approach to multimodal fusion. However, the number of parameters of the learned weight tensor grows exponentially, which not only adds a large amount of computation but also puts the model at risk of overfitting. In this embodiment, the low-rank multimodal fusion model decomposes the weights into a set of low-rank factors, so that the fused representation simplifies to the output vector h_l.

In this formulation, the fusion takes an element-wise product over a sequence of tensors, r is the rank of the tensor, and each modality m has its own corresponding low-rank factors. Compared with existing methods, this computation simplifies the parallel decomposition of Z and W: only h_l needs to be computed, without creating the tensor Z, which avoids computing the large input tensor Z. If r is too large, the amount of computation is still large; in that case, the expression is updated by swapping the order of the summation and the element-wise product.

Here i denotes the i-th component of the matrix, and the newly added constraint keeps the decomposition within an acceptable range while reducing the amount of computation.
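The formulas of this step are not reproduced in this text. The sketch below assumes the factorization of the published low-rank multimodal fusion (LMF) method: each unimodal vector z_m is extended with a constant 1, the weight tensor for each output dimension is a sum of r rank-one terms built from modality-specific factors, and h_l is computed as Σ_i Λ_m (w_m^(i) · z_m), the element-wise product over modalities summed over the rank. The names `z_m` and `factors`, and all numbers, are hypothetical. A reference function builds Z and W explicitly so that the two results can be compared.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lmf_fuse(modal_vectors, factors):
    """Low-rank fusion: h_k = sum_i prod_m (factors[m][i][k] . [z_m; 1]).

    factors[m][i] is a d_h x (len(z_m) + 1) matrix for modality m and rank component i.
    """
    padded = [list(z) + [1.0] for z in modal_vectors]  # append a constant 1 to each z_m
    rank, d_h = len(factors[0]), len(factors[0][0])
    h = []
    for k in range(d_h):
        total = 0.0
        for i in range(rank):
            # element-wise product over modalities of the per-modality projections
            total += math.prod(dot(factors[m][i][k], padded[m]) for m in range(len(padded)))
        h.append(total)
    return h


def full_tensor_fuse(modal_vectors, factors):
    """Reference: Z = outer product of all [z_m; 1], W_k = sum_i outer(w_m^(i)), h_k = <W_k, Z>."""
    padded = [list(z) + [1.0] for z in modal_vectors]
    rank, d_h = len(factors[0]), len(factors[0][0])
    Z = [math.prod(c) for c in itertools.product(*padded)]  # flattened outer product
    h = []
    for k in range(d_h):
        W_k = [0.0] * len(Z)
        for i in range(rank):
            rows = [factors[m][i][k] for m in range(len(padded))]
            for idx, c in enumerate(itertools.product(*rows)):
                W_k[idx] += math.prod(c)
        h.append(dot(W_k, Z))
    return h


# Hypothetical: two modalities (2-d and 1-d), rank r = 2, output size d_h = 2.
z_1, z_2 = [1.0, 2.0], [0.5]
factors = [
    [[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [[0.5, 0.5, 0.0], [1.0, 1.0, 1.0]]],  # modality 1
    [[[2.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 2.0]]],                      # modality 2
]
h_l = lmf_fuse([z_1, z_2], factors)
print(h_l)  # [6.25, 9.0]
```

Keeping the sum over i outside the element-wise product is what makes the low-rank result equal the full-tensor result, while the low-rank path never materializes Z, whose size is the product of all (d_m + 1).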

Step 106: combine the results of the early fusion and the late fusion to obtain the multimodal embedding vector. Specifically, a loss function combines the late fusion result h_l with the output h_e of the early fusion model to obtain the final multimodal embedding embF. This combines the advantages of both fusions: the output features of the early fusion are easily incorporated, while the computation of building the input tensor is avoided, reducing computational complexity.

Step 107: perform entity alignment based on the multimodal embedding vector.

In some embodiments, the multimodal embedding is learned through repeated training. Specifically, all entity embeddings are constrained with the L2 norm to adjust the embedding vectors, the parameters are initialized with the Xavier initializer, and the loss function is optimized with Adadelta to simplify computation. Besides computing the embF of all entities, the similarity of every entity pair in the bipartite graph is computed, and the pairs are ranked with the loss function L_ea, in which α and β are temperature scales and N is the number of seeds (pre-aligned entity pairs).
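A minimal sketch of two of the training details named above: Xavier (Glorot) uniform initialization and an L2 constraint on embeddings. Reading "constrained with the L2 norm" as rescaling each embedding to unit length is an assumption; an L2 penalty in the loss would be another reading. Sizes and seeds are hypothetical.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=None):
    """Xavier (Glorot) uniform init: entries from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    if rng is None:
        rng = random.Random()
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def l2_normalize(vec, eps=1e-12):
    """Rescale a vector to unit L2 norm; eps avoids division by zero for an all-zero vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


W = xavier_uniform(fan_in=8, fan_out=4, rng=random.Random(42))
emb = l2_normalize([3.0, 4.0])
print(len(W), len(W[0]), emb)
```

Unit-length embeddings also make the cosine similarity used for ranking equal to a plain dot product.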

Once the whole training process has converged, entity alignment is performed on embF with a nearest-neighbor search algorithm.
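A minimal sketch of the final alignment step, assuming cosine similarity (the measure used in the experiments) and a brute-force search over all target entities. The entity names and 2-d vectors stand in for embF and are hypothetical; large graphs would typically use an approximate nearest-neighbor index instead.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def align_nearest(source_emb, target_emb):
    """Map each source entity to the target entity with the highest cosine similarity."""
    return {s: max(target_emb, key=lambda t: cosine(u, target_emb[t]))
            for s, u in source_emb.items()}


# Hypothetical embF vectors for two small knowledge graphs.
source = {"fb:Paris": [1.0, 0.1], "fb:Berlin": [0.0, 1.0]}
target = {"db:Berlin": [0.1, 0.9], "db:Paris": [0.9, 0.0]}
alignment = align_nearest(source, target)
print(alignment)
```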

Specific experimental data for this embodiment are given below:

The experiment measures the similarity between the paired knowledge graphs of two public multimodal datasets, FB15K-DB15K and FB15K-YAGO15K, to evaluate the performance of this embodiment. This embodiment uses cosine similarity to compute the similarity between the two sides of each dataset, and uses Hits@n, MR and MRR as the metrics for evaluating all models. Hits@n is the proportion of correct entities ranked in the top n by similarity; MR is the mean rank of the correct entities; MRR is the mean reciprocal rank of the correct entities.
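A minimal sketch of the three metrics, assuming a similarity score has already been computed between each test source entity and every candidate target entity. Ties are ranked optimistically here, which is an assumption; the source does not say how ties are handled. The scores are hypothetical.

```python
def rank_of_correct(similarities, correct_index):
    """1-based rank of the correct candidate when candidates are sorted by descending similarity."""
    target = similarities[correct_index]
    return 1 + sum(1 for s in similarities if s > target)


def evaluate(all_similarities, correct_indices, ns=(1, 10)):
    """Return Hits@n for each n, the mean rank (MR) and the mean reciprocal rank (MRR)."""
    ranks = [rank_of_correct(sims, c) for sims, c in zip(all_similarities, correct_indices)]
    metrics = {f"Hits@{n}": sum(r <= n for r in ranks) / len(ranks) for n in ns}
    metrics["MR"] = sum(ranks) / len(ranks)
    metrics["MRR"] = sum(1.0 / r for r in ranks) / len(ranks)
    return metrics


# Hypothetical similarity scores: 3 test entities, 3 candidates each.
sims = [[0.9, 0.1, 0.3],
        [0.2, 0.8, 0.5],
        [0.4, 0.6, 0.7]]
metrics = evaluate(sims, correct_indices=[0, 2, 0], ns=(1, 2))
print(metrics)
```

Higher Hits@n and MRR are better, while a lower MR is better, because MR averages the rank positions themselves.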

Several types of state-of-the-art models were selected to demonstrate the performance of the framework of this embodiment (DFMKE): two typical translation-based methods, TransE and IPTransE; two simple late fusion methods, MMKG and MMEA; and two recent methods, MultiKE and EVA. For the methods evaluated on the same datasets as this embodiment, their reported results were adopted directly; for the other methods, the experiments were repeated with the same hyperparameter settings given in the original papers.

As the table above shows, this embodiment (DFMKE) achieves the highest Hits@1, Hits@10 and MRR and the lowest MR, where a lower MR is better. In other words, DFMKE achieves higher entity alignment accuracy than the other existing techniques and effectively resolves the inconsistency between multimodal knowledge representations.

Referring to FIG. 3, an embodiment of the present invention provides an entity alignment device for multimodal knowledge graphs, comprising:

at least one processor 201;

at least one memory 202 for storing at least one program;

wherein, when the at least one program is executed by the at least one processor 201, the at least one processor 201 implements the entity alignment method for multimodal knowledge graphs shown in FIG. 1.

上述方法实施例中的内容均适用于本装置实施例中，本装置实施例所具体实现的功能与上述方法实施例相同，并且达到的有益效果与上述方法实施例所达到的有益效果也相同。The content in the above-mentioned method embodiment is applicable to this device embodiment, and the specific functions realized by this device embodiment are the same as those of the above-mentioned method embodiment, and the beneficial effects achieved are also the same as those achieved by the above-mentioned method embodiment.

An embodiment of the present invention further provides a storage medium storing processor-executable instructions which, when executed by a processor, implement the entity alignment method for a multimodal knowledge graph shown in FIG. 1.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order shown in the operational diagrams. For example, depending on the functions/operations involved, two blocks shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in reverse order. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example to give a more complete understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of the operations is changed and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless stated otherwise, one or more of the described functions and/or features may be integrated into a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, given the attributes, functions and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the modules will be within the ordinary skill of an engineer. Accordingly, those skilled in the art can, using ordinary skill, implement the invention set forth in the claims without undue experimentation. It should also be understood that the particular concepts disclosed are merely illustrative and are not intended to limit the scope of the invention, which is determined by the appended claims and the full scope of their equivalents.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The logic and/or steps represented in the flowcharts or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.

More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in a computer memory.

It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gate circuits, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

In the description of this specification, references to the terms "one embodiment/example", "another embodiment/example", "some embodiments/examples" and the like mean that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

The above is a detailed description of the preferred embodiments of the present invention, but the invention is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims

An entity alignment method for a multimodal knowledge graph, characterized in that it comprises the following steps:

obtaining data of a first multimodal knowledge graph and a second multimodal knowledge graph;

extracting entities to be aligned from the first multimodal knowledge graph and the second multimodal knowledge graph, respectively;

processing multimodal data of the entities to obtain modality vectors of the entities, wherein the multimodal data comprise image data, relation data, attribute data and knowledge graph structure data, and the modality vectors comprise an image embedding vector, a relation embedding vector, an attribute embedding vector and a knowledge graph structure embedding vector;

performing early fusion on the modality vectors through a fully connected neural network model;

performing late fusion on the modality vectors through a low-rank multimodal fusion model;

combining the results of the early fusion and the late fusion to obtain a multimodal embedding vector; and

performing entity alignment based on the multimodal embedding vector.
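The claim does not state how the final alignment is computed from the multimodal embedding vectors. A common choice, assumed here, is to rank every candidate entity of the second graph by cosine similarity to an entity of the first graph and take the top candidate as its counterpart. The entity names and vector values below are invented, and `rank_candidates` is a hypothetical helper, not the patented procedure.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(src: Dict[str, List[float]], tgt: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """For each entity of the first graph, list the second graph's entities by descending similarity."""
    return {
        name: sorted(tgt, key=lambda t, v=vec: cosine(v, tgt[t]), reverse=True)
        for name, vec in src.items()
    }


# Toy multimodal embeddings (invented values) for two entities in each graph.
g1 = {"Paris@KG1": [1.0, 0.1], "Berlin@KG1": [0.0, 1.0]}
g2 = {"Berlin@KG2": [0.1, 0.9], "Paris@KG2": [0.9, 0.2]}
ranking = rank_candidates(g1, g2)
aligned = {s: cands[0] for s, cands in ranking.items()}  # top-1 candidate = predicted counterpart
print(aligned)
```

The position of the true counterpart in each ranked list is exactly the rank that Hits@k, MR and MRR are computed from; at realistic scale the pairwise loop would normally be replaced by a matrix product over normalised embeddings.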

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the step of processing the image data of the entities to obtain the image embedding vectors specifically comprises:

performing feature extraction on the acquired image data using a pre-trained ResNet model;

processing the extracted features with a first preset function to obtain the image embedding vector.
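The claim leaves the "first preset function" unspecified. The sketch below assumes, purely for illustration, an affine projection followed by tanh and L2 normalisation, applied to a pooled ResNet feature vector. The 2048 dimension matches the pooled output of a ResNet-50, but the source does not name the ResNet variant, and the feature values here are random placeholders rather than real image features; `first_preset_function` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence


def first_preset_function(feature: Sequence[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical stand-in for the 'first preset function': affine projection, tanh, then L2 normalisation."""
    projected = [math.tanh(sum(w * x for w, x in zip(row, feature)) + b) for row, b in zip(weight, bias)]
    norm = math.sqrt(sum(v * v for v in projected))
    return [v / norm for v in projected] if norm else projected


rng = random.Random(0)
FEATURE_DIM, EMBED_DIM = 2048, 8  # 2048 = pooled ResNet-50 output size; 8 is an arbitrary embedding size
# Placeholder for a pre-trained ResNet feature vector (random values, not real image features).
resnet_feature = [rng.random() for _ in range(FEATURE_DIM)]
weight = [[rng.gauss(0.0, 1.0 / math.sqrt(FEATURE_DIM)) for _ in range(FEATURE_DIM)] for _ in range(EMBED_DIM)]
bias = [0.0] * EMBED_DIM
image_embedding = first_preset_function(resnet_feature, weight, bias)
print(len(image_embedding))
```

Normalising to unit length keeps the image vector on the same scale as the other modality vectors before fusion; in a trained system the projection weights would be learned rather than random.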

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the step of processing the relation data of the entities to obtain the relation embedding vectors specifically comprises:

converting the acquired relation data into translation vectors through a TransE model;

computing the structural similarity of the translation vectors with a second preset function to obtain a logistic regression loss function;

obtaining the relation embedding vector by minimizing the logistic regression loss function until it converges.
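A minimal sketch of a TransE-style relation view trained with a logistic loss, assuming the plausibility score f(h, r, t) = -||h + r - t||₂ and the loss log(1 + exp(-y·f)) with y = +1 for observed triples and y = -1 for corrupted ones (negatives are typically produced by replacing the head or tail entity). The source does not define the "second preset function", so this scoring choice is an assumption; the vectors are toy values.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE plausibility f(h, r, t) = -||h + r - t||_2; higher means more plausible."""
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def logistic_loss(h: Vec, r: Vec, t: Vec, y: int) -> float:
    """log(1 + exp(-y * f)); y = +1 for an observed triple, -1 for a corrupted (negative) one."""
    return math.log1p(math.exp(-y * transe_score(h, r, t)))


def sgd_step(h: Vec, r: Vec, t: Vec, y: int, lr: float = 0.1) -> Tuple[Vec, Vec, Vec]:
    """One gradient-descent step on the logistic loss of a single triple."""
    d = [a + b - c for a, b, c in zip(h, r, t)]
    dist = math.sqrt(sum(x * x for x in d)) or 1e-12  # avoid division by zero
    # dL/df = -y * sigmoid(-y * f); df/dh = df/dr = -d / dist and df/dt = +d / dist, with f = -dist
    coeff = y / (1.0 + math.exp(-y * dist))  # equals y * sigmoid(-y * f)
    step = [lr * coeff * x / dist for x in d]
    new_h = [a - s for a, s in zip(h, step)]
    new_r = [b - s for b, s in zip(r, step)]
    new_t = [c + s for c, s in zip(t, step)]
    return new_h, new_r, new_t


# Toy positive triple (invented 2-d vectors): training should shrink ||h + r - t||.
h, r, t = [0.5, 0.0], [0.0, 0.5], [0.0, 0.0]
before = logistic_loss(h, r, t, y=1)
for _ in range(20):
    h, r, t = sgd_step(h, r, t, y=1)
after = logistic_loss(h, r, t, y=1)
print(before, after)
```

For a positive triple each step pulls h + r toward t, lowering the loss; for a negative triple (y = -1) the same update pushes them apart. Real training would iterate over mini-batches of positive and corrupted triples until the loss converges.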

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the step of processing the attribute data of the entities to obtain the attribute embedding vectors specifically comprises:

mapping the acquired attribute data to a low-dimensional space through a feed-forward network to obtain the attribute embedding vector.
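The claim says only that a feed-forward network maps attribute data to a low-dimensional space. The sketch below assumes that an entity's attributes are first encoded as a multi-hot vector over a fixed attribute vocabulary; the vocabulary, layer sizes and random (untrained) weights are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights scaled by 1/sqrt(n_in) and zero biases."""
    w = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def attribute_embedding(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed-forward network: ReLU on hidden layers, linear output layer (the low-dimensional embedding)."""
    h = list(x)
    for i, (w, b) in enumerate(layers):
        h = [sum(wi * xi for wi, xi in zip(row, h)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


# Hypothetical attribute vocabulary; an entity is encoded by which attributes it has (multi-hot).
VOCAB = ["birthDate", "population", "area", "elevation", "currency", "leader"]
entity_attrs = {"population", "area", "currency"}
multi_hot = [1.0 if a in entity_attrs else 0.0 for a in VOCAB]

rng = random.Random(42)
layers = [init_layer(len(VOCAB), 4, rng), init_layer(4, 2, rng)]  # 6 -> 4 -> 2 dimensions
print(attribute_embedding(multi_hot, layers))
```

Keeping the last layer linear lets the embedding take negative values, which a final ReLU would rule out; attribute values (not just keys) could be appended to the input in the same way.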

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the step of processing the knowledge graph structure data of the entities to obtain the structure embedding vectors specifically comprises:

building a semi-supervised embedding model based on a graph convolutional network;

setting relation vertices;

processing the relation vertices through the semi-supervised embedding model to obtain the structure embedding vector.
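The graph convolutional network named in the claim is commonly built from the propagation rule H' = σ(D^-1/2 (A + I) D^-1/2 H W). The sketch below implements that standard rule on a toy graph, assuming that the "relation vertices" are the graph's entity nodes linked by relation edges. The semi-supervised training on seed alignments is omitted, and the identity weights are used only so the propagation is easy to inspect.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 of an undirected graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Dense matrix product of list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix, relu: bool = True) -> Matrix:
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), or without ReLU for the output layer."""
    out = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


# Toy graph: entity 0 is linked to entities 1 and 2; entity 3 is isolated.
a_hat = normalized_adjacency(4, [(0, 1), (0, 2)])
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the propagation easy to inspect
structure_embedding = gcn_layer(a_hat, gcn_layer(a_hat, features, w1), w1, relu=False)
print(structure_embedding)
```

Two layers let each entity's structure vector absorb information from neighbours up to two hops away; the isolated entity 3 keeps its own features, while entities 1 and 2, which have identical neighbourhoods, end up with identical vectors.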

The entity alignment method for a multimodal knowledge graph according to claim 2, wherein the early fusion specifically comprises:

establishing a fully connected neural network model;

fusing all the features extracted by the ResNet model through the fully connected neural network model.
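Early fusion combines modalities before any modality-specific decision is made. A minimal sketch, assuming (the claims do not specify this) that the four modality vectors are concatenated and passed through a single fully connected layer with tanh activation; the vector sizes and random weights are arbitrary placeholders.

```python
import math
import random
from typing import List, Sequence


def early_fusion(modal_vectors: Sequence[Sequence[float]], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Concatenate the per-modality vectors, then apply one fully connected layer with tanh."""
    x = [v for vec in modal_vectors for v in vec]
    if any(len(row) != len(x) for row in weight):
        raise ValueError("each weight row must match the concatenated input size")
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weight, bias)]


rng = random.Random(7)
# Invented modality vectors: image, relation, attribute, structure (sizes chosen arbitrarily).
modalities = [[rng.uniform(-1, 1) for _ in range(n)] for n in (8, 4, 2, 2)]
in_dim, out_dim = sum(len(m) for m in modalities), 6
weight = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)] for _ in range(out_dim)]
fused_early = early_fusion(modalities, weight, [0.0] * out_dim)
print(fused_early)
```

Because every output unit sees every input dimension, this layer can learn cross-modal interactions directly, at the cost of a weight matrix that grows with the total concatenated size.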

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the late fusion specifically comprises:

simplifying the vector representation of the multimodal fusion through a low-rank multimodal fusion model;

simplifying the vector representation in a preset manner.
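The claims do not give the low-rank fusion formula or the "preset manner". As an illustration, the sketch below follows the published low-rank multimodal fusion (LMF) form of Liu et al. (2018), h = Σᵢ ⊙ₘ (Wₘ⁽ⁱ⁾ [zₘ; 1]), which avoids building the full outer-product tensor of all modalities. The learned rank-combination weights and output bias of that model are left out for brevity, and the factor values are invented so the result can be checked by hand.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def low_rank_fusion(modal_vectors: Sequence[Sequence[float]], factors: Sequence[Sequence[Matrix]]) -> List[float]:
    """Low-rank fusion: h = sum_i prod_m (W_m^(i) @ [z_m; 1]), with the product taken element-wise.

    factors[m][i] is the rank-i factor of modality m, a matrix of shape d_out x (d_m + 1).
    """
    rank, d_out = len(factors[0]), len(factors[0][0])
    fused = [0.0] * d_out
    for i in range(rank):
        prod = [1.0] * d_out
        for z, fac in zip(modal_vectors, factors):
            z1 = list(z) + [1.0]  # appended 1 keeps unimodal and lower-order terms in the product
            proj = [sum(w * x for w, x in zip(row, z1)) for row in fac[i]]
            prod = [p * q for p, q in zip(prod, proj)]
        fused = [f + p for f, p in zip(fused, prod)]
    return fused


# Two scalar modalities, 1-d output, rank 2; factor values are invented to make the result checkable.
z_a, z_b = [2.0], [3.0]
factors = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # modality a: rank-1 factor picks z_a, rank-2 factor picks the appended 1
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # modality b: same pattern
]
print(low_rank_fusion([z_a, z_b], factors))
```

Here the rank-1 term contributes 2 · 3 = 6 and the rank-2 term contributes 1 · 1 = 1, so the fused value is 7.0. The cost grows linearly with the number of modalities and the rank, rather than multiplicatively with the modality dimensions as in full tensor fusion.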

The entity alignment method for a multimodal knowledge graph according to claim 1, wherein the step of combining the early fusion and the late fusion specifically comprises:

combining the early fusion and the late fusion through collaborative training according to a preset loss function.
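The "preset loss function" for collaborative training is not defined in the claims. One plausible reading, sketched here as a hypothetical, is a joint objective that sums a margin-based alignment loss over the early-fusion view, the late-fusion view and their combination, so both branches are trained together toward placing seed-aligned entities close to each other. The convex mix in `combine`, the margin value and all vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[List[float], List[float], List[float]]  # (source entity, true counterpart, negative sample)


def l2(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combine(early: Sequence[float], late: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical combination: convex mix of equally sized early- and late-fusion vectors."""
    return [alpha * e + (1 - alpha) * lt for e, lt in zip(early, late)]


def margin_loss(src: Sequence[float], pos: Sequence[float], neg: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss: the seed counterpart should be closer than the negative sample by at least `margin`."""
    return max(0.0, margin + l2(src, pos) - l2(src, neg))


def joint_loss(views: Dict[str, Triple], margin: float = 1.0, alpha: float = 0.5) -> float:
    """Sum of the per-view losses and the loss on the combined embedding."""
    early, late = views["early"], views["late"]
    combined = tuple(combine(e, lt, alpha) for e, lt in zip(early, late))
    return (margin_loss(*early, margin=margin)
            + margin_loss(*late, margin=margin)
            + margin_loss(*combined, margin=margin))


# Invented embeddings: the early view already separates the pair well, the late view does not.
views = {
    "early": ([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]),
    "late": ([0.0, 0.0], [1.0, 0.0], [0.0, 0.0]),
}
print(joint_loss(views))
```

In this toy case only the late-fusion view is penalised (its loss is 2.0, and the other two terms are zero), so gradients from the joint objective would adjust the late branch while leaving the already well-separated early branch largely unchanged.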

An entity alignment apparatus for a multimodal knowledge graph, characterized in that it comprises:

at least one processor;

at least one memory for storing at least one program;

wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the entity alignment method for a multimodal knowledge graph according to any one of claims 1-8.

A computer-readable storage medium storing processor-executable instructions, wherein the processor-executable instructions, when executed by a processor, implement the entity alignment method for a multimodal knowledge graph according to any one of claims 1-8.