
CN111160557A - Knowledge representation learning method based on double-agent reinforcement learning path search - Google Patents

Knowledge representation learning method based on double-agent reinforcement learning path search

Info

Publication number
CN111160557A
CN111160557A
Authority
CN
China
Prior art keywords
agent
entity
hop
relation
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911376444.4A
Other languages
Chinese (zh)
Other versions
CN111160557B (en)
Inventor
陈岭
崔军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201911376444.4A
Publication of CN111160557A
Application granted
Publication of CN111160557B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a knowledge representation learning method based on dual-agent reinforcement learning path search, comprising the following steps: (1) deleting redundant relations in a knowledge base, and pre-training vectors of entities and relations; (2) a path searcher searching, according to the vectors of entities and relations, several multi-hop relations between the entity pair of each triple in the knowledge base, using a relation agent and an entity agent that consider state and historical information to make decisions during the search; (3) learning the vectors of entities and relations according to the single-hop relations between entities and the multi-hop relations obtained by the search, and using an attention mechanism to weigh each multi-hop relation. This knowledge representation learning method can introduce high-quality multi-hop relations.

Description

Knowledge representation learning method based on double-agent reinforcement learning path search
Technical Field
The invention relates to the field of knowledge representation learning, in particular to a knowledge representation learning method based on double-agent reinforcement learning path search.
Background
Currently, knowledge bases containing large amounts of structured knowledge are important components of many applications, such as knowledge reasoning and question answering. In recent years, many enterprises and organizations have therefore constructed large knowledge bases, such as Freebase, DBpedia and YAGO. Knowledge in a knowledge base is represented in the form of triples (head, relation, tail), abbreviated as (h, r, t). Although existing knowledge bases already contain a great deal of knowledge, many relations between entities are still missing, so knowledge base completion has become a research hotspot.
To complete a knowledge base, the knowledge base must first be modeled. Symbolic representation is a knowledge base modeling method that treats the entities and relations in a knowledge base as symbols. It suffers from low computational efficiency and data sparsity, and cannot scale to today's ever-growing knowledge bases. Knowledge representation is another knowledge base modeling method: it embeds the entities and relations of the knowledge base into a low-dimensional vector space and maps their semantics into the corresponding vectors, thereby alleviating the problems of low computational efficiency and data sparsity, so it can be applied to large knowledge bases.
Translation-based models are a typical class of knowledge representation learning methods that treat the relation in a triple as a translation operation between the head and tail entities. When the relation between two entities is missing, the corresponding relation vector can be computed as the difference between the vector of the tail entity and the vector of the head entity, thereby completing the relation. Most existing translation-based models only consider single-hop relations, not multi-hop relations, i.e. relation paths formed by multiple relations between entities.
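To make the translation idea concrete, the sketch below infers a missing relation by nearest-neighbor search over the difference t - h. This illustrates the general principle only, not the patented method; all names are hypothetical.

```python
import numpy as np

def infer_relation(h_vec, t_vec, relation_embeddings):
    """Under the translation assumption h + r ≈ t, the most plausible
    relation is the one whose vector lies closest to t - h."""
    target = t_vec - h_vec
    names = list(relation_embeddings)
    vecs = np.stack([relation_embeddings[n] for n in names])
    dists = np.linalg.norm(vecs - target, axis=1)  # L2 distance to t - h
    return names[int(np.argmin(dists))]
```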
Some translation-based models do consider multi-hop relations, but they have the following problems:
(1) the multi-hop relations are obtained by traversal, which is time-consuming and yields low-quality multi-hop relations;
(2) the weight assigned to each multi-hop relation is based on its static characteristics and cannot be learned by the model during training.
In recent years, some work has introduced reinforcement learning into knowledge base completion, obtaining high-quality multi-hop relations by building a reinforcement learning model. However, these models have the following problems:
(1) the information considered while searching the multi-hop relations is not comprehensive enough: only the selection of relations is considered, while the selection of entities is ignored;
(2) the reward is set too simply and does not comprehensively consider multiple factors.
Disclosure of Invention
The technical problem to be solved by the invention is how to search and introduce high-quality multi-hop relationship in the knowledge representation learning process.
In order to solve the above problems, the present invention provides a knowledge representation learning method based on dual-agent reinforcement learning path search, comprising the following steps:
(1) deleting redundant relations in the knowledge base, and pre-training vectors of entities and relations;
(2) the path searcher searches a plurality of multi-hop relations between entity pairs of each triple in the knowledge base according to the vectors of the entities and the relations, and a relation agent and an entity agent which consider state and historical information are used for making decisions in the searching process;
(3) learning the vectors of entities and relations according to the single-hop relations between entities and the multi-hop relations obtained by the search, and using an attention mechanism to weigh each multi-hop relation.
Compared with the prior art, the invention has the beneficial effects that:
Compared with the traditional method of obtaining multi-hop relations by traversal, the multi-hop relations searched by the path searcher are of higher quality, and the weights given to them are more reasonable. Compared with existing reinforcement-learning-based methods, two agents make the decisions, so state and historical information can be used more comprehensively, and the reward in the model is set more reasonably. The method is mainly applied to knowledge base completion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overall flowchart of a knowledge representation learning method based on a dual-agent reinforcement learning path search according to an embodiment of the present invention;
FIG. 2 is a flow chart of data preprocessing provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a path search according to an embodiment of the present invention;
FIG. 4 is a flow chart of knowledge representation learning provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is an overall flowchart of a knowledge representation learning method based on a dual-agent reinforcement learning path search according to an embodiment of the present invention. Referring to fig. 1, the embodiment provides a knowledge representation learning method based on a dual-agent reinforcement learning path search, which includes three stages of data preprocessing, path search and knowledge representation learning.
Data preprocessing stage
In the data preprocessing stage, redundant relations in the knowledge base are deleted and vectors of entities and relations are pre-trained, as shown in fig. 2. The specific process is as follows:
step 1-1: and inputting a knowledge base KB and deleting the redundancy relation.
Knowledge in the knowledge base KB is represented in the form of triples (h, r, t), where h denotes the head entity, r the relation and t the tail entity. h and t belong to an entity set E, and r belongs to a relation set R; a triple (h, r, t) reflects that the relation r holds between the entities h and t. Redundant relations in the knowledge base KB are deleted to obtain the processed knowledge base.
Step 1-2: vectors of entities and relationships in the knowledge base KB are pre-trained using an existing translation-based model (e.g., TransE).
The path searcher needs to utilize vectors of entities and relationships, and therefore the vectors of entities and relationships in the knowledge base KB are pre-trained using a translation-based model.
Taking TransE as an example: TransE learns a vector for each entity and relation in the knowledge base, and in a triplet (h, r, t), the vectors h, r and t corresponding to the head entity h, the relation r and the tail entity t should satisfy:
h+r=t (1)
The vectors of entities and relations are learned with this objective.
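A minimal sketch of such pre-training, assuming the common margin-based TransE training with random tail corruption (the hyperparameters and the update rule below are standard choices, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def transe_pretrain(triples, n_entities, n_relations, dim=50,
                    margin=1.0, lr=0.01, epochs=100):
    """Pre-train entity/relation vectors so that h + r ≈ t (Equation (1))."""
    E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity vectors
    R = rng.normal(scale=0.1, size=(n_relations, dim))  # relation vectors
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.integers(n_entities)            # corrupt the tail
            pos = E[h] + R[r] - E[t]
            neg = E[h] + R[r] - E[t_neg]
            # hinge loss: max(0, margin + ||pos|| - ||neg||)
            if margin + np.linalg.norm(pos) - np.linalg.norm(neg) > 0:
                g = pos / (np.linalg.norm(pos) + 1e-9)   # gradient of ||pos||
                gn = neg / (np.linalg.norm(neg) + 1e-9)  # gradient of ||neg||
                E[h] -= lr * (g - gn)
                R[r] -= lr * (g - gn)
                E[t] += lr * g
                E[t_neg] -= lr * gn
    return E, R
```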
Path search phase
The path search stage searches several multi-hop relations between the entity pair of each triple in the knowledge base according to the vectors of entities and relations, and passes the multi-hop relations that finally reach the tail entity on to the knowledge representation learning stage, as shown in fig. 3. The specific flow is as follows:
step 2-1: the triples in the knowledge base KB are divided into batches.
The invention trains the path searcher in batches: the triples in KB are randomly divided into batches according to a predefined batch size.
Step 2-2: taking one batch, searching the multi-hop relationship between the entity pairs of each triple in the batch through a path searcher.
The path searcher comprises a relation agent and an entity agent. Starting from the head entity of the given triple, the relation agent computes a probability distribution over all relations of the current entity and selects one relation; the entity agent then computes a probability distribution over all tail entities corresponding to the current entity and the selected relation and selects one entity. This process continues until the tail entity of the given triple is reached or the maximum number of steps is reached.
The path searcher is based on a reinforcement learning model and consists of two agents, called the relation agent and the entity agent. The process of searching a multi-hop relation between (h, t) of (h, r, t) is as follows: starting from the head entity h, at step t the relation agent selects one relation r_t from all relations of the current entity e_t, and the entity agent then selects one entity from all tail entities corresponding to e_t and r_t; this process continues until the tail entity t is reached or the number of steps reaches a predetermined maximum.
The environment of the path searcher can be viewed as a Markov decision process, represented by a four-tuple (S, A, T, R), where S represents a set of states, A represents a set of actions, T represents a transition, and R represents a reward.
At step t, the state of the relation agent is denoted S_rel,t = (e_t, r, t), where e_t is the vector representation of the current entity e_t, r is the vector representation of the relation r in the triple, and t is the vector representation of the tail entity t in the triple; the state of the entity agent is denoted S_ent,t = (e_t, r, t, r_t), where r_t is the vector representation of the relation r_t selected by the relation agent.
At step t, the action set of the relation agent consists of all relations of the current entity e_t, denoted A_rel,t = {r | (e_t, r, e) ∈ KB}; the action set of the entity agent consists of all tail entities corresponding to the current entity e_t and the relation r_t selected by the relation agent, denoted A_ent,t = {e | (e_t, r_t, e) ∈ KB}.
At step t, the state of the relation agent changes from (e_t, r, t) to (e_{t+1}, r, t), and its transition is denoted T_rel((e_t, r, t), r_t) = (e_{t+1}, r, t); the state of the entity agent changes from (e_t, r, t, r_t) to (e_{t+1}, r, t, r_{t+1}), and its transition is denoted T_ent((e_t, r, t, r_t), e_{t+1}) = (e_{t+1}, r, t, r_{t+1}).
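Under these definitions, one search episode can be sketched as follows. `kb` is assumed to map an entity to its outgoing (relation, tail) pairs, and `relation_policy`/`entity_policy` stand for the agents' decision networks described below; all of these names are illustrative, not from the patent.

```python
def search_path(h, t, kb, relation_policy, entity_policy, max_steps=3):
    """One search episode: the relation agent and the entity agent act in
    turn until the tail entity t is reached or the step limit is hit."""
    e_cur, path = h, []
    for _ in range(max_steps):
        out_edges = kb.get(e_cur, [])                    # (relation, tail) pairs
        if not out_edges:
            break
        relations = list({r for r, _ in out_edges})      # action set A_rel,t
        r_sel = relation_policy.sample(e_cur, relations)  # relation agent acts
        tails = [e for r, e in out_edges if r == r_sel]  # action set A_ent,t
        e_cur = entity_policy.sample(e_cur, r_sel, tails)  # entity agent acts
        path.append(r_sel)
        if e_cur == t:                                   # tail entity reached
            return path
    return None  # episode failed to reach t within the step limit
```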
The reward of a multi-hop relation p = (r_1, r_2, ..., r_n) consists of two parts: overall accuracy and path weight. The overall accuracy R_g(p) is given by Equation (2) and the path weight R_w(p) by Equation (3) (both rendered as images in the original), where W is a weight matrix and p is the vector representation of the multi-hop relation p, given by Equation (4) (also an image). The total reward of the multi-hop relation p is then:

R(p) = R_g(p) + R_w(p)    (5)
Both the relation agent and the entity agent compute the probability distribution over actions through a decision network. The input of the decision network comprises historical information and state. The historical information at step t is represented by a vector d_t, which the invention obtains by training an RNN:

d_t = RNN(d_{t-1}, [e_{t-1}, r_{t-1}])    (6)

where [,] denotes the concatenation of two vectors. The inputs of the decision networks of the relation agent and the entity agent are X_rel,t = [d_t, S_rel,t] and X_ent,t = [d_t, S_ent,t], respectively.
The decision network is a fully connected neural network with two hidden layers, each followed by a ReLU nonlinear layer.
The outputs of the decision networks of the relation agent and the entity agent are the probability distributions over the actions in A_rel,t and A_ent,t:

P_rel(X_rel,t) = softmax(A_rel,t O_rel,t)    (7)
P_ent(X_ent,t) = softmax(A_ent,t O_ent,t)    (8)

where A_rel,t and A_ent,t denote the matrices formed by the vectors of all the relations and entities in A_rel,t and A_ent,t, respectively, and O_rel,t and O_ent,t denote the outputs of the second ReLU layer of the corresponding decision networks. When selecting a relation or an entity, the relation agent and the entity agent sample randomly from the computed probability distribution.
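A sketch of one such decision network in PyTorch, assuming a GRU cell for the unspecified RNN of Equation (6) and illustrative dimensions (the patent does not fix these details):

```python
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Decision network of one agent: an RNN tracks history d_t (Eq. 6);
    a two-hidden-layer MLP with ReLU maps [d_t, S_t] to a vector O_t;
    scores are dot products with the candidate action embeddings (Eqs. 7-8)."""
    def __init__(self, state_dim, hist_dim, hidden_dim, action_dim):
        super().__init__()
        self.rnn = nn.GRUCell(2 * action_dim, hist_dim)  # input: [e_{t-1}, r_{t-1}]
        self.mlp = nn.Sequential(
            nn.Linear(hist_dim + state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.ReLU(),
        )

    def forward(self, d_prev, prev_step, state, action_matrix):
        d_t = self.rnn(prev_step, d_prev)        # Eq. (6): update history
        x_t = torch.cat([d_t, state], dim=-1)    # X_t = [d_t, S_t]
        o_t = self.mlp(x_t)                      # output of second ReLU layer
        logits = action_matrix @ o_t             # A_t O_t over candidate actions
        return torch.softmax(logits, dim=-1), d_t
```

The agent then samples an action index from the returned distribution, which matches the random selection described above.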
For each triplet in a batch, several multi-hop relationships are searched using the path searcher described above.
Step 2-3: and updating the parameters and the weight matrixes of the relation agents and the entity agents by utilizing the multi-hop relations searched in the batch.
The relevant parameters of the path search stage are updated by maximizing the expected cumulative reward; these parameters include the parameters of the two decision networks, the parameters of the RNN that computes the historical information, and the weight matrix W. The expected cumulative reward J(θ) is defined by Equation (9) (rendered as an image in the original), where R(S_t, a_t) denotes the reward for state S_t and action a_t, and P(a | X_t; θ) denotes the probability of action a given input X_t. The invention updates the parameters by a Monte Carlo gradient; the gradient of J(θ) is given by Equation (10) (also an image).
For a searched multi-hop relation p, each R(S_t, a_t) in the process of searching that relation is set equal to R(p) when the parameters are updated.
Step 2-4: step 2-2 and step 2-3 are repeated until all batches in KB are processed.
Repeat step 2-2 and step 2-3 to search, batch by batch, the multi-hop relations between the entity pairs of all triples in KB and to update the relevant parameters of the path search stage.
Knowledge representation learning phase
In the knowledge representation learning stage, single-hop relations and multi-hop relations are used simultaneously to learn the vectors of entities and relations, as shown in fig. 4. The specific process is as follows:
step 3-1: the knowledge base divides the triples in KB into batches.
The invention trains the knowledge representation learning model in batches and randomly divides the triples in KB into several batches according to a predefined batch size.
Step 3-2: taking one batch, and calculating the weight of all multi-hop relations of each triple.
Given a triple (h, r, t) whose set of multi-hop relations is {p_1, ..., p_K}, the weight of a multi-hop relation p_i is defined by Equation (11) (rendered as an image in the original), where:

η_i = tanh(W p_i)    (12)

and W is a weight matrix, the same matrix as in the reward of the path search stage.
The multi-hop relations used here are those searched in the path search stage that finally reach the tail entity.
The weights of all multi-hop relations of each triple in the batch are computed according to the above formulas.
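Equation (11) is rendered as an image in the original, so the normalization below is an assumption: a softmax over scalar scores obtained by summing the components of η_i = tanh(W p_i). Only Equation (12) itself is taken from the text.

```python
import numpy as np

def multihop_weights(paths, W):
    """Attention weights over multi-hop relations p_1..p_K.
    eta_i = tanh(W p_i) follows Eq. (12); the reduction to a scalar
    score and the softmax normalization are assumed, since Eq. (11)
    appears only as an image in the original."""
    scores = np.array([np.sum(np.tanh(W @ p)) for p in paths])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()
```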
Step 3-3: and calculating energy functions and losses of all the triples in the batch by using the single-hop relation and the multi-hop relation, and updating the vectors and the weight matrix of the entities and the relations.
Given a triple (h, r, t) whose set of multi-hop relations is {p_1, ..., p_K}, the energy function of the knowledge representation learning phase is defined by Equation (13) (rendered as an image in the original). From the energy function, the loss function of the knowledge representation learning phase is defined by Equation (14) (also an image), where γ is a predefined margin, [·]_+ denotes the maximum of 0 and the value inside the brackets, T is the positive sample set, i.e. the set of all triples in the knowledge base, and T^- is the negative sample set, expressed as:

T^- = {(h, r', t) | r' ∈ R}, (h, r, t) ∈ T    (15)
the negative examples are obtained by replacing the relationship r in the triples with another relationship r' in the knowledge base.
The losses of all triples in the batch are calculated, and the vectors of entities and relations and the weight matrix W are updated by minimizing the loss.
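Treating the energy function of Equation (13) as a black box (it appears only as an image in the original), the margin loss of Equations (14)-(15) can be sketched as follows; iterating over all replacement relations r' is one literal reading of Equation (15):

```python
def kr_loss(triples, relations, energy, gamma=1.0):
    """Margin loss over positives T and negatives T^- (Eq. 15):
    sum over the batch of [gamma + E(h, r, t) - E(h, r', t)]_+,
    where a lower energy indicates a more plausible triple."""
    total = 0.0
    for h, r, t in triples:              # positive samples (h, r, t) ∈ T
        for r_neg in relations:          # negatives replace r with r'
            if r_neg == r:
                continue
            total += max(0.0, gamma + energy(h, r, t) - energy(h, r_neg, t))
    return total
```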
Step 3-4: step 3-2 and step 3-3 are repeated until all batches in KB are processed.
Repeat step 3-2 and step 3-3 to compute, batch by batch, the weights, energy functions and loss functions of the multi-hop relations of all triples in KB, and to update the relevant parameters of the knowledge representation learning stage.
Step 3-5: if the preset maximum number of iterations is reached, output the vectors of entities and relations; otherwise, go to step 2-2.
The path search stage and the knowledge representation learning stage are performed iteratively until the preset maximum number of iterations is reached, and the vectors of entities and relations are output.
In the knowledge representation learning method based on dual-agent reinforcement learning path search, the path searcher uses the entity and relation vectors trained by the knowledge representation learning model to search high-quality multi-hop relations between entities, and uses two agents to make decisions during the search, so that state and historical information can be considered more comprehensively; the knowledge representation learning model learns the vectors of entities and relations using both single-hop relations and the searched multi-hop relations, and uses an attention mechanism to weigh each multi-hop relation. The reward in the path searcher and the weights in the knowledge representation model share some parameters, so these parameters not only measure the weights of the multi-hop relations but also guide the path searcher towards multi-hop relations that are more useful for the knowledge representation learning process.
The above-described embodiments are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit the invention; any modifications, additions, equivalents and the like made within the principles of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. A knowledge representation learning method based on dual-agent reinforcement learning path search, comprising the following steps:
(1) deleting redundant relations in a knowledge base, and pre-training vectors of entities and relations;
(2) a path searcher searching, according to the vectors of entities and relations, several multi-hop relations between the entity pair of each triple in the knowledge base, using a relation agent and an entity agent that consider state and historical information to make decisions during the search;
(3) learning the vectors of entities and relations according to the single-hop relations between entities and the multi-hop relations obtained by the search, and using an attention mechanism to weigh each multi-hop relation.

2. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 1, wherein in step (1) a translation-based model is used to pre-train the vectors of entities and relations in the knowledge base; in a triple (h, r, t), the vectors h, r and t corresponding to the head entity h, the relation r and the tail entity t should satisfy:

h + r = t

and the vectors of entities and relations are learned with this objective.

3. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 1, wherein in step (2) the path searcher is based on a reinforcement learning model and consists of two agents, called the relation agent and the entity agent; the process of searching a multi-hop relation between (h, t) of (h, r, t) is as follows: starting from the head entity h, at step t the relation agent selects one relation r_t from all relations of the current entity e_t, and the entity agent selects one entity from all tail entities corresponding to e_t and r_t; this process continues until the tail entity t is reached or the number of steps reaches a predetermined maximum.

4. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 3, wherein the environment of the path searcher is regarded as a Markov decision process represented by a quadruple (S, A, T, R), where S denotes the state set, A the action set, T the transitions and R the reward;
at step t, the state of the relation agent is denoted S_rel,t = (e_t, r, t), where e_t is the vector representation of the current entity e_t, r the vector representation of the relation r in the triple and t the vector representation of the tail entity t in the triple; the state of the entity agent is denoted S_ent,t = (e_t, r, t, r_t), where r_t is the vector representation of the relation r_t selected by the relation agent;
at step t, the action set of the relation agent consists of all relations of the current entity e_t, denoted A_rel,t = {r | (e_t, r, e) ∈ KB}; the action set of the entity agent consists of all tail entities corresponding to the current entity e_t and the relation r_t selected by the relation agent, denoted A_ent,t = {e | (e_t, r_t, e) ∈ KB};
at step t, the state of the relation agent changes from (e_t, r, t) to (e_{t+1}, r, t), its transition being denoted T_rel((e_t, r, t), r_t) = (e_{t+1}, r, t); the state of the entity agent changes from (e_t, r, t, r_t) to (e_{t+1}, r, t, r_{t+1}), its transition being denoted T_ent((e_t, r, t, r_t), e_{t+1}) = (e_{t+1}, r, t, r_{t+1});
the reward of a multi-hop relation p = (r_1, r_2, ..., r_n) consists of an overall accuracy part and a path weight part, the overall accuracy R_g(p) being given by Equation (2) and the path weight R_w(p) by Equation (3) (both rendered as images in the original), where W is a weight matrix and p is the vector representation of the multi-hop relation p given by Equation (4) (also an image); the total reward of the multi-hop relation p is then:

R(p) = R_g(p) + R_w(p)    (5)

5. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 4, wherein both the relation agent and the entity agent compute the probability distribution over actions through a decision network whose input comprises historical information and state; the historical information at step t is represented by a vector d_t obtained by training an RNN:

d_t = RNN(d_{t-1}, [e_{t-1}, r_{t-1}])    (6)

where [,] denotes the concatenation of two vectors, and the inputs of the decision networks of the relation agent and the entity agent are X_rel,t = [d_t, S_rel,t] and X_ent,t = [d_t, S_ent,t], respectively;
the decision network is a fully connected neural network with two hidden layers, each followed by a ReLU nonlinear layer;
the outputs of the decision networks of the relation agent and the entity agent are the probability distributions over the actions in A_rel,t and A_ent,t:

P_rel(X_rel,t) = softmax(A_rel,t O_rel,t)    (7)
P_ent(X_ent,t) = softmax(A_ent,t O_ent,t)    (8)

where A_rel,t and A_ent,t denote the matrices formed by the vectors of all the relations and entities in A_rel,t and A_ent,t, respectively, and O_rel,t and O_ent,t denote the outputs of the second ReLU layer of the corresponding decision networks; when selecting a relation or an entity, the relation agent and the entity agent sample randomly from the computed probability distribution.

6. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 5, wherein the searched multi-hop relations are used to update the parameters and weight matrix of the relation agent and the entity agent, specifically:
the relevant parameters of the path search stage are updated by maximizing the expected cumulative reward, these parameters including the parameters of the two decision networks, the parameters of the RNN that computes the historical information, and the weight matrix W; the expected cumulative reward J(θ) is defined by Equation (9) (rendered as an image in the original), where R(S_t, a_t) denotes the reward for state S_t and action a_t and P(a | X_t; θ) the probability of action a given input X_t; the parameters are updated by a Monte Carlo gradient, the gradient of J(θ) being given by Equation (10) (also an image);
for a searched multi-hop relation p, each R(S_t, a_t) in the process of searching that relation is set equal to R(p) when the parameters are updated.

7. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 5, wherein step (3) specifically comprises:
(3-1) for each triple, computing the weights of all its multi-hop relations;
(3-2) computing the energy functions and losses of all triples in the batch using the single-hop and multi-hop relations, and updating the vectors of entities and relations and the weight matrix.

8. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 7, wherein in step (3-1), given a triple (h, r, t) whose set of multi-hop relations is {p_1, ..., p_K}, the weight of a multi-hop relation p_i is defined by Equation (11) (rendered as an image in the original), where:

η_i = tanh(W p_i)    (12)

W being the weight matrix.

9. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 7, wherein in step (3-2), given a triple (h, r, t) whose set of multi-hop relations is {p_1, ..., p_K}, the energy function of the knowledge representation learning phase is defined by Equation (13) and, from it, the loss function by Equation (14) (both rendered as images in the original), where γ is a predefined margin, [·]_+ denotes the maximum of 0 and the value inside the brackets, T is the positive sample set, i.e. the set of all triples in the knowledge base, and T^- is the negative sample set, expressed as:

T^- = {(h, r', t) | r' ∈ R}, (h, r, t) ∈ T    (15)

the negative samples being obtained by replacing the relation r in a triple with another relation r' from the knowledge base; the losses of all triples in the batch are computed, and the vectors of entities and relations and the weight matrix W are updated by minimizing the loss.
CN201911376444.4A 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search Expired - Fee Related CN111160557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911376444.4A CN111160557B (en) 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search


Publications (2)

Publication Number Publication Date
CN111160557A (en) 2020-05-15
CN111160557B CN111160557B (en) 2023-04-18

Family

ID=70558468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911376444.4A Expired - Fee Related CN111160557B (en) 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search

Country Status (1)

Country Link
CN (1) CN111160557B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL2028258A (en) * 2020-09-03 2021-08-17 Shandong Artificial Intelligence Inst Attention-lstm-based method for knowledge reasoning of reinforcement learning agent
CN114290323A (en) * 2020-10-07 2022-04-08 罗伯特·博世有限公司 Apparatus and method for controlling a robotic device
CN114328493A (en) * 2021-12-30 2022-04-12 杭州电子科技大学 Biomedical knowledge base completion method and device based on multi-hop path


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024476A1 (en) * 2012-01-05 2017-01-26 Yewno, Inc. Information network with linked information nodes
CN103530457A (en) * 2013-10-10 2014-01-22 南京邮电大学 Modeling and construction method of complex relation chain of internet of things based on multiple tuples
CN107885760A (en) * 2016-12-21 2018-04-06 桂林电子科技大学 It is a kind of to represent learning method based on a variety of semantic knowledge mappings
CN109885627A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A method and device for neural network training relationship between entities
CN110046262A (en) * 2019-06-10 2019-07-23 南京擎盾信息科技有限公司 A kind of Context Reasoning method based on law expert's knowledge base

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xing Tang et al.: "Knowledge representation learning with entity descriptions, hierarchical types, and textual relations" *
Wang Zihan et al.: "Knowledge graph completion algorithm based on entity similarity information" (基于实体相似度信息的知识图谱补全算法) *


Also Published As

Publication number Publication date
CN111160557B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Baymurzina et al. A review of neural architecture search
JP7381814B2 (en) Automatic compression method and platform for pre-trained language models for multitasking
Zhan et al. A fast kriging-assisted evolutionary algorithm based on incremental learning
CN109992670B (en) Atlas completion method based on knowledge atlas neighborhood structure
CN115186097B (en) Interactive recommendation method based on knowledge graph and reinforcement learning
CN112052936B (en) Reinforcement learning exploration method and device based on generative adversarial mechanism
Wang et al. Evolutionary extreme learning machine ensembles with size control
WO2022126683A1 (en) Method and platform for automatically compressing multi-task-oriented pre-training language model
CN113792924A (en) A single job shop scheduling method based on Deep Q-network deep reinforcement learning
Chen et al. Rlpath: a knowledge graph link prediction method using reinforcement learning based attentive relation path searching and representation learning
CN109241291A (en) Knowledge mapping optimal path inquiry system and method based on deeply study
CN114564596A (en) Cross-language knowledge graph link prediction method based on graph attention machine mechanism
Farasat et al. ARO: A new model-free optimization algorithm inspired from asexual reproduction
CN111160557B (en) Knowledge representation learning method based on double-agent reinforcement learning path search
CN108764577A (en) Online time series prediction technique based on dynamic fuzzy Cognitive Map
CN115169439A (en) A method and system for effective wave height prediction based on sequence-to-sequence network
CN119378698A (en) A knowledge reasoning method based on adversarial reinforcement learning
Xue et al. An effective surrogate-assisted rank method for evolutionary neural architecture search
CN110009048B (en) A method and device for constructing a neural network model
CN111832817A (en) Small-world echo state network time series prediction method based on MCP penalty function
CN116718198B (en) Unmanned aerial vehicle cluster path planning method and system based on time sequence knowledge graph
CN113095466A (en) Algorithm of satisfiability model theoretical solver based on meta-learning model
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
CN114596473B (en) A network embedding pre-training method based on graph neural network hierarchical loss function
CN110852435A (en) Neural evolution calculation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20230418