
CN111160557B - Knowledge representation learning method based on double-agent reinforcement learning path search - Google Patents


Info

Publication number
CN111160557B
Authority
CN
China
Prior art keywords
entity
agent
hop
relation
relations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201911376444.4A
Other languages
Chinese (zh)
Other versions
CN111160557A (en)
Inventor
陈岭
崔军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201911376444.4A priority Critical patent/CN111160557B/en
Publication of CN111160557A publication Critical patent/CN111160557A/en
Application granted granted Critical
Publication of CN111160557B publication Critical patent/CN111160557B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Devices For Executing Special Programs (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The invention discloses a knowledge representation learning method based on dual-agent reinforcement learning path search, comprising the following steps: (1) deleting redundant relations in the knowledge base and pre-training vectors of entities and relations; (2) a path searcher searches, according to the vectors of entities and relations, several multi-hop relations between the entity pair of each triple in the knowledge base, using a relation agent and an entity agent that consider state and historical information to make decisions during the search; (3) the vectors of entities and relations are learned from both the single-hop relations between entities and the multi-hop relations obtained by the search, with an attention mechanism used to weigh each multi-hop relation. This knowledge representation learning method can introduce high-quality multi-hop relations.


Description

Knowledge representation learning method based on double-agent reinforcement learning path search
Technical Field
The invention relates to the field of knowledge representation learning, in particular to a knowledge representation learning method based on double-agent reinforcement learning path search.
Background
Currently, knowledge bases containing large amounts of structured knowledge are important components of many applications, such as knowledge reasoning and question answering. Thus, in recent years, large knowledge bases have been constructed by many businesses and organizations, such as Freebase, DBpedia, and YAGO. Knowledge in a knowledge base is represented in the form of triples (head, relation, tail), abbreviated as (h, r, t). Although existing knowledge bases already contain a large amount of knowledge, many relations between entities are still missing, so knowledge base completion has become a research hotspot.
To achieve knowledge base completion, the knowledge base must first be modeled. Symbolic representation is a knowledge base modeling method that treats the entities and relations in the knowledge base as symbols. It suffers from low computational efficiency and data sparsity, and cannot scale to today's ever-growing knowledge bases. Knowledge representation learning is another modeling method: it embeds the entities and relations of the knowledge base into a low-dimensional vector space, mapping their semantics into corresponding vectors. This alleviates the problems of low computational efficiency and data sparsity, so the method can be applied to large knowledge bases.
Translation-based models are a typical class of knowledge representation learning methods that treat relationships in a triplet as translation operations between head and tail entities. When the relation between the entities is missing, the corresponding relation vector can be calculated through the difference between the vector of the tail entity and the vector of the head entity, so that the relation is completed. Most of the existing translation-based models only consider single-hop relationships, but not multi-hop relationships, i.e. relationship paths formed by multiple relationships between entities.
Some translation-based models consider multi-hop relationships, but have the following problems:
(1) The multi-hop relations are obtained by traversal, which is time-consuming and yields low-quality multi-hop relations;
(2) The weight assigned to each multi-hop relation is based on its static features, so the model cannot learn these weights during training.
In recent years, some work of introducing reinforcement learning into knowledge base completion is emerging, and a high-quality multi-hop relation is obtained by constructing a reinforcement learning model. However, these models have the following problems:
(1) The information considered during the multi-hop relation search is not comprehensive enough: only the selection of relations is considered, while the selection of entities is ignored;
(2) The reward is set too simply and does not take multiple factors into account.
Disclosure of Invention
The technical problem to be solved by the invention is how to search for and introduce high-quality multi-hop relations during knowledge representation learning.
In order to solve the above problems, the present invention provides a knowledge representation learning method based on dual-agent reinforcement learning path search, comprising the following steps:
(1) Deleting redundant relations in the knowledge base, and pre-training vectors of entities and relations;
(2) The path searcher searches a plurality of multi-hop relations between entity pairs of each triple in the knowledge base according to the vectors of the entities and the relations, and a relation agent and an entity agent which consider state and historical information are used for making decisions in the searching process;
(3) And learning vectors of the entities and the relations according to the multi-hop relations among the entities and the multi-hop relations obtained by searching, and measuring the weight of each multi-hop relation by using an attention mechanism.
Compared with the prior art, the invention has the following beneficial effects:
compared with the traditional method of obtaining multi-hop relations by traversal, the multi-hop relations found by the path searcher are of higher quality, and the weights assigned to them are more reasonable; compared with existing reinforcement-learning-based methods, decisions are made by two agents, so state and historical information can be used more comprehensively, and the reward in the model is set more reasonably. The method is mainly applied to knowledge base completion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is an overall flowchart of a knowledge representation learning method based on a dual-agent reinforcement learning path search according to an embodiment of the present invention;
FIG. 2 is a flow chart of data preprocessing provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a path search according to an embodiment of the present invention;
FIG. 4 is a flow chart of knowledge representation learning provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is an overall flowchart of a knowledge representation learning method based on a dual-agent reinforcement learning path search according to an embodiment of the present invention. Referring to fig. 1, the embodiment provides a knowledge representation learning method based on a dual-agent reinforcement learning path search, which includes three stages of data preprocessing, path search and knowledge representation learning.
Data preprocessing stage
In the data preprocessing stage, redundant relations in the knowledge base are deleted and the vectors of entities and relations are pre-trained, as shown in Fig. 2. The specific process is as follows:
step 1-1: and inputting a knowledge base KB and deleting the redundancy relation.
Knowledge in the knowledge base KB is represented in the form of triples (h, r, t), where h represents the head entity, r represents the relationship, and t represents the tail entity. h and t belong to an entity set E, R belongs to a relation set R, the triple (h, R, t) reflects the existence of the relation R between the entity h and the entity t, and the redundant relation in the knowledge base KB is deleted to obtain the processed knowledge base.
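The patent does not state how redundancy is detected, so the sketch below uses one plausible criterion, treating a relation as redundant when it is an exact inverse duplicate of another relation; the function name and the rule itself are assumptions, not the invention's actual procedure.

```python
def remove_inverse_duplicates(triples):
    """Drop relations whose triple set is exactly the reverse of another relation's."""
    by_rel = {}
    for h, r, t in triples:
        by_rel.setdefault(r, set()).add((h, t))
    redundant = set()
    rels = sorted(by_rel)
    for i, r1 in enumerate(rels):
        for r2 in rels[i + 1:]:
            # r2 is redundant if it mirrors r1 in the reverse direction
            if r2 not in redundant and by_rel[r2] == {(t, h) for h, t in by_rel[r1]}:
                redundant.add(r2)
    return [(h, r, t) for h, r, t in triples if r not in redundant]

kb = [("a", "parent_of", "b"), ("b", "child_of", "a"),
      ("c", "parent_of", "d"), ("d", "child_of", "c"),
      ("a", "likes", "c")]
cleaned = remove_inverse_duplicates(kb)  # keeps one direction of the mirrored pair
```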
Step 1-2: vectors of entities and relationships in the knowledge base KB are pre-trained using an existing translation-based model (e.g., transE).
The path searcher needs to utilize vectors of entities and relationships, and therefore the vectors of entities and relationships in the knowledge base KB are pre-trained using a translation-based model.
Taking TransE as an example: TransE learns a vector for each entity and relation in the knowledge base. For a triple (h, r, t), the vectors h, r and t corresponding to the head entity h, the relation r and the tail entity t should satisfy:
h + r = t (1)
The vectors of entities and relations are learned with this as the training objective.
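A minimal sketch of this pre-training objective: gradient descent pushes h + r toward t for a single observed triple. The dimensions, learning rate, and iteration count are illustrative, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
entities, relations = ["h1", "t1"], ["r1"]
dim, lr = 8, 0.1
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def score(h, r, t):
    """TransE energy ||h + r - t||; lower means the triple fits better."""
    return float(np.linalg.norm(E[h] + R[r] - E[t]))

for _ in range(200):  # SGD on the single triple (h1, r1, t1)
    g = E["h1"] + R["r1"] - E["t1"]   # gradient direction of ||h + r - t||^2
    E["h1"] -= lr * g
    R["r1"] -= lr * g
    E["t1"] += lr * g
```

After training, score("h1", "r1", "t1") is close to zero, i.e. h + r ≈ t as equation (1) requires.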
Path search phase
The path search stage searches several multi-hop relations between the entity pair of each triple in the knowledge base according to the vectors of entities and relations, and passes the multi-hop relations that finally reach the tail entity to the knowledge representation learning stage, as shown in Fig. 3. The specific flow is as follows:
step 2-1: the triples in the knowledge base KB are divided into batches.
The invention trains the path searcher in batches. The triples in KB are randomly divided into batches according to a predefined batch size.
Step 2-2: taking one batch, searching the multi-hop relationship between the entity pairs of each triple in the batch through a path searcher.
The path searcher comprises a relation agent and an entity agent. Starting from the head entity of the given triple, the relation agent computes a probability distribution over all relations of the current entity and selects one relation; the entity agent then computes a probability distribution over all tail entities corresponding to the current entity and the selected relation and selects one entity. This process continues until the tail entity of the given triple is reached or the maximum number of steps is reached.
The path searcher is based on a reinforcement learning model and consists of two agents, called the relation agent and the entity agent. The process of searching for a multi-hop relation between (h, t) of a triple (h, r, t) is as follows: starting from the head entity h, at step t the relation agent selects one relation r_t from all relations of the current entity e_t; the entity agent then selects one entity from all tail entities corresponding to e_t and r_t. This process continues until the tail entity t is reached or the number of steps reaches the predefined maximum.
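The rollout described above can be sketched as follows. Here both agents pick uniformly at random, whereas in the invention they sample from decision-network distributions; the toy knowledge base is hypothetical.

```python
import random

# Toy KB: (entity, relation) -> list of tail entities
kb = {("h", "r1"): ["e1"], ("e1", "r2"): ["t"], ("h", "r3"): ["e2"]}
out_rels = {}
for (e, r), tails in kb.items():
    out_rels.setdefault(e, []).append(r)

def search_path(head, tail, max_steps=3):
    """Alternate relation choice and entity choice until tail or step limit."""
    e, path = head, []
    for _ in range(max_steps):
        if e not in out_rels:
            break
        r = random.choice(out_rels[e])   # relation agent's decision
        e = random.choice(kb[(e, r)])    # entity agent's decision
        path.append(r)
        if e == tail:
            return path                  # multi-hop relation found
    return None                          # rollout failed to reach the tail

random.seed(1)
paths = [p for p in (search_path("h", "t") for _ in range(50)) if p]
```

Successful rollouts return the relation path ["r1", "r2"], the only multi-hop relation from h to t in this toy KB.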
The environment of the path searcher can be viewed as a Markov decision process, represented by a four-tuple (S, A, T, R), where S represents a set of states, A represents a set of actions, T represents a transition, and R represents a reward.
At step t, the state of the relation agent is denoted S_rel,t = (e_t, r, t), where e_t is the vector representation of the current entity e_t, r is the vector representation of the relation r in the triple, and t is the vector representation of the tail entity t in the triple; the state of the entity agent is denoted S_ent,t = (e_t, r, t, r_t), where r_t is the vector representation of the relation r_t selected by the relation agent.
At step t, the action set of the relation agent consists of all relations of the current entity e_t, denoted A_rel,t = {r | (e_t, r, e) ∈ KB}; the action set of the entity agent consists of all tail entities corresponding to the current entity e_t and the relation r_t selected by the relation agent, denoted A_ent,t = {e | (e_t, r_t, e) ∈ KB}.
At step t, the state of the relation agent changes from (e_t, r, t) to (e_{t+1}, r, t), and its transition is denoted T_rel((e_t, r, t), r_t) = (e_{t+1}, r, t); the state of the entity agent changes from (e_t, r, t, r_t) to (e_{t+1}, r, t, r_{t+1}), and its transition is denoted T_ent((e_t, r, t, r_t), e_{t+1}) = (e_{t+1}, r, t, r_{t+1}).
The reward of a multi-hop relation p = (r_1, r_2, …, r_n) consists of two parts, an overall accuracy and a path weight. The overall accuracy R_g(p) is expressed as:
[Equation (2), shown only as an image in the original document]
The path weight R_w(p) is expressed as:
R_w(p) = tanh(Wp) (3)
where W is the weight matrix and p is the vector representation of the multi-hop relation p:
p = r_1 + r_2 + … + r_n (4)
The total reward of the multi-hop relation p is then expressed as:
R(p) = R_g(p) + R_w(p) (5)
Both the relation agent and the entity agent compute the probability distribution for performing each action through a decision network. The input of the decision network contains two parts, historical information and state. At step t, the historical information is represented by a vector d_t, which the invention obtains by training an RNN:
d_t = RNN(d_{t-1}, [e_{t-1}, r_{t-1}]) (6)
where [·,·] denotes the concatenation of two vectors. The inputs of the decision networks corresponding to the relation agent and the entity agent are denoted X_rel,t = [d_t, S_rel,t] and X_ent,t = [d_t, S_ent,t], respectively.
The decision network is a fully-connected neural network with two hidden layers, each followed by a ReLU nonlinearity.
The outputs of the decision networks corresponding to the relation agent and the entity agent are the probability distributions over the actions in A_rel,t and A_ent,t:
P_rel(X_rel,t) = softmax(A_rel,t O_rel,t) (7)
P_ent(X_ent,t) = softmax(A_ent,t O_ent,t) (8)
where A_rel,t and A_ent,t denote the matrices formed by the vectors of all relations and entities in A_rel,t and A_ent,t, respectively, and O_rel,t and O_ent,t denote the outputs of the second ReLU layer of the corresponding decision networks. When selecting an entity or a relation, the relation agent and the entity agent sample randomly according to the computed probability distribution.
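A sketch of such a decision network with randomly initialized weights: the history vector and state are concatenated, passed through two ReLU hidden layers, and the result is scored against each candidate action's vector before a softmax. All sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_dim, s_dim, hid, act_dim, n_actions = 4, 6, 8, 8, 5

W1 = rng.normal(size=(d_dim + s_dim, hid))
W2 = rng.normal(size=(hid, act_dim))
A = rng.normal(size=(n_actions, act_dim))    # vectors of the candidate actions

def action_probs(d_t, s_t):
    x = np.concatenate([d_t, s_t])           # X_t = [d_t, S_t]
    o = np.maximum(x @ W1, 0)                # hidden layer 1 + ReLU
    o = np.maximum(o @ W2, 0)                # hidden layer 2 + ReLU -> O_t
    logits = A @ o                           # score each candidate action
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax(A O_t), as in (7)/(8)

dist = action_probs(rng.normal(size=d_dim), rng.normal(size=s_dim))
```

The agent then samples its action from `dist`, matching the random selection described above.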
For each triplet in a batch, several multi-hop relationships are searched using the path searcher described above.
Step 2-3: and updating parameters and weight matrixes of the relational agents and the entity agents by utilizing the multi-hop relations searched in the batch.
The relevant parameters of the path search stage are updated by maximizing the expected cumulative reward; these parameters comprise the parameters of the two decision networks, the parameters of the RNN that computes the historical information, and the weight matrix W. The expected cumulative reward is defined as:
J(θ) = E[ Σ_t R(S_t, a_t) ] (9)
where R(S_t, a_t) denotes the reward of state S_t and action a_t, and P(a|X_t; θ) denotes the probability of action a given input X_t. The invention updates the parameters by Monte Carlo gradient estimation; the gradient of J(θ) is expressed as:
∇_θ J(θ) ≈ Σ_t R(S_t, a_t) ∇_θ log P(a_t|X_t; θ) (10)
For a searched multi-hop relation p, when updating the parameters, every R(S_t, a_t) in the process of searching that multi-hop relation is set equal to R(p).
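A toy REINFORCE-style update consistent with this scheme: each action of a rollout is credited the same reward R(p), and the parameters move along R(p)·∇log P(a|θ). The three-action softmax policy is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(3,))        # toy policy: logits over 3 actions

def probs(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def reinforce_step(actions, reward, theta, lr=0.5):
    """theta += lr * sum_t R(p) * grad log P(a_t | theta)."""
    for a in actions:
        pr = probs(theta)
        grad_log = -pr               # d/d_theta log softmax(theta)[a] = e_a - pr
        grad_log[a] += 1.0
        theta = theta + lr * reward * grad_log
    return theta

before = probs(theta)[0]
theta = reinforce_step([0, 0], reward=1.0, theta=theta)  # rollout chose action 0 twice
after = probs(theta)[0]
```

With a positive reward, the probability of the chosen action increases, which is exactly the direction equation (10) prescribes.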
Step 2-4: step 2-2 and step 2-3 are repeated until all batches in KB are processed.
And repeating the step 2-2 and the step 2-3, searching the multi-hop relationship between the entity pairs of all the triples in the KB in batch, and updating the related parameters in the path searching stage.
Knowledge representation learning phase
In the knowledge representation learning stage, the single-hop relations and the multi-hop relations are used simultaneously to learn the vectors of entities and relations, as shown in Fig. 4. The specific process is as follows:
step 3-1: the knowledge base divides the triples in KB into batches.
The invention trains a knowledge representation learning model in a batch processing mode, and randomly divides triples in the KB into a plurality of batches according to a preset defined batch size.
Step 3-2: taking one batch, and calculating the weight of all multi-hop relations of each triple.
Given a triple (h, r, t), let the set of all its multi-hop relations be {p_1, …, p_K}. The weight of a multi-hop relation p_i is defined as:
w_i = exp(η_i) / Σ_{k=1}^{K} exp(η_k) (11)
where:
η_i = tanh(W p_i) (12)
and W is the weight matrix, the same matrix as in the reward of the path search stage.
The multi-hop relations used here are those searched in the path search stage that finally reach the tail entity.
The weights of all multi-hop relations of each triple in the batch are calculated according to the above formulas.
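A sketch of the weight computation: η_i = tanh(W p_i) follows equation (12), while the softmax normalization is an assumption on my part, since equation (11) appears only as an image in the original document.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, K = 8, 3
W = rng.normal(size=(dim,))                       # weight matrix (a row vector here)
paths = [rng.normal(size=dim) for _ in range(K)]  # vectors p_1..p_K of the multi-hop relations

eta = np.array([np.tanh(W @ p) for p in paths])   # eq. (12): eta_i = tanh(W p_i)
alpha = np.exp(eta) / np.exp(eta).sum()           # assumed eq. (11): softmax over eta
```

The resulting `alpha` is a proper attention distribution over the K multi-hop relations.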
Step 3-3: and calculating energy functions and losses of all the triples in the batch by using the single-hop relation and the multi-hop relation, and updating the vectors and the weight matrix of the entities and the relations.
Given a triple (h, r, t) with multi-hop relation set {p_1, …, p_K}, the energy function of the knowledge representation learning stage is defined as:
[Equation (13), shown only as an image in the original document]
From the energy function, the loss function of the knowledge representation learning stage is defined as:
L = Σ_{(h,r,t)∈T} Σ_{(h,r′,t)∈T⁻} [γ + E(h, r, t) − E(h, r′, t)]_+ (14)
where γ is a predefined margin, [·]_+ denotes max(0, ·), T is the positive sample set, i.e., the set of all triples in the knowledge base, and T⁻ is the negative sample set, expressed as:
T⁻ = {(h, r′, t) | r′ ∈ R}, (h, r, t) ∈ T (15)
Negative samples are obtained by replacing the relation r in a triple with another relation r′ from the knowledge base.
The loss of all triples in the batch is calculated, and the vectors of entities and relations and the weight matrix W are updated by minimizing the loss.
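A minimal sketch of a margin-based ranking loss of this form, using only the single-hop TransE energy ||h + r − t|| for illustration (the full energy function combining multi-hop terms is not reproduced here, as it survives only as an image):

```python
import numpy as np

def energy(h, r, t):
    """Single-hop TransE energy ||h + r - t||."""
    return float(np.linalg.norm(h + r - t))

h, t = np.array([1.0, 0.0]), np.array([1.0, 1.0])
r_pos = np.array([0.0, 1.0])   # fits: h + r_pos == t, energy 0
r_neg = np.array([5.0, 5.0])   # corrupted relation, large energy
gamma = 1.0

# [gamma + E(positive) - E(negative)]_+  — zero when the margin is satisfied
loss = max(0.0, gamma + energy(h, r_pos, t) - energy(h, r_neg, t))
# Swapping the roles violates the margin and produces a positive loss
bad_loss = max(0.0, gamma + energy(h, r_neg, t) - energy(h, r_pos, t))
```

Minimizing this loss pushes positive triples at least γ lower in energy than their relation-corrupted negatives.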
Step 3-4: step 3-2 and step 3-3 are repeated until all batches in KB are processed.
And repeating the step 3-2 and the step 3-3, calculating the weight, the energy function and the loss function of the multi-hop relation corresponding to all the triples in the KB in batches, and updating the related parameters of the knowledge representation learning stage.
Step 3-5: if the iteration reaches the preset maximum times, outputting the vectors of the entities and the relations; otherwise, go to step 2-2.
And iteratively performing a path searching stage and a knowledge representation learning stage until iteration reaches a preset maximum number, and outputting vectors of the entities and the relations.
In the knowledge representation learning method based on dual-agent reinforcement learning path search, the path searcher uses the entity and relation vectors trained by the knowledge representation learning model to search for high-quality multi-hop relations between entities, and makes decisions with two agents during the search, so that state and historical information are considered more comprehensively. The knowledge representation learning model learns the vectors of entities and relations from both the single-hop relations and the searched multi-hop relations, and uses an attention mechanism to weigh each multi-hop relation. The reward in the path searcher and the weights in the knowledge representation model share part of their parameters, so these parameters not only measure the weights of the multi-hop relations but also guide the path searcher toward multi-hop relations that are more useful for the knowledge representation learning process.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (4)

1. A knowledge representation learning method based on double-agent reinforcement learning path search comprises the following steps:
(1) Deleting redundant relations in the knowledge base DBpedia and pre-training vectors of entities and relations, comprising: pre-training the vectors of entities and relations in the knowledge base DBpedia with a translation-based model, wherein for a triple (h, r, t) the vectors h, r and t corresponding to the head entity h, the relation r and the tail entity t satisfy:
h + r = t (1)
and learning the vectors of entities and relations with this as the objective;
(2) The path searcher searches a plurality of multi-hop relations between entity pairs of each triple in a knowledge base DBpedia according to the vectors of the entities and the relations, and uses a relation agent and an entity agent considering state and historical information to make decisions in the searching process, wherein the decision making process comprises the following steps:
the path searcher is based on a reinforcement learning model and consists of two agents, namely a relation agent and an entity agent, and the process of searching for a multi-hop relation between (h, t) of a triple (h, r, t) is as follows: starting from the head entity h, at step t the relation agent selects one relation r_t from all relations of the current entity e_t; the entity agent then selects one entity from all tail entities corresponding to e_t and r_t; this process continues until the tail entity t is reached or the number of steps reaches the predefined maximum;
the environment of the path searcher is regarded as a Markov decision process and is represented by a quadruplet (S, A, T, R), wherein S represents a state set, A represents an action set, T represents transition, and R represents reward;
at step t, the state of the relation agent is denoted S_rel,t = (e_t, r, t), where e_t is the vector representation of the current entity e_t, r is the vector representation of the relation r in the triple, and t is the vector representation of the tail entity t in the triple; the state of the entity agent is denoted S_ent,t = (e_t, r, t, r_t), where r_t is the vector representation of the relation r_t selected by the relation agent;
at step t, the action set of the relation agent consists of all relations of the current entity e_t, denoted A_rel,t = {r | (e_t, r, e) ∈ DBpedia}; the action set of the entity agent consists of all tail entities corresponding to the current entity e_t and the relation r_t selected by the relation agent, denoted A_ent,t = {e | (e_t, r_t, e) ∈ DBpedia};
at step t, the state of the relation agent changes from (e_t, r, t) to (e_{t+1}, r, t), and its transition is denoted T_rel((e_t, r, t), r_t) = (e_{t+1}, r, t); the state of the entity agent changes from (e_t, r, t, r_t) to (e_{t+1}, r, t, r_{t+1}), and its transition is denoted T_ent((e_t, r, t, r_t), e_{t+1}) = (e_{t+1}, r, t, r_{t+1});
the reward of a multi-hop relation p = (r_1, r_2, …, r_n) consists of an overall accuracy and a path weight, wherein the overall accuracy R_g(p) is expressed as:
[Equation (2), shown only as an image in the original document]
the path weight R_w(p) is expressed as:
R_w(p) = tanh(Wp) (3)
where W is the weight matrix and p is the vector representation of the multi-hop relation p:
p = r_1 + r_2 + … + r_n (4)
the total reward of the multi-hop relation p is then expressed as:
R(p) = R_g(p) + R_w(p) (5)
the relation agent and the entity agent compute the probability distribution for performing each action through a decision network; the input of the decision network contains two parts, historical information and state; at step t the historical information is represented by a vector d_t, obtained by training an RNN:
d_t = RNN(d_{t-1}, [e_{t-1}, r_{t-1}]) (6)
where [·,·] denotes the concatenation of two vectors; the inputs of the decision networks corresponding to the relation agent and the entity agent are denoted X_rel,t = [d_t, S_rel,t] and X_ent,t = [d_t, S_ent,t], respectively;
The decision network is structurally a fully-connected neural network comprising two hidden layers, and a ReLU nonlinear layer is connected behind each hidden layer;
the outputs of the decision networks corresponding to the relation agent and the entity agent are the probability distributions over the actions in A_rel,t and A_ent,t:
P_rel(X_rel,t) = softmax(A_rel,t O_rel,t) (7)
P_ent(X_ent,t) = softmax(A_ent,t O_ent,t) (8)
where A_rel,t and A_ent,t denote the matrices formed by the vectors of all relations and entities in A_rel,t and A_ent,t, respectively, and O_rel,t and O_ent,t denote the outputs of the second ReLU layer of the corresponding decision networks; when selecting an entity or a relation, the relation agent and the entity agent sample randomly according to the computed probability distribution;
updating parameters and weight matrixes of the relational agents and the entity agents by utilizing the searched multi-hop relations, and the specific process comprises the following steps:
the relevant parameters of the path search stage are updated by maximizing the expected cumulative reward, which parameters comprise the parameters of the two decision networks, the parameters of the RNN that computes the historical information, and the weight matrix W; the expected cumulative reward is defined as:
J(θ) = E[ Σ_t R(S_t, a_t) ] (9)
where R(S_t, a_t) denotes the reward of state S_t and action a_t, and P(a|X_t; θ) denotes the probability of action a given input X_t; the parameters are updated by Monte Carlo gradient estimation, and the gradient of J(θ) is expressed as:
∇_θ J(θ) ≈ Σ_t R(S_t, a_t) ∇_θ log P(a_t|X_t; θ) (10)
for a searched multi-hop relation p, when updating the parameters, every R(S_t, a_t) in the process of searching that multi-hop relation is set equal to R(p);
(3) Learning vectors of the entities and the relations according to the multi-hop relations among the entities and the multi-hop relations obtained by searching, and measuring the weight of each multi-hop relation by using an attention mechanism;
(4) And completing the knowledge base DBpedia based on the learned entity and the relation vector.
2. The knowledge representation learning method based on the dual-agent reinforcement learning path search as claimed in claim 1, wherein the specific process of step (3) is:
(3-1) calculating the weight of all multi-hop relations of each triple;
and (3-2) calculating energy functions and losses of all the triples in the batch by using the single-hop relation and the multi-hop relation, and updating vectors and weight matrixes of the entities and the relations.
3. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 2, wherein in step (3-1), given a triple (h, r, t) with multi-hop relation set {p_1, …, p_K}, the weight of a multi-hop relation p_i is defined as:
w_i = exp(η_i) / Σ_{k=1}^{K} exp(η_k) (11)
where:
η_i = tanh(W p_i) (12)
and W is the weight matrix.
4. The knowledge representation learning method based on dual-agent reinforcement learning path search according to claim 2, wherein in step (3-2), given a triple (h, r, t) with multi-hop relation set {p_1, …, p_K}, the energy function of the knowledge representation learning stage is defined as:
[Equation (13), shown only as an image in the original document]
from the energy function, the loss function is defined as:
L = Σ_{(h,r,t)∈T} Σ_{(h,r′,t)∈T⁻} [γ + E(h, r, t) − E(h, r′, t)]_+ (14)
where γ is a predefined margin, [·]_+ denotes max(0, ·), T is the positive sample set, i.e., the set of all triples in the knowledge base DBpedia, and T⁻ is the negative sample set, expressed as:
T⁻ = {(h, r′, t) | r′ ∈ R}, (h, r, t) ∈ T (15)
the negative sample is obtained by replacing the relation r in the triple with another relation r' in the knowledge base DBpedia;
the losses of all the triples in the batch are calculated, and the vectors of the entities and the relations and the weight matrix W are updated by minimizing the loss.
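Step (3-2) can be sketched as follows, assuming a TransE-style base energy ||h + r - t|| combined with an attention-weighted path term; the exact composite form of energy (13) is an assumption, since the formula is an image in the source, while the hinge loss [γ + E⁺ - E⁻]_+ follows the text of the claim:

```python
import numpy as np

def energy(h, r, t, paths, alpha):
    """Assumed energy: ||h + r - t|| plus the attention-weighted
    distances between each multi-hop relation p_i and r."""
    base = np.linalg.norm(h + r - t)
    path_term = sum(a * np.linalg.norm(p - r) for a, p in zip(alpha, paths))
    return base + path_term

def margin_loss(e_pos, e_neg, gamma):
    """Hinge loss [gamma + E(h,r,t) - E(h,r',t)]_+ for one
    positive/negative triple pair; [x]_+ = max(0, x)."""
    return max(0.0, gamma + e_pos - e_neg)
```

Negative triples are built by swapping r for another relation r′ as in (15); summing `margin_loss` over the batch and minimizing it drives the updates to the entity and relation vectors and the weight matrix W.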
CN201911376444.4A 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search Expired - Fee Related CN111160557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911376444.4A CN111160557B (en) 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search


Publications (2)

Publication Number Publication Date
CN111160557A CN111160557A (en) 2020-05-15
CN111160557B true CN111160557B (en) 2023-04-18

Family

ID=70558468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911376444.4A Expired - Fee Related CN111160557B (en) 2019-12-27 2019-12-27 Knowledge representation learning method based on double-agent reinforcement learning path search

Country Status (1)

Country Link
CN (1) CN111160557B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116069A (en) * 2020-09-03 2020-12-22 山东省人工智能研究院 A Reinforcement Learning Agent Knowledge Reasoning Method Based on Attention-LSTM
DE102020212658A1 (en) * 2020-10-07 2022-04-07 Robert Bosch Gesellschaft mit beschränkter Haftung Apparatus and method for controlling a robotic device
CN114328493A (en) * 2021-12-30 2022-04-12 杭州电子科技大学 Biomedical knowledge base completion method and device based on multi-hop path

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103530457A (en) * 2013-10-10 2014-01-22 南京邮电大学 Modeling and construction method of complex relation chain of internet of things based on multiple tuples
CN107885760A (en) * 2016-12-21 2018-04-06 桂林电子科技大学 It is a kind of to represent learning method based on a variety of semantic knowledge mappings
CN109885627A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A method and device for neural network training relationship between entities
CN110046262A (en) * 2019-06-10 2019-07-23 南京擎盾信息科技有限公司 A kind of Context Reasoning method based on law expert's knowledge base

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
GB201200158D0 (en) * 2012-01-05 2012-02-15 Rugerro Gramatica Dott Information network with linked information


Non-Patent Citations (2)

Title
Xing Tang et al. Knowledge representation learning with entity descriptions, hierarchical types, and textual relations. Information Processing & Management, 2019, Vol. 56, No. 3, pp. 809-822. *
Wang Zihan et al. Knowledge graph completion algorithm based on entity similarity information. Journal of Computer Applications, 2018, Vol. 38, No. 11, pp. 3089-3093. *

Also Published As

Publication number Publication date
CN111160557A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
JP7381814B2 (en) Automatic compression method and platform for pre-trained language models for multitasking
Jiang et al. Efficient network architecture search via multiobjective particle swarm optimization based on decomposition
CN115186097B (en) Interactive recommendation method based on knowledge graph and reinforcement learning
CN113792924A (en) A single job shop scheduling method based on Deep Q-network deep reinforcement learning
CN111160557B (en) Knowledge representation learning method based on double-agent reinforcement learning path search
WO2022126683A1 (en) Method and platform for automatically compressing multi-task-oriented pre-training language model
CN109241291A (en) Knowledge mapping optimal path inquiry system and method based on deeply study
Chen et al. Rlpath: a knowledge graph link prediction method using reinforcement learning based attentive relation path searching and representation learning
CN114564596A (en) Cross-language knowledge graph link prediction method based on graph attention machine mechanism
Farasat et al. ARO: A new model-free optimization algorithm inspired from asexual reproduction
CN110909172B (en) Knowledge representation learning method based on entity distance
CN114817571B (en) Dynamic knowledge graph-based achievement quoted quantity prediction method, medium and equipment
CN113132232A (en) Energy route optimization method
CN107122825A (en) A kind of activation primitive generation method of neural network model
CN115221779A (en) Meta-graph-based feature learning method for heterogeneous network of industrial production relations
CN120069040A (en) Knowledge graph completion method based on graph neural network learning adaptive propagation
CN116992942A (en) Natural language model optimization method, device, natural language model, equipment and medium
Xue et al. An effective surrogate-assisted rank method for evolutionary neural architecture search
CN116718198B (en) Unmanned aerial vehicle cluster path planning method and system based on time sequence knowledge graph
CN105512755A (en) Decomposition-based multi-objective distribution estimation optimization method
CN115081323A (en) Method for solving multi-objective constrained optimization problem and storage medium thereof
CN113095466A (en) Algorithm of satisfiability model theoretical solver based on meta-learning model
CN114596473B (en) A network embedding pre-training method based on graph neural network hierarchical loss function
Fan et al. PPPNE: Personalized proximity preserved network embedding
CN119719381A (en) A method for generating relational data using knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230418