WO2021257195A1

WO2021257195A1 - Topic graph-based comment generation

Info

Publication number: WO2021257195A1
Application number: PCT/US2021/030749
Authority: WO
Inventors: Qing Zhou; Jianyong Wang; Peng Chen; Qiang Li; Tian Wei; Zhiyang Su; Ting Sun
Original assignee: Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2020-06-18
Filing date: 2021-05-05
Publication date: 2021-12-23
Anticipated expiration: 2022-12-18
Also published as: CN111753050A; CN111753050B

Abstract

The present disclosure provides methods and apparatuses for topic graph-based comment generation. At least one topic item may be identified from a target document. A topic graph may be established with the at least one topic item. A topic graph representation of the topic graph may be generated based at least on structure information of the topic graph. A comment to the target document may be generated based at least on the topic graph representation.

Description

TOPIC GRAPH-BASED COMMENT GENERATION

BACKGROUND

[0001] In recent years, Artificial Intelligence (AI) techniques have developed rapidly and have been widely applied. Automatic content creation with the AI techniques has become a research hotspot. For example, it has been proposed to apply the AI techniques for automatically generating comments to documents. Herein, a document may broadly refer to various text content, e.g., news, article, novel, book, product introduction, service experience description, social media text content, etc. The AI techniques may simulate human’s thinking approaches and expression habits, to analyze a document and generate a corresponding comment for expressing views and positions.

SUMMARY

[0002] This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0003] Embodiments of the present disclosure propose methods and apparatuses for topic graph-based comment generation. At least one topic item may be identified from a target document. A topic graph may be established with the at least one topic item. A topic graph representation of the topic graph may be generated based at least on structure information of the topic graph. A comment to the target document may be generated based at least on the topic graph representation.

[0004] It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The disclosed aspects will hereinafter be described in conjunction with the appended drawings that are provided to illustrate and not to limit the disclosed aspects. [0006] FIG.l illustrates an exemplary process of establishing a candidate topic item set according to an embodiment.

[0007] FIG.2 illustrates an exemplary process of generating a topic item representation according to an embodiment.

[0008] FIG.3 illustrates an exemplary process for topic graph-based comment generation according to an embodiment.

[0009] FIG.4 illustrates an example of establishing a topic graph according to an embodiment.

[0010] FIG.5 illustrates an exemplary topic graph-based comment generating system according to an embodiment.

[0011] FIG.6 illustrates an exemplary decoder unit according to an embodiment.

[0012] FIG.7 illustrates a flowchart of an exemplary method for topic graph-based comment generation according to an embodiment.

[0013] FIG.8 illustrates an exemplary apparatus for topic graph-based comment generation according to an embodiment.

[0014] FIG.9 illustrates an exemplary apparatus for topic graph-based comment generation according to an embodiment.

DETAILED DESCRIPTION

[0015] The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.

[0016] Existing AI-based automatic comment generation techniques usually generate a comment to a document merely based on information recited in the document itself. For example, in the case of applying a sequence-to-sequence (seq2seq) model for comment generation, the model may adopt a document as an input, and generate a comment merely based on the document.

[0017] Embodiments of the present disclosure propose to apply a topic graph mined from a document for comment generation for the document. A topic graph is a structural representation of topic-related information in a document. Through adopting a topic graph, a comment generating process may better consider structural relation of key content in a document, so that a generated comment has higher accuracy, stronger logicality, etc. A representation of a topic graph may be generated based at least on structure information of the topic graph, wherein the structure information reflects structural relation among key content such as topics in a document. Moreover, a representation of a topic graph may be generated further based on rich topic-related information beyond a target document. Thus, in the case that a comment generating model or system generates a comment based at least on a topic graph representation, the generated comment will have richer diversity, more substantial elaborations, etc. Herein, a topic graph representation may refer to a characteristic representation of information in a topic graph in a latent space. Moreover, herein, the term "representation" may broadly refer to an embedding representation, a vector representation, a matrix representation, etc. in a latent space.

[0018] In an aspect, during an offline mining stage, the embodiments of the present disclosure may prepare a candidate topic item set in advance and generate a topic item representation of each candidate topic item. A topic item may correspond to a topic, and may be used for establishing a topic graph. Each topic item may be, e.g., a bigram comprising an entity and topic words. A large number of entities may be previously mined, and topic words may be extracted from a large number of reference documents. A plurality of candidate topic items is constituted through relevance calculation between entities and topic words. Moreover, a topic item representation of each candidate topic item may be generated based at least on a plurality of relevant documents associated with the candidate topic item. A topic item representation may refer to a characteristic representation of information in a topic item in a latent space.

[0019] In an aspect, during an online application stage, for a target document for which a comment is to be generated, the embodiments of the present disclosure may identify topic items from the target document with a candidate topic item set, and establish a topic graph with the identified topic items. In the process of generating a topic graph representation, at least structure information of the topic graph may be considered, so that the topic graph representation is capable of better reflecting structural relation among topics in the document. Moreover, in the process of generating the topic graph representation, a topic item representation of each topic item generated based on rich topic-related information may also be adopted, so that the topic graph representation not only reflects information contained in the target document, but also is capable of reflecting richer relevant information. In a comment generating model, a comment may be generated by merging a target document representation and the topic graph representation, so that the comment is generated at least under the influence of the topic graph.

[0020] Through the topic graph-based comment generation according to the embodiments of the present disclosure, comments with higher-quality may be generated for a target document, thereby significantly improving the performance of an automatic comment generating system and effectively improving user experience. [0021] FIG.l illustrates an exemplary process 100 of establishing a candidate topic item set according to an embodiment. The process 100 may be performed for preparing a candidate topic item set in advance in an offline mining stage.

[0022] At 110, a list of entities may be obtained, the list comprising a large number of words representing entities. For example, a large number of words representing entities may be extracted from various data sources through any known data mining technique. Herein, an entity may broadly comprise a personal name, a company name, a brand, a program name, etc. The data sources from which entity words are extracted may comprise various content providing platforms, e.g., Wikipedia, Baidu Baike, news websites, etc. Each entity in the entity list may comprise one or more words. The embodiments of the present disclosure are not limited to any specific approach of obtaining the entity list.

[0023] At 120, a plurality of reference documents may be obtained. The reference documents may be used as original data from which topic words are extracted. The reference documents may comprise various documents collected from the Internet, e.g., news, article, etc.

[0024] At 130, a large number of topic words may be extracted from the reference documents. In an implementation, all n-gram words may be extracted from titles of the reference documents as topic words.

[0025] At 140, a relevance value between each entity obtained at 110 and each topic word extracted at 130 may be calculated, the relevance value reflecting a relevance degree between the entity and the topic word. In an implementation, the relevance value may be a mutual information value (PMI). For example, a PML between an entity z and a topic word z may be calculated as:

PMI - = lo_g ^P(entity ^ topic word i) Equation P(entity i)P( topic word i)

(1) wherein /’(entity z) denotes a frequency that the entity z occurs in an obtained reference document, /’(topic word z) denotes a frequency that the topic word z occurs in the obtained reference document, /’(entity z, topic word z) denotes a frequency that the entity z and the topic word z co-occur in the obtained reference document. The entity z and the topic word z may be combined into a bigram <entity z, topic word z>, thereby the calculated PML is a relevance value corresponding to the bigram. Through the calculation at 140, a large number of bigrams and their respective relevance values may be obtained.

[0026] At 150, a plurality of <entity, topic words> bigrams highest-ranked by relevance may be selected as a plurality of candidate topic items. For example, bigrams with relevance values above a predetermined relevance threshold may be selected as candidate topic items. Table 1 shows an exemplary list of selected candidate topic items.

Table 1 [0027] As shown in Table 1, the exemplary bigram <Michael Jordan, sports> has the highest relevance value of "7.65", and the exemplary bigram <Nike, sponsor> has the second highest relevance value of "7.50", and so on. Assuming that a relevance threshold is "4.50", Table 1 may comprise a large number of bigrams with relevance values higher than the relevance threshold. It should be understood that all the entities, topic words, and PMI values in Table 1 are exemplary, and in actual application scenarios, any other bigrams may be selected at 150.

[0028] At 160, a candidate topic item set may be formed with the candidate topic items selected at 150. The candidate topic item set will comprise a plurality of candidate topic items, and each candidate topic item may correspond to a topic and comprise an entity and topic words with higher relevance. Taking the candidate topic item <Michael Jordan, sports> as an example, since Michael Jordan is a person frequently appearing in the field of sports, and he is often mentioned in various sports-related events, therefore, the topic based on the combination of "Michael Jordan" and "sports" will be representative. [0029] It should be understood that the process of establishing a candidate topic item set discussed above in conjunction with the process 100 is only exemplary, and any form of modification may be made to the process 100 according to specific application requirements and designs.

[0030] FIG.2 illustrates an exemplary process 200 of generating a topic item representation according to an embodiment. The process 200 is a continuation of the process 100 in FIG.l, for further determining a topic item representation of each candidate topic item in the candidate topic item set. A topic item representation may also be referred to as a topic item vector representation, a topic item embedding representation, etc. It is assumed that the process 200 will determine a topic item representation of a candidate topic item i 202. The candidate topic item i 202 may be a bigram composed of an entity i and topic words i. The topic item representation of the candidate topic item i 202 may be generated with relevant documents of the candidate topic item.

[0031] At 210, a plurality of relevant document titles including the entity i and the topic words i in the candidate topic item i 202 may be identified. For example, a plurality of relevant documents, titles of which include both the entity i and the topic words i, may be identified from titles of the plurality of reference documents obtained at 120 in FIG.l. The titles of these relevant documents may be further used for generating the topic item representation of the candidate topic item i 202.

[0032] In an implementation, at 220, a title representation of each relevant document title may be generated. A title representation may refer to a characteristic representation of information in a title in a latent space, e.g., a title vector representation. A relevant document title may be converted into a title representation through any known technique, e.g., Bert, etc. Thus, through the processing at 220, a plurality of title representations may be generated for the plurality of relevant document titles identified at 210.

[0033] At 230, clustering may be performed on the plurality of generated title representations. For example, these title representations may be clustered into at least one cluster.

[0034] At 240, a representative representation of at least one cluster obtained through the clustering at 230 may be calculated. In an implementation, the representative representation may be a representation calculated for a center of the at least one cluster. Thus, the representative representation may characterize the candidate topic item i 202 with rich information from the plurality of relevant document titles. The representative representation calculated at 240 may be used as a topic item representation i 204 of the candidate topic item i.

[0035] Through performing the process 200 on each candidate topic item in the candidate topic item set, a topic item representation of each candidate topic item may be obtained.

[0036] FIG.3 illustrates an exemplary process 300 for topic graph-based comment generation according to an embodiment. The process 300 may be performed for generating a comment for a target document 302 in an online application stage.

[0037] At 310, at least one topic item may be identified from the target document 302. In an implementation, the topic item identification at 310 may be performed with a pre- established candidate topic item set 314, and the candidate topic item set 314 may be established through, e.g., the process 100 in FIG.l. Topic items included in the target document 302 may be identified through matching the target document 302 with the candidate topic item set 314. For example, the candidate topic item set 314 comprises a plurality of candidate topic items and each candidate topic item comprises a bigram of <entity, topic words>, therefore, all the bigrams completely contained in the target document 302 may be found out through, e.g., string matching, etc., as the identified topic items. It should be understood that the topic item identification at 310 may be performed for a title, a main body, or both the title and the main body of the target document 302. [0038] Assuming that a plurality of topic items 312 are identified at 310, a topic graph may be established at 320 with the topic items 312. In an implementation, each topic item in the topic items 312 may be set as a node in the topic graph. Then, one or more node pairs sharing the same information may be identified from all the nodes. For example, two nodes including the same entity or two nodes including the same topic words may be identified as a node pair. Furthermore, for each identified node pair, an edge for connecting two nodes in the node pair may be set. This edge indicates that the two nodes have common information, e.g., entity or topic words, and thus have strong relevance. An edge is a graphical representation of connection relationship in a topic graph. Connection information of each node may be determined with edges connected with the node. All the nodes and edges connected among these nodes together form the topic graph. Structure information of the topic graph may be determined according to the nodes and edges included in the topic graph, and the structure information comprises at least connection information of each node, and thus may reflect relevance of each node to other nodes. For the ease of understanding, FIG.4 illustrates an example of establishing a topic graph according to an embodiment.

[0039] It is assumed that, in FIG.4, it is desired to establish a topic graph for a target document 410. Similar to the topic item identifying processing at 310 in FIG.3, a group of topic items 430 may be identified from the target document 410 with the candidate topic item set 420. For example, the identified topic items may comprise <Michael Jordan, sports>, <Nike, sponsor>, <Nike, shoes>, <Michael Jordan, basketball>, <Nike, sports>, <Michael Jordan, NBA star>, etc. Similar to the topic graph establishing processing at 320 in FIG.3, a topic graph 440 may be established with the group of topic items 430. As shown, nodes 441 to 446 in the topic graph 440 are set with the identified topic items respectively. Edges 451 to 457 in the topic graph 440 are set between node pairs sharing the same information respectively. For example, the node 441 and the node 444 share the same topic word "sports", thus an edge 457 is set between the node 441 and the node 444. For example, the node 444 and the node 445 share the same entity "Nike", thus an edge 454 is set between the node 444 and the node 445. It should be understood that the topic graph and its establishment process in FIG.4 are only exemplary, and in actual applications, the topic graph may comprise more or less information, or adopt any other representation form.

[0040] Returning back to FIG.3, after the topic graph 322 is established through the topic graph establishment process at 320, a topic graph representation of the topic graph 322 may be generated at 330. In an implementation, the topic graph representation may be generated based at least on structure information 324 of the topic graph 322. As mentioned above, the structure information 324 may comprise connection information of each node. Therefore, the generated topic graph representation considers at least relevance among topics in the target document. Moreover, optionally, in an implementation, the topic graph representation may be generated based on the structure information 324 and a topic item representation of a topic item corresponding to each node. In this case, at least one topic item representation 316 corresponding to at least one topic item 312 may be obtained first. Topic item representations 316 may be previously generated through the process 200 in FIG.2. Optionally, the process 200 may be performed in the online application stage, so as to generate the topic item representations 316 in real time. The topic item representations 316 of the topic items 312 may be used as initial node features 326 of respective nodes in the topic graph. Thus, after an initial node feature of each node in the topic graph is obtained, the topic graph representation may be generated based on the structure information 326 and the initial node features 326. Since topic item representations or initial node features are adopted, the generated topic graph representation will further consider rich information beyond the target document.

[0041] In an implementation, at 330, an updated node feature of each node may be generated, through, e.g., a graph attention network, based on an initial node feature of the node and connection information of the node included in the structure information, and updated node features of a plurality of nodes may be combined into a topic graph representation. The graph attention network may adopt, e.g., a graph attention neural network GAT, etc. An updated node feature may be considered as a node feature that contains structure information of a topic graph. The updated node features of the plurality of nodes may be combined into the topic graph representation through various approaches. For example, in the case that each updated node feature is a vector representation, these vector representations may be combined into a matrix, so that this matrix representation may be used as the topic graph representation.

[0042] After the topic graph representation 332 is generated through the topic graph representation generating processing at 330, a comment to the target document 302 may be generated at 350 based at least on the topic graph representation 332. Moreover, the comment generation at 350 may also be based on a document representation of the target document 302. For example, a document representation 342 corresponding to the target document 302 may be generated at 340. A document representation may refer to a characteristic representation of information in a document in a latent space. Any known document encoding techniques, e.g., Transformer encoding, etc., may be adopted at 340 for encoding the target document 302 into the document representation 342.

[0043] The comment generation at 350 may comprise a merging of the document representation 342 and the topic graph representation 332. In the case of adopting seq2seq architecture for implementing automatic comment generation, the document representation 342 and the topic graph representation 332 may be merged by adding attention mechanisms in a decoder. For example, a first attention mechanism may be applied to the document representation 342, a second attention mechanism may be applied to the topic graph representation 332, and then an output of the first attention mechanism and an output of the second attention mechanism are merged, accordingly, a result of the merging may be used for generating a comment.

[0044] Through the process 300, a comment 304 to the target document 302 may be finally generated. Since the process 300 is performed based at least on the topic graph, the comment generating process not only considers structural relation among topics in the document, but also considers rich information beyond the target document, therefore the generated comment 304 will have higher quality.

[0045] FIG.5 illustrates an exemplary topic graph-based comment generating system 500 according to an embodiment. The comment generating system 500 may be constructed based at least on, e.g., the process 300 in FIG.3.

[0046] The comment generating system 500 may comprise a topic graph generating module 510 for constructing a topic graph according to a target document 502, and a sequence-to-sequence comment generating model 520 for comment generation.

[0047] The topic graph generating module 510 may comprise a topic item identifying unit 512. The topic item identifying unit 512 may identify topic items from the target document 502 through, e.g., the topic item identifying processing at 310 in FIG.3. The topic graph generating module 510 may further comprise a topic graph establishing unit 514. The topic graph establishing unit 514 may establish a topic graph with the topic items through, e.g., the topic graph establishing processing at 320 in FIG.3. Although not shown, the topic graph generating module 510 may further comprise a unit for obtaining initial node features of nodes in the topic graph.

[0048] The sequence-to-sequence comment generating model 520 may comprise an encoder unit 522 and a decoder unit 528. The encoder unit 522 may comprise a document encoder 524 and a graph attention network 526. The document encoder 524 may encode the target document 502 into a corresponding document representation through, e.g., the document representation generating processing at 340 in FIG.3. The graph attention network 526 may generate a topic graph representation based on topic graph structure information and initial node features provided by the topic graph generating module 510, through, e.g., the topic graph representation generating processing at 330 in FIG.3. The decoder unit 528 may generate a comment 504 based on the topic graph representation and the document representation provided by the encoder unit 522, through, e.g., the comment generating processing at 350 in FIG.3.

[0049] It should be understood that all the modules and units in the topic graph-based comment generating system 500 in FIG.5 are exemplary, and any approach of modification may be made to the modules and units in the system and the structure of the system according to specific application requirements and designs. For example, the modules and units in FIG.5 may be combined and split in any approach. For example, although FIG.5 adopts a sequence-to-sequence comment generating model, this model may also be replaced by any other model capable of generating comments based on a topic graph representation and a document representation.

[0050] FIG.6 illustrates an exemplary decoder unit 600 according to an embodiment. The decoder unit 600 may correspond to the decoder unit 528 in FIG.5. The decoder unit 600 may generate a comment text based at least on a document representation provided by a document encoder 622 and a topic graph representation provided by a graph attention network 632. The document encoder 622 and the graph attention network 632 may correspond to the document encoder 524 and the graph attention network 526 in FIG.5, respectively.

[0051] It is assumed that the number of decoding steps of the decoder unit 600 is T. At each decoding step, the decoder unit 600 may generate a token, and this token and tokens generated in previous decoding steps may form an output sequence together. Herein, a token may comprise one or more characters, one or more words, etc. The output sequence may be used as an input sequence for the next decoding step for further generating the next token. At the first decoding step, a start symbol <s> may be used as an input sequence, and the decoder unit 600 may take the start symbol <s>, the document representation and the topic graph representation as inputs. The decoder unit 600 may output a predicted probability of each candidate token, and select a token from a plurality of highest-ranked candidate tokens as a token generation result by the first decoding step, and add it into an output sequence. At the decoding step /, an output sequence obtained at the decoding step t- 1 may be used as an input sequence, and the decoder unit 600 may generate a token at the decoding step t based on the input sequence, the document representation and the topic graph, and add it into the output sequence. The decoding process may be performed iteratively in the above approach until the maximum number of decoding steps T is reached or an ending symbol is generated. The output sequence finally obtained may be used as the comment text.

[0052] In the decoder unit 600, a self-attention mechanism 610 may be first applied to the input sequence 602. An attention mechanism 620 may be applied to the document representation from the document encoder 622 and an output of the self-attention mechanism 610. An attention mechanism 630 may be applied to the topic graph representation from the graph attention network 632 and an output of the attention mechanism 620. A gating mechanism 640 may be applied to an output of the attention mechanism 630 and the output of the attention mechanism 620, for performing merging. The merged information may be used for generating a token which will be added into the output sequence 604.

[0053] Assuming that the input sequence is edec , the document representation is c™_<:, and the topic graph representation is e_gra_Ph, the processing in the decoder unit 600 may be represented as: d-_dec Attention(Q _dec> ^ ^_de V _dec) Equation

(2) d_enc Attention d _dec ^_en ^ ^_enc) Equation (3) Equation (4)

Equation

(5)

Equation

(6) wherein the attention mechanism Attention(Q, K, V ) = softmax

W _grap are parameters, and s is a sigmoid function. Equation (2) corresponds to the self attention mechanism 610, wherein d_dec is an output. Equation (3) corresponds to the attention mechanism 620, wherein d_enc is an output. Equation (4) corresponds to the attention mechanism 630, wherein d_graph is an output. Equation (5) and Equation (6) correspond to the gating mechanism 640. d_output is the output of the decoder unit. The document representation and the topic graph representation are merged at least in Equation (3) to Equation (5), and the merged information is used for predicting the output token of the decoder unit.

[0054] It should be understood that FIG.6 only illustrates an exemplary implementation of the decoder unit 600, and any approach of modification may be made to the structure of the decoder unit 600 according to specific application requirements and designs.

[0055] FIG.7 illustrates a flowchart of an exemplary method 700 for topic graph-based comment generation according to an embodiment.

[0056] At 710, at least one topic item may be identified from a target document.

[0057] At 720, a topic graph may be established with the at least one topic item.

[0058] At 730, a topic graph representation of the topic graph may be generated based at least on structure information of the topic graph.

[0059] At 740, a comment to the target document may be generated based at least on the topic graph representation.

[0060] In an implementation, each topic item may be a bigram comprising an entity and topic words.

[0061] In an implementation, the identifying at least one topic item may comprise: identifying the at least one topic item through matching the target document with a candidate topic item set. [0062] In an implementation, the establishing a topic graph may comprise: setting the at least one topic item as at least one node; identifying one or more node pairs sharing the same information; and setting, for each identified node pair, an edge for connecting two nodes in the node pair.

[0063] In an implementation, the generating a topic graph representation may comprise: obtaining at least one topic item representation of the at least one topic item; and generating the topic graph representation based on the at least one topic item representation and the structure information.

[0064] A topic item representation of each topic item may be generated with relevant documents of the topic item.

[0065] The topic graph may comprise a plurality of nodes and a plurality of edges connected among the plurality of nodes, and the structure information may comprise connection information of each node. The generating the topic graph representation may comprise: obtaining an initial node feature of each node in the plurality of nodes, the initial node feature corresponding to a topic item representation of a topic item in the node; generating, based on an initial node feature of each node and connection information of the node, an updated node feature of the node through a graph attention network; and combining a plurality of updated node representations of the plurality of nodes into the topic graph representation.

[0066] In an implementation, the method 700 may further comprise: generating a document representation corresponding to the target document. The comment may be generated further based on the document representation.

[0067] The generating a comment may comprise: applying a first attention mechanism to the document representation; applying a second attention mechanism to the topic graph representation; merging an output of the first attention mechanism and an output of the second attention mechanism; and generating the comment with a result of the merging. [0068] In an implementation, the method 700 may further comprise: pre-establishing the candidate topic item set.

[0069] The pre-establishing the candidate topic item set may comprise: obtaining a plurality of entities; extracting a plurality of topic words from a plurality of reference document titles; calculating a relevance value between each entity and each topic word; and selecting a plurality of <entity, topic words> bigrams highest-ranked by relevance, as a plurality of candidate topic items in the candidate topic item set.

[0070] The method 700 may further comprise, for each candidate topic item: identifying, from the plurality of reference document titles, a plurality of relevant document titles including an entity and topic words in the candidate topic item; and generating a topic item representation of the candidate topic item with the plurality of relevant document titles.

[0071] The generating a topic item representation of the candidate topic item may comprise: generating a plurality of title representations of the plurality of relevant document titles; clustering the plurality of title representations into at least one cluster; and calculating a representative representation of the at least one cluster, as the topic item representation.

[0072] It should be understood that the method 700 may further comprise any step/process for topic graph-based comment generation according to the embodiments of the present disclosure as described above.

[0073] FIG.8 illustrates an exemplary apparatus 800 for topic graph-based comment generation according to an embodiment.

[0074] The apparatus 800 may comprise: a topic item identifying module 810, for identifying at least one topic item from a target document; a topic graph establishing module 820, for establishing a topic graph with the at least one topic item; a topic graph representation generating module 830, for generating a topic graph representation of the topic graph based at least on structure information of the topic graph; and a comment generating module 840, for generating a comment to the target document based at least on the topic graph representation.

[0075] In an implementation, the topic graph establishing module 820 may be for: setting the at least one topic item as at least one node; identifying one or more node pairs sharing the same information; and setting, for each identified node pair, an edge for connecting two nodes in the node pair.

[0076] In an implementation, the topic graph representation generating module 830 may be for: obtaining at least one topic item representation of the at least one topic item; and generating the topic graph representation based on the at least one topic item representation and the structure information.

[0077] A topic item representation of each topic item may be generated with relevant documents of the topic item.

[0078] In an implementation, the apparatus 800 may further comprise: a document representation generating module, for generating a document representation corresponding to the target document. The comment may be generated further based on the document representation.

[0079] The comment generating module 840 may be for: applying a first attention mechanism to the document representation; applying a second attention mechanism to the topic graph representation; merging an output of the first attention mechanism and an output of the second attention mechanism; and generating the comment with a result of the merging.

[0080] Moreover, the apparatus 800 may further comprise any other module that performs steps of the methods for topic graph-based comment generation according to the embodiments of the present disclosure as described above.

[0081] FIG.9 illustrates an exemplary apparatus 900 for topic graph-based comment generation according to an embodiment.

[0082] The apparatus 900 may comprise: at least one processor 910; and a memory 920 storing computer-executable instructions. When the computer-executable instructions are executed, the at least one processor 910 may: identify at least one topic item from a target document; establish a topic graph with the at least one topic item; generate a topic graph representation of the topic graph based at least on structure information of the topic graph; and generate a comment to the target document based at least on the topic graph representation. Moreover, the processor 910 may further perform any other step/process of the methods for topic graph-based comment generation according to the embodiments of the present disclosure as described above.

[0083] The embodiments of the present disclosure may be embodied in a non- transitory computer-readable medium. The non-transitory computer readable medium may comprise instructions that, when executed, cause one or more processors to perform any operation of the methods for topic graph-based comment generation according to the embodiments of the disclosure as described above.

[0084] It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.

[0085] It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

[0086] Processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described throughout the present disclosure. The functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.

[0087] Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk. Although a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register).

[0088] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skilled in the art are intended to be encompassed by the claims.

Claims

1. A method for topic graph-based comment generation, comprising: identifying at least one topic item from a target document; establishing a topic graph with the at least one topic item; generating a topic graph representation of the topic graph based at least on structure information of the topic graph; and generating a comment to the target document based at least on the topic graph representation.

2. The method of claim 1, wherein each topic item is a bigram comprising an entity and topic words.

3. The method of claim 1, wherein the identifying at least one topic item comprises: identifying the at least one topic item through matching the target document with a candidate topic item set.

4. The method of claim 1, wherein the establishing a topic graph comprises: setting the at least one topic item as at least one node; identifying one or more node pairs sharing the same information; and setting, for each identified node pair, an edge for connecting two nodes in the node pair.

5. The method of claim 1, wherein the generating a topic graph representation comprises: obtaining at least one topic item representation of the at least one topic item; and generating the topic graph representation based on the at least one topic item representation and the structure information.

6. The method of claim 5, wherein a topic item representation of each topic item is generated with relevant documents of the topic item.

7. The method of claim 5, wherein the topic graph comprises a plurality of nodes and a plurality of edges connected among the plurality of nodes, the structure information comprises connection information of each node, and the generating the topic graph representation comprises: obtaining an initial node feature of each node in the plurality of nodes, the initial node feature corresponding to a topic item representation of a topic item in the node; generating, based on an initial node feature of each node and connection information of the node, an updated node feature of the node through a graph attention network; and combining a plurality of updated node representations of the plurality of nodes into the topic graph representation.

8. The method of claim 1, further comprising: generating a document representation corresponding to the target document, and wherein the comment is generated further based on the document representation.

9. The method of claim 8, wherein the generating a comment comprises: applying a first attention mechanism to the document representation; applying a second attention mechanism to the topic graph representation; merging an output of the first attention mechanism and an output of the second attention mechanism; and generating the comment with a result of the merging.

10. The method of claim 3, further comprising: pre-establishing the candidate topic item set.

11. The method of claim 10, wherein the pre-establishing the candidate topic item set comprises: obtaining a plurality of entities; extracting a plurality of topic words from a plurality of reference document titles; calculating a relevance value between each entity and each topic word; and selecting a plurality of <entity, topic words> bigrams highest-ranked by relevance, as a plurality of candidate topic items in the candidate topic item set.

12. The method of claim 11, further comprising, for each candidate topic item: identifying, from the plurality of reference document titles, a plurality of relevant document titles including an entity and topic words in the candidate topic item; and generating a topic item representation of the candidate topic item with the plurality of relevant document titles.

13. The method of claim 12, wherein the generating a topic item representation of the candidate topic item comprises: generating a plurality of title representations of the plurality of relevant document titles; clustering the plurality of title representations into at least one cluster; and calculating a representative representation of the at least one cluster, as the topic item representation.

14. An apparatus for topic graph-based comment generation, comprising: a topic item identifying module, for identifying at least one topic item from a target document; a topic graph establishing module, for establishing a topic graph with the at least one topic item; a topic graph representation generating module, for generating a topic graph representation of the topic graph based at least on structure information of the topic graph; and a comment generating module, for generating a comment to the target document based at least on the topic graph representation.

15. An apparatus for topic graph-based comment generation, comprising: at least one processor; and a memory storing computer-executable instructions that, when executed, cause the at least one processor to: identify at least one topic item from a target document, establish a topic graph with the at least one topic item, generate a topic graph representation of the topic graph based at least on structure information of the topic graph, and generate a comment to the target document based at least on the topic graph representation.