CN114661893A - Abstract generation method and device - Google Patents

Info

Publication number
CN114661893A
Authority
CN
China
Prior art keywords
keyword
dialogue
keywords
determining
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210318418.1A
Other languages
Chinese (zh)
Inventor
莫森·波尔瓦利
盛晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202210318418.1A
Publication of CN114661893A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides an abstract generation method and apparatus. The method includes: determining the keywords in each dialogue sentence of a dialogue text; determining the relevance between different keywords and the order of appearance of the keywords within the dialogue sentences; determining, based on the order of appearance, the degree of influence of each keyword on the dialogue text; determining the importance of each keyword by combining the relevance between different keywords with the keyword's degree of influence on the dialogue text; and generating an abstract of the dialogue text based on the importance of the keywords, where the abstract includes the dialogue sentences that contain keywords whose importance meets a condition. The solution of the present application can generate abstracts of dialogue texts more efficiently and accurately.

Description

Abstract generation method and apparatus

Technical field

The present application relates to the technical field of text processing, and in particular to an abstract generation method and apparatus.

Background

A dialogue text is a text composed of the dialogue sentences exchanged in a chat or conversation between at least two parties.

To understand the content of a dialogue text more conveniently and efficiently, it is often necessary to generate an abstract of it. For example, in a customer service scenario, a user and a customer service agent (such as a human agent or a chatbot) exchange information, such as questions and answers, through dialogue. Extracting an abstract of the dialogue text between the user and the agent condenses the information they exchanged and helps the relevant customer service personnel obtain an overview of the problems and solutions that have already been discussed.

For the abstract to reflect the content of the dialogue text accurately, it must be extracted in a reasonable way. How to extract an abstract from a dialogue text more reasonably, so that the abstract reflects the content of the dialogue text more accurately, is therefore a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

The present application provides an abstract generation method and apparatus.

The abstract generation method includes:

determining the keywords in each dialogue sentence of a dialogue text;

determining the relevance between different keywords and the order of appearance of the keywords within the dialogue sentences;

determining, based on the order of appearance, the degree of influence of the keyword on the dialogue text;

determining the importance of the keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text;

generating an abstract of the dialogue text based on the importance of the keywords, where the abstract includes the dialogue sentences in the dialogue text that contain keywords whose importance meets a condition.

In a possible implementation, before the abstract of the dialogue text is generated, the method further includes:

dividing the dialogue text into at least one dialogue partition, where the dialogue sentences in different dialogue partitions represent different categories of dialogue intent;

and generating the abstract of the dialogue text based on the importance of the keywords includes:

determining, based on the importance of the keywords, the target dialogue sentences in each dialogue partition that are used to form the abstract, and obtaining an abstract composed of the target dialogue sentences of the dialogue partitions.

In a possible implementation, determining the order of appearance of the keywords within the dialogue sentences includes:

for each keyword, determining each co-occurrence keyword of the keyword and the co-occurrence dialogue sentence in which the keyword and that co-occurrence keyword first appear together, where a co-occurrence keyword of the keyword is another keyword that appears in the same dialogue sentence as the keyword;

determining the order of appearance of the keyword and its co-occurrence keyword within the co-occurrence dialogue sentence.

The abstract generation apparatus includes:

a keyword determination unit, configured to determine the keywords in each dialogue sentence of a dialogue text;

an association determination unit, configured to determine the relevance between different keywords and the order of appearance of the keywords within the dialogue sentences;

an influence determination unit, configured to determine, based on the order of appearance, the degree of influence of the keyword on the dialogue text;

an importance determination unit, configured to determine the importance of the keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text;

an abstract generation unit, configured to generate an abstract of the dialogue text based on the importance of the keywords, where the abstract includes the dialogue sentences in the dialogue text that contain keywords whose importance meets a condition.

As can be seen from the above, after the keywords of each dialogue sentence in the dialogue text are determined, the present application not only determines the relevance between different keywords but also uses the order of appearance of the keywords within the dialogue sentences to determine each keyword's degree of influence on the dialogue text. On this basis, the application combines two dimensions, the relevance between keywords and the keyword's degree of influence on the dialogue text, to determine the importance of each keyword, so that the importance reflects the keyword's significance in the dialogue text more reasonably and accurately. The dialogue sentences used to generate the abstract can therefore be selected from the dialogue text more reasonably, and the abstract reflects the key information in the dialogue text more accurately.

Description of drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 shows a schematic flowchart of an abstract generation method provided by an embodiment of the present application;

FIG. 2 shows another schematic flowchart of the abstract generation method provided by an embodiment of the present application;

FIG. 3 shows a schematic diagram of a directed graph provided by an embodiment of the present application;

FIG. 4 shows a schematic flowchart of determining the relevance between keywords in an embodiment of the present application;

FIG. 5 shows a schematic flowchart of determining the importance of keywords in an embodiment of the present application;

FIG. 6 shows another schematic flowchart of the abstract generation method provided by an embodiment of the present application;

FIG. 7 shows a schematic diagram of the dialogue partitions into which a dialogue text is divided in an embodiment of the present application;

FIG. 8 shows another schematic flowchart of the abstract generation method provided by an embodiment of the present application;

FIG. 9 shows another schematic diagram of a directed graph provided by an embodiment of the present application;

FIG. 10 shows a schematic structural diagram of an abstract generation apparatus provided by an embodiment of the present application;

FIG. 11 shows a schematic architectural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The solutions of the embodiments of the present application are suitable for generating an abstract for any type of dialogue text, so that the generated abstract reflects the key information of the dialogue text more reasonably and accurately.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.

FIG. 1 shows a schematic flowchart of an abstract generation method provided by an embodiment of the present application. The method of this embodiment can be applied to any type of electronic device, or to a cluster or distributed system composed of multiple electronic devices. For example, the electronic device may be a notebook computer, a desktop computer, or a server, without limitation.

The method of this embodiment may include:

S101: Determine the keywords in each dialogue sentence of the dialogue text.

The dialogue text includes at least one dialogue sentence. In general, a dialogue text that needs an abstract includes multiple dialogue sentences. For example, the dialogue sentences may be text sentences that different parties use to ask for information, answer questions, or communicate.

In the present application, the dialogue text to be summarized may come from any scenario. For example, it may be the dialogue text exchanged between a user and a customer service agent, such as a text containing multiple dialogue sentences exchanged between a buyer and a seller's customer service agent. As another example, it may be the dialogue text of work-related communication within an organization or enterprise.

The dialogue text may also take other forms; the present application does not limit its source or how it is obtained.

A dialogue sentence may include at least one keyword.

The keywords in a dialogue sentence can be determined in many ways, without limitation.

For example, a relatively simple way is to perform word segmentation on the dialogue sentence and use each resulting word as a keyword of the dialogue sentence.

As another example, after word segmentation, the words that remain once stop words have been removed may be determined as the keywords of the dialogue sentence.

Words with specific semantics in the dialogue sentence may also be used as keywords, depending on the specific scenario of the dialogue text; details are not repeated here.
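The second option above (segment, then drop stop words) can be sketched as follows. This is only an illustrative sketch, not the application's implementation: it assumes English text split with a simple regular expression, whereas Chinese text would need a dedicated word segmenter, and the names `STOP_WORDS` and `extract_keywords` are hypothetical.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "my", "i", "it", "and", "of", "please"}


def extract_keywords(sentence, stop_words=STOP_WORDS):
    """Return the keywords of one dialogue sentence in order of first appearance."""
    # Assumption: lowercase alphanumeric runs stand in for word segmentation.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    keywords = []
    for token in tokens:
        # Drop stop words and keep each keyword once, at its first position.
        if token not in stop_words and token not in keywords:
            keywords.append(token)
    return keywords


dialogue = [
    "My laptop battery drains quickly",
    "Please update the battery driver",
]
keywords_per_sentence = [extract_keywords(s) for s in dialogue]
print(keywords_per_sentence)
```

Keeping the keywords in order of first appearance is a deliberate choice here, because the later steps depend on the order in which keywords occur within a sentence.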

S102: Determine the relevance between different keywords and the order of appearance of the keywords within the dialogue sentences.

The relevance between different keywords in the dialogue text may include the relevance between any two keywords in the dialogue text.

The relevance between keywords reflects how closely two keywords are related. There are many possible ways to calculate the relevance between two keywords, and the present application does not limit them.

For a dialogue sentence, the order of appearance of its keywords may include the order of appearance of any two keywords in the sentence. For example, if a dialogue sentence includes keyword A and keyword B, the order in which keyword A and keyword B appear in that sentence can be determined.

The order in which two keywords appear in the same dialogue sentence also reflects an association between them: it can represent the influence relationship between the two keywords.

For example, if keyword A appears before keyword B in a dialogue sentence, keyword B may have been elicited by keyword A; that is, keyword B may appear only because keyword A appeared.
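As a minimal sketch of the order of appearance within one sentence, the snippet below lists every (earlier, later) keyword pair. It assumes the keywords of the sentence are already given in order of first appearance with no repeats; the function name `ordered_pairs` is hypothetical.

```python
from itertools import combinations


def ordered_pairs(sentence_keywords):
    """Return (earlier, later) pairs for every two keywords of one sentence.

    sentence_keywords must list each keyword once, in order of appearance.
    """
    # combinations() preserves input order, so the first element of each
    # pair always appears before the second element in the sentence.
    return list(combinations(sentence_keywords, 2))


pairs = ordered_pairs(["A", "B", "C"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Each pair can be read as a candidate influence relation: the earlier keyword may have elicited the later one.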

S103: Determine, based on the order of appearance, the degree of influence of the keyword on the dialogue text.

A keyword may appear in multiple dialogue sentences. Combining the order of appearance of the keywords in the different dialogue sentences gives the order of appearance between that keyword and each of the other keywords, and therefore the influence of that keyword on other keywords as well as the influence of other keywords on it.

Either or both of these influence relationships can reflect the keyword's degree of influence in the dialogue text.

For example, other keywords that appear after a keyword in the same dialogue sentence can be regarded as keywords elicited by that keyword. For a given keyword, the more other keywords it elicits, or the more other keywords elicit it, the more influential the keyword is in the dialogue text and the greater its degree of influence.

Based on this, in a possible implementation, for a keyword, the order of appearance of the keywords in the different dialogue sentences is used to determine a first number, the number of other keywords located after the keyword, and a second number, the number of other keywords located before the keyword. The degree of influence of the keyword on the dialogue text can then be determined based on the first number and the second number.

For example, the degree of influence may be determined based on the sum of the first number and the second number: the greater the sum, the greater the keyword's influence on the dialogue text.

As another example, the degree of influence may be determined based on the larger of the first number and the second number: the greater that value, the greater the keyword's influence on the dialogue text. For instance, the larger of the two numbers may be used directly as the influence value that represents the keyword's degree of influence on the dialogue text.

In the present application, the degree of influence of a keyword may be expressed as a score, an influence level, or in another form; the specific form is not limited.
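The two scoring options described above (sum, or the larger of the two counts) can be sketched as follows. This is an illustrative sketch under assumptions: the order of appearance is represented as a set of (earlier, later) keyword pairs, and the function name `influence` and its `mode` parameter are hypothetical.

```python
def influence(keyword, ordered_pairs, mode="sum"):
    """Score a keyword's influence from (earlier, later) keyword pairs."""
    # First number: distinct other keywords that appear after the keyword.
    first_number = len({later for earlier, later in ordered_pairs if earlier == keyword})
    # Second number: distinct other keywords that appear before the keyword.
    second_number = len({earlier for earlier, later in ordered_pairs if later == keyword})
    if mode == "sum":
        return first_number + second_number
    if mode == "max":
        return max(first_number, second_number)
    raise ValueError(f"unknown mode: {mode}")


pairs = {("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")}
print(influence("A", pairs, mode="sum"))  # 3: B and C follow A, D precedes A
print(influence("A", pairs, mode="max"))  # 2
```

If the pairs are viewed as edges of a directed graph, the first number is the keyword's out-degree and the second number is its in-degree.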

S104: Determine the importance of the keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text.

For a keyword, its relevance to each of the other keywords in the dialogue text can reflect how important the keyword is in the dialogue text. The keyword's degree of influence on the dialogue text, obtained from the order of appearance between the keyword and other keywords, reflects its importance from another dimension. The present application therefore combines the relevance between keywords with the keyword's degree of influence on the dialogue text to determine the keyword's importance, which reflects the importance of each keyword in the dialogue text more accurately and reasonably.

There are many possible ways to determine the importance of a keyword from the relevance and the degree of influence, and the present application does not limit them.

For example, a first importance value of the keyword may be determined based on the relevance between the keyword and other keywords, and a second importance value may be determined from the keyword's degree of influence on the dialogue text. The importance of the keyword is then determined from, for example, the weighted sum or the product of the first importance value and the second importance value.
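The weighted-sum and product options can be sketched as below. The text does not say how the first importance value is derived from the pairwise relevance scores, so this sketch assumes the mean relevance to the other keywords; the names `keyword_importance`, `weight`, and `combine` are hypothetical, and in practice the two values may need to be normalized to comparable scales.

```python
def keyword_importance(keyword, relevance, influence, weight=0.5, combine="weighted_sum"):
    """Combine relevance and influence into one importance score.

    relevance maps unordered keyword pairs (tuples) to a relevance score;
    influence maps each keyword to its degree of influence.
    """
    # Assumption: the first importance value is the mean relevance
    # between this keyword and every other keyword it is paired with.
    scores = [score for pair, score in relevance.items() if keyword in pair]
    first_value = sum(scores) / len(scores) if scores else 0.0
    # The second importance value is the keyword's degree of influence.
    second_value = influence.get(keyword, 0.0)
    if combine == "weighted_sum":
        return weight * first_value + (1.0 - weight) * second_value
    if combine == "product":
        return first_value * second_value
    raise ValueError(f"unknown combine mode: {combine}")


relevance = {("A", "B"): 0.8, ("A", "C"): 0.4, ("B", "C"): 0.2}
influence_scores = {"A": 3.0, "B": 2.0, "C": 1.0}
importance = {
    k: keyword_importance(k, relevance, influence_scores, weight=0.7)
    for k in influence_scores
}
print(importance)
```

With `weight=0.7`, keyword A scores roughly 0.7 × 0.6 + 0.3 × 3.0 = 1.32, so both dimensions contribute to the final ranking.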

S105: Generate an abstract of the dialogue text based on the importance of the keywords.

The abstract includes the dialogue sentences in the dialogue text that contain keywords whose importance meets a condition.

For example, in a possible implementation, after the importance of each keyword in the dialogue text is determined, the importance of each dialogue sentence can be determined from the importance of the keywords it contains. One or more of the dialogue sentences with the highest importance can then be selected as the abstract of the dialogue text.

There are also several ways to determine the importance of a dialogue sentence from the importance of its keywords. For example, the importance of all keywords in the dialogue sentence may be combined to determine the importance of the sentence.

In one implementation, so that highly important keywords appear in the abstract, the importance of the most important keyword in a dialogue sentence may be used as the importance of that sentence. For example, if keyword importance is expressed as an importance score, the highest importance score among the keywords of a dialogue sentence can be taken as the score of the sentence, and the dialogue sentences with higher scores are selected to generate the abstract of the dialogue text.
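A minimal sketch of the last variant, scoring each sentence by its most important keyword and keeping the top-scoring sentences, might look like this. The function name `summarize`, the `top_n` parameter, and the choice to return the selected sentences in their original dialogue order are assumptions made for illustration.

```python
def summarize(sentences, keywords_per_sentence, importance, top_n=1):
    """Select the top_n sentences, each scored by its most important keyword."""
    # A sentence's score is the highest importance among its keywords
    # (0.0 when the sentence has no keywords).
    scores = [
        max((importance.get(k, 0.0) for k in keywords), default=0.0)
        for keywords in keywords_per_sentence
    ]
    # Rank sentence indices by score; sorted() is stable, so ties keep
    # their original dialogue order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:top_n])  # Assumption: restore dialogue order.
    return [sentences[i] for i in selected]


sentences = ["s0", "s1", "s2"]
keywords_per_sentence = [["A"], ["B", "C"], []]
importance = {"A": 0.9, "B": 0.2, "C": 0.7}
print(summarize(sentences, keywords_per_sentence, importance, top_n=2))
# ['s0', 's1']
```

Using the maximum rather than an average means a single highly important keyword is enough for its sentence to be selected.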

As can be seen from the above, after the keywords of each dialogue sentence in the dialogue text are determined, the present application not only determines the relevance between different keywords but also uses the order of appearance of the keywords within the dialogue sentences to determine each keyword's degree of influence on the dialogue text. On this basis, the application combines two dimensions, the relevance between keywords and the keyword's degree of influence on the dialogue text, to determine the importance of each keyword, so that the importance reflects the keyword's significance in the dialogue text more reasonably and accurately. The dialogue sentences used to generate the abstract can therefore be selected from the dialogue text more reasonably, and the abstract reflects the key information in the dialogue text more accurately.

Any two keywords may appear together in multiple dialogue sentences, and their order of appearance may differ from one dialogue sentence to another. To prevent such inconsistent orders from affecting the determination of a keyword's degree of influence on the dialogue text, the present application may consider only the order in which the two keywords appear in the dialogue sentence where they first appear together.

In practical applications, for any two keywords, their order of appearance in the dialogue sentence where they first appear together better characterizes the relative importance of the two keywords.

为了便于理解,下面结合确定出现顺序的一种可能的实现方式对本申请实施例的方案进行说明。如图2所示,其示出了本申请实施例提供的摘要生成方法的又一种流程示意图,本实施例的方法可以包括:For ease of understanding, the solutions of the embodiments of the present application will be described below with reference to a possible implementation manner of determining the order of appearance. As shown in FIG. 2 , which shows another schematic flowchart of the method for generating an abstract provided in an embodiment of the present application, the method in this embodiment may include:

S201,确定对话文本中各对话语句内的关键词。S201: Determine the keywords in each dialogue sentence in the dialogue text.

S202,确定不同关键词之间的相关度。S202, determining the degree of correlation between different keywords.

以上步骤S201和S202可以参见前面实施例的相关介绍,在此不再赘述。For the above steps S201 and S202, reference may be made to the relevant introduction of the previous embodiment, and details are not repeated here.

S203,对于每个关键词,确定关键词的各共现关键词以及该关键词与其共现关键词首次共同出现的共现对话语句。S203 , for each keyword, determine each co-occurrence keyword of the keyword and a co-occurrence dialogue sentence in which the keyword and its co-occurrence keyword co-occur for the first time.

其中,关键词的共现关键词为与该关键词同时出现在一个对话语句内的其他关键词。也就是说,对于出现在同一个对话语句中各关键词而言,任意两个关键词之间均互为共现关键词。Among them, the co-occurrence keyword of a keyword is other keywords that appear in a dialogue sentence at the same time as the keyword. That is to say, for each keyword appearing in the same dialogue sentence, any two keywords are co-occurrence keywords with each other.

如,对话语句中包括关键词A和关键词B,那么关键词B为关键词A的共现关键词,且关键词B也为关键词A的共现关键词。For example, if the dialogue sentence includes keyword A and keyword B, then keyword B is a co-occurrence keyword of keyword A, and keyword B is also a co-occurrence keyword of keyword A.

可以理解的是,对于互为关键词的任意两个关键词而言,这两个关键词同时出现的对话语句也就是同时包含这两个关键词的对话语句,为了便于区分,将这两个关键词同时出现的对话语句称为共现对话语句。而在实际应用中,这两个关键词有可能会同时在多个共现对话语句,然而,为了能够更为准确获得这两个关键词之间的影响关系,本实施例可以仅关注这两个关键词第一次共同出现的共现对话语句。It can be understood that, for any two keywords that are keywords for each other, the dialogue sentence in which the two keywords appear at the same time is also the dialogue sentence containing the two keywords at the same time. Dialogue sentences in which keywords appear at the same time are called co-occurring dialogue sentences. In practical applications, these two keywords may co-occur in multiple dialogue sentences at the same time. However, in order to obtain the influence relationship between these two keywords more accurately, this embodiment may only focus on these two keywords. A co-occurring dialogue sentence in which each keyword co-occurs for the first time.

S204: For any co-occurrence keyword of the keyword, determine the order in which the keyword and that co-occurrence keyword appear within their co-occurrence dialogue sentence.

It should be noted that the co-occurrence dialogue sentence here again refers to the sentence in which the keyword and its co-occurrence keyword first appear together.

It can be understood that, because co-occurrence is mutual, the order of appearance needs to be determined only once for any pair of keywords, namely within the dialogue sentence in which the pair first appears together. Step S204 can therefore also be regarded as: for any two keywords that are co-occurrence keywords of each other, determine the order in which they appear within the co-occurrence dialogue sentence where they first appear together.
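Steps S203 and S204 can be illustrated with a minimal sketch. It assumes that each dialogue sentence has already been reduced to a list of its keywords in order of appearance; the function name, the data layout, and the sample keywords are hypothetical and are not taken from the application.

```python
from itertools import combinations


def first_cooccurrence_order(sentences):
    """Map each keyword pair to (sentence index, earlier keyword, later keyword).

    `sentences` is a list of keyword lists; each inner list holds the keywords
    of one dialogue sentence in the order they appear in that sentence.
    """
    first_seen = {}
    for index, keywords in enumerate(sentences):
        # Drop repeats while keeping the first position of each keyword.
        ordered = list(dict.fromkeys(keywords))
        for earlier, later in combinations(ordered, 2):
            pair = frozenset((earlier, later))
            # Record only the first co-occurrence dialogue sentence of the pair.
            if pair not in first_seen:
                first_seen[pair] = (index, earlier, later)
    return first_seen


# Hypothetical dialogue: two sentences, already reduced to keywords.
dialogue = [
    ["printer", "driver"],
    ["driver", "printer", "usb"],
]
pairs = first_cooccurrence_order(dialogue)
```

Here the pair (printer, driver) is recorded from the first sentence only, so the reversed order in the second sentence does not affect it.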

S205: For each keyword, determine the degree of influence of the keyword on the dialogue text based on the order in which the keyword and each of its co-occurrence keywords appear within the corresponding co-occurrence dialogue sentences.

For example, in one possible implementation, for each keyword, the order of appearance of the keyword and its co-occurrence keywords within the corresponding co-occurrence dialogue sentences is used to determine a first number, the count of other keywords that appear after the keyword, and a second number, the count of other keywords that appear before the keyword. The degree of influence of the keyword on the dialogue text can then be determined from the first number and the second number.
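The two counts can be sketched as follows, assuming the ordered pairs (earlier keyword, later keyword) from the first co-occurrence dialogue sentences are already available. The names and sample data are hypothetical, and how the two counts are turned into an influence degree is left open, as in the text.

```python
from collections import Counter


def count_after_and_before(ordered_pairs):
    """Return (first_number, second_number) counters keyed by keyword.

    Each element of `ordered_pairs` is (earlier, later): the order in which
    two co-occurrence keywords appear in their first co-occurrence sentence.
    """
    first_number = Counter()   # other keywords appearing after the keyword
    second_number = Counter()  # other keywords appearing before the keyword
    for earlier, later in ordered_pairs:
        first_number[earlier] += 1
        second_number[later] += 1
    return first_number, second_number


# Hypothetical ordered pairs.
ordered_pairs = [("printer", "driver"), ("driver", "usb"), ("printer", "usb")]
first_number, second_number = count_after_and_before(ordered_pairs)
```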

In another possible implementation, a directed graph may be used to analyze the influence relationship between a keyword and other keywords and to determine the degree of influence of the keyword on the dialogue text. This makes it quicker and more efficient to determine the order of appearance between a keyword and its co-occurrence keywords, and to count the co-occurrence keywords that appear before and after each keyword.

Specifically, after determining the co-occurrence keywords of each keyword and the co-occurrence dialogue sentence in which the keyword and each co-occurrence keyword first appear together, a directed graph representing the association relationships between different keywords in the dialogue text can be constructed.

The directed graph includes a plurality of nodes and directed edges between the nodes. Each node represents one keyword. A directed edge between two nodes indicates that the two corresponding keywords are co-occurrence keywords of each other, and the direction of the edge indicates the order in which the two keywords appear within their co-occurrence dialogue sentence.

FIG. 3 shows a schematic diagram of a directed graph provided by an embodiment of the present application. Each circle in FIG. 3 represents a node; the graph in FIG. 3 includes node A, node B, node C, node D, node E, node F, and node G.

Different nodes represent different keywords in the dialogue text. For example, node A may represent keyword 1 and node B may represent keyword 2; the other nodes are similar and are not described again.

As FIG. 3 shows, some nodes are connected by directed edges, while others are not. If the two keywords represented by two nodes are co-occurrence keywords of each other, a directed edge exists between the two nodes. The arrow of the edge indicates the order in which the two keywords appear in the co-occurrence dialogue sentence where they first appear together: the edge points from the keyword that appears earlier to the keyword that appears later.

For example, suppose keyword 1 and keyword 2, corresponding to node A and node B, first appear together in dialogue sentence S, and in dialogue sentence S keyword 1 appears before keyword 2. In FIG. 3, the edge therefore points from node A, representing keyword 1, to node B, representing keyword 2, indicating that the keyword represented by node A can lead to the keyword represented by node B.

As the directed graph in FIG. 3 shows, a directed graph clearly represents whether the keywords represented by the nodes are co-occurrence keywords of each other and, for two keywords that are, the order in which they appear.

Accordingly, once the directed graph has been constructed, the degree of influence of the keyword represented by each node on the dialogue text can be determined according to the number and direction of the directed edges between the nodes.

For example, based on the directed graph, the number of other nodes that each node points to can be counted; this is the first number, that is, the number of other keywords that appear after the keyword represented by the node. Likewise, the number of other nodes that point to the node can be counted; this is the second number, that is, the number of other keywords that appear before the keyword represented by the node.

It can be understood that, with a directed graph, the first number and the second number can be counted conveniently using existing link-analysis functions.

For example, in one possible implementation, the Hyperlink-Induced Topic Search (HITS) algorithm may be used to count the first number and the second number for each node.

The HITS algorithm computes two values for a page: a hub score and an authority score. The two values depend on and influence each other. The hub score of a page is the sum of the authority scores of the pages that its outgoing links point to. The authority score of a page is the sum of the hub scores of the pages whose links point to it.

In the present application, the HITS algorithm can be used to compute a hub score and an authority score for each node in the directed graph. The hub score of a node is the total number of other nodes that the node points to, that is, the first number for the keyword represented by the node; the authority score of a node is the total number of nodes that point to it, that is, the second number for that keyword.

It can be understood that analyzing the directed graph with HITS to determine the first number and the second number for each keyword is only an example. In practice, other algorithms may also be used to analyze the directed graph and determine the first number and the second number of each keyword; this is not limited here.
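The link between HITS scores and the two counts can be illustrated with a minimal sketch. It assumes a hypothetical edge list and applies a single unnormalized update to uniform initial scores; in that case the hub score equals the number of outgoing edges (the first number) and the authority score equals the number of incoming edges (the second number). The full HITS algorithm repeats such updates with normalization until convergence, so its converged scores generally differ from plain counts. All names here are illustrative.

```python
def hits_step(edges, hub, authority):
    """Apply one unnormalized HITS update to a graph given as (source, target) edges."""
    new_hub = {node: 0.0 for node in hub}
    new_authority = {node: 0.0 for node in authority}
    for source, target in edges:
        # Hub score: sum of the authority scores of the nodes pointed to.
        new_hub[source] += authority[target]
        # Authority score: sum of the hub scores of the nodes pointing in.
        new_authority[target] += hub[source]
    return new_hub, new_authority


# Hypothetical keyword graph: an edge points from the earlier keyword to the later one.
edges = [("A", "B"), ("A", "C"), ("B", "C")]
nodes = {node for edge in edges for node in edge}
uniform = {node: 1.0 for node in nodes}
hub, authority = hits_step(edges, uniform, uniform)
```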

S206: Determine the importance of each keyword by combining the degree of correlation between different keywords with the degree of influence of the keyword on the dialogue text.

S207: Generate a summary of the dialogue text based on the importance of the keywords.

The summary includes the dialogue sentences of the dialogue text that contain keywords whose importance satisfies a condition.

For steps S206 and S207, refer to the description of the preceding embodiments; details are not repeated here.

In the above embodiments of the present application, the degree of correlation between keywords can be determined in many ways, which is not limited in the present application.

In one possible implementation, to reflect the degree of correlation between keywords more accurately and reliably, the present application may combine the similarity between the keywords' vectors with the word frequency of the keywords in the dialogue text and the number of times different keywords appear together in the same dialogue sentence.

FIG. 4 shows a schematic flowchart of determining the degree of correlation between keywords in an embodiment of the present application. This embodiment may include:

S401: For each keyword, determine the word vector of the keyword based on the dialogue sentence in which the keyword first appears in the dialogue text.

It can be understood that a keyword may appear in several dialogue sentences, but the dialogue sentence in which it first appears has the greatest influence on the keyword and best characterizes its semantics and other features. The present application therefore determines the word vector of a keyword from the dialogue sentence in which the keyword first appears, so that the word vector reflects the semantics and other features of the keyword more accurately.

It can be understood that the word vector of a keyword can be determined from the dialogue sentence in which the keyword first appears in many ways, which is not limited in the present application.

In one possible implementation, the present application may use a Bidirectional Encoder Representations from Transformers (BERT) model together with the dialogue sentence in which the keyword first appears to determine the word vector of the keyword. The word vector then reflects the context in which the keyword is located, so that the similarity subsequently computed between keywords reflects the similarity of their contexts.

For example, the dialogue sentence in which the keyword first appears is input into the BERT model to obtain the encoding vector of the last layer of the BERT model; this encoding vector is the word vector of the keyword.

S402: Determine the vector similarity between different keywords based on their word vectors.

The vector similarity is the similarity between keywords. The present application is concerned with the similarity between any two keywords, so step S402 computes the vector similarity for each pair of keywords.

The similarity between keywords can likewise be determined from their word vectors in many ways; this is not limited here.

For example, in one possible implementation, cosine similarity may be used to compute the similarity between the word vectors of keywords. For any two keywords, such as keyword t_i and keyword t_j, the vector similarity Sim(t_i, t_j) between them can be computed by Formula 1:

$$\mathrm{Sim}(t_i, t_j) = \frac{\mathrm{Emb}(t_i) \cdot \mathrm{Emb}(t_j)}{\lVert \mathrm{Emb}(t_i) \rVert \, \lVert \mathrm{Emb}(t_j) \rVert}$$

where Emb(t_i) is the word vector of keyword t_i, and Emb(t_j) is the word vector of keyword t_j.

Keyword t_i and keyword t_j denote any two keywords in the dialogue text: t_i is the i-th keyword and t_j is the j-th keyword in the dialogue text, i and j are natural numbers greater than or equal to 1 and less than or equal to N, and N is the total number of keywords in the dialogue text.
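The cosine similarity used in step S402 can be sketched in plain Python. Word vectors are represented here as lists of floats; the function name and the choice to return 0 for a zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: similarity involving a zero vector is defined as 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for Emb(t_i) and Emb(t_j).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```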

S403: Determine the degree of correlation between different keywords by combining the vector similarity between them, the word frequency of the keywords in the dialogue text, and the number of times different keywords appear together in one dialogue sentence.

The word frequency of a keyword in the dialogue text is the number of times the keyword appears in the dialogue text.

The number of times different keywords appear together in one dialogue sentence is, for any two keywords, the total number of dialogue sentences in which both appear. For example, if keyword A and keyword B both appear in dialogue sentence 1, dialogue sentence 3, and dialogue sentence 7, the number of times they appear together in the same dialogue sentence is 3.
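The two counts used in step S403 can be sketched as follows, again assuming that each dialogue sentence is given as a list of its keywords. The function name and the sample dialogue are hypothetical; the sample mirrors the example above, with keywords A and B appearing together in sentences 1, 3, and 7.

```python
from collections import Counter
from itertools import combinations


def term_and_cooccurrence_counts(sentences):
    """Return (term_frequency, cooccurrence) for a list of keyword lists."""
    term_frequency = Counter()
    cooccurrence = Counter()
    for keywords in sentences:
        # Word frequency: every occurrence in the dialogue text counts.
        term_frequency.update(keywords)
        # Co-occurrence: each sentence counts at most once per keyword pair.
        for a, b in combinations(sorted(set(keywords)), 2):
            cooccurrence[frozenset((a, b))] += 1
    return term_frequency, cooccurrence


# Hypothetical dialogue of seven sentences (sentence 1 is the first list).
dialogue = [["A", "B"], ["C"], ["A", "B", "C"], ["A"], ["B"], ["C"], ["B", "A"]]
term_frequency, cooccurrence = term_and_cooccurrence_counts(dialogue)
```

Keying the co-occurrence counter by an unordered pair makes the count symmetric, which matches the statement that TS_ij and TS_ji have the same value.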

When determining the degree of correlation between keywords, this embodiment jointly considers the vector similarity between different keywords, the word frequency of the keywords in the dialogue text, and the number of times different keywords appear together in the same dialogue sentence. The correlation between keywords is thus analyzed more comprehensively, and the resulting degree of correlation reflects the association between keywords more accurately and completely.

The specific way of computing the degree of correlation can be set as needed and is not limited here.

In one possible implementation, the present application can compute the degree of correlation m_ij between keyword t_i and keyword t_j by Formula 2:

[Formula 2: equation image (Figure BDA0003570612100000131), not reproduced in the text]

where TF(t_i) is the word frequency of keyword t_i in the dialogue text, and TF(t_j) is the word frequency of keyword t_j in the dialogue text; TS_ij is the number of times keyword t_i and keyword t_j appear together in the same dialogue sentence, and TS_ji is the number of times keyword t_j and keyword t_i appear together in the same dialogue sentence. It can be understood that TS_ij and TS_ji have the same value.

In the above embodiments of the present application, the importance of a keyword can likewise be determined in many ways from the degree of correlation between different keywords and the degree of influence of the keyword on the dialogue text. For ease of understanding, one implementation is described below as an example.

FIG. 5 shows a schematic flowchart of one implementation of determining the importance of a keyword provided by an embodiment of the present application. The method of this embodiment may include:

S501: For each keyword, determine the first-occurrence dialogue sentence, that is, the dialogue sentence in which the keyword first appears in the dialogue text, and the sentence source of that first-occurrence dialogue sentence.

In this embodiment, the dialogue sentence in which a keyword first appears is again considered to have a large influence on the importance of the keyword. The present application therefore determines the importance of a keyword with reference to the sentence source of the dialogue sentence in which the keyword first appears.

For ease of distinction, the dialogue sentence in which a keyword first appears is referred to in the present application as the first-occurrence dialogue sentence of the keyword.

The sentence source of a keyword's first-occurrence dialogue sentence indicates which of the dialogue parties involved in the dialogue text produced that sentence.

For example, in a customer service scenario, the dialogue text is mainly a dialogue between a user and a customer service agent. The dialogue parties involved are therefore the user and the agent, and the sentence source is one of the two.

As another example, in a business communication scenario, the dialogue text may involve multiple business parties, and the sentence source is one of those parties.

S502: Determine the baseline importance of the keyword by combining the sentence source of the keyword's first-occurrence dialogue sentence with the degree of correlation between different keywords.

It can be understood that dialogue sentences from different sentence sources have different characteristics. When the first-occurrence dialogue sentences of keywords come from different sentence sources, the importance that a keyword's correlation with other keywords can indicate also differs.

For example, still in the customer service scenario, the longer a dialogue sentence sent by the user, the less suitable it is for generating a summary. Accordingly, if the first-occurrence dialogue sentence of a keyword comes from the user, then the longer that sentence is, the higher the correlation between the keyword and other keywords may be; yet the keyword is not one of the more important words in the dialogue text, and its importance is relatively low.

Conversely, dialogue sentences sent by the customer service agent are generally concise and contain little information unrelated to the essential content of the dialogue text. The longer a dialogue sentence from the agent, the better a keyword that first appears in it reflects the dialogue information of the dialogue text. For such a keyword, a higher correlation with other keywords therefore indicates higher importance.

In one possible implementation, where the sentence source is either the customer service agent or the user, the sum of the degrees of correlation between a keyword and the other keywords can be determined for any keyword. The baseline importance of the keyword is then determined by combining the sentence source of its first-occurrence dialogue sentence with this correlation sum.

If the first-occurrence dialogue sentence of the keyword comes from the customer service agent, the baseline importance of the keyword is negatively correlated with the correlation sum of the keyword.

If the first-occurrence dialogue sentence of the keyword comes from the user, the baseline importance of the keyword is positively correlated with the correlation sum of the keyword.

For example, when the first-occurrence dialogue sentence of keyword t_i comes from the customer service agent (Agent), the baseline importance SScore(t_i|Agent) of keyword t_i can be computed by Formula 3:

[Formula 3: equation image (Figure BDA0003570612100000151), not reproduced in the text]

When the first-occurrence dialogue sentence of keyword t_i comes from the user (User), the baseline importance SScore(t_i|User) of keyword t_i can be computed by Formula 4:

[Formula 4: equation image (Figure BDA0003570612100000152), not reproduced in the text]

Further, so that the baseline importance of different keywords can be compared, the baseline importance values may be normalized; the baseline importance of a keyword may then be the normalized value.

For example, let the baseline importance of keyword t_i be SScore(t_i) (for example, SScore(t_i|Agent) computed by Formula 3, or SScore(t_i|User) computed by Formula 4). It can be normalized by Formula 5 to obtain the normalized baseline importance softmax(t_i):

$$\mathrm{softmax}(t_i) = \frac{\exp\bigl(\mathrm{SScore}(t_i)\bigr)}{\sum_{l=1}^{N} \exp\bigl(\mathrm{SScore}(t_l)\bigr)}$$

where t_l is the l-th keyword in the dialogue text, l is a natural number greater than or equal to 1 and less than or equal to N, and N is the total number of keywords in the dialogue text.
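The normalization step can be sketched as follows, assuming that Formula 5 is the standard softmax over the baseline importance scores of all N keywords. The function name and the sample scores are hypothetical; subtracting the maximum score is a common numerical-stability measure that does not change the result.

```python
import math


def softmax_normalize(scores):
    """Normalize a dict of keyword -> baseline importance with a softmax."""
    if not scores:
        return {}
    peak = max(scores.values())
    # Shift by the maximum so that exp() cannot overflow; the ratio is unchanged.
    exponentials = {key: math.exp(value - peak) for key, value in scores.items()}
    total = sum(exponentials.values())
    return {key: value / total for key, value in exponentials.items()}


# Hypothetical baseline importance scores SScore(t_i).
baseline_scores = {"printer": 2.0, "driver": 1.0, "usb": 1.0}
normalized = softmax_normalize(baseline_scores)
```

The normalized values sum to 1 and preserve the ordering of the original scores, which is what makes them comparable across keywords.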

S503: Determine the importance of the keyword by combining its baseline importance with its degree of influence on the dialogue text.

The higher the baseline importance and the higher the degree of influence, the higher the importance of the keyword.

For example, in one possible implementation, the degree of influence of the keyword on the dialogue text may be used to adjust the baseline importance of the keyword to obtain the importance of the keyword.

For example, suppose the degree of influence of a keyword on the dialogue text is expressed as an influence score, where a larger score means a higher degree of influence, and the baseline importance is expressed as a baseline importance score. The product of the baseline importance score and the influence score can then be taken as the importance score representing the importance of the keyword.

It can be understood that the above is only one possible case. In practice, the baseline importance score of a keyword and its influence score may instead be summed, and the sum used as the importance score of the keyword. The importance of a keyword may of course be determined in other ways as well; this is not limited here.
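The two combinations mentioned for step S503, product and sum, can be sketched as a single helper. The function name, the `mode` parameter, and the sample values are hypothetical.

```python
def importance_score(baseline_score, influence_score, mode="product"):
    """Combine a baseline importance score with an influence score."""
    if mode == "product":
        return baseline_score * influence_score
    if mode == "sum":
        return baseline_score + influence_score
    raise ValueError(f"unsupported mode: {mode!r}")


# Hypothetical scores for one keyword.
by_product = importance_score(0.5, 4.0)
by_sum = importance_score(0.5, 4.0, mode="sum")
```

With non-negative scores, both combinations increase with either input, consistent with the statement that a higher baseline importance and a higher degree of influence give a higher importance.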

In the embodiment of FIG. 5, when determining the importance of a keyword, the first-occurrence dialogue sentence of the keyword is taken to have a large influence on the keyword, and the baseline importance of the keyword is determined from both the source of that sentence and the correlation between the keyword and other keywords. The degree of association between the keyword and other keywords, and the importance of the keyword, can thus be determined more comprehensively and accurately.

It can be understood that when different dialogue parties exchange dialogue sentences, the exchange usually involves several parts, such as asking a question and answering it, and different parts mainly express different intentions. On this basis, the present application may also divide the dialogue text into at least one dialogue partition, where the dialogue sentences in different dialogue partitions represent different categories of dialogue intention.

Because the dialogue sentences in different dialogue partitions belong to different dialogue intentions, they also differ in how important they are for representing the key information of the dialogue text. To generate a more reasonable summary that expresses the gist of the dialogue text more completely and accurately, the present application may therefore determine the dialogue sentences that make up the summary separately within each dialogue partition.

Specifically, the target dialogue sentences used to compose the summary can be determined within each dialogue partition based on the importance of the keywords, and the summary is composed of the target dialogue sentences from the dialogue partitions.

In one possible implementation, because the dialogue sentences in different dialogue partitions differ in importance, the number of target dialogue sentences determined for generating the summary may also differ between dialogue partitions.

The solution of the present application is described below with reference to this possible implementation. FIG. 6 shows another schematic flowchart of the summary generation method provided by an embodiment of the present application. The method of this embodiment may include:

S601: Determine the keywords in each dialogue sentence of the dialogue text.

S602: Determine the degree of correlation between different keywords and the order of appearance of the keywords within the dialogue sentences.

S603: Determine the degree of influence of each keyword on the dialogue text based on the order of appearance.

S604: Determine the importance of each keyword by combining the degree of correlation between different keywords with the degree of influence of the keyword on the dialogue text.

For steps S601 to S604, refer to the description of the preceding embodiments; details are not repeated here.

S605: Divide the dialogue text into at least one dialogue partition.

The dialogue sentences in different dialogue partitions represent different categories of dialogue intention.

For example, the dialogue intention of each dialogue sentence can be determined from its semantics, and adjacent dialogue sentences with the same dialogue intention are grouped into one dialogue partition, yielding at least one dialogue partition.

It can be understood that the dialogue sentences in consultation and chat dialogue texts follow a particular dialogue logic, and the logical components of that dialogue logic represent different dialogue intents. The dialogue text can therefore be divided, according to its dialogue logic, into at least one logical partition, and each logical partition is a dialogue partition.

For example, a dialogue between several parties generally starts with greetings, then raises a problem, then discusses the problem or confirms its details, and finally resolves it. The dialogue text can therefore be divided into four dialogue partitions. The intent of the first dialogue partition is greeting, and it contains the greeting-related dialogue sentences. The intent of the second dialogue partition is the problem title, and it contains one or more dialogue sentences that raise the problem and establish what the problem is. The intent of the third dialogue partition may be problem-detail communication, and it may contain at least one dialogue sentence that establishes the specific details of the problem and explores possible directions for it. The intent of the fourth dialogue partition may be determining the cause of the problem, and it may contain the dialogue sentences concerning the cause of the problem and the solution that was settled on.

Of course, the greeting at the start of the dialogue text and the goodbye sentences that end it contribute little to determining the topic that the dialogue expresses, so the greeting and goodbye parts of the dialogue text can be removed.
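The partitioning described above can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes each dialogue sentence already carries an intent label (how the label is derived from the semantics is not specified here), and the function name `partition_by_intent` and the label strings are hypothetical.

```python
from itertools import groupby


def partition_by_intent(sentences, intents, drop=("greeting", "farewell")):
    """Group adjacent sentences that share an intent label into partitions.

    `intents[i]` is the (assumed, precomputed) intent label of `sentences[i]`.
    Partitions whose intent is listed in `drop` are discarded, mirroring the
    removal of greeting and goodbye sentences.
    """
    partitions = []
    labelled = zip(intents, sentences)
    for intent, group in groupby(labelled, key=lambda pair: pair[0]):
        if intent in drop:
            continue
        partitions.append((intent, [sentence for _, sentence in group]))
    return partitions


sentences = [
    "Hi", "My laptop won't boot", "It shows a black screen",
    "Did you try a reset?", "Yes, no luck", "Replace the battery", "Bye",
]
intents = [
    "greeting", "problem", "problem",
    "details", "details", "solution", "farewell",
]
partitions = partition_by_intent(sentences, intents)
```

Because `groupby` only merges consecutive items, two runs of the same intent separated by another intent would become two partitions, which matches the requirement that a partition consist of adjacent sentences.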

For example:

FIG. 7 is a schematic diagram of the dialogue partitions into which a dialogue text is divided.

In FIG. 7, each "sentence" is one dialogue sentence, and the length of the box containing it indicates the length of that dialogue sentence.

In the dialogue text shown in FIG. 7, the first item is at least one greeting-related dialogue sentence 701. This part can be removed, which FIG. 7 indicates with a cross. Likewise, the last dialogue sentences in the dialogue text are closing sentences 702, such as goodbyes, that end the dialogue; this part can also be deleted.

On this basis, the remaining dialogue text in FIG. 7 is divided into three parts, that is, three dialogue partitions: the first dialogue partition 703 at the top, the second dialogue partition 704 in the middle, and the third dialogue partition 705 at the bottom.

It can be understood that, because the parties to a dialogue proceed step by step according to the dialogue logic, dialogue sentences that share a dialogue intent are generally adjacent and consecutive. As FIG. 7 shows, the dialogue sentences within each dialogue partition are consecutive.

For example, in FIG. 7 the first dialogue partition 703 may be the part that states the problem. It includes the first to sixth dialogue sentences after the greeting, shown as sentence 1 to sentence 6 in FIG. 7. Accordingly, the exchange in sentences 1 to 6 mainly serves to establish the problem to be discussed or consulted on.

In FIG. 7, the second dialogue partition 704 may be the problem-detail communication part, which may include sentences 7 and 8.

The third dialogue partition 705 is the problem-solving part, which may include sentences 9 and 10.

Of course, the above is only one possible way of dividing a dialogue text into dialogue partitions. In practice, the number of dialogue partitions and the way of dividing them can be set as required, and no limitation is imposed here.

It can be understood that the order of step S605 relative to any of steps S601 to S604 is not limited to the numbering in FIG. 6. In practice, step S605 may be performed before steps S601 to S604, or at the same time as any of steps S601 to S604.

S606: For each dialogue partition, determine the target dialogue sentences of that partition, based on the target number of sentences to be extracted from partitions of each dialogue intent and on the importance of the keywords. The number of target dialogue sentences determined equals the target number for that partition.

Extracting the target number of target dialogue sentences from a dialogue partition is similar to the earlier process of determining dialogue sentences from the whole dialogue text.

For example, the importance of each dialogue sentence can be determined from the importance of the keywords it contains (see the earlier description for the specific implementation). Then, for each dialogue partition, the most important dialogue sentences in that partition are extracted, up to the partition's target number.

The target number for partitions with different dialogue intents may be different or the same, and can be set as required.

Dialogue sentences in partitions with different intents fit the topic of the dialogue text to different degrees, that is, they differ in importance, so the present application can set different target numbers for partitions with different dialogue intents. For example, because the dialogue sentences in the partition where the problem is raised reflect the topic of the dialogue text, the target number for that partition (for example, the first dialogue partition in FIG. 7) can be set relatively high.

For example, continuing with FIG. 7, suppose the target number is 3 for the partition whose intent is raising the problem, 2 for the partition describing the problem details, and 1 for the partition related to solving the problem. Then the 3 most important dialogue sentences are extracted from the first dialogue partition in FIG. 7 as target dialogue sentences, 2 target dialogue sentences are extracted from the second dialogue partition, and 1 from the third.

S607: Generate the summary of the dialogue text from the target dialogue sentences of the dialogue partitions.

For example, the target dialogue sentences of the partitions can be combined into the summary following the order of the dialogue partitions and, within each partition, the order of its target dialogue sentences.
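Steps S606 and S607 can be sketched as below, using the quotas 3, 2 and 1 from the FIG. 7 example. This is only a sketch under assumptions: sentence scores are taken as given, sentences are identified by their index in the dialogue text, and the names `select_targets`, `build_summary`, and the default quota of 1 are hypothetical.

```python
def select_targets(partitions, sentence_scores, quotas):
    """Pick the top-scoring sentences of each partition, up to its quota.

    partitions: list of (intent, [sentence indices]) in dialogue order.
    sentence_scores: dict mapping sentence index -> importance score.
    quotas: dict mapping intent -> target number (assumed default: 1).
    """
    chosen = []
    for intent, indices in partitions:
        k = quotas.get(intent, 1)
        top = sorted(indices, key=lambda i: sentence_scores[i], reverse=True)[:k]
        chosen.extend(sorted(top))  # restore dialogue order inside the partition
    return chosen


def build_summary(sentences, chosen):
    """Join the chosen sentences in the order they were selected."""
    return " ".join(sentences[i] for i in chosen)


sentences = [f"s{i + 1}" for i in range(10)]
partitions = [
    ("problem", [0, 1, 2, 3, 4, 5]),
    ("details", [6, 7]),
    ("solution", [8, 9]),
]
quotas = {"problem": 3, "details": 2, "solution": 1}
sentence_scores = {0: 0.1, 1: 0.9, 2: 0.4, 3: 0.8, 4: 0.2,
                   5: 0.7, 6: 0.5, 7: 0.3, 8: 0.2, 9: 0.6}

chosen = select_targets(partitions, sentence_scores, quotas)
summary = build_summary(sentences, chosen)
```

Sorting the selected indices before appending them keeps the summary in dialogue order, so the extracted sentences still follow the partition sequence.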

In this embodiment, the dialogue text is divided into at least one dialogue partition by dialogue intent, and the dialogue sentences used for the summary are extracted from each partition. This helps the summary cover sentences that represent different intents, so that it reflects the content of the dialogue text more completely.

In addition, because the present application can build the summary from the dialogue sentences extracted from each partition in partition order, the summary can also reflect the dialogue logic of the dialogue text.

To make the solution of the present application easier to understand, it is described below using the example of generating a dialogue summary with the help of a directed graph.

FIG. 8 shows another schematic flowchart of the summary generation method provided in an embodiment of the present application. The method of this embodiment may include:

S801: Determine the keywords in each dialogue sentence of the dialogue text, obtaining the N keywords of the dialogue text.

S802: Divide the dialogue text into at least one dialogue partition.

Each dialogue partition includes at least one dialogue sentence, and different dialogue partitions contain different dialogue sentences.

The dialogue sentences in different dialogue partitions represent different categories of dialogue intent.

Steps S801 and S802 need not be performed in the order shown in FIG. 8.

S803: For any two keywords in the dialogue text, determine the correlation between them.

It can be understood that computing the correlation between every pair of keywords in the dialogue text yields an N×N co-occurrence matrix M. Element m_ij of the matrix is the correlation between keyword t_i and keyword t_j; different elements correspond to different keyword pairs, each pair consisting of two keywords in the dialogue text.

For the calculation of the correlation, refer to the description of formula 2 above; it is not repeated here.

S804: For any two keywords in the dialogue text, determine the co-occurrence dialogue sentence in which the two keywords first appear together, and obtain the order in which the two keywords appear in that sentence.

S805: Based on the pairwise correlations between keywords in the dialogue text and the order in which any two keywords appear in their co-occurrence dialogue sentence, construct a directed graph representing the associations between the keywords of the dialogue text.

The directed graph includes a plurality of nodes and directed edges between them. Each node represents one keyword. A directed edge between two nodes indicates that the correlation between the two corresponding keywords is not zero, and the direction of the edge indicates the order in which the two keywords appear in their co-occurrence dialogue sentence.

It can be understood from formula 2 above that if two keywords never appear together in the same dialogue sentence, their correlation is zero. The correlation can therefore be used to decide whether a directed edge exists between the nodes of two keywords.

In this embodiment, the weight of a directed edge between two nodes may also be the correlation between the two keywords that the nodes represent. Specifically, the weight (that is, the correlation) of each directed edge can be read from the co-occurrence matrix M.
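Steps S804 and S805 can be illustrated with the sketch below. It makes several assumptions that the text does not fix: the correlations are supplied as a precomputed lookup (formula 2 is not reproduced here), each edge points from the keyword that appears earlier to the one that appears later, and the name `build_keyword_graph` and the data layout are hypothetical.

```python
from itertools import combinations


def build_keyword_graph(sentence_keywords, correlation):
    """Build weighted directed edges between co-occurring keywords.

    sentence_keywords: one keyword list per dialogue sentence, with keywords
        in their order of appearance inside the sentence.
    correlation: dict mapping frozenset({a, b}) -> correlation value.
    Returns a dict {(earlier, later): weight}; the earlier-to-later direction
    is an assumption of this sketch.
    """
    edges = {}
    seen_pairs = set()
    for keywords in sentence_keywords:
        ordered = list(dict.fromkeys(keywords))  # first position of each keyword
        for a, b in combinations(ordered, 2):
            pair = frozenset((a, b))
            if pair in seen_pairs:
                continue  # only the first co-occurrence sentence fixes the order
            seen_pairs.add(pair)
            weight = correlation.get(pair, 0.0)
            if weight != 0:
                edges[(a, b)] = weight  # zero correlation means no edge
    return edges


sentence_keywords = [
    ["battery", "laptop"],
    ["laptop", "battery", "charger"],
]
correlation = {
    frozenset(("battery", "laptop")): 0.8,
    frozenset(("laptop", "charger")): 0.5,
    frozenset(("battery", "charger")): 0.0,
}
edges = build_keyword_graph(sentence_keywords, correlation)
```

In the second sentence "laptop" precedes "battery", but the edge still runs from "battery" to "laptop" because the pair first co-occurred in the first sentence.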

It can be understood that, to show which dialogue partition the sentence containing a keyword's first occurrence comes from, the directed graph constructed in the present application can also mark the dialogue partition of each node's keyword. As shown in FIG. 9, the directed graph is divided by the dialogue partitions to which the node keywords belong; for example, the keywords represented by node A and node B come from the first dialogue partition. The other nodes are similar and are not described further.

S806: For each node in the directed graph, use the HITS algorithm to determine the node's authority value and hub value, and take the larger of the two as the influence value of the node's keyword on the dialogue text.
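Step S806 can be sketched with a plain power-iteration version of HITS. The text only names the algorithm, so the details here are assumptions: edge weights are ignored, scores are L2-normalised after every update, a fixed number of iterations is used, and the function names `hits` and `influence` are hypothetical.

```python
import math


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of nodes pointing at the node.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum of the authority scores of nodes the node points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


def influence(edges):
    """Influence of each keyword: the larger of its authority and hub values."""
    auth, hub = hits(edges)
    return {n: max(auth[n], hub[n]) for n in auth}


edges = [("A", "B"), ("A", "C"), ("B", "C")]
auth, hub = hits(edges)
scores = influence(edges)
```

In this small graph node A has no incoming edges, so its authority value is zero and its influence comes entirely from its hub value; node C is the mirror case.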

S807: For each node in the directed graph, determine the first-occurrence dialogue sentence, that is, the dialogue sentence in which the node's keyword first appears in the dialogue text, and the source party of that sentence.

S808: Determine the baseline importance of the node's keyword by combining the source party of the first-occurrence dialogue sentence with the correlations on the directed edges between this node and the other nodes of the directed graph.

For the process of determining the baseline importance of a keyword, refer to the description of the preceding embodiments; it is not repeated here.

Note that this embodiment determines the baseline importance of a keyword with the help of the directed graph. The directed graph merely shows the correlation between a keyword and the other keywords more clearly; determining the baseline importance directly from the correlations between keywords, without the directed graph, is equally applicable to this embodiment.

S809: Determine the importance score of the node's keyword by combining the keyword's baseline importance with its influence value on the dialogue text.

For example, take the case where the baseline importance is expressed as a baseline importance value. Suppose formulas 3 to 5 above give the baseline importance value of keyword t_i as softmax(t_i). The importance score S(t_i) of keyword t_i can then be obtained by formula 6:

$$S(t_i) = \mathrm{softmax}(t_i) \cdot \bigl(1 + \max\bigl(\mathrm{HITS}_A(t_i),\ \mathrm{HITS}_H(t_i)\bigr)\bigr) \qquad \text{(Formula 6)}$$

Here, HITS_A(t_i) is the authority value of keyword t_i, and HITS_H(t_i) is the hub value of keyword t_i.

S810: For any dialogue sentence in the dialogue text, take the importance score of the highest-scoring keyword in that sentence as the score of the sentence.
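Formula 6 and step S810 translate almost directly into code. The sketch below assumes the baseline importance, authority and hub values have already been computed; the function names, the sample numbers, and the score of 0.0 for a sentence without keywords are assumptions for illustration.

```python
def keyword_score(baseline, authority, hub):
    """Formula 6: baseline importance scaled by 1 + max(authority, hub)."""
    return baseline * (1.0 + max(authority, hub))


def sentence_score(keywords, scores):
    """Step S810: a sentence scores as high as its best keyword.

    Returning 0.0 for a sentence with no keywords is an assumption.
    """
    return max((scores[k] for k in keywords), default=0.0)


baseline = {"battery": 0.5, "laptop": 0.3, "charger": 0.2}
authority = {"battery": 0.0, "laptop": 0.6, "charger": 0.8}
hub = {"battery": 0.9, "laptop": 0.4, "charger": 0.0}

scores = {k: keyword_score(baseline[k], authority[k], hub[k]) for k in baseline}
best = sentence_score(["laptop", "charger"], scores)
```

Because the factor is 1 plus the influence value, a keyword with zero influence keeps its baseline importance rather than being zeroed out.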

S811: For each dialogue partition, extract the highest-scoring dialogue sentences from that partition as target dialogue sentences, up to the target number of sentences to be extracted for the partition.

S812: Combine the target dialogue sentences extracted from the dialogue partitions into the summary of the dialogue text, following the order of the dialogue partitions and, within each partition, the order of its extracted target dialogue sentences.

Corresponding to the summary generation method provided in the embodiments of the present application, an embodiment of the present application also provides a summary generation apparatus.

FIG. 10 shows a schematic structural diagram of one embodiment of the summary generation apparatus provided in an embodiment of the present application. The apparatus of this embodiment may include:

a keyword determination unit 1001, configured to determine the keywords in each dialogue sentence of the dialogue text;

an association determination unit 1002, configured to determine the correlation between different keywords and the order in which keywords appear within the dialogue sentences;

an influence determination unit 1003, configured to determine, based on the order of appearance, the degree of influence of each keyword on the dialogue text;

an importance determination unit 1004, configured to determine the importance of each keyword by combining the correlation between different keywords with the keyword's degree of influence on the dialogue text;

a summary generation unit 1005, configured to generate a summary of the dialogue text based on the importance of the keywords, the summary including the dialogue sentences of the dialogue text that contain keywords whose importance meets a condition.

In one possible implementation, the apparatus further includes:

a dialogue partitioning unit, configured to divide the dialogue text into at least one dialogue partition before the summary generation unit generates the summary of the dialogue text, where the dialogue sentences in different dialogue partitions represent different categories of dialogue intent;

the summary generation unit is specifically configured to determine, based on the importance of the keywords, the target dialogue sentences in each dialogue partition that are used to compose the summary, and to obtain a summary composed of the target dialogue sentences of the dialogue partitions.

In another possible implementation, the summary generation unit includes:

a sentence determination unit, configured to determine, from each dialogue partition, the target number of target dialogue sentences, based on the target number of sentences to be extracted from partitions of each dialogue intent and on the importance of the keywords.

In another possible implementation, the association determination unit includes:

a co-occurrence determination unit, configured to determine, for each keyword, the co-occurring keywords of that keyword and the co-occurrence dialogue sentence in which the keyword and each co-occurring keyword first appear together, a co-occurring keyword of a keyword being another keyword that appears in the same dialogue sentence as the keyword;

an order determination unit, configured to determine the order in which the keyword and its co-occurring keyword appear in the co-occurrence dialogue sentence.

In another possible implementation, the influence determination unit includes:

a quantity determination subunit, configured to determine, based on the order of appearance, a first quantity of other keywords located after the keyword and a second quantity of other keywords located before the keyword;

an influence determination subunit, configured to determine the degree of influence of the keyword on the dialogue text based on the first quantity and the second quantity corresponding to the keyword.
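The two quantities used by the quantity determination subunit can be counted directly from ordered keyword pairs, as in the sketch below. It assumes each pair is given as (earlier keyword, later keyword), and the name `order_counts` is hypothetical. How the two quantities are combined into a degree of influence is not specified in this passage, so the sketch stops at the counts.

```python
def order_counts(ordered_pairs):
    """Count, per keyword, how many keywords follow it and how many precede it.

    ordered_pairs: iterable of (earlier, later) keyword pairs, one per
        co-occurring pair of keywords.
    Returns {keyword: (first_quantity, second_quantity)}, where the first
    quantity counts keywords located after it and the second counts keywords
    located before it.
    """
    counts = {}
    for earlier, later in ordered_pairs:
        after, before = counts.get(earlier, (0, 0))
        counts[earlier] = (after + 1, before)
        after, before = counts.get(later, (0, 0))
        counts[later] = (after, before + 1)
    return counts


counts = order_counts([("A", "B"), ("A", "C"), ("B", "C")])
```

In graph terms these are the out-degree and in-degree of each node when edges point from the earlier keyword to the later one.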

In another possible implementation, the order determination unit includes:

a directed graph construction unit, configured to construct a directed graph representing the associations between different keywords in the dialogue text;

where the directed graph includes a plurality of nodes and directed edges between them, each node represents one keyword, a directed edge between two nodes indicates that the two corresponding keywords are co-occurring keywords of each other, and the direction of the directed edge indicates the order in which the two keywords appear in their co-occurrence dialogue sentence;

the influence determination unit is specifically configured to determine the degree of influence of a node's keyword on the dialogue text according to the number and direction of the directed edges between nodes in the directed graph.

In another possible implementation, the importance determination unit includes:

a source party determination subunit, configured to determine, for each keyword, the first-occurrence dialogue sentence in which the keyword first appears in the dialogue text and the source party of that sentence;

a baseline determination subunit, configured to determine the baseline importance of the keyword by combining the source party of the keyword's first-occurrence dialogue sentence with the correlation between different keywords;

an importance determination subunit, configured to determine the importance of the keyword by combining the keyword's baseline importance with its degree of influence on the dialogue text.

In another possible implementation, the source party determined by the source party determination subunit is one of customer service and user;

the baseline determination subunit includes:

a sum determination subunit, configured to determine the correlation sum of the keyword, that is, the sum of the correlations between the keyword and the other keywords;

a baseline degree determination subunit, configured to determine the baseline importance of the keyword by combining the source party of the keyword's first-occurrence dialogue sentence with the keyword's correlation sum; where, if the first-occurrence dialogue sentence comes from customer service, the baseline importance of the keyword is negatively correlated with its correlation sum, and if the first-occurrence dialogue sentence comes from the user, the baseline importance of the keyword is positively correlated with its correlation sum.
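The sign rule above can be illustrated with the sketch below. Formulas 3 to 5 are not reproduced in this passage, so this is not the application's formula: it is one hypothetical way to meet the stated constraint, negating the correlation sum for keywords first spoken by customer service and then applying a softmax over all keywords. The function name and the speaker labels "user" and "agent" are assumptions.

```python
import math


def baseline_importance(correlation_sums, first_speaker):
    """Hypothetical baseline importance that respects the sign rule.

    correlation_sums: dict keyword -> sum of its correlations with the other
        keywords (must be non-empty).
    first_speaker: dict keyword -> "user" or "agent", the source party of the
        sentence in which the keyword first appears.
    """
    # Assumption: user keywords keep their sum, agent keywords get it negated.
    signed = {
        k: total if first_speaker[k] == "user" else -total
        for k, total in correlation_sums.items()
    }
    peak = max(signed.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - peak) for k, v in signed.items()}
    norm = sum(exps.values())
    return {k: e / norm for k, e in exps.items()}


correlation_sums = {"battery": 2.0, "warranty": 2.0, "screen": 1.0}
first_speaker = {"battery": "user", "warranty": "agent", "screen": "user"}
baseline = baseline_importance(correlation_sums, first_speaker)
```

With equal correlation sums, the keyword first raised by the user ("battery") ends up with a higher baseline than the one first raised by customer service ("warranty"), which is the behaviour the sign rule asks for.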

In another possible implementation, the association determination unit includes:

a vector determination subunit, configured to determine, for each keyword, the word vector of the keyword based on the dialogue sentence in which the keyword first appears in the dialogue text;

a similarity determination subunit, configured to determine the vector similarity between different keywords based on the word vectors of the keywords;

a correlation determination subunit, configured to determine the correlation between different keywords by combining the vector similarity between the keywords, the word frequency of the keywords in the dialogue text, and the number of times the keywords appear together in one dialogue sentence.

In yet another aspect, the present application also provides an electronic device. FIG. 11 shows a schematic structural diagram of the electronic device, which may be any type of electronic device and includes at least a processor 1101 and a memory 1102;

The processor 1101 is configured to execute the summary generation method of any of the above embodiments.

The memory 1102 is configured to store the programs that the processor needs in order to perform its operations.

It can be understood that the electronic device may further include a display unit 1103 and an input unit 1104.

Of course, the electronic device may have more or fewer components than shown in FIG. 11, and no limitation is imposed here.

In another aspect, the present application also provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the summary generation method of any of the above embodiments.

The present application also proposes a computer program that includes computer instructions stored in a computer-readable storage medium. When run on an electronic device, the computer program executes the summary generation method of any of the above embodiments.

Note that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments can be cross-referenced. The features described in the embodiments of this specification can also be substituted for or combined with one another, so that those skilled in the art can implement or use the present application. Because the apparatus embodiments are essentially similar to the method embodiments, they are described briefly; for the related details, refer to the corresponding parts of the method embodiments.

Finally, note that in this document relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to be non-exclusive, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The above are only preferred embodiments of the present application. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and that these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A summary generation method, comprising:
determining keywords in each dialogue sentence of a dialogue text;
determining the relevance between different keywords and the order of occurrence of the keywords within the dialogue sentences;
determining the degree of influence of each keyword on the dialogue text based on the order of occurrence;
determining the importance of each keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text; and
generating a summary of the dialogue text based on the importance of the keywords, the summary comprising the dialogue sentences of the dialogue text that contain keywords whose importance satisfies a condition.
2. The method of claim 1, further comprising, before generating the summary of the dialogue text:
dividing the dialogue text into at least one dialogue partition, wherein the dialogue sentences in different dialogue partitions represent different types of dialogue intent;
wherein generating the summary of the dialogue text based on the importance of the keywords comprises:
determining, based on the importance of the keywords, target dialogue sentences within each dialogue partition for composing the summary, to obtain a summary composed of the target dialogue sentences of each dialogue partition.
3. The method of claim 2, wherein determining the target dialogue sentences within the dialogue partition for composing the summary based on the importance of the keywords comprises:
determining, based on the target number of sentences to be extracted from dialogue partitions of different dialogue intents and in combination with the importance of the keywords, the target number of target dialogue sentences from each dialogue partition.
4. The method of claim 1, wherein determining the order of occurrence of the keywords within the dialogue sentences comprises:
for each keyword, determining the co-occurring keywords of the keyword and the co-occurring dialogue sentence in which the keyword and a co-occurring keyword first appear together, wherein the co-occurring keywords of a keyword are the other keywords that appear in the same dialogue sentence as the keyword; and
determining the order of occurrence of the keyword and its co-occurring keyword within the co-occurring dialogue sentence.
5. The method of claim 1 or 4, wherein determining the degree of influence of the keyword on the dialogue text based on the order of occurrence comprises:
determining, based on the order of occurrence, a first number of other keywords that are located after the keyword and a second number of other keywords that are located before the keyword; and
determining the degree of influence of the keyword on the dialogue text based on the first number and the second number corresponding to the keyword.
6. The method of claim 4, wherein determining the order of occurrence of the keyword and its co-occurring keyword within the co-occurring dialogue sentence comprises:
constructing a directed graph representing the association relationships between different keywords in the dialogue text;
wherein the directed graph comprises a plurality of nodes and directed edges between the nodes, each node represents a keyword, a directed edge between two nodes indicates that the two keywords corresponding to those nodes are co-occurring keywords, and the direction of the directed edge indicates the order in which the two keywords occur in their co-occurring dialogue sentence;
and wherein determining the degree of influence of the keyword on the dialogue text based on the order of occurrence comprises:
determining the degree of influence on the dialogue text of the keyword represented by each node according to the number and direction of the directed edges between different nodes in the directed graph.
7. The method of claim 1, wherein determining the importance of the keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text comprises:
for each keyword, determining the first-occurrence dialogue sentence, in which the keyword appears for the first time in the dialogue text, and the sentence source party of that first-occurrence dialogue sentence;
determining a reference importance of the keyword by combining the sentence source party of the first-occurrence dialogue sentence corresponding to the keyword with the relevance between different keywords; and
determining the importance of the keyword by combining the reference importance of the keyword with the degree of influence of the keyword on the dialogue text.
8. The method of claim 7, wherein the sentence source party is one of customer service and a user;
and wherein determining the reference importance of the keyword by combining the sentence source party of the first-occurrence dialogue sentence corresponding to the keyword with the relevance between different keywords comprises:
determining a relevance sum, being the sum of the relevances between the keyword and the other keywords; and
determining the reference importance of the keyword by combining the sentence source party of the first-occurrence dialogue sentence corresponding to the keyword with the relevance sum corresponding to the keyword;
wherein, if the first-occurrence dialogue sentence corresponding to the keyword comes from customer service, the reference importance of the keyword is negatively correlated with the relevance sum corresponding to the keyword;
and, if the first-occurrence dialogue sentence corresponding to the keyword comes from the user, the reference importance of the keyword is positively correlated with the relevance sum corresponding to the keyword.
9. The method of claim 1, wherein determining the relevance between different keywords comprises:
for each keyword, determining a word vector of the keyword based on the dialogue sentence of the dialogue text in which the keyword appears for the first time;
determining the vector similarity between different keywords based on the word vectors of the keywords; and
determining the relevance between different keywords by combining the vector similarity between the keywords, the word frequency of the keywords in the dialogue text, and the number of times the keywords occur in a dialogue sentence.
10. A summary generation apparatus, comprising:
a keyword determining unit, configured to determine keywords in each dialogue sentence of a dialogue text;
an association determining unit, configured to determine the relevance between different keywords and the order of occurrence of the keywords within the dialogue sentences;
an influence determining unit, configured to determine the degree of influence of each keyword on the dialogue text based on the order of occurrence;
an importance determining unit, configured to determine the importance of each keyword by combining the relevance between different keywords with the degree of influence of the keyword on the dialogue text; and
a summary generation unit, configured to generate a summary of the dialogue text based on the importance of the keywords, the summary comprising the dialogue sentences of the dialogue text that contain keywords whose importance satisfies a condition.
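
For illustration only, the flow recited in claims 1 and 4 to 8 can be sketched in Python. The claims do not give formulas, so the smoothed ratio used for influence, the `1 / (1 + total)` form for keywords first raised by customer service, the product used to combine the two scores, the threshold condition, and every function and variable name here are assumptions made for this sketch. The pairwise relevance of claim 9 is taken as a precomputed input rather than derived from word vectors.

```python
from itertools import combinations


def first_cooccurrence_edges(sentences):
    """Directed edges (a, b): a precedes b in the first sentence where both occur."""
    edges = {}
    for keywords in sentences:
        ordered = list(dict.fromkeys(keywords))  # de-duplicate, keep order
        for a, b in combinations(ordered, 2):
            # setdefault keeps the order seen in the FIRST co-occurring sentence
            edges.setdefault(frozenset((a, b)), (a, b))
    return set(edges.values())


def influence(edges, keywords):
    """Assumed formula: smoothed share of co-occurring keywords that follow k."""
    after = {k: 0 for k in keywords}   # "first number": keywords located after k
    before = {k: 0 for k in keywords}  # "second number": keywords located before k
    for a, b in edges:
        after[a] += 1
        before[b] += 1
    return {k: (after[k] + 1) / (after[k] + before[k] + 2) for k in keywords}


def reference_importance(relevance, first_speaker, keywords):
    """Assumed formula: grows with the relevance sum for user keywords,
    shrinks with it for customer-service ("agent") keywords."""
    ref = {}
    for k in keywords:
        total = sum(relevance.get(frozenset((k, o)), 0.0)
                    for o in keywords if o != k)
        ref[k] = total if first_speaker[k] == "user" else 1.0 / (1.0 + total)
    return ref


def summarize(dialogue, relevance, threshold):
    """dialogue: list of (speaker, text, keywords-in-order) tuples."""
    sentences = [kws for _, _, kws in dialogue]
    keywords = list(dict.fromkeys(k for kws in sentences for k in kws))
    first_speaker = {}
    for speaker, _, kws in dialogue:
        for k in kws:
            first_speaker.setdefault(k, speaker)  # source of first occurrence
    infl = influence(first_cooccurrence_edges(sentences), keywords)
    ref = reference_importance(relevance, first_speaker, keywords)
    importance = {k: ref[k] * infl[k] for k in keywords}  # assumed combination
    chosen = {k for k, v in importance.items() if v >= threshold}
    summary = [text for _, text, kws in dialogue if chosen & set(kws)]
    return summary, importance


if __name__ == "__main__":
    dialogue = [
        ("user", "My laptop battery drains fast.", ["laptop", "battery"]),
        ("agent", "Please update the firmware.", ["firmware"]),
        ("user", "The battery firmware update failed.", ["battery", "firmware"]),
    ]
    relevance = {
        frozenset(("laptop", "battery")): 0.8,
        frozenset(("battery", "firmware")): 0.5,
    }
    summary, importance = summarize(dialogue, relevance, threshold=0.5)
    print(summary)
    print(importance)
```

A dialogue-partition step as in claims 2 and 3 could be layered on top by calling a selection step per partition with its own sentence quota; that is left out here to keep the sketch short.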
CN202210318418.1A 2022-03-29 2022-03-29 Abstract generation method and device Pending CN114661893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210318418.1A CN114661893A (en) 2022-03-29 2022-03-29 Abstract generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210318418.1A CN114661893A (en) 2022-03-29 2022-03-29 Abstract generation method and device

Publications (1)

Publication Number Publication Date
CN114661893A (en) 2022-06-24

Family

ID=82033221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210318418.1A Pending CN114661893A (en) 2022-03-29 2022-03-29 Abstract generation method and device

Country Status (1)

Country Link
CN (1) CN114661893A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010244339A (en) * 2009-04-07 2010-10-28 Nippon Telegr & Teleph Corp <Ntt> Related keyword presentation device and program
US20130339021A1 (en) * 2012-06-19 2013-12-19 International Business Machines Corporation Intent Discovery in Audio or Text-Based Conversation
CN111125348A (en) * 2019-11-25 2020-05-08 北京明略软件系统有限公司 Text abstract extraction method and device
CN112836016A (en) * 2021-02-05 2021-05-25 北京字跳网络技术有限公司 Method, apparatus, device and storage medium for generating meeting minutes
KR102280488B1 (en) * 2020-11-19 2021-07-22 주식회사 두유비 Conversation content summarization method based on sentence priority and keyword importance

Similar Documents

Publication Publication Date Title
JP7323504B2 (en) METHOD, APPARATUS, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM FOR GENERATION OF MEETINGS
TWI732271B (en) Human-machine dialog method, device, electronic apparatus and computer readable medium
US20220156298A1 (en) Providing agent-assist, context-aware recommendations
CN110692050B (en) Adaptive Evaluation of Meta-Relationships in Semantic Graphs
US8660836B2 (en) Optimization of natural language processing system based on conditional output quality at risk
WO2023029420A1 (en) Power user appeal screening method and system, electronic device, and storage medium
JP6663826B2 (en) Computer and response generation method
US20160170957A1 (en) Inter Thread Anaphora Resolution
Echeverry-Correa et al. Topic identification techniques applied to dynamic language model adaptation for automatic speech recognition
Pandey et al. A study of sentiment analysis task and it's challenges
Maslowski et al. In-the-wild chatbot corpus: from opinion analysis to interaction problem detection
US8533211B2 (en) Information extraction across multiple expertise-specific subject areas
WO2024226268A1 (en) Entropy based key-phrase extraction
WO2025167001A1 (en) Dialogue processing method and apparatus, and electronic device and storage medium
CN113420544A (en) Hot word determination method and device, electronic equipment and storage medium
WO2010132062A1 (en) System and methods for sentiment analysis
Hristova Text analytics for customer satisfaction prediction: A case study in the banking domain
CN118838995A (en) Method, device, equipment and storage medium for answering questioning and notifying based on knowledge base
CN114661893A (en) Abstract generation method and device
Valiyev et al. Initial exploitation of natural language processing techniques on NATO strategy and policies
US20190244174A1 (en) System for Inspecting Message Logs Using an Interaction Engine
Leonard et al. UFEL: a By-design Understandable and Frugal Entity Linking System for French Microposts
Dinh et al. A framework to discover potential ideas of new product development from crowdsourcing application
Sharmista et al. Sentiment analysis on Tamil reviews as products in social media using machine learning techniques: A novel study
Meisenbacher et al. A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination