CN112148870B - Abstract generation method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN112148870B (application CN201910562883.8A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- attention distribution
- text
- input
- distribution parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Machine Translation (AREA)
Abstract
The embodiment of the invention discloses an abstract generation method and device, an electronic device, and a computer-readable storage medium. The method comprises the following steps: acquiring an input object and determining an input text according to the input object; encoding the input text to obtain a semantic encoding result, wherein the semantic encoding result comprises a semantic code value and an initial content vector at each moment; performing iterative computation according to the semantic encoding result to obtain an initial attention distribution parameter; correcting the initial attention distribution parameter; and performing iterative decoding based on the corrected attention distribution parameters and the semantic code values at all moments to obtain the abstract of the input object. This technical scheme saves the time and effort of the party to which the object to be processed belongs, improves that party's working efficiency, saves the reader's time by outputting high-value information in a short time, and has a simple implementation structure that favors wide adoption.
Description
Technical Field
The embodiment of the invention relates to the technical field of abstract extraction, and in particular to an abstract generation method and device, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, the amount of information people must receive, read, and attend to keeps growing. To save readers' time and improve the efficiency with which they absorb information, the quality of abstract extraction matters more and more. Abstract extraction refines or summarizes the main content of given information into one or a few sentences, so that a condensed version can be presented to readers, helping them grasp the main content in a short time and judge whether a further detailed reading is needed.
However, most existing abstract extraction work is performed by the party to which the information belongs, and owing to constraints of time and ability, the abstracts so produced are often insufficiently accurate or lack emphasis, resulting in low-quality abstracts. If high-quality abstracts could be generated automatically, this would not only save the time and effort of the information's owner and improve working efficiency, but also save the reader's time by delivering high-value information in a short time.
In the related art, various methods can generate abstracts automatically, but for long text objects they either fail to meet the significance requirement of abstract generation, that is, the extracted abstract struggles to concentrate on the important part relevant at the current moment, or their implementation structure is too complex to be widely used.
Disclosure of Invention
The embodiment of the invention provides a summary generation method and device, electronic equipment and a computer readable storage medium.
In a first aspect, an embodiment of the present invention provides a digest generation method.
Specifically, the digest generation method includes:
acquiring an input object, and determining an input text according to the input object;
coding an input text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
performing iterative computation according to the semantic coding result to obtain an initial attention distribution parameter;
correcting the initial attention distribution parameter;
and carrying out iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input object.
With reference to the first aspect, in a first implementation manner of the first aspect, the input object is one or more of the following objects: inputting text, voice and images;
when the input object is input voice, the acquiring the input object and determining the input text according to the input object comprise: acquiring input voice, and converting the input voice into an input text;
when the input object is an input image, the acquiring the input object, and determining an input text according to the input object includes: acquiring an input image, and identifying a text in the input image to obtain an input text.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the encoding an input text to obtain a semantic encoding result includes:
performing word segmentation processing on an input text to obtain one or more words;
and carrying out word-by-word coding on the one or more words to obtain a semantic coding result.
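The word-segmentation and word-by-word encoding steps above can be illustrated with a toy sketch (not the patented implementation: the whitespace segmenter, the tiny vocabulary, the dimensions, and the randomly initialized weights are all stand-ins). Each word is encoded in turn by a simple recurrent cell; the hidden state at each step serves as that moment's semantic code value, and the final state serves as the initial content vector:

```python
import numpy as np

np.random.seed(0)

# Hypothetical vocabulary, embeddings, and recurrent weights for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_emb, d_hid = 4, 5
E = np.random.randn(len(vocab), d_emb) * 0.1    # embedding table
W_xh = np.random.randn(d_hid, d_emb) * 0.1      # input-to-hidden weights
W_hh = np.random.randn(d_hid, d_hid) * 0.1      # hidden-to-hidden weights

def segment(text):
    """Whitespace word segmentation (a stand-in for a real segmenter)."""
    return text.lower().split()

def encode(words):
    """Encode word by word; return the hidden state at each moment
    (the semantic code values) and the final state (the initial
    content vector)."""
    h = np.zeros(d_hid)
    states = []
    for w in words:
        x = E[vocab[w]]
        h = np.tanh(W_xh @ x + W_hh @ h)        # one recurrent step
        states.append(h)
    return np.stack(states), h

codes, content = encode(segment("the cat sat"))
print(codes.shape)   # one semantic code value per word/moment
```

In this sketch the initial content vector is simply the last hidden state; a bidirectional encoder or other pooling would be an equally plausible choice.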
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect: the input text is encoded by using a first recurrent neural network to obtain the semantic encoding result, and/or,
and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input text, namely the abstract of the input object.
With reference to the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the iteratively calculating to obtain an initial attention distribution parameter according to the semantic coding result includes:
and based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector.
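The patent does not fix the form of the first transformation function; the sketch below assumes an additive (Bahdanau-style) score between each moment's semantic code value and the previous moment's semantic decoding result, normalized by a softmax — one common way to obtain an attention distribution. All weights and vectors are random illustrative values:

```python
import numpy as np

def attention_weights(codes, s_prev, Wa, Ua, va):
    """Initial attention distribution over encoder moments: score each
    semantic code value h_i against the previous decoding result s_prev
    with an additive transform, then normalize with a softmax."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in codes])
    e = np.exp(scores - scores.max())           # numerically stable softmax
    return e / e.sum()

np.random.seed(1)
d = 5
codes = np.random.randn(3, d)                   # semantic code values h_1..h_3
s_prev = np.random.randn(d)                     # previous semantic decoding result
Wa, Ua = np.random.randn(d, d), np.random.randn(d, d)
va = np.random.randn(d)
alpha = attention_weights(codes, s_prev, Wa, Ua, va)
print(alpha)    # one weight per input moment, summing to 1
```

At the initial moment, `s_prev` would instead be derived from the initial content vector via the second transformation function, as the text describes.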
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the modifying the initial attention distribution parameter includes:
and correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the input text, to obtain the corrected attention distribution parameter at the current moment.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, and the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the performing iterative decoding based on the corrected attention distribution parameter and the semantic code value at each time to obtain the abstract of the input object includes:
calculating to obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic code values of all times by using a third transformation function;
calculating to obtain a semantic decoding result at the current moment based on the intermediate content vector at the current moment and the semantic decoding result at the historical moment by using a second transformation function;
and combining semantic decoding results at all times to obtain the abstract of the input text, namely the abstract of the input object.
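The decoding steps above can be sketched as follows, with the third transformation function taken to be an attention-weighted sum and the second transformation function taken to be a single recurrent step; these are assumptions, and the corrected attention distribution is supplied here as fixed example values rather than computed by the correction step:

```python
import numpy as np

np.random.seed(2)
d = 5
codes = np.random.randn(4, d)               # semantic code values at each moment
alpha = np.array([0.1, 0.6, 0.2, 0.1])      # corrected attention distribution

# Third transformation function (assumed): the intermediate content vector
# is the attention-weighted sum of the semantic code values.
c_t = alpha @ codes

# Second transformation function (sketched as one recurrent step): the
# current semantic decoding result from the intermediate content vector
# and the decoding result at the historical moment.
W_ch = np.random.randn(d, d) * 0.1
W_hh = np.random.randn(d, d) * 0.1
s_prev = np.zeros(d)                        # decoding result at the previous moment
s_t = np.tanh(W_ch @ c_t + W_hh @ s_prev)

# Each decoding result would then be projected to a vocabulary distribution
# and the chosen words combined into the abstract (projection omitted here).
print(c_t.shape, s_t.shape)
```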
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, after the correcting the initial attention distribution parameter, the method further includes:
and correcting the corrected attention distribution parameters again based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, the sixth implementation manner of the first aspect, and the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the first preset objective function is characterized as: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
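Under one reading of these objectives, both can be written as simple functions of the matrix of attention distributions (rows indexed by decoding moment, columns by input position). The patent does not give explicit formulas, so the definitions below are illustrative assumptions:

```python
import numpy as np

def local_loss(alphas):
    """First preset objective (local): minimize the reciprocal of the
    variance of the attention distribution at each decoding moment,
    i.e. prefer sharply peaked attention at every step."""
    return sum(1.0 / a.var() for a in alphas)

def global_loss(alphas):
    """Second preset objective (global), under one reading of the text:
    for each input position, take its total attention over all moments
    minus its maximum attention at any single moment, and minimize the
    variance of that difference distribution."""
    total = alphas.sum(axis=0)    # attention mass per input position
    peak = alphas.max(axis=0)     # per-position maximum over moments
    return (total - peak).var()

# Attention distributions at two decoding moments over three input positions.
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
print(local_loss(A), global_loss(A))
```

Note that `local_loss` diverges for a perfectly uniform distribution (zero variance), which is consistent with the objective penalizing unfocused attention; a small epsilon would be added in practice.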
In a second aspect, an embodiment of the present invention provides a method for generating a travel note summary.
Specifically, the method for generating the travel notes abstract comprises the following steps:
acquiring a link, wherein the link comprises a travel note text;
generating an abstract of the travel note text based on the attention distribution;
and uploading the generated abstract to the target object.
With reference to the second aspect, in a first implementation manner of the second aspect, the generating the abstract of the travel note text based on the attention distribution includes:
coding the travel note text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
performing iterative computation according to the semantic coding result to obtain an initial attention distribution parameter;
correcting the initial attention distribution parameter;
and carrying out iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the travel note text.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the encoding the travel note text to obtain a semantic encoding result includes:
performing word segmentation processing on the travel note text to obtain one or more words;
and carrying out word-by-word coding on the one or more words to obtain a semantic coding result.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect: the travel note text is encoded by using a first recurrent neural network to obtain the semantic encoding result, and/or,
and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the travel note text.
With reference to the first implementation manner of the second aspect, the second implementation manner of the second aspect, and the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect of the present disclosure, the iteratively calculating an initial attention distribution parameter according to the semantic coding result includes:
and based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect of the present disclosure, the modifying the initial attention distribution parameter includes:
and correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the travel note text to obtain the corrected attention distribution parameter at the current moment.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, and the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the iteratively decoding the modified attention distribution parameter and the semantic code value at each time to obtain the abstract of the travel note text includes:
calculating to obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic code values of all times by using a third transformation function;
calculating to obtain a semantic decoding result at the current moment based on the intermediate content vector at the current moment and the semantic decoding result at the historical moment by using a second transformation function;
and combining semantic decoding results at all times to obtain the abstract of the travel note text.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, after the correcting the initial attention distribution parameter, the method further includes:
and correcting the corrected attention distribution parameters again based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, the sixth implementation manner of the second aspect, and the seventh implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the first preset objective function is characterized as: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
In a third aspect, an embodiment of the present invention provides an apparatus for generating a summary.
Specifically, the digest generation apparatus includes:
a determination module configured to acquire an input object, and determine an input text according to the input object;
the encoding module is configured to encode an input text to obtain a semantic encoding result, wherein the semantic encoding result comprises a semantic encoding value and an initial content vector at each moment;
the calculation module is configured to obtain an initial attention distribution parameter through iterative calculation according to the semantic coding result;
a modification module configured to modify the initial attention distribution parameter;
and the decoding module is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input object.
With reference to the third aspect, in a first implementation manner of the third aspect, the input object is one or more of the following objects: inputting text, voice and image;
when the input object is input speech, the determination module is configured to: acquiring input voice, and converting the input voice into an input text;
when the input object is an input image, the determination module is configured to: acquiring an input image, and identifying a text in the input image to obtain an input text.
With reference to the third aspect and the first implementation manner of the third aspect, in a second implementation manner of the third aspect, the encoding module includes:
the first word segmentation processing submodule is configured to perform word segmentation processing on an input text to obtain one or more words;
the first coding submodule is configured to perform word-by-word coding on the one or more words to obtain a semantic coding result.
With reference to the third aspect, the first implementation manner of the third aspect, and the second implementation manner of the third aspect, in a third implementation manner of the third aspect, the encoding module is configured to: encoding the input text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
the decoding module is configured to: and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input text, namely the abstract of the input object.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, and the third implementation manner of the third aspect, in a fourth implementation manner of the third aspect, the calculation module is configured to:
and based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, and the fourth implementation manner of the third aspect, in a fifth implementation manner of the third aspect, the modifying module is configured to:
and correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the input text to obtain the corrected attention distribution parameter at the current moment.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, the fourth implementation manner of the third aspect, and the fifth implementation manner of the third aspect, in a sixth implementation manner of the third aspect, the decoding module includes:
the first calculation submodule is configured to calculate and obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic code values at all times by using a third transformation function;
the second calculation submodule is configured to calculate a current time semantic decoding result based on the current time intermediate content vector and a historical time semantic decoding result by using a second transformation function;
and the combination submodule is configured to combine semantic decoding results at all times to obtain an abstract of the input text, namely the abstract of the input object.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, the fourth implementation manner of the third aspect, the fifth implementation manner of the third aspect, and the sixth implementation manner of the third aspect, in a seventh implementation manner of the third aspect, the apparatus further includes, after the modification module:
and the re-correction module is configured to re-correct the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, the fourth implementation manner of the third aspect, the fifth implementation manner of the third aspect, the sixth implementation manner of the third aspect, and the seventh implementation manner of the third aspect, in an eighth implementation manner of the third aspect, the first preset objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
In a fourth aspect, an embodiment of the present invention provides a travel note summary generating apparatus.
Specifically, the travel note summary generating apparatus includes:
an acquisition module configured to acquire a link, the link comprising a travel note text;
a generation module configured to generate a summary of the travel note text based on an attention distribution;
and the uploading module is configured to upload the generated abstract to the target object.
With reference to the fourth aspect, in a first implementation manner of the fourth aspect, the generating module includes:
the second coding submodule is configured to code the travel note text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
a third calculation submodule configured to obtain an initial attention distribution parameter through iterative calculation according to the semantic coding result;
a modification submodule configured to modify the initial attention distribution parameter;
and the decoding submodule is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the travel note text.
With reference to the fourth aspect and the first implementation manner of the fourth aspect, in a second implementation manner of the fourth aspect, the second encoding sub-module includes:
the second word segmentation processing submodule is configured to perform word segmentation processing on the travel note text to obtain one or more words;
and the third coding sub-module is configured to perform word-by-word coding on the one or more words to obtain a semantic coding result.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, and the second implementation manner of the fourth aspect, in a third implementation manner of the fourth aspect: the second encoding submodule is configured to encode the travel note text by using a first recurrent neural network to obtain the semantic encoding result, and/or,
and the decoding submodule is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all times by using a second recurrent neural network to obtain an abstract of the travel note text.
With reference to the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, and the third implementation manner of the fourth aspect, in a fourth implementation manner of the fourth aspect, the third computation submodule is configured to:
and based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, the third implementation manner of the fourth aspect, and the fourth implementation manner of the fourth aspect, in a fifth implementation manner of the fourth aspect, the modification submodule is configured to:
and correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the travel note text to obtain the corrected attention distribution parameter at the current moment.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, the third implementation manner of the fourth aspect, the fourth implementation manner of the fourth aspect, and the fifth implementation manner of the fourth aspect, in a sixth implementation manner of the fourth aspect, the decoding submodule includes:
a fourth calculation submodule configured to calculate, by using a third transformation function, an intermediate content vector at the current time based on the corrected attention distribution parameter at the current time and the semantic code values at the respective times;
a fifth calculation submodule configured to calculate, by using a second transformation function, a current time semantic decoding result based on the current time intermediate content vector and a historical time semantic decoding result;
and the second combination sub-module is configured to combine semantic decoding results at all times to obtain the abstract of the travel note text.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, the third implementation manner of the fourth aspect, the fourth implementation manner of the fourth aspect, the fifth implementation manner of the fourth aspect, and the sixth implementation manner of the fourth aspect, in a seventh implementation manner of the fourth aspect, the generation module further includes, after the modification submodule:
and the re-correction submodule is configured to re-correct the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, the third implementation manner of the fourth aspect, the fourth implementation manner of the fourth aspect, the fifth implementation manner of the fourth aspect, the sixth implementation manner of the fourth aspect, and the seventh implementation manner of the fourth aspect, in an eighth implementation manner of the fourth aspect, the first preset objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer instructions that support any of the above apparatuses in executing any of the above methods, and the processor is configured to execute the computer instructions stored in the memory. Any of the above apparatuses may also include a communication interface for communicating with other devices or a communication network.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing the computer instructions used by any one of the apparatuses above, including the computer instructions involved in performing any one of the methods described above.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the technical scheme is based on an attention mechanism: the abstract information of the input text is obtained by encoding and decoding the input text determined according to the input object, and by correcting the attention distribution parameters, a high-quality, salient information abstract can be obtained even for long text objects. This technical scheme saves the time and effort of the party that owns the object to be processed and improves that party's working efficiency; it also saves the reader's time by delivering high-value information in a short time; and its implementation structure is simple, which favours wide popularization and use.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
Drawings
Other features, objects and advantages of embodiments of the invention will become more apparent from the following detailed description of non-limiting embodiments thereof, when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a digest generation method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a scene application when an input object is input speech;
FIG. 3 illustrates a scene application diagram when the input object is an input image;
fig. 4 shows a flowchart of step S102 of the digest generation method according to the embodiment shown in fig. 1;
fig. 5 shows a flow chart of step S105 of the digest generation method according to the embodiment shown in fig. 1;
FIG. 6 illustrates a flow diagram of a summary generation method according to another embodiment of the present invention;
FIG. 7 illustrates a flow diagram of a travel note summary generation method, according to an embodiment of the invention;
FIG. 8 shows a flowchart of step S702 of the travel note summary generation method according to the embodiment shown in FIG. 7;
fig. 9 is a block diagram showing the structure of a digest generation apparatus according to an embodiment of the present invention;
FIG. 10 is a block diagram of an encoding module 902 of the digest generation apparatus according to the embodiment shown in FIG. 9;
fig. 11 is a block diagram illustrating a structure of a decoding module 905 of the digest generation apparatus according to the embodiment illustrated in fig. 9;
fig. 12 is a block diagram showing the construction of a digest generation apparatus according to another embodiment of the present invention;
fig. 13 is a block diagram showing the structure of a travel note digest generation apparatus according to an embodiment of the present invention;
fig. 14 is a block diagram showing a structure of the generation module 1302 of the travel note summary generation apparatus according to the embodiment shown in fig. 13;
FIG. 15 shows a block diagram of an electronic device according to an embodiment of the invention;
FIG. 16 is a schematic block diagram of a computer system suitable for implementing the method according to the above embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Furthermore, parts that are not relevant to the description of the exemplary embodiments have been omitted from the drawings for the sake of clarity.
In the embodiments of the present invention, it is to be understood that terms such as "including" or "having", etc., are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the present specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof may be present or added.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The technical scheme provided by the embodiment of the invention is based on an attention mechanism: the abstract information of the input text is obtained by encoding and decoding the input text determined according to the input object, and by correcting the attention distribution parameter, a high-quality, salient information abstract can be obtained even for a long text object. This technical scheme saves the time and effort of the party that owns the object to be processed and improves that party's working efficiency; it also saves the reader's time by delivering high-value information in a short time; and its implementation structure is simple, which favours wide popularization and use.
Fig. 1 shows a flowchart of a digest generation method according to an embodiment of the present invention, which includes the following steps S101 to S105, as shown in fig. 1:
in step S101, an input object is acquired, and an input text is determined according to the input object;
in step S102, an input text is encoded to obtain a semantic encoding result, where the semantic encoding result includes a semantic encoding value and an initial content vector at each time;
in step S103, an initial attention distribution parameter is obtained by iterative computation according to the semantic coding result;
in step S104, the initial attention distribution parameter is corrected;
in step S105, iterative decoding is performed based on the corrected attention distribution parameter and the semantic code value at each time to obtain a summary of the input object.
As mentioned above, with the development of science and technology, the amount of information that people need to accept, read and pay attention to keeps increasing, and the quality of information abstract extraction is becoming more and more important for saving people's time and improving the efficiency with which they receive information. Abstract extraction refers to refining or summarizing the main content of given information into one or several sentences, so that a condensed version of the information can be displayed to people, letting them grasp its main content in a short time and helping them judge whether further detailed reading is needed. However, existing information abstract extraction work has various defects: for example, it cannot solve the saliency problem of the abstract, or its implementation structure is complex, making wide application difficult.
In view of the above drawbacks, this embodiment proposes an abstract generation method that, based on an attention mechanism, obtains the abstract information of an input text by encoding and decoding the input text determined according to the input object, and that, by correcting the attention distribution parameter, can obtain a high-quality, salient information abstract even for a long text object. This technical scheme saves the time and effort of the party that owns the object to be processed, improves that party's working efficiency, saves the reader's time, and outputs high-value information in a short time; its structure is simple, which favours wide popularization and use.
In an optional implementation manner of this embodiment, the input object may be one or more of the following: input text, input speech, an input image, and the like. Whatever form the input object takes, the output abstract is in text form. Therefore, when the input object is input speech, step S101 — acquiring an input object and determining an input text according to the input object — includes: acquiring the input speech and converting it into an input text. When the input object is an input image, step S101 includes: acquiring the input image and recognizing the text in it to obtain an input text. The determined input text is then processed, and the corresponding abstract is extracted.
Fig. 2 is a schematic view of a scene application when an input object is input voice, and as shown in fig. 2, when the input object is voice uttered by a person, the voice is first converted into a text by using a voice recognition technology, and then the text is processed to extract a corresponding abstract.
Fig. 3 is a schematic view illustrating a scene application when an input object is an input image, and as shown in fig. 3, when the input object is an image containing a large number of characters, firstly, a text in the image is recognized by using an image recognition technology, and then, the recognized text is processed to extract a corresponding abstract.
In an optional implementation manner of this embodiment, encoding the input text means performing semantic encoding processing on the input text to obtain a corresponding semantic encoding result; the semantic encoding result can subsequently be used in semantic decoding processing to obtain summary information that corresponds to the input text and reflects its important content. The content of the input object, and of the input text corresponding to it, differs from the content of the summary information obtained by semantic decoding, and so does the text length; generally, the length of the input text is greater than that of the summary information.
In an optional implementation manner of this embodiment, a Recurrent Neural Network (RNN) is used to encode and decode information. Specifically, a first recurrent neural network is used to encode the input text to obtain a semantic encoding result, and a second recurrent neural network is used to perform iterative decoding based on the corrected attention distribution parameter and the semantic code value at each time, so as to obtain the abstract of the input text, that is, the abstract of the input object. In an optional implementation manner of this embodiment, the first recurrent neural network may be a Bi-directional Long Short-Term Memory network (Bi-LSTM), and the second recurrent neural network may be a unidirectional Long Short-Term Memory network (LSTM). In this implementation, the semantic code value at each time refers to the hidden state value of the Bi-LSTM, the initial content vector refers to the output of the Bi-LSTM, and the final hidden state values of the LSTM constitute the abstract of the input text.
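The encoder's role above can be sketched in a few lines. This is only an illustrative data-flow sketch, not the patent's implementation: it replaces the Bi-LSTM cell with a simplified tanh recurrent cell, and all weight matrices are randomly initialized placeholders. What it shows is how a bidirectional pass over the word embeddings yields one semantic code value h_j per input word:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h_prev, Wx, Wh, b):
    """One simplified recurrent cell step (a stand-in for an LSTM cell)."""
    return np.tanh(x @ Wx + h_prev @ Wh + b)

def encode_bidirectional(embeddings, Wx_f, Wh_f, b_f, Wx_b, Wh_b, b_b):
    """Run the sequence forward and backward, concatenating the two
    hidden states at each position -- the role the Bi-LSTM plays here."""
    d = Wh_f.shape[0]
    h_f, h_b = np.zeros(d), np.zeros(d)
    fwd, bwd = [], []
    for x in embeddings:                      # forward pass
        h_f = rnn_step(x, h_f, Wx_f, Wh_f, b_f)
        fwd.append(h_f)
    for x in reversed(embeddings):            # backward pass
        h_b = rnn_step(x, h_b, Wx_b, Wh_b, b_b)
        bwd.append(h_b)
    bwd.reverse()
    # h_j: semantic code value for the j-th input word
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Toy input: 5 words embedded in 4 dimensions, hidden size 3.
emb = rng.normal(size=(5, 4))
params_f = (rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), np.zeros(3))
params_b = (rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), np.zeros(3))
h = encode_bidirectional(emb, *params_f, *params_b)
print(len(h), h[0].shape)   # one 6-dimensional code value per word
```

A real system would use a trained Bi-LSTM (e.g. a deep-learning framework's LSTM layer with the bidirectional option) in place of `rnn_step`; the shape of the output — one code vector per input position — is what the decoding steps below consume.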
In an alternative implementation of the embodiment, the attention distribution parameter is a parameter involved in the attention mechanism to characterize the importance of the information that requires a lot of attention. The attention mechanism is derived from a brain signal processing mechanism specific to human vision, namely, the human vision obtains a target area needing important attention by rapidly scanning a global image, namely a focus of attention, and then more attention resources are invested into the target area, so that more detailed information of the target needing attention can be obtained, and other useless information is suppressed. The attention mechanism in deep learning is similar to the selective visual attention mechanism of human in nature, and the main purpose of the attention mechanism is to select more critical information for the current task target from a plurality of information.
In an optional implementation manner of this embodiment, as shown in fig. 4, the step S102, namely, the step of encoding the input text to obtain the semantic encoding result, includes the following steps S401 to S402:
in step S401, performing word segmentation processing on an input text to obtain one or more words;
in step S402, performing word-by-word encoding on the one or more words to obtain a semantic encoding result.
In order to improve the accuracy of text encoding and suit the characteristics of recurrent neural network semantic encoding, in this implementation manner, word segmentation is first performed on the input text to be processed to obtain one or more words, and then word-by-word encoding is performed on the obtained one or more words to obtain the final semantic encoding result.
The specific way of word segmentation processing can be selected by those skilled in the art according to the needs of practical application, and the present disclosure does not specifically limit the same.
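As a minimal sketch of the segment-then-encode pipeline of steps S401–S402 — assuming whitespace segmentation and a random embedding table, neither of which the disclosure prescribes — the flow from raw text to one vector per word looks like this:

```python
import numpy as np

def segment(text):
    """Hypothetical word segmentation: whitespace split. Real systems
    would use a proper segmenter, especially for Chinese text."""
    return text.split()

def embed(words, vocab, table):
    """Look up an embedding vector for each word, word by word."""
    return np.stack([table[vocab.get(w, 0)] for w in words])

corpus = "the scenic area opens at dawn"
words = segment(corpus)
vocab = {w: i + 1 for i, w in enumerate(sorted(set(words)))}  # 0 = <unk>
table = np.random.default_rng(1).normal(size=(len(vocab) + 1, 8))

vectors = embed(words, vocab, table)
print(vectors.shape)   # (number of words, embedding dimension)
```

The resulting per-word vectors are what the first recurrent neural network consumes to produce the semantic code values h_j.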
In an optional implementation manner of this embodiment, the step S103 of obtaining an initial attention distribution parameter by iterative computation according to the semantic coding result may include:
and based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector.
In order to represent the importance degree of information requiring a great deal of attention, in this implementation, an attention distribution parameter that can be used for subsequent decoding processing is calculated, and specifically, based on the first transformation function, an initial attention distribution parameter at the current time is calculated according to the semantic code value at each time and the semantic decoding result at the previous time.
Assume that the semantic code value at each time is h_j and the semantic decoding result at the previous time is H_(i-1), where j denotes the time of the semantic code value and also the order of occurrence of the words in the input text; that is, h_j can also denote the semantic code value corresponding to the j-th word in the input text, with j ranging from 1 to Lx, where Lx is the length of the input text, i.e., the number of words in it, and H_(i-1) can also denote the hidden state value of the second recurrent neural network at time i-1. The initial attention distribution parameter at the current time, W_ij, can then be expressed as:
W_ij = F1(h_j, H_(i-1)),
where F1 denotes a first transformation function that transforms the input parameters h_j and H_(i-1), and is used to characterize the association between the target word y_i in the final abstract and each input word in the input text. Thus, after the hidden state value of the second recurrent neural network at each time is obtained, the initial attention distribution parameter W_ij at the corresponding time can be determined. The specific calculation of the hidden state value of the second recurrent neural network at each time is described in detail below.
In an optional implementation manner of this embodiment, the expression form of the first transformation function may be selected and set according to the requirement of practical application, and the disclosure is not particularly limited to the specific expression form.
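Since the form of F1 is left open, one common instantiation is additive attention, which the sketch below assumes — the score vector `v` and matrices `Wa`, `Ua` are illustrative placeholders, not part of the disclosure. It scores each code value h_j against the previous decoding result H_(i-1) and normalizes the scores into a distribution over the input positions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(h, H_prev, v, Wa, Ua):
    """Additive-attention instantiation of F1 (an assumption): score each
    semantic code value h_j against the previous semantic decoding
    result H_(i-1), then normalise over all input positions j."""
    scores = np.array([v @ np.tanh(Wa @ hj + Ua @ H_prev) for hj in h])
    return softmax(scores)        # W_ij over j = 1..Lx

rng = np.random.default_rng(2)
Lx, d = 6, 4
h = rng.normal(size=(Lx, d))      # semantic code values h_j
H_prev = rng.normal(size=d)       # decoding result at time i-1
v = rng.normal(size=d)
Wa, Ua = rng.normal(size=(d, d)), rng.normal(size=(d, d))

W_i = attention_weights(h, H_prev, v, Wa, Ua)
print(W_i.sum())                  # the weights form a distribution summing to 1
```

Whatever form F1 takes, the output plays the same role: one weight per input word, characterizing how strongly the target word y_i being generated associates with that word.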
The semantic decoding result at the initial time can be calculated from the initial content vector based on the second transformation function. The initial content vector, obtained by encoding the input text, can be denoted C_1, and the semantic decoding result at the initial time, H_1 — that is, the first target word of the final abstract — can be expressed as: H_1 = y_1 = F2(C_1), where F2(·, …, ·) denotes a second transformation function that transforms content vectors into decoding results. It should be noted that, since the implementation manner of the present disclosure adopts an iterative decoding mechanism, obtaining a semantic decoding result requires jointly considering the content vector and the semantic decoding results at historical times, so the second transformation function takes one or more input parameters depending on the number of available semantic decoding results. For example, at the initial time no semantic decoding result has been obtained yet, so the only input parameter of the second transformation function is the initial content vector C_1. After the semantic decoding result H_1 at the first time is calculated from C_1 using the second transformation function, the content vector C_2 at the second time and H_1 can together serve as input parameters of the second transformation function to calculate the semantic decoding result H_2 at the second time: H_2 = y_2 = F2(C_2, H_1). Next, the content vector C_3 at the third time and the decoding results H_1 and H_2 of the first and second times together serve as input parameters to calculate the semantic decoding result H_3 at the third time: H_3 = y_3 = F2(C_3, H_1, H_2). By analogy, the semantic decoding results at all times can be obtained.
In an alternative implementation manner of this embodiment, the representation form of the second transformation function may be selected and set according to the needs of practical application, and the disclosure is not particularly limited to the specific representation form.
In an optional implementation manner of this embodiment, the step S104, namely, the step of correcting the initial attention distribution parameter, may include:
and correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the input text to obtain the corrected attention distribution parameter at the current moment.
Considering that, for a long text, the initially calculated attention distribution is often dispersed, which causes a certain lack of saliency and is detrimental to the quality of the generated summary, in this implementation manner, after the initial attention distribution is calculated, it may be corrected according to the semantic decoding result at the current time — that is, it is determined whether attention is placed on the key content related to the current state — so as to compensate for information saliency. Specifically, the initial attention distribution parameter W_ij may be corrected according to the correlation between the semantic decoding result at the current time and the input text, obtaining the corrected attention distribution parameter W'_ij at the current time. If calculating this correlation shows that it is small — that is, the input text is unimportant information for the semantic decoding at the current time — the corresponding attention distribution parameter may be reduced to disperse the originally concentrated attention; conversely, if the correlation is large — that is, the input text is relatively important information for the semantic decoding at the current time — the corresponding attention distribution parameter may be increased to focus attention on the relatively important content.
The calculation of the correlation between the semantic decoding result at the current time and the input text can be performed in various ways, and a person skilled in the art can select a suitable calculation method according to the needs of practical application and the characteristics of the text object, which is not specifically limited by the present disclosure. In addition, when calculating the correlation between the semantic decoding result at the current time and the input text, in order to improve the accuracy of the calculation and reduce the calculation amount, a part of the input text may be selected for performing the correlation calculation, for example, a part of the input text with a preset length that may be correlated with the semantic decoding result at the current time may be selected for performing the calculation.
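As the disclosure leaves the correlation measure open, the following sketch assumes cosine similarity between the current decoding state and each input code value as one plausible choice; the scaling rule and the toy data are illustrative only. It shows the intended effect: initially dispersed weights are re-concentrated on the position most relevant to the current state, then renormalized:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def correct_attention(W, h, H_cur):
    """One plausible correction rule (an assumption -- the patent does
    not fix the correlation measure): scale each initial weight by the
    relevance of its input position to the current decoding result,
    boosting relevant positions and suppressing irrelevant ones."""
    relevance = np.array([max(cosine(hj, H_cur), 0.0) + 1e-6 for hj in h])
    W_corr = W * relevance
    return W_corr / W_corr.sum()   # renormalise to a distribution

rng = np.random.default_rng(3)
h = rng.normal(size=(6, 4))        # semantic code values h_j
H_cur = h[2].copy()                # toy state: identical to position 2's code
W = np.full(6, 1 / 6)              # initially dispersed attention

W_corr = correct_attention(W, h, H_cur)
print(W_corr.argmax())             # attention concentrates on position 2
```

Restricting the correlation calculation to a preset-length portion of the input text, as the paragraph above suggests, would simply mean passing a slice of `h` instead of the whole array.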
In an optional implementation manner of this embodiment, as shown in fig. 5, the step S105 of performing iterative decoding based on the corrected attention distribution parameter and the semantic code value at each time to obtain the abstract of the input object includes the following steps S501 to S503:
in step S501, a third transformation function is used to calculate an intermediate content vector at the current time based on the corrected attention distribution parameter at the current time and the semantic code values at each time;
in step S502, a current time semantic decoding result is calculated based on the current time intermediate content vector and a historical time semantic decoding result by using a second transformation function;
in step S503, the semantic decoding results at each time are combined to obtain the abstract of the input text, i.e., the abstract of the input object.
In the implementation mode, an abstract of the input text is obtained in an iterative decoding mode, specifically, a third transformation function is firstly utilized, and a current-time intermediate content vector is calculated and obtained based on a corrected current-time attention distribution parameter and semantic coding values of all times; then, calculating to obtain a semantic decoding result at the current moment based on the intermediate content vector at the current moment and the semantic decoding result at the historical moment by utilizing a second transformation function; and finally, combining the semantic decoding results to obtain the abstract of the input text.
More specifically, assume that the corrected attention distribution parameter at the i-th time is denoted W'_ij and the semantic code value at each time is denoted h_j. Then the intermediate content vector C_i at the i-th time is calculated by a third transformation function F3(·, ·) as:
C_i = F3(W'_ij, h_j).
in an alternative implementation manner of this embodiment, the representation form of the third transformation function may be selected and set according to the needs of practical application, and the disclosure is not particularly limited to the specific representation form. For example, the third transformation function may be selected as a weighted sum function, etc., according to the requirements of the actual application.
After the intermediate content vector C_i at the i-th time is obtained, in accordance with the description above, the second transformation function F2(·, …, ·) and the semantic decoding results at the historical times can be used to calculate the semantic decoding result H_i at the i-th time, that is, the i-th target word in the final abstract: H_i = y_i = F2(C_i, y_(i-1), y_(i-2), …, y_1).
And finally, combining the semantic decoding results of all the moments according to the sequence of the moments from first to last to obtain the generated abstract of the input text.
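Steps S501–S503 can be sketched end to end. Here F3 is taken as the weighted sum suggested above, while the concrete form of F2 and all weights are illustrative assumptions; the point is the loop structure — context vector, then decoding step conditioned on the history, repeated:

```python
import numpy as np

def context_vector(W_corr, h):
    """F3 as a weighted sum, as the text suggests: C_i = sum_j W'_ij * h_j."""
    return (W_corr[:, None] * h).sum(axis=0)

def decode_step(C_i, history, Wc, Wh, b):
    """Toy instantiation of F2 (an assumption): combine the current
    context vector with a summary (here, the mean) of the historical
    semantic decoding results."""
    h_hist = np.mean(history, axis=0) if history else np.zeros_like(C_i)
    return np.tanh(Wc @ C_i + Wh @ h_hist + b)

rng = np.random.default_rng(4)
Lx, d = 6, 4
h = rng.normal(size=(Lx, d))               # semantic code values h_j
Wc, Wh, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

history = []
for i in range(3):                         # three decoding iterations
    W_corr = np.ones(Lx) / Lx              # stand-in corrected weights W'_ij
    C_i = context_vector(W_corr, h)
    H_i = decode_step(C_i, history, Wc, Wh, b)
    history.append(H_i)                    # H_1, H_2, H_3 -> summary words

print(len(history), history[0].shape)
```

Combining the entries of `history` in time order corresponds to the final step of assembling the generated abstract.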
In order to ensure the accuracy of the attention mechanism, in an optional implementation manner of this embodiment, after the initial attention distribution parameter is corrected, a step of correcting the corrected attention distribution parameter again based on an objective function is further included, that is, as shown in fig. 6, the summary generation method includes the following steps S601 to S606:
in step S601, an input object is acquired, and an input text is determined according to the input object;
in step S602, an input text is encoded to obtain a semantic encoding result, where the semantic encoding result includes a semantic encoding value and an initial content vector at each time;
in step S603, an initial attention distribution parameter is obtained through iterative computation according to the semantic coding result;
in step S604, the initial attention distribution parameter is corrected;
in step S605, correcting the corrected attention distribution parameter again based on a preset objective function, where the preset objective function includes a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function;
in step S606, iterative decoding is performed based on the corrected attention distribution parameter and the semantic code value at each time to obtain a summary of the input object.
In this embodiment, in order to further improve the quality of attention concentration, the attention mechanism is supervised by a first preset objective function based on a local loss function and a second preset objective function based on a global loss function. Considering that, when a summary of a long text is generated, the position where attention concentrates may deviate locally at each time, and the same position may be attended to repeatedly when viewed from a global perspective, the corrected attention distribution parameter W'_ij is corrected a second time to obtain a re-corrected attention distribution parameter W''_ij, so as to correct both local and global errors.
Considering that a distribution of attention concentration should be relatively sharp, i.e. the variance will be relatively large, in order to correct local errors, in an alternative implementation of the present embodiment the first preset objective function is set to minimize the inverse of the variance of the attention distribution parameter at each moment.
For the whole decoding process, the information at the same position should not be focused on multiple times; otherwise, the problem of repeated attention arises. Therefore, except at the one time when attention is focused on a specific position, the attention paid to that position at other times should be small. Hence, in order to correct the global error, in an optional implementation manner of the embodiment, the second preset objective function is set so that the difference distribution — obtained by subtracting, position by position, the maximum attention distribution parameter over all times from the sum of the attention distribution parameters over all times — stays within a preset range, or so that the variance of this difference distribution is minimized, preventing attention from being focused on the same position repeatedly. Ideally, the difference distribution should be relatively flat, because apart from the moment when a specific position obtains its maximum attention distribution value, it should not be assigned too much attention at any other time. Therefore, the flatness of the difference distribution is constrained by setting a preset range, or the problem of repeated attention is reduced by minimizing the variance of the difference distribution, so as to minimize the global error.
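The two objectives can be written down directly from their descriptions; the sketch below does so, treating the attention parameters as a (times × positions) matrix. The loss forms follow the text, while the toy distributions are illustrative:

```python
import numpy as np

def local_loss(W):
    """First preset objective: minimise the inverse of the variance of
    the attention distribution at one time -- a peaked (sharp)
    distribution has large variance and thus small loss."""
    return 1.0 / (np.var(W) + 1e-12)

def global_loss(W_all):
    """Second preset objective: for each position, subtract the maximum
    attention parameter over all times from the sum over all times,
    and minimise the variance of that difference distribution."""
    diff = W_all.sum(axis=0) - W_all.max(axis=0)
    return float(np.var(diff))

peaked    = np.array([0.9, 0.05, 0.03, 0.02])
dispersed = np.array([0.25, 0.25, 0.25, 0.25])
print(local_loss(peaked) < local_loss(dispersed))   # peaked is preferred

repeated = np.array([[0.9, 0.1, 0.0],    # attends position 0 at both times
                     [0.8, 0.1, 0.1]])
spread   = np.array([[0.9, 0.05, 0.05],  # each time attends a new position
                     [0.05, 0.9, 0.05]])
print(global_loss(repeated) > global_loss(spread))  # repetition is penalised
```

The `repeated` matrix leaves a large residual at position 0 after the maximum is removed, so its difference distribution is uneven and its global loss is higher — exactly the repeated-attention case the second objective suppresses.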
Fig. 7 is a flowchart illustrating a travel note summary generation method according to an embodiment of the present invention, and as shown in fig. 7, the travel note summary generation method includes the following steps S701 to S703:
in step S701, a link is acquired, where the link includes a travel note text;
in step S702, generating a summary of the travel note text based on the attention distribution;
in step S703, the generated digest is uploaded to the target object.
In view of the fact that the text of many travel notes is currently long, which is not conducive to readers quickly obtaining the content they are interested in, and that many travel notes exist in the form of links, in this embodiment the travel note text in the link is processed, the abstract of the text is extracted, and the extracted abstract is uploaded to the target object. The target object refers to any object that can hold, store, or carry the summary, for example a memory, a website, or a link that contains the summary.
In an optional implementation manner of this embodiment, as shown in fig. 8, the step S702, that is, the step of generating the abstract of the travel note text based on the attention distribution, includes the following steps S801 to S804:
in step S801, the travel note text is encoded to obtain a semantic encoding result, where the semantic encoding result includes a semantic encoding value and an initial content vector at each time;
in step S802, an initial attention distribution parameter is obtained through iterative computation according to the semantic coding result;
in step S803, the initial attention distribution parameter is corrected;
in step S804, iterative decoding is performed based on the corrected attention distribution parameter and the semantic code value at each time to obtain a summary of the travel note text.
In order to improve the accuracy of summary extraction, in this embodiment, based on the attention mechanism, the summary information of the input travel note text is obtained by encoding and decoding it, and by correcting the attention distribution parameter, a high-quality, salient information summary can be obtained even for a long travel note text. This technical scheme not only saves the time and effort of the party that owns the text and improves that party's working efficiency, but also saves the reader's time by outputting high-value information in a short time; moreover, its implementation structure is simple, which favours wide popularization and use.
It should be noted that some technical terms or features involved in fig. 8 are the same as or similar to those mentioned in the above embodiments; for the corresponding explanations, refer to the description of the above embodiments, which is not repeated here.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
Fig. 9 is a block diagram illustrating a structure of a digest generation apparatus according to an embodiment of the present invention, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 9, the digest generation apparatus includes:
a determining module 901 configured to acquire an input object, and determine an input text according to the input object;
an encoding module 902, configured to encode an input text to obtain a semantic encoding result, where the semantic encoding result includes the semantic code value at each time and an initial content vector;
a calculating module 903, configured to obtain an initial attention distribution parameter by iterative calculation according to the semantic coding result;
a modification module 904 configured to modify the initial attention distribution parameter;
and a decoding module 905 configured to perform iterative decoding based on the corrected attention distribution parameter and the semantic code value at each time to obtain a summary of the input object.
As mentioned above, with the development of science and technology, the amount of information that people need to receive, read and pay attention to keeps increasing, and the quality of abstract extraction is becoming more and more important for saving people's time and improving the efficiency with which they take in information. Abstract extraction refers to refining or summarizing the main content of given information into one or several sentences, so that a condensed version of the information can be displayed, allowing people to grasp its main content in a short time and judge whether further detailed reading is needed. However, existing abstract extraction work has various defects, such as an inability to solve the saliency problem of the abstract, or an overly complex implementation structure, making wide application difficult.
In view of the above drawbacks, this embodiment proposes a digest generation apparatus that, based on an attention mechanism, obtains digest information by encoding and decoding the input text determined from the input object, and that, by correcting the attention distribution parameter, can obtain a high-quality, salient information digest even for a long text object. This technical scheme saves the time and effort of the owner of the object to be processed and improves working efficiency; it also saves the reader's time by outputting a high-value amount of information in a short period, and its simple structure favors wide popularization and use.
In an optional implementation manner of this embodiment, the input object may be one or more of the following: input text, input speech, input images, and the like. However, no matter what form the input object takes, the output abstract is in text form. Therefore, when the input object is input speech, the determining module 901 may be configured to acquire the input speech and convert it into an input text; when the input object is an input image, the determining module 901 may be configured to acquire the input image and identify the text in it to obtain an input text. The determined input text is subsequently processed, and the corresponding abstract is extracted.
Fig. 2 shows a schematic view of a scene application in which the input object is input speech. As shown in fig. 2, when the input object is speech uttered by a person, the determining module 901 first converts the speech into text by using speech recognition technology, and then processes the text to extract the corresponding abstract.
Fig. 3 is a schematic view of a scene application in which the input object is an input image. As shown in fig. 3, when the input object is an image containing a large number of characters, the determining module 901 first identifies the text in the image by using image recognition technology, and then processes the identified text to extract the corresponding abstract.
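As a concrete illustration of this dispatch, the sketch below shows how a determining module might route each input-object form to text before summarization. The recognizer functions and dictionary keys here are hypothetical stand-ins for real speech-recognition and OCR systems, not APIs from this document.

```python
# Hypothetical sketch of the determining module (901): route each input
# object to text form before summarization. The recognizers below are
# placeholders, not real ASR/OCR calls.

def speech_to_text(audio):
    # stand-in for a speech-recognition call
    return audio["transcript"]

def image_to_text(image):
    # stand-in for an OCR call
    return image["ocr_text"]

def determine_input_text(input_object):
    """Return the input text for any supported input-object form."""
    kind = input_object["kind"]
    if kind == "text":
        return input_object["content"]
    if kind == "speech":
        return speech_to_text(input_object["content"])
    if kind == "image":
        return image_to_text(input_object["content"])
    raise ValueError(f"unsupported input object: {kind}")
```

Whatever the input form, the function returns plain text, which is then fed to the encoding stage.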
In an optional implementation manner of this embodiment, the encoding module 902 encodes the input text, that is, performs semantic encoding processing on it to obtain the corresponding semantic encoding result; this result can subsequently be used by the decoding module 905 for semantic decoding processing, yielding abstract information that corresponds to the input text and reflects its important content. The content of the input object (and of the input text corresponding to it) differs from the content of the summary information obtained by the semantic decoding processing, and so does the text length; generally speaking, the text length of the input text is greater than that of the summary information.
In an optional implementation manner of this embodiment, a Recurrent Neural Network (RNN) is used to encode and decode the information. Specifically, the encoding module 902 encodes the input text by using a first recurrent neural network to obtain the semantic encoding result, and the decoding module 905 performs iterative decoding by using a second recurrent neural network, based on the corrected attention distribution parameter and the semantic code value at each time, to obtain the abstract of the input text, that is, the abstract of the input object. In an optional implementation manner of this embodiment, the first recurrent neural network may be chosen as a Bidirectional Long Short-Term Memory network (Bi-LSTM), and the second recurrent neural network may be chosen as a unidirectional Long Short-Term Memory network (LSTM). In this implementation, the semantic code value at each time refers to the hidden state value of the Bi-LSTM, the initial content vector refers to the output of the Bi-LSTM, and the final hidden state values of the LSTM constitute the abstract of the input text.
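To make the bidirectional encoding concrete, the sketch below processes the word embeddings once forward and once backward and concatenates the two states at each step. A simplified recurrent cell stands in for the full Bi-LSTM gates described above; the cell, dimensions and weight are illustrative assumptions only.

```python
# Minimal sketch of bidirectional encoding. A toy recurrent update stands in
# for the Bi-LSTM of the first recurrent neural network; only the structure
# (forward pass + backward pass, concatenated per step) is the point here.
import math

def cell(prev_state, x, w=0.5):
    # toy recurrent update: new state mixes previous state and input
    return [math.tanh(w * p + x_i) for p, x_i in zip(prev_state, x)]

def bi_encode(embeddings, dim=4):
    fwd, bwd = [], []
    state = [0.0] * dim
    for x in embeddings:                 # forward pass over the words
        state = cell(state, x)
        fwd.append(state)
    state = [0.0] * dim
    for x in reversed(embeddings):       # backward pass over the words
        state = cell(state, x)
        bwd.append(state)
    bwd.reverse()
    # semantic code value h_j = concatenation of both directions at step j
    return [f + b for f, b in zip(fwd, bwd)]
```

Each returned vector plays the role of the semantic code value h_j at time j; a real implementation would use an actual LSTM cell.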
In an alternative implementation of the embodiment, the attention distribution parameter is the parameter involved in the attention mechanism that characterizes the importance of information requiring concentrated attention. The attention mechanism derives from the brain signal processing mechanism specific to human vision: by rapidly scanning the global image, human vision obtains a target area requiring focused attention, the so-called focus of attention, and then invests more attention resources in that target area, so as to acquire more detailed information about the target while suppressing other, useless information. The attention mechanism in deep learning is similar in nature to the selective visual attention mechanism of humans; its main purpose is to select, from a large amount of information, the information more critical to the current task target.
In an optional implementation manner of this embodiment, as shown in fig. 10, the encoding module 902 includes:
a first word segmentation sub-module 1001 configured to perform word segmentation on an input text to obtain one or more words;
the first encoding sub-module 1002 is configured to perform word-by-word encoding on the one or more words to obtain a semantic encoding result.
In order to improve the accuracy of text coding and suit the characteristics of recurrent neural network semantic coding, in this implementation the first word segmentation sub-module 1001 first performs word segmentation processing on the input text to be processed to obtain one or more words, and the first coding sub-module 1002 then encodes the one or more words word by word to obtain the final semantic encoding result.
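The two steps above can be sketched as follows; whitespace splitting and a growing id vocabulary are stand-ins, since a real segmenter (e.g. for Chinese text) and embedding layer would be plugged in here.

```python
# Illustrative word segmentation and word-by-word encoding. The segmenter
# and id vocabulary are stand-ins for a real tokenizer and embedding table.
def segment(text):
    return text.split()   # whitespace segmentation as a placeholder

def encode_words(words, vocab):
    # word-by-word encoding into ids, in the order the recurrent encoder
    # would consume them
    return [vocab.setdefault(w, len(vocab)) for w in words]
```

The resulting id sequence is what would be embedded and fed step by step into the first recurrent neural network.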
The specific manner of word segmentation processing used by the first word segmentation sub-module 1001 may be selected by those skilled in the art according to the needs of the practical application, and the present disclosure does not specifically limit it.
In an optional implementation manner of this embodiment, the calculation module 903 may be configured to:
calculate, based on a first transformation function, the initial attention distribution parameter at the current time from the semantic code value at each time and the semantic decoding result at the previous time, wherein the semantic decoding result at the initial time is calculated from the initial content vector based on a second transformation function.
In order to characterize the importance of information requiring concentrated attention, in this implementation the attention distribution parameter used in the subsequent decoding processing is calculated. Specifically, the calculating module 903 calculates the initial attention distribution parameter at the current time, based on the first transformation function, from the semantic code value at each time and the semantic decoding result at the previous time.
Assume the semantic code value at each time is denoted h_j and the semantic decoding result at the previous time is denoted H_{i-1}, where j indexes the time of the semantic code value and also the order of the words in the input text; that is, h_j can represent the semantic code value corresponding to the j-th word of the input text, with j ranging from 1 to Lx, where Lx denotes the length of the input text, i.e. the number of its words, and H_{i-1} can also represent the hidden state value of the second recurrent neural network at time i-1. The initial attention distribution parameter W_ij at the current time can then be expressed as:
W_ij = F1(h_j, H_{i-1}),
where F1(·, ·) denotes a first transformation function applied to the input parameters h_j and H_{i-1}; it characterizes the association between the target word y_i of the final abstract and each input word of the input text. Thus, once the hidden state value of the second recurrent neural network at each time is obtained, the initial attention distribution parameter W_ij at the corresponding time can be determined; the specific calculation of those hidden state values is described in detail below.
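The text leaves F1 unspecified, so the sketch below instantiates it with one common, assumed choice: a dot-product score of each semantic code value h_j against the previous decoding result H_{i-1}, normalized with a softmax over j.

```python
# One possible first transformation function F1 (an assumed choice, not
# mandated by the text): dot-product scoring of each h_j against H_{i-1},
# softmax-normalized over j so the parameters form a distribution.
import math

def f1_attention(h, H_prev):
    scores = [sum(a * b for a, b in zip(h_j, H_prev)) for h_j in h]
    m = max(scores)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]         # W_ij over j, sums to 1
```

Positions whose code values align with the current decoding state receive larger parameters, matching the association between y_i and each input word described above.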
In an alternative implementation manner of this embodiment, the expression form of the first transformation function may be selected and set according to the needs of practical application, and the disclosure is not particularly limited to the specific expression form.
The semantic decoding result at the initial time can be calculated from the initial content vector based on a second transformation function. The initial content vector, obtained by encoding the input text, can be denoted C_1; the semantic decoding result H_1 at the initial time, i.e. the first target word of the final abstract, can then be expressed as H_1 = y_1 = F2(C_1), where F2(·, …) denotes a second transformation function that transforms a content vector into a decoding result. It should be noted that, since an iterative decoding mechanism is adopted in this implementation, obtaining a semantic decoding result requires jointly considering the content vector and the semantic decoding results at historical times, so the second transformation function takes one or more input parameters depending on how many decoding results are available. For example, at the initial time no semantic decoding result has yet been obtained, so the only input parameter of the second transformation function is the initial content vector C_1. After the semantic decoding result H_1 at the first time is calculated from C_1 using the second transformation function, the content vector C_2 at the second time and H_1 together serve as input parameters of the second transformation function to calculate the semantic decoding result H_2 at the second time: H_2 = y_2 = F2(C_2, H_1). Then the content vector C_3 at the third time and the semantic decoding results H_1 and H_2 at the first and second times together serve as input parameters to calculate the semantic decoding result H_3 at the third time: H_3 = y_3 = F2(C_3, H_1, H_2). By analogy, the semantic decoding result at each time can be obtained.
In an alternative implementation manner of this embodiment, the representation form of the second transformation function may be selected and set according to the needs of practical application, and the disclosure is not particularly limited to the specific representation form.
In an optional implementation manner of this embodiment, the modification module 904 may be configured to:
correct the initial attention distribution parameter according to the correlation between the semantic decoding result at the current time and the input text, to obtain the corrected attention distribution parameter at the current time.
Considering that, for a long text, the initial attention distribution obtained by calculation is often dispersed, which causes a certain lack of saliency and is not conducive to improving the quality of the generated summary, in this implementation, after the calculation module 903 computes the initial attention distribution, the correction module 904 corrects it according to the semantic decoding result at the current time; that is, it determines whether attention is placed on the important content related to the current state, so as to compensate for information saliency. Specifically, the correction module 904 corrects the initial attention distribution parameter W_ij according to the correlation between the semantic decoding result at the current time and the input text, obtaining the corrected attention distribution parameter W'_ij at the current time. If the calculated correlation between the semantic decoding result at the current time and the input text is small, that is, the corresponding input content is unimportant for the semantic decoding at the current time, the correction module 904 can reduce the corresponding attention distribution parameters to disperse overly concentrated attention; conversely, if the calculated correlation is large, that is, the corresponding input content is important for the semantic decoding at the current time, the correction module 904 can increase the corresponding attention distribution parameters to concentrate attention on the relatively important content.
The calculation of the correlation between the current semantic decoding result and the input text may be performed in a variety of ways, and those skilled in the art may select a suitable calculation method according to the needs of practical application and the characteristics of the text object, which is not limited in this disclosure. In addition, when calculating the correlation between the semantic decoding result at the current time and the input text, in order to improve the accuracy of the calculation and reduce the calculation amount, a part of the input text may be selected for performing the correlation calculation, for example, a part of the input text with a preset length that may be correlated with the semantic decoding result at the current time may be selected for performing the calculation.
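Since the correlation measure is left open above, the sketch below uses cosine similarity between each semantic code value and the current decoding state as one assumed choice: positions with high similarity have their parameters raised relative to the rest, and the result is renormalized.

```python
# Hedged sketch of the correction step: scale the initial attention
# distribution parameters by a correlation between the current semantic
# decoding result and each input position. Cosine similarity is an assumed
# measure; the text leaves the choice open.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def correct_attention(W, h, H_cur):
    # raise parameters where the input position correlates with the current
    # decoding state, lower them where it does not, then renormalize
    scaled = [w * max(cosine(h_j, H_cur), 1e-6) for w, h_j in zip(W, h)]
    z = sum(scaled)
    return [s / z for s in scaled]
```

To reduce the computation, as noted above, `h` could be restricted to a preset-length portion of the input likely to be relevant at the current time.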
In an optional implementation manner of this embodiment, as shown in fig. 11, the decoding module 905 includes:
the first calculation submodule 1101 is configured to calculate, by using a third transformation function, an intermediate content vector at the current time based on the modified attention distribution parameter at the current time and the semantic code value at each time;
the second computation submodule 1102 is configured to compute, by using a second transformation function, a current time semantic decoding result based on the current time intermediate content vector and the historical time semantic decoding result;
the first combining sub-module 1103 is configured to combine semantic decoding results at each time to obtain an abstract of the input text, that is, an abstract of the input object.
In this implementation manner, the decoding module 905 acquires the abstract of the input text in an iterative decoding manner, specifically, the first calculation sub-module 1101 calculates, by using a third transformation function, an intermediate content vector at the current time based on the corrected attention distribution parameter at the current time and the semantic code values at each time; the second calculation submodule 1102 then calculates, by using a second transformation function, a current time semantic decoding result based on the current time intermediate content vector and the historical time semantic decoding result; the first combining sub-module 1103 finally combines the semantic decoding results to obtain the abstract of the input text.
More specifically, assume the corrected attention distribution parameter at the i-th time is denoted W'_ij and the semantic code value at each time is denoted h_j. The intermediate content vector C_i at the i-th time is then calculated by a third transformation function F3(·, ·) as:
C_i = F3(W'_ij, h_j).
in an alternative implementation manner of this embodiment, the representation form of the third transformation function may be selected and set according to the needs of practical application, and the disclosure is not particularly limited to the specific representation form. For example, the third transformation function may be selected as a weighted sum function, etc., according to the requirements of the actual application.
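Using the weighted-sum choice just named for the third transformation function, the intermediate content vector is simply the attention-weighted sum of the semantic code values:

```python
# F3 as a weighted sum, the choice explicitly offered above: the intermediate
# content vector C_i is the attention-weighted sum of the semantic code
# values h_j, using the corrected parameters W'_ij as weights.
def f3_context(W_corr, h):
    dim = len(h[0])
    return [sum(w * h_j[k] for w, h_j in zip(W_corr, h)) for k in range(dim)]
```

With all weight on one position the context vector reproduces that position's code value; with spread weights it blends them.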
After the intermediate content vector C_i at the i-th time is obtained, the second transformation function F2(·, …, ·) and the semantic decoding results at the historical times can be used, following the description above, to calculate the semantic decoding result H_i at the i-th time, i.e. the i-th target word of the final abstract: H_i = y_i = F2(C_i, y_{i-1}, y_{i-2}, …, y_1).
Finally, the semantic decoding results at all times are combined in chronological order to obtain the generated abstract of the input text.
In order to ensure the accuracy of the attention mechanism, in an optional implementation manner of this embodiment, a portion for performing a second correction on the corrected attention distribution parameter based on an objective function is further included after the correction module 904; that is, as shown in fig. 12, the summary generation apparatus includes:
a determining module 1201 configured to acquire an input object, and determine an input text according to the input object;
the encoding module 1202 is configured to encode an input text to obtain a semantic encoding result, where the semantic encoding result includes the semantic code value at each time and an initial content vector;
a calculating module 1203, configured to iteratively calculate an initial attention distribution parameter according to the semantic coding result;
a modification module 1204 configured to modify the initial attention distribution parameter;
a re-correction module 1205 configured to re-correct the corrected attention distribution parameter based on a preset objective function, where the preset objective function includes a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function;
and the decoding module 1206 is configured to perform iterative decoding based on the corrected attention distribution parameters and the semantic code values at each moment to obtain a summary of the input object.
In this embodiment, considering that, when generating a long-text summary, the attention focus may be locally biased at individual times and the same position may be repeatedly focused on globally, the re-correction module 1205 is configured to supervise the attention mechanism with a first preset objective function based on a local loss function and a second preset objective function based on a global loss function, so as to further improve the quality of attention focusing: it performs a second correction on the corrected attention distribution parameter W'_ij to obtain the twice-corrected attention distribution parameter W''_ij, thereby correcting local and global errors.
Considering that a concentrated attention distribution should be relatively sharp, i.e. its variance should be relatively large, in order to correct local errors, in an alternative implementation of this embodiment the first preset objective function is set to minimize the inverse of the variance of the attention distribution parameters at each time.
For the decoding process as a whole, the information at a given position should not be focused on multiple times; otherwise the problem of repeated focusing arises. That is, apart from the one time at which attention is focused on a specific position, the attention paid to that position at other times should be small. Therefore, in order to correct the global error, in an alternative implementation manner of this embodiment the second preset objective function is set either to keep within a preset range the difference distribution obtained by subtracting the maximum attention distribution parameter from the sum of the attention distribution parameters over all times, or to minimize the variance of that difference distribution, so as to prevent attention from repeatedly concentrating on the same position. Ideally the difference distribution should be relatively flat, because apart from the time at which a specific position obtains its maximum attention distribution value, that position should not be assigned much attention at any other time; hence limiting the flatness of the difference distribution by a preset range, or minimizing its variance, reduces the problem of repeated attention and achieves a minimal global error.
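The two supervision terms can be sketched as below, under assumed concrete formulas: the local loss is the inverse variance of one time step's attention distribution (sharper distribution, lower loss), and the global loss is the variance of the per-position difference between the total attention over all times and the single maximum, penalizing repeated focus on the same position.

```python
# Sketch of the two objective-function terms described above. The exact
# formulas (per-position difference, plain variance) are assumptions; only
# the direction of the penalties follows the text.
def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def local_loss(W_t):
    # W_t: attention distribution at one time step; sharp -> small loss
    return 1.0 / (variance(W_t) + 1e-9)

def global_loss(W_all):
    # W_all: list of per-time attention distributions over the same positions;
    # for each position, total attention minus its single maximum should be
    # small and flat, so the variance of that difference is minimized
    n = len(W_all[0])
    totals = [sum(W_t[j] for W_t in W_all) for j in range(n)]
    maxima = [max(W_t[j] for W_t in W_all) for j in range(n)]
    diff = [t - m for t, m in zip(totals, maxima)]
    return variance(diff)
```

A sharp distribution scores a lower local loss than a flat one, and attention that revisits the same position scores a higher global loss than attention spread over distinct positions.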
In order to make the above technical solution clearer, the technical solution is explained and illustrated below by using a specific example.
Assume the input text is denoted S; after word segmentation processing, the word composition of S can be represented as S = {x_1, x_2, …, x_m}, where x_i denotes the i-th word of the input text S, i = 1 … m, and m is the number of words obtained by segmenting S. The words of S are input into the Bi-LSTM in time order for encoding, yielding the semantic code value h_j at each time and the initial content vector C_1. C_1 is obtained by encoding the input text S and can also be regarded as the result of transforming S with a fourth transformation function F4: C_1 = F4(x_1, x_2, …, x_m). The initial content vector C_1 is input into the LSTM for decoding, which can also be regarded as calculating the semantic decoding result H_1 at the first time from C_1 based on the second transformation function F2: H_1 = y_1 = F2(C_1). From H_1, the semantic code values h_j and the first transformation function F1, the initial attention distribution parameter W_2j at the second time is calculated: W_2j = F1(h_j, H_1). Based on the objective functions above, W_2j is corrected twice to obtain the twice-corrected attention distribution parameter W''_2j at the second time. Using the third transformation function F3, the intermediate content vector C_2 at the second time is calculated from W''_2j and the semantic code values h_j: C_2 = F3(W''_2j, h_j). C_2 and the semantic decoding result H_1 at the first time together serve as input parameters of the second transformation function F2 to calculate the semantic decoding result H_2 at the second time: H_2 = y_2 = F2(C_2, H_1).
Similarly, from H_2, the semantic code values h_j and F1, the initial attention distribution parameter W_3j at the third time is calculated: W_3j = F1(h_j, H_2); based on the objective functions above, W_3j is corrected twice to obtain the twice-corrected attention distribution parameter W''_3j at the third time; using F3, the intermediate content vector C_3 at the third time is calculated from W''_3j and the semantic code values h_j: C_3 = F3(W''_3j, h_j); C_3 together with the semantic decoding results H_1 and H_2 at the first and second times serve as input parameters of F2 to calculate the semantic decoding result H_3 at the third time: H_3 = y_3 = F2(C_3, H_1, H_2). By analogy, the semantic decoding result at each time can be obtained; finally, the semantic decoding results are combined in chronological order to generate the abstract T of the input text S, where T = {y_1, y_2, …, y_n}, y_i denotes the i-th word of the abstract T, i = 1 … n, and n is the number of words in T.
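The loop in the worked example can be sketched end to end with toy stand-ins for F1 through F4 (dot-product attention, an additive decoding update, weighted-sum context, mean-pooled initial content vector). All dimensions and function bodies are illustrative assumptions, not the concrete choices of this document, and the two attention corrections are marked but omitted for brevity.

```python
# End-to-end toy sketch of the iterative generation loop, with assumed
# stand-ins for F1-F4. Only the control flow mirrors the worked example.
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def f1(h, H_prev):                      # initial attention W_ij (assumed form)
    return softmax([sum(a * b for a, b in zip(h_j, H_prev)) for h_j in h])

def f3(W, h):                           # intermediate content vector C_i
    return [sum(w * h_j[k] for w, h_j in zip(W, h)) for k in range(len(h[0]))]

def f2(C, history):                     # semantic decoding result H_i (assumed form)
    state = list(C)
    for H in history:
        state = [math.tanh(s + 0.5 * x) for s, x in zip(state, H)]
    return [math.tanh(s) for s in state]

def f4(h):                              # initial content vector C_1 (mean pooling)
    return [sum(h_j[k] for h_j in h) / len(h) for k in range(len(h[0]))]

def generate_summary(h, steps=3):
    history = [f2(f4(h), [])]           # H_1 = y_1 = F2(C_1)
    for _ in range(2, steps + 1):
        W = f1(h, history[-1])          # W_ij = F1(h_j, H_{i-1})
        # (the two attention corrections would adjust W here)
        C = f3(W, h)                    # C_i = F3(W''_ij, h_j)
        history.append(f2(C, history))  # H_i = F2(C_i, H_1, ..., H_{i-1})
    return history                      # combine H_1..H_n in time order
```

Each element of the returned list plays the role of one decoding result H_i; a real system would map each to a target word y_i.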
Fig. 13 is a block diagram showing the structure of a travel note digest creation apparatus according to an embodiment of the present invention, which may be implemented as part of or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 13, the travel note digest generation apparatus includes:
an obtaining module 1301 configured to obtain a link, where the link includes a travel note text;
a generating module 1302 configured to generate a summary of the travel note text based on attention distribution;
and an uploading module 1303 configured to upload the generated summary to the target object.
In view of the fact that the text content of many travel notes is currently long, which is not conducive to readers quickly obtaining the content they are interested in, and that many travel notes exist in the form of links, in this embodiment the travel note text in the link is processed, the abstract of the text is extracted, and the extracted abstract is uploaded to the target object. The target object refers to an object that can place, store or carry the summary carrier (such as the link containing the summary), for example a storage device or a website.
In an optional implementation manner of this embodiment, as shown in fig. 14, the generating module 1302 includes:
the second encoding submodule 1401 is configured to encode the travel note text to obtain a semantic encoding result, where the semantic encoding result includes the semantic code value at each time and an initial content vector;
a third computing submodule 1402 configured to obtain an initial attention distribution parameter by iterative computation according to the semantic coding result;
a modification sub-module 1403 configured to modify the initial attention distribution parameter;
and the decoding submodule 1404 is configured to perform iterative decoding based on the corrected attention distribution parameter and the semantic code values at each time to obtain the summary of the travel note text.
In order to improve the accuracy of summary extraction, in this embodiment the summary information of the travel note text is obtained, based on the attention mechanism, by encoding and decoding the input travel note text, and, by correcting the attention distribution parameter, a high-quality, salient information summary can be obtained even for a long travel note text. This technical scheme saves the time and effort of the text owner and improves working efficiency; it also saves the reader's time by outputting a high-value amount of information in a short period, and its simple structure favors wide popularization and use.
In an optional implementation manner of this embodiment, the second encoding submodule 1401 includes:
the second word segmentation sub-module is configured to perform word segmentation processing on the travel note text to obtain one or more words;
and the third coding submodule is configured to perform word-by-word coding on the one or more words to obtain a semantic coding result.
In an optional implementation manner of this embodiment, the second encoding submodule 1401 is configured to encode the travel note text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
the decoding submodule 1404 is configured to perform iterative decoding by using a second recurrent neural network based on the corrected attention distribution parameter and the semantic code value at each time to obtain the abstract of the travel note text.
In an optional implementation manner of this embodiment, the third calculation submodule 1402 is configured to:
calculate, based on a first transformation function, the initial attention distribution parameter at the current time from the semantic code value at each time and the semantic decoding result at the previous time, wherein the semantic decoding result at the initial time is calculated from the initial content vector based on a second transformation function.
In an optional implementation manner of this embodiment, the modification sub-module 1403 is configured to:
correct the initial attention distribution parameter according to the correlation between the semantic decoding result at the current time and the travel note text, to obtain the corrected attention distribution parameter at the current time.
In an optional implementation manner of this embodiment, the decoding sub-module 1404 includes:
a fourth calculation submodule configured to calculate, by using a third transformation function, an intermediate content vector at the current time based on the corrected attention distribution parameter at the current time and the semantic code values at the respective times;
the fifth calculation submodule is configured to calculate a current time semantic decoding result based on the current time intermediate content vector and a historical time semantic decoding result by using a second transformation function;
and the second combination sub-module is configured to combine semantic decoding results at all times to obtain the abstract of the travel note text.
In an optional implementation manner of this embodiment, the apparatus further includes, after the modification sub-module 1403:
and the re-correction submodule is configured to re-correct the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function.
In an optional implementation manner of this embodiment, the first preset objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
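The two preset objective functions can be sketched directly from the wording above. The global term is ambiguous as translated; the version below is one plausible reading (sum the attention each input position receives over all times, subtract that position's single largest weight, and minimize the variance of the difference), labeled as an assumption:

```python
import numpy as np

def local_objective(alpha_t, eps=1e-8):
    # first preset objective: the inverse of the variance of the attention
    # distribution at one time; minimizing it favors peaked (high-variance)
    # attention. eps guards against division by zero for uniform attention.
    return 1.0 / (np.var(alpha_t) + eps)

def global_objective(alphas):
    # second preset objective (assumed reading): variance of the difference
    # distribution obtained by subtracting each position's maximum attention
    # weight from its total attention over all times -- a coverage-style
    # penalty against repeatedly attending to the same positions.
    total = alphas.sum(axis=0)          # (T_enc,) attention mass per position
    diff = total - alphas.max(axis=0)   # remove each position's largest weight
    return np.var(diff)
```

Under this reading, a peaked per-step distribution scores lower on the local term than a uniform one, and evenly spread residual coverage drives the global term toward zero.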
It should be noted that some of the technical terms and technical features involved in fig. 14 and its related embodiments are the same as or similar to those in the embodiments described above; for their explanation, refer to the foregoing description, which is not repeated here.
Fig. 15 is a block diagram illustrating a structure of an electronic device according to an embodiment of the present invention, and as shown in fig. 15, the electronic device 1500 includes a memory 1501 and a processor 1502; wherein,
the memory 1501 is configured to store one or more computer instructions, which are executed by the processor 1502 to perform any of the method steps described above.
FIG. 16 is a schematic block diagram of a computer system suitable for use in implementing any of the methods described above according to embodiments of the invention.
As shown in fig. 16, the computer system 1600 includes a Central Processing Unit (CPU) 1601, which can execute the various processes of the above-described embodiments in accordance with a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage portion 1608 into a Random Access Memory (RAM) 1603. The RAM 1603 also stores the various programs and data necessary for the operation of the system 1600. The CPU 1601, the ROM 1602, and the RAM 1603 are connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output portion 1607 including a display device such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage portion 1608 including a hard disk and the like; and a communication portion 1609 including a network interface card such as a LAN card or a modem. The communication portion 1609 performs communication processing via a network such as the internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1610 as necessary, so that a computer program read therefrom can be installed into the storage portion 1608 as necessary.
In particular, according to an embodiment of the present invention, the method described above may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program comprising program code for performing the method. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1609, and/or installed from the removable medium 1611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be a computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present invention.
The foregoing description presents only the preferred embodiments of the invention and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept — for example, solutions formed by replacing the above features with (but not limited to) features of similar function disclosed in the embodiments of the present invention.
Claims (24)
1. A method for generating a summary, comprising:
acquiring an input object, and determining an input text according to the input object;
coding an input text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
performing iterative computation according to the semantic coding result to obtain an initial attention distribution parameter;
correcting the initial attention distribution parameter;
performing iterative decoding based on the corrected attention distribution parameters and the semantic code values at each time to obtain an abstract of the input object;
wherein, the obtaining of the initial attention distribution parameter by iterative computation according to the semantic coding result comprises:
based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector;
the modifying the initial attention distribution parameter includes:
correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the input text to obtain a corrected attention distribution parameter at the current moment;
after the correcting the initial attention distribution parameter, the method further includes:
and re-correcting the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function, the first preset objective function being used for correcting local errors and the second preset objective function being used for correcting global errors.
2. The method of claim 1, wherein the input object is one or more of the following: inputting text, voice and image;
when the input object is input voice, the acquiring the input object and determining the input text according to the input object comprise: acquiring input voice, and converting the input voice into an input text;
when the input object is an input image, the acquiring the input object, and determining an input text according to the input object includes: acquiring an input image, and identifying a text in the input image to obtain an input text.
3. The method of claim 2, wherein the encoding the input text to obtain the semantic encoding result comprises:
performing word segmentation processing on an input text to obtain one or more words;
and carrying out word-by-word coding on the one or more words to obtain a semantic coding result.
4. The method of claim 3, wherein: encoding the input text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input text, namely the abstract of the input object.
5. The method according to claim 3, wherein the iteratively decoding based on the modified attention distribution parameters and the semantic code values at each time to obtain the abstract of the input object comprises:
calculating to obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic code values of all times by using a third transformation function;
calculating to obtain a semantic decoding result at the current moment based on the intermediate content vector at the current moment and the semantic decoding result at the historical moment by using a second transformation function;
and combining semantic decoding results at all times to obtain the abstract of the input text, namely the abstract of the input object.
6. The method of claim 3,
the first preset objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
7. A method for generating a travel note abstract is characterized by comprising the following steps:
acquiring a link, wherein the link comprises a travel note text;
generating a summary of the travel note text based on an attention distribution;
uploading the generated abstract to a target object;
wherein the generating the summary of the travel note text based on the attention distribution comprises:
coding the travel note text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
performing iterative computation according to the semantic coding result to obtain an initial attention distribution parameter;
correcting the initial attention distribution parameter;
performing iterative decoding based on the corrected attention distribution parameters and the semantic code values at each time to obtain an abstract of the travel note text;
the obtaining of the initial attention distribution parameter through iterative computation according to the semantic coding result comprises the following steps:
based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector;
the modifying the initial attention distribution parameter includes:
correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the travel note text to obtain a corrected attention distribution parameter at the current moment;
after the correcting the initial attention distribution parameter, the method further includes:
and re-correcting the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function, the first preset objective function being used for correcting local errors and the second preset objective function being used for correcting global errors.
8. The method of claim 7, wherein the encoding the travel note text to obtain a semantic encoding result comprises:
performing word segmentation processing on the travel note text to obtain one or more words;
and carrying out word-by-word coding on the one or more words to obtain a semantic coding result.
9. The method of claim 7, wherein: encoding the travel note text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the travel note text.
10. The method according to claim 7, wherein the iteratively decoding based on the corrected attention distribution parameters and the semantic code values at each time to obtain the abstract of the travel note text comprises:
calculating to obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic code values of all times by using a third transformation function;
calculating to obtain a semantic decoding result at the current moment based on the intermediate content vector at the current moment and the semantic decoding result at the historical moment by using a second transformation function;
and combining semantic decoding results at all times to obtain the abstract of the travel note text.
11. The method of claim 7, wherein the first predetermined objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
12. An apparatus for generating a summary, comprising:
a determination module configured to acquire an input object, and determine an input text according to the input object;
the encoding module is configured to encode an input text to obtain a semantic encoding result, wherein the semantic encoding result comprises a semantic encoding value and an initial content vector at each moment;
the calculation module is configured to obtain an initial attention distribution parameter through iterative calculation according to the semantic coding result;
a modification module configured to modify the initial attention distribution parameter;
the decoding module is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain a summary of the input object;
wherein the computing module is further configured to:
based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector;
the correction module is further configured to:
correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the input text to obtain a corrected attention distribution parameter at the current moment;
after the modification module, the apparatus further comprises:
and the re-correction module is configured to re-correct the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function, the first preset objective function is used for correcting a local error, and the second preset objective function is used for correcting a global error.
13. The apparatus of claim 12, wherein the input object is one or more of: inputting text, voice and image;
when the input object is input speech, the determination module is configured to: acquiring input voice, and converting the input voice into an input text;
when the input object is an input image, the determination module is configured to: acquiring an input image, and identifying a text in the input image to obtain an input text.
14. The apparatus of claim 13, wherein the encoding module comprises:
the first word segmentation processing submodule is configured to perform word segmentation processing on an input text to obtain one or more words;
the first coding submodule is configured to perform word-by-word coding on the one or more words to obtain a semantic coding result.
15. The apparatus of claim 13, wherein the encoding module is configured to: encoding the input text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
the decoding module is configured to: and carrying out iterative decoding by utilizing a second recurrent neural network based on the corrected attention distribution parameters and the semantic code values at all the moments to obtain the abstract of the input text, namely the abstract of the input object.
16. The apparatus of claim 13,
the decoding module includes:
the first calculation submodule is configured to calculate to obtain a current-time intermediate content vector based on the corrected current-time attention distribution parameter and the semantic coding value of each time by using a third transformation function;
the second calculation submodule is configured to calculate a current time semantic decoding result based on the current time intermediate content vector and a historical time semantic decoding result by using a second transformation function;
and the first combination sub-module is configured to combine semantic decoding results at all times to obtain an abstract of the input text, namely an abstract of the input object.
17. The apparatus of claim 13, wherein the first predetermined objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
18. A travel note digest generation apparatus, comprising:
an acquisition module configured to acquire a link, the link comprising a travel note text;
a generation module configured to generate a summary of the travel note text based on an attention distribution;
an upload module configured to upload the generated summary to a target object; wherein the generating module comprises:
the second coding submodule is configured to code the travel note text to obtain a semantic coding result, wherein the semantic coding result comprises a semantic coding value and an initial content vector at each moment;
a third calculation submodule configured to obtain an initial attention distribution parameter through iterative calculation according to the semantic coding result;
a modification submodule configured to modify the initial attention distribution parameter;
the decoding submodule is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all the moments to obtain an abstract of the travel note text;
the third computing submodule is further configured to:
based on a first transformation function, calculating to obtain an initial attention distribution parameter at the current moment according to the semantic coding value at each moment and a semantic decoding result at the last moment, wherein the semantic decoding result at the initial moment is calculated based on a second transformation function according to the initial content vector;
the modification sub-module is further configured to:
correcting the initial attention distribution parameter according to the correlation between the semantic decoding result at the current moment and the travel note text to obtain a corrected attention distribution parameter at the current moment;
after the modification submodule, the apparatus further comprises:
and the re-correction submodule is configured to re-correct the corrected attention distribution parameters based on a preset objective function, wherein the preset objective function comprises a first preset objective function generated based on a local loss function and a second preset objective function generated based on a global loss function, the first preset objective function is used for correcting a local error, and the second preset objective function is used for correcting a global error.
19. The apparatus of claim 18, wherein the second encoding submodule comprises:
the second word segmentation processing submodule is configured to perform word segmentation processing on the travel note text to obtain one or more words;
and the third coding submodule is configured to perform word-by-word coding on the one or more words to obtain a semantic coding result.
20. The apparatus of claim 18, wherein: the second encoding submodule is configured to encode the travel note text by using the first recurrent neural network to obtain a semantic encoding result, and/or,
and the decoding submodule is configured to perform iterative decoding on the basis of the corrected attention distribution parameters and the semantic code values at all times by using a second recurrent neural network to obtain an abstract of the travel note text.
21. The apparatus of claim 18,
the decoding sub-module includes:
a fourth calculation submodule configured to calculate, by using a third transformation function, an intermediate content vector at the current time based on the corrected attention distribution parameter at the current time and the semantic code values at the respective times;
a fifth calculation submodule configured to calculate, by using a second transformation function, a current time semantic decoding result based on the current time intermediate content vector and a historical time semantic decoding result;
and the second combination sub-module is configured to combine semantic decoding results at all times to obtain the abstract of the travel note text.
22. The apparatus of claim 18, wherein the first predetermined objective function is characterized by: minimizing the inverse of the variance of the attention distribution parameter at each time instant; and/or, the second preset objective function is characterized by: the variance of the difference distribution obtained by subtracting the maximum attention distribution parameter at each time from the sum of the attention distribution parameters at all times is minimized.
23. An electronic device comprising a memory and a processor; wherein,
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-11.
24. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the method steps of any one of claims 1 to 11.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910562883.8A CN112148870B (en) | 2019-06-26 | 2019-06-26 | Abstract generation method and device, electronic equipment and computer readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112148870A CN112148870A (en) | 2020-12-29 |
| CN112148870B true CN112148870B (en) | 2022-09-16 |
Family
ID=73869977
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910562883.8A Active CN112148870B (en) | 2019-06-26 | 2019-06-26 | Abstract generation method and device, electronic equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112148870B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113762050B (en) | 2021-05-12 | 2024-05-24 | 腾讯云计算(北京)有限责任公司 | Image data processing method, device, equipment and medium |
| CN114036304A (en) * | 2021-11-25 | 2022-02-11 | 贵州电网有限责任公司 | Dialogue classification method for voice interaction operation of commander and commander |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108733657B (en) * | 2017-04-17 | 2022-10-28 | 北京搜狗科技发展有限公司 | Attention parameter correction method and device in neural machine translation and electronic equipment |
| US10409898B2 (en) * | 2017-11-17 | 2019-09-10 | Adobe Inc. | Generating a targeted summary of textual content tuned to a target audience vocabulary |
| US11170158B2 (en) * | 2018-03-08 | 2021-11-09 | Adobe Inc. | Abstractive summarization of long documents using deep learning |
| CN109344391B (en) * | 2018-08-23 | 2022-10-21 | 昆明理工大学 | Multi-feature fusion Chinese news text abstract generation method based on neural network |
| CN109670035B (en) * | 2018-12-03 | 2021-03-23 | 科大讯飞股份有限公司 | Text abstract generating method |
-
2019
- 2019-06-26 CN CN201910562883.8A patent/CN112148870B/en active Active
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||