
WO2017076205A1 - Method and apparatus for obtaining reply prompt content for chat start sentence - Google Patents


Info

Publication number
WO2017076205A1
Authority
WO
WIPO (PCT)
Prior art keywords
chat
topic
initiation sentence
entry
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/103422
Other languages
French (fr)
Chinese (zh)
Inventor
陈包容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90332Natural language query formulation or dialogue systems

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and apparatus for obtaining a reply prompt content of a chat initiation sentence.
  • the method for obtaining chat reply prompt content by database matching is mainly divided into two steps: the chat initiation sentence is first pre-processed to obtain word-segmented text, and the segmented text is then matched against a pre-established database to obtain the chat reply prompt content.
  • the traditional database matching method does not necessarily obtain the chat reply prompt content matching the chat initiation sentence, thereby resulting in low intelligence and poor user experience.
  • the present invention provides a method and apparatus for obtaining reply prompt content for a chat initiation sentence, so as to solve the technical problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low level of chat intelligence and a poor user experience.
  • a method for obtaining a reply prompt content of a chat initiation sentence including:
  • if the first semantic matching result is not obtained, the user network data of the communication terminal is collected in a distributed cloud computing manner, the chat initiation sentence is semantically matched by using the user network data to obtain a second semantic matching result, and the second semantic matching result is used as the reply prompt content of the chat initiation sentence.
  • establishing a topic database corresponding to the preset topic includes:
  • a sample chat pair with the preset topic as its chat topic is created and used as the topic database corresponding to the preset topic; the sample chat pair includes a sample initiation sentence and a sample response sentence, corresponding to the sample initiation sentence, that is set according to the scene option.
  • obtaining the topic classification to which the chat initiation sentence received by the communication terminal belongs includes:
  • the topic classification to which the chat initiation sentence belongs is obtained according to the keyword.
  • semantically matching the chat initiation sentence by using the topic database corresponding to the preset topic of the topic classification, and obtaining the first semantic matching result, includes:
  • the same sample initiation sentence as the chat initiation sentence is matched in the topic database, and the first semantic matching result is obtained according to the scenario information.
  • semantically matching the chat initiation sentence by using the user network data, and obtaining the second semantic matching result includes:
  • the user network data is preprocessed to obtain preprocessed text, and the preprocessing includes word segmentation processing, semantic disambiguation processing, part of speech labeling processing, removal of stop word processing, punctuation symbol processing, and expression character processing;
  • the K-means clustering algorithm is used to perform text clustering on the pre-processed text to obtain a text clustering center;
  • chat initiation sentences are matched in the user network data corresponding to the clustering topic, and the second semantic matching result is obtained.
  • the scene entry includes:
  • a relationship entry, name entry, gender entry, age entry, instant messaging account entry, email address entry, home address entry, occupation category entry, job entry, work unit entry, unit address entry, bank account entry, friend impression entry, hobby entry, circle-of-friends status entry, mood entry, recent topic-of-interest entry, current communication status entry, scene image entry, time entry, holiday entry, season entry, geographic location information entry, distance entry, communication frequency entry, communication time entry, communication duration entry, and historical communication initiation mode entry of the communication terminals that send and receive the chat initiation sentence, wherein the initiation modes include initiating communication from the address book, from the historical call record, from the short message communication module, and from the dial pad.
  • collecting the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence in the scene entry includes:
  • the difference-of-Gaussians (DoG) operator is used to extract the region of interest of the scene training image, and the scale-invariant feature transform (SIFT) feature of that region of interest is calculated;
  • the K-means clustering algorithm is used to cluster the SIFT features of the regions of interest of the scene training images to obtain a plurality of cluster centers, and a visual word dictionary composed of the visual words corresponding to each cluster center is established;
  • the DoG operator is used to extract the region of interest of the scene image, and the visual word closest to the SIFT feature of that region of interest is matched in the visual word dictionary;
  • the scene image is classified by using a pre-trained support vector machine classifier, and the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence is obtained.
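  • The visual word lookup in the steps above can be sketched as follows. This is a minimal illustration only: DoG region extraction, SIFT computation, and the SVM classifier are not reproduced, the two-dimensional "descriptors" are toy stand-ins for real SIFT vectors, and turning the matched words into a normalized histogram is a common bag-of-visual-words convention assumed here rather than something the source states. The function names are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, dictionary):
    """Index of the visual word (cluster center) closest to one descriptor."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(descriptor, dictionary[i]))


def visual_word_histogram(descriptors, dictionary):
    """Normalized count of how often each visual word is the nearest match."""
    if not descriptors:
        return [0.0] * len(dictionary)
    counts = Counter(nearest_word(d, dictionary) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(dictionary))]


# Toy data: two visual words and four region-of-interest descriptors.
dictionary = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.0, 0.2)]
histogram = visual_word_histogram(descriptors, dictionary)
```

  • A histogram of this kind is the sort of fixed-length vector that a pre-trained classifier could take as input to label the scene image.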
  • an apparatus for obtaining a reply prompt content of a chat initiation sentence including:
  • a topic database creation device configured to establish a topic database corresponding to the preset topic
  • the topic classification obtaining device is configured to obtain a topic classification to which the chat initiation sentence received by the communication terminal belongs;
  • the first semantic matching device is configured to perform semantic matching on the chat initiation sentence by using a topic database corresponding to the preset topic of the topic classification, obtain a first semantic matching result, and use the first semantic matching result as the reply prompt content of the chat initiation sentence;
  • the second semantic matching device is configured to, if the first semantic matching result is not obtained, collect the user network data of the communication terminal in a distributed cloud computing manner, perform semantic matching on the chat initiation sentence by using the user network data to obtain a second semantic matching result, and use the second semantic matching result as the reply prompt content of the chat initiation sentence.
  • the topic database creation device includes:
  • a setting device configured to set a scene item associated with the preset topic and a scene option corresponding to the scene item
  • the sample chat pair creation device is configured to create a sample chat pair with the preset topic as its chat topic and to use the sample chat pair as the topic database corresponding to the preset topic, wherein the sample chat pair includes a sample initiation sentence and a sample response sentence, corresponding to the sample initiation sentence, that is set according to the scene option.
  • the topic classification obtaining device includes:
  • a merged text obtaining device configured to obtain the preceding chat content of the chat initiation sentence, and merge the chat initiation sentence and its preceding chat content into a merged text in text format;
  • a keyword extracting device configured to extract keywords of the merged text
  • the topic classification determining means is configured to acquire a topic classification to which the chat initiation sentence belongs according to the keyword.
  • in the method and apparatus for obtaining reply prompt content for a chat initiation sentence provided by the embodiments of the present invention, the topic classification to which the chat initiation sentence received by the communication terminal belongs is obtained, and the chat initiation sentence is semantically matched by using the customized topic database corresponding to the preset topic of that topic classification to obtain a first semantic matching result; if the first semantic matching result is not obtained, the user network data of the communication terminal is collected and used to semantically match the chat initiation sentence to obtain a second semantic matching result. This solves the technical problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low level of chat intelligence and a poor user experience. The user network data of the communication terminal is fully used to obtain the reply prompt content of the chat initiation sentence, which improves the accuracy of reply prompt content acquisition, reflects a high level of intelligence, and improves the user experience.
  • FIG. 1 is a flowchart of a method for obtaining a reply prompt content of a chat initiation sentence according to an optional embodiment of the present invention
  • FIG. 2 is a scenario image of a communication terminal that assumes a received chat initiation sentence according to an alternative embodiment of the present invention
  • FIG. 3 is a diagram of a visual word result obtained by matching a scene image of a communication terminal of a received chat initiation sentence with a visual word dictionary according to an alternative embodiment of the present invention
  • FIG. 4 is a flowchart of a method for obtaining a reply prompt content of a chat initiation sentence according to an alternative embodiment of the present invention
  • FIG. 5 is a flowchart of a method for obtaining a reply prompt content of a chat initiation sentence according to an alternative embodiment of the present invention
  • FIG. 6 is a flowchart of a method for obtaining a reply prompt content of a chat initiation sentence for a third simplified embodiment according to an alternative embodiment of the present invention
  • FIG. 7 is a structural block diagram of an apparatus for obtaining a reply prompt content of a chat initiation sentence according to an optional embodiment of the present invention.
  • an alternative embodiment of the present invention provides a method for obtaining a reply prompt content of a chat initiation sentence, including:
  • Step S101 establishing a topic database corresponding to the preset topic
  • Step S102 acquiring a topic classification to which the chat initiation sentence received by the communication terminal belongs;
  • Step S103 Perform semantic matching on the chat initiation sentence by using a topic database corresponding to the preset topic of the topic classification, obtain a first semantic matching result, and use the first semantic matching result as a reply prompt content of the chat initiation sentence;
  • Step S104 If the first semantic matching result is not obtained, collect data of the user network data of the communication terminal based on the distributed cloud computing manner, and perform semantic matching on the chat initiation sentence by using the user network data to obtain a second semantic matching result. And the second semantic matching result is used as the reply prompt content of the chat initiation sentence.
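  • The control flow of steps S101 to S104 can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the toy topic database, the toy network data, and the keyword-based classifier are all hypothetical, and plain dictionary lookup stands in for semantic matching.

```python
from typing import Callable, Optional


def reply_prompt(sentence: str,
                 classify: Callable[[str], str],
                 match_topic_db: Callable[[str, str], Optional[str]],
                 match_network_data: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Try the topic database first; fall back to user network data."""
    topic = classify(sentence)                    # step S102
    first = match_topic_db(topic, sentence)       # step S103
    if first is not None:
        return first
    return match_network_data(topic, sentence)    # step S104


# Step S101: a toy topic database (hypothetical content).
TOPIC_DB = {"tourism": {"Go travel together?": "Sure, where to?"}}
# Toy stand-in for collected user network data (hypothetical content).
NETWORK_DATA = {"shopping": {"Want to go shopping?": "Let's go this weekend."}}


def classify(sentence: str) -> str:
    # Hypothetical classifier: a single keyword rule.
    return "tourism" if "travel" in sentence else "shopping"


def match_topic_db(topic: str, sentence: str) -> Optional[str]:
    return TOPIC_DB.get(topic, {}).get(sentence)


def match_network_data(topic: str, sentence: str) -> Optional[str]:
    return NETWORK_DATA.get(topic, {}).get(sentence)
```

  • Passing the three stages in as callables keeps the fallback order visible and lets each stage be replaced independently.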
  • the method for obtaining reply prompt content for a chat initiation sentence obtains the topic classification to which the chat initiation sentence received by the communication terminal belongs, and semantically matches the chat initiation sentence by using the customized topic database corresponding to the preset topic that is the same as the topic classification to obtain a first semantic matching result; if the first semantic matching result is not obtained, it collects the user network data of the communication terminal and uses that data to semantically match the chat initiation sentence to obtain a second semantic matching result. This solves the technical problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low level of chat intelligence and a poor user experience. The user network data of the communication terminal is fully used to obtain the reply prompt content of the chat initiation sentence, which improves the accuracy of reply prompt content acquisition, reflects a higher level of intelligence, and improves the user experience.
  • the user network data of the communication terminal in this embodiment includes personal information data of the communication terminal, social information data (microblog, WeChat, forum, blog, etc.), communication information data, online shopping information data, online footprint information data, and the like.
  • the communication information includes the user's own historical communication records, historical communication records of other users using the same communication application software, and communication records provided by third-party application software.
  • the communication record further includes a call record and a short message record, and the short message record further includes a mobile phone short message record and an instant message record, and the call record further includes a mobile phone call record and an instant message voice and video call record.
  • in this embodiment, the reply prompt content matching the chat initiation sentence is mainly obtained from the user network data, so this embodiment mainly collects network chat data with contextual interaction from the user network data of the communication terminal, such as WeChat and QQ instant messaging chats, chat data with Taobao customer service, Baidu question-and-answer data, and interactions or chat data in Weibo private messages.
  • this embodiment can semantically match the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, or by using the topic database corresponding to the preset topic closest to the topic classification.
  • This embodiment uses a Hadoop-based platform for processing big data.
  • Hadoop is an open source distributed computing platform whose core includes the Hadoop Distributed File System (HDFS).
  • the many advantages of HDFS allow users to deploy Hadoop on low-cost hardware and build distributed clusters to form a distributed system.
  • Hadoop DataBase (HBase) is a distributed database system built on the distributed file system HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes. It is mainly used to store loosely organized unstructured and semi-structured data.
  • the collected network data is stored by the distributed storage device, and the distributed storage device is implemented based on HDFS.
  • establishing a topic database corresponding to the preset topic includes:
  • a sample chat pair with the preset topic as its chat topic is created and used as the topic database corresponding to the preset topic; the sample chat pair includes a sample initiation sentence and a sample response sentence, corresponding to the sample initiation sentence, that is set according to the scene option.
  • the same chat initiation sentence (for example, "Go travel together?") often requires different replies in real life (for example, "The weather is bad, how about going next time?", "I prefer staying at home, I don't like to travel.", "Work has been too busy recently, I can't find time to travel."); that is, for the same chat initiation sentence, communication terminal users often need to give different replies according to different environments or scenarios.
  • the sample chat pair includes a sample initiation sentence, a sample response sentence corresponding to the sample initiation sentence set according to the scene option, and a sample chat pair with the preset topic as the chat topic as a topic database corresponding to the preset topic.
  • the embodiment sets the scene item associated with the preset topic according to daily experience.
  • Table 1 shows several different preset topics and their associated scene entries.
  • the preset topic is a "tourism" default topic
  • the default topic is "email" preset topic, it is known through daily experience that the communication terminal belongs to the "mailing" preset topic.
  • the scene entries that this embodiment associates with the different preset topics are not fixed; they are given only according to daily experience, and the user can set the scene entries associated with a preset topic as needed.
  • the scene option corresponding to the scene entry is also customized by the user as needed.
  • the sample reply sentence corresponding to the sample initiation sentence is set according to the scene option.
  • Table 2 is the code name of the scene option corresponding to the three scene entries associated with the "shopping" preset topic, wherein the relationship entry may refer to the relationship entry of the communication terminal that receives and sends the chat initiation sentence. It can be seen from Table 2 that the scene option corresponding to the relationship entry is six items, the scene option corresponding to the distance item is three items, and the scene option corresponding to the weather item is five items.
  • the number of combinations of all scene options of all scene entries in this embodiment is 6*3*5 = 90; that is, when creating a sample chat pair in the topic database corresponding to the "shopping" preset topic, up to 90 sample reply sentences can be set for each sample initiation sentence.
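  • The count of scene option combinations can be checked by enumerating them. The sketch below assumes numeric option codes starting at 1 for each entry and joins them with "+" in the order relationship, distance, weather; the actual codes are defined in Table 2, which is not reproduced here, so the ranges are placeholders.

```python
from itertools import product

# Placeholder option codes per scene entry (assumed, not taken from Table 2).
scene_options = {
    "relationship": range(1, 7),  # 6 scene options
    "distance": range(1, 4),      # 3 scene options
    "weather": range(1, 6),       # 5 scene options
}

# One combination code per element of the Cartesian product of the options.
combination_codes = [
    "+".join(str(code) for code in combo)
    for combo in product(*scene_options.values())
]

total = len(combination_codes)  # 6 * 3 * 5
```

  • Each combination code can then serve as a key under which one sample reply sentence is stored for a given sample initiation sentence.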
  • when creating sample reply sentences for a sample initiation sentence, the number of scene option combinations and the sample reply sentences for those combinations may be set as needed; that is, sample reply sentences do not need to be set for every scene option combination of each sample initiation sentence. In an optional implementation, this embodiment gives every scene entry a scene option whose content information is "empty" (which may be represented by the code "0"), because in actual implementation the content information corresponding to a scene entry sometimes cannot be obtained. For example, if the communication terminal that receives the chat initiation sentence has no Global Positioning System (GPS) positioning or has not granted the geographic location permission, the data returned by the system is empty.
  • likewise, related search results cannot be obtained when the communication terminal is disconnected from the network or enters an area with no network signal.
  • when creating a sample chat pair in the topic database, for example for a chat initiation sentence that has a fixed reply, the communication terminal user only needs to set the scene option content to empty.
  • by setting the scene entries associated with a preset topic and the scene options corresponding to each scene entry, and by creating sample chat pairs with the preset topic as the chat topic as the topic database corresponding to the preset topic, this embodiment enriches the types of sample reply sentences for the same sample initiation sentence, meets actual needs, and enhances the user experience. Because the scene entries associated with the preset topic are taken into account, sample reply sentences can be set for a sample initiation sentence under different scenes and their combinations, which matches the way people reason when replying to chat messages and provides a high level of intelligence and personalization.
  • obtaining the topic classification to which the chat initiation sentence received by the communication terminal belongs includes:
  • the topic classification to which the chat initiation sentence belongs is obtained according to the keyword.
  • this embodiment determines the topic classification to which the chat initiation sentence belongs based not only on the chat initiation sentence but also on its preceding chat content. In actual implementation, however, the chat initiation sentence has more reference value than its preceding chat content for obtaining the topic classification. Therefore, in this embodiment, the keywords of the merged text can be extracted by weighted word frequency statistics over the segmented text obtained by word-segmenting the merged text; that is, the closer the chat content is to the chat initiation sentence, the greater its weighting factor.
  • the content corresponding to the keyword may be used as the topic classification to which the chat initiation sentence belongs, or the topic classification corresponding to the keyword may be queried according to a preset association mapping table between keywords and topic classifications.
  • obtaining the topic classification by combining the chat initiation sentence with its preceding chat content fully considers the chat context of the chat initiation sentence and is more accurate than relying on the chat initiation sentence alone. Determining the keywords by weighted word frequency statistics over the segmented merged text makes the obtained topic classification more accurate still.
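  • Weighted word frequency statistics over the merged text can be sketched as follows. The linear weighting (message position divided by message count, so the chat initiation sentence gets weight 1.0) is an assumption, since the source only says that closer content gets a larger weighting factor. Whitespace tokenization and the stop word set are also simplifications; Chinese text would need a real word segmenter. The function name is hypothetical.

```python
from collections import Counter


def extract_keywords(preceding_messages, initiation_sentence,
                     top_n=1, stop_words=frozenset()):
    """Rank words by frequency, weighting later messages more heavily."""
    messages = list(preceding_messages) + [initiation_sentence]
    scores = Counter()
    for position, message in enumerate(messages, start=1):
        # Assumed linear weight: oldest message is smallest, initiation is 1.0.
        weight = position / len(messages)
        for raw in message.lower().split():
            token = raw.strip(".,!?\"'")
            if token and token not in stop_words:
                scores[token] += weight
    return [word for word, _ in scores.most_common(top_n)]


preceding = ["the weather is nice today", "I bought new shoes"]
sentence = "shall we go shopping for shoes"
stop = frozenset({"the", "is", "i", "we", "for", "shall", "go"})
keywords = extract_keywords(preceding, sentence, top_n=1, stop_words=stop)
```

  • In this toy input, "shoes" appears in both the initiation sentence and the most recent preceding message, so it accumulates the highest weighted count.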
  • when the chat initiation sentence has no preceding chat content, this embodiment obtains the topic classification from the chat initiation sentence alone. In this embodiment, the range of preceding chat content to obtain is customized by the user; for example, the preceding chat content within a certain period of time, or within a certain number of messages, can be obtained.
  • semantically matching the chat initiation sentence by using the topic database corresponding to the preset topic of the topic classification, and obtaining the first semantic matching result, includes:
  • the same sample initiation sentence as the chat initiation sentence is matched in the topic database, and the first semantic matching result is obtained according to the scenario information.
  • the content information of the scene entries associated with the topic classification may be collected by calculation, reasoning, querying, searching, or any combination thereof.
  • specifically, the content information corresponding to a scene entry may be obtained by calculating, inferring, querying, or searching, or any combination thereof, over the personal information, social information, communication information, online shopping information, online footprint information, user behavior information, user service information, and the like of the communication terminal, wherein the user behavior information refers to information about behaviors such as expressing needs, acquiring information, and using information that the user exhibits when seeking the information he or she needs.
  • the communication information includes the user's own historical communication records, historical communication records of other users using the same communication application software, and communication records provided by third-party application software.
  • the communication record further includes a call record and a short message record, and the short message record further includes a mobile phone short message record and an instant message record, and the call record further includes a mobile phone call record and an instant message voice and video call record.
  • when the scene entry is a geographic location information entry, its content can be obtained by querying the GPS positioning information.
  • when the scene entry is a distance entry, its content can be obtained by calculating the difference between the geographic positions of the communication terminals that send and receive the chat initiation sentence.
  • when the scene entry is a recent topic-of-interest entry, its content can be obtained by searching the latest web browsing records of the communication terminal.
  • when the scene entry is a weather entry, its content can be obtained by querying a weather web page, or by reasoning from collected meteorological information such as temperature, wind direction, and humidity.
  • obtaining the first semantic matching result according to the scenario information in this embodiment may include first encoding the scenario information to obtain an identification ID. Taking Table 2 as an example, assume that the only content information collected in this embodiment is "colleague" for the relationship entry of the communication terminals that send and receive the chat initiation sentence and "clear" for the weather entry of the communication terminal that sends the chat initiation sentence, so that the obtained identification ID is "3+0+1". The sample initiation sentence identical to the chat initiation sentence is then matched in the topic database corresponding to the preset topic that is the same as the topic classification, the combination code identical to the identification ID is matched among the scene option combinations of that sample initiation sentence, and the sample reply sentence corresponding to that combination code is used as the reply prompt content for the chat initiation sentence.
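  • The identification ID lookup can be sketched as follows. Only the codes named in the text are used ("colleague" = 3, "clear" = 1, and 0 for an entry whose content was not collected); the entry order, the sample sentences, and the function names are hypothetical placeholders, since Table 2 is not reproduced here.

```python
EMPTY = 0  # code for a scene entry whose content could not be collected

ENTRY_ORDER = ["relationship", "distance", "weather"]
OPTION_CODES = {
    "relationship": {"colleague": 3},
    "distance": {},
    "weather": {"clear": 1},
}

# Sample replies keyed by (sample initiation sentence, combination code).
# The sentences are hypothetical examples.
SAMPLE_REPLIES = {
    ("Go shopping together?", "3+0+1"): "Sure, see you after work.",
}


def identification_id(collected, entry_order=ENTRY_ORDER, option_codes=OPTION_CODES):
    """Encode collected scene content as a '+'-joined combination code."""
    codes = []
    for entry in entry_order:
        content = collected.get(entry)
        codes.append(option_codes[entry].get(content, EMPTY))
    return "+".join(str(code) for code in codes)


def first_semantic_match(sentence, collected, replies=SAMPLE_REPLIES):
    """Exact-match the sentence, then select the reply by combination code."""
    return replies.get((sentence, identification_id(collected)))


collected = {"relationship": "colleague", "weather": "clear"}
reply = first_semantic_match("Go shopping together?", collected)
```

  • Because the distance entry was not collected, its position in the code falls back to 0, which reproduces the "3+0+1" example.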
  • the sample initiation sentence identical to the chat initiation sentence can be obtained by exact matching.
  • a fuzzy matching method may also be used to obtain a sample initiation sentence similar to the chat initiation sentence. In this embodiment, fuzzy matching may include first preprocessing the chat initiation sentence (word segmentation, semantic disambiguation, part-of-speech tagging, stop word removal, and the like), then performing text matching between the preprocessed chat initiation sentence and the sample initiation sentences in the topic database, and taking a sample initiation sentence whose text-matching similarity is greater than a preset threshold as the sample initiation sentence matching the chat initiation sentence.
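  • Threshold-based fuzzy matching can be sketched as below. The source does not name a similarity measure, so Jaccard similarity over word sets is an assumption chosen for simplicity; the preprocessing is reduced to lowercasing, punctuation removal, and stop word removal, and the stop word list, threshold, and function names are hypothetical.

```python
def tokenize(sentence, stop_words=frozenset({"the", "a", "to"})):
    """Lowercase, replace punctuation with spaces, split, and drop stop words."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " "
                      for ch in sentence)
    return {token for token in cleaned.split() if token not in stop_words}


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_match(initiation_sentence, sample_sentences, threshold=0.5):
    """Return the most similar sample sentence above the threshold, else None."""
    query = tokenize(initiation_sentence)
    best, best_score = None, 0.0
    for sample in sample_sentences:
        score = jaccard(query, tokenize(sample))
        if score > threshold and score > best_score:
            best, best_score = sample, score
    return best


samples = ["Go travel together?", "Want to go shopping?"]
matched = fuzzy_match("Shall we go travel together?", samples)
```

  • Raising the threshold makes matching stricter and leaves more sentences to the fallback match against user network data.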
  • different sample reply sentences can be matched according to different scene information, so that the reply prompt content corresponding to the chat initiation sentence is acquired intelligently according to the scene information of the communication terminal, with a high level of intelligence and personalization.
  • the topic database corresponding to the preset topic created in this embodiment can learn and update automatically. When the content information of a collected scene entry is not among the scene options of that entry in the created topic database, the database is extended. For example, when the scene options of the weather entry include only three codes ("1" for clear, "2" for rain, and "3" for snow) and the collected content information of the weather entry is "cloudy", the system creates a scene option with the code "4", indicating "cloudy", under the weather entry, and updates the scene option combinations and the corresponding sample reply sentences accordingly.
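  • The automatic update of scene options can be sketched as follows. The rule that a new option receives the next unused integer code is an assumption consistent with the "cloudy" becoming "4" example; the function name is hypothetical, and updating the dependent combinations and sample reply sentences is not shown.

```python
def code_for(options, content):
    """Return the code for `content`, registering a new scene option if unseen."""
    for code, name in options.items():
        if name == content:
            return code
    new_code = max(options, default=0) + 1  # assumed rule: next unused integer
    options[new_code] = content
    return new_code


weather_options = {1: "clear", 2: "rain", 3: "snow"}
cloudy_code = code_for(weather_options, "cloudy")
```

  • After the call, the weather entry has four scene options, so the number of possible scene option combinations for topics using that entry grows accordingly.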
  • the sample reply sentence set for each scene option combination in this embodiment may consist of one piece of session content or of several.
  • semantically matching the chat initiation sentence by using the user network data, and obtaining the second semantic matching result includes:
  • the user network data is preprocessed to obtain preprocessed text, and the preprocessing includes word segmentation processing, semantic disambiguation processing, part of speech labeling processing, removal of stop word processing, punctuation symbol processing, and expression character processing;
  • the K-means clustering algorithm is used to perform text clustering on the pre-processed text to obtain a text clustering center;
  • chat initiation sentences are matched in the user network data corresponding to the clustering topic, and the second semantic matching result is obtained.
  • the second semantic matching result is obtained by matching the chat initiation sentence against the collected user network data. However, the user network data is generally big data; when the chat initiation sentence is matched directly in the user network data, there may be multiple matching results, or the obtained reply prompt content may be completely irrelevant.
  • this embodiment therefore first preprocesses the collected user network data, performs text clustering on the preprocessed text to obtain text cluster centers, extracts the keywords of each text cluster center as the cluster topic, and finally matches the chat initiation sentence in the user network data corresponding to the cluster topic closest to the topic classification to which the chat initiation sentence belongs, thereby obtaining the second semantic matching result.
  • the present embodiment performs text clustering on the pre-processed text based on the K-means clustering algorithm to obtain a text clustering center, which may include the following steps:
  • K data may be either a word or a sentence.
  • the distance between each sample and the center point is obtained by calculating the distance between the word vector corresponding to each sample and the word vector corresponding to the center point.
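  • A minimal sketch of K-means clustering over word vectors is given below for orientation. It is the textbook algorithm rather than the patent's own step list, and it rests on assumptions: samples are already numeric vectors, the first K samples serve as initial centers, and distance is Euclidean.

```python
import math


def kmeans(samples, k, max_iter=100):
    """Cluster `samples` (equal-length numeric vectors) into `k` groups.

    Returns (centers, assignment), where assignment[i] is the index of the
    center closest to samples[i].
    """
    centers = [list(sample) for sample in samples[:k]]  # assumed initialisation
    assignment = None
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(sample, centers[j]))
            for sample in samples
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        # Move each center to the mean of its members.
        for j in range(k):
            members = [s for s, a in zip(samples, assignment) if a == j]
            if members:
                centers[j] = [sum(column) / len(members) for column in zip(*members)]
    return centers, assignment


vectors = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assignment = kmeans(vectors, k=2)
```

  • In practice the vectors would be word or sentence embeddings of the preprocessed text, and the initial centers would usually be chosen more carefully than by taking the first K samples.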
  • in order to improve the accuracy of the second semantic matching result obtained from the user network data, the communication terminal user generally screens the collected user network data one or more times before preprocessing and text clustering are performed.
  • to obtain the clustering topic closest to the topic classification to which the chat initiation sentence belongs, this embodiment may calculate the similarity between the topic classification and each clustering topic, or may use a preset association between topic classifications and clustering topics.
  • this embodiment uses the K-means clustering algorithm to extract the clustering topics of the user network data and matches the chat initiation sentence only in the user network data corresponding to the clustering topic closest to the chat initiation sentence. This avoids the time needed to match the chat initiation sentence against the full volume of user network data, which improves the speed and efficiency of obtaining the reply prompt content, and it makes the obtained reply prompt content more accurate and intelligent.
  • the scene entry includes:
  • the scene entry of this embodiment is not limited to including only the above-mentioned scene entries, and is not limited to including all of the above scene entries, and may be selected by the user or selected according to needs and system design complexity and design precision.
  • collecting the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence in the scene entry includes:
  • the region of interest of each scene training image is extracted by using a Difference of Gaussian (DOG) operator, and the Scale-Invariant Feature Transform (SIFT) feature of the region of interest of the scene training image is calculated;
  • the K-means clustering algorithm is used to cluster the SIFT features of the regions of interest of the scene training images to obtain multiple cluster centers, and a visual word dictionary is established in which each cluster center corresponds to one visual word;
  • the DOG operator is used to extract the region of interest of the scene image, and the visual word closest to the SIFT feature of the region of interest of the scene image is matched in the visual word dictionary;
  • the scene image is classified according to the distribution of the visual words of the region of interest of the scene image by using a pre-trained support vector machine classifier, and the content information of the scene image item of the communication terminal that transmits or receives the chat initiation sentence is obtained.
  • the SIFT feature in this embodiment is the scale-invariant feature transform, which finds extreme points in scale space and extracts their position, scale, and rotation invariants.
  • the visual word closest to the SIFT feature of a region of interest of the scene image is matched in the visual word dictionary as follows: a similarity is calculated between the SIFT feature of each region of interest of the scene image and the SIFT feature of the cluster center corresponding to each visual word in the visual word dictionary. When the similarity between a region of interest and a visual word is greater than a preset threshold, that visual word is considered the closest visual word to the region of interest.
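  • The thresholded matching described above might look like the sketch below. The source does not name the similarity measure, so cosine similarity is an assumption here, as are the function names, the threshold value of 0.8, and the toy three-dimensional vectors standing in for SIFT descriptors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_visual_word(feature, dictionary, threshold=0.8):
    """Return the most similar visual word above `threshold`, or None.

    `dictionary` maps each visual word to the feature vector of its cluster center.
    """
    best_word, best_similarity = None, threshold
    for word, center in dictionary.items():
        similarity = cosine_similarity(feature, center)
        if similarity > best_similarity:
            best_word, best_similarity = word, similarity
    return best_word


# Toy dictionary; real SIFT descriptors have 128 dimensions.
visual_words = {"sky": [1.0, 0.0, 0.0], "bridge": [0.0, 1.0, 0.0]}
matched = closest_visual_word([0.9, 0.1, 0.0], visual_words)
unmatched = closest_visual_word([1.0, 1.0, 1.0], visual_words)
```

  • Keeping the best candidate above the threshold, rather than the first one, avoids an arbitrary result when several visual words exceed the threshold.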
  • FIG. 2 is a schematic diagram of the scene image collected when the communication terminal receives a chat initiation sentence according to an embodiment. The SIFT features of five regions of interest of the scene image are extracted, and the similarity between the SIFT feature of each region of interest and the visual word dictionary is calculated, thereby obtaining the visual words closest to the five regions of interest, namely "sky", "flag", "building", "lion", and "bridge".
  • the content information of the scene image item of the communication terminal receiving the chat initiation sentence is then obtained by applying the pre-trained support vector machine to the visual words closest to these five regions of interest.
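  • The "distribution of visual words" that is passed to the classifier can be pictured as a normalized histogram over the dictionary. The sketch below only builds that histogram; the vocabulary order and the function name are assumptions, and the support vector machine itself is not shown.

```python
from collections import Counter


def visual_word_distribution(words, vocabulary):
    """Return the relative frequency of each vocabulary word among `words`."""
    counts = Counter(words)
    total = sum(counts[word] for word in vocabulary)
    return [counts[word] / total if total else 0.0 for word in vocabulary]


# Hypothetical vocabulary; the first five entries are the words from FIG. 2.
vocabulary = ["sky", "flag", "building", "lion", "bridge", "road sign"]
regions = ["sky", "flag", "building", "lion", "bridge"]
distribution = visual_word_distribution(regions, vocabulary)
```

  • A fixed-length vector of this kind is what a pre-trained classifier would take as input, regardless of how many regions of interest an image contains.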
  • the content information of the scene image item of the communication terminal is obtained by collecting the scene image of the communication terminal, so that the scene information derived from the scene image is closer to the real scene. The reply prompt content obtained for the chat initiation sentence with this scene information is therefore more personalized and better suited to the communication context.
  • the communication scenario for the first embodiment is as follows: communication terminal A sends the chat initiation sentence "Do you want to go shopping together?" in text format to communication terminal B. Referring to FIG. 4, the method by which communication terminal B obtains the reply prompt content of the chat initiation sentence includes:
  • Step S201 setting a scene entry associated with the preset topic, and the scene options corresponding to the scene entry.
  • the scene entries associated with the "shopping" topic among the preset topics include a relationship entry between the communication terminals that send and receive the chat initiation sentence, a distance entry, and a weather entry of the communication terminal that receives the chat initiation sentence. There are 6 scene options corresponding to the relationship entry, 3 corresponding to the distance entry, and 5 corresponding to the weather entry; refer to Table 2.
  • Step S202 creating a sample chat pair with the preset topic as the chat topic, and using the sample chat pair as the topic database corresponding to the preset topic, the sample chat pair including a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.
  • this embodiment creates sample chat pairs corresponding to the "shopping" topic and sets a user-defined number of sample reply sentence combinations for each sample initiation sentence in each sample chat pair. For example, for the sample initiation sentence "Do you want to go shopping together?", sample reply sentences are set for all combinations of scene options (6 × 3 × 5 = 90 types), whereas for the sample initiation sentence "How do you translate in English?", a sample reply sentence is set only for the scene option combination with code 0+0+0. The created sample chat pairs are then used as the topic database corresponding to the "shopping" topic.
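  • The count of 90 combinations follows from 6 × 3 × 5 and can be enumerated directly. In the sketch below the option codes are assumed to be numbered from 0 within each scene entry, which is consistent with the combination code 0+0+0 but is not confirmed by Table 2.

```python
from itertools import product

# Assumed code ranges: 6 relationship, 3 distance and 5 weather options.
relationship_codes = range(6)
distance_codes = range(3)
weather_codes = range(5)

# Each combination code joins one option per scene entry, e.g. "2+1+1".
combination_codes = [
    "+".join(str(code) for code in combination)
    for combination in product(relationship_codes, distance_codes, weather_codes)
]
```

  • A sample initiation sentence with replies for every combination would carry one entry per element of `combination_codes`, while a sentence such as the translation question would carry a single entry.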
  • Step S203 Acquire the preceding chat content of the chat initiation sentence, and merge the chat initiation sentence and its preceding chat content into a merged text in a text format.
  • in this embodiment the preceding chat content of the chat initiation sentence comprises four messages, for example: communication terminal A: "Busy?" / communication terminal B: "OK." / communication terminal A: "Recently I feel that I have no suitable clothing to wear!" / communication terminal B: "Right." The merged text obtained is then {Busy? / OK. / Recently I feel that I have no suitable clothing to wear! / Right. / Do you want to go shopping together?}.
  • Step S204 extracting keywords of the merged text.
  • the extracted word segmentation text includes {"busy", "clothing", "wearing", "shopping"}. This embodiment performs word frequency statistics in a weighted manner and selects the word segmentation text with the largest weighted word frequency as the keyword of the merged text.
  • the weighting factor decreases as the time interval between a piece of chat content in the merged text and the chat initiation sentence increases, and its specific value is customized by the user as needed.
  • since each word segmentation text extracted in this embodiment has a word frequency of one, calculating the weighted word frequency of each word segmentation text yields "shopping" as the keyword.
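  • A minimal sketch of weighted word frequency statistics follows. It assumes the messages have already been segmented into words, and it uses a geometric decay per message as the weighting factor; the decay value of 0.5 and the function name are illustrative choices, since the source leaves the weights to the user.

```python
from collections import defaultdict


def extract_keyword(messages, decay=0.5):
    """Pick the word with the largest weighted frequency.

    `messages` is a non-empty list of word lists, oldest first; the last item
    is the chat initiation sentence and receives weight 1.0.
    """
    scores = defaultdict(float)
    last_index = len(messages) - 1
    for index, words in enumerate(messages):
        weight = decay ** (last_index - index)  # older messages count less
        for word in words:
            scores[word] += weight
    return max(scores, key=scores.get)


# Segmented words per message of the first embodiment (stop words removed).
segmented = [["busy"], [], ["clothing", "wearing"], [], ["shopping"]]
keyword = extract_keyword(segmented)
```

  • Because every word occurs once, the weights alone decide the result, and the word from the chat initiation sentence wins.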
  • Step S205 determining a topic classification to which the chat initiation sentence belongs according to the keyword.
  • the content corresponding to the keyword is used as the topic classification to which the chat initiation sentence belongs, that is, the topic to which the chat initiation sentence belongs in this embodiment is classified as “shopping”.
  • Step S206 Acquire a topic database corresponding to the preset topic that is the same as the topic classification.
  • Step S207 Collect content information of a scene item associated with the topic classification, and obtain scene information.
  • Step S208 matching the same sample initiation sentence as the chat initiation sentence in the topic database, and acquiring the first semantic matching result according to the scenario information.
  • the scene information is first encoded to obtain an identification ID. Referring to Table 2, the identification ID obtained is "2+1+1". The sample initiation sentence identical to the chat initiation sentence is then matched in the topic database corresponding to the preset topic that is the same as the topic classification, the combination code identical to the identification ID is looked up among the scene option combinations of that sample initiation sentence, and the sample reply sentences corresponding to that combination code are used as the reply prompt content of the chat initiation sentence.
  • in this embodiment, for the sample initiation sentence "Do you want to go shopping together?", three sample reply sentences correspond to the identification ID "2+1+1": "Dear, I am too far away from you, can we make an appointment next time?", "The weather is good today, I was just thinking of going out for a stroll.", and "Although I am far away from you, the weather is good today, so I still want to go out for a stroll."
  • the system will provide the three reply prompts for the user to select.
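  • The lookup in step S208 can be sketched as a two-level dictionary access: first by sample initiation sentence, then by combination code. The data layout and the name `first_semantic_match` are assumptions made for illustration, and only two of the three reply sentences are included to keep the example short.

```python
def first_semantic_match(topic_db, chat_sentence, scene_codes):
    """Return the sample replies for an identical sentence and scene ID, or None."""
    replies_by_code = topic_db.get(chat_sentence)
    if replies_by_code is None:
        return None  # no identical sample initiation sentence
    scene_id = "+".join(str(code) for code in scene_codes)
    return replies_by_code.get(scene_id)


# Hypothetical excerpt of the topic database for the "shopping" topic.
shopping_db = {
    "Do you want to go shopping together?": {
        "2+1+1": [
            "Dear, I am too far away from you, can we make an appointment next time?",
            "The weather is good today, I was just thinking of going out for a stroll.",
        ],
    },
}

prompts = first_semantic_match(
    shopping_db, "Do you want to go shopping together?", (2, 1, 1)
)
```

  • A return value of `None` corresponds to the case in which no first semantic matching result is obtained.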
  • the present embodiment obtains the reply prompt content through the topic database without manual editing, which improves the chat session speed and improves the user experience.
  • the communication scenario for the second embodiment is as follows: communication terminal A sends the chat initiation sentence "Busy?" in text format to communication terminal B.
  • the method by which communication terminal B obtains the reply prompt content of the chat initiation sentence includes:
  • Step S301 setting a scene item associated with the preset topic, and a scene option corresponding to the scene item.
  • the number of preset topics in this embodiment is sufficient, and the scene entries associated with the "greeting" topic among the preset topics include only the scene image entry of the communication terminal that receives the chat initiation sentence, together with the scene options corresponding to the scene image entry.
  • Step S302 creating a sample chat pair with the preset topic as the chat topic, and using the sample chat pair as the topic database corresponding to the preset topic, the sample chat pair including a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.
  • a sample reply sentence corresponding to the scene options is set for each sample initiation sentence in each sample chat pair. For example, for the sample initiation sentence "Where are you?", sample reply sentences may be set for all scene options (9 types) or only for some of them.
  • Step S303 Acquire a topic classification to which the chat initiation sentence received by the communication terminal belongs.
  • in this embodiment the preceding chat content of the chat initiation sentence is empty, and since the chat initiation sentence itself is simple, the topic classification to which it belongs is "greeting".
  • Step S304 acquiring a topic database corresponding to the preset topic that is the same as the topic classification.
  • Step S305 collecting content information of the scene item associated with the topic classification, and obtaining scene information.
  • Step S306 collecting a scene image of the communication terminal that receives the chat initiation sentence.
  • Step S307 extracting the region of interest of the scene image, and matching the visual word closest to the SIFT feature of the region of interest of the scene image in the visual word dictionary.
  • the DOG (Difference of Gaussian) operator is first used to extract the regions of interest of the scene image, the SIFT feature of each region of interest is then calculated, and the visual word closest to the SIFT feature of each region of interest is matched in the visual word dictionary. It is assumed that after matching, the visual words corresponding to the three regions of interest are "road sign", "lane", and "distance sign".
  • Step S308 classifying the scene image according to the distribution of the visual words of the region of interest of the scene image by using a pre-trained support vector machine classifier to obtain content information of the scene item.
  • this embodiment uses a support vector machine to design the classifier. In the training phase the classifier is trained on images of 9 known scene classes (airport, ocean, forest, village, street, tree, tall building, highway, office), each class comprising 100 different training sample images.
  • with this classifier, the content information of the scene image entry of the communication terminal receiving the chat initiation sentence is obtained as "highway".
  • Step S309 matching the same sample initiation sentence as the chat initiation sentence in the topic database, and acquiring the first semantic matching result according to the scenario information.
  • in the topic database corresponding to the preset topic "greeting" in this embodiment, the sample initiation sentence "Busy?" has several sample reply sentences under the scene option "highway", for example "I am on the highway and it is inconvenient to reply. I will contact you once I am off the highway." and "Sorry, it is not convenient to reply now, I will contact you later." These reply prompts are then displayed on the communication terminal for the user to select.
  • the scene image of the communication terminal that receives the chat initiation sentence is collected, and the content information of the scene entry is obtained from the collected scene image, so that the scene information obtained from this content information is closer to the real scene. The reply prompt content matched to the chat initiation sentence with this scene information is therefore highly intelligent and personalized.
  • the communication scenario for the third embodiment is as follows: communication terminal A sends the chat initiation sentence "What is the price of the apple?" in text format to communication terminal B. For this chat initiation sentence, no first semantic matching result can be obtained from the established topic database.
  • the method by which communication terminal B obtains the reply prompt content of the chat initiation sentence includes:
  • Step S401 Acquire a topic classification to which the chat initiation sentence received by the communication terminal belongs.
  • the topic classification to which the chat initiation sentence belongs can be obtained with the method used in the first embodiment, so it is not described again here. It is assumed that the topic classification acquired in this embodiment is "Apple mobile phone".
  • Step S402 performing data collection on user network data of the communication terminal based on the distributed cloud computing manner.
  • the collection of the user network data in the embodiment is implemented by using a network crawler, and the collected network data is stored by the distributed storage device, where the distributed storage device is implemented based on HDFS.
  • Step S403 preprocessing the user network data to obtain pre-processed text, and the pre-processing includes word segmentation processing, semantic disambiguation processing, part-of-speech tagging processing, removal of stop word processing, punctuation symbol processing, and expression character processing.
  • a stop word dictionary is first established, and then the words whose frequency of occurrence is high but have no practical meaning, such as "", "", "?", etc., are deleted.
  • part-of-speech tagging and part-of-speech filtering can also be applied to the collected user network data.
  • part-of-speech filtering builds on the part-of-speech tags and treats different word categories differently. Experiments show that adjectives and adverbs contribute little to the clustering effect, so they are removed, leaving only nouns, verbs, and acronyms.
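  • The stop word removal and part-of-speech filtering can be combined in one pass, as in the sketch below. It assumes that word segmentation and tagging have already produced (word, tag) pairs; the tag names and the stop word list are illustrative, not taken from the source.

```python
# Assumed tag names for the categories the text says are kept.
KEPT_TAGS = {"noun", "verb", "acronym"}


def filter_tokens(tagged_tokens, stop_words):
    """Drop stop words and keep only nouns, verbs and acronyms."""
    return [
        word
        for word, tag in tagged_tokens
        if word not in stop_words and tag in KEPT_TAGS
    ]


tagged = [
    ("the", "article"),
    ("new", "adjective"),
    ("phone", "noun"),
    ("costs", "verb"),
    ("really", "adverb"),
    ("USD", "acronym"),
    ("?", "punctuation"),
]
kept = filter_tokens(tagged, stop_words={"the", "?"})
```

  • Filtering before clustering shrinks the vocabulary, which is the motivation the text gives for removing adjectives and adverbs.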
  • Step S404 performing text clustering on the pre-processed text by using a K-means clustering algorithm to obtain a text clustering center.
  • the K-means clustering algorithm is used to perform text clustering on the preprocessed text to obtain a text clustering center, which may include the following steps:
  • K data may be either a word or a sentence.
  • the distance between each sample and the center point is obtained by calculating the distance between the word vector corresponding to each sample and the word vector corresponding to the center point.
  • two text clustering centers are obtained in this embodiment, namely "Apple-Mobile Phone" and "Fruit-Apple".
  • Step S405 extracting a keyword of the text clustering center as a clustering topic corresponding to the text clustering center.
  • two clustering topics are accordingly obtained in this embodiment, namely "Apple-Mobile Phone" and "Fruit-Apple".
  • Step S406 Acquire a clustering topic that is closest to the topic classification to which the chat initiation sentence belongs.
  • to obtain the clustering topic closest to the topic classification to which the chat initiation sentence belongs, this embodiment may calculate the similarity between the topic classification and each clustering topic, or may use a preset association between topic classifications and clustering topics.
  • by calculating the similarity between the topic classification to which the chat initiation sentence belongs ("Apple mobile phone") and each clustering topic, it is easy to determine that the closest clustering topic is "Apple-Mobile Phone".
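  • One simple way to realise the similarity comparison is token overlap. The sketch below uses Jaccard similarity over lower-cased word tokens; this measure is an assumption chosen for illustration, since the source does not specify how the similarity is calculated.

```python
import re


def tokens(text):
    """Split a topic label into a set of lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_cluster_topic(topic_classification, cluster_topics):
    """Return the clustering topic most similar to the topic classification."""
    target = tokens(topic_classification)
    return max(cluster_topics, key=lambda topic: jaccard(target, tokens(topic)))


cluster_topics = ["Apple-Mobile Phone", "Fruit-Apple"]
closest = closest_cluster_topic("Apple mobile phone", cluster_topics)
```

  • Here "Apple-Mobile Phone" shares all three tokens with the topic classification, while "Fruit-Apple" shares only one of four, so the former is selected.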
  • Step S407 Match the chat initiation sentence in the user network data corresponding to the clustering topic to obtain a second semantic matching result.
  • by matching the chat initiation sentence ("What is the price of the apple?") in the user network data corresponding to this clustering topic, it is easy to obtain reply prompt content about the price of the Apple mobile phone rather than the price of the fruit.
  • this embodiment uses the K-means clustering algorithm to extract the clustering topics of the user network data and matches the chat initiation sentence only in the user network data corresponding to the clustering topic closest to the chat initiation sentence. This avoids the time needed to match the chat initiation sentence against the full volume of user network data, which improves the speed and efficiency of obtaining the reply prompt content, and it makes the obtained reply prompt content more accurate and intelligent.
  • an apparatus for obtaining a reply prompt content of a chat initiation sentence includes:
  • the topic database creation device 10 is configured to establish a topic database corresponding to the preset topic
  • the topic classification obtaining means 20 is configured to acquire a topic classification to which the chat initiation sentence received by the communication terminal belongs;
  • the first semantic matching device 30 is configured to perform semantic matching on the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, obtain a first semantic matching result, and use the first semantic matching result as the reply prompt content of the chat initiation sentence;
  • the second semantic matching device 40 is configured to, if the first semantic matching result is not obtained, collect user network data of the communication terminal based on a distributed cloud computing manner, perform semantic matching on the chat initiation sentence by using the user network data to obtain a second semantic matching result, and use the second semantic matching result as the reply prompt content of the chat initiation sentence.
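  • The division of work between the first and second semantic matching devices amounts to a fallback: try the topic database, and consult the user network data only when that yields nothing. A minimal sketch, in which both matchers are passed in as hypothetical callables:

```python
def get_reply_prompts(chat_sentence, match_topic_db, match_network_data):
    """Return the first semantic matching result, else fall back to the second.

    Both matchers are callables that take the chat initiation sentence and
    return a list of reply prompts, or None / an empty list when nothing matches.
    """
    first_result = match_topic_db(chat_sentence)
    if first_result:
        return first_result
    return match_network_data(chat_sentence)


# Stand-in matchers for illustration only.
def topic_db_matcher(sentence):
    known = {"Busy?": ["Sorry, it is not convenient to reply now."]}
    return known.get(sentence)


def network_matcher(sentence):
    return ["(result matched from user network data)"]


from_topic_db = get_reply_prompts("Busy?", topic_db_matcher, network_matcher)
from_network = get_reply_prompts(
    "What is the price of the apple?", topic_db_matcher, network_matcher
)
```

  • Passing the matchers as parameters keeps the fallback logic independent of how the topic database or the network data collection is implemented.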
  • the topic database creation device 10 includes:
  • a setting device configured to set a scene item associated with the preset topic and a scene option corresponding to the scene item
  • the sample chat pair creation device is configured to create a sample chat pair with the preset topic as the chat topic and to use the sample chat pair as the topic database corresponding to the preset topic, the sample chat pair including a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.
  • the topic classification obtaining device 20 includes:
  • a device configured to acquire the preceding chat content of the chat initiation sentence and to merge the chat initiation sentence and its preceding chat content into a merged text in a text format;
  • a keyword extracting device configured to extract keywords of the merged text
  • the topic classification determining means is configured to acquire a topic classification to which the chat initiation sentence belongs according to the keyword.
  • the device for obtaining the reply prompt content of the chat initiation sentence obtains the topic classification to which the chat initiation sentence received by the communication terminal belongs and uses the customized topic database corresponding to the preset topic that is the same as the topic classification to semantically match the chat initiation sentence, obtaining a first semantic matching result. If no first semantic matching result is obtained, it collects the user network data of the communication terminal and uses that data to semantically match the chat initiation sentence, obtaining a second semantic matching result. This solves the problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low degree of intelligence of the chat and a poor user experience. By making full use of the user network data of the communication terminal to obtain the reply prompt content of the chat initiation sentence, the device improves the accuracy of obtaining the reply prompt content, achieves a higher level of intelligence, and improves the user experience.
  • for the working process and working principle of the device for obtaining the reply prompt content of the chat initiation sentence, refer to the working process and working principle of the method for obtaining the reply prompt content of the chat initiation sentence.
  • the communication terminal device in the embodiment of the present invention may be a desktop computer, a tablet computer, a personal digital assistant, a mobile phone, a television, an on-board computer, a wearable communication device, or the like.
  • the present invention obtains the topic classification of the chat initiation sentence received by the communication terminal and uses the customized topic database corresponding to the preset topic that is the same as the topic classification to semantically match the chat initiation sentence.
  • this solves the technical problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low degree of intelligence of the chat and a poor user experience.
  • the invention also makes full use of the user network data of the communication terminal to obtain the reply prompt content of the chat initiation sentence, which improves the accuracy of obtaining the reply prompt content, reflects a higher level of intelligence, and improves the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Disclosed are a method and apparatus for obtaining reply prompt content for a chat start sentence. The method creates topic databases corresponding to preset topics and obtains the topic classification to which a chat start sentence received by a communication terminal belongs. It performs semantic matching on the chat start sentence using the topic database corresponding to the preset topic that is the same as the topic classification, to obtain a first semantic match result. If the first semantic match result is not obtained, it collects user network data of the communication terminal using a distributed cloud computing method and performs semantic matching on the chat start sentence using the user network data, to obtain a second semantic match result. This resolves the technical problems of a low degree of intelligence and poor user experience that arise because the conventional database match method cannot be certain to obtain chat reply prompt content matching a chat start sentence. The accuracy of obtaining reply prompt content is improved, a relatively high degree of intelligence is achieved, and user experience is improved.

Description

Method and device for obtaining reply prompt content of a chat initiation sentence

Technical field

The present invention relates to the field of communications technologies, and in particular to a method and device for obtaining the reply prompt content of a chat initiation sentence.

Background art

At present, whether it is the intelligent chat reply prompt content provided by an intelligent chat robot system, or the chat reply prompt content that a communication terminal offers to both communicating parties for selection or intelligent reply, most of it is obtained through database matching. Obtaining chat reply prompt content by database matching consists of two main steps: the chat initiation sentence is first preprocessed to obtain word segmentation text, and the word segmentation text is then matched against a pre-established database to obtain the chat reply prompt content.

However, this traditional database matching method does not necessarily obtain chat reply prompt content that matches the chat initiation sentence, which results in a low degree of intelligence of the chat and a poor user experience.

Summary of the invention

The present invention provides a method and device for obtaining the reply prompt content of a chat initiation sentence, to solve the technical problem that the traditional database matching method does not necessarily obtain chat reply prompt content matching the chat initiation sentence, which results in a low degree of intelligence of the chat and a poor user experience.

According to one aspect of the embodiments of the present invention, a method for obtaining the reply prompt content of a chat initiation sentence is provided, including:

establishing a topic database corresponding to a preset topic;

obtaining the topic classification to which the chat initiation sentence received by the communication terminal belongs;

semantically matching the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, obtaining a first semantic matching result, and using the first semantic matching result as the reply prompt content of the chat initiation sentence;

if the first semantic matching result is not obtained, collecting user network data of the communication terminal based on a distributed cloud computing manner, semantically matching the chat initiation sentence by using the user network data to obtain a second semantic matching result, and using the second semantic matching result as the reply prompt content of the chat initiation sentence.

Optionally, establishing a topic database corresponding to the preset topic includes:

setting a scene entry associated with the preset topic, and scene options corresponding to the scene entry;

creating a sample chat pair with the preset topic as the chat topic, and using the sample chat pair as the topic database corresponding to the preset topic, the sample chat pair including a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.

Optionally, obtaining the topic classification to which the chat initiation sentence received by the communication terminal belongs includes:

obtaining the preceding chat content of the chat initiation sentence, and merging the chat initiation sentence and its preceding chat content into a merged text in a text format;

extracting keywords of the merged text;

obtaining, according to the keywords, the topic classification to which the chat initiation sentence belongs.

Optionally, semantically matching the chat initiation sentence against the topic database corresponding to the preset topic that is the same as the topic classification, and obtaining the first semantic matching result, includes:

obtaining the topic database corresponding to the preset topic that is the same as the topic classification;

collecting the content information of the scene entries associated with the topic classification to obtain scene information;

matching, in the topic database, the sample initiation sentence that is the same as the chat initiation sentence, and obtaining the first semantic matching result according to the scene information.

Optionally, semantically matching the chat initiation sentence against the user network data, and obtaining the second semantic matching result, includes:

preprocessing the user network data to obtain preprocessed text, where the preprocessing includes word segmentation, semantic disambiguation, part-of-speech tagging, stop word removal, punctuation handling, and emoticon character handling;

clustering the preprocessed text with the K-means clustering algorithm to obtain text cluster centers;

extracting the keywords of each text cluster center as the cluster topic corresponding to that text cluster center;

obtaining the cluster topic closest to the topic classification to which the chat initiation sentence belongs;

matching the chat initiation sentence within the user network data corresponding to that cluster topic to obtain the second semantic matching result.
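The clustering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the documents are assumed to be already preprocessed into token lists, the vectors are plain word counts, the K-means initialization simply takes the first `k` documents, and "closest cluster topic" is reduced to keyword membership. The function names (`vectorize`, `kmeans`, `cluster_keywords`, `closest_cluster`) and the sample data are all hypothetical.

```python
def vectorize(docs):
    """Turn token lists into word-count vectors over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc})
    return vocab, [[doc.count(word) for word in vocab] for doc in docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain K-means; the first k vectors seed the centers (an assumption)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def cluster_keywords(center, vocab, top_n=2):
    """Use the heaviest terms of a cluster center as its cluster topic."""
    order = sorted(range(len(vocab)), key=lambda i: -center[i])
    return [vocab[i] for i in order[:top_n]]


def closest_cluster(topic, cluster_topics):
    """Simplification: the first cluster whose keywords contain the topic."""
    for index, keywords in enumerate(cluster_topics):
        if topic in keywords:
            return index
    return None


# Hypothetical preprocessed user network data (already tokenized).
docs = [
    ["beach", "trip", "weekend"],
    ["email", "report", "attachment"],
    ["trip", "hotel", "beach"],
    ["email", "reply", "report"],
]
vocab, vectors = vectorize(docs)
centers, labels = kmeans(vectors, k=2)
cluster_topics = [cluster_keywords(center, vocab) for center in centers]
```

With this toy data the two travel documents and the two email documents end up in separate clusters, and a topic classification of `"trip"` selects the travel cluster, whose documents would then be searched for a reply to the chat initiation sentence.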

Optionally, the scene entries include:

a relationship entry for the communication terminals sending and receiving the chat initiation sentence, a name entry, a gender entry, an age entry, an instant messaging account entry, an email address entry, a home address entry, an occupation category entry, a job title entry, an employer entry, an employer address entry, a bank account entry, a friend impression entry, a hobbies and interests entry, a friend circle status entry, a mood entry, a recently followed topic entry, a current communication status entry, a scene image entry, a time entry, a holiday entry, a season entry, a geographic location information entry, a distance entry, a communication frequency entry, a communication count entry, a communication duration entry, and an entry for the selection mode used to initiate historical communications, where the selection modes include initiating communication from the address book, from the call history, from the SMS module, and from the dial pad.

Optionally, collecting the content information of the scene image entry, among the scene entries, of the communication terminal that sends or receives the chat initiation sentence includes:

collecting a scene image of the communication terminal that sends or receives the chat initiation sentence;

extracting regions of interest from scene training images with the difference-of-Gaussians (DoG) operator, and computing the scale-invariant feature transform (SIFT) features of those regions of interest;

clustering the SIFT features of the regions of interest of the scene training images with the K-means clustering algorithm to obtain multiple cluster centers, and building a visual word dictionary composed of the visual words corresponding to each cluster center;

extracting regions of interest from the scene image with the DoG operator, and matching, in the visual word dictionary, the visual word closest to the SIFT features of each region of interest of the scene image;

classifying the scene image with a pre-trained support vector machine (SVM) classifier according to the distribution of the visual words of its regions of interest, to obtain the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence.
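The "distribution of visual words" that feeds the classifier can be illustrated as a normalized histogram. The sketch below covers only that step, under stated assumptions: the descriptors and the dictionary are toy two-dimensional stand-ins (real SIFT descriptors have 128 dimensions), DoG/SIFT extraction and the SVM are left out, and the function names `nearest_word` and `visual_word_histogram` are hypothetical.

```python
def nearest_word(descriptor, dictionary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(
        range(len(dictionary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, dictionary[i])),
    )


def visual_word_histogram(descriptors, dictionary):
    """Normalized frequency of each visual word among an image's descriptors."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty image
    return [count / total for count in counts]


# Toy visual word dictionary (cluster centers) and image descriptors.
dictionary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.1]]

histogram = visual_word_histogram(descriptors, dictionary)
print(histogram)  # [0.5, 0.5, 0.0]
```

The resulting histogram is the fixed-length feature vector that a pre-trained SVM would take as input to label the scene.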

According to another aspect of the embodiments of the present invention, an apparatus for obtaining reply prompt content for a chat initiation sentence is provided, including:

a topic database creation device, configured to establish a topic database corresponding to a preset topic;

a topic classification obtaining device, configured to obtain the topic classification to which a chat initiation sentence received by a communication terminal belongs;

a first semantic matching device, configured to semantically match the chat initiation sentence against the topic database corresponding to the preset topic that is the same as the topic classification, obtain a first semantic matching result, and use the first semantic matching result as the reply prompt content for the chat initiation sentence;

a second semantic matching device, configured to, if no first semantic matching result is obtained, collect user network data of the communication terminal by means of distributed cloud computing, semantically match the chat initiation sentence against the user network data, obtain a second semantic matching result, and use the second semantic matching result as the reply prompt content for the chat initiation sentence.

Optionally, the topic database creation device includes:

a setting device, configured to set scene entries associated with the preset topic, and scene options corresponding to the scene entries;

a sample chat pair creation device, configured to create sample chat pairs whose chat subject is the preset topic, and to use the sample chat pairs as the topic database corresponding to the preset topic, where each sample chat pair includes a sample initiation sentence and sample reply sentences that correspond to the sample initiation sentence and are set according to the scene options.

Optionally, the topic classification obtaining device includes:

a merged text obtaining device, configured to obtain the chat content preceding the chat initiation sentence, and to merge the chat initiation sentence and its preceding chat content into merged text in text format;

a keyword extraction device, configured to extract keywords from the merged text;

a topic classification determining device, configured to obtain the topic classification to which the chat initiation sentence belongs according to the keywords.

Embodiments of the present invention have the following beneficial effects:

In the method and apparatus for obtaining reply prompt content for a chat initiation sentence provided by the embodiments of the present invention, the method obtains the topic classification to which the chat initiation sentence received by the communication terminal belongs, and semantically matches the chat initiation sentence against a user-defined topic database corresponding to the preset topic that is the same as the topic classification, to obtain a first semantic matching result. If no first semantic matching result is obtained, the method collects the user network data of the communication terminal and semantically matches the chat initiation sentence against that data to obtain a second semantic matching result. This addresses the technical problem that conventional database matching cannot always obtain chat reply prompt content matching the chat initiation sentence, which leaves chat with a low degree of intelligence and a poor user experience. By making full use of the user network data of the communication terminal, the method improves the accuracy with which reply prompt content is obtained, reflects a higher level of intelligence, and improves the user experience.

In addition to the objects, features and advantages described above, embodiments of the present invention have other objects, features and advantages. The invention is described in further detail below with reference to the drawings.

Brief Description of the Drawings

The accompanying drawings, which form a part of this application, provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of a method for obtaining reply prompt content for a chat initiation sentence according to an optional embodiment of the present invention;

FIG. 2 is a scene image assumed to be collected from the communication terminal receiving the chat initiation sentence, according to an optional embodiment of the present invention;

FIG. 3 is a diagram of the visual words obtained by matching the assumed scene image of the communication terminal receiving the chat initiation sentence against the visual word dictionary, according to an optional embodiment of the present invention;

FIG. 4 is a flowchart of a method for obtaining reply prompt content for a chat initiation sentence in a first simplified embodiment, according to an optional embodiment of the present invention;

FIG. 5 is a flowchart of a method for obtaining reply prompt content for a chat initiation sentence in a second simplified embodiment, according to an optional embodiment of the present invention;

FIG. 6 is a flowchart of a method for obtaining reply prompt content for a chat initiation sentence in a third simplified embodiment, according to an optional embodiment of the present invention;

FIG. 7 is a structural block diagram of an apparatus for obtaining reply prompt content for a chat initiation sentence according to an optional embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the invention may be implemented in many different ways as defined and covered by the claims.

Referring to FIG. 1, an optional embodiment of the present invention provides a method for obtaining reply prompt content for a chat initiation sentence, including:

Step S101: establishing a topic database corresponding to a preset topic;

Step S102: obtaining the topic classification to which a chat initiation sentence received by a communication terminal belongs;

Step S103: semantically matching the chat initiation sentence against the topic database corresponding to the preset topic that is the same as the topic classification, obtaining a first semantic matching result, and using the first semantic matching result as the reply prompt content for the chat initiation sentence;

Step S104: if no first semantic matching result is obtained, collecting user network data of the communication terminal by means of distributed cloud computing, semantically matching the chat initiation sentence against the user network data, obtaining a second semantic matching result, and using the second semantic matching result as the reply prompt content for the chat initiation sentence.
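The control flow of steps S102 to S104 amounts to a two-stage lookup with a fallback. The Python sketch below shows only that flow; the classifier and both matchers are passed in as plain functions, and every name and sample sentence in it (`get_reply_prompt`, `classify`, `first_stage`, `second_stage`, the toy dictionaries) is a hypothetical stand-in rather than anything defined by the patent.

```python
from typing import Callable, Optional

# Hypothetical matcher type: takes the chat initiation sentence and its topic
# classification, and returns a reply prompt or None when nothing matches.
Matcher = Callable[[str, str], Optional[str]]


def get_reply_prompt(
    sentence: str,
    classify_topic: Callable[[str], str],
    match_topic_db: Matcher,
    match_user_network_data: Matcher,
) -> Optional[str]:
    """Try the topic database first, then fall back to user network data."""
    topic = classify_topic(sentence)                 # step S102
    first = match_topic_db(sentence, topic)          # step S103
    if first is not None:
        return first
    return match_user_network_data(sentence, topic)  # step S104


# Toy stand-ins for the three components (illustrative only).
topic_db = {("travel", "Shall we go travelling together?"): "Sure, where to?"}
network_data = {"Did you get my email?": "Yes, I will reply tonight."}


def classify(sentence: str) -> str:
    return "travel" if "travel" in sentence else "email"


def first_stage(sentence: str, topic: str) -> Optional[str]:
    return topic_db.get((topic, sentence))


def second_stage(sentence: str, topic: str) -> Optional[str]:
    return network_data.get(sentence)
```

Calling `get_reply_prompt("Did you get my email?", classify, first_stage, second_stage)` misses in the topic database and falls through to the second stage, which mirrors the condition in step S104.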

The method for obtaining reply prompt content for a chat initiation sentence provided by this embodiment obtains the topic classification to which the chat initiation sentence received by the communication terminal belongs, and semantically matches the chat initiation sentence against a user-defined topic database corresponding to the preset topic that is the same as the topic classification, to obtain a first semantic matching result. If no first semantic matching result is obtained, the method collects the user network data of the communication terminal and semantically matches the chat initiation sentence against that data to obtain a second semantic matching result. This addresses the technical problem that conventional database matching cannot always obtain chat reply prompt content matching the chat initiation sentence, which leaves chat with a low degree of intelligence and a poor user experience. By making full use of the user network data of the communication terminal, the method improves the accuracy with which reply prompt content is obtained, reflects a higher level of intelligence, and improves the user experience.

In this embodiment, the user network data of the communication terminal includes personal information data, social information data (Weibo, WeChat, forums, blogs and so on), communication information data, online shopping information data, web browsing footprint data, and the like. The communication information in turn includes the user's own historical communication records, the historical communication records of other users of the same communication application, and communication records provided by third-party applications. Optionally, the communication records include call records and message records; the message records include mobile phone SMS records and instant messaging message records, and the call records include mobile phone call records and instant messaging voice and video call records. It should be noted that, because this embodiment obtains reply prompt content matching the chat initiation sentence mainly from user network data, it mainly collects network chat data with contextual interaction from the user network data of the communication terminal, for example WeChat and QQ instant messaging chat records, chat data with Taobao customer service, Baidu Q&A, and interactions or chat data in Weibo private messages. In addition, in an actual implementation, this embodiment may semantically match the chat initiation sentence either against the topic database corresponding to the preset topic that is the same as the topic classification, or against the topic database corresponding to the preset topic closest to the topic classification.

This embodiment processes big data on a Hadoop-based platform. Hadoop is an open-source distributed computing platform whose core includes the Hadoop Distributed File System (HDFS). The advantages of HDFS (mainly high fault tolerance and high scalability) allow users to deploy Hadoop on inexpensive hardware and build distributed clusters that form a distributed system. The Hadoop database (HBase) is a distributed database system built on top of HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes; it is mainly used to store unstructured and semi-structured loosely organized data. This embodiment stores the collected network data in a distributed storage device, which is implemented on HDFS.

Optionally, establishing the topic database corresponding to the preset topic includes:

setting scene entries associated with the preset topic, and scene options corresponding to the scene entries;

creating sample chat pairs whose chat subject is the preset topic, and using the sample chat pairs as the topic database corresponding to the preset topic, where each sample chat pair includes a sample initiation sentence and sample reply sentences that correspond to the sample initiation sentence and are set according to the scene options.

In real life, the same chat initiation sentence (for example, "Shall we go travelling together?") often calls for different replies (for example, "The weather is bad. How about next time?", "I prefer staying at home and don't like travelling.", or "Work has been too busy lately; I can't find the time to travel."). In other words, for the same chat initiation sentence, the user of the communication terminal often needs to reply differently depending on the environment or scene. To address this, when establishing the topic database corresponding to a preset topic, this embodiment first sets the scene entries associated with the preset topic and the scene options corresponding to those scene entries, and then creates sample chat pairs whose chat subject is the preset topic. Each sample chat pair includes a sample initiation sentence and the sample reply sentences that correspond to it and are set according to the scene options. The sample chat pairs whose chat subject is the preset topic then serve as the topic database corresponding to that preset topic.
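One way to picture such a topic database is as a small data structure in which each sample initiation sentence maps combinations of scene option codes to reply sentences. The sketch below is an assumption about layout, not the patent's storage format: the class names `SamplePair` and `TopicDatabase`, the two scene entries, and the numeric option codes are all hypothetical, while the two reply sentences are taken from the travel example above.

```python
from dataclasses import dataclass, field


@dataclass
class SamplePair:
    """A sample initiation sentence with its scene-dependent replies."""
    initiation: str
    # Maps a tuple of scene option codes (one per scene entry) to a reply.
    replies: dict = field(default_factory=dict)


@dataclass
class TopicDatabase:
    """All sample chat pairs for one preset topic."""
    topic: str
    scene_entries: dict  # entry name -> {option code: meaning}
    pairs: list = field(default_factory=list)


# Hypothetical scene entries and option codes for the "travel" topic.
travel = TopicDatabase(
    topic="travel",
    scene_entries={
        "weather": {1: "good", 2: "bad"},
        "time": {1: "free", 2: "busy"},
    },
)
travel.pairs.append(
    SamplePair(
        initiation="Shall we go travelling together?",
        replies={
            # (weather code, time code) -> sample reply sentence
            (2, 1): "The weather is bad. How about next time?",
            (1, 2): "Work has been too busy lately; I can't find the time to travel.",
        },
    )
)
```

Keying the replies by option-code tuples keeps one sample initiation sentence tied to several replies, which is the property the paragraph above relies on.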

Optionally, in this embodiment the scene entries associated with a preset topic are set according to everyday experience. Table 1 lists several preset topics and their associated scene entries. For example, for the preset topic "travel", everyday experience suggests that a reply to a chat initiation sentence on this topic will take into account factors such as the weather, available time, the destination, and whether the user is interested in travelling. For the preset topic "sending email", by contrast, a reply will take into account the email addresses used for sending or receiving, and will hardly consider the weather at all. It should be noted that the scene entries associated with each preset topic are not fixed; they are set manually from everyday experience, so users can define the scene entries associated with a preset topic as needed.

Table 1

Figure PCTCN2016103422-appb-000001

The scene options corresponding to each scene entry are likewise defined by the user as needed. For example, the weather entry may have three scene options ("1" for clear, "2" for rain, "3" for snow) or only two ("1" for good weather, "2" for bad weather). Similarly, the time entry may have three scene options ("1" for morning, "2" for afternoon, "3" for evening) or only two ("1" for daytime, "2" for night).

It should be noted that, when setting the sample reply sentences corresponding to a sample initiation sentence according to the scene options, this embodiment needs to consider the combinations of all scene options of all scene entries associated with the preset topic. Table 2 gives the codes of the scene options for the three scene entries associated with the preset topic "shopping", where the relationship entry may refer to the relationship between the communication terminals receiving and sending the chat initiation sentence. As Table 2 shows, the user has defined six scene options for the relationship entry, three for the distance entry, and five for the weather entry. The number of combinations of all scene options across all scene entries is therefore 6 × 3 × 5 = 90, so when creating the sample chat pairs in the topic database for the "shopping" topic, up to 90 sample reply sentences can be set for each sample initiation sentence.
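The count of 90 can be checked by enumerating the combinations directly. In the short sketch below only the option counts (6, 3 and 5) come from the text; the numeric codes 1..n are placeholders, since Table 2 is not reproduced here.

```python
from itertools import product

# Placeholder option codes for the three scene entries of the "shopping" topic.
relationship_options = range(1, 7)  # six options
distance_options = range(1, 4)      # three options
weather_options = range(1, 6)       # five options

# Every (relationship, distance, weather) combination of option codes.
combinations = list(product(relationship_options, distance_options, weather_options))
print(len(combinations))  # 90
```

Each tuple in `combinations` is one scene for which a sample reply sentence could be written.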

When creating sample reply sentences for a sample initiation sentence, this embodiment may set replies for only as many scene option combinations, and for whichever combinations, are needed; it is not necessary to provide a sample reply sentence for every combination of scene options. In an optional implementation, every scene entry also has a scene option whose content information is "empty" (which may be represented by the code "0"), because in practice the content information for a scene entry may be unobtainable. For example, if the communication terminal receiving the chat initiation sentence has no Global Positioning System (GPS) positioning installed, or has not granted permission to obtain its geographic location, the data returned by the system is empty. As another example, where the content information of a scene entry has to be obtained by querying and searching the Internet, no search results can be obtained when the communication terminal is disconnected from the network or enters an area with no network signal. In addition, when creating sample chat pairs in the topic database, for chat initiation sentences that have a fixed reply the user only needs to set all scene options to empty.

Table 2

Figure PCTCN2016103422-appb-000002

By setting the scene entries associated with a preset topic and the scene options corresponding to those scene entries, and by using the sample chat pairs created on the preset topic as the corresponding topic database, this embodiment greatly enriches the variety of sample reply sentences available for the same sample initiation sentence, which matches practical needs and improves the user experience. Because the scene entries associated with the preset topic are taken into account, sample reply sentences can be set for a sample initiation sentence under different scenes and combinations of scenes. This follows the way people reason when replying to chat messages and offers a high level of intelligence and personalization.

Optionally, obtaining the topic classification to which the chat initiation sentence received by the communication terminal belongs includes:

obtaining the chat content preceding the chat initiation sentence, and merging the chat initiation sentence and its preceding chat content into merged text in text format;

extracting keywords from the merged text;

obtaining the topic classification to which the chat initiation sentence belongs according to the keywords.

In this embodiment, the topic classification of the chat initiation sentence is determined not from the chat initiation sentence alone but from the chat initiation sentence together with its preceding chat content. In practice, the chat initiation sentence carries more reference value for its own topic classification than the preceding chat content does, so this embodiment may extract the keywords of the merged text using weighted word frequency statistics over the segmented text obtained by word-segmenting the merged text, assigning larger weight coefficients to chat content closer to the chat initiation sentence. After obtaining the keywords from the merged text, this embodiment may either use the content of a keyword as the topic classification of the chat initiation sentence, or look up the topic classification corresponding to the keyword in a preset mapping table that associates keywords with topic classifications.
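The weighted word frequency idea can be sketched as follows. This is an illustration under several assumptions that the text does not fix: messages are English and split on whitespace (Chinese text would need a real word segmenter), the weight grows linearly with recency, and the stop word list and the keyword-to-topic mapping table are made up for the example. The names `extract_keywords`, `STOP_WORDS` and `KEYWORD_TO_TOPIC` are hypothetical.

```python
from collections import Counter

# Hypothetical stop word list, just large enough for the example below.
STOP_WORDS = {"the", "a", "to", "is", "we", "shall", "i", "any", "some", "have"}

# Hypothetical preset mapping table from keywords to topic classifications.
KEYWORD_TO_TOPIC = {"travel": "travel", "email": "sending email"}


def extract_keywords(context, sentence, top_n=1):
    """Weighted term frequency over prior messages plus the initiation sentence.

    Message i (0-based, oldest first) gets weight i + 1, so the initiation
    sentence, which comes last, weighs the most. The linear scheme is an
    assumption; the text only requires larger weights for closer messages.
    """
    messages = list(context) + [sentence]
    scores = Counter()
    for i, message in enumerate(messages):
        for token in message.lower().split():
            token = token.strip(".,?!")
            if token and token not in STOP_WORDS:
                scores[token] += i + 1
    return [word for word, _ in scores.most_common(top_n)]


context = ["I finally have some days off.", "Any travel ideas?"]
sentence = "Shall we travel to the beach?"

keywords = extract_keywords(context, sentence)
topic = KEYWORD_TO_TOPIC.get(keywords[0])
print(keywords, topic)  # ['travel'] travel
```

When `context` is empty the same function scores the chat initiation sentence alone, which matches the case where no preceding chat content exists.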

By combining the chat initiation sentence with its preceding chat content to obtain the topic classification, this embodiment takes full account of the conversational context in which the chat initiation sentence occurs, and is more accurate than relying on the chat initiation sentence alone. Determining the keywords through weighted word frequency statistics over the segmented merged text makes the resulting topic classification more precise.

It should be noted that when the chat initiation sentence has no preceding chat content, this embodiment obtains the topic classification from the chat initiation sentence alone. The range of preceding chat content to obtain is defined by the user, for example the preceding chat content within a certain period of time, or within a certain number of messages.

可选地,利用与话题分类相同的预设话题对应的话题数据库对聊天发起句进行语义匹配,获取第一语义匹配结果包括:Optionally, the session initiation sentence is semantically matched by using a topic database corresponding to the preset topic of the topic classification, and obtaining the first semantic matching result includes:

获取与话题分类相同的预设话题对应的话题数据库;Obtaining a topic database corresponding to the preset topic that is the same as the topic classification;

采集与话题分类关联的场景条目的内容信息,获得场景信息;Collecting content information of a scene item associated with the topic classification to obtain scene information;

在话题数据库中匹配与聊天发起句相同的样本发起句,并根据场景信息获取第一语义匹配结果。The same sample initiation sentence as the chat initiation sentence is matched in the topic database, and the first semantic matching result is obtained according to the scenario information.

本实施例采集与话题分类的场景条目的内容信息可以采用计算、推理、 查询、搜索或其任意组合的方式。可选地,可以通过对通讯终端的个人信息、社交信息、通讯信息、网上购物信息、上网足迹信息、用户行为信息、用户业务信息等数据的计算、推理、查询、搜索或其任意组合的方式获取与场景条目对应的内容信息,其中,用户行为信息是指用户寻求他所需求的信息时所表现出来的需求表达、信息获取、信息利用等行为的信息。通讯信息又包括用户自己的历史通讯记录、使用同一通讯应用软件的其他用户的历史通讯记录以及第三方应用软件提供的通讯记录。可选地,通讯记录又包括通话记录和短信记录,且短信记录又包括手机短信记录和即时通讯消息记录,通话记录又包括手机通话记录和即时通讯语音和视频通话记录。In this embodiment, the content information of the scene item collected by the topic classification may be calculated, reasoned, The way to query, search, or any combination of them. Optionally, the method for calculating, inferring, querying, searching, or any combination of personal information, social information, communication information, online shopping information, online footprint information, user behavior information, user service information, and the like of the communication terminal may be The content information corresponding to the scene item is obtained, wherein the user behavior information refers to information about the behavior of demand expression, information acquisition, information utilization, etc., which is displayed when the user seeks the information he needs. The communication information includes the user's own historical communication records, historical communication records of other users using the same communication application software, and communication records provided by third-party application software. Optionally, the communication record further includes a call record and a short message record, and the short message record further includes a mobile phone short message record and an instant message record, and the call record further includes a mobile phone call record and an instant message voice and video call record.

For example, when the scene entry is a geographic location entry, its content may be obtained by querying GPS positioning information; when the scene entry is a distance entry, it may be obtained by calculating the difference between the geographic locations of the communication terminals that send and receive the chat initiation sentence; when the scene entry is a recently followed topic entry, it may be obtained by searching the recent web browsing history of the communication terminal; and when the scene entry is a weather entry, it may be obtained either by querying a weather web page or by inference from collected meteorological information such as temperature, wind direction, and humidity.

In this embodiment, obtaining the first semantic matching result according to the scene information may include the following. First, the scene information is identified to obtain an identification ID. Optionally, referring to Table 2, suppose this embodiment collects only two items: the content of the relationship entry of the communication terminals that send and receive the chat initiation sentence, which is "colleague", and the content of the weather entry of the communication terminal that sends the chat initiation sentence, which is "clear". The resulting identification ID is then "3+0+1". Next, a sample initiation sentence identical to the chat initiation sentence is matched in the topic database corresponding to the preset topic that is the same as the topic classification, the combination code corresponding to the identification ID is matched among the scene options corresponding to that sample initiation sentence, and the sample reply sentence whose combination code is identical to the identification ID is used as the reply prompt content for the chat initiation sentence.

In an actual implementation, when a sample initiation sentence is matched in the topic database corresponding to the preset topic that is the same as the topic classification, either exact matching may be used to obtain a sample initiation sentence identical to the chat initiation sentence, or fuzzy matching may be used to obtain a sample initiation sentence similar to it. Fuzzy matching in this embodiment may include: first preprocessing the chat initiation sentence, where the preprocessing includes operations such as word segmentation, semantic disambiguation, part-of-speech tagging, and stop-word removal; then performing text matching between the preprocessed chat initiation sentence and the sample initiation sentences in the topic database; and taking a sample initiation sentence whose text-matching similarity exceeds a preset threshold as the sample initiation sentence matching the chat initiation sentence. In this way, for the same chat initiation sentence, different sample reply sentences can be matched according to different scene information, so that the reply prompt content is obtained according to the scene information of the communication terminals, with a high degree of intelligence and personalization.
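The identification and lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the code tables are hypothetical stand-ins for Table 2 (only "colleague" = 3, "friend" = 2, "far" = 1 and "clear" = 1 can be read off the examples in this description, and "0" is assumed to mark an entry that was not collected), the names `scene_id`, `match_sample` and `reply_prompts` are invented for illustration, and the standard-library `difflib.SequenceMatcher` ratio merely stands in for the unspecified text-matching similarity.

```python
from difflib import SequenceMatcher

# Hypothetical option codes; the real codes are defined in Table 2 of the source.
RELATION_CODES = {"friend": "2", "colleague": "3"}
DISTANCE_CODES = {"far": "1"}
WEATHER_CODES = {"clear": "1", "rain": "2"}


def scene_id(relation=None, distance=None, weather=None):
    """Join the per-entry option codes with '+'; '0' marks an uncollected entry."""
    parts = [
        RELATION_CODES.get(relation, "0"),
        DISTANCE_CODES.get(distance, "0"),
        WEATHER_CODES.get(weather, "0"),
    ]
    return "+".join(parts)


def match_sample(chat_sentence, topic_db, threshold=0.8):
    """Fuzzy-match the chat initiation sentence against the sample initiation sentences."""
    best, best_score = None, 0.0
    for sample in topic_db:
        score = SequenceMatcher(None, chat_sentence, sample).ratio()
        if score > best_score:
            best, best_score = sample, score
    # Accept the best candidate only if its similarity exceeds the preset threshold.
    return best if best_score > threshold else None


def reply_prompts(chat_sentence, scene, topic_db):
    """Return the sample reply sentences stored under the matching combination code."""
    sample = match_sample(chat_sentence, topic_db)
    if sample is None:
        return []
    return topic_db[sample].get(scene, [])


# Toy topic database: sample initiation sentence -> combination code -> replies.
topic_db = {
    "Do you want to go shopping together?": {
        "3+0+1": ["Sure, shall we go after work?"],
    }
}
```

Exact matching corresponds to a plain dictionary lookup on `topic_db`; the fuzzy variant tolerates small differences such as a missing question mark.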

It should be noted that the topic database created in this embodiment for a preset topic supports automatic learning and automatic updating. Optionally, the database is extended when the collected content of a scene entry is not among the contents already recorded for that scene entry in the created topic database. For example, suppose the weather entry has only three scene options, where "1" denotes clear, "2" denotes rain, and "3" denotes snow. When the collected content of the weather entry is "cloudy", the system creates a scene option with code "4" denoting "cloudy" under the weather entry, and updates the scene option combinations and their corresponding sample reply sentences accordingly. In addition, the sample reply sentence for each scene option combination may consist of a single message or of multiple messages.
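The automatic update of scene options can be illustrated with a small sketch. It assumes that option codes are consecutive integers stored as strings, and the helper name `option_code` is hypothetical. The sketch only registers the new option; updating the scene option combinations and their sample reply sentences, which the embodiment also requires, is left out.

```python
def option_code(options, content):
    """Return the code for `content`, creating the next free code if it is new."""
    for code, label in options.items():
        if label == content:
            return code
    # Unknown content: allocate the next integer code and record the new option.
    new_code = str(max((int(c) for c in options), default=0) + 1)
    options[new_code] = content
    return new_code


# Weather entry with the three scene options used in the example above.
weather_options = {"1": "clear", "2": "rain", "3": "snow"}
```

Calling `option_code(weather_options, "cloudy")` on this table registers "cloudy" under code "4", matching the example in the text.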

Optionally, performing semantic matching on the chat initiation sentence by using user network data to obtain the second semantic matching result includes:

Preprocessing the user network data to obtain preprocessed text, where the preprocessing includes word segmentation, semantic disambiguation, part-of-speech tagging, stop-word removal, punctuation handling, and emoticon character handling;

Performing text clustering on the preprocessed text with the K-means clustering algorithm to obtain text cluster centers;

Extracting the keywords of each text cluster center as the cluster topic corresponding to that text cluster center;

Obtaining the cluster topic closest to the topic classification to which the chat initiation sentence belongs;

Matching the chat initiation sentence in the user network data corresponding to that cluster topic to obtain the second semantic matching result.

In this embodiment, the second semantic matching result is obtained mainly by matching the chat initiation sentence against the collected user network data. However, because user network data is generally large in volume, matching the chat initiation sentence directly against all of it may yield multiple matching results or reply prompt content that is entirely irrelevant. To address this problem, this embodiment first preprocesses the collected user network data, performs text clustering on the preprocessed text to obtain text cluster centers, and extracts the keywords of the text cluster centers as cluster topics. Finally, the chat initiation sentence is matched in the user network data corresponding to the cluster topic closest to the topic classification to which the chat initiation sentence belongs, thereby obtaining the second semantic matching result.

Optionally, in this embodiment, performing text clustering on the preprocessed text based on the K-means clustering algorithm to obtain the text cluster centers may include the following steps:

a. Randomly select K data items as center points. In this embodiment, a center point may be either a word or a sentence.

b. Calculate the distance between each sample and each center point; the sample belongs to the class of the center point at the smallest distance. Optionally, in this embodiment, the distance between a sample and a center point is obtained by calculating the distance between the word vector of the sample and the word vector of the center point.

c. For each class, recalculate the center point as the mean of all samples in that class.

d. Repeat steps b and c until convergence, that is, until the cluster centers no longer change.
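Steps a to d can be sketched in plain Python as below. This is an illustrative sketch only: samples are assumed to be already converted to numeric vectors (the embodiment uses word vectors), Euclidean distance is an assumed choice of metric, and the function names `random_centers` and `kmeans` are hypothetical.

```python
import math
import random


def random_centers(samples, k, seed=None):
    """Step a: randomly select K samples as the initial center points."""
    return random.Random(seed).sample(samples, k)


def kmeans(samples, centers, max_iter=100):
    """Plain K-means on lists of floats; `centers` are the initial center points."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step b: assign each sample to the class of its nearest center point.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(s, centers[k]))
            for s in samples
        ]
        # Step c: recompute each center point as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty class keeps its old center
        # Step d: stop once the center points no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

A typical call is `kmeans(vectors, random_centers(vectors, k))`; passing the initial centers explicitly keeps the routine deterministic for testing.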

In an optional implementation, to improve the accuracy of the second semantic matching result obtained from the user network data, the user of the communication terminal generally filters the collected user network data one or more times before preprocessing and text clustering. In addition, the cluster topic closest to the topic classification to which the chat initiation sentence belongs may be obtained either by calculating the similarity between the topic classification and each cluster topic, or from a preset degree of association between topic classifications and cluster topics.

This embodiment extracts the cluster topics of the user network data with the K-means clustering algorithm and matches the chat initiation sentence only in the user network data corresponding to the cluster topic closest to it. This avoids matching against the full volume of user network data, which reduces matching time and improves the speed and efficiency of obtaining the reply prompt content. Restricting the matching to the closest cluster topic also makes the obtained reply prompt content more accurate and more intelligent.

Optionally, the scene entries include:

The following entries of the communication terminals that send and receive the chat initiation sentence: name entry, gender entry, age entry, instant messaging account entry, email address entry, home address entry, occupation category entry, job title entry, employer entry, employer address entry, bank account entry, friend impression entry, hobby entry, friend circle status entry, mood entry, recently followed topic entry, current communication status entry, scene image entry, time entry, holiday entry, season entry, geographic location entry, distance entry, communication frequency entry, communication count entry, communication duration entry, and an entry for the manner in which historical communication was initiated. The manners of initiation include initiating communication from the address book, from the call history, from the SMS module, and from the dial pad.

The scene entries of this embodiment are neither limited to the entries listed above nor required to include all of them; they may be customized by the user or selected according to need, system design complexity, and design precision.

Optionally, collecting the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence includes:

Collecting a scene image of the communication terminal that sends or receives the chat initiation sentence;

Extracting the regions of interest of scene training images with the Difference of Gaussian (DOG) operator, and computing the SIFT features of those regions of interest;

Clustering the Scale-Invariant Feature Transform (SIFT) features of the regions of interest of the scene training images with the K-means clustering algorithm to obtain multiple cluster centers, and building a visual word dictionary composed of the visual words corresponding to the cluster centers;

Extracting the regions of interest of the scene image with the DOG operator, and matching, in the visual word dictionary, the visual word closest to the SIFT feature of each region of interest of the scene image;

Classifying the scene image with a pre-trained support vector machine classifier according to the distribution of the visual words of its regions of interest, to obtain the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence.

The SIFT feature in this embodiment is the scale-invariant feature transform, which searches for extreme points in scale space and extracts their position, scale, and rotation invariants. Optionally, the visual word closest to the SIFT feature of a region of interest of the scene image is matched in the visual word dictionary as follows: the similarity between the SIFT feature of each region of interest of the scene image and the SIFT feature of the cluster center corresponding to each visual word in the visual word dictionary is calculated; when the similarity between a region of interest and a visual word exceeds a preset threshold, that visual word is taken as the visual word closest to the region of interest. In this way, all regions of interest of the scene image can be represented by visual words, and a classifier is designed and trained based on the distribution of visual words in scene images, finally yielding the content information of the scene image entry of the communication terminal that sends and/or receives the chat initiation sentence.
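The matching of regions of interest to visual words can be sketched as follows. This is a toy illustration under several assumptions: descriptors are short lists of floats rather than real 128-dimensional SIFT descriptors, cosine similarity stands in for the unspecified similarity measure, the threshold value is arbitrary, and the names `cosine`, `nearest_word` and `word_histogram` are hypothetical. The resulting histogram is the "distribution of visual words" that would be fed to the classifier.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two descriptor vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_word(descriptor, dictionary, threshold=0.9):
    """Return the most similar visual word, or None if it is not above the threshold."""
    word, center = max(dictionary.items(), key=lambda kv: cosine(descriptor, kv[1]))
    return word if cosine(descriptor, center) > threshold else None


def word_histogram(descriptors, dictionary, threshold=0.9):
    """Count how often each visual word occurs among the regions of interest."""
    words = (nearest_word(d, dictionary, threshold) for d in descriptors)
    return Counter(w for w in words if w is not None)


# Toy dictionary: visual word -> cluster-center descriptor.
dictionary = {"sky": [1.0, 0.0], "bridge": [0.0, 1.0]}
```

In a real pipeline the dictionary would hold the K-means cluster centers of SIFT descriptors computed on the scene training images.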

Referring to FIG. 2, FIG. 2 shows a scene image that this embodiment assumes was collected at the communication terminal receiving the chat initiation sentence when the sentence was received. SIFT features are extracted from five regions of interest of the scene image, and the similarity between the SIFT feature of each region of interest and the visual word dictionary is calculated, giving the visual words closest to the five regions of interest: "sky", "national flag", "building", "lion", and "bridge" (see FIG. 3). Based on these visual words, a pre-trained support vector machine is then used to obtain the content information of the scene image entry of the communication terminal that receives the chat initiation sentence.

By collecting a scene image of the communication terminal to obtain the content information of its scene image entry, this embodiment brings the scene information closer to the real scene, and makes the reply prompt content obtained for the chat initiation sentence more personalized and better suited to the communication context.

The process and principle of obtaining the reply prompt content of a chat initiation sentence in the embodiments of the present invention are further described below with three simplified embodiments.

Simplified Embodiment 1 addresses the following communication scenario: communication terminal A sends communication terminal B a chat initiation sentence in text format reading "Do you want to go shopping together?". Referring to FIG. 4, the method by which communication terminal B obtains the reply prompt content of the chat initiation sentence includes:

Step S201: set the scene entries associated with the preset topics, and the scene options corresponding to the scene entries. Optionally, assume that the number of preset topics in this embodiment is sufficiently large, and that the scene entries associated with the "shopping" topic include the relationship entry and the distance entry of the communication terminals that send and receive the chat initiation sentence, and the weather entry of the communication terminal that receives the chat initiation sentence. There are 6 scene options for the relationship entry, 3 for the distance entry, and 5 for the weather entry; see Table 2.

Step S202: create sample chat pairs whose chat subject is the preset topic, and use the sample chat pairs as the topic database corresponding to the preset topic. A sample chat pair includes a sample initiation sentence and the sample reply sentences that correspond to it and are set according to the scene options.

Optionally, this embodiment creates sample chat pairs corresponding to the "shopping" topic, and for the sample initiation sentence in each sample chat pair sets sample reply sentences for a customizable number of scene option combinations. For example, for the sample initiation sentence "Do you want to go shopping together?", sample reply sentences are set for all combinations of the scene options (6 × 3 × 5 = 90 in total); for the sample initiation sentence "How do you say 'go shopping' in English?", a sample reply sentence is set for a single scene option combination (code 0+0+0). The created sample chat pairs are then used as the topic database corresponding to the "shopping" topic.
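The figure of 90 combinations follows from the 6 relationship options, 3 distance options, and 5 weather options. A short sketch can enumerate the combination codes; it assumes that option codes run from 1 upward within each entry, and the helper name `all_combination_codes` is hypothetical.

```python
from itertools import product


def all_combination_codes(option_counts):
    """Enumerate codes such as '2+1+1' for every combination of scene options."""
    ranges = [range(1, n + 1) for n in option_counts]
    return ["+".join(map(str, combo)) for combo in product(*ranges)]


# Relationship (6 options) x distance (3 options) x weather (5 options).
codes = all_combination_codes([6, 3, 5])
```

Each code in `codes` would key one set of sample reply sentences for the sample initiation sentence.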

Step S203: obtain the preceding chat content of the chat initiation sentence, and merge the chat initiation sentence and its preceding chat content into a merged text in text format. Optionally, assume that the preceding chat content in this embodiment consists of four messages: Communication terminal A: "Busy?" / Communication terminal B: "Not too bad." / Communication terminal A: "Lately I feel I have nothing suitable to wear!" / Communication terminal B: "Really." The merged text is then {Busy? / Not too bad. / Lately I feel I have nothing suitable to wear! / Really. / Do you want to go shopping together?}.

Step S204: extract the keyword of the merged text. After word segmentation, stop-word removal, part-of-speech tagging, and semantic disambiguation are applied to the merged text, assume that the extracted tokens are {"busy", "clothes", "wear", "shopping"}. This embodiment performs word frequency statistics in a weighted manner and selects the token with the largest weighted word frequency as the keyword of the merged text. Optionally, because the merged text in this embodiment contains five messages, five weighting coefficients are set: k1 = 0.5, k2 = 0.2, k3 = 0.15, k4 = 0.1, and k5 = 0.05. The weighting coefficient decreases as the time interval between a message in the merged text and the chat initiation sentence increases, and the specific values are customized by the user as needed.

Since each token extracted in this embodiment occurs exactly once, the weighted word frequency of each token equals the weighting coefficient of the message that contains it. The token "shopping" appears in the chat initiation sentence itself and therefore receives the largest weight (k1 = 0.5), so the keyword obtained is "shopping".
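The weighted word frequency calculation of step S204 can be sketched as follows. The sketch assumes the tokens have already been extracted per message and that the second and fourth messages yield no tokens after stop-word removal; the function name `weighted_keyword` is hypothetical.

```python
from collections import defaultdict


def weighted_keyword(messages, weights):
    """Pick the token with the largest weighted word frequency.

    messages: token lists, oldest message first.
    weights:  coefficients k1, k2, ... with k1 for the newest message.
    """
    scores = defaultdict(float)
    # Walk from the chat initiation sentence backwards so k1 meets the newest message.
    for tokens, weight in zip(reversed(messages), weights):
        for token in tokens:
            scores[token] += weight
    return max(scores, key=scores.get), dict(scores)


# Tokens of the five messages in the merged text of step S203.
messages = [["busy"], [], ["clothes", "wear"], [], ["shopping"]]
weights = [0.5, 0.2, 0.15, 0.1, 0.05]
keyword, scores = weighted_keyword(messages, weights)
```

With these inputs "shopping" scores 0.5, "clothes" and "wear" score 0.15 each, and "busy" scores 0.05, so the keyword is "shopping".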

Step S205: determine the topic classification to which the chat initiation sentence belongs according to the keyword. Optionally, this embodiment uses the content of the keyword as the topic classification, so the topic classification of the chat initiation sentence in this embodiment is "shopping".

Step S206: obtain the topic database corresponding to the preset topic that is the same as the topic classification.

Step S207: collect the content information of the scene entries associated with the topic classification to obtain the scene information. Optionally, the scene entries associated with the "shopping" topic in this embodiment are: f1 = the relationship entry of the communication terminals that send and receive the chat initiation sentence; f2 = the distance entry of the communication terminals that send and receive the chat initiation sentence; f3 = the weather entry of the communication terminal that receives the chat initiation sentence (see Table 1). Assume that, by querying the address book remark information of the communication terminal that receives the chat initiation sentence, the content of f1 is found to be "friend"; that, by calculating the difference between the geographic locations of the sending and receiving communication terminals, the content of f2 is found to be "far" (it is predefined that "far" is returned to the distance entry when the geographic distance exceeds 10 km); and that, by searching the Internet for the weather at the geographic location of the communication terminal that receives the chat initiation sentence, the content of f3 is found to be "clear".
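The distance entry can be illustrated with a great-circle distance calculation. The haversine formula used here is an assumed choice, since the embodiment only speaks of a geographic location difference; the label "near" and the names `haversine_km` and `distance_entry` are likewise hypothetical (the description defines only the "far" case, for distances above 10 km).

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def distance_entry(sender, receiver, far_km=10.0):
    """Map the distance between two (lat, lon) positions to a distance-entry label."""
    return "far" if haversine_km(*sender, *receiver) > far_km else "near"
```

For two terminals one degree of longitude apart on the equator (about 111 km), `distance_entry` returns "far".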

Step S208: match, in the topic database, a sample initiation sentence identical to the chat initiation sentence, and obtain the first semantic matching result according to the scene information. Optionally, the scene information is first identified to obtain an identification ID; referring to Table 2, the resulting identification ID is "2+1+1". Then a sample initiation sentence identical to the chat initiation sentence is matched in the topic database corresponding to the preset topic that is the same as the topic classification, the combination code corresponding to the identification ID is matched among the scene options corresponding to that sample initiation sentence, and the sample reply sentence whose combination code is identical to the identification ID is used as the reply prompt content for the chat initiation sentence. Assume that, for the sample initiation sentence "Do you want to go shopping together?", three reply contents correspond to the identification ID "2+1+1": "Dear, I'm too far away from you. Shall we make it another time?", "The weather is nice today, and I was just thinking of going out for a walk.", and "I'm quite far from you, but the weather is nice today, so I'd still like to come out." When replying in an actual chat, the system presents these three reply prompts for the user to choose from.

It can be seen that when the collected contents of the scene entries differ, the obtained scene information differs, and so does the obtained reply prompt content. This solves the technical problem that existing methods of obtaining reply prompt content do not take into account the scene information of the communication terminals participating in the chat, which makes the reply prompt content fixed and uniform, the chat insufficiently intelligent, and the user experience poor. Different reply prompt content is obtained for different scene information; it fully incorporates the scene information of both communicating parties and is closely related to them, reflecting a high level of intelligence and personalization. In addition, obtaining the reply prompt content from the topic database requires no manual editing, which speeds up the chat session and improves the user experience.

Simplified Embodiment 2 addresses the following communication scenario: communication terminal A sends communication terminal B a chat initiation sentence in text format reading "Busy?". Referring to FIG. 5, the method by which communication terminal B obtains the reply prompt content of the chat initiation sentence includes:

Step S301: set the scene entries associated with the preset topics, and the scene options corresponding to the scene entries. Optionally, assume that the number of preset topics in this embodiment is sufficiently large, and that the only scene entry associated with the "greeting" topic is the scene image entry of the communication terminal that receives the chat initiation sentence, with 9 scene options: airport, ocean, forest, village, street, trees, high-rise building, highway, and office.

Step S302: create sample chat pairs whose chat subject is the preset topic, and use the sample chat pairs as the topic database corresponding to the preset topic. A sample chat pair includes a sample initiation sentence and the sample reply sentences that correspond to it and are set according to the scene options.

Optionally, when creating sample chat pairs whose chat subject is "greeting", this embodiment sets, for the sample initiation sentence in each sample chat pair, sample reply sentences corresponding to the scene options. For example, for the sample initiation sentence "Where are you?", sample reply sentences may be set for all scene options (9 in total) or for only some of them.

Step S303: obtain the topic classification to which the chat initiation sentence received by the communication terminal belongs. Optionally, assume that the preceding chat content of the chat initiation sentence in this embodiment is empty; because the chat initiation sentence is simple, its topic classification is readily determined to be "greeting".

Step S304: obtain the topic database corresponding to the preset topic that is the same as the topic classification.

Step S305: collect the content information of the scene entries associated with the topic classification to obtain the scene information. Optionally, the only scene entry associated with the "greeting" topic in this embodiment is f1 = the scene image entry of the communication terminal that receives the chat initiation sentence.

Step S306: collect a scene image of the communication terminal that receives the chat initiation sentence.

Step S307: extract the regions of interest of the scene image, and match, in the visual word dictionary, the visual word closest to the SIFT feature of each region of interest of the scene image.

Optionally, this embodiment first extracts the regions of interest of the scene image with the DOG (Difference of Gaussian) operator, then computes the SIFT feature of each region of interest, and matches, in the visual word dictionary, the visual word closest to the SIFT feature of each region of interest of the scene image. Assume that the matching yields the visual words "road sign", "lane", and "distance marker" for three regions of interest.

Step S308: Classify the scene image with a pre-trained support vector machine classifier according to the distribution of the visual words of its regions of interest, and obtain the content information of the scene entry. Optionally, this embodiment designs the classifier with the support vector machine method and, in the training phase, trains it on images from 9 known scene classes (airport, ocean, forest, village, street, trees, high-rise buildings, highway, office), each class comprising 100 different training sample images. By inputting the scene image obtained in step S307, which contains the three visual words "road sign", "lane" and "distance sign", into the pre-trained support vector machine classifier, the content information of the scene image entry of the communication terminal that receives the chat initiation sentence is obtained as "highway".

Step S309: Match, in the topic database, the sample initiation sentence that is the same as the chat initiation sentence, and obtain the first semantic matching result according to the scene information. Optionally, assume that in the preset topic database corresponding to the topic classification "greeting", the sample initiation sentence "Busy?" with the scene option "highway" has several sample reply sentences, for example "I'm on the highway and can't reply right now; I'll contact you once I'm off the highway." and "Sorry, I can't reply right now; I'll contact you later." These reply prompts are then displayed on the communication terminal for the user to choose from.
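The lookup in step S309 can be sketched as an exact-match query keyed by topic, sample initiation sentence and scene option. The nested-dictionary layout and the name `first_semantic_match` are assumptions made for illustration; the text does not specify how the topic database is stored.

```python
# Hypothetical in-memory topic database:
# topic -> sample initiation sentence -> scene option -> sample reply sentences.
TOPIC_DB = {
    "greeting": {
        "Busy?": {
            "highway": [
                "I'm on the highway and can't reply right now; "
                "I'll contact you once I'm off the highway.",
                "Sorry, I can't reply right now; I'll contact you later.",
            ],
        }
    }
}

def first_semantic_match(topic, sentence, scene_option, db=TOPIC_DB):
    """Return the reply prompts for an exact sentence match under the scene, or []."""
    return db.get(topic, {}).get(sentence, {}).get(scene_option, [])

prompts = first_semantic_match("greeting", "Busy?", "highway")
missing = first_semantic_match(
    "Apple mobile phone", "What is the price of Apple?", "highway"
)
```

An empty list stands for "no first semantic matching result", which is the condition under which the second matching stage would be used.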

This embodiment captures a scene image of the communication terminal that receives the chat initiation sentence and derives the content information of the scene entry from the captured image. The scene information obtained from this content information is therefore closer to the real scene, so that the reply prompt content matched to the chat initiation sentence on the basis of the scene information is more intelligent and more personalized.

Simplified Embodiment 3 addresses the following communication scenario: communication terminal A sends communication terminal B a chat initiation sentence in text format with the content "What is the price of Apple?", and for this chat initiation sentence no first semantic matching result can be obtained from the established topic database. Referring to FIG. 6, the method by which communication terminal B obtains the reply prompt content for the chat initiation sentence includes:

Step S401: Acquire the topic classification to which the chat initiation sentence received by the communication terminal belongs. The topic classification can be acquired in the same way as in Simplified Embodiment 1, so the procedure is not repeated here. Assume that the topic classification acquired in this embodiment is "Apple mobile phone".

Step S402: Collect the user network data of the communication terminal in a distributed cloud computing manner. Optionally, in this embodiment the user network data is collected by a web crawler, and the collected network data is stored on distributed storage devices implemented on HDFS.

Step S403: Preprocess the user network data to obtain preprocessed text. The preprocessing includes word segmentation, semantic disambiguation, part-of-speech tagging, stop-word removal, punctuation processing and emoticon character processing. Optionally, to remove stop words, this embodiment first builds a stop-word dictionary and then matches and removes words that occur very frequently but carry no real meaning, such as "的", "了" and "吗". In a practical implementation, part-of-speech tagging and part-of-speech filtering can also be applied to the collected user network data. Part-of-speech filtering builds on part-of-speech tagging and handles words of different categories separately. Experiments show that adjectives and adverbs do little to improve the clustering result, so they should be removed, leaving only nouns, verbs and abbreviations.
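The stop-word removal and part-of-speech filtering of step S403 can be sketched as a single filter over tokens that have already been segmented and tagged. The tag names, the sample sentence and the function name `filter_tokens` are assumptions; word segmentation and tagging would come from a separate tool that is not shown here.

```python
# Stop-word dictionary (the three examples given in the text).
STOP_WORDS = {"的", "了", "吗"}
# Parts of speech to keep; adjectives and adverbs are filtered out.
# The tag names are hypothetical and depend on the tagger actually used.
KEPT_POS = {"noun", "verb", "abbr"}

def filter_tokens(tagged_tokens):
    """Drop stop words, then keep only nouns, verbs and abbreviations."""
    return [
        word
        for word, pos in tagged_tokens
        if word not in STOP_WORDS and pos in KEPT_POS
    ]

# Hypothetical segmenter/tagger output for the sentence "新的苹果手机很贵吗".
tagged = [
    ("新", "adj"), ("的", "particle"), ("苹果", "noun"), ("手机", "noun"),
    ("很", "adv"), ("贵", "adj"), ("吗", "particle"),
]
kept = filter_tokens(tagged)
```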

Step S404: Perform text clustering on the preprocessed text with the K-means clustering algorithm to obtain the text cluster centers. In this embodiment, the clustering may include the following steps:

a. Randomly select K data items as center points. In this embodiment a center point may be either a word or a sentence.

b. Compute the distance between each sample and every center point; the sample belongs to the class of the center point with the smallest distance. Optionally, this embodiment obtains the distance between a sample and a center point by computing the distance between the word vector of the sample and the word vector of the center point.

c. For each class, recompute the center point as the mean of all samples in that class.

d. Repeat steps b and c until convergence, that is, until the cluster centers no longer change.
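Steps a to d can be sketched in plain Python as follows. This is a minimal illustration that assumes the samples have already been turned into numeric vectors (for example word vectors); the seed parameter, the iteration cap and the empty-cluster handling are additions for reproducibility and robustness that the text does not mention.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster numeric vectors with K-means; returns (centers, clusters)."""
    rng = random.Random(seed)
    # Step a: randomly pick K samples as the initial center points.
    centers = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step b: assign every sample to its nearest center (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Step c: recompute each center as the mean of its samples.
        # An empty cluster keeps its previous center (an assumption).
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step d: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical two-dimensional vectors forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, clusters = kmeans(points, k=2)
```

Because the initial centers are random, K-means can in general settle on different solutions for different seeds; the toy data above is separated widely enough that both groups are recovered.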

Optionally, assume that this embodiment obtains two text cluster centers: "Apple - mobile phone" and "fruit - apple".

Step S405: Extract the keywords of each text cluster center as the clustering topic corresponding to that text cluster center. Optionally, this embodiment therefore also obtains two clustering topics: "Apple - mobile phone" and "fruit - apple".

Step S406: Acquire the clustering topic closest to the topic classification to which the chat initiation sentence belongs. Optionally, the closest clustering topic can be obtained either by computing the similarity between the topic classification and each clustering topic, or from a preset degree of association between topic classifications and clustering topics. Computing the similarity readily shows that the clustering topic closest to the topic classification of the chat initiation sentence ("Apple mobile phone") is "Apple - mobile phone".
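The similarity-based selection in step S406 can be sketched with a simple set-overlap measure. The text does not say which similarity measure is used, so the Jaccard coefficient here is an assumption, as are the keyword lists and the function names.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_cluster_topic(topic_keywords, cluster_topics):
    """Return the name of the clustering topic most similar to the topic classification."""
    return max(
        cluster_topics,
        key=lambda name: jaccard(topic_keywords, cluster_topics[name]),
    )

# Hypothetical keyword sets for the two clustering topics of the example.
cluster_topics = {
    "Apple - mobile phone": ["apple", "mobile phone"],
    "fruit - apple": ["fruit", "apple"],
}
# Keywords of the topic classification "Apple mobile phone".
best = closest_cluster_topic(["apple", "mobile phone"], cluster_topics)
```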

Step S407: Match the chat initiation sentence in the user network data corresponding to the clustering topic to obtain the second semantic matching result. Optionally, by matching the chat initiation sentence ("What is the price of Apple?") in the user network data corresponding to the clustering topic, this embodiment readily determines that the price in question is that of an Apple mobile phone, not that of the fruit.

This embodiment extracts clustering topics from the user network data with the K-means clustering algorithm and matches the chat initiation sentence only in the user network data corresponding to the clustering topic closest to it. This avoids matching the sentence against the full volume of user network data, which shortens the matching time and so speeds up the acquisition of the reply prompt content. Restricting the match to the closest clustering topic also makes the obtained reply prompt content more accurate and more intelligent.

Referring to FIG. 7, an apparatus for obtaining reply prompt content for a chat initiation sentence, provided by an optional embodiment of the present invention, includes:

a topic database creation device 10, configured to establish a topic database corresponding to a preset topic;

a topic classification acquisition device 20, configured to acquire the topic classification to which the chat initiation sentence received by the communication terminal belongs;

a first semantic matching device 30, configured to perform semantic matching on the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, obtain a first semantic matching result, and use the first semantic matching result as the reply prompt content for the chat initiation sentence;

a second semantic matching device 40, configured to, if the first semantic matching result is not obtained, collect the user network data of the communication terminal in a distributed cloud computing manner, perform semantic matching on the chat initiation sentence by using the user network data, obtain a second semantic matching result, and use the second semantic matching result as the reply prompt content for the chat initiation sentence.
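The division of work between the first and second semantic matching devices can be sketched as a two-stage fallback. This is a simplified illustration: the function and parameter names are hypothetical, scene options are left out, and the network-data matcher is a stub standing in for the collection and clustering stages.

```python
def get_reply_prompts(sentence, topic, topic_db, match_in_network_data):
    """Return (stage, prompts): the topic database first, network data as fallback."""
    first = topic_db.get(topic, {}).get(sentence, [])
    if first:
        return "first", first
    # No first semantic matching result: fall back to the user network data.
    return "second", match_in_network_data(sentence, topic)

# Hypothetical topic database: topic -> sample initiation sentence -> replies.
topic_db = {
    "greeting": {
        "Busy?": ["Sorry, I can't reply right now; I'll contact you later."]
    }
}

def stub_network_match(sentence, topic):
    """Stub for matching against collected user network data (not implemented)."""
    return [f"(result matched in user network data for topic '{topic}')"]

hit = get_reply_prompts("Busy?", "greeting", topic_db, stub_network_match)
fallback = get_reply_prompts(
    "What is the price of Apple?", "Apple mobile phone", topic_db, stub_network_match
)
```

Passing the matcher in as a callable keeps the second stage from running, and the network data from being collected, unless the first stage comes back empty.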

Optionally, the topic database creation device 10 includes:

a setting device, configured to set the scene entries associated with the preset topic and the scene options corresponding to the scene entries;

a sample chat pair creation device, configured to create sample chat pairs whose chat subject is the preset topic and to use the sample chat pairs as the topic database corresponding to the preset topic, each sample chat pair including a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.

Optionally, the topic classification acquisition device 20 includes:

a merged text acquisition device, configured to acquire the chat content preceding the chat initiation sentence and to merge the chat initiation sentence and its preceding chat content into merged text in text format;

a keyword extraction device, configured to extract the keywords of the merged text;

a topic classification determination device, configured to acquire, according to the keywords, the topic classification to which the chat initiation sentence belongs.
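The three sub-devices of the topic classification acquisition device can be sketched as one small function: merge the preceding chat content with the chat initiation sentence, extract keywords, and pick the topic whose keywords occur most often. The keyword table, the whitespace tokenization and the scoring rule are all assumptions; the text does not describe how keywords are mapped to a topic classification.

```python
from collections import Counter

# Hypothetical keyword table: topic classification -> indicative keywords.
TOPIC_KEYWORDS = {
    "greeting": {"hello", "hi", "busy"},
    "Apple mobile phone": {"apple", "phone", "price"},
}

def classify_topic(initiation, preceding, topic_keywords=TOPIC_KEYWORDS):
    """Return the best-scoring topic for the merged text, or None if nothing matches."""
    # Merge the preceding chat content and the chat initiation sentence.
    merged = " ".join(preceding + [initiation]).lower()
    # Naive keyword extraction: split on whitespace and strip punctuation.
    tokens = [t.strip("?!.,") for t in merged.split()]
    counts = Counter(t for t in tokens if t)
    # Score each topic by how often its keywords occur in the merged text.
    scores = {
        topic: sum(counts[k] for k in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

greeting = classify_topic("Busy?", [])
phone = classify_topic("What is the price of Apple?", ["I want a new phone"])
unknown = classify_topic("xyz", [])
```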

The apparatus for obtaining reply prompt content for a chat initiation sentence provided by the embodiments of the present invention acquires the topic classification to which the chat initiation sentence received by the communication terminal belongs, and performs semantic matching on the chat initiation sentence by using a user-defined topic database corresponding to the preset topic that is the same as the topic classification, to obtain a first semantic matching result. If no first semantic matching result is obtained, the apparatus collects the user network data of the communication terminal and performs semantic matching on the chat initiation sentence by using that data, to obtain a second semantic matching result. This solves the technical problem that conventional database matching cannot always find reply prompt content matching the chat initiation sentence, which leaves the chat less intelligent and the user experience poor. By making full use of the user network data of the communication terminal, the apparatus improves the accuracy of the reply prompt content, achieves a higher level of intelligence and improves the user experience.

For the working process and working principle of the apparatus of this embodiment, refer to those of the method for obtaining reply prompt content for a chat initiation sentence of this embodiment. The communication terminal device in the embodiments of the present invention may be a desktop computer, a tablet computer, a personal digital assistant, a mobile phone, a television, an on-board computer, a wearable communication device, or the like.

Industrial applicability: As the above description shows, the present invention acquires the topic classification to which the chat initiation sentence received by the communication terminal belongs, and performs semantic matching on the chat initiation sentence by using a user-defined topic database corresponding to the preset topic that is the same as the topic classification, to obtain a first semantic matching result. If no first semantic matching result is obtained, the invention collects the user network data of the communication terminal and performs semantic matching on the chat initiation sentence by using that data, to obtain a second semantic matching result. This solves the technical problem that conventional database matching cannot always find reply prompt content matching the chat initiation sentence, which leaves the chat less intelligent and the user experience poor. By making full use of the user network data of the communication terminal, the invention improves the accuracy of the reply prompt content, achieves a higher level of intelligence and improves the user experience.

The above are merely optional embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may be modified and varied in many ways. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for obtaining reply prompt content for a chat initiation sentence, comprising:
establishing a topic database corresponding to a preset topic;
acquiring the topic classification to which a chat initiation sentence received by a communication terminal belongs;
performing semantic matching on the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, obtaining a first semantic matching result, and using the first semantic matching result as the reply prompt content for the chat initiation sentence;
if the first semantic matching result is not obtained, collecting user network data of the communication terminal in a distributed cloud computing manner, performing semantic matching on the chat initiation sentence by using the user network data, obtaining a second semantic matching result, and using the second semantic matching result as the reply prompt content for the chat initiation sentence.

2. The method for obtaining reply prompt content for a chat initiation sentence according to claim 1, wherein establishing a topic database corresponding to a preset topic comprises:
setting scene entries associated with the preset topic, and scene options corresponding to the scene entries;
creating sample chat pairs whose chat subject is the preset topic, and using the sample chat pairs as the topic database corresponding to the preset topic, each sample chat pair comprising a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.

3. The method for obtaining reply prompt content for a chat initiation sentence according to claim 2, wherein acquiring the topic classification to which the chat initiation sentence received by the communication terminal belongs comprises:
acquiring the chat content preceding the chat initiation sentence, and merging the chat initiation sentence and its preceding chat content into merged text in text format;
extracting keywords of the merged text;
acquiring, according to the keywords, the topic classification to which the chat initiation sentence belongs.

4. The method for obtaining reply prompt content for a chat initiation sentence according to claim 3, wherein performing semantic matching on the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, and obtaining the first semantic matching result, comprises:
acquiring the topic database corresponding to the preset topic that is the same as the topic classification;
collecting content information of the scene entries associated with the topic classification to obtain scene information;
matching, in the topic database, the sample initiation sentence that is the same as the chat initiation sentence, and obtaining the first semantic matching result according to the scene information.

5. The method for obtaining reply prompt content for a chat initiation sentence according to any one of claims 1 to 4, wherein performing semantic matching on the chat initiation sentence by using the user network data, and obtaining the second semantic matching result, comprises:
preprocessing the user network data to obtain preprocessed text, the preprocessing comprising word segmentation, semantic disambiguation, part-of-speech tagging, stop-word removal, punctuation processing and emoticon character processing;
performing text clustering on the preprocessed text with a K-means clustering algorithm to obtain text cluster centers;
extracting keywords of each text cluster center as the clustering topic corresponding to that text cluster center;
acquiring the clustering topic closest to the topic classification to which the chat initiation sentence belongs;
matching the chat initiation sentence in the user network data corresponding to the clustering topic to obtain the second semantic matching result.

6. The method for obtaining reply prompt content for a chat initiation sentence according to claim 5, wherein the scene entries comprise:
for the communication terminals that send and receive the chat initiation sentence, a relationship entry, name entry, gender entry, age entry, instant messaging account entry, e-mail address entry, home address entry, occupation category entry, job title entry, employer entry, employer address entry, bank account entry, friend impression entry, hobbies entry, friend circle status entry, mood entry, recently followed topic entry, current communication status entry, scene image entry, time entry, holiday entry, season entry, geographic location information entry, distance entry, communication frequency entry, communication count entry, communication duration entry, and an entry for the selection mode used to initiate historical communications, wherein the selection modes comprise initiating communication from the address book, from the call history, from the short message module, and from the dial pad.

7. The method for obtaining reply prompt content for a chat initiation sentence according to claim 6, wherein collecting the content information of the scene image entry, among the scene entries, of the communication terminal that sends or receives the chat initiation sentence comprises:
capturing a scene image of the communication terminal that sends or receives the chat initiation sentence;
extracting regions of interest of scene training images with the Difference of Gaussians (DOG) operator, and computing the scale-invariant feature transform (SIFT) features of the regions of interest of the scene training images;
clustering the SIFT features of the regions of interest of the scene training images with the K-means clustering algorithm to obtain a plurality of cluster centers, and building a visual word dictionary composed of the visual words corresponding to each of the cluster centers;
extracting regions of interest of the scene image with the DOG operator, and matching, in the visual word dictionary, the visual word closest to the SIFT feature of each region of interest of the scene image;
classifying the scene image with a pre-trained support vector machine classifier according to the distribution of the visual words of the regions of interest of the scene image, and obtaining the content information of the scene image entry of the communication terminal that sends or receives the chat initiation sentence.

8. An apparatus for obtaining reply prompt content for a chat initiation sentence, comprising:
a topic database creation device, configured to establish a topic database corresponding to a preset topic;
a topic classification acquisition device, configured to acquire the topic classification to which a chat initiation sentence received by a communication terminal belongs;
a first semantic matching device, configured to perform semantic matching on the chat initiation sentence by using the topic database corresponding to the preset topic that is the same as the topic classification, obtain a first semantic matching result, and use the first semantic matching result as the reply prompt content for the chat initiation sentence;
a second semantic matching device, configured to, if the first semantic matching result is not obtained, collect user network data of the communication terminal in a distributed cloud computing manner, perform semantic matching on the chat initiation sentence by using the user network data, obtain a second semantic matching result, and use the second semantic matching result as the reply prompt content for the chat initiation sentence.

9. The apparatus for obtaining reply prompt content for a chat initiation sentence according to claim 8, wherein the topic database creation device comprises:
a setting device, configured to set scene entries associated with the preset topic, and scene options corresponding to the scene entries;
a sample chat pair creation device, configured to create sample chat pairs whose chat subject is the preset topic and to use the sample chat pairs as the topic database corresponding to the preset topic, each sample chat pair comprising a sample initiation sentence and a sample reply sentence that corresponds to the sample initiation sentence and is set according to the scene options.

10. The apparatus for obtaining reply prompt content for a chat initiation sentence according to claim 9, wherein the topic classification acquisition device comprises:
a merged text acquisition device, configured to acquire the chat content preceding the chat initiation sentence and to merge the chat initiation sentence and its preceding chat content into merged text in text format;
a keyword extraction device, configured to extract keywords of the merged text;
a topic classification determination device, configured to acquire, according to the keywords, the topic classification to which the chat initiation sentence belongs.
PCT/CN2016/103422 2015-11-04 2016-10-26 Method and apparatus for obtaining reply prompt content for chat start sentence Ceased WO2017076205A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510741085.3 2015-11-04
CN201510741085.3A CN106649405A (en) 2015-11-04 2015-11-04 Method and device for acquiring reply prompt content of chat initiating sentence

Publications (1)

Publication Number Publication Date
WO2017076205A1 true WO2017076205A1 (en) 2017-05-11

Family

ID=58661751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103422 Ceased WO2017076205A1 (en) 2015-11-04 2016-10-26 Method and apparatus for obtaining reply prompt content for chat start sentence

Country Status (2)

Country Link
CN (1) CN106649405A (en)
WO (1) WO2017076205A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871492A (en) * 2018-12-24 2019-06-11 深圳市珍爱捷云信息技术有限公司 Task processing method, device, computer equipment and computer storage medium
CN110263318A (en) * 2018-04-23 2019-09-20 腾讯科技(深圳)有限公司 Processing method, device, computer-readable medium and the electronic equipment of entity name
CN110413770A (en) * 2019-06-12 2019-11-05 阿里巴巴集团控股有限公司 Group's message is referred to the method and device of group topic
CN110633410A (en) * 2018-06-21 2019-12-31 中兴通讯股份有限公司 Information processing method and device, storage medium, and electronic device
CN110765338A (en) * 2018-07-26 2020-02-07 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111061865A (en) * 2018-10-17 2020-04-24 武汉斗鱼网络科技有限公司 Method and computing device for text mining of session scene
CN111914565A (en) * 2020-07-15 2020-11-10 海信视像科技股份有限公司 Electronic device and method for processing user sentences
CN112822093A (en) * 2021-01-07 2021-05-18 南京绛门信息科技股份有限公司 Multi-terminal message aggregation system and method based on 5G
CN113037932A (en) * 2021-02-26 2021-06-25 北京百度网讯科技有限公司 Reply message generation method and device, electronic equipment and storage medium
CN113127613A (en) * 2020-01-10 2021-07-16 北京搜狗科技发展有限公司 Chat information processing method and device
CN113139061A (en) * 2021-05-14 2021-07-20 东北大学 Case feature extraction method based on word vector clustering
CN114374572A (en) * 2021-12-30 2022-04-19 广州趣丸网络科技有限公司 Voice information processing method and device
CN114817483A (en) * 2021-01-18 2022-07-29 北京猎户星空科技有限公司 Data processing method and device, electronic equipment and storage medium
CN115002053A (en) * 2022-06-14 2022-09-02 北京百度网讯科技有限公司 Interaction method and device and electronic equipment
WO2022252951A1 (en) * 2021-06-02 2022-12-08 International Business Machines Corporation Curiosity based activation and search depth
CN115577285A (en) * 2022-09-28 2023-01-06 上海喜马拉雅科技有限公司 Training set processing method, device, electronic device and storage medium for classification
CN115934923A (en) * 2023-03-15 2023-04-07 威海海洋职业学院 E-commerce reply method and system based on big data
CN119940558A (en) * 2025-04-10 2025-05-06 天津渤海职业技术学院 AI robot dialogue control method and system based on big data search

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107453980A (en) * 2017-07-26 2017-12-08 北京小米移动软件有限公司 Problem response method and device in instant messaging
CN107623627A (en) * 2017-09-27 2018-01-23 珠海市魅族科技有限公司 Information reply method and device, terminal and readable storage medium
TWI656448B (en) * 2017-11-01 2019-04-11 中華電信股份有限公司 Topic providing apparatus and cloud file prompting method thereof
CN108121799A (en) * 2017-12-21 2018-06-05 广东欧珀移动通信有限公司 Recommendation method and device for reply sentences, storage medium and mobile terminal
CN108460159B (en) * 2018-03-29 2022-04-29 Oppo广东移动通信有限公司 Information reply method, terminal equipment and computer readable storage medium
CN110555094A (en) * 2018-03-30 2019-12-10 北京金山安全软件有限公司 information recommendation method and device, electronic equipment and storage medium
CN109242706A (en) * 2018-08-20 2019-01-18 中国平安人寿保险股份有限公司 Method, apparatus, computer equipment and the storage medium for assisting seat personnel to link up
CN109547323B (en) * 2018-10-17 2019-11-12 北京达佳互联信息技术有限公司 Information processing method, device, server, terminal and storage medium
CN109842549B (en) * 2019-03-21 2021-06-04 天津字节跳动科技有限公司 Instant messaging interaction method and device and electronic equipment
CN110532565B (en) * 2019-08-30 2022-03-25 联想(北京)有限公司 Statement processing method and device and electronic equipment
CN111263016A (en) * 2020-01-10 2020-06-09 深圳追一科技有限公司 Communication assistance method, communication assistance device, computer equipment and computer-readable storage medium
CN111914073A (en) * 2020-07-15 2020-11-10 中国联合网络通信集团有限公司 Customer service response method, device, equipment and storage medium
CN111881283B (en) * 2020-08-03 2024-10-22 海信电子科技(武汉)有限公司 Business keyword library creation method, intelligent chat guiding method and device
CN111897943A (en) * 2020-08-17 2020-11-06 腾讯科技(深圳)有限公司 Session record searching method and device, electronic equipment and storage medium
CN112905770B (en) * 2021-02-10 2024-09-13 华南师范大学 Artificial intelligence psychological health chat robot based on corpus and oriented to professional occupation
CN113535926B (en) * 2021-07-26 2023-11-10 深圳市优必选科技股份有限公司 Active dialogue method and device and voice terminal
CN113595886A (en) * 2021-07-29 2021-11-02 北京达佳互联信息技术有限公司 Instant messaging message processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1637740A (en) * 2003-11-20 2005-07-13 阿鲁策株式会社 Conversation control apparatus, and conversation control method
CN101071418A (en) * 2007-03-29 2007-11-14 腾讯科技(深圳)有限公司 Chat method and system
CN103079008A (en) * 2013-01-07 2013-05-01 北京播思软件技术有限公司 Method and system for automatically generating replying suggestion according to content of short message
CN103390047A (en) * 2013-07-18 2013-11-13 天格科技(杭州)有限公司 Chatting robot knowledge base and construction method thereof
CN104268129A (en) * 2014-08-28 2015-01-07 小米科技有限责任公司 Message reply method and message reply device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866990B (en) * 2012-08-20 2016-08-03 北京搜狗信息服务有限公司 Theme dialogue method and device

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263318B (en) * 2018-04-23 2022-10-28 腾讯科技(深圳)有限公司 Entity name processing method and device, computer readable medium and electronic equipment
CN110263318A (en) * 2018-04-23 2019-09-20 腾讯科技(深圳)有限公司 Entity name processing method and device, computer readable medium and electronic equipment
CN110633410A (en) * 2018-06-21 2019-12-31 中兴通讯股份有限公司 Information processing method and device, storage medium, and electronic device
CN110765338A (en) * 2018-07-26 2020-02-07 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111061865A (en) * 2018-10-17 2020-04-24 武汉斗鱼网络科技有限公司 Method and computing device for text mining of session scene
CN109871492A (en) * 2018-12-24 2019-06-11 深圳市珍爱捷云信息技术有限公司 Task processing method, device, computer equipment and computer storage medium
CN110413770A (en) * 2019-06-12 2019-11-05 阿里巴巴集团控股有限公司 Method and device for classifying group messages into group topics
CN110413770B (en) * 2019-06-12 2023-01-31 创新先进技术有限公司 Method and device for classifying group messages into group topics
CN113127613B (en) * 2020-01-10 2024-01-09 北京搜狗科技发展有限公司 Chat information processing method and device
CN113127613A (en) * 2020-01-10 2021-07-16 北京搜狗科技发展有限公司 Chat information processing method and device
CN111914565A (en) * 2020-07-15 2020-11-10 海信视像科技股份有限公司 Electronic device and method for processing user sentences
CN112822093A (en) * 2021-01-07 2021-05-18 南京绛门信息科技股份有限公司 Multi-terminal message aggregation system and method based on 5G
CN114817483A (en) * 2021-01-18 2022-07-29 北京猎户星空科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113037932B (en) * 2021-02-26 2022-09-23 北京百度网讯科技有限公司 Reply message generating method, apparatus, electronic device and storage medium
CN113037932A (en) * 2021-02-26 2021-06-25 北京百度网讯科技有限公司 Reply message generation method and device, electronic equipment and storage medium
CN113139061B (en) * 2021-05-14 2023-07-21 东北大学 A Case Feature Extraction Method Based on Word Vector Clustering
CN113139061A (en) * 2021-05-14 2021-07-20 东北大学 Case feature extraction method based on word vector clustering
WO2022252951A1 (en) * 2021-06-02 2022-12-08 International Business Machines Corporation Curiosity based activation and search depth
US11769501B2 (en) 2021-06-02 2023-09-26 International Business Machines Corporation Curiosity based activation and search depth
CN114374572B (en) * 2021-12-30 2023-12-01 广州趣丸网络科技有限公司 Voice information processing method and device
CN114374572A (en) * 2021-12-30 2022-04-19 广州趣丸网络科技有限公司 Voice information processing method and device
CN115002053A (en) * 2022-06-14 2022-09-02 北京百度网讯科技有限公司 Interaction method and device and electronic equipment
CN115002053B (en) * 2022-06-14 2024-02-13 北京百度网讯科技有限公司 Interactive methods, devices and electronic devices
CN115577285A (en) * 2022-09-28 2023-01-06 上海喜马拉雅科技有限公司 Training set processing method, device, electronic device and storage medium for classification
CN115934923B (en) * 2023-03-15 2023-05-05 威海海洋职业学院 E-commerce replying method and system based on big data
CN115934923A (en) * 2023-03-15 2023-04-07 威海海洋职业学院 E-commerce reply method and system based on big data
CN119940558A (en) * 2025-04-10 2025-05-06 天津渤海职业技术学院 AI robot dialogue control method and system based on big data search

Also Published As

Publication number Publication date
CN106649405A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
WO2017076205A1 (en) Method and apparatus for obtaining reply prompt content for chat start sentence
US10380160B2 (en) Dynamic language model
KR20210038860A (en) Intent recommendation method, apparatus, device and storage medium
US11848009B2 (en) Adaptive interface in a voice-activated network
CN106656732A (en) Scene information-based method and device for obtaining chat reply content
CN110209897A (en) Intelligent dialogue method, apparatus, storage medium and equipment
CN107145545B (en) Top-k area user text data recommendation method in social network based on position
US11436446B2 (en) Image analysis enhanced related item decision
CN113806588B (en) Method and device for searching videos
CN109416695A (en) Local service information is provided in automatic chatting
CN110059177A (en) Activity recommendation method and device based on user portrait
CN107153687B (en) Indexing method for social network text data
CN107862004A (en) Intelligent sorting method and device, storage medium and electronic equipment
CN113139110A (en) Regional feature processing method, device, equipment, storage medium and program product
CN101923556B (en) Method and device for searching webpages according to sentence serial numbers
JP2022007576A (en) Information processing system, information processing method, information processing program, and server
JP2023027749A (en) Method and apparatus for determining broadcasting style, equipment, and computer storage medium
CN106649410B (en) Method and device for obtaining chat reply content
US20250265278A1 (en) Map search method and apparatus, server, terminal, and storage medium
CN114036414A (en) Method and device for processing interest points, electronic equipment, medium and program product
CN108427769B (en) A method for extracting people's interest tags based on social network
CN113821739B (en) Local event detection method, apparatus, device and storage medium
CN116561352A (en) Electronic red envelope generation method, device, computer readable medium and electronic device
CN107292750B (en) Information collection method and information collection device for social network
CN120336383A (en) A hot query indexing method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 16861473; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/07/2018)

122 EP: PCT application non-entry in European phase
Ref document number: 16861473; Country of ref document: EP; Kind code of ref document: A1