
WO2023035623A1 - Answer corpus generation method based on artificial intelligence, and related device - Google Patents

Answer corpus generation method based on artificial intelligence, and related device

Info

Publication number
WO2023035623A1
WO2023035623A1 PCT/CN2022/088893 CN2022088893W WO2023035623A1 WO 2023035623 A1 WO2023035623 A1 WO 2023035623A1 CN 2022088893 W CN2022088893 W CN 2022088893W WO 2023035623 A1 WO2023035623 A1 WO 2023035623A1
Authority
WO
WIPO (PCT)
Prior art keywords
corpus
question
answer
participle
professional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/088893
Other languages
French (fr)
Chinese (zh)
Inventor
吴闻杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of WO2023035623A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to an artificial intelligence-based answer corpus generation method and related device.
  • The inventor realized that existing medical platforms offer product recommendation during consultation but generally rely on the doctor to make a verbal recommendation; the product or service is pushed only after the patient confirms an intention to purchase.
  • Although this manual promotion method converts well, it cannot be rolled out on a large scale.
  • Accurate intelligent recommendation based on big data analysis has not yet been fully applied.
  • The main purpose of this application is to solve the technical problem of low accuracy in existing intelligent product recommendation during the consultation stage.
  • The first aspect of the present application provides an artificial intelligence-based response corpus generation method, including: obtaining an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and, based on a preset linear-chain conditional random field, performing word segmentation on the inquiry corpus and the response corpus to be pushed respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments; performing professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment; performing cross question-and-answer matching on each professional inquiry word segment and each professional response word segment in turn and, according to the result of the cross question-and-answer matching, combining the professional inquiry word segments and the professional response word segments to obtain a diagnosis statement; using a preset prior medical knowledge base, matching the treatment
  • The second aspect of the present application provides an artificial intelligence-based response corpus generation device, including: a word segmentation module, configured to obtain an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and, based on a preset linear-chain conditional random field, to perform word segmentation on the inquiry corpus and the response corpus to be pushed respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments;
  • a semantic matching module, configured to perform professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment;
  • a combination module, configured to use a preset prior medical knowledge base to match the treatment product information
  • The third aspect of the present application provides an artificial intelligence-based answer corpus generation device, including a memory and at least one processor, the memory storing instructions; the at least one processor invokes the instructions in the memory so that the device executes the following artificial intelligence-based answer corpus generation method: obtaining an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and, based on a preset linear-chain conditional random field, performing word segmentation on the inquiry corpus and the response corpus to be pushed respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments; performing professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment; performing cross question-and-answer matching on each professional inquiry word segment and each professional response word segment in turn and, according to the result of the cross question-and-answer matching, combining the professional inquiry word segments
  • The fourth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the following artificial intelligence-based response corpus generation method: obtaining an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and, based on a preset linear-chain conditional random field, performing word segmentation on the inquiry corpus and the response corpus to be pushed respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments; performing professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment; performing cross question-and-answer matching on each professional inquiry word segment and each professional response word segment in turn and, according to the result of the cross question-and-answer matching, combining the professional inquiry word segments and the professional response word segments
  • The inquiry corpus and the response corpus to be pushed are converted into professional inquiry and response corpora through professional-word semantic matching. The professional inquiry and response word segments are then matched to determine the patient's condition and the treatment plan recommended by the doctor, which yields a diagnosis statement; treatment product information is matched according to the diagnosis statement and sent to the patient together with the response corpus to be pushed.
  • The recommended products and services are therefore accurate recommendations for the current chat scenario, giving users a quick ordering function for specific products and achieving accurate product recommendation during the consultation.
  • FIG. 1 is a schematic diagram of the first embodiment of the artificial intelligence-based response corpus generation method of the present application;
  • FIG. 2 is a schematic diagram of the second embodiment of the artificial intelligence-based response corpus generation method of the present application;
  • FIG. 3 is a schematic diagram of the third embodiment of the artificial intelligence-based response corpus generation method of the present application;
  • FIG. 4 is a schematic diagram of an embodiment of the artificial intelligence-based response corpus generation device of the present application;
  • FIG. 5 is a schematic diagram of another embodiment of the artificial intelligence-based response corpus generation device of the present application;
  • FIG. 6 is a schematic diagram of an embodiment of the artificial intelligence-based response corpus generation device of the present application.
  • The embodiments of the present application provide an artificial intelligence-based answer corpus generation method and related device. The inquiry corpus and the response corpus to be pushed are obtained and segmented based on a preset linear-chain conditional random field, giving inquiry word segments and response word segments; professional-word semantic matching is performed on these word segments, giving professional inquiry word segments and professional response word segments; cross question-and-answer matching is performed on the professional word segments and, according to its result, they are combined into a diagnosis statement; a preset prior medical knowledge base is used to match the treatment product information corresponding to the diagnosis statement, and the treatment product information is combined with the response corpus to be pushed to obtain, and push, a new response corpus to be pushed.
  • This application thereby enables treatment product recommendation during online consultation and makes online consultation more intelligent.
  • the first embodiment of the artificial intelligence-based answer corpus generation method in the embodiment of the present application includes:
  • the execution subject of the present application may be an artificial intelligence-based response corpus generation device, or a terminal or a server, which is not specifically limited here.
  • the embodiment of the present application is described by taking the server as an execution subject as an example.
  • The inquiry corpus is entered and sent by the patient through the consultation chat interface.
  • The response corpus to be pushed is entered and sent through the chat interface by the doctor handling the consultation after receiving the patient's inquiry corpus. For example, the patient sends "I have a skin allergy on my arm. Doctor, what should I do?", and the doctor replies to this inquiry corpus with "Have you taken any anti-allergic medicine at home?"
  • This reply is the response corpus to be pushed. It is sent to the back end rather than forwarded directly to the patient; product recommendation information must first be embedded in it by the method of this application before it is sent.
  • Product recommendation information is obtained through semantic matching of the inquiry corpus and the response corpus to be pushed, so semantic recognition must be performed on both, and the corresponding product recommendation information is matched based on the semantic recognition results.
  • A preset linear-chain conditional random field can be used to perform word segmentation on the inquiry corpus and the response corpus to be pushed, yielding the corresponding inquiry word segments and response word segments.
  • the "generative-discriminative pair" relationship performs word segmentation processing on the inquiry corpus and the response corpus.
  • The parts of speech of the inquiry word segments and the response word segments are first distinguished and tagged, so that the noun inquiry word segments and noun response word segments can be identified.
  • the answer word segments are excluded directly. Exclusion can be done by connecting to a custom or existing rule dictionary, whose rules operators can extend, or by connecting to an AI system.
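The filtering step can be pictured with a minimal sketch. It assumes that the segmenter already attaches a part-of-speech tag to every word segment and that the segments to be excluded are the non-noun ones plus anything listed in a rule dictionary (the source text is cut off at this point, so this reading is an assumption). The tag names, the function name and the dictionary contents are hypothetical.

```python
# Hypothetical tagged word segments: (word, part_of_speech).
# The tag set ("n" for noun, "v" for verb) is an assumption for illustration.
def keep_noun_segments(tagged_segments, exclusion_dictionary):
    """Keep noun segments that are not listed in the exclusion dictionary."""
    kept = []
    for word, part_of_speech in tagged_segments:
        if part_of_speech != "n":
            continue  # non-noun segments are dropped
        if word in exclusion_dictionary:
            continue  # rule-dictionary exclusion
        kept.append(word)
    return kept


inquiry_segments = [("arm", "n"), ("skin", "n"), ("allergy", "n"), ("do", "v"), ("doctor", "n")]
exclusion_dictionary = {"doctor"}  # operator-maintained rules (made up here)
noun_segments = keep_noun_segments(inquiry_segments, exclusion_dictionary)
```

A plain set keeps the rule dictionary easy for operators to extend; an AI-based exclusion step, which the text also allows, would replace the dictionary lookup.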
  • The doctor responds to the inquiry corpus entered by the patient.
  • The professional inquiry word segments and professional response word segments obtained are the key inquiry words and key response words of the corpora.
  • An inquiry may contain several symptoms or questions during the consultation, and the doctor responds to each symptom or question accordingly. It is therefore necessary to perform cross question-and-answer matching on the professional inquiry word segments and the professional response word segments.
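One way to picture cross question-and-answer matching is to pair every professional inquiry term with every professional response term and keep the pairs that belong together. The source does not say how relatedness is decided, so the lookup table, the function names and the statement format below are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical relatedness table: which response term answers which inquiry term.
RELATED_PAIRS = {
    ("skin allergy", "anti-allergic medicine"),
    ("cough", "cough syrup"),
}


def cross_match(inquiry_terms, response_terms, related_pairs=RELATED_PAIRS):
    """Try every (inquiry, response) combination in turn and keep the related ones."""
    return [
        (inquiry, response)
        for inquiry, response in product(inquiry_terms, response_terms)
        if (inquiry, response) in related_pairs
    ]


def build_diagnosis_statements(matched_pairs):
    """Join each matched pair into a simple diagnosis statement string."""
    return [f"{inquiry}: {response}" for inquiry, response in matched_pairs]


matched_pairs = cross_match(
    ["skin allergy", "cough"],
    ["anti-allergic medicine", "cough syrup"],
)
statements = build_diagnosis_statements(matched_pairs)
```

Trying the full Cartesian product is what lets an inquiry with several symptoms be aligned with a reply that addresses them in a different order.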
  • A mapping table between regular expressions of diagnosis statements and diagnosis results is configured in the prior medical knowledge base. The regular expression matching a diagnosis statement is found first, and the mapping table is then traversed with that regular expression to determine the diagnosis result mapped to the diagnosis statement. A mapping table between diagnosis results and treatment product identification information is also configured in the prior medical knowledge base.
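The two mapping tables can be sketched as follows. Every pattern, diagnosis label, product identifier and URL in this example is a placeholder invented for illustration, not data from the source; the function names are likewise hypothetical. The final helper shows, under the same assumptions, how matched product information might be combined with the response corpus to be pushed.

```python
import re

# Hypothetical table 1: regular expression of a diagnosis statement -> diagnosis result.
STATEMENT_TO_DIAGNOSIS = [
    (re.compile(r"skin allerg(?:y|ies).*anti-allergic"), "DIAGNOSIS_A"),
    (re.compile(r"cough.*syrup"), "DIAGNOSIS_B"),
]
# Hypothetical table 2: diagnosis result -> treatment product identifiers.
DIAGNOSIS_TO_PRODUCT_IDS = {"DIAGNOSIS_A": ["P001"], "DIAGNOSIS_B": ["P002"]}
# Hypothetical product records holding summary information and a recommendation link.
PRODUCT_INFO = {
    "P001": {"summary": "placeholder product 1", "link": "https://example.com/products/P001"},
    "P002": {"summary": "placeholder product 2", "link": "https://example.com/products/P002"},
}


def match_products(diagnosis_statement):
    """Return (diagnosis, product records) for the first pattern that matches."""
    for pattern, diagnosis in STATEMENT_TO_DIAGNOSIS:
        if pattern.search(diagnosis_statement):
            product_ids = DIAGNOSIS_TO_PRODUCT_IDS.get(diagnosis, [])
            return diagnosis, [PRODUCT_INFO[pid] for pid in product_ids if pid in PRODUCT_INFO]
    return None, []


def append_recommendation(response_text, products):
    """Combine the pending response with product summaries and links."""
    lines = [response_text] + [f"{p['summary']} ({p['link']})" for p in products]
    return "\n".join(lines)


diagnosis, products = match_products("skin allergy: anti-allergic medicine")
new_response = append_recommendation("Have you taken any anti-allergic medicine at home?", products)
```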
  • The inquiry corpus and the response corpus to be pushed are converted into professional inquiry and response corpora through professional-word semantic matching. The professional inquiry and response word segments are then matched to determine the patient's condition and the treatment plan recommended by the doctor, which yields a diagnosis statement; treatment product information is matched according to the diagnosis statement and pushed to the patient together with the response corpus to be pushed.
  • The recommended products and services are therefore accurate recommendations for the current chat scenario, giving users a quick ordering function for specific products and achieving accurate product recommendation during the consultation.
  • the second embodiment of the artificial intelligence-based answer corpus generation method in the embodiment of the present application includes:
  • Each character in the inquiry corpus is split out and encoded sequentially to obtain a character table.
  • The character feature vector of each character is trained using a neural network such as Word2vec; the character feature vector contains the context information of the inquiry corpus.
  • Each character feature vector represents one character.
  • The dimension of the character feature vectors can be adjusted to the size of the corpus; typical choices are 50, 100 and 200.
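As a rough illustration of the character table and the per-character feature vectors, the sketch below assigns each distinct character a sequential code and attaches a placeholder vector of a chosen dimension. The function names and the random initialisation are assumptions; in the described method the vectors would be trained, for example with Word2vec, rather than drawn at random.

```python
import random


def build_character_table(corpus):
    """Assign each distinct character a sequential integer code, in order of first appearance."""
    table = {}
    for character in corpus:
        if character not in table:
            table[character] = len(table)
    return table


def init_character_vectors(table, dimension=50, seed=0):
    """Create one placeholder vector per character (a stand-in for trained vectors)."""
    rng = random.Random(seed)
    return {
        character: [rng.uniform(-0.5, 0.5) for _ in range(dimension)]
        for character in table
    }


character_table = build_character_table("皮肤过敏皮肤")
character_vectors = init_character_vectors(character_table, dimension=50)
```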
  • c_l is the vector corresponding to the l-th letter of the pinyin of the character; L is the maximum pinyin length, a fixed value by default.
  • The maximum length of the pinyin of a character is 6, so L can be set to 6. If the length L' of a character's pinyin is less than L, the elements in rows L'+1 to L of the corresponding pinyin vector matrix are set to zero; for example, the pinyin "shi" has length 3, so rows 4 to 6 of its pinyin vector matrix are all set to zero.
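The zero-padding rule can be shown in a few lines. In this sketch each letter is represented by a one-hot vector over a to z, which is only an illustrative stand-in for the letter vectors of the method; the function names are assumptions, and pinyin containing "ü" is not handled.

```python
import string

MAX_PINYIN_LENGTH = 6  # L in the text


def letter_vector(letter):
    """One-hot vector over a-z (illustrative stand-in for a learned letter vector)."""
    vector = [0.0] * 26
    vector[string.ascii_lowercase.index(letter)] = 1.0
    return vector


def pinyin_matrix(pinyin, max_length=MAX_PINYIN_LENGTH):
    """Build an L-row matrix; rows beyond the pinyin length L' are all zero."""
    if len(pinyin) > max_length:
        raise ValueError("pinyin longer than the fixed maximum length")
    rows = [letter_vector(letter) for letter in pinyin]
    rows += [[0.0] * 26 for _ in range(max_length - len(pinyin))]
    return rows


shi_matrix = pinyin_matrix("shi")  # L' = 3, so rows 4 to 6 are zero
```

A fixed row count gives every character a matrix of the same shape, which is what allows a convolutional encoder to produce a fixed-size pinyin feature vector.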
  • Each pinyin feature vector matrix is encoded in turn by a convolutional neural network (CNN) to obtain a pinyin feature vector of fixed size.
  • The character feature vectors and pinyin feature vectors are spliced one-to-one in the order of the characters in the inquiry corpus to obtain the context information vector, which is then input into a bidirectional LSTM neural network for semantic analysis. The bidirectional LSTM comprises a forward LSTM and a backward LSTM and, through its forgetting and retention mechanisms together with backpropagation, learns the semantic features of the context information vector.
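The splicing step amounts to concatenating, position by position, the character feature vector and the pinyin feature vector of the same character. The sketch below shows only that step, with plain Python lists and a hypothetical function name; the bidirectional LSTM that consumes the result is not shown.

```python
def splice_features(character_vectors, pinyin_vectors):
    """Concatenate the two feature vectors of each character, keeping corpus order."""
    if len(character_vectors) != len(pinyin_vectors):
        raise ValueError("each character needs exactly one pinyin feature vector")
    return [
        character_vector + pinyin_vector
        for character_vector, pinyin_vector in zip(character_vectors, pinyin_vectors)
    ]


# Two characters, with 2-dimensional character vectors and 1-dimensional pinyin vectors.
context_vectors = splice_features([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]])
```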
  • P(y|z) is the probability of the label sequence y given the input z;
  • S(z) is the normalization factor, which normalizes the output to a value between 0 and 1.
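The role of the normalization factor can be checked on a toy linear-chain model. The sketch below scores a label sequence as the sum of per-position scores and label-to-label transition scores, then divides exp(score) by S(z), computed by brute force over all label sequences. The label set, the score values and the function names are assumptions; a practical implementation would use the forward algorithm and Viterbi decoding instead of enumeration.

```python
import math
from itertools import product

LABELS = ("B", "E", "S")  # assumed tags: Begin, End, Single-character word


def sequence_score(labels, emissions, transitions):
    """Sum of per-position label scores and label-to-label transition scores."""
    score = 0.0
    for position, label in enumerate(labels):
        score += emissions[position][label]
        if position > 0:
            score += transitions.get((labels[position - 1], label), 0.0)
    return score


def normalizer(emissions, transitions):
    """S(z): sum of exp(score) over every possible label sequence (brute force)."""
    return sum(
        math.exp(sequence_score(labels, emissions, transitions))
        for labels in product(LABELS, repeat=len(emissions))
    )


def probability(labels, emissions, transitions):
    """P(y | z) = exp(score(y, z)) / S(z)."""
    return math.exp(sequence_score(labels, emissions, transitions)) / normalizer(emissions, transitions)


# Made-up scores for a two-character input.
emissions = [{"B": 1.0, "E": 0.0, "S": 0.5}, {"B": 0.0, "E": 1.0, "S": 0.5}]
transitions = {("B", "E"): 1.0, ("B", "B"): -1.0}
all_sequences = list(product(LABELS, repeat=len(emissions)))
best_sequence = max(all_sequences, key=lambda labels: probability(labels, emissions, transitions))
```

Because every probability is divided by the same S(z), the values over all label sequences sum to 1, and the highest-scoring sequence is also the most probable one.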
  • Each character in the question-and-answer word segments and in each professional word of the professional word dictionary has its own particular combination of pronunciation and glyph.
  • The initial, the final, the supplementary code of the final and the tone of each character are digitally coded to obtain the four-digit code of its pronunciation;
  • the character structure, the five four-corner code digits and the stroke count of each character are coded to obtain the seven-digit code of its glyph;
  • combining the two gives the unique eleven-digit phonetic-graph code of each character; the phonetic-graph codes comprise the first phonetic-graph code and the second phonetic-graph code.
  • F0 to F9, G0 to G9, H0 to H9, J0 to J9 and K0 to K9 are the coding fields for the ten stroke classes at the upper left corner, the upper right corner, the lower left corner, the lower right corner and the attached number, respectively;
  • Li denotes the stroke count, where i is a positive integer;
  • the glyph coding information of the character "花" is E2F4G4H2J1K4L7, so the complete coding information of the common character "花" (flower) is A11B13C13D1E2F4G4H2J1K4L7.
  • The phonetic-graph code has eleven types of coding field. For each type of field that differs between the first phonetic-graph code and the second phonetic-graph code, the edit distance increases by 1; otherwise it is unchanged. If every field type agrees, the two words have the highest similarity and their edit distance is 0; if every field type differs, their similarity is lowest and their edit distance is 11. The edit distance between the two words therefore lies between 0 and 11.
  • The edit distance quantifies the similarity between each word in the question-and-answer word segments and each word in the professional dictionary: the smaller the edit distance, the higher the similarity. The user can therefore set a preset edit distance threshold to filter the professional words used for cross combination.
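The field-by-field distance and the threshold filter can be sketched directly from the example code A11B13C13D1E2F4G4H2J1K4L7. The parsing rule (a field letter followed by digits, with the letter I unused, as in that example), the function names and the two comparison codes below are assumptions made for illustration.

```python
import re

FIELD_ORDER = "ABCDEFGHJKL"  # eleven field types; the source example does not use the letter I


def parse_code(code):
    """Split a code such as 'A11B13...L7' into a {field letter: value} mapping."""
    fields = dict(re.findall(r"([A-HJ-L])(\d+)", code))
    if len(fields) != len(FIELD_ORDER):
        raise ValueError("expected exactly eleven coding fields")
    return fields


def edit_distance(first_code, second_code):
    """Add 1 for every field type whose value differs between the two codes."""
    first, second = parse_code(first_code), parse_code(second_code)
    return sum(1 for field in FIELD_ORDER if first[field] != second[field])


def select_professional_words(word_code, professional_codes, threshold):
    """Keep professional words whose distance to the word is below the threshold."""
    return [
        word
        for word, code in professional_codes.items()
        if edit_distance(word_code, code) < threshold
    ]


HUA_CODE = "A11B13C13D1E2F4G4H2J1K4L7"  # code of "花" given in the text
professional_codes = {                    # hypothetical codes, not from the source
    "term_near": "A11B13C13D4E2F4G4H2J1K4L8",
    "term_far": "A1B1C1D2E1F1G1H1J2K1L1",
}
selected = select_professional_words(HUA_CODE, professional_codes, threshold=3)
```

Because each field contributes at most 1, this distance is a field-wise mismatch count bounded by 11, which matches the 0 to 11 range stated above.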
  • A conventional semantic recognition model performs semantic analysis on the question-and-answer phrase and on the professional phrases, giving the first semantic analysis result and the second semantic analysis results respectively. If comparison shows only a small semantic deviation between the two, the professional word substituted into the corresponding professional phrase is determined to be a synonym of the corresponding question-and-answer word segment and is used as the professional question-and-answer word segment for that word segment.
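The comparison step can be pictured by treating each semantic analysis result as a vector and measuring deviation with cosine similarity. The source names no particular model or metric, so the vector representation, the similarity threshold and the function names here are assumptions.

```python
import math


def cosine_similarity(first, second):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(first, second))
    norm = math.sqrt(sum(a * a for a in first)) * math.sqrt(sum(b * b for b in second))
    return dot / norm if norm else 0.0


def pick_synonym(original_result, professional_results, min_similarity=0.9):
    """Return the professional word whose phrase deviates least, if it is close enough.

    professional_results maps each substituted professional word to the semantic
    vector of the phrase containing it (hypothetical model output).
    """
    best_word, best_score = None, min_similarity
    for word, result in professional_results.items():
        score = cosine_similarity(original_result, result)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


synonym = pick_synonym([1.0, 0.0], {"term_a": [0.99, 0.1], "term_b": [0.0, 1.0]})
```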
  • The phonetic-graph codes of the question-and-answer word segments and of the professional words are built from the preset common word dictionary and professional word dictionary; the synonyms of each question-and-answer word segment are determined by matching the phonetic-graph codes and are substituted in, giving the corresponding professional inquiry word segments and professional response word segments, which makes subsequent product matching more accurate.
  • the third embodiment of the artificial intelligence-based answer corpus generation method in the embodiment of the present application includes:
  • the modular encrypted corpus includes a first modular encrypted corpus corresponding to the inquiry corpus and a second modular encrypted corpus corresponding to the response corpus to be pushed;
  • the type of the plaintext m of the inquiry corpus and the response corpus to be pushed is T;
  • the set of T is {integer, real number, character, date, Boolean, etc.}
  • a medical inquiry corpus that is, binary, decimal, hexadeci
  • c represents the ciphertext
  • m represents the consultation
  • s represents the base used in encryption
  • r represents a random number
  • p is an encryption key
  • x0 is an intermediate variable, which is equal to the encryption key p and another encryption key
  • The corresponding ciphertext original code, ciphertext inverse code and ciphertext complement code can be calculated from the encrypted corpus and are used for encrypted computation. For addition on the encrypted corpus, the ciphertext combinations are summed directly in place, without using the ciphertext original code, ciphertext inverse code or ciphertext complement code.
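The encryption formula itself does not survive in this text, so the following is only a sketch of one integer scheme that is consistent with the named symbols (ciphertext c, plaintext m, base s, random number r, key p, intermediate variable x0). It assumes c = (m + s·r + p·k) mod x0 with x0 = p·q, where q is a second key and k a random multiplier; both of those, and all parameter values, are assumptions and toy-sized, not secure. The point is to show why two ciphertexts can be added directly.

```python
import secrets

S = 16            # base s: plaintext values must be smaller than S
P = 1_000_003     # key p (toy value)
Q = 999_983       # assumed second key
X0 = P * Q        # assumed: x0 is the product of the two keys
MAX_R = 1_000     # bound on r so that m + S * r always stays below P


def encrypt(m):
    """c = (m + s*r + p*k) mod x0, with fresh random r and k for every call."""
    if not 0 <= m < S:
        raise ValueError("plaintext must satisfy 0 <= m < S")
    r = secrets.randbelow(MAX_R)
    k = secrets.randbelow(Q)
    return (m + S * r + P * k) % X0


def decrypt(c):
    """Strip the multiple of p first, then the multiple of s."""
    return (c % P) % S


def add_ciphertexts(first, second):
    """Add two ciphertexts directly; the sum decrypts to (m1 + m2) mod S."""
    return (first + second) % X0


encrypted_sum = add_ciphertexts(encrypt(5), encrypt(7))
```

Decryption of the sum is correct only while the accumulated value m1 + m2 + s·(r1 + r2) stays below p, which is why r is bounded in this sketch.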
  • the total length of the storage format is 32, 64 or 80 bits and comprises a sign bit, integer bits and fractional bits, and the binary plaintext is extended according to this storage format; encryption is performed on the extended binary plaintext, and the encryption results are combined to obtain the ciphertexts serving as the dividend and the divisor respectively; the initial value of the fractional-bit counter count is set to the storage format length minus L, where L is the length of the integer bits in the storage format; it is judged whether the ciphertext of the dividend is greater than the ciphertext of the divisor; if so, the ciphertext of the dividend is added to the complement code of the encrypted corpus, the remainder obtained becomes the new dividend, and the ciphertext of 1 is added at the integer position, which gives the ciphertext quotient; otherwise, it is judged whether the ciphertext of the remainder is all zero or whether the fractional-bit counter count is greater than the total length of the storage format,
  • the patient's personal privacy information can be protected, and the patient's consultation experience can be improved.
  • The artificial intelligence-based response corpus generation method in the embodiment of the present application is described above; the artificial intelligence-based response corpus generation device in the embodiment of the present application is described below.
  • One embodiment of the artificial intelligence-based response corpus generation device in the embodiment of the present application includes:
  • a word segmentation module 401, configured to obtain an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and to perform word segmentation on the inquiry corpus and the response corpus to be pushed respectively based on a preset linear-chain conditional random field, to obtain a plurality of inquiry word segments and response word segments;
  • a semantic matching module 402, configured to perform professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment;
  • a question-and-answer matching module 403, configured to perform cross question-and-answer matching on each professional inquiry word segment and each professional response word segment in turn, and to combine the professional inquiry word segments and the professional response word segments according to the result of the cross question-and-answer matching to obtain a diagnosis statement;
  • a combination module 404, configured to match the treatment product information corresponding to the diagnosis statement using a preset prior medical knowledge base, and to combine the treatment product information with the response corpus to be pushed to obtain and push a new response corpus to be pushed.
  • Referring to FIG. 5, another embodiment of the artificial intelligence-based answer corpus generation device in the embodiment of the present application includes:
  • a word segmentation module 401, configured to obtain an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and to perform word segmentation on the inquiry corpus and the response corpus to be pushed respectively based on a preset linear-chain conditional random field, to obtain a plurality of inquiry word segments and response word segments;
  • a semantic matching module 402, configured to perform professional-word semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segment corresponding to each inquiry word segment and the professional response word segment corresponding to each response word segment;
  • a question-and-answer matching module 403, configured to perform cross question-and-answer matching on each professional inquiry word segment and each professional response word segment in turn, and to combine the professional inquiry word segments and the professional response word segments according to the result of the cross question-and-answer matching to obtain a diagnosis statement;
  • a combination module 404, configured to match the treatment product information corresponding to the diagnosis statement using a preset prior medical knowledge base, and to combine the treatment product information with the response corpus to be pushed to obtain and push a new response corpus to be pushed.
  • the word segmentation module 401 includes:
  • an extraction unit 4011, configured to extract character feature vectors and corresponding pinyin feature vectors of the question-and-answer corpus, wherein the question-and-answer corpus includes the inquiry corpus and the response corpus to be pushed;
  • a splicing unit 4012, configured to splice the character feature vectors and the corresponding pinyin feature vectors to obtain context information vectors, and to perform semantic analysis on the context information vectors to obtain semantic features;
  • a decoding unit 4013, configured to label the semantic features using a preset linear-chain conditional random field to obtain a word segmentation tag sequence, and to decode the word segmentation tag sequence to obtain a plurality of question-and-answer word segments, wherein the question-and-answer word segments include inquiry word segments and response word segments.
  • the semantic matching module 402 includes:
  • a construction unit 4021, configured to construct the first phonetic-graph code of each question-and-answer word segment from the preset common word dictionary and the second phonetic-graph code of each professional word in the preset professional word dictionary, and to calculate the edit distance between the first phonetic-graph code and the second phonetic-graph code;
  • a combination unit 4022, configured to combine the question-and-answer word segments corresponding to the first phonetic-graph codes whose edit distance is less than the preset edit distance threshold into a question-and-answer phrase, and to select the professional words corresponding to the second phonetic-graph codes whose edit distance is less than the edit distance threshold;
  • a replacement unit 4023, configured to replace the corresponding question-and-answer word segment in the question-and-answer phrase with each selected professional word in turn, to obtain a plurality of professional phrases corresponding to the question-and-answer phrase;
  • a semantic analysis unit 4024, configured to perform semantic analysis on the question-and-answer phrase to obtain a first semantic analysis result, and to perform semantic analysis on each professional phrase to obtain a plurality of second semantic analysis results;
  • a comparison unit 4025, configured to compare the first semantic analysis result with each second semantic analysis result, to select, according to the comparison results, the synonym of each question-and-answer word segment in the question-and-answer phrase from the plurality of professional phrases, and to use the selected synonym as the professional question-and-answer word segment corresponding to that word segment, wherein the professional question-and-answer word segments include professional inquiry word segments and professional response word segments.
  • comparison unit 4025 is also used for:
  • The combination module 404 includes:
  • a traversal unit 4041, configured to perform a hierarchical traversal of the preset prior medical knowledge base using the diagnostic statement, and determine the diagnosis result corresponding to the diagnostic statement according to the result of the hierarchical traversal;
  • a screening unit 4042, configured to select, from the prior medical knowledge base, the treatment product identification information matching the diagnosis result, and obtain the treatment product information mapped to the treatment product identification information, wherein the treatment product information includes recommendation links and summary information for treatment products.
  • the artificial intelligence-based answer corpus generation device also includes an encryption module 405, which is used for:
  • the inverse code of the ciphertext, and the complementary code of the ciphertext perform a modular operation on the encrypted corpus to obtain a modular encrypted corpus, wherein the modular encrypted corpus includes the corresponding The first modular encrypted corpus and the second encrypted corpus corresponding to the response corpus to be pushed;
  • the first encrypted corpus is used as a new inquiry corpus, and the second encrypted corpus is used as a new response corpus to be pushed.
  • In the embodiments of the present application, phonetic-graph codes of the question-and-answer segmented words and of the professional words are constructed from the preset common word dictionary and professional word dictionary; the synonyms of each question-and-answer segmented word are determined by matching the phonetic-graph codes and substituted in, yielding the corresponding professional segmented inquiry words and professional segmented response words, which makes subsequent product matching more accurate. In addition, the inquiry corpus and the response corpus to be pushed are encrypted, and data processing such as product recommendation is performed on the ciphertext, which better protects the patient's private information and improves the patient's consultation experience.
  • Figures 4 and 5 above describe in detail the artificial intelligence-based response corpus generation apparatus in the embodiments of the present application from the perspective of modular functional entities; the following describes in detail the artificial intelligence-based response corpus generation device in the embodiments of the present application from the perspective of hardware processing.
  • FIG. 6 is a schematic structural diagram of an artificial intelligence-based response corpus generation device provided by an embodiment of the present application.
  • The artificial intelligence-based response corpus generation device 600 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 610, a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632.
  • The memory 620 and the storage medium 630 may provide temporary storage or persistent storage.
  • The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the artificial intelligence-based response corpus generation device 600.
  • The processor 610 may be configured to communicate with the storage medium 630 and execute, on the artificial intelligence-based response corpus generation device 600, the series of instruction operations in the storage medium 630.
  • The artificial intelligence-based response corpus generation device 600 may also include one or more power sources 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
  • The present application also provides an artificial intelligence-based response corpus generation device. The device is a computer device that includes a memory and a processor; computer-readable instructions are stored in the memory and, when executed by the processor, cause the processor to perform the steps of the method described in the above embodiments.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium may also be a volatile computer-readable storage medium.
  • Instructions are stored in the computer-readable storage medium, and when the instructions are run on the computer, the computer is made to execute the steps of the method for generating answer corpus based on artificial intelligence as follows: obtaining the question corpus and the information corresponding to the question corpus The response corpus to be pushed, and based on the preset linear chain conditional random field, respectively perform word segmentation processing on the inquiry corpus and the response corpus to be pushed, correspondingly obtain a plurality of question segmentation words and a plurality of response word segmentation; The questioning participle and the response participle carry out professional word semantic matching, correspondingly obtain the questioning specialty participle corresponding to the described questioning participle and the answering specialty participle corresponding to the response participle; According to the results of the cross-ques
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the field of artificial intelligence. Disclosed are an answer corpus generation method based on artificial intelligence, and a related device. The method comprises: acquiring an inquiry corpus and an answer corpus to be pushed, and performing word segmentation processing on the basis of a preset linear-chain conditional random field, so as to correspondingly obtain segmented inquiry words and segmented answer words (101); performing professional word semantic matching on the segmented inquiry words and the segmented answer words, so as to correspondingly obtain professional segmented inquiry words and professional segmented answer words (102); performing cross question-answer matching on the professional segmented inquiry words and the professional segmented answer words, and combining the professional segmented inquiry words and the professional segmented answer words according to a cross question-answer matching result, so as to obtain a diagnosis statement (103); and matching, by means of using a preset prior medical knowledge base, therapy product information which corresponds to the diagnosis statement, combining the therapy product information and said answer corpus, so as to obtain a new answer corpus to be pushed, and pushing said new answer corpus (104). By means of the present application, therapy product recommendation during an online inquiry process is realized, thereby improving the degree of intelligence of an online inquiry.

Description

Artificial intelligence-based response corpus generation method and related device

This application claims priority to Chinese patent application No. 202111055021.X, entitled "Artificial intelligence-based response corpus generation method and related device", filed with the China Patent Office on September 9, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence, and in particular to an artificial intelligence-based response corpus generation method and related device.

Background

With the development of computer technology, product recommendation services have evolved from the earliest ad-slot promotion and paid ranking to today's advertising recommendation algorithms. Many video websites are now experimenting with product recommendations tied to video elements, and various AI recommendation algorithms for online retailers also exist. All of these are evolving in two directions: technically precise recommendation and non-intrusive presentation, in pursuit of higher recommendation conversion rates and a better user service experience.

The inventor realized that the consultation stage of existing medical platforms generally includes a product recommendation function, which usually relies on the doctor making a verbal recommendation; the product or service is pushed only after the patient confirms an intention to purchase. Although this labor-dependent approach converts well, it cannot scale, and in an era of rapid big-data development it does not make full use of precise intelligent recommendation based on big-data analysis.

Summary of the invention

The main purpose of the present application is to solve the technical problem that existing intelligent product recommendation based on the consultation stage has low accuracy.

A first aspect of the present application provides an artificial intelligence-based response corpus generation method, comprising: acquiring an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and performing word segmentation on the inquiry corpus and the response corpus to be pushed, respectively, based on a preset linear-chain conditional random field, to obtain a plurality of segmented inquiry words and a plurality of segmented response words; performing professional-word semantic matching on the segmented inquiry words and the segmented response words, respectively, to obtain the professional segmented inquiry words corresponding to the segmented inquiry words and the professional segmented response words corresponding to the segmented response words; performing cross question-answer matching on each professional segmented inquiry word and each professional segmented response word in turn, and combining the professional segmented inquiry words and the professional segmented response words according to the result of the cross question-answer matching to obtain a diagnostic statement; and matching, using a preset prior medical knowledge base, the treatment product information corresponding to the diagnostic statement, combining the treatment product information with the response corpus to be pushed to obtain a new response corpus to be pushed, and pushing it.

A second aspect of the present application provides an artificial intelligence-based response corpus generation apparatus, comprising: a word segmentation module, configured to acquire an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and perform word segmentation on the inquiry corpus and the response corpus to be pushed, respectively, based on a preset linear-chain conditional random field, to obtain a plurality of segmented inquiry words and a plurality of segmented response words; a semantic matching module, configured to perform professional-word semantic matching on the segmented inquiry words and the segmented response words, respectively, to obtain the professional segmented inquiry words corresponding to the segmented inquiry words and the professional segmented response words corresponding to the segmented response words; a question-answer matching module, configured to perform cross question-answer matching on each professional segmented inquiry word and each professional segmented response word in turn, and combine the professional segmented inquiry words and the professional segmented response words according to the result of the cross question-answer matching to obtain a diagnostic statement; and a combination module, configured to match, using a preset prior medical knowledge base, the treatment product information corresponding to the diagnostic statement, combine the treatment product information with the response corpus to be pushed to obtain a new response corpus to be pushed, and push it.

A third aspect of the present application provides an artificial intelligence-based response corpus generation device, comprising a memory and at least one processor, the memory storing instructions; the at least one processor invokes the instructions in the memory so that the artificial intelligence-based response corpus generation device executes the following artificial intelligence-based response corpus generation method: acquiring an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and performing word segmentation on the inquiry corpus and the response corpus to be pushed, respectively, based on a preset linear-chain conditional random field, to obtain a plurality of segmented inquiry words and a plurality of segmented response words; performing professional-word semantic matching on the segmented inquiry words and the segmented response words, respectively, to obtain the professional segmented inquiry words corresponding to the segmented inquiry words and the professional segmented response words corresponding to the segmented response words; performing cross question-answer matching on each professional segmented inquiry word and each professional segmented response word in turn, and combining the professional segmented inquiry words and the professional segmented response words according to the result of the cross question-answer matching to obtain a diagnostic statement; and matching, using a preset prior medical knowledge base, the treatment product information corresponding to the diagnostic statement, combining the treatment product information with the response corpus to be pushed to obtain a new response corpus to be pushed, and pushing it.

A fourth aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the following artificial intelligence-based response corpus generation method: acquiring an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and performing word segmentation on the inquiry corpus and the response corpus to be pushed, respectively, based on a preset linear-chain conditional random field, to obtain a plurality of segmented inquiry words and a plurality of segmented response words; performing professional-word semantic matching on the segmented inquiry words and the segmented response words, respectively, to obtain the professional segmented inquiry words corresponding to the segmented inquiry words and the professional segmented response words corresponding to the segmented response words; performing cross question-answer matching on each professional segmented inquiry word and each professional segmented response word in turn, and combining the professional segmented inquiry words and the professional segmented response words according to the result of the cross question-answer matching to obtain a diagnostic statement; and matching, using a preset prior medical knowledge base, the treatment product information corresponding to the diagnostic statement, combining the treatment product information with the response corpus to be pushed to obtain a new response corpus to be pushed, and pushing it.

In the technical solution provided by the present application, the inquiry corpus entered by the patient and the doctor's reply to the patient (the response corpus to be pushed) are acquired; professional-word semantic matching converts the inquiry corpus and the response corpus to be pushed into professional inquiry and response corpora; and the professional segmented words of the inquiry and the response are matched in order to determine the patient's condition and the treatment plan recommended by the doctor, yielding a diagnostic statement. Treatment product information is then matched according to the diagnostic statement and pushed to the patient together with the response corpus to be pushed. This provides a smooth product recommendation function within the consultation stage. The recommended products and services are targeted precisely at the current chat scenario, users are offered a quick-order function for specific products, and precise product recommendation during the consultation is achieved.

Description of the drawings

FIG. 1 is a schematic diagram of a first embodiment of the artificial intelligence-based response corpus generation method of the present application;

FIG. 2 is a schematic diagram of a second embodiment of the artificial intelligence-based response corpus generation method of the present application;

FIG. 3 is a schematic diagram of a third embodiment of the artificial intelligence-based response corpus generation method of the present application;

FIG. 4 is a schematic diagram of one embodiment of the artificial intelligence-based response corpus generation apparatus of the present application;

FIG. 5 is a schematic diagram of another embodiment of the artificial intelligence-based response corpus generation apparatus of the present application;

FIG. 6 is a schematic diagram of one embodiment of the artificial intelligence-based response corpus generation device of the present application.

Detailed description

The embodiments of the present application provide an artificial intelligence-based response corpus generation method and related device. An inquiry corpus and a response corpus to be pushed are acquired and segmented based on a preset linear-chain conditional random field to obtain segmented inquiry words and segmented response words; professional-word semantic matching is performed on the segmented inquiry words and segmented response words to obtain professional segmented inquiry words and professional segmented response words; cross question-answer matching is performed on the professional segmented inquiry words and professional segmented response words, which are combined according to the matching result to obtain a diagnostic statement; and a preset prior medical knowledge base is used to match the treatment product information corresponding to the diagnostic statement, which is combined with the response corpus to be pushed to obtain a new response corpus to be pushed, which is then pushed. The present application enables treatment product recommendation during online consultation and makes online consultation more intelligent.

The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so described are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.

For ease of understanding, the specific flow of the embodiments of the present application is described below. Referring to FIG. 1, a first embodiment of the artificial intelligence-based response corpus generation method in the embodiments of the present application includes:

101. Acquire an inquiry corpus and a response corpus to be pushed that corresponds to the inquiry corpus, and perform word segmentation on the inquiry corpus and the response corpus to be pushed, respectively, based on a preset linear-chain conditional random field, to obtain a plurality of segmented inquiry words and a plurality of segmented response words;

It can be understood that the executing entity of the present application may be an artificial intelligence-based response corpus generation apparatus, or a terminal or a server, which is not specifically limited here. The embodiments of the present application are described taking a server as the executing entity.

In this embodiment, the inquiry corpus is entered and sent by the patient through the consultation chat interface, and the response corpus to be pushed is entered and sent through the chat interface by the attending doctor after receiving the patient's inquiry corpus. For example, the patient sends the inquiry corpus "The skin on my arm is allergic. Doctor, what should I do?", and the doctor replies with the response corpus to be pushed "Have you taken any anti-allergy medicine at home?". The response corpus to be pushed is sent to the back end rather than forwarded directly to the patient; product recommendation information must first be embedded in it by the method of the present application before it is sent.

In addition, product recommendation information is obtained by matching against the semantics of the inquiry corpus and the response corpus to be pushed, so semantic recognition must be performed on both corpora and the corresponding product recommendation information matched according to the recognition result. Here, a preset linear-chain conditional random field, in combination with a hidden Markov model, may first be used to segment the inquiry corpus and the response corpus to be pushed into multiple segmented inquiry words and segmented response words; the two models stand in a "generative-discriminative pair" relationship in performing the segmentation.

It is generally assumed that the segmented words X of the inquiry corpus and the response corpus to be pushed and the known parts of speech Y have the same structure, namely X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn). Each element of Y is connected to X, and y1 through yn are connected in sequence; the resulting structure constitutes a linear-chain conditional random field. The linear-chain conditional random field is trained until the weights are optimal, after which the segmented inquiry words and segmented response words can be obtained.
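
The decoding side of a linear-chain model can be sketched as follows. This is a minimal illustration, not the application's implementation: the B/M/E/S tagging scheme, the toy emission scores, and the names `viterbi`, `tags_to_words`, and `TRANSITIONS` are all assumptions, and the scores stand in for weights that a trained conditional random field would supply.

```python
from typing import Dict, List

# Assumed tagging scheme: Begin / Middle / End of a word, Single-character word.
TAGS = ["B", "M", "E", "S"]
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
# Toy transition scores: legal moves cost nothing, illegal moves are ruled out.
TRANSITIONS = {p: {c: (0.0 if (p, c) in LEGAL else -1e9) for c in TAGS} for p in TAGS}


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence for per-character emission scores."""
    if not emissions:
        return []
    # best[t][tag] is the best score of any path ending in `tag` at position t.
    best = [{tag: emissions[0].get(tag, 0.0) for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t].get(cur, 0.0)
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def tags_to_words(text: str, tags: List[str]) -> List[str]:
    """Cut `text` after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep an unfinished trailing word rather than dropping it
        words.append(current)
    return words
```

Given one emission dictionary per character, `tags_to_words(text, viterbi(emissions, TRANSITIONS))` yields the segmented words; in a real system the emission and transition scores would come from the trained model rather than being set by hand.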

102. Perform professional-word semantic matching on the segmented inquiry words and the segmented response words, respectively, to obtain the professional segmented inquiry words corresponding to the segmented inquiry words and the professional segmented response words corresponding to the segmented response words;

In this embodiment, the parts of speech of the segmented inquiry words and segmented response words are first distinguished and labeled, so that the noun segmented words can be identified; segmented words of other parts of speech are excluded directly. The exclusion can be performed against a custom or existing rule-based lexicon, with operations staff completing the rules, or by connecting to an AI system.

Further, professional-word semantic matching is performed on the retained segmented inquiry words and segmented response words. This can use an existing rule-based lexicon of synonyms and near-synonyms, with operations staff completing the rules, or an AI synonym and near-synonym function for AI-based conversion. The ultimate goal is to convert colloquial or non-uniform segmented inquiry words and segmented response words into uniform ones.
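
The rule-based variant of this step can be sketched as a part-of-speech filter followed by a lexicon lookup. The sketch below is illustrative only: the `"n"` noun label, the `SYNONYMS` lexicon entries, and the function name `normalize_terms` are assumptions introduced here, not part of the application.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym lexicon: colloquial term -> standardized professional term.
SYNONYMS: Dict[str, str] = {
    "tummy ache": "abdominal pain",
    "high blood pressure": "hypertension",
}


def normalize_terms(tagged: List[Tuple[str, str]], lexicon: Dict[str, str]) -> List[str]:
    """Keep noun tokens only, then map each to its standardized form when known."""
    nouns = [word for word, pos in tagged if pos == "n"]  # "n" is an assumed noun label
    # Terms missing from the lexicon pass through unchanged.
    return [lexicon.get(word, word) for word in nouns]
```

Terms absent from the lexicon are passed through unchanged, which corresponds to the case where operations staff have not yet added a rule for them.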

103、依次对各问诊专业分词和各应答专业分词进行交叉问答匹配,并根据交叉问答匹配的结果,对问诊专业分词和应答专业分词进行组合,得到诊断语句;103. Carry out cross question-and-answer matching for each professional participle of the inquiry and each participle of the professional response in turn, and according to the result of the cross-question-answer matching, combine the participles of the professional participle of the inquiry and the participle of the professional response to obtain a diagnostic sentence;

本实施例中,医生针对患者输入的问诊语料进行应答语料的答复,在进行分词处理和语义识别后,得到问诊专业分词和应答专业分词,为问诊语料中的关键问诊词语和关键应答词语,其中,在问诊过程中可能包含多个症状或者问题,则医生也会针对不同的症状或者问题进行相对应的回复,故此处需要对问诊专业分词和应答专业分词进行交叉问答匹配。In this embodiment, the doctor responds to the question corpus input by the patient. After word segmentation processing and semantic recognition, the doctor obtains the professional question and response word segmentation, which are the key question words and key words in the question corpus. Response words, which may contain multiple symptoms or questions during the consultation process, and the doctor will also respond to different symptoms or questions accordingly. Therefore, it is necessary to perform cross-question and answer matching on the professional participle of the consultation and the professional participle of the response. .

Specifically, each professional inquiry word segment is first given a distributed representation, yielding an inquiry word-vector sequence, together with the preset reference word-vector sequence bound to each professional response word segment. A deep learning model then evaluates the semantic similarity between the inquiry word-vector sequence and the reference word-vector sequence. Next, a preset formula computes the text similarity between each professional inquiry word segment and each professional response word segment. The semantic similarity and the text similarity are then combined into an overall similarity. Finally, the professional response word segment matching each professional inquiry word segment is determined from the overall similarity, and the matched segments are combined to obtain the diagnostic sentence.
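The matching step above can be sketched as follows. This is a minimal illustration, not the method of the filing: the text does not give the deep learning model, the preset text-similarity formula, or how the two scores are combined, so cosine similarity, character-set overlap, the weight `alpha`, and all function names here are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (stand-in for the semantic model)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def char_overlap(a, b):
    """Jaccard overlap of character sets (stand-in for the preset text-similarity formula)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def best_matches(inquiry, responses, alpha=0.5):
    """Map each inquiry term to the response term with the highest combined score.

    inquiry / responses: dict of term -> word vector.
    alpha is an assumed weight between semantic and text similarity.
    """
    matches = {}
    for q_term, q_vec in inquiry.items():
        scored = [
            (alpha * cosine(q_vec, r_vec) + (1 - alpha) * char_overlap(q_term, r_term), r_term)
            for r_term, r_vec in responses.items()
        ]
        matches[q_term] = max(scored)[1]
    return matches
```

Each matched pair would then be joined into one diagnostic sentence; the joining format is likewise left open by the text.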

104. Use a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic sentence, combine the treatment product information with the to-be-pushed response corpus, and obtain and push a new to-be-pushed response corpus.

In this embodiment, the prior medical knowledge base holds a mapping table between regular expressions for diagnostic sentences and diagnosis results. The regular expression corresponding to a diagnostic sentence is found first, and the mapping table is then traversed with that regular expression to determine the diagnosis result mapped to the diagnostic sentence. The knowledge base also holds a mapping table between diagnosis results and treatment product identification information; traversing this table with the diagnosis result yields the specific treatment product identification information, including the recommendation link and summary information of the treatment product. The summary information may include the brand name, instructions for use, units sold, and positive-review rate. The procedure is as follows:

(1) Perform a hierarchical traversal of the preset prior medical knowledge base using the diagnostic sentence, and determine the diagnosis result corresponding to the diagnostic sentence according to the traversal result;

(2) Select, from the prior knowledge base, the treatment product identification information that matches the diagnosis result, and obtain the treatment product information mapped to it, where the treatment product information includes the recommendation link and summary information of the treatment product.
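Steps (1) and (2) can be sketched as two table lookups. This is a simplified illustration under stated assumptions: the tables, patterns, product identifiers, and URLs below are all hypothetical, and a flat scan over the rule list stands in for the hierarchical traversal the text mentions.

```python
import re

# Hypothetical knowledge-base tables; every entry is illustrative only.
DIAGNOSIS_RULES = [
    (re.compile(r"headache.*rest"), "tension headache"),
    (re.compile(r"cough.*fluids"), "common cold"),
]
PRODUCT_IDS = {"tension headache": ["P001"], "common cold": ["P002"]}
PRODUCT_INFO = {
    "P001": {"link": "https://example.com/p/001", "summary": "Brand A"},
    "P002": {"link": "https://example.com/p/002", "summary": "Brand B"},
}


def match_products(diagnostic_sentence):
    """Return (diagnosis, product info list) for the first rule that matches."""
    for pattern, diagnosis in DIAGNOSIS_RULES:
        if pattern.search(diagnostic_sentence):
            ids = PRODUCT_IDS.get(diagnosis, [])
            return diagnosis, [PRODUCT_INFO[i] for i in ids if i in PRODUCT_INFO]
    return None, []


def build_response(response, products):
    """Append one line per product to the doctor's to-be-pushed response."""
    lines = [response] + [f"{p['summary']} ({p['link']})" for p in products]
    return "\n".join(lines)
```

When no rule matches, the original response is pushed unchanged, which keeps the recommendation step optional in this sketch.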

In this embodiment of the present application, the inquiry corpus entered by the patient and the doctor's to-be-pushed response corpus are obtained and converted, through professional-term semantic matching, into professional inquiry and response terms. The professional inquiry and response word segments are matched against each other to determine the patient's condition and the treatment the doctor recommends, which yields a diagnostic sentence. Treatment product information is matched according to the diagnostic sentence and pushed to the patient together with the to-be-pushed response corpus. This provides a smooth product recommendation step within the consultation: the recommended products and services are targeted at the current chat, the user gets a quick way to order a specific product, and products are recommended precisely during the consultation.

Referring to FIG. 2, a second embodiment of the artificial-intelligence-based answer corpus generation method in the embodiments of the present application includes:

201. Obtain an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and extract the character feature vectors and the corresponding pinyin feature vectors of the question-answer corpus, where the question-answer corpus includes the inquiry corpus and the to-be-pushed response corpus;

202. Concatenate the character feature vectors with the corresponding pinyin feature vectors to obtain context information vectors, and perform semantic analysis on the context information vectors to obtain semantic features;

203. Label the semantic features using a preset linear-chain conditional random field to obtain a word-segmentation label sequence, and decode the label sequence to obtain multiple question-answer word segments, where the question-answer word segments include inquiry word segments and response word segments;

In this embodiment, the inquiry corpus is split into individual characters, which are encoded in order to produce a character table. Using the character table, a neural network such as Word2vec is trained to produce a character feature vector for each character. The character feature vectors carry the context information of the corpus, and each vector represents one character. The vector dimension can be adjusted to the size of the corpus; typical choices are 50, 100, or 200.

Each character in the question-answer corpus is converted into its pinyin letters, which gives the alphabet for the corpus. Each letter in the alphabet is randomly initialized as a vector, giving a pinyin vector matrix for each character, in which c_l is the vector of the l-th letter in the character's pinyin and L is the maximum pinyin length, preset to a fixed value. The pinyin of a single character is generally at most 6 letters long, so L can be set to 6. If the pinyin length L' of a character is less than L, rows L'+1 through L of its pinyin vector matrix are set to zero; for example, the pinyin "shi" of "市" has length 3, so rows 4 to 6 of its matrix are all zero. Each pinyin vector matrix is then encoded by a convolutional neural network (CNN) to obtain a fixed-size pinyin feature vector.
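The zero-padding rule for the pinyin vector matrix can be sketched as below. The embedding dimension, the seed, and the function names are assumptions made for illustration; the CNN encoder that follows in the text is not shown.

```python
import random

MAX_LEN = 6   # L, the maximum pinyin length given in the text
DIM = 4       # assumed embedding dimension, chosen only for illustration


def build_letter_table(dim=DIM, seed=0):
    """Randomly initialise one vector per lowercase letter."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for ch in "abcdefghijklmnopqrstuvwxyz"}


def pinyin_matrix(pinyin, table, max_len=MAX_LEN):
    """Stack the letter vectors row by row and zero-fill rows L'+1 .. L."""
    dim = len(next(iter(table.values())))
    rows = [list(table[ch]) for ch in pinyin[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows
```

For "shi", the first three rows hold the vectors of `s`, `h`, and `i`, and rows 4 to 6 are zero, matching the example in the text.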

In this embodiment, the character feature vector and the pinyin feature vector of each character are concatenated one-to-one, following the order of the characters in the corpus, to obtain the context information vectors. The context information vectors are then fed into a bidirectional LSTM network for semantic analysis. The bidirectional LSTM consists of a forward LSTM and a backward LSTM and, through its forget and retain mechanisms and backpropagation, learns the semantic features of the context information vectors.

Finally, the semantic features are labeled by a linear-chain conditional random field (CRF) to obtain the word-segmentation label sequence. Let the semantic features be Z = {z1, z2, ..., zN} and the labels be Y = {y1, y2, ..., yN}. Given that Z takes the value z, the conditional probability that the label sequence Y takes the value y is p(y|z), given by:

Figure PCTCN2022088893-appb-000001

Figure PCTCN2022088893-appb-000002

Here n = 1, 2, ..., N; tk() and sl() are feature functions, and λk and μl are the weights of tk() and sl() respectively. p(y|z) is the probability of the labeling y given the value z, and S(z) is the normalization factor that scales the output to a value between 0 and 1. Once the word-segmentation label sequence has been selected with the above formula, it is decoded to obtain the corresponding question-answer word segments.
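The two formulas appear only as figure references in this text. For orientation, the standard textbook form of a linear-chain CRF that uses the same symbols (transition features t_k, state features s_l, weights λ_k and μ_l, normalizer S(z)) is given below. That the figures follow exactly this index convention is an assumption; y_0 denotes a conventional start label.

```latex
p(y \mid z) = \frac{1}{S(z)} \exp\!\Bigg( \sum_{n=1}^{N} \sum_{k} \lambda_k \, t_k(y_{n-1}, y_n, z, n)
              + \sum_{n=1}^{N} \sum_{l} \mu_l \, s_l(y_n, z, n) \Bigg)

S(z) = \sum_{y} \exp\!\Bigg( \sum_{n=1}^{N} \sum_{k} \lambda_k \, t_k(y_{n-1}, y_n, z, n)
       + \sum_{n=1}^{N} \sum_{l} \mu_l \, s_l(y_n, z, n) \Bigg)
```

Because S(z) sums the same exponential over all candidate label sequences, p(y|z) lies between 0 and 1 and sums to 1 over y.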

204. Construct a first sound-shape code for each question-answer word segment in a preset common-word dictionary, construct a second sound-shape code for each professional term in a preset professional-term dictionary, and compute the edit distance between the first and second sound-shape codes;

205. Combine the question-answer word segments whose first sound-shape codes have an edit distance below a preset edit-distance threshold to obtain a question-answer segment group, and select the professional terms whose second sound-shape codes have an edit distance below the threshold;

206. Replace the corresponding question-answer word segments in the question-answer segment group with the selected professional terms in turn, obtaining multiple professional phrases for the question-answer segment group;

207. Perform semantic analysis on the question-answer segment group to obtain a first semantic analysis result, and on each professional phrase to obtain multiple second semantic analysis results;

208. Compare the first semantic analysis result with each second semantic analysis result and, according to the comparison, select from the professional phrases the near-synonym of each question-answer word segment in the group;

209. Use the selected near-synonyms as the professional question-answer word segments corresponding to the question-answer word segments, where the professional question-answer word segments include professional inquiry word segments and professional response word segments;

In this embodiment, every character in the question-answer word segments and every character of the terms in the professional-term dictionary has its own combination of pronunciation and glyph shape. The initial, final, final complement, and tone of each character are encoded as digits to give its four-digit sound code; the character's structure, its five four-corner codes, and its stroke count are encoded to give its 7-digit shape code. Together they form the 11-digit sound-shape code specific to each character, which is how both the first and the second sound-shape codes are built.

Specifically, let A1 to A26 denote the code fields for the 26 initials in the order of the initial table; B1 to B39 the code fields for the 39 finals in the order of the final table; C1 to C39 the code fields for the final complements of those 39 finals; and D1 to D4 the code fields for the first to fourth tones. The sound code of the character "花" is then A11B13C13D1. Likewise, let E1 to E7 denote the code fields for the left-right, top-bottom, left-middle-right, top-middle-bottom, semi-enclosed, fully enclosed, and inlaid structures of common characters; let F0 to F9, G0 to G9, H0 to H9, J0 to J9, and K0 to K9 denote the code fields for the ten stroke-shape classes at the upper-left corner, upper-right corner, lower-left corner, lower-right corner, and supplementary position; and let Li, where i is a positive integer equal to the stroke count, denote the stroke-count field. The shape code of "花" is then E2F4G4H2J1K4L7, so its full sound-shape code is A11B13C13D1E2F4G4H2J1K4L7.

In this embodiment, a sound-shape code has eleven types of code fields. For each field type that differs between the first and second sound-shape codes, the edit distance increases by 1; otherwise it is unchanged. If all field types agree, the two characters are most similar and the edit distance is 0; if all field types differ, the two characters are least similar and the edit distance is 11. The edit distance between a character to be replaced and a common character therefore lies between 0 and 11.
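The field-by-field distance described above can be sketched as follows. The parsing approach and function names are assumptions; the code relies only on the field letters A to H and J to L and the example code for "花" given in the text. Note that the measure is a per-field mismatch count, which the text refers to as the edit distance.

```python
import re

# One field = a letter tag followed by its numeric value, e.g. "A11" or "L7".
FIELD = re.compile(r"([A-Z])(\d+)")


def parse_code(code):
    """Split a sound-shape code into its 11 tagged fields."""
    fields = dict(FIELD.findall(code))
    if len(fields) != 11:
        raise ValueError(f"expected 11 distinct fields, got {len(fields)}")
    return fields


def field_distance(code_a, code_b):
    """Count the field types whose values differ (0 = identical, 11 = all differ)."""
    a, b = parse_code(code_a), parse_code(code_b)
    return sum(1 for tag in a if a[tag] != b.get(tag))
```

A threshold on this count, as in step 205, then keeps only the professional terms whose characters are close in both sound and shape.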

In this embodiment, the edit distance quantifies the similarity between each character of a question-answer word segment and each term in the professional dictionary: the smaller the edit distance, the higher the similarity. The user can therefore set a preset edit-distance threshold to filter the professional terms used for cross-combination.

When the professional terms (a1, a2, b1, b2, c1, d1, d2, e1, e2, e3) are cross-combined against the question-answer segment group (A, B, C, D, E), the first set of professional terms (a1, b1, c1, d1, e1) is taken, the segments B, C, D, and E are held fixed, and A is replaced in turn by a1 and a2, giving the professional phrases (a1, B, C, D, E) and (a2, B, C, D, E). The same is done for the remaining positions until every replacement combination has been generated.
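The single-position substitution described above can be sketched as follows. The function name and the dictionary layout of candidates are assumptions; the segment and term labels are those of the example in the text.

```python
def cross_combine(segments, candidates):
    """Replace one segment at a time with each of its candidate professional terms.

    segments:   ordered question-answer segment group, e.g. ["A", "B", "C", "D", "E"]
    candidates: segment -> list of professional terms that passed the distance threshold
    Returns a list of (position, substituted term, resulting phrase).
    """
    phrases = []
    for position, segment in enumerate(segments):
        for term in candidates.get(segment, []):
            phrase = list(segments)
            phrase[position] = term   # every other segment stays fixed
            phrases.append((position, term, tuple(phrase)))
    return phrases
```

With the ten candidate terms of the example this yields ten professional phrases, two for A, two for B, one for C, two for D, and three for E.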

In this embodiment, a conventional semantic recognition model performs semantic analysis on the question-answer segment group and on the professional phrases, producing the first and second semantic analysis results respectively. If the comparison shows only a small semantic deviation between the two, the professional term substituted in that professional phrase is taken to be a near-synonym of the replaced question-answer word segment and is used as its professional question-answer word segment.

210. Perform cross question-answer matching on each professional inquiry word segment and each professional response word segment in turn, and, according to the matching result, combine the professional inquiry word segments and the professional response word segments to obtain a diagnostic sentence;

211. Use a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic sentence, combine the treatment product information with the to-be-pushed response corpus, and obtain and push a new to-be-pushed response corpus.

In this embodiment of the present application, sound-shape codes are constructed for the inquiry word segments, the response word segments, and the professional terms from the preset common-word and professional-term dictionaries. Matching the sound-shape codes identifies a near-synonym for each inquiry and response word segment, which is substituted to obtain the corresponding professional inquiry and response word segments, making the subsequent product matching more accurate.

Referring to FIG. 3, a third embodiment of the artificial-intelligence-based answer corpus generation method in the embodiments of the present application includes:

301. Obtain an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and, using a preset fully homomorphic encryption algorithm, convert the inquiry corpus and the to-be-pushed response corpus into the corresponding radix-digit plaintext;

302. Encrypt the radix-digit plaintext to obtain an encrypted corpus, and compute, according to a preset modulus, the ciphertext original code, ciphertext inverse code, and ciphertext complement code of the encrypted corpus;

303. Perform a modular operation on the encrypted corpus using the ciphertext original code, ciphertext inverse code, and ciphertext complement code to obtain a modulo-encrypted corpus, where the modulo-encrypted corpus includes a first modulo-encrypted corpus corresponding to the inquiry corpus and a second encrypted corpus corresponding to the to-be-pushed response corpus;

304. Take the first encrypted corpus as the new inquiry corpus and the second encrypted corpus as the new to-be-pushed response corpus, and, based on a preset linear-chain conditional random field, perform word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively to obtain multiple inquiry word segments and multiple response word segments;

In this embodiment, the plaintext m of the inquiry corpus and the to-be-pushed response corpus has a type T drawn from the set {integer, real number, character, date, Boolean, ...}. The plaintext is written m_s, where s is the radix (binary, decimal, hexadecimal, base 512, and so on), and the pair is denoted (T, m_s). For example, s = 2 means binary, usually written B, so m is expressed as binary digits m_B and denoted (T, m_B); s = 16 means hexadecimal, usually written H, so m is expressed as hexadecimal digits m_H and denoted (T, m_H); s = 512 means base 512, so m is expressed as base-512 digits m_512 and denoted (T, m_512). For instance, if an inquiry corpus and a to-be-pushed response corpus are m1 = 5 and m2 = 3 and are to be encrypted digit by digit in binary, their binary-digit plaintexts are 101 and 011 respectively.
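The conversion of a plaintext value into radix digits can be sketched as follows. The function name and the optional `width` padding parameter are assumptions; how non-integer types are first mapped to integers is not described in the text and is not covered here.

```python
def to_digits(value, base, width=0):
    """Return the digits of a non-negative integer in the given base, most significant first.

    width left-pads the result with zeros, so that 3 in binary can be written 011.
    """
    if value < 0 or base < 2:
        raise ValueError("value must be non-negative and base at least 2")
    digits = []
    while value:
        value, digit = divmod(value, base)
        digits.append(digit)
    digits = digits[::-1] or [0]
    return [0] * (width - len(digits)) + digits
```

With `width=3`, the values 5 and 3 from the example become the digit lists for 101 and 011.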

In this embodiment, the radix-digit plaintext can be encrypted with the formula c = (m + s*r + p*r) mod x0, where c is the ciphertext, m is a radix digit of the plaintext of the inquiry corpus or the to-be-pushed response corpus, s is the radix used for encryption, r is a random number, p is an encryption key, and x0 is an intermediate variable equal to the product of the encryption key p and another encryption key q.

For example, suppose the working keys are p = 111 and q = 11, so that x0 = p*q = 1221, and let r = 1. For the digits 0 and 1 that appear in the binary-digit plaintext 101 of plaintext 5 and the binary-digit plaintext 011 of plaintext 3, the formula above gives (0 + 2*1 + 111*1) mod 1221 = 113 as the ciphertext of digit 0 and (1 + 2*1 + 111*1) mod 1221 = 114 as the ciphertext of digit 1.
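The worked example can be checked with a short sketch of the stated formula. The function names are assumptions, and r is fixed at 1 only to reproduce the numbers in the example; the text describes r as a random number.

```python
def encrypt_digit(m, s, r, p, q):
    """c = (m + s*r + p*r) mod x0, with x0 = p*q, as stated in the text."""
    x0 = p * q
    return (m + s * r + p * r) % x0


def encrypt_digits(digits, s, r, p, q):
    """Encrypt a radix-digit plaintext digit by digit."""
    return [encrypt_digit(m, s, r, p, q) for m in digits]


# Parameters of the worked example: binary digits, p = 111, q = 11, r = 1.
cipher_of_5 = encrypt_digits([1, 0, 1], s=2, r=1, p=111, q=11)
cipher_of_3 = encrypt_digits([0, 1, 1], s=2, r=1, p=111, q=11)
```

Here `cipher_of_5` is `[114, 113, 114]` and `cipher_of_3` is `[113, 114, 114]`, consistent with the ciphertexts 113 and 114 given for the digits 0 and 1.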

In addition, the ciphertext original code, ciphertext inverse code, and ciphertext complement code can be computed from the encrypted corpus using the usual methods for original, inverse, and complement codes (in the binary case, the sign-magnitude, ones'-complement, and two's-complement forms). When arithmetic is performed on ciphertexts, addition of encrypted corpora sums the ciphertext digits position by position and does not need the ciphertext original, inverse, or complement codes.

For subtraction of encrypted corpora, the inverse code of the subtrahend's encrypted corpus is obtained first, the corresponding complement code is derived from that inverse code, and the complement code is then summed position by position with the original code of the minuend's encrypted corpus.
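The invert, complement, then add sequence is the same one used for ordinary fixed-width binary subtraction. The sketch below shows it on plaintext integers only, as an analogy; it does not operate on ciphertexts, and the bit width and function name are assumptions.

```python
def subtract_via_complement(minuend, subtrahend, width=8):
    """Compute minuend - subtrahend modulo 2**width by adding a two's complement."""
    mask = (1 << width) - 1
    inverse = ~subtrahend & mask        # inverse code: flip every bit
    complement = (inverse + 1) & mask   # complement code: inverse code plus one
    return (minuend + complement) & mask
```

For 5 - 3 this returns 2; for 3 - 5 with `width=8` it returns 254, which is -2 in 8-bit two's complement.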

For multiplication of encrypted corpora, an n*(2n-1) matrix is first created, where n is the number of character elements in the encrypted corpus. Each column of the matrix is then summed to produce a new row vector, which is taken as the result of the multiplication.

For division of encrypted corpora, an empty storage format for the quotient is created first. Its total length is 32, 64, or 80 bits, and it comprises a sign bit, integer bits, and fractional bits; the binary-digit plaintext is extended to fit this format. The extended binary-digit plaintext is encrypted and the results are combined, giving the ciphertexts that serve as the dividend and the divisor. The initial value of the fractional-digit counter count is set to the length of the storage format minus L, where L is the length of the integer part of the storage format. The ciphertext of the dividend is then compared with the ciphertext of the divisor. If it is greater, the ciphertext of the dividend is added to the complement code of the encrypted corpus, the resulting remainder becomes the new dividend, and the ciphertext of 1 is added at the integer position, which accumulates the ciphertext quotient. Otherwise, it is checked whether the ciphertext of the remainder is all zeros or whether count exceeds the total length of the storage format; if so, the result is stored directly in the preset storage format. If not, the ciphertext of 0 is appended to the right end of the remainder ciphertext to form a new remainder ciphertext, which is compared with the ciphertext of the divisor. If the new remainder ciphertext is greater, it is added to the complement code of the divisor's ciphertext to obtain the next remainder ciphertext, and the count-th fractional digit is set to the ciphertext of 1; otherwise, the count-th fractional digit is set to the ciphertext of 0. The counter count is then incremented by 1. Finally, the integer and fractional parts of the quotient are obtained from the resulting ciphertext values and stored in the preset storage format.

305. Construct a first sound-shape code for each question-answer word segment in a preset common-word dictionary, construct a second sound-shape code for each professional term in a preset professional-term dictionary, and compute the edit distance between the first and second sound-shape codes;

306. Combine the question-answer word segments whose first sound-shape codes have an edit distance below a preset edit-distance threshold to obtain a question-answer segment group, and select the professional terms whose second sound-shape codes have an edit distance below the threshold;

307. Replace the corresponding question-answer word segments in the question-answer segment group with the selected professional terms in turn, obtaining multiple professional phrases for the question-answer segment group;

308. Perform semantic analysis on the question-answer segment group to obtain a first semantic analysis result, and on each professional phrase to obtain multiple second semantic analysis results;

309. Compare the first semantic analysis result with each second semantic analysis result;

310. According to the comparison, compute the degree of difference between the first semantic analysis result and each second semantic analysis result, and determine from it the similarity between the question-answer segment group and each professional phrase;

311. Classify the professional phrases according to the question-answer word segment whose professional term they contain, obtaining professional phrases in multiple question-answer segment categories;

312. From the professional phrases of each question-answer segment category, select the one with the highest similarity, and take the professional term in it that corresponds to that category as the near-synonym of the question-answer word segment;

In this embodiment, when the professional terms (a1, a2, b1, b2, c1, d1, d2, e1, e2, e3) are cross-combined against the question-answer segment group (A, B, C, D, E), the phrases obtained by replacing A with a1 and a2, namely (a1, B, C, D, E) and (a2, B, C, D, E), are placed in one category, and likewise for the other positions, giving professional phrases in multiple question-answer segment categories. Within a category, the professional phrase most similar to the question-answer segment group contains the substituted professional term most similar to the replaced question-answer word segment, so that term can serve as the segment's near-synonym.
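Steps 311 and 312 amount to grouping the candidate phrases by the position that was substituted and keeping the best-scoring term per group. A minimal sketch follows, assuming the similarities have already been computed by the semantic comparison; the function name and the input tuple layout are hypothetical.

```python
def pick_synonyms(scored_phrases):
    """Keep, for each substituted position, the term whose phrase scored highest.

    scored_phrases: iterable of (position, substituted term, similarity to the original group)
    Returns position -> chosen near-synonym.
    """
    best = {}
    for position, term, similarity in scored_phrases:
        if position not in best or similarity > best[position][1]:
            best[position] = (term, similarity)
    return {position: term for position, (term, _) in best.items()}
```

For example, if the phrases for A score 0.8 with a1 and 0.9 with a2, a2 is kept as the near-synonym of A.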

313. Use the selected near-synonyms as the professional question-answer word segments corresponding to the question-answer word segments, where the professional question-answer word segments include professional inquiry word segments and professional response word segments;

314. Perform cross question-answer matching on each professional inquiry word segment and each professional response word segment in turn, and, according to the matching result, combine the professional inquiry word segments and the professional response word segments to obtain a diagnostic sentence;

315. Use a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic sentence, combine the treatment product information with the to-be-pushed response corpus, and obtain and push a new to-be-pushed response corpus.

In this embodiment of the present application, the inquiry corpus and the to-be-pushed response corpus are additionally encrypted, and data processing such as computation and product recommendation is carried out on the ciphertext. This better protects the patient's personal private information and improves the patient's consultation experience.

The artificial-intelligence-based response corpus generation method in the embodiments of the present application has been described above. The artificial-intelligence-based response corpus generation apparatus is described below. Referring to FIG. 4, one embodiment of the apparatus includes:

A word segmentation module 401, configured to obtain an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and to perform word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, obtaining a plurality of inquiry segments and a plurality of response segments;

A semantic matching module 402, configured to perform professional-term semantic matching on the inquiry segments and the response segments respectively, obtaining the inquiry professional segments corresponding to the inquiry segments and the response professional segments corresponding to the response segments;

A question-answer matching module 403, configured to perform cross question-answer matching on each inquiry professional segment and each response professional segment in turn, and, according to the result of the cross question-answer matching, to combine the inquiry professional segments and the response professional segments to obtain a diagnostic statement;

A combination module 404, configured to use a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic statement, to combine the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and to push it.

In this embodiment of the present application, the inquiry corpus entered by the patient and the doctor's reply to the patient (the to-be-pushed response corpus) are obtained and converted into professional inquiry and response corpora through professional-term semantic matching. The inquiry professional segments and response professional segments are then matched to determine the patient's condition and the treatment plan recommended by the doctor, which yields a diagnostic statement. Treatment product information is matched according to the diagnostic statement and pushed to the patient together with the to-be-pushed response corpus. Product recommendation is thereby integrated smoothly into the consultation: the recommended products and services are targeted at the current chat scenario, the user is offered quick ordering for the specific product, and products are recommended precisely during the consultation.

Referring to FIG. 5, another embodiment of the artificial-intelligence-based response corpus generation apparatus in the embodiments of the present application includes:

A word segmentation module 401, configured to obtain an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and to perform word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, obtaining a plurality of inquiry segments and a plurality of response segments;

A semantic matching module 402, configured to perform professional-term semantic matching on the inquiry segments and the response segments respectively, obtaining the inquiry professional segments corresponding to the inquiry segments and the response professional segments corresponding to the response segments;

A question-answer matching module 403, configured to perform cross question-answer matching on each inquiry professional segment and each response professional segment in turn, and, according to the result of the cross question-answer matching, to combine the inquiry professional segments and the response professional segments to obtain a diagnostic statement;

A combination module 404, configured to use a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic statement, to combine the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and to push it.

Specifically, the word segmentation module 401 includes:

An extraction unit 4011, configured to extract the character feature vectors and the corresponding pinyin feature vectors of the question-answer corpus, where the question-answer corpus includes the inquiry corpus and the to-be-pushed response corpus;

A splicing unit 4012, configured to concatenate the character feature vectors with the corresponding pinyin feature vectors to obtain context information vectors, and to perform semantic analysis on the context information vectors to obtain semantic features;

A decoding unit 4013, configured to label the semantic features using the preset linear-chain conditional random field to obtain a segmentation label sequence, and to decode the segmentation label sequence to obtain a plurality of question-answer segments, where the question-answer segments include the inquiry segments and the response segments.
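The decoding step of unit 4013 can be illustrated with a minimal sketch. It assumes the segmentation label sequence uses the common B/M/E/S scheme (begin, middle, end, single), which the text does not specify, and it takes the labels as given rather than running a conditional random field. The function name `decode_bmes` is hypothetical.

```python
def decode_bmes(chars, tags):
    """Turn a per-character B/M/E/S label sequence into a list of segments.

    Assumption: the label scheme is B/M/E/S; the source only says
    "segmentation label sequence".
    """
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character segment
            if current:           # flush an unfinished segment defensively
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # start of a multi-character segment
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # continuation of the current segment
            current += ch
        else:                     # "E": close the current segment
            words.append(current + ch)
            current = ""
    if current:                   # sequence ended without an "E" label
        words.append(current)
    return words


# Latin letters stand in for the characters of an inquiry sentence.
chars = list("ihaveacold")
tags = ["S", "B", "M", "M", "E", "S", "B", "M", "M", "E"]
print(decode_bmes(chars, tags))
```

In a full pipeline the labels would come from the linear-chain conditional random field applied to the semantic features; only the final decoding is shown here.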

Specifically, the semantic matching module 402 includes:

A construction unit 4021, configured to construct a first sound-shape code for each question-answer segment from a preset common-word dictionary, to construct a second sound-shape code for each professional term in a preset professional-term dictionary, and to compute the edit distance between the first sound-shape codes and the second sound-shape codes;

A combination unit 4022, configured to combine the question-answer segments whose first sound-shape codes have an edit distance below a preset edit distance threshold, obtaining a question-answer segment group, and to select the professional terms whose second sound-shape codes have an edit distance below the edit distance threshold;

A replacement unit 4023, configured to replace, in turn, the corresponding question-answer segment in the question-answer segment group with each selected professional term, obtaining a plurality of professional phrases corresponding to the question-answer segment group;

A semantic analysis unit 4024, configured to perform semantic analysis on the question-answer segment group to obtain a first semantic analysis result, and to perform semantic analysis on each professional phrase to obtain a plurality of second semantic analysis results;

A comparison unit 4025, configured to compare the first semantic analysis result with each second semantic analysis result, to select, according to the comparison results, a synonym for each question-answer segment in the question-answer segment group from the plurality of professional phrases, and to take the selected synonyms as the question-answer professional segments corresponding to the question-answer segments, where the question-answer professional segments include inquiry professional segments and response professional segments.
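The edit-distance filtering performed by units 4021 and 4022 can be sketched as follows. The sketch uses the standard Levenshtein distance. The sound-shape codes shown are made-up pinyin-plus-tone strings used only for illustration (the actual encoding is not given here), and `candidate_terms` and the threshold value are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidate_terms(segment_code, professional_codes, threshold):
    """Return professional terms whose code lies within the threshold."""
    return [term for term, code in professional_codes.items()
            if edit_distance(segment_code, code) < threshold]


# Hypothetical codes: a colloquial segment and two professional terms.
professional_codes = {"headache": "TOU2TONG4", "fever": "FA1SHAO1"}
print(candidate_terms("TOU2TENG2", professional_codes, threshold=3))
```

A smaller threshold admits fewer candidate terms and so reduces the number of professional phrases that the later semantic comparison has to score.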

Specifically, the comparison unit 4025 is further configured to:

According to the comparison results, compute the degree of difference between the first semantic analysis result and each second semantic analysis result, and determine from the degree of difference the similarity between the question-answer segment group and each professional phrase;

Classify the professional phrases according to which question-answer segment of the question-answer segment group their substituted professional term corresponds to, obtaining the professional phrases of a plurality of question-answer segment categories;

From the professional phrases of each question-answer segment category, select the professional phrase with the greatest similarity, and take the professional term of that category in the selected professional phrase as the synonym of the question-answer segment.
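The classification and selection steps above can be sketched with the (A, B, C, D, E) example used earlier in the description. The similarity function is a stand-in: here it is driven by a made-up score table, whereas the described method derives similarity from the degree of difference between semantic analysis results. All names and numbers are hypothetical.

```python
def pick_synonyms(base, candidates, similarity):
    """Select one synonym per question-answer segment.

    base:        the question-answer segment group, e.g. ["A", "B", ...]
    candidates:  {position: [professional terms that may replace base[position]]}
    similarity:  callable(base_group, professional_phrase) -> float
    """
    synonyms = {}
    for pos, terms in candidates.items():
        # One category per position: phrases that differ from base only at pos.
        category = [base[:pos] + [t] + base[pos + 1:] for t in terms]
        best = max(category, key=lambda phrase: similarity(base, phrase))
        synonyms[base[pos]] = best[pos]
    return synonyms


base = ["A", "B", "C", "D", "E"]
candidates = {0: ["a1", "a2"], 1: ["b1", "b2"], 2: ["c1"],
              3: ["d1", "d2"], 4: ["e1", "e2", "e3"]}

# Toy scores standing in for semantic similarity (illustrative values only).
toy_score = {"a1": 0.4, "a2": 0.9, "b1": 0.7, "b2": 0.2, "c1": 0.5,
             "d1": 0.1, "d2": 0.8, "e1": 0.3, "e2": 0.6, "e3": 0.5}


def toy_similarity(base_group, phrase):
    # Unchanged segments count 1.0; the substituted term uses its toy score.
    return sum(toy_score.get(word, 1.0) for word in phrase)


print(pick_synonyms(base, candidates, toy_similarity))
```

Because each category varies only one position, comparing whole phrases within a category ranks the substituted terms themselves, which is the reasoning the description gives for treating the winner as a synonym.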

Specifically, the combination module 404 includes:

A traversal unit 4041, configured to perform a level-order traversal of the preset prior medical knowledge base using the diagnostic statement, and to determine, according to the result of the level-order traversal, the diagnosis result corresponding to the diagnostic statement;

A screening unit 4042, configured to select, from the prior knowledge base, the treatment product identification information that matches the diagnosis result, and to obtain the treatment product information mapped to that identification information, where the treatment product information includes a recommendation link and summary information for the treatment product.
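The traversal and screening units can be illustrated with a small sketch. It assumes the prior medical knowledge base is a tree of nested dictionaries whose leaf nodes carry a keyword set and a diagnosis; this layout, the sample entries, the product identifier, and the example.com link are all placeholders rather than details taken from the description.

```python
from collections import deque

# Hypothetical knowledge base layout (placeholder content only).
KNOWLEDGE_BASE = {
    "name": "root",
    "children": [
        {"name": "respiratory", "children": [
            {"keywords": {"cough", "fever"}, "diagnosis": "common cold"},
        ]},
        {"name": "digestive", "children": [
            {"keywords": {"stomachache", "nausea"}, "diagnosis": "gastritis"},
        ]},
    ],
}
PRODUCT_IDS = {"common cold": "P001"}   # diagnosis -> product identifier
PRODUCTS = {"P001": {"link": "https://example.com/p001",
                     "summary": "placeholder summary"}}


def find_diagnosis(root, terms):
    """Level-order (breadth-first) search for the first matching diagnosis."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if "diagnosis" in node and node["keywords"] <= set(terms):
            return node["diagnosis"]
        queue.extend(node.get("children", []))
    return None


def lookup_product(diagnosis):
    """Map a diagnosis to its product record (link and summary), if any."""
    product_id = PRODUCT_IDS.get(diagnosis)
    return PRODUCTS.get(product_id) if product_id else None


diagnosis = find_diagnosis(KNOWLEDGE_BASE, {"cough", "fever", "fatigue"})
print(diagnosis, lookup_product(diagnosis))
```

The returned record could then be appended to the to-be-pushed response corpus, as the combination module describes.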

Specifically, the artificial-intelligence-based response corpus generation apparatus further includes an encryption module 405, configured to:

Using a preset fully homomorphic encryption algorithm, convert the inquiry corpus and the to-be-pushed response corpus into the corresponding radix-digit plaintext;

Perform an encryption operation on the radix-digit plaintext to obtain an encrypted corpus, and, according to a preset modulus, compute the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code of the encrypted corpus;

Using the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code, perform a modular operation on the encrypted corpus to obtain a modular encrypted corpus, where the modular encrypted corpus includes a first modular encrypted corpus corresponding to the inquiry corpus and a second modular encrypted corpus corresponding to the to-be-pushed response corpus;

Take the first modular encrypted corpus as the new inquiry corpus, and take the second modular encrypted corpus as the new to-be-pushed response corpus.
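The terms original code, inverse code, and complement code in the steps above are read here as the three standard fixed-width binary encodings of a signed integer (sign-magnitude, ones' complement, and two's complement); this reading is an assumption. The sketch below shows only those encodings for a modulus of 2^bits. It is not an implementation of fully homomorphic encryption, and the function name `three_codes` and the 8-bit width are hypothetical choices.

```python
def three_codes(value: int, bits: int = 8):
    """Return (original, inverse, complement) codes as unsigned integers.

    original   = sign-magnitude
    inverse    = ones' complement
    complement = two's complement, i.e. the value reduced modulo 2**bits
    """
    limit = 1 << (bits - 1)
    if not -limit < value < limit:
        raise ValueError("value does not fit in the given width")
    modulus = 1 << bits
    if value >= 0:
        # Non-negative values have the same bit pattern in all three codes.
        return value, value, value
    magnitude = -value
    original = limit | magnitude            # sign bit set, magnitude kept
    inverse = (modulus - 1) - magnitude     # every bit of +magnitude flipped
    complement = modulus - magnitude        # inverse code plus one
    return original, inverse, complement


for v in (5, -5):
    print(v, [format(code, "08b") for code in three_codes(v)])
```

With the complement code, subtraction reduces to addition modulo 2^bits, which is presumably why a modulus appears in the step above.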

In this embodiment of the present application, sound-shape codes for the inquiry segments, the response segments, and the professional terms are constructed from the preset common-word dictionary and professional-term dictionary. Matching the sound-shape codes determines a synonym for each inquiry segment and response segment, and substituting these synonyms yields the corresponding inquiry professional segments and response professional segments, which makes subsequent product matching more accurate. In addition, the inquiry corpus and the to-be-pushed response corpus are encrypted, and data processing such as computation and product recommendation is carried out on the ciphertext, which better protects the patient's personal private information and improves the patient's consultation experience.

FIG. 4 and FIG. 5 above describe the artificial-intelligence-based response corpus generation apparatus in detail from the perspective of modular functional entities. The artificial-intelligence-based response corpus generation device in the embodiments of the present application is described in detail below from the perspective of hardware processing.

FIG. 6 is a schematic structural diagram of an artificial-intelligence-based response corpus generation device provided by an embodiment of the present application. The device 600 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 610, a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632. The memory 620 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the device 600, the series of instruction operations in the storage medium 630.

The artificial-intelligence-based response corpus generation device 600 may further include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the device structure shown in FIG. 6 does not limit the artificial-intelligence-based response corpus generation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The present application further provides an artificial-intelligence-based response corpus generation device. The computer device includes a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the artificial-intelligence-based response corpus generation method in the foregoing embodiments.

The present application further provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the following steps of the artificial-intelligence-based response corpus generation method: obtaining an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, obtaining a plurality of inquiry segments and a plurality of response segments; performing professional-term semantic matching on the inquiry segments and the response segments respectively, obtaining the inquiry professional segments corresponding to the inquiry segments and the response professional segments corresponding to the response segments; performing cross question-answer matching on each inquiry professional segment and each response professional segment in turn, and, according to the result of the cross question-answer matching, combining the inquiry professional segments and the response professional segments to obtain a diagnostic statement; and, using a preset prior medical knowledge base, matching the treatment product information corresponding to the diagnostic statement, combining the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and pushing it.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

1. An artificial-intelligence-based response corpus generation method, wherein the method comprises:
obtaining an inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, and performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, obtaining a plurality of inquiry segments and a plurality of response segments;
performing professional-term semantic matching on the inquiry segments and the response segments respectively, obtaining the inquiry professional segments corresponding to the inquiry segments and the response professional segments corresponding to the response segments;
performing cross question-answer matching on each inquiry professional segment and each response professional segment in turn, and, according to the result of the cross question-answer matching, combining the inquiry professional segments and the response professional segments to obtain a diagnostic statement;
using a preset prior medical knowledge base, matching the treatment product information corresponding to the diagnostic statement, combining the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and pushing it.

2. The artificial-intelligence-based response corpus generation method according to claim 1, wherein performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively to obtain a plurality of inquiry segments and a plurality of response segments comprises:
extracting the character feature vectors and the corresponding pinyin feature vectors of the question-answer corpus, wherein the question-answer corpus includes the inquiry corpus and the to-be-pushed response corpus;
concatenating the character feature vectors with the corresponding pinyin feature vectors to obtain context information vectors, and performing semantic analysis on the context information vectors to obtain semantic features;
labeling the semantic features using the preset linear-chain conditional random field to obtain a segmentation label sequence, and decoding the segmentation label sequence to obtain a plurality of question-answer segments, wherein the question-answer segments include the inquiry segments and the response segments.

3. The artificial-intelligence-based response corpus generation method according to claim 2, wherein performing professional-term semantic matching on the inquiry segments and the response segments respectively to obtain the inquiry professional segments corresponding to the inquiry segments and the response professional segments corresponding to the response segments comprises:
constructing a first sound-shape code for each question-answer segment from a preset common-word dictionary, constructing a second sound-shape code for each professional term in a preset professional-term dictionary, and computing the edit distance between the first sound-shape codes and the second sound-shape codes;
combining the question-answer segments whose first sound-shape codes have an edit distance below a preset edit distance threshold, obtaining a question-answer segment group, and selecting the professional terms whose second sound-shape codes have an edit distance below the edit distance threshold;
replacing, in turn, the corresponding question-answer segment in the question-answer segment group with each selected professional term, obtaining a plurality of professional phrases corresponding to the question-answer segment group;
performing semantic analysis on the question-answer segment group to obtain a first semantic analysis result, and performing semantic analysis on each professional phrase to obtain a plurality of second semantic analysis results;
comparing the first semantic analysis result with each second semantic analysis result, and, according to the comparison results, selecting a synonym for each question-answer segment in the question-answer segment group from the plurality of professional phrases;
taking the selected synonyms as the question-answer professional segments corresponding to the question-answer segments, wherein the question-answer professional segments include inquiry professional segments and response professional segments.

4. The artificial-intelligence-based response corpus generation method according to claim 3, wherein selecting, according to the comparison results, a synonym for each question-answer segment in the question-answer segment group from the plurality of professional phrases comprises:
according to the comparison results, computing the degree of difference between the first semantic analysis result and each second semantic analysis result, and determining from the degree of difference the similarity between the question-answer segment group and each professional phrase;
classifying the professional phrases according to which question-answer segment of the question-answer segment group their substituted professional term corresponds to, obtaining the professional phrases of a plurality of question-answer segment categories;
selecting, from the professional phrases of each question-answer segment category, the professional phrase with the greatest similarity, and taking the professional term of that category in the selected professional phrase as the synonym of the question-answer segment.

5. The artificial-intelligence-based response corpus generation method according to any one of claims 1-4, wherein using a preset prior medical knowledge base to match the treatment product information corresponding to the diagnostic statement comprises:
performing a level-order traversal of the preset prior medical knowledge base using the diagnostic statement, and determining, according to the result of the level-order traversal, the diagnosis result corresponding to the diagnostic statement;
selecting, from the prior knowledge base, the treatment product identification information that matches the diagnosis result, and obtaining the treatment product information mapped to that identification information, wherein the treatment product information includes a recommendation link and summary information for the treatment product.

6. The artificial-intelligence-based response corpus generation method according to claim 1, wherein, after obtaining the inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, the method further comprises:
using a preset fully homomorphic encryption algorithm, converting the inquiry corpus and the to-be-pushed response corpus into the corresponding radix-digit plaintext;
performing an encryption operation on the radix-digit plaintext to obtain an encrypted corpus, and, according to a preset modulus, computing the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code of the encrypted corpus;
using the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code, performing a modular operation on the encrypted corpus to obtain a modular encrypted corpus, wherein the modular encrypted corpus includes a first modular encrypted corpus corresponding to the inquiry corpus and a second modular encrypted corpus corresponding to the to-be-pushed response corpus;
taking the first modular encrypted corpus as the new inquiry corpus, and taking the second modular encrypted corpus as the new to-be-pushed response corpus.
An artificial intelligence-based response corpus generation device, comprising a memory and at least one processor, the memory storing instructions, wherein the at least one processor invokes the instructions in the memory to cause the device to perform the following steps of an artificial intelligence-based response corpus generation method:
acquiring an inquiry corpus and a to-be-pushed response corpus corresponding to the inquiry corpus, and performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, to obtain a plurality of inquiry word segments and a plurality of response word segments;
performing professional-term semantic matching on the inquiry word segments and the response word segments respectively, to obtain professional inquiry word segments corresponding to the inquiry word segments and professional response word segments corresponding to the response word segments;
performing cross question-answer matching on each professional inquiry word segment and each professional response word segment in turn, and combining the professional inquiry word segments and the professional response word segments according to the result of the cross question-answer matching, to obtain a diagnostic statement;
matching the treatment product information corresponding to the diagnostic statement using a preset prior medical knowledge base, combining the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and pushing the new to-be-pushed response corpus.
The artificial intelligence-based response corpus generation device according to claim 7, wherein performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments, comprises:
extracting character feature vectors and corresponding pinyin feature vectors of a question-answer corpus, wherein the question-answer corpus comprises the inquiry corpus and the to-be-pushed response corpus;
concatenating the character feature vectors with the corresponding pinyin feature vectors to obtain a context information vector, and performing semantic analysis on the context information vector to obtain semantic features;
labeling the semantic features using the preset linear-chain conditional random field to obtain a segmentation tag sequence, and decoding the segmentation tag sequence to obtain a plurality of question-answer word segments, wherein the question-answer word segments comprise the inquiry word segments and the response word segments.
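The final decoding step of the segmentation claim can be sketched as follows. This is a minimal illustration only: the claim does not name a tagging scheme, so the common B/M/E/S (begin, middle, end, single) character tags are an assumption, the function name `decode_bmes` is hypothetical, and the tag sequence is hard-coded rather than produced by a trained linear-chain conditional random field.

```python
from typing import List


def decode_bmes(chars: str, tags: List[str]) -> List[str]:
    """Turn a per-character B/M/E/S tag sequence into word segments.

    The B/M/E/S scheme is an assumed convention, not taken from the claims.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:           # flush an unfinished word first
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # first character of a new word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # middle of the current word
            current += ch
        elif tag == "E":          # last character closes the word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag}")
    if current:                   # tolerate a sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical tag sequence for a short inquiry sentence.
segments = decode_bmes("我头痛三天了", ["S", "B", "E", "B", "E", "S"])
print(segments)
```

In a full pipeline the tags would come from the conditional random field applied to the semantic features; only the decoding is shown here.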
The artificial intelligence-based response corpus generation device according to claim 8, wherein performing professional-term semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segments corresponding to the inquiry word segments and the professional response word segments corresponding to the response word segments, comprises:
constructing a first sound-shape code of the question-answer word segments in a preset common-word dictionary, constructing a second sound-shape code of each professional term in a preset professional-term dictionary, and calculating the edit distance between the first sound-shape code and the second sound-shape code;
combining the question-answer word segments corresponding to the first sound-shape codes whose edit distance is less than a preset edit-distance threshold to obtain a question-answer word-segment group, and selecting the professional terms corresponding to the second sound-shape codes whose edit distance is less than the edit-distance threshold;
replacing, in turn, the corresponding question-answer word segments in the question-answer word-segment group with the selected professional terms, to obtain a plurality of professional phrases corresponding to the question-answer word-segment group;
performing semantic analysis on the question-answer word-segment group to obtain a first semantic analysis result, and performing semantic analysis on each professional phrase to obtain a plurality of second semantic analysis results;
comparing the first semantic analysis result with each second semantic analysis result, and selecting, according to the result of the comparison, a synonym of each question-answer word segment in the question-answer word-segment group from the plurality of professional phrases;
using the selected synonyms as the professional question-answer word segments corresponding to the question-answer word segments, wherein the professional question-answer word segments comprise the professional inquiry word segments and the professional response word segments.
The artificial intelligence-based response corpus generation device according to claim 9, wherein selecting, according to the result of the comparison, a synonym of each question-answer word segment in the question-answer word-segment group from the plurality of professional phrases comprises:
calculating, according to the result of the comparison, the degree of difference between the first semantic analysis result and each second semantic analysis result, and determining, according to the degree of difference, the similarity between the question-answer word-segment group and each professional phrase;
classifying the professional phrases according to the question-answer word segment to which the professional term in each phrase corresponds, to obtain professional phrases of a plurality of question-answer word-segment categories;
selecting, from the professional phrases of each question-answer word-segment category, the professional phrase with the greatest similarity, and using the professional term in the selected professional phrase that corresponds to the question-answer word-segment category as the synonym of the question-answer word segment.
The artificial intelligence-based response corpus generation device according to any one of claims 7-10, wherein matching the treatment product information corresponding to the diagnostic statement using the preset prior medical knowledge base comprises:
performing a hierarchical traversal of the preset prior medical knowledge base using the diagnostic statement, and determining the diagnosis result corresponding to the diagnostic statement according to the result of the hierarchical traversal;
selecting, from the prior knowledge base, treatment product identification information that matches the diagnosis result, and obtaining the treatment product information mapped to the treatment product identification information, wherein the treatment product information comprises a recommendation link and summary information of a treatment product.
The artificial intelligence-based response corpus generation device according to claim 7, wherein, after acquiring the inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, the steps further comprise:
converting the inquiry corpus and the to-be-pushed response corpus into corresponding base-digit plaintext using a preset fully homomorphic encryption algorithm;
encrypting the base-digit plaintext to obtain an encrypted corpus, and calculating, according to a preset modulus, the ciphertext original code, ciphertext inverse code, and ciphertext complement code of the encrypted corpus;
performing a modular operation on the encrypted corpus using the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code to obtain a modulo-encrypted corpus, wherein the modulo-encrypted corpus comprises a first modulo-encrypted corpus corresponding to the inquiry corpus and a second encrypted corpus corresponding to the to-be-pushed response corpus;
using the first encrypted corpus as a new inquiry corpus, and using the second encrypted corpus as a new to-be-pushed response corpus.
A computer-readable storage medium storing instructions, wherein the instructions, when executed by a processor, implement the following steps of an artificial intelligence-based response corpus generation method:
acquiring an inquiry corpus and a to-be-pushed response corpus corresponding to the inquiry corpus, and performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, to obtain a plurality of inquiry word segments and a plurality of response word segments;
performing professional-term semantic matching on the inquiry word segments and the response word segments respectively, to obtain professional inquiry word segments corresponding to the inquiry word segments and professional response word segments corresponding to the response word segments;
performing cross question-answer matching on each professional inquiry word segment and each professional response word segment in turn, and combining the professional inquiry word segments and the professional response word segments according to the result of the cross question-answer matching, to obtain a diagnostic statement;
matching the treatment product information corresponding to the diagnostic statement using a preset prior medical knowledge base, combining the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and pushing the new to-be-pushed response corpus.
The computer-readable storage medium according to claim 13, wherein performing word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively, to obtain a plurality of inquiry word segments and a plurality of response word segments, comprises:
extracting character feature vectors and corresponding pinyin feature vectors of a question-answer corpus, wherein the question-answer corpus comprises the inquiry corpus and the to-be-pushed response corpus;
concatenating the character feature vectors with the corresponding pinyin feature vectors to obtain a context information vector, and performing semantic analysis on the context information vector to obtain semantic features;
labeling the semantic features using the preset linear-chain conditional random field to obtain a segmentation tag sequence, and decoding the segmentation tag sequence to obtain a plurality of question-answer word segments, wherein the question-answer word segments comprise the inquiry word segments and the response word segments.
The computer-readable storage medium according to claim 14, wherein performing professional-term semantic matching on the inquiry word segments and the response word segments respectively, to obtain the professional inquiry word segments corresponding to the inquiry word segments and the professional response word segments corresponding to the response word segments, comprises:
constructing a first sound-shape code of the question-answer word segments in a preset common-word dictionary, constructing a second sound-shape code of each professional term in a preset professional-term dictionary, and calculating the edit distance between the first sound-shape code and the second sound-shape code;
combining the question-answer word segments corresponding to the first sound-shape codes whose edit distance is less than a preset edit-distance threshold to obtain a question-answer word-segment group, and selecting the professional terms corresponding to the second sound-shape codes whose edit distance is less than the edit-distance threshold;
replacing, in turn, the corresponding question-answer word segments in the question-answer word-segment group with the selected professional terms, to obtain a plurality of professional phrases corresponding to the question-answer word-segment group;
performing semantic analysis on the question-answer word-segment group to obtain a first semantic analysis result, and performing semantic analysis on each professional phrase to obtain a plurality of second semantic analysis results;
comparing the first semantic analysis result with each second semantic analysis result, and selecting, according to the result of the comparison, a synonym of each question-answer word segment in the question-answer word-segment group from the plurality of professional phrases;
using the selected synonyms as the professional question-answer word segments corresponding to the question-answer word segments, wherein the professional question-answer word segments comprise the professional inquiry word segments and the professional response word segments.
The computer-readable storage medium according to claim 15, wherein selecting, according to the result of the comparison, a synonym of each question-answer word segment in the question-answer word-segment group from the plurality of professional phrases comprises:
calculating, according to the result of the comparison, the degree of difference between the first semantic analysis result and each second semantic analysis result, and determining, according to the degree of difference, the similarity between the question-answer word-segment group and each professional phrase;
classifying the professional phrases according to the question-answer word segment to which the professional term in each phrase corresponds, to obtain professional phrases of a plurality of question-answer word-segment categories;
selecting, from the professional phrases of each question-answer word-segment category, the professional phrase with the greatest similarity, and using the professional term in the selected professional phrase that corresponds to the question-answer word-segment category as the synonym of the question-answer word segment.
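Two steps of the matching claims lend themselves to a short sketch: filtering professional terms by the edit distance between sound-shape codes (音形码), and keeping the highest-similarity candidate per word-segment category. The code below is an illustration under stated assumptions, not the applicant's implementation. The toy code strings stand in for real sound-shape codes (which encode both pronunciation and glyph shape), the similarity scores are made-up inputs rather than outputs of a semantic model, and the names `candidate_terms` and `best_per_category` are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidate_terms(segment_code: str,
                    term_codes: Dict[str, str],
                    threshold: int) -> List[str]:
    """Keep professional terms whose code is within the distance threshold."""
    return [term for term, code in term_codes.items()
            if edit_distance(segment_code, code) < threshold]


def best_per_category(scored: List[Tuple[str, str, float]]) -> Dict[str, str]:
    """For each category keep the term from the most similar phrase.

    Each item is (category, professional term, similarity of its phrase).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for category, term, similarity in scored:
        if category not in best or similarity > best[category][1]:
            best[category] = (term, similarity)
    return {category: term for category, (term, _) in best.items()}


# Toy stand-ins for sound-shape codes (assumed format, for illustration only).
term_codes = {"感冒": "gan3mao4", "肝炎": "gan1yan2"}
print(candidate_terms("gan3mao4", term_codes, threshold=2))

# Made-up similarity scores for phrases grouped by the segment they replace.
scores = [("seg1", "感冒", 0.9), ("seg1", "肝炎", 0.4), ("seg2", "发烧", 0.8)]
print(best_per_category(scores))
```

The strict less-than comparison mirrors the claim wording ("less than a preset edit-distance threshold"); how similarity is derived from the degree of difference is left open by the claims.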
The computer-readable storage medium according to any one of claims 13-16, wherein matching the treatment product information corresponding to the diagnostic statement using the preset prior medical knowledge base comprises:
performing a hierarchical traversal of the preset prior medical knowledge base using the diagnostic statement, and determining the diagnosis result corresponding to the diagnostic statement according to the result of the hierarchical traversal;
selecting, from the prior knowledge base, treatment product identification information that matches the diagnosis result, and obtaining the treatment product information mapped to the treatment product identification information, wherein the treatment product information comprises a recommendation link and summary information of a treatment product.
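The knowledge-base lookup in this claim can be sketched as a breadth-first (level-by-level) walk over a tree, followed by two dictionary lookups. Everything concrete here is an assumption made for illustration: the tree layout, the substring match used to decide that a node fits the diagnostic statement, the product identifiers, and the example.com links are all hypothetical, and the claims do not specify how the knowledge base is stored.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical knowledge base: a small category tree with diagnoses as leaves.
KNOWLEDGE_BASE: Dict = {
    "name": "root",
    "children": [
        {"name": "respiratory", "children": [
            {"name": "common cold", "children": []},
        ]},
        {"name": "digestive", "children": [
            {"name": "gastritis", "children": []},
        ]},
    ],
}

# Hypothetical mappings: diagnosis -> product IDs -> product information.
DIAGNOSIS_TO_PRODUCT_IDS: Dict[str, List[str]] = {
    "common cold": ["P001"],
    "gastritis": ["P002"],
}
PRODUCTS: Dict[str, Dict[str, str]] = {
    "P001": {"link": "https://example.com/products/P001",
             "summary": "hypothetical cold remedy"},
    "P002": {"link": "https://example.com/products/P002",
             "summary": "hypothetical antacid"},
}


def find_diagnosis(statement: str, root: Dict = KNOWLEDGE_BASE) -> Optional[str]:
    """Visit nodes level by level; return the first leaf named in the statement."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = node.get("children", [])
        if not children and node["name"] in statement:
            return node["name"]
        queue.extend(children)
    return None


def recommend(statement: str) -> List[Dict[str, str]]:
    """Map a diagnostic statement to product link and summary records."""
    diagnosis = find_diagnosis(statement)
    if diagnosis is None:
        return []
    product_ids = DIAGNOSIS_TO_PRODUCT_IDS.get(diagnosis, [])
    return [PRODUCTS[pid] for pid in product_ids if pid in PRODUCTS]


print(recommend("symptoms suggest common cold"))
```

The returned records carry the two fields the claim names, a recommendation link and summary information, which would then be combined with the response corpus before pushing.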
The computer-readable storage medium according to claim 13, wherein, after acquiring the inquiry corpus and the to-be-pushed response corpus corresponding to the inquiry corpus, the steps further comprise:
converting the inquiry corpus and the to-be-pushed response corpus into corresponding base-digit plaintext using a preset fully homomorphic encryption algorithm;
encrypting the base-digit plaintext to obtain an encrypted corpus, and calculating, according to a preset modulus, the ciphertext original code, ciphertext inverse code, and ciphertext complement code of the encrypted corpus;
performing a modular operation on the encrypted corpus using the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code to obtain a modulo-encrypted corpus, wherein the modulo-encrypted corpus comprises a first modulo-encrypted corpus corresponding to the inquiry corpus and a second encrypted corpus corresponding to the to-be-pushed response corpus;
using the first encrypted corpus as a new inquiry corpus, and using the second encrypted corpus as a new to-be-pushed response corpus.
An artificial intelligence-based response corpus generation apparatus, comprising:
a word segmentation module, configured to acquire an inquiry corpus and a to-be-pushed response corpus corresponding to the inquiry corpus, and to perform word segmentation on the inquiry corpus and the to-be-pushed response corpus respectively based on a preset linear-chain conditional random field, to obtain a plurality of inquiry word segments and a plurality of response word segments;
a semantic matching module, configured to perform professional-term semantic matching on the inquiry word segments and the response word segments respectively, to obtain professional inquiry word segments corresponding to the inquiry word segments and professional response word segments corresponding to the response word segments;
a question-answer matching module, configured to perform cross question-answer matching on each professional inquiry word segment and each professional response word segment in turn, and to combine the professional inquiry word segments and the professional response word segments according to the result of the cross question-answer matching, to obtain a diagnostic statement;
a combination module, configured to match the treatment product information corresponding to the diagnostic statement using a preset prior medical knowledge base, to combine the treatment product information with the to-be-pushed response corpus to obtain a new to-be-pushed response corpus, and to push the new to-be-pushed response corpus.
The artificial intelligence-based response corpus generation apparatus according to claim 19, further comprising an encryption module configured to:
convert the inquiry corpus and the to-be-pushed response corpus into corresponding base-digit plaintext using a preset fully homomorphic encryption algorithm;
encrypt the base-digit plaintext to obtain an encrypted corpus, and calculate, according to a preset modulus, the ciphertext original code, ciphertext inverse code, and ciphertext complement code of the encrypted corpus;
perform a modular operation on the encrypted corpus using the ciphertext original code, the ciphertext inverse code, and the ciphertext complement code to obtain a modulo-encrypted corpus, wherein the modulo-encrypted corpus comprises a first modulo-encrypted corpus corresponding to the inquiry corpus and a second encrypted corpus corresponding to the to-be-pushed response corpus;
use the first encrypted corpus as a new inquiry corpus, and use the second encrypted corpus as a new to-be-pushed response corpus.
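The encryption claims mention base-digit plaintext and the original, inverse, and complement codes (原码, 反码, 补码) under a preset modulus without defining them further. The sketch below only illustrates what those three codes conventionally look like for a non-negative integer under a modulus, together with a plain text-to-bits conversion. It performs no encryption, it is not a fully homomorphic scheme, and the interpretation of the codes as the value itself, its digit-wise complement, and its additive complement modulo the modulus is an assumption; the function names are hypothetical.

```python
from typing import Tuple


def to_bits(text: str) -> str:
    """Encode text as UTF-8 and write each byte as eight binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def codes(value: int, modulus: int) -> Tuple[int, int, int]:
    """Return (original, inverse, complement) of value under the modulus.

    original   : the value itself
    inverse    : (modulus - 1) - value, the ones'-complement analogue
    complement : (modulus - value) % modulus, the two's-complement analogue
    """
    if not 0 <= value < modulus:
        raise ValueError("value must satisfy 0 <= value < modulus")
    original = value
    inverse = (modulus - 1) - value
    complement = (modulus - value) % modulus
    return original, inverse, complement


print(to_bits("A"))
original, inverse, complement = codes(5, 16)
print(original, inverse, complement)
# The complement is the additive inverse modulo the modulus.
print((original + complement) % 16)
```

With a power-of-two modulus these reduce to the familiar fixed-width binary representations; any other preset modulus works the same way arithmetically.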
PCT/CN2022/088893 2021-09-09 2022-04-25 Answer corpus generation method based on artificial intelligence, and related device Ceased WO2023035623A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111055021.X 2021-09-09
CN202111055021.XA CN113742454B (en) 2021-09-09 2021-09-09 Response corpus generation method based on artificial intelligence and related equipment

Publications (1)

Publication Number Publication Date
WO2023035623A1 true WO2023035623A1 (en) 2023-03-16

Family

ID=78737446

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088893 Ceased WO2023035623A1 (en) 2021-09-09 2022-04-25 Answer corpus generation method based on artificial intelligence, and related device

Country Status (2)

Country Link
CN (1) CN113742454B (en)
WO (1) WO2023035623A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116313162A (en) * 2023-05-12 2023-06-23 北京梆梆安全科技有限公司 Medical inquiry system based on AI model
CN116992011A (en) * 2023-08-15 2023-11-03 浙商证券股份有限公司 Method, system and device for service data matching query
CN118278406A (en) * 2024-04-29 2024-07-02 上海信产管理咨询有限公司 Communication engineering record file information processing method, device and storage medium
CN118690000A (en) * 2024-08-26 2024-09-24 吉林大学第一医院 An emergency triage question-answering system based on knowledge graph
CN119670869A (en) * 2025-02-21 2025-03-21 北京融威众邦科技股份有限公司 Corpus construction method and system of medical question-answering large model

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742454B (en) * 2021-09-09 2023-07-21 平安科技(深圳)有限公司 Response corpus generation method based on artificial intelligence and related equipment
CN114297693B (en) * 2021-12-30 2022-11-18 北京海泰方圆科技股份有限公司 Model pre-training method and device, electronic equipment and storage medium
CN114861080B (en) * 2022-05-12 2025-05-02 平安科技(深圳)有限公司 Question and answer corpus recommendation method, device, computer equipment and storage medium
CN116775833A (en) * 2023-06-20 2023-09-19 平安科技(深圳)有限公司 Information complement method, device, equipment and medium suitable for inquiry

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101178197B1 (en) * 2011-03-17 2012-08-29 김지만 System for advertising medicine
US20170116384A1 (en) * 2015-10-21 2017-04-27 Jamal Ghani Systems and methods for computerized patient access and care management
CN109817351A (en) * 2019-01-31 2019-05-28 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, equipment and storage medium
CN112509682A (en) * 2020-12-15 2021-03-16 康键信息技术(深圳)有限公司 Text recognition-based inquiry method, device, equipment and storage medium
CN113742454A (en) * 2021-09-09 2021-12-03 平安科技(深圳)有限公司 Response corpus generation method based on artificial intelligence and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026388A (en) * 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
CN110781677B (en) * 2019-10-12 2023-02-07 深圳平安医疗健康科技服务有限公司 Medicine information matching processing method and device, computer equipment and storage medium
CN111695343A (en) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Wrong word correcting method, device, equipment and storage medium
CN112287080B (en) * 2020-10-23 2023-10-03 平安科技(深圳)有限公司 Method and device for rewriting problem statement, computer device and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116313162A (en) * 2023-05-12 2023-06-23 北京梆梆安全科技有限公司 Medical inquiry system based on AI model
CN116313162B (en) * 2023-05-12 2023-08-18 北京梆梆安全科技有限公司 Medical inquiry system based on AI model
CN116992011A (en) * 2023-08-15 2023-11-03 浙商证券股份有限公司 Method, system and device for service data matching query
CN116992011B (en) * 2023-08-15 2024-09-13 浙商证券股份有限公司 Method, system and device for service data matching query
CN118278406A (en) * 2024-04-29 2024-07-02 上海信产管理咨询有限公司 Communication engineering record file information processing method, device and storage medium
CN118690000A (en) * 2024-08-26 2024-09-24 吉林大学第一医院 An emergency triage question-answering system based on knowledge graph
CN119670869A (en) * 2025-02-21 2025-03-21 北京融威众邦科技股份有限公司 Corpus construction method and system of medical question-answering large model

Also Published As

Publication number Publication date
CN113742454B (en) 2023-07-21
CN113742454A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
WO2023035623A1 (en) Answer corpus generation method based on artificial intelligence, and related device
CN113707300B (en) Search intention recognition method, device, equipment and medium based on artificial intelligence
CN110032648B (en) Medical record structured analysis method based on medical field entity
CN112447300B (en) Medical query method and device based on graph neural network, computer equipment and storage medium
WO2021139424A1 (en) Text content quality evaluation method, apparatus and device, and storage medium
CN117407541A (en) Knowledge graph question-answering method based on knowledge enhancement
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN112035627B (en) Automatic question and answer method, device, equipment and storage medium
CN110569343B (en) Clinical text structuring method based on question and answer
CN111048167A (en) Hierarchical case structuring method and system
CN110598786A (en) Neural network training method, semantic classification method and semantic classification device
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN113724830A (en) Medicine taking risk detection method based on artificial intelligence and related equipment
CN115132372A (en) Term processing method, apparatus, electronic device, storage medium, and program product
CN110322959A (en) A kind of Knowledge based engineering depth medical care problem method for routing and system
CN112199958A (en) Concept word sequence generation method and device, computer equipment and storage medium
CN116522944A (en) Picture generation method, device, equipment and medium based on multi-head attention
CN108664464A (en) A kind of the determination method and determining device of semantic relevancy
CN116975212A (en) Method, device, computer equipment and storage medium for finding answers to question text
CN113704481B (en) Text processing method, device, equipment and storage medium
CN111680515B (en) Answer determination method and device based on AI (Artificial Intelligence) recognition, electronic equipment and medium
CN109284491A (en) Medical text recognition method, sentence recognition model training method
EP3901875A1 (en) Topic modelling of short medical inquiries
WO2021042517A1 (en) Artificial intelligence-based article gist extraction method and device, and storage medium
CN115858724A (en) Question and answer processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22866101; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 22866101; Country of ref document: EP; Kind code of ref document: A1)