WO2019114430A1 - Natural language question understanding method and apparatus, and electronic device - Google Patents

Natural language question understanding method and apparatus, and electronic device

Info

Publication number
WO2019114430A1
WO2019114430A1 PCT/CN2018/112115 CN2018112115W WO2019114430A1 WO 2019114430 A1 WO2019114430 A1 WO 2019114430A1 CN 2018112115 W CN2018112115 W CN 2018112115W WO 2019114430 A1 WO2019114430 A1 WO 2019114430A1
Authority
WO
WIPO (PCT)
Prior art keywords
natural language
question information
language question
parsing unit
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/112115
Other languages
French (fr)
Chinese (zh)
Inventor
王碧波
董雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present disclosure relates to the field of natural language processing technologies, and in particular, to a method, device, and electronic device for understanding natural language questions.
  • Natural language processing has long been a subject of human interest and research. Currently, this technology is mainly used in fields such as multi-language translation and information query, and has made good progress. However, there is no precedent for applying natural language processing directly to data analysis.
  • Natural language processing is divided into many different technical schools. Initially, methods based on formal languages were mainstream, but this technical route cannot handle richly varied expressions; it can only mechanically translate or generate language according to fixed templates or rules, and the results are very stiff. Later, statistical theory was introduced into language processing. For example, most machine translation systems, such as Google Translate and Baidu Translate, were developed on this basis. Natural language processing based on statistical theory can effectively use large corpora to train a model and thereby learn many forms of language expression, and it currently performs very well in multilingual translation. However, this technical route still has the defect that its recognition accuracy needs to be improved.
  • the present disclosure provides a method, device, and electronic device for understanding natural language questions.
  • the present disclosure provides a method for understanding natural language questions, including:
  • the natural language question information is question information related to the data query
  • the data is retrieved from the preset knowledge base according to the query instruction, and the data result corresponding to the natural language question information is obtained; the preset knowledge base is generated according to the database data provided by the user, the input information data of the user, and/or the third party data.
  • parsing the natural language question information to obtain a minimum parsing unit includes:
  • the entity noun recognition is performed on a plurality of segmentation segments to obtain a minimum parsing unit; the minimum parsing unit includes: an attribute minimum parsing unit, a metric minimum parsing unit, and a time-space modified structural word.
  • the present disclosure provides a second possible implementation manner of the first aspect, wherein the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier.
  • the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs;
  • the calculation operation item in the attribute minimum parsing unit represents classification, grouping, segmentation, partial value, counting, ranking by pinyin, and ranking by surname stroke count;
  • the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships of similarity, dissimilarity, inclusion, non-inclusion, and comparison.
  • the metric item in the metric minimum analytic unit represents a numerical value
  • the metric logical relationship item in the metric minimum analytic unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to
  • the calculation modifier in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, ranking, maximum, minimum, top N, and bottom N.
  • the space-time modified structure word represents at least one of the following: a time period, a time point, before a certain time point, after a certain time point, and between one time point and another; outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place. For example: near a certain place; within a given number of kilometers, miles, or meters of a place; or to the east, south, west, or north of a place.
  • the present disclosure provides a third possible implementation manner of the first aspect, wherein the generating the query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set includes:
  • the corresponding instructions are extracted from the preset instruction set and combined to generate a query instruction corresponding to the natural language question information.
  • the data query logic included in the natural language question information specifically:
  • the query logic corresponding to the natural language question information is obtained.
  • the present disclosure provides a fourth possible implementation manner of the first aspect, wherein the method is performed after performing a retrieval from a preset knowledge base according to a query instruction to obtain a data result corresponding to the natural language question information Also includes:
  • the knowledge base sample data includes: user-provided database data, user input information data, and/or third-party data;
  • the present disclosure provides a fifth possible implementation manner of the first aspect, wherein, after the retrieval is performed from the preset knowledge base according to the query instruction and the data result corresponding to the natural language question information is obtained, the method also includes:
  • the natural language question information and its corresponding data result are added to the preset knowledge base.
  • the present disclosure provides an apparatus for understanding natural language questions, including:
  • the information acquisition module is configured to obtain natural language question information input by the user end; the natural language question information is question information related to the data query;
  • An information parsing module configured to parse natural language question information to obtain a minimum parsing unit
  • An instruction generating module configured to generate a query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set;
  • a retrieval module configured to retrieve from the preset knowledge base according to the query instruction, and obtain a data result corresponding to the natural language question information; the preset knowledge base is generated according to the database data provided by the user, the input information data of the user, and/or the third party data.
  • the information parsing module includes:
  • a word segmentation module for performing word segmentation on natural language question information to obtain a plurality of word segmentation segments
  • the identification module is configured to perform entity noun recognition on the plurality of word segments to obtain a minimum parsing unit; the minimum parsing unit comprises: an attribute minimum parsing unit, a metric minimum parsing unit, and a time-space modified structural word.
  • the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item;
  • the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modification item.
  • the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs;
  • the calculation operation item in the attribute minimum parsing unit represents classification, grouping, segmentation, partial value, counting, ranking by pinyin, and ranking by surname stroke count;
  • the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships of similarity, dissimilarity, inclusion, non-inclusion, and comparison.
  • the segmentation may include cutting off the head, cutting off the tail, and the like. For a string, a partial value may be, for example, the n-th to m-th letters.
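  • The segmentation and partial-value operations on string attributes can be illustrated with a minimal sketch. The function names `cut_head`, `cut_tail`, and `partial_value` are hypothetical, and reading "cutting the head" as dropping leading characters and "n-th to m-th" as 1-indexed and inclusive is an assumption, since the disclosure does not define these details.

```python
def cut_head(value: str, n: int) -> str:
    """Drop the first n characters (assumed meaning of 'cutting the head')."""
    return value[n:]


def cut_tail(value: str, n: int) -> str:
    """Drop the last n characters (assumed meaning of 'cutting the tail')."""
    return value[:-n] if n > 0 else value


def partial_value(value: str, n: int, m: int) -> str:
    """Return the n-th to m-th characters, assuming 1-indexed inclusive bounds."""
    return value[n - 1:m]


code = "ABC123"
head_removed = cut_head(code, 3)
tail_removed = cut_tail(code, 3)
middle = partial_value(code, 2, 4)
```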
  • the metric item in the metric minimum analytic unit represents a numerical value
  • the metric logical relationship item in the metric minimum analytic unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to
  • the calculation modifier in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, ranking, maximum, minimum, top N, and bottom N.
  • the space-time modified structure word represents at least one of the following: a time period, a time point, before a certain time point, after a certain time point, and between one time point and another; outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place.
  • the instruction generating module is specifically configured to generate, according to the minimum parsing unit and the preset instruction set, a query instruction corresponding to the natural language question information by using the following steps:
  • corresponding instructions are extracted from the preset instruction set and combined to generate a query instruction corresponding to the natural language question information.
  • the instruction generating module is specifically configured to infer, according to the minimum parsing unit, data query logic included in the natural language question information by:
  • the query logic corresponding to the natural language question information is obtained.
  • the present disclosure further provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the method of the first aspect when executing the computer program.
  • the present disclosure also provides a computer readable medium having processor-executable non-volatile program code, the program code causing a processor to perform the method of the first aspect.
  • the natural language question information is parsed to obtain a minimum parsing unit, and the query statement of the natural language question information is then constructed based on the minimum parsing unit and the preset instruction set. Retrieval is then performed from the pre-established knowledge base according to the query statement, and the data result corresponding to the question information is obtained. Because the knowledge base is established based on the database data provided by the user, the input information data of the user, and/or third-party data, accurate, calculated data results can be provided for the question information, so that the method can be applied to professional scenarios such as data analysis.
  • FIG. 1 is a flowchart of a method for understanding a natural language question provided by the present disclosure
  • FIG. 3 is a flowchart of another method for understanding natural language questions provided by the present disclosure.
  • FIG. 5 is a flowchart of another method for understanding natural language questions provided by the present disclosure.
  • FIG. 6 is a schematic structural diagram of an apparatus for understanding natural language questions provided by the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device according to the present disclosure.
  • the existing natural language processing methods have the defects that the recognition accuracy needs to be improved.
  • Research has found that the recognition accuracy needs improvement mainly in the following respects: (1) in scenarios without a large accumulated corpus, recognition performance drops greatly; (2) a model trained by statistical methods lacks precision and has difficulty expressing or resolving exact meaning. Therefore, such methods cannot be applied to some highly specialized scenarios, such as the field of data analysis. Based on this, the method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure can accurately identify the user's natural language question information and match it with high-accuracy data results, and can be applied to professional scenarios such as data analysis.
  • Embodiment 1:
  • the present disclosure provides a method for understanding a natural language question, which can be applied to a relatively professional scene such as a data analysis field, and can be executed by an electronic device having a data processing function or the like.
  • the method includes the following steps:
  • S101 Acquire natural language question information input by the user end; the natural language question information is question information related to the data query.
  • the user can input the natural language question information to the software system by voice or by typing, for example, "the average age of Vietnamese males who were older than 60 (years old) before 2016 and whose body weight is less than 70 kg".
  • the above natural language question information is finally obtained by the server in the form of text, and the natural language question information is question information related to the data query.
  • the electronic device may also pre-store a plurality of natural language question information and hierarchically store a plurality of natural language question information.
  • For example, a first data analysis scenario, a second data analysis scenario, ..., and an M-th data analysis scenario may be set, where M is an integer greater than 3.
  • Each data analysis scenario can correspond to different sub-scenarios, and each sub-scenario corresponds to different natural language question information.
  • the electronic device displays each data analysis scenario to the user, and the user chooses among the data analysis scenarios according to the data required and selects the desired natural language question information for input.
  • the question information is parsed to obtain the minimum parsing unit.
  • the specific parsing process includes the following steps, as shown in Figure 2:
  • S201 Perform word segmentation on the natural language question information to obtain a plurality of word segmentation segments.
  • the natural language question information is first subjected to word segmentation, that is, entity boundary recognition, which splits the natural language question information into a plurality of word segments, for example, for the above question about the average age of Vietnamese males surnamed Wang who were older than 60 (years old) before 2016 and weigh less than 70 kg.
  • word segmentation processing can be performed on natural language question information based on dictionary word segmentation algorithms such as forward maximum matching method, inverse maximum matching method and two-way matching word segmentation method.
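  • As an illustration of dictionary-based segmentation, the following is a minimal sketch of the forward maximum matching method. The dictionary contents and the Chinese rendering of the sample question ("box office income of action movies in North China in August 2017") are assumptions made for the example; the disclosure does not specify a dictionary or an implementation.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over a dictionary of known words."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical toy dictionary; a real system would load a much larger one.
DICTIONARY = {
    "2017年", "8月", "华北", "华北地区", "地区",
    "动作", "动作电影", "电影", "票房", "票房收入", "收入",
}

segments = forward_max_match("2017年8月华北地区动作电影票房收入", DICTIONARY)
```

  • Because the longest match is tried first, "华北地区" (North China region) is kept as one segment rather than being split into "华北" and "地区". The inverse maximum matching method scans from the end of the string in the same way.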
  • the natural language question information can be segmented based on a word segmentation tool such as a Stanford word segmentation tool or a Hanlp word segmentation tool.
  • Alternatively, word segmentation can be performed on the natural language question information based on a Hidden Markov Model (HMM), a Conditional Random Field (CRF), a Support Vector Machine (SVM), or deep learning. Word segmentation can also be performed based on a combination of one or more of the above approaches.
  • S202 Perform entity noun recognition on the plurality of word segments to obtain a minimum parsing unit.
  • the minimum parsing unit includes: an attribute minimum parsing unit, a metric minimum parsing unit, and a time-space modified structural word.
  • the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item;
  • the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier;
  • the structural word includes a space-time modifier.
  • the attribute item in the attribute minimum parsing unit indicates the category or entity to which the natural language question information belongs;
  • the calculation operation item in the attribute minimum parsing unit indicates classification, grouping, segmentation, partial value, counting, and pinyin ranking.
  • the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships of similarity, dissimilarity, inclusion, and non-inclusion; the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, or less than or equal to.
  • the calculation modifier in the metric minimum parsing unit represents summation, average (the expected usage is "average xxx" rather than "the average of xxx"), count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, ranking, maximum value, minimum value, top N, bottom N, etc., where N is an integer greater than or equal to 1; the space-time modified structure word indicates a time period, a time point, or a word describing time such as before, after, or between certain time points, as well as a place within a preset distance range, a place outside a preset distance range, or a place in a certain direction.
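  • Several of the calculation modifiers listed above map directly onto standard aggregate functions. The sketch below is illustrative only: the table name `CALCULATED_MODIFIERS`, the use of population (rather than sample) standard deviation and variance, the growth-rate formula, and the reading of the two N-based modifiers as the N largest and N smallest values are all assumptions not stated in the disclosure.

```python
import statistics


def growth_rate(previous, current):
    """Relative change between two periods (assumed definition)."""
    return (current - previous) / previous


def top_n(values, n):
    """The n largest values, in descending order."""
    return sorted(values, reverse=True)[:n]


def bottom_n(values, n):
    """The n smallest values, in ascending order."""
    return sorted(values)[:n]


# Hypothetical lookup from a modifier name to the function that computes it.
CALCULATED_MODIFIERS = {
    "sum": sum,
    "average": statistics.mean,
    "count": len,
    "standard deviation": statistics.pstdev,
    "variance": statistics.pvariance,
    "maximum": max,
    "minimum": min,
}

ages = [61, 65, 72, 68]
average_age = CALCULATED_MODIFIERS["average"](ages)
age_variance = CALCULATED_MODIFIERS["variance"](ages)
monthly_growth = growth_rate(200, 250)
```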
  • the space-time modified structure word may also be a combination of the above-mentioned modifiers indicating time and space, such as within a certain distance of a certain time period.
  • the core criterion for identifying the attribute minimum parsing unit, the metric minimum parsing unit, and the space-time modified structure word is whether the word obtained by entity noun recognition can be calculated. For example, the words above that express greater than, less than, inclusion, non-inclusion, standard deviation, and growth rate are computable words. Performing the subsequent steps on the basis of these computable words can improve the accuracy of understanding natural language question information and yield more accurate data results, providing better reference data for the field of data analysis.
  • the minimum parsing unit corresponding to the natural language question information can be determined.
  • the question information is “box office income of action movies in North China in August 2017”.
  • the extracted word segments are: time (August 2017), region (North China region), action movie, box office income.
  • time (August 2017) is a space-time modified structure word
  • the region (North China region) belongs to the attribute minimum parsing unit
  • the action movie belongs to the attribute minimum parsing unit
  • the income belongs to the metric minimum parsing unit.
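  • The classification in this example can be sketched as a lookup from word segments to the three kinds of minimum parsing unit. The names `UnitKind`, `ParsingUnit`, `LEXICON`, and `recognize_units` are hypothetical, and a fixed lexicon stands in for entity noun recognition, whose actual mechanism the disclosure leaves open.

```python
from dataclasses import dataclass
from enum import Enum


class UnitKind(Enum):
    ATTRIBUTE = "attribute minimum parsing unit"
    METRIC = "metric minimum parsing unit"
    SPACE_TIME = "space-time modified structure word"


@dataclass(frozen=True)
class ParsingUnit:
    text: str
    kind: UnitKind


# Hypothetical lexicon standing in for entity noun recognition.
LEXICON = {
    "August 2017": UnitKind.SPACE_TIME,
    "North China": UnitKind.ATTRIBUTE,
    "action movie": UnitKind.ATTRIBUTE,
    "box office income": UnitKind.METRIC,
}


def recognize_units(segments):
    """Tag each known segment with its kind; unknown segments are skipped."""
    return [ParsingUnit(s, LEXICON[s]) for s in segments if s in LEXICON]


units = recognize_units(
    ["August 2017", "North China", "action movie", "box office income"]
)
kinds = [unit.kind.name for unit in units]
```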
  • S103 Generate a query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set.
  • the process of generating the query instruction includes the following steps, as shown in FIG. 3:
  • S301 Infer the data query logic included in the natural language question information according to the minimum parsing unit.
  • For example, the obtained minimum parsing units include North China (attribute minimum parsing unit), action movie (attribute minimum parsing unit), and income (metric minimum parsing unit), together with the space-time modified structure word August 2017; the data query logic contained in the question information is then determined from these words.
  • the combination logic in the instruction set is determined according to the meaning and order of the minimum parsing unit.
  • the category or entity to which the natural language question information belongs may be obtained according to the attribute item in the attribute minimum parsing unit, thereby obtaining the category of the natural language question information.
  • the query logic corresponding to each category of natural language question information may be preset.
  • the data query logic corresponding to the natural language question information with different problem categories is different.
  • the question categories of natural language question information include a quantity class, a ratio class, a ranking class, a relationship class, and so on.
  • the quantity class refers to the value of a metric or attribute in a certain time period or at a certain time point;
  • the ratio class refers to the ratio of the counts of a certain metric or attribute in different time periods, or the ratio of different metrics in the same time period; the ranking class refers to sorting columns (metrics or attributes) according to a certain dimension (the dimension is finally parsed into a filter condition); the relationship in the relationship class is either predefined or learned by the program through data training, for example, most dangerous, most valuable, most relevant, nearby, similar, and so on.
  • the logic corresponding to the quantity class is: time period or time point
  • the mode is: time period
  • Query logic is: time period or time point
  • the mode is: time period
  • S302 According to the data query logic, extract corresponding instructions from the preset instruction set to combine, and generate a query instruction corresponding to the natural language question information.
  • the query request is composed of predefined instruction fragments or instruction sets; by parsing these instruction sets, the database request is composed, and the data are filtered, calculated, and returned to answer the question.
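  • One way to picture combining predefined instruction fragments into a database request is the sketch below. It assumes the instruction set consists of SQL fragment templates and that the data sit in a relational table; the disclosure does not say either, and `INSTRUCTION_SET`, `build_query`, the `box_office` table, and its rows are all hypothetical. Column and table names are taken from a trusted lexicon here, while values are passed as bound parameters.

```python
import sqlite3

# Hypothetical preset instruction set: each entry is a SQL fragment template.
INSTRUCTION_SET = {
    "sum": "SELECT SUM({metric}) FROM {table}",
    "average": "SELECT AVG({metric}) FROM {table}",
    "equals": "{column} = ?",
}


def build_query(modifier, metric, table, filters):
    """Combine instruction fragments into one parameterized SQL query."""
    sql = INSTRUCTION_SET[modifier].format(metric=metric, table=table)
    conditions = [INSTRUCTION_SET["equals"].format(column=c) for c in filters]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, list(filters.values())


# Made-up sample data for the box office example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_office (month TEXT, region TEXT, genre TEXT, income REAL)"
)
conn.executemany(
    "INSERT INTO box_office VALUES (?, ?, ?, ?)",
    [
        ("2017-08", "North China", "action", 120.0),
        ("2017-08", "North China", "action", 80.0),
        ("2017-08", "South China", "action", 50.0),
        ("2017-07", "North China", "action", 30.0),
    ],
)

sql, params = build_query(
    "sum",
    "income",
    "box_office",
    {"month": "2017-08", "region": "North China", "genre": "action"},
)
total = conn.execute(sql, params).fetchone()[0]
```

  • In this sketch the space-time modified structure word and the two attribute units become filter conditions, while the metric unit and its calculation modifier select the aggregate.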
  • S104 Perform a retrieval from a preset knowledge base according to the query instruction, and obtain a data result corresponding to the natural language question information.
  • the knowledge base sample data includes: user-provided database data, user input information data, and/or third-party data.
  • S402 Generate a preset knowledge base according to the knowledge base sample data.
  • the specific knowledge base generation process is not specifically limited.
  • a deep learning model based on a convolutional neural network may be used for establishment, or other methods may be employed.
  • the method provided by the present disclosure further includes the following steps, as shown in FIG. 5:
  • the data in the preset knowledge base can be continuously updated, so that the content in the preset knowledge base is more and more rich, so that the query accuracy of the question information can be continuously improved.
  • the resulting data result changes with the query time and the knowledge base; it is not a pre-stored answer. For example, if the preset knowledge base stores different data results corresponding to the same natural language question information, the time at which each data result was obtained may be recorded in the preset knowledge base. After the natural language question information input by the user is obtained, all data results corresponding to that natural language question information are looked up in the preset knowledge base, and the data result obtained most recently is selected as the final data result corresponding to the natural language question information.
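  • Selecting the most recently obtained data result can be sketched as follows. The record layout (`question`, `result`, `obtained_at`) and the function name `latest_result` are hypothetical, and the sample records are made up for illustration.

```python
from datetime import datetime


def latest_result(records, question):
    """Return the most recently obtained result for a question, or None."""
    candidates = [r for r in records if r["question"] == question]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["obtained_at"])["result"]


# Made-up knowledge base records with the time each result was obtained.
records = [
    {"question": "q1", "result": 100, "obtained_at": datetime(2017, 8, 1)},
    {"question": "q1", "result": 120, "obtained_at": datetime(2017, 9, 1)},
    {"question": "q2", "result": 7, "obtained_at": datetime(2017, 10, 1)},
]

final = latest_result(records, "q1")
```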
  • Alternatively, if the preset knowledge base stores different data results corresponding to the same natural language question information, all data results corresponding to the natural language question information may be retrieved from the preset knowledge base and output to the user, and the data result that the user selects from among them is recorded. The number of times each data result has been selected is counted, and once a data result has been selected more times than the set maximum threshold, that data result is used as the final data result corresponding to the natural language question information.
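  • The selection-count rule can be sketched with a simple counter. The function name `final_result` and the representation of user choices as a list are assumptions; the disclosure does not say how ties or an unmet threshold are handled, so this sketch returns `None` when no result has been selected more than `threshold` times.

```python
from collections import Counter


def final_result(selections, threshold):
    """Return the result chosen more than `threshold` times, else None."""
    if not selections:
        return None
    result, count = Counter(selections).most_common(1)[0]
    return result if count > threshold else None


# Made-up history of which data result users picked for one question.
history = ["result A", "result B", "result A", "result A"]
chosen = final_result(history, threshold=2)
```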
  • the natural language question information is first parsed to obtain a minimum parsing unit, and then the query statement of the natural language question information is constructed based on the minimum parsing unit and the preset instruction set. Then, the data is retrieved from the pre-established knowledge base according to the query statement, and the data result corresponding to the question information is obtained.
  • the knowledge base is established based on user-provided database data, user input information data, and/or third-party data. It can provide accurate and calculated data results for the question information, so that the method can be applied to professional scenes such as data analysis.
  • Embodiment 2:
  • the apparatus includes an information acquisition module 61, an information analysis module 62, an instruction generation module 63, and a retrieval module 64.
  • the information acquisition module 61 is configured to obtain the natural language question information input by the user end, the natural language question information being question information related to the data query; the information parsing module 62 is configured to parse the natural language question information to obtain a minimum parsing unit; the instruction generating module 63 is configured to generate a query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set; the retrieval module 64 is configured to perform retrieval from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information; the preset knowledge base is generated based on user-provided database data, user input information data, and/or third-party data.
  • the information parsing module 62 further includes:
  • the word segmentation module 621 is configured to perform word segmentation processing on the natural language question information to obtain a plurality of word segments; the recognition module 622 is configured to perform entity noun recognition on the plurality of word segments to obtain a minimum parsing unit; the minimum parsing unit includes: an attribute minimum parsing unit, a metric minimum parsing unit, and a time-space modified structural word.
  • each module has the same technical features as the above-described method of understanding natural language questions, and therefore, the above functions can also be realized.
  • For the specific working process of each module in the device, refer to the foregoing method embodiment; details are not described herein again.
  • the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item;
  • the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modification item.
  • the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs;
  • the calculation operation item in the attribute minimum parsing unit represents classification, grouping, segmentation, partial value, counting, ranking by pinyin, and ranking by surname stroke count;
  • the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships of similarity, dissimilarity, inclusion, non-inclusion, and comparison.
  • the metric item in the metric minimum analytic unit represents a numerical value
  • the metric logical relationship item in the metric minimum analytic unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to
  • the calculation modifier in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, ranking, maximum, minimum, top N, and bottom N.
  • the space-time modified structure word represents at least one of the following: a time period, a time point, before a certain time point, after a certain time point, and between one time point and another.
  • the instruction generating module 63 is configured to generate, based on the minimum parsing unit and the preset instruction set, a query instruction corresponding to the natural language question information by: inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information; and extracting, according to the data query logic, corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.
  • the instruction generating module 63 is specifically configured to infer, according to the minimum parsing unit, the data query logic contained in the natural language question information by: obtaining a category corresponding to the natural language question information; and obtaining, according to the obtained category, the query logic corresponding to the natural language question information.
  • Embodiment 3:
  • the present disclosure also provides an electronic device.
  • the electronic device includes a processor 70, a memory 71, a bus 72, and a communication interface 73.
  • the processor 70, the communication interface 73, and the memory 71 are connected by a bus 72.
  • the processor 70 is operative to execute executable modules, such as computer programs, stored in the memory 71.
  • the steps of the method as described in the method embodiments are implemented when the processor executes a computer program. That is, each method step in Embodiment 1 can be performed by the processor 70.
  • the memory 71 may include a high speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
  • the communication connection between the system network element and at least one other network element is implemented through at least one communication interface 73 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
  • the bus 72 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one double-headed arrow is shown in Figure 7, but it does not mean that there is only one bus or one type of bus.
  • the memory 71 is used to store a program, and the processor 70 executes the program after receiving the execution instruction, and the method executed by the device defined by the flow process disclosed in any of the foregoing embodiments may be applied to the processor 70.
  • Processor 70 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 70 or an instruction in the form of software.
  • the processor 70 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logic blocks in this disclosure may be implemented or performed.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the method provided in connection with the present disclosure may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor.
  • the software modules can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 71, and the processor 70 reads the information in the memory 71 and performs the steps of the above method in combination with its hardware.
  • the present disclosure also provides a computer program product for the method of understanding natural language questions, comprising a computer-readable storage medium storing non-volatile program code executable by a processor, the program code comprising instructions operable to perform the method described in the foregoing method embodiments.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-transitory computer readable storage medium executable by a processor.
  • a computer device which may be a personal computer, server, or network device, etc.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
  • the method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure can provide accurate data results, obtained through calculation and statistics, for question information, and can accurately recognize natural language question information, so they can be applied to specialized scenarios such as data analysis.

Abstract

The present disclosure relates to the technical field of natural language processing. Provided are a natural language question understanding method and apparatus, and an electronic device. The natural language question understanding method comprises: obtaining natural language question information input by a user terminal, the natural language question information being question information related to data query; parsing the natural language question information to obtain a minimum parsing unit; generating, on the basis of the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and retrieving, according to the query instruction, from a preset knowledge base a data result corresponding to the natural language question information, the preset knowledge base being generated according to database data provided by a user, input information data of the user, and/or third-party data. The method can accurately recognize natural language question information.

Description

Method, apparatus, and electronic device for understanding natural language questions

Cross-reference to related applications

The present disclosure claims priority to Chinese Patent Application No. CN2017113616797, entitled "Method, Apparatus, and Electronic Device for Understanding Natural Language Questions", filed with the Chinese Patent Office on December 15, 2017, the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of natural language processing, and in particular to a method, apparatus, and electronic device for understanding natural language questions.

Background

Natural language processing is a technology that has long attracted attention and study. At present it is mainly applied in fields such as multilingual translation and information query, and it has progressed well in all of them; however, there is as yet no precedent for applying natural language processing directly to data analysis.

Natural language processing is divided into several technical schools. Initially, methods based on formal languages were mainstream, but this approach cannot handle varied expression; it can only translate or generate language mechanically according to pre-written templates or rules, and the results are stiff. Later, approaches that introduce statistical theory into language processing emerged; for example, most current machine translation systems, such as Google Translate and Baidu Translate, are developed on the basis of such systems. Statistical natural language processing methods can make effective use of large corpora to train models and thereby learn the varied forms of linguistic expression, and they currently perform well in multilingual translation. However, the recognition accuracy of this approach still needs improvement.

Summary of the invention

In view of this, the present disclosure provides a method, apparatus, and electronic device for understanding natural language questions.

In a first aspect, the present disclosure provides a method for understanding natural language questions, including:

obtaining natural language question information input at a user terminal, the natural language question information being question information related to a data query;

parsing the natural language question information to obtain a minimum parsing unit;

generating, based on the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and

retrieving, according to the query instruction, a data result corresponding to the natural language question information from a preset knowledge base, the preset knowledge base being generated from database data provided by the user, input information data of the user, and/or third-party data.

With reference to the first aspect, the present disclosure provides a first possible implementation of the first aspect, wherein parsing the natural language question information to obtain a minimum parsing unit specifically includes:

performing word segmentation on the natural language question information to obtain a plurality of word segments; and

performing entity noun recognition on the plurality of word segments to obtain the minimum parsing unit, the minimum parsing unit including: an attribute minimum parsing unit, a metric minimum parsing unit, and a spatiotemporal modifier structure word.

With reference to the first aspect, the present disclosure provides a second possible implementation of the first aspect, wherein the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

Optionally, the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents a manner of classification, grouping, segmentation, partial value extraction, counting, ranking by pinyin, or ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents a logical relationship of similarity, dissimilarity, inclusion, non-inclusion, contrast, or comparison.

Optionally, the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, or not equal to; and the calculation modifier item in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, or bottom N.

Optionally, the spatiotemporal modifier structure word represents at least one of the following: a time period, a point in time, before a given point, after a given point, and between two given points; and outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place. Examples include near a place; within or beyond a preset distance from a place, expressed in kilometers, miles, meters, and so on; and a relative position to the east, south, west, or north of a place.
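The taxonomy described above can be modelled as a small data structure. The following Python sketch is illustrative only: the names `UnitKind`, `Role`, and `MinimumParsingUnit`, and the split of the spatiotemporal words into `TIME` and `PLACE`, are assumptions made for this example and do not appear in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class UnitKind(Enum):
    """The three kinds of minimum parsing unit named in the disclosure."""
    ATTRIBUTE = "attribute"
    METRIC = "metric"
    SPATIOTEMPORAL = "spatiotemporal"


class Role(Enum):
    """Item types inside each kind (identifier names are hypothetical)."""
    ATTRIBUTE_ITEM = "attribute_item"
    CALCULATION_OPERATION = "calculation_operation"
    ATTRIBUTE_RELATION = "attribute_logical_relationship"
    METRIC_ITEM = "metric_item"
    METRIC_RELATION = "metric_logical_relationship"
    CALCULATION_MODIFIER = "calculation_modifier"
    TIME = "time"
    PLACE = "place"


# Which item types are valid for which kind of unit.
ALLOWED_ROLES = {
    UnitKind.ATTRIBUTE: {
        Role.ATTRIBUTE_ITEM, Role.CALCULATION_OPERATION, Role.ATTRIBUTE_RELATION,
    },
    UnitKind.METRIC: {
        Role.METRIC_ITEM, Role.METRIC_RELATION, Role.CALCULATION_MODIFIER,
    },
    UnitKind.SPATIOTEMPORAL: {Role.TIME, Role.PLACE},
}


@dataclass(frozen=True)
class MinimumParsingUnit:
    kind: UnitKind
    role: Role
    text: str  # the word segment this unit was recognized from

    def __post_init__(self):
        # Reject combinations such as an attribute unit carrying a metric item.
        if self.role not in ALLOWED_ROLES[self.kind]:
            raise ValueError(f"{self.role.name} is not valid for {self.kind.name}")


# Illustrative tagging of three word segments (the tagging itself is an assumption).
units = [
    MinimumParsingUnit(UnitKind.SPATIOTEMPORAL, Role.TIME, "before 2016"),
    MinimumParsingUnit(UnitKind.METRIC, Role.METRIC_RELATION, "greater than 60"),
    MinimumParsingUnit(UnitKind.METRIC, Role.CALCULATION_MODIFIER, "average"),
]
```

Validating the kind and role together at construction time keeps malformed units from reaching the later query-generation step.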

With reference to the first aspect, the present disclosure provides a third possible implementation of the first aspect, wherein generating, based on the minimum parsing unit and the preset instruction set, the query instruction corresponding to the natural language question information specifically includes:

inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information; and

extracting, according to the data query logic, corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.
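One way to picture the combination step is as template lookup followed by assembly. The sketch below is a minimal illustration under stated assumptions: the disclosure does not say that the query instruction is SQL, and the `PRESET_INSTRUCTIONS` table, the `build_query` function, and the table and column names are all hypothetical.

```python
# Hypothetical preset instruction set: each entry is a reusable query fragment.
PRESET_INSTRUCTIONS = {
    "average": "AVG({field})",
    "sum": "SUM({field})",
    "count": "COUNT({field})",
    "greater_than": "{field} > {value}",
    "less_than": "{field} < {value}",
    "equal": "{field} = {value}",
}


def build_query(table, aggregate, target, conditions):
    """Combine preset fragments into one SQL-like query string.

    conditions is a list of (field, relation, value) triples, where relation
    is a key of PRESET_INSTRUCTIONS. Values are interpolated directly for
    readability; a real system should use parameterized queries instead.
    """
    select = PRESET_INSTRUCTIONS[aggregate].format(field=target)
    where = [
        PRESET_INSTRUCTIONS[relation].format(field=field, value=value)
        for field, relation, value in conditions
    ]
    query = f"SELECT {select} FROM {table}"
    if where:
        query += " WHERE " + " AND ".join(where)
    return query


query = build_query(
    "people",
    "average",
    "age",
    [("age", "greater_than", 60), ("weight_kg", "less_than", 70)],
)
print(query)  # SELECT AVG(age) FROM people WHERE age > 60 AND weight_kg < 70
```

Keeping the fragments in a table means that supporting a new calculation modifier or relation only requires a new entry, not new assembly code.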

Optionally, inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information specifically includes:

obtaining a category corresponding to the natural language question information; and

obtaining, according to the obtained category, the query logic corresponding to the natural language question information.
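The two steps above, classify first and then look up the query logic, might be sketched as follows. The disclosure does not enumerate the categories or say how they are obtained, so the category names, the rule-based classifier, and the `QUERY_LOGIC` table below are all assumptions made for illustration.

```python
# Hypothetical mapping from question category to an ordered list of query steps.
QUERY_LOGIC = {
    "aggregation": ["filter rows", "apply calculation", "return value"],
    "filter": ["filter rows", "return rows"],
    "lookup": ["return rows"],
}


def classify_question(roles):
    """Pick a category from the item types found among the minimum parsing units.

    roles is a set of item-type names such as "calculation_modifier".
    """
    if "calculation_modifier" in roles:
        return "aggregation"
    if "metric_logical_relationship" in roles or "attribute_logical_relationship" in roles:
        return "filter"
    return "lookup"


def infer_query_logic(roles):
    """Step 1: obtain the category. Step 2: obtain the query logic for it."""
    return QUERY_LOGIC[classify_question(roles)]


logic = infer_query_logic({"metric_item", "metric_logical_relationship", "calculation_modifier"})
```

A trained classifier could replace `classify_question` without changing the lookup step, since only the category name crosses the boundary between the two.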

With reference to the first aspect, the present disclosure provides a fourth possible implementation of the first aspect, wherein, before retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further includes:

obtaining knowledge base sample data, the knowledge base sample data including: database data provided by the user, input information data of the user, and/or third-party data; and

generating the preset knowledge base according to the knowledge base sample data.

With reference to the first aspect, the present disclosure provides a fifth possible implementation of the first aspect, wherein, after retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further includes:

adding the natural language question information and its corresponding data result to the preset knowledge base.
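The knowledge base life cycle described in the fourth and fifth implementations (build from sample data, retrieve, then store the answered question) can be sketched with an in-memory class. This is only a toy model under stated assumptions: the `KnowledgeBase` class, its method names, and the record layout are hypothetical, and the disclosure does not specify a storage format.

```python
class KnowledgeBase:
    """Toy in-memory knowledge base built from several data sources."""

    def __init__(self, *sources):
        # Each source is an iterable of records, e.g. rows of user-provided
        # database data, user input data, or third-party data.
        self.records = [row for source in sources for row in source]
        self.answered = {}  # question text -> previously computed data result

    def retrieve(self, predicate):
        """Return every record for which predicate(record) is true."""
        return [row for row in self.records if predicate(row)]

    def remember(self, question, result):
        """Add a question and its data result back into the knowledge base."""
        self.answered[question] = result


user_rows = [{"name": "A", "age": 65, "weight_kg": 62}]
third_party_rows = [{"name": "B", "age": 70, "weight_kg": 80}]
kb = KnowledgeBase(user_rows, third_party_rows)

matches = kb.retrieve(lambda r: r["age"] > 60 and r["weight_kg"] < 70)
average_age = sum(r["age"] for r in matches) / len(matches)
kb.remember("average age of people over 60 weighing under 70 kg", average_age)
```

Storing answered questions alongside the raw records lets a repeated question be served from `answered` without recomputing the result.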

In a second aspect, the present disclosure provides an apparatus for understanding natural language questions, including:

an information acquisition module, configured to obtain natural language question information input at a user terminal, the natural language question information being question information related to a data query;

an information parsing module, configured to parse the natural language question information to obtain a minimum parsing unit;

an instruction generating module, configured to generate, based on the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and

a retrieval module, configured to retrieve, according to the query instruction, a data result corresponding to the natural language question information from a preset knowledge base, the preset knowledge base being generated from database data provided by the user, input information data of the user, and/or third-party data.

With reference to the second aspect, the present disclosure provides a first possible implementation of the second aspect, wherein the information parsing module includes:

a word segmentation module, configured to perform word segmentation on the natural language question information to obtain a plurality of word segments; and

a recognition module, configured to perform entity noun recognition on the plurality of word segments to obtain the minimum parsing unit, the minimum parsing unit including: an attribute minimum parsing unit, a metric minimum parsing unit, and a spatiotemporal modifier structure word.

Optionally, the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

Optionally, the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents a manner of classification, grouping, segmentation, partial value extraction, counting, ranking by pinyin, or ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents a logical relationship of similarity, dissimilarity, inclusion, non-inclusion, contrast, or comparison. Segmentation may include cutting off the beginning, the end, and so on. Partial value extraction may include, for a character string, taking the n-th through m-th letters, and so on.
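The segmentation and partial value operations mentioned above amount to string slicing. The helper names below are hypothetical, and the choice of 1-based, inclusive positions for n and m is an assumption; the disclosure does not fix an indexing convention.

```python
def cut_head(text, k):
    """Segmentation: keep the first k characters of an attribute value."""
    return text[:k]


def cut_tail(text, k):
    """Segmentation: keep the last k characters of an attribute value."""
    return text[-k:] if k > 0 else ""


def partial_value(text, n, m):
    """Partial value: characters n through m, counted from 1, both inclusive."""
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= m")
    return text[n - 1:m]


date = "2016-05-17"
year = cut_head(date, 4)           # "2016"
day = cut_tail(date, 2)            # "17"
month = partial_value(date, 6, 7)  # "05"
```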

Optionally, the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, or not equal to; and the calculation modifier item in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, or bottom N.

Optionally, the spatiotemporal modifier structure word represents at least one of the following: a time period, a point in time, before a given point, after a given point, and between two given points; and outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place.

Optionally, the instruction generating module is specifically configured to generate, based on the minimum parsing unit and the preset instruction set, the query instruction corresponding to the natural language question information through the following steps:

inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information; and

extracting, according to the data query logic, corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

Optionally, the instruction generating module is specifically configured to infer, according to the minimum parsing unit, the data query logic contained in the natural language question information through the following steps:

obtaining a category corresponding to the natural language question information; and

obtaining, according to the obtained category, the query logic corresponding to the natural language question information.

In a third aspect, the present disclosure further provides an electronic device including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the computer program.

In a fourth aspect, the present disclosure further provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of the first aspect.

In the method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure, the natural language question information is first parsed to obtain a minimum parsing unit; a query statement for the natural language question information is then constructed based on the minimum parsing unit and a preset instruction set; and a pre-established knowledge base is then searched according to the query statement to obtain the data result corresponding to the question information. Because the knowledge base is built from database data provided by the user, input information data of the user, and/or third-party data, accurate data results obtained through calculation and statistics can be provided for the question information, so the approach can be applied to specialized scenarios such as data analysis.

Other features and advantages of the present disclosure will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the present disclosure. The objectives and other advantages of the present disclosure are realized and attained by the structure particularly pointed out in the description, the claims, and the accompanying drawings.

To make the above objectives, features, and advantages of the present disclosure more apparent and comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

To describe the specific embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for understanding natural language questions provided by the present disclosure;

FIG. 2 is a flowchart of another method for understanding natural language questions provided by the present disclosure;

FIG. 3 is a flowchart of another method for understanding natural language questions provided by the present disclosure;

FIG. 4 is a flowchart of another method for understanding natural language questions provided by the present disclosure;

FIG. 5 is a flowchart of another method for understanding natural language questions provided by the present disclosure;

FIG. 6 is a schematic structural diagram of an apparatus for understanding natural language questions provided by the present disclosure;

FIG. 7 is a schematic structural diagram of an electronic device provided by the present disclosure.

Detailed description

To make the objectives, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

The recognition accuracy of existing natural language processing methods needs improvement. Research shows that this shortfall is mainly reflected in the following: (1) where a scenario lacks a large accumulated corpus, recognition performance drops sharply; (2) models trained by statistical methods are not precise and have difficulty expressing or parsing exact meaning. They therefore cannot be used in some highly specialized scenarios, such as data analysis. On this basis, the method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure can accurately recognize a user's natural language question information and match it to a highly accurate data result, and can be applied to specialized scenarios such as data analysis.

To facilitate understanding of this embodiment, the method for understanding natural language questions disclosed in the present disclosure is first described in detail.

Embodiment 1:

The present disclosure provides a method for understanding natural language questions, which can be applied to relatively specialized scenarios such as data analysis and can be executed by an electronic device having a data processing function or the like. As shown in FIG. 1, the method includes the following steps:

S101: Obtain natural language question information input at a user terminal; the natural language question information is question information related to a data query.

In a specific implementation, the user can enter natural language question information into the software system by voice, typing, or another input method, through an interaction similar to that of a search engine, for example "the average age of Tibetan men in China surnamed Wang who were older than 60 before 2016 and weigh less than 70 kg". The natural language question information is ultimately obtained by the server in text form, and it is question information related to a data query. The electronic device may also pre-store multiple pieces of natural language question information and store them hierarchically. For example, a first data analysis scenario, a second data analysis scenario, and so on up to an M-th data analysis scenario may be set, where M is an integer greater than 3. Each data analysis scenario may correspond to different sub-scenarios, and each sub-scenario corresponds to different natural language question information. The electronic device displays the data analysis scenarios to the user, and the user selects through them in turn according to the required scenario, thereby choosing the desired natural language question information as input.

S102: Parse the natural language question information to obtain minimum parsing units.

To obtain an accurate final data result for the natural language question information, that information must be analyzed and understood accurately. Therefore, after the natural language question information input by the user is acquired, it is first parsed to obtain minimum parsing units. Referring to FIG. 2, the parsing process includes the following steps:

S201: Perform word segmentation on the natural language question information to obtain a plurality of word segments.

In a specific implementation, word segmentation, that is, entity boundary recognition, is first performed on the natural language question information to divide it into multiple word segments. For example, segmenting the question “2016年之前年龄大于60(岁的)中国姓王的藏族男性(的)体重小于70kg的平均年龄” yields the segments “2016年之前” (before 2016), “年龄” (age), “大于60(岁的)” (greater than 60 (years old)), “中国” (China), “姓” (surname), “王的” (Wang), “藏族” (Tibetan), “男性(的)” (male), “体重” (weight), “小于70kg的” (less than 70 kg), “平均” (average), and “年龄” (age).

Word segmentation of the natural language question information can be performed in several ways. For example, it can be based on dictionary-based segmentation algorithms such as forward maximum matching, reverse maximum matching, and bidirectional matching. It can also be based on segmentation tools such as the Stanford word segmenter or HanLP. It can also be based on a hidden Markov model (HMM), a conditional random field (CRF), a support vector machine (SVM), deep learning, and the like. A combination of one or more of the above approaches may also be used.
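As an illustration of the dictionary-based approach, the following is a minimal sketch of forward maximum matching. It is not the segmenter used in the disclosure; the function name `forward_max_match` and the toy dictionary are assumptions made for this example, and a practical system would use a much larger dictionary or one of the tools named above.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text left to right, always taking the longest dictionary word."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback segment.
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical toy dictionary covering the box office example in the text.
DICTIONARY = {"2017年8月", "华北地区", "动作电影", "的", "票房收入", "票房", "收入"}

segments = forward_max_match("2017年8月华北地区动作电影的票房收入", DICTIONARY)
print(segments)
```

Reverse maximum matching follows the same idea but scans from the end of the text, and bidirectional matching compares the two results.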

S202: Perform entity noun recognition on the plurality of word segments to obtain the minimum parsing units.

After the word segments are obtained, entity noun recognition is performed on each word segment to determine the minimum parsing units corresponding to the natural language question information. The minimum parsing units include an attribute minimum parsing unit, a metric minimum parsing unit, and a spatio-temporal modifier structure word. The attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item. The metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier item. The structure word includes spatio-temporal modifiers.

Optionally, the attribute item in the attribute minimum parsing unit represents the category or entity to which the natural language question information belongs. The calculation operation item in the attribute minimum parsing unit represents operations such as classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count. The attribute logical relationship item in the attribute minimum parsing unit represents logical relationships such as similar, dissimilar, contains, and does not contain. The metric item in the metric minimum parsing unit represents a numerical value. The metric logical relationship item represents numerical comparisons such as greater than, less than, equal to, greater than or equal to, and less than or equal to. The calculation modifier item represents sum, average (used in the form "average xxx"; the form "the average value of xxx" must not be used), count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, bottom N, and so on, where N is an integer greater than or equal to 1. The spatio-temporal modifier structure word represents expressions that describe time, such as a time period, a time point, before a certain point, after a certain point, and between two points, as well as expressions that describe space, such as outside a preset distance from a place, within a preset distance from a place, and a relative position in a set direction from a place. Examples include near a place, within or outside a preset distance from a place measured in kilometers, miles, or meters, and a relative position to the east, south, west, or north of a place. The spatio-temporal modifier structure word may also combine the time and space modifiers above, for example, within a certain distance of a certain place during a certain time period.

It should be noted that the key to identifying the attribute minimum parsing unit, the metric minimum parsing unit, and the spatio-temporal modifier structure word is whether the words identified by entity noun recognition are computable. For example, the words above that express greater than, less than, contains, does not contain, standard deviation, growth rate, and similar meanings are all computable. Carrying out the subsequent steps on the basis of these computable words can improve the accuracy with which the natural language question information is understood, yield more precise data results, and provide better reference data for the field of data analysis.

By performing word segmentation and entity noun recognition on the natural language question information input by the user, the minimum parsing units corresponding to that information can be determined. For example, if the question is "the box office revenue of action movies in North China in August 2017" (“2017年8月华北地区动作电影的票房收入”), the segments extracted by word segmentation are: time (August 2017), region (North China), action movie, and box office revenue. Entity noun recognition is then performed on these segments to determine the minimum parsing units and structure words of the question. The time (August 2017) is a spatio-temporal modifier structure word; the region (North China) is an attribute minimum parsing unit; action movie is an attribute minimum parsing unit; and revenue is a metric minimum parsing unit.
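The labeling in this example can be sketched as a simple lookup. The lexicons, the `ParsedUnit` type, and the time pattern below are hypothetical stand-ins chosen for illustration; the disclosure does not specify how entity noun recognition is implemented, and a real system would likely rely on a trained recognizer or on the knowledge base schema.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedUnit:
    text: str
    kind: str  # "attribute", "metric", or "spatiotemporal"


# Hypothetical lexicons; in practice these might come from database columns.
ATTRIBUTE_WORDS = {"North China", "action movie"}
METRIC_WORDS = {"box office revenue"}
# Matches simple "Month YYYY" expressions only (an assumption for this sketch).
TIME_PATTERN = re.compile(r"^[A-Z][a-z]+ \d{4}$")


def classify(segment):
    """Label one segment as a minimum parsing unit, or return None if unknown."""
    if TIME_PATTERN.match(segment):
        return ParsedUnit(segment, "spatiotemporal")
    if segment in METRIC_WORDS:
        return ParsedUnit(segment, "metric")
    if segment in ATTRIBUTE_WORDS:
        return ParsedUnit(segment, "attribute")
    return None


segments = ["August 2017", "North China", "action movie", "box office revenue"]
units = [classify(s) for s in segments]
for unit in units:
    print(unit.kind, "<-", unit.text)
```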

S103: Generate a query instruction corresponding to the natural language question information based on the minimum parsing units and a preset instruction set.

Specifically, referring to FIG. 3, generating the query instruction includes the following steps:

S301: Infer, from the minimum parsing units, the data query logic contained in the natural language question information.

For example, parsing the natural language question "the box office revenue of action movies in North China in August 2017" yields the following minimum parsing units: North China (attribute minimum parsing unit), action movie (attribute minimum parsing unit), and revenue (metric minimum parsing unit). The spatio-temporal modifier structure word is August 2017. The data query logic contained in the question is then determined from these words. Specifically, the combination logic of the minimum parsing units within the instruction set is determined according to their meaning and order.

In the present disclosure, the category or entity to which the natural language question information belongs can be obtained from the attribute item in the attribute minimum parsing unit, which in turn gives the category of the natural language question information. To query each category of natural language question information accurately, the query logic corresponding to each category can be set in advance.

Natural language question information of different question categories corresponds to different data query logic. Common question categories include quantity questions, ratio questions, ranking questions, and relationship questions. A quantity question asks for a metric value or attribute value within a specific time period or at a specific time point. A ratio question asks for the ratio of a metric, or of the count of an attribute, across different time periods, or for the ratio of different metrics within the same time period. A ranking question sorts columns (metrics or attributes) along some dimension, which is ultimately parsed into a filter condition. In a relationship question, the relationship is predefined or learned by the program through training on data, for example: most dangerous, most valuable, most relevant, nearby, or of the same kind.

Each question category has its own query logic. For example, the logic for a quantity question is: time period or time point | logical modifier 1 | value 1 | logical modifier n | value n |. The combination for a ratio question is: time period | logical modifier 1 | value 1 | logical modifier n | value n | of category A | of category B | logical modifier B | of category N | value 2 | logical modifier | calculation modifier | value 1 | growth rate (calculation modifier), and so on. Once the category of the natural language question information has been obtained, the corresponding query logic can be looked up from the preset query logic for each category.

S302: According to the data query logic, extract the corresponding instructions from the preset instruction set and combine them to generate the query instruction corresponding to the natural language question information.

For example, based on the minimum parsing units parsed from the natural language question "the box office revenue of action movies in North China in August 2017" and on the data query logic above, the corresponding instructions are extracted from the preset instruction set and combined according to the data query logic to construct the query instruction for the question. The query request can be issued in a structured query language such as SQL (or Cypher), for example: select sum(票房) from 电影票房表 where 地区='华北' and 电影类别='动作' and 时间>='2017-08-01' and 时间<'2017-09-01' group by 地区, 电影类别. Here 票房 is the box office column, 电影票房表 is the movie box office table, 地区 is the region, 电影类别 is the movie category, and 时间 is the time. The query request is assembled from predefined instruction fragments or instruction sets. The database parses the request composed of these instructions, then filters, computes, and returns the data, thereby answering the user's question.
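The composition of a query from predefined fragments can be sketched as follows. This is an illustrative assumption rather than the instruction set of the disclosure: the `build_query` helper, the English table and column names, and the sample rows are all invented for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_query(metric_column, table, filters, time_column, start, end):
    """Combine predefined fragments into one parameterized SQL statement.

    Column and table names are interpolated directly, so they must come from
    the trusted preset instruction set, never from raw user input.
    """
    clauses = [f"{column} = ?" for column in filters]
    clauses += [f"{time_column} >= ?", f"{time_column} < ?"]
    sql = f"SELECT SUM({metric_column}) FROM {table} WHERE " + " AND ".join(clauses)
    params = list(filters.values()) + [start, end]
    return sql, params


# Hypothetical in-memory table standing in for the movie box office table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE box_office (region TEXT, category TEXT, day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO box_office VALUES (?, ?, ?, ?)",
    [
        ("North China", "action", "2017-08-05", 120.0),
        ("North China", "action", "2017-08-20", 80.0),
        ("North China", "comedy", "2017-08-10", 50.0),
        ("North China", "action", "2017-09-02", 70.0),
    ],
)

# Attribute units become equality filters; the time modifier becomes a range.
sql, params = build_query(
    "revenue", "box_office",
    {"region": "North China", "category": "action"},
    "day", "2017-08-01", "2017-09-01",
)
total = conn.execute(sql, params).fetchone()[0]
print(sql)
print(total)
```

Passing the filter values as parameters rather than concatenating them into the string keeps user-supplied text out of the SQL itself.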

S104: Search a preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information.

Before the preset knowledge base is searched according to the query instruction to obtain the data result corresponding to the natural language question information, the method further includes the following steps, referring to FIG. 4:

S401: Acquire knowledge base sample data. The knowledge base sample data includes database data provided by the user, input information data of the user, and/or third-party data.

S402: Generate the preset knowledge base from the knowledge base sample data.

The knowledge base generation process is not specifically limited. For example, the knowledge base may be built using a deep learning model based on a convolutional neural network, or by other methods.

To improve the accuracy of the query result, the method provided by the present disclosure further includes the following step, referring to FIG. 5:

S501: Add the natural language question information and its corresponding data result to the preset knowledge base.

Through the above step, the data in the preset knowledge base can be updated continuously, so that its content becomes richer and the query accuracy for question information keeps improving. The final data result changes with the query time and with the ongoing updates to the knowledge base; it is not a pre-stored answer. For example, if the preset knowledge base stores different data results for the same natural language question information, the most recent time at which each data result was obtained for that question can be recorded in the preset knowledge base. After that natural language question information is acquired from the user end, all data results corresponding to it are looked up in the preset knowledge base, and the data result obtained most recently is selected as the final data result for that natural language question information.
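The most-recent-result rule described above can be sketched as follows. The record layout (a `result` value paired with an `obtained_at` timestamp) and the function name `latest_result` are assumptions made for illustration; the disclosure does not define how the knowledge base stores these entries.

```python
from datetime import datetime


def latest_result(records):
    """Return the result with the most recent timestamp, or None if empty."""
    if not records:
        return None
    return max(records, key=lambda record: record["obtained_at"])["result"]


# Hypothetical stored results for one question, with made-up values.
records = [
    {"result": 180.0, "obtained_at": datetime(2018, 1, 5, 9, 30)},
    {"result": 200.0, "obtained_at": datetime(2018, 3, 12, 14, 0)},
    {"result": 195.0, "obtained_at": datetime(2018, 2, 1, 8, 15)},
]
print(latest_result(records))
```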

As another example, if the preset knowledge base stores different data results for the same natural language question information, then after that natural language question information is acquired from the user end, all data results corresponding to it can be looked up in the preset knowledge base and output to the user. The data result that the user selects from among them is recorded, and the number of times each data result has been selected is counted. Once the selection count of a data result exceeds a set maximum threshold, that data result is taken as the final data result for the natural language question information.
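The selection-count rule can be sketched with a simple counter. The `SelectionTracker` class and the threshold value are hypothetical and chosen only for the example; the disclosure does not state how counts are stored or what threshold is used.

```python
from collections import Counter


class SelectionTracker:
    """Count how often users pick each candidate result for one question."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, result):
        """Register that a user selected this result."""
        self.counts[result] += 1

    def final_result(self):
        """Return the result whose count exceeds the threshold, else None."""
        if not self.counts:
            return None
        result, count = self.counts.most_common(1)[0]
        return result if count > self.threshold else None


tracker = SelectionTracker(threshold=2)
for choice in ["200.0", "195.0", "200.0"]:
    tracker.record(choice)
before = tracker.final_result()  # no count exceeds 2 yet
tracker.record("200.0")
after = tracker.final_result()   # "200.0" has now been chosen 3 times
print(before, after)
```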

In the method for understanding natural language questions provided by the present disclosure, the natural language question information is first parsed to obtain minimum parsing units. A query statement for the natural language question information is then constructed based on the minimum parsing units and the preset instruction set, and the pre-built knowledge base is searched according to that query statement to obtain the data result corresponding to the question. In this method, the knowledge base is built from database data provided by the user, input information data of the user, and/or third-party data, so it can provide accurate, computed, and aggregated data results for the question. The method can therefore be applied to specialized scenarios such as data analysis.

Embodiment 2:

The present disclosure provides an apparatus for understanding natural language questions. Referring to FIG. 6, the apparatus includes an information acquisition module 61, an information parsing module 62, an instruction generation module 63, and a retrieval module 64.

The information acquisition module 61 is configured to acquire natural language question information input at the user end, where the natural language question information is question information related to a data query. The information parsing module 62 is configured to parse the natural language question information to obtain minimum parsing units. The instruction generation module 63 is configured to generate a query instruction corresponding to the natural language question information based on the minimum parsing units and a preset instruction set. The retrieval module 64 is configured to search a preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information. The preset knowledge base is generated from database data provided by the user, input information data of the user, and/or third-party data.

Specifically, the information parsing module 62 further includes:

a word segmentation module 621, configured to perform word segmentation on the natural language question information to obtain a plurality of word segments; and a recognition module 622, configured to perform entity noun recognition on the plurality of word segments to obtain the minimum parsing units. The minimum parsing units include an attribute minimum parsing unit, a metric minimum parsing unit, and a spatio-temporal modifier structure word.

In the apparatus for understanding natural language questions provided by the present disclosure, each module has the same technical features as the method for understanding natural language questions described above, and can therefore achieve the same functions. For the specific working process of each module in the apparatus, refer to the method embodiment above; details are not repeated here.

Optionally, the attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

Optionally, the attribute item in the attribute minimum parsing unit represents the category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; the attribute logical relationship item in the attribute minimum parsing unit represents the logical relationships of similar, dissimilar, contains, does not contain, contrast, and comparison.

Optionally, the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents the numerical comparisons greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; the calculation modifier item in the metric minimum parsing unit represents sum, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, and bottom N.

Optionally, the spatio-temporal modifier structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between two points.

Optionally, the instruction generation module 63 is specifically configured to generate the query instruction corresponding to the natural language question information, based on the minimum parsing units and the preset instruction set, through the following steps: inferring, from the minimum parsing units, the data query logic contained in the natural language question information; and, according to the data query logic, extracting the corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

Optionally, the instruction generation module 63 is specifically configured to infer the data query logic contained in the natural language question information from the minimum parsing units through the following steps: obtaining the category corresponding to the natural language question information; and obtaining, according to that category, the query logic corresponding to the natural language question information.

Embodiment 3:

The present disclosure further provides an electronic device. Referring to FIG. 7, the electronic device includes a processor 70, a memory 71, a bus 72, and a communication interface 73. The processor 70, the communication interface 73, and the memory 71 are connected through the bus 72. The processor 70 is configured to execute executable modules stored in the memory 71, such as a computer program. When the processor executes the computer program, the steps of the method described in the method embodiment are implemented. That is, the method steps in Embodiment 1 can be executed by the processor 70.

The memory 71 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 73 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.

The bus 72 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 7 shows the bus as a single double-headed arrow, but this does not mean that there is only one bus or only one type of bus.

The memory 71 is configured to store a program, and the processor 70 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present disclosure may be applied to the processor 70 or implemented by the processor 70.

The processor 70 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 70 or by instructions in the form of software. The processor 70 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or execute the methods, steps, and logic block diagrams in the present disclosure. The general-purpose processor may be a microprocessor, or it may be any conventional processor. The steps of the method provided by the present disclosure may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 71; the processor 70 reads the information in the memory 71 and completes the steps of the above method in combination with its hardware.

The present disclosure further provides a computer program product for the method for understanding natural language questions, including a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method described in the method embodiment above. For the specific implementation, refer to the method embodiment; details are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and electronic device described above can be found in the corresponding processes of the foregoing method embodiment, and are not repeated here.

The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

In the description of the present disclosure, it should be noted that orientation or positional terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present disclosure, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation. They therefore cannot be understood as limiting the present disclosure. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as a further example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the present disclosure, and shall all fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be determined by the scope of protection of the claims.

Industrial Applicability

The method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure can return accurate data results, obtained through computation and statistics, for the question information, thereby achieving accurate recognition of natural language question information. They can be applied to professional scenarios such as the field of data analysis.
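As an informal illustration only, the flow that the disclosure describes (acquire a question, parse it into minimum parsing units, generate a query instruction from a preset instruction set, and retrieve a result from a knowledge base) might be sketched as follows. This is a toy sketch under stated assumptions, not the implementation of the disclosure: the lexicons, the `INSTRUCTION_SET` table, the in-memory `KNOWLEDGE_BASE`, and all function names are hypothetical, and whitespace splitting stands in for real word segmentation and entity noun recognition.

```python
# Hypothetical lexicons that map tokens to minimum parsing units.
ATTRIBUTES = {"region"}                                   # attribute items
METRICS = {"sales"}                                       # metric items
MODIFIERS = {"total": "sum", "average": "avg", "highest": "max"}  # calculation modifiers
TIME_WORDS = {"2017", "2018"}                             # time-point modifier words

# Hypothetical preset instruction set: instruction name -> aggregation function.
INSTRUCTION_SET = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "max": max,
}

# Toy knowledge base standing in for user-provided database data.
KNOWLEDGE_BASE = [
    {"region": "east", "year": "2017", "sales": 100},
    {"region": "east", "year": "2018", "sales": 150},
    {"region": "west", "year": "2017", "sales": 80},
    {"region": "west", "year": "2018", "sales": 120},
]


def parse(question):
    """Split the question into tokens and tag each one as a minimum parsing unit."""
    units = {"attribute": None, "metric": None, "modifier": None, "year": None}
    for token in question.lower().replace("?", "").split():
        if token in ATTRIBUTES:
            units["attribute"] = token
        elif token in METRICS:
            units["metric"] = token
        elif token in MODIFIERS:
            units["modifier"] = MODIFIERS[token]
        elif token in TIME_WORDS:
            units["year"] = token
    return units


def build_query(units):
    """Combine entries of the preset instruction set into one query instruction."""
    if units["metric"] is None:
        raise ValueError("question does not mention a known metric")
    return {
        "aggregate": units["modifier"] or "sum",  # default modifier is an assumption
        "metric": units["metric"],
        "group_by": units["attribute"],
        "year": units["year"],
    }


def retrieve(query, knowledge_base):
    """Run the query instruction against the knowledge base."""
    rows = [r for r in knowledge_base if query["year"] in (None, r["year"])]
    if not rows:
        return None
    aggregate = INSTRUCTION_SET[query["aggregate"]]
    if query["group_by"] is None:
        return aggregate([r[query["metric"]] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[query["group_by"]], []).append(r[query["metric"]])
    return {key: aggregate(values) for key, values in groups.items()}


def answer(question):
    """End-to-end: parse, build the query instruction, retrieve the data result."""
    return retrieve(build_query(parse(question)), KNOWLEDGE_BASE)
```

For example, `answer("What is the total sales by region in 2018?")` tags `total` as a calculation modifier, `sales` as a metric item, `region` as an attribute item, and `2018` as a time point, then groups and sums the matching rows. A real system would replace the lexicon lookup with proper segmentation and a much richer instruction set.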

Claims (20)

1. A method for understanding natural language questions, characterized by comprising:
acquiring natural language question information input at a user terminal, the natural language question information being question information related to a data query;
parsing the natural language question information to obtain minimum parsing units;
generating, based on the minimum parsing units and a preset instruction set, a query instruction corresponding to the natural language question information;
performing retrieval from a preset knowledge base according to the query instruction to obtain a data result corresponding to the natural language question information, the preset knowledge base being generated from database data provided by a user, input information data of the user, and/or third-party data.

2. The method according to claim 1, characterized in that parsing the natural language question information to obtain minimum parsing units specifically comprises:
performing word segmentation on the natural language question information to obtain a plurality of word segments;
performing entity noun recognition on the plurality of word segments to obtain the minimum parsing units, the minimum parsing units comprising: an attribute minimum parsing unit, a metric minimum parsing unit, and a spatio-temporal modifier structure word.

3. The method according to claim 2, characterized in that the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

4. The method according to claim 3, characterized in that the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents a manner of classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents a logical relationship of similar, dissimilar, containing, not containing, contrast, and comparison.

5. The method according to claim 3 or 4, characterized in that the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical magnitude relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; and the calculation modifier item in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, degree of correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, and bottom N.

6. The method according to any one of claims 2 to 5, characterized in that the spatio-temporal modifier structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between one point and another; outside a preset distance range of a certain place, within a preset distance range of a certain place, and a position in a set direction relative to a certain place.

7. The method according to any one of claims 1 to 6, characterized in that generating, based on the minimum parsing units and the preset instruction set, the query instruction corresponding to the natural language question information specifically comprises:
inferring, from the minimum parsing units, the data query logic contained in the natural language question information;
extracting, according to the data query logic, corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

8. The method according to claim 7, characterized in that inferring, from the minimum parsing units, the data query logic contained in the natural language question information specifically comprises:
obtaining a category corresponding to the natural language question information;
obtaining, according to the obtained category, the query logic corresponding to the natural language question information.

9. The method according to any one of claims 1 to 8, characterized in that, before performing retrieval from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:
acquiring knowledge base sample data, the knowledge base sample data comprising: database data provided by a user, input information data of the user, and/or third-party data;
generating the preset knowledge base according to the knowledge base sample data.

10. The method according to any one of claims 1 to 9, characterized in that, after performing retrieval from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:
adding the natural language question information and its corresponding data result to the preset knowledge base.

11. An apparatus for understanding natural language questions, characterized by comprising:
an information acquisition module configured to acquire natural language question information input at a user terminal, the natural language question information being question information related to a data query;
an information parsing module configured to parse the natural language question information to obtain minimum parsing units;
an instruction generation module configured to generate, based on the minimum parsing units and a preset instruction set, a query instruction corresponding to the natural language question information;
a retrieval module configured to perform retrieval from a preset knowledge base according to the query instruction to obtain a data result corresponding to the natural language question information, the preset knowledge base being generated from database data provided by a user, input information data of the user, and/or third-party data.

12. The apparatus according to claim 11, characterized in that the information parsing module comprises:
a word segmentation module configured to perform word segmentation on the natural language question information to obtain a plurality of word segments;
a recognition module configured to perform entity noun recognition on the plurality of word segments to obtain the minimum parsing units, the minimum parsing units comprising: an attribute minimum parsing unit, a metric minimum parsing unit, and a spatio-temporal modifier structure word.

13. The apparatus according to claim 12, characterized in that the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

14. The apparatus according to claim 13, characterized in that the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents a manner of classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents a logical relationship of similar, dissimilar, containing, not containing, contrast, and comparison.

15. The apparatus according to claim 13 or 14, characterized in that the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical magnitude relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; and the calculation modifier item in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, degree of correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, and bottom N.

16. The apparatus according to any one of claims 12 to 15, characterized in that the spatio-temporal modifier structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between one point and another; outside a preset distance range of a certain place, within a preset distance range of a certain place, and a position in a set direction relative to a certain place.

17. The apparatus according to any one of claims 11 to 16, characterized in that the instruction generation module is specifically configured to generate, based on the minimum parsing units and the preset instruction set, the query instruction corresponding to the natural language question information through the following steps:
inferring, from the minimum parsing units, the data query logic contained in the natural language question information;
extracting, according to the data query logic, corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

18. The apparatus according to claim 17, characterized in that the instruction generation module is specifically configured to infer, from the minimum parsing units, the data query logic contained in the natural language question information through the following steps:
obtaining a category corresponding to the natural language question information;
obtaining, according to the obtained category, the query logic corresponding to the natural language question information.

19. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 10.

20. A computer-readable medium having processor-executable, non-volatile program code, characterized in that the program code causes the processor to perform the method according to any one of claims 1 to 10.
PCT/CN2018/112115 2017-12-15 2018-10-26 Natural language question understanding method and apparatus, and electronic device Ceased WO2019114430A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711361679.7A CN108108426B (en) 2017-12-15 2017-12-15 Understanding method and device for natural language question and electronic equipment
CN201711361679.7 2017-12-15

Publications (1)

Publication Number Publication Date
WO2019114430A1 true WO2019114430A1 (en) 2019-06-20

Family

ID=62216599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/112115 Ceased WO2019114430A1 (en) 2017-12-15 2018-10-26 Natural language question understanding method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN108108426B (en)
WO (1) WO2019114430A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108426B (en) * 2017-12-15 2021-05-07 杭州汇数智通科技有限公司 Understanding method and device for natural language question and electronic equipment
CN108920543B (en) * 2018-06-13 2020-07-10 珠海格力电器股份有限公司 Query and interaction method and device, computer device and storage medium
CN108846125A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Talk with generation method, device, terminal and computer readable storage medium
CN109508441B (en) * 2018-08-21 2023-12-08 江苏赛睿信息科技股份有限公司 Method and device for realizing data statistical analysis through natural language and electronic equipment
CN111104118A (en) * 2018-10-29 2020-05-05 百度在线网络技术(北京)有限公司 AIML-based natural language instruction execution method and system
CN111191431A (en) * 2018-10-29 2020-05-22 百度在线网络技术(北京)有限公司 Method and system for generating report according to natural language instruction
CN111309722A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 A data processing method and device
CN110263051A (en) * 2019-06-11 2019-09-20 出门问问信息科技有限公司 Question and answer for question answering system are to update method, device, equipment and storage medium
CN111324631B (en) * 2020-03-19 2022-04-22 成都海天数联科技有限公司 A method for automatically generating SQL statements from human natural language of query data
CN112270189B (en) * 2020-11-12 2023-07-18 佰聆数据股份有限公司 Question type analysis node generation method, system and storage medium
CN112270188B (en) * 2020-11-12 2023-12-12 佰聆数据股份有限公司 Questioning type analysis path recommendation method, system and storage medium
CN114519044A (en) * 2020-11-20 2022-05-20 富泰华工业(深圳)有限公司 Data query method, block chain system, sharing device and query device
CN112506949B (en) * 2020-12-03 2023-07-25 北京百度网讯科技有限公司 Structured query language query statement generation method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102479223A (en) * 2010-11-25 2012-05-30 中国移动通信集团浙江有限公司 Data query method and system
CN105701253A (en) * 2016-03-04 2016-06-22 南京大学 Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method
CN107391706A (en) * 2017-07-28 2017-11-24 湖北文理学院 City tour question answering system based on mobile Internet
CN108108426A (en) * 2017-12-15 2018-06-01 杭州网蛙科技有限公司 Understanding method and device for natural language question and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226606B (en) * 2013-04-28 2016-08-10 浙江核新同花顺网络信息股份有限公司 Inquiry choosing method and system
KR20150129134A (en) * 2014-05-08 2015-11-19 한국전자통신연구원 System for Answering and the Method thereof
CN105608218B (en) * 2015-12-31 2018-11-27 上海智臻智能网络科技股份有限公司 Method, device and system for establishing an intelligent question-answering knowledge base

Also Published As

Publication number Publication date
CN108108426A (en) 2018-06-01
CN108108426B (en) 2021-05-07

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18888105; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18888105; Country of ref document: EP; Kind code of ref document: A1)

Kind code of ref document: A1