WO2019114430A1 - Natural language question understanding method and apparatus, and electronic device

Info

Publication number: WO2019114430A1
Application number: PCT/CN2018/112115
Authority: WO
Inventors: 王碧波; 董雪梅
Original assignee: Individual
Current assignee: Individual
Priority date: 2017-12-15
Filing date: 2018-10-26
Publication date: 2019-06-20
Anticipated expiration: 2020-06-15
Also published as: CN108108426A; CN108108426B

Abstract

The present disclosure relates to the technical field of natural language processing. Provided are a natural language question understanding method and apparatus, and an electronic device. The natural language question understanding method comprises: obtaining natural language question information input by a user terminal, the natural language question information being question information related to data query; parsing the natural language question information to obtain a minimum parsing unit; generating, on the basis of the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and retrieving, according to the query instruction, from a preset knowledge base a data result corresponding to the natural language question information, the preset knowledge base being generated according to database data provided by a user, input information data of the user, and/or third-party data. The method can accurately recognize natural language question information.

Description

Natural language question understanding method and apparatus, and electronic device

Cross-reference to related applications

The present disclosure claims priority to Chinese Patent Application No. CN2017113616797, entitled "Natural language question understanding method and apparatus, and electronic device", filed with the Chinese Patent Office on December 15, 2017, the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of natural language processing, and in particular to a natural language question understanding method and apparatus, and an electronic device.

Background

Natural language processing is a technology that has long attracted attention and research. It is currently applied mainly in fields such as multilingual translation and information query, and has made good progress in each. However, there is no precedent for applying natural language processing directly to data analysis.

Natural language processing is divided into several technical schools. Initially, methods based on formal languages were mainstream, but this approach cannot handle highly variable expressions; it can only translate or generate language mechanically according to pre-written templates or rules, and the results are stiff. Later, statistical mathematical theory was introduced into language processing; for example, most current machine translation systems, such as Google Translate and Baidu Translate, were developed on the basis of such systems. This statistics-based approach can effectively use large corpora to train models and thereby learn the varied forms of language expression. It currently performs well in multilingual translation. However, the recognition accuracy of this approach still needs improvement.

Summary of the invention

In view of this, the present disclosure provides a natural language question understanding method and apparatus, and an electronic device.

In a first aspect, the present disclosure provides a natural language question understanding method, comprising:

obtaining natural language question information input by a user terminal, the natural language question information being question information related to data query;

parsing the natural language question information to obtain a minimum parsing unit;

generating, on the basis of the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and

retrieving, according to the query instruction, from a preset knowledge base a data result corresponding to the natural language question information, the preset knowledge base being generated according to database data provided by a user, input information data of the user, and/or third-party data.

With reference to the first aspect, the present disclosure provides a first possible implementation of the first aspect, in which parsing the natural language question information to obtain the minimum parsing unit specifically comprises:

performing word segmentation on the natural language question information to obtain a plurality of segments; and

performing entity noun recognition on the plurality of segments to obtain the minimum parsing unit, the minimum parsing unit comprising an attribute minimum parsing unit, a metric minimum parsing unit, and a spatiotemporal modifier structure word.

With reference to the first aspect, the present disclosure provides a second possible implementation of the first aspect, in which the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item, and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

Optionally, the attribute item in the attribute minimum parsing unit represents the category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents the logical relationships similar, dissimilar, contains, does not contain, contrast, and comparison.

Optionally, the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents the numerical relationships greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; and the calculation modifier item in the metric minimum parsing unit represents sum, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, and bottom N.

Optionally, the spatiotemporal modifier structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between two points; outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place. Examples include near a place; within or outside a preset distance of a place expressed in kilometers, miles, or meters; and a relative position to the east, south, west, or north of a place.

With reference to the first aspect, the present disclosure provides a third possible implementation of the first aspect, in which generating, on the basis of the minimum parsing unit and the preset instruction set, the query instruction corresponding to the natural language question information specifically comprises:

inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information; and

extracting, according to the data query logic, the corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

Optionally, inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information specifically comprises:

obtaining the category corresponding to the natural language question information; and

obtaining, according to the obtained category, the query logic corresponding to the natural language question information.

With reference to the first aspect, the present disclosure provides a fourth possible implementation of the first aspect, in which, before retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:

obtaining knowledge base sample data, the knowledge base sample data comprising database data provided by the user, input information data of the user, and/or third-party data; and

generating the preset knowledge base according to the knowledge base sample data.

With reference to the first aspect, the present disclosure provides a fifth possible implementation of the first aspect, in which, after retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:

adding the natural language question information and its corresponding data result to the preset knowledge base.

In a second aspect, the present disclosure provides a natural language question understanding apparatus, comprising:

an information acquisition module, configured to obtain natural language question information input by a user terminal, the natural language question information being question information related to data query;

an information parsing module, configured to parse the natural language question information to obtain a minimum parsing unit;

an instruction generation module, configured to generate, on the basis of the minimum parsing unit and a preset instruction set, a query instruction corresponding to the natural language question information; and

a retrieval module, configured to retrieve from a preset knowledge base according to the query instruction to obtain a data result corresponding to the natural language question information, the preset knowledge base being generated according to database data provided by a user, input information data of the user, and/or third-party data.

With reference to the second aspect, the present disclosure provides a first possible implementation of the second aspect, in which the information parsing module comprises:

a word segmentation module, configured to perform word segmentation on the natural language question information to obtain a plurality of segments; and

a recognition module, configured to perform entity noun recognition on the plurality of segments to obtain the minimum parsing unit, the minimum parsing unit comprising an attribute minimum parsing unit, a metric minimum parsing unit, and a spatiotemporal modifier structure word.

Optionally, the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item, and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier item.

Optionally, the attribute item in the attribute minimum parsing unit represents the category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents the logical relationships similar, dissimilar, contains, does not contain, contrast, and comparison. Splitting may include splitting off the beginning, the end, and so on. Partial value extraction may include, for a character string, taking the n-th through m-th letters, and so on.

Optionally, the spatiotemporal modifier structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between two points; outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place.

Optionally, the instruction generation module is specifically configured to generate, on the basis of the minimum parsing unit and the preset instruction set, the query instruction corresponding to the natural language question information through the following steps:

inferring, according to the minimum parsing unit, the data query logic contained in the natural language question information; and

extracting, according to the data query logic, the corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

Optionally, the instruction generation module is specifically configured to infer, according to the minimum parsing unit, the data query logic contained in the natural language question information through the following steps:

In a third aspect, the present disclosure further provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the method of the first aspect when executing the computer program.

In a fourth aspect, the present disclosure further provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of the first aspect.

In the natural language question understanding method and apparatus and the electronic device provided by the present disclosure, the natural language question information is first parsed to obtain the minimum parsing unit. A query statement for the natural language question information is then constructed on the basis of the minimum parsing unit and the preset instruction set, and a pre-established knowledge base is searched according to that query statement to obtain the data result corresponding to the question information. Because the knowledge base is built from database data provided by the user, input information data of the user, and/or third-party data, it can supply the question information with accurate, computed and aggregated data results, so the method can be applied to specialized scenarios such as data analysis.

Other features and advantages of the present disclosure will be set forth in the description that follows and will in part become apparent from the description or be learned by practicing the present disclosure. The objectives and other advantages of the present disclosure are realized and attained by the structures particularly pointed out in the description, the claims, and the drawings.

To make the above objectives, features, and advantages of the present disclosure more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of the drawings

To describe the specific embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a natural language question understanding method provided by the present disclosure;

FIG. 2 is a flowchart of another natural language question understanding method provided by the present disclosure;

FIG. 3 is a flowchart of another natural language question understanding method provided by the present disclosure;

FIG. 4 is a flowchart of another natural language question understanding method provided by the present disclosure;

FIG. 5 is a flowchart of another natural language question understanding method provided by the present disclosure;

FIG. 6 is a schematic structural diagram of a natural language question understanding apparatus provided by the present disclosure;

FIG. 7 is a schematic structural diagram of an electronic device provided by the present disclosure.

Detailed description

To make the objectives, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

Existing natural language processing methods suffer from recognition accuracy that needs improvement. Research shows that this manifests mainly in two ways: (1) in scenarios without a large accumulated corpus, recognition performance degrades sharply; (2) models trained by statistical methods lack precision and have difficulty expressing or parsing exact meaning. As a result, such methods cannot be used in highly specialized scenarios such as data analysis. On this basis, the natural language question understanding method and apparatus and the electronic device provided by the present disclosure can accurately recognize a user's natural language question information and match it to highly accurate data results, and can be applied to specialized scenarios such as data analysis.

To facilitate understanding of this embodiment, the natural language question understanding method disclosed herein is first described in detail.

Embodiment 1:

The present disclosure provides a natural language question understanding method that can be applied to relatively specialized scenarios such as data analysis and can be executed by an electronic device with data processing capability. As shown in FIG. 1, the method includes the following steps:

S101: Obtain natural language question information input by a user terminal; the natural language question information is question information related to data query.

In a specific implementation, the user may enter natural language question information into the software system by voice, typing, or another input method, through a search-engine-style interaction, for example: "the average age of Tibetan males in China surnamed Wang who were older than 60 before 2016 and weigh less than 70 kg". The natural language question information is ultimately obtained by the server in text form, and it is question information related to data query. The electronic device may also pre-store multiple pieces of natural language question information and store them hierarchically. For example, a first data analysis scenario, a second data analysis scenario, ..., and an M-th data analysis scenario may be defined, where M is an integer greater than 3. Each data analysis scenario may correspond to different sub-scenarios, and each sub-scenario corresponds to different natural language question information. The electronic device presents the data analysis scenarios to the user, who selects step by step through the desired scenarios and thereby chooses the natural language question information to input.

S102: Parse the natural language question information to obtain a minimum parsing unit.

To obtain a final, accurate data result corresponding to the natural language question information, that information must be analyzed and understood accurately. Therefore, after the natural language question information input by the user is obtained, it is first parsed to obtain the minimum parsing unit. The parsing process includes the following steps, as shown in FIG. 2:

S201: Perform word segmentation on the natural language question information to obtain a plurality of segments.

In a specific implementation, word segmentation, that is, entity boundary recognition, is first performed on the natural language question information to split it into a plurality of segments. For example, segmenting the question above, "the average age of Tibetan males in China surnamed Wang who were older than 60 before 2016 and weigh less than 70 kg", yields the segments "before 2016", "age", "greater than 60 (years)", "China", "surname", "Wang", "Tibetan", "male", "weight", "less than 70 kg", "average", and "age" (listed in the word order of the original Chinese question).

Word segmentation of the natural language question information can be performed in several ways. For example, it can be based on dictionary segmentation algorithms such as forward maximum matching, reverse maximum matching, and bidirectional matching. It can also be based on segmentation tools such as the Stanford word segmenter or HanLP. It can also be based on a Hidden Markov Model (HMM), a Conditional Random Field (CRF) algorithm, a Support Vector Machine (SVM), deep learning, and so on. A combination of one or more of these approaches may also be used.
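As an illustration of the dictionary-based option, the following is a minimal sketch of forward maximum matching. The function name `forward_max_match` and the toy lexicon are assumptions made for this example and are not taken from the disclosure; a practical system would load a much larger dictionary.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Segment text by greedily taking the longest lexicon entry at each position."""
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    max_len = max(1, max_len)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan never stalls.
            if size == 1 or candidate in lexicon:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical toy lexicon for the box-office question used in this description:
# "2017年8月" (August 2017), "华北地区" (North China), "动作电影" (action movies),
# "的" (possessive particle), "票房收入" (box office revenue).
TOY_LEXICON = {"2017年8月", "华北地区", "动作电影", "的", "票房收入", "票房", "收入"}

if __name__ == "__main__":
    print(forward_max_match("2017年8月华北地区动作电影的票房收入", TOY_LEXICON))
```

Because the longest match is tried first, "票房收入" is kept as one segment even though "票房" and "收入" are also in the lexicon. Reverse maximum matching applies the same idea scanning from the end of the string.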

S202: Perform entity noun recognition on the plurality of segments to obtain the minimum parsing unit.

After the segments are obtained, entity noun recognition is performed on each segment to determine the minimum parsing unit corresponding to the natural language question information. The minimum parsing unit includes an attribute minimum parsing unit, a metric minimum parsing unit, and a spatiotemporal modifier structure word. The attribute minimum parsing unit includes at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; the metric minimum parsing unit includes at least one of a metric item, a metric logical relationship item, and a calculation modifier item; the structure word includes spatiotemporal modifiers.

Optionally, the attribute item in the attribute minimum parsing unit represents the category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents operations such as classification, grouping, splitting, partial value extraction, counting, ranking by pinyin, and ranking by surname stroke count; the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships such as similar, dissimilar, contains, and does not contain. The metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents numerical relationships such as greater than, less than, equal to, greater than or equal to, and less than or equal to. The calculation modifier item in the metric minimum parsing unit represents sum, average (phrased as "average xxx", not as "the average value of xxx"), count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sorting, maximum, minimum, top N, bottom N, and so on, where N is an integer greater than or equal to 1. The spatiotemporal modifier structure word represents words describing time, such as a time period, a time point, before a certain point, after a certain point, and between two points, as well as outside a preset distance range of a place, within a preset distance range of a place, and a relative position in a set direction from a place. Examples include near a place; within or outside a preset distance of a place expressed in kilometers, miles, or meters; and a relative position to the east, south, west, or north of a place. The spatiotemporal modifier structure word may also combine the temporal and spatial modifiers above, such as within a certain distance of a place during a certain time period.
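One way to see why these items are "computable" is to map each of them to an executable function. The sketch below does this for the metric logical relationship items and a few calculation modifier items using only the Python standard library. The dictionary names, the English keys, and the choice of population (rather than sample) standard deviation and variance are assumptions made for this illustration, not details given in the disclosure.

```python
import operator
import statistics

# Metric logical relationship items mapped to comparison functions.
METRIC_RELATIONS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
    "not equal to": operator.ne,
}

# A subset of the calculation modifier items mapped to aggregation functions.
# Population statistics are an assumption; the disclosure does not specify.
CALC_MODIFIERS = {
    "sum": sum,
    "average": statistics.mean,
    "count": len,
    "standard deviation": statistics.pstdev,
    "variance": statistics.pvariance,
    "maximum": max,
    "minimum": min,
}


def top_n(values, n):
    """Return the n largest values in descending order ("top N")."""
    return sorted(values, reverse=True)[:n]


def apply_metric(values, relation, threshold, modifier):
    """Keep values satisfying `relation threshold`, then aggregate with `modifier`."""
    compare = METRIC_RELATIONS[relation]
    kept = [v for v in values if compare(v, threshold)]
    return CALC_MODIFIERS[modifier](kept)


if __name__ == "__main__":
    ages = [58, 61, 65, 72]  # made-up sample data
    print(apply_metric(ages, "greater than", 60, "average"))
    print(top_n(ages, 2))
```

In this sketch, a phrase such as "average age greater than 60" reduces to one lookup in each table followed by a filter and an aggregation.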

It should be noted that the key to determining the attribute minimum parsing unit, the metric minimum parsing unit, and the spatiotemporal modifier structure word lies in whether the words identified by entity noun recognition are computable. For example, the words above meaning greater than, less than, contains, does not contain, standard deviation, growth rate, and so on are all computable words. Carrying out the subsequent steps on the basis of these computable words can improve the accuracy with which the natural language question information is understood and yield more precise data results, providing better reference data for the field of data analysis.

By performing word segmentation and entity noun recognition on the natural language question information input by the user, the minimum parsing unit corresponding to that information can be determined. For example, if the question information is "box office revenue of action movies in North China in August 2017", the segments extracted after word segmentation are: time (August 2017), region (North China), action movies, and box office revenue. Entity noun recognition is then performed on these segments to determine the minimum parsing units and structure word of the question information. Here, the time (August 2017) is a spatiotemporal modifier structure word; the region (North China) is an attribute minimum parsing unit; action movies is an attribute minimum parsing unit; and revenue is a metric minimum parsing unit.
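The classification in this example can be sketched as a lexicon lookup. The code below is a simplified illustration that uses English glosses of the segments; the `UnitKind` enum, the `ParsingUnit` dataclass, the column names, and the lexicon contents are all hypothetical, and the disclosure does not prescribe how entity noun recognition is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class UnitKind(Enum):
    ATTRIBUTE = "attribute minimum parsing unit"
    METRIC = "metric minimum parsing unit"
    SPATIOTEMPORAL = "spatiotemporal modifier structure word"


@dataclass(frozen=True)
class ParsingUnit:
    text: str       # the segment as it appeared in the question
    kind: UnitKind  # which of the three unit types it belongs to
    column: str     # hypothetical knowledge-base column the unit refers to


# Hypothetical recognition lexicon: segment -> (unit kind, column name).
RECOGNITION_LEXICON = {
    "August 2017": (UnitKind.SPATIOTEMPORAL, "date"),
    "North China": (UnitKind.ATTRIBUTE, "region"),
    "action movies": (UnitKind.ATTRIBUTE, "genre"),
    "box office revenue": (UnitKind.METRIC, "revenue"),
}


def recognize(segments):
    """Tag each known segment; segments missing from the lexicon are skipped."""
    units = []
    for segment in segments:
        if segment in RECOGNITION_LEXICON:
            kind, column = RECOGNITION_LEXICON[segment]
            units.append(ParsingUnit(segment, kind, column))
    return units


if __name__ == "__main__":
    segments = ["August 2017", "North China", "action movies", "of", "box office revenue"]
    for unit in recognize(segments):
        print(f"{unit.text!r}: {unit.kind.value} -> {unit.column}")
```

A statistical or neural recognizer could replace the dictionary lookup without changing the shape of the output.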

S103: Generate a query instruction corresponding to the natural language question information based on the minimum parsing unit and a preset instruction set.

Specifically, the process of generating the query instruction includes the following steps, as shown in FIG. 3:

S301: Infer, from the minimum parsing unit, the data query logic contained in the natural language question information.

For example, parsing the natural language question information "box office revenue of action movies in North China in August 2017" yields the following minimum parsing units: North China (attribute minimum parsing unit), action movies (attribute minimum parsing unit), and revenue (metric minimum parsing unit). The space-time modifying structure word is August 2017. The data query logic contained in the question information is then determined from these words. Specifically, the way the minimum parsing units are combined in the instruction set is determined by their meaning and order.
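The source does not specify how a space-time modifying structure word such as "August 2017" becomes a filter condition. One possible approach, shown here purely as an assumption, is to resolve a month into a half-open date interval that can later be used as a lower and upper bound. The function name `month_range` is hypothetical.

```python
from datetime import date


def month_range(year, month):
    """Return [first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    # December rolls over into January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


start, end = month_range(2017, 8)
print(start.isoformat(), end.isoformat())
```

A half-open interval avoids having to know how many days the month has when writing the upper bound.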

In the present disclosure, the classification or entity to which the natural language question information belongs can be obtained from the attribute item in the attribute minimum parsing unit, and the category of the natural language question information can be derived from it. To query different categories of natural language question information accurately, the query logic corresponding to each category may be preset.

Natural language question information of different question categories corresponds to different data query logic. Common question categories include quantity questions, ratio questions, ranking questions, and relationship questions. A quantity question asks for a metric value or attribute value within a specific time period or at a specific time point. A ratio question asks for the ratio of a metric, or of the count of an attribute, across different time periods, or for the ratio of different metrics within the same time period. A ranking question sorts a column (a metric or an attribute) along some dimension, and that dimension is ultimately parsed into a filter condition. In a relationship question, the relationship is either predefined or learned by the program through training on data, for example the most dangerous, the most valuable, the most relevant, nearby, or of the same kind.
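The idea of presetting query logic per question category can be sketched as a lookup table. Everything in the sketch below is an assumption made for illustration: the step names in `PRESET_QUERY_LOGIC`, the cue words, and the `categorize` function. The source derives the category from the attribute items of the parsed question rather than from keyword cues, so the keyword matching here is only a stand-in.

```python
from enum import Enum


class QuestionCategory(Enum):
    QUANTITY = "quantity"
    RATIO = "ratio"
    RANKING = "ranking"
    RELATIONSHIP = "relationship"


# Hypothetical preset query logic: an ordered list of abstract steps per category.
PRESET_QUERY_LOGIC = {
    QuestionCategory.QUANTITY: ["filter", "aggregate"],
    QuestionCategory.RATIO: ["filter", "aggregate_per_group", "divide"],
    QuestionCategory.RANKING: ["filter", "aggregate", "sort", "limit"],
    QuestionCategory.RELATIONSHIP: ["filter", "apply_relation"],
}

# Hypothetical cue words; a naive substring match is used for brevity.
CUES = [
    (QuestionCategory.RATIO, ("ratio", "proportion")),
    (QuestionCategory.RANKING, ("top", "rank", "highest", "lowest")),
    (QuestionCategory.RELATIONSHIP, ("nearby", "similar", "most relevant")),
]


def categorize(question):
    """Return the first category whose cue appears, defaulting to QUANTITY."""
    text = question.lower()
    for category, cues in CUES:
        if any(cue in text for cue in cues):
            return category
    return QuestionCategory.QUANTITY


question = "Box office revenue of action movies in North China in August 2017"
category = categorize(question)
print(category.value, PRESET_QUERY_LOGIC[category])
```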

S302: According to the data query logic, extract the corresponding instructions from the preset instruction set and combine them to generate the query instruction corresponding to the natural language question information.

For example, based on the minimum parsing units parsed from the natural language question information "box office revenue of action movies in North China in August 2017" and on the data query logic above, the corresponding instructions are extracted from the preset instruction set and combined according to the data query logic to construct the query instruction for the question. For instance, a query request is issued in a structured query language such as SQL (or Cypher): SELECT SUM(票房) FROM 电影票房表 WHERE 地区 = '华北' AND 电影类别 = '动作' AND 时间 >= '2017-08-01' AND 时间 < '2017-09-01' GROUP BY 地区, 电影类别. Here 票房 is the box office column, 电影票房表 is the movie box office table, 地区 is the region, 电影类别 is the movie category, and 时间 is the time. The query request is assembled from predefined instruction fragments or instruction sets. The database parses the request composed of these instructions, then filters, computes, and returns the data, thereby answering the user's question.
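A minimal sketch of step S302, assuming the preset instruction set is a dictionary of SQL fragments and the knowledge base is a relational table. The table and column names (`box_office`, `region`, `genre`, `day`, `revenue`), the fragment names in `INSTRUCTION_SET`, and the sample rows are all hypothetical. The sketch uses Python's standard `sqlite3` module with an in-memory database so that it can be run as is.

```python
import sqlite3

# Hypothetical preset instruction set: named SQL fragments.
# Identifiers are filled in from trusted preset names; values use ? placeholders.
INSTRUCTION_SET = {
    "sum_metric": "SELECT SUM({metric}) FROM {table}",
    "eq_filter": "{column} = ?",
    "time_range": "{column} >= ? AND {column} < ?",
}


def build_query(table, metric, attribute_filters, time_column, start, end):
    """Combine instruction fragments into one parameterized SQL statement."""
    conditions, params = [], []
    for column, value in attribute_filters.items():
        conditions.append(INSTRUCTION_SET["eq_filter"].format(column=column))
        params.append(value)
    conditions.append(INSTRUCTION_SET["time_range"].format(column=time_column))
    params += [start, end]
    sql = INSTRUCTION_SET["sum_metric"].format(metric=metric, table=table)
    return sql + " WHERE " + " AND ".join(conditions), params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_office (region TEXT, genre TEXT, day TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO box_office VALUES (?, ?, ?, ?)",
    [
        ("North China", "action", "2017-08-05", 120.0),
        ("North China", "action", "2017-08-20", 80.0),
        ("North China", "comedy", "2017-08-10", 50.0),
        ("North China", "action", "2017-09-01", 999.0),  # outside August 2017
    ],
)

sql, params = build_query(
    table="box_office",
    metric="revenue",
    attribute_filters={"region": "North China", "genre": "action"},
    time_column="day",
    start="2017-08-01",
    end="2017-09-01",
)
total = conn.execute(sql, params).fetchone()[0]
print(sql)
print(total)
```

Passing the filter values as parameters rather than formatting them into the string keeps user-supplied text out of the SQL itself. Only the preset identifiers are interpolated.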

S104: Retrieve from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information.

Before retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further includes the following steps, as shown in FIG. 4:

S401: Acquire knowledge base sample data. The knowledge base sample data includes database data provided by the user, input information data of the user, and/or third-party data.

S402: Generate the preset knowledge base from the knowledge base sample data.

The knowledge base generation process is not specifically limited. For example, the knowledge base may be built using a deep learning model based on a convolutional neural network, or using other methods.

To improve the accuracy of the query results, the method provided by the present disclosure further includes the following step, as shown in FIG. 5:

S501: Add the natural language question information and its corresponding data result to the preset knowledge base.

Through the above step, the data in the preset knowledge base is continuously updated and its content becomes increasingly rich, so the query accuracy for question information can be continuously improved. The final data result changes with the query time and with the ongoing updates to the knowledge base; it is not a pre-stored answer. For example, if the preset knowledge base stores different data results for the same natural language question information, the most recent time at which each data result was obtained for that question may be recorded in the preset knowledge base. After the natural language question information input by the user terminal is obtained, all data results corresponding to it are looked up in the preset knowledge base, and the data result obtained most recently is selected as the final data result for that natural language question information.
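The "most recent result wins" rule can be sketched as follows. The sketch assumes a simple in-memory dictionary as the knowledge base, which is a stand-in rather than the structure described in the source. The names `knowledge_base`, `add_result`, and `latest_result`, and the sample values, are hypothetical.

```python
from datetime import datetime

# Hypothetical in-memory knowledge base: question -> list of (result, obtained_at).
knowledge_base = {}


def add_result(question, result, obtained_at):
    """S501: store the question together with its data result and timestamp."""
    knowledge_base.setdefault(question, []).append((result, obtained_at))


def latest_result(question):
    """Return the most recently obtained result, or None if nothing is stored."""
    entries = knowledge_base.get(question)
    if not entries:
        return None
    return max(entries, key=lambda entry: entry[1])[0]


q = "box office revenue of action movies in North China in August 2017"
add_result(q, 180.0, datetime(2017, 9, 1, 8, 0))
add_result(q, 200.0, datetime(2017, 9, 3, 8, 0))
print(latest_result(q))
```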

As another example, if the preset knowledge base stores different data results for the same natural language question information, then after the natural language question information input by the user terminal is obtained, all data results corresponding to it may be looked up in the preset knowledge base and output to the user. The data result that the user selects from among them is recorded, and the number of times each data result has been selected is counted. Once the number of times a data result has been selected exceeds a preset maximum threshold, that data result is taken as the final data result for the natural language question information.
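The selection-count rule can be sketched with a counter per question. The threshold value, the class name `SelectionTracker`, and its methods are assumptions made for illustration; the source only states that a result becomes final once its selection count exceeds a preset maximum threshold.

```python
from collections import Counter

SELECTION_THRESHOLD = 3  # hypothetical value for the preset maximum threshold


class SelectionTracker:
    def __init__(self, threshold=SELECTION_THRESHOLD):
        self.threshold = threshold
        self.counts = {}  # question -> Counter of selected results

    def record(self, question, chosen_result):
        """Count one user selection of chosen_result for this question."""
        self.counts.setdefault(question, Counter())[chosen_result] += 1

    def final_result(self, question):
        """Return a result once its count exceeds the threshold, else None."""
        counter = self.counts.get(question)
        if not counter:
            return None
        result, count = counter.most_common(1)[0]
        return result if count > self.threshold else None


tracker = SelectionTracker()
tracker.record("q", 180.0)
for _ in range(3):
    tracker.record("q", 200.0)
before = tracker.final_result("q")  # 3 selections do not exceed the threshold
tracker.record("q", 200.0)
after = tracker.final_result("q")   # 4 selections exceed it
print(before, after)
```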

In the method for understanding natural language questions provided by the present disclosure, the natural language question information is first parsed to obtain the minimum parsing unit. A query statement for the question information is then constructed from the minimum parsing unit and the preset instruction set, and the pre-established knowledge base is searched according to that query statement to obtain the corresponding data result. Because the knowledge base is built from database data provided by the user, input information data of the user, and/or third-party data, the method can provide accurate data results obtained through calculation and statistics, which makes it applicable to professional scenarios such as data analysis.

Embodiment 2:

The present disclosure provides an apparatus for understanding natural language questions. Referring to FIG. 6, the apparatus includes an information acquisition module 61, an information parsing module 62, an instruction generation module 63, and a retrieval module 64.

The information acquisition module 61 is configured to obtain the natural language question information input by the user terminal, the natural language question information being question information related to data query. The information parsing module 62 is configured to parse the natural language question information to obtain the minimum parsing unit. The instruction generation module 63 is configured to generate the query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set. The retrieval module 64 is configured to retrieve from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information. The preset knowledge base is generated from database data provided by the user, input information data of the user, and/or third-party data.

Specifically, the information parsing module 62 further includes:

a word segmentation module 621, configured to perform word segmentation on the natural language question information to obtain a plurality of segments; and a recognition module 622, configured to perform entity noun recognition on the plurality of segments to obtain the minimum parsing unit. The minimum parsing unit includes an attribute minimum parsing unit, a metric minimum parsing unit, and a space-time modifying structure word.
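The way the four modules hand data to one another can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: the class name `UnderstandingApparatus`, the callable-based wiring, and the stub lambdas are all hypothetical, and the stubs do not perform real segmentation, instruction generation, or retrieval.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UnderstandingApparatus:
    acquire: Callable[[], str]             # information acquisition module 61
    parse: Callable[[str], List[str]]      # information parsing module 62
    generate: Callable[[List[str]], str]   # instruction generation module 63
    retrieve: Callable[[str], dict]        # retrieval module 64

    def answer(self):
        """Run the modules in the order 61 -> 62 -> 63 -> 64."""
        question = self.acquire()
        units = self.parse(question)
        instruction = self.generate(units)
        return self.retrieve(instruction)


# Stub modules, purely illustrative.
apparatus = UnderstandingApparatus(
    acquire=lambda: "revenue in August 2017",
    parse=lambda q: q.split(" in "),
    generate=lambda units: "SUM(" + units[0] + ") WHERE month = '" + units[1] + "'",
    retrieve=lambda instruction: {"instruction": instruction, "value": 200.0},
)
result = apparatus.answer()
print(result)
```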

In the apparatus for understanding natural language questions provided by the present disclosure, each module has the same technical features as the method for understanding natural language questions described above and can therefore achieve the same functions. For the specific working process of each module in the apparatus, refer to the foregoing method embodiment; details are not repeated here.

Optionally, the space-time modifying structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between one point and another.

Optionally, the instruction generation module 63 is specifically configured to generate the query instruction corresponding to the natural language question information, based on the minimum parsing unit and the preset instruction set, through the following steps: inferring, from the minimum parsing unit, the data query logic contained in the natural language question information; and, according to the data query logic, extracting the corresponding instructions from the preset instruction set and combining them to generate the query instruction corresponding to the natural language question information.

Optionally, the instruction generation module 63 is specifically configured to infer, from the minimum parsing unit, the data query logic contained in the natural language question information through the following steps: obtaining the category corresponding to the natural language question information; and obtaining, according to that category, the query logic corresponding to the natural language question information.

Embodiment 3:

The present disclosure also provides an electronic device. Referring to FIG. 7, the electronic device includes a processor 70, a memory 71, a bus 72, and a communication interface 73. The processor 70, the communication interface 73, and the memory 71 are connected via the bus 72. The processor 70 is configured to execute executable modules, such as computer programs, stored in the memory 71. When the processor executes the computer program, the steps of the method described in the method embodiment are implemented. That is, each method step in Embodiment 1 may be performed by the processor 70.

The memory 71 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 73 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.

The bus 72 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in FIG. 7 by a single double-headed arrow, but this does not mean that there is only one bus or only one type of bus.

The memory 71 is configured to store a program, and the processor 70 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow process disclosed in any embodiment of the present disclosure may be applied to, or implemented by, the processor 70.

The processor 70 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 70 or by instructions in the form of software. The processor 70 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor can implement or execute the methods, steps, and logic block diagrams in the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method provided by the present disclosure may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 71; the processor 70 reads the information in the memory 71 and completes the steps of the above method in combination with its hardware.

The present disclosure also provides a computer program product for the method for understanding natural language questions, including a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method described in the foregoing method embodiment. For the specific implementation, refer to the method embodiment; details are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and electronic device described above may be found in the corresponding processes of the foregoing method embodiment; details are not repeated here.

The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

In the description of the present disclosure, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present disclosure, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present disclosure. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the present disclosure, and shall all fall within its scope of protection. Therefore, the scope of protection of the present disclosure shall be determined by the scope of protection of the claims.

Industrial applicability

The method, apparatus, and electronic device for understanding natural language questions provided by the present disclosure can provide accurate data results, obtained through calculation and statistics, for question information, achieve accurate recognition of natural language question information, and can be applied in professional scenarios such as the field of data analysis.

Claims

A method for understanding natural language questions, which is characterized by comprising:

Obtaining natural language question information input by the user end; the natural language question information is question information related to the data query;

Parsing the natural language question information to obtain a minimum parsing unit;

Generating, according to the minimum parsing unit and the preset instruction set, a query instruction corresponding to the natural language question information;

Retrieving from a preset knowledge base according to the query instruction to obtain a data result corresponding to the natural language question information; the preset knowledge base is generated from database data provided by the user, input information data of the user, and/or third-party data.

The method according to claim 1, wherein the parsing the natural language question information to obtain a minimum parsing unit comprises:

Performing word segmentation processing on the natural language question information to obtain a plurality of word segmentation segments;

Performing entity noun recognition on the plurality of segmentation segments to obtain a minimum parsing unit; the minimum parsing unit includes: an attribute minimum parsing unit, a metric minimum parsing unit, and a space-time modifying structural word.

The method according to claim 2, wherein the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier.

The method according to claim 3, wherein the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents classification, grouping, segmentation, partial value, count, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents logical relationships of similarity, dissimilarity, inclusion, non-inclusion, comparison and comparison.

The method according to claim 3 or 4, wherein the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents the numerical relationships greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; and the calculation modifier in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sort, maximum, minimum, top N, and bottom N.

The method according to any one of claims 2 to 5, wherein the space-time modifying structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, between one point and another, within a certain distance of a preset position, and a relative position within a certain distance.

The method according to any one of claims 1 to 6, wherein the generating the query instruction corresponding to the natural language question information based on the minimum parsing unit and the preset instruction set comprises:

Deriving data query logic included in the natural language question information according to the minimum parsing unit;

According to the data query logic, corresponding instructions are extracted from the preset instruction set and combined to generate a query instruction corresponding to the natural language question information.

The method according to claim 7, wherein the inferring the data query logic included in the natural language question information according to the minimum parsing unit comprises:

Obtaining a category corresponding to the natural language question information;

According to the obtained category, the query logic corresponding to the natural language question information is obtained.

The method according to any one of claims 1 to 8, wherein, before the retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:

Obtaining knowledge base sample data; the knowledge base sample data includes: user-provided database data, user input information data, and/or third-party data;

A preset knowledge base is generated according to the knowledge base sample data.

The method according to any one of claims 1 to 9, wherein, after the retrieving from the preset knowledge base according to the query instruction to obtain the data result corresponding to the natural language question information, the method further comprises:

The natural language question information and its corresponding data result are added to the preset knowledge base.

An apparatus for understanding natural language questions, comprising:

The information obtaining module is configured to obtain natural language question information input by the user end; the natural language question information is question information related to the data query;

An information parsing module configured to parse the natural language question information to obtain a minimum parsing unit;

An instruction generating module, configured to generate a query instruction corresponding to the natural language question information based on the minimum parsing unit and a preset instruction set;

a retrieval module, configured to retrieve from a preset knowledge base according to the query instruction to obtain a data result corresponding to the natural language question information; the preset knowledge base is generated from database data provided by the user, input information data of the user, and/or third-party data.

The apparatus according to claim 11, wherein the information parsing module comprises:

a word segmentation module configured to perform word segmentation processing on the natural language question information to obtain a plurality of word segmentation segments;

an identification module configured to perform entity noun recognition on the plurality of word segmentation segments to obtain the minimum parsing unit; the minimum parsing unit includes: an attribute minimum parsing unit, a metric minimum parsing unit, and a space-time modified structure word.
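A minimal sketch of the two sub-modules, assuming a small hand-written lexicon and greedy longest-match recognition over whitespace tokens. The lexicon entries, the unit labels, and the matching strategy are assumptions for illustration; the claim does not prescribe a segmentation or recognition algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: phrase -> kind of minimum parsing unit.
LEXICON: Dict[str, str] = {
    "region": "attribute",
    "sales": "metric",
    "average": "metric",
    "greater than": "metric",
    "last month": "space_time",
}


def segment(question: str) -> List[str]:
    """Word segmentation (toy): lowercase, trim punctuation, split on spaces."""
    return question.lower().strip(" ?.").split()


def recognize(tokens: List[str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Entity recognition (toy): greedy longest match against the lexicon."""
    units: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                units.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # token not in the lexicon: skip it
    return units
```

For example, `recognize(segment("Average sales by region last month?"))` yields two metric units, one attribute unit, and one space-time unit, with the filler word "by" dropped.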

The apparatus according to claim 12, wherein the attribute minimum parsing unit comprises at least one of an attribute item, a calculation operation item, and an attribute logical relationship item; and the metric minimum parsing unit comprises at least one of a metric item, a metric logical relationship item, and a calculation modifier.

The apparatus according to claim 13, wherein the attribute item in the attribute minimum parsing unit represents a category or entity to which the natural language question information belongs; the calculation operation item in the attribute minimum parsing unit represents classification, grouping, segmentation, partial value, count, ranking by pinyin, and ranking by surname stroke count; and the attribute logical relationship item in the attribute minimum parsing unit represents a logical relationship of similarity, dissimilarity, inclusion, non-inclusion, contrast, and comparison.

The apparatus according to claim 13 or 14, wherein the metric item in the metric minimum parsing unit represents a numerical value; the metric logical relationship item in the metric minimum parsing unit represents a numerical relationship of greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; and the calculation modifier in the metric minimum parsing unit represents summation, average, count, standard deviation, variance, correlation, correlation coefficient, daily growth rate, weekly growth rate, monthly growth rate, quarterly growth rate, annual growth rate, sort, maximum, minimum, top N, and bottom N.
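To make the metric unit concrete, the following sketch maps some of the listed logical relationships and calculation modifiers onto Python standard-library functions. The table keys, the helper names, the choice of population (rather than sample) statistics, and the growth-rate formula are assumptions; the claim only names the operations.

```python
import operator
import statistics
from typing import Callable, Dict, List, Sequence

# Metric logical relationship items mapped to comparison functions.
RELATIONS: Dict[str, Callable[[float, float], bool]] = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
    "not equal to": operator.ne,
}

# A subset of the calculation modifiers; population statistics are assumed.
MODIFIERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "average": statistics.mean,
    "count": len,
    "standard deviation": statistics.pstdev,
    "variance": statistics.pvariance,
    "maximum": max,
    "minimum": min,
}


def top_n(values: Sequence[float], n: int) -> List[float]:
    """The n largest values, in descending order."""
    return sorted(values, reverse=True)[:n]


def bottom_n(values: Sequence[float], n: int) -> List[float]:
    """The n smallest values, in ascending order."""
    return sorted(values)[:n]


def growth_rate(previous: float, current: float) -> float:
    """Relative change between two periods (assumed definition)."""
    return (current - previous) / previous


def evaluate_metric(values: Sequence[float], relation: str,
                    threshold: float, modifier: str) -> float:
    """Filter values by a logical relationship, then apply a modifier."""
    kept = [v for v in values if RELATIONS[relation](v, threshold)]
    return MODIFIERS[modifier](kept)
```

A question such as "average of sales greater than 15" would, under these assumptions, reduce to `evaluate_metric(sales, "greater than", 15, "average")`.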

The apparatus according to any one of claims 12 to 15, wherein the space-time modified structure word represents at least one of the following: a time period, a time point, before a certain point, after a certain point, and between one point and another point; outside a preset distance range of a certain place, within a preset distance range of a certain place, and a relative position in a set direction from a certain place.
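One way to picture the space-time modified structure words is as predicates over dates and locations. The sketch below is an assumption-laden illustration: the function names are hypothetical, interval endpoints are treated as inclusive, and distance is plain Euclidean distance on planar coordinates rather than geographic distance.

```python
import math
from datetime import date
from typing import Callable, Tuple

Point = Tuple[float, float]


def before(limit: date) -> Callable[[date], bool]:
    """Time modifier: strictly before a certain point."""
    return lambda d: d < limit


def after(limit: date) -> Callable[[date], bool]:
    """Time modifier: strictly after a certain point."""
    return lambda d: d > limit


def between(start: date, end: date) -> Callable[[date], bool]:
    """Time modifier: between two points (endpoints assumed inclusive)."""
    return lambda d: start <= d <= end


def within(center: Point, radius: float) -> Callable[[Point], bool]:
    """Space modifier: within a preset distance range of a place."""
    return lambda p: math.dist(center, p) <= radius


def outside(center: Point, radius: float) -> Callable[[Point], bool]:
    """Space modifier: outside a preset distance range of a place."""
    return lambda p: math.dist(center, p) > radius
```

Each predicate can be used to filter knowledge base rows, for example `[r for r in rows if between(start, end)(r["date"])]`, where the `date` field name is again hypothetical.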

The apparatus according to any one of claims 11 to 16, wherein the instruction generating module is configured to generate, based on the minimum parsing unit and the preset instruction set, the query instruction corresponding to the natural language question information by:

The apparatus according to claim 17, wherein the instruction generating module is specifically configured to infer, according to the minimum parsing unit, data query logic included in the natural language question information by:

An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 10.

A computer-readable medium storing processor-executable non-volatile program code, wherein the program code causes the processor to perform the method according to any one of claims 1 to 10.