
CN110866042B - Intelligent query method and device for table and computer readable storage medium

Info

Publication number
CN110866042B
Authority
CN
China
Prior art keywords
query
intelligent
word vector
training
keyword
Prior art date
Legal status
Active
Application number
CN201910975458.1A
Other languages
Chinese (zh)
Other versions
CN110866042A (en)
Inventor
王建华
马琳
张晓东
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910975458.1A priority Critical patent/CN110866042B/en
Publication of CN110866042A publication Critical patent/CN110866042A/en
Priority to PCT/CN2020/098951 priority patent/WO2021068565A1/en
Application granted granted Critical
Publication of CN110866042B publication Critical patent/CN110866042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent table query method, which comprises the following steps: receiving an original table set and a label set, and splitting the original table set to obtain a standard table set; performing part-of-speech coding on the label set to obtain a word vector set; inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, the intelligent query model finishing training when the training value is smaller than a preset threshold; receiving query content from a user, extracting keywords from the query content based on a keyword extraction technique to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content. The invention also provides an intelligent table query device and a computer readable storage medium. The invention can realize an accurate and efficient intelligent table query function.

Description

Intelligent query method and device for table and computer readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for intelligently querying a table, and a computer readable storage medium.
Background
With the rapid development of the internet, the scale of data has expanded rapidly, and the demand for fast data queries keeps growing. Most data are stored in tables, such as a company's daily operating revenue data or a property company's property registration information. At present, most table-based queries rely on table traversal or on user keyword search; although these methods can meet query needs to a certain extent, when a table is large both of them search slowly and consume a large amount of computation memory.
Disclosure of Invention
The invention provides a table intelligent query method, a table intelligent query device and a computer readable storage medium, with the main aim of intelligently querying tables according to a user's query requirements.
In order to achieve the above object, the present invention provides a method for intelligently querying a table, including:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technique to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process splits the original table set into a user layer, a calculation layer and a data layer, and forms the user layer, the calculation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer consists of the table body of each table in the original table set;
the calculation layer provides the function of querying between the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
performing one-hot coding on the tag set to obtain a primary word vector set;
and dimension reduction is carried out on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction includes:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
Optionally, the standard table set and the word vector set are input into an intelligent query model to be trained to obtain training values, including:
the user layer information in the standard table set is subjected to part-of-speech coding to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution operation, pooling operation and activation operation on the intelligent query model to obtain a predicted value set;
and carrying out loss calculation on the predicted value set and the word vector set to obtain the training value.
In addition, in order to achieve the above object, the present invention also provides an intelligent table query device, which includes a memory and a processor, wherein an intelligent table query program capable of running on the processor is stored in the memory, and the intelligent table query program, when executed by the processor, implements the following steps:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technique to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process splits the original table set into a user layer, a calculation layer and a data layer, and forms the user layer, the calculation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer consists of the table body of each table in the original table set;
the calculation layer provides the function of querying between the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
performing one-hot coding on the tag set to obtain a primary word vector set;
and dimension reduction is carried out on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction includes:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an intelligent table query program executable by one or more processors to implement the steps of the intelligent table query method described above.
According to the invention, the standard table set is obtained by splitting the original table set, so the splitting process breaks a huge, monolithic original table into small tables that support quick queries; the intelligent query model is trained on the standard table set so that it provides an accurate and efficient query function; and keyword extraction is performed on the user's query content, so the content the user wants is identified precisely and queried quickly by the intelligent query model. Therefore, the intelligent table query method, the intelligent table query device and the computer readable storage medium of the invention can realize an accurate and efficient table query function.
Drawings
FIG. 1 is a flowchart of a method for intelligently querying a table according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of a table intelligent query device according to an embodiment of the invention;
fig. 3 is a schematic diagram of a table intelligent query program in a table intelligent query device according to an embodiment of the invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a table intelligent query method. Referring to fig. 1, a flow chart of the table intelligent query method according to an embodiment of the invention is shown. The method may be performed by an apparatus, and the apparatus may be implemented by software and/or hardware.
In this embodiment, the intelligent query method for the table includes:
s1, receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
In the preferred embodiment of the invention, the original table set consists of tables generated automatically under different business scenarios; for example, a real estate agency collects multiple house listings and summarizes them with EXCEL software to obtain a property information table set, and a telecommunications company generates a user consumption table set according to each user's call and data usage.
The label set describes each table in the original table set. Preferably, each description is a combination of keywords; for the property information table set, for example, the label set combines four keywords: community name and unit number, floor area, market price, and house layout, so a record in the label set for one house might read: Community A Unit 3A + 89 square meters + 1.2 million + three bedrooms, one living room, two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a calculation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, where the title and header are expressed as text; for a product sales statistics table, for example, the title is "product sales statistics table" and the header contains the product number, product name, specification, packaging mode and so on. The titles and headers of the original table set are therefore extracted to form the user layer. The table body carries the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relation must be established between the two layers; this is the function of the calculation layer. The calculation layer may adopt a multi-element linear index query of the form:
y = a_1*x_1 + a_2*x_2 + … + a_n*x_n
where y is a datum in the data layer, x_1, x_2, …, x_n are the text information of the user layer, and a_1, a_2, …, a_n are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million, set y = 130 and solve backwards for all user layer text information that satisfies the price of 1.3 million, i.e. the titles and headers in the original table set; if the titles are Community A, Community B and Community E, the result indicates which of them contains property information priced at 1.3 million, for instance a record priced at 1.3 million under the header of Community A.
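As an illustrative sketch (not the patent's reference implementation), the following Python snippet shows how one original table could be split into a user layer (title and header) and a data layer (body), with a minimal calculation layer that selects data-layer rows through 0/1 index coefficients as in the linear index query above; the table structure and field names are assumed for the example.

```python
# Minimal sketch: split a table into user layer / data layer and query via the calculation layer.
# The table structure and field names below are assumptions for illustration only.

def split_table(table):
    """Split one original table into user layer (title + header) and data layer (body)."""
    user_layer = {"title": table["title"], "header": table["header"]}
    data_layer = table["body"]                      # list of rows, one row per record
    return user_layer, data_layer

def calculation_layer(user_layer, data_layer, target_value, column):
    """Linear index query y = a1*x1 + ... + an*xn with 0/1 coefficients:
    the coefficients select the rows whose value in `column` equals target_value."""
    col = user_layer["header"].index(column)
    coefficients = [1 if row[col] == target_value else 0 for row in data_layer]
    return [row for a, row in zip(coefficients, data_layer) if a == 1]

# Usage example (values assumed):
table = {
    "title": "Property information table",
    "header": ["community", "area_m2", "price_wan", "layout"],
    "body": [
        ["Community A Unit 3A", 89, 120, "3 bedrooms, 1 living room, 2 bathrooms"],
        ["Community B Unit 1C", 102, 130, "3 bedrooms, 2 living rooms, 2 bathrooms"],
    ],
}
user, data = split_table(table)
print(calculation_layer(user, data, 130, "price_wan"))   # rows priced at 1.3 million
```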
S2, performing part-of-speech coding on the tag set to obtain a word vector set.
Because the intelligent query model cannot identify text information directly (effective identification requires extracting features from the text and then making a discriminative judgment), the part-of-speech coding preferably represents every keyword of every tag in the tag set as an N-dimensional matrix vector, i.e. it converts text information into numerical information that the subsequent model can recognize and train on, where N is the number of keywords in the tag set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the tag set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
v_i = (v_1, v_2, …, v_N), where v_j = 1 if j = i and v_j = 0 otherwise,
where i represents the keyword number, v_i is the N-dimensional matrix vector representing keyword i, all the v_i together form the primary word vector set (assuming s keywords in total), and v_j is the j-th element of the N-dimensional matrix vector.
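A minimal sketch of this one-hot coding step, assuming the tag set has already been split into its keywords (the keyword values below are invented for illustration):

```python
# Minimal sketch of one-hot (part-of-speech) coding of tag keywords.
# The keyword vocabulary and example tags are assumptions for illustration.

def build_vocabulary(tags):
    """Collect the distinct keywords of all tags; N is the vocabulary size."""
    vocab = sorted({kw for tag in tags for kw in tag})
    return {kw: idx for idx, kw in enumerate(vocab)}

def one_hot(keyword, vocab):
    """Return the N-dimensional one-hot vector v_i for one keyword."""
    vec = [0] * len(vocab)
    vec[vocab[keyword]] = 1
    return vec

tags = [
    ["Community A Unit 3A", "89 square meters", "1.2 million", "3br 1lr 2ba"],
    ["Community B Unit 1C", "102 square meters", "1.3 million", "3br 2lr 2ba"],
]
vocab = build_vocabulary(tags)
primary_word_vectors = [[one_hot(kw, vocab) for kw in tag] for tag in tags]
print(len(vocab), primary_word_vectors[0][0])
```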
Further, the dimension reduction reduces the generated N-dimensional matrix vectors to lower-dimensional data that is easier to compute with during subsequent model training, i.e. it ultimately converts the primary word vector set into the word vector set.
Preferably, the dimension reduction establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
[forward probability model formula, given as an image in the original publication]
[backward probability model formula, given as an image in the original publication]
optimizing the forward probability model and the backward probability model:
[optimization formula, given as an image in the original publication]
where max denotes the optimization, ∂ denotes the partial derivative, v_i denotes the N-dimensional matrix vector of keyword i, and the tag set contains s keywords in total. After the forward probability model and the backward probability model have been optimized, the dimensionality of the N-dimensional matrix vectors is reduced, and the word vector set is obtained once the dimension reduction is complete.
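Because the forward and backward probability models appear only as formula images in this text, the following is purely an illustrative sketch of the end effect described here: mapping the N-dimensional one-hot vectors down to low-dimensional word vectors through a learned projection matrix. The projection approach, sizes and random initialization are assumptions, not the patent's stated models.

```python
# Illustrative sketch only: reduce N-dimensional one-hot vectors to d-dimensional word vectors
# via a projection (embedding) matrix. The projection approach and sizes are assumptions;
# the patent's forward/backward probability models are given as formula images in the original.
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 50                                   # vocabulary size and target dimension (assumed)
projection = rng.normal(scale=0.1, size=(N, d))   # would be learned by optimizing the models

def reduce_dimension(one_hot_vector):
    """Map one N-dimensional one-hot vector to a d-dimensional word vector."""
    return np.asarray(one_hot_vector) @ projection

one_hot = np.zeros(N)
one_hot[7] = 1                                    # keyword number i = 7 (example)
word_vector = reduce_dimension(one_hot)
print(word_vector.shape)                          # (50,)
```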
S3, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training.
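A minimal sketch of this train-until-threshold loop; the model object, its train_one_round method, the inputs and the threshold value are placeholders for illustration (how the training value itself is computed is detailed below):

```python
# Minimal sketch of the S3 training loop: keep training the intelligent query model
# until the training value (loss) drops below the preset threshold.
# `model.train_one_round` and its inputs are placeholders for illustration.

def train_intelligent_query_model(model, standard_tables, word_vectors,
                                  threshold=0.01, max_rounds=10000):
    for _ in range(max_rounds):
        training_value = model.train_one_round(standard_tables, word_vectors)
        if training_value < threshold:   # training value below preset threshold: done
            return model
    return model                         # safety cap so the sketch always terminates
```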
Preferably, obtaining the training value includes: performing part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set; inputting the user layer information vector set into the intelligent query model, which sequentially performs a convolution operation, a pooling operation and an activation operation to obtain a predicted value set; and performing loss calculation on the predicted value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution step length, and convolving the convolution template with the user layer information vector set according to the convolution step length to obtain the convolution matrix set, which completes the convolution operation; the pooling operation is then completed by replacing each matrix in the convolution matrix set with its maximum value or average value.
Further, the pre-constructed convolution template may be a standard 3×3 matrix. The matrix resulting from the convolution operation is computed from left to right with a convolution step of 1: the pre-constructed 3×3 convolution template is first multiplied element by element with the top-left 3×3 sub-matrix of a 9×9 feature candidate matrix from the feature candidate set (for example 1*0, 0*3, 1*1 and so on), and the products are summed to give one element of the result matrix; the template is then moved one step to the right according to the convolution step of 1 and the same computation is repeated until the whole matrix has been traversed, which completes the convolution operation. (The specific template, feature matrix and result values are shown as images in the original publication.) The convolution operation therefore produces a large number of small matrices, so the pooling operation reduces their dimensionality; preferably the maximization principle is used, replacing each small matrix with its largest value, for example 3 and 7 in the illustrated matrices, which completes the pooling operation.
Preferably, the convolution and pooling operations are repeated; for example, 16 rounds of convolution and pooling can be used to obtain the final feature matrix set.
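A minimal sketch of one convolution-plus-max-pooling round as described above, using an assumed 3×3 template, a stride of 1 and an assumed 9×9 input matrix:

```python
# Minimal sketch: 3x3 convolution with stride 1 over a 9x9 matrix, followed by max pooling.
# Template and input values are assumptions for illustration.
import numpy as np

def convolve(matrix, template, stride=1):
    """Slide the template over the matrix left-to-right, top-to-bottom with the given stride."""
    k = template.shape[0]
    out_size = (matrix.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            window = matrix[r*stride:r*stride+k, c*stride:c*stride+k]
            out[r, c] = np.sum(window * template)   # element-wise products, then summed
    return out

def max_pool(matrix):
    """Pooling by the maximization principle: replace the matrix with its largest value."""
    return matrix.max()

template = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])   # assumed 3x3 template
feature = np.arange(81).reshape(9, 9) % 10               # assumed 9x9 feature matrix
conv = convolve(feature, template, stride=1)             # 7x7 convolution result
print(conv.shape, max_pool(conv))
```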
Preferably, the activation operation estimates the probability of each matrix in the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix_i) = e^(matrix_i) / Σ_{j=1..k} e^(matrix_j)
where p(matrix_i) represents the output probability of matrix i in the feature matrix set, k represents the amount of data in the feature matrix set, e is the base of the natural logarithm (an infinite non-repeating decimal), and j ranges over the predicted value set. For example, if the probability calculated for one feature matrix is 0.21 and the probability calculated for another is 0.64, the matrix with probability 0.64 is taken to represent the feature matrix set (the specific matrices appear as images in the original publication).
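A minimal sketch of this softmax activation; the scores below are invented, and in practice they would come from the pooled feature matrix set:

```python
# Minimal sketch of the softmax activation: turn feature-matrix scores into probabilities
# and pick the highest-probability prediction. The scores are assumptions for illustration.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

scores = [3.0, 7.0, 5.0]                 # e.g. pooled maxima of the feature matrices (assumed)
probs = softmax(scores)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, "-> prediction index:", best)
```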
The loss calculation is:
[loss function formula, given as an image in the original publication]
where t is the size of the word vector set, y_i is the word vector set, and y_i' is the predicted value set.
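The loss formula itself is given as an image in the original, so the following sketch simply assumes a mean squared error between the word vector set and the predicted value set:

```python
# Hedged sketch of the loss calculation: the patent's exact formula is an image in the
# original, so mean squared error between word vectors and predictions is assumed here.
import numpy as np

def training_value(word_vectors, predicted_values):
    y = np.asarray(word_vectors, dtype=float)
    y_pred = np.asarray(predicted_values, dtype=float)
    t = len(y)                                   # size of the word vector set
    return float(np.sum((y - y_pred) ** 2) / t)

print(training_value([[1.0, 0.0], [0.5, 0.5]], [[0.9, 0.1], [0.4, 0.6]]))
```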
S4, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the intelligent query model that has completed training to obtain the table set required by the query content, and outputting the table set.
Since the query content input by the user is usually not in the prescribed keyword combination form and tends to be colloquial, such as "I want to find the telephone bill of September", keyword extraction must be performed on the query content input by the user, extracting the keywords "September" and "telephone bill" from "I want to find the telephone bill of September".
Preferably, the keyword extraction may be performed by using a traversal method, for example, splitting and de-duplicating all keywords of the tag set to construct a keyword table, and sequentially comparing the query content with the keyword table to complete the keyword extraction.
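A minimal sketch of this traversal-based keyword extraction; the tag contents and query below are invented for illustration:

```python
# Minimal sketch of traversal-based keyword extraction: build a de-duplicated keyword table
# from the tag set, then scan the user's query for those keywords. Values are assumed.

def build_keyword_table(tag_set):
    """Split every tag into keywords and de-duplicate them."""
    return sorted({kw.strip() for tag in tag_set for kw in tag.split("+")})

def extract_keywords(query, keyword_table):
    """Sequentially compare the query content against the keyword table."""
    return [kw for kw in keyword_table if kw in query]

tags = ["September + telephone bill", "October + data usage"]
keyword_table = build_keyword_table(tags)
print(extract_keywords("I want to find the telephone bill of September", keyword_table))
# ['September', 'telephone bill']
```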
Further, the intelligent query model obtains the table set required by the query content: the query content corresponds to user layer information, so the function provided by the calculation layer for querying the data layer from the user layer is called to obtain the data layer, and the data layer and the user layer are then recombined to obtain the table set.
The invention also provides an intelligent query device for the table. Referring to fig. 2, an internal structure diagram of a table intelligent query device according to an embodiment of the invention is shown.
In this embodiment, the table intelligent query device 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet computer or a portable computer, or a server. The table intelligent query device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments the memory 11 may be an internal storage unit of the table intelligent query device 1, such as a hard disk of the table intelligent query device 1. In other embodiments the memory 11 may also be an external storage device of the table intelligent query device 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the table intelligent query device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the table intelligent query device 1. The memory 11 may be used not only to store application software installed in the table intelligent query device 1 and various types of data, such as the code of the table intelligent query program 01, but also to temporarily store data that has been output or is to be output.
The processor 12 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, used to run the program code or process the data stored in the memory 11, for example to execute the table intelligent query program 01.
The communication bus 13 is used to enable connection communication between these components.
The network interface 14 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the device 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or a display unit, as appropriate, for displaying information processed in the form intelligent querying device 1 and for displaying a visual user interface.
Fig. 2 shows only the table intelligent query device 1 with components 11-14 and the table intelligent query program 01; those skilled in the art will understand that the structure shown in fig. 2 does not constitute a limitation of the table intelligent query device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, the table intelligent query program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the table intelligent query program 01 stored in the memory 11:
step one, receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
In the preferred embodiment of the invention, the original table set consists of tables generated automatically under different business scenarios; for example, a real estate agency collects multiple house listings and summarizes them with EXCEL software to obtain a property information table set, and a telecommunications company generates a user consumption table set according to each user's call and data usage.
The label set describes each table in the original table set. Preferably, each description is a combination of keywords; for the property information table set, for example, the label set combines four keywords: community name and unit number, floor area, market price, and house layout, so a record in the label set for one house might read: Community A Unit 3A + 89 square meters + 1.2 million + three bedrooms, one living room, two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a calculation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, where the title and header are expressed as text; for a product sales statistics table, for example, the title is "product sales statistics table" and the header contains the product number, product name, specification, packaging mode and so on. The titles and headers of the original table set are therefore extracted to form the user layer. The table body carries the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relation must be established between the two layers; this is the function of the calculation layer. The calculation layer may adopt a multi-element linear index query of the form:
y = a_1*x_1 + a_2*x_2 + … + a_n*x_n
where y is a datum in the data layer, x_1, x_2, …, x_n are the text information of the user layer, and a_1, a_2, …, a_n are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million, set y = 130 and solve backwards for all user layer text information that satisfies the price of 1.3 million, i.e. the titles and headers in the original table set; if the titles are Community A, Community B and Community E, the result indicates which of them contains property information priced at 1.3 million, for instance a record priced at 1.3 million under the header of Community A.
Step two, part-of-speech coding is carried out on the tag set to obtain a word vector set.
Because the intelligent query model cannot identify text information directly (effective identification requires extracting features from the text and then making a discriminative judgment), the part-of-speech coding preferably represents every keyword of every tag in the tag set as an N-dimensional matrix vector, i.e. it converts text information into numerical information that the subsequent model can recognize and train on, where N is the number of keywords in the tag set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the tag set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
v_i = (v_1, v_2, …, v_N), where v_j = 1 if j = i and v_j = 0 otherwise,
where i represents the keyword number, v_i is the N-dimensional matrix vector representing keyword i, all the v_i together form the primary word vector set (assuming s keywords in total), and v_j is the j-th element of the N-dimensional matrix vector.
Further, the dimension reduction reduces the generated N-dimensional matrix vectors to lower-dimensional data that is easier to compute with during subsequent model training, i.e. it ultimately converts the primary word vector set into the word vector set.
Preferably, the dimension reduction establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
[forward probability model formula, given as an image in the original publication]
[backward probability model formula, given as an image in the original publication]
optimizing the forward probability model and the backward probability model:
[optimization formula, given as an image in the original publication]
where max denotes the optimization, ∂ denotes the partial derivative, v_i denotes the N-dimensional matrix vector of keyword i, and the tag set contains s keywords in total. After the forward probability model and the backward probability model have been optimized, the dimensionality of the N-dimensional matrix vectors is reduced, and the word vector set is obtained once the dimension reduction is complete.
Step three, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training.
Preferably, obtaining the training value includes: performing part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set; inputting the user layer information vector set into the intelligent query model, which sequentially performs a convolution operation, a pooling operation and an activation operation to obtain a predicted value set; and performing loss calculation on the predicted value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution step length, and convolving the convolution template with the user layer information vector set according to the convolution step length to obtain the convolution matrix set, which completes the convolution operation; the pooling operation is then completed by replacing each matrix in the convolution matrix set with its maximum value or average value.
Further, the pre-constructed convolution template may be a standard 3×3 matrix. The matrix resulting from the convolution operation is computed from left to right with a convolution step of 1: the pre-constructed 3×3 convolution template is first multiplied element by element with the top-left 3×3 sub-matrix of a 9×9 feature candidate matrix from the feature candidate set (for example 1*0, 0*3, 1*1 and so on), and the products are summed to give one element of the result matrix; the template is then moved one step to the right according to the convolution step of 1 and the same computation is repeated until the whole matrix has been traversed, which completes the convolution operation. (The specific template, feature matrix and result values are shown as images in the original publication.) The convolution operation therefore produces a large number of small matrices, so the pooling operation reduces their dimensionality; preferably the maximization principle is used, replacing each small matrix with its largest value, for example 3 and 7 in the illustrated matrices, which completes the pooling operation.
Preferably, the convolution and pooling operations are repeated; for example, 16 rounds of convolution and pooling can be used to obtain the final feature matrix set.
Preferably, the activation operation estimates the probability of each matrix in the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix_i) = e^(matrix_i) / Σ_{j=1..k} e^(matrix_j)
where p(matrix_i) represents the output probability of matrix i in the feature matrix set, k represents the amount of data in the feature matrix set, e is the base of the natural logarithm (an infinite non-repeating decimal), and j ranges over the predicted value set. For example, if the probability calculated for one feature matrix is 0.21 and the probability calculated for another is 0.64, the matrix with probability 0.64 is taken to represent the feature matrix set (the specific matrices appear as images in the original publication).
The loss calculation is:
[loss function formula, given as an image in the original publication]
where t is the size of the word vector set, y_i is the word vector set, and y_i' is the predicted value set.
Step four, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the intelligent query model that has completed training to obtain the table set required by the query content, and outputting the table set.
Since the query content input by the user is usually not in the prescribed keyword combination form and tends to be colloquial, such as "I want to find the telephone bill of September", keyword extraction must be performed on the query content input by the user, extracting the keywords "September" and "telephone bill" from "I want to find the telephone bill of September".
Preferably, the keyword extraction may be performed by using a traversal method, for example, splitting and de-duplicating all keywords of the tag set to construct a keyword table, and sequentially comparing the query content with the keyword table to complete the keyword extraction.
Further, the intelligent query model obtains the table set required by the query content: the query content corresponds to user layer information, so the function provided by the calculation layer for querying the data layer from the user layer is called to obtain the data layer, and the data layer and the user layer are then recombined to obtain the table set.
Optionally, in other embodiments, the table intelligent query program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the invention; the modules referred to here are series of computer program instruction segments capable of performing specific functions, used to describe the execution of the table intelligent query program in the table intelligent query device.
For example, referring to fig. 3, a schematic program module of a table intelligent query program in an embodiment of a table intelligent query apparatus according to the present invention is shown, where the table intelligent query program may be divided into a data receiving and processing module 10, a part-of-speech encoding module 20, an intelligent query model training module 30, and a table query and output module 40, which are exemplary:
the data receiving and processing module 10 is configured to: and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
The part-of-speech encoding module 20 is configured to: and performing part-of-speech coding on the tag set to obtain a word vector set.
The intelligent query model training 30 is used to: and inputting the standard table set and the word vector set into an intelligent query model to train to obtain a training value, judging the magnitude relation between the training value and a preset threshold, if the training value is larger than the preset threshold, continuing training by the intelligent query model, and if the training value is smaller than the preset threshold, finishing training by the intelligent query model.
The table queries and outputs 40 for: receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the intelligent query model which is trained, obtaining and outputting a form set required by the query content.
The functions or operation steps implemented when the program modules, such as the data receiving and processing module 10, the part-of-speech encoding module 20, the intelligent query model training module 30, the table query and output module 40, etc., are executed are substantially the same as those of the foregoing embodiments, and will not be described herein.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a table intelligent query program is stored, where the table intelligent query program can be executed by one or more processors to implement the following operations:
and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
And performing part-of-speech coding on the tag set to obtain a word vector set.
And inputting the standard table set and the word vector set into an intelligent query model to train to obtain a training value, judging the magnitude relation between the training value and a preset threshold, if the training value is larger than the preset threshold, continuing training by the intelligent query model, and if the training value is smaller than the preset threshold, finishing training by the intelligent query model.
Receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the intelligent query model which is trained, obtaining and outputting a form set required by the query content.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. An intelligent table query method, characterized by comprising the following steps:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technique to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
2. The intelligent query method of claim 1, wherein the splitting process splits the original table set into a user layer, a calculation layer and a data layer, and composes the user layer, the calculation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer consists of the table body of each table in the original table set;
the calculation layer provides the function of querying between the user layer and the data layer.
3. The method for intelligently querying a table according to claim 1 or 2, wherein the step of performing part-of-speech encoding on the tag set to obtain a word vector set includes:
performing one-hot coding on the tag set to obtain a primary word vector set;
and dimension reduction is carried out on the primary word vector set to obtain the word vector set.
4. The intelligent table query method of claim 3, wherein the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
5. The method of claim 2, wherein inputting the standard table set and the word vector set into an intelligent query model for training to obtain training values comprises:
the user layer information in the standard table set is subjected to part-of-speech coding to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution operation, pooling operation and activation operation on the intelligent query model to obtain a predicted value set;
and carrying out loss calculation on the predicted value set and the word vector set to obtain the training value.
6. An intelligent table query device, comprising a memory and a processor, wherein the memory stores an intelligent table query program operable on the processor, and the intelligent table query program, when executed by the processor, implements the following steps:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold: if the training value is larger than the preset threshold, the intelligent query model continues training; if the training value is smaller than the preset threshold, the intelligent query model finishes training;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technique to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
7. The intelligent table query device according to claim 6, wherein the splitting process splits the original table set into a user layer, a calculation layer and a data layer, and composes the user layer, the calculation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer consists of the table body of each table in the original table set;
the calculation layer provides the function of querying between the user layer and the data layer.
8. The intelligent query device according to claim 6 or 7, wherein said part-of-speech encoding said tag set to obtain a word vector set, comprises:
performing one-hot coding on the tag set to obtain a primary word vector set;
and dimension reduction is carried out on the primary word vector set to obtain the word vector set.
9. The intelligent table query device of claim 8, wherein the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimized solution, wherein the optimized solution is the word vector set.
10. A computer-readable storage medium having stored thereon an intelligent table query program executable by one or more processors to implement the steps of the intelligent table query method of any one of claims 1 to 5.
CN201910975458.1A 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium Active CN110866042B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium
PCT/CN2020/098951 WO2021068565A1 (en) 2019-10-11 2020-06-29 Table intelligent query method and apparatus, electronic device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110866042A CN110866042A (en) 2020-03-06
CN110866042B (en) 2023-05-12

Family

ID=69652834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910975458.1A Active CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110866042B (en)
WO (1) WO2021068565A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866042B (en) * 2019-10-11 2023-05-12 平安科技(深圳)有限公司 Intelligent query method and device for table and computer readable storage medium
CN112597171B (en) * 2020-12-31 2024-08-27 平安银行股份有限公司 Table relation visualization method and device, electronic equipment and storage medium
CN113111864A (en) * 2021-05-13 2021-07-13 上海巽联信息科技有限公司 Intelligent table extraction algorithm based on multiple modes
CN116049354B (en) * 2023-01-28 2023-06-20 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250381A (en) * 2015-06-04 2016-12-21 微软技术许可有限责任公司 The row sequence optimized for input/output in list data
CN106874411A (en) * 2017-01-22 2017-06-20 网易(杭州)网络有限公司 The searching method and search platform of a kind of form
JP2017224240A (en) * 2016-06-17 2017-12-21 富士通株式会社 TABLE DATA SEARCH DEVICE, TABLE DATA SEARCH METHOD, AND TABLE DATA SEARCH PROGRAM
CN110222160A (en) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Intelligent semantic document recommendation method, device and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615193A (en) * 2009-07-07 2009-12-30 北京大学 A query system based on encyclopedia data extraction and integration
US8819055B2 (en) * 2010-05-14 2014-08-26 Oracle International Corporation System and method for logical people groups
US20140025626A1 (en) * 2012-04-19 2014-01-23 Avalon Consulting, LLC Method of using search engine facet indexes to enable search-enhanced business intelligence analysis
US10311374B2 (en) * 2015-09-11 2019-06-04 Adobe Inc. Categorization of forms to aid in form search
CN110866042B (en) * 2019-10-11 2023-05-12 平安科技(深圳)有限公司 Intelligent query method and device for table and computer readable storage medium

Also Published As

Publication number Publication date
CN110866042A (en) 2020-03-06
WO2021068565A1 (en) 2021-04-15

Similar Documents

Publication Publication Date Title
CN110334272B (en) Intelligent question-answering method and device based on knowledge graph and computer storage medium
CN110866042B (en) Intelligent query method and device for table and computer readable storage medium
CN110442857B (en) Emotion intelligent judging method and device and computer readable storage medium
CN110765765B (en) Contract key term extraction method, device and storage medium based on artificial intelligence
CN110413773B (en) Intelligent text classification method, device and computer readable storage medium
CN110795527B (en) Candidate entity ordering method, training method and related device
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN110795548A (en) Intelligent question answering method, device and computer readable storage medium
CN111475617A (en) Event body extraction method and device and storage medium
CN108038208B (en) Training method and device of context information recognition model and storage medium
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN111159485A (en) Tail entity linking method, device, server and storage medium
WO2020248366A1 (en) Text intention intelligent classification method and device, and computer-readable storage medium
CN113190702A (en) Method and apparatus for generating information
CN112784011B (en) Emotion problem processing method, device and medium based on CNN and LSTM
CN113627797A (en) Incoming employee portrait generation method, device, computer equipment and storage medium
WO2021139076A1 (en) Intelligent text dialogue generation method and apparatus, and computer-readable storage medium
CN113947456B (en) Online store matching method and its device, equipment, medium, and product
WO2025026402A1 (en) Question answering method and apparatus, electronic device and storage medium
CN115730597A (en) Multi-level semantic intention recognition method and related equipment thereof
CN115546869A (en) Facial expression recognition method and system based on multiple features
CN112988993B (en) Question and answer method and computing device
CN110263134B (en) Intelligent emotion question-answering method and device and computer readable storage medium
CN113434657B (en) E-commerce customer service response method and corresponding device, equipment and medium thereof
CN113961701B (en) Message text clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant