CN119336795B - A search purpose prediction method and system based on knowledge management

Info

Publication number: CN119336795B
Application number: CN202411847602.0A
Authority: CN
Inventors: 刘祯; 刘敬仪; 李舒婷; 杨明栋; 邱杰峰; 杨伟伟; 程莉红; 施千里; 胡丹丹; 陈莹; 佟成郁; 孙海凤; 谭兵; 徐晓燕; 李喆; 陈龙; 黄健; 王一宏; 倪玉; 周劼翀
Original assignee: Nuclear Power Operation Research Shanghai Co ltd; CNNC Fujian Nuclear Power Co Ltd
Current assignee: Nuclear Power Operation Research Shanghai Co ltd; CNNC Fujian Nuclear Power Co Ltd
Priority date: 2024-12-16
Filing date: 2024-12-16
Publication date: 2025-04-11
Anticipated expiration: 2044-12-16
Also published as: CN119336795A

Abstract

The present invention relates to the technical field of search purpose prediction, and discloses a search purpose prediction method and system based on knowledge management, including: obtaining the search content of the user, and extracting keywords in the search content; constructing coordinates for the content in the database; locating the search content according to the characteristics of the keywords; generating a predicted range of the search content at the coordinate location; synthesizing the database content within the predicted range to generate a prediction of the search purpose. The accuracy of keyword recognition is effectively improved, making the search content more accurate. The relationship between words can be better represented to achieve accurate coordinate positioning. Not only does it improve the accuracy and efficiency of the search, but it can also intelligently predict the user's search purpose, greatly improving the information retrieval experience in the knowledge management system.

Description

Knowledge management-based retrieval purpose prediction method and system

Technical Field

The invention relates to the technical field of search purpose prediction, in particular to a search purpose prediction method and system based on knowledge management.

Background

In today's information age, knowledge management is one of the key means by which enterprises and organizations gain competitive advantage. Knowledge management aims to collect, organize, share and use knowledge resources through systematic methods, improving an organization's innovation capability and decision-making. In a big-data environment, effectively managing and using large volumes of knowledge and information has become a significant challenge in the field of knowledge management.

Information retrieval technology has evolved over the past decades. Traditional information retrieval methods rely mainly on keyword matching and Boolean logic, but with the explosive growth of the internet and of data volume, these methods are increasingly inadequate. Modern information retrieval technologies have begun to combine techniques such as natural language processing (NLP), machine learning (ML) and deep learning (DL) to improve the accuracy and efficiency of retrieval.

In knowledge management systems, users often need to obtain information related to their needs quickly and accurately. This need has given rise to knowledge management-based search methods, in which the user's search purpose is understood by intelligent means so that more accurate and relevant search results can be provided. However, existing retrieval methods still have shortcomings in processing complex queries, understanding user intent, and integrating a variety of information sources.

Disclosure of Invention

The present invention has been made in view of the above-described problems.

Accordingly, the technical problem solved by the invention is that existing search purpose prediction methods suffer from poor information relevance, inaccurate keyword extraction, insufficient retrieval precision and the like.

In order to solve the above technical problem, the invention provides a search purpose prediction method based on knowledge management, which comprises the following steps:

acquiring the search content of a user, and extracting keywords from the search content;

constructing coordinates for the content in the database;

positioning the search content in the coordinates according to the characteristics of the keywords;

generating a prediction range of the search content at the positioned coordinates;

and integrating the database content within the prediction range to generate a prediction of the search purpose.

As a preferred scheme of the search purpose prediction method based on knowledge management, the search content comprises words or sentences input by a user;

extracting the keywords comprises segmenting the input sentence $S$ into $n$ independent words;

$$S \rightarrow W = \{w_1, w_2, \ldots, w_n\}$$

wherein $S$ represents the input sentence, $W$ represents the word set after word segmentation, and $w_i$ represents the $i$-th word;

each word $w_i$ is converted to a corresponding word vector $v_i$;

$$v_i = \mathrm{Word2Vec}(w_i)$$

wherein $v_i$ represents the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ represents converting word $w_i$ into a vector through the Word2Vec model;

by counting the parts of speech of each word in different contexts in a dictionary library, a part-of-speech set $P_i = \{p_{i1}, p_{i2}, \ldots\}$ is generated for each word $w_i$, wherein $p_{ij}$ represents the $j$-th part of speech that word $w_i$ may have in the dictionary library;

the dictionary probability of each part of speech of each word is obtained from the dictionary library;

$$P(\mathrm{POS}(w_i) = p_{ij}) = \frac{\mathrm{count}(w_i, p_{ij})}{\mathrm{count}(w_i)}$$

wherein $P(\mathrm{POS}(w_i) = p_{ij})$ represents the dictionary probability that the part of speech of word $w_i$ is $p_{ij}$, $\mathrm{count}(w_i, p_{ij})$ represents the number of occurrences of word $w_i$ with part of speech $p_{ij}$ in the dictionary, $\mathrm{count}(w_i)$ represents the number of occurrences of word $w_i$ with any part of speech, and $\mathrm{POS}(w_i)$ represents the part-of-speech tag of word $w_i$;

for each word $w_i$, a part of speech $t_i$ is randomly drawn from the corresponding part-of-speech set according to the dictionary probability, yielding a part-of-speech tagging sequence $T = (t_1, t_2, \ldots, t_n)$; the sequence $T$ drawn each time differs from those already drawn, and drawing continues until no new part-of-speech tagging sequence is generated or the maximum number of draws is reached, wherein $t_i$ represents the part of speech of the $i$-th word.

As a preferred scheme of the search purpose prediction method based on knowledge management, let $T_k$ be the $k$-th drawn sequence, and let $t_{k,i}$ be the $i$-th part of speech of the $k$-th part-of-speech tagging sequence;

each part-of-speech tag $t_{k,i}$ is mapped to a corresponding part-of-speech embedding vector $e_{k,i}$ using a pre-trained part-of-speech embedding matrix;

$$e_{k,i} = \mathrm{POS2Vec}(t_{k,i})$$

wherein POS2Vec represents the mapping function from part-of-speech tags to embedding vectors;

each part-of-speech sequence is combined with the word vector sequence, where $v_i$ is the word vector of the $i$-th word, and the fitness of each part-of-speech sequence is calculated through a Bi-LSTM layer and a fully connected layer;

forward LSTM:

$$\overrightarrow{h}_{k,i} = \mathrm{LSTM}\left([v_i; e_{k,i}],\ \overrightarrow{h}_{k,i-1}\right)$$

backward LSTM:

$$\overleftarrow{h}_{k,i} = \mathrm{LSTM}\left([v_i; e_{k,i}],\ \overleftarrow{h}_{k,i+1}\right)$$

combining forward and backward hidden states:

$$h_{k,i} = [\overrightarrow{h}_{k,i}; \overleftarrow{h}_{k,i}]$$

calculating the fitness of each part-of-speech sequence:

$$F_k = W_f \cdot [h_{k,1}; h_{k,2}; \ldots; h_{k,n}] + b_f$$

selecting the sequence with the highest fitness from the generated part-of-speech sequences as the final part-of-speech tagging result;

$$T^{*} = \arg\max_{T_k} F_k$$

wherein $T^{*}$ represents the part-of-speech sequence with the highest fitness; $\arg\max_{T_k}$ represents selecting, among all candidate part-of-speech sequences $T_k$, the sequence for which $F_k$ is largest; $F_k$ represents the fitness score of the $k$-th part-of-speech sequence; $v_i$ represents the word vector of the $i$-th word; LSTM represents a long short-term memory network; $e_{k,i}$ represents the embedding vector of part of speech $t_{k,i}$; $[v_i; e_{k,i}]$ represents the concatenation of the word vector and the part-of-speech embedding vector; $\overrightarrow{h}_{k,i}$ represents the forward LSTM hidden state vector of the $i$-th word of the $k$-th part-of-speech sequence, and $\overrightarrow{h}_{k,i-1}$ that of the preceding word; $\overleftarrow{h}_{k,i}$ represents the backward LSTM hidden state vector of the $i$-th word of the $k$-th part-of-speech sequence, and $\overleftarrow{h}_{k,i+1}$ that of the following word; $h_{k,i}$ represents the combined forward and backward hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; $b_f$ represents the bias vector of the fully connected layer; $[\,\cdot\,;\,\cdot\,]$ represents the concatenation of vectors; and $W_f$ represents the weight matrix of the fully connected layer;

the weight of each part of speech is preset, and the part-of-speech weight of each word in the selected part-of-speech tagging sequence is obtained through a mapping function;

$$\omega_i = \mathrm{Map}(t_i)$$

wherein $\omega_i$ represents the weight value of word $w_i$, and $\mathrm{Map}(t_i)$ represents the mapping of tag $t_i$ to its preset weight;

the context information of the sentence is passed to an attention layer, and the importance score of each word is calculated;

$$\alpha_i = \mathrm{softmax}(W_a h_i + b_a)$$

wherein $\alpha_i$ represents the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ represents the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ represents the bias vector of the attention layer; and softmax represents a normalization function such that the attention weights of all words sum to 1;

for each word, $\omega_i$ and $\alpha_i$ are multiplied to output a screening score $s_i$;

$$s_i = \omega_i \cdot \alpha_i$$

a threshold $\theta$ is preset, and words whose screening score reaches the threshold are retained as keywords;

$$K = \{\, v_i \mid s_i \ge \theta \,\}$$

wherein $K$ represents the set of vectors of the output keywords.

As a preferred scheme of the search purpose prediction method based on knowledge management, the coordinate construction comprises vectorizing the words in the database, calculating the similarity between word vectors as the abscissa, calculating the relevance between different words from historical training data as the ordinate, constructing coordinate axes, and determining the position of each word vector of the database in the coordinate axes;

the most frequent word $w_0$ in the database is defined as the origin, and the similarity between word $w_0$ and every other word $w_u$ is represented by cosine similarity;

$$\mathrm{Sim}(w_0, w_u) = \frac{v_0 \cdot v_u}{\lVert v_0 \rVert \, \lVert v_u \rVert}$$

wherein $v_0 \cdot v_u$ represents the dot product of the word vectors, and $\lVert v_0 \rVert$ and $\lVert v_u \rVert$ represent the norms of the word vectors;

the relevance between different words is calculated from the historical training data and represented by a co-occurrence matrix $C$, wherein $C_{ab}$ represents the frequency with which word $w_a$ and word $w_b$ occur together in the historical data;

$$C_{ab} = \mathrm{count}(w_a, w_b)$$

normalizing the co-occurrence matrix:

$$R_{ab} = \frac{C_{ab}}{\sum_{b'} C_{ab'}}$$

wherein $R_{ab}$ represents the relevance of word $w_a$ and word $w_b$;

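the coordinates of each word $w_u$ in the database are constructed:

$$(x_u, y_u) = \left(\mathrm{Sim}(w_0, w_u),\ R_{0u}\right), \quad u = 1, 2, \ldots, m$$

where $\mathrm{Sim}(w_0, w_u)$ is the cosine similarity between the origin word $w_0$ and word $w_u$, $R_{0u}$ is their normalized co-occurrence relevance, $u$ represents the index of words in the database, and $m$ represents the number of words in the database.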

As a preferred scheme of the search purpose prediction method based on knowledge management, the coordinate positioning comprises calculating, for each keyword, its similarity and relevance with respect to the coordinate origin $w_0$, and establishing the coordinates of each keyword in the coordinate axes;

the coordinates of each keyword are output to obtain the coordinate set of the keywords $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;

wherein $(x_m, y_m)$ represents the coordinates of the $m$-th keyword.

As a preferred scheme of the search purpose prediction method based on knowledge management, the prediction range of the search content comprises, in the coordinate system, a circular area of radius $r$ centered on the coordinates of each keyword;

$$\mathrm{Distance}(k_c, w_u) = \sqrt{(x_c - x_u)^2 + (y_c - y_u)^2}$$

wherein $\mathrm{Distance}(k_c, w_u)$ represents the Euclidean distance between keyword $k_c$ and word $w_u$ in the coordinate axes, $(x_c, y_c)$ represents the coordinates of keyword $k_c$, and $(x_u, y_u)$ represents the coordinates of word $w_u$;

for each word $w_u$, if the condition $\mathrm{Distance}(k_c, w_u) \le r$ is satisfied, the word is determined to be within the prediction range, and a set operation over all words satisfying the condition gives the word set $Q$ within the final prediction range;

$$Q = \bigcup_{k_c \in M} \{\, w_u \mid \mathrm{Distance}(k_c, w_u) \le r \,\}$$

where $r$ represents the standard value of the prediction range threshold, $M$ represents the keyword set, and $k_c$ represents the $c$-th keyword.

As a preferred scheme of the search purpose prediction method based on knowledge management, integrating the database content within the prediction range comprises matching each word in the database with its corresponding search content, arranging the elements of the word set $Q$ by priority, and outputting the retrieval information corresponding to each element in order of priority;

the priority is determined by counting the number of times each word vector in the database is selected by a prediction range: the initial count of every word vector in the database is set to 0; whenever a word vector in the database falls within the prediction range of a keyword $k_c$, its count is increased by 1; after all keywords have been traversed, the elements of the word set $Q$ are arranged by count from largest to smallest, priority levels are assigned in descending order accordingly, and elements with the same count are placed at the same priority level.

A knowledge management based search purpose prediction system employing the method of the present invention, wherein:

The acquisition unit acquires search content of a user and extracts keywords in the search content;

The construction unit is used for carrying out coordinate construction on the content in the database and carrying out coordinate positioning on the search content according to the characteristics of the keywords;

And the prediction unit is used for carrying out approximate change on the keywords, generating a prediction range of search contents at a coordinate positioning position according to a change result, and synthesizing database contents in the prediction range to generate a prediction for search purposes.

A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of the invention when executing the computer program.

A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of the invention.

The beneficial effects of the knowledge management-based search purpose prediction method are as follows. First, keyword extraction and word vectorization improve the accuracy of keyword identification, making the search content more accurate. Second, part-of-speech tagging with a Bi-LSTM model that integrates context information improves the understanding of sentence structure and semantics. By constructing a coordinate system of similarity and relevance among words, the relationships among words are better represented and accurate coordinate positioning is achieved. On this basis, a prediction range of the search content is generated, which ensures the relevance and comprehensiveness of the search results. In addition, weighting, sorting and integrating the content within the prediction range improves user satisfaction with the search results. Overall, the invention improves the accuracy and efficiency of search, intelligently predicts the user's search purpose, and improves the information retrieval experience in the knowledge management system.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is an overall flowchart of a retrieval purpose prediction method based on knowledge management according to a first embodiment of the present invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be understood in detail, the invention is described more particularly below with reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments obtained by one of ordinary skill in the art on the basis of the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.

Embodiment 1, referring to fig. 1, provides a retrieval purpose prediction method based on knowledge management, which comprises:

S1, acquiring search content of a user, and extracting keywords in the search content.

Further, the search content comprises words or sentences input by a user.

Extracting the keywords comprises segmenting the input sentence $S$ into $n$ independent words.

$$S \rightarrow W = \{w_1, w_2, \ldots, w_n\}$$

Wherein $S$ represents the input sentence, $W$ represents the word set after word segmentation, and $w_i$ represents the $i$-th word.

Each word $w_i$ is converted to a corresponding word vector $v_i$.

$$v_i = \mathrm{Word2Vec}(w_i)$$

Wherein $v_i$ represents the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ represents converting word $w_i$ into a vector through the Word2Vec model.

By counting the parts of speech of each word in different contexts in a dictionary library, a part-of-speech set $P_i = \{p_{i1}, p_{i2}, \ldots\}$ is generated for each word $w_i$, wherein $p_{ij}$ represents the $j$-th part of speech that word $w_i$ may have in the dictionary library.

The part-of-speech frequency of each word is obtained from the dictionary library and converted to a dictionary probability.

$$P(\mathrm{POS}(w_i) = p_{ij}) = \frac{\mathrm{count}(w_i, p_{ij})}{\mathrm{count}(w_i)}$$

Wherein $P(\mathrm{POS}(w_i) = p_{ij})$ represents the dictionary probability that the part of speech of word $w_i$ is $p_{ij}$, $\mathrm{count}(w_i, p_{ij})$ represents the number of occurrences of word $w_i$ with part of speech $p_{ij}$ in the dictionary, $\mathrm{count}(w_i)$ represents the number of occurrences of word $w_i$ with any part of speech, and $\mathrm{POS}(w_i)$ represents the part-of-speech tag of word $w_i$.

For each word $w_i$, a part of speech $t_i$ is randomly drawn from the corresponding part-of-speech set according to the dictionary probability, yielding a part-of-speech tagging sequence $T = (t_1, t_2, \ldots, t_n)$. The sequence $T$ drawn each time differs from those already drawn, and drawing continues until no new part-of-speech tagging sequence is generated or the maximum number of draws is reached, wherein $t_i$ represents the part of speech of the $i$-th word.
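The dictionary-probability sampling described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `POS_COUNTS` table and its counts are hypothetical, and "no new sequence is generated" is interpreted here as "every possible tag combination has already been drawn".

```python
import random


def dictionary_probabilities(counts):
    """Turn per-tag occurrence counts for one word into probabilities."""
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


def sample_tag_sequences(words, pos_counts, max_draws=50, seed=0):
    """Draw distinct part-of-speech sequences, one tag per word.

    Drawing stops when max_draws is reached or when every possible
    sequence has already been seen (assumed stopping rule).
    """
    rng = random.Random(seed)
    probs = [dictionary_probabilities(pos_counts[w]) for w in words]
    total_possible = 1
    for p in probs:
        total_possible *= len(p)
    seen = []
    for _ in range(max_draws):
        if len(seen) == total_possible:
            break
        seq = tuple(
            rng.choices(list(p.keys()), weights=list(p.values()), k=1)[0]
            for p in probs
        )
        if seq not in seen:  # keep only sequences not drawn before
            seen.append(seq)
    return seen


# Hypothetical dictionary counts, for illustration only.
POS_COUNTS = {
    "data": {"NOUN": 98, "ADJ": 2},
    "mining": {"NOUN": 60, "VERB": 40},
}
sequences = sample_tag_sequences(["data", "mining"], POS_COUNTS)
```

Each element of `sequences` is one candidate tagging such as `("NOUN", "VERB")`; these candidates are what the fitness calculation in the following steps would score.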

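Let $T_k$ be the $k$-th drawn sequence, and let $t_{k,i}$ be the $i$-th part of speech of the $k$-th part-of-speech tagging sequence.

Each part-of-speech tag $t_{k,i}$ is mapped to a corresponding part-of-speech embedding vector $e_{k,i}$ using a pre-trained part-of-speech embedding matrix.

$$e_{k,i} = \mathrm{POS2Vec}(t_{k,i})$$

Where POS2Vec represents the mapping function from part-of-speech tags to embedding vectors.

Each part-of-speech sequence is combined with the original word vector sequence, where $v_i$ is the word vector of the $i$-th word, and the fitness of each part-of-speech sequence is calculated through the Bi-LSTM layer and the fully connected layer.

Forward LSTM:

$$\overrightarrow{h}_{k,i} = \mathrm{LSTM}\left([v_i; e_{k,i}],\ \overrightarrow{h}_{k,i-1}\right)$$

Backward LSTM:

$$\overleftarrow{h}_{k,i} = \mathrm{LSTM}\left([v_i; e_{k,i}],\ \overleftarrow{h}_{k,i+1}\right)$$

Combining forward and backward hidden states:

$$h_{k,i} = [\overrightarrow{h}_{k,i}; \overleftarrow{h}_{k,i}]$$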

Calculating the fitness of each part-of-speech sequence:

$$F_k = W_f \cdot [h_{k,1}; h_{k,2}; \ldots; h_{k,n}] + b_f$$

The sequence with the highest fitness is selected from the generated part-of-speech sequences as the final part-of-speech tagging result.

$$T^{*} = \arg\max_{T_k} F_k$$

Wherein $T^{*}$ represents the part-of-speech sequence with the highest fitness; $\arg\max_{T_k}$ represents selecting, among all candidate part-of-speech sequences $T_k$, the sequence for which $F_k$ is largest; $F_k$ represents the fitness score of the $k$-th part-of-speech sequence; $v_i$ represents the word vector of the $i$-th word; LSTM represents a long short-term memory network; $e_{k,i}$ represents the embedding vector of part of speech $t_{k,i}$; $[v_i; e_{k,i}]$ represents the concatenation of the word vector and the part-of-speech embedding vector; $\overrightarrow{h}_{k,i}$ represents the forward LSTM hidden state vector of the $i$-th word of the $k$-th part-of-speech sequence, and $\overrightarrow{h}_{k,i-1}$ that of the preceding word; $\overleftarrow{h}_{k,i}$ represents the backward LSTM hidden state vector of the $i$-th word of the $k$-th part-of-speech sequence, and $\overleftarrow{h}_{k,i+1}$ that of the following word; $h_{k,i}$ represents the combined forward and backward hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; $b_f$ represents the bias vector of the fully connected layer; $[\,\cdot\,;\,\cdot\,]$ represents the concatenation of vectors; and $W_f$ represents the weight matrix of the fully connected layer.

The weight of each part of speech is preset, and the part-of-speech weight of each word in the selected part-of-speech tagging sequence is obtained through a mapping function.

$$\omega_i = \mathrm{Map}(t_i)$$

Wherein $\omega_i$ represents the weight value of word $w_i$, and $\mathrm{Map}(t_i)$ represents the mapping of tag $t_i$ to its preset weight.

The context information of the sentence is passed to an attention layer, and the importance score of each word is calculated.

$$\alpha_i = \mathrm{softmax}(W_a h_i + b_a)$$

Wherein $\alpha_i$ represents the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ represents the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ represents the bias vector of the attention layer; and softmax represents the normalization function such that the attention weights of all words sum to 1.

For each word, $\omega_i$ and $\alpha_i$ are multiplied to output a screening score $s_i$.

$$s_i = \omega_i \cdot \alpha_i$$

A threshold $\theta$ is preset, and words whose screening score reaches the threshold are retained as keywords.

$$K = \{\, v_i \mid s_i \ge \theta \,\}$$

Wherein $K$ represents the set of vectors of the output keywords.

By integrating the context information captured by the Bi-LSTM model, the system labels parts of speech more accurately and selects the part-of-speech sequence with the highest fitness as the final result. By calculating the part-of-speech weight and the attention weight, the system evaluates the importance of each word in the sentence and screens out the keywords most useful for the search purpose. The screening scores determine which words are the most important keywords, so that these keywords are prioritized during retrieval, improving the relevance and accuracy of the search results.
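The screening step (part-of-speech weight multiplied by attention weight, then thresholded) can be illustrated with a small sketch. The attention logits below stand in for the output of the attention layer applied to the Bi-LSTM hidden states; they, the tag weights, the threshold and the use of "greater than or equal to" are all assumed values chosen for illustration.

```python
import math


def softmax(scores):
    """Normalize scores so that they are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def screen_keywords(words, tags, attention_logits, pos_weight, threshold):
    """Return the words whose screening score reaches the threshold.

    screening score = preset part-of-speech weight * attention weight
    """
    attention = softmax(attention_logits)
    scores = {w: pos_weight[t] * a for w, t, a in zip(words, tags, attention)}
    keywords = [w for w in words if scores[w] >= threshold]
    return keywords, scores


# Hypothetical inputs: tags, logits, weights and threshold are made up.
words = ["data", "mining", "for", "management"]
tags = ["NOUN", "NOUN", "ADP", "NOUN"]
logits = [1.0, 1.0, 0.0, 1.0]
pos_weight = {"NOUN": 1.0, "ADP": 0.1}

keywords, scores = screen_keywords(words, tags, logits, pos_weight, threshold=0.2)
```

With these inputs the function word "for" receives both a low part-of-speech weight and a low attention weight, so it falls below the threshold while the three nouns are kept.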

S2, constructing coordinates for the content in the database.

The coordinate construction comprises vectorizing the words in the database, calculating the similarity between word vectors as the abscissa, calculating the relevance between different words from historical training data as the ordinate, constructing coordinate axes, and determining the position of each word vector of the database in the coordinate axes.

The most frequent word $w_0$ in the database is defined as the origin, and the similarity between word $w_0$ and every other word $w_u$ is represented by cosine similarity.

$$\mathrm{Sim}(w_0, w_u) = \frac{v_0 \cdot v_u}{\lVert v_0 \rVert \, \lVert v_u \rVert}$$

Wherein $v_0 \cdot v_u$ represents the dot product of the word vectors, and $\lVert v_0 \rVert$ and $\lVert v_u \rVert$ represent the norms of the word vectors.

The relevance between different words is calculated from the historical training data and represented by a co-occurrence matrix $C$, wherein $C_{ab}$ represents the frequency with which word $w_a$ and word $w_b$ occur together in the historical data.

$$C_{ab} = \mathrm{count}(w_a, w_b)$$

Normalizing the co-occurrence matrix:

$$R_{ab} = \frac{C_{ab}}{\sum_{b'} C_{ab'}}$$

Wherein $R_{ab}$ represents the relevance of word $w_a$ and word $w_b$.

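Constructing the coordinates of each word $w_u$ in the database:

$$(x_u, y_u) = \left(\mathrm{Sim}(w_0, w_u),\ R_{0u}\right), \quad u = 1, 2, \ldots, m$$

where $\mathrm{Sim}(w_0, w_u)$ is the cosine similarity between the origin word $w_0$ and word $w_u$, $R_{0u}$ is their normalized co-occurrence relevance, $u$ represents the index of words in the database, and $m$ represents the number of words in the database.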

It is to be noted that, by vectorizing and constructing coordinates of the contents in the database, the semantic relationship and relevance between the words are visually represented in a two-dimensional coordinate system. In the coordinate system, the word with the highest frequency is used as an origin, and similarity among the words is calculated through cosine similarity, so that the relative position of the words in the semantic space can be reflected more accurately. The representation mode is beneficial to more accurately determining the content related to the user query through the generation of the coordinate positioning and the prediction range in the subsequent retrieval process, and improves the accuracy and the relevance of the retrieval result.
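A minimal sketch of the coordinate construction, assuming non-zero word vectors: the abscissa is the cosine similarity to the most frequent word and the ordinate is the co-occurrence count with that word after normalization. The source does not preserve the exact normalization, so dividing by the row sum is an assumption here, as are the function names and the toy vectors and counts.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_coordinates(vectors, frequency, cooccurrence):
    """Place every database word at (similarity, relevance) to the origin word."""
    origin = max(frequency, key=frequency.get)  # most frequent word
    row = cooccurrence.get(origin, {})
    row_total = sum(row.values())
    coords = {}
    for word, vec in vectors.items():
        x = cosine_similarity(vectors[origin], vec)
        # Assumed normalization: share of the origin word's co-occurrences.
        y = row.get(word, 0) / row_total if row_total else 0.0
        coords[word] = (x, y)
    return origin, coords


# Toy data, for illustration only.
vectors = {"data": [1.0, 0.0], "mining": [1.0, 1.0], "model": [0.0, 1.0]}
frequency = {"data": 10, "mining": 4, "model": 3}
cooccurrence = {"data": {"mining": 3, "model": 1}}

origin, coords = build_coordinates(vectors, frequency, cooccurrence)
```

Keywords extracted from a query can be positioned with the same two quantities, which is how the coordinate positioning in the next step reuses this construction.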

S3, positioning the search content in the coordinates according to the characteristics of the keywords.

The coordinate positioning comprises calculating, for each keyword, its similarity and relevance with respect to the coordinate origin $w_0$, and establishing the coordinates of each keyword in the coordinate axes.

The coordinates of each keyword are output to obtain the coordinate set of the keywords $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, wherein $(x_m, y_m)$ represents the coordinates of the $m$-th keyword.

It is to be noted that, by calculating the similarity and the association of each keyword with the origin of coordinates, respectively, the position of each keyword can be precisely located in the coordinate axis. The accurate positioning is helpful for accurately representing the semantic positions of the keywords, and is convenient for subsequent retrieval and analysis.

S4, generating a prediction range of the search content at the coordinate positioning.

The prediction range of the search content comprises, in the coordinate system, a circular area of radius $r$ centered on the coordinates of each keyword.

$$\mathrm{Distance}(k_c, w_u) = \sqrt{(x_c - x_u)^2 + (y_c - y_u)^2}$$

Wherein $\mathrm{Distance}(k_c, w_u)$ represents the Euclidean distance between keyword $k_c$ and word $w_u$ in the coordinate axes, $(x_c, y_c)$ represents the coordinates of keyword $k_c$, and $(x_u, y_u)$ represents the coordinates of word $w_u$.

For each word $w_u$, if the condition $\mathrm{Distance}(k_c, w_u) \le r$ is satisfied, the word is determined to be within the prediction range, and a set operation over all words satisfying the condition gives the word set $Q$ within the final prediction range.

$$Q = \bigcup_{k_c \in M} \{\, w_u \mid \mathrm{Distance}(k_c, w_u) \le r \,\}$$

Where $r$ represents the standard value of the prediction range threshold, $M$ represents the keyword set, and $k_c$ represents the $c$-th keyword.

It is to be noted that the coordinates of the keywords represent their position in semantic space, while generating circular areas of prediction horizon enables capturing the semantic and contextual relationship of these keywords to surrounding words. By the method, the system can better understand the query intention of the user and provide more relevant and accurate retrieval results. By calculating the Euclidean distance between the keywords and the words in the coordinate axes, the words can be rapidly judged to be in the prediction range. The geometric method is simple and efficient, and the number of words to be processed can be greatly reduced, so that the overall retrieval efficiency of the system is improved. Setting a standard prediction range radius r can enable the system to dynamically adjust the range according to the characteristics of different queries. This flexibility enables the system to accommodate different types of queries, whether broad subject matter or specific questions, providing suitable search results.

Through the set operation, all words satisfying the condition are merged, which ensures the comprehensiveness of the search results. Even if some keywords have no directly related words, the merged set still captures the related information found around the other keywords, improving the coverage of the search results.
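The range test and the set operation can be sketched in a few lines. The coordinates and the radius below are invented for illustration, and the function name is hypothetical; the logic is the circle-membership test with Euclidean distance followed by a union over all keywords.

```python
import math


def words_in_range(keyword_coords, word_coords, r):
    """Union of all database words lying within distance r of any keyword."""
    selected = set()
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            if math.hypot(kx - wx, ky - wy) <= r:  # Euclidean distance test
                selected.add(word)
    return selected


# Hypothetical coordinates (abscissa: similarity, ordinate: relevance).
keyword_coords = {"mining": (0.7, 0.6)}
word_coords = {
    "algorithm": (0.8, 0.5),
    "weather": (0.1, 0.05),
    "model": (0.6, 0.9),
}

in_range = words_in_range(keyword_coords, word_coords, r=0.35)
```

Here "algorithm" and "model" fall inside the circle around "mining", while "weather" is too far away and is excluded.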

S5, integrating the database content within the prediction range to generate a prediction of the search purpose.

Further, integrating the database content within the prediction range comprises matching each word in the database with its corresponding search content, arranging the elements of the word set $Q$ by priority, and outputting the retrieval information corresponding to each element in order of priority.

The priority is determined by counting the number of times each word vector in the database is selected by a prediction range. The initial count of every word vector in the database is set to 0. Whenever a word vector in the database falls within the prediction range of a keyword $k_c$, its count is increased by 1. After all keywords have been traversed, the elements of the word set $Q$ are arranged by count from largest to smallest, priority levels are assigned in descending order accordingly, and elements with the same count are placed at the same priority level.

It will be appreciated that by comprehensively analysing the database contents within the prediction horizon, and in particular by metering the number of times a word vector is selected by the prediction horizon, the most relevant word can be identified. The more these terms appear in the predictive range of the plurality of keywords, the more relevant they are to the user's retrieval. Therefore, the design can effectively improve the relevance of the search result and ensure that the user obtains the most valuable information. By counting the number of times each word vector is selected and ranking the numbers of times, the importance and priority of each word can be determined. The higher the priority word, the greater its importance in retrieving content. Therefore, the design can optimize the priority ordering of the retrieval results, so that the user can see the most relevant information first, and the retrieval experience and satisfaction of the user are improved. By automatically metering and ordering the number of occurrences of the word vector, the system is able to intelligently analyze and process a large amount of data, generating an optimal search result. The design not only improves the intelligent degree of the system, but also reduces manual intervention and improves the automatic processing capacity of the system.
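The counting-based priority can be sketched as below, under the assumption that each prediction range is a circle of radius `r` around a keyword. Function and variable names and all coordinates are hypothetical; words with equal counts share one priority level, as the description requires.

```python
import math
from collections import Counter


def rank_by_hits(keyword_coords, word_coords, r):
    """Group in-range words into priority levels by how many keyword
    circles contain them, highest count first."""
    hits = Counter()
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            if math.hypot(kx - wx, ky - wy) <= r:
                hits[word] += 1  # one more keyword range selects this word
    levels = {}
    for word, count in hits.items():
        levels.setdefault(count, []).append(word)
    # Words with the same count share a level; levels are in descending order.
    return [sorted(levels[c]) for c in sorted(levels, reverse=True)]


# Hypothetical coordinates, for illustration only.
keyword_coords = {"data": (0.5, 0.5), "mining": (0.7, 0.5)}
word_coords = {
    "algorithm": (0.6, 0.5),
    "model": (0.85, 0.5),
    "cluster": (0.35, 0.5),
    "weather": (0.0, 0.0),
}

priority_levels = rank_by_hits(keyword_coords, word_coords, r=0.2)
```

In this example "algorithm" lies in both keyword ranges and forms the top level, "cluster" and "model" are each selected once and share the second level, and "weather" is never selected and does not appear.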

On the other hand, the embodiment also provides a retrieval purpose prediction system based on knowledge management, which comprises:

the acquisition unit acquires the search content of the user and extracts keywords in the search content.

And the construction unit is used for carrying out coordinate construction on the content in the database and carrying out coordinate positioning on the search content according to the characteristics of the keywords.

The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having appropriate combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Embodiment 2 below provides a retrieval purpose prediction method based on knowledge management. To verify the beneficial effects of the invention, it is demonstrated through economic benefit calculation and simulation experiments.

The search content input by the user is 'DATA MINING Algorithms and Models for ANALYSIS AND Prediction IN MANAGEMENT'.

The system performs word segmentation on the sentence and extracts seven keywords: Data, mining, algorithm, model, analysis, prediction and Management.

Each keyword is converted to a word vector using a pre-trained Word2Vec model.

Part-of-speech statistics for each keyword in different contexts are obtained from a dictionary library, and a part-of-speech set is generated for each word.

The frequency of each part of speech of each word in the dictionary library is counted.

According to the part-of-speech frequencies, part-of-speech tagging sequences are randomly generated, and the part-of-speech tags are mapped to part-of-speech embedding vectors using a pre-trained part-of-speech embedding matrix.
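
The frequency-weighted random generation of tagging sequences can be illustrated with a minimal Python sketch. The dictionary counts, tag names, and the functions `dictionary_probabilities` and `sample_tag_sequences` are hypothetical and chosen only for illustration; the sketch stops when every possible sequence has been drawn or a maximum number of draws is reached, which is one reading of the stopping rule stated in the claims.

```python
import math
import random

# Hypothetical dictionary counts: word -> {part-of-speech tag: occurrences}.
POS_COUNTS = {
    "data": {"NOUN": 9, "ADJ": 1},
    "mining": {"NOUN": 6, "VERB": 4},
    "model": {"NOUN": 7, "VERB": 3},
}


def dictionary_probabilities(counts):
    """Turn the raw tag counts of one word into probabilities that sum to 1."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def sample_tag_sequences(words, pos_counts, max_draws=50, seed=0):
    """Draw distinct tag sequences, one tag per word, weighted by dictionary frequency."""
    rng = random.Random(seed)
    # Number of distinct sequences that can exist at all.
    limit = math.prod(len(pos_counts[w]) for w in words)
    seen = []
    for _ in range(max_draws):
        seq = []
        for w in words:
            probs = dictionary_probabilities(pos_counts[w])
            tags = list(probs)
            weights = [probs[t] for t in tags]
            seq.append(rng.choices(tags, weights=weights)[0])
        seq = tuple(seq)
        if seq not in seen:
            seen.append(seq)
        if len(seen) == limit:  # no new sequence can be generated
            break
    return seen


WORDS = ["data", "mining", "model"]
sequences = sample_tag_sequences(WORDS, POS_COUNTS)
for seq in sequences:
    print(seq)
```

Each returned tuple is one candidate tagging sequence; scoring the candidates is a separate step and is not covered by this sketch.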

The fitness of each part-of-speech sequence is calculated through the Bi-LSTM layer, and the sequence with the highest fitness is selected as the final part-of-speech tagging result.
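
Once the part-of-speech tags are fixed, the claims filter keywords by a screening score: the product of a preset part-of-speech weight and a softmax attention weight, compared against a threshold. The Python sketch below illustrates only that arithmetic. The weight table `POS_WEIGHT`, the tag names, the threshold value, and the attention logits (standing in for the output of the attention layer) are all hypothetical values, not taken from the patent.

```python
import math

# Hypothetical preset weight for each part of speech.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADP": 0.1}


def softmax(values):
    """Normalize raw scores so that they are positive and sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def screen_keywords(words, tags, logits, theta):
    """Return the words whose score (POS weight x attention weight) exceeds theta."""
    attention = softmax(logits)
    scores = {w: POS_WEIGHT[t] * a for w, t, a in zip(words, tags, attention)}
    kept = [w for w in words if scores[w] > theta]
    return kept, scores


words = ["data", "mining", "for", "management"]
tags = ["NOUN", "NOUN", "ADP", "NOUN"]
logits = [2.0, 1.5, 0.2, 1.0]  # hypothetical attention-layer outputs
keywords, scores = screen_keywords(words, tags, logits, theta=0.05)
print(keywords)
```

With these illustrative numbers the preposition "for" receives both a low part-of-speech weight and a low attention weight, so it falls below the threshold while the three nouns are kept.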

All words in the database are vectorized, and the similarity between word vectors is calculated as the abscissa.

The relevance between words is calculated: it is represented by the co-occurrence matrix, which is normalized to obtain the ordinate.

The coordinate location of each word in the database is determined.

The similarity and the relevance between each keyword and the coordinate origin (the word with the highest frequency) are calculated, and the coordinate information of each keyword in the coordinate axis is constructed.

The coordinates of each keyword are output to obtain the coordinate set of the keywords.
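
The coordinate construction described above can be sketched in Python as follows. The toy vectors, the co-occurrence counts, and the function names are hypothetical. Normalizing the co-occurrence counts by the row sum of the origin word is an assumption made for this sketch; the text only states that the matrix is normalized.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_coordinates(vectors, cooccurrence, origin):
    """Map each word to (similarity to origin, normalized co-occurrence with origin)."""
    row = cooccurrence[origin]
    total = sum(row.values())  # assumption: normalize by the origin's row sum
    coords = {}
    for word, vec in vectors.items():
        x = cosine_similarity(vectors[origin], vec)
        y = row.get(word, 0) / total if total else 0.0
        coords[word] = (x, y)
    return coords


# Hypothetical toy data: 2-dimensional word vectors and co-occurrence counts.
VECTORS = {"data": [1.0, 0.0], "mining": [1.0, 1.0], "model": [0.0, 1.0]}
COOCCURRENCE = {"data": {"mining": 3, "model": 1}}

coords = build_coordinates(VECTORS, COOCCURRENCE, origin="data")
for word, point in coords.items():
    print(word, point)
```

Here "data" plays the role of the highest-frequency word, so every other word is positioned by its similarity to and co-occurrence with "data".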

Taking the coordinates of each keyword as the center, a prediction range is generated: the screening score between the keyword and each word is calculated and multiplied by a standard value (such as 0.5), and if the result falls within the range, the word is selected into the prediction range.

A set operation is performed on all words that meet the condition to obtain the word set in the final prediction range.
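
A minimal Python sketch of the range selection follows. It assumes the circular-area reading given in the claims: a word is selected when its Euclidean distance to a keyword is at most a fixed radius `r`, and the final set is the union over all keywords. The coordinates, the radius, and the function name `prediction_range` are hypothetical; scaling the radius by a screening score would replace the fixed `r`.

```python
import math


def prediction_range(keyword_coords, word_coords, r):
    """Union of all words lying within distance r of at least one keyword."""
    selected = set()
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            if math.hypot(kx - wx, ky - wy) <= r:
                selected.add(word)
    return selected


# Hypothetical coordinates (abscissa: similarity, ordinate: relevance).
KEYWORDS = {"data": (1.0, 0.0), "model": (0.0, 0.25)}
WORDS = {
    "dataset": (0.9, 0.1),
    "mining": (0.7, 0.75),
    "schema": (0.1, 0.3),
    "forecast": (0.5, 0.9),
}

P = prediction_range(KEYWORDS, WORDS, r=0.5)
print(sorted(P))
```

Using a set makes the union implicit: a word that falls inside several circles still appears only once in the result.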

The database contents within the prediction range are comprehensively analyzed.

The number of times each word vector is selected by a prediction range is counted, with an initial count of 0.

The occurrences of each word in the prediction ranges of all keywords are counted, and the counts are arranged from largest to smallest to generate a word set with decreasing priority. The data are recorded as shown in Table 1.
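
The counting and ranking step can be sketched in Python as below. The coordinates, the radius, and the function name `rank_by_selection_count` are hypothetical, and the distance test assumes a fixed-radius circular range around each keyword. Words with equal counts are grouped into the same priority tier.

```python
import math
from collections import Counter


def rank_by_selection_count(keyword_coords, word_coords, r):
    """Count how many keyword ranges select each word, then group by count."""
    counts = Counter()  # every word implicitly starts at 0
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            if math.hypot(kx - wx, ky - wy) <= r:
                counts[word] += 1
    # Words with the same count share one priority tier.
    tiers = {}
    for word, n in counts.items():
        tiers.setdefault(n, []).append(word)
    return [sorted(tiers[n]) for n in sorted(tiers, reverse=True)]


# Hypothetical coordinates for two keywords and three database words.
KEYWORDS = {"data": (1.0, 0.0), "mining": (0.8, 0.2)}
WORDS = {"dataset": (0.9, 0.1), "cluster": (0.5, 0.4), "schema": (0.1, 0.3)}

ranking = rank_by_selection_count(KEYWORDS, WORDS, r=0.5)
print(ranking)
```

In this toy case "dataset" lies within the range of both keywords and ranks first, "cluster" is selected once, and "schema" is never selected and therefore does not appear.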

Table 1 data record table

The innovations and advantages of the present invention can be seen from the experimental data in Table 1. First, the data show that the number of words and the number of word-vector selections within the prediction range differ from keyword to keyword, which indicates that the system can effectively identify and locate related words. The keyword "Data" has the highest number of word-vector selections, 15, indicating that it has the highest occurrence frequency and the strongest correlation within the prediction range. In contrast, the keyword "Management" has the lowest number of word-vector selections, 5, reflecting its weaker correlation within the prediction range.

Compared with the traditional method, the invention achieves more accurate keyword positioning through coordinate construction and positioning. Traditional information retrieval methods typically rely on keyword matching and Boolean logic, which makes it difficult to fully capture the semantic relationships and contextual information between keywords. Through word vectorization, part-of-speech tagging, the Bi-LSTM model, the co-occurrence matrix and related techniques, the invention can better understand and represent the relationships between words, improving retrieval accuracy and relevance.

Specifically, the Bi-LSTM model integrates the context information of the sentence and improves the accuracy of part-of-speech tagging, enabling the system to better understand the user's query intention. By calculating the similarity and the relevance between each keyword and the highest-frequency word, the system can accurately locate each keyword in the coordinate axis, which ensures the accuracy and comprehensiveness of the prediction range.

In addition, the prediction range generation and comprehensive processing steps of the invention prioritize words by counting the number of times each word vector is selected, so that the retrieval results are more relevant and practical. Compared with the traditional method, the invention can dynamically adapt to different types of queries and provide a more intelligent and personalized retrieval service.

In summary, the invention offers significant innovation and advantages in improving the relevance of search results, optimizing priority ranking, providing comprehensive search results, and making the system more intelligent. The data analysis in the embodiment demonstrates the effectiveness and superiority of the invention in practical application.

It should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions are intended to be covered by the scope of the claims of the present invention.

Claims

1. A retrieval purpose prediction method based on knowledge management, characterized by comprising:

obtaining the user's search content and extracting keywords from the search content;

constructing coordinates for the content in the database;

performing coordinate positioning of the search content according to the characteristics of the keywords;

generating a prediction range of the search content at the coordinate location;

synthesizing the database content within the prediction range to generate a prediction of the search purpose;

the coordinate construction includes vectorizing the words in the database and calculating the similarity between the word vectors as the horizontal coordinate; calculating the correlation between different words according to the historical training data as the vertical coordinate; constructing the coordinate axis; and determining the position of each word vector in the database in the coordinate axis;

let the word $w_q$ with the highest frequency in the database be defined as the origin, and let cosine similarity represent the similarity between word $w_q$ and another word $w_p$:

$$\mathrm{sim}(w_q, w_p) = \frac{\mathbf{v}_q \cdot \mathbf{v}_p}{\lVert \mathbf{v}_q \rVert \, \lVert \mathbf{v}_p \rVert}$$

where $\mathbf{v}_q$ and $\mathbf{v}_p$ are the word vectors of $w_q$ and $w_p$, $\mathbf{v}_q \cdot \mathbf{v}_p$ represents the dot product of the word vectors, and $\lVert \cdot \rVert$ represents the norm of a word vector;

the correlation between different words is calculated based on the historical training data, and the correlation between words is represented by the co-occurrence matrix; let the co-occurrence matrix be $C$, where $C(w_q, w_p)$ represents the frequency with which word $w_q$ and word $w_p$ co-occur in the historical data:

$$C(w_q, w_p) = \mathrm{count}(w_q, w_p)$$

normalizing the co-occurrence matrix:

$$R(w_q, w_p) = \frac{C(w_q, w_p)}{\sum_{u=1}^{m} C(w_q, w_u)}$$

where $R(w_q, w_p)$ represents the association between word $w_q$ and word $w_p$, $u$ represents the index of a word in the database, and $m$ represents the number of words in the database;

constructing the coordinates of word $w_p$ in the database:

$$\mathrm{Coordinate}(w_p) = \left(\mathrm{sim}(w_q, w_p),\ R(w_q, w_p)\right)$$

the coordinate positioning includes calculating the similarity and the relevance between each keyword and the coordinate origin $w_q$, thereby constructing the coordinate information of each keyword in the coordinate axis;

outputting the coordinates of each keyword to obtain the coordinate set of the keywords $D = \{D_1, D_2, \ldots, D_m\}$;

where $D_m$ represents the coordinates of the $m$-th keyword;

the prediction range of the search content includes, in the coordinate system, a circular area of radius $r$ generated with the coordinates of each keyword as the center:

$$\mathrm{Distance}(w_c, w_p) = \sqrt{(x_c - x_p)^2 + (y_c - y_p)^2}$$

where $\mathrm{Distance}(w_c, w_p)$ represents the Euclidean distance between keyword $w_c$ and word $w_p$ in the coordinate axis, $(x_c, y_c)$ represents the coordinates of keyword $w_c$, and $(x_p, y_p)$ represents the coordinates of word $w_p$;

each word $w_p$ that meets the condition is determined to be within the prediction range; a set operation is performed on all words $w_p$ that meet the condition to obtain the final word set $P$ within the prediction range:

$$P = \bigcup_{w_c \in M} \{\, w_p \mid \mathrm{Distance}(w_c, w_p) \le r \,\}$$

where $r$ represents the standard quantity of the prediction range threshold, $M$ represents the keyword set, and $w_c$ represents the $c$-th keyword.

2. The method for predicting search purpose based on knowledge management according to claim 1, wherein the search content includes text or sentences input by the user;

extracting the keywords includes segmenting the input sentence $S$ into $n$ independent words:

$$S = \{w_1, w_2, \ldots, w_n\}$$

where $S$ represents the input sentence, $\{w_1, w_2, \ldots, w_n\}$ represents the word set after word segmentation, and $w_n$ represents the $n$-th word;

converting each word $w_i$ into the corresponding word vector $\mathbf{v}_i$:

$$\mathbf{v}_i = \mathrm{Word2Vec}(w_i)$$

where $\mathbf{v}_i$ represents the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ means converting word $w_i$ into a vector through the Word2Vec model;

by counting the parts of speech of each word in the dictionary in different contexts, a part-of-speech set $\{t_{i1}, t_{i2}, \ldots, t_{ij}\}$ is generated for each word $w_i$, where $t_{ij}$ represents the $j$-th part of speech that word $w_i$ may take in the dictionary;

obtaining the dictionary probability of each part of speech of each word from the dictionary library:

$$P_{\mathrm{dict}}(\mathrm{POS}(w_i) = t_i) = \frac{\mathrm{count}(w_i, t_i)}{\sum_{j} \mathrm{count}(w_i, t_{ij})}$$

where $P_{\mathrm{dict}}(\mathrm{POS}(w_i) = t_i)$ represents the dictionary probability that word $w_i$ has part of speech $t_i$, $\mathrm{count}(w_i, t_i)$ represents the number of occurrences of word $w_i$ as part of speech $t_i$ in the dictionary, $\sum_{j} \mathrm{count}(w_i, t_{ij})$ represents the number of occurrences of word $w_i$ as any part of speech, and $\mathrm{POS}(w_i)$ represents the part-of-speech tag of word $w_i$;

in the part-of-speech set corresponding to each word $w_i$, a part of speech $t_i \in \{t_{i1}, t_{i2}, \ldots, t_{ij}\}$ is randomly drawn based on the dictionary probability to obtain a part-of-speech tagging sequence $T = \{t_1, t_2, \ldots, t_n\}$; the part-of-speech tagging sequence $T$ obtained by each random draw is different, and drawing continues until no new part-of-speech tagging sequence is generated or the maximum number of draws is reached; where $t_i$ represents the part of speech of the $i$-th word.

3. The method for predicting retrieval purpose based on knowledge management according to claim 2, characterized in that: extracting the keywords further includes letting $T_k$ be the $k$-th extracted sequence and $t_{ki}$ be the $i$-th part of speech of the $k$-th part-of-speech tagging sequence;

using a pre-trained part-of-speech embedding matrix to map the part-of-speech tag $t_{ki}$ to the corresponding part-of-speech embedding vector $\mathbf{e}_{ki}$:

$$\mathbf{e}_{ki} = \mathrm{POS2Vec}(t_{ki})$$

where POS2Vec represents the mapping function from part-of-speech tags to embedding vectors;

combining each part-of-speech embedding sequence with the word vector sequence, and calculating the fitness of each part-of-speech sequence through the Bi-LSTM layer and the fully connected layer;

forward LSTM:

$$\overrightarrow{h}_{ki} = \mathrm{LSTM}\left([\mathbf{v}_i; \mathbf{e}_{ki}],\ \overrightarrow{h}_{k,i-1}\right)$$

backward LSTM:

$$\overleftarrow{h}_{ki} = \mathrm{LSTM}\left([\mathbf{v}_i; \mathbf{e}_{ki}],\ \overleftarrow{h}_{k,i+1}\right)$$

combining the forward and backward hidden states:

$$h_{ki} = [\overrightarrow{h}_{ki}; \overleftarrow{h}_{ki}]$$

calculating the fitness $s_k$ of each part-of-speech sequence through the fully connected layer, with weight matrix $W_s$ and bias vector $b_s$;

selecting the sequence with the highest fitness from the generated part-of-speech sequences as the final part-of-speech tagging result:

$$T^* = \arg\max_{T_k} s_k$$

where $T^*$ represents the part-of-speech sequence with the highest fitness; $\arg\max_{T_k}$ means selecting, among all candidate part-of-speech sequences $T_k$, the sequence that makes $s_k$ the largest; $s_k$ represents the fitness score of the $k$-th part-of-speech sequence; $\overrightarrow{h}_{ki}$ represents the forward LSTM hidden state vector of the $i$-th word $w_i$ in the $k$-th part-of-speech sequence; LSTM stands for long short-term memory network; $\mathbf{e}_{ki}$ represents the embedding vector of part of speech $t_{ki}$; $[\mathbf{v}_i; \mathbf{e}_{ki}]$ represents the concatenation of the word vector and the part-of-speech embedding vector; $\overrightarrow{h}_{k,i-1}$ represents the forward LSTM hidden state vector of the $(i-1)$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{ki}$ represents the backward LSTM hidden state vector of the $i$-th word $w_i$ in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{k,i+1}$ represents the backward LSTM hidden state vector of the $(i+1)$-th word in the $k$-th part-of-speech sequence; $h_{ki}$ represents the hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence, combining the forward and backward LSTMs; $b_s$ represents the bias vector of the fully connected layer; $[\,\cdot\,;\,\cdot\,]$ represents the concatenation of vectors; $W_s$ represents the weight matrix of the fully connected layer;

presetting the weight of each part of speech, and selecting the part-of-speech weight of each word in the part-of-speech tagging sequence $T^*$ through a mapping function:

$$\alpha_i = f(\mathrm{POS}(w_i))$$

where $\alpha_i$ represents the weight value mapped for word $w_i$, and $f(\mathrm{POS}(w_i))$ represents the mapping of the tag $\mathrm{POS}(w_i)$;

passing the context information of the sentence to an attention layer to calculate the importance score of each word:

$$\beta_i = \mathrm{softmax}(W_a h_i + b_a)$$

where $\beta_i$ represents the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ represents the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ represents the bias vector of the attention layer; softmax represents the normalization function that makes the attention weights of all words sum to 1;

multiplying $\alpha_i$ and $\beta_i$ of each word and outputting the screening score $\mathrm{Score}(w_i)$:

$$\mathrm{Score}(w_i) = \beta_i \times \alpha_i$$

presetting a threshold $\theta$ to filter the keywords:

$$\mathrm{Keywords} = \{\, w_i \mid \mathrm{Score}(w_i) > \theta \,\}$$

where Keywords represents the vector set of output keywords.

4. The method for predicting search purpose based on knowledge management according to claim 3, characterized in that: synthesizing the database content within the prediction range includes matching each word in the database with the corresponding search content, prioritizing the elements in the word set $P$, and outputting the search information corresponding to each element in order of priority;

the prioritizing includes counting the number of times each word vector in the database is selected by a prediction range: the initial count of each word vector in the database is 0; the count of each word vector in the database that lies within the prediction range of the keyword $w_p$ is incremented by 1; all keywords are traversed to obtain the count of each element in the word set $P$; the counts are arranged from largest to smallest, and a descending priority order is generated according to this arrangement, with elements that have the same count placed at the same priority.

5. A retrieval purpose prediction system based on knowledge management using the method according to any one of claims 1 to 4, characterized by comprising:

a collection unit, which obtains the user's search content and extracts keywords from the search content;

a construction unit, which constructs coordinates for the content in the database and performs coordinate positioning of the search content according to the characteristics of the keywords;

a prediction unit, which performs approximate changes on the keywords, generates a prediction range of the search content at the coordinate location according to the change result, and synthesizes the database content within the prediction range to generate a prediction of the search purpose.

6. A computer device, comprising: a memory and a processor; the memory stores a computer program, characterized in that: when the processor executes the computer program, the steps of a retrieval purpose prediction method based on knowledge management are implemented.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that: when the computer program is executed by a processor, the steps of a retrieval purpose prediction method based on knowledge management are implemented.