
CN119336795B - A search purpose prediction method and system based on knowledge management - Google Patents

Publication number
CN119336795B
Authority
CN
China
Prior art keywords
word
speech
coordinate
vector
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411847602.0A
Other languages
Chinese (zh)
Other versions
CN119336795A (en
Inventor
刘祯
刘敬仪
李舒婷
杨明栋
邱杰峰
杨伟伟
程莉红
施千里
胡丹丹
陈莹
佟成郁
孙海凤
谭兵
徐晓燕
李喆
陈龙
黄健
王一宏
倪玉
周劼翀
鲍琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuclear Power Operation Research Shanghai Co ltd
CNNC Fujian Nuclear Power Co Ltd
Original Assignee
Nuclear Power Operation Research Shanghai Co ltd
CNNC Fujian Nuclear Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuclear Power Operation Research Shanghai Co ltd and CNNC Fujian Nuclear Power Co Ltd
Priority to CN202411847602.0A
Publication of CN119336795A
Application granted
Publication of CN119336795B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the technical field of search purpose prediction, and discloses a search purpose prediction method and system based on knowledge management, including: obtaining the search content of the user, and extracting keywords in the search content; constructing coordinates for the content in the database; locating the search content according to the characteristics of the keywords; generating a predicted range of the search content at the coordinate location; synthesizing the database content within the predicted range to generate a prediction of the search purpose. The accuracy of keyword recognition is effectively improved, making the search content more accurate. The relationship between words can be better represented to achieve accurate coordinate positioning. Not only does it improve the accuracy and efficiency of the search, but it can also intelligently predict the user's search purpose, greatly improving the information retrieval experience in the knowledge management system.

Description

Knowledge management-based retrieval purpose prediction method and system
Technical Field
The invention relates to the technical field of search purpose prediction, in particular to a search purpose prediction method and system based on knowledge management.
Background
In today's information age, knowledge management is one of the key means by which enterprises and organizations gain competitive advantage. Knowledge management aims to collect, organize, share and use knowledge resources through systematic methods, and thereby to improve an organization's innovation capability and decision-making. In a big data environment, managing and using large volumes of knowledge and information effectively has become a significant challenge in the field of knowledge management.
Information retrieval (IR) technology has evolved considerably over recent decades. Traditional information retrieval methods rely mainly on keyword matching and Boolean logic, but with the explosive growth of the internet and of data volume they are increasingly inadequate. Modern information retrieval combines techniques such as natural language processing (NLP), machine learning (ML) and deep learning (DL) to improve retrieval accuracy and efficiency.
In knowledge management systems, users often need to obtain information relevant to their needs quickly and accurately. This need has given rise to knowledge management-based retrieval methods, in which the user's search purpose is inferred by intelligent means so as to provide more accurate and relevant search results. However, existing retrieval methods still fall short in processing complex queries, understanding user intent, and integrating multiple information sources.
Disclosure of Invention
The present invention has been made in view of the above-described problems.
Therefore, the invention solves the technical problems of poor information relevance, inaccurate keyword extraction, insufficient retrieval precision and the like of the existing retrieval target prediction method.
To solve the above technical problems, the invention provides a retrieval purpose prediction method based on knowledge management, comprising the following steps:
acquiring search content of a user, and extracting keywords in the search content;
carrying out coordinate construction on the content in the database;
carrying out coordinate positioning on the search content according to the characteristics of the keywords;
generating a prediction range of the search content at the coordinate location;
and integrating the database contents in the prediction range to generate a prediction of the search purpose.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, the search content comprises characters or sentences input by a user;
extracting the keywords includes segmenting the input sentence $S$ into $n$ independent words:
$$S \rightarrow W = \{w_1, w_2, \ldots, w_n\},$$
where $S$ represents the input sentence, $W$ represents the word set after word segmentation, and $w_i$ represents the $i$-th word;
each word $w_i$ is converted into its corresponding word vector $v_i$:
$$v_i = \mathrm{Word2Vec}(w_i),$$
where $v_i$ represents the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ represents converting word $w_i$ into a vector by the Word2Vec model;
the parts of speech that each word takes in different contexts in a dictionary library are counted to generate the part-of-speech set $P_i = \{p_{i1}, p_{i2}, \ldots\}$ owned by each word $w_i$, where $p_{ij}$ represents the $j$-th part of speech that word $w_i$ may have in the dictionary library;
the dictionary probability of each part of speech of each word is obtained from the dictionary library:
$$P(\mathrm{POS}(w_i) = p_{ij}) = \frac{\mathrm{count}(w_i, p_{ij})}{\sum_{j'} \mathrm{count}(w_i, p_{ij'})},$$
where $P(\mathrm{POS}(w_i) = p_{ij})$ represents the dictionary probability that word $w_i$ has part of speech $p_{ij}$, $\mathrm{count}(w_i, p_{ij})$ represents the number of occurrences of word $w_i$ as part of speech $p_{ij}$ in the dictionary, $\sum_{j'} \mathrm{count}(w_i, p_{ij'})$ represents the number of occurrences of word $w_i$ with any part of speech, and $\mathrm{POS}(w_i)$ represents the part-of-speech tag of word $w_i$;
in the part-of-speech set corresponding to each word $w_i$, a part of speech $t_i$ is randomly drawn according to the dictionary probability to obtain a part-of-speech tagging sequence $T = (t_1, t_2, \ldots, t_n)$; each randomly drawn sequence $T$ differs from those drawn before, and drawing continues until no new part-of-speech tagging sequence is generated or the maximum number of draws is reached, where $t_i$ represents the part of speech of the $i$-th word.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, the method further comprises: letting $T^{(k)}$ be the $k$-th extracted sequence and $t_i^{(k)}$ the $i$-th part of speech of the $k$-th part-of-speech tagging sequence;
using a pre-trained part-of-speech embedding matrix, each part-of-speech tag $t_i^{(k)}$ is mapped to the corresponding part-of-speech embedding vector $e_i^{(k)}$:
$$e_i^{(k)} = \mathrm{POS2Vec}(t_i^{(k)}),$$
where POS2Vec represents the mapping function from part-of-speech tags to embedding vectors;
each part-of-speech sequence is combined with the word vector sequence, and the fitness of each part-of-speech sequence is calculated through a Bi-LSTM layer and a fully connected layer;
forward LSTM:
$$\overrightarrow{h}_i^{(k)} = \mathrm{LSTM}\left(v_i \oplus e_i^{(k)},\ \overrightarrow{h}_{i-1}^{(k)}\right),$$
backward LSTM:
$$\overleftarrow{h}_i^{(k)} = \mathrm{LSTM}\left(v_i \oplus e_i^{(k)},\ \overleftarrow{h}_{i+1}^{(k)}\right),$$
combining the forward and backward hidden states:
$$h_i^{(k)} = \left[\overrightarrow{h}_i^{(k)};\ \overleftarrow{h}_i^{(k)}\right],$$
calculating the fitness of each part-of-speech sequence:
$$F^{(k)} = W_f \cdot \left[h_1^{(k)};\ h_2^{(k)};\ \ldots;\ h_n^{(k)}\right] + b_f,$$
selecting the sequence with the highest fitness from the generated part-of-speech sequences as the final part-of-speech tagging result:
$$T^{*} = \arg\max_{T^{(k)}} F^{(k)},$$
where $T^{*}$ represents the part-of-speech sequence with the highest fitness; $\arg\max_{T^{(k)}}$ denotes selecting, among all candidate part-of-speech sequences $T^{(k)}$, the one for which $F^{(k)}$ is largest; $F^{(k)}$ represents the fitness score of the $k$-th part-of-speech sequence; $\overrightarrow{h}_i^{(k)}$ represents the forward LSTM hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; LSTM represents a long short-term memory network; $e_i^{(k)}$ represents the embedding vector of part of speech $t_i^{(k)}$; $\oplus$ represents the splicing (concatenation) of a word vector and a part-of-speech embedding vector; $\overrightarrow{h}_{i-1}^{(k)}$ represents the forward LSTM hidden state vector of the $(i-1)$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_i^{(k)}$ represents the backward LSTM hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{i+1}^{(k)}$ represents the backward LSTM hidden state vector of the $(i+1)$-th word in the $k$-th part-of-speech sequence; $h_i^{(k)}$ represents the combination of the forward and backward LSTM hidden state vectors for the $i$-th word in the $k$-th part-of-speech sequence; $b_f$ represents the bias vector of the fully connected layer; $[\cdot]$ represents the splicing operation for vectors; and $W_f$ represents the weight matrix of the fully connected layer;
the weight of each part of speech is preset, and the part-of-speech weight of each word is selected from the final part-of-speech tagging sequence $T^{*}$ through a mapping function:
$$\omega_i = \mathrm{Map}(t_i^{*}),$$
where $\omega_i$ represents the weight value of word $w_i$, and $\mathrm{Map}(t_i^{*})$ represents the mapping of tag $t_i^{*}$ to its preset weight;
the context information of the sentence is passed to an attention layer, and the importance score of each word is calculated:
$$\alpha_i = \mathrm{softmax}(W_a h_i + b_a),$$
where $\alpha_i$ represents the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ represents the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ represents the bias vector of the attention layer; and softmax represents the normalization function, such that the attention weights of all words sum to 1;
for each word, $\omega_i$ and $\alpha_i$ are multiplied to output a screening score $s_i$:
$$s_i = \omega_i \cdot \alpha_i,$$
a threshold $\theta$ is preset, and words whose screening score exceeds $\theta$ are selected as keywords:
$$K = \{ v_i \mid s_i > \theta \},$$
where $K$ represents the set of vectors of the output keywords.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, the coordinate construction comprises vectorizing the words in the database, calculating the similarity between word vectors as the abscissa, calculating the relevance between different words from historical training data as the ordinate, constructing the coordinate axes, and determining the position of each database word vector in the coordinate axes;
the most frequent word in the database, $d_0$, is defined as the origin, and the similarity between word $d_0$ and every other word $d_u$ is represented by cosine similarity:
$$\mathrm{Sim}(d_0, d_u) = \frac{\vec{d}_0 \cdot \vec{d}_u}{\|\vec{d}_0\| \, \|\vec{d}_u\|},$$
where $\vec{d}_0 \cdot \vec{d}_u$ represents the dot product of the word vectors, and $\|\vec{d}_0\|$ and $\|\vec{d}_u\|$ represent the norms of the word vectors;
the relevance between different words is calculated from the historical training data and represented by a co-occurrence matrix $C$, where $C_{uq}$ represents the frequency with which word $d_u$ and word $d_q$ occur together in the historical data:
$$C = \left(C_{uq}\right)_{m \times m},$$
normalizing the co-occurrence matrix:
$$R_{uq} = \mathrm{Norm}(C_{uq}),$$
where $\mathrm{Norm}$ denotes the normalization of the co-occurrence counts and $R_{uq}$ represents the correlation of word $d_u$ and word $d_q$;
constructing the coordinates of each word $d_u$ in the database:
$$(x_u, y_u) = \left(\mathrm{Sim}(d_0, d_u),\ R_{0u}\right), \quad u = 1, 2, \ldots, m,$$
where $u$ represents the index of words in the database and $m$ represents the number of words in the database.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, the coordinate positioning comprises calculating, for each keyword, its similarity and its relevance to the coordinate origin $d_0$, thereby establishing the coordinates of each keyword in the coordinate axes;
the coordinates of each keyword are output to obtain the coordinate set of the keywords:
$$Z = \{(x_{k_1}, y_{k_1}), (x_{k_2}, y_{k_2}), \ldots, (x_{k_m}, y_{k_m})\},$$
where $(x_{k_m}, y_{k_m})$ represents the coordinates of the $m$-th keyword.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, the prediction range of the search content comprises, in the coordinate system, a circular area of radius $r$ centered on the coordinates of each keyword;
$$\mathrm{Distance}(k_c, d_u) = \sqrt{(x_{k_c} - x_{d_u})^2 + (y_{k_c} - y_{d_u})^2},$$
where $\mathrm{Distance}(k_c, d_u)$ represents the Euclidean distance between keyword $k_c$ and word $d_u$ in the coordinate axes, $(x_{k_c}, y_{k_c})$ represents the coordinates of keyword $k_c$, and $(x_{d_u}, y_{d_u})$ represents the coordinates of word $d_u$;
for each word $d_u$, if the condition $\mathrm{Distance}(k_c, d_u) \le r$ is satisfied, the word is determined to be within the prediction range, and a set operation is performed on all words $d_u$ satisfying the condition to obtain the word set $D_{\mathrm{pred}}$ in the final prediction range:
$$D_{\mathrm{pred}} = \bigcup_{k_c \in M} \{ d_u \mid \mathrm{Distance}(k_c, d_u) \le r \},$$
where $r$ represents the standard quantity of the prediction range threshold, $M$ represents the keyword set, and $k_c$ represents the $c$-th keyword.
In a preferred scheme of the knowledge management-based retrieval purpose prediction method of the invention, integrating the database contents in the prediction range comprises matching each word in the database with its corresponding search content, arranging the elements of the word set $D_{\mathrm{pred}}$ by priority, and outputting the retrieval information corresponding to each element in order of priority;
the priority is determined by counting the number of times each word vector in the database is selected by a prediction range: the initial count of every word vector in the database is set to 0; whenever a word vector in the database falls within the prediction range of a keyword $k_c$, its count is increased by 1; all keywords are traversed to obtain the count of each element of the word set $D_{\mathrm{pred}}$; the counts are arranged from large to small, priority levels are assigned in descending order according to this arrangement, and elements with the same count are placed at the same priority level.
A knowledge management based search purpose prediction system employing the method of the present invention, wherein:
The acquisition unit acquires search content of a user and extracts keywords in the search content;
The construction unit is used for carrying out coordinate construction on the content in the database and carrying out coordinate positioning on the search content according to the characteristics of the keywords;
And the prediction unit is used for carrying out approximate change on the keywords, generating a prediction range of search contents at a coordinate positioning position according to a change result, and synthesizing database contents in the prediction range to generate a prediction for search purposes.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of the invention when the computer program is executed.
A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor realizes the steps of the method according to any of the present invention.
The knowledge management-based retrieval target prediction method has the beneficial effects that the accuracy of keyword identification is effectively improved through keyword extraction and word vectorization, so that the retrieval content is more accurate. And secondly, part-of-speech tagging and comprehensive context information are carried out by adopting a Bi-LSTM model, so that the understanding capability of sentence structure and semantics is improved. By constructing the similarity and relevance coordinate system among the words, the relationship among the words can be better represented, and accurate coordinate positioning is realized. On the basis, a prediction range of the search content is generated, and the relevance and the comprehensiveness of the search result are ensured. In addition, by weighting, sorting and comprehensively processing the contents in the prediction range, the user satisfaction of the search result can be improved. In the whole, the invention not only improves the accuracy and efficiency of searching, but also can intelligently predict the searching purpose of the user, and greatly improves the information searching experience in the knowledge management system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall flowchart of a retrieval purpose prediction method based on knowledge management according to a first embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Embodiment 1, referring to fig. 1, provides a retrieval purpose prediction method based on knowledge management, which comprises:
S1, acquiring search content of a user, and extracting keywords in the search content.
Further, the search content comprises words or sentences input by a user.
Extracting the keywords includes segmenting the input sentence $S$ into $n$ independent words.
$$S \rightarrow W = \{w_1, w_2, \ldots, w_n\},$$
where $S$ represents the input sentence, $W$ represents the word set after word segmentation, and $w_i$ represents the $i$-th word.
Each word $w_i$ is converted into its corresponding word vector $v_i$.
$$v_i = \mathrm{Word2Vec}(w_i),$$
where $v_i$ represents the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ represents converting word $w_i$ into a vector by the Word2Vec model.
The parts of speech that each word takes in different contexts in a dictionary library are counted to generate the part-of-speech set $P_i = \{p_{i1}, p_{i2}, \ldots\}$ owned by each word $w_i$, where $p_{ij}$ represents the $j$-th part of speech that word $w_i$ may have in the dictionary library.
The part-of-speech frequency of each word is obtained from the dictionary library and converted into a dictionary probability.
$$P(\mathrm{POS}(w_i) = p_{ij}) = \frac{\mathrm{count}(w_i, p_{ij})}{\sum_{j'} \mathrm{count}(w_i, p_{ij'})},$$
where $P(\mathrm{POS}(w_i) = p_{ij})$ represents the dictionary probability that word $w_i$ has part of speech $p_{ij}$, $\mathrm{count}(w_i, p_{ij})$ represents the number of occurrences of word $w_i$ as part of speech $p_{ij}$ in the dictionary, $\sum_{j'} \mathrm{count}(w_i, p_{ij'})$ represents the number of occurrences of word $w_i$ with any part of speech, and $\mathrm{POS}(w_i)$ represents the part-of-speech tag of word $w_i$.
In the part-of-speech set corresponding to each word $w_i$, a part of speech $t_i$ is randomly drawn according to the dictionary probability to obtain a part-of-speech tagging sequence $T = (t_1, t_2, \ldots, t_n)$. Each randomly drawn sequence $T$ differs from those drawn before, and drawing continues until no new part-of-speech tagging sequence is generated or the maximum number of draws is reached, where $t_i$ represents the part of speech of the $i$-th word.
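The sampling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary counts in `POS_COUNTS`, the function names, the `max_draws` limit and the fixed seed are all hypothetical, and "no new sequence is generated" is interpreted here as "every possible sequence has already been drawn".

```python
import random


def dictionary_probabilities(counts):
    """Turn per-tag occurrence counts for one word into probabilities."""
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


def sample_pos_sequences(words, pos_counts, max_draws=100, seed=0):
    """Draw distinct part-of-speech sequences, one tag per word.

    Each tag is drawn according to its dictionary probability. Sampling
    stops once every possible sequence has been seen (assumed reading of
    "no new sequence") or after max_draws draws.
    """
    rng = random.Random(seed)
    probs = [dictionary_probabilities(pos_counts[w]) for w in words]
    total_possible = 1
    for p in probs:
        total_possible *= len(p)
    seen = []
    for _ in range(max_draws):
        if len(seen) == total_possible:
            break
        seq = tuple(
            rng.choices(list(p.keys()), weights=list(p.values()))[0]
            for p in probs
        )
        if seq not in seen:  # keep only sequences not drawn before
            seen.append(seq)
    return seen


# Hypothetical dictionary counts, for illustration only.
POS_COUNTS = {
    "pump": {"noun": 8, "verb": 2},
    "fault": {"noun": 9, "verb": 1},
    "report": {"noun": 5, "verb": 5},
}
WORDS = ["pump", "fault", "report"]
sequences = sample_pos_sequences(WORDS, POS_COUNTS)
```

Because tags are drawn by probability, likely sequences tend to appear early, while the draw limit bounds the work spent on rare combinations.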
Let $T^{(k)}$ be the $k$-th extracted sequence and $t_i^{(k)}$ the $i$-th part of speech of the $k$-th part-of-speech tagging sequence.
Using a pre-trained part-of-speech embedding matrix, each part-of-speech tag $t_i^{(k)}$ is mapped to the corresponding part-of-speech embedding vector $e_i^{(k)}$.
$$e_i^{(k)} = \mathrm{POS2Vec}(t_i^{(k)}),$$
where POS2Vec represents the mapping function from part-of-speech tags to embedding vectors.
Each part-of-speech sequence is combined with the original word vector sequence, and the fitness of each part-of-speech sequence is calculated through the Bi-LSTM layer and the fully connected layer.
Forward LSTM:
$$\overrightarrow{h}_i^{(k)} = \mathrm{LSTM}\left(v_i \oplus e_i^{(k)},\ \overrightarrow{h}_{i-1}^{(k)}\right),$$
Backward LSTM:
$$\overleftarrow{h}_i^{(k)} = \mathrm{LSTM}\left(v_i \oplus e_i^{(k)},\ \overleftarrow{h}_{i+1}^{(k)}\right),$$
Combining the forward and backward hidden states:
$$h_i^{(k)} = \left[\overrightarrow{h}_i^{(k)};\ \overleftarrow{h}_i^{(k)}\right],$$
Calculating the fitness of each part-of-speech sequence:
$$F^{(k)} = W_f \cdot \left[h_1^{(k)};\ h_2^{(k)};\ \ldots;\ h_n^{(k)}\right] + b_f,$$
The sequence with the highest fitness is selected from the generated part-of-speech sequences as the final part-of-speech tagging result.
$$T^{*} = \arg\max_{T^{(k)}} F^{(k)},$$
where $T^{*}$ represents the part-of-speech sequence with the highest fitness; $\arg\max_{T^{(k)}}$ denotes selecting, among all candidate part-of-speech sequences $T^{(k)}$, the one for which $F^{(k)}$ is largest; $F^{(k)}$ represents the fitness score of the $k$-th part-of-speech sequence; $\overrightarrow{h}_i^{(k)}$ represents the forward LSTM hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; LSTM represents a long short-term memory network; $e_i^{(k)}$ represents the embedding vector of part of speech $t_i^{(k)}$; $\oplus$ represents the splicing (concatenation) of a word vector and a part-of-speech embedding vector; $\overrightarrow{h}_{i-1}^{(k)}$ represents the forward LSTM hidden state vector of the $(i-1)$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_i^{(k)}$ represents the backward LSTM hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{i+1}^{(k)}$ represents the backward LSTM hidden state vector of the $(i+1)$-th word in the $k$-th part-of-speech sequence; $h_i^{(k)}$ represents the combination of the forward and backward LSTM hidden state vectors for the $i$-th word in the $k$-th part-of-speech sequence; $b_f$ represents the bias vector of the fully connected layer; $[\cdot]$ represents the splicing operation for vectors; and $W_f$ represents the weight matrix of the fully connected layer.
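The selection step itself is a plain arg-max over the candidate sequences. The sketch below shows only that step. The scorer `toy_fitness` is a hypothetical stand-in introduced for illustration and is not the Bi-LSTM and fully connected layer described above; in practice the fitness would come from the trained network.

```python
from typing import Callable, Sequence, Tuple

TagSequence = Tuple[str, ...]


def select_best_sequence(
    candidates: Sequence[TagSequence],
    fitness: Callable[[TagSequence], float],
) -> Tuple[TagSequence, float]:
    """Return the candidate with the highest fitness and its score."""
    scored = [(fitness(seq), seq) for seq in candidates]
    best_score, best_seq = max(scored, key=lambda pair: pair[0])
    return best_seq, best_score


# Stand-in scorer, for illustration only: it rewards tags that match a
# hand-written preference table. It is NOT the neural scorer of the method.
PREFERRED = {0: "noun", 1: "noun", 2: "verb"}


def toy_fitness(seq: TagSequence) -> float:
    return float(sum(1 for i, tag in enumerate(seq) if PREFERRED.get(i) == tag))


candidates = [
    ("noun", "noun", "noun"),
    ("noun", "noun", "verb"),
    ("verb", "noun", "verb"),
]
best, score = select_best_sequence(candidates, toy_fitness)
```

Passing the scorer in as a callable keeps the selection logic independent of how the fitness is actually computed.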
The weight of each part of speech is preset, and the part-of-speech weight of each word is selected from the final part-of-speech tagging sequence $T^{*}$ through a mapping function.
$$\omega_i = \mathrm{Map}(t_i^{*}),$$
where $\omega_i$ represents the weight value of word $w_i$, and $\mathrm{Map}(t_i^{*})$ represents the mapping of tag $t_i^{*}$ to its preset weight.
The context information of the sentence is passed to an attention layer, and the importance score of each word is calculated.
$$\alpha_i = \mathrm{softmax}(W_a h_i + b_a),$$
where $\alpha_i$ represents the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ represents the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ represents the bias vector of the attention layer; and softmax represents the normalization function, such that the attention weights of all words sum to 1.
For each word, $\omega_i$ and $\alpha_i$ are multiplied to output a screening score $s_i$.
$$s_i = \omega_i \cdot \alpha_i,$$
A threshold $\theta$ is preset, and words whose screening score exceeds $\theta$ are selected as keywords.
$$K = \{ v_i \mid s_i > \theta \},$$
where $K$ represents the set of vectors of the output keywords.
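The screening step can be illustrated with a short Python sketch. It assumes the attention layer has already produced one raw score per word; the raw scores, the part-of-speech weight table and the threshold below are hypothetical values chosen for illustration. The sketch returns the kept words with their scores rather than their word vectors.

```python
import math


def softmax(scores):
    """Normalize raw scores so that they are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def screen_keywords(words, tags, raw_attention, pos_weight, threshold):
    """Keep words whose part-of-speech weight times attention weight
    exceeds the threshold."""
    attention = softmax(raw_attention)
    kept = []
    for word, tag, alpha in zip(words, tags, attention):
        score = pos_weight.get(tag, 0.0) * alpha
        if score > threshold:
            kept.append((word, score))
    return kept


# Hypothetical inputs, for illustration only.
words = ["how", "to", "repair", "pump", "fault"]
tags = ["adverb", "particle", "verb", "noun", "noun"]
raw_attention = [0.1, 0.0, 1.2, 2.0, 1.8]  # stand-in for attention-layer outputs
POS_WEIGHT = {"noun": 1.0, "verb": 0.8, "adverb": 0.2, "particle": 0.1}

keywords = screen_keywords(words, tags, raw_attention, POS_WEIGHT, threshold=0.1)
```

Multiplying the two weights means a word must be both of an informative part of speech and contextually salient to survive the threshold.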
The system can more accurately label the parts of speech through integrating the context information of the Bi-LSTM model, improve the accuracy of the parts of speech label and select the part of speech sequence with the highest fitness as a final result. By calculating the part-of-speech weight and the attention weight, the system can evaluate the importance of each word in the sentence, thereby screening out the keywords that are most useful for retrieval purposes. By screening the scores, the system can determine which words are the most important keywords, so that the keywords are prioritized in the retrieval process, and the relevance and accuracy of the retrieval result are improved.
And S2, carrying out coordinate construction on the content in the database.
The coordinate construction comprises vectorizing words in a database, calculating similarity between word vectors as abscissa, calculating relevance between different words according to historical training data as ordinate, constructing coordinate axes, and determining positions of the word vectors in the database in the coordinate axes.
The most frequent word in the database, $d_0$, is defined as the origin, and the similarity between word $d_0$ and every other word $d_u$ is represented by cosine similarity.
$$\mathrm{Sim}(d_0, d_u) = \frac{\vec{d}_0 \cdot \vec{d}_u}{\|\vec{d}_0\| \, \|\vec{d}_u\|},$$
where $\vec{d}_0 \cdot \vec{d}_u$ represents the dot product of the word vectors, and $\|\vec{d}_0\|$ and $\|\vec{d}_u\|$ represent the norms of the word vectors.
The relevance between different words is calculated from the historical training data and represented by a co-occurrence matrix $C$, where $C_{uq}$ represents the frequency with which word $d_u$ and word $d_q$ occur together in the historical data.
$$C = \left(C_{uq}\right)_{m \times m},$$
Normalizing the co-occurrence matrix:
$$R_{uq} = \mathrm{Norm}(C_{uq}),$$
where $\mathrm{Norm}$ denotes the normalization of the co-occurrence counts and $R_{uq}$ represents the correlation of word $d_u$ and word $d_q$.
Constructing the coordinates of each word $d_u$ in the database:
$$(x_u, y_u) = \left(\mathrm{Sim}(d_0, d_u),\ R_{0u}\right), \quad u = 1, 2, \ldots, m,$$
where $u$ represents the index of words in the database and $m$ represents the number of words in the database.
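A minimal Python sketch of the coordinate construction follows. Several choices are assumptions made for illustration: the historical data is modelled as a list of word lists, word frequency is counted over those same records, two words co-occur when they share a record, and the co-occurrence count is normalized by the reference word's total co-occurrence count (the text does not spell out the normalization). The vectors and records are hypothetical, and word vectors are assumed to be non-zero.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_coordinates(vectors, history):
    """Map each database word to (similarity, relevance) w.r.t. the origin word."""
    # Origin: the word that occurs most often in the historical records.
    frequency = Counter(w for record in history for w in record)
    origin = frequency.most_common(1)[0][0]

    # Co-occurrence counts: two words co-occur when they share a record.
    cooc = Counter()
    for record in history:
        for a, b in combinations(sorted(set(record)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    # Assumed normalization: divide by the origin's total co-occurrence count.
    origin_total = sum(c for (a, _), c in cooc.items() if a == origin)

    coords = {}
    for word, vec in vectors.items():
        x = cosine_similarity(vectors[origin], vec)
        y = cooc[(origin, word)] / origin_total if origin_total else 0.0
        coords[word] = (x, y)
    return origin, coords


# Hypothetical data, for illustration only.
VECTORS = {"pump": [1.0, 0.0], "valve": [0.8, 0.6], "fault": [0.0, 1.0]}
HISTORY = [["pump", "valve"], ["pump", "fault"], ["pump", "valve", "fault"], ["pump"]]
origin, coords = build_coordinates(VECTORS, HISTORY)
```

In this sketch the reference word itself has similarity 1.0 and relevance 0.0, so it serves as the point every other coordinate is measured against rather than sitting at (0, 0).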
It is to be noted that, by vectorizing and constructing coordinates of the contents in the database, the semantic relationship and relevance between the words are visually represented in a two-dimensional coordinate system. In the coordinate system, the word with the highest frequency is used as an origin, and similarity among the words is calculated through cosine similarity, so that the relative position of the words in the semantic space can be reflected more accurately. The representation mode is beneficial to more accurately determining the content related to the user query through the generation of the coordinate positioning and the prediction range in the subsequent retrieval process, and improves the accuracy and the relevance of the retrieval result.
And S3, carrying out coordinate positioning on the search content according to the characteristics of the keywords.
The coordinate positioning comprises calculating, for each keyword, its similarity and its relevance to the coordinate origin $d_0$, thereby establishing the coordinates of each keyword in the coordinate axes.
The coordinates of each keyword are output to obtain the coordinate set of the keywords:
$$Z = \{(x_{k_1}, y_{k_1}), (x_{k_2}, y_{k_2}), \ldots, (x_{k_m}, y_{k_m})\},$$
where $(x_{k_m}, y_{k_m})$ represents the coordinates of the $m$-th keyword.
It is to be noted that, by calculating the similarity and the association of each keyword with the origin of coordinates, respectively, the position of each keyword can be precisely located in the coordinate axis. The accurate positioning is helpful for accurately representing the semantic positions of the keywords, and is convenient for subsequent retrieval and analysis.
S4, generating a prediction range of the search content at the coordinate positioning.
The prediction range of the search content comprises, in the coordinate system, a circular area of radius $r$ centered on the coordinates of each keyword.
$$\mathrm{Distance}(k_c, d_u) = \sqrt{(x_{k_c} - x_{d_u})^2 + (y_{k_c} - y_{d_u})^2},$$
where $\mathrm{Distance}(k_c, d_u)$ represents the Euclidean distance between keyword $k_c$ and word $d_u$ in the coordinate axes, $(x_{k_c}, y_{k_c})$ represents the coordinates of keyword $k_c$, and $(x_{d_u}, y_{d_u})$ represents the coordinates of word $d_u$.
For each word $d_u$, if the condition $\mathrm{Distance}(k_c, d_u) \le r$ is satisfied, the word is determined to be within the prediction range, and a set operation is performed on all words $d_u$ satisfying the condition to obtain the word set $D_{\mathrm{pred}}$ in the final prediction range.
$$D_{\mathrm{pred}} = \bigcup_{k_c \in M} \{ d_u \mid \mathrm{Distance}(k_c, d_u) \le r \},$$
where $r$ represents the standard quantity of the prediction range threshold, $M$ represents the keyword set, and $k_c$ represents the $c$-th keyword.
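The range test and the set operation can be sketched in a few lines of Python. The coordinates, the radius and the function name below are hypothetical, and the set operation is taken to be a union over the circles of all keywords, with points on the boundary counted as inside.

```python
import math


def words_in_range(keyword_coords, word_coords, radius):
    """Union of the words lying within `radius` of any keyword."""
    selected = set()
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            # Euclidean distance in the two-dimensional coordinate system.
            if math.hypot(kx - wx, ky - wy) <= radius:
                selected.add(word)
    return selected


# Hypothetical coordinates, for illustration only.
WORD_COORDS = {
    "pump": (1.0, 0.0),
    "valve": (0.8, 0.5),
    "fault": (0.0, 0.5),
    "seal": (0.9, 0.4),
}
KEYWORD_COORDS = {"leak": (0.85, 0.45), "trip": (0.1, 0.5)}

in_range = words_in_range(KEYWORD_COORDS, WORD_COORDS, radius=0.2)
```

A larger radius widens every circle at once, trading precision for coverage.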
It is to be noted that the coordinates of the keywords represent their position in semantic space, while generating circular areas of prediction horizon enables capturing the semantic and contextual relationship of these keywords to surrounding words. By the method, the system can better understand the query intention of the user and provide more relevant and accurate retrieval results. By calculating the Euclidean distance between the keywords and the words in the coordinate axes, the words can be rapidly judged to be in the prediction range. The geometric method is simple and efficient, and the number of words to be processed can be greatly reduced, so that the overall retrieval efficiency of the system is improved. Setting a standard prediction range radius r can enable the system to dynamically adjust the range according to the characteristics of different queries. This flexibility enables the system to accommodate different types of queries, whether broad subject matter or specific questions, providing suitable search results.
Through the set operation, all words that satisfy the condition are merged, which keeps the search result comprehensive. Even when some keywords have no directly related words, the merged set still captures the related information found around the other keywords, improving the coverage of the search result.
And S5, integrating the database contents in the prediction range to generate a prediction for the search purpose.
Further, integrating the database contents in the prediction range includes matching each word in the database with its corresponding search content, arranging the elements of the word set $D_{\mathrm{pred}}$ by priority, and outputting the retrieval information corresponding to each element in order of priority.
The priority is determined by counting the number of times each word vector in the database is selected by a prediction range. The initial count of every word vector in the database is set to 0. Whenever a word vector in the database falls within the prediction range of a keyword $k_c$, its count is increased by 1. All keywords are traversed to obtain the count of each element of the word set $D_{\mathrm{pred}}$. The counts are arranged from large to small, priority levels are assigned in descending order according to this arrangement, and elements with the same count are placed at the same priority level.
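The counting and ranking rule can be sketched as follows. The coordinates, the radius and the function name are hypothetical; ordering words alphabetically inside a priority level is only a presentation choice of this sketch, since words with equal counts share the same priority.

```python
import math
from collections import Counter


def rank_by_hits(keyword_coords, word_coords, radius):
    """Group words by how many keyword ranges contain them, highest first."""
    hits = Counter()  # every word implicitly starts at a count of 0
    for kx, ky in keyword_coords.values():
        for word, (wx, wy) in word_coords.items():
            if math.hypot(kx - wx, ky - wy) <= radius:
                hits[word] += 1
    levels = {}
    for word, count in hits.items():
        levels.setdefault(count, []).append(word)
    # One entry per priority level; words with equal counts share a level.
    return [(count, sorted(levels[count])) for count in sorted(levels, reverse=True)]


# Hypothetical coordinates, for illustration only.
WORD_COORDS = {
    "pump": (1.0, 0.0),
    "valve": (0.8, 0.5),
    "fault": (0.0, 0.5),
    "seal": (0.9, 0.4),
}
KEYWORD_COORDS = {
    "leak": (0.85, 0.45),
    "trip": (0.1, 0.5),
    "gasket": (0.95, 0.35),
}

ranking = rank_by_hits(KEYWORD_COORDS, WORD_COORDS, radius=0.2)
```

Here "seal" falls inside the ranges of two keywords and therefore outranks "fault" and "valve", which are each covered once; "pump" is covered by no range and does not appear.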
It will be appreciated that comprehensively analyzing the database contents within the prediction range, and in particular counting the number of times each word vector is selected by a prediction range, identifies the most relevant words. The more keywords' prediction ranges a word appears in, the more relevant it is to the user's search. This design therefore improves the relevance of the search results and helps the user obtain the most valuable information. Counting and ranking the number of times each word vector is selected determines the importance and priority of each word: the higher a word's priority, the greater its importance to the search content. The ranking of the search results is thereby optimized so that the user sees the most relevant information first, which improves the search experience and user satisfaction. Because the counting and ordering are automatic, the system can analyze and process a large amount of data to generate the search results, which raises the degree of intelligence of the system, reduces manual intervention, and improves its automated processing capacity.
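The counting and priority arrangement can be sketched as below. This is an illustrative sketch under stated assumptions: each keyword's prediction range is already available as a set of words, and words with equal counts are listed alphabetically inside their shared priority level (the source only says they share a level). The function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import groupby

def rank_by_selection_count(ranges_per_keyword):
    """Group words into priority levels by how many keyword ranges select them.

    Returns a list of (count, [words]) pairs, highest count first.
    Words with the same count share one priority level.
    """
    counts = Counter()  # every word implicitly starts at 0
    for words in ranges_per_keyword.values():
        counts.update(set(words))  # +1 per keyword whose range contains the word
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (count, [word for word, _ in group])
        for count, group in groupby(ordered, key=lambda kv: kv[1])
    ]

# Hypothetical prediction ranges for three keywords.
ranges_per_keyword = {
    "data": {"dataset", "database", "statistics"},
    "mining": {"dataset", "extraction"},
    "analysis": {"dataset", "statistics"},
}

priority_levels = rank_by_selection_count(ranges_per_keyword)
```

Here "dataset" is selected by all three ranges and forms the first priority level, while "database" and "extraction" are each selected once and share the last level.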
In another aspect, this embodiment also provides a retrieval purpose prediction system based on knowledge management, which comprises:
an acquisition unit, which acquires the user's search content and extracts the keywords in the search content;
a construction unit, which constructs coordinates for the content in the database and performs coordinate positioning of the search content according to the characteristics of the keywords; and
a prediction unit, which performs approximate changes on the keywords, generates a prediction range of the search content at the coordinate position according to the result of the change, and synthesizes the database contents within the prediction range to generate a prediction of the search purpose.
If the above functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with appropriate combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Embodiment 2 below provides a retrieval purpose prediction method based on knowledge management; to verify the beneficial effects of the invention, it is demonstrated through economic benefit calculation and simulation experiments.
The search content input by the user is "Data Mining Algorithms and Models for Analysis and Prediction in Management".
The system performs word segmentation on the sentence and extracts seven keywords, namely Data, Mining, Algorithm, Model, Analysis, Prediction, and Management.
Each keyword is converted to a Word vector using a pre-trained Word2Vec model.
The part-of-speech statistics of each keyword in different contexts are obtained from a dictionary library, and the part-of-speech set of each word is generated.
The part-of-speech frequency of each word in the dictionary library is counted.
According to the part-of-speech frequencies, part-of-speech tagging sequences are randomly generated, and the part-of-speech tags are mapped to part-of-speech embedding vectors using a pre-trained part-of-speech embedding matrix.
The fitness of each part-of-speech sequence is calculated through the Bi-LSTM layer, and the sequence with the highest fitness is selected as the final part-of-speech tagging result.
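The frequency-based random generation of part-of-speech sequences can be sketched as follows. This is only a sketch of the sampling step: it assumes the dictionary probability of a tag is its count divided by the word's total count over all tags, and it stops either when every possible combination has been drawn or when a maximum number of draws is reached. The Bi-LSTM fitness scoring is not shown. The function names, the tag counts, and the `max_draws` and `seed` parameters are hypothetical.

```python
import math
import random

def dictionary_probabilities(tag_counts):
    """Turn per-tag occurrence counts for one word into probabilities."""
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}

def sample_tag_sequences(words, pos_counts, max_draws=50, seed=0):
    """Draw distinct part-of-speech sequences, one tag per word.

    Each tag is sampled in proportion to its dictionary probability.
    Duplicated sequences are discarded.
    """
    rng = random.Random(seed)
    probs = [dictionary_probabilities(pos_counts[w]) for w in words]
    total_combinations = math.prod(len(p) for p in probs)
    seen = []
    for _ in range(max_draws):
        seq = tuple(
            rng.choices(list(p.keys()), weights=list(p.values()))[0]
            for p in probs
        )
        if seq not in seen:
            seen.append(seq)
        if len(seen) == total_combinations:
            break  # no new sequence can be produced
    return seen

# Hypothetical dictionary counts of each word per part of speech.
pos_counts = {
    "data": {"NOUN": 9, "ADJ": 1},
    "mining": {"NOUN": 6, "VERB": 4},
}

sequences = sample_tag_sequences(["data", "mining"], pos_counts)
```

Each returned sequence would then be combined with the word vectors and scored, and the highest-scoring sequence kept as the tagging result.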
All words in the database are vectorized, and the similarity between word vectors is calculated as the abscissa.
The relevance between words is calculated, represented by the co-occurrence matrix, and normalized to obtain the ordinate.
The coordinate position of each word in the database is determined.
The similarity and relevance between each keyword and the coordinate origin (the word with the highest frequency) are calculated to construct the coordinate information of each keyword in the coordinate axes.
The coordinates of each keyword are output to obtain the coordinate set of the keywords.
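The coordinate construction can be sketched as below. The abscissa is the cosine similarity to the origin word, as the text states. The normalization of the co-occurrence counts is not spelled out in this section, so dividing by the origin word's total co-occurrence count is an assumption made purely for illustration. The function names, vectors, and counts are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_coordinates(vectors, cooccurrence, origin):
    """Map each word to (similarity to origin, normalized co-occurrence).

    Assumption: relevance = count(origin, word) / sum of the origin's counts.
    """
    row = cooccurrence[origin]
    row_total = sum(row.values())
    coords = {}
    for word, vec in vectors.items():
        similarity = cosine_similarity(vectors[origin], vec)
        relevance = row.get(word, 0) / row_total if row_total else 0.0
        coords[word] = (similarity, relevance)
    return coords

# Hypothetical 2-D word vectors and co-occurrence counts with the origin word.
vectors = {"system": (1.0, 0.0), "data": (1.0, 1.0), "mining": (0.0, 2.0)}
cooccurrence = {"system": {"data": 6, "mining": 2}}

coords = build_coordinates(vectors, cooccurrence, origin="system")
```

A keyword is positioned in the same way, by computing its similarity and relevance to the origin word.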
Taking the coordinates of each keyword as the center, the screening score of the keyword and each word is calculated and multiplied by a standard value (for example, 0.5); if the result falls within the range, the word is selected into the prediction range.
A set operation is performed on all words satisfying the condition to obtain the word set within the final prediction range.
The database contents within the prediction range are comprehensively analyzed.
Each word vector is counted according to the number of times it is selected by a prediction range, with an initial count of 0.
The number of occurrences of each word in the prediction ranges of all keywords is counted, and the counts are arranged from largest to smallest to generate a word set of decreasing priority. The data are recorded as shown in Table 1.
Table 1 data record table
The innovations and advantages of the present invention can be seen from the experimental data table above. First, the data in the table show that the number of words and the number of word-vector selections within the prediction range differ from keyword to keyword, indicating that the system can effectively identify and locate related words. The keyword "Data" has the highest number of word-vector selections, 15, which indicates that it occurs most frequently and is most relevant within the prediction range. In contrast, the keyword "Management" has the lowest number of word-vector selections, 5, reflecting weaker relevance within the prediction range.
Compared with traditional methods, the present method achieves more accurate keyword positioning through coordinate construction and positioning. Traditional information retrieval methods typically rely on keyword matching and Boolean logic and have difficulty fully capturing the semantic relationships and contextual information between keywords. Through word vectorization, part-of-speech tagging, the Bi-LSTM model, the co-occurrence matrix, and related techniques, the invention can better understand and represent the relationships between words and improves the accuracy and relevance of retrieval.
Specifically, the Bi-LSTM model integrates the context information of the sentence and improves the accuracy of part-of-speech tagging, so that the system can better understand the user's query intention. By calculating the similarity and relevance between each keyword and the highest-frequency word, the system can accurately position each keyword in the coordinate axes, ensuring the accuracy and comprehensiveness of the prediction range.
In addition, the prediction range generation and synthesis steps of the invention prioritize words by counting the number of times their word vectors are selected, making the search results more relevant and practical. Compared with traditional methods, the method can adapt dynamically to different types of queries and provide a more intelligent and personalized retrieval service.
In summary, the invention offers significant innovations and advantages in improving the relevance of search results, optimizing priority ranking, providing comprehensive search results, and making the system more intelligent. The data analysis in the embodiment supports the effectiveness and superiority of the invention in practical application.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (7)

1. A retrieval purpose prediction method based on knowledge management, characterized by comprising:
obtaining the user's search content and extracting keywords from the search content;
constructing coordinates for the content in the database;
performing coordinate positioning of the search content according to the characteristics of the keywords;
generating, at the coordinate position, a prediction range of the search content;
synthesizing the database content within the prediction range to generate a prediction of the search purpose;
the coordinate construction comprises: vectorizing the words in the database and calculating the similarity between word vectors as the abscissa; calculating the relevance between different words from historical training data as the ordinate; constructing the coordinate axes; and determining the positions of the word vectors in the database in the coordinate axes;
the word $w_q$ with the highest frequency in the database is defined as the origin, and the similarity $\mathrm{sin}(w_q, w_p)$ between word $w_q$ and another word $w_p$ is expressed by cosine similarity:
$$\mathrm{sin}(w_q, w_p) = \frac{\vec{v}_{w_q} \cdot \vec{v}_{w_p}}{\lVert \vec{v}_{w_q} \rVert \, \lVert \vec{v}_{w_p} \rVert}$$
where $\vec{v}_{w_q} \cdot \vec{v}_{w_p}$ denotes the dot product of the word vectors, and $\lVert \vec{v}_{w_q} \rVert$ and $\lVert \vec{v}_{w_p} \rVert$ denote the norms of the word vectors;
the relevance between different words is calculated from historical training data and represented by a co-occurrence matrix; let the co-occurrence matrix be $C$, where $C(w_q, w_p)$ denotes the frequency with which word $w_q$ and word $w_p$ occur together in the historical data:
$$C(w_q, w_p) = \mathrm{count}(w_q, w_p)$$
the co-occurrence matrix is normalized to obtain $R(w_q, w_p)$, which denotes the relevance between word $w_q$ and word $w_p$;
the coordinates of word $w_p$ in the database are constructed as:
$$\mathrm{Coordinate}(w_p) = \big(\mathrm{sin}(w_q, w_p),\; R(w_q, w_p)\big)$$
where $u$ denotes the index of a word in the database and $m$ denotes the number of words in the database;
the coordinate positioning comprises: calculating, for each keyword, its similarity and relevance to the coordinate origin $w_q$, thereby constructing the coordinate information of each keyword in the coordinate axes;
outputting the coordinates of each keyword to obtain the coordinate set of the keywords $D = \{D_1, D_2, \ldots, D_m\}$, where $D_m$ denotes the coordinates of the $m$-th keyword;
the prediction range of the search content comprises: in the coordinate system, generating a circular area of radius $r$ centered on the coordinates of each keyword;
$$\mathrm{Distance}(w_c, w_p) = \sqrt{(x_c - x_p)^2 + (y_c - y_p)^2}$$
where $\mathrm{Distance}(w_c, w_p)$ denotes the Euclidean distance between keyword $w_c$ and word $w_p$ in the coordinate axes, $(x_c, y_c)$ denotes the coordinates of keyword $w_c$, and $(x_p, y_p)$ denotes the coordinates of word $w_p$;
for each word $w_p$, if the condition is satisfied, the word is judged to be within the prediction range; a set operation is performed on all words $w_p$ satisfying the condition to obtain the final word set $P$ within the prediction range, where $r$ denotes the standard quantity of the prediction range threshold, $M$ denotes the keyword set, and $w_c$ denotes the $c$-th keyword.

2. The retrieval purpose prediction method based on knowledge management according to claim 1, characterized in that: the search content comprises text or sentences input by the user;
the keywords comprise: segmenting the input sentence $S$ into $n$ independent words:
$$S = \{w_1, w_2, \ldots, w_n\}$$
where $S$ denotes the input sentence, $\{w_1, w_2, \ldots, w_n\}$ denotes the set of words after segmentation, and $w_n$ denotes the $n$-th word;
converting each word $w_i$ into the corresponding word vector:
$$\vec{v}_{w_i} = \mathrm{Word2Vec}(w_i)$$
where $\vec{v}_{w_i}$ denotes the word vector of word $w_i$, and $\mathrm{Word2Vec}(w_i)$ denotes converting word $w_i$ into a vector through the Word2Vec model;
generating, from the part-of-speech statistics of each word in different contexts in the dictionary library, the part-of-speech set $\{t_{i1}, t_{i2}, \ldots, t_{ij}\}$ of each word $w_i$, where $t_{ij}$ denotes the $j$-th part of speech that word $w_i$ may have in the dictionary library;
obtaining from the dictionary library the dictionary probability of each part of speech of each word:
$$P_{dict}(\mathrm{POS}(w_i) = t_i) = \frac{\mathrm{count}(w_i, t_i)}{\sum_{t'} \mathrm{count}(w_i, t')}$$
where $P_{dict}(\mathrm{POS}(w_i) = t_i)$ denotes the dictionary probability that word $w_i$ has part of speech $t_i$, $\mathrm{count}(w_i, t_i)$ denotes the number of occurrences of word $w_i$ with part of speech $t_i$ in the dictionary, $\sum_{t'} \mathrm{count}(w_i, t')$ denotes the number of occurrences of word $w_i$ with any part of speech, and $\mathrm{POS}(w_i)$ denotes the part-of-speech tag of word $w_i$;
in the part-of-speech set corresponding to each word $w_i$, randomly drawing a part of speech $t_i \in \{t_{i1}, t_{i2}, \ldots, t_{ij}\}$ according to the dictionary probability to obtain a part-of-speech tagging sequence $T = \{t_1, t_2, \ldots, t_n\}$, where the sequence $T$ obtained by each random draw is different, until no new part-of-speech tagging sequence is produced or the maximum number of draws is reached; where $t_i$ denotes the part of speech of the $i$-th word.

3. The retrieval purpose prediction method based on knowledge management according to claim 2, characterized in that: the keywords further comprise: letting $T_k$ be the $k$-th sequence drawn and $t_{ki}$ be the $i$-th part of speech of the $k$-th part-of-speech tagging sequence;
using a pre-trained part-of-speech embedding matrix to map the part-of-speech tag $t_{ki}$ to the corresponding part-of-speech embedding vector:
$$\vec{e}_{ki} = \mathrm{POS2Vec}(t_{ki})$$
where POS2Vec denotes the mapping function from part-of-speech tags to embedding vectors;
combining each part-of-speech sequence with the sequence of word vectors $\vec{v}_{w_i}$, and calculating the fitness of each part-of-speech sequence through the Bi-LSTM layer and the fully connected layer;
forward LSTM:
$$\overrightarrow{h}_{ki} = \mathrm{LSTM}\big([\vec{v}_{w_i}; \vec{e}_{ki}],\; \overrightarrow{h}_{k(i-1)}\big)$$
backward LSTM:
$$\overleftarrow{h}_{ki} = \mathrm{LSTM}\big([\vec{v}_{w_i}; \vec{e}_{ki}],\; \overleftarrow{h}_{k(i+1)}\big)$$
combining the forward and backward hidden states:
$$h_{ki} = [\overrightarrow{h}_{ki}; \overleftarrow{h}_{ki}]$$
the fitness $s_k$ of each part-of-speech sequence is calculated by the fully connected layer;
the sequence with the highest fitness is selected from the generated part-of-speech sequences as the final part-of-speech tagging result:
$$T^{*} = \arg\max_{T_k} s_k$$
where $T^{*}$ denotes the part-of-speech sequence with the highest fitness; $\arg\max_{T_k}$ denotes selecting, among all possible part-of-speech sequences $T_k$, the sequence that maximizes $s_k$; $s_k$ denotes the fitness score of the $k$-th part-of-speech sequence; $\overrightarrow{h}_{ki}$ denotes the forward LSTM hidden state vector of the $i$-th word $w_i$ in the $k$-th part-of-speech sequence; LSTM denotes the long short-term memory network; $\vec{e}_{ki}$ denotes the embedding vector of part of speech $t_{ki}$; $[\vec{v}_{w_i}; \vec{e}_{ki}]$ denotes the concatenation of the word vector and the part-of-speech embedding vector; $\overrightarrow{h}_{k(i-1)}$ denotes the forward LSTM hidden state vector of the $(i-1)$-th word in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{ki}$ denotes the backward LSTM hidden state vector of the $i$-th word $w_i$ in the $k$-th part-of-speech sequence; $\overleftarrow{h}_{k(i+1)}$ denotes the backward LSTM hidden state vector of the $(i+1)$-th word in the $k$-th part-of-speech sequence; $h_{ki}$ denotes the hidden state vector of the $i$-th word in the $k$-th part-of-speech sequence, combining the forward and backward LSTMs; $b_s$ denotes the bias vector of the fully connected layer; $[\ldots]$ denotes the concatenation of vectors; $W_s$ denotes the weight matrix of the fully connected layer;
presetting the weight of each part of speech and, through a mapping function, selecting the part-of-speech weight of each word in the part-of-speech tagging sequence $T^{*}$:
$$\alpha_i = f(\mathrm{POS}(w_i))$$
where $\alpha_i$ denotes the weight value mapped for word $w_i$, and $f(\mathrm{POS}(w_i))$ denotes the mapping of the tag $\mathrm{POS}(w_i)$;
passing the context information of the sentence to an attention layer and calculating the importance score of each word:
$$\beta_i = \mathrm{softmax}(W_a h_i + b_a)$$
where $\beta_i$ denotes the attention weight of word $w_i$; $W_a$ is the weight matrix of the attention layer; $h_i$ denotes the hidden state vector of word $w_i$ in the Bi-LSTM layer; $b_a$ denotes the bias vector of the attention layer; softmax denotes the normalization function, which makes the attention weights of all words sum to 1;
multiplying $\alpha_i$ and $\beta_i$ of each word and outputting the screening score $\mathrm{Score}(w_i)$:
$$\mathrm{Score}(w_i) = \beta_i \times \alpha_i$$
presetting a threshold $\theta$ and screening the keywords:
$$\mathrm{Keywords} = \{w_i \mid \mathrm{Score}(w_i) > \theta\}$$
where Keywords denotes the vector set of the output keywords.

4. The retrieval purpose prediction method based on knowledge management according to claim 3, characterized in that: synthesizing the database content within the prediction range comprises matching each word in the database with its corresponding search content, arranging the elements of the word set $P$ by priority, and outputting the search information corresponding to each element in order of priority;
the priority comprises: counting the number of times each word vector in the database is selected by a prediction range; the initial count of the word vectors in the database is set to 0; the count of each word vector in the database that lies within the prediction range of keyword $w_p$ is increased by 1; all keywords are traversed to obtain the count of each element of the word set $P$; the counts are arranged from largest to smallest, priority levels of decreasing priority are generated one by one in that order, and elements with the same count are placed at the same priority level.

5. A retrieval purpose prediction system based on knowledge management employing the method according to any one of claims 1 to 4, characterized by comprising:
an acquisition unit, which obtains the user's search content and extracts keywords from the search content;
a construction unit, which constructs coordinates for the content in the database and performs coordinate positioning of the search content according to the characteristics of the keywords;
a prediction unit, which performs approximate changes on the keywords, generates a prediction range of the search content at the coordinate position according to the result of the change, and synthesizes the database content within the prediction range to generate a prediction of the search purpose.

6. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that: when the processor executes the computer program, the steps of the retrieval purpose prediction method based on knowledge management are implemented.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that: when the computer program is executed by a processor, the steps of the retrieval purpose prediction method based on knowledge management are implemented.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411847602.0A CN119336795B (en) 2024-12-16 2024-12-16 A search purpose prediction method and system based on knowledge management

Publications (2)

Publication Number Publication Date
CN119336795A CN119336795A (en) 2025-01-21
CN119336795B true CN119336795B (en) 2025-04-11

Family

ID=94275054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411847602.0A Active CN119336795B (en) 2024-12-16 2024-12-16 A search purpose prediction method and system based on knowledge management

Country Status (1)

Country Link
CN (1) CN119336795B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996439A (en) * 2022-08-01 2022-09-02 太极计算机股份有限公司 Text search method and device
CN118428481A (en) * 2024-07-05 2024-08-02 青岛海信信息科技股份有限公司 A method for realizing operation and maintenance knowledge search based on embedding vector

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003256466A (en) * 2002-03-04 2003-09-12 Denso Corp Adaptive information retrieval system

Also Published As

Publication number Publication date
CN119336795A (en) 2025-01-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant