Detailed Description
Embodiments of the present application are described in detail below, with examples illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting it. The step numbers in the following embodiments are set for convenience of description only; they do not limit the order of the steps in any way, and the execution order of the steps in the embodiments may be adjusted as those skilled in the art deem appropriate.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Before describing embodiments of the present application in detail, some of the terms and expressions that are referred to in the embodiments of the present application will be described first, and the terms and expressions that are referred to in the embodiments of the present application are applicable to the following explanations.
A geographic information system (Geographic Information System, GIS) is a computer system for collecting, storing, managing, processing, retrieving, analyzing, and presenting geospatial data; that is, a system for analyzing and processing massive amounts of geographic data.
A point of interest (Point of Interest, POI) is a landmark or notable location in a geographic information system, used to mark, for example, government departments, commercial establishments of various industries (gas stations, department stores, supermarkets, restaurants, hotels, convenience stores, hospitals, and the like), tourist attractions (parks, public toilets), historic sites, and transportation facilities (stations of various kinds, parking lots, speed cameras, speed-limit signs).
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, to sense the environment, to acquire knowledge, and to use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that machines can sense, reason, and make decisions. Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (Machine Learning, ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning (and deep learning) generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for enabling effective communication between humans and computers in natural language; natural language processing is thus a science integrating linguistics, computer science, and mathematics. Because this field concerns natural language, that is, the language people use in daily life, it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques.
Big data (Big Data) refers to a data set that cannot be captured, managed, and processed with conventional software tools within a certain time range; it is a massive, high-growth-rate, and diversified information asset that requires new processing modes to yield stronger decision-making power, insight, and process-optimization capability. With the advent of the cloud era, big data has attracted increasing attention, and effectively processing large volumes of data within a tolerable elapsed time requires special techniques. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated by cryptographic methods, where each block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer. Blockchains can be divided into public chains, consortium chains, and private chains. A public chain is a blockchain that anyone can join at any time to read data, send data, or compete for bookkeeping; a consortium chain is a blockchain managed jointly by several organizations or institutions; a private chain is a blockchain with a degree of centralized control, where write access to the ledger is controlled by a single organization or institution, and access to and use of the data are under strict authority management.
An intelligent transportation system (Intelligent Traffic System, ITS), also called an Intelligent Transportation System, is a transportation system that effectively and comprehensively applies advanced science and technology (information technology, computer technology, data communication technology, sensor technology, electronic control technology, automatic control theory, operations research, artificial intelligence, and the like) to transportation, service control, and vehicle manufacturing, strengthening the connections among vehicles, roads, and users, thereby forming an integrated transportation system that ensures safety, improves efficiency, improves the environment, and saves energy.
The training and identification methods for the point of interest information identification model provided in the embodiments of the present application relate to artificial intelligence technology, mainly involving natural language processing and machine learning/deep learning within the field of artificial intelligence. Specifically, the methods provided in the embodiments of the present application can process point of interest information using natural language processing techniques, so as to identify the membership relationships of the entities in the point of interest information. In the process of identifying entity membership in the point of interest information, a model obtained through machine learning/deep learning training can be used to perform feature extraction, classification tasks, and the like; a training method for the point of interest information identification model is also provided.
The method provided in the embodiments of the present application can be executed in application scenarios such as big data, intelligent transportation systems, and intelligent vehicle-road cooperative systems. For example, in a big data scenario there may be a need to perform statistical analysis of the popularity of scenic spots. In such a scenario, the method provided in the embodiments of the present application can be used to perform statistical analysis of the number of times entities of the same type are browsed when point of interest information is searched on the network, so as to obtain analysis results corresponding to those points of interest. In intelligent transportation system and intelligent vehicle-road cooperative system scenarios, analyzing the membership relationships of the entities in each piece of point of interest information can facilitate the generation of a finer-grained geographic information system and thus provide better navigation guidance.
Of course, it should be noted that the above application scenarios are only exemplary and are not meant to limit the practical application of the method in the embodiments of the present application. Those skilled in the art will appreciate that the methods provided in the embodiments of the present application may be used to perform specified tasks in different application scenarios.
In addition, in each specific embodiment of the present application, when processing is required on data related to the identity or characteristics of a target object, such as information about the target object or the target object's behavior data, history data, or position information, the permission or consent of the target object is obtained first, and the collection, use, and processing of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions. Moreover, when an embodiment of the present application needs to obtain sensitive information of the target object, the separate permission or separate consent of the target object is obtained through a pop-up window, a jump to a confirmation page, or the like; only after the separate permission or separate consent of the target object has been explicitly obtained is the target-object-related data necessary for the normal operation of the embodiment acquired.
A point of interest is typically a landmark, a sight, or any physically meaningful location in a geographic information system, such as a school, a bank, a restaurant, a gas station, a hospital, or a supermarket. Point of interest information is information containing the name of a point of interest, such as "Experimental First Dining Hall" or "Civilization District Convenience Supermarket". Because point of interest information can provide convenient and practical guidance, it is an essential element in many navigation applications.
A piece of point of interest information sometimes contains multiple points of interest, each of which is an entity. For example, point of interest information such as "Technology Exchange Building East Gate" includes two points of interest, "Technology Exchange Building" and "East Gate"; similarly, the aforementioned "Experimental First Dining Hall" also contains multiple entities. Of course, the number of entities that a piece of point of interest information may contain is not fixed, and this application does not limit it. For point of interest information that includes multiple point of interest entities, various entity membership relationships may exist, such as the main-sub point relationship, which represents a dependency between points of interest. For example, in point of interest information that contains both "University A" and one of its facilities, the two are separate entities: the facility is affiliated with "University A", the two stand in a main-sub point relationship, "University A" is the main point, and the facility is the sub-point. Besides the main-sub point relationship, other entity membership relationships may also be defined, for example the membership relationship between a brand (which may also serve as a main point) and its branch stores; these are not described in detail here. Entity membership is very important for building geographic information systems and map knowledge graphs.
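The main-sub point relationship described above can be made concrete with a small data-structure sketch. The record layout below is purely illustrative (the field names `entities` and `role` and the label strings are assumptions for this example, not part of the application):

```python
# A hypothetical record for one piece of point of interest information whose
# entities stand in a main-sub point relationship (all field names illustrative).
poi = {
    "name": "University A Library",
    "entities": [
        {"text": "University A", "role": "main_point"},
        {"text": "Library", "role": "sub_point"},
    ],
}

def main_point_of(record):
    """Return the text of the first entity annotated as the main point, if any."""
    for entity in record["entities"]:
        if entity["role"] == "main_point":
            return entity["text"]
    return None
```

A geographic information system or map knowledge graph could index such records by main point, so that a query for "University A" also surfaces its affiliated sub-points.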
In the related art, entity membership in point of interest information is generally identified by named entity recognition. However, current point of interest information recognition models are often trained on complex corpora drawn from many scenarios; they are not well targeted to the point of interest recognition scenario, their semantic analysis capability is weak, and the accuracy of the resulting recognition results is low.
In view of the above, the embodiments of the present application provide a training method, an apparatus, an electronic device, and a storage medium for a point of interest information identification model. The training method obtains not only first point of interest information and its corresponding semantic component labels for training the point of interest information identification model, but also, because the model is trained in a multi-task learning mode, an associated task label of the first point of interest information. After features are extracted from the first point of interest information, the resulting first feature data is input both into a second sub-model to identify the point of interest information and into an associated task model for associated semantic task processing. The loss values obtained from the two parts, namely point of interest information identification and the associated semantic task, are then combined, and the parameters of the point of interest information identification model are updated. In this way, the semantic understanding and analysis capability of the point of interest information recognition model can be improved through multi-task joint learning, and when the model is subsequently used to identify point of interest information, the accuracy of the recognition results is improved.
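The parameter update described above combines two loss values into one training signal. The following is a minimal sketch of one common way to do this, a weighted sum; the weighting hyperparameter `alpha` is an assumption of this example, as the embodiment does not prescribe a particular combination scheme:

```python
def joint_loss(recognition_loss, associated_task_loss, alpha=0.5):
    """Combine the point of interest identification loss and the associated
    semantic task loss into a single scalar used for the parameter update.
    alpha is a hypothetical weighting hyperparameter in [0, 1]."""
    return alpha * recognition_loss + (1.0 - alpha) * associated_task_loss
```

During training, the gradient of this combined scalar with respect to the shared feature extractor's parameters would carry learning signal from both tasks, which is what gives multi-task joint learning its regularizing effect.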
The point of interest information processing tasks related to the embodiments of the present application will first be described and illustrated with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 illustrates a task architecture diagram of a point of interest information processing task. In general, the point of interest information processing task may span multiple dimensions; in the example illustrated in fig. 1, it mainly includes three dimensions: semantic analysis, syntactic analysis, and lexical analysis. Developers in this field may build a service system capable of identifying point of interest information based on the task architecture in fig. 1.
Specifically, in the task architecture of fig. 1, semantic analysis is mainly used to identify the functions and semantic tags of the words in the point of interest information. For example, for point of interest information such as "Sinopec Linmei Gas Station", semantic analysis can identify which words are core words of the point of interest information and which are secondary words, making it easier to identify the main attribute of the place the point of interest information refers to. Semantic tag identification is mainly used to distinguish the semantic meanings represented by the words in the point of interest information, that is, which type of information each word characterizes. For example, applying semantic tag identification to point of interest information such as "KFC (Moliu Shopping Mall)" can determine that the words "KFC" and "Moliu" are brand-type information and that the words "Shopping" and "Mall" are business-type information. Syntactic analysis, which may also be referred to as the name understanding task, is the task dimension that is the primary application of the method provided in the embodiments of the present application. This task aims to analyze the semantic components in the point of interest information and may include sub-tasks such as main component analysis, subordinate component analysis, and main-sub and hierarchy recognition. The purpose of main component analysis is to identify main points and sub-points, and the purpose of subordinate component analysis is to identify aliases, branches, and descriptions.
For example, for "Sinopec Linmei Gas Station", it is possible to identify "Sinopec Linmei" as the main point and "Gas Station" as the sub-point; for the name "KFC (Moliu Shopping Mall)", it can be recognized that "KFC" is the main point and "Moliu Shopping Mall" is the branch store. Lexical analysis, by comparison, is oriented toward processing the words in the point of interest information; for example, after the point of interest information is segmented into words, the word role of each word can be determined, a process similar to determining semantic tags, which is not repeated here. In some cases, the point of interest information may not contain the common name of the actual main point but instead a synonym or alias of that common name; for example, the recorded point of interest information may use the full official name of a university's school of physics, while users actually search with a common abbreviation or alias of that name. Therefore, synonym expansion can be performed on the words in the point of interest information through lexical analysis, improving the comprehensiveness and accuracy of point of interest identification and retrieval. In the process of lexical analysis, the point of interest information generally needs to undergo word segmentation to obtain the words that make it up. The word segmentation mode may be full segmentation, fine-grained segmentation, coarse-grained segmentation, and so on; different segmentation granularities may yield different segmentation results, and the granularity can be selected according to the actual application requirements, which is not limited in this application.
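The effect of segmentation granularity can be illustrated with a simple greedy longest-match segmenter over a lexicon. This is only a sketch under the assumption of a toy lexicon, not the segmentation algorithm used by the embodiment; note how a lexicon with longer entries produces a coarser-grained result:

```python
def segment(text, lexicon, max_len=4):
    """Greedy longest-match word segmentation: at each position, take the
    longest lexicon entry that matches; fall back to a single character.
    A lexicon with longer entries yields coarser-grained segmentation."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

For instance, segmenting the same string against a coarse lexicon `{"abc"}` gives `["abc", "d"]`, while an empty lexicon degenerates to single characters, mirroring the full/fine/coarse granularity choices mentioned above.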
In connection with the task architecture diagram of the point of interest information processing task shown in fig. 1, an embodiment of the present application provides a model architecture capable of executing that task architecture, as shown in fig. 2. Correspondingly, the model architecture of fig. 2 also mainly includes three parts. The tasks corresponding to semantic analysis may include subdivided task types such as word weight calculation, word class prediction, and semantic similarity calculation; on the model side, models capable of similarity learning, behavioral feature extraction, and linguistic feature extraction may be adopted, and the accuracy of the results can be improved through multi-model fusion. Corresponding to the lexical analysis tasks, relevant word segmentation algorithms can be used to perform segmentation at various granularities, and word roles can be matched against various dictionaries or identified by models.
The syntactic analysis task is the part to which the method provided in the embodiments of the present application mainly applies. As can be seen in fig. 2, the model layer of this part may use a language model combined with multi-task learning to perform syntactic analysis on the point of interest information based on the feature data extracted from it, so as to identify the main points, sub-points, and other semantic components such as branches and aliases in the point of interest information.
It should be noted that the task architecture shown in fig. 1 and the model architecture shown in fig. 2 are only used to describe the point of interest information processing task as a whole; they do not mean that the point of interest information identification model in this application must handle all the task types or all the components within them. Moreover, those skilled in the art may adjust the task architecture or the model architecture according to actual requirements, for example by omitting or adding certain technical content, which is not limited in this application.
In the related art, a model generally needs to be trained before being put into use, so that its parameters meet predetermined needs. An embodiment of the present application provides a training method for a point of interest information recognition model. The point of interest information recognition model is mainly used to perform the syntactic analysis processing task in fig. 1; for convenience of description, an exemplary structure of the model and its recognition principles when applied are briefly described below.
Referring to fig. 3, the point of interest information recognition model provided in the embodiment of the present application mainly includes two parts, denoted as a first sub-model 310 and a second sub-model 320. The first sub-model 310 is used to extract feature data from the input point of interest information, and the second sub-model 320 is used to identify the membership of the entities in the point of interest information based on the extracted feature data, thereby outputting a prediction result. Specifically, in some embodiments, the first sub-model 310 may employ a language model capable of extracting semantic features of the point of interest information; for example, the point of interest information may be semantically encoded in a word embedding manner. Because the point of interest information is a continuous sequence, the recognition model needs to output a recognition result for each word during prediction, so the second sub-model 320 in the embodiment of the present application may adopt a sequence labeling model. Sequence labeling models are efficient in natural language processing tasks: for an input sequence of length N, where each element must be labeled to produce an output of length N, a sequence labeling model can greatly improve processing speed and efficiency. Illustratively, in some embodiments, the first sub-model 310 may employ a BERT model, a long short-term memory model (Long Short Term Memory, LSTM), or derivatives thereof such as the ALBERT model or the BiLSTM model; the second sub-model 320 may be a Hidden Markov Model (HMM), a Conditional Random Field (CRF), or the like, which is not particularly limited in this application.
Of course, it should be noted that, in addition to the first sub-model and the second sub-model, the point of interest information identification model provided in the embodiment of the present application may further include other parts, for example structures that filter the input data or optimize and filter intermediate feature data (such as a dropout layer). For example, in one point of interest information identification model, the first sub-model is a BiLSTM model, the second sub-model is a conditional random field, and the model further includes an embedding layer and an encoding layer. Each word of the input point of interest information is converted into a vector in the embedding layer; these vectors are input into the BiLSTM model, which performs feature extraction based on context information to obtain a higher-dimensional feature vector for each word. The encoding layer then maps the high-dimensional feature vectors into low-dimensional vectors matching the label dimension, so that each word can be labeled by the conditional random field based on the low-dimensional vectors; the labeled data is the recognition result output by the model.
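The final labeling step of this pipeline can be sketched in miniature. The greedy decoder below assigns each word the highest-scoring label from its (already projected) low-dimensional score vector; it is a stand-in for CRF/Viterbi decoding, which would additionally score label-to-label transitions. The label set is a hypothetical one chosen for this example:

```python
LABELS = ["main_point", "sub_point", "branch"]  # hypothetical semantic-component label set

def greedy_decode(score_rows):
    """For each word's score vector (one row per word, one column per label),
    pick the highest-scoring label. A simplified stand-in for CRF/Viterbi
    decoding, which would also take label-transition scores into account."""
    decoded = []
    for row in score_rows:
        best = max(range(len(LABELS)), key=lambda j: row[j])
        decoded.append(LABELS[best])
    return decoded
```

A real CRF layer improves on this by forbidding implausible label sequences (for example, a branch label immediately inside a main-point span), which per-word argmax cannot express.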
Hereinafter, a detailed description will be given of specific implementations of the embodiments of the present application with reference to the accompanying drawings.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating an implementation environment of a training method of the point-of-interest information identification model provided in an embodiment of the present application. In this implementation environment, the main hardware and software body includes the first terminal 410.
Specifically, the first terminal 410 may have a related map application installed therein and be configured with a point-of-interest information recognition model; the interest point information identification model can acquire interest point information from the map application program, and feed back corresponding identification results to the map application program for use, so that the map application program provides navigation results based on the identification results given by the interest point information identification model. In the case illustrated in fig. 4, the point of interest information recognition model may be trained and applied in the first terminal 410.
In addition, referring to fig. 5, fig. 5 is a schematic diagram of another implementation environment of the training method of the point-of-interest information identification model according to the embodiment of the present application, where a main software and hardware body related to the implementation environment includes a second terminal 510 and a server 520, where the second terminal 510 and the server 520 are in communication connection.
Specifically, the server 520 may have a related map application installed and be configured with a point of interest information identification model. The map application may receive an initiated navigation request and send the navigation results to the second terminal 510 for display and browsing. The point of interest information recognition model may obtain point of interest information from the map application and feed back the corresponding recognition result for the map application to use, so that the map application provides navigation results based on the recognition results given by the point of interest information recognition model.
The first terminal 410 and the second terminal 510 of the above embodiments may include, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and a vehicle-mounted terminal.
The server 520 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
In addition, server 520 may also be a node server in a blockchain network.
A communication connection may be established between the second terminal 510 and the server 520 through a wireless or wired network. The wireless or wired network uses standard communication techniques and/or protocols and may be configured as the Internet or any other network, including but not limited to a local area network (Local Area Network, LAN), a metropolitan area network (Metropolitan Area Network, MAN), a wide area network (Wide Area Network, WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof.
Of course, it can be understood that the implementation environments in fig. 4 and fig. 5 are only some optional application scenarios of the training method of the point of interest information identification model provided in the embodiments of the present application, and the actual application is not fixed to the software and hardware environments shown in fig. 4 and fig. 5. The method provided by the embodiment of the application can be applied to various technical fields including but not limited to the technical fields of big data, intelligent traffic systems, intelligent vehicle-road cooperative systems and the like.
Referring to fig. 6, fig. 6 is a flowchart of a training method of a point of interest information recognition model according to an embodiment of the present application, where the training method may be executed by a terminal alone or in combination with a server, and the training method includes, but is not limited to, the following steps 610 to 660.
Step 610: acquiring first point of interest information, the semantic component labels corresponding to the words in the first point of interest information, and the associated task label of the first point of interest information; the semantic component labels represent the membership of the entities in the first point of interest information, and the associated task label represents the task result of an associated task of the first point of interest information.
In this step, when the point of interest information recognition model is trained, the corresponding training data needs to be obtained. The training data mainly comprises two parts: the first part is the data input into the model, which the model processes to obtain a corresponding prediction result; the other part is the label data corresponding to the first part. The label data identifies the true result of the prediction task; by comparing the prediction result with the label data, the accuracy of the model's predictions can be assessed, the model's parameters can be updated, and the progress of model training can be determined.
Specifically, in the embodiment of the present application, a multi-task learning mode is adopted when training the point of interest information identification model: an associated task model participates jointly in the prediction task, and the relevant parameters of the point of interest information identification model are updated accordingly. In this multi-task learning, one part of the learning tasks is the point of interest information identification task, that is, the task the trained point of interest information identification model is to perform, and the other part is the task to be performed by the associated task model. Correspondingly, when obtaining training data in step 610, the point of interest information used for training may first be obtained and recorded as the first point of interest information; then, for the task performed by the point of interest information identification model, the semantic component labels corresponding to the words in the first point of interest information are obtained, where the semantic component labels represent the membership of the entities in the first point of interest information.
Here, the data form of the semantic component label and the membership relationship represented by the semantic component label may be flexibly set according to needs, which is not limited in this application. For example, in some embodiments, a vector may be used as the data form of the semantic component label: when the semantic component label is (1, 0, 0), the word corresponding to the label is an entity (or a part of an entity) belonging to the principal point in the first interest point information; when the semantic component label is (0, 1, 0), the word corresponding to the label is an entity (or a part of an entity) belonging to a sub-point in the first interest point information; and when the semantic component label is (0, 0, 1), the word corresponding to the label is an entity (or a part of an entity) belonging to a branch in the first point of interest information.
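As an illustrative sketch only (the three membership categories and the three-dimensional one-hot layout below are assumptions for illustration, not an encoding mandated by the application), such label vectors might be organized as:

```python
# Hypothetical one-hot encoding of semantic component labels.
# The category names and vector layout are illustrative assumptions.
SEMANTIC_LABELS = {
    "main_point": (1, 0, 0),  # entity is the principal point of the POI
    "sub_point":  (0, 1, 0),  # entity is a sub-point
    "branch":     (0, 0, 1),  # entity is a branch
}

def label_vector(category: str) -> tuple:
    """Return the one-hot semantic component label for a membership category."""
    return SEMANTIC_LABELS[category]
```

Any other data form (a scalar class index, a longer vector with more categories, and so on) would serve equally well, as the application does not limit this choice.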
It should be noted that, because the granularity of word segmentation may differ, in the embodiment of the present application, each word in the first interest point information may represent an entity alone or may be part of an entity. In actual processing, multiple consecutive words bearing the same semantic component label may be determined to be one entity. In other words, when marking the semantic component labels of the words, the semantic component labels of all the entities in the first interest point information may be marked first, and then, for an entity split into a plurality of words, all the words obtained after splitting adopt the semantic component label of the original entity. For example, in a certain piece of point of interest information, "Tsinghua University" is the main point of the point of interest information, and the two words "Tsinghua" and "University" are obtained after word segmentation processing is performed on it, so that the semantic component labels corresponding to the two words are both labeled as labels of the main point type.
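The label-propagation rule described above can be sketched as follows; the helper name and the entity strings are hypothetical, and the segmentation is assumed to align with entity boundaries:

```python
def propagate_entity_labels(entities, words):
    """Assign each word the semantic component label of the entity it came from.

    entities: list of (entity_text, label) pairs, in reading order.
    words: words obtained by segmenting the concatenated entity texts.
    Consecutive words covered by the same entity share that entity's label.
    """
    labels = []
    idx, consumed = 0, 0
    for w in words:
        text, lab = entities[idx]
        labels.append(lab)                 # word inherits its entity's label
        consumed += len(w)
        if consumed >= len(text):          # finished this entity, move on
            idx += 1
            consumed = 0
    return labels
```

For the example above, segmenting a main-point entity into two words yields the main-point label twice.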
In this step, in addition to the semantic component labels, the associated task labels of the first interest point information are also acquired. The associated task label is used to characterize the task result of the associated task of the first point of interest information and may be used to measure the accuracy of the task executed by the associated task model; its specific data form is not limited in this application and may, for example, be any one or more of a numerical value, a vector, a matrix, or a tensor. The specific meaning of the associated task tag depends on the specific associated task type; this part of the content will be described and illustrated in the following embodiments and is not repeated herein.
In this step, the related information and data may be obtained locally or from the cloud, which is not limited in the embodiment of the present application.
Step 620: and inputting the first interest point information into the first sub-model for natural language processing to obtain first characteristic data corresponding to the first interest point information.
In this step, the obtained first interest point information may be input into the first sub-model of the interest point information identification model, natural language processing is performed on the first interest point information through the first sub-model, and the corresponding feature data is extracted and recorded as the first feature data. In the embodiment of the present application, the data form of the feature data is not limited; for example, the feature data may include one or more of a numerical value, a vector, a matrix, or a tensor.
Specifically, in some embodiments, when the first sub-model performs natural language processing on the first point of interest information, a character or word in the first point of interest information may be mapped into a unified lower-dimensional vector space, so as to obtain the corresponding first feature data. Strategies for generating such a mapping include neural networks, dimension reduction of a word co-occurrence matrix, probability models, interpretable knowledge base methods, and the like.
In the embodiment of the present application, the obtained first feature data of the first interest point information may be feature data extracted at the character level or feature data extracted at the word level. Illustratively, taking feature data extracted at the character level as an example, since the second sub-model needs to identify the semantic components in the dimension of the word level in subsequent processing, intermediate processing steps may be added to convert the character-level first feature data into word-level first feature data. For example, the first interest point information may be subjected to word segmentation processing to obtain the phrase that forms the information, where the phrase includes a plurality of words, and for each word, the first feature data corresponding to the characters contained therein may be subjected to feature fusion processing to obtain the first feature data corresponding to the word. Here, when performing word segmentation, in some embodiments a dictionary-based word segmentation algorithm may be adopted, wherein a sentence is segmented into words according to a dictionary and an optimal combination of the words is then sought; in other embodiments a character-based word segmentation algorithm may be used, wherein the sentence is divided into individual characters that are then combined into words so as to find an optimal combination, which is not limited in this application. In addition, when the feature data is fused, any processing mode such as splicing, or superposition followed by normalization, may be adopted.
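A minimal sketch of the character-to-word feature fusion step, assuming the segmentation yields the number of characters per word; both the mean-pooling and splicing modes mentioned above are shown (the function name and interface are illustrative, not part of the application):

```python
def fuse_char_features(char_vectors, word_lengths, mode="mean"):
    """Fuse character-level feature vectors into word-level vectors.

    char_vectors: list of equal-length vectors, one per character.
    word_lengths: number of characters in each segmented word, in order.
    mode: "mean" averages the character vectors; "concat" splices them.
    """
    word_vectors, pos = [], 0
    for n in word_lengths:
        chunk = char_vectors[pos:pos + n]
        if mode == "concat":                      # splicing
            fused = [x for vec in chunk for x in vec]
        else:                                     # mean pooling
            dim = len(chunk[0])
            fused = [sum(vec[d] for vec in chunk) / n for d in range(dim)]
        word_vectors.append(fused)
        pos += n
    return word_vectors
```

Splicing preserves per-character information at the cost of a word-length-dependent dimension, while pooling keeps a fixed dimension; either fits the scheme described above.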
Step 630: inputting the first characteristic data into the second sub-model, and carrying out semantic component analysis on the first interest point information to obtain a semantic component analysis result corresponding to the word in the first interest point information.
In this step, after the first feature data is obtained, the first feature data may be input into the second sub-model, and semantic component analysis is performed on the first interest point information, so as to obtain a semantic component analysis result corresponding to each term. The semantic component analysis result is used for representing the membership of the entity in the first interest point information predicted by the model, and the data form and the corresponding meaning are the same as the semantic component labels, and are not described herein again.
Step 640: and inputting the first characteristic data into the associated task model, and carrying out associated task processing on the first interest point information to obtain an associated task processing result corresponding to the first interest point information.
In this step, as described above, in the embodiment of the present application, the associated task model participates jointly in the prediction task in a multitask learning manner. Therefore, the first feature data is also input into the associated task model, and the associated task model carries out associated task processing based on the feature data extracted by the point-of-interest information identification model, so as to obtain the associated task processing result corresponding to the first point-of-interest information. Similarly, here, the data form and corresponding meaning of the associated task processing result are the same as those of the associated task tag described above.
Specifically, in the embodiment of the present application, the associated task model may be any learning model related to a natural language processing task, which is not limited in this application.
It should be noted that, in the embodiment of the present application, the associated task model may itself include a feature extraction portion, but when the associated task model is applied, it does not need to extract feature data of the first interest point information through its own feature extraction portion, and instead uses the feature data extracted by the first sub-model of the interest point information identification model. In this way, the accuracy of the associated task model in executing the associated task reflects the effect of the first sub-model in extracting the feature data, which is fed back to the point-of-interest information identification model to update its parameters.
Step 650: and determining a training loss value according to the semantic component analysis result and the semantic component label, and the associated task processing result and the associated task label.
In this step, after the semantic component analysis result and the associated task processing result are obtained, the training loss value can be determined according to the performance of the interest point information identification model and the associated task model. Specifically, for the multi-task learning in the embodiment of the present application, the training loss value also includes two parts: one part is the loss value of the task of identifying the interest point information, which can be determined from the semantic component analysis result and the semantic component label; the other part is the loss value of the task that the associated task model is responsible for executing, which can similarly be determined from the associated task processing result and the associated task label. The loss value of the whole training can then be determined based on these two partial loss values.
Specifically, for various models in the artificial intelligence field, the prediction accuracy of a model can be measured by a Loss Function (Loss Function). The loss function is defined on a single piece of training data and is used for measuring the prediction error of that piece of training data; specifically, the loss value is determined from the label of the single piece of training data and the prediction result of the model on that data. In actual training, a training data set includes a plurality of pieces of training data, so a Cost Function (Cost Function) is generally adopted to measure the overall error of the training data set. The cost function is defined on the whole training data set and calculates the average of the prediction errors of all the training data, so that the prediction effect of the model can be better measured. For a general machine learning model, the cost function plus a regular term measuring the complexity of the model can be used as the training objective function, and based on this objective function the loss value of the whole training data set can be obtained. There are many kinds of common loss functions, such as the 0-1 loss function, square loss function, absolute loss function, logarithmic loss function, cross entropy loss function, and the like, all of which can be used as the loss function of a machine learning model and will not be described in detail herein. In the embodiment of the application, one loss function can be selected to determine the loss value of training, namely, the loss value between the prediction result and the label, wherein the prediction result is the semantic component analysis result or the associated task processing result, and the label is the semantic component label or the associated task label.
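As a hedged example of one such choice, the cross entropy loss on a single sample and the corresponding cost over a data set could be written as (the function names are illustrative, and any of the other loss functions listed above could be substituted):

```python
import math

def cross_entropy(predicted, one_hot_label):
    """Cross entropy loss on a single piece of training data: the negative
    log-probability the model assigned to the true (one-hot) class."""
    return -sum(y * math.log(p) for p, y in zip(predicted, one_hot_label) if y)

def cost(predictions, labels):
    """Cost function: mean of the per-sample losses over the data set."""
    return sum(cross_entropy(p, y) for p, y in zip(predictions, labels)) / len(labels)
```

A regular term measuring model complexity could be added to `cost` to form the full training objective described above.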
It may be understood that in the embodiment of the present application, when determining the loss value of the task identified by the point of interest information and the loss value of the task that the associated task model is responsible for executing, the same loss function may be adopted, or different loss functions may be adopted, which is not limited in this application.
Step 660: and updating parameters of the first sub-model and the second sub-model according to the loss value.
The loss value obtained in step 650 may be used to evaluate the accuracy of the model's predictions so as to back-propagate through the model and update its internal relevant parameters. Therefore, in this step, after the training loss value is obtained, parameter updating may be performed on the first sub-model and the second sub-model in the point-of-interest information identification model. Specifically, the parameters can be iteratively updated a plurality of times through a back propagation algorithm, and when the number of iteration rounds reaches a preset number, training can be considered complete and the trained interest point information identification model is obtained. Of course, in some embodiments, a verification data set may be preset, and during the training process, the current recognition accuracy of the model is verified through the verification data set every iteration or every predetermined number of iterations; when the recognition accuracy reaches a preset standard, the training of the model may be considered complete. Here, the acquisition manner of the verification data set is the same as that of the training data and will not be described herein.
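The iterate-until-done procedure can be illustrated on a deliberately tiny stand-in model; this is not the application's back-propagation implementation, merely a sketch of the stopping logic (a preset round count, or a loss standard standing in for the verification accuracy check):

```python
def train(xs, ys, lr=0.1, max_rounds=200, target_loss=1e-6):
    """Toy illustration of iterative parameter updating: fit y = w * x by
    gradient descent, stopping when the preset number of rounds is reached
    or when the loss meets a preset standard."""
    w = 0.0
    for _ in range(max_rounds):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad                      # parameter update
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if loss < target_loss:              # "accuracy reaches a preset standard"
            break
    return w
```

In the application's setting, `w` stands in for the parameters of the first and second sub-models, and the loss is the combined training loss value of step 650.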
It can be understood that in the embodiment of the present application, by means of multitask learning, when training the point-of-interest information identification model, not only are the first point-of-interest information for training and the semantic component tags corresponding thereto obtained, but also the associated task tag of the first point-of-interest information is obtained; then, after the features of the first interest point information are extracted, the obtained first feature data is input into the second sub-model to identify the interest point information, and the first feature data is also input into the associated task model for associated task processing; and then, the loss values obtained from the two parts, namely the identification of the interest point information and the associated task, are processed together to update the parameters of the interest point information identification model. In this way, the semantic understanding and analysis capability of the interest point information recognition model can be improved through multi-task joint learning, and when the model is subsequently used for identifying interest point information, the accuracy of the obtained identification result is improved.
It should be noted that, the associated task model used in the embodiment of the present application may be already trained, or may be trained together with the point of interest information identification model in the present application after initialization. In other words, when the associated task model used in the process of training the point-of-interest information recognition model is a model already trained, the parameters of the associated task model do not need to be updated and changed in the process of training the point-of-interest information recognition model; when the associated task model used in the process of training the interest point information identification model is an untrained model, the interest point information identification model and the associated task model can be regarded as a whole model, and parameters of the interest point information identification model and the associated task model are updated through the obtained loss value.
In addition, in the embodiment of the present application, if a trained associated task model is selected, in order to improve the association between the associated task model and the point-of-interest information recognition model, the associated task model may be obtained by training on data in the field related to the point-of-interest information. For example, in some embodiments, the same batch of point-of-interest information may be obtained, the associated task model may be trained first to obtain a trained associated task model, and then the point-of-interest information recognition model may be trained using that point-of-interest information and the trained associated task model. In this way, the associated task model better fits the application field of the interest point information, can more effectively reflect the capability of the interest point information identification model to extract feature data, and can improve the prediction accuracy of the trained interest point information identification model.
The implementation of the associated task model in the embodiments of the present application is explained and illustrated below with reference to some specific application examples.
Illustratively, in some embodiments, the associated task model includes a name category prediction model; inputting the first characteristic data into the associated task model, and carrying out associated task processing on the first interest point information to obtain an associated task processing result corresponding to the first interest point information, wherein the associated task processing result comprises the following steps:
inputting the first characteristic data into a name category prediction model, and predicting the name category of the first interest point information to obtain a name category prediction result corresponding to the first interest point information.
In this embodiment of the present application, the associated task model may include a name category prediction model, and the name category prediction model may predict a name category to which the first point of interest information belongs based on the first feature data of the first point of interest information. Here, the name category is used to characterize what type of location the first point of interest information belongs to, for example, the name category may include classification such as "school", "residential building", "store", "restaurant", "hospital", and the like, and the specific name category may be flexibly adjusted according to needs, which is not limited in this application.
The name category prediction model performs a classification task, namely, classifies the first interest point information by category, so as to obtain a corresponding name category prediction result. For example, the name category prediction model may predict whether "Experimental Primary School" belongs to the name category "school" or the name category "residential building". Here, the different name categories may be distinguished in any data form, which the present application does not limit.
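A minimal sketch of such a classification head, assuming a softmax over raw scores produced from the first feature data; the category list and function names are illustrative only, since the application does not limit the classification model or algorithm:

```python
import math

# Illustrative name categories; the actual set may be adjusted as needed.
CATEGORIES = ["school", "residential building", "store", "restaurant", "hospital"]

def softmax(logits):
    """Turn raw scores into a probability distribution over name categories."""
    m = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_category(logits):
    """Return the name category with the highest predicted probability."""
    probs = softmax(logits)
    return CATEGORIES[probs.index(max(probs))]
```

The probabilities produced here would be compared against the name-category tag when computing the associated-task loss.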
It can be understood that in the machine learning field, the classification task is one of basic tasks, and the types of models and algorithms that can be selected are various, so in the embodiment of the present application, the classification model algorithm that is selected for the name class prediction model is not limited. In addition, in the embodiment of the application, the name category prediction model predicts based on the feature data extracted from the first sub-model, so that the accuracy of prediction is related to the feature extraction capability of the first sub-model to a certain extent, and the parameters of the interest point information identification model can be fed back and updated through the accuracy of performing the related task of the name category prediction by the name category prediction model, namely, the accuracy of the name category prediction result.
Illustratively, in other embodiments, the associated task model includes a name similarity discrimination model; inputting the first characteristic data into the associated task model, and carrying out associated task processing on the first interest point information to obtain an associated task processing result corresponding to the first interest point information, wherein the associated task processing result comprises the following steps:
randomly selecting and obtaining first target interest point information and second target interest point information from the first interest point information;
determining, from the first characteristic data, first target data obtained through processing according to the first target interest point information and second target data obtained through processing according to the second target interest point information;
and inputting the first target data and the second target data into a name similarity judging model, and judging the name similarity of the first target interest point information and the second target interest point information to obtain a name similarity judging result of the first target interest point information and the second target interest point information.
In this embodiment of the present application, the associated task model may further include a name similarity determining model, where the name similarity determining model may be used to determine a similarity between two or more interest point information. For convenience of description, taking the discrimination of the similarity between the two interest point information as an example, first, two interest point information may be randomly selected from the first interest point information, and recorded as the first target interest point information and the second target interest point information. For the first target interest point information, first target data corresponding to the first target interest point information is determined from the first characteristic data, and similarly, second target data corresponding to the second target interest point information is also determined from the first characteristic data. Then, the name similarity discrimination model may determine a name similarity discrimination result of the first target point of interest information and the second target point of interest information based on the first target data and the second target data.
Specifically, in some embodiments, the similarity between interest point information may be represented by a percentage. For example, when the name similarity discrimination result of the first target interest point information and the second target interest point information is 90%, the similarity between the two is high, and they are likely to have a certain relevance; conversely, when the name similarity discrimination result is 20%, the similarity between the two is low, and the probability that they are not associated is high. When the name similarity judging model determines the name similarity judging result based on the first target data and the second target data, the difference between the first target data and the second target data can be directly calculated to obtain a difference value, which may, for example, be a difference between numerical values, a Euclidean distance between vectors, a norm difference between matrices, or the like; the name similarity judging result is then determined from this difference value through a preset function, and the function can make the name similarity judging result and the difference value have a negative correlation relationship.
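For instance, taking the Euclidean distance as the difference value and 1/(1+d) as the preset negatively correlated function (both choices are assumptions for illustration, not mandated by the application), the discrimination could be sketched as:

```python
import math

def euclidean(u, v):
    """Difference value: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def name_similarity(u, v):
    """Map the difference value to a similarity in (0, 1]: the larger the
    difference, the lower the similarity (a negative correlation)."""
    return 1.0 / (1.0 + euclidean(u, v))
```

Identical first and second target data yield a similarity of 1.0 (i.e., 100%), and the similarity decays monotonically as the vectors diverge.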
It can be understood that, in the embodiment of the present application, since the first target data and the second target data are extracted by the first sub-model, the accuracy of the discrimination of the name similarity discrimination model is also related to the feature extraction capability of the first sub-model to a certain extent, so that the parameters of the interest point information identification model can be fed back and updated through the accuracy with which the name similarity discrimination model executes the associated task of name similarity discrimination, that is, the accuracy of the name similarity discrimination result.
It should be noted that the above examples of the related task model are only used to illustrate the implementation manner of the related task model in the embodiments of the present application, and are not meant to limit the specific implementation thereof. In addition, in practical application, the number of the associated task models used for training the interest point information identification model can be one or more, and when a plurality of associated task models are used, each associated task model corresponds to one associated task.
In some embodiments, determining the training loss value based on the semantic component analysis result and the semantic component label, and the associated task processing result and the associated task label, comprises:
determining a first sub-loss value for carrying out semantic component analysis on the first interest point information according to the semantic component analysis result and the semantic component label;
determining a second sub-loss value for performing associated task processing on the first interest point information according to the associated task processing result and the associated task label;
and carrying out weighted summation on the first sub-loss value and the second sub-loss value to obtain a trained loss value.
In the embodiment of the present application, when determining the trained loss value, the loss value for performing semantic component analysis on the first interest point information may be determined according to the semantic component analysis result and the semantic component label through the corresponding loss function, and recorded as a first sub-loss value; meanwhile, according to the associated task processing result and the associated task label, a loss value for performing associated task processing on the first interest point information can be determined through a corresponding loss function and recorded as a second sub-loss value. The first sub-loss value and the second sub-loss value may then be weighted and summed to obtain a trained loss value. Here, the weighting weights corresponding to the first sub-loss value and the second sub-loss value may be flexibly set as needed, which is not limited in this application.
It should be noted that, as described above, in the embodiment of the present application, the number of associated task models used for training the point-of-interest information identification model may be multiple, and when multiple associated task models are used, the sum or weighted sum of the loss values corresponding to the associated task models may be used as the loss value of the associated task as a whole.
In some embodiments, weighted summing the first sub-loss value and the second sub-loss value to obtain a trained loss value comprises:
determining the product of the first sub-loss value and the first weight to obtain a first numerical value;
determining the product of the second sub-loss value and the second weight to obtain a second value;
obtaining a trained loss value according to the sum of the first value and the second value;
wherein the first weight is greater than the second weight.
In this embodiment of the present application, when weighting the first sub-loss value obtained by executing the task of identifying point of interest information and the second sub-loss value obtained by executing the associated task, the weights corresponding to the two sub-loss values may first be set: the weight corresponding to the first sub-loss value is marked as the first weight, and the weight corresponding to the second sub-loss value is marked as the second weight. Here, since the model to be trained in the embodiment of the present application is the interest point information identification model, in order to reduce the situation where the associated task exerts too large an influence and the accuracy of the trained interest point information identification model suffers as a result, the first weight may be set to be greater than the second weight. In this way, the training is biased toward the influence of the task of identifying interest point information, and the influence of the main task on training is raised as much as possible during multi-task learning. After the two weights are set, the product of the first sub-loss value and the first weight is determined to obtain a first value, and the product of the second sub-loss value and the second weight is determined to obtain a second value, so that the training loss value can be obtained from the sum of the first value and the second value.
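The weighted summation with the first weight greater than the second can be sketched as follows; the example weights 0.7 and 0.3 are assumptions for illustration, not values fixed by the application:

```python
def training_loss(first_sub_loss, second_sub_loss, w1=0.7, w2=0.3):
    """Weighted sum of the two sub-loss values; the first weight (main task)
    is set greater than the second weight (associated task)."""
    assert w1 > w2, "main-task weight should exceed the associated-task weight"
    return w1 * first_sub_loss + w2 * second_sub_loss
```

With these weights, a given error in the semantic component analysis contributes more to the training loss than the same error in the associated task, keeping the training focused on the main task.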
In some embodiments, the method further comprises:
detecting whether each piece of first interest point information lacks a corresponding semantic component label or an associated task label;
when the corresponding semantic component labels are missing from the first interest point information, skipping the step of inputting the first characteristic data into the second sub-model and carrying out semantic component analysis on the first interest point information; or,
and when the corresponding associated task label is missing from the first interest point information, skipping the step of inputting the first characteristic data into the associated task model and carrying out associated task processing on the first interest point information.
In the embodiment of the present application, since a multi-task learning manner is adopted in the training process, some tasks may lack corresponding labels in the acquired training data. For example, suppose the first interest point information includes "Tsinghua University East Gate", "Tsinghua University Affiliated Middle School" and "Fudan University Third Canteen", wherein the tags related to the three pieces of interest point information include the associated task tag of "Tsinghua University East Gate" and "Tsinghua University Affiliated Middle School" (such as a tag representing the similarity of the two names), and the semantic component tags of "Tsinghua University Affiliated Middle School" and "Fudan University Third Canteen". In the training process, on the one hand to improve the utilization rate of the labels in the obtained training data, and on the other hand in consideration of the fact that tasks lacking labels may be difficult to execute, a task decoupling strategy may be adopted for training. Specifically, whether the corresponding semantic component tag or associated task tag is missing can be detected; if the corresponding semantic component tag is missing, the loss value of semantic component analysis cannot be determined later, so the foregoing step 630 can be skipped for that training data; similarly, if the corresponding associated task tag is missing, the loss value of the associated task cannot be determined later, so the foregoing step 640 may not be performed for that training data.
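A sketch of the task-decoupling strategy, assuming each training sample is a dictionary in which a missing label is simply absent (the sample layout, weights, and function names are all illustrative):

```python
def batch_loss(samples, seg_loss_fn, assoc_loss_fn, w1=0.7, w2=0.3):
    """Task-decoupled loss: each sub-loss is computed only when the
    corresponding label is present; a sample missing a label simply
    skips that task, rather than being discarded outright."""
    total = 0.0
    for s in samples:
        if s.get("semantic_label") is not None:   # step 630 applies
            total += w1 * seg_loss_fn(s)
        if s.get("assoc_label") is not None:      # step 640 applies
            total += w2 * assoc_loss_fn(s)
    return total
```

A sample carrying only a semantic component tag contributes only the main-task term, and a sample carrying only an associated task tag contributes only the associated-task term.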
It should be noted that, since a plurality of associated task models may be used in training the point-of-interest information identification model, when one of the corresponding associated task labels is missing, only the step of inputting the first feature data into the associated task model whose label is missing needs to be skipped, and the associated tasks whose labels are present can be executed as usual. If a certain piece of first interest point information lacks both the corresponding semantic component labels and all the associated task labels, it can be removed from the training data.
The training method of the interest point information identification model provided in the present application will be described in detail below with reference to some more specific embodiments.
Referring to fig. 7, in an embodiment of the present application, a point-of-interest information identification model is provided, where the first sub-model portion of the model uses an ALBERT model, and the second sub-model portion uses a conditional random field. The interest point information is input into the interest point information identification model, and the ALBERT model processes it to obtain context-dependent dynamic character vectors, namely the first characteristic data in the embodiment of the present application. As described above, the character vectors may be converted into word vectors by splicing or weighting according to the result of segmenting the interest point information into words. In addition to the obtained word vectors, the interest point information recognition model in the embodiment of the present application also searches and matches the interest point information in a dictionary to determine the domain knowledge features corresponding to the interest point information. Here, the dictionary is used to store at least one word file, each of which stores a plurality of words under a corresponding semantic category. For example, the point of interest dictionary may store a main point file, a sub point file, a branch file, a business district file, a brand file, a role file, etc., and the domain knowledge features may be represented by one or more vectors.
For example, when a vector is used to represent the domain knowledge feature corresponding to the point of interest information, a word file may be assigned to each element position in the vector. When the point of interest information is searched and matched, if it hits a certain word file, the value of the element position corresponding to that word file is set to 1; otherwise, if it does not match that word file, the value is set to 0. In this way, the domain knowledge feature corresponding to the interest point information can be determined. Of course, in some embodiments, the value of the element position corresponding to a word file may also be determined according to which word in the word file is specifically hit, which is not limited in this application. In addition, since brackets exist in part of the interest point information, and information such as sub point or branch names is generally marked in the brackets, whether a word in the interest point information is inside brackets can also be used as one dimension of the domain knowledge feature.
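The binary domain-knowledge vector described above can be sketched as follows. This is a minimal illustration only: the word files, their contents, and the simple substring matching are assumptions, not the application's actual dictionary-matching procedure.

```python
def domain_knowledge_vector(poi_name, word_files):
    """Build a binary domain-knowledge feature: one element per word file,
    1 if any word in that file occurs in the POI name, else 0; a final
    element marks whether the name contains bracketed text (sketch)."""
    vec = [1 if any(word in poi_name for word in words) else 0
           for words in word_files.values()]
    # Extra dimension: bracketed text often marks sub points or branch names.
    vec.append(1 if ("(" in poi_name or "（" in poi_name) else 0)
    return vec
```

For instance, with a brand file and a branch file, "KFC (People's Square store)" would hit both files and the bracket dimension.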
After the domain knowledge features are obtained, feature fusion processing can be performed on the first feature data and the domain knowledge features corresponding to each word according to the word segmentation result; the specific fusion manner is similar to that of the foregoing embodiment and is not repeated herein. After fusion is completed, the obtained fusion feature data can be input into the conditional random field, which labels the fusion feature data corresponding to each character. Referring to fig. 7, in this embodiment of the present application a BMES labeling scheme may be adopted, in which the character at the beginning of each entity name in the point of interest information is labeled B, a character in the middle of an entity name is labeled M, the character at the end of an entity name is labeled E, a single-character entity name is labeled S, and a non-entity character is labeled with the preset tag O. Thus, the information between each adjacent tag "B" and tag "E" constitutes an entity. When identifying the semantic components of the interest point information through the model, in some embodiments the model may be set to label only the entities of designated semantic components, or the labels may be further subdivided: for example, the labels of characters belonging to the main point semantic component are given a main point marker, and the labels of characters belonging to the sub point semantic component are given a sub point marker, so that the semantic component of each character can be identified.
In this embodiment, when training the model shown in fig. 7, in some embodiments the ALBERT model may first be pre-trained on a corpus related to point of interest information, so as to obtain an ALBERT model suitable for processing tasks in the point of interest information field. Specifically, referring to fig. 8, fig. 8 shows a flowchart of pre-training the ALBERT model. Pre-training requires a large amount of corpus related to point of interest information, which in the embodiment of the present application can be obtained from two sources: on the one hand, it can be collected from a related point of interest information base; on the other hand, it can be collected based on the dictionary constructed above. Here, the obtained point of interest information is denoted as second point of interest information, and the language model may be pre-trained based on it. Taking the ALBERT model in fig. 8 as an example, the acquired second interest point information may first be subjected to multi-granularity masking: in some embodiments, random masking may be applied to the second interest point information; in other embodiments, the point of interest information may be segmented, with the segmentation granularity being the word dimension or the entity dimension, and then several words or pieces of entity information may be masked. Of course, in the embodiment of the present application, multiple masking strategies may also be combined, which can greatly improve the semantic modeling capability of the pre-trained ALBERT model and allow more accurate feature data to be extracted.
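The masking step can be sketched as below. This is a simplified illustration under assumed parameters (mask ratio, mask token): the application only specifies that masking may be applied at character, word, or entity granularity and that strategies may be combined, so the caller chooses how to tokenize the input before passing it in.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=None):
    """Randomly mask a fraction of tokens for MLM-style pre-training.
    `tokens` may be characters, segmented words, or entity spans, giving
    the multi-granularity behavior described above (parameters assumed)."""
    rng = random.Random(seed)
    n = max(1, int(len(tokens) * mask_ratio))        # mask at least one token
    masked_positions = set(rng.sample(range(len(tokens)), n))
    masked = [mask_token if i in masked_positions else t
              for i, t in enumerate(tokens)]
    # Labels: original token at masked positions, None elsewhere.
    labels = [t if i in masked_positions else None
              for i, t in enumerate(tokens)]
    return masked, labels
```

Calling it once on the character list and once on the word-segmented list of the same POI name would yield the combined multi-granularity corpus.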
After pre-training of the ALBERT model is completed, the whole interest point information identification model can be trained by the training method provided in the embodiment of the present application. Specifically, referring to fig. 9, in the embodiment of the present application, when training the model shown in fig. 7, two associated tasks, a name category prediction task and a name similarity discrimination task, are combined for multi-task learning. During training, the loss values of the sequence labeling task of the conditional random field (i.e. the point of interest information identification task), the name category prediction task and the name similarity discrimination task can be calculated respectively, and then weighted to obtain the final loss value. The specific loss function is as follows:
Loss = α·L_ner + β·L_class + γ·L_sim
where Loss represents the overall loss value; α represents the weight of the loss value corresponding to the point of interest information identification task, and L_ner represents that loss value; β represents the weight of the loss value corresponding to the name category prediction task, and L_class represents that loss value; γ represents the weight of the loss value corresponding to the name similarity discrimination task, and L_sim represents that loss value.
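The weighted sum is straightforward to express in code; the weight values below are placeholders, since the application fixes only the form of the loss, not the weights.

```python
def total_loss(l_ner, l_class, l_sim, alpha=1.0, beta=0.5, gamma=0.5):
    """Overall training loss: Loss = alpha*L_ner + beta*L_class + gamma*L_sim.
    The default weights are illustrative placeholders."""
    return alpha * l_ner + beta * l_class + gamma * l_sim
```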
In particular, in the embodiment of the present application, in order to solve the problem that labeled corpora aligned across all tasks are difficult to obtain in multi-task learning, a corpus decoupling strategy is further provided: a sample simply skips any task for which it carries no label. Accordingly, when calculating the loss function, if a certain sample lacks the label corresponding to the current task, the sample is removed from that task's loss term, so that the final loss function is expressed as the following formula:
Loss = α·L′_ner + β·L′_class + γ·L′_sim

where L′_ner represents the loss value corresponding to the point of interest information identification task after corpus decoupling; L′_class represents the loss value corresponding to the name category prediction task after corpus decoupling; and L′_sim represents the loss value corresponding to the name similarity discrimination task after corpus decoupling.
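Each decoupled term can be sketched as averaging a task's per-sample losses over only the samples that actually carry that task's label. This is an assumed formulation (mean over labeled samples); the application states only that unlabeled samples are removed from the loss.

```python
def decoupled_task_loss(sample_losses, has_label):
    """Average one task's loss over only the samples carrying that task's
    label; unlabeled samples contribute nothing (corpus-decoupling sketch,
    mean aggregation assumed)."""
    kept = [loss for loss, labeled in zip(sample_losses, has_label) if labeled]
    return sum(kept) / len(kept) if kept else 0.0
```

Computing L′_ner, L′_class and L′_sim this way and feeding them to the weighted sum above yields the decoupled overall loss.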
In order to verify, by comparison, the effectiveness of the training method of the interest point information identification model provided in the embodiment of the present application, a scheme that identifies interest point information with a BiLSTM model is used as the reference scheme, the effects of models trained under various model strategies are verified, and accuracy, recall and F1 value are used as evaluation indexes. The specific experimental results are as follows:
TABLE 1
In table 1, from top to bottom: the "ALBERT model" strategy means that an ALBERT model replaces the BiLSTM model of the reference scheme; the "ALBERT model + word vector" strategy adds, relative to the former, a means of converting the ALBERT model output into word vectors; the "ALBERT model + word vector + CRF" strategy further adds a conditional random field as the decoder for obtaining the sequence labeling data; the "POI-ALBERT model + word vector + CRF" strategy further adds pre-training of the ALBERT model in the point of interest information field; and the "POI-ALBERT model + word vector + CRF + domain knowledge feature" strategy further adds recognition combined with domain knowledge features. As can be seen from the data in table 1, in the scheme of the embodiment of the present application, extracting the context features of the interest point information with the model and combining them with the domain knowledge features obtained by dictionary matching as the feature data used in model prediction can remarkably improve recognition accuracy; in addition, combining the ALBERT model pre-trained in the point of interest information field with the conditional random field achieves a better recognition effect than the BiLSTM model.
TABLE 2
TABLE 3
Further, in the embodiment of the present application, the relevant index data of the models obtained after adding "multi-task learning", "LOSS weight tuning" and "corpus decoupling" are also compared. Fig. 10 shows a comparison of the F1 value indexes of the models trained after stacking the various model strategies. It can be seen from table 2 and table 3 that the training method of the interest point information identification model provided in the embodiment of the present application, which adopts the multi-task learning and corpus decoupling strategies and trains the model with the weighted loss values of the tasks, can greatly improve the recognition accuracy of the model. Based on fig. 10, it can be seen that the F1 value of the model obtained after adding "multi-task learning", "LOSS weight tuning" and "corpus decoupling" is greatly improved compared with the scheme that combines the BiLSTM and CRF models for prediction.
Referring to fig. 11, fig. 11 is a flowchart of a method for identifying point of interest information according to an embodiment of the present application. Similar to the training method of the point of interest information recognition model described above, the point of interest information recognition method may be performed by the terminal alone or in conjunction with the server, and the point of interest information recognition method includes, but is not limited to, the following steps 1110 to 1130.
Step 1110: acquiring third interest point information;
step 1120: inputting the third interest point information into the first sub-model for natural language processing to obtain second characteristic data corresponding to the third interest point information;
step 1130: inputting the second characteristic data into a second sub-model, and carrying out semantic component analysis on the third interest point information to obtain a semantic component identification result, wherein the semantic component identification result is used for representing the membership of the entity in the third interest point information;
the interest point information recognition model is trained by a training method of the interest point information recognition model shown in fig. 6.
In the foregoing embodiment, a training method of the interest point information identification model provided in the embodiment of the present application is described. After the training of the interest point information identification model is completed, the interest point information identification model can be put into the interest point information identification task to obtain a required interest point information identification result.
Specifically, in the embodiment of the present application, the point of interest information to be identified may be denoted as third point of interest information. After the third interest point information is obtained, it can be input into the first sub-model of the trained interest point information identification model for natural language processing, so as to obtain the feature data corresponding to the third interest point information, denoted as second feature data. Then, the second feature data is input into the second sub-model, which performs semantic component analysis on the third interest point information to obtain a semantic component identification result, and the semantic component identification result can represent the membership of the entities in the third interest point information. For the specific data form and its meaning, reference may be made to the semantic component labels described above, which are not repeated herein.
In some embodiments, according to the second feature data, performing semantic component analysis on the third point of interest information to obtain a semantic component identification result, including:
searching and matching the third interest point information in a preset dictionary to obtain third characteristic data corresponding to the third interest point information;
performing feature fusion processing on the second feature data and the third feature data to obtain fourth feature data;
and inputting the fourth characteristic data into the second sub-model, and carrying out semantic component analysis on the third interest point information to obtain a semantic component identification result.
In this embodiment of the present application, referring to the model structure shown in fig. 7, when identifying the third point of interest information, not only the second feature data extracted by the first sub-model may be used, but the domain knowledge feature corresponding to the third point of interest information may also be determined with the help of a dictionary, so as to obtain the third feature data. Then, feature fusion processing can be performed on the second feature data and the third feature data to obtain fourth feature data, and the second sub-model performs semantic component analysis on the third interest point information based on the fourth feature data to obtain the semantic component recognition result. In this way, the accuracy of the obtained identification result can be effectively improved. Specifically, the fusion here may take any form of weighting or splicing, which is not limited in this application.
Taking feature data in vector form as an example, assume that the language model adopted in the embodiment of the present application outputs character feature vectors, that is, the second feature data comprises character feature vectors. The third feature data, obtained by searching and matching the dictionary, is generally in the form of word feature vectors and is denoted here as first word feature vectors. For the second feature data, word segmentation processing can be performed on the third interest point information to obtain the words in it; then, according to the word segmentation result, feature fusion processing can be performed on the character feature vectors to obtain the word feature vectors corresponding to the words in the third interest point information, denoted as second word feature vectors. Finally, feature fusion processing can be performed on the first word feature vectors and the second word feature vectors, so as to obtain the fourth feature data.
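The character-to-word pooling and splicing-style fusion just described can be sketched as follows. Mean pooling and concatenation are assumptions here: the application allows any weighting or splicing scheme, and the segment/vector layout is illustrative.

```python
def fuse_char_to_word(char_vectors, segments, word_dict_vectors):
    """Pool character vectors into word vectors by averaging over each
    word's character span, then concatenate the dictionary-derived word
    feature (second word feature vector + first word feature vector).
    `segments` is a list of (start, end) char spans from word segmentation."""
    fused = []
    for (start, end), dict_vec in zip(segments, word_dict_vectors):
        span = char_vectors[start:end]
        # Mean-pool the character vectors of this word (one possible fusion).
        mean = [sum(col) / len(span) for col in zip(*span)]
        fused.append(mean + dict_vec)   # splice on the dictionary feature
    return fused
```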
It can be understood that the method for identifying the interest point information provided by the embodiment of the application can be applied to application scenes such as big data, intelligent traffic systems, intelligent vehicle-road cooperative systems and the like.
For example, referring to fig. 12, an interface diagram of a terminal device displaying point of interest information is shown in fig. 12. In the terminal page shown in fig. 12, when certain point of interest information is searched for, the terminal device may receive the input query information. For example, when the input information is a point of interest stored in the geographic information system as a main point, the terminal device may display point of interest information that includes the information of each branch under that main point. For example, in fig. 12, the input "KFC" is the main point of multiple pieces of interest point information, and the terminal device may correspondingly obtain the interest point information of the multiple branch shops that have an entity membership relation with this main point, and display them for selection, so as to facilitate guided navigation.
Referring to fig. 13 and 14, exemplary interface diagrams of map navigation software are shown in fig. 13 and 14. In fig. 13, when displaying point of interest information, the original map navigation software generally displays the whole content of the information. For example, for the Universal Resort in fig. 13, there are multiple points of interest under it, and each is displayed with "Universal Resort" as a prefix, such as "Universal Resort tourist bus parking zone" and so on. This is inconvenient to use, and incomplete display is liable to occur. Based on the method for identifying interest point information provided in the embodiment of the present application, the entity membership between pieces of interest point information can be identified, so that when the interest point information under the main point "Universal Resort" is searched, only the names of its sub points need to be displayed. Referring to fig. 14, the map navigation software may omit the main point information from the identifications of the other interest points belonging to the main point "Universal Resort", so that the interface display is simpler and more readable, and geographic information guidance can be provided more conveniently.
Referring to fig. 15, fig. 15 is a schematic structural diagram of a training device for a point of interest information identification model according to an embodiment of the present application, where the point of interest information identification model includes a first sub-model and a second sub-model; the training device of the interest point information identification model comprises:
The obtaining module 1510 is configured to obtain first point of interest information, a semantic component tag corresponding to a word in the first point of interest information, and an associated task tag of the first point of interest information; the semantic component labels are used for representing membership of entities in the first interest point information, and the associated task labels are used for representing task results of associated tasks of the first interest point information;
a first processing module 1520, configured to input the first point of interest information into the first sub-model for performing natural language processing, to obtain first feature data corresponding to the first point of interest information;
the analysis module 1530 is configured to input the first feature data into the second sub-model, perform semantic component analysis on the first interest point information, and obtain a semantic component analysis result corresponding to the word in the first interest point information;
the second processing module 1540 is configured to input the first feature data into the associated task model, perform associated task processing on the first interest point information, and obtain an associated task processing result corresponding to the first interest point information;
the computing module 1550 is used for determining a training loss value according to the semantic component analysis result and the semantic component label, and the associated task processing result and the associated task label;
And an updating module 1560, configured to update parameters of the first sub-model and the second sub-model according to the loss value.
Further, the associated task model includes a name category prediction model; the second processing module is specifically configured to:
and inputting the first characteristic data into a name category prediction model, and predicting the name category of the first interest point information to obtain a name category prediction result corresponding to the first interest point information.
Further, the associated task model comprises a name similarity judging model; the second processing module is specifically configured to:
randomly selecting and obtaining first target interest point information and second target interest point information from the first interest point information;
determining first target data obtained through processing according to the first target interest point information from the first characteristic data and second target data obtained through processing according to the second target interest point information;
and inputting the first target data and the second target data into a name similarity judging model, and judging the name similarity of the first target interest point information and the second target interest point information to obtain a name similarity judging result of the first target interest point information and the second target interest point information.
Further, the above calculation module is specifically configured to:
determining a first sub-loss value for carrying out semantic component analysis on the first interest point information according to the semantic component analysis result and the semantic component label;
determining a second sub-loss value for performing associated task processing on the first interest point information according to the associated task processing result and the associated task label;
and carrying out weighted summation on the first sub-loss value and the second sub-loss value to obtain the training loss value.
Further, the above calculation module is specifically configured to:
determining the product of the first sub-loss value and the first weight to obtain a first numerical value;
determining the product of the second sub-loss value and the second weight to obtain a second value;
obtaining the training loss value according to the sum of the first value and the second value;
wherein the first weight is greater than the second weight.
Further, the training device of the interest point information identification model further includes:
the detection module is used for detecting whether the corresponding semantic component labels or the associated task labels are missing in the first interest point information;
the jump module is used for skipping the step of inputting the first characteristic data into the second sub-model and carrying out semantic component analysis on the first interest point information when the corresponding semantic component label is missing from the first interest point information; or,
And skipping the step of inputting the first characteristic data into the associated task model and carrying out associated task processing on the first interest point information when the corresponding associated task label is absent from the first interest point information.
Further, the training device of the interest point information identification model further comprises a pre-training module, wherein the pre-training module is used for:
acquiring second interest point information of a batch;
performing mask processing on each second interest point information to obtain an interest point information training data set;
and pre-training the first sub-model according to the interest point information training data set and the second interest point information.
Further, the pre-training module is configured to perform at least one of:
carrying out random mask processing on the second interest point information; or,
masking a plurality of words in the second interest point information; or,
and performing mask processing on a plurality of entity information in the second interest point information.
It can be understood that the content of the training method embodiment of the point of interest information recognition model shown in fig. 6 is applicable to the training device embodiment of the point of interest information recognition model, and the functions of the training device embodiment of the point of interest information recognition model are the same as those of the training embodiment of the point of interest information recognition model shown in fig. 6, and the advantages achieved are the same as those achieved by the training embodiment of the point of interest information recognition model shown in fig. 6.
On the other hand, the embodiment of the application also provides an interest point information identification device, which is used for identifying the interest point information through an interest point information identification model, wherein the interest point information identification model comprises a first sub-model and a second sub-model; the device comprises:
the second acquisition module is used for acquiring third interest point information;
the third processing module is used for inputting third interest point information into the first submodel to perform natural language processing to obtain second characteristic data corresponding to the third interest point information;
the second analysis module is used for inputting second characteristic data into the second sub-model, and carrying out semantic component analysis on the third interest point information to obtain a semantic component identification result, wherein the semantic component identification result is used for representing the membership of the entity in the third interest point information;
the interest point information recognition model is obtained by training the training method of the interest point information recognition model.
Further, the second analysis module is specifically configured to:
searching and matching the third interest point information in a preset dictionary to obtain third characteristic data corresponding to the third interest point information;
performing feature fusion processing on the second feature data and the third feature data to obtain fourth feature data;
And inputting the fourth characteristic data into the second sub-model, and carrying out semantic component analysis on the third interest point information to obtain a semantic component identification result.
Further, the second analysis module is specifically configured to:
word segmentation processing is carried out on the third interest point information;
according to the word segmentation processing result, carrying out feature fusion processing on the character feature vector to obtain a second word feature vector corresponding to the word in the third interest point information;
and carrying out feature fusion processing on the first word feature vector and the second word feature vector to obtain fourth feature data.
It can be understood that the content in the embodiment of the method for identifying point of interest information shown in fig. 11 is applicable to the embodiment of the device for identifying point of interest information, and the functions implemented by the embodiment of the device for identifying point of interest information are the same as those of the embodiment of the method for identifying point of interest information shown in fig. 11, and the advantages achieved are the same as those achieved by the embodiment of the method for identifying point of interest information shown in fig. 11.
Referring to fig. 16, the embodiment of the application further discloses an electronic device, including:
at least one processor 1610;
at least one memory 1620 for storing at least one program;
The at least one program, when executed by the at least one processor 1610, causes the at least one processor 1610 to implement an embodiment of a training method of a point of interest information identification model as shown in fig. 6 or an embodiment of a point of interest information identification method as shown in fig. 11.
It can be understood that the training method embodiment of the interest point information identification model shown in fig. 6 and the content in the interest point information identification method embodiment shown in fig. 11 are both applicable to the present electronic device embodiment, the functions specifically implemented by the present electronic device embodiment are the same as those of the training method embodiment of the interest point information identification model shown in fig. 6 and the interest point information identification method embodiment shown in fig. 11, and the beneficial effects achieved are the same as those achieved by the training method embodiment of the interest point information identification model shown in fig. 6 and the interest point information identification method embodiment shown in fig. 11.
The embodiment of the application also discloses a computer-readable storage medium storing a program executable by a processor, where the program, when executed by the processor, implements the training method embodiment of the point-of-interest information identification model shown in fig. 6 or the point-of-interest information identification method embodiment shown in fig. 11.
It can be understood that the content of the training method embodiment of the point-of-interest information identification model shown in fig. 6 and of the point-of-interest information identification method embodiment shown in fig. 11 is applicable to this computer-readable storage medium embodiment; the functions implemented by the storage medium embodiment, and the beneficial effects achieved, are the same as those of the embodiments shown in fig. 6 and fig. 11.
The embodiments also disclose a computer program product or a computer program comprising computer instructions stored in the computer-readable storage medium described above; the processor of the electronic device shown in fig. 16 may read the computer instructions from the computer-readable storage medium and execute them, so that the electronic device performs the training method embodiment of the point-of-interest information identification model shown in fig. 6 and the point-of-interest information identification method embodiment shown in fig. 11.
It can be understood that the content of the training method embodiment of the point-of-interest information identification model shown in fig. 6 and of the point-of-interest information identification method embodiment shown in fig. 11 is applicable to this computer program product or computer program embodiment; the functions implemented by this embodiment, and the beneficial effects achieved, are the same as those of the embodiments shown in fig. 6 and fig. 11.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the present application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or one or more of the functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Thus, those of ordinary skill in the art will be able to implement the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined by the appended claims and their full scope of equivalents.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the foregoing description of the present specification, descriptions of the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like, are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present application have been described in detail, the present application is not limited to the embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.