CN116796181A - Method, device, computer equipment and storage medium for training a prediction model
- Publication number: CN116796181A
- Application number: CN202210250322.6A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The application provides a method, a device, computer equipment and a storage medium for training a prediction model, which can be applied to the field of artificial intelligence, intelligent traffic and the like, and which address the low prediction accuracy and reliability of trained prediction models. The method comprises at least the following steps: during each round of iterative training, nonlinear transformation is performed on a plurality of initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, yielding multiple generalization features, where each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content; the training loss of the prediction model to be trained is then determined based on the multiple generalization features, and the model parameters are adjusted. In this way the overfitting problem is avoided, and the prediction accuracy and reliability of the prediction model are improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a computer device, and a storage medium for training a prediction model.
Background
With the continuous development of technology, more and more devices can provide intelligent search services for target objects. Because the content that a device can present in response to a search operation is increasingly rich, when a target object's search returns many contents, the device can preferentially present, from among them, the target content that the target object is most likely to select.
When preferentially presenting the target content with the largest selection probability, the device generally uses a trained prediction model, obtained through multiple rounds of iterative training, to predict the probabilities that the target object selects different contents.
In the traditional method for obtaining a trained prediction model, because the quantity of sample data is limited, the prediction model to be trained usually has to be trained repeatedly and iteratively on the same group of sample data until its training loss reaches the training target, at which point the trained prediction model is obtained.
However, when the prediction model to be trained is trained repeatedly on the same set of sample data, it focuses only on learning the main features represented by that set of sample data, so the resulting trained prediction model suffers from overfitting. When such a model is used to predict the probability that a target object selects different contents, the probability may be misjudged, the content preferentially presented by the device is not the target content that the target object actually wants to select, and the prediction accuracy and reliability of the trained prediction model suffer.
Therefore, in the related art, the prediction accuracy and reliability of the trained prediction model are low.
Disclosure of Invention
The embodiment of the application provides a method, a device, computer equipment and a storage medium for training a prediction model, which are used for solving the problem that the prediction accuracy and reliability of the trained prediction model are low.
In a first aspect, a method of training a predictive model is provided, comprising:
obtaining respective sample data, wherein each sample data comprises: a sample object, sample recommended content, and a sample probability that the sample object selects the sample recommended content;
performing multiple rounds of iterative training on the prediction model to be trained based on the sample data, and outputting a trained target prediction model; wherein, in each round of iterative training, at least the following steps are executed:
extracting features from the sample object and the sample recommended content contained in one sample data, respectively, to obtain a plurality of initial features, and performing nonlinear transformation on the plurality of initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features; wherein each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content;
and determining the training loss of the prediction model to be trained based on the multiple generalization features, and performing model parameter adjustment.
In a second aspect, there is provided an apparatus for training a predictive model, comprising:
an acquisition module, configured to obtain respective sample data, wherein each sample data comprises: a sample object, sample recommended content, and a sample probability that the sample object selects the sample recommended content;
a processing module, configured to perform multiple rounds of iterative training on the prediction model to be trained based on each sample data, and to output a trained target prediction model; wherein, in each round of iterative training, at least the following steps are executed:
the processing module is further configured to: extract features from the sample object and the sample recommended content contained in one sample data, respectively, to obtain a plurality of initial features, and perform nonlinear transformation on the plurality of initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features; wherein each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content;
the processing module is further configured to: determine the training loss of the prediction model to be trained based on the multiple generalization features, and perform model parameter adjustment.
Optionally, the processing module is specifically configured to:
selecting each sparse feature matched with the sample object and each sparse feature matched with the sample recommended content from a pre-stored sparse feature dictionary;
and adjusting each selected sparse feature to a specified dimension, to obtain the plurality of initial features.
Optionally, the processing module is specifically configured to:
for each function combination, the following operations are performed separately:
performing nonlinear transformation on the plurality of initial features based on each nonlinear function contained in one function combination, to obtain a plurality of nonlinear transformation results;
and combining the nonlinear transformation results to obtain one generalization feature.
Optionally, each function combination is a sequence of a plurality of sub-combinations, each sub-combination comprising at least one nonlinear function; the processing module is specifically configured to:
for the multiple function combinations, the following operations are respectively executed:
according to the arrangement order of the plurality of sub-combinations contained in one function combination, performing nonlinear transformation on the plurality of initial features based on the at least one nonlinear function contained in the first sub-combination in the order, and outputting at least one corresponding intermediate feature;
and, following the arrangement order, performing nonlinear transformation in turn on the at least one intermediate feature output by the adjacent previous sub-combination, based on the at least one nonlinear function contained in each subsequent sub-combination, until the last sub-combination in the order has performed its nonlinear transformation, and taking the at least one intermediate feature it outputs as one generalization feature.
Optionally, the processing module is specifically configured to:
performing linear transformation on the initial features by adopting a plurality of linear functions to obtain a plurality of linear transformation features, wherein each linear function is a first-order linear function or a second-order linear function;
determining a training loss of the predictive model to be trained based on the plurality of linear transformation features and the plurality of generalization features;
and when the training loss does not meet the training target, carrying out model parameter adjustment on the prediction model to be trained, and entering the next round of iterative training.
Optionally, the processing module is specifically configured to:
for each of the multiple generalization features, predicting a training probability that the sample object selects the sample recommended content, based on the plurality of linear transformation features and that generalization feature;
and determining the training loss of the prediction model to be trained based on the errors between the predicted training probabilities and the sample probability contained in the sample data.
Optionally, the processing module is specifically configured to:
calculating, with a cross entropy function, the error between each training probability and the sample probability contained in the sample data, to obtain a plurality of cross entropy losses;
calculating the error between every two training probabilities in the training probabilities by adopting a similarity evaluation function to obtain at least one probability loss;
determining a training loss of the predictive model to be trained based on the plurality of cross entropy losses and the at least one probability loss.
Optionally, the processing module is specifically configured to:
calculating the error between every two generalization features in the plurality of generalization features by adopting a norm function to obtain at least one generalization loss;
the plurality of cross entropy losses, the at least one probability loss, and the at least one generalization loss are taken as training losses of the predictive model to be trained.
In a third aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
In a fourth aspect, there is provided a computer device comprising:
a memory for storing program instructions;
and a processor for calling program instructions stored in the memory and executing the method according to the first aspect according to the obtained program instructions.
In a fifth aspect, there is provided a computer readable storage medium storing computer executable instructions for causing a computer to perform the method of the first aspect.
In the embodiment of the application, pre-stored nonlinear functions are permuted and combined to obtain multiple function combinations; based on these function combinations, nonlinear transformation is performed on the plurality of initial features corresponding to one sample data, to obtain multiple generalization features, each of which characterizes one object type associated with the sample object and one content type associated with the sample recommended content.
On the one hand, in the process of carrying out one round of iterative training on one sample data, various generalization features contained in the sample data are extracted, and the prediction model to be trained is trained based on the various generalization features.
On the other hand, during one round of iterative training, the extracted generalization features can characterize multiple object types associated with the sample object and multiple content types associated with the sample recommended content, so that the prediction model to be trained learns the sample data fully. This avoids the overfitting caused by the trained prediction model latching onto particular feature patterns in the sample data during training, and improves the prediction accuracy and reliability of the trained prediction model.
Drawings
FIG. 1a is a schematic diagram of a method of training a prediction model in the related art;
FIG. 1b is an application scenario of the method for training a prediction model provided by an embodiment of the present application;
FIG. 2 is a flow chart of the method for training a prediction model provided by an embodiment of the present application;
FIG. 3a is schematic diagram I of the method for training a prediction model provided by an embodiment of the present application;
FIG. 3b is schematic diagram II of the method for training a prediction model provided by an embodiment of the present application;
FIG. 4a is schematic diagram III of the method for training a prediction model provided by an embodiment of the present application;
FIG. 4b is schematic diagram IV of the method for training a prediction model provided by an embodiment of the present application;
FIG. 5a is schematic diagram V of the method for training a prediction model provided by an embodiment of the present application;
FIG. 5b is schematic diagram VI of the method for training a prediction model provided by an embodiment of the present application;
FIG. 5c is schematic diagram VII of the method for training a prediction model provided by an embodiment of the present application;
FIG. 6 is schematic diagram VIII of the method for training a prediction model provided by an embodiment of the present application;
FIG. 7a is schematic diagram IX of the method for training a prediction model provided by an embodiment of the present application;
FIG. 7b is schematic diagram X of the method for training a prediction model provided by an embodiment of the present application;
FIG. 7c is schematic diagram XI of the method for training a prediction model provided by an embodiment of the present application;
FIG. 8 is structural schematic diagram I of the apparatus for training a prediction model provided by an embodiment of the present application;
FIG. 9 is structural schematic diagram II of the apparatus for training a prediction model provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) DeepFM model:
DeepFM builds on the classic Wide & Deep Learning paper: the Wide part of the original model, a logistic regression (LR), is replaced with a factorization machine (FM), which removes the original model's need for manual feature engineering and yields an end-to-end deep learning model. The DeepFM model is widely used in the recommendation and advertising systems of many Internet companies.
(2) Embedding (Embedding):
One task of deep learning is to map high-dimensional raw data (e.g., images, sentences) onto low-dimensional manifolds so that the data becomes separable after the mapping; this mapping is called an embedding.
(3) Random discard (Dropout) algorithm:
Dropout means that, during the training of a deep learning network, neural network units are temporarily discarded from the network with a certain probability, which is equivalent to finding a thinner sub-network inside the original network.
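By way of a non-authoritative illustration of the two terms above (this sketch is not part of the patent; the layer sizes are arbitrary assumptions):

```python
# Minimal illustration of the embedding and Dropout terms; sizes are arbitrary.
import torch
import torch.nn as nn

vocab_size, embed_dim = 6, 3                     # assumed sizes
embedding = nn.Embedding(vocab_size, embed_dim)  # maps sparse ids to dense vectors
dropout = nn.Dropout(p=0.5)                      # temporarily discards units with probability 0.5

ids = torch.tensor([2, 1, 5, 0])                 # four one-hot features, given by index
dense = embedding(ids)                           # (4, 3) dense embedding features
thinned = dropout(dense)                         # the "thinner network" view during training
```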
The embodiment of the application relates to the field of artificial intelligence (Artificial Intelligence, AI), which is designed based on a Computer Vision (CV) technology and a Machine Learning (ML) technology, and can be applied to the fields of cloud computing, intelligent traffic, auxiliary driving or maps and the like.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science, which studies the design principles and implementation methods of various machines in an attempt to understand the essence of intelligence, and to produce a new intelligent machine that can react in a similar way to human intelligence, so that the machine has the functions of sensing, reasoning and decision.
Artificial intelligence is a comprehensive discipline, and relates to a wide range of fields, including hardware-level technology and software-level technology. Basic technologies of artificial intelligence generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation interaction systems, electromechanical integration, and the like. The software technology of artificial intelligence mainly comprises computer vision technology, voice processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and other large directions. With the development and progress of artificial intelligence, the artificial intelligence is developed and applied in various fields, such as common fields of smart home, smart customer service, virtual assistant, smart sound box, smart marketing, smart wearable equipment, unmanned driving, automatic driving, unmanned plane, robot, smart medical treatment, internet of vehicles, automatic driving, smart transportation, etc., and it is believed that with the further development of future technology, the artificial intelligence will be applied in more fields, playing an increasingly important value. The scheme provided by the embodiment of the application relates to the technology of artificial intelligence deep learning, augmented reality and the like, and is specifically further described through the following embodiments.
The computer vision is a science for researching how to make a machine "see", and more specifically, a camera and a computer are used to replace human eyes to identify, track and measure targets, and the like, and further, graphic processing is performed, so that the computer is processed into images which are more suitable for human eyes to observe or transmit to an instrument to detect. As a scientific discipline, computer vision research-related theory and technology has attempted to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, synchronous positioning and mapping, autopilot, intelligent transportation, etc., as well as common biometric technologies such as face recognition, fingerprint recognition, etc.
Machine learning is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specially studies how computers simulate human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures, so that computers continuously improve their own performance.
Machine learning is the core of artificial intelligence, which is the fundamental way for computers to have intelligence, applied throughout various areas of artificial intelligence; the core of machine learning is deep learning, which is a technology for realizing machine learning. Machine learning typically includes deep learning, reinforcement learning, transfer learning, induction learning, artificial neural networks, teaching learning, etc., which includes convolutional neural networks (Convolutional Neural Networks, CNN), deep confidence networks, recurrent neural networks, automatic encoders, generation countermeasure networks, etc.
It should be noted that, in the embodiments of the present application, related data such as objects or contents are related, when the above embodiments of the present application are applied to specific products or technologies, user permission or consent is required to be obtained, and the collection, use and processing of related data is required to comply with related laws and regulations and standards of related countries and regions.
The application field of the method for training a prediction model provided by the embodiment of the application is briefly described below.
With the continuous development of technology, more and more devices can provide intelligent search services for target objects. Because the recommended content that a device can present in response to a search operation is increasingly rich, when a target object's search returns many recommended contents, the device can preferentially present, from among them, the target recommended content that the target object is most likely to select.
For example, in communication software installed on the device, the target object may obtain different articles or videos through different subscription numbers; please refer to fig. 1a. Subscription numbers are a self-media publishing platform with more than 300 million daily active objects, so the recommended content that the device can present is very rich.
By predicting the probability that the target object selects each article or video, the device can display the articles and videos in descending order of probability, so that the target object can find recommended content of interest more quickly.
When preferentially presenting the target recommended content with the largest selection probability, the device generally uses a trained prediction model obtained through multiple rounds of iterative training to predict the probabilities that the target object selects different recommended contents, for example a Click-Through-Rate (CTR) prediction model that predicts the probability that a user clicks a certain item.
In the traditional method for obtaining a trained prediction model, because the quantity of sample data is limited, the prediction model to be trained usually has to be trained repeatedly and iteratively on the same group of sample data until its training loss reaches the training target, at which point the trained prediction model is obtained.
However, when the prediction model to be trained is trained repeatedly on the same set of sample data, it focuses only on learning the main features represented by that set of sample data while ignoring the sparse long-tail data, so its ability to fit long-tail features is insufficient and the resulting trained prediction model suffers from overfitting.
Therefore, when the trained prediction model is used to predict the probability that the target object selects different recommended contents, the probability may be misjudged: for example, recommended content the target object is not interested in is predicted to have a large selection probability, while recommended content the target object is interested in is predicted to have a small selection probability. The recommended content preferentially presented by the device is then not the target recommended content that the target object wants to select, so the prediction accuracy and reliability suffer.
Therefore, in the related art, the prediction accuracy and reliability of the trained prediction model are low.
To solve the problems of low prediction accuracy and reliability of the trained prediction model, the application provides a method for training a prediction model. In this method, after each sample data is obtained, multiple rounds of iterative training are performed on the prediction model to be trained based on the sample data, and a trained target prediction model is output. Each sample data comprises: a sample object, sample recommended content, and a sample probability that the sample object selects the sample recommended content.
During each round of iterative training, at least the following steps are executed: features are extracted from the sample object and the sample recommended content contained in one sample data, respectively, to obtain a plurality of initial features, and nonlinear transformation is performed on the plurality of initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features, where each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content. The training loss of the prediction model to be trained is determined based on the multiple generalization features, and the model parameters are adjusted.
In the embodiment of the application, pre-stored nonlinear functions are permuted and combined to obtain multiple function combinations; based on these function combinations, nonlinear transformation is performed on the plurality of initial features corresponding to one sample data, to obtain multiple generalization features, each of which characterizes one object type associated with the sample object and one content type associated with the sample recommended content.
On the one hand, in the process of carrying out one round of iterative training on one sample data, various generalization features contained in the sample data are extracted, and the prediction model to be trained is trained based on the various generalization features.
On the other hand, during one round of iterative training, the extracted generalization features can characterize multiple object types associated with the sample object and multiple content types associated with the sample recommended content, so that the prediction model to be trained learns the sample data fully. This avoids the overfitting caused by the trained prediction model latching onto particular feature patterns in the sample data during training, and improves the prediction accuracy and reliability of the trained prediction model.
The application scenario of the method for training a predictive model provided by the application is described below.
Referring to fig. 1b, a schematic view of the application scenario of the method for training a prediction model according to the present application is shown. The application scenario comprises a client 101 and a server 102, which may communicate with each other. The communication may use wired communication technology, for example through a network cable or a serial cable, or wireless communication technology, for example Bluetooth or wireless fidelity (WIFI), without particular limitation.
Client 101 generally refers to a device that may provide sample data to server 102 or may use a trained predictive model, e.g., a terminal device, a third party application accessible to a terminal device, or a web page accessible to a terminal device, etc. Terminal devices include, but are not limited to, cell phones, computers, intelligent transportation devices, intelligent appliances, and the like. The server 102 generally refers to a device, such as a terminal device or a server, that can train a predictive model. Servers include, but are not limited to, cloud servers, local servers, or associated third party servers, and the like. Both the client 101 and the server 102 can adopt cloud computing to reduce occupation of local computing resources; cloud storage may also be employed to reduce the occupation of local storage resources.
As an embodiment, the client 101 and the server 102 may be the same device, which is not limited in particular. In the embodiment of the present application, the description will be given by taking the case that the client 101 and the server 102 are different devices respectively.
The following specifically describes the method for training a prediction model provided by the embodiment of the present application based on fig. 1b, taking the case where the server 102 acts as the server and the server is the execution subject. Referring to fig. 2, a flowchart of the method for training a prediction model according to an embodiment of the application is shown.
S201, obtaining each sample data.
When the server trains the prediction model to be trained, it may first acquire each sample data, where each sample data comprises: a sample object, sample recommended content, and a sample probability that the sample object selects the sample recommended content. The trained prediction model predicts, based on the target object, a candidate recommended content and other information, the probability that the target object selects that candidate recommended content, so that the server can sort the candidate recommended contents in descending order of predicted selection probability and present the target recommended contents among them to the target object in that order.
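As a rough sketch of how such predicted probabilities might drive the ranking, assuming a hypothetical `model(target_object, content)` call that returns a selection probability (the names are illustrative, not the patent's interface):

```python
# Hypothetical ranking step; model(target_object, content) is assumed to
# return the predicted probability that the object selects the content.
def rank_candidates(model, target_object, candidates):
    scored = [(model(target_object, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending probability
    return [content for _, content in scored]            # presentation order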
The sample object may include an object such as a virtual account number; the sample object may further include a time of selecting the recommended content of the sample, and may further include attribute information of the sample object, for example, sex, age, geographical location, occupation, account usage duration, etc., without limitation.
The sample recommended content may include content such as articles, videos, voice or chat records, etc., and the sample recommended content may further include attribute information of the content, for example, authors of the content, interaction amount of the content, main keywords included in comments of the content, richness of the content, such as length of the articles, number of pictures included in the articles, number of video scene switching, etc., without limitation.
The sample object selection sample recommended content may be used for viewing details of the sample recommended content, playing the sample recommended content, viewing context information of the sample recommended content, forwarding the sample recommended content, marking the sample recommended content, and the like, and is not particularly limited.
The server may obtain each sample data from the storage device, may download each sample data from the network resource, may generate each virtual sample data according to the data standard, and the like, and is not specifically limited.
S202, performing multi-round iterative training on the prediction model to be trained based on each sample data, and outputting a trained target prediction model.
After obtaining each sample data, the server may perform multiple rounds of iterative training on the prediction model to be trained based on the sample data, outputting a trained target prediction model. In each round of iterative training, the server can train the prediction model to be trained based on one sample data and adjust its model parameters. The server may output the trained target prediction model once training has been completed on all the sample data, or once it determines that the training loss of the prediction model to be trained satisfies the training target, or the like, without particular limitation.
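A minimal skeleton of this multi-round iterative training might look as follows; `training_step` stands in for S203 to S204 below, and `meets_target` and `adjust_parameters` are assumed placeholders rather than the patent's interfaces:

```python
# Skeleton of the multi-round iterative training; training_step covers S203-S204,
# and meets_target / adjust_parameters are assumed placeholder interfaces.
def train(model, samples, training_step, meets_target):
    while True:
        for sample in samples:                    # one sample data per round
            loss = training_step(model, sample)   # extract features, compute loss
            if meets_target(loss):                # training loss satisfies the target
                return model                      # output trained target prediction model
            model.adjust_parameters(loss)         # model parameter adjustment
```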
The following takes the process of training the prediction model to be trained on one sample data as an example; the process for other sample data is similar. For the training process, refer to S203 to S204, which are not repeated here.
S203, extracting features from the sample object and the sample recommended content contained in one sample data, respectively, to obtain a plurality of initial features, and performing nonlinear transformation on the plurality of initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features.
For one of the sample data, the server can use the prediction model to be trained to extract features from the sample object and the sample recommended content contained in that sample data, obtaining a plurality of initial features. Each initial feature characterizes an apparent, shallow feature contained in the sample object or the sample recommended content, and each individual initial feature characterizes only a part of them. Each initial feature may take the form of a feature vector or a feature matrix, and the feature dimensions of the initial features may be the same or different, without particular limitation.
After obtaining the initial features, the server continues with the prediction model to be trained: the pre-stored nonlinear functions are permuted and combined into multiple function combinations, and nonlinear transformation is performed on the plurality of initial features based on these function combinations, obtaining multiple generalization features. Each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content. Nonlinear transformation strengthens the representational capacity of the initial features, converging the parts of the sample object or the sample recommended content into a whole and forming the abstract, deep features they contain, which is why each generalization feature obtained after the transformation can characterize one object type and one content type.
Referring to fig. 3a, a schematic diagram of the method for training a prediction model provided by an embodiment of the application: for the sample object and sample recommended content contained in one sample data, the prediction model to be trained extracts features and obtains initial feature A, initial features B, …, and initial feature N. The model then performs nonlinear transformation on initial features A, B, …, N based on the various function combinations: function combination a yields generalization feature a; function combination b yields generalization feature b; and similarly, function combination n yields generalization feature n, giving multiple generalization features.
As an embodiment, the initial features may be embeddings, which the server can extract directly from the sample object and the sample recommended content using the prediction model to be trained. Alternatively, the server may select, from a pre-stored sparse feature dictionary, the sparse features matching the sample object and the sparse features matching the sample recommended content, where each sparse feature may be in one-hot form. In this way all the features characterizing the sample object and the sample recommended content can be obtained.
After the sparse features are obtained, the server continues with the prediction model to be trained and adjusts each sparse feature to a specified dimension, obtaining a plurality of initial features. This dimension adjustment can be implemented by a hidden layer in the prediction model to be trained.
The hidden layer used for dimension adjustment can comprise a plurality of hidden nodes. For each sparse feature, as many hidden nodes as the value of the specified dimension can be selected from the plurality of hidden nodes to adjust the dimension of that sparse feature: each selected hidden node performs a weighted sum over the elements contained in the sparse feature, based on its weight parameter to be trained, so that the sparse feature is adjusted to the specified dimension.
Referring to fig. 3b, suppose the specified dimension is 3 and the sparse features are [0,0,1,0,0,0], [0,1,0,0,0,0], [0,0,0,0,0,1] and [1,0,0,0,0,0], each of dimension 6. Taking the dimension adjustment of sparse feature [0,0,1,0,0,0] as an example, three hidden nodes are selected from the plurality of hidden nodes: hidden node A, hidden node B and hidden node C, whose weight parameters to be trained are 2, 5 and 3 respectively.
Hidden node A performs a weighted sum over the elements contained in the sparse feature [0,0,1,0,0,0] and obtains 2; hidden node B obtains 5; and hidden node C obtains 3. Through the hidden layer, the sparse feature [0,0,1,0,0,0] is thus adjusted to [2,5,3], reducing dimension 6 to 3. Because sparse features contain many zeros, dimension adjustment condenses them, and the model's subsequent feature processing can operate on features of the specified dimension, avoiding the training complexity that heterogeneous feature dimensions would cause.
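The fig. 3b example can be reproduced with a small linear layer; in this sketch the weight values are chosen by hand to match the [2,5,3] result above, whereas in the model they would be trained:

```python
# Sketch of the fig. 3b dimension adjustment: a hidden layer condenses a
# 6-dimensional one-hot sparse feature to the specified dimension 3.
import torch

specified_dim, sparse_dim = 3, 6
weights = torch.zeros(specified_dim, sparse_dim)  # one row per hidden node
weights[:, 2] = torch.tensor([2.0, 5.0, 3.0])     # hidden nodes A, B, C weight the third element

sparse = torch.tensor([0., 0., 1., 0., 0., 0.])   # one-hot sparse feature
initial = weights @ sparse                        # weighted sum at each hidden node
print(initial)                                    # tensor([2., 5., 3.])
```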
As an embodiment, when the pre-stored nonlinear functions are arranged and combined to obtain multiple function combinations, each two function combinations may include the same number of nonlinear functions or may include different numbers of nonlinear functions. For example, the first function combination includes a nonlinear function a and a nonlinear function B, the second function combination includes a nonlinear function a and a nonlinear function C, and the third function combination includes a nonlinear function a, a nonlinear function B, and a nonlinear function C.
Two function combinations containing the same number of nonlinear functions differ in at least one nonlinear function. For example, the first function combination contains nonlinear function A and nonlinear function B, while the second function combination contains nonlinear function A and nonlinear function C.
Based on the various function combinations, nonlinear transformation is performed on the plurality of initial features to obtain the corresponding generalization features. Taking one function combination as an example: each nonlinear function contained in the combination is applied to the plurality of initial features, producing a plurality of nonlinear transformation results, and these results are combined into one generalization feature.
Referring to fig. 4a, one function combination includes nonlinear function A, nonlinear function B, and nonlinear function C, and the plurality of initial features includes initial feature a, initial features b, …, and initial feature n. Nonlinear function A is applied to initial features a, b, …, n to obtain nonlinear transformation result A; nonlinear function B is applied to obtain nonlinear transformation result B; and nonlinear function C is applied to obtain nonlinear transformation result C. Nonlinear transformation results A, B, and C are then combined into one generalization feature.
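A sketch of this step, with torch's built-in activations standing in for the pre-stored nonlinear functions (the actual functions are not specified by the patent):

```python
# One function combination: apply each nonlinear function to all initial
# features, then combine the results into one generalization feature.
import torch

def apply_combination(functions, initial_features):
    stacked = torch.stack(initial_features)            # (n_features, dim)
    results = [f(stacked) for f in functions]          # one transformation result per function
    return torch.cat([r.flatten() for r in results])   # combined generalization feature

combination = [torch.relu, torch.tanh, torch.sigmoid]  # stand-ins for functions A, B, C
initial_features = [torch.randn(4) for _ in range(3)]  # stand-ins for features a, b, n
generalization = apply_combination(combination, initial_features)
```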
As an embodiment, each function combination may also be a sequence of a plurality of sub-combinations, each sub-combination comprising at least one nonlinear function. For example, one function combination includes a sub-combination a, a sub-combination B, and a sub-combination C, and the sub-combinations in the one function combination are arranged in the order of the sub-combination a, the sub-combination C, and the sub-combination B.
Taking one function combination as an example, the server may use the prediction model to be trained to perform nonlinear transformation on the plurality of initial features according to the arrangement order of the sub-combinations contained in that combination: based on the at least one nonlinear function contained in the first sub-combination in the order, at least one corresponding intermediate feature is output. Then, following the arrangement order, the nonlinear functions of each subsequent sub-combination in turn transform the at least one intermediate feature output by the adjacent previous sub-combination, until the last sub-combination in the order has performed its nonlinear transformation, and the at least one intermediate feature it outputs is taken as one generalization feature.
Referring to fig. 4b, one function combination includes sub-combination A and sub-combination B, arranged in the order sub-combination A, sub-combination B. Sub-combination A contains nonlinear functions a, b and c, and sub-combination B contains nonlinear functions d and e. The plurality of initial features includes initial feature a, initial features b, …, and initial feature n.
Following the arrangement order, nonlinear function a contained in sub-combination A is applied to initial features a, b, …, n to obtain intermediate feature a; nonlinear function b is applied to obtain intermediate feature b; and nonlinear function c is applied to obtain intermediate feature c.
Continuing in the arrangement order, nonlinear function d contained in sub-combination B is applied to intermediate features a, b and c to obtain intermediate feature d; and nonlinear function e is applied to intermediate features a, b and c to obtain intermediate feature e.
Since sub-combination B is the last sub-combination, intermediate features d and e are combined into one generalization feature.
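The same walkthrough in code form; the aggregation of features into each intermediate feature (here a mean) and the activation choices are illustrative assumptions:

```python
# Sequenced sub-combinations (fig. 4b): each sub-combination transforms the
# intermediate features output by the previous one; the mean-aggregation and
# activation choices are illustrative assumptions.
import torch

def apply_sequenced_combination(sub_combinations, initial_features):
    features = torch.stack(initial_features)     # start from the initial features
    for sub in sub_combinations:                 # follow the arrangement order
        # each nonlinear function yields one intermediate feature
        features = torch.stack([f(features).mean(dim=0) for f in sub])
    return features.flatten()                    # outputs of the last sub-combination

sub_a = [torch.relu, torch.tanh, torch.sigmoid]  # sub-combination A: functions a, b, c
sub_b = [torch.sigmoid, torch.tanh]              # sub-combination B: functions d, e
inputs = [torch.randn(4) for _ in range(3)]      # initial features a, b, n
generalization = apply_sequenced_combination([sub_a, sub_b], inputs)
```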
As an embodiment, the server may use the Dropout algorithm to permute and combine the nonlinear functions into multiple function combinations: Dropout's randomness yields several different selections from the pre-stored nonlinear functions, each forming one function combination. Fitting one sample data through multiple function combinations yields multiple generalization features for that sample data, achieving the purpose of training and learning from the one sample data several times, i.e., a data enhancement effect.
Taking Dropout producing two function combinations as an example: because of Dropout's randomness, the nonlinear functions contained in the two combinations differ; and because each combination is composed of nonlinear functions selected from the same pre-stored set, if the same nonlinear function appears in both combinations, the node parameters corresponding to it are identical in both, i.e., the parameters are shared.
After obtaining the plurality of initial features, the server may select from a plurality of nonlinear functions using Dropout, please refer to fig. 5a, wherein each circle represents a nonlinear function, and the prediction model includes a plurality of nonlinear functions.
With reference to fig. 5B, two corresponding function combinations, function combination a and function combination B, are obtained by two rounds of random selection. In the function combination a and the function combination B, the solid circles represent the non-linear functions that have been selected, and the dotted circles represent the non-linear functions that have not been selected.
After obtaining the function combination a and the function combination B, please refer to fig. 5c, the server performs nonlinear transformation on the plurality of initial features based on the function combination a and the function combination B, respectively, to obtain two corresponding generalization features, including a generalization feature a and a generalization feature B.
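A sketch of this random selection, under the assumption that each pre-stored nonlinear function is a small trainable module; sharing the module object between combinations is what makes the parameters shared:

```python
# Dropout-style selection of function combinations from a shared pool; a
# module picked by both combinations shares its parameters automatically.
import random
import torch.nn as nn

pool = nn.ModuleList(                          # pre-stored nonlinear functions
    nn.Sequential(nn.Linear(4, 4), nn.ReLU()) for _ in range(8)
)

def random_combination(keep_prob=0.5):
    chosen = [m for m in pool if random.random() < keep_prob]
    return chosen or [pool[0]]                 # keep at least one function

combo_a = random_combination()                 # function combination A
combo_b = random_combination()                 # function combination B
```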
S204, determining training loss of a prediction model to be trained based on various generalization features, and adjusting model parameters.
After obtaining the multiple generalization features, the server may determine the training loss of the prediction model to be trained based on them and perform model parameter adjustment. The server can adjust the model parameters when it determines that the training loss does not meet the training target, and output the prediction model to be trained as the trained target prediction model when the training loss meets the training target. Alternatively, the server can adjust the model parameters based on the training loss while untrained sample data remain among the sample data, and output the prediction model to be trained as the trained target prediction model once no untrained sample data remain, without particular limitation.
The server can predict, from each of the multiple generalization features, one training probability that the sample object selects the sample recommended content, obtaining a plurality of training probabilities. The server may then determine the training loss based on the error between the average of the training probabilities and the sample probability; based on the maximum error between any training probability and the sample probability; based on the minimum such error; or based on the distribution of the errors between the training probabilities and the sample probability, without particular limitation.
By predicting a plurality of training probabilities from the multiple generalization features, one sample data is in effect fully trained on several times within a single round of training, which simplifies the training process and improves training efficiency.
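In code, the listed aggregation options might look like this; the probability values below are placeholders:

```python
# Illustrative aggregations of per-generalization-feature errors into one loss;
# the probabilities below are placeholder values.
training_probs = [0.7, 0.6, 0.8]  # one training probability per generalization feature
sample_prob = 1.0                 # sample probability from the sample data

errors = [abs(p - sample_prob) for p in training_probs]
loss_from_average = abs(sum(training_probs) / len(training_probs) - sample_prob)
loss_from_max_error = max(errors)
loss_from_min_error = min(errors)
```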
As an embodiment, since the initial features characterize the apparent features of the sample data and the generalization features characterize its abstract features, the server may predict the training probabilities that the sample object selects the sample recommended content based on both the initial features and the generalization features.
The server may perform linear transformation on the plurality of initial features using a plurality of linear functions, obtaining a plurality of linear transformation features, where each linear function is a first-order or second-order linear function. Applying first-order linear functions to the initial features gives their first-order representations, i.e., first-order features; applying second-order linear functions gives their second-order representations, i.e., second-order features; the first-order and second-order features obtained in this way each serve as linear transformation features.
When the initial features are obtained based on the sparse features, the server can perform linear transformation on each sparse feature by adopting a first-order linear function to obtain a first-order feature, perform linear transformation on each initial feature by adopting a second-order linear function to obtain a second-order feature, and respectively using the obtained first-order feature and second-order feature as linear transformation features.
When the plurality of initial features are subjected to linear transformation, the server can select part of the initial features from all the initial features to perform linear transformation, or can directly perform linear transformation on all the initial features.
Which linear functions are selected for each linear transformation, and the parameters contained in each linear function, can both be learned over multiple rounds of iterative training, among other options, without particular limitation.
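A sketch of first-order and second-order linear transformation features in the spirit of an FM layer; the shapes and the exact pairwise interaction form are assumptions, since the patent does not pin them down:

```python
# First-order and second-order linear transformation features, FM-style;
# shapes and the pairwise interaction form are illustrative assumptions.
import torch

n_features, dim = 4, 3
initial = torch.randn(n_features, dim)                # initial features

w = torch.randn(n_features, dim, requires_grad=True)  # first-order weights (trainable)
first_order = (w * initial).sum()                     # first-order feature

second_order = sum(                                   # pairwise feature interactions
    (initial[i] * initial[j]).sum()
    for i in range(n_features) for j in range(i + 1, n_features)
)
```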
After obtaining the plurality of linear transformation features, the server may determine a training loss of the predictive model to be trained based on the plurality of linear transformation features and the plurality of generalization features. And when the training loss does not meet the training target, carrying out model parameter adjustment on the prediction model to be trained, and entering the next round of iterative training. When the training loss meets the training target, outputting the prediction model to be trained as a trained target prediction model.
As an embodiment, when determining the training loss of the prediction model to be trained based on the plurality of linear transformation features and the multiple generalization features, the server may, for each generalization feature, predict one training probability that the sample object selects the sample recommended content from the linear transformation features together with that generalization feature, thereby obtaining a plurality of training probabilities. The server may then determine the training loss based on the errors between the predicted training probabilities and the corresponding sample probability.
As an embodiment, when determining the training loss from these errors, the server may use a cross entropy function to calculate the error between each training probability and the corresponding sample probability, obtaining a plurality of cross entropy losses, and may take these cross entropy losses as the training loss of the prediction model to be trained.
Because a plurality of training probabilities are obtained from the multiple generalization features, and the predicted training probabilities should tend to agree regardless of which generalization feature they are based on, constraint conditions can be set during training to prevent the training probabilities from diverging so much that the determined training loss becomes inaccurate or cannot meet the training target. The constraint terms can serve, together with the cross entropy losses, as the training loss of the prediction model to be trained, so that the model learns the sample data more accurately.
There are a variety of constraints, two of which are described below as examples.
Constraint condition one:
the error between each two training probabilities of the plurality of training probabilities is constrained.
The server may calculate an error between each two training probabilities of the plurality of training probabilities using a similarity evaluation function to obtain at least one probability loss. The probability loss is used to measure the similarity between two training probabilities, the smaller the probability loss, the greater the similarity.
The server may determine the training loss of the prediction model to be trained based on the plurality of cross entropy losses and the at least one probability loss. By constraining the probability loss, the model extracts the generalization features of the sample data more accurately during training, predicts training probabilities based on those features more accurately, and gains generalization capability, so the trained target prediction model is more accurate and reliable.
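A sketch of constraint condition one, assuming the similarity evaluation function is a KL divergence between Bernoulli distributions (one common choice; the disclosure does not fix the function):

```python
import torch

def bernoulli_kl(p, q, eps=1e-8):
    """KL divergence between two Bernoulli distributions: one plausible
    similarity evaluation function (smaller loss = greater similarity)."""
    p, q = p.clamp(eps, 1 - eps), q.clamp(eps, 1 - eps)
    return p * torch.log(p / q) + (1 - p) * torch.log((1 - p) / (1 - q))

def probability_losses(training_probs):
    """One probability loss for every pair of training probabilities."""
    return [bernoulli_kl(training_probs[i], training_probs[j])
            for i in range(len(training_probs))
            for j in range(i + 1, len(training_probs))]
```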
Constraint condition two:
constrain the error between every two generalization features among the plurality of generalization features.
The server may compute the error between every two generalization features using a norm function, obtaining at least one generalization loss. The norm function may be an L1 norm, an L2 norm, or the like, and is not particularly limited. By constraining every two generalization features, the object types and content types characterized by each pair of generalization features are prevented from diverging too far, so the model extracts the generalization features of the sample data more accurately during training, improving the prediction accuracy and reliability of the prediction model.
The server may take the plurality of cross entropy losses together with the at least one generalization loss as the training loss of the prediction model to be trained. The server may also combine constraint condition one and take the plurality of cross entropy losses, the at least one probability loss, and the at least one generalization loss together as the training loss; this is not particularly limited.
Taking two generalization features, generalization feature A and generalization feature B, as an example, and referring to fig. 6: the server first computes the error between generalization feature A and generalization feature B using a norm function, obtaining the generalization loss. After predicting training probability A and training probability B from generalization feature A and generalization feature B respectively, the server computes the error between the two training probabilities using a similarity evaluation function, obtaining the probability loss. The server then computes, using a cross entropy function, the error between each training probability and the sample probability, obtaining cross entropy loss A and cross entropy loss B. The generalization loss, the probability loss, cross entropy loss A, and cross entropy loss B together serve as the training losses.
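Putting the pieces together, a sketch of the four losses of this worked example, assuming the L2 norm as the norm function and reusing the `bernoulli_kl` helper from the sketch after constraint condition one:

```python
import torch

def total_training_losses(feat_a, feat_b, prob_a, prob_b, sample_prob, eps=1e-8):
    """The four losses of the worked example: one generalization loss, one
    probability loss, and two cross entropy losses."""
    gen_loss = torch.norm(feat_a - feat_b, p=2)   # L2 norm between the two features
    prob_loss = bernoulli_kl(prob_a, prob_b)      # similarity of the two probabilities
    pa, pb = prob_a.clamp(eps, 1 - eps), prob_b.clamp(eps, 1 - eps)
    ce_a = -(sample_prob * torch.log(pa) + (1 - sample_prob) * torch.log(1 - pa))
    ce_b = -(sample_prob * torch.log(pb) + (1 - sample_prob) * torch.log(1 - pb))
    return gen_loss, prob_loss, ce_a, ce_b
```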
The following describes an example of a method for training a prediction model according to an embodiment of the present application.
Referring to fig. 7a, and taking the training process for one sample data as an example, the prediction model to be trained is based on the architecture of the DeepFM model.
The server uses the prediction model to be trained to perform feature extraction on the sample object and the sample recommended content contained in the sample data, obtaining a plurality of sparse features; in fig. 7a, each sparse feature is represented by a triangle, with different triangles representing different sparse features.
After obtaining the plurality of sparse features, the embedding layer (Embedding Layer) of the prediction model to be trained, which converts sparse features into dense embeddings, densifies each sparse feature and adjusts it to a specified dimension, obtaining a plurality of initial features.
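For illustration, an embedding lookup of this kind; the vocabulary size and embedding dimension below are hypothetical, since the disclosure only requires a "specified dimension":

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the patent only requires a "specified dimension".
vocab_size, emb_dim = 10_000, 16
embedding_layer = nn.Embedding(vocab_size, emb_dim)

# Each sparse feature is an index into the feature dictionary; the lookup
# densifies it and fixes its dimension.
sparse_feature_ids = torch.tensor([3, 42, 917])
initial_features = embedding_layer(sparse_feature_ids)   # shape (3, 16)
```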
After the plurality of initial features are obtained, a Dropout algorithm selects, by permutation and combination, two function combinations A and B from the nonlinear functions contained in the prediction model to be trained. In fig. 7a, solid circles are selected nonlinear functions and dashed circles are unselected ones. Each of the two function combinations comprises a plurality of sub-combinations, and each sub-combination comprises a plurality of nonlinear functions. The parameters of the same nonlinear function are shared between the two function combinations.
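One way to realize this, sketched below under stated assumptions: a single bank of parameterized nonlinear functions from which each combination is sampled by a Dropout-style mask, so that any function appearing in both combinations shares parameters automatically. The class, its layout, and the mean-pooling of outputs are all assumptions, not the disclosed structure.

```python
import torch
import torch.nn as nn

class SharedFunctionBank(nn.Module):
    """Hypothetical sketch: one bank of nonlinear functions whose parameters are
    shared by every function combination sampled from it."""
    def __init__(self, num_funcs, dim, keep_prob=0.8):
        super().__init__()
        self.funcs = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_funcs))
        self.keep_prob = keep_prob

    def sample_combination(self, x):
        # A Dropout-style mask decides which nonlinear functions join this
        # combination; unselected functions (dashed circles in fig. 7a) are skipped.
        mask = torch.rand(len(self.funcs)) < self.keep_prob
        outputs = [f(x) for f, keep in zip(self.funcs, mask) if keep]
        return torch.stack(outputs).mean(dim=0) if outputs else x

# Two combinations drawn from the same bank share parameters by construction:
# feat_a = bank.sample_combination(initial); feat_b = bank.sample_combination(initial)
```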
The two function combinations are used to perform nonlinear transformation on the plurality of initial features, obtaining two corresponding generalization features, generalization feature A and generalization feature B, represented by blocks in fig. 7a. Meanwhile, the Wide & FM linear memory module of the prediction model to be trained performs linear transformation on the plurality of initial features using a plurality of linear functions, obtaining a plurality of corresponding linear transformation features. Based on the obtained linear transformation features and each generalization feature, the prediction model to be trained predicts the probability that the sample object selects the sample recommended content, obtaining two training probabilities, training probability A and training probability B.
After the two generalization features are obtained, the generalization loss between them is computed using the L2 norm. After the two training probabilities are obtained, the similarity between them is computed using KL divergence, obtaining the probability loss, and the cross entropy between each training probability and the sample probability is computed using a cross entropy function, obtaining two cross entropy losses.
The server takes the obtained generalization loss, probability loss, and two cross entropy losses as the training losses and determines whether each satisfies its corresponding training target.
When the training loss does not meet the training target, the model parameters of the prediction model to be trained are adjusted and the next round of iterative training begins; when the training loss meets the training target, the prediction model to be trained is output as the trained target prediction model.
Compared with mainstream CTR estimation models in the related art, testing on an offline dataset from the subscription-number service shows a significant improvement in the model evaluation index AUC. Referring to Table 1, the prediction model provided by the embodiment of the present application improves the AUC index relative to the LR, FM, Wide & Deep, DeepFM, AutoInt, and xDeepFM models of the related art.
TABLE 1
| Model | AUC |
|---|---|
| LR model | 0.756 |
| FM model | 0.762 |
| Wide & Deep model | 0.766 |
| DeepFM model | 0.769 |
| AutoInt model | 0.768 |
| xDeepFM model | 0.771 |
| Predictive model of the application | 0.774 |
In the embodiment of the present application, no model structure or model parameters are added on top of the DeepFM model. Even though each sample data is trained multiple times, the training duration increases by only about 30 percent, far lower than the extra time cost of structurally more complex models such as the AutoInt model and the xDeepFM model.
As one example, after obtaining the trained target prediction model, the server may predict a probability that the target object selects the target content using the target prediction model.
The target object enters an application program through the client, and the application program can periodically push content published by each subscription number to the target object. For example, the alternative contents that can be pushed include: alternative content A, published by ID1, introducing the recently released movie A; alternative content B, published by ID2, introducing a specific way of making bread; alternative content C, published by ID3, introducing the ten highest-rated movies of the last five years; alternative content D, published by ID4, introducing ten television shows related to baking; and alternative content E, published by ID5, introducing recent, well-received movies and television shows featuring a target actor.
Referring to fig. 7b, a first target object and a second target object enter an application program through their respective clients, each of which holds relevant information about the corresponding target object: for example, the first target object is female, works as a pastry chef, and has a friend relationship with ID1; the second target object prefers watching movies, and its favorite star is the target actor.
When the first target object enters the application program through the client, the server uses the target prediction model to predict the probability that the first target object selects each alternative content: the server inputs the relevant information of the first target object and the relevant information of each alternative content into the target prediction model.
By extracting the features of the first target object and of each alternative content, the target prediction model outputs prediction probabilities of 0.6 for alternative content A, 0.8 for alternative content B, 0.2 for alternative content C, 0.7 for alternative content D, and 0.1 for alternative content E.
The server then selects, from the alternative contents, those whose prediction probability exceeds 0.5: alternative content A, alternative content B, and alternative content D. Sorting them by prediction probability in descending order yields the target sequence alternative content B, alternative content D, alternative content A. The server sends the target sequence to the client, and the client presents the contents to the first target object in that order.
Referring to fig. 7c, when the second target object enters the application program through the client, the server uses the target prediction model to predict the probability that the second target object selects each alternative content: the server inputs the relevant information of the second target object and the relevant information of each alternative content into the target prediction model.
By extracting the features of the second target object and of each alternative content, the target prediction model outputs prediction probabilities of 0.9 for alternative content A, 0.1 for alternative content B, 0.6 for alternative content C, 0.3 for alternative content D, and 0.8 for alternative content E.
The server then selects the 3 alternative contents with the highest prediction probability: alternative content A, alternative content C, and alternative content E. Sorting them by prediction probability in descending order yields the target sequence alternative content A, alternative content E, alternative content C. The server sends the target sequence to the client, and the client presents the contents to the second target object in that order.
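Both selection strategies from this example, the probability threshold for the first target object and the top-k cut for the second, can be sketched in a few lines; the function name and dictionary layout are assumptions for illustration.

```python
def rank_candidates(pred_probs, threshold=0.5, top_k=None):
    """Sketch of the two selection strategies above: keep candidates whose
    probability exceeds a threshold, or keep the top-k; then sort descending."""
    items = sorted(pred_probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        return [name for name, _ in items[:top_k]]
    return [name for name, p in items if p > threshold]

# First target object:  rank_candidates({"A": 0.6, "B": 0.8, "C": 0.2, "D": 0.7, "E": 0.1})
#                        -> ["B", "D", "A"]
# Second target object: rank_candidates({"A": 0.9, "B": 0.1, "C": 0.6, "D": 0.3, "E": 0.8},
#                                       top_k=3) -> ["A", "E", "C"]
```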
Based on the same inventive concept, the embodiment of the application provides a device for training a prediction model, which can realize the functions corresponding to the method for training the prediction model. Referring to fig. 8, the apparatus includes an acquisition module 801 and a processing module 802, where:
acquisition module 801: for obtaining respective sample data, wherein each sample data comprises: sample object and sample recommended content, and sample probability that the sample object selects the sample recommended content;
processing module 802: the method comprises the steps of carrying out multiple rounds of iterative training on a prediction model to be trained based on each sample data, and outputting a trained target prediction model; wherein, in each round of iterative training process, at least the following steps are executed:
the processing module 802 is further configured to: extracting features of sample objects and sample recommended contents contained in one sample data respectively to obtain a plurality of initial features, and carrying out nonlinear transformation on the plurality of initial features based on a plurality of function combinations obtained by carrying out permutation and combination on each prestored nonlinear function respectively to obtain a plurality of generalization features; wherein each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content;
The processing module 802 is further configured to: based on various generalization features, training loss of a prediction model to be trained is determined, and model parameter adjustment is performed.
In one possible embodiment, the processing module 802 is specifically configured to:
selecting each sparse feature matched with the sample object and each sparse feature matched with the sample recommended content from a pre-stored sparse feature dictionary respectively;
and respectively adjusting the selected sparse features into specified dimensions to obtain a plurality of initial features.
In one possible embodiment, the processing module 802 is specifically configured to:
for each function combination, the following operations are performed separately:
based on each nonlinear function contained in a function combination, respectively carrying out nonlinear transformation on a plurality of initial features to obtain a plurality of nonlinear transformation results;
combining the plurality of nonlinear transformation results to obtain a generalization feature.
In one possible embodiment, each function combination is a sequence of a plurality of sub-combinations, each sub-combination comprising at least one nonlinear function; the processing module 802 is specifically configured to:
for a plurality of function combinations, the following operations are performed respectively:
according to the arrangement sequence of a plurality of sub-combinations contained in one function combination, respectively carrying out nonlinear transformation on a plurality of initial features based on at least one nonlinear function contained in a first sub-combination in the arrangement sequence, and outputting at least one corresponding intermediate feature;
and, following the arrangement sequence, performing nonlinear transformation on the at least one intermediate feature output by the adjacent preceding sub-combination, based on the at least one nonlinear function contained in each subsequent sub-combination in turn, until the last sub-combination in the arrangement sequence has performed its nonlinear transformation, and taking the at least one intermediate feature it outputs as one generalization feature; a sketch follows.
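For illustration only, this ordered chaining of sub-combinations can be sketched as follows; the list-of-lists layout and the concrete nonlinear functions are assumptions.

```python
import torch

def apply_function_combination(sub_combinations, initial_features):
    """Sketch of one function combination as an ordered sequence of sub-combinations:
    the first transforms the initial features; each later one transforms the
    intermediate features output by the adjacent preceding sub-combination; the
    outputs of the last are taken as one generalization feature."""
    intermediate = [fn(f) for fn in sub_combinations[0] for f in initial_features]
    for sub in sub_combinations[1:]:
        intermediate = [fn(f) for fn in sub for f in intermediate]
    return intermediate

# e.g. apply_function_combination([[torch.relu], [torch.tanh, torch.sigmoid]],
#                                 [torch.randn(16), torch.randn(16)])
```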
In one possible embodiment, the processing module 802 is specifically configured to:
adopting a plurality of linear functions to respectively perform linear transformation on the plurality of initial characteristics to obtain a plurality of linear transformation characteristics, wherein each linear function is a first-order linear function or a second-order linear function;
determining a training loss of a predictive model to be trained based on the plurality of linear transformation features and the plurality of generalization features;
and when the training loss does not meet the training target, carrying out model parameter adjustment on the prediction model to be trained, and entering the next round of iterative training.
In one possible embodiment, the processing module 802 is specifically configured to:
for the plurality of generalization features, respectively performing the following operation: predicting a training probability that the sample object selects the sample recommended content based on the plurality of linear transformation features and one generalization feature;
determining the training loss based on the errors between the predicted plurality of training probabilities and the sample probability contained in the one sample data.
In one possible embodiment, the processing module 802 is specifically configured to:
calculating, using a cross entropy function, the error between each training probability and the sample probability contained in the sample data, obtaining a plurality of cross entropy losses;
calculating the error between every two training probabilities in the plurality of training probabilities by adopting a similarity evaluation function to obtain at least one probability loss;
based on the plurality of cross entropy losses and the at least one probability loss, a training loss of the predictive model to be trained is determined.
In one possible embodiment, the processing module 802 is specifically configured to:
calculating the error between every two generalization features in a plurality of generalization features by adopting a norm function to obtain at least one generalization loss;
the method comprises the step of taking a plurality of cross entropy losses, at least one probability loss and at least one generalization loss as training losses of a prediction model to be trained.
Referring to fig. 9, the apparatus for training a prediction model may run on a computer device 900, on which a current version and historical versions of a data storage program, and application software corresponding to the data storage program, may be installed; the computer device 900 includes a processor 980 and a memory 920. In some embodiments, the computer device 900 may include a display unit 940, which includes a display panel 941 for displaying an interface or the like for interactive operation by a user.
In one possible embodiment, the display panel 941 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD) or an Organic Light-Emitting Diode (OLED) or the like.
The processor 980 is configured to read a computer program and then perform the method defined by the computer program; for example, the processor 980 reads a data storage program or a file, etc., so that the data storage program is executed on the computer device 900 and a corresponding interface is displayed on the display unit 940. The processor 980 may include one or more general-purpose processors, and may also include one or more DSPs (Digital Signal Processors) for performing associated operations to implement the techniques according to embodiments of the present application.
Memory 920 generally includes internal memory and external storage; internal memory may be Random Access Memory (RAM), Read-Only Memory (ROM), cache memory (CACHE), and the like, while external storage can be a hard disk, an optical disk, a USB flash drive, a floppy disk, a tape drive, etc. The memory 920 is used to store computer programs, including the application programs corresponding to respective clients, and other data, which may include data generated after the operating system or application programs are executed, including system data (e.g., configuration parameters of the operating system) and user data. In an embodiment of the present application, program instructions are stored in memory 920, and processor 980 executes the program instructions in memory 920 to implement any of the methods discussed in the previous figures.
The above-described display unit 940 is used to receive input digital information, character information, or touch operations/contactless gestures, and to generate signal inputs related to user settings and function controls of the computer device 900, and the like. Specifically, in an embodiment of the present application, the display unit 940 may include a display panel 941. The display panel 941, such as a touch screen, may collect touch operations on or near it (e.g., operations of the user on or near the display panel 941 using any suitable object or accessory such as a finger or a stylus), and drive the corresponding connection device according to a predetermined program.
In one possible embodiment, the display panel 941 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 980, and can also receive commands from the processor 980 and execute them.
The display panel 941 may be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 940, in some embodiments, the computer device 900 may also include an input unit 930, and the input unit 930 may include an image input device 931 and other input devices 932, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
In addition to the above, computer device 900 may also include a power supply 990 for powering other modules, audio circuitry 960, near field communication module 970, and RF circuitry 910. The computer device 900 may also include one or more sensors 950, such as acceleration sensors, light sensors, pressure sensors, and the like. Audio circuitry 960 may include, among other things, a speaker 961 and a microphone 962, for example, where the computer device 900 may collect a user's voice via the microphone 962, perform a corresponding operation, etc.
The number of processors 980 may be one or more, and the processors 980 and memory 920 may be coupled or may be relatively independent.
As an example, processor 980 in fig. 9 may be used to implement the functionality of acquisition module 801 and processing module 802 as in fig. 8.
As an example, the processor 980 in fig. 9 may be used to implement the functions associated with the servers or terminal devices discussed above.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes: a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or in a part contributing to the prior art in the form of a software product, for example, by a computer program product stored in a storage medium, comprising several instructions for causing a computer device to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (12)
1. A method of training a predictive model, comprising:
Obtaining respective sample data, wherein each sample data comprises: sample object and sample recommended content, and sample probability that the sample object selects the sample recommended content;
based on the sample data, performing multiple rounds of iterative training on the prediction model to be trained, and outputting a trained target prediction model; wherein, in each round of iterative training process, at least the following steps are executed:
respectively extracting characteristics of a sample object and sample recommended content contained in one sample data to obtain a plurality of initial characteristics, and respectively carrying out nonlinear transformation on the plurality of initial characteristics based on a plurality of function combinations obtained by carrying out permutation and combination on each prestored nonlinear function to obtain a plurality of generalization characteristics; wherein each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content;
and determining the training loss of the prediction model to be trained based on the various generalization features, and performing model parameter adjustment.
2. The method according to claim 1, wherein the feature extraction is performed on the sample object and the sample recommended content included in the sample data, respectively, to obtain a plurality of initial features, including:
Selecting each sparse feature matched with the sample object and each sparse feature matched with the sample recommended content from a pre-stored sparse feature dictionary;
and respectively adjusting the selected sparse features into specified dimensions to obtain the initial features.
3. The method according to claim 1, wherein the nonlinear transformation is performed on the plurality of initial features based on a plurality of function combinations obtained by permutation and combination of respective nonlinear functions, respectively, to obtain a plurality of generalized features, including:
for each function combination, the following operations are performed separately:
based on each nonlinear function contained in a function combination, respectively carrying out nonlinear transformation on the plurality of initial features to obtain a plurality of nonlinear transformation results;
and combining the nonlinear transformation results to obtain a generalization feature.
4. A method according to claim 3, wherein each function combination is a sequence of a plurality of sub-combinations, each sub-combination comprising at least one nonlinear function;
the nonlinear transformation is performed on the plurality of initial features based on a plurality of function combinations obtained by permutation and combination of the pre-stored nonlinear functions, respectively, to obtain a plurality of generalization features, including:
For the multiple function combinations, the following operations are respectively executed:
according to the arrangement sequence of a plurality of sub-combinations contained in a function combination, respectively carrying out nonlinear transformation on the plurality of initial features based on at least one nonlinear function contained in a first sub-combination in the arrangement sequence, and outputting at least one corresponding intermediate feature;
and, following the arrangement sequence, performing nonlinear transformation on the at least one intermediate feature output by the adjacent preceding sub-combination, based on the at least one nonlinear function contained in each subsequent sub-combination in turn, until the last sub-combination in the arrangement sequence has performed its nonlinear transformation, and taking the at least one intermediate feature it outputs as one generalization feature.
5. The method according to any one of claims 1-4, wherein determining training loss of the predictive model to be trained and performing model parameter adjustment based on the plurality of generalization features comprises:
performing linear transformation on the initial features by adopting a plurality of linear functions to obtain a plurality of linear transformation features, wherein each linear function is a first-order linear function or a second-order linear function;
Determining a training loss of the predictive model to be trained based on the plurality of linear transformation features and the plurality of generalization features;
and when the training loss does not meet the training target, carrying out model parameter adjustment on the prediction model to be trained, and entering the next round of iterative training.
6. The method of claim 5, wherein determining a training loss of the predictive model to be trained based on the plurality of linear transformation features and the plurality of generalization features comprises:
for the plurality of generalization features, respectively performing the following operation: predicting a training probability that the sample object selects the sample recommended content based on the plurality of linear transformation features and one generalization feature;
determining the training loss of the prediction model to be trained based on errors between the predicted plurality of training probabilities and the sample probability contained in the one sample data.
7. The method of claim 6, wherein determining the training loss of the prediction model to be trained based on the errors between the predicted plurality of training probabilities and the sample probability contained in the one sample data comprises:
Respectively calculating errors between the training probabilities and sample probabilities contained in the sample data by adopting a cross entropy function to obtain a plurality of cross entropy losses;
calculating the error between every two training probabilities in the training probabilities by adopting a similarity evaluation function to obtain at least one probability loss;
determining a training loss of the predictive model to be trained based on the plurality of cross entropy losses and the at least one probability loss.
8. The method of claim 7, wherein determining a training loss of the predictive model to be trained based on the plurality of cross entropy losses and the at least one probability loss comprises:
calculating the error between every two generalization features in the plurality of generalization features by adopting a norm function to obtain at least one generalization loss;
the plurality of cross entropy losses, the at least one probability loss, and the at least one generalization loss are taken as training losses of the predictive model to be trained.
9. An apparatus for training a predictive model, comprising:
the acquisition module is used for: for obtaining respective sample data, wherein each sample data comprises: sample object and sample recommended content, and sample probability that the sample object selects the sample recommended content;
The processing module is used for: the method comprises the steps of carrying out multi-round iterative training on a prediction model to be trained based on each sample data, and outputting a trained target prediction model; wherein, in each round of iterative training process, at least the following steps are executed:
the processing module is further configured to: respectively extracting characteristics of a sample object and sample recommended content contained in one sample data to obtain a plurality of initial characteristics, and respectively carrying out nonlinear transformation on the plurality of initial characteristics based on a plurality of function combinations obtained by carrying out permutation and combination on each prestored nonlinear function to obtain a plurality of generalization characteristics; wherein each generalization feature characterizes one object type associated with the sample object and one content type associated with the sample recommended content;
the processing module is further configured to: and determining the training loss of the prediction model to be trained based on the various generalization features, and performing model parameter adjustment.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
11. A computer device, comprising:
A memory for storing program instructions;
a processor for invoking program instructions stored in the memory and for performing the method according to any of claims 1-8 in accordance with the obtained program instructions.
12. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 8.