
WO2023231753A1 - Neural network training method, data processing method, and device - Google Patents

Info

Publication number
WO2023231753A1
WO2023231753A1 PCT/CN2023/094166 CN2023094166W WO2023231753A1 WO 2023231753 A1 WO2023231753 A1 WO 2023231753A1 CN 2023094166 W CN2023094166 W CN 2023094166W WO 2023231753 A1 WO2023231753 A1 WO 2023231753A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
feature
loss function
training sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/094166
Other languages
French (fr)
Chinese (zh)
Inventor
周峰暐
董振华
孙睿
洪蓝青
黎嘉伟
李震国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of WO2023231753A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a neural network training method, data processing method and equipment.
  • Neural networks are increasingly used to help people make decisions.
  • For example, a neural network can be used to determine whether to recommend specific news; in an education recommendation system, a neural network can be used to determine whether to recommend a specific course; as a further example, a neural network can be used to determine whether to grant a loan to a specific applicant, and so on.
  • If the training data is biased, the prediction decision information output by the trained neural network during the inference phase will also be biased, and the bias may even be amplified.
  • For example, the collected training data contains more feedback data for courses provided by suppliers targeting the general public, and less feedback data for courses provided by niche suppliers.
  • A neural network trained on such data will take the course supplier into account when determining which courses to recommend; that is, the neural network learns the bias in the training data during the training process.
  • As a result, the probability that courses provided by suppliers targeting the general public are recommended increases greatly, even though the quality of those courses is not necessarily good; that is, the prediction decision information output by the neural network in the inference stage is also biased.
  • Embodiments of the present application provide a neural network training method, a data processing method, and a device that separately obtain the features of the information associated with the target attribute in the input data and the features of the information not associated with the target attribute in the input data. In this way, the user can determine the target attribute according to the cause of the bias in the prediction decision information in the current task, and extract both the features of the information associated with the bias-causing target attribute and the features of the information not associated with it. This helps reduce the difficulty of the subsequent decision-making process and helps improve the accuracy of the final prediction decision information.
  • Embodiments of the present application provide a neural network training method that can use artificial intelligence technology to make decisions.
  • The method includes: the training device inputs the first training sample into a feature acquisition network and performs feature extraction on the first training sample through the feature acquisition network to obtain the feature information of the first training sample; and, according to the feature information of the first training sample, generates through the feature acquisition network the first feature information and the second feature information corresponding to the first training sample. The first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample.
  • The second feature information corresponding to the first training sample includes features of the information in the first training sample that is not associated with any target attribute; that is, the training objective includes making the obtained first feature information and second feature information contain different information.
  • Technicians can determine the target attribute according to the factors in the current task that cause bias in the prediction decision information.
  • For example, the at least one target attribute used in the training phase of the neural network may include the supplier.
  • In that case, the information associated with the target attribute in the first training sample may include a watermark indicating the supplier of the course, an introduction to the supplier on the course cover, or other information.
  • As another example, if a trained neural network is used to determine whether a face in an image has curly hair, and the factors that lead to bias in the prediction decision information include the gender of the person in the image, then the at least one target attribute used in the training phase of the neural network may include the gender of the user.
  • The information associated with the target attribute in the face image (i.e., the first training sample) may include image information of the neck region, which can be used to determine whether the user has an Adam's apple, and so on.
  • The training device performs a classification operation according to the first feature information corresponding to the first training sample to obtain predicted category information.
  • The predicted category information indicates the predicted category of the first feature information corresponding to the first training sample; that is, it indicates the predicted category of the information associated with the target attribute in the first training sample, and the predicted category is one of multiple categories corresponding to the target attribute. The training device trains the feature acquisition network according to the first loss function to obtain the trained feature acquisition network. As an example, if a neural network is used to determine whether to recommend a certain course and the target attribute is the supplier, the multiple categories corresponding to the target attribute can include supplier A, supplier B, supplier C, supplier D, and so on; as another example, if a neural network is used to determine whether the person in the image has curly hair and the target attribute is gender, the multiple categories corresponding to the target attribute may include male and female.
  • The first loss function includes a first loss function term and a second loss function term.
  • The first loss function term indicates the similarity between the predicted category information and the expected category information.
  • The expected category information indicates the correct category of the information associated with the target attribute in the first training sample.
  • The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information. The second loss function term can directly compute the similarity between the first feature information and the second feature information, or it can be a similarity between other information.
  • The feature information of the first training sample is decomposed through the feature acquisition network to obtain the first feature information and the second feature information corresponding to the first training sample.
  • The first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample. A classification operation is performed according to the first feature information corresponding to the first training sample to obtain predicted category information, which indicates the predicted category of the first feature information corresponding to the first training sample.
  • The predicted category is one of multiple categories corresponding to the target attribute; the feature acquisition network is trained according to the first loss function to obtain the trained feature acquisition network.
  • The first loss function includes the first loss function term and the second loss function term. The first loss function term indicates the similarity between the predicted category information and the expected category information, and the expected category information indicates the correct category of the information associated with the target attribute in the first training sample.
  • The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • The trained feature acquisition network can therefore separately obtain the features of the information associated with the target attribute in the input data and the features of the information not associated with the target attribute in the input data, so that the user can determine the target attribute according to the cause of the bias in the prediction decision information in the current task.
  • Extracting the features of the information associated with the bias-causing target attribute and the features of the information not associated with the target attribute helps reduce the difficulty of the subsequent decision-making process and improve the quality of the decision-making process.
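  • The two terms of the first loss function can be sketched as follows. This is only an illustration under stated assumptions: cross-entropy stands in for the first loss function term, the squared cosine similarity stands in for the second loss function term, and the names `first_loss` and `sim_weight` are hypothetical. The text does not fix any of these choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Assumed form of the first loss term: low when the predicted
    category matches the expected category of the target attribute."""
    return -math.log(softmax(logits)[target_index])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_loss(attr_logits, expected_category, first_feature,
               second_feature, sim_weight=1.0):
    """Sum of the two terms; sim_weight is a hypothetical balancing factor."""
    term1 = cross_entropy(attr_logits, expected_category)
    # Assumed form of the second loss term: penalise any alignment
    # between the first and second feature information.
    term2 = cosine_similarity(first_feature, second_feature) ** 2
    return term1 + sim_weight * term2
```

  • Under these assumptions, minimising `first_loss` pushes the first feature information toward predicting the category of the target attribute while pushing the two feature vectors toward orthogonality.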
  • The method further includes: the training device combines the first feature information and the second feature information corresponding to the first training sample to obtain the first combined feature.
  • The "combination" operation can be any one or more of the following: splicing, addition, or another type of operation. The training device inputs the first combined feature into the first classification network to obtain the first prediction decision information, output by the first classification network, corresponding to the first training sample.
  • The training device training the feature acquisition network according to the first loss function includes: the training device trains the feature acquisition network and the first classification network according to the first loss function, where the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first prediction decision information and the expected decision information corresponding to the first training sample.
  • The purpose of training the feature acquisition network then includes not only accurately obtaining the features of the information associated with the target attribute from the feature information of the first training sample; the first feature information and the second feature information are also combined to obtain the first combined feature, and a third loss function term is introduced.
  • The purpose of training with the third loss function term includes improving the accuracy of the prediction decision information obtained from the first combined feature, which helps further improve the accuracy of the prediction decision information obtained in the inference stage.
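  • A minimal sketch of the "combination" operation and of a third loss function term follows. It assumes, purely for illustration, that the first classification network is a single logistic unit and that the third term is a binary cross-entropy; the function names are hypothetical and not taken from the application.

```python
import math


def combine(first_feature, second_feature, mode="concat"):
    """Combine the two feature vectors by splicing or by addition."""
    if mode == "concat":
        return list(first_feature) + list(second_feature)
    if mode == "add":
        if len(first_feature) != len(second_feature):
            raise ValueError("addition needs equal-length features")
        return [a + b for a, b in zip(first_feature, second_feature)]
    raise ValueError("unknown combination mode: " + mode)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(combined, weights, bias):
    """Hypothetical first classification network: one logistic unit
    returning the probability of a positive decision."""
    return sigmoid(sum(w * x for w, x in zip(weights, combined)) + bias)


def decision_loss(predicted, expected):
    """Assumed third loss term: binary cross-entropy between the
    predicted decision and the expected decision (0 or 1)."""
    eps = 1e-12
    p = min(max(predicted, eps), 1.0 - eps)
    return -(expected * math.log(p) + (1 - expected) * math.log(1.0 - p))
```

  • Splicing keeps the two kinds of features in separate positions of the combined feature, whereas addition requires both to have the same length.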
  • After the training device trains the feature acquisition network and the first classification network and obtains the trained feature acquisition network and the trained first classification network, the method further includes: the training device acquires third feature information and combines the second feature information corresponding to the second training sample with the third feature information to obtain the second combined feature. The third feature information and the first feature information corresponding to the second training sample have the same data size and different data content.
  • If the first feature information is expressed as an N-dimensional tensor, the third feature information is also expressed as an N-dimensional tensor, and the two have the same length in each of the N dimensions.
  • If the first feature information is expressed as a vector, the third feature information is also expressed as a vector of the same length; if the first feature information is expressed as a matrix, the third feature information is also expressed as a matrix of the same length and width.
  • That the data content of the third feature information differs from that of the first feature information corresponding to the second training sample means that the two are not exactly the same; it suffices that some of the data in them differ.
  • The training device inputs the second combined feature into the trained first classification network to obtain the second prediction decision information, output by the trained first classification network, corresponding to the second training sample; inputs the second combined feature into the second classification network to obtain the third prediction decision information, output by the second classification network, corresponding to the second training sample; and trains the second classification network according to the second loss function to obtain the trained second classification network.
  • The trained feature acquisition network and the trained second classification network belong to the same target neural network. The "second prediction decision information" and the "third prediction decision information" are both decision information, and what they indicate depends on the type of target task performed by the target neural network.
  • the second loss function includes a fourth loss function term and a fifth loss function term.
  • the fourth loss function term indicates the similarity between the second prediction decision information and the expected decision information corresponding to the second training sample.
  • The fifth loss function term indicates the similarity between the second prediction decision information and the third prediction decision information.
  • Compared with the combined feature obtained by combining the first feature information and the second feature information corresponding to the second training sample, the second combined feature differs in the features of the information associated with the target attribute in the second training sample.
  • The training objective nevertheless includes producing the original expected decision information of the second training sample; that is, training data that does not actually exist is added during the training phase, and the training objective includes obtaining the expected decision information regardless of the category of the target attribute.
  • This not only increases the diversity of the training data; it also helps reduce the dependence of the trained neural network on the information associated with the target attribute in the input data, so that the network pays more attention to the information relevant to the task performed. This helps improve the accuracy of the output prediction decision information and the fairness of the prediction decision information obtained for the groups corresponding to different categories of the target attribute.
  • The training device obtaining the third feature information includes: the training device generates the first feature information corresponding to the second training sample through the trained feature acquisition network, and computes a weighted sum of that first feature information and disturbance information to obtain the third feature information, where the weight of the disturbance information is adjustable.
  • The disturbance information may be randomly generated by the training device, may include the gradient corresponding to the first loss function term, or may be obtained in other ways, which are not exhaustively listed here.
  • The feature information of the second training sample is first decomposed, and the features of the information associated with the target attribute are then intervened on, yielding a training sample that is contrary to the actual situation of the second training sample. The larger the weight of the disturbance information, the lower the similarity between the third feature information and the first feature information, and the more this helps improve the fairness of the obtained prediction decision information; the accuracy and fairness of the obtained prediction decision information can therefore be balanced by adjusting the weight of the disturbance information.
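  • The weighted sum described above can be sketched as follows. The convex form `(1 - weight) * first + weight * disturbance`, the restriction of the weight to [0, 1], and the Gaussian disturbance are assumptions made for illustration; the text only states that the two are weighted and summed and that the weight of the disturbance information is adjustable.

```python
import random


def random_disturbance(length, seed=None):
    """Hypothetical randomly generated disturbance information."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def make_third_feature(first_feature, disturbance, weight):
    """Weighted sum of the first feature information and the disturbance.

    The result has the same size as first_feature, as the text requires.
    weight is the adjustable weight of the disturbance information;
    the convex combination used here is an assumption.
    """
    if len(first_feature) != len(disturbance):
        raise ValueError("disturbance must match the feature size")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    return [(1.0 - weight) * f + weight * d
            for f, d in zip(first_feature, disturbance)]
```

  • With `weight=0.0` the third feature information equals the first feature information; raising the weight moves it further away, which matches the accuracy and fairness trade-off described above.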
  • The method further includes: the training device performs a classification operation according to the second feature information corresponding to the first training sample to obtain fourth prediction decision information corresponding to the first training sample. The first loss function further includes a sixth loss function term, which indicates the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample, and the second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.
  • The gradient corresponding to the first loss function term serves to make the obtained first feature information reflect the features of the information associated with the target attribute more accurately.
  • the gradient corresponding to the sixth loss function term is to enable the obtained first feature information to more accurately reflect the characteristics of the information associated with the target attribute.
  • The second loss function term uses the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term, which can further improve the efficiency of updating the weight parameters of the feature acquisition network and helps improve the accuracy of the first feature information and the second feature information generated by the trained feature acquisition network.
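  • A gradient-based second loss function term can be sketched as follows. The text does not say with respect to which quantity the gradients are taken, so this sketch assumes they are taken with respect to a shared feature vector, that both classification heads are linear layers followed by softmax cross-entropy, and that cosine similarity is the similarity measure. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_feature(weights, feature, target_index):
    """Gradient of softmax cross-entropy w.r.t. the input feature for a
    linear head with one weight row per class: W^T (softmax(W h) - onehot)."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    residual = [p - (1.0 if k == target_index else 0.0)
                for k, p in enumerate(probs)]
    return [sum(residual[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(feature))]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gradient_similarity_term(attr_weights, decision_weights, feature,
                             attr_target, decision_target):
    """Similarity between the gradient of the attribute-classification loss
    (first term) and the gradient of the decision loss (sixth term)."""
    g_first = grad_wrt_feature(attr_weights, feature, attr_target)
    g_sixth = grad_wrt_feature(decision_weights, feature, decision_target)
    return cosine_similarity(g_first, g_sixth)
```

  • Under these assumptions, a value near zero means the two losses pull the shared feature in unrelated directions.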
  • The method is applied to any of the following scenarios: determining whether to recommend the object to which the first training sample points, determining whether the object in the first training sample is in the target state, or determining whether to approve the request of the applicant to which the first training sample points.
  • The various kinds of "decision information" in this solution can indicate any of the following: whether to recommend the object to which the first training sample points, whether the object in the first training sample is in the target state, or whether to approve the request of the applicant to which the first training sample points.
  • This implementation provides a variety of application scenarios for this method, which improves the implementation flexibility of this solution.
  • embodiments of this application provide a data processing method that can use artificial intelligence technology to make decisions.
  • the method includes: the execution device inputs the data to be processed into the feature acquisition network, performs feature extraction on the data to be processed through the feature acquisition network, and obtains the feature information of the data to be processed; and generates the first feature information through the feature acquisition network according to the feature information of the data to be processed. and second characteristic information, where the first characteristic information includes characteristics of information associated with the target attribute in the data to be processed.
  • the execution device combines the first feature information and the second feature information to obtain the first combined feature; inputs the first combined feature into the classification network to obtain prediction decision information output by the classification network.
  • The feature acquisition network is trained using a first loss function.
  • The first loss function includes a first loss function term and a second loss function term.
  • The first loss function term indicates the similarity between the predicted category information and the expected category information.
  • The predicted category information indicates the predicted category of the information associated with the target attribute in the data input to the feature acquisition network.
  • The expected category information indicates the correct category of the information associated with the target attribute in the data input to the feature acquisition network.
  • The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • The first loss function further includes a third loss function term, which indicates the similarity between the first prediction decision information and the expected decision information corresponding to the data input to the feature acquisition network.
  • The first prediction decision information indicates the decision corresponding to the data input to the feature acquisition network.
  • The execution device can also be used to execute the steps performed by the training device in the first aspect and each possible implementation of the first aspect.
  • For the specific implementation of the steps in each possible implementation of the second aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect; they are not repeated here.
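  • The inference flow of the data processing method (extract, decompose, combine, classify) can be sketched as follows. The network shapes are assumptions made for illustration: a single tanh layer for feature extraction, two linear projections that produce the first and second feature information, splicing as the combination, and a logistic unit as the classification network. The parameter dictionary keys are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def feature_acquisition(data, params):
    """Hypothetical feature acquisition network: extract feature
    information, then decompose it into first and second feature information."""
    feature = [math.tanh(v) for v in matvec(params["extract"], data)]
    first = matvec(params["first"], feature)    # target-attribute features
    second = matvec(params["second"], feature)  # remaining features
    return first, second


def predict_decision(data, params):
    """Combine both feature vectors by splicing and classify the result."""
    first, second = feature_acquisition(data, params)
    combined = first + second
    logit = sum(w * x for w, x in zip(params["classifier"], combined))
    return sigmoid(logit + params["bias"])
```

  • In a trained system the parameters would come from the training procedure of the first aspect; here they are plain lists supplied by the caller.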
  • Embodiments of the present application provide a neural network training device that can use artificial intelligence technology to make decisions.
  • The neural network training device includes: a feature extraction module, used to input the first training sample into the feature acquisition network and perform feature extraction on the first training sample through the feature acquisition network to obtain the feature information of the first training sample; and a generation module, used to generate, through the feature acquisition network and according to the feature information of the first training sample, the first feature information and the second feature information corresponding to the first training sample.
  • The first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample; a classification operation is performed according to the first feature information corresponding to the first training sample to obtain predicted category information.
  • The predicted category information indicates the predicted category of the first feature information corresponding to the first training sample.
  • The predicted category is one of multiple categories corresponding to the target attribute. A training module is used to train the feature acquisition network according to the first loss function to obtain the trained feature acquisition network. The first loss function includes a first loss function term and a second loss function term; the first loss function term indicates the similarity between the predicted category information and the expected category information, the expected category information indicates the correct category of the information associated with the target attribute in the first training sample, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • The neural network training device can also be used to perform the steps performed by the training device in the first aspect and each possible implementation of the first aspect; for the specific implementation of the steps in each possible implementation of the third aspect, refer to the first aspect.
  • Embodiments of the present application provide a data processing device that can use artificial intelligence technology to make decisions.
  • The data processing device includes: a feature extraction module, used to input the data to be processed into the feature acquisition network and perform feature extraction on the data to be processed through the feature acquisition network to obtain the feature information of the data to be processed; and a generation module, used to generate first feature information and second feature information through the feature acquisition network according to the feature information of the data to be processed.
  • The first feature information includes features of the information associated with the target attribute in the data to be processed.
  • A combination module is used to combine the first feature information and the second feature information to obtain the first combined feature.
  • A classification module is used to input the first combined feature into the classification network to obtain the prediction decision information output by the classification network. The feature acquisition network is trained using a first loss function; the first loss function includes a first loss function term and a second loss function term, and the first loss function term indicates the similarity between the predicted category information and the expected category information.
  • The predicted category information indicates the predicted category of the information associated with the target attribute in the data input to the feature acquisition network.
  • The expected category information indicates the correct category of the information associated with the target attribute in the data input to the feature acquisition network.
  • The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • The data processing device can also be used to execute the steps performed by the execution device in the second aspect and each possible implementation of the second aspect.
  • For the specific implementation of the steps in each possible implementation of the fourth aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect; they are not repeated here.
  • Embodiments of the present application provide a computer program product.
  • The computer program product includes a program.
  • When the program is run on a computer, it causes the computer to execute the neural network training method described in the first aspect, or causes the computer to execute the data processing method described in the second aspect.
  • Embodiments of the present application provide a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to execute the neural network training method described in the first aspect.
  • Embodiments of the present application provide a training device, including a processor and a memory.
  • The processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the training device executes the neural network training method described in the first aspect.
  • Embodiments of the present application provide an execution device, including a processor and a memory.
  • The processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the execution device executes the data processing method described in the second aspect.
  • The present application provides a chip system, which includes a processor and is used to support an execution device or a communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data for execution devices or communication devices.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1 is a schematic structural diagram of the artificial intelligence main framework provided by the embodiment of the present application.
  • Figure 2a is a system architecture diagram of the data processing system provided by the embodiment of the present application.
  • Figure 2b is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • Figure 4 is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram comparing the first feature information and the second feature information in the neural network training method provided by the embodiment of the present application.
  • Figure 6 is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • Figure 8 is a schematic flow chart of the data processing method provided by the embodiment of the present application.
  • Figure 9 is a schematic diagram of the beneficial effects of the neural network training method provided by the embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • Figure 13 is another structural schematic diagram of the training equipment provided by the embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 shows a structural schematic diagram of the artificial intelligence main framework.
  • the artificial intelligence main framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain” reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementation) to the systematic industrial ecological process.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), an embedded neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • the basic platform includes related platform guarantees and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of further data processing, such as an algorithm or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include: intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • the embodiments of the present application can be applied to various application fields in the field of artificial intelligence.
  • artificial intelligence technology can be used to solve decision-making problems in various application fields.
  • the prediction decision information output by the neural network can indicate whether to recommend specific news provided by a certain supplier; or whether to recommend a specific course provided by a certain supplier; or whether to recommend a specific movie provided by a certain film producer.
  • the prediction decision information output by the neural network can indicate whether the person in the image has curly hair; or whether the person in the image is smiling; or whether the person in the image is attractive.
  • the prediction decision information output by the neural network can indicate whether to approve a specific applicant's loan request, etc.
  • the application scenarios of the embodiments of this application are not exhaustive here.
  • Figure 2a is a system architecture diagram of the data processing system provided by an embodiment of the present application.
  • the data processing system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240 and a client device 250.
  • the execution device 230 includes a computing module 231.
  • the database 220 stores a training data set; the training device 210 generates the first model/rule 201 and uses the training data set to iteratively train the first model/rule 201 to obtain the trained first model/rule 201.
  • the first model/rule 201 may be embodied as a neural network, or may be embodied as a non-neural network model. In the embodiment of this application, only the first model/rule 201 expressed as a neural network is used as an example for explanation. Further, the first model/rule 201 may include a neural network for feature acquisition on input data.
  • Figure 2b is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • the training device 210 inputs the first training sample into the feature acquisition network (that is, an example of the first model/rule 201), and performs feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample.
  • the training device 210 decomposes the feature information of the first training sample through the feature acquisition network to obtain the first feature information and the second feature information corresponding to the first training sample.
  • the first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample.
  • the training device 210 performs a classification operation according to the first feature information corresponding to the first training sample to obtain predicted category information.
  • the predicted category information indicates the predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of the multiple categories corresponding to the target attribute.
  • the training device 210 trains the feature acquisition network according to the first loss function to obtain the trained feature acquisition network.
  • the first loss function includes a first loss function term and a second loss function term.
  • the first loss function term indicates the similarity between the predicted category information and the expected category information, where the expected category information indicates the correct category of the information associated with the target attribute in the first training sample.
  • the purpose of using the second loss function term for training includes reducing the similarity between the first feature information and the second feature information.
  • the trained first model/rules 201 obtained by the training device 210 will be deployed to the execution device 230.
  • the execution device 230 can call the data, code, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • the data storage system 240 may be placed in the execution device 230 , or the data storage system 240 may be an external memory relative to the execution device 230 .
  • the execution device 230 and the client device 250 may be independent devices.
  • the execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250.
  • the "user" can input the data to be processed through the client device 250.
  • the client device 250 sends the data to be processed to the execution device 230 through the I/O interface.
  • after the execution device 230 generates the prediction decision information corresponding to the data to be processed, it can return that prediction decision information to the client device 250 through the I/O interface, where it is provided to the user.
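  • As a rough illustration of the interaction described above, the following sketch models a client device sending data to be processed through an I/O interface and receiving prediction decision information in return. The class names `ExecutionDevice` and `ClientDevice`, the methods `handle_request` and `submit`, and the threshold rule that stands in for the trained model are all hypothetical and are not taken from this application.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ExecutionDevice:
    """Hypothetical stand-in for execution device 230."""
    model: Callable[[Sequence[float]], bool]  # trained first model/rule (assumed callable)

    def handle_request(self, data: Sequence[float]) -> bool:
        # I/O interface: receive the data to be processed and
        # return the prediction decision information.
        return self.model(data)


@dataclass
class ClientDevice:
    """Hypothetical stand-in for client device 250."""
    execution_device: ExecutionDevice

    def submit(self, data: Sequence[float]) -> bool:
        # Send the data to be processed and pass the returned decision to the user.
        return self.execution_device.handle_request(data)


def toy_model(data: Sequence[float]) -> bool:
    # Placeholder decision rule (an assumption): "recommend" when the mean exceeds 0.5.
    return sum(data) / len(data) > 0.5


client = ClientDevice(ExecutionDevice(model=toy_model))
decision = client.submit([0.9, 0.7, 0.4])
```

  • The same two classes would also cover the case where the execution device is configured inside the client device, since only the call boundary, not the physical placement, is modeled here.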
  • the execution device 230 may be configured in the client device 250.
  • the execution device 230 may be the host processor (Host) of the mobile phone or tablet.
  • the execution device 230 can also be a graphics processing unit (GPU) or a neural network processor (NPU) in a mobile phone or tablet.
  • the GPU or NPU is mounted on the main processor as a co-processor, and the main processor allocates tasks to it.
  • Figure 3 is a schematic flow chart of the neural network training method provided by the embodiment of the present application.
  • the neural network training method provided by the embodiment of the present application may include:
  • the training device inputs the first training sample into the feature acquisition network, performs feature extraction on the first training sample through the feature acquisition network, and obtains feature information of the first training sample.
  • a training data set is deployed on the training device. The training device can sample at least one first training sample from the training data set, input it into the feature acquisition network, and perform feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample.
  • the feature acquisition network may include a first neural network module for feature extraction.
  • the first neural network module may specifically adopt a convolutional neural network, a recurrent neural network, a residual neural network or another type of neural network; the choice can be made based on the data type of the first training sample.
  • the first neural network module can use a residual neural network (ResNet) such as ResNet-18 or ResNet-34, or another type of neural network; the options are not listed exhaustively here.
  • the training device generates first feature information and second feature information corresponding to the first training sample through the feature acquisition network based on the feature information of the first training sample.
  • the first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample.
  • the training device can generate, through the feature acquisition network and according to the feature information of the first training sample, at least one piece of first feature information and one piece of second feature information corresponding to the first training sample. Each piece of first feature information includes features of the information associated with one target attribute in the first training sample, and the second feature information includes features of the information that is not associated with any target attribute in the first training sample; that is, the training goal is for the obtained first feature information and second feature information to contain different information.
  • technicians can determine target attributes based on factors in the current task that cause bias in predictive decision information.
  • for example, at least one target attribute used in the training phase of the neural network may include the supplier; the information associated with the target attribute in the first training sample may then include a watermark indicating the supplier of the course, an introduction to the supplier on the course cover, or other information.
  • as another example, if the trained neural network is used to determine whether a face in an image has curly hair, and the factors that cause bias in the prediction decision information include the gender of the person in the image, then at least one target attribute used in the training phase of the neural network may include the gender of the user. The information associated with the target attribute in the face image (i.e., the first training sample) may include image information of the neck, which can be used to determine whether the user has an Adam's apple, etc.
  • it should be understood that the examples here are only for the convenience of understanding the two concepts of "target attribute" and "information associated with the target attribute in the training sample" and are not used to limit this solution.
  • the feature acquisition network may include at least one second neural network module, in one-to-one correspondence with the at least one target attribute, and a third neural network module.
  • after the training device generates the feature information of the first training sample, it can obtain one piece of first feature information corresponding to the first training sample from that feature information through each second neural network module, and obtain one piece of second feature information corresponding to the first training sample from that feature information through the third neural network module.
  • the feature acquisition network may include a complete second neural network module.
  • after the training device generates the feature information of the first training sample, it may input the feature information of the first training sample into the second neural network module and perform a decomposition operation through the second neural network module to obtain at least one piece of first feature information and one piece of second feature information corresponding to the first training sample, as output by the second neural network module.
  • feature acquisition network can also be embodied in other structural forms.
  • the description here is only used to prove the feasibility of this solution and is not used to limit this solution.
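  • The first structural form described above (one branch per target attribute plus one attribute-independent branch) can be sketched as follows. This is a minimal sketch, assuming each module is a single randomly initialized linear layer; the names `second_modules`, `third_module` and `decompose`, the dimensions, and the single hypothetical target attribute "supplier" are illustrative choices rather than details given in this application.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Apply a weight matrix (one row per output unit) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights; a real module would be trained, not left random."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
FEATURE_DIM, BRANCH_DIM = 8, 4

# One "second module" per target attribute (a single hypothetical attribute here),
# plus one "third module" for the information not associated with any target attribute.
second_modules: Dict[str, Matrix] = {"supplier": init_matrix(BRANCH_DIM, FEATURE_DIM, rng)}
third_module: Matrix = init_matrix(BRANCH_DIM, FEATURE_DIM, rng)


def decompose(feature_info: Vector) -> Tuple[Dict[str, Vector], Vector]:
    """Return (first feature information per target attribute, second feature information)."""
    first = {name: linear(w, feature_info) for name, w in second_modules.items()}
    second = linear(third_module, feature_info)
    return first, second


feature_info = [rng.random() for _ in range(FEATURE_DIM)]
first_feature_info, second_feature_info = decompose(feature_info)
```

  • Without training, the two branches carry no guarantee of holding different information; it is the second loss function term that pushes them apart.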
  • the training device performs a classification operation according to the first feature information corresponding to the first training sample to obtain first prediction category information.
  • the first prediction category information indicates the prediction category of the first feature information corresponding to the first training sample, and the prediction category is one of the multiple categories corresponding to the target attribute.
  • the training device can input the first feature information corresponding to the first training sample into the first classifier and perform the classification operation through the first classifier to obtain the first prediction category information generated by the first classifier. The first prediction category information indicates the prediction category of the first feature information corresponding to the first training sample, that is, the prediction category corresponding to the information associated with the target attribute in the first training sample, and the prediction category is one of the multiple categories corresponding to the target attribute.
  • for example, if the target attribute is the supplier, the multiple categories corresponding to the target attribute may include supplier A, supplier B, supplier C, supplier D, etc. As another example, if a neural network is used to determine whether the person in the image has curly hair and the target attribute is gender, the multiple categories corresponding to the target attribute may include male and female. As another example, if the first training sample includes the user's rating of a movie and multiple pieces of attribute information about the movie itself, the target attribute may include the producer of the movie, and the multiple categories corresponding to the target attribute may include film producer A, film producer B, film producer C, film producer D, etc., which are not listed exhaustively here.
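  • The classification operation over the categories of a target attribute can be illustrated with a small softmax classifier. This is a sketch under assumptions: the application does not specify the form of the first classifier, so a single linear layer followed by softmax is used here, and the weights, biases and input values are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

# Categories of the (hypothetical) target attribute "supplier".
CATEGORIES = ["supplier A", "supplier B", "supplier C", "supplier D"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first_feature_info: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> Tuple[List[float], str]:
    """Return the per-category probabilities and the predicted category."""
    logits = [sum(w * x for w, x in zip(row, first_feature_info)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, CATEGORIES[best]


# Illustrative parameters of the first classifier (one row per category).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]

probs, predicted = classify([0.2, 1.5], weights, biases)
```

  • Here `probs` plays the role of the first prediction category information, and `predicted` is the category it indicates.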
  • the training device performs a classification operation based on the second feature information corresponding to the first training sample, and obtains fourth prediction decision information corresponding to the first training sample.
  • the training device may also input second feature information corresponding to the first training sample into a second classifier, and generate fourth prediction decision information corresponding to the first training sample through the second classifier.
  • the trained feature acquisition network is used to perform the target task.
  • the "fourth prediction decision information" is the prediction decision information output by the second classifier.
  • the content indicated by the "fourth prediction decision information" depends on the target task.
  • the trained feature acquisition network can be applied to any of the following scenarios: determining whether to recommend the object indicated by the first training sample, determining whether the object in the first training sample is in the target state, or determining whether to approve the request of the applicant indicated by the first training sample.
  • accordingly, the "fourth prediction decision information" can be used to indicate any of the following: whether to recommend the object indicated by the first training sample, whether the object in the first training sample is in the target state, or whether to approve the request of the applicant indicated by the first training sample; the "fourth prediction decision information" may also be used to indicate other types of information.
  • the object pointed by the first training sample may be news, courses or other types of objects, etc.
  • the target state may be smiling, having curly hair, being attractive or another type of state; the applicant's request may be a loan request, a promotion request or another type of request.
  • the examples here are only for the convenience of understanding this solution and are not used to limit this solution.
  • the content of the information indicated by the “fourth prediction decision information” is determined based on the content of the target task, and is not exhaustive here. Multiple application scenarios of this method are provided, which improves the implementation flexibility of this solution.
  • the training device combines the first feature information and the second feature information corresponding to the first training sample to obtain the first combined feature.
  • the training device may combine the first feature information and the second feature information corresponding to the first training sample to obtain the first combined feature.
  • the aforementioned "combination" operation can be any one or more of the following: concatenation, addition or other types of operations, which are not listed exhaustively here.
  • the training device can perform the combination operation directly on the first feature information and the second feature information; alternatively, it can first preprocess the first feature information and/or the second feature information corresponding to the first training sample and then perform the combination operation.
  • the aforementioned preprocessing operations may include normalization processing, processing through activation functions, multiplication with preset weight values or other processing methods, etc., which are not limited here.
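  • The combination options listed above (concatenation or addition, optionally preceded by preprocessing) can be sketched as follows. The function names `combine` and `l2_normalize`, the choice of L2 normalization as the normalization step, and the parameter `first_weight` for the preset weight value are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine(first: Sequence[float],
            second: Sequence[float],
            mode: str = "concat",
            first_weight: float = 1.0,
            normalize: bool = False) -> List[float]:
    """Combine first and second feature information into the first combined feature."""
    if normalize:  # optional preprocessing: normalization
        first, second = l2_normalize(first), l2_normalize(second)
    # Optional preprocessing: multiply the first feature information by a preset weight.
    first = [first_weight * x for x in first]
    if mode == "concat":
        return list(first) + list(second)
    if mode == "add":
        if len(first) != len(second):
            raise ValueError("addition requires vectors of equal length")
        return [a + b for a, b in zip(first, second)]
    raise ValueError(f"unknown combination mode: {mode}")


concatenated = combine([1.0, 2.0], [3.0, 4.0])
added = combine([1.0, 2.0], [3.0, 4.0], mode="add")
```

  • Concatenation keeps the two feature vectors separable for the downstream classification network, while addition keeps the dimensionality unchanged; which is preferable depends on the task.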
  • the training device inputs the first combined features into the first classification network to obtain the first prediction decision information corresponding to the first training sample output by the first classification network.
  • the training device can input the first combined features into the first classification network to obtain the first prediction decision information corresponding to the first training sample output by the first classification network. The meaning of the "first prediction decision information" is similar to that of the "fourth prediction decision information"; the difference is that the "first prediction decision information" is generated by the first classification network, whereas the "fourth prediction decision information" is generated by the second classifier.
  • the training device trains the feature acquisition network according to the first loss function, where the first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the first prediction category information and the first expected category information, the first expected category information indicates the correct category of the information associated with the target attribute in the first training sample, and the purpose of using the second loss function term for training includes reducing the similarity between the first feature information and the second feature information.
  • the training device can iteratively train the feature acquisition network and the first classifier according to the first loss function until the convergence conditions are met, and a trained feature acquisition network is obtained.
  • the aforementioned convergence conditions may include the number of iterative training reaching a preset number, meeting the convergence conditions of the first loss function or other convergence conditions, etc., which are not exhaustive here.
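  • The two convergence conditions mentioned above (a preset number of iterations, or convergence of the loss) can be sketched as a generic training loop. This is a toy illustration, assuming plain gradient descent on a one-parameter quadratic loss; the function names `train` and `toy_loss_and_grad`, the learning rate and the tolerance are all hypothetical.

```python
from typing import Callable, Tuple


def train(loss_and_grad: Callable[[float], Tuple[float, float]],
          w: float,
          lr: float = 0.1,
          max_iters: int = 1000,
          tol: float = 1e-10) -> Tuple[float, int, str]:
    """Iterate until the loss stops changing or the iteration budget is used up."""
    prev_loss = float("inf")
    for step in range(1, max_iters + 1):
        loss, grad = loss_and_grad(w)
        if abs(prev_loss - loss) < tol:
            # Convergence condition of the loss function is met.
            return w, step, "loss_converged"
        w -= lr * grad  # one gradient-descent update of the weight parameter
        prev_loss = loss
    # The number of training iterations reached the preset number.
    return w, max_iters, "max_iterations"


def toy_loss_and_grad(w: float) -> Tuple[float, float]:
    # Stand-in for the first loss function: (w - 3)^2 with gradient 2(w - 3).
    return (w - 3.0) ** 2, 2.0 * (w - 3.0)


w_final, steps, reason = train(toy_loss_and_grad, w=0.0)
w_short, steps_short, reason_short = train(toy_loss_and_grad, w=0.0, max_iters=5)
```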
  • the first loss function may include a first loss function term and a second loss function term. The first loss function term indicates the similarity between the first predicted category information and the first expected category information, where the first expected category information indicates the correct category of the information associated with the target attribute in the first training sample; the purpose of using the first loss function term for training includes improving the similarity between the first predicted category information and the first expected category information.
  • the purpose of using the second loss function term for training includes reducing the similarity between the first feature information and the second feature information.
  • the first loss function term can adopt the cosine similarity, L1 similarity, L2 similarity or another type of similarity between the first predicted category information and the first expected category information. Alternatively, the first loss function term can be based on the Euclidean distance, cosine distance, Mahalanobis distance or another type of distance between the first predicted category information and the first expected category information; the greater the distance between the two, the smaller their similarity. It should be noted that the examples of the first loss function term given here are only for the convenience of understanding the first loss function term and are not used to limit this solution.
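  • Two of the measures named above, cosine similarity and Euclidean distance, can be compared on a small example. This is a sketch assuming the predicted category information is a probability vector and the expected category information is a one-hot vector; the application does not fix either representation, and the numeric values are made up.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


expected = [1.0, 0.0, 0.0]          # one-hot: the correct category is the first one
good_prediction = [0.7, 0.2, 0.1]   # mostly right
poor_prediction = [0.1, 0.2, 0.7]   # mostly wrong

good_sim = cosine_similarity(good_prediction, expected)
poor_sim = cosine_similarity(poor_prediction, expected)
good_dist = euclidean_distance(good_prediction, expected)
poor_dist = euclidean_distance(poor_prediction, expected)
```

  • The better prediction has the higher similarity and the smaller distance, which is the inverse relationship between distance and similarity stated above.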
  • steps 304 to 306 are all optional steps. If steps 304 to 306 are not executed, then after steps 301 to 303 and step 307 have been executed repeatedly, the trained feature acquisition network can be obtained and deployed to the execution device.
  • step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier, the second classifier and the first classification network according to the first loss function until the convergence condition is met, to obtain the trained feature acquisition network and the trained first classification network.
  • after generating the function value of the first loss function, the training device computes the gradient of that function value and updates, by backpropagation, the weight parameters of the feature acquisition network, the first classifier, the second classifier and the first classification network, thereby completing one training iteration of the feature acquisition network and the first classification network.
  • the first loss function may also include a third loss function term and a sixth loss function term. The third loss function term indicates the similarity between the first prediction decision information and the expected decision information corresponding to the first training sample; the sixth loss function term indicates the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample. For how to calculate "the similarity between the first prediction decision information and the expected decision information corresponding to the first training sample" and "the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample", refer to the calculation method of "the similarity between the first predicted category information and the first expected category information", which is not described in detail here.
  • the second loss function term can directly calculate the similarity between the first feature information and the second feature information, or it can use the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term. For how to calculate the similarity between the two gradients, refer to the calculation method of "the similarity between the first predicted category information and the first expected category information"; no further details are given here.
  • the calculation of the first loss function is described below, taking the cosine similarity between two gradients as the example of the similarity between two gradients.
  • in the formula: L_1 represents the first loss function; n represents the number of training samples (that is, one batch of training samples) that the training device obtains from the training data set to train the feature acquisition network; the second feature information is the feature of the information in the training sample that is not associated with the sensitive attribute; the first feature information is the feature of the information in the training sample that is associated with the sensitive attribute; g represents the first classification network; y_i represents the expected decision information corresponding to the i-th training sample; the third loss function term is the similarity between the first prediction decision information output by the first classification network and y_i; g_y represents the second classifier; the sixth loss function term is the similarity between the fourth prediction decision information and the expected decision information corresponding to the training sample; g_a represents the first classifier; a_i represents the correct category of the information associated with the target attribute in the training sample (i.e., the first expected category information); the first loss function term indicates the similarity between the first predicted category information and a_i; the second loss function term is the similarity between the gradient corresponding to the sixth loss function term and the gradient corresponding to the first loss function term, with cosine similarity used here as an example; and the three coefficients with subscripts 1, 2 and 3 respectively represent three weight values.
  • it should be understood that the example of a specific implementation of the first loss function given here is only for the convenience of understanding this solution and is not used to limit this solution.
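  • One way to write a loss that is consistent with the symbol definitions above is sketched here. It is an illustrative reconstruction under stated assumptions, not necessarily the exact expression of this application: the symbols f_i^y and f_i^a (second and first feature information of the i-th training sample), the generic per-sample loss ℓ, the parameters θ of the feature acquisition network with respect to which the gradients are taken, the names λ_1 to λ_3 for the three weight values, and the choice of which terms carry a weight are all assumptions introduced for this sketch.

```latex
L_1 = \frac{1}{n}\sum_{i=1}^{n}\Big[
      \underbrace{\ell\big(g(f_i^{y}, f_i^{a}),\, y_i\big)}_{\text{third term}}
    + \lambda_1\,\underbrace{\ell\big(g_y(f_i^{y}),\, y_i\big)}_{\text{sixth term}}
    + \lambda_2\,\underbrace{\ell\big(g_a(f_i^{a}),\, a_i\big)}_{\text{first term}}
    + \lambda_3\,\underbrace{\cos\!\Big(\nabla_{\theta}\,\ell\big(g_y(f_i^{y}), y_i\big),\;
                                        \nabla_{\theta}\,\ell\big(g_a(f_i^{a}), a_i\big)\Big)}_{\text{second term}}
\Big]
```

  • Under this reading, minimizing the last term drives the two gradients away from pointing in the same direction, which matches the stated purpose of reducing the similarity between the first feature information and the second feature information.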
  • step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier and the second classifier according to the first loss function until the convergence condition is met, to obtain the trained feature acquisition network.
  • the first loss function may include a first loss function term, a second loss function term and a sixth loss function term; further, the second loss function term may directly calculate the similarity between the first feature information and the second feature information. Alternatively, the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term can also be calculated.
  • the purpose of the gradient corresponding to the first loss function term is to make the obtained first feature information reflect more accurately the features of the information associated with the target attribute. The purpose of the gradient corresponding to the sixth loss function term is to eliminate, based on the obtained second feature information, the interference of the information associated with the target attribute in the first training sample, so as to generate more accurate prediction decision information. Having the second loss function term adopt the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term can further improve the efficiency of updating the weight parameters of the feature acquisition network, and helps improve the accuracy of the first feature information and the second feature information generated by the trained feature acquisition network.
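  • The gradient-based form of the second loss function term can be illustrated numerically. This is a minimal sketch, assuming two toy quadratic losses that share a two-element parameter vector and gradients estimated by central finite differences; `attribute_loss` and `decision_loss` merely stand in for the first and sixth loss function terms, and in practice an automatic differentiation framework would supply the gradients.

```python
import math
from typing import Callable, List, Sequence


def numerical_gradient(loss_fn: Callable[[Sequence[float]], float],
                       params: Sequence[float],
                       eps: float = 1e-6) -> List[float]:
    """Central finite-difference estimate of the gradient of loss_fn at params."""
    grad = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def attribute_loss(theta: Sequence[float]) -> float:
    # Toy stand-in for the first loss function term (attribute classification).
    return (theta[0] - 1.0) ** 2 + theta[1] ** 2


def decision_loss(theta: Sequence[float]) -> float:
    # Toy stand-in for the sixth loss function term (decision from second features).
    return theta[0] ** 2 + (theta[1] - 1.0) ** 2


theta = [0.5, 0.5]  # shared parameters of the (toy) feature acquisition network
grad_attribute = numerical_gradient(attribute_loss, theta)
grad_decision = numerical_gradient(decision_loss, theta)

# Second loss function term: cosine similarity between the two gradients.
second_term = cosine_similarity(grad_attribute, grad_decision)
```

  • At this particular point the two gradients point in opposite directions, so the term is at its minimum of about -1; a value near +1 would mean both losses pull the shared parameters the same way.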
  • step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier and the first classification network according to the first loss function until the convergence condition is met, to obtain the trained feature acquisition network and the trained first classification network.
  • the first loss function may include a first loss function term, a second loss function term and a third loss function term; the second loss function term in this implementation may calculate the similarity between the first feature information and the second feature information.
  • the purpose of training the feature acquisition network not only includes accurately obtaining the features of the information associated with the target attribute from the feature information of the first training sample, but also The first feature information and the second feature information are combined to obtain the first combined feature, and a third loss function term is introduced.
  • the purpose of using the third loss function term for training includes improving the prediction decision based on the first combined feature.
  • the accuracy of the information is conducive to further improving the accuracy of the predictive decision-making information obtained in the reasoning stage.
  • Figure 4 is a schematic flowchart of a neural network training method provided by an embodiment of the present application, and Figure 5 is a comparison diagram of the first feature information and the second feature information in the neural network training method provided by an embodiment of the present application. Refer to Figure 4 first.
  • Figure 4 takes as an example the case where the prediction decision information output by the neural network indicates whether the person in the image is smiling.
  • The training device obtains a batch of first training samples (four first training samples in Figure 4) and the expected decision information corresponding to each first training sample (in Figure 4, the first sample is not smiling and the last three are smiling).
  • The training device inputs the four first training samples into the feature extraction network in turn to obtain the feature information of each first training sample, and performs a decomposition operation on that feature information to obtain the first feature information and the second feature information corresponding to each first training sample. Splicing the first feature information and the second feature information corresponding to each first training sample yields the first combined feature corresponding to that sample.
  • The training device inputs the first combined feature corresponding to each first training sample into the first classification network to obtain the first prediction decision information corresponding to each first training sample.
  • The training device may also perform a classification operation based on the first feature information corresponding to each first training sample to obtain the first prediction category information corresponding to each first feature information. The training device then generates the function value of the first loss function from the first prediction category information and the first expected category information corresponding to each first feature information, the second feature information corresponding to each first training sample, and the first prediction decision information and the expected decision information corresponding to each first training sample. It performs gradient derivation on that function value and updates the weight parameters of the feature extraction network and the first classification network by backpropagation, repeating this to complete multiple rounds of training of the feature extraction network and the first classification network.
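  • The first-stage computation for one sample can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: decomposition by slicing a vector, binary cross-entropy for the first and third loss terms, absolute cosine similarity for the second loss term, and equal term weights are all assumed here.

```python
import math


def decompose(features, split):
    """Hypothetical decomposition: slice one feature vector into the first
    (attribute-related) and second (attribute-unrelated) feature information."""
    return list(features[:split]), list(features[split:])


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def binary_cross_entropy(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def first_loss(pred_category, expected_category, first_feat, second_feat,
               pred_decision, expected_decision):
    """Sum of three terms (equal weights are an assumption):
    term1: predicted vs. expected category of the first feature information
    term2: similarity between first and second feature information
    term3: first prediction decision vs. expected decision"""
    term1 = binary_cross_entropy(pred_category, expected_category)
    term2 = abs(cosine(first_feat, second_feat))
    term3 = binary_cross_entropy(pred_decision, expected_decision)
    return term1 + term2 + term3


# Splicing is modelled as list concatenation of the two parts.
first, second = decompose([1.0, 0.0, 0.0, 1.0], split=2)
first_combined_feature = first + second
```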
  • Figure 5 is an image obtained by visualizing the first feature information and the second feature information.
  • Figure 5 takes as an example the case where the prediction decision information output by the neural network indicates whether the person in the image has curly hair, and the target attribute is gender. As shown in Figure 5, the second feature information corresponding to the first training sample (that is, the features of the information in the first training sample that is not associated with the target attribute) carries more feature information of the hair area in the image, whereas the first feature information corresponding to the first training sample (that is, the features of the information in the first training sample that is associated with the target attribute) carries less feature information of the hair area and carries more information related to the image.
  • The training device generates, through the trained feature acquisition network, the second feature information corresponding to the second training sample.
  • After iteratively training the feature acquisition network according to the first loss function, the training device obtains the trained feature acquisition network. The training device can also obtain a second training sample from the training data set and, through the trained feature acquisition network (that is, the trained feature acquisition network obtained in steps 301 to 307), generate the second feature information corresponding to the second training sample; optionally, the training device can generate both the first feature information and the second feature information corresponding to the second training sample.
  • See step 302: the concepts of "first feature information and second feature information corresponding to the second training sample" and "first feature information and second feature information corresponding to the first training sample" are similar; the difference is that the first training sample and the second training sample are different training samples.
  • The training device can also perform a classification operation based on the second feature information corresponding to the second training sample to obtain the fifth prediction decision information corresponding to the second training sample; the concepts of "fifth prediction decision information corresponding to the second training sample" and "fourth prediction decision information corresponding to the first training sample" are similar.
  • The training device obtains the third feature information; the third feature information and the first feature information corresponding to the second training sample have the same data size and different data content.
  • Regarding data size: if the first feature information is expressed as an N-dimensional tensor, the third feature information is also expressed as an N-dimensional tensor, and the two have the same length in each of the N dimensions. For example, if the first feature information is expressed as a vector, the third feature information is also expressed as a vector of the same length; if the first feature information is expressed as a matrix, the third feature information is also expressed as a matrix with the same length and width. The possibilities are not listed exhaustively here.
  • Regarding data content: the data content of the third feature information and that of the first feature information corresponding to the second training sample are not exactly the same; it suffices that the two contain some differing data.
  • The training device can obtain the first feature information corresponding to the second training sample and perform a weighted summation of that first feature information and disturbance information to obtain the third feature information; the weight value of the disturbance information may be variable or fixed.
  • The disturbance information may be randomly generated by the training device, or may include the gradient corresponding to the first loss function term; it may also be obtained in other ways, which are not listed exhaustively here.
  • In the corresponding formula, the result is the third feature information; the weight of the disturbance information is sampled uniformly from an interval whose lower bound is 0; the first loss function term is computed between the predicted category information, output by the first classifier for the first feature information, and a_i, the expected category information corresponding to the first feature information (that is, the expected category information of the information associated with the target attribute in the second training sample); and the upper bound of the sampling interval can be constant or adjustable.
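  • The weighted summation described above can be sketched as follows. The function name, the unit weight on the first feature information, and the parameter name `max_weight` (the upper bound of the sampling interval) are assumptions made for illustration; the disturbance may be random noise or a gradient of the first loss function term, supplied by the caller.

```python
import random


def perturb_first_feature(first_feat, disturbance, max_weight, rng=random):
    """Return (third_feature, weight), where
    third_feature = first_feat + weight * disturbance and
    weight is drawn uniformly from [0, max_weight]."""
    if len(first_feat) != len(disturbance):
        raise ValueError("disturbance must have the same size as the feature")
    weight = rng.uniform(0.0, max_weight)
    third_feat = [f + weight * d for f, d in zip(first_feat, disturbance)]
    return third_feat, weight


# Usage with a seeded generator so the draw is reproducible.
rng = random.Random(0)
third, w = perturb_first_feature([1.0, 2.0], [1.0, -1.0], max_weight=0.5, rng=rng)
```

  • The result has the same size as the first feature information, and a larger `max_weight` allows the third feature information to move further from it.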
  • The training device may also obtain the first feature information corresponding to a third training sample and use it as the third feature information corresponding to the second training sample, where the correct category of the information associated with the target attribute in the third training sample differs from the correct category of the information associated with the target attribute in the second training sample.
  • The training device can also randomly obtain a piece of third feature information that has the same data size as the first feature information corresponding to the second training sample. The training device can also obtain the third feature information in other ways, which are not listed exhaustively here.
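  • Taking the third feature information from another sample can be sketched as follows. The function name and the list-based data layout are hypothetical; the sketch simply returns the first feature information of the first sample in the batch whose target-attribute category differs.

```python
def pick_third_feature(index, first_features, attribute_labels):
    """Return the first feature information of another sample whose
    target-attribute category differs from that of sample `index`,
    or None if the batch contains no such sample."""
    own_label = attribute_labels[index]
    own_feat = first_features[index]
    for feat, label in zip(first_features, attribute_labels):
        # Same data size is required; a different category is what makes
        # the borrowed feature a change of the target attribute.
        if label != own_label and len(feat) == len(own_feat):
            return feat
    return None
```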
  • The training device can also perform a classification operation based on the first feature information corresponding to the second training sample to obtain the second prediction category information, which indicates the predicted category of the first feature information corresponding to the second training sample. The concepts of "second prediction category information" and "first prediction category information" are similar; for how the second prediction category information is obtained, refer to the description of how the first prediction category information is obtained in the above steps, which is not repeated here.
  • The training device combines the second feature information and the third feature information corresponding to the second training sample to obtain the second combined feature.
  • The training device inputs the second combined feature into the trained first classification network and obtains the second prediction decision information, corresponding to the second training sample, output by the trained first classification network.
  • For the specific implementation of steps 310 and 311, refer to the description of steps 305 and 306 above, with "second feature information corresponding to the first training sample" in steps 305 and 306 replaced by "second feature information corresponding to the second training sample" in steps 310 and 311, and "first feature information corresponding to the first training sample" in steps 305 and 306 replaced by "third feature information" in steps 310 and 311. The meaning of "second prediction decision information" is similar to that of the "fourth prediction decision information" described above and is not repeated here.
  • The training device inputs the second combined feature into the second classification network and obtains the third prediction decision information, corresponding to the second training sample, output by the second classification network. The meaning of "third prediction decision information" is similar to that of the "fourth prediction decision information" described above and is not repeated here.
  • The training device trains the second classification network according to the second loss function, where the second loss function includes at least a fourth loss function term and a fifth loss function term. The fourth loss function term indicates the similarity between the second prediction decision information and the expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second prediction decision information and the third prediction decision information.
  • The training device can keep the weight parameters of the first classification network unchanged and iteratively train the second classification network according to the second loss function until the convergence condition is met, obtaining the trained second classification network. The trained feature acquisition network and the trained second classification network belong to the same target neural network, which is deployed on the execution device to perform the target task.
  • The target task may be any of the following: determining whether to recommend the object pointed to by the first training sample, determining whether the object in the first training sample is in a target state, determining whether to approve the request of the applicant pointed to by the first training sample, or another type of task.
  • The training device can also keep the weight parameters of the first classification network unchanged and iteratively train the feature acquisition network and the second classification network according to the second loss function until the convergence condition is met, obtaining the trained second classification network and the retrained feature acquisition network, both of which belong to the above target neural network.
  • The fourth loss function term may adopt the cosine similarity, L1 similarity, L2 similarity or another type of similarity between the second prediction decision information and the expected decision information corresponding to the second training sample; alternatively, it may be obtained from the Euclidean distance, cosine distance, Mahalanobis distance or another type of distance between the two. The options are not listed exhaustively here.
  • For the specific form of the fifth loss function term, refer to that of the fourth loss function term; details are not repeated here.
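  • A minimal sketch of a second loss function built from the two terms as worded above. The squared Euclidean distance, the batch average, and the parameter name `weight` are assumptions made for illustration; the text allows other similarity or distance measures.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def second_loss(second_preds, third_preds, expected, weight=1.0):
    """Batch average of (fourth term + weight * fifth term), where
    fourth term: second prediction decision vs. expected decision
    fifth term:  second prediction decision vs. third prediction decision"""
    n = len(expected)
    total = 0.0
    for p2, p3, y in zip(second_preds, third_preds, expected):
        fourth = squared_distance(p2, y)
        fifth = squared_distance(p2, p3)
        total += fourth + weight * fifth
    return total / n
```

  • The fifth term pulls the output of the second classification network toward the output of the frozen first classification network on the same perturbed input.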
  • The second loss function may also include a first loss function term and a second loss function term; for a description of these two terms, refer to the description in step 307. Note that in step 307 the first loss function term and the second loss function term are both calculated based on the first training sample, whereas in step 313 they are calculated based on the second training sample.
  • The second loss function may also include a seventh loss function term, which indicates the similarity between the fifth prediction decision information and the expected decision information corresponding to the second training sample; in that case, the second loss function term may adopt the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the seventh loss function term.
  • In the formula for the second loss function, L_2 denotes the second loss function; n indicates that the training device obtains n training samples (that is, one batch of training samples) from the training data set to train the feature acquisition network and the second classification network; the remaining symbols denote the fourth loss function term, the fifth loss function term, the third feature information, and the third prediction decision information, corresponding to the second training sample, output by the second classification network; y_i denotes the expected decision information corresponding to the second training sample; and a further symbol denotes the weight value.
  • Figures 6 and 7 are two schematic flowcharts of the neural network training method provided by embodiments of the present application. Figure 6 mainly shows the process of training, in the second training stage, the feature acquisition network and the second classification network included in the target neural network.
  • Figure 6 takes as an example the case where the prediction decision information output by the neural network indicates whether the person in the image is smiling.
  • The training device obtains a batch of second training samples (four second training samples in Figure 6) and the expected decision information corresponding to each second training sample (in Figure 6, the first sample is not smiling and the last three are smiling).
  • The training device inputs the four second training samples into the feature extraction network in turn to obtain the feature information of each second training sample, and performs a decomposition operation on that feature information to obtain the first feature information and the second feature information corresponding to each second training sample.
  • The training device combines the first feature information corresponding to each second training sample with the disturbance information to obtain the third feature information corresponding to each second training sample, and splices the third feature information and the second feature information corresponding to each second training sample to obtain the second combined feature corresponding to that sample.
  • The training device inputs the second combined feature corresponding to each second training sample separately into the trained first classification network and the second classification network, obtaining the second prediction decision information, corresponding to the second training sample, output by the trained first classification network and the third prediction decision information, corresponding to the second training sample, output by the second classification network.
  • The training device generates the function value of the second loss function from the second prediction decision information, the third prediction decision information, and the expected decision information corresponding to each second training sample. It performs gradient derivation on that function value and updates the weight parameters of the feature extraction network and the second classification network by backpropagation, repeating this to complete multiple rounds of training of the feature extraction network and the second classification network. For the specific meaning of the second loss function, refer to the above description. The example in Figure 6 is only intended to aid understanding of this solution and does not limit it.
  • Compared with the combined feature obtained by combining the first feature information and the second feature information corresponding to the second training sample, the second combined feature changes the features of the information associated with the target attribute in the second training sample, while the training goal still includes generating the original expected decision information of the second training sample. In other words, training data that does not actually exist is added in the training phase, and the training goal includes obtaining the expected decision information from information with different categories of the target attribute. This not only increases the diversity of the training data but also reduces the dependence of the trained neural network on the information associated with the target attribute in the input data, so that the network pays more attention to the information related to the task being performed. This helps improve the accuracy of the output prediction decision information and the fairness of the prediction decision information obtained for the groups pointed to by different categories of the target attribute.
  • The feature information of the second training sample is first decomposed, and the features of the information associated with the target attribute are then intervened on, yielding a training sample that is contrary to the actual situation of the second training sample. The larger the weight value of the disturbance information, the lower the similarity between the third feature information and the first feature information, which is more conducive to improving the fairness of the obtained prediction decision information; the accuracy and fairness of the obtained prediction decision information can therefore be balanced by adjusting the weight value of the disturbance information.
  • Figure 7 is a schematic flowchart of a neural network training method provided by an embodiment of the present application. The neural network training method provided by the embodiment of the present application can be divided into a first training stage and a second training stage.
  • In the first training stage, the training device inputs the first training sample into the feature extraction network, decomposes the extracted feature information to obtain the first feature information and the second feature information corresponding to the first training sample, and splices the first feature information and the second feature information corresponding to the first training sample to obtain the first combined feature.
  • The training device inputs the first combined feature into the first classification network to obtain the first prediction decision information, corresponding to the first training sample, output by the first classification network. Based on the information obtained in the preceding steps, the training device generates the function value of the first loss function and updates the weight parameters of the feature acquisition network and the first classification network.
  • The training device repeats the preceding steps to iteratively train the feature acquisition network and the first classification network, obtaining the trained feature acquisition network and the trained first classification network.
  • In the second training stage, the training device inputs the second training sample into the feature extraction network and decomposes the extracted feature information to obtain the first feature information and the second feature information corresponding to the second training sample. It performs a weighted summation of the first feature information corresponding to the second training sample and the disturbance information to obtain the third feature information, and splices the second feature information corresponding to the second training sample and the third feature information to obtain the second combined feature.
  • The training device inputs the second combined feature separately into the second classification network and the trained first classification network, obtaining the third prediction decision information, corresponding to the second training sample, output by the second classification network and the second prediction decision information, corresponding to the second training sample, output by the trained first classification network. Based on this information, the training device generates the function value of the second loss function, keeps the weight parameters of the first classification network unchanged, and updates the weight parameters of the feature acquisition network and the second classification network.
  • The training device repeats the preceding steps to iteratively train the feature acquisition network and the second classification network, obtaining the retrained feature acquisition network and the trained second classification network.
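  • Which weights each stage updates can be sketched as follows. The parameter-group names, the plain gradient-descent update, and the learning rate are assumptions made for illustration; the point is only that the first classification network stays fixed in the second stage.

```python
# Parameter groups updated in each stage (names are hypothetical).
STAGE_ONE = {"feature_net", "first_classifier"}
STAGE_TWO = {"feature_net", "second_classifier"}


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient-descent step to the groups named in `trainable`;
    every other group is copied unchanged (that is, frozen)."""
    updated = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


params = {"feature_net": [1.0], "first_classifier": [1.0], "second_classifier": [1.0]}
grads = {"feature_net": [1.0], "first_classifier": [1.0], "second_classifier": [1.0]}
after_stage_two = sgd_step(params, grads, STAGE_TWO)
```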
  • The trained feature acquisition network can separately obtain the features of the information associated with the target attribute in the input data and the features of the information not associated with the target attribute in the input data, so that the user can obtain the features of the information based on the current task.
  • Figure 8 is a schematic flowchart of the data processing method provided by the embodiment of the present application. The data processing method provided by the embodiment of the present application may include:
  • The execution device inputs the data to be processed into the feature acquisition network and performs feature extraction on the data to be processed through the feature acquisition network to obtain the feature information of the data to be processed.
  • The execution device generates the first feature information and the second feature information through the feature acquisition network based on the feature information of the data to be processed. The first feature information includes the features of the information associated with the target attribute in the data to be processed.
  • For the specific implementation of steps 801 and 802, refer to the description of steps 301 and 302 in the embodiment corresponding to Figure 3 above; the difference is that "first training sample" in steps 301 and 302 is replaced by "data to be processed" in steps 801 and 802. For the meaning of each term in steps 801 and 802, refer to the description in the embodiment corresponding to Figure 3 above, which is not repeated here.
  • The execution device combines the first feature information and the second feature information to obtain the first combined feature.
  • The execution device inputs the first combined feature into the classification network to obtain the prediction decision information output by the classification network.
  • The feature acquisition network is trained using a first loss function that includes at least a first loss function term and a second loss function term. The first loss function term indicates the similarity between predicted category information and expected category information; the predicted category information indicates the predicted category of the information associated with the target attribute in the data input to the feature acquisition network, and the expected category information indicates the correct category of that information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • For the specific implementation of steps 803 and 804, refer to the description of steps 305 and 306 in the embodiment corresponding to Figure 3 above; the difference is that "first training sample" in steps 305 and 306 is replaced by "data to be processed" in steps 803 and 804. For the meaning of each term in steps 803 and 804, refer to the description in the embodiment corresponding to Figure 3 above, which is not repeated here.
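  • Steps 801 to 804 can be sketched as a single inference function. The three callables stand in for the trained feature acquisition network and classification network and are hypothetical, as is modelling the combination as list concatenation.

```python
def predict(data, extract, decompose, classify):
    """Run the four inference steps on one piece of data to be processed."""
    features = extract(data)               # step 801: feature extraction
    first, second = decompose(features)    # step 802: first / second feature information
    combined = list(first) + list(second)  # step 803: first combined feature
    return classify(combined)              # step 804: prediction decision information


# Toy stand-ins, only to show how the pieces connect.
decision = predict(
    [1.0, 2.0],
    extract=lambda x: [v * 2.0 for v in x],
    decompose=lambda f: (f[:1], f[1:]),
    classify=lambda c: sum(c) > 0.0,
)
```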
  • The feature acquisition network is trained using the first loss function. The classification network may be the first classification network in the embodiment corresponding to Figure 3, the second classification network in the embodiment corresponding to Figure 3, or a trained classification network obtained in another way, which is not listed exhaustively here.
  • The feature acquisition network and the classification network are trained using the first loss function and the second loss function. For the meaning of the first loss function and the second loss function, and of each loss function term they contain, refer to the description in the embodiment corresponding to Figure 3, which is not repeated here.
  • The features of the information associated with the target attribute in the data to be processed and the features of the information not associated with the target attribute in the data to be processed are obtained separately. The user can determine, from the prediction decision information in the current task, the cause of a deviation and thereby determine the target attribute. Separately extracting the features of the information associated with the target attribute that caused the deviation and the features of the information not associated with the target attribute helps reduce the difficulty of the classification network in generating prediction decision information and helps improve the accuracy of the final prediction decision information.
  • Figure 9 is a schematic diagram of the beneficial effects of the neural network training method provided by the embodiments of the present application.
  • Figure 9 shows two sub-schematic diagrams on the left and right.
  • the target attribute is the film producer. According to the number of films produced, the producers are divided into mass producers and niche producers.
  • a neural network is used to determine whether to recommend a certain movie.
  • the left sub-schematic diagram and the right sub-schematic diagram of Figure 9 both use one polyline to show the accuracy and fairness of the prediction decision information generated by the trained neural network obtained through the embodiment of the present application, and use three polylines and a triangle to show the accuracy and fairness of the prediction decision information generated by the trained neural networks obtained through the methods of the comparison groups.
  • the higher the score of the accuracy index the higher the accuracy of the generated prediction decision information; the lower the score of the fairness index, the better the fairness of the generated prediction decision information on the target attribute.
  • the third feature information is obtained, and the second feature information corresponding to the second training sample is combined with the third feature information, which constructs training samples that do not actually exist.
  • the rightmost point in the left sub-schematic diagram and the right sub-schematic diagram of Figure 9 represents the accuracy and fairness of the prediction decision information when the second training stage is not used. As shown in the figure, when the second training stage is not used, the prediction decision information obtained with the method of the embodiment of the present application has the highest accuracy in both the left and the right diagram of Figure 9. When the second training stage is adopted, the prediction decision information obtained with the method of the embodiment of the present application has both the highest accuracy and good fairness.
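The accuracy index and the fairness index discussed for Figure 9 can be illustrated with a small sketch. The text does not say how the fairness index is computed, so the example below assumes, purely for illustration, that it is the gap in recommendation rate between mass producers and niche producers (lower is better). The function names and the toy data are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predicted decisions that match the expected decisions."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def parity_gap(predictions, groups):
    """Absolute difference in positive-decision rate between two groups.

    Assumed fairness index (not taken from the source): 0.0 means both
    groups are recommended at the same rate. Expects exactly two groups.
    """
    rates = []
    for group in sorted(set(groups)):
        decisions = [p for p, g in zip(predictions, groups) if g == group]
        rates.append(sum(decisions) / len(decisions))
    return abs(rates[0] - rates[1])


# Hypothetical toy data: 1 = recommend the movie, 0 = do not recommend.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["mass", "mass", "mass", "mass", "niche", "niche", "niche", "niche"]

acc = accuracy(preds, labels)    # higher is better
gap = parity_gap(preds, groups)  # lower is better
```

With this toy data the two indices can be read side by side, which mirrors how each point on the polylines of Figure 9 pairs an accuracy score with a fairness score.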
  • FIG 10 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application.
  • the neural network training device 1000 includes: a feature extraction module 1001, configured to input the first training sample into the feature acquisition network and perform feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample; and a generation module 1002, configured to generate, through the feature acquisition network and based on the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample.
  • the first feature information corresponding to the first training sample includes features of the information associated with the target attribute in the first training sample; the first classification module 1003 is configured to perform a classification operation on the first feature information corresponding to the first training sample to obtain predicted category information.
  • the predicted category information indicates the predicted category of the first feature information corresponding to the first training sample.
  • the predicted category is one of multiple categories corresponding to the target attribute; the training module 1004 is configured to train the feature acquisition network according to the first loss function to obtain the trained feature acquisition network, where the first loss function includes a first loss function term and a second loss function term.
  • the first loss function term indicates the similarity between the predicted category information and the expected category information, and the expected category information indicates the correct category of the information associated with the target attribute in the first training sample.
  • the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
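The two terms of the first loss function described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation used in the application: the first term is assumed to be a cross-entropy between the predicted category distribution and the expected category, the second term is assumed to be the squared cosine similarity between the first and second feature information, and `weight` is a hypothetical balancing factor.

```python
import math


def softmax(logits):
    """Convert raw category scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, expected_index):
    """Assumed first term: small when the prediction matches the expected category."""
    return -math.log(softmax(logits)[expected_index])


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def first_loss(category_logits, expected_index, first_feat, second_feat, weight=1.0):
    """Sum of the two assumed terms of the first loss function."""
    term1 = cross_entropy(category_logits, expected_index)
    # Squared so that positive and negative alignment are both penalized.
    term2 = cosine_similarity(first_feat, second_feat) ** 2
    return term1 + weight * term2


# Same category prediction, different overlap between the two features.
loss_orthogonal = first_loss([2.0, 0.0], 0, [1.0, 0.0], [0.0, 1.0])
loss_aligned = first_loss([2.0, 0.0], 0, [1.0, 0.0], [1.0, 0.0])
```

Minimizing a loss of this shape pushes the first feature information toward predicting the target attribute category while pushing the second feature information away from the first, which is the separation the training module 1004 is described as pursuing.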
  • the neural network training device 1000 further includes: a combination module, configured to combine the first feature information and the second feature information corresponding to the first training sample to obtain first combined features;
  • a second classification module, configured to input the first combined features into the first classification network to obtain first prediction decision information, corresponding to the first training sample, output by the first classification network;
  • the training module 1004 is specifically configured to train the feature acquisition network and the first classification network according to the first loss function, where the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first prediction decision information and the expected decision information corresponding to the first training sample.
  • the neural network training device 1000 further includes: an acquisition module, configured to acquire third feature information, where the third feature information and the first feature information corresponding to the second training sample have the same data size but different data content; the combination module is further configured to combine the second feature information corresponding to the second training sample with the third feature information to obtain second combined features;
  • the second classification module is further configured to input the second combined features into the trained first classification network to obtain second prediction decision information, corresponding to the second training sample, output by the trained first classification network;
  • the second classification module is further configured to input the second combined features into the second classification network to obtain third prediction decision information, corresponding to the second training sample, output by the second classification network;
  • the training module 1004 is specifically configured to train the second classification network according to the second loss function to obtain the trained second classification network;
  • the trained feature acquisition network and the trained second classification network belong to the same target neural network; among them, the second loss function includes the fourth loss function term and the fifth loss function term, and the fourth loss function term indicates The similar
  • the acquisition module is specifically configured to: generate, through the trained feature acquisition network, first feature information corresponding to the second training sample; and perform a weighted summation of the first feature information corresponding to the second training sample and disturbance information to obtain the third feature information, where the weight of the disturbance information is adjustable.
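The weighted summation performed by the acquisition module can be sketched as below. The source states only that the first feature information and disturbance information are weighted and summed and that the disturbance weight is adjustable; the Gaussian disturbance, the convex weights `(1 - w, w)`, and the function name `make_third_feature` are assumptions made for illustration.

```python
import random


def make_third_feature(first_feat, disturbance_weight, rng):
    """Weighted sum of first feature information and disturbance information.

    Assumptions: the disturbance is Gaussian noise of the same size as the
    feature, and the two weights sum to 1.
    """
    disturbance = [rng.gauss(0.0, 1.0) for _ in first_feat]
    w = disturbance_weight
    return [(1.0 - w) * f + w * d for f, d in zip(first_feat, disturbance)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
first_feat = [0.5, -1.0, 2.0]

# Same data size as the first feature information, different data content.
third_feat = make_third_feature(first_feat, 0.3, rng)

# A weight of 0 leaves the first feature information unchanged.
unchanged = make_third_feature(first_feat, 0.0, rng)
```

Raising the disturbance weight moves the third feature information further from the original first feature information, which is one way to read the "adjustable" weight in the description.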
  • the second classification module is further configured to perform a classification operation based on the second feature information corresponding to the first training sample to obtain fourth prediction decision information corresponding to the first training sample, where the first loss function further includes a sixth loss function term, the sixth loss function term indicates the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample, and the second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.
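A gradient-similarity term of the kind just described can be sketched as follows. The source does not say which quantities the gradients are taken with respect to or how similarity is measured; the sketch assumes gradients with respect to a shared parameter vector, estimates them by central finite differences, and uses cosine similarity. The two toy losses stand in for the first and sixth loss function terms and are hypothetical.

```python
import math


def numerical_gradient(loss_fn, params, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(params)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2.0 * eps))
    return grad


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def gradient_similarity(loss_a, loss_b, params):
    """Cosine similarity between the gradients of two loss terms."""
    return cosine_similarity(
        numerical_gradient(loss_a, params),
        numerical_gradient(loss_b, params),
    )


def loss_first(p):
    # Toy stand-in for the first loss function term: depends only on p[0].
    return (p[0] - 1.0) ** 2


def loss_sixth(p):
    # Toy stand-in for the sixth loss function term: depends only on p[1].
    return (p[1] + 2.0) ** 2


params = [0.0, 0.0]
sim_disjoint = gradient_similarity(loss_first, loss_sixth, params)
sim_same = gradient_similarity(loss_first, loss_first, params)
```

When the two loss terms rely on different parameters their gradients are orthogonal and the similarity is close to zero; when they rely on the same parameters the similarity approaches one. In practice an automatic differentiation framework would replace the finite-difference helper.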
  • the neural network training device 1000 is applied to any of the following scenarios: determining whether to recommend the object pointed to by the first training sample, determining whether the object in the first training sample is in the target state, or determining whether to approve the request of the applicant pointed to by the first training sample.
  • the embodiment of the present application also provides a data processing device. Please refer to Figure 11.
  • Figure 11 is a schematic structural diagram of the data processing device provided by the embodiment of the present application.
  • the data processing device 1100 includes: a feature extraction module 1101, configured to input the data to be processed into the feature acquisition network and perform feature extraction on the data to be processed through the feature acquisition network to obtain feature information of the data to be processed;
  • a generation module 1102, configured to generate first feature information and second feature information through the feature acquisition network according to the feature information of the data to be processed, where the first feature information includes features of the information associated with the target attribute in the data to be processed;
  • a combination module 1103, configured to combine the first feature information and the second feature information to obtain first combined features;
  • and a classification module 1104, configured to input the first combined features into the classification network to obtain prediction decision information output by the classification network, where the feature acquisition network is trained using a first loss function.
  • the first loss function includes a first loss function term and a second loss function term.
  • the first loss function term indicates the similarity between the predicted category information and the expected category information; the predicted category information indicates the predicted category of the information associated with the target attribute in the data input to the feature acquisition network, the expected category information indicates the correct category of the information associated with the target attribute in the data input to the feature acquisition network, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
  • the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first prediction decision information and the expected decision information corresponding to the data input to the feature acquisition network.
  • the first prediction decision information indicates a decision corresponding to the data input to the feature acquisition network.
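The data flow through modules 1101 to 1104 can be sketched end to end. Every function below is a hypothetical stand-in: a real feature acquisition network and classification network would be trained neural networks, the split into two halves merely imitates producing two separate feature outputs, and combination is assumed to be concatenation.

```python
def extract_features(sample):
    """Stand-in for module 1101: shared feature extraction."""
    return [2.0 * x for x in sample]


def generate_two_features(feature_info):
    """Stand-in for module 1102: first and second feature information."""
    half = len(feature_info) // 2
    return feature_info[:half], feature_info[half:]


def combine(first_feat, second_feat):
    """Stand-in for module 1103: combination, assumed to be concatenation."""
    return first_feat + second_feat


def classify(combined, weights, bias=0.0):
    """Stand-in for module 1104: a linear decision, 1 = accept, 0 = reject."""
    score = sum(w * x for w, x in zip(weights, combined)) + bias
    return 1 if score > 0 else 0


def predict(sample, weights):
    feature_info = extract_features(sample)
    first_feat, second_feat = generate_two_features(feature_info)
    combined = combine(first_feat, second_feat)
    return classify(combined, weights)


sample = [1.0, -0.5, 0.25, 2.0]
decision_accept = predict(sample, [0.0, 0.0, 1.0, 1.0])
decision_reject = predict(sample, [0.0, 0.0, -1.0, -1.0])
```

In this toy setup the classifier weights on the first feature information are zero, which illustrates how keeping the two kinds of features separate lets a downstream classifier rely less on the information tied to the target attribute.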
  • FIG. 12 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • the execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203 and a memory 1204 (the number of processors 1203 in the execution device 1200 may be one or more, one processor is taken as an example in Figure 12) , wherein the processor 1203 may include an application processor 12031 and a communication processor 12032.
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected through a bus or other means.
  • Memory 1204 may include read-only memory and random access memory and provides instructions and data to processor 1203 .
  • a portion of memory 1204 may also include non-volatile random access memory (NVRAM).
  • the memory 1204 stores processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1203 controls the execution of operations of the device.
  • various components of the execution device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • various buses are called bus systems in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1203 or implemented by the processor 1203.
  • the processor 1203 may be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1203 .
  • the above-mentioned processor 1203 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the processor 1203 can implement or execute the various methods, steps and logical block diagrams disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the methods disclosed in the embodiments of this application can be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1204.
  • the processor 1203 reads the information in the memory 1204 and completes the steps of the above method in combination with its hardware.
  • the receiver 1201 may be configured to receive input numeric or character information and generate signal inputs related to performing relevant settings and functional controls of the device.
  • the transmitter 1202 can be used to output numeric or character information through the first interface; the transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1202 can also include a display device such as a display screen .
  • the application processor 12031 in the processor 1203 is used to execute the data processing method executed by the execution device in the corresponding embodiment of FIG. 8 .
  • the specific manner in which the application processor 12031 performs each step of the data processing method is based on the same concept as the method embodiments corresponding to Figure 8 in this application, and the technical effects it brings are the same as those of the method embodiments corresponding to Figure 8 in this application. For specific details, please refer to the descriptions in the method embodiments shown above in this application, which are not repeated here.
  • FIG. 13 is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 1300 is implemented by one or more servers.
  • the training device 1300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1322 (for example, one or more processors), memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) that store application programs 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device. Furthermore, the central processor 1322 may be configured to communicate with the storage medium 1330 and execute a series of instruction operations in the storage medium 1330 on the training device 1300 .
  • the training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input and output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
  • the central processor 1322 is used to execute the neural network training method executed by the training device in the corresponding embodiments of Figures 2b to 7. It should be noted that the specific manner in which the central processor 1322 executes each step in the neural network training method is based on the same concept as the various method embodiments corresponding to Figures 2b to 7 in this application, and the technical effects it brings are the same as those in this application. The respective method embodiments corresponding to Figures 2b to 7 are the same. For details, please refer to the descriptions in the method embodiments shown above in this application, and will not be described again here.
  • An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 8, or causes the computer to perform the steps performed by the training device in the method described in the embodiments shown in Figures 2b to 7.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program for signal processing. When the program is run on a computer, it causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 8, or causes the computer to perform the steps performed by the training device in the method described in the embodiments shown in Figures 2b to 7.
  • the neural network training device, data processing device, execution device or training device provided by the embodiment of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the training device executes the neural network training method described in the embodiments shown in Figures 2b to 7, or so that the chip in the execution device executes the data processing method described in the embodiment shown in Figure 8.
  • the storage unit is a storage unit within the chip, such as a register or a cache.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • Figure 14 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 140.
  • the NPU 140 serves as a co-processor and is mounted to the main CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1403.
  • the arithmetic circuit 1403 is controlled by the controller 1404 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1403 internally includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1403 is a two-dimensional systolic array.
  • the arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1403 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit takes the data of matrix A from the input memory 1401 and performs matrix operations with matrix B, and the partial results or final results of the matrix are stored in an accumulator 1408.
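The matrix operation just described, in which partial results are collected in an accumulator, can be imitated in software. This is only an illustrative sketch of accumulation over slices of the inner dimension; it does not model the systolic array, the PEs, or any timing of the arithmetic circuit 1403, and the `tile` parameter is hypothetical.

```python
def matmul_accumulate(a, b, tile=2):
    """Multiply matrix a (input data) by matrix b (weights).

    The inner dimension is processed in slices of `tile` elements, and
    each slice's partial result is added into `acc`, which plays the
    role of the accumulator.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, inner, tile):
        for i in range(rows):
            for j in range(cols):
                partial = 0.0
                for k in range(k0, min(k0 + tile, inner)):
                    partial += a[i][k] * b[k][j]
                acc[i][j] += partial  # partial results accumulate here
    return acc


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # input matrix A
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # weight matrix B
c = matmul_accumulate(a, b)               # output matrix C
```

After the last slice is processed, `acc` holds the final result, matching the description that the accumulator stores either partial or final results.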
  • the unified memory 1406 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1402 through the storage unit access controller (Direct Memory Access Controller, DMAC) 1405.
  • Input data is also transferred to unified memory 1406 via DMAC.
  • BIU is the bus interface unit 1410, which is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1409.
  • the bus interface unit 1410 is used by the instruction fetch buffer 1409 to obtain instructions from the external memory, and is also used by the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1406 or the weight data to the weight memory 1402 or the input data to the input memory 1401 .
  • the vector calculation unit 1407 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1407 can store the processed output vectors to unified memory 1406 .
  • the vector calculation unit 1407 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1403, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1407 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1403, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1409 connected to the controller 1404 is used to store instructions used by the controller 1404;
  • the unified memory 1406, input memory 1401, weight memory 1402 and instruction fetch buffer 1409 are all on-chip memories. The external memory is private to the NPU hardware architecture.
  • the operations of each layer in the neural network shown in the above embodiments can be performed by the arithmetic circuit 1403 or the vector calculation unit 1407.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware. Of course, it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits or special-purpose circuits. However, for this application, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk or optical disk, and includes several instructions that cause a computer device (which can be a personal computer, a training device, a network device, or the like) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (Solid State Disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A neural network training method, a data processing method, and a device. In the method, artificial intelligence technology can be used for decision making. The method comprises: according to feature information of a first training sample, generating, by means of a feature acquisition network, first feature information and second feature information which correspond to the first training sample, wherein the first feature information comprises a feature of information, associated with a target attribute, in the first training sample; and training the feature acquisition network, wherein a first loss function item indicates the similarity between predicted category information and expected category information, the expected category information indicates a correct category for the information, associated with the target attribute, in the first training sample, and the purpose of performing training by using a second loss function item comprises reducing the similarity between the first feature information and the second feature information. Features of information associated with and not associated with a target attribute, which leads to a deviation, are respectively extracted, so as to improve the accuracy of finally-obtained prediction decision information.

Description

A neural network training method, data processing method, and device

This application claims priority to the Chinese patent application filed with the China Patent Office on May 31, 2022, with application number 202210613415.0 and the invention title "A neural network training method, data processing method and equipment", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence, and in particular to a neural network training method, a data processing method, and a device.

Background

With the development of artificial intelligence (AI), neural networks are increasingly used to help people make decisions. For example, in a news recommendation system, a neural network can be used to determine whether to recommend specific news; in an education recommendation system, a neural network can be used to determine whether to recommend a specific course; and a neural network can also be used to determine whether to grant a loan to a specific applicant.

However, the collected training data may be biased, so the prediction decision information output by the trained neural network in the inference stage is also biased, and the bias may even be amplified. For example, in an education recommendation system, users tend to search for courses by supplier name, so the collected training data contains more feedback data for courses from mainstream suppliers and less feedback data for courses from niche suppliers. A neural network trained on such data takes the course supplier into account when determining which courses to recommend; that is, the neural network learns the bias present in the training data during training. In the inference stage, the probability that courses from mainstream suppliers are recommended increases greatly, although the quality of the courses from those suppliers is not necessarily good; that is, the prediction decision information output by the neural network in the inference stage is also biased.

As the above description shows, the prediction decision information output by neural networks currently used for decision making is biased, and a solution that can improve the prediction decision information output by a neural network is urgently needed.

发明内容Contents of the invention

本申请实施例提供了一种神经网络的训练方法、数据的处理方法以及设备,分别获取输入数据中与目标属性关联的信息的特征,和,输入数据中与目标属性不关联的信息的特征,从而用户可以根据当前任务中导致预测决策信息出现偏差的原因来确定目标属性,将导致出现偏差的目标属性所关联的信息的特征和与目标属性不关联的信息的特征分别提取出来,有利于降低后续决策过程的难度,有利于提高最终得到的预测决策信息的准确度。Embodiments of the present application provide a neural network training method, a data processing method and a device to respectively obtain the characteristics of the information associated with the target attribute in the input data, and the characteristics of the information not associated with the target attribute in the input data, In this way, the user can determine the target attribute based on the reasons that cause the deviation of the prediction decision information in the current task, and extract the characteristics of the information associated with the target attribute that causes the deviation and the characteristics of the information not associated with the target attribute, which is beneficial to reducing the The difficulty of the subsequent decision-making process is conducive to improving the accuracy of the final predictive decision-making information.

To solve the above technical problem, the embodiments of this application provide the following technical solutions:

In a first aspect, an embodiment of this application provides a neural network training method in which artificial intelligence technology can be used to make decisions. The method includes: a training device inputs a first training sample into a feature acquisition network, and performs feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample; and, based on the feature information of the first training sample, generates, through the feature acquisition network, first feature information and second feature information corresponding to the first training sample. The first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute; the second feature information corresponding to the first training sample includes features of the information in the first training sample that is not associated with any target attribute. In other words, one goal of training is for the resulting first feature information and second feature information to contain different information.

Further, a technician can determine the target attribute based on the factors that cause bias in the predicted decision information in the current task. As one example, if the trained neural network is used to determine whether to recommend a course, and the factors that cause bias in the predicted decision information include the supplier providing the course, then the at least one target attribute used in the training phase of the neural network may include the supplier; if the first training sample is a course, the information in the first training sample associated with the target attribute may include a watermark indicating the supplier of the course, an introduction to the supplier on the course cover, or other information. As another example, if the trained neural network is used to determine whether the face in an image has curly hair, and the factors that cause bias in the predicted decision information include the gender of the person in the image, then the at least one target attribute used in the training phase may include the user's gender; the information in the face image (that is, the first training sample) associated with the target attribute may include image information of the neck region of the face image, which can be used to determine whether the user has an Adam's apple.

The training device performs a classification operation based on the first feature information corresponding to the first training sample to obtain predicted category information. The predicted category information indicates the predicted category of that first feature information, that is, the predicted category of the information in the first training sample associated with the target attribute, and the predicted category is one of the multiple categories corresponding to the target attribute. The training device then trains the feature acquisition network according to a first loss function to obtain a trained feature acquisition network. As one example, if the neural network is used to determine whether a course is recommended and the target attribute is the supplier, the multiple categories corresponding to the target attribute may include supplier A, supplier B, supplier C, and supplier D; as another example, if the neural network is used to determine whether the person in an image has curly hair and the target attribute is gender, the multiple categories corresponding to the target attribute may include male and female.

The first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the predicted category information and expected category information, where the expected category information indicates the correct category of the information in the first training sample associated with the target attribute. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information. Further, the second loss function term may directly compute the similarity between the first feature information and the second feature information, or it may use the similarity between other information.
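As a non-limiting illustration, the two terms of the first loss function described above might be sketched as follows. The source does not fix concrete formulas, so the cross-entropy form of the first term, the squared cosine similarity used for the second term, and the names `cross_entropy`, `cosine_similarity`, `first_loss`, and `weight` are all assumptions made for this example.

```python
import math

def cross_entropy(logits, label):
    """Assumed form of the first loss function term: small when the
    predicted category agrees with the expected category."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[label]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def first_loss(attr_logits, attr_label, z_attr, z_rest, weight=1.0):
    """Hypothetical first loss function: term 1 + weight * term 2.

    attr_logits: classifier scores over the target-attribute categories
    attr_label:  index of the expected (correct) category
    z_attr:      first feature information (attribute-related)
    z_rest:      second feature information (attribute-unrelated)
    """
    term1 = cross_entropy(attr_logits, attr_label)
    # Assumed second term: penalize similarity between the two features.
    term2 = cosine_similarity(z_attr, z_rest) ** 2
    return term1 + weight * term2
```

Under these assumptions, minimizing `first_loss` pushes the attribute classifier toward the expected category while driving the two feature vectors toward orthogonality; `weight` trades one goal off against the other.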

In this implementation, the feature information of the first training sample is decomposed through the feature acquisition network to obtain the first feature information and the second feature information corresponding to the first training sample, where the first feature information includes features of the information in the first training sample associated with the target attribute. A classification operation is performed based on that first feature information to obtain predicted category information, which indicates the predicted category of the first feature information; the predicted category is one of the multiple categories corresponding to the target attribute. The feature acquisition network is trained according to the first loss function to obtain the trained feature acquisition network. The first loss function includes the first loss function term and the second loss function term: the first loss function term indicates the similarity between the predicted category information and the expected category information, the expected category information indicates the correct category of the information in the first training sample associated with the target attribute, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information. With this solution, the trained feature acquisition network can separately obtain features of the information in the input data associated with the target attribute and features of the information in the input data not associated with the target attribute. A user can therefore determine the target attribute based on what causes bias in the predicted decision information in the current task, and the features of the information associated with the bias-causing target attribute and the features of the information not associated with the target attribute are extracted separately, which helps reduce the difficulty of the subsequent decision process and helps improve the accuracy of the predicted decision information finally obtained.

In a possible implementation of the first aspect, the method further includes: the training device combines the first feature information and the second feature information corresponding to the first training sample to obtain a first combined feature, where the "combining" operation may be any one or more of concatenation, addition, or other types of operations; and inputs the first combined feature into a first classification network to obtain first predicted decision information corresponding to the first training sample, output by the first classification network.

The training device training the feature acquisition network according to the first loss function includes: the training device trains the feature acquisition network and the first classification network according to the first loss function, where the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first predicted decision information and the expected decision information corresponding to the first training sample.
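The combining step and the third loss function term can be illustrated with the following minimal sketch. It assumes a binary decision, a single linear layer with a sigmoid standing in for the first classification network, and binary cross-entropy as the third term; none of these choices, nor the names `combine`, `linear_classifier`, and `binary_cross_entropy`, come from the source.

```python
import math

def combine(z_attr, z_rest, mode="concat"):
    """Combine the first and second feature information.

    The source names concatenation and addition as possible operations.
    """
    if mode == "concat":
        return list(z_attr) + list(z_rest)
    if mode == "add":
        if len(z_attr) != len(z_rest):
            raise ValueError("addition requires features of equal length")
        return [a + b for a, b in zip(z_attr, z_rest)]
    raise ValueError("unknown combine mode: " + mode)

def linear_classifier(features, weights, bias):
    """Hypothetical stand-in for the first classification network:
    one linear layer followed by a sigmoid, giving P(decision = 1)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def binary_cross_entropy(probability, expected):
    """Assumed form of the third loss function term for a binary decision."""
    eps = 1e-12
    p = min(max(probability, eps), 1.0 - eps)  # avoid log(0)
    return -(expected * math.log(p) + (1 - expected) * math.log(1.0 - p))
```

For example, `binary_cross_entropy(linear_classifier(combine(z_attr, z_rest), weights, bias), expected)` would give the value of the assumed third term for one training sample.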

In this implementation, the purpose of training the feature acquisition network includes not only accurately obtaining, from the feature information of the first training sample, the features of the information associated with the target attribute. The first feature information and the second feature information are also combined to obtain the first combined feature, and a third loss function term is introduced. The purpose of training with the third loss function term includes improving the accuracy of the predicted decision information obtained from the first combined feature, which helps further improve the accuracy of the predicted decision information obtained in the inference phase.

In a possible implementation of the first aspect, after the training device trains the feature acquisition network and the first classification network to obtain the trained feature acquisition network and the trained first classification network, the method further includes: the training device obtains third feature information, and combines the second feature information corresponding to a second training sample with the third feature information to obtain a second combined feature. The third feature information has the same data size as, and different data content from, the first feature information corresponding to the second training sample. Further, if the first feature information is an N-dimensional tensor, the third feature information is also an N-dimensional tensor, and the two have the same length in each of the N dimensions. For example, if the first feature information is a vector, the third feature information is also a vector of the same length; if the first feature information is a matrix, the third feature information is also a matrix with the same length and width. "Different data content" means that the data content of the third feature information and that of the first feature information corresponding to the second training sample are not exactly the same; that is, it is sufficient for the two to differ in some of their data.

The training device inputs the second combined feature into the trained first classification network to obtain second predicted decision information corresponding to the second training sample, output by the trained first classification network; inputs the second combined feature into a second classification network to obtain third predicted decision information corresponding to the second training sample, output by the second classification network; and trains the second classification network according to a second loss function to obtain a trained second classification network. The trained feature acquisition network and the trained second classification network belong to the same target neural network. The "second predicted decision information" and the "third predicted decision information" are both decision information, and what they indicate depends on the type of target task performed by the target neural network.

The second loss function includes a fourth loss function term and a fifth loss function term. The fourth loss function term indicates the similarity between the second predicted decision information and the expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.
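One way to picture the second loss function is the sketch below, which treats each piece of predicted decision information as a probability of a positive decision. The binary cross-entropy used for the fourth term, the squared difference used for the fifth term, and the names `second_loss` and `weight` are illustrative assumptions rather than formulas given in the source.

```python
import math

def binary_cross_entropy(probability, expected):
    """Assumed similarity measure between a predicted and an expected decision."""
    eps = 1e-12
    p = min(max(probability, eps), 1.0 - eps)  # avoid log(0)
    return -(expected * math.log(p) + (1 - expected) * math.log(1.0 - p))

def second_loss(p_second, p_third, expected, weight=1.0):
    """Hypothetical second loss function: term 4 + weight * term 5.

    p_second: second predicted decision information, output by the
              trained first classification network
    p_third:  third predicted decision information, output by the
              second classification network being trained
    expected: expected decision (0 or 1) for the second training sample
    """
    term4 = binary_cross_entropy(p_second, expected)
    # Assumed fifth term: squared gap between the two predictions.
    term5 = (p_second - p_third) ** 2
    return term4 + weight * term5
```

In this sketch the fifth term is zero when the two classification networks agree on the perturbed input and grows as their predictions diverge.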

In this implementation, the third feature information has the same data size as the first feature information but different data content. Compared with the combined feature obtained by combining the first feature information and the second feature information corresponding to the second training sample, the second combined feature therefore differs in that the features of the information associated with the target attribute in the second training sample have changed, yet the training goal still includes producing the original expected decision information of the second training sample. In other words, training data that did not previously exist is added during the training phase, and the training goal includes obtaining the expected decision information regardless of the category of the target attribute. This solution not only increases the diversity of the training data, but also helps reduce the trained neural network's dependence on the information in the input data associated with the target attribute, so that the network pays more attention to information relevant to the task being performed. This helps improve the accuracy of the output predicted decision information and helps improve the fairness of the predicted decision information obtained for the groups indicated by different categories of the target attribute.

In a possible implementation of the first aspect, the training device obtaining the third feature information includes: the training device generates, through the trained feature acquisition network, the first feature information corresponding to the second training sample, and computes a weighted sum of that first feature information and perturbation information to obtain the third feature information, where the weight of the perturbation information is adjustable. Further, the perturbation information may be randomly generated by the training device, may include the gradient corresponding to the first loss function term, or may be obtained in other ways, which are not enumerated here.
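The weighted sum described above might look like the following sketch for vector-shaped features. The Gaussian noise source, the default weight of 1.0 on the original feature, and the names `random_perturbation`, `make_third_feature`, `feature_weight`, and `perturbation_weight` are assumptions introduced for illustration.

```python
import random

def random_perturbation(length, seed=None):
    """One assumed source of perturbation information: Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def make_third_feature(z_attr, perturbation, perturbation_weight,
                       feature_weight=1.0):
    """Weighted sum of the first feature information and the perturbation.

    The result has the same size as z_attr, and its content differs from
    z_attr whenever the weighted perturbation is non-zero.
    """
    if len(z_attr) != len(perturbation):
        raise ValueError("perturbation must have the same size as the feature")
    return [feature_weight * z + perturbation_weight * d
            for z, d in zip(z_attr, perturbation)]
```

Raising `perturbation_weight` moves the result further from the original first feature information; setting it to zero returns the original values unchanged.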

In this implementation, the feature information of the second training sample is first decomposed, and then the features of the information associated with the target attribute are intervened on, yielding a training sample that is counterfactual with respect to the second training sample. The larger the weight of the perturbation information, the lower the similarity between the third feature information and the first feature information, and the more this helps improve the fairness of the resulting predicted decision information. The accuracy and fairness of the predicted decision information can therefore be balanced by adjusting the weight of the perturbation information.

In a possible implementation of the first aspect, the method further includes: the training device performs a classification operation based on the second feature information corresponding to the first training sample to obtain fourth predicted decision information corresponding to the first training sample. The first loss function further includes a sixth loss function term, which indicates the similarity between the fourth predicted decision information and the expected decision information corresponding to the first training sample, and the second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.
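To make the gradient-based variant of the second loss function term more concrete, the sketch below compares two gradient vectors by cosine similarity. The passage above does not state which quantity the gradients are taken with respect to or which similarity measure is used, so the shared parameter vector, the finite-difference gradient estimate, and the names `numerical_gradient` and `gradient_similarity` are all assumptions.

```python
import math

def numerical_gradient(loss_fn, params, step=1e-6):
    """Central-difference estimate of d(loss_fn)/d(params).

    Used here only to keep the sketch dependency-free; a real training
    setup would normally rely on automatic differentiation.
    """
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += step
        down[i] -= step
        grads.append((loss_fn(up) - loss_fn(down)) / (2.0 * step))
    return grads

def gradient_similarity(grad_a, grad_b):
    """Assumed second loss function term: cosine similarity between the
    gradient of term 1 and the gradient of term 6."""
    dot = sum(x * y for x, y in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(x * x for x in grad_a))
    norm_b = math.sqrt(sum(y * y for y in grad_b))
    return dot / (norm_a * norm_b + 1e-12)
```

Here `gradient_similarity` is near 1 when the two loss terms pull the shared parameters in the same direction and near 0 when their gradients are orthogonal.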

In this implementation, the gradient corresponding to the first loss function term serves to make the obtained first feature information reflect the features of the information associated with the target attribute more accurately, and the gradient corresponding to the sixth loss function term serves to ensure that decisions based on the obtained second feature information exclude interference from the information associated with the target attribute in the first training sample, so that more accurate predicted decision information can be generated. Having the second loss function term use the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term can further improve the efficiency of updating the weight parameters of the feature acquisition network, and helps improve the accuracy of the first feature information and the second feature information generated by the trained feature acquisition network.

In a possible implementation of the first aspect, the method is applied to any one of the following scenarios: determining whether to recommend the object indicated by the first training sample, determining whether the object in the first training sample is in a target state, or determining whether to approve the request of the applicant indicated by the first training sample. Correspondingly, the various kinds of "decision information" in this solution may indicate any one of the following: whether to recommend the object indicated by the first training sample, whether the object in the first training sample is in the target state, or whether to approve the request of the applicant indicated by the first training sample. This implementation provides multiple application scenarios for the method, which improves the implementation flexibility of the solution.

In a second aspect, an embodiment of this application provides a data processing method in which artificial intelligence technology can be used to make decisions. The method includes: an execution device inputs data to be processed into a feature acquisition network, and performs feature extraction on the data to be processed through the feature acquisition network to obtain feature information of the data to be processed; and, based on the feature information of the data to be processed, generates first feature information and second feature information through the feature acquisition network, where the first feature information includes features of the information in the data to be processed associated with a target attribute. The execution device combines the first feature information and the second feature information to obtain a first combined feature, and inputs the first combined feature into a classification network to obtain predicted decision information output by the classification network. The feature acquisition network is trained using a first loss function that includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between predicted category information and expected category information; the predicted category information indicates the predicted category of the information associated with the target attribute in the data input into the feature acquisition network, and the expected category information indicates the correct category of that information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
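The inference flow of the second aspect can be pictured with the toy sketch below. Every component here is a hypothetical stand-in: the feature acquisition network is reduced to one linear extraction layer plus two linear heads, the combination is a concatenation, and the classification network is a single linear layer with a sigmoid. The source does not prescribe any of these structures or the names `linear` and `predict`.

```python
import math

def linear(matrix, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def predict(data, extractor_w, attr_head_w, rest_head_w, clf_w, clf_b):
    """Toy inference pass for a binary decision.

    extractor_w: weights of the assumed feature-extraction layer
    attr_head_w: assumed head producing the first feature information
    rest_head_w: assumed head producing the second feature information
    clf_w, clf_b: weights and bias of the assumed classification network
    """
    features = linear(extractor_w, data)      # feature information
    z_attr = linear(attr_head_w, features)    # first feature information
    z_rest = linear(rest_head_w, features)    # second feature information
    combined = z_attr + z_rest                # concatenate the two lists
    score = sum(w * x for w, x in zip(clf_w, combined)) + clf_b
    return 1.0 / (1.0 + math.exp(-score))     # probability of a positive decision
```

The returned value plays the role of the predicted decision information, for example the probability that a course should be recommended.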

In a possible implementation of the second aspect, the first loss function further includes a third loss function term. The third loss function term indicates the similarity between first predicted decision information and the expected decision information corresponding to the data input into the feature acquisition network, and the first predicted decision information indicates the decision corresponding to the data input into the feature acquisition network.

In the second aspect of this application, the execution device may also be used to perform the steps performed by the training device in the first aspect and in each possible implementation of the first aspect. For the specific implementation of the steps in each possible implementation of the second aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect; details are not repeated here.

In a third aspect, an embodiment of this application provides a neural network training apparatus in which artificial intelligence technology can be used to make decisions. The neural network training apparatus includes: a feature extraction module, configured to input a first training sample into a feature acquisition network and perform feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample; a generation module, configured to generate, through the feature acquisition network and based on the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample, where the first feature information corresponding to the first training sample includes features of the information in the first training sample associated with a target attribute; a first classification module, configured to perform a classification operation based on the first feature information corresponding to the first training sample to obtain predicted category information, where the predicted category information indicates the predicted category of that first feature information and the predicted category is one of the multiple categories corresponding to the target attribute; and a training module, configured to train the feature acquisition network according to a first loss function to obtain a trained feature acquisition network. The first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the predicted category information and expected category information, the expected category information indicates the correct category of the information in the first training sample associated with the target attribute, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.

In the third aspect of this application, the neural network training apparatus may also be used to perform the steps performed by the training device in the first aspect and in each possible implementation of the first aspect. For the specific implementation of the steps in each possible implementation of the third aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect; details are not repeated here.

In a fourth aspect, an embodiment of this application provides a data processing apparatus in which artificial intelligence technology can be used to make decisions. The data processing apparatus includes: a feature extraction module, configured to input data to be processed into a feature acquisition network and perform feature extraction on the data to be processed through the feature acquisition network to obtain feature information of the data to be processed; a generation module, configured to generate first feature information and second feature information through the feature acquisition network based on the feature information of the data to be processed, where the first feature information includes features of the information in the data to be processed associated with a target attribute; a combination module, configured to combine the first feature information and the second feature information to obtain a first combined feature; and a classification module, configured to input the first combined feature into a classification network to obtain predicted decision information output by the classification network. The feature acquisition network is trained using a first loss function that includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between predicted category information and expected category information; the predicted category information indicates the predicted category of the information associated with the target attribute in the data input into the feature acquisition network, and the expected category information indicates the correct category of that information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.

In the fourth aspect of this application, the data processing apparatus may also be used to perform the steps performed by the execution device in the second aspect and in each possible implementation of the second aspect. For the specific implementation of the steps in each possible implementation of the fourth aspect, the meanings of the terms, and the beneficial effects, refer to the first aspect; details are not repeated here.

In a fifth aspect, an embodiment of this application provides a computer program product. The computer program product includes a program that, when run on a computer, causes the computer to perform the neural network training method described in the first aspect, or causes the computer to perform the data processing method described in the second aspect.

In a sixth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the neural network training method described in the first aspect, or causes the computer to perform the data processing method described in the second aspect.

In a seventh aspect, an embodiment of this application provides a training device, including a processor and a memory coupled to the processor. The memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the training device performs the neural network training method described in the first aspect.

In an eighth aspect, an embodiment of this application provides an execution device, including a processor and a memory coupled to the processor. The memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the execution device performs the data processing method described in the second aspect.

In a ninth aspect, this application provides a chip system. The chip system includes a processor configured to support an execution device or a communication device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the communication device. The chip system may consist of a chip, or may include a chip and other discrete components.

Description of the drawings

Figure 1 is a schematic structural diagram of the artificial intelligence main framework provided by an embodiment of this application;

Figure 2a is a system architecture diagram of the data processing system provided by an embodiment of this application;

Figure 2b is a schematic flowchart of the neural network training method provided by an embodiment of this application;

Figure 3 is a schematic flowchart of the neural network training method provided by an embodiment of this application;

Figure 4 is a schematic flowchart of the neural network training method provided by an embodiment of this application;

Figure 5 is a schematic diagram comparing the first feature information and the second feature information in the neural network training method provided by an embodiment of this application;

Figure 6 is a schematic flowchart of the neural network training method provided by an embodiment of this application;

Figure 7 is a schematic flowchart of the neural network training method provided by an embodiment of this application;

Figure 8 is a schematic flowchart of the data processing method provided by an embodiment of this application;

Figure 9 is a schematic diagram of the beneficial effects of the neural network training method provided by an embodiment of this application;

Figure 10 is a schematic structural diagram of the neural network training apparatus provided by an embodiment of this application;

Figure 11 is a schematic structural diagram of the data processing apparatus provided by an embodiment of this application;

Figure 12 is a schematic structural diagram of the execution device provided by an embodiment of this application;

Figure 13 is another schematic structural diagram of the training device provided by an embodiment of this application;

Figure 14 is a schematic structural diagram of the chip provided by an embodiment of this application.

Detailed description

The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

The terms "first", "second", and so on in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. Terms used in this way are interchangeable where appropriate; they are merely the means of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.

First, the overall workflow of an artificial intelligence system is described. Refer to Figure 1, which is a schematic structural diagram of the artificial intelligence main framework. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing it) up to the industrial ecosystem of the system.

(1) Infrastructure

The infrastructure provides computing power for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. Communication with the outside world takes place through sensors. Computing power is provided by smart chips, which may be hardware acceleration chips such as a central processing unit (CPU), an embedded neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA). The basic platform includes a distributed computing framework, networks, and related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.

(2) Data

Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing

Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and other methods.

Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.

Reasoning refers to the process in which a computer or an intelligent system simulates human intelligent reasoning and, following a reasoning control strategy, uses formalized information to carry out machine thinking and solve problems. Its typical functions are search and matching.

Decision-making refers to the process of making decisions after intelligent information has been reasoned over, and usually provides functions such as classification, ranking, and prediction.

(4) General capabilities

After the data has undergone the data processing described above, some general capabilities can be formed based on the results of that processing. These may be algorithms or general-purpose systems, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.

(5) Intelligent products and industry applications

Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They package the overall artificial intelligence solution, turning intelligent information decision-making into products and putting it into practical use. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, and smart cities.

The embodiments of this application can be applied to the various application fields of artificial intelligence. Specifically, artificial intelligence technology can be used to solve decision-making problems in those fields. As an example, in the smart city field, the prediction decision information output by a neural network may indicate whether to recommend a specific news item provided by a certain supplier, whether to recommend a specific course provided by a certain supplier, or whether to recommend a specific movie provided by a certain producer.

As another example, in the smart home field, the prediction decision information output by a neural network may indicate whether the person in an image has curly hair, whether the person in the image is smiling, or whether the person in the image is attractive.

As another example, the prediction decision information output by a neural network may indicate whether to approve a specific applicant's loan request. The application scenarios of the embodiments of this application are not listed exhaustively here.

To improve the accuracy of the prediction decision information output by a trained neural network, an embodiment of this application provides a neural network training method. Before the method is introduced, refer to Figure 2a, which is a system architecture diagram of the data processing system provided by an embodiment of this application. In Figure 2a, the data processing system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240, and a client device 250. The execution device 230 includes a computing module 231.

The database 220 stores a training data set. The training device 210 generates a first model/rule 201 and iteratively trains the first model/rule 201 using the training data set to obtain the trained first model/rule 201. The first model/rule 201 may take the form of a neural network or of a non-neural-network model; the embodiments of this application are described only using the case in which the first model/rule 201 is a neural network. Further, the first model/rule 201 may include a neural network used to acquire features from input data.

Specifically, refer to Figure 2b, which is a schematic flowchart of the neural network training method provided by an embodiment of this application.

A1. The training device 210 inputs a first training sample into a feature acquisition network (that is, an example of the first model/rule 201), and performs feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample.

A2. The training device 210 decomposes the feature information of the first training sample through the feature acquisition network to obtain first feature information and second feature information corresponding to the first training sample. The first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute.

A3. The training device 210 performs a classification operation according to the first feature information corresponding to the first training sample to obtain predicted category information. The predicted category information indicates the predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of the multiple categories corresponding to the target attribute.

A4. The training device 210 trains the feature acquisition network according to a first loss function to obtain the trained feature acquisition network. The first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the predicted category information and expected category information, where the expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
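The two-term structure of the first loss function can be illustrated with a small numerical sketch. The concrete formulas below are assumptions, since this passage does not fix them: cross-entropy stands in for the first loss function term, squared cosine similarity between the two feature vectors stands in for the second term, and the weighting factor `weight` and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, expected_index):
    # Assumed form of the first term: small when the predicted category
    # distribution agrees with the expected category.
    return -math.log(softmax(logits)[expected_index])

def cosine_similarity(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either is zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def first_loss(attr_logits, expected_index, first_feat, second_feat, weight=1.0):
    # First term: category prediction error on the target attribute.
    term1 = cross_entropy(attr_logits, expected_index)
    # Second term (assumed form): penalizes similarity between the
    # first and second feature information.
    term2 = cosine_similarity(first_feat, second_feat) ** 2
    return term1 + weight * term2

logits = [2.0, 0.5]                       # toy classifier output for two categories
loss_dissimilar = first_loss(logits, 0, [1.0, 0.0], [0.0, 1.0])
loss_similar = first_loss(logits, 0, [1.0, 0.0], [1.0, 0.0])
```

With identical category predictions, the loss is lower when the two feature vectors are orthogonal than when they coincide, which is the behavior the second loss function term is meant to encourage.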

The trained first model/rule 201 obtained by the training device 210 is deployed to the execution device 230. The execution device 230 can call data, code, and the like in the data storage system 240, and can also store data, instructions, and the like in the data storage system 240. The data storage system 240 may be located in the execution device 230, or may be an external memory relative to the execution device 230.

In some embodiments of this application, referring to Figure 2a, the execution device 230 and the client device 250 may be separate, independent devices. The execution device 230 is configured with an input/output (I/O) interface for exchanging data with the client device 250. A "user" can input to-be-processed data through the client device 250, and the client device 250 sends the to-be-processed data to the execution device 230 through the I/O interface. After the execution device 230 generates the prediction decision information corresponding to the to-be-processed data through the first model/rule 201 in the computing module 231, it can return the prediction decision information to the client device 250 through the I/O interface for presentation to the user.

It is worth noting that Figure 2a is only a schematic architecture diagram of the data processing system provided by an embodiment of this application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in other embodiments of this application, the execution device 230 may be configured in the client device 250. As an example, when the client device is a mobile phone or a tablet, the execution device 230 may be the module used for array image processing in the host processor (Host CPU) of the mobile phone or tablet. The execution device 230 may also be a graphics processing unit (GPU) or a neural-network processing unit (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted on the host processor as a coprocessor and is assigned tasks by the host processor.

With reference to the foregoing description, the specific implementation procedures of the training phase and the inference phase of the neural network provided by the embodiments of this application are described below.

I. Training phase

In the embodiments of this application, refer to Figure 3, which is a schematic flowchart of the neural network training method provided by an embodiment of this application. The neural network training method provided by the embodiments of this application may include the following steps.

301. The training device inputs a first training sample into a feature acquisition network, and performs feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample.

In the embodiments of this application, a training data set is deployed on the training device. The training device can sample at least one first training sample from the training data set, input the sampled first training sample into the feature acquisition network, and perform feature extraction on the first training sample through the feature acquisition network to obtain the feature information of the first training sample.

The feature acquisition network may include a first neural network module used for feature extraction. The first neural network module may be a convolutional neural network, a recurrent neural network, a residual neural network, or another type of neural network, and may be selected according to the data type of the first training sample. As an example, the first neural network module may be a residual neural network (ResNet) such as ResNet-18 or ResNet-34, or another type of neural network; the options are not listed exhaustively here.

302. The training device generates, through the feature acquisition network and according to the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample. The first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute.

In the embodiments of this application, the training device can generate, through the feature acquisition network and according to the feature information of the first training sample, at least one piece of first feature information and one piece of second feature information corresponding to the first training sample. Each piece of first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with one target attribute, and the second feature information corresponding to the first training sample includes features of the information in the first training sample that is not associated with any target attribute. In other words, one goal of training is for the resulting first feature information and second feature information to contain different information.

Further, a technician can determine the target attribute according to the factors that bias the prediction decision information in the current task. As an example, if the trained neural network is used to determine whether to recommend a certain course, and the factors that bias the prediction decision information include the supplier that provides the course, the at least one target attribute used in the training phase of the neural network may include the supplier. If the first training sample is a course, the information in the first training sample that is associated with the target attribute may include a watermark indicating the supplier of the course, the introduction of the supplier on the course cover, or other information. As another example, if the trained neural network is used to determine whether the face in an image has curly hair, and the factors that bias the prediction decision information include the gender of the person in the image, the at least one target attribute used in the training phase of the neural network may include the user's gender. The information in the face image (that is, the first training sample) that is associated with the target attribute may include the image information of the neck region of the face image, which can be used to determine, for example, whether the user has an Adam's apple. It should be understood that these examples are intended only to aid understanding of the two concepts "target attribute" and "information in the training sample that is associated with the target attribute", and are not intended to limit this solution.

Specifically, in one implementation, the feature acquisition network may include at least one second neural network module in one-to-one correspondence with the at least one target attribute, and a third neural network module. After generating the feature information of the first training sample, the training device can obtain, through each second neural network module, one piece of first feature information corresponding to the first training sample from the feature information of the first training sample, and obtain, through the third neural network module, one piece of second feature information corresponding to the first training sample from the feature information of the first training sample.

In another implementation, the feature acquisition network may include a single, integral second neural network module. After generating the feature information of the first training sample, the training device can input the feature information of the first training sample into the second neural network module, and perform a decomposition operation through the second neural network module to obtain at least one piece of first feature information and one piece of second feature information, corresponding to the first training sample, that are output by the second neural network module.

It should be noted that the feature acquisition network may also take other structural forms. The description here is intended only to demonstrate that this solution can be implemented, and is not intended to limit this solution.
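The first implementation, with one module per target attribute plus one module for the remaining information, can be sketched as follows. This is an illustration under assumptions: each module is reduced to a single linear projection, and the class name `FeatureDecomposer`, the attribute name `"supplier"`, and the toy weights are hypothetical.

```python
def linear(h, weights):
    # Apply a weight matrix (list of rows) to the feature vector h.
    return [sum(w * v for w, v in zip(row, h)) for row in weights]

class FeatureDecomposer:
    """One head per target attribute (the "second neural network modules")
    plus one head for attribute-independent information (the "third" module).
    Each head is a plain linear projection here, which is an assumption."""

    def __init__(self, attribute_heads, remainder_head):
        self.attribute_heads = attribute_heads   # dict: attribute name -> weights
        self.remainder_head = remainder_head     # weights of the third module

    def __call__(self, h):
        # One piece of first feature information per target attribute.
        first = {name: linear(h, w) for name, w in self.attribute_heads.items()}
        # One piece of second feature information.
        second = linear(h, self.remainder_head)
        return first, second

decomposer = FeatureDecomposer(
    attribute_heads={"supplier": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]},
    remainder_head=[[0.0, 0.0, 1.0]],
)
first_info, second_info = decomposer([1.0, 2.0, 3.0])
```

The second implementation would replace the separate heads with one module that emits all of these outputs at once; the interface seen by the rest of the training procedure could stay the same.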

303. The training device performs a classification operation according to the first feature information corresponding to the first training sample to obtain first predicted category information. The first predicted category information indicates the predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of the multiple categories corresponding to the target attribute.

In the embodiments of this application, the training device can input the first feature information corresponding to the first training sample into a first classifier, and perform a classification operation through the first classifier to obtain the first predicted category information generated by the first classifier. The first predicted category information indicates the predicted category of the first feature information corresponding to the first training sample; that is, it indicates the predicted category of the information in the first training sample that is associated with the target attribute, and this predicted category is one of the multiple categories corresponding to the target attribute.

As an example, if the neural network is used to determine whether a certain course is recommended and the target attribute is the supplier, the multiple categories corresponding to the target attribute may include supplier A, supplier B, supplier C, supplier D, and so on. As another example, if the neural network is used to determine whether the person in an image has curly hair and the target attribute is gender, the multiple categories corresponding to the target attribute may include male and female. As another example, if the neural network is used to determine whether a certain movie is recommended, the first training sample includes the user's rating of the movie and multiple pieces of attribute information of the movie itself, and the target attribute may include the producer of the movie, in which case the multiple categories corresponding to the target attribute may include producer A, producer B, producer C, producer D, and so on. The examples are not exhaustive.
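Step 303 can be illustrated with a toy first classifier over the supplier categories from the example above. The linear layer followed by softmax is an assumed classifier form, and the weights, biases, and function names are hypothetical values chosen only for illustration.

```python
import math

# Categories corresponding to the target attribute "supplier" (from the example).
CATEGORIES = ["supplier A", "supplier B", "supplier C", "supplier D"]

def first_classifier(first_feature, weights, biases):
    # Linear layer followed by softmax: one probability per category.
    logits = [sum(w * f for w, f in zip(row, first_feature)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_category(probs, categories):
    # The predicted category is the one with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best]

first_feature = [1.0, -1.0]               # toy first feature information
weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
biases = [0.0, 0.0, 0.0, 0.0]
probs = first_classifier(first_feature, weights, biases)
category = predicted_category(probs, CATEGORIES)
```

The probability vector plays the role of the first predicted category information; comparing it with the expected category is what the first loss function term measures.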

304. The training device performs a classification operation according to the second feature information corresponding to the first training sample to obtain fourth prediction decision information corresponding to the first training sample.

In some embodiments of this application, the training device may further input the second feature information corresponding to the first training sample into a second classifier, and generate, through the second classifier, the fourth prediction decision information corresponding to the first training sample.

Here, the trained feature acquisition network is used to perform the target task. The "fourth prediction decision information" is the prediction decision information output by the second classifier, and what it indicates depends on the target task. Further, the trained feature acquisition network can be applied to any of the following scenarios: determining whether to recommend the object that the first training sample points to, determining whether the object in the first training sample is in a target state, or determining whether to approve the request of the applicant that the first training sample points to.

Correspondingly, the "fourth prediction decision information" may indicate any of the following: whether to recommend the object that the first training sample points to, whether the object in the first training sample is in the target state, or whether to approve the request of the applicant that the first training sample points to. The "fourth prediction decision information" may also indicate other types of information.

Further, as examples, the object that the first training sample points to may be news, a course, or another type of object; the target state may be smiling, having curly hair, being attractive, or another type of state; and the applicant's request may be a loan request, a promotion request, or another type of request. These examples are given only to aid understanding of this solution and do not limit it. It should be noted that the content indicated by the "fourth prediction decision information" is determined by the content of the target task; the possibilities are not listed exhaustively here. Providing multiple application scenarios for this method improves the implementation flexibility of this solution.

305. The training device combines the first feature information and the second feature information corresponding to the first training sample to obtain a first combined feature.

In this embodiment of the present application, the training device may combine the first feature information and the second feature information corresponding to the first training sample to obtain the first combined feature. The "combination" operation may be any one or more of the following: concatenation, addition, or another type of operation. The list is not exhaustive.

Specifically, after obtaining the first feature information and the second feature information corresponding to the first training sample, the training device may perform the combination operation on them directly. Alternatively, it may first preprocess the first feature information and/or the second feature information corresponding to the first training sample and then perform the combination operation. The preprocessing may include normalization, processing through an activation function, multiplication by a preset weight value, or another processing method; this is not limited here.
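The combination step, with and without preprocessing, can be illustrated with a small sketch. The function names, the choice of L2 normalization, and the single `second_weight` parameter are assumptions for illustration; the source leaves the concrete operations open.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def combine_features(first, second, mode="concat", normalize=False, second_weight=1.0):
    """Combine first and second feature information into one combined feature.

    mode="concat" concatenates the two vectors; mode="add" sums them element-wise.
    Optional preprocessing: L2 normalization and multiplication of the second
    feature by a preset weight value (both hypothetical choices).
    """
    if normalize:
        first, second = l2_normalize(first), l2_normalize(second)
    second = [second_weight * v for v in second]
    if mode == "concat":
        return list(first) + list(second)
    if mode == "add":
        if len(first) != len(second):
            raise ValueError("element-wise addition needs vectors of equal length")
        return [a + b for a, b in zip(first, second)]
    raise ValueError(f"unknown combination mode: {mode}")

concatenated = combine_features([1.0, 2.0], [3.0, 4.0])
added = combine_features([1.0, 2.0], [3.0, 4.0], mode="add")
normalized = combine_features([3.0, 4.0], [0.0, 2.0], normalize=True)
```

Concatenation keeps the two kinds of information in separate positions of the combined feature, whereas addition requires both vectors to have the same length.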

306. The training device inputs the first combined feature into the first classification network to obtain the first prediction decision information, corresponding to the first training sample, output by the first classification network.

In this embodiment of the present application, the training device may input the first combined feature into the first classification network to obtain the first prediction decision information, corresponding to the first training sample, output by the first classification network. The meaning of the "first prediction decision information" is similar to that of the "fourth prediction decision information"; the difference is that the "first prediction decision information" is generated by the first classification network, whereas the "fourth prediction decision information" is generated by the second classifier.

307. The training device trains the feature acquisition network according to a first loss function. The first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the first predicted category information and the first expected category information, where the first expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.

In this embodiment of the present application, the training device may iteratively train the feature acquisition network and the first classifier according to the first loss function until a convergence condition is met, thereby obtaining the trained feature acquisition network. The convergence condition may be that the number of training iterations reaches a preset number, that the convergence condition of the first loss function is satisfied, or another convergence condition; the list is not exhaustive.
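The two convergence conditions named above can be sketched as a generic training loop. The helper `train_until_converged`, the tolerance-based loss criterion, and the toy quadratic objective are all assumptions for illustration; the source does not say how loss convergence is measured.

```python
def train_until_converged(train_step, max_iterations=1000, tolerance=1e-6):
    """Call train_step() until one of two convergence conditions holds:

    1. the number of iterations reaches the preset number, or
    2. the loss changes by less than `tolerance` between two iterations
       (one possible reading of "the loss function has converged").

    Returns (iterations_run, last_loss).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    previous_loss = None
    loss = None
    for iteration in range(1, max_iterations + 1):
        loss = train_step()
        if previous_loss is not None and abs(previous_loss - loss) < tolerance:
            return iteration, loss
        previous_loss = loss
    return max_iterations, loss

def make_toy_step(learning_rate=0.1):
    """Toy stand-in for one training iteration: minimize (w - 3)^2."""
    state = {"w": 0.0}
    def step():
        gradient = 2.0 * (state["w"] - 3.0)
        state["w"] -= learning_rate * gradient
        return (state["w"] - 3.0) ** 2
    return step, state

step, state = make_toy_step()
iterations, final_loss = train_until_converged(step)
```

In the scheme described here, `train_step` would compute the first loss function on a batch and update the weight parameters of the networks being trained.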

The first loss function may include a first loss function term and a second loss function term. The first loss function term indicates the similarity between the first predicted category information and the first expected category information, where the first expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute. The purpose of training with the first loss function term includes increasing the similarity between the first predicted category information and the first expected category information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.

Further, the first loss function term may use the cosine similarity, L1 similarity, L2 similarity, or another type of similarity between the first predicted category information and the first expected category information. Alternatively, the first loss function term may be obtained from the Euclidean distance, cosine distance, Mahalanobis distance, or another type of distance between the first predicted category information and the first expected category information; the greater the distance between the two, the smaller the similarity between them. It should be noted that these examples of the first loss function term are given only to aid understanding and do not limit this solution.
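A few of the measures named above can be written out directly. The sketch below assumes that the predicted and expected category information are represented as equal-length vectors (a probability vector and a one-hot vector); that representation is an assumption, not something the source fixes.

```python
import math

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def l1_distance(a, b):
    """Sum of absolute differences; larger means less similar."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance; larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

expected = [1.0, 0.0, 0.0]        # one-hot vector for the correct category
good_prediction = [0.7, 0.2, 0.1]
poor_prediction = [0.1, 0.2, 0.7]

good_cos = cosine_similarity(good_prediction, expected)
poor_cos = cosine_similarity(poor_prediction, expected)
good_l2 = l2_distance(good_prediction, expected)
poor_l2 = l2_distance(poor_prediction, expected)
good_l1 = l1_distance(good_prediction, expected)
```

The better prediction has the higher cosine similarity and the smaller distance, which matches the inverse relation between distance and similarity stated above.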

Specifically, steps 304 to 306 are all optional. If none of steps 304 to 306 is performed, the trained feature acquisition network can be obtained after steps 301 to 303 and step 307 have been repeated multiple times, and the trained feature acquisition network can then be deployed on the execution device.

If steps 304 to 306 are performed, step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier, the second classifier, and the first classification network according to the first loss function until the convergence condition is met, thereby obtaining the trained feature acquisition network and the trained first classification network.

More specifically, after generating the function value of the first loss function, the training device computes the gradient of that function value and back-propagates it to update the weight parameters of the feature acquisition network, the first classifier, the second classifier, and the first classification network, thereby completing one training iteration of the feature acquisition network and the first classification network.

Here, the first loss function may further include a third loss function term and a sixth loss function term. The third loss function term indicates the similarity between the first prediction decision information and the expected decision information corresponding to the first training sample. The sixth loss function term indicates the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample. Both of these similarities can be calculated in the same way as the similarity between the first predicted category information and the first expected category information, which is not repeated here.

In this implementation, the second loss function term may directly compute the similarity between the first feature information and the second feature information. Alternatively, the second loss function term may use the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term. The similarity between these two gradients can be calculated in the same way as the similarity between the first predicted category information and the first expected category information, which is not repeated here.

To aid understanding of this solution, the calculation formula of the first loss function is shown below, taking as an example the case where the similarity between the two gradients is their cosine similarity.
Here, $L_1$ denotes the first loss function; $n$ indicates that the training device obtains n training samples (that is, one batch of training samples) from the training data set to train the feature acquisition network; $g$ denotes the first classification network; $y_i$ denotes the expected decision information corresponding to the first training sample; $L_i$ denotes the third loss function term, that is, the similarity between the first prediction decision information output by the first classification network and $y_i$; $g_y$ denotes the second classifier; $g_a$ denotes the first classifier; $a_i$ denotes the correct category of the information in the training sample that is associated with the target attribute (that is, the first expected category information); and $\beta_1$, $\beta_2$ and $\beta_3$ denote three weight values. The remaining symbols in the formula denote the following quantities: the second feature information (that is, the features of the information in the training sample that has no association with the sensitive attribute); the first feature information (that is, the features of the information in the training sample that is associated with the sensitive attribute); the sixth loss function term, that is, the similarity between the fourth prediction decision information and the expected decision information corresponding to the first training sample; the first loss function term, which indicates the similarity between the first predicted category information and $a_i$; the gradient corresponding to the sixth loss function term; the gradient corresponding to the first loss function term; and the second loss function term, which in this example is the cosine similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term. It should be understood that this example of a specific implementation of the first loss function is given only to aid understanding and does not limit this solution.
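For orientation only, one form that is consistent with the symbol definitions above is sketched here. The notation $z_i^{a}$ (first feature information), $z_i^{y}$ (second feature information), $L_i^{y}$ (sixth loss function term) and $L_i^{a}$ (first loss function term) is introduced for this sketch, and the averaging over the batch, the assignment of $\beta_1$, $\beta_2$, $\beta_3$ to particular terms, the use of concatenation inside $g$, and the variable with respect to which the gradients are taken are all assumptions rather than the formula of the source.

```latex
L_1 = \frac{1}{n}\sum_{i=1}^{n}\Big[
      L_i\big(g([z_i^{y};\, z_i^{a}]),\, y_i\big)
    + \beta_1\, L_i^{y}\big(g_y(z_i^{y}),\, y_i\big)
    + \beta_2\, L_i^{a}\big(g_a(z_i^{a}),\, a_i\big)
    + \beta_3\, \cos\!\big(\nabla L_i^{y},\, \nabla L_i^{a}\big)
\Big],
\qquad
\cos(u, v) = \frac{u^{\top} v}{\lVert u\rVert_2\,\lVert v\rVert_2}
```

Under this reading, minimizing the last term pushes the two gradients away from pointing in the same direction, which is one way to realize the stated purpose of the second loss function term.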

If step 304 is performed and steps 305 and 306 are not, step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier, and the second classifier according to the first loss function until the convergence condition is met, thereby obtaining the trained feature acquisition network.

Here, the first loss function may include the first loss function term, the second loss function term, and the sixth loss function term. Further, the second loss function term may directly compute the similarity between the first feature information and the second feature information, or it may compute the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.

In this embodiment of the present application, the gradient corresponding to the first loss function term serves to make the obtained first feature information reflect more accurately the features of the information associated with the target attribute. The gradient corresponding to the sixth loss function term serves to make the obtained second feature information exclude the interference of the information in the first training sample that is associated with the target attribute, so that more accurate prediction decision information can be generated. Having the second loss function term use the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term can further improve the efficiency of updating the weight parameters of the feature acquisition network, and helps improve the accuracy of the first feature information and the second feature information generated by the trained feature acquisition network.

If step 304 is not performed and steps 305 and 306 are, step 307 may include: the training device iteratively trains the feature acquisition network, the first classifier, and the first classification network according to the first loss function until the convergence condition is met, thereby obtaining the trained feature acquisition network and the trained first classification network.

Here, the first loss function may include the first loss function term, the second loss function term, and the third loss function term. In this implementation, the second loss function term may compute the similarity between the first feature information and the second feature information.

In this embodiment of the present application, the purpose of training the feature acquisition network includes not only accurately obtaining, from the feature information of the first training sample, the features of the information associated with the target attribute. The first feature information and the second feature information are also combined to obtain the first combined feature, and a third loss function term is introduced. The purpose of training with the third loss function term includes improving the accuracy of the prediction decision information obtained from the first combined feature, which helps further improve the accuracy of the prediction decision information obtained in the inference stage.

For a more intuitive understanding of this solution, refer to Figure 4 and Figure 5. Figure 4 is a schematic flowchart of the neural network training method provided by an embodiment of the present application, and Figure 5 is a schematic comparison of the first feature information and the second feature information in that training method. Refer to Figure 4 first. Figure 4 takes as an example the case where the prediction decision information output by the neural network indicates whether the person in an image is smiling. The training device obtains one batch of first training samples (four first training samples in Figure 4) and the expected decision information corresponding to each first training sample (in Figure 4, the first sample is not smiling and the last three are). The training device inputs the four first training samples one by one into the feature extraction network to obtain the feature information of each first training sample, and performs a decomposition operation on that feature information to obtain the first feature information and the second feature information corresponding to each first training sample. It then concatenates the first feature information and the second feature information corresponding to each first training sample to obtain the first combined feature corresponding to that sample. The training device inputs the first combined feature corresponding to each first training sample into the first classification network to obtain the first prediction decision information corresponding to each first training sample.

The training device may also perform a classification operation based on the first feature information corresponding to each first training sample to obtain the first predicted category information corresponding to each piece of first feature information. Based on the first predicted category information corresponding to each piece of first feature information, the first expected category information corresponding to each piece of first feature information, the second feature information corresponding to each first training sample, the first prediction decision information corresponding to each first training sample, and the expected decision information corresponding to each first training sample, the training device generates the function value of the first loss function, computes the gradient of that function value, and back-propagates it to update the weight parameters of the feature extraction network and the first classification network, thereby training the feature extraction network and the first classification network over multiple iterations. It should be understood that the example in Figure 4 is given only to aid understanding and does not limit this solution.

Refer next to Figure 5, which shows images obtained by visualizing the first feature information and the second feature information. Figure 5 takes as an example the case where the prediction decision information output by the neural network indicates whether the person in the image has curly hair and the target attribute is gender. As shown in Figure 5, the second feature information corresponding to the first training sample (that is, the features of the information in the first training sample that has no association with the target attribute) mostly carries feature information of the hair region of the image. The first feature information corresponding to the first training sample (that is, the features of the information in the first training sample that is associated with the target attribute) carries little feature information of the hair region and mostly carries the features of the gender-related information in the image. Comparing the second feature information and the first feature information corresponding to the first training sample shows that the two attend to clearly different regions. It should be understood that the example in Figure 5 is given only to aid understanding and does not limit this solution.

308. The training device generates second feature information corresponding to a second training sample through the trained feature acquisition network.

In some embodiments of the present application, after iteratively training the feature acquisition network according to the first loss function, the training device obtains the trained feature acquisition network. The training device may then obtain a second training sample from the training data set and generate the second feature information corresponding to the second training sample through the trained feature acquisition network (that is, the trained feature acquisition network obtained in steps 301 to 307). Optionally, the training device may generate both the first feature information and the second feature information corresponding to the second training sample.

For the specific implementation of the above step, refer to the description of step 302. The concept of "the first feature information and second feature information corresponding to the second training sample" is similar to that of "the first feature information and second feature information corresponding to the first training sample"; the difference is that the first training sample and the second training sample are different training samples.

Optionally, after obtaining the second feature information corresponding to the second training sample, the training device may also perform a classification operation based on that second feature information to obtain fifth prediction decision information corresponding to the second training sample. The concept of "the fifth prediction decision information corresponding to the second training sample" is similar to that of "the fourth prediction decision information corresponding to the first training sample". For how the "fifth prediction decision information" is obtained, refer to the description above of how the "fourth prediction decision information" is obtained, which is not repeated here.

309. The training device obtains third feature information. The third feature information has the same data size as the first feature information corresponding to the second training sample but different data content.

In some embodiments of the present application, the training device may also obtain third feature information, where the third feature information has the same data size as the first feature information corresponding to the second training sample but different data content.

Further, if the first feature information takes the form of an N-dimensional tensor, the third feature information is also an N-dimensional tensor, and the third feature information and the first feature information have the same length in each of the N dimensions. As an example, if the first feature information is a vector, the third feature information is also a vector of the same length; if the first feature information is a matrix, the third feature information is also a matrix with the same length and width. The examples are not exhaustive.

That the third feature information and the first feature information corresponding to the second training sample have different data content means that their data content is not completely identical; that is, it is sufficient that at least some of the data differs between the third feature information and the first feature information corresponding to the second training sample.

Specifically, in one implementation, the training device may obtain the first feature information corresponding to the second training sample and compute a weighted sum of that first feature information and disturbance information to obtain the third feature information. The weight value of the disturbance information may be variable or fixed.

Further, the disturbance information may be information randomly generated by the training device, may include the gradient corresponding to the first loss function term, or may be obtained in another way; the list is not exhaustive.

For a more intuitive understanding of this solution, an example of the calculation formula for the third feature information is given below:

Here, the symbols of the formula denote, in turn, the third feature information, the first feature information, and the disturbance information. α1 denotes the weight of the disturbance information and is sampled uniformly from [0, λ]. The formula also contains symbols for the gradient corresponding to the first loss function term, for the first loss function term itself, for the predicted category information that the first classifier outputs for the first feature information, and for the 2-norm of the gradient corresponding to the first loss function term; a_i denotes the expected category information corresponding to the first feature information (that is, the expected category information of the information in the second training sample that is associated with the target attribute). Further, λ may be fixed or adjustable. It should be understood that the example in formula (6) is given only to aid understanding and does not limit this solution.
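A form consistent with the definitions above can be written as follows. The symbols $\hat{z}^{a}$ (third feature information), $z^{a}$ (first feature information), $\delta$ (disturbance information) and $L^{a}$ (first loss function term) are introduced for this sketch; taking the gradient with respect to the first feature information, dividing it by its 2-norm, and giving the first feature information a weight of 1 are assumptions.

```latex
\hat{z}^{a} = z^{a} + \alpha_1\,\delta,
\qquad
\delta = \frac{\nabla_{z^{a}} L^{a}\big(g_a(z^{a}),\, a_i\big)}
              {\big\lVert \nabla_{z^{a}} L^{a}\big(g_a(z^{a}),\, a_i\big) \big\rVert_2},
\qquad
\alpha_1 \sim U[0, \lambda]
```

Under these assumptions the disturbance has unit length, so $\lambda$ bounds how far the third feature information can move away from the first feature information, and both always have the same data size.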

In another implementation, the training device may obtain first feature information from a third training sample and use the first feature information corresponding to the third training sample as the third feature information corresponding to the second training sample. The correct category of the information associated with the target attribute in the third training sample differs from the correct category of the information associated with the target attribute in the second training sample.

在另一种实现方式中,训练设备可以随机获取一个第三特征信息,第二训练样本所对应的第一特征信息与前述随机获取的第三特征信息的数据尺寸相同。需要说明的是,训练设备还可以采用其他方式得到第三特征信息,此处不做穷举。In another implementation, the training device can randomly obtain a piece of third feature information, and the first feature information corresponding to the second training sample has the same data size as the aforementioned randomly obtained third feature information. It should be noted that the training device can also obtain the third feature information in other ways, and this list will not be exhaustive here.
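The two alternative ways of obtaining the third feature information can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and drawing the random feature from a standard normal distribution is an assumption, since the text only requires that the sizes match.

```python
import random


def third_feature_from_other_sample(index, first_features, attribute_labels,
                                    rng=random):
    """Borrow the first feature of a sample whose target-attribute category differs."""
    candidates = [j for j, label in enumerate(attribute_labels)
                  if label != attribute_labels[index]]
    if not candidates:
        raise ValueError("no sample with a different target-attribute category")
    return list(first_features[rng.choice(candidates)])


def random_third_feature(first_feature, rng=random):
    """Draw a random vector with the same size as the first feature."""
    return [rng.gauss(0.0, 1.0) for _ in first_feature]


# Usage: sample 0 has category 0, so only sample 1 (category 1) is eligible.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]
borrowed = third_feature_from_other_sample(0, features, labels, random.Random(1))
sampled = random_third_feature(features[0], random.Random(2))
```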

Optionally, after obtaining the first feature information corresponding to the second training sample, the training device may also perform a classification operation on that first feature information to obtain second predicted category information, which indicates the predicted category of the first feature information corresponding to the second training sample. The concept of "second predicted category information" is similar to that of "first predicted category information". For how the second predicted category information is obtained, refer to the description of how the first predicted category information is obtained in the steps above. Details are not repeated here.

310. The training device combines the second feature information corresponding to the second training sample with the third feature information to obtain a second combined feature.

311. The training device inputs the second combined feature into the trained first classification network to obtain second predicted decision information, corresponding to the second training sample, output by the trained first classification network.

In the embodiments of this application, for the specific implementation of steps 310 and 311, refer to the descriptions of steps 305 and 306 above. The differences are that "the second feature information corresponding to the first training sample" in steps 305 and 306 is replaced with "the second feature information corresponding to the second training sample" in steps 310 and 311, and that "the first feature information corresponding to the first training sample" in steps 305 and 306 is replaced with "the third feature information" in steps 310 and 311. The meaning of "second predicted decision information" is similar to that of "fourth predicted decision information" above. Details are not repeated here.

312. The training device inputs the second combined feature into the second classification network to obtain third predicted decision information, corresponding to the second training sample, output by the second classification network.

In some embodiments of this application, the training device may input the second combined feature into the second classification network to obtain the third predicted decision information, corresponding to the second training sample, output by the second classification network. The meaning of "third predicted decision information" is similar to that of "fourth predicted decision information" above. Details are not repeated here.
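Steps 310 to 312 can be illustrated with a small Python sketch in which the second combined feature is built once and then passed to both classification networks. The concatenation order, the linear scoring function standing in for each network, and the numeric values are assumptions chosen for illustration.

```python
def combine(second_feature, third_feature):
    """Concatenate the two feature vectors into one combined feature (step 310)."""
    return list(second_feature) + list(third_feature)


def linear_score(weights, bias, feature):
    """Stand-in for a classification network: a single linear scoring unit."""
    return sum(w * f for w, f in zip(weights, feature)) + bias


second_feature = [0.2, -0.1]   # features not associated with the target attribute
third_feature = [0.7, 0.4]     # replaces the first feature information

combined = combine(second_feature, third_feature)

# Step 311: the trained first classification network (weights are fixed).
second_prediction = linear_score([1.0, 0.0, 0.5, 0.0], 0.0, combined)
# Step 312: the second classification network (weights are still being trained).
third_prediction = linear_score([0.5, 0.5, 0.0, 1.0], 0.0, combined)
```

Both networks receive exactly the same combined feature, so any difference between the two predictions comes from their weights alone.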

313. The training device trains the second classification network according to a second loss function, where the second loss function includes a fourth loss function term and a fifth loss function term, the fourth loss function term indicates the similarity between the second predicted decision information and the expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.

In some embodiments of this application, the training device may keep the weight parameters of the first classification network unchanged and iteratively train the second classification network according to the second loss function until a convergence condition is met, to obtain a trained second classification network. The trained feature acquisition network and the trained second classification network belong to the same target neural network, which is to be deployed on an execution device to perform a target task. The target task may be any of the following: determining whether to recommend the object indicated by the first training sample, determining whether the object in the first training sample is in a target state, determining whether to approve the request of the applicant indicated by the first training sample, or another type of task.

Optionally, the training device may keep the weight parameters of the first classification network unchanged and iteratively train both the feature acquisition network and the second classification network according to the second loss function until the convergence condition is met, to obtain a trained second classification network and a retrained feature acquisition network, both of which belong to the target neural network.

The second loss function includes at least the fourth loss function term and the fifth loss function term. The fourth loss function term indicates the similarity between the second predicted decision information and the expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.

Further, the fourth loss function term may specifically be the cosine similarity, L1 similarity, L2 similarity, or another type of similarity between the second predicted decision information and the expected decision information corresponding to the second training sample. Alternatively, the fourth loss function term may be obtained based on the Euclidean distance, cosine distance, Mahalanobis distance, or another type of distance between the two. These options are not exhaustively listed here. For the specific form of the fifth loss function term, refer to that of the fourth loss function term. Details are not repeated here.
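A few of the similarity and distance measures named above can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and returning 0.0 for a zero vector in the cosine similarity is a convention chosen here, not something the application specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def l1_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

When a distance is used as a loss term, a smaller value means the predicted decision information is closer to its target.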

Optionally, the second loss function may further include the first loss function term and the second loss function term. For a description of these two terms, refer to the description in step 307. It should be noted that in step 307 both terms are calculated based on the first training sample, whereas in step 313 they are calculated based on the second training sample.

Further optionally, the second loss function may also include a seventh loss function term, which indicates the similarity between fifth predicted decision information and the expected decision information corresponding to the second training sample. In that case, the second loss function term may be the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the seventh loss function term.

To further understand this solution, an example of the calculation formula of the second loss function is given as formula (7).

In formula (7), L2 denotes the second loss function, and n denotes the number of training samples (that is, one batch of training samples) that the training device obtains from the training data set to train the feature acquisition network and the second classification network. The remaining symbols denote, in order: the fourth loss function term; the fifth loss function term; the third feature information; the third predicted decision information, corresponding to the second training sample, output by the second classification network; y_i, the expected decision information corresponding to the second training sample; the second predicted decision information, corresponding to the second training sample, output by the trained first classification network; and γ, a weight value. For the meanings of the other symbols in formula (7), refer to the description of formula (1) above. Details are not repeated here.
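As a rough illustration of how the two named terms might combine, the sketch below averages, over a batch of n samples, the fourth term plus γ times the fifth term. Which term γ multiplies, the use of squared error as the distance, and the omission of the additional terms carried over from formula (1) are all assumptions; the function names are hypothetical.

```python
def squared_error(a, b):
    """Squared difference, used here as a simple stand-in distance."""
    return (a - b) ** 2


def second_loss(second_preds, third_preds, expected, gamma,
                distance=squared_error):
    """Batch average of fourth_term + gamma * fifth_term.

    second_preds: outputs of the trained first classification network
    third_preds:  outputs of the second classification network
    expected:     expected decision information for each sample
    """
    n = len(expected)
    total = 0.0
    for p2, p3, y in zip(second_preds, third_preds, expected):
        fourth_term = distance(p2, y)    # second prediction vs. expected decision
        fifth_term = distance(p2, p3)    # second prediction vs. third prediction
        total += fourth_term + gamma * fifth_term
    return total / n


# Usage with a batch of two samples.
batch_loss = second_loss([1.0, 0.0], [0.5, 0.0], [1.0, 1.0], gamma=2.0)
```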

To understand this solution more intuitively, refer to Figure 6 and Figure 7, which are two schematic flowcharts of the neural network training method provided by the embodiments of this application. Figure 6 mainly shows the process of training, in the second training stage, the feature acquisition network and the second classification network included in the target neural network. Figure 6 takes as an example the case in which the predicted decision information output by the neural network indicates whether the person in an image is smiling. The training device may obtain one batch of second training samples (Figure 4 uses four second training samples as an example) and obtain the expected decision information corresponding to each second training sample (in Figure 4, the first person is not smiling and the other three are). The training device inputs the four second training samples into the feature extraction network one by one to obtain the feature information of each second training sample, and performs a decomposition operation on that feature information to obtain the first feature information and the second feature information corresponding to each second training sample.

The training device combines the first feature information corresponding to each second training sample with the perturbation information to obtain the third feature information corresponding to that sample, and then concatenates the third feature information and the second feature information corresponding to each second training sample to obtain the second combined feature corresponding to that sample.

The training device inputs the second combined feature corresponding to each second training sample into the trained first classification network and the second classification network respectively, to obtain the second predicted decision information, corresponding to the second training sample, output by the trained first classification network, and the third predicted decision information, corresponding to the second training sample, output by the second classification network.

The training device may generate the function value of the second loss function according to the second predicted decision information, the third predicted decision information, and the expected decision information corresponding to each second training sample, compute gradients of that function value by backpropagation, and update the weight parameters of the feature extraction network and the second classification network, thereby completing one round of training of the feature extraction network and the second classification network. It should be noted that for the specific meaning of the second loss function, refer to the description above. The example in Figure 6 is only intended to facilitate understanding of this solution and does not limit it.

In the embodiments of this application, the third feature information has the same data size as the first feature information but different content. Compared with the combined feature obtained by combining the first feature information and the second feature information corresponding to the second training sample, the second combined feature therefore differs in the features of the information associated with the target attribute in the second training sample, while the training objective still includes producing the original expected decision information of the second training sample. In other words, training data that does not actually exist is added in the training stage, and the training objective includes obtaining the expected decision information regardless of the category of the target attribute. This solution increases the diversity of the training data. It also helps reduce the dependence of the trained neural network on the information associated with the target attribute in the input data, so that the network focuses more on information relevant to the task being performed. This helps improve both the accuracy of the output predicted decision information and the fairness of the predicted decision information obtained for the groups indicated by different categories of the target attribute.

In the embodiments of this application, the feature information of the second training sample is first decomposed, and the features of the information associated with the target attribute are then intervened on, yielding a training sample that is counterfactual with respect to the second training sample. The larger the weight of the perturbation information, the lower the similarity between the third feature information and the first feature information, and the more this helps improve the fairness of the resulting predicted decision information. The accuracy and fairness of the predicted decision information can therefore be balanced by adjusting the weight of the perturbation information.
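The relationship between the perturbation weight and the similarity can be seen in a toy case. The sketch below assumes the third feature is the first feature plus the weight times a fixed unit direction orthogonal to it; the vectors and the list of weights are made up for illustration, and other perturbation directions would give different numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


first_feature = [1.0, 0.0]
direction = [0.0, 1.0]   # assumed unit-norm perturbation direction

similarities = []
for alpha in [0.0, 0.5, 1.0, 2.0]:
    third_feature = [f + alpha * d for f, d in zip(first_feature, direction)]
    similarities.append(cosine_similarity(first_feature, third_feature))
```

In this setup the similarity equals 1 / sqrt(1 + alpha^2), so it falls steadily as the weight grows, which matches the trade-off described above.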

To understand this solution more intuitively, refer to Figure 7, which is a schematic flowchart of the neural network training method provided by an embodiment of this application. As shown in Figure 7, the training method can be divided into a first training stage and a second training stage. In the first training stage, the training device inputs the first training sample into the feature extraction network, decomposes the extracted feature information to obtain the first feature information and the second feature information corresponding to the first training sample, and concatenates them to obtain a first combined feature. The training device may input the first combined feature into the first classification network to obtain first predicted decision information, corresponding to the first training sample, output by the first classification network. The training device may then generate the function value of the first loss function from the information obtained in the preceding steps and update the weight parameters of the feature acquisition network and the first classification network. The training device repeats these steps to iteratively train the feature acquisition network and the first classification network, obtaining a trained feature acquisition network and a trained first classification network.

In the second training stage, the training device inputs the second training sample into the feature extraction network and decomposes the extracted feature information to obtain the first feature information and the second feature information corresponding to the second training sample. It computes a weighted sum of the first feature information corresponding to the second training sample and the perturbation information to obtain the third feature information, and concatenates the second feature information corresponding to the second training sample with the third feature information to obtain the second combined feature.

The training device inputs the second combined feature into the second classification network and the trained first classification network respectively, to obtain the third predicted decision information, corresponding to the second training sample, output by the second classification network, and the second predicted decision information, corresponding to the second training sample, output by the trained first classification network. The training device may generate the function value of the second loss function based on this information and, keeping the weight parameters of the first classification network unchanged, update the weight parameters of the feature acquisition network and the second classification network. The training device repeats these steps to iteratively train the feature acquisition network and the second classification network, obtaining a retrained feature acquisition network and a trained second classification network. It should be understood that the example in Figure 7 is only intended to facilitate understanding of this solution and does not limit it.
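The key point of the second training stage, that the first classification network stays frozen while the second classification network is updated, can be sketched with a tiny hand-written gradient step. Everything here is an illustrative assumption: both networks are single linear units, squared error stands in for the similarity measures, the feature acquisition network is treated as fixed so the combined feature is precomputed, and the names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def second_stage_step(w_first, w_second, combined, expected, gamma, lr):
    """One update of the second classifier; the first classifier is read-only."""
    p2 = dot(w_first, combined)    # second prediction (frozen first classifier)
    p3 = dot(w_second, combined)   # third prediction (second classifier)
    loss = (p2 - expected) ** 2 + gamma * (p2 - p3) ** 2
    # Only w_second is updated: d(loss)/d(w_second) = 2 * gamma * (p3 - p2) * x.
    grad = [2.0 * gamma * (p3 - p2) * x for x in combined]
    new_w_second = [w - lr * g for w, g in zip(w_second, grad)]
    return new_w_second, loss


w_first = [0.4, 0.2]      # trained in the first stage, kept unchanged
w_second = [0.0, 0.0]     # trained in the second stage
combined = [1.0, 0.5]     # a second combined feature
losses = []
for _ in range(50):
    w_second, loss = second_stage_step(w_first, w_second, combined,
                                       expected=1.0, gamma=1.0, lr=0.1)
    losses.append(loss)
```

In this simplified setup the term comparing the second prediction with the expected decision does not depend on the second classifier's weights, so the update is driven entirely by the term that pulls the third prediction toward the second prediction.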

In the embodiments of this application, the trained feature acquisition network can separately obtain the features of the information in the input data that is associated with the target attribute and the features of the information in the input data that is not associated with the target attribute. A user can therefore determine the target attribute according to the cause of bias in the predicted decision information for the current task. Extracting the features of the information associated with the bias-causing target attribute separately from the features of the information not associated with it helps reduce the difficulty of the subsequent decision process and helps improve the accuracy of the final predicted decision information.

2. Inference stage

Specifically, refer to Figure 8, which is a schematic flowchart of the data processing method provided by an embodiment of this application. The data processing method may include the following steps:

801. The execution device inputs the data to be processed into the feature acquisition network and performs feature extraction on the data to be processed through the feature acquisition network to obtain the feature information of the data to be processed.

802. The execution device generates first feature information and second feature information through the feature acquisition network according to the feature information of the data to be processed, where the first feature information includes the features of the information in the data to be processed that is associated with the target attribute.

In the embodiments of this application, for the specific implementation of steps 801 and 802, refer to the descriptions of steps 301 and 302 in the embodiment corresponding to Figure 3 above. The difference is that "the first training sample" in steps 301 and 302 is replaced with "the data to be processed" in steps 801 and 802. For the specific meanings of the terms in steps 801 and 802, refer to the description in the embodiment corresponding to Figure 3. Details are not repeated here.

803. The execution device combines the first feature information and the second feature information to obtain a first combined feature.

804. The execution device inputs the first combined feature into the classification network to obtain predicted decision information output by the classification network. The feature acquisition network is trained using a first loss function that includes at least a first loss function term and a second loss function term. The first loss function term indicates the similarity between predicted category information and expected category information, where the predicted category information indicates the predicted category of the information associated with the target attribute in the data input into the feature acquisition network, and the expected category information indicates the correct category of that information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.

In the embodiments of this application, for the specific implementation of steps 803 and 804, refer to the descriptions of steps 305 and 306 in the embodiment corresponding to Figure 3 above. The difference is that "the first training sample" in steps 305 and 306 is replaced with "the data to be processed" in steps 803 and 804. For the specific meanings of the terms in steps 803 and 804, refer to the description in the embodiment corresponding to Figure 3. Details are not repeated here.

The feature acquisition network is trained using the first loss function. For the specific form of the feature acquisition network, refer to the description in the embodiment corresponding to Figure 3. The classification network may be the first classification network in the embodiment corresponding to Figure 3, the second classification network in that embodiment, or a trained classification network obtained in another way. These options are not exhaustively listed here.

Optionally, the feature acquisition network and the classification network are trained using the first loss function and the second loss function. For the meanings of the first loss function and the second loss function, and for the specific meaning of each loss function term in them, refer to the description in the embodiment corresponding to Figure 3. Details are not repeated here.
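Steps 801 to 804 amount to a four-stage pipeline, which the sketch below expresses with plain Python callables. The three toy functions stand in for the trained networks and are purely hypothetical; only the order of operations reflects the text.

```python
def infer(data, extract, split, classify):
    """Run the four inference steps on one input."""
    features = extract(data)                  # step 801: feature extraction
    first, second = split(features)           # step 802: first and second features
    combined = list(first) + list(second)     # step 803: first combined feature
    return classify(combined)                 # step 804: predicted decision


def toy_extract(data):
    """Stand-in feature extractor: doubles every input value."""
    return [2.0 * v for v in data]


def toy_split(features):
    """Stand-in decomposition: first half and second half of the vector."""
    half = len(features) // 2
    return features[:half], features[half:]


def toy_classify(combined):
    """Stand-in classification network: positive decision if the sum is positive."""
    return sum(combined) > 0.0


decision = infer([1.0, -0.25, 0.5, 0.25], toy_extract, toy_split, toy_classify)
```

Passing the networks in as arguments keeps the sketch agnostic about whether the first or the second classification network is used at inference time.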

In the embodiments of this application, the features of the information in the data to be processed that is associated with the target attribute and the features of the information in the data to be processed that is not associated with the target attribute are obtained separately. Because a user can determine the target attribute according to the cause of bias in the predicted decision information for the current task, extracting the features of the information associated with the bias-causing target attribute separately from the features of the information not associated with it helps reduce the difficulty the classification network faces when generating predicted decision information and helps improve the accuracy of the final predicted decision information.

The beneficial effects of the embodiments of this application are next demonstrated with experimental data. Figure 9 is a schematic diagram of the beneficial effects of the neural network training method provided by the embodiments of this application. Figure 9 contains a left sub-diagram and a right sub-diagram. In Figure 9, the target attribute is the producer of a movie, producers are divided into mainstream producers and niche producers according to the number of movies they have produced, and the trained neural network is used to determine whether to recommend a given movie. In both sub-diagrams, one polyline shows the accuracy and fairness of the predicted decision information generated by the trained neural network obtained through the embodiments of this application, and three polylines and one triangle show the accuracy and fairness of the predicted decision information generated by trained neural networks obtained through the control-group methods. A higher score on the accuracy metric means that the generated predicted decision information is more accurate. A lower score on the fairness metric means that the generated predicted decision information is fairer with respect to the target attribute.

Some embodiments of this application include a second training stage, in which third feature information is obtained and combined with the second feature information corresponding to the second training sample to construct training samples that do not actually exist. The rightmost point in the left and right sub-diagrams of Figure 9 represents the accuracy and fairness of the predicted decision information when there is no second training stage. As shown in the figure, without the second training stage, the predicted decision information obtained by the method of the embodiments of this application has the highest accuracy in both sub-diagrams. With the second training stage, the predicted decision information obtained by the method of the embodiments of this application still has the highest accuracy and also performs well on fairness.

On the basis of the embodiments corresponding to Figures 1 to 9, and in order to better implement the above solutions of the embodiments of the present application, related devices for implementing those solutions are provided below. Referring to Figure 10, Figure 10 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application. The neural network training apparatus 1000 includes: a feature extraction module 1001, configured to input a first training sample into a feature acquisition network and perform feature extraction on the first training sample through the feature acquisition network, to obtain feature information of the first training sample; a generation module 1002, configured to generate, through the feature acquisition network and according to the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample, where the first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute; a first classification module 1003, configured to perform a classification operation according to the first feature information corresponding to the first training sample, to obtain predicted category information, where the predicted category information indicates a predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of multiple categories corresponding to the target attribute; and a training module 1004, configured to train the feature acquisition network according to a first loss function, to obtain a trained feature acquisition network. The first loss function includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between the predicted category information and expected category information, where the expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
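As an illustration only, the two terms of the first loss function can be sketched as follows. The text does not fix the concrete measures, so this sketch assumes cross-entropy for the first term and the absolute cosine similarity between the two feature vectors for the second term; the names `first_loss`, `attr_logits`, and `weight` are hypothetical and do not come from the application.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Assumed first term: small when the predicted category matches the expected one."""
    return -math.log(softmax(logits)[target_index])

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def first_loss(attr_logits, attr_label, first_feat, second_feat, weight=1.0):
    """Sum of the two terms; `weight` is a hypothetical balancing factor."""
    term1 = cross_entropy(attr_logits, attr_label)
    # Assumed second term: penalizes similarity between the two feature vectors.
    term2 = abs(cosine_similarity(first_feat, second_feat))
    return term1 + weight * term2
```

Under these assumptions, a pair of orthogonal feature vectors incurs a lower loss than a pair of parallel ones for the same classification output, which is the behavior the second term is meant to encourage.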

In a possible design, the neural network training apparatus 1000 further includes: a combination module, configured to combine the first feature information and the second feature information corresponding to the first training sample, to obtain a first combined feature; and a second classification module, configured to input the first combined feature into a first classification network, to obtain first predicted decision information that corresponds to the first training sample and is output by the first classification network. The training module 1004 is specifically configured to train the feature acquisition network and the first classification network according to the first loss function, where the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first predicted decision information and the expected decision information corresponding to the first training sample.

In a possible design, the neural network training apparatus 1000 further includes an acquisition module, configured to acquire third feature information, where the third feature information and the first feature information corresponding to a second training sample have the same data size but different data content. The combination module is further configured to combine the second feature information corresponding to the second training sample with the third feature information, to obtain a second combined feature. The second classification module is further configured to input the second combined feature into the trained first classification network, to obtain second predicted decision information that corresponds to the second training sample and is output by the trained first classification network. The second classification module is further configured to input the second combined feature into a second classification network, to obtain third predicted decision information that corresponds to the second training sample and is output by the second classification network. The training module 1004 is specifically configured to train the second classification network according to a second loss function, to obtain a trained second classification network, where the trained feature acquisition network and the trained second classification network belong to the same target neural network. The second loss function includes a fourth loss function term and a fifth loss function term. The fourth loss function term indicates the similarity between the second predicted decision information and the expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.

In a possible design, the acquisition module is specifically configured to: generate, through the trained feature acquisition network, the first feature information corresponding to the second training sample; and perform a weighted summation of the first feature information corresponding to the second training sample and perturbation information, to obtain the third feature information, where the weight of the perturbation information is adjustable.
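The weighted summation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the perturbation is drawn from a standard Gaussian, and the first feature information is weighted by one minus the perturbation weight. Neither choice is specified in the text, and the names `make_third_feature` and `perturbation_weight` are hypothetical.

```python
import random

def make_third_feature(first_feat, perturbation_weight=0.5, rng=None):
    """Blend a feature vector with random perturbation of the same size.

    Assumptions (not from the source): Gaussian perturbation, and a
    (1 - w) / w split of the weights between feature and perturbation.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in first_feat]
    return [
        (1.0 - perturbation_weight) * f + perturbation_weight * n
        for f, n in zip(first_feat, noise)
    ]
```

The result has the same data size as the input but different data content whenever the perturbation weight is non-zero, which matches the requirement placed on the third feature information.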

In a possible design, the second classification module is further configured to perform a classification operation according to the second feature information corresponding to the first training sample, to obtain fourth predicted decision information corresponding to the first training sample. The first loss function further includes a sixth loss function term, which indicates the similarity between the fourth predicted decision information and the expected decision information corresponding to the first training sample. The second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.

In a possible design, the neural network training apparatus 1000 is applied to any one of the following scenarios: determining whether to recommend the object indicated by the first training sample, determining whether an object in the first training sample is in a target state, or determining whether to approve a request from the applicant indicated by the first training sample.

It should be noted that the information exchange between, and the execution processes of, the modules/units in the neural network training apparatus 1000 are based on the same concept as the method embodiments corresponding to Figures 2b to 7 of the present application. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.

An embodiment of the present application further provides a data processing apparatus. Referring to Figure 11, Figure 11 is a schematic structural diagram of the data processing apparatus provided by an embodiment of the present application. The data processing apparatus 1100 includes: a feature extraction module 1101, configured to input data to be processed into a feature acquisition network and perform feature extraction on the data to be processed through the feature acquisition network, to obtain feature information of the data to be processed; a generation module 1102, configured to generate first feature information and second feature information through the feature acquisition network according to the feature information of the data to be processed, where the first feature information includes features of the information in the data to be processed that is associated with a target attribute; a combination module 1103, configured to combine the first feature information and the second feature information, to obtain a first combined feature; and a classification module 1104, configured to input the first combined feature into a classification network, to obtain predicted decision information output by the classification network. The feature acquisition network is trained with a first loss function, which includes a first loss function term and a second loss function term. The first loss function term indicates the similarity between predicted category information and expected category information. The predicted category information indicates the predicted category of the information, in the data input into the feature acquisition network, that is associated with the target attribute, and the expected category information indicates the correct category of that information. The purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
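The data flow through modules 1101 to 1104 can be sketched as a simple pipeline. This is only a schematic: the three callables stand in for the trained networks, combining by concatenation is an assumption (the text says only that the two pieces of feature information are combined), and every function name here is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def process(
    sample: Sequence[float],
    extract: Callable[[Sequence[float]], Vector],
    generate: Callable[[Vector], Tuple[Vector, Vector]],
    classify: Callable[[Vector], int],
) -> int:
    """Run one sample through the four stages of the processing apparatus."""
    features = extract(sample)                     # role of module 1101
    first_feat, second_feat = generate(features)   # role of module 1102
    combined = first_feat + second_feat            # role of module 1103 (assumed: concatenation)
    return classify(combined)                      # role of module 1104

# Toy stand-ins for the trained networks, for illustration only.
def toy_extract(sample: Sequence[float]) -> Vector:
    return [float(x) for x in sample]

def toy_generate(features: Vector) -> Tuple[Vector, Vector]:
    half = len(features) // 2
    return features[:half], features[half:]

def toy_classify(combined: Vector) -> int:
    return 1 if sum(combined) > 0 else 0
```

For example, `process([1, -2, 3, 4], toy_extract, toy_generate, toy_classify)` passes the sample through all four stages and returns the toy classifier's decision.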

In a possible design, the first loss function further includes a third loss function term. The third loss function term indicates the similarity between first predicted decision information and the expected decision information corresponding to the data input into the feature acquisition network, where the first predicted decision information indicates the decision corresponding to the data input into the feature acquisition network.

It should be noted that the information exchange between, and the execution processes of, the modules/units in the data processing apparatus 1100 are based on the same concept as the method embodiments corresponding to Figure 8 of the present application. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.

An execution device provided by an embodiment of the present application is described next. Referring to Figure 12, Figure 12 is a schematic structural diagram of the execution device provided by an embodiment of the present application. Specifically, the execution device 1200 includes a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the execution device 1200 may contain one or more processors 1203; Figure 12 uses one processor as an example), where the processor 1203 may include an application processor 12031 and a communication processor 12032. In some embodiments of the present application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected through a bus or in other ways.

The memory 1204 may include read-only memory and random access memory, and provides instructions and data to the processor 1203. A portion of the memory 1204 may also include non-volatile random access memory (NVRAM). The memory 1204 stores processor and operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.

The processor 1203 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all referred to as the bus system in the figure.

The methods disclosed in the above embodiments of the present application may be applied to the processor 1203 or implemented by the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1203 or by instructions in the form of software. The processor 1203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1203 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly performed and completed by a hardware decoding processor, or performed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1204; the processor 1203 reads the information in the memory 1204 and completes the steps of the above methods in combination with its hardware.

The receiver 1201 may be used to receive input numeric or character information, and to generate signal inputs related to the settings and function control of the execution device. The transmitter 1202 may be used to output numeric or character information through a first interface. The transmitter 1202 may also be used to send instructions to a disk group through the first interface, to modify the data in the disk group. The transmitter 1202 may also include a display device such as a display screen.

In this embodiment of the present application, the application processor 12031 in the processor 1203 is configured to execute the data processing method performed by the execution device in the embodiment corresponding to Figure 8. It should be noted that the specific manner in which the application processor 12031 performs each step of the data processing method is based on the same concept as the method embodiments corresponding to Figure 8 of the present application, and brings the same technical effects as those method embodiments. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.

An embodiment of the present application further provides a training device. Referring to Figure 13, Figure 13 is a schematic structural diagram of the training device provided by an embodiment of the present application. Specifically, the training device 1300 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1322 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) that store application programs 1342 or data 1344. The memory 1332 and the storage medium 1330 may provide transient or persistent storage. The program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1322 may be configured to communicate with the storage medium 1330 and to execute, on the training device 1300, the series of instruction operations in the storage medium 1330.

The training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.

In this embodiment of the present application, the central processing unit 1322 is configured to execute the neural network training method performed by the training device in the embodiments corresponding to Figures 2b to 7. It should be noted that the specific manner in which the central processing unit 1322 performs each step of the neural network training method is based on the same concept as the method embodiments corresponding to Figures 2b to 7 of the present application, and brings the same technical effects as those method embodiments. For details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.

An embodiment of the present application further provides a computer program product which, when run on a computer, causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 8, or causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in Figures 2b to 7.

An embodiment of the present application further provides a computer-readable storage medium that stores a program for signal processing. When the program is run on a computer, it causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 8, or causes the computer to perform the steps performed by the training device in the methods described in the embodiments shown in Figures 2b to 7.

The neural network training apparatus, data processing apparatus, execution device, or training device provided by the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip executes the neural network training method described in the embodiments shown in Figures 2b to 7, or so that the chip in the training device executes the neural network training method described in the embodiment shown in Figure 8. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. The storage unit may also be a storage unit that is located outside the chip within the wireless access device, such as read-only memory (ROM) or another type of static storage device that can store static information and instructions, or random access memory (RAM).

Specifically, referring to Figure 14, Figure 14 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may take the form of a neural network processing unit (NPU) 140. The NPU 140 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core of the NPU is the arithmetic circuit 1403; the controller 1404 controls the arithmetic circuit 1403 to fetch matrix data from memory and perform multiplication.

In some implementations, the arithmetic circuit 1403 contains multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array. The arithmetic circuit 1403 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.

For example, assume an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 1401 and performs a matrix operation with matrix B; the partial or final results of the resulting matrix are stored in the accumulator 1408.
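The role of the accumulator in this example can be illustrated in software. The following is a plain sequential sketch, not a model of the hardware: it processes the shared dimension of A and B in slices and adds each partial product into an accumulator matrix, so the accumulator holds partial results until the last slice completes. The function name `matmul_accumulate` and the `tile` parameter are hypothetical.

```python
def matmul_accumulate(a, b, tile=2):
    """Compute C = A x B by accumulating partial products slice by slice."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Plays the role of the accumulator: holds partial, then final, results.
    acc = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        for i in range(rows):
            for j in range(cols):
                # Partial dot product over the current slice of the shared dimension.
                partial = sum(a[i][k] * b[k][j] for k in range(k0, k1))
                acc[i][j] += partial
    return acc
```

After the final slice, the accumulator contains the complete output matrix C.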

The unified memory 1406 is used to store input data and output data. Weight data is transferred directly to the weight memory 1402 through the direct memory access controller (DMAC) 1405. Input data is also transferred to the unified memory 1406 through the DMAC.

The BIU, that is, the bus interface unit 1410, is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1409.

The bus interface unit (BIU) 1410 is used by the instruction fetch buffer 1409 to obtain instructions from external memory, and by the direct memory access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from external memory.

The DMAC is mainly used to transfer input data from the external DDR memory to the unified memory 1406, to transfer weight data to the weight memory 1402, or to transfer input data to the input memory 1401.

The vector calculation unit 1407 includes multiple arithmetic processing units and, where necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for the computation of layers other than convolutional and fully connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.

In some implementations, the vector calculation unit 1407 can store the processed output vector in the unified memory 1406. For example, the vector calculation unit 1407 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 1403, such as linear interpolation of the feature planes extracted by a convolutional layer, or to a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1407 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1403, for example for use in a subsequent layer of the neural network.
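The kind of post-processing attributed to the vector calculation unit can be illustrated with a short software sketch. It assumes, purely for illustration, that a vector of accumulated values is first normalized to zero mean and unit variance and then passed through a ReLU nonlinearity; the text does not prescribe this particular combination, and the function names are hypothetical.

```python
import math

def normalize(values, eps=1e-5):
    """Shift and scale a vector to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(values):
    """Assumed nonlinear function: clamp negative entries to zero."""
    return [max(0.0, v) for v in values]

def vector_unit(accumulated):
    """Turn a vector of accumulated values into activation values."""
    return relu(normalize(accumulated))
```

The resulting activation vector is what would be written back to memory or fed to the next layer as an activation input.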

The instruction fetch buffer 1409, which is connected to the controller 1404, is used to store the instructions used by the controller 1404.

The unified memory 1406, the input memory 1401, the weight memory 1402, and the instruction fetch buffer 1409 are all on-chip memories. The external memory is private to the NPU hardware architecture.

The operations of each layer of the neural networks shown in the above embodiments may be performed by the arithmetic circuit 1403 or the vector calculation unit 1407.

The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling execution of the program of the method of the first aspect.

It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, a connection between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines.

From the above description of the implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is the better implementation in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to execute the methods described in the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can use for storage, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Claims (20)

1. A neural network training method, characterized in that the method comprises: inputting a first training sample into a feature acquisition network, and performing feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample; generating, through the feature acquisition network and according to the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample, wherein the first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute; performing a classification operation according to the first feature information corresponding to the first training sample to obtain predicted category information, wherein the predicted category information indicates a predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of multiple categories corresponding to the target attribute; and training the feature acquisition network according to a first loss function to obtain a trained feature acquisition network; wherein the first loss function includes a first loss function term and a second loss function term, the first loss function term indicates the similarity between the predicted category information and expected category information, the expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
2. The method according to claim 1, characterized in that the method further comprises: combining the first feature information and the second feature information corresponding to the first training sample to obtain a first combined feature; and inputting the first combined feature into a first classification network to obtain first predicted decision information, output by the first classification network, corresponding to the first training sample; wherein training the feature acquisition network according to the first loss function comprises: training the feature acquisition network and the first classification network according to the first loss function, wherein the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first predicted decision information and expected decision information corresponding to the first training sample.
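As an illustration only, the first loss function of claims 1 and 2 might be assembled as in the following Python sketch. The claims do not fix concrete formulas, so every choice here is an assumption: cross-entropy stands in for the first and third loss function terms, squared cosine similarity stands in for the second loss function term, and the names `first_loss`, `w2`, and `w3` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    return -math.log(softmax(logits)[target_index])


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def first_loss(category_logits, expected_category,
               first_feature, second_feature,
               decision_logits, expected_decision,
               w2=1.0, w3=1.0):
    """Hypothetical combination of the three terms (weights are assumptions)."""
    # Term 1: predicted category vs. expected category of the target attribute.
    term1 = cross_entropy(category_logits, expected_category)
    # Term 2: penalize alignment between the first and second feature information.
    term2 = cosine_similarity(first_feature, second_feature) ** 2
    # Term 3: predicted decision vs. expected decision (claim 2).
    term3 = cross_entropy(decision_logits, expected_decision)
    return term1 + w2 * term2 + w3 * term3
```

With identical classification outputs, orthogonal feature vectors yield a smaller loss than parallel ones, which is the behavior the second loss function term is meant to encourage.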
3. The method according to claim 2, characterized in that, after the feature acquisition network and the first classification network are trained to obtain the trained feature acquisition network and a trained first classification network, the method further comprises: obtaining third feature information, and combining the second feature information corresponding to a second training sample with the third feature information to obtain a second combined feature, wherein the third feature information has the same data size as, and different data content from, the first feature information corresponding to the second training sample; inputting the second combined feature into the trained first classification network to obtain second predicted decision information, output by the trained first classification network, corresponding to the second training sample; inputting the second combined feature into a second classification network to obtain third predicted decision information, output by the second classification network, corresponding to the second training sample; and training the second classification network according to a second loss function to obtain a trained second classification network, wherein the trained feature acquisition network and the trained second classification network belong to the same target neural network; wherein the second loss function includes a fourth loss function term and a fifth loss function term, the fourth loss function term indicates the similarity between the second predicted decision information and expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.
4. The method according to claim 3, characterized in that obtaining the third feature information comprises: generating, through the trained feature acquisition network, the first feature information corresponding to the second training sample; and performing a weighted summation of the first feature information corresponding to the second training sample and perturbation information to obtain the third feature information, wherein the weight of the perturbation information is adjustable.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises: performing a classification operation according to the second feature information corresponding to the first training sample to obtain fourth predicted decision information corresponding to the first training sample, wherein the first loss function further includes a sixth loss function term, the sixth loss function term indicates the similarity between the fourth predicted decision information and the expected decision information corresponding to the first training sample, and the second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.
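The perturbation step of claim 4 and the two-term second loss function of claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the labels "teacher" (the trained first classification network) and "student" (the second classification network), the use of negative log-likelihood for the fourth term and KL divergence for the fifth term, and the names `perturb`, `second_loss`, and `w5` are all hypothetical. The claims state only that the second loss function includes both terms; the plain sum below is illustrative.

```python
import math


def perturb(first_feature, perturbation, perturbation_weight, feature_weight=1.0):
    """Weighted sum of a feature vector and a same-sized perturbation vector.

    The result has the same data size as first_feature but different content
    whenever perturbation_weight and the perturbation are non-zero.
    """
    if len(first_feature) != len(perturbation):
        raise ValueError("perturbation must match the feature size")
    return [feature_weight * f + perturbation_weight * n
            for f, n in zip(first_feature, perturbation)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def second_loss(teacher_probs, student_probs, expected_decision, w5=1.0):
    """Hypothetical second loss function built from two terms."""
    # Term 4: teacher's predicted decision vs. the expected decision.
    term4 = -math.log(max(teacher_probs[expected_decision], 1e-12))
    # Term 5: teacher's predicted decision vs. student's predicted decision.
    term5 = kl_divergence(teacher_probs, student_probs)
    return term4 + w5 * term5
```

In this sketch, `perturbation_weight` plays the role of the adjustable weight of the perturbation information; setting it to zero returns the first feature information unchanged.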
6. The method according to any one of claims 1 to 4, characterized in that the method is applied to any one of the following scenarios: determining whether to recommend an object to which the first training sample points, determining whether an object in the first training sample is in a target state, or determining whether to grant a request of an applicant to which the first training sample points.
7. A data processing method, characterized in that the method comprises: inputting data to be processed into a feature acquisition network, and performing feature extraction on the data to be processed through the feature acquisition network to obtain feature information of the data to be processed; generating, through the feature acquisition network and according to the feature information of the data to be processed, first feature information and second feature information, wherein the first feature information includes features of the information in the data to be processed that is associated with a target attribute; combining the first feature information and the second feature information to obtain a first combined feature; and inputting the first combined feature into a classification network to obtain predicted decision information output by the classification network; wherein the feature acquisition network is trained using a first loss function, the first loss function includes a first loss function term and a second loss function term, the first loss function term indicates the similarity between predicted category information and expected category information, the predicted category information indicates a predicted category of the information associated with the target attribute in data input to the feature acquisition network, the expected category information indicates the correct category of the information associated with the target attribute in the data input to the feature acquisition network, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
8. The method according to claim 7, characterized in that the first loss function further includes a third loss function term, the third loss function term indicates the similarity between first predicted decision information and expected decision information corresponding to the data input to the feature acquisition network, and the first predicted decision information indicates a decision corresponding to the data input to the feature acquisition network.
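The inference flow of claim 7 (extract features, generate first and second feature information, combine them, classify) can be sketched with toy linear layers. Everything concrete here is an assumption made for illustration: the layer shapes, the ReLU activation, the use of concatenation as the combining step, and the names `process` and `params` do not come from the claims.

```python
def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, x) for x in vector]


def process(data, params):
    """Return the index of the predicted decision for one input vector."""
    # Feature extraction by the feature acquisition network.
    shared = relu(linear(params["backbone"], data))
    # Two heads of the same network: first feature information (associated
    # with the target attribute) and second feature information.
    first_feature = linear(params["first_head"], shared)
    second_feature = linear(params["second_head"], shared)
    # Combine by concatenation (an assumption; the claim only says "combining").
    combined = first_feature + second_feature
    # Classification network outputs decision scores; take the arg-max.
    scores = linear(params["classifier"], combined)
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical toy parameters, chosen only so the example is easy to trace.
toy_params = {
    "backbone": [[1.0, 0.0], [0.0, 1.0]],
    "first_head": [[1.0, 0.0]],
    "second_head": [[0.0, 1.0]],
    "classifier": [[1.0, 0.0], [0.0, 1.0]],
}
decision = process([3.0, 1.0], toy_params)
```

In a real deployment the parameters would come from the training procedure of claims 1 to 6 rather than being written by hand.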
9. A neural network training apparatus, characterized in that the apparatus comprises: a feature extraction module, configured to input a first training sample into a feature acquisition network and perform feature extraction on the first training sample through the feature acquisition network to obtain feature information of the first training sample; a generation module, configured to generate, through the feature acquisition network and according to the feature information of the first training sample, first feature information and second feature information corresponding to the first training sample, wherein the first feature information corresponding to the first training sample includes features of the information in the first training sample that is associated with a target attribute; a first classification module, configured to perform a classification operation according to the first feature information corresponding to the first training sample to obtain predicted category information, wherein the predicted category information indicates a predicted category of the first feature information corresponding to the first training sample, and the predicted category is one of multiple categories corresponding to the target attribute; and a training module, configured to train the feature acquisition network according to a first loss function to obtain a trained feature acquisition network; wherein the first loss function includes a first loss function term and a second loss function term, the first loss function term indicates the similarity between the predicted category information and expected category information, the expected category information indicates the correct category of the information in the first training sample that is associated with the target attribute, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
10. The apparatus according to claim 9, characterized in that the apparatus further comprises: a combination module, configured to combine the first feature information and the second feature information corresponding to the first training sample to obtain a first combined feature; and a second classification module, configured to input the first combined feature into a first classification network to obtain first predicted decision information, output by the first classification network, corresponding to the first training sample; wherein the training module is specifically configured to train the feature acquisition network and the first classification network according to the first loss function, the first loss function further includes a third loss function term, and the third loss function term indicates the similarity between the first predicted decision information and expected decision information corresponding to the first training sample.
11. The apparatus according to claim 10, characterized in that the apparatus further comprises: an acquisition module, configured to obtain third feature information, wherein the third feature information has the same data size as, and different data content from, the first feature information corresponding to a second training sample; wherein the combination module is further configured to combine the second feature information corresponding to the second training sample with the third feature information to obtain a second combined feature; the second classification module is further configured to input the second combined feature into the trained first classification network to obtain second predicted decision information, output by the trained first classification network, corresponding to the second training sample; the second classification module is further configured to input the second combined feature into a second classification network to obtain third predicted decision information, output by the second classification network, corresponding to the second training sample; and the training module is specifically configured to train the second classification network according to a second loss function to obtain a trained second classification network, wherein the trained feature acquisition network and the trained second classification network belong to the same target neural network; wherein the second loss function includes a fourth loss function term and a fifth loss function term, the fourth loss function term indicates the similarity between the second predicted decision information and expected decision information corresponding to the second training sample, and the fifth loss function term indicates the similarity between the second predicted decision information and the third predicted decision information.
12. The apparatus according to claim 11, characterized in that the acquisition module is specifically configured to: generate, through the trained feature acquisition network, the first feature information corresponding to the second training sample; and perform a weighted summation of the first feature information corresponding to the second training sample and perturbation information to obtain the third feature information, wherein the weight of the perturbation information is adjustable.
13. The apparatus according to any one of claims 9 to 12, characterized in that the second classification module is further configured to perform a classification operation according to the second feature information corresponding to the first training sample to obtain fourth predicted decision information corresponding to the first training sample, wherein the first loss function further includes a sixth loss function term, the sixth loss function term indicates the similarity between the fourth predicted decision information and the expected decision information corresponding to the first training sample, and the second loss function term indicates the similarity between the gradient corresponding to the first loss function term and the gradient corresponding to the sixth loss function term.
14. The apparatus according to any one of claims 9 to 12, characterized in that the apparatus is applied to any one of the following scenarios: determining whether to recommend an object to which the first training sample points, determining whether an object in the first training sample is in a target state, or determining whether to grant a request of an applicant to which the first training sample points.
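The gradient-based form of the second loss function term recited in claims 5 and 13 can be illustrated as below. The claims do not say with respect to which quantity the gradients are taken or how their similarity is measured, so this sketch makes several assumptions: both classification heads are linear softmax classifiers reading one shared vector, gradients are taken with respect to that shared vector, cross-entropy stands in for the first and sixth loss function terms, and squared cosine similarity is the similarity measure. All function and parameter names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_ce_wrt_input(weights, x, target_index):
    """Gradient of cross-entropy(softmax(W x), target) with respect to x.

    For a linear softmax classifier this is W^T (p - y), where p is the
    softmax output and y is the one-hot target.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    residual = [p - (1.0 if k == target_index else 0.0)
                for k, p in enumerate(probs)]
    return [sum(residual[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(x))]


def cosine_similarity(a, b, eps=1e-12):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def gradient_similarity_term(category_head, decision_head, shared,
                             expected_category, expected_decision):
    """Squared cosine similarity between the gradients of the two terms."""
    g1 = grad_ce_wrt_input(category_head, shared, expected_category)
    g6 = grad_ce_wrt_input(decision_head, shared, expected_decision)
    return cosine_similarity(g1, g6) ** 2
```

Under these assumptions the term is close to zero when the two heads depend on disjoint coordinates of the shared vector, and close to one when they respond to the input in the same direction.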
15. A data processing apparatus, characterized in that the apparatus comprises: a feature extraction module, configured to input data to be processed into a feature acquisition network and perform feature extraction on the data to be processed through the feature acquisition network to obtain feature information of the data to be processed; a generation module, configured to generate, through the feature acquisition network and according to the feature information of the data to be processed, first feature information and second feature information, wherein the first feature information includes features of the information in the data to be processed that is associated with a target attribute; a combination module, configured to combine the first feature information and the second feature information to obtain a first combined feature; and a classification module, configured to input the first combined feature into a classification network to obtain predicted decision information output by the classification network; wherein the feature acquisition network is trained using a first loss function, the first loss function includes a first loss function term and a second loss function term, the first loss function term indicates the similarity between predicted category information and expected category information, the predicted category information indicates a predicted category of the information associated with the target attribute in data input to the feature acquisition network, the expected category information indicates the correct category of the information associated with the target attribute in the data input to the feature acquisition network, and the purpose of training with the second loss function term includes reducing the similarity between the first feature information and the second feature information.
16. The apparatus according to claim 15, characterized in that the first loss function further includes a third loss function term, the third loss function term indicates the similarity between first predicted decision information and expected decision information corresponding to the data input to the feature acquisition network, and the first predicted decision information indicates a decision corresponding to the data input to the feature acquisition network.
17. A computer program product, characterized in that the computer program product includes a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 6, or causes the computer to perform the method according to claim 7 or 8.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 6, or causes the computer to perform the method according to claim 7 or 8.
19. A training device, characterized by comprising a processor and a memory, the processor being coupled to the memory, wherein the memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the training device performs the method according to any one of claims 1 to 6.
20. An execution device, characterized by comprising a processor and a memory, the processor being coupled to the memory, wherein the memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the execution device performs the method according to claim 7 or 8.
PCT/CN2023/094166 2022-05-31 2023-05-15 Neural network training method, data processing method, and device Ceased WO2023231753A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210613415.0 2022-05-31
CN202210613415.0A CN115081615A (en) 2022-05-31 2022-05-31 Neural network training method, data processing method and equipment

Publications (1)

Publication Number Publication Date
WO2023231753A1 true WO2023231753A1 (en) 2023-12-07

Family

ID=83248249

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094166 Ceased WO2023231753A1 (en) 2022-05-31 2023-05-15 Neural network training method, data processing method, and device

Country Status (2)

Country Link
CN (1) CN115081615A (en)
WO (1) WO2023231753A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117850261A (en) * 2024-03-04 2024-04-09 深圳松诺技术有限公司 Intelligent switch sensor data analysis method and system
CN119046781A (en) * 2024-10-30 2024-11-29 北京谛声科技有限责任公司 Rolling bearing fault diagnosis method and device
CN119806028A (en) * 2025-03-13 2025-04-11 中北大学 An intelligent control method for microchannel continuous crystallizer

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081615A (en) * 2022-05-31 2022-09-20 华为技术有限公司 Neural network training method, data processing method and equipment
CN116882512A (en) * 2023-05-30 2023-10-13 华为技术有限公司 Data processing method, training method of model and related equipment
CN118429003B (en) * 2024-07-04 2024-10-01 浙江鸟潮供应链管理有限公司 Merchant decision prediction method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269149A (en) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 Living body face image detection method and device, computer equipment and storage medium
CN113449799A (en) * 2021-06-30 2021-09-28 上海西井信息科技有限公司 Target detection and classification method, system, device and storage medium
CN113869366A (en) * 2021-08-27 2021-12-31 深延科技(北京)有限公司 Model training method, relationship classification method, retrieval method and related device
CN115081615A (en) * 2022-05-31 2022-09-20 华为技术有限公司 Neural network training method, data processing method and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695596A (en) * 2020-04-30 2020-09-22 华为技术有限公司 Neural network for image processing and related equipment
CN112183747B (en) * 2020-09-29 2024-07-02 华为技术有限公司 Neural network training method, neural network compression method and related equipment
CN113627421B (en) * 2021-06-30 2024-09-06 华为技术有限公司 An image processing method, a model training method and related equipment

Also Published As

Publication number Publication date
CN115081615A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2023231753A1 (en) Neural network training method, data processing method, and device
EP3757905A1 (en) Deep neural network training method and apparatus
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN115757692A (en) Data processing method and device
CN113361593B (en) Methods, roadside equipment and cloud control platform for generating image classification models
WO2021218471A1 (en) Neural network for image processing and related device
WO2021238333A1 (en) Text processing network, neural network training method, and related device
CN111414915A (en) A text recognition method and related equipment
WO2022001724A1 (en) Data processing method and device
US20250225398A1 (en) Data processing method and related apparatus
CN111566646A (en) Electronic device for obfuscating and decoding data and method for controlling the same
US20240232575A1 (en) Neural network obtaining method, data processing method, and related device
CN113159315A (en) Neural network training method, data processing method and related equipment
WO2023231954A1 (en) Data denoising method and related device
CN112529149A (en) Data processing method and related device
CN112069412B (en) Information recommendation method, device, computer equipment and storage medium
WO2022052647A1 (en) Data processing method, neural network training method, and related device
WO2023185925A1 (en) Data processing method and related apparatus
CN113065634B (en) Image processing method, neural network training method and related equipment
WO2024179485A1 (en) Image processing method and related device thereof
CN113627421A (en) Image processing method, model training method and related equipment
CN115131593B (en) Data processing method, neural network training method and related equipment
WO2023197910A1 (en) User behavior prediction method and related device thereof
WO2023045949A1 (en) Model training method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23814938

Country of ref document: EP

Kind code of ref document: A1