CN115130536A - Training method of feature extraction model, data processing method, device and equipment

Info

Publication number: CN115130536A
Application number: CN202210369228.2A
Authority: CN
Inventors: 李文豪
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-08
Filing date: 2022-04-08
Publication date: 2022-09-30

Abstract

Embodiments of the present application provide a training method for a feature extraction model, a data processing method, an apparatus and a device, and relate to the technical fields of artificial intelligence, payment security, finance and cloud technology. The training method comprises: acquiring a training set, and constructing a plurality of positive sample pairs and a plurality of negative sample pairs based on the training set, wherein each positive sample pair comprises two training samples of the same category and each negative sample pair comprises two training samples of different categories; and repeatedly performing a training operation on a neural network model based on the training set until a preset condition is met, to obtain a trained feature extraction model. In each training operation, if the preset condition is not met, a plurality of new sample pairs are determined based on the similarity between the feature vectors of the training samples output by the model, and the new sample pairs are used as the sample pairs on which the subsequent training operation is based. The training method provided by the present application can effectively improve the performance of the feature extraction model.

Description

Training method, data processing method, device and equipment for feature extraction model

Technical Field

The present application relates to the fields of artificial intelligence, payment security, finance and cloud technology. Specifically, embodiments of the present application relate to a training method for a feature extraction model, a data processing method, an apparatus and a device.

Background

With the rapid development of artificial intelligence technology, more and more artificial intelligence techniques are being applied in various fields, and data processing based on artificial intelligence has been adopted in a growing number of scenarios. Data processing based on neural network models is one of the most important branches.

In data processing based on a neural network model, feature extraction from the model's input data is usually indispensable, and the expressive power of the extracted features is a key factor in the quality of the model's results. Obtaining a high-performance feature extraction model through training is therefore an important research topic in artificial intelligence. In the prior art, a variety of model training methods have been proposed to improve model performance. Although models trained with some existing methods can meet application requirements to a certain extent, their performance still needs to be improved. In particular, when a model is applied to a specific task (for example, a classification task in a specific scenario), the expressive power of the features extracted by the model still needs improvement.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a training method that can effectively improve the performance of a feature extraction model, as well as a data processing method, an apparatus, an electronic device and a computer-readable storage medium based on the training method. To achieve this purpose, the technical solutions provided in the embodiments of the present application are as follows:

In one aspect, an embodiment of the present application provides a training method for a feature extraction model, the method comprising:

acquiring a training set, the training set comprising training samples of a plurality of categories;

constructing a plurality of sample pairs based on the training set, the plurality of sample pairs comprising a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises two training samples belonging to the same category, and each negative sample pair comprises two training samples of different categories;

repeatedly performing a training operation on a neural network model based on the training set until a preset condition is met, and using the neural network model that meets the preset condition as the trained feature extraction model, wherein the preset condition comprises the total training loss of the neural network model converging or the number of training operations reaching a set number, and the training operation comprises:

inputting each training sample in the plurality of sample pairs into the neural network model to obtain a feature vector of each training sample;

determining the total training loss based on a first similarity between the feature vectors of the training samples in each sample pair;

if the total training loss has not converged and the number of training operations has not reached the set number, adjusting the model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the training samples, and using the plurality of new sample pairs as the plurality of sample pairs on which the subsequent training operation is based.
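The control flow of the training operation described above can be sketched as follows. This is only an illustrative skeleton, not the applicant's implementation: the callables `embed`, `total_loss`, `adjust_parameters` and `mine_pairs` are hypothetical placeholders for the model and its update rule, the `(i, j, is_positive)` pair layout is assumed, and convergence is assumed to mean that the loss changed by less than `tol` between two consecutive operations, which the text does not specify.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical pair layout: (index_i, index_j, is_positive).
Pair = Tuple[int, int, bool]
Features = List[List[float]]


def train(
    samples: Sequence[object],
    pairs: List[Pair],
    embed: Callable[[object], List[float]],
    total_loss: Callable[[Features, List[Pair]], float],
    adjust_parameters: Callable[[float], None],
    mine_pairs: Callable[[Features], List[Pair]],
    max_steps: int = 100,
    tol: float = 1e-4,
) -> int:
    """Repeat the training operation; return how many operations were run."""
    previous_loss = None
    for step in range(1, max_steps + 1):
        # Feature vector of every training sample under the current parameters.
        features = [embed(sample) for sample in samples]
        # Total loss computed from the current sample pairs.
        loss = total_loss(features, pairs)
        # Assumed convergence test: loss changed by less than tol.
        converged = previous_loss is not None and abs(previous_loss - loss) < tol
        if converged or step == max_steps:
            return step  # preset condition met: stop training
        previous_loss = loss
        # Condition not met: adjust parameters, then re-select the pairs
        # from the similarities between the current feature vectors.
        adjust_parameters(loss)
        pairs = mine_pairs(features)
    return 0  # only reached when max_steps < 1
```

Passing the pair-selection step in as `mine_pairs` keeps the loop independent of how similarity is measured, so different selection strategies can be swapped in without touching the stopping logic.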

In an optional embodiment of the present application, the total training loss represents the degree of difference within the positive sample pairs and the degree of similarity within the negative sample pairs among the plurality of sample pairs input to the model.

In an optional embodiment of the present application, the second similarity of the new positive sample pair corresponding to each training sample is less than the second similarity of that training sample's positive sample pair before the update, and the second similarity of the new negative sample pair corresponding to each training sample is greater than the second similarity of that training sample's negative sample pair before the update.

Optionally, the feature extraction model is the feature extraction model within a classification model. The classification model extracts a feature vector of first data to be processed through the feature extraction model and identifies, based on the extracted feature vector, the classification result corresponding to the first data to be processed. The classification result is one of a plurality of specified categories, the plurality of categories includes the plurality of specified categories, and a training sample of each specified category is second data to be processed corresponding to that specified category.

Optionally, the classification model may identify, based on the extracted feature vector, whether the first data to be processed is data corresponding to a target category. That is, the classification result indicates whether the first data to be processed corresponds to the target category or to a non-target category. Accordingly, the plurality of specified categories includes the target category and at least one non-target category.

Optionally, the specified category is a specified object category, the second data to be processed is business data corresponding to a sample object of the specified object category, the first data to be processed is business data corresponding to a target object, and the classification result indicates whether the target object is an object of the target category.

In another aspect, an embodiment of the present application provides a training apparatus for a feature extraction model, the apparatus comprising:

a training data acquisition module, configured to acquire a training set, the training set comprising training samples of a plurality of categories;

a training data processing module, configured to construct a plurality of sample pairs based on the training set, the plurality of sample pairs comprising a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises two training samples of the same category, and each negative sample pair comprises two training samples of different categories;

a model training module, configured to repeatedly perform a training operation on a neural network model based on the training set until a preset condition is met, and to use the neural network model that meets the preset condition as the trained feature extraction model, wherein the preset condition comprises the total training loss of the neural network model converging or the number of training operations reaching a set number, and the training operation comprises:

Optionally, the model training module may be configured to: for each training sample, determine the second similarity between the feature vector of that training sample and the feature vector of each first sample, and take the first sample with the lowest second similarity together with that training sample as a new positive sample pair, wherein a first sample is a training sample that belongs to the same category as that training sample; and for each training sample, determine the second similarity between the feature vector of that training sample and the feature vector of each second sample, and take the second sample with the highest second similarity together with that training sample as a new negative sample pair, wherein a second sample is a training sample that belongs to a different category from that training sample.

Optionally, the training data processing module may be configured to: take each training sample in the training set as an anchor and construct a sample group for each anchor, wherein the sample group of each anchor comprises one positive sample pair and one negative sample pair of that anchor, the positive sample pair of an anchor comprises the anchor and a positive sample of the anchor, and the negative sample pair of an anchor comprises the anchor and a negative sample of the anchor;

Accordingly, when determining the total training loss, the model training module may be configured to: for each sample group, determine the training loss of that sample group according to the first similarity between the feature vectors of the two samples in its positive sample pair and the first similarity between the feature vectors of the two samples in its negative sample pair; and determine the total training loss according to the training losses of the sample groups;

When determining the plurality of new sample pairs, the model training module may be configured to: determine a new sample group for each anchor based on the second similarity between the feature vectors of the training samples, and use the sample pairs in the new sample groups as the plurality of sample pairs for the subsequent training operation.
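As a rough illustration of the anchor-based sample groups described above, the sketch below builds one (anchor, positive, negative) index triple per training sample from the class labels alone. The text does not say how the initial groups are chosen, so random selection is an assumption here, as are the function name `build_sample_groups` and the decision to skip anchors that cannot form both pairs.

```python
import random
from typing import List, Sequence, Tuple

# (anchor_index, positive_index, negative_index)
SampleGroup = Tuple[int, int, int]


def build_sample_groups(labels: Sequence[int], seed: int = 0) -> List[SampleGroup]:
    """Build one sample group per anchor by random choice (assumed strategy)."""
    rng = random.Random(seed)
    groups: List[SampleGroup] = []
    for anchor, anchor_label in enumerate(labels):
        # Candidates for the positive pair: same category, not the anchor itself.
        positives = [i for i, lab in enumerate(labels)
                     if lab == anchor_label and i != anchor]
        # Candidates for the negative pair: any other category.
        negatives = [i for i, lab in enumerate(labels) if lab != anchor_label]
        if not positives or not negatives:
            continue  # this anchor cannot form both pairs; skipped in this sketch
        groups.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return groups
```

Each triple encodes both pairs of the group: (anchor, positive) is the positive sample pair and (anchor, negative) is the negative sample pair.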

Optionally, when determining the new sample group of each anchor, the model training module may be configured to: for each anchor, determine the second similarity between the feature vector of the anchor and the feature vector of each first sample, and take the first sample with the lowest second similarity as the new positive sample of the anchor, wherein a first sample is a training sample that belongs to the same category as the anchor; and for each anchor, determine the second similarity between the feature vector of the anchor and the feature vector of each second sample, and take the second sample with the highest second similarity as the new negative sample of the anchor, wherein a second sample is a training sample that belongs to a different category from the anchor.
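The selection rule above (least similar same-category sample as the new positive, most similar different-category sample as the new negative) can be sketched as below. The text does not fix the similarity measure at this point, so negative Euclidean distance is used purely as an assumption; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed similarity: negative Euclidean distance (larger = more alike)."""
    return -math.dist(u, v)


def mine_sample_groups(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (anchor, new_positive, new_negative) index triples."""
    groups = []
    for anchor, (f_anchor, anchor_label) in enumerate(zip(features, labels)):
        same = [i for i, lab in enumerate(labels)
                if lab == anchor_label and i != anchor]
        different = [i for i, lab in enumerate(labels) if lab != anchor_label]
        if not same or not different:
            continue  # no valid group for this anchor in this sketch
        # New positive: same category, LOWEST similarity to the anchor.
        new_positive = min(same, key=lambda i: similarity(f_anchor, features[i]))
        # New negative: different category, HIGHEST similarity to the anchor.
        new_negative = max(different, key=lambda i: similarity(f_anchor, features[i]))
        groups.append((anchor, new_positive, new_negative))
    return groups
```

With five one-dimensional-like points `[[0, 0], [1, 0], [5, 0], [6, 0], [2, 0]]` and labels `[0, 0, 0, 1, 1]`, anchor 0 is paired with its farthest same-category sample (index 2) and its nearest different-category sample (index 4).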

Optionally, when constructing the sample group of each anchor, the training data processing module may be configured to: construct at least one batch data set from the training set, wherein each batch data set comprises training samples of p categories with k training samples per category, p ≥ 2 and k ≥ 3; and for each batch data set, take each training sample in that batch data set as an anchor and construct the sample group of each anchor from the training samples in that batch data set;

Accordingly, the model training module may be configured to: repeatedly perform the training operation on the neural network model based on the batch data sets, wherein each training operation is performed based on the sample groups of the anchors in one batch data set;

When determining the new sample group of each anchor, the model training module may be configured to: for each anchor in the batch data set of the current training operation, determine the new sample group of that anchor according to the second similarity between that anchor and each of the other training samples in that batch data set.
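A batch data set with p categories and k samples per category could be drawn as in the following sketch. Sampling categories and samples uniformly at random without replacement is an assumption, and `build_batch` is a hypothetical helper name; only the constraints p ≥ 2 and k ≥ 3 come from the text.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def build_batch(labels: Sequence[int], p: int, k: int, seed: int = 0) -> List[int]:
    """Return indices of a batch with p categories and k samples per category."""
    if p < 2 or k < 3:
        raise ValueError("the text requires p >= 2 and k >= 3")
    # Group sample indices by category.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Only categories with at least k samples can contribute to a batch.
    eligible = [c for c, members in by_class.items() if len(members) >= k]
    if len(eligible) < p:
        raise ValueError("not enough categories with at least k samples")
    rng = random.Random(seed)
    batch: List[int] = []
    for category in rng.sample(eligible, p):
        batch.extend(rng.sample(by_class[category], k))
    return batch
```

Because every category in the batch has at least three samples, each anchor has at least two same-category candidates, so selecting the least similar positive inside the batch is a real choice rather than a forced one.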

Optionally, for each sample group, when determining the training loss of that sample group, the model training module may be configured to:

determine a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance represents the first similarity of the positive sample pair and the second distance represents the first similarity of the negative sample pair;

determine the difference between the first distance and the second distance, and determine the training loss of the sample group according to the difference, wherein the training loss of the sample group is positively correlated with the difference.

Optionally, the training loss of each sample group is determined based on the following expressions:

$$s(x) = \ln\left(1 + e^{x}\right)$$

$$x = d(a, p) - d(a, n) + \beta$$

where s(x) denotes the training loss of a sample group; a, p and n denote the anchor, the positive sample and the negative sample in the sample group, respectively; d(a, p) denotes the first distance; d(a, n) denotes the second distance; and β denotes a preset adjustment threshold.
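The expression for s(x) can be evaluated as in the minimal sketch below. Euclidean distance is assumed for d, since the text only calls it a distance between feature vectors, and the function names are hypothetical. The rewritten form max(x, 0) + ln(1 + e^(-|x|)) is mathematically equal to ln(1 + e^x) but avoids overflow for large x.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """s(x) = ln(1 + e^x), computed in an overflow-safe form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_group_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    beta: float = 0.0,
) -> float:
    """Loss of one (anchor, positive, negative) sample group."""
    # x = d(a, p) - d(a, n) + beta, with Euclidean distance assumed for d.
    x = math.dist(anchor, positive) - math.dist(anchor, negative) + beta
    return softplus(x)
```

The loss grows with d(a, p) - d(a, n), so a group whose positive is far from the anchor and whose negative is close contributes more than a well-separated group, which matches the positive correlation stated above.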

In another aspect, an embodiment of the present application further provides a data processing method based on a neural network model, the method comprising:

acquiring data to be processed; inputting the data to be processed into a first feature extraction model, and extracting a first feature vector corresponding to the data to be processed through the first feature extraction model, wherein the first feature extraction model is trained using the training method provided in any optional embodiment of the present application;

determining, based on the first feature vector, a classification result corresponding to the data to be processed.

In yet another aspect, an embodiment of the present application further provides a data processing apparatus based on a neural network model, the apparatus comprising:

a data acquisition module, configured to acquire data to be processed;

a data processing module, configured to input the data to be processed into a first feature extraction model, extract a first feature vector corresponding to the data to be processed through the first feature extraction model, and determine, based on the first feature vector, a classification result corresponding to the data to be processed, wherein the first feature extraction model is trained using the training method provided in any optional embodiment of the present application.

Optionally, the data processing module may be configured to: extract a second feature vector of the data to be processed through a second feature extraction model; fuse the first feature vector and the second feature vector; and determine, based on the fused feature, the classification result corresponding to the data to be processed.

Optionally, the data acquisition module may be configured to: acquire business data of a target business of a target object over a plurality of time periods, wherein the business data of each time period comprises the attribute value of at least one business attribute of the target business; and construct a business time-series feature matrix of the target object based on the business data of the plurality of time periods and use it as the data to be processed, wherein the classification result indicates the object type of the target object. The number of rows of the business time-series feature matrix is the number of time periods, the number of columns is the number of business attributes, and each element represents the attribute value of one business attribute in one time period.
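The matrix layout described above (one row per time period, one column per business attribute) can be sketched as follows. The attribute names in the usage example and the choice to fill missing values with a default are assumptions made for illustration; the text does not name any attributes or say how gaps are handled.

```python
from typing import Dict, List, Sequence


def build_time_series_matrix(
    periods: Sequence[Dict[str, float]],
    attributes: Sequence[str],
    missing: float = 0.0,
) -> List[List[float]]:
    """Rows follow the time periods in order; columns follow `attributes`."""
    return [
        # One row: the value of every attribute in this period
        # (assumed default `missing` when an attribute has no value).
        [float(period.get(name, missing)) for name in attributes]
        for period in periods
    ]
```

For example, with the hypothetical attributes `["payment_count", "payment_amount"]` and two periods `[{"payment_count": 3, "payment_amount": 120.5}, {"payment_count": 1}]`, the result has two rows and two columns, and the second row's missing amount is filled with the default.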

Based on the methods provided by the embodiments of the present application, the present application further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the training method provided by any optional embodiment of the present application or the data processing method provided by any optional embodiment of the present application.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the training method provided by any optional embodiment of the present application or the data processing method provided by any optional embodiment of the present application.

An embodiment of the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the training method provided by any optional embodiment of the present application or the data processing method provided by any optional embodiment of the present application.

The technical solutions provided by the embodiments of the present application bring the following beneficial effects:

In the training method provided by the embodiments of the present application, while the neural network model is trained on positive and negative sample pairs, the sample pairs are continually updated based on the similarity between the feature vectors of the training samples extracted by the model; that is, new sample pairs are determined according to similarity. Instead of simply selecting positive and negative sample pairs at random, training selects sample pairs according to the similarity between samples, which optimizes the sample pairs and improves the training effect. With this method, sample pairs that match the application requirements can be selected during training. Optionally, sample combinations that are harder to learn can be selected for the model to learn from, which can effectively improve the performance of the trained feature extraction model, increase the discriminability of the feature vectors it outputs, and improve training efficiency, so that the trained model better meets practical application requirements.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.

FIG. 1 is a schematic architecture diagram of a data processing system to which an embodiment of the present application is applicable;

FIG. 2 is a schematic flowchart of a data processing method in an application scenario provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a training method for a feature extraction model provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a training approach for a feature extraction model provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a model training principle provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a training apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the implementations set out below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application and do not limit those technical solutions.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the terms "include" and "comprise" used in the embodiments of the present application mean that the corresponding features can be implemented as the presented features, information, data, steps, operations, elements and/or components, without excluding implementation as other features, information, data, steps, operations, elements, components and/or combinations thereof supported in the art. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or the two elements may be connected through an intermediate element. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B". When a plurality of (two or more) items are described and the relationship among them is not explicitly defined, the description may refer to one, several or all of the items. For example, the description "parameter A includes A1, A2 and A3" may be implemented as parameter A including A1, A2 or A3, or as parameter A including at least two of A1, A2 and A3.

目前相关技术中，对于数据的特征提取，比如对于时序数据的特征提取，通常是采用基于统计域、频域或时域的特征构建方案，或者是采用基于深度学习的特征Embedding(嵌入)方式，但是前者需要先验知识、专家经验，具有一定程度的信息损失，对于后者，目前的特征嵌入方式大多是采用word2vec(词转向量)、item2vec(内容转向量)等无监督embedding方式，但是将这些方式提取的特征应用于特定场景中时，特征的增益效果无法保证，也就是提取的特征具有通用性，但在特定任务下的特征区分度较差。In the current related art, for feature extraction of data, such as feature extraction of time series data, a feature construction scheme based on statistical domain, frequency domain or time domain is usually adopted, or a feature Embedding method based on deep learning is adopted. However, the former requires prior knowledge and expert experience, and has a certain degree of information loss. For the latter, the current feature embedding methods mostly use unsupervised embedding methods such as word2vec (word steering) and item2vec (content steering). When the features extracted by these methods are applied in a specific scene, the gain effect of the features cannot be guaranteed, that is, the extracted features are universal, but the feature discrimination under specific tasks is poor.

本申请实施例提供的训练方法正是针对现有技术中存在的问题，而提出的一种基于深度度量学习的特征提取模型的训练方法以及数据处理方法，基于本申请实施例提供的方法，能够有效提升特征提取模型的性能，通过该模型提取得到的特征向量具有更好的特征区分度，能够应用于特定任务中，提升任务的数据处理效果。The training method provided by the embodiment of the present application is aimed at the problems existing in the prior art, and a training method and data processing method of a feature extraction model based on deep metric learning are proposed. Based on the method provided by the embodiment of the present application, it can be Effectively improve the performance of the feature extraction model. The feature vector extracted by this model has better feature discrimination, and can be applied to specific tasks to improve the data processing effect of the task.

其中，本申请实施例提供的方案，是基于人工智能(Artificial Intelligence，AI)技术实现的，例如，特征提取模型、分类模型都是基于人工智能的神经网络模型，其中，本申请实施例中的特征提取模型可以是基于现有任一特征提取模型的模型或者是对现有的特征提取模型进行改进后的模型，也就是说，本申请实施例提供的特征提取模型的训练方法可以适用于对任意特征提取模型的训练，可以提升训练得到的模型的性能，其中，特征提取模型的训练可以基于训练集(即大量的训练样本)，采用机器学习(Machine Learning，ML)的方式训练得到的。The solutions provided in the embodiments of the present application are implemented based on artificial intelligence (Artificial Intelligence, AI) technology. For example, the feature extraction model and the classification model are both artificial intelligence-based neural network models. The feature extraction model may be a model based on any existing feature extraction model or a model after improving the existing feature extraction model, that is to say, the training method of the feature extraction model provided in this The training of any feature extraction model can improve the performance of the model obtained by training, wherein the training of the feature extraction model can be obtained by means of machine learning (ML) based on the training set (ie, a large number of training samples).

Artificial intelligence studies the design principles and implementation methods of intelligent machines so that machines can perceive, reason, and make decisions. As the technology has been researched and advanced, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, intelligent customer service, the Internet of Vehicles, and intelligent transportation. As the technology develops, it is expected to be applied in more fields and to deliver increasingly important value.

Optionally, the data processing involved in the methods provided in the embodiments of the present application may be implemented based on cloud technology. For example, the computation involved in applying the feature extraction model (such as classifying the data to be processed) and in training it (such as computing the similarity between feature vectors, computing the training loss, and adjusting the model parameters) may be implemented with cloud computing. Cloud computing is a computing mode that distributes computing tasks over a resource pool made up of a large number of computers, so that application systems can obtain computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud". From the user's point of view, the resources in the "cloud" can be expanded without limit, obtained at any time, used on demand, and paid for according to usage.

The training method or data processing method provided by the embodiments of the present application may be executed by any electronic device, for example by a user terminal or a server. For example, the data to be processed may be data that a user sends to the server through a user terminal. A trained feature extraction model may be deployed on the server. By executing the data processing method provided by the embodiments, the server invokes the model to extract the feature vector of the data to be processed and, depending on application requirements, performs subsequent processing based on the extracted feature vector. The subsequent processing may include, but is not limited to, classifying the data to be processed and judging the similarity between the data to be processed and other data. For instance, a user may send an image set containing a large number of images to the server through a user terminal. The server may invoke the trained feature extraction model to extract the feature vector of each image in the set, group the images by image category according to the similarity between their feature vectors, and provide the grouping result to the user.
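The grouping step in the image example can be sketched as follows. This is a minimal illustration, not the method of the application: the use of cosine similarity, the greedy grouping strategy, the `threshold` value, and the function names are all assumptions made for the example, and the short vectors stand in for feature vectors produced by a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def group_by_similarity(vectors, threshold=0.9):
    """Greedy grouping (an assumed strategy): each vector joins the first
    group whose first member is at least `threshold` similar to it."""
    groups = []  # each group is a list of indices into `vectors`
    for idx, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no similar group found, start a new one
    return groups

# Toy 2-dimensional "feature vectors" standing in for model outputs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
groups = group_by_similarity(features, threshold=0.9)
print(groups)
```

In practice a clustering algorithm or a classifier would likely replace the greedy rule; the sketch only shows how similarity between feature vectors can drive the grouping.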

The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services. The user terminal (also called user equipment) may be a smartphone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device (such as a smart speaker), a wearable electronic device (such as a smart watch), a vehicle-mounted terminal, a smart home appliance (such as a smart TV), an AR/VR device, and so on, but is not limited to these. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.

Optionally, the method provided in the embodiments of the present application may be implemented as a standalone application or as a functional module or plug-in of an application. For example, the application may be dedicated data classification software or another application with a data classification function, through which the data to be processed can be classified.

The feature extraction model obtained with the training method provided in the embodiments of the present application is applicable to any application scenario that requires extracting discriminative feature vectors (that is, feature representations) from the data to be processed, including but not limited to data classification and object classification. For example, the feature extraction model can be applied to risk control management: it extracts the feature vector of the business data of a relevant service, and risk identification is performed based on that feature vector. For instance, the feature extraction model can serve as the backbone network of a classification model. After the backbone network extracts the feature vector of the business data, the classification module of the classification model can predict the risk level corresponding to the business data from that feature vector. In this scenario, different risk levels are different categories. During model training, the training set may include multiple training samples for each risk level, and a training sample of a given risk level may include sample business data corresponding to that risk level (that is, business data whose true risk level is known).

It should be noted that, for the object-related data involved in the optional embodiments of the present application, the permission or consent of the object must be obtained when the embodiments are applied to specific products or technologies, and the collection, use, and processing of the data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. In other words, any object-related data involved in the embodiments must be obtained with the authorization and consent of the object and in compliance with the relevant laws, regulations, and standards of the country and region.

To better explain the method provided by the embodiments of the present application, an optional implementation is first described below in connection with a specific application scenario. In this scenario embodiment, the data processing method is applied to an application with a mobile payment function. The method can be used to identify the type (that is, the category) of an object corresponding to the application. For example, whether an object is a risk object, that is, whether it is an object of the target category, can be determined from business data related to the object. The business data may be the business data of one or more specified types of services (that is, there may be one or more target services). The embodiments of the present application do not limit how object types are divided.

FIG. 1 shows a schematic structural diagram of a data processing system to which an embodiment of the present application applies. As shown in FIG. 1, the data processing system may include a terminal device 11, a terminal device 21, an application server 20, and a training server 30. The application server 20 may be the server of the application that provides the mobile payment function described above. The terminal device 11 may be the electronic device of an object that uses the application. The terminal device 11 may communicate with the application server 20 through a network, and the object may operate through the user interface of the application and use the services the application provides. The terminal device 21 may be an electronic device on the management side of the application, on which a management client of the application may run. The terminal device 21 may likewise communicate with the application server 20 through the network, and an authorized administrator or other relevant personnel may manage the application through the management client.

By executing the training method provided in the embodiments of the present application, the training server 30 may perform the training operation on the neural network model to obtain a trained feature extraction model. Optionally, after obtaining the trained feature extraction model, the training server 30 may further train a classification model that includes the feature extraction model to obtain a trained classification model. The application server 20 and the training server 30 may communicate through a network. After the training server 30 obtains the trained classification model, the model may be deployed on the application server 20. By executing the data processing method provided in the embodiments, the application server 20 invokes the trained classification model to process the data to be processed. The processing may include, but is not limited to, identifying the category of the object corresponding to the data to be processed. Optionally, the application server 20 may also send the identification result to the terminal device 21 so that the result is presented to the administrator of the application.

It should be noted that, in the above application scenario, the data processing method provided in the embodiments of the present application may be executed by the application server 20 or by another electronic device independent of the application server, such as a cloud server. The following description takes the application server as the executing entity.

FIG. 2 is a schematic flowchart of data processing based on the data processing system shown in FIG. 1. The solution provided by the present application is described below with reference to FIG. 1 and FIG. 2. This scenario embodiment may include steps S11 to S13 and steps S21 to S23, as follows:

Step S11: Acquire a training data set.

Step S12: The training server 30 trains the feature extraction model and the classification model.

The classification model includes the feature extraction model and a classification module and may, of course, include other neural network structures. Optionally, the feature extraction model may be trained first. After the trained feature extraction model is obtained, the classification model may be trained further with the parameters of the feature extraction model fixed (that is, only the parameters of the classification module are learned). Alternatively, the entire classification model may be trained end to end, in which case the training loss of the classification model may include the total training loss of the feature extraction part and the classification loss of the classification model. The following description uses the approach of training the feature extraction model first.

Optionally, in this scenario embodiment, the training data set may include a first training set and a second training set, each of which contains a large number of training samples. The first training set is the sample set used to train the feature extraction model, and the second training set is the sample set used to train the classification model. For convenience, the training samples in the first training set are called first samples and those in the second training set are called second samples.

The present application does not limit how samples are obtained. Optionally, so that the trained classification model better fits the above application scenario, the samples in the training data set may be samples corresponding to that scenario. Optionally, training samples may be constructed from historical business data of the application, from simulated business data obtained by simulating operation of the application, or in other ways. Take as an example the case where the classification model is trained to identify whether a target object belongs to the target category. The training samples of multiple categories in the first training set may then comprise two categories, the target category and the non-target category, so the first training set includes multiple samples of the target category and multiple samples of the non-target category. A sample of the target category is the business data corresponding to a sample object of the target category, or the business time-series feature derived from that business data. Likewise, a sample of the non-target category is the business data or business time-series feature corresponding to a sample object of the non-target category. Similarly, the training samples in the second training set may include samples corresponding to sample objects of a specified category and samples corresponding to sample objects of non-specified categories.

Taking the first training set as an example, multiple first samples may be constructed from the business data of multiple specified types for each sample object. To obtain one sample, the business data of multiple specified types for one sample object over multiple time periods may be collected. For example, 30 days of transaction data of multiple specified types may be collected for the sample object. The transaction data (also called transaction features) may be the attribute values of business attributes of each type; for a payment service, for instance, they may include the amount of resources involved in a payment. The business time-series feature of the sample object is then obtained from these 30 days of transaction data, arranged in date order. Assuming that the attribute values of 100 business attributes are used in total, the business time-series feature can be expressed as a 30*100 feature matrix, and can therefore also be called a business time-series feature matrix. Here 30 is the temporal dimension (30 days) and 100 is the business-feature dimension (100 attribute values). The business time-series feature together with the true object type of the sample object (that is, the sample label or category label) forms one sample.
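The construction of one sample can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names, the record layout, the zero-filling of missing attributes, and the function name `build_feature_matrix` are hypothetical, and two days with three attributes stand in for the 30*100 case described above.

```python
def build_feature_matrix(daily_records, attribute_names):
    """Stack per-day attribute values into a (days x attributes) matrix.

    daily_records: list of (iso_date_string, {attribute_name: value}) pairs.
    Missing attributes default to 0.0 (an assumption of this sketch).
    """
    # ISO date strings sort chronologically, giving the date order
    # required for a time-series feature.
    ordered = sorted(daily_records, key=lambda record: record[0])
    return [[float(values.get(name, 0.0)) for name in attribute_names]
            for _, values in ordered]

# Hypothetical business attributes of a payment service.
attribute_names = ["pay_amount", "pay_count", "refund_count"]
records = [
    ("2022-01-02", {"pay_amount": 35.0, "pay_count": 2}),
    ("2022-01-01", {"pay_amount": 12.5, "pay_count": 1, "refund_count": 1}),
]

matrix = build_feature_matrix(records, attribute_names)
label = 1  # hypothetical category label of the sample object
sample = (matrix, label)  # one training sample: feature matrix plus label
print(matrix)
```

With 30 days of records and 100 attribute names, the same function would return the 30*100 matrix described in the text.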

As an example, Table 1 below shows the business data of multiple specified types corresponding to one sample. As shown in Table 1, the example covers m days of business data for n specified types of services. Attribute value i (i ∈ [1, n]) denotes the attribute value of the i-th specified type of business data, and D_j (j ∈ [1, m]) denotes the business data of day j. The business time-series feature corresponding to the business data in this example can then be an m*n feature matrix. Inputting this feature matrix into the feature extraction model as a sample yields the corresponding feature vector.

Table 1

        Attribute value 1   Attribute value 2   Attribute value 3   Attribute value 4   ...   Attribute value n
D_1     a_11                a_12                a_13                a_14                ...   a_1n
D_2     a_21                a_22                a_23                a_24                ...   a_2n
...     ...                 ...                 ...                 ...                 ...   ...
D_m     a_m1                a_m2                a_m3                a_m4                ...   a_mn

The dimension of the feature vector output by the feature extraction model is determined by the dimension of the model's output layer and can be configured according to actual application requirements. For example, according to business requirements, the dimension of the output feature vector can be adjusted based on the dimension of the features input to the model: the larger the input feature dimension, the larger the output feature dimension can be. In theory, the higher the dimension of the output vector, the more information the feature vector carries, but model training and inference become slower. In practice, an output dimension that is too low is not recommended; otherwise the information loss is large and the feature vector represents the input data poorly.

Suppose that, in practice, the purpose of data processing is to identify objects of a first type (that is, objects of the target category described above, such as non-compliant objects). The object types may then include the first type (that is, the first category) and a second type, and the sample label of a sample indicates the first type or the second type. If a sample object is of the first type, its sample is regarded as a sample of the first type; the sample of a sample object of the second type is regarded as a sample of the second type. Multiple samples of the first type and multiple samples of the second type make up the first training set. It can be understood that, during model training, the first training set may include samples of two categories or of more than two categories. When the trained feature extraction model is applied to a classification model whose purpose is to identify whether an object is of the first type, the second training set of the classification model may consist of samples of two categories, namely multiple samples corresponding to sample objects of the first type and multiple samples corresponding to sample objects of the second type, so that the trained binary classification model is better suited to this specific classification task.

The embodiments of the present application do not limit the structure of the feature extraction model. It may be any existing feature extraction network or a modification of one. As an optional solution, the feature extraction model may use the ResNet50 network (a deep learning network) as its base network. Because the model is used to extract feature vectors of the input data, the feature extraction model in the embodiments is a modified ResNet50 network. Optionally, the modification may include replacing the final average pooling layer, fully connected layer, and Softmax layer of ResNet50 with a fully connected layer, a batch normalization layer, and a second fully connected layer. The parameters of the neurons in each layer of the modified model (including the number of neurons) can be configured according to actual needs. For example, the parameters of the fully connected layers before and after the batch normalization layer may be 2048*1000 and 1024*128 respectively, where 2048 and 1024 are the input feature dimensions of the two fully connected layers and 1000 and 128 are their output feature dimensions. With this configuration, the feature vector output by the feature extraction model has dimension 128, that is, it contains 128 feature values.
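The replacement head (fully connected layer, batch normalization, fully connected layer) can be sketched in plain Python as follows. This is an illustration only, under several assumptions: the weights are random rather than trained, batch normalization uses batch statistics with no learned scale or shift, and the sizes are small stand-ins. Because batch normalization preserves width, the sketch uses a single hidden width between the two fully connected layers instead of the two different figures quoted above. All function names are hypothetical.

```python
import math
import random

def linear(batch, weights, bias):
    """Fully connected layer: batch is [N][n_in], weights is [n_in][n_out]."""
    return [[sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
             for j in range(len(bias))] for x in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize every feature column to zero mean and unit variance
    using the statistics of the current batch."""
    n, width = len(batch), len(batch[0])
    out = [[0.0] * width for _ in range(n)]
    for j in range(width):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for r in range(n):
            out[r][j] = (column[r] - mean) / math.sqrt(var + eps)
    return out

def make_layer(n_in, n_out, rng):
    """Random weights and zero bias (stand-ins for learned parameters)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def embedding_head(batch, fc1, fc2):
    """FC -> batch normalization -> FC, mapping backbone features to vectors."""
    hidden = batch_norm(linear(batch, *fc1))
    return linear(hidden, *fc2)

rng = random.Random(0)
in_dim, hidden_dim, out_dim = 16, 8, 4  # small stand-ins, e.g. for 2048 -> ... -> 128
fc1 = make_layer(in_dim, hidden_dim, rng)
fc2 = make_layer(hidden_dim, out_dim, rng)
backbone_features = [[rng.random() for _ in range(in_dim)] for _ in range(3)]
vectors = embedding_head(backbone_features, fc1, fc2)
print(len(vectors), len(vectors[0]))
```

The output width of the second fully connected layer fixes the dimension of the extracted feature vector, which is the point made in the text about the 128-dimensional output.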

In the embodiments of the present application, the first training set includes first training samples of multiple categories; assume the number of categories is M. First, initial sample groups (triplets) for model training are constructed from the first training set. Multiple batch data sets can be built by sampling, and the triplets for each batch data set are built from the samples in that batch. Optionally, for each sampling, p categories are randomly selected from the M categories of first training samples in the first training set, and k samples are then randomly selected from each of those categories. One sampling therefore contains p*k samples, that is, a batch data set contains p*k samples. During training, each sample in the batch is selected in turn as the anchor, and the hardest positive sample and the hardest negative sample for that anchor are selected to form a triplet with it. Optionally, a triplet may be constructed for every sample in the batch data set, so one sampling yields p*k such triplets in total.
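The p*k batch sampling described above can be sketched as follows. The function name, the dictionary layout of the training set, and the choice to sample without replacement are assumptions of this example.

```python
import random

def sample_pk_batch(samples_by_class, p, k, rng=random):
    """Pick p classes at random, then k samples at random from each.

    samples_by_class: {class_label: [sample, ...]}.
    Returns a list of (sample, class_label) pairs of length p * k.
    Sampling is without replacement (an assumption of this sketch).
    """
    eligible = [label for label, items in samples_by_class.items()
                if len(items) >= k]
    if len(eligible) < p:
        raise ValueError("not enough classes with at least k samples")
    batch = []
    for label in rng.sample(eligible, p):
        for sample in rng.sample(samples_by_class[label], k):
            batch.append((sample, label))
    return batch

rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = {c: [f"{c}_{i}" for i in range(5)] for c in ["a", "b", "c", "d"]}
batch = sample_pk_batch(data, p=3, k=2, rng=rng)
print(batch)
```

Guaranteeing k samples per selected class ensures that every anchor in the batch has at least one positive candidate when k is at least 2, and at least one negative candidate when p is at least 2.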

It should be noted that the initial triplets may be generated randomly. That is, before the constructed neural network model (the feature extraction model before training) is trained, for each sample in the batch data set, a sample of the same category may be randomly selected as the positive sample and a sample of a different category as the negative sample; the three form a triplet. During training, a new triplet can be determined for each sample based on the similarity between the feature vectors of the samples output by the model. That is, for each sample (the anchor), the hardest positive sample (the same-category sample with the lowest similarity to the anchor) and the hardest negative sample (the different-category sample with the highest similarity to the anchor) are selected, and the three form the new triplet for that sample.
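The re-selection of triplets from model outputs can be sketched as follows. The sketch assumes that similarity is measured through Euclidean distance (lower distance meaning higher similarity); the function names and toy embeddings are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mine_hard_triplets(embeddings, labels):
    """For each anchor index, return (anchor, hardest_pos, hardest_neg).

    Hardest positive: same label, largest distance (lowest similarity).
    Hardest negative: different label, smallest distance (highest similarity).
    Anchors without any positive or negative candidate are skipped.
    """
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not positives or not negatives:
            continue
        hardest_pos = max(positives, key=lambda i: euclidean(emb_a, embeddings[i]))
        hardest_neg = min(negatives, key=lambda i: euclidean(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets

# Toy feature vectors: three samples of class 0, two of class 1.
embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0], [4.0, 0.0]]
labels = [0, 0, 0, 1, 1]
triplets = mine_hard_triplets(embeddings, labels)
print(triplets)
```

For anchor 0, for instance, the farthest same-class sample is index 2 and the nearest other-class sample is index 3, so its triplet is (0, 2, 3).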

After the batch data sets are obtained by random sampling and the p*k initial triplets in each batch data set are constructed, the neural network model can be trained iteratively on the batch data sets. The parameters used during training (such as the number of training epochs and the learning rate) can be set as required. As an optional solution, the number of training epochs (one epoch meaning that every batch data set has taken part in training once) can be set to t₁ = 25000, and the network optimizer can be the Adam optimizer, with a default learning rate of ε = 3e-4 and default values of 0.9 and 0.999 for the two exponential decay rate coefficients β₁ and β₂. In the embodiments of the present application, the parameters of the Adam optimizer are tuned: β₁ is set to 0.5, and for the learning rate a training-epoch threshold t₀ = 15000 is set. The learning rate ε(t) is expressed as follows:

Here ε₀ = 3e-4, t denotes the current training epoch, and ε(t) denotes the learning rate at epoch t. That is, once the number of training epochs reaches t₀, the learning rate changes as the epoch count increases; specifically, the two are positively correlated: the larger t is, the larger ε(t) is.

During iterative training, one pass of model training on one batch data set is called an iteration. In each iteration, the first training samples in the batch data set are input into the neural network model to obtain the feature vector of each first training sample. The total training loss for the iteration is computed from the first similarity between the feature vectors of the first training samples in the p*k triplets (for example, based on the distance between the anchor's feature vector and the positive sample's feature vector, and the distance between the anchor's feature vector and the negative sample's feature vector, in each triplet), and the model parameters are updated. The triplets in the batch data set are then updated according to the second similarity between the feature vectors of the first training samples, giving p*k new triplets that are used in the next epoch of training. The model is trained and the triplets are updated in this way across the batch data sets until the number of training epochs reaches the set value t₁ or the total training loss converges; the neural network model at that point is taken as the trained feature extraction model. Optionally, the model may also be validated or tested: a model that satisfies the validation or test conditions is taken as the trained model, and otherwise training can continue.
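The loss computed from the anchor-to-positive and anchor-to-negative distances can be sketched as follows. The text does not give the exact loss expression, so the hinge form with a `margin`, the averaging over triplets, and the function names are assumptions; this is the commonly used triplet margin loss, shown only to illustrate how the two distances enter the loss.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(embeddings, triplets, margin=0.3):
    """Mean of max(0, d(anchor, pos) - d(anchor, neg) + margin) over triplets.

    triplets: list of (anchor_idx, positive_idx, negative_idx).
    The margin value is an assumption of this sketch.
    """
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        d_ap = euclidean(embeddings[a], embeddings[p])
        d_an = euclidean(embeddings[a], embeddings[n])
        total += max(0.0, d_ap - d_an + margin)
    return total / len(triplets)

embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
# Positive is closer than the negative by more than the margin: no loss.
loss_easy = triplet_loss(embeddings, [(0, 1, 2)], margin=0.5)
# Positive is farther than the negative: the triplet is penalized.
loss_hard = triplet_loss(embeddings, [(0, 2, 1)], margin=0.5)
print(loss_easy, loss_hard)
```

Minimizing such a loss pulls same-category feature vectors together and pushes different-category vectors apart, which is what makes re-mined hard triplets informative in later epochs.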

After the trained feature extraction model is obtained, a classification model that includes the feature extraction model may be trained on the samples in the second training set to obtain a trained classification model. The embodiments of the present application do not limit how the classification model is trained; optionally, the model parameters of the already trained feature extraction model may be kept fixed while the classification model is trained.

Step S13: Deploy the trained classification model to the application server 20.

Step S21: The application server 20 acquires the data to be processed.

Step S22: The application server 20 identifies the data to be processed by means of the trained classification model to obtain a corresponding classification result.

Step S23: The application server 20 sends the classification result to the terminal device 21 so as to provide the classification result to the administrator.

After the trained classification model is obtained through training, the training server 30 may send it to the application server 20. The application server 20 may identify the data to be processed by means of the classification model and may provide the resulting classification result to the administrator through the terminal device 21. For example, the classification result may be sent to the terminal device 21, which may display it to the administrator or other relevant personnel through the management client of the application. Alternatively, corresponding prompt information may be sent to the administrator when the classification result is a specified result, for example a prompt issued when the classification result is abnormal.

In this scenario embodiment, the service data to be processed may be the service data of the above-mentioned multiple specified types of services generated when an object uses the application. The application server 20 may obtain the service time-series feature of the object from the attribute values of the object's services; this service time-series feature is the data to be processed in this application scenario. Based on it, the application server 20 may call the trained classification model: the feature extraction model of the classification model performs feature extraction on the input service time-series feature to obtain a corresponding feature vector, and the classification module of the classification model obtains a classification result from that feature vector. The classification result characterizes the object type of the object to which the service time-series feature corresponds.

It can be understood that the training method of the feature extraction model and the data processing method provided by the embodiments of the present application are applicable to, but not limited to, the above application scenario. A feature extraction model trained with the solution provided in this application can effectively improve the discriminability of the feature vectors it extracts from the data to be processed, so that further processing based on those feature vectors is more effective; in the above application scenario, for example, classification accuracy can be effectively improved.

The technical solutions of the embodiments of the present application, and the technical effects they produce, are described below through several exemplary implementations. It should be noted that the following implementations may refer to, draw on, or be combined with one another; identical terms, similar features, and similar implementation steps in different implementations are not described repeatedly.

FIG. 3 shows a schematic flowchart of a training method for a feature extraction model provided by an embodiment of the present application. As shown in FIG. 3, the training method may include the following steps S110 to S130.

Step S110: Acquire a training set that includes training samples of multiple categories.

In the embodiments of the present application, the training set includes a training subset for each of multiple (at least two) categories, and each training subset includes multiple training samples belonging to the same category. The present application does not limit how the categories are divided; they may be divided according to the actual application scenario and application requirements.

The embodiments of the present application also do not limit how the training set is acquired. It may be a training set built from an existing public dataset for training classification models, from a manually designed dataset, or from sample data collected in an actual application scenario. Optionally, the training samples in the training set may be a large number of training samples corresponding to the same application scenario, so that the trained feature extraction model is better suited to that scenario and the feature vectors it extracts for different to-be-processed data in that scenario are better discriminated. The training set may of course also use a large number of training samples corresponding to multiple application scenarios, so that the trained feature extraction model generalizes better.

Optionally, in practice, the data form of the training samples may be chosen according to actual application requirements. It may correspond to the data form of the data to be processed (that is, the data input into the model for feature extraction) in the application scenario where the trained model is used; in other words, the training samples may be chosen according to the task to be solved in the actual application scenario. Optionally, the feature extraction model may serve as the feature extraction module of a classification model. The classification model extracts the feature vector of first data to be processed through the feature extraction model and identifies the classification result of the first data to be processed from the extracted feature vector. The classification result may be one of multiple specified categories, or the probability of each specified category, in which case the category with the highest probability may be determined as the category of the data to be processed. In this optional solution, the multiple categories of the training set include the multiple specified categories, and each training sample of a specified category is second data to be processed corresponding to that specified category.
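As a non-limiting illustration of the last selection step, the following minimal sketch assumes the classification module returns one probability per specified category. The function name `pick_category` and the example category labels are hypothetical and do not come from the application.

```python
def pick_category(probabilities, categories):
    """Return the specified category whose predicted probability is highest."""
    if not probabilities or len(probabilities) != len(categories):
        raise ValueError("need exactly one probability per specified category")
    # Index of the largest probability; on a tie the first index wins.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return categories[best]
```

For instance, with probabilities `[0.1, 0.7, 0.2]` over three hypothetical categories, the second category would be reported as the classification result.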

The data form of the first data to be processed is determined by the specific classification task to be solved in the actual service scenario where the classification model is applied. For example, if the classification model identifies the category of an object from the object's service data in a target application, the feature extraction model is applied to an object classification task and must extract, from the object's service data, a feature vector used for object classification. The training samples should then be the service data of sample objects, or processed service data features derived from that service data (such as the service time-series feature matrix described above), and the training set should include the service data or service data features of sample objects of the various categories. As another example, if the classification model classifies text, the training set may include sample texts (second data to be processed) of multiple categories (the multiple specified categories above); the trained feature extraction model then performs feature extraction on the text to be processed (first data to be processed) that is input into the classification model, and the classification module of the classification model predicts the text category of the text to be processed from the extracted feature vector.

It should be noted that the above classification model may be a binary classification model or a multi-class classification model.

Optionally, the classification model may specifically identify, from the extracted feature vector, whether the first data to be processed is data of a target category; that is, the classification result indicates that the first data to be processed is either data of the target category or data of a non-target category. Correspondingly, the multiple specified categories include the target category and at least one non-target category.

In this optional solution, the classification model is a binary classification model whose role is to identify whether the data to be processed that is input into it is data of the target category (a particular one of the multiple specified categories). In training the feature extraction model, the training samples of multiple categories include at least training samples of the target category and training samples of other categories (categories that are not the target category); there may be one or more other categories. For example, if the classification task is to identify whether an image to be processed is an image of category a, the training set may include multiple images of category a and multiple images not of category a, each image serving as one training sample.

It can be understood that the multiple specified categories may be divided in advance according to the actual application scenario and application requirements. For example, if the classification model is to identify, from an object's service data, whether the object is an object of the target category, the specified categories may be object categories. Correspondingly, the second data to be processed, that is, a training sample, may be the service data of a sample object of some object category or service feature data derived from that service data; the first data to be processed is the service data or service feature data of the target object (the object to be identified); and the classification result indicates whether the target object is an object of the target category. As another example, if the classification model is used for image classification, the specified categories are image categories.

Step S120: Construct multiple sample pairs based on the training set. The multiple sample pairs include multiple positive sample pairs and multiple negative sample pairs, where a positive sample pair includes two training samples of the same category and a negative sample pair includes two training samples of different categories.

The sample pairs constructed in this step can be understood as the initial sample pairs. During the training in step S130, the sample pairs are continually optimized and updated based on the output of the neural network model, and the new sample pairs obtained in each training operation serve as the sample pairs for subsequent training (for example, the next training operation).

The embodiments of the present application do not limit how the initial sample pairs are constructed. For example, they may be generated randomly: any two training samples of the same category may form a positive sample pair, and any two training samples of different categories may form a negative sample pair.
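The random construction of initial pairs can be sketched as follows. This is only one possible illustration, assuming each training sample is identified by its index and `labels[i]` is its category; the function name `build_initial_pairs` and the parameters `num_pairs` and `seed` are hypothetical.

```python
import random


def build_initial_pairs(labels, num_pairs, seed=0):
    """Randomly draw positive and negative index pairs from category labels.

    Returns (positives, negatives), each a list of (i, j) index pairs:
    positives share a category, negatives do not.
    """
    rng = random.Random(seed)
    n = len(labels)
    # Enumerate all candidate pairs once (acceptable for a small illustration).
    same = [(i, j) for i in range(n) for j in range(i + 1, n)
            if labels[i] == labels[j]]
    diff = [(i, j) for i in range(n) for j in range(i + 1, n)
            if labels[i] != labels[j]]
    # Sample without replacement, capped by how many candidates exist.
    positives = rng.sample(same, min(num_pairs, len(same)))
    negatives = rng.sample(diff, min(num_pairs, len(diff)))
    return positives, negatives
```

Enumerating every candidate pair is quadratic in the number of samples, so a practical implementation would more likely sample pairs per batch.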

It can be understood that "positive" and "negative" in the positive and negative sample pairs of the embodiments of the present application are relative concepts defined with respect to a given sample: samples of the same category as that sample can serve as its positive samples, and samples of a different category can serve as its negative samples. In the sample groups described below, the positive and negative samples of a sample group are likewise defined relative to the anchor of that group: a sample in the group of the same category as the anchor is a positive sample of the anchor, and a sample of a different category is a negative sample of the anchor.

Step S130: Repeatedly perform a training operation on the neural network model based on the training set until a preset condition is met, and take the neural network model that meets the preset condition as the trained feature extraction model.

The preset condition is the condition for ending training of the model, and may include, but is not limited to, convergence of the total training loss of the neural network model or the number of training operations reaching a set number. The flow of the training operation is shown in FIG. 4 and may include the following steps S131 to S133.

Step S131: Input each training sample in the multiple sample pairs into the neural network model to obtain the feature vector of each training sample;

Step S132: Determine the total training loss based on the first similarity between the feature vectors of the training samples in each sample pair;

Step S133: If the total training loss has not converged and the number of training operations has not reached the set number, adjust the model parameters of the neural network model, determine multiple new sample pairs based on the second similarity between the feature vectors of the training samples, and use the new sample pairs as the sample pairs for subsequent training operations.
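The control flow of steps S131 to S133 can be sketched as below. The sketch is deliberately framework-free: `embed`, `total_loss`, `update_parameters` and `mine_pairs` are hypothetical caller-supplied callables standing in for the neural network model, the loss, the optimizer step and the pair re-selection, and the convergence test on consecutive losses is just one assumed choice.

```python
def train(samples, pairs, embed, total_loss, update_parameters, mine_pairs,
          max_rounds=100, tolerance=1e-4):
    """Repeat the training operation until the loss converges or rounds run out.

    Returns (rounds_performed, last_total_loss).
    """
    previous = None
    for round_index in range(1, max_rounds + 1):
        # S131: feature vector of every training sample.
        features = [embed(sample) for sample in samples]
        # S132: total training loss from the similarities within current pairs.
        loss = total_loss(features, pairs)
        converged = previous is not None and abs(previous - loss) < tolerance
        if converged or round_index == max_rounds:
            return round_index, loss
        # S133: adjust parameters, then re-select pairs for the next round.
        update_parameters(loss)
        pairs = mine_pairs(features)
        previous = loss
    return 0, previous  # only reached when max_rounds < 1
```

The new pairs are selected from the feature vectors computed before the parameter update, which follows the order in which step S133 lists the two actions.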

It can be understood that, because the two training samples of a positive sample pair belong to the same category and the two training samples of a negative sample pair belong to different categories, the aim of training is, in principle, to make the similarity between the feature vectors of the two samples in a positive sample pair (referred to simply as the similarity between the two samples) as high as possible, and the similarity between the feature vectors of the two samples in a negative sample pair as low as possible. The total training loss thus characterizes the degree of difference within each positive sample pair, and the degree of similarity within each negative sample pair, input into the model in the current training operation. The similarity between the feature vectors of the two samples in a positive sample pair reflects the degree of difference within that pair, and the similarity between the feature vectors of the two samples in a negative sample pair reflects the degree of similarity within that pair. Therefore, during training, the total training loss of the model can be calculated from the first similarity between the two samples in each sample pair, and whether training can end can be judged from the total training loss (that is, from whether the total training loss has converged) and from whether the number of training operations has reached the set number. Optionally, for each sample pair, the training loss of that pair may be calculated from the first similarity between the feature vectors of its two samples, and the total training loss of the model is then obtained from the training losses of all sample pairs, for example as their sum or their mean.
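The per-pair loss is not given a formula at this point, so the following sketch swaps in one common choice as an assumption: a margin-based contrastive loss on Euclidean distance, averaged over all pairs. The names `pair_loss`, `total_pair_loss` and the `margin` parameter are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance; a larger distance means a lower similarity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_loss(u, v, is_positive, margin=1.0):
    """Assumed contrastive loss for one pair of feature vectors."""
    d = euclidean(u, v)
    if is_positive:
        # Same category: penalize any distance between the two samples.
        return d ** 2
    # Different categories: penalize only pairs closer than the margin.
    return max(0.0, margin - d) ** 2


def total_pair_loss(features, positive_pairs, negative_pairs, margin=1.0):
    """Mean of the per-pair losses (the sum is the other option mentioned)."""
    losses = [pair_loss(features[i], features[j], True, margin)
              for i, j in positive_pairs]
    losses += [pair_loss(features[i], features[j], False, margin)
               for i, j in negative_pairs]
    return sum(losses) / len(losses) if losses else 0.0
```

With this form, the loss falls as positive pairs move closer together and as negative pairs move at least `margin` apart, which matches the stated aim of training.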

The embodiments of the present application do not limit how the first similarity is calculated; in principle any existing way of calculating the similarity between two feature vectors may be used. For example, the Euclidean distance between the two feature vectors may be calculated and used to represent the first similarity, where a larger distance means a lower similarity.

To obtain a feature extraction network that meets actual application requirements, the initial neural network model must be trained continually on the training samples in the training set until the preset training end condition, that is, the above preset condition, is met. It can be understood that the initial neural network model is the feature extraction model to be trained. In the training phase, the input of the neural network model is a training sample and the output is the feature vector of that training sample; by continually training it (that is, adjusting the model parameters), a trained feature extraction model that meets the requirements is obtained. When the model is applied, its input is the data to be processed and its output is the feature vector of the data to be processed.

The embodiments of the present application do not limit the specific neural network structure of the feature extraction model, which may be chosen according to actual application requirements. It may be a feature extraction model based on any neural network structure used for feature extraction, including but not limited to one based on a convolutional neural network or one based on a recurrent neural network.

When the neural network model is trained continually, the preset condition for ending training may be configured according to actual requirements and is not limited by the embodiments of the present application. For example, the preset condition may include the total training loss satisfying a certain condition, that is, the total training loss converging, or the number of training operations reaching a set number. The criterion for convergence may be set as required, for example at least one of: the total training loss is less than a set value, or the difference between the total training losses of two consecutive training operations is less than a set threshold. Optionally, the preset condition may further include at least one of a validation condition or a test condition; correspondingly, a validation dataset and/or a test dataset may be preconfigured, and during training the validation dataset and/or test dataset may be used to evaluate whether the performance of the current model (the model after one or more training operations) meets the validation condition, with the evaluation result deciding whether training needs to continue. It should be noted that the number of training operations reaching the set number, as used in the preset condition, means that the number of training epochs reaches the set number of epochs, that is, every sample in the training set has taken part in training the set number of times.
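The end-of-training check described above can be sketched as a small predicate. This is an illustrative reading only: the function name `should_stop` and its parameters are hypothetical, and the validation or test condition is omitted.

```python
def should_stop(loss_history, epochs_done, max_epochs,
                loss_threshold=None, delta_threshold=None):
    """Return True when any configured end-of-training condition holds.

    loss_history holds the total training loss of each training operation,
    oldest first. A threshold left as None is not checked.
    """
    # Condition 1: the set number of epochs has been reached.
    if epochs_done >= max_epochs:
        return True
    if not loss_history:
        return False
    # Condition 2a: the latest total loss is below the set value.
    if loss_threshold is not None and loss_history[-1] < loss_threshold:
        return True
    # Condition 2b: two consecutive total losses differ by less than the threshold.
    if delta_threshold is not None and len(loss_history) >= 2:
        return abs(loss_history[-1] - loss_history[-2]) < delta_threshold
    return False
```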

With the training approach provided by the embodiments of the present application, each training operation not only calculates the total training loss of the model from the first similarity of the positive sample pairs (that is, the similarity between the feature vectors of the two samples in each positive sample pair) and the similarity of the negative sample pairs, but also updates the sample pairs. Specifically, new positive sample pairs and new negative sample pairs may be re-determined according to the similarity between the feature vectors of the training samples output by the model, such that the similarity within the new positive sample pairs is as low as possible and the similarity within the new negative sample pairs is as high as possible. The similarity within the new positive sample pair of a training sample is not higher than that of the sample's positive sample pair before the update (the positive sample pair of that sample input into the model in the current training operation), and the similarity within the new negative sample pair of a training sample is not lower than that of the sample's negative sample pair before the update. In other words, the re-determined positive and negative sample pairs are pairs that are relatively difficult for the model. By re-determining, in each training operation, the sample pairs used for subsequent training operations in this way, the model learns as far as possible from the sample pairs that are hard to learn, so that the model can be optimized as far as possible and the performance of the finally trained feature extraction model improved.

Likewise, the embodiments of the present application do not limit how the second similarity is calculated; it may be configured according to actual requirements. For example, the similarity between the samples of a pair may be obtained by calculating the Euclidean distance or the cosine similarity between their feature vectors, or in other ways.
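The two measures named above can be written directly from their standard definitions. In this minimal sketch the function names are hypothetical and feature vectors are plain lists of floats of equal length.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance: the larger the value, the lower the similarity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: the larger, the more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Note the opposite orientation of the two measures: "lowest similarity" corresponds to the largest Euclidean distance but to the smallest cosine similarity.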

Optionally, determining multiple new sample pairs based on the second similarity between the feature vectors of the training samples includes:

For each training sample, determining the second similarity between the feature vector of that training sample and the feature vector of each first sample, and taking the first sample with the lowest second similarity together with that training sample as a new positive sample pair, where the first samples are those of the training samples that belong to the same category as that training sample;

For each training sample, determining the second similarity between the feature vector of that training sample and the feature vector of each second sample, and taking the second sample with the highest second similarity together with that training sample as a new negative sample pair, where the second samples are those of the training samples that belong to a different category from that training sample.
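The two selection rules above can be sketched as follows, assuming Euclidean distance stands in for the second similarity (largest distance = lowest similarity). The function name `mine_hardest_pairs` and the index-based representation of samples are hypothetical.

```python
import math


def _distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mine_hardest_pairs(features, labels):
    """Pair each sample with its least similar same-category sample and
    its most similar different-category sample.

    Returns (positives, negatives) as lists of (sample_index, partner_index).
    """
    positives, negatives = [], []
    for i, (fi, li) in enumerate(zip(features, labels)):
        same = [j for j, lj in enumerate(labels) if j != i and lj == li]
        diff = [j for j, lj in enumerate(labels) if lj != li]
        if same:
            # Lowest similarity within the category = largest distance.
            farthest = max(same, key=lambda j: _distance(fi, features[j]))
            positives.append((i, farthest))
        if diff:
            # Highest similarity across categories = smallest distance.
            nearest = min(diff, key=lambda j: _distance(fi, features[j]))
            negatives.append((i, nearest))
    return positives, negatives
```

A sample with no same-category partner simply contributes no positive pair in this sketch.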

In this optional solution, for each training sample in the training set, the training sample and the same-category training sample least similar to it may form a positive sample pair, and the training sample and the different-category training sample most similar to it may form a negative sample pair. That is, the most difficult sample combinations are given to the neural network model as training sample pairs, so that the trained model can distinguish samples of different categories that are highly similar and can recognize samples of the same category that are barely similar as belonging to one category. In other words, when a feature extraction model trained by the method provided in this application extracts features from data to be processed, the feature vectors output for two items of data of the same category are relatively similar even if the items themselves are barely similar, and the feature vectors output for two items of data of different categories are relatively dissimilar even if the items themselves are highly similar.

It can be understood that, in actual implementation, besides constructing a new positive sample pair from the least similar sample and a new negative sample pair from the most similar sample as described above, the number of new sample pairs can be expanded on the same principle. For example, at least two new positive sample pairs or negative sample pairs may be constructed for one training sample. The similarity between that training sample and every other training sample of the same category may be calculated from the feature vectors output by the model, the other training samples may be sorted in ascending order of similarity, and at least the top two samples (or a leading portion selected according to a set proportion) may each be paired with that sample to form new positive sample pairs; for instance, the two samples with the lowest similarity are each combined with that sample to give two new positive sample pairs. New negative sample pairs may be constructed in a similar way: for one training sample, the two most similar training samples of other categories may each be paired with it to form new negative sample pairs. In this way, the number of training sample pairs can be expanded while the new sample pairs remain pairs that are difficult to learn.
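The expansion to several hard pairs per sample can be sketched by sorting instead of taking a single extreme. As before, Euclidean distance is assumed as the similarity measure, and the names `mine_top_k_pairs` and `k` are hypothetical.

```python
import math


def _distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mine_top_k_pairs(features, labels, k=2):
    """For each sample, keep its k least similar same-category partners and
    its k most similar different-category partners."""
    positives, negatives = [], []
    for i, (fi, li) in enumerate(zip(features, labels)):
        same = [j for j, lj in enumerate(labels) if j != i and lj == li]
        diff = [j for j, lj in enumerate(labels) if lj != li]
        # Least similar same-category samples first (largest distance).
        same.sort(key=lambda j: _distance(fi, features[j]), reverse=True)
        # Most similar different-category samples first (smallest distance).
        diff.sort(key=lambda j: _distance(fi, features[j]))
        positives += [(i, j) for j in same[:k]]
        negatives += [(i, j) for j in diff[:k]]
    return positives, negatives
```

Selecting by a set proportion instead of a fixed count would only change how many entries are sliced from the sorted lists.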

As an optional solution, constructing multiple sample pairs based on the training samples in the training set may include:

Taking each training sample in the training set as an anchor and constructing a sample group for each anchor, where the sample group of an anchor includes one positive sample pair and one negative sample pair of that anchor, the positive sample pair of an anchor includes the anchor and a positive sample of the anchor, and the negative sample pair of an anchor includes the anchor and a negative sample of the anchor;

相应的，上述基于各个样本对中训练样本的特征向量之间的第一相似度，确定训练总损失，可以包括：Correspondingly, the above-mentioned determination of the total training loss based on the first similarity between the feature vectors of the training samples in each sample pair may include:

对于每个样本组，根据该样本组的正样本对中两个样本的特征向量之间的第一相似度、以及该样本组的负样本对中两个样本的特征向量之间的第一相似度，确定该样本组对应的训练损失；根据各样本组对应的训练损失，确定训练总损失；For each sample group, according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group degree, determine the training loss corresponding to the sample group; determine the total training loss according to the training loss corresponding to each sample group;

上述基于各个训练样本的特征向量之间的第二相似度，确定多个新的样本对，可以包括：The above-mentioned determination of multiple new sample pairs based on the second similarity between the feature vectors of each training sample may include:

基于各个训练样本的特征向量之间的第二相似度，确定各锚点对应的新的样本组，将各锚点对应的新的样本组中的样本对作为后续训练操作时的多个样本对。Based on the second similarity between the feature vectors of each training sample, a new sample group corresponding to each anchor point is determined, and the sample pairs in the new sample group corresponding to each anchor point are used as multiple sample pairs in subsequent training operations .

In this optional solution, sample groups can be used to train the neural network model. Each sample group is a triplet comprising an anchor (that is, a training sample), one positive sample corresponding to the anchor, and one negative sample corresponding to the anchor. The positive sample is a training sample belonging to the same category as the anchor, and the negative sample is a training sample belonging to a different category from the anchor. In other words, a sample group consists of three training samples that form two sample pairs, both referenced to the anchor.

The manner of constructing the initial sample groups from the training set (the sample groups constructed before anything has been input to the model) is not limited in the embodiments of the present application. For example, random generation may be used: each training sample is taken as an anchor, one training sample is randomly selected from those belonging to the same category as the anchor to serve as its positive sample, and one is randomly selected from those belonging to a different category to serve as its negative sample, yielding the sample group corresponding to the anchor.
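The random construction of initial sample groups can be illustrated with the following sketch. The function name `build_initial_triplets` and the representation of samples by their integer indices are assumptions made for the example.

```python
import random

def build_initial_triplets(labels, rng):
    """Return one (anchor, positive, negative) index triplet per sample.

    The positive is drawn at random from the other samples of the anchor's
    category; the negative is drawn at random from all other categories.
    """
    triplets = []
    for a, lab in enumerate(labels):
        same = [j for j, l in enumerate(labels) if l == lab and j != a]
        other = [j for j, l in enumerate(labels) if l != lab]
        if not same or not other:
            continue  # no valid triplet exists for this anchor
        triplets.append((a, rng.choice(same), rng.choice(other)))
    return triplets

labels = ["A", "A", "A", "B", "B", "B"]
triplets = build_initial_triplets(labels, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the initial sample groups reproducible across runs.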

For the neural network model, the purpose of training is to make the similarity between the feature vectors of the positive sample pair in each sample group as high as possible and the similarity between the feature vectors of the negative sample pair as low as possible. That is, the distance within positive sample pairs is shortened and the distance within negative sample pairs is enlarged as much as possible, so that the learned feature vectors discriminate well between categories.

As a schematic illustration, FIG. 5 shows the principle of training a feature extraction model based on triplets. In FIG. 5, the length of the line segment between the anchor and the positive sample represents the similarity between them, and the length of the line segment between the anchor and the negative sample represents the similarity between them; the longer the segment, the lower the similarity. The left side of the figure, showing the distance between the anchor and its positive sample and the distance between the anchor and the negative sample, can be understood as the sample group before training. The right side can be understood as the distances for the same pairs computed from the feature vectors of the anchor, its positive sample and its negative sample output by the trained feature extraction model. As FIG. 5 shows, the purpose of training is to increase the similarity between the feature vectors of samples of the same category and to reduce the similarity between the feature vectors of samples of different categories.

To achieve this, during training, the training loss corresponding to each sample group can be computed from the similarity of the positive sample pair in the sample group (that is, the similarity between the feature vectors of the anchor and its positive sample) and the similarity of the negative sample pair in the sample group. The training loss corresponding to a sample group characterizes the difference within its positive sample pair and the difference within its negative sample pair. The form of the loss function of the neural network model is not limited in the embodiments of the present application. It may include, but is not limited to, a triplet loss function, which computes the training loss of a triplet from the distance within the positive sample pair (which characterizes the similarity of the pair) and the distance within the negative sample pair.

Optionally, for each sample group, the above determination of the training loss corresponding to the sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group may include:

determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, where the first distance characterizes the first similarity corresponding to the positive sample pair and the second distance characterizes the first similarity corresponding to the negative sample pair;

determining the difference between the first distance and the second distance;

determining the training loss corresponding to the sample group according to the difference, where the training loss corresponding to the sample group is positively correlated with the difference.

The larger the distance between two feature vectors, the less similar they are, and the smaller the distance, the more similar they are. The similarity of a sample pair can therefore be characterized by the distance between the feature vectors of its two training samples: the larger the distance, the lower the similarity. The training objective is for the model to output feature vectors whose similarity is as high as possible within the positive sample pair and as low as possible within the negative sample pair of each sample group, that is, to pull the positive sample as close as possible to the anchor and push the negative sample as far as possible from it. For each sample group, the training loss can therefore be computed from the difference between the first distance (positive sample to anchor) and the second distance (negative sample to anchor). In the embodiments of the present application, the training loss corresponding to each sample group is positively correlated with the difference between the first distance and the second distance: the smaller the difference, the smaller the loss. Equivalently, the smaller the loss, the smaller the distance between the positive sample and the anchor and the larger the distance between the negative sample and the anchor.

This optional approach provides an improved triplet loss function. The existing triplet loss corresponding to one sample group (existing sample groups being obtained by random selection) is expressed as follows:

$$\bigl(d(a,p) - d(a,n) + \alpha\bigr)_{+} \tag{1}$$

where d(a, p) denotes the first distance between the anchor and the positive sample in the sample group, d(a, n) denotes the second distance between the anchor and the negative sample in the sample group, α is a preset parameter value, and the subscript + indicates that the loss equals the value in parentheses when that value is greater than or equal to 0 and equals 0 when that value is less than 0.

As the expression shows, with the existing triplet loss function, for each triplet during training, if the relationship of the triplet as judged from the model output is correct (that is, d(a, p) - d(a, n) + α is less than 0), the loss of the triplet is set directly to 0. The loss is computed with a hard cutoff, and the loss of triplets for which d(a, p) - d(a, n) + α is less than 0 is not considered. Such triplets nevertheless affect the training result. When d(a, p) - d(a, n) + α is negative, the distance within the positive sample pair is indeed smaller than the distance within the negative sample pair. However, if a training loss is still assigned to such a triplet, that is, if the distances output by the model for the positive and negative sample pairs are still treated as open to optimization (the gap between the two distances can still be widened), this also helps improve model performance. The existing triplet loss ignores the training loss of these triplets.
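The hard cutoff of expression (1) can be seen in a short sketch. The function name `hinge_triplet_loss` and the numeric values are illustrative assumptions; the distances are passed in directly rather than computed from feature vectors.

```python
def hinge_triplet_loss(d_ap, d_an, alpha):
    """Expression (1): the value d(a,p) - d(a,n) + alpha, clipped at 0."""
    return max(d_ap - d_an + alpha, 0.0)

# A triplet that violates the margin receives a positive loss.
violating = hinge_triplet_loss(d_ap=1.0, d_an=0.5, alpha=0.5)

# Two triplets that satisfy the margin by very different amounts both
# receive exactly zero loss, so neither contributes to parameter updates.
barely_satisfied = hinge_triplet_loss(d_ap=1.0, d_an=1.6, alpha=0.5)
well_satisfied = hinge_triplet_loss(d_ap=1.0, d_an=5.0, alpha=0.5)
```

The last two calls return the same value even though the first of them leaves far less room between the positive and negative distances, which is the behaviour the paragraph above identifies as a limitation.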

To address this problem, the above scheme for computing the training loss of a sample group takes the training losses of all sample groups input to the model into account when computing the total training loss. The training loss of each sample group is positively correlated with the difference between the first distance of its positive sample pair and the second distance of its negative sample pair, that is, the first distance minus the second distance: the larger this value, the larger the loss, and the smaller this value, the smaller the loss. The relationship between the difference and the loss may be linear or non-linear, and the specific form of the loss function can be designed and selected based on the principle of this scheme.

As an optional solution, the training loss corresponding to each sample group is determined based on the following expressions:

$$s(x) = \ln\left(1 + e^{x}\right) \tag{2}$$

$$x = d(a,p) - d(a,n) + \beta \tag{3}$$

As expressions (2) and (3) show, in this optional solution the training loss corresponding to a sample group decays exponentially as x decreases, rather than being cut off abruptly. Computing the total training loss of the model in this way, and using that loss to constrain the adjustment of the model parameters, allows the trained model to better pull samples of the same category together in the feature embedding space, further improving model performance.
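Expressions (2) and (3) can be evaluated with the sketch below. The function names are assumptions, and the rearranged form used inside `softplus` is a standard numerically stable way of computing ln(1 + e^x), not something stated in the text; the value chosen for β is purely illustrative.

```python
import math

def softplus(x):
    """Expression (2), ln(1 + e^x), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def soft_triplet_loss(d_ap, d_an, beta):
    """Expression (3) fed into expression (2)."""
    return softplus(d_ap - d_an + beta)

# Both triplets have x < 0, yet each still receives a small positive loss,
# and the triplet with the narrower gap receives the larger one.
barely_satisfied = soft_triplet_loss(d_ap=1.0, d_an=1.6, beta=0.5)
well_satisfied = soft_triplet_loss(d_ap=1.0, d_an=5.0, beta=0.5)
```

For strongly negative x the loss is approximately e^x, which is the exponential decay referred to above; for strongly positive x it is approximately x.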

In the training method provided in the embodiments of the present application, for each training operation, once the training loss corresponding to each sample group has been computed from the feature vectors of the training samples output by the model, the total training loss of the model can be computed from those per-group losses. For example, the sum or the mean of the training losses of the sample groups can be used as the total training loss. It can then be judged whether the total training loss satisfies the preset condition. If it does, model training can end. If it does not, the model parameters can be adjusted, for example with a gradient descent algorithm, and training continues on the new sample groups.

It can be understood that, in this optional solution, determining new sample pairs based on the similarity between the feature vectors of the training samples output by the model means determining a new sample group for each anchor. The new sample group corresponding to an anchor includes a new positive sample pair and a new negative sample pair corresponding to that anchor.

Optionally, the above determination of a new sample group corresponding to each anchor based on the second similarity between the feature vectors of the training samples may include:

for each anchor, determining the second similarity between the feature vector of the anchor and the feature vector of each first sample, and determining the first sample with the lowest second similarity as the new positive sample corresponding to the anchor, where the first samples are the training samples that belong to the same category as the anchor;

for each anchor, determining the second similarity between the feature vector of the anchor and the feature vector of each second sample, and determining the second sample with the highest second similarity as the new negative sample corresponding to the anchor, where the second samples are the training samples that belong to a different category from the anchor.
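The two selection steps above can be sketched as follows. This is an illustration under stated assumptions: Euclidean distance is used as an inverse proxy for the second similarity (the text does not fix the similarity measure here), and the names `euclidean` and `mine_hard_triplets` are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mine_hard_triplets(embeddings, labels):
    """For each anchor, pick the least similar same-category sample as the
    new positive and the most similar other-category sample as the new
    negative, returning (anchor, positive, negative) index triplets."""
    triplets = []
    for a, (va, la) in enumerate(zip(embeddings, labels)):
        pos, neg = [], []
        for j, (vj, lj) in enumerate(zip(embeddings, labels)):
            if j == a:
                continue
            (pos if lj == la else neg).append((euclidean(va, vj), j))
        if not pos or not neg:
            continue  # anchor has no valid positive or negative
        _, p = max(pos)  # largest distance = lowest similarity
        _, n = min(neg)  # smallest distance = highest similarity
        triplets.append((a, p, n))
    return triplets

# Toy one-dimensional feature vectors for two categories.
embeddings = [[0.0], [1.0], [3.0], [4.0], [6.0], [10.0]]
labels = ["A", "A", "A", "B", "B", "B"]
new_triplets = mine_hard_triplets(embeddings, labels)
```

In the toy data, the category-A sample at 3.0 and the category-B sample at 4.0 sit next to the class boundary, so they are repeatedly chosen as the hardest negatives for anchors of the other category.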

Similarly, in actual implementation, either one new sample group or multiple new sample groups may be determined for each anchor. Specifically, for one anchor, the at least one training sample of the same category with the lowest similarity to the anchor can each be taken as a positive sample of the anchor, and each positive sample is combined with the anchor to obtain a positive sample pair. The at least one training sample of a different category with the highest similarity to the anchor can each be taken as a negative sample of the anchor, and each negative sample is combined with the anchor to obtain a negative sample pair. Combining one positive sample pair and one negative sample pair of the anchor gives one sample group. For example, with two positive samples and one negative sample there are two positive sample pairs and one negative sample pair, and combining each positive sample pair with the negative sample pair gives two sample groups for the anchor. With this scheme, the sample groups can be expanded conveniently and quickly while ensuring that the expanded sample groups are relatively hard sample combinations. Training the neural network model on such combinations can further improve the expressive power of the feature vectors output by the trained model.
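The combination step can be sketched in a few lines. The function name `expand_sample_groups` is hypothetical, and pairing every selected positive with every selected negative is an assumption that generalises the two-positives, one-negative example given in the text.

```python
from itertools import product

def expand_sample_groups(anchor, positives, negatives):
    """Combine each positive sample with each negative sample of one anchor,
    returning (anchor, positive, negative) index triplets."""
    return [(anchor, p, n) for p, n in product(positives, negatives)]

# Two hardest positives and one hardest negative give two sample groups.
groups = expand_sample_groups(anchor=0, positives=[3, 2], negatives=[4])
```

With m positives and q negatives this yields m × q sample groups per anchor, so the choice of how many samples to keep controls how quickly the training set of triplets grows.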

As an optional solution, taking each training sample in the training set as an anchor and constructing the sample group corresponding to each anchor may include:

constructing at least one batch data set from the training set, where each batch data set includes training samples of p categories and k training samples per category, with p ≥ 2 and k ≥ 3;

for each batch data set, taking each training sample in the batch data set as an anchor and constructing, from the training samples in the batch data set, the sample group corresponding to each anchor in the batch data set;

The above repeated execution of the training operation on the neural network model based on the training set may include:

repeatedly executing the training operation on the neural network model based on the batch data sets, where each training operation is performed on the sample groups corresponding to the anchors in one batch data set;

Correspondingly, the above determination of a new sample group corresponding to each anchor based on the second similarity between the feature vectors of the training samples includes:

for each anchor in the batch data set used in the current training operation, determining the new sample group corresponding to the anchor according to the second similarity between the anchor and each training sample in the batch data set other than the anchor.
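Constructing one batch data set with p categories and k samples per category can be sketched as below. The sampling strategy (uniform random choice of categories and of samples within each category) and the name `build_pk_batch` are assumptions; the text only fixes the constraints p ≥ 2 and k ≥ 3.

```python
import random
from collections import defaultdict

def build_pk_batch(labels, p, k, rng):
    """Return the indices of one batch: p categories, k samples from each."""
    if p < 2 or k < 3:
        raise ValueError("the method requires p >= 2 and k >= 3")
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # Only categories with at least k samples can contribute to a batch.
    eligible = [lab for lab, idxs in by_class.items() if len(idxs) >= k]
    if len(eligible) < p:
        raise ValueError("not enough categories with at least k samples")
    batch = []
    for lab in rng.sample(eligible, p):
        batch.extend(rng.sample(by_class[lab], k))
    return batch

labels = ["A"] * 5 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2
batch = build_pk_batch(labels, p=2, k=3, rng=random.Random(0))
```

Requiring k ≥ 3 leaves every anchor at least two same-category candidates to choose its positive sample from, and p ≥ 2 guarantees that a negative sample exists within the same batch.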

For model training, the number of training samples in the training set is usually large, and using all samples of the training set in every training iteration would be computationally expensive. To address this, the training set is usually divided into multiple batch data sets (a batch data set is one batch), and each training iteration can be performed on one batch data set. This reduces the computation required each time the total training loss of the model is calculated and also helps keep the convergence of the loss function stable.

The size of a batch data set (that is, the total number of training samples in it) is not limited in the embodiments of the present application, as long as each batch data set contains training data of at least two categories and at least three training samples per category. After the training set is divided into multiple batch data sets, the initial sample groups can be constructed separately for each batch data set, so that the three training samples in each sample group come from the same batch data set. Likewise, when new sample groups are determined during training, the new sample group for each anchor in a batch data set is determined from within that same batch data set.

It can be understood that, when batch data sets are used for training, every batch data set participates in training, and each participates multiple times. One training iteration is performed on one batch data set: every sample in the batch data set is input to the neural network model to obtain its feature vector, and from the output vectors the training loss of each sample group in the batch data set can be computed, giving the total training loss of the model. If the total training loss does not satisfy the preset condition, the sample group of each anchor can be updated based on the similarity between the feature vectors output by the model, and the resulting new sample groups are used as the sample groups of that batch data set in subsequent training operations.

When the model is trained on batch data sets, the total training loss of the model (that is, the training loss over all triplets in the batch data set) can be expressed as follows:

[Expression for the total training loss L_th: formula image not preserved in the source text]

$$s(a) = \ln\left(1 + e^{x}\right)$$

where L_th denotes the total training loss, and s(a) denotes the training loss corresponding to an anchor a in the batch data set, that is, the training loss of the triplet corresponding to anchor a. A batch data set can contain p×k triplets: each sample in the batch data set is taken as an anchor, and the triplet corresponding to each anchor is constructed. The two distance symbols in the expression (images not preserved in the source text) denote, respectively, the distance between the feature vectors of the positive sample pair and the distance between the feature vectors of the negative sample pair in the triplet corresponding to anchor a. It can be understood that, when a batch data set is input to the model for the first time, these two distances are those of the sample pairs in the initial triplet of anchor a, computed from the model output. On every later input, they are the distances of the sample pairs in the new triplet of anchor a (that is, the hardest sample combination).
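Because the exact expression for L_th is not recoverable from this text, the following is only a sketch of one plausible aggregation: it assumes the total is the sum (or, optionally, the mean) of the per-anchor losses, each computed as ln(1 + e^x) with x = d(a, p) - d(a, n) + β. The function names, the `reduction` parameter, the value of β and the use of Euclidean distance are all assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def softplus(x):
    """ln(1 + e^x), computed in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def total_training_loss(embeddings, triplets, beta, reduction="sum"):
    """Aggregate the per-triplet losses of one batch data set."""
    if reduction not in ("sum", "mean"):
        raise ValueError("reduction must be 'sum' or 'mean'")
    losses = []
    for a, p, n in triplets:
        d_ap = euclidean(embeddings[a], embeddings[p])
        d_an = euclidean(embeddings[a], embeddings[n])
        losses.append(softplus(d_ap - d_an + beta))
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total

embeddings = [[0.0], [1.0], [3.0], [4.0]]
triplets = [(0, 1, 2), (2, 3, 1)]  # (anchor, positive, negative) indices
loss_sum = total_training_loss(embeddings, triplets, beta=2.0)
loss_mean = total_training_loss(embeddings, triplets, beta=2.0, reduction="mean")
```

In the example the first triplet gives x = 0 and the second gives x = 1, so both contribute a non-zero loss to the total.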

It should be noted that, in practical applications, if the n-th training operation is performed on batch data set 1, then after that operation is completed and the new sample groups for batch data set 1 are obtained, the (n+1)-th training operation may be performed on the new sample groups of batch data set 1, on the original sample groups of another batch data set, or on the new sample groups of another batch data set. In other words, which batch data set each training operation uses is not limited in the embodiments of the present application, as long as every batch data set can participate in training multiple times. For example, a minimum number of times each batch data set participates in training can be preset, or the batch data sets can participate roughly the same number of times, so that the training samples in the training set participate in training a roughly balanced number of times.

Optionally, suppose the training set is divided, by sampling, into three batch data sets denoted S1, S2 and S3. When training the model, the batch data sets can be polled in turn: the model is first trained on the initial triplets of S1, S2 and S3 in sequence (once all three batch data sets have each taken part in one training operation, one epoch of training is complete), and then on the corresponding updated triplets of S1, S2 and S3 in sequence. This is repeated until a trained feature extraction model satisfying the preset condition is obtained.
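The polling order described above can be sketched as a simple schedule generator. The name `round_robin_schedule` is hypothetical, and a fixed number of epochs stands in for the preset stopping condition, which the text does not specify here.

```python
def round_robin_schedule(batch_names, epochs):
    """Yield (epoch, batch_name, uses_initial_triplets) in polling order.

    In the first epoch each batch is trained on its initial triplets;
    in later epochs it is trained on the triplets updated after its
    previous training operation.
    """
    for epoch in range(epochs):
        for name in batch_names:
            yield epoch, name, epoch == 0

schedule = list(round_robin_schedule(["S1", "S2", "S3"], epochs=2))
```

Polling guarantees that every batch data set takes part in training the same number of times, which is one way to keep the participation of the training samples balanced.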

When the trained feature extraction model obtained with the training method provided in the embodiments of the present application performs feature extraction on the data to be processed that is input to it, it can extract highly discriminative feature vectors, on the basis of which the data can be processed further. For example, the type of the data to be processed can be identified, the degree of similarity between different data can be judged from their feature vectors, or a data set can be classified. The data to be processed may take various forms, including but not limited to text, characters (such as numerical values), images, or other forms of data. The trained feature extraction model can be applied in any scenario that requires feature vectors with good discriminative power; for example, it can serve as the feature extraction module of another model (such as a classification model or a similarity judgment model).

To test the effect of the feature extraction model trained with the training method provided in the embodiments of the present application, after a feature extraction model (that is, an embedding model) satisfying the preset condition was obtained from the training set, the model was tested on a cross-time test set by extracting the latent vectors (that is, the feature vectors) of the test data. The results show that the IV (Information Value, a measure of a feature's predictive power) of the model reaches 2.7. In addition, also on the cross-time test set, the feature vectors extracted by this feature extraction model were concatenated with the feature vectors extracted by a feature extraction model trained with the existing technique and fed into a traditional machine learning model such as xgboost for a binary classification task, in order to test the classification effect of the features. The results show that the test-set AUC (Area Under Curve, that is, the area under the ROC curve) of the xgboost model improves by 5 percentage points and the KS value (a model evaluation metric) improves by 4 percentage points. These experiments show that a feature extraction model trained for a specific scenario with the training method provided in the embodiments of the present application can extract latent vector features that discriminate well for classification tasks.

The embodiments of the present application also provide a data processing method. As shown in FIG. 6, the method may include the following steps:

Step S210: acquire the data to be processed;

Step S220: input the data to be processed into a first feature extraction model, and extract a first feature vector of the data to be processed through the first feature extraction model, where the first feature extraction model is trained with the training method in any optional embodiment of the present application;

Step S230: determine, based on the first feature vector, the classification result corresponding to the data to be processed.

The form of the data to be processed is not limited in the embodiments of the present application and may differ between application scenarios. For example, the data to be processed may be text, an image, or a feature matrix obtained by processing business data. The form of the data processing result may also differ with the application scenario and requirements. For example, if the goal is to identify the type of the data to be processed, the data processing result is the corresponding classification result; if the goal is to judge whether the data to be processed is similar to other data, a similarity can be computed from the first feature vector of the data to be processed and the feature vectors of the other data, and the data processing result is then either the computed similarity or a similarity judgment determined from the similarity and a set threshold. This embodiment is described using the example of applying the feature extraction model trained with the method provided in the embodiments of the present application to data classification; the feature vectors extracted by this model can effectively improve the accuracy of the classification result.

Optionally, the data processing method may further include: extracting a second feature vector of the data to be processed through a second feature extraction model;

Correspondingly, in step S230 above, determining, based on the first feature vector, the data processing result corresponding to the data to be processed may include:

fusing the first feature vector and the second feature vector;

determining the data processing result of the data to be processed based on the fused features.

To further improve the accuracy of the classification result, in this optional approach the feature vectors of the data to be processed that are extracted by multiple feature extraction models can be fused, and the classification result is determined from the fused features. Tests show that fusing the feature vector extracted by the feature extraction model trained with the method provided in the embodiments of the present application with the feature vectors extracted by other feature extraction models, and using the fused features for the classification task, can effectively improve the accuracy of the final classification result.

The model structure and training method of the second feature extraction model are not limited in the embodiments of the present application; it may be a feature extraction model trained with an existing training method. The specific way of fusing the feature vectors is likewise not limited, and may include, but is not limited to, at least one of concatenating the first and second feature vectors, adding them, or computing their mean.
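The three fusion options named above (concatenation, addition, mean) can be illustrated with a minimal sketch. The function name `fuse` and the `mode` parameter are hypothetical and are not taken from the application; element-wise addition and averaging are assumed to require vectors of equal length.

```python
from typing import List, Sequence


def fuse(first: Sequence[float], second: Sequence[float], mode: str = "concat") -> List[float]:
    """Fuse two feature vectors by concatenation, addition, or averaging.

    `fuse` and `mode` are illustrative names, not part of the application.
    """
    if mode == "concat":
        # Concatenation keeps every component of both vectors.
        return list(first) + list(second)
    if len(first) != len(second):
        # Assumption: element-wise fusion only makes sense for equal lengths.
        raise ValueError("add/mean fusion needs vectors of equal length")
    if mode == "add":
        return [a + b for a, b in zip(first, second)]
    if mode == "mean":
        return [(a + b) / 2.0 for a, b in zip(first, second)]
    raise ValueError(f"unknown fusion mode: {mode}")
```

Concatenation preserves both representations at the cost of a longer input to the downstream classifier, while addition and averaging keep the dimension unchanged.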

In an optional embodiment of the present application, acquiring the data to be processed may include:

acquiring business data of a target business for multiple time periods corresponding to a target object, where the business data corresponding to each time period includes an attribute value of at least one business attribute of the target business;

constructing, based on the business data corresponding to the multiple time periods, a business time-series feature matrix corresponding to the target object, and using the business time-series feature matrix as the data to be processed, where the classification result represents the object type of the target object.

The number of rows of the business time-series feature matrix is the number of time periods, the number of columns is the number of business attributes, and each element of the matrix represents the attribute value of one business attribute in one time period.
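As a rough illustration of the matrix layout described above (one row per time period, one column per business attribute), the following sketch builds such a matrix from per-period records. The attribute names `txn_count` and `txn_amount`, the function name, and the choice to fill a missing attribute with a default value are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def build_feature_matrix(
    period_data: Sequence[Dict[str, float]],
    attributes: Sequence[str],
    default: float = 0.0,
) -> List[List[float]]:
    """Return a matrix with one row per time period and one column per attribute."""
    return [
        # Assumption: an attribute missing in a period is filled with `default`.
        [float(period.get(name, default)) for name in attributes]
        for period in period_data
    ]


# Hypothetical business data for three consecutive periods.
periods = [
    {"txn_count": 3, "txn_amount": 120.5},
    {"txn_count": 0, "txn_amount": 0.0},
    {"txn_count": 7, "txn_amount": 980.0},
]
matrix = build_feature_matrix(periods, ["txn_count", "txn_amount"])
```

Here `matrix` has 3 rows (periods) and 2 columns (attributes), matching the row and column definitions given in the text.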

As can be seen from the application scenario embodiments described earlier, the data processing method provided in the embodiments of the present application can be applied to identifying the type of a target object. Optionally, the classification result can be obtained through a classification model that contains the first feature extraction model. For example, the business time-series feature matrix can be input into the classification model; the first feature extraction model of the classification model extracts a first feature vector of the feature matrix, a second feature extraction model extracts a second feature vector of the feature matrix, the first and second feature vectors are concatenated, and a classification module produces the classification result. The specific structure of the classification module is not limited in the embodiments of the present application. Optionally, the classification model may include, but is not limited to, a classification model based on a traditional machine learning model such as xgboost.

Based on the same principle as the training method provided in the embodiments of the present application, the embodiments of the present application further provide a training apparatus for a feature extraction model. As shown in FIG. 7, the training apparatus 100 includes a training data acquisition module 110, a training data processing module 120, and a model training module 130.

The training data acquisition module 110 is configured to acquire a training set, where the training set includes training samples of multiple categories;

The training data processing module 120 is configured to construct a plurality of sample pairs based on the training set, where the plurality of sample pairs includes a plurality of positive sample pairs and a plurality of negative sample pairs, each positive sample pair includes two training samples of the same category, and each negative sample pair includes two training samples of different categories;

The model training module 130 is configured to repeatedly perform a training operation on a neural network model based on the training set until a preset condition is met, and to use the neural network model that satisfies the preset condition as the trained feature extraction model, where the preset condition includes convergence of the total training loss corresponding to the neural network model or the number of training iterations reaching a set number, and the training operation includes:

inputting each training sample in the plurality of sample pairs into the neural network model to obtain the feature vector of each training sample; determining the total training loss based on a first similarity between the feature vectors of the training samples in each sample pair; and, if the total training loss has not converged and the number of training iterations has not reached the set number, adjusting the model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the training samples, and using the new sample pairs as the sample pairs on which the subsequent training operation is based.

Optionally, the model training module may be configured to: for each training sample, determine the second similarity between the feature vector of that training sample and the feature vector of each first sample, and take the first sample with the lowest second similarity together with that training sample as a new positive sample pair, where a first sample is a training sample that belongs to the same category as that training sample; and, for each training sample, determine the second similarity between the feature vector of that training sample and the feature vector of each second sample, and take the second sample with the highest second similarity together with that training sample as a new negative sample pair, where a second sample is a training sample that belongs to a different category from that training sample.
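The pair re-selection rule above (least similar same-category sample as the new positive, most similar different-category sample as the new negative) can be sketched as follows. This passage does not fix the similarity measure, so cosine similarity is used here purely as an assumption; the function names and the toy feature vectors are likewise hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the 'second similarity'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mine_hard_pairs(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[Optional[int], Optional[int]]]:
    """For each sample i, return (index of new positive, index of new negative)."""
    n = len(features)
    pairs = []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        diff = [j for j in range(n) if labels[j] != labels[i]]
        # New positive: same category, lowest similarity to sample i.
        pos = min(same, key=lambda j: cosine(features[i], features[j])) if same else None
        # New negative: different category, highest similarity to sample i.
        neg = max(diff, key=lambda j: cosine(features[i], features[j])) if diff else None
        pairs.append((pos, neg))
    return pairs


# Toy feature vectors as they might be output by the model (illustrative only).
features = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [0.6, 0.4]]
labels = [0, 0, 0, 1, 1]
pairs = mine_hard_pairs(features, labels)
```

Selecting the hardest pairs in this way concentrates the next training operation on the samples the current model separates worst.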

Optionally, the training data processing module may be configured to: use each training sample in the training set as an anchor and construct a sample group corresponding to each anchor, where the sample group corresponding to each anchor includes one positive sample pair and one negative sample pair corresponding to that anchor, the positive sample pair of an anchor includes the anchor and a positive sample of the anchor, and the negative sample pair of an anchor includes the anchor and a negative sample of the anchor;

Correspondingly, when determining the total training loss, the model training module may be configured to: for each sample group, determine the training loss corresponding to that sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the group and the first similarity between the feature vectors of the two samples in the negative sample pair of the group; and determine the total training loss according to the training losses corresponding to the sample groups;

When determining the plurality of new sample pairs, the model training module may be configured to: determine a new sample group corresponding to each anchor based on the second similarity between the feature vectors of the training samples, and use the sample pairs in the new sample groups as the plurality of sample pairs for the subsequent training operation.

Optionally, when determining the new sample group corresponding to each anchor, the model training module may be configured to: for each anchor, determine the second similarity between the feature vector of the anchor and the feature vector of each first sample, and determine the first sample with the lowest second similarity as the new positive sample of the anchor, where a first sample is a training sample that belongs to the same category as the anchor; and, for each anchor, determine the second similarity between the feature vector of the anchor and the feature vector of each second sample, and determine the second sample with the highest second similarity as the new negative sample of the anchor, where a second sample is a training sample that belongs to a different category from the anchor.

Optionally, when constructing the sample group corresponding to each anchor, the training data processing module may be configured to: construct at least one batch data set from the training set, where each batch data set includes training samples of p categories with k training samples per category, p≥2 and k≥3; and, for each batch data set, use each training sample in the batch as an anchor and construct the sample group corresponding to each anchor from the training samples in that batch;

Correspondingly, the model training module may be configured to: repeatedly perform the training operation on the neural network model based on the batch data sets, where each training operation is based on the sample groups corresponding to the anchors in one batch data set;

When determining the new sample group corresponding to each anchor, the model training module may be configured to: for each anchor in the batch data set of the current training operation, determine the new sample group corresponding to the anchor according to the second similarity between the anchor and each of the other training samples in that batch.
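A batch data set with p categories and k samples per category could be drawn as in the sketch below. The text only specifies the batch composition (p≥2, k≥3), so drawing categories and samples uniformly at random is an assumption, and `sample_pk_batch` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def sample_pk_batch(
    labels: Sequence[int], p: int, k: int, rng: Optional[random.Random] = None
) -> List[int]:
    """Return sample indices for one batch: p categories, k samples each."""
    if p < 2 or k < 3:
        raise ValueError("the text requires p >= 2 and k >= 3")
    if rng is None:
        rng = random.Random()
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only categories with at least k samples can contribute to a batch.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k]
    if len(eligible) < p:
        raise ValueError("not enough categories with at least k samples")
    batch: List[int] = []
    for c in rng.sample(eligible, p):            # assumption: random category choice
        batch.extend(rng.sample(by_class[c], k))  # assumption: random sample choice
    return batch


labels = [0] * 5 + [1] * 4 + [2] * 3
batch = sample_pk_batch(labels, p=2, k=3, rng=random.Random(0))
```

With k≥3, every anchor in the batch has at least two same-category candidates, so choosing the least similar one as its positive is a real choice rather than a forced one.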

Determine the difference between the first distance and the second distance; determine the training loss corresponding to the sample group according to the difference, where the training loss corresponding to the sample group is positively correlated with the difference.

Optionally, the training loss corresponding to each sample group is determined based on the following expressions:

$$s(x) = \ln\left(1 + e^{x}\right)$$

$$x = d(a, p) - d(a, n) + \beta$$
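The two expressions can be evaluated as in the sketch below. This passage does not define the symbols, so the sketch assumes that a, p and n are the feature vectors of the anchor, its positive sample and its negative sample, that d is the Euclidean distance, and that β is a constant offset; the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed form of d(., .): Euclidean distance between feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def softplus(x: float) -> float:
    """s(x) = ln(1 + e^x), rewritten so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def group_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    beta: float = 0.0,  # assumption: beta is a constant offset
) -> float:
    """Loss of one sample group: s(d(a, p) - d(a, n) + beta)."""
    x = euclidean(anchor, positive) - euclidean(anchor, negative) + beta
    return softplus(x)
```

Because s(x) is monotonically increasing, the loss grows as the anchor-to-positive distance exceeds the anchor-to-negative distance, which agrees with the positive correlation stated above; unlike a hinge, it stays smooth and never reaches exactly zero.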

Based on the same principle as the data processing method provided in the embodiments of the present application, the embodiments of the present application further provide a data processing apparatus based on a neural network model. As shown in FIG. 8, the data processing apparatus 200 includes a data acquisition module 210 and a data processing module 220.

The data acquisition module 210 is configured to acquire data to be processed;

The data processing module 220 is configured to input the data to be processed into a first feature extraction model, extract a first feature vector corresponding to the data to be processed through the first feature extraction model, and obtain, based on the first feature vector, the classification result corresponding to the data to be processed through a classification module, where the first feature extraction model is trained with the training method provided in any optional embodiment of the present application.

Optionally, the data processing module 220 may be specifically configured to: extract a second feature vector of the data to be processed through a second feature extraction model; fuse the first feature vector and the second feature vector; and determine the classification result corresponding to the data to be processed based on the fused features.

Optionally, the data acquisition module may be configured to: acquire business data of a target business for multiple time periods corresponding to a target object, where the business data corresponding to each time period includes an attribute value of at least one business attribute of the target business; and construct, based on the business data corresponding to the multiple time periods, a business feature matrix corresponding to the target object and use the business feature matrix as the data to be processed, where the classification result represents the object type of the target object, the number of rows of the business feature matrix is the number of time periods, the number of columns is the number of business attributes, and each element of the business feature matrix represents the attribute value of one business attribute in one time period.

The apparatuses of the embodiments of the present application can execute the corresponding methods provided in the embodiments of the present application, and their implementation principles are similar. The actions performed by the modules of the apparatuses correspond to the steps of the corresponding methods; for a detailed functional description of each module, refer to the description of the corresponding method above, which is not repeated here. The training apparatus and the data processing apparatus provided in the embodiments of the present application may be any electronic device.

The embodiments of the present application also provide an electronic device. The electronic device may include a memory, a processor, and a computer program stored in the memory; when the processor executes the computer program stored in the memory, the method in any optional embodiment of the present application can be implemented.

Optionally, FIG. 9 shows a schematic structural diagram of an electronic device to which an embodiment of the present invention is applicable. As shown in FIG. 9, the electronic device may be a server or a user terminal, and may be used to implement the method provided in any embodiment of the present invention.

As shown in FIG. 9, the electronic device 2000 may include components such as at least one processor 2001, a memory 2002, a communication module 2003, and an input/output interface 2004. Optionally, the components may be connected to and communicate with one another through a bus 2005. It should be noted that the structure of the electronic device 2000 shown in FIG. 9 is only schematic and does not limit the electronic devices to which the methods provided in the embodiments of the present application are applicable.

The memory 2002 may be used to store an operating system, application programs, and the like. The application programs may include a computer program that implements the method shown in the embodiments of the present invention when called by the processor 2001, and may also include programs for implementing other functions or services. The memory 2002 may be a ROM (Read Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and computer programs, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.

The processor 2001 is connected to the memory 2002 through the bus 2005 and implements the corresponding functions by calling the application programs stored in the memory 2002. The processor 2001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and can implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the present disclosure. The processor 2001 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The electronic device 2000 can connect to a network through the communication module 2003 (which may include, but is not limited to, components such as a network interface) in order to communicate with other devices (such as user terminals or servers) over the network and exchange data, for example sending data to or receiving data from other devices. The communication module 2003 may include a wired network interface and/or a wireless network interface; that is, the communication module may include at least one of a wired communication module or a wireless communication module.

The electronic device 2000 can connect to the required input/output devices, such as a keyboard or a display device, through the input/output interface 2004. The electronic device 2000 may have its own display device, and other display devices may also be connected externally through the interface 2004. Optionally, a storage device such as a hard disk can also be connected through the interface 2004, so that data in the electronic device 2000 can be stored in the storage device, data in the storage device can be read, and data in the storage device can be stored in the memory 2002. The input/output interface 2004 may be a wired interface or a wireless interface. Depending on the application scenario, a device connected to the input/output interface 2004 may be a component of the electronic device 2000 or an external device connected to the electronic device 2000 when needed.

The bus 2005 used to connect the components may include a path for transferring information between the components. The bus 2005 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. By function, the bus 2005 can be divided into an address bus, a data bus, a control bus, and so on.

Optionally, for the solutions provided in the embodiments of the present invention, the memory 2002 may be used to store a computer program for executing the solutions of the present invention, and the program is run by the processor 2001. When the processor 2001 runs the computer program, the actions of the methods or apparatuses provided in the embodiments of the present invention are implemented.

The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the corresponding content of the foregoing method embodiments can be implemented.

The embodiments of the present application further provide a computer program product that includes a computer program; when the computer program is executed by a processor, the corresponding content of the foregoing method embodiments can be implemented.

It should be noted that the terms "first", "second", "third", "fourth", "1", "2", and the like (if present) in the description and claims of the present application and in the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described.

It should be understood that although the flowcharts of the embodiments of the present application indicate the operation steps with arrows, the order in which the steps are carried out is not limited to the order indicated by the arrows. Unless explicitly stated otherwise, in some implementation scenarios of the embodiments of the present application the steps in each flowchart may be performed in another order as required. In addition, depending on the actual implementation scenario, some or all of the steps in each flowchart may include multiple sub-steps or stages. Some or all of these sub-steps or stages may be executed at the same time, or each may be executed at a different time. Where the execution times differ, the execution order of the sub-steps or stages can be configured flexibly as required, which is not limited in the embodiments of the present application.

The above describes only optional implementations of some implementation scenarios of the present application. It should be pointed out that, for a person of ordinary skill in the art, other similar means of implementation based on the technical ideas of the present application, adopted without departing from the technical concept of the solutions of the present application, also fall within the scope of protection of the embodiments of the present application.

Claims

1. A training method for a feature extraction model, characterized by comprising:

acquiring a training set, the training set including training samples of multiple categories;

constructing a plurality of sample pairs based on the training set, the plurality of sample pairs including a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair includes two training samples of the same category, and each negative sample pair includes two training samples of different categories;

repeatedly performing a training operation on a neural network model based on the training set until a preset condition is met, and using the neural network model that satisfies the preset condition as the trained feature extraction model; wherein the preset condition includes convergence of the total training loss corresponding to the neural network model or the number of training iterations reaching a set number, and the training operation includes:

inputting each training sample in the plurality of sample pairs into the neural network model to obtain the feature vector of each training sample;

determining the total training loss based on a first similarity between the feature vectors of the training samples in each of the sample pairs;

if the total training loss has not converged and the number of training iterations has not reached the set number, adjusting the model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the respective training samples, and using the plurality of new sample pairs as the plurality of sample pairs on which the subsequent training operation is based.

2. The method according to claim 1, wherein determining the plurality of new sample pairs based on the second similarity between the feature vectors of the respective training samples comprises:

for each training sample, determining the second similarity between the feature vector of the training sample and the feature vector of each first sample, and taking the first sample with the lowest second similarity together with the training sample as a new positive sample pair, wherein a first sample is a training sample, among the respective training samples, that belongs to the same category as the training sample;

for each training sample, determining the second similarity between the feature vector of the training sample and the feature vector of each second sample, and taking the second sample with the highest second similarity together with the training sample as a new negative sample pair, wherein a second sample is a training sample, among the respective training samples, that belongs to a different category from the training sample.
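The pair selection recited in claim 2 can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure, which the claim does not fix, and the function names `cosine_similarity` and `mine_hard_pairs` are hypothetical names introduced only for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mine_hard_pairs(features, labels):
    """Return (positive_pairs, negative_pairs) as lists of index tuples.

    For each sample i, the new positive pair is (i, j) where j has the same
    label and the LOWEST similarity to i; the new negative pair is (i, j)
    where j has a different label and the HIGHEST similarity to i.
    """
    n = len(features)
    positive_pairs, negative_pairs = [], []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        other = [j for j in range(n) if labels[j] != labels[i]]
        if same:  # skipped when the category has no other sample
            hardest_pos = min(
                same, key=lambda j: cosine_similarity(features[i], features[j])
            )
            positive_pairs.append((i, hardest_pos))
        if other:  # skipped when only one category is present
            hardest_neg = max(
                other, key=lambda j: cosine_similarity(features[i], features[j])
            )
            negative_pairs.append((i, hardest_neg))
    return positive_pairs, negative_pairs
```

In this sketch the feature vectors would be the ones output by the model in the current training operation, so the selected pairs change as the model parameters are adjusted.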

3. The method according to claim 1, wherein constructing the plurality of sample pairs based on the training set comprises:

using each training sample in the training set as an anchor point and constructing a sample group corresponding to each anchor point, the sample group corresponding to each anchor point including a positive sample pair and a negative sample pair corresponding to the anchor point, wherein the positive sample pair corresponding to an anchor point includes the anchor point and a positive sample of the anchor point, and the negative sample pair corresponding to an anchor point includes the anchor point and a negative sample of the anchor point;

determining the total training loss based on the first similarity between the feature vectors of the training samples in each of the sample pairs comprises:

for each sample group, determining the training loss corresponding to the sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group;

determining the total training loss according to the training losses corresponding to the respective sample groups;

determining the plurality of new sample pairs based on the second similarity between the feature vectors of the respective training samples comprises:

determining a new sample group corresponding to each of the anchor points based on the second similarity between the feature vectors of the respective training samples, and using the sample pairs in the new sample group corresponding to each of the anchor points as the plurality of sample pairs on which the subsequent training operation is based.

4. The method according to claim 3, wherein determining the new sample group corresponding to each of the anchor points based on the second similarity between the feature vectors of the respective training samples comprises:

for each anchor point, determining the second similarity between the feature vector of the anchor point and the feature vector of each first sample, and determining the first sample with the lowest second similarity as the new positive sample corresponding to the anchor point, wherein a first sample is a training sample, among the respective training samples, that belongs to the same category as the anchor point;

for each anchor point, determining the second similarity between the feature vector of the anchor point and the feature vector of each second sample, and determining the second sample with the highest second similarity as the new negative sample corresponding to the anchor point, wherein a second sample is a training sample, among the respective training samples, that belongs to a different category from the anchor point.

5. The method according to claim 3 or 4, wherein using each training sample in the training set as an anchor point and constructing a sample group corresponding to each anchor point comprises:

constructing at least one batch data set according to the training set, wherein each batch data set includes training samples of p categories and the number of training samples of each category is k, where p ≥ 2 and k ≥ 3;

for each of the batch data sets, using each training sample in the batch data set as an anchor point, and constructing, based on the training samples in the batch data set, a sample group corresponding to each anchor point in the batch data set;

repeatedly performing the training operation on the neural network model based on the training set comprises:

repeatedly performing the training operation on the neural network model based on each of the batch data sets, wherein each training operation is performed based on the sample groups corresponding to the anchor points in one of the batch data sets;

determining the new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the respective training samples comprises:

for each anchor point in the batch data set corresponding to the current training operation, determining the new sample group corresponding to the anchor point according to the second similarity between the anchor point and each training sample in the batch data set other than the anchor point.
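The batch construction recited in claim 5 (p categories per batch, k samples per category) might be sketched as follows. The greedy grouping strategy, the dropping of leftover samples, and the name `build_pk_batches` are assumptions made for illustration; the claim only fixes the batch composition.

```python
import random
from collections import defaultdict


def build_pk_batches(labels, p, k, seed=0):
    """Group sample indices into batches of p categories with k samples each.

    Samples that cannot be placed into a complete batch are dropped
    (an assumption of this sketch).
    """
    if p < 2 or k < 3:
        raise ValueError("claim 5 requires p >= 2 and k >= 3")
    rng = random.Random(seed)

    by_category = defaultdict(list)
    for index, label in enumerate(labels):
        by_category[label].append(index)

    # Shuffle within each category, then cut into chunks of exactly k samples.
    chunks = []
    for label, indices in by_category.items():
        rng.shuffle(indices)
        for start in range(0, len(indices) - k + 1, k):
            chunks.append((label, indices[start:start + k]))
    rng.shuffle(chunks)

    batches = []
    while True:
        # Greedily take one chunk from each of p distinct categories.
        batch, used, rest = [], set(), []
        for label, chunk in chunks:
            if label not in used and len(used) < p:
                used.add(label)
                batch.extend(chunk)
            else:
                rest.append((label, chunk))
        if len(used) < p:
            break  # not enough distinct categories left for a full batch
        batches.append(batch)
        chunks = rest
    return batches
```

Because k ≥ 3, every anchor in a batch has at least two same-category candidates to choose its positive sample from, and p ≥ 2 guarantees at least one different-category candidate for its negative sample.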

6. The method according to claim 3 or 4, wherein, for each sample group, determining the training loss corresponding to the sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group comprises:

determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance represents the first similarity corresponding to the positive sample pair of the sample group, and the second distance represents the first similarity corresponding to the negative sample pair of the sample group;

determining the difference between the first distance and the second distance;

determining the training loss corresponding to the sample group according to the difference, wherein the training loss corresponding to the sample group is positively correlated with the difference.

7. The method according to claim 6, wherein the training loss corresponding to each of the sample groups is determined based on the following expressions:

$$s(x) = \ln\left(1 + e^{x}\right)$$

$$x = d(a, p) - d(a, n) + \beta$$

where $s(x)$ represents the training loss corresponding to the sample group; $a$, $p$ and $n$ represent the anchor point, the positive sample and the negative sample in the sample group, respectively; $d(a, p)$ represents the first distance; $d(a, n)$ represents the second distance; and $\beta$ represents a preset adjustment threshold.
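The expressions of claim 7 can be evaluated with a short sketch. Euclidean distance for d and averaging for the total training loss are assumptions of this illustration (the claims only require that the total loss be determined from the per-group losses), and all function names are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Distance d between two feature vectors (Euclidean is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def softplus(x):
    """Numerically stable evaluation of s(x) = ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_group_loss(anchor, positive, negative, beta=0.0):
    """Loss of one sample group: s(d(a, p) - d(a, n) + beta)."""
    x = (
        euclidean_distance(anchor, positive)
        - euclidean_distance(anchor, negative)
        + beta
    )
    return softplus(x)


def total_training_loss(groups, beta=0.0):
    """Mean of the per-group losses (averaging is an assumed aggregation)."""
    losses = [sample_group_loss(a, p, n, beta) for a, p, n in groups]
    return sum(losses) / len(losses)
```

The loss grows with the difference d(a, p) - d(a, n), consistent with the positive correlation recited in claim 6, and it decays smoothly toward zero rather than being cut off once the negative sample is far enough from the anchor.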

8. A data processing method based on a neural network model, characterized by comprising:

obtaining data to be processed;

inputting the data to be processed into a first feature extraction model, and extracting a first feature vector corresponding to the data to be processed through the first feature extraction model, wherein the first feature extraction model is obtained by training using the method of any one of claims 1 to 7;

determining, based on the first feature vector, a classification result corresponding to the data to be processed.

9. The method according to claim 8, wherein the method further comprises:

extracting a second feature vector of the data to be processed by using a second feature extraction model;

determining the classification result corresponding to the data to be processed based on the first feature vector comprises:

fusing the first feature vector and the second feature vector;

determining, based on the fused feature, the classification result corresponding to the data to be processed.
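The fusion step of claim 9 could be realized in several ways. The sketch below assumes simple concatenation followed by a hypothetical linear classifier; neither choice is specified by the claim, and the names `fuse_features` and `classify` are introduced only for illustration.

```python
def fuse_features(first_vector, second_vector):
    """Fuse two feature vectors by concatenation (an assumed fusion strategy)."""
    return list(first_vector) + list(second_vector)


def classify(fused, weights, bias=0.0):
    """Hypothetical linear classifier: returns 1 if the score is positive, else 0."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    score = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1 if score > 0 else 0
```

Concatenation keeps both feature vectors intact and leaves it to the downstream classifier to weigh them; other fusion schemes, such as weighted sums, would fit the claim language equally well.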

10. The method according to claim 8 or 9, wherein obtaining the data to be processed comprises:

obtaining business data of a target business in multiple time periods corresponding to a target object, the business data corresponding to each time period including an attribute value of at least one business attribute of the target business;

constructing, based on the business data corresponding to the multiple time periods, a business time series feature matrix corresponding to the target object, and using the business time series feature matrix as the data to be processed, wherein the classification result represents the object type of the target object;

wherein the number of rows of the business time series feature matrix is the number of time periods in the multiple time periods, the number of columns is the number of attributes in the at least one business attribute, and each element value in the business time series feature matrix represents the attribute value of one business attribute in one time period.
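The matrix layout recited in claim 10 (one row per time period, one column per business attribute) can be sketched as below. The input format, the default value for missing entries, and the attribute names used in the usage example are hypothetical.

```python
def build_time_series_matrix(records, periods, attributes, default=0.0):
    """Build a matrix with one row per time period and one column per attribute.

    records: dict mapping a time period to a dict of attribute -> value.
    Missing periods or attributes are filled with `default` (an assumption).
    """
    return [
        [records.get(period, {}).get(attribute, default) for attribute in attributes]
        for period in periods
    ]


# Usage with hypothetical attribute names and values.
periods = ["2022-01", "2022-02", "2022-03"]
attributes = ["amount", "count"]
records = {
    "2022-01": {"amount": 120.0, "count": 3},
    "2022-03": {"amount": 45.5},
}
matrix = build_time_series_matrix(records, periods, attributes)
```

Here `matrix` has three rows and two columns, and the element at row i, column j holds the value of attribute j in time period i.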

11. A training device for a feature extraction model, comprising:

a training data acquisition module, configured to acquire a training set, the training set including training samples of multiple categories;

a training data processing module, configured to construct a plurality of sample pairs based on the training set, the plurality of sample pairs including a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair includes two training samples of the same category, and each negative sample pair includes two training samples of different categories;

a model training module, configured to repeatedly perform a training operation on a neural network model based on the training set until a preset condition is met, and to use the neural network model that satisfies the preset condition as the trained feature extraction model; wherein the preset condition includes that the total training loss corresponding to the neural network model converges or that the number of training iterations reaches a set number, and the training operation includes:

determining the total training loss based on a first similarity between the feature vectors of the training samples in each of the sample pairs, wherein the total training loss represents the degree of difference between the similarity of the positive sample pairs and the similarity of the negative sample pairs among the plurality of sample pairs;

12. A data processing device based on a neural network model, comprising:

a data acquisition module, configured to acquire data to be processed;

a data processing module, configured to input the data to be processed into a first feature extraction model, extract a first feature vector corresponding to the data to be processed through the first feature extraction model, and determine, based on the first feature vector, a classification result corresponding to the data to be processed;

wherein the first feature extraction model is obtained by training using the method of any one of claims 1 to 7.

13. An electronic device, characterized in that the electronic device comprises a memory and a processor, wherein a computer program is stored in the memory, and the processor executes the computer program to implement the method of any one of claims 1 to 7 or the method of any one of claims 8 to 10.

14. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, and when the computer program is executed by a processor, the method of any one of claims 1 to 7 or the method of any one of claims 8 to 10 is implemented.

15. A computer program product, characterized in that the computer program product comprises a computer program that, when executed by a processor, implements the method of any one of claims 1 to 7 or the method of any one of claims 8 to 10.