CN112052818A - Unsupervised domain adaptive pedestrian detection method, unsupervised domain adaptive pedestrian detection system and storage medium - Google Patents
- Publication number: CN112052818A (application CN202010968987.1A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- pedestrian detection
- image data
- network
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Technical Field
The present application belongs to the technical field of pedestrian detection and, in particular, relates to an unsupervised domain-adaptive pedestrian detection method, system, and storage medium.
Background
As pedestrian detection is increasingly used in fields such as intelligent security and autonomous driving, its application scenarios have become more and more diverse. However, lighting conditions, backgrounds, camera angles, and so on differ between scenes, which means that the data distributions of different scenes are usually different. Directly applying a network model trained on one scene to another scene therefore causes a significant drop in detection performance. Deep learning methods rely on large amounts of labeled data to improve a model's generalization; consequently, the existing technology takes a mainly data-driven approach: it re-annotates a large amount of data for each scene and then retrains the network on the newly labeled data to obtain a model for the new scene.
Specifically, existing methods handle this problem mainly through supervised transfer learning: new data is manually annotated in the new scene, and the original model is fine-tuned on the re-labeled data to achieve model transfer. The implementation process is: 1. collect data from the target-domain scene; 2. manually annotate the collected data; 3. fine-tune the model trained on the source domain using the data and its corresponding label information; 4. use the fine-tuned model to perform detection in the target domain.
Although this approach allows the network model to adapt to a new scene, every scene switch generates a large amount of new scene data, all of which requires substantial manpower and resources to annotate, causing great inconvenience to applications and developers. Moreover, with the explosive growth of cameras, labeling data for every newly installed camera is extremely costly. Finally, as scenes keep changing and multiplying, the model's transfer ability gradually deteriorates. The above fine-tuning methods based on labeled data therefore hit a serious bottleneck in practical applications, and a new pedestrian detection algorithm is urgently needed to solve these problems.
Summary of the Invention
The present invention proposes an unsupervised domain-adaptive pedestrian detection method, system, and storage medium, aiming to solve the problem in the prior art that fine-tuning on labeled data requires time-consuming and labor-intensive manual annotation of large amounts of data in each new scene.
According to a first aspect of the embodiments of the present application, an unsupervised domain-adaptive pedestrian detection method is provided, including the following steps:
Randomly select labeled image data and unlabeled image data;
Apply random data augmentation to the labeled image data to obtain augmented labeled image data; apply random data augmentation to the unlabeled image data twice to obtain first augmented unlabeled image data and second augmented unlabeled image data, respectively;
Input the augmented labeled data into the first pedestrian detection network to obtain first pedestrian prediction features; input the first augmented unlabeled data into the first pedestrian detection network to obtain second pedestrian prediction features; input the second augmented unlabeled image data into the second pedestrian detection network to obtain third pedestrian prediction features;
Obtain the supervised learning cost from the label features of the labeled image data and the first pedestrian prediction features; obtain the consistency cost from the second and third pedestrian prediction features;
Add the supervised learning cost and the consistency cost to obtain the total cost;
Update the weight parameters of the first pedestrian detection network by stochastic gradient descent according to the total cost;
Update the weight parameters of the second pedestrian detection network from those of the first pedestrian detection network using an exponential moving average.
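The exponential-moving-average update in the last step can be sketched as follows. This is a minimal illustration assuming the network weights are stored as a plain dict of floats; the decay factor is an assumption, since the application does not specify one.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """In-place EMA update: teacher <- decay * teacher + (1 - decay) * student."""
    for name, w_student in student_weights.items():
        teacher_weights[name] = decay * teacher_weights[name] + (1.0 - decay) * w_student
    return teacher_weights

# Toy example with two scalar "weights" and an assumed decay of 0.9.
teacher = {"conv1": 1.0, "fc": 0.0}
student = {"conv1": 2.0, "fc": 1.0}
ema_update(teacher, student, decay=0.9)   # teacher["conv1"] -> 1.1, teacher["fc"] -> 0.1
```

Because the teacher is a slowly moving average of the student, its pseudo-labels are more stable than the raw student outputs, which is the usual motivation for this mean-teacher style update.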
In some embodiments of the present application, the unsupervised domain-adaptive pedestrian detection method further includes:
Repeating the above steps until the first and second pedestrian detection networks converge, obtaining the updated first and second pedestrian detection networks;
Inputting the unlabeled image data to be tested into the updated first pedestrian detection network to obtain the pedestrian detection result.
In some embodiments of the present application, the labeled image data is randomly selected from image data whose labels are known; the unlabeled image data is randomly selected from the unlabeled image data to be tested.
In some embodiments of the present application, the first, second, and third pedestrian prediction features include the size, classification, and location information of pedestrians in the image.
In some embodiments of the present application, the supervised learning cost and the consistency cost include a pedestrian classification loss, a pedestrian center-point offset loss, and pedestrian bounding-box width and height losses.
In some embodiments of the present application, the first and second pedestrian detection networks initially use the same neural network architecture.
In some embodiments of the present application, random data augmentation includes random augmentation of image size or pixels.
In some embodiments of the present application, the weight parameters of the second pedestrian detection network are obtained as an exponential moving average of the weights of the first pedestrian detection network over the training process.
According to a second aspect of the embodiments of the present application, an unsupervised domain-adaptive pedestrian detection system is provided, specifically including:
A training data selection module, for randomly selecting labeled image data and unlabeled image data;
A data augmentation module, for augmenting the labeled image data with random data augmentation to obtain augmented labeled image data, and for augmenting the unlabeled image data twice with random data augmentation to obtain first and second augmented unlabeled image data, respectively;
A feature prediction network module, for inputting the augmented labeled data into the first pedestrian detection network to obtain first pedestrian prediction features, inputting the first augmented unlabeled data into the first pedestrian detection network to obtain second pedestrian prediction features, and inputting the second augmented unlabeled image data into the second pedestrian detection network to obtain third pedestrian prediction features;
A supervised learning cost module, for obtaining the supervised learning cost from the label features of the labeled image data and the first pedestrian prediction features;
A consistency cost module, for obtaining the consistency cost from the second and third pedestrian prediction features;
A total cost module, for adding the supervised learning cost and the consistency cost to obtain the total cost;
A first pedestrian detection network update module, for updating the weight parameters of the first pedestrian detection network by stochastic gradient descent according to the total cost;
A second pedestrian detection network update module, for updating the weight parameters of the second pedestrian detection network from those of the first pedestrian detection network using an exponential moving average.
In some embodiments of the present application, the unsupervised domain-adaptive pedestrian detection system further includes:
A training convergence module, for repeating the above steps until the first and second pedestrian detection networks converge, obtaining the updated first and second pedestrian detection networks;
A pedestrian detection module, for inputting the unlabeled image data to be tested into the updated first pedestrian detection network to obtain the pedestrian detection result.
According to a third aspect of the embodiments of the present application, a computer-readable storage medium is provided on which a computer program is stored; the computer program is executed by a processor to implement the unsupervised domain-adaptive pedestrian detection method.
With the unsupervised domain-adaptive pedestrian detection method, system, and storage medium of the embodiments of the present application, labeled and unlabeled image data are randomly selected; the labeled image data is randomly augmented to obtain augmented labeled image data, and the unlabeled image data is randomly augmented twice to obtain first and second augmented unlabeled image data; the augmented labeled data and the first augmented unlabeled data are input into the first pedestrian detection network to obtain the first and second pedestrian prediction features, and the second augmented unlabeled image data is input into the second pedestrian detection network to obtain the third pedestrian prediction features; the supervised learning cost is obtained from the label features of the labeled image data and the first pedestrian prediction features, the consistency cost from the second and third pedestrian prediction features, and the total cost from their sum; the weight parameters of the first pedestrian detection network are updated by stochastic gradient descent according to the total cost, and the weight parameters of the second pedestrian detection network are updated from those of the first by an exponential moving average. The present application uses transfer learning to strengthen the model's transfer ability: through unsupervised domain adaptation, the model is trained jointly on labeled data from the existing scene and unlabeled data from the new scene, so that the representation learned on the existing scene's data transfers to the new scene's data. Only unlabeled image data needs to be collected in the new scene, and no large-scale re-annotation is required, which greatly reduces the manpower and resources spent on data labeling during development, improves efficiency, strengthens the model's transfer ability, and makes it better adapted to scene changes.
Brief Description of the Drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of it; the schematic embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of the steps of an unsupervised domain-adaptive pedestrian detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of steps in an unsupervised domain-adaptive pedestrian detection method according to another embodiment of the present application;
FIG. 3 is a schematic flowchart of the unsupervised domain-adaptive pedestrian detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an unsupervised domain-adaptive pedestrian detection system according to an embodiment of the present application.
Detailed Description
In the course of realizing the present application, the inventors found that existing processing of pedestrian detection data in new scenes relies on supervised transfer learning: image data in the new scene is manually annotated to obtain new data, and the detection network of the original scene is fine-tuned on it. As scenes keep multiplying, the model's transfer ability gradually deteriorates, detection results become inaccurate, and large-scale manual data annotation is time-consuming and labor-intensive.
To solve the problems of gradually deteriorating model transfer ability and the high cost of annotating data, the present application proposes a transfer learning method based on unsupervised domain adaptation, which only needs to collect unlabeled data in the target scene and does not require manually labeling new data.
The present application builds two identical pedestrian detection network models, one serving as a student model and the other as a teacher model. The teacher network generates pseudo-labels from the target-domain data, and the generated pseudo-labels guide the student network being trained. By making the output of the student network as close as possible to that of the teacher network, the student network becomes better adapted to the distribution of the target-domain data.
The unsupervised domain-adaptive pedestrian detection method, system, and storage medium of the present application randomly select labeled and unlabeled image data; augment the labeled image data once and the unlabeled image data twice with random data augmentation; feed the augmented labeled data and the first augmented unlabeled data into the first pedestrian detection network to obtain the first and second pedestrian prediction features, and the second augmented unlabeled image data into the second pedestrian detection network to obtain the third pedestrian prediction features; compute the supervised learning cost from the label features and the first pedestrian prediction features, the consistency cost from the second and third pedestrian prediction features, and the total cost as their sum; update the weight parameters of the first pedestrian detection network by stochastic gradient descent according to the total cost; and update the weight parameters of the second pedestrian detection network from those of the first by an exponential moving average. Transfer learning thus strengthens the model's transfer ability: through unsupervised domain adaptation, the model is trained jointly on labeled data from the existing scene and unlabeled data from the new scene, so that its representation of the existing scene's data carries over to the new scene. Only unlabeled image data needs to be collected in the new scene, without large-scale re-annotation, which greatly improves efficiency and thus the model's transfer performance.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, the exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with one another.
Embodiment 1
FIG. 1 shows a flowchart of the steps of an unsupervised domain-adaptive pedestrian detection method according to an embodiment of the present application.
As shown in FIG. 1, the unsupervised domain-adaptive pedestrian detection method of the present application specifically includes the following steps:
S101: Randomly select labeled image data and unlabeled image data.
The labeled image data is randomly selected from image data whose labels are known; the unlabeled image data is randomly selected from the unlabeled image data to be tested.
S102: Apply random data augmentation to the labeled image data to obtain augmented labeled image data; apply random data augmentation to the unlabeled image data twice to obtain first and second augmented unlabeled image data, respectively.
Random data augmentation includes random augmentation of image size or pixels.
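As an illustration of such pixel-level augmentation, the sketch below applies a random horizontal flip and a random brightness shift to a nested-list grayscale image. The flip probability and brightness range are assumptions for illustration, not values taken from the application.

```python
import random

def random_augment(image, seed=None):
    """Randomly flip a grayscale image horizontally and jitter its pixels.

    `image` is a list of rows of values in [0, 255]. This is only an
    illustrative stand-in for the size/pixel augmentations mentioned above.
    """
    rng = random.Random(seed)
    out = [row[:] for row in image]
    if rng.random() < 0.5:                        # random horizontal flip
        out = [row[::-1] for row in out]
    shift = rng.uniform(-10.0, 10.0)              # random brightness shift
    return [[min(255.0, max(0.0, p + shift)) for p in row] for row in out]

img = [[0, 128, 255], [64, 64, 64]]
aug1 = random_augment(img, seed=1)   # first augmented view
aug2 = random_augment(img, seed=2)   # second, independently augmented view
```

Running the same image through the augmentation twice with independent randomness is exactly what produces the two distinct unlabeled views used by the student and teacher networks.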
S103: Input the augmented labeled data into the first pedestrian detection network to obtain first pedestrian prediction features; input the first augmented unlabeled data into the first pedestrian detection network to obtain second pedestrian prediction features; input the second augmented unlabeled image data into the second pedestrian detection network to obtain third pedestrian prediction features.
The first and second pedestrian detection networks initially use the same neural network architecture.
S104: Obtain the supervised learning cost from the label features of the labeled image data and the first pedestrian prediction features; obtain the consistency cost from the second and third pedestrian prediction features.
The first, second, and third pedestrian prediction features include the size, classification, and location information of pedestrians in the image.
S105: Add the supervised learning cost and the consistency cost to obtain the total cost.
S106: Update the weight parameters of the first pedestrian detection network by stochastic gradient descent according to the total cost.
The supervised learning cost and the consistency cost include a pedestrian classification loss, a pedestrian center-point offset loss, and pedestrian bounding-box width and height losses.
S107: Update the weight parameters of the second pedestrian detection network from those of the first pedestrian detection network using an exponential moving average.
The weight parameters of the second pedestrian detection network are obtained as an exponential moving average of the weights of the first pedestrian detection network over the training process.
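The cost combination and gradient step of S105 and S106 can be illustrated with a scalar toy model. Everything here is an illustrative assumption: the two cost terms are simple squared errors, the "network" is a single weight, and the gradient is taken numerically rather than by backpropagation.

```python
def total_cost(w, x_labeled, y_label, x_unlabeled, teacher_pred):
    """Total cost = supervised cost + consistency cost (squared errors here)."""
    l_supervised = (w * x_labeled - y_label) ** 2
    l_consist = (w * x_unlabeled - teacher_pred) ** 2
    return l_supervised + l_consist

def sgd_step(w, lr=0.1, eps=1e-6, **data):
    """One gradient-descent step using a central-difference gradient."""
    grad = (total_cost(w + eps, **data) - total_cost(w - eps, **data)) / (2 * eps)
    return w - lr * grad

data = dict(x_labeled=1.0, y_label=2.0, x_unlabeled=1.0, teacher_pred=2.0)
w = 0.0
for _ in range(100):
    w = sgd_step(w, **data)
# w converges toward 2.0, which minimizes both cost terms simultaneously
```

The point of the toy is that a single descent direction serves both objectives at once: the labeled term anchors the model to the source-domain labels while the consistency term pulls it toward the teacher's pseudo-labels on target-domain data.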
FIG. 2 shows a schematic diagram of steps in an unsupervised domain-adaptive pedestrian detection method according to another embodiment of the present application.
As shown in FIG. 2, in some embodiments of the present application, the unsupervised domain-adaptive pedestrian detection method further includes the following steps:
S108: Repeat steps S101 to S107 until the first and second pedestrian detection networks converge, obtaining the updated first and second pedestrian detection networks;
S109: Input the unlabeled image data to be tested into the updated first pedestrian detection network to obtain the pedestrian detection result.
FIG. 3 shows a schematic flowchart of the unsupervised domain-adaptive pedestrian detection method according to an embodiment of the present application.
In a specific implementation of the unsupervised domain-adaptive pedestrian detection method of the present application, two identical pedestrian detection network models are first constructed, namely the first pedestrian detection network and the second pedestrian detection network; the first serves as the student model and the second as the teacher model.
In this embodiment, the pedestrian detection network may adopt an anchor-based network structure such as SSD or YOLOv3, or an anchor-free pedestrian detection network structure such as CenterNet or YOLOv1.
As shown in FIG. 3, the specific steps of the unsupervised domain-adaptive pedestrian detection method of the present application are as follows:
1) Randomly select training data from the labeled image data of the existing scene and the unlabeled image data of the new scene: labeled image data (x_s, B_s) and unlabeled image data x_t, where B_s denotes the labels of the labeled image data x_s.
2) Apply random data augmentation to the labeled image data (x_s, B_s) to obtain the augmented labeled data; apply random data augmentation to x_t twice, obtaining the first augmented unlabeled image data and the second augmented unlabeled image data, respectively.
3) Input the augmented labeled data and the first augmented unlabeled image data into the student network to obtain the output features f_s and f_t^S, respectively; input the second augmented unlabeled image data into the teacher network to obtain the output feature f_t^T, which serves as the pseudo-label.
4) Compute the supervised loss l_supervised between the output feature f_s and the labels B_s, and the consistency loss l_consist between the output features f_t^S and f_t^T.
5) Sum the supervised loss l_supervised and the consistency cost l_consist to obtain the total cost l_total.
6) Update the student network's weight parameters by stochastic gradient descent (SGD).
7) Propagate the student network's weights to the teacher network via an exponential moving average (EMA).
8) Return to step 1) and repeat this series of steps until the student network converges.
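The consistency loss of step 4) is not given in closed form in this excerpt. A common choice in mean-teacher style training, used here purely as an assumption, is the mean squared error between the student's outputs and the teacher's pseudo-labels:

```python
def consistency_loss(student_feats, teacher_feats):
    """Mean squared error between student outputs and teacher pseudo-labels.

    The excerpt does not fix the distance measure; MSE is a common choice
    in mean-teacher style training and is used here as an assumption.
    """
    n = len(student_feats)
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / n

f_t_S = [0.9, 0.1, 0.8]   # student output on the first augmented view
f_t_T = [1.0, 0.0, 1.0]   # teacher pseudo-labels from the second view
loss = consistency_loss(f_t_S, f_t_T)   # ~ (0.01 + 0.01 + 0.04) / 3 = 0.02
```

Minimizing this term drives the student's predictions on one augmented view toward the teacher's predictions on the other, which is how the target-domain data shapes the student without labels.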
在测试阶段,将新场景下的目标域待检测的图片输入给已经训练好的学生网络,学生网络输出分类置信度、边框偏移值和边框的宽和高。将行人分类置信度大于某阈值的特征点和其对应的边框作为最终的输出,本实施例的阈值设置为0.7。In the testing phase, the image to be detected in the target domain in the new scene is input to the trained student network, and the student network outputs the classification confidence, the frame offset value, and the width and height of the frame. The feature points with the pedestrian classification confidence greater than a certain threshold and their corresponding bounding boxes are used as the final output, and the threshold in this embodiment is set to 0.7.
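The test-time filtering described above (keeping feature points whose pedestrian confidence exceeds the 0.7 threshold, together with their boxes) can be sketched as follows; the function name and the box-tuple layout are illustrative assumptions, not taken from the patent:

```python
def filter_detections(scores, boxes, threshold=0.7):
    """Keep feature points whose pedestrian confidence exceeds the threshold."""
    keep = [i for i, s in enumerate(scores) if s > threshold]
    return [scores[i] for i in keep], [boxes[i] for i in keep]

# Hypothetical network outputs: confidences and (x, y, w, h) boxes.
scores = [0.95, 0.40, 0.71]
boxes = [(10, 20, 50, 120), (30, 40, 45, 110), (200, 80, 40, 100)]
kept_scores, kept_boxes = filter_detections(scores, boxes)
```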
具体实施的,学生网络的总代价,即损失函数由已有场景下的源域数据的监督损失以及新场景下的目标域数据的一致性损失两部分损失构成。Specifically, the total cost of the student network, that is, the loss function, is composed of two parts: the supervised loss of the source domain data in the existing scene and the consistency loss of the target domain data in the new scene.
关于已有场景下的源域数据的监督损失,以Center-Net检测网络为例,网络的每个特征点的输出包括其所属对象的分类信息和位置信息。其中,分类信息表示为检测任务中每个类的类别置信度。位置信息表示为每个点离所属目标中心的偏移距离,以及每个点所属目标边框的长和宽。Regarding the supervision loss of the source domain data in the existing scene, taking the Center-Net detection network as an example, the output of each feature point of the network includes the classification information and location information of the object to which it belongs. Among them, the classification information is expressed as the class confidence of each class in the detection task. The position information is expressed as the offset distance of each point from the center of the target, and the length and width of the target frame to which each point belongs.
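The per-feature-point output described above for Center-Net (class confidence, offset from the object centre, box width and height) can be illustrated by decoding one feature-map point into an image-space box. The function name and the `stride` value of 4 are assumptions for this sketch:

```python
def decode_point(cx, cy, offset, size, stride=4):
    """Turn one feature-map point into an image-space (x, y, w, h) box.

    cx, cy : integer coordinates of the point on the feature map
    offset : (dx, dy) sub-stride offset of the object centre
    size   : (w, h) predicted box width and height in pixels
    """
    dx, dy = offset
    w, h = size
    centre_x = (cx + dx) * stride     # recover the centre in image pixels
    centre_y = (cy + dy) * stride
    return (centre_x - w / 2, centre_y - h / 2, w, h)

box = decode_point(10, 8, (0.5, 0.25), (40, 30))
```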
因此,网络的监督损失包括累加的行人中心点的分类损失、行人中心点的偏移损失以及行人边框的宽度和高度损失。Therefore, the supervision loss of the network consists of the accumulated pedestrian center point classification loss, pedestrian center point offset loss, and pedestrian bounding box width and height losses.
监督损失lsupervised的总损失为三个损失的加权和,具体如公式(1):The total loss of supervision loss l supervised is the weighted sum of three losses, as shown in formula (1):
L_supervised = L_k + λ_shape·L_shape + λ_off·L_off　　公式(1) / Formula (1)
其中，L_k为分类损失，L_shape为行人边框的宽度和高度损失，L_off为行人中心点的偏移损失，λ_shape为行人边框宽度和高度损失的权重，λ_off为行人中心点偏移损失的权重。Among them, L_k is the classification loss, L_shape is the width-and-height loss of the pedestrian bounding box, L_off is the offset loss of the pedestrian center point, λ_shape is the weight of the width-and-height loss, and λ_off is the weight of the center-point offset loss.
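Formula (1) is a plain weighted sum and can be sketched directly. The default weight values below are assumptions for illustration; the patent does not fix λ_shape or λ_off:

```python
def supervised_loss(l_k, l_shape, l_off, lam_shape=0.1, lam_off=1.0):
    """L_supervised = L_k + λ_shape·L_shape + λ_off·L_off  (formula (1))."""
    return l_k + lam_shape * l_shape + lam_off * l_off

# Hypothetical per-batch loss terms.
total = supervised_loss(1.2, 3.0, 0.5)
```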
关于新场景下的目标域数据的一致性损失,由于目标域数据没有人工的标签信息,因此采用的标签信息来源于教师网络的输出,与已有场景下的源域数据的监督损失相同的,也包括分类置信度损失和边框位置以及宽高损失,此处不再赘述。Regarding the consistency loss of the target domain data in the new scene, since the target domain data has no artificial label information, the label information used comes from the output of the teacher network, which is the same as the supervision loss of the source domain data in the existing scene. It also includes classification confidence loss and border position and width and height loss, which will not be repeated here.
具体说明的,如图3所示,学生网络接收两种输入数据:一种输入为源域的图像数据,另一种是目标域的图像数据。Specifically, as shown in Fig. 3, the student network receives two kinds of input data: one input is image data of the source domain, and the other is the image data of the target domain.
学生网络通过源域的图像数据输出特征fs,包括图像数据中的行人位置和类别置信度。The student network outputs features f s from the image data of the source domain, including pedestrian locations and class confidences in the image data.
由于源域的图像数据的标签是已知的,因此可以通过将网络预测的行人位置和类别置信度与真实标签label进行对比,并计算监督损失Lsupervised。训练过程中,通过梯度下降使得Lsupervised越来越小,从而使得学生网络能够输出在源域上越来越准确的行人检测结果。Since the labels of the image data of the source domain are known, the supervised loss L supervised can be calculated by comparing the pedestrian locations and class confidences predicted by the network with the ground-truth labels. During the training process, the gradient descent makes L supervised smaller and smaller, so that the student network can output more and more accurate pedestrian detection results in the source domain.
另一方面,学生网络通过目标域的图像数据输出特征ft S,包括图像数据中的行人位置和类别置信度。On the other hand, the student network outputs features f t S through the image data of the target domain, including pedestrian locations and class confidences in the image data.
由于目标域没有标签信息，因此不能直接用来训练网络。本申请提出构造“伪标签”的方法，将经过另一种数据增强的目标域图像数据输入教师网络得到特征f_t^T，f_t^T作为“伪标签”。Since the target domain has no label information, it cannot be used directly to train the network. The present application proposes a method of constructing "pseudo-labels": the target-domain image data under a different augmentation is input into the teacher network to obtain the feature f_t^T, which serves as the "pseudo-label".
然后，将学生网络的输出特征f_t^S与伪标签f_t^T进行对比并计算一致性损失L_consist。训练过程中，通过梯度下降使得L_consist越来越小，从而使得学生网络以及教师网络能够在目标域上输出越来越准确的行人检测结果。Then, the output feature f_t^S of the student network is compared with the pseudo-label f_t^T to compute the consistency loss L_consist. During training, gradient descent drives L_consist smaller and smaller, so that the student network and the teacher network output increasingly accurate pedestrian detection results on the target domain.
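The consistency loss compares the student's output on one augmentation with the teacher's pseudo-label from another. The patent does not fix the exact distance measure; a mean-squared error with the teacher output treated as a constant target is one common, assumed choice:

```python
import numpy as np

def consistency_loss(f_student, f_teacher):
    """Mean-squared distance between student outputs and teacher pseudo-labels.

    The teacher output is treated as a fixed target (stop-gradient), so only
    the student is trained against it. MSE is an assumed choice of distance.
    """
    target = np.asarray(f_teacher, dtype=float)   # fixed pseudo-label
    pred = np.asarray(f_student, dtype=float)
    return float(np.mean((pred - target) ** 2))

# Hypothetical confidences from the two networks on the same target image.
loss = consistency_loss([0.9, 0.1], [1.0, 0.0])
```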
其中,为了得到上述的“伪标签”,本申请构建的教师模型与学生模型相同。Among them, in order to obtain the above-mentioned "pseudo label", the teacher model constructed in this application is the same as the student model.
训练过程中,通过计算学生网络权值在训练过程中的指数滑动平均(EMA)得到教师网络模型的权值。教师网络本次迭代之后的参数表示如下:During the training process, the weights of the teacher network model are obtained by calculating the exponential moving average (EMA) of the student network weights during the training process. The parameters of the teacher network after this iteration are expressed as follows:
Y_t = λ·Y_{t-1} + (1−λ)·X_t
其中，Y_t为教师网络本次迭代之后的参数，Y_{t-1}为教师网络上一次更新之后的参数，X_t为学生网络本次更新之后的参数，λ是分配给上一次迭代之后的模型参数的权重。Among them, Y_t is the parameter of the teacher network after this iteration, Y_{t-1} is the parameter of the teacher network after the last update, X_t is the parameter of the student network after this update, and λ is the weight assigned to the model parameters from the previous iteration.
当t=0的时候，Y初始化为与X一致，即Y_0 = X_0。When t=0, Y is initialized to be identical to X, i.e., Y_0 = X_0.
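The EMA update Y_t = λ·Y_{t-1} + (1−λ)·X_t applies element-wise to every weight. A minimal sketch over a flat parameter list follows; the λ=0.99 default is an assumed value, not specified by the patent:

```python
def ema_update(teacher_params, student_params, lam=0.99):
    """Y_t = λ·Y_{t-1} + (1-λ)·X_t, applied element-wise per parameter."""
    return [lam * y + (1 - lam) * x
            for y, x in zip(teacher_params, student_params)]

# Hypothetical flattened weights of the two networks.
teacher = [1.0, 2.0]
student = [0.0, 4.0]
teacher = ema_update(teacher, student)
```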
因此,教师网络可以看成是一系列不同迭代阶段的学生网络的加权求和的结果。因此,教师网络的权重在时间上更加平滑。同时由于集合了不同阶段的学生网络,因此也具有更强的泛化能力。Therefore, the teacher network can be seen as the result of a weighted summation of a series of student networks at different iterative stages. Therefore, the weights of the teacher network are smoother over time. At the same time, it also has stronger generalization ability due to the collection of student networks at different stages.
本申请实施例中的学生网络更新方法采用随机梯度下降，也可以采用其他优化更新方法。The student network in the embodiment of the present application is updated with stochastic gradient descent; other optimization methods may also be used.
本申请实施例中的教师网络更新方法采用指数滑动平均加权方式,同样也可以替换为其他加权方式。The teacher network update method in the embodiment of the present application adopts an exponential sliding average weighting method, which can also be replaced by other weighting methods.
本申请实施例中的行人检测网络不限于特定深度网络模型。The pedestrian detection network in the embodiment of the present application is not limited to a specific deep network model.
本申请实施例中的无监督域适应的行人检测方法,将目标域无标签图像分别经过不同的数据增强,分别输入给学生网络和教师网络。教师网络预测出图像中行人的大小、位置和类别置信度等信息,将该信息作为学生网络的伪标签,指导学生网络进行学习。学生网络更新权重之后,再利用滑动平均更新教师网络的权重。In the pedestrian detection method for unsupervised domain adaptation in the embodiment of the present application, the unlabeled images in the target domain are respectively enhanced with different data, and then input to the student network and the teacher network respectively. The teacher network predicts information such as the size, location, and category confidence of pedestrians in the image, and uses this information as a pseudo-label for the student network to guide the student network to learn. After the student network updates the weights, the sliding average is used to update the weights of the teacher network.
同时，由于教师网络和学生网络接受的是相同数据经过不同的数据增强得到的图像，通过约束两个网络的输出一致，使得学生网络可以学到目标域数据的潜在相似性。而由于教师网络比学生网络具有更强的泛化能力，通过使得学生网络的输出和教师网络的输出一致，也可以使得学生网络的泛化能力增强。At the same time, since the teacher network and the student network receive images obtained from the same data through different augmentations, constraining the outputs of the two networks to be consistent allows the student network to learn the latent similarity of the target domain data. And since the teacher network generalizes better than the student network, making the student network's output consistent with the teacher network's output also strengthens the student network's generalization ability.
最后，如图3所示，通过不断地迭代以上训练过程，学生模型和教师模型可以相互促进性能的提升，因而教师模型可以预测更好的伪标签，学生模型也能更好地适应目标域数据的分布。Finally, as shown in Figure 3, by continuously iterating the above training process, the student model and the teacher model mutually promote each other's performance: the teacher model predicts better pseudo-labels, and the student model better adapts to the distribution of the target domain data.
采用本申请实施例中的无监督域适应的行人检测方法，随机选取有标签图像数据以及无标签图像数据；有标签图像数据通过随机数据增强得到增强有标签图像数据；无标签图像数据通过随机数据增强分别得到第一增强无标签图像数据以及第二增强无标签图像数据；输入增强有标签数据至第一行人检测网络，得到第一行人预测特征；输入第一增强无标签数据至第一行人检测网络，得到第二行人预测特征；输入第二增强无标签图像数据至第二行人检测网络得到第三行人预测特征；根据有标签图像数据的标签特征以及第一行人预测特征，得到监督学习代价；根据第二行人预测特征以及第三行人预测特征，得到一致性代价；根据监督学习代价以及一致性代价相加，得到总代价；根据总代价，通过随机梯度下降算法，更新第一行人检测网络的权值参数；根据第一行人检测网络的权值参数，通过指数滑动平均算法，更新第二行人检测网络的权值参数。本申请采用迁移学习来加强模型的迁移能力，通过无监督域适应的方法，即通过已有场景下的有标签数据和新场景下的无标签数据共同训练模型，使得模型可以将已有场景的数据表现能力迁移到新场景数据上。本申请只需要采集新场景下的无标签图像数据，不需要重新进行大量的图像标注，大大节省了开发过程中由于数据标注带来的人力物力消耗，提高了效率；提高了模型的迁移能力，更能适应场景的变化。Using the unsupervised domain adaptive pedestrian detection method in the embodiment of the present application, labeled image data and unlabeled image data are randomly selected; the labeled image data is randomly augmented to obtain augmented labeled image data; the unlabeled image data is randomly augmented twice to obtain first and second augmented unlabeled image data; the augmented labeled data is input to the first pedestrian detection network to obtain a first pedestrian prediction feature; the first augmented unlabeled data is input to the first pedestrian detection network to obtain a second pedestrian prediction feature; the second augmented unlabeled image data is input to the second pedestrian detection network to obtain a third pedestrian prediction feature; a supervised learning cost is obtained from the label feature of the labeled image data and the first pedestrian prediction feature; a consistency cost is obtained from the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain the total cost; according to the total cost, the weight parameters of the first pedestrian detection network are updated through the stochastic gradient descent algorithm; according to the weight parameters of the first pedestrian detection network, the weight parameters of the second pedestrian detection network are updated through the exponential moving average algorithm. This application adopts transfer learning to strengthen the transfer ability of the model through unsupervised domain adaptation: the model is jointly trained on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on the existing scene can be transferred to the new scene data. This application only needs to collect unlabeled image data in the new scene and does not require large-scale re-annotation of images, which greatly saves the labor and material costs of data annotation during development, improves efficiency, strengthens the transfer ability of the model, and makes it more adaptable to scene changes.
实施例2Example 2
本实施例提供了一种无监督域适应的行人检测系统，对于本实施例的无监督域适应的行人检测系统中未披露的细节，请参照其它实施例中的无监督域适应的行人检测方法的实施内容。This embodiment provides an unsupervised domain adaptive pedestrian detection system. For details not disclosed in the unsupervised domain adaptive pedestrian detection system of this embodiment, please refer to the implementation of the unsupervised domain adaptive pedestrian detection method in the other embodiments.
图4示出了根据本申请实施例的无监督域适应的行人检测系统的结构示意图。FIG. 4 shows a schematic structural diagram of an unsupervised domain-adapted pedestrian detection system according to an embodiment of the present application.
如图4所示，本申请实施例的无监督域适应的行人检测系统包括训练数据选取模块10、数据增强模块20、特征预测网络模块30、监督学习代价模块40、一致性代价模块50、总代价模块60、第一行人检测网络更新模块70以及第二行人检测网络更新模块80。As shown in FIG. 4, the unsupervised domain adaptive pedestrian detection system of the embodiment of the present application includes a training data selection module 10, a data enhancement module 20, a feature prediction network module 30, a supervised learning cost module 40, a consistency cost module 50, a total cost module 60, a first pedestrian detection network update module 70, and a second pedestrian detection network update module 80.
具体的：Specifically:
训练数据选取模块10:用于随机选取有标签图像数据以及无标签图像数据。Training data selection module 10: used to randomly select labeled image data and unlabeled image data.
数据增强模块20：用于将有标签图像数据通过随机数据增强得到增强有标签图像数据；用于将无标签图像数据通过随机数据增强分别得到第一增强无标签图像数据以及第二增强无标签图像数据。Data enhancement module 20: used to obtain augmented labeled image data from the labeled image data through random data augmentation, and to obtain the first augmented unlabeled image data and the second augmented unlabeled image data from the unlabeled image data through two independent random augmentations.
特征预测网络模块30：用于输入增强有标签数据至第一行人检测网络，得到第一行人预测特征；用于输入第一增强无标签数据至第一行人检测网络，得到第二行人预测特征；用于输入第二增强无标签图像数据至第二行人检测网络得到第三行人预测特征。Feature prediction network module 30: used to input the augmented labeled data into the first pedestrian detection network to obtain the first pedestrian prediction feature; to input the first augmented unlabeled data into the first pedestrian detection network to obtain the second pedestrian prediction feature; and to input the second augmented unlabeled image data into the second pedestrian detection network to obtain the third pedestrian prediction feature.
监督学习代价模块40:用于根据有标签图像数据的标签特征以及第一行人预测特征,得到监督学习代价。Supervised learning cost module 40: used to obtain the supervised learning cost according to the label feature of the labeled image data and the first pedestrian prediction feature.
一致性代价模块50:用于根据第二行人预测特征以及第三行人预测特征,得到一致性代价。Consistency cost module 50: used to obtain the consistency cost according to the second pedestrian prediction feature and the third pedestrian prediction feature.
总代价模块60：用于根据监督学习代价以及一致性代价相加，得到总代价。Total cost module 60: used to add the supervised learning cost and the consistency cost to obtain the total cost.
第一行人检测网络更新模块70：用于根据总代价，通过随机梯度下降算法，更新第一行人检测网络的权值参数。First pedestrian detection network update module 70: used to update the weight parameters of the first pedestrian detection network through the stochastic gradient descent algorithm according to the total cost.
第二行人检测网络更新模块80：用于根据第一行人检测网络的权值参数，通过指数滑动平均算法，更新第二行人检测网络的权值参数。Second pedestrian detection network update module 80: used to update the weight parameters of the second pedestrian detection network through the exponential moving average algorithm according to the weight parameters of the first pedestrian detection network.
在本申请的一些实施方式中,无监督域适应的行人检测系统还包括:In some embodiments of the present application, the unsupervised domain-adapted pedestrian detection system further includes:
训练收敛模块：用于重复以上步骤直到第一行人检测网络以及第二行人检测网络收敛，得到更新后的第一行人检测网络以及第二行人检测网络。Training convergence module: used to repeat the above steps until the first pedestrian detection network and the second pedestrian detection network converge, obtaining the updated first pedestrian detection network and second pedestrian detection network.
行人检测模块:用于将待测的无标签图像数据输入至更新后的第一行人检测网络,得到行人检测结果。Pedestrian detection module: used to input the unlabeled image data to be tested into the updated first pedestrian detection network to obtain pedestrian detection results.
图3中同样示出了根据本申请实施例的无监督域适应的行人检测系统的应用流程示意图。FIG. 3 also shows a schematic diagram of an application flow of the pedestrian detection system for unsupervised domain adaptation according to an embodiment of the present application.
首先需要构建两个完全相同的行人检测网络模型，即第一行人检测网络以及第二行人检测网络，第一行人检测网络作为学生模型，第二行人检测网络作为教师模型。First, two identical pedestrian detection network models need to be constructed, namely the first pedestrian detection network and the second pedestrian detection network; the first pedestrian detection network serves as the student model, and the second pedestrian detection network serves as the teacher model.
本实施例中,行人检测网络可以采用有锚框的网络,如SSD,YOLO-V3等网络结构;也可以采用无锚框行人检测网络,如Center-Net,YOLO-V1等网络结构。In this embodiment, the pedestrian detection network may use a network with anchor frames, such as SSD, YOLO-V3 and other network structures; or a pedestrian detection network without anchor frames, such as Center-Net, YOLO-V1 and other network structures.
如图3所示，本申请的无监督域适应的行人检测系统的应用具体步骤如下：As shown in FIG. 3, the specific application steps of the unsupervised domain adaptive pedestrian detection system of the present application are as follows:
1)分别从已有场景下的有标签图像数据中以及新场景下的无标签图像数据中随机选取训练数据,有标签图像数据(xs,Bs)以及无标签图像数据xt,其中Bs表示有标签图像数据xs的标签。1) Randomly select training data from labeled image data in the existing scene and unlabeled image data in the new scene, labeled image data (x s , B s ) and unlabeled image data x t , where B s represents the label of the labeled image data x s .
2)对有标签图像数据(x_s, B_s)做随机数据增强处理，得到增强有标签数据；对无标签图像数据x_t做两次随机数据增强处理，分别得到第一增强无标签图像数据和第二增强无标签图像数据。2) The labeled image data (x_s, B_s) is randomly augmented to obtain the augmented labeled data; the unlabeled image data x_t is augmented twice with independent random augmentations to obtain the first augmented unlabeled image data and the second augmented unlabeled image data, respectively.
3)将增强有标签数据以及第一增强无标签图像数据分别输入学生网络，得到输出特征f_s以及f_t^S；将第二增强无标签图像数据输入教师网络，得到输出特征f_t^T，输出特征f_t^T即伪标签。3) Input the augmented labeled data and the first augmented unlabeled image data into the student network to obtain the output features f_s and f_t^S, respectively; input the second augmented unlabeled image data into the teacher network to obtain the output feature f_t^T, which serves as the pseudo-label.
4)计算输出特征fs与标签Bs之间的监督损失lsupervised,计算输出特征ft S与ft T之间的一致性损失lconsist。4) Calculate the supervision loss lsupervised between the output features f s and the label B s , and calculate the consistency loss l consist between the output features f t S and f t T.
5)对监督损失lsupervised和一致性代价lconsist求和得到总代价ltotal。5) The total cost l total is obtained by summing the supervision loss l supervised and the consistency cost l consist .
6)通过随机梯度下降算法(stochastic gradient descent,SGD)更新学生网络权值参数。6) Update the weight parameters of the student network through stochastic gradient descent (SGD).
7)通过指数滑动平均算法(exponential moving average,EMA)将学生网络权值更新到教师网络。7) Update the student network weights to the teacher network through the exponential moving average (EMA).
8)回到步骤1)循环这一系列步骤,直到学生网络收敛。8) Go back to step 1) to cycle through this series of steps until the student network converges.
在测试阶段,将新场景下的目标域待检测的图片输入给已经训练好的学生网络,学生网络输出分类置信度、边框偏移值和边框的宽和高。将行人分类置信度大于某阈值的特征点和其对应的边框作为最终的输出,本实施例的阈值设置为0.7。In the testing phase, the image to be detected in the target domain in the new scene is input to the trained student network, and the student network outputs the classification confidence, the frame offset value, and the width and height of the frame. The feature points with the pedestrian classification confidence greater than a certain threshold and their corresponding bounding boxes are used as the final output, and the threshold in this embodiment is set to 0.7.
具体实施的,学生网络的总代价,即损失函数由已有场景下的源域数据的监督损失以及新场景下的目标域数据的一致性损失两部分损失构成。Specifically, the total cost of the student network, that is, the loss function, is composed of two parts: the supervised loss of the source domain data in the existing scene and the consistency loss of the target domain data in the new scene.
关于已有场景下的源域数据的监督损失,以Center-Net检测网络为例,网络的每个特征点的输出包括其所属对象的分类信息和位置信息。其中,分类信息表示为检测任务中每个类的类别置信度。位置信息表示为每个点离所属目标中心的偏移距离,以及每个点所属目标边框的长和宽。Regarding the supervision loss of the source domain data in the existing scene, taking the Center-Net detection network as an example, the output of each feature point of the network includes the classification information and location information of the object to which it belongs. Among them, the classification information is expressed as the class confidence of each class in the detection task. The position information is expressed as the offset distance of each point from the center of the target, and the length and width of the target frame to which each point belongs.
因此,网络的监督损失包括累加的行人中心点的分类损失、行人中心点的偏移损失以及行人边框的宽度和高度损失。Therefore, the supervision loss of the network consists of the accumulated pedestrian center point classification loss, pedestrian center point offset loss, and pedestrian bounding box width and height losses.
本申请实施例中的无监督域适应的行人检测系统,将目标域无标签图像分别经过不同的数据增强,分别输入给学生网络和教师网络。教师网络预测出图像中行人的大小、位置和类别置信度等信息,将该信息作为学生网络的伪标签,指导学生网络进行学习。学生网络更新权重之后,再利用滑动平均更新教师网络的权重。In the pedestrian detection system for unsupervised domain adaptation in the embodiment of the present application, the unlabeled images in the target domain are respectively enhanced with different data, and then input to the student network and the teacher network respectively. The teacher network predicts information such as the size, location, and category confidence of pedestrians in the image, and uses this information as a pseudo-label for the student network to guide the student network to learn. After the student network updates the weights, the sliding average is used to update the weights of the teacher network.
同时，由于教师网络和学生网络接受的是相同数据经过不同的数据增强得到的图像，通过约束两个网络的输出一致，使得学生网络可以学到目标域数据的潜在相似性。而由于教师网络比学生网络具有更强的泛化能力，通过使得学生网络的输出和教师网络的输出一致，也可以使得学生网络的泛化能力增强。At the same time, since the teacher network and the student network receive images obtained from the same data through different augmentations, constraining the outputs of the two networks to be consistent allows the student network to learn the latent similarity of the target domain data. And since the teacher network generalizes better than the student network, making the student network's output consistent with the teacher network's output also strengthens the student network's generalization ability.
最后，如图3所示，通过不断地迭代以上训练过程，学生模型和教师模型可以相互促进性能的提升，因而教师模型可以预测更好的伪标签，学生模型也能更好地适应目标域数据的分布。Finally, as shown in Figure 3, by continuously iterating the above training process, the student model and the teacher model mutually promote each other's performance: the teacher model predicts better pseudo-labels, and the student model better adapts to the distribution of the target domain data.
采用本申请实施例中的无监督域适应的行人检测系统，随机选取有标签图像数据以及无标签图像数据；有标签图像数据通过随机数据增强得到增强有标签图像数据；无标签图像数据通过随机数据增强分别得到第一增强无标签图像数据以及第二增强无标签图像数据；输入增强有标签数据至第一行人检测网络，得到第一行人预测特征；输入第一增强无标签数据至第一行人检测网络，得到第二行人预测特征；输入第二增强无标签图像数据至第二行人检测网络得到第三行人预测特征；根据有标签图像数据的标签特征以及第一行人预测特征，得到监督学习代价；根据第二行人预测特征以及第三行人预测特征，得到一致性代价；根据监督学习代价以及一致性代价相加，得到总代价；根据总代价，通过随机梯度下降算法，更新第一行人检测网络的权值参数；根据第一行人检测网络的权值参数，通过指数滑动平均算法，更新第二行人检测网络的权值参数。本申请采用迁移学习来加强模型的迁移能力，通过无监督域适应的方法，即通过已有场景下的有标签数据和新场景下的无标签数据共同训练模型，使得模型可以将已有场景的数据表现能力迁移到新场景数据上。本申请只需要采集新场景下的无标签图像数据，不需要重新进行大量的图像标注，大大提高了效率，从而大大提升了模型的迁移性能。Using the unsupervised domain adaptive pedestrian detection system in the embodiment of the present application, labeled image data and unlabeled image data are randomly selected; the labeled image data is randomly augmented to obtain augmented labeled image data; the unlabeled image data is randomly augmented twice to obtain first and second augmented unlabeled image data; the augmented labeled data is input to the first pedestrian detection network to obtain a first pedestrian prediction feature; the first augmented unlabeled data is input to the first pedestrian detection network to obtain a second pedestrian prediction feature; the second augmented unlabeled image data is input to the second pedestrian detection network to obtain a third pedestrian prediction feature; a supervised learning cost is obtained from the label feature of the labeled image data and the first pedestrian prediction feature; a consistency cost is obtained from the second and third pedestrian prediction features; the supervised learning cost and the consistency cost are added to obtain the total cost; according to the total cost, the weight parameters of the first pedestrian detection network are updated through the stochastic gradient descent algorithm; according to the weight parameters of the first pedestrian detection network, the weight parameters of the second pedestrian detection network are updated through the exponential moving average algorithm. This application adopts transfer learning to strengthen the transfer ability of the model through unsupervised domain adaptation: the model is jointly trained on the labeled data of the existing scene and the unlabeled data of the new scene, so that the representation ability learned on the existing scene can be transferred to the new scene data. This application only needs to collect unlabeled image data in the new scene and does not require large-scale re-annotation of images, which greatly improves efficiency and thus greatly improves the transfer performance of the model.
实施例3Example 3
本实施例提供了一种计算机可读存储介质,其上存储有计算机程序;计算机程序被处理器执行以实现其他实施例中的无监督域适应的行人检测方法。This embodiment provides a computer-readable storage medium on which a computer program is stored; the computer program is executed by a processor to implement the pedestrian detection method for unsupervised domain adaptation in other embodiments.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.
在本发明使用的术语是仅仅出于描述特定实施例的目的，而非旨在限制本发明。在本发明和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式，除非上下文清楚地表示其他含义。还应当理解，本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in the present invention and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
应当理解,尽管在本发明可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本发明范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in the present invention to describe various information, such information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present invention. Depending on the context, the word "if" as used herein can be interpreted as "at the time of" or "when" or "in response to determining."
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。While the preferred embodiments of the present application have been described, additional changes and modifications to these embodiments may occur to those skilled in the art once the basic inventive concepts are known. Therefore, the appended claims are intended to be construed to include the preferred embodiment and all changes and modifications that fall within the scope of this application.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010968987.1A CN112052818B (en) | 2020-09-15 | 2020-09-15 | Unsupervised domain adapted pedestrian detection method, system and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112052818A true CN112052818A (en) | 2020-12-08 |
| CN112052818B CN112052818B (en) | 2024-03-22 |
Patent Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100246980A1 (en) * | 2009-03-31 | 2010-09-30 | General Electric Company | System and method for automatic landmark labeling with minimal supervision |
| KR20190042429A (en) * | 2017-10-15 | 2019-04-24 | 알레시오 주식회사 | Method for image processing |
| CN111226258A (en) * | 2017-10-15 | 2020-06-02 | 阿莱西奥公司 | Signal conversion system and signal conversion method |
| US20190325299A1 (en) * | 2018-04-18 | 2019-10-24 | Element Ai Inc. | Unsupervised domain adaptation with similarity learning for images |
| CN109902798A (en) * | 2018-05-31 | 2019-06-18 | 华为技术有限公司 | Training method and device for deep neural network |
| KR102001781B1 (en) * | 2018-08-28 | 2019-10-01 | 건국대학교 산학협력단 | Method of improving learning accuracy of neural networks and apparatuses performing the same |
| CN109977918A (en) * | 2019-04-09 | 2019-07-05 | 华南理工大学 | A kind of target detection and localization optimization method adapted to based on unsupervised domain |
| CN110135295A (en) * | 2019-04-29 | 2019-08-16 | 华南理工大学 | An unsupervised person re-identification method based on transfer learning |
| CN110659591A (en) * | 2019-09-07 | 2020-01-07 | 中国海洋大学 | SAR image change detection method based on twin network |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114943868A (en) * | 2021-05-31 | 2022-08-26 | 阿里巴巴新加坡控股有限公司 | Image processing method, image processing device, storage medium and processor |
| CN114943868B (en) * | 2021-05-31 | 2023-11-14 | 阿里巴巴新加坡控股有限公司 | Image processing method, device, storage medium and processor |
| CN113255807A (en) * | 2021-06-03 | 2021-08-13 | 北京的卢深视科技有限公司 | Face analysis model training method, electronic device and storage medium |
| CN113255807B (en) * | 2021-06-03 | 2022-03-25 | 北京的卢深视科技有限公司 | Face analysis model training method, electronic device and storage medium |
| CN113536920A (en) * | 2021-06-11 | 2021-10-22 | 复旦大学 | A semi-supervised 3D point cloud object detection method |
| CN113536920B (en) * | 2021-06-11 | 2022-06-17 | 复旦大学 | A semi-supervised 3D point cloud object detection method |
| CN114399683A (en) * | 2022-01-18 | 2022-04-26 | 南京甄视智能科技有限公司 | End-to-end semi-supervised target detection method based on improved yolov5 |
| CN114399683B (en) * | 2022-01-18 | 2024-06-25 | 小视科技(江苏)股份有限公司 | End-to-end semi-supervision target detection method based on improvement yolov5 |
| CN114494780A (en) * | 2022-01-26 | 2022-05-13 | 上海交通大学 | Semi-supervised industrial defect detection method and system based on feature comparison |
| CN114550215A (en) * | 2022-02-25 | 2022-05-27 | 北京拙河科技有限公司 | Target detection method and system based on transfer learning |
| CN114445670A (en) * | 2022-04-11 | 2022-05-06 | 腾讯科技(深圳)有限公司 | Training method, device and equipment of image processing model and storage medium |
| WO2025158679A1 (en) * | 2024-01-26 | 2025-07-31 | Ntt株式会社 | Image-processing device, image-processing method, and image-processing program |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112052818B (en) | 2024-03-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112052818B (en) | | Unsupervised domain adapted pedestrian detection method, system and storage medium |
| CN113221905B (en) | | Semantic segmentation unsupervised domain adaptation method, device and system based on uniform clustering and storage medium |
| Lian et al. | | Road extraction methods in high-resolution remote sensing images: A comprehensive review |
| CN110059558B (en) | | A Real-time Detection Method of Orchard Obstacles Based on Improved SSD Network |
| CN109977918B (en) | | An Optimization Method for Object Detection and Localization Based on Unsupervised Domain Adaptation |
| CN111738231B (en) | | Target object detection method and device, computer equipment and storage medium |
| Ducournau et al. | | Random walks in directed hypergraphs and application to semi-supervised image segmentation |
| KR102557512B1 (en) | | Context-Based Priors for Object Detection in Images |
| CN112884742B (en) | | A multi-target real-time detection, recognition and tracking method based on multi-algorithm fusion |
| CN110263697A (en) | | Pedestrian based on unsupervised learning recognition methods, device and medium again |
| WO2019100723A1 (en) | | Method and device for training multi-label classification model |
| CN103605984B (en) | | Indoor scene sorting technique based on hypergraph study |
| CN110569901A (en) | | A weakly supervised object detection method based on channel selection for adversarial elimination |
| CN108805083A (en) | | The video behavior detection method of single phase |
| CN110874590B (en) | | Adapter-based mutual learning model training and visible light infrared vision tracking method |
| CN108491766B (en) | | End-to-end crowd counting method based on depth decision forest |
| CN113326825A (en) | | Pseudo tag generation method and device, electronic equipment and storage medium |
| CN113128308B (en) | | Pedestrian detection method, device, equipment and medium in port scene |
| CN114187590A (en) | | Method and system for identifying target fruits under homochromatic system background |
| CN111931782A (en) | | Semantic segmentation method, system, medium, and apparatus |
| CN116612386A (en) | | Pepper pests and diseases identification method and system based on hierarchical detection dual-task model |
| CN113591859A (en) | | Image segmentation method, apparatus, device and medium |
| CN116824330A (en) | | A small-sample cross-domain target detection method based on deep learning |
| Chen et al. | | Deep-learning-based road crack detection frameworks for dashcam-captured images under different illumination conditions |
| CN119599946A (en) | | A road crack detection method, device and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | |

Address after: 311215 unit 1, building 1, area C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province
Patentee after: Zhejiang Visual Intelligence Innovation Center Co.,Ltd.
Country or region after: China
Address before: 311215 unit 1, building 1, area C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province
Patentee before: Zhejiang smart video security Innovation Center Co.,Ltd.
Country or region before: China
| | EE01 | Entry into force of recordation of patent licensing contract | |

Application publication date: 2020-12-08
Assignee: Hangzhou Xiandao Information Technology Co.,Ltd.
Assignor: Zhejiang Visual Intelligence Innovation Center Co.,Ltd.
Contract record no.: X2024980045134
Denomination of invention: Unsupervised domain adaptation pedestrian detection method, system, and storage medium
Granted publication date: 2024-03-22
License type: Common License
Record date: 2025-01-05