CN115880263A

CN115880263A - Image quality scoring method, device, system, storage medium and electronic equipment

Info

Publication number: CN115880263A
Application number: CN202211658758.5A
Authority: CN
Inventors: 蔡晓蕙
Original assignee: Shanghai Goldway Intelligent Transportation System Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2022-12-22
Filing date: 2022-12-22
Publication date: 2023-03-31

Abstract

The application discloses an image quality scoring method comprising the following steps: inputting an original image to be scored and a label corresponding to each current service type into a pre-trained state recognition model, marking the image area related to each current service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type; for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective current service type, to obtain R*P similarity results; and for each first state feature vector, determining a quality scoring result of the original image to be scored under the respective current service type based on the R*P similarity results. With this method, image quality can be judged in a targeted and reasonable manner for different service types, making the quality judgment result more accurate.

Description

Image quality scoring method, device, system, storage medium and electronic equipment

Technical field

The present application relates to image processing technology, and in particular to a quality scoring method, device, system, storage medium and electronic equipment for images used for state recognition.

Background

With the advancement of image processing technology, image-based state recognition has been applied more and more widely.

Poor image quality degrades the performance of a state recognition system and leads to state recognition errors. For example, judging illegal behavior in a vehicle cab is a typical application of image-based state recognition. When the collected cab images are of poor quality, the performance of the cab illegal-behavior judgment model suffers, producing false alarms of illegal behavior that require substantial manual review and reduce returns. It is therefore desirable to introduce quality scoring before state recognition, so as to filter out poor-quality images and improve the precision of state recognition.

Summary of the invention

The present application provides a quality scoring method, device, system, storage medium and electronic equipment for images used for state recognition, which can score image quality reasonably for different service types and thereby improve the precision of state recognition.

To achieve the above object, the present application adopts the following technical solutions:

The present application provides an image quality scoring method, including:

inputting an original image to be scored and a label corresponding to each current service type into a pre-trained state recognition model, marking the image area related to each current service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type;

for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective current service type, to obtain R*P similarity results;

for each first state feature vector, determining, based on the R*P similarity results, a quality scoring result of the original image to be scored under the respective current service type;

wherein the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the respective current service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the respective current service type, whose state category can be identified without ambiguity; R is the total number of state categories under the respective current service type, and P is a positive integer.
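As a non-limiting illustration, the similarity calculation described above can be sketched as follows. The application does not fix a similarity measure at this point; cosine similarity, the function name `similarity_results`, and the array layout (R, P, D) are assumptions made only for this sketch.

```python
import numpy as np

def similarity_results(first_vec, standard_vecs):
    """Cosine similarity between one feature vector and R*P standard vectors.

    first_vec:     shape (D,), the first state feature vector (assumed layout).
    standard_vecs: shape (R, P, D), one vector per standard image (assumed layout).
    Returns an array of shape (R, P) holding the R*P similarity results.
    """
    first_vec = np.asarray(first_vec, dtype=float)
    standard_vecs = np.asarray(standard_vecs, dtype=float)
    eps = 1e-12  # avoids division by zero for all-zero vectors
    dots = standard_vecs @ first_vec  # shape (R, P)
    norms = np.linalg.norm(standard_vecs, axis=-1) * np.linalg.norm(first_vec)
    return dots / (norms + eps)
```

With R state categories and P standard images per category, the returned array keeps one row per state category, which makes it straightforward to weight or inspect the results per category afterwards.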

Preferably, the state recognition model includes a first-level network for semantic segmentation and a second-level network for state recognition;

the input of the first-level network is the original image to be scored and the labels, and its output is a semantic feature map corresponding to each label, which is used to mark the image area related to each current service type;

the second-level network includes M sub-networks in one-to-one correspondence with M service types, where M is the total number of service types;

the input of each sub-network is the original image to be scored and the corrected semantic feature maps corresponding to the labels of the first current service type that corresponds to the sub-network; each sub-network concatenates the original image to be scored and the corrected semantic feature maps along the channel dimension, and inputs the concatenated image into a classification network for state recognition to obtain the first state feature vector corresponding to the first current service type; wherein a corrected semantic feature map is a semantic feature map upsampled to the same size as the original image to be scored.
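As a non-limiting illustration, the upsampling and channel-wise concatenation performed by each sub-network can be sketched as follows. Nearest-neighbour interpolation, the (H, W, C) array layout, and the function names are assumptions of this sketch; the application only requires that the semantic feature map be upsampled to the size of the original image.

```python
import numpy as np

def upsample_nearest(feature_map, out_h, out_w):
    """Nearest-neighbour upsampling of an (h, w) map to (out_h, out_w)."""
    h, w = feature_map.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feature_map[rows[:, None], cols[None, :]]

def concat_image_and_maps(image, semantic_maps):
    """Concatenate an (H, W, C) image with upsampled semantic maps.

    semantic_maps: list of (h, w) arrays, one per label of the service type.
    Returns an array of shape (H, W, C + number of maps).
    """
    height, width, _ = image.shape
    corrected = [upsample_nearest(m, height, width)[..., None] for m in semantic_maps]
    return np.concatenate([image] + corrected, axis=-1)
```

For a three-channel image and a service type with two labels, the concatenated input has five channels, so the classification network sees the marked regions alongside the pixel data.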

Preferably, inputting the concatenated image into the classification network for state recognition includes:

dividing the concatenated image into a number of two-dimensional patches, and adding to each patch the two-dimensional position information corresponding to that patch;

inputting all patches, with their two-dimensional position information added, into the classification network for state recognition;

wherein training of the state recognition model ensures that the weight of the two-dimensional position information corresponding to patches located inside the image area related to the current service type is greater than that corresponding to patches located outside that image area.
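As a non-limiting illustration, the patch division and the attachment of two-dimensional position information can be sketched as follows. The application does not state how the position information is encoded or added; appending the (row, column) grid index to each flattened patch, the non-overlapping square patches, and the function names are all assumptions of this sketch. The learned weighting of the position information is not modelled here.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns (tokens, positions): tokens has shape (N, patch*patch*C) and
    positions has shape (N, 2) with each tile's (row, col) grid index.
    """
    height, width, channels = image.shape
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    grid_h, grid_w = height // patch, width // patch
    tiles = image.reshape(grid_h, patch, grid_w, patch, channels)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # -> (grid_h, grid_w, patch, patch, C)
    tokens = tiles.reshape(grid_h * grid_w, patch * patch * channels)
    grid = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    positions = np.stack(grid, axis=-1).reshape(-1, 2)
    return tokens, positions

def add_position_info(tokens, positions):
    """Append each tile's (row, col) index to its token (hypothetical encoding)."""
    return np.concatenate([tokens, positions.astype(tokens.dtype)], axis=1)
```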

Preferably, an attention mechanism is added to the classification network to assign a weight to each pixel input to the classification network, and the weight of pixels located inside the image area related to the current service type is greater than the weight of pixels located outside that image area.

Preferably, when training the state recognition model, training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when training the state recognition model, the first-level network and the second-level network are trained jointly; or,

when training the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained from the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

wherein, during the joint training, the loss function is a weighted sum of a first loss function of the first-level network and a second loss function of the second-level network.

Preferably, during the joint training, the first loss function corresponding to each input service type is calculated based on the loss weights of all input labels, where an input service type is a service type to which an input label belongs;

when calculating the first loss function corresponding to any input service type, the loss weight of input labels belonging to that input service type is greater than the loss weight of input labels not belonging to it.
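One possible way to write the label-weighted first loss function is given below. The application states only the ordering of the loss weights; the summation form and the symbols (K input labels, per-label segmentation loss \ell_k, label set T_m of input service type m, weights w_in and w_out) are assumptions introduced for illustration.

```latex
L_1^{(m)} = \sum_{k=1}^{K} w_k^{(m)} \, \ell_k ,
\qquad
w_k^{(m)} =
\begin{cases}
w_{\mathrm{in}},  & k \in T_m \\
w_{\mathrm{out}}, & k \notin T_m
\end{cases}
\qquad
w_{\mathrm{in}} > w_{\mathrm{out}} \ge 0
```

Under this reading, segmentation errors on labels belonging to input service type m contribute more to L_1^{(m)} than errors on the other input labels.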

Preferably, the first loss function is a Dice loss function, and the second loss function is a focal loss function.
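As a non-limiting illustration, the two loss functions and their weighted sum can be sketched as follows using their commonly used definitions. The soft Dice form, the single-sample multi-class focal form without a class-balancing factor, the default `gamma`, and the weights `w1` and `w2` are assumptions of this sketch; the application does not give the formulas or the weight values.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted probability mask and a binary mask."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

def focal_loss(probs, label, gamma=2.0, eps=1e-12):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = float(np.asarray(probs, dtype=float)[label])
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

def joint_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    """Weighted sum of the first-level and second-level losses."""
    return w1 * first_loss + w2 * second_loss
```

Dice loss is a common choice for segmentation because it is insensitive to the size of the background, and focal loss down-weights easily classified samples, which may be why they are paired with the segmentation and classification levels respectively.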

Preferably, each state category under each current service type includes P standard images, and the P standard images include original images with different scene brightness, different shooting angles and/or different postures under that state category.

Preferably, determining the quality scoring result of the original image to be scored under the respective test service type includes:

calculating a weighted mean of the R*P similarity results, and taking the result as the quality scoring result of the original image to be scored under the respective current service type.
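As a non-limiting illustration, the weighted mean of the R*P similarity results can be sketched as follows. The function name `quality_score` and the fallback to a plain mean when no weights are supplied are assumptions of this sketch; the application does not specify how the weights are chosen.

```python
import numpy as np

def quality_score(similarities, weights=None):
    """Weighted mean of the R*P similarity results.

    similarities: array of shape (R, P).
    weights:      array of the same shape, or None for equal weights.
    """
    sims = np.asarray(similarities, dtype=float)
    if weights is None:
        return float(sims.mean())
    return float(np.average(sims, weights=np.asarray(weights, dtype=float)))
```

For example, `quality_score([[1.0, 0.0]], [[3.0, 1.0]])` evaluates to 0.75, since the first similarity result carries three times the weight of the second.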

The present application also provides an image quality scoring method, including:

inputting an original image to be scored and an identifier of each current service type into a pre-trained quality scoring regression model, to obtain a quality scoring result of the original image to be scored under each current service type;

wherein the quality scoring regression model is a neural network model obtained by training with test sample images, an identifier of each test service type, and the quality scoring results of the test sample images under each test service type;

the process of determining the quality scoring result of a test sample image under each test service type includes:

inputting the test sample image and a label corresponding to each test service type into a pre-trained state recognition model, marking the image area related to each test service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each test service type;

for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective test service type, to obtain R*P similarity results;

for each first state feature vector, determining, based on the R*P similarity results, the quality scoring result of the test sample image under the respective test service type;

wherein the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the respective test service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the respective test service type, whose state category can be identified without ambiguity; R is the total number of state categories under the respective test service type, and P is a positive integer.
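As a non-limiting illustration, the preparation of training targets for the quality scoring regression model can be sketched as follows. The callable `extract` stands in for the pre-trained state recognition model, and `standards`, the use of cosine similarity, and the unweighted mean are hypothetical choices made only for this sketch.

```python
import numpy as np

def build_regression_targets(images, service_ids, extract, standards):
    """Return (image_index, service_id, score) triples for regression training.

    extract(image, service_id) -> feature vector of shape (D,); a stand-in
        for the state recognition model (hypothetical interface).
    standards[service_id]      -> array of shape (R*P, D) holding the
        standard-image feature vectors for that service type.
    """
    triples = []
    for index, image in enumerate(images):
        for service_id in service_ids:
            vec = np.asarray(extract(image, service_id), dtype=float)
            refs = np.asarray(standards[service_id], dtype=float)
            norms = np.linalg.norm(refs, axis=1) * np.linalg.norm(vec) + 1e-12
            sims = refs @ vec / norms  # R*P cosine similarities
            triples.append((index, service_id, float(sims.mean())))
    return triples
```

Each triple pairs a test sample image and a service type identifier with the score that the regression model is then trained to reproduce, so that at inference time the state recognition model and the standard images are no longer needed.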

the input of the first-level network is the test sample image and the labels, and its output is a semantic feature map corresponding to each label, which is used to mark the image area related to each test service type;

the input of each sub-network is the test sample image and the corrected semantic feature maps corresponding to the labels of the first test service type that corresponds to the sub-network; each sub-network concatenates the test sample image and the corrected semantic feature maps along the channel dimension, and inputs the concatenated image into a classification network for state recognition to obtain the first state feature vector corresponding to the first test service type; wherein a corrected semantic feature map is a semantic feature map upsampled to the same size as the test sample image.

Preferably, an attention mechanism is added to the classification network to assign a weight to each pixel input to the classification network, and the weight of pixels located inside the image area related to the test service type is greater than the weight of pixels located outside that image area.

During the joint training, the loss function is a weighted sum of the first loss function of the first-level network and the second loss function of the second-level network.

Preferably, determining the quality scoring result of the test sample image under the respective test service type includes:

calculating a weighted mean of the R*P similarity results, and taking the result as the quality scoring result of the test sample image under the respective test service type.

The present application provides an image quality scoring device, including a state recognition unit and a scoring unit;

the state recognition unit is configured to input an original image to be scored and a label corresponding to each current service type into a pre-trained state recognition model, mark the image area related to each current service type, and perform feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type;

the scoring unit is configured to, for each first state feature vector, calculate the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective test service type, to obtain R*P similarity results; and is further configured to, for each first state feature vector, determine, based on the R*P similarity results, the quality scoring result of the original image to be scored under the respective current service type;

Preferably, in the state recognition unit, the state recognition model includes a first-level network for semantic segmentation and a second-level network for state recognition;

Preferably, in the state recognition unit, inputting the concatenated image into the classification network for state recognition includes:

Preferably, in the state recognition unit, an attention mechanism is added to the classification network to assign a weight to each pixel input to the classification network, and the weight of pixels located inside the image area related to the current service type is greater than the weight of pixels located outside that image area.

Preferably, the device further includes a training unit configured to train and generate the state recognition model;

wherein, when the training unit trains the state recognition model, training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when the training unit trains the state recognition model, the first-level network and the second-level network are trained jointly; or,

when the training unit trains the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained from the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

Preferably, in the state recognition unit, determining the quality scoring result of the original image to be scored under the respective current service type includes:

The present application also provides an image quality scoring system, including a training device and an image quality scoring device;

the training device includes a state recognition unit, a scoring unit and a first training unit;

the state recognition unit is configured to input a test sample image and a label corresponding to each test service type into a pre-trained state recognition model, mark the image area related to each test service type, and perform feature extraction based on the image area to obtain a first state feature vector corresponding to each test service type;

the scoring unit is configured to, for each first state feature vector, calculate the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective test service type, to obtain R*P similarity results; and is further configured to, for each first state feature vector, determine, based on the R*P similarity results, the quality scoring result of the test sample image under the respective test service type;

the first training unit is configured to train and generate a quality scoring regression model using the test sample images, the identifier of each current service type, and the quality scoring results of the test sample images under each current service type;

the image quality scoring device includes an input unit, a quality scoring unit and an output unit;

the input unit is configured to receive an original image to be scored and an identifier of each current service type, and send them to the quality scoring unit;

the quality scoring unit is configured to input the received original image to be scored and the identifier of each current service type into the quality scoring regression model, to obtain the quality scoring result of the original image to be scored under each current service type;

the output unit is configured to output the quality scoring result of the original image to be scored under each current service type;

Preferably, in the state recognition unit, an attention mechanism is added to the classification network to assign a weight to each pixel input to the classification network, and the weight of pixels located inside the image area related to the test service type is greater than the weight of pixels located outside that image area.

Preferably, the training device further includes a second training unit configured to train and generate the state recognition model;

wherein, when the second training unit trains the state recognition model, training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when the second training unit trains the state recognition model, the first-level network and the second-level network are trained jointly; or,

when the second training unit trains the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained from the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

Preferably, when the second training unit performs the joint training, the first loss function corresponding to each input service type is calculated based on the loss weights of all input labels, where an input service type is a service type to which an input label belongs;

Preferably, in the state recognition unit, determining the quality scoring result of the test sample image under the respective test service type includes:

An image quality scoring device, including an input unit, a quality scoring unit and an output unit;

the quality scoring unit is configured to input the received original image to be scored and the identifier of each current service type into a quality scoring regression model pre-trained by a training device, to obtain the quality scoring result of the original image to be scored under each current service type;

the first training unit is configured to train and generate the quality scoring regression model using the test sample images, the identifier of each current service type, and the quality scoring results of the test sample images under each current service type;

The present application also provides a training device, including a state recognition unit, a scoring unit and a first training unit;

wherein the quality scoring model is used to process an input original image to be scored and the identifier of each current service type, to obtain the quality scoring result of the original image to be scored under each current service type; the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the respective test service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the respective test service type, whose state category can be identified without ambiguity; R is the total number of state categories under the respective test service type, and P is a positive integer.

The present application provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement any of the image quality scoring methods described above.

The present application provides an electronic device that includes at least a computer-readable storage medium and further includes a processor;

the processor is configured to read the executable instructions from the computer-readable storage medium and execute the instructions to implement any of the image quality scoring methods described above.

As can be seen from the above technical solutions, a pre-trained state recognition model can be used to process the input original image and the labels corresponding to a service type, mark the image area related to each service type, and extract features based on that image area to obtain a first state feature vector; the first state feature vector obtained in this way mainly reflects the feature information of the area related to the service type. Next, the R*P standard images corresponding to the service type are input into the state recognition model to obtain the state feature vectors of the R*P standard images, and the quality scoring result of the original image under a given service type is determined by calculating the similarity between the first state feature vector and each of the R*P standard-image state feature vectors. When scoring image quality, the quality scoring result of an image under a service type can be obtained directly through the above image quality scoring process; alternatively, a score regression network can be used to fit the above image quality scoring process, and when actually scoring an image, the image to be scored and the service type identifier are input into the trained score regression network to obtain the quality scoring result of the image under the respective service type.

Through the above processing, the service type currently being recognized is used as supervision information, so that the first state feature vector used for similarity comparison mainly reflects the feature information of the area related to the service type. The resulting image quality judgment therefore focuses on the service-related area rather than judging quality over the whole image, so that image quality can be judged in a targeted and reasonable manner for different service types, making the quality judgment result more accurate.

Description of the drawings

FIG. 1 is a schematic diagram of the basic flow of the first image quality scoring method in the present application;

FIG. 2 is a schematic diagram of the specific flow of the image quality scoring method in Embodiment 1 of the present application;

FIG. 3 is a schematic diagram of the overall framework of the violation-service judgment model in Embodiment 1;

FIG. 4 is a schematic structural diagram of each sub-network in the second-level network of Embodiment 1;

FIG. 5 is a schematic diagram of the specific flow of the image quality scoring method in Embodiment 2;

FIG. 6 is a schematic diagram of the basic structure of the first image quality scoring device in the present application;

FIG. 7 is a schematic diagram of the basic structure of the image quality scoring system corresponding to the second image quality scoring method in the present application;

FIG. 8 is a schematic diagram of the basic structure of the second image quality scoring device in the present application;

FIG. 9 is a schematic diagram of the basic structure of the training device in the image quality scoring system of the present application;

FIG. 10 is a schematic diagram of the basic structure of the electronic device provided by the present application.

Detailed description

To make the purpose, technical means and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings.

As described in the Background section, before state recognition is performed, image quality usually needs to be scored so that poor-quality images can be filtered out and the precision of state recognition improved. However, in application scenarios involving multiple different recognition services, some existing image quality scoring approaches require manual calibration of scores, and others cannot score image quality specifically for different recognition services, so the accuracy of the image quality scoring results is not high.

For example, in the application of judging illegal behavior in a vehicle cab, illegal behavior covers a wide range, and there are many different illegal-behavior judgment services (such as judging whether a seat belt is fastened, whether a phone call is being made, whether someone is smoking, whether an infant is being held, and so on). At the same time, many areas need attention in this application (such as the driver, the front passenger, rear-seat passengers, and so on). Different judgment services focus on different areas; if image quality is scored without distinction, the accuracy of the scoring results suffers, while if a different image quality scoring method is maintained for each illegal-behavior judgment service, there are many versions and maintenance is difficult.

Based on the above analysis, the present application provides an image quality scoring method suitable for multiple different recognition services. Its basic idea is as follows: using labels set for the different recognition services (hereinafter simply "services"), a trained state recognition model performs semantic segmentation on the input original image, marks the image area related to the current service type, and extracts image features based on that image area for use in image quality scoring. In this way, the service drives the state recognition model to attend to the area related to the current recognition service type, and image quality is judged from that area rather than from the whole image. Image quality can therefore be judged reasonably for each recognition service type, and the scoring result is more accurate.

Based on the basic idea described above, the present application provides two image quality scoring methods. The basic flow of the first method is shown in FIG. 1 and includes:

Step 101: input the original image to be scored and the labels corresponding to each current service type into a pre-trained state recognition model, mark the image area related to each current service type, and perform feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type.

The state recognition model in the present application is a pre-trained model that can recognize the current state from an original image. For example, in the application of judging violations in a cab, the state recognition model may be a cab behavior judgment model that determines from the input original image whether the current state is, for example, not wearing a seat belt, making a phone call, or holding a baby.

The state recognition model in the present application is trained and operated with the service type as supervision information. Labels therefore need to be set in advance for each service type to be recognized, and one service type may correspond to one or more labels. Different service types may have the same or different numbers of labels. Preferably, a label corresponds to an area related to the service type. For example, in the application of judging violations in a cab, the violation service types (that is, the service types to be recognized) may include a seat belt service, a phone call service, a baby-holding service, and so on. For the seat belt service, two labels can be set, torso and seat belt, corresponding to the torso-related area and the seat-belt-related area respectively.
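The service-to-label correspondence described above can be pictured as a simple lookup table. The following is a minimal sketch, not the application's implementation: the dictionary name, the string identifiers, and the helper `labels_for` are all hypothetical, and the phone call labels are the ones named later in this description (head, arm, palm).

```python
# Hypothetical label configuration; all identifiers are illustrative only.
SERVICE_LABELS = {
    "seat_belt": ["torso", "seat_belt"],    # two labels, as in the example above
    "phone_call": ["head", "arm", "palm"],  # labels named later in this description
}


def labels_for(current_services):
    """Collect the labels of all current service types, without duplicates,
    preserving the order in which they are first encountered."""
    collected = []
    for service in current_services:
        for label in SERVICE_LABELS[service]:
            if label not in collected:
                collected.append(label)
    return collected


if __name__ == "__main__":
    print(labels_for(["seat_belt", "phone_call"]))
```

A table like this keeps one model and one scoring procedure for all services; only the labels passed in change from one service type to another.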

In this step, the original image to be scored and the labels corresponding to each current service type are used as the input of the state recognition model. The pre-trained state recognition model performs semantic segmentation on the original image to obtain the image areas corresponding to the input labels, that is, the image area related to each current service type. These may be, for example, mask maps corresponding to the input labels.

After the state recognition model has determined the image area related to each current service type, it performs image classification feature extraction for each current service type based on the image of the related area, obtaining a state feature vector that corresponds to that particular current service type. The resulting state feature vector is called the first state feature vector. Because the first state feature vector is obtained from the image area related to a given current service type, it mainly reflects the feature information of the area related to that service type.

Here, a current service type is a service type for which state recognition is to be performed in the current round of processing. For example, if the state recognition model receives an original image and the labels corresponding to the seat belt service, its processing yields a first state feature vector corresponding to the seat belt service, which can be used to determine which state of the seat belt service currently applies. There may be one or more current service types. For example, an original image and the labels corresponding to both the seat belt service and the phone call service can be input in a single round of processing. The state recognition model then yields one first state feature vector for the seat belt service and one for the phone call service, which can be used to determine the current state under the seat belt service and under the phone call service respectively.

Step 102: for each first state feature vector, calculate the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective current service type, obtaining R*P similarity results.

As described above, the first state feature vector is the feature vector obtained after the state recognition model processes the input original image.

In the present application, in order to score image quality for different service types, R*P standard images are set for each service type. The R*P standard images are input into the state recognition model to obtain the state feature vector of each of the R*P standard images. The R*P standard images are typical images, set for the various states of a given service type, whose state category can be identified unambiguously. R is the total number of state categories under the service type, and P is a positive integer.

Preferably, the P standard images set for each state category of a service type include original images with different scene brightness, different shooting angles, and/or different poses under the corresponding state category.

The state category of each of the R*P standard images can be determined unambiguously, so the feature information that the state recognition model extracts from the standard images represents the features of images whose state category can be determined unambiguously. For the first state feature information corresponding to each current service type B, the similarity between that first state feature information and the state feature information of each standard image corresponding to service type B is calculated, giving R*P similarity results. These similarity results reflect how similar the first state feature information is to the features of images whose state category can be determined unambiguously.
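As an illustration of step 102, the sketch below computes an R x P table of similarity results for one first state feature vector. This passage does not name a similarity measure, so cosine similarity is an assumption made here for illustration; the function names and the nested-list layout (R rows of P vectors) are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for degenerate all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_results(first_vector, standard_vectors):
    """standard_vectors holds R rows (state categories) of P feature vectors.
    Returns an R x P table with one similarity per standard image."""
    return [
        [cosine_similarity(first_vector, vector) for vector in row]
        for row in standard_vectors
    ]


if __name__ == "__main__":
    standards = [
        [[1.0, 0.0], [0.0, 1.0]],   # state category 0, P = 2 standard images
        [[2.0, 0.0], [-1.0, 0.0]],  # state category 1, P = 2 standard images
    ]
    print(similarity_results([1.0, 0.0], standards))
```

Because the standard-image vectors do not depend on the image being scored, they can be computed once per service type and reused for every image.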

Step 103: for each first state feature vector, determine, based on the R*P similarity results, the quality scoring result of the original image to be scored under the respective current service type.

Because the R*P similarity results determined in step 102 reflect how similar the first state feature information is to the features of images whose state category can be determined unambiguously, these similarity results can be used to score the quality of the original image. The score reflects the quality of the original image relative to the standard images, so a reasonable quality score can be given for the image under each current service type.
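This passage does not state how the R*P similarity results are combined into a score. Purely as an assumed illustration, the sketch below takes the best similarity within each state category and then the best across categories, on the reasoning that an image closely resembling any unambiguous standard image is itself easy to classify. The function name and the aggregation rule are hypothetical and are not taken from the application.

```python
def quality_score(similarities):
    """similarities: R rows (state categories) of P similarity values each.
    Hypothetical aggregation: best match within each category, then the
    best category overall."""
    per_state_best = [max(row) for row in similarities]
    return max(per_state_best)


if __name__ == "__main__":
    print(quality_score([[0.2, 0.9], [0.4, 0.1]]))
```

Other aggregation rules (for example, averaging within a category or comparing the top two categories) fit the same interface.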

This completes the method flow shown in FIG. 1.

In addition, to further reduce the processing latency and resource usage of image quality scoring, the first method above can be treated as one complete image quality scoring process, and a quality scoring regression model can be trained to fit it. That is, the image quality scoring process of the first method is executed on a series of test sample images and test service types to obtain the quality scoring result of each test sample image under each test service type; the test sample images and the corresponding quality scoring results are then used as training sample pairs to train a quality scoring regression model that fits the image quality scoring process of the first method. The second image quality scoring method provided in the present application scores image quality using the trained quality scoring regression model. Specifically, the second image quality scoring method includes:

inputting the original image to be scored and the identifier of each current service type into a pre-trained quality scoring regression model to obtain the quality scoring result of the original image to be scored under each current service type;

wherein the quality scoring regression model is a neural network model trained using test sample images, the identifier of each test service type, and the quality scoring result of each test sample image under each test service type;

the process of determining the quality scoring result of a test sample image under each test service type includes:

inputting the test sample image and the labels corresponding to each test service type into the pre-trained state recognition model, marking the image area related to each test service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each test service type;

for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the respective test service type, obtaining R*P similarity results;

for each first state feature vector, determining, based on the R*P similarity results, the quality scoring result of the test sample image under the respective test service type.
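The way training data for the quality scoring regression model is produced can be sketched as follows. This is a minimal illustration under stated assumptions: `score_with_method_one` stands in for the whole first method (state recognition model, similarity calculation, and scoring) and is passed in as a callable; its name, the tuple layout, and the stub used in the demo are hypothetical.

```python
def build_training_samples(test_images, test_service_ids, score_with_method_one):
    """Run the first scoring method on every (image, service type) pair and
    return (image, service_id, score) triples for regression training."""
    samples = []
    for image in test_images:
        for service_id in test_service_ids:
            score = score_with_method_one(image, service_id)
            samples.append((image, service_id, score))
    return samples


if __name__ == "__main__":
    # Stand-in scorer for demonstration only; a real one would run steps 101-103.
    def fake_scorer(image, service_id):
        return 0.5

    print(build_training_samples(["img_a", "img_b"], [0, 1], fake_scorer))
```

Once trained on such triples, the regression model replaces the segmentation, feature extraction, and R*P comparisons with a single forward pass at scoring time, which is where the latency and resource savings come from.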

Both image quality scoring methods above introduce the state feature vectors of standard images, which are compared with the first state feature vector of the input image to obtain similarities that are then used to determine the image quality score. The state feature vector of a standard image is the state feature vector obtained when the standard image and the labels of the corresponding test service type are input into the state recognition model for feature extraction. R*P standard images are provided for each service type. These standard images are typical images, set for the various states of one service type, whose state category can be identified unambiguously. R is the total number of state categories under the service type, and P is a positive integer, namely the number of typical images per state. For example, in the scenario of judging violations in a cab, where the service type is judging whether a phone call is being made, R may be the total number of state categories under that service; specific states may include making a call with the phone held in the hand, making a call with the phone clamped near the ear by the arm, not making a call, and so on. Under each state category, P typical images are provided, for example typical images that cover different scene brightness, a variety of typical poses of the cab occupants, and different shooting angles.

In both image quality scoring methods above, the first state feature vector used for the similarity comparison mainly reflects the feature information of the area related to the service type. The image quality judgment therefore focuses on the service-related area rather than on the whole image, so that a reasonable, targeted image quality judgment can be made for each service type and the judgment result is more accurate.

The two image quality scoring methods of the present application are described in detail below through two specific embodiments.

Embodiment 1:

This embodiment describes the first image quality scoring method in detail.

FIG. 2 is a schematic flowchart of a specific embodiment of the first image quality scoring method in the present application. For ease of description, this embodiment uses the application scenario of judging violations in a cab as an example to explain the image quality scoring method, and describes the training process of the state recognition model along with it. As shown in FIG. 2, the method includes:

Step 201: train and generate a violation service judgment model using training sample images.

The violation service judgment model in this embodiment is the state recognition model described above.

The violation service judgment model in this embodiment is trained and operated with the violation service type as supervision information. Labels therefore need to be set in advance for each violation service type, and one violation service type may correspond to one or more labels. Different violation service types may have the same or different numbers of labels. Preferably, a label corresponds to an area related to the violation service type. For example, for the seat belt service, two labels can be set, torso and seat belt, corresponding to the torso-related area and the seat-belt-related area respectively.

The input of the violation service judgment model includes the original image to be scored and the labels corresponding to one or more violation service types (hereinafter "input violation types"). The violation service judgment model marks, in the original image, the image area related to each input violation type, then performs classification feature extraction based on the marked image area and outputs the state feature vector corresponding to each input violation type.

The violation service judgment model described above can be implemented with a variety of network structures. This embodiment gives one specific implementation structure.

The violation service judgment model in this embodiment is a neural network model comprising two network levels: the first-level network is a semantic segmentation network, and the second-level network is a semantic fusion and service recognition network. The overall framework of the violation service judgment model in this embodiment may be as shown in FIG. 3.

The first-level network is introduced first:

1. In the first-level semantic segmentation network, the choice of semantics is determined by the violation service type. In practice, the semantics can correspond directly to the input labels, so semantic segmentation is performed according to the input labels;

2. The first-level network can be any of various neural network models for processing images, such as the convolutional neural network (CNN) shown in FIG. 3;

3. When semantic segmentation is performed on the input original image, the areas corresponding to the individual input labels are first marked distinctly in the input original image, giving one Mask map;

Because the input labels may correspond to one or more input violation service types, the label areas of all input violation service types are marked distinctly in this Mask map; that is, the area related to each input violation service type is marked, and the Mask map may include image areas related to one or more violation service types. For example, suppose the input labels are those of the seat belt service and of the phone call service (the latter comprising head, arm, and palm), that is, torso, seat belt, head, arm, and palm. The semantic segmentation network first marks the areas corresponding to the torso, seat belt, head, arm, and palm respectively, giving the Mask map;

4. Based on the Mask map obtained in item 3, for each input label A, only the marking of the area corresponding to that input label is kept, producing the Mask map corresponding to label A. In this way, one Mask map is obtained for each input label. The Mask maps of all input labels are the output of the semantic segmentation network, and the Mask map corresponding to an input label is also called a semantic feature map.

As can be seen from the above, the semantic segmentation network produces a semantic feature map for each input label, and the semantic feature maps of the labels belonging to one violation service type together mark the image area related to that violation service type.
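Item 4 above, deriving one Mask map per input label from the combined Mask map, can be sketched as follows. The representation is an assumption made for illustration: the combined Mask map is a 2D list of integers in which 0 means background and every other value is a hypothetical label id; the function name is also hypothetical.

```python
def split_mask_by_label(combined_mask, label_ids):
    """combined_mask: 2D list of ints (0 = background, other values = label ids).
    Returns {label_id: binary mask keeping only that label's marked area}."""
    return {
        label_id: [
            [1 if pixel == label_id else 0 for pixel in row]
            for row in combined_mask
        ]
        for label_id in label_ids
    }


if __name__ == "__main__":
    # Hypothetical ids: 1 = torso, 2 = seat belt.
    combined = [
        [0, 1, 1],
        [2, 2, 1],
    ]
    for label_id, mask in split_mask_by_label(combined, [1, 2]).items():
        print(label_id, mask)
```

Each resulting binary mask plays the role of the semantic feature map of one input label.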

The second-level network is introduced next:

1. As shown in FIG. 3, the second-level network includes M sub-networks in one-to-one correspondence with M service types, and all sub-networks have the same structure; M is the total number of service types to be recognized. The first-level network outputs the semantic feature map corresponding to each input label. On entering the second-level network, the semantic feature maps of the labels belonging to the same input service type, together with the original image (that is, the original image input to the first-level network), are input into the sub-network corresponding to that input service type. For example, suppose the second-level network includes three sub-networks, corresponding to the seat belt service, the phone call service, and the baby-holding service, and the semantic feature maps output by the first-level network correspond to the labels of the seat belt service and of the phone call service. On entering the second-level network, all semantic feature maps corresponding to the seat belt service labels and the original image are input into the sub-network for the seat belt service, and all semantic feature maps corresponding to the phone call service labels and the original image are input into the sub-network for the phone call service. The description below takes one sub-network and its corresponding service type X as an example;

2. The processing of a sub-network has two parts, fusion processing and a classification network, as shown in FIG. 4;

3. Fusion processing:

All semantic feature maps input to the sub-network are corrected, that is, upsampled to the same size as the original image. A concat operation is then performed on all corrected semantic feature maps and the original image, that is, they are concatenated along the channel dimension, and the result is output to the classification network as the fusion processing result;

4. Classification network:

In the simplest case, the concatenated image produced by the fusion processing is input into the classification network for image classification feature extraction and classification result determination, giving the state feature vector and the state classification result corresponding to service type X. For example, the sub-network corresponding to the seat belt service finally produces the state feature vector corresponding to the seat belt service and the state classification result of the seat belt service (for example, the state of not wearing a seat belt). The classification network can use any of various existing neural network models for classification, such as a CNN.
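The fusion processing of item 3 (upsample each semantic feature map to the original image size, then concatenate along the channel dimension) can be sketched in plain Python as follows. The data layout (a list of channel planes, each an H x W nested list), the nearest-neighbour interpolation, the channel order, and the function names are all assumptions made for illustration; the text above specifies neither the interpolation method nor the order of concatenation.

```python
def upsample_nearest(feature_map, out_h, out_w):
    """Resize a 2D map to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    return [
        [feature_map[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def fuse(original_channels, semantic_maps):
    """original_channels: C planes of H x W. semantic_maps: 2D maps of any size.
    Returns the channel-wise concatenation: C + len(semantic_maps) planes."""
    out_h, out_w = len(original_channels[0]), len(original_channels[0][0])
    resized = [upsample_nearest(m, out_h, out_w) for m in semantic_maps]
    return list(original_channels) + resized


if __name__ == "__main__":
    rgb = [[[0] * 4 for _ in range(4)] for _ in range(3)]  # 3 x 4 x 4 image
    torso_map = [[1, 0], [0, 1]]                           # 2 x 2 semantic map
    fused = fuse(rgb, [torso_map])
    print(len(fused), fused[3])
```

With a 3-channel original image and two semantic feature maps (for example torso and seat belt), the fused input to the classification network would have 5 channels under this layout.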

In addition, to further speed up convergence of the classification task and improve the robustness of the classification network, the concatenated image output by the fusion processing can optionally be divided into a number of two-dimensional slices (patches). Suppose the size of the concatenated image is H*W*C, where H and W are the height and width of the image and C is the number of channels after concatenation, and the size of each two-dimensional slice is P*P (here P denotes the side length of a slice, not the number of standard images per state category). After the concatenated image is divided into two-dimensional slices, the number of slices is H*W*C/P². The two-dimensional position information corresponding to each slice is added to that slice, and all slices with their two-dimensional position information added are then input into the classification network for state recognition;

Through training of the state recognition model, it is ensured that the two-dimensional position information weight of a two-dimensional slice located inside the image area related to service type X is greater than that of a two-dimensional slice located outside that image area. Here, adding the two-dimensional position information to a two-dimensional slice may be done by a dot-multiplication of the slice with its two-dimensional position information weight. For example, when service type X is the seat belt service, training ensures that the two-dimensional position information weight of a slice located inside the seat-belt-related area (for example, 2) is greater than that of a slice located outside the seat-belt-related area (for example, 1).
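The slicing and weighting just described can be sketched as follows. Several things here are assumptions for illustration: H and W are taken to be exact multiples of the slice side length, each channel plane is sliced separately (which gives the H*W*C/P² count stated above), and the weights are fixed at the example values 2 and 1, whereas in the application they result from training. All function names are hypothetical.

```python
def split_into_slices(channels, p):
    """channels: C planes of H x W (H, W assumed divisible by p).
    Returns (channel_index, top, left, slice) tuples, one per p x p slice."""
    slices = []
    for c, plane in enumerate(channels):
        for top in range(0, len(plane), p):
            for left in range(0, len(plane[0]), p):
                patch = [row[left:left + p] for row in plane[top:top + p]]
                slices.append((c, top, left, patch))
    return slices


def position_weight(top, left, p, region_mask, inside=2.0, outside=1.0):
    """Illustrative stand-in for a learned weight: larger when the slice
    overlaps the service-related area marked in region_mask."""
    overlaps = any(
        region_mask[y][x]
        for y in range(top, top + p)
        for x in range(left, left + p)
    )
    return inside if overlaps else outside


def apply_weight(patch, weight):
    """Multiply every element of the slice by its position weight."""
    return [[value * weight for value in row] for row in patch]


if __name__ == "__main__":
    planes = [[[1] * 4 for _ in range(4)] for _ in range(2)]  # H=W=4, C=2
    region = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for c, top, left, patch in split_into_slices(planes, 2):
        w = position_weight(top, left, 2, region)
        print(c, top, left, apply_weight(patch, w))
```

For H = W = 4, C = 2, and a slice side of 2, the sketch yields 4*4*2/2² = 8 slices, matching the count formula.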

To further improve the classification accuracy of the classification network, an attention mechanism can optionally be introduced into it. The attention mechanism assigns a weight to each pixel input to the classification network, and training ensures that the weights of pixels inside the image area related to service type X are greater than the weights of pixels outside that area. In this way, more critical and important feature information can be extracted, so that the classification network makes a more accurate classification decision without adding computation or storage overhead to the state recognition model.

The above is the exemplary architecture and specific processing of the violation service judgment model given in this embodiment.

When the violation service judgment model (that is, the state recognition model) is trained, training sample images and the labels corresponding to one or more training service types are input. After processing, the result is compared with the known state category of the training sample image to obtain a loss function, and the parameters of the violation service judgment model are adjusted according to the loss function.

For the violation service judgment model composed of two network levels described above, the two levels can be trained separately, trained jointly, or trained in stages: in the initial stage the two levels are trained separately, and after the initial stage the two levels are trained jointly.

Specifically, when the two network levels are trained separately, one part of the training sample images can be used to train the first-level network. The parameters of the first-level network are adjusted according to the loss function of the first-level network (hereinafter the "first loss function"), and the first-level network with updated parameters then processes training sample images again, until the training end condition of the first-level network is reached. After the first-level network has been trained, another part of the training sample images is used to train the second-level network. Training the second-level network also requires the first-level network; the first-level network used here is already trained and its parameter values are fixed. After processing by the second-level network, the parameters of the second-level network are adjusted according to the loss function of the second-level network (hereinafter the "second loss function"), and the trained first-level network and the second-level network with updated parameters then process training sample images again, until the training end condition of the second-level network is reached.

When the first-level network is trained on its own, the input labels can be the labels corresponding to one or more service types. Taking as an example the case where the input labels include all N labels of all service types (N being the total number of labels across all service types), the first loss function can be the result of comparing the Mask map obtained by semantic segmentation, in which the areas of all N labels are marked, with the preset standard Mask map of the training sample image, in which the N label areas are accurately marked. In general, areas with the same label can be compared one by one, and the final Mask map comparison result, obtained from the comparison results of the individual areas, is used as the first loss function. The comparison results of different label areas usually carry the same weight in the calculation of the final result.
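The equal-weight combination of per-label comparisons can be sketched as follows. The text above does not say how two areas are compared, so the per-area measure used here, one minus intersection-over-union of binary masks, is an assumption chosen for illustration, as are the function names and the dictionary layout keyed by label.

```python
def region_loss(pred, target):
    """Assumed per-area comparison: 1 - IoU of two binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return 0.0 if union == 0 else 1.0 - inter / union


def first_loss(pred_masks, target_masks):
    """Average the per-label comparison results with equal weights.
    Both arguments map a label name to its binary mask."""
    labels = list(target_masks)
    return sum(region_loss(pred_masks[l], target_masks[l]) for l in labels) / len(labels)


if __name__ == "__main__":
    target = {"torso": [[1, 0], [0, 0]], "seat_belt": [[0, 1], [0, 0]]}
    pred = {"torso": [[1, 0], [0, 0]], "seat_belt": [[0, 0], [1, 0]]}
    print(first_loss(pred, target))
```

A differentiable loss such as per-pixel cross-entropy would be needed for actual gradient-based training; the sketch only shows the equal-weight aggregation across label areas.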

When the second-level network is trained on its own, because it includes M sub-networks, the second loss function of each sub-network is determined from that sub-network's processing result during training and is used to adjust the parameters of that sub-network. Specifically, the training sample image is first input into the trained first-level network; after processing by the first-level network, the semantic feature map corresponding to each input label and the training sample image are input into the second-level network. The second-level network inputs the semantic feature maps of the input labels belonging to the same service type X, together with the training sample image, into the sub-network Y corresponding to service type X. After processing, sub-network Y determines the classification of the training sample image under service type X, which is compared with the standard classification of the training sample image under service type X to determine the second loss function. The parameters of sub-network Y are adjusted according to the second loss function.

When the two levels are trained jointly, the training sample image is fed into the first-level network, whose processing yields the first loss function. The output of the first-level network is then passed to the second-level network in the form that network requires, and its processing yields the second loss function. The joint loss function is obtained as the weighted sum of the first and second loss functions, and the parameters of both networks are adjusted according to it. The updated networks then process the input training sample images again, and this repeats until the training end condition of the illegal business judgment model is reached.

In joint training, the input labels may be the labels of one or more business types. Because the joint loss function is a weighted sum of the first and second loss functions, and the second loss function corresponds to a single business type, the first loss function must also be computed per business type. Specifically, the training sample image and the labels of one or more business types are fed into the first-level network, which produces a Mask map marking the regions of all input labels. This Mask map is compared with the standard Mask map in which the regions of the input labels in the training sample image are marked, and a first loss function is computed for each business type. The following describes how the first loss function for a single business type is determined.

Suppose the input labels correspond to two business types, the seat-belt business and the phone-call business. A Mask map marking the label regions of both businesses is obtained as described in the preceding paragraph and compared with the standard Mask map: each region of the Mask map is compared with the region of the standard Mask map that has the same input label, a weight is set for each input label, and the per-region comparison results and label weights determine the first loss function S1 for the seat-belt business and the first loss function S2 for the phone-call business. When S1 is computed, the labels of the seat-belt business (torso and seat belt) are weighted more heavily than the labels of the phone-call business (head, arm and palm); when S2 is computed, the labels of the phone-call business are weighted more heavily than those of the seat-belt business. In other words, during joint training the first loss function of each input business type is computed from the loss weights of all input labels, and the labels belonging to that business type carry a larger loss weight than the labels that do not. This ensures that the label regions matching the target business have a greater influence on the first loss function, so the model pays more attention to the regions relevant to the target business.
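The per-business-type weighting and the joint loss can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the label names, the weight values `in_weight` and `out_weight`, the normalisation by the weight sum, and the joint-loss coefficients `w1` and `w2` are all assumptions, since the text only requires that labels of the target business type outweigh the others and that the joint loss be a weighted sum.

```python
from typing import Dict

# Hypothetical grouping of labels by business type, following the example in the text.
LABELS_BY_TYPE = {
    "seat_belt": {"torso", "seat_belt"},
    "phone_call": {"head", "arm", "palm"},
}


def first_loss_for_type(label_losses: Dict[str, float], target_type: str,
                        in_weight: float = 1.0, out_weight: float = 0.25) -> float:
    """Weighted mean of per-label region losses for one business type.

    Labels that belong to target_type receive in_weight; every other input
    label receives the smaller out_weight. Both values are assumed.
    """
    if in_weight <= out_weight:
        raise ValueError("in_weight must be larger than out_weight")
    own_labels = LABELS_BY_TYPE[target_type]
    weighted_sum = 0.0
    weight_total = 0.0
    for label, loss in label_losses.items():
        weight = in_weight if label in own_labels else out_weight
        weighted_sum += weight * loss
        weight_total += weight
    return weighted_sum / weight_total


def joint_loss(first_loss: float, second_loss: float,
               w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted sum of the first (segmentation) and second (classification) loss."""
    return w1 * first_loss + w2 * second_loss


# Per-label comparison results between the predicted and standard Mask maps (made-up values).
label_losses = {"torso": 0.10, "seat_belt": 0.40, "head": 0.20, "arm": 0.30, "palm": 0.50}
s1 = first_loss_for_type(label_losses, "seat_belt")   # first loss for the seat-belt business
s2 = first_loss_for_type(label_losses, "phone_call")  # first loss for the phone-call business
joint_seat_belt = joint_loss(s1, second_loss=0.70)    # 0.70 is a made-up classification loss
```

With these weights, a change in the seat-belt region loss moves S1 far more than S2, which is the behaviour the paragraph above describes.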

When the two levels are trained in stages, the first-level network is initially trained on the training sample images, and the second-level network is then initially trained on top of the resulting first-level network. After both networks have completed initial training, they are trained jointly.

The first loss function may be any loss function that measures distance loss, such as the Dice loss. The second loss function may be any loss function that measures classification loss, such as the Focal loss or the cross-entropy loss.
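For reference, the two example losses named above can be written in a few lines. The sketch below uses the common textbook forms of the Dice loss and the cross-entropy loss on plain Python lists; the smoothing term `eps` and the function names are assumptions, and the application does not prescribe these exact formulas.

```python
import math
from typing import Sequence


def dice_loss(pred: Sequence[Sequence[float]],
              target: Sequence[Sequence[int]],
              eps: float = 1e-6) -> float:
    """Dice loss between a predicted mask (values in [0, 1]) and a binary standard mask."""
    intersection = sum(p * t for p_row, t_row in zip(pred, target)
                       for p, t in zip(p_row, t_row))
    total = sum(p for row in pred for p in row) + sum(t for row in target for t in row)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def cross_entropy(probs: Sequence[float], true_class: int, eps: float = 1e-12) -> float:
    """Cross-entropy loss for one sample given predicted class probabilities."""
    return -math.log(max(probs[true_class], eps))


standard_mask = [[1, 0], [0, 1]]
perfect = dice_loss([[1.0, 0.0], [0.0, 1.0]], standard_mask)   # prediction matches exactly
disjoint = dice_loss([[0.0, 1.0], [1.0, 0.0]], standard_mask)  # prediction misses entirely
```

A perfect prediction gives a Dice loss close to 0 and a fully disjoint one gives a loss close to 1, so the value can be used directly as the per-region comparison result.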

The illegal business judgment model is trained as described above. Steps 202 and 203 below use the trained model to process the original image to be scored.

Step 202: input the original image to be scored and the labels corresponding to each current business type into the pre-trained illegal business judgment model, and mark the image regions related to each current business type.

As the description of the illegal business judgment model in step 201 shows, the trained model can determine the state of a single original image under one or more business types.

When an original image actually needs to be scored for image quality under one or more business types, the original image and the labels of each target business type to be scored (hereinafter the current business type) are input into the illegal business judgment model, which marks the image regions related to each current business type.

When the illegal business judgment model is implemented with the two-level network described above, this step is the processing of the first-level network: the original image is semantically segmented to obtain a Mask map for each input label.

Step 203: in the illegal business judgment model, perform feature extraction based on the image regions related to each current business type to obtain the first state feature vector corresponding to each current business type.

For each current business type, once step 202 has marked the related image regions, image-classification features are extracted from the original image based on those marked regions, yielding the first state feature vector for that business type.

When the illegal business judgment model is implemented with the two-level network described above, this step is the processing of the second-level network: for the i-th current business type, the Mask maps of that business type's labels are fused with the original image, and the classification network extracts features from the fused image. The resulting classification feature vector is the first state feature vector of the i-th current business type, where i is the index of the current business type.

Next, steps 204 and 205 are performed for each first state feature vector to obtain the quality scoring result of the original image under the business type corresponding to that vector.

Step 204: for each first state feature vector, compute its similarity to the state feature vector of each of the R*P standard images of the corresponding current business type, obtaining R*P similarity results.

The processing is the same for every first state feature vector. For ease of description, the first state feature vector of one current business type X is used as the example.

As stated above, R*P standard images are set for each business type so that image quality can be scored per business type. This step determines the R*P standard images of the current business type X and their state feature vectors. Each standard image and the labels of business type X are input into the illegal business judgment model, and the classification feature vector it outputs is the state feature vector of that standard image. These vectors may be computed in this step or in advance; in the latter case this step simply retrieves the precomputed state feature vectors of the standard images.

The first state feature vector is compared for similarity with the state feature vector of each of the R*P standard images of business type X, yielding R*P similarity results for business type X.
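The comparison in step 204 can be illustrated with a short sketch. The text above does not fix a similarity measure here, so cosine similarity is used purely as an assumption; the function names and the R x P nested-list layout are likewise hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_to_standards(first_vec: Sequence[float],
                              standard_vecs: Sequence[Sequence[Sequence[float]]]
                              ) -> List[List[float]]:
    """Compare one first state feature vector with R x P standard-image vectors.

    standard_vecs[r][p] is the state feature vector of the p-th standard image
    of state category r, so the result is an R x P grid of similarities.
    """
    return [[cosine_similarity(first_vec, vec) for vec in row] for row in standard_vecs]


# Toy example with R = 2 state categories and P = 2 standard images per category.
first_vec = [1.0, 0.0]
standards = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [1.0, 1.0]],
]
grid = similarities_to_standards(first_vec, standards)
```

Because the standard-image vectors do not depend on the image being scored, they can be computed once and cached, as the preceding paragraph allows.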

Step 205: for each first state feature vector, determine the quality scoring result of the original image to be scored under the corresponding current business type based on the R*P similarity results.

The R*P similarity results of the current business type X are used to compute the quality scoring result of the original image under business type X. One option is to take the weighted average of the R*P similarity results. This is only one example, and practical applications are not limited to it.
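The weighted average mentioned above can be sketched as below. The helper name `quality_score` and the choice of uniform weights when none are supplied are assumptions; the text leaves the weights unspecified.

```python
from typing import Optional, Sequence


def quality_score(similarities: Sequence[Sequence[float]],
                  weights: Optional[Sequence[Sequence[float]]] = None) -> float:
    """Weighted mean of an R x P grid of similarity results.

    If weights is None, every similarity counts equally (plain mean).
    """
    flat_sims = [s for row in similarities for s in row]
    if weights is None:
        flat_weights = [1.0] * len(flat_sims)
    else:
        flat_weights = [w for row in weights for w in row]
    if len(flat_weights) != len(flat_sims) or not flat_sims:
        raise ValueError("weights must match a non-empty similarity grid")
    weight_total = sum(flat_weights)
    if weight_total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(flat_sims, flat_weights)) / weight_total


sims = [[1.0, 0.5], [0.0, 0.5]]                        # R = 2, P = 2 similarity results
uniform_score = quality_score(sims)                    # plain mean
weighted_score = quality_score(sims, [[2.0, 1.0], [1.0, 0.0]])
```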

Because the R*P similarity results determined in step 204 reflect how similar the first state feature vector is to the image features of the standard images (images whose state can be distinguished without ambiguity), they can be used to score the original image. The score reflects the quality of the original image relative to the standard images and thus gives a reasonable quality score for each current business type.

This concludes the flow of the specific embodiment of the first image quality scoring method shown in FIG. 2.

Embodiment 2:

This embodiment describes the second image quality scoring method in detail.

FIG. 5 is a flowchart of a specific embodiment of the second image quality scoring method of the present application. For ease of description, this embodiment again uses the scenario of judging violations in a vehicle cab to explain the image quality scoring method, and describes the training of the illegal business judgment model and of the quality scoring regression model together. As shown in FIG. 5, the method includes:

Step 501: train the illegal business judgment model using training sample images.

This step is the same as step 201 and is not repeated here.

Step 502: input the test sample image and the labels corresponding to each test business type into the pre-trained illegal business judgment model, and mark the image regions related to each test business type.

In this embodiment the test sample images and test business types serve as samples for training the quality scoring regression model. Within the illegal business judgment model, they are processed in the same way as the original image to be scored and the current business types in Embodiment 1. The difference is that here the image quality scores obtained for the test sample images are used, together with the images themselves, as training samples for the quality scoring regression model.

Because the test sample images and test business types are processed in the same way as the original image to be scored and the current business types in Embodiment 1, the detailed processing of steps 502-505 is not repeated.

Step 503: in the illegal business judgment model, perform feature extraction based on the image regions related to each test business type to obtain the first state feature vector corresponding to each test business type.

Step 504: for each first state feature vector, compute its similarity to the state feature vector of each of the R*P standard images of the corresponding test business type, obtaining R*P similarity results.

Step 505: for each first state feature vector, determine the quality scoring result of the test sample image under the corresponding test business type based on the R*P similarity results.

Steps 502-505 are repeated for multiple test sample images to obtain the images and their corresponding quality scoring results.

Step 506: train the quality scoring regression model using the multiple test sample images and their corresponding quality scoring results as training samples.

A test sample image may have one or more quality scoring results, one for each test business type. The quality scoring regression model may use an existing neural network architecture, such as a CNN.

When the quality scoring regression model is trained, the test sample image and the identifier of the test business type are the input, and the model outputs the quality scoring result of the test sample image under that test business type. This output is compared with the quality scoring result obtained in step 505 to obtain the loss function, and the parameters of the quality scoring regression model are adjusted according to the loss function. The input may contain the identifier of one test business type or of several. The trained quality scoring regression model can therefore produce image quality scores under one or more business types for a single original image. A model trained this way fits the image quality scoring process of steps 502-505 as closely as possible.
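The data flow of this training step can be sketched as follows. The sketch only assembles (image, business type identifier, target score) rows from the step 505 results and evaluates a regression loss; the mean squared error, the identifiers and the dictionary layout are assumptions, and the regression network itself (for example a CNN) is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def build_regression_samples(scores: Dict[str, Dict[str, float]]
                             ) -> List[Tuple[str, str, float]]:
    """Flatten {image_id: {type_id: score}} into (image_id, type_id, target) rows."""
    return [(image_id, type_id, score)
            for image_id, per_type in sorted(scores.items())
            for type_id, score in sorted(per_type.items())]


def mse_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between regression outputs and the step 505 scores."""
    if len(predicted) != len(targets) or not targets:
        raise ValueError("predicted and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


# Hypothetical step 505 results: one image may have scores under several business types.
step505_scores = {
    "img_001": {"seat_belt": 0.91, "phone_call": 0.62},
    "img_002": {"seat_belt": 0.35},
}
samples = build_regression_samples(step505_scores)
targets = [target for _, _, target in samples]
# Stand-in for the regression model's outputs, in the same order as samples.
model_outputs = [0.60, 0.90, 0.40]
loss = mse_loss(model_outputs, targets)
```

In a real training loop the loss would be back-propagated through the regression network after each batch; here it is only computed once to show what is being compared.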

Step 507: when an original image needs to be scored for image quality, input the original image to be scored and the identifier of each current business type into the pre-trained quality scoring regression model to obtain the quality scoring result of the original image under each current business type.

Steps 501-506 together constitute the training of the quality scoring regression model. Because this part is costly in both time and resources, it can be completed in advance. When an original image actually needs to be scored, step 507 is executed directly to score it in real time. On the one hand, completing training in advance greatly reduces the time and resources consumed by real-time scoring. On the other hand, because the quality scoring regression model fits the image quality scoring process of steps 502-505, the scoring results remain applicable to multiple business types, scoring accuracy is improved, and scores under several business types can be produced at the same time.

This concludes the flow of the specific embodiment of the second image quality scoring method shown in FIG. 5.

The above are the specific implementations of the two image quality scoring methods provided in this application. The application generates image quality scores under weak supervision by business type. Rather than relying on appearance similarity, template comparison or human visual perception, the approach is business-driven: a state recognition model is trained, and the mean similarity between the features of the original image and those of multiple standard samples whose states are unambiguously distinguishable is taken as the quality score. The score therefore serves the needs of the state recognition business directly, which improves the accuracy of image quality scoring.

Corresponding to the two image quality scoring methods above, this application also provides two image quality scoring devices. FIG. 6 shows the first image quality scoring device, which can be used to implement the first image quality scoring method. As shown in FIG. 6, the first image quality scoring device includes a state recognition unit and a scoring unit;

the state recognition unit is configured to input the original image to be scored and the labels corresponding to each current business type into the pre-trained state recognition model, mark the image regions related to each current business type, and perform feature extraction based on the image regions to obtain the first state feature vector corresponding to each current business type;

the scoring unit is configured to compute, for each first state feature vector, its similarity to the state feature vector of each of the R*P standard images of the corresponding current business type, obtaining R*P similarity results; and to determine, for each first state feature vector, the quality scoring result of the original image to be scored under the corresponding current business type based on the corresponding R*P similarity results;

the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the labels of the corresponding current business type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the corresponding current business type, whose state category can be distinguished without ambiguity; R is the total number of state categories under the corresponding current business type, and P is a positive integer.

Optionally, in the state recognition unit, the state recognition model includes a first-level network for semantic segmentation and a second-level network for state recognition;

the input of the first-level network is the original image to be scored and the labels, and its output is a semantic feature map for each label, used to mark the image regions related to each current business type;

the second-level network includes M sub-networks in one-to-one correspondence with M business types, where M is the total number of business types;

the input of each sub-network is the original image to be scored and the corrected semantic feature maps of the labels of the first current business type corresponding to that sub-network. Each sub-network concatenates the original image to be scored and the corrected semantic feature maps along the channel dimension and inputs the concatenated image into a classification network for state recognition, obtaining the first state feature vector corresponding to the first current business type. A corrected semantic feature map is a semantic feature map upsampled to the same size as the original image to be scored.
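The upsampling and channel-wise concatenation performed by each sub-network can be illustrated with plain nested lists. Nearest-neighbour interpolation and the helper names are assumptions, since the text does not state the upsampling method; a real implementation would typically use a tensor library.

```python
from typing import List

Channel = List[List[float]]  # one H x W channel


def upsample_nearest(feature_map: Channel, out_h: int, out_w: int) -> Channel:
    """Resize a 2D feature map to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    return [[feature_map[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


def concat_channels(image: List[Channel], feature_maps: List[Channel]) -> List[Channel]:
    """Append the upsampled semantic feature maps to the image's channels."""
    height, width = len(image[0]), len(image[0][0])
    corrected = [upsample_nearest(fm, height, width) for fm in feature_maps]
    return list(image) + corrected


# A 3-channel 4 x 4 image and two 2 x 2 semantic feature maps (e.g. torso, seat belt).
image = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
torso_map = [[0.0, 1.0], [0.0, 1.0]]
belt_map = [[0.2, 0.4], [0.6, 0.8]]
fused = concat_channels(image, [torso_map, belt_map])  # 5 channels, each 4 x 4
```

The fused input has one extra channel per label of the business type, so a sub-network for the seat-belt business in this toy case would receive 3 + 2 = 5 channels.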

Optionally, to further speed up classification convergence and improve network robustness, the state recognition unit may provide position information for the concatenated image. Specifically, inputting the concatenated image into the classification network for state recognition may include:

Optionally, to further improve classification accuracy, an attention mechanism may be added to the classification network in the state recognition unit. It assigns a weight to each pixel input to the classification network, such that pixels inside the image regions related to the current business type are weighted more heavily than pixels outside those regions.

Optionally, the image quality scoring device shown in FIG. 6 may further include a training unit configured to train the state recognition model;

when training the state recognition model, the training unit may complete the training of the first-level network using training sample images and then train the second-level network based on the trained first-level network; or,

when training the state recognition model, the training unit may train the first-level network and the second-level network jointly; or,

when training the state recognition model, the training unit may initially train the first-level network using training sample images, initially train the second-level network based on the first-level network obtained from that initial training, and, after the second-level network has been initially trained, train the first-level network and the second-level network jointly;

during joint training, the loss function is the weighted sum of the first loss function of the first-level network and the second loss function of the second-level network.

Optionally, during joint training, the first loss function corresponding to each input business type may be computed from the loss weights of all input labels, where an input business type is a business type to which an input label belongs;

when the first loss function corresponding to any input business type is computed, the input labels belonging to that input business type carry a larger loss weight than the input labels that do not belong to it.

Optionally, in the state recognition unit, determining the quality scoring result of the original image to be scored under the corresponding current business type may specifically include:

computing the weighted mean of the R*P similarity results and taking the result as the quality scoring result of the original image to be scored under the corresponding current business type.

Corresponding to the second image quality scoring method, this application provides an image quality scoring system that can be used to implement that method. The basic structure of the system is shown in FIG. 7; it includes an image quality scoring device and a training device, whose basic structures are shown in FIG. 8 and FIG. 9 respectively.

Referring to FIG. 7, FIG. 8 and FIG. 9, the training device includes a state recognition unit, a scoring unit and a first training unit;

the state recognition unit is configured to input the test sample image and the labels corresponding to each test business type into the pre-trained state recognition model, mark the image regions related to each test business type, and perform feature extraction based on the image regions to obtain the first state feature vector corresponding to each test business type;

the scoring unit is configured to compute, for each first state feature vector, its similarity to the state feature vector of each of the R*P standard images of the corresponding test business type, obtaining R*P similarity results; and to determine, for each first state feature vector, the quality scoring result of the test sample image under the corresponding test business type based on the R*P similarity results;

the first training unit is configured to train the quality scoring regression model using the test sample images, the identifier of each current business type, and the quality scoring results of the test sample images under each current business type;

the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the labels of the corresponding test business type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the corresponding test business type, whose state category can be distinguished without ambiguity; R is the total number of state categories under the corresponding test business type, and P is a positive integer.

The image quality scoring device includes an input unit, a quality scoring unit and an output unit;

the input unit is configured to receive the original image to be scored and the identifier of each current business type and send them to the quality scoring unit;

the quality scoring unit is configured to input the received original image to be scored and the identifier of each current business type into the quality scoring regression model pre-trained by the training device, obtaining the quality scoring result of the original image to be scored under each current business type;

the output unit is configured to output the quality scoring result of the original image to be scored under each current business type.

Optionally, in the state recognition unit of the training device, the state recognition model may include a first-level network for semantic segmentation and a second-level network for state recognition;

the input of the first-level network is the test sample image and the labels, and its output is a semantic feature map for each label, used to mark the image regions related to each test business type;

the input of each sub-network is the test sample image and the corrected semantic feature maps of the labels of the first test business type corresponding to that sub-network. Each sub-network concatenates the test sample image and the corrected semantic feature maps along the channel dimension and inputs the concatenated image into a classification network for state recognition, obtaining the first state feature vector corresponding to the first test business type. A corrected semantic feature map is a semantic feature map upsampled to the same size as the test sample image.

Optionally, in the state recognition unit of the training device, inputting the concatenated image into the classification network for state recognition includes:

wherein, through training of the state recognition model, it is ensured that the weight of the two-dimensional position information of a two-dimensional slice located inside the image area related to the current service type is greater than the weight of the two-dimensional position information of a two-dimensional slice located outside that image area.

Optionally, in the state recognition unit of the training device, an attention mechanism is added to the classification network to assign a weight to each pixel input into the classification network, and the weight of pixels located inside the image area related to the test service type is greater than the weight of pixels located outside that image area.

Optionally, the training device may further include a second training unit configured to train and generate the state recognition model;

wherein, when the second training unit trains the state recognition model, the training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when the second training unit trains the state recognition model, the first-level network and the second-level network are trained jointly; or,

when the second training unit trains the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained after the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

Optionally, when the second training unit performs the joint training, the first loss function corresponding to each input service type is calculated based on the loss weights of all input labels; wherein the input service type is the service type to which an input label belongs;

Optionally, in the state recognition unit of the training device, determining the quality scoring result of the test sample image under the corresponding test service type may specifically include:

The present application also provides a computer-readable storage medium storing instructions which, when executed by a processor, can perform the steps of the image quality scoring method described above. In practical applications, the computer-readable medium may be contained in the device/apparatus/system of the above embodiments, or may exist separately without being assembled into that device/apparatus/system. Instructions are stored in the computer-readable storage medium, and the stored instructions, when executed by a processor, can perform the steps of the image quality scoring method described above.

According to the embodiments disclosed in the present application, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but without limitation: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above; this list is not intended to limit the scope of protection of the present application. In the embodiments disclosed in the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device.

Fig. 10 shows an electronic device further provided by the present application. Fig. 10 is a schematic structural diagram of the electronic device involved in the embodiments of the present application, specifically:

The electronic device may include a processor 1001 with one or more processing cores, a memory 1002 comprising one or more computer-readable storage media, and a computer program stored in the memory and executable on the processor. When the program in the memory 1002 is executed, the image quality scoring method can be implemented.

Specifically, in practical applications, the electronic device may further include components such as a power supply 1003 and an input/output unit 1004. Those skilled in the art will understand that the structure shown in Fig. 10 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. In particular:

The processor 1001 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 1002 and invoking the data stored in the memory 1002, thereby controlling the electronic device as a whole.

The memory 1002, that is, the computer-readable storage medium described above, can be used to store software programs and modules. The processor 1001 executes various functional applications and data processing by running the software programs and modules stored in the memory 1002. The memory 1002 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through use of the server, and the like. In addition, the memory 1002 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 1002 may further include a memory controller to provide the processor 1001 with access to the memory 1002.

The electronic device further includes a power supply 1003 that supplies power to the components. The power supply may be logically connected to the processor 1001 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply 1003 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The electronic device may further include an input/output unit 1004, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick or optical signal input related to user settings and function control. The input/output unit 1004 can also be used to display information input by the user or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video and any combination thereof.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An image quality scoring method, characterized by comprising:

inputting an original image to be scored and the label corresponding to each current service type into a pre-trained state recognition model, marking the image area related to each current service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type;

for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the corresponding current service type, to obtain R*P similarity results;

for each first state feature vector, determining, based on the R*P similarity results, the quality scoring result of the original image to be scored under the corresponding current service type;

wherein the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the corresponding current service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the corresponding current service type, whose state category can be identified unambiguously; R is the total number of state categories under the corresponding current service type, and P is a positive integer.
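As an illustrative, non-limiting sketch of the similarity step of claim 1: the claim does not fix a similarity measure, so cosine similarity is assumed here. The function names and the nested-list layout `standard_vecs[r][p]` (the p-th standard image of state category r) are hypothetical and are not taken from the application.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def similarities_to_standards(
    first_vec: Sequence[float],
    standard_vecs: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Return the R*P similarity results for one first state feature vector.

    standard_vecs has R entries (one per state category), each holding
    the P state feature vectors of that category's standard images.
    """
    return [
        cosine_similarity(first_vec, vec)
        for per_state in standard_vecs
        for vec in per_state
    ]
```

The result is a flat list of length R*P, ordered by state category and then by standard image, which can be passed on to whatever scoring rule is used.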

2. The method according to claim 1, wherein the state recognition model comprises a first-level network for semantic segmentation and a second-level network for state recognition;

the input of the first-level network is the original image to be scored and the labels, and its output is a semantic feature map corresponding to each label, which is used to mark the image area related to each current service type;

the second-level network includes M sub-networks corresponding to M service types, where M is the total number of service types;

the input of each sub-network is the original image to be scored and the corrected semantic feature maps corresponding to the labels of the first current service type to which the sub-network corresponds; each sub-network concatenates the original image to be scored and the corrected semantic feature maps along the channel dimension and inputs the concatenated image into a classification network for state recognition, to obtain the first state feature vector corresponding to the first current service type; wherein the corrected semantic feature map is the semantic feature map upsampled to the same size as the original image to be scored.
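The sub-network input of claim 2 can be sketched as follows. This is a minimal illustration only: images are represented as lists of channels (each an H x W nested list), and nearest-neighbour interpolation is an assumed upsampling method, since the claim only requires that the semantic feature map be brought to the size of the original image. All function names are hypothetical.

```python
from typing import List

Channel = List[List[float]]  # one H x W plane


def upsample_nearest(fmap: Channel, out_h: int, out_w: int) -> Channel:
    """Upsample a feature map to out_h x out_w (nearest neighbour, assumed)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def concat_channels(image: List[Channel], semantic_maps: List[Channel]) -> List[Channel]:
    """Concatenate image channels and corrected semantic maps on the channel axis."""
    height, width = len(image[0]), len(image[0][0])
    corrected = [upsample_nearest(m, height, width) for m in semantic_maps]
    return list(image) + corrected


# A 3-channel 4x4 image and one 2x2 semantic feature map give 4 channels.
rgb = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
stacked = concat_channels(rgb, [[[1.0, 2.0], [3.0, 4.0]]])
```

In a real implementation the same operation would normally be done with tensor operations of a deep learning framework; the nested lists here only keep the sketch dependency-free.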

3. The method according to claim 2, wherein inputting the concatenated image into the classification network for state recognition comprises:

dividing the concatenated image into several two-dimensional slices (patches), and adding to each two-dimensional slice the two-dimensional position information corresponding to that slice;

inputting all the two-dimensional slices with their two-dimensional position information into the classification network for state recognition;

wherein, through training of the state recognition model, it is ensured that the weight of the two-dimensional position information of a two-dimensional slice located inside the image area related to the current service type is greater than the weight of the two-dimensional position information of a two-dimensional slice located outside that image area.
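The slicing step of claim 3 can be illustrated with the following sketch, which cuts a multi-channel image into square patches and attaches a (row, column) grid index to each one. The patch size, the dictionary layout and the function name are assumptions made for illustration; the learned weighting of the position information, which the claim attributes to training, is not modelled here.

```python
from typing import Dict, List

Channel = List[List[float]]  # one H x W plane


def split_into_patches(channels: List[Channel], patch: int) -> List[Dict]:
    """Split a multi-channel image into patch x patch slices with 2D positions."""
    height, width = len(channels[0]), len(channels[0][0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for row in range(height // patch):
        for col in range(width // patch):
            pixels = [
                [
                    plane[y][col * patch:(col + 1) * patch]
                    for y in range(row * patch, (row + 1) * patch)
                ]
                for plane in channels
            ]
            # "pos" is the two-dimensional position information of the slice.
            patches.append({"pos": (row, col), "pixels": pixels})
    return patches


# One-channel 4x4 image with values 0..15, cut into 2x2 patches.
image = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]
slices = split_into_patches(image, 2)
```

A classification network would typically turn each `pos` into a position embedding that is combined with the patch content before state recognition.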

4. The method according to claim 2 or 3, wherein an attention mechanism is added to the classification network to assign a weight to each pixel input into the classification network, and the weight of pixels located inside the image area related to the current service type is greater than the weight of pixels located outside that image area.

5. The method according to claim 2, wherein, when training the state recognition model, the training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when training the state recognition model, the first-level network and the second-level network are trained jointly; or,

when training the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained after the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

Wherein, when performing the joint training, the loss function is a weighted sum of the first loss function of the first-level network and the second loss function of the second-level network.

6. The method according to claim 5, wherein, when performing the joint training, the first loss function corresponding to each input service type is calculated based on the loss weights of all input labels; wherein the input service type is the service type to which an input label belongs;

when calculating the first loss function corresponding to any input service type, the loss weight of the input labels belonging to that input service type is greater than the loss weight of the input labels not belonging to that input service type.
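One possible reading of the label weighting in claim 6 is sketched below: for each service type, the per-label segmentation losses are summed with a larger weight for labels of that service type and a smaller weight for all other labels. The weight values 1.0 and 0.2, the weighted-sum form, and the label and service type names are assumptions for illustration; the claim only requires that the first weight exceed the second.

```python
from typing import Dict


def first_loss_per_service_type(
    label_losses: Dict[str, float],
    label_to_type: Dict[str, str],
    in_weight: float = 1.0,
    out_weight: float = 0.2,
) -> Dict[str, float]:
    """Weighted per-label loss for every service type that has an input label."""
    if not in_weight > out_weight:
        raise ValueError("in_weight must be greater than out_weight")
    result = {}
    for service_type in sorted(set(label_to_type.values())):
        result[service_type] = sum(
            (in_weight if label_to_type[label] == service_type else out_weight) * loss
            for label, loss in label_losses.items()
        )
    return result


# Hypothetical labels and service types, used only to exercise the function.
losses = {"belt": 0.5, "phone": 0.3, "hand": 0.1}
mapping = {"belt": "seat_belt_check", "phone": "phone_call_check", "hand": "phone_call_check"}
per_type = first_loss_per_service_type(losses, mapping)
```

Keeping a non-zero weight for out-of-type labels lets the segmentation network still learn from every label while emphasising the regions that matter for the service type at hand.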

7. The method according to claim 5 or 6, wherein the first loss function is a dice loss function, and the second loss function is a focal loss function.
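The joint loss of claims 5 and 7 can be sketched with the standard definitions of the Dice loss and the focal loss. The smoothing term `eps`, the focusing parameter `gamma = 2.0`, and the combination weights `w_seg` and `w_cls` are assumed values chosen for illustration; the claims do not specify them.

```python
import math
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Dice loss over flattened mask probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def focal_loss(probs: Sequence[float], true_class: int, gamma: float = 2.0) -> float:
    """Focal loss for one sample given its class probabilities."""
    p_t = min(max(probs[true_class], 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def joint_loss(
    pred_mask: Sequence[float],
    target_mask: Sequence[float],
    class_probs: Sequence[float],
    true_class: int,
    w_seg: float = 1.0,
    w_cls: float = 1.0,
) -> float:
    """Weighted sum of the first (Dice) and second (focal) loss functions."""
    return w_seg * dice_loss(pred_mask, target_mask) + w_cls * focal_loss(class_probs, true_class)
```

The Dice term is driven by mask overlap, while the focal term down-weights state classifications that are already confident, so the two terms respond to different failure modes.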

8. The method according to claim 1, wherein each state category under each current service type has P standard images, and the P standard images include original images of that state category captured under different scene brightness, at different shooting angles and/or in different poses.

9. The method according to claim 1, wherein determining the quality scoring result of the original image to be scored under the corresponding test service type comprises:

calculating a weighted mean of the R*P similarity results, and using the calculation result as the quality scoring result of the original image to be scored under the corresponding current service type.
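A minimal sketch of the weighted mean of claim 9 follows. How the weights are chosen is not stated in the claim, so uniform weights are assumed as the default; the function name is hypothetical.

```python
from typing import Optional, Sequence


def weighted_mean_score(
    similarities: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of the R*P similarity results (uniform weights by default)."""
    if not similarities:
        raise ValueError("at least one similarity result is required")
    if weights is None:
        weights = [1.0] * len(similarities)
    if len(weights) != len(similarities):
        raise ValueError("weights and similarities must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total


# R = 2 state categories, P = 2 standard images each.
uniform = weighted_mean_score([0.9, 0.7, 0.2, 0.1])
emphasised = weighted_mean_score([0.9, 0.7, 0.2, 0.1], [2.0, 2.0, 1.0, 1.0])
```

Dividing by the sum of the weights keeps the score on the same scale as the individual similarity results regardless of how the weights are set.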

10. An image quality scoring method, characterized by comprising:

inputting an original image to be scored and the identification of each current service type into a pre-trained quality scoring regression model, to obtain the quality scoring result of the original image to be scored under each current service type;

wherein the quality scoring regression model is a neural network model obtained by training with test sample images, the identification of each test service type, and the quality scoring result of each test sample image under each test service type;

the process of determining the quality scoring result of a test sample image under each test service type includes:

inputting the test sample image and the label corresponding to each test service type into a pre-trained state recognition model, marking the image area related to each test service type, and performing feature extraction based on the image area to obtain a first state feature vector corresponding to each test service type;

for each first state feature vector, calculating the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the corresponding test service type, to obtain R*P similarity results;

for each first state feature vector, determining, based on the R*P similarity results, the quality scoring result of the test sample image under the corresponding test service type;

wherein the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the corresponding test service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the corresponding test service type, whose state category can be identified unambiguously; R is the total number of state categories under the corresponding test service type, and P is a positive integer.
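The way claim 10 derives regression targets can be sketched as follows: every test sample image is scored once per test service type, and each (image, service type identification, score) triple becomes one training sample for the quality scoring regression model. The `score_fn` callable stands in for the state recognition and similarity steps and is purely hypothetical, as are the function name and the tuple layout.

```python
from typing import Any, Callable, List, Sequence, Tuple


def build_regression_samples(
    images: Sequence[Any],
    service_type_ids: Sequence[int],
    score_fn: Callable[[Any, int], float],
) -> List[Tuple[Any, int, float]]:
    """Pair each image and service type id with its similarity-based score."""
    samples = []
    for image in images:
        for type_id in service_type_ids:
            samples.append((image, type_id, score_fn(image, type_id)))
    return samples


# Stub scorer used only to exercise the function; it is not the patent's method.
def stub_score(image: Any, type_id: int) -> float:
    return 0.1 * len(image) + type_id


training_set = build_regression_samples(["img_a", "img_bb"], [0, 1], stub_score)
```

Once trained on such triples, the regression model can output a score directly from an image and a service type identification, without running the standard-image comparison at scoring time.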

11. An image quality scoring device, comprising: a state recognition unit and a scoring unit;

the state recognition unit is configured to input the original image to be scored and the label corresponding to each current service type into a pre-trained state recognition model, mark the image area related to each current service type, and perform feature extraction based on the image area to obtain a first state feature vector corresponding to each current service type;

the scoring unit is configured to, for each first state feature vector, calculate the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the corresponding test service type, to obtain R*P similarity results; and, for each first state feature vector, determine, based on the R*P similarity results, the quality scoring result of the original image to be scored under the corresponding current service type;

12. The device according to claim 11, wherein, in the state recognition unit, the state recognition model includes a first-level network for semantic segmentation and a second-level network for state recognition;

13. The device according to claim 12, wherein, in the state recognition unit, inputting the concatenated image into the classification network for state recognition includes:

14. The device according to claim 12 or 13, wherein, in the state recognition unit, an attention mechanism is added to the classification network to assign a weight to each pixel input into the classification network, and the weight of pixels located inside the image area related to the current service type is greater than the weight of pixels located outside that image area.

15. The device according to claim 12, characterized in that the device further comprises a training unit for training and generating the state recognition model;

wherein, when the training unit trains the state recognition model, the training of the first-level network is completed using training sample images, and the second-level network is then trained based on the trained first-level network; or,

when the training unit trains the state recognition model, the first-level network and the second-level network are trained jointly; or,

when the training unit trains the state recognition model, the first-level network is initially trained using training sample images, and the second-level network is initially trained based on the first-level network obtained after the initial training; after the initial training of the second-level network, the first-level network and the second-level network are trained jointly;

When performing the joint training, the loss function is a weighted sum of the first loss function of the first-level network and the second loss function of the second-level network.

16. The device according to claim 15, wherein, when performing the joint training, the first loss function corresponding to each input service type is calculated based on the loss weights of all input labels; wherein the input service type is the service type to which an input label belongs;

17. The device according to claim 11, wherein, in the state recognition unit, determining the quality scoring result of the original image to be scored under the corresponding current service type comprises:

18. An image quality scoring system, comprising a training device and an image quality scoring device;

The training device includes a state recognition unit, a scoring unit and a first training unit;

the state recognition unit is configured to input the test sample image and the label corresponding to each test service type into a pre-trained state recognition model, mark the image area related to each test service type, and perform feature extraction based on the image area to obtain a first state feature vector corresponding to each test service type;

the scoring unit is configured to, for each first state feature vector, calculate the similarity between the first state feature vector and the state feature vector of each of the R*P standard images corresponding to the corresponding test service type, to obtain R*P similarity results; and, for each first state feature vector, determine, based on the R*P similarity results, the quality scoring result of the test sample image under the corresponding test service type;

The first training unit is configured to use the test sample image, the identification of each current service type, and the quality score result of the test sample image under each current service type to train and generate a quality score regression model;

The image quality scoring device includes an input unit, a quality scoring unit and an output unit;

The input unit is used to receive the original image to be scored and the identification of each current service type, and send them to the quality scoring unit;

the quality scoring unit is configured to input the received original image to be scored and the identification of each current service type into the quality scoring regression model, to obtain the quality scoring result of the original image to be scored under each current service type;

The output unit is configured to output a quality scoring result of the original image to be scored under each current service type;

19. An image quality scoring device, comprising: an input unit, a quality scoring unit, and an output unit;

the quality scoring unit is configured to input the received original image to be scored and the identification of each current service type into a quality scoring regression model pre-trained by a training device, to obtain the quality scoring result of the original image to be scored under each current service type;

the first training unit is configured to train and generate the quality scoring regression model;

20. A training device, characterized in that, comprising: a state recognition unit, a scoring unit and a first training unit;

wherein the quality scoring model is used to process the input original image to be scored and the identification of each current service type, to obtain the quality scoring result of the original image to be scored under each current service type; the state feature vector of a standard image is the state feature vector obtained by inputting the standard image and the label of the corresponding test service type into the state recognition model for feature extraction; the R*P standard images are typical images, set for the various states of the corresponding test service type, whose state category can be identified unambiguously; R is the total number of state categories under the corresponding test service type, and P is a positive integer.

21. A computer-readable storage medium, on which computer instructions are stored, wherein the image quality scoring method according to any one of claims 1-10 can be implemented when the instructions are executed by a processor.

22. An electronic device, characterized in that the electronic device includes at least a computer-readable storage medium, and also includes a processor;

The processor is configured to read the executable instructions from the computer-readable storage medium, and execute the instructions to implement the image quality scoring method according to any one of claims 1-10.