Disclosure of Invention
Aiming at the above defects of the prior art, the invention provides a method for acquiring an offline quantization calibration set for a face detection model. The method labels the samples in a face detection data set along several dimensions, such as scene ID, scene quality, face density, face average orientation, face average quality and face average size, and automatically screens out a representative and diversified calibration set.
The invention is realized by the following scheme, namely a method for acquiring an offline quantization calibration set of a face detection model, which, after acquiring a face detection data set, classifies each sample in the data set and then processes each sample according to the following steps:
1) Determining scene quality labels corresponding to all samples in the face detection data set;
2) Determining face density labels corresponding to all samples in a face detection data set;
3) Determining face average area labels corresponding to all samples in a face detection data set;
4) Determining face average orientation labels corresponding to all samples in the face detection data set;
5) Determining face average quality labels corresponding to all samples in a face detection data set;
finally, a plurality of samples meeting the requirements are screened out of the face detection data set to form the offline quantization calibration set of the face detection model.
Preferably, after the face detection data set is acquired, the type of the face detection data set is determined, and each sample in the face detection data set is classified in the following manner:
(1) if the face detection data set is a public data set, extracting the characteristics of each sample, clustering through a characteristic matching algorithm to obtain a plurality of clustering centers, setting a corresponding scene ID label for each clustering center, and classifying each sample in the face detection data set by using different scene ID labels;
(2) If the face detection data set is a self-collection data set, setting a corresponding scene ID label for each sample collection device, and classifying each sample in the face detection data set by using different scene ID labels.
Preferably, the scene quality label comprises "black and white" and "color", and the specific detection steps are as follows:
(1) setting a black-and-white picture judgment threshold;
(2) splitting the sample into its R, G and B channels, calculating, at each pixel coordinate, the differences between the values of the three channels, counting the number of pixel coordinates at which all inter-channel differences are zero, and comparing this number with the black-and-white picture judgment threshold:
if the number is greater than or equal to the threshold, setting the scene quality label of the sample to "black and white";
if the number is less than the threshold, setting the scene quality label of the sample to "color";
preferably, the scene quality label further comprises "blurred" and "sharp", and the specific detection steps are as follows:
(1) setting a blurred-picture judgment threshold;
(2) converting the sample into a grayscale picture, applying the Laplace operator, computing the variance of the resulting pixels, and comparing the variance with the blurred-picture judgment threshold:
if the variance is less than or equal to the threshold, setting the scene quality label of the sample to "blurred";
if the variance is greater than the threshold, setting the scene quality label of the sample to "sharp";
preferably, the scene quality label further includes "color cast" and "unbiased", and the specific steps of determining whether the scene quality label corresponding to each sample in the face detection data set is "color cast" or "unbiased" are as follows:
(1) setting a color cast picture judgment threshold;
(2) converting the sample from the RGB color space to the CIELAB color space and calculating the color cast value of the sample according to the following formula:
K = √(da² + db²) / √(Ma² + Mb²)
where da is the mean of the sample's components on the a* axis of the CIELAB color space, Ma is the variance of the components on the a* axis, db is the mean of the components on the b* axis, Mb is the variance of the components on the b* axis, and K is the color cast value of the sample;
(3) comparing the color cast value of the sample with the color cast picture judgment threshold:
if the color cast value is greater than or equal to the threshold, setting the scene quality label of the sample to "color cast";
if the color cast value is less than the threshold, setting the scene quality label of the sample to "unbiased";
preferably, the scene quality label further comprises "normal brightness", "too dark" and "too bright", and the specific detection steps are as follows:
(1) setting a normal brightness judgment threshold, which comprises a normal brightness upper limit value and a normal brightness lower limit value;
(2) converting the sample into a grayscale picture and calculating the average pixel value of the grayscale picture;
(3) comparing the average pixel value with the normal brightness judgment threshold:
if the average pixel value is less than or equal to the lower limit value, setting the scene quality label of the sample to "too dark";
if the average pixel value lies between the lower and upper limit values, setting the scene quality label of the sample to "normal brightness";
if the average pixel value is greater than or equal to the upper limit value, setting the scene quality label of the sample to "too bright".
Preferably, the specific manner of determining the face density label corresponding to each sample in the face detection data set is as follows:
2-1) setting a face density threshold, wherein the face density threshold comprises an upper limit value and a lower limit value of the face density;
2-2) reading labeling information of a sample, counting the total number of faces in the sample, and judging the size relation between the total number of faces in the sample and a face density threshold value:
if the total number of faces is smaller than the lower limit value of the face density, setting the face density label of the sample to "low density";
if the total number of faces is within the face density threshold range, setting the face density label to "medium density";
if the total number of faces is greater than the upper limit value of the face density, setting the face density label to "high density".
Preferably, the specific manner of determining the face average area label corresponding to each sample in the face detection data set is as follows:
3-1) setting a face average area threshold, wherein the face average area threshold comprises an upper limit value and a lower limit value of a face average area;
3-2) reading the labeling information of a sample, computing the area of each face in the sample, averaging to obtain the average face area, and comparing the average face area with the average face area threshold:
if the average face area is smaller than the lower limit value of the average face area, setting the face average area label of the sample to "small face";
if the average face area is within the average face area threshold range, setting the face average area label to "medium face";
if the average face area is larger than the upper limit value of the average face area, setting the face average area label to "large face".
Preferably, the specific manner of determining the average face orientation label corresponding to each sample in the face detection data set is as follows:
4-1) setting an orientation deflection threshold;
4-2) reading the labeling information of a sample, and using a head pose estimation network to calculate the pitch angle, yaw angle and roll angle of each labeled face image in the sample;
4-3) calculating the mean of the absolute values of the pitch angles, the mean of the absolute values of the yaw angles and the mean of the absolute values of the roll angles, and comparing each mean with the orientation deflection threshold:
if every mean is smaller than the orientation deflection threshold, setting the face average orientation label of the sample to "front orientation";
if any mean is greater than or equal to the orientation deflection threshold, setting the face average orientation label of the sample to "side orientation".
Preferably, the specific manner of determining the face average quality label corresponding to each sample in the face detection data set is as follows:
5-1) setting a face quality threshold, wherein the face quality threshold comprises an upper limit value and a lower limit value of the face quality;
5-2) reading the labeling information of a sample, and feeding each labeled face image in the sample into a face quality detection network to obtain a corresponding face quality score;
5-3) summing the face quality scores of the marked face images, then averaging, and comparing the obtained average value with a face quality threshold value:
if the average value is smaller than the lower limit value of the face quality, setting the face average quality label of the sample as a low-quality face;
if the average value is within the face quality threshold value range, setting the face average quality label of the sample as a 'medium quality face';
if the average value is larger than the upper limit value of the face quality, the face average quality label of the sample is set as a 'high-quality face'.
Preferably, a sample simplifying threshold is set; before the samples meeting the requirements are screened out of the face detection data set to form the offline quantization calibration set, all samples belonging to scene types whose sample count is lower than the sample simplifying threshold are deleted, so as to simplify the face detection data set.
Preferably, a plurality of samples meeting the requirements are screened out from the face detection data set, and the specific mode for forming the off-line quantization calibration set of the face detection model is as follows:
firstly, setting a scene quality control threshold, a face density control threshold, a face average area control threshold, a face average orientation control threshold and a face average quality control threshold;
preferably, the scene quality control threshold includes a normal color sample number threshold, a normal black and white sample number threshold;
preferably, the face density control threshold includes a low density sample number threshold, a medium density sample number threshold, a high density sample number threshold;
preferably, the face average area control threshold includes a small face sample number threshold, a medium face sample number threshold, and a large face sample number threshold;
preferably, the face average orientation control threshold includes a front orientation sample number threshold and a side orientation sample number threshold;
preferably, the face average quality control threshold includes a low quality sample number threshold, a medium quality sample number threshold, a high quality sample number threshold;
secondly, randomly screening a plurality of normal color samples and normal black-and-white samples from the face detection data set according to the scene quality control threshold to form a scene quality sample set;
a normal color sample is a sample that simultaneously carries the four scene quality labels "color", "sharp", "unbiased" and "normal brightness";
a normal black-and-white sample is a sample that simultaneously carries the four scene quality labels "black and white", "sharp", "unbiased" and "normal brightness";
preferably, the number of normal color samples in the scene quality sample set is a normal color sample number threshold, and the number of normal black-and-white samples in the scene quality sample set is a normal black-and-white sample number threshold;
thirdly, randomly screening samples whose face density labels are "low density", "medium density" and "high density" from the face detection data set according to the face density control threshold to form a face density sample set;
preferably, the number of samples with the face density label of "low density" in the face density sample set is a low density sample number threshold, the number of samples with the face density label of "medium density" is a medium density sample number threshold, and the number of samples with the face density label of "high density" is a high density sample number threshold;
fourthly, randomly screening samples whose face average area labels are "small face", "medium face" and "large face" from the face detection data set according to the face average area control threshold to form a face average area sample set;
preferably, the number of samples with the face average area label "small face" in the face average area sample set is the small face sample number threshold, the number of samples with the label "medium face" is the medium face sample number threshold, and the number of samples with the label "large face" is the large face sample number threshold;
fifthly, randomly screening samples with face average orientation labels of 'front orientation' and 'side orientation' from the face detection data set according to a face average orientation control threshold value to form a face average orientation sample set;
preferably, the number of samples with the face average orientation label of "front orientation" in the face average orientation sample set is a front orientation sample number threshold, and the number of samples with the face average orientation label of "side orientation" is a side orientation sample number threshold;
sixthly, randomly screening samples whose face average quality labels are "low quality", "medium quality" and "high quality" from the face detection data set according to the face average quality control threshold to form a face average quality sample set;
preferably, the number of samples with the face average quality label of "low quality" in the face average quality sample set is a low quality sample number threshold, the number of samples with the face average quality label of "medium quality" is a medium quality sample number threshold, and the number of samples with the face average quality label of "high quality" is a high quality sample number threshold;
finally, removing duplicate samples across the scene quality sample set, the face density sample set, the face average area sample set, the face average orientation sample set and the face average quality sample set, and combining the remaining samples into the offline quantization calibration set of the face detection model.
The invention has the advantages that:
(1) the invention attaches corresponding attribute labels to each sample in the face detection data set; the operation is simple and the expandability is strong;
(2) the invention uses a computer to automatically select a plurality of representative and diversified samples to form the offline quantization calibration set of the face detection model; compared with manually constructing such a calibration set, the operation is simple, the expandability is strong, and the working efficiency is extremely high.
Detailed Description
As shown in fig. 1, a method for acquiring an offline quantization calibration set of a face detection model, after acquiring a face detection data set, first determines the type of the face detection data set, and classifies each sample in the face detection data set according to the following manner:
(1) if the face detection data set is a public data set, extracting the characteristics of each sample, clustering through a characteristic matching algorithm to obtain a plurality of clustering centers, setting a corresponding scene ID label for each clustering center, and classifying each sample in the face detection data set by using different scene ID labels;
For example, when the face detection data set is a public data set: firstly, the feature point sets of all sample pictures in the data set are extracted in batches; then one sample picture is randomly selected as a clustering center, and a feature matching algorithm is used to count how many of its feature points match each of the remaining sample pictures; if the number of matched feature points divided by the number of feature points of the sample picture is greater than 0.1, the two sample pictures are considered to belong to the same category and are given the same scene ID label; after all samples have been traversed, another sample picture is randomly selected from the remaining unlabeled samples as a new clustering center, and the process is repeated until every sample picture has been labeled.
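The greedy clustering loop above can be sketched as follows. This is a minimal sketch, not the patented implementation: `match_ratio` is a hypothetical stand-in for the whole feature extraction and matching step (e.g. SIFT features with a brute-force matcher), and samples are assumed to be hashable handles.

```python
import random

def assign_scene_ids(samples, match_ratio, threshold=0.1, seed=0):
    """Greedy scene-ID clustering sketch.

    Repeatedly pick a random unlabeled sample as a cluster center, give the
    same scene ID to every unlabeled sample whose matched-feature ratio
    against the center exceeds the threshold, and continue on the remainder.
    `match_ratio(center, other)` is an assumed callback returning
    (matched feature points) / (feature points of `center`).
    """
    rng = random.Random(seed)
    labels = {}
    unlabeled = list(samples)
    scene_id = 0
    while unlabeled:
        center = rng.choice(unlabeled)
        members = [s for s in unlabeled
                   if s == center or match_ratio(center, s) > threshold]
        for s in members:
            labels[s] = scene_id
        unlabeled = [s for s in unlabeled if s not in members]
        scene_id += 1
    return labels
```

With a toy `match_ratio` that groups numbers by decade, samples 1 and 2 end up in one scene and 11 and 12 in another.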
In this embodiment, the feature point extraction algorithm may be SIFT, KAZE/AKAZE, SuperPoint (Self-Supervised Interest Point Detection and Description), etc.; the feature matching algorithm may be nearest neighbor matching, brute-force matching, SuperGlue (Learning Feature Matching with Graph Neural Networks), etc.
(2) If the face detection data set is a self-collection data set, setting a corresponding scene ID label for each sample collection device, and classifying each sample in the face detection data set by using different scene ID labels.
After classifying each sample in the face detection data set, processing each sample in the face detection data set according to the following steps:
1) Determining scene quality labels corresponding to all samples in the face detection data set;
in this embodiment, the scene quality label includes "black and white" and "color", and the judgment mode is specifically as follows:
(1) setting a black-and-white picture judgment threshold; in this embodiment, the threshold is 5% of the total number of pixels of the sample to be processed;
(2) splitting the sample into its R, G and B channels, calculating, at each pixel coordinate, the differences between the values of the three channels, counting the number of pixel coordinates at which all inter-channel differences are zero, and comparing this number with the black-and-white picture judgment threshold:
if the number is greater than or equal to the threshold, setting the scene quality label of the sample to "black and white";
if the number is less than the threshold, setting the scene quality label of the sample to "color";
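A minimal sketch of this black-and-white check, assuming the sample is an H×W×3 RGB array and using the 5% threshold of this embodiment:

```python
import numpy as np

def is_black_and_white(img, ratio_threshold=0.05):
    """img: H x W x 3 uint8 array in (R, G, B) channel order."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    # A pixel coordinate counts as grey when all inter-channel differences
    # are zero, i.e. R == G == B at that coordinate.
    grey = (r == g) & (g == b)
    # Black-and-white when the grey-pixel count reaches the threshold
    # (5% of the total pixel count by default).
    return grey.sum() >= ratio_threshold * grey.size
```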
in this embodiment, the scene quality label further includes "blurred" and "sharp", and the specific detection steps are as follows:
(1) setting a blurred-picture judgment threshold; in this embodiment, it is set to 100;
(2) converting the sample into a grayscale picture, applying the Laplace operator, computing the variance of the resulting pixels, and comparing the variance with the blurred-picture judgment threshold:
if the variance is less than or equal to the threshold, setting the scene quality label of the sample to "blurred";
if the variance is greater than the threshold, setting the scene quality label of the sample to "sharp";
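The variance-of-Laplacian check can be sketched with a hand-rolled 3×3 Laplacian kernel; in practice `cv2.Laplacian(gray, cv2.CV_64F).var()` computes the same quantity:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response (0 1 0 / 1 -4 1 / 0 1 0),
    evaluated on the valid interior region of the grayscale image."""
    g = gray.astype(float)
    resp = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
            - 4.0 * g[1:-1, 1:-1])
    return resp.var()

def is_blurred(gray, threshold=100.0):
    # Low Laplacian variance means few sharp edges -> "blurred".
    return laplacian_variance(gray) <= threshold
```

A flat image has zero Laplacian variance and is labeled blurred; a noisy, edge-rich image far exceeds the threshold of 100.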
in this embodiment, the scene quality label further includes "color cast" and "unbiased", and the specific steps of determining whether the scene quality label corresponding to each sample in the face detection dataset is "color cast" or "unbiased" are as follows:
(1) setting a color-shifting picture judgment threshold, wherein in the embodiment, the color-shifting picture judgment threshold is set to be 1.5;
(2) converting the sample from the RGB color space to the CIELAB color space and calculating the color cast value of the sample according to the following formula:
K = √(da² + db²) / √(Ma² + Mb²)
where da is the mean of the sample's components on the a* axis of the CIELAB color space, Ma is the variance of the components on the a* axis, db is the mean of the components on the b* axis, Mb is the variance of the components on the b* axis, and K is the color cast value of the sample;
(3) comparing the color cast value of the sample with the color cast picture judgment threshold:
if the color cast value is greater than or equal to the threshold, setting the scene quality label of the sample to "color cast";
if the color cast value is less than the threshold, setting the scene quality label of the sample to "unbiased";
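A sketch of the color cast factor, assuming the a*/b* component arrays (centered on zero for a neutral image) have already been obtained from an RGB-to-CIELAB conversion (e.g. via `cv2.cvtColor`), and following the document's definition of Ma/Mb as variances:

```python
import numpy as np

def colour_cast_factor(a, b):
    """K = sqrt(da^2 + db^2) / sqrt(Ma^2 + Mb^2) with da/db the means and
    Ma/Mb the variances of the a* and b* components, as defined above."""
    da, db = a.mean(), b.mean()
    ma, mb = a.var(), b.var()
    d = np.hypot(da, db)   # distance of the mean chroma from neutral grey
    m = np.hypot(ma, mb)   # spread of the chroma distribution
    return d / m if m else float("inf")

def is_colour_cast(a, b, threshold=1.5):
    # Large K: chroma is far from neutral and tightly clustered -> cast.
    return colour_cast_factor(a, b) >= threshold
```

A uniform strongly tinted patch yields an unbounded K (cast), while a patch whose a*/b* values average to zero yields K = 0 (unbiased).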
in this embodiment, the scene quality label further includes "normal brightness", "too dark" and "too bright", and the specific detection steps are as follows:
(1) setting a normal brightness judgment threshold, which comprises a normal brightness upper limit value and a normal brightness lower limit value;
in this embodiment, the upper limit value is 200 and the lower limit value is 40;
(2) converting the sample into a grayscale picture and calculating the average pixel value of the grayscale picture;
(3) comparing the average pixel value with the normal brightness judgment threshold:
if the average pixel value is less than or equal to the lower limit value, setting the scene quality label of the sample to "too dark";
if the average pixel value lies between the lower and upper limit values, setting the scene quality label of the sample to "normal brightness";
if the average pixel value is greater than or equal to the upper limit value, setting the scene quality label of the sample to "too bright".
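The brightness rule reduces to a two-sided threshold on the grayscale mean; a minimal sketch with this embodiment's limits of 40 and 200:

```python
def brightness_label(gray_mean, low=40, high=200):
    """Map the average grayscale pixel value to a scene quality label."""
    if gray_mean <= low:
        return "too dark"
    if gray_mean >= high:
        return "too bright"
    return "normal brightness"
```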
2) Face density labels corresponding to the samples in the face detection data set are determined according to the following mode:
2-1) setting a face density threshold, wherein the face density threshold comprises an upper limit value and a lower limit value of the face density;
in this embodiment, the upper limit value of the face density is 10, and the lower limit value of the face density is 5;
2-2) reading labeling information of a sample, counting the total number of faces in the sample, and judging the size relation between the total number of faces in the sample and a face density threshold value:
if the total number of faces is smaller than the lower limit value of the face density, setting the face density label of the sample to "low density";
if the total number of faces is within the face density threshold range, setting the face density label to "medium density";
if the total number of faces is greater than the upper limit value of the face density, setting the face density label to "high density".
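Step 2) is a simple banding of the annotated face count; a sketch with this embodiment's limits of 5 and 10:

```python
def face_density_label(num_faces, low=5, high=10):
    """Band the total face count of a sample into a density label."""
    if num_faces < low:
        return "low density"
    if num_faces > high:
        return "high density"
    return "medium density"
```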
3) The average face area label corresponding to each sample in the face detection data set is determined according to the following mode:
3-1) setting a face average area threshold, wherein the face average area threshold comprises an upper limit value and a lower limit value of a face average area;
in this embodiment, the upper limit value of the average face area is 9216 pixels, and the lower limit value of the average face area is 1024 pixels;
3-2) reading labeling information of a sample, counting the area of each face in the sample, calculating the average face area of the sample (namely, the sum of the areas of each face divided by the number of faces of the sample), and judging the size relation between the average face area in the sample and the average face area threshold value:
if the average face area is smaller than the lower limit value of the average face area, setting the average face area label of the sample as a small face;
if the average face area is within the average face area threshold range, setting the average face area label of the sample as a medium face;
and if the average face area is larger than the upper limit value of the average face area, setting the average face area label of the sample as a large face.
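Step 3) can be sketched as follows; the `(x, y, w, h)` box format is an assumption about the annotation layout, and the default limits are this embodiment's 1024 and 9216 pixels:

```python
def face_area_label(face_boxes, low=1024, high=9216):
    """face_boxes: iterable of (x, y, w, h) annotation rectangles.
    Average the face areas and band the result into an area label."""
    areas = [w * h for (_, _, w, h) in face_boxes]
    mean_area = sum(areas) / len(areas)
    if mean_area < low:
        return "small face"
    if mean_area > high:
        return "large face"
    return "medium face"
```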
4) The average face orientation label corresponding to each sample in the face detection data set is determined according to the following mode:
4-1) setting an orientation deflection threshold; in this embodiment, it is 20°;
4-2) reading the labeling information of a sample, and using a head pose estimation network to calculate the pitch angle, yaw angle and roll angle of each labeled face image in the sample;
in this embodiment, the head pose estimation network used may be HopeNet (Fine-Grained Head Pose Estimation Without Keypoints), img2pose (Face Alignment and Detection via 6DoF, Face Pose Estimation), FSA-Net (Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image), etc.
4-3) calculating the mean of the absolute values of the pitch angles, the mean of the absolute values of the yaw angles and the mean of the absolute values of the roll angles, and comparing each mean with the orientation deflection threshold:
if every mean is smaller than the orientation deflection threshold, setting the face average orientation label of the sample to "front orientation";
if any mean is greater than or equal to the orientation deflection threshold, setting the face average orientation label of the sample to "side orientation".
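Step 4) can be sketched as follows, assuming the head pose network has already produced one `(pitch, yaw, roll)` triple in degrees per annotated face:

```python
def face_orientation_label(poses, threshold=20.0):
    """poses: list of (pitch, yaw, roll) tuples in degrees, one per face.
    Front orientation only when the mean absolute angle of every axis
    stays below the deflection threshold (20 degrees in this embodiment)."""
    n = len(poses)
    means = [sum(abs(p[axis]) for p in poses) / n for axis in range(3)]
    if all(m < threshold for m in means):
        return "front orientation"
    return "side orientation"
```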
5) The average face quality label corresponding to each sample in the face detection data set is determined according to the following mode:
5-1) setting a face quality threshold, wherein the face quality threshold comprises an upper limit value and a lower limit value of the face quality;
in this embodiment, the upper limit value of the face quality is 0.6, and the lower limit value of the face quality is 0.3;
5-2) reading the labeling information of a sample, and feeding each labeled face image in the sample into a face quality detection network to obtain a corresponding face quality score;
in this embodiment, the face quality detection network includes EQFace (A Simple Explicit Quality Network for Face Recognition), SER-FIQ (Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness), PLQ (Pixel-Level Face Image Quality Assessment for Explainable Face Recognition) and the like.
5-3) summing the face quality scores of the marked face images, then averaging, and comparing the obtained average value with a face quality threshold value:
if the average value is smaller than the lower limit value of the face quality, setting the face average quality label of the sample as a low-quality face;
if the average value is within the face quality threshold value range, setting the face average quality label of the sample as a 'medium quality face';
If the average value is larger than the upper limit value of the face quality, the face average quality label of the sample is set as a 'high-quality face'.
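Step 5) is the same banding pattern applied to the averaged quality scores; a sketch with this embodiment's limits of 0.3 and 0.6 (the scores themselves would come from one of the quality networks named above):

```python
def face_quality_label(scores, low=0.3, high=0.6):
    """scores: per-face quality scores for one sample, e.g. from EQFace.
    Average them and band the mean into a quality label."""
    mean = sum(scores) / len(scores)
    if mean < low:
        return "low-quality face"
    if mean > high:
        return "high-quality face"
    return "medium-quality face"
```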
Finally, a plurality of samples meeting the requirements are screened out of the face detection data set to form the offline quantization calibration set of the face detection model; the specific steps are as follows:
firstly, setting a scene quality control threshold, a face density control threshold, a face average area control threshold, a face average orientation control threshold and a face average quality control threshold;
in this embodiment, the scene quality control threshold includes a normal color sample number threshold and a normal black-and-white sample number threshold;
the face density control threshold comprises a low-density sample number threshold, a medium-density sample number threshold and a high-density sample number threshold;
the face average area control threshold comprises a small face sample number threshold, a medium face sample number threshold and a large face sample number threshold;
the face average orientation control threshold comprises a front orientation sample number threshold and a side orientation sample number threshold;
the face average quality control threshold comprises a low quality sample number threshold, a medium quality sample number threshold and a high quality sample number threshold;
secondly, randomly screening a plurality of normal color samples and normal black-and-white samples from the face detection data set according to the scene quality control threshold to form a scene quality sample set;
a normal color sample is a sample that simultaneously carries the four scene quality labels "color", "sharp", "unbiased" and "normal brightness";
a normal black-and-white sample is a sample that simultaneously carries the four scene quality labels "black and white", "sharp", "unbiased" and "normal brightness";
in this embodiment, the number of normal color samples in the scene quality sample set is a normal color sample number threshold, and the number of normal black-and-white samples in the scene quality sample set is a normal black-and-white sample number threshold;
thirdly, randomly screening samples whose face density labels are "low density", "medium density" and "high density" from the face detection data set according to the face density control threshold to form a face density sample set;
in this embodiment, the number of samples with a face density label of "low density" in the face density sample set is a low density sample number threshold, the number of samples with a face density label of "medium density" is a medium density sample number threshold, and the number of samples with a face density label of "high density" is a high density sample number threshold;
Fourthly, randomly screening samples of which the face average area labels are 'small faces', 'medium faces' and 'large faces' from a face detection data set according to a face average area control threshold value to form a face average area sample set;
in this embodiment, the number of samples with the face average area label "small face" in the face average area sample set is the small face sample number threshold, the number of samples with the label "medium face" is the medium face sample number threshold, and the number of samples with the label "large face" is the large face sample number threshold;
fifthly, randomly screening samples with face average orientation labels of 'front orientation' and 'side orientation' from the face detection data set according to a face average orientation control threshold value to form a face average orientation sample set;
in this embodiment, the number of samples with the face average orientation label of "front orientation" in the face average orientation sample set is a front orientation sample number threshold, and the number of samples with the face average orientation label of "side orientation" is a side orientation sample number threshold;
sixthly, randomly screening samples whose face average quality labels are "low quality", "medium quality" and "high quality" from the face detection data set according to the face average quality control threshold to form a face average quality sample set;
In this embodiment, the number of samples with a face average quality label of "low quality" in the face average quality sample set is a low quality sample number threshold, the number of samples with a face average quality label of "medium quality" is a medium quality sample number threshold, and the number of samples with a face average quality label of "high quality" is a high quality sample number threshold;
finally, removing duplicate samples across the scene quality sample set, the face density sample set, the face average area sample set, the face average orientation sample set and the face average quality sample set, and combining the remaining samples into the offline quantization calibration set of the face detection model.
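The per-label screening and the final merge can be sketched as two small helpers. This is a sketch only: `sample_by_label` stands in for any one of the random screening steps above (with `k` playing the role of the corresponding sample number threshold), and samples are assumed to be hashable IDs so duplicates can be detected:

```python
import random

def sample_by_label(samples, labels, wanted, k, seed=0):
    """Randomly pick at most k samples carrying the wanted label."""
    pool = [s for s, lab in zip(samples, labels) if lab == wanted]
    return random.Random(seed).sample(pool, min(k, len(pool)))

def build_calibration_set(*sample_sets):
    """Merge the per-dimension sample sets, dropping duplicate samples
    while keeping first-seen order."""
    seen, merged = set(), []
    for subset in sample_sets:
        for s in subset:
            if s not in seen:
                seen.add(s)
                merged.append(s)
    return merged
```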
It should be noted that, in order to simplify the face detection data set and improve quantization precision, a sample simplifying threshold is usually set. Before the samples meeting the requirements are screened out to form the offline quantization calibration set, all samples belonging to scene types whose sample count is lower than the sample simplifying threshold are deleted. As a result, no sample in the final calibration set carries the scene quality labels "blurred", "color cast", "too dark" or "too bright"; such extreme samples would degrade quantization precision and are therefore removed, ensuring that the scene types of all samples in the final offline quantization calibration set are representative.
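The pruning step described above amounts to counting samples per scene type and dropping under-populated types; a minimal sketch, assuming each sample is paired with its scene ID:

```python
from collections import Counter

def prune_rare_scenes(samples, min_count):
    """samples: list of (sample, scene_id) pairs.
    Drop every sample whose scene ID occurs fewer than `min_count`
    times in the data set (the sample simplifying threshold)."""
    counts = Counter(scene_id for _, scene_id in samples)
    return [(s, sid) for s, sid in samples if counts[sid] >= min_count]
```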
In addition, each threshold related to the embodiment is an empirical value, and the steps of determining each threshold are as follows:
(1) determining initial set values of all thresholds according to an offline quantization scheme provided by a chip manufacturer;
(2) according to the method, an offline quantization calibration set of the face detection model is obtained;
(3) performing model quantization using the offline quantization calibration set of the face detection model according to the offline quantization scheme provided by the chip manufacturer, and judging whether the precision loss of the quantization process meets the requirement of the scheme; if not, adjusting the value of each threshold and repeating step (2) until the precision loss meets the requirement.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; those skilled in the art will appreciate that modifications made without departing from the spirit of the invention fall within its scope.