CN116884062A - Image processing method, image processing equipment, electronic equipment and storage medium - Google Patents
- Publication number: CN116884062A
- Application number: CN202310795975.7A
- Authority: CN (China)
- Prior art keywords: face, region, picture, frame, target
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Security & Cryptography (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure provides a picture processing method, a picture processing device, an electronic device, and a storage medium. The picture processing method comprises the following steps: detecting a plurality of face key points from a target face picture; determining a face contour region and a face salient region based on the plurality of face key points; and determining that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region.
Description
Technical Field
The present disclosure relates to the field of picture processing, and more particularly, to a picture processing method, a picture processing apparatus, an electronic apparatus, and a storage medium.
Background
As the degree of informatization increases, the demands on the accuracy and practicality of identity recognition keep growing, and face recognition, one of the mainstream identity recognition technologies, has practical and potential applications in many fields. Although face feature comparison in related face recognition technologies achieves good precision, false detections often occur in the face detection stage that precedes face feature extraction (for example, non-human pictures such as animals or model toys are judged to be face pictures), causing subsequent recognition processing to fail. As a result, related face recognition technologies suffer from a high registration failure rate and waste computational resources.
In addition, to obtain a more accurate face recognition network, related face recognition technologies need to train multiple face recognition networks that evaluate face quality from different angles, and each quality defect (such as blur degree, illumination type, occlusion area, face pose, etc.) must be labeled in the data. Such heterogeneous, large-scale labeling requires substantial resources, and the additional labels make it impossible to expand the data set easily.
The foregoing information is presented merely as background information to aid in understanding the disclosure. No determination has been made, and no assertion is made, as to whether any of the above is applicable as prior art with regard to the present disclosure.
Disclosure of Invention
Embodiments of the present disclosure provide a picture processing method, a picture processing apparatus, an electronic apparatus, and a storage medium to solve at least the above-described problems and/or disadvantages.
According to a first aspect of embodiments of the present disclosure, there is provided a picture processing method, including: detecting a plurality of face key points from a target face picture; determining a face contour region and a face salient region based on the plurality of face key points; and determining that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region.
Optionally, the step of determining the face contour region and the face salient region based on the plurality of face key points comprises: determining a region containing the plurality of face key points as the face contour region; and determining a region containing partial face key points as the face salient region, wherein the partial face key points are the face key points, among the plurality of face key points, other than the face key points representing the face contour.
Optionally, the face salient region is determined to satisfy the first predetermined condition as follows: in a case where the shape of a salient region envelope frame of the face salient region is the predetermined shape and the position of the salient region envelope frame in the width direction does not correspond to the position of a key point envelope frame of the face contour region, the face salient region is determined to satisfy the first predetermined condition, wherein the salient region envelope frame is the minimum rectangular envelope frame of the face key points in the face salient region, and the key point envelope frame is the minimum rectangular envelope frame of the face key points in the face contour region.
Optionally, the shape of the salient region envelope frame is determined to be the predetermined shape in a case where the height-to-width ratio of the salient region envelope frame is greater than or equal to a first threshold; and the position of the salient region envelope frame in the width direction is determined not to correspond to the position of the key point envelope frame in the width direction in a case where the ratio of the distance between the vertical central axis of the salient region envelope frame and the vertical central axis of the key point envelope frame to the width of the key point envelope frame is less than or equal to a second threshold.
Optionally, the picture processing method further includes: detecting a face region frame from the target face picture; and determining that the target face picture is an invalid face picture in a case where the face region frame satisfies a second predetermined condition, wherein the second predetermined condition includes at least one of the following: the face region frame is determined to be a non-face by a face classification model; and the face region frame does not spatially correspond to the face contour region.
Optionally, the step of determining that the face region frame is a non-face through a face classification model includes: enlarging the face region frame in the target face picture; cropping the part of the target face picture corresponding to the enlarged face region frame as the face picture; and classifying the face picture as a non-face picture by using the face classification model.
Optionally, the step of classifying the face picture as a non-face picture by using the face classification model comprises: acquiring a face confidence of the target face picture, which indicates the probability that the target face picture contains a face; and in a case where the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classifying the face picture as a non-face picture by using the face classification model.
Optionally, the face classification model is trained using positive samples comprising a plurality of face picture samples and negative samples comprising a plurality of non-face picture samples, wherein the positive samples are obtained by cropping, as face picture samples, portions of a picture in the dataset corresponding to at least one of the following: the enlarged face region frame in the picture, the face label frame of the picture, and a sliding window whose intersection-over-union with the face label frame satisfies a specific condition; and the negative samples are obtained by cropping, as non-face picture samples, portions of a picture in the dataset corresponding to at least one of the following: a sliding window whose intersection-over-union with the face label frame does not satisfy the specific condition, and a sliding window with a predetermined step size.
Optionally, it is determined that the face region frame does not spatially correspond to the face contour region as follows: determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
Optionally, the step of determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region comprises: determining the intersection-over-union of the face region frame and the key point envelope frame; and determining that the face region frame and the face contour region do not spatially correspond in a case where the intersection-over-union is less than or equal to a fifth threshold.
According to a second aspect of embodiments of the present disclosure, there is provided a picture processing apparatus, including: a key point detection module configured to detect a plurality of face key points from a target face picture; a face region determination module configured to determine a face contour region and a face salient region based on the plurality of face key points; and a face picture processing module configured to determine that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region.
Optionally, the face region determination module is configured to: determine a region containing the plurality of face key points as the face contour region; and determine a region containing partial face key points as the face salient region, wherein the partial face key points are the face key points, among the plurality of face key points, other than the face key points representing the face contour.
Optionally, the face picture processing module is configured to: in a case where the shape of a salient region envelope frame of the face salient region is the predetermined shape and the position of the salient region envelope frame in the width direction does not correspond to the position of a key point envelope frame of the face contour region, determine that the face salient region satisfies the first predetermined condition, wherein the salient region envelope frame is the minimum rectangular envelope frame of the face key points in the face salient region, and the key point envelope frame is the minimum rectangular envelope frame of the face key points in the face contour region.
Optionally, the face picture processing module is configured to: determine that the shape of the salient region envelope frame is the predetermined shape in a case where the height-to-width ratio of the salient region envelope frame is greater than or equal to a first threshold; and determine that the position of the salient region envelope frame in the width direction does not correspond to the position of the key point envelope frame in the width direction in a case where the ratio of the distance between the vertical central axis of the salient region envelope frame and the vertical central axis of the key point envelope frame to the width of the key point envelope frame is less than or equal to a second threshold.
Optionally, the picture processing apparatus further includes a face region frame detection module configured to detect a face region frame from the target face picture. Optionally, the face picture processing module is further configured to determine that the target face picture is an invalid face picture in a case where the face region frame satisfies a second predetermined condition, wherein the second predetermined condition includes at least one of the following: the face region frame is determined to be a non-face by a face classification model; and the face region frame does not spatially correspond to the face contour region.
Optionally, the face picture processing module is configured to: enlarge the face region frame in the target face picture; crop the part of the target face picture corresponding to the enlarged face region frame as the face picture; and classify the face picture as a non-face picture by using the face classification model.
Optionally, the face picture processing module is configured to classify the face picture as a non-face picture by using the face classification model as follows: acquiring a face confidence of the target face picture, which indicates the probability that the target face picture contains a face; and in a case where the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classifying the face picture as a non-face picture by using the face classification model.
Optionally, the face classification model is trained using positive samples comprising a plurality of face picture samples and negative samples comprising a plurality of non-face picture samples, wherein the positive samples are obtained by cropping, as face picture samples, portions of a picture in the dataset corresponding to at least one of the following: the enlarged face region frame in the picture, the face label frame of the picture, and a sliding window whose intersection-over-union with the face label frame satisfies a specific condition; and the negative samples are obtained by cropping, as non-face picture samples, portions of a picture in the dataset corresponding to at least one of the following: a sliding window whose intersection-over-union with the face label frame does not satisfy the specific condition, and a sliding window with a predetermined step size.
Optionally, the face picture processing module is configured to determine that the face region frame does not spatially correspond to the face contour region as follows: determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
Optionally, the face picture processing module is configured to: determine the intersection-over-union of the face region frame and the key point envelope frame; and determine that the face region frame and the face contour region do not spatially correspond in a case where the intersection-over-union is less than or equal to a fifth threshold.
According to a third aspect of embodiments of the present disclosure, there is provided a picture processing method, including: detecting a face region frame from a target face picture; enlarging the face region frame in the target face picture; cropping the part of the target face picture corresponding to the enlarged face region frame as the face picture; acquiring a face confidence of the target face picture, which indicates the probability that the target face picture contains a face; and in a case where the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classifying the face picture as a non-face picture by using a face classification model.
According to a fourth aspect of embodiments of the present disclosure, there is provided a picture processing apparatus including: a face region frame detection module configured to detect a face region frame from a target face picture; and a face picture processing module configured to: enlarge the face region frame in the target face picture; crop the part of the target face picture corresponding to the enlarged face region frame as the face picture; acquire a face confidence of the target face picture, which indicates the probability that the target face picture contains a face; and in a case where the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classify the face picture as a non-face picture by using a face classification model.
According to a fifth aspect of embodiments of the present disclosure, there is provided a picture processing method, including: detecting a face region frame and a plurality of face key points from a target face picture; determining a face contour region based on the plurality of face key points; and determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
Optionally, the step of determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region includes: determining the intersection-over-union of the face region frame and the key point envelope frame; and determining that the face region frame and the face contour region do not spatially correspond in a case where the intersection-over-union is less than or equal to a fifth threshold.
According to a sixth aspect of embodiments of the present disclosure, there is provided a picture processing apparatus including: a face region frame detection module configured to detect a face region frame from a target face picture; a key point detection module configured to detect a plurality of face key points from the target face picture; a face region determination module configured to determine a face contour region based on the plurality of face key points; and a face picture processing module configured to determine that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
Optionally, the face picture processing module is configured to determine that the face region frame does not spatially correspond to the face contour region based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region as follows: determining the intersection-over-union of the face region frame and the key point envelope frame; and determining that the face region frame and the face contour region do not spatially correspond in a case where the intersection-over-union is less than or equal to a fifth threshold.
According to a seventh aspect of embodiments of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a picture processing method as described above.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the picture processing method as described above.
According to the picture processing method, the picture processing apparatus, the electronic device, and the storage medium of the present disclosure, face pictures that have extreme head poses (such as a side face at an excessive angle) and from which identity information cannot be acquired can be filtered out in a targeted manner, reducing the registration failure rate and effectively improving recognition precision. The picture processing method, apparatus, electronic device, and storage medium have the capability of distinguishing face instances from environment samples, and reclassifying incorrectly recognized face pictures can greatly improve image processing performance, filtering out face pictures that include environmental noise or severe occlusion as well as low-quality face pictures. Face pictures with incomplete faces, and face pictures produced by errors in face region frame or face key point detection, can also be screened out, reducing strongly interfering picture data. Moreover, the method is applicable to data without annotation labels and to any data set, and thus has good extensibility.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other aspects, features and elements of certain embodiments of the present disclosure will become more apparent from the following description when taken in conjunction with the accompanying drawings in which:
fig. 1 is a flowchart illustrating a picture processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating a face salient region and a face contour region of an example picture according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating an exemplary implementation of determining invalid face pictures by a face classification model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a face region box and a facial contour region of an example picture according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating an exemplary implementation of a picture processing method according to an embodiment of the present disclosure; and
fig. 6 shows a block diagram of a picture processing device according to an embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "any combination of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" represents three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
In the related art, face recognition technology generally consists of the following parts: face detection, face key point detection, face alignment, face feature extraction, and face feature comparison. As the key to the accuracy of face recognition, the performance of face feature extraction is strongly related to the quality of the input face picture. When the quality of the input picture is substandard, face recognition often exhibits a high mismatching rate; for example, but not limited to, a face area that is too small, a face deflection angle that is too large, or a failure in the preceding face detection process causes face recognition to be performed on a low-quality face picture or a non-face picture that is difficult to recognize even with the naked eye. A common remedy is to train a better face feature extraction network that also predicts the quality of the input picture during feature extraction, such as MagFace or AdaFace.
To reduce the false recognition rate, another approach is to tighten the filtering criteria so as to filter out as many low-quality input pictures as possible and thereby increase model accuracy. However, such excessive filtering rules also bring a higher registration failure rate; for example, pictures that could originally have been recognized are judged unrecognizable, which harms system efficiency and user experience.
In addition, related face recognition technologies often need to additionally annotate the data and then use the annotated data to train the neural network for face recognition. Since multiple face recognition networks for face quality assessment from different angles must be trained to improve face recognition accuracy, and each quality defect (e.g., blur degree, illumination type, occlusion area, face pose, etc.) requires its own labels, annotating the data and training the networks wastes a large amount of computing resources, and a data set with annotation labels cannot be easily expanded, which imposes certain limitations.
In summary, given that related face detection models produce false detections while face feature extraction models already have generally high precision, filtering substandard input pictures at the face detection stage is particularly important for improving face recognition accuracy. In view of the above problems in the related art, the present disclosure provides a picture processing method that filters out low-quality face pictures and falsely detected non-face pictures through highly targeted constraints designed around facial structure, thereby improving face recognition accuracy while reducing the registration failure rate. A picture processing method, a picture processing apparatus, an electronic device, and a storage medium according to embodiments of the present disclosure will be described in detail below with reference to Figs. 1 to 6.
Generally, for face recognition tasks requiring an extremely low error rate (typically on the order of one part per million or even lower), low-quality face pictures and face pictures whose spatial structure does not conform to the face topology are prone to errors in the preceding face detection, resulting in identity information being lost or difficult to acquire; such pictures can seriously interfere with the accuracy of face recognition. Therefore, in order to filter out such highly interfering pictures, the present disclosure proposes a picture processing method based on a structural consistency determination between the face contour and the salient region, which is described below in connection with Figs. 1 and 2. Fig. 1 is a flowchart illustrating a picture processing method according to an embodiment of the present disclosure, and Fig. 2 is a schematic diagram illustrating a face salient region and a face contour region of an example picture according to an embodiment of the present disclosure.
Referring to Fig. 1, in step S101, a plurality of face key points are detected from a target face picture. Specifically, a plurality of face key points may first be detected from the target face picture by using a face key point detection model. The face key points, which may also be referred to herein as facial key points, may include points used to locate the key regions of a face. For example, but not limited to, the face key point detection model may be implemented using a ResNet-18 network structure trained on the 300 Faces in-the-Wild Large Pose (300W-LP), Caltech Occluded Faces in the Wild (COFW), or Wider Facial Landmarks in-the-Wild (WFLW) datasets.
In step S102, a face contour region and a face salient region are determined based on the plurality of face key points.
Specifically, a plurality of areas corresponding to the face may be determined based on the area where the detected face key points are located.
According to an exemplary embodiment of the present disclosure, an area containing the plurality of face keypoints is determined as a face contour area. For example, a region containing the entire face may be determined as a face contour region. For example, referring to the example picture of fig. 2, the region where all the face key points are located (the larger region in fig. 2) is determined as the face contour region of the example picture.
Optionally, a region containing partial face key points is determined as the face salient region, wherein the partial face key points are the face key points, among the plurality of face key points, other than the face key points representing the face contour. In other words, the face salient region may include the region of the more salient features used for identifying identity, and a region formed by non-contour face key points may be determined as the face salient region. For example, a region corresponding to the standard five-point face template (e.g., left-eye center, right-eye center, nose tip, left mouth corner, right mouth corner) or to the facial features may be determined as the face salient region. Referring to the example picture of Fig. 2, the region where the face key points corresponding to the standard five-point face template are located (the smaller region in Fig. 2) is determined as the face salient region of the example picture. A minimal sketch of this region construction is given after this paragraph.
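As an illustration only (not part of the patent text), the envelope-frame computations used throughout this disclosure can be sketched in Python as follows. The 68-point landmark layout and the assumption that indices 0-16 trace the jawline are hypothetical; any keypoint scheme with a known contour subset would work the same way.

```python
import numpy as np

def min_rect_envelope(points: np.ndarray) -> tuple:
    """Minimum axis-aligned rectangular envelope (x1, y1, x2, y2) of keypoints."""
    xs, ys = points[:, 0], points[:, 1]
    return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())

# Assumption: a 68-point landmark scheme where indices 0-16 trace the jawline
# (face contour); the remaining points cover eyebrows, eyes, nose, and mouth.
CONTOUR_IDX = np.arange(0, 17)

def face_regions(landmarks: np.ndarray):
    """Split detected keypoints into the two regions used by the method."""
    contour_pts = landmarks                                  # all keypoints -> face contour region
    salient_pts = np.delete(landmarks, CONTOUR_IDX, axis=0)  # non-contour keypoints -> face salient region
    return contour_pts, salient_pts
```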
In step S103, in a case where the face salient region satisfies a first predetermined condition, the target face picture is determined to be an invalid face picture, where the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region. As an example, a target face picture determined to be an invalid face picture may be filtered out.
In particular, when a face in an input target face picture has an extreme head pose (for example, but not limited to, a side face with a large angle), face contour information is easier to capture, and the face key point detection model can find an accurate face contour region, but the position of the face salient region is completely blocked by the head itself, so that it is difficult to capture a reliable face salient region.
According to the present disclosure, it is determined whether the shape of the face salient region is a predetermined shape (e.g., a narrow, elongated shape), thereby indirectly determining whether the face in the target face picture is in an extreme head pose. Optionally, the step of determining that the shape of the face salient region is the predetermined shape may include determining that the shape of a salient region envelope frame of the face salient region is the predetermined shape, wherein the salient region envelope frame is the minimum rectangular envelope frame of the face key points in the face salient region. Referring back to Fig. 2, the face salient region may include a salient region envelope frame EF = (E_x, E_y, F_x, F_y), represented by its upper-left point E at coordinates (E_x, E_y) and its lower-right point F at coordinates (F_x, F_y). The face salient region may also be represented in any other suitable way, without being limited thereto. Optionally, in a case where the height-to-width ratio (HWR) of the salient region envelope frame is greater than or equal to a first threshold T_1, the shape of the salient region envelope frame is determined to be the predetermined shape. Specifically, when the HWR of the salient region envelope frame EF satisfies the following formula (1), the shape of the salient region envelope frame is determined to be the predetermined shape, that is, the face salient region is a narrow, elongated region in which identity information is difficult to recognize:

HWR = (F_y - E_y) / (F_x - E_x) >= T_1    (1)

wherein the first threshold T_1 may be an empirical value set in advance. For example, but not limited to, the first threshold may be determined during testing by traversing candidate values with a fixed step size (such as 0.1) and selecting the value with the highest accuracy and the lowest registration failure rate as T_1; for example, but not limited to, T_1 may be 2.8.
According to an exemplary embodiment of the present disclosure, in a case where the HWR of the salient region envelope frame is less than the first threshold T_1, the shape of the face salient region is not the predetermined shape; that is, it may be determined that the face in the target face picture is not in an extreme head pose (i.e., the face is in a normal pose, such as a frontal face), and the target face picture may be used for identity recognition without performing further processing related to the face salient region.
In addition to determining, based on the shape of the face salient region, whether the face is in an extreme head pose, the positional relationship between the face salient region and the face contour region is also judged. Optionally, the step of determining that the position of the face salient region does not correspond to the position of the face contour region includes determining that the position of the salient region envelope frame in the width direction does not correspond to the position of a key point envelope frame of the face contour region in the width direction, wherein the key point envelope frame is the minimum rectangular envelope frame of the face key points in the face contour region. Referring back to Fig. 2, similar to the face salient region, the face contour region may include a key point envelope frame CD = (C_x, C_y, D_x, D_y), represented by its upper-left point C at coordinates (C_x, C_y) and its lower-right point D at coordinates (D_x, D_y). The face contour region may also be represented in any other suitable way, without being limited thereto. Optionally, in a case where the ratio of the distance between the vertical central axis of the salient region envelope frame and the vertical central axis of the key point envelope frame to the width of the key point envelope frame is less than or equal to a second threshold T_2, it is determined that the position of the salient region envelope frame in the width direction does not correspond to the position of the key point envelope frame in the width direction. Specifically, the positional relationship MRR between the salient region envelope frame EF and the key point envelope frame CD is calculated by the following formula (2):

MRR = |(E_x + F_x) / 2 - (C_x + D_x) / 2| / (D_x - C_x)    (2)

When the MRR is less than or equal to the second threshold T_2, it is determined that the position of the face salient region does not correspond to the position of the face contour region; that is, the target face picture lacks structural consistency between the face contour and the salient region. Such target face pictures may greatly affect face recognition accuracy and are regarded as interference pictures according to embodiments of the present disclosure. Here, the second threshold T_2 may be an empirical value set in advance. For example, but not limited to, the second threshold may be determined during testing by traversing candidate values with a fixed step size (such as 0.05) and selecting the value with the highest accuracy and the lowest registration failure rate as T_2; for example, but not limited to, T_2 may be 0.15. A sketch of the full first-condition check follows.
Since the related art lacks a data set containing environmental noise for testing face recognition technology, the present disclosure also constructs a test data set IJBC-NS with environmental noise in order to evaluate the accuracy of a face recognition technology that includes the picture processing method described above. Specifically, the IJB-C dataset is a video-based face recognition dataset. The present disclosure produces a test set IJBC-S from the officially labeled face pictures of the IJB-C dataset that have face tag boxes, and randomly generates multiple picture pairs, comprising multiple positive sample pairs (i.e., the two samples are two pictures belonging to the same person) and multiple negative sample pairs (i.e., the two samples are pictures belonging to different persons), according to the given identity information. On this basis, the present disclosure randomly generates additional negative sample pairs, including multiple environment-face sample pairs and multiple environment-environment sample pairs, by adding to the test set a number of unmanned environment samples that are prone to face detection errors. The results of the test stage using the picture processing method according to embodiments of the present disclosure are evaluated using the True Accept Rate (TAR) and the False Accept Rate (FAR).
Specifically, the test results on the IJBC-NS dataset constructed by the present disclosure are evaluated using the True Accept Rate (TAR) and the False Accept Rate (FAR), where acceptance refers to the event, in face verification, that two pictures are judged to belong to the same person. For a given score threshold T, FAR and TAR are calculated by the following formulas (3) and (4):

FAR(T) = (number of negative sample pairs with comparison score >= T) / (total number of negative sample pairs)    (3)

TAR(T) = (number of positive sample pairs with comparison score >= T) / (total number of positive sample pairs)    (4)
In picture processing, a FAR that is as low as possible is generally required for safety considerations, for example, but not limited to, a FAR on the order of one part per million or even lower. The threshold T satisfying the FAR condition is determined from formula (3) according to the required FAR, and then the TAR at that threshold is determined using formula (4) as the final evaluation metric. The present disclosure aims to accurately find interference samples; therefore, in the present test, every sample pair containing a sample filtered out by the picture processing method according to the exemplary embodiments of the present disclosure is regarded as a registration-failure sample pair, and the proportion of registration-failure pairs to the total number of sample pairs is calculated as the registration failure rate. In addition, registration-failure sample pairs still participate in the calculation of TAR and FAR, but their comparison scores are uniformly set to 0. A minimal sketch of this evaluation protocol is given below.
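As an illustration only, the threshold selection and TAR/FAR evaluation described above might be sketched as follows; the function name and the exact tie-breaking at the threshold are assumptions.

```python
import numpy as np

def far_tar(pos_scores, neg_scores, target_far=1e-6):
    """Pick the lowest threshold T with FAR(T) <= target_far (formula (3)),
    then report TAR(T) (formula (4)). Registration-failure pairs are expected
    to be included with a comparison score of 0."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending

    k = int(target_far * len(neg))            # max negatives allowed at/above T
    t = neg[k] + 1e-12 if k < len(neg) else neg[-1]
    far = float((neg >= t).mean())
    tar = float((pos >= t).mean())
    return t, far, tar
```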
All sample comparison scores used in the tests of the present disclosure are calculated by cosine similarity. Assuming that A and B are the feature vectors of two pictures and n is the dimension of the feature vectors, the sample comparison score is calculated by the following formula (5):

score(A, B) = (Σ_{i=1..n} A_i B_i) / (sqrt(Σ_{i=1..n} A_i^2) × sqrt(Σ_{i=1..n} B_i^2))    (5)
where a higher sample comparison score indicates a greater probability that the sample pair is considered to belong to the same person. An equivalent sketch in code follows.
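For illustration, formula (5) reduces to the following one-liner (a hypothetical helper, not part of the patent text):

```python
import numpy as np

def compare_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors -- formula (5)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, two feature vectors pointing in the same direction score 1.0, and orthogonal vectors score 0.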
According to the test results, the picture processing method according to embodiments of the present disclosure greatly improves the accuracy of filtering interference samples. For example, but not limited to, when the head deflection pose is greater than or close to 90 degrees, it is difficult even for the naked eye to obtain reliable identity information for the persons in pictures determined to be invalid face pictures; it is therefore reasonable to filter out such pictures based on the structural consistency between the face contour region and the face salient region.
Therefore, the picture processing method according to embodiments of the present disclosure can filter strongly interfering face pictures in a targeted manner and effectively improve recognition accuracy while barely changing the normal face recognition flow and without adding excessive computation.
In addition, the related picture processing method for face recognition simply detects a face and extracts a face picture through a general face detection model. Because current face detection models are limited by the scale of their data sets, negative-sample diversity is insufficient, and false detections often occur (i.e., a high false acceptance ratio; for example, non-face pictures such as gorillas, dogs, and model toys are judged to be face pictures), which increases the difficulty of subsequent face recognition.
In view of the above, in addition to the picture processing method based on the structural consistency between the face contour region and the face salient region described above, the present disclosure may determine invalid face pictures in combination with at least one of reclassification of the face picture based on the face region and a picture processing method based on the spatial correspondence between the face region frame and the face contour region. Optionally, the picture processing method according to embodiments of the present disclosure further includes: detecting a face region frame from the target face picture; and determining that the target face picture is an invalid face picture in a case where the face region frame satisfies a second predetermined condition, wherein the second predetermined condition includes at least one of the following: the face region frame is determined to be a non-face by a face classification model; and the face region frame does not spatially correspond to the face contour region. For example, but not limited to, the face region frame may be detected by using a ResNet-50 network structure trained on the WIDER FACE dataset. The present disclosure does not limit the face region frame detection model, which may be implemented by other methods; since detecting a face region frame is a known technique in the related art, it will not be described in detail.
A picture processing method of determining that the face region box is a non-face through a face classification model according to an embodiment of the present disclosure will be described below with reference to fig. 3.
Referring to Fig. 3, optionally, the step of classifying the face picture as a non-face picture by using a face classification model (which may also be referred to as a face instance discrimination model) includes: acquiring a face confidence of the target face picture, which indicates the probability that the target face picture contains a face; and in a case where the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classifying the face picture as a non-face picture by using the face classification model. Specifically, the face confidence of the target face picture is first acquired, and whether the face classification model according to embodiments of the present disclosure needs to be used is determined based on the face confidence. In a case where the face confidence is smaller than the third threshold (0.3 in Fig. 3), the target face picture is unlikely to be a face picture, i.e., with high probability it does not contain a face, and it does not need to be classified by the face classification model; for example, in Fig. 3, the picture with a face confidence of 0.31 is directly determined to be a non-face picture. In a case where the face confidence is greater than the fourth threshold (0.6 in Fig. 3), the target face picture is very likely to be a face picture, i.e., with high probability it contains a face, and it likewise does not need to be classified by the face classification model; for example, in Fig. 3, the picture with a face confidence of 0.87 is directly determined to be a face picture. In a case where the face confidence is greater than or equal to the third threshold and less than or equal to the fourth threshold, the target face picture may or may not be a face picture, and to reduce the false detection rate it needs to be classified by the face classification model according to embodiments of the present disclosure; for example, in Fig. 3, the pictures with face confidences of 0.46 and 0.59 need to be classified again. By judging the confidence against two thresholds in this way, only the pictures that truly require face classification are screened in, avoiding the waste of computing resources. The gating logic is sketched below.
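For illustration only, the two-threshold gating could be sketched as follows (threshold values follow the example in Fig. 3; the function name is hypothetical):

```python
def gate_by_confidence(face_conf: float, t3: float = 0.3, t4: float = 0.6) -> str:
    """Decide whether the face classification model needs to run at all."""
    if face_conf < t3:
        return "non_face"      # low confidence: rejected directly
    if face_conf > t4:
        return "face"          # high confidence: accepted directly
    return "reclassify"        # ambiguous band [t3, t4]: run the classifier
```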
Because the acquired face region frame may contain only a portion of the face and often cannot perfectly contain the complete face, in embodiments according to the present disclosure the face region frame is enlarged, and the face picture to be reclassified is re-determined from the enlarged frame. Optionally, the step of determining that the face region frame is a non-face through the face classification model includes: enlarging the face region frame in the target face picture; cropping the part of the target face picture corresponding to the enlarged face region frame as the face picture; and classifying the face picture as a non-face picture by using the face classification model.
Here, the step of enlarging the face region frame in the target face picture may include: obtaining an enlarged detection frame by enlarging the face region frame by a predetermined multiple (for example, but not limited to, 1.1 times) with the center point of the face region frame as the reference. For example, but not limited to, the distance of each point of the face region frame from the center point may be enlarged by the predetermined multiple, or the area of the face region frame may be enlarged by the predetermined multiple, without being limited thereto.
Here, to normalize the input, the face picture cropped from the enlarged face region frame may be resized to a fixed size, such as 112×112×3, before being fed to the face classification model. The face classification model according to embodiments of the present disclosure performs binary classification on the input picture to determine whether the picture contains face features. According to exemplary embodiments of the present disclosure, the face classification model may employ any classification network, including various lightweight models (such as MobileNet, ResNet18, etc.). The enlarge-crop-resize preprocessing is sketched below.
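A minimal preprocessing sketch, assuming OpenCV for the resize and (x1, y1, x2, y2) box coordinates; the scale factor and crop clamping are illustrative choices:

```python
import cv2
import numpy as np

def enlarge_box(box, scale: float = 1.1):
    """Enlarge an (x1, y1, x2, y2) box about its center point by `scale`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) / 2 * scale, (y2 - y1) / 2 * scale
    return cx - hw, cy - hh, cx + hw, cy + hh

def crop_for_classifier(img: np.ndarray, box, size: int = 112) -> np.ndarray:
    """Crop the enlarged face region frame and resize it to size x size x 3."""
    x1, y1, x2, y2 = enlarge_box(box)
    h, w = img.shape[:2]
    x1, y1 = max(int(x1), 0), max(int(y1), 0)   # clamp to picture bounds
    x2, y2 = min(int(x2), w), min(int(y2), h)
    return cv2.resize(img[y1:y2, x1:x2], (size, size))
```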
Using the face classification model according to embodiments of the present disclosure requires a diversified data set and a balance between positive and negative samples. The process of training the face classification model according to embodiments of the present disclosure is described in detail below. Optionally, the face classification model is trained using positive samples comprising a plurality of face picture samples and negative samples comprising a plurality of non-face picture samples.
Since related face data sets do not take into account the problems mentioned above in the present disclosure (for example, the face tag box may not contain a complete face), the related face data sets need to be processed to obtain positive samples. According to an exemplary embodiment of the present disclosure, the positive samples are obtained by cropping, as face picture samples, portions of a picture in the dataset corresponding to at least one of the following: the enlarged face region frame in the picture, the face label frame of the picture, and a sliding window whose intersection-over-union with the face label frame satisfies a specific condition.
Specifically, positive samples may be obtained in several ways: detecting a face region frame from a picture in a face dataset without face tag boxes (such as the WebFace260M dataset), enlarging the detected face region frame using the enlargement method described above, and cropping the part of the picture corresponding to the enlarged face region frame as a face picture sample; and/or sliding a window over a picture in a face dataset with face tag boxes (such as, but not limited to, a face detection dataset like WIDER FACE), calculating the intersection-over-union of the sliding window and a face tag box (such as a face bbox) of the picture, cropping the parts of the picture corresponding to sliding windows that satisfy a predetermined condition as face picture samples, and also cropping the part of the picture corresponding to the face tag box itself as a face picture sample. According to an exemplary embodiment of the present disclosure, the step of cropping the part of the picture corresponding to a sliding window satisfying the predetermined condition as a face picture sample may include: in a case where the intersection-over-union of the sliding window and the face tag box is greater than or equal to a predetermined value (such as 0.15), cropping the part of the picture corresponding to that sliding window as a face picture sample.
According to an exemplary embodiment of the present disclosure, the negative samples are obtained by cropping, as non-face picture samples, portions of a picture in the dataset corresponding to at least one of the following: a sliding window whose intersection-over-union with the face label frame does not satisfy the specific condition, and a sliding window with a predetermined step size. Specifically, negative samples may be obtained in several ways: sliding a window with a fixed step size (for example, but not limited to, 20 pixels) over pictures in environment or animal-and-plant datasets and cropping the parts of the pictures corresponding to the sliding windows as non-face picture samples; and/or sliding a window over pictures in the face dataset with face tag boxes, calculating the intersection-over-union of the sliding window and the face tag box of the picture, and cropping the parts of the picture corresponding to sliding windows whose intersection-over-union does not satisfy the predetermined condition as non-face picture samples. Here, the process of obtaining negative samples from a face dataset with face tag boxes is similar to the process of obtaining positive samples and will not be described in detail. The sliding-window sampling is sketched below.
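For illustration only, the sliding-window sample generation might look like the following sketch; window size, stride, and the 0.15 IoU split are the example values from the text, and the helper names are hypothetical:

```python
def iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def sliding_windows(img_w: int, img_h: int, win: int, stride: int = 20):
    """Yield square windows scanned over the picture with a fixed step size."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)

def split_windows(windows, face_tag_boxes, pos_iou: float = 0.15):
    """Windows whose IoU with any face tag box meets the condition become
    positive crops; the remaining windows become negative crops."""
    pos, neg = [], []
    for w in windows:
        best = max((iou(w, fb) for fb in face_tag_boxes), default=0.0)
        (pos if best >= pos_iou else neg).append(w)
    return pos, neg
```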
Through the above tests, the picture processing method according to embodiments of the present disclosure can filter out many interference samples, for example, but not limited to, pictures containing environmental noise or severe occlusion from which identity information is difficult to obtain, low-quality face pictures, and the like. Clearly, performing picture processing according to the picture processing method of embodiments of the present disclosure can improve the accuracy of face recognition and make the recognition result more reliable. In this way, the picture processing method according to embodiments of the present disclosure adds a secondary classification of the detection result on top of face detection and obtains an excellent effect with few resources.
In addition, testing shows that the picture processing method according to the exemplary embodiments of the present disclosure obtains a much larger improvement when the face classification model is used than when it is not. Without the face classification model, a face picture obtained under conditions such as an incomplete face or an inaccurately detected face region frame is easily treated as an environmental sample, whereas with the face classification model the method acquires a certain capability to distinguish face instances from environmental samples and readily obtains a very large performance improvement on datasets containing noise. That is, when the picture processing method according to the exemplary embodiments of the present disclosure uses the face classification model in combination with the structural consistency of the face salient region and the face contour region, the two checks complement each other in different respects, yielding an accuracy improvement where 1+1>2, so that different kinds of interference samples are filtered out well.
A picture processing method of determining that a target face picture is an invalid face picture based on the spatial relationship between the face region frame and the face contour region according to an embodiment of the present disclosure will be described below with reference to fig. 4.
Referring to fig. 4, in the case where the face region frame does not spatially correspond to the face contour region, the target face picture is determined to be an invalid face picture. Optionally, it is determined that the face region frame does not spatially correspond to the face contour region based on the degree of spatial overlap between the face region frame and a key point envelope frame in the face contour region, where the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points. For example, similar to the key point envelope frame CD in fig. 2, the face contour region here may also include a key point envelope frame CD = (Cx, Cy, Dx, Dy) as the minimum rectangular envelope frame of the face key points constituting the face contour region, where the key point envelope frame CD is represented by the point C with coordinates (Cx, Cy) at the upper-left corner and the point D with coordinates (Dx, Dy) at the lower-right corner. The face contour region may also be represented in any other suitable manner, without being limited thereto. Similarly, the face region frame AB may be represented by the point A with coordinates (Ax, Ay) at the upper-left corner and the point B with coordinates (Bx, By) at the lower-right corner. The face region frame may be represented in any other suitable manner, without being limited thereto.
Optionally, the intersection-over-union (IoU) of the face region frame and the key point envelope frame is determined, and in the case where the IoU is less than or equal to a fifth threshold (T5), it is determined that the face region frame does not spatially correspond to the face contour region. Specifically, the IoU of the face region frame AB and the key point envelope frame CD is calculated by the following formula (6):

IoU(AB, CD) = Area(AB ∩ CD) / Area(AB ∪ CD)    (6)
When the IoU is less than or equal to the fifth threshold T5, it is determined that the face region frame and the face contour region do not spatially correspond, that is, the face region frame and the face contour region lack spatial correspondence, and it is difficult to identify the face based on the target face picture. The fifth threshold T5 may be an empirical value set in advance. For example, but not limited to, the fifth threshold may be determined during testing by traversing candidate values in a fixed step size (such as 0.01) and selecting the value with the highest accuracy and the lowest registration failure rate as the fifth threshold T5; for example, but not limited to, the fifth threshold T5 may be 0.47.
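As an illustration only, the following sketch implements the spatial-consistency check of formula (6): it builds the minimum rectangular key point envelope frame CD, computes its IoU with the face region frame AB using the `iou` helper from the sampling sketch above, and compares the result against the fifth threshold. The function names are hypothetical; the default threshold of 0.47 follows the example value in this disclosure.

```python
import numpy as np

def keypoint_envelope(keypoints):
    # Minimum rectangular envelope frame of the face key points:
    # upper-left corner (Cx, Cy), lower-right corner (Dx, Dy).
    pts = np.asarray(keypoints, dtype=float)
    return (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())

def spatially_corresponds(face_box, keypoints, t5=0.47):
    # The face region frame AB spatially corresponds to the face contour
    # region only when IoU(AB, CD) exceeds the fifth threshold T5.
    cd = keypoint_envelope(keypoints)
    return iou(face_box, cd) > t5
```

A target face picture for which `spatially_corresponds` returns False would be filtered out as an invalid face picture.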
Testing shows that the picture processing method according to the present disclosure successfully filters out unqualified face pictures, such as, but not limited to, pictures containing incomplete faces, interference-noise pictures arising from false detection of face region frames or face key points, pictures with severe face occlusion, and pictures of poor quality. Since such pictures are extremely easily mistaken for environmental noise samples, processing pictures in this way can greatly improve picture processing performance.
Although the various embodiments of determining invalid face pictures are described separately above, it should be understood that any of the embodiments may be applied in combination to determine invalid face pictures.
In addition, the present disclosure further tested the picture processing method of the related art. The test results show that although directly increasing the threshold of face frame detection (i.e., the fourth threshold) can improve face recognition accuracy to a certain extent, this simple screening strategy causes a large increase in the registration failure rate, making it difficult to successfully detect faces in practical applications, greatly degrading the user experience, and bringing only a limited accuracy improvement. In contrast, the picture processing method according to the embodiments of the present disclosure brings a very large accuracy improvement and accurately finds the interference samples that most affect face recognition.
Next, a procedure of a specific implementation operation of the picture processing method according to an exemplary embodiment of the present disclosure will be described with reference to fig. 5.
Fig. 5 is a schematic diagram illustrating an exemplary implementation of a picture processing method according to an embodiment of the present disclosure.
Referring to fig. 5, first, a plurality of face key points and a face region frame are detected from a target face picture, and the target face picture is determined to be an invalid face picture when any one of the three conditions in fig. 5 is satisfied (i.e., the structural consistency condition of the face salient region and the face contour region, the spatial consistency condition of the face region frame and the face contour region, and the non-face picture classification condition). That is, as long as the target face picture satisfies one condition, the target face picture is determined to be an invalid face picture and needs to be filtered out or discarded. According to an exemplary embodiment of the present disclosure, the structural consistency condition of the face salient region and the face contour region may include that the shape of the face salient region is a predetermined shape and the position of the face salient region does not correspond to the position of the face contour region; the spatial consistency condition of the face region frame and the face contour region includes that the face region frame does not spatially correspond to the face contour region; and the non-face picture classification condition includes that the face region frame is determined to be a non-face by the face classification model. Detailed descriptions of the respective conditions correspond to those in the above exemplary embodiments and will not be repeated. In the case that none of the above three conditions is satisfied, the target face picture is determined to be a valid face picture, and subsequent picture processing, such as, but not limited to, face alignment processing and face recognition processing, may be performed on the target face picture. A sketch of this filtering pipeline follows.
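Purely as an illustration, the following sketch strings the three conditions of fig. 5 together in order; each predicate stands in for the corresponding check described above, and all names are hypothetical.

```python
def is_valid_face_picture(picture, keypoints, face_box,
                          structurally_inconsistent,  # fig. 5, condition 1
                          spatially_corresponds,      # fig. 5, condition 2
                          classified_as_non_face):    # fig. 5, condition 3
    # The target face picture is invalid as soon as any one condition holds.
    if structurally_inconsistent(keypoints):
        return False
    if not spatially_corresponds(face_box, keypoints):
        return False
    if classified_as_non_face(picture, face_box):
        return False
    # Otherwise the picture may proceed to face alignment and recognition.
    return True
```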
According to the exemplary embodiments of the present disclosure, the target face picture is determined to be an invalid face picture as soon as it satisfies any one of the conditions. Through these multiple filtering conditions, strongly interfering pictures can be effectively filtered out, so that recognition accuracy is effectively improved while the face recognition processing flow is hardly changed and little additional computation is added.
Further, with the picture processing method according to the exemplary embodiment of the present disclosure, a target face picture that is not determined as an invalid face picture may be used for subsequent face recognition processing. For example, but not limited to, a target face picture may be input into a face alignment module and/or a face recognition module. By way of example, the face recognition module may be an S-ResNet-269 network structure trained over a WebFace42M dataset. The present disclosure is not limited to subsequent face recognition processing, and may be implemented using other methods.
Fig. 6 shows a block diagram of a picture processing device according to an embodiment of the present disclosure.
Referring to fig. 6, a picture processing device 600 according to an embodiment of the present disclosure may include a keypoint detection module 601, a facial region determination module 602, and a face picture processing module 603.
Specifically, the keypoint detection module 601 is configured to detect a plurality of facial keypoints from a target face picture. The facial region determination module 602 is configured to determine a facial contour region and a facial salient region based on the plurality of facial keypoints. The face picture processing module 603 is configured to: and determining that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region. As an example, the face picture processing module 603 may filter out a target face picture that is an invalid face picture in a case where the face significant region satisfies a first predetermined condition.
That is, the key point detection module 601 may perform operations corresponding to step S101 of the picture processing method described above with reference to fig. 1 to 4, the face region determination module 602 may perform operations corresponding to step S102 of the picture processing method described above with reference to fig. 1 to 4, and the face picture processing module 603 may perform operations corresponding to step S103 of the picture processing method described above with reference to fig. 1 to 4.
According to an exemplary embodiment of the present disclosure, the facial region determination module 602 is configured to: determine a region containing the plurality of face key points as the face contour region; and determine a region containing partial face key points as the face salient region, where the partial face key points are the face key points, among the plurality of face key points, other than those representing the face contour.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to: in a case where it is determined that the shape of a salient region envelope frame in the face salient region is a predetermined shape and the position of the salient region envelope frame in the width direction does not correspond to the position of a key point envelope frame in the face contour region, the face salient region is determined to satisfy the first predetermined condition, wherein the salient region envelope frame is a minimum rectangular envelope frame of a face key point in the face salient region, and the key point envelope frame is a minimum rectangular envelope frame of a face key point in the face contour region.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to: determine that the shape of the salient region envelope frame is the predetermined shape when the aspect ratio of the salient region envelope frame is greater than or equal to a first threshold; and determine that the position of the salient region envelope frame in the width direction does not correspond to the position of the key point envelope frame in the width direction when the ratio of the distance between the height-direction central axis of the salient region envelope frame and the height-direction central axis of the key point envelope frame to the width of the key point envelope frame is less than or equal to a second threshold.
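For illustration, a sketch of this structural-consistency check: `keypoint_envelope` is the helper from the earlier sketch, the orientation of the aspect ratio is an assumption, and the first and second thresholds are left as parameters since their example values are given elsewhere in the disclosure.

```python
def satisfies_first_condition(contour_pts, salient_pts, t1, t2):
    # First predetermined condition: the salient region envelope frame has
    # the predetermined (elongated) shape AND its width-direction position
    # does not correspond to that of the key point envelope frame.
    sx1, sy1, sx2, sy2 = keypoint_envelope(salient_pts)
    cx1, cy1, cx2, cy2 = keypoint_envelope(contour_pts)
    # Shape test: aspect ratio of the salient region envelope frame is
    # greater than or equal to the first threshold (ratio orientation is
    # an assumption here).
    aspect = (sy2 - sy1) / max(sx2 - sx1, 1e-6)
    shape_is_predetermined = aspect >= t1
    # Position test, as stated in this disclosure: the distance between the
    # height-direction central axes, divided by the width of the key point
    # envelope frame, is less than or equal to the second threshold.
    axis_distance = abs((sx1 + sx2) / 2.0 - (cx1 + cx2) / 2.0)
    position_mismatch = axis_distance / max(cx2 - cx1, 1e-6) <= t2
    return shape_is_predetermined and position_mismatch
```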
According to an exemplary embodiment of the present disclosure, the picture processing device 600 further includes a face region frame detection module (not shown). The face region frame detection module is configured to detect a face region frame from the target face picture. According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is further configured to: determine that the target face picture is an invalid face picture in a case where the face region frame satisfies a second predetermined condition, wherein the second predetermined condition includes at least one of the following: the face region frame is determined to be a non-face by a face classification model; and the face region frame does not spatially correspond to the face contour region.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to: enlarge the face region frame in the target face picture; crop the portion of the target face picture corresponding to the enlarged face region frame as a face picture; and classify the face picture as a non-face picture by using the face classification model.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to classify the face picture as a non-face picture by using the face classification model as follows: acquiring a face confidence of the target face picture, indicating the probability that the target face picture contains a face; and, when the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classifying the face picture as a non-face picture by using the face classification model.
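A minimal sketch of this confidence-gated secondary classification, assuming a generic binary classifier. The enlargement ratio and the `face_classifier` interface are hypothetical, and the third and fourth thresholds are left as parameters since their example values are given elsewhere in the disclosure.

```python
def passes_secondary_classification(picture, face_box, confidence,
                                    face_classifier, t3, t4,
                                    enlarge_ratio=1.5):
    # Only pictures whose face confidence falls within [t3, t4] are sent
    # to the face classification model; others keep the detector verdict.
    if not (t3 <= confidence <= t4):
        return True
    # Enlarge the face region frame about its center, clipped to the picture.
    x1, y1, x2, y2 = face_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * enlarge_ratio / 2.0
    half_h = (y2 - y1) * enlarge_ratio / 2.0
    h, w = picture.shape[:2]
    ex1, ey1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    ex2, ey2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    crop = picture[ey1:ey2, ex1:ex2]
    # The model returns True when the cropped face picture is classified
    # as a face, and False when it is classified as a non-face picture.
    return face_classifier(crop)
```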
According to an exemplary embodiment of the present disclosure, the face classification model is trained using positive samples comprising a plurality of face picture samples and negative samples comprising a plurality of non-face picture samples. The positive samples are obtained by cropping, as face picture samples, portions of a picture in a dataset corresponding to at least one of the following: the enlarged face region frame in the picture, the face tag frame of the picture, and a sliding window whose intersection-over-union with the face tag frame satisfies a specific condition. The negative samples are obtained by cropping, as non-face picture samples, portions of a picture in a dataset corresponding to at least one of the following: a sliding window whose intersection-over-union with the face tag frame does not satisfy the specific condition, and a sliding window moved with a predetermined step size.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to determine that the face region frame does not spatially correspond to the face contour region by: determining that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and the key point envelope frame in the face contour region, where the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
According to an exemplary embodiment of the present disclosure, the face picture processing module 603 is configured to: determining the intersection ratio of the face region frame and the key point envelope frame; and under the condition that the intersection ratio is smaller than or equal to a fifth threshold value, determining that the face region frame and the face contour region do not correspond in space.
A picture processing apparatus according to another exemplary embodiment of the present disclosure may include: the device comprises a key point detection module, a face area frame detection module, a face area determination module and a face picture processing module. A picture processing device according to another exemplary embodiment of the present disclosure may be configured to perform operations corresponding to the picture processing method described with reference to fig. 5, and a description thereof will not be repeated.
According to an embodiment of the present disclosure, there is also provided a picture processing apparatus including: a face region frame detection module configured to detect a face region frame from a target face picture; and a face picture processing module configured to: enlarge the face region frame in the target face picture; crop the portion of the target face picture corresponding to the enlarged face region frame as a face picture; acquire a face confidence of the target face picture, indicating the probability that the target face picture contains a face; and, when the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold, classify the face picture as a non-face picture by using a face classification model.
According to an embodiment of the present disclosure, there is also provided a picture processing apparatus including: a face region frame detection module configured to detect a face region frame from a target face picture; a key point detection module configured to detect a plurality of face key points from the target face picture; a face region determination module configured to determine a face contour region based on the plurality of face key points; and a face picture processing module configured to determine that the face region frame and the face contour region do not spatially correspond based on the degree of spatial overlap between the face region frame and a key point envelope frame in the face contour region, where the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
The specific manner in which the respective modules of the picture processing apparatus in the above embodiments perform operations has been described in detail in the embodiments of the related picture processing method, and will not be described in detail herein.
Further, it should be understood that the various modules in the picture processing device according to exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. The individual modules may be implemented, for example, using a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), depending on the processing performed by the individual modules as defined.
There is also provided, in accordance with an embodiment of the present disclosure, an electronic device including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a picture processing method as described above.
According to an exemplary embodiment of the present disclosure, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or any other device capable of executing the above set of instructions. Here, the electronic device need not be a single electronic device but may be any device or aggregate of circuits capable of executing the above instructions (or instruction set) individually or in combination. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, wherein the memory may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integrated with the processor, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. In addition, the memory may include a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, for example, through an I/O port, a network connection, etc., such that the processor is able to read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the picture processing method as described above.
Examples of computer-readable storage media according to exemplary embodiments of the present disclosure include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid-state drives (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards, or eXtreme Digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage devices, hard disks, solid-state disks, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the program. The computer program in the computer-readable storage medium described above can run in an environment deployed in an electronic device, such as a client, host, proxy device, or server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the claims.
Claims (15)
1. A picture processing method, comprising:
detecting a plurality of facial key points from a target face picture;
determining a face contour region and a face salient region based on the plurality of face keypoints;
and determining that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region.
2. The picture processing method according to claim 1, wherein the step of determining a face contour region and a face salient region based on the plurality of face key points comprises:
determining a region containing the plurality of facial keypoints as a facial contour region;
determining a region containing partial face key points as a face salient region, wherein the partial face key points are face key points, among the plurality of face key points, other than the face key points representing the face contour.
3. The picture processing method according to claim 1, wherein the face salient region is determined to satisfy the first predetermined condition by:
in a case where it is determined that the shape of a salient region envelope frame in the face salient region is a predetermined shape and the position of the salient region envelope frame in the width direction does not correspond to the position of a key point envelope frame in the face contour region, the face salient region is determined to satisfy the first predetermined condition, wherein the salient region envelope frame is a minimum rectangular envelope frame of a face key point in the face salient region, and the key point envelope frame is a minimum rectangular envelope frame of a face key point in the face contour region.
4. The picture processing method according to claim 3, wherein:
the shape of the salient region envelope frame is determined to be the predetermined shape when the aspect ratio of the salient region envelope frame is greater than or equal to a first threshold; and
the position of the salient region envelope frame in the width direction is determined not to correspond to the position of the key point envelope frame in the width direction when the ratio of the distance between the central axis of the salient region envelope frame in the height direction and the central axis of the key point envelope frame in the height direction to the width of the key point envelope frame is less than or equal to a second threshold.
5. The picture processing method according to claim 1, further comprising:
detecting a face region frame from the target face picture;
and determining that the target face picture is an invalid face picture in a case where the face region frame satisfies a second predetermined condition, wherein the second predetermined condition comprises at least one of the following: the face region frame is determined to be a non-face by a face classification model; and the face region frame does not spatially correspond to the face contour region.
6. The picture processing method according to claim 5, wherein the step of determining that the face region frame is a non-face by a face classification model comprises:
enlarging the face region frame in the target face picture;
cropping a portion of the target face picture corresponding to the enlarged face region frame as a face picture;
acquiring a face confidence of the target face picture, indicating a probability that the target face picture contains a face;
and classifying, by using the face classification model, the face picture as a non-face picture when the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold.
7. The picture processing method according to claim 5, wherein it is determined that the face region frame does not spatially correspond to the face contour region by:
and determining that the face region frame and the face contour region do not spatially correspond based on the spatial overlapping degree of the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
8. The picture processing method according to claim 7, wherein the step of determining that the face region frame and the face contour region do not spatially correspond based on a degree of spatial overlap of the face region frame and a keypoint envelope frame in the face contour region comprises:
determining the intersection ratio of the face region frame and the key point envelope frame;
and under the condition that the intersection ratio is smaller than or equal to a fifth threshold value, determining that the face region frame and the face contour region do not correspond in space.
9. A picture processing method, comprising:
detecting a face region frame from a target face picture;
enlarging the face region frame in the target face picture;
cropping a portion of the target face picture corresponding to the enlarged face region frame as a face picture;
acquiring a face confidence of the target face picture, indicating a probability that the target face picture contains a face;
and classifying, by using a face classification model, the face picture as a non-face picture when the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold.
10. A picture processing method, comprising:
detecting a face region frame and a plurality of face key points from a target face picture;
determining a face contour region based on the plurality of face keypoints;
and determining that the face region frame and the face contour region do not spatially correspond based on the spatial overlapping degree of the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
11. A picture processing apparatus comprising:
a key point detection module configured to detect a plurality of facial key points from a target face picture;
a face region determination module configured to determine a face contour region and a face salient region based on the plurality of face keypoints;
the face picture processing module is configured to: and determining that the target face picture is an invalid face picture in a case where the face salient region satisfies a first predetermined condition, wherein the first predetermined condition includes that the shape of the face salient region is a predetermined shape and that the position of the face salient region does not correspond to the position of the face contour region.
12. A picture processing apparatus comprising:
a face region frame detection module configured to detect a face region frame from a target face picture;
the face picture processing module is configured to:
enlarging the face region frame in the target face picture;
cropping a portion of the target face picture corresponding to the enlarged face region frame as a face picture;
acquiring a face confidence of the target face picture, indicating a probability that the target face picture contains a face;
and classifying, by using a face classification model, the face picture as a non-face picture when the face confidence is greater than or equal to a third threshold and less than or equal to a fourth threshold.
13. A picture processing apparatus comprising:
a face region frame detection module configured to detect a face region frame from a target face picture;
a keypoint detection module configured to detect a plurality of facial keypoints from the target face picture;
a face region determination module configured to determine a face contour region based on the plurality of face keypoints;
the face picture processing module is configured to: and determining that the face region frame and the face contour region do not spatially correspond based on the spatial overlapping degree of the face region frame and the key point envelope frame in the face contour region, wherein the key point envelope frame is the minimum rectangular envelope frame of the plurality of face key points.
14. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to perform the picture processing method of any one of claims 1 to 10.
15. A computer readable storage medium, wherein instructions in the computer readable storage medium, when executed by at least one processor, cause the at least one processor to perform the picture processing method of any one of claims 1 to 10.
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310795975.7A CN116884062A (en) | 2023-06-30 | 2023-06-30 | Image processing method, image processing equipment, electronic equipment and storage medium |
| KR1020240058033A KR20250003295A (en) | 2023-06-30 | 2024-04-30 | Method and apparatus for image processing |
| US18/756,803 US20250005961A1 (en) | 2023-06-30 | 2024-06-27 | Method and apparatus with image processing |
| JP2024105572A JP2025010050A (en) | 2023-06-30 | 2024-06-28 | Image processing method and apparatus |
| EP24185602.0A EP4485397A1 (en) | 2023-06-30 | 2024-07-01 | Method and apparatus with image processing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310795975.7A CN116884062A (en) | 2023-06-30 | 2023-06-30 | Image processing method, image processing equipment, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116884062A true CN116884062A (en) | 2023-10-13 |
Family
ID=88263555
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310795975.7A Pending CN116884062A (en) | 2023-06-30 | 2023-06-30 | Image processing method, image processing equipment, electronic equipment and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20250003295A (en) |
| CN (1) | CN116884062A (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250003295A (en) | 2025-01-07 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |