
WO2018108129A1 - Method and apparatus for identifying object category, and electronic device - Google Patents

Method and apparatus for identifying object category, and electronic device

Info

Publication number
WO2018108129A1
WO2018108129A1 (PCT/CN2017/116218, CN2017116218W)
Authority
WO
WIPO (PCT)
Prior art keywords
category
information
neural network
convolutional neural
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/116218
Other languages
English (en)
French (fr)
Inventor
李步宇
闫俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Publication of WO2018108129A1 publication Critical patent/WO2018108129A1/zh
Priority to US16/423,061 priority Critical patent/US10977523B2/en
Anticipated expiration
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/457 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present application relates to the field of computer vision technology, and in particular, to a method and apparatus for identifying an object category, and an electronic device.
  • Object detection is the detection of the position of an object in a given image and the identification of the type of the object. Specifically, it can be divided into two processes, object positioning and object recognition, where object positioning is to find a circumscribed rectangular frame (i.e., an object candidate frame) containing an object in the image.
  • Object recognition is to determine, after the object candidate frame is obtained, which category the object in the candidate frame belongs to, or whether the object in the candidate frame belongs to the background.
  • the embodiment of the present application provides a technical solution for identifying an object category.
  • A method for identifying an object category includes: determining M key point neighborhood regions from respective object candidate frames according to position information of M key points in a plurality of object candidate frames of an image to be detected, where M is less than or equal to the total number of key points of N preset object categories, and M and N are positive integers; and determining, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using a convolutional neural network model for identifying an object category in the image.
  • In any method for identifying an object category provided by the embodiments of the present application, the convolutional neural network model includes K convolution layers, a pooling layer, and an output layer, where K is a positive integer greater than or equal to 2; determining, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using the convolutional neural network model for identifying an object category in the image includes:
  • outputting, from the (K-1)th convolution layer, feature maps in one-to-one correspondence with the M key points; outputting, from the Kth convolution layer, response maps of the key points obtained by transforming each of the feature maps; mapping the M key point neighborhood regions onto the response maps of the corresponding key points to obtain M mapping areas; obtaining, from the pooling layer, pooling results corresponding to the M mapping areas; and obtaining, from the output layer and based on the M pooling results, a first confidence level of each preset object category, and determining the category information of the at least one object according to the first confidence levels.
  • In any method for identifying an object category provided by the embodiments of the present application, obtaining the first confidence level of each preset object category from the output layer based on the M pooling results comprises: averaging the pooling results corresponding to the key points belonging to the same preset object category to obtain a score for each preset object category; and obtaining, from the output layer, the first confidence level of each preset object category according to those scores.
  • In any method for identifying an object category provided by the embodiments of the present application, before the convolutional neural network model for identifying an object category in the image is used according to the M key point neighborhood regions to determine category information of at least one object in the image to be detected, the method further comprises: training the convolutional neural network model;
  • the training of the convolutional neural network model includes: acquiring a sample image containing position information of key points, object frame labeling information, and category labeling information; performing convolution processing on the sample image to obtain a convolution result; determining, according to the object frame labeling information and the category labeling information, whether object frame position information and/or category information in the convolution result satisfy a training completion condition; if satisfied, completing the training of the convolutional neural network model; and if not satisfied, adjusting the parameters of the convolutional neural network model according to the object frame position information and/or category information in the convolution result, and iteratively training the convolutional neural network model with the adjusted parameters until the object frame position information and/or category information after the iterative training satisfy the training completion condition.
  • In any method for identifying an object category provided by the embodiments of the present application, before the M key point neighborhood regions are determined from the corresponding object candidate frames according to the position information of the M key points in the plurality of object candidate frames of the image to be detected, the method further includes: acquiring the positions of the plurality of object candidate frames; and acquiring the position information of the M key points in the plurality of object candidate frames according to the positions of the plurality of object candidate frames.
  • In any method for identifying an object category provided by the embodiments of the present application, acquiring the positions of the plurality of object candidate frames comprises: obtaining the position information of the plurality of object candidate frames by using a first convolutional neural network, by using a selective search method, or by using an Edge Boxes method.
  • In any method for identifying an object category provided by the embodiments of the present application, acquiring the position information of the M key points in the plurality of object candidate frames according to the positions of the plurality of object candidate frames further includes: acquiring, according to the positions of the plurality of object candidate frames, a second confidence level corresponding to each key point, where the second confidence level is data indicating the likelihood that the key point exists in the candidate frame.
  • In any method for identifying an object category provided by the embodiments of the present application, before the M key point neighborhood regions are determined from the corresponding object candidate frames according to the position information of the M key points in the plurality of object candidate frames of the image to be detected, the method further includes: correcting the positions of the plurality of object candidate frames by using a second convolutional neural network, and obtaining position information of the corrected plurality of object candidate frames.
  • In any method for identifying an object category provided by the embodiments of the present application, when the object category is a non-background class, determining the M key point neighborhood regions from the corresponding object candidate frames according to the position information of the M key points includes: determining, according to the size information of the corresponding object candidate frame and a preset size multiple, M rectangular regions centered on the positions of the M key points as the M key point neighborhood regions.
  • In any method for identifying an object category provided by the embodiments of the present application, when the object category is a background class, determining the M key point neighborhood regions from the corresponding object candidate frames according to the position information of the M key points includes: determining the areas of the object candidate frames corresponding to the position information of the M key points as the M key point neighborhood regions.
  • In any method for identifying an object category provided by the embodiments of the present application, the first convolutional neural network is a Region Proposal Network (RPN), and/or the second convolutional neural network is a faster region-based convolutional neural network (Faster R-CNN).
  • An apparatus for identifying an object category comprises: a key point neighborhood region determining module, configured to determine M key point neighborhood regions from the corresponding object candidate frames according to position information of M key points in a plurality of object candidate frames of an image to be detected, where M is less than or equal to the total number of key points of N preset object categories, and M and N are positive integers; and a category information determining module, configured to determine, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using a convolutional neural network model for identifying an object category in the image.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the convolutional neural network model includes K convolution layers, a pooling layer, and an output layer, where K is a positive integer greater than or equal to 2; the category information determining module includes: a feature map output unit, configured to output, from the (K-1)th convolution layer, feature maps in one-to-one correspondence with the M key points; a response map output unit, configured to output, from the Kth convolution layer, response maps of the key points obtained after each of the feature maps is transformed; a mapping area acquiring unit, configured to map the M key point neighborhood regions onto the response maps of the corresponding key points to obtain mapping areas; a pooling result output unit, configured to output, from the pooling layer, pooling results corresponding to the M mapping areas; and a confidence acquisition and information determining unit, configured to obtain, from the output layer and based on the M pooling results, a first confidence level of each preset object category, and to determine the category information of the at least one object according to the first confidence levels.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the confidence acquisition and information determining unit is configured to average the pooling results of the key points belonging to the same preset object category to obtain a score for each preset object category, and to obtain, from the output layer, the first confidence level of each preset object category according to those scores.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the apparatus further includes: a model training module, configured to train the convolutional neural network model;
  • the model training module is configured to: acquire a sample image containing position information of key points, object frame labeling information, and category labeling information; perform convolution processing on the sample image to obtain a convolution result; determine, according to the object frame labeling information and the category labeling information, whether the object frame position information and/or category information in the convolution result satisfy a training completion condition; if satisfied, complete the training of the convolutional neural network model; and if not satisfied, adjust the parameters of the convolutional neural network model according to the object frame position information and/or category information in the convolution result, and iteratively train the convolutional neural network model with the adjusted parameters until the object frame position information and/or category information after the iterative training satisfy the training completion condition.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the apparatus further includes: a candidate frame position acquiring module, configured to acquire the positions of the plurality of object candidate frames; and a key point position information acquiring module, configured to acquire the position information of the M key points in the plurality of object candidate frames according to the positions of the plurality of object candidate frames.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the candidate frame position acquiring module is configured to obtain the position information of the plurality of object candidate frames by using a first convolutional neural network, by selective search, or by the Edge Boxes method.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the key point position information acquiring module is further configured to acquire, according to the positions of the plurality of object candidate frames, a second confidence level corresponding to each key point, the second confidence level being data representing the likelihood that the key point exists in the candidate frame; and the category information determining module further includes: a pooling result marking unit, configured to record as zero the pooling result of the response map corresponding to any key point whose second confidence level is lower than a set confidence threshold.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the apparatus further includes: a candidate frame position correcting module, configured to correct the positions of the plurality of object candidate frames by using the second convolutional neural network, and to obtain position information of the corrected plurality of object candidate frames.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, when the object category is a non-background class, the key point neighborhood region determining module is configured to determine, according to the size information of the corresponding object candidate frame and a preset size multiple, M rectangular regions centered on the positions of the M key points as the M key point neighborhood regions.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, when the object category is a background class, the key point neighborhood region determining module is configured to determine the areas of the object candidate frames corresponding to the position information of the M key points as the M key point neighborhood regions.
  • In any apparatus for identifying an object category provided by the embodiments of the present application, the first convolutional neural network is a Region Proposal Network (RPN), and/or the second convolutional neural network is a faster region-based convolutional neural network (Faster R-CNN).
  • An electronic device includes: a processor, a memory, a communication component, and a communication bus, wherein the processor, the memory, and the communication component communicate with each other through the communication bus; and the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method for identifying an object category provided above.
  • A computer readable storage medium stores a computer program, the computer program comprising computer instructions that, when run in a processor of a device, cause the processor to execute the steps of a method for identifying an object category in any of the embodiments. For example, the computer readable storage medium stores: executable instructions for determining M key point neighborhood regions from corresponding object candidate frames according to position information of M key points in a plurality of object candidate frames of an image to be detected; and executable instructions for determining, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using a convolutional neural network model for identifying an object category in the image.
  • In the embodiments of the present application, the key point neighborhood regions are determined from the corresponding object candidate frames according to the position information of the key points in the object candidate frames, and the key point neighborhood regions are then used, together with the convolutional neural network model for identifying an object category in the image, to determine the category information of objects in the image to be detected.
  • FIG. 1 is a flow chart showing a method for identifying an object category according to an embodiment of the present application.
  • FIG. 2 is a flow chart showing another method for identifying an object category according to an embodiment of the present application.
  • FIG. 3 is a logic block diagram showing an apparatus for identifying an object category, in accordance with an embodiment of the present application.
  • FIG. 4 is a logic block diagram showing another apparatus for identifying an object category, in accordance with an embodiment of the present application.
  • FIG. 5 is a schematic structural view showing an electronic device according to an embodiment of the present application.
  • FIG. 1 is a flow chart showing a method for identifying an object category according to an embodiment of the present application. The method is performed by a device that includes means for identifying an object class.
  • In step S110, M key point neighborhood regions are determined from corresponding object candidate frames according to position information of M key points in a plurality of object candidate frames of the image to be detected.
  • The object candidate frame may be a circumscribed rectangular frame in the image that may contain an object.
  • the image to be detected may be a picture, a photo, a certain frame image in the video, or the like, but is not limited thereto.
  • For example, if the photo is the image to be detected, the circumscribed rectangle containing a dog in the photo may be an object candidate frame.
  • If the photo also includes other objects such as a chair, object candidate frames can similarly be used to locate them.
  • the photo as the image to be detected may contain a plurality of object candidate frames.
  • M is less than or equal to the total number of key points of the N preset object categories; for example, M is the total number of key points of the N preset object categories. M and N are positive integers, and each object candidate frame contains at least one key point of at least one object category. For example, 0 < N ≤ 200 and 200 ≤ M ≤ 4000.
  • In a common object detection data set such as PASCAL VOC, the 20 preset object categories such as person, dog, train, and chair can be collectively referred to as non-background classes.
  • If the object candidate frame does not contain an object of any of these categories, it is defined as the background class. The object recognition problem thus translates into a 21-category classification problem.
  • For each category, a number of key points is defined. For example, when the category is person, 11 key points such as head, left hand, right hand, left shoulder, and right shoulder can be defined; when the category is dog, 10 key points such as head, limbs, and tail can be defined.
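  • As an illustration of this bookkeeping, the sketch below (Python) collects hypothetical per-category key point definitions and derives the total number of key points M; only the counts for the person and dog categories follow the text, and the remaining categories and point names are assumptions.

    # Hypothetical key point definitions; only "person" (11 points) and "dog"
    # (10 points) follow the counts given in the text, the rest are illustrative.
    KEYPOINTS = {
        "person": ["head", "left_hand", "right_hand", "left_shoulder", "right_shoulder",
                   "left_elbow", "right_elbow", "left_knee", "right_knee",
                   "left_foot", "right_foot"],
        "dog": ["head", "left_front_leg", "right_front_leg", "left_hind_leg",
                "right_hind_leg", "tail", "neck", "back", "chest", "belly"],
        "background": ["box_center"],
        # ... the remaining preset categories (train, chair, ...) are defined the same way
    }

    # M is the total number of key points over all preset object categories.
    M = sum(len(points) for points in KEYPOINTS.values())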
  • the position information of the key point may be the coordinate data of the key point, and the position information of the key point in the object candidate box can be obtained by the key point prediction technology.
  • key point prediction is performed using methods such as convolutional neural networks.
  • In the present application, a set of key points can be predicted by a convolutional neural network for each preset object category, so that each set of key points corresponds to one object category, and each key point in each set corresponds to its position information in the image to be detected.
  • the key point neighborhood area in this application refers to an area containing key points and a certain range around the key points.
  • Step S110 may include: determining, according to the size information of the corresponding object candidate frame and a preset size multiple, M rectangular regions centered on the positions of the M key points as the M key point neighborhood regions.
  • Alternatively, when the object category is a background class, step S110 may include: determining the areas of the object candidate frames corresponding to the position information of the M key points as the M key point neighborhood regions.
  • A value between one quarter and one tenth of the width and height of the object candidate frame may be selected as the width and height of the key point neighborhood region. Assuming the width of the object candidate frame is w and its height is h, the region centered on the key point with width w/5 and height h/5 is taken as the key point neighborhood of that key point.
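  • A minimal sketch of this rule follows, assuming the key point is given in image coordinates and using the w/5, h/5 choice from the text; clipping to the image bounds is an added assumption.

    def keypoint_neighborhood(kx, ky, box_w, box_h, img_w, img_h, frac=5.0):
        """Rectangle of size (box_w/frac, box_h/frac) centered on key point (kx, ky)."""
        nw, nh = box_w / frac, box_h / frac
        x1 = max(0.0, kx - nw / 2.0)
        y1 = max(0.0, ky - nh / 2.0)
        x2 = min(float(img_w), kx + nw / 2.0)
        y2 = min(float(img_h), ky + nh / 2.0)
        return x1, y1, x2, y2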
  • For the background category, the processing differs from the above: the area of the entire object candidate frame is taken as the key point neighborhood region.
  • In step S120, based on the M key point neighborhood regions, a convolutional neural network model for identifying an object category in the image is used to determine category information of at least one object in the image to be detected.
  • For the image to be detected, on the basis of the feature maps obtained from the multiple convolution layers of the convolutional neural network model, one more convolution is added so that 200 feature maps are transformed into response maps of the 200 key points.
  • The 200 key point neighborhood regions are mapped onto the response maps of the corresponding key points to obtain 200 mapping areas; the 200 mapping areas are then pooled to obtain pooling results, and the pooling results of the same category are combined by equal-weight voting (i.e., averaged) to obtain the score of that category.
  • A softmax is applied to the 21 scores (including the background category) to obtain the confidence that the object candidate frame belongs to each category. Generally, the category with the highest confidence is selected as the category information of the object in the object candidate frame.
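  • The scoring head described in the last two paragraphs can be sketched as follows (NumPy); the feature-map stride of 16 and the use of average pooling inside each mapped region are assumptions, while the per-category averaging of pooling results and the softmax over the 21 scores follow the text.

    import numpy as np

    def classify_candidate_box(response_maps, neighborhoods, kp_to_category,
                               num_categories=21, stride=16):
        """response_maps: (M, Hf, Wf) array, one key point response map per key point.
        neighborhoods: list of M (x1, y1, x2, y2) rectangles in image coordinates.
        kp_to_category: length-M array mapping each key point to its category index."""
        kp_to_category = np.asarray(kp_to_category)
        pooled = np.zeros(len(neighborhoods))
        for m, (x1, y1, x2, y2) in enumerate(neighborhoods):
            # Map the key point neighborhood onto its response map (stride is assumed).
            fx1, fy1 = int(x1 / stride), int(y1 / stride)
            fx2 = max(fx1 + 1, int(np.ceil(x2 / stride)))
            fy2 = max(fy1 + 1, int(np.ceil(y2 / stride)))
            pooled[m] = response_maps[m, fy1:fy2, fx1:fx2].mean()  # pooling result

        # Equal-weight voting: average pooled results of key points of the same category.
        scores = np.array([pooled[kp_to_category == c].mean()
                           for c in range(num_categories)])

        # Softmax over the 21 scores gives the confidence of each category.
        e = np.exp(scores - scores.max())
        confidences = e / e.sum()
        return int(confidences.argmax()), confidences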
  • Once the category information of the object in each object candidate frame is obtained, the category information of the object in the image to be detected is determined. For example, if a certain image to be detected contains five object candidate frames, of which four have the background category as their most confident category and one has the person category as its most confident category, then the category information of the object in the image identified by the method is person.
  • In this embodiment, the key point neighborhood regions are determined from the corresponding object candidate frames according to the position information of the key points in the object candidate frames, and the category information of the object in the image to be detected is then determined by using the key point neighborhood regions and the convolutional neural network model for identifying the object category in the image.
  • Step S120 may include: outputting, from the (K-1)th convolution layer, feature maps in one-to-one correspondence with the M key points; outputting, from the Kth convolution layer, response maps of the key points obtained after each feature map is transformed; mapping the M key point neighborhood regions onto the response maps of the corresponding key points to obtain mapping regions; outputting, from the pooling layer, the pooling results corresponding to the M mapping regions; obtaining, from the output layer and based on the M pooling results, the first confidence level of each preset object category; and determining the category information of the at least one object according to the first confidence levels.
  • The processing of obtaining the first confidence level of each preset object category from the output layer based on the M pooling results includes: averaging the pooling results corresponding to the key points belonging to the same preset object category to obtain a score for each preset object category; and obtaining, from the output layer, the first confidence level of each preset object category according to those scores.
  • This embodiment has the following technical effects: the feature maps generated by the convolution layers of the convolutional neural network model for pooling are response maps of the key points of each category of object, the number of feature maps of different categories depends on the number of key points defined for the category, and the neighborhood of each key point of each category is pooled on the feature map. The pooled area is thus determined by the positions of the key points and differs between categories, so the auxiliary identification information (i.e., the positions of the key points) is more fully and effectively utilized, which further improves the accuracy of object recognition.
  • the method for identifying an object category further comprises: training the convolutional neural network model.
  • The processing of training the convolutional neural network model may include: acquiring a sample image containing position information of key points, object frame labeling information, and category labeling information; performing convolution processing on the sample image to obtain a convolution result; determining, according to the object frame labeling information and the category labeling information, whether the object frame position information and/or category information in the convolution result satisfy the training completion condition; if satisfied, completing the training of the convolutional neural network model; and if not satisfied, adjusting the parameters of the convolutional neural network model according to the object frame position information and/or category information in the convolution result, and iteratively training the convolutional neural network model with the adjusted parameters until the object frame position information and/or category information after the iterative training satisfy the training completion condition.
  • In an optional specific example, a sample image containing position information of key points, object frame labeling information, and category labeling information is acquired; the neighborhood region of each key point is determined according to the position information of the key points in the sample image; feature maps in one-to-one correspondence with the key points are output from the (K-1)th convolution layer; response maps of the key points obtained by transforming each feature map are output from the Kth convolution layer; the neighborhood regions of the key points are respectively mapped onto the response maps of the corresponding key points to obtain mapping areas; pooling results corresponding to the mapping areas are output from the pooling layer; a first confidence level of each preset object category is obtained from the output layer based on the pooling results, and the category information of at least one object in the sample image is determined according to the first confidence levels; and it is then determined, according to the object frame labeling information and the category labeling information of the sample image, whether the recognition result information output by the convolutional neural network model satisfies the training completion condition.
  • The training data may use pictures from a public data set such as PASCAL VOC, which contains 20 categories of objects.
  • When the model is trained, at least one picture is input for each iteration, together with the possible key point positions of each category predicted by a method such as a convolutional neural network; the labeled standard position information of the real area of the object (that is, the above-mentioned object frame labeling information) and the labeled standard category information indicating the real category of the object (that is, the above-mentioned category labeling information) are also input into the convolutional neural network model as supervision information.
  • Training iterates continuously by the stochastic gradient descent method, so that the network parameters are continuously updated until convergence.
  • The role of the loss layer includes taking the object frame position information output by the network and the standard position information as inputs and computing the Euclidean distance or the like as the loss; the classification layer (such as the fully connected layer in the output layer) takes the classification output of the network (that is, the scores of each category) and the standard category information as inputs, and softmax loss is used as the loss function.
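  • A hedged sketch of the loss computation and stochastic gradient descent update described above, written with PyTorch for illustration; the model interface, the data loader, and the learning rate are assumptions, while the Euclidean-distance style box loss and the softmax classification loss follow the text.

    import torch
    import torch.nn as nn

    def train_epoch(model, loader, lr=0.001):
        """One pass of the SGD training described above (sketch). `model` is assumed
        to return predicted box coordinates and per-category scores; `loader` is
        assumed to yield images with their labeled standard box positions and
        standard category information (the supervision described above)."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        box_loss_fn = nn.MSELoss()           # Euclidean-distance style box regression loss
        cls_loss_fn = nn.CrossEntropyLoss()  # softmax loss over the category scores
        for images, gt_boxes, gt_labels in loader:
            pred_boxes, cls_scores = model(images)
            loss = box_loss_fn(pred_boxes, gt_boxes) + cls_loss_fn(cls_scores, gt_labels)
            optimizer.zero_grad()
            loss.backward()   # gradients for the stochastic gradient descent update
            optimizer.step()  # network parameters are updated; iterate until convergence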
  • The trained convolutional neural network model can perform feature extraction and object classification within the object frame based on the key point neighborhood regions, and thus has the function of determining the category information of the object in the image to be detected.
  • FIG. 2 is a flow chart showing another method for identifying an object class in accordance with an embodiment of the present application. The method is performed by a device that includes means for identifying an object class.
  • In step S210, positions of a plurality of object candidate frames are acquired.
  • Acquiring the position of the object candidate frame is object positioning, that is, finding the circumscribed rectangle of the object in the image.
  • the position of the object candidate frame is usually represented by the coordinate data of the two end points of the diagonal of the circumscribed rectangle.
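  • For reference, a candidate frame in this representation is just the coordinates of two diagonal corners; the small helper below (an illustrative assumption) converts it to the width and height used later.

    def box_size(box):
        """box = (x1, y1, x2, y2): top-left and bottom-right corners of the rectangle."""
        x1, y1, x2, y2 = box
        return x2 - x1, y2 - y1  # (width, height)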
  • Step S210 may include: obtaining the position information of the plurality of object candidate frames by using a first convolutional neural network, by using a selective search method, or by using an Edge Boxes method.
  • the first convolutional neural network may be a Region Proposal Network (RPN).
  • the position of the object candidate frame obtained by the above-mentioned Selective Search, RPN and the like provides a data basis for the subsequent key point prediction processing, and is beneficial to improve the accuracy of the key point prediction result.
  • In step S220, the positions of the plurality of object candidate frames are corrected by using the second convolutional neural network, and position information of the corrected plurality of object candidate frames is obtained.
  • The second convolutional neural network may be a faster region-based convolutional neural network (Faster R-CNN).
  • The object candidate frame can be mapped onto the feature map, and the corresponding features can then be extracted and used for regression.
  • The processing of this step can make the position of the object candidate frame fit the object more accurately.
  • The correction of the position of the object candidate frame can also use the R-FCN method, which divides the candidate frame into a 7*7 grid and then performs regression in different channels.
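  • A simplified, hedged sketch of that idea follows: the candidate frame is divided into a 7*7 grid and each cell reads its regression values from its own group of channels of a set of position-sensitive maps. The stride of 16, the four regression channels per cell, and the averaging over cells are assumptions; the real R-FCN formulation differs in detail.

    import numpy as np

    def position_sensitive_regression(ps_maps, roi, grid=7, stride=16):
        """ps_maps: (grid*grid*4, Hf, Wf) position-sensitive maps, one channel group per
        grid cell; roi = (x1, y1, x2, y2) in image coordinates, assumed inside the map.
        Returns 4 box-regression offsets averaged over the grid cells."""
        x1, y1, x2, y2 = [v / stride for v in roi]
        cell_w, cell_h = (x2 - x1) / grid, (y2 - y1) / grid
        offsets = np.zeros(4)
        for i in range(grid):        # rows of the 7*7 grid
            for j in range(grid):    # columns of the 7*7 grid
                cx1, cy1 = int(x1 + j * cell_w), int(y1 + i * cell_h)
                cx2 = max(cx1 + 1, int(np.ceil(x1 + (j + 1) * cell_w)))
                cy2 = max(cy1 + 1, int(np.ceil(y1 + (i + 1) * cell_h)))
                for k in range(4):   # regression performed in different channels
                    ch = (i * grid + j) * 4 + k
                    offsets[k] += ps_maps[ch, cy1:cy2, cx1:cx2].mean()
        return offsets / (grid * grid)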
  • In step S230, position information of M key points in the plurality of object candidate frames is acquired according to the positions of the plurality of object candidate frames.
  • obtaining the position information of the key points in the object candidate frame according to the position of the object candidate frame can be regarded as a process of key point prediction.
  • the processing of key point prediction can be implemented by methods such as convolutional neural networks.
  • a key point of an object of each category needs to be defined.
  • the preset object categories are 20 categories such as people, dogs, trains, and chairs.
  • the object recognition problem translates into a classification problem of 21 categories.
  • For each category, a number of key points is defined; for example, 11 key points such as head, left hand, right hand, left shoulder, and right shoulder can be defined for the person category, and 10 key points such as head, limbs, and tail can be defined for the dog category.
  • a deep convolutional neural network can be trained.
  • the position of the object candidate frame is mapped to the corresponding area on the feature map, and the corresponding area obtained by the mapping is pooled.
  • Features of the mapped region are obtained through the convolutional neural network, and the positions of the key points are obtained by regression. Since there is no category information at this step, the possible key points of all 20 types of objects need to be regressed; the background class directly selects the center of the candidate frame as its key point and makes no regression prediction. Not all key points may be present in the object candidate frame, or some key points may be occluded, but there will usually still be a key point prediction result. That is to say, assuming the total number of key points of the 21 preset object categories is 200, position information of 200 key points is obtained according to the positions of the plurality of object candidate frames.
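  • A hedged PyTorch sketch of such a key point prediction head is given below: from RoI-pooled features it regresses an (x, y) position and a confidence (the second confidence described next) for every non-background key point, while the background class simply takes the candidate frame center. The layer sizes and the sigmoid on the confidence are assumptions.

    import torch
    import torch.nn as nn

    class KeypointHead(nn.Module):
        """Illustrative key point regression head operating on RoI-pooled features."""
        def __init__(self, in_features, num_keypoints):
            super().__init__()
            self.fc = nn.Linear(in_features, num_keypoints * 3)  # x, y, confidence

        def forward(self, roi_feats):              # roi_feats: (num_rois, in_features)
            out = self.fc(roi_feats).view(roi_feats.size(0), -1, 3)
            xy = out[..., :2]                      # regressed key point positions
            conf = torch.sigmoid(out[..., 2])      # per-key-point confidence
            return xy, conf

    # The background class makes no regression prediction: its single key point
    # is simply the center of the object candidate frame.
    def background_keypoint(box):
        x1, y1, x2, y2 = box
        return (x1 + x2) / 2.0, (y1 + y2) / 2.0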
  • Step S230 further includes: acquiring a second confidence level corresponding to each key point according to the positions of the plurality of object candidate frames, where the second confidence level is data characterizing the likelihood that the key point exists in the candidate frame. That is to say, the confidence level of each key point (the second confidence level) is predicted while the key point position is predicted, and the predicted second confidence level provides a basis for judging which key point neighborhood regions are pooled.
  • In step S240, M key point neighborhood regions are determined from the corresponding object candidate frames according to the position information of the M key points in the plurality of object candidate frames of the image to be detected.
  • The content of step S240 is the same as that of step S110 in the first embodiment, and details are not described here again.
  • In step S250, based on the M key point neighborhood regions, a convolutional neural network model for identifying an object category in the image is used to determine category information of at least one object in the image to be detected.
  • Step S250 may further include: recording as zero the pooling result of the response map corresponding to any key point whose confidence level (e.g., the second confidence level) does not satisfy the predetermined requirement.
  • The second confidence level predicted above provides a basis for determining which key point neighborhood regions are pooled. Specifically, a key point whose confidence (e.g., the second confidence level) is lower than the set confidence threshold is considered invalid, and the subsequent pooling of its neighborhood is not performed; the pooling result of the response map corresponding to such an invalid key point is directly recorded as zero. This processing makes it easier to determine the category information of the object in the image to be detected according to the key point neighborhood regions and the convolutional neural network model, since not all key point neighborhood regions need to be pooled.
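  • A minimal sketch of this masking step, assuming a confidence threshold of 0.5 (the actual threshold is not specified in the text):

    import numpy as np

    def mask_low_confidence(pooled, keypoint_conf, conf_threshold=0.5):
        """Record as zero the pooling result of any key point whose (second)
        confidence is below the set confidence threshold; such key points are
        treated as invalid and their neighborhoods are effectively skipped."""
        pooled = np.asarray(pooled, dtype=float).copy()
        keypoint_conf = np.asarray(keypoint_conf, dtype=float)
        pooled[keypoint_conf < conf_threshold] = 0.0
        return pooled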
  • the confidence of each key point is also predicted, and the predicted confidence level (such as the second confidence level) can provide a basis for determining which key point neighborhood areas are pooled.
  • FIG. 3 is a logic block diagram showing an apparatus for identifying an object category according to an embodiment of the present application.
  • the apparatus for identifying an object category includes a keypoint neighborhood area determining module 310 and a category information determining module 320.
  • The key point neighborhood region determining module 310 is configured to determine M key point neighborhood regions from the corresponding object candidate frames according to position information of M key points in the plurality of object candidate frames of the image to be detected, where M is less than or equal to the total number of key points of the N preset object categories. For example, M is the total number of key points of the N preset object categories, and M and N are both positive integers.
  • the category information determining module 320 is configured to determine, according to the M key neighborhood regions, a category information of at least one object in the image to be detected by using a convolutional neural network model for identifying an object category in the image.
  • The apparatus for identifying an object category determines the key point neighborhood regions from the corresponding object candidate frames according to the position information of the key points in the object candidate frames, and further uses the key point neighborhood regions, together with a convolutional neural network model for identifying an object category in the image, to determine the category information of objects in the image to be detected.
  • FIG. 4 is a logic block diagram showing an apparatus for identifying an object category according to an embodiment of the present application.
  • The convolutional neural network model includes K convolution layers, a pooling layer, and an output layer, where K is a positive integer (e.g., K is a positive integer greater than or equal to 2). The category information determining module 320 includes:
  • the feature map output unit 3201 is for outputting a feature map corresponding to the M key points one by one from the K-1th convolution layer.
  • the response map output unit 3202 is configured to output a response map of the key points obtained after each feature map conversion from the Kth convolution layer.
  • the mapping area obtaining unit 3203 is configured to map the M key point neighborhood areas to the response maps of the corresponding key points respectively to obtain the mapping area.
  • the pooling result output unit 3204 is configured to output a pooling result corresponding to the M mapping areas from the pooling layer.
  • The confidence acquisition and information determining unit 3205 is configured to obtain, based on the M pooling results, the first confidence level of each preset object category from the output layer, and to determine the category information of the at least one object according to the first confidence level of each preset object category.
  • The confidence acquisition and information determining unit 3205 is configured to average the pooling results of the key points belonging to the same preset object category to obtain the score of each preset object category, and to obtain the first confidence level of each preset object category from the output layer according to the score of each preset object category.
  • the means for identifying the object category further comprises: a model training module (not shown) for training the convolutional neural network model.
  • The model training module can be used to: acquire a sample image containing position information of key points, object frame labeling information, and category labeling information; perform convolution processing on the sample image to obtain a convolution result; determine, according to the object frame labeling information and the category labeling information, whether the object frame position information and/or category information in the convolution result satisfy the training completion condition; if satisfied, complete the training of the convolutional neural network model; and if not, adjust the parameters of the convolutional neural network model according to the object frame position information and/or category information in the convolution result, and iteratively train the convolutional neural network model with the adjusted parameters until the object frame position information and/or category information after the iterative training satisfy the training completion condition.
  • the means for identifying the object category further includes:
  • the candidate frame position obtaining module 330 is configured to acquire the positions of the plurality of object candidate frames.
  • the key position information acquiring module 340 is configured to acquire position information of M key points in the plurality of object candidate frames according to the positions of the plurality of object candidate frames.
  • The candidate frame position obtaining module 330 is configured to obtain the position information of the plurality of object candidate frames by using a first convolutional neural network, by selective search, or by the Edge Boxes method.
  • The key point position information acquiring module 340 is further configured to acquire a second confidence level corresponding to each key point according to the positions of the plurality of object candidate frames, where the second confidence level is data indicating the likelihood that the key point exists in the candidate frame.
  • The category information determining module 320 may further include: a pooling result marking unit 3206, configured to record as zero the pooling result of the response map corresponding to any key point whose second confidence level does not satisfy the predetermined requirement.
  • The apparatus for identifying the object category further includes: a candidate frame position correction module 350, configured to correct the positions of the plurality of object candidate frames by using the second convolutional neural network, and to obtain position information of the corrected plurality of object candidate frames.
  • When the object category is a non-background class, the key point neighborhood region determining module 310 is configured to determine, according to the size information of the corresponding object candidate frame and a preset size multiple, M rectangular regions centered on the positions of the M key points as the M key point neighborhood regions.
  • When the object category is a background class, the key point neighborhood region determining module 310 is configured to determine the areas of the object candidate frames corresponding to the position information of the M key points as the M key point neighborhood regions.
  • The first convolutional neural network is a Region Proposal Network (RPN), and the second convolutional neural network is a faster region-based convolutional neural network (Faster R-CNN).
  • the device for identifying the object class in the embodiment is used to implement the corresponding method for identifying the object class in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, and details are not described herein again.
  • the apparatus for identifying an object category of the present embodiment may be provided in a suitable electronic device, including but not limited to a mobile terminal, a PC, or the like.
  • The embodiments of the present application further provide an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • Referring to FIG. 5, there is shown a schematic structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of an embodiment of the present application.
  • the electronic device 500 includes one or more processors and communication components.
  • The one or more processors are, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processing units (GPUs) 513, etc. The processors may perform various suitable actions and processes according to executable instructions stored in a read only memory (ROM) 502 or executable instructions loaded into a random access memory (RAM) 503 from a storage portion 508.
  • the communication elements can include, but are not limited to, a communication portion 512 and a communication portion 509.
  • The processor can communicate with the read only memory 502 and/or the random access memory 503 to execute executable instructions, connect to the communication component via the bus 504, and communicate with other target devices via the communication component, thereby completing operations corresponding to any method provided by the embodiments of the present application, for example: determining M key point neighborhood regions from corresponding object candidate frames according to position information of M key points in a plurality of object candidate frames of the image to be detected, where M is less than or equal to the total number of key points of N preset object categories (for example, M is the total number of key points of the N preset object categories), and M and N are positive integers; and determining, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using a convolutional neural network model for identifying an object category in the image.
  • the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • ROM 502 is an optional module.
  • the RAM 503 stores executable instructions, or writes executable instructions to the ROM 502 at runtime, the executable instructions causing the processor 501 to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 505 is also coupled to bus 504.
  • the communication unit 512 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
  • the communication unit 512 may be provided as an integrated unit, or may be provided as a plurality of sub-modules (for example, a plurality of IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet.
  • a drive 510 is also connected to the I/O interface 505 as needed.
  • a removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
  • the architecture shown in FIG. 5 is only an optional implementation.
  • in specific practice, the number and types of the components in FIG. 5 may be selected, reduced, increased, or replaced according to actual needs;
  • different functional components may also be deployed separately or in an integrated manner; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication elements may be provided separately, or integrated on the CPU or the GPU, and so on.
  • an embodiment of the present disclosure includes a computer program product, which comprises a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided in the embodiments of the present application, for example, determining M key point neighborhood regions from corresponding object candidate boxes according to position information of M key points in a plurality of object candidate boxes of an image to be detected, where M is less than or equal to the total number of key points of N preset object categories (for example, M is the total number of key points of the N preset object categories), and M and N are both positive integers;
  • and, according to the M key point neighborhood regions, determining category information of at least one object in the image to be detected by using a convolutional neural network model for identifying object categories in images.
  • in such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 509, and/or installed from the removable medium 511.
  • when the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the methods of the present application are performed.
  • the methods, apparatuses, and devices of the present application may be implemented in many ways.
  • for example, the methods, apparatuses, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above, unless otherwise specifically stated.
  • in some embodiments, the present application can also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application.
  • accordingly, the present application also covers a recording medium storing programs for executing the methods according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a method and an apparatus for identifying an object category, and an electronic device. The method for identifying an object category comprises: determining M key point neighborhood regions from corresponding object candidate boxes according to position information of M key points in a plurality of object candidate boxes of an image to be detected, wherein M is less than or equal to the total number of key points of N preset object categories, and M and N are both positive integers; and determining, according to the M key point neighborhood regions, category information of at least one object in the image to be detected by using a convolutional neural network model for identifying object categories in images.

Description

Method and apparatus for identifying object category, and electronic device
This application claims priority to Chinese Patent Application No. 201611168161.7, entitled "Method and apparatus for identifying object category, and electronic device", filed with the Chinese Patent Office on December 16, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method and an apparatus for identifying an object category, and an electronic device.
Background
With the development of computer vision technologies, object detection has become a core problem in computer vision research. Object detection means detecting the position of an object in a given image and identifying the category of the object. It can be divided into two processes, object localization and object recognition: object localization finds the bounding rectangle that contains an object in the image (i.e., the object candidate box), and object recognition, after the object candidate box is obtained, determines to which category the object in the object candidate box belongs, or determines that the candidate box contains no object and belongs to the background.
发明内容
本申请实施例提供一种识别物体类别的技术方案。
根据本申请实施例的一方面,提供一种用于识别物体类别的方法,包括:根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,M和N均为正整数;根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,所述卷积神经网络模型包括K个卷积层、池化层和输出层,K为大于或等于2的正整数;所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息,包括:从第K-1个卷积层输出与所述M个关键点一一对应的特征图;从第K个卷积层输出每一所述特征图转化后得到的关键点的响应图;分别将所述M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域;从池化层输出对应于M个所述映射区域的池化结果;基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,并根据所述属于各预设的物体类别的第一置信度确定所述至少一个物体的类别信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,所述基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,包括:对属于同一预设的物体类别的关键点相应的池化结果求取平均值得到各个预设的物体类别的分数;根据所述各个预设的物体类别的分数从所述输出层获得属于各预设的物体类别的第一置信度。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,在所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息之前,所述方法还包括:对卷积神经网络模型进行训练;
所述对卷积神经网络模型进行训练,包括:获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;对所述样本图像进行卷积处理,获取卷积结果;根据所述物体框标注信息以及类别标注信息,判断所述卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;若满足,则完成对所述卷积神经网络模型的训练;若不满足,则根据所述卷积结果中物体框位置信息和/或类别信息调整所述卷积神经网络模型的参数,并根据调整后的所述卷积神经网络模型的参数对所述卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足所述训练完成条件。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,在所述根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域之前,所述方法还包括:获取所述多个物体候选框的位置;根据所述多个物体候选框的位置获取所述多个物体候选框中M个关键点的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,所述获取所述多个物体候选框的位置,包括:采用第一卷积神经网络获得所述多个物体候选框的位置信息,或者,采用选择性搜索方法获得所述多个物体候选框的位置信息,或者,采用边缘方块Edge Box方法获得所述多个物体候选框的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,所述根据所述多个物体候选框的位置获取所述多个物体候选框中M个关键点的位置信息,还包括:根据所述多个物体候选框的位置获取每个关键点相对应的第二置信度,所述第二置信度为表征关键点是否存在于候选框中的可能性的数据;
所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息,还包括:将第二置信度低于设定的置信度阈值的关键点对应的响应图的池化结果记为零。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,在所述 根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域之前,所述方法还包括:采用第二卷积神经网络对所述多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,当所述物体类别为非背景类时,所述根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,包括:根据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别以所述M个关键点的位置为中心的M个矩形区域作为所述M个关键点邻域区域。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,当所述物体类别为背景类时,所述根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,包括:将与所述M个关键点的位置信息相应的物体候选框的区域确定为所述M个关键点邻域区域。
可选地,结合本申请实施例提供的任一种用于识别物体类别的方法,其中,所述第一卷积神经网络为区域建议网络RPN,和/或,所述第二卷积神经网络为快速区域卷积神经网络Faster R-CNN。
根据本申请实施例的另一方面,还提供一种用于识别物体类别的装置,包括:关键点邻域区域确定模块,用于根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,M和N均为正整数;类别信息确定模块,用于根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述卷积神经网络模型包括K个卷积层、池化层和输出层,K为大于或等于2的正整数;
所述类别信息确定模块包括:特征图输出单元,用于从第K-1个卷积层输出与所述M个关键点一一对应的特征图;响应图输出单元,用于从第K个卷积层输出每一所述特征图转化后得到的关键点的响应图;映射区域获取单元,用于分别将所述M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域;池化结果输出单元,用于从池化层输出对应于M个所述映射区域的池化结果;置信度获取及信息确定单元,用于基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,并根据所述属于各预设的物体类别的第一置信度确定所述至少一个物体的类别信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述置信度获取及信息确定单元用于对属于同一预设的物体类别的关键点相应的池化结果求取 平均值得到各个预设的物体类别的分数;根据所述各个预设的物体类别的分数从所述输出层获得属于各预设的物体类别的第一置信度。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述装置还包括:模型训练模块,用于对卷积神经网络模型进行训练;
所述模型训练模块用于获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;对所述样本图像进行卷积处理,获取卷积结果;根据所述物体框标注信息以及类别标注信息,判断所述卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;若满足,则完成对所述卷积神经网络模型的训练;若不满足,则根据所述卷积结果中物体框位置信息和/或类别信息调整所述卷积神经网络模型的参数,并根据调整后的所述卷积神经网络模型的参数对所述卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足所述训练完成条件。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述装置还包括:候选框位置获取模块,用于获取所述多个物体候选框的位置;关键点位置信息获取模块,用于根据所述多个物体候选框的位置获取所述多个物体候选框中M个关键点的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述候选框位置获取模块用于采用第一卷积神经网络获得所述多个物体候选框的位置信息,或者,采用选择性搜索装置获得所述多个物体候选框的位置信息,或者,采用边缘方块Edge Box装置获得所述多个物体候选框的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述关键点位置信息获取模块还用于根据所述多个物体候选框的位置获取每个关键点相对应的第二置信度,所述第二置信度为表征关键点是否存在于候选框中的可能性的数据;
所述类别信息确定模块还包括:池化结果标记单元,用于将第二置信度低于设定的置信度阈值的关键点对应的响应图的池化结果记为零。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述装置还包括:候选框位置纠正模块,用于采用第二卷积神经网络对所述多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,当所述物体类别为非背景类时,所述关键点邻域区域确定模块用于根据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别以所述M个关键点的位置为中心的M个矩形区域作为所述M个关键点邻域区域。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,当所述 物体类别为背景类时,所述关键点邻域区域确定模块用于将与所述M个关键点的位置信息相应的物体候选框的区域确定为所述M个关键点邻域区域。
可选地,结合本申请实施例提供的任一种用于识别物体类别的装置,其中,所述第一卷积神经网络为区域建议网络RPN,和/或,所述第二卷积神经网络为快速区域卷积神经网络Faster R-CNN。
根据本申请实施例的又一方面,提供一种电子设备。所述电子设备包括:处理器、存储器、通信元件和通信总线,所述处理器、所述存储器和所述通信元件通过所述通信总线完成相互间的通信;所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如上述提供的用于识别物体类别的方法对应的操作。
根据本申请实施例的又一方面,还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序包括计算机指令,当所述计算机指令在设备的处理器中运行时,所述处理器执行用于实现任一实施方式的用于识别物体类别的方法中的步骤,例如,所述计算机可读存储介质存储有:用于根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域的可执行指令;以及用于根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息的可执行指令。
根据本申请实施例提供的用于识别物体类别的方法及装置、电子设备,根据物体候选框中关键点的位置信息从相应的物体候选框中确定关键点邻域区域,进一步利用关键点邻域区域以及用于识别图像中物体类别的卷积神经网络模型,确定待检测图像中物体的类别信息。通过利用关键点的位置信息,避免了候选框中的信息较为空泛化,以及为识别过程带来较大的信息损失的缺陷,进而有利于提高单类别或多类别物体的识别的识别准确性。尤其是对于多类别物体识别的情况下,有利于提高识别准确性。
附图说明
构成说明书的一部分的附图描述了本申请的实施例,并且连同描述一起用于解释本申请的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本申请,其中:
图1是示出根据本申请实施例的一用于识别物体类别的方法的流程图;
图2是示出根据本申请实施例的另一用于识别物体类别的方法的流程图;
图3是示出根据本申请实施例的一用于识别物体类别的装置的逻辑框图;
图4是示出根据本申请实施例的另一用于识别物体类别的装置的逻辑框图;
图5是示出根据本申请实施例的电子设备的结构示意图。
具体实施例
下面结合附图详细描述本申请的示例性实施例。
本领域技术人员可以理解,本方法实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
图1是示出根据本申请实施例的一用于识别物体类别的方法的流程图。通过包括用于识别物体类别的装置的设备执行所述方法。
参照图1,在步骤S110,根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域。
这里,物体候选框可以是图像中包含可能是物体的外接矩形框。待检测图像可以是图片、照片、视频中某一帧图像等,但不限于此。例如,利用移动终端上的摄像头拍摄了一张狗的照片,那么该张照片就是待检测图像,在该张照片上包含狗的外接矩形框可以是物体候选框。如果该张照片中还包括如椅子等其他物体,同样地,可以采用物体候选框对椅子等其他物体定位。由此,作为待检测图像的该张照片可以含有多个物体候选框。
其中,M小于或等于N个预设的物体类别的关键点的总个数,例如,M为N个预设的物体类别的关键点的总个数,M和N均为正整数,每个物体候选框中至少有一个物体类别的至少一个关键点,例如,0<N≤200,200<M≤4000。
举例来说,以常用物体检测数据集(如PASCAL VOC)为例,预先设定的物体类别有人、狗、火车、椅子等20个类别,可统称为非背景类,此外,在没有这20个类别物体的情况下定义为背景类。由此,物体识别问题转化为一个21类别的分类问题。对于每个类别,定义其关键点的个数,例如,类别为人时,可定义头、左手、右手、左肩、右肩等11个关键点,类别为狗时,可定义头、四肢、尾巴等10个关键点。关键点的位置信息可以是关键点的坐标数据,物体候选框中关键点的位置信息可通过关键点预测技术获得。例如,利用卷积神经网络等方法进行关键点预测。一个可选的例子,本申请可以针对预先设定的每一个物体类别,利用卷积神经网络预测出一组关键点,从而一组关键点对应一个物体类别,且每一组关键点中的每一个关键点都对应有其在待检测图像中的位置信息。
在得到了M个关键点的位置信息之后,需要确定后续做池化处理的关键点邻域区域取多大。本申请中的关键点邻域区域是指包含有关键点以及关键点周边一定范围的区域。
由此,当物体类别为非背景类时,根据本申请示例性实施例,步骤S110可包括:根 据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别将以M个关键点的位置为中心的M个矩形区域作为M个关键点邻域区域。
当物体类别为背景类时,根据本申请示例性实施例,步骤S110可包括:将与M个关键点的位置信息相应的物体候选框的区域确定为M个关键点邻域区域。
例如,可以选取物体候选框的宽、高的四分之一到十分之一之间的一个数值作为关键点邻域区域的宽、高。假设物体候选框的宽为w,高为h,那么可取以关键点为中心,宽为w/5,高为h/5的区域作为该关键点的关键点邻域区域。而背景类别与上述处理不同,而是取整个物体候选框的区域作为关键点邻域区域。
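A minimal sketch, in Python, of the neighborhood-region rule described in the preceding paragraphs: for a non-background category, a rectangle centered on each key point whose width and height are a preset fraction of the candidate-box width and height (1/5 in the example above); for the background category, the whole candidate box. The function and variable names, the clipping to the box, and the default fraction are illustrative assumptions rather than a prescribed implementation.

```python
# Sketch: deriving key point neighborhood regions from one object candidate box.
# The 0.2 multiple (i.e. w/5, h/5) and the clipping to the box are assumptions;
# the method only requires a preset multiple of the box size for non-background
# categories and the whole box for the background category.

def keypoint_neighborhoods(box, keypoints, is_background=False, multiple=0.2):
    """box: (x1, y1, x2, y2); keypoints: iterable of (x, y) key point positions."""
    x1, y1, x2, y2 = box
    if is_background:
        # Background category: the whole candidate box serves as every neighborhood.
        return [box for _ in keypoints]
    w, h = (x2 - x1) * multiple, (y2 - y1) * multiple
    regions = []
    for kx, ky in keypoints:
        # Rectangle of size (w, h) centered on the key point, clipped to the box.
        rx1, ry1 = max(kx - w / 2, x1), max(ky - h / 2, y1)
        rx2, ry2 = min(kx + w / 2, x2), min(ky + h / 2, y2)
        regions.append((rx1, ry1, rx2, ry2))
    return regions
```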
在步骤S120,根据M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定待检测图像中至少一个物体的类别信息。
在具体的实现方式中,以200个关键点为例,待检测图像在经过卷积神经网络模型的多个卷积层得到的特征图的基础上,再加一层卷积处理使得200个特征图转化为200个关键点的响应图。200个关键点邻域区域映射到相应的关键点的响应图上得到200个映射区域,然后对这200个映射区域做池化处理得到池化结果,同一类别的各池化结果再等权重投票(即求取平均值),得到该类别的分数。包括背景类别在内的21个分数做softmax,得到任一物体候选框属于各类别的置信度。一般选取置信度最高的类别,作为物体候选框中物体的类别信息。如果背景类别的置信度最高,则确定物体候选框中没有预设类别的物体。由此,获得了物体候选框中物体的类别信息,即是确定了待检测图像中物体的类别信息。例如,假设某一待检测图像中含有五个物体候选框,其中四个物体候选框属于背景类别的置信度最高,一个物体候选框属于人这一类别的置信度最高,那么,利用本实施例的方法识别出的该图像中物体的类别信息就是人。
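The scoring step just described (equal-weight voting, i.e. averaging, over the pooled key point responses of each category, followed by a softmax over the 21 class scores, with a win by the background category meaning that no preset-category object is present) can be sketched as follows. The NumPy implementation, array names, and shapes are assumptions, not the actual network layers.

```python
import numpy as np

def class_confidences(pooled_responses, keypoint_class_ids, num_classes=21):
    """pooled_responses: pooled response of each key point (length M);
    keypoint_class_ids: category index of each key point (length M)."""
    pooled = np.asarray(pooled_responses, dtype=float)
    ids = np.asarray(keypoint_class_ids)
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        members = pooled[ids == c]
        # Equal-weight voting: average the pooled results of this category's key points.
        scores[c] = members.mean() if members.size else 0.0
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()                # confidence of each category

# The category with the highest confidence is taken as the recognition result;
# if the background category has the highest confidence, the candidate box is
# treated as containing no object of a preset category.
```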
通过本实施例提供的用于识别物体类别的方法,通过获得含有多个物体候选框的待检测图像,进一步根据物体候选框中关键点的位置信息从相应的物体候选框中确定关键点邻域区域,最后,利用关键点邻域区域以及用于识别图像中物体类别的卷积神经网络模型,确定待检测图像中物体的类别信息。通过利用关键点的位置信息,避免了候选框中的信息较为空泛化,以及为识别过程带来较大的信息损失的缺陷,进而有利于提高单类别或多类别物体的识别准确性。尤其有利于提高多类别物体的识别的准确性。
可选地,前述卷积神经网络模型具体包括K个卷积层、池化层和输出层,K为正整数(如K为大于或等于2的正整数)。相应地,根据本申请示例性实施例,步骤S120可包括:从第K-1个卷积层输出与M个关键点一一对应的特征图;从第K个卷积层输出每一特征图转化后得到的关键点的响应图;分别将M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域;从池化层输出对应于M个映射区域的池化结果;基于M个池 化结果从输出层获得属于各预设的物体类别的第一置信度,并根据属于各预设的物体类别的第一置信度确定至少一个物体的类别信息。
上述过程中,所述基于M个池化结果从输出层获得属于各预设的物体类别的第一置信度的处理包括:对属于同一预设的物体类别的关键点相应的池化结果求取平均值得到各个预设的物体类别的分数;根据各个预设的物体类别的分数从输出层获得属于各预设的物体类别的第一置信度。
由此,本实施例还具有如下技术效果:卷积神经网络模型的最后一层卷积层产生的用于池化的特征图是各类别物体各关键点的响应图,不同类别的特征图数目取决于该类别定义的关键点数目,并且,在特征图上对各类别各关键点邻域区域做池化,可见,池化区域由关键点的位置决定,不同类别的池化区域不同,从而更充分有效地利用了辅助识别信息(即关键点的位置),进一步有利于提升物体识别的准确率。
可选地,在步骤S120之前,该用于识别物体类别的方法还包括:对卷积神经网络模型进行训练。
上述对卷积神经网络模型的训练的处理可包括:获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;对所述样本图像进行卷积处理,获取卷积结果;根据所述物体框标注信息以及类别标注信息,判断所述卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;若满足,则完成对所述卷积神经网络模型的训练;若不满足,则根据所述卷积结果中物体框位置信息和/或类别信息调整所述卷积神经网络模型的参数,并根据调整后的所述卷积神经网络模型的参数对所述卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足所述训练完成条件。一个可选的具体例子,获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;根据样本图像中的关键点的位置信息确定各关键点邻域区域;从第K-1个卷积层输出与关键点一一对应的特征图;从第K个卷积层输出每一特征图转化后得到的关键点的响应图;分别将各关键点邻域区域映射到相应的关键点的响应图上得到映射区域;从池化层输出对应于各映射区域的池化结果;基于各池化结果从输出层获得属于各预设的物体类别的第一置信度,并根据属于各预设的物体类别的第一置信度确定样本图像中至少一个物体的类别信息;根据样本图像的物体框标注信息及类别标注信息,判断卷积神经网络模型输出的识别结果信息是否满足训练完成条件(例如,连续预定次数损失函数的计算结果的变化是否小于预设值);若满足(例如,连续预定次数损失函数的计算结果的变化小于预设值),则完成对卷积神经网络模型的训练;若不满足(例如,连续预定次数损失函数的计算结果的变化大于或等于预设值),则根据卷积神经网络模型输出的识别结果信息调整卷积神经网络模型的参数,并根据调整后的卷积神经网络模 型的参数对卷积神经网络模型进行迭代训练,直至卷积神经网络模型输出的识别结果信息满足训练完成条件。本申请中的损失函数可以是针对卷积神经网络模型输出的识别结果信息与样本图像中的物体框标注信息和类别标注信息的偏差而构造的偏差函数。
在具体的实现方式中,训练数据可以使用公开数据集如PASCAL VOC的图片。PASCAL VOC中的物体有20个类别。模型训练时,每次迭代输入至少一张图片以及通过如卷积神经网络等方法预测出的可能的各类别的关键点位置,同时,将标注好的表征包含物体的真实区域的标准位置信息(即上述物体框标注信息)和标注好的表征物体真实类别的标准类别信息(即上述类别标注信息)作为监督信息也输入到卷积神经网络模型中。在卷积神经网络的输出层后面接上loss(损失)层来计算损失函数,通过随机梯度下降法不断迭代使得网络参数不断更新直到收敛。Loss层的作用包括以网络输出的物体框位置信息以及标准位置信息为输入,求其欧氏距离或类似的距离作为loss,而分类层(如输出层中的全连接层)把网络分类结果(即各类别的得分)和标准类别信息作为输入,以softmaxloss作为损失函数。
通过对卷积神经网络模型进行上述训练,卷积神经网络模型可以基于关键点邻域区域对待检测图像进行特征提取和物体框中物体分类,从而具有确定待检测图像中物体的类别信息的功能。
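As a rough illustration of the kind of training step outlined above (a Euclidean-style distance loss on the regressed box positions plus a softmax loss on the category scores, optimized by stochastic gradient descent), one might write something like the following in PyTorch. The forward-pass signature, the unweighted sum of the two losses, and all names are assumptions; the embodiment trains the specific network described above, not this placeholder.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, keypoints, gt_boxes, gt_labels):
    # Hypothetical forward pass: regressed box positions and per-category scores
    # for the candidate boxes of this sample image.
    pred_boxes, class_scores = model(image, keypoints)
    box_loss = F.mse_loss(pred_boxes, gt_boxes)           # Euclidean-style box loss
    cls_loss = F.cross_entropy(class_scores, gt_labels)   # softmax loss on categories
    loss = box_loss + cls_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```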
图2是示出根据本申请实施例的另一用于识别物体类别的方法的流程图。通过包括用于识别物体类别的装置的设备执行所述方法。
参照图2,在步骤S210,获取多个物体候选框的位置。
具体地,获取物体候选框的位置也就是物体定位,物体定位是把图像中的物体的外接矩形框找到。物体候选框的位置通常采用外接矩形的对角线两个端点的坐标数据来表示。
根据本申请示例性实施例,步骤S210可包括:采用第一卷积神经网络获得多个物体候选框的位置信息,或者,采用选择性搜索(Selective Search)方法获得多个物体候选框的位置信息,或者,采用边缘方块(Edge Box)方法获得多个物体候选框的位置信息。需要说明的是,第一卷积神经网络可以为区域建设网络(Region Proposal Network,RPN)。
通过上述Selective Search、RPN等方法获得的物体候选框的位置为后续关键点预测处理提供了数据基础,同时有利于提高关键点预测结果的精准性。
在步骤S220,采用第二卷积神经网络对多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
这里,第二卷积神经网络可以为快速区域卷积神经网络(Faster Region Convolutional Neural Network,Faster R-CNN)。利用Faster R-CNN的方法,可以将物体候选框映射到特征图上池化出来再提取特征进行回归。这一步骤的处理可以使得物体候选框的位置更准确地贴合物体。在实际应用中,物体候选框位置的纠正还可以利用R-FCN的方法把候选框化为7*7的格子分别在不同通道池化再进行回归。
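The description leaves the concrete proposal and refinement networks open (RPN, Selective Search, or Edge Box in step S210; Faster R-CNN, or R-FCN-style grid pooling, in step S220). Purely as an assumption about tooling, the snippet below obtains candidate boxes with the off-the-shelf Faster R-CNN detector in torchvision, whose built-in RPN generates proposals and whose second stage refines them; the weights argument assumes torchvision 0.13 or later, and none of these calls are prescribed by the embodiment.

```python
import torch
import torchvision

# Pretrained two-stage detector: its RPN proposes candidate boxes and its second
# stage refines them, loosely corresponding to steps S210 and S220.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)        # stand-in for the image to be detected
with torch.no_grad():
    prediction = model([image])[0]

candidate_boxes = prediction["boxes"]  # (num_boxes, 4) as (x1, y1, x2, y2)
box_scores = prediction["scores"]      # per-box confidence
```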
在步骤S230,根据多个物体候选框的位置获取多个物体候选框中M个关键点的位置信息。
具体地,根据物体候选框的位置获得物体候选框中关键点的位置信息可以认为是关键点预测的处理。关键点预测的处理可以利用卷积神经网络等方法实现。
在具体的实现方式中,首先,需要定义每一类别的物体的关键点。举例来说,以常用物体检测数据集(如PASCAL VOC)为例,预先设定的物体类别有人、狗、火车、椅子等20个类别,此外,并没有这20个类别物体的情况下定义为背景类。由此,物体识别问题转化为一个21类别的分类问题。对于每个类别,定义其关键点的个数,例如,类别为人时,可定义头、左手、右手、左肩、右肩等11个关键点,类别为狗时,可定义头、四肢、尾巴等10个关键点;其次,可训练一深层卷积神经网络,在所有卷积层之后得到的特征图上,将物体候选框的位置映射到特征图上对应区域,对映射得到的相应区域进行池化,得到卷积神经网络学习出来的关键点分布的特征,关键点分布的特征再通过回归的方式得到物体候选框中的关键点。由于在这一步骤是没有类别信息的,需要对20类物体可能的关键点都进行回归,而背景类直接选取中心作为其关键点,不做回归预测。物体候选框中可能并没有全部的关键点,或者存在关键点被遮挡的问题,但通常仍然会有一个关键点预测的结果。也就是说,假设21个预设的物体类别的关键点的总个数为200,那么根据多个物体候选框的位置就会获得200个关键点的位置信息。
可选地,步骤S230还包括:根据多个物体候选框的位置获取每个关键点相对应的第二置信度,所述第二置信度为表征关键点是否存在于候选框中的可能性的数据。也就是说,在预测关键点位置的同时会预测每个关键点的置信度(如第二置信度),预测出的第二置信度可以为后续哪些关键点邻域区域做池化处理提供判断依据。
在步骤S240,根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域。
其中,上述步骤S240的步骤内容与上述实施例一中步骤S110的步骤内容相同,在此不再赘述。
在步骤S250,根据M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定待检测图像中至少一个物体的类别信息。
由于前述步骤S230中除了获得M个关键点的位置信息外,还可根据多个物体候选框的位置获得每个关键点相对应的第二置信度。相应地,步骤S250还可包括:将置信度(如第二置信度)不满足预定要求的关键点对应的响应图的池化结果记为零。对应于上述预测出的第二置信度为后续哪些关键点邻域区域做池化处理提供判断依据的说明,具体地,置信度(如第二置信度)低于设定的置信度阈值的关键点被视为无效,不会对其邻域做后续的池化处理,而是将视为无效的关键点对应的响应图的池化结果直接记为零。这一处理使得根据关键点邻域区域和卷积神经网络模型确定待检测图像中物体的类别信息的处理更加简化,无需所有的关键点邻域区域都做池化处理。
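A small illustration of the masking rule above: pooled responses of key points whose second confidence falls below the set threshold are recorded as zero, so those key points take no part in the per-category voting. The threshold value and the names are assumptions.

```python
import numpy as np

def mask_low_confidence(pooled_responses, second_confidences, threshold=0.3):
    pooled = np.asarray(pooled_responses, dtype=float).copy()
    # Key points below the confidence threshold are treated as invalid: their
    # pooled response is recorded as zero.
    pooled[np.asarray(second_confidences) < threshold] = 0.0
    return pooled
```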
通过本实施例提供的用于识别物体类别的方法,在上述实施例的基础上,还具有如下技术效果:
一方面,利用RPN、Selective Search、Edge Box等方法有利于准确地进行物体定位,从而为后续关键点预测处理提供了数据基础,同时有利于使关键点预测的结果更加精准;
另一方面,利用Faster R-CNN方法进行候选框位置纠正,有利于进一步提高物体定位的准确度;
再一方面,在预测关键点位置的同时还预测每个关键点的置信度,预测出的置信度(如第二置信度)可以为后续哪些关键点邻域区域做池化处理提供判断依据,从而有利于使根据关键点邻域区域和卷积神经网络模型确定待检测图像中物体的类别信息的处理更加简化。
基于相同的技术构思,图3是示出根据本申请实施例的用于识别物体类别的装置的逻辑框图。参照图3,该用于识别物体类别的装置包括关键点邻域区域确定模块310和类别信息确定模块320。
关键点邻域区域确定模块310用于根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,例如,M为N个预设的物体类别的关键点的总个数,M和N均为正整数。
类别信息确定模块320用于根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
本实施例提供的用于识别物体类别的装置,通过根据物体候选框中关键点的位置信息从相应的物体候选框中确定关键点邻域区域,进一步利用关键点邻域区域以及用于识别图像中物体类别的卷积神经网络模型,确定待检测图像中物体的类别信息。通过利用 关键点的位置信息,有利于避免候选框中的信息较为空泛化,以及为识别过程带来较大的信息损失的缺陷,进而有利于提高在单类别或多类别物体的识别准确性。尤其是对于多类别物体识别的情况下,有利于提高识别准确性。
基于相同的技术构思,图4是示出根据本申请实施例的用于识别物体类别的装置的逻辑框图。
参照图4,可选地,卷积神经网络模型包括K个卷积层、池化层和输出层,K为正整数(如K为大于或等于2的正整数);类别信息确定模块320包括:
特征图输出单元3201用于从第K-1个卷积层输出与M个关键点一一对应的特征图。
响应图输出单元3202用于从第K个卷积层输出每一特征图转化后得到的关键点的响应图。
映射区域获取单元3203用于分别将M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域。
池化结果输出单元3204用于从池化层输出对应于M个映射区域的池化结果。
置信度获取及信息确定单元3205用于基于M个池化结果从输出层获得属于各预设的物体类别的第一置信度,并根据属于各预设的物体类别的第一置信度确定至少一个物体的类别信息。
可选地,置信度获取及信息确定单元3205用于对属于同一预设的物体类别的关键点相应的池化结果求取平均值得到各个预设的物体类别的分数;根据各个预设的物体类别的分数从输出层获得属于各预设的物体类别的第一置信度。
可选地,用于识别物体类别的装置还包括:模型训练模块(图中未示出)用于对卷积神经网络模型进行训练。
模型训练模块可用于获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;对样本图像进行卷积处理,获取卷积结果;根据物体框标注信息以及类别标注信息,判断卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;若满足,则完成对卷积神经网络模型的训练;若不满足,则根据卷积结果中物体框位置信息和/或类别信息调整卷积神经网络模型的参数,并根据调整后的卷积神经网络模型的参数对卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足训练完成条件。
可选地,用于识别物体类别的装置还包括:
候选框位置获取模块330用于获取多个物体候选框的位置。
关键点位置信息获取模块340用于根据多个物体候选框的位置获取多个物体候选框 中M个关键点的位置信息。
可选地,候选框位置获取模块330用于采用第一卷积神经网络获得多个物体候选框的位置信息,或者,采用选择性搜索装置获得多个物体候选框的位置信息,或者,采用边缘方块Edge Box装置获得多个物体候选框的位置信息。
可选地,关键点位置信息获取模块340还用于根据多个物体候选框的位置获取每个关键点相对应的第二置信度,第二置信度为表征关键点是否存在于候选框中的可能性的数据。
相应地,类别信息确定模块320还可以包括:池化结果标记单元3206用于将第二置信度不满足预定要求的关键点对应的响应图的池化结果记为零。
可选地,用于识别物体类别的装置还包括:候选框位置纠正模块350用于采用第二卷积神经网络对多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
可选地,当物体类别为非背景类时,关键点邻域区域确定模块310用于根据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别以M个关键点的位置为中心的M个矩形区域作为M个关键点邻域区域。
可选地,当物体类别为背景类时,关键点邻域区域确定模块310用于将与M个关键点的位置信息相应的物体候选框的区域确定为M个关键点邻域区域。
可选地,第一卷积神经网络为区域建议网络RPN,和/或,第二卷积神经网络为快速区域卷积神经网络Faster R-CNN。
本实施例用于识别物体类别的装置用于实现前述多个方法实施例中相应的用于识别物体类别的方法,并具有相应的方法实施例的有益效果,在此不再赘述。
此外,本实施例的用于识别物体类别的装置可以设置于适当的电子设备中,包括但不限于移动终端、PC等。
本申请实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图5,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备500的结构示意图:如图5所示,电子设备500包括一个或多个处理器、通信元件等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)501,和/或一个或多个图像处理器(GPU)513等,处理器可以根据存储在只读存储器(ROM)502中的可执行指令或者从存储部分508加载到随机访问存储器(RAM)503中的可执行指令而执行各种适当的动作和处理。所述通信元件可包括但不限于通信部512和通信部分509。
处理器可与只读存储器502和/或随机访问存储器503中通信以执行可执行指令,通 过总线504与通信元件相连、并经通信元件与其他目标设备通信,从而完成本申请实施例提供的任一项方法对应的操作,例如,根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,例如,M为N个预设的物体类别的关键点的总个数,M和N均为正整数;根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
此外,在RAM 503中,还可存储有装置操作所需的各种程序和数据。CPU501、ROM502以及RAM503通过总线504彼此相连。在有RAM503的情况下,ROM502为可选模块。RAM503存储可执行指令,或在运行时向ROM502中写入可执行指令,可执行指令使处理器501执行上述通信方法对应的操作。输入/输出(I/O)接口505也连接至总线504。通信部512可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡。通信部512可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口505:包括键盘、鼠标等的输入部分506;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分507;包括硬盘等的存储部分508;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分509。通信部分509经由诸如因特网的网络执行通信处理。驱动器510也根据需要连接至I/O接口505。可拆卸介质511,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器510上,以便于从其上读出的计算机程序根据需要被安装入存储部分508。
需要说明的,如图5所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图5的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,根据待检测图像的多个物体候选框中M个关键点的位置信息从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,例如,M为N个预设的物体类别的关键点的总个数,M和N均为正整数;根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。在这样的实施例中,该计算机程序可以通过通信部分509从网络上被下载和 安装,和/或从可拆卸介质511被安装。在该计算机程序被中央处理单元(CPU)501执行时,执行本申请的方法中限定的上述功能。
可能以许多方式来实现本申请的方法和装置、设备。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请的方法和装置、设备。用于方法的步骤的上述顺序仅是为了进行说明,本申请的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请的方法的程序的记录介质。
本申请的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (25)

  1. 一种用于识别物体类别的方法,其特征在于,包括:
    根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,M和N均为正整数;
    根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
  2. 根据权利要求1所述的方法,其特征在于,所述卷积神经网络模型包括K个卷积层、池化层和输出层,K为大于或等于2的正整数;
    所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息,包括:
    从第K-1个卷积层输出与所述M个关键点一一对应的特征图;
    从第K个卷积层输出每一所述特征图转化后得到的关键点的响应图;
    分别将所述M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域;
    从池化层输出对应于M个所述映射区域的池化结果;
    基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,并根据所述属于各预设的物体类别的第一置信度确定所述至少一个物体的类别信息。
  3. 根据权利要求2所述的方法,其特征在于,所述基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,包括:
    对属于同一预设的物体类别的关键点相应的池化结果求取平均值得到各个预设的物体类别的分数;
    根据所述各个预设的物体类别的分数从所述输出层获得属于各预设的物体类别的第一置信度。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,在所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息之前,所述方法还包括:对卷积神经网络模型进行训练;
    所述对卷积神经网络模型进行训练,包括:
    获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;
    对所述样本图像进行卷积处理,获取卷积结果;
    根据所述物体框标注信息以及类别标注信息,判断所述卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;
    若满足,则完成对所述卷积神经网络模型的训练;
    若不满足,则根据所述卷积结果中物体框位置信息和/或类别信息,调整所述卷积神经网络模型的参数,并根据调整后的所述卷积神经网络模型的参数,对所述卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足所述训练完成条件。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,在所述根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域之前,所述方法还包括:
    获取所述多个物体候选框的位置;
    根据所述多个物体候选框的位置,获取所述多个物体候选框中M个关键点的位置信息。
  6. 根据权利要求5所述的方法,其特征在于,所述获取所述多个物体候选框的位置,包括:
    采用第一卷积神经网络获得所述多个物体候选框的位置信息,或者,
    采用选择性搜索方法获得所述多个物体候选框的位置信息,或者,
    采用边缘方块方法获得所述多个物体候选框的位置信息。
  7. 根据权利要求5或6所述的方法,其特征在于,所述根据所述多个物体候选框的位置,获取所述多个物体候选框中M个关键点的位置信息,还包括:根据所述多个物体候选框的位置,获取每个关键点相对应的第二置信度,所述第二置信度为表征关键点是否存在于候选框中的可能性的数据;
    所述根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息,还包括:将第二置信度低于设定的置信度阈值的关键点对应的响应图的池化结果记为零。
  8. 根据权利要求6或7所述的方法,其特征在于,在所述根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域之前,所述方法还包括:
    采用第二卷积神经网络对所述多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,当所述物体类别为非背景类时,所述根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域,包括:
    根据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别以所述M个关键点 的位置为中心的M个矩形区域作为所述M个关键点邻域区域。
  10. 根据权利要求1-8中任一项所述的方法,其特征在于,当所述物体类别为背景类时,所述根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域,包括:
    将与所述M个关键点的位置信息相应的物体候选框的区域,确定为所述M个关键点邻域区域。
  11. 根据权利要求8所述的方法,其特征在于,所述第一卷积神经网络为区域建议网络RPN,和/或,所述第二卷积神经网络为快速区域卷积神经网络Faster R-CNN。
  12. 一种用于识别物体类别的装置,其特征在于,包括:
    关键点邻域区域确定模块,用于根据待检测图像的多个物体候选框中M个关键点的位置信息,从相应的物体候选框中确定M个关键点邻域区域,其中,M小于或等于N个预设的物体类别的关键点的总个数,M和N均为正整数;
    类别信息确定模块,用于根据所述M个关键点邻域区域,采用用于识别图像中物体类别的卷积神经网络模型,确定所述待检测图像中至少一个物体的类别信息。
  13. 根据权利要求12所述的装置,其特征在于,所述卷积神经网络模型包括K个卷积层、池化层和输出层,K为大于或等于2的正整数;
    所述类别信息确定模块包括:
    特征图输出单元,用于从第K-1个卷积层输出与所述M个关键点一一对应的特征图;
    响应图输出单元,用于从第K个卷积层输出每一所述特征图转化后得到的关键点的响应图;
    映射区域获取单元,用于分别将所述M个关键点邻域区域映射到相应的关键点的响应图上得到映射区域;
    池化结果输出单元,用于从池化层输出对应于M个所述映射区域的池化结果;
    置信度获取及信息确定单元,用于基于M个所述池化结果从所述输出层获得属于各预设的物体类别的第一置信度,并根据所述属于各预设的物体类别的第一置信度确定所述至少一个物体的类别信息。
  14. 根据权利要求13所述的装置,其特征在于,所述置信度获取及信息确定单元用于对属于同一预设的物体类别的关键点相应的池化结果求取平均值,得到各个预设的物体类别的分数;根据所述各个预设的物体类别的分数,从所述输出层获得属于各预设的物体类别的第一置信度。
  15. 根据权利要求12-14中任一项所述的装置,其特征在于,所述装置还包括:模型训练模块,用于对卷积神经网络模型进行训练;
    所述模型训练模块用于获取含有关键点的位置信息、物体框标注信息以及类别标注信息的样本图像;对所述样本图像进行卷积处理,获取卷积结果;根据所述物体框标注信息以及类别标注信息,判断所述卷积结果中的物体框位置信息和/或类别信息是否满足训练完成条件;若满足,则完成对所述卷积神经网络模型的训练;若不满足,则根据所述卷积结果中物体框位置信息和/或类别信息调整所述卷积神经网络模型的参数,并根据调整后的所述卷积神经网络模型的参数,对所述卷积神经网络模型进行迭代训练,直至迭代训练后的物体框位置信息和/或类别信息满足所述训练完成条件。
  16. 根据权利要求12-15中任一项所述的装置,其特征在于,所述装置还包括:
    候选框位置获取模块,用于获取所述多个物体候选框的位置;
    关键点位置信息获取模块,用于根据所述多个物体候选框的位置,获取所述多个物体候选框中M个关键点的位置信息。
  17. 根据权利要求16所述的装置,其特征在于,所述候选框位置获取模块用于采用第一卷积神经网络获得所述多个物体候选框的位置信息,或者,采用选择性搜索装置获得所述多个物体候选框的位置信息,或者,采用边缘方块装置获得所述多个物体候选框的位置信息。
  18. 根据权利要求16或17所述的装置,其特征在于,所述关键点位置信息获取模块还用于根据所述多个物体候选框的位置,获取每个关键点相对应的第二置信度,所述第二置信度为表征关键点是否存在于候选框中的可能性的数据;
    所述类别信息确定模块还包括:池化结果标记单元,用于将第二置信度低于设定的置信度阈值的关键点对应的响应图的池化结果记为零。
  19. 根据权利要求17或18所述的装置,其特征在于,所述装置还包括:
    候选框位置纠正模块,用于采用第二卷积神经网络对所述多个物体候选框位置纠正,获得纠正后的多个物体候选框的位置信息。
  20. 根据权利要求12-19中任一项所述的装置,其特征在于,当所述物体类别为非背景类时,所述关键点邻域区域确定模块用于根据相应的物体候选框的尺寸信息和预设的尺寸倍数,确定分别以所述M个关键点的位置为中心的M个矩形区域作为所述M个关键点邻域区域。
  21. 根据权利要求12-19中任一项所述的装置,其特征在于,当所述物体类别为背景类时,所述关键点邻域区域确定模块用于将与所述M个关键点的位置信息相应的物体候选框的区域确定为所述M个关键点邻域区域。
  22. 根据权利要求17所述的装置,其特征在于,所述第一卷积神经网络为区域建议网络RPN,和/或,所述第二卷积神经网络为快速区域卷积神经网络Faster R-CNN。
  23. 一种电子设备,包括:处理器、存储器、通信元件和通信总线,所述处理器、所述存储器和所述通信元件通过所述通信总线完成相互间的通信;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-11中任一项所述的用于识别物体类别的方法中的步骤。
  24. 一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,所述处理器执行用于实现权利要求1-11中的任一权利要求所述的用于识别物体类别的方法中的步骤。
  25. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序包括计算机指令,当所述计算机指令在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1-11中的任一权利要求所述的用于识别物体类别的方法中的步骤。
PCT/CN2017/116218 2016-12-16 2017-12-14 用于识别物体类别的方法及装置、电子设备 Ceased WO2018108129A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/423,061 US10977523B2 (en) 2016-12-16 2019-05-27 Methods and apparatuses for identifying object category, and electronic devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611168161.7 2016-12-16
CN201611168161.7A CN108229509B (zh) 2016-12-16 2016-12-16 用于识别物体类别的方法及装置、电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/423,061 Continuation US10977523B2 (en) 2016-12-16 2019-05-27 Methods and apparatuses for identifying object category, and electronic devices

Publications (1)

Publication Number Publication Date
WO2018108129A1 true WO2018108129A1 (zh) 2018-06-21

Family

ID=62558041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/116218 Ceased WO2018108129A1 (zh) 2016-12-16 2017-12-14 用于识别物体类别的方法及装置、电子设备

Country Status (3)

Country Link
US (1) US10977523B2 (zh)
CN (1) CN108229509B (zh)
WO (1) WO2018108129A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190467A (zh) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 一种基于关键点回归的多物体检测方法、系统、终端和存储介质
CN109271842A (zh) * 2018-07-26 2019-01-25 北京纵目安驰智能科技有限公司 一种基于关键点回归的通用物体检测方法、系统、终端和存储介质
CN109919029A (zh) * 2019-01-31 2019-06-21 深圳和而泰数据资源与云技术有限公司 黑眼圈类型识别方法、装置、计算机设备和存储介质
CN111062263A (zh) * 2019-11-27 2020-04-24 杭州易现先进科技有限公司 手部姿态估计的方法、设备、计算机设备和存储介质
CN111079699A (zh) * 2019-12-30 2020-04-28 北京每日优鲜电子商务有限公司 商品识别方法及装置
CN111612034A (zh) * 2020-04-15 2020-09-01 中国科学院上海微系统与信息技术研究所 一种对象识别模型的确定方法、装置、电子设备及存储介质
CN111627029A (zh) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 图像实例分割结果的获取方法及装置
KR20200105004A (ko) * 2019-02-28 2020-09-07 현대모비스 주식회사 자동 영상 합성 장치 및 방법
CN111832559A (zh) * 2020-06-19 2020-10-27 浙江大华技术股份有限公司 目标检测方法及装置、存储介质、电子装置
CN112016398A (zh) * 2020-07-29 2020-12-01 华为技术有限公司 手持物体识别方法及装置
US11176421B2 (en) 2019-06-03 2021-11-16 Wipro Limited System and method for implementing neural network models on edge devices in IoT networks
CN113808079A (zh) * 2021-08-11 2021-12-17 同济大学 基于深度学习模型AGLNet的工业产品表面缺陷自适应检测方法

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769453B2 (en) * 2017-05-16 2020-09-08 Samsung Electronics Co., Ltd. Electronic device and method of controlling operation of vehicle
WO2018033137A1 (zh) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 在视频图像中展示业务对象的方法、装置和电子设备
US10803323B2 (en) * 2017-05-16 2020-10-13 Samsung Electronics Co., Ltd. Electronic device and method of detecting driving event of vehicle
EP3646240B1 (en) * 2017-06-26 2024-09-04 The Research Foundation for The State University of New York System, method, and computer-accessible medium for virtual pancreatography
US11164318B2 (en) * 2017-07-18 2021-11-02 Sony Interactive Entertainment Inc. Image recognition apparatus, method, and program for enabling recognition of objects with high precision
US10635979B2 (en) * 2018-07-20 2020-04-28 Google Llc Category learning neural networks
CN109187534A (zh) * 2018-08-01 2019-01-11 江苏凯纳水处理技术有限公司 水质检测方法及其水样图像识别装置
US11322073B2 (en) * 2018-09-21 2022-05-03 Dell Products, Lp Method and apparatus for dynamically optimizing gamma correction for a high dynamic ratio image
CN109448007B (zh) * 2018-11-02 2020-10-09 北京迈格威科技有限公司 图像处理方法、图像处理装置及存储介质
CN109657537A (zh) * 2018-11-05 2019-04-19 北京达佳互联信息技术有限公司 基于目标检测的图像识别方法、系统和电子设备
CN109492576B (zh) * 2018-11-07 2021-06-22 北京旷视科技有限公司 图像识别方法、装置及电子设备
CN109274625B (zh) * 2018-11-12 2020-06-19 北京邮电大学 一种信息调制方式确定方法、装置、电子设备及存储介质
CN109766928B (zh) * 2018-12-21 2020-04-17 创新奇智(重庆)科技有限公司 一种基于图像的物体分类方法、系统和电子设备
CN109919149B (zh) * 2019-01-18 2024-07-12 平安科技(深圳)有限公司 基于物体检测模型的物体标注方法及相关设备
CN109886121B (zh) * 2019-01-23 2021-04-06 浙江大学 一种遮挡鲁棒的人脸关键点定位方法
FR3094115B1 (fr) * 2019-03-22 2021-02-26 Idemia Identity & Security France Procede d’identification de bagages
US11113838B2 (en) * 2019-03-26 2021-09-07 Nec Corporation Deep learning based tattoo detection system with optimized data labeling for offline and real-time processing
CN110121077B (zh) * 2019-05-05 2021-05-07 广州方硅信息技术有限公司 一种题目生成方法、装置及设备
US20210117987A1 (en) * 2019-05-31 2021-04-22 Rakuten, Inc. Fraud estimation system, fraud estimation method and program
CN110263716B (zh) * 2019-06-21 2021-06-22 中国科学院地理科学与资源研究所 一种基于街景图像的遥感影像超分辨率土地覆被制图方法
CN110688894B (zh) * 2019-08-22 2024-05-10 平安科技(深圳)有限公司 一种手掌关键点提取方法和装置
CN110782404B (zh) * 2019-10-11 2022-06-10 北京达佳互联信息技术有限公司 一种图像处理方法、装置及存储介质
CN110852261B (zh) * 2019-11-08 2022-06-17 北京环境特性研究所 目标检测方法、装置、电子设备和可读存储介质
CN111126252B (zh) * 2019-12-20 2023-08-18 浙江大华技术股份有限公司 摆摊行为检测方法以及相关装置
CN111126339A (zh) * 2019-12-31 2020-05-08 北京奇艺世纪科技有限公司 手势识别方法、装置、计算机设备和存储介质
CN111259822A (zh) * 2020-01-19 2020-06-09 杭州微洱网络科技有限公司 针对电商图像中模特脖颈关键点的检测方法
US11783587B2 (en) * 2020-03-04 2023-10-10 Nec Corporation Deep learning tattoo match system based
CN111368751A (zh) * 2020-03-06 2020-07-03 Oppo广东移动通信有限公司 图像处理方法、装置、存储介质及电子设备
CN111428645B (zh) * 2020-03-25 2023-07-18 Oppo广东移动通信有限公司 人体关键点的检测方法、装置、电子设备及存储介质
CN111753635B (zh) * 2020-03-31 2023-08-04 杭州海康威视数字技术股份有限公司 化学实验操作的智能评分方法、装置及存储介质
CN111639533A (zh) * 2020-04-28 2020-09-08 深圳壹账通智能科技有限公司 基于步态特征的体态检测方法、装置、设备及存储介质
CN111640089B (zh) * 2020-05-09 2023-08-15 武汉精立电子技术有限公司 一种基于特征图中心点的缺陷检测方法及装置
CN111709428B (zh) * 2020-05-29 2023-09-15 北京百度网讯科技有限公司 图像中关键点位置的识别方法、装置、电子设备及介质
CN111738800B (zh) * 2020-06-30 2021-09-28 腾讯科技(深圳)有限公司 数据推荐方法、装置、计算机设备以及存储介质
CN114022830A (zh) * 2020-07-17 2022-02-08 华为技术有限公司 一种目标确定方法以及目标确定装置
CN111950451B (zh) * 2020-08-12 2024-11-22 南京师范大学 基于多尺度预测cnn及龙芯芯片的多类别目标识别方法
CN112613490B (zh) * 2021-01-08 2022-02-01 云从科技集团股份有限公司 一种行为识别方法、装置、机器可读介质及设备
CN112784743B (zh) * 2021-01-21 2023-08-04 北京百度网讯科技有限公司 关键点的识别方法、装置及存储介质
CN113065591B (zh) * 2021-03-30 2023-11-28 上海商汤智能科技有限公司 目标检测方法及装置、电子设备和存储介质
CN113553959B (zh) * 2021-07-27 2024-06-18 杭州逗酷软件科技有限公司 动作识别方法及装置、计算机可读介质和电子设备
US20230131935A1 (en) * 2021-10-21 2023-04-27 The Toronto-Dominion Bank Co-learning object and relationship detection with density aware loss
CN114332483B (zh) * 2021-11-03 2025-07-18 腾讯科技(深圳)有限公司 对象关键点检测方法及装置、训练方法及装置和计算设备
US12340547B2 (en) * 2021-12-28 2025-06-24 Samsung Electronics Co., Ltd. Electronic device and method for detecting target objects

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463172A (zh) * 2014-12-09 2015-03-25 中国科学院重庆绿色智能技术研究院 基于人脸特征点形状驱动深度模型的人脸特征提取方法
WO2015157526A1 (en) * 2014-04-09 2015-10-15 Entrupy Inc. Authenticating physical objects using machine learning from microscopic variations
CN105335702A (zh) * 2015-10-15 2016-02-17 浙江捷尚视觉科技股份有限公司 一种基于统计学习的卡口车型识别方法
CN105678284A (zh) * 2016-02-18 2016-06-15 浙江博天科技有限公司 一种固定位人体行为分析方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559504B (zh) * 2013-11-04 2016-08-31 北京京东尚科信息技术有限公司 图像目标类别识别方法及装置
CN104680508B (zh) * 2013-11-29 2018-07-03 华为技术有限公司 卷积神经网络和基于卷积神经网络的目标物体检测方法
CN104657706B (zh) * 2015-01-27 2017-12-29 北京大学 基于图像的高铁线路线杆断裂与联接结构体异常检测方法
CN105868269A (zh) * 2016-03-08 2016-08-17 中国石油大学(华东) 基于区域卷积神经网络的精确图像检索方法
CN106228158A (zh) * 2016-07-25 2016-12-14 北京小米移动软件有限公司 图片检测的方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015157526A1 (en) * 2014-04-09 2015-10-15 Entrupy Inc. Authenticating physical objects using machine learning from microscopic variations
CN104463172A (zh) * 2014-12-09 2015-03-25 中国科学院重庆绿色智能技术研究院 基于人脸特征点形状驱动深度模型的人脸特征提取方法
CN105335702A (zh) * 2015-10-15 2016-02-17 浙江捷尚视觉科技股份有限公司 一种基于统计学习的卡口车型识别方法
CN105678284A (zh) * 2016-02-18 2016-06-15 浙江博天科技有限公司 一种固定位人体行为分析方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, ZI: "A Target Detection Model Based on Convolutional Neural Networks", MASTER'S DISSERTATION OF SHANGHAI JIAO TONG UNIVERSITY, 31 December 2015 (2015-12-31) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271842A (zh) * 2018-07-26 2019-01-25 北京纵目安驰智能科技有限公司 一种基于关键点回归的通用物体检测方法、系统、终端和存储介质
CN109190467A (zh) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 一种基于关键点回归的多物体检测方法、系统、终端和存储介质
CN109919029A (zh) * 2019-01-31 2019-06-21 深圳和而泰数据资源与云技术有限公司 黑眼圈类型识别方法、装置、计算机设备和存储介质
KR102190527B1 (ko) 2019-02-28 2020-12-14 현대모비스 주식회사 자동 영상 합성 장치 및 방법
KR20200105004A (ko) * 2019-02-28 2020-09-07 현대모비스 주식회사 자동 영상 합성 장치 및 방법
US11176421B2 (en) 2019-06-03 2021-11-16 Wipro Limited System and method for implementing neural network models on edge devices in IoT networks
CN111062263A (zh) * 2019-11-27 2020-04-24 杭州易现先进科技有限公司 手部姿态估计的方法、设备、计算机设备和存储介质
CN111062263B (zh) * 2019-11-27 2023-11-03 杭州易现先进科技有限公司 手部姿态估计的方法、设备、计算机设备和存储介质
CN111079699A (zh) * 2019-12-30 2020-04-28 北京每日优鲜电子商务有限公司 商品识别方法及装置
CN111612034B (zh) * 2020-04-15 2024-04-12 中国科学院上海微系统与信息技术研究所 一种对象识别模型的确定方法、装置、电子设备及存储介质
CN111612034A (zh) * 2020-04-15 2020-09-01 中国科学院上海微系统与信息技术研究所 一种对象识别模型的确定方法、装置、电子设备及存储介质
CN111627029B (zh) * 2020-05-28 2023-06-16 北京字节跳动网络技术有限公司 图像实例分割结果的获取方法及装置
CN111627029A (zh) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 图像实例分割结果的获取方法及装置
CN111832559A (zh) * 2020-06-19 2020-10-27 浙江大华技术股份有限公司 目标检测方法及装置、存储介质、电子装置
CN112016398A (zh) * 2020-07-29 2020-12-01 华为技术有限公司 手持物体识别方法及装置
CN113808079B (zh) * 2021-08-11 2023-06-27 同济大学 基于深度学习模型AGLNet的工业产品表面缺陷自适应检测方法
CN113808079A (zh) * 2021-08-11 2021-12-17 同济大学 基于深度学习模型AGLNet的工业产品表面缺陷自适应检测方法

Also Published As

Publication number Publication date
CN108229509B (zh) 2021-02-26
US10977523B2 (en) 2021-04-13
US20190279045A1 (en) 2019-09-12
CN108229509A (zh) 2018-06-29

Similar Documents

Publication Publication Date Title
WO2018108129A1 (zh) 用于识别物体类别的方法及装置、电子设备
CN107506761B (zh) 基于显著性学习卷积神经网络的脑部图像分割方法及系统
CN107633522B (zh) 基于局部相似性活动轮廓模型的脑部图像分割方法和系统
CN112069874B (zh) 胚胎光镜图像中细胞的识别方法及系统、设备及存储介质
CN109086811B (zh) 多标签图像分类方法、装置及电子设备
TWI747120B (zh) 深度模型訓練方法及裝置、電子設備及儲存介質
CN112541395A (zh) 一种目标检测和跟踪方法、装置、存储介质及电子装置
CN107944020A (zh) 人脸图像查找方法及装置、计算机装置和存储介质
CN108564102A (zh) 图像聚类结果评价方法和装置
CN111091101B (zh) 基于一步法的高精度行人检测方法、系统、装置
CN113780145A (zh) 精子形态检测方法、装置、计算机设备和存储介质
WO2023151237A1 (zh) 人脸位姿估计方法、装置、电子设备及存储介质
CN112419202A (zh) 基于大数据及深度学习的野生动物图像自动识别系统
CN119941731B (zh) 基于大模型的肺结节分析方法、系统、设备及介质
CN112818946A (zh) 年龄识别模型的训练、年龄识别方法、装置及电子设备
CN111353325A (zh) 关键点检测模型训练方法及装置
CN106407978A (zh) 一种结合似物度的无约束视频中显著物体检测方法
CN116597471A (zh) 人体跟踪方法、电子设备及存储介质
CN103065302A (zh) 一种基于离群数据挖掘的图像显著性检测方法
CN112199984B (zh) 一种大尺度遥感图像的目标快速检测方法
CN114596209A (zh) 指纹图像修复方法、系统、设备及存储介质
CN111325194B (zh) 一种文字识别方法、装置及设备、存储介质
CN115984618B (zh) 图像检测模型训练、图像检测方法、装置、设备及介质
CN111723688A (zh) 人体动作识别结果的评价方法、装置和电子设备
CN116664618A (zh) 数据处理方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17880222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 05/12/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17880222

Country of ref document: EP

Kind code of ref document: A1