
CN111666878B - Object detection method and device - Google Patents


Info

Publication number
CN111666878B
CN111666878B (application CN202010507792.7A)
Authority
CN
China
Prior art keywords
image
channel
region
detection
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010507792.7A
Other languages
Chinese (zh)
Other versions
CN111666878A (en)
Inventor
蒋进
叶泽雄
肖万鹏
鞠奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010507792.7A priority Critical patent/CN111666878B/en
Publication of CN111666878A publication Critical patent/CN111666878A/en
Application granted granted Critical
Publication of CN111666878B publication Critical patent/CN111666878B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an object detection method and device in the field of computer vision of artificial intelligence. The method acquires object images of at least two channels to obtain at least two object images, where one object image corresponds to one channel; determines a reference object image from the at least two object images; performs object detection on the reference object image to obtain the position information of a reference object region; extracts an object region from the object image of each channel based on that position information to obtain at least two object region images; and performs object detection on the object region image of each channel to obtain an object detection result for each channel. This scheme can improve the speed of multi-channel object detection.

Description

Object detection method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to an object detection method and device.
Background
With the development of technology, object detection is increasingly applied in people's daily life, so the demands on detection speed keep rising. In the prior art, two or more image acquisition channels are generally used for image acquisition during object detection. Because these channels differ, the images acquired of the same object to be detected are not completely consistent, so each acquired image must be subjected to object detection separately. During research and practice on the prior art, the inventors of the present application found that this makes the detection speed low.
Disclosure of Invention
The embodiment of the application provides an object detection method and device, which can improve the speed of object detection.
The embodiment of the application provides an object detection method, which comprises the following steps:
acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel;
determining a reference object image from at least two object images;
performing object detection on the reference object image to obtain the position information of a reference object area;
extracting object areas from the object images of each channel based on the position information of the reference object areas to obtain at least two object area images;
and carrying out object detection on the object region image of each channel to obtain an object detection result of each channel.
Accordingly, an embodiment of the present application provides an object detection apparatus, including:
the acquisition module is used for acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel;
a determining module for determining a reference object image from at least two object images;
the first detection module is used for carrying out object detection on the reference object image to obtain the position information of the reference object area;
the extraction module is used for extracting the object area from the object image of each channel based on the position information of the reference object area to obtain at least two object area images;
and the second detection module is used for carrying out object detection on the object area image of each channel to obtain an object detection result of each channel.
In some embodiments of the application, the first detection module comprises a transformation sub-module and a detection sub-module, wherein,
the transformation submodule is used for carrying out multi-scale transformation on the reference object image to obtain a plurality of images to be detected with different resolutions;
and the detection sub-module is used for carrying out object detection on the plurality of images to be detected with different resolutions to obtain a reference object area.
In some embodiments of the application, the detection submodule comprises a detection unit and a screening unit, wherein,
the detection unit is used for carrying out object detection on the image to be detected through a neural network model to obtain an initial object region and regression parameters;
and the screening unit is used for screening the initial object area based on the regression parameters to obtain a reference object area.
In some embodiments of the application, the screening unit is specifically configured to:
The initial object region is adjusted through the regression parameters, so that an adjusted region and a score of the adjusted region are obtained, and the score characterizes the probability that the adjusted region belongs to a real standard object region;
and screening the adjusted region based on the score to obtain a reference object region.
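A minimal sketch of this adjust-then-screen step follows. The additive offset convention and the score threshold are illustrative assumptions; the patent does not fix a particular regression form.

```python
def adjust_box(box, reg):
    """Shift a box (x1, y1, x2, y2) by regression offsets (dx1, dy1, dx2, dy2).
    The additive convention is an assumption for illustration."""
    return tuple(c + d for c, d in zip(box, reg))

def screen_boxes(boxes, regs, scores, threshold=0.5):
    """Adjust each initial object region by its regression parameters, then keep
    only the adjusted regions whose score exceeds the threshold; the score
    represents the probability that the region matches the real object region."""
    adjusted = [adjust_box(b, r) for b, r in zip(boxes, regs)]
    return [b for b, s in zip(adjusted, scores) if s > threshold]
```

For instance, a region with score 0.9 survives the default threshold while one with score 0.3 is discarded, leaving only the higher-confidence adjusted regions as reference object region candidates.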
In some embodiments of the application, the detection unit is specifically configured to:
extracting features of the image to be detected through at least one convolution layer of the neural network model to obtain feature data of the image to be detected, and reducing dimensions of the feature data through at least one pooling layer of the neural network model to obtain dimension reduced data;
and processing the dimensionality reduced data through an output layer of the neural network model to obtain an initial object region and regression parameters.
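The convolution and pooling operations referred to above can be illustrated with a minimal pure-Python sketch. A real model would stack many such layers with learned kernels; this is not the patent's actual network.

```python
def conv2d(img, kernel):
    """Valid 2-D convolution over a list-of-lists image (implemented as
    cross-correlation, as in most CNN libraries): extracts feature data."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: reduces each spatial dimension by `size`,
    i.e. the dimensionality reduction performed by a pooling layer."""
    return [[max(img[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(img[0]) - size + 1, size)]
            for i in range(0, len(img) - size + 1, size)]
```

An output layer would then map the pooled feature data to region coordinates and regression parameters.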
In some embodiments of the application, the second detection module comprises a detection sub-module and a screening sub-module, wherein,
the detection sub-module is used for carrying out object detection on the object region image of each channel through the neural network model to obtain candidate object regions and regression parameters of each channel;
and the screening sub-module is used for screening the candidate object area of each channel based on the regression parameters to obtain the object detection result of each channel.
In some embodiments of the application, the detection submodule is specifically configured to:
extracting features of the object region image of each channel through at least one convolution layer of the neural network model to obtain feature data of the object region image of each channel, and reducing dimensions of the feature data of the object region image of each channel through at least one pooling layer of the neural network model to obtain dimension reduced data;
and processing the dimensionality reduced data through an output layer of the neural network model to obtain candidate object areas and regression parameters of each channel.
In some embodiments of the present application, the object detection apparatus further includes:
the sample acquisition module is used for acquiring training sample images;
the preprocessing module is used for preprocessing the training sample image to obtain a preprocessed training sample image;
the model detection module is used for detecting the training sample image and the preprocessed training sample image through a model to obtain a sample detection result;
and the training module is used for training the known detection result of the training sample image and the sample detection result to obtain a neural network model.
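As an illustration of the preprocessing step, one common choice is data augmentation such as horizontal flipping. This specific operation is an assumption here; the patent does not specify which preprocessing operations are used.

```python
def flip_horizontal(image):
    """Horizontally flip a 2-D image given as a list of rows."""
    return [row[::-1] for row in image]

def augment(samples):
    """Return the original training sample images plus their
    horizontally flipped versions (a hypothetical preprocessing step)."""
    return samples + [flip_horizontal(s) for s in samples]
```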
In some embodiments of the present application, the determining module is specifically configured to:
Acquiring acquisition time of each object image;
a reference object image is determined from at least two object images based on the acquisition time of each object image.
Correspondingly, the embodiment of the application also provides a storage medium, and the storage medium stores a computer program, and the computer program is suitable for being loaded by a processor to execute any one of the object detection methods provided by the embodiment of the application.
Correspondingly, the embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any one of the object detection methods provided by the embodiment of the application when executing the computer program.
The embodiment of the application firstly acquires object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel, then determines a reference object image from the at least two object images, then performs object detection on the reference object image to obtain the position information of a reference object area, extracts an object area from the object image of each channel based on the position information of the reference object area to obtain at least two object area images, and finally performs object detection on the object area image of each channel to obtain an object detection result of each channel.
According to this scheme, an object region image can be obtained from each object image according to the position information of the reference object region of the reference object image, and then each object region image is detected to obtain the object detection result of each channel. The object detection process is thus divided into two stages, and the position information of the reference object region obtained in the first stage is applied to all the object images, so full-image detection only needs to be performed once, which improves the speed of multi-channel object detection.
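The two-stage flow described above can be sketched as follows. The function names, the stub detector, and the data layout are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of the two-stage multi-channel detection flow.
# Stage 1 runs full detection once on the reference image; stage 2 reuses
# the resulting region position on every channel.

def detect_reference_region(image):
    """Stage 1 (stub): return (x, y, w, h) of the reference object region.
    Placeholder logic: assume the object occupies the centre of the image."""
    h, w = len(image), len(image[0])
    return (w // 4, h // 4, w // 2, h // 2)

def crop(image, box):
    """Extract the object region image given a box (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def two_stage_detect(channel_images, detect_in_region):
    """channel_images: dict of channel name -> 2-D image (one per channel)."""
    reference = next(iter(channel_images.values()))   # reference object image
    box = detect_reference_region(reference)          # full detection, once
    results = {}
    for name, img in channel_images.items():
        region = crop(img, box)                       # reuse the same position
        results[name] = detect_in_region(region)      # cheap per-channel pass
    return box, results
```

Because `detect_reference_region` runs only once, each remaining channel only pays for detection inside the much smaller cropped region.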
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scene of an object detection apparatus according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of an object detection method according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a detection result of an object detection method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of an object detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a dual-channel object detection method according to an embodiment of the present application;
FIG. 6 is a diagram of an architecture for object detection by a multi-tasking convolutional neural network provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an object detection device according to an embodiment of the present application;
fig. 8 is another schematic structural diagram of an object detection device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described in the present application are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The object detection method provided by the embodiment of the application relates to the field of artificial intelligence, in particular to a computer vision technology in the field of artificial intelligence.
Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason and make decisions.
Computer Vision (CV) is the science of studying how to make machines "see": replacing human eyes with cameras and computers to recognize and measure targets, and further processing the resulting images so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
The embodiment of the application provides a process of detecting an object by using a neural network model, which relates to the technologies of computer vision and the like in the field of artificial intelligence, wherein the object can be detected from an image by the technology of computer vision of the artificial intelligence, for example, a human face can be detected from the image, and the specific content is described by the embodiment.
The embodiment of the application provides an object detection method and device. Specifically, the method may be integrated in an object detection apparatus, which in turn may be integrated in a computer device for object detection. The computer device may be an electronic device such as a terminal, and the terminal may be a camera, a video camera, a smart phone, a tablet computer, a notebook computer, a personal computer or the like, as shown in fig. 1. Fig. 1 is a schematic view of a scene of the object detection apparatus provided by the embodiment of the application.
The terminal can acquire object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel, a reference object image is determined from the at least two object images, object detection is carried out on the reference object image to obtain position information of a reference object area, based on the position information of the reference object area, object areas are extracted from the object images of each channel to obtain at least two object area images, object detection is carried out on the object area images of each channel to obtain an object detection result of each channel.
The object detection computer device may also be an electronic device such as a server, where the server may have a data processing function, a data storage function, a data transmission function, and the like, for example, a cloud server, a mirror server, or a source server, and the server may be a single server or may be a server cluster.
The server may be configured to receive data sent by the terminal, for example, the object detection result sent by the terminal may be received; the server may also receive the request message sent by the terminal and send data to the terminal based on the request message, e.g., the server may send the terminal the object images of at least two channels collected by the terminal and uploaded to the server according to the request message of the terminal.
In addition, the server may perform the object detection process, for example, the terminal may upload the object image to the server, perform object detection by the server to obtain location information of the reference object area, extract the object area from the object image of each channel based on the location information of the reference object area, obtain at least two object area images, perform object detection on the object area image of each channel to obtain an object detection result of each channel, and send the object detection result to the terminal.
It should be noted that, the schematic view of the scenario of the object detection device shown in fig. 1 is only an example, and the object detection device and the scenario described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the object detection device and the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
The following will describe in detail.
In this embodiment, description will be made in terms of an object detection apparatus that may be integrated in a terminal specifically, for example, a terminal provided with a storage unit, a microprocessor mounted, such as a camera, a video camera, a smart phone, a tablet computer, a notebook computer, a personal computer, and a wearable smart device.
Fig. 2 is a flow chart of an object detection method according to an embodiment of the application. The object detection method may include:
101. and acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel.
The channel may be a channel of an image, for example an optical channel such as a visible light channel or an infrared channel; the channel of an image is a basic concept in constructing or capturing an image. In the embodiment of the application, a channel refers to an image acquisition channel, i.e. which channel is used to construct the image. In practical applications, images of different acquisition channels may be captured by different image acquisition devices, each device corresponding to one channel; for example, an image of the infrared channel may be captured by an infrared acquisition device, an image of the temperature channel by a thermal imaging acquisition device, and so on.
In the embodiment of the application, the image acquisition device may be integrated in the computer device as one of its constituent units; for example, on a smart phone it may be at least two cameras of the phone. The acquisition device may also take the form of an external device, and can be set flexibly according to the actual situation without limitation. Different acquisition devices each capture an image of the detection object, so each device obtains one object image. Because acquisition devices differ in their parameters and in where they are installed on the computer device, the object images obtained by different devices may differ to a certain degree. The acquisition device may be a photographing device, such as a visible light photographing device or an infrared photographing device.
The object image is an image containing an object to be detected. The object to be detected may be anything that can be captured by an acquisition device and shown in the object image, such as a person, an animal (e.g. a cat or a dog), an object (e.g. a table or a lamp), a plant (e.g. flowers, leaves or branches), or a natural phenomenon (e.g. fog, rain or snow). The imaging quality of the object image depends on the acquisition device, the object to be detected and other factors; the imaging quality is obviously not limited here, but other conditions being equal, the better the imaging quality of the object image, the faster the scheme runs.
Specifically, the capturing of the object images of at least two channels may be performed by a computer device comprising a capturing device, e.g. image capturing may be performed by a camera, a smart phone, a personal computer, etc. comprising at least two cameras, resulting in at least two object images.
For example, a computer device has two acquisition devices (acquisition device 1 and acquisition device 2), and the acquisition device 1 and the acquisition device 2 acquire object images of a small a of an object to be detected, respectively, to obtain an image A1 and an image A2.
102. A reference object image is determined from the at least two object images.
The reference object image may be any one of the at least two object images, and there are various ways to determine it. For example, an object image may be randomly drawn from the acquired object images and used as the reference object image. The reference object image may also be determined according to the acquisition device; for example, the object image obtained by acquisition device 1 may be designated as the reference object image. It may also be determined based on characteristics of the object images; for example, the image with the worst imaging quality may be used as the reference object image, and so on. In an application scenario, the way of determining the reference object image can be chosen and set flexibly as required; any way that yields a reference object image from the object images is feasible, and the options are not limited to those listed above, which are not described further here.
Determining the reference object image can accelerate the object detection process, and is helpful for rapidly completing multi-channel object detection.
For example, the object image includes an image A1 and an image A2, and the image A1 can be determined as the reference object image by means of random decimation.
In some embodiments, to speed up object detection, the step of "determining a reference object image from at least two object images" may include:
acquiring acquisition time of each object image; the object image with the earliest acquisition time is determined as a reference object image.
The acquisition time is the time at which the computer device receives the object image captured by the acquisition device, and can be recorded by the computer device. The computer device can take the object image received earliest as the reference object image. This way is simple and easy to implement, and saves the time the computer device spends on determining the reference object image, thus speeding up the object detection process.
For example, the computer device first receives the image A2 acquired by the acquisition device 2, and the computer device takes the image A2 as a reference object image.
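This earliest-acquisition-time rule reduces to taking a minimum over timestamps; a sketch follows, where the `(time, image)` pairing is an illustrative assumption about how the computer device records the acquisitions.

```python
def pick_reference(timed_images):
    """Choose as the reference the object image received earliest.
    `timed_images`: list of (acquisition_time, image) pairs."""
    return min(timed_images, key=lambda pair: pair[0])[1]
```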
103. And performing object detection on the reference object image to obtain the position information of the reference object region.
The reference object region is a region containing the object to be detected, and may be part of the reference object image. The difference between the two is that the proportion occupied by the precise location of the object to be detected is greater in the reference object region than in the reference object image; that is, the reference object region is closer to the precise location of the object to be detected.
The position information of the reference object region is its position within the reference object image, and may be expressed as a coordinate set or the like. For example, all coordinate points of the reference object region may be recorded, or only the coordinate points of its closed boundary, and so on.
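For example, if the closed boundary coordinate points of the region are recorded, they can be summarised as an axis-aligned box. The helper below is hypothetical, one of several possible encodings of the position information.

```python
def boundary_to_box(points):
    """Summarise a set of boundary coordinate points (x, y) as an
    axis-aligned bounding box (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```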
Performing object detection on the reference object image to obtain the position information of the reference object region is the key step for improving the speed of detecting objects in images acquired over multiple channels. The obtained reference object region is a relatively accurate region of the object to be detected, a detection result whose accuracy depends on the detection method used; it supports the subsequent steps, which finally yield the object detection results of all object images.
Object detection on the reference object image can be performed in various ways. Template matching can match the reference object image against an object template to determine the approximate position of the object to be detected; this approach places low demands on hardware resources and can run on many kinds of computer devices.
Object detection can also be performed with hand-crafted features plus a classifier, such as Haar-like features with AdaBoost (a classifier); this can raise the attainable accuracy of object detection and can be tuned as required to obtain a reference object region of suitable accuracy. Features can also be obtained by deep learning, with the candidate regions classified by a classifier and refined by a regression algorithm to obtain more accurate results, as in the Cascade Convolutional Neural Network algorithm (Cascade CNN). Based on convolutional neural networks and other techniques in the artificial intelligence field, this approach can achieve higher accuracy, while the detection accuracy of the obtained reference object region can still be adjusted as required, making it flexible and easy to apply. Object detection may also be performed by a trained convolutional-neural-network object detection model, and so on.
For example, the image A2 (the image A2 is a reference object image) is subject to detection, and positional information of the reference object region T is obtained.
In some embodiments, the step of performing object detection on the reference object image to obtain location information of the reference object area may include:
performing multi-scale transformation on the reference object image to obtain a plurality of images to be detected with different resolutions; and performing object detection on a plurality of images to be detected with different resolutions to obtain a reference object area.
The images to be detected may be images that are based on the reference object image but differ from it in resolution and size; the set formed by all images to be detected derived from one reference object image may be called an image pyramid. Each image to be detected may be obtained by downsampling the reference object image with a different downsampling factor, the images obtained with different factors differing from each other in resolution and size. Alternatively, each image to be detected may be obtained from the previous one: downsampling the reference object image yields a first image to be detected, downsampling the first image to be detected yields a second, and so on until all required images to be detected are obtained, each image to be detected being the result of downsampling the previous one; since downsampling produces an image with lower resolution and smaller size, the pyramid levels shrink progressively. The scale transformation may also transform the reference object image into a larger-sized image, and so on.
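As an illustrative sketch of the image pyramid described above, the following Python code repeatedly downsamples an image by a fixed factor, each level derived from the previous one, until the shorter side falls below the smallest input the detector accepts (the factor 0.709 is a value commonly used by cascaded face detectors, chosen here only as an example):

```python
import numpy as np

def build_pyramid(img, factor=0.709, min_size=12):
    # Each level is obtained by downsampling the previous level with
    # a nearest-neighbour resample; stop once the next level would be
    # smaller than min_size on its shorter side.
    pyramid = [img]
    while True:
        h, w = pyramid[-1].shape[:2]
        nh, nw = int(h * factor), int(w * factor)
        if min(nh, nw) < min_size:
            break
        ys = (np.arange(nh) / factor).astype(int)
        xs = (np.arange(nw) / factor).astype(int)
        pyramid.append(pyramid[-1][ys][:, xs])
    return pyramid
```

In practice an interpolating resize (bilinear or area) would be preferred over nearest-neighbour sampling; the loop structure is what matters here.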
In addition to the scale transformation, operations such as noise reduction (which can improve image quality), parameter adjustment (the parameter being, for example, contrast), or cropping may also be performed on the reference object image; such operations facilitate the subsequent object detection process and help obtain a more suitable detection result.
For example, the reference object image A2 is subjected to scale transformation to obtain 10 different images to be detected, and the 10 images to be detected are subjected to object detection to finally obtain a reference object region of the reference object image.
In some embodiments, the step of performing object detection on a plurality of images to be detected with different resolutions to obtain a reference object area may include:
performing object detection on the image to be detected through a neural network model to obtain an initial object region and regression parameters; and screening the initial object region based on the regression parameters to obtain a reference object region.
The neural network model may be a model obtained by applying artificial intelligence technology to a scene. The neural network model includes a plurality of parameters; when data (which may include text, pictures, videos, etc.) are input into the neural network model, a number of matrix operations are performed based on certain operation rules or set functions, and the network outputs a result. The neural network model may be trained before use to obtain optimal parameters and thereby an optimal neural network model. The number of parameters of the neural network model, the functions used when processing data, their number, and so on can be set flexibly according to the purpose of the neural network model.
The neural network model in this embodiment is used to perform object detection on the reference object image. Before detection, the reference object image needs to be processed to obtain the images actually input into the neural network model (i.e., the images to be detected); object detection is then performed by the neural network model to obtain an initial object region and regression parameters. The initial object region may include a plurality of object regions and is the initial result of detecting the reference object image; it needs to be processed to obtain the reference object region, and the regression parameters are parameters used for part of this processing. If the number of initial object regions is significantly greater than the number of reference object regions to be obtained, redundant initial object regions need to be screened out; the initial object regions may also be adjusted, and so on, in order to obtain a more accurate reference object region.
For example, the 10 images to be detected with different resolutions derived from the reference object image A2 are input into the neural network model to obtain an initial object region P and a regression parameter M, and the initial object region P is then processed based on the regression parameter M to obtain the reference object region T.
In some embodiments, to make a more accurate adjustment to the initial object region, the step of "filtering the initial object region based on the regression parameters to obtain the reference object region" may include:
the initial object region is adjusted through the regression parameters, the adjusted region and the score of the adjusted region are obtained, and the score characterizes the probability that the adjusted region belongs to the real standard object region; and screening the adjusted region based on the score to obtain a reference object region.
In order to obtain a more accurate reference object region, the initial object region may be adjusted by the regression parameters to obtain the adjusted regions and a score for each adjusted region. The score characterizes the probability that the adjusted region belongs to the real standard object region (i.e., the accurate position region of the object to be detected); the higher the score, the higher the probability that the adjusted region is the real standard object region.
The adjusted regions are then screened according to the scores. The screening may be based on the scores alone, or may combine the position information of the adjusted regions with the scores, for example by Non-Maximum Suppression (NMS). Specifically, the target adjusted region with the highest score is determined, the overlapping area between each remaining adjusted region and the target adjusted region is computed, any adjusted region whose overlap is larger than a set threshold is screened out, and the process is repeated on the retained adjusted regions until the reference object region is obtained; the set threshold can be set flexibly according to actual requirements.
For example, the initial object region set P (P being a set including a plurality of initial object regions) is adjusted by the regression parameter M to obtain an adjusted region set PP and a score for each adjusted region, and the adjusted regions are screened based on the scores to obtain the reference object region T.
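The adjustment by regression parameters and the NMS screening described above can be sketched as follows (illustrative Python/NumPy only; the offset parameterization, with offsets normalised by box width and height, is one common convention and not necessarily the one used in this embodiment):

```python
import numpy as np

def adjust_boxes(boxes, regs):
    # Shift each initial region (x1, y1, x2, y2) towards the true
    # object region using regression offsets scaled by box size.
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    out = boxes.copy().astype(float)
    out[:, 0] += regs[:, 0] * w
    out[:, 1] += regs[:, 1] * h
    out[:, 2] += regs[:, 2] * w
    out[:, 3] += regs[:, 3] * h
    return out

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, discard boxes whose IoU with it
    # exceeds the threshold, then repeat on the remainder.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```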
In some embodiments, the step of performing object detection on the image to be detected through the neural network model to obtain an initial object region and regression parameters may include:
extracting features of the image to be detected through at least one convolution layer of the neural network model to obtain feature data of the image to be detected, and reducing dimensions of the feature data through at least one pooling layer of the neural network model to obtain dimension reduced data; and processing the dimensionality reduced data through an output layer of the neural network model to obtain an initial object region and regression parameters.
The image to be detected input into the neural network model is first converted into image data; the convolution layer then performs feature extraction on the image data through convolution kernels (a kind of parameter) to obtain data of a certain dimensionality (i.e., the features of the image are extracted). A pooling operation may be performed on the obtained data over local windows of the feature map in order to optimize the feature extraction process and reduce the amount of data; common pooling operations include max pooling, average pooling, and so on, and the dimensionality of the new data obtained after the pooling operation is significantly reduced.
It should be noted that the convolution and pooling operations may be performed multiple times in the neural network: the number of convolution layers may be more than one, and the number and order of pooling layers in the neural network model are not limited. For example, one neural network model may include one convolution layer and one pooling layer, while another may include five convolution layers and two pooling layers, and the order of the convolution layers and pooling layers can be set flexibly.
In addition, the neural network model may further include an excitation layer, which performs a nonlinear mapping on the output of the convolution layer through an excitation function. Common excitation functions, such as the sigmoid function, the linear rectification function (ReLU, Rectified Linear Unit), or the hyperbolic tangent function (tanh), have different characteristics, so different activation functions can be chosen flexibly according to actual requirements. The dimension-reduced data are processed by the output layer of the neural network model to obtain the initial object region and the regression parameters; the parameters of the output layer can also be set so that the output layer outputs other desired results.
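A minimal illustration of the convolution, excitation, and pooling operations described above, written in plain NumPy (loop-based and unoptimized, for exposition only; a real model would use a deep-learning framework):

```python
import numpy as np

def conv2d(img, kernel):
    # Valid 2-D cross-correlation: the feature-extraction step of a
    # convolution layer with a single kernel.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    # Excitation layer: linear rectification (ReLU).
    return np.maximum(0, x)

def max_pool(fmap, size=2):
    # Max pooling keeps the strongest response in each size-by-size
    # window, reducing the dimensionality of the feature data.
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.rand(12, 12)        # stand-in for an image to be detected
kernel = np.random.rand(3, 3)       # one convolution kernel
features = relu(conv2d(img, kernel))  # 12x12 -> 10x10 feature map
pooled = max_pool(features)           # 10x10 -> 5x5 after pooling
```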
104. And extracting the object region from the object image of each channel based on the position information of the reference object region to obtain at least two object region images.
Because the object images acquired through different channels are captured by different acquisition devices, the imaging position of the object to be detected differs between the object images. When detecting object images acquired through multiple channels, each object image would otherwise need to be detected separately. Correction could be performed by determining the imaging-position mapping relationship between the different acquisition devices, but the model and placement of each acquisition device affect the imaging position, so the actual operation would have to be tailored to each specific acquisition device, which is cumbersome.
After the position information of the reference object region is determined from the reference object image, an object region is extracted from each object image based on that position information. Since the range of the obtained reference object region is, to a certain extent, larger than the accurate position region of the object to be detected, each extracted object region image can include the accurate position region of the object to be detected.
For example, based on the position information of the reference object region T, the object region T1 is determined from the object image A1, the object region T2 is determined from the object image A2, and the object region images B1 and B2 are obtained.
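The extraction step above can be sketched simply: because the same reference-region coordinates are applied to every channel, cropping is one slice per channel image (illustrative Python/NumPy, assuming the channel images are spatially aligned and share the same resolution):

```python
import numpy as np

def crop_regions(images, box):
    # Apply the reference object region (x1, y1, x2, y2), detected on
    # the reference channel, to the object image of every channel.
    x1, y1, x2, y2 = box
    return [img[y1:y2, x1:x2] for img in images]
```

The same slice works for a 3-channel color image and a single-channel infrared image alike, since only the spatial axes are indexed.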
105. And carrying out object detection on the object region image of each channel to obtain an object detection result of each channel.
For example, the object region image B1 is subjected to object detection to obtain a position region of the object to be detected in B1 (compared with a reference object region, the difference between the position region and the accurate position region of the object to be detected is smaller, the difference may even be 0) and a key point of the object to be detected (such as an eye, a tail, etc.), and the object region image B2 is subjected to object detection to obtain a position region of the object to be detected in B2 and a key point of the object to be detected (such as an elbow, an ear, etc.).
In some embodiments, the step of performing object detection on the object area image of each channel to obtain an object detection result of each channel may include:
performing object detection on the object region image of each channel through a neural network model to obtain candidate object regions and regression parameters of each channel; and screening the candidate object areas of each channel based on the regression parameters to obtain an object detection result of each channel.
For example, 5 images to be detected with different resolutions derived from the object region image B2 are input into the neural network model to obtain an initial object region X1 and a regression parameter Y1, and the initial object region X1 is then processed based on the regression parameter Y1 to obtain the object detection result of B2; similarly, 5 images to be detected derived from the object region image B1 are input into the neural network model to obtain an initial object region X2 and a regression parameter Y2, and the initial object region X2 is then processed based on the regression parameter Y2 to obtain the object detection result of B1.
In some embodiments, the step of performing object detection on the object region image of each channel through the neural network model to obtain the candidate object region and the regression parameters of each channel may include:
extracting features of the object region image of each channel through at least one convolution layer of the neural network model to obtain feature data of the object region image of each channel, and reducing dimensions of the feature data of the object region image of each channel through at least one pooling layer of the neural network model to obtain dimension reduced data; and processing the dimensionality reduced data through an output layer of the neural network model to obtain candidate object areas and regression parameters of each channel.
It should be noted that the neural network model used here may be different from the neural network model used for object detection on the reference object image: this model produces a detection result for each object region image, and that result has higher accuracy than the reference object region.
In some embodiments, the object detection method further comprises:
acquiring a training sample image; preprocessing the training sample image to obtain a preprocessed training sample image; detecting the training sample image and the preprocessed training sample image through a model to obtain a sample detection result; and training the model based on the known detection result of the training sample image and the sample detection result to adjust the parameters of the model, thereby obtaining the neural network model.
The preprocessing may include operations such as normalization. The model is preset and includes initial parameters; a sample image is detected by the model to obtain a sample detection result, the sample detection result is compared with the known detection result of the sample image, and the initial parameters are updated according to the comparison to obtain a new model. Detection, comparison, and updating are then repeated with the new model, the parameters being adjusted to optimally train the model until it meets the set requirements, thereby obtaining the trained neural network model.
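The detect–compare–update training cycle described above can be illustrated with a deliberately simplified stand-in model (a linear model trained by gradient descent in NumPy; the real model is a neural network, but the parameter-update loop has the same shape):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))           # stand-in for preprocessed sample images
true_w = np.array([0.5, -1.0, 2.0, 0.3])
y = X @ true_w                         # stand-in for known detection results

w = np.zeros(4)                        # initial model parameters
lr = 0.1
for _ in range(200):
    pred = X @ w                       # "sample detection result"
    grad = X.T @ (pred - y) / len(X)   # compare with the known result
    w -= lr * grad                     # update the parameters
loss = np.mean((X @ w - y) ** 2)       # stops improving once requirements are met
```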
The embodiment of the application firstly acquires object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel, then determines a reference object image from the at least two object images, then performs object detection on the reference object image to obtain the position information of a reference object area, extracts an object area from the object image of each channel based on the position information of the reference object area to obtain at least two object area images, and finally performs object detection on the object area image of each channel to obtain an object detection result of each channel.
According to this scheme, an object region image can be obtained from each object image based on the position information of the reference object region of the reference object image, and the object region images are then detected to obtain the object detection result of each channel. The object detection process is thereby divided into two stages, and the position information of the reference object region obtained in the first stage is applied to all the object images, so that full detection does not need to be repeated on every object image, which improves the object detection speed for object images acquired through multiple channels.
The method described in the above embodiments is described in further detail below by way of example.
In this embodiment, the two channels are two face acquisition channels, namely a visible light channel and an infrared channel. For example, fig. 3 shows the face detection results obtained in this embodiment: the left part of fig. 3 is the detection result of the color channel image, and the right part of fig. 3 is the detection result of the infrared channel image. Fig. 4 is a flow chart of an object detection method provided in the embodiment of the present application, and the object detection method may include:
201. the terminal respectively collects face images through the visible light channel and the infrared channel to obtain color face images and infrared face images.
202. And the terminal performs preliminary face detection on the color face image through the first model to obtain the position information of the reference face area.
For example, the first model may be the P network (P-Net, Proposal Network) and the R network (R-Net, Refine Network) of the Multi-task Convolutional Neural Network (MTCNN). The P network generates a large number of face prediction frames; the R network, whose structure is more complex than that of the P network, optimizes the face prediction frames generated by the P network. Specifically, the R network refines and selects the input face prediction frames, discards most erroneous inputs, performs bounding-box regression and key-point localization of the face region using a bounding-box regressor and a face key-point locator, and finally outputs the position information of a more reliable reference face region.
203. And the terminal extracts the color face image and the infrared face image according to the position information to obtain a color area image and an infrared area image.
204. And the terminal respectively carries out face detection on the color area image and the infrared area image through a second model to obtain a face detection result of the color channel and a face detection result of the infrared channel.
For example, the second model may be the O network (O-Net, Output Network) of MTCNN, whose structure is more complex than that of the R network. The O network is used to perform face detection on the color area image and the infrared area image respectively, so as to output the face detection result of each image. The face detection result may include a face frame and face key points.
Referring to fig. 5, the BGR channel is the color channel. The BGR channel image collected through the color channel undergoes face detection through the P-Net and R-Net of MTCNN, the obtained position information of the reference object region (i.e., the face coordinate frame) is applied to the infrared channel image, and face detection is then performed by O-Net on the color area image and the infrared area image obtained based on the reference object region, yielding the BGR image detection result and the infrared image detection result. As shown in fig. 6, fig. 6 is an overall structure diagram of MTCNN for face detection. First, the test image is scaled (resized) to obtain an image pyramid; then candidate frames of the face region (the frames near the person in the figure) and bounding-box regression vectors are obtained by P-Net, the candidate frames are adjusted using the regression vectors (bounding box regression), and the adjusted candidate frames are screened by NMS, the processed candidate frames serving as the input of the next stage. Similarly, candidate frames and regression parameters are obtained in R-Net and O-Net and subjected to bounding-box regression and screening, and finally the face detection result of the image (the face frame and face key points in the lower-right figure) is obtained.
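The two-stage flow of fig. 5 can be sketched with hypothetical stub detectors. The stubs below merely stand in for P-Net/R-Net and O-Net and return fixed geometry; they are placeholders for illustration, not the actual networks:

```python
import numpy as np

def detect_reference_region(bgr_image):
    # Hypothetical stand-in for stage one (P-Net + R-Net): returns one
    # coarse face box (x1, y1, x2, y2) on the reference (BGR) image.
    h, w = bgr_image.shape[:2]
    return (w // 4, h // 4, 3 * w // 4, 3 * h // 4)

def refine(region_image):
    # Hypothetical stand-in for stage two (O-Net): returns the refined
    # face box within the cropped region plus key points.
    h, w = region_image.shape[:2]
    return {"box": (0, 0, w, h), "keypoints": []}

def two_stage_detect(channel_images):
    # Stage one runs only on the reference (BGR) image; its region is
    # reused to crop every channel before stage two runs per channel.
    x1, y1, x2, y2 = detect_reference_region(channel_images["bgr"])
    return {name: refine(img[y1:y2, x1:x2])
            for name, img in channel_images.items()}

results = two_stage_detect({"bgr": np.zeros((120, 160, 3)),
                            "ir": np.zeros((120, 160))})
```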
According to this embodiment, the terminal collects face images through the visible light channel and the infrared channel respectively to obtain a color face image and an infrared face image; the terminal then performs preliminary face detection on the color face image through the first model to obtain the position information of the reference face area; the terminal then extracts from the color face image and the infrared face image respectively according to the position information to obtain a color area image and an infrared area image; finally the terminal performs face detection on the color area image and the infrared area image through the second model to obtain the face detection result of the color channel and the face detection result of the infrared channel.
According to this scheme, an object region image can be obtained from each object image based on the position information of the reference object region of the reference object image, and the object region images are then detected to obtain the object detection result of each channel. The object detection process is thereby divided into two stages, and the position information of the reference object region obtained in the first stage is applied to all the object images, so that full detection does not need to be repeated on every object image, which improves the detection speed.
In order to better realize the object detection method provided by the embodiment of the application, the embodiment of the application also provides a device based on the object detection method. Where the meaning of nouns is the same as in the above-described object detection method, specific implementation details may be referred to in the description of the method embodiments.
As shown in fig. 7, fig. 7 is a schematic structural diagram of an object detection device according to an embodiment of the present application, where the object detection device may include an acquisition module 301, a determination module 302, a first detection module 303, an extraction module 304, and a second detection module 305, where:
the acquisition module 301 is configured to acquire object images of at least two channels, to obtain at least two object images, where one object image corresponds to one channel;
a determining module 302, configured to determine a reference object image from at least two object images;
a first detection module 303, configured to perform object detection on a reference object image to obtain location information of a reference object area;
an extracting module 304, configured to extract an object region from the object image of each channel based on the position information of the reference object region, so as to obtain at least two object region images;
and the second detection module 305 is configured to perform object detection on the object area image of each channel, so as to obtain an object detection result of each channel.
In some embodiments of the present application, as shown in fig. 8, the first detection module 303 includes a transform sub-module 3031 and a detection sub-module 3032, wherein,
the transformation submodule 3031 is used for performing multi-scale transformation on the reference object image to obtain a plurality of images to be detected with different resolutions;
and the detection submodule 3032 is used for carrying out object detection on a plurality of images to be detected with different resolutions to obtain a reference object region.
In some embodiments of the present application, the detection submodule 3032 includes a detection unit and a screening unit, wherein,
the detection unit is used for carrying out object detection on the image to be detected through the neural network model to obtain an initial object region and regression parameters;
and the screening unit is used for screening the initial object area based on the regression parameters to obtain a reference object area.
In some embodiments of the application, the screening unit is specifically configured to:
the initial object region is adjusted through the regression parameters, the adjusted region and the score of the adjusted region are obtained, and the score characterizes the probability that the adjusted region belongs to the real standard object region;
and screening the adjusted region based on the score to obtain a reference object region.
In some embodiments of the application, the detection unit is specifically configured to:
Extracting features of the image to be detected through at least one convolution layer of the neural network model to obtain feature data of the image to be detected, and reducing dimensions of the feature data through at least one pooling layer of the neural network model to obtain dimension reduced data;
and processing the dimensionality reduced data through an output layer of the neural network model to obtain an initial object region and regression parameters.
In some embodiments of the present application, the second detection module 305 includes a detection sub-module and a screening sub-module, wherein,
the detection sub-module is used for carrying out object detection on the object region image of each channel through the neural network model to obtain candidate object regions and regression parameters of each channel;
and the screening sub-module is used for screening the candidate object area of each channel based on the regression parameters to obtain the object detection result of each channel.
In some embodiments of the application, the detection submodule is specifically configured to:
extracting features of the object region image of each channel through at least one convolution layer of the neural network model to obtain feature data of the object region image of each channel, and reducing dimensions of the feature data of the object region image of each channel through at least one pooling layer of the neural network model to obtain dimension reduced data;
And processing the dimensionality reduced data through an output layer of the neural network model to obtain candidate object areas and regression parameters of each channel.
In some embodiments of the present application, the object detection apparatus further includes:
the sample acquisition module is used for acquiring training sample images;
the preprocessing module is used for preprocessing the training sample image to obtain a preprocessed training sample image;
the model detection module is used for detecting the training sample image and the preprocessed training sample image through a model to obtain a sample detection result;
and the training module is used for training the model based on the known detection result of the training sample image and the sample detection result to obtain the neural network model.
In some embodiments of the present application, the determining module is specifically configured to:
acquiring acquisition time of each object image;
a reference object image is determined from at least two object images based on the acquisition time of each object image.

In the embodiment of the present application, the acquisition module 301 acquires object images of at least two channels to obtain at least two object images, where one object image corresponds to one channel; the determination module 302 then determines a reference object image from the at least two object images; the first detection module 303 performs object detection on the reference object image to obtain the position information of a reference object area; the extraction module 304 extracts an object area from the object image of each channel based on the position information of the reference object area to obtain at least two object area images; and finally the second detection module 305 performs object detection on the object area image of each channel to obtain the object detection result of each channel.
According to this scheme, an object region image can be obtained from each object image based on the position information of the reference object region of the reference object image, and the object region images are then detected to obtain the object detection result of each channel. The object detection process is thereby divided into two stages, and the position information of the reference object region obtained in the first stage is applied to all the object images, so that full detection does not need to be repeated on every object image, which improves the detection speed.
In addition, the embodiment of the present application further provides a computer device, which may be a terminal or a server, as shown in fig. 9, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, specifically:
the computer device may include one or more processors 401 of a processing core, memory 402 of one or more computer readable storage media, a power supply 403, and an input unit 404, among other components. Those skilled in the art will appreciate that the computer device structure shown in FIG. 9 is not limiting of the computer device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The processor 401 is the control center of the computer device, connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, where the storage program area may store the operating system, the application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of charge, discharge, and power consumption management may be performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the computer device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
Acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel, determining a reference object image from the at least two object images, performing object detection on the reference object image to obtain position information of a reference object region, extracting an object region from the object images of each channel based on the position information of the reference object region to obtain at least two object region images, and performing object detection on the object region image of each channel to obtain an object detection result of each channel.
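The two-stage flow above can be sketched as follows. This is an illustrative example only, not the patented implementation: the `detect_object` stand-in (which simply boxes the non-zero pixels so the sketch stays runnable) and the dict-of-channels layout are assumptions made for the example.

```python
import numpy as np

def detect_object(image):
    # Stand-in detector: returns the bounding box (x, y, w, h) of the
    # non-zero pixels. A real system would run a trained detector here.
    ys, xs = np.nonzero(image)
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))

def two_stage_detect(channel_images, reference_channel):
    # Stage 1: detect once on the reference channel to obtain the
    # position information of the reference object region.
    x, y, w, h = detect_object(channel_images[reference_channel])
    # Stage 2: reuse that position to crop the object region image from
    # every channel, then detect on each (smaller) region image.
    results = {}
    for channel, image in channel_images.items():
        region = image[y:y + h, x:x + w]          # object region image
        results[channel] = detect_object(region)  # per-channel result
    return results
```

Because stage 2 operates on cropped region images rather than full frames, the full-image detection runs only once, on the reference channel.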
According to this scheme, an object region image can be obtained from each object image according to the position information of the reference object region of the reference object image, and the object region images are then detected to obtain the object detection result of each channel. The object detection process is thereby divided into two stages, with the position information of the reference object region obtained in the first stage applied to all the object images.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods in the above embodiments may be completed by a computer program, or by the computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application further provides a storage medium in which a computer program is stored, the computer program being capable of being loaded by a processor to perform the steps of any one of the object detection methods provided by the embodiments of the present application. For example, the computer program may perform the steps of:
acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel, determining a reference object image from the at least two object images, performing object detection on the reference object image to obtain position information of a reference object region, extracting an object region from the object images of each channel based on the position information of the reference object region to obtain at least two object region images, and performing object detection on the object region image of each channel to obtain an object detection result of each channel.
The storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Since the computer program stored in the storage medium can perform the steps of any one of the object detection methods provided in the embodiments of the present application, it can achieve the beneficial effects that any one of those methods can achieve; for details, refer to the previous embodiments, which are not repeated herein. The foregoing has described in detail the object detection method and apparatus provided by the embodiments of the present application. Specific examples have been applied herein to illustrate the principles and implementations of the present application, and the description of the foregoing embodiments is only intended to aid in understanding the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope in accordance with the ideas of the present application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (12)

1. An object detection method, comprising:
acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel;
determining a reference object image from at least two object images;
performing object detection on the reference object image to obtain the position information of a reference object area;
extracting an object region from the object image of each channel based on the position information of the reference object region to obtain at least two object region images;
and carrying out object detection on the object region image of each channel to obtain an object detection result of each channel.
2. The method according to claim 1, wherein performing object detection on the reference object image to obtain location information of a reference object region includes:
performing multi-scale transformation on the reference object image to obtain a plurality of images to be detected with different resolutions;
and carrying out object detection on the plurality of images to be detected with different resolutions to obtain a reference object area.
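As an illustration only (not part of the claims), the multi-scale transformation of claim 2 can be sketched as an image pyramid. Downsampling here is done by striding so the example stays dependency-free; a real system would use proper interpolation (e.g. `cv2.resize` or `cv2.pyrDown`), and the scale set is an assumption.

```python
import numpy as np

def image_pyramid(image, scales=(1.0, 0.5, 0.25)):
    # Build images to be detected at several resolutions so that objects
    # of different sizes can be found by a fixed-size detector.
    pyramid = []
    for s in scales:
        step = int(round(1.0 / s))
        pyramid.append(image[::step, ::step])  # crude downsample by striding
    return pyramid
```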
3. The method according to claim 2, wherein performing object detection on the plurality of images to be detected with different resolutions to obtain position information of a reference object area includes:
performing object detection on the image to be detected through a neural network model to obtain an initial object region and regression parameters;
and screening the initial object region based on the regression parameters to obtain a reference object region.
4. The method of claim 3, wherein the screening the initial object region based on the regression parameters to obtain a reference object region comprises:
adjusting the initial object region through the regression parameters to obtain an adjusted region and a score of the adjusted region, wherein the score characterizes the probability that the adjusted region belongs to a ground-truth object region;
and screening the adjusted region based on the score to obtain a reference object region.
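The adjust-then-screen step of claims 3–4 can be sketched as below, for illustration only. The offset parameterisation `(dx, dy, dw, dh)` is the common anchor-offset convention and an assumption here, as is the fixed score threshold; the patent does not fix either.

```python
import math

def adjust_region(region, params):
    # Apply regression parameters (dx, dy, dw, dh) to an initial
    # (x, y, w, h) region: shift the centre, rescale width and height.
    x, y, w, h = region
    dx, dy, dw, dh = params
    cx = x + w / 2 + dx * w
    cy = y + h / 2 + dy * h
    nw = w * math.exp(dw)
    nh = h * math.exp(dh)
    return (cx - nw / 2, cy - nh / 2, nw, nh)

def screen_regions(scored_regions, threshold=0.5):
    # Keep adjusted regions whose score (probability of being a true
    # object region) clears the threshold, highest score first.
    kept = [(r, s) for r, s in scored_regions if s >= threshold]
    return sorted(kept, key=lambda rs: rs[1], reverse=True)
```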
5. A method according to claim 3, wherein the object detection of the image to be detected by the neural network model to obtain an initial object region and regression parameters comprises:
extracting features of the image to be detected through at least one convolution layer of the neural network model to obtain feature data of the image to be detected, and reducing dimensions of the feature data through at least one pooling layer of the neural network model to obtain dimension reduced data;
and processing the dimensionality reduced data through an output layer of the neural network model to obtain an initial object region and regression parameters.
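The convolution → pooling → output-layer pipeline of claim 5 can be sketched with toy layers, purely as an illustration: the layer sizes, the linear output head, and the weight shapes are all assumptions for the example, not the claimed network.

```python
import numpy as np

def conv2d(x, kernel):
    # Convolution layer (valid padding): extracts feature data.
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Pooling layer: reduces the dimensionality of the feature data.
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def output_head(features, w_box, w_reg):
    # Output layer: maps the reduced features to an initial object
    # region and regression parameters (two linear heads here).
    flat = features.ravel()
    return flat @ w_box, flat @ w_reg
```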
6. The method according to claim 1, wherein the performing object detection on the object area image of each channel to obtain an object detection result of each channel includes:
performing object detection on the object region image of each channel through a neural network model to obtain candidate object regions and regression parameters of each channel;
and screening the candidate object area of each channel based on the regression parameters to obtain an object detection result of each channel.
7. The method of claim 6, wherein the object detection on the object region image of each channel by the neural network model, to obtain the candidate object region and the regression parameters of each channel, comprises:
extracting features of the object region image of each channel through at least one convolution layer of the neural network model to obtain feature data of the object region image of each channel, and reducing dimensions of the feature data of the object region image of each channel through at least one pooling layer of the neural network model to obtain dimension reduced data;
and processing the dimensionality reduced data through an output layer of the neural network model to obtain candidate object areas and regression parameters of each channel.
8. The method according to any one of claims 3 to 7, further comprising:
acquiring a training sample image;
preprocessing the training sample image to obtain a preprocessed training sample image;
detecting the training sample image and the preprocessed training sample image through a model to obtain a sample detection result;
training the model based on the known detection result of the training sample image and the sample detection result to obtain the neural network model.
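The acquire → preprocess → detect → train loop of claim 8 can be sketched as follows, as an illustration only: the flip augmentation, the linear scorer, and the squared-loss update are stand-ins chosen to keep the example runnable, not the claimed training procedure.

```python
import numpy as np

def augment(image):
    # Preprocessing stand-in: horizontal flip. Real pipelines would also
    # apply crops, colour jitter, etc.
    return image[:, ::-1]

def train(images, labels, epochs=50, lr=0.1):
    # Train a toy linear scorer against the known detection results
    # (labels), on the original samples plus their augmented copies.
    data = images + [augment(im) for im in images]
    targets = labels + labels  # a flip keeps the label in this toy setup
    w = np.zeros(data[0].size)
    for _ in range(epochs):
        for im, y in zip(data, targets):
            x = im.ravel()
            pred = x @ w                 # sample detection result
            w += lr * (y - pred) * x     # squared-loss gradient step
    return w
```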
9. The method of claim 1, wherein determining a reference object image from at least two object images comprises:
acquiring acquisition time of each object image;
determining a reference object image from the at least two object images based on the acquisition time of each object image.
10. An object detection apparatus, comprising:
the acquisition module is used for acquiring object images of at least two channels to obtain at least two object images, wherein one object image corresponds to one channel;
a determining module for determining a reference object image from at least two object images;
the first detection module is used for carrying out object detection on the reference object image to obtain the position information of the reference object area;
the extraction module is used for extracting an object region from the object image of each channel based on the position information of the reference object region to obtain at least two object region images;
and the second detection module is used for carrying out object detection on the object area image of each channel to obtain an object detection result of each channel.
11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 9 when executing the computer program.
12. A storage medium storing a computer program adapted to be loaded by a processor to perform the steps of the object detection method according to any one of claims 1 to 9.
CN202010507792.7A 2020-06-05 2020-06-05 Object detection method and device Active CN111666878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507792.7A CN111666878B (en) 2020-06-05 2020-06-05 Object detection method and device


Publications (2)

Publication Number Publication Date
CN111666878A CN111666878A (en) 2020-09-15
CN111666878B true CN111666878B (en) 2023-08-29

Family

ID=72386923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507792.7A Active CN111666878B (en) 2020-06-05 2020-06-05 Object detection method and device

Country Status (1)

Country Link
CN (1) CN111666878B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985214A (en) * 2018-07-09 2018-12-11 上海斐讯数据通信技术有限公司 The mask method and device of image data
CN109726683A (en) * 2018-12-29 2019-05-07 北京市商汤科技开发有限公司 Target object detection method and device, electronic device and storage medium
CN109788885A (en) * 2016-07-20 2019-05-21 登塔尔图像科技公司 Optical coherence tomography system
CN110163078A (en) * 2019-03-21 2019-08-23 腾讯科技(深圳)有限公司 The service system of biopsy method, device and application biopsy method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229488B (en) * 2016-12-27 2021-01-01 北京市商汤科技开发有限公司 Method, device and electronic device for detecting key points of an object


Also Published As

Publication number Publication date
CN111666878A (en) 2020-09-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant