CN111310806A - Classification network, image processing method, device, system and storage medium - Google Patents
- Publication number
- CN111310806A (application CN202010075053.5A)
- Authority
- CN
- China
- Prior art keywords
- classification
- network
- target
- sub
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a classification network, an image processing method, an image processing apparatus, an image processing system, and a storage medium. The classification network comprises a feature extraction sub-network, a target direction classification sub-network, and a target class classification sub-network, wherein: the feature extraction sub-network is used for performing feature extraction on an input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; and the target class classification sub-network is used for generating a class classification result of the target to be classified based on the feature extraction result and the direction classification result. According to the classification network provided by the embodiment of the invention, a branch structure for classifying the direction of the target to be classified is added under the general framework of a classification network, and the class classification of the target is based on the direction classification result of this added branch, so that the classification accuracy of the classification network can be improved along a different dimension.
Description
Technical Field
The present invention relates to the field of object classification technologies, and in particular, to a classification network, an image processing method, an image processing apparatus, an image processing system, and a storage medium.
Background
Neural networks are now widely used in the field of image processing, for example in image recognition. In security scenarios, for instance, classifying and judging targets such as pedestrians and human faces with a classification network is a basic problem in scene applications. To improve the classification accuracy of a classification network, existing methods generally guide its learning process either by optimizing the feature extraction part of the network to obtain better features or by designing a more reasonable loss function.
Disclosure of Invention
The invention provides a classification network and an image processing scheme. Under the general framework of a classification network, a branch structure for classifying the direction of a target to be classified is added, and the class classification of the target is based on the direction classification result of this added branch, so that the classification accuracy of the classification network can be improved along a different dimension. The classification network and the image processing scheme proposed by the invention are briefly described below; more details are given in the following detailed description with reference to the drawings.
According to an aspect of the present invention, there is provided a classification network comprising a feature extraction sub-network, a target direction classification sub-network and a target class classification sub-network, wherein: the feature extraction sub-network is used for extracting features of the input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; the object class classification sub-network is used for generating a class classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the loss function used by the classification network during training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the invention, the feature extraction sub-network comprises a convolutional layer and a pooling layer, the target direction classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer, and the target class classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer.
According to another aspect of the present invention, there is provided an image processing method including: acquiring an input image, and performing feature extraction on the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a class classification result of the target to be classified based on the result of the feature extraction and the direction classification result.
In one embodiment of the present invention, the image processing method is performed by a trained classification network, and the classification network includes: the feature extraction sub-network is used for extracting features of the input image; a target direction classification sub-network, configured to generate a direction classification result of a target to be classified in the input image based on a result of the feature extraction; and the object class classification sub-network is used for generating a class classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the loss function used by the classification network during training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the invention, the feature extraction sub-network comprises a convolutional layer and a pooling layer, the target direction classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer, and the target class classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer.
According to still another aspect of the present invention, there is provided an image processing apparatus comprising a feature extraction module, a direction classification module, and a category classification module, wherein: the feature extraction module is used for acquiring an input image and performing feature extraction on the input image; the direction classification module is used for generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and the category classification module is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
According to a further aspect of the present invention, there is provided an image processing system comprising a processor and a storage device having stored thereon a computer program which, when executed by the processor, performs the image processing method of any of the above.
According to yet another aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed, performs the image processing method of any one of the above.
According to yet another aspect of the present invention, there is provided a computer program for performing the image processing method of any one of the above when executed by a computer or a processor, the computer program further being for implementing the modules in the image processing apparatus of any one of the above.
According to the classification network provided by the embodiment of the invention, a branch structure for classifying the direction of the target to be classified is added under the general framework of a classification network, and the class classification of the target to be classified is based on the direction classification result of the added branch structure, so that the classification accuracy of the classification network can be improved along a different dimension, and the image processing method, apparatus, and system based on this classification network can improve the accuracy of target classification in images.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 shows a schematic block diagram of a classification network according to an embodiment of the present invention.
FIG. 2 shows a schematic diagram of a training process of a classification network according to an embodiment of the invention.
Fig. 3 shows a schematic flow diagram of an image processing method according to an embodiment of the invention.
Fig. 4 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention.
Fig. 5 shows a schematic block diagram of an image processing system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
Fig. 1 shows a schematic block diagram of a classification network 100 according to an embodiment of the invention. As shown in Fig. 1, the classification network 100 includes a feature extraction sub-network 110, a target direction classification sub-network 120, and a target class classification sub-network 130. The feature extraction sub-network 110 is used for performing feature extraction on an input image. The target direction classification sub-network 120 is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result output by the feature extraction sub-network 110. The target class classification sub-network 130 is used for generating a class classification result of the target to be classified based on the feature extraction result output by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120.
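Illustratively, a minimal PyTorch sketch of such a network is given below. The specific layer sizes, the number of direction classes, and the way the direction classification result is fused into the class branch (here, the direction probabilities are concatenated with the pooled features) are assumptions made only for illustration and are not limited by the present invention.

```python
# Minimal sketch of classification network 100 (assumed layer sizes and fusion scheme).
import torch
import torch.nn as nn


class FeatureExtractionSubNetwork(nn.Module):
    """Feature extraction sub-network 110: convolutional and pooling layers."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.body(x)  # feature map of shape (N, 64, H/4, W/4)


class TargetDirectionSubNetwork(nn.Module):
    """Target direction classification sub-network 120: conv + pooling + fully-connected."""

    def __init__(self, num_directions=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(64, num_directions)

    def forward(self, feat):
        return self.fc(self.conv(feat).flatten(1))  # direction logits


class TargetClassSubNetwork(nn.Module):
    """Target class classification sub-network 130: conv + pooling + fully-connected,
    fed with both the extracted features and the direction classification result."""

    def __init__(self, num_classes=2, num_directions=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(64 + num_directions, num_classes)

    def forward(self, feat, direction_logits):
        pooled = self.conv(feat).flatten(1)
        direction_prob = direction_logits.softmax(dim=1)
        return self.fc(torch.cat([pooled, direction_prob], dim=1))  # class logits


class ClassificationNetwork(nn.Module):
    """Classification network 100: feature extraction + direction branch + class branch."""

    def __init__(self, num_classes=2, num_directions=2):
        super().__init__()
        self.feature = FeatureExtractionSubNetwork()
        self.direction = TargetDirectionSubNetwork(num_directions)
        self.category = TargetClassSubNetwork(num_classes, num_directions)

    def forward(self, x):
        feat = self.feature(x)
        dir_logits = self.direction(feat)
        cls_logits = self.category(feat, dir_logits)
        return cls_logits, dir_logits
```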
In the embodiment of the present invention, the target to be classified in the input image may be a target object such as a pedestrian, a human face, a vehicle, or another type of object. The ultimate purpose of the classification network 100 is to classify the class of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a human face, a vehicle, or another object. For example, the classification network 100 may be a binary classification network, in which case it may be used to determine whether the target to be classified in the input image is a target object of a certain type, such as a human face or a pedestrian. As another example, the classification network 100 may be a multi-class classification network, in which case it may be used to determine which type of target object each target to be classified in the input image belongs to.
In an embodiment of the invention, the classification network 100 includes a target direction classification sub-network 120, which is a sub-network capable of classifying the direction of the target to be classified in the input image. The direction of the target to be classified may refer to the positional relationship of key parts of the target in the input image. For example, when the target to be classified is a pedestrian, its direction may include a first direction and a second direction, where the first direction may mean that the pedestrian's head is above and feet are below in the input image (i.e., the person is upright), and the second direction may mean that the pedestrian's feet are above and head is below (i.e., the person is upside down). As another example, when the target to be classified is a pedestrian, its direction may further include a third direction and a fourth direction in addition to the first and second directions, where the third direction may mean that the pedestrian's head is on the left and feet are on the right in the input image, and the fourth direction may mean that the feet are on the left and the head is on the right. As yet another example, when the target to be classified is a human face, its direction may include a first direction and a second direction, where the first direction may mean that the eyes are above and the mouth is below in the input image (i.e., the face is upright), and the second direction may mean that the mouth is above and the eyes are below (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined in other ways, which are not enumerated here.
In general, the target direction classification sub-network 120 may be used to determine the direction of the target to be classified in the input image based on the features extracted by the feature extraction sub-network 110, and this directional characteristic helps in the final judgment of the target's category. Based on this, the target class classification sub-network 130 may output the class classification result of the target to be classified based on the features extracted by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120. Compared with classifying based only on the feature extraction result, the classification network 100 according to the embodiment of the present invention also uses the direction classification result of the target when classifying its class, so a more accurate class classification result can be obtained, and the network can be used in various visual tasks to improve performance.
The classification network 100 of the present invention is further described below in conjunction with Fig. 2. Fig. 2 shows a schematic diagram of a training process of the classification network 100 according to an embodiment of the invention. In Fig. 2, binary classification is taken as an example. As shown in Fig. 2, the sample image I may be an existing sample, and multi-direction samples may be constructed based on the existing samples. Targets to be classified in an actual security scene generally have obvious directionality; taking two directions as an example, a flipped sample image I_flip can therefore be generated by flipping the sample image I up and down. The constructed samples in the different directions are then labeled to obtain sample labels, and the labeled samples in the different directions are input to the classification network 100 to train it.
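Illustratively, this sample-construction step may be sketched as follows for the two-direction (up-down flipping) example of Fig. 2; the (N, C, H, W) tensor layout and the 0/1 direction labels are assumptions made for illustration.

```python
# Build multi-direction training samples by vertically flipping existing samples.
# Direction labels y_flip: 0 = original direction, 1 = flipped direction.
import torch


def build_direction_samples(images: torch.Tensor, class_labels: torch.Tensor):
    """images: (N, C, H, W) existing samples I; class_labels: (N,) class labels y."""
    flipped = torch.flip(images, dims=[2])  # I_flip: flip up-down along the H axis
    all_images = torch.cat([images, flipped], dim=0)
    all_class_labels = torch.cat([class_labels, class_labels], dim=0)  # y is unchanged
    all_direction_labels = torch.cat([
        torch.zeros(len(images), dtype=torch.long),  # y_flip = 0 for original samples
        torch.ones(len(images), dtype=torch.long),   # y_flip = 1 for flipped samples
    ], dim=0)
    return all_images, all_class_labels, all_direction_labels
```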
As shown in Fig. 2, the sample image I and the sample image I_flip are input to the feature extraction sub-network 110. The feature extraction sub-network 110 may include network structures such as convolutional layers and pooling layers, and it extracts feature vectors from the sample images. The output of the feature extraction sub-network 110 is input to the target class classification sub-network 130 and the target direction classification sub-network 120.
The target class classification sub-network 130 may include convolutional layers, pooling layers, fully-connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, the target class classification sub-network 130 outputs a class classification result for the sample image, for example classification probabilities P_cls and 1 - P_cls, where P_cls may represent the probability that the target to be classified in the sample image is a pedestrian and 1 - P_cls may represent the probability that it is not a pedestrian. The loss function of the target class classification sub-network 130 may be denoted L_cls; it may be a common classification loss function such as the cross-entropy loss. L_cls can be computed from the classification probability P_cls and the sample label y, i.e., L_cls = Loss(P_cls, y).
The target direction classification sub-network 120 may include convolutional layers, pooling layers, fully-connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, the target direction classification sub-network 120 outputs a direction classification result for the sample image, for example classification probabilities P_flip and 1 - P_flip, where P_flip may represent the probability that the direction of the target to be classified in the sample image is the first direction (e.g., the forward direction) and 1 - P_flip may represent the probability that it is not the first direction (e.g., the reverse direction). The loss function of the target direction classification sub-network 120 may be denoted L_flip; it can be computed from the classification probability P_flip and the sample label y_flip, i.e., L_flip = Loss(P_flip, y_flip).
During training, the feature extraction sub-network 110, the target class classification sub-network 130, and the target direction classification sub-network 120 may be jointly trained under the supervision of the loss function L of the classification network 100. Illustratively, the loss function L of the classification network 100 may be constructed as the sum of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120, i.e., L = L_cls + L_flip. In other examples, the loss function L of the classification network 100 may also be constructed in other forms based on L_cls and L_flip.
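Illustratively, one joint training step under the loss L = L_cls + L_flip may be sketched as follows, assuming the cross-entropy loss for both terms and reusing the hypothetical ClassificationNetwork and build_direction_samples helpers from the sketches above.

```python
# One joint training step: L = L_cls + L_flip (cross-entropy assumed for both terms).
import torch
import torch.nn.functional as F

model = ClassificationNetwork(num_classes=2, num_directions=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 64, 64)        # stand-in batch of sample images I
class_labels = torch.randint(0, 2, (8,))  # sample labels y
x, y, y_flip = build_direction_samples(images, class_labels)

cls_logits, dir_logits = model(x)
loss_cls = F.cross_entropy(cls_logits, y)        # L_cls = Loss(P_cls, y)
loss_flip = F.cross_entropy(dir_logits, y_flip)  # L_flip = Loss(P_flip, y_flip)
loss = loss_cls + loss_flip                      # L = L_cls + L_flip

optimizer.zero_grad()
loss.backward()
optimizer.step()
```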
It should be noted that, in the embodiment of the present invention, the specific forms of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120 are not limited; L_cls and L_flip may adopt any suitable loss functions, whether existing or developed in the future.
The output of the classification network 100 is P_cls; that is, the output of the target direction classification sub-network 120 is not taken as the overall output of the classification network 100, but is only used to assist the target class classification sub-network 130 in classifying the class of the target to be classified. During network inference, the output of the classification network 100 is identical to that of a classical classification network (i.e., a classification network comprising only a feature extraction sub-network and a target class classification sub-network), namely the discrimination score of the target to be classified. Therefore, the inference process of the whole network does not increase the complexity of forward inference.
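Illustratively, the inference behavior described above may be sketched as follows, again with the hypothetical model from the earlier examples; only the class classification output is taken as the network output.

```python
# Inference: only the class classification output P_cls is taken as the network output;
# the direction branch output is not returned to the caller.
import torch

model.eval()
with torch.no_grad():
    cls_logits, _ = model(torch.randn(1, 3, 64, 64))  # direction logits are discarded
    p_cls = cls_logits.softmax(dim=1)[0, 1]           # discrimination score P_cls
print(float(p_cls))
```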
Based on the above description, the classification network according to the embodiment of the present invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the class classification of the target on the direction classification result of the added branch, so that the classification accuracy of the classification network can be improved along a different dimension without increasing the inference complexity of the network.
An image processing method provided according to another aspect of the present invention, which may be performed by a classification network according to an embodiment of the present invention, is described below with reference to fig. 3. Fig. 3 shows a schematic flow diagram of an image processing method 300 according to an embodiment of the invention. As shown in fig. 3, the image processing method 300 may include the steps of:
In step S310, an input image is acquired, and feature extraction is performed on the input image.
In step S320, a direction classification result of the target to be classified in the input image is generated based on the result of the feature extraction.
In step S330, a category classification result of the target to be classified is generated based on the result of the feature extraction and the direction classification result.
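Illustratively, steps S310 to S330 may be sketched as the following function, in which the three hypothetical sub-network calls from the earlier network sketch stand in for the feature extraction, direction classification, and class classification steps, respectively.

```python
# Functional view of method 300: S310 feature extraction, S320 direction classification,
# S330 class classification using both the features and the direction result.
import torch


def image_processing_method(image: torch.Tensor, net) -> torch.Tensor:
    features = net.feature(image)                             # step S310
    direction_result = net.direction(features)                # step S320
    class_result = net.category(features, direction_result)   # step S330
    return class_result.softmax(dim=1)                        # class probabilities
```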
In the embodiment of the present invention, the target to be classified in the input image may be a target object such as a pedestrian, a human face, a vehicle, or another type of object. The ultimate objective of the image processing method 300 is to classify the class of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a human face, a vehicle, or another object. Taking binary classification as an example, the image processing method 300 may determine whether the target to be classified in the input image is a target object of a certain type, such as a human face or a pedestrian.
In the embodiment of the invention, after feature extraction is performed on the input image, the direction characteristic of the target to be classified in the input image is determined according to the feature extraction result, and the class of the target to be classified is then determined according to the feature extraction result and the direction characteristic of the target. The direction of the target to be classified may refer to the positional relationship of key parts of the target in the input image. For example, when the target to be classified is a pedestrian, its direction may include a first direction and a second direction, where the first direction may mean that the pedestrian's head is above and feet are below in the input image (i.e., the person is upright), and the second direction may mean that the pedestrian's feet are above and head is below (i.e., the person is upside down). As another example, when the target to be classified is a pedestrian, its direction may further include a third direction and a fourth direction in addition to the first and second directions, where the third direction may mean that the pedestrian's head is on the left and feet are on the right in the input image, and the fourth direction may mean that the feet are on the left and the head is on the right. As yet another example, when the target to be classified is a human face, its direction may include a first direction and a second direction, where the first direction may mean that the eyes are above and the mouth is below in the input image (i.e., the face is upright), and the second direction may mean that the mouth is above and the eyes are below (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined in other ways, which are not enumerated here.
Because the directional characteristic of the target to be classified helps in the final judgment of its category, the image processing method according to the embodiment of the invention can obtain a more accurate category classification result than classification based only on the feature extraction result, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the image processing method 300 may be performed by a trained classification network, which may be the classification network described above in conjunction with fig. 1 and 2. The structure and operation of the classification network for performing the image processing method 300 can be understood by those skilled in the art in conjunction with the foregoing description, and thus, for the sake of brevity, will not be described in detail herein.
The image processing method according to the embodiment of the present invention is exemplarily described above. Illustratively, the image processing method according to embodiments of the present invention may be implemented in a device, apparatus, or system having a memory and a processor. In addition, the image processing method can be conveniently deployed on terminal devices such as smart phones, tablet computers, and personal computers. Alternatively, the image processing method according to the embodiment of the present invention may be deployed at a server side (or a cloud side), or deployed in a distributed manner across a server side (or a cloud side) and a personal terminal side.
An image processing apparatus according to still another aspect of the present invention is described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of an image processing apparatus 400 according to an embodiment of the present invention.
As shown in fig. 4, the image processing apparatus 400 according to an embodiment of the present invention includes a feature extraction module 410, a direction classification module 420, and a category classification module 430. The feature extraction module 410 is configured to acquire an input image and perform feature extraction on the input image. The direction classification module 420 is configured to generate a direction classification result of the target to be classified in the input image based on the result of the feature extraction. The category classification module 430 is configured to generate a category classification result of the target to be classified based on the result of the feature extraction and the direction classification result. The respective modules may respectively perform the respective steps/functions of the image processing method described above in connection with fig. 3.
In the embodiment of the present invention, the target to be classified in the input image may be a target object such as a pedestrian, a human face, a vehicle, or another type of object. The ultimate purpose of the image processing apparatus 400 is to classify the class of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a human face, a vehicle, or another target object. Taking binary classification as an example, the image processing apparatus 400 may determine whether the target to be classified in the input image is a target object of a certain type, such as a human face or a pedestrian.
In an embodiment of the present invention, the feature extraction module 410 performs feature extraction on the input image; after the feature extraction result is obtained, the direction classification module 420 determines the direction characteristic of the target to be classified in the input image according to the feature extraction result, and the category classification module 430 then determines the category of the target to be classified based on the feature extraction result and the direction characteristic of the target. The direction of the target to be classified may refer to the positional relationship of key parts of the target in the input image. For example, when the target to be classified is a pedestrian, its direction may include a first direction and a second direction, where the first direction may mean that the pedestrian's head is above and feet are below in the input image (i.e., the person is upright), and the second direction may mean that the pedestrian's feet are above and head is below (i.e., the person is upside down). As another example, when the target to be classified is a pedestrian, its direction may further include a third direction and a fourth direction in addition to the first and second directions, where the third direction may mean that the pedestrian's head is on the left and feet are on the right in the input image, and the fourth direction may mean that the feet are on the left and the head is on the right. As yet another example, when the target to be classified is a human face, its direction may include a first direction and a second direction, where the first direction may mean that the eyes are above and the mouth is below in the input image (i.e., the face is upright), and the second direction may mean that the mouth is above and the eyes are below (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined in other ways, which are not enumerated here.
Because the directional characteristic of the target to be classified helps in the final judgment of its category, the image processing apparatus according to the embodiment of the invention can obtain a more accurate category classification result than classification based only on the feature extraction result, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the modules of the image processing apparatus 400 may be implemented by a trained classification network, which may be the classification network described in conjunction with Figs. 1 and 2. For example, the feature extraction sub-network 110 of the classification network 100 implements the feature extraction module 410 of the image processing apparatus 400, the target direction classification sub-network 120 of the classification network 100 implements the direction classification module 420 of the image processing apparatus 400, and the target class classification sub-network 130 of the classification network 100 implements the category classification module 430 of the image processing apparatus 400. The structure and operation of the modules of the image processing apparatus 400 can be understood by those skilled in the art in conjunction with the foregoing description, and are not described again here for the sake of brevity.
An image processing system provided by a further aspect of the present invention is described below in conjunction with fig. 5. Fig. 5 shows a schematic block diagram of an image processing system 500 according to an embodiment of the invention.
The image processing system 500 includes a storage device 510 and a processor 520.
The storage device 510 stores program code for implementing the respective steps of the image processing method according to an embodiment of the present invention. The processor 520 is configured to run the program code stored in the storage device 510 to perform the respective steps of the image processing method according to the embodiment of the present invention, and to implement the respective modules of the image processing apparatus according to the embodiment of the present invention.
In one embodiment, the program code, when executed by the processor 520, causes the image processing system 500 to perform the steps of: acquiring an input image, and performing feature extraction on the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a classification result of the target to be classified based on the result of the feature extraction and the direction classification result.
In one embodiment of the invention, the steps that when executed by processor 520 cause image processing system 500 to perform are performed by a trained classification network comprising: the feature extraction sub-network is used for extracting features of the input image; a target direction classification sub-network, configured to generate a direction classification result of a target to be classified in the input image based on a result of the feature extraction; and the object class classification sub-network is used for generating a class classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the loss function used by the classification network during training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
Further, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor, are used to perform the respective steps of the image processing method according to an embodiment of the present invention, and to implement the respective modules in the image processing apparatus according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, the computer program instructions may implement the respective functional modules of the image processing apparatus according to the embodiment of the present invention when executed by a computer and/or may perform the image processing method according to the embodiment of the present invention.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: acquiring an input image, and performing feature extraction on the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a classification result of the target to be classified based on the result of the feature extraction and the direction classification result.
In one embodiment of the invention, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps performed by a trained classification network comprising: the feature extraction sub-network is used for extracting features of the input image; a target direction classification sub-network, configured to generate a direction classification result of a target to be classified in the input image based on a result of the feature extraction; and the object class classification sub-network is used for generating a class classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the loss function used by the classification network during training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In addition, according to the embodiment of the present invention, a computer program is also provided, and the computer program may be stored on a storage medium in the cloud or in the local. When being executed by a computer or a processor, for performing the respective steps of the image processing method according to the embodiment of the present invention, and for implementing the respective modules in the image processing apparatus according to the embodiment of the present invention.
Based on the above description, the classification network according to the embodiment of the present invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the class classification of the target on the direction classification result of the added branch, so that the classification accuracy of the classification network can be improved along a different dimension, and the image processing method, apparatus, and system based on this classification network can improve the accuracy of target classification in images.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules in an image processing apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only for the specific embodiment of the present invention or the description thereof, and the protection scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A classification network comprising a feature extraction sub-network, a target direction classification sub-network and a target class classification sub-network, wherein:
the feature extraction sub-network is used for extracting features of the input image;
the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction;
the object class classification sub-network is used for generating a class classification result of the object to be classified based on the feature extraction result and the direction classification result.
2. The classification network of claim 1, wherein the classification network, when trained, employs a penalty function that is a sum of a penalty function of the target class classification sub-network and a penalty function of the target direction classification sub-network.
3. The classification network according to claim 1 or 2, wherein the feature extraction sub-network comprises a convolutional layer and a pooling layer, wherein the target direction classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer, and wherein the target class classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer.
4. An image processing method, characterized in that the image processing method comprises:
acquiring an input image, and performing feature extraction on the input image;
generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and
generating a class classification result of the target to be classified based on the feature extraction result and the direction classification result.
5. The image processing method of claim 4, wherein the image processing method is performed by a trained classification network comprising:
the feature extraction sub-network is used for extracting features of the input image;
a target direction classification sub-network, configured to generate a direction classification result of a target to be classified in the input image based on a result of the feature extraction; and
the target class classification sub-network is used for generating a class classification result of the target to be classified based on the feature extraction result and the direction classification result.
6. The method of claim 5, wherein the classification network employs a penalty function in the training that is a sum of a penalty function of the target class classification sub-network and a penalty function of the target direction classification sub-network.
7. The image processing method of claim 5 or 6, wherein the feature extraction sub-network comprises a convolutional layer and a pooling layer, wherein the target direction classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer, and wherein the target class classification sub-network comprises a convolutional layer, a pooling layer and a fully-connected layer.
8. An image processing apparatus characterized by comprising:
a feature extraction module, configured to acquire an input image and perform feature extraction on the input image;
a direction classification module, configured to generate a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and
a category classification module, configured to generate a category classification result of the target to be classified based on the result of the feature extraction and the direction classification result.
9. An image processing system, characterized in that the image processing system comprises a processor and a storage device having stored thereon a computer program which, when executed by the processor, performs the image processing method according to any one of claims 4-7.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when run, executes the image processing method according to any one of claims 4-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010075053.5A CN111310806B (en) | 2020-01-22 | 2020-01-22 | Classification network, image processing method, device, system and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010075053.5A CN111310806B (en) | 2020-01-22 | 2020-01-22 | Classification network, image processing method, device, system and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111310806A (en) | 2020-06-19 |
| CN111310806B CN111310806B (en) | 2024-03-15 |
Family ID=71145294
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010075053.5A Active CN111310806B (en) | 2020-01-22 | 2020-01-22 | Classification network, image processing method, device, system and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111310806B (en) |
Patent Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102289686A (en) * | 2011-08-09 | 2011-12-21 | 北京航空航天大学 | Method for identifying classes of moving targets based on transfer learning |
| CN103390167A (en) * | 2013-07-18 | 2013-11-13 | 奇瑞汽车股份有限公司 | Multi-characteristic layered traffic sign identification method |
| CN103854016A (en) * | 2014-03-27 | 2014-06-11 | 北京大学深圳研究生院 | Human body behavior classification and identification method and system based on directional common occurrence characteristics |
| CN104253944A (en) * | 2014-09-11 | 2014-12-31 | 陈飞 | Sight connection-based voice command issuing device and method |
| CN105469400A (en) * | 2015-11-23 | 2016-04-06 | 广州视源电子科技股份有限公司 | Method and system for quickly identifying and marking polarity direction of electronic element |
| CN105590116A (en) * | 2015-12-18 | 2016-05-18 | 华南理工大学 | Bird image identification method based on head part alignment |
| US20180039826A1 (en) * | 2016-08-02 | 2018-02-08 | Toyota Jidosha Kabushiki Kaisha | Direction discrimination device and direction discrimination method |
| CN110210535A (en) * | 2019-05-21 | 2019-09-06 | 北京市商汤科技开发有限公司 | Neural network training method and device and image processing method and device |
| CN110263868A (en) * | 2019-06-24 | 2019-09-20 | 北京航空航天大学 | Image classification network based on SuperPoint feature |
Non-Patent Citations (2)
| Title |
|---|
| 刘晓华; 张弛: "Research on moving-object recognition methods based on a BP neural network classifier", no. 24, pages 54-55 * |
| 潘宗序; 安全智; 张冰尘: "Research progress in radar image target recognition based on deep learning", no. 12, pages 98-111 * |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113569846A (en) * | 2021-07-23 | 2021-10-29 | 作业帮教育科技（北京）有限公司 | Text image orientation correction method and device, and electronic equipment |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111310806B (en) | 2024-03-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN118212532B (en) | A method for extracting building change areas in dual-temporal remote sensing images based on twin hybrid attention mechanism and multi-scale feature fusion | |
| CN112434721B (en) | Image classification method, system, storage medium and terminal based on small sample learning | |
| US12437030B2 (en) | Fine-grained classification of retail products | |
| CN108875537B (en) | Object detection method, device and system and storage medium | |
| CN109145766A (en) | Model training method, device, recognition methods, electronic equipment and storage medium | |
| JP2022521038A (en) | Face recognition methods, neural network training methods, devices and electronic devices | |
| CN111931764A (en) | Target detection method, target detection framework and related equipment | |
| CN108009466B (en) | Pedestrian detection method and device | |
| CN111126396A (en) | Image recognition method, device, computer equipment and storage medium | |
| CN110555428B (en) | Pedestrian re-identification method, device, server and storage medium | |
| CN114168768A (en) | Image retrieval method and related equipment | |
| CN114049512A (en) | Model distillation method, target detection method and device and electronic equipment | |
| CN116343287A (en) | Facial expression recognition and model training method, device, equipment and storage medium | |
| CN114581710B (en) | Image recognition method, device, equipment, readable storage medium and program product | |
| CN111340213B (en) | Neural network training method, electronic device, and storage medium | |
| CN115331315A (en) | Living body detection method and living body detection device | |
| CN112800923A (en) | Human body image quality detection method and device, electronic equipment and storage medium | |
| CN110969173A (en) | Target classification method and device | |
| CN114937189B (en) | Model training method and device, target recognition method, equipment, medium | |
| CN111310806B (en) | Classification network, image processing method, device, system and storage medium | |
| CN112418271B (en) | Target detection method, device, system and storage medium | |
| CN111582382A (en) | State recognition method, device and electronic device | |
| CN117274740A (en) | An infrared target detection method and device | |
| CN114511877A (en) | Behavior recognition method and device, storage medium and terminal | |
| Mobsite et al. | A deep learning dual-stream framework for fall detection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |