Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of a method of extracting a feature map of an image, or of an apparatus of extracting a feature map of an image, according to the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as a picture processing application, a picture taking application, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that support image transmission. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices described above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited herein.
The server 105 may be a server that provides various services, such as a background server that performs processing such as feature extraction and feature fusion on images uploaded by the terminal devices 101, 102, and 103. The background server may input an image to be detected into a feature extraction network to obtain feature maps output by at least two feature extraction layers in the feature extraction network, and, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
It should be noted that the method for extracting the feature map of the image provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the apparatus for extracting the feature map of the image is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module, which is not specifically limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of extracting a feature map of an image in accordance with the present disclosure is shown. The method for extracting the feature map of the image comprises the following steps:
Step 201, an image to be detected is obtained.
In the present embodiment, an execution body (for example, the server 105 shown in fig. 1) of the method of extracting the feature map of the image may acquire the image to be detected locally or from a communicatively connected electronic device. The image to be detected may be an arbitrary image, and may be specified by a technician or selected by screening according to certain conditions.
Step 202, inputting an image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by a feature extraction layer of the at least two feature extraction layers.
In this embodiment, the feature extraction network may be any of various artificial neural networks that can be used to extract image features, for example, a Feature Pyramid Network (FPN), a residual network (ResNet), a Convolutional Neural Network (CNN), or the like. The feature extraction network includes at least two feature extraction layers. The feature extraction network generally includes an input layer, an output layer, intermediate layers, and the like, and any layer in the feature extraction network can be regarded as a feature extraction layer. The output of a feature extraction layer is a feature map. A feature map may be a representation of the image after the artificial neural network extracts features from the image. Generally, in the process of processing an image with an artificial neural network, and especially during feature extraction, the output of each layer of the artificial neural network can be regarded as a feature map.
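As an illustrative, non-limiting sketch (assuming the PyTorch library; the stage structure, channel counts, and the name SimpleBackbone are hypothetical and not prescribed by the disclosure), the following code shows one possible feature extraction network in which each of four feature extraction layers outputs a feature map:

```python
import torch
import torch.nn as nn


class SimpleBackbone(nn.Module):
    """A toy feature extraction network with four feature extraction layers."""

    def __init__(self):
        super().__init__()
        # Each stage halves the spatial resolution and increases the channel count.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.stage4 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        # The output of every feature extraction layer is kept as a feature map.
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        c4 = self.stage4(c3)
        return [c1, c2, c3, c4]


image = torch.randn(1, 3, 256, 256)      # a dummy image to be detected
feature_maps = SimpleBackbone()(image)   # one feature map per feature extraction layer
```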
In this embodiment, the feature extraction network includes at least two feature extraction layers, and each of the at least two feature extraction layers may output a feature map. It will be appreciated that the feature maps output by different layers may be different.
Step 203, for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
In this embodiment, for two adjacent feature extraction layers of the at least two feature extraction layers, the execution body may obtain the target feature maps corresponding to the two feature extraction layers in various manners based on the two feature maps output by the two adjacent feature extraction layers.
As an example, the execution body may first perform convolution processing on the two feature maps, respectively, so that the number of channels of the two feature maps is the same. On this basis, the feature map with higher resolution of the two feature maps subjected to the convolution processing is down-sampled so that the two feature maps have the same resolution. Then, the two feature maps with the same resolution may be spliced to obtain the target feature map.
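A minimal sketch of this example, assuming PyTorch (the helper name fuse_adjacent, the output channel count, and bilinear down-sampling are illustrative assumptions rather than the disclosed implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_adjacent(feat_high_res, feat_low_res, out_channels=128):
    """Fuse the feature maps output by two adjacent feature extraction layers."""
    # 1) Convolve each feature map so that both have the same number of channels.
    conv_high = nn.Conv2d(feat_high_res.shape[1], out_channels, kernel_size=1)
    conv_low = nn.Conv2d(feat_low_res.shape[1], out_channels, kernel_size=1)
    high = conv_high(feat_high_res)
    low = conv_low(feat_low_res)
    # 2) Down-sample the higher-resolution map so the two resolutions match.
    high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
    # 3) Splice (concatenate) along the channel dimension to obtain the target feature map.
    return torch.cat([high, low], dim=1)


target = fuse_adjacent(torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32))
print(target.shape)  # torch.Size([1, 256, 32, 32])
```

In practice the 1x1 convolutions would be trained modules rather than freshly initialized layers; they are created inline here only to keep the sketch self-contained.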
According to the method provided by the embodiment of the present disclosure, an image to be detected is acquired; the image to be detected is input into a feature extraction network, where the feature extraction network includes at least two feature extraction layers, and a feature map output by a feature extraction layer of the at least two feature extraction layers is obtained; and for two adjacent feature extraction layers of the at least two feature extraction layers, target feature maps corresponding to the two feature extraction layers are obtained based on the two feature maps output by the two feature extraction layers. In this process, the two feature maps output by two adjacent feature extraction layers are fused to obtain the target feature maps corresponding to the two adjacent feature extraction layers, making the feature map fusion more complete. Further, if the target feature map is used for subsequent prediction, the accuracy of the prediction can be improved.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method of extracting a feature map of an image is shown. The method 300 for extracting the feature map of the image comprises the following steps:
Step 301, obtaining an image to be detected.
Step 302, inputting an image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by a feature extraction layer of the at least two feature extraction layers.
In this embodiment, for the specific implementation of steps 301 and 302 and the technical effects thereof, reference may be made to steps 201 and 202 in the embodiment corresponding to fig. 2, and details are not described herein again.
Step 303, for two adjacent feature extraction layers of the at least two feature extraction layers, steps 3031 to 3033 are executed to obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
Step 3031, for each feature extraction layer of the at least two feature extraction layers, performing dimensionality reduction and resampling on the feature map output by the feature extraction layer to obtain a resampled feature map corresponding to the feature extraction layer, wherein the resolutions of the resampled feature maps corresponding to the at least two feature extraction layers meet a preset condition.
In this embodiment, the execution body of the method of extracting the feature map of the image may perform dimensionality reduction on the feature maps output by the respective feature extraction layers, and then resample the dimension-reduced feature maps. Up-sampling or down-sampling may be performed as needed. Depending on actual needs and on the preset condition, some or all of the feature maps output by the feature extraction layers may be resampled. In practice, the preset condition may be any of various conditions; for example, it may require that the resolutions of the resampled feature maps be the same.
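One possible realization of the dimensionality reduction and resampling, sketched with PyTorch (the 1x1 convolution, bilinear interpolation, and the target resolution of 32x32 are assumptions; the disclosure does not fix these choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def reduce_and_resample(feature_map, out_channels, target_size):
    """Dimensionality reduction followed by up- or down-sampling to a preset resolution."""
    # Reduce the channel dimension with a 1x1 convolution.
    reduce = nn.Conv2d(feature_map.shape[1], out_channels, kernel_size=1)
    reduced = reduce(feature_map)
    # Resample so that the map satisfies the preset condition (here: a common resolution).
    return F.interpolate(reduced, size=target_size, mode="bilinear", align_corners=False)


# Toy feature maps from four feature extraction layers with different shapes.
maps = [torch.randn(1, c, s, s) for c, s in [(32, 128), (64, 64), (128, 32), (256, 16)]]
# Example preset condition: every resampled feature map has 64 channels at 32x32 resolution.
resampled = [reduce_and_resample(fm, out_channels=64, target_size=(32, 32)) for fm in maps]
```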
Step 3032, rearranging the resampled feature maps corresponding to the at least two feature extraction layers to obtain a rearranged feature map corresponding to each feature extraction layer of the at least two feature extraction layers.
In this embodiment, the execution body may rearrange the resampled feature maps corresponding to the at least two feature extraction layers to obtain a rearranged feature map corresponding to each of the at least two feature extraction layers.
Taking four feature extraction layers as an example, each feature extraction layer corresponds to one resampled feature map and one rearranged feature map. For the four resampled feature maps, the execution body may select the value of the first channel of each resampled feature map as the value of the first channel of the first rearranged feature map, and then select the value of the second channel of each resampled feature map as the value of the first channel of the second rearranged feature map. Similarly, the value of the third channel of each resampled feature map is selected as the value of the first channel of the third rearranged feature map, and the value of the fourth channel of each resampled feature map is selected as the value of the first channel of the fourth rearranged feature map. Then, the value of the fifth channel of each resampled feature map is selected as the value of the second channel of the first rearranged feature map. By analogy, all channels of the four resampled feature maps can be rearranged, so that four rearranged feature maps are obtained.
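The following sketch expresses one reading of this rearrangement in PyTorch (the grouping convention is inferred from the example above and is an assumption; channel counts divisible by the number of layers are also assumed):

```python
import torch


def rearrange_maps(resampled):
    """Rearrange channels across the resampled feature maps.

    For N resampled maps, the k-th rearranged map gathers channels k, k+N, k+2N, ...
    from every resampled map, keeping the N values of each group side by side.
    """
    n = len(resampled)
    channels = resampled[0].shape[1]
    rearranged = []
    for k in range(n):
        groups = []
        for start in range(k, channels, n):
            # Channel `start` of every resampled map forms one group of n channels.
            groups.append(torch.cat([fm[:, start:start + 1] for fm in resampled], dim=1))
        rearranged.append(torch.cat(groups, dim=1))
    return rearranged


resampled = [torch.randn(1, 64, 32, 32) for _ in range(4)]  # four toy resampled feature maps
rearranged = rearrange_maps(resampled)  # four rearranged feature maps, each with 64 channels
```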
Step 3033, for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two rearranged feature maps corresponding to the two feature extraction layers.
In this embodiment, the execution body may obtain the target feature maps corresponding to the two feature extraction layers in a plurality of ways based on the two rearranged feature maps corresponding to the two feature extraction layers. As an example, the execution body may directly splice the two rearranged feature maps to obtain the target feature maps corresponding to the two feature extraction layers. In practice, the splicing of feature maps may be implemented by various existing functions (e.g., concat).
In some optional implementations of this embodiment, the target feature maps corresponding to the two feature extraction layers may be obtained through the following steps:
Step one, splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain a spliced feature map corresponding to the two feature extraction layers.
Step two, inputting the spliced feature map corresponding to the two feature extraction layers into an attention model to obtain the target feature maps corresponding to the two feature extraction layers.
The attention model (Attention Model) is used to weight at least one channel of the feature map, so that each channel may have a different weight value.
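The disclosure does not prescribe a particular attention architecture; as a hedged illustration, a squeeze-and-excitation style channel attention block in PyTorch could weight the channels of the spliced feature map as follows (the reduction ratio and the module name ChannelAttention are assumptions):

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Assigns a learned weight value to each channel of the spliced feature map."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: one descriptor per channel via global average pooling.
        weights = self.fc(x.mean(dim=(2, 3)))        # shape (N, C)
        # Excite: scale every channel by its weight to obtain the target feature map.
        return x * weights[:, :, None, None]


spliced = torch.cat([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)], dim=1)
target = ChannelAttention(channels=128)(spliced)     # target feature map, 128 channels
```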
In some optional implementations of this embodiment, the method may further include: inputting the target feature map into an image prediction network to obtain prediction result information of the image to be detected. Depending on actual needs, the image prediction network may be an object detection network, an image segmentation network, a pose detection network, or the like.
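As a minimal sketch (the head structure and the number of classes are hypothetical; any object detection, image segmentation, or pose detection network could take the place of this toy head), an image prediction network consuming a target feature map might look like:

```python
import torch
import torch.nn as nn

# A toy segmentation-style prediction head: maps a target feature map to per-pixel class scores.
prediction_head = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 21, kernel_size=1),   # 21 is a hypothetical number of classes
)

target_feature_map = torch.randn(1, 128, 32, 32)     # e.g., the output of the attention model
prediction = prediction_head(target_feature_map)     # shape (1, 21, 32, 32)
```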
With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the method of extracting a feature map of an image according to the present embodiment. In the application scenario of fig. 4, the execution body of the method of extracting the feature map of the image may be a server, and the feature extraction network is a feature pyramid network 402. As shown, the feature pyramid network 402 includes four feature extraction layers 4021-4024. First, the server may acquire an image to be detected 401. Then, the image to be detected 401 is input into the feature pyramid network 402 to obtain feature maps 4031-4034 respectively output by the four feature extraction layers 4021-4024. On this basis, each of the four feature maps 4031-4034 is subjected to dimensionality reduction and resampling to obtain resampled feature maps 4041-4044 corresponding to the respective feature extraction layers. The resolutions and the numbers of channels of the resampled feature maps 4041-4044 are the same. On this basis, the resampled feature maps 4041-4044 can be rearranged to obtain rearranged feature maps 4051-4054.
Specifically, the value of the first channel of each of the resampled feature maps 4041-4044 is used as the value of the first channel of the rearranged feature map 4051, the value of the second channel of each of the resampled feature maps 4041-4044 is used as the value of the first channel of the rearranged feature map 4052, and so on. Then, the value of the fifth channel of each of the resampled feature maps 4041-4044 is used as the value of the second channel of the rearranged feature map 4051, the value of the sixth channel of each of the resampled feature maps 4041-4044 is used as the value of the second channel of the rearranged feature map 4052, and so on, until all channels have been rearranged. In this way, the rearranged feature maps 4051-4054 respectively corresponding to the four feature extraction layers 4021-4024 can be obtained.
On this basis, for two adjacent feature extraction layers among the four feature extraction layers 4021-4024, the target feature maps corresponding to the two feature extraction layers can be obtained based on the two rearranged feature maps corresponding to the two feature extraction layers. Specifically, the pairs of adjacent feature extraction layers among the four feature extraction layers 4021-4024 are: feature extraction layers 4021 and 4022, feature extraction layers 4022 and 4023, and feature extraction layers 4023 and 4024. Taking the feature extraction layers 4021 and 4022 as an example, the two corresponding rearranged feature maps 4051 and 4052 may be spliced and then weighted channel by channel through the attention model, so as to obtain the target feature map 4062. In a similar manner, the target feature maps 4063 and 4064 can be obtained. Further, if necessary, the rearranged feature map 4051 may be directly passed through the attention model to obtain a target feature map 4061.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, in this embodiment the target feature map is obtained by performing operations such as dimensionality reduction, resampling, rearrangement, splicing, and channel weighting on the feature maps. The feature maps can therefore be fused more fully, and in particular, feature maps output by non-adjacent feature extraction layers can also be fused fully, so that the features indicated by the target feature map are more accurate. This provides an accurate feature basis for subsequent processing such as image segmentation.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an apparatus for extracting a feature map of an image, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for extracting a feature map of an image according to the present embodiment includes: an acquisition unit 501, a feature extraction unit 502, and a feature fusion unit 503. Wherein the acquisition unit 501 is configured to acquire an image to be detected. The feature extraction unit 502 is configured to input the image to be detected into a feature extraction network, where the feature extraction network includes at least two feature extraction layers, and obtains a feature map output by a feature extraction layer of the at least two feature extraction layers. The feature fusion unit 503 is configured to, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers.
In some optional implementations of this embodiment, the apparatus 500 further includes: a resampling unit (not shown in the figure). The resampling unit is configured to perform dimensionality reduction and resampling on a feature map output by a feature extraction layer to obtain a resampled feature map corresponding to the feature extraction layer, wherein the resolution of the resampled feature map corresponding to the feature extraction layer of the at least two feature extraction layers meets a preset condition.
In some optional implementations of this embodiment, the apparatus 500 further includes: a rearrangement unit (not shown in the figure). The rearrangement unit is configured to rearrange the resampled feature maps corresponding to the at least two feature extraction layers to obtain rearranged feature maps corresponding to the feature extraction layers in the at least two feature extraction layers; and the feature fusion unit 503 is further configured to: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
In some optional implementations of this embodiment, the feature fusion unit 503 is further configured to: splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers; and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
In some optional implementations of this embodiment, the apparatus 500 further includes: a prediction unit (not shown in the figure). The prediction unit is configured to input the target feature map into an image prediction network, and obtain prediction result information of the image to be detected.
In the present embodiment, the acquisition unit may first acquire an image to be detected. Then, the feature extraction unit inputs the image to be detected into a feature extraction network, where the feature extraction network includes at least two feature extraction layers, and obtains a feature map output by a feature extraction layer of the at least two feature extraction layers. On this basis, for two adjacent feature extraction layers of the at least two feature extraction layers, the feature fusion unit obtains target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers. In this process, the two feature maps output by two adjacent feature extraction layers are fused to obtain the target feature maps corresponding to the two adjacent feature extraction layers, making the feature map fusion more complete. Further, if the target feature map is used for subsequent prediction, the accuracy of the prediction can be improved.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF, etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an image to be detected; input the image to be detected into a feature extraction network, where the feature extraction network includes at least two feature extraction layers, and obtain a feature map output by a feature extraction layer of the at least two feature extraction layers; and for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquisition unit, a feature extraction unit, and a feature fusion unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires an image to be detected".
The foregoing description presents only the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned features, but also encompasses other technical solutions formed by any combination of the above-mentioned features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.