
CN109816037B - Method and device for extracting feature map of image - Google Patents


Info

Publication number
CN109816037B
CN109816037B (application number CN201910098641.8A)
Authority
CN
China
Prior art keywords
feature
feature extraction
extraction layers
map
layers
Prior art date
Legal status
Active
Application number
CN201910098641.8A
Other languages
Chinese (zh)
Other versions
CN109816037A (en)
Inventor
喻冬东
王长虎
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910098641.8A
Publication of CN109816037A
Application granted
Publication of CN109816037B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose a method and a device for extracting a feature map of an image. One embodiment of the method comprises: acquiring an image to be detected; inputting the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by each feature extraction layer of the at least two feature extraction layers; and, for two adjacent feature extraction layers of the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers. This embodiment enables a more thorough fusion of the feature maps.

Description

Method and device for extracting feature map of image
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for extracting a feature map of an image.
Background
In existing image processing tasks such as image segmentation, pose detection, and object detection, an image is generally input into a feature extraction network, and the feature map output by the network is then used for subsequent prediction.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for extracting a feature map of an image.
In a first aspect, an embodiment of the present disclosure provides a method for extracting a feature map of an image, where the method includes: acquiring an image to be detected; inputting the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by each feature extraction layer of the at least two feature extraction layers; and, for two adjacent feature extraction layers of the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
In some embodiments, before obtaining, for two adjacent feature extraction layers of the at least two feature extraction layers, the target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers, the method further includes: for each feature extraction layer of the at least two feature extraction layers, performing dimensionality reduction and resampling on the feature map output by the feature extraction layer to obtain a resampled feature map corresponding to the feature extraction layer, wherein the resolutions of the resampled feature maps corresponding to the at least two feature extraction layers meet a preset condition.
In some embodiments, the method further comprises: rearranging the resampling feature maps corresponding to the at least two feature extraction layers to obtain rearranged feature maps corresponding to the feature extraction layers in the at least two feature extraction layers; and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers, wherein the obtaining comprises: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
In some embodiments, obtaining the target feature maps corresponding to the two feature extraction layers based on the two rearranged feature maps corresponding to the two feature extraction layers includes: splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers; and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
In some embodiments, the method further comprises: inputting the target feature map into an image prediction network to obtain prediction result information of the image to be detected.
In a second aspect, an embodiment of the present disclosure provides an apparatus for extracting a feature map of an image, including: an acquisition unit configured to acquire an image to be detected; a feature extraction unit configured to input the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and to obtain a feature map output by each feature extraction layer of the at least two feature extraction layers; and a feature fusion unit configured to, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
In some embodiments, the apparatus further comprises: and the resampling unit is configured to perform dimensionality reduction and resampling on the feature map output by the feature extraction layer to obtain a resampling feature map corresponding to the feature extraction layer, wherein the resolution of the resampling feature map corresponding to the feature extraction layer of the at least two feature extraction layers meets a preset condition.
In some embodiments, the apparatus further comprises: the rearrangement unit is configured to rearrange the resampling feature maps corresponding to the at least two feature extraction layers to obtain a rearrangement feature map corresponding to a feature extraction layer in the at least two feature extraction layers; and the feature fusion unit is further configured to: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
In some embodiments, the feature fusion unit is further configured to: splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers; and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
In some embodiments, the apparatus further comprises: a prediction unit configured to input the target feature map into an image prediction network to obtain prediction result information of the image to be detected.
In a third aspect, an embodiment of the present disclosure provides a server, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method as described in any of the implementations of the first aspect.
The method and the device provided by the embodiments of the disclosure can acquire an image to be detected; input the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtain a feature map output by each feature extraction layer of the at least two feature extraction layers; and, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers. In this process, the two feature maps output by two adjacent feature extraction layers are fused to obtain the target feature maps corresponding to those two layers, enabling a more complete fusion of the feature maps. Further, if the target feature map is used for subsequent prediction, the accuracy of prediction can be improved.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of extracting a feature map of an image according to the present disclosure;
FIG. 3 is a flow diagram of yet another embodiment of a method of extracting a feature map of an image according to the present disclosure;
FIG. 4 is a schematic diagram of an application scenario of a method of extracting a feature map of an image according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of an apparatus for extracting a feature map of an image according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of a method of extracting a feature map of an image or an apparatus of extracting a feature map of an image to which an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as a picture processing application, a picture taking application, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal apparatuses 101, 102, 103 are hardware, they may be various electronic apparatuses that support image transmission. When the terminal apparatuses 101, 102, 103 are software, they may be installed in the above-described electronic apparatuses and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made herein.
The server 105 may be a server that provides various services, such as a background server that performs processing such as feature extraction and feature fusion on images uploaded by the terminal apparatuses 101, 102, and 103. The background server can input the image to be detected into the feature extraction network to obtain feature maps output by at least two feature extraction layers in the feature extraction network. And for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
It should be noted that the method for extracting the feature map of the image provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the apparatus for extracting the feature map of the image is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of extracting a feature map of an image in accordance with the present disclosure is shown. The method for extracting the feature map of the image comprises the following steps:
step 201, an image to be detected is obtained.
In the present embodiment, an execution subject (for example, the server 105 shown in fig. 1) of the method of extracting the feature map of the image may acquire the image to be detected locally or from a communicatively connected electronic device. The image to be detected may be an arbitrary image; it may be specified by a technician or selected according to certain conditions.
Step 202, inputting an image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature graph output by a feature extraction layer of the at least two feature extraction layers.
In this embodiment, the feature extraction network may be any of various artificial neural networks that can be used to extract image features, for example a Feature Pyramid Network (FPN), a residual network (ResNet), a Convolutional Neural Network (CNN), or the like. The feature extraction network includes at least two feature extraction layers. A feature extraction network generally includes an input layer, an output layer, intermediate layers, and so on, and any layer in the feature extraction network can be regarded as a feature extraction layer. The output of a feature extraction layer is a feature map, that is, a representation of the image after the artificial neural network has extracted features from it. Generally, in the process of processing an image with an artificial neural network, especially during feature extraction, the output of each layer of the network can be regarded as a feature map.
In this embodiment, the feature extraction network includes at least two feature extraction layers. Each of the at least two feature extraction layers may output a feature map. It will be appreciated that the feature maps output by different layers may differ.
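By way of illustration only, the following minimal sketch shows such a feature extraction network whose layers each yield a feature map. PyTorch is assumed here, and the layer widths and strides are illustrative, not taken from the disclosure:

import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Minimal sketch of a feature extraction network with four feature extraction layers."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.layer1(x)          # each feature extraction layer outputs a feature map
        f2 = self.layer2(f1)
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        return [f1, f2, f3, f4]

features = TinyBackbone()(torch.randn(1, 3, 224, 224))   # dummy image to be detected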
Step 203, for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
In this embodiment, for two adjacent feature extraction layers of the at least two feature extraction layers, the execution subject may obtain the target feature maps corresponding to the two feature extraction layers in various manners based on the two feature maps output by the two adjacent feature extraction layers.
As an example, the execution body may first perform convolution processing on the two feature maps respectively, so that the two feature maps have the same number of channels. On this basis, the higher-resolution of the two convolved feature maps is down-sampled so that the two feature maps have the same resolution. Then, the two feature maps with the same resolution may be spliced to obtain the target feature map.
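A minimal sketch of this example fusion follows, assuming PyTorch, 1x1 convolutions for the channel equalisation, and bilinear interpolation for the down-sampling; none of these choices, nor the out_channels value, is prescribed by the disclosure:

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_adjacent(feat_hi, feat_lo, out_channels=64):
    """Equalise channels with 1x1 convolutions, down-sample the higher-resolution
    map to the lower resolution, then splice the two maps along the channel axis."""
    conv_hi = nn.Conv2d(feat_hi.shape[1], out_channels, kernel_size=1)
    conv_lo = nn.Conv2d(feat_lo.shape[1], out_channels, kernel_size=1)
    a = F.interpolate(conv_hi(feat_hi), size=feat_lo.shape[-2:],
                      mode="bilinear", align_corners=False)
    return torch.cat([a, conv_lo(feat_lo)], dim=1)        # target feature map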
The method provided by the embodiment of the disclosure can acquire an image to be detected; input the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtain a feature map output by each feature extraction layer of the at least two feature extraction layers; and, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers. In this process, the two feature maps output by two adjacent feature extraction layers are fused to obtain the target feature maps corresponding to those two layers, enabling a more complete fusion of the feature maps. Further, if the target feature map is used for subsequent prediction, the accuracy of prediction can be improved.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method of extracting a feature map of an image is shown. The method 300 for extracting the feature map of the image comprises the following steps:
step 301, obtaining an image to be detected.
Step 302, inputting an image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by a feature extraction layer of the at least two feature extraction layers.
In this embodiment, for the specific implementation of steps 301 and 302 and the technical effects thereof, reference may be made to steps 201 and 202 in the embodiment corresponding to fig. 2, which are not described again here.
Step 303, for two adjacent feature extraction layers of the at least two feature extraction layers, steps 3031-3033 are executed to obtain target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers.
Step 3031, for the feature extraction layer in the at least two feature extraction layers, performing dimensionality reduction and resampling on the feature map output by the feature extraction layer to obtain a resampled feature map corresponding to the feature extraction layer, wherein the resolution of the resampled feature map corresponding to the feature extraction layer in the at least two feature extraction layers meets a preset condition.
In this embodiment, the executing body of the method for extracting the feature map of the image may perform dimensionality reduction on the feature map output by each feature extraction layer, and then resample the reduced feature map; up-sampling or down-sampling may be performed as needed. Depending on actual needs and the preset condition, some or all of the feature maps output by the feature extraction layers may be resampled. In practice, the preset condition may be any of various conditions, for example that the resolutions of the feature maps are the same.
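As a sketch only, step 3031 might be realised as follows, assuming PyTorch, a 1x1 convolution for the dimensionality reduction, bilinear interpolation for the resampling, and "all resolutions equal out_size" as the preset condition; in practice the 1x1 convolutions would be trained layers of the network rather than freshly constructed:

import torch.nn as nn
import torch.nn.functional as F

def reduce_and_resample(feature_maps, out_channels=64, out_size=(56, 56)):
    """Reduce each feature map's dimensionality and resample it to a common resolution."""
    resampled = []
    for fmap in feature_maps:                              # fmap: (B, C_i, H_i, W_i)
        reduce = nn.Conv2d(fmap.shape[1], out_channels, kernel_size=1)
        fmap = F.interpolate(reduce(fmap), size=out_size,
                             mode="bilinear", align_corners=False)
        resampled.append(fmap)                             # (B, out_channels, *out_size)
    return resampled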
Step 3032, rearranging the resample feature maps corresponding to the at least two feature extraction layers to obtain rearranged feature maps corresponding to the feature extraction layers in the at least two feature extraction layers.
In this embodiment, for the resampled feature maps corresponding to the at least two feature extraction layers, the execution main body may rearrange to obtain a rearranged feature map corresponding to each of the at least two feature extraction layers.
Taking four feature extraction layers as an example, each feature extraction layer corresponds to one resampled feature map and one rearranged feature map. For the four resampled feature maps, the execution body may select the value of the first channel of each resampled feature map as the value of the first channel of the first rearranged feature map, and then select the value of the second channel of each resampled feature map as the value of the first channel of the second rearranged feature map. Similarly, the value of the third channel of each resampled feature map is selected as the value of the first channel of the third rearranged feature map, and the value of the fourth channel of each resampled feature map is selected as the value of the first channel of the fourth rearranged feature map. Then, the value of the fifth channel of each resampled feature map is selected as the value of the second channel of the first rearranged feature map. By analogy, all channels of the four resampled feature maps can be rearranged, so that four rearranged feature maps are obtained.
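One possible reading of this rearrangement is a channel shuffle across the resampled feature maps. The sketch below (PyTorch, illustrative tensor layout) follows the ordering described above and assumes the channel count is divisible by the number of feature extraction layers:

import torch

def rearrange_feature_maps(resampled):
    """resampled: list of n tensors, each (B, C, H, W) with equal resolution and C
    divisible by n. Group g of rearranged map j gathers channel g*n + j from every
    resampled map; the layout within each group is an assumption."""
    n = len(resampled)
    b, c, h, w = resampled[0].shape
    stacked = torch.stack(resampled, dim=2)           # (B, C, n, H, W)
    rearranged = []
    for j in range(n):
        groups = stacked[:, j::n, :, :, :]            # (B, C//n, n, H, W)
        rearranged.append(groups.reshape(b, c, h, w)) # groups laid out consecutively
    return rearranged

# example: four resampled maps, 8 channels each -> four rearranged maps, 8 channels each
maps = [torch.randn(1, 8, 56, 56) for _ in range(4)]
rearranged = rearrange_feature_maps(maps)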
Step 3033, for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on the two rearranged feature maps corresponding to the two feature extraction layers.
In this embodiment, the execution body may obtain the target feature maps corresponding to the two feature extraction layers by a plurality of methods based on the two rearranged feature maps corresponding to the two feature extraction layers. As an example, the execution main body may directly splice two rearranged feature maps to obtain the target feature maps corresponding to the two feature extraction layers. In practice, the splicing of the feature map may be implemented by various existing functions (e.g., concat).
In some optional implementation manners of this embodiment, the target feature maps corresponding to the two feature extraction layers may be obtained through the following steps:
and step one, splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain a spliced feature map corresponding to the two feature extraction layers.
And secondly, inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers.
Wherein the attention model is used to weight at least one channel of the feature map, so that each channel may have a different weight value.
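For the splicing and channel weighting described in this implementation, a sketch is given below, assuming PyTorch and a squeeze-and-excitation style attention model; the disclosure only requires that at least one channel be weighted, so the pooling and fully connected structure here is an assumption:

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of an attention model that weights the channels of a feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W) spliced feature map
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))    # global average pooling per channel
        return x * weights.view(b, c, 1, 1)      # per-channel weighted target feature map

# splice two rearranged feature maps along the channel axis, then weight the channels
spliced = torch.cat([torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56)], dim=1)
target = ChannelAttention(spliced.shape[1])(spliced)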
In some optional implementations of this embodiment, the method may further include: inputting the target feature map into an image prediction network to obtain prediction result information of the image to be detected. According to actual needs, the image prediction network may be an object detection network, an image segmentation network, a pose detection network, or the like.
With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the method for extracting a feature map of an image according to the present embodiment. In the application scenario of fig. 4, the execution subject of the method may be a server, and the feature extraction network is a feature pyramid network 402. As shown, the feature pyramid network 402 includes four feature extraction layers 4021-4024. First, the server may acquire an image to be detected 401. Then, the image to be detected 401 is input into the feature pyramid network 402 to obtain the feature maps 4031-4034 respectively output by the four feature extraction layers 4021-4024. On this basis, each of the four feature maps 4031-4034 is subjected to dimensionality reduction and resampling to obtain the resampled feature maps 4041-4044 corresponding to the respective feature extraction layers; the resolutions and the numbers of channels of the resampled feature maps 4041-4044 are the same. The resampled feature maps 4041-4044 are then rearranged.
Specifically, the value of the first channel of each of the resampled feature maps 4041-4044 is used as the value of the first channel of the rearranged feature map 4051, the value of the second channel of each resampled feature map is used as the value of the first channel of the rearranged feature map 4052, and so on for the rearranged feature maps 4053 and 4054. Then, the value of the fifth channel of each of the resampled feature maps 4041-4044 is used as the value of the second channel of the rearranged feature map 4051, the value of the sixth channel of each resampled feature map is used as the value of the second channel of the rearranged feature map 4052, and so on until all channels have been arranged. In this way, the rearranged feature maps 4051-4054 corresponding respectively to the four feature extraction layers 4021-4024 can be obtained.
On this basis, for two adjacent feature extraction layers of the four feature extraction layers 4021-4024, the target feature maps corresponding to the two feature extraction layers can be obtained based on the two rearranged feature maps corresponding to the two feature extraction layers. Specifically, the pairs of adjacent feature extraction layers among the four feature extraction layers 4021-4024 are: feature extraction layers 4021 and 4022, feature extraction layers 4022 and 4023, and feature extraction layers 4023 and 4024. Taking the feature extraction layers 4021 and 4022 as an example, the two corresponding rearranged feature maps 4051 and 4052 may be spliced, and each channel of the spliced map weighted by the attention model, so as to obtain the target feature map 4062. In a similar manner, the target feature maps 4063 and 4064 can be obtained. Further, if necessary, the rearranged feature map 4051 may be passed directly through the attention model to obtain the target feature map 4061.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, in this embodiment the target feature map is obtained by performing operations such as dimensionality reduction and resampling, rearrangement, splicing, and channel weighting on the feature maps. The feature maps can therefore be fused more fully; in particular, feature maps output by non-adjacent feature extraction layers can also be fused. This makes the features indicated by the target feature map more accurate and provides an accurate feature basis for subsequent processing such as image segmentation.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an apparatus for extracting a feature map of an image, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for extracting a feature map of an image according to the present embodiment includes: an acquisition unit 501, a feature extraction unit 502, and a feature fusion unit 503. Wherein the acquisition unit 501 is configured to acquire an image to be detected. The feature extraction unit 502 is configured to input the image to be detected into a feature extraction network, where the feature extraction network includes at least two feature extraction layers, and obtains a feature map output by a feature extraction layer of the at least two feature extraction layers. The feature fusion unit 503 is configured to, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers.
In some optional implementations of this embodiment, the apparatus 500 further includes: a resampling unit (not shown in the figure). The resampling unit is configured to perform dimensionality reduction and resampling on a feature map output by a feature extraction layer to obtain a resampling feature map corresponding to the feature extraction layer, wherein the resolution of the resampling feature map corresponding to the feature extraction layer of the at least two feature extraction layers meets a preset condition.
In some optional implementations of this embodiment, the apparatus 500 further includes: a rearrangement unit (not shown in the figure). The rearrangement unit is configured to rearrange the resampled feature maps corresponding to the at least two feature extraction layers to obtain rearranged feature maps corresponding to the feature extraction layers in the at least two feature extraction layers; and the feature fusion unit 503 is further configured to: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
In some optional implementations of this embodiment, the feature fusion unit 503 is further configured to: splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers; and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
In some optional implementations of this embodiment, the apparatus 500 further includes: a prediction unit (not shown in the figure). The prediction unit is configured to input the target feature map into an image prediction network, and obtain prediction result information of the image to be detected.
In the present embodiment, the acquisition unit may first acquire an image to be detected. The feature extraction unit then inputs the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtains a feature map output by each feature extraction layer of the at least two feature extraction layers. On this basis, for two adjacent feature extraction layers of the at least two feature extraction layers, the feature fusion unit obtains the target feature maps corresponding to the two feature extraction layers based on the two feature maps output by the two feature extraction layers. In this process, the two feature maps output by two adjacent feature extraction layers are fused to obtain the target feature maps corresponding to those two layers, enabling a more complete fusion of the feature maps. Further, if the target feature map is used for subsequent prediction, the accuracy of prediction can be improved.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF, etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an image to be detected; inputting an image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature graph output by the feature extraction layer in the at least two feature extraction layers; and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprising: the device comprises an acquisition unit, a feature extraction unit and a feature fusion unit. The names of these units do not in some cases form a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires an image to be detected".
The foregoing description is only of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, and also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, embodiments formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (8)

1. A method of extracting a feature map of an image, comprising:
acquiring an image to be detected;
inputting the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and obtaining a feature map output by a feature extraction layer of the at least two feature extraction layers;
for the feature extraction layer of the at least two feature extraction layers, performing dimensionality reduction and resampling on the feature map output by the feature extraction layer to obtain a resampled feature map corresponding to the feature extraction layer, wherein the resolution of the resampled feature map corresponding to the feature extraction layer of the at least two feature extraction layers meets a preset condition;
for the resampling feature maps corresponding to the at least two feature extraction layers, selecting a value of a first channel of each resampling feature map as a value of a first channel of a first rearrangement feature map, selecting a value of a second channel of each resampling feature map as a value of a first channel of a second rearrangement feature map, and so on, determining a value of the first channel of each rearrangement feature map, after determining the value of the first channel of each rearrangement feature map, determining a value of the second channel of each rearrangement feature map by adopting a determination mode of the value of the first channel of each rearrangement feature map according to an ascending sequence of the sequence number corresponding to the channel of each resampling feature map, and so on, to obtain a rearrangement feature map corresponding to a feature extraction layer in the at least two feature extraction layers;
for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers, including: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
2. The method according to claim 1, wherein the obtaining the target feature maps corresponding to the two feature extraction layers based on the two rearranged feature maps corresponding to the two feature extraction layers comprises:
splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers;
and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
3. The method according to any one of claims 1-2, wherein the method further comprises:
and inputting the target feature map into an image prediction network to obtain the prediction result information of the image to be detected.
4. An apparatus for extracting a feature map of an image, comprising:
an acquisition unit configured to acquire an image to be detected;
a feature extraction unit configured to input the image to be detected into a feature extraction network, wherein the feature extraction network comprises at least two feature extraction layers, and to obtain a feature map output by a feature extraction layer of the at least two feature extraction layers;
the resampling unit is configured to perform dimensionality reduction and resampling on a feature map output by a feature extraction layer of the at least two feature extraction layers to obtain a resampling feature map corresponding to the feature extraction layer, wherein the resolution of the resampling feature map corresponding to the feature extraction layer of the at least two feature extraction layers meets a preset condition;
a rearranging unit configured to select, for the resampled feature maps corresponding to the at least two feature extraction layers, a value of a first channel of each resampled feature map as a value of a first channel of a first rearranged feature map, a value of a second channel of each resampled feature map as a value of a first channel of a second rearranged feature map, and so on, determine a value of the first channel of each rearranged feature map, determine, after determining the value of the first channel of each rearranged feature map, a value of the second channel of each rearranged feature map in a manner of determining the value of the first channel of each rearranged feature map according to an order of increasing sequence numbers corresponding to the channels of each resampled feature map, and so on, to obtain a rearranged feature map corresponding to a feature extraction layer of the at least two feature extraction layers;
the feature fusion unit is configured to, for two adjacent feature extraction layers of the at least two feature extraction layers, obtain target feature maps corresponding to the two feature extraction layers based on two feature maps output by the two feature extraction layers, and includes: and for two adjacent feature extraction layers in the at least two feature extraction layers, obtaining target feature maps corresponding to the two feature extraction layers based on two rearranged feature maps corresponding to the two feature extraction layers.
5. The apparatus of claim 4, wherein the feature fusion unit is further configured to:
splicing the two rearranged feature maps corresponding to the two feature extraction layers to obtain spliced feature maps corresponding to the two feature extraction layers;
and inputting the spliced feature maps corresponding to the two feature extraction layers into an attention model to obtain target feature maps corresponding to the two feature extraction layers, wherein the attention model is used for weighting at least one channel of the feature maps.
6. The apparatus of any of claims 4-5, wherein the apparatus further comprises:
and the prediction unit is configured to input the target feature map into an image prediction network to obtain the prediction result information of the image to be detected.
7. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.
8. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-3.
CN201910098641.8A 2019-01-31 2019-01-31 Method and device for extracting feature map of image Active CN109816037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910098641.8A CN109816037B (en) 2019-01-31 2019-01-31 Method and device for extracting feature map of image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910098641.8A CN109816037B (en) 2019-01-31 2019-01-31 Method and device for extracting feature map of image

Publications (2)

Publication Number Publication Date
CN109816037A CN109816037A (en) 2019-05-28
CN109816037B true CN109816037B (en) 2021-05-25

Family

ID=66606171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910098641.8A Active CN109816037B (en) 2019-01-31 2019-01-31 Method and device for extracting feature map of image

Country Status (1)

Country Link
CN (1) CN109816037B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222829A (en) * 2019-06-12 2019-09-10 北京字节跳动网络技术有限公司 Feature extracting method, device, equipment and medium based on convolutional neural networks
CN110782420A (en) * 2019-09-19 2020-02-11 杭州电子科技大学 Small target feature representation enhancement method based on deep learning
CN113554742B (en) * 2020-04-26 2024-02-02 上海联影医疗科技股份有限公司 A three-dimensional image reconstruction method, device, equipment and storage medium
CN114862975B (en) * 2021-01-18 2025-12-12 阿里巴巴集团控股有限公司 Feature map processing methods and apparatus, non-volatile storage media and electronic devices
CN115661449B (en) * 2022-09-22 2023-11-21 北京百度网讯科技有限公司 Image segmentation and training method and device for image segmentation model
CN115937645A (en) * 2022-12-16 2023-04-07 北京地平线信息技术有限公司 Feature map processing method, device, computer-readable storage medium, and electronic device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109222972A (en) * 2018-09-11 2019-01-18 华南理工大学 A kind of full brain data classification method of fMRI based on deep learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100209875B1 (en) * 1994-02-08 1999-07-15 윤종용 Memory control apparatus and method for reordering image data
US7636098B2 (en) * 2006-09-28 2009-12-22 Microsoft Corporation Salience preserving image fusion
CN106485268B (en) * 2016-09-27 2020-01-21 东软集团股份有限公司 Image identification method and device
CN108229523B (en) * 2017-04-13 2021-04-06 深圳市商汤科技有限公司 Image detection method, neural network training method, device and electronic equipment
CN107506822B (en) * 2017-07-26 2021-02-19 天津大学 Deep neural network method based on space fusion pooling
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN108536772B (en) * 2018-03-23 2020-08-14 云南师范大学 An Image Retrieval Method Based on Multi-feature Fusion and Diffusion Process Reordering
CN108875904A (en) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus and computer readable storage medium
CN108710919A (en) * 2018-05-25 2018-10-26 东南大学 A kind of crack automation delineation method based on multi-scale feature fusion deep learning
CN108960261B (en) * 2018-07-25 2021-09-24 扬州万方电子技术有限责任公司 Salient object detection method based on attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109222972A (en) * 2018-09-11 2019-01-18 华南理工大学 A kind of full brain data classification method of fMRI based on deep learning

Also Published As

Publication number Publication date
CN109816037A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
CN109816037B (en) Method and device for extracting feature map of image
KR102406354B1 (en) Video restoration method and apparatus, electronic device and storage medium
CN108520220B (en) Model generation method and device
CN112598762B (en) Three-dimensional lane line information generation method, device, electronic device, and medium
CN108427939B (en) Model generation method and device
CN109741388B (en) Method and apparatus for generating a binocular depth estimation model
CN109242801B (en) Image processing method and device
CN110516678B (en) Image processing method and device
CN112150490B (en) Image detection method, device, electronic equipment and computer readable medium
CN111311480B (en) Image fusion method and device
CN113592033B (en) Oil tank image recognition model training method, oil tank image recognition method and device
KR20200018411A (en) Method and apparatus for detecting burr of electrode piece
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN112070888B (en) Image generation method, device, equipment and computer readable medium
CN112330788B (en) Image processing method, device, readable medium and electronic device
CN112150491B (en) Image detection method, device, electronic equipment and computer readable medium
CN111625692B (en) Feature extraction method, device, electronic equipment and computer readable medium
CN112668194B (en) Page-based automatic driving scene library information display method, device and device
CN116010289B (en) Automatic driving simulation scene test method and device, electronic equipment and readable medium
CN112883697B (en) Workflow form generation method, device, electronic equipment and computer readable medium
CN110033413B (en) Image processing method, device, equipment and computer readable medium of client
CN111862081B (en) Image scoring method, training method and device of score prediction network
CN112464039B (en) Tree-structured data display method and device, electronic equipment and medium
CN111385603B (en) Method for embedding video into two-dimensional map

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.