
CN116703950B - A camouflage target image segmentation method and system based on multi-level feature fusion - Google Patents

A camouflage target image segmentation method and system based on multi-level feature fusion

Info

Publication number
CN116703950B
Authority
CN
China
Prior art keywords
features
feature
fusion
boundary
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310982262.1A
Other languages
Chinese (zh)
Other versions
CN116703950A (en)
Inventor
任胜兵
梁义
周佳蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202310982262.1A
Publication of CN116703950A
Application granted
Publication of CN116703950B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a camouflage target image segmentation method and system based on multi-level feature fusion. The method performs global feature enhancement on the first feature map of each level and local feature enhancement on the second feature map of each level; fuses the enhanced local features of each level with the enhanced global features of the same level to obtain fusion features of multiple levels; performs boundary guidance on the fusion features of the two shallow network layers among the multi-level fusion features to obtain a boundary map; performs feature interaction on the fusion features of adjacent network layers among the multi-level fusion features to obtain multiple interaction features; fuses the boundary map with each of the interaction features to obtain multiple boundary fusion features; and, based on the plurality of boundary fusion features, segments the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature. The invention can improve the accuracy of camouflage target image segmentation.

Description

Camouflage target image segmentation method and system based on multi-level feature fusion
Technical Field
The invention relates to the technical field of camouflage target image segmentation, in particular to a camouflage target image segmentation method and system based on multi-level feature fusion.
Background
Camouflage Object Segmentation (COS) aims to segment camouflaged objects that are highly similar to the background; it uses computer vision models to assist the human visual and perceptual system in camouflage target image segmentation.
However, in the prior art, models with a CNN backbone (such as ResNet) have strong local feature extraction capability, but the limited receptive field of a CNN restricts its ability to capture long-range feature dependencies; Transformer-based models (e.g., Vision Transformer) benefit from the attention mechanism in the Transformer, which has strong modeling capability for global feature relationships, but they have limitations in capturing fine-grained details, resulting in weakened expression of local features. Camouflage target image segmentation requires not only segmenting the target as a whole based on global features, but also handling detailed information such as boundaries based on local features; with a single backbone network, fusing local and global features requires complex methods and is inefficient. Most methods fuse multi-level features with simple operations such as concatenation and summation, where high-level and low-level features interact by first fusing the two features through an addition operation. The fused features are then fed into a Sigmoid activation function to obtain a normalized feature map, which is treated as a feature-level attention map to enhance the feature representation. In this case, achieving cross-level feature enhancement with a fused feature map obtained by a simple addition operation cannot capture the valuable information that is highly relevant to segmenting the camouflaged target. Some models are dedicated to extracting global texture features of camouflaged targets while neglecting the influence of boundaries on the model's expressive power, and these models do not perform well when the target object shares the same texture features as the background. Since the texture of most camouflaged objects is similar to the background, distinguishing subtle differences in local boundary information is particularly important for improving model performance. Some models do consider boundary features, but they usually only supervise the predicted boundary map as a separate branch without further processing, so the boundary map information is not fully exploited.
In summary, in the prior art it is difficult to capture useful feature information, and the predicted boundary map information cannot be fully utilized, so accurate camouflage target segmentation is difficult to achieve.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a camouflage target image segmentation method and a camouflage target image segmentation system based on multi-level feature fusion, which can improve the accuracy of camouflage target image segmentation.
In a first aspect, an embodiment of the present invention provides a method for dividing a camouflage target image based on multi-level feature fusion, where the method for dividing the camouflage target image based on multi-level feature fusion includes:
obtaining a camouflage target image to be segmented;
performing multi-level feature extraction on the camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network to obtain a first feature image of multiple levels output by the first branch network and a second feature image of multiple levels output by the second branch network;
performing global feature enhancement on the first feature map of each level to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
performing boundary guidance on the fusion features of two shallow network layers among the fusion features of the multiple levels to obtain a boundary map;
performing feature interaction on the fusion features of adjacent network layers in the fusion features of the multiple layers to obtain multiple interaction features;
respectively carrying out boundary fusion on the boundary map and each interactive feature in the plurality of interactive features to obtain a plurality of boundary fusion features;
and segmenting the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature based on the plurality of boundary fusion features.
Compared with the prior art, the first aspect of the invention has the following beneficial effects:
according to the method, multi-level feature extraction is performed on the camouflage target image to be segmented through the first branch network and the second branch network using different network layers, obtaining first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network, so that the features in the target image can be better extracted. Global feature enhancement is performed on the first feature map of each level to obtain enhanced global features of multiple levels, local feature enhancement is performed on the second feature map of each level to obtain enhanced local features of multiple levels, and the enhanced local features of each level are fused with the enhanced global features of the same level to obtain fusion features of multiple levels; the enhanced local and global features complement each other and provide comprehensive feature information for accurately segmenting the camouflage target image. Boundary guidance is performed on the fusion features of two shallow network layers among the multi-level fusion features to obtain a boundary map; because the shallow layers retain more of the object's edge and spatial information, using the shallow fusion features for boundary guidance can generate a high-quality boundary map. Feature interaction is performed on the fusion features of adjacent network layers among the multi-level fusion features to obtain multiple interaction features, so that the multi-level fusion features complement each other and yield a comprehensive feature expression. Boundary fusion is performed between the boundary map and each of the interaction features to obtain boundary fusion features, and the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature is segmented based on the boundary fusion features; with the boundary information of the boundary map as guidance, the boundary map features are integrated with the interaction features of different levels, refining the boundary features and ensuring clear and complete boundaries, which helps distinguish the fine foreground-background boundaries of camouflaged targets, provides better expressive power for camouflage target segmentation, and improves the accuracy of camouflage target image segmentation.
According to some embodiments of the present invention, the extracting, by using different network layers, the multi-level feature of the camouflage target image to be segmented through the first branch network and the second branch network, to obtain a first feature map of multiple levels output by the first branch network and a second feature map of multiple levels output by the second branch network, including:
extracting features of global context information of the camouflage target image to be segmented by adopting different network layers through a first branch network to obtain a first feature map of multiple layers output by the first branch network;
and extracting the characteristics of the local detail information of the camouflage target image to be segmented by adopting different network layers through a second branch network, and obtaining a plurality of layers of second characteristic images output by the second branch network.
According to some embodiments of the present invention, global feature enhancement is performed on the first feature map of each level, so as to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; and performing feature fusion on the enhanced local feature of each level and the enhanced global feature of the same level to obtain fusion features of multiple levels, wherein the feature fusion comprises the following steps:
Carrying out global feature enhancement on the first feature map of each level by adopting a residual channel attention mechanism to obtain global features after multiple levels of enhancement;
local feature enhancement is carried out on the second feature map of each level by adopting a spatial attention mechanism, so that local features after multiple levels of enhancement are obtained;
splicing the enhanced local features of each level with the enhanced global features of the same level to obtain splicing features of multiple levels, and adopting a convolution layer to promote feature fusion of the splicing features of the multiple levels to obtain fusion features of multiple levels.
According to some embodiments of the present invention, the performing boundary guiding on the fusion features of the two shallow network layers in the multiple layers of fusion features to obtain a boundary map includes:
convolving the fusion characteristics of two shallow network layers in the fusion characteristics of the multiple layers to obtain a first convolution characteristic and a second convolution characteristic;
and performing addition operation on the first convolution characteristic and the second convolution characteristic to obtain an addition characteristic, and performing boundary guiding on the addition characteristic by adopting a plurality of convolution layers to obtain a boundary map.
According to some embodiments of the present invention, the feature interaction is performed on the fusion features of adjacent network layers in the multiple layers of fusion features, to obtain multiple interaction features, including:
introducing a multi-scale channel attention mechanism into a feature interaction module, and adding fusion features of adjacent network layers in the multi-level fusion features to obtain a plurality of addition features;
inputting each added feature into the multi-scale channel attention mechanism to obtain a plurality of multi-scale channel features;
obtaining a plurality of normalized features by applying an activation function to each multi-scale channel feature, and obtaining a plurality of normalized difference features by subtracting each normalized feature from one;
performing feature enhancement on the plurality of normalized features and the plurality of normalized difference features to obtain a plurality of enhanced normalized features and a plurality of enhanced normalized difference features;
residual connection is carried out on each enhanced normalized feature and the corresponding fusion feature to obtain a plurality of first residual features, and each first residual feature is convolved to obtain a plurality of first convolution features;
residual connection is carried out on each enhanced normalized difference feature and the corresponding fusion feature to obtain a plurality of second residual features, and convolution is carried out on each second residual feature to obtain a plurality of second convolution features;
And adding each first convolution feature and the corresponding second convolution feature to obtain a plurality of added convolution features, and adopting a convolution layer to promote fusion of the added convolution features to obtain a plurality of interaction features.
According to some embodiments of the invention, the performing boundary fusion on the boundary map and each of the plurality of interaction features to obtain a plurality of boundary fusion features includes:
based on each interaction characteristic, learning a target overall characteristic by adopting a target attention head branch; wherein the target attention head branch is used for separating a target and a background on the whole based on the interaction characteristics;
based on the boundary map and each interaction characteristic, adopting a boundary attention head branch to learn boundary detail characteristics; the boundary attention head branches are used for capturing sparse local boundary information of the target based on the boundary map and the interaction characteristics;
splicing the output of each target attention head branch and the output of each boundary attention head branch corresponding to the target attention head branch to obtain a plurality of output splicing features, and adopting a convolution layer to promote feature fusion of the plurality of output splicing features to obtain a plurality of convolution fusion features;
And carrying out residual connection on each convolution fusion feature and each interaction feature corresponding to each convolution fusion feature to obtain a plurality of boundary fusion features.
According to some embodiments of the invention, the segmenting the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature based on the boundary fusion features includes:
inputting the boundary fusion features into a convolution layer with a Sigmoid activation function to generate a plurality of prediction graphs;
and dividing a camouflage target image in the camouflage target images to be divided based on each prediction graph.
In a second aspect, the embodiment of the present invention further provides a camouflage target image segmentation system based on multi-level feature fusion, where the camouflage target image segmentation system based on multi-level feature fusion includes:
the data acquisition unit is used for acquiring a camouflage target image to be segmented;
the feature extraction unit is used for extracting multi-level features of the camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network to obtain a first feature image of multiple levels output by the first branch network and a second feature image of multiple levels output by the second branch network;
The feature fusion unit is used for carrying out global feature enhancement on the first feature map of each level to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
the boundary guiding unit is used for conducting boundary guiding on the fusion characteristics of the two shallow network layers in the fusion characteristics of the multiple layers to obtain a boundary diagram;
the feature interaction unit is used for carrying out feature interaction on the fusion features of the adjacent network layers in the fusion features of the multiple layers to obtain multiple interaction features;
the boundary fusion unit is used for carrying out boundary fusion on the boundary graph and each interaction feature in the interaction features respectively to obtain a plurality of boundary fusion features;
the image segmentation unit is used for segmenting a camouflage target image in the camouflage target images to be segmented, which correspond to each boundary fusion feature, based on the boundary fusion features.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one memory;
at least one processor;
at least one computer program;
the at least one computer program is stored in the at least one memory, and the at least one processor executes the at least one computer program to implement a camouflage target image segmentation method based on multi-level feature fusion as described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, where the storage medium is a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program is configured to make a computer execute a method for dividing a camouflage target image based on multi-level feature fusion according to the first aspect.
It is to be understood that the advantages of the second to fourth aspects compared with the related art are the same as those of the first aspect compared with the related art, and reference may be made to the related description in the first aspect, which is not repeated herein.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a camouflage target image segmentation method based on multi-level feature fusion according to an embodiment of the invention;
FIG. 2 is a flowchart of a method for camouflage target image segmentation based on multi-level feature fusion according to another embodiment of the present invention;
FIG. 3 is a schematic view of the overall structure of a model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a Res2Net module and a basic convolution module according to one embodiment of the present invention;
FIG. 5 is a block diagram of a residual channel attention mechanism of an embodiment of the present invention;
FIG. 6 is a block diagram of a spatial attention mechanism of an embodiment of the present invention;
FIG. 7 is a block diagram of an LGA module according to an embodiment of the present invention;
FIG. 8 is a block diagram of an MS-CA according to an embodiment of the invention;
FIG. 9 is a block diagram of a CFT module in accordance with an embodiment of the present invention;
fig. 10 is a block diagram of an MTA according to an embodiment of the invention;
FIG. 11 is a block diagram of a BMTA according to an embodiment of the invention;
fig. 12 is a block diagram of a BAH according to an embodiment of the present invention;
FIG. 13 is a block diagram of a camouflage target image segmentation system based on multi-level feature fusion according to an embodiment of the invention;
fig. 14 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be determined reasonably by a person skilled in the art in combination with the specific content of the technical solution.
In the prior art, models with a CNN backbone (such as ResNet) have strong local feature extraction capability, but the limited receptive field of a CNN restricts its ability to capture long-range feature dependencies; Transformer-based models (e.g., Vision Transformer) benefit from the attention mechanism in the Transformer, which has strong modeling capability for global feature relationships, but they have limitations in capturing fine-grained details, resulting in weakened expression of local features. Most methods fuse multi-level features with simple operations such as concatenation and summation, where high-level and low-level features interact by first fusing the two features through an addition operation. The fused features are then fed into a Sigmoid activation function to obtain a normalized feature map, which is treated as a feature-level attention map to enhance the feature representation. In this case, achieving cross-level feature enhancement with a fused feature map obtained by a simple addition operation cannot capture the valuable information that is highly relevant to segmenting the camouflaged target. Some models are dedicated to extracting global texture features of camouflaged targets while neglecting the influence of boundaries on the model's expressive power, and these models do not perform well when the target object shares the same texture features as the background. Since the texture of most camouflaged objects is similar to the background, distinguishing subtle differences in local boundary information is particularly important for improving model performance. Some models do consider boundary features, but they usually only supervise the predicted boundary map as a separate branch without further processing, so the boundary map information is not fully exploited.
In summary, in the prior art it is difficult to capture useful feature information, and the predicted boundary map information cannot be fully utilized, so accurate camouflage target segmentation is difficult to achieve.
To solve these problems, the invention performs multi-level feature extraction on the camouflage target image to be segmented through a first branch network and a second branch network using different network layers, obtaining first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network, so that the features in the target image can be better extracted. Global feature enhancement is performed on the first feature map of each level to obtain enhanced global features of multiple levels, local feature enhancement is performed on the second feature map of each level to obtain enhanced local features of multiple levels, and the enhanced local features of each level are fused with the enhanced global features of the same level to obtain fusion features of multiple levels; the enhanced local and global features complement each other and provide comprehensive feature information for accurately segmenting the camouflage target image. Boundary guidance is performed on the fusion features of two shallow network layers among the multi-level fusion features to obtain a boundary map; because the shallow layers retain more of the object's edge and spatial information, using the shallow fusion features for boundary guidance can generate a high-quality boundary map. Feature interaction is performed on the fusion features of adjacent network layers among the multi-level fusion features to obtain multiple interaction features, so that the multi-level fusion features complement each other and yield a comprehensive feature expression. Boundary fusion is performed between the boundary map and each of the interaction features to obtain boundary fusion features, and the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature is segmented based on the boundary fusion features; with the boundary information of the boundary map as guidance, the boundary map features are integrated with the interaction features of different levels, refining the boundary features and ensuring clear and complete boundaries, which helps distinguish the fine foreground-background boundaries of camouflaged targets, provides better expressive power for camouflage target segmentation, and improves the accuracy of camouflage target image segmentation.
Referring to fig. 1, an embodiment of the present invention provides a method for dividing a camouflage target image based on multi-level feature fusion, where the method includes, but is not limited to, steps S100 to S700, where:
step S100, obtaining a camouflage target image to be segmented;
step 200, performing multi-level feature extraction on a camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network to obtain a first feature image of multiple layers output by the first branch network and a second feature image of multiple layers output by the second branch network;
step S300, carrying out global feature enhancement on the first feature map of each level to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the reinforced local features of each level and the reinforced global features of the same level to obtain fusion features of multiple levels;
step S400, conducting boundary guiding on the fusion characteristics of two shallow network layers in the fusion characteristics of multiple layers to obtain a boundary diagram;
Step S500, carrying out feature interaction on fusion features of adjacent network layers in the fusion features of multiple layers to obtain multiple interaction features;
step S600, respectively carrying out boundary fusion on the boundary map and each interactive feature in the plurality of interactive features to obtain a plurality of boundary fusion features;
step S700, based on a plurality of boundary fusion features, segmenting a camouflage target image in the camouflage target images to be segmented, wherein the camouflage target image corresponds to each boundary fusion feature.
In steps S100 to S700 of some embodiments, in order to better extract features in the target image, in this embodiment, multiple levels of feature extraction are performed on the camouflage target image to be segmented by adopting different network layers through the first branch network and the second branch network, so as to obtain multiple levels of first feature images output by the first branch network and multiple levels of second feature images output by the second branch network; in order to provide comprehensive feature information for accurately dividing a camouflage target image, the embodiment obtains global features after multiple layers of enhancement by carrying out global feature enhancement on the first feature map of each layer; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the reinforced local features of each level and the reinforced global features of the same level to obtain fusion features of multiple levels; in order to generate a high-quality boundary map, the boundary map is obtained by conducting boundary guidance on the fusion characteristics of two shallow network layers in the fusion characteristics of multiple layers; in order to obtain comprehensive feature expression, in the embodiment, feature interaction is performed on the fusion features of adjacent network layers in the fusion features of multiple layers to obtain multiple interaction features; in order to improve the accuracy of the camouflage target image segmentation, in the embodiment, boundary fusion is performed on the boundary map and each interactive feature in the plurality of interactive features respectively to obtain a plurality of boundary fusion features, and the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature is segmented based on the plurality of boundary fusion features.
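To make the flow of steps S100 to S700 easier to follow, the sketch below lays out the pipeline in PyTorch-style pseudocode. All module and variable names (lga, bgm, cft, bmta, pred_head) are hypothetical placeholders standing in for the modules described later in this document; this is a structural sketch under those assumptions, not the patented implementation.

```python
# Structural sketch of steps S100-S700; the module objects are assumed to be callables
# matching the descriptions in this document (names are placeholders).
def segment_camouflaged_target(image, swin_branch, res2net_branch, lga, bgm, cft, bmta, pred_head):
    T = swin_branch(image)     # S200: multi-level global features T1..T4 (first branch)
    R = res2net_branch(image)  # S200: multi-level local features R1..R4 (second branch)
    # S300: per-level enhancement and fusion (LGA modules), giving fusion features a1..a4
    a = [lga[i](T[i], R[i]) for i in range(4)]
    # S400: boundary guidance from the two shallow levels
    boundary_map = bgm(a[0], a[1])
    # S500: feature interaction between adjacent levels (CFT modules)
    f = [cft[i](a[i], a[i + 1]) for i in range(3)]
    # S600: boundary fusion of the boundary map with each interaction feature (BMTA modules)
    b = [bmta[i](f[i], boundary_map) for i in range(3)]
    # S700: prediction maps P1..P3 from coarse to fine; the finest is the final segmentation
    preds = [pred_head[i](b[i]) for i in range(3)]
    return preds[-1], boundary_map
```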
In some embodiments, the method includes performing multi-level feature extraction on a camouflage target image to be segmented by using different network layers through a first branch network and a second branch network to obtain a first feature map of multiple levels output by the first branch network and a second feature map of multiple levels output by the second branch network, including:
extracting features of global context information of a camouflage target image to be segmented by adopting different network layers through a first branch network to obtain a first feature map of multiple layers output by the first branch network;
and extracting the characteristics of the local detail information of the camouflage target image to be segmented by adopting different network layers through the second branch network, and obtaining a plurality of layers of second characteristic diagrams output by the second branch network.
Specifically, feature extraction is performed on the global context information of the camouflage target image to be segmented using different network layers of Swin Transformer V2 (i.e., the first branch network), obtaining first feature maps of multiple levels output by the first branch network. The multi-head self-attention mechanism in Swin Transformer V2 can break through the receptive-field limitation of CNNs, model the contextual relationships pixel by pixel over the global scope, and assign larger weights to important features so that the feature expression is richer. Feature extraction is performed on the local detail information of the camouflage target image to be segmented using different network layers of Res2Net (i.e., the second branch network), obtaining second feature maps of multiple levels output by the second branch network. Res2Net has a stronger and more effective multi-level feature extraction capability, refines features at a finer granularity, and highlights the difference between foreground and background.
In this embodiment, since most of the current models extract features based on a single backbone network, a complex method is required to fuse local features and global features, which is inefficient. Therefore, in the embodiment, the first branch network and the second branch network adopt different network layers to extract the multi-level characteristics of the camouflage target image to be segmented, so that the characteristics in the target image can be better extracted.
Note that, in this embodiment, Swin Transformer V2 and Res2Net are used for feature extraction, but the embodiment is not limited to Swin Transformer V2 and Res2Net; the backbones may be changed according to the actual situation and are not particularly limited here.
In some embodiments, global feature enhancement is performed on the first feature map of each level, so as to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; and carrying out feature fusion on the reinforced local features of each level and the reinforced global features of the same level to obtain fusion features of multiple levels, wherein the method comprises the following steps:
carrying out global feature enhancement on the first feature map of each level by adopting a residual channel attention mechanism to obtain global features after multiple levels of enhancement;
Carrying out local feature enhancement on the second feature map of each level by adopting a spatial attention mechanism to obtain local features after the enhancement of multiple levels;
splicing the reinforced local features of each level with the reinforced global features of the same level to obtain splicing features of multiple levels, and adopting a convolution layer to promote feature fusion of the splicing features of multiple levels to obtain fusion features of multiple levels.
Specifically, a local space detail and global context information fusion module is adopted to enhance and fuse the characteristics. The local space detail and global context information fusion module simultaneously applies a channel attention mechanism and a space attention mechanism, and adopts a residual channel attention mechanism to carry out global feature enhancement on the first feature map of each level so as to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level by adopting a spatial attention mechanism to obtain local features after the enhancement of multiple levels; splicing the reinforced local features of each level with the reinforced global features of the same level to obtain splicing features of multiple levels, and adopting a convolution layer to promote feature fusion of the splicing features of multiple levels to obtain fusion features of multiple levels.
In this embodiment, the local spatial detail and global context information fusion module considers both the global context and the local details of the image to identify the overall trend of the image, effectively supplementing the feature extraction capability of the two backbone branch networks. The global information is used to extract important global features for roughly estimating the position of the target object, the local information is used to extract fine-grained features of the target object, and the local and global features complement each other to provide comprehensive feature information for accurate camouflage target segmentation.
In some embodiments, performing boundary guiding on the fusion features of two shallow network layers in the fusion features of multiple layers to obtain a boundary map, including:
convolving the fusion characteristics of two shallow network layers in the fusion characteristics of multiple layers to obtain a first convolution characteristic and a second convolution characteristic;
and performing addition operation on the first convolution feature and the second convolution feature to obtain an addition feature, and performing boundary guiding on the addition feature by adopting a plurality of convolution layers to obtain a boundary map.
Specifically, a boundary guiding module is adopted to convolve the fusion characteristics of two shallow network layers in the fusion characteristics of multiple layers, so as to obtain a first convolution characteristic and a second convolution characteristic; and performing addition operation on the first convolution feature and the second convolution feature to obtain an addition feature, and performing boundary guiding on the addition feature by adopting a plurality of convolution layers to obtain a boundary map.
In this embodiment, since more semantic information is retained in the shallow layer, boundary guidance is performed by using the fusion features of the shallow layer, so that a high-quality boundary map can be generated.
In some embodiments, feature interaction is performed on fusion features of adjacent network layers in the multiple layers of fusion features to obtain multiple interaction features, including:
introducing a multi-scale channel attention mechanism into the feature interaction module, and adding fusion features of adjacent network layers in the fusion features of multiple layers to obtain multiple addition features;
inputting each added feature into a multi-scale channel attention mechanism to obtain a plurality of multi-scale channel features;
obtaining a plurality of normalized features by applying an activation function to each multi-scale channel feature, and obtaining a plurality of normalized difference features by subtracting each normalized feature from one;
performing feature enhancement on the plurality of normalized features and the plurality of normalized difference features to obtain a plurality of enhanced normalized features and a plurality of enhanced normalized difference features;
residual connection is carried out on each enhanced normalized feature and the corresponding fusion feature to obtain a plurality of first residual features, and each first residual feature is convolved to obtain a plurality of first convolution features;
Residual connection is carried out on each enhanced normalized difference feature and the corresponding fusion feature to obtain a plurality of second residual features, and convolution is carried out on each second residual feature to obtain a plurality of second convolution features;
and adding each first convolution feature and the corresponding second convolution feature to obtain a plurality of added convolution features, and adopting a convolution layer to promote fusion of the added convolution features to obtain a plurality of interaction features.
In this embodiment, the feature interaction module introduces a multi-scale channel attention mechanism to realize efficient interaction of cross-layer features and to cope with changes in target size in camouflage target segmentation. The multi-scale channel attention mechanism adapts well to targets of different scales. It is based on a dual-branch structure: one branch uses global average pooling to obtain global features and allocates more attention to large-scale objects, while the other branch uses point convolution to obtain fine-grained local details, which helps capture the features of small objects. Unlike other multi-scale attention mechanisms, the multi-scale channel attention mechanism uses point convolution in both branches to compress and restore the channel dimension, thereby aggregating multi-scale channel information of different layers and effectively characterizing the multi-scale information of the convolution layers. The fusion features of multiple levels can complement each other, yielding a more comprehensive feature expression.
In some embodiments, boundary fusion is performed on the boundary map and each interactive feature in the plurality of interactive features, so as to obtain a plurality of boundary fusion features, including:
based on each interaction characteristic, a target attention head branch is adopted to learn target overall characteristics; wherein the target attention head branch is used for separating the target and the background on the whole based on the interaction characteristics;
based on the boundary graph and each interaction characteristic, boundary detail characteristics are learned by adopting a boundary attention head branch; the boundary attention head branches are used for capturing sparse local boundary information of the target based on the boundary map and the interaction characteristics;
splicing the output of each target attention head branch and the output of each boundary attention head branch corresponding to the target attention head branch to obtain a plurality of output splicing features, and adopting a convolution layer to promote feature fusion of the plurality of output splicing features to obtain a plurality of convolution fusion features;
and carrying out residual connection on each convolution fusion feature and each interaction feature corresponding to each convolution fusion feature to obtain a plurality of boundary fusion features.
In the embodiment, the boundary information in the boundary map is used as a guide, the characteristics of the boundary map are integrated with the interactive characteristics of different layers, the boundary characteristics are refined, the definition and the integrity of the boundary are ensured, and the discrimination of the fine foreground and the background boundary of the camouflage target is facilitated.
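As a rough illustration of this boundary-fusion step, the sketch below implements the described structure: a target head branch that works on the interaction feature alone, a boundary head branch that also receives the boundary map, concatenation of the two outputs, fusion by a 3×3 convolution, and a residual connection back to the interaction feature. The two heads are simplified convolutional stand-ins assumed for illustration; the patent's actual target and boundary attention heads (FIGS. 11-12) are more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BMTASketch(nn.Module):
    """Hedged sketch of the boundary-fusion step: a target head and a boundary head,
    concatenated, fused by a 3x3 conv, then residually added to the interaction feature.
    The heads below are simplified stand-ins, not the patented MTA/BAH internals."""
    def __init__(self, channels):
        super().__init__()
        self.target_head = nn.Conv2d(channels, channels, 3, padding=1)        # whole-target cues
        self.boundary_head = nn.Conv2d(channels + 1, channels, 3, padding=1)  # boundary map as extra channel
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, interaction_feat, boundary_map):
        bm = F.interpolate(boundary_map, size=interaction_feat.shape[2:],
                           mode="bilinear", align_corners=False)
        t = self.target_head(interaction_feat)                            # target/background separation branch
        b = self.boundary_head(torch.cat([interaction_feat, bm], dim=1))  # boundary-detail branch
        fused = self.fuse(torch.cat([t, b], dim=1))                       # splice outputs and fuse
        return fused + interaction_feat                                    # residual connection
```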
In some embodiments, segmenting a camouflage target image in a camouflage target image to be segmented corresponding to each boundary fusion feature based on a plurality of boundary fusion features, includes:
inputting a plurality of boundary fusion features into a convolution layer with a Sigmoid activation function to generate a plurality of prediction graphs;
based on each prediction graph, a camouflage target image in the camouflage target images to be segmented is segmented.
In this embodiment, based on the multiple boundary fusion features, the method has better expressive force on the segmentation of the camouflage target, and can improve the accuracy of the image segmentation of the camouflage target.
For ease of understanding by those skilled in the art, a set of preferred embodiments are provided below:
referring to fig. 2 to 3, the overall model of this embodiment comprises feature extraction, a local spatial detail and global context information fusion module (LGA module) for fusing local and global features, a feature interaction module (CFT module) for cross-layer feature interaction, a boundary guidance module (BGM module) for predicting the boundary map, a boundary-guided multi-convolution-head transposed attention module (BMTA module) for fusing boundary features, and a prediction layer for generating the final segmentation map. First, the camouflage target image to be segmented is fed into the parallel Swin Transformer V2 and Res2Net structure to extract multi-level features; features of the same resolution from the two branches are then sent into the LGA module to aggregate global and local information; the features output by the LGA modules of adjacent network layers are interactively fused and enhanced by the CFT module; the features a1 and a2 output by the LGA modules corresponding to the two shallow network layers are used as the input of the BGM module to generate a predicted Boundary Map (BM); the features output by the CFT modules and the boundary map are sent into the BMTA module to fuse the boundary information with the global information; finally, the fused features are sent into the prediction layer to generate prediction maps, and the camouflage target image in the camouflage target image to be segmented is segmented based on each prediction map. The prediction maps P1, P2 and P3 in fig. 2 to 3 go from coarse to fine: P1 is the coarsest, P3 is the clearest and most complete, and P3 is the final result in this embodiment. P1, P2, P3 and BM are supervised by loss functions to guide the model in optimizing its parameters and to improve the segmentation accuracy. The specific steps are as follows:
1. Feature extraction.
Multi-level features are extracted from the camouflage target image to be segmented using the Swin Transformer V2 and Res2Net dual-branch structure. The multi-head self-attention mechanism in Swin Transformer V2 can break through the receptive-field limitation of CNNs, model the contextual relationships pixel by pixel over the global scope, and assign larger weights to important features so that the feature expression is richer. Res2Net has a stronger and more effective multi-level feature extraction capability, refines features at a finer granularity, and highlights the difference between foreground and background. Instead of representing multi-level features in a layer-wise fashion, Res2Net replaces the 3×3 convolution layer with a series of smaller convolution groups. A comparison of the Res2Net module and the basic convolution module is shown in fig. 4, where (a) is the basic convolution module and (b) is the Res2Net module. After being processed by a 3×3 convolution layer, each group's output has a larger receptive field than the previous group's; this grouping strategy handles the feature map better, and the larger the scale dimension of the split, the richer the learned feature information.
For a given image I ∈ R^(3×H×W), where H and W denote the height and width of the input image and 3 denotes the RGB channels, and where Swin Transformer V2 and Res2Net each comprise four stages, the image is fed into Swin Transformer V2 and Res2Net respectively to generate multi-level feature maps from the four stages. The Swin Transformer V2 branch generates feature maps T_i (i = 1, 2, 3, 4), where T_1 has resolution H/4 × W/4 and T_4 has resolution H/32 × W/32; the Res2Net branch generates feature maps R_i (i = 1, 2, 3, 4), where R_1 has resolution H/4 × W/4 and R_4 has resolution H/32 × W/32. That is, the feature maps generated at the same stage of the two branches have the same spatial size (i.e., the same resolution).
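The stated resolutions can be checked with dummy tensors. The intermediate strides (8 and 16) and the channel counts below are assumptions for illustration only, since the text only fixes the H/4 × W/4 and H/32 × W/32 endpoints.

```python
import torch

# Illustration of the stated feature-map resolutions for an H x W input.
# Strides 8 and 16 for the middle stages and the channel counts are placeholders,
# not the patent's actual values.
H, W = 384, 384
strides = [4, 8, 16, 32]
channels = [96, 192, 384, 768]   # hypothetical
T = [torch.randn(1, c, H // s, W // s) for c, s in zip(channels, strides)]  # Swin Transformer V2 branch
R = [torch.randn(1, c, H // s, W // s) for c, s in zip(channels, strides)]  # Res2Net branch
for i, (t, r) in enumerate(zip(T, R), start=1):
    assert t.shape[2:] == r.shape[2:]   # same-stage maps share the same spatial size
    print(f"stage {i}: T{i} {tuple(t.shape)}  R{i} {tuple(r.shape)}")
```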
2. The local spatial detail is fused with global context information.
The feature maps obtained from Swin Transformer V2 and Res2Net are input into the LGA module, which considers both the global context and the local details of the image to identify the overall trend of the image, effectively supplementing the feature extraction capability of the two backbone branch networks. The global information is used to extract important global features for roughly estimating the position of the target object, the local information is used to extract fine-grained features of the object, and the local and global features complement each other to provide comprehensive feature information for accurate camouflage object segmentation (COS).
The LGA module applies both a residual channel attention mechanism and a spatial attention mechanism. The structure of the residual channel attention mechanism is shown in fig. 5. For an input feature map X ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width respectively, the residual channel attention mechanism first obtains a 1×1×C feature map through global average pooling to capture globally important feature information, then compresses the number of channels through downsampling (implemented by a 1×1 convolution) and restores it to the original channel number C through upsampling (implemented by a 1×1 convolution), obtaining a weight coefficient for each channel; the weights are multiplied with the original feature X to obtain a more discriminative feature map. Residual channel attention assigns larger weights to important channels, enhancing the global features along the channel dimension.
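A minimal sketch of the residual channel attention described above (global average pooling, 1×1 channel squeeze and restore, per-channel reweighting) might look as follows; the reduction ratio and the ReLU between the two 1×1 convolutions are assumptions, and the residual connection with Ft is applied later in the LGA module.

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """Sketch of the described residual channel attention: global average pooling,
    channel squeeze/restore via 1x1 convolutions, sigmoid weights, reweighting.
    The reduction ratio is an assumption; the residual with the input is added in the LGA module."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                        # -> 1 x 1 x C global descriptor
        self.down = nn.Conv2d(channels, channels // reduction, 1)  # compress the channel number
        self.up = nn.Conv2d(channels // reduction, channels, 1)    # restore to C channels
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        w = torch.sigmoid(self.up(self.act(self.down(self.pool(x)))))  # per-channel weights
        return x * w                                                    # reweight the input feature
```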
The structure of the spatial attention mechanism is shown in fig. 6. For an input feature map X ∈ R^(C×H×W), the spatial attention mechanism compresses the channel dimension by applying max pooling and average pooling along the channel dimension, obtaining feature maps F_max ∈ R^(1×H×W) and F_avg ∈ R^(1×H×W); F_max and F_avg are then concatenated, and a convolution operation followed by a Sigmoid activation function is applied to the concatenated feature map to generate a spatial attention map F_s ∈ R^(1×H×W). Multiplying the spatial attention map F_s with the input feature map X assigns larger weights to important spatial locations, thereby enhancing the local detail information in the spatial domain.
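The spatial attention branch can be sketched the same way: channel-wise max and average pooling, concatenation, a convolution plus Sigmoid to produce F_s, and multiplication with the input. The 7×7 kernel size is an assumption.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the described spatial attention: channel-wise max and average pooling,
    concatenation, a convolution plus sigmoid producing a 1 x H x W attention map,
    which is multiplied with the input. The kernel size is an assumption."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        f_max, _ = x.max(dim=1, keepdim=True)   # F_max: 1 x H x W
        f_avg = x.mean(dim=1, keepdim=True)     # F_avg: 1 x H x W
        f_s = torch.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))  # spatial attention map
        return x * f_s                          # emphasize informative spatial locations
```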
The structure of the LGA module is shown in fig. 7. Features with the same resolution extracted from the CNN branch (Res2Net is a CNN) and the Transformer branch (Swin Transformer V2 is a Transformer variant), e.g., T_1 and R_1, are sent into the LGA module. The feature Fc from the CNN is sent into the spatial attention (SA) branch to further enhance the local features extracted by the CNN and suppress irrelevant regions; the feature Ft from the Transformer is sent into the residual channel attention (RCA) branch to enhance the global context features extracted by the Transformer.
To make the RCA branch focus on learning globally important features, Ft is residually connected with the features passing through the RCA. For the SA branch, in order to reduce the computation of the model, a convolution operation is first applied to reduce the channel dimension, and the convolved result is then residually connected with the output of the SA so that the SA branch concentrates on learning spatial features. The outputs of the two branches are then concatenated to integrate the global position information and local detail information, and feature fusion is promoted by a 3×3 convolution layer, thereby adaptively integrating local features and global dependencies.
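Putting the two branches together, a hedged sketch of the LGA fusion, reusing the RCA and SA sketches above, could read as follows: Ft passes through RCA with a residual connection, Fc is channel-reduced by a 1×1 convolution and passes through SA with a residual connection, and the two outputs are concatenated and fused by a 3×3 convolution. The channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class LGASketch(nn.Module):
    """Sketch of the described LGA fusion, reusing ResidualChannelAttention and
    SpatialAttention from the preceding sketches. Channel sizes are assumptions."""
    def __init__(self, t_channels, c_channels, out_channels):
        super().__init__()
        self.rca = ResidualChannelAttention(t_channels)
        self.reduce = nn.Conv2d(c_channels, out_channels, 1)  # reduce CNN channels to cut computation
        self.sa = SpatialAttention()
        self.fuse = nn.Conv2d(t_channels + out_channels, out_channels, 3, padding=1)

    def forward(self, ft, fc):
        g = ft + self.rca(ft)            # global branch (RCA) with residual connection
        l = self.reduce(fc)
        l = l + self.sa(l)               # local branch (SA) with residual connection
        return self.fuse(torch.cat([g, l], dim=1))   # splice and fuse global + local features
```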
3. Boundary map generation.
Shallow feature layers (e.g. T_1 and T_2, R_1 and R_2) preserve more of the object's edge and spatial information, while deep convolutional layers (e.g. T_3 and T_4, R_3 and R_4) retain more semantic information. Therefore, the Boundary Map (BM) is generated by using the shallow features (a1 and a2 in fig. 3) as the input of the BGM module; the BM is upsampled to the same spatial size as the input image, and the generated boundary map is measured by the following binary cross entropy loss function.
$$L_{BM}=-\frac{1}{N}\sum_{j=1}^{N}\Big[G^{bm}_{i}(j)\log B_{i}(j)+\big(1-G^{bm}_{i}(j)\big)\log\big(1-B_{i}(j)\big)\Big]$$

wherein the sum runs over the N pixels, $G^{bm}_{i}$ denotes the boundary map ground truth of the i-th image, and $B_{i}$ denotes the predicted boundary map of the i-th image.
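Following claim 4 (convolve the two shallow fusion features, add them, then refine with further convolutions), a boundary-guidance sketch and its BCE supervision might look as follows; all module names, intermediate channel counts and the bilinear upsampling mode are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryGuidanceModule(nn.Module):
    """Produce a boundary map from the two shallow fusion features a1 and a2 (illustrative sketch)."""
    def __init__(self, c1: int, c2: int, mid: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(c1, mid, 3, padding=1)
        self.conv2 = nn.Conv2d(c2, mid, 3, padding=1)
        self.head = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 3, padding=1),
        )

    def forward(self, a1, a2, image_size):
        f1 = self.conv1(a1)
        f2 = F.interpolate(self.conv2(a2), size=f1.shape[-2:], mode="bilinear", align_corners=False)
        bm = self.head(f1 + f2)                                    # add, then refine with convolutions
        bm = F.interpolate(bm, size=image_size, mode="bilinear", align_corners=False)
        return torch.sigmoid(bm)                                   # boundary map in [0, 1]

# supervision of the boundary map with binary cross entropy:
# loss_bm = F.binary_cross_entropy(pred_bm, gt_bm)
```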
4. Cross-layer feature interactions.
The CFT module introduces a multi-scale channel attention (MS-CA) mechanism to realize efficient interaction of cross-layer features and cope with the variation of target size in COS. The MS-CA mechanism adapts well to targets of different scales; its structure is shown in figure 8. MS-CA is based on a two-branch structure: one branch uses global average pooling to obtain global features and allocates more attention to large-scale objects, while the other branch uses point convolutions to obtain fine-grained local details, making it easier to capture the features of small objects. Unlike other multi-scale attention mechanisms, MS-CA uses point convolutions in both branches to compress and recover the channel dimension, thereby aggregating multi-scale channel information from different layers and effectively characterizing the multi-scale information of the convolutional layers.
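A compact sketch of such a two-branch MS-CA block is shown below; the class name, reduction ratio and ReLU placement are assumptions, and only the global-pooled branch plus point-wise local branch structure follows the description above.

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Multi-scale channel attention: a global pooled branch plus a point-wise local branch (sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(channels // reduction, 1)
        self.global_branch = nn.Sequential(           # attends more to large objects
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )
        self.local_branch = nn.Sequential(             # keeps fine-grained local detail
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.global_branch(x) + self.local_branch(x)   # broadcast add over H x W
```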
The overall structure of the feature interaction module is shown in FIG. 9. The features a_h and a_l of adjacent layers are first added and then sent into MS-CA to obtain multi-scale channel information, after which a Sigmoid activation yields the normalized feature map F_s. F_s and 1-F_s (corresponding to the dashed arrows in FIG. 9) are multiplied with a_l and a_h respectively to enhance the feature representation. To preserve the original information of each feature, the original feature and the enhanced feature are connected by a residual; the two branches are then combined by addition, and fusion is promoted with a 3×3 convolution layer, giving the output F_e of the CFT module.
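Putting the steps above together, a CFT sketch could read as follows; it reuses the MSCA sketch and assumes the adjacent-layer features have already been aligned to the same shape.

```python
import torch
import torch.nn as nn

class CFTModule(nn.Module):
    """Cross-layer feature interaction guided by MS-CA (sketch; channels and sizes assumed aligned)."""
    def __init__(self, channels: int):
        super().__init__()
        self.msca = MSCA(channels)                        # MS-CA sketch above
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a_h: torch.Tensor, a_l: torch.Tensor) -> torch.Tensor:
        f_s = torch.sigmoid(self.msca(a_h + a_l))         # normalized attention map F_s
        enh_l = a_l + a_l * f_s                           # enhance a_l, keep original via residual
        enh_h = a_h + a_h * (1.0 - f_s)                   # enhance a_h, keep original via residual
        return self.fuse(enh_l + enh_h)                   # combine branches, fuse with 3x3 conv -> F_e
```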
The feature interaction module aggregates cross-layer image features with different levels and receptive fields by introducing an MS-CA mechanism to provide rich multi-level context information, and the multi-level features interact to generate more effective and different image information, so that the model can adaptively divide targets with different sizes.
5. Boundary-guided multi-convolution head transposed attention.
The BMTA module effectively combines the local details of the predicted boundary map with the global information of the target through multi-head attention. The BMTA module realizes the interaction of local and non-local pixels based on multi-convolution head transposed attention (MTA); the structure of MTA is shown in fig. 10.
The input feature map is first normalized (layer normalization) and then passed through three parallel 1×1 convolutions followed by 3×3 depth-wise convolutions to generate the query (Q), key (K) and value (V). The transpose of Q is matrix-multiplied with K to generate an Attention Map (AM), and the transpose of AM is then matrix-multiplied with V to generate a new feature map:

$$\mathrm{AM}=\mathrm{Softmax}\big(Q^{\top}K\big),\qquad X_{out}=V\cdot \mathrm{AM}^{\top}$$

where Q, K and V are reshaped so that the attention is computed across the channel dimension.
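A single-head PyTorch sketch of this transposed (channel-wise) attention is given below. The GroupNorm stand-in for layer normalization, the feature normalization of Q and K, and the single-head simplification are our assumptions; only the 1×1 + depth-wise convolutions and the C×C attention follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTA(nn.Module):
    """Single-head transposed attention over channels (an illustrative sketch of MTA)."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)              # stands in for layer normalization
        self.qkv = nn.Conv2d(channels, channels * 3, 1)    # the three 1x1 convolutions, fused
        self.dw = nn.Conv2d(channels * 3, channels * 3, 3, padding=1, groups=channels * 3)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.dw(self.qkv(self.norm(x))).chunk(3, dim=1)
        q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)       # B x C x (H*W)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)    # scale stabilization (assumption)
        am = torch.softmax(q @ k.transpose(-2, -1), dim=-1)      # B x C x C attention map
        out = (am @ v).reshape(b, c, h, w)                       # apply channel attention to V
        return self.proj(out)
```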
the structure of the BMTA module is shown in fig. 11. The input feature map is fed into a Boundary Attention Head (BAH) branch and a target attention head (OAH) branch, respectively, for learning of different features. BAH branches are merged into a boundary map, boundary priori is provided, so that the model can learn edge detail characteristics better, and OAH learns overall characteristics of the target.
The structure and calculation of OAH are identical to MTA, realizing cross-channel global feature extraction and separating the target from the background as a whole.
BAH introduces a predicted binary Boundary Map (BM) into the MTA, learning a representation of boundary enhancement to effectively capture important sparse local boundary information of the object. The structure of the BAH is shown in FIG. 12.
Specifically, the Q and K obtained by the convolution operations are multiplied with BM to obtain Q_b and K_b; Q_b and K_b are then multiplied to produce an attention matrix that carries boundary information. The calculation of BAH can be written as

$$Q_{b}=Q\odot BM,\quad K_{b}=K\odot BM,\quad \mathrm{AM}_{b}=\mathrm{Softmax}\big(Q_{b}^{\top}K_{b}\big),\quad X_{out}=V\cdot \mathrm{AM}_{b}^{\top}$$

V is a matrix without boundary information, so features are refined by establishing pairwise relationships at the boundaries, ensuring the sharpness and integrity of the boundaries.
The BMTA module finally concatenates the outputs of OAH and BAH and feeds them into a 3×3 convolution to fuse the boundary and overall features; to avoid feature degradation, a residual connection adds the original feature to the fused feature. By taking boundary information as guidance, BMTA integrates the features of the boundary map with feature representations at different levels, helping the model distinguish the fine foreground and background boundaries of camouflaged objects.
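Combining the two heads, the module-level fusion could be sketched as follows, reusing the MTA and BoundaryAttentionHead sketches above.

```python
import torch
import torch.nn as nn

class BMTAModule(nn.Module):
    """Boundary-guided multi-head transposed attention: OAH + BAH, fused by a 3x3 conv (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.oah = MTA(channels)                        # target attention head (MTA sketch above)
        self.bah = BoundaryAttentionHead(channels)      # boundary attention head sketch above
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, bm: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([self.oah(x), self.bah(x, bm)], dim=1))
        return x + fused                                 # residual connection against feature degradation
```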
6. Prediction map generation.
The prediction maps are generated by feeding the features output by BMTA into a 3×3 convolutional layer followed by a Sigmoid. The prediction map P_i (i = 1, 2, 3) of each sub-layer is supervised by both BCE and IoU loss functions to optimize the parameters of the whole model. The model of this embodiment adopts multiple supervision, which gives it better expressive power for camouflaged target segmentation; the overall loss function of the model is as follows:
$$L_{total}=\sum_{i=1}^{3}\Big[L_{BCE}\big(P_{i},G\big)+L_{IoU}\big(P_{i},G\big)\Big]$$

wherein $P$ denotes a prediction map, $G$ denotes the ground truth of the image, and $P_{i}$ denotes the i-th prediction map.
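A sketch of this supervision is shown below; the soft-IoU formulation, the smoothing constant and the equal weighting of the BCE and IoU terms are assumptions, and the prediction maps are assumed to already lie in [0, 1] after the Sigmoid.

```python
import torch
import torch.nn.functional as F

def bce_iou_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Per-map supervision with BCE plus a soft IoU term (illustrative sketch)."""
    bce = F.binary_cross_entropy(pred, gt)
    inter = (pred * gt).sum(dim=(2, 3))
    union = (pred + gt - pred * gt).sum(dim=(2, 3))
    iou = 1.0 - (inter + 1.0) / (union + 1.0)            # soft IoU loss with smoothing
    return bce + iou.mean()

def total_loss(preds, gt):
    """Sum the BCE + IoU supervision over the three side-output prediction maps P_1..P_3."""
    return sum(bce_iou_loss(p, gt) for p in preds)
```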
Referring to fig. 13, the embodiment of the present invention further provides a camouflage target image segmentation system based on multi-level feature fusion, where the camouflage target image segmentation system based on multi-level feature fusion includes a data acquisition unit 100, a feature extraction unit 200, a feature fusion unit 300, a boundary guiding unit 400, a feature interaction unit 500, a boundary fusion unit 600, and an image segmentation unit 700, where:
a data acquisition unit 100 for acquiring a camouflage target image to be segmented;
the feature extraction unit 200 is configured to perform multi-level feature extraction on a camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network, so as to obtain a first feature map of multiple layers output by the first branch network and a second feature map of multiple layers output by the second branch network;
the feature fusion unit 300 is configured to perform global feature enhancement on the first feature map of each level, and obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the reinforced local features of each level and the reinforced global features of the same level to obtain fusion features of multiple levels;
The boundary guiding unit 400 is configured to perform boundary guiding on the fusion features of two shallow network layers in the fusion features of multiple layers, so as to obtain a boundary map;
the feature interaction unit 500 is configured to perform feature interaction on the fusion features of the adjacent network layers in the multiple layers of fusion features to obtain multiple interaction features;
the boundary fusion unit 600 is configured to perform boundary fusion on the boundary map and each of the plurality of interaction features, so as to obtain a plurality of boundary fusion features;
the image segmentation unit 700 is configured to segment a camouflage target image in the camouflage target images to be segmented corresponding to each boundary fusion feature based on the plurality of boundary fusion features.
It should be noted that, since the camouflage target image segmentation system based on the multi-level feature fusion in the present embodiment and the above camouflage target image segmentation method based on the multi-level feature fusion are based on the same inventive concept, the corresponding content in the method embodiment is also applicable to the present system embodiment, and will not be described in detail here.
The embodiment of the application also provides electronic equipment, which comprises: at least one memory, at least one processor, at least one computer program stored in the at least one memory, the at least one processor executing the at least one computer program to implement any of the above embodiments of a camouflage target image segmentation method based on multi-level feature fusion. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 14, fig. 14 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 810 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc., and is used for executing related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 820 may be implemented in the form of Read-Only Memory (ROM), static storage, dynamic storage, or Random Access Memory (RAM). The memory 820 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program codes are stored in the memory 820, and the processor 810 invokes the camouflage target image segmentation method based on multi-level feature fusion to execute the embodiments of the present disclosure;
an input/output interface 830 for implementing information input and output;
the communication interface 840 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
Bus 850 transfers information between the various components of the device (e.g., processor 810, memory 820, input/output interface 830, and communication interface 840);
wherein processor 810, memory 820, input/output interface 830, and communication interface 840 enable communication connections among each other within the device via bus 850.
The embodiment of the application also provides a storage medium which is a computer readable storage medium, wherein the computer readable storage medium is stored with a computer program, and the computer program is used for enabling a computer to execute the camouflage target image segmentation method based on multi-level feature fusion.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage, flash memory, or other non-transitory solid state storage. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solution shown in fig. 1 is not limiting of the embodiments of the application and may include more or fewer steps than shown, or certain steps may be combined, or different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
The foregoing description of the preferred embodiments of the present application has been presented with reference to the drawings and is not intended to limit the scope of the claims. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (9)

1. A camouflage target image segmentation method based on multi-level feature fusion, characterized in that the camouflage target image segmentation based on multi-level feature fusion comprises:
acquiring a camouflage target image to be segmented;
performing multi-level feature extraction on the camouflage target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;
performing global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels; performing local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels; and performing feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
performing boundary guidance on the fusion features of two shallow network layers among the fusion features of multiple levels to obtain a boundary map;
performing feature interaction on the fusion features of adjacent network layers among the fusion features of multiple levels to obtain multiple interaction features, specifically:
introducing a multi-scale channel attention mechanism into the feature interaction module, and adding the fusion features of adjacent network layers among the fusion features of multiple levels to obtain multiple added features;
inputting each of the added features into the multi-scale channel attention mechanism to obtain multiple multi-scale channel features;
applying an activation function to each of the multi-scale channel features to obtain multiple normalized features, and subtracting each of the normalized features from 1 to obtain multiple normalized difference features;
performing feature enhancement on the multiple normalized features and the multiple normalized difference features to obtain multiple enhanced normalized features and multiple enhanced normalized difference features;
performing residual connection on each of the enhanced normalized features and its corresponding fusion feature to obtain multiple first residual features, and convolving each of the first residual features to obtain multiple first convolution features;
performing residual connection on each of the enhanced normalized difference features and its corresponding fusion feature to obtain multiple second residual features, and convolving each of the second residual features to obtain multiple second convolution features;
adding each of the first convolution features to its corresponding second convolution feature to obtain multiple added convolution features, and applying a convolution layer to the multiple added convolution features to promote fusion, to obtain multiple interaction features;
performing boundary fusion on the boundary map with each of the multiple interaction features to obtain multiple boundary fusion features;
based on the multiple boundary fusion features, segmenting the camouflage target image in the camouflage target image to be segmented corresponding to each of the boundary fusion features.
2. The camouflage target image segmentation method based on multi-level feature fusion according to claim 1, characterized in that the performing multi-level feature extraction on the camouflage target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network, comprises:
using different network layers of the first branch network to perform feature extraction on the global context information of the camouflage target image to be segmented, to obtain the first feature maps of multiple levels output by the first branch network;
using different network layers of the second branch network to perform feature extraction on the local detail information of the camouflage target image to be segmented, to obtain the second feature maps of multiple levels output by the second branch network.
3. The camouflage target image segmentation method based on multi-level feature fusion according to claim 1, characterized in that the performing global feature enhancement on the first feature map of each level, performing local feature enhancement on the second feature map of each level, and performing feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels, comprises:
using a residual channel attention mechanism to perform global feature enhancement on the first feature map of each level, to obtain enhanced global features of multiple levels;
using a spatial attention mechanism to perform local feature enhancement on the second feature map of each level, to obtain enhanced local features of multiple levels;
concatenating the enhanced local features of each level with the enhanced global features of the same level to obtain concatenated features of multiple levels, and applying a convolution layer to the concatenated features of multiple levels to promote feature fusion, to obtain fusion features of multiple levels.
4. The camouflage target image segmentation method based on multi-level feature fusion according to claim 1, characterized in that the performing boundary guidance on the fusion features of two shallow network layers among the fusion features of multiple levels to obtain a boundary map comprises:
convolving the fusion features of two shallow network layers among the fusion features of multiple levels to obtain a first convolution feature and a second convolution feature;
adding the first convolution feature and the second convolution feature to obtain an added feature, and applying multiple convolution layers to the added feature for boundary guidance to obtain the boundary map.
5. The camouflage target image segmentation method based on multi-level feature fusion according to claim 1, characterized in that the performing boundary fusion on the boundary map with each of the multiple interaction features to obtain multiple boundary fusion features comprises:
based on each of the interaction features, using a target attention head branch to learn the overall features of the target, wherein the target attention head branch is used to separate the target from the background as a whole based on the interaction features;
based on the boundary map and each of the interaction features, using a boundary attention head branch to learn boundary detail features, wherein the boundary attention head branch is used to capture the sparse local boundary information of the target based on the boundary map and the interaction features;
concatenating the output of each target attention head branch with the output of its corresponding boundary attention head branch to obtain multiple output concatenated features, and applying a convolution layer to the multiple output concatenated features to promote feature fusion, to obtain multiple convolution fusion features;
performing residual connection on each convolution fusion feature and its corresponding interaction feature to obtain multiple boundary fusion features.
6. The camouflage target image segmentation method based on multi-level feature fusion according to claim 1, characterized in that the segmenting, based on the multiple boundary fusion features, the camouflage target image in the camouflage target image to be segmented corresponding to each of the boundary fusion features comprises:
inputting the multiple boundary fusion features into a convolutional layer with a Sigmoid activation function to generate multiple prediction maps;
based on each of the prediction maps, segmenting the camouflage target image in the camouflage target image to be segmented.
7. A camouflage target image segmentation system based on multi-level feature fusion, characterized in that the camouflage target image segmentation system based on multi-level feature fusion comprises:
a data acquisition unit, configured to acquire a camouflage target image to be segmented;
a feature extraction unit, configured to perform multi-level feature extraction on the camouflage target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;
a feature fusion unit, configured to perform global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, perform local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels, and perform feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
a boundary guidance unit, configured to perform boundary guidance on the fusion features of two shallow network layers among the fusion features of multiple levels to obtain a boundary map;
a feature interaction unit, configured to perform feature interaction on the fusion features of adjacent network layers among the fusion features of multiple levels to obtain multiple interaction features, specifically: introducing a multi-scale channel attention mechanism into the feature interaction module, and adding the fusion features of adjacent network layers among the fusion features of multiple levels to obtain multiple added features; inputting each of the added features into the multi-scale channel attention mechanism to obtain multiple multi-scale channel features; applying an activation function to each of the multi-scale channel features to obtain multiple normalized features, and subtracting each of the normalized features from 1 to obtain multiple normalized difference features; performing feature enhancement on the multiple normalized features and the multiple normalized difference features to obtain multiple enhanced normalized features and multiple enhanced normalized difference features; performing residual connection on each of the enhanced normalized features and its corresponding fusion feature to obtain multiple first residual features, and convolving each of the first residual features to obtain multiple first convolution features; performing residual connection on each of the enhanced normalized difference features and its corresponding fusion feature to obtain multiple second residual features, and convolving each of the second residual features to obtain multiple second convolution features; adding each of the first convolution features to its corresponding second convolution feature to obtain multiple added convolution features, and applying a convolution layer to the multiple added convolution features to promote fusion, to obtain multiple interaction features;
a boundary fusion unit, configured to perform boundary fusion on the boundary map with each of the multiple interaction features to obtain multiple boundary fusion features;
an image segmentation unit, configured to segment, based on the multiple boundary fusion features, the camouflage target image in the camouflage target image to be segmented corresponding to each of the boundary fusion features.
8. An electronic device, characterized in that it comprises:
at least one memory;
at least one processor;
at least one computer program;
the at least one computer program being stored in the at least one memory, and the at least one processor executing the at least one computer program to implement:
the camouflage target image segmentation method based on multi-level feature fusion according to any one of claims 1 to 6.
9. A storage medium, the storage medium being a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for causing a computer to execute:
the camouflage target image segmentation method based on multi-level feature fusion according to any one of claims 1 to 6.
CN202310982262.1A 2023-08-07 2023-08-07 A camouflage target image segmentation method and system based on multi-level feature fusion Active CN116703950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310982262.1A CN116703950B (en) 2023-08-07 2023-08-07 A camouflage target image segmentation method and system based on multi-level feature fusion

Publications (2)

Publication Number Publication Date
CN116703950A CN116703950A (en) 2023-09-05
CN116703950B true CN116703950B (en) 2023-10-20

Family

ID=87843689

Country Status (1)

Country Link
CN (1) CN116703950B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119206218B (en) * 2024-09-09 2025-07-15 沈阳工业大学 EdgeAttenNet glomerulus image accurate segmentation system and method based on camouflaged target detection
CN119992274A (en) * 2025-04-11 2025-05-13 南开大学 Method and device for detecting camouflaged objects
CN120374477B (en) * 2025-04-11 2025-11-18 安庆师范大学 Dark light field image enhancement system and method for drama scene

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008047369A (en) * 2006-08-11 2008-02-28 Furukawa Battery Co Ltd:The Method of manufacturing lead storage battery
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114565770A (en) * 2022-03-23 2022-05-31 中南大学 Image segmentation method and system based on edge-assisted computation and mask attention
CN114581752A (en) * 2022-05-09 2022-06-03 华北理工大学 A camouflaged target detection method based on context awareness and boundary refinement
CN114943963A (en) * 2022-04-29 2022-08-26 南京信息工程大学 Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN115471774A (en) * 2022-09-19 2022-12-13 中南大学 Video time domain action segmentation method based on audio and video bimodal feature fusion
WO2023024577A1 (en) * 2021-08-27 2023-03-02 之江实验室 Edge computing-oriented reparameterization neural network architecture search method
CN116228702A (en) * 2023-02-23 2023-06-06 南京邮电大学 A Camouflaged Object Detection Method Based on Attention Mechanism and Convolutional Neural Network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750140B (en) * 2021-01-21 2022-10-14 大连理工大学 Image segmentation method of camouflage target based on information mining


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks;Martin Rajchl, et al.;arXiv:1605.07866v2;1-10 *
Mask-and-Edge Co-Guided Separable Network for Camouflaged Object Detection;Jiesheng Wu,et al.;IEEE Signal Processing Letters;748 - 752 *
Research Progress on Camouflaged Object Detection; Zhang Dongdong et al.; Laser Journal; 1-18 *
Digital Camouflage Generation Method Based on Cycle-Consistency Adversarial Networks; Teng Xu; Zhang Hui; Yang Chunming; Zhao Xujian; Li Bo; Journal of Computer Applications (02); 179-190 *
Building Segmentation in Remote Sensing Images with Multi-Scale Feature Fusion Dilated-Convolution ResNet; Xu Shengjun; Ouyang Puyan; Guo Xueyuan; Taha Muthar Khan; Duan Zhongxing; Optics and Precision Engineering (07); 262-266 *

Also Published As

Publication number Publication date
CN116703950A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN116703950B (en) A camouflage target image segmentation method and system based on multi-level feature fusion
Ullah et al. A brief survey of visual saliency detection
CN111444826B (en) Video detection method, device, storage medium and computer equipment
WO2022247147A1 (en) Methods and systems for posture prediction
CN114445633B (en) Image processing method, device and computer readable storage medium
CN108491848B (en) Image saliency detection method and device based on depth information
CN111583173B (en) A saliency target detection method in RGB-D images
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN112200818B (en) Dressing region segmentation and dressing replacement method, device and equipment based on image
CN112381828A (en) Positioning method, device, medium and equipment based on semantic and depth information
CN114693624A (en) Image detection method, device and equipment and readable storage medium
CN115984093A (en) Depth estimation method based on infrared image, electronic device and storage medium
CN116189281A (en) End-to-end human behavior classification method and system based on spatio-temporal adaptive fusion
CN111382654B (en) Image processing method and device and storage medium
WO2024199155A1 (en) Three-dimensional semantic scene completion method, device, and medium
CN116740479B (en) Three-dimensional attention-based camouflage target detection method for multi-scale context and multi-level feature interaction
CN114792315B (en) Medical image visual model training method and device, electronic equipment and storage medium
CN114842466B (en) Object detection method, computer program product and electronic device
CN115880350B (en) Image processing methods, devices, systems and computer-readable storage media
CN118196738A (en) Lane line detection method and device, electronic equipment and storage medium
CN117315791B (en) Skeleton action recognition method, equipment and storage medium
CN116957999A (en) Depth map optimization method, device, equipment and storage medium
US12322123B2 (en) Visual object detection in a sequence of images
CN114283290A (en) Training of image processing model, image processing method, device, equipment and medium
CN115222981A (en) Dishes identification method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant