CN116703950A - Camouflage target image segmentation method and system based on multi-level feature fusion - Google Patents

Info

Publication number: CN116703950A (application CN202310982262.1A; granted as CN116703950B)
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 任胜兵, 梁义, 周佳蕾
Applicant and current assignee: Central South University
Legal status: Active (granted)

Classifications

    • G06T7/11 — Image analysis; Segmentation; Region-based segmentation
    • G06N3/0464 — Neural networks; Architecture; Convolutional networks [CNN, ConvNet]
    • G06N3/048 — Neural networks; Architecture; Activation functions
    • G06V10/806 — Image or video recognition; Fusion of extracted features
    • G06T2207/20084 — Indexing scheme for image analysis; Artificial neural networks [ANN]
    • Y02T10/40 — Climate change mitigation technologies related to transportation; Engine management systems

Abstract

The invention discloses a camouflaged target image segmentation method and system based on multi-level feature fusion. The method performs global feature enhancement on the first feature map of each level and local feature enhancement on the second feature map of each level; fuses the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels; performs boundary guidance on the fused features of two shallow network layers among the multi-level fused features to obtain a boundary map; performs feature interaction between the fused features of adjacent network layers to obtain a plurality of interaction features; performs boundary fusion of the boundary map with each interaction feature to obtain a plurality of boundary fusion features; and, based on the plurality of boundary fusion features, segments out the camouflaged target image, in the image to be segmented, corresponding to each boundary fusion feature. The invention can improve the accuracy of camouflaged target image segmentation.

Description

A Camouflage Target Image Segmentation Method and System Based on Multi-Level Feature Fusion

Technical Field

The present invention relates to the technical field of camouflaged target image segmentation, and in particular to a camouflaged target image segmentation method and system based on multi-level feature fusion.

Background Art

Camouflaged object segmentation (COS) aims to segment out camouflaged targets that are highly similar to their background. COS uses computer vision models to assist the human visual and perceptual system in camouflaged target image segmentation.

In the prior art, however, models with a CNN backbone (such as ResNet) have strong local feature extraction capability, but because of the limited receptive field of CNNs they have limited ability to capture long-range feature dependencies; models with a transformer backbone (such as Vision Transformer) benefit from the transformer's attention mechanism and model global feature relationships well, but they are limited in capturing fine-grained details, which weakens their ability to represent local features. Camouflaged target image segmentation must both segment the target as a whole based on global features and handle detail information such as boundaries based on local features; using a single backbone network therefore requires complex methods to fuse local and global features and is inefficient. Most methods fuse multi-level features with simple operations such as concatenation and addition: when high-level and low-level features interact, the two features are first fused by an addition operation; the fused features are then passed through a Sigmoid activation function to obtain a normalized feature map, and this normalized feature map is treated as a feature-level attention map to enhance the feature representation. Achieving cross-level feature enhancement in this way, with fused feature maps obtained by simple addition, cannot capture the valuable information that is highly relevant to segmenting camouflaged targets. Some models focus on extracting the global texture features of camouflaged targets and ignore the influence of boundaries on the model's expressive power, and when the target object shares the same texture features as the background, these models perform poorly. Since the texture of most camouflaged objects is similar to the background, distinguishing subtle differences in local boundary information is particularly important for improving model performance. Although some models do consider boundary features, they usually supervise the predicted boundary map as an independent branch without further processing, so the boundary map information is not fully exploited.

In summary, the prior art struggles to capture useful feature information, and the predicted boundary map information cannot be fully exploited; accurate camouflaged target segmentation is therefore difficult to achieve.

Summary of the Invention

The present invention aims to solve at least one of the technical problems in the prior art. To this end, the present invention proposes a camouflaged target image segmentation method and system based on multi-level feature fusion, which can improve the accuracy of camouflaged target image segmentation.

In a first aspect, an embodiment of the present invention provides a camouflaged target image segmentation method based on multi-level feature fusion, the method comprising:

acquiring a camouflaged target image to be segmented;

performing multi-level feature extraction on the camouflaged target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;

performing global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels; performing local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels; and fusing the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels;

performing boundary guidance on the fused features of two shallow network layers among the fused features of the multiple levels, to obtain a boundary map;

performing feature interaction between the fused features of adjacent network layers among the fused features of the multiple levels, to obtain a plurality of interaction features;

performing boundary fusion of the boundary map with each of the plurality of interaction features, to obtain a plurality of boundary fusion features;

based on the plurality of boundary fusion features, segmenting out the camouflaged target image, in the camouflaged target image to be segmented, corresponding to each boundary fusion feature.
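To make the claimed data flow concrete, the following is a minimal shape-level PyTorch sketch. Every module body is a placeholder convolution; the module names (LGA, CFT, BGM, BMTA) follow the detailed description below, while the channel width, resolutions, and placeholder internals are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stub(nn.Module):
    """Placeholder for a module whose internals are sketched later."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
    def forward(self, x):
        return self.conv(x)

levels, C = 4, 64  # assumed: four feature levels, all projected to 64 channels
lga = nn.ModuleList(Stub(2 * C, C) for _ in range(levels))    # local+global fusion
cft = nn.ModuleList(Stub(C, C) for _ in range(levels - 1))    # adjacent-level interaction
bgm = Stub(C, 1)                                              # boundary-map head
bmta = nn.ModuleList(Stub(C, C) for _ in range(levels - 1))   # boundary fusion
pred = nn.Conv2d(C, 1, 1)                                     # prediction layer

# g[i] / l[i] stand in for the global (transformer) and local (CNN) branch outputs.
g = [torch.randn(1, C, 64 >> i, 64 >> i) for i in range(levels)]
l = [torch.randn(1, C, 64 >> i, 64 >> i) for i in range(levels)]

def up(x, ref):  # resize x to the spatial size of ref
    return F.interpolate(x, size=ref.shape[-2:], mode='bilinear', align_corners=False)

a = [lga[i](torch.cat([g[i], l[i]], 1)) for i in range(levels)]      # fused features a1..a4
bm = bgm(a[0] + up(a[1], a[0]))                                      # boundary map (shallow levels)
t = [cft[i](a[i] + up(a[i + 1], a[i])) for i in range(levels - 1)]   # interaction features
z = [bmta[i](ti + up(bm, ti)) for i, ti in enumerate(t)]             # boundary fusion features
p = [torch.sigmoid(pred(zi)) for zi in z]                            # prediction maps P1..P3
print([tuple(pi.shape) for pi in p])
```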

Compared with the prior art, the first aspect of the present invention has the following beneficial effects:

In this method, a first branch network and a second branch network with different network layers perform multi-level feature extraction on the camouflaged target image to be segmented, yielding first feature maps of multiple levels from the first branch network and second feature maps of multiple levels from the second branch network, so that features in the target image are extracted more effectively. Global feature enhancement is applied to the first feature map of each level to obtain enhanced global features of multiple levels, and local feature enhancement is applied to the second feature map of each level to obtain enhanced local features of multiple levels; the enhanced local features of each level are then fused with the enhanced global features of the same level to obtain fused features of multiple levels. By enhancing the local and global features and fusing them, the two kinds of features complement each other and provide comprehensive feature information for accurately segmenting the camouflaged target image. Boundary guidance is performed on the fused features of two shallow network layers to obtain a boundary map; since shallow layers retain more semantic information, using shallow fused features for boundary guidance can generate a high-quality boundary map. Feature interaction between the fused features of adjacent network layers yields a plurality of interaction features, and the fused features of multiple levels complement one another to give a comprehensive feature representation. The boundary map is boundary-fused with each interaction feature to obtain a plurality of boundary fusion features, based on which the camouflaged target image corresponding to each boundary fusion feature is segmented out of the image to be segmented. Guided by the boundary information in the boundary map, the boundary map features are integrated with interaction features of different levels, refining the boundary features and keeping the boundary clear and complete; this helps distinguish the fine foreground and background boundaries of the camouflaged target, gives better expressive power for segmenting camouflaged targets, and improves the accuracy of camouflaged target image segmentation.

According to some embodiments of the present invention, performing multi-level feature extraction on the camouflaged target image to be segmented through the first branch network and the second branch network using different network layers, to obtain the first feature maps of multiple levels output by the first branch network and the second feature maps of multiple levels output by the second branch network, comprises:

performing, through the first branch network, feature extraction on the global context information of the camouflaged target image to be segmented using different network layers, to obtain the first feature maps of multiple levels output by the first branch network;

performing, through the second branch network, feature extraction on the local detail information of the camouflaged target image to be segmented using different network layers, to obtain the second feature maps of multiple levels output by the second branch network.

According to some embodiments of the present invention, performing the global feature enhancement, the local feature enhancement, and the fusion of the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels comprises:

performing global feature enhancement on the first feature map of each level using a residual channel attention mechanism, to obtain enhanced global features of multiple levels;

performing local feature enhancement on the second feature map of each level using a spatial attention mechanism, to obtain enhanced local features of multiple levels;

concatenating the enhanced local features of each level with the enhanced global features of the same level to obtain concatenated features of multiple levels, and applying a convolutional layer to the concatenated features of the multiple levels to promote feature fusion, obtaining fused features of multiple levels.
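As a concrete illustration, the following is a minimal PyTorch sketch of this fusion step. It assumes a squeeze-and-excitation-style form for the residual channel attention and a pooled-statistics form for the spatial attention; the actual designs are those of Figs. 5 to 7, which this sketch does not claim to reproduce.

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """Channel attention with a residual connection (assumed SE-style form)."""
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
    def forward(self, x):
        return x + x * self.fc(x)  # reweight channels, keep a residual path

class SpatialAttention(nn.Module):
    """Spatial attention from pooled channel statistics (assumed form)."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.conv(s))  # emphasize informative positions

class LGA(nn.Module):
    """Fuse same-level global (transformer) and local (CNN) feature maps."""
    def __init__(self, c):
        super().__init__()
        self.rca = ResidualChannelAttention(c)
        self.sa = SpatialAttention()
        self.fuse = nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1),
                                  nn.BatchNorm2d(c), nn.ReLU(inplace=True))
    def forward(self, g, l):
        # enhance each branch, concatenate, then fuse with a convolutional layer
        return self.fuse(torch.cat([self.rca(g), self.sa(l)], 1))

g = torch.randn(1, 64, 32, 32)  # global feature of one level
l = torch.randn(1, 64, 32, 32)  # local feature of the same level
print(LGA(64)(g, l).shape)      # torch.Size([1, 64, 32, 32])
```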

According to some embodiments of the present invention, performing boundary guidance on the fused features of two shallow network layers among the fused features of the multiple levels to obtain a boundary map comprises:

convolving the fused features of the two shallow network layers among the fused features of the multiple levels, to obtain a first convolutional feature and a second convolutional feature;

adding the first convolutional feature and the second convolutional feature to obtain an added feature, and performing boundary guidance on the added feature using a plurality of convolutional layers, to obtain the boundary map.
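A hedged PyTorch reading of this step follows; the kernel sizes, the intermediate width, and the depth of the guidance stack are assumptions, since the text only specifies "a plurality of convolutional layers".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BGM(nn.Module):
    """Boundary guidance: convolve the two shallow fused features, add them,
    then refine the sum into a one-channel boundary map."""
    def __init__(self, c1, c2, mid=64):
        super().__init__()
        self.p1 = nn.Conv2d(c1, mid, 3, padding=1)  # -> first convolutional feature
        self.p2 = nn.Conv2d(c2, mid, 3, padding=1)  # -> second convolutional feature
        self.guide = nn.Sequential(                 # the "plurality of conv layers"
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1))
    def forward(self, a1, a2):
        a2 = F.interpolate(a2, size=a1.shape[-2:], mode='bilinear',
                           align_corners=False)       # match spatial resolutions
        return self.guide(self.p1(a1) + self.p2(a2))  # boundary-map logits

bm = BGM(64, 128)(torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32))
print(bm.shape)  # torch.Size([1, 1, 64, 64])
```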

According to some embodiments of the present invention, performing feature interaction between the fused features of adjacent network layers among the fused features of the multiple levels to obtain a plurality of interaction features comprises:

introducing a multi-scale channel attention mechanism into a feature interaction module, and adding the fused features of adjacent network layers among the fused features of the multiple levels, to obtain a plurality of added features;

inputting each added feature into the multi-scale channel attention mechanism to obtain a plurality of multi-scale channel features;

applying an activation function to each multi-scale channel feature to obtain a plurality of normalized features, and subtracting each normalized feature from one to obtain a plurality of normalized difference features;

performing feature enhancement on the plurality of normalized features and the plurality of normalized difference features, to obtain a plurality of enhanced normalized features and a plurality of enhanced normalized difference features;

residually connecting each enhanced normalized feature with its corresponding fused feature to obtain a plurality of first residual features, and convolving each first residual feature to obtain a plurality of first convolutional features;

residually connecting each enhanced normalized difference feature with its corresponding fused feature to obtain a plurality of second residual features, and convolving each second residual feature to obtain a plurality of second convolutional features;

adding each first convolutional feature to its corresponding second convolutional feature to obtain a plurality of added convolutional features, and applying a convolutional layer to the plurality of added convolutional features to promote fusion, obtaining the plurality of interaction features.
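One plausible PyTorch reading of these steps is sketched below, with the multi-scale channel attention (MS-CA) treated as a black box (a sketch of its assumed internals accompanies its detailed description later). Interpreting the normalized map as a weight w on one level and its complement 1 − w on the adjacent level is an interpretation of the text, not a verbatim reproduction of Fig. 9.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CFT(nn.Module):
    """Cross-level feature interaction: a channel-attention map w weights one
    level and its complement (1 - w) weights the adjacent level."""
    def __init__(self, c, msca):
        super().__init__()
        self.msca = msca                         # multi-scale channel attention
        self.c1 = nn.Conv2d(c, c, 3, padding=1)  # conv on first residual feature
        self.c2 = nn.Conv2d(c, c, 3, padding=1)  # conv on second residual feature
        self.out = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                 nn.BatchNorm2d(c), nn.ReLU(inplace=True))
    def forward(self, f_lo, f_hi):
        f_hi = F.interpolate(f_hi, size=f_lo.shape[-2:], mode='bilinear',
                             align_corners=False)    # align adjacent levels
        w = torch.sigmoid(self.msca(f_lo + f_hi))    # normalized attention map
        r1 = self.c1(f_lo + w * f_lo)                # enhanced + residual, then conv
        r2 = self.c2(f_hi + (1 - w) * f_hi)          # complementary weighting
        return self.out(r1 + r2)                     # fused interaction feature

msca_stub = nn.Conv2d(64, 64, 1)  # stand-in for MS-CA (sketched separately)
t = CFT(64, msca_stub)(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32))
print(t.shape)  # torch.Size([1, 64, 64, 64])
```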

According to some embodiments of the present invention, performing boundary fusion of the boundary map with each of the plurality of interaction features to obtain a plurality of boundary fusion features comprises:

based on each interaction feature, learning overall target features with a target attention head branch, wherein the target attention head branch is configured to separate the target from the background as a whole based on the interaction feature;

based on the boundary map and each interaction feature, learning boundary detail features with a boundary attention head branch, wherein the boundary attention head branch is configured to capture sparse local boundary information of the target based on the boundary map and the interaction feature;

concatenating the output of each target attention head branch with the output of its corresponding boundary attention head branch to obtain a plurality of output concatenation features, and applying a convolutional layer to the plurality of output concatenation features to promote feature fusion, obtaining a plurality of convolutional fusion features;

residually connecting each convolutional fusion feature with its corresponding interaction feature, to obtain the plurality of boundary fusion features.
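A simplified sketch of the two attention-head branches follows. The "multi-convolution-head transposed attention" named later in the description is approximated here by a single-head channel-wise (transposed) attention, and gating the input with the boundary map is an assumed realization of the boundary attention head; Figs. 10 to 12 show the actual designs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    """Single-head channel-wise ("transposed") self-attention: attention is
    computed across channels rather than spatial positions (assumed form)."""
    def __init__(self, c):
        super().__init__()
        self.qkv = nn.Conv2d(c, 3 * c, 1)
        self.proj = nn.Conv2d(c, c, 1)
    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)     # each (b, c, h*w)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # (b, c, c)
        return self.proj((attn @ v).view(b, c, h, w))

class BMTA(nn.Module):
    """Boundary-guided fusion: an object head on the interaction feature, a
    boundary head on the boundary-masked feature, then fuse with a residual."""
    def __init__(self, c):
        super().__init__()
        self.obj_head = TransposedAttention(c)  # separates object from background
        self.bnd_head = TransposedAttention(c)  # sparse local boundary details
        self.fuse = nn.Conv2d(2 * c, c, 3, padding=1)
    def forward(self, t, bm):
        bm = torch.sigmoid(F.interpolate(bm, size=t.shape[-2:], mode='bilinear',
                                         align_corners=False))
        o = self.obj_head(t)          # overall target features
        e = self.bnd_head(t * bm)     # boundary-map-guided detail features
        return t + self.fuse(torch.cat([o, e], 1))  # concat, fuse, residual

z = BMTA(64)(torch.randn(1, 64, 64, 64), torch.randn(1, 1, 64, 64))
print(z.shape)  # torch.Size([1, 64, 64, 64])
```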

According to some embodiments of the present invention, segmenting out, based on the plurality of boundary fusion features, the camouflaged target image corresponding to each boundary fusion feature from the camouflaged target image to be segmented comprises:

inputting the plurality of boundary fusion features into a convolutional layer with a Sigmoid activation function to generate a plurality of prediction maps;

based on each prediction map, segmenting out the camouflaged target image from the camouflaged target image to be segmented.
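This last step is direct; a minimal sketch (the channel width and the 0.5 threshold are assumptions):

```python
import torch
import torch.nn as nn

# Prediction layer: a 1x1 convolution with Sigmoid turns each boundary fusion
# feature into a one-channel probability map; thresholding yields the mask.
pred_head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
p = pred_head(torch.randn(1, 64, 64, 64))  # prediction map, values in [0, 1]
mask = (p > 0.5).float()                   # segmented camouflaged target
```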

In a second aspect, an embodiment of the present invention further provides a camouflaged target image segmentation system based on multi-level feature fusion, the system comprising:

a data acquisition unit, configured to acquire a camouflaged target image to be segmented;

a feature extraction unit, configured to perform multi-level feature extraction on the camouflaged target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;

a feature fusion unit, configured to perform global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, perform local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels, and fuse the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels;

a boundary guidance unit, configured to perform boundary guidance on the fused features of two shallow network layers among the fused features of the multiple levels, to obtain a boundary map;

a feature interaction unit, configured to perform feature interaction between the fused features of adjacent network layers among the fused features of the multiple levels, to obtain a plurality of interaction features;

a boundary fusion unit, configured to perform boundary fusion of the boundary map with each of the plurality of interaction features, to obtain a plurality of boundary fusion features;

an image segmentation unit, configured to segment out, based on the plurality of boundary fusion features, the camouflaged target image, in the camouflaged target image to be segmented, corresponding to each boundary fusion feature.

In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:

at least one memory;

at least one processor;

at least one computer program;

wherein the at least one computer program is stored in the at least one memory, and the at least one processor executes the at least one computer program to implement the camouflaged target image segmentation method based on multi-level feature fusion according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a storage medium, the storage medium being a computer-readable storage medium storing a computer program, the computer program being configured to cause a computer to execute the camouflaged target image segmentation method based on multi-level feature fusion according to the first aspect.

It can be understood that the beneficial effects of the second to fourth aspects over the related art are the same as those of the first aspect over the related art; reference may be made to the relevant description of the first aspect, which is not repeated here.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a camouflaged target image segmentation method based on multi-level feature fusion according to an embodiment of the present invention;

Fig. 2 is a flowchart of a camouflaged target image segmentation method based on multi-level feature fusion according to another embodiment of the present invention;

Fig. 3 is a schematic diagram of the overall structure of a model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the Res2Net module and a basic convolution module according to an embodiment of the present invention;

Fig. 5 is a structural diagram of the residual channel attention mechanism according to an embodiment of the present invention;

Fig. 6 is a structural diagram of the spatial attention mechanism according to an embodiment of the present invention;

Fig. 7 is a structural diagram of the LGA module according to an embodiment of the present invention;

Fig. 8 is a structural diagram of the MS-CA according to an embodiment of the present invention;

Fig. 9 is a structural diagram of the CFT module according to an embodiment of the present invention;

Fig. 10 is a structural diagram of the MTA according to an embodiment of the present invention;

Fig. 11 is a structural diagram of the BMTA according to an embodiment of the present invention;

Fig. 12 is a structural diagram of the BAH according to an embodiment of the present invention;

Fig. 13 is a structural diagram of a camouflaged target image segmentation system based on multi-level feature fusion according to an embodiment of the present invention;

Fig. 14 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.

In the description of the present invention, references to "first", "second", and the like are used only to distinguish technical features; they should not be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the order of the indicated technical features.

In the description of the present invention, it should be understood that any description of orientation, such as the orientations or positional relationships indicated by "up", "down", and the like, is based on the orientations or positional relationships shown in the drawings, is used only for convenience and simplicity of description, and does not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; it should therefore not be construed as limiting the present invention.

In the description of the present invention, it should be noted that, unless otherwise expressly defined, terms such as "provided", "installed", and "connected" are to be understood in a broad sense, and those skilled in the art may reasonably determine the specific meaning of these terms in the present invention in light of the specific content of the technical solution.

In the prior art, models with a CNN backbone (such as ResNet) have strong local feature extraction capability, but because of the limited receptive field of CNNs they have limited ability to capture long-range feature dependencies; models with a transformer backbone (such as Vision Transformer) benefit from the transformer's attention mechanism and model global feature relationships well, but they are limited in capturing fine-grained details, which weakens their ability to represent local features. Most methods fuse multi-level features with simple operations such as concatenation and addition: when high-level and low-level features interact, the two features are first fused by an addition operation; the fused features are then passed through a Sigmoid activation function to obtain a normalized feature map, and this normalized feature map is treated as a feature-level attention map to enhance the feature representation. Achieving cross-level feature enhancement in this way, with fused feature maps obtained by simple addition, cannot capture the valuable information that is highly relevant to segmenting camouflaged targets. Some models focus on extracting the global texture features of camouflaged targets and ignore the influence of boundaries on the model's expressive power, and when the target object shares the same texture features as the background, these models perform poorly. Since the texture of most camouflaged objects is similar to the background, distinguishing subtle differences in local boundary information is particularly important for improving model performance. Although some models do consider boundary features, they usually supervise the predicted boundary map as an independent branch without further processing, so the boundary map information is not fully exploited.

In summary, the prior art struggles to capture useful feature information, so the predicted boundary map information cannot be fully exploited; accurate camouflaged target segmentation is therefore difficult to achieve.

To solve the above problems, the present invention performs multi-level feature extraction on the camouflaged target image to be segmented through a first branch network and a second branch network with different network layers, obtaining first feature maps of multiple levels from the first branch network and second feature maps of multiple levels from the second branch network, so that features in the target image are extracted more effectively. Global feature enhancement is applied to the first feature map of each level to obtain enhanced global features of multiple levels, and local feature enhancement is applied to the second feature map of each level to obtain enhanced local features of multiple levels; the enhanced local features of each level are then fused with the enhanced global features of the same level to obtain fused features of multiple levels. By enhancing the local and global features and fusing them, the two kinds of features complement each other and provide comprehensive feature information for accurately segmenting the camouflaged target image. Boundary guidance is performed on the fused features of two shallow network layers to obtain a boundary map; since shallow layers retain more semantic information, using shallow fused features for boundary guidance can generate a high-quality boundary map. Feature interaction between the fused features of adjacent network layers yields a plurality of interaction features, and the fused features of multiple levels complement one another to give a comprehensive feature representation. The boundary map is boundary-fused with each interaction feature to obtain a plurality of boundary fusion features, based on which the camouflaged target image corresponding to each boundary fusion feature is segmented out of the image to be segmented. Guided by the boundary information in the boundary map, the boundary map features are integrated with interaction features of different levels, refining the boundary features and keeping the boundary clear and complete; this helps distinguish the fine foreground and background boundaries of the camouflaged target, gives better expressive power for segmenting camouflaged targets, and improves the accuracy of camouflaged target image segmentation.

Referring to Fig. 1, an embodiment of the present invention provides a camouflaged target image segmentation method based on multi-level feature fusion, which includes, but is not limited to, steps S100 to S700:

Step S100: acquiring a camouflaged target image to be segmented;

Step S200: performing multi-level feature extraction on the camouflaged target image to be segmented through a first branch network and a second branch network using different network layers, to obtain first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;

Step S300: performing global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels; performing local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels; and fusing the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels;

Step S400: performing boundary guidance on the fused features of two shallow network layers among the multi-level fused features, to obtain a boundary map;

Step S500: performing feature interaction between the fused features of adjacent network layers among the multi-level fused features, to obtain a plurality of interaction features;

Step S600: performing boundary fusion of the boundary map with each of the plurality of interaction features, to obtain a plurality of boundary fusion features;

Step S700: based on the plurality of boundary fusion features, segmenting out the camouflaged target image, in the camouflaged target image to be segmented, corresponding to each boundary fusion feature.

In steps S100 to S700 of some embodiments, in order to better extract the features in the target image, this embodiment performs multi-level feature extraction on the camouflaged target image to be segmented through a first branch network and a second branch network using different network layers, obtaining first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network. To provide comprehensive feature information for accurately segmenting the camouflaged target image, this embodiment performs global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, performs local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels, and fuses the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels. To generate a high-quality boundary map, this embodiment performs boundary guidance on the fused features of two shallow network layers among the multi-level fused features, obtaining a boundary map. To obtain a comprehensive feature representation, this embodiment performs feature interaction between the fused features of adjacent network layers among the multi-level fused features, obtaining a plurality of interaction features. To improve the accuracy of camouflaged target image segmentation, this embodiment performs boundary fusion of the boundary map with each of the plurality of interaction features to obtain a plurality of boundary fusion features and, based on these, segments out the camouflaged target image corresponding to each boundary fusion feature from the image to be segmented.

In some embodiments, performing multi-level feature extraction on the camouflaged target image to be segmented through the first branch network and the second branch network using different network layers, to obtain the first feature maps of multiple levels output by the first branch network and the second feature maps of multiple levels output by the second branch network, includes:

performing, through the first branch network, feature extraction on the global context information of the camouflaged target image to be segmented using different network layers, to obtain the first feature maps of multiple levels output by the first branch network;

performing, through the second branch network, feature extraction on the local detail information of the camouflaged target image to be segmented using different network layers, to obtain the second feature maps of multiple levels output by the second branch network.

Specifically, Swin-Transformer V2 (the first branch network) uses different network layers to extract features from the global context information of the camouflaged target image to be segmented, obtaining the first feature maps of multiple levels output by the first branch network; the multi-head self-attention mechanism in Swin-Transformer V2 can break through the receptive-field limitation of CNNs, model contextual relationships pixel by pixel over the whole image, and assign larger weights to important features, making the feature representation richer. Res2Net (the second branch network) uses different network layers to extract features from the local detail information of the camouflaged target image to be segmented, obtaining the second feature maps of multiple levels output by the second branch network; Res2Net has a stronger and more effective multi-level feature extraction capability, refining features at a finer granularity and highlighting the difference between foreground and background.
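A hedged sketch of the dual-backbone extraction, using the timm model zoo as a stand-in for the two branches; the specific model variants, and whether features_only extraction is supported for the Swin-V2 branch in a given timm version, are assumptions to verify locally.

```python
import timm
import torch

# Res2Net branch: multi-level local-detail features.
cnn_branch = timm.create_model('res2net50_26w_4s', pretrained=False,
                               features_only=True)
# Swin-Transformer V2 branch: multi-level global-context features.
swin_branch = timm.create_model('swinv2_tiny_window8_256', pretrained=False,
                                features_only=True)

x = torch.randn(1, 3, 256, 256)
local_feats = cnn_branch(x)
global_feats = swin_branch(x)
for lf, gf in zip(local_feats, global_feats):
    print(tuple(lf.shape), tuple(gf.shape))  # matched levels feed one LGA module each
```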

In this embodiment, since most current models extract features based on a single backbone network and must rely on complex methods to fuse local and global features, their efficiency is low. This embodiment therefore performs multi-level feature extraction on the camouflaged target image to be segmented through the first branch network and the second branch network using different network layers, which extracts the features in the target image more effectively.

It should be noted that this embodiment uses Swin-Transformer V2 and Res2Net for feature extraction but is not limited to them; they may be replaced according to the actual situation and are not specifically limited here.

In some embodiments, performing global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, performing local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels, and fusing the enhanced local features of each level with the enhanced global features of the same level to obtain fused features of multiple levels, includes:

performing global feature enhancement on the first feature map of each level using a residual channel attention mechanism, to obtain enhanced global features of multiple levels;

performing local feature enhancement on the second feature map of each level using a spatial attention mechanism, to obtain enhanced local features of multiple levels;

concatenating the enhanced local features of each level with the enhanced global features of the same level to obtain concatenated features of multiple levels, and applying a convolutional layer to the concatenated features to promote feature fusion, obtaining fused features of multiple levels.

Specifically, a local-spatial-detail and global-context-information fusion module is used to enhance and fuse the features. This module applies a channel attention mechanism and a spatial attention mechanism simultaneously: a residual channel attention mechanism performs global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, and a spatial attention mechanism performs local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels; the enhanced local features of each level are concatenated with the enhanced global features of the same level to obtain concatenated features of multiple levels, and a convolutional layer is applied to the concatenated features to promote feature fusion, obtaining fused features of multiple levels.

In this embodiment, the local-spatial-detail and global-context-information fusion module considers both the global context and the local details of the image to identify its overall tendency, effectively complementing the feature extraction capabilities of the two backbone branch networks. The global information is used to extract important global features that roughly estimate the position of the target object, while the local information is used to extract fine-grained features of the object; local and global features complement each other to provide comprehensive feature information for accurate camouflaged target segmentation.

In some embodiments, performing boundary guidance on the fused features of two shallow network layers among the multi-level fused features to obtain a boundary map includes:

convolving the fused features of the two shallow network layers among the multi-level fused features, to obtain a first convolutional feature and a second convolutional feature;

adding the first convolutional feature and the second convolutional feature to obtain an added feature, and performing boundary guidance on the added feature using a plurality of convolutional layers, to obtain the boundary map.

Specifically, a boundary guidance module convolves the fused features of the two shallow network layers among the multi-level fused features to obtain the first convolutional feature and the second convolutional feature; the first and second convolutional features are added to obtain an added feature, and boundary guidance is performed on the added feature using a plurality of convolutional layers to obtain the boundary map.

In this embodiment, since more semantic information is retained in the shallow layers, using shallow fused features for boundary guidance can produce a high-quality boundary map.

In some embodiments, performing feature interaction between the fused features of adjacent network layers among the multi-level fused features to obtain a plurality of interaction features includes:

introducing a multi-scale channel attention mechanism into the feature interaction module, and adding the fused features of adjacent network layers among the multi-level fused features to obtain a plurality of added features;

inputting each added feature into the multi-scale channel attention mechanism to obtain a plurality of multi-scale channel features;

applying an activation function to each multi-scale channel feature to obtain a plurality of normalized features, and subtracting each normalized feature from one to obtain a plurality of normalized difference features;

performing feature enhancement on the plurality of normalized features and the plurality of normalized difference features, to obtain a plurality of enhanced normalized features and a plurality of enhanced normalized difference features;

residually connecting each enhanced normalized feature with its corresponding fused feature to obtain a plurality of first residual features, and convolving each first residual feature to obtain a plurality of first convolutional features;

residually connecting each enhanced normalized difference feature with its corresponding fused feature to obtain a plurality of second residual features, and convolving each second residual feature to obtain a plurality of second convolutional features;

adding each first convolutional feature to its corresponding second convolutional feature to obtain a plurality of added convolutional features, and applying a convolutional layer to the added convolutional features to promote fusion, obtaining the plurality of interaction features.

In this embodiment, to achieve efficient interaction of cross-layer features and to cope with variation in target size in camouflaged target segmentation, the feature interaction module introduces a multi-scale channel attention mechanism. This mechanism adapts well to targets of different scales. It is based on a dual-branch structure: one branch uses global average pooling to obtain global features and allocate more attention to large-scale objects, while the other branch uses pointwise convolution to obtain fine-grained local details, which is better suited to capturing the features of small objects. Unlike other multi-scale attention mechanisms, the multi-scale channel attention mechanism uses pointwise convolutions in both branches to compress and restore the features along the channel dimension, thereby aggregating multi-scale channel information at different levels and effectively characterizing the multi-scale information of the convolutional layers. The fused features of multiple levels can complement each other, giving a more comprehensive feature representation.
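A sketch of the assumed dual-branch form of MS-CA (after Fig. 8): a pooled global branch and a full-resolution pointwise branch, each compressing and restoring the channel dimension with pointwise convolutions. The reduction ratio and the use of BatchNorm are assumptions.

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Multi-scale channel attention (assumed form): a global branch
    (GAP -> pointwise squeeze/restore) attends to large objects, a local
    branch (pointwise squeeze/restore at full resolution) to small ones."""
    def __init__(self, c, r=4):
        super().__init__()
        def squeeze_restore(pool):
            layers = ([nn.AdaptiveAvgPool2d(1)] if pool else []) + [
                nn.Conv2d(c, c // r, 1), nn.BatchNorm2d(c // r),
                nn.ReLU(inplace=True), nn.Conv2d(c // r, c, 1),
                nn.BatchNorm2d(c)]
            return nn.Sequential(*layers)
        self.glob = squeeze_restore(pool=True)   # global average pooling branch
        self.loc = squeeze_restore(pool=False)   # fine-grained pointwise branch
    def forward(self, x):
        return self.glob(x) + self.loc(x)        # aggregate both scales

msca = MSCA(64).eval()  # eval(): BN in the pooled branch needs >1 value in train mode
w = torch.sigmoid(msca(torch.randn(1, 64, 32, 32)))
print(w.shape)  # torch.Size([1, 64, 32, 32]) -- channel attention weights
```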

In some embodiments, performing boundary fusion between the boundary map and each of the multiple interaction features to obtain multiple boundary fusion features includes:

based on each interaction feature, learning overall target features with an object attention head branch, where the object attention head branch is used to separate the target from the background as a whole based on the interaction feature;

based on the boundary map and each interaction feature, learning boundary detail features with a boundary attention head branch, where the boundary attention head branch is used to capture the sparse local boundary information of the target based on the boundary map and the interaction feature;

concatenating the output of each object attention head branch with the output of its corresponding boundary attention head branch to obtain multiple output concatenation features, and passing the output concatenation features through a convolutional layer to promote feature fusion, obtaining multiple convolution fusion features;

residually connecting each convolution fusion feature with its corresponding interaction feature to obtain multiple boundary fusion features.

In this embodiment, guided by the boundary information in the boundary map, the features of the boundary map are integrated with the interaction features at different levels, refining the boundary features to ensure that boundaries are clear and complete, which helps distinguish the fine foreground and background boundaries of camouflaged targets.

In some embodiments, segmenting, based on the multiple boundary fusion features, the camouflaged target image from the camouflaged target image to be segmented corresponding to each boundary fusion feature includes:

inputting the multiple boundary fusion features into a convolutional layer with a Sigmoid activation function to generate multiple prediction maps;

based on each prediction map, segmenting the camouflaged target image from the camouflaged target image to be segmented.

In this embodiment, segmentation based on multiple boundary fusion features is more expressive for camouflaged targets, which improves the accuracy of camouflaged target image segmentation.

To facilitate understanding by those skilled in the art, a set of preferred embodiments is provided below:

Referring to Figures 2 and 3, the overall structure of the model in this embodiment includes feature extraction, a local spatial detail and global context information fusion module (LGA module) for fusing local and global features, a feature interaction module (CFT module) for cross-layer feature interaction, a boundary guidance module (BGM module) for predicting the boundary map, a boundary-guided multi-convolution-head transposed attention module (BMTA module) for fusing boundary features, and a prediction layer that generates the final segmentation map. First, the camouflaged target image to be segmented is fed into a parallel Swin-TransformerV2 and Res2Net structure to extract multi-level features. Features with the same resolution from the two branches are then sent to the LGA module to aggregate global and local information. The features output by the LGA modules of adjacent network layers are interactively fused and enhanced through the CFT module, while the features a1 and a2 output by the LGA modules of the two shallow layers serve as the input of the BGM module to generate the predicted boundary map (BM). The features output by the CFT modules, together with the boundary map, are sent to the BMTA module, which fuses the boundary information with the global information, and finally to the prediction layer to generate the prediction maps; based on each prediction map, the camouflaged target image is segmented from the image to be segmented. The prediction maps P1, P2, and P3 in Figures 2 and 3 go from coarse to fine: P1 is the coarsest, and P3 is the clearest and most complete, so this embodiment takes P3 as the final result. P1, P2, P3, and BM are all supervised by loss functions, which guide the model to optimize its parameters and improve segmentation accuracy. The specific steps are:

1. Feature extraction.

A dual-branch Swin-TransformerV2 and Res2Net structure extracts features at multiple levels from the camouflaged target image to be segmented. The multi-head self-attention mechanism in Swin-TransformerV2 overcomes the receptive-field limitation of CNNs, models contextual relationships pixel by pixel over the whole image, and assigns larger weights to important features, making the feature representation richer. Res2Net has a stronger and more effective multi-level feature extraction capability, refining features at a finer granularity and highlighting the difference between foreground and background. Rather than representing multi-level features in a layer-wise manner, Res2Net replaces the 3×3 convolutional layer with a series of convolution groups. The comparison between the Res2Net module and the basic convolution module is shown in Figure 4, where (a) is the basic convolution module and (b) is the Res2Net module. After being processed by a 3×3 convolutional layer, each group produces an output with a larger receptive field than its input; this grouping strategy handles the feature maps better, and the larger the number of split levels, the richer the learned feature information.

For a given image I ∈ R^(3×H×W), where H and W denote the height and width of the input image and 3 indicates an RGB image, both Swin-TransformerV2 and Res2Net contain four stages, and the image is fed into each network to generate multi-level feature maps from the four stages. The Swin-TransformerV2 branch generates feature maps Ti (i = 1, 2, 3, 4), where T1 has resolution H/4×W/4 and T4 has resolution H/32×W/32; the Res2Net branch generates feature maps Ri (i = 1, 2, 3, 4), where R1 has resolution H/4×W/4 and R4 has resolution H/32×W/32. That is, the feature maps generated at the same stage of the two branches have the same spatial size (the same resolution).
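
For illustration only, the following minimal PyTorch sketch mimics the shape contract of this dual-branch extraction with stand-in convolutional stages; the stub module, its channel widths, and the 384×384 input size are assumptions, with the real Swin-TransformerV2 and Res2Net backbones assumed to be substituted in practice:

```python
import torch
import torch.nn as nn

class StubBackbone(nn.Module):
    """Stand-in for either branch (Swin-TransformerV2 or Res2Net):
    emits four feature maps at strides 4, 8, 16, 32, as described above.
    Channel widths are assumptions."""
    def __init__(self, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stem = nn.Conv2d(3, widths[0], 7, stride=4, padding=3)  # H/4
        self.stages = nn.ModuleList(
            nn.Conv2d(widths[i], widths[i + 1], 3, stride=2, padding=1)
            for i in range(3))                                       # H/8 .. H/32

    def forward(self, x):
        feats = [self.stem(x)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        return feats  # four maps at H/4, H/8, H/16, H/32

img = torch.randn(1, 3, 384, 384)
t_feats = StubBackbone()(img)   # plays the role of T1..T4
r_feats = StubBackbone()(img)   # plays the role of R1..R4
```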

2. Fusion of local spatial details and global context information.

The feature maps obtained from Swin-TransformerV2 and Res2Net are input into the LGA module, which considers both the global context and the local details of the image to identify its overall trend, effectively complementing the feature extraction capabilities of the two backbone branches. Global information is used to extract important global features that roughly estimate the position of the target object, while local information extracts the object's fine-grained features; local and global features complement each other, providing comprehensive feature information for accurate camouflaged object segmentation (COS).

The LGA module applies both a residual channel attention mechanism and a spatial attention mechanism. The structure of the residual channel attention mechanism is shown in Figure 5. For an input feature map X ∈ R^(C×H×W), where C, H, and W denote the number of channels, height, and width, the residual channel attention mechanism first obtains a 1×1×C feature map through global average pooling to capture globally important feature information, then compresses the number of channels by downsampling (implemented with a 1×1 convolution) and restores the original channel count C by upsampling (also a 1×1 convolution), yielding a weight coefficient for each channel. Multiplying these weight coefficients with the original feature X produces a more discriminative feature map. Residual channel attention assigns larger weights to important channels, thereby enhancing global features along the channel dimension.
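
A minimal PyTorch sketch of the residual channel attention described above follows; the module name and the reduction ratio r are assumptions, and the residual connection with Ft is applied by the caller, as described for the LGA module below:

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """GAP -> 1x1 conv (compress channels) -> 1x1 conv (restore channels)
    -> sigmoid weights, multiplied back onto X, as described above."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # 1 x 1 x C map
        self.down = nn.Conv2d(channels, channels // r, 1)  # compress channels
        self.up = nn.Conv2d(channels // r, channels, 1)    # restore channels
        self.relu = nn.ReLU(inplace=True)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        w = self.gate(self.up(self.relu(self.down(self.pool(x)))))
        return x * w  # per-channel reweighting of the original feature
```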

The structure of the spatial attention mechanism is shown in Figure 6. For an input feature map X ∈ R^(C×H×W), the spatial attention mechanism compresses the channel dimension by max pooling and average pooling along the channels, obtaining feature maps Fmax ∈ R^(1×H×W) and Favg ∈ R^(1×H×W). Fmax and Favg are then concatenated, a convolution is applied to the concatenated feature map, and a Sigmoid activation function produces the spatial attention map Fs ∈ R^(1×H×W). Multiplying Fs with the input feature map X assigns larger weights to important spatial locations, thereby enhancing local detail information in the spatial domain.
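
Likewise, a minimal sketch of the spatial attention branch; the 7×7 kernel size is an assumption:

```python
class SpatialAttention(nn.Module):
    """Channel-wise max and mean maps are concatenated, convolved, and
    passed through a sigmoid to form the spatial attention map F_s."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        f_max, _ = x.max(dim=1, keepdim=True)   # B x 1 x H x W
        f_avg = x.mean(dim=1, keepdim=True)     # B x 1 x H x W
        f_s = self.gate(self.conv(torch.cat([f_max, f_avg], dim=1)))
        return x * f_s  # reweight important spatial locations
```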

The structure of the LGA module is shown in Figure 7. Features with the same resolution (such as T1 and R1) extracted from the CNN branch (Res2Net is a CNN) and the Transformer branch (Swin-TransformerV2 is a Transformer variant) are sent to the LGA module. The CNN feature Fc is fed into the spatial attention (SA) branch to further enhance the local features extracted by the CNN and to suppress irrelevant regions; the Transformer feature Ft is fed into the residual channel attention (RCA) branch to enhance the global context features extracted by the Transformer.

To keep the RCA branch focused on learning globally important features, Ft is residually connected with the feature output by RCA. For the SA branch, to reduce the computational cost of the model, a convolution first reduces the channel dimension, and the convolved result is residually connected with the feature output by SA, so that the SA branch focuses on learning spatial features. The outputs of the two branches are then concatenated to integrate global position information and local detail information, and a 3×3 convolutional layer promotes feature fusion, adaptively integrating local features and global dependencies.
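
Putting the two branches together, a hedged sketch of the LGA fusion described above; the channel sizes and the reuse of the two attention sketches are assumptions:

```python
class LGA(nn.Module):
    """RCA branch with residual for the Transformer feature Ft, channel
    reduction plus SA branch with residual for the CNN feature Fc, then
    concatenation and a 3x3 fusion conv, as described above."""
    def __init__(self, c_t, c_c):
        super().__init__()
        self.rca = ResidualChannelAttention(c_t)
        self.reduce = nn.Conv2d(c_c, c_t, 1)   # cut channels to save compute
        self.sa = SpatialAttention()
        self.fuse = nn.Conv2d(2 * c_t, c_t, 3, padding=1)

    def forward(self, f_t, f_c):
        g = f_t + self.rca(f_t)       # residual around the RCA branch
        l = self.reduce(f_c)
        l = l + self.sa(l)            # residual around the SA branch
        return self.fuse(torch.cat([g, l], dim=1))
```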

3. Boundary map generation.

Shallow feature layers (such as T1 and T2, R1 and R2) retain more of the target's edge spatial information, while deep convolutional layers (such as T3 and T4, R3 and R4) retain more semantic information. Therefore, shallow features (a1 and a2 in Figure 3) are used as the input of the BGM module to generate the boundary map (BM); upsampling gives BM the same spatial size as the input image, and the generated boundary map is measured with the following binary cross-entropy loss function:

L_edge = -(1/N) Σ_{i=1}^{N} [ G_i^e · log(P_i^e) + (1 - G_i^e) · log(1 - P_i^e) ]   (standard binary cross-entropy form, reconstructed from the description)

where G_i^e denotes the ground-truth boundary map of the i-th image and P_i^e denotes the predicted boundary map of the i-th image.
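
In code, this supervision might look like the following sketch; the logits-based BCE and bilinear upsampling are assumptions:

```python
import torch.nn.functional as F

def boundary_loss(bm_logits, bm_gt):
    """Upsample the predicted boundary map to the ground-truth size and
    apply binary cross-entropy, matching the L_edge formula above.
    `bm_logits` are assumed to be pre-sigmoid predictions."""
    bm_logits = F.interpolate(bm_logits, size=bm_gt.shape[-2:],
                              mode="bilinear", align_corners=False)
    return F.binary_cross_entropy_with_logits(bm_logits, bm_gt)
```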

4. Cross-layer feature interaction.

To achieve efficient interaction of cross-layer features and to cope with variations in target size in COS, the CFT module introduces a multi-scale channel attention (MS-CA) mechanism. MS-CA adapts well to targets of different scales; its structure is shown in Figure 8. MS-CA is based on a dual-branch structure: one branch uses global average pooling to obtain global features, assigning more attention to large-scale objects, while the other branch uses pointwise convolution to obtain fine-grained local details, which is better suited to capturing the features of small objects. Unlike other multi-scale attention mechanisms, MS-CA uses pointwise convolutions in both branches to compress and then restore the channel dimension, thereby aggregating multi-scale channel information at different levels and effectively characterizing the multi-scale information of the convolutional layers.
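
A minimal sketch of MS-CA under these descriptions; the reduction ratio and normalization choices are assumptions:

```python
class MSCA(nn.Module):
    """Dual-branch multi-scale channel attention: a global branch
    (GAP + pointwise compress/restore) and a local branch (pointwise
    compress/restore only), summed into one pre-sigmoid response."""
    def __init__(self, channels, r=4):
        super().__init__()
        c_mid = max(channels // r, 8)
        def pw_block(pool):
            layers = ([nn.AdaptiveAvgPool2d(1)] if pool else []) + [
                nn.Conv2d(channels, c_mid, 1),
                nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True),
                nn.Conv2d(c_mid, channels, 1), nn.BatchNorm2d(channels)]
            return nn.Sequential(*layers)
        self.global_branch = pw_block(pool=True)   # attends to large objects
        self.local_branch = pw_block(pool=False)   # fine-grained local detail

    def forward(self, x):
        # the 1x1x C global response broadcasts over the local H x W response
        return self.global_branch(x) + self.local_branch(x)
```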

The overall structure of the feature interaction module is shown in Figure 9. The features ah and al of adjacent layers are first added and then fed into MS-CA to obtain multi-scale channel information; a Sigmoid activation function then yields the normalized feature map Fs. Fs and 1 - Fs (corresponding to the dotted arrows in Figure 9) are multiplied with al and ah respectively to enhance the feature representations. To preserve the original information of each feature, the original features are residually connected with the enhanced ones; the two branches are then merged by addition, followed by a 3×3 convolutional layer to promote fusion, giving the CFT module output Fe.
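
The corresponding CFT step, as a sketch; it assumes ah has already been resized and projected to match al, and it reuses the MSCA sketch above:

```python
class CFT(nn.Module):
    """F_s = sigmoid(MS-CA(a_h + a_l)); a_l is reweighted by F_s and a_h
    by 1 - F_s, each with a residual connection, then the two branches
    are summed and fused by a 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.msca = MSCA(channels)
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a_h, a_l):
        f_s = torch.sigmoid(self.msca(a_h + a_l))
        low = a_l + a_l * f_s            # enhanced feature plus residual
        high = a_h + a_h * (1 - f_s)
        return self.fuse(low + high)     # CFT output F_e
```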

By introducing the MS-CA mechanism, the feature interaction module aggregates cross-layer image features with different levels and receptive fields, providing rich multi-level contextual information; the multi-level features interact to produce more effective and discriminative image information, enabling the model to adaptively segment targets of different sizes.

5. Boundary-guided multi-convolution-head transposed attention.

The BMTA module effectively combines the local details of the predicted boundary map with the global information of the target in a multi-head attention manner. The BMTA module is based on multi-convolution-head transposed attention (MTA), which realizes interaction between local and non-local pixels; the structure of multi-convolution-head transposed attention is shown in Figure 10.

The input feature map first undergoes layer normalization and is then passed through three 1×1 convolutions followed by 3×3 depthwise convolutions to generate the query (Q), key (K), and value (V). The transposed Q and K are matrix-multiplied to generate the attention map (AM), and AM is then matrix-multiplied with the transpose of V to generate a new feature map:

MTA(Q, K, V) = Softmax(Q^T · K / α) · V^T

where α is a learnable scaling factor (the formula is reconstructed from the description above, with AM = Softmax(Q^T · K / α)).
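
A single-head PyTorch sketch of this transposed (channel-wise) attention follows; GroupNorm standing in for layer normalization, the learnable temperature alpha, and the output projection are assumptions:

```python
class MTA(nn.Module):
    """Norm, 1x1 + 3x3 depthwise convs for Q/K/V, then a C x C attention
    map over channels, applied to V. Single head for brevity."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)   # layer-norm stand-in
        self.qkv_pw = nn.Conv2d(channels, channels * 3, 1)
        self.qkv_dw = nn.Conv2d(channels * 3, channels * 3, 3,
                                padding=1, groups=channels * 3)
        self.alpha = nn.Parameter(torch.ones(1))
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv_dw(self.qkv_pw(self.norm(x))).chunk(3, dim=1)
        q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)      # B x C x HW
        am = torch.softmax(q @ k.transpose(1, 2) / self.alpha, dim=-1)  # B x C x C
        out = (am @ v).view(b, c, h, w)                          # AM applied to V
        return x + self.proj(out)
```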

The structure of the BMTA module is shown in Figure 11. The input feature map is fed separately into the boundary attention head (BAH) branch and the object attention head (OAH) branch to learn different features. The BAH branch incorporates the boundary map to provide a boundary prior, allowing the model to better learn edge detail features, while the OAH learns the overall features of the target.

The structure and computation of the OAH are identical to those of MTA, performing cross-channel global feature extraction to separate the target from the background as a whole.

The BAH introduces the predicted binary boundary map (BM) into MTA to learn a boundary-enhanced representation that effectively captures the important sparse local boundary information of objects. The structure of the BAH is shown in Figure 12.

Specifically, the Q and K obtained after the convolution operations are each multiplied by BM to obtain Qb and Kb, and Qb and Kb are then multiplied to obtain an attention matrix carrying boundary information:

BAH(Q, K, V) = Softmax(Qb^T · Kb / α) · V^T, with Qb = Q ⊙ BM and Kb = K ⊙ BM

where ⊙ denotes element-wise multiplication and α is a learnable scaling factor (the formula is reconstructed from the description, consistent with the MTA form above).

V is a matrix without boundary information; in this way, features are refined by establishing pairwise relationships along the boundary, ensuring that boundaries are clear and complete.
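
Extending the MTA sketch above, a hedged sketch of the BAH: Q and K are gated by BM while V is left untouched; the bilinear resizing of BM is an assumption:

```python
import torch.nn.functional as F

class BAH(MTA):
    """Boundary attention head: identical to the MTA sketch except that
    Q and K are masked by the predicted boundary map BM (values in [0, 1])
    before the attention map is formed."""
    def forward(self, x, bm):
        b, c, h, w = x.shape
        q, k, v = self.qkv_dw(self.qkv_pw(self.norm(x))).chunk(3, dim=1)
        bm = F.interpolate(bm, size=(h, w), mode="bilinear",
                           align_corners=False)
        q, k = q * bm, k * bm                    # Q_b, K_b: boundary-gated
        q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)
        am = torch.softmax(q @ k.transpose(1, 2) / self.alpha, dim=-1)
        out = (am @ v).view(b, c, h, w)          # V carries no boundary info
        return x + self.proj(out)
```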

Finally, the BMTA module concatenates the outputs of the OAH and BAH and feeds them into a 3×3 convolution to fuse the boundary and overall features; to avoid feature degradation, a residual connection adds the original features to the fused ones. Guided by the boundary information, BMTA integrates the features of the boundary map with feature representations at different levels, helping the model distinguish the fine foreground and background boundaries of camouflaged objects.

6. Prediction map generation.

The features output by BMTA are fed into a 3×3 convolutional layer with a Sigmoid to generate the prediction maps, and the prediction map Pi (i = 1, 2, 3) of each layer is supervised by the BCE loss function and the IoU loss function to optimize the parameters of the whole model. The model in this embodiment adopts multiple supervision, giving it better expressiveness in segmenting camouflaged targets. The overall loss function of the model is:

L_total = Σ_{i=1}^{3} [ L_BCE(Pi, G) + L_IoU(Pi, G) ]   (standard multi-supervision form, reconstructed from the description)

where Pi denotes the i-th prediction map and G denotes the ground truth of the image.
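
A sketch of this multi-supervision objective follows; the soft-IoU form, the equal weighting of terms, and the inclusion of the boundary term from step 3 (since BM is also supervised, per the overview) are assumptions:

```python
def iou_loss(pred_logits, gt):
    """Soft IoU loss on sigmoid probabilities (a common form)."""
    p = torch.sigmoid(pred_logits)
    inter = (p * gt).sum(dim=(2, 3))
    union = (p + gt - p * gt).sum(dim=(2, 3))
    return (1 - (inter + 1) / (union + 1)).mean()

def total_loss(pred_logits_list, gt, bm_logits, bm_gt):
    """BCE + IoU on each of P1..P3 (upsampled to the ground-truth size)
    plus the boundary BCE term from the boundary_loss sketch above."""
    loss = boundary_loss(bm_logits, bm_gt)
    for logits in pred_logits_list:
        logits = F.interpolate(logits, size=gt.shape[-2:],
                               mode="bilinear", align_corners=False)
        loss = loss + F.binary_cross_entropy_with_logits(logits, gt) \
                    + iou_loss(logits, gt)
    return loss
```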

Referring to Figure 13, an embodiment of the present invention further provides a camouflaged target image segmentation system based on multi-level feature fusion. The system includes a data acquisition unit 100, a feature extraction unit 200, a feature fusion unit 300, a boundary guidance unit 400, a feature interaction unit 500, a boundary fusion unit 600, and an image segmentation unit 700, wherein:

the data acquisition unit 100 is configured to acquire a camouflaged target image to be segmented;

the feature extraction unit 200 is configured to perform multi-level feature extraction on the camouflaged target image to be segmented using different network layers of a first branch network and a second branch network, obtaining first feature maps of multiple levels output by the first branch network and second feature maps of multiple levels output by the second branch network;

the feature fusion unit 300 is configured to perform global feature enhancement on the first feature map of each level to obtain enhanced global features of multiple levels, perform local feature enhancement on the second feature map of each level to obtain enhanced local features of multiple levels, and fuse the enhanced local features of each level with the enhanced global features of the same level to obtain fusion features of multiple levels;

the boundary guidance unit 400 is configured to perform boundary guidance on the fusion features of two shallow network layers among the multi-level fusion features to obtain a boundary map;

the feature interaction unit 500 is configured to perform feature interaction on the fusion features of adjacent network layers among the multi-level fusion features to obtain multiple interaction features;

the boundary fusion unit 600 is configured to perform boundary fusion between the boundary map and each of the multiple interaction features to obtain multiple boundary fusion features;

the image segmentation unit 700 is configured to segment, based on the multiple boundary fusion features, the camouflaged target image from the camouflaged target image to be segmented corresponding to each boundary fusion feature.

It should be noted that, since the camouflaged target image segmentation system based on multi-level feature fusion in this embodiment is based on the same inventive concept as the above camouflaged target image segmentation method based on multi-level feature fusion, the corresponding content of the method embodiments also applies to this system embodiment and is not described in detail here.

An embodiment of the present application further provides an electronic device, including at least one memory, at least one processor, and at least one computer program; the at least one computer program is stored in the at least one memory, and the at least one processor executes the at least one computer program to implement the camouflaged target image segmentation method based on multi-level feature fusion of any of the above embodiments. The electronic device may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.

Referring to Figure 14, which illustrates the hardware structure of an electronic device of another embodiment, the electronic device includes:

a processor 810, which may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of the present application;

a memory 820, which may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM); the memory 820 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 820 and invoked by the processor 810 to execute the camouflaged target image segmentation method based on multi-level feature fusion of the embodiments of the present application;

an input/output interface 830, used for information input and output;

a communication interface 840, used for communication and interaction between this device and other devices, where communication may be realized in a wired manner (e.g., USB or network cable) or wirelessly (e.g., mobile network, WiFi, or Bluetooth);

a bus 850, which transfers information between the components of the device (e.g., the processor 810, the memory 820, the input/output interface 830, and the communication interface 840);

wherein the processor 810, the memory 820, the input/output interface 830, and the communication interface 840 are communicatively connected to one another within the device through the bus 850.

An embodiment of the present application further provides a storage medium, which is a computer-readable storage medium storing a computer program for causing a computer to execute the camouflaged target image segmentation method based on multi-level feature fusion of any of the above embodiments.

As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit them; those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application remain applicable to similar technical problems.

Those skilled in the art can understand that the technical solution shown in Figure 1 does not limit the embodiments of the present application, which may include more or fewer steps than illustrated, combine certain steps, or use different steps.

The device embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art can understand that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, or appropriate combinations thereof.

The terms "first", "second", "third", "fourth", and the like (if any) in the description of the present application and the above drawings are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be practiced in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to that process, method, product, or device.

It should be understood that, in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items. For example, "at least one of a, b, or c" may denote a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of this application. The aforementioned storage media include media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The preferred embodiments of the embodiments of the present application have been described above with reference to the accompanying drawings, which does not limit the scope of rights of the embodiments of the present application. Any modifications, equivalent replacements, and improvements made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims (10)

1. A camouflage target image segmentation method based on multi-level feature fusion, characterized by comprising the following steps:
obtaining a camouflage target image to be segmented;
performing multi-level feature extraction on the camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network to obtain a first feature image of multiple levels output by the first branch network and a second feature image of multiple levels output by the second branch network;
performing global feature enhancement on the first feature map of each level to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
conducting boundary guiding on the fusion characteristics of two shallow network layers in the fusion characteristics of the multiple layers to obtain a boundary diagram;
performing feature interaction on the fusion features of adjacent network layers in the fusion features of the multiple layers to obtain multiple interaction features;
respectively carrying out boundary fusion on the boundary map and each interactive feature in the plurality of interactive features to obtain a plurality of boundary fusion features;
and dividing a camouflage target image in the camouflage target image to be divided, which corresponds to each boundary fusion feature, based on the boundary fusion features.
2. The method for splitting a camouflage target image based on multi-level feature fusion according to claim 1, wherein the steps of extracting multi-level features of the camouflage target image to be split by using different network layers through a first branch network and a second branch network to obtain a first feature map of multiple layers output by the first branch network and a second feature map of multiple layers output by the second branch network comprise:
extracting features of global context information of the camouflage target image to be segmented by adopting different network layers through a first branch network to obtain a first feature map of multiple layers output by the first branch network;
and extracting the characteristics of the local detail information of the camouflage target image to be segmented by adopting different network layers through a second branch network, and obtaining a plurality of layers of second characteristic images output by the second branch network.
3. The method for dividing a camouflage target image based on multi-level feature fusion according to claim 1, wherein global feature enhancement is performed on the first feature map of each level to obtain global features with enhanced multiple levels; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; and performing feature fusion on the enhanced local feature of each level and the enhanced global feature of the same level to obtain fusion features of multiple levels, wherein the feature fusion comprises the following steps:
carrying out global feature enhancement on the first feature map of each level by adopting a residual channel attention mechanism to obtain global features after multiple levels of enhancement;
local feature enhancement is carried out on the second feature map of each level by adopting a spatial attention mechanism, so that local features after multiple levels of enhancement are obtained;
splicing the enhanced local features of each level with the enhanced global features of the same level to obtain splicing features of multiple levels, and adopting a convolution layer to promote feature fusion of the splicing features of the multiple levels to obtain fusion features of multiple levels.
4. The method for dividing a camouflage target image based on multi-level feature fusion according to claim 1, wherein the performing boundary guiding on the fusion features of two shallow network layers in the multi-level fusion features to obtain a boundary map comprises:
convolving the fusion characteristics of two shallow network layers in the fusion characteristics of the multiple layers to obtain a first convolution characteristic and a second convolution characteristic;
and performing addition operation on the first convolution characteristic and the second convolution characteristic to obtain an addition characteristic, and performing boundary guiding on the addition characteristic by adopting a plurality of convolution layers to obtain a boundary map.
5. The method for dividing a camouflage target image based on multi-level feature fusion according to claim 1, wherein the feature interaction is performed on the fusion features of adjacent network layers in the multi-level fusion features to obtain a plurality of interaction features, and the method comprises the following steps:
introducing a multi-scale channel attention mechanism into a feature interaction module, and adding fusion features of adjacent network layers in the multi-level fusion features to obtain a plurality of addition features;
inputting each added feature into the multi-scale channel attention mechanism to obtain a plurality of multi-scale channel features;
obtaining a plurality of normalized features by adopting an activation function for each multi-scale channel feature, and obtaining a plurality of normalized difference features by subtracting each normalized feature;
performing feature enhancement on the plurality of normalized features and the plurality of normalized difference features to obtain a plurality of enhanced normalized features and a plurality of enhanced normalized difference features;
residual connection is carried out on each enhanced normalized feature and the corresponding fusion feature to obtain a plurality of first residual features, and each first residual feature is convolved to obtain a plurality of first convolution features;
residual connection is carried out on each enhanced normalized difference feature and the corresponding fusion feature to obtain a plurality of second residual features, and convolution is carried out on each second residual feature to obtain a plurality of second convolution features;
and adding each first convolution feature and the corresponding second convolution feature to obtain a plurality of added convolution features, and adopting a convolution layer to promote fusion of the added convolution features to obtain a plurality of interaction features.
6. The method for segmenting a camouflage target image based on multi-level feature fusion according to claim 1, wherein the performing boundary fusion on the boundary map and each of the plurality of interaction features to obtain a plurality of boundary fusion features includes:
based on each interaction characteristic, learning a target overall characteristic by adopting a target attention head branch; wherein the target attention head branch is used for separating a target and a background on the whole based on the interaction characteristics;
based on the boundary map and each interaction characteristic, adopting a boundary attention head branch to learn boundary detail characteristics; the boundary attention head branches are used for capturing sparse local boundary information of the target based on the boundary map and the interaction characteristics;
splicing the output of each target attention head branch and the output of each boundary attention head branch corresponding to the target attention head branch to obtain a plurality of output splicing features, and adopting a convolution layer to promote feature fusion of the plurality of output splicing features to obtain a plurality of convolution fusion features;
and carrying out residual connection on each convolution fusion feature and each interaction feature corresponding to each convolution fusion feature to obtain a plurality of boundary fusion features.
7. The method for segmenting the camouflage target image based on multi-level feature fusion according to claim 1, wherein the segmenting the camouflage target image in the camouflage target image to be segmented corresponding to each boundary fusion feature based on the plurality of boundary fusion features comprises:
inputting the boundary fusion features into a convolution layer with a Sigmoid activation function to generate a plurality of prediction graphs;
and dividing a camouflage target image in the camouflage target images to be divided based on each prediction graph.
8. A camouflage target image segmentation system based on multi-level feature fusion, characterized by comprising:
the data acquisition unit is used for acquiring a camouflage target image to be segmented;
the feature extraction unit is used for extracting multi-level features of the camouflage target image to be segmented by adopting different network layers through a first branch network and a second branch network to obtain a first feature image of multiple levels output by the first branch network and a second feature image of multiple levels output by the second branch network;
the feature fusion unit is used for carrying out global feature enhancement on the first feature map of each level to obtain global features after multiple levels of enhancement; carrying out local feature enhancement on the second feature map of each level to obtain local features after multiple levels of enhancement; carrying out feature fusion on the enhanced local features of each level and the enhanced global features of the same level to obtain fusion features of multiple levels;
The boundary guiding unit is used for conducting boundary guiding on the fusion characteristics of the two shallow network layers in the fusion characteristics of the multiple layers to obtain a boundary diagram;
the feature interaction unit is used for carrying out feature interaction on the fusion features of the adjacent network layers in the fusion features of the multiple layers to obtain multiple interaction features;
the boundary fusion unit is used for carrying out boundary fusion on the boundary graph and each interaction feature in the interaction features respectively to obtain a plurality of boundary fusion features;
the image segmentation unit is used for segmenting a camouflage target image in the camouflage target images to be segmented, which correspond to each boundary fusion feature, based on the boundary fusion features.
9. An electronic device, comprising:
at least one memory;
at least one processor;
at least one computer program;
the at least one computer program is stored in the at least one memory, the at least one processor executing the at least one computer program to implement:
the camouflage target image segmentation method based on multi-level feature fusion according to any one of claims 1 to 7.
10. A storage medium that is a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for causing a computer to execute:
A camouflage target image segmentation method based on multi-level feature fusion as claimed in any one of claims 1 to 7.
CN202310982262.1A 2023-08-07 2023-08-07 A camouflage target image segmentation method and system based on multi-level feature fusion Active CN116703950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310982262.1A CN116703950B (en) 2023-08-07 2023-08-07 A camouflage target image segmentation method and system based on multi-level feature fusion


Publications (2)

Publication Number Publication Date
CN116703950A true CN116703950A (en) 2023-09-05
CN116703950B CN116703950B (en) 2023-10-20

Family

ID=87843689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310982262.1A Active CN116703950B (en) 2023-08-07 2023-08-07 A camouflage target image segmentation method and system based on multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN116703950B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008047369A (en) * 2006-08-11 2008-02-28 Furukawa Battery Co Ltd:The Method of manufacturing lead storage battery
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
US20220230324A1 (en) * 2021-01-21 2022-07-21 Dalian University Of Technology Camouflaged object segmentation method with distraction mining
WO2023024577A1 (en) * 2021-08-27 2023-03-02 之江实验室 Edge computing-oriented reparameterization neural network architecture search method
CN114565770A (en) * 2022-03-23 2022-05-31 中南大学 Image segmentation method and system based on edge-assisted computation and mask attention
CN114943963A (en) * 2022-04-29 2022-08-26 南京信息工程大学 Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN114581752A (en) * 2022-05-09 2022-06-03 华北理工大学 A camouflaged target detection method based on context awareness and boundary refinement
CN115471774A (en) * 2022-09-19 2022-12-13 中南大学 Video time domain action segmentation method based on audio and video bimodal feature fusion
CN116228702A (en) * 2023-02-23 2023-06-06 南京邮电大学 A Camouflaged Object Detection Method Based on Attention Mechanism and Convolutional Neural Network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Jiesheng Wu et al.: "Mask-and-Edge Co-Guided Separable Network for Camouflaged Object Detection", IEEE Signal Processing Letters, p. 748 *
Martin Rajchl et al.: "DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks", arXiv:1605.07866v2, pp. 1-10 *
Zhang Dongdong et al.: "Research Progress in Camouflaged Object Detection", Laser Journal, pp. 1-18 *
Xu Shengjun; Ouyang Puyan; Guo Xueyuan; Taha Muthar Khan; Duan Zhongxing: "Building segmentation in remote sensing images using multi-scale feature fusion and dilated-convolution ResNet", Optics and Precision Engineering, no. 07, pp. 262-266 *
Teng Xu; Zhang Hui; Yang Chunming; Zhao Xujian; Li Bo: "Digital camouflage generation method based on cycle-consistency adversarial networks", Journal of Computer Applications, no. 02, pp. 179-190 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119206218A (en) * 2024-09-09 2024-12-27 沈阳工业大学 EdgeAttenNet glomerulus image accurate segmentation system and method based on camouflaged target detection
CN119206218B (en) * 2024-09-09 2025-07-15 沈阳工业大学 EdgeAttenNet glomerulus image accurate segmentation system and method based on camouflaged target detection
CN119992274A (en) * 2025-04-11 2025-05-13 南开大学 Method and device for detecting camouflaged objects
CN120374477A (en) * 2025-04-11 2025-07-25 安庆师范大学 Dark light field image enhancement system and method for drama scene

Also Published As

Publication number Publication date
CN116703950B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US11328172B2 (en) Method for fine-grained sketch-based scene image retrieval
CN116703950A (en) Camouflage target image segmentation method and system based on multi-level feature fusion
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN114612832A (en) Real-time gesture detection method and device
WO2024077781A1 (en) Convolutional neural network model-based image recognition method and apparatus, and terminal device
WO2020098257A1 (en) Image classification method and device and computer readable storage medium
WO2022166258A1 (en) Behavior recognition method and apparatus, terminal device, and computer-readable storage medium
CN112380978A (en) Multi-face detection method, system and storage medium based on key point positioning
CN118587449A (en) A RGB-D saliency detection method based on progressive weighted decoding
CN111507403A (en) Image classification method, apparatus, computer equipment and storage medium
CN117809198A (en) Remote sensing image significance detection method based on multi-scale feature aggregation network
CN113869371B (en) Model training method, clothing fine-grained segmentation method and related devices
CN118379601A (en) Network infrared small target detection method based on ladder interaction attention and pixel characteristic enhancement
CN118298109A (en) Multi-mode electronic information system view processing method
CN117593517A (en) Camouflage target detection method based on complementary perception cross-view fusion network
CN112884702A (en) Polyp identification system and method based on endoscope image
Zhou et al. Semantic image segmentation using low-level features and contextual cues
CN115439713A (en) Model training method and device, image segmentation method, equipment, storage medium
CN117523626A (en) Pseudo RGB-D face recognition method
CN114387489A (en) Power equipment identification method and device and terminal equipment
CN118505731B (en) Medical image segmentation method, system, electronic device and storage medium
Mao et al. Trinity‐Yolo: High‐precision logo detection in the real world
Kumar et al. ResUNet: an automated deep learning model for image splicing localization
CN119478953A (en) Automatic labeling method, device, equipment and storage medium
CN118521472A (en) Image processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant