
CN114821059A - Salient object detection method and system based on boundary enhancement - Google Patents

Salient object detection method and system based on boundary enhancement

Info

Publication number
CN114821059A
CN114821059A (application number CN202210467623.4A)
Authority
CN
China
Prior art keywords
boundary
information
feature
features
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210467623.4A
Other languages
Chinese (zh)
Other versions
CN114821059B (en)
Inventor
田智强 (Tian Zhiqiang)
余凌昊 (Yu Linghao)
陈张 (Chen Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210467623.4A priority Critical patent/CN114821059B/en
Publication of CN114821059A publication Critical patent/CN114821059A/en
Application granted granted Critical
Publication of CN114821059B publication Critical patent/CN114821059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a salient object detection method and system based on boundary enhancement. Taking visual saliency image data as input, a convolutional neural network predicts the salient object region, addressing the problems of scale variation and blurred boundary-region pixels in the salient object detection task. Feature information of different resolutions is used for mutual complementation, further strengthening the expressive power of single-resolution features; multi-scale feature extraction draws information of different scales out of fixed-resolution features, better handling object scale variation; boundary extraction models the salient boundary, and the extracted boundary information further supplements the salient object features, alleviating unclear boundary pixels to a certain extent and yielding the final salient object prediction; and a hybrid loss function supervises model training at different levels so that salient object regions are highlighted more uniformly.

Description

Salient object detection method and system based on boundary enhancement
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method and a system for detecting a salient object based on boundary enhancement.
Background
Salient object detection has been studied for over twenty years, and its history has three important milestones. The first surge of development originated from Itti's 1998 work, which mimicked the human attention process by building a saliency map bottom-up from low-level features. The second milestone recast saliency detection as a binary segmentation problem of separating the salient object from the background, which made it far more suitable for practical applications. The third wave of interest was triggered by the rise of convolutional neural networks: their strong feature extraction capability and large receptive fields allow salient regions in an image to be detected far more reliably, and they remain the mainstream approach today. The field has produced considerable theoretical and practical value in its own right, and it also serves as an aid to many other vision tasks, for example as a preprocessing step for object recognition, image editing and semantic segmentation.
Scale variation is one of the major challenges of the SOD task. Constrained by downsampling operations, CNNs struggle with this problem: each feature level can only handle a particular scale, and features of different resolutions embed different amounts of target information. One approach outputs the features of each layer laterally along a top-down path, upsamples them to a uniform resolution, and then fuses them into an output containing multi-scale information; but because each layer contributes only a single-resolution feature, this is insufficient to cope with the full range of scales. Another simple strategy integrates feature layers of different resolutions directly, but such fusion is prone to information redundancy and noise interference. The handling of scale variation therefore still has room for further optimization.
Detail information is progressively lost during feature extraction, so the boundary regions of salient objects produced by pixel-level saliency methods are often unsatisfactory. Beyond multi-scale feature fusion, many methods have been proposed to obtain fine salient boundaries. Some refine high-level features recursively with low-level local information; others preprocess with superpixels before saliency detection to extract boundaries, or post-process the saliency prediction map with a CRF to preserve object boundaries. Such methods require additional processing steps and are relatively inefficient. As for the loss function, the one most commonly used for training salient object detection is binary cross-entropy, but its confidence is low when judging boundary pixels, which leaves boundaries very blurry, and it cannot guarantee the consistency of the salient region. Both the network structure for boundary information extraction and the design of the loss function leave room for improvement.
Disclosure of Invention
The invention aims to provide a method and a system for detecting a salient object based on boundary enhancement, so as to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
A salient object detection method based on boundary enhancement comprises the following steps:
S1, extracting abstract feature maps of different resolutions from the training set images, performing multi-level fusion on them to obtain multi-level fusion feature maps, and processing these to obtain feature maps containing multi-scale information;
S2, performing information conversion on the feature maps containing multi-scale information, then splicing and fusing them to obtain features containing boundary information, meanwhile obtaining a boundary detection result from the converted features of each stage, and further splicing and fusing these to obtain a fused boundary detection result;
S3, extracting multi-scale information from the feature maps containing multi-scale information, and splicing them with the features containing boundary information to obtain a salient object detection result;
S4, training the salient object detection model with the salient object detection result, the boundary detection results of each level, the fused boundary detection result and the corresponding training set until the convergence condition is met, and performing object detection with the trained salient object detection model.
Further, in step S1, the convolutional neural network uses ResNet-50 trained on ImageNet as the network backbone, and removes the final pooling layer and the fully connected layer to obtain five feature maps of different sizes.
Further, a network structure formed by using ResNet-50 trained on ImageNet as a backbone of the network comprises a multi-stage feature aggregation module, a multi-scale information extraction module and a boundary information extraction module.
Furthermore, feature sizes are kept consistent through upsampling or pooling, information is supplemented by element-wise addition, and the supplemented features are aggregated to obtain five multi-level aggregation feature maps of different sizes, which improves the expressive power of the features.
Further, sampling is performed with dilated convolutions of different dilation rates, and information of different scales is obtained through different receptive fields, improving the network model's ability to detect objects whose scale changes.
Furthermore, in each step of the progressive feature fusion, multi-scale information extraction is performed several times on features of different sizes, and the multi-scale information is further fused.
Furthermore, with the progressively fused features as input, boundary detection is performed on the features of each level, so that the boundary information of the object can be extracted and the salient object detection result of the network further refined.
Further, the training process uses a loss function and adjusts parameters as the loss propagates backwards.
Further, the loss function includes a salient object detection loss and a salient boundary detection loss, where the salient object detection loss guides the correct classification of salient object pixels and the salient boundary detection loss guides the correct classification of pixels in the salient object boundary region.
Further, the salient object detection loss includes a binary cross-entropy loss BCE for individual pixels and a consistency enhancement loss CEL for the entire image, which makes the detection result more uniformly highlighted.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a salient object detection method based on boundary enhancement, which comprises the steps of extracting abstract feature maps with different resolutions from a training set image, carrying out multi-level fusion on the abstract feature maps to obtain multi-level fusion feature maps, and processing the multi-level fusion feature maps to obtain feature maps containing multi-scale information; the method comprises the steps that visual saliency image data are used as input, a saliency target area is predicted through a convolutional neural network, the problems of scale change and boundary area pixel blurring in a saliency target detection task are solved, feature information of different resolutions is used for mutual complementation, and the expression capacity of a single resolution feature is further enhanced; by using multi-scale feature extraction, information of different scales is extracted from the fixed resolution features, so that the problem of target scale change is solved better; and a mixed loss function is used for supervising model training from different layers so as to highlight the salient object area more uniformly and brightly.
Furthermore, multi-stage feature aggregation is adopted, so that features of different scales are mutually aggregated, and the expression capability of the features of fixed scales is enhanced.
Furthermore, multi-scale information is extracted from the features of fixed scale through multi-scale information extraction, and the detection capability of the network model on scenes with large target size changes is enhanced.
Furthermore, after the boundary information of the salient object is extracted, it is used to supplement the salient object features, further improving the quality of the model's predictions.
Furthermore, the salient object detection loss is composed of a binary cross-entropy loss and a consistency enhancement loss; the consistency enhancement loss in particular supervises from the whole-image level, which on the one hand lets the loss function emphasize the foreground more, and on the other hand keeps the loss from being disturbed by scale variation, improving the salient object detection effect.
Drawings
Fig. 1 is a flowchart of an implementation of a method for detecting a salient object with enhanced boundary in an embodiment of the present invention.
Fig. 2 is a network structure diagram of a salient object detection model with enhanced boundary in an embodiment of the present invention.
Fig. 3 is an internal structural diagram of a multi-stage feature aggregation module in an embodiment of the present invention.
Fig. 4 is an internal structure diagram of a multi-scale information extraction module in the embodiment of the present invention.
Fig. 5 is an internal structure diagram of the boundary information extraction module in the embodiment of the present invention.
Fig. 6 is a detection effect diagram of the salient object detection model with enhanced boundary in the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is described in further detail below with reference to the accompanying drawings:
As shown in fig. 1, a salient object detection method based on boundary enhancement includes the following steps:
S1, extracting abstract feature maps of different resolutions from the training set images, performing multi-level fusion on them to obtain multi-level fusion feature maps, and processing these to obtain feature maps containing multi-scale information;
S2, performing information conversion on the feature maps containing multi-scale information, then splicing and fusing them to obtain features containing boundary information, meanwhile obtaining a boundary detection result from the converted features of each stage, and further splicing and fusing these to obtain a fused boundary detection result;
S3, extracting multi-scale information from the feature maps containing multi-scale information, and splicing them with the features containing boundary information to obtain a salient object detection result;
S4, training the salient object detection model with the salient object detection result, the boundary detection results of each level, the fused boundary detection result and the corresponding training set until the convergence condition is met, and performing object detection with the trained salient object detection model.
the application adopts public data as a data set, and the data set is divided into a training set and a testing set.
The network structure is designed as shown in fig. 2. In the feature extraction stage, the convolutional neural network adopts ResNet-50 trained on ImageNet as the network backbone, with the final pooling layer and fully connected layer removed. The input image is fed into the backbone network and, after five groups of convolution operations, five abstract feature maps F1-F5 of different levels are extracted, with sizes 1/2, 1/4, 1/8, 1/16 and 1/32 of the input and channel counts of 64, 256, 512, 1024 and 2048 respectively. From F1 to F5, low-level detail information steadily decreases while high-level semantic information gradually increases.
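As a concrete illustration of this feature extraction stage, the following is a minimal PyTorch sketch; the module and variable names are ours rather than the patent's, and torchvision's resnet50 stands in for the ImageNet-pretrained backbone.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class BackboneExtractor(nn.Module):
        # ResNet-50 without its final pooling and fully connected layers,
        # returning the five feature maps F1..F5 described above.
        def __init__(self):
            super().__init__()
            net = resnet50(pretrained=True)
            self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # F1: 1/2, 64 ch
            self.pool = net.maxpool
            self.stages = nn.ModuleList([net.layer1, net.layer2,
                                         net.layer3, net.layer4])

        def forward(self, x):
            f1 = self.stem(x)              # 1/2 resolution, 64 channels
            feats = [f1]
            x = self.pool(f1)
            for stage in self.stages:      # F2..F5: 1/4..1/32, 256..2048 channels
                x = stage(x)
                feats.append(x)
            return feats

    feats = BackboneExtractor()(torch.randn(1, 3, 320, 320))
    print([tuple(f.shape) for f in feats])  # strides 2, 4, 8, 16, 32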
The multi-stage feature aggregation module, shown in fig. 3, is divided into two stages, complementation and aggregation. From F_{i-1} to F_{i+1} the feature resolution gradually decreases while the number of channels gradually increases. In the complementary stage S_1, the three input features are first preprocessed by a 1×1 convolution to keep their channel counts consistent, which on the one hand reduces computation and on the other hand facilitates the subsequent element-wise fusion; then the input feature F_i is pooled and upsampled to supplement F_{i+1} and F_{i-1} respectively, while F_{i-1} and F_{i+1} are pooled and upsampled respectively to supplement F_i, the pooling and upsampling operations giving the mutually supplementing features the same resolution. Writing F'_j = ReLU(Conv(F_j)) for F_j after channel reduction, the feature complementation process can be expressed as:

L_1 = F'_i + AvgPool(F'_{i-1})
L_2 = F'_i
L_3 = F'_i + Up(F'_{i+1})

In the formulas: F'_j represents F_j after reducing the channel dimension; L_1, L_2 and L_3 represent the supplemented i-th level features of the complementary stage S_1; Conv(·) represents the convolution responsible for changing the channel dimension; ReLU represents the ReLU nonlinear activation function; Up(·) represents the upsampling operation; AvgPool(·) represents the mean pooling operation; and + denotes element-wise addition. It is worth noting that the topmost and bottommost features have only one neighbour each, so in the complementary stage they have only the two channels L_2 + L_3 and L_1 + L_2 respectively.

The second stage is the feature aggregation stage S_2, in which the complementary features from the different channels are aggregated into a feature containing multi-level information and output to the decoder as a lateral feature:

MF_i = Conv_{3×3}(L_1 + L_2 + L_3)

where the features are fused by element-wise addition. As in the complementary stage, MF_1 and MF_5 aggregate the feature of only one neighbour. In addition, every element-wise fusion in the two stages above is followed by a set of 3×3 convolutions, combined with regularization and a ReLU nonlinearity, to further abstract the features.
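One level of this module can be sketched in PyTorch as follows; the class name, the fixed channel width, and the use of adaptive average pooling and bilinear upsampling are illustrative assumptions matching the reconstruction above, not the patent's exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiStageAggregation(nn.Module):
        # One level: complementary stage S1 followed by aggregation stage S2.
        def __init__(self, ch=64):
            super().__init__()
            self.fuse = nn.Sequential(  # 3x3 conv + BN + ReLU after fusion
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True))

        def forward(self, f_prev, f_i, f_next):
            # f_prev / f_next are the channel-reduced neighbouring features
            # F'_{i-1} and F'_{i+1}; either may be None at the ends.
            l2 = f_i
            l1 = f_i + F.adaptive_avg_pool2d(f_prev, f_i.shape[2:]) \
                if f_prev is not None else 0
            l3 = f_i + F.interpolate(f_next, size=f_i.shape[2:],
                                     mode='bilinear', align_corners=False) \
                if f_next is not None else 0
            return self.fuse(l1 + l2 + l3)  # aggregation stage S2

    # Example: a middle level with both neighbours present.
    mfa = MultiStageAggregation(64)
    out = mfa(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40),
              torch.randn(1, 64, 20, 20))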
Reverse progressive feature fusion starts from the topmost feature MF_5: multi-scale information extraction is performed on MF_5, the feature containing multi-scale information is upsampled to twice its size, and a 1×1 convolution reduces the channel dimension to match the feature MF_4, so that the two features can be added element-wise; after the element-wise addition, a 3×3 convolution further fuses them. In this way the multi-stage aggregation features M1, M2, M3, M4 and M5 of different scales are finally fused step by step, yielding the four progressively fused features h1, h2, h3 and h4.
As shown in fig. 4, specifically, given an input feature h, in the forward pass the feature is first sampled by dilated convolutions with dilation rates of 2, 4 and 8, extracting features sh_1, sh_2 and sh_3 that contain information of different scales. Next, a residual operation adds the original feature to the sampled features element-wise, and a convolution operation with an activation function then further aggregates the features and improves the nonlinear capacity, finally giving the feature M containing multi-scale information:

M = ReLU(Conv_{3×3}(h + sh_1 + sh_2 + sh_3))

In the formula: Conv_{3×3}(·) denotes a 3×3 convolution operation, and + is element-wise addition of the feature layers.
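A minimal sketch of this module, with illustrative names and channel width:

    import torch
    import torch.nn as nn

    class MultiScaleExtraction(nn.Module):
        # Three parallel dilated 3x3 convolutions (rates 2, 4, 8), residual
        # element-wise addition, then a 3x3 conv + ReLU, as in the formula.
        def __init__(self, ch=64):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in (2, 4, 8))
            self.out = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

        def forward(self, h):
            sh1, sh2, sh3 = (branch(h) for branch in self.branches)
            return self.out(h + sh1 + sh2 + sh3)

    m = MultiScaleExtraction(64)(torch.randn(1, 64, 40, 40))  # same spatial size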
The boundary information extraction module is shown in fig. 5. Taking the four progressively fused features h_4, h_3, h_2, h_1 produced in the decoder as input, an information conversion module first extracts features eh_4, eh_3, eh_2, eh_1 containing boundary information; the information conversion module consists of two 1×1, 3×3, 1×1 convolution groups, each containing a residual connection. The features containing boundary information are then passed through a 1×1 convolution to reduce the channel count and upsampled to obtain the salient boundary predictions e_4, e_3, e_2, e_1. To pass the extracted salient object boundary information to the salient object prediction branch and make up for the loss of detail, the extracted multi-level boundary features eh_4, eh_3, eh_2, eh_1 are upsampled, concatenated along the channel dimension, and input to a boundary feature aggregation module, which consists of four 3×3 convolutions with residual operations, to obtain the final feature EF containing boundary information:

EF = EdgeInfo(Concat(Up(eh_1), Up(eh_2), Up(eh_3), Up(eh_4)))

In the formula: EdgeInfo(·) represents the boundary feature aggregation module, and EF represents the salient boundary feature aggregating multi-level information, to be fused with the salient object features in the next step.
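The boundary branch can be sketched as follows; the class name and head layout are ours, and the two-convolution edge_info block abbreviates the patent's four-convolution boundary aggregation module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BoundaryExtraction(nn.Module):
        # Per-level information conversion (1x1-3x3-1x1 convs with a residual
        # connection), per-level boundary logits e_k, and aggregation into EF.
        def __init__(self, ch=64, levels=4):
            super().__init__()
            self.convert = nn.ModuleList(nn.Sequential(
                nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 1)) for _ in range(levels))
            self.heads = nn.ModuleList(nn.Conv2d(ch, 1, 1) for _ in range(levels))
            self.edge_info = nn.Sequential(
                nn.Conv2d(ch * levels, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

        def forward(self, hs, out_size):
            ehs = [h + conv(h) for h, conv in zip(hs, self.convert)]  # eh_k
            up = lambda t: F.interpolate(t, size=out_size, mode='bilinear',
                                         align_corners=False)
            edges = [up(head(eh)) for eh, head in zip(ehs, self.heads)]  # e_1..e_4
            ef = self.edge_info(torch.cat([up(eh) for eh in ehs], dim=1))  # EF
            return ef, edges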
In the training process, the network parameters are optimized with a back-propagation strategy, and loss functions assist the training. According to the task, the losses used in training fall into two kinds, the salient boundary detection loss and the salient object detection loss, and the total loss function is:

Loss = L_sod + λ_1 · L_edge

In the formula: λ_1 is a hyperparameter that balances the losses of the two tasks; its value was set to 10 in the experiments.
Because boundary pixels are highly sparse and the numbers of boundary and non-boundary pixels are highly imbalanced, a balanced binary cross-entropy loss is used to supervise the salient boundary learning process and address this pixel imbalance. The balanced binary cross-entropy loss is expressed as:

L_edge = -Σ_p [ β · g_e(p) · log e(p) + (1 - β) · (1 - g_e(p)) · log(1 - e(p)) ]

In the formula: β is the proportion of non-boundary pixels among all pixels, e(p) is the predicted boundary probability at pixel p, and g_e(p) is the boundary ground truth at pixel p.
The salient object detection loss function combines two loss functions with different emphases: a binary cross-entropy loss for individual pixels and a consistency enhancement loss for the foreground region. The total loss is expressed as:

L_sod = L_bce + L_cel

Binary cross-entropy is the most widely used loss function in the salient object detection task; it is a pixel-level loss that converges over all pixels, given by:

L_bce = -Σ [ g · log p + (1 - g) · log(1 - p) ]

In the formula: P represents the saliency prediction map and p a pixel in P; G represents the ground-truth map and g the corresponding pixel in G; log(·) is the pixel-level logarithm operation.

The consistency enhancement loss is an image-level loss that, on the one hand, makes the loss function focus more on the foreground and, on the other hand, keeps it free from interference by scale variation. It penalizes the mismatch between the predicted and true foreground over the whole image and can be formulated as:

L_cel = (FP + FN) / (TP + FP + FN), with TP = Σ p·g, FP = Σ p·(1-g), FN = Σ (1-p)·g

In the formula: P represents the saliency prediction map and p represents a pixel in the prediction map; the sums run over all pixels.
The salient object detection method based on boundary enhancement of the invention addresses, on saliency datasets, the problems of large variation in object size within visual scenes, blurry prediction of the salient object's boundary region, and non-uniform pixels within that region.
A multi-level feature aggregation module is inserted into the network's lateral pathway, enhancing the expressive power of fixed-resolution features by aggregating feature information of different resolutions from adjacent layers.
A multi-scale information extraction module is inserted at each level of the network decoder, enhancing the network's ability to face scenes with large variation in object size by extracting multi-scale information from the features of each level.
Based on the progressively fused features, a boundary extraction module detects the boundary of the salient object, and the boundary features are fused with the salient object features, strengthening the network model's detection effect in the boundary region.
A hybrid loss function is used for the salient object detection task, supervising from both the pixel level and the image level, which promotes gradient back-propagation, strengthens model convergence, and further improves the training effect.
The application achieves competitive Fmax and MAE results on four public saliency detection datasets, outperforming several popular salient object detection methods.
Examples
A salient object detection method based on boundary enhancement comprises the following steps:
S1, four public saliency datasets are used as the experimental datasets. The specific working process is as follows:
(1.1) the training split of the largest dataset is adopted as the model's training set, and its test split together with the other three datasets is used as the test sets;
(1.2) before being input to the network for training, the images are randomly flipped horizontally to augment the data.
S2, a feature extraction network is used to extract abstract feature maps of different resolutions and channel counts. The specific working process is as follows:
(2.1) remove the final pooling layer and fully connected layer of the ResNet50 network, keeping only the remaining network structure;
(2.2) input the data processed in step (1.2), with dimensions (N, C, H, W), into the ResNet50 feature extraction network to obtain five groups of abstract feature maps of different resolutions and channel counts.
S3, using the multi-stage feature aggregation module to enhance the expressive power of the features extracted by the encoder, as shown in the figure. The specific working process is as follows:
(3.1) apply a convolution to the abstract feature maps extracted in step (2.2) to change the number of channels, keeping the channel counts of all features consistent;
(3.2) supplement the features obtained in step (3.1): between adjacent layers, the lower-resolution feature is upsampled and fused element-wise into the higher-resolution feature, and the higher-resolution feature is pooled and fused element-wise into the lower-resolution feature, so that features of different resolutions supplement each other;
(3.3) aggregate the mutually supplemented features obtained in step (3.2): for each level, if a higher-level feature exists it is upsampled and added element-wise, and if a lower-level feature exists it is pooled and added element-wise; each level of features from step (2.2) thus obtains a corresponding aggregated feature, which enhances the expressive power of the fixed-resolution features.
And S4, extracting multi-scale information from the fixed features by using a multi-scale information extraction module, and enhancing the detection capability of the network on different-scale targets. The specific working process is as follows:
(4.1) feed the multi-stage aggregation features extracted in step (3.3) into three parallel branches, sample them with dilated convolutions of dilation rates 2, 4 and 8 respectively, and add the original features to the sampled features element-wise through a residual operation;
(4.2) upsample the features containing multi-scale information extracted in step (4.1), add them element-wise to the corresponding lateral features from step (3.3), and apply a 3×3 convolution to obtain the progressively fused features.
And S5, extracting significant boundary information from the step-by-step fusion features by using a boundary information extraction module, further supplementing significant target information, and enhancing the detection effect of the network. The specific working process is as follows:
(5.1) perform information conversion on the features obtained in step (4.2) to obtain feature maps containing boundary information;
(5.2) reduce the channel dimension of the features from step (5.1) to 1 with a 1×1 convolution and upsample to obtain the boundary outputs, then fuse the multiple boundary outputs to obtain the fused boundary output;
(5.3) upsample the boundary features from step (5.1), concatenate them along the channel dimension, and further fuse them and change the channel count through the boundary aggregation module to obtain the boundary features.
And S6, fusing the boundary information and the target information to obtain the final significance prediction. The specific working process is as follows:
(6.1) perform multi-scale information extraction once more on the last feature from step (4.2); since this feature is the bottommost layer, no element-wise addition with a lateral feature is needed;
(6.2) concatenate the boundary features from step (5.3) with the features obtained in step (6.1) along the channel dimension, fuse them further by convolution, and obtain the final saliency prediction map through channel transformation and upsampling.
S7, train the object detection model with the obtained boundary detection results, object detection result and the corresponding training-set images. In the training process, binary cross-entropy loss, consistency enhancement loss and balanced binary cross-entropy loss are used together to promote gradient back-propagation, strengthen model convergence, and further improve the training effect.
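A minimal sketch of one training epoch under this supervision; the model's return signature is an assumption, and total_loss refers to the loss sketch given earlier.

    import torch

    def train_epoch(model, loader, optimizer, device='cuda'):
        # `model` is assumed to return (saliency logits, fused boundary logits).
        model.train()
        for images, sal_gt, edge_gt in loader:
            images, sal_gt, edge_gt = (t.to(device)
                                       for t in (images, sal_gt, edge_gt))
            sal_logits, edge_logits = model(images)
            loss = total_loss(sal_logits, sal_gt, edge_logits, edge_gt)
            optimizer.zero_grad()
            loss.backward()   # the loss propagates backwards...
            optimizer.step()  # ...and the parameters are adjusted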
S8, regarding the trained salient object detection model, using the test image as input, and obtaining a salient object detection result, as shown in fig. 6. The specific working process is as follows:
(8.1) for the salient object detection model of step S7, take the test sets from step (1.1) as input to obtain saliency detection results;
(8.2) comparing the detection results of step (8.1) against the true saliency maps, the model achieves an excellent detection effect, performing very well on the Fmax, MAE and Em metrics across the four datasets, as shown in fig. 6.
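For reference, the Fmax and MAE metrics mentioned here can be computed as in the following hedged sketch (Em, the enhanced-alignment measure, is omitted):

    import torch

    def mae(pred, gt):
        # Mean absolute error between a [0, 1] saliency map and ground truth.
        return (pred - gt).abs().mean().item()

    def f_measure(pred, gt, beta2=0.3, thresh=0.5):
        # F-beta at one threshold (beta^2 = 0.3 is the usual SOD setting);
        # Fmax sweeps `thresh` over [0, 1] and keeps the best value.
        binary = (pred >= thresh).float()
        tp = (binary * gt).sum()
        precision = tp / (binary.sum() + 1e-6)
        recall = tp / (gt.sum() + 1e-6)
        return ((1 + beta2) * precision * recall /
                (beta2 * precision + recall + 1e-6)).item()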

Claims (10)

1. A salient object detection method based on boundary enhancement, characterized by comprising the following steps:
S1, extracting abstract feature maps of different resolutions from the training set images, performing multi-level fusion on them to obtain multi-level fusion feature maps, and processing these to obtain feature maps containing multi-scale information;
S2, performing information conversion on the feature maps containing multi-scale information, then splicing and fusing them to obtain features containing boundary information, meanwhile obtaining a boundary detection result from the converted features of each stage, and further splicing and fusing these to obtain a fused boundary detection result;
S3, extracting multi-scale information from the feature maps containing multi-scale information, and splicing them with the features containing boundary information to obtain a salient object detection result;
S4, training the salient object detection model with the salient object detection result, the boundary detection result and the corresponding training set until a convergence condition is met, and performing object detection with the trained salient object detection model.
2. The method for detecting a salient object based on boundary enhancement as claimed in claim 1, wherein ResNet-50 trained on ImageNet is used as the network backbone to extract abstract feature maps of different resolutions from the training set images, with the final pooling layer and fully connected layer removed to obtain five feature maps of different sizes.
3. The method for detecting the salient object based on the boundary enhancement as claimed in claim 1, wherein the obtained multi-level fusion feature maps with different scales are fused from top to bottom in a reverse step-by-step manner, the multi-scale information extraction is performed on the current multi-level fusion feature map before each fusion, and then the up-sampling and the previous multi-level fusion feature map are fused to obtain the feature map containing the multi-scale information.
4. The method for detecting a salient object based on boundary enhancement as claimed in claim 1, wherein in step S2, the fusion of multilevel features is to keep the feature layer and its neighboring feature layer consistent in size through upsampling or pooling, supplement information by adding elements to each other, and aggregate the supplemented features to obtain five multilevel aggregate feature maps with different sizes.
5. The method for detecting a salient object based on boundary enhancement as claimed in claim 1, characterized in that sampling is performed with dilated convolutions of different dilation rates, and information of different scales is obtained through different receptive fields, improving the network model's ability to detect objects whose scale changes.
6. The method for detecting the salient object based on the boundary enhancement as claimed in claim 1, wherein in each step-by-step feature fusion process, multi-scale information extraction is performed on features of different sizes for multiple times, and the multi-scale information is further fused.
7. The method as claimed in claim 1, wherein the step-by-step fusion features are used as input, and the boundary detection is performed on each step of features to extract the boundary information of the target.
8. The method of claim 1, wherein the training process uses a loss function and adjusts parameters when the loss is propagated backwards.
9. The method of claim 8, wherein the loss function comprises a salient object detection loss and a salient boundary detection loss, wherein the salient object detection loss is used to guide the correct classification of salient object pixels, and the salient boundary detection loss is used to guide the correct classification of pixels in the salient object boundary region.
10. A salient object detection system based on boundary enhancement, characterized by comprising a convolutional feature extraction network module, a multi-stage feature aggregation module, a boundary information extraction module, a multi-scale information extraction module and a detection module;
the convolutional feature extraction network is used for extracting abstract feature maps of different resolutions from the training set images; the multi-level feature aggregation module is used for performing multi-level fusion on the abstract feature maps to obtain multi-level fusion feature maps; the multi-scale information extraction module is used for performing multi-scale extraction on the multi-level fusion feature maps to obtain feature maps containing multi-scale information;
the boundary information extraction module is used for performing information conversion on the feature maps containing multi-scale information and then splicing and fusing them to obtain features containing boundary information, obtaining a boundary detection result from the converted features of each level and further splicing and fusing these to obtain a fused boundary detection result, and, after multi-scale information extraction, splicing the feature maps containing multi-scale information with the features containing boundary information to obtain a salient object detection result;
the detection module is used for training a salient object detection model according to the detected object detection result and the corresponding training-set images until the loss value meets the convergence condition, and performing object detection with the trained salient object detection model.
CN202210467623.4A 2022-04-29 2022-04-29 A method and system for salient object detection based on boundary enhancement Active CN114821059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210467623.4A CN114821059B (en) 2022-04-29 2022-04-29 A method and system for salient object detection based on boundary enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210467623.4A CN114821059B (en) 2022-04-29 2022-04-29 A method and system for salient object detection based on boundary enhancement

Publications (2)

Publication Number Publication Date
CN114821059A true CN114821059A (en) 2022-07-29
CN114821059B CN114821059B (en) 2025-05-30

Family

ID=82509639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210467623.4A Active CN114821059B (en) 2022-04-29 2022-04-29 A method and system for salient object detection based on boundary enhancement

Country Status (1)

Country Link
CN (1) CN114821059B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160284085A1 (en) * 2015-03-25 2016-09-29 Oregon Health & Science University Systems and methods of choroidal neovascularization detection using optical coherence tomography angiography
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
WO2021022752A1 (en) * 2019-08-07 2021-02-11 深圳先进技术研究院 Multimodal three-dimensional medical image fusion method and system, and electronic device
CN111310767A (en) * 2020-01-16 2020-06-19 浙江科技学院 A saliency detection method based on boundary enhancement
US11200679B1 (en) * 2020-07-09 2021-12-14 Toyota Research Institute, Inc. System and method for generating a probability distribution of a location of an object
CN112836708A (en) * 2021-01-25 2021-05-25 绍兴图信物联科技有限公司 Image feature detection method based on Gram matrix and F norm
CN113139431A (en) * 2021-03-24 2021-07-20 杭州电子科技大学 Image saliency target detection method based on deep supervised learning
CN114399741A (en) * 2021-12-03 2022-04-26 际络科技(上海)有限公司 Road surface obstacle identification method and system based on significance detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHUGE, YZ等: "Boundary-Guided Feature Aggregation Network for Salient Object Detection", IEEE SIGNAL PROCESSING LETTERS, 11 October 2018 (2018-10-11) *
李媛丽; 黄刚; 王军; 孟祥豪; 张坤峰; 段永胜: "Salient object detection algorithm based on eye-movement prediction and multi-layer neighborhood perception", COMMUNICATIONS TECHNOLOGY, no. 06, 10 June 2020 (2020-06-10) *
翟正利; 孙霞; 周炜; 梁振明: "Multi-object saliency detection based on fully convolutional neural networks", COMPUTER TECHNOLOGY AND DEVELOPMENT, no. 08, 10 August 2020 (2020-08-10) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294359A (en) * 2022-08-17 2022-11-04 杭州电子科技大学 A saliency object detection method for high-resolution images based on deep learning
CN115294359B (en) * 2022-08-17 2023-10-10 杭州电子科技大学 A method of salient target detection in high-resolution images based on deep learning
CN115937509A (en) * 2022-09-09 2023-04-07 深圳海翼智新科技有限公司 Model parameter adjustment method, device, storage medium and computer equipment
CN115731397A (en) * 2022-09-26 2023-03-03 苏州中农数智科技有限公司 Method and device for repairing uncertain edge points in significance detection
CN116403031A (en) * 2023-03-23 2023-07-07 广西电网有限责任公司电力科学研究院 Transmission line pin fault detection method based on deep data fusion
CN116862884A (en) * 2023-07-13 2023-10-10 西安理工大学 Concrete slump detection method based on remarkable target detection
CN119360031A (en) * 2024-12-25 2025-01-24 中国计量大学 A polyp image segmentation method based on dual-branch feature progressive fusion network

Also Published As

Publication number Publication date
CN114821059B (en) 2025-05-30

Similar Documents

Publication Publication Date Title
CN114821059A (en) Salient object detection method and system based on boundary enhancement
CN110378844B (en) A Blind Image Deblurring Method Based on Recurrent Multiscale Generative Adversarial Networks
CN108765296B (en) An Image Super-Resolution Reconstruction Method Based on Recurrent Residual Attention Network
CN112329800A (en) Salient object detection method based on global information guiding residual attention
CN111967524A (en) Multi-scale fusion feature enhancement algorithm based on Gaussian filter feedback and cavity convolution
CN112348870B (en) A salient object detection method based on residual fusion
CN113111736A (en) Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN
CN113743301B (en) Solid-state nanopore sequencing electric signal noise reduction processing method based on residual self-encoder convolutional neural network
CN113888505B (en) Natural scene text detection method based on semantic segmentation
CN110929735A (en) Rapid significance detection method based on multi-scale feature attention mechanism
CN114419060A (en) Skin mirror image segmentation method and system
CN112529064A (en) Efficient real-time semantic segmentation method
CN114972155B (en) Polyp image segmentation method based on context information and reverse attention
CN114596503B (en) A road extraction method based on remote sensing satellite images
CN118982655A (en) An efficient RGB-D saliency detection method using multiple complementary information
CN119251244A (en) Polyp segmentation method and device
CN118334159A (en) Electric power image generation model construction method, image generation method and related device
CN113610732A (en) Full-focus image generation method based on interactive counterstudy
CN110189330A (en) A method of background removal based on deep learning
CN115147760B (en) A high-resolution remote sensing image change detection method based on video understanding and spatiotemporal decoupling
CN116152268A (en) Multi-scale intestinal polyp segmentation method integrating attention mechanisms
CN113313108A (en) Saliency target detection method based on super-large receptive field characteristic optimization
CN118134779A (en) Infrared and visible light image fusion method based on multi-scale reconstruction transducer and multi-dimensional attention
CN115457385B (en) Building change detection method based on lightweight network
CN111598841A (en) Example significance detection method based on regularized dense connection feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant