Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image contour detection method based on multi-level feature-channel optimized coding.
Although the NSCT transform has excellent performance in characterizing image details, it usually performs optimized encoding by weighting the decomposition results over scale and direction, and the artificial setting of these weighting parameters introduces large uncertainty into the detection results. Considering the effectiveness of the Gabor filter in perceiving the scale and direction of image targets, the invention first calculates, for the input image I(x, y), the optimal scale m_opt and direction θ_opt of the Gabor filter, and uses the obtained m_opt and θ_opt as the frequency separation parameters of the NSCT transform, replacing the traditional redundant fusion mode in which Gabor and NSCT must traverse all scales and directions. In addition, the invention performs feature enhancement fusion between the contour subgraph obtained by the NSCT and I(x, y), which helps obtain the primary contour response E(x, y) of I(x, y) efficiently and accurately. E(x, y) is then fed into a full convolutional neural network composed of FCN-32S, FCN-16S and FCN-8S network units; active learning of the network parameters is realized by the convolution and pooling modules of the feature encoder, an image contour mask map corresponding to I(x, y) is obtained through the deconvolution and upsampling modules of the feature decoder, and a dot multiplication operation between the mask map and I(x, y) finally realizes accurate detection of the image contour. The method specifically comprises the following steps:
Step 1, acquiring a primary contour response of the input image I(x, y). First, the Gabor filter response of the input image I(x, y) is calculated and denoted G_{m,n}(x, y), as shown in formulas (1) to (4).
In the formulas: G_{m,n}(x, y) represents the Gabor feature information of the image I(x, y) obtained through a Gabor filter at scale m and direction θ = nπ/K; σ_x and σ_y respectively represent the standard deviations of the Gabor wavelet basis function along the x-axis and the y-axis; ω is the complex modulation frequency of the Gaussian function; the Gabor filter ψ_{m,n}(x, y) is obtained by taking ψ(x, y) as the mother wavelet and performing scale and rotation transformations on it; u × v is the template size of ψ_{m,n}(x, y); m = 0, …, S−1 and n = 0, …, K−1, where S and K respectively denote the numbers of scales and directions; α is the scale factor of ψ(x, y), with α > 1.
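Formulas (1) to (4) are not reproduced in this text. As an illustration only, the following Python sketch builds a comparable multi-scale, multi-direction Gabor filter bank with OpenCV; the parameter values (ksize, sigma, lambd, gamma) are placeholders rather than values taken from the patent, and the scale factor alpha > 1 dilates the kernel between scales in the spirit of the ψ_{m,n}(x, y) construction.

```python
import cv2
import numpy as np

def gabor_bank_responses(image, S=4, K=8, ksize=31, sigma=4.0,
                         lambd=10.0, gamma=0.5, alpha=2.0):
    """Return the Gabor response magnitude G_{m,n} of `image` for every
    scale m = 0..S-1 and direction theta = n*pi/K, n = 0..K-1."""
    img = image.astype(np.float64)
    responses = {}
    for m in range(S):
        for n in range(K):
            theta = n * np.pi / K
            kernel = cv2.getGaborKernel(
                (ksize, ksize),
                sigma * alpha ** m,      # envelope grows with scale m
                theta,
                lambd * alpha ** m,      # wavelength grows with scale m
                gamma, psi=0, ktype=cv2.CV_64F)
            responses[(m, n)] = np.abs(cv2.filter2D(img, cv2.CV_64F, kernel))
    return responses
```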
The optimal scale m_opt and direction θ_opt of the Gabor filter are then calculated based on the structural similarity index SSIM, as shown in formulas (5) to (8). Here SSIM(G_{m,n}, I_mark) represents the similarity between the filter response G_{m,n}(x, y) and the known contour marker image I_mark; when SSIM(G_{m,n}, I_mark) takes its maximum value, the optimal scale m_opt and direction θ_opt are obtained. l(G_{m,n}, I_mark), c(G_{m,n}, I_mark) and s(G_{m,n}, I_mark) respectively represent quantitative similarity measures of brightness, contrast and structure between G_{m,n} and I_mark; u_Gabor and u_mark respectively represent the brightness means of the images G_{m,n} and I_mark; δ_Gabor and δ_mark respectively represent their brightness standard deviations; δ²_Gabor and δ²_mark respectively represent their brightness variances; and δ_{G,m} represents the brightness covariance of G_{m,n} and I_mark. In I_mark, pixels in the contour region are 1 and all other pixels are 0. To avoid system instability caused by the denominators in formulas (6) to (8) approaching zero, C_1, C_2 and C_3 are set to small positive constants, each less than 3% of the mean brightness value of the filter response G_{m,n}(x, y).
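Formulas (5) to (8) are not reproduced in this text, but the definitions above match the standard SSIM decomposition, which in the notation used here reads:

```latex
\mathrm{SSIM}(G_{m,n}, I_{mark}) = l \cdot c \cdot s, \qquad
l = \frac{2 u_{Gabor} u_{mark} + C_1}{u_{Gabor}^2 + u_{mark}^2 + C_1}, \quad
c = \frac{2 \delta_{Gabor} \delta_{mark} + C_2}{\delta_{Gabor}^2 + \delta_{mark}^2 + C_2}, \quad
s = \frac{\delta_{G,m} + C_3}{\delta_{Gabor} \delta_{mark} + C_3}
```

A sketch of the (m_opt, θ_opt) search follows, reusing the hypothetical gabor_bank_responses helper above and scikit-image's structural_similarity as a stand-in for formulas (5) to (8):

```python
import numpy as np
from skimage.metrics import structural_similarity

def find_optimal_scale_direction(image, contour_mark, S=4, K=8):
    """Return (m_opt, theta_opt) maximizing SSIM(G_{m,n}, I_mark)."""
    responses = gabor_bank_responses(image, S=S, K=K)  # sketch from step 1
    best, m_opt, n_opt = -np.inf, 0, 0
    for (m, n), resp in responses.items():
        score = structural_similarity(
            resp, contour_mark.astype(np.float64),
            data_range=max(resp.max() - resp.min(), 1e-12))
        if score > best:
            best, m_opt, n_opt = score, m, n
    return m_opt, n_opt * np.pi / K
```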
The obtained m_opt and θ_opt are taken as the frequency separation parameters of the NSCT, and the NSCT decomposition of the image I(x, y) yields a contour subgraph C(x, y). Since the NSCT decomposition leaves the image dimensions unchanged, C(x, y) is directly fused with I(x, y) in a pixel-level feature enhancement operation, finally giving the primary contour response E(x, y) of the input image I(x, y), as shown in formulas (9) and (10). In the formulas, NSCT_{m_opt,θ_opt}(·) represents the non-subsampled contourlet transform under the scale m_opt and direction θ_opt parameter conditions, and C(x, y) represents the corresponding NSCT contour subgraph; T represents the brightness mean of the contour subgraph C(x, y); max is the maximum-value function, the same below.
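Formulas (9) and (10) are likewise not reproduced here; the text fixes only the ingredients (the subgraph C(x, y), its brightness mean T, and the max function). The sketch below is therefore one plausible reading rather than the patent's exact rule: pixels where the subgraph response exceeds T are enhanced with the max function, all others keep the original intensity. The NSCT decomposition itself is assumed to come from an external toolbox.

```python
import numpy as np

def primary_contour_response(image, contour_subgraph):
    """Illustrative pixel-level feature enhancement fusion of I(x, y)
    with the NSCT contour subgraph C(x, y); not the patent's exact
    formulas (9)-(10), which are not reproduced in this text."""
    I = image.astype(np.float64)
    C = contour_subgraph.astype(np.float64)
    T = C.mean()  # brightness mean of the contour subgraph
    return np.where(C > T, np.maximum(I, C), I)
```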
Step 2: the primary contour response E(x, y) obtained in step 1 is fed into a full convolutional neural network, obtaining heat maps F_5, F_4 and F_3 trained by the FCN-32S, FCN-16S and FCN-8S network units, respectively. The full convolutional neural network is divided into a feature encoder and a feature decoder; the whole network comprises 8 convolution blocks, 5 maximum pooling layers, 5 upsampling layers and 2 convolutional layers. The concrete structure is as follows:
1. Feature encoder
Taking VGG-16 as the basic network, the full convolutional neural network is optimized and reconstructed on top of it. To improve the network computation speed and enhance the generalization capability, a 1 × 1 convolution kernel is inserted between every two 3 × 3 convolution kernels in a convolution block, giving a (3 × 3, 1 × 1, 3 × 3) structure; to strengthen the nonlinearity and translation invariance of the learned image features, a maximum pooling layer is added after each convolution module. After pooling layer Max pool5, the size of E(x, y) becomes 1/32 of I(x, y); the result, denoted E_{1/32}, represents the feature map output after training of the FCN-32S network unit. After pooling layer Max pool4 and a 1 × 1 convolutional layer, the size becomes 1/16 of I(x, y); the result, denoted E_{1/16}, represents the feature map output after training of the FCN-16S network unit. Similarly, after pooling layer Max pool3 and a 1 × 1 convolutional layer, the size becomes 1/8 of I(x, y); the result, denoted E_{1/8}, represents the feature map output after training of the FCN-8S network unit. The output of each pooling layer passes through a ReLU activation function to realize sparse coding. The feature encoder comprises the following thirteen-layer structure, in which all strides are 1 (an illustrative code sketch follows the list):
the first layer: convolutional layer CONV1-1, 8 channels, 3 × 3 convolution kernel; CONV1-2, 8 channels, 3 × 3 convolution kernel;
the second layer: maximum pooling layer Max pool1, 2 × 2 pooling area;
the third layer: convolutional layer CONV2-1, 16 channels, 3 × 3 convolution kernel; CONV2-2, 16 channels, 1 × 1 convolution kernel; CONV2-3, 16 channels, 3 × 3 convolution kernel;
the fourth layer: maximum pooling layer Max pool2, 2 × 2 pooling area;
the fifth layer: convolutional layer CONV3-1, 32 channels, 3 × 3 convolution kernel; CONV3-2, 32 channels, 1 × 1 convolution kernel; CONV3-3, 32 channels, 3 × 3 convolution kernel;
the sixth layer: maximum pooling layer Max pool3, 2 × 2 pooling area;
the seventh layer: convolutional layer CONV4-1, 64 channels, 3 × 3 convolution kernel; CONV4-2, 64 channels, 1 × 1 convolution kernel; CONV4-3, 64 channels, 3 × 3 convolution kernel;
the eighth layer: maximum pooling layer Max pool4, 2 × 2 pooling area;
the ninth layer: convolutional layer CONV5-1, 128 channels, 3 × 3 convolution kernel; CONV5-2, 128 channels, 1 × 1 convolution kernel; CONV5-3, 128 channels, 3 × 3 convolution kernel;
the tenth layer: maximum pooling layer Max pool5, 2 × 2 pooling area;
the eleventh layer: convolutional layer CONV6, 256 channels, 1 × 1 convolution kernel;
the twelfth layer: convolutional layer CONV7, 256 channels, 1 × 1 convolution kernel;
the thirteenth layer: convolutional layer CONV8, 1 channel, 1 × 1 convolution kernel;
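As one concrete reading of the thirteen-layer table above, here is a minimal PyTorch sketch of the feature encoder, assuming a single-channel input E(x, y), stride 1 everywhere, and padding=1 on the 3 × 3 kernels so that only the pooling layers change the spatial size; the exact placement of the ReLU activations and all names are illustrative.

```python
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """VGG-16-style encoder with (3x3, 1x1, 3x3) blocks and 2x2 max pooling.
    Returns single-channel maps E_{1/8}, E_{1/16}, E_{1/32}."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, with_1x1=True):
            layers = [nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True)]
            if with_1x1:
                layers += [nn.Conv2d(cout, cout, 1), nn.ReLU(inplace=True)]
            layers += [nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.conv1 = block(1, 8, with_1x1=False)   # CONV1-1, CONV1-2
        self.conv2 = block(8, 16)                  # CONV2-1..CONV2-3
        self.conv3 = block(16, 32)                 # CONV3-1..CONV3-3
        self.conv4 = block(32, 64)                 # CONV4-1..CONV4-3
        self.conv5 = block(64, 128)                # CONV5-1..CONV5-3
        self.pool = nn.MaxPool2d(2)                # Max pool1..Max pool5
        self.pred3 = nn.Conv2d(32, 1, 1)           # 1x1 prediction conv after pool3
        self.pred4 = nn.Conv2d(64, 1, 1)           # 1x1 prediction conv after pool4
        self.conv678 = nn.Sequential(              # CONV6, CONV7, CONV8
            nn.Conv2d(128, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1))

    def forward(self, e):
        p1 = self.pool(self.conv1(e))    # size 1/2
        p2 = self.pool(self.conv2(p1))   # size 1/4
        p3 = self.pool(self.conv3(p2))   # size 1/8
        p4 = self.pool(self.conv4(p3))   # size 1/16
        p5 = self.pool(self.conv5(p4))   # size 1/32
        return self.pred3(p3), self.pred4(p4), self.conv678(p5)
```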
2. Feature decoder
After feature encoding, the primary contour response E(x, y) has been successively reduced to 1/8, 1/16 and 1/32 of its original size, so the resulting feature maps have low resolution; a feature decoder is therefore added to perform bilinear upsampling on these low-resolution feature maps. The 32-times-downsampled image E_{1/32} is upsampled by 32-fold bilinear interpolation to obtain a heat map of the same size as I(x, y), denoted F_5. A 1 × 1 prediction convolutional layer for adjusting the number of feature-map channels is added after pooling layer Max pool4 and outputs the image E_{1/16}; this is added element-wise to the result of 2-fold upsampling of the 32-times-downsampled image E_{1/32}, and 16-fold bilinear upsampling of the sum gives a heat map of the same size as I(x, y), denoted F_4. Likewise, a 1 × 1 prediction convolutional layer for adjusting the number of feature-map channels is added after pooling layer Max pool3 and outputs the image E_{1/8}; this is added element-wise to the result of 2-fold upsampling of the 16-times-downsampled fusion result, and 8-fold bilinear upsampling gives a heat map of the same size as I(x, y), denoted F_3.
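A minimal sketch of the decoder's skip fusion, assuming the three encoder outputs have already been reduced to a single channel by their 1 × 1 prediction convolutions; torch.nn.functional.interpolate stands in for the bilinear upsampling layers, and all names are illustrative.

```python
import torch.nn.functional as F

def decode_heatmaps(e8, e16, e32):
    """FCN-style decoding of E_{1/8}, E_{1/16}, E_{1/32} into F5, F4, F3,
    each upsampled back to the input size by bilinear interpolation."""
    up = lambda x, s: F.interpolate(x, scale_factor=s, mode="bilinear",
                                    align_corners=False)
    f5 = up(e32, 32)          # 32x bilinear upsampling of E_{1/32}
    s16 = e16 + up(e32, 2)    # element-wise sum at 1/16 resolution
    f4 = up(s16, 16)          # 16x bilinear upsampling
    s8 = e8 + up(s16, 2)      # element-wise sum at 1/8 resolution
    f3 = up(s8, 8)            # 8x bilinear upsampling
    return f5, f4, f3
```

Under these assumptions, the full forward pass of step 2 is simply the composition f5, f4, f3 = decode_heatmaps(*FeatureEncoder()(e)).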
Step 3: for the heat maps F_5, F_4 and F_3 obtained in step 2, the max function takes the maximum pixel value at each pixel location, and the results are fused into an image contour mask map F. After the action of a ReLU activation function, a loss operation is performed between F and the known contour marker image I_mark, and the result is recorded as loss. The parameters of each network layer are continuously and iteratively updated by stochastic gradient descent; training ends when the loss value is smaller than a threshold ε, set to 1–3% of the total number of pixels in the training image samples, yielding the trained full convolutional neural network.
Step 4: the image to be detected is passed through the Gabor filter constructed in steps 1–3, the non-subsampled contourlet transform, and the trained full convolutional neural network to obtain an image contour mask map; a dot multiplication operation between the mask map and the image to be detected finally yields the image contour detection result.
The invention has the following beneficial effects:
1. A novel primary contour response method with multi-level feature-channel optimized coding is provided. Although the NSCT transform can simulate the frequency-domain separation performed by the lateral geniculate nucleus (LGN) in visual information processing, the artificial setting of weighting parameters during image decomposition gives its detection results large uncertainty. Considering that the response characteristics of the Gabor filter are similar to those of the human visual system, with a degree of robustness to illumination and pose as well as high-quality spatial locality and orientation selectivity, the invention proposes searching, for each picture, the optimal scale and direction of the Gabor filter response and then using them as the direct basis for setting the frequency separation parameters of the NSCT; the contour subgraph obtained by the NSCT is fused with the original image through feature enhancement, which helps obtain the primary contour response efficiently and accurately. The constructed primary contour response method with multi-level feature-channel optimized coding yields a low-dimensional, low-redundancy image feature channel and has important application prospects for relieving network pressure, reducing the computational complexity of the convolutional neural network, and improving the training efficiency of the network.
2. A full convolutional neural network is constructed for multi-scale training, so that the complementary characteristics of FCN-32S, FCN-16S and FCN-8S in heat map expression, such as smoothness and fineness, are fully exploited. The network is divided into a feature encoder and a feature decoder and works end to end, without region selection on the target image: the feature encoder continuously and actively learns feature parameters through convolution and pooling while shrinking the feature maps proportionally, and the feature decoder preserves the two-dimensional character of the extracted features through deconvolution and upsampling, representing the main contours of the image with heat maps of the same size as the original image. This realizes a prediction for every pixel while retaining the spatial information of the original image.
Detailed Description
The present invention is further described below with reference to the accompanying drawings; the overall flow is shown in FIG. 1.
Step 1, acquiring a primary contour response of the input image I(x, y). First, the Gabor filter response of the input image I(x, y) is calculated and denoted G_{m,n}(x, y), as shown in formulas (11) to (14).
In the formulas: G_{m,n}(x, y) represents the Gabor feature information of the image I(x, y) obtained through a Gabor filter at scale m and direction θ = nπ/K; σ_x and σ_y respectively represent the standard deviations of the Gabor wavelet basis function along the x-axis and the y-axis; ω is the complex modulation frequency of the Gaussian function; the Gabor filter ψ_{m,n}(x, y) is obtained by taking ψ(x, y) as the mother wavelet and performing scale and rotation transformations on it; u × v is the template size of ψ_{m,n}(x, y); m = 0, …, S−1 and n = 0, …, K−1, where S and K respectively denote the numbers of scales and directions; α is the scale factor of ψ(x, y), with α > 1.
The optimal scale m_opt and direction θ_opt of the Gabor filter are then calculated based on the structural similarity index SSIM, as shown in formulas (15) to (18). Here SSIM(G_{m,n}, I_mark) represents the similarity between the filter response G_{m,n}(x, y) and the known contour marker image I_mark; when SSIM(G_{m,n}, I_mark) takes its maximum value, the optimal scale m_opt and direction θ_opt are obtained. l(G_{m,n}, I_mark), c(G_{m,n}, I_mark) and s(G_{m,n}, I_mark) respectively represent quantitative similarity measures of brightness, contrast and structure between G_{m,n} and I_mark; u_Gabor and u_mark respectively represent the brightness means of the images G_{m,n} and I_mark; δ_Gabor and δ_mark respectively represent their brightness standard deviations; δ²_Gabor and δ²_mark respectively represent their brightness variances; and δ_{G,m} represents the brightness covariance of G_{m,n} and I_mark. In I_mark, pixels in the contour region are 1 and all other pixels are 0. To avoid system instability caused by the denominators in formulas (16) to (18) approaching zero, C_1, C_2 and C_3 are set to small positive constants, each less than 3% of the mean brightness value of the filter response G_{m,n}(x, y).
The obtained m_opt and θ_opt are taken as the frequency separation parameters of the NSCT, and the NSCT decomposition yields the contour subgraph C(x, y). Since the NSCT keeps the dimensions unchanged after decomposing the image I(x, y), C(x, y) is directly fused with I(x, y) in a pixel-level feature enhancement operation, finally giving the primary contour response E(x, y) of the input image I(x, y), as shown in formulas (19) and (20). In the formulas, NSCT_{m_opt,θ_opt}(·) represents the non-subsampled contourlet transform under the scale m_opt and direction θ_opt parameter conditions, and C(x, y) represents the corresponding NSCT contour subgraph; T represents the brightness mean of the contour subgraph C(x, y); max is the maximum-value function, the same below.
Step 2: as shown in FIG. 2, a full convolutional neural network is constructed, obtaining heat maps F_5, F_4 and F_3 trained by the FCN-32S, FCN-16S and FCN-8S network units, respectively. The full convolutional neural network is divided into a feature encoder and a feature decoder; the whole network comprises 8 convolution blocks, 5 maximum pooling layers, 5 upsampling layers and 2 convolutional layers. The concrete structure is as follows:
1. Feature encoder
(1) The primary contour response E(x, y) passes through the CONV1 convolution block (3 × 3-8, 3 × 3-8), followed by 2 × 2 maximum pooling and a ReLU activation function, giving the image E_{1/2}, as shown in formula (21); it represents I(x, y) after convolution and Max pool1, with the size reduced to 1/2.
Here conv1() represents the convolution operation of the first layer; pool1() represents the first maximum pooling operation; relu() represents the activation function used to sparsify the result, the same below.
(2) E_{1/2} passes through the CONV2 convolution block (3 × 3-16, 1 × 1-16, 3 × 3-16), followed by 2 × 2 maximum pooling, a 1 × 1 prediction convolution for adjusting the number of feature-map channels, and a ReLU activation function, giving the image E_{1/4}, as shown in formula (22); it represents the image E_{1/2} after convolution and Max pool2, with the size becoming 1/4 of I(x, y).
Here conv2() represents the second-layer convolution operation; pool2() represents the second maximum pooling operation.
(3) E_{1/4} passes through the CONV3 convolution block (3 × 3-32, 1 × 1-32, 3 × 3-32), followed by 2 × 2 maximum pooling, a 1 × 1 prediction convolution, and a ReLU activation function, giving the image E_{1/8}, as shown in formula (23); it represents the image E_{1/4} after the convolution block–Max pool3–prediction convolution, with the size becoming 1/8 of I(x, y).
Here conv3() represents the third-layer convolution operation; pool3() represents the third maximum pooling operation; conv1×1() represents a 1 × 1 convolution kernel, the same below.
(4) E_{1/8} passes through the CONV4 convolution block (3 × 3-64, 1 × 1-64, 3 × 3-64), followed by 2 × 2 maximum pooling, a 1 × 1 prediction convolution for adjusting the number of feature-map channels, and a ReLU activation function, giving the image E_{1/16}, as shown in formula (24); it represents the image E_{1/8} after the convolution block–Max pool4–prediction convolution, with the size becoming 1/16 of I(x, y).
Here conv4() represents the fourth-layer convolution operation; pool4() represents the fourth maximum pooling operation.
(5) E_{1/16} passes through the CONV5 convolution block (3 × 3-128, 1 × 1-128, 3 × 3-128), followed by 2 × 2 maximum pooling and a ReLU activation function, giving the image E_{1/32}, as shown in formula (25); it represents the image E_{1/16} after convolution and Max pool5, with the size becoming 1/32 of I(x, y).
Here conv5() represents the fifth-layer convolution operation; pool5() represents the fifth maximum pooling operation.
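Formulas (21) to (25) are not reproduced in this text; from the operators defined above (conv1()…conv5(), pool1()…pool5(), conv1×1(), relu()), a plausible reconstruction of the five encoder stages is:

```latex
\begin{aligned}
E_{1/2}  &= relu(pool_1(conv_1(E(x,y)))) \\
E_{1/4}  &= relu(conv_{1\times 1}(pool_2(conv_2(E_{1/2})))) \\
E_{1/8}  &= relu(conv_{1\times 1}(pool_3(conv_3(E_{1/4})))) \\
E_{1/16} &= relu(conv_{1\times 1}(pool_4(conv_4(E_{1/8})))) \\
E_{1/32} &= relu(pool_5(conv_5(E_{1/16})))
\end{aligned}
```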
2. Feature decoder
(1) The image E_{1/32} is upsampled by 32-fold bilinear interpolation to obtain a heat map of the same size as I(x, y), denoted F_5, as shown in formula (26).
Here bilinear() represents the bilinear upsampling operation, the same below.
(2) A 1 × 1 prediction convolutional layer for adjusting the number of feature-map channels is added after pooling layer Max pool4 and outputs the image E_{1/16}; this is added element-wise to the result of 2-fold upsampling of the 32-times-downsampled image E_{1/32}, and 16-fold bilinear upsampling of the sum gives a heat map of the same size as I(x, y), denoted F_4, as shown in formula (27).
Here sum() represents the matrix addition operation, the same below.
(3) A 1 × 1 prediction convolutional layer for adjusting the number of feature-map channels is added after pooling layer Max pool3 and outputs the image E_{1/8}; this is added element-wise to the result of 2-fold upsampling of the 16-times-downsampled fusion result, and 8-fold bilinear upsampling gives a heat map of the same size as I(x, y), denoted F_3, as shown in formula (28).
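Formulas (26) to (28) are not reproduced in this text; from the bilinear() and sum() operators defined above, a plausible reconstruction is:

```latex
\begin{aligned}
F_5 &= bilinear_{32\times}(E_{1/32}) \\
F_4 &= bilinear_{16\times}(sum(E_{1/16},\, bilinear_{2\times}(E_{1/32}))) \\
F_3 &= bilinear_{8\times}(sum(E_{1/8},\, bilinear_{2\times}(sum(E_{1/16},\, bilinear_{2\times}(E_{1/32})))))
\end{aligned}
```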
Step 3: for the heat maps F_5, F_4 and F_3 obtained in step 2, the max function takes the maximum pixel value at each pixel location, and the results are fused into the image contour mask map F, as shown in formula (29). After the action of a ReLU activation function, a loss operation is performed against the training images with known manually marked contours, and the result is recorded as loss, as shown in formula (30). The parameters of each network layer are continuously and iteratively updated by stochastic gradient descent (SGD); training ends when the loss value is smaller than the threshold ε, set to 1–3% of the total number of pixels in the training image samples, yielding the trained full convolutional neural network.
F = max(F_5, F_4, F_3)    (29)
Here M and N are the numbers of rows and columns of the training image, F_{i,j} represents the pixel value of the image contour mask map F at coordinate (i, j), and I_{mark,i,j} is the pixel value of the known contour marker image I_mark at coordinate (i, j).
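Formula (30) is not reproduced here; since ε is specified as 1–3% of the total pixel count, a pixel-wise absolute difference between F and I_mark is one loss form consistent with that threshold, and it is what the sketch below assumes. The model is taken to map E(x, y) to the three heat maps, for instance by composing the hypothetical FeatureEncoder and decode_heatmaps sketched in step 2.

```python
import torch

def train_fcn(model, samples, lr=1e-3, eps_ratio=0.02, max_epochs=1000):
    """SGD training loop; stops when loss < eps.

    `samples` yields (e, mark) pairs: the primary contour response and the
    binary contour marker image, both as (1, 1, H, W) tensors."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for e, mark in samples:
            f5, f4, f3 = model(e)
            F = torch.relu(torch.max(torch.max(f5, f4), f3))  # formula (29) + ReLU
            loss = (F - mark).abs().sum()   # assumed pixel-wise loss, cf. (30)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if loss.item() < eps_ratio * mark.numel():  # eps = 1-3% of pixels
                return model
    return model
```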
Step 4: the image to be detected is passed through the Gabor filter constructed in steps 1–3, the non-subsampled contourlet transform, and the trained full convolutional neural network to obtain an image contour mask map; a dot multiplication operation between the mask map and the image to be detected finally yields the image contour detection result.
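Putting step 4 together, an end-to-end inference sketch under the assumptions of the earlier snippets: every helper name here is hypothetical, and nsct_contour_subgraph in particular is a placeholder for whatever NSCT implementation is used, since the patent does not name a library.

```python
import numpy as np
import torch

def detect_contours(image, m_opt, theta_opt, model, nsct_contour_subgraph):
    """Illustrative detection pipeline: NSCT subgraph -> primary contour
    response -> trained FCN -> mask -> element-wise product with the image."""
    C = nsct_contour_subgraph(image, m_opt, theta_opt)   # placeholder NSCT call
    E = primary_contour_response(image, C)               # fusion sketch, step 1
    e = torch.from_numpy(E).float()[None, None]          # shape (1, 1, H, W)
    with torch.no_grad():
        f5, f4, f3 = model(e)
        mask = torch.relu(torch.max(torch.max(f5, f4), f3))[0, 0].numpy()
    return mask * image                                  # dot multiplication
```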