CN109559315A

CN109559315A - A water surface segmentation method based on a multi-path deep neural network

Info

Publication number: CN109559315A
Application number: CN201811138311.9A
Authority: CN
Inventors: 庞彦伟; 贾大宇
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2019-04-02
Anticipated expiration: 2038-09-28
Also published as: CN109559315B

Abstract

The present invention relates to a water surface segmentation method based on a multi-path deep neural network, comprising the following steps: collecting images containing objects of various categories and labeling all objects of interest in each image; dividing the image set; designing a multi-path deep neural network structure to achieve effective object detection, in which the semantic segmentation network consists of three convolutional neural network paths in parallel, each path consists of three dilated convolutions with different dilation rates connected in series, the number below each dilated convolution denotes its dilation rate, and the dilated convolutions within each path are connected by a dense structure with cross-connections between paths, making the receptive field denser; designing the decoding network; designing the loss function required for training and selecting a suitable method to initialize the parameters of the designed network; and training the model.

Description

A water surface segmentation method based on a multi-path deep neural network

Technical field

The invention belongs to the field of deep learning and neural networks, and in particular relates to a water surface segmentation method based on a multi-path deep neural network.

Background technique

Deep learning (Deep Learning, DL) has developed rapidly in recent years and is widely used in fields such as computer vision (Computer Vision, CV) and natural language processing (Natural Language Processing, NLP).

Unmanned surface navigation is a very promising direction, with broad applications in fields such as ocean exploration, emergency search and rescue, and national defense. In unmanned surface navigation, the region of the water surface that the vessel can travel through must be segmented.

Current semantic segmentation methods mainly target natural scenes and street scenes, whereas the water surface scenes that must be segmented in autonomous surface navigation differ greatly from those scenes. Sunlight reflected from the water and waves both strongly affect the segmentation result. To prevent ship collisions, the speed of the method is also an important criterion.

The leading semantic segmentation methods at present are FCN [1], SegNet [2] and the DeepLab [3-6] series. FCN is an end-to-end neural network structure that replaces the fully connected layers of a traditional convolutional neural network with convolutional layers; this improves network performance to some extent, but image detail is handled poorly. SegNet is an encoder-decoder network structure that replaces the fully connected layers of a traditional convolutional neural network with upsampling, and its final softmax layer outputs the class probabilities of each pixel. The DeepLab series are currently the most advanced semantic segmentation algorithms; they greatly improve network performance through methods such as dilated convolution, multi-scale processing and conditional random fields.

[1] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Computer Vision and Pattern Recognition. IEEE, 2015: 3431-3440.

[2] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99): 2481-2495.

[3] Chen L C, Papandreou G, Kokkinos I, et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs[J]. Computer Science, 2014(4): 357-361.

[4] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv preprint arXiv:1606.00915, 2016.

[5] Chen L C, Papandreou G, Schroff F, et al. Rethinking Atrous Convolution for Semantic Image Segmentation[J]. 2017.

[6] Chen L C, Zhu Y, Papandreou G, Schroff F, et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv preprint arXiv:1802.02611, 2018.

Summary of the invention

The invention proposes a neural network structure that uses three paths of deeply connected dilated convolutions, strengthening the connections between different semantic levels in order to improve network performance. The technical solution is as follows:

A water surface segmentation method based on a multi-path deep neural network, comprising the following steps:

(1) Collect images containing objects of various categories and label all objects of interest in each image. The annotation is the object category of each pixel, which serves as the image label information.

(2) Divide the image set. The collected images are divided into a training set, a validation set and a test set. The training set is used to train the convolutional neural network, the validation set is used to select the best trained model, and the test set is used later to test the model or in practical application.

(3) Design a multi-path deep neural network structure to achieve effective object detection.

1) Design the backbone network.

2) Design the semantic segmentation network: it consists of three convolutional neural network paths in parallel, each path consisting of three dilated convolutions with different dilation rates connected in series. The number below each dilated convolution denotes its dilation rate. A 1 × 1 convolutional layer is introduced at each dilated convolutional layer to reduce the number of channels. The dilated convolutions within each path are connected using a dense structure, and cross-connections are used between paths, making the receptive field denser.

3) Design the decoding network: the output of the semantic segmentation network is upsampled, added to the low-level semantic features from the backbone network, passed through a 3 × 3 convolutional layer and upsampled again to obtain the final segmentation result.

4) Design the loss function required for training, and select a suitable method to initialize the parameters of the designed network.

(4) Input the data, compute the prediction and the loss in a forward pass, then compute the parameter gradients with the back-propagation algorithm and update the parameters. The parameters are updated iteratively; when the loss curve converges, model training is complete.

(5) Apply the trained model in testing or practical application. When an image is input, the model computes its semantic segmentation result, which assists decision-making in the practical application scenario.
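The decoding step 3) of step (3) can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested lists, nearest-neighbour upsampling and integer upsampling factors of 2; the source does not state the upsampling method or the factors, and in a real network the 3 × 3 kernel is learned. All function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shape."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shapes must match"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3(fmap, kernel):
    """3x3 convolution with zero padding, so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += fmap[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def decode(high_level, low_level, kernel, factor_1=2, factor_2=2):
    """Upsample, add low-level features, apply a 3x3 conv, upsample again."""
    up = upsample_nearest(high_level, factor_1)   # match low-level resolution
    fused = add_maps(up, low_level)               # fuse by addition
    refined = conv3x3(fused, kernel)              # 3x3 convolutional layer
    return upsample_nearest(refined, factor_2)    # final segmentation size
```

Fusing by addition, as in the source, requires the upsampled output and the low-level features to have the same shape, which is why the first upsampling factor must match the resolution ratio between the two.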

In the encoding structure, the invention uses three paths of dilated convolutions to obtain feature maps at different scales. To extract image detail better, the invention uses dilated convolutions with different dilation rates and fuses features of different scales through deep cross-connections. In the decoding structure, the invention combines the low-level semantic feature map with the high-level semantic feature map, further improving the network's ability to handle image detail. In addition, after each dilated convolutional layer the invention introduces a 1 × 1 convolutional layer to reduce the number of channels, which reduces the amount of computation and makes the network easier to train. While improving performance, the invention gives greater weight to the handling of image detail and also has a considerable advantage in efficiency, so it has strong practicality and generality.
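The effect of the 1 × 1 convolutional layers on the amount of computation can be illustrated with a simple weight count. The sketch below assumes one path of three 3 × 3 dilated convolutions and ignores biases; the channel counts (256 and 64) are hypothetical, since the source does not give them. Dilation changes the spacing of the kernel taps but not the number of weights.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution kernel (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def path_weights(in_channels, out_channels, reduced_channels,
                 n_layers=3, reduce=True):
    """Total weights in a path of 3x3 dilated convolutions.

    With reduce=True, each dilated convolution is followed by a 1x1
    convolution that shrinks its output to `reduced_channels`, so the next
    dilated convolution reads fewer input channels.
    """
    total = 0
    channels = in_channels
    for _ in range(n_layers):
        total += conv_weights(3, channels, out_channels)
        channels = out_channels
        if reduce:
            total += conv_weights(1, out_channels, reduced_channels)
            channels = reduced_channels
    return total


# Hypothetical channel counts, for illustration only.
without_reduction = path_weights(256, 256, 64, reduce=False)
with_reduction = path_weights(256, 256, 64, reduce=True)
```

Under these assumed channel counts, the path with 1 × 1 reductions has roughly half the weights of the path without them.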

Description of the drawings

Fig. 1 Structure of the multi-path deep neural network

Fig. 2 Autonomous navigation on the water surface

Specific embodiment

The technical solution of the invention is described below, taking autonomous surface navigation as an example.

In autonomous surface navigation, the region the vessel can travel through, together with obstacles such as buoys, ships and reefs, must be segmented from the images captured by the camera. In an emergency the system must respond quickly, which requires the method to balance accuracy and speed.

Applying the invention to a practical semantic segmentation task involves three stages: preparing the data set; designing and training the network; and testing the trained model. The specific steps are as follows:

Step 1: Prepare the data set used for training.

1) Select a suitable data set. Commonly used semantic segmentation data sets at present include Pascal VOC, as well as Cityscapes and KITTI for autonomous driving. Given the particular nature of the water environment, the invention uses a self-annotated water surface segmentation data set.

2) Process the data set. The data set is divided into a training data set, a validation data set and a test data set. The training data set is used to train the model, the validation data set is used to adjust the network structure and tune the model parameters, and the test data set is used to evaluate the final performance of the model.

3) Data augmentation. To further improve the segmentation accuracy of the model, methods such as random flipping, random cropping and random scaling can be applied to the training data set.
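Step 1 can be sketched as follows. This is a minimal illustration, assuming images and label maps are nested lists of equal size; the 70/15/15 split percentages, the flip probability of 0.5 and both function names are hypothetical, as the source specifies none of them. Random scaling is omitted for brevity. The key point is that every geometric transform must be applied identically to an image and its label map so that the pixel labels stay aligned.

```python
import random


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Shuffle and split items into training, validation and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def random_flip_and_crop(image, label, crop_h, crop_w, rng):
    """Apply the same random horizontal flip and crop to image and label."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, w - crop_w)

    def crop(m):
        return [row[left:left + crop_w] for row in m[top:top + crop_h]]

    return crop(image), crop(label)
```

Only the training set would be augmented in this way; the validation and test sets are left unchanged so that evaluation reflects the original images.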

Step 2: Design and train the network.

5) Design the backbone network. The backbone network consists mainly of modules such as multiple convolutional layers, pooling layers and nonlinear activation layers. So that the network can be initialized from a model pre-trained on ImageNet, the backbone network in this patent is the classic ResNet-101.

6) Design the semantic segmentation network. The network consists of three convolutional neural network paths in parallel, each consisting of three dilated convolutions with different dilation rates connected in series; the specific structure is shown in Fig. 1. The number below each dilated convolution denotes its dilation rate, and this structure embodies the multi-scale concept of the semantic segmentation network of the invention. The invention connects the dilated convolutions within each path using a dense structure and uses cross-connections between paths, making the receptive field denser.

7) Design the decoding network. The output of the semantic segmentation network is upsampled, added to the low-level semantic features from the backbone network, passed through a 3 × 3 convolutional layer and upsampled again to obtain the final segmentation result.

8) Select a suitable loss function, set the number of training iterations, and initialize the parameters.
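The statement that dense connections make the receptive field denser can be illustrated numerically. The sketch below assumes 3 × 3 kernels with stride 1 and uses hypothetical dilation rates for the three paths; the actual rates appear only in Fig. 1 and are not stated in the text. A k × k convolution with dilation rate r covers k + (k - 1)(r - 1) positions per axis, and stacking stride-1 layers adds each layer's coverage minus one to the receptive field. The second function assumes that dense connections let every layer read the path input and all earlier outputs, so that any ordered subset of the layers forms a valid route.

```python
from itertools import combinations


def effective_kernel(kernel_size, dilation):
    """Span covered along one axis by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def path_receptive_field(dilations, kernel_size=3):
    """Receptive field of stride-1 dilated convolutions applied in series."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(kernel_size, d) - 1
    return rf


def dense_receptive_fields(dilations, kernel_size=3):
    """Receptive fields reachable when layers in a path are densely connected."""
    steps = [effective_kernel(kernel_size, d) - 1 for d in dilations]
    fields = set()
    for n in range(1, len(steps) + 1):
        for combo in combinations(steps, n):
            fields.add(1 + sum(combo))
    return sorted(fields)


# Hypothetical dilation rates for the three parallel paths.
PATHS = {"path_a": (1, 2, 3), "path_b": (2, 4, 6), "path_c": (3, 6, 9)}

serial = {name: path_receptive_field(rates) for name, rates in PATHS.items()}
dense = {name: dense_receptive_fields(rates) for name, rates in PATHS.items()}
```

With these assumed rates, a purely serial path yields a single receptive field size, whereas the densely connected path yields several intermediate sizes, which is one way to read "denser" in the description above.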

Step 3: Train the multi-path deep neural network of the invention for semantic segmentation.

The training data are fed into the neural network in batches. The specific steps are as follows:

a) Input the training data at the backbone network and perform forward propagation.

b) Compute the loss function, back-propagate, and update the network weights by gradient descent.

c) Repeat a) and b) until the loss function converges, yielding the trained weights.
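The loop in a) to c) can be sketched with a deliberately tiny stand-in for the network: a per-pixel binary classifier (water or not water) with one scalar feature, one weight and one bias. The cross-entropy loss, the learning rate, the convergence tolerance and all names are assumptions for illustration; the source does not specify the loss function or the stopping criterion.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_and_loss(w, b, xs, ys):
    """Forward pass plus mean binary cross-entropy loss."""
    preds = [sigmoid(w * x + b) for x in xs]
    eps = 1e-12  # guards against log(0)
    loss = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(preds, ys)
    ) / len(xs)
    return preds, loss


def train(xs, ys, lr=0.5, tol=1e-6, max_iters=10000):
    """Repeat forward pass, gradient computation and update until the loss
    changes by less than `tol` between iterations."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    history = []
    for _ in range(max_iters):
        preds, loss = forward_and_loss(w, b, xs, ys)      # step a)
        history.append(loss)
        if abs(prev_loss - loss) < tol:                   # step c)
            break
        # step b): analytic gradients of the loss, then gradient descent
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
        prev_loss = loss
    return w, b, history
```

In the actual method the gradients come from back-propagation through the backbone, the three dilated-convolution paths and the decoder, and the data arrive in batches; the control flow of the loop is the same.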

Step 4: Apply the trained model in testing or practical application.

1) Testing: input the test set images into the network, compare the resulting semantic segmentation with the test set annotations, and compute the mean intersection over union (mIoU) to evaluate the quality of the model.

2) Practical application: input the water surface video frames captured by the camera, or previously stored water surface video, into the network to obtain the semantic segmentation result.
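The mIoU used in the test step can be computed as in the following minimal sketch. It assumes predictions and annotations are 2-D lists of integer class indices and that classes absent from both maps are skipped; the function name is hypothetical. In practice the intersection and union counts are usually accumulated over the whole test set before dividing, rather than averaged per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union between two label maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # the pixel belongs to the union of both classes involved
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Two classes: 0 = water, 1 = obstacle (illustrative labels only).
example_pred = [[0, 0], [1, 1]]
example_target = [[0, 1], [1, 1]]
example_miou = mean_iou(example_pred, example_target, num_classes=2)
```

In this example class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is 7/12.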

To verify the effectiveness of the invention, we compared it with FCN, SegNet and DeepLab, which currently perform well. The test data are the Pascal VOC 2012 data set, which is widely used in semantic segmentation. Table 1 gives the comparison results.

Table 1 Comparison of experimental results

Claims

1. A water surface segmentation method based on a multi-path deep neural network, comprising the following steps:

(1) Collect images containing various types of objects, and label all objects of interest in each image, the annotation being the object category of each pixel, which is used as the image label information;

(2) Image set division: the collected images are divided into a training set, a validation set and a test set; the training set is used to train the convolutional neural network, the validation set is used to select the best trained model, and the test set is used for subsequently testing the model or in practical applications;

(3) Design a multi-path deep neural network structure to achieve effective object detection;

1) Design the backbone network;

2) Design the semantic segmentation network: it is composed of three convolutional neural network paths in parallel, each path is formed by three dilated convolutions with different dilation rates connected in series, and the number under each dilated convolution represents the dilation rate of the dilated convolution; in each dilated convolutional layer, a 1×1 convolutional layer is introduced to reduce the number of channels; a dense structure is used to connect the dilated convolutions in each path, and cross-connections are used between paths, to make the receptive field denser;

3) Design the decoding network: upsample the output of the semantic segmentation network, add it to the low-level semantic features in the backbone network, pass the sum through a 3×3 convolutional layer and upsample again to obtain the final segmentation result;

4) Design the loss function required in the training process, and select an appropriate method to initialize the parameters of the designed network;

(4) Input the data, compute the prediction result and the loss in a forward pass, compute the gradients of the parameters through the back-propagation algorithm and update the parameters; update the parameters iteratively, and when the loss curve converges, the model training is completed;

(5) Apply the trained model to testing/practical applications. When an image is input, the semantic segmentation result of the image can be calculated through the model to assist decision-making in practical application scenarios.