
CN114299303A - Ship target detection method, terminal device and storage medium - Google Patents

Ship target detection method, terminal device and storage medium

Info

Publication number: CN114299303A (application number CN202111485883.6A)
Authority: CN (China)
Prior art keywords: target detection, ship, network, ship target, size
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN114299303B (en)
Inventors: 周海峰, 熊超, 肖钟湧, 王佳, 宋佳声, 罗成汉, 张兴杰, 郑东强, 李寒林, 林忠华, 廖文良, 陈鑫
Current assignee: Jimei University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Jimei University
Application filed by Jimei University
Priority to CN202111485883.6A
Publication of CN114299303A
Application granted; publication of CN114299303B
Current legal status: Expired - Fee Related; anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a ship target detection method, a terminal device and a storage medium. The method comprises the following steps. S1: collecting ship images to form a training set. S2: constructing and training a ship target detection model based on an improved CenterNet network; in the model, a ResNeXt-50 network is first used for downsampling, and a hole (dilated) encoder DE based on dilated residual convolution is added after the feature layer obtained by 32× downsampling; next, an FPN network is used for upsampling to obtain a 4× downsampled feature map; finally, based on the 4× downsampled feature map, regression prediction is performed through a regression prediction network to predict the keypoint heatmap together with two attributes, the corresponding center point offset and the target box size. S3: performing ship target detection on the image to be detected with the trained ship target detection model. The invention improves detection accuracy and reduces both ship misidentification and the probability of missing small target ships.

Description

Ship target detection method, terminal device and storage medium
Technical Field
The present invention relates to the field of target detection, and in particular, to a ship target detection method, a terminal device, and a storage medium.
Background
With the rapid development of water transport, ship traffic has increased and waterway traffic supervision has become more difficult; using visual target detection technology to compensate for the limitations of manual supervision and to improve ship supervision systems is therefore of great significance. Traditional target detection methods, such as background modeling, have difficulty detecting ship targets accurately, while convolutional neural networks have strong feature representation capability, so ship target detection algorithms based on convolutional neural networks have become the main research trend. The current mainstream detectors, the two-stage Faster R-CNN and the single-stage SSD and YOLOv3 algorithms, are anchor-based; they are widely applied in the field of ship target detection and achieve good detection results. Anchor-based detectors enumerate the potential positions of targets in an image with preset anchors and then regress the correct target position from them. However, targets in ship data sets vary widely in shape and size, which makes reasonable anchors difficult to design, and the anchor mechanism also causes problems such as imbalance between positive and negative training samples, high training difficulty and computational redundancy. In recent years, anchor-free target detection algorithms such as CornerNet, CenterNet (Objects as Points) and FCOS have gradually developed and attracted wide attention. Anchor-free detectors have the advantage that no anchor boxes need to be preset; instead, a target is located by detecting a keypoint, namely a corner point or a center point, and shape attributes are regressed to obtain the target bounding box.
Disclosure of Invention
In order to solve the above problems, the present invention provides a ship target detection method, a terminal device, and a storage medium.
The specific scheme is as follows:
a ship target detection method comprises the following steps:
s1: collecting ship images for training a ship target detection model to form a training set;
s2: constructing a ship target detection model based on an improved CenterNet network, and training the model through a training set;
in the ship target detection model, a ResNeXt-50 network is first used for downsampling, and a hole encoder DE based on dilated residual convolution is added after the feature layer obtained by 32× downsampling; next, an FPN network is used for upsampling to obtain a 4× downsampled feature map; finally, based on the 4× downsampled feature map, regression prediction is performed through a regression prediction network to predict the keypoint heatmap together with two attributes, the corresponding center point offset and the target box size;
s3: and carrying out ship target detection on the image to be detected through the trained ship target detection model.
Furthermore, the network structure of the hole encoder comprises two parts, a preprocessing layer and a dilated residual layer; the preprocessing layer first reduces the dimension of the input channels through a 1 × 1 convolution and then refines context information through a 3 × 3 convolution, and the dilated residual layer successively stacks 4 dilated residual blocks with dilation rates of 2, 4, 6 and 8.
Furthermore, when the FPN network is used for upsampling, in the stage from 32× downsampling to 16× downsampling, the 32× downsampled feature map is first upsampled by 2× nearest-neighbor interpolation, the numbers of feature map channels of the 32× and 16× downsampling layers are then adjusted by 1 × 1 convolutions, and after addition, feature fusion is performed by a 3 × 3 convolution.
Further, each of the three branches of the regression prediction network used to obtain the keypoint heatmap, the center point offset and the target box size consists of one 3 × 3 convolution and one 1 × 1 convolution.
Further, the LeakyReLU activation function is used when constructing the hole encoder, the FPN network and the regression prediction network.
Further, the loss function L_det of the ship target detection model is calculated as:

L_det = L_kp + L_offset + γ_size · L_size

where L_kp is the loss function of the keypoint heatmap, L_offset the loss function of the center point offset, L_size the loss function of the target box size, and γ_size a size loss adjustment coefficient used to suppress excessively large target box sizes.
Further, the loss function L_kp of the keypoint heatmap is calculated as:

L_kp = -(1/N) · Σ_xyc { (1 - K̂_xyc)^α · log(K̂_xyc),                 if K_xyc = 1
                        (1 - K_xyc)^β · (K̂_xyc)^α · log(1 - K̂_xyc),  otherwise }

K_xyc = exp( -((x - p̃_x)² + (y - p̃_y)²) / (2σ_p²) )

where α and β are hyper-parameters, N is the number of keypoints in the image, K̂_xyc is the predicted keypoint heatmap, K_xyc is the label heatmap generated by the Gaussian kernel, c is the ship category, p̃_x and p̃_y are the x- and y-axis coordinates of the target center point mapped into the label heatmap, x and y are the x- and y-axis coordinates of a negative sample near the center point, and σ_p is the scale-adaptive variance.
Further, the loss function L_offset of the center point offset and the loss function L_size of the target box size both use the L1 loss.
A ship target detection terminal device comprises a processor, a memory and a computer program stored in the memory and operable on the processor, wherein the processor implements the steps of the method of the embodiment of the invention when executing the computer program.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above for an embodiment of the invention.
By adopting the above technical scheme, the average detection accuracy for various types of ships is improved to different degrees, and both ship misidentification and the probability of missing small target ships are reduced, which is of great significance for ship monitoring applications on water.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Fig. 2 is a schematic diagram of a network structure of the ship target detection model in this embodiment.
Fig. 3 is a schematic diagram of the ResNeXt-50 network module in this embodiment, in which fig. 3(a) shows the ResNeXt basic module and fig. 3(b) shows the simplified ResNeXt basic module.
Fig. 4 is a schematic diagram showing a network structure of the hole encoder in this embodiment.
Fig. 5 is a schematic diagram illustrating a process of fusing multi-scale features of the FPN upsampling network in this embodiment.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
the embodiment of the invention provides a ship target detection method, as shown in fig. 1, the method comprises the following steps:
s1: and collecting ship images for training the ship target detection model to form a training set.
This embodiment mainly uses the public ship target detection data set SeaShips, which covers ships travelling in an inland river environment. It contains 6 ship categories and 7000 ship images in total, split into 1750 training images, 1750 validation images and 3500 test images.
To meet the requirements of ship target detection in real application scenarios, in this embodiment a camera and an unmanned aerial vehicle were used to collect ship images and videos near a certain water area. From these, 1500 ship images covering 4 ship categories were selected, mainly passenger ships and fishing ships together with a small number of general cargo ships and container ships. The images were annotated with LabelImg and converted to the same format as the data in SeaShips; the resulting self-made data set is named Gu-SeaShips and is split into 500 training images, 250 validation images and 250 test images. Adding it to SeaShips increases the diversity of the ship training samples; the expanded data set is named SeaShips++.
S2: and constructing a ship target detection model based on the improved CenterNet network, and training the model through a training set.
The CenterNet algorithm treats a target as a point when building the model, so no anchor boxes need to be preset and detection is reduced to keypoint detection. The algorithm first obtains a high-resolution feature map through a feature extraction network; it then predicts a keypoint heatmap from the feature map and regresses two attributes, the center point offset and the target box size; finally, it applies 3 × 3 max pooling to the keypoint heatmap, selects local peak points whose confidence exceeds a set threshold as target center points, and combines the corresponding center point offsets and target box sizes to generate the prediction results. Since CenterNet screens local keypoints by max pooling, no Non-Maximum Suppression (NMS) post-processing of the prediction boxes is required.
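As an illustration of this decoding step, the peak picking by 3 × 3 max pooling can be sketched in a few lines of NumPy; this is a minimal single-class sketch (the function name and threshold value are illustrative, not taken from the patent):

```python
import numpy as np

def extract_peaks(heatmap, threshold=0.3):
    """Pick local peaks from a keypoint heatmap via 3x3 max pooling.

    A cell is kept as a target center if it equals the maximum of its
    3x3 neighborhood and its confidence exceeds the threshold, which is
    why CenterNet-style detectors need no NMS post-processing.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max pooling with stride 1: maximum over the 9 shifted views.
    pooled = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = (heatmap == pooled) & (heatmap > threshold)
    ys, xs = np.nonzero(keep)
    return list(zip(ys.tolist(), xs.tolist(), heatmap[keep].tolist()))
```

A cell whose neighbor carries a larger score (for example a response next to a true center) is suppressed by the pooling comparison rather than by an explicit NMS pass.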
The CenterNet algorithm offers three different encoder-decoder network structures for feature extraction. The Hourglass-104 and DLA-34 networks are complex to build, and Hourglass-104 has a large number of parameters. To balance detection accuracy and speed, this embodiment improves on the ResDCN network. The ResDCN network performs 32× downsampling of the image with a ResNeXt-50 network, then performs 3 upsampling stages with a combination of deformable convolution (DCN) and deconvolution, and finally obtains a 4× downsampled feature map. The ResDCN network builds the output feature layer only from deep feature information and ignores shallow details; its structure is simple, but its ability to detect small and medium targets is insufficient and its detection accuracy is not high. To improve the detection effect, this embodiment proposes the algorithm framework shown in fig. 2, which combines a hole encoder (DE) and FPN feature fusion. The algorithm first extracts features from the input ship image with a ResNeXt-50 network, which has stronger feature extraction capability; it then uses the hole encoder to expand the receptive field of the 32× downsampled feature map and generate a feature map covering multiple target scales; next it upsamples through the FPN network, fusing the DE-generated 32× downsampled feature map with the original 16×, 8× and 4× downsampled feature maps during upsampling; finally, it regresses and predicts the keypoint heatmap, the corresponding center point offsets and the target box sizes from the 4× downsampled feature map obtained by upsampling, and produces the target prediction results from these 3 pieces of information.
(1) ResNeXt-50 downsampling network
The ResNeXt network is an optimized version of ResNet. It achieves higher recognition accuracy than ResNet in the field of image classification while keeping network complexity low, uses fewer hyper-parameters and is easy to transplant. ResNeXt adopts the residual idea of ResNet and absorbs the Split-Transform-Merge (STM) idea of the Inception network. As shown in fig. 3, the residual branch of ResNet uses a single convolution path, while ResNeXt splits it into multiple convolution paths as shown in fig. 3(a). The number of paths is determined by a variable called the cardinality; every path has the same convolution topology and transforms the input features in parallel, and the convolution outputs of all paths are then merged to form the output of the residual branch. The ResNeXt basic module is calculated as follows:
y = x + Σ_{i=1}^{C} T_i(x)

where x is the input feature, C is the cardinality, i.e. the number of split paths in the basic module, T_i(x) is the convolution transform of the i-th convolution topology branch, and y is the module output formed by adding the merged branch outputs back to the input through the residual connection.
Following the implementation of grouped convolution in the AlexNet network, the simplified ResNeXt basic module shown in fig. 3(b) can be obtained. By setting the grouping cardinality and the number of input/output channels per group, grouped convolution produces the same output as the multi-path convolution in fig. 3(a) while reducing the complexity of the network.
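The equivalence can be checked with simple weight-count arithmetic. The sketch below uses illustrative numbers (128 channels split over a cardinality of 32), which are assumptions for the example rather than figures from the patent:

```python
def conv_weights(cin, cout, k, groups=1):
    """Weight count of a k x k convolution layer (bias ignored):
    each of the `groups` groups maps cin/groups inputs to cout/groups outputs."""
    assert cin % groups == 0 and cout % groups == 0
    return groups * (k * k * (cin // groups) * (cout // groups))

C = 32           # cardinality, i.e. number of parallel paths / groups
d = 128 // C     # channels handled by each path

# C parallel paths, each a 3x3 convolution on d channels:
multi_path = C * conv_weights(d, d, 3)
# one grouped 3x3 convolution over all 128 channels with C groups:
grouped = conv_weights(128, 128, 3, groups=C)
# a single dense 3x3 convolution on 128 channels, for comparison:
dense = conv_weights(128, 128, 3)
```

`multi_path` and `grouped` are identical (4608 weights), while the dense convolution needs 147456, i.e. the grouped form costs 1/C of the dense one.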
Assuming the input ship image size is 512 × 512, the ResNeXt-50 ship detection downsampling network built from the simplified ResNeXt basic module is shown in Table 1, where C denotes the grouping cardinality.
TABLE 1 (the network structure table is rendered as an image in the original document)
(2) Hole encoder
A feature map detects targets well when its receptive field matches the target scale. To detect targets over a wide range of scales, detectors such as SSD and YOLOv3 therefore usually adopt a multi-output design: multiple receptive fields on multiple feature maps cover targets of multiple scales, obtaining a good detection effect through the number of feature maps. The hole encoder instead expands the receptive field of a single feature map by stacking dilated residual blocks so that it covers a wider range of target scales. In each dilated residual block, the residual branch enlarges the receptive field through dilated convolution; this enlarges the covered scale range but shifts it relative to the original one, so adding the original feature map to the enlarged-receptive-field feature map yields a feature map covering a wider overall range.
Structurally, the CenterNet center point detection algorithm with ResDCN as the feature extraction network is a single-input single-output detector, so this embodiment introduces the hole encoder to increase the receptive field of the 32× downsampled feature map of the ResNeXt-50 network and improve the detection accuracy for ship targets at different scales. The structure of the hole encoder is shown in fig. 4. It comprises two parts, a preprocessing layer and a dilated residual layer. The preprocessing layer first reduces the dimension of the input channels with a 1 × 1 convolution and then refines context information with a 3 × 3 convolution; the dilated residual layer successively stacks 4 dilated residual blocks with dilation rates of 2, 4, 6 and 8, and the growing dilation rates keep enlarging the receptive field and thus the scale range covered by the feature map.
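The receptive-field growth of such a stack can be sketched with elementary arithmetic. The sketch below assumes one 3 × 3 dilated convolution per residual block (the block internals are not fully specified here), preceded by the 3 × 3 preprocessing convolution:

```python
def stacked_receptive_fields(dilations, base_rf=1):
    """Receptive-field side length after each stride-1 3x3 convolution
    in a stack: a 3x3 conv with dilation d enlarges the side by 2*d."""
    fields, rf = [], base_rf
    for d in dilations:
        rf += 2 * d
        fields.append(rf)
    return fields

# preprocessing 3x3 conv (dilation 1) followed by blocks with rates 2, 4, 6, 8
rfs = stacked_receptive_fields([1, 2, 4, 6, 8])
```

`rfs` is [3, 7, 15, 27, 43] (in units of cells of the 32× downsampled map); because each residual block adds its input back, the encoder output mixes all of these receptive fields and thus covers multiple target scales.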
(3) FPN feature fusion
The ResDCN network builds its upsampled layers from the 32× downsampled feature layer, a deep feature extracted by the ResNet network, using a combination of deformable convolution and deconvolution. Deformable convolution can adjust its scale and receptive field to the target shape; it is essentially adaptive feature enhancement and extracts better features. Deconvolution produces a higher-resolution feature map, allowing a more accurate target center position to be predicted. However, the resulting feature map carries only deep semantic information and ignores the detail information contained in shallow feature maps, so the detection accuracy is not high.
Therefore, to improve the detection effect, this embodiment upsamples with an FPN network, fusing the semantic information of the deep feature maps extracted by ResNeXt-50 with the detail information of the shallow feature maps to obtain the upsampled feature map. Unlike a conventional FPN, the improved algorithm performs regression prediction only on the final feature map. As shown in fig. 5, taking the stage from 32× to 16× downsampling as an example, the 32× downsampled feature map is first upsampled by 2× nearest-neighbor interpolation; the numbers of feature map channels of the 32× and 16× downsampling layers are then adjusted by 1 × 1 convolutions; after addition, feature fusion is performed by a 3 × 3 convolution. Repeating this step over 3 upsampling stages yields a high-resolution 4× downsampled feature map. Injecting the feature map generated by the receptive-field expansion of the hole encoder (DE) into the FPN further enriches the semantic information obtained by FPN feature fusion.
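The interpolation-and-add core of one fusion stage can be sketched in NumPy as follows; the 1 × 1 channel-matching and 3 × 3 smoothing convolutions are deliberately omitted, so both maps are assumed to already share a channel count:

```python
import numpy as np

def fuse_step(deep, shallow):
    """One FPN fusion stage without the convolutions: 2x nearest-neighbor
    upsample the deeper (half-resolution) map, then add element-wise.
    Shapes: deep (c, h, w), shallow (c, 2h, 2w)."""
    up = deep.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbor 2x
    assert up.shape == shallow.shape
    return up + shallow
```

Nearest-neighbor interpolation simply repeats each cell of the deep map into a 2 × 2 block, after which the shallow map's detail information is added cell by cell.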
(4) Based on the 4× downsampled feature map, regression prediction is performed through a regression prediction network, predicting the keypoint heatmap (K branch) together with two attributes, the corresponding center point offset (O branch) and the target box size (S branch). In the regression prediction network, each branch consists of one 3 × 3 convolution and one 1 × 1 convolution.
(5) The LeakyReLU activation function is used when constructing the hole encoder, the FPN network and the regression prediction network.
The loss function L_det of the ship target detection model is calculated as:

L_det = L_kp + L_offset + γ_size · L_size

where L_kp is the loss function of the keypoint heatmap, L_offset the loss function of the center point offset, L_size the loss function of the target box size, and γ_size a size loss adjustment coefficient used to suppress excessively large target box sizes; γ_size is set to 0.1 in this embodiment.
For the keypoint heatmap prediction branch (K branch), suppose the input ship image is I ∈ R^(W×H×3); the predicted keypoint heatmap is

K̂ ∈ [0, 1]^((W/r) × (H/r) × C)

where W and H are the width and height of the input ship image (typically 512 × 512 or 384 × 384), r is the downsampling step size (typically 4), and C is the number of ship categories, set to 6 in this embodiment. K̂_xyc = 1 denotes that the prediction at that location is a ship target, and K̂_xyc = 0 denotes background.
Let p be the true center point of a target in image I, and p̃ the result of rounding p/r down. Each center point is mapped onto the keypoint heatmap with a Gaussian kernel, so that in the generated label heatmap K_xyc the center point of each positive sample has the value 1 and nearby negative samples take Gaussian-distributed values. The label heatmap generated by the Gaussian kernel is calculated as:

K_xyc = exp( -((x - p̃_x)² + (y - p̃_y)²) / (2σ_p²) )

where σ_p is the scale-adaptive variance, p̃_x and p̃_y are the x- and y-axis coordinates of the target center point mapped into the label heatmap, and x and y are the x- and y-axis coordinates of a negative sample near the center point.
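A minimal NumPy sketch of this label-generation step (single target, single class; the function name and values are illustrative):

```python
import numpy as np

def gaussian_label_heatmap(shape, center, sigma):
    """Label heatmap for one target:
    K_xy = exp(-((x - px)^2 + (y - py)^2) / (2 * sigma^2)),
    so the integer center cell is 1 and nearby negative samples decay
    with a Gaussian profile; sigma is the scale-adaptive deviation."""
    h, w = shape
    py, px = center                      # integer center, i.e. floor(p / r)
    ys, xs = np.ogrid[:h, :w]
    return np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
```

With several targets of the same class, the per-target heatmaps would be merged by an element-wise maximum so that each positive center keeps the value 1.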
The keypoint heatmap loss L_kp is trained with an improved focal loss:

L_kp = -(1/N) · Σ_xyc { (1 - K̂_xyc)^α · log(K̂_xyc),                 if K_xyc = 1
                        (1 - K_xyc)^β · (K̂_xyc)^α · log(1 - K̂_xyc),  otherwise }

where α and β are the hyper-parameters of the focal loss, set to 2 and 4 respectively, and N is the number of keypoints in image I. Compared with the original focal loss, the improved focal loss multiplies the negative-sample loss by the extra factor (1 - K_xyc)^β, which suppresses the loss of negative samples near the Gaussian center points of the heatmap.
For the center point offset branch (O branch), the predicted center point offset is

Ô ∈ R^((W/r) × (H/r) × 2)

The actual discretization error caused by image downsampling is o_p = p/r - p̃. The center point offset is trained with the L1 loss:

L_offset = (1/N) · Σ_p | Ô_p̃ - o_p |
at target box size S branch, the predicted generated target box width height is
Figure BDA0003396518490000112
Suppose that
Figure BDA0003396518490000113
For the real target frame of the ship target k, the position of the target center point can be known
Figure BDA0003396518490000114
The method uses the key point estimation to generate a central point and regresses the target frame width height of an object k as
Figure BDA0003396518490000115
Target box size training was performed using L1 loss:
Figure BDA0003396518490000116
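Both regressions reduce to the same sparse L1 computation, sketched below with illustrative arrays (predictions are gathered only at the N annotated center cells, so the rest of the map is never penalized):

```python
import numpy as np

def branch_l1_loss(pred_at_centers, targets):
    """L1 loss shared by the O and S branches: sum of absolute errors at
    the annotated target centers, divided by the number of targets N.
    pred_at_centers, targets: arrays of shape (N, 2)."""
    pred_at_centers = np.asarray(pred_at_centers, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = max(len(targets), 1)
    return np.abs(pred_at_centers - targets).sum() / n
```

For the O branch the rows are (x, y) offsets o_p; for the S branch they are (width, height) pairs s_k.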
s3: and carrying out ship target detection on the image to be detected through the trained ship target detection model.
In the embodiment of the invention, a ResNeXt-50 network based on grouped convolution is used for downsampling to extract more effective ship image features. A hole encoder based on a dilated residual network is then introduced to increase the receptive field of the output feature map, generating a feature map that covers multiple target scales and adapts to ship target detection at different scales. Finally, a feature pyramid network performs upsampling by feature fusion, fusing the deep feature map enhanced by the hole encoder with the shallow feature maps to extract a prediction feature map rich in ship feature information.
Example two:
the invention further provides a ship target detection terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method embodiment of the first embodiment of the invention.
Further, as a feasible implementation, the ship target detection terminal device may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The ship target detection terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above constituent structure is only an example of the ship target detection terminal device and does not constitute a limitation on it; the device may include more or fewer components, combine some components, or use different components, for example it may further include input/output devices, network access devices, a bus and the like, which is not limited in the embodiments of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor is a control center of the ship target detection terminal device and connects various parts of the whole ship target detection terminal device by using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the ship target detection terminal device by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application program required by at least one function, while the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash card, at least one magnetic disk storage device, a flash memory device, or other solid-state storage device.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method of an embodiment of the invention.
The ship target detection terminal device integrated module/unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1.一种船舶目标检测方法,其特征在于,包括以下步骤:1. a ship target detection method, is characterized in that, comprises the following steps: S1:采集用于训练船舶目标检测模型的船舶图像组成训练集;S1: Collect ship images for training the ship target detection model to form a training set; S2:构建基于改进CenterNet网络的船舶目标检测模型,并通过训练集对模型进行训练;S2: Build a ship target detection model based on the improved CenterNet network, and train the model through the training set; 船舶目标检测模型中首先采用ResNeXt-50网络进行下采样,并在32倍下采样得到的特征层后添加一个基于空洞残差卷积的空洞编码器DE;其次采用FPN网络进行上采样后得到4倍下采样的特征图;最后基于4倍下采样的特征图,通过回归预测网络进行回归预测,预测得到关键点热力图和对应的中心点偏移量及目标框尺寸两个属性;In the ship target detection model, the ResNeXt-50 network is first used for downsampling, and a hole encoder DE based on hole residual convolution is added after the feature layer obtained by 32 times downsampling; secondly, the FPN network is used for upsampling to obtain 4 The feature map of double downsampling; finally, based on the feature map of 4 times downsampling, the regression prediction is carried out through the regression prediction network, and the key point heat map and the corresponding center point offset and the target frame size are predicted to obtain two attributes; S3:通过训练后的船舶目标检测模型对待检测图像进行船舶目标检测。S3: Perform ship target detection on the image to be detected through the trained ship target detection model. 2.根据权利要求1所述的船舶目标检测方法,其特征在于:空洞编码器的网络结构包含两个部分,分别是预处理层和空洞残差层,预处理层首先通过1×1卷积对输入通道进行降维,然后通过3×3卷积细化上下文信息,空洞残差层连续堆叠了4个扩张率依次为2、4、6、8的空洞残差块。2. The ship target detection method according to claim 1, characterized in that: the network structure of the hole encoder includes two parts, which are a preprocessing layer and a hole residual layer, and the preprocessing layer is first convoluted by 1×1. The input channel is dimensionally reduced, and then the context information is refined by 3×3 convolution, and the atrous residual layer is successively stacked with 4 atrous residual blocks with expansion rates of 2, 4, 6, and 8 in turn. 
3. The ship target detection method according to claim 1, wherein, when the FPN network is used for upsampling, in the step from 32× downsampling to 16× downsampling, the 32× downsampled feature map is first upsampled by 2× nearest-neighbor interpolation, the numbers of feature-map channels of the 32× and 16× downsampling layers are then adjusted by 1×1 convolutions, and after element-wise addition the features are fused by a 3×3 convolution.

4. The ship target detection method according to claim 1, wherein each of the three branches of the regression prediction network used to obtain the key-point heatmap, the center-point offset and the target box size consists of one 3×3 convolution followed by one 1×1 convolution.

5. The ship target detection method according to claim 1, wherein the LeakyReLU activation function is used in constructing the dilated encoder, the FPN network and the regression prediction network.

6. The ship target detection method according to claim 1, wherein the loss function L_det of the ship target detection model is computed as:

L_det = L_kp + L_offset + γ_size · L_size

where L_kp is the loss function of the key-point heatmap, L_offset is the loss function of the center-point offset, L_size is the loss function of the target box size, and γ_size is a size-loss adjustment coefficient used to suppress overly large target-box-size losses.

7. The ship target detection method according to claim 6, wherein the loss function L_kp of the key-point heatmap is computed as:
$$
L_{kp} = -\frac{1}{N}\sum_{xyc}
\begin{cases}
(1-\hat{K}_{xyc})^{\alpha}\,\log(\hat{K}_{xyc}), & K_{xyc}=1 \\
(1-K_{xyc})^{\beta}\,(\hat{K}_{xyc})^{\alpha}\,\log(1-\hat{K}_{xyc}), & \text{otherwise}
\end{cases}
$$

$$
K_{xyc} = \exp\!\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\delta_p^2}\right)
$$

where α and β are hyperparameters, N is the number of key points in the image, K_xyc is the Gaussian kernel, i.e. the label heatmap value, K̂_xyc is the predicted heatmap value, c is the ship class, p̃_x and p̃_y are the x-axis and y-axis coordinates of the target center point mapped into the label heatmap, x and y are the x-axis and y-axis coordinates of negative samples near the center point, and δ_p is the scale-adaptive variance.
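A pure-Python numeric sketch of the claim-7 key-point loss, assuming the CenterNet-style penalty-reduced focal loss with Gaussian-kernel labels. The 3×3 heatmap, δ_p = 1, and the hyperparameter values α = 2, β = 4 are illustrative assumptions, not values stated in the claims.

```python
import math

def gaussian_label(x, y, px, py, delta_p):
    """Claim-7 Gaussian kernel: label heatmap value around center (px, py)."""
    return math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * delta_p ** 2))

def keypoint_loss(pred, label, alpha=2.0, beta=4.0):
    """Penalty-reduced focal loss summed over all heatmap positions (claim 7)."""
    n = sum(1 for row in label for v in row if v == 1.0)  # number of key points N
    total = 0.0
    for prow, lrow in zip(pred, label):
        for p, l in zip(prow, lrow):
            if l == 1.0:  # positive (center) location
                total += (1 - p) ** alpha * math.log(p)
            else:         # negative location, penalty reduced by (1-l)^beta
                total += (1 - l) ** beta * p ** alpha * math.log(1 - p)
    return -total / n

# 3x3 label heatmap with the single center at (1, 1), delta_p = 1
label = [[gaussian_label(x, y, 1, 1, 1.0) for x in range(3)] for y in range(3)]
pred = [[0.1, 0.2, 0.1], [0.2, 0.8, 0.2], [0.1, 0.2, 0.1]]
print(round(keypoint_loss(pred, label), 4))
```

A sharper prediction (higher score at the center, lower elsewhere) yields a smaller loss, which is the behavior the penalty-reduced focal loss is designed to reward.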
8. The ship target detection method according to claim 6, wherein the loss function L_offset of the center-point offset and the loss function L_size of the target box size are L1 loss functions.

9. A ship target detection terminal device, characterized by comprising a processor, a memory, and a computer program stored in the memory and run on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
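The first step of claim 3's FPN fusion, 2× nearest-neighbor interpolation, can be sketched in pure Python. The channel-adjusting 1×1 convolutions and the fusing 3×3 convolution of claim 3 are omitted here, and the toy 2×2 single-channel map is an illustrative assumption.

```python
def upsample_nearest_2x(fmap):
    """2x nearest-neighbor upsampling of a 2-D feature map (claim 3, first step).
    Each value is replicated into a 2x2 block, doubling height and width."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

# A 2x2 map from the 32x-downsampled level becomes 4x4, matching the 16x level;
# per claim 3, 1x1 convs then align channels and a 3x3 conv fuses the sum.
small = [[1, 2],
         [3, 4]]
print(upsample_nearest_2x(small))
# -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```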
CN202111485883.6A 2021-12-07 2021-12-07 Ship target detection method, terminal device and storage medium Expired - Fee Related CN114299303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485883.6A CN114299303B (en) 2021-12-07 2021-12-07 Ship target detection method, terminal device and storage medium


Publications (2)

Publication Number Publication Date
CN114299303A true CN114299303A (en) 2022-04-08
CN114299303B CN114299303B (en) 2024-06-14

Family

ID=80966085



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596030A (en) * 2018-03-20 2018-09-28 杭州电子科技大学 Sonar target detection method based on Faster R-CNN
CN113096159A (en) * 2021-06-04 2021-07-09 城云科技(中国)有限公司 Target detection and track tracking method, model and electronic equipment thereof
US20210264557A1 (en) * 2020-02-26 2021-08-26 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for real-time, simultaneous object detection and semantic segmentation


Non-Patent Citations (1)

Title
SHEN Fengcan; ZHANG Ping; LUO Jin; LIU Songyang; FENG Shijie: "A survey of scale-transformation applications in object detection" (目标检测中的尺度变换应用综述), Journal of Image and Graphics (中国图象图形学报), no. 09, 16 September 2020 (2020-09-16) *

Cited By (18)

Publication number Priority date Publication date Assignee Title
CN114742950A (en) * 2022-04-19 2022-07-12 上海海事大学 Method, device, storage medium and electronic device for 3D digital reconstruction of ship shape
CN114742950B (en) * 2022-04-19 2024-02-02 上海海事大学 Ship shape 3D digital reconstruction method and device, storage medium and electronic equipment
CN114913414A (en) * 2022-04-21 2022-08-16 南京大学 Visual target detection method and device based on mixed convolution residual error structure
CN114882436A (en) * 2022-05-20 2022-08-09 深圳市慧鲤科技有限公司 Target detection method and device, electronic equipment and storage medium
CN117197033A (en) * 2022-06-07 2023-12-08 安讯士有限公司 Detection of reflections of objects in a sequence of image frames
CN115457487A (en) * 2022-08-24 2022-12-09 智慧互通科技股份有限公司 Target detection method and system based on two-dimensional image
CN115830638A (en) * 2022-12-14 2023-03-21 中国电信股份有限公司 Small-sized human head detection method and related equipment based on attention mechanism
CN116091823A (en) * 2022-12-28 2023-05-09 湖南中医药大学 Single-feature anchor-frame-free target detection method based on fast grouping residual error module
CN116311093A (en) * 2023-02-07 2023-06-23 哈尔滨工业大学(深圳) A key point-based method and system for ship target detection on the sea surface without an anchor frame
CN116311093B (en) * 2023-02-07 2025-09-02 哈尔滨工业大学(深圳) A method and system for detecting sea surface ship targets without anchor frame based on key points
CN116343138A (en) * 2023-03-03 2023-06-27 西安电子科技大学 Traffic monitoring video vehicle detection method, system and equipment based on improved CenterNet
CN116363214A (en) * 2023-03-20 2023-06-30 嘉洋智慧安全科技(北京)股份有限公司 Image processing method, device, equipment, medium and product
CN116052110A (en) * 2023-03-28 2023-05-02 四川公路桥梁建设集团有限公司 Intelligent positioning method and system for pavement marking defects
CN116206099B (en) * 2023-05-06 2023-08-15 四川轻化工大学 A ship position detection method and storage medium based on SAR images
CN116758411A (en) * 2023-05-06 2023-09-15 哈尔滨理工大学 Ship small target detection method based on remote sensing image pixel-by-pixel processing
CN116206099A (en) * 2023-05-06 2023-06-02 四川轻化工大学 A ship position detection method and storage medium based on SAR images
CN117132767A (en) * 2023-10-23 2023-11-28 中国铁塔股份有限公司湖北省分公司 Small target detection method, device, equipment and readable storage medium
CN117132767B (en) * 2023-10-23 2024-03-19 中国铁塔股份有限公司湖北省分公司 Small target detection method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN114299303B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN114299303A (en) Ship target detection method, terminal device and storage medium
Zeng et al. A novel tensor decomposition-based efficient detector for low-altitude aerial objects with knowledge distillation scheme
CN112232232B (en) Target detection method
CN115690542B (en) A method for aerial insulator orientation identification based on improved yolov5
CN110796048B (en) Ship target real-time detection method based on deep neural network
CN110781350B (en) A pedestrian retrieval method and system for a full-screen monitoring scene
CN110796037A (en) Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN114677596A (en) Remote sensing image ship detection method and device based on attention model
US11615612B2 (en) Systems and methods for image feature extraction
CN111814753A (en) Target detection method and device under foggy weather condition
CN116486288A (en) Aerial Target Counting and Detection Method Based on Lightweight Density Estimation Network
CN115035295A (en) A Semantic Segmentation Method of Remote Sensing Image Based on Shared Convolution Kernel and Boundary Loss Function
CN111062321B (en) SAR detection method and system based on deep convolutional network
CN111523463B (en) Target tracking method and training method based on matching-regression network
KR20240144139A (en) Facial pose estimation method, apparatus, electronic device and storage medium
CN116524379A (en) Aerial photographing target detection method based on attention mechanism and self-adaptive feature fusion
CN112183649A (en) An Algorithm for Predicting Pyramid Feature Maps
CN111667030A (en) Method, system and storage medium for realizing remote sensing image target detection based on deep neural network
CN110310305B (en) A target tracking method and device based on BSSD detection and Kalman filtering
CN111738114A (en) Vehicle target detection method based on accurate sampling of remote sensing images without anchor points
CN111860823A (en) Neural network training method, neural network training device, neural network image processing method, neural network image processing device, neural network image processing equipment and storage medium
CN111814754A (en) Single-frame image pedestrian detection method and device for night scene
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN117542075A (en) Small sample image classification method and device based on attention mechanism
CN116486203B (en) Single-target tracking method based on twin network and online template updating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20240614