Disclosure of Invention
The invention aims to provide a small-sample underwater target recognition method based on multi-platform auditory perception feature deep transfer learning, so as to solve the problem described in the background art that intelligent models driven by small-sample sonar platform target data generalize poorly.
A small-sample underwater target recognition method based on multi-platform auditory perception feature deep transfer learning comprises the following steps: constructing a multi-deep learning joint decision model oriented to multi-dimensional underwater target auditory perception features; generating a multi-platform sample set from multi-platform data; carrying out multi-level transfer learning training on the multi-deep learning joint decision model based on the multi-platform sample set; and preprocessing unknown target radiation noise data, performing target recognition on the preprocessing result with the trained multi-deep learning joint decision model, and outputting the recognition result.
The construction method of the multi-deep learning joint decision model comprises the steps of constructing an MFCC feature extraction network, a GFCC feature extraction network and a CFCC feature extraction network; placing the 3 feature extraction networks in parallel; sequentially adding a fusion recognition network and a Softmax classifier at the top, wherein the fusion recognition network is a two-layer perceptron network; initializing the parameters of the multi-deep learning joint decision model; constructing a loss function J of the multi-deep learning joint decision model; and setting the training parameters used during training.
Preferably, the construction method of the MFCC feature extraction network, the GFCC feature extraction network or the CFCC feature extraction network comprises the steps of adding a data input layer; adding a first convolution layer with parameters (48×1, 144, 4); adding an LRN layer; adding a second convolution layer with parameters (16×1, 192, 2); adding a third convolution layer with parameters (7×1, 256, 1); adding an LRN layer; adding a first pooling layer with parameters (3×1, 2); adding three basic ResNet-Inception modules; adding a fourth convolution layer with parameters (3×1, 384, 1); adding a fifth convolution layer with parameters (3×1, 512, 1); adding a second pooling layer with parameters (3×1, 2); and adding a fully connected layer with 256 output nodes.
Preferably, in the basic ResNet-Inception module, 3 parallel branches are added after a data input layer: branch 1 is a direct (identity) branch; branch 2 comprises 2 convolution layers with parameters (1×1, 128, 1) and (3×3, 128, 1) in sequence; branch 3 comprises 2 convolution layers with parameters (1×3, 128, 1) and (3×1, 128, 1) in sequence. A feature dimension expansion operation is added after branches 2 and 3, integrating the output features of the two branches into an integrated feature set with 256 features in total. A direct addition of the branch 1 output features and the integrated feature set is then added to obtain the top-level features of the basic ResNet-Inception module, and a ReLU activation function is added to output the final convolution features.
Preferably, initializing the parameters of the multi-deep learning joint decision model comprises generating 48 channels of auditory perception filter coefficients over a set frequency band based on the MFCC, GFCC and CFCC auditory perception filter construction methods, each auditory perception filter being a 48-dimensional vector; integrating all auditory perception filters to form a 48×144 matrix; and using this matrix to initialize the parameters of the first convolution layers of the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network respectively.
Preferably, generating the multi-platform sample set based on the multi-platform data comprises taking a speaker voice data set, a simulated sonar target data set, other sonar platform target data sets and the target data set of the small-sample sonar platform to be applied, setting the upper and lower limit frequencies of data processing as required, generating the corresponding auditory perception feature samples with the MFCC, GFCC and CFCC auditory perception filters, and thereby constructing the multi-platform sample set.
Preferably, the multi-level transfer learning training of the multi-deep learning joint decision model based on the multi-platform sample set comprises the following steps: initialization training of the multi-deep learning joint decision model, primary transfer learning training of the multi-deep learning joint decision model, and advanced transfer learning training of the multi-deep learning joint decision model.
Preferably, the initialization training of the multi-deep learning joint decision model comprises a cyclic optimization step: randomly selecting a fixed number of samples from the speaker voice sample set and optimizing the loss function J for several rounds based on a gradient descent method, then randomly selecting a fixed number of samples from the simulated sonar target sample set and optimizing the loss function J for several rounds based on the gradient descent method, and repeating this cyclic optimization until a preset number of cycles is reached and the loss function converges to an expected threshold.
Preferably, the primary transfer learning training of the multi-deep learning joint decision model comprises retaining the structure parameters of the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network obtained in the initialization training, randomly initializing the structure parameters of the fusion recognition network, randomly selecting a fixed number of samples from the other sonar target sample sets, and optimizing the loss function J for several rounds based on the gradient descent method until the loss function converges to the expected threshold.
Preferably, the advanced transfer learning training of the multi-deep learning joint decision model comprises the following steps. A first local network structure of the multi-deep learning joint decision model is extracted with its network structure parameters preserved, giving a feature extraction model S_E = {s_M, s_G, s_C}, where the first local network structure consists of, for each of the MFCC, GFCC and CFCC feature extraction networks, the corresponding 3 basic ResNet-Inception modules and all preceding network structure, denoted s_M, s_G and s_C respectively. A second local network structure of the multi-deep learning joint decision model is extracted and reconstructed into a feature recognition model S_R, for which a loss function J_m is constructed; the second local network structure consists of the layers of the MFCC, GFCC and CFCC feature extraction networks from the fourth convolution layer upwards, together with the fusion recognition network and the classifier. Feature extraction is performed on all samples in the target sample set of the small-sample sonar platform to be applied through S_E to obtain a feature sample set x_E = {x_E,1, x_E,2, …, x_E,n}, (x_E,n = (x_M,n, x_G,n, x_C,n), n ∈ N*), where each feature sample x_E,n comprises a group of three features x_M,n, x_G,n and x_C,n corresponding to the feature extraction results of s_M, s_G and s_C respectively. The structure parameters of the model S_R are randomly initialized, a fixed number of feature samples are randomly selected from x_E, and the loss function J_m is optimized for several rounds based on the gradient descent method until it converges to the expected threshold. Finally, S_E and S_R are re-integrated to obtain the trained multi-deep learning joint decision model S_D.
The invention discloses a small-sample underwater target recognition method based on multi-platform auditory perception feature deep transfer learning. A multi-deep learning joint decision model consisting of an MFCC feature extraction network, a GFCC feature extraction network, a CFCC feature extraction network and a fusion recognition network is first established. Combining a deep learning method with multi-source multi-platform data, standardized Mel-frequency cepstral coefficient (MFCC), gammatone frequency cepstral coefficient (GFCC) and cochlear frequency cepstral coefficient (CFCC) samples are generated according to the characteristics of underwater acoustic target noise data. Multi-level transfer learning training of the multi-deep learning joint decision model is then carried out based on the multi-platform sample set, making full use of the available data to optimize the model structure parameters. Finally, underwater acoustic target data are recognized with the optimized multi-deep learning joint decision model.
Compared with the prior art, the invention has the beneficial effects that:
The method comprehensively exploits auditory perception, deep networks and transfer learning. On the one hand, samples are constructed with auditory perception filters and the deep networks are initialized from the filter coefficients, so that the network structure parameters are better adapted to the samples; at the same time, by establishing a multi-deep learning joint decision model driven by several kinds of auditory perception feature samples, the separability information of target features in different feature dimensions can be fully exploited. On the other hand, a multi-platform data transfer learning scheme is established: for the small-sample data condition of the platform to which the method is applied, multi-platform data are fully utilized to optimize the deep learning model, which is expected to improve target recognition robustness.
Compared with traditional target recognition methods based on physical features, the method mines and exploits features more deeply and comprehensively; compared with intelligent recognition models driven only by the data of the application platform, its data utilization range is wider and its training process is less prone to overfitting, so it can improve sonar target recognition capability under small-sample conditions.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present invention, are intended to fall within the scope of the present invention.
Fig. 1 shows the general scheme of the invention. Referring to fig. 1, the small-sample underwater target recognition method based on multi-platform auditory perception feature deep transfer learning comprises the following 4 steps.
Step 1, constructing a multi-deep learning joint decision model oriented to multi-dimensional underwater target auditory perception features;
Step 2, generating a multi-platform sample set based on the multi-platform data;
Step 3, carrying out multi-level transfer learning training on the multi-deep learning joint decision model based on the multi-platform sample set;
Step 4, preprocessing unknown underwater target radiation noise data, performing target recognition on the preprocessing result with the trained multi-deep learning joint decision model, and outputting the recognition result.
Referring to the stage one part of fig. 2, the specific flow of constructing the multi-deep learning joint decision model oriented to multi-dimensional underwater target auditory perception features in step 1 is as follows.
Step 1.1, a basic ResNet-Inception module is built by adding 3 parallel branches after the data input layer, as shown in fig. 4. Branch 1 is a direct (identity) branch without any added operations. Branch 2 comprises 2 convolution layers; the parameters of convolution layer 1 are (1×1, 128, 1), i.e. the convolution kernel size is 1×1, the number of convolution kernels is 128 and the convolution stride is 1 (the same notation applies below), and the parameters of convolution layer 2 are (3×3, 128, 1). Branch 3 comprises 2 convolution layers, whose parameters are (1×3, 128, 1) and (3×1, 128, 1) in sequence. A feature dimension expansion operation is added after branches 2 and 3, integrating the output features of the two branches into an integrated feature set with 256 features in total. A direct addition of the branch 1 output features and the integrated feature set is added to obtain the top-level features of the basic module, a ReLU activation function is further added, and the final convolution features are output.
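For illustration, the following is a minimal PyTorch sketch of the basic ResNet-Inception module described in step 1.1. The class name, the assumption of a 256-channel input (required so the residual addition matches the 128+128-channel integrated feature set) and the "same"-style padding that keeps the feature map size are ours, not part of the claimed method.

```python
import torch
import torch.nn as nn

class BasicResNetInception(nn.Module):
    """Sketch of the basic ResNet-Inception module of step 1.1 (assumes a
    256-channel input so the residual addition matches the 256 integrated
    features produced by branches 2 and 3)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Branch 2: (1x1, 128, 1) then (3x3, 128, 1); padding keeps the map size.
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=1),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
        )
        # Branch 3: (1x3, 128, 1) then (3x1, 128, 1).
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=(1, 3), padding=(0, 1)),
            nn.Conv2d(128, 128, kernel_size=(3, 1), padding=(1, 0)),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Feature dimension expansion: concatenate branch 2 and 3 outputs (256 total).
        integrated = torch.cat([self.branch2(x), self.branch3(x)], dim=1)
        # Branch 1 is the identity shortcut; add it to the integrated feature set.
        return self.relu(x + integrated)
```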
Step 1.2, the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network are constructed respectively; the construction methods of the 3 feature extraction networks are the same. Referring to fig. 5, the MFCC feature extraction network is constructed as follows: add a data input layer whose number of nodes is consistent with the length of the MFCC sequence; add a first convolution layer with parameters (48×1, 144, 4), i.e. the convolution kernel size is 48×1, the number of convolution kernels is 144 and the convolution stride is 4 (the same notation applies below); add an LRN layer; add a second convolution layer with parameters (16×1, 192, 2); add a third convolution layer with parameters (7×1, 256, 1); add an LRN layer; add a first pooling layer with parameters (3×1, 2), i.e. the pooling kernel size is 3×1 and the stride is 2; add 3 basic ResNet-Inception modules, namely basic ResNet-Inception modules 1, 2 and 3; add a fourth convolution layer with parameters (3×1, 384, 1); add a fifth convolution layer with parameters (3×1, 512, 1); add a second pooling layer with parameters (3×1, 2); and add a fully connected layer with 256 nodes.
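Continuing the sketch, one feature extraction network can be assembled as below. The 2-D input layout (a single-channel map with the sequence along the first spatial axis) and the use of a lazily initialized fully connected layer are our assumptions, since the patent fixes only the layer parameters, not the input size.

```python
class FeatureExtractionNet(nn.Module):
    """Sketch of one feature extraction network (MFCC, GFCC or CFCC), step 1.2."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 144, kernel_size=(48, 1), stride=(4, 1)),    # (48x1, 144, 4)
            nn.LocalResponseNorm(size=5),                             # LRN layer
            nn.Conv2d(144, 192, kernel_size=(16, 1), stride=(2, 1)),  # (16x1, 192, 2)
            nn.Conv2d(192, 256, kernel_size=(7, 1)),                  # (7x1, 256, 1)
            nn.LocalResponseNorm(size=5),
            nn.MaxPool2d(kernel_size=(3, 1), stride=(2, 1)),          # (3x1, 2)
        )
        self.blocks = nn.Sequential(  # basic ResNet-Inception modules 1, 2, 3
            BasicResNetInception(), BasicResNetInception(), BasicResNetInception())
        self.head = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=(3, 1)),                  # (3x1, 384, 1)
            nn.Conv2d(384, 512, kernel_size=(3, 1)),                  # (3x1, 512, 1)
            nn.MaxPool2d(kernel_size=(3, 1), stride=(2, 1)),          # (3x1, 2)
            nn.Flatten(),
            nn.LazyLinear(256),   # fully connected layer, 256 output nodes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(x)))
```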
Step 1.3, a fusion recognition network is constructed by adding a two-layer perceptron network on top of the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network; the numbers of nodes of the two perceptron layers are 512 and 128 respectively.
Step 1.4, a Softmax classifier is added on top of the two-layer perceptron network.
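Steps 1.2 to 1.4 can then be tied together as sketched below. Concatenating the three 256-dimensional feature vectors before the fusion network, and the number of target classes, are assumptions on our part; the Softmax itself is folded into the cross-entropy loss of step 1.6, as is usual.

```python
class JointDecisionModel(nn.Module):
    """Sketch of the multi-deep learning joint decision model (steps 1.2-1.4):
    three parallel feature extraction networks, a two-layer perceptron fusion
    recognition network (512 and 128 nodes) and a Softmax classifier."""
    def __init__(self, n_classes: int = 3):   # n_classes is illustrative
        super().__init__()
        self.mfcc_net = FeatureExtractionNet()
        self.gfcc_net = FeatureExtractionNet()
        self.cfcc_net = FeatureExtractionNet()
        self.fusion = nn.Sequential(           # two-layer perceptron network
            nn.Linear(3 * 256, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU())
        self.classifier = nn.Linear(128, n_classes)  # logits; Softmax in the loss

    def forward(self, x_m, x_g, x_c):
        fused = torch.cat([self.mfcc_net(x_m), self.gfcc_net(x_g),
                           self.cfcc_net(x_c)], dim=1)
        return self.classifier(self.fusion(fused))
```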
The construction of the overall framework of the multi-deep learning joint decision model is completed through steps 1.1 to 1.4.
Step 1.5, based on the MFCC, GFCC and CFCC auditory perception filter construction methods, 48 channels of auditory perception filter coefficients are generated over a set frequency band; each auditory perception filter is a 48-dimensional vector, and all auditory perception filters are integrated to form a 48×144 matrix, which is used to initialize the parameters of the first convolution layers of the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network respectively. The other network layers and the training method of the multi-deep learning joint decision model are set according to general convolutional neural network parameter initialization and training methods. "Parameter initialization of the first convolution layers of the MFCC, GFCC and CFCC feature extraction networks" here means initializing the weights of the convolution kernels.
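A sketch of this initialization follows. The true Mel, gammatone and cochlear filter shapes are not reproduced here; a generic triangular filterbank stands in for them, and the expansion of the 48 filters into the 144 first-layer kernels by tiling is only our assumption about how the 48×144 matrix is formed.

```python
import numpy as np

def triangular_filterbank(n_filters: int = 48, n_points: int = 48,
                          f_low: float = 0.0, f_high: float = 1.0) -> np.ndarray:
    """Illustrative stand-in for an auditory perception filterbank: n_filters
    triangular filters, each an n_points-dimensional vector (step 1.5)."""
    centres = np.linspace(f_low, f_high, n_filters + 2)
    grid = np.linspace(f_low, f_high, n_points)
    bank = np.zeros((n_points, n_filters))
    for k in range(n_filters):
        lo, c, hi = centres[k], centres[k + 1], centres[k + 2]
        bank[:, k] = np.clip(np.minimum((grid - lo) / (c - lo),
                                        (hi - grid) / (hi - c)), 0.0, None)
    return bank

# Assumed integration: tile the 48 filters to fill the 144 kernels of the first
# convolution layer (48x144 matrix), then copy each column into one 48x1 kernel.
bank = np.tile(triangular_filterbank(), (1, 3))                # 48 x 144
net = FeatureExtractionNet()
with torch.no_grad():
    weights = torch.tensor(bank.T, dtype=torch.float32).reshape(144, 1, 48, 1)
    net.stem[0].weight.copy_(weights)
```

Each of the three networks would be initialized in the same way with its own (Mel, gammatone or cochlear) filterbank.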
In step 1.5 of the invention, the first convolution layers of the 3 feature extraction networks above are constructed and initialized from the Mel-frequency cepstral coefficient (MFCC), gammatone frequency cepstral coefficient (GFCC) and cochlear frequency cepstral coefficient (CFCC) filter coefficients, so that the network structure parameters are better adapted to the samples.
Step 1.6, a loss function J between the output of the multi-deep learning joint decision model and the label of the input sample is constructed using the cross-entropy method, and training parameters such as the optimizer, the learning rate and the number of training iterations are set.
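In code, step 1.6 reduces to a few lines; the optimizer choice, learning rate, batch size and input map size below are placeholders, since the patent leaves the concrete values open. The dummy forward pass only materializes the lazy fully connected layers before the optimizer collects the parameters.

```python
model = JointDecisionModel(n_classes=3)
dummy = torch.randn(2, 1, 1024, 1)        # assumed input map size
model(dummy, dummy, dummy)                # materialise the LazyLinear layers

criterion = nn.CrossEntropyLoss()         # cross-entropy loss J
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative values
N_Batch = 64                              # fixed number of samples per batch
```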
Referring to the stage two part of fig. 2, the specific flow of generating the multi-platform sample set based on the multi-platform data in step 2 is as follows: using the speaker voice data set, the simulated sonar target data set, the other sonar platform target data sets and the target data set of the small-sample sonar platform to be applied, the upper and lower limit frequencies of data processing are set as required, the corresponding auditory perception feature samples are generated with the MFCC, GFCC and CFCC auditory perception filters, and the multi-platform sample set is constructed.
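As an illustration of the sample generation, the MFCC-style pipeline below frames a noise recording, restricts the spectrum to the configured upper and lower limit frequencies, applies a filterbank and takes the DCT of the log energies. It reuses the triangular_filterbank stand-in from the step 1.5 sketch; real GFCC and CFCC samples would use gammatone and cochlear filterbanks instead.

```python
from scipy.fftpack import dct

def auditory_cepstral_features(signal: np.ndarray, sr: float,
                               frame_len: int = 1024, hop: int = 512,
                               f_low: float = 50.0, f_high: float = 4000.0,
                               n_filters: int = 48) -> np.ndarray:
    """Illustrative cepstral feature extraction for one recording; returns one
    n_filters-dimensional coefficient vector per frame."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    band = (freqs >= f_low) & (freqs <= f_high)   # upper/lower limit frequencies
    bank = triangular_filterbank(n_filters, n_points=int(band.sum()))
    energies = spectrum[:, band] @ bank           # filterbank energies per frame
    return dct(np.log(energies + 1e-10), type=2, axis=1, norm='ortho')
```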
Referring to the stage three part of fig. 2, the multi-level transfer learning training of the multi-deep learning joint decision model based on the multi-platform sample set in step 3 mainly comprises three successive sub-steps: initialization training of the multi-deep learning joint decision model, primary transfer learning training of the multi-deep learning joint decision model, and advanced transfer learning training of the multi-deep learning joint decision model. Specifically, the training proceeds as follows.
Step 3.1, initialization training of the multi-deep learning joint decision model. The speaker voice sample set and the simulated sonar target sample set used for training are denoted x_1 = {x_1,1, x_1,2, …, x_1,n}, n ∈ N* and x_2 = {x_2,1, x_2,2, …, x_2,n}, n ∈ N*, respectively. With N_Batch input samples per training batch, a fixed number (N_Batch) of samples are randomly selected from x_1 and the loss function J is optimized for several rounds based on a gradient descent method; then a fixed number of samples are randomly selected from x_2 and the loss function J is optimized for several rounds based on the gradient descent method. The model is optimized alternately with x_1 and x_2 in this manner for the preset number of cycles; if the loss value has not yet converged to the expected threshold after the preset cycles, the alternating optimization continues until it does.
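A sketch of this alternating schedule follows, continuing the running example above. The random placeholder tensors stand in for the real x_1 and x_2 sample sets, and the round, cycle and threshold values are illustrative.

```python
def random_set(n: int = 256, n_classes: int = 3):
    """Placeholder for a sample set: aligned MFCC/GFCC/CFCC maps plus labels."""
    feats = [torch.randn(n, 1, 1024, 1) for _ in range(3)]
    return feats, torch.randint(n_classes, (n,))

def optimize_rounds(model, feats, labels, rounds: int = 10) -> float:
    """Several gradient-descent rounds of J on one sample set (N_Batch each)."""
    for _ in range(rounds):
        idx = torch.randint(len(labels), (N_Batch,))
        loss = criterion(model(feats[0][idx], feats[1][idx], feats[2][idx]),
                         labels[idx])
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

(x1_feats, x1_labels), (x2_feats, x2_labels) = random_set(), random_set()
preset_cycles, max_cycles, threshold = 20, 50, 0.05
for cycle in range(max_cycles):
    optimize_rounds(model, x1_feats, x1_labels)         # speaker voice set x_1
    loss = optimize_rounds(model, x2_feats, x2_labels)  # simulated sonar set x_2
    if cycle + 1 >= preset_cycles and loss < threshold:
        break
```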
Step 3.2, primary transfer learning training of the model on the basis of the initialization training. The network structure parameters of the MFCC feature extraction network, the GFCC feature extraction network and the CFCC feature extraction network obtained in the initialization training are retained, and the structure parameters of the fusion recognition network (the two-layer perceptron network) are randomly initialized. The other sonar target sample set used for training is denoted x_3 = {x_3,1, x_3,2, …, x_3,n}, n ∈ N*. A fixed number (N_Batch) of samples are randomly selected from x_3 and the loss function J is optimized for several rounds based on the gradient descent method until the loss value converges to the expected threshold.
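Continuing the sketch, step 3.2 keeps the feature extraction weights and re-initializes only the fusion head before training on x_3. Whether the feature networks remain trainable during this stage is not specified in the patent; they are left trainable here.

```python
for layer in list(model.fusion) + [model.classifier]:
    if isinstance(layer, nn.Linear):
        layer.reset_parameters()   # random re-initialisation of the fusion head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fresh optimiser state

x3_feats, x3_labels = random_set()  # placeholder for the other sonar platform set
for _ in range(max_cycles):
    if optimize_rounds(model, x3_feats, x3_labels) < threshold:
        break
```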
Step 3.3, advanced transfer learning training of the model on the basis of the primary transfer learning training. The target sample set of the small-sample sonar platform to be applied is denoted x_4 = {x_4,1, x_4,2, …, x_4,n}, n ∈ N*; the fourth convolution layers of the MFCC, GFCC and CFCC feature extraction networks are denoted l_M, l_G and l_C respectively. A first local network structure of the multi-deep learning joint decision model after primary transfer learning is extracted with its network structure parameters preserved, giving a feature extraction model S_E = {s_M, s_G, s_C}. The first local network structure comprises, for each of the MFCC, GFCC and CFCC feature extraction networks, the corresponding basic ResNet-Inception module 3 and all preceding network structure; specifically, the feature extraction model consists of 3 mutually independent network structures s_M, s_G, s_C, where s_M is the part of the model obtained in step 3.2 from the data input layer to basic ResNet-Inception module 3 of the MFCC branch, and s_M, s_G, s_C correspond to the MFCC, GFCC and CFCC feature extraction functions respectively. A second local network structure of the multi-deep learning joint decision model is then extracted, a feature recognition model S_R is constructed, and its loss function J_m is constructed. The second local network structure comprises the layers of the MFCC, GFCC and CFCC feature extraction networks from the corresponding fourth convolution layers l_M, l_G, l_C up to the Softmax classifier; that is, the parts of the three feature extraction networks from l_M, l_G, l_C (inclusive) to the top layer, the fusion recognition network (the two-layer perceptron network) and the Softmax classifier are integrated and reconstructed into the feature recognition model S_R. The feature extraction model S_E is applied to all samples in x_4 to obtain a feature sample set x_E = {x_E,1, x_E,2, …, x_E,n}, (x_E,n = (x_M,n, x_G,n, x_C,n), n ∈ N*), where each feature sample x_E,n comprises a group of three features x_M,n, x_G,n and x_C,n corresponding to the feature extraction results of s_M, s_G and s_C respectively. The structure parameters of the feature recognition model S_R are randomly initialized. A fixed number (N_Batch) of samples are randomly selected from x_E and the loss function J_m is optimized for several rounds based on the gradient descent method until it converges to the expected threshold. Finally, S_E and S_R are re-integrated to obtain the trained multi-deep learning joint decision model S_D, i.e. the multi-deep learning joint decision model oriented to the small-sample platform.
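The split into S_E and S_R maps naturally onto the module boundaries of the running sketch: the stem and ResNet-Inception blocks of each network form s_M, s_G, s_C, and everything from the fourth convolution layer upwards, together with the fusion network and classifier, forms S_R. Using the cross-entropy criterion for J_m and the placeholder data conventions above are our assumptions.

```python
class FeatureRecognitionModel(nn.Module):
    """Sketch of S_R: the three network heads (from the fourth convolution
    layers l_M, l_G, l_C upwards) plus fusion network and classifier."""
    def __init__(self, model: JointDecisionModel):
        super().__init__()
        self.heads = nn.ModuleList([model.mfcc_net.head, model.gfcc_net.head,
                                    model.cfcc_net.head])
        self.fusion, self.classifier = model.fusion, model.classifier

    def forward(self, x_m, x_g, x_c):
        fused = torch.cat([h(x) for h, x in zip(self.heads, (x_m, x_g, x_c))],
                          dim=1)
        return self.classifier(self.fusion(fused))

# S_E = {s_M, s_G, s_C}: the lower parts, parameters preserved from step 3.2.
s_E = [nn.Sequential(net.stem, net.blocks)
       for net in (model.mfcc_net, model.gfcc_net, model.cfcc_net)]

x4_feats, x4_labels = random_set(n=64)   # small-sample platform set (placeholder)
with torch.no_grad():                    # feature sample set x_E
    x_E = [s(x) for s, x in zip(s_E, x4_feats)]

s_R = FeatureRecognitionModel(model)
for m in s_R.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        m.reset_parameters()             # random initialisation of S_R
opt_R = torch.optim.Adam(s_R.parameters(), lr=1e-4)
for _ in range(200):                     # optimise J_m (cross entropy assumed)
    idx = torch.randint(len(x4_labels), (N_Batch,))
    loss = criterion(s_R(x_E[0][idx], x_E[1][idx], x_E[2][idx]), x4_labels[idx])
    opt_R.zero_grad(); loss.backward(); opt_R.step()
# S_R shares its modules with `model`, so re-integrating S_E and S_R simply
# yields `model` as the trained joint decision model S_D.
```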
The fixed number in step 3.1 is the same as that in steps 3.2 and 3.3, namely N_Batch.
Referring to the stage four part of fig. 2, step 4 specifically comprises the following steps.
Step 4.1, the underwater target radiation noise data are preprocessed to generate multiple frames of time-aligned MFCC, GFCC and CFCC samples.
Step 4.2, the MFCC, GFCC and CFCC samples are processed by the trained multi-deep learning joint decision model S_D, and the final recognition result is output.
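A minimal inference sketch for step 4: the time-aligned frames of each feature type are batched through S_D and the frame-level class probabilities are averaged. The averaging rule for combining frame results is an assumption, as the patent does not fix it.

```python
def recognise(model_sd: JointDecisionModel, mfcc, gfcc, cfcc) -> int:
    """Frame-level recognition of one recording of unknown radiated noise;
    each argument is an (n_frames, 1, H, W) tensor of time-aligned samples."""
    model_sd.eval()
    with torch.no_grad():
        probs = torch.softmax(model_sd(mfcc, gfcc, cfcc), dim=1)
        return int(probs.mean(dim=0).argmax())  # average over frames, pick class
```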
In the invention, the underwater target radiation noise is first preprocessed; MFCC, GFCC and CFCC auditory perception filters are constructed and MFCC, GFCC and CFCC auditory perception features are generated; deep network models oriented to MFCC, GFCC and CFCC feature extraction are constructed respectively based on a deep learning method, and the first convolution layers of these deep network models are initialized with the auditory perception filter coefficients, so that the model structure parameters are better matched to the auditory perception features to be processed. On this basis, a fusion recognition network is constructed with a multi-layer perceptron method to realize comprehensive recognition of the target. Together, these networks form the overall multi-deep learning joint decision model, which can comprehensively exploit the MFCC, GFCC and CFCC features, improving the comprehensiveness of feature utilization and enhancing target recognition robustness. Furthermore, to overcome the constraint of the small-sample data condition of the sonar platform to be applied, transfer learning training of the model on the data of other platforms is fully exploited to make up for the shortage of data on the platform itself. First, initialization training of the multi-deep learning joint decision model is carried out with speaker voice data and simulated sonar target data; a model structure parameter transfer and training strategy is then set, and primary transfer learning training is carried out on the model based on other sonar platform target data with relatively sufficient samples; the transfer and training strategy is set again, and finally the model structure parameters are optimized as far as possible through advanced transfer learning training on the target data of the small-sample sonar platform to be applied.
Referring to fig. 3, for three types of target data acquired by a surface ship platform, processed by the proposed method in combination with large-scale speaker voice data, simulated sonar target data and shore-based sonar platform target data (whose duration is more than 5 times that of the surface ship platform target data), the separability of the deep learning features is much higher than that of the original features.
Fig. 6 shows the recognition accuracy statistics of the proposed method and several comparison methods. The proposed method achieves the best recognition performance, verifying its effectiveness for intelligent target recognition under small-sample conditions.