
CN114202056B - A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features - Google Patents

A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features

Info

Publication number
CN114202056B
CN114202056B (application CN202111346434.3A)
Authority
CN
China
Prior art keywords
feature extraction
network
platform
extraction network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111346434.3A
Other languages
Chinese (zh)
Other versions
CN114202056A (en)
Inventor
陈越超
王方勇
周彬
王青翠
陈孝森
尚金涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
715 Research Institute Of China Shipbuilding Corp
China State Shipbuilding Corp Ltd
Original Assignee
715 Research Institute Of China Shipbuilding Corp
China State Shipbuilding Corp Ltd
Hangzhou Institute of Applied Acoustics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 715 Research Institute Of China Shipbuilding Corp, China State Shipbuilding Corp Ltd, Hangzhou Institute of Applied Acoustics
Priority to CN202111346434.3A
Publication of CN114202056A
Application granted
Publication of CN114202056B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

This invention proposes an underwater target recognition method based on deep transfer learning of multi-platform auditory perception features. The method establishes a multi-deep-learning joint decision model consisting of an MFCC feature extraction network, a GFCC feature extraction network, a CFCC feature extraction network, and a fusion recognition network. It also establishes a multi-platform data transfer learning scheme: targeting the small-sample data conditions of the application platform, it makes full use of target data from other sonar platforms, simulated sonar target data, speaker voice data, and other data with transferable features to perform transfer learning training of the deep learning model. Compared with traditional target recognition methods based on physical features, this method mines and uses features more deeply and comprehensively; compared with intelligent recognition models driven solely by the platform's own data, it uses a wider range of data and its training process is less prone to overfitting. The method can improve sonar target recognition capability under small-sample conditions.

Description

Small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features
Technical Field
The invention belongs to the technical fields of underwater target recognition and artificial intelligence, and particularly relates to a small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features.
Background
Underwater target recognition technology provides sonar operators with target characteristic information and target type judgments; it is an important basis for comprehensive decision making and one of the most important research directions in sonar signal processing.
The core research content of underwater target recognition is the extraction and expression of the acoustic characteristics of target signals. Traditional underwater acoustic feature acquisition methods mainly build models from previously grasped target data and characteristic knowledge. Affected by factors such as multi-target interference, space-time variation of the ocean channel, and platform and environmental noise, clean and clear target data are scarce, so acquiring target features suited to actual scenes is difficult; this has become a bottleneck restricting the development of traditional recognition technology.
In recent years, deep learning has been widely applied in fields such as speech and image processing and has attracted the attention of many scholars at home and abroad in the underwater acoustics field. Limited by factors such as available test means, however, high-quality target data are difficult to acquire for most sonar platforms, so intelligent models driven directly by such small-sample sonar platform target data generalize poorly.
Disclosure of Invention
The invention aims to provide a small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features, addressing the weak generalization of intelligent models driven by small-sample sonar platform target data described in the background.
The small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features comprises the following steps: constructing a multi-deep-learning joint decision model oriented to multi-dimensional underwater target auditory perception features; generating a multi-platform sample set based on multi-platform data; carrying out multi-level transfer learning training of the multi-deep-learning joint decision model on the multi-platform sample set; and preprocessing unknown target radiated noise data, performing target recognition on the preprocessing result with the trained multi-deep-learning joint decision model, and outputting the recognition result.
The multi-deep-learning joint decision model is constructed as follows: construct an MFCC feature extraction network, a GFCC feature extraction network, and a CFCC feature extraction network; place the 3 feature extraction networks in parallel and sequentially add a fusion recognition network and a support vector machine classifier on top, where the fusion recognition network is a two-layer perceptron network; initialize the parameters of the multi-deep-learning joint decision model; construct the loss function J of the model; and set the training parameters used during training.
Preferably, the construction method of the MFCC, GFCC, or CFCC feature extraction network comprises the following steps: add a data input layer; add a first convolution layer with parameters (48×1, 144, 4); add an LRN layer; add a second convolution layer with parameters (16×1, 192, 2); add a third convolution layer with parameters (7×1, 256, 1); add an LRN layer; add a first pooling layer with parameters (3×1, 2); add three basic ResNet-Inception modules; add a fourth convolution layer with parameters (3×1, 384, 1); add a fifth convolution layer with parameters (3×1, 512, 1); add a second pooling layer with parameters (3×1, 2); and add a fully connected layer with 256 output nodes.
Preferably, the basic ResNet-Inception module is constructed by adding 3 parallel branches after a data input layer: branch 1 is a direct branch; branch 2 comprises 2 convolution layers whose parameters are (1×1, 128, 1) and (3×3, 128, 1) in sequence; branch 3 comprises 2 convolution layers whose parameters are (1×3, 128, 1) and (3×1, 128, 1) in sequence. A feature-dimension expansion (concatenation) operation is added after branches 2 and 3, integrating the two branch outputs into a comprehensive feature set of 256 features; a direct element-wise addition of the branch-1 output features and the comprehensive feature set yields the top-level features of the basic ResNet-Inception module; and a ReLU activation function is added to output the final convolution features.
Preferably, initializing the parameters of the multi-deep-learning joint decision model comprises: based on the MFCC, GFCC, and CFCC auditory perception filter construction methods, generating 48-channel auditory perception filter coefficients for a set frequency band, where each auditory perception filter is a 48-dimensional vector; integrating all the auditory perception filters into a 48×144 matrix; and using it to respectively initialize the parameters of the first convolution layers of the MFCC, GFCC, and CFCC feature extraction networks.
Preferably, generating the multi-platform sample set based on multi-platform data means: using a speaker voice data set, a simulated sonar target data set, other sonar platform target data sets, and the small-sample target data set of the sonar platform to be applied, setting the upper and lower limit frequencies of data processing as required based on the MFCC, GFCC, and CFCC auditory perception filters, generating the corresponding auditory perception feature samples, and constructing the multi-platform sample set.
Preferably, the multi-level transfer learning training of the multi-deep-learning joint decision model on the multi-platform sample set comprises the following steps: initialization training of the model, primary transfer learning training of the model, and advanced transfer learning training of the model.
Preferably, the initialization training of the multi-deep-learning joint decision model comprises a cyclic optimization step: randomly select a fixed number of samples from the speaker voice sample set and optimize the loss function J for several rounds based on gradient descent, then randomly select a fixed number of samples from the simulated sonar target sample set and optimize J for several rounds based on gradient descent; repeat this cyclic optimization step until the preset number of cycles is reached and the loss function converges to the expected threshold.
Preferably, the primary transfer learning training of the multi-deep-learning joint decision model comprises: retaining the structural parameters of the MFCC, GFCC, and CFCC feature extraction networks from initialization training; randomly re-initializing the structural parameters of the fusion recognition network; then randomly selecting a fixed number of samples from the other-sonar-platform target sample sets and optimizing the loss function J for several rounds based on gradient descent until the loss function converges to the expected threshold.
Preferably, the advanced transfer learning training of the multi-deep-learning joint decision model comprises: extracting a first local network structure of the model and retaining its network structure parameters to obtain a feature extraction model S_E = {s_M, s_G, s_C}, where the first local network structure comprises, in each of the MFCC, GFCC, and CFCC feature extraction networks, the corresponding 3 basic ResNet-Inception modules and everything before them (s_M, s_G, s_C respectively); extracting a second local network structure of the model, constructing a feature recognition model S_R, and constructing its loss function J_m, where the second local network structure runs from the corresponding fourth convolution layers l_M, l_G, l_C of the MFCC, GFCC, and CFCC feature extraction networks to the softmax classifier; applying S_E to all samples in the small-sample target data set of the sonar platform to be applied to obtain the feature sample set x_E = {x_E,1, x_E,2, …, x_E,n} with x_E,n = (x_M,n, x_G,n, x_C,n), n ∈ N*, where each feature sample x_E,n groups the three features x_M,n, x_G,n, x_C,n produced by s_M, s_G, s_C respectively; randomly initializing the structural parameters of S_R; randomly selecting a fixed number of feature samples from x_E and optimizing J_m for several rounds based on gradient descent until the loss function converges to the expected threshold; and re-integrating S_E and S_R to obtain the trained multi-deep-learning joint decision model S_D.
The invention discloses a small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features. First, according to the characteristics of underwater acoustic target noise data, a multi-deep-learning joint decision model is established, consisting of a Mel-frequency cepstral coefficient (MFCC) feature extraction network, a Gammatone frequency cepstral coefficient (GFCC) feature extraction network, a cochlear frequency cepstral coefficient (CFCC) feature extraction network, and a fusion recognition network. Next, standardized MFCC, GFCC, and CFCC samples are generated from multi-source, multi-platform data to form a multi-platform sample set. Multi-level transfer learning training of the joint decision model is then carried out on this sample set, making full use of the available data to optimize the model's structural parameters. Finally, underwater acoustic target data are recognized with the optimized joint decision model.
Compared with the prior art, the invention has the beneficial effects that:
The method comprehensively uses auditory perception, deep networks, and transfer learning. On one hand, samples are constructed with auditory perception filters and the deep networks are initialized with the corresponding filter coefficients, so the network structure parameters better fit the samples; meanwhile, the joint decision model, driven by multiple auditory perception feature samples, makes full use of the separability information of target features across different feature dimensions. On the other hand, a multi-platform data transfer learning scheme is established: for the small-sample data condition of the application platform, multi-platform data are fully used to optimize the deep learning model, which is expected to improve target recognition robustness.
Compared with traditional target recognition methods based on physical features, the method mines and uses features more deeply and comprehensively; compared with an intelligent recognition model driven solely by the platform's own data, it uses a wider range of data and its training process is less prone to overfitting, so it can improve sonar target recognition capability under small-sample conditions.
Drawings
FIG. 1 is the general scheme of small-sample underwater target recognition based on deep transfer learning of multi-platform auditory perception features according to the present invention.
Fig. 2 is the flowchart of small-sample underwater target recognition based on deep transfer learning of multi-platform auditory perception features according to the present embodiment.
FIG. 3 is a t-SNE visualization projection result of original features and deep learning features.
Fig. 4 is a schematic diagram of the frame structure of the basic ResNet-Inception module.
Fig. 5 is a schematic diagram of the framework structure of the MFCC feature extraction network.
FIG. 6 is a graph showing statistics of recognition accuracy of the method according to the present invention and various comparison methods.
Detailed Description
The following clearly and fully describes the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the protection scope of the present invention.
Fig. 1 shows the general scheme of the invention. Referring to fig. 1, the small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features comprises the following 4 steps.
Step 1, constructing a multi-deep-learning joint decision model oriented to multi-dimensional underwater target auditory perception features;
step 2, generating a multi-platform sample set based on the multi-platform data;
step 3, developing multi-level transfer learning training for the multi-deep learning joint decision model based on a multi-platform sample set;
and step 4, preprocessing unknown underwater target radiated noise data, performing target recognition on the preprocessing result with the trained multi-deep-learning joint decision model, and outputting the recognition result.
Referring to the stage-one part of fig. 2, the specific flow for constructing the multi-deep-learning joint decision model oriented to multi-dimensional underwater target auditory perception features in step 1 is as follows.
Step 1.1, build the basic ResNet-Inception module by adding 3 parallel branches after the data input layer, as shown in fig. 4. Branch 1 is a direct branch with no added operations. Branch 2 comprises 2 convolution layers; the parameters of convolution layer 1 are (1×1, 128, 1), i.e., convolution kernel size 1×1, 128 kernels, stride 1 (the same notation applies below), and the parameters of convolution layer 2 are (3×3, 128, 1). Branch 3 comprises 2 convolution layers whose parameters are (1×3, 128, 1) and (3×1, 128, 1) in sequence. A feature-dimension expansion (concatenation) operation after branches 2 and 3 integrates the two branch outputs into a comprehensive feature set of 256 features. A direct element-wise addition of the branch-1 output and the comprehensive feature set yields the module's top-level features, to which a ReLU activation function is applied to output the final convolution features.
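To make the branch topology concrete, the following is a minimal PyTorch sketch of the basic ResNet-Inception module; it is an illustration, not the patent's code, and the padding values and the 256-channel input implied by the residual addition are assumptions the text leaves implicit:

```python
import torch
import torch.nn as nn

class ResNetInceptionBlock(nn.Module):
    """Basic ResNet-Inception module: an identity branch plus two convolution
    branches whose outputs are concatenated (128 + 128 = 256 features) and
    added back to the identity branch, followed by ReLU."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Branch 2: (1x1, 128, 1) then (3x3, 128, 1); padding keeps the map size
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=1),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
        )
        # Branch 3: (1x3, 128, 1) then (3x1, 128, 1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=(1, 3), padding=(0, 1)),
            nn.Conv2d(128, 128, kernel_size=(3, 1), padding=(1, 0)),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Concatenate the two convolution branches along the channel axis
        # (256 features), add the identity branch, then apply ReLU.
        out = torch.cat([self.branch2(x), self.branch3(x)], dim=1)
        return self.relu(x + out)
```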
Step 1.2, construct the MFCC, GFCC, and CFCC feature extraction networks; the construction method is the same for all 3 networks. Referring to fig. 5, the MFCC feature extraction network is built as follows: add a data input layer whose number of nodes matches the MFCC sequence length; add a first convolution layer with parameters (48×1, 144, 4), i.e., convolution kernel size 48×1, 144 kernels, stride 4 (the same notation applies below); add an LRN layer; add a second convolution layer with parameters (16×1, 192, 2); add a third convolution layer with parameters (7×1, 256, 1); add an LRN layer; add a first pooling layer with parameters (3×1, 2), i.e., pooling kernel size 3×1, stride 2; add 3 basic ResNet-Inception modules (modules 1, 2, and 3); add a fourth convolution layer with parameters (3×1, 384, 1); add a fifth convolution layer with parameters (3×1, 512, 1); add a second pooling layer with parameters (3×1, 2); and add a fully connected layer with 256 output nodes.
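A corresponding sketch of this shared topology, reusing ResNetInceptionBlock from above, treats each input as a (batch, 1, L, 1) map so the stated (k×1) kernels apply along the feature sequence; the LRN neighborhood size, the input length, and the use of LazyLinear for the 256-node fully connected layer are conveniences not specified in the patent:

```python
import torch
import torch.nn as nn

class AuditoryFeatureNet(nn.Module):
    """Shared topology of the MFCC/GFCC/CFCC feature extraction networks.
    Layer parameters follow the text: (kernel, num_kernels, stride)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 144, kernel_size=(48, 1), stride=(4, 1)),    # (48x1, 144, 4)
            nn.LocalResponseNorm(5),                                   # LRN size assumed
            nn.Conv2d(144, 192, kernel_size=(16, 1), stride=(2, 1)),  # (16x1, 192, 2)
            nn.Conv2d(192, 256, kernel_size=(7, 1)),                   # (7x1, 256, 1)
            nn.LocalResponseNorm(5),
            nn.MaxPool2d(kernel_size=(3, 1), stride=(2, 1)),           # (3x1, 2)
            ResNetInceptionBlock(),                                    # modules 1-3
            ResNetInceptionBlock(),
            ResNetInceptionBlock(),
            nn.Conv2d(256, 384, kernel_size=(3, 1)),                   # (3x1, 384, 1)
            nn.Conv2d(384, 512, kernel_size=(3, 1)),                   # (3x1, 512, 1)
            nn.MaxPool2d(kernel_size=(3, 1), stride=(2, 1)),           # (3x1, 2)
            nn.Flatten(),
            nn.LazyLinear(256),                                        # FC, 256 nodes
        )

    def forward(self, x):          # x: (batch, 1, L, 1) framed feature sequence
        return self.trunk(x)

# e.g. AuditoryFeatureNet()(torch.randn(2, 1, 1024, 1)) -> shape (2, 256)
```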
Step 1.3, construct the fusion recognition network by adding a two-layer perceptron network on top of the MFCC, GFCC, and CFCC feature extraction networks; the two perceptron layers have 512 and 128 nodes respectively.
Step 1.4, add a Softmax classifier at the top of the two-layer perceptron network.
Steps 1.1-1.4 complete the construction of the overall framework of the multi-deep-learning joint decision model.
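Continuing the sketch, the three networks, the two-layer perceptron fusion network (512 and 128 nodes), and the top-level classifier can be assembled as follows; the number of target classes is an assumed parameter, and the softmax itself is left to the loss/inference stage, as is usual in PyTorch:

```python
import torch
import torch.nn as nn

class JointDecisionModel(nn.Module):
    """Multi-deep-learning joint decision model: three parallel feature
    extraction networks, a fusion recognition network (two-layer perceptron,
    512 -> 128 nodes), and a top-level classifier."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.mfcc_net = AuditoryFeatureNet()
        self.gfcc_net = AuditoryFeatureNet()
        self.cfcc_net = AuditoryFeatureNet()
        self.fusion = nn.Sequential(
            nn.Linear(3 * 256, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, num_classes)  # softmax applied in loss

    def forward(self, x_mfcc, x_gfcc, x_cfcc):
        z = torch.cat([self.mfcc_net(x_mfcc),
                       self.gfcc_net(x_gfcc),
                       self.cfcc_net(x_cfcc)], dim=1)   # 3 x 256 = 768 features
        return self.classifier(self.fusion(z))
```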
Step 1.5, based on the MFCC, GFCC, and CFCC auditory perception filter construction methods, generate 48-channel auditory perception filter coefficients for the set frequency band; each auditory perception filter is a 48-dimensional vector, and all filters are integrated into a 48×144 matrix, which respectively initializes the parameters of the first convolution layers of the MFCC, GFCC, and CFCC feature extraction networks. The other network layers and the training method of the joint decision model follow general convolutional neural network parameter initialization and training practice. "Initializing the parameters of the first convolution layer" here means initializing the weights of its convolution kernels.
In step 1.5 of the invention, the filter coefficients of the Mel-frequency cepstral coefficient (MFCC), Gammatone frequency cepstral coefficient (GFCC), and cochlear frequency cepstral coefficient (CFCC) filterbanks are used to initialize the structural parameters of the 3 feature extraction networks above, so that the network structure parameters better fit the samples.
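As an illustration of this initialization, the sketch below builds a triangular mel-scale filterbank and copies it into the first convolution layer of the AuditoryFeatureNet sketched above; a real implementation would build 48 filters with each of the MFCC, GFCC, and CFCC constructions and stack them into the 48×144 matrix, and the sample rate and band limits here are assumptions:

```python
import numpy as np
import torch

def mel_filterbank(n_filters, n_points, sr, f_lo, f_hi):
    """Triangular mel-scale filterbank; one row per filter (n_points taps)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2)
    edges = mel_to_hz(mels) / (sr / 2) * (n_points - 1)   # band edges in tap units
    fb = np.zeros((n_filters, n_points))
    pts = np.arange(n_points)
    for i in range(n_filters):
        l, c, r = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((pts - l) / (c - l), (r - pts) / (r - c)), 0.0, None)
    return fb

# 144 filters of 48 taps each (all mel here for illustration; the patent uses
# 48 filters per MFCC/GFCC/CFCC construction), copied into the first conv layer.
fb = mel_filterbank(144, 48, sr=20000.0, f_lo=50.0, f_hi=8000.0)
net = AuditoryFeatureNet()
with torch.no_grad():
    net.trunk[0].weight.copy_(torch.tensor(fb, dtype=torch.float32).view(144, 1, 48, 1))
```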
Step 1.6, construct the loss function J between the output of the multi-deep-learning joint decision model and the label of the input sample using the cross-entropy method, and set the training parameters for iterative training, such as the optimizer, learning rate, and number of training rounds.
Referring to the stage-two part of fig. 2, the specific flow for generating the multi-platform sample set based on multi-platform data in step 2 is as follows: using a speaker voice data set, a simulated sonar target data set, other sonar platform target data sets, and the small-sample target data set of the sonar platform to be applied, set the upper and lower limit frequencies of data processing as required based on the MFCC, GFCC, and CFCC auditory perception filters, generate the corresponding auditory perception feature samples, and construct the multi-platform sample set.
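A hedged sketch of per-platform sample generation for the MFCC branch using librosa is shown below; the sample rate, band limits, and 48 coefficients are assumptions, and the GFCC and CFCC branches would substitute gammatone and cochlear filterbanks, which librosa does not provide:

```python
import librosa
import numpy as np

def make_mfcc_samples(wav_path, sr=20000, n_mfcc=48, fmin=50.0, fmax=8000.0):
    """Frame one recording into MFCC samples within the configured band.
    Returns an array with one 48-dimensional MFCC vector per frame."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, fmin=fmin, fmax=fmax)
    return mfcc.T

# The multi-platform sample set then pools samples generated this way from the
# speaker voice, simulated sonar, other-platform, and application-platform data.
```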
Referring to the stage-three part of fig. 2, the multi-level transfer learning training of the multi-deep-learning joint decision model on the multi-platform sample set in step 3 mainly comprises three interacting sub-steps: initialization training of the model, primary transfer learning training, and advanced transfer learning training. Specifically, training proceeds as follows.
Step 3.1, perform initialization training of the multi-deep-learning joint decision model. Denote the speaker voice sample set and the simulated sonar target sample set used for training as x_1 = {x_1,1, x_1,2, …, x_1,n} and x_2 = {x_2,1, x_2,2, …, x_2,n}, n ∈ N*, respectively. With N_Batch input samples per training step, randomly select N_Batch samples from x_1 and optimize the loss function J for several rounds based on gradient descent, then randomly select N_Batch samples from x_2 and optimize J for several rounds based on gradient descent. Alternate between x_1 and x_2 in this way for the number of cycles set as required; if the loss value has not converged to the expected threshold after the designated number of cycles, continue cycling in the same way until it does.
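The alternating optimization of step 3.1 can be sketched as follows; the loaders are assumed to yield ((x_M, x_G, x_C), label) batches of size N_Batch, the cycle counts, learning rate, and threshold are illustrative, and the convergence check is simplified to the last batch loss:

```python
import torch
import torch.nn.functional as F

def init_train(model, speech_loader, sim_loader, cycles=50, inner_steps=10,
               lr=1e-4, target_loss=0.05):
    """Initialization training: alternate between the speaker voice set x_1
    and the simulated sonar target set x_2, optimizing cross-entropy loss J."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss = None
    for _ in range(cycles):
        for loader in (speech_loader, sim_loader):   # x_1 then x_2
            batches = iter(loader)
            for _ in range(inner_steps):
                try:
                    (xm, xg, xc), y = next(batches)
                except StopIteration:
                    break
                loss = F.cross_entropy(model(xm, xg, xc), y)   # loss function J
                opt.zero_grad(); loss.backward(); opt.step()
        if loss is not None and loss.item() < target_loss:
            return   # J converged to the expected threshold
```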
Step 3.2, perform primary transfer learning training on the basis of the initialization training of the multi-deep-learning joint decision model. Retain the network structure parameters of the MFCC, GFCC, and CFCC feature extraction networks after initialization training, and randomly re-initialize the network structure parameters of the fusion recognition network (the two-layer perceptron network). Denote the other-sonar-platform target sample set used for training as x_3 = {x_3,1, x_3,2, …, x_3,n}, n ∈ N*. Randomly select a fixed number (N_Batch) of samples from x_3 and optimize the loss function J for several rounds based on gradient descent until the loss value converges to the expected threshold.
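A sketch of step 3.2, continuing the classes above; the re-initialization scheme and hyperparameters are assumptions, since the patent only requires retaining the feature extraction trunks and randomly re-initializing the fusion recognition network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def primary_transfer(model, sonar_loader, lr=1e-4, steps=1000, target_loss=0.05):
    """Primary transfer learning: keep the three feature extraction trunks,
    randomly re-initialize the fusion head, fine-tune on other-platform x_3."""
    for head in (model.fusion, model.classifier):
        for m in head.modules():
            if isinstance(m, nn.Linear):
                m.reset_parameters()               # random re-initialization
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    batches = iter(sonar_loader)
    for _ in range(steps):
        try:
            (xm, xg, xc), y = next(batches)
        except StopIteration:
            batches = iter(sonar_loader)
            (xm, xg, xc), y = next(batches)
        loss = F.cross_entropy(model(xm, xg, xc), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if loss.item() < target_loss:
            break                                  # J converged
```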
Step 3.3, perform advanced transfer learning training on the basis of the primary transfer learning training of the multi-deep-learning joint decision model. Denote the small-sample target data set of the sonar platform to be applied as x_4 = {x_4,1, x_4,2, …, x_4,n}, n ∈ N*, and denote the fourth convolution layers of the MFCC, GFCC, and CFCC feature extraction networks as l_M, l_G, and l_C respectively. Extract the first local network structure from the joint decision model after primary transfer learning and retain its network structure parameters to obtain the feature extraction model S_E = {s_M, s_G, s_C}. The first local network structure comprises, in each feature extraction network, the corresponding basic ResNet-Inception module 3 and everything before it; specifically, S_E consists of 3 mutually independent network structures s_M, s_G, s_C, where s_M is the part of the model obtained in step 3.2 from the data input layer through basic ResNet-Inception module 3, and s_M, s_G, s_C correspond to the MFCC, GFCC, and CFCC feature extraction functions respectively. Extract the second local network structure of the model, construct the feature recognition model S_R, and construct its loss function J_m. The second local network structure runs from the fourth convolution layers l_M, l_G, l_C (inclusive) of the three feature extraction networks through the fusion recognition network (the two-layer perceptron network) to the softmax classifier; these parts are integrated and reconstructed into the feature recognition model S_R. Apply S_E to all samples in x_4 to obtain the feature sample set x_E = {x_E,1, x_E,2, …, x_E,n} with x_E,n = (x_M,n, x_G,n, x_C,n), n ∈ N*, where each feature sample x_E,n groups the three features x_M,n, x_G,n, x_C,n extracted by s_M, s_G, s_C respectively. Randomly initialize the structural parameters of S_R. Randomly select a fixed number (N_Batch) of feature samples from x_E and optimize J_m for several rounds based on gradient descent until the loss converges to the expected threshold. Finally, re-integrate S_E and S_R to obtain the trained joint decision model S_D, i.e., the multi-deep-learning joint decision model oriented to the small-sample platform.
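The S_E / S_R split of step 3.3 can be expressed against the Sequential layout sketched earlier; the slice index 9 marks the boundary right after the third ResNet-Inception module in that layout and is an artifact of this sketch, not of the patent. Training then draws N_Batch feature samples from x_E and optimizes J_m exactly as in the earlier loops, after which S_E and S_R are chained back together as S_D:

```python
import torch
import torch.nn as nn

def split_feature_extractor(model):
    """S_E = {s_M, s_G, s_C}: each trunk up to and including ResNet-Inception
    module 3 (indices 0..8 of the Sequential layout sketched above)."""
    return [net.trunk[:9] for net in (model.mfcc_net, model.gfcc_net, model.cfcc_net)]

class FeatureRecognitionModel(nn.Module):
    """S_R: the slices from the fourth convolution layers l_M, l_G, l_C upward,
    plus the fusion network and classifier, randomly re-initialized."""
    def __init__(self, model):
        super().__init__()
        self.tails = nn.ModuleList(
            [net.trunk[9:] for net in (model.mfcc_net, model.gfcc_net, model.cfcc_net)])
        self.fusion, self.classifier = model.fusion, model.classifier
        for m in self.modules():                    # random re-initialization of S_R
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                m.reset_parameters()

    def forward(self, feats):                       # feats = (x_M, x_G, x_C)
        z = torch.cat([tail(f) for tail, f in zip(self.tails, feats)], dim=1)
        return self.classifier(self.fusion(z))

def extract_features(s_e, xm, xg, xc):
    """Apply frozen S_E to one batch of x_4 to produce feature samples x_E."""
    with torch.no_grad():
        return tuple(s(x) for s, x in zip(s_e, (xm, xg, xc)))
```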
The fixed number of samples in steps 3.1, 3.2, and 3.3 is the same, namely N_Batch.
Referring to the stage-four part of fig. 2, step 4 specifically comprises the following sub-steps.
Step 4.1, preprocess the underwater target radiated noise data to generate multiple frames of time-aligned MFCC, GFCC, and CFCC samples.
Step 4.2, process the MFCC, GFCC, and CFCC samples with the trained multi-deep-learning joint decision model S_D and output the final recognition result.
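A sketch of steps 4.1-4.2 at inference time; the frame-averaging decision rule is an assumption, since the patent only states that the trained model processes the time-aligned samples and outputs the recognition result:

```python
import torch

def recognize(model, frames_mfcc, frames_gfcc, frames_cfcc):
    """Score time-aligned MFCC/GFCC/CFCC frames with the trained model S_D,
    average the per-frame posteriors, and return the decided class index."""
    model.eval()
    with torch.no_grad():
        logits = model(frames_mfcc, frames_gfcc, frames_cfcc)
        probs = torch.softmax(logits, dim=1).mean(dim=0)   # average over frames
    return int(probs.argmax())
```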
In summary, the invention first preprocesses underwater target radiated noise, constructs MFCC, GFCC, and CFCC auditory perception filters, and generates the corresponding auditory perception features. Deep network models for MFCC, GFCC, and CFCC feature extraction are built with deep learning methods, and the first convolution layer of each is initialized with the auditory perception filter coefficients, so the model structure parameters better match the features to be processed. On this basis, a fusion recognition network built with a multi-layer perceptron realizes comprehensive recognition of the target. Together these networks form the overall multi-deep-learning joint decision model, which can comprehensively use MFCC, GFCC, and CFCC features, improving the completeness of feature utilization and enhancing recognition robustness. Further, to overcome the small-sample data constraint of the sonar platform to be applied, data from other platforms are fully used for model transfer learning training, compensating for the platform's insufficient data: the joint decision model is first initialized with speaker voice data and simulated sonar target data under a parameter-transfer and training strategy; primary transfer learning training is then carried out with target data from other sonar platforms that have relatively abundant samples; finally, advanced transfer learning training on the small-sample target data of the platform to be applied optimizes the model structure parameters as far as possible.
Referring to fig. 3, for three classes of target data acquired by a surface ship platform, processed by the proposed method together with large-scale speaker voice data, simulated sonar target data, and shore-based sonar platform target data (whose duration is more than 5 times that of the surface ship platform target data), the separability of the deep learning features is much higher than that of the original features.
Fig. 6 shows recognition accuracy statistics for the proposed method and various comparison methods; the proposed method achieves the best recognition performance, verifying the effectiveness of intelligent target recognition under small-sample conditions.

Claims (1)

1. A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features, characterized by comprising the following steps:
constructing a multi-deep-learning joint decision model oriented to multi-dimensional underwater target auditory perception features through the following steps:
constructing an MFCC feature extraction network, a GFCC feature extraction network, and a CFCC feature extraction network, placing the 3 feature extraction networks in parallel, and sequentially adding a fusion recognition network and a support vector machine classifier on top, wherein the fusion recognition network is a two-layer perceptron network;
wherein the construction method of the MFCC, GFCC, or CFCC feature extraction network comprises the following steps:
adding a data input layer;
adding a first convolution layer, wherein the first convolution layer parameters are (48×1, 144, 4);
adding a second convolution layer, wherein the second convolution layer parameters are (16×1, 192, 2);
adding a third convolution layer, wherein the third convolution layer parameters are (7×1, 256, 1);
adding a first pooling layer, wherein the first pooling layer parameters are (3×1, 2);
adding three basic ResNet-Inception modules;
adding a fourth convolution layer, wherein the fourth convolution layer parameters are (3×1, 384, 1);
adding a fifth convolution layer, wherein the fifth convolution layer parameters are (3×1, 512, 1);
adding a second pooling layer, wherein the second pooling layer parameters are (3×1, 2);
adding a fully connected layer, wherein the number of output nodes is 256;
the construction method of the basic ResNet-Inception module comprises the following steps:
adding 3 parallel branches after a data input layer, wherein branch 1 is a direct branch, branch 2 comprises 2 convolution layers whose parameters are (1×1, 128, 1) and (3×3, 128, 1) in sequence, and branch 3 comprises 2 convolution layers whose parameters are (1×3, 128, 1) and (3×1, 128, 1) in sequence; adding a feature-dimension expansion (concatenation) operation after branches 2 and 3, integrating the two branch output features into a comprehensive feature set of 256 features; adding a direct element-wise addition of the branch-1 output features and the comprehensive feature set to obtain the top-level features of the basic ResNet-Inception module; and adding a ReLU activation function to output the final convolution features;
initializing parameters of a multi-deep learning joint decision model, including:
based on the MFCC, GFCC, and CFCC auditory perception filter construction methods, respectively generating 48-channel auditory perception filter coefficients for a set frequency band, wherein each auditory perception filter is a 48-dimensional vector; forming a 48×144 matrix by integrating all the auditory perception filters; and respectively initializing the parameters of the first convolution layers of the MFCC, GFCC, and CFCC feature extraction networks;
constructing a loss function J of the multi-deep-learning joint decision model, and setting the training parameters used during training;
generating a multi-platform sample set based on multi-platform data, namely: using a speaker voice data set, a simulated sonar target data set, other sonar platform target data sets, and the small-sample target data set of the sonar platform to be applied, setting the upper and lower limit frequencies of data processing as required based on the MFCC, GFCC, and CFCC auditory perception filters, generating the corresponding auditory perception feature samples, and constructing the multi-platform sample set;
carrying out multi-level transfer learning training of the multi-deep-learning joint decision model based on the multi-platform sample set, comprising the following steps:
performing initialization training on the multi-deep learning joint decision model, including:
a cyclic optimization step: randomly selecting a fixed number of samples from the speaker voice sample set and optimizing the loss function J for several rounds based on gradient descent, then randomly selecting a fixed number of samples from the simulated sonar target sample set and optimizing the loss function J for several rounds based on gradient descent;
repeating the cyclic optimization step until the preset number of cycles is reached and the loss function converges to the expected threshold;
developing primary transfer learning training on the multi-deep learning joint decision model, including:
retaining the structural parameters of the MFCC, GFCC, and CFCC feature extraction networks from initialization training, randomly re-initializing the structural parameters of the fusion recognition network, randomly selecting a fixed number of samples from the other-sonar-platform target sample sets, and optimizing the loss function J for several rounds based on gradient descent until the loss function converges to the expected threshold;
developing advanced transfer learning training on the multi-deep learning joint decision model, comprising the following steps:
extracting a first local network structure of the multi-deep-learning joint decision model and retaining its network structure parameters to obtain a feature extraction model S_E = {s_M, s_G, s_C}, wherein the first local network structure comprises, in the MFCC, GFCC, and CFCC feature extraction networks, the corresponding 3 basic ResNet-Inception modules and the preceding network structures, namely s_M, s_G, s_C;
extracting a second local network structure of the multi-deep-learning joint decision model, constructing a feature recognition model S_R, and constructing a loss function J_m of the feature recognition model S_R, wherein the second local network structure is the network structure from the corresponding fourth convolution layers l_M, l_G, l_C of the MFCC, GFCC, and CFCC feature extraction networks to the softmax classifier;
performing feature extraction on all samples in the small-sample target data set of the sonar platform to be applied through S_E to obtain a feature sample set x_E = {x_E,1, x_E,2, …, x_E,n}, x_E,n = (x_M,n, x_G,n, x_C,n), n ∈ N*, wherein each feature sample x_E,n comprises a group of three features x_M,n, x_G,n, and x_C,n, corresponding respectively to the feature extraction results of s_M, s_G, s_C;
randomly initializing the structural parameters of the model S_R;
randomly selecting a fixed number of feature samples from x_E, and optimizing the loss function J_m for several rounds based on gradient descent until the loss function converges to the expected threshold;
re-integrating S_E and S_R to obtain a trained multi-deep-learning joint decision model S_D;
preprocessing unknown underwater target radiated noise data, performing target recognition on the preprocessing result with the trained multi-deep-learning joint decision model, and outputting the recognition result.
CN202111346434.3A 2021-11-11 2021-11-11 A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features Active CN114202056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111346434.3A CN114202056B (en) 2021-11-11 2021-11-11 A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111346434.3A CN114202056B (en) 2021-11-11 2021-11-11 A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features

Publications (2)

Publication Number Publication Date
CN114202056A CN114202056A (en) 2022-03-18
CN114202056B (en) 2025-08-08

Family

ID=80647460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111346434.3A Active CN114202056B (en) 2021-11-11 2021-11-11 A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features

Country Status (1)

Country Link
CN (1) CN114202056B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062658B (en) * 2022-06-14 2024-05-28 University of Electronic Science and Technology of China Modulation type recognition method for overlapping radar signals based on adaptive threshold network
CN115436951A (en) * 2022-08-06 2022-12-06 The 715th Research Institute of China Shipbuilding Industry Corporation A method for underwater target echo recognition based on auditory filtering
CN115412324B (en) * 2022-08-22 2025-05-06 Beijing Penghu Wuyu Technology Development Co., Ltd. Air-ground-space network intrusion detection method based on multimodal conditional adversarial domain adaptation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225432A (en) * 2019-05-10 2019-09-10 The 715th Research Institute of China Shipbuilding Industry Corporation A sonar target stereo listening method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10149190A (en) * 1996-11-19 1998-06-02 Matsushita Electric Ind Co Ltd Voice recognition method and voice recognition device
WO2020139442A2 (en) * 2018-10-10 2020-07-02 Farsounder, Inc. Three-dimensional forward-looking sonar target recognition with machine learning
EP3953868A4 (en) * 2019-04-10 2023-01-11 Cornell University NEUROMORPHIC ALGORITHM FOR FAST ONLINE LEARNING AND SIGNAL RECOVERY
CN111209952B (en) * 2020-01-03 2023-05-30 西安工业大学 Underwater target detection method based on improved SSD and migration learning
CN111402922B (en) * 2020-03-06 2023-06-30 武汉轻工大学 Audio signal classification method, device, equipment and storage medium based on small samples
CN112149755B (en) * 2020-10-12 2022-07-05 自然资源部第二海洋研究所 A deep learning-based classification method for the bottom of small sample underwater acoustic images
CN112364779B (en) * 2020-11-12 2022-10-21 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN112580694B (en) * 2020-12-01 2024-04-19 中国船舶重工集团公司第七0九研究所 Small sample image target recognition method and system based on joint attention mechanism
CN112464837B (en) * 2020-12-03 2023-04-07 中国人民解放军战略支援部队信息工程大学 Shallow sea underwater acoustic communication signal modulation identification method and system based on small data samples
CN113253248B (en) * 2021-05-11 2023-06-30 西北工业大学 Small sample vertical array target distance estimation method based on transfer learning
CN113359207B (en) * 2021-06-03 2023-02-03 中国人民解放军国防科技大学 Acoustic-induced water surface micro-motion feature extraction method and device based on terahertz radar

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225432A (en) * 2019-05-10 2019-09-10 The 715th Research Institute of China Shipbuilding Industry Corporation A sonar target stereo listening method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Cheng Kai et al., "Short-term power prediction of wind farms based on feature selection and multi-level deep transfer learning," High Voltage Engineering, vol. 48, no. 2, Aug. 25, 2021, p. 499. *
Li Junhao et al., "Brain auditory perception transfer learning method for underwater target recognition," Proceedings of the 2019 Academic Conference of the Underwater Acoustics Branch of the Acoustical Society of China, Apr. 30, 2019, pp. 477-479. *
Li Hongnan et al., "Research on meta-learning methods for small-sample target recognition by UAVs," Unmanned Systems Technology, no. 6, Nov. 30, 2019, pp. 17-22. *

Also Published As

Publication number Publication date
CN114202056A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN114202056B (en) A small-sample underwater target recognition method based on deep transfer learning of multi-platform auditory perception features
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN113191178A (en) Underwater sound target identification method based on auditory perception feature deep learning
CN105488466B (en) A deep neural network and underwater acoustic target voiceprint feature extraction method
CN110459225B (en) Speaker recognition system based on CNN fusion characteristics
CN108847223B (en) A speech recognition method based on deep residual neural network
CN108171318B (en) Convolution neural network integration method based on simulated annealing-Gaussian function
CN110490265B (en) Image steganalysis method based on double-path convolution and feature fusion
CN110287770B (en) A method for matching and identifying individual targets in water based on convolutional neural network
CN111899757B (en) Single-channel voice separation method and system for target speaker extraction
CN108922513A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN109785852A (en) A kind of method and system enhancing speaker's voice
CN104778448A (en) Structure adaptive CNN (Convolutional Neural Network)-based face recognition method
CN115602152B (en) Voice enhancement method based on multi-stage attention network
CN115273814B (en) Fake voice detection method, device, computer equipment and storage medium
CN112560603A (en) Underwater sound data set expansion method based on wavelet image
CN115249479B (en) Complex speech recognition method, system and terminal for power grid dispatching based on BRNN
CN113673323A (en) An underwater target recognition method based on the joint decision system of multiple deep learning models
CN117310668A (en) Underwater acoustic target recognition method integrating attention mechanism and deep residual shrinkage network
CN120294842A (en) A machine learning seismic data denoising method based on multi-band preprocessing
CN113724732B (en) A convolutional recurrent neural network model based on the fusion of multi-head attention mechanisms
CN118522309B (en) Method and device for identifying noise sources along highway by using convolutional neural network
CN119889286A (en) Dialect recognition method based on language model
CN113782045B (en) Single-channel voice separation method for multi-scale time delay sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240911

Address after: No. 889 Zhonghua Road, Huangpu District, Shanghai 200011

Applicant after: China Shipbuilding Group Co.,Ltd.

Country or region after: China

Applicant after: 715 Research Institute of China Shipbuilding Corp.

Address before: 310023 No. 715, Pingfeng street, Xihu District, Hangzhou City, Zhejiang Province

Applicant before: The 715th Research Institute of China Shipbuilding Industry Corporation

Country or region before: China

GR01 Patent grant