
CN115762478B - Speech recognition method based on photon pulse neural network - Google Patents

Speech recognition method based on photon pulse neural network

Info

Publication number: CN115762478B (granted); application number: CN202211192057.7A
Authority: CN (China)
Prior art keywords: neuron, neural network, neurons, time, VCSEL
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN115762478A (en)
Inventors: 项水英, 张天瑞, 郭星星, 张雅慧, 郝跃
Assignee (current and original): Xidian University
Application filed by Xidian University; priority and filing date: 2022-09-28
Publication of CN115762478A (application): 2023-03-07
Publication of CN115762478B (grant): 2025-04-15


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a speech recognition method based on a photonic spiking neural network. The method preprocesses an original speech dataset to obtain the FBank feature values of the training set and of the test set; trains a convolutional spiking neural network with the training-set FBank feature values; processes the training-set and test-set FBank feature values with the trained convolutional spiking neural network to obtain the corresponding high-dimensional features; trains a photonic spiking neural network with the training-set high-dimensional features; and processes the test-set high-dimensional features with the trained photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks.

Description

Speech recognition method based on photon pulse neural network
Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method based on a photonic spiking neural network.
Background
A spiking neural network (SNN) is a new generation of biologically inspired artificial neural network model. It uses spiking neurons as its basic units, has strong support from biological fundamentals, and processes information in a way that closely resembles how the brain handles neural signals. SNNs are an effective tool for complex spatio-temporal information processing, offer better biological plausibility than traditional ANNs, and, thanks to their sparse spike coding, are also hardware-friendly and energy-efficient.
Most current schemes for speech recognition tasks are designed around spiking neural networks. For example, Method 1 (Dennis J, Tran H D, Chng E S. Overlapping sound event recognition using local spectrogram features and the generalised hough transform [J]. Pattern Recognition Letters, 2013, 34(9): 1085-1093) extracts local spectrogram features (LSF) directly from the spectrogram, converts the two-dimensional time-frequency information of the LSF into corresponding firing information in a spiking neural network, and attaches a voting system for classification. Method 2 (Dennis J, Yu Q, Tang H, et al. Temporal coding of local spectrogram features for robust sound recognition [C] // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 803-807) builds on Method 1: the extracted LSF features are first fed into an SOM (Self-Organising Map) network, the SOM output serves as the input to the spiking neural network, and learning and training use the Tempotron algorithm. Another class of schemes treats the spectrogram as a special image and borrows the processing style of convolutional neural networks. For example, Method 3 (Dong M, Huang X, Xu B. Unsupervised speech recognition through spike-timing-dependent plasticity in a convolutional spiking neural network [J]. PloS ONE, 2018, 13(11): e0204596) uses a convolutional spiking neural network as a front-end feature extractor and softmax for classification and recognition. Method 4 (Zhang Z, Liu Q. Spike-Event-Driven Deep Spiking Neural Network With Temporal Encoding [J]. IEEE Signal Processing Letters, 2021, 28: 484-488) likewise uses a convolutional spiking network for front-end feature extraction, but convolves with three kernels of different sizes and passes the extracted features to a spiking network trained with the STDBP algorithm.
However, in terms of model choice, the above speech recognition algorithms mainly adopt simple, idealized spiking neuron models in which a spike is represented by an impulse (delta) function. Such a representation lacks an intrinsic mechanism for actually generating spikes: it cannot accurately model how spikes are generated and propagated in a biological neural network, nor properties of spiking neurons such as the absolute and relative refractory periods. This greatly reduces the achievable network complexity and makes these algorithms unsuitable for larger-scale networks.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a speech recognition method based on a photonic spiking neural network. The technical problem to be solved by the invention is realized by the following technical scheme:
A speech recognition method based on a photonic spiking neural network comprises the following steps:
Step 1: preprocessing an original speech dataset to obtain FBank feature values, wherein the original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and the FBank feature values of the test set;
Step 2: constructing a convolutional spiking neural network and training it with the FBank feature values of the training set to obtain a trained convolutional spiking neural network;
Step 3: processing the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of the training set and of the test set;
Step 4: constructing a photonic spiking neural network and training it with the high-dimensional features of the training set to obtain a trained photonic spiking neural network;
Step 5: processing the high-dimensional features of the test set with the trained photonic spiking neural network to obtain a speech recognition result.
In one embodiment of the present invention, in step 1, preprocessing the original speech dataset to obtain FBank feature values comprises:
extracting valid speech segments from the original speech dataset using an endpoint detection technique;
and extracting features from the valid speech segments to obtain the FBank feature values.
In one embodiment of the present invention, step 2 comprises:
21) constructing a convolutional spiking neural network comprising a convolution layer and a pooling layer, and time-encoding the FBank feature values of the training set, converting them into firing times for the convolution-layer IF neurons;
22) acquiring the firing information of each IF neuron in the convolution layer;
23) setting suppression strategies to determine the IF neurons whose weights need to be updated;
24) based on the firing information and the suppression strategies, updating the weights of the IF neurons with the STDP algorithm to train the network;
25) repeating steps 23)-24) until a preset maximum number of training iterations is reached, obtaining the trained convolutional spiking neural network.
In one embodiment of the invention, step 23) comprises:
setting a firing suppression strategy:
if there are IF neurons with the same firing time, keeping, among them, the IF neuron whose membrane voltage at that firing time is largest;
setting an update suppression strategy:
within the same feature map, if several IF neurons fire at adjacent positions, determining the IF neuron that fires earliest, updating only its weights and leaving the remaining adjacent IF neurons unchanged; if the firing times of the adjacent IF neurons are the same, selecting the one with the largest membrane voltage for the weight update and leaving the rest unchanged.
In one embodiment of the invention, in step 24), the weights of the IF neurons are updated with the STDP algorithm as follows:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
In one embodiment of the present invention, step 4 comprises:
41) constructing a photonic spiking neural network comprising an input layer, an output layer and a discrimination layer;
42) initializing the photonic spiking neural network and time-encoding the high-dimensional features of the training set, converting them into firing times of the output-layer VCSEL neurons;
43) acquiring the firing times of the output-layer VCSEL neurons;
44) setting a discrimination criterion and updating the weights of the output-layer VCSEL neurons based on firing times to train the network;
45) repeating steps 42)-44) until a preset maximum number of training iterations is reached, obtaining the trained photonic spiking neural network.
In one embodiment of the present invention, in step 44), setting the discrimination criterion comprises:
assigning each output VCSEL neuron to one sample class, so that when a sample of that class is input, the corresponding output VCSEL neuron fires earliest and the remaining VCSEL neurons fire after it or remain silent.
In one embodiment of the present invention, in step 44), updating the weights of the output-layer VCSEL neurons based on firing times comprises:
for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, updating the weights of neuron n_ref as follows:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
In one embodiment of the present invention, in step 44), updating the weights of the output-layer VCSEL neurons based on firing times further comprises:
for any VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
if its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, updating the weights of neuron n_ref according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
if t_o is later than t_ref and the time difference between them does not exceed the set time threshold, updating the weights of neuron n_o according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold;
if t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, the weights are not updated.
In one embodiment of the present invention, step 5 comprises:
inputting the high-dimensional features of the test set into the trained photonic spiking neural network for processing;
computing the firing of the output-layer VCSEL neurons of the photonic spiking neural network;
and determining the predicted label of the current input sample according to the firing of the output-layer VCSEL neurons, completing the speech recognition.
The invention has the following beneficial effects:
1. The invention provides a speech recognition method based on a photonic spiking neural network. It first extracts speech features with a convolutional spiking neural network, which ensures that the feature values extracted by the network are discrete; it then performs encoding and recognition with a photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks;
2. When designing the photonic spiking neural network structure, the speech recognition method is based on an actual laser neuron model and takes various practical constraints into account, which makes it suitable for hardware inference.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic diagram of a speech recognition method based on a photonic spiking neural network according to an embodiment of the present invention;
FIG. 2 is an algorithm framework diagram of a speech recognition method based on a photonic spiking neural network according to an embodiment of the present invention;
FIG. 3 shows the results obtained with the speech recognition method based on the photonic spiking neural network provided by the invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
For the speech recognition task, the invention provides a learning algorithm based on a photonic spiking neural network. First, a convolutional spiking neural network extracts features from the speech signal, which keeps the extracted feature values discrete and thereby facilitates the encoding of the photonic spiking neural network; a new discrimination criterion is then proposed for the photonic spiking neural network, and a weight-update algorithm is designed around this criterion. Combining the photonic spiking neural network with this method applies photonic spiking neural networks to speech recognition and allows larger-scale standard datasets to be classified and recognized.
Specifically, please refer to fig. 1-2 in combination, fig. 1 is a schematic diagram of a voice recognition method based on a photonic pulse neural network according to an embodiment of the present invention, and fig. 2 is an algorithm frame diagram of a voice recognition method based on a photonic pulse neural network according to an embodiment of the present invention.
It should first be noted that the network architecture provided by the present invention requires a GPU for network training and testing; the host must therefore be equipped with an NVIDIA GPU. In a specific implementation, the original speech dataset can be placed in the current working directory, and the data output path set to the Result folder.
Specifically, the implementation steps of the invention include:
Step 1: preprocess the original speech dataset to obtain FBank feature values.
The original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and those of the test set.
In this embodiment, for the speech-signal preprocessing part, the key parameters are set first, and valid speech segments are extracted from the original speech dataset with an endpoint detection technique.
For example, classical double-threshold detection is applied to the original speech dataset files in wav format: valid speech segments are extracted and invalid segments are removed.
Features are then extracted from the valid speech segments to obtain the FBank feature values.
Specifically, the number of frames, Frames, and the number of filters in the mel filter bank, Mels, are set in order to extract the FBank feature values.
The valid speech segment is subjected to pre-emphasis, framing, windowing, Fourier transform and mel-filter-bank filtering to obtain the FBank feature matrix Arr_feature; rendered as an image, Arr_feature is a spectrogram and is fed into the convolutional spiking neural network as a special gray-scale image. Frames and Mels are, respectively, the number of frames used in the framing operation and the number of filters used in the mel filtering operation, and the resulting Arr_feature has size {Frames, Mels}.
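As an illustration of this chain, a minimal numpy sketch follows (not the patent's implementation: the half-overlapping framing scheme, the Hamming window, the FFT size and the log floor are assumptions, since the text fixes only Frames and Mels):

    import numpy as np

    def fbank_features(signal, sr=16000, frames=41, mels=40,
                       pre_emph=0.97, n_fft=512):
        # 1. Pre-emphasis.
        sig = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
        # 2. Split into `frames` half-overlapping, Hamming-windowed frames.
        frame_len = 2 * len(sig) // (frames + 1)
        hop = frame_len // 2
        idx = np.arange(frame_len)[None, :] + hop * np.arange(frames)[:, None]
        framed = sig[np.minimum(idx, len(sig) - 1)] * np.hamming(frame_len)
        # 3. Power spectrum via the Fourier transform.
        power = np.abs(np.fft.rfft(framed, n_fft)) ** 2 / n_fft
        # 4. Triangular mel filter bank.
        def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
        def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
        mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), mels + 2))
        bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
        fbank = np.zeros((mels, n_fft // 2 + 1))
        for m in range(1, mels + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        # 5. Log filter-bank energies: the {frames, mels} Arr_feature matrix.
        return np.log(power @ fbank.T + 1e-10)

With Frames = 41 and Mels = 40, the values used in Example 2 below, the returned matrix has size {41, 40}.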
This processing is applied to the entire speech dataset, both training set and test set, yielding the FBank feature values of the training set and of the test set.
Step 2: construct a convolutional spiking neural network and train it with the FBank feature values of the training set, obtaining the trained convolutional spiking neural network.
21) A convolutional spiking neural network comprising a convolution layer and a pooling layer is constructed, and the FBank feature values of the training set are time-encoded, converting them into firing times for the convolution-layer IF neurons.
Specifically, the convolutional spiking neural network constructed in this embodiment comprises a convolution layer and a pooling layer; the convolution layer convolves the FBank feature values, and the pooling layer extracts the higher-dimensional features produced by the convolution.
Training the convolutional spiking neural network mainly serves to determine the weights of the convolution-layer neurons.
Before training, the Arr_feature matrices of the training set obtained in step 1 must be time-encoded, converting them, in inverse proportion, into firing times of the input-layer IF neurons.
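A minimal sketch of this inverse-proportional time coding (the normalization used here is an assumption; the text fixes only that larger feature values map to earlier firing times):

    import numpy as np

    def encode_firing_times(arr_feature, t_sim=1.0):
        # Normalize the features to [0, 1], then map them inversely to time:
        # the largest value fires at t = 0, the smallest at the deadline t_sim.
        x = np.asarray(arr_feature, dtype=float)
        x = (x - x.min()) / (x.max() - x.min() + 1e-12)
        return t_sim * (1.0 - x)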
22) The firing information of each IF neuron in the convolution layer is acquired.
In this embodiment, acquiring the firing information of each IF neuron in the convolution layer involves the convolution characteristics of local connection and local weight sharing, as follows:
For a convolution-layer IF neuron, the membrane voltage and the firing moment are determined by:
V(t) = Σ_{i: t_i ≤ t} w_i · s_i
t_conv = t and v_conv = V(t), when V(t) ≥ IF_threshold
wherein s_i and w_i denote, respectively, the pulse amplitude generated by input-layer IF neuron i at time t_i and the corresponding connection weight; the pulse amplitude s_i generated by an input-layer IF neuron is set to 1.
According to the formulas above, combined with the local-connection and local-weight-sharing strategies, when the membrane voltage of a convolution-layer IF neuron reaches the firing threshold IF_threshold, its firing time and membrane voltage {t_conv, v_conv} are recorded, and the neuron generates no further pulses after firing.
Local connection and local weight sharing differ substantially from a traditional convolutional neural network, as follows:
Local connection: exploiting the harmonic correlation of the spectrogram, the convolution kernel window covers all frequency bins along the frequency axis (Mels) but keeps a local connection along the time axis (Frames). The kernel window size can be written as {Δt, f}. Unlike in a CNN, the feature map obtained by convolving the spectrogram with the kernel is therefore one-dimensional, representing local features extracted over different time periods.
Local weight sharing: building on local connection, the feature map represents local features of different time periods, so it is divided into several segments corresponding to those periods; different segments of the same feature map use different convolution kernels, while neurons within the same segment share one kernel, realizing local weight sharing.
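The firing rule of a single convolution-layer IF neuron over one such window can be sketched event-wise as follows (illustrative only; with unit spike amplitudes the membrane voltage is simply the running sum of the weights of the input spikes that have already arrived):

    import numpy as np

    def conv_if_firing(spike_times, kernel, threshold):
        # Sort the input spikes of this window by arrival time and accumulate
        # their weights; the running sum is V(t) after each arriving spike.
        order = np.argsort(spike_times, axis=None)
        t_sorted = spike_times.flatten()[order]
        v = np.cumsum(kernel.flatten()[order])
        hit = np.nonzero(v >= threshold)[0]
        if hit.size == 0:
            return None                    # the neuron never fires
        k = hit[0]
        return t_sorted[k], v[k]           # (t_conv, v_conv)

Sliding this window over the time positions of a feature map, with the segment-shared kernels described above, produces the firing information collected in step 22).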
23) Suppression strategies are set to determine the IF neurons whose weights need to be updated.
After the firing information {t_conv, v_conv} of the convolution-layer IF neurons has been obtained, suppression strategies must be set to determine the neurons whose weights will be updated, as follows:
First, a firing suppression strategy is set so that, for the same position across different feature maps, only one firing neuron may remain and the rest are suppressed. Concretely:
if there are IF neurons with the same firing time, the one whose membrane voltage at that firing time is largest is kept;
Then, an update suppression strategy is set:
within the same feature map, if several IF neurons fire at adjacent positions, the IF neuron that fires earliest is determined, only its weights are updated and the remaining adjacent IF neurons are left unchanged; if the firing times of the adjacent IF neurons are the same, the one with the largest membrane voltage is selected for the weight update and the rest are left unchanged. Both rules are illustrated in the sketch below.
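A combined sketch of the two suppression rules (a sketch under assumptions: the adjacency radius and the tie-breaking key are illustrative; the text specifies only the earliest-spike and largest-membrane-voltage criteria):

    import numpy as np

    def select_update_neurons(arr_t, arr_v, radius=1):
        # arr_t, arr_v: {positions, feature_maps} firing times and membrane
        # voltages, with np.inf in arr_t where a neuron never fired.
        t = arr_t.astype(float).copy()
        n_pos, n_map = t.shape
        # Firing suppression: per position, keep one winner across feature
        # maps, earliest spike first, ties broken by larger membrane voltage.
        for p in range(n_pos):
            if not np.isfinite(t[p]).any():
                continue
            key = t[p] - 1e-12 * arr_v[p]
            winner = int(np.argmin(key))
            keep = np.full(n_map, np.inf)
            keep[winner] = t[p, winner]
            t[p] = keep
        # Update suppression: within a feature map, a neuron updates only if
        # no neighbour within `radius` fired earlier (ties again by voltage).
        allow = np.zeros_like(t, dtype=bool)
        for m in range(n_map):
            for p in range(n_pos):
                if not np.isfinite(t[p, m]):
                    continue
                lo, hi = max(0, p - radius), min(n_pos, p + radius + 1)
                beaten = any(
                    (t[q, m] < t[p, m]) or
                    (t[q, m] == t[p, m] and arr_v[q, m] > arr_v[p, m])
                    for q in range(lo, hi) if q != p)
                allow[p, m] = not beaten
        return allow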
24) Based on the firing information and the suppression strategies, the weights of the IF neurons are updated with the STDP algorithm to train the network.
Specifically, after the neurons requiring weight updates have been determined, they are updated with the STDP algorithm, whose update expression is:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
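One STDP step per selected neuron can then be sketched as follows (the learning-rate values are assumptions; the commented-out bounded-weight factor is a variant used in the convolutional-SNN literature cited in the background, not a statement of the patent's exact rule):

    def stdp_update(w, t_pre, t_post, a_plus=0.004, a_neg=-0.003):
        # Potentiate when the input spike precedes the output spike,
        # otherwise depress.
        dw = a_plus if t_pre < t_post else a_neg
        # dw *= w * (1.0 - w)   # optional variant that keeps w in [0, 1]
        return w + dw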
25) Steps 23)-24) are repeated until the preset maximum number of training iterations is reached, giving the trained convolutional spiking neural network.
At this point the weights of the convolutional spiking neural network are determined.
Step 3: process the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of each.
31) First, initialize: the trained weights obtained in step 2 are read, and the FBank feature data of the training set and of the test set are fed into the network.
32) Then, a pooling layer is introduced into the convolutional spiking neural network to extract the higher-dimensional features.
Specifically, the firing of the convolution-layer neurons is obtained as described in step 2 (here only firing suppression is applied and no update is performed). After the convolution layer, a pooling layer is added, and the convolution-layer neurons inside each pooling window are pooled (a statistical operation), as follows:
Following the segmentation of the feature map introduced by local weight sharing in step 22), each segment is connected to a corresponding pooling-layer IF neuron and a statistical pooling operation is performed, i.e. the membrane voltage of the pooling-layer IF neuron is computed, where the spike amplitude of a convolution-layer IF neuron is 1 and the weights connecting the convolution layer to the pooling layer are all equal to 1. Computing the membrane voltage of a pooling-layer IF neuron therefore amounts to counting the convolution-layer neurons that fired in the corresponding time period.
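Because all pooling weights and spike amplitudes equal 1, this statistical pooling reduces to a per-segment spike count, as the following sketch illustrates (seg_size plays the role of Local_weight_sharing_size from Example 2 below; the array layout is an assumption):

    import numpy as np

    def statistical_pooling(arr_t, seg_size=4):
        # arr_t: {positions, maps} firing times, np.inf where no spike.
        fired = np.isfinite(arr_t).astype(int)   # 1 where a spike occurred
        p, m = fired.shape
        segs = fired[: p - p % seg_size].reshape(-1, seg_size, m)
        return segs.sum(axis=1)                  # {p // seg_size, maps}

For a {36, 50} convolution layer and seg_size = 4 this yields a {9, 50} matrix of counts in {0, …, 4}, matching the sizes used in Example 2.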
Through this convolution-layer and pooling-layer processing, the high-dimensional features of the training set and of the test set are obtained.
Step 4: construct a photonic spiking neural network and train it with the high-dimensional features of the training set, obtaining the trained photonic spiking neural network.
41) A photonic spiking neural network comprising an input layer, an output layer and a discrimination layer is constructed.
42) The photonic spiking neural network is initialized, and the high-dimensional features of the training set are time-encoded, converting them into firing times of the output-layer VCSEL neurons.
In this embodiment, initializing the photonic spiking neural network comprises setting the input-layer and output-layer sizes, initializing the training weights, and time-encoding the high-dimensional features.
Specifically, the output-layer size is determined by the dataset used, the input-layer size by the dimension of the features extracted in step 3, and the network weights are initialized.
Meanwhile, because of the statistical pooling in step 32), the obtained feature values are {f_1, f_2, …, f_n}, and each feature value must be converted into a firing time within the simulation window T.
43) The firing times of the output-layer VCSEL neurons are acquired.
Specifically, the output power P_out of an output-layer VCSEL neuron is obtained from the VCSEL neuron model, and whether the neuron fires is judged by whether P_out exceeds a threshold.
Whether or not the neuron fires, the time t_max at which P_out reaches its maximum within the simulation window T can be obtained as:
t_max = argmax_{t ≤ T} P_out(t)
The firing time t_out of a VCSEL neuron can then be expressed as:
t_out = t_max if the peak of P_out reaches the threshold (the neuron fires); otherwise the neuron does not fire and no firing time is produced.
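The peak-time readout can be sketched as follows (illustrative only: the power trace would come from numerically integrating the VCSEL neuron model, which this text does not reproduce, and np.inf stands in for the "did not fire" sentinel that Example 2 sets to 1 s):

    import numpy as np

    def vcsel_firing_time(p_out, t_axis, p_threshold):
        # t_max is where the output power peaks inside the simulation window;
        # the neuron is taken to fire at t_max only if the peak crosses the
        # threshold.
        k = int(np.argmax(p_out))
        t_max = t_axis[k]
        fired = p_out[k] >= p_threshold
        return t_max, (t_max if fired else np.inf)   # (t_max, t_out)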
44) The discrimination criterion is set, and the weights of the output-layer VCSEL neurons are updated based on firing times to train the network.
First, the discrimination criterion is set.
Since the output-layer size equals the number of sample classes in the dataset, each output VCSEL neuron is assigned to one class: when a sample of that class is input, the corresponding output VCSEL neuron should fire first, and the remaining VCSEL neurons fire after it or remain silent.
Then the update algorithm is applied.
In this embodiment, the update of the output VCSEL neuron n_ref corresponding to the current input sample and the update of the other VCSEL neurons n_o are discussed separately.
Case 1: for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, its weights need to be increased; they are updated according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
Specifically, the K function maps part of the STDP curve into the interval [0, t_thre], which gives the update expression Δw = K(Δt) a higher resolution.
Case 2: for a VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
If its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, the weights of n_ref still need to be increased, and they are updated as in Case 1.
If t_o is later than t_ref and the time difference between them does not exceed the set time threshold, the weights of neuron n_o are updated according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold; the update amplitude thus decreases as the time difference grows.
If t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, this case is considered not to affect the decision, and no weights are updated.
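Putting the three cases together, one training step over the output layer can be sketched as follows (a sketch under assumptions: the exponential form and time constant of the K kernel and the learning-rate values a1 and a2 are illustrative, not taken from the patent; only the case structure follows the text above):

    import numpy as np

    def k_curve(dt, tau=2e-9):
        # STDP-style kernel; the exponential form and tau are assumptions.
        return np.exp(-np.abs(dt) / tau)

    def update_output_weights(w, t_in, t_out, label, t_max, t_delay,
                              a1=0.01, a2=-0.01, t_thre=4e-9):
        # w: {n_inputs, n_outputs}; t_in: input firing times; t_out: output
        # firing times (np.inf = silent); t_max: per-output peak-power time.
        ref = label
        # Case 1: the correct neuron stayed silent, so strengthen it.
        if np.isinf(t_out[ref]):
            mask = t_in < t_max[ref]
            w[mask, ref] += a1 * k_curve(t_max[ref] - t_in[mask] - t_delay)
        # Reference time: its firing time, or t_max if it did not fire.
        t_ref = t_out[ref] if np.isfinite(t_out[ref]) else t_max[ref]
        for o in range(w.shape[1]):
            if o == ref:
                continue
            if t_out[o] < t_ref:
                # Case 2a: a wrong neuron fired before the correct one, so
                # strengthen the correct neuron as in Case 1.
                mask = t_in < t_max[ref]
                w[mask, ref] += a1 * k_curve(t_max[ref] - t_in[mask] - t_delay)
            elif t_out[o] - t_ref <= t_thre:
                # Case 2b: it fired after t_ref but within t_thre, weaken it.
                w[:, o] += a2 * k_curve(t_out[o] - t_ref)
            # Case 3: later than t_thre (or silent), no update.
        return w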
45) Steps 42)-44) are repeated until the preset maximum number of training iterations is reached, giving the trained photonic spiking neural network.
At this point the weights of the photonic spiking neural network are determined.
Step 5: process the high-dimensional features of the test set with the trained photonic spiking neural network to obtain the speech recognition result.
51) The high-dimensional features of the test set are input into the trained photonic spiking neural network for processing;
52) the firing of the output-layer VCSEL neurons of the photonic spiking neural network is computed;
53) according to the firing of the output-layer VCSEL neurons, combined with the discrimination criterion proposed in step 44), the predicted label of the current input sample is determined, completing the speech recognition.
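The readout in steps 52)-53) thus reduces to an earliest-spike argmin, as in this sketch (returning None corresponds to the undecidable samples discussed with FIG. 3 below):

    import numpy as np

    def predict_label(t_out):
        # Predicted class = the output VCSEL neuron that fires first; if no
        # output neuron fires at all, the sample cannot be classified.
        if np.all(np.isinf(t_out)):
            return None
        return int(np.argmin(t_out))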
The invention provides a speech recognition method based on a photonic spiking neural network. It first extracts speech features with a convolutional spiking neural network, which ensures that the feature values extracted by the network are discrete; it then performs encoding and recognition with a photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks. Moreover, when the photonic spiking neural network structure is designed, various practical constraints are taken into account on the basis of an actual laser neuron model, which makes the structure suitable for hardware inference.
Example two
The speech recognition method based on the photonic spiking neural network provided by the invention is illustrated below with a concrete example.
Step 1: preprocessing the original speech dataset
Assume the original speech dataset comprises 1600 training samples and 400 test samples; the speech data processed by the double-threshold detection method are saved.
To obtain the FBank feature values, the number of frames is set to Frames = 41 and the number of filters in the mel filter bank to Mels = 40, giving the FBank feature matrix Arr_feature with sizeof(Arr_feature) = {40, 41}. Arr_feature is flattened into a one-dimensional vector and stored in a csv file, together with the name of the speech file, which is used to recover the label information.
Step 2: training the convolutional spiking neural network
First, the Arr_feature of each speech sample is time-encoded;
then the network parameters are trained:
Combining the local-connection characteristic, the convolution kernel size is set to {40×6}, where 6 is the local connection along the time axis, with stride = 1; the resulting feature map has size {36×1}.
Combining the local-weight-sharing characteristic, the segment size within the feature map is set to Local_weight_sharing_size = 4; neurons in the same segment share one kernel, and neurons in different segments (of the same feature map) use different kernels.
The firing threshold of the convolution-layer IF neurons is set to IF_threshold = 33 and the number of feature maps to Feature_map = 50, and the membrane voltages of the convolution-layer IF neurons are computed. The convolution layer thus yields Arr_t and Arr_v matrices of size {36×50}, which store the firing information and the membrane voltage at the firing moment.
According to the suppression strategies, the neurons {t_conv, v_conv} requiring weight updates are determined from Arr_t and Arr_v and updated with the STDP update expression.
The number of training iterations of the convolutional spiking neural network is set to 6; the above steps are repeated until training ends, and the trained weights w are saved.
Step 3: acquiring the higher-dimensional features with the trained convolutional network
A pooling layer is introduced to apply statistical pooling to the convolution layer; the pooling window size is 4, the same as Local_weight_sharing_size, so the pooling layer has size {9×50}.
Features are extracted from the FBank feature values of the training set and of the test set with the trained weights, giving the high-dimensional feature value Feature; the {9×50} matrix is flattened into a one-dimensional vector {1×450} and stored.
Step 4: training the photonic spiking neural network
Because of the statistical pooling and the Local_weight_sharing_size used in the convolutional spiking network, the range of the Feature values is discrete and finite, the possible values being {0, 1, 2, 3, 4}. Inverse-proportional time coding maps them to the firing times {T, 9 ns, 8 ns, 7 ns, 6 ns}, where the simulation window is T = 15 ns; hence the firing time of an output neuron satisfies t < T, and the firing time is set to t = 1 s when a neuron does not fire.
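Written out, this concrete coding is a one-line mapping (a hypothetical helper; T = 15 ns):

    def encode_feature(f, t_sim=15e-9):
        # Pooled counts {0, 1, 2, 3, 4} map to spike times
        # {T, 9 ns, 8 ns, 7 ns, 6 ns}; a count of 0 is pushed to the
        # simulation deadline T, i.e. effectively no input spike.
        return t_sim if f == 0 else (10 - f) * 1e-9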
The input-layer size of the photonic spiking neural network is set to 450, matching the dimension of Feature, and the output-layer size to 10, matching the number of sample classes; the weights are initialized with size {450×10};
Assume the current input sample has label = 0. After the firing times {t_0, t_1, …, t_9} of the 10 output neurons have been acquired, the neuron of {t_0} and the neurons of {t_1, t_2, …, t_9} are updated separately.
For {t_0}, its weights are updated if and only if t_0 = 1 s (the neuron did not fire). Whether it fires or not, its reference time t_ref is recorded; if it did not fire, t_ref = t_max.
For {t_1, t_2, …, t_9}, if and only if the corresponding neuron fires, its weights are updated according to its firing time t_o, the t_ref of {t_0}, and the effective time difference t_thre = 4 ns.
The number of training iterations of the photonic spiking neural network is set to 50; the above steps are repeated until training ends, and the trained weights w are saved.
Step 5: performing speech recognition with the trained network
Specifically, the sample data of the test set are input, the trained weights are imported, and the sample classes are inferred. Referring to FIG. 3, FIG. 3 shows the results obtained with the speech recognition method based on the photonic spiking neural network provided by the invention.
As FIG. 3 shows, the recognition accuracy of the method reaches 93.3%, which meets expectations. A key factor limiting the recognition accuracy of the photonic spiking neural network, however, is that an output-layer VCSEL neuron may generate no pulse at all, so that the sample class cannot be determined (such predictions correspond to X on the abscissa in FIG. 3); this problem affects the accuracy far more than outright misclassification does, and it is why the photonic spiking neural network is difficult to train.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (9)

1. A speech recognition method based on a photonic spiking neural network, comprising:
Step 1: preprocessing an original speech dataset to obtain FBank feature values, wherein the original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and the FBank feature values of the test set;
Step 2: constructing a convolutional spiking neural network and training it with the FBank feature values of the training set to obtain a trained convolutional spiking neural network;
Step 3: processing the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of the training set and of the test set;
Step 4: constructing a photonic spiking neural network and training it with the high-dimensional features of the training set to obtain a trained photonic spiking neural network;
Step 5: processing the high-dimensional features of the test set with the trained photonic spiking neural network to obtain a speech recognition result;
wherein step 4 comprises:
41) constructing a photonic spiking neural network comprising an input layer, an output layer and a discrimination layer;
42) initializing the photonic spiking neural network and time-encoding the high-dimensional features of the training set, converting them into firing times of the output-layer VCSEL neurons;
43) acquiring the firing times of the output-layer VCSEL neurons;
44) setting a discrimination criterion and updating the weights of the output-layer VCSEL neurons based on firing times to train the network;
45) repeating steps 42)-44) until a preset maximum number of training iterations is reached, obtaining the trained photonic spiking neural network.
2. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein in step 1 the preprocessing of the original speech dataset to obtain FBank feature values comprises:
extracting valid speech segments from the original speech dataset using an endpoint detection technique;
and extracting features from the valid speech segments to obtain the FBank feature values.
3. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein step 2 comprises:
21) constructing a convolutional spiking neural network comprising a convolution layer and a pooling layer, and time-encoding the FBank feature values of the training set, converting them into firing times for the convolution-layer IF neurons;
22) acquiring the firing information of each IF neuron in the convolution layer;
23) setting suppression strategies to determine the IF neurons whose weights need to be updated;
24) based on the firing information and the suppression strategies, updating the weights of the IF neurons with the STDP algorithm to train the network;
25) repeating steps 23)-24) until a preset maximum number of training iterations is reached, obtaining the trained convolutional spiking neural network.
4. The speech recognition method based on a photonic spiking neural network according to claim 3, wherein step 23) comprises:
setting a firing suppression strategy:
if there are IF neurons with the same firing time, keeping, among them, the IF neuron whose membrane voltage at that firing time is largest;
setting an update suppression strategy:
within the same feature map, if several IF neurons fire at adjacent positions, determining the IF neuron that fires earliest, updating only its weights and leaving the remaining adjacent IF neurons unchanged; if the firing times of the adjacent IF neurons are the same, selecting the one with the largest membrane voltage for the weight update and leaving the rest unchanged.
5. The speech recognition method based on a photonic spiking neural network according to claim 3, wherein in step 24) the weights of the IF neurons are updated with the STDP algorithm as follows:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
6. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein in step 44) setting the discrimination criterion comprises:
assigning each output VCSEL neuron to one sample class, so that when a sample of that class is input, the corresponding output VCSEL neuron fires earliest and the remaining VCSEL neurons fire after it or remain silent.
7. The speech recognition method based on a photonic spiking neural network according to claim 6, wherein in step 44) updating the weights of the output-layer VCSEL neurons based on firing times comprises:
for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, updating the weights of neuron n_ref as follows:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
8. The speech recognition method based on a photonic spiking neural network according to claim 7, wherein in step 44) updating the weights of the output-layer VCSEL neurons based on firing times further comprises:
for any VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
if its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, updating the weights of neuron n_ref according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
if t_o is later than t_ref and the time difference between them does not exceed the set time threshold, updating the weights of neuron n_o according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold;
if t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, the weights are not updated.
9. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein step 5 comprises:
inputting the high-dimensional features of the test set into the trained photonic spiking neural network for processing;
computing the firing of the output-layer VCSEL neurons of the photonic spiking neural network;
and determining the predicted label of the current input sample according to the firing of the output-layer VCSEL neurons, completing the speech recognition.
CN202211192057.7A (filed 2022-09-28, priority 2022-09-28): Speech recognition method based on photon pulse neural network. Active; granted as CN115762478B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network

Publications (2)

Publication Number | Publication Date
CN115762478A (en) | 2023-03-07
CN115762478B (en) | 2025-04-15

Family

ID: 85352066

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network (Active, CN115762478B (en))

Country Status (1)

Country | Link
CN | CN115762478B (en)

Families Citing this family (2)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
CN116994573B * | 2023-05-16 | 2025-09-09 | Beijing Institute of Technology (北京理工大学) | End-to-end voice recognition method and system based on impulse neural network
CN119252276B * | 2024-12-04 | 2025-03-18 | East China Jiaotong University (华东交通大学) | Unknown audio event recognition algorithm based on impulse neural network

Citations (2)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
CN109660342A * | 2018-12-24 | 2019-04-19 | 江苏亨通智能物联系统有限公司 | Wireless speech transfers net system based on quantum cryptography
CN112068555A * | 2020-08-27 | 2020-12-11 | Jiangnan University (江南大学) | A voice-controlled mobile robot based on semantic SLAM method

Family Cites Families (4)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
US11263516B2 * | 2016-08-02 | 2022-03-01 | International Business Machines Corporation | Neural network based acoustic models for speech recognition by grouping context-dependent targets
US20210290171A1 * | 2020-03-20 | 2021-09-23 | Hi LLC | Systems and methods for noise removal in an optical measurement system
CN112633497B * | 2020-12-21 | 2023-08-18 | Sun Yat-sen University (中山大学) | A training method for convolutional spiking neural networks based on reweighted membrane voltages
CN114595807B * | 2022-03-17 | 2025-05-06 | Xidian University (西安电子科技大学) | A device for VCSEL-SA multi-layer photon pulse neural network classification


Also Published As

Publication number | Publication date
CN115762478A (en) | 2023-03-07

Similar Documents

Publication | Title
CN113287122B (en) Spiking Neural Networks
CN113780242B (en) A cross-scenario underwater acoustic target classification method based on model transfer learning
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN109448749A (en) Voice extraction method, the system, device paid attention to based on the supervised learning sense of hearing
CN113488060A (en) Voiceprint recognition method and system based on variation information bottleneck
CN115762478B (en) Speech recognition method based on photon pulse neural network
CN113488073A (en) Multi-feature fusion based counterfeit voice detection method and device
CN109741733B (en) Speech Phoneme Recognition Method Based on Consistent Routing Network
CN117636086B (en) Passive domain adaptive target detection method and device
CN113628615A (en) Speech recognition method, device, electronic device and storage medium
CN116153331A (en) Cross-domain self-adaption-based deep fake voice detection method
CN117198331B (en) An intelligent recognition method and system for underwater targets based on logarithmic ratio adjustment
CN117807502A (en) Underwater sound target identification method based on RNN structure and differential learning rate retraining
CN116663623A (en) Brain-like synapse learning method and neuromorphic hardware system of brain-like technology
CN110969203A (en) HRRP data redundancy removing method based on self-correlation and CAM network
CN115294414A (en) A new method of transforming time series data to two-dimensional image based on Shapelet Transform feature extraction
CN114037864A (en) Method and device for constructing image classification model, electronic equipment and storage medium
KR102300599B1 (en) Method and Apparatus for Determining Stress in Speech Signal Using Weight
CN115952493B (en) A black box model reverse attack method, attack device and storage medium
Liu et al. Underwater acoustic classification using wavelet scattering transform and convolutional neural network with limited dataset
Lin et al. Underwater passive target recognition based on self-supervised contrastive learning
CN117457022A (en) A method for underwater target detection and recognition based on two-way matching transfer
CN117370832A (en) Underwater sound target identification method and device based on Bayesian neural network
CN115329821A (en) Ship noise identification method based on pairing coding network and comparison learning
CN113095381A (en) Underwater sound target identification method and system based on improved DBN

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant