
CN115762478B - Speech recognition method based on photon pulse neural network - Google Patents

Speech recognition method based on photon pulse neural network

Info

Publication number: CN115762478B (granted); application number: CN202211192057.7A
Authority: CN (China)
Prior art keywords: neuron, neural network, neurons, time, VCSEL
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN115762478A (en)
Inventors: 项水英, 张天瑞, 郭星星, 张雅慧, 郝跃
Assignee (current and original): Xidian University
Application filed by Xidian University; priority and filing date: 2022-09-28
Publication of CN115762478A (application): 2023-03-07
Publication of CN115762478B (grant): 2025-04-15


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a speech recognition method based on a photonic spiking neural network. The method preprocesses an original speech dataset to obtain the FBank feature values of the training set and of the test set; trains a convolutional spiking neural network with the training-set FBank feature values; processes the training-set and test-set FBank feature values with the trained convolutional spiking neural network to obtain the corresponding high-dimensional features; trains a photonic spiking neural network with the training-set high-dimensional features; and processes the test-set high-dimensional features with the trained photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks.

Description

Speech recognition method based on photon pulse neural network
Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method based on a photonic spiking neural network.
Background
A spiking neural network (SNN) is a new generation of biologically inspired artificial neural network model. It uses spiking neurons as its basic units, has strong support from biological fundamentals, and processes information in a way that closely resembles how the brain handles neural signals. SNNs are an effective tool for complex spatio-temporal information processing, offer better biological plausibility than traditional ANNs, and, thanks to their sparse spike coding, are also hardware-friendly and energy-efficient.
Most current schemes for speech recognition tasks are designed around spiking neural networks. For example, Method 1 (Dennis J, Tran H D, Chng E S. Overlapping sound event recognition using local spectrogram features and the generalised hough transform [J]. Pattern Recognition Letters, 2013, 34(9): 1085-1093) extracts local spectrogram features (LSF) directly from the spectrogram, converts the two-dimensional time-frequency information of the LSF into corresponding firing information in a spiking neural network, and attaches a voting system for classification. Method 2 (Dennis J, Yu Q, Tang H, et al. Temporal coding of local spectrogram features for robust sound recognition [C] // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 803-807) builds on Method 1: the extracted LSF features are first fed into an SOM (Self-Organising Map) network, the SOM output serves as the input to the spiking neural network, and learning and training use the Tempotron algorithm. Another class of schemes treats the spectrogram as a special image and borrows the processing style of convolutional neural networks. For example, Method 3 (Dong M, Huang X, Xu B. Unsupervised speech recognition through spike-timing-dependent plasticity in a convolutional spiking neural network [J]. PloS ONE, 2018, 13(11): e0204596) uses a convolutional spiking neural network as a front-end feature extractor and softmax for classification and recognition. Method 4 (Zhang Z, Liu Q. Spike-Event-Driven Deep Spiking Neural Network With Temporal Encoding [J]. IEEE Signal Processing Letters, 2021, 28: 484-488) likewise uses a convolutional spiking network for front-end feature extraction, but convolves with three kernels of different sizes and passes the extracted features to a spiking network trained with the STDBP algorithm.
However, in terms of model choice, the above speech recognition algorithms mainly adopt simple, idealized spiking neuron models in which a spike is represented by an impulse (delta) function. Such a representation lacks an intrinsic mechanism for actually generating spikes: it cannot accurately model how spikes are generated and propagated in a biological neural network, nor properties of spiking neurons such as the absolute and relative refractory periods. This greatly reduces the achievable network complexity and makes these algorithms unsuitable for larger-scale networks.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a speech recognition method based on a photonic spiking neural network. The technical problem to be solved by the invention is realized by the following technical scheme:
A speech recognition method based on a photonic spiking neural network comprises the following steps:
Step 1: preprocessing an original speech dataset to obtain FBank feature values, wherein the original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and the FBank feature values of the test set;
Step 2: constructing a convolutional spiking neural network and training it with the FBank feature values of the training set to obtain a trained convolutional spiking neural network;
Step 3: processing the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of the training set and of the test set;
Step 4: constructing a photonic spiking neural network and training it with the high-dimensional features of the training set to obtain a trained photonic spiking neural network;
Step 5: processing the high-dimensional features of the test set with the trained photonic spiking neural network to obtain a speech recognition result.
In one embodiment of the present invention, in step 1, preprocessing the original speech dataset to obtain FBank feature values comprises:
extracting valid speech segments from the original speech dataset using an endpoint detection technique;
and extracting features from the valid speech segments to obtain the FBank feature values.
In one embodiment of the present invention, step 2 comprises:
21) constructing a convolutional spiking neural network comprising a convolution layer and a pooling layer, and time-encoding the FBank feature values of the training set, converting them into firing times for the convolution-layer IF neurons;
22) acquiring the firing information of each IF neuron in the convolution layer;
23) setting suppression strategies to determine the IF neurons whose weights need to be updated;
24) based on the firing information and the suppression strategies, updating the weights of the IF neurons with the STDP algorithm to train the network;
25) repeating steps 23)-24) until a preset maximum number of training iterations is reached, obtaining the trained convolutional spiking neural network.
In one embodiment of the invention, step 23) comprises:
setting a firing suppression strategy:
if there are IF neurons with the same firing time, keeping, among them, the IF neuron whose membrane voltage at that firing time is largest;
setting an update suppression strategy:
within the same feature map, if several IF neurons fire at adjacent positions, determining the IF neuron that fires earliest, updating only its weights and leaving the remaining adjacent IF neurons unchanged; if the firing times of the adjacent IF neurons are the same, selecting the one with the largest membrane voltage for the weight update and leaving the rest unchanged.
In one embodiment of the invention, in step 24), the weights of the IF neurons are updated with the STDP algorithm as follows:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
In one embodiment of the present invention, step 4 comprises:
41) constructing a photonic spiking neural network comprising an input layer, an output layer and a discrimination layer;
42) initializing the photonic spiking neural network and time-encoding the high-dimensional features of the training set, converting them into firing times of the output-layer VCSEL neurons;
43) acquiring the firing times of the output-layer VCSEL neurons;
44) setting a discrimination criterion and updating the weights of the output-layer VCSEL neurons based on firing times to train the network;
45) repeating steps 42)-44) until a preset maximum number of training iterations is reached, obtaining the trained photonic spiking neural network.
In one embodiment of the present invention, in step 44), setting the discrimination criterion comprises:
assigning each output VCSEL neuron to one sample class, so that when a sample of that class is input, the corresponding output VCSEL neuron fires earliest and the remaining VCSEL neurons fire after it or remain silent.
In one embodiment of the present invention, in step 44), updating the weights of the output-layer VCSEL neurons based on firing times comprises:
for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, updating the weights of neuron n_ref as follows:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
In one embodiment of the present invention, in step 44), updating the weights of the output-layer VCSEL neurons based on firing times further comprises:
for any VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
if its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, updating the weights of neuron n_ref according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
if t_o is later than t_ref and the time difference between them does not exceed the set time threshold, updating the weights of neuron n_o according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold;
if t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, the weights are not updated.
In one embodiment of the present invention, step 5 comprises:
inputting the high-dimensional features of the test set into the trained photonic spiking neural network for processing;
computing the firing of the output-layer VCSEL neurons of the photonic spiking neural network;
and determining the predicted label of the current input sample according to the firing of the output-layer VCSEL neurons, completing the speech recognition.
The invention has the following beneficial effects:
1. The invention provides a speech recognition method based on a photonic spiking neural network. It first extracts speech features with a convolutional spiking neural network, which ensures that the feature values extracted by the network are discrete; it then performs encoding and recognition with a photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks;
2. When designing the photonic spiking neural network structure, the speech recognition method is based on an actual laser neuron model and takes various practical constraints into account, which makes it suitable for hardware inference.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic diagram of a speech recognition method based on a photonic spiking neural network according to an embodiment of the present invention;
FIG. 2 is an algorithm framework diagram of a speech recognition method based on a photonic spiking neural network according to an embodiment of the present invention;
FIG. 3 shows the results obtained with the speech recognition method based on the photonic spiking neural network provided by the invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
For the speech recognition task, the invention provides a learning algorithm based on a photonic spiking neural network. First, a convolutional spiking neural network extracts features from the speech signal, which keeps the extracted feature values discrete and thereby facilitates the encoding of the photonic spiking neural network; a new discrimination criterion is then proposed for the photonic spiking neural network, and a weight-update algorithm is designed around this criterion. Combining the photonic spiking neural network with this method applies photonic spiking neural networks to speech recognition and allows larger-scale standard datasets to be classified and recognized.
Specifically, please refer to fig. 1-2 in combination, fig. 1 is a schematic diagram of a voice recognition method based on a photonic pulse neural network according to an embodiment of the present invention, and fig. 2 is an algorithm frame diagram of a voice recognition method based on a photonic pulse neural network according to an embodiment of the present invention.
It should first be noted that the network architecture provided by the present invention requires a GPU for network training and testing; the host must therefore be equipped with an NVIDIA GPU. In a specific implementation, the original speech dataset can be placed in the current working directory, and the data output path set to the Result folder.
Specifically, the implementation steps of the invention include:
Step 1: preprocess the original speech dataset to obtain FBank feature values.
The original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and those of the test set.
In this embodiment, for the speech-signal preprocessing part, the key parameters are set first, and valid speech segments are extracted from the original speech dataset with an endpoint detection technique.
For example, classical double-threshold detection is applied to the original speech dataset files in wav format: valid speech segments are extracted and invalid segments are removed.
Features are then extracted from the valid speech segments to obtain the FBank feature values.
Specifically, the number of frames, Frames, and the number of filters in the mel filter bank, Mels, are set in order to extract the FBank feature values.
The valid speech segment is subjected to pre-emphasis, framing, windowing, Fourier transform and mel-filter-bank filtering to obtain the FBank feature matrix Arr_feature; rendered as an image, Arr_feature is a spectrogram and is fed into the convolutional spiking neural network as a special gray-scale image. Frames and Mels are, respectively, the number of frames used in the framing operation and the number of filters used in the mel filtering operation, and the resulting Arr_feature has size {Frames, Mels}.
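As an illustration of this chain, a minimal numpy sketch follows (not the patent's implementation: the half-overlapping framing scheme, the Hamming window, the FFT size and the log floor are assumptions, since the text fixes only Frames and Mels):

    import numpy as np

    def fbank_features(signal, sr=16000, frames=41, mels=40,
                       pre_emph=0.97, n_fft=512):
        # 1. Pre-emphasis.
        sig = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
        # 2. Split into `frames` half-overlapping, Hamming-windowed frames.
        frame_len = 2 * len(sig) // (frames + 1)
        hop = frame_len // 2
        idx = np.arange(frame_len)[None, :] + hop * np.arange(frames)[:, None]
        framed = sig[np.minimum(idx, len(sig) - 1)] * np.hamming(frame_len)
        # 3. Power spectrum via the Fourier transform.
        power = np.abs(np.fft.rfft(framed, n_fft)) ** 2 / n_fft
        # 4. Triangular mel filter bank.
        def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
        def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
        mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), mels + 2))
        bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
        fbank = np.zeros((mels, n_fft // 2 + 1))
        for m in range(1, mels + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        # 5. Log filter-bank energies: the {frames, mels} Arr_feature matrix.
        return np.log(power @ fbank.T + 1e-10)

With Frames = 41 and Mels = 40, the values used in Example 2 below, the returned matrix has size {41, 40}.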
This processing is applied to the entire speech dataset, both training set and test set, yielding the FBank feature values of the training set and of the test set.
Step 2: construct a convolutional spiking neural network and train it with the FBank feature values of the training set, obtaining the trained convolutional spiking neural network.
21) A convolutional spiking neural network comprising a convolution layer and a pooling layer is constructed, and the FBank feature values of the training set are time-encoded, converting them into firing times for the convolution-layer IF neurons.
Specifically, the convolutional spiking neural network constructed in this embodiment comprises a convolution layer and a pooling layer; the convolution layer convolves the FBank feature values, and the pooling layer extracts the higher-dimensional features produced by the convolution.
Training the convolutional spiking neural network mainly serves to determine the weights of the convolution-layer neurons.
Before training, the Arr_feature matrices of the training set obtained in step 1 must be time-encoded, converting them, in inverse proportion, into firing times of the input-layer IF neurons.
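A minimal sketch of this inverse-proportional time coding (the normalization used here is an assumption; the text fixes only that larger feature values map to earlier firing times):

    import numpy as np

    def encode_firing_times(arr_feature, t_sim=1.0):
        # Normalize the features to [0, 1], then map them inversely to time:
        # the largest value fires at t = 0, the smallest at the deadline t_sim.
        x = np.asarray(arr_feature, dtype=float)
        x = (x - x.min()) / (x.max() - x.min() + 1e-12)
        return t_sim * (1.0 - x)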
22) The firing information of each IF neuron in the convolution layer is acquired.
In this embodiment, acquiring the firing information of each IF neuron in the convolution layer involves the convolution characteristics of local connection and local weight sharing, as follows:
For a convolution-layer IF neuron, the membrane voltage and the firing moment are determined by:
V(t) = Σ_{i: t_i ≤ t} w_i · s_i
t_conv = t and v_conv = V(t), when V(t) ≥ IF_threshold
wherein s_i and w_i denote, respectively, the pulse amplitude generated by input-layer IF neuron i at time t_i and the corresponding connection weight; the pulse amplitude s_i generated by an input-layer IF neuron is set to 1.
According to the formulas above, combined with the local-connection and local-weight-sharing strategies, when the membrane voltage of a convolution-layer IF neuron reaches the firing threshold IF_threshold, its firing time and membrane voltage {t_conv, v_conv} are recorded, and the neuron generates no further pulses after firing.
Local connection and local weight sharing differ substantially from a traditional convolutional neural network, as follows:
Local connection: exploiting the harmonic correlation of the spectrogram, the convolution kernel window covers all frequency bins along the frequency axis (Mels) but keeps a local connection along the time axis (Frames). The kernel window size can be written as {Δt, f}. Unlike in a CNN, the feature map obtained by convolving the spectrogram with the kernel is therefore one-dimensional, representing local features extracted over different time periods.
Local weight sharing: building on local connection, the feature map represents local features of different time periods, so it is divided into several segments corresponding to those periods; different segments of the same feature map use different convolution kernels, while neurons within the same segment share one kernel, realizing local weight sharing.
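The firing rule of a single convolution-layer IF neuron over one such window can be sketched event-wise as follows (illustrative only; with unit spike amplitudes the membrane voltage is simply the running sum of the weights of the input spikes that have already arrived):

    import numpy as np

    def conv_if_firing(spike_times, kernel, threshold):
        # Sort the input spikes of this window by arrival time and accumulate
        # their weights; the running sum is V(t) after each arriving spike.
        order = np.argsort(spike_times, axis=None)
        t_sorted = spike_times.flatten()[order]
        v = np.cumsum(kernel.flatten()[order])
        hit = np.nonzero(v >= threshold)[0]
        if hit.size == 0:
            return None                    # the neuron never fires
        k = hit[0]
        return t_sorted[k], v[k]           # (t_conv, v_conv)

Sliding this window over the time positions of a feature map, with the segment-shared kernels described above, produces the firing information collected in step 22).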
23) Suppression strategies are set to determine the IF neurons whose weights need to be updated.
After the firing information {t_conv, v_conv} of the convolution-layer IF neurons has been obtained, suppression strategies must be set to determine the neurons whose weights will be updated, as follows:
First, a firing suppression strategy is set so that, for the same position across different feature maps, only one firing neuron may remain and the rest are suppressed. Concretely:
if there are IF neurons with the same firing time, the one whose membrane voltage at that firing time is largest is kept;
Then, an update suppression strategy is set:
within the same feature map, if several IF neurons fire at adjacent positions, the IF neuron that fires earliest is determined, only its weights are updated and the remaining adjacent IF neurons are left unchanged; if the firing times of the adjacent IF neurons are the same, the one with the largest membrane voltage is selected for the weight update and the rest are left unchanged. Both rules are illustrated in the sketch below.
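A combined sketch of the two suppression rules (a sketch under assumptions: the adjacency radius and the tie-breaking key are illustrative; the text specifies only the earliest-spike and largest-membrane-voltage criteria):

    import numpy as np

    def select_update_neurons(arr_t, arr_v, radius=1):
        # arr_t, arr_v: {positions, feature_maps} firing times and membrane
        # voltages, with np.inf in arr_t where a neuron never fired.
        t = arr_t.astype(float).copy()
        n_pos, n_map = t.shape
        # Firing suppression: per position, keep one winner across feature
        # maps, earliest spike first, ties broken by larger membrane voltage.
        for p in range(n_pos):
            if not np.isfinite(t[p]).any():
                continue
            key = t[p] - 1e-12 * arr_v[p]
            winner = int(np.argmin(key))
            keep = np.full(n_map, np.inf)
            keep[winner] = t[p, winner]
            t[p] = keep
        # Update suppression: within a feature map, a neuron updates only if
        # no neighbour within `radius` fired earlier (ties again by voltage).
        allow = np.zeros_like(t, dtype=bool)
        for m in range(n_map):
            for p in range(n_pos):
                if not np.isfinite(t[p, m]):
                    continue
                lo, hi = max(0, p - radius), min(n_pos, p + radius + 1)
                beaten = any(
                    (t[q, m] < t[p, m]) or
                    (t[q, m] == t[p, m] and arr_v[q, m] > arr_v[p, m])
                    for q in range(lo, hi) if q != p)
                allow[p, m] = not beaten
        return allow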
24) Based on the firing information and the suppression strategies, the weights of the IF neurons are updated with the STDP algorithm to train the network.
Specifically, after the neurons requiring weight updates have been determined, they are updated with the STDP algorithm, whose update expression is:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
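One STDP step per selected neuron can then be sketched as follows (the learning-rate values are assumptions; the commented-out bounded-weight factor is a variant used in the convolutional-SNN literature cited in the background, not a statement of the patent's exact rule):

    def stdp_update(w, t_pre, t_post, a_plus=0.004, a_neg=-0.003):
        # Potentiate when the input spike precedes the output spike,
        # otherwise depress.
        dw = a_plus if t_pre < t_post else a_neg
        # dw *= w * (1.0 - w)   # optional variant that keeps w in [0, 1]
        return w + dw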
25) Steps 23)-24) are repeated until the preset maximum number of training iterations is reached, giving the trained convolutional spiking neural network.
At this point the weights of the convolutional spiking neural network are determined.
Step 3: process the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of each.
31) First, initialize: the trained weights obtained in step 2 are read, and the FBank feature data of the training set and of the test set are fed into the network.
32) Then, a pooling layer is introduced into the convolutional spiking neural network to extract the higher-dimensional features.
Specifically, the firing of the convolution-layer neurons is obtained as described in step 2 (here only firing suppression is applied and no update is performed). After the convolution layer, a pooling layer is added, and the convolution-layer neurons inside each pooling window are pooled (a statistical operation), as follows:
Following the segmentation of the feature map introduced by local weight sharing in step 22), each segment is connected to a corresponding pooling-layer IF neuron and a statistical pooling operation is performed, i.e. the membrane voltage of the pooling-layer IF neuron is computed, where the spike amplitude of a convolution-layer IF neuron is 1 and the weights connecting the convolution layer to the pooling layer are all equal to 1. Computing the membrane voltage of a pooling-layer IF neuron therefore amounts to counting the convolution-layer neurons that fired in the corresponding time period.
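Because all pooling weights and spike amplitudes equal 1, this statistical pooling reduces to a per-segment spike count, as the following sketch illustrates (seg_size plays the role of Local_weight_sharing_size from Example 2 below; the array layout is an assumption):

    import numpy as np

    def statistical_pooling(arr_t, seg_size=4):
        # arr_t: {positions, maps} firing times, np.inf where no spike.
        fired = np.isfinite(arr_t).astype(int)   # 1 where a spike occurred
        p, m = fired.shape
        segs = fired[: p - p % seg_size].reshape(-1, seg_size, m)
        return segs.sum(axis=1)                  # {p // seg_size, maps}

For a {36, 50} convolution layer and seg_size = 4 this yields a {9, 50} matrix of counts in {0, …, 4}, matching the sizes used in Example 2.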
Through this convolution-layer and pooling-layer processing, the high-dimensional features of the training set and of the test set are obtained.
Step 4: construct a photonic spiking neural network and train it with the high-dimensional features of the training set, obtaining the trained photonic spiking neural network.
41) A photonic spiking neural network comprising an input layer, an output layer and a discrimination layer is constructed.
42) The photonic spiking neural network is initialized, and the high-dimensional features of the training set are time-encoded, converting them into firing times of the output-layer VCSEL neurons.
In this embodiment, initializing the photonic spiking neural network comprises setting the input-layer and output-layer sizes, initializing the training weights, and time-encoding the high-dimensional features.
Specifically, the output-layer size is determined by the dataset used, the input-layer size by the dimension of the features extracted in step 3, and the network weights are initialized.
Meanwhile, because of the statistical pooling in step 32), the obtained feature values are {f_1, f_2, …, f_n}, and each feature value must be converted into a firing time within the simulation window T.
43) The firing times of the output-layer VCSEL neurons are acquired.
Specifically, the output power P_out of an output-layer VCSEL neuron is obtained from the VCSEL neuron model, and whether the neuron fires is judged by whether P_out exceeds a threshold.
Whether or not the neuron fires, the time t_max at which P_out reaches its maximum within the simulation window T can be obtained as:
t_max = argmax_{t ≤ T} P_out(t)
The firing time t_out of a VCSEL neuron can then be expressed as:
t_out = t_max if the peak of P_out reaches the threshold (the neuron fires); otherwise the neuron does not fire and no firing time is produced.
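The peak-time readout can be sketched as follows (illustrative only: the power trace would come from numerically integrating the VCSEL neuron model, which this text does not reproduce, and np.inf stands in for the "did not fire" sentinel that Example 2 sets to 1 s):

    import numpy as np

    def vcsel_firing_time(p_out, t_axis, p_threshold):
        # t_max is where the output power peaks inside the simulation window;
        # the neuron is taken to fire at t_max only if the peak crosses the
        # threshold.
        k = int(np.argmax(p_out))
        t_max = t_axis[k]
        fired = p_out[k] >= p_threshold
        return t_max, (t_max if fired else np.inf)   # (t_max, t_out)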
44) The discrimination criterion is set, and the weights of the output-layer VCSEL neurons are updated based on firing times to train the network.
First, the discrimination criterion is set.
Since the output-layer size equals the number of sample classes in the dataset, each output VCSEL neuron is assigned to one class: when a sample of that class is input, the corresponding output VCSEL neuron should fire first, and the remaining VCSEL neurons fire after it or remain silent.
Then the update algorithm is applied.
In this embodiment, the update of the output VCSEL neuron n_ref corresponding to the current input sample and the update of the other VCSEL neurons n_o are discussed separately.
Case 1: for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, its weights need to be increased; they are updated according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
Specifically, the K function maps part of the STDP curve into the interval [0, t_thre], which gives the update expression Δw = K(Δt) a higher resolution.
Case 2: for a VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
If its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, the weights of n_ref still need to be increased, and they are updated as in Case 1.
If t_o is later than t_ref and the time difference between them does not exceed the set time threshold, the weights of neuron n_o are updated according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold; the update amplitude thus decreases as the time difference grows.
If t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, this case is considered not to affect the decision, and no weights are updated.
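Putting the three cases together, one training step over the output layer can be sketched as follows (a sketch under assumptions: the exponential form and time constant of the K kernel and the learning-rate values a1 and a2 are illustrative, not taken from the patent; only the case structure follows the text above):

    import numpy as np

    def k_curve(dt, tau=2e-9):
        # STDP-style kernel; the exponential form and tau are assumptions.
        return np.exp(-np.abs(dt) / tau)

    def update_output_weights(w, t_in, t_out, label, t_max, t_delay,
                              a1=0.01, a2=-0.01, t_thre=4e-9):
        # w: {n_inputs, n_outputs}; t_in: input firing times; t_out: output
        # firing times (np.inf = silent); t_max: per-output peak-power time.
        ref = label
        # Case 1: the correct neuron stayed silent, so strengthen it.
        if np.isinf(t_out[ref]):
            mask = t_in < t_max[ref]
            w[mask, ref] += a1 * k_curve(t_max[ref] - t_in[mask] - t_delay)
        # Reference time: its firing time, or t_max if it did not fire.
        t_ref = t_out[ref] if np.isfinite(t_out[ref]) else t_max[ref]
        for o in range(w.shape[1]):
            if o == ref:
                continue
            if t_out[o] < t_ref:
                # Case 2a: a wrong neuron fired before the correct one, so
                # strengthen the correct neuron as in Case 1.
                mask = t_in < t_max[ref]
                w[mask, ref] += a1 * k_curve(t_max[ref] - t_in[mask] - t_delay)
            elif t_out[o] - t_ref <= t_thre:
                # Case 2b: it fired after t_ref but within t_thre, weaken it.
                w[:, o] += a2 * k_curve(t_out[o] - t_ref)
            # Case 3: later than t_thre (or silent), no update.
        return w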
45) Steps 42)-44) are repeated until the preset maximum number of training iterations is reached, giving the trained photonic spiking neural network.
At this point the weights of the photonic spiking neural network are determined.
Step 5: process the high-dimensional features of the test set with the trained photonic spiking neural network to obtain the speech recognition result.
51) The high-dimensional features of the test set are input into the trained photonic spiking neural network for processing;
52) the firing of the output-layer VCSEL neurons of the photonic spiking neural network is computed;
53) according to the firing of the output-layer VCSEL neurons, combined with the discrimination criterion proposed in step 44), the predicted label of the current input sample is determined, completing the speech recognition.
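The readout in steps 52)-53) thus reduces to an earliest-spike argmin, as in this sketch (returning None corresponds to the undecidable samples discussed with FIG. 3 below):

    import numpy as np

    def predict_label(t_out):
        # Predicted class = the output VCSEL neuron that fires first; if no
        # output neuron fires at all, the sample cannot be classified.
        if np.all(np.isinf(t_out)):
            return None
        return int(np.argmin(t_out))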
The invention provides a speech recognition method based on a photonic spiking neural network. It first extracts speech features with a convolutional spiking neural network, which ensures that the feature values extracted by the network are discrete; it then performs encoding and recognition with a photonic spiking neural network to obtain the speech recognition result. The method offers low power consumption, high speed and short latency, supports higher network complexity, can classify and recognize larger-scale standard datasets, and is suitable for larger-scale networks. Moreover, when the photonic spiking neural network structure is designed, various practical constraints are taken into account on the basis of an actual laser neuron model, which makes the structure suitable for hardware inference.
Example two
The speech recognition method based on the photonic spiking neural network provided by the invention is illustrated below with a concrete example.
Step 1: preprocessing the original speech dataset
Assume the original speech dataset comprises 1600 training samples and 400 test samples; the speech data processed by the double-threshold detection method are saved.
To obtain the FBank feature values, the number of frames is set to Frames = 41 and the number of filters in the mel filter bank to Mels = 40, giving the FBank feature matrix Arr_feature with sizeof(Arr_feature) = {40, 41}. Arr_feature is flattened into a one-dimensional vector and stored in a csv file, together with the name of the speech file, which is used to recover the label information.
Step 2: training the convolutional spiking neural network
First, the Arr_feature of each speech sample is time-encoded;
then the network parameters are trained:
Combining the local-connection characteristic, the convolution kernel size is set to {40×6}, where 6 is the local connection along the time axis, with stride = 1; the resulting feature map has size {36×1}.
Combining the local-weight-sharing characteristic, the segment size within the feature map is set to Local_weight_sharing_size = 4; neurons in the same segment share one kernel, and neurons in different segments (of the same feature map) use different kernels.
The firing threshold of the convolution-layer IF neurons is set to IF_threshold = 33 and the number of feature maps to Feature_map = 50, and the membrane voltages of the convolution-layer IF neurons are computed. The convolution layer thus yields Arr_t and Arr_v matrices of size {36×50}, which store the firing information and the membrane voltage at the firing moment.
According to the suppression strategies, the neurons {t_conv, v_conv} requiring weight updates are determined from Arr_t and Arr_v and updated with the STDP update expression.
The number of training iterations of the convolutional spiking neural network is set to 6; the above steps are repeated until training ends, and the trained weights w are saved.
Step 3: acquiring the higher-dimensional features with the trained convolutional network
A pooling layer is introduced to apply statistical pooling to the convolution layer; the pooling window size is 4, the same as Local_weight_sharing_size, so the pooling layer has size {9×50}.
Features are extracted from the FBank feature values of the training set and of the test set with the trained weights, giving the high-dimensional feature value Feature; the {9×50} matrix is flattened into a one-dimensional vector {1×450} and stored.
Step 4: training the photonic spiking neural network
Because of the statistical pooling and the Local_weight_sharing_size used in the convolutional spiking network, the range of the Feature values is discrete and finite, the possible values being {0, 1, 2, 3, 4}. Inverse-proportional time coding maps them to the firing times {T, 9 ns, 8 ns, 7 ns, 6 ns}, where the simulation window is T = 15 ns; hence the firing time of an output neuron satisfies t < T, and the firing time is set to t = 1 s when a neuron does not fire.
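Written out, this concrete coding is a one-line mapping (a hypothetical helper; T = 15 ns):

    def encode_feature(f, t_sim=15e-9):
        # Pooled counts {0, 1, 2, 3, 4} map to spike times
        # {T, 9 ns, 8 ns, 7 ns, 6 ns}; a count of 0 is pushed to the
        # simulation deadline T, i.e. effectively no input spike.
        return t_sim if f == 0 else (10 - f) * 1e-9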
The input-layer size of the photonic spiking neural network is set to 450, matching the dimension of Feature, and the output-layer size to 10, matching the number of sample classes; the weights are initialized with size {450×10};
Assume the current input sample has label = 0. After the firing times {t_0, t_1, …, t_9} of the 10 output neurons have been acquired, the neuron of {t_0} and the neurons of {t_1, t_2, …, t_9} are updated separately.
For {t_0}, its weights are updated if and only if t_0 = 1 s (the neuron did not fire). Whether it fires or not, its reference time t_ref is recorded; if it did not fire, t_ref = t_max.
For {t_1, t_2, …, t_9}, if and only if the corresponding neuron fires, its weights are updated according to its firing time t_o, the t_ref of {t_0}, and the effective time difference t_thre = 4 ns.
The number of training iterations of the photonic spiking neural network is set to 50; the above steps are repeated until training ends, and the trained weights w are saved.
Step 5: performing speech recognition with the trained network
Specifically, the sample data of the test set are input, the trained weights are imported, and the sample classes are inferred. Referring to FIG. 3, FIG. 3 shows the results obtained with the speech recognition method based on the photonic spiking neural network provided by the invention.
As FIG. 3 shows, the recognition accuracy of the method reaches 93.3%, which meets expectations. A key factor limiting the recognition accuracy of the photonic spiking neural network, however, is that an output-layer VCSEL neuron may generate no pulse at all, so that the sample class cannot be determined (such predictions correspond to X on the abscissa in FIG. 3); this problem affects the accuracy far more than outright misclassification does, and it is why the photonic spiking neural network is difficult to train.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (9)

1. A speech recognition method based on a photonic spiking neural network, comprising:
Step 1: preprocessing an original speech dataset to obtain FBank feature values, wherein the original speech dataset comprises a training set and a test set, and the obtained FBank feature values comprise the FBank feature values of the training set and the FBank feature values of the test set;
Step 2: constructing a convolutional spiking neural network and training it with the FBank feature values of the training set to obtain a trained convolutional spiking neural network;
Step 3: processing the FBank feature values of the training set and of the test set with the trained convolutional spiking neural network to obtain the high-dimensional features of the training set and of the test set;
Step 4: constructing a photonic spiking neural network and training it with the high-dimensional features of the training set to obtain a trained photonic spiking neural network;
Step 5: processing the high-dimensional features of the test set with the trained photonic spiking neural network to obtain a speech recognition result;
wherein step 4 comprises:
41) constructing a photonic spiking neural network comprising an input layer, an output layer and a discrimination layer;
42) initializing the photonic spiking neural network and time-encoding the high-dimensional features of the training set, converting them into firing times of the output-layer VCSEL neurons;
43) acquiring the firing times of the output-layer VCSEL neurons;
44) setting a discrimination criterion and updating the weights of the output-layer VCSEL neurons based on firing times to train the network;
45) repeating steps 42)-44) until a preset maximum number of training iterations is reached, obtaining the trained photonic spiking neural network.
2. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein in step 1 the preprocessing of the original speech dataset to obtain FBank feature values comprises:
extracting valid speech segments from the original speech dataset using an endpoint detection technique;
and extracting features from the valid speech segments to obtain the FBank feature values.
3. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein step 2 comprises:
21) constructing a convolutional spiking neural network comprising a convolution layer and a pooling layer, and time-encoding the FBank feature values of the training set, converting them into firing times for the convolution-layer IF neurons;
22) acquiring the firing information of each IF neuron in the convolution layer;
23) setting suppression strategies to determine the IF neurons whose weights need to be updated;
24) based on the firing information and the suppression strategies, updating the weights of the IF neurons with the STDP algorithm to train the network;
25) repeating steps 23)-24) until a preset maximum number of training iterations is reached, obtaining the trained convolutional spiking neural network.
4. The speech recognition method based on a photonic spiking neural network according to claim 3, wherein step 23) comprises:
setting a firing suppression strategy:
if there are IF neurons with the same firing time, keeping, among them, the IF neuron whose membrane voltage at that firing time is largest;
setting an update suppression strategy:
within the same feature map, if several IF neurons fire at adjacent positions, determining the IF neuron that fires earliest, updating only its weights and leaving the remaining adjacent IF neurons unchanged; if the firing times of the adjacent IF neurons are the same, selecting the one with the largest membrane voltage for the weight update and leaving the rest unchanged.
5. The speech recognition method based on a photonic spiking neural network according to claim 3, wherein in step 24) the weights of the IF neurons are updated with the STDP algorithm as follows:
Δw_ij = α+, if t_i < t_j
Δw_ij = α-, if t_i ≥ t_j
wherein t_i denotes the firing time of input neuron i, t_j the firing time of output neuron j, w_ij the connection weight between neuron i and neuron j, Δw_ij the weight update between neuron i and neuron j, α+ the learning rate applied when the firing time of neuron i is earlier than that of neuron j, and α- the learning rate applied when the firing time of neuron i is equal to or later than that of neuron j.
6. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein in step 44) setting the discrimination criterion comprises:
assigning each output VCSEL neuron to one sample class, so that when a sample of that class is input, the corresponding output VCSEL neuron fires earliest and the remaining VCSEL neurons fire after it or remain silent.
7. The speech recognition method based on a photonic spiking neural network according to claim 6, wherein in step 44) updating the weights of the output-layer VCSEL neurons based on firing times comprises:
for the output VCSEL neuron n_ref corresponding to the current input sample, if and only if this neuron does not fire, updating the weights of neuron n_ref as follows:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
wherein Δw denotes the weight increment, α_1 the corresponding positive learning rate, t_max the time at which the output power of the output-layer VCSEL neuron reaches its maximum within the simulation window, t_i the firing time of input-layer neuron i, t_delay the delay of the VCSEL neuron, and K a function corresponding to the STDP curve.
8. The speech recognition method based on a photonic spiking neural network according to claim 7, wherein in step 44) updating the weights of the output-layer VCSEL neurons based on firing times further comprises:
for any VCSEL neuron n_o other than the output VCSEL neuron n_ref corresponding to the current input sample:
if its firing time t_o is earlier than the firing time t_ref of the output VCSEL neuron n_ref corresponding to the current input sample, updating the weights of neuron n_ref according to:
Δw = α_1 · K(t_max - t_i - t_delay), for t_i < t_max
if t_o is later than t_ref and the time difference between them does not exceed the set time threshold, updating the weights of neuron n_o according to:
Δw = α_2 · K(t_o - t_ref), for t_o - t_ref ≤ t_thre
wherein α_2 denotes a negative constant learning rate and t_thre the set time threshold;
if t_o is later than t_ref and the time difference between them is greater than the time threshold t_thre, the weights are not updated.
9. The speech recognition method based on a photonic spiking neural network according to claim 1, wherein step 5 comprises:
inputting the high-dimensional features of the test set into the trained photonic spiking neural network for processing;
computing the firing of the output-layer VCSEL neurons of the photonic spiking neural network;
and determining the predicted label of the current input sample according to the firing of the output-layer VCSEL neurons, completing the speech recognition.
CN202211192057.7A (filed 2022-09-28, priority 2022-09-28): Speech recognition method based on photon pulse neural network. Active; granted as CN115762478B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network

Publications (2)

Publication Number | Publication Date
CN115762478A (en) | 2023-03-07
CN115762478B (en) | 2025-04-15

Family

ID: 85352066

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211192057.7A | 2022-09-28 | 2022-09-28 | Speech recognition method based on photon pulse neural network (Active, CN115762478B (en))

Country Status (1)

Country | Link
CN | CN115762478B (en)

Families Citing this family (2)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
CN116994573B * | 2023-05-16 | 2025-09-09 | Beijing Institute of Technology (北京理工大学) | End-to-end voice recognition method and system based on impulse neural network
CN119252276B * | 2024-12-04 | 2025-03-18 | East China Jiaotong University (华东交通大学) | Unknown audio event recognition algorithm based on impulse neural network

Citations (2)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
CN109660342A * | 2018-12-24 | 2019-04-19 | 江苏亨通智能物联系统有限公司 | Wireless speech transfers net system based on quantum cryptography
CN112068555A * | 2020-08-27 | 2020-12-11 | Jiangnan University (江南大学) | A voice-controlled mobile robot based on semantic SLAM method

Family Cites Families (4)

(* Cited by examiner, † Cited by third party)

Publication number | Priority date | Publication date | Assignee | Title
US11263516B2 * | 2016-08-02 | 2022-03-01 | International Business Machines Corporation | Neural network based acoustic models for speech recognition by grouping context-dependent targets
US20210290171A1 * | 2020-03-20 | 2021-09-23 | Hi LLC | Systems and methods for noise removal in an optical measurement system
CN112633497B * | 2020-12-21 | 2023-08-18 | Sun Yat-sen University (中山大学) | A training method for convolutional spiking neural networks based on reweighted membrane voltages
CN114595807B * | 2022-03-17 | 2025-05-06 | Xidian University (西安电子科技大学) | A device for VCSEL-SA multi-layer photon pulse neural network classification


Also Published As

Publication number | Publication date
CN115762478A (en) | 2023-03-07

Similar Documents

Publication | Title
CN113287122B (en) Spiking Neural Networks
CN113780242B (en) A cross-scenario underwater acoustic target classification method based on model transfer learning
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN109448749A (en) Voice extraction method, the system, device paid attention to based on the supervised learning sense of hearing
CN113488060A (en) Voiceprint recognition method and system based on variation information bottleneck
CN115762478B (en) Speech recognition method based on photon pulse neural network
CN113488073A (en) Multi-feature fusion based counterfeit voice detection method and device
CN109741733B (en) Speech Phoneme Recognition Method Based on Consistent Routing Network
CN117636086B (en) Passive domain adaptive target detection method and device
CN113628615A (en) Speech recognition method, device, electronic device and storage medium
CN116153331A (en) Cross-domain self-adaption-based deep fake voice detection method
CN117198331B (en) An intelligent recognition method and system for underwater targets based on logarithmic ratio adjustment
CN117807502A (en) Underwater sound target identification method based on RNN structure and differential learning rate retraining
CN116663623A (en) Brain-like synapse learning method and neuromorphic hardware system of brain-like technology
CN110969203A (en) HRRP data redundancy removing method based on self-correlation and CAM network
CN115294414A (en) A new method of transforming time series data to two-dimensional image based on Shapelet Transform feature extraction
CN114037864A (en) Method and device for constructing image classification model, electronic equipment and storage medium
KR102300599B1 (en) Method and Apparatus for Determining Stress in Speech Signal Using Weight
CN115952493B (en) A black box model reverse attack method, attack device and storage medium
Liu et al. Underwater acoustic classification using wavelet scattering transform and convolutional neural network with limited dataset
Lin et al. Underwater passive target recognition based on self-supervised contrastive learning
CN117457022A (en) A method for underwater target detection and recognition based on two-way matching transfer
CN117370832A (en) Underwater sound target identification method and device based on Bayesian neural network
CN115329821A (en) Ship noise identification method based on pairing coding network and comparison learning
CN113095381A (en) Underwater sound target identification method and system based on improved DBN

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant