
WO2018153200A1 - Acoustic modeling method and device based on an HLSTM model, and storage medium - Google Patents

Acoustic modeling method and device based on an HLSTM model, and storage medium

Info

Publication number
WO2018153200A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
hlstm
training
state
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/073887
Other languages
English (en)
Chinese (zh)
Inventor
张鹏远
董振江
张宇
贾霞
李洁
张恒生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Publication of WO2018153200A1 publication Critical patent/WO2018153200A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/16 - Speech classification or search using artificial neural networks

Definitions

  • the present disclosure relates to the field of speech recognition technologies, and in particular, to an acoustic modeling method, apparatus, and storage medium based on a Highway Long Short Time Memory (HLSTM) model.
  • the Long Short Time Memory (LSTM) model was introduced into acoustic modeling, and the LSTM model has stronger acoustic modeling capabilities than a simple feedforward network. Due to the increasing amount of data, it is necessary to deepen the number of layers of the acoustic model neural network to improve the modeling ability. However, as the number of network layers in the LSTM model deepens, the training difficulty of the network increases, and the gradient disappears. In order to avoid the disappearance of the gradient, an HLSTM model based on the LSTM model was proposed, which introduced direct connections between memory cells in adjacent layers of the LSTM model.
  • the proposed HLSTM model enables the deeper network structure to be practically applied in the recognition system and greatly improves the recognition accuracy.
  • Although the deep HLSTM model has stronger modeling capabilities, the deepening of the layers and the introduction of new connections (the above-mentioned direct connections) also give the acoustic model a more complex network structure, so the forward calculation takes longer, eventually leading to slower decoding. Therefore, how to improve performance without increasing the complexity of the acoustic model becomes a problem to be solved.
  • Embodiments of the present disclosure provide an acoustic modeling method, apparatus, and storage medium based on an HLSTM model.
  • Embodiments of the present disclosure provide an acoustic modeling method based on an HLSTM model, including:
  • training a randomly initialized HLSTM model based on a preset function, and optimizing the training result;
  • performing forward calculation on training data through the optimized HLSTM model;
  • training a randomly initialized LSTM model based on the result of the forward calculation and the preset function, the obtained model being an acoustic model of a speech recognition system;
  • wherein the HLSTM model has the same network parameters as the LSTM model.
  • the training of the randomly initialized HLSTM model based on a preset function and the optimization of the training result include: training the randomly initialized HLSTM model using a cross entropy objective function; and
  • optimizing the HLSTM model obtained by the training according to the state-level minimum Bayesian risk criterion.
  • the cross entropy objective function is F_{CE} = -\sum_{t=1}^{N} \log p(y_t \mid X_t), where F_{CE} represents the cross entropy objective function; y_t is the annotated state of the speech feature at time t; p(y_t \mid X_t) is the output of the neural network for the state y_t given the speech feature X_t at time t; X represents the training data; and N is the total duration (number of frames) of the speech features.
  • the objective function corresponding to the state-level minimum Bayesian risk criterion is F_{SMBR} = \sum_{u} \frac{\sum_{W} p(O_u \mid S_W) P(W) A(W, W_u)}{\sum_{W'} p(O_u \mid S_{W'}) P(W')}, where W_u is the annotated text of the u-th utterance; W and W' are labels corresponding to decoding paths of the seed model; p(O_u \mid S_W) is the acoustic likelihood of O_u given the state sequence of the path W; O_u is the speech feature of the u-th utterance; S represents the state sequence of a decoding path; P(W) and P(W') are both language model probability scores; and A(W, W_u) denotes the state-level accuracy of the path W with respect to the annotation W_u.
  • the number of network layers of the HLSTM model is greater than or equal to the number of network layers of the LSTM model.
  • the training of the randomly initialized LSTM model based on the result of the forward calculation and the preset function includes: obtaining the output result of each frame from the forward calculation, and training the randomly initialized LSTM model based on the output result of each frame and the cross entropy objective function, where the labels in the cross entropy objective function are the output results of each frame obtained by the forward calculation.
  • An embodiment of the present disclosure further provides an acoustic modeling device based on an HLSTM model, including:
  • the HLSTM model processing module is configured to train the randomly initialized HLSTM model based on a preset function and optimize the training result;
  • a calculation module configured to perform forward calculation on the training data through the optimized HLSTM model;
  • the LSTM model processing module is configured to train the randomly initialized Long Short Time Memory (LSTM) model based on the forward calculation result and the preset function, the obtained model being an acoustic model of the speech recognition system;
  • wherein the HLSTM model has the same network parameters as the LSTM model.
  • the HLSTM model processing module includes:
  • a first training unit configured to train the randomly initialized HLSTM model using a cross entropy objective function
  • An optimization unit configured to optimize the HLSTM model obtained by the training according to a state-level minimum Bayesian risk criterion.
  • the cross entropy objective function is F_{CE} = -\sum_{t=1}^{N} \log p(y_t \mid X_t), where F_{CE} represents the cross entropy objective function; y_t is the annotated state of the speech feature at time t; p(y_t \mid X_t) is the output of the neural network for the state y_t given the speech feature X_t at time t; X represents the training data; and N is the total duration (number of frames) of the speech features.
  • the objective function corresponding to the state-level minimum Bayesian risk criterion is F_{SMBR} = \sum_{u} \frac{\sum_{W} p(O_u \mid S_W) P(W) A(W, W_u)}{\sum_{W'} p(O_u \mid S_{W'}) P(W')}, where W_u is the annotated text of the u-th utterance; W and W' are labels corresponding to decoding paths of the seed model; p(O_u \mid S_W) is the acoustic likelihood of O_u given the state sequence of the path W; O_u is the speech feature of the u-th utterance; S represents the state sequence of a decoding path; P(W) and P(W') are both language model probability scores; and A(W, W_u) denotes the state-level accuracy of the path W with respect to the annotation W_u.
  • the LSTM model processing module includes:
  • An obtaining unit configured to obtain an output result of each frame obtained by the forward calculation
  • a second training unit configured to train the randomly initialized LSTM model based on the output result of each frame and the cross entropy objective function, wherein the labels in the cross entropy objective function are the output results of each frame obtained by the forward calculation.
  • Embodiments of the present disclosure further provide a storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of any of the above methods.
  • the HLSTM model-based acoustic modeling method, apparatus and storage medium train a randomly initialized HLSTM model based on a preset function and optimize the training result; perform forward calculation on the training data through the optimized HLSTM model; and, based on the result of the forward calculation and the preset function, train the randomly initialized LSTM model, the obtained model being an acoustic model of the speech recognition system; wherein the HLSTM model has the same network parameters as the LSTM model.
  • the embodiment of the present disclosure transmits the network information of the optimized HLSTM model to the LSTM network through the posterior probability, thereby improving the performance of the LSTM baseline model without increasing the complexity of the model.
  • FIG. 1 is a schematic flow chart of an acoustic modeling method based on an HLSTM model according to an embodiment of the present disclosure
  • FIG. 2 is a network structure diagram of a bidirectional HLSTM model according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of an acoustic modeling device based on an HLSTM model according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an HLSTM model processing module according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an LSTM model processing module according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart of an acoustic modeling method based on an HLSTM model according to an embodiment of the present disclosure. As shown in FIG. 1 , the method includes:
  • Step 101 Train the randomly initialized HLSTM model based on a preset function, and optimize the training result.
  • Step 102 Perform forward calculation on the training data by using the optimized HLSTM model;
  • Step 103 Train the randomly initialized LSTM model based on the result of the forward calculation and the preset function, and obtain the model as an acoustic model of the speech recognition system;
  • wherein the HLSTM model has the same network parameters as the LSTM model.
  • the HLSTM model and the LSTM model may both be bidirectional or both unidirectional.
  • the network parameters may include: an input layer node number, an output layer node number, an input observation vector, a hidden layer node number, a recursive delay, and a mapping layer connected after each hidden layer.
  • the embodiment of the present disclosure transmits the network information of the optimized HLSTM model to the LSTM network through the posterior probability, thereby improving the performance of the LSTM baseline model without increasing the complexity of the model.
  • the randomly initialized HLSTM model is shown in FIG. 2, where the dotted-line box indicates the inter-layer memory cell connection (direct connection) added on the basis of the LSTM model; the connection formula is as shown in FIG. 2.
  • Because the direct connection between adjacent-layer memory cells is introduced in the HLSTM model, the problem of gradient disappearance can be avoided and the difficulty of network training is reduced, so that a deeper structure can be used in practical applications.
  • However, the number of network layers cannot be increased without limit, because a model with too many parameters over-fits relative to the amount of training data. In actual use, the number of network layers of the HLSTM model can be adjusted according to the amount of training data available.
  • the training of the randomly initialized HLSTM model based on a preset function and the optimization of the training result include: training the randomly initialized HLSTM model using a cross entropy objective function; and
  • optimizing the HLSTM model obtained by the training according to the state-level minimum Bayesian risk criterion.
  • the cross entropy objective function is F_{CE} = -\sum_{t=1}^{N} \log p(y_t \mid X_t), where F_{CE} represents the cross entropy objective function; y_t is the annotated state of the speech feature at time t; p(y_t \mid X_t) is the output of the neural network for the state y_t given the speech feature X_t at time t; X represents the training data; and N is the total duration (number of frames) of the speech features.
  • the objective function corresponding to the state-level minimum Bayesian risk criterion is F_{SMBR} = \sum_{u} \frac{\sum_{W} p(O_u \mid S_W) P(W) A(W, W_u)}{\sum_{W'} p(O_u \mid S_{W'}) P(W')}, where W_u is the annotated text of the u-th utterance; W and W' are labels corresponding to decoding paths of the seed model; p(O_u \mid S_W) is the acoustic likelihood of O_u given the state sequence of the path W; O_u is the speech feature of the u-th utterance; S represents the state sequence of a decoding path; P(W) and P(W') are both language model probability scores; and A(W, W_u) denotes the state-level accuracy of the path W with respect to the annotation W_u.
  • the number of network layers of the HLSTM model is greater than or equal to the number of network layers of the LSTM model.
  • the training of the randomly initialized LSTM model based on the result of the forward calculation and the preset function includes: obtaining the output result of each frame from the forward calculation, and training the randomly initialized LSTM model based on the output result of each frame and the cross entropy objective function, where the labels in the cross entropy objective function are the output results of each frame obtained by the forward calculation.
  • the embodiment of the present disclosure further provides an acoustic modeling device based on the HLSTM model, which is used to implement the above embodiments and specific implementations; details that have already been described are not repeated here.
  • the terms "module" and "unit" may refer to a combination of software and/or hardware that implements a predetermined function.
  • the device comprises:
  • the HLSTM model processing module 301 is configured to train the randomly initialized HLSTM model based on a preset function, and optimize the training result;
  • the calculating module 302 is configured to perform forward calculation on the training data by using the optimized HLSTM model;
  • the LSTM model processing module 303 is configured to train the randomly initialized LSTM model based on the result of the forward calculation and the preset function, and the obtained model is an acoustic model of the speech recognition system;
  • wherein the HLSTM model has the same network parameters as the LSTM model.
  • the embodiment of the present disclosure transmits the network information of the optimized HLSTM model to the LSTM network through the posterior probability, thereby improving the performance of the LSTM baseline model without increasing the complexity of the model.
  • the randomly initialized HLSTM model is shown in FIG. 2, where the dotted-line box indicates the inter-layer memory cell connection (direct connection) added on the basis of the LSTM model; the connection formula is as shown in FIG. 2.
  • Because the direct connection between adjacent-layer memory cells is introduced in the HLSTM model, the problem of gradient disappearance can be avoided and the difficulty of network training is reduced, so that a deeper structure can be used in practical applications.
  • However, the number of network layers cannot be increased without limit, because a model with too many parameters over-fits relative to the amount of training data. In actual use, the number of network layers of the HLSTM model can be adjusted according to the amount of training data available.
  • the HLSTM model processing module 301 includes:
  • the first training unit 3011 is configured to train the randomly initialized HLSTM model by using a cross entropy objective function
  • the optimization unit 3012 is configured to optimize the HLSTM model obtained by the training according to the state-level minimum Bayesian risk criterion.
  • the cross entropy objective function is F_{CE} = -\sum_{t=1}^{N} \log p(y_t \mid X_t), where F_{CE} represents the cross entropy objective function; y_t is the annotated state of the speech feature at time t; p(y_t \mid X_t) is the output of the neural network for the state y_t given the speech feature X_t at time t; X represents the training data; and N is the total duration (number of frames) of the speech features.
  • the objective function corresponding to the state-level minimum Bayesian risk criterion is F_{SMBR} = \sum_{u} \frac{\sum_{W} p(O_u \mid S_W) P(W) A(W, W_u)}{\sum_{W'} p(O_u \mid S_{W'}) P(W')}, where W_u is the annotated text of the u-th utterance; W and W' are labels corresponding to decoding paths of the seed model; p(O_u \mid S_W) is the acoustic likelihood of O_u given the state sequence of the path W; O_u is the speech feature of the u-th utterance; S represents the state sequence of a decoding path; P(W) and P(W') are both language model probability scores; and A(W, W_u) denotes the state-level accuracy of the path W with respect to the annotation W_u.
  • the LSTM model processing module 303 includes:
  • the obtaining unit 3031 is configured to acquire an output result of each frame obtained by the forward calculation
  • a second training unit 3032 configured to train the randomly initialized LSTM model based on the output result of each frame and the cross entropy objective function, wherein the labels in the cross entropy objective function are the output results of each frame obtained by the forward calculation.
  • the number of network layers of the HLSTM model is greater than or equal to the number of network layers of the LSTM model.
  • the HLSTM model processing module 301, the calculation module 302, the LSTM model processing module 303, the first training unit 3011, the optimization unit 3012, the acquisition unit 3031, and the second training unit 3032 may be implemented by a processor in the HLSTM model-based acoustic modeling device.
  • In the embodiment of the present disclosure, a deep bidirectional HLSTM model with stronger modeling capability is trained as the "teacher" model, a randomly initialized bidirectional LSTM model is used as the "student" model, and the "teacher" model is used to train the "student" model, which has a relatively small number of parameters. The specific method is described as follows:
  • the HLSTM model is randomly initialized.
  • the network structure of the HLSTM model is shown in Figure 2. Since HLSTM introduces direct connection between adjacent layer memory cells, the problem of gradient disappearance is avoided, and the difficulty of network training is reduced. Therefore, a deeper structure can be used in practical applications.
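  • The exact formula of the inter-layer connection is given in FIG. 2 and is not reproduced in this text; the following is only a minimal sketch, assuming the commonly published highway-LSTM form in which a carry gate adds a direct path from the memory cell of the layer below to the cell of the current layer. All function and variable names are illustrative, not part of this disclosure.

```python
# Minimal sketch of one HLSTM time step (assumed highway-LSTM form, not the
# patent's exact formulation, which is given only in FIG. 2).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hlstm_step(x, h_prev, c_prev, c_lower, W, U, b):
    """x: input from the layer below at time t
    h_prev, c_prev: hidden state and memory cell of this layer at time t-1
    c_lower: memory cell of the layer below at time t (highway source)
    W, U, b: dicts of weights/biases for gates i, f, o, g and carry gate d"""
    z = np.concatenate([x, h_prev])
    i = sigmoid(W["i"] @ z + b["i"])                      # input gate
    f = sigmoid(W["f"] @ z + b["f"])                      # forget gate
    o = sigmoid(W["o"] @ z + b["o"])                      # output gate
    g = np.tanh(W["g"] @ z + b["g"])                      # candidate cell
    d = sigmoid(W["d"] @ z + U["d"] * c_lower + b["d"])   # carry (highway) gate
    # Standard LSTM cell update plus the direct inter-layer cell connection.
    c = f * c_prev + i * g + d * c_lower
    h = o * np.tanh(c)
    return h, c
```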
  • However, the number of network layers cannot be increased without limit, because a model with too many parameters over-fits relative to the amount of training data. In actual use, the number of HLSTM network layers can be adjusted according to the amount of training data available.
  • the training data can be 300h (hours)
  • the HLSTM model used is 6 layers, namely: an input layer, an output layer, and four hidden layers between them.
  • the HLSTM model is iteratively updated using the CrossEntropy (CE) objective function.
  • The CE objective function formula is as follows: F_{CE} = -\sum_{t=1}^{N} \log p(y_t \mid X_t), where F_{CE} represents the cross entropy objective function; y_t is the annotated state of the speech feature at time t; p(y_t \mid X_t) is the output of the neural network for the state y_t given the speech feature X_t at time t; X represents the training data; and N is the total duration (number of frames) of the speech features.
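  • As a minimal numerical sketch of the CE objective reconstructed above, assuming the network outputs one posterior distribution over states per frame (names and shapes are illustrative only):

```python
import numpy as np

def cross_entropy_objective(posteriors, labels):
    """F_CE = -sum_t log p(y_t | X_t).

    posteriors : (N, num_states) network outputs, one distribution per frame
    labels     : (N,) annotated state index y_t for each frame t
    """
    n_frames = posteriors.shape[0]
    # Pick out p(y_t | X_t) for every frame and sum the negative log-probs.
    frame_probs = posteriors[np.arange(n_frames), labels]
    return -np.sum(np.log(frame_probs + 1e-12))  # small epsilon for stability
```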
  • the HLSTM model generated based on CE objective function training has better recognition performance.
  • the model is further optimized by the discriminative sequence-level training criterion, namely: State-level Minimum Bayes Risk (SMBR) criterion.
  • the difference from acoustic model training under the CE criterion is that the discriminative sequence-level training criterion tries to learn more class-discriminative information from the positive and negative training samples on a limited training set by optimizing a function related to the system recognition rate.
  • Its objective function is as follows: F_{SMBR} = \sum_{u} \frac{\sum_{W} p(O_u \mid S_W) P(W) A(W, W_u)}{\sum_{W'} p(O_u \mid S_{W'}) P(W')}, where W_u is the annotated text of the u-th utterance; W and W' are labels corresponding to decoding paths of the seed model; p(O_u \mid S_W) is the acoustic likelihood of O_u given the state sequence of the path W; O_u is the speech feature of the u-th utterance; S represents the state sequence of a decoding path; P(W) and P(W') are both language model probability scores; and A(W, W_u) denotes the state-level accuracy of the path W with respect to the annotation W_u.
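  • As a toy illustration of the SMBR objective above, the following sketch enumerates a handful of hypothesis paths for a single utterance instead of the decoding lattice used in practice; all inputs and names are illustrative assumptions.

```python
import numpy as np

def smbr_objective_for_utterance(acoustic_scores, lm_scores, state_accuracies):
    """Toy SMBR value for one utterance.

    acoustic_scores  : p(O_u | S_W) for each enumerated hypothesis W
    lm_scores        : P(W) language model score for each hypothesis
    state_accuracies : A(W, W_u), state-level accuracy vs. the annotation
    """
    joint = np.asarray(acoustic_scores) * np.asarray(lm_scores)
    # Posterior-weighted expected state accuracy over the hypothesis set.
    return float(np.sum(joint * np.asarray(state_accuracies)) / np.sum(joint))
```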
  • The HLSTM model trained and optimized as above, which contains the newly introduced connections, is used as the "teacher" model.
  • the information transmission method of the embodiment of the present disclosure is to perform the forward calculation using the "teacher" model, obtain the output corresponding to each input frame, use the obtained outputs as labels, and train the "student" model with the CE criterion mentioned above as the objective function; the trained LSTM model is then used as the acoustic model of the speech recognition system.
  • An advantage of embodiments of the present disclosure is to improve LSTM baseline model performance without increasing model complexity.
  • Although the HLSTM model has stronger modeling capabilities and higher recognition performance, the decoding real-time rate is also one of the indicators for evaluating the performance of a recognition system.
  • the HLSTM model has a larger parameter size and higher model complexity than the LSTM model, which inevitably slows down the decoding speed.
  • the HLSTM model network information is transmitted to the LSTM network through the posterior probability, thereby improving the performance of the LSTM baseline model.
  • the performance of the "student" model is lower than that of the "teacher" model, but still higher than the performance of a directly trained LSTM model.
  • Step 1 Extract the speech features of the training data.
  • the EM algorithm is used to iteratively update the means and variances of the GMM-HMM system, and the GMM-HMM system is then used to force-align the feature data to obtain the triphone clustering state annotations.
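  • For illustration only, the data produced by Step 1 for one utterance can be pictured as a feature matrix paired with one clustered triphone state label per frame; the feature dimension and frame count below are assumed values, not figures from this disclosure.

```python
import numpy as np

# Hypothetical result of Step 1 for one utterance: forced alignment assigns
# every feature frame one of the clustered triphone states (2821 states here,
# matching the output layer described in Step 2).
num_frames, feat_dim, num_states = 400, 40, 2821
features = np.random.randn(num_frames, feat_dim)           # speech features
alignment = np.random.randint(0, num_states, num_frames)   # state label per frame
assert features.shape[0] == alignment.shape[0]             # one label per frame
```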
  • Step 2 Train the two-way HLSTM model based on the cross entropy criterion.
  • a six-layer bidirectional HLSTM model is used, and the parameter quantity of the model is 190M.
  • the specific configuration is as follows: the input layer has 260 nodes; the input observation vector uses 2 frames of context; each of the four hidden layers has 1024 nodes, with recursive delays of 1, 2, 3 and 4 respectively; and each hidden layer is followed by a 512-dimensional mapping layer to reduce the number of parameters.
  • the number of nodes in the output layer is 2821, which corresponds to 2821 triphone clustering states.
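  • For reference, the configuration described in this step can be collected into a single snippet; the dictionary layout itself is illustrative and not part of the disclosure.

```python
# Illustrative summary of the bidirectional HLSTM configuration in Step 2.
hlstm_config = {
    "input_nodes": 260,              # input layer
    "context_frames": 2,             # frames of context in the input observation vector
    "hidden_layers": 4,
    "hidden_nodes_per_layer": 1024,
    "recursive_delays": [1, 2, 3, 4],
    "projection_dim": 512,           # mapping layer after each hidden layer
    "output_nodes": 2821,            # triphone clustering states
    "approx_parameters": "190M",
}
```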
  • Step 3 The model generated in step 2 is used as a seed model, and the bidirectional HLSTM model is iteratively updated based on the state-level minimum Bayesian risk criterion.
  • Step 4 Perform the forward calculation by using the two-way HLSTM model generated in step three to obtain the output vector.
  • Step 5 The output vectors obtained in step 4 are used as the labels corresponding to the input features, and a bidirectional LSTM model with three hidden layers and a parameter quantity of 120M is trained.
  • the network parameters of the model are consistent with the HLSTM model in step 2.
  • embodiments of the present disclosure can be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • an embodiment of the present disclosure further provides a storage medium, in particular a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method in the embodiment of the present disclosure are implemented.
  • the solution provided by the embodiment of the present disclosure trains the randomly initialized HLSTM model based on a preset function and optimizes the training result; performs forward calculation on the training data through the optimized HLSTM model; and, based on the result of the forward calculation and the preset function, trains the randomly initialized LSTM model, the obtained model being an acoustic model of the speech recognition system; wherein the HLSTM model has the same network parameters as the LSTM model.
  • the embodiment of the present disclosure transmits the network information of the optimized HLSTM model to the LSTM network through the posterior probability, thereby improving the performance of the LSTM baseline model without increasing the complexity of the model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to an acoustic modeling method and device based on an HLSTM (Highway Long Short Time Memory) model, and a storage medium. The method comprises: training a randomly initialized HLSTM model based on a preset function and optimizing the training result (101); performing forward calculation on training data by means of the optimized HLSTM model (102); and training a randomly initialized LSTM (Long Short Time Memory) model based on the result of the forward calculation and the preset function, the obtained model being an acoustic model of a speech recognition system (103), wherein the HLSTM model and the LSTM model have the same network parameters.
PCT/CN2018/073887 2017-02-21 2018-01-23 Procédé et dispositif de modélisation acoustique se basant sur un modèle hlstm, et support d'informations Ceased WO2018153200A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710094191.6 2017-02-21
CN201710094191.6A CN108461080A (zh) 2017-02-21 2017-02-21 一种基于hlstm模型的声学建模方法和装置

Publications (1)

Publication Number Publication Date
WO2018153200A1 true WO2018153200A1 (fr) 2018-08-30

Family

ID=63222056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/073887 Ceased WO2018153200A1 (fr) 2017-02-21 2018-01-23 Procédé et dispositif de modélisation acoustique se basant sur un modèle hlstm, et support d'informations

Country Status (2)

Country Link
CN (1) CN108461080A (fr)
WO (1) WO2018153200A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517679A (zh) * 2018-11-15 2019-11-29 腾讯科技(深圳)有限公司 一种人工智能的音频数据处理方法及装置、存储介质
US11158303B2 (en) 2019-08-27 2021-10-26 International Business Machines Corporation Soft-forgetting for connectionist temporal classification based automatic speech recognition

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569700B (zh) * 2018-09-26 2020-11-03 创新先进技术有限公司 优化损伤识别结果的方法及装置
CN111709513B (zh) * 2019-03-18 2023-06-09 百度在线网络技术(北京)有限公司 长短期记忆网络lstm的训练系统、方法及电子设备
CN110751941B (zh) * 2019-09-18 2023-05-26 平安科技(深圳)有限公司 语音合成模型的生成方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104538028A (zh) * 2014-12-25 2015-04-22 清华大学 一种基于深度长短期记忆循环神经网络的连续语音识别方法
CN105529023A (zh) * 2016-01-25 2016-04-27 百度在线网络技术(北京)有限公司 语音合成方法和装置
CN105810193A (zh) * 2015-01-19 2016-07-27 三星电子株式会社 训练语言模型的方法和设备及识别语言的方法和设备
CN106098059A (zh) * 2016-06-23 2016-11-09 上海交通大学 可定制语音唤醒方法及系统
CN106170800A (zh) * 2014-09-12 2016-11-30 微软技术许可有限责任公司 经由输出分布来学习学生dnn
CN106328122A (zh) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 一种利用长短期记忆模型递归神经网络的语音识别方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106170800A (zh) * 2014-09-12 2016-11-30 微软技术许可有限责任公司 经由输出分布来学习学生dnn
CN104538028A (zh) * 2014-12-25 2015-04-22 清华大学 一种基于深度长短期记忆循环神经网络的连续语音识别方法
CN105810193A (zh) * 2015-01-19 2016-07-27 三星电子株式会社 训练语言模型的方法和设备及识别语言的方法和设备
CN105529023A (zh) * 2016-01-25 2016-04-27 百度在线网络技术(北京)有限公司 语音合成方法和装置
CN106098059A (zh) * 2016-06-23 2016-11-09 上海交通大学 可定制语音唤醒方法及系统
CN106328122A (zh) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 一种利用长短期记忆模型递归神经网络的语音识别方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517679A (zh) * 2018-11-15 2019-11-29 腾讯科技(深圳)有限公司 一种人工智能的音频数据处理方法及装置、存储介质
CN110517679B (zh) * 2018-11-15 2022-03-08 腾讯科技(深圳)有限公司 一种人工智能的音频数据处理方法及装置、存储介质
US11158303B2 (en) 2019-08-27 2021-10-26 International Business Machines Corporation Soft-forgetting for connectionist temporal classification based automatic speech recognition

Also Published As

Publication number Publication date
CN108461080A (zh) 2018-08-28

Similar Documents

Publication Publication Date Title
CN106845411B (zh) 一种基于深度学习和概率图模型的视频描述生成方法
CN108733792B (zh) 一种实体关系抽取方法
CN107562792B (zh) 一种基于深度学习的问答匹配方法
Zhang et al. Combining cross-modal knowledge transfer and semi-supervised learning for speech emotion recognition
CN106649514B (zh) 用于受人启发的简单问答(hisqa)的系统和方法
CN106156003B (zh) 一种问答系统中的问句理解方法
WO2022022163A1 (fr) Procédé d'apprentissage de modèle de classification de texte, dispositif, appareil, et support de stockage
CN109065032B (zh) 一种基于深度卷积神经网络的外部语料库语音识别方法
WO2018153200A1 (fr) Procédé et dispositif de modélisation acoustique se basant sur un modèle hlstm, et support d'informations
CN116662552A (zh) 金融文本数据分类方法、装置、终端设备及介质
CN107818164A (zh) 一种智能问答方法及其系统
JP2019159654A (ja) 時系列情報の学習システム、方法およびニューラルネットワークモデル
CN108292305A (zh) 用于处理语句的方法
CN106502985A (zh) 一种用于生成标题的神经网络建模方法及装置
CN108647191B (zh) 一种基于有监督情感文本和词向量的情感词典构建方法
CN104376842A (zh) 神经网络语言模型的训练方法、装置以及语音识别方法
CN113255366B (zh) 一种基于异构图神经网络的方面级文本情感分析方法
WO2021208455A1 (fr) Procédé et système de reconnaissance de la parole par réseau neuronal orientés vers un environnement vocal domestique
US10529322B2 (en) Semantic model for tagging of word lattices
CN109036467A (zh) 基于tf-lstm的cffd提取方法、语音情感识别方法及系统
CN108846063A (zh) 确定问题答案的方法、装置、设备和计算机可读介质
CN111126040A (zh) 一种基于深度边界组合的生物医学命名实体识别方法
CN111914555B (zh) 基于Transformer结构的自动化关系抽取系统
CN108109615A (zh) 一种基于dnn的蒙古语声学模型的构造和使用方法
CN116982054A (zh) 使用前瞻树搜索的序列到序列神经网络系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18757896

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18757896

Country of ref document: EP

Kind code of ref document: A1