CN102543073A - Shanghai dialect phonetic recognition information processing method - Google Patents

Shanghai dialect phonetic recognition information processing method

Info

Publication number
CN102543073A
Authority
CN
China
Prior art keywords
voice
model
module
shanghai
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105833672A
Other languages
Chinese (zh)
Other versions
CN102543073B (en)
Inventor
陈开�
许华虎
阳诚海
施建刚
孙弘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI SHANGDA HAIRUN INFORMATION SYSTEM CO Ltd
Original Assignee
SHANGHAI SHANGDA HAIRUN INFORMATION SYSTEM CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI SHANGDA HAIRUN INFORMATION SYSTEM CO Ltd
Priority to CN201010583367.2A
Publication of CN102543073A
Application granted
Publication of CN102543073B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a Shanghai dialect phonetic recognition information processing method comprising the following steps: 1) a voice input device inputs a Shanghai dialect speech signal; 2) a preprocessing module preprocesses the input Shanghai dialect speech signal; 3) a feature extraction module extracts feature parameters reflecting the signal characteristics; 4) a training module takes training speech signals input by the user several times, obtains feature vector parameters through preprocessing and feature parameter extraction, and a feature modeling module then builds a reference model base for the training speech; 5) a recognition module compares the feature vector parameters of the input speech with the models in the reference model base for similarity and outputs the model with the highest similarity as a recognition candidate result; 6) a postprocessing module applies speech knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result; and 7) the final recognition result is output through a voice output device. Compared with the prior art, the method has advantages such as high recognition speed.

Description

A Shanghai dialect phonetic recognition information processing method
Technical field
The present invention relates to a speech recognition method, and in particular to a Shanghai dialect phonetic recognition information processing method.
Background art
The earliest work in the field of speech recognition was speaker identification, which relied mainly on simple listening and discrimination by the human ear. Speech recognition proper began with research that applied linear predictive coding (LPC) of the speech signal and dynamic time warping (DTW), mainly to isolated words and using template matching. China began speech recognition research on Mandarin in 1987; by comparison, recognition of dialects and dialectal accents has lagged behind. The Shanghai dialect differs from Mandarin in the structure of its phonetic system, its prosodic features, and its grammar, so methods for recognizing Mandarin cannot simply be applied to it. Moreover, Mandarin recognition models use the classical HMM, an approach that leads to high time and space complexity.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art described above by providing a Shanghai dialect phonetic recognition information processing method with high recognition speed.
The object of the present invention can be achieved through the following technical solution:
A Shanghai dialect phonetic recognition information processing method, characterized in that it comprises the following steps:
1) a voice input device inputs a Shanghai dialect speech signal;
2) a pre-processing module pre-processes the input Shanghai dialect speech signal;
3) a feature extraction module extracts feature parameters that reflect the characteristics of the signal;
4) a training module takes training speech signals input by the user several times, obtains feature vector parameters through pre-processing and feature parameter extraction, and then either builds a reference model base for the training speech through a feature modeling module or adaptively corrects the reference models already in the model base;
5) a recognition module compares the feature vector parameters of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
6) a post-processing module applies speech knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result;
7) the final recognition result is output through a voice output device.
The pre-processing in step 2) comprises endpoint detection on the noisy speech signal, speech framing, and pre-emphasis.
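The three pre-processing operations named above can be illustrated with a minimal sketch. The patent does not give filter coefficients, frame sizes, or a detection rule, so the pre-emphasis coefficient `alpha=0.97`, the energy-ratio threshold, and all function names below are assumptions chosen for illustration only.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def detect_endpoints(frames, threshold_ratio=0.1):
    """Hypothetical energy-based endpoint detector: return (first, last) indices of
    frames whose short-time energy exceeds threshold_ratio * max energy, else None."""
    energy = np.sum(frames ** 2, axis=1)
    if energy.size == 0 or energy.max() == 0:
        return None
    active = np.flatnonzero(energy > threshold_ratio * energy.max())
    return int(active[0]), int(active[-1])
```

A practical detector for noisy speech would typically combine energy with other cues such as the zero-crossing rate; the single energy threshold here only shows where endpoint detection sits relative to framing.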
The feature parameters reflecting the signal characteristics are extracted in step 3) as follows:
1) the pitch period, the formants, and the Mel frequency cepstral coefficients (MFCC), which are based on auditory properties, are chosen as the feature parameters;
2) the speech signal is low-pass filtered and sampled at a set sampling frequency, the short-time correlation coefficients are computed frame by frame with a set lag time, and the pitch period is finally obtained;
3) the discrete Fourier transform (DFT) of the speech signal is taken directly, and the formant parameters of the speech signal are extracted from the DFT spectrum;
4) the signal is filtered with M Mel band-pass filters, the logarithm of each filter's output is taken to obtain the log power spectrum of the band, and an inverse discrete cosine transform is applied to obtain L-dimensional Mel frequency cepstral coefficients, of which the first 12 dimensions are taken.
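Step 4) above can be sketched as follows. This is an illustrative computation rather than the patented implementation: the number of filters (`n_filters=24`), the FFT size, the triangular filter shape, the Hz-to-Mel formula, and the DCT-II cosine form of the final transform are all assumptions, since the patent only names M filters, a logarithm, and a cosine transform.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters with centres equally spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for k in range(1, n_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for b in range(left, center):          # rising edge
            fbank[k - 1, b] = (b - left) / (center - left)
        for b in range(center, right):         # falling edge
            fbank[k - 1, b] = (right - b) / (right - center)
    return fbank

def mfcc(frame, sample_rate, n_filters=24, n_coeffs=12, n_fft=512):
    """Return the first n_coeffs cepstral coefficients of one frame."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2                  # power spectrum
    band_energy = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energy = np.log(np.maximum(band_energy, 1e-10))             # floor avoids log(0)
    n = np.arange(1, n_coeffs + 1)[:, None]
    k = np.arange(1, n_filters + 1)[None, :]
    basis = np.cos(np.pi * n * (k - 0.5) / n_filters)               # cosine transform
    return basis @ log_energy
```

For example, `mfcc(frame, 8000)` on a 20 ms frame sampled at an assumed 8 kHz returns a 12-element vector, matching the "first 12 dimensions" kept in the text.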
The reference model in step 4) is a GMM and semi-continuous HMM model. The model comprises a training database of Shanghai dialect speech and a codebook generated from that database; the codebook and the training database are combined to compute the mixture weights of the acoustic model, and the GMM and semi-continuous HMM model is finally generated.
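In a semi-continuous HMM, all states share one codebook of Gaussians and differ only in their mixture weights, so the emission probability of state j is b_j(x) = sum_k c_jk N(x; mu_k, Sigma_k). The sketch below illustrates that idea under stated assumptions: diagonal covariances, state labels that are already known for each training frame, and a simple averaged-posterior weight estimate. The patent does not disclose its training procedure, so `estimate_weights` is hypothetical and stands in for whatever re-estimation the inventors used.

```python
import numpy as np

def log_gaussians(x, means, variances):
    """Log density of x under each diagonal-covariance codebook Gaussian, shape (K,)."""
    diff = x - means
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=1)

def emission_log_prob(x, weights, means, variances):
    """log b_j(x) for every state j; weights[j, k] = c_jk over the shared codebook."""
    log_n = log_gaussians(x, means, variances)
    m = log_n.max()                                   # log-sum-exp shift for stability
    return m + np.log(weights @ np.exp(log_n - m))

def estimate_weights(frames, state_ids, n_states, means, variances):
    """Hypothetical estimate: average codebook posterior of the frames in each state."""
    n_codewords = means.shape[0]
    weights = np.zeros((n_states, n_codewords))
    for x, j in zip(frames, state_ids):
        log_n = log_gaussians(x, means, variances)
        post = np.exp(log_n - log_n.max())
        weights[j] += post / post.sum()
    totals = weights.sum(axis=1, keepdims=True)
    # States with no frames fall back to uniform weights.
    return np.where(totals > 0, weights / np.maximum(totals, 1e-300), 1.0 / n_codewords)
```

Sharing the codebook means the Gaussian densities are evaluated once per frame and reused by every state, which is the usual argument for the lower cost of semi-continuous models compared with fully continuous HMMs.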
The speech knowledge processing in step 6) comprises language model, lexical, and syntactic processing.
Compared with the prior art, the present invention provides a Shanghai dialect acoustic model based on a multichannel GMM and a semi-continuous HMM. To some extent this model solves problems of the HMM such as high time and space complexity; because it is multichannel, it estimates each weight more accurately, which improves recognition speed.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a schematic diagram of the hardware structure of the present invention.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
Embodiment
As shown in Fig. 1, a Shanghai dialect phonetic recognition information processing method comprises the following steps:
Step 101: the voice input device 1 inputs a Shanghai dialect speech signal;
Step 102: the pre-processing module 21 pre-processes the input Shanghai dialect speech signal, mainly by performing endpoint detection on the noisy speech signal, speech framing, and pre-emphasis;
Step 103: the feature extraction module 22 chooses the pitch period, the formants, and the Mel frequency cepstral coefficients (MFCC), which are based on auditory properties, as the feature parameters. The pitch period carries rich tone information, while the formants and the MFCC essentially reflect the timbre of the speech and are the most important feature parameters;
Step 104: because the fundamental frequency of speech is generally below 500 Hz, and even a soprano's high C does not exceed about 1 kHz, the feature extraction module 22 filters the speech signal with a low-pass filter of 1 kHz bandwidth, samples it at a 2 kHz sampling frequency, computes the short-time correlation coefficients frame by frame with a lag time of 10 ms and a frame length of 20 ms, and finally obtains the pitch period;
Step 105: the feature extraction module 22 takes the discrete Fourier transform of the speech signal directly and extracts the formant parameters from the DFT spectrum. The raw DFT spectrum, however, is affected by the harmonics of the fundamental frequency: maxima can appear only at harmonic frequencies, so the formant measurement error is relatively large. To remove the influence of the harmonics, homomorphic deconvolution can be used: homomorphic filtering yields a smoothed spectrum, from which the formant parameters can be extracted directly by simple peak detection;
Step 106: the feature extraction module 22 filters the signal with M Mel band-pass filters. Because the effects of the components within each band are superimposed in the human ear, the energy within each filter band is summed, which gives the output power spectrum of the k-th filter. The logarithm of each filter's output is taken to obtain the log power spectrum of the band, and an inverse discrete cosine transform is applied to obtain L-dimensional MFCC. Because the first few and the last few MFCC dimensions contribute most to discriminating speech, the first 12 MFCC dimensions are usually taken;
Step 107: the training module 23 takes training speech signals input by the user several times, obtains feature vector parameters through pre-processing and feature parameter extraction, and then either builds a reference model base for the training speech through the feature modeling module or adaptively corrects the reference models already in the model base. The reference model is a GMM and semi-continuous HMM model; it comprises a training database of Shanghai dialect speech and a codebook generated from that database, and the codebook and the training database are combined to compute the mixture weights of the acoustic model and finally generate the GMM and semi-continuous HMM model;
Step 108: the recognition module 24 compares the feature vector parameters of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
Step 109: the post-processing module 25 applies speech knowledge processing to the recognition candidate result of step 108 to obtain the final recognition result;
Step 110: the final recognition result is output through the voice output device 3.
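The pitch computation of step 104 can be illustrated with a short sketch that uses the embodiment's figures: a 2 kHz sampling frequency, a 20 ms frame (40 samples), and a maximum lag of 10 ms (20 samples). It assumes the input frame has already been low-pass filtered to 1 kHz and resampled. The 2 ms minimum lag is an assumption derived from the 500 Hz upper bound on the fundamental frequency stated in that step, and the function name is hypothetical.

```python
import numpy as np

def pitch_period(frame, sample_rate=2000, max_lag_s=0.010, min_lag_s=0.002):
    """Estimate the pitch period (seconds) of one frame from the peak of its
    energy-normalized short-time autocorrelation; returns None for a silent frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                      # remove DC offset
    energy = np.dot(frame, frame)
    if energy == 0:
        return None
    min_lag = int(round(min_lag_s * sample_rate))     # 4 samples at 2 kHz
    max_lag = int(round(max_lag_s * sample_rate))     # 20 samples at 2 kHz
    lags = np.arange(min_lag, max_lag + 1)
    corr = np.array([np.dot(frame[:-lag], frame[lag:]) for lag in lags]) / energy
    return lags[int(np.argmax(corr))] / sample_rate
```

With these settings the searchable range is roughly 100 Hz to 500 Hz; a fundamental below 100 Hz has a period longer than the 10 ms maximum lag and would not be found without widening the lag range.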
As shown in Fig. 2, the hardware of the present invention comprises a voice input device 1, a processor 2, and a voice output device 3. The processor 2 comprises a pre-processing module 21, a feature extraction module 22, a training module 23, a recognition module 24, and a post-processing module 25. The voice input device 1 is connected to the pre-processing module 21; the feature extraction module 22 is connected to the training module 23 and to the recognition module 24; the training module 23 is connected to the recognition module 24; and the recognition module 24, the post-processing module 25, and the voice output device 3 are connected in sequence.

Claims (5)

1. A Shanghai dialect phonetic recognition information processing method, characterized in that it comprises the following steps:
1) a voice input device inputs a Shanghai dialect speech signal;
2) a pre-processing module pre-processes the input Shanghai dialect speech signal;
3) a feature extraction module extracts feature parameters that reflect the characteristics of the signal;
4) a training module takes training speech signals input by the user several times, obtains feature vector parameters through pre-processing and feature parameter extraction, and then either builds a reference model base for the training speech through a feature modeling module or adaptively corrects the reference models already in the model base;
5) a recognition module compares the feature vector parameters of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
6) a post-processing module applies speech knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result;
7) the final recognition result is output through a voice output device.
2. The Shanghai dialect phonetic recognition information processing method according to claim 1, characterized in that the pre-processing in step 2) comprises endpoint detection on the noisy speech signal, speech framing, and pre-emphasis.
3. The Shanghai dialect phonetic recognition information processing method according to claim 1, characterized in that the feature parameters reflecting the signal characteristics are extracted in step 3) as follows:
1) the pitch period, the formants, and the Mel frequency cepstral coefficients, which are based on auditory properties, are chosen as the feature parameters;
2) the speech signal is low-pass filtered and sampled at a set sampling frequency, the short-time correlation coefficients are computed frame by frame with a set lag time, and the pitch period is finally obtained;
3) the discrete Fourier transform (DFT) of the speech signal is taken directly, and the formant parameters of the speech signal are extracted from the DFT spectrum;
4) the signal is filtered with M Mel band-pass filters, the logarithm of each filter's output is taken to obtain the log power spectrum of the band, and an inverse discrete cosine transform is applied to obtain L-dimensional Mel frequency cepstral coefficients, of which the first 12 dimensions are taken.
4. The Shanghai dialect phonetic recognition information processing method according to claim 1, characterized in that the reference model in step 4) is a GMM and semi-continuous HMM model; the model comprises a training database of Shanghai dialect speech and a codebook generated from that database, and the codebook and the training database are combined to compute the mixture weights of the acoustic model and finally generate the GMM and semi-continuous HMM model.
5. The Shanghai dialect phonetic recognition information processing method according to claim 1, characterized in that the speech knowledge processing in step 6) comprises language model, lexical, and syntactic processing.
CN201010583367.2A 2010-12-10 2010-12-10 Shanghai dialect phonetic recognition information processing method Active CN102543073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010583367.2A CN102543073B (en) 2010-12-10 2010-12-10 Shanghai dialect phonetic recognition information processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010583367.2A CN102543073B (en) 2010-12-10 2010-12-10 Shanghai dialect phonetic recognition information processing method

Publications (2)

Publication Number Publication Date
CN102543073A true CN102543073A (en) 2012-07-04
CN102543073B CN102543073B (en) 2014-05-14

Family

ID=46349813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010583367.2A Active CN102543073B (en) 2010-12-10 2010-12-10 Shanghai dialect phonetic recognition information processing method

Country Status (1)

Country Link
CN (1) CN102543073B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440865A (en) * 2013-08-06 2013-12-11 普强信息技术(北京)有限公司 Post-processing method for voice recognition
CN103886236A (en) * 2014-03-17 2014-06-25 深圳市中兴移动通信有限公司 Acoustic control screen unlocking method and mobile terminal
CN104123934A (en) * 2014-07-23 2014-10-29 泰亿格电子(上海)有限公司 Speech composition recognition method and system
CN104183245A (en) * 2014-09-04 2014-12-03 福建星网视易信息系统有限公司 Method and device for recommending music stars with tones similar to those of singers
CN104835495A (en) * 2015-05-30 2015-08-12 宁波摩米创新工场电子科技有限公司 High-definition voice recognition system based on low pass filter
CN104900235A (en) * 2015-05-25 2015-09-09 重庆大学 Voiceprint recognition method based on pitch period mixed characteristic parameters
CN105609101A (en) * 2014-11-14 2016-05-25 现代自动车株式会社 Speech recognition system and speech recognition method
CN106328125A (en) * 2016-10-28 2017-01-11 许昌学院 Henan dialect speech recognition system
CN106448657A (en) * 2016-10-26 2017-02-22 安徽省云逸智能科技有限公司 Continuous speech recognition system for restaurant robot servant
CN106531160A (en) * 2016-10-26 2017-03-22 安徽省云逸智能科技有限公司 Continuous speech recognition system based on wordnet language model
CN106537493A (en) * 2015-09-29 2017-03-22 深圳市全圣时代科技有限公司 Speech recognition system and method, client device and cloud server
CN107248409A (en) * 2017-05-23 2017-10-13 四川欣意迈科技有限公司 A kind of multi-language translation method of dialect linguistic context
CN108922535A (en) * 2018-08-23 2018-11-30 上海华测导航技术股份有限公司 Voice interactive system and exchange method for receiver
CN110047478A (en) * 2018-01-16 2019-07-23 中国科学院声学研究所 Multicenter voice based on space characteristics compensation identifies Acoustic Modeling method and device
CN113571043A (en) * 2021-07-27 2021-10-29 广州欢城文化传媒有限公司 Dialect simulation force evaluation method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6434522B1 (en) * 1992-06-18 2002-08-13 Matsushita Electric Industrial Co Ltd Combined quantized and continuous feature vector HMM approach to speech recognition
CN1499484A (en) * 2002-11-06 2004-05-26 北京天朗语音科技有限公司 Recognition system of Chinese continuous speech
WO2007034478A2 (en) * 2005-09-20 2007-03-29 Gadi Rechlis System and method for correcting speech
CN101226742A (en) * 2007-12-05 2008-07-23 浙江大学 Voiceprint Recognition Method Based on Emotion Compensation
CN101650945A (en) * 2009-09-17 2010-02-17 浙江工业大学 Method for recognizing speaker based on multivariate core logistic regression model
WO2010019831A1 (en) * 2008-08-14 2010-02-18 21Ct, Inc. Hidden markov model for speech processing with training method
US20100161330A1 (en) * 2005-06-17 2010-06-24 Microsoft Corporation Speech models generated using competitive training, asymmetric training, and data boosting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王岐学,钱盛友,赵新民: "基于差分特征和高斯混合模型的湖南方言识别", 《计算机工程与应用》, 31 December 2009 (2009-12-31), pages 2 - 5 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440865B (en) * 2013-08-06 2016-03-30 普强信息技术(北京)有限公司 The post-processing approach of speech recognition
CN103440865A (en) * 2013-08-06 2013-12-11 普强信息技术(北京)有限公司 Post-processing method for voice recognition
CN103886236A (en) * 2014-03-17 2014-06-25 深圳市中兴移动通信有限公司 Acoustic control screen unlocking method and mobile terminal
CN104123934A (en) * 2014-07-23 2014-10-29 泰亿格电子(上海)有限公司 Speech composition recognition method and system
CN104183245A (en) * 2014-09-04 2014-12-03 福建星网视易信息系统有限公司 Method and device for recommending music stars with tones similar to those of singers
CN105609101A (en) * 2014-11-14 2016-05-25 现代自动车株式会社 Speech recognition system and speech recognition method
CN104900235A (en) * 2015-05-25 2015-09-09 重庆大学 Voiceprint recognition method based on pitch period mixed characteristic parameters
CN104835495A (en) * 2015-05-30 2015-08-12 宁波摩米创新工场电子科技有限公司 High-definition voice recognition system based on low pass filter
CN104835495B (en) * 2015-05-30 2018-05-08 宁波摩米创新工场电子科技有限公司 A kind of high definition speech recognition system based on low-pass filtering
CN106537493A (en) * 2015-09-29 2017-03-22 深圳市全圣时代科技有限公司 Speech recognition system and method, client device and cloud server
WO2017054122A1 (en) * 2015-09-29 2017-04-06 深圳市全圣时代科技有限公司 Speech recognition system and method, client device and cloud server
CN106531160A (en) * 2016-10-26 2017-03-22 安徽省云逸智能科技有限公司 Continuous speech recognition system based on wordnet language model
CN106448657A (en) * 2016-10-26 2017-02-22 安徽省云逸智能科技有限公司 Continuous speech recognition system for restaurant robot servant
CN106328125A (en) * 2016-10-28 2017-01-11 许昌学院 Henan dialect speech recognition system
CN106328125B (en) * 2016-10-28 2023-08-04 许昌学院 A Speech Recognition System for Henan Dialect
CN107248409A (en) * 2017-05-23 2017-10-13 四川欣意迈科技有限公司 A kind of multi-language translation method of dialect linguistic context
CN110047478A (en) * 2018-01-16 2019-07-23 中国科学院声学研究所 Multicenter voice based on space characteristics compensation identifies Acoustic Modeling method and device
CN110047478B (en) * 2018-01-16 2021-06-08 中国科学院声学研究所 Acoustic modeling method and device for multi-channel speech recognition based on spatial feature compensation
CN108922535A (en) * 2018-08-23 2018-11-30 上海华测导航技术股份有限公司 Voice interactive system and exchange method for receiver
CN113571043A (en) * 2021-07-27 2021-10-29 广州欢城文化传媒有限公司 Dialect simulation force evaluation method and device, electronic equipment and storage medium
CN113571043B (en) * 2021-07-27 2024-06-04 广州欢城文化传媒有限公司 Dialect simulation force evaluation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102543073B (en) 2014-05-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant