CN102543073A - Shanghai dialect phonetic recognition information processing method - Google Patents
Shanghai dialect phonetic recognition information processing method
- Publication number
- CN102543073A
- Authority
- CN
- China
- Prior art keywords
- voice
- model
- module
- shanghai
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to a Shanghai dialect speech recognition information processing method comprising the following steps: 1) a voice input device inputs a Shanghai dialect speech signal; 2) a preprocessing module preprocesses the input Shanghai dialect speech signal; 3) a feature extraction module extracts feature parameters reflecting the signal's characteristics; 4) a training module applies preprocessing and feature parameter extraction to training speech signals that the user inputs several times, obtaining feature vector parameters, and a feature modeling module then builds a reference model base for the training speech; 5) a recognition module compares the feature vector parameters of the input speech with the models in the reference model base for similarity and outputs the model with the highest similarity as the recognition candidate result; 6) a postprocessing module applies speech-knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result; and 7) the final recognition result is output through a voice output device. Compared with the prior art, the method has advantages such as high recognition speed.
Description
Technical field
The present invention relates to a speech recognition method, and in particular to a Shanghai dialect speech recognition information processing method.
Background technology
The earliest work in speech recognition was speaker identification, which relied mainly on simple listening and discrimination by the human ear. Speech recognition proper began with research that applied linear predictive coding (LPC) of the speech signal and dynamic time warping (DTW), mainly to isolated words, using template matching. In China, research on speech recognition for Mandarin began in 1987, whereas recognition of dialects and dialect-accented speech has developed comparatively slowly. Shanghai dialect differs from Mandarin in its phonetic system, its prosodic features, and its grammar, so the methods used to recognize Mandarin cannot simply be applied to it. Moreover, recognition models for Mandarin use the classical HMM, which leads to high time and space complexity.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art described above by providing a Shanghai dialect speech recognition information processing method with high recognition speed.
The object of the present invention can be achieved through the following technical solution:
A Shanghai dialect speech recognition information processing method, characterized in that it comprises the following steps:
1) a voice input device inputs a Shanghai dialect speech signal;
2) a preprocessing module preprocesses the input Shanghai dialect speech signal;
3) a feature extraction module extracts feature parameters reflecting the signal's characteristics;
4) a training module takes training speech signals that the user inputs several times, obtains feature vectors through preprocessing and feature parameter extraction, and then either builds a reference model base for the training speech through a feature modeling module or adaptively corrects the reference models already in the model base;
5) a recognition module compares the feature vectors of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
6) a postprocessing module applies speech-knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result;
7) the final recognition result is output through a voice output device.
The preprocessing in step 2) comprises endpoint detection on the noisy speech signal, speech framing, and pre-emphasis.
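The three preprocessing operations named above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the pre-emphasis coefficient 0.97, the frame length and hop, the energy-ratio threshold, the order of operations (pre-emphasis, framing, then frame-level endpoint detection), and all function names are assumptions that do not appear in the source.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is a common choice, not from the patent."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def detect_endpoints(frames, threshold_ratio=0.1):
    """Return (first, last) indices of frames whose energy exceeds a fraction of the peak.

    Returns None when every frame is silent. The energy-ratio rule is an assumed,
    very simple stand-in for whatever endpoint detector the method actually uses.
    """
    energies = [sum(s * s for s in frame) for frame in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return None
    active = [i for i, e in enumerate(energies) if e > threshold_ratio * peak]
    return active[0], active[-1]

if __name__ == "__main__":
    # 50 ms of silence, 100 ms of a 200 Hz tone, 50 ms of silence, sampled at 8 kHz.
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    signal = [0.0] * 400 + tone + [0.0] * 400
    frames = split_frames(pre_emphasis(signal), frame_len=200, hop=100)
    print(detect_endpoints(frames))
```

Only the frames between the two returned indices would be passed on to feature extraction; a practical detector for noisy speech would usually add a noise-floor estimate and zero-crossing information.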
The feature parameters reflecting the signal's characteristics are extracted in step 3) as follows:
1) the pitch period, the formants, and the Mel-frequency cepstral coefficients (MFCCs), which are based on auditory characteristics, are chosen as the feature parameters;
2) the speech signal is low-pass filtered and sampled at a set sampling frequency, the short-time correlation coefficients are calculated frame by frame for the set delay times, and the pitch period is finally obtained;
3) the discrete Fourier transform (DFT) of the speech signal is computed directly, and the formant parameters of the speech signal are extracted from the DFT spectrum;
4) the signal is filtered with M Mel band-pass filters, the logarithm of each filter's output is taken to obtain the log power spectrum of each band, and an inverse discrete cosine transform is applied to obtain L-dimensional Mel-frequency cepstral coefficients, of which the first 12 dimensions are kept.
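Two of the feature computations listed above, the pitch period from short-time correlation and the Mel-frequency cepstral coefficients, can be sketched in plain Python. The sketch rests on several assumptions not stated in the source: the low-pass filtering before pitch estimation is omitted, the Mel scale uses the common 2595·log10(1 + f/700) formula, the filters are triangular, M defaults to 24, the cepstral transform is the DCT-II usually used for MFCCs (the source says "inverse discrete cosine transform"), and "the first 12 dimensions" is read as coefficients 1 to 12 with coefficient 0 dropped. Formant extraction is not shown.

```python
import cmath
import math

def pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the short-time autocorrelation."""
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None
    best_lag, best_corr = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), acceptable for one short frame)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel_filterbank(num_filters, n_fft, rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale up to rate/2."""
    top = hz_to_mel(rate / 2)
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / rate
             for i in range(num_filters + 2)]  # fractional DFT-bin positions
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((k - left) / (center - left), (right - k) / (right - center)))
                     for k in range(n_fft // 2 + 1)])
    return bank

def mfcc(frame, rate, num_filters=24, num_coeffs=12):
    """Log Mel filterbank energies followed by a DCT-II; keeps coefficients 1..num_coeffs."""
    spectrum = power_spectrum(frame)
    log_energies = [math.log(max(sum(w * p for w, p in zip(filt, spectrum)), 1e-10))
                    for filt in mel_filterbank(num_filters, len(frame), rate)]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(1, num_coeffs + 1)]

if __name__ == "__main__":
    rate = 8000
    frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(200)]
    print(pitch_period(frame, min_lag=20, max_lag=100))  # lag in samples
    print(len(mfcc(frame, rate)))
```

Dividing the sampling rate by the returned lag gives the fundamental frequency in hertz. A real front end would window each frame and use an FFT instead of the quadratic-time DFT shown here.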
The reference model in step 4) is a GMM and semi-continuous HMM model. The model comprises a training database of Shanghai dialect speech and a codebook generated from that database; the mixture weights of the acoustic model are calculated by combining the codebook with the training database, and the GMM and semi-continuous HMM model is finally generated.
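The source gives no formulas for this model, so the following is only a sketch of the textbook semi-continuous (tied-mixture) idea it appears to describe: all states share one codebook of Gaussians, and each state j keeps only its own mixture weights c_jk, so that b_j(x) = Σ_k c_jk · N(x; μ_k, σ_k²). The diagonal covariances, the single posterior-averaging pass used to estimate the weights, and every function name are assumptions; the patent's multi-channel weight estimation is not reproduced here.

```python
import math

def diag_gaussian(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def estimate_mixture_weights(frames, codebook):
    """Estimate one state's mixture weights by averaging codebook posteriors over its frames.

    codebook is a list of (mean, variance) pairs shared by every state.
    Falls back to uniform weights if no frame has measurable density.
    """
    totals = [0.0] * len(codebook)
    for x in frames:
        densities = [diag_gaussian(x, mean, var) for mean, var in codebook]
        norm = sum(densities)
        if norm > 0.0:
            for k, d in enumerate(densities):
                totals[k] += d / norm
    total = sum(totals)
    if total == 0.0:
        return [1.0 / len(codebook)] * len(codebook)
    return [t / total for t in totals]

def emission_probability(x, weights, codebook):
    """b_j(x) = sum_k c_jk * N(x; mu_k, var_k), with the Gaussians shared across states."""
    return sum(w * diag_gaussian(x, mean, var) for w, (mean, var) in zip(weights, codebook))

if __name__ == "__main__":
    # Hypothetical one-dimensional codebook with two entries.
    codebook = [([0.0], [1.0]), ([10.0], [1.0])]
    weights = estimate_mixture_weights([[0.0], [0.1], [-0.2]], codebook)
    print(weights)
    print(emission_probability([0.0], weights, codebook))
```

Because the Gaussians are shared, their densities for a frame can be computed once and reused by every state, which is the usual reason a semi-continuous model is cheaper than a fully continuous HMM.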
The speech-knowledge processing in step 6) comprises language-model, lexical, and syntactic processing.
Compared with the prior art, the present invention provides a Shanghai dialect speech model based on a multi-channel GMM and semi-continuous HMM. To some extent this model alleviates problems of the HMM such as high time and space complexity; on a multi-channel basis each weight is estimated more accurately, and recognition speed is improved.
Description of drawings
Fig. 1 is a flowchart of the present invention;
Fig. 2 is a schematic diagram of the hardware structure of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, a Shanghai dialect speech recognition information processing method comprises the following steps:
Step 102: the preprocessing module 21 preprocesses the input Shanghai dialect speech signal, mainly by performing endpoint detection on the noisy speech signal, speech framing, and pre-emphasis;
Step 108: the recognition module 24 compares the feature vectors of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
Step 109: the postprocessing module 25 applies speech-knowledge processing to the recognition candidate result of step 108 to obtain the final recognition result;
Step 110: the final recognition result is output through the voice output device 3.
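The similarity comparison of step 108 amounts to scoring the input against every reference model and keeping the best one. A minimal sketch follows, in which the similarity measure (negative Euclidean distance to a stored template vector), the labels, and the template values are all illustrative assumptions; with the GMM and semi-continuous HMM reference models of the method, the score would more plausibly be a model likelihood.

```python
import math

def similarity(features, template):
    """Negative Euclidean distance: a larger value means more similar (illustrative choice)."""
    return -math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))

def recognize(features, reference_models):
    """Return (label, score) for the reference model most similar to the input features."""
    best_label = max(reference_models,
                     key=lambda label: similarity(features, reference_models[label]))
    return best_label, similarity(features, reference_models[best_label])

if __name__ == "__main__":
    # Hypothetical reference model base with two template vectors.
    reference_models = {"word_a": [1.0, 0.0], "word_b": [0.0, 1.0]}
    print(recognize([0.9, 0.2], reference_models))
```

The returned label plays the role of the recognition candidate result that step 109 then refines.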
As shown in Fig. 2, the hardware of the present invention comprises a voice input device 1, a processor 2, and a voice output device 3. The processor 2 comprises a preprocessing module 21, a feature extraction module 22, a training module 23, a recognition module 24, and a postprocessing module 25. The voice input device 1 is connected to the preprocessing module 21; the feature extraction module 22 is connected to the training module 23 and to the recognition module 24; the training module 23 is connected to the recognition module 24; and the recognition module 24, the postprocessing module 25, and the voice output device 3 are connected in sequence.
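The module connections of Fig. 2 can be mirrored in software by passing each module in as a callable. The sketch below is one possible arrangement under stated assumptions: the class name, method names, type aliases, and the toy stand-in modules in the usage example are hypothetical, and the link from the preprocessing module 21 to the feature extraction module 22 is inferred from the order of the method steps.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Signal = List[float]
Features = List[float]

@dataclass
class Processor:
    """Processor 2 of Fig. 2, with its five modules supplied as plain callables."""
    preprocess: Callable[[Signal], Signal]                # preprocessing module 21
    extract: Callable[[Signal], Features]                 # feature extraction module 22
    train: Callable[[List[Features]], Any]                # training module 23
    recognize: Callable[[Features, Dict[str, Any]], str]  # recognition module 24
    postprocess: Callable[[str], str]                     # postprocessing module 25
    models: Dict[str, Any] = field(default_factory=dict)  # reference model base

    def enroll(self, label: str, recordings: List[Signal]) -> None:
        """Training path 21 -> 22 -> 23: build one reference model from several recordings."""
        feats = [self.extract(self.preprocess(r)) for r in recordings]
        self.models[label] = self.train(feats)

    def run(self, signal: Signal) -> str:
        """Recognition path 21 -> 22 -> 24 -> 25; the result would go to output device 3."""
        feats = self.extract(self.preprocess(signal))
        return self.postprocess(self.recognize(feats, self.models))

def mean_vector(vectors: List[Features]) -> Features:
    """Toy training module: the reference model is the mean feature vector."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def nearest(feats: Features, models: Dict[str, Any]) -> str:
    """Toy recognition module: pick the label whose model is closest to the features."""
    return min(models, key=lambda lbl: sum((f - m) ** 2 for f, m in zip(feats, models[lbl])))

if __name__ == "__main__":
    proc = Processor(
        preprocess=lambda s: s,
        extract=lambda s: [sum(s) / len(s), max(s) - min(s)],
        train=mean_vector,
        recognize=nearest,
        postprocess=str.upper,
    )
    proc.enroll("low", [[0.0, 0.1], [0.1, 0.2]])
    proc.enroll("high", [[0.9, 1.0], [1.0, 1.1]])
    print(proc.run([0.95, 1.05]))
```

Keeping the modules as injected callables lets the training path and the recognition path share the same preprocessing and feature extraction, which matches the way module 22 feeds both module 23 and module 24 in the figure.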
Claims (5)
1. A Shanghai dialect speech recognition information processing method, characterized in that it comprises the following steps:
1) a voice input device inputs a Shanghai dialect speech signal;
2) a preprocessing module preprocesses the input Shanghai dialect speech signal;
3) a feature extraction module extracts feature parameters reflecting the signal's characteristics;
4) a training module takes training speech signals that the user inputs several times, obtains feature vectors through preprocessing and feature parameter extraction, and then either builds a reference model base for the training speech through a feature modeling module or adaptively corrects the reference models already in the model base;
5) a recognition module compares the feature vectors of the input speech with the models in the reference model base for similarity, and outputs the model with the highest similarity as the recognition candidate result;
6) a postprocessing module applies speech-knowledge processing to the recognition candidate result of step 5) to obtain the final recognition result;
7) the final recognition result is output through a voice output device.
2. The Shanghai dialect speech recognition information processing method according to claim 1, characterized in that the preprocessing in step 2) comprises endpoint detection on the noisy speech signal, speech framing, and pre-emphasis.
3. The Shanghai dialect speech recognition information processing method according to claim 1, characterized in that the feature parameters reflecting the signal's characteristics are extracted in step 3) as follows:
1) the pitch period, the formants, and the Mel-frequency cepstral coefficients (MFCCs), which are based on auditory characteristics, are chosen as the feature parameters;
2) the speech signal is low-pass filtered and sampled at a set sampling frequency, the short-time correlation coefficients are calculated frame by frame for the set delay times, and the pitch period is finally obtained;
3) the discrete Fourier transform (DFT) of the speech signal is computed directly, and the formant parameters of the speech signal are extracted from the DFT spectrum;
4) the signal is filtered with M Mel band-pass filters, the logarithm of each filter's output is taken to obtain the log power spectrum of each band, and an inverse discrete cosine transform is applied to obtain L-dimensional Mel-frequency cepstral coefficients, of which the first 12 dimensions are kept.
4. The Shanghai dialect speech recognition information processing method according to claim 1, characterized in that the reference model in step 4) is a GMM and semi-continuous HMM model; the model comprises a training database of Shanghai dialect speech and a codebook generated from that database; the mixture weights of the acoustic model are calculated by combining the codebook with the training database, and the GMM and semi-continuous HMM model is finally generated.
5. The Shanghai dialect speech recognition information processing method according to claim 1, characterized in that the speech-knowledge processing in step 6) comprises language-model, lexical, and syntactic processing.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201010583367.2A CN102543073B (en) | 2010-12-10 | 2010-12-10 | Shanghai dialect phonetic recognition information processing method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201010583367.2A CN102543073B (en) | 2010-12-10 | 2010-12-10 | Shanghai dialect phonetic recognition information processing method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102543073A true CN102543073A (en) | 2012-07-04 |
| CN102543073B CN102543073B (en) | 2014-05-14 |
Family
ID=46349813
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201010583367.2A Active CN102543073B (en) | 2010-12-10 | 2010-12-10 | Shanghai dialect phonetic recognition information processing method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102543073B (en) |
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6434522B1 (en) * | 1992-06-18 | 2002-08-13 | Matsushita Electric Industrial Co Ltd | Combined quantized and continuous feature vector HMM approach to speech recognition |
| CN1499484A (en) * | 2002-11-06 | 2004-05-26 | 北京天朗语音科技有限公司 | Recognition system of Chinese continuous speech |
| US20100161330A1 (en) * | 2005-06-17 | 2010-06-24 | Microsoft Corporation | Speech models generated using competitive training, asymmetric training, and data boosting |
| WO2007034478A2 (en) * | 2005-09-20 | 2007-03-29 | Gadi Rechlis | System and method for correcting speech |
| CN101226742A (en) * | 2007-12-05 | 2008-07-23 | 浙江大学 | Voiceprint Recognition Method Based on Emotion Compensation |
| WO2010019831A1 (en) * | 2008-08-14 | 2010-02-18 | 21Ct, Inc. | Hidden markov model for speech processing with training method |
| CN101650945A (en) * | 2009-09-17 | 2010-02-17 | 浙江工业大学 | Method for recognizing speaker based on multivariate core logistic regression model |
Non-Patent Citations (1)
| Title |
|---|
| 王岐学,钱盛友,赵新民: "基于差分特征和高斯混合模型的湖南方言识别", 《计算机工程与应用》, 31 December 2009 (2009-12-31), pages 2 - 5 * |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103440865B (en) * | 2013-08-06 | 2016-03-30 | 普强信息技术(北京)有限公司 | The post-processing approach of speech recognition |
| CN103440865A (en) * | 2013-08-06 | 2013-12-11 | 普强信息技术(北京)有限公司 | Post-processing method for voice recognition |
| CN103886236A (en) * | 2014-03-17 | 2014-06-25 | 深圳市中兴移动通信有限公司 | Acoustic control screen unlocking method and mobile terminal |
| CN104123934A (en) * | 2014-07-23 | 2014-10-29 | 泰亿格电子(上海)有限公司 | Speech composition recognition method and system |
| CN104183245A (en) * | 2014-09-04 | 2014-12-03 | 福建星网视易信息系统有限公司 | Method and device for recommending music stars with tones similar to those of singers |
| CN105609101A (en) * | 2014-11-14 | 2016-05-25 | 现代自动车株式会社 | Speech recognition system and speech recognition method |
| CN104900235A (en) * | 2015-05-25 | 2015-09-09 | 重庆大学 | Voiceprint recognition method based on pitch period mixed characteristic parameters |
| CN104835495A (en) * | 2015-05-30 | 2015-08-12 | 宁波摩米创新工场电子科技有限公司 | High-definition voice recognition system based on low pass filter |
| CN104835495B (en) * | 2015-05-30 | 2018-05-08 | 宁波摩米创新工场电子科技有限公司 | A kind of high definition speech recognition system based on low-pass filtering |
| CN106537493A (en) * | 2015-09-29 | 2017-03-22 | 深圳市全圣时代科技有限公司 | Speech recognition system and method, client device and cloud server |
| WO2017054122A1 (en) * | 2015-09-29 | 2017-04-06 | 深圳市全圣时代科技有限公司 | Speech recognition system and method, client device and cloud server |
| CN106531160A (en) * | 2016-10-26 | 2017-03-22 | 安徽省云逸智能科技有限公司 | Continuous speech recognition system based on wordnet language model |
| CN106448657A (en) * | 2016-10-26 | 2017-02-22 | 安徽省云逸智能科技有限公司 | Continuous speech recognition system for restaurant robot servant |
| CN106328125A (en) * | 2016-10-28 | 2017-01-11 | 许昌学院 | Henan dialect speech recognition system |
| CN106328125B (en) * | 2016-10-28 | 2023-08-04 | 许昌学院 | A Speech Recognition System for Henan Dialect |
| CN107248409A (en) * | 2017-05-23 | 2017-10-13 | 四川欣意迈科技有限公司 | A kind of multi-language translation method of dialect linguistic context |
| CN110047478A (en) * | 2018-01-16 | 2019-07-23 | 中国科学院声学研究所 | Multicenter voice based on space characteristics compensation identifies Acoustic Modeling method and device |
| CN110047478B (en) * | 2018-01-16 | 2021-06-08 | 中国科学院声学研究所 | Acoustic modeling method and device for multi-channel speech recognition based on spatial feature compensation |
| CN108922535A (en) * | 2018-08-23 | 2018-11-30 | 上海华测导航技术股份有限公司 | Voice interactive system and exchange method for receiver |
| CN113571043A (en) * | 2021-07-27 | 2021-10-29 | 广州欢城文化传媒有限公司 | Dialect simulation force evaluation method and device, electronic equipment and storage medium |
| CN113571043B (en) * | 2021-07-27 | 2024-06-04 | 广州欢城文化传媒有限公司 | Dialect simulation force evaluation method and device, electronic equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102543073B (en) | 2014-05-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN102543073B (en) | Shanghai dialect phonetic recognition information processing method | |
| US11056097B2 (en) | Method and system for generating advanced feature discrimination vectors for use in speech recognition | |
| JP4802135B2 (en) | Speaker authentication registration and confirmation method and apparatus | |
| Kumar et al. | Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm | |
| Ali et al. | Mel frequency cepstral coefficient: a review | |
| Shanthi et al. | Review of feature extraction techniques in automatic speech recognition | |
| CN108108357B (en) | Accent conversion method and device and electronic equipment | |
| CN114283822B (en) | Gamma pass frequency cepstrum coefficient-based many-to-one voice conversion method | |
| CN117935789B (en) | Speech recognition method, system, device, and storage medium | |
| CN120148484B (en) | Speech recognition method and device based on microcomputer | |
| Gamit et al. | Isolated words recognition using mfcc lpc and neural network | |
| Kanabur et al. | An extensive review of feature extraction techniques, challenges and trends in automatic speech recognition | |
| CN118411982A (en) | English voice signal processing recognition method based on artificial intelligence | |
| CN119204030B (en) | Voice translation method and device for solving voice ambiguity | |
| Sethu et al. | Empirical mode decomposition based weighted frequency feature for speech-based emotion classification | |
| Dave et al. | Speech recognition: A review | |
| Biswas et al. | Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition | |
| Bahaghighat et al. | Text-dependent Speaker Recognition by Combination of LBG VQ and DTW for Persian language | |
| Nijhawan et al. | A new design approach for speaker recognition using MFCC and VAD | |
| CN118197309A (en) | Intelligent multimedia terminal based on AI speech recognition | |
| CN116469405A (en) | A noise-reduction call method, medium and electronic device | |
| CN114724589A (en) | Voice quality inspection method and device, electronic equipment and storage medium | |
| Chandrasekaram | New Feature Vector based on GFCC for Language Recognition | |
| Guntur | Feature extraction algorithms for speaker recognition system and fuzzy logic | |
| Chougule et al. | Filter bank based cepstral features for speaker recognition |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant |