WO2017025108A2 - Sequencing the speech signal - Google Patents
- Publication number: WO2017025108A2
- Application number: PCT/EG2016/000029
- Authority: WO, WIPO (PCT)
- Prior art keywords: frequency bands, presented, speech, frequency band, sequencing
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
      - H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
        - H04R25/35—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using translation techniques
          - H04R25/353—Frequency, e.g. frequency shift or compression
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
            - G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
      - H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
        - H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Otolaryngology (AREA)
- Neurosurgery (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Prostheses (AREA)
- Circuits Of Receivers In General (AREA)
Abstract
A method of operating an audio processing device to improve a user's perception of speech sounds. The method comprises sequencing the speech signal by splitting an audio signal into a plurality of frequency bands and presenting these speech frequency bands in sequence (non-simultaneously), from the high-frequency bands to the low-frequency bands.
Description
SEQUENCING THE SPEECH SIGNAL
TECHNICAL FIELD
The present application relates to improving speech perception, e.g. speech intelligibility, and in particular to improving sound perception for a person, e.g. a hearing-impaired person.
The application relates to an audio processing device and its use in, for example, all kinds of hearing aids and cochlear implants.
The application further relates to a data processing system comprising a processor that performs the method.
The disclosure may be useful in applications such as communication devices, e.g. telephones, or listening devices, e.g. hearing instruments, headsets, headphones, and active ear protection devices.
BACKGROUND ART
The following approaches have been described for modifying speech to make it more intelligible for people with sensorineural hearing loss.
1- Enhancement of spectral shape:
This approach increases the gain at frequencies with a high concentration of acoustic energy in the speech wave (e.g. formants) to make these peaks more prominent in the speech spectrum. Unfortunately, improvements in intelligibility have been small or non-existent.
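The gain rule above can be sketched on a single analysis frame. This is a minimal illustration under stated assumptions, not a method from this application: every DFT bin whose magnitude reaches a hypothetical fraction (`rel_threshold`) of the strongest bin is treated as a spectral peak and multiplied by a hypothetical `peak_gain`. Real formant emphasis would track local maxima of a smoothed spectral envelope; the naive DFT only keeps the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns only the real part, assuming the frame was real-valued."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def enhance_spectral_peaks(frame, peak_gain=2.0, rel_threshold=0.5):
    """Boost the strongest spectral components of one frame (hypothetical parameters).

    Bins with magnitude >= rel_threshold * (largest magnitude) are multiplied by
    peak_gain; all other bins are left unchanged. Because |X[k]| == |X[N-k]| for a
    real frame, both halves of each peak are boosted and the output stays real.
    """
    spectrum = dft(frame)
    mags = [abs(c) for c in spectrum]
    threshold = rel_threshold * max(mags)
    boosted = [c * peak_gain if m >= threshold else c for c, m in zip(spectrum, mags)]
    return idft(boosted)
```

For a 32-sample frame made of a strong 4-cycle sinusoid plus a weak 10-cycle sinusoid, only the strong component is doubled.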
2- Enhancement of the consonant-to-vowel ratio:
This approach increases the gain for consonant sounds but not for vowel sounds, because consonants are important for intelligibility and boosting them helps prevent their masking by vowels.
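A minimal sketch of consonant-to-vowel ratio enhancement, assuming the consonant/vowel segmentation is already known. The `segments` format and the 6 dB default gain are hypothetical, and reliable consonant detection (the hard part in practice) is not shown. `cv_ratio_db` makes the effect measurable: the ratio rises by exactly the applied gain.

```python
import math


def enhance_consonants(samples, segments, consonant_gain_db=6.0):
    """Apply a fixed gain to consonant segments only.

    segments: list of (start, end, label) sample ranges, label "C" (consonant)
    or "V" (vowel), assumed to come from an upstream detector not shown here.
    """
    gain = 10.0 ** (consonant_gain_db / 20.0)  # dB -> linear amplitude factor
    out = list(samples)
    for start, end, label in segments:
        if label == "C":
            for i in range(start, end):
                out[i] *= gain
    return out


def mean_power(samples, segments, label):
    """Mean squared amplitude over all segments carrying the given label."""
    values = [samples[i] ** 2 for s, e, lab in segments if lab == label for i in range(s, e)]
    return sum(values) / len(values)


def cv_ratio_db(samples, segments):
    """Consonant-to-vowel power ratio in dB."""
    return 10.0 * math.log10(mean_power(samples, segments, "C") / mean_power(samples, segments, "V"))
```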
3- Transient enhancement:
This approach increases the rate of change of intensity, because many consonant sounds have rapid intensity changes that may be important for recognizing them. However, its usefulness in real life may be disappointing.
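One simple way to increase the rate of intensity change is envelope expansion; the source does not name a technique, so this is an assumed stand-in. In the sketch below, each frame's level relative to the loudest frame is raised to the power `expansion`, which multiplies every level step in dB by that factor. Frame length and exponent are hypothetical, and a practical version would smooth the frame gains to avoid clicks.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each consecutive frame (the last frame may be shorter)."""
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels


def expand_envelope(samples, frame_len=160, expansion=2.0):
    """Scale each frame so its RMS becomes max_rms * (rms / max_rms) ** expansion.

    On a dB scale every level difference is multiplied by `expansion`, so onsets
    and offsets become steeper. Gains change abruptly at frame boundaries.
    """
    if expansion < 1.0:
        raise ValueError("expansion must be >= 1 to steepen level changes")
    levels = frame_rms(samples, frame_len)
    ref = max(levels) or 1.0  # avoid dividing by zero on an all-silent input
    out = []
    for i, s in enumerate(samples):
        rel = levels[i // frame_len] / ref  # relative level in [0, 1]
        out.append(s * rel ** (expansion - 1.0))
    return out
```

With `expansion=2.0`, a 6 dB step between two frames becomes a 12 dB step.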
4- Enhancement of duration:
Vowels preceding a voiced consonant are longer than vowels preceding an unvoiced consonant, so lengthening vowels might help listeners recognize the following consonant.
Unfortunately, intelligibility with this method is not high.
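Lengthening a vowel requires time-scale modification. The sketch below uses plain overlap-add (OLA) with a Hann window as an assumed stand-in; WSOLA or a phase vocoder would preserve pitch periods better. `lengthen_vowel` is a hypothetical helper that assumes the vowel boundaries are supplied by an external detector, which is not shown.

```python
import math


def ola_stretch(segment, factor, win_len=64):
    """Naive overlap-add time stretch by `factor` (> 1 lengthens).

    Frames are read every hop_a samples and written every hop_s ~= hop_a * factor
    samples; dividing by the summed window keeps the level constant. Up to
    hop_a - 1 trailing samples after the last full frame are dropped.
    """
    hop_a = win_len // 4
    hop_s = max(1, round(hop_a * factor))
    if hop_s >= win_len:
        raise ValueError("factor too large for this window: frames would not overlap")
    if len(segment) < win_len:
        return list(segment)  # too short to stretch with this window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    n_frames = (len(segment) - win_len) // hop_a + 1
    out_len = (n_frames - 1) * hop_s + win_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for f in range(n_frames):
        a, s = f * hop_a, f * hop_s
        for n in range(win_len):
            out[s + n] += segment[a + n] * window[n]
            norm[s + n] += window[n]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def lengthen_vowel(samples, start, end, factor):
    """Replace samples[start:end] (an externally detected vowel) with a stretched copy."""
    return list(samples[:start]) + ola_stretch(samples[start:end], factor) + list(samples[end:])
```

A 256-sample vowel stretched with `factor=1.5` becomes 352 samples: the hop is stretched, the window length is not, so the overall ratio is slightly below the factor.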
5- Speech simplification:
A hearing-impaired person with limited hearing ability may find it difficult to separate the many interacting cues of a speech stimulus. Replacing the speech signal, in the extreme case, with pure tones reduces the number of cues that need to be recognized and separated, which might improve intelligibility.
This method appears to be beneficial only for severely hearing-impaired listeners.
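The extreme simplification described above can be sketched by replacing each frame with a single sinusoid at that frame's dominant frequency, keeping the frame's RMS level and a continuous phase across frames. Frame length and the peak-picking rule (largest DFT bin, DC excluded) are assumptions for illustration only.

```python
import cmath
import math


def dominant_frequency(frame, fs):
    """Frequency (Hz) of the strongest positive-frequency DFT bin, excluding DC."""
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        mag = abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n


def simplify_to_tones(samples, fs, frame_len=128):
    """Replace each full frame by a pure tone; an incomplete trailing frame is dropped."""
    out = []
    phase = 0.0
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        freq = dominant_frequency(frame, fs)
        # Peak amplitude of a sinusoid with the same RMS as the frame.
        amp = math.sqrt(2.0 * sum(s * s for s in frame) / frame_len)
        for _ in range(frame_len):
            out.append(amp * math.sin(phase))
            phase += 2.0 * math.pi * freq / fs  # continuous phase avoids clicks
    return out
```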
6- Enhancement by re-synthesis:
The hearing aid processor recognizes the speech signal and then resynthesizes it in a clear, noise-free form.
This method is highly affected by noise, and accents and emotion are not conveyed.
DISCLOSURE OF INVENTION
SEQUENCING SPEECH SIGNAL:
Masking that obscures a sound immediately following the masker is called forward masking; that is, the masking effect of a signal persists for some time after the signal is turned off.
Upward spread of masking is the masking of high-frequency sounds by low-frequency sounds.
We can take advantage of these two facts by sequencing (presenting non-simultaneously) the speech signal of each speech phoneme, so that the high-frequency information is presented first and the low-frequency information later. With this mechanism, upward spread of masking does not occur, because the high-frequency part is presented without the low-frequency part of the speech signal; and because the low-frequency part follows rather than precedes the high-frequency part, it cannot act as a forward masker of it.
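The mechanism first needs each phoneme split into frequency bands. The following is a minimal sketch assuming three bands with hypothetical edges at 1 kHz and 3 kHz. It is built from one-pole low-pass filters so that the high, middle and low bands sum back exactly to the input; the filters are chosen for brevity, and a hearing aid would use a much steeper filter bank.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, prev = [], 0.0
    for s in x:
        prev += a * (s - prev)
        y.append(prev)
    return y


def split_three_bands(x, fs, low_edge_hz=1000.0, high_edge_hz=3000.0):
    """Split x into complementary high/middle/low bands (hypothetical edges).

    low    = lowpass(x, low_edge)
    middle = lowpass(x, high_edge) - lowpass(x, low_edge)
    high   = x - lowpass(x, high_edge)
    so low + middle + high == x (up to rounding).
    """
    lp_low = one_pole_lowpass(x, low_edge_hz, fs)
    lp_high = one_pole_lowpass(x, high_edge_hz, fs)
    middle = [b - a for a, b in zip(lp_low, lp_high)]
    high = [s - b for s, b in zip(x, lp_high)]
    return {"high": high, "middle": middle, "low": lp_low}
```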
There are two suggested methods for sequencing the speech signal:
a. Each frequency band is presented alone, from the high-frequency bands to the low-frequency bands; see FIG (1). For example, the high-frequency band is presented first, the middle-frequency band second, and the low-frequency band last.
b. The high-frequency bands are presented first; the lower frequency bands are then added to the higher bands and presented simultaneously with them; see FIG (2). For example, the high-frequency band is presented first, the high- and middle-frequency bands second, and all frequency bands last.
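Given the bands of one phoneme, methods (a) and (b) differ only in which bands are switched on in each time slot. The sketch below assumes the bands are already available as equal-length sample lists ordered from high to low (for example from a filter bank) and that the slot lengths are given in samples; `sequence_phoneme` and its parameters are hypothetical names. `cumulative=False` gives method (a) and `cumulative=True` gives method (b). Hard switching between slots is a simplification; short cross-fades would reduce audible clicks.

```python
def sequence_phoneme(bands_high_to_low, lead_slots, cumulative=False):
    """Present the bands of one phoneme one after another.

    bands_high_to_low: equal-length sample lists, highest band first.
    lead_slots: slot length in samples for every band except the last;
        the last slot runs to the end of the phoneme.
    cumulative: False -> method (a), each band presented alone;
                True  -> method (b), each slot adds the next lower band.
    """
    if len(lead_slots) != len(bands_high_to_low) - 1:
        raise ValueError("need one slot length per band except the last")
    n = len(bands_high_to_low[0])
    out = [0.0] * n
    start = 0
    for i, band in enumerate(bands_high_to_low):
        is_last = i == len(bands_high_to_low) - 1
        end = n if is_last else min(n, start + lead_slots[i])
        active = bands_high_to_low[: i + 1] if cumulative else [band]
        for t in range(start, end):
            out[t] = sum(b[t] for b in active)
        start = end
    return out
```

With constant test bands of value 1, 10 and 100 and two 2-sample leading slots, method (a) yields `[1, 1, 10, 10, 100, 100]` and method (b) yields `[1, 1, 11, 11, 111, 111]`.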
DURATION OF PRESENTATION OF EACH FREQUENCY BAND:
Many methods could be used to determine the presentation duration of each frequency band; two of them are discussed here as examples:
1- The duration of each frequency band could be constant, i.e. the same fixed durations are used for every phoneme. In that case, the sum of all band durations must not exceed the duration of any phoneme. This can be ensured by giving every frequency band except the last a relatively short duration and presenting the last band for as long as the phoneme lasts. For example, if a phoneme lasts 90 ms, the high-frequency band could last 25 ms, the middle-frequency band 25 ms, and the low-frequency band the remaining 40 ms; see FIG (3).
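The fixed-duration rule and the 90 ms example can be checked with a few lines of arithmetic. This sketch assumes the leading slot lengths are passed in (25 ms each, as in the example) and converts durations to samples at a hypothetical 16 kHz sampling rate; the function names are illustrative.

```python
def fixed_band_durations(phoneme_ms, lead_ms=(25.0, 25.0)):
    """Durations (ms) for bands presented high -> low with fixed leading slots.

    Every band except the last gets a fixed slot from lead_ms; the last band fills
    whatever remains of the phoneme. Raises if the fixed slots leave no time for it.
    """
    lead_total = sum(lead_ms)
    if lead_total >= phoneme_ms:
        raise ValueError("fixed slots must be shorter than the phoneme")
    return list(lead_ms) + [phoneme_ms - lead_total]


def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples at rate fs."""
    return round(ms * fs / 1000.0)
```

For a 90 ms phoneme this gives 25, 25 and 40 ms, i.e. 400, 400 and 640 samples at 16 kHz.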
2- The duration of each frequency band of each phoneme could be correlated with the distribution of acoustic energy across frequencies. For example, more acoustic energy in the high-frequency region could indicate a longer duration for the high-frequency bands, and vice versa.
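The source does not specify how energy maps to duration. One plausible choice, assumed here, is to give each band a small minimum duration (`min_ms`, hypothetical) and share the rest of the phoneme in proportion to each band's energy.

```python
def band_energy(samples):
    """Sum of squared samples of one band within the phoneme."""
    return sum(s * s for s in samples)


def energy_weighted_durations(phoneme_ms, band_energies, min_ms=5.0):
    """Split phoneme_ms across bands in proportion to their energy (assumed mapping).

    Each band first receives min_ms so that no band vanishes; the remaining time
    is shared proportionally to band_energies. Silent input gets equal shares.
    """
    spare = phoneme_ms - min_ms * len(band_energies)
    if spare < 0:
        raise ValueError("phoneme too short for the minimum band duration")
    total = sum(band_energies)
    if total <= 0:
        return [phoneme_ms / len(band_energies)] * len(band_energies)
    return [min_ms + spare * e / total for e in band_energies]
```

For a 90 ms phoneme whose high band carries twice the energy of each of the other two bands, this yields 42.5 ms, 23.75 ms and 23.75 ms.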
BRIEF DESCRIPTION OF FIGURES
FIG (1): First suggested method for sequencing a single speech phoneme signal. 1 = high-frequency band presented first, 2 = middle-frequency band presented second, 3 = low-frequency band presented last.
FIG (2): Second suggested method for sequencing a single speech phoneme signal. 1 = high-frequency band presented first, 2 = high- and middle-frequency bands presented second, 3 = all frequency bands presented last.
FIG (3): Example of frequency band durations. 1 = duration of a single phoneme (90 ms), 2 = high-frequency band presented for 25 ms, 3 = middle-frequency band presented for 25 ms, 4 = low-frequency band presented for 40 ms.
Claims
1- A method of operating an audio processing device to improve a user's perception of a speech sound, the method comprising: splitting an audio signal into a plurality of frequency bands, wherein said frequency bands are presented in sequence (non-simultaneously).
2- A method of operating an audio processing device to improve a user's perception of a speech sound, the method comprising: splitting an audio signal into a plurality of frequency bands, wherein the lower frequency bands are presented together with the higher frequency bands in sequence.
3- A hearing assistance apparatus, comprising: splitting an audio signal into a plurality of frequency bands, wherein said frequency bands are presented in sequence (non-simultaneously).
4- A hearing assistance apparatus, comprising: splitting an audio signal into a plurality of frequency bands, wherein the lower frequency bands are presented together with the higher frequency bands in sequence.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EG2016/000029 WO2017025108A2 (en) | 2016-10-04 | 2016-10-04 | Sequencing the speech signal |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EG2016/000029 WO2017025108A2 (en) | 2016-10-04 | 2016-10-04 | Sequencing the speech signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2017025108A2 (en) | 2017-02-16 |
| WO2017025108A3 (en) | 2017-07-06 |
Family ID: 57983118
Family Applications (1)
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| PCT/EG2016/000029 WO2017025108A2 (en) | Ceased | 2016-10-04 | 2016-10-04 | Sequencing the speech signal |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2017025108A2 (en) |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2177054B1 (en) * | 2007-07-31 | 2014-04-09 | Phonak AG | Method for adjusting a hearing device with frequency transposition and corresponding arrangement |
| WO2011064950A1 (en) * | 2009-11-25 | 2011-06-03 | パナソニック株式会社 | Hearing aid system, hearing aid method, program, and integrated circuit |
| CA2818210C (en) * | 2010-12-08 | 2015-08-04 | Widex A/S | Hearing aid and a method of enhancing speech reproduction |
| AU2013299982B2 (en) * | 2012-08-06 | 2016-12-22 | Father Flanagan's Boys' Home Doing Business As Boys Town National Research Hospital | Multiband audio compression system and method |
| DE102015201073A1 (en) * | 2015-01-22 | 2016-07-28 | Sivantos Pte. Ltd. | Method and apparatus for noise suppression based on inter-subband correlation |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111653285A (en) * | 2020-06-01 | 2020-09-11 | 北京猿力未来科技有限公司 | Packet loss compensation method and device |
| CN111653285B (en) * | 2020-06-01 | 2023-06-30 | 北京猿力未来科技有限公司 | Packet loss compensation method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2017025108A3 (en) | 2017-07-06 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16834702; country of ref document: EP; kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 16834702; country of ref document: EP; kind code of ref document: A2 |