Lewis et al., 2003
Audio-visual speech recognition using red exclusion and neural networks
- Document ID
- 8207088873178296465
- Author
- Lewis T
- Powers D
- Publication year
- 2003
- Publication venue
- Journal of Research and Practice in Information Technology
Snippet
Automatic speech recognition (ASR) performs well under restricted conditions, but performance degrades in noisy environments. Audio-Visual Speech Recognition (AVSR) combats this by incorporating a visual signal into the recognition. This paper briefly reviews …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/001—Teaching or communicating with blind persons
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/009—Teaching or communicating with deaf persons
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Silsbee et al. | Computer lipreading for improved accuracy in automatic speech recognition | |
| Girin et al. | Audio-visual enhancement of speech in noise | |
| Chen | Audiovisual speech processing | |
| Chibelushi et al. | A review of speech-based bimodal recognition | |
| Chen et al. | Audio-visual integration in multimodal communication | |
| Petridis et al. | End-to-end audiovisual fusion with LSTMs | |
| CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
| Meyer et al. | Continuous audio–visual digit recognition using N-best decision fusion | |
| Luettin et al. | Continuous audio-visual speech recognition | |
| Robert-Ribes et al. | A comparison of models for fusion of the auditory and visual sensors in speech perception | |
| Köse et al. | Multimodal representations for synchronized speech and real-time MRI video processing | |
| Zaferani et al. | Automatic personality traits perception using asymmetric auto-encoder | |
| Akman et al. | Lip reading multiclass classification by using dilated CNN with Turkish dataset | |
| Lewis et al. | Audio-visual speech recognition using red exclusion and neural networks | |
| Chand et al. | Survey on Visual Speech Recognition using Deep Learning Techniques | |
| Fellenz et al. | On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS system | |
| Makhlouf et al. | Evolutionary structure of hidden Markov models for audio-visual Arabic speech recognition | |
| Chiţu et al. | Automatic visual speech recognition | |
| Adjoudani et al. | A multimedia platform for audio-visual speech processing | |
| Butko | Feature selection for multimodal: acoustic Event detection | |
| Zorić et al. | Real-time language independent lip synchronization method using a genetic algorithm | |
| Vatikiotis‐Bateson et al. | Auditory‐Visual Speech Processing: Something Doesn't Add Up | |
| Wakkumbura et al. | Phoneme-viseme mapping for sinhala speaking robot for Sri Lankan healthcare applications | |
| Goecke | A stereo vision lip tracking algorithm and subsequent statistical analyses of the audio-video correlation in Australian English | |
| Bhaskar et al. | Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges |