Lewis et al., 2003 - Google Patents

Audio-visual speech recognition using red exclusion and neural networks

Document ID: 8207088873178296465
Author: Lewis T; Powers D
Publication year: 2003
Publication venue: Journal of Research and Practice in Information Technology

Snippet

Automatic speech recognition (ASR) performs well under restricted conditions, but performance degrades in noisy environments. Audio-Visual Speech Recognition (AVSR) combats this by incorporating a visual signal into the recognition. This paper briefly reviews …

Continue reading at www.researchgate.net (PDF)

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
      - G10L13/00—Speech synthesis; Text to speech systems
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
        - G06K9/62—Methods or arrangements for recognition using electronic means
  - G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    - G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
      - G09B21/00—Teaching, or communicating with, the blind, deaf or mute
        - G09B21/001—Teaching or communicating with blind persons
        - G09B21/009—Teaching or communicating with deaf persons

Similar Documents

Publication	Publication Date	Title
Silsbee et al.	2002	Computer lipreading for improved accuracy in automatic speech recognition
Girin et al.	2001	Audio-visual enhancement of speech in noise
Chen	2001	Audiovisual speech processing
Chibelushi et al.	2002	A review of speech-based bimodal recognition
Chen et al.	1998	Audio-visual integration in multimodal communication
Petridis et al.	2017	End-to-end audiovisual fusion with LSTMs
CN103996155A (en)	2014-08-20	Intelligent interaction and psychological comfort robot service system
Meyer et al.	2004	Continuous audio–visual digit recognition using N-best decision fusion
Luettin et al.	1998	Continuous audio-visual speech recognition
Robert-Ribes et al.	1995	A comparison of models for fusion of the auditory and visual sensors in speech perception
Köse et al.	2021	Multimodal representations for synchronized speech and real-time MRI video processing
Zaferani et al.	2021	Automatic personality traits perception using asymmetric auto-encoder
Akman et al.	2022	Lip reading multiclass classification by using dilated CNN with Turkish dataset
Lewis et al.	2003	Audio-visual speech recognition using red exclusion and neural networks
Chand et al.	2023	Survey on Visual Speech Recognition using Deep Learning Techniques
Fellenz et al.	2000	On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS system
Makhlouf et al.	2016	Evolutionary structure of hidden Markov models for audio-visual Arabic speech recognition
Chiţu et al.	2012	Automatic visual speech recognition
Adjoudani et al.	1997	A multimedia platform for audio-visual speech processing.
Butko	2011	Feature selection for multimodal acoustic event detection
Zorić et al.	2006	Real-time language independent lip synchronization method using a genetic algorithm
Vatikiotis‐Bateson et al.	2015	Auditory‐Visual Speech Processing: Something Doesn't Add Up
Wakkumbura et al.	2022	Phoneme-viseme mapping for sinhala speaking robot for Sri Lankan healthcare applications
Goecke	2004	A stereo vision lip tracking algorithm and subsequent statistical analyses of the audio-video correlation in Australian English
Bhaskar et al.	2022	Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges