Potamianos et al., 2006 - Google Patents

Audio-visual ASR from multiple views inside smart rooms

Document ID: 17561882001532046060
Author: Potamianos G; Lucey P
Publication year: 2006
Publication venue: 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems

Snippet

Visual information from a speaker's mouth region is known to improve automatic speech recognition robustness. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face, which is not …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
  - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    - G06K9/00268—Feature extraction; Face representation
      - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    - G06K9/00288—Classification, e.g. identification
    - G06K9/00228—Detection; Localisation; Normalisation
      - G06K9/00248—Detection; Localisation; Normalisation using facial parts and geometric relationships
  - G06K9/62—Methods or arrangements for recognition using electronic means
    - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
  - G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
  - G06K9/00597—Acquiring or recognising eyes, e.g. iris verification
  - G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
    - G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
  - G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects

Similar Documents

Publication	Publication Date	Title
Lucey et al.	2006	Lipreading using profile versus frontal views
US6219640B1 (en)	2001-04-17	Methods and apparatus for audio-visual speaker recognition and utterance verification
Xu et al.	2022	AVA-AVD: Audio-visual speaker diarization in the wild
McCowan et al.	2003	Modeling human interaction in meetings
CN100334881C (en)	2007-08-29	Method and system for automatic detection and tracking of multiple persons using multiple cues
US20020101505A1 (en)	2002-08-01	Method and apparatus for predicting events in video conferencing and other applications
Hassanat	2011	Visual speech recognition
Potamianos et al.	2017	Audio and visual modality combination in speech processing applications
Huang et al.	2004	Audio-visual speech recognition using an infrared headset
Lucey et al.	2008	A visual front-end for a continuous pose-invariant lipreading system
Lucey et al.	2007	A unified approach to multi-pose audio-visual ASR
Galatas et al.	2012	Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
Bredin et al.	2012	Fusion of speech, faces and text for person identification in TV broadcast
Tao et al.	2016	An unsupervised visual-only voice activity detection approach using temporal orofacial features
Lucey et al.	2008	Patch-based analysis of visual speech from multiple views
Douxchamps et al.	2007	Robust real time face tracking for the analysis of human behaviour
McCowan et al.	2004	Towards computer understanding of human interactions
Thermos et al.	2016	Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view
Zhao et al.	2007	Local spatiotemporal descriptors for visual recognition of spoken phrases
Navarathna et al.	2011	Visual voice activity detection using frontal versus profile views
Potamianos et al.	2006	Audio-visual ASR from multiple views inside smart rooms
Reiter et al.	2005	Multimodal meeting analysis by segmentation and classification of meeting events based on a higher level semantic approach
Zhang et al.	2006	Boosting-based multimodal speaker detection for distributed meetings
Yu et al.	2007	Towards smart meeting: Enabling technologies and a real-world application
Stiefelhagen et al.	2006	Audio-visual perception of a lecturer in a smart seminar room