Potamianos et al., 2006 - Google Patents
Audio-visual ASR from multiple views inside smart rooms
- Document ID
- 17561882001532046060
- Author
- Potamianos G
- Lucey P
- Publication year
- 2006
- Publication venue
- 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems
Snippet
Visual information from a speaker's mouth region is known to improve automatic speech recognition robustness. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face, which is not …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00228—Detection; Localisation; Normalisation
- G06K9/00248—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00597—Acquiring or recognising eyes, e.g. iris verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Lucey et al. | Lipreading using profile versus frontal views | |
| US6219640B1 (en) | Methods and apparatus for audio-visual speaker recognition and utterance verification | |
| Xu et al. | Ava-avd: Audio-visual speaker diarization in the wild | |
| McCowan et al. | Modeling human interaction in meetings | |
| CN100334881C (en) | Method and system for automatic detection and tracking of multiple persons using multiple cues | |
| US20020101505A1 (en) | Method and apparatus for predicting events in video conferencing and other applications | |
| Hassanat | Visual speech recognition | |
| Potamianos et al. | Audio and visual modality combination in speech processing applications | |
| Huang et al. | Audio-visual speech recognition using an infrared headset | |
| Lucey et al. | A visual front-end for a continuous pose-invariant lipreading system | |
| Lucey et al. | A unified approach to multi-pose audio-visual ASR | |
| Galatas et al. | Audio-visual speech recognition using depth information from the Kinect in noisy video conditions | |
| Bredin et al. | Fusion of speech, faces and text for person identification in tv broadcast | |
| Tao et al. | An unsupervised visual-only voice activity detection approach using temporal orofacial features | |
| Lucey et al. | Patch-based analysis of visual speech from multiple views | |
| Douxchamps et al. | Robust real time face tracking for the analysis of human behaviour | |
| McCowan et al. | Towards computer understanding of human interactions | |
| Thermos et al. | Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view | |
| Zhao et al. | Local spatiotemporal descriptors for visual recognition of spoken phrases | |
| Navarathna et al. | Visual voice activity detection using frontal versus profile views | |
| Potamianos et al. | Audio-visual ASR from multiple views inside smart rooms | |
| Reiter et al. | Multimodal meeting analysis by segmentation and classification of meeting events based on a higher level semantic approach | |
| Zhang et al. | Boosting-based multimodal speaker detection for distributed meetings | |
| Yu et al. | Towards smart meeting: Enabling technologies and a real-world application | |
| Stiefelhagen et al. | Audio-visual perception of a lecturer in a smart seminar room |