Potamianos et al., 2006 - Google Patents

Audio-visual ASR from multiple views inside smart rooms

Document ID
17561882001532046060
Author
Potamianos G
Lucey P
Publication year
2006
Publication venue
2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems

Snippet

Visual information from a speaker's mouth region is known to improve automatic speech recognition robustness. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face, which is not …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268 Feature extraction; Face representation
    • G06K9/00281 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06K9/00288 Classification, e.g. identification
    • G06K9/00228 Detection; Localisation; Normalisation
    • G06K9/00248 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/00335 Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • G06K9/00597 Acquiring or recognising eyes, e.g. iris verification
    • G06K9/00362 Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
    • G06K9/00369 Recognition of whole body, e.g. static pedestrian or occupant recognition
    • G06K9/00624 Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects

Similar Documents

Publication | Title
Lucey et al. | Lipreading using profile versus frontal views
US6219640B1 (en) | Methods and apparatus for audio-visual speaker recognition and utterance verification
Xu et al. | Ava-avd: Audio-visual speaker diarization in the wild
McCowan et al. | Modeling human interaction in meetings
CN100334881C (en) | Method and system for automatic detection and tracking of multiple persons using multiple cues
US20020101505A1 (en) | Method and apparatus for predicting events in video conferencing and other applications
Hassanat | Visual speech recognition
Potamianos et al. | Audio and visual modality combination in speech processing applications
Huang et al. | Audio-visual speech recognition using an infrared headset
Lucey et al. | A visual front-end for a continuous pose-invariant lipreading system
Lucey et al. | A unified approach to multi-pose audio-visual ASR
Galatas et al. | Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
Bredin et al. | Fusion of speech, faces and text for person identification in tv broadcast
Tao et al. | An unsupervised visual-only voice activity detection approach using temporal orofacial features
Lucey et al. | Patch-based analysis of visual speech from multiple views
Douxchamps et al. | Robust real time face tracking for the analysis of human behaviour
McCowan et al. | Towards computer understanding of human interactions
Thermos et al. | Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view
Zhao et al. | Local spatiotemporal descriptors for visual recognition of spoken phrases
Navarathna et al. | Visual voice activity detection using frontal versus profile views
Potamianos et al. | Audio-visual ASR from multiple views inside smart rooms
Reiter et al. | Multimodal meeting analysis by segmentation and classification of meeting events based on a higher level semantic approach
Zhang et al. | Boosting-based multimodal speaker detection for distributed meetings
Yu et al. | Towards smart meeting: Enabling technologies and a real-world application
Stiefelhagen et al. | Audio-visual perception of a lecturer in a smart seminar room