WO2006113409A3 - Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics - Google Patents

Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics

Info

Publication number
WO2006113409A3
WO2006113409A3 PCT/US2006/014023 US2006014023W WO2006113409A3 WO 2006113409 A3 WO2006113409 A3 WO 2006113409A3 US 2006014023 W US2006014023 W US 2006014023W WO 2006113409 A3 WO2006113409 A3 WO 2006113409A3
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
information
program product
video synchronization
Prior art date
Application number
PCT/US2006/014023
Other languages
French (fr)
Other versions
WO2006113409A2 (en)
Inventor
J Cooper
Mirko Dusan Vojnovic
Christopher Smith
Jibanananda Roy
Saurabh Jain
Original Assignee
Pixel Instr Corp
J Cooper
Mirko Dusan Vojnovic
Christopher Smith
Jibanananda Roy
Saurabh Jain
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2005/012588 (WO2005115014A2)
Priority claimed from PCT/US2005/041623 (WO2007035183A2)
Application filed by Pixel Instr Corp, J Cooper, Mirko Dusan Vojnovic, Christopher Smith, Jibanananda Roy, Saurabh Jain
Priority to EP06750137A (EP1969858A2)
Priority to GB0622592A (GB2440384B)
Priority to CA002566844A (CA2566844A1)
Priority to AU2006235990A (AU2006235990A1)
Publication of WO2006113409A2
Publication of WO2006113409A3

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/04 Synchronising
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/44 Receiver circuitry for the reception of television signals according to analogue transmission standards
    • H04N5/60 Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals
    • H04N5/602 Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals for digital sound signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Method, system, and program product for measuring audio video synchronization. Audio video information is first acquired into an audio video synchronization system. Data acquisition is followed by analysis of the audio information and of the video information. The audio information is then analyzed to locate sounds related to a speaker's personal voice characteristics. In the analysis phase, audio and video MuEvs are calculated from the audio and video information, and the audio and video information is classified into sound classes including the vowel sounds AA, EE, and OO, the sounds B, V, TH, and F, silence, other sounds, and unclassified phonemes. The inner space between the lips is also identified and determined. This information is used to determine a dominant audio class and associate it with a video frame. Matching locations are then determined, and from them the offset between video and audio.
PCT/US2006/014023 2004-05-14 2006-04-13 Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics WO2006113409A2 (en)
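
The last two steps of the abstract (finding matching locations, then the audio to video offset) can be illustrated with a minimal sketch. It is not the patented method: it assumes each video frame already carries one class label from the mouth shape and one from the audio, and it swaps in a plain brute-force search for the frame shift at which the two label sequences agree most often. The function name `estimate_av_offset`, the `max_lag` parameter, and the label strings are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

# Labels that carry no lip-sync evidence in this sketch (an assumption).
IGNORED = {"silence", "other", "unclassified"}


def estimate_av_offset(audio_classes: Sequence[str],
                       video_classes: Sequence[str],
                       max_lag: int) -> Tuple[int, int]:
    """Return (lag, matches) for the best-agreeing frame shift.

    A positive lag means the audio label that belongs to video frame i
    is found at audio index i + lag, i.e. the audio is late by `lag` frames.
    """
    best_lag, best_matches = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        matches = 0
        for i, video_label in enumerate(video_classes):
            j = i + lag
            if not 0 <= j < len(audio_classes):
                continue  # shifted index falls outside the audio sequence
            if video_label in IGNORED:
                continue  # uninformative frames are not counted
            if audio_classes[j] == video_label:
                matches += 1
        # Prefer more matches; on a tie, prefer the smaller shift.
        if matches > best_matches or (
            matches == best_matches and abs(lag) < abs(best_lag)
        ):
            best_lag, best_matches = lag, matches
    return best_lag, best_matches


# Toy data: the audio labels are the video labels delayed by two frames.
video = ["silence", "AA", "AA", "EE", "OO", "silence", "B", "AA",
         "silence", "silence"]
audio = ["silence", "silence"] + video[:-2]

print(estimate_av_offset(audio, video, max_lag=3))  # (2, 6)
```

On the toy data the search reports a lag of 2 frames supported by 6 matching frames. Multiplying the lag by the frame duration (for example 1000 / fps milliseconds) would convert it to a time offset.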

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP06750137A EP1969858A2 (en) 2004-05-14 2006-04-13 Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics
GB0622592A GB2440384B (en) 2005-04-13 2006-04-13 Method,system and program product for measuring audio video synchronization using lip and teeth characteristics
CA002566844A CA2566844A1 (en) 2005-04-13 2006-04-13 Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics
AU2006235990A AU2006235990A1 (en) 2005-04-13 2006-04-13 Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
PCT/US2005/012588 WO2005115014A2 (en) 2004-05-14 2005-04-13 Method, system, and program product for measuring audio video synchronization
USPCT/US05/12588 2005-04-13
USPCT/US05/41623 2005-11-16
PCT/US2005/041623 WO2007035183A2 (en) 2005-04-13 2005-11-16 Method, system, and program product for measuring audio video synchronization independent of speaker characteristics

Publications (2)

Publication Number Publication Date
WO2006113409A2 (en) 2006-10-26
WO2006113409A3 (en) 2007-06-07

Family

ID=37115719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/014023 WO2006113409A2 (en) 2004-05-14 2006-04-13 Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics

Country Status (3)

Country Link
CA (1) CA2566844A1 (en)
GB (1) GB2438691A (en)
WO (1) WO2006113409A2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007039603A1 (en) * 2007-08-22 2009-02-26 Siemens Ag Method for synchronizing media data streams
FR3014675A1 (en) * 2013-12-12 2015-06-19 Oreal METHOD FOR EVALUATING AT LEAST ONE CLINICAL FACE SIGN
CN110750152B (en) * 2019-09-11 2023-08-29 云知声智能科技股份有限公司 Man-machine interaction method and system based on lip actions
CN111081270B (en) * 2019-12-19 2021-06-01 大连即时智能科技有限公司 Real-time audio-driven virtual character mouth shape synchronous control method
CN114360062A (en) * 2022-01-05 2022-04-15 上海交通大学 Lip language identification method and device based on edge computing terminal
CN115861881B (en) * 2022-11-30 2025-04-25 广东技术师范大学 A method for lip-tone consistency judgment based on multi-keynote joint score fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4313135A (en) * 1980-07-28 1982-01-26 Cooper J Carl Method and apparatus for preserving or restoring audio to video synchronization
US4769845A (en) * 1986-04-10 1988-09-06 Kabushiki Kaisha Carrylab Method of recognizing speech using a lip image
US5387943A (en) * 1992-12-21 1995-02-07 Tektronix, Inc. Semiautomatic lip sync recovery system
US5572261A (en) * 1995-06-07 1996-11-05 Cooper; J. Carl Automatic audio to video timing measurement device and method
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
US5920842A (en) * 1994-10-12 1999-07-06 Pixel Instruments Signal synchronization

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4975960A (en) * 1985-06-03 1990-12-04 Petajan Eric D Electronic facial tracking and detection system and method and apparatus for automated speech recognition
US6829018B2 (en) * 2001-09-17 2004-12-07 Koninklijke Philips Electronics N.V. Three-dimensional sound creation assisted by visual information

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4313135A (en) * 1980-07-28 1982-01-26 Cooper J Carl Method and apparatus for preserving or restoring audio to video synchronization
US4313135B1 (en) * 1980-07-28 1996-01-02 J Carl Cooper Method and apparatus for preserving or restoring audio to video
US4769845A (en) * 1986-04-10 1988-09-06 Kabushiki Kaisha Carrylab Method of recognizing speech using a lip image
US5387943A (en) * 1992-12-21 1995-02-07 Tektronix, Inc. Semiautomatic lip sync recovery system
US5920842A (en) * 1994-10-12 1999-07-06 Pixel Instruments Signal synchronization
US5572261A (en) * 1995-06-07 1996-11-05 Cooper; J. Carl Automatic audio to video timing measurement device and method
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks

Also Published As

Publication number Publication date
CA2566844A1 (en) 2006-10-26
GB0622589D0 (en) 2007-02-21
GB2438691A (en) 2007-12-05
WO2006113409A2 (en) 2006-10-26

Similar Documents

Publication Publication Date Title
GB2440384A (en) Method,system and program product for measuring audio video synchronization using lip and teeth characteristics
GB2429889A (en) Method, system, and program product for measuring audio video synchronization
EP1922720B1 (en) System and method for synchronizing sound and manually transcribed text
US20190370283A1 (en) Systems and methods for consolidating recorded content
JP4600828B2 (en) Document association apparatus and document association method
EP2267697A3 (en) Information processing system, method of processing information, and program for processing information
KR101616112B1 (en) Speaker separation system and method using voice feature vectors
WO2006113409A3 (en) Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics
EP1657721A3 (en) Music content reproduction apparatus, method thereof and recording apparatus
AU2003222001A1 (en) Method and system for generating a likelihood of cardiovascular disease from analyzing cardiovascular sound signals.
AU2003225928A1 (en) Method for robust voice recognition by analyzing redundant features of source signal
EP1329877A3 (en) Speech synthesis and decoding
DE602005001142D1 (en) Messaging device
WO2006082868A3 (en) Method and system for identifying speech sound and non-speech sound in an environment
WO2007050368A3 (en) A computer-implemented system and method for obtaining customized information related to media content
US9240190B2 (en) Formant based speech reconstruction from noisy signals
JP2007233239A (en) Speech event separation method, speech event separation system, and speech event separation program
JP2010054991A (en) Recording device
JPH04158397A (en) Voice quality converting system
CN108257605A (en) Multi-channel recording method and device and electronic equipment
Sztahó et al. Automatic classification of emotions in spontaneous speech
CN109545196A (en) Audio recognition method, device and computer readable storage medium
Clemins et al. Application of speech recognition to African elephant (Loxodonta Africana) vocalizations
Liu et al. Leakage model and teeth clack removal for air-and bone-conductive integrated microphones
WO2009142464A3 (en) Method and apparatus for processing audio signals

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 200680021184.3; Country of ref document: CN)
ENP Entry into the national phase (Ref document number: 0622592; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20060413)
WWE WIPO information: entry into national phase (Ref document number: 2006235990; Country of ref document: AU; Ref document number: 0622592.4; Country of ref document: GB)
WWE WIPO information: entry into national phase (Ref document number: 2566844; Country of ref document: CA)
WWE WIPO information: entry into national phase (Ref document number: 1432/MUMNP/2006; Country of ref document: IN)
WWP WIPO information: published in national office (Ref document number: 2006235990; Country of ref document: AU)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)
WWE WIPO information: entry into national phase (Ref document number: 2006750137; Country of ref document: EP)