WO1998044483A1 - Time scale modification of audiovisual playback and teaching listening comprehension - Google Patents

Time scale modification of audiovisual playback and teaching listening comprehension Download PDF

Info

Publication number
WO1998044483A1
WO1998044483A1 PCT/IL1998/000145 IL9800145W WO9844483A1 WO 1998044483 A1 WO1998044483 A1 WO 1998044483A1 IL 9800145 W IL9800145 W IL 9800145W WO 9844483 A1 WO9844483 A1 WO 9844483A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
time base
audiovisual
base controller
operative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IL1998/000145
Other languages
French (fr)
Inventor
Zeev Shpiro
Rina Shainski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digispeech Israel Ltd
Original Assignee
Digispeech Israel Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digispeech Israel Ltd filed Critical Digispeech Israel Ltd
Priority to AU65161/98A priority Critical patent/AU6516198A/en
Publication of WO1998044483A1 publication Critical patent/WO1998044483A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/01 Correction of time axis
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

This invention discloses a digital audiovisual playback system including at least one reader (5) for reading a digital audiovisual memory file (42), a selectable time base controller (62) receiving an output from the at least one reader, the selectable time base controller being responsive to a user input for selecting the speed at which audiovisual content read from the digital audiovisual file is played while maintaining audio integrity and synchronization between audio and visual portions of the audiovisual content, and an audiovisual output assembly (70) receiving an output from the selectable time base controller and providing a user-sensible audiovisual output.

Description

TIME SCALE MODIFICATION OF AUDIOVISUAL PLAYBACK AND TEACHING
LISTENING COMPREHENSION
FIELD OF THE INVENTION
The present invention relates generally to audio/video playback and more specifically inter alia to apparatus and methods for learning listening comprehension.
BACKGROUND OF THE INVENTION
Various techniques are known for varying the playback speed of digitally recorded audio-visual materials. Due to difficulties in coordinating the audio portion with the visual portion while maintaining audio playback quality, slow-down and speed-up functionalities are not commonly provided in audiovisual players. The present technological limitations on audio-visual playback are also noted in the field of language learning. An example of a relevant recent development in this field is a CD-ROM which is distributed free of charge by ALC Press Inc. in Japan in conjunction with their print publication entitled English Network. This CD-ROM teaches listening comprehension by using a video segment taken from a news broadcast and transcribing paragraphs of sentences as they are being spoken.
The following U.S. Patents are believed to be representative of the state of the art: 5,392,163, 5,414,568, 5,418,623, 5,420,801, 5,523,896, 5,543,931, 5,583,652, 5,587,789, 5,596,420, 5,608,582, 5,627,692, 5,664,044, 5,692,092, 5,712,946, and 5,717,828.
SUMMARY OF THE INVENTION
The present invention seeks to provide improved digital audiovisual playback apparatus and methods for providing for increased or decreased playback speeds while maintaining audio playback quality.
There is thus provided in accordance with a preferred embodiment of the present invention a digital audiovisual playback system including at least one reader for reading a digital audiovisual memory file, a selectable time base controller receiving an output from the at least one reader, the selectable time base controller being responsive to a user input for indicating the speed at which audiovisual content read from the digital audiovisual file is played, while maintaining audio integrity and synchronization between audio and visual portions of the audiovisual content, and an audiovisual output assembly receiving an output from the selectable time base controller and providing a user-sensible audiovisual output.
Further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to substantially maintain the pitch of the audio portion of the audiovisual memory file notwithstanding changes in the speed at which it is played.
Additionally or alternatively the selectable time base controller is operative to vary time duration of periods of no sound occurring in the audio portion in response to the user input.
Still further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to vary time duration of periods of sound occurring in the audio portion without substantially altering their pitch.
Additionally in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to synchronize the visual portion with the audio portion. Still further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to synchronize the visual portion with the audio portion by either deleting video frames or by repeating or extending presentation or interpolating. Additionally in accordance with a preferred embodiment of the present invention the selectable time base controller is operative for decreasing the speed of playback of the audiovisual content.
Further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative for increasing the speed of playback of the audiovisual content.
Moreover in accordance with a preferred embodiment of the present invention the selectable time base controller is embodied in a personal computer.
Additionally in accordance with a preferred embodiment of the present invention the selectable time base controller is embodied in a digital video disk player. Alternatively the selectable time base controller is embodied in a dedicated digital video player.
For use in a digital audiovisual playback system, a user-interface controller includes a playback speed selector which enables a user to control playback speed of digital audiovisual content. Preferably the playback speed selector permits a speed variation over a range of at least 200%.
There is also provided in accordance with another preferred embodiment of the present invention a digital audiovisual playback method including the steps of reading a digital audiovisual memory file, selectably controlling playing speed of audiovisual content read from the file by employing a time base controller receiving an output from the at least one reader, wherein the time base controller, responsive to a user input, selects the speed at which audiovisual content read from the digital audiovisual file is played, while maintaining audio integrity and synchronization between audio and visual portions of the audiovisual content, and receiving an output from the selectable time base controller and providing a user-sensible audiovisual output.
Further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to substantially maintain the pitch of the audio portion of the audiovisual memory file notwithstanding changes in the speed at which it is played.
Additionally or alternatively the selectable time base controller is operative to vary time duration of periods of non-speech occurring in the audio portion in response to the user input. Preferably the selectable time base controller is operative to vary time duration of periods of speech occurring in the audio portion without substantially altering their pitch. Additionally or alternatively the selectable time base controller is operative to synchronize the visual portion with the audio portion.
Further in accordance with a preferred embodiment of the present invention the selectable time base controller is operative to synchronize the visual portion with the audio portion by either deleting video frames or by repeating or extending existing frames. Preferably the selectable time base controller is operative for decreasing the speed of playback of the audiovisual content. Additionally or alternatively the selectable time base controller is operative for increasing the speed of playback of the audiovisual content. There is also provided in accordance with another preferred embodiment of the present invention an apparatus for use in learning listening comprehension including an audio/visual output generator providing synchronized speech and video outputs and a user operable speech output pace controller operative to cause the output generator to provide a speech output at a user selected pace and at a pitch which is generally independent of the selected pace.
Further in accordance with a preferred embodiment of the present invention the apparatus also includes a scorer for sensing user responses and providing a score indication of user achievement level. Still further in accordance with a preferred embodiment of the present invention the output generator and the controller are operative to provide speech outputs at a pace which is variable over a range of 400 percent.
Additionally in accordance with a preferred embodiment of the present invention the output generator and the controller are operative to provide a speech output whose pace may be varied by both linear and non-linear techniques.
Moreover in accordance with a preferred embodiment of the present invention the scorer is responsive inter alia to the pace at which the speech outputs are provided.
Additionally in accordance with a preferred embodiment of the present invention the video outputs include at least one of images which assist in comprehension of the speech, subtitles and translations. Preferably the subtitles and translations are synchronized to the pace of the speech outputs.
Further in accordance with a preferred embodiment of the present invention the video outputs include highlighting of portions of the subtitles in synchronization with the speech outputs. Still further in accordance with a preferred embodiment of the present invention the controller is responsive to a user selected learning level for determining not only the pace of the speech outputs but also whether at least one of subtitles and translations are provided. Preferably the controller is also responsive to a user selected learning level for determining also whether portions of at least one of subtitles and translations are highlighted in synchronization with said speech outputs.
There is also provided in accordance with yet another preferred embodiment of the present invention a method for teaching listening comprehension including providing an output generator which produces synchronized speech and video outputs, and causing the output generator to provide a speech output at a user selected pace and at a pitch which is generally independent of the selected pace.
Further in accordance with a preferred embodiment of the present invention the method also includes sensing user responses and providing a score indication of user achievement level.
Still further in accordance with a preferred embodiment of the present invention the speech outputs are provided at a user selectable pace which is variable over a range of 400 percent.
Additionally in accordance with a preferred embodiment of the present invention the speech outputs are provided at a user selectable pace which may be varied by both linear and nonlinear techniques.
Moreover in accordance with a preferred embodiment of the present invention the scorer is responsive inter alia to the pace at which the speech outputs are provided. Still further in accordance with a preferred embodiment of the present invention the video outputs include at least one of images which assist in comprehension of the speech, subtitles and translations.
Preferably the subtitles and translations are synchronized to the pace of the speech outputs.
Further in accordance with a preferred embodiment of the present invention the video outputs include highlighting of portions of said subtitles in synchronization with the speech outputs.
Still further in accordance with a preferred embodiment of the present invention a user selected learning level determines not only the pace of the speech outputs but also whether at least one of subtitles and translations are provided.
Additionally in accordance with a preferred embodiment of the present invention a user selected learning level determines also whether portions of at least one of subtitles and translations are highlighted in synchronization with the speech outputs.
It is noted that throughout the specification and claims the terms "speech" and "sound" are used interchangeably and refer to spoken words, phrases and sounds as well as non-spoken sounds.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be more fully understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:
Figs. 1A, 1B, 1C, and 1D are illustrations of slowing down an audiovisual playback, Figs. 1A and 1B illustrating the prior art, and Figs. 1C and 1D illustrating a preferred embodiment of the present invention; Figs. 2A, 2B, 2C, and 2D are illustrations of speeding up an audiovisual playback, Figs. 2A and 2B illustrating the prior art, and Figs. 2C and 2D illustrating a preferred embodiment of the present invention;
Fig. 3 is a block diagram illustration of a digital audiovisual playback system constructed and operative in accordance with a preferred embodiment of the present invention;
Figs. 4A, 4B, and 4C, taken together, are graphical and block diagram illustrations of a preferred mode of operation of the system shown in Fig. 3; Fig. 5 is a generalized illustration of apparatus for learning listening comprehension constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 6 is a table illustrating user selectability of various functionalities provided by the apparatus of Fig. 5; and Fig. 7 is an illustration of a preferred realization of various different audio paces by the apparatus of Fig. 5, while generally maintaining audio pitch uniformity.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
The present invention provides a system and method for selectably, in response to user inputs, slowing down or speeding up audiovisual playback from a digital file. The digital file may be in the form of a digital video tape, a digital video disk, a computer memory, such as a hard disk or a buffer, or even a digital memory of a remote server, the contents of which are received concurrently and which may be, but need not necessarily be, stored in a buffer in a client computer.
Reference is now made to Figs. 1A, 1B, 1C and 1D, which illustrate in a simplified manner the operation of the present invention in slowing audiovisual playback in contrast to the prior art. Prior art Fig. 1A illustrates typical original audiovisual content including a series of continuous video frames 10 and an accompanying audio soundtrack 12, here shown as including speech. It is appreciated that alternatively or additionally, the audio soundtrack 12 may include speech, music, or any other type of sound, and that multiple soundtracks may accompany video frames 10. A time line 14 is shown having several time indices 16 to indicate the passage of time as frames 10 and soundtrack 12 are output.
Fig. 1B shows a prior art technique for slowing down the playback of the frames 10 and soundtrack 12 shown in Fig. 1A. According to the prior art, each frame 10 is played back over a longer time than in the original and the soundtrack 12 is also similarly stretched. This stretching produces a pitch distortion in the audio output which is extremely unpleasant to a user and impairs the integrity of the audio playback, thus decreasing its intelligibility.
In accordance with a preferred embodiment of the present invention, as shown in Figs. 1C and 1D, soundtrack 12 is divided into speech portions 18, representing active audio, and non-speech portions 20, representing the substantially silent intervals between sounds such as between words or phrases. As shown in Fig. 1D, each frame 10 is played back over a longer time than in the original. Soundtrack 12, however, is not stretched to the extent that it is in the prior art. Speech portions 18 may be stretched to a certain extent, such as up to a factor of 2.5, but in a manner which ensures that the pitch is preserved. Furthermore non-speech portions 20 may be increased substantially, as required. Techniques for changing the time basis of speech are described hereinbelow with reference to Figs. 4 - 7. Furthermore, in accordance with a preferred embodiment of the invention, the audio portion is and continues to be synchronized with the video portion. This is typically achieved by ensuring that the individual video frames 10 are played substantially over the same time duration as the portion of soundtrack 12 corresponding thereto. If necessary certain video frames may be repeated. As is shown in Fig. 1D, each speech portion 18 remains synchronized with the video frame to which it originally corresponded, thus maintaining the overall synchronization between the audio and video portions. The factors by which the speech portions 18 and the non-speech portions 20 are stretched are determined and applied in accordance with a difficulty level selected by a user. The video frames are then stretched such that each video frame that has a corresponding speech portion 18 continues to be synchronized with the speech portions 18 to which it originally corresponded.
Reference is now made to Figs. 2A, 2B, 2C, and 2D, which illustrate in a simplified manner the operation of the present invention in speeding up audiovisual playback in contrast to the prior art. Prior art Fig. 2A illustrates typical original audiovisual content including a series of continuous video frames 30 and accompanying audio soundtrack 32, here shown as including speech. It is appreciated that alternatively or additionally, the audio soundtrack 32 may include speech, music, or any other type of sound, and that multiple soundtracks may accompany video frames 30. A time line 34 is shown having several time indices including time index 36 to indicate the passage of time as frames 30 and soundtrack 32 are output.
Fig. 2B shows a prior art technique for speeding up the playback of frames 30 and soundtrack 32 shown in Fig. 2A. According to the prior art, each frame 30 is played back over a shorter time than in the original, and the soundtrack 32 is also similarly speeded up. As seen in Fig. 2B, the frames 30 labeled '1', '2', and '3', as well as the portion of the soundtrack 32 corresponding to the frames shown, are shown being output partly or completely prior to a time index 36' of a time line 34', with time index 36' corresponding temporally to time index 36 of time line 34 of Fig. 2A. This speeding up produces a pitch distortion in the audio output which is extremely unpleasant to a user and impairs the integrity of the audio playback, thus decreasing its intelligibility.
In accordance with a preferred embodiment of the present invention, as shown in Figs. 2C and 2D, soundtrack 32 is divided into speech portions 38, representing sound such as speech or other active audio, and non-speech portions 40, representing the intervals between words or phrases. As seen in Fig. 2D, the frames 30 labeled '1', '2', and '4', as well as the portion of the soundtrack 32 corresponding to the frames shown, are shown being output partly or completely prior to time index 36' of time line 34', with time index 36' corresponding temporally to time index 36 of time line 34 of Figs. 2A and 2C. Soundtrack 32 is not speeded up to the extent that it is in the prior art. Speech portions 38 may be speeded up to a certain extent, such as up to a factor of 2.5, but in a manner which ensures that the pitch is preserved. Furthermore the non-speech portions 40 may be decreased substantially, as required. Techniques for changing the time basis of speech are described hereinbelow with reference to Figs. 4 - 7. Furthermore, in accordance with a preferred embodiment of the invention, the audio portion is and continues to be synchronized with the video portion. This is typically achieved by ensuring that the individual video frames 30 are played substantially over the same time duration as the portion of the soundtrack 32 corresponding thereto. If necessary certain video frames may be discarded, such as the frame 30 labeled '3', which is discarded in Fig. 2D.
As is shown in Fig. 2D, each speech portion 38 remains synchronized with the video frame to which it originally corresponded, thus maintaining the overall synchronization between the audio and video portions. The factors by which the speech portions 38 and the non-speech portions 40 are speeded up are determined and applied in accordance with a difficulty level selected by a user. The video frames are then speeded up such that each video frame that has a corresponding speech portion 38 continues to be synchronized with the speech portions 38 to which it originally corresponded.
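By way of a concrete sketch, the per-segment handling described above with reference to Figs. 1C, 1D, 2C and 2D can be summarized in a few lines of Python. The function names, the example factors and the 2.5 cap on the speech segments are assumptions made for this sketch and are not taken from the disclosure; a pitch-preserving TSM step (such as the technique referenced below with reference to Fig. 7) would still be applied separately to each rescaled speech segment.

```python
# Illustrative sketch only (not taken from the disclosure): per-segment
# time-scale factors for slow-down or speed-up, with speech capped so that a
# pitch-preserving TSM step stays within a workable range, plus a helper that
# keeps video in step by repeating or dropping frames.

def rescale_segments(segments, speech_factor, silence_factor, speech_cap=2.5):
    """segments: list of ("speech" | "silence", duration_s) tuples.
    Returns (kind, new_duration_s) tuples with speech stretched or compressed
    by at most speech_cap and silences rescaled freely."""
    rescaled = []
    for kind, duration in segments:
        if kind == "speech":
            factor = min(max(speech_factor, 1.0 / speech_cap), speech_cap)
        else:
            factor = silence_factor
        rescaled.append((kind, duration * factor))
    return rescaled

def resample_frames(frames, new_duration, fps=24.0):
    """Repeat or drop frames so the clip spans new_duration at the same fps."""
    n_out = max(1, round(new_duration * fps))
    n_in = len(frames)
    return [frames[min(n_in - 1, i * n_in // n_out)] for i in range(n_out)]

# Slow-down: 0.5 s of speech stretched 1.5x, a 0.5 s pause stretched 2x.
print(rescale_segments([("speech", 0.5), ("silence", 0.5)], 1.5, 2.0))
# Speed-up: four frames squeezed into the time of three, so one is dropped.
print(resample_frames(["1", "2", "3", "4"], 3 / 24.0))
```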
Reference is now made to Fig. 3 which is a block diagram illustration of a digital audiovisual playback system constructed and operative in accordance with a preferred embodiment of the present invention. A data file 42 including digital audio and video content is typically stored on a storage medium 44 from where it is retrievable. File 42 may comprise a header portion 46, typically containing descriptive information regarding a body portion 48, such as an AVI-format audiovisual file. Header portion 46 typically includes time indices and durations of speech portions corresponding to the audio portion of body portion 48. Header portion 46 may also include data relating to or resulting from TSM pre-processing of body portion 48. Additionally or alternatively, some or all of header portion 46 may be included in a file separate from file 42.
File 42 is typically read at a reader 50 where it is split into audio parameters 52, where audio parameters 52 are typically derived from header 46, an audio portion 54, a video portion 56, and additional video information 58, where additional video information 58 is also typically derived from header 46. A difficulty table 60 is preferably maintained for controlling audio and video output, as is described in greater detail hereinbelow with reference to Figs. 5 and 6.
A time-scale modifier 62 receives audio parameters 52 and the audio portion 54 and produces a modified audio output 64. A first video processor 66 receives the video portion 56 and produces a video output 68. A second video processor 70 may be used to process the additional video information 58 for use with video processor 66 and/or an additional video output 72. A selectable time base controller 74 preferably controls modifier 62, video processor 66, and video processor 70, referred to collectively as an audiovisual output assembly, to provide a user-sensible audiovisual output. A user interface is preferably provided to receive playback and processing parameters such as a user-selected difficulty level from table 60. The operation of elements of Fig. 3 is described in greater detail hereinbelow with reference to Figs. 4 - 7.
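The data flow of Fig. 3 may be pictured, purely illustratively, as a file whose header carries the speech time indices consumed by the time-scale modifier. In the sketch below the class and field names are assumptions; only the split into header-derived parameters (52, 58), an audio portion (54) and a video portion (56) mirrors the description above.

```python
# Illustrative sketch of the Fig. 3 data flow; class and field names are
# assumptions made for this sketch.

from dataclasses import dataclass
from typing import List

@dataclass
class SpeechPortion:
    start: float      # time index of a speech portion, in seconds
    duration: float   # duration of that speech portion, in seconds

@dataclass
class AudiovisualFile:           # file 42
    header: List[SpeechPortion]  # header 46: speech time indices and durations
    audio: bytes                 # audio content of body 48
    video: bytes                 # video content of body 48

def read_file(f: AudiovisualFile):
    """Reader 50: split the file into the four streams consumed downstream."""
    audio_parameters = f.header        # 52: drives time-scale modifier 62
    additional_video_info = f.header   # 58: drives second video processor 70
    return audio_parameters, f.audio, f.video, additional_video_info
```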
Figs. 4A, 4B, and 4C, taken together, are graphical and block diagram illustrations of a preferred mode of operation of the system shown in Fig. 3. Fig. 4A graphically illustrates audio and video output along a time axis 80. The speed of the video output in Fig. 4A is originally set, for illustration purposes, at 24 frames/second. A video portion 82 is defined as the video frames that correspond to the portion of the audio output that includes actual audio output, in this case speech, while a video portion 84 is defined as the video frames that correspond to the portion of the audio output that does not include speech. The initial duration of video portion 82 and video portion 84 is set, for illustration purposes, at .5 seconds each, with the time elapsed indicated along time axis 80 by a variable t. A user input is shown in Fig. 4B at 86 as indicating that the video/speech output rate is to be slowed down to .667 of original speed, while the non-speech output rate is to be slowed down to .5 of original speed. As a result, the duration of the speech part increases from .5 seconds to .75 seconds (.5 x 1/.667 seconds = .75 seconds) and the non-speech part from .5 seconds to 1 second (.5 x 1/.5 seconds = 1 second). It has been found through experimentation that adding a non-speech extension, such as the .5 second non-speech extension shown in Fig. 4C, may optimize existing TSM algorithms.
Fig. 4C graphically illustrates audio and video output along time axis 80 as a result of the user input shown in Fig. 4B. As video portion 82 includes 12 frames, the output rate of video portion 82 decreases from 24 frames/second to 16 frames/second in order to accommodate the new speech part duration of .75 seconds. Similarly, the output rate of the remaining 12 frames of video portion 84 decreases from 24 frames/second to 8 frames/second in order to accommodate both the new non-speech part of 1 second as well as the non-speech extension of .5 seconds.
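The frame rates quoted above follow directly from the selected slow-down factors; the short computation below, offered only as a check on the arithmetic, reproduces them.

```python
# Reproducing the Fig. 4 arithmetic: 12 speech frames and 12 non-speech
# frames, each spanning 0.5 s at the original 24 frames/second.

speech_duration = 0.5 / 0.667        # ~0.75 s after slowing speech to 0.667x
silence_duration = 0.5 / 0.5 + 0.5   # 1 s slowed silence plus 0.5 s extension

print(round(12 / speech_duration))   # -> 16 frames/second for video portion 82
print(round(12 / silence_duration))  # -> 8 frames/second for video portion 84
```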
The present invention is particularly suited to applications where digital audiovisual playback is speeded up or slowed down as an aid in research or instruction. For example, the present invention may be implemented as a learning tool to increase listening comprehension as is now described with reference to Figs. 5 - 7.
Reference is now made to Fig. 5, which is a generalized illustration of apparatus for learning listening comprehension constructed and operative in accordance with a preferred embodiment of the present invention. The apparatus of Fig. 5 is preferably embodied in a conventional personal computer 110, such as a Pentium®-based personal computer, which is equipped with a keyboard 112, a display 114, a speaker 115, and a mouse 116. In accordance with a preferred embodiment of the present invention, during learning, the screen of the display 114 appears generally as shown at reference numeral 117 and includes three menu locations 118, 120 and 122, indicated respectively as FILE, DIFFICULTIES, and HELP. A difficulty select scale 124 is also provided for enabling the user to select a level of difficulty, preferably in accordance with a table, such as that illustrated in Fig. 6.
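A table of the kind referenced here, and described further with reference to Fig. 6 below, might be represented as a simple mapping from level number to playback settings. In the sketch that follows, only the existence of levels 31, 20 and 11 and the kinds of settings come from the text; every numeric factor and flag is a hypothetical placeholder.

```python
# Hypothetical difficulty table in the spirit of Fig. 6. Only the existence of
# levels 31, 20 and 11 and the kinds of settings come from the text; every
# numeric factor and flag below is an invented placeholder.

DIFFICULTY_TABLE = {
    # level: (speech stretch, pause stretch, added pause s,
    #         video, subtitles, translation, highlighting)
    31: (1.0, 1.0, 0.0, True, False, False, False),  # "normal" native pace
    20: (1.3, 1.6, 0.0, True, True,  False, True),   # moderately extended
    11: (1.8, 2.2, 0.3, True, True,  True,  True),   # strongly extended + pauses
}

def settings_for(level):
    speech, pause, extra, video, subs, trans, highlight = DIFFICULTY_TABLE[level]
    return {"speech_factor": speech, "pause_factor": pause, "extra_pause": extra,
            "video": video, "subtitles": subs, "translation": trans,
            "highlight": highlight}

print(settings_for(11))
```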
A plurality of operating buttons 126, typically six in number, enable the user to click on one or more of the following typical functionalities: PLAY, STOP, PAUSE/RESUME, SHORT REVERSE, LONG REVERSE, SHORT FORWARD.
A first window 130 illustrates the subject matter of a speech output, which is here indicated at reference numeral 132. A scale 133 may indicate the location of the user in a given lesson and may be used together with a location select functionality thus to enable a user to select a desired location in a lesson.
Additionally, in accordance with a preferred embodiment of the present invention a subtitle 137 may be displayed in a second window, designated by reference numeral 134. This subtitle 137 is preferably a written version of the spoken speech and is synchronized with the spoken speech, as indicated at reference numeral 135. Preferably, a plurality of written words and/or phrases are displayed in window 134 at a given time and the word or phrase currently being spoken is highlighted, as indicated by reference numeral 136.
Further, in accordance with a preferred embodiment of the present invention a translation 142 may be displayed in a third window, designated by reference numeral 138. This translation 142 is also preferably synchronized with the spoken speech. Preferably, a plurality of translated words and/or phrases are displayed in window 138 at a given time and the word or phrase currently being spoken is highlighted, as indicated by reference numeral 140.
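The synchronized highlighting of the currently spoken word or phrase, described above for both the subtitle window 134 and the translation window 138, reduces to a lookup of the playback time in a list of per-word time indices. The following sketch assumes such a list is available (for example, derived from the indices Pn and Tn discussed below); the data layout and function name are assumptions, not part of the disclosure.

```python
# Hypothetical sketch: choosing which subtitle or translation word to
# highlight at playback time t, given per-word time indices expressed in the
# already time-scale-modified output. The data layout and names are assumptions.

def word_to_highlight(word_timings, t):
    """word_timings: list of (word, start_s, end_s) in played-back time."""
    for word, start, end in word_timings:
        if start <= t < end:
            return word
    return None   # t falls in an interval between words; nothing is highlighted

timings = [("good", 0.0, 0.4), ("morning", 0.7, 1.3)]
print(word_to_highlight(timings, 0.9))   # -> "morning"
```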
It is a particular feature of the present invention that the timing of the speech output is variable over a relatively wide range, typically up to 400 percent, preferably without appreciably affecting the pitch thereof. In accordance with a preferred embodiment of the invention, as will be described hereinbelow with reference to Fig. 7, both the duration of each word or phrase and the time elapsed between words and/or phrases may be varied. In the speech segment illustrated at reference numeral 135, the speech waveform for each word or phrase is illustrated and its duration is labeled by an index Pn. Intervals between adjacent words and/or phrases are labeled by indices Tn.
Reference is now made to Fig. 6, which is a table illustrating user selectability of various functionalities provided by the apparatus of Fig. 5. It is seen that there are quite a few levels of difficulty, which are distinguished from each other inter alia by one or more of the following: pace of the speech output, which may be expressed in one or both of linear speed of the speech and the amount of pause between words and/or phrases. The amount of pause between words and/or phrases may be varied both by a linear extension and by addition of delay time; provision of a video output in first window 130; provision of subtitles in second window 134; provision of a translation in third window 138; and synchronized highlighting of the subtitles in second window 134.
Fig. 7 is an illustration of a preferred realization of various different audio paces by the apparatus of Fig. 5, while generally maintaining audio pitch uniformity. Fig. 7 shows the timing of three different speech output paces, typically as indicated by levels 31 (corresponding to "normal" speech), 11 and 20 in the table of Fig. 6. At the "normal" level, level 31 in the table of Fig. 6, both the duration of each word or phrase and the duration of the interval between each word or phrase are normal for native speakers.
It can be seen that in level 20, both the duration of each word or phrase and the duration of the interval between each word or phrase are extended, albeit by different factors. In level 11, both the duration of each word or phrase and the duration of the interval between each word or phrase are extended, also by different factors but to an extent greater than in level 20, and an additional pause between each word or phrase is added. It is to be appreciated that extension of the duration of words and/or phrases and of the duration of the interval between words and/or phrases may be carried out substantially without pitch change by using any suitable algorithm, such as the WSOLA algorithm or the ETSM algorithm. The WSOLA algorithm is described in "An Overlap-Add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech", ICASSP-93, W. Verhelst and M. Roelands, Vrije Universiteit Brussels, 0-7803-0946-4/93, and the ETSM algorithm is available from Entropic, Cambridge, Massachusetts, USA, Internet address http://www.entropic.com.
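For orientation only, the following is a deliberately simplified overlap-add sketch of the time-scale modification idea: short windowed frames are read from the input at one hop size and written to the output at another, so the signal is lengthened without resampling and hence without a pitch shift. It is not an implementation of WSOLA or ETSM; in particular it omits the waveform-similarity search that gives WSOLA its audio quality, and the frame and hop sizes are arbitrary assumptions.

```python
# A highly simplified overlap-add time stretch, illustrating only the general
# principle behind algorithms such as WSOLA. Real WSOLA additionally searches
# for the best-aligned input frame, which this sketch omits.

import numpy as np

def ola_time_stretch(x: np.ndarray, stretch: float,
                     frame_len: int = 1024, synth_hop: int = 256) -> np.ndarray:
    """Stretch mono signal x by `stretch` (>1 slows it down) via overlap-add."""
    analysis_hop = max(1, int(round(synth_hop / stretch)))
    window = np.hanning(frame_len)
    n_frames = max(0, (len(x) - frame_len) // analysis_hop + 1)
    out = np.zeros(n_frames * synth_hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        frame = x[i * analysis_hop: i * analysis_hop + frame_len] * window
        out[i * synth_hop: i * synth_hop + frame_len] += frame
        norm[i * synth_hop: i * synth_hop + frame_len] += window
    norm[norm < 1e-8] = 1.0          # avoid division by zero at the edges
    return out / norm

# Example: a 1-second 200 Hz tone stretched to roughly twice its length.
sr = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)
slow = ola_time_stretch(tone, stretch=2.0)
print(len(tone), len(slow))   # 16000 samples in, roughly twice as many out
```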
It will be appreciated that the present invention is not limited to what has been particularly shown and described hereinabove. Both combinations of various features described herein and subcombinations thereof as well as obvious variations thereof all fall within the scope of the present invention.

Claims

We claim:

1. A digital audiovisual playback system comprising:
at least one reader for reading a digital audiovisual memory file;
a selectable time base controller receiving an output from said at least one reader, said selectable time base controller being responsive to a user input for selecting the speed at which audiovisual content read from the digital audiovisual file is played, while maintaining audio integrity and synchronization between audio and visual portions of said audiovisual content; and
an audiovisual output assembly receiving an output from said selectable time base controller and providing a user-sensible audiovisual output.
2. A digital audiovisual playback system according to claim 1 and wherein said selectable time base controller is operative to substantially maintain the pitch of the audio portion of the audiovisual memory file notwithstanding changes in the speed at which it is played.
3. A digital audiovisual playback system according to claim 1 or claim 2 and wherein said selectable time base controller is operative to vary time duration of periods of no sound occurring in the audio portion in response to said user input.
4. A digital audiovisual playback system according to any of the preceding claims and wherein said selectable time base controller is operative to vary time duration of periods of sound occurring in the audio portion without substantially altering their pitch.
5. A digital audiovisual playback system according to claim 4 and wherein said selectable time base controller is operative to synchronize the visual portion with the audio portion.
6. A digital audiovisual playback system according to claim 5 and wherein said selectable time base controller is operative to synchronize the visual portion with the audio portion by either deleting video frames or by repeating existing frames.
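Purely as an editorial illustration of the frame deletion and repetition referred to in claim 6, and not as part of the claimed subject matter, the mapping from output frame slots back to source frames for a given playback speed might be sketched as follows; the function name and frame counts are assumptions.

```python
# A minimal sketch of keeping video in step with rescaled audio: each output
# frame slot is filled from the source frame whose original timestamp matches
# the rescaled output time, so frames are repeated when slowing down and
# skipped when speeding up.

def resampled_frame_indices(n_source_frames: int, speed: float) -> list:
    """Map output frame slots to source frames for a given playback speed."""
    n_output_frames = int(n_source_frames / speed)
    return [min(int(i * speed), n_source_frames - 1) for i in range(n_output_frames)]

print(resampled_frame_indices(6, speed=0.5))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(resampled_frame_indices(6, speed=2.0))  # [0, 2, 4]
```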
7. A digital audiovisual playback system according to any of the preceding claims and wherein said selectable time base controller is operative for decreasing the speed of playback of said audiovisual content.
8. A digital audiovisual playback system according to any of the preceding claims and wherein said selectable time base controller is operative for increasing the speed of playback of said audiovisual content.
9. A digital audiovisual playback system according to any of the preceding claims and wherein said selectable time base controller is embodied in a personal computer.
10. A digital audiovisual playback system according to any of claims 1 - 8 and wherein said selectable time base controller is embodied in a digital video disk player.
11. A digital audiovisual playback system according to any of claims 1 - 8 and wherein said selectable time base controller is embodied in a dedicated digital video player.
12. For use in a digital audiovisual playback system according to any of claims 1 - 11, a user-interface controller including a playback speed selector which enables a user to control playback speed of digital audiovisual content.
13. A user-interface controller according to claim 12 and wherein said playback speed selector permits a speed variation over a range of at least 200%.
14. A digital audiovisual playback method comprising the steps of:
reading a digital audiovisual memory file;
selectably controlling playing speed of audiovisual content read from said file by employing a time base controller receiving an output from said at least one reader, wherein said time base controller, responsive to a user input, selects the speed at which audiovisual content read from the digital audiovisual file is played, while maintaining audio integrity and synchronization between audio and visual portions of said audiovisual content; and
receiving an output from said selectable time base controller and providing a user-sensible audiovisual output.
15. A digital audiovisual playback method according to claim 14 and wherein said selectable time base controller is operative to substantially maintain the pitch of the audio portion of the audiovisual memory file notwithstanding changes in the speed at which it is played.
16. A digital audiovisual playback method according to claim 14 or claim 15 and wherein said selectable time base controller is operative to vary time duration of periods of silence occurring in the audio portion in response to said user input.
17. A digital audiovisual playback method according to any of the preceding claims 14 - 16 and wherein said selectable time base controller is operative to vary time duration of periods of sound occurring in the audio portion without substantially altering their pitch.
18. A digital audiovisual playback method according to claim 17 and wherein said selectable time base controller is operative to synchronize the visual portion with the audio portion.
19. A digital audiovisual playback method according to claim 18 and wherein said selectable time base controller is operative to synchronize the visual portion with the audio portion by either deleting video frames or by repeating existing frames.
20. A digital audiovisual playback method according to any of the preceding claims 14 - 19 and wherein said selectable time base controller is operative for decreasing the speed of playback of said audiovisual content.
21. A digital audiovisual playback method according to any of the preceding claims 14 - 20 and wherein said selectable time base controller is operative for increasing the speed of playback of said audiovisual content.
22. Apparatus for use in learning listening comprehension including:
an audio/visual output generator providing synchronized speech and video outputs; and
a user operable speech output pace controller operative to cause the output generator to provide a speech output at a user selected pace and at a pitch which is generally independent of the selected pace.
23. Apparatus according to claim 22 and also comprising: a scorer for sensing user responses and providing a score indication of user achievement level.
24. Apparatus according to claim 22 and wherein the output generator and said controller are operative to provide speech outputs at a pace which is variable over a range of 400 percent.
25. Apparatus according to claim 22 and wherein said output generator and said controller are operative to provide a speech output whose pace may be varied by both linear and non-linear techniques.
26. Apparatus according to claim 23 and wherein said scorer is responsive inter alia to the pace at which the speech outputs are provided.
27. Apparatus according to claim 22 and wherein said video outputs include at least one of images which assist in comprehension of the speech, subtitles and translations.
28. Apparatus according to claim 27 and wherein said subtitles and translations are synchronized to the pace of the speech outputs.
29. Apparatus according to claim 28 and wherein said video outputs include highlighting of portions of said subtitles in synchronization with said speech outputs.
30. Apparatus according to claim 22 and wherein said controller is responsive to a user selected learning level for determining not only the pace of the speech outputs but also whether at least one of subtitles and translations are provided.
31. Apparatus according to claim 30 and wherein said controller is also responsive to a user selected learning level for determining also whether portions of at least one of subtitles and translations are highlighted in synchronization with said speech outputs.
32. A method for teaching listening comprehension including:
providing an output generator which produces synchronized speech and video outputs; and
causing the output generator to provide a speech output at a user selected pace and at a pitch which is generally independent of the selected pace.
33. A method according to claim 32 and also comprising: sensing user responses and providing a score indication of user achievement level.
34. A method according to claim 32 and wherein the speech outputs are provided at a user selectable pace which is variable over a range of 400 percent.
35. A method according to claim 32 and wherein said speech outputs are provided at a user selectable pace which may be varied by both linear and non-linear techniques.
36. A method according to claim 33 and wherein said scorer is responsive inter alia to the pace at which the speech outputs are provided.
37. A method according to claim 32 and wherein said video outputs include at least one of images which assist in comprehension of the speech, subtitles and translations.
38. A method according to claim 37 and wherein said subtitles and translations are synchronized to the pace of the speech outputs.
39. A method according to claim 38 and wherein said video outputs include highlighting of portions of said subtitles in synchronization with said speech outputs.
40. A method according to claim 32 and wherein a user selected learning level determines not only the pace of the speech outputs but also whether at least one of subtitles and translations are provided.
41. A method according to claim 40 and wherein a user selected learning level determines also whether portions of at least one of subtitles and translations are highlighted in synchronization with said speech outputs.
PCT/IL1998/000145 1997-03-28 1998-03-27 Time scale modification of audiovisual playback and teaching listening comprehension Ceased WO1998044483A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU65161/98A AU6516198A (en) 1997-03-28 1998-03-27 Time scale modification of audiovisual playback and teaching listening comprehension

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US82534497A 1997-03-28 1997-03-28
US08/825,344 1997-03-28

Publications (1)

Publication Number Publication Date
WO1998044483A1 true WO1998044483A1 (en) 1998-10-08

Family

ID=25243773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL1998/000145 Ceased WO1998044483A1 (en) 1997-03-28 1998-03-27 Time scale modification of audiovisual playback and teaching listening comprehension

Country Status (2)

Country Link
AU (1) AU6516198A (en)
WO (1) WO1998044483A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4884972A (en) * 1986-11-26 1989-12-05 Bright Star Technology, Inc. Speech synchronized animation
US5689618A (en) * 1991-02-19 1997-11-18 Bright Star Technology, Inc. Advanced tools for speech synchronized animation
US5697789A (en) * 1994-11-22 1997-12-16 Softrade International, Inc. Method and system for aiding foreign language instruction

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7899668B2 (en) 1998-10-09 2011-03-01 Enounce Incorporated Method and apparatus to prepare listener-interest-filtered works
EP1125287A4 (en) * 1998-10-09 2005-07-20 Donald J Hejna Jr Method and apparatus to prepare listener-interest-filtered works
US10614829B2 (en) 1998-10-09 2020-04-07 Virentem Ventures, Llc Method and apparatus to determine and use audience affinity and aptitude
US9343080B2 (en) 1998-10-09 2016-05-17 Virentem Ventures, Llc Method and apparatus to prepare listener-interest-filtered works
US8478599B2 (en) 1998-10-09 2013-07-02 Enounce, Inc. Method and apparatus to determine and use audience affinity and aptitude
US8452589B2 (en) 1998-10-09 2013-05-28 Enounce Incorporated Method and apparatus to prepare listener-interest-filtered works
US7536300B2 (en) 1998-10-09 2009-05-19 Enounce, Inc. Method and apparatus to determine and use audience affinity and aptitude
WO2001026076A3 (en) * 1999-10-04 2002-03-07 Brent Carter Audio learning device
WO2002050798A3 (en) * 2000-12-18 2003-01-23 Digispeech Marketing Ltd Spoken language teaching system based on language unit segmentation
US7203840B2 (en) 2000-12-18 2007-04-10 Burlingtonspeech Limited Access control for interactive learning system
US7996321B2 (en) 2000-12-18 2011-08-09 Burlington English Ltd. Method and apparatus for access control to language learning system
WO2004077381A1 (en) * 2003-02-28 2004-09-10 Dublin Institute Of Technology A voice playback system
GB2415585B (en) * 2004-06-01 2006-05-24 Hitachi Ltd Digital information reproducing apparatus and method
US7693398B2 (en) 2004-06-01 2010-04-06 Hitachi, Ltd. Digital information reproducing apparatus and method
GB2424160B (en) * 2004-06-01 2007-01-31 Hitachi Ltd Digital information reproducing apparatus and method
GB2424160A (en) * 2004-06-01 2006-09-13 Hitachi Ltd Digital information reproducing apparatus and method
GB2415585A (en) * 2004-06-01 2005-12-28 Hitachi Ltd Speed variable audio playback
CN100594527C (en) * 2004-11-22 2010-03-17 马里奥·皮尔基奥 Method for synchronizing audio and graphics in multimedia presentation
US8068107B2 (en) 2004-11-22 2011-11-29 Mario Pirchio Method to synchronize audio and graphics in a multimedia presentation
WO2006054126A1 (en) * 2004-11-22 2006-05-26 Pirchio, Mario Method to synchronize audio and graphics in a multimedia presentation

Also Published As

Publication number Publication date
AU6516198A (en) 1998-10-22

Similar Documents

Publication Publication Date Title
Arons SpeechSkimmer: Interactively skimming recorded speech
US5697789A (en) Method and system for aiding foreign language instruction
CA2257298C (en) Non-uniform time scale modification of recorded audio
US20100298959A1 (en) Speech reproducing method, speech reproducing device, and computer program
US5621538A (en) Method for synchronizing computerized audio output with visual output
JP2003307997A (en) Language education system, voice data processing device, voice data processing method, voice data processing program, and storage medium
JPH08507153A (en) Interactive audiovisual control mechanism
WO1998044483A1 (en) Time scale modification of audiovisual playback and teaching listening comprehension
He et al. Exploring benefits of non-linear time compression
JP2010283605A (en) Video processing apparatus and method
JP3881620B2 (en) Speech speed variable device and speech speed conversion method
JP2008152605A (en) Presentation analysis apparatus and presentation viewing system
WO2003102897A1 (en) Speech data generation and reproduction method, method for supporting memorization and learning
JPH01300777A (en) Still picture file system, still picture reproducing device and its storage medium
Prögler Choices in editing oral history: The distillation of Dr. Hiller
KR100383061B1 (en) A learning method using a digital audio with caption data
JPH01300779A (en) Still picture file editing device
KR20000063615A (en) Method of Reproducing Audio Signal Corresponding to Partially Designated Text Data and Reproducing Apparatus for the Same
JP7288530B1 (en) system and program
JP2001154684A (en) Speech speed converter
JP2581700B2 (en) Information recording medium and information reproducing method
JPH0527787A (en) Music player
JP2006154531A (en) Audio speed conversion device, audio speed conversion method, and audio speed conversion program
JP2000099308A (en) Electronic book player
JP2004184619A (en) System for acquiring language and practicing musical performance

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH GM GW HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998541365

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA