
WO2001082291A1 - Methods and systems for speech recognition and voice training - Google Patents

Methods and systems for speech recognition and voice training

Info

Publication number
WO2001082291A1
WO2001082291A1 PCT/US2001/012959 US0112959W WO0182291A1 WO 2001082291 A1 WO2001082291 A1 WO 2001082291A1 US 0112959 W US0112959 W US 0112959W WO 0182291 A1 WO0182291 A1 WO 0182291A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
error
speech
audible sounds
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2001/012959
Other languages
English (en)
Inventor
H. Donald Wilson
Anthony H. Handal
Michael Lessac
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LESSAC SYSTEMS Inc
Original Assignee
LESSAC SYSTEMS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LESSAC SYSTEMS Inc filed Critical LESSAC SYSTEMS Inc
Priority to AU2001255560A priority Critical patent/AU2001255560A1/en
Publication of WO2001082291A1 publication Critical patent/WO2001082291A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures

Definitions

  • the present invention relates to speech recognition technology and voice training using speech recognition of the type typically embodied in speech recognition software implemented on personal computer systems.
  • Tiny laptop computing devices run at speeds a thousand times those of the early powerhouse computers and boast thousands of times the memory. Instead of huge reels of recording tape, hard disks with capacities on the order of eighteen GB are found in those same laptop computing devices. These devices, with their huge memory and computing capabilities, move as freely in the business world as people: under the arm, in a bag, or on the lap of a businessman flying across the ocean. No doubt this technology lies at the foundation of the most remarkable, reliable and completely unanticipated bull market in the history of business.
  • speech recognition programs generally have an error correction dialog window which is used to train the system to the features of an individual user's voice, as will be more fully described below.
  • error correction dialog window which is used to train the system to the features of an individual user's voice
  • an acoustic signal received by a microphone is input into a voice board which digitizes the signal.
  • the computer then generates a spectrogram which, for a series of discrete time intervals, records those frequency ranges at which sound exists and the intensity of sound in each of those frequency ranges.
  • the spectrogram, referred to in the art as a token, is thus a series of spectrographic displays, one for each of a plurality of time intervals which together form an audible sound to be recognized.
  • Each spectrographic display shows the distribution of energy as a function of frequency during the time interval.
  • sampling rates of 6,000 to 16,000 samples per second are typical, and are used to generate about fifty spectrum intervals per second for an audible sound to be recognized.
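  • By way of a concrete illustration only, the token-building step described above might be sketched in Python as follows, assuming numpy and the figures quoted above (a 16,000 sample-per-second rate and about fifty spectra per second); the function and variable names are illustrative and not part of the disclosure.

      import numpy as np

      def make_token(signal, sample_rate=16000, spectra_per_second=50):
          """Split a digitized utterance into ~50 frames per second and
          record the distribution of energy over frequency in each frame."""
          frame_len = sample_rate // spectra_per_second   # 320 samples per frame
          n_frames = len(signal) // frame_len
          token = []
          for i in range(n_frames):
              frame = signal[i * frame_len:(i + 1) * frame_len]
              spectrum = np.abs(np.fft.rfft(frame)) ** 2  # energy vs. frequency
              token.append(spectrum)
          return np.array(token)  # one spectral row per time interval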
  • a speech recognition system involves the input of vocabulary into the hard drive of a computer in the form of the above described spectral analysis matrix, with one or more spectral analysis matrices for each word in the vocabulary of the system. These matrices then serve as word models.
  • a number of problems must be addressed before the models in the database can be used as a reliable means for speech recognition.
  • different speakers speak at different rates.
  • for one speaker a word may take a certain period of time, while for another speaker the same word may take a longer period of time.
  • different speakers have voices of different pitch.
  • speakers may give different inflections, emphasis, duration and so forth to different syllables of a word in different ways, depending on the speaker. Even a single speaker will speak in different ways on different occasions.
  • each of the spectral sample periods for the sound to be recognized is compared against the corresponding spectral sample period of the model which is being rated.
  • the cumulative score for all of the sample periods in the sound against the model is a quality rating for the match.
  • the quality ratings for all the proposed matches are compared and the proposed match having the highest quality rating is output to the system, usually in the form of a computer display of the word or phrase.
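  • A minimal sketch of this rating-and-selection step, again in illustrative Python: each sample period of the token is compared against the corresponding period of a model, the per-period scores accumulate into a quality rating, and the best-rated model's text is output. Equal-length tokens, a Euclidean per-frame distance, and a "word_models" dictionary mapping each alphanumeric representation to its stored model are all simplifying assumptions.

      import numpy as np

      def match_quality(token, model):
          """Cumulative score of a token against one word model, comparing
          corresponding spectral sample periods (higher is better)."""
          return -float(np.sum(np.linalg.norm(token - model, axis=1)))

      def recognize(token, word_models):
          """Rate every proposed match and return the alphanumeric
          representation whose model has the highest quality rating."""
          return max(word_models, key=lambda w: match_quality(token, word_models[w]))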
  • the first method in which the database is assembled is the input of global information using one or more speakers to develop a global database.
  • the second method in which the database is assembled is the training of the database to a particular user's speech, typically done both during a training session with preselected text, and on an ad hoc basis through use of the error correction dialog window in the speech recognition program.
  • the present invention stems from the recognition that spectrograms of audible sounds may be used not only to recognize speakers and words, but also mispronunciations.
  • the performance of the speech recognition software is improved by focusing in on the user, as opposed to the software.
  • the invention has as its objective the improvement of the speech patterns of persons using the software. The result is enhanced performance, with the bonus of voice training for the user.
  • Such training is of great importance. For example, salesmen, lawyers, store clerks, mothers dealing with children and many others rely heavily on oral communication skills to accomplish daily objectives. Nevertheless, many individuals possess poor speaking characteristics and take this handicap with them to the workplace and throughout daily life.
  • a specialized but highly effective speech training regimen is provided for application in the context of speech recognition software for receiving human language inputs in audio form to a microphone, analyzing the same in a personal computer, and outputting alphanumeric documents and navigation commands for control of the personal computer, together with alphanumeric guidance and aural pronunciation examples from the sound card and speakers associated with a personal computer.
  • speech recognition is performed on a first computing device using a microphone to receive audible sounds input by a user, the device having a program with a database consisting of (i) digital representations of known audible sounds and associated alphanumeric representations of the known audible sounds and (ii) digital representations of known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases.
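  • The two-part database recited above can be pictured, purely as an illustrative sketch, as the following Python structure; the class and field names are hypothetical, not taken from the disclosure.

      from dataclasses import dataclass, field

      @dataclass
      class RecognitionDatabase:
          # (i) known audible sounds: alphanumeric text -> digital representation (token)
          word_models: dict = field(default_factory=dict)
          # (ii) known mispronunciations: error class -> digital representation
          error_models: dict = field(default_factory=dict)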
  • the method is performed by receiving the audible sounds in the form of the electrical output of a microphone.
  • the training method is performed by having the person being trained read a preselected text and translating the audible sounds into the form of the electrical output of a microphone being sent to a computer.
  • a particular audible sound to be recognized is converted into a digital representation of the audible sound.
  • the digital representation of the particular audible sound is then compared to the digital representations of the known audible sounds to determine which of those known audible sounds is most likely to be the particular audible sound being compared to the sounds in the database.
  • a speech recognition output consisting of the alphanumeric representation associated with the audible sound most likely to be the particular audible sound is then produced.
  • An error indication is then received from the user indicating that there is an error in recognition.
  • the user also indicates the proper alphanumeric representation of the particular audible sound. This allows the system to determine whether the error is a result of a known type or instance of mispronunciation.
  • the digital representation of the particular audible sound is then compared to the digital representations of a proper pronunciation of that audible sound to determine whether there is an error that results from a known type or instance of mispronunciation.
  • the system presents an interactive training program from the computer to the user to enable the user to correct such mispronunciation.
  • the presented interactive training program comprises playback of the properly pronounced sound from a database of recorded sounds corresponding to proper pronunciations of the mispronunciations resulting from the known classes of mispronounced words and phrases.
  • the user is given the option of receiving speech training or of training the program to recognize the user's speech pattern; the choice rests with the user of the program.
  • the determination of whether the error is a result of a known type or instance of mispronunciation is performed by comparing the mispronunciation to the digital representations of known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases using a speech recognition engine.
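  • Illustratively, this determination can reuse the same scoring machinery sketched above: the misrecognized token is rated against each stored mispronunciation model, and the best-scoring error class is reported if it clears an acceptance threshold. The threshold value and return convention are assumptions, not part of the disclosure.

      def classify_error(token, error_models, accept_score=-50.0):
          """Return the known mispronunciation class whose model best matches
          the misrecognized utterance, or None if no model matches well enough."""
          best_class, best_score = None, accept_score
          for error_class, model in error_models.items():
              score = match_quality(token, model)   # scorer sketched earlier
              if score > best_score:
                  best_class, best_score = error_class, score
          return best_class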
  • the inventive method will be implemented by having the database consisting of (i) digital representations of known audible sounds and associated alphanumeric representations of the known audible sounds and (ii) digital representations of known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases, generated by the steps of speaking and digitizing the known audible sounds and the known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases.
  • the database will then be introduced into the computing devices of many users after the generation by speaking and digitizing has been done on another computing device, and transferred together with voice recognition and error correcting subroutines to the first computing device using CD-ROM or another appropriate data carrying medium.
  • mispronunciations are input into the database by actual speakers who have such errors as a natural part of their speech patterns.
  • normalization to word, phrase and other sound models may be achieved by normalizing words or phrases to one of a plurality of sound durations. This procedure is followed with respect to all the word and phrase models in the database. When a word is received by the system, the system measures its actual duration and then normalizes the duration of the sound to one of the plurality of preselected normalized sound durations. This reduces the number of items in the database against which the sound must be compared and rated.
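  • A sketch of that normalization step, assuming a small hypothetical set of preselected durations expressed as frame counts; linear interpolation stands in for whatever time-warping the actual system employs.

      import numpy as np

      PRESET_FRAME_COUNTS = (20, 35, 50, 80)   # hypothetical preselected durations

      def normalize_duration(token):
          """Stretch or compress a token to the nearest preselected duration,
          so it need only be compared against models of that one length."""
          n_frames, n_bins = token.shape
          target = min(PRESET_FRAME_COUNTS, key=lambda t: abs(t - n_frames))
          old_t = np.linspace(0.0, 1.0, n_frames)
          new_t = np.linspace(0.0, 1.0, target)
          # Interpolate each frequency bin onto the target time grid.
          return np.stack([np.interp(new_t, old_t, token[:, b])
                           for b in range(n_bins)], axis=1)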
  • Figure 1 is a block diagram illustrating a voice recognition program in accordance with the method of the present invention;
  • Figure 2 is a block diagram illustrating a voice recognition program in accordance with the training method of the present invention;
  • Figure 3 is an alternative embodiment of the inventive system illustrated in Figure 2;
  • a voice and error model is generated using subroutines 12 and 112.
  • Subroutines 12 and 112 comprise a number of steps performed at the site of the software developer, the results of which are sent, for example in the form of a CD-ROM, other media, or via the Internet, together with the software for executing voice recognition, to a user, as will be apparent from the description below.
  • the inventive speech recognition method may be practiced on personal computers, as well as on more advanced systems, and even on relatively stripped-down lightweight systems, those referred to as subnotebooks, and even smaller systems, provided the same have sound boards for interfacing with and receiving the output of a microphone. It is noted that quality sound board electronics are important to good recognition and to successful practice of the methods of the invention.
  • a database of word models is generated by having speakers speak the relevant words and phrases into a microphone connected to the sound board of the personal computer being used to generate the database.
  • speakers who have been trained in proper speech habits are used to input words, phrases and sounds into the database at steps 14 and 114.
  • as the information is generated by the speakers speaking into microphones attached to the sound boards in the computer, it is digitized, analyzed and stored on the hard drive 16, 116 of the computer.
  • the term "phoneme" is used to mean the smallest sound, perhaps meaningless in itself, capable of indicating a difference in meaning between two words.
  • the word "dog" differs from "cog" by virtue of a change of the phoneme "do", pronounced "daw", to "co", pronounced "cah".
  • the model generating speaker can speak a database of common phoneme errors into the microphone attached to the sound board of the computer to result in input of an error database into hard drive 16, 116 of the computer.
  • the phoneme errors are spoken by persons who in various ways make the pronunciation error as part of their normal speech patterns.
  • the system is enhanced by the introduction into the database contained on hard drive 16, 116 of a plurality of exercise word models, selected for the purpose of training the speech of a user of the system.
  • the same are input into the system through the use of a microphone and sound board, in the same way that the database of the language model was input into the system.
  • a group of word and/or phrase models is associated with each type of phoneme error. This is because if a person makes a speech pronunciation error of a particular type, it is likely that the same speaker makes certain other errors which have common characteristics with other pronunciation errors in the group. For example, a person who mispronounces the word "them" to sound like "dem" is also likely to mispronounce the words "that", "those" and "these".
  • Exercise phrase models are input at steps 22 and 122. These exercise phrase models are stored by the system on hard drive 16, 116. The exercise word models and the exercise phrase models input into the system at steps 20, 22, 120 and 122 respectively are associated in groups having common mispronunciation characteristics. The same are input into the system through the use of a microphone and sound board, in the same way that the database of the language model was input into the system.
  • a plurality of typical mispronunciations are input into the system to create a database of exercise word error models in hard drive 16, 116. The same are input into the system through the use of a microphone and sound board, in the same way that the database of the language model was input into the system.
  • the database of relatively common mispronunciation errors is completed at steps 26 and 126, where the speaker generating that database speaks into the system to generate a plurality of exercise phrase error models. These error models are also input into the system through the use of a microphone and stored on hard drive 16, 116.
  • input of the error models is done using a speaker or speakers who have the actual speech error as part of their normal speech patterns. The same is believed to achieve substantially enhanced recognition of speech errors, although it is not believed to be necessary to a functioning system.
  • the models stored on hard disk 16, 116 and generated as described above may be recorded on a CD-ROM or other program carrying media, together with a voice recognition engine, such as those marketed by any one of a number of manufacturers, for example IBM, Dragon Systems, and others.
  • a prior art speech recognition program may be used both for recognizing words and for recognizing mispronunciations and phoneme errors, together with the above described audio recordings of proper pronunciations, both during speech recognition operation and training sessions (Figure 1) and during speech training sessions with the inventive interactive program (Figures 2 and 3).
  • such software comprising the speech recognition engine, editing and training utilities, and database of word models, phrase models, vocal recordings, and error models may be supplied to the user for a one-time fee and transported over a publicly accessible digital network, such as the Internet.
  • the software may be made available for limited use for any period of time with charges associated with each such use, in which case the software would never be permanently resident on the computer of a user.
  • the software containing the program and the database is loaded into a personal computer and words are spoken into a microphone coupled to the sound board of the computer, in order to input the speech into the computer in the manner of a conventional speech recognition program.
  • the software containing the program and the database is loaded into a personal computer and the student user is instructed to read a preselected text that appears on the screen of the computer.
  • words are spoken into a microphone coupled to the sound board of the computer, in order to input the speech into the computer in the manner of a conventional speech recognition program.
  • once the database has been generated at steps 14, 18, 20, 22, 24, 26, 114, 118, 120, 124, and 126 and the speech recognition engine, editing and training utilities have been added, the system proceeds at steps 28 and 128 to receive, through a microphone, speech to be recognized from a user of the program who has loaded the speech recognition engine, editing and training utilities, and database of word models, phrase models, vocal recordings, and error models onto the user's personal computer.
  • the operation of the speech recognition program of the present invention is substantially identical to other speech recognition programs presently on the market. More particularly, at steps 30 and 130, a conventional speech recognition algorithm is applied to recognize audible sounds as the words which they are meant to represent.
  • the computer then outputs the recognized speech on the screen of the computer monitor, and the next phrase uttered by the user proceeds at steps 30 and 130 through the speech recognition algorithm resulting in that speech also being displayed on the monitor screen.
  • the user may use any one of a number of different techniques to bring up an error correction window at steps 32 and 132. For example, he may simply double-click on the error, or highlight the erroneous recognition and hit a key dedicated to presentation of the error correction window.
  • the call-up of the error correction window at steps 34 and 134 indicates to the system that there is an error. While some recognition errors are unrelated to pronunciation errors, many are.
  • the system then proceeds at steps 36 and 136 to determine whether the error made by the user is recognized as one of the speech errors known to the system. If it is, this is determined at steps 36 and 138, and the nature of the pronunciation error is then input into the system and logged at steps 38 and 140. In this manner, the system keeps track of the number of errors of a particular type for the user by storing and tallying them on hard drive 16, 116.
  • the speech training will not be triggered by a single mispronunciation. Instead, it is contemplated that repeated instances of a single type of mispronunciation error will be tallied, and only when a threshold of pronunciation errors for that error type is reached in the tally will speech training be proposed by the appearance on the screen of a prompt window suggesting speech training.
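  • The tallying and threshold test might look like the following sketch; the threshold value and the error-class label are illustrative only.

      from collections import Counter

      class ErrorTally:
          """Count mispronunciation errors per class; propose coaching only
          once the tally for a class reaches its threshold."""
          def __init__(self, threshold=3):          # illustrative threshold
              self.threshold = threshold
              self.counts = Counter()

          def log_error(self, error_class):
              self.counts[error_class] += 1
              return self.counts[error_class] >= self.threshold

      tally = ErrorTally()
      if tally.log_error("them->dem"):              # hypothetical error class
          print("The coach wants to talk to you!")  # present the prompt window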
  • the same could take the form of a window having the words "The system has determined that it is likely that we can improve your recognition and speech by coaching you now. Would you like to speak to the speech coach?"
  • the screen may also have a headline above the question, such as "The coach wants to talk to you!"
  • the screen will also have a button marked "OK" to start a training session.
  • a button marked "Cancel" may also be included so the student may click on the "Cancel" button to delay the speech coaching session for a limited amount of time or cancel it altogether.
  • the error correction algorithm operates in a manner identical to the speech recognition algorithm at steps 30 and 130, except that the error correction algorithm checks the database of common phoneme errors input into the system by the software developer at steps 18 and 118 and determines whether the misrecognized sound matches one of those known phoneme errors.
  • if the system determines that the threshold number of errors in that class has not been reached, it sends the system back to steps 28 and 128, where speech recognition proceeds. If, on the other hand, a predetermined number of errors of the same class have been detected by the system and logged at steps 38 or 140, at steps 40 or 142 the system is sent to step 42 or 144, where the above described "The coach wants to talk to you!" screen is presented to the user, who is thus given the opportunity to train his voice.
  • If the speech recognition user declines the opportunity to train at step 42, he is given the opportunity to train the database at step 43. If he declines that opportunity also, the system is returned to step 28, where, again, speech recognition proceeds.
  • At step 45, the database is trained in the same manner as in a conventional speech recognition program.
  • If the speech training user declines the opportunity to train at step 146, the system is returned to step 128, where, again, reading of the rest of the preselected pronunciation error detecting text proceeds.
  • At step 42 or 146, when the user decides to accept speech training, the system proceeds to step 44 or 148 respectively, where a determination is made as to whether the particular error is an error in the pronunciation of a word or of what is referred to herein as a phrase.
  • By "phrase" in this context is meant at least parts of two different words. This may mean two or more words, or the combination of one or more words and at least a syllable from another word, most often the end of one word combined with the beginning of another word, following the tendency of natural speakers to couple sounds to each other, sometimes varying their stand-alone pronunciation. If, at step 44 or 148, the system determines that the mispronunciation is the mispronunciation of a word, the system is sent to step 46 or 150 respectively, where the system retrieves from memory words which have the same or similar mispronunciation errors.
  • these words have been stored in the system, not only in the form of alphanumeric presentations, but also in high-quality audio format.
  • the object of the storage of the high-quality audio sound is to provide for audible playback of the words in the training dialog screen.
  • the words retrieved at steps 46 and 150 are also presented on-screen in alphanumeric form to the user and the user is invited to pronounce the word. Whether the word is pronounced properly is determined at steps 48 and 154. If there is no error, the system proceeds to steps 50 and 156, where the system determines whether two instances of no error have occurred consecutively. If no error has occurred twice consecutively, the system is returned to act as a voice recognition system at steps 28 and 128. If no error has occurred only once, at steps 50 and 158 the system is returned to the training dialog screen at step 46 or 150 respectively and the user is invited to pronounce the same or another word having the same type of mispronunciation to ensure that the user is pronouncing the word correctly. Once the user has pronounced words twice in a row without errors, the user is returned, at step 50 or 156 respectively, to the voice recognition function.
  • If an error is found at step 48 or 146, the system proceeds to step 50 or 150 respectively, where an instruction screen is presented to the user, telling the user how to make the sound, with physical instructions on how to move the muscles of the mouth and tongue to achieve the sound.
  • the screen allows for the incorporation of more creative speech training approaches such as the Lessac method described in The Use and Training of the Human Voice - A Bio-Dynamic Approach to Vocal Life, Arthur Lessac, Mayfield Publishing Co. (1997).
  • the user is encouraged to use his "inner harmonic sensing." This enhances the description of a particular sound by having the user explore how the sound affects the user's feelings or encourages the user to some action.
  • the Lessac method teaches the sound of the letter "N" not only by describing the physical requirements but also by instructing the user to liken the sound to the "N" in violin and to "Play this consonant instrument tunefully."
  • This screen also has a button which may be clicked to cause the system to play back the high-quality audio sound from memory, which was previously recorded during software development, as described above.
  • the system may also incorporate interactive techniques.
  • This approach presents the user with a wire frame drawing of a human face depicting, amongst other information, placement of the tongue, movement of the lips, etc.
  • the user may interactively move the wire frame drawing to get a view from various angles or cause the sounds to be made slowly so that the "facial" movements can be carefully observed.
  • This screen also has a button which may be clicked to cause the system to play back the high-quality audio sound from memory, which was previously recorded during software development, as described above.
  • the user is then invited to say the sound again, and at steps 54 and 152, the user says the word into the microphone which is coupled to the computer, which compares the word to the database for proper pronunciation and determines whether there is an error in the pronunciation of the word at steps 56 and 154 respectively.
  • If there is an error, at step 46 the word is displayed and the user invited to say the word into the machine to determine whether there is an error, with the system testing the output to determine whether it should proceed to speech recognition at step 28 once the standard of two consecutive correct pronunciations has been reached. If there is no error at step 56, however, the tally is cleared and the system proceeds to step 28, where normal speech recognition continues.
  • If an error is found, the error tally flag is set at step 158 and the system is sent back to step 150, where, again, the sound is displayed in alphanumeric form and the user is invited to say the sound into the machine, with the system testing the output to determine whether there is an error at step 154. If no pronunciation error is found, the system determines at step 156 whether the previous attempt was an error by checking whether the tally error flag is set. If the flag is set, indicating that the previous attempt had a pronunciation error, the system is sent to step 158, where the tally flag is cleared, and the system returns to step 150. If, at step 156, the tally flag is found not set, indicating that the previous attempt had no pronunciation error, then the standard of two consecutive correct pronunciations has been met and training has been completed.
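  • The exit test amounts to requiring two consecutive correct pronunciations, which the tally flag implements; a compact sketch follows, with "attempt_has_error" standing in, as a hypothetical callback, for the microphone-and-comparison loop of steps 150-154.

      def training_drill(attempt_has_error):
          """Repeat the drill until two consecutive attempts are error-free
          (the tally-flag logic of steps 154-158). attempt_has_error is a
          hypothetical callback returning True when pronunciation is wrong."""
          consecutive_correct = 0
          while consecutive_correct < 2:
              if attempt_has_error():
                  consecutive_correct = 0   # error: clear the tally
              else:
                  consecutive_correct += 1
          # standard met: return to normal speech recognition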
  • If, at step 44 or 148, the system determines that the mispronunciation is the mispronunciation of a phrase, the system is sent to step 58 or 150 respectively, where the system retrieves from memory phrases which have the same or similar mispronunciation errors.
  • the words or phrases retrieved at step 58 or 150 are also presented on-screen in alphanumeric form to the user and the user is invited to pronounce the word or phrase. Whether the word is pronounced properly is determined at step 60 or 154 respectively. If there is no error, the system proceeds to step 62 or 158 respectively, where the system determines whether two instances of no error have occurred. If no error has occurred twice, the system is returned to act as a voice recognition system at steps 28 and 128.
  • Otherwise, at steps 62 and 158, the system is returned to the training dialog screen at step 58 and 150 respectively and the user is invited to pronounce the same or another word having the same type of mispronunciation to ensure that the user is pronouncing the word correctly. Once the user has pronounced words twice in a row without errors, the user is returned at step 62 or 158 to the voice recognition function.
  • If an error is found at step 60 or 154, the system proceeds to step 62 or 150 respectively, where an instruction screen is presented to the user, telling the user how to make the sound, with physical instructions on how to move the muscles of the mouth and tongue to achieve the sound, as well as any other techniques such as the Lessac method described hereinabove.
  • This screen also has a button which may be clicked to cause the system to playback the high-quality audio sound from memory, which was previously recorded during software development, as described above.
  • the user is then invited to say the sound again, and at steps 66 and 152, the user says the phrase into the microphone which is coupled to the computer, which compares the phrase to the database for proper pronunciation and determines whether there is an error in the pronunciation of the phrase at steps 68 and 154 respectively.
  • If there is an error, at step 58 or 150 the word is displayed and the user invited to say the word into the machine to determine whether there is an error, with the system testing the output to determine whether it should proceed to speech recognition at steps 28, 128 once the standard of two consecutive correct pronunciations has been reached. If there is no error at step 68 or 150, however, the tally is cleared and the system proceeds to steps 28, 128, where normal speech recognition continues, the training session having been completed.
  • An alternative embodiment of the invention 210 is shown in Figure 3, wherein steps analogous to those of the Figure 2 embodiment are numbered with numbers one hundred higher than those in the Figure 2 embodiment.
  • the user has the additional alternative of training the database rather than having the database train his speech in step 262. This option is provided for those users who have particular speech habits that the user wants to accord special attention.
  • the database is taught the user's particular pronunciation error at step 262.
  • the user can assign a high error threshold, or tell the database to ignore the error, if he or she does not want training and prefers to keep his or her speech affectation.
  • the user may assign a low error threshold if he or she desires extra training for a certain type of error.
  • the above methods can be incorporated into computer systems, including computer-implementable programs, plug-in component hardware, and firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to methods and systems for speech recognition (10) and voice training (110). According to the invention, a user inputs (28) audible sounds via a microphone into a first computing device having a program with a database (16). This database consists of digital representations of known audible sounds, associated alphanumeric representations of those known audible sounds, and mispronunciations. The program compares the digital representation of a sound with the digital representations of known audible sounds in the database (30) to determine which sound was most likely intended. If an error occurs in recognition (32), the user may indicate the correct alphanumeric representation of the particular audible sound (34). This allows the system to determine whether the error is the result of a known type or instance of mispronunciation (36). In response to the determination of the nature of the error, the system presents an interactive training program from the computer to the user to enable the user to correct such mispronunciation (45). The advantage offered by the present invention lies in the improvement of both speech recognition and the user's speech patterns through the user's strong involvement in error correction. This enables the user to improve his or her oral communication.
PCT/US2001/012959 2000-04-21 2001-04-23 Methods and systems for speech recognition and voice training Ceased WO2001082291A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001255560A AU2001255560A1 (en) 2000-04-21 2001-04-23 Speech recognition and training methods and systems

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US55381100A 2000-04-21 2000-04-21
US55381000A 2000-04-21 2000-04-21
US09/553,810 2000-04-21
US09/553,811 2000-04-21

Publications (1)

Publication Number Publication Date
WO2001082291A1 true WO2001082291A1 (fr) 2001-11-01

Family

ID=27070435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/012959 2000-04-21 2001-04-23 Ceased WO2001082291A1 (fr) Methods and systems for speech recognition and voice training

Country Status (2)

Country Link
AU (1) AU2001255560A1 (fr)
WO (1) WO2001082291A1 (fr)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
US5010495A (en) * 1989-02-02 1991-04-23 American Language Academy Interactive language learning system
US5231670A (en) * 1987-06-01 1993-07-27 Kurzweil Applied Intelligence, Inc. Voice controlled system and method for generating text from a voice controlled input
US5487671A (en) * 1993-01-21 1996-01-30 Dsp Solutions (International) Computerized system for teaching speech
US5679001A (en) * 1992-11-04 1997-10-21 The Secretary Of State For Defence In Her Britannic Majesty's Government Of The United Kingdom Of Great Britain And Northern Ireland Children's speech training aid
US5717828A (en) * 1995-03-15 1998-02-10 Syracuse Language Systems Speech recognition apparatus and method for learning
US5787231A (en) * 1995-02-02 1998-07-28 International Business Machines Corporation Method and system for improving pronunciation in a voice control system
GB2323693A (en) * 1997-03-27 1998-09-30 Forum Technology Limited Speech to text conversion
US5864805A (en) * 1996-12-20 1999-01-26 International Business Machines Corporation Method and apparatus for error correction in a continuous dictation system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6865533B2 (en) 2000-04-21 2005-03-08 Lessac Technology Inc. Text to speech
US6963841B2 (en) 2000-04-21 2005-11-08 Lessac Technology, Inc. Speech training method with alternative proper pronunciation database
US7280964B2 (en) 2000-04-21 2007-10-09 Lessac Technologies, Inc. Method of recognizing spoken language with recognition of language color
US6847931B2 (en) 2002-01-29 2005-01-25 Lessac Technology, Inc. Expressive parsing in computerized conversion of text to speech
EP1606793A4 (fr) * 2002-12-31 2007-05-16 Lessac Technology Inc Speech recognition method
US7205662B2 (en) 2003-02-27 2007-04-17 Symmorphix, Inc. Dielectric barrier layer films
EP2337006A1 (fr) * 2009-11-24 2011-06-22 Kai Yu Traitement de la parole et apprentissage
US11735169B2 (en) 2020-03-20 2023-08-22 International Business Machines Corporation Speech recognition and training for data inputs
CN113284380A (zh) * 2021-05-26 2021-08-20 秦皇岛职业技术学院 An artificial-intelligence-based spoken English training device
CN113284380B (zh) * 2021-05-26 2022-03-25 秦皇岛职业技术学院 An artificial-intelligence-based spoken English training device

Also Published As

Publication number Publication date
AU2001255560A1 (en) 2001-11-07

Similar Documents

Publication Publication Date Title
US7280964B2 (en) Method of recognizing spoken language with recognition of language color
US6963841B2 (en) Speech training method with alternative proper pronunciation database
Gerosa et al. A review of ASR technologies for children's speech
US5717828A (en) Speech recognition apparatus and method for learning
Kumar et al. Improving literacy in developing countries using speech recognition-supported games on mobile devices
US6560574B2 (en) Speech recognition enrollment for non-readers and displayless devices
US6397185B1 (en) Language independent suprasegmental pronunciation tutoring system and methods
USRE37684E1 (en) Computerized system for teaching speech
Mak et al. PLASER: Pronunciation learning via automatic speech recognition
Arimoto et al. Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment
WO1999040556A1 (fr) Appareil de reconnaissance de la parole et methode d'apprentissage
Stemberger et al. Phonetic transcription for speech-language pathology in the 21st century
US20080027731A1 (en) Comprehensive Spoken Language Learning System
Vicsi et al. A multimedia, multilingual teaching and training system for children with speech disorders
WO2001082291A1 (fr) Methods and systems for speech recognition and voice training
US20070003913A1 (en) Educational verbo-visualizer interface system
KR101270010B1 (ko) 음성 인식 기반의 단답형 학습 방법 및 시스템
WO1999013446A1 (fr) Systeme interactif permettant d'apprendre a lire et prononcer des discours
US20050144010A1 (en) Interactive language learning method capable of speech recognition
WO2006034569A1 (fr) Systeme d'entrainement vocal et procede permettant de comparer des enonces d'utilisateurs a des signaux vocaux de base
WO2024205497A1 (fr) Method and apparatus for generating differentiated spoken prompts for learning purposes
Ilhan et al. HAPOVER: A Haptic Pronunciation Improver Device
Stativă et al. Assessment of Pronunciation in Language Learning Applications
Kim et al. Non-native speech rhythm: A large-scale study of English pronunciation by Korean learners: A large-scale study of English pronunciation by Korean learners
JP7659710B1 (ja) Speech similarity calculation engine, and learning support program, learning support system, and learning support method using the same

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP