US20150039312A1 - Controlling speech dialog using an additional sensor - Google Patents
Controlling speech dialog using an additional sensor
- Publication number
- US20150039312A1 (application US 13/955,265)
- Authority
- US
- United States
- Prior art keywords
- speech
- module
- information
- user
- speaking
- Prior art date: 2013-07-31
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- All under G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/265 (under G10L15/00—Speech recognition; G10L15/26—Speech to text systems)
- G10L15/222—Barge in, i.e. overridable guidance for interrupting prompts (under G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue)
- G10L17/005 (under G10L17/00—Speaker identification or verification techniques)
- G10L15/24—Speech recognition using non-acoustical features
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
- G10L25/78—Detection of presence or absence of voice signals (under G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00)
Abstract
- Methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving information determined from a non-speech related sensor; using the information in a turn-taking function to confirm at least one of if and when a user is speaking; and generating a command to at least one of a speech recognition module and a speech generation module based on the confirmation.
Description
- The technical field generally relates to speech systems, and more particularly relates to methods and systems for controlling dialog within a speech system based on information from a non-speech related sensor.
- Vehicle speech systems perform speech recognition or understanding of speech uttered by occupants of the vehicle. The speech utterances typically include commands that communicate with or control one or more features of the vehicle or other systems that are accessible by the vehicle. A speech dialog system of the vehicle speech system generates spoken commands in response to the speech utterances or to elicit speech utterances or other user input. In some instances, the spoken commands are generated in response to the speech system needing further information in order to perform a desired task. In other instances, the spoken commands are generated as a confirmation of the recognition result.
- Some speech systems perform the speech recognition/understanding and generate the spoken commands based on one or more turn-taking steps or functions. For example, a dialog manager manages the dialog based on various scenarios that may occur during a conversation. The dialog manager, for example, manages when the vehicle speech system should be listening for speech uttered by a user and when the vehicle speech system should be generating spoken commands to the user. It is desirable to provide methods and systems for enhancing turn-taking in a speech system. Furthermore, other desirable features and characteristics of the present invention will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and the foregoing technical field and background.
- Accordingly, methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving information determined from a non-speech related sensor; using the information in a turn-taking function to confirm at least one of if and when a user is speaking; and generating a command to at least one of a speech recognition module and a speech generation module based on the confirmation.
- In another embodiment, a system includes a first module that receives information determined from a non-speech related sensor, and that uses the information in a turn-taking function to confirm at least one of if and when a user is speaking. A second module at least one of starts and stops at least one of speech recognition and speech generation based on the confirmation.
- The exemplary embodiments will hereinafter be described in conjunction with the following drawing figures, wherein like numerals denote like elements, and wherein:
- FIG. 1 is a functional block diagram of a vehicle that includes a speech system in accordance with various exemplary embodiments;
- FIG. 2 is a dataflow diagram illustrating a speech system in accordance with various exemplary embodiments; and
- FIG. 3 is a flowchart illustrating a speech method that may be performed by the speech system in accordance with various exemplary embodiments.
- The following detailed description is merely exemplary in nature and is not intended to limit the application and uses. Furthermore, there is no intention to be bound by any expressed or implied theory presented in the preceding technical field, background, brief summary or the following detailed description. As used herein, the term module refers to an application specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
- Referring now to FIG. 1, in accordance with exemplary embodiments of the present disclosure, a speech system 10 is shown to be included within a vehicle 12. In various exemplary embodiments, the speech system 10 provides speech recognition or understanding and a dialog for one or more vehicle systems through a human machine interface (HMI) module 14. Such vehicle systems may include, for example, but are not limited to, a phone system 16, a navigation system 18, a media system 20, a telematics system 22, a network system 24, or any other vehicle system that may include a speech dependent application. As can be appreciated, one or more embodiments of the speech system 10 can be applicable to other non-vehicle systems having speech dependent applications and thus are not limited to the present vehicle example.
- The speech system 10 and/or the HMI module 14 communicate with the multiple vehicle systems 16-24 through a communication bus and/or other communication means 26 (e.g., wired, short range wireless, or long range wireless). The communication bus can be, for example, but is not limited to, a controller area network (CAN) bus, a local interconnect network (LIN) bus, or any other type of bus.
- The speech system 10 includes a speech recognition module 32, a dialog manager module 34, and a speech generation module 35. As can be appreciated, the speech recognition module 32, the dialog manager module 34, and the speech generation module 35 may be implemented as separate systems and/or as a combined system as shown. In general, the speech recognition module 32 receives and processes speech utterances from the HMI module 14 using one or more speech recognition or understanding techniques that rely on acoustic modeling, semantic interpretation, and/or natural language understanding. The speech recognition module 32 generates one or more possible results from the speech utterance (e.g., based on a confidence threshold) and provides them to the dialog manager module 34.
- The dialog manager module 34 manages an interaction sequence and a selection of speech prompts to be spoken to the user based on the results. In various embodiments, the dialog manager module 34 determines a next speech prompt to be generated by the system in response to the user's speech utterance. The speech generation module 35 generates a spoken command that is to be spoken to the user (e.g., via the HMI module 14) based on the next speech prompt provided by the dialog manager module 34.
- As will be discussed in more detail below, the speech system 10 further includes a sensor data interpretation module 36. The sensor data interpretation module 36 processes data received from a non-speech related sensor 38 and provides sensor information to the dialog manager module 34. The non-speech related sensor 38 can include, for example, but is not limited to, an image sensor, an ultrasound sensor, a radar sensor, or any other sensor that senses non-speech related observable conditions of one or more occupants of the vehicle. As can be appreciated, in various embodiments, the non-speech related sensor 38 can be a single sensor that senses all occupants of the vehicle 12 or, alternatively, may include multiple sensors that each sense a potential occupant of the vehicle 12 or that together sense all occupants of the vehicle 12. For exemplary purposes, the disclosure will be discussed in the context of the non-speech related sensor 38 being a single sensor.
- The sensor data interpretation module 36 processes the sensor data to determine which occupant is interacting with the HMI module 14 (e.g., if there are multiple occupants in the vehicle 12) and further processes the sensor data to determine the presence of speech from that occupant (e.g., whether or not the occupant is talking at a particular time). For example, in the case of the image sensor, the sensor data interpretation module 36 processes image data to determine the presence of speech based on, for example, whether the lips are open or closed, a rate of movement of the lips, or other detected facial expressions of the occupant. In another example, in the case of the ultrasound sensor, the sensor data interpretation module 36 processes ultrasound data to determine the presence of speech based on, for example, a detected movement or velocity of an occupant's lips. In yet another example, in the case of the radar sensor, the sensor data interpretation module 36 processes radar data to determine the presence of speech based on a detected movement or velocity of an occupant's lips.
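- The patent describes what the interpretation yields (a speech-presence signal) but not how it is computed. Below is a minimal illustrative sketch of the image-sensor case, estimating a speech-presence probability from the short-term variance of a mouth-aspect ratio; the `MouthLandmarks` type, window length, and gain are assumptions, not details from the source.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MouthLandmarks:
    """Hypothetical per-frame mouth geometry extracted from an image sensor."""
    upper_lip_y: float
    lower_lip_y: float
    mouth_width: float


class SpeechPresenceEstimator:
    """Estimates the probability that an occupant is speaking from lip motion.

    Heuristic: speech produces rapid oscillation of the mouth opening, so the
    short-term variance of the mouth-aspect ratio is mapped into [0, 1].
    """

    def __init__(self, window_frames: int = 15, variance_gain: float = 50.0):
        self.history = deque(maxlen=window_frames)
        self.variance_gain = variance_gain  # tuning constant (assumed)

    def update(self, frame: MouthLandmarks) -> float:
        # Mouth-aspect ratio: vertical lip opening normalized by mouth width.
        mar = abs(frame.lower_lip_y - frame.upper_lip_y) / max(frame.mouth_width, 1e-6)
        self.history.append(mar)
        if len(self.history) < 2:
            return 0.0
        mean = sum(self.history) / len(self.history)
        variance = sum((m - mean) ** 2 for m in self.history) / len(self.history)
        # Clamp the scaled variance into a probability-like score.
        return min(1.0, variance * self.variance_gain)
```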
- The dialog manager module 34 receives information from the sensor data interpretation module 36 indicating the presence of speech from a particular occupant (referred to as a user of the system 10). In various embodiments, the information includes a probability of speech presence from an occupant. The dialog manager module 34 manages the dialog with the user based on the information from the sensor data interpretation module 36. For example, the dialog manager module 34 uses the information in various turn-taking functions to confirm if and/or when the user is speaking.
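- The text states only that the information "includes a probability of speech presence"; it does not say how the dialog manager combines that probability with acoustic evidence. One plausible design, shown purely as an assumption, is a weighted fusion of the visual probability with an acoustic voice-activity score:

```python
def fuse_speech_evidence(acoustic_prob: float, visual_prob: float,
                         visual_weight: float = 0.4) -> float:
    """Convex combination of acoustic VAD output and the sensor-derived
    speech-presence probability; the weight is a tuning assumption."""
    return (1.0 - visual_weight) * acoustic_prob + visual_weight * visual_prob


def user_is_speaking(acoustic_prob: float, visual_prob: float,
                     threshold: float = 0.6) -> bool:
    """Turn-taking confirmation: True only when the fused evidence is strong."""
    return fuse_speech_evidence(acoustic_prob, visual_prob) >= threshold
```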
- Referring now to FIG. 2, and with continued reference to FIG. 1, a dataflow diagram illustrates components of the dialog manager module 34 in accordance with various exemplary embodiments. As can be appreciated, various exemplary embodiments of the dialog manager module 34, according to the present disclosure, may include any number of sub-modules. In various exemplary embodiments, the sub-modules shown in FIG. 2 may be combined and/or further partitioned to similarly manage the speech dialog based on the information from the sensor data interpretation module 36. In various exemplary embodiments, the dialog manager module 34 includes one or more turn-taking modules, each of which performs one or more turn-taking functions.
- In various embodiments, the turn-taking modules can include, but are not limited to, a system start module 40, a listening window determination module 42, and a barge-in detection module 44. Each of the turn-taking modules makes use of the information from the sensor data interpretation module 36 to confirm if and when a particular user is speaking and to generate commands to the speech recognition module 32 and/or the speech generation module 35 based on the confirmation. As can be appreciated, the dialog manager module 34 may include other turn-taking modules that perform one or more turn-taking functions making use of the information from the sensor data interpretation module 36 to confirm if and when a particular user is speaking, and is not limited to the examples illustrated in FIG. 2.
- With reference now to the specific examples shown in FIG. 2, the system start module 40 enables the user to start or wake up the speech system 10 based on an utterance 46 of a particular word (e.g., a magic word). For example, the system start module 40 listens for a particular word or words to be uttered by a particular user. Once the particular word has been uttered and recognized, the system start module 40 generates a command 48 to start the system 10 such that speech dialog can occur. For example, the command 48 can be generated to the speech recognition module 32 to perform the recognition or to the speech generation module 35 to generate a spoken command to initiate a dialog.
- In various embodiments, the system start module 40 uses information 50 from the sensor data interpretation module 36 to confirm that a particular user is speaking. In various embodiments, the system start module 40 uses the information 50 from the sensor data interpretation module 36 to detect when a particular user is speaking and to initiate monitoring for the magic word(s). By using the information 50 from the sensor data interpretation module 36, the system start module 40 is able to prevent false recognitions of noise as the magic word.
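- As a sketch of the gating behavior just described, wake-word matching runs only while the sensor indicates the user is speaking; the `wake_word_recognizer` interface and the threshold are assumptions (the patent specifies behavior, not an API):

```python
class SystemStartModule:
    """Monitors for the magic word only while lips appear to be moving."""

    def __init__(self, wake_word_recognizer, speech_presence_threshold: float = 0.5):
        # Assumed interface: wake_word_recognizer.process(audio_frame) -> bool
        self.recognizer = wake_word_recognizer
        self.threshold = speech_presence_threshold

    def on_audio_frame(self, audio_frame, visual_speech_prob: float) -> bool:
        """Return True when the magic word is confirmed for a speaking user."""
        if visual_speech_prob < self.threshold:
            # The sensor says nobody is speaking: treat acoustic matches as noise.
            return False
        return self.recognizer.process(audio_frame)
```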
- The listening window determination module 42 determines a speaking window in which the user may speak after a spoken command is generated and/or before another spoken command is generated. For example, the listening window determination module 42 determines a window of time in which speech input 46 by the user can be received and processed. Based on the window of time, the listening window determination module 42 generates a command 52 to start or stop the generation of a spoken command by the system 10.
- In various embodiments, the listening window determination module 42 uses the information 50 from the sensor data interpretation module 36 to determine the window of time for listening to the user after a spoken command has been generated. The listening window can be extended or determined flexibly depending on the speech prompt without risking false speech detection. By using the information 50 from the sensor data interpretation module 36, the listening window determination module 42 is able to prevent a loss of turn by the user and/or to prevent a speak-over command issued by the system.
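- A sketch of a flexibly extended listening window follows; the base and extension durations are illustrative values, since the patent gives no concrete timing:

```python
import time


class ListeningWindow:
    """Keeps the recognizer listening while the user still appears to speak."""

    def __init__(self, base_seconds: float = 5.0, extension_seconds: float = 2.0):
        self.base_seconds = base_seconds            # assumed default window
        self.extension_seconds = extension_seconds  # assumed extension step
        self.deadline = None

    def open(self) -> None:
        """Start the window after a spoken prompt has been generated."""
        self.deadline = time.monotonic() + self.base_seconds

    def still_listening(self, visual_speech_prob: float) -> bool:
        """Poll each frame; extend the deadline while lips keep moving."""
        if self.deadline is None:
            return False
        if visual_speech_prob > 0.5:
            self.deadline = max(self.deadline,
                                time.monotonic() + self.extension_seconds)
        return time.monotonic() < self.deadline
```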
- The barge-in detection module 44 enables the user to speak before the generation of the spoken command ends. For example, the barge-in detection module 44 receives speech input, detects whether a user has barged in on a spoken command issued by the system, and determines whether to stop the spoken command upon detection of the barge-in. If barge-in has occurred, the barge-in detection module 44 generates a command or commands 54, 56 to stop the generation of the spoken command and/or to begin the speech recognition.
- In various embodiments, the barge-in detection module 44 uses the information 50 from the sensor data interpretation module 36 to confirm that the speech input 46 received is from the particular occupant interacting with the system and to confirm that the speech input 46 is in fact speech. If the barge-in detection module 44 is able to confirm that the speech input 46 is from the particular occupant and is in fact speech, the barge-in detection module 44 issues the commands 54, 56 to stop the generation of the spoken command and/or to begin the speech recognition. By using the information 50 from the sensor data interpretation module 36, the barge-in detection module 44 is able to prevent undetected barge-in, where the system 10 fails to detect that the user is speaking over the spoken prompt, and/or to prevent false barge-in, where the system 10 falsely cuts the prompt short and starts recognition when the user is not actually speaking.
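- The confirmation logic might look like the following sketch, where `stop_prompt` and `start_recognition` are assumed callbacks standing in for commands 54 and 56 (the text does not fix which number maps to which action):

```python
def handle_possible_barge_in(acoustic_speech_detected: bool,
                             speaker_is_target_user: bool,
                             visual_speech_prob: float,
                             stop_prompt, start_recognition,
                             threshold: float = 0.5) -> None:
    """Stop the prompt and start recognition only for a confirmed barge-in."""
    if not acoustic_speech_detected:
        return  # nothing to confirm
    if speaker_is_target_user and visual_speech_prob >= threshold:
        stop_prompt()        # cut the spoken command short
        start_recognition()  # begin recognizing the user's speech
    # Otherwise: likely noise or another occupant, so the prompt plays on,
    # preventing the false barge-in case described above.
```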
- Referring now to FIG. 3, a flowchart illustrates a speech method that may be performed by the speech system 10 in accordance with various exemplary embodiments. As can be appreciated in light of the disclosure, the order of operation within the method is not limited to the sequential execution as illustrated in FIG. 3, but may be performed in one or more varying orders as applicable and in accordance with the present disclosure. As can further be appreciated, one or more steps of the method may be added or removed without altering the spirit of the method.
- As shown, the method may begin at 100. At least one turn-taking function is selected based on the current operating scenario of the system 10 at 110. For example, if the system is asleep, then the system start function is selected. In another example, if the system is or is about to be engaging in a dialog, the listening window determination function is selected. In still another example, if the system is generating a spoken command, then a barge-in function is selected. As can be appreciated, other turn-taking functions may be selected; thus the method is not limited to the present examples.
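- A minimal dispatch over the scenarios named above (step 110) might look as follows; the state names are inferred from the examples in the text, not defined by the patent:

```python
from enum import Enum, auto


class SystemState(Enum):
    ASLEEP = auto()      # waiting for the magic word
    IN_DIALOG = auto()   # a dialog turn is open or about to open
    PROMPTING = auto()   # the system is generating a spoken command


def select_turn_taking_function(state: SystemState) -> str:
    """Map the current operating scenario to a turn-taking function.

    Returns a label rather than a callable to keep the sketch self-contained.
    """
    return {
        SystemState.ASLEEP: "system_start",
        SystemState.IN_DIALOG: "listening_window_determination",
        SystemState.PROMPTING: "barge_in_detection",
    }[state]
```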
- Thereafter, the information 50 from the sensor data interpretation module 36 is received at 120. The information 50 is then used in the selected function to confirm if and/or when a user of the vehicle 12 is speaking at 130. Commands 48, 52, 54, or 56 are generated to the speech generation module 35 and/or the speech recognition module 32 based on the confirmation at 140. Thereafter, the method may end at 150. As can be appreciated, in various embodiments the method may iterate for any number of dialog turns.
- While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration of the disclosure in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing the exemplary embodiment or exemplary embodiments. It should be understood that various changes can be made in the function and arrangement of elements without departing from the scope of the disclosure as set forth in the appended claims and the legal equivalents thereof.
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/955,265 US20150039312A1 (en) | 2013-07-31 | 2013-07-31 | Controlling speech dialog using an additional sensor |
| CN201310747419.9A CN104347069A (en) | 2013-07-31 | 2013-12-31 | Controlling speech dialog using an additional sensor |
| DE102014203116.8A DE102014203116A1 (en) | 2013-07-31 | 2014-02-20 | CONTROLLING A LANGUAGE DIALOG, WHICH USES AN ADDITIONAL SENSOR |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/955,265 US20150039312A1 (en) | 2013-07-31 | 2013-07-31 | Controlling speech dialog using an additional sensor |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20150039312A1 (en) | 2015-02-05 |
Family
ID=52342110
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US 13/955,265 (US20150039312A1, abandoned) | Controlling speech dialog using an additional sensor | 2013-07-31 | 2013-07-31 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20150039312A1 (en) |
| CN (1) | CN104347069A (en) |
| DE (1) | DE102014203116A1 (en) |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006039120A (en) * | 2004-07-26 | 2006-02-09 | Sony Corp | Dialog apparatus, dialog method, program, and recording medium |
| KR20070038132A (en) * | 2004-08-06 | 2007-04-09 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Method for a system to communicate with a user |
| US20080255840A1 (en) * | 2007-04-16 | 2008-10-16 | Microsoft Corporation | Video Nametags |
| US20120259638A1 (en) * | 2011-04-08 | 2012-10-11 | Sony Computer Entertainment Inc. | Apparatus and method for determining relevance of input speech |
- 2013-07-31: US application US 13/955,265 filed; published as US20150039312A1 (not active, abandoned)
- 2013-12-31: CN application CN 201310747419.9A filed; published as CN104347069A (active, pending)
- 2014-02-20: DE application DE 102014203116.8A filed; published as DE102014203116A1 (not active, withdrawn)
Patent Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030018475A1 (en) * | 1999-08-06 | 2003-01-23 | International Business Machines Corporation | Method and apparatus for audio-visual speech detection and recognition |
| US20020135618A1 (en) * | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
| US20030171928A1 (en) * | 2002-02-04 | 2003-09-11 | Falcon Stephen Russel | Systems and methods for managing interactions from multiple speech-enabled applications |
| US20040006483A1 (en) * | 2002-07-04 | 2004-01-08 | Mikio Sasaki | Voice interactive computer system |
| US20040122673A1 (en) * | 2002-12-11 | 2004-06-24 | Samsung Electronics Co., Ltd | Method of and apparatus for managing dialog between user and agent |
| US20050075878A1 (en) * | 2003-10-01 | 2005-04-07 | International Business Machines Corporation | Method, system, and apparatus for natural language mixed-initiative dialogue processing |
| US20050177373A1 (en) * | 2004-02-05 | 2005-08-11 | Avaya Technology Corp. | Methods and apparatus for providing context and experience sensitive help in voice applications |
| US20070136071A1 (en) * | 2005-12-08 | 2007-06-14 | Lee Soo J | Apparatus and method for speech segment detection and system for speech recognition |
| US20080059175A1 (en) * | 2006-08-29 | 2008-03-06 | Aisin Aw Co., Ltd. | Voice recognition method and voice recognition apparatus |
| US20100189305A1 (en) * | 2009-01-23 | 2010-07-29 | Eldon Technology Limited | Systems and methods for lip reading control of a media device |
| US20110246190A1 (en) * | 2010-03-31 | 2011-10-06 | Kabushiki Kaisha Toshiba | Speech dialog apparatus |
| US20130021459A1 (en) * | 2011-07-18 | 2013-01-24 | At&T Intellectual Property I, L.P. | System and method for enhancing speech activity detection using facial feature detection |
| US20150161992A1 (en) * | 2012-07-09 | 2015-06-11 | Lg Electronics Inc. | Speech recognition apparatus and method |
| US20140028826A1 (en) * | 2012-07-26 | 2014-01-30 | Samsung Electronics Co., Ltd. | Voice recognition method and apparatus using video recognition |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180251122A1 (en) * | 2017-03-01 | 2018-09-06 | Qualcomm Incorporated | Systems and methods for operating a vehicle based on sensor data |
| US11702066B2 (en) * | 2017-03-01 | 2023-07-18 | Qualcomm Incorporated | Systems and methods for operating a vehicle based on sensor data |
| US12084045B2 (en) | 2017-03-01 | 2024-09-10 | Qualcomm Incorporated | Systems and methods for operating a vehicle based on sensor data |
| US10800043B2 (en) * | 2018-09-20 | 2020-10-13 | Electronics And Telecommunications Research Institute | Interaction apparatus and method for determining a turn-taking behavior using multimodel information |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104347069A (en) | 2015-02-11 |
| DE102014203116A1 (en) | 2015-02-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11817094B2 (en) | Automatic speech recognition with filler model processing | |
| US7437297B2 (en) | Systems and methods for predicting consequences of misinterpretation of user commands in automated systems | |
| US9558739B2 (en) | Methods and systems for adapting a speech system based on user competance | |
| US9601111B2 (en) | Methods and systems for adapting speech systems | |
| US9858920B2 (en) | Adaptation methods and systems for speech systems | |
| US9502030B2 (en) | Methods and systems for adapting a speech system | |
| US20160111090A1 (en) | Hybridized automatic speech recognition | |
| US9881609B2 (en) | Gesture-based cues for an automatic speech recognition system | |
| CN110001558A (en) | Method for controlling a vehicle and device | |
| EP3244402A1 (en) | Methods and systems for determining and using a confidence level in speech systems | |
| US9715878B2 (en) | Systems and methods for result arbitration in spoken dialog systems | |
| US9830925B2 (en) | Selective noise suppression during automatic speech recognition | |
| US20150310853A1 (en) | Systems and methods for speech artifact compensation in speech recognition systems | |
| US11646031B2 (en) | Method, device and computer-readable storage medium having instructions for processing a speech input, transportation vehicle, and user terminal with speech processing | |
| JP6459330B2 (en) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
| CN119360832A (en) | Method and apparatus for speech processing | |
| US20140343947A1 (en) | Methods and systems for managing dialog of speech systems | |
| US10468017B2 (en) | System and method for understanding standard language and dialects | |
| US20150039312A1 (en) | Controlling speech dialog using an additional sensor | |
| KR20190056115A (en) | Apparatus and method for recognizing voice of vehicle | |
| US20140136204A1 (en) | Methods and systems for speech systems | |
| US9858918B2 (en) | Root cause analysis and recovery systems and methods | |
| JP5074759B2 (en) | Dialog control apparatus, dialog control method, and dialog control program | |
| US20140358538A1 (en) | Methods and systems for shaping dialog of speech systems | |
| US20150317973A1 (en) | Systems and methods for coordinating speech recognition |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: GM GLOBAL TECHNOLOGY OPERATIONS LLC, MICHIGAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; assignors: TZIRKEL-HANCOCK, ELI; AASE, JAN H.; SIMS, ROBERT D.; AND OTHERS; signing dates from 20130628 to 20130730; reel/frame: 030913/0368 |
| | AS | Assignment | Owner name: WILMINGTON TRUST COMPANY, DELAWARE. Free format text: SECURITY INTEREST; assignor: GM GLOBAL TECHNOLOGY OPERATIONS LLC; reel/frame: 033135/0440; effective date: 20101027 |
| | AS | Assignment | Owner name: GM GLOBAL TECHNOLOGY OPERATIONS LLC, MICHIGAN. Free format text: RELEASE BY SECURED PARTY; assignor: WILMINGTON TRUST COMPANY; reel/frame: 034189/0065; effective date: 20141017 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |