US20180213396A1 - Privacy control in a connected environment based on speech characteristics - Google Patents
- Publication number
- US20180213396A1 (U.S. Application No. 15/587,244)
- Authority
- US
- United States
- Prior art keywords
- speech
- privacy expectations
- privacy
- cloud server
- assistant device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
- G10L15/08 — Speech classification or search
- G10L15/1822 — Parsing for meaning understanding (speech classification or search using natural language modelling)
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/78 — Detection of presence or absence of voice signals
- G10L2015/088 — Word spotting
- G10L2015/223 — Execution procedure of a spoken command
- G10L2015/228 — Use of non-speech characteristics of application context during a speech recognition process
- H04L63/00 — Network architectures or network communication protocols for network security
- H04L63/04 — Providing a confidential data exchange among entities communicating through data packet networks
- H04L67/10 — Protocols in which an application is distributed across nodes in the network
- H04W4/70 — Services for machine-to-machine communication [M2M] or machine type communication [MTC]
- H04W12/02 — Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
- H04W12/03 — Protecting confidentiality, e.g. by encryption
Definitions
- This disclosure relates to privacy control, and in particular privacy control in a connected environment such as a home.
- The Internet of Things (IoT) allows for the internetworking of devices to exchange data among themselves to enable sophisticated functionality. For example, devices configured for home automation can exchange data to allow for the control and automation of lighting, air conditioning systems, security, etc. In the smart home environment, this can also include home assistant devices providing an intelligent personal assistant to respond to speech. For example, a home assistant device can include a microphone array to receive voice input and provide the corresponding voice data to a server for analysis to provide an answer to a question asked by a user. The server can provide the answer to the home assistant device, which can provide the answer as voice output using a speaker. As such, the user and the home assistant device can interact with each other using voice, and the interaction can be supplemented by a server outside of the home providing the answers. However, some users might have privacy concerns with sending voice data to a server outside of the home.
- Some of the subject matter described herein includes a home assistant device including: a microphone; a speaker; one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first speech spoken by a first user of the home assistant device using the microphone; determine first characteristics of the first speech, the first characteristics including one or more of content of the first speech, time of the first speech, location of the first speech, distance from the home assistant device to a source of the first speech, identity of the first user providing the first speech, or audio characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; provide the first speech to a cloud server based on the first privacy expectations corresponding to the first speech; receive a first response from the cloud server providing a response to the first speech; play back the first response using the speaker; detect second speech spoken by the first user of the home assistant device using the microphone; determine second characteristics of the second speech, the second characteristics including one or more of content of the second speech, time of the second speech, location of the second speech, distance from the home assistant device to a source of the second speech, identity of the first user providing the second speech, or audio characteristics of the second speech; determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different, the second privacy expectations representing higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics; provide the second speech to local resources of a wireless network associated with the home assistant device rather than the cloud server based on the second privacy expectations; receive a second response from the local resources providing a response to the second speech; and play back the second response using the speaker.
- In some implementations, the local resources include one or both of hardware resources of the home assistant device or resources of other devices communicatively coupled with the home assistant device on the wireless network.
- Some of the subject matter described herein includes a method for privacy control in a connected environment, including: detecting first speech within an environment of an assistant device; determining, by a processor of the assistant device, first characteristics of the first speech; determining first privacy expectations regarding the first speech based on the first characteristics of the first speech; and providing the first speech to one or both of local resources of the assistant device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the assistant device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, and the method further includes: detecting second speech within the environment; determining second characteristics of the second speech, the first characteristics and the second characteristics being different; determining second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and providing the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the method includes: receiving first response data corresponding to the first speech from the cloud server; receiving second response data corresponding to the second speech from the local resources; and providing a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the assistant device or resources of other devices communicatively coupled with the assistant device on a wireless network.
- In some implementations, the first speech was provided at a first time, and the method further includes: detecting second speech within an environment of the assistant device at a second time after the first time; determining second characteristics of the second speech, the first characteristics and the second characteristics being similar; determining second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and providing the second speech to one or both of the local resources of the assistant device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes an electronic device, including: one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first speech within an environment of the electronic device; determine first characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second speech within the environment; determine second characteristics of the second speech, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and provide the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: receive first response data corresponding to the first speech from the cloud server; receive second response data corresponding to the second speech from the local resources; and provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first speech was provided at a first time, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second speech within an environment of the electronic device at a second time after the first time; determine second characteristics of the second speech, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second speech to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes a computer program product, comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to: detect first speech within an environment of an electronic device; determine first characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, wherein the computer program instructions cause the one or more computing devices to: detect second speech within the environment; determine second characteristics of the second speech, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and provide the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the computer program instructions cause the one or more computing devices to: receive first response data corresponding to the first speech from the cloud server; receive second response data corresponding to the second speech from the local resources; and provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first speech was provided at a first time, wherein the computer program instructions cause the one or more computing devices to: detect second speech within an environment of the electronic device at a second time after the first time; determine second characteristics of the second speech, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second speech to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes an electronic device, including: one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first noise within an environment of the electronic device; determine first characteristics of the first noise; determine first privacy expectations regarding the first noise based on the first characteristics of the first noise; and provide the first noise to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first noise, time of the first noise, location of the first noise, distance from the electronic device to a source of the first noise, identity of a user providing the first noise, or audio characteristics of the first noise.
- In some implementations, the first noise is provided to the cloud server, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second noise within the environment; determine second characteristics of the second noise, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second noise based on the second characteristics of the second noise, the first privacy expectations and the second privacy expectations being different; and provide the second noise to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: receive first response data corresponding to the first noise from the cloud server; receive second response data corresponding to the second noise from the local resources; and provide a response to the first noise and the second noise based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first noise was provided at a first time, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second noise within an environment of the electronic device at a second time after the first time; determine second characteristics of the second noise, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second noise based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second noise to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- FIG. 1 illustrates an example of an assistant device responding to voice input.
- FIGS. 2A and 2B illustrate an example of a block diagram for an assistant device responding to voice input.
- FIG. 3 illustrates an example of an assistant device using local resources and cloud resources to respond to voice input.
- FIG. 4 illustrates an example of a block diagram for using local resources and cloud resources to respond to voice input.
- FIG. 5 illustrates an example of a block diagram of determining privacy expectations.
- FIG. 6 illustrates an example of an assistant device.
- A home assistant device can listen to speech asking a question in its vicinity using a microphone array and provide an audible answer to the question using a speaker.
- Some speech can include a hardware activation phrase; when it is detected, the home assistant device can record the rest of the speech subsequent to the hardware activation phrase and provide it to a server in the cloud via the Internet. The server in the cloud can then provide the answer by providing results data.
- A second hardware activation phrase can result in keeping the speech within the local resources of the home's connected environment; for example, the home assistant device itself can try to answer the question.
- If the speech is “Cloud, what is today's date?” then “cloud” can be a hardware activation phrase indicating that “what is today's date?” can be provided to a cloud server.
- By contrast, “local” can be a hardware activation phrase indicating that “what is today's date?” should be kept within the local resources of the home's connected environment. In this way, some speech can be kept locally within the home's connected environment rather than transmitted to a server in the cloud. This can allow users to ask questions that might include content they do not want to leave their home environment due to privacy concerns.
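- To make this routing concrete, the following is a minimal sketch in Python. It assumes the utterance has already been transcribed to text; the phrase table, function name, and destination labels are hypothetical illustrations, not the patent's implementation.

```python
# Minimal sketch of hardware-activation-phrase routing, assuming the
# assistant has already transcribed the utterance to text. The phrase
# table and names below are hypothetical, not from the patent.

ACTIVATION_PHRASES = {
    "cloud": "cloud_server",     # remainder may be sent to the cloud server
    "local": "local_resources",  # remainder stays within the home environment
}

def route_speech(utterance: str) -> tuple[str, str]:
    """Return (destination, query) for speech like 'Cloud, what is today's date?'."""
    first_word, _, remainder = utterance.partition(",")
    destination = ACTIVATION_PHRASES.get(first_word.strip().lower())
    if destination is None:
        # No recognized activation phrase: fall back to a default policy,
        # here cautiously keeping the speech within local resources.
        return "local_resources", utterance.strip()
    return destination, remainder.strip()

print(route_speech("Cloud, what is today's date?"))
# ('cloud_server', "what is today's date?")
print(route_speech("Local, what is today's date?"))
# ('local_resources', "what is today's date?")
```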
- Some speech can include a portion that can be answered by the local resources of the home's connected environment and another portion that can be answered by the cloud resources.
- The answers from the local resources and the cloud resources can then be analyzed and/or combined to provide an answer.
- Speech can also be provided to one or both of the cloud server and local resources without the use of a hardware activation phrase.
- For example, the home assistant device can determine who is speaking to it (e.g., based on voice recognition, video recognition using a camera, etc.), determine the speaker's privacy expectations, and then use the local resources, cloud resources, or both to provide an answer based on the determined privacy expectations.
- More generally, the context, content, timing, or other characteristics of the speech can be used to determine whether speech should be provided to the local resources, cloud resources, or both.
- FIG. 1 illustrates an example of an assistant device responding to voice input.
- Home assistant device 105 can include an intelligent home assistant enabled by a microphone (or microphone array) to hear speech 110 and provide a response to speech 110 using one or more speakers.
- Speech 110 can include a question, and the response provided by home assistant device 105 can be an answer to that question provided as voice output using the speakers.
- Thus, the experience using home assistant device 105 can be based on audio such as voice.
- The responses provided by home assistant device 105 can also be provided on a display screen, or through a combination of both audio and the display screen. For example, answers to spoken questions can be textually or graphically displayed on the display screen of home assistant device 105.
- Speech 120 can either be provided to cloud server 115 or kept within local resources 140, for example, home assistant device 105 itself (e.g., its own hardware and software capabilities) or other devices within the home's wireless network (e.g., a tablet, laptop, etc. that home assistant device 105 is at least communicatively coupled with).
- Local hardware activation phrase 125 and cloud hardware activation phrase 130 can be different words or phrases of speech. For example, if cloud hardware activation phrase 130 is spoken and then speech 120 is subsequently spoken, then home assistant device 105 can determine that cloud hardware activation phrase 130 was spoken and then record speech 120 and provide speech 120 to cloud server 115 .
- Speech 120 can include a question or other type of content in which a response from home assistant device 105 is expected or would be useful.
- Cloud server 115 can analyze speech 120 (e.g., either translated into text by home assistant device 105, or audio data including the speech) and provide results 135b to home assistant device 105 providing the response.
- Results 135b can be data such as text that home assistant device 105 can convert into speech played on its speakers, or results 135b can be the audio that should be played on the speakers.
- Thus, speech 120 can be transmitted outside of the home's wireless network to cloud server 115 via the Internet.
- Alternatively, speech 120 can be kept within local resources 140, which can be home assistant device 105 itself, other devices within the home's wireless network (e.g., a personal computer, laptop, tablet, smartphone, smartwatch, etc.), or a combination of home assistant device 105 and the other devices.
- In that case, speech 120 can be provided to local resources 140, and results 135a can be provided via the speaker in a similar manner as described above with respect to results 135b.
- Some users might want to keep speech 120 within local resources 140 rather than provide it to cloud server 115 because they might not want sensitive content to be transmitted over the Internet outside of the home environment.
- As a result, a user can still use home assistant device 105 without fear of their privacy being violated, and users can be more comfortable using home assistant device 105 because they have more control over the privacy of their speech.
- FIGS. 2A and 2B illustrate an example of a block diagram for an assistant device responding to voice input.
- A home assistant device can receive speech.
- For example, home assistant device 105 can pick up speech 110 via its microphone.
- The speech can be recorded (e.g., saved in memory) for analysis by a processor of home assistant device 105.
- The home assistant device can then determine which type of resource to use based on the hardware activation phrase of speech 110. For example, the hardware activation phrase can be local hardware activation phrase 125, representing an intent for the subsequent speech 120 to be contained within local resources 140, or cloud hardware activation phrase 130, representing an acceptability for the subsequent speech 120 to be provided to cloud server 115.
- The speech can then be provided to the resource corresponding to the activation phrase.
- If cloud hardware activation phrase 130 is spoken, then at block 220 the speech can be received by cloud server 115, and results based on that speech can be determined at block 225. For example, if speech 120 included a question, then results 135b including an answer to the question can be generated and provided to home assistant device 105 at block 230.
- If local hardware activation phrase 125 is spoken, then at block 235 in FIG. 2B the speech can be provided to local resources 140.
- Local resources 140 can receive speech 120 at block 240, determine results based on the speech similar to block 225, and then provide the results at block 245.
- In some implementations, home assistant device 105 can provide an alert indicating that speech 120 in FIG. 1 is about to be transmitted outside of the home environment to cloud server 115.
- For example, a light source such as a light-emitting diode (LED) of home assistant device 105 can be turned on to indicate that speech 120 is about to be transmitted to cloud server 115.
- The user can then interact with home assistant device 105, for example, by pressing a button or using voice interaction, to indicate that speech 120 should not be transmitted to cloud server 115.
- An attempt can then be made to answer speech 120 within local resources 140.
- Home assistant device 105 can also indicate, in a similar manner, that speech 120 will be kept within local resources 140.
- Home assistant device 105 can also be instructed to send speech to cloud server 115 or local resources 140 based on user interactions other than providing a hardware activation phrase. For example, the user can select or touch a button or touchscreen of home assistant device 105 to indicate that speech should be kept within local resources 140. As another example, the user can select an application, or “app,” on a smartphone, press a button on a remote control, press a button on a smartwatch, etc. to indicate that speech should be kept within local resources 140.
- In some implementations, local hardware activation phrase 125 and cloud hardware activation phrase 130 can be set by the user.
- For example, local hardware activation phrase 125 can be a phrase including multiple words, a single word, a sound (e.g., whistling), etc. assigned by a user. The user can assign another phrase, word, or sound to cloud hardware activation phrase 130 such that the two can be differentiated from each other.
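- Continuing the hypothetical sketch above, a user-assigned phrase, word, or detected sound label could be registered by updating the same routing table; the validation and names are again assumptions for illustration.

```python
# Sketch of user-configurable activation phrases, extending the earlier
# hypothetical ACTIVATION_PHRASES table.

ACTIVATION_PHRASES = {"cloud": "cloud_server", "local": "local_resources"}

def assign_activation_phrase(phrase: str, destination: str) -> None:
    """Bind a user-chosen word, phrase, or detected sound label to a destination."""
    if destination not in ("cloud_server", "local_resources"):
        raise ValueError(f"unknown destination: {destination}")
    ACTIVATION_PHRASES[phrase.strip().lower()] = destination

assign_activation_phrase("rosebud", "local_resources")  # user-chosen word
assign_activation_phrase("whistle", "local_resources")  # a recognized sound
print(ACTIVATION_PHRASES)
```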
- FIG. 3 illustrates an example of an assistant device using local resources and cloud resources to respond to voice input.
- Speech 120 can be provided to home assistant device 105, which can determine that speech 120 includes portion 305, which should be provided to local resources 140, and portion 310, which should be provided to cloud server 115. That is, speech 120 can include one portion for local resources and another portion for cloud resources even without the use of a hardware activation phrase.
- Home assistant device 105 can separate the two portions (e.g., based on characteristics of the portions, as discussed later herein) and provide them to the respective resources (i.e., either local resources 140 or cloud server 115).
- This results in results 315a and 315b being provided to home assistant device 105.
- Home assistant device 105 can use both results 315a and 315b to provide an answer to speech 120.
- For example, both can be combined to provide an answer. That is, results from both local resources 140 and cloud server 115 can be used to provide a response to a user's speech.
- Alternatively, one of local resources 140 or cloud server 115 can be prioritized, and its results used for the answer or the corresponding portion of the answer.
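- A minimal sketch of combining or prioritizing results 315a and 315b, assuming both arrive as plain text; the data shapes and the priority rule are illustrative assumptions.

```python
# Sketch of combining partial results from local resources (315a) and the
# cloud server (315b). Shapes and the priority rule are assumptions.

from typing import Optional

def combine_results(local_result: Optional[str],
                    cloud_result: Optional[str],
                    prefer: str = "local") -> str:
    """Merge responses, or fall back to whichever source answered."""
    if local_result and cloud_result:
        # Both portions were answered: order them by the configured priority.
        ordered = ([local_result, cloud_result] if prefer == "local"
                   else [cloud_result, local_result])
        return " ".join(ordered)
    return local_result or cloud_result or "Sorry, I couldn't find an answer."

print(combine_results("The thermostat is set to 70.", "It is sunny today."))
# The thermostat is set to 70. It is sunny today.
```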
- FIG. 4 illustrates an example of a block diagram for using local resources and cloud resources to respond to voice input.
- First, speech can be received.
- For example, speech 120, having local portion 305 and cloud portion 310, can be received.
- It can then be determined that cloud portion 310 should be provided to cloud server 115 and local portion 305 should be provided to local resources 140.
- The different portions of the speech can be determined based on characteristics of one or more of the portions of the speech (e.g., content, time, location, person or identity of the person providing the speech, etc.).
- For example, if a certain word is spoken, a portion of the speech within a time period before and after that word can be identified as one of the portions (e.g., local portion 305), and the rest of the speech can be identified as the other portion (e.g., cloud portion 310).
- In particular, some words can be identified as being related to sensitive speech that a user might not want sent to cloud server 115; therefore, if a word is identified as a sensitive word, then local portion 305 can be identified around it.
- Home assistant device 105 can include a dictionary (e.g., data in memory) of sensitive words that can be identified.
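- A sketch of such sensitive-word splitting is shown below, assuming a word-level window around each dictionary hit; the dictionary contents, window size, and function name are hypothetical.

```python
# Sketch of splitting speech into a local portion and a cloud portion using
# a dictionary of sensitive words and a window around each occurrence.
# Dictionary contents and window size are illustrative assumptions.

SENSITIVE_WORDS = {"doctor", "medication", "bank"}  # hypothetical dictionary
WINDOW = 2  # words kept local on each side of a sensitive word

def split_speech(utterance: str) -> tuple[str, str]:
    words = utterance.split()
    local_idx = set()
    for i, word in enumerate(words):
        if word.strip(".,?!").lower() in SENSITIVE_WORDS:
            lo, hi = max(0, i - WINDOW), min(len(words), i + WINDOW + 1)
            local_idx.update(range(lo, hi))  # keep the surrounding window local
    local = " ".join(words[i] for i in sorted(local_idx))
    cloud = " ".join(w for i, w in enumerate(words) if i not in local_idx)
    return local, cloud

local_portion, cloud_portion = split_speech(
    "Remind me to call my doctor tomorrow and check the weather")
print(local_portion)  # "call my doctor tomorrow and" stays within local resources
print(cloud_portion)  # "Remind me to check the weather" may go to the cloud
```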
- The different portions can then be provided to the respective resources.
- For example, new speech data for the portions can be generated and provided to cloud server 115 and local resources 140 at blocks 420 and 435, respectively.
- Results based on the portions can be determined by the resources.
- The results can then be provided to the home assistant device at blocks 430 and 445.
- Home assistant device 105 can then use both results received from the cloud resources and local resources to provide a response. For example, both can be combined to provide an answer to a question that was asked. That is, results from both local resources 140 and cloud server 115 can be used to provide a response to a user's speech.
- In some implementations, home assistant device 105 can determine portions of speech 120 that are relatively sensitive, classify those portions as local portion 305, and provide them to local resources 140. Portions that are not sensitive can be classified as cloud portion 310 and provided to cloud server 115. For example, home assistant device 105 can develop an understanding of a user's privacy expectations and classify speech as local portion 305 based on the user's privacy expectations. Thus, characteristics of the speech can result in different privacy expectations, and those privacy expectations can be used to determine whether speech should be provided to cloud server 115 or local resources 140.
- To determine privacy expectations, home assistant device 105 can determine who is speaking. For example, home assistant device 105 can use voice recognition to determine a particular user. In another example, home assistant device 105 can include a camera to visually determine who is speaking, or home assistant device 105 can access a camera connected with the home's wireless network or a personal area network (PAN) set up by either the camera or home assistant device 105. Based on the user interacting with home assistant device 105, different privacy expectations can be determined. As a result, different users can say the same speech 120, but different local portion 305 and cloud portion 310 may be identified based on the privacy expectations of each user.
- Other characteristics of speech 120 can be used to determine the privacy expectations, and therefore, whether local resources 140 or cloud server 115 is to be used for speech 120 or a portion of speech 120.
- The context (e.g., multiple people talking, whether the user appears to be incapacitated in some manner, such as intoxicated, etc.) can be considered.
- The content of speech 120 can be used, as previously discussed. For example, if the user is identified as speaking often regarding privacy concerns, discussing topics related to privacy, etc., then the privacy expectations of that user can be increased.
- The time when speech 120 was received by home assistant device 105 can also be used. In one example, if speech 120 was received late at night or early in the morning, then this can indicate a higher privacy expectation.
- If a user's speech is quiet (e.g., the volume of the speech is determined to be within a threshold volume range or beneath a threshold volume value), then this can mean that the user expects more privacy, and therefore, the privacy expectations for that speech can be stricter, increasing the likelihood of the speech or portions of the speech being provided to local resources 140 rather than cloud server 115.
- By contrast, if the volume of the user's speech is loud, then this can indicate that it is not a sensitive topic, and therefore, the speech can be provided to cloud server 115.
- Thus, audio characteristics of the speech can be used.
- Stuttering, mumbling, etc. can also be used to determine the privacy expectations. For example, if a user is stuttering, then he or she may be nervous (e.g., due to the content of the speech) and, therefore, might not want the speech to be provided to cloud server 115.
- Other characteristics of the speech or of the users can be determined to adjust the privacy expectations. For example, the distance of the user from home assistant device 105 can be used to determine the user's privacy expectations. If the user is close to home assistant device 105 (e.g., determined to be within a threshold distance range of home assistant device 105 using cameras or audio recognition), then this can indicate that the user has higher privacy expectations, and therefore, the speech or portions of the speech should be provided to local resources 140 rather than cloud server 115. If the user is farther away, then this might indicate that the user has lower privacy expectations.
- The location of the speech can also influence whether the speech is kept within local resources 140, provided to cloud server 115, or both. For example, if speech is from participants in a bedroom, then it might be kept within local resources 140 because that speech is from a more sensitive location where many people have a higher expectation of privacy. By contrast, if speech is from participants in a living room, then it can be provided to cloud server 115. Accordingly, home assistant device 105 can determine the location of speech and then determine whether that speech should be kept within local resources 140, provided to cloud server 115, or both based on the location within the home environment.
- In some implementations, home assistant device 105 can determine that a user's privacy expectations have changed over time. For example, home assistant device 105 can store users' birthdays or ages, and as a user ages, the privacy expectations can become stricter (i.e., more speech is to be restricted to local resources 140 rather than allowed to be transmitted to cloud server 115). In another example, as the user ages, the privacy expectations can become more lenient (i.e., more speech is allowed to be transmitted to cloud server 115).
- FIG. 5 illustrates an example of a block diagram of determining privacy expectations.
- First, characteristics of the speech can be determined. For example, as previously discussed, the context of the speech, the volume of the speech, etc. can be used to determine various characteristics.
- Next, privacy expectations can be determined based on those characteristics. For example, if a user is speaking quietly, then higher privacy expectations can be determined than if the user is speaking loudly.
- The speech can then be provided to cloud resources or local resources based on the privacy expectations. For example, higher privacy expectations can result in speech being provided to local resources rather than cloud resources.
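- The flow of FIG. 5 might be sketched as follows: characteristics are collected, scored into privacy expectations, and the score selects a destination. All field names, weights, and thresholds are illustrative assumptions rather than values from the patent.

```python
# Sketch of FIG. 5: determine characteristics, derive privacy expectations,
# and route accordingly. Weights and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class SpeechCharacteristics:
    volume_db: float   # audio characteristics (quiet speech suggests privacy)
    distance_m: float  # distance from the assistant device to the speaker
    hour: int          # time the speech was received (0-23)
    room: str          # location of the speech within the home
    speaker_id: str    # identity from voice or camera recognition

def privacy_expectations(c: SpeechCharacteristics,
                         user_strictness: dict) -> float:
    """Return a score in roughly [0, 1]; higher means more privacy expected."""
    score = user_strictness.get(c.speaker_id, 0.5)  # learned per-user baseline
    if c.volume_db < 40:             # quiet speech
        score += 0.2
    if c.distance_m < 1.0:           # speaker is close to the device
        score += 0.1
    if c.hour >= 22 or c.hour < 6:   # late night or early morning
        score += 0.1
    if c.room == "bedroom":          # more sensitive location
        score += 0.2
    return min(score, 1.0)

def choose_destination(score: float, threshold: float = 0.7) -> str:
    return "local_resources" if score >= threshold else "cloud_server"

c = SpeechCharacteristics(35.0, 0.5, 23, "bedroom", "alice")
score = privacy_expectations(c, {"alice": 0.4})
print(score, choose_destination(score))  # 1.0 local_resources
```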
- In some implementations, home assistant device 105 can be set with user preferences as to what should be provided to local resources 140 and what should be provided to cloud server 115. In some implementations, home assistant device 105 can learn the user's privacy expectations over time.
- In some implementations, the speech can include commands.
- For example, home assistant device 105 can be commanded to perform an activity in a smart home environment, such as turning on lights, opening windows, or turning on a security system.
- In some implementations, speech including commands can be provided to cloud server 115, which can perform speech-to-text translation. Cloud server 115 can then provide results to home assistant device 105 indicating what it is supposed to do; that is, home assistant device 105 can be provided data indicating how it should respond to the commands, for example, turning on lights.
- Home assistant device 105 can then act on those commands. This can allow cloud server 115 to perform the processing to determine the content of speech, while home assistant device 105 actually performs the commands rather than cloud server 115.
- In some implementations, home assistant device 105 can process a subset of possible speech on-device, while speech outside of its capabilities can be provided to cloud server 115.
- For example, home assistant device 105 might be able to recognize speech for a small dictionary (e.g., four hundred words) so that it can perform common commands, such as turning on lights, adjusting a thermostat, etc. This can allow home assistant device 105 to control various devices in the home without transmitting data to cloud server 115, and therefore, it can still control devices even if the Internet connection to cloud server 115 goes down.
- By contrast, more complex speech including commands can be determined to include content outside of the dictionary, and therefore, can be provided to cloud server 115 for processing.
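- A sketch of such a small on-device vocabulary, assuming exact-match commands; the command set and handlers are hypothetical placeholders.

```python
# Sketch of a small on-device command dictionary with escalation to the
# cloud server for anything outside it. Commands and handlers are
# hypothetical placeholders.

LOCAL_COMMANDS = {
    "turn on lights": lambda: print("lights on"),
    "turn off lights": lambda: print("lights off"),
    "set thermostat to 70": lambda: print("thermostat set to 70"),
}

def handle_command(utterance: str) -> str:
    action = LOCAL_COMMANDS.get(utterance.strip().lower())
    if action is not None:
        action()  # runs locally, even if the Internet connection is down
        return "handled locally"
    # Outside the small on-device dictionary: provide to the cloud server
    # for full speech understanding.
    return "escalated to cloud server"

print(handle_command("Turn on lights"))
print(handle_command("What's the capital of France?"))
```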
- Home assistant device 105 can also provide a response based on whether results are received from cloud resources, local resources, or both. For example, home assistant device 105 can play back an audio response to speech 120 at different volumes based on where the response or a portion of the response was received from. If results 315a in FIG. 3 are received (i.e., some speech was provided to local resources), then the volume of playback of the response to speech 120 can be lower than if only results 315b (i.e., speech was provided to cloud server 115) were received. In some implementations, the response to speech 120 can be displayed on the display screen of home assistant device 105 if speech was provided to local resources. In another example, if the results are only from cloud server 115, then the response can be played back on the speaker of home assistant device 105.
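- A small sketch of such source-dependent presentation, assuming playback volume (or the display screen) is the knob being adjusted; the volume values are illustrative.

```python
# Sketch of presenting a response more discreetly when local results were
# involved. Volume values are illustrative assumptions.

def present_response(text: str, from_local: bool, from_cloud: bool) -> None:
    if from_local:
        # Some speech was kept private, so respond at a lower volume
        # (or render the response on the display screen instead).
        print(f"[speaker @ 30%] {text}")
    elif from_cloud:
        print(f"[speaker @ 80%] {text}")

present_response("Here is your answer.", from_local=True, from_cloud=True)
```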
- In some implementations, privacy expectations can be determined using many of the aforementioned examples.
- An increase in privacy expectations can result in home assistant device 105 encrypting the data provided to cloud server 115 more strongly, for example, using different encryption algorithms that might take longer to encrypt and for cloud server 115 to decrypt.
- However, some users might find the added delay acceptable if their privacy is ensured.
- Thus, a hierarchy of encryption levels can provide different levels, strengths, or types of encryption based on the determined privacy expectations.
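- Such a hierarchy might be sketched as a score-to-tier lookup; the tier boundaries and cipher parameters below are placeholders, and a real device would rely on a vetted cryptography library rather than these labels.

```python
# Sketch of selecting an encryption tier from the privacy-expectations
# score before sending speech data to the cloud server. Tiers are
# illustrative; stronger tiers cost more time to encrypt and decrypt.

ENCRYPTION_TIERS = [
    # (minimum score, cipher configuration), weakest to strongest
    (0.0, {"cipher": "AES-128-GCM"}),
    (0.4, {"cipher": "AES-256-GCM"}),
    (0.8, {"cipher": "AES-256-GCM", "kdf_iterations": 1_000_000}),
]

def select_encryption(privacy_score: float) -> dict:
    chosen = ENCRYPTION_TIERS[0][1]
    for minimum, config in ENCRYPTION_TIERS:
        if privacy_score >= minimum:
            chosen = config  # keep upgrading while thresholds are met
    return chosen

print(select_encryption(0.9))
# {'cipher': 'AES-256-GCM', 'kdf_iterations': 1000000}
```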
- In some implementations, home assistant device 105 can include an intercom feature, and a home environment can include multiple home assistant devices.
- The different home assistant devices can communicate with each other and with other devices (e.g., speakers) using technology such as Bluetooth, a local WLAN, etc. This can allow users to communicate securely within a home without having communications routed through cellular networks.
- In some implementations, whether speech is provided to cloud resources or local resources can also be based on the context of an activity.
- The activity can be understood through the context of what is being communicated.
- The context can include the time of day, past behaviors, or other variables.
- In some implementations, noise within the environment can be used with the devices and techniques disclosed herein.
- For example, music, television sounds, etc. can be used.
- As another example, environmental sounds such as glass breaking, objects shattering, etc. can be detected and provided to one or both of the local resources or the cloud server based on the techniques disclosed herein.
- FIG. 6 illustrates an example of an assistant device.
- Home assistant device 105 can be an electronic device with one or more processors 605 (e.g., circuits) and memory 610 for storing instructions that can be executed by processors 605 to implement privacy control 630, providing the techniques described herein.
- Home assistant device 105 can also include microphone 620 (e.g., one or more microphones that can implement a microphone array) to convert sounds into electrical signals, and therefore, speech into data that can be processed using processors 605 and stored in memory 610 .
- Speaker 615 can be used to provide audio output.
- Display 625 can display a graphical user interface (GUI) implemented by processors 605 and memory 610 to provide visual feedback.
- Memory 610 can be a non-transitory computer-readable storage medium.
- Home assistant device 105 can also include various other hardware, such as cameras, antennas, etc. to implement the techniques disclosed herein.
- The examples described herein can be implemented with programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired (non-programmable) circuitry, or in a combination of such forms.
- Special-purpose hardwired circuitry may be in the form of, for example, one or more application specific integrated circuits (ASICs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), structured ASICs, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Privacy control in a connected environment is described. An assistant device can detect speech spoken within its environment. The assistant device can determine characteristics of that speech and determine privacy expectations regarding the speech based on those characteristics. Based on the privacy expectations, the speech can be provided to one or both of local resources of the assistant device or a cloud server to receive a response regarding the speech.
Description
- This application claims priority to U.S. Provisional Patent Application No. 62/448,923, entitled “Privacy Control in a Connected Environment,” by Segal et al., and filed on Jan. 20, 2017. This application also claims priority to U.S. Provisional Patent Application No. 62/486,392, entitled “Privacy Control in a Connected Environment Based on Speech Characteristics,” by Segal, and filed on Apr. 17, 2017. This application also claims priority to U.S. Provisional Patent Application No. 62/486,388, entitled “Privacy Control in a Connected Environment,” by Segal, and filed on Apr. 17, 2017. The contents of the above-identified applications are incorporated herein by reference in their entirety.
- This disclosure relates to privacy control, and in particular privacy control in a connected environment such as a home.
- The Internet of Things (IoT) allows for the internetworking of devices to exchange data among themselves to enable sophisticated functionality. For example, devices configured for home automation can exchange data to allow for the control and automation of lighting, air conditioning systems, security, etc. In the smart home environment, this can also include home assistant devices providing an intelligent personal assistant to respond to speech. For example, a home assistant device can include a microphone array to receive voice input and provide the corresponding voice data to a server for analysis to provide an answer to a question asked by a user. The server can provide the answer to the home assistant, which can provide the answer as voice output using a speaker. As such, the user and the home assistant device can interact with each other using voice, and the interaction can be supplemented by a server outside of the home providing the answers. However, some users might have privacy concerns with sending voice data to a server outside of the home.
- Some of the subject matter described herein includes a home assistant device including: a microphone; a speaker; one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first speech spoken by a first user of the home assistant device using the microphone; determine first characteristics of the first speech, the first characteristics including one or more of content of the first speech, time of the first speech, location of the first speech, distance from the home assistant device to a source of the first speech, identity of the first user providing the first speech, or audio characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; provide the first speech to a cloud server based on the first privacy expectations corresponding to the first speech; receive a first response from the cloud server providing a response to the first speech; play back the first response using the speaker; detect second speech spoken by the first user of the home assistant device using the microphone; determine second characteristics of the second speech, the second characteristics including one or more of content of the second speech, time of the second speech, location of the second speech, distance from the home assistant device to a source of the second speech, identity of the first user providing the second speech, or audio characteristics of the second speech; determine second privacy expectations regarding the second speech based on the characteristics of the second speech, the first privacy expectations and the second privacy expectations being different, the second privacy expectations representing higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics; provide the second speech to local resources of a wireless network associated with the electronic device rather than the cloud server based on the second privacy expectations; receive a second response from the local resources providing a response to the second speech; and play back the second response using the speaker.
- In some implementations, the local resources include one or both of hardware resources of the home assistant device or resources of other devices communicatively coupled with the home assistant device on the wireless network.
- Some of the subject matter described herein includes a method for privacy control in a connected environment, including: detecting first speech within an environment of an assistant device; determining, by a processor of the assistant device, first characteristics of the first speech; determining first privacy expectations regarding the first speech based on the first characteristics of the first speech; and providing the first speech to one or both of local resources of the assistant device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics includes one or more of content of the first speech, time of the first speech, location of the first speech, distance from the home assistant device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, and the method further including: detecting second speech within the environment; determining second characteristics of the second speech, the first characteristics and the second characteristics being different; determining second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and providing the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the method includes: receiving first response data corresponding to the first speech from the cloud server; receiving second response data corresponding to the second speech from the local resources; and providing a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the assistant device or resources of other devices communicatively coupled with the assistant device on a wireless network.
- In some implementations, the first speech was provided at a first time, the method further including: detecting second speech within an environment of the assistant device at a second time after the first time; determining second characteristics of the second speech, the first characteristics and the second characteristics being similar; determining second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and providing the second speech to one or both of the local resources of the assistant device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes an electronic device, including: one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first speech within an environment of the electronic device; determine first characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics includes one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second speech within the environment; determine second characteristics of the second speech, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and provide the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: receive first response data corresponding to the first speech from the cloud server; receive second response data corresponding to the second speech from the local resources; and provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first speech was provided at a first time, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second speech within an environment of the electronic device at a second time after the first time; determine second characteristics of the second speech, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second speech to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes a computer program product, comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to: detect first speech within an environment of an electronic device; determine first characteristics of the first speech; determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
- In some implementations, the first speech is provided to the cloud server, wherein the computer program instructions cause the one or more computing devices to: detect second speech within the environment; determine second characteristics of the second speech, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and provide the second speech to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the computer program instructions cause the one or more computing devices to: receive first response data corresponding to the first speech from the cloud server; receive second response data corresponding to the second speech from the local resources; and provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first speech was provided at a first time, wherein the computer program instructions cause the one or more computing devices to: detect second speech within an environment of the electronic device at a second time after the first time; determine second characteristics of the second speech, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second speech to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- Some of the subject matter described herein includes an electronic device, including: one or more processors; and memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect first noise within an environment of the electronic device; determine first characteristics of the first noise; determine first privacy expectations regarding the first noise based on the first characteristics of the first noise; and provide the first noise to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
- In some implementations, the first characteristics include one or more of content of the first noise, time of the first noise, location of the first noise, distance from the electronic device to a source of the first noise, identity of a user providing the first noise, or audio characteristics of the first noise.
- In some implementations, the first noise is provided to the cloud server, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second noise within the environment; determine second characteristics of the second noise, the first characteristics and the second characteristics being different; determine second privacy expectations regarding the second noise based on the second characteristics of the second noise, the first privacy expectations and the second privacy expectations being different; and provide the second noise to the local resources based on the second privacy expectations.
- In some implementations, the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
- In some implementations, the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: receive first response data corresponding to the first noise from the cloud server; receive second response data corresponding to the second noise from the local resources; and provide a response to the first noise and the second noise based on the first response data received from the cloud server and the second response data received from the local resources.
- In some implementations, the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
- In some implementations, the first noise was provided at a first time, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to: detect second noise within an environment of the electronic device at a second time after the first time; determine second characteristics of the second noise, the first characteristics and the second characteristics being similar; determine second privacy expectations regarding the second noise based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and provide the second noise to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
- FIG. 1 illustrates an example of an assistant device responding to voice input.
- FIGS. 2A and 2B illustrate an example of a block diagram for an assistant device responding to voice input.
- FIG. 3 illustrates an example of an assistant device using local resources and cloud resources to respond to voice input.
- FIG. 4 illustrates an example of a block diagram for using local resources and cloud resources to respond to voice input.
- FIG. 5 illustrates an example of a block diagram of determining privacy expectations.
- FIG. 6 illustrates an example of an assistant device.
- This disclosure describes devices and techniques for managing privacy in an environment with connected devices. In one example, a home assistant device can listen to speech asking a question in its vicinity using a microphone array and provide an audible answer to the question using a speaker. Some speech can include a hardware activation phrase, in which case the home assistant device can record the rest of the speech following the hardware activation phrase and provide it to a server in the cloud via the Internet. The server in the cloud can then provide the answer by returning results data. A second hardware activation phrase can result in keeping the speech within the local resources of the home's connected environment; for example, the home assistant device itself can try to answer the question. For example, if the speech is "Cloud, what is today's date?" then "cloud" can be a hardware activation phrase indicating that "what is today's date?" can be provided to a cloud server. By contrast, if the speech is "Local, what is today's date?" then "local" can be a hardware activation phrase indicating that "what is today's date?" should be kept within the local resources of the home's connected environment. In this way, some speech can be kept within the home's connected environment rather than transmitted to a server in the cloud. This can allow users to ask questions whose content they might not want to leave their home environment due to privacy concerns.
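- As a rough illustration of the two-phrase routing above, the following Python sketch maps a leading activation phrase to a destination; the phrase strings, the transcript format, and the return convention are illustrative assumptions rather than part of the disclosure.

```python
LOCAL_PHRASE = "local"   # assumed phrase; the disclosure notes users could set others
CLOUD_PHRASE = "cloud"

def route_speech(transcript: str) -> tuple:
    """Return (destination, remainder) for a transcribed utterance."""
    parts = transcript.strip().split(maxsplit=1)
    if not parts:
        return ("ignore", "")
    phrase = parts[0].rstrip(",").lower()
    remainder = parts[1] if len(parts) > 1 else ""
    if phrase == CLOUD_PHRASE:
        return ("cloud", remainder)   # may leave the home via the Internet
    if phrase == LOCAL_PHRASE:
        return ("local", remainder)   # stays within local resources
    return ("ignore", transcript)     # no activation phrase detected

assert route_speech("Cloud, what is today's date?") == ("cloud", "what is today's date?")
assert route_speech("Local, what is today's date?") == ("local", "what is today's date?")
```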
- In another example, some speech can include a portion that can be answered by the local resources of the home's connected environment and another portion that can be answered by the cloud resources. The answers from the local resources and the cloud resources can then be analyzed and/or combined to provide an answer. As a result, speech can be provided to one or both of the cloud server and local resources without the use of a hardware activation phrase.
- In another example, the home assistant device can determine who is speaking to it (e.g., based on voice recognition, video recognition using a camera, etc.), determine that speaker's privacy expectations, and then use the local resources, cloud resources, or both to provide an answer based on the determined privacy expectations. The context, content, timing, or other characteristics of the speech can be used to determine whether speech should be provided to the local resources, cloud resources, or both.
- In more detail, FIG. 1 illustrates an example of an assistant device responding to voice input. In FIG. 1, home assistant device 105 can include an intelligent home assistant enabled by a microphone (or microphone array) to hear speech 110 and provide a response to speech 110 using one or more speakers. For example, speech 110 can include a question, and the response provided by home assistant device 105 can be an answer to that question provided in a voice output using the speakers. Accordingly, the experience using home assistant device 105 can be based on audio such as voice. However, in other implementations, the responses provided by home assistant device 105 can also be provided on a display screen, or a combination of both audio and the display screen. For example, answers to spoken questions can be textually or graphically displayed on the display screen of home assistant device 105.
- In FIG. 1, based on whether the hardware activation phrase is local hardware activation phrase 125 or cloud hardware activation phrase 130, speech 120 is provided to either cloud server 115 or kept within local resources 140, for example, home assistant device 105 itself (e.g., its own hardware and software capabilities) or other devices within the home's wireless network (e.g., a tablet, laptop, etc. that home assistant device 105 is at least communicatively coupled with). Local hardware activation phrase 125 and cloud hardware activation phrase 130 can be different words or phrases of speech. For example, if cloud hardware activation phrase 130 is spoken and speech 120 is subsequently spoken, then home assistant device 105 can determine that cloud hardware activation phrase 130 was spoken, record speech 120, and provide speech 120 to cloud server 115. Speech 120 can include a question or other type of content for which a response from home assistant device 105 is expected or would be useful. Cloud server 115 can analyze speech 120 (e.g., either translated into text by home assistant device 105, or audio data including the speech) and provide results 135b to home assistant device 105 for the response. Results 135b can be data such as text that home assistant device 105 can convert into speech played on its speakers, or results 135b can be the audio that should be played on the speakers. As such, speech 120 can be transmitted outside of the home's wireless network to cloud server 115 via the Internet.
- By contrast, if local hardware activation phrase 125 is spoken, then speech 120 can be kept within local resources 140, which can be home assistant device 105 itself, other devices within the home's wireless network (e.g., a personal computer, laptop, tablet, smartphone, smartwatch, etc.), or a combination of home assistant device 105 and the other devices. For example, in FIG. 1, speech 120 can be provided to local resources 140 and results 135a can be provided via the speaker in a similar manner as described above with respect to results 135b.
- Some users might want to keep speech 120 within local resources 140 rather than cloud server 115 because they might not want sensitive content to be transmitted over the Internet to cloud server 115 outside of the home environment. As a result, by having two different hardware activation phrases, a user can still use home assistant device 105 without fear of their privacy being violated. This can also allow users to be more comfortable using home assistant device 105 because the user has more control over the privacy of their speech.
- FIGS. 2A and 2B illustrate an example of a block diagram for an assistant device responding to voice input. In FIG. 2A, at block 205, a home device can receive speech. For example, in FIG. 1, home assistant device 105 can pick up speech 110 via its microphone. In some implementations, the speech can be recorded (e.g., saved in memory) for analysis by a processor of home assistant device 105. At block 210, the home device can determine which type of resource to use based on the hardware activation phrase of speech 110. For example, in FIG. 1, the hardware activation phrase can be local hardware activation phrase 125, representing an intent for the subsequent speech 120 to be contained within local resources 140, or cloud hardware activation phrase 130, representing an acceptability for the subsequent speech 120 to be provided to cloud server 115. In block 215, the speech can then be provided to the resource corresponding to the activation phrase.
- If cloud hardware activation phrase 130 is spoken, then at block 220, the speech can be received by cloud server 115 and results based on that speech can be determined at block 225. For example, if speech 120 included a question, then results 135b including an answer to the question can be generated and provided to home assistant device 105 at block 230.
- If local hardware activation phrase 125 is spoken, then at block 235 in FIG. 2B, the speech can be provided to local resources 140. Local resources 140 can receive speech 120 at block 240, determine results based on the speech similar to block 225, and then provide the results at block 245.
- In some implementations, home assistant device 105 can include an alert indicating that speech 120 in FIG. 1 is about to be transmitted outside of the home environment to cloud server 115. For example, a light source such as a light emitting diode (LED) of home assistant device 105 can be turned on to indicate that speech 120 is about to be transmitted to cloud server 115. The user can then interact with home assistant device 105, for example, by pressing a button or using voice interaction, to indicate that speech 120 should not be transmitted to cloud server 115. In some implementations, an answer to speech 120 can then be attempted within local resources 140. In some implementations, home assistant device 105 can also indicate that speech 120 will be kept within local resources 140 in a similar manner.
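- A minimal sketch of this cancel-before-upload alert, assuming hypothetical led_on, led_off, send_to_cloud, and answer_locally helpers (the grace period and the helper names are not specified by the disclosure):

```python
import threading

class UploadAlert:
    """Turn on an LED, give the user a short window to object, then route."""

    def __init__(self, grace_seconds: float = 3.0):
        self.grace_seconds = grace_seconds
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        # Wired to a button press or a voice interaction ("don't send that").
        self._cancelled.set()

    def run(self, speech_data, send_to_cloud, answer_locally, led_on, led_off):
        led_on()  # alert: speech is about to leave the home environment
        try:
            if self._cancelled.wait(timeout=self.grace_seconds):
                return answer_locally(speech_data)  # user opted out of upload
            return send_to_cloud(speech_data)       # no objection within window
        finally:
            led_off()
```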
- Home assistant device 105 can also be instructed to send speech to cloud server 115 or local resources 140 based on user interactions other than providing a hardware activation phrase. For example, the user can select or touch a button or touchscreen of home assistant device 105 to indicate that speech should be kept within local resources 140. As another example, the user can select an application, or "app," on a smartphone, press a button on a remote control, press a button on a smartwatch, etc. to indicate that speech should be kept within local resources 140.
- In some implementations, local hardware activation phrase 125 and cloud hardware activation phrase 130 can be set by the user. For example, local hardware activation phrase 125 can be a phrase including multiple words, a single word, a sound (e.g., whistling), etc. assigned by a user. The user can assign another phrase, word, sound, etc. to cloud hardware activation phrase 130 such that the two can be differentiated from each other.
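- A small sketch of user-assigned activation phrases; the storage format and the validation rule are illustrative assumptions:

```python
def assign_activation_phrases(local_phrase: str, cloud_phrase: str) -> dict:
    """Store user-chosen activation phrases, requiring them to differ."""
    local_phrase = local_phrase.strip().lower()
    cloud_phrase = cloud_phrase.strip().lower()
    if local_phrase == cloud_phrase:
        raise ValueError("local and cloud activation phrases must differ")
    return {"local": local_phrase, "cloud": cloud_phrase}

phrases = assign_activation_phrases("keep it here", "ask the cloud")
```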
- Portions of speech can be provided to both cloud resources and local resources. FIG. 3 illustrates an example of an assistant device using local resources and cloud resources to respond to voice input. In FIG. 3, speech 120 can be provided to home assistant device 105, which can determine that speech 120 includes portion 305 that should be provided to local resources 140 and portion 310 that should be provided to cloud server 115. That is, speech 120 can include one portion for local resources and another portion for cloud resources even without the use of a hardware activation phrase. Home assistant device 105 can separate the two portions (e.g., based on characteristics of the portions, as discussed later herein) and provide them to the respective resources (i.e., either local resources 140 or cloud server 115). This results in results 315a and 315b being provided to home assistant device 105. Home assistant device 105 can use both results 315a and 315b to provide an answer to speech 120. For example, both can be combined to provide an answer. That is, results from both local resources 140 and cloud server 115 can be used to provide a response to a user's speech. In some implementations, if there is some inconsistency between the answers provided by results 315a and 315b, then one of local resources 140 or cloud server 115 can be prioritized and its results used for the answer or the corresponding portion of the answer.
- FIG. 4 illustrates an example of a block diagram for using local resources and cloud resources to respond to voice input. In FIG. 4, at block 405, speech can be received. For example, in FIG. 3, speech 120 having local portion 305 and cloud portion 310 can be received. At block 410, it can be determined that the speech includes a first portion to be provided to cloud resources and a second portion to be provided to local resources. For example, in FIG. 3, cloud portion 310 should be provided to cloud server 115 and local portion 305 should be provided to local resources 140. In some implementations, the different portions of the speech can be determined based on characteristics of one or more of the portions of the speech (e.g., content, time, location, person or identity of the person providing the speech, etc.). For example, if a certain word has been detected, then a portion of the speech within a time period before and after that word was spoken can be identified as one of the portions (e.g., local portion 305) and the rest of the speech can be identified as the other portion (e.g., cloud portion 310). In one example, some words can be identified as being related to sensitive speech that a user might not want to be sent to cloud server 115, and therefore, if a certain word is identified as a sensitive word, then local portion 305 can be identified. Thus, home assistant device 105 can include a dictionary (e.g., data in memory) of sensitive words that can be identified. At block 415, the different portions can be provided to the resources. For example, new speech data for the portions can be generated and provided to cloud server 115 and local resources 140 at blocks 420 and 435, respectively. At blocks 425 and 440, results based on the portions can be determined by the resources. The results can then be provided to the home assistant device at blocks 430 and 445. Home assistant device 105 can then use both results received from the cloud resources and local resources to provide a response. For example, both can be combined to provide an answer to a question that was asked. That is, results from both local resources 140 and cloud server 115 can be used to provide a response to a user's speech.
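- The following sketch illustrates one way blocks 410 and 415 could split speech around a sensitive-word dictionary; the word list, time window, and token timing format are assumptions for illustration:

```python
SENSITIVE_WORDS = {"bank", "password", "medical"}  # assumed dictionary contents
WINDOW_SECONDS = 2.0                               # assumed time window

def split_portions(tokens):
    """tokens: list of (word, start_time_in_seconds) pairs.

    Words within WINDOW_SECONDS of a sensitive word go to the local
    portion; everything else goes to the cloud portion.
    """
    sensitive_times = [t for word, t in tokens if word.lower() in SENSITIVE_WORDS]
    local, cloud = [], []
    for word, t in tokens:
        near = any(abs(t - s) <= WINDOW_SECONDS for s in sensitive_times)
        (local if near else cloud).append(word)
    return " ".join(local), " ".join(cloud)

tokens = [("check", 0.0), ("my", 0.4), ("bank", 0.8), ("balance", 1.2),
          ("and", 4.5), ("play", 5.0), ("music", 5.4)]
local_portion, cloud_portion = split_portions(tokens)
# local_portion == "check my bank balance"; cloud_portion == "and play music"
```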
- Regarding characteristics of the speech, home assistant device 105 can determine portions of speech 120 that are relatively sensitive, classify those portions as local portion 305, and provide them to local resources 140. Portions that are not sensitive can be classified as cloud portion 310 and provided to cloud server 115. For example, home assistant device 105 can develop an understanding of a user's privacy expectations and classify speech as local portion 305 based on the user's privacy expectations. Thus, characteristics of the speech can result in different privacy expectations, and those privacy expectations can be used to determine whether speech should be provided to cloud server 115 or local resources 140.
- In some implementations, home assistant device 105 can determine who is speaking. For example, home assistant device 105 can use voice recognition to determine a particular user. In another example, home assistant device 105 can include a camera to visually determine who is speaking, or home assistant device 105 can access a camera connected with the home's wireless network or a personal area network (PAN) set up by either the camera or home assistant device 105. Based on the user interacting with home assistant device 105, different privacy expectations can be determined. As a result, different users can say the same speech 120, but a different local portion 305 and cloud portion 310 may be identified based on the privacy expectations of each user.
- In some implementations, other characteristics of speech 120 can be used to determine the privacy expectations, and therefore, whether local resources 140 or cloud server 115 is to be used for speech 120 or a portion of speech 120. For example, the context of speech 120 (e.g., multiple people talking, whether the user appears to be incapacitated in some manner such as intoxicated, etc.) can be used. In another example, the content of speech 120 can be used, as previously discussed. For example, if the user is identified as speaking often regarding privacy concerns, discussing topics related to privacy, etc., then the privacy expectations of that user can be increased. In another example, the time when speech 120 was received by home assistant device 105 can be used. In one example, if speech 120 was received late at night or early in the morning, then this can indicate a higher privacy expectation.
local resources 140 rather thancloud server 115. If the volume of the user's speech is loud, then this can indicate that it is not a sensitive topic, and therefore, the speech can be provided tocloud server 115. Thus, audio characteristics of the speech can be used. In other examples, stuttering, mumbling, etc. can also be used to determine the privacy expectations. For example, if a user is stuttering, the he or she may be nervous (e.g., due to the content of the speech) and, therefore, might not want the speech to be provided tocloud server 115. - Other characteristics of the speech of the users can be determined to adjust the privacy expectations. For example, the distance of the user from
home assistant device 105 can be used to determine the user's privacy expectations. If the user is close to home assistant device 105 (e.g., determined to be within a threshold distance range ofhome assistant device 105 using cameras or audio recognition), then this can indicate that the user has higher privacy expectations, and therefore, the speech or portions of the speech should be provided tolocal resources 140 rather thancloud server 115. If the user is farther away, then this might indicate that the user has lower privacy expectations. - In some implementations, the location of the speech can influence whether the speech is kept within
local resources 140,cloud server 115, or both. For example, if speech is from participants in a bedroom, then it might be kept withinlocal resources 140 due to that speech being from a more sensitive location where many people have a higher expectation of privacy. By contrast, if speech is from participants in a living room, then it can be provided tocloud server 115. Accordingly,home assistant device 105 can determine the location of speech and then determine whether that speech should be kept withinlocal resources 140,cloud server 115, or both based on the location within the home environment. - In some implementations,
home assistant device 105 can determine that a user's privacy expectations have changed. For example,home assistant device 105 can store the user's birthdays or ages, and as the user ages, the privacy expectations can become stricter (i.e., more speech is to be restricted tolocal resources 140 rather than allowed to be transmitted to cloud server 115). In another example, as the user ages, the privacy expectations can be more lenient (i.e., more speech is to be allowed to be transmitted to cloud server 115). -
- FIG. 5 illustrates an example of a block diagram of determining privacy expectations. In FIG. 5, at block 505, characteristics of the speech can be determined. For example, as previously discussed, the context of the speech, the volume of the speech, etc. can be used to determine various characteristics. At block 510, privacy expectations can be determined based on those characteristics. For example, if a user is speaking quietly, then higher privacy expectations can be determined than if the user is speaking loudly. At block 515, the speech can be provided to cloud resources or local resources based on the privacy expectations. For example, higher privacy expectations can result in speech being provided to local resources rather than cloud resources.
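- A hedged sketch of the FIG. 5 flow, assuming illustrative characteristics, weights, and a routing threshold (none of these values come from the disclosure):

```python
from dataclasses import dataclass

@dataclass
class SpeechCharacteristics:
    volume_db: float   # quiet speech can suggest higher privacy expectations
    distance_m: float  # speaking close to the device can suggest privacy
    hour_of_day: int   # late night / early morning raises expectations
    location: str      # e.g., "bedroom" vs. "living room"

def privacy_score(c: SpeechCharacteristics) -> float:
    """Block 510: combine characteristics into a single expectation score."""
    score = 0.0
    if c.volume_db < 40:                          # quiet speech
        score += 0.4
    if c.distance_m < 1.0:                        # user is close to the device
        score += 0.2
    if c.hour_of_day >= 23 or c.hour_of_day <= 5:  # late night / early morning
        score += 0.2
    if c.location == "bedroom":                   # sensitive location
        score += 0.3
    return min(score, 1.0)

def choose_resource(c: SpeechCharacteristics, threshold: float = 0.5) -> str:
    """Block 515: higher expectations keep speech with local resources."""
    return "local" if privacy_score(c) >= threshold else "cloud"

late_whisper = SpeechCharacteristics(volume_db=35, distance_m=0.5,
                                     hour_of_day=23, location="bedroom")
assert choose_resource(late_whisper) == "local"
```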
- In some implementations, home assistant device 105 can be set with user preferences as to what should be provided to local resources 140 and cloud server 115. In some implementations, home assistant device 105 can learn the user's privacy expectations over time.
- Many of the aforementioned examples discuss speech including a question. However, in other examples, the speech can include commands. For example, home assistant device 105 can be commanded to perform an activity, such as turning on lights, opening windows, turning on a security system, etc. in a smart home environment. In some implementations, speech including commands can be provided to cloud server 115, which can perform speech-to-text translation. Cloud server 115 can then provide results to home assistant device 105 indicating what it is supposed to do. That is, home assistant device 105 can be provided data indicating how it should respond to the commands, for example, turning on lights. Home assistant device 105 can then act on those commands. This can allow cloud server 115 to perform the processing to determine the content of the speech, while home assistant device 105 actually performs the commands rather than cloud server 115.
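- A brief sketch of this division of labor, where a cloud-provided intent is executed by a local handler on the device; the intent payload shape and handler names are assumptions:

```python
def execute_intent(intent: dict, handlers: dict) -> bool:
    """Dispatch a cloud-provided intent to a local handler on the device."""
    handler = handlers.get(intent.get("action"))
    if handler is None:
        return False  # unknown command; nothing is executed locally
    handler(**intent.get("args", {}))
    return True

handlers = {"lights_on": lambda room="all": print(f"turning on lights in {room}")}
# e.g., cloud speech-to-text produced: {"action": "lights_on", "args": {"room": "kitchen"}}
execute_intent({"action": "lights_on", "args": {"room": "kitchen"}}, handlers)
```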
- In some implementations, home assistant device 105 can process a subset of possible speech on-device, but speech outside of its capabilities can be provided to cloud server 115. For example, home assistant device 105 might be able to recognize speech for a small dictionary (e.g., four hundred words) so that it can perform common commands, such as turning on lights, adjusting a thermostat, etc. This can allow home assistant device 105 to control various devices in the home without transmitting data to cloud server 115, and therefore, it can still control devices even if the Internet connection to cloud server 115 goes down. However, more complex speech including commands can be determined to include content outside of the dictionary, and therefore, can be provided to cloud server 115 for processing.
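- A minimal sketch of the small-dictionary fallback; the vocabulary contents are illustrative:

```python
LOCAL_VOCABULARY = {
    "turn", "on", "off", "the", "lights", "set",
    "thermostat", "to", "seventy", "degrees",
}  # stand-in for the ~400-word on-device dictionary

def route_command(transcript: str) -> str:
    words = transcript.lower().replace(",", "").split()
    if words and all(word in LOCAL_VOCABULARY for word in words):
        return "local"  # recognizable entirely on-device, works offline
    return "cloud"      # outside local capabilities; needs cloud processing

assert route_command("Turn on the lights") == "local"
assert route_command("What is the capital of France") == "cloud"
```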
- Home assistant device 105 can also provide a response based on whether results are received from cloud resources, local resources, or both. For example, home assistant device 105 can play back an audio response to speech 120 at different volumes based on where the response or a portion of the response was received from. If results 315a in FIG. 3 are received (i.e., some speech was provided to local resources), then the volume of playback of the response to speech 120 can be lower than if only results 315b (i.e., speech was provided to cloud server 115) were received. In some implementations, the response to speech 120 can be displayed on the display screen of home assistant device 105 if speech was provided to local resources. In another example, if the results are only from cloud server 115, then the response can be played back on the speaker of home assistant device 105.
- In some implementations, privacy expectations can be determined using many of the aforementioned examples. An increase in privacy expectations can result in home assistant device 105 encrypting data provided to cloud server 115 more strongly, for example, using different encryption algorithms that might take longer to encrypt and for cloud server 115 to decrypt. However, some users might find a delay acceptable if their privacy is ensured. Thus, a hierarchy of encryption levels can provide different levels, strengths, or types of encryption based on the determined privacy expectations.
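- One way such an encryption hierarchy could look, sketched with the Python `cryptography` package; mapping privacy levels to AES-GCM key sizes is an illustrative choice, and key exchange with the cloud server is out of scope here:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Higher privacy level -> larger AES-GCM key (illustrative mapping only).
KEY_BITS_BY_LEVEL = {0: 128, 1: 192, 2: 256}

def encrypt_for_cloud(data: bytes, privacy_level: int):
    key = AESGCM.generate_key(bit_length=KEY_BITS_BY_LEVEL[privacy_level])
    nonce = os.urandom(12)  # 96-bit nonce, as recommended for GCM
    ciphertext = AESGCM(key).encrypt(nonce, data, None)
    return key, nonce, ciphertext  # key distribution handled elsewhere

key, nonce, ct = encrypt_for_cloud(b"what is today's date?", privacy_level=2)
```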
- In some implementations, home assistant device 105 can include an intercom feature, and a home environment can include multiple home assistant devices. The different home assistant devices can communicate with each other and with other devices (e.g., speakers) using technology such as Bluetooth, local WLAN, etc. This can allow users to communicate securely within a home without having communications routed through cellular networks.
- Many of the aforementioned examples discuss a home environment. In other examples, the devices and techniques discussed herein can also be set up in an office, public facility, outdoors, etc.
- Many of the aforementioned examples discuss speech. In other examples, noise within the environment can be used with the devices and techniques disclosed herein. For example, music, television sounds, etc. can be used. In another example, environmental sounds such as glass breaking, objects shattering, etc. can be determined and provided to one or both of the local resources or cloud server based on the techniques disclosed herein.
-
FIG. 6 illustrates an example of an assistant device. InFIG. 6 ,home assistant device 105 can be an electronic device with one or more processors 605 (e.g., circuits) andmemory 610 for storing instructions that can be executed byprocessors 605 to implementprivacy control 630 providing the techniques described herein.Home assistant device 105 can also include microphone 620 (e.g., one or more microphones that can implement a microphone array) to convert sounds into electrical signals, and therefore, speech into data that can be processed usingprocessors 605 and stored inmemory 610.Speaker 615 can be used to provide audio output. Additionally, display 625 can display a graphical user interface (GUI) implemented byprocessors 605 andmemory 610 to provide visual feedback.Memory 610 can be a non-transitory computer-readable storage media.Home assistant device 105 can also include various other hardware, such as cameras, antennas, etc. to implement the techniques disclosed herein. Thus, the examples described herein can be implemented with programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired (non-programmable) circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more application specific integrated circuits (ASICs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), structured ASICs, etc. - Those skilled in the art will appreciate that the logic and process steps illustrated in the various flow diagrams discussed herein may be altered in a variety of ways. For example, the order of the logic may be rearranged, sub-steps may be performed in parallel, illustrated logic may be omitted, other logic may be included, etc. One will recognize that certain steps may be consolidated into a single step and that actions represented by a single step may be alternatively represented as a collection of substeps. The figures are designed to make the disclosed concepts more comprehensible to a human reader. Those skilled in the art will appreciate that actual data structures used to store this information may differ from the figures and/or tables shown, in that they, for example, may be organized in a different manner; may contain more or less information than shown; may be compressed, scrambled and/or encrypted; etc.
- From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications can be made without deviating from the scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
Claims (25)
1. A home assistant device, comprising:
a microphone;
a speaker;
one or more processors; and
memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to:
detect first speech spoken by a first user of the home assistant device using the microphone;
determine first characteristics of the first speech, the first characteristics including one or more of content of the first speech, time of the first speech, location of the first speech, distance from the home assistant device to a source of the first speech, identity of the first user providing the first speech, or audio characteristics of the first speech;
determine first privacy expectations regarding the first speech based on the first characteristics of the first speech;
provide the first speech to a cloud server based on the first privacy expectations corresponding to the first speech;
receive a first response from the cloud server providing a response to the first speech;
play back the first response using the speaker;
detect second speech spoken by the first user of the home assistant device using the microphone;
determine second characteristics of the second speech, the second characteristics including one or more of content of the second speech, time of the second speech, location of the second speech, distance from the home assistant device to a source of the second speech, identity of the first user providing the second speech, or audio characteristics of the second speech;
determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different, the second privacy expectations representing higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics;
provide the second speech to local resources of a wireless network associated with the home assistant device rather than the cloud server based on the second privacy expectations;
receive a second response from the local resources providing a response to the second speech; and
play back the second response using the speaker.
2. The home assistant device of claim 1 , wherein the local resources include one or both of hardware resources of the home assistant device or resources of other devices communicatively coupled with the home assistant device on the wireless network.
3. A method for privacy control in a connected environment, comprising:
detecting first speech within an environment of an assistant device;
determining, by a processor of the assistant device, first characteristics of the first speech;
determining first privacy expectations regarding the first speech based on the first characteristics of the first speech; and
providing the first speech to one or both of local resources of the assistant device or a cloud server based on the first privacy expectations.
4. The method of claim 3, wherein the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the assistant device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
5. The method of claim 3 , wherein the first speech is provided to the cloud server, and the method further comprising:
detecting second speech within the environment;
determining second characteristics of the second speech, the first characteristics and the second characteristics being different;
determining second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and
providing the second speech to the local resources based on the second privacy expectations.
6. The method of claim 5, wherein the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
7. The method of claim 5 , further comprising:
receiving first response data corresponding to the first speech from the cloud server;
receiving second response data corresponding to the second speech from the local resources; and
providing a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
8. The method of claim 3 , wherein the local resources include one or both of hardware resources of the assistant device or resources of other devices communicatively coupled with the assistant device on a wireless network.
9. The method of claim 3 , wherein the first speech was provided at a first time, the method further comprising:
detecting second speech within an environment of the assistant device at a second time after the first time;
determining second characteristics of the second speech, the first characteristics and the second characteristics being similar;
determining second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and
providing the second speech to one or both of the local resources of the assistant device or the cloud server based on the second privacy expectations.
10. An electronic device, comprising:
one or more processors; and
memory storing instructions, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to:
detect first speech within an environment of the electronic device;
determine first characteristics of the first speech;
determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and
provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
11. The electronic device of claim 10, wherein the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
12. The electronic device of claim 10 , wherein the first speech is provided to the cloud server, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to:
detect second speech within the environment;
determine second characteristics of the second speech, the first characteristics and the second characteristics being different;
determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and
provide the second speech to the local resources based on the second privacy expectations.
13. The electronic device of claim 12, wherein the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
14. The electronic device of claim 12 , wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to:
receive first response data corresponding to the first speech from the cloud server;
receive second response data corresponding to the second speech from the local resources; and
provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
15. The electronic device of claim 10 , wherein the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
16. The electronic device of claim 10 , wherein the first speech was provided at a first time, wherein the one or more processors are configured to execute the instructions such that the one or more processors and memory are configured to:
detect second speech within an environment of the electronic device at a second time after the first time;
determine second characteristics of the second speech, the first characteristics and the second characteristics being similar;
determine second privacy expectations regarding the second speech based on the second characteristics, the first privacy expectations and the second privacy expectations being different based on a time difference between the first time and the second time; and
provide the second speech to one or both of the local resources of the electronic device or the cloud server based on the second privacy expectations.
17. A computer program product, comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to:
detect first speech within an environment of an electronic device;
determine first characteristics of the first speech;
determine first privacy expectations regarding the first speech based on the first characteristics of the first speech; and
provide the first speech to one or both of local resources of the electronic device or a cloud server based on the first privacy expectations.
18. The computer program product of claim 17, wherein the first characteristics include one or more of content of the first speech, time of the first speech, location of the first speech, distance from the electronic device to a source of the first speech, identity of a user providing the first speech, or audio characteristics of the first speech.
19. The computer program product of claim 17 , wherein the first speech is provided to the cloud server, wherein the computer program instructions cause the one or more computing devices to:
detect second speech within the environment;
determine second characteristics of the second speech, the first characteristics and the second characteristics being different;
determine second privacy expectations regarding the second speech based on the second characteristics of the second speech, the first privacy expectations and the second privacy expectations being different; and
provide the second speech to the local resources based on the second privacy expectations.
20. The computer program product of claim 19, wherein the second privacy expectations represent higher privacy expectations than the first privacy expectations based on differences between the first characteristics and the second characteristics.
21. The computer program product of claim 19 , wherein the computer program instructions cause the one or more computing devices to:
receive first response data corresponding to the first speech from the cloud server;
receive second response data corresponding to the second speech from the local resources; and
provide a response to the first speech and the second speech based on the first response data received from the cloud server and the second response data received from the local resources.
22. The computer program product of claim 17 , wherein the local resources include one or both of hardware resources of the electronic device or resources of other devices communicatively coupled with the electronic device on a wireless network.
23. The method of claim 3, wherein the first characteristics include content of the first speech.
24. The method of claim 3, wherein the first characteristics include distance from the assistant device to a source of the first speech.
25. The method of claim 3, wherein the first characteristics include an identity of a user speaking the first speech.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/587,244 US20180213396A1 (en) | 2017-01-20 | 2017-05-04 | Privacy control in a connected environment based on speech characteristics |
| PCT/US2017/035548 WO2018136111A1 (en) | 2017-01-20 | 2017-06-01 | Privacy control in a connected environment based on speech characteristics |
| TW106119210A TW201828043A (en) | 2017-01-20 | 2017-06-09 | Privacy control in a connected environment based on speech characteristics |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762448923P | 2017-01-20 | 2017-01-20 | |
| US201762486388P | 2017-04-17 | 2017-04-17 | |
| US201762486392P | 2017-04-17 | 2017-04-17 | |
| US15/587,244 US20180213396A1 (en) | 2017-01-20 | 2017-05-04 | Privacy control in a connected environment based on speech characteristics |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20180213396A1 true US20180213396A1 (en) | 2018-07-26 |
Family
ID=62906588
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/587,244 Abandoned US20180213396A1 (en) | 2017-01-20 | 2017-05-04 | Privacy control in a connected environment based on speech characteristics |
| US15/587,230 Expired - Fee Related US10204623B2 (en) | 2017-01-20 | 2017-05-04 | Privacy control in a connected environment |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/587,230 Expired - Fee Related US10204623B2 (en) | 2017-01-20 | 2017-05-04 | Privacy control in a connected environment |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US20180213396A1 (en) |
| DE (1) | DE112017006876T5 (en) |
| TW (2) | TW201828043A (en) |
| WO (2) | WO2018136111A1 (en) |
Cited By (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111464489A (en) * | 2020-02-21 | 2020-07-28 | 中国电子技术标准化研究院 | A method and system for privacy protection of IoT devices |
| US10747894B1 (en) * | 2018-09-24 | 2020-08-18 | Amazon Technologies, Inc. | Sensitive data management |
| US20200367057A1 (en) * | 2017-10-19 | 2020-11-19 | Microsoft Technology Licensing, Llc | Single sign-in for iot devices |
| US11158312B2 (en) * | 2018-09-25 | 2021-10-26 | International Business Machines Corporation | Presenting contextually appropriate responses to user queries by a digital assistant device |
| CN113765757A (en) * | 2021-08-09 | 2021-12-07 | 珠海格力电器股份有限公司 | Equipment end voice reminding method and system and household appliance |
| US11226835B2 (en) * | 2018-11-12 | 2022-01-18 | International Business Machines Corporation | Determination and initiation of a computing interface for computer-initiated task response |
| US20230041125A1 (en) * | 2020-05-11 | 2023-02-09 | Apple Inc. | User interface for audio message |
| CN118197295A (en) * | 2024-04-11 | 2024-06-14 | 润芯微科技(江苏)有限公司 | In-vehicle voice privacy protection method, system, equipment and storage medium |
| US12169395B2 (en) | 2016-06-12 | 2024-12-17 | Apple Inc. | User interface for managing controllable external devices |
| US12197699B2 (en) | 2017-05-12 | 2025-01-14 | Apple Inc. | User interfaces for playing and managing audio items |
| US12223228B2 (en) | 2019-05-31 | 2025-02-11 | Apple Inc. | User interfaces for audio media control |
| US12244755B2 (en) | 2017-05-16 | 2025-03-04 | Apple Inc. | Methods and interfaces for configuring a device in accordance with an audio tone signal |
| US12242702B2 (en) | 2021-05-15 | 2025-03-04 | Apple Inc. | Shared-content session user interfaces |
| US12262089B2 (en) | 2018-05-07 | 2025-03-25 | Apple Inc. | User interfaces for viewing live video feeds and recorded video |
| US12267622B2 (en) | 2021-09-24 | 2025-04-01 | Apple Inc. | Wide angle video conference |
| US12301979B2 (en) | 2021-01-31 | 2025-05-13 | Apple Inc. | User interfaces for wide angle video conference |
| US12302035B2 (en) | 2010-04-07 | 2025-05-13 | Apple Inc. | Establishing a video conference during a phone call |
| US12348663B2 (en) | 2007-06-28 | 2025-07-01 | Apple Inc. | Portable electronic device with conversation management for incoming instant messages |
| US12368946B2 (en) | 2021-09-24 | 2025-07-22 | Apple Inc. | Wide angle video conference |
| US12379827B2 (en) | 2022-06-03 | 2025-08-05 | Apple Inc. | User interfaces for managing accessories |
| US12381924B2 (en) | 2021-05-15 | 2025-08-05 | Apple Inc. | Real-time communication user interface |
| US12422976B2 (en) | 2021-05-15 | 2025-09-23 | Apple Inc. | User interfaces for managing accessories |
| US12452389B2 (en) | 2018-05-07 | 2025-10-21 | Apple Inc. | Multi-participant live communication user interface |
| US12449961B2 (en) | 2021-05-18 | 2025-10-21 | Apple Inc. | Adaptive video conference user interfaces |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017197312A2 (en) * | 2016-05-13 | 2017-11-16 | Bose Corporation | Processing speech from distributed microphones |
| US10665234B2 (en) * | 2017-10-18 | 2020-05-26 | Motorola Mobility Llc | Detecting audio trigger phrases for a voice recognition session |
| CN111556409B (en) * | 2020-05-22 | 2022-04-19 | 上海创功通讯技术有限公司 | Microphone control circuit and electronic equipment |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140067392A1 (en) * | 2012-09-05 | 2014-03-06 | GM Global Technology Operations LLC | Centralized speech logger analysis |
| US9070367B1 (en) * | 2012-11-26 | 2015-06-30 | Amazon Technologies, Inc. | Local speech recognition of frequent utterances |
| US20150279352A1 (en) * | 2012-10-04 | 2015-10-01 | Nuance Communications, Inc. | Hybrid controller for asr |
| US20160162844A1 (en) * | 2014-12-09 | 2016-06-09 | Samsung Electronics Co., Ltd. | Automatic detection and analytics using sensors |
| US20160162469A1 (en) * | 2014-10-23 | 2016-06-09 | Audience, Inc. | Dynamic Local ASR Vocabulary |
| US20160379626A1 (en) * | 2015-06-26 | 2016-12-29 | Michael Deisher | Language model modification for local speech recognition systems using remote sources |
| US9680983B1 (en) * | 2016-06-16 | 2017-06-13 | Motorola Mobility Llc | Privacy mode detection and response over voice activated interface |
| US20170280235A1 (en) * | 2016-03-24 | 2017-09-28 | Intel Corporation | Creating an audio envelope based on angular information |
Family Cites Families (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6185535B1 (en) * | 1998-10-16 | 2001-02-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Voice control of a user interface to service applications |
| EP1088299A2 (en) * | 1999-03-26 | 2001-04-04 | Scansoft, Inc. | Client-server speech recognition |
| US20140098247A1 (en) | 1999-06-04 | 2014-04-10 | Ip Holdings, Inc. | Home Automation And Smart Home Control Using Mobile Devices And Wireless Enabled Electrical Switches |
| US8635243B2 (en) * | 2007-03-07 | 2014-01-21 | Research In Motion Limited | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application |
| US8589161B2 (en) | 2008-05-27 | 2013-11-19 | Voicebox Technologies, Inc. | System and method for an integrated, multi-modal, multi-device natural language voice services environment |
| US9953653B2 (en) | 2011-01-07 | 2018-04-24 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
| US8972263B2 (en) * | 2011-11-18 | 2015-03-03 | Soundhound, Inc. | System and method for performing dual mode speech recognition |
| WO2013078388A1 (en) * | 2011-11-21 | 2013-05-30 | Robert Bosch Gmbh | Methods and systems for adapting grammars in hybrid speech recognition engines for enhancing local SR performance |
| US9060224B1 (en) | 2012-06-01 | 2015-06-16 | Rawles Llc | Voice controlled assistant with coaxial speaker and microphone arrangement |
| US9230560B2 (en) | 2012-10-08 | 2016-01-05 | Nant Holdings Ip, Llc | Smart home automation systems and methods |
| WO2014142702A1 (en) * | 2013-03-15 | 2014-09-18 | Obschestvo S Ogranichennoy Otvetstvennostiyu "Speaktoit" | Selective speech recognition for chat and digital personal assistant systems |
| US9131369B2 (en) * | 2013-01-24 | 2015-09-08 | Nuance Communications, Inc. | Protection of private information in a client/server automatic speech recognition system |
| US9355368B2 (en) | 2013-03-14 | 2016-05-31 | Toyota Motor Engineering & Manufacturing North America, Inc. | Computer-based method and system for providing active and automatic personal assistance using a robotic device/platform |
| US9223837B2 (en) | 2013-03-14 | 2015-12-29 | Toyota Motor Engineering & Manufacturing North America, Inc. | Computer-based method and system for providing active and automatic personal assistance using an automobile or a portable electronic device |
| KR20150026361A (en) | 2013-09-02 | 2015-03-11 | 삼성전자주식회사 | Clock Data Recovery Circuit and Display Device Thereof |
| US9666188B2 (en) | 2013-10-29 | 2017-05-30 | Nuance Communications, Inc. | System and method of performing automatic speech recognition using local private data |
| WO2015103338A1 (en) | 2013-12-31 | 2015-07-09 | Lookout, Inc. | Cloud-based network security |
| CN103736180B (en) | 2014-01-13 | 2015-07-01 | 常州正元医疗科技有限公司 | Hand-held high-frequency ultrasonic atomization full respiratory tract medicine introducing device |
| US10031721B2 (en) | 2014-05-15 | 2018-07-24 | Tyco Safety Products Canada Ltd. | System and method for processing control commands in a voice interactive system |
| EP3158427B1 (en) | 2014-06-19 | 2022-12-28 | Robert Bosch GmbH | System and method for speech-enabled personalized operation of devices and services in multiple operating environments |
| US9836620B2 (en) | 2014-12-30 | 2017-12-05 | Samsung Electronics Co., Ltd. | Computing system for privacy-aware sharing management and method of operation thereof |
| US10453098B2 (en) | 2015-03-04 | 2019-10-22 | Google Llc | Privacy-aware personalized content for the smart home |
- 2017
- 2017-05-04 US US15/587,244 patent/US20180213396A1/en not_active Abandoned
- 2017-05-04 US US15/587,230 patent/US10204623B2/en not_active Expired - Fee Related
- 2017-06-01 WO PCT/US2017/035548 patent/WO2018136111A1/en not_active Ceased
- 2017-06-01 DE DE112017006876.2T patent/DE112017006876T5/en not_active Withdrawn
- 2017-06-01 WO PCT/US2017/035546 patent/WO2018136110A1/en not_active Ceased
- 2017-06-09 TW TW106119210A patent/TW201828043A/en unknown
- 2017-06-09 TW TW106119218A patent/TW201828724A/en unknown
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140067392A1 (en) * | 2012-09-05 | 2014-03-06 | GM Global Technology Operations LLC | Centralized speech logger analysis |
| US20150279352A1 (en) * | 2012-10-04 | 2015-10-01 | Nuance Communications, Inc. | Hybrid controller for ASR |
| US9070367B1 (en) * | 2012-11-26 | 2015-06-30 | Amazon Technologies, Inc. | Local speech recognition of frequent utterances |
| US20160162469A1 (en) * | 2014-10-23 | 2016-06-09 | Audience, Inc. | Dynamic Local ASR Vocabulary |
| US20160162844A1 (en) * | 2014-12-09 | 2016-06-09 | Samsung Electronics Co., Ltd. | Automatic detection and analytics using sensors |
| US20160379626A1 (en) * | 2015-06-26 | 2016-12-29 | Michael Deisher | Language model modification for local speech recognition systems using remote sources |
| US20170280235A1 (en) * | 2016-03-24 | 2017-09-28 | Intel Corporation | Creating an audio envelope based on angular information |
| US9680983B1 (en) * | 2016-06-16 | 2017-06-13 | Motorola Mobility LLC | Privacy mode detection and response over voice activated interface |
Cited By (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12348663B2 (en) | 2007-06-28 | 2025-07-01 | Apple Inc. | Portable electronic device with conversation management for incoming instant messages |
| US12302035B2 (en) | 2010-04-07 | 2025-05-13 | Apple Inc. | Establishing a video conference during a phone call |
| US12265364B2 (en) | 2016-06-12 | 2025-04-01 | Apple Inc. | User interface for managing controllable external devices |
| US12169395B2 (en) | 2016-06-12 | 2024-12-17 | Apple Inc. | User interface for managing controllable external devices |
| US12197699B2 (en) | 2017-05-12 | 2025-01-14 | Apple Inc. | User interfaces for playing and managing audio items |
| US12244755B2 (en) | 2017-05-16 | 2025-03-04 | Apple Inc. | Methods and interfaces for configuring a device in accordance with an audio tone signal |
| US20200367057A1 (en) * | 2017-10-19 | 2020-11-19 | Microsoft Technology Licensing, LLC | Single sign-in for IoT devices |
| US12058519B2 (en) * | 2017-10-19 | 2024-08-06 | Microsoft Technology Licensing, LLC | Single sign-in for IoT devices |
| US12452389B2 (en) | 2018-05-07 | 2025-10-21 | Apple Inc. | Multi-participant live communication user interface |
| US12262089B2 (en) | 2018-05-07 | 2025-03-25 | Apple Inc. | User interfaces for viewing live video feeds and recorded video |
| US11755756B1 (en) | 2018-09-24 | 2023-09-12 | Amazon Technologies, Inc. | Sensitive data management |
| US10747894B1 (en) * | 2018-09-24 | 2020-08-18 | Amazon Technologies, Inc. | Sensitive data management |
| US11158312B2 (en) * | 2018-09-25 | 2021-10-26 | International Business Machines Corporation | Presenting contextually appropriate responses to user queries by a digital assistant device |
| US11226833B2 (en) * | 2018-11-12 | 2022-01-18 | International Business Machines Corporation | Determination and initiation of a computing interface for computer-initiated task response |
| US11226835B2 (en) * | 2018-11-12 | 2022-01-18 | International Business Machines Corporation | Determination and initiation of a computing interface for computer-initiated task response |
| US12223228B2 (en) | 2019-05-31 | 2025-02-11 | Apple Inc. | User interfaces for audio media control |
| CN111464489A (en) * | 2020-02-21 | 2020-07-28 | 中国电子技术标准化研究院 | A method and system for privacy protection of IoT devices |
| US20230041125A1 (en) * | 2020-05-11 | 2023-02-09 | Apple Inc. | User interface for audio message |
| US12265696B2 (en) * | 2020-05-11 | 2025-04-01 | Apple Inc. | User interface for audio message |
| US12301979B2 (en) | 2021-01-31 | 2025-05-13 | Apple Inc. | User interfaces for wide angle video conference |
| US12260059B2 (en) | 2021-05-15 | 2025-03-25 | Apple Inc. | Shared-content session user interfaces |
| US12242702B2 (en) | 2021-05-15 | 2025-03-04 | Apple Inc. | Shared-content session user interfaces |
| US12381924B2 (en) | 2021-05-15 | 2025-08-05 | Apple Inc. | Real-time communication user interface |
| US12422976B2 (en) | 2021-05-15 | 2025-09-23 | Apple Inc. | User interfaces for managing accessories |
| US12449961B2 (en) | 2021-05-18 | 2025-10-21 | Apple Inc. | Adaptive video conference user interfaces |
| CN113765757A (en) * | 2021-08-09 | 2021-12-07 | 珠海格力电器股份有限公司 | Equipment end voice reminding method and system and household appliance |
| US12267622B2 (en) | 2021-09-24 | 2025-04-01 | Apple Inc. | Wide angle video conference |
| US12368946B2 (en) | 2021-09-24 | 2025-07-22 | Apple Inc. | Wide angle video conference |
| US12379827B2 (en) | 2022-06-03 | 2025-08-05 | Apple Inc. | User interfaces for managing accessories |
| CN118197295A (en) * | 2024-04-11 | 2024-06-14 | 润芯微科技(江苏)有限公司 | In-vehicle voice privacy protection method, system, equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2018136110A1 (en) | 2018-07-26 |
| DE112017006876T5 (en) | 2019-10-24 |
| TW201828043A (en) | 2018-08-01 |
| TW201828724A (en) | 2018-08-01 |
| US20180211657A1 (en) | 2018-07-26 |
| WO2018136111A1 (en) | 2018-07-26 |
| US10204623B2 (en) | 2019-02-12 |
Similar Documents
| Publication | Title |
|---|---|
| US10204623B2 (en) | Privacy control in a connected environment |
| US20220012470A1 (en) | Multi-user intelligent assistance |
| Jain et al. | Head-mounted display visualizations to support sound awareness for the deaf and hard of hearing |
| US10958457B1 (en) | Device control based on parsed meeting information |
| US10102856B2 (en) | Assistant device with active and passive experience modes |
| US9344815B2 (en) | Method for augmenting hearing |
| JP2020532757A (en) | Intercom-type communication using multiple computing devices |
| KR102871125B1 (en) | Information processing device and information processing method, and information processing system |
| US10405096B2 (en) | Directed audio system for audio privacy and audio stream customization |
| US20190378518A1 (en) | Personalized voice recognition service providing method using artificial intelligence automatic speaker identification method, and service providing server used therein |
| US20210225365A1 (en) | Systems and Methods for Assisting the Hearing-Impaired Using Machine Learning for Ambient Sound Analysis and Alerts |
| US20180322300A1 (en) | Secure machine-curated scenes |
| US20180210738A1 (en) | Contextual user interface based on changes in environment |
| KR20200005741A (en) | Methods, systems, and media for providing information about detected events |
| US12249328B2 (en) | Techniques for communication between hub device and multiple endpoints |
| US20240089676A1 (en) | Hearing performance and habilitation and/or rehabilitation enhancement using normal things |
| KR20200112481A (en) | Computer program, electronic device, and system for controlling conference |
| US20250111851A1 (en) | Techniques for communication between hub device and multiple endpoints |
| DE112021003164T5 (en) | Systems and methods for recognizing voice commands to create a peer-to-peer communication link |
| KR102873881B1 (en) | Method, system and non-transitory computer-readable recording medium for providing automated cognitive status report by analyzing voice call |
| Dornhoffer et al. | Patient-related factors do not predict use of computer-based auditory training by new adult cochlear implant recipients |
| Groarke | Making room for auditory argument |
| Rominger et al. | Hearing loss and hearing-related factors: Technology and environmental interventions |
| US20220378330A1 (en) | Method and device for providing auditory program simulating on-the-spot experience |
| Shiell et al. | Multilevel Modelling of Gaze from Hearing-impaired Listeners following a Realistic Conversation |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: ESSENTIAL PRODUCTS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SEGAL, MARA CLAIR;ROMAN, MANUEL;DESAI, DWIPAL;AND OTHERS;SIGNING DATES FROM 20170512 TO 20170516;REEL/FRAME:042578/0807 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |