WO2020022573A1 - Smart device and associated control method - Google Patents

Smart device and associated control method

Info

Publication number
WO2020022573A1
Authority
WO
WIPO (PCT)
Prior art keywords
media content
smart device
voice
volume
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2018/014227
Other languages
English (en)
Korean (ko)
Inventor
박성흠
김영훈
강승원
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Humax Co Ltd
Original Assignee
Humax Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Humax Co Ltd

Classifications

    • G PHYSICS
    • G03 PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03B APPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B21/00 Projectors or projection-type viewers; Accessories therefor
    • G03B21/14 Details
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present invention relates to a smart device and a control method thereof, and more particularly, to a smart device and a control method for controlling the output of the feedback in the listening mode.
  • A smart speaker has the advantage that it can be used freely in indoor spaces such as a home or office, because it performs commands and communicates with the user through voice. However, because audio-type information is limited in information volume, conveys visual information poorly, and does not persist once it has been output, user convenience may be impaired in some situations.
  • One object of the present invention is to provide a smart device that receives a user's voice without noise while playing media content, and a control method thereof.
  • Another object of the present invention is to provide a smart device that performs a listening mode without interrupting the media content being played, and a control method thereof.
  • the smart device pauses the media content, or mutes or lowers the audio volume of the media content, thereby preventing sound from the media content from being mixed into the user's voice as noise.
  • when the smart device enters the listening mode during playback of media content, the user can continue to enjoy the media content in the listening mode, because subtitles are displayed in place of the audio that has been muted or lowered.
  • FIG. 1 is a block diagram of a smart device according to an embodiment of the present invention.
  • FIGS. 2 to 6 are diagrams of some implementations of a smart device according to an embodiment of the present invention.
  • FIGS. 7 and 8 are diagrams illustrating examples of operation modes of a smart device according to an embodiment of the present invention.
  • FIG. 9 is a diagram illustrating communication between a smart device and a voice assistant server according to an embodiment of the present invention.
  • FIG. 10 is a diagram illustrating an operation of a standby mode during media content playback according to an embodiment of the present invention.
  • FIG. 11 is a diagram illustrating the operation of a listening mode during media content playback according to an embodiment of the present invention.
  • FIG. 12 is a flowchart of a first example of a method for controlling a smart device according to an embodiment of the present invention.
  • FIG. 13 is a flowchart of a second example of a method for controlling a smart device according to an embodiment of the present invention.
  • FIG. 16 is a flowchart of a third example of a method for controlling a smart device according to an embodiment of the present invention.
  • FIG. 17 is a flowchart of a fourth example of a method for controlling a smart device according to an embodiment of the present invention.
  • FIG. 18 is a flowchart of a fifth example of a method of controlling a smart device according to an embodiment of the present invention.
  • a media content control method may be provided that is performed by a smart device whose operating states include a wake-up word detection state, for determining whether a received sound signal includes a wake-up word, and a listening state, for recognizing a voice command included in the sound signal. In the method, media content related to a first voice command input by a user is played, and a first operation is performed by adjusting the volume of the audio data of the media content and displaying text data corresponding to the volume-adjusted audio data together with video data of the media content.
  • the method may further include performing a second operation related to the media content when returning to the wake-up word detection state after entering the listening state.
  • the performing of the second operation may include returning the volume of the audio data of the media content to its level before the first operation was performed and ending the display of the text data.
  • the text data displayed first among the text data may include text data corresponding to audio data at the time the volume is adjusted and text data corresponding to audio data from a predetermined time before the volume is adjusted.
  • the text data displayed last among the text data may include text data corresponding to audio data at the time the volume is returned and text data corresponding to audio data up to a predetermined time after the volume is returned.
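The choice of the first and last displayed text data can be illustrated with a short sketch. It assumes that the text data is available as subtitle cues with start and end times in seconds. The `Cue` type, the `select_cues` function, and the `margin` parameter (standing in for the "predetermined time") are hypothetical names chosen for illustration and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cue:
    """One piece of text data with its position in the media content (assumed format)."""
    start: float  # seconds from the start of the media content
    end: float
    text: str


def select_cues(cues: List[Cue], lowered_at: float, restored_at: float,
                margin: float = 2.0) -> List[Cue]:
    """Return the cues to display while the volume is reduced.

    The window is widened by `margin` seconds on both sides, so the first cue
    covers audio shortly before the volume was adjusted and the last cue
    covers audio shortly after the volume was returned.
    """
    window_start = lowered_at - margin
    window_end = restored_at + margin
    # A cue is displayed if it overlaps the widened window at all.
    return [c for c in cues if c.end > window_start and c.start < window_end]


cues = [
    Cue(0.0, 3.0, "Good evening."),
    Cue(3.0, 6.0, "Tonight's top story."),
    Cue(6.0, 9.0, "Rain is expected."),
    Cue(9.0, 12.0, "Stay tuned."),
    Cue(12.0, 15.0, "Back after the break."),
]
shown = select_cues(cues, lowered_at=7.0, restored_at=8.0, margin=2.0)
```

With a margin of zero only the cue that overlaps the reduced-volume interval would be shown; the margin adds the neighbouring cues so the user does not lose context at either edge.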
  • a media content control method may also be provided that is performed by a smart device whose operating states include a wake-up word detection state, for determining whether a sound signal received from the outside includes a wake-up word, and a listening state, for recognizing a voice command included in the sound signal, the method comprising: playing media content related to a first voice command input by a user by outputting audio data of the media content through a voice output module of the smart device and displaying video data of the media content through an image output module of the smart device; entering the wake-up word detection state while maintaining playback of the media content after initiating playback of the media content; determining, by the smart device in the wake-up word detection state, whether the wake-up word is included in a first sound signal received during playback of the media content; entering the listening state while maintaining playback of the media content when the first sound signal includes the wake-up word; and performing either an operation of pausing playback of the media content or an operation of adjusting the volume of the audio data.
  • the volume of the audio data may be adjusted by reducing or muting the volume of the audio data.
  • the method may further include, when adjusting the volume of the audio data, displaying text data corresponding to the volume-adjusted audio data of the media content together with video data of the media content.
  • a smart device may be provided whose operating states include a wake-up word detection state, for determining whether a sound signal received from the outside includes a wake-up word, and a listening state, for recognizing a voice command included in the sound signal, the smart device comprising: a voice input module for receiving a sound signal; a voice output module for outputting a voice; an image output module for displaying an image; and a controller. The controller plays media content related to a first voice command input by a user by outputting audio data of the media content through the voice output module and displaying video data of the media content through the image output module; enters the wake-up word detection state while maintaining playback of the media content after starting playback; determines, during playback of the media content, whether the wake-up word is included in a first sound signal received through the voice input module; and, when the wake-up word is included in the first sound signal, enters the listening state while maintaining playback of the media content and performs a first operation by adjusting the volume of the audio data of the media content and displaying text data corresponding to the volume-adjusted audio data together with video data of the media content.
  • the controller may perform a second operation related to the media content when returning to the wake-up word detection state after entering the listening state, and may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation was performed and terminating the display of the text data.
  • the controller may output a feedback related to a second voice command included in a second sound signal received after entering the listening state, and perform a second operation related to the media content after outputting the feedback.
  • in this case, the controller may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation was performed and terminating the display of the text data.
  • the controller may output a talkback related to a second voice command included in the second sound signal received after entering the listening state, and perform a second operation related to the media content after the output of the talkback has ended.
  • in this case as well, the controller may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation was performed and terminating the display of the text data.
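The first and second operations described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the controller of the specification: the `Player` fields, the `Controller` class, its method names, and the `ducked_volume` parameter are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    WAKEUP_WORD_DETECTION = auto()
    LISTENING = auto()


@dataclass
class Player:
    """Hypothetical stand-in for the media playback path."""
    volume: int = 50            # current output volume, 0-100
    subtitles_on: bool = False  # whether text data is displayed
    playing: bool = True


class Controller:
    def __init__(self, player: Player, ducked_volume: int = 0):
        self.player = player
        self.ducked_volume = ducked_volume  # 0 corresponds to muting
        self.state = State.WAKEUP_WORD_DETECTION
        self._saved_volume = None

    def on_wakeup_word(self) -> None:
        """First operation: keep playing, lower the volume, show text data."""
        if self.state is State.LISTENING:
            return
        self._saved_volume = self.player.volume
        # Never raise the volume when entering the listening state.
        self.player.volume = min(self.player.volume, self.ducked_volume)
        self.player.subtitles_on = True
        self.state = State.LISTENING

    def on_feedback_finished(self) -> None:
        """Second operation: return the volume and end the text display."""
        if self.state is not State.LISTENING:
            return
        self.player.volume = self._saved_volume
        self.player.subtitles_on = False
        self.state = State.WAKEUP_WORD_DETECTION


player = Player(volume=40)
controller = Controller(player, ducked_volume=5)
controller.on_wakeup_word()
```

Playback is never paused in this sketch; only the volume and the text display change, which matches the variant in which the listening state is entered while playback is maintained.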
  • the smart device 1000 may interact with a user through voice.
  • the smart device 1000 may receive a user voice and output a feedback requested by a voice command included in the user voice.
  • the smart device 1000 may output an audio-type or video-type feedback.
  • the smart device 1000 may be any kind of device capable of voice interaction with a user using a voice assistant function.
  • the smart device 1000 is typically provided in the form of a smart speaker, but may also be provided in the form of a smart phone, a smart tablet, a notebook, a smart television, a smart set-top box, a smart display, a smart projector, and the like.
  • the smart device 1000 may include a voice input module 1200 for receiving a user voice for voice-based interaction with the user and a voice output module 1300 for outputting an audio-type feedback.
  • the smart device 1000 may further include a display module 1400 for outputting a video-type feedback.
  • the smart device 1000 may further include other components, including a communication module 1020 for communicating with an external device (for example, a voice assistant server) to realize a voice assistant function. These components are described later.
  • the smart device 1000 may implement a voice assistant function.
  • the voice assistant function is a concept encompassing all functions that enable interaction between the user and the smart device 1000 using voice as a medium.
  • the smart device 1000 transmits the received user voice to the voice assistant server 10.
  • the voice assistant server 10 interprets a voice command included in the user's voice, obtains feedback data about feedback requested by the voice command, and transmits the feedback data to the smart device 1000.
  • the smart device 1000 outputs feedback based on the feedback data received from the voice assistant server 10.
  • the smart device 1000 has been described above as implementing the voice assistant function in cooperation with the voice assistant server 10. In some cases, however, the smart device 1000 may also execute the voice assistant function locally on a stand-alone basis.
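The round trip between the smart device and the voice assistant server can be sketched as follows. This is an illustration only: the user voice is represented as a text string rather than audio, and `FeedbackData`, `FakeAssistantServer`, `SmartDevice`, and the "play" command rule are hypothetical names and logic invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeedbackData:
    kind: str     # "audio" or "video" (assumed encoding of the feedback type)
    payload: str


class FakeAssistantServer:
    """Stands in for the voice assistant server 10; all logic is illustrative."""

    def handle(self, user_voice: str) -> FeedbackData:
        command = user_voice.strip().lower()  # stand-in for command extraction
        if command.startswith("play "):
            return FeedbackData("video", command[len("play "):])
        return FeedbackData("audio", f"Sorry, I cannot handle: {command}")


@dataclass
class SmartDevice:
    server: FakeAssistantServer
    outputs: List[Tuple[str, str]] = field(default_factory=list)

    def on_user_voice(self, user_voice: str) -> None:
        # 1. Transmit the received user voice to the server.
        # 2. Receive feedback data for the requested feedback.
        feedback = self.server.handle(user_voice)
        # 3. Output feedback based on the feedback data (recorded here).
        self.outputs.append((feedback.kind, feedback.payload))


device = SmartDevice(FakeAssistantServer())
device.on_user_voice("Play the news")
```

A stand-alone device, as mentioned above, would correspond to running the same `handle` logic on the device itself instead of behind a network call.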
  • the voice assistant server 10 collectively refers to the server or servers that implement a voice assistant function in cooperation with the smart device 1000 according to an embodiment of the present invention.
  • the voice assistant server 10 may be responsible for receiving a user's voice from the smart device 1000, extracting a voice command included in the user's voice, interpreting the voice command, generating feedback data for the feedback that the smart device 1000 outputs in response to the voice command, and delivering the feedback data to the smart device 1000.
  • the voice assistant server 10 may be implemented physically and/or functionally as a single server or, as necessary, as a plurality of servers.
  • for example, the voice assistant server 10 may be a collection of servers such as a voice recognition server that extracts a voice command from a voice received from a smart device, an artificial intelligence server that interprets the extracted voice command, and a server that manages multimedia content to be provided as feedback. That is, the voice assistant server 10 may take the form of a single server that implements all of the functions mentioned above and any other functions required to implement the voice assistant function, or of a collection of servers that share those functions.
  • the smart device 1000 may obtain a user voice through the voice input module 1200.
  • the user voice may mean a voice spoken by a user who uses the smart device 1000.
  • the operation mode of the smart device 1000 may be a standby mode.
  • the smart device 1000 may obtain a user voice.
  • a voice other than the user voice may be included in the sound received by the smart device 1000.
  • examples of such voices include voices due to feedback output from the smart device 1000 and voices due to noise generated in the surroundings.
  • these voices will be referred to as other voices, as distinguished from user voices.
  • a voice due to the feedback output from the smart device 1000 will be referred to as a feedback voice.
  • the user voice may include a voice command by which the user requests the smart device 1000 to perform or process a specific operation.
  • the user voice may be interpreted as a term defined from an acoustic point of view, whereas the voice command may be interpreted as a term defined from an information processing point of view. Therefore, in the present specification, the terms user voice and voice command are used distinctly.
  • however, the distinction between a user voice and a voice command is not always clear, and in some cases drawing it has little practical benefit. Note, therefore, that in parts of the description and claims that follow, the two terms may be used interchangeably to the extent that a person skilled in the art would understand.
  • the user does not necessarily have to instruct the smart device 1000 through the voice command.
  • the user may interact with the smart device 1000 in various forms such as a button input, a touch, a gesture, and the like.
  • the smart device 1000 cannot know in advance, or can determine only with difficulty, whether a received voice is a user voice uttered in order to use the smart device 1000. The smart device 1000 may therefore have a standby mode as a preliminary stage for entering a listening mode, in which it receives a user voice requesting that a specific operation be performed or processed.
  • the wake-up word may mean a trigger for entering the listening mode from the standby mode.
  • the smart device 1000 may detect the wakeup word from the user voice received in the standby mode, and enter the listening mode when the wakeup word is detected. Accordingly, the wakeup word may be regarded as a special voice command for instructing the smart device 1000 to enter the listening mode from the standby mode.
  • in the following description, the wakeup word is distinguished from voice commands as far as possible. In some cases, however, the wakeup word may be described as a type of voice command. The wakeup word may also sometimes be referred to as a hot word.
  • a single word or a plurality of words or phrases may be set as a wakeup word.
  • the wakeup word may serve as a name by which the user calls the smart device 1000.
  • the wakeup word may be determined during setup of the smart device 1000, and may be, for example, 'computer' or 'hey speaker'.
  • the voice command may take the form of an arbitrary word, phrase, or sentence, so recognizing and interpreting a voice command from the user voice in order to perform the voice assistant function requires complicated processing. Because it is difficult for the smart device 1000 to recognize arbitrary voice commands on its own, the processing of voice commands from the user voice is usually performed in the voice assistant server 10. In contrast, since the wakeup word is only a single word or a set of several words, detecting the wakeup word in the user's voice may be processed locally in the smart device 1000.
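The transition from the standby mode to the listening mode on a wakeup word can be sketched as below. The sketch assumes, purely for illustration, that the received sound has already been turned into a text transcript; a real device would detect the wakeup word in the audio signal itself. The function names and the `Mode` enum are hypothetical, while the two wakeup words are the examples given in the text.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"
    LISTENING = "listening"


WAKEUP_WORDS = ("computer", "hey speaker")  # example wakeup words from the text


def contains_wakeup_word(transcript: str) -> bool:
    """Check locally whether a whole wakeup word or phrase occurs in the text."""
    words = transcript.lower().replace(",", " ").split()
    # Pad with spaces so that only whole-word matches count.
    padded = " " + " ".join(words) + " "
    return any(f" {w} " in padded for w in WAKEUP_WORDS)


def next_mode(mode: Mode, transcript: str) -> Mode:
    """Enter the listening mode only from standby and only on a wakeup word."""
    if mode is Mode.STANDBY and contains_wakeup_word(transcript):
        return Mode.LISTENING
    return mode


mode = next_mode(Mode.STANDBY, "Hey speaker, what time is it")
```

Because the check only compares against a short fixed list, it is cheap enough to run on the device, whereas interpreting the rest of the utterance is left to the server in the arrangement described above.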
  • the feedback may mean a response that the smart device 1000 outputs in response to an instruction or request from a user.
  • Feedback herein may include audio-type feedback and video-type feedback.
  • the audio-type feedback may refer to auditory feedback output through the voice output module 1300.
  • the audio-type feedback will be referred to as a talkback for convenience of description.
  • the video-type feedback may refer to visual feedback output through the display module 1400.
  • the video-type feedback will be referred to as a display back for convenience of description.
  • although the term talkback includes the word 'talk', the talkback does not necessarily mean conversational feedback and should be interpreted as encompassing all auditory feedback, such as music or sound effects.
  • although the display back refers to the video-type feedback, and the display back has been described as visual feedback and the talkback as acoustic feedback, for convenience of description the display back may be interpreted as including acoustic feedback that accompanies the visual feedback.
  • for example, when the smart device 1000 plays multimedia content that provides an audiovisual experience, such as a movie or a game, the smart device 1000 may be described as outputting a display back.
  • the display back may include a TV program, a movie or music video, YouTube streaming content, or the like.
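The relationship between talkback and display back can be sketched as a small routing function. The `Feedback` type, the `route` function, and the string-based action labels are hypothetical; the sketch only illustrates that a display back may carry accompanying audio and that the visual part needs a display module.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    audio: Optional[str] = None  # auditory part (a talkback, or sound accompanying video)
    video: Optional[str] = None  # visual part (display back)


def route(feedback: Feedback, has_display: bool) -> List[str]:
    """Decide which output modules handle a piece of feedback."""
    actions = []
    # The visual part can only be shown if a display module is mounted.
    if feedback.video is not None and has_display:
        actions.append(f"display:{feedback.video}")
    # The auditory part always goes to the voice output module.
    if feedback.audio is not None:
        actions.append(f"speaker:{feedback.audio}")
    return actions


movie = Feedback(audio="movie soundtrack", video="movie frames")
actions = route(movie, has_display=True)
```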
  • FIG. 1 is a block diagram of a smart device 1000 according to an embodiment of the present invention.
  • the smart device 1000 may include a voice input module 1200, a voice output module 1300, a communication module 1020, a memory 1040, and a controller 1060.
  • the smart device 1000 receives a user voice through the voice input module 1200, transmits the user voice to the voice assistant server 10 through the communication module 1020, receives feedback data for the feedback requested by the voice command included in the user voice, and outputs the talkback through the voice output module 1300. The controller 1060 may control each module and process the various information required for this process, and the memory 1040 may store various information.
  • the smart device 1000 may further include a display module 1400 to output the display back.
  • the smart device 1000 may further include a driving module 1500 for adjusting the direction of the display back.
  • the voice input module 1200 may receive various voices including a user voice.
  • the voice input module 1200 may be provided with a single microphone 1202 or a plurality of microphones 1202.
  • the voice input module 1200 may be provided as a microphone array 1204 in which a plurality of microphones 1202 are arranged in a predetermined shape.
  • the voice output module 1300 may output various sounds including the talkback.
  • the voice output module 1300 may be provided as a single or a plurality of speakers 1302.
  • the voice output module 1300 may be arranged in an omnidirectional structure as needed. Alternatively, the voice output module 1300 may be arranged in a structure that outputs sound in a directional manner as needed.
  • the display module 1400 may output various images including a display back.
  • the display module 1400 may be implemented in the form of a display panel 1420 or a projector 1440.
  • the direction of the display module 1400 or the direction of the display back output by the display module 1400 may be adjusted by the driving module 1500.
  • the smart device 1000 may output a display back through the projector 1440, and the display area or the projection direction in which the display back is output may be dynamically adjusted around the user by the driving module 1500.
  • the driving module 1500 may further include a direction detecting sensor 1560 for detecting a direction of the current display back.
  • the direction detecting sensor 1560 may sense the direction in which the projector 1440 is facing.
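One way a driving module could use the sensed direction is to compute how far the projector must turn to face the user. The specification does not give a formula; the sketch below is an assumption that directions are expressed as compass-style angles in degrees, and `rotation_needed` is a hypothetical helper.

```python
def rotation_needed(current_deg: float, target_deg: float) -> float:
    """Signed shortest rotation in degrees, in the range [-180, 180).

    Positive values mean turning in the direction of increasing angle.
    """
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


# Projector currently faces 350 degrees, user is at 10 degrees.
turn = rotation_needed(350.0, 10.0)
```

Taking the result modulo 360 keeps the turn short when the two directions lie on either side of the 0/360 boundary, so the projector does not swing almost a full circle.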
  • the communication module 1020 communicates with an external device.
  • the smart device 1000 may exchange information with the voice assistant server 10 through the communication module 1020. More specifically, the smart device 1000 may transmit the user voice to the voice assistant server 10 and receive feedback data from the voice assistant server 10 through the communication module 1020.
  • the communication module 1020 may be broadly divided into a wired type and a wireless type.
  • the wired type and the wireless type each have advantages and disadvantages, and the smart device 1000 may be provided with a wired type communication module 1020 and/or a wireless type communication module 1020.
  • examples of the wired type include wired LAN (Local Area Network) and USB (Universal Serial Bus) communication.
  • examples of the wireless type include a mobile communication method, a wireless personal area network (WPAN) method such as Bluetooth or Zigbee, a wireless local area network (WLAN) method such as Wi-Fi, and other known communication schemes.
  • the memory 1040 may store various kinds of information.
  • the memory 1040 may store data temporarily or semi-permanently. Examples of the memory 1040 include a hard disk drive (HDD), a solid state drive (SSD), a flash memory, a read-only memory (ROM), a random access memory (RAM), and the like.
  • the memory 1040 may be provided in a form embedded in the smart device 1000 or in a form detachable from the smart device 1000.
  • the memory 1040 may store an operating system (OS) for driving the smart device 1000, various applications installed on the smart device 1000, and various data required or used for the operation of the smart device 1000.
  • the controller 1060 may control overall operations of the smart device 1000. For example, the smart device 1000 may output the display back as the controller 1060 controls the display module 1400, and the smart device 1000 may communicate with the voice assistant server 10 as the controller 1060 controls the communication module 1020.
  • the control operation of the controller 1060 may be performed as the controller 1060 performs calculation and processing of various information.
  • the controller 1060 may be implemented as a computer or a similar device using hardware, software, or a combination thereof.
  • in hardware, the controller 1060 may be provided in the form of an electronic circuit that processes an electrical signal to perform a control function; in software, it may be provided in the form of a program or code that drives a hardware circuit.
  • the controller 1060 may have a single physical configuration, but in some cases may be provided in a physically separated form. In other words, the controller 1060 may be manufactured as a single chip, or may be provided as a plurality of physically distributed chips or substrates, in which case the separate controllers 1060 may be connected through a communication interface.
  • FIGS. 2 to 8 are perspective views of embodiments of the smart device 1000 according to an embodiment of the present invention.
  • the smart device 1000 may be provided in the form of a smart speaker 1000a mounted on a horizontal surface such as a table or a floor as shown in FIG. 2.
  • the housing 1100 may include a lower surface 1101 lying on a horizontal surface, an upper surface 1102 opposite the lower surface 1101, and a side surface 1103 connecting the lower surface 1101 and the upper surface 1102.
  • FIG. 2 illustrates the housing 1100 in a circular pillar shape, but the housing 1100 may have various other shapes, such as a polygonal pillar, a circular or polygonal pillar whose upper surface 1102 is inclined, or a circular or polygonal cone.
  • the smart device 1000 may be provided with an indicator 1106 indicating an operation mode of the smart device 1000.
  • the indicator 1106 may be a lamp or the like that displays a specific color or a specific pattern, depending on the mode of operation.
  • the indicator 1106 is disposed to surround the edge of the side surface 1103 of the housing 1100, but the shape or position of the indicator 1106 is not limited to FIG. 2.
  • the smart device 1000 may be provided with a microphone array 1204 including a plurality of microphones 1202.
  • the plurality of microphones 1202 may be disposed radially on the top surface 1102 of the housing 1100 or along the side 1103 of the housing 1100 as shown in FIG. 2.
  • the smart device 1000 may be provided with a single microphone 1202.
  • The microphone 1202 may be disposed at a point on the side surface 1103 of the housing 1100 facing the direction in which the smart device 1000 is mainly used, or on the upper surface 1102 of the housing 1100.
  • the smart device 1000 may be provided with a single speaker 1302 to output voice in a non-directional manner.
  • The speaker 1302 may be arranged inside the housing 1100 to output voice toward the lower surface 1101 of the housing 1100, and a cone-shaped protrusion may be provided on the lower surface 1101 of the housing 1100 so that the sound is output in all directions around the housing 1100.
  • a plurality of speakers 1302 may be provided, and in this case, the smart device 1000 may output directional voice or output voice in a multi-channel (for example, stereo sound or 5.1 channel).
  • In this example, since the display module 1400 is not mounted in the smart device 1000, the smart device 1000 may not output the display back. Note that some of the smart devices 1000 and control methods thereof according to embodiments of the present invention described later apply to a smart device 1000 in which the display module 1400 is mounted, while others can also be applied to a smart device 1000 without the display module 1400.
  • the smart device 1000 may be mounted on a horizontal surface such as a table or a floor, as shown in FIG. 3, and provided as a form 1000b including a display panel 1420.
  • The display panel 1420 may be provided in the smart device 1000.
  • the display panel 1420 may be mainly provided on one surface of the housing 1100.
  • The smart device 1000 may output various images (e.g., a display back) through the display panel 1420.
  • the display panel 1420 may function as a touch input interface.
  • the indicator 1106 may be selectively provided in the smart device 1000 in this example.
  • the voice output module 1300 and the voice input module 1200 may be provided in various arrangements.
  • the smart device 1000 of the present example has been described as being stationary.
  • the smart device 1000 may be provided as a wall-mounted device.
  • a mounting means for hanging the smart device 1000 on the wall may be provided in the housing 1100.
  • the mounting means may be provided as an adhesive layer, a bracket, a recess or the like.
  • the smart device 1000 may be provided in the form 1000c including the projector 1440 as shown in FIG. 4. Accordingly, the smart device 1000 may output various images through the projector 1440.
  • the smart device 1000 shown in FIG. 4 may be mounted on a horizontal plane or installed on a ceiling or a wall. If a short projection distance is desired, an ultra short throw (UST) projector may be used as the projector 1440 of the smart device 1000.
  • the smart device 1000 may optionally include a touch input interface, a gesture input interface, a gaze recognition interface, and / or a space recognition interface.
  • the touch input interface may detect a user's touch input on a surface on which the smart device 1000 is placed or a surface on which the smart device 1000 projects the display back.
  • An example of the touch input interface is an infrared touch sensor.
  • The infrared touch sensor may include light emitting means for emitting infrared rays and light receiving means for receiving infrared rays, and may acquire a touch input when infrared rays emitted by the light emitting means are reflected by a user's body and received by the light receiving means.
  • a general camera or an infrared camera may be used instead of the light receiving means.
  • the light emitting means emits infrared light so as to form a predetermined pattern, and the touch input may be obtained by detecting that the pattern is deformed by the user's body in the infrared camera.
  • the gesture input interface may recognize a gesture (for example, an arm gesture, a finger gesture, a finger shape, etc.) expressed by the user's body based on various images.
  • The gesture input interface has the advantage that it can receive gestures made in space rather than only touches on a physical surface.
  • the gaze recognition interface may recognize the gaze of the user of the smart device 1000.
  • the gaze recognition interface may recognize a direction viewed by a user or a point viewed by the user as two-dimensional or three-dimensional information.
  • The space recognition interface may recognize a space and an object around the smart device 1000.
  • the space recognition interface may recognize a structure of a room where the smart device 1000 is placed, a location, a shape, etc. of an object placed in the vicinity.
  • The touch input interface, the gesture input interface, the gaze recognition interface, and/or the space recognition interface of the smart device 1000 are not limited to the above examples, and various modifications apparent to those skilled in the art are possible. It is also noted that the touch input interface, the gesture input interface, the gaze recognition interface, and/or the space recognition interface may be applied not only to the smart device 1000 of the present example but also to other implementations of the smart device 1000.
  • the smart device 1000 may be provided in a form 1000d in which the projection direction is adjusted as the mounting direction thereof is adjusted.
  • For example, the housing 1100 of the smart device 1000 may have at least two mounting surfaces 1104 and 1105, and whether the projection direction faces the wall or the floor may be determined by which of the two mounting surfaces 1104 and 1105 the housing 1100 is mounted on. In other words, the projection direction or the projection area of the smart device 1000 may be adjusted by the user manually adjusting the mounting posture of the smart device 1000.
  • the smart device 1000 may automatically adjust the projection direction.
  • the smart device 1000 may further include a driving module 1500 for adjusting the direction of the projection or the area of the projection.
  • the smart device 1000 may be provided with a driving motor 1500 that rotates the projector.
  • The smart device 1000 may be provided with a driving module 1500, such as a reflection mirror, that adjusts the light path of the projection.
  • the driving module 1500 may be provided in a combination of the rotating motor 1520 and the reflection mirror.
  • the reflective mirror may be implemented as a solid-state or may be provided as a physically oriented nodal mirror or polygonal mirror, such as a MEMS mirror, a resonance mirror, or the like.
  • the smart device 1000 is mounted on a table and outputs a display back on the table.
  • the smart device 1000 may move the projection direction of the display back to the user direction using the driving module 1500.
  • The smart device 1000 may switch the projection direction between the wall surface and the table surface as necessary by using the driving module 1500.
  • the smart device 1000 which automatically adjusts the projection direction or the projection area is not limited to the form having the projector 1440.
  • For example, a driving module 1500 that adjusts the direction of the display panel 1420 may be mounted on the smart device 1000 of the type shown in FIG. 5, so that the smart device 1000 automatically adjusts the display direction through the driving module 1500.
  • the smart device 1000 may mainly receive a user voice and provide feedback in response to a voice command included in the user voice.
  • some of the technical matters considered in connection with the process of the smart device 1000 performing the above-described operation, that is, receiving the user voice and providing feedback in response thereto may be as follows.
  • the first is to identify the user's intention to speak. Although the user is in an environment in which the smart device 1000 may be used, the user's speech may not necessarily be for using the smart device 1000. When the smart device 1000 reacts to the user's voice that is uttered without the intention of using the smart device 1000, the smart device 1000 may malfunction, and thus user convenience may be deteriorated.
  • In addition, the smart device 1000 mainly transmits the received user voice to the voice assistant server 10 for interpretation of the voice command. If all user voices received by the voice input module 1200 of the smart device 1000 are sent to the voice assistant server 10, even personal information that the user does not want to share may be transmitted to the operator of the voice assistant server 10. From the user's point of view, this can be regarded as a leak of personal information.
  • Therefore, the smart device 1000 should be able to distinguish whether a voice received through the voice input module 1200 was uttered with the intention of using the smart device 1000, and process it accordingly.
  • However, it may be difficult to determine from the voice input through the voice input module 1200 alone whether the user uttered it in order to use the smart device 1000.
  • Accordingly, before the phase of delivering the received user voice to the voice assistant server 10 for interpretation of the voice command, the smart device 1000 may determine, based on whether a specific word reflecting the intention to use is detected, whether the subsequent user voice is uttered with the intention of using the smart device 1000. In this way, user voice uttered with the intention of using the smart device 1000 and voice uttered without that intention may be distinguished and processed.
  • the operation mode of the smart device 1000 may include a standby mode and a listening mode.
  • The standby mode is a mode for determining a user's intention to use the smart device 1000, and the listening mode is a mode for receiving a user voice containing a voice command on the premise that the user intends to use the smart device 1000.
  • the smart device 1000 may have an off state in which the device does not operate. In the off state, the device may be turned off because it is not powered, or in a hibernation state with minimal power.
  • the smart device 1000 may operate in the standby mode without receiving a special instruction from the user. For example, the smart device 1000 may enter a standby mode when power is applied.
  • The standby mode is a phase in which the smart device 1000 checks the user's intention to use the smart device 1000.
  • The smart device 1000 may perform an operation of detecting the wakeup word from the user's voice in the standby mode, and determine the user's intention to use the smart device 1000 according to whether the wakeup word is detected.
  • the smart device 1000 may receive a voice through the voice input module 1200.
  • the voice input module 1200 may non-selectively receive the voice. Since the smart device 1000 does not know whether the received voice is a user voice uttered toward the smart device 1000, the smart device 1000 may not transmit the received voice to the voice assistant server 10 for interpretation of the voice command.
  • the smart device 1000 may detect a wakeup word from the received voice.
  • the wakeup word may be a specific word or phrase predetermined by the manufacturer of the smart device 1000 or may be selected by a user from a specific word group or phrase group predetermined by the manufacturer of the smart device 1000.
  • The controller 1060 of the smart device 1000 may detect the wakeup word from the received voice locally, without cooperation with the voice assistant server 10.
  • the smart device 1000 may determine a user's intention to use the smart device 1000 based on whether a wakeup word is detected from the received voice.
  • When the wakeup word is detected from the received voice, the smart device 1000 may understand that the user intends to use the smart device 1000.
  • In this case, the smart device 1000 may enter a listening mode in preparation for receiving a user voice containing a voice command from the user. Conversely, when the wakeup word is not detected from the received voice, the smart device 1000 may understand that the user does not intend to use the smart device 1000 and may maintain the standby mode instead of entering the listening mode.
  • Since the standby mode is a mode for detecting or recognizing the wakeup word, it may also be referred to as a wakeup word detection state or mode in some cases.
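  • The standby-mode behavior described above can be sketched as a small gate that forwards nothing to the server until the wakeup word has been detected locally. This is only an illustrative sketch under stated assumptions: the wakeup word value, the function name `handle_voice`, and the use of text in place of audio are all hypothetical and do not come from the source.

```python
from typing import Optional, Tuple

# Hypothetical wakeup word; the source only says it is predetermined by the
# manufacturer or selected by the user from a predetermined group.
WAKEUP_WORD = "hello device"


def handle_voice(mode: str, utterance: str) -> Tuple[str, Optional[str]]:
    """Return (next_mode, payload_for_server).

    Text stands in for audio here; an actual device would run a local
    keyword detector on the microphone signal instead.
    """
    if mode == "standby":
        # Detection happens locally, so nothing is forwarded in standby.
        if WAKEUP_WORD in utterance.lower():
            return "listening", None
        return "standby", None
    if mode == "listening":
        # Only voice received in the listening mode is sent for interpretation.
        return "listening", utterance
    raise ValueError(f"unknown mode: {mode}")
```

  • With this arrangement, voice uttered without the wakeup word never produces a payload for the voice assistant server 10, which reflects the privacy concern raised above.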
  • When the wakeup word is detected, the smart device 1000 may enter the listening mode. Meanwhile, the smart device 1000 may be provided with a button for instructing entry into the listening mode, and the smart device 1000 may enter the listening mode in response to a user input on that button.
  • the smart device 1000 may receive a user voice through the voice input module 1200.
  • the user's voice input in the listening mode may include a voice command for instructing a specific operation of the smart device 1000.
  • voice commands may take predominantly natural language forms. Recognizing natural language voice commands may be difficult to process locally due to a high amount of computation, so that the smart device 1000 may collaborate with the voice assistant server 10 to interpret voice commands from the user's voice.
  • the smart device 1000 may transmit the input user voice to the voice assistant server 10 for interpretation of the voice command included in the user voice.
  • the smart device 1000 may transmit the user voice to the voice assistant server 10 in some manner.
  • the smart device 1000 may transmit a user voice received in a listening mode to the voice assistant server 10 in real time.
  • the smart device 1000 may collect the received user voices and then collectively transmit the collected user voices to the voice assistant server 10.
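  • The two transmission manners mentioned above (real-time and collective) may be contrasted with the following sketch. The function names and the `send` callback are hypothetical placeholders for whatever transport the communication module 1020 actually uses.

```python
from typing import Callable, Iterable


def send_realtime(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Forward each audio chunk as soon as it arrives; return the number of sends."""
    count = 0
    for chunk in chunks:
        send(chunk)
        count += 1
    return count


def send_collected(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Collect all chunks first, then forward them in a single transmission."""
    payload = b"".join(chunks)
    if payload:
        send(payload)
        return 1
    return 0
```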
  • the voice assistant server 10 may interpret the voice command from the user voice received from the smart device 1000, generate feedback data corresponding to the interpreted voice command, and return it to the smart device 1000.
  • the smart device 1000 may perform pre-processing on the user voice. Examples of preprocessing include noise canceling, speech data compression, and the like.
  • The smart device 1000 may receive the talk back together with the user voice containing the voice command.
  • the smart device 1000 may extract the user voice part from the received voice by removing the talk back part of the received voice by using the information on the talk back output by itself.
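  • As a rough illustration of removing the talk back using the device's own knowledge of what it output, the sketch below subtracts the known talk back samples from the microphone samples. It assumes the two signals are sample-aligned and that the talk back reaches the microphone with a single known gain; both are simplifying assumptions, and practical devices typically rely on adaptive echo cancellation instead. The function name and parameters are hypothetical.

```python
from typing import List, Sequence


def remove_talkback(
    mic: Sequence[float], talkback: Sequence[float], gain: float = 1.0
) -> List[float]:
    """Subtract the known talk back samples (scaled by gain) from the mic samples."""
    if len(mic) != len(talkback):
        raise ValueError("signals must be sample-aligned and equal in length")
    return [m - gain * t for m, t in zip(mic, talkback)]
```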
  • the smart device 1000 may perform an operation for creating a quiet surrounding environment in order to receive a clean user voice. For example, when the smart device 1000 enters the listening mode, the smart device 1000 may stop the output of the talkback that is being output or reduce the audio volume of the talkback. For example, when the smart device 1000 playing the radio news enters the listening mode, the smart device 1000 may pause the live news being played.
  • the smart device 1000 may maintain the listening mode for a certain time period. For example, when the smart device 1000 transmits a user input containing a voice command to the voice assistant server 10 and receives feedback data from the voice assistant server 10, the smart device 1000 may end the listening mode. For another example, when the input of the user voice is completed or when the voice is not input for a predetermined time after the user voice is input, the smart device 1000 may end the listening mode. For another example, the smart device 1000 may end the listening mode when there is no input of the user's voice for a predetermined time after entering the listening mode.
  • a time interval during which the smart device 1000 maintains a listening mode will be referred to as a 'listening window'.
  • An operation in which the smart device 1000 enters the listening mode will be referred to as 'opening a listening window',
  • an operation in which the smart device 1000 terminates the listening mode will be referred to as 'closing a listening window', and
  • a state in which the smart device 1000 maintains the listening mode will be referred to as a state in which the 'listening window' is open.
  • the smart device 1000 may detect the wakeup word from the user voice even in the listening mode. When the wake-up word is detected in the listening mode, the smart device 1000 may restart the listening mode and initialize the listening window.
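  • The listening window described above, including its re-initialization when the wakeup word is detected again, can be sketched as follows. The class name, the 5-second timeout, and the use of explicit timestamps are assumptions made for illustration; the source only refers to "a predetermined time".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListeningWindow:
    timeout_s: float = 5.0              # hypothetical "predetermined time"
    opened_at: Optional[float] = None   # None means the window is closed

    def open(self, now: float) -> None:
        """Open the window, or re-initialize it if a wakeup word is detected again."""
        self.opened_at = now

    def close(self) -> None:
        """Close the window, i.e. terminate the listening mode."""
        self.opened_at = None

    def is_open(self, now: float) -> bool:
        """Report whether the window is open, closing it once the timeout has elapsed."""
        if self.opened_at is None:
            return False
        if now - self.opened_at >= self.timeout_s:
            self.close()
            return False
        return True
```

  • Calling `open` a second time while the window is already open simply restarts the timeout, which corresponds to restarting the listening mode and initializing the listening window.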
  • The voice assistant server 10 may extract the voice command from the user voice, interpret the extracted voice command, generate feedback data based on the interpreted voice command, and transfer the feedback data to the smart device 1000.
  • the smart device 1000 receiving the feedback data may output the feedback using the feedback data.
  • The response mode is a mode for receiving feedback data and outputting feedback.
  • In the response mode, the smart device 1000 receives the feedback data and outputs the feedback accordingly, for example as the talk back and/or the display back.
  • the smart device 1000 may enter the standby mode at the same time as receiving the feedback data to start the output of the feedback or immediately after the output of the feedback. In this case, the smart device 1000 may output the feedback in the standby mode or the listening mode. According to an aspect, instead of defining that the smart device 1000 maintains the output of the feedback in the standby mode or the listening mode, it may be interpreted that the response mode and the standby mode are operated at the same time.
  • the smart device 1000 which is requested to stream live news from a user, receives feedback data about live news from the voice assistant server 10 in response mode, starts playback of live news, and enters standby mode. You can continue streaming live news.
  • the smart device 1000 may enter the listening mode but still maintain the live news streaming.
  • the smart device 1000 does not need to operate all the above-described modes.
  • the response mode may be omitted in the operation mode of the smart device 1000.
  • The above-described modes of the smart device 1000 according to an embodiment of the present invention do not all have to be operated independently.
  • The smart device 1000 may have an operation mode in which all or some of the standby mode, the listening mode, and the response mode overlap.
  • FIGS. 7 and 8 are views illustrating the operation modes of the smart device 1000 according to an embodiment of the present invention.
  • FIG. 9 is a view illustrating communication between the smart device 1000 and the voice assistant server 10 according to an embodiment of the present invention.
  • the smart device 1000 may enter the standby mode from the off state.
  • the smart device 1000 in the standby mode may continuously receive a voice through the voice input module 1200 and detect a wakeup word from the received voice.
  • When the wakeup word is detected, the smart device 1000 may enter a listening mode. Alternatively, the smart device 1000 may enter the listening mode when a user input indicating the listening mode, such as a touch, a button press, or a gesture, is received. In the listening mode, the smart device 1000 may receive a user voice containing a voice command.
  • the smart device 1000 may transmit the received user voice to the voice assistant server 10.
  • the smart device 1000 may receive feedback data for a voice command from the voice assistant server 10 and output feedback based on the feedback data.
  • The smart device 1000 in the listening mode may return from the listening mode to the standby mode. For example, if no user voice is input after entering the listening mode, the smart device 1000 in the listening mode may return to the standby mode. As another example, the smart device 1000 in the listening mode may return to the standby mode if no additional user voice is input after the user voice is input. As yet another example, after transmitting the user voice to the voice assistant server 10, the smart device 1000 in the listening mode may return to the standby mode when the feedback data related to the voice command is received from the voice assistant server 10, or when the output of feedback based on the feedback data is performed (or started).
  • the smart device 1000 may enter a standby mode from an off state.
  • the smart device 1000 in the standby mode may continuously receive a voice through the voice input module 1200 and detect a wakeup word from the received voice.
  • the smart device 1000 may enter a listening mode.
  • the smart device 1000 may receive a user voice containing a voice command.
  • the smart device 1000 may transmit the received user voice to the voice assistant server 10.
  • the smart device 1000 that transmits the received user voice to the voice assistant server 10 may enter a response mode and output feedback.
  • the smart device 1000 in the listening mode may enter the response mode when the user voice is transmitted to the voice assistant server 10 or when the feedback data is received from the voice assistant server 10.
  • the smart device 1000 may receive feedback data for a voice command from the voice assistant server 10 and output feedback based on the feedback data.
  • the smart device 1000 that terminates the output of the feedback or initiates the output of the feedback may return to the standby mode.
  • If the smart device 1000 in the listening mode does not receive the user voice while in the listening mode, the smart device 1000 may return to the standby mode instead of entering the response mode.
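  • The mode transitions described above (off, standby, listening, response, and back to standby) can be summarized as a small transition table. The event names below are hypothetical labels chosen for this sketch, and the table covers only one of the described variants: the one in which the response mode is entered when the user voice is transmitted.

```python
# (current mode, event) -> next mode. Event names are illustrative only.
TRANSITIONS = {
    ("off", "power_on"): "standby",
    ("standby", "wakeup_word"): "listening",
    ("listening", "voice_sent"): "response",
    ("listening", "timeout"): "standby",       # no user voice received
    ("response", "feedback_done"): "standby",
}


def next_mode(mode: str, event: str) -> str:
    """Return the next mode; events not listed leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, mode: str = "off") -> str:
    """Apply a sequence of events and return the final mode."""
    for event in events:
        mode = next_mode(mode, event)
    return mode
```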
  • the smart device 1000 is equipped with a voice assistant function to receive a user voice and output feedback corresponding to a voice command included in the received user voice.
  • the smart device 1000 may receive a user voice including a voice command in a listening state, and transmit the input user voice to the voice assistant server 10.
  • the user voice transmitted to the voice assistant server 10 may be a user voice that has undergone pre-processing such as noise canceling in the smart device 1000.
  • The voice assistant server 10 may perform speech-to-text (STT) conversion to convert the voice signal into text and recognize the voice command based on the text.
  • the voice assistant server 10 may generate appropriate feedback data based on the recognized voice command and transmit it to the smart device 1000.
  • The smart device 1000 may output the feedback data as talkback in the form of voice or as display back in the form of an image.
  • The smart device 1000 has the character of a voice interface device that interacts with a user through the user's utterances and talkback output through a speaker, and at the same time has the character of a content consuming device that plays video or music content.
  • the smart device 1000 may be used to play multimedia content such as movies or music on a daily basis. Therefore, it may be advantageous that the smart device 1000 according to an embodiment of the present invention can receive a user's voice containing a voice command from the user even while the media content is being played.
  • In this case, the audio output according to the playback of the media content is also input to the voice input module 1200 of the smart device 1000, and the sound of the audio data may act as a factor that makes it difficult to interpret the voice uttered by the user.
  • Even if there is no problem in detecting the wake-up word, which corresponds to a predetermined word, from the voice signal received in the standby mode, the recognition rate is likely to decrease dramatically when the smart device 1000 has to recognize an arbitrary voice command from the voice signal input in the listening state.
  • Accordingly, when the smart device 1000 according to an embodiment of the present invention attempts to receive a voice signal including a voice command by entering a listening state during the playback of media content, predetermined processing may be performed on the media content so that the audio output of the media content being played back does not interfere with recognition of the user's speech.
  • FIG. 10 is a diagram illustrating an operation of a standby mode during media content playback according to an embodiment of the present invention.
  • the smart device 1000 may play media content.
  • the smart device 1000 playing the media content may have a standby mode.
  • the smart device 1000 may receive a voice signal even while playing media content, and recognize a wakeup word from the received voice signal.
  • Unlike other voice commands freely uttered by the user, the wake-up word is limited to a predetermined word. Therefore, the smart device 1000 can detect the wake-up word from the input voice signal even if the audio output of the media content being played is mixed with the user's voice.
  • FIG. 11 is a diagram illustrating the operation of a listening mode during media content playback according to an embodiment of the present invention.
  • the operating state of the smart device 1000 may transition from the standby mode to the listening state.
  • the smart device 1000 which is playing the media content may control the playback of the media content.
  • For example, the smart device 1000 may maintain the playback of the media content but reduce or mute the audio volume of the media content. As another example, the smart device 1000 may stop playing the media content.
  • Accordingly, the smart device 1000 may receive only the user's voice signal, not mixed with the audio data of the media content. Therefore, when the operating state of the smart device 1000 is a listening state, the recognition rate of the user's voice command may be improved.
  • the smart device 1000 may output a subtitle corresponding to the audio data to compensate for the deletion or reduction of the volume of the media content.
  • the smart device 1000 may perform various operations so that the user can continue to watch the media content that is being played back even in the listening mode, which will be described in detail later.
  • FIG. 12 is a flowchart of a first example of a method for controlling a smart device according to an embodiment of the present invention.
  • The first example of the method may include playing media content (S1110), receiving a voice signal during media content playback (S1120), entering the listening state if a wake-up word is included in the received voice signal (S1130), and adjusting the volume of the media content when entering the listening state (S1140).
  • the smart device 1000 may play the media content in operation S1110.
  • the media content may include at least audio data.
  • the media content may include video data and audio data.
  • The media content may include only audio data, without video data.
  • the controller 1060 may play the media content by outputting audio data of the media content through the voice output module 1300.
  • the media content may be provided as feedback on a voice command of the user.
  • the media content may be talkback, and when the media content further includes video data, the media content may be display back.
  • media content does not necessarily have to be feedback by voice commands.
  • the smart device 1000 may receive a voice signal while playing the media content in operation S1120.
  • the voice signal may include a voice of a user received from the outside of the smart device 1000.
  • the voice of the user may include a wakeup word.
  • the smart device 1000 may recognize the wakeup word from the voice of the user.
  • When the smart device 1000 receives the user voice including the wakeup word, the smart device 1000 may enter a listening state (S1130).
  • The smart device 1000 may adjust the volume of the media content to increase the recognition rate of the voice command of the user (S1140). For example, the smart device 1000 may lower or mute the volume of the audio data of the media content being played in order to remove sound other than the voice of the user.
  • the smart device 1000 may adjust the volume of the audio data of the media content in various ways. For example, the smart device 1000 may lower the volume of the audio data to a predetermined size.
  • the predetermined size may be zero.
  • the smart device 1000 may terminate the output of the audio data. Accordingly, the sound of the media content being played back can be eliminated.
  • the smart device 1000 may disable the voice output module 1300 so that the sound of the media content is not output.
  • the smart device 1000 may gradually or rapidly adjust the volume.
  • the smart device 1000 may continuously provide media content to the user and receive voice commands without noise due to audio data of the media content being played.
  • the smart device 1000 may return the volume of the media content to its original state at a specific time.
  • the smart device 1000 may adjust the volume of the media content to the original level.
  • For example, the smart device 1000 may return to the standby mode when a predetermined time elapses without the input of a voice command in the listening mode, and the volume of the media content may be restored at that time.
  • As another example, the smart device 1000 may restore the volume of the media content to its original level when outputting feedback on the voice command received in the listening mode. In this case, the feedback may be a display back without audio data.
  • As yet another example, the smart device 1000 may restore the volume of the media content to its original level when the output of the feedback for the voice command received in the listening mode is terminated. In this case, the feedback may be a talk back or a display back including audio data.
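  • The volume adjustment of step S1140 and the later return to the original level can be sketched as a pair of operations that remember the level in effect before the listening state was entered. The class and method names, and the integer volume scale, are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaVolume:
    level: int = 10
    _saved: Optional[int] = None  # level in effect before the volume was lowered

    def duck(self, target: int = 0) -> None:
        """Lower the volume on entering the listening state; 0 mutes the content."""
        if self._saved is None:
            # Remember only the first level so repeated calls do not lose it.
            self._saved = self.level
        self.level = min(self.level, target)

    def restore(self) -> None:
        """Return the volume to the level saved before it was lowered."""
        if self._saved is not None:
            self.level = self._saved
            self._saved = None
```

  • When `restore` is called is left to the caller, since the text above describes several possible time points (return to the standby mode, start of feedback output, or end of feedback output).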
  • FIG. 13 is a flowchart of a second example of a method for controlling a smart device according to an embodiment of the present invention.
  • The second example of the method may include playing media content (S1210), receiving a voice signal during media content playback (S1220), entering the listening state if a wake-up word is included in the received voice signal (S1230), adjusting the volume of the media content when entering the listening state (S1240), and displaying text data when entering the listening state (S1250).
  • Since steps S1210 to S1240 may be performed similarly to steps S1110 to S1140 described above, a description thereof will be omitted.
  • When the smart device 1000 lowers or mutes the volume of the audio data of the media content being played in the listening mode, the user may not fully enjoy the media content. For example, if live news or the like is provided, the user may not know the contents of the live news in the listening mode. Therefore, the smart device 1000 may output text data corresponding to the audio data of the media content to compensate for the reduced or muted audio of the media content (S1250).
  • the smart device 1000 may receive text data from the voice assistant server 10. For example, when the smart device 1000 enters a listening mode, the smart device 1000 may request text data from the voice assistant server 10.
  • the smart device 1000 may recognize a wake-up word while playing media content, enter a listening state, and display an indication that an operating state is a listening state.
  • the smart device 1000 may adjust the volume of audio data of the media content being played and output text data corresponding to the audio data whose volume is adjusted.
  • the smart device 1000 may display an indicator showing the volume state of the audio data.
  • the smart device 1000 may display text data corresponding to audio data before the volume is adjusted in addition to text data corresponding to the audio data whose volume is adjusted.
  • For example, suppose the audio "Chogoon is 6 degrees in Seoul and 8 degrees in Busan" starts being output at the original volume before the wake-up word is input, and only the "6 degrees Busan is 8 degrees" portion is output with the volume adjusted after the wake-up word is input. The smart device 1000 may then output text data even for the portion whose volume is not adjusted. That is, the smart device 1000 may output text data corresponding to the audio data whose volume is adjusted and text data corresponding to audio data related to the audio data whose volume is adjusted.
  • the audio data related to the audio data whose volume is adjusted may be a part contextually related to the audio data whose volume is adjusted.
  • For example, it may be a portion that forms a sentence or a phrase together with the portion whose volume is adjusted.
  • the audio data related to the audio data whose volume is adjusted may be audio data from a time point before a volume is adjusted to a time point where the volume is adjusted.
  • the audio data may be audio data output for two seconds from the time point at which the volume is adjusted.
  • the display of the text data may also be terminated.
  • the smart device 1000 may terminate the display of the text data at the time point when the volume of the media content is returned to its original state. For another example, the smart device 1000 may terminate the display of the text data after maintaining the text data until the end of the output of the audio data corresponding to the text data being displayed.
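The selection of text data described above can be sketched as a filter over timed caption segments. The `Segment` type, the function name, and the sample captions are hypothetical, and the sketch assumes the two-second figure is read as a lookback window before the moment the volume is adjusted; the description itself does not settle that reading.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the start of playback
    end: float
    text: str


def captions_for_ducking(segments: List[Segment], duck_start: float,
                         duck_end: float, lookback_s: float = 2.0) -> List[str]:
    """Return caption text overlapping [duck_start - lookback_s, duck_end]."""
    window_start = duck_start - lookback_s
    # Keep every segment that overlaps the window, so audio output shortly
    # before the volume change is also shown as text.
    return [s.text for s in segments
            if s.end > window_start and s.start < duck_end]


segments = [
    Segment(0.0, 3.0, "Now the weather."),
    Segment(3.0, 5.0, "Seoul is 6 degrees"),
    Segment(5.0, 7.0, "and Busan is 8 degrees."),
]
shown = captions_for_ducking(segments, duck_start=5.0, duck_end=7.0)
```

A sentence-based variant would replace the fixed lookback with the start time of the sentence that contains `duck_start`.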
  • FIG. 16 is a flowchart of a third example of a method for controlling a smart device according to an embodiment of the present invention.
  • The method may include playing back media content (S1310), receiving a voice signal during media content playback (S1320), entering a listening state (S1330), adjusting the volume of the media content (S1340), displaying the text data (S1350), outputting the feedback (S1360), stopping the media content playback (S1370), and resuming the media content playback (S1380).
  • Since steps S1310 to S1360 may be performed similarly to the corresponding steps described above, a detailed description thereof will be omitted.
  • the smart device 1000 may maintain the playback of the media content that was being played back, adjust the volume of the audio data of the media content, and output text data corresponding to the adjusted audio data.
  • the smart device 1000 may receive a voice command in a listening mode and output a feedback in response to the voice command (S1360).
  • the smart device 1000 may stop playing the media content to output the feedback (S1370). For example, if the feedback is a talk back or a display back containing audio data, the smart device 1000 may stop playing the media content in order to output it. Thereafter, when the feedback output ends, the smart device 1000 may resume playing the media content (S1380). Specifically, consider a smart device 1000 that receives a voice command asking for the day's weather while playing live news. The smart device 1000 plays the live news normally until the user inputs the wake-up word, adjusts the volume of the live news when the wake-up word is recognized and the listening mode is entered, stops the live news while outputting the weather information corresponding to the voice command as a talkback, and resumes the live news when the talkback ends.
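The ordering of the third example can be sketched as a list of actions. The function name and the action labels are hypothetical; only the step numbers in the comments come from the description. The sketch assumes that playback is paused only when the feedback carries audio.

```python
from typing import List


def third_example_actions(feedback_has_audio: bool) -> List[str]:
    """Return the ordered actions for one voice interaction (illustrative)."""
    actions = [
        "play_media",        # S1310
        "receive_voice",     # S1320
        "enter_listening",   # S1330
        "adjust_volume",     # S1340
        "show_text",         # S1350
    ]
    if feedback_has_audio:
        # Playback is stopped so the audible feedback is not mixed with it.
        actions += ["stop_media",       # S1370
                    "output_feedback",  # S1360
                    "resume_media"]     # S1380
    else:
        # Assumption: silent display back does not require stopping playback.
        actions.append("output_feedback")  # S1360
    return actions
```

For the live news case, `third_example_actions(True)` ends with stopping the news, outputting the weather talkback, and resuming the news.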
  • FIG. 17 is a flowchart of a fourth example of a method for controlling a smart device according to an embodiment of the present invention.
  • The method may include playing back media content (S1410), receiving a voice signal during media content playback (S1420), entering a listening state (S1430), determining the type of the media content (S1440), and maintaining or stopping the playback of the media content according to the type of the media content (S1450).
  • Since steps S1410 to S1430 may be performed similarly to the corresponding steps described above, a detailed description thereof will be omitted.
  • In the examples above, the smart device 1000 maintains the playback of the media content when it enters the listening mode during media content playback. Alternatively, the smart device 1000 may stop playing the media content in this case.
  • the smart device 1000 may determine whether to maintain or stop the playback of the media content upon entering the listening mode according to the type of the media content being played (S1440).
  • the type of media content may include, for example, live content and non-live content.
  • the smart device 1000 may determine to maintain the playback of the media content when the media content being played is live content, such as news currently being broadcast or a live sports broadcast. In this case, the smart device 1000 may maintain the playback of the media content when entering the listening mode, adjust its volume, and output text data (S1450).
  • the smart device 1000 may determine to stop playing the media content when the media content being played is non-live content such as a movie. In this case, when entering the listening mode, the smart device 1000 may stop the playback of the media content (S1450) and resume it afterwards.
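The type-dependent decision of steps S1440 and S1450 can be sketched as a single branch. The enum, the function name, and the action labels are hypothetical; how the device actually detects whether content is live is not stated here and is left out.

```python
from enum import Enum
from typing import List


class ContentType(Enum):
    LIVE = "live"          # e.g. news currently being broadcast
    NON_LIVE = "non_live"  # e.g. a movie


def on_enter_listening(content_type: ContentType) -> List[str]:
    """Decide what to do with playback when the listening mode is entered."""
    if content_type is ContentType.LIVE:
        # Live content cannot be caught up later, so keep it running,
        # lower its volume, and show the matching text data.
        return ["keep_playing", "adjust_volume", "show_text"]
    # Non-live content can simply be stopped and resumed afterwards.
    return ["stop_playback"]
```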
  • FIG. 18 is a flowchart of a fifth example of a method of controlling a smart device according to an embodiment of the present invention.
  • The method may include receiving a voice signal in a standby mode (S1510), entering a listening state (S1520), receiving a voice command (S1530), and outputting feedback corresponding to the voice command in consideration of whether media content was being played in the standby mode (S1540).
  • the smart device 1000 may receive a voice signal in a standby mode (S1510), and enter a listening mode when the wake-up word is included in the voice signal (S1520). In the listening mode, the smart device 1000 may receive a voice command (S1530), transmit it to the voice assistant server 10, receive feedback data from the voice assistant server 10, and output feedback using the received feedback data.
  • the smart device 1000 may output the feedback in consideration of whether media content was being played in the standby mode (S1540).
  • For example, when media content was being played, the smart device 1000 may output shorter feedback than otherwise.
  • the smart device 1000 may omit some of the feedback content to be output from the received feedback data, or may output it at a higher speed. Specifically, part of the talkback may be converted to display back to shorten the talkback, or the talkback may be played faster.
  • When the smart device 1000 transmits the user's voice containing a voice command to the voice assistant server 10, it may also deliver information indicating whether media content was being played in the standby mode, and the voice assistant server 10 may generate shorter feedback data than otherwise based on that information.
  • the smart device 1000 may adjust the data type of the feedback output in consideration of the data type of the media content that was being played. For example, when the media content being played is audio data, the feedback may be output as a display back so that the user can keep listening to the media content while viewing the newly requested feedback. Conversely, when the media content being played is video data, the feedback may be output as a talkback so that the user can keep watching the media content while listening to the newly requested feedback. To this end, the smart device 1000 may determine the feedback form from the received feedback data in consideration of the form of the media content being played. Alternatively, the smart device 1000 may transmit the form of the media content being played to the voice assistant server 10 so that the voice assistant server 10 determines the feedback form in consideration of it, and may receive feedback data accordingly.
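The feedback shaping of the fifth example can be sketched as follows. The function name, the word-count truncation, and the default of a full talkback when nothing is playing are assumptions made for illustration; the description only says that the feedback may be shorter and that its form may complement the form of the media content.

```python
from typing import Dict, Optional


def plan_feedback(feedback_text: str, media_form: Optional[str] = None,
                  max_words: int = 6) -> Dict[str, str]:
    """Pick a feedback form and length given what was already playing.

    media_form is None when no media content was being played,
    otherwise "audio" or "video".
    """
    if media_form is None:
        # Nothing to interrupt: full feedback (talkback assumed as default).
        return {"form": "talk_back", "text": feedback_text}
    if media_form not in ("audio", "video"):
        raise ValueError(f"unknown media form: {media_form}")
    # Shorten the feedback by dropping trailing content (one crude option).
    short_text = " ".join(feedback_text.split()[:max_words])
    # Use the channel the media content is not occupying.
    form = "display_back" if media_form == "audio" else "talk_back"
    return {"form": form, "text": short_text}
```

The same decision could equally be made on the voice assistant server if the device reports `media_form` along with the voice command.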
  • the method according to the above-described embodiment may be implemented in the form of program instructions that may be executed by various computer means, and may be recorded in a computer readable medium.
  • the computer readable medium may include program instructions, data files, data structures, etc. alone or in combination.
  • the program instructions recorded on the computer-readable medium may be those specially designed and configured for the exemplary embodiments, or may be known and available to those skilled in computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical media such as CD-ROMs and DVDs, and magneto-optical media such as floptical disks.
  • Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
  • the hardware device described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a method for controlling media content by means of a smart device. In one aspect of the invention, the media content control method is carried out by a smart device whose operating states include a wake-up word detection state, which determines whether a sound signal received from outside contains a wake-up word, and a listening state for recognizing a voice command contained in the sound signal. The method comprises: adjusting the volume of the media content upon entering the listening state during playback of the media content; and displaying, together with video data of the media content, text data corresponding to the audio data whose volume has been adjusted.
PCT/KR2018/014227 2018-07-27 2018-11-19 Dispositif intelligent et procédé de commande associé Ceased WO2020022573A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2018-0087688 2018-07-27
KR1020180087688A KR102093030B1 (ko) 2018-07-27 2018-07-27 스마트 디바이스 및 그 제어 방법

Publications (1)

Publication Number Publication Date
WO2020022573A1 true WO2020022573A1 (fr) 2020-01-30

Family

ID=69181885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/014227 Ceased WO2020022573A1 (fr) 2018-07-27 2018-11-19 Dispositif intelligent et procédé de commande associé

Country Status (2)

Country Link
KR (1) KR102093030B1 (fr)
WO (1) WO2020022573A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233670A (zh) * 2020-08-28 2021-01-15 福州智象信息技术有限公司 一种基于alexa云服务的语音交互方法及系统
CN112929724A (zh) * 2020-12-31 2021-06-08 海信视像科技股份有限公司 显示设备、机顶盒及远场拾音唤醒控制方法
CN113628622A (zh) * 2021-08-24 2021-11-09 北京达佳互联信息技术有限公司 语音交互方法、装置、电子设备及存储介质
CN113689850A (zh) * 2020-05-18 2021-11-23 丰田自动车株式会社 智能体协作装置、其动作方法以及存储介质
CN114527711A (zh) * 2021-11-08 2022-05-24 厦门阳光恩耐照明有限公司 一种基于本地语音的智能设备控制的方法、装置及电子设备
CN115579002A (zh) * 2022-08-18 2023-01-06 北京声智科技有限公司 音频播放控制方法、装置及电子设备
WO2023072139A1 (fr) * 2021-10-28 2023-05-04 华为技术有限公司 Procédé de lecture audio, dispositif électronique et système

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634890B (zh) * 2020-12-17 2023-11-24 阿波罗智联(北京)科技有限公司 用于唤醒播放设备的方法、装置、设备以及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130339028A1 (en) * 2012-06-15 2013-12-19 Spansion Llc Power-Efficient Voice Activation
KR20150029974A (ko) * 2013-09-11 2015-03-19 엘지전자 주식회사 디스플레이 디바이스 및 그 제어 방법
KR20150065643A (ko) * 2012-01-09 2015-06-15 삼성전자주식회사 표시 장치 및 그 제어방법
US20170257072A1 (en) * 2014-12-10 2017-09-07 Ebay Inc. Intelligent audio output devices
KR20180084392A (ko) * 2017-01-17 2018-07-25 삼성전자주식회사 전자 장치 및 그의 동작 방법

Also Published As

Publication number Publication date
KR102093030B1 (ko) 2020-03-24
KR20200012414A (ko) 2020-02-05

Similar Documents

Publication Publication Date Title
WO2020022572A1 (fr) Dispositif intelligent et son procédé de commande
WO2020022573A1 (fr) Dispositif intelligent et procédé de commande associé
WO2012091185A1 (fr) Dispositif d'affichage et procédé fournissant une réaction suite à des gestes de ce dispositif
WO2020231181A1 (fr) Procédé et dispositif pour fournir un service de reconnaissance vocale
WO2019156314A1 (fr) Dispositif électronique de conversation avec un dialogueur et son procédé d'exploitation
WO2012157792A1 (fr) Dispositif électronique
WO2014003283A1 (fr) Dispositif d'affichage, procédé de commande de dispositif d'affichage, et système interactif
WO2019190097A1 (fr) Procédé de fourniture de services à l'aide d'un robot conversationnel et dispositif associé
WO2019190073A1 (fr) Dispositif électronique et son procédé de commande
WO2018043895A1 (fr) Dispositif d'affichage et procédé de commande de dispositif d'affichage
EP2514105A2 (fr) Procédé et système pour commander une sortie externe d'un dispositif mobile
WO2021049795A1 (fr) Dispositif électronique et son procédé de fonctionnement
WO2019013447A1 (fr) Dispositif de commande à distance et procédé de réception de voix d'un utilisateur associé
WO2019112181A1 (fr) Dispositif électronique pour exécuter une application au moyen d'informations de phonème comprises dans des données audio, et son procédé de fonctionnement
WO2015170832A1 (fr) Dispositif d'affichage, et procédé d'exécution d'appel vidéo correspondant
WO2019142988A1 (fr) Dispositif électronique, procédé de commande associé, et support d'enregistrement lisible par ordinateur
WO2021162267A1 (fr) Appareil et procédé pour convertir une sortie audio
WO2019009453A1 (fr) Dispositif d'affichage
WO2018124355A1 (fr) Dispositif audio et procédé de commande associé
WO2022169039A1 (fr) Appareil électronique et son procédé de commande
WO2020022567A1 (fr) Dispositif intelligent et son procédé de commande
WO2019128177A1 (fr) Procédé et dispositif de commande de lecture musicale dans un état d'écran éteint, et support de stockage informatique
WO2021125784A1 (fr) Dispositif électronique et son procédé de commande
WO2023106678A1 (fr) Procédé de traitement de signaux audio mal reconnus et dispositif associé
WO2020022568A1 (fr) Dispositif intelligent et son procédé de commande

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18927641

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18927641

Country of ref document: EP

Kind code of ref document: A1