WO2023048379A1 - Server and electronic device for processing a user utterance, and operating method therefor - Google Patents
- Publication number
- WO2023048379A1 (PCT/KR2022/010924)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- electronic device
- target
- domain
- utterance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
Definitions
- the following embodiments relate to an intelligent server that processes user speech, an electronic device, and an operating method thereof.
- the electronic device can recognize the user's speech through an artificial intelligence server and grasp the meaning and intention of the speech.
- the artificial intelligence server may interpret the user's utterance to infer the user's intention and perform a task according to the inferred intention.
- the artificial intelligence server may analyze various information about the situation at the time of utterance in connection with the utterance in order to determine the utterance intention.
- the artificial intelligence server may determine, according to a predefined policy, a priority among electronic devices for processing the user's speech; after determining the electronic device to process the user's speech, it may determine which application of that electronic device is to process the speech. For example, after the device is determined, the intention of the user's utterance may be classified, and an application to process the utterance may be selected from among the applications in the device.
- however, the method of determining an application to process the speech within the corresponding device after determining the electronic device considers only whether or not the application supports the speech, and does not consider the service quality of the application.
- the disclosure may provide a server and an electronic device for processing user speech and an operation method thereof.
- an intelligent server for processing user utterances may include a memory storing context information, the context information including information on each of at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device. Based on a target utterance received from any one of the at least one electronic device and the context information, the intelligent server may generate at least one combination of information on an electronic device capable of processing the target utterance and domain information, determine, from the context information, reference information for processing the target utterance, calculate a quality-of-service score for each of the at least one combination with reference to the reference information, determine, based on the quality-of-service scores, a target combination of a target electronic device and a target domain, and transmit a command to the target electronic device to process the target utterance in the target domain.
- a method for processing user utterances in an intelligent server may include: receiving a target utterance from one of at least one electronic device; generating, based on the target utterance and context information, at least one combination of information on an electronic device capable of processing the target utterance and domain information, the context information including information on each of the at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device; determining, from the context information, reference information for processing the target utterance; and calculating a quality-of-service score for each of the at least one combination with reference to the reference information.
- an electronic device for processing user utterances may include: a memory storing computer-executable instructions and context information, the context information including information on each of at least one electronic device including the electronic device and information on at least one domain corresponding to each of the at least one electronic device; and a processor that, based on a target utterance received from the electronic device and the context information, generates at least one combination of information on an electronic device capable of processing the target utterance and domain information, determines, from the context information, reference information for processing the target utterance, calculates a quality-of-service score for each of the at least one combination with reference to the reference information, determines, based on the quality-of-service scores, a target combination of a target electronic device and a corresponding target domain, and transmits a command to the target electronic device to process the target utterance in the target domain.
- an intelligent server and an electronic device may be provided that process utterances in consideration of the service quality of the electronic device and the application.
- a better user experience may be provided by classifying user intentions based on a combination of electronic devices and applications without having to classify user intentions for each electronic device.
- the configuration of the user intention classifier may be simplified, learning time may be reduced, and consistent responses may be possible.
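The selection flow summarized above can be sketched as follows. This is a minimal illustrative sketch only, not the patented implementation: the context structure, the capability check, and the quality-of-service weights are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DeviceContext:
    """Hypothetical per-device context entry: the domains (apps) registered
    for the device, each with simple quality signals."""
    device_id: str
    domains: dict = field(default_factory=dict)  # domain name -> quality signals

def candidate_combinations(context, utterance_capabilities):
    """Generate every (device, domain) pair able to process the utterance."""
    for dev in context:
        for domain, signals in dev.domains.items():
            # Assumed capability check: the domain must cover what the
            # utterance needs (set inclusion stands in for real matching).
            if utterance_capabilities <= signals["capabilities"]:
                yield dev.device_id, domain, signals

def qos_score(signals, weights):
    """Score one combination against the reference information (weights)."""
    return sum(weights[k] * signals[k] for k in weights)

def pick_target(context, utterance_capabilities, weights):
    """Return the (device, domain) combination with the best QoS score."""
    best = max(
        candidate_combinations(context, utterance_capabilities),
        key=lambda c: qos_score(c[2], weights),
        default=None,
    )
    return (best[0], best[1]) if best else None

# Example: a music utterance could run on a phone app or a speaker app;
# the reference information favors audio quality, so the speaker wins.
context = [
    DeviceContext("phone", {"music_app": {
        "capabilities": {"audio"}, "speaker_quality": 0.4}}),
    DeviceContext("speaker", {"music_app": {
        "capabilities": {"audio"}, "speaker_quality": 1.0}}),
]
weights = {"speaker_quality": 1.0}
print(pick_target(context, {"audio"}, weights))  # -> ('speaker', 'music_app')
```

Because intents are classified once over (device, domain) combinations rather than per device, a single scoring pass like this suffices even when several devices host the same application.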
- FIG. 1 is a block diagram of an electronic device in a network environment according to an embodiment of the disclosure.
- FIG. 2 is a block diagram illustrating an integrated intelligence system according to an embodiment of the disclosure.
- FIG. 3 is a diagram illustrating a user terminal displaying a screen for processing a voice input received through an intelligent app, according to an embodiment of the disclosure.
- FIG. 4 is a diagram illustrating a form in which relationship information between concepts and actions is stored in a database according to an implementation of the disclosure.
- FIG. 5 is a block diagram illustrating an electronic device and an intelligent server, according to an embodiment of the disclosure.
- FIGS. 6, 7, 8, and 9 are diagrams illustrating an operation of processing a user utterance according to various embodiments of the present disclosure.
- FIG. 10 is a flowchart illustrating an utterance processing operation of an intelligent server according to an embodiment of the present disclosure.
- FIG. 1 is a block diagram of an electronic device in a network environment according to an embodiment of the disclosure.
- an electronic device 101 may communicate with an electronic device 102 through a first network 198 (eg, a short-range wireless communication network), or with at least one of an electronic device 104 or a server 108 through a second network 199 (eg, a long-range wireless communication network). According to one implementation, the electronic device 101 may communicate with the electronic device 104 through the server 108.
- the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197.
- at least one of these components (eg, the connection terminal 178) may be omitted, or one or more other components may be added.
- some of these components (eg, the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into a single component (eg, the display module 160).
- the processor 120 may, for example, execute software (eg, the program 140) to control at least one other component (eg, a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or computations. According to one implementation, as at least part of the data processing or computation, the processor 120 may store instructions or data received from another component (eg, the sensor module 176 or the communication module 190) in volatile memory 132, process the instructions or data stored in the volatile memory 132, and store the resulting data in non-volatile memory 134.
- the processor 120 may include a main processor 121 (eg, a central processing unit or an application processor) and an auxiliary processor 123 (eg, a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of, or together with, the main processor 121.
- the auxiliary processor 123 may use less power than the main processor 121 or be set to be specialized for a designated function.
- the secondary processor 123 may be implemented separately from or as part of the main processor 121 .
- the auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (eg, the display module 160, the sensor module 176, or the communication module 190), in place of the main processor 121 while the main processor 121 is in an inactive (eg, sleep) state, or together with the main processor 121 while the main processor 121 is in an active (eg, application-executing) state.
- the auxiliary processor 123 may include a hardware structure specialized for processing an artificial intelligence model.
- an artificial intelligence model may be created through machine learning. Such learning may be performed, for example, in the electronic device 101 itself in which the artificial intelligence model is executed, or through a separate server (eg, the server 108).
- the learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the above examples.
- the artificial intelligence model may include a plurality of artificial neural network layers.
- the artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the foregoing, but is not limited to these examples.
- the artificial intelligence model may include, in addition or alternatively, software structures in addition to hardware structures.
- the memory 130 may store various data used by at least one component (eg, the processor 120 or the sensor module 176) of the electronic device 101 .
- the data may include, for example, input data or output data for software (eg, program 140) and commands related thereto.
- the memory 130 may include volatile memory 132 or non-volatile memory 134 .
- the program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142 , middleware 144 , or an application 146 .
- the input module 150 may receive a command or data to be used by a component (eg, the processor 120) of the electronic device 101 from the outside of the electronic device 101 (eg, a user).
- the input module 150 may include, for example, a microphone, a mouse, a keyboard, a key (eg, a button), or a digital pen (eg, a stylus pen).
- the sound output module 155 may output sound signals to the outside of the electronic device 101 .
- the sound output module 155 may include, for example, a speaker or a receiver.
- the speaker can be used for general purposes such as multimedia playback or recording playback.
- a receiver may be used to receive an incoming call. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.
- the display module 160 may visually provide information to the outside of the electronic device 101 (eg, a user).
- the display module 160 may include, for example, a display, a hologram device, or a projector and a control circuit for controlling the device.
- the display module 160 may include a touch sensor configured to detect a touch or a pressure sensor configured to measure the intensity of force generated by the touch.
- the audio module 170 may convert sound into an electrical signal or vice versa. According to one embodiment, the audio module 170 may acquire sound through the input module 150, and may output sound through the sound output module 155 or an external electronic device (eg, the electronic device 102, such as a speaker or headphones) connected directly or wirelessly to the electronic device 101.
- the sensor module 176 may detect an operating state (eg, power or temperature) of the electronic device 101 or an external environmental state (eg, a user state), and generate an electrical signal or data value corresponding to the detected state.
- the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, a Hall sensor, or an illuminance sensor.
- the interface 177 may support one or more designated protocols that may be used to directly or wirelessly connect the electronic device 101 to an external electronic device (eg, the electronic device 102).
- the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
- connection terminal 178 may include a connector through which the electronic device 101 may be physically connected to an external electronic device (eg, the electronic device 102).
- the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (eg, a headphone connector).
- the haptic module 179 may convert electrical signals into mechanical stimuli (eg, vibration or motion) or electrical stimuli that a user may perceive through tactile or kinesthetic senses.
- the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.
- the camera module 180 may capture still images and moving images. According to one embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.
- the power management module 188 may manage power supplied to the electronic device 101 .
- the power management module 188 may be implemented as at least part of a power management integrated circuit (PMIC), for example.
- the battery 189 may supply power to at least one component of the electronic device 101 .
- battery 189 may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.
- the communication module 190 may support establishment of a direct (eg, wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (eg, the electronic device 102, the electronic device 104, or the server 108), and communication through the established communication channel.
- the communication module 190 may include one or more communication processors that operate independently of the processor 120 (eg, an application processor) and support direct (eg, wired) communication or wireless communication.
- the communication module 190 may include a wireless communication module 192 (eg, a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (eg, a local area network (LAN) communication module or a power line communication module).
- among these communication modules, a corresponding communication module may communicate with an external electronic device through the first network 198 (eg, a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or the second network 199 (eg, a long-range communication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network).
- the wireless communication module 192 may identify or authenticate the electronic device 101 within a communication network, such as the first network 198 or the second network 199, using subscriber information (eg, an international mobile subscriber identity (IMSI)) stored in the subscriber identification module 196.
- the wireless communication module 192 may support a 5G network after a 4G network, and a next-generation communication technology, for example, new radio (NR) access technology.
- the NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or ultra-reliable and low-latency communications (URLLC).
- the wireless communication module 192 may support, for example, a high frequency band (eg, a millimeter wave (mmWave) band) in order to achieve a high data rate.
- the wireless communication module 192 may support various technologies for securing performance in a high frequency band, for example, beamforming, massive multiple-input and multiple-output (massive MIMO), full-dimensional MIMO (FD-MIMO), array antennas, analog beamforming, or large-scale antennas.
- the wireless communication module 192 may support various requirements defined for the electronic device 101, an external electronic device (eg, the electronic device 104), or a network system (eg, the second network 199).
- the wireless communication module 192 may support a peak data rate (eg, 20 Gbps or more) for realizing eMBB, loss coverage (eg, 164 dB or less) for realizing mMTC, or U-plane latency (eg, 0.5 ms or less each for downlink (DL) and uplink (UL), or a round trip of 1 ms or less) for realizing URLLC.
- the antenna module 197 may transmit or receive signals or power to the outside (eg, an external electronic device).
- the antenna module 197 may include an antenna including a radiator formed of a conductor or a conductive pattern formed on a substrate (eg, PCB).
- the antenna module 197 may include a plurality of antennas (eg, an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network 198 or the second network 199, may be selected from the plurality of antennas by, for example, the communication module 190. A signal or power may be transmitted or received between the communication module 190 and an external electronic device through the selected at least one antenna.
- other components (eg, a radio frequency integrated circuit (RFIC)) may be additionally formed as a part of the antenna module 197 in addition to the radiator.
- the antenna module 197 may form a mmWave antenna module.
- the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (eg, a bottom surface) of the printed circuit board and capable of supporting a designated high frequency band (eg, the mmWave band), and a plurality of antennas (eg, an array antenna) disposed on or adjacent to a second surface (eg, a top surface or a side surface) of the printed circuit board and capable of transmitting or receiving signals of the designated high frequency band.
- at least some of the above components may be connected to each other through a communication scheme between peripheral devices (eg, a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (eg, commands or data) with each other.
- commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199 .
- Each of the external electronic devices 102 or 104 may be the same as or different from the electronic device 101 .
- all or part of operations executed in the electronic device 101 may be executed in one or more external electronic devices among the external electronic devices 102 , 104 , or 108 .
- when the electronic device 101 needs to perform a certain function or service automatically or in response to a request from a user or another device, the electronic device 101 may, instead of executing the function or service by itself, request one or more external electronic devices to perform at least part of the function or service.
- One or more external electronic devices receiving the request may execute at least a part of the requested function or service or an additional function or service related to the request, and deliver the execution result to the electronic device 101 .
- the electronic device 101 may provide the result, as it is or after additional processing, as at least part of a response to the request.
- for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used.
- the electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing.
- the external electronic device 104 may include an internet of things (IoT) device.
- Server 108 may be an intelligent server using machine learning and/or neural networks. According to one implementation, the external electronic device 104 or server 108 may be included in the second network 199 .
- the electronic device 101 may be applied to intelligent services (eg, smart home, smart city, smart car, or health care) based on 5G communication technology and IoT-related technology.
- FIG. 2 is a block diagram illustrating an integrated intelligence system according to an embodiment of the present disclosure.
- the integrated intelligent system 20 may include an electronic device 101 , an intelligent server 200 , and a service server 300 .
- the electronic device 101 may be a terminal device (or electronic device) connectable to the Internet, for example, a mobile phone, a smartphone, a personal digital assistant (PDA), a laptop computer, a television (TV), a white goods appliance, a wearable device, a head-mounted display (HMD), or a smart speaker.
- the electronic device 101 includes an interface 177, a microphone 150-1, a speaker 155-1, a display module 160, a memory 130, or a processor 120.
- the components listed above may be operatively or electrically connected to each other.
- the microphone 150-1 may be included in an input module (eg, the input module 150 of FIG. 1).
- the speaker 155-1 may be included in an audio output module (eg, the audio output module 155 of FIG. 1).
- the interface 177 may be connected to an external device to transmit/receive data.
- the microphone 150-1 may receive sound (eg, user's speech) and convert it into an electrical signal.
- the speaker 155-1 may output an electrical signal as sound (eg, voice).
- Display module 160 may be configured to display images or video.
- the display module 160 according to an embodiment may also display a graphic user interface (GUI) of an app (or application program) being executed.
- the memory 130 may store a client module 151 , a software development kit (SDK) 153 , and a plurality of apps 146 .
- the client module 151 and the SDK 153 may constitute a framework (or solution program) for performing general functions. Also, the client module 151 or the SDK 153 may configure a framework for processing voice input.
- the plurality of apps 146 in the memory 130 may be programs for performing designated functions.
- the plurality of apps 146 may include a first app 146-1 and a second app 146-2.
- Each of the plurality of apps 146 may include a plurality of operations for performing a designated function.
- the apps may include an alarm app, a message app, and/or a schedule app.
- the plurality of apps 146 may be executed by the processor 120 to sequentially execute at least some of the plurality of operations.
- the processor 120 may control overall operations of the electronic device 101 .
- the processor 120 may be electrically connected to the interface 177, the microphone 150-1, the speaker 155-1, and the display module 160 to perform a designated operation.
- the processor 120 may also execute a program stored in the memory 130 to perform a designated function.
- the processor 120 may execute at least one of the client module 151 and the SDK 153 to perform the following operation for processing a voice input.
- the processor 120 may control operations of the plurality of apps 146 through the SDK 153, for example.
- the following operations described as operations of the client module 151 or the SDK 153 may be operations performed by the processor 120 .
- the client module 151 may receive voice input. For example, the client module 151 may receive a voice signal corresponding to a user's speech detected through the microphone 150-1. The client module 151 may transmit the received voice input to the intelligent server 200. The client module 151 may transmit state information of the electronic device 101 to the intelligent server 200 together with the received voice input. The state information may be, for example, execution state information of an app.
- the client module 151 may receive a result corresponding to the received voice input. For example, the client module 151 may receive a result corresponding to the received voice input when the intelligent server 200 can calculate a result corresponding to the received voice input. The client module 151 may display the received result on the display module 160 .
- the client module 151 may receive a plan corresponding to the received voice input.
- the client module 151 may display on the display module 160 a result of executing a plurality of operations of the app according to the plan.
- the client module 151 may sequentially display execution results of a plurality of operations on the display module 160 .
- the electronic device 101 may display on the display module 160 only some results of executing a plurality of operations (eg, a result of the last operation).
- the client module 151 may receive a request for obtaining information necessary for calculating a result corresponding to a voice input from the intelligent server 200 . According to one embodiment, the client module 151 may transmit the necessary information to the intelligent server 200 in response to the request.
- the client module 151 may transmit information as a result of executing a plurality of operations according to a plan to the intelligent server 200 .
- the intelligent server 200 can confirm that the received voice input has been correctly processed using the result information.
- the client module 151 may include a voice recognition module.
- the client module 151 may recognize a voice input that performs a limited function through the voice recognition module.
- the client module 151 may, through a designated input (eg, "wake up!"), execute an intelligent app for processing the voice input to perform an organic operation.
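The client-module behavior described above can be sketched as follows: a limited on-device recognizer watches only for the designated input (a wake word), then forwards the subsequent utterance together with the device's execution-state information to the intelligent server. Every name, the wake word, and the message format here are assumptions for illustration, not the patent's API.

```python
class ClientModule:
    """Illustrative sketch of the client module: limited local recognition
    of a designated wake word, then forwarding of utterance + app state."""

    WAKE_WORD = "wake up"  # assumed designated input

    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # stand-in for the network link
        self.awake = False

    def on_audio(self, recognized_text, app_state):
        if not self.awake:
            # Limited-function local recognition: only the wake word.
            self.awake = self.WAKE_WORD in recognized_text.lower()
            return None
        # Forward the utterance together with execution-state information,
        # as the client module sends state info alongside the voice input.
        self.awake = False
        return self.send_to_server({"utterance": recognized_text,
                                    "state": app_state})

outbox = []
client = ClientModule(lambda msg: outbox.append(msg) or msg)
client.on_audio("hey, wake up!", {"foreground_app": "home"})
client.on_audio("set an alarm for 7", {"foreground_app": "home"})
print(outbox)  # a single message carrying the utterance and device state
```

Only the utterance after the wake word reaches the server; the wake-word audio itself is consumed locally, which matches the "voice input that performs a limited function" role of the on-device recognizer.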
- the intelligent server 200 may receive information related to a user's voice input from the electronic device 101 through a communication network.
- the intelligent server 200 may change data related to the received voice input into text data.
- the intelligent server 200 may generate a plan for performing a task corresponding to a user voice input based on the text data.
- the plan may be generated by an artificial intelligent (AI) system.
- the artificial intelligence system may be a rule-based system or a neural network-based system (eg, a feedforward neural network (FNN) or a recurrent neural network (RNN)). Alternatively, it may be a combination of the foregoing or another artificial intelligence system.
- the plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, the artificial intelligence system may select at least one plan from a plurality of predefined plans.
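The predefined-plan path above can be sketched as a lookup over a plan set keyed by intent. The plan structure, intent names, and operations below are hypothetical, introduced only to illustrate the selection step; a miss would be the cue to fall back to real-time plan generation.

```python
# Hypothetical predefined plans: each names the intent it serves and the
# ordered operations to execute for the corresponding task.
PREDEFINED_PLANS = [
    {"intent": "alarm.create", "operations": ["open_alarm_app", "set_time", "confirm"]},
    {"intent": "music.play", "operations": ["open_music_app", "search", "play"]},
]

def select_plan(intent, plans=PREDEFINED_PLANS):
    """Select at least one plan from a plurality of predefined plans.

    Returns the first plan matching the intent, or None when no predefined
    plan applies (the real-time generation case in the text above).
    """
    for plan in plans:
        if plan["intent"] == intent:
            return plan
    return None

print(select_plan("alarm.create")["operations"][0])  # -> open_alarm_app
```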
- the intelligent server 200 may transmit a result according to the generated plan to the electronic device 101 or transmit the generated plan to the electronic device 101 .
- the electronic device 101 may display the result according to the plan on the display module 160 .
- the electronic device 101 may display the result of executing the operation according to the plan on the display module 160 .
- the intelligent server 200 may include a front end 210, a natural language platform 220, a capsule DB 230, an execution engine 240, an end user interface 250, a management platform 260, a big data platform 270, or an analytic platform 280.
- the front end 210 may receive a voice input received from the electronic device 101 .
- the front end 210 may transmit a response corresponding to the voice input.
- the natural language platform 220 may include an automatic speech recognition module (ASR module) 221, a natural language understanding module (NLU module) 223, a planner module 225, a natural language generator module (NLG module) 227, or a text-to-speech module (TTS module) 229.
- the automatic voice recognition module 221 may convert the voice input received from the electronic device 101 into text data.
- the natural language understanding module 223 may determine the user's intention using text data of the voice input. For example, the natural language understanding module 223 may determine the user's intention by performing syntactic analysis or semantic analysis.
- the natural language understanding module 223 may determine the user's intention by identifying the meaning of a word extracted from the voice input using linguistic features (eg, grammatical elements) of a morpheme or phrase, and matching the identified meaning of the word to an intention.
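As a rough illustration of this intent-matching step, the sketch below matches words extracted from an utterance against keyword sets for predefined intents. All names (`Intent`, `match_intent`, the intent labels) are hypothetical and not from the disclosure; a real NLU module would use morphological and syntactic analysis rather than naive tokenization.

```python
# Hypothetical sketch of intent matching: not the disclosed NLU module,
# just an illustration of mapping extracted word meanings to an intent.
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    keywords: set = field(default_factory=set)  # word meanings signaling this intent

INTENTS = [
    Intent("schedule.show", {"schedule", "calendar", "week"}),
    Intent("music.play", {"play", "music", "song"}),
]

def match_intent(utterance_text):
    # Naive tokenization stands in for morpheme/phrase analysis.
    words = set(utterance_text.lower().replace("!", "").split())
    best, best_overlap = None, 0
    for intent in INTENTS:
        overlap = len(words & intent.keywords)
        if overlap > best_overlap:
            best, best_overlap = intent.name, overlap
    return best
```

For example, `match_intent("tell me this week's schedule!")` would resolve to the `schedule.show` intent.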
- the planner module 225 may generate a plan using the intent and parameters determined by the natural language understanding module 223 .
- the planner module 225 may determine a plurality of domains required to perform the task based on the determined intent.
- the planner module 225 may determine a plurality of operations included in each of the determined plurality of domains based on the intent.
- the planner module 225 may determine parameters necessary for executing the determined plurality of operations or result values output by the execution of the plurality of operations.
- the parameter and the resulting value may be defined as a concept of a designated format (or class). Accordingly, the plan may include a plurality of actions and a plurality of concepts determined by the user's intention.
- the planner module 225 may determine relationships between the plurality of operations and the plurality of concepts in stages (or hierarchically). For example, the planner module 225 may determine an execution order of a plurality of operations determined based on a user's intention based on a plurality of concepts. In other words, the planner module 225 may determine an execution order of the plurality of operations based on parameters required for execution of the plurality of operations and results output by the execution of the plurality of operations. Accordingly, the planner module 225 may generate a plan including a plurality of operations and association information (eg, an ontology) between a plurality of concepts. The planner module 225 may generate a plan using information stored in the capsule database 230 in which a set of relationships between concepts and operations is stored.
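The ordering logic described above, in which operations are sequenced by the concepts they consume and produce, can be sketched as a topological sort. The operation and concept names below are invented for illustration; this is not the disclosed planner module.

```python
# Sketch: order a plan's operations so each runs after the operations that
# produce its input concepts. Operation/concept names are illustrative.
from graphlib import TopologicalSorter

# operation name -> (input concepts, output concepts)
operations = {
    "find_contact":  (set(),       {"contact"}),
    "compose_email": ({"contact"}, {"draft"}),
    "send_email":    ({"draft"},   {"result"}),
}

def execution_order(ops):
    # Map each concept to the operation that produces it.
    produced_by = {c: name for name, (_, outs) in ops.items() for c in outs}
    # Each operation depends on the producers of its input concepts.
    graph = {
        name: {produced_by[c] for c in inputs if c in produced_by}
        for name, (inputs, _) in ops.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Here `execution_order(operations)` yields `find_contact`, then `compose_email`, then `send_email`, mirroring how the planner derives an execution order from concept dependencies.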
- the natural language generation module 227 may change designated information into text form.
- the information changed to the text form may be in the form of natural language speech.
- the text-to-speech conversion module 229 may change text-type information into voice-type information.
- some or all of the functions of the natural language platform 220 may be implemented in the electronic device 101 as well.
- the capsule database 230 may store information about relationships between a plurality of concepts and operations corresponding to a plurality of domains.
- the capsule may include a plurality of action objects (action objects or action information) and concept objects (concept objects or concept information) included in the plan.
- the capsule database 230 may store a plurality of capsules in the form of a concept action network (CAN). According to an embodiment, a plurality of capsules may be stored in a function registry included in the capsule database 230.
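One way to picture a capsule bundling action objects and concept objects per domain is the data-structure sketch below; the field names are assumptions, not the capsule database's actual schema.

```python
# Illustrative capsule structure: one capsule per domain, bundling action
# objects and concept objects. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class ConceptObject:
    name: str                    # eg "contact", "draft"

@dataclass
class ActionObject:
    name: str
    inputs: list                 # ConceptObjects consumed
    outputs: list                # ConceptObjects produced

@dataclass
class Capsule:
    domain: str                  # eg an application or "location (geo)"
    actions: list = field(default_factory=list)
    concepts: list = field(default_factory=list)

contact = ConceptObject("contact")
capsule_a = Capsule(
    domain="email",
    actions=[ActionObject("find_contact", inputs=[], outputs=[contact])],
    concepts=[contact],
)
```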
- the capsule database 230 may include a strategy registry in which strategy information necessary for determining a plan corresponding to a voice input is stored.
- the strategy information may include reference information for determining one plan when there are a plurality of plans corresponding to the voice input.
- the capsule database 230 may include a follow-up registry in which information on a follow-up action is stored to suggest a follow-up action to the user in a designated situation.
- the follow-up action may include, for example, a follow-up utterance.
- the capsule database 230 may include a layout registry that stores layout information of information output through the electronic device 101 .
- the capsule database 230 may include a vocabulary registry in which vocabulary information included in capsule information is stored.
- the capsule database 230 may include a dialog registry in which dialog (or interaction) information with a user is stored.
- the capsule database 230 may update stored objects through a developer tool.
- the developer tool may include, for example, a function editor for updating action objects or concept objects.
- the developer tool may include a vocabulary editor for updating vocabulary.
- the developer tool may include a strategy editor for creating and registering strategies that determine plans.
- the developer tool may include a dialog editor to create a dialog with the user.
- the developer tool may include a follow up editor that can activate follow up goals and edit follow up utterances that provide hints. The subsequent goal may be determined based on a currently set goal, a user's preference, or environmental conditions.
- the capsule database 230 may also be implemented in the electronic device 101 .
- the execution engine 240 may calculate a result using the generated plan.
- the end user interface 250 may transmit the calculated result to the electronic device 101 . Accordingly, the electronic device 101 may receive the result and provide the received result to the user.
- the management platform 260 may manage information used in the intelligent server 200 .
- the big data platform 270 may collect user data.
- the analysis platform 280 may manage quality of service (QoS) of the intelligent server 200 . For example, the analysis platform 280 may manage the components and processing speed (or efficiency) of the intelligent server 200 .
- the service server 300 may provide a designated service (eg, food ordering (CP service A) 301 or hotel reservation (CP service B) 302 ) to the electronic device 101 .
- the service server 300 may be a server operated by a third party.
- the service server 300 may provide information for generating a plan corresponding to the received voice input to the intelligent server 200 .
- the provided information may be stored in the capsule database 230.
- the service server 300 may provide result information according to the plan to the intelligent server 200.
- the electronic device 101 may provide various intelligent services to the user in response to user input.
- the user input may include, for example, an input through a physical button, a touch input, or a voice input.
- the electronic device 101 may provide a voice recognition service through an internally stored intelligent app (or voice recognition app).
- the electronic device 101 may recognize a user's utterance or voice input received through the microphone and provide a service corresponding to the recognized voice input to the user.
- the electronic device 101 may perform a designated operation alone or together with the intelligent server 200 and/or the service server 300 based on the received voice input. For example, the electronic device 101 may execute an app corresponding to the received voice input and perform a designated operation through the executed app.
- When the electronic device 101 provides a service together with the intelligent server 200 and/or the service server 300, the electronic device may detect user speech using the microphone 150-1 and generate a signal (or voice data) corresponding to the detected user speech. The electronic device may transmit the voice data to the intelligent server 200 through the interface 177.
- the intelligent server 200 may generate a plan for performing a task corresponding to the voice input or a result of performing an operation according to the plan.
- the plan may include, for example, a plurality of operations for performing a task corresponding to a user's voice input, and a plurality of concepts related to the plurality of operations.
- the concept may define parameters input to the execution of the plurality of operations or result values output by the execution of the plurality of operations.
- the plan may include information related to a plurality of operations and a plurality of concepts.
- the electronic device 101 may receive the response using the interface 177.
- the electronic device 101 outputs a voice signal generated inside the electronic device 101 to the outside using the speaker 155-1 or uses the display module 160 to output a voice signal generated inside the electronic device 101. Images can be output externally.
- FIG. 3 is a diagram illustrating a screen on which an electronic device processes a voice input received through an intelligent app according to an embodiment of the present disclosure.
- the electronic device 101 may execute an intelligent app to process a user input through an intelligent server (eg, the intelligent server 200 of FIG. 2 ).
- the electronic device 101 may execute an intelligent app for processing a voice input when it recognizes a designated voice input (eg, wake up!) or receives an input through a hardware key (eg, a dedicated hardware key). The electronic device 101 may, for example, execute the intelligent app in a state in which a schedule app is executed.
- the electronic device 101 may display an object (eg, an icon) 311 corresponding to an intelligent app on a display (eg, the display module 160 of FIG. 1 ).
- the electronic device 101 may receive a voice input by a user's speech. For example, the electronic device 101 may receive a voice input saying “tell me this week's schedule!”.
- the electronic device 101 may display a user interface (UI) 313 (eg, an input window) of an intelligent app displaying text data of the received voice input on the display.
- the electronic device 101 may display a result corresponding to the received voice input on the display. For example, the electronic device 101 may receive a plan corresponding to the received user input and display 'this week's schedule' on the display according to the plan.
- FIG. 4 is a diagram illustrating a form in which relationship information between concepts and actions is stored in a database according to an implementation of the disclosure.
- a capsule database (eg, capsule database 230 of FIG. 2 ) of an intelligent server (eg, intelligent server 200 of FIG. 2 ) may store capsules in a concept action network (CAN) form.
- the capsule database may store an operation for processing a task corresponding to a user's voice input and parameters necessary for the operation in the form of a concept action network (CAN).
- the capsule database may store a plurality of capsules (capsule (A) 401 and capsule (B) 404) corresponding to each of a plurality of domains (eg, applications).
- One capsule (eg, capsule (A) 401) may correspond to one domain (eg, location (geo), application).
- one capsule may correspond to at least one service provider (eg, CP 1 402, CP 2 403, or CP 4 405) for performing a function for a domain related to the capsule.
- One capsule may include at least one operation 410 and at least one concept 420 for performing a designated function.
- Other service providers, such as CP 3 406, do not need to correspond to the capsule.
- a natural language platform may generate a plan for performing a task corresponding to a received voice input using a capsule stored in a capsule database.
- a planner module (eg, the planner module 225 of FIG. 2) may create the plan 407 using the operations 4011 and 4013 and concepts 4012 and 4014 of capsule A 401, and the operation 4041 and concept 4042 of capsule B 404.
- An electronic device disclosed in this document may be a device of various types.
- the electronic device may include, for example, a portable communication device (eg, a smart phone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance.
- Terms such as first, second, or 1st or 2nd may be used simply to distinguish a component from other corresponding components, and do not limit the components in other respects (eg, importance or order).
- When a (eg, first) component is referred to as being "coupled" or "connected" to another (eg, second) component, with or without the terms "functionally" or "communicatively", it means that the component may be connected to the other component directly (eg, by wire), wirelessly, or through a third component.
- The term module used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logical block, part, or circuit.
- a module may be an integrally constructed component or a minimal unit of components or a portion thereof that performs one or more functions.
- the module may be implemented in the form of an application-specific integrated circuit (ASIC).
- Various embodiments of this document may be implemented as software (eg, the program 140) comprising one or more instructions stored in a storage medium (eg, the internal memory 136 or the external memory 138) readable by a machine (eg, the electronic device 101 of FIG. 1). For example, a processor (eg, the processor 120) of the machine (eg, the electronic device 101) may invoke at least one of the one or more stored instructions from the storage medium and execute it.
- the one or more instructions may include code generated by a compiler or code executable by an interpreter.
- the device-readable storage medium may be provided in the form of a non-transitory storage medium.
- the storage medium is a tangible device and does not include a signal (eg, an electromagnetic wave); this term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where data is stored temporarily.
- the method according to various embodiments disclosed in this document may be provided by being included in a computer program product.
- Computer program products may be traded between sellers and buyers as commodities.
- A computer program product may be distributed in the form of a device-readable storage medium (eg, compact disc read only memory (CD-ROM)), or distributed (eg, downloaded or uploaded) online through an application store (eg, Play Store™) or directly between two user devices (eg, smartphones).
- at least part of the computer program product may be temporarily stored or temporarily created in a device-readable storage medium such as a manufacturer's server, an application store server, or a relay server's memory.
- Each of the above-described components (eg, a module or a program) may include a single object or a plurality of objects, and some of the plurality of objects may be separately disposed in other components.
- one or more components or operations among the aforementioned corresponding components may be omitted, or one or more other components or operations may be added.
- According to various embodiments, a plurality of components (eg, modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to those performed by the corresponding component of the plurality of components prior to the integration.
- The actions performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the actions may be executed in a different order or omitted, or one or more other actions may be added.
- FIG. 5 is a block diagram illustrating an electronic device and an intelligent server, according to embodiments of the disclosure.
- the electronic device 101 may include at least some of the components of the electronic device 101 described with reference to FIG. 1 and the electronic device 101 described with reference to FIG. 2 .
- the intelligent server 200 of FIG. 5 may include at least some of the components of the intelligent server 200 described with reference to FIG. 2 .
- descriptions overlapping those described with reference to FIGS. 1 to 4 will be omitted.
- the electronic device 101 may include an input module 150 for inputting user speech, a communication module 190 for communicating with the intelligent server 200 for processing speech, a memory 130 in which computer-executable instructions are stored, and/or a processor 120 that accesses the memory 130 and executes the instructions. According to an embodiment, the input module 150, the communication module 190, the memory 130, and/or the processor 120 may correspond to the input module 150, the communication module 190, the memory 130, and/or the processor 120 of the electronic device 101 described with reference to FIG. 1.
- the electronic device 101 may be an electronic device 101 that communicates with the intelligent server 200 described with reference to FIG. 2 , and the client module 151 may be included in the memory 130 .
- the processor 120 may receive user speech through the input module 150 and transmit information about the user speech and the electronic device 101 to the intelligent server 200 .
- Information on the electronic device 101 may include at least one of account information, specification information such as maximum supported volume, information on whether it is a personal device, information on whether it is locked, information on the current location of the electronic device 101, information on a ring tone setting value, and information on an application (app) of the electronic device 101. However, it is not limited thereto, and the processor 120 may transmit various information about the electronic device 101 to the intelligent server 200.
- the processor 120 may transmit the user speech and the information about the electronic device 101 to the intelligent server 200 through the communication module 190, and may output a speech processing result to the user based on a command received from the intelligent server 200.
- the intelligent server 200 may include a natural language platform 220 , a capsule database 230 , a communication module 590 , a processor 520 and/or a memory 530 .
- the intelligent server 200 is the intelligent server 200 described with reference to FIG. 2, and the communication module 590, processor 520, memory 530, natural language platform 220, and/or capsule database 230 may correspond to the configuration of the intelligent server 200 of FIG. 2.
- the communication module 590 may correspond to the front end 210 of FIG. 2 .
- the processor 520 may receive user speech and information about the electronic device 101 from the electronic device 101 through the communication module 590 .
- the intelligent server 200 may receive, through the communication module 590, information about the electronic devices 102 and 104 (eg, electronic device specification information and information on applications installed in the electronic device) from the electronic device 101 and the other electronic devices 102 and 104 interlocked with the electronic device 101.
- A user may use various electronic devices, such as an intelligent speaker 102, a smart watch 104, and/or a smart TV, corresponding to the user account of the electronic device 101 (eg, a smartphone). The intelligent server 200 may receive device specification information and performance information of applications installed in the device from the smartphone 101 as well as the intelligent speaker 102 and/or the smart watch 104, and maintain them in the context information 540.
- In the context information 540, information 541 on the electronic devices 101, 102, and 104 and information 543 on capsules corresponding to each electronic device (for example, information on applications included in the electronic devices) may be maintained.
- the processor 520 may generate a processing result of the utterance received from the electronic device 101 and transmit the processing result to the electronic device 101 through the communication module 590 .
- the natural language platform 220 includes an automatic speech recognition module (ASR module) 221, a natural language understanding module (NLU module) 223, a planner module 225, and a natural language generation module. (NLG module) 227 or text-to-speech module (TTS module) 229 may be included.
- the memory 530 may include the capsule database 230.
- the capsule database 230 may store an operation for processing a task corresponding to a user's voice input and parameters necessary for the operation in the form of a concept action network (CAN) 400.
- the concept action network 400 may be configured as described with reference to FIG. 4 .
- Context information 540 may be stored in the memory 530 of the intelligent server 200 .
- the context information 540 may include information 541 on the electronic devices 101 , 102 , and 104 and capsule information 543 corresponding to the electronic devices 101 , 102 , and 104 .
- the capsule information 543 may correspond to domain (eg, location (geo), application) information.
- the domain information corresponds to software capable of processing a target utterance through the electronic device 101, and may include at least one of an application downloadable to the electronic devices 101, 102, and 104, a program that provides services in the form of a widget, and a web app.
- the context information 540 may be divided into permanent context information that does not change in real time and instant context information that changes in real time.
- Persistent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether each of the one or more electronic devices 101, 102, and 104 is a personal device, and performance information of one or more domains.
- the instant context information may include at least one of user preference information of one or more domains, execution history information of one or more domains, and utterance history information received by the one or more electronic devices 101, 102, and 104.
- Persistent context information may be transmitted from the electronic devices 101, 102, and 104 to the intelligent server 200 when they initially connect to the intelligent server 200, and may be maintained in the memory 530 of the intelligent server 200.
- Instant context information may be transmitted from the electronic devices 101 , 102 , and 104 to the intelligent server 200 periodically or upon request.
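The persistent/instant split described above might be modeled as in the following sketch; all field names are assumptions made for illustration, not the disclosed data layout.

```python
# Sketch of context information 540 split into persistent context (sent once,
# on first connection) and instant context (refreshed periodically or on
# request). Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class PersistentContext:
    network_info: str = ""
    account: str = ""
    is_personal_device: bool = True
    domain_performance: dict = field(default_factory=dict)

@dataclass
class InstantContext:
    domain_preferences: dict = field(default_factory=dict)
    domain_execution_history: list = field(default_factory=list)
    utterance_history: list = field(default_factory=list)

@dataclass
class DeviceContext:
    device_id: str
    persistent: PersistentContext
    instant: InstantContext
```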
- the context information may be temporarily stored in a cache included in the memory 530 of the intelligent server 200, and the processor 520 may obtain the cached context information directly from the memory 530 as needed, without receiving it from the electronic device.
- the context information 540 and the capsule database 230 are shown separately, but are not limited thereto and the context information 540 may be included in the capsule database 230 .
- A memory 530 storing computer-executable instructions and a processor 520 accessing the memory to execute the instructions may correspond to the natural language platform 220 or the execution engine 240 of the intelligent server 200 described with reference to FIG. 2.
- the processor 520 may generate a plan by referring to the capsule database 230 or the context information 540 as described for the natural language platform 220 in FIG. 2, and may generate processing results according to the plan as described for the execution engine 240 in FIG. 2.
- the processor 520 may receive a target utterance from the electronic device 101 through the communication module 590, generate a processing result for the target utterance by referring to the natural language platform 220, the capsule database 230, and the context information 540, and transmit the result to the electronic device 101.
- A program (eg, the program 140 of FIG. 1) that generates electronic device-domain combinations capable of processing a target utterance based on the target utterance received from any one of the one or more electronic devices 101, 102, and 104 and the context information 540, and determines the target electronic device and target domain to process the target utterance by calculating a quality of service score for each combination, may be stored as software in the memory 530.
- on-device artificial intelligence (AI) capable of processing speech without communication with the intelligent server 200 may be installed in the electronic device 101 .
- the natural language platform 220 and/or the capsule database 230 may be implemented in the electronic device 101, and the context information 540 may also be maintained in the memory 130 of the electronic device 101.
- A program (eg, the program 140 of FIG. 1) that generates electronic device-domain combinations capable of processing a target utterance and determines the target electronic device and target domain by calculating a QoS score for each combination may be stored as software in the memory 130 of the electronic device 101.
- When the electronic device 101 is loaded with on-device AI and functions of the intelligent server are implemented in the electronic device 101, only some functions of the intelligent server may be implemented in the electronic device 101. For example, only some components of the natural language platform 220 of the intelligent server 200 described with reference to FIG. 2 (eg, the automatic speech recognition module 221) may be implemented in the electronic device 101.
- the electronic device 101 may include only the natural language platform 220 of the intelligent server 200, and the capsule database 230 or context information 540 may be maintained in the intelligent server 200.
- the processor 520 of the intelligent server 200 may receive a target utterance from any one of the one or more electronic devices 101, 102, and 104 (eg, the electronic device 101 in FIG. 5), and, based on the target utterance and the context information 540, generate one or more combinations including electronic device information 541 capable of processing the target utterance and capsule information 543 (or domain information).
- based on the natural language platform 220, the capsule database 230, and the context information 540, the processor 520 may create combinations capable of processing the target utterance "Play music to the maximum", such as smartphone-music app, smartphone-media player app, intelligent speaker-music app, intelligent speaker-media player app, smart refrigerator-music app, and smart air conditioner-music app.
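The combination-generation step might look like the sketch below, which enumerates device-domain pairs whose declared capabilities cover the utterance's requirement. The capability tags are a stand-in for the natural language platform's actual analysis, and all names are invented.

```python
# Sketch: enumerate electronic device-domain combinations able to process a
# target utterance. The capability tags stand in for real capsule analysis.

def candidate_combinations(device_domains, required_capability):
    """device_domains: {device: {domain: set of capability tags}}"""
    return [
        (device, domain)
        for device, domains in device_domains.items()
        for domain, capabilities in domains.items()
        if required_capability in capabilities
    ]

context = {
    "smartphone":          {"music app": {"play"}, "media player app": {"play"}},
    "intelligent speaker": {"music app": {"play", "amplify"}},
    "smart refrigerator":  {"music app": {"play"}},
}
```

Here `candidate_combinations(context, "play")` yields four combinations; narrowing the requirement to `"amplify"` leaves only the intelligent speaker's music app.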
- the processor 520 may determine reference information for target utterance processing from among the context information 540. For example, with respect to the target utterance "Play music at maximum", the maximum volume information of the electronic devices 101, 102, and 104 among the electronic device information 541 may be determined as reference information, and "presence or absence of an amplification function" among the capsule information 543 (or domain information) corresponding to the electronic devices 101, 102, and 104 may be determined as reference information.
- the reference information may be previously determined based on the target utterance, and may be determined by analyzing the target utterance with reference to the natural language platform 220 .
- the processor 520 may determine a target electronic device and a target domain by calculating a QoS score for each of the one or more electronic device-domain combinations with reference to the reference information, and determining the combination having the highest QoS score as the target combination.
- For each of the one or more combinations, the processor 520 may calculate a quality of service score as the sum of a controllability score, a functionality score, an accessibility score, and a robustness (function performance stability) score, as shown in [Equation 1] below, and may determine the combination with the highest quality of service score as the target combination, as shown in [Equation 2] below.
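[Equation 1] and [Equation 2] are not reproduced in this excerpt; based on the surrounding description, they can be sketched as follows, with the sub-score symbols assumed rather than taken from the disclosure:

```latex
% [Equation 1] (sketch): QoS score of a combination c as the sum of sub-scores
\mathrm{QoS}(c) = S_{\mathrm{control}}(c) + S_{\mathrm{function}}(c)
               + S_{\mathrm{access}}(c) + S_{\mathrm{robust}}(c)

% [Equation 2] (sketch): the target combination maximizes the QoS score
c^{*} = \operatorname*{arg\,max}_{c \in C} \mathrm{QoS}(c)
```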
- the processor 520 may determine a high controllability score when the combination of controllability of the electronic device and the corresponding domain is high. For example, with respect to the target utterance "record your voice the loudest", the processor 520 may determine a high controllability score for a combination of an electronic device having maximum input sensitivity and a domain having a sound source amplification function.
- the processor 520 may determine a high functionality score when there are many shareable electronic devices or domains. For example, with respect to the target utterance of “Share me”, the processor 520 may determine a high functionality score for an electronic device-domain combination with a lot of sharing frequency and shareable media.
- the processor 520 may determine a high accessibility score when automatic login is applied or there are few authentication steps. For example, for utterances related to personal accounts, such as "send me an email", the processor 520 may determine a high accessibility score for a smartphone-email app combination, since the authentication process for a family device such as a TV is more complicated than that for a personal device such as a smartphone.
- the processor 520 may determine a high function performance stability score when there is little conflict between domains in function execution. For example, if there is a follow-up utterance "Song title" after the utterance "Play 'I'm going to see you now'", a conflict between domains may occur because a movie and a song share the same name, so from among the smart TV-media app combination and the intelligent speaker-music app combination, the processor 520 may determine a high function performance stability score for the intelligent speaker-music app combination.
- the processor 520 may set different weights for the controllability score, the functionality score, the accessibility score, and the function performance stability score. For example, the processor 520 may calculate a QoS score by setting weights high for an accessibility score and a function performance stability score for utterances such as “ ⁇ share me” or “ ⁇ login me”.
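Putting the scoring and weighting together, a minimal sketch of the selection step could look like this; the sub-score values and weights are invented, and this is an illustration rather than the disclosed implementation.

```python
# Sketch: weighted QoS scoring over device-domain combinations; the
# combination with the highest total is chosen as the target. Values and
# weights are invented for illustration.
SUB_SCORES = ("controllability", "functionality", "accessibility", "stability")

def qos_score(scores, weights=None):
    weights = weights or {k: 1.0 for k in SUB_SCORES}
    return sum(weights[k] * scores[k] for k in SUB_SCORES)

def pick_target(combinations, weights=None):
    # combinations: {(device, domain): {sub-score name: value}}
    return max(combinations, key=lambda c: qos_score(combinations[c], weights))

combos = {
    ("smart TV", "media app"): {
        "controllability": 0.7, "functionality": 0.8,
        "accessibility": 0.5, "stability": 0.4,
    },
    ("intelligent speaker", "music app"): {
        "controllability": 0.7, "functionality": 0.6,
        "accessibility": 0.8, "stability": 0.9,
    },
}
```

With equal weights, the intelligent speaker-music app combination wins; raising the weights on accessibility and stability (as described for sharing or login utterances) widens its lead.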
- the QoS score calculation process is not limited to the above-described method, and the processor 520 may determine a target combination composed of the target electronic device and the target domain by calculating the QoS score for each combination in various ways.
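- The scoring described above can be sketched as a weighted sum over device-domain combinations. The sub-score values, weight table, and combinations below are hypothetical illustrations, not values from this disclosure.

```python
# Hypothetical sketch of the QoS score as a weighted sum of the four
# sub-scores; all names and numbers are illustrative assumptions.

def qos_score(sub_scores, weights):
    """Weighted sum of the sub-scores for one device-domain combination."""
    return sum(weights[k] * v for k, v in sub_scores.items())

# Utterance-dependent weights: a sharing utterance emphasizes the
# accessibility and function performance stability scores.
weights_share = {"controllability": 1.0, "functionality": 1.0,
                 "accessibility": 2.0, "stability": 2.0}

combinations = {
    ("smartphone", "email app"): {"controllability": 0.6, "functionality": 0.7,
                                  "accessibility": 0.9, "stability": 0.8},
    ("smart TV", "email app"):   {"controllability": 0.6, "functionality": 0.7,
                                  "accessibility": 0.3, "stability": 0.8},
}

scores = {c: qos_score(s, weights_share) for c, s in combinations.items()}
target = max(scores, key=scores.get)
print(target)  # the personal device wins on accessibility
```

Because the weights depend on the utterance, the same pair of combinations can rank differently for a different request.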
- an internal policy (not shown) for natural language processing may be stored in the memory 530, and the processor 520 may refer to the internal policy when calculating a quality of service (QoS) score.
- the developer may define and add items related to the quality of service score to the internal policy, and the processor 520 may calculate a quality of service score for a device-domain combination in consideration of the added items.
- the processor 520 may transmit, to the target electronic device, a command to process the target utterance in the target domain of the target electronic device.
- the target electronic device may process the utterance in the target domain and output the result to the user.
- the instructions stored in the memory 530 or the memory 130 may be implemented as one function module in the operating system 142, the middleware 144, or a separate application 146.
- the processor 120 of the electronic device 101 or the processor 520 of the intelligent server 200 may process the target utterance based on the target utterance received from the user and the context information 540.
- Various implementations of determining a target electronic device and target domain to process a target utterance by generating possible electronic device-domain combinations and calculating a quality of service score for each combination will be described in detail.
- FIGS. 6 to 9 are views for explaining an operation of processing a user's utterance, according to various embodiments of the present disclosure.
- Referring to FIG. 6, an embodiment of processing an utterance related to music reproduction, for example, a target utterance such as "play the music excitedly", is illustrated.
- a user's target utterance "play the music excitedly" is input to any one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200.
- In FIG. 6, only the smart phone 101 and the intelligent speaker 102 are shown for simplicity, but as described with reference to FIG. 5, the target utterance may be input through various electronic devices such as a smart watch and a smart refrigerator and transmitted to the intelligent server 200.
- a situation 610 is a situation in which a target utterance is processed according to an existing method of determining an electronic device to process the utterance and then determining a domain to process the utterance.
- the processor 520 of the intelligent server 200 determines a target electronic device to process the received target utterance based on a predefined policy. For example, a device with good wake-up reception sensitivity may be determined as a target electronic device, and priorities among various electronic devices may be determined in advance.
- the intelligent server 200 may determine any one of the smart phone 101, the intelligent speaker 102, and a smart TV (not shown) as a target electronic device according to a predefined policy. After determining the electronic device, the intelligent server 200 classifies the user's intention with a capsule classifier corresponding to the electronic device, and determines a capsule (or domain) capable of processing the target utterance among capsules (or domains) corresponding to the electronic device.
- the processor 520 of the intelligent server 200 may determine the smart watch as the target electronic device and process the speech through the smart watch-fortune domain.
- the fortune-telling domain of the smart watch is a domain that generates a processing result of "This function cannot be supported", and if the utterance is processed in this way, a service that does not actually meet the user's intention may be provided.
- the processor 520 determines the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes an utterance related to music, and may determine 'Music App', which is capable of processing a music play command among various applications included in the intelligent speaker 102, as a target domain.
- music may be played through a music app.
- the performance of an application performing an operation in the electronic device may not be considered.
- the intelligent server 200 determines the music app of the intelligent speaker so that the intelligent speaker 102 plays the music (620), and the music may be played at a low volume.
- a situation 650 is a situation in which an utterance is processed based on electronic device-domain combinations.
- the processor 520 of the intelligent server 200 may generate one or more combinations of electronic device information capable of processing the target utterance and domain information, based on the target utterance "play the music excitedly" and the context information 540. Whether the target utterance can be processed may be determined through the natural language platform 220 as in the conventional method.
- the processor 520 may generate electronic device-domain combinations such as a smart refrigerator-music app, a smartphone-music app, a smart air conditioner-music app, and/or an intelligent speaker-music app for the target utterance "play the music excitedly".
- a combination that generates a processing result of "This function cannot be supported", as in the above description of the smart watch, may not be determined by the processor 520 as a combination capable of processing the target utterance.
- the processor 520 may determine reference information about target speech processing from among the context information 540 .
- in response to "play the music excitedly", the processor 520 may determine, as reference information, information on whether the electronic device is a professional device among the electronic device information 541 of the context information 540 and information on the current volume of the electronic device.
- the processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 650, the processor 520 may calculate a service quality score for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations by referring to the information on whether the electronic device is a professional device and the current volume information among the context information 540. As described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate a QoS score as a sum of these scores.
- the processor 520 may determine a combination having the highest QoS score as a target combination. For example, the processor 520 may determine that the QoS score of the smartphone-music app combination is the highest in consideration of the information on whether the device is a professional device and the current volume information, and may transmit, to the smartphone 101, a command to process the target utterance "play the music excitedly" through the music app. In the smart phone 101, music may be played through the music app based on the command (660).
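- The flow of situation 650 can be sketched as follows; the device list, capability check, and scoring formula are assumptions for illustration only.

```python
# Hypothetical sketch of situation 650: generate the device-domain
# combinations that can handle the music utterance, then rank them using
# the reference information (professional-device flag, current volume).

devices = {
    "smart refrigerator":    {"professional": False, "current_volume": 2},
    "smartphone":            {"professional": False, "current_volume": 8},
    "smart air conditioner": {"professional": False, "current_volume": 1},
    "intelligent speaker":   {"professional": True,  "current_volume": 2},
}

# One or more combinations capable of processing the target utterance.
combos = [(device, "music app") for device in devices]

def score(device, domain):
    info = devices[device]
    s = 1.0 if info["professional"] else 0.5   # professional-device flag
    s += info["current_volume"] / 10           # current volume as reference info
    return s

target = max(combos, key=lambda c: score(*c))
print(target)
```

With these assumed values the smartphone's high current volume outweighs the speaker's professional-device bonus, mirroring the outcome described above.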
- Referring to FIG. 7, an implementation of processing a target utterance for reproduction at maximum volume, for example, "play the music to the maximum", is illustrated.
- a user's target utterance "Play music at maximum” is input to any one of one or more electronic devices, and the target utterance may be transmitted to the intelligent server 200 .
- In FIG. 7, only the smart phone 101 and the intelligent speaker 102 are shown for simplicity, but as described with reference to FIG. 5, the target utterance may be received through various electronic devices such as a smart watch and a smart refrigerator and transmitted to the intelligent server 200.
- a situation 710 is a situation in which a target utterance is processed according to an existing method in which the processor 520 determines an electronic device to process the utterance and then determines a domain to process the utterance.
- the processor 520 determines the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes an utterance related to music, and may play music through the music app of the intelligent speaker 102 by determining the 'music app' as the target domain. However, in situation 710, although the maximum volume of the smart phone 101 and the intelligent speaker 102 is equally 10 and the music app of the smart phone 101 supports an amplification function, the processor 520 may determine the intelligent speaker 102 as the target electronic device.
- the processor 520 may determine to process an utterance through a music app through user intention classification among domains corresponding to the intelligent speaker 102, and the intelligent speaker 102 may play music at maximum volume. (720).
- a situation 750 is a situation in which an utterance is processed based on electronic device-domain combinations.
- the processor 520 of the intelligent server 200 may generate one or more combinations of electronic device information capable of processing the target utterance and capsule information, based on the target utterance "play the music to the maximum" and the context information 540. Whether the target utterance can be processed may be determined through the natural language platform 220 as in the conventional method.
- the processor 520 may generate electronic device-domain combinations such as a smart refrigerator-music app, a smartphone-music app, a smart air conditioner-music app, and an intelligent speaker-music app for the target utterance "play the music to the maximum".
- the processor 520 may determine reference information about target speech processing from among the context information 540 .
- in response to "play the music to the maximum", the processor 520 may determine, as reference information, information on whether the electronic device is a professional device and information on the maximum volume of the electronic device among the electronic device information 541 of the context information 540, and information on whether the domain has an amplification function among the capsule information 543 (or corresponding domain information).
- the processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 750, the processor 520 may calculate a service quality score for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations by referring to the information on whether the electronic device is a professional device, the maximum volume information, and the domain amplification function information among the context information 540. For example, as described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate a QoS score as a sum of these scores.
- the processor 520 may determine a combination having the highest QoS score as a target combination. For example, the processor 520 may determine that the smartphone-music app combination has the highest quality of service score, and may transmit, to the smartphone 101, a command to process the target utterance "play the music to the maximum" through the music app. In the smart phone 101, by further using the amplification function of the music app, music may be reproduced at a higher volume than the volume reproduced in situation 710 (760).
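- Situation 750 can be sketched by giving each combination an effective volume ceiling; the gain factors are illustrative assumptions (the disclosure only states that both devices have a maximum volume of 10 and that the smartphone's music app supports amplification).

```python
# Hypothetical sketch of situation 750: the effective ceiling of a
# device-domain pair is the device's maximum volume times the gain of any
# amplification function the domain supports (1.0 = no amplification).

def effective_max_volume(device_max, domain_gain):
    return device_max * domain_gain

pairs = {
    ("intelligent speaker", "music app"): effective_max_volume(10, 1.0),
    ("smartphone", "music app"):          effective_max_volume(10, 1.5),
}

target = max(pairs, key=pairs.get)
print(target, pairs[target])
```

Considering the domain's amplification function alongside the device's maximum volume is what lets the smartphone outrank the professional speaker here.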
- Referring to FIG. 8, an embodiment of processing a target utterance about sound quality, for example, "play music with the best quality", is illustrated.
- a user's target utterance "Play music with the best quality" is input to one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200 .
- the target utterance may be received through various electronic devices such as a smart watch or a smart refrigerator and transmitted to the intelligent server 200.
- a situation 810, like situation 610 described with reference to FIG. 6 and situation 710 described with reference to FIG. 7, is a situation in which a target utterance is processed according to an existing method in which the processor 520 determines an electronic device to process the utterance and then determines a domain to process the utterance.
- the processor 520 determines the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes an utterance related to music, and may play music through the music app of the intelligent speaker 102 by determining the 'music app' as the target domain. However, in situation 810, although the sound quality of the intelligent speaker 102 is better than the sound quality of the smartphone 101, the smartphone 101-App 1 combination has the best sound quality when the sound quality of the application is also considered; nevertheless, the processor 520 may determine the intelligent speaker 102 as the target electronic device. The processor 520 may determine to process the utterance with App 1 through user intention classification among domains corresponding to the intelligent speaker 102, and music may be played through App 1 in the intelligent speaker 102 (820).
- a situation 850 is a situation in which an utterance is processed based on electronic device-domain combinations.
- based on the target utterance "play music with the best quality" and the context information 540, one or more combinations of electronic device information capable of processing the target utterance and capsule information may be generated. Whether the target utterance can be processed may be determined through the natural language platform 220 as in the conventional method.
- the processor 520 may generate electronic device-domain combinations such as a smart refrigerator-music app, a smartphone-App 1, a smartphone-App 2, a smart air conditioner-music app, an intelligent speaker-App 1, and an intelligent speaker-App 2 for the target utterance "play music with the best quality".
- the processor 520 may determine reference information about target speech processing from among the context information 540 .
- the processor 520 may determine, as reference information, information about the sound quality of the electronic device among the electronic device information 541 of the context information 540, and information about the sound quality of the domain among the capsule information 543 (or corresponding domain information).
- the processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 850, the processor 520 may calculate a service quality score for each of the smart refrigerator-music app, smartphone-App 1, smartphone-App 2, smart air conditioner-music app, intelligent speaker-App 1, and intelligent speaker-App 2 combinations by referring to the information about the sound quality of the electronic device and the information about the sound quality of the domain among the context information 540. For example, as described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate a QoS score as a sum of these scores.
- the processor 520 may determine a combination having the highest QoS score as a target combination. For example, the processor 520 may determine that the smartphone-App 1 combination has the highest quality of service score, and may transmit, to the smartphone 101, a command to process the target utterance "play music with the best quality" through App 1. In the smart phone 101, music may be played through App 1 (860).
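- Situation 850 can be sketched by scoring sound quality per combination rather than per device, so the best pair need not contain the best device; the quality values below are hypothetical.

```python
# Hypothetical sketch of situation 850: sound quality is recorded per
# device-app combination, so smartphone-App 1 can outrank combinations
# that use the intelligent speaker, even though the speaker alone has
# better sound quality than the smartphone alone.

sound_quality = {
    ("smartphone", "App 1"):          0.9,
    ("smartphone", "App 2"):          0.5,
    ("intelligent speaker", "App 1"): 0.8,
    ("intelligent speaker", "App 2"): 0.6,
}

target = max(sound_quality, key=sound_quality.get)
print(target)
```

Keeping the reference information at combination granularity is the key difference from the existing device-first method in situation 810.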
- Referring to FIG. 9, an embodiment of processing a target utterance "What time is it?" is illustrated.
- a user's target utterance “what time is it?” is input to any one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200 .
- In FIG. 9, only the smart phone 101 and the smart watch 104 are shown for brevity, but as described with reference to FIG. 5, the target utterance may be received through various electronic devices such as an intelligent speaker or a smart refrigerator and transmitted to the intelligent server 200.
- a situation 910 is a situation in which a target utterance is processed according to an existing method in which the processor 520 determines an electronic device to process the utterance and then determines a domain to process the utterance.
- the processor 520 determines the smart watch 104 as the target electronic device and the 'watch app' as the target domain according to a predefined policy, for example, a policy that processes an utterance with a nearby device, and the utterance may be processed through the watch app of the smart watch 104 (920). However, this does not consider the performance of the domain, for example, response time, and the processing speed may be relatively slow when the utterance is processed by the smart watch 104.
- a situation 950 is a situation in which an utterance is processed based on electronic device-domain combinations.
- the processor 520 of the intelligent server 200 may generate one or more combinations of electronic device information capable of processing the target utterance and capsule information, based on the target utterance "What time is it?" and the context information 540. Whether the target utterance can be processed may be determined through the natural language platform 220 as in the conventional method.
- the processor 520 may generate electronic device-domain combinations such as a smart refrigerator-watch app, a smartphone-watch app, a smart air conditioner-watch app, and an intelligent speaker-watch app for the target utterance "What time is it now?".
- the processor 520 may determine reference information about target speech processing from among the context information 540 .
- Information, such as response time, that is determined independently of a domain (eg, a watch app) and according to an electronic device (eg, the smart phone 101 or the smart watch 104 of FIG. 9 ) may be determined as reference information.
- the processor 520 may determine, as reference information, response time information from the capsule information 543 (or corresponding domain information) of the context information 540.
- the processor 520 may calculate a QoS score for each of one or more combinations by referring to reference information.
- the processor 520 may calculate a service quality score for each of the smart refrigerator-watch app, smartphone-watch app, smart air conditioner-watch app, and intelligent speaker-watch app combinations by referring to the response time information, which is the reference information, among the context information 540. For example, referring to situation 950 of FIG. 9, since the response speed of the smart watch 104 is 200 ms and the response speed of the smartphone 101 is 30 ms, the processor 520 may determine the service quality score for the smartphone-watch app combination to be higher than the service quality score for the smart watch-watch app combination.
- the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate a QoS score as a sum of these scores.
- the processor 520 may determine a combination having the highest QoS score as the target combination. For example, the processor 520 may determine that the smartphone-watch app combination has the highest QoS score, and may transmit, to the smartphone 101, a command to process the target utterance "What time is it?" through the watch app. The smartphone 101 may output a processing result for "What time is it now?" through the watch app (960).
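- Situation 950 can be sketched with an inverse response-time score; the 200 ms and 30 ms figures come from the example above, while the scoring formula itself is an assumption.

```python
# Hypothetical sketch of situation 950: a shorter response time yields a
# higher service quality score, so the smartphone's watch app wins.

response_time_ms = {
    ("smart watch", "watch app"): 200,
    ("smartphone", "watch app"):  30,
}

def responsiveness_score(ms):
    return 1000.0 / ms  # inverse of response time: faster scores higher

scores = {pair: responsiveness_score(t) for pair, t in response_time_ms.items()}
target = max(scores, key=scores.get)
print(target)
```

Any monotonically decreasing function of response time would serve; the inverse is used here only to keep the sketch short.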
- the processor 520 may process the utterance more appropriately to the user's intention by determining the electronic device and domain to process the utterance with reference to the target utterance and the context information 540 . For example, processing conversations related to personal information such as text messages or phone calls with a personal device such as a smart phone rather than a common device such as a smart TV or an intelligent speaker may be more appropriate to the user's intention.
- the processor 520 of the intelligent server 200 may determine that an electronic device is a family device through the electronic device information 541 of the context information 540 when the number of accounts logged into the electronic device is plural, and may generate combinations capable of processing the target utterance "Call Mom", such as an intelligent speaker-phone app and a smartphone-phone app.
- function performance stability items may be considered in the process of calculating the service quality score, and the service quality score of the smart phone-phone app combination, which uses a personal device, may be calculated to be higher than that of the intelligent speaker-phone app combination.
- the intelligent server 200 may send a command to process “Call Mom” to the smart phone through the phone app, and the utterance may be processed by the phone app of the smart phone.
- the processor 520 of the intelligent server 200 processes the utterance by referring to the electronic device information 541 and the capsule information 543 (or domain information) in the context information 540, so subsequent utterance processing through another electronic device may be facilitated.
- a follow-up utterance "Search for the same thing on PC (personal computer)" may be transmitted to the intelligent server 200 .
- information indicating that 'Changdeokgung Palace' was searched for on an electronic device such as a TV may be included in the context information 540, and for the subsequent target utterance "Search for the same thing on PC", the processor 520 may determine that the 'same thing' is the "Changdeokgung Palace" of the previous utterance.
- the electronic device 101 may be equipped with an on-device AI, and the various operations of the processor 520 described with reference to FIGS. 6 to 9 may be performed by the processor 120 of the electronic device 101 without communication with the intelligent server 200.
- FIG. 10 is a flowchart illustrating an utterance processing operation of an intelligent server according to an embodiment of the present disclosure.
- operations 1010 to 1060 may be performed by the processor 520 of the intelligent server 200 described above with reference to FIG. 5, and for concise description, content overlapping with what has been described with reference to FIGS. 1 to 9 may be omitted.
- the processor 520 may receive a target utterance from any one of one or more electronic devices 101, 102, and 104. For example, as described with reference to FIG. 7 , the processor 520 may receive a target utterance “play the music to the maximum”.
- the processor 520 may generate one or more combinations of electronic device information and domain information capable of processing the target utterance based on the target utterance and the context information 540.
- the processor 520 may generate electronic device-domain combinations capable of processing the target utterance "play the music to the maximum", such as a smart refrigerator-music app, a smartphone-music app, a smart air conditioner-music app, and an intelligent speaker-music app.
- the processor 520 may determine reference information for calculating a quality of service (QoS) score from among the context information 540 based on the target utterance. For example, as described with reference to FIG. 7, for "play the music to the maximum", the processor 520 may determine, as reference information, information about whether the electronic device is a professional device and information on the maximum volume of the electronic device among the electronic device information 541 of the context information 540, and information on whether the domain has an amplification function among the capsule information 543 (or corresponding domain information).
- the processor 520 may calculate a QoS score for each of the one or more electronic device information-domain information combinations with reference to the reference information. For example, as described with reference to FIG. 7, the processor 520 may calculate service quality scores for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations by referring to the information on whether the electronic device is a professional device, the maximum volume information, and the domain amplification function information among the context information 540. As described with reference to FIG. 5, in operation 1040, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate a QoS score as the sum of these scores.
- the processor 520 may determine a target combination including a target electronic device and a target domain based on the QoS score. For example, as described with reference to FIG. 7 , the processor 520 may determine that the QoS score of the smartphone-music app combination is the highest.
- the processor 520 may transmit a command to process the target utterance to the target domain to the target electronic device. For example, as described with reference to FIG. 7 , the processor 520 may transmit a command to process a target utterance of “play music to maximum” to the smartphone 101 through a music app. In the smart phone 101, music may be played at maximum through a music app.
- on-device artificial intelligence (AI) capable of processing a user utterance without communication with the intelligent server 200 may be installed in the electronic device 101; for example, the on-device AI may have a configuration identical or similar to that of the natural language platform 220 and the capsule database 230 of the intelligent server 200.
- the processor 120 may receive a target utterance from the user, determine a target combination composed of a target electronic device and a target domain to process the target utterance as in operations 1020 to 1050, and, as in operation 1060, transmit to the target electronic device a command to process the target utterance in the target domain.
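- Operations 1010 to 1060 can be sketched as one pipeline; the data model, the callback functions, and the returned command dictionary are assumptions for illustration (the real server transmits the command to the target device over a network).

```python
# Hypothetical end-to-end sketch of operations 1010 to 1060.

def process_utterance(utterance, context, can_handle, reference_fn, score_fn):
    # op 1020: combinations of device and domain that can process the utterance
    combos = [(dev, dom) for dev, info in context.items()
              for dom in info["domains"] if can_handle(utterance, dev, dom)]
    reference = reference_fn(utterance, context)            # op 1030
    scores = {c: score_fn(c, reference) for c in combos}    # op 1040
    device, domain = max(scores, key=scores.get)            # op 1050
    return {"device": device, "domain": domain,             # op 1060: command
            "utterance": utterance}

context = {
    "smartphone":          {"domains": ["music app"], "max_volume": 10, "gain": 1.5},
    "intelligent speaker": {"domains": ["music app"], "max_volume": 10, "gain": 1.0},
}
command = process_utterance(
    "play the music to the maximum", context,
    can_handle=lambda u, dev, dom: dom == "music app",
    reference_fn=lambda u, ctx: {d: i["max_volume"] * i["gain"] for d, i in ctx.items()},
    score_fn=lambda combo, ref: ref[combo[0]],
)
print(command["device"])
```

Because the capability check, reference extraction, and scoring are passed in as functions, the same pipeline shape covers the FIG. 6 to FIG. 9 scenarios with different reference information.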
- the intelligent server 200 for processing a user utterance includes a memory 530 storing computer-executable instructions and context information 540 including information 541 for each of one or more electronic devices 101, 102, and 104 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; and a processor 520 that accesses the memory 530 and executes the instructions, wherein the instructions are configured to process the target utterance based on the target utterance received from any one of the one or more electronic devices 101, 102, and 104 and the context information 540.
- the instructions may be configured to determine, as reference information, information on whether the electronic device is a professional device and current volume information of the electronic device when the target utterance is an utterance related to music reproduction.
- the instructions may be configured to determine, as reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function when the target utterance is an utterance for reproduction at a maximum volume.
- the instructions may be configured to determine, as reference information, information on the sound quality of the electronic device and information on the sound quality of the domain when the target utterance is an utterance about sound quality.
- the context information 540 may include permanent context information that does not change in real time and instant context information that changes in real time.
- the permanent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether the one or more electronic devices 101, 102, and 104 are professional devices, and performance information of the one or more domains, and the instant context information may include at least one of user preference information of the one or more domains, execution history information of the one or more domains, and utterance history information received by the one or more electronic devices 101, 102, and 104.
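- One way to model the split between permanent and instant context information is with two plain records; the field set mirrors the lists above, while the class shapes themselves are assumptions.

```python
# Hypothetical sketch of the two kinds of context information 540.
from dataclasses import dataclass, field

@dataclass
class PermanentContext:          # does not change in real time
    network_info: str = ""
    account_info: list = field(default_factory=list)
    is_professional_device: bool = False
    domain_performance: dict = field(default_factory=dict)

@dataclass
class InstantContext:            # changes in real time
    domain_preferences: dict = field(default_factory=dict)
    execution_history: list = field(default_factory=list)
    utterance_history: list = field(default_factory=list)

speaker = PermanentContext(is_professional_device=True)
print(speaker.is_professional_device)
```

Separating the two kinds of information lets the slow-changing part be cached while the instant part is refreshed per utterance.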
- the domain is software capable of processing speech through a corresponding electronic device, and may include at least one of an application, a program that provides a service in the form of a widget, and a webapp.
- the instructions may be configured to calculate the service quality score as a sum of a controllability score, a functionality score, an accessibility score, and a function performance stability score for each of the one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104.
- a method of processing a user utterance in the intelligent server 200 includes receiving a target utterance from any one of one or more electronic devices 101, 102, and 104; generating, based on the target utterance and context information 540, one or more combinations consisting of electronic device information 541 capable of processing the target utterance and domain information 543, the context information 540 including information 541 for each of the one or more electronic devices 101, 102, and 104 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; determining reference information for processing of the target utterance among the context information 540; calculating a quality of service score for each of the one or more combinations by referring to the reference information; determining a target combination composed of a target electronic device and a target domain corresponding to the target electronic device based on the QoS score; and transmitting, to the target electronic device, a command to process the target utterance in the target domain.
- Determining the reference information may include determining, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device when the target utterance is an utterance related to music reproduction.
- Determining the reference information may include determining, as the reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function when the target utterance is an utterance for reproduction at the maximum volume.
- Determining the reference information may include determining, as the reference information, information on the sound quality of the electronic device and information on the sound quality of the domain when the target utterance is an utterance about sound quality.
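The three reference-information claims above share one shape: which context fields matter depends on what the utterance asks for. A hedged sketch of that dispatch, with the utterance categories taken from the claims but every field name invented for illustration, might look like:

```python
# Hypothetical dispatch from utterance type to the context fields used as
# reference information. Categories mirror the claims; field names are
# illustrative only, not from the patent.
def reference_fields(utterance_type: str) -> list[str]:
    if utterance_type == "music_playback":
        return ["is_professional_device", "current_volume"]
    if utterance_type == "max_volume_playback":
        return ["is_professional_device", "max_volume",
                "domain_has_amplification"]
    if utterance_type == "sound_quality":
        return ["device_sound_quality", "domain_sound_quality"]
    return []  # other utterance types would select other reference information
```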
- The context information 540 may include persistent context information that does not change in real time and instant context information that changes in real time.
- The persistent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether the one or more electronic devices 101, 102, and 104 are professional devices, and performance information of the one or more domains.
- the instant context information may include at least one of user preference information of one or more domains, execution history information of one or more domains, and utterance history information received by one or more electronic devices 101, 102, and 104.
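The persistent/instant split described above maps naturally onto two data structures. The sketch below is only an assumed shape: the field names paraphrase the information categories listed in the claims and are not defined by the patent.

```python
# Sketch of the two claimed context categories: persistent context that does
# not change in real time vs. instant context that does. Field names are
# illustrative paraphrases of the claim language.
from dataclasses import dataclass, field

@dataclass
class PersistentContext:
    network_info: str = ""                    # device network information
    account_info: str = ""                    # device account information
    is_professional_device: bool = False      # professional-device flag
    domain_performance: dict = field(default_factory=dict)  # per-domain perf

@dataclass
class InstantContext:
    domain_preferences: dict = field(default_factory=dict)  # user preferences
    execution_history: list = field(default_factory=list)   # domain executions
    utterance_history: list = field(default_factory=list)   # received utterances
```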
- The electronic device 101 processing a user utterance includes: a memory 130 storing computer-executable instructions and context information 540, the context information 540 including information 541 on each of one or more electronic devices 101, 102, and 104 including the electronic device 101 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; and a processor 120 that accesses the memory 130 and executes the instructions, wherein the instructions process the target utterance based on the target utterance received by the electronic device 101 and the context information 540.
- The instructions may be configured to determine, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device when the target utterance is an utterance related to music reproduction.
- The instructions may be configured to determine, as the reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function when the target utterance is an utterance for reproduction at the maximum volume.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
An intelligent server for processing a user utterance is disclosed. The intelligent server may include: a memory storing context information comprising information on domains corresponding to electronic devices and information on each of the electronic devices; and a processor that: generates, based on the context information and a target utterance received from one or more electronic devices, combinations of domain information and information on an electronic device capable of processing the target utterance; determines, from the context information, reference information for processing the target utterance; calculates a quality of service score for each of the combinations with reference to the reference information; determines a target combination of a target electronic device and a target domain corresponding to the target electronic device with reference to the quality of service score; and transmits, to the target electronic device, a command to process the target utterance using the target domain.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/891,676 US20230095294A1 (en) | 2021-09-24 | 2022-08-19 | Server and electronic device for processing user utterance and operating method thereof |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020210126219A KR20230043397A (ko) | 2021-09-24 | 2021-09-24 | 사용자 발화를 처리하는 서버, 전자 장치 및 그의 동작 방법 |
| KR10-2021-0126219 | 2021-09-24 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/891,676 Continuation US20230095294A1 (en) | 2021-09-24 | 2022-08-19 | Server and electronic device for processing user utterance and operating method thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023048379A1 (fr) | 2023-03-30 |
Family
ID=85720860
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2022/010924 Ceased WO2023048379A1 (fr) | 2021-09-24 | 2022-07-26 | Serveur et dispositif électronique pour traiter un énoncé d'utilisateur, et son procédé de fonctionnement |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20230043397A (fr) |
| WO (1) | WO2023048379A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118433144A (zh) * | 2024-07-04 | 2024-08-02 | 阿里健康科技(杭州)有限公司 | 目标即时通信消息的确定方法及相关装置 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20110066357A (ko) * | 2009-12-11 | 2011-06-17 | 삼성전자주식회사 | 대화 시스템 및 그의 대화 방법 |
| US20190180750A1 (en) * | 2014-10-01 | 2019-06-13 | XBrain, Inc. | Voice and Connection Platform |
| US20200279556A1 (en) * | 2010-01-18 | 2020-09-03 | Apple Inc. | Task flow identification based on user intent |
| KR20210036527A (ko) * | 2019-09-26 | 2021-04-05 | 삼성전자주식회사 | 사용자 발화를 처리하는 전자 장치 및 그 작동 방법 |
| KR20210112403A (ko) * | 2019-02-06 | 2021-09-14 | 구글 엘엘씨 | 클라이언트-컴퓨팅된 콘텐츠 메타데이터에 기반한 음성 질의 QoS |
- 2021
  - 2021-09-24 KR KR1020210126219A patent/KR20230043397A/ko active Pending
- 2022
  - 2022-07-26 WO PCT/KR2022/010924 patent/WO2023048379A1/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| KR20230043397A (ko) | 2023-03-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022177164A1 (fr) | Electronic device and method for providing on-device artificial intelligence service | |
| WO2023017975A1 (fr) | Electronic device for outputting voice command processing result following a state change, and operating method therefor | |
| WO2023048379A1 (fr) | Server and electronic device for processing user utterance, and operating method therefor | |
| WO2022220559A1 (fr) | Electronic device for processing user utterance and control method therefor | |
| WO2023177079A1 (fr) | Server and electronic device for processing user speech on the basis of a synthetic vector, and operating method therefor | |
| WO2023158076A1 (fr) | Electronic device and utterance processing method therefor | |
| WO2024043729A1 (fr) | Electronic device and method for processing a response to a user by the electronic device | |
| WO2024063507A1 (fr) | Electronic device and method for processing a user utterance of an electronic device | |
| WO2023022381A1 (fr) | Electronic device and speech processing method of the electronic device | |
| WO2022191395A1 (fr) | Apparatus for processing a user instruction and operating method therefor | |
| WO2022182038A1 (fr) | Device and method for processing a voice command | |
| WO2022139420A1 (fr) | Electronic device and method for sharing execution information of an electronic device regarding user input with continuity | |
| WO2022025448A1 (fr) | Electronic device and operating method therefor | |
| WO2022163963A1 (fr) | Electronic device and method for performing a shortcut instruction of an electronic device | |
| WO2023008798A1 (fr) | Electronic device for managing inappropriate responses and operating method therefor | |
| WO2024058524A1 (fr) | Method for determining false rejection and electronic device for implementing same | |
| WO2024029850A1 (fr) | Method and electronic device for processing a user utterance on the basis of a language model | |
| WO2022177165A1 (fr) | Electronic device and method for analyzing a speech recognition result | |
| WO2025005554A1 (fr) | Method for obtaining user information and electronic device executing said method | |
| WO2022191425A1 (fr) | Electronic device for applying a visual effect to dialogue text and control method therefor | |
| WO2023043025A1 (fr) | Method for processing an incomplete continuous utterance, and server and electronic device for performing same | |
| WO2025080098A1 (fr) | Electronic device and method for processing a user utterance | |
| WO2023106862A1 (fr) | Electronic device and method for operating an electronic device | |
| WO2023132470A1 (fr) | Server and electronic device for processing a user utterance, and operating method therefor | |
| WO2023008819A1 (fr) | Electronic device and operating method therefor | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22873085 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 22873085 Country of ref document: EP Kind code of ref document: A1 |