CN110136713A - User dialogue method and system in multimodal interaction

Info

Publication number: CN110136713A
Application number: CN201910396697.1A
Authority: CN
Inventors: 龚建明; 朱成亚; 甘津瑞
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2019-08-16

Abstract

The present invention discloses a user dialogue method in multimodal interaction, comprising: establishing a semantic correspondence; establishing a semantic instruction set; obtaining a current dialogue segment through an interactive device, the interactive device being capable of multimodal information input; invoking the semantic correspondence according to the current dialogue segment to obtain current semantic information; invoking the semantic instruction set according to the current semantic information to obtain a current local execution instruction; and executing the current local execution instruction and displaying the execution result on the user dialogue terminal. The fast response is achieved by local dialogue customization and by synchronizing locally cached data to the cloud service. When a round of dialogue has ended, or the user switches topic (for example from weather to music) and then wants to return to a weather-related topic, the dialogue can continue because the client synchronizes the dialogue state to the cloud service.

Description

User dialogue method and system in multimodal interaction

Technical field

The invention belongs to the technical field of user interaction and multimodality, providing a multimodal interaction experience across voice, screen, camera and the like, and particularly relates to a user dialogue method and system in multimodal interaction.

Background

In intelligent interaction and human-machine dialogue, the client performs multimodal interaction, which here includes voice interaction, text interaction and event interaction. All of these interaction modes must exchange data with the dialogue server over the network. Compared with voice interaction, however, a multimodal operation is a relatively simple action of which the user has an intuitive sense: for example, after performing a touch-screen operation on the interface, the user expects correct and more timely feedback. When the network environment is poor, network interaction is inevitably affected, especially in the navigation field, so the user experience is greatly reduced.

To address the above problems, the prior art generally first converts the multimodal operation into text of the same meaning and transmits it over the network to the cloud, where it goes through recognition, semantic parsing and the dialogue flow. The client then needs to support interworking between multimodal operations and the voice interaction flow; competing products generally support only one kind of action, for example smart-speaker products, which only need to support voice interaction and are used in scenarios with a relatively stable network environment, such as the home. Finally, the client needs to have offline semantic parsing.

Summary of the invention

Embodiments of the present invention provide a user dialogue method and system in multimodal interaction, to solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a user dialogue method in multimodal interaction, including:

Step S101, establishing a semantic correspondence, the semantic correspondence being a one-to-one correspondence between a plurality of dialogue segments and a plurality of pieces of semantic information;

Step S102, establishing a semantic instruction set, the semantic instruction set being the set of associations established between the plurality of pieces of semantic information and a plurality of local execution instructions;

Step S103, obtaining a current dialogue segment through an interactive device, the interactive device being capable of multimodal information input;

Step S104, invoking the semantic correspondence according to the current dialogue segment to obtain current semantic information;

Step S105, invoking the semantic instruction set according to the current semantic information to obtain a current local execution instruction;

Step S106, executing the current local execution instruction and displaying the execution result on the user dialogue terminal, the user dialogue terminal being the user's local terminal.

In another embodiment of the present invention, step S103 includes:

Step S1031a, obtaining current voice information and current text information through the interactive device;

Step S1032a, converting the current voice information into a current text field by speech recognition;

Step S1033a, determining the current text information or the current text field as the current dialogue segment.

In another embodiment of the present invention, the interactive device includes a touch screen capable of serving as an input device;

Step S103 includes:

Establishing an event correspondence, the event correspondence being a one-to-one correspondence between a plurality of pieces of event operation information and a plurality of dialogue segments, the event operation information being the information the touch screen can capture when the user single-taps, double-taps, slides along a single-line trajectory, or slides along a double-line trajectory on the touch screen;

Step S1031b, receiving current event operation information through the interactive device, the current event operation information being included in the event operation information;

Step S1032b, invoking the event correspondence according to the current event operation information to obtain the current dialogue segment.

In another embodiment of the present invention, the plurality of pieces of semantic information include: a character field for receiving retrieval characters, a character field for invoking a map, a character field for invoking a system operation, and character fields for screen zoom-in and screen zoom-out;

The plurality of local execution instructions include: retrieving and displaying local data, retrieving and displaying a local map, displaying a local dialog box, executing a system call, an instruction to enlarge the image of a local screen region, and an instruction to reduce the image of a local screen region.

In another embodiment of the present invention, the plurality of local execution instructions include execution instructions that call local data and execution instructions that call remote data;

Step S106 includes:

Step S1061, if the current local execution instruction is an execution instruction that calls local data, calling the current local data, executing the current local execution instruction according to the current local data, and displaying the execution result on the user dialogue terminal; and

Step S1062, if the current local execution instruction is an execution instruction that calls remote data, establishing a remote data link, calling the current remote data remotely, executing the current local execution instruction according to the current remote data, and displaying the execution result on the user dialogue terminal.

In another embodiment of the present invention, step S1062 further includes: if the remote data link cannot be established, periodically checking for a remote network signal; if a remote network signal is detected, establishing the remote data link.

In a second aspect, an embodiment of the present invention provides a user dialogue system in multimodal interaction, including:

A semantic correspondence unit, configured to establish the semantic correspondence, which is a one-to-one correspondence between a plurality of dialogue segments and a plurality of pieces of semantic information.

A semantic instruction set unit, configured to establish associations between the plurality of pieces of semantic information and a plurality of local execution instructions, the set of these associations being the semantic instruction set.

A current dialogue segment obtaining unit, configured to obtain the current dialogue segment through an interactive device, the interactive device being capable of multimodal information input.

A current semantic information obtaining unit, configured to invoke the semantic correspondence in the semantic correspondence unit according to the current dialogue segment to obtain the current semantic information.

A current local execution instruction obtaining unit, configured to invoke the semantic instruction set in the semantic instruction set unit according to the current semantic information to obtain the current local execution instruction.

A dialogue unit, configured to execute the current local execution instruction and display the execution result on the user dialogue terminal, the user dialogue terminal being the user's local terminal.

In another embodiment of the present invention, the current dialogue segment obtaining unit is further configured to: obtain current voice information and current text information through the interactive device; convert the current voice information into a current text field by speech recognition; and determine the current text information or the current text field as the current dialogue segment.

The current dialogue segment obtaining unit is further configured to: establish an event correspondence, the event correspondence being a one-to-one correspondence between a plurality of pieces of event operation information and a plurality of dialogue segments, the event operation information being the information the touch screen can capture when the user single-taps, double-taps, slides along a single-line trajectory, or slides along a double-line trajectory on the touch screen; receive current event operation information through the interactive device, the current event operation information being included in the event operation information; and invoke the event correspondence according to the current event operation information to obtain the current dialogue segment.

In another embodiment of the present invention, the plurality of local execution instructions include execution instructions that call local data and execution instructions that call remote data;

The dialogue unit is further configured to: if the current local execution instruction is an execution instruction that calls local data, call the current local data, execute the current local execution instruction according to the current local data, and display the execution result on the user dialogue terminal; and if the current local execution instruction is an execution instruction that calls remote data, establish a remote data link, call the current remote data remotely, execute the current local execution instruction according to the current remote data, and display the execution result on the user dialogue terminal.

In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the method of any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer program product, the computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the steps of the method of any embodiment of the present invention.

In the user dialogue method in multimodal interaction of this application, a semantic instruction set established locally is used to parse the multimodal information input locally by the user (voice interaction information, text interaction information and event interaction information) and obtain the user's current corresponding instruction. The corresponding instruction is executed on the user side, so the user can perform multimodal operations to meet interaction needs without the system responding slowly because of network delay or similar causes, which would degrade the user experience. The system responds quickly because the above solution provides local dialogue customization and synchronizes locally cached data to the cloud service. When a round of dialogue has ended, or the user switches topic (for example from weather to music) and then wants to return to a weather-related topic, the dialogue can continue because the above solution supports the client synchronizing the dialogue state to the cloud service.

Description of drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a user dialogue method in multimodal interaction according to an embodiment of the present invention;

FIG. 2 is a flowchart of another user dialogue method in multimodal interaction according to an embodiment of the present invention;

FIG. 3 is a flowchart of yet another user dialogue method in multimodal interaction according to an embodiment of the present invention;

FIG. 4 is a flowchart of still another user dialogue method in multimodal interaction according to an embodiment of the present invention;

FIG. 5 is a structural diagram of a user dialogue system in multimodal interaction according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the composition of a user dialogue system in multimodal interaction according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

In an embodiment of the present invention, as shown in FIG. 1, a user dialogue method in multimodal interaction is disclosed, including the following steps:

Step S101, establishing a semantic correspondence.

In this step, a semantic correspondence is established. The semantic correspondence is a one-to-one correspondence between a plurality of dialogue segments and a plurality of pieces of semantic information. The dialogue segments are the segments that the user may input.

Step S102, establishing a semantic instruction set.

In this step, a semantic instruction set is established. The semantic instruction set is the set of associations established between the plurality of pieces of semantic information and a plurality of local execution instructions.

Step S103, obtaining the current dialogue segment.

In this step, the current dialogue segment is obtained through an interactive device capable of multimodal information input. The plurality of pieces of semantic information include a character field for receiving retrieval characters, a character field for invoking a map, a character field for invoking a system operation, and character fields for screen zoom-in and screen zoom-out. The plurality of local execution instructions include: retrieving and displaying local data, retrieving and displaying a local map, displaying a local dialog box, executing a system call, an instruction to enlarge the image of a local screen region, and an instruction to reduce the image of a local screen region.

Step S104, obtaining the current semantic information.

In this step, the semantic correspondence is invoked according to the current dialogue segment to obtain the current semantic information.

Step S105, obtaining the current local execution instruction.

In this step, the semantic instruction set is invoked according to the current semantic information to obtain the current local execution instruction.

Step S106, carrying out the interaction.

In this step, the current local execution instruction is executed and the execution result is displayed on the user dialogue terminal. The user dialogue terminal is the user's local terminal.
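Steps S101 to S106 amount to two local table lookups followed by a local call. The following Python fragment is a minimal sketch of that flow, not the patent's implementation: the table entries, the handler functions and the name `handle_dialogue_segment` are all hypothetical, and matching is assumed to be an exact, case-insensitive lookup.

```python
from typing import Callable, Dict

# S101: semantic correspondence (dialogue segment -> semantic information).
# Every entry here is a hypothetical example, not taken from the patent.
SEMANTIC_CORRESPONDENCE: Dict[str, str] = {
    "open the map": "invoke_map",
    "zoom in": "screen_zoom_in",
    "zoom out": "screen_zoom_out",
}


def show_local_map() -> str:
    return "local map displayed"


def zoom_in() -> str:
    return "screen region enlarged"


def zoom_out() -> str:
    return "screen region reduced"


# S102: semantic instruction set (semantic information -> local execution instruction).
SEMANTIC_INSTRUCTION_SET: Dict[str, Callable[[], str]] = {
    "invoke_map": show_local_map,
    "screen_zoom_in": zoom_in,
    "screen_zoom_out": zoom_out,
}


def handle_dialogue_segment(segment: str) -> str:
    """Run S104 to S106 for a dialogue segment already obtained in S103."""
    # S104: look up the semantic information for the segment.
    semantic_info = SEMANTIC_CORRESPONDENCE.get(segment.strip().lower())
    if semantic_info is None:
        return "no local match"
    # S105: look up the local execution instruction.
    instruction = SEMANTIC_INSTRUCTION_SET[semantic_info]
    # S106: execute locally; the returned string stands in for the displayed result.
    return instruction()
```

Because both tables live on the client, no network round trip is involved in handling a segment that has a local match.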

In another preferred embodiment of the user dialogue method in multimodal interaction of the present invention, as shown in FIG. 2, step S103 includes:

Step S1031a, obtaining voice information.

In this step, the current voice information and the current text information are obtained through the interactive device.

Step S1032a, performing text conversion.

In this step, the current voice information is converted into the current text field by speech recognition.

Step S1033a, determining the dialogue segment.

In this step, the current text information or the current text field is determined as the current dialogue segment.
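Steps S1031a to S1033a can be sketched as a small selection function. This is an illustration under stated assumptions: the text does not say which input wins when both voice and typed text are present, so the sketch arbitrarily prefers typed text, and `placeholder_recognizer` is a stand-in for a real speech recognizer rather than any actual API.

```python
from typing import Callable, Optional


def placeholder_recognizer(audio: bytes) -> str:
    # Stand-in for speech recognition (S1032a); a real system would call an
    # ASR engine here. Decoding bytes as UTF-8 is purely for illustration.
    return audio.decode("utf-8")


def current_dialogue_segment(
    voice: Optional[bytes],
    text: Optional[str],
    recognize: Callable[[bytes], str] = placeholder_recognizer,
) -> Optional[str]:
    """Pick the current dialogue segment from voice and/or text input (S1033a)."""
    if text:
        # Assumption: typed text takes priority when both inputs are present.
        return text
    if voice:
        return recognize(voice)
    return None
```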

In another preferred embodiment of the user dialogue method in multimodal interaction of the present invention, as shown in FIG. 3, the interactive device includes a touch screen capable of serving as an input device.

Step S103 includes:

Establishing an event correspondence. The event correspondence is a one-to-one correspondence between a plurality of pieces of event operation information and a plurality of dialogue segments. The event operation information is the information the touch screen can capture when the user single-taps, double-taps, slides along a single-line trajectory, or slides along a double-line trajectory on the touch screen.

Step S1031b, receiving the current event operation.

In this step, the current event operation information is received through the interactive device. The current event operation information is included in the event operation information.

Step S1032b, determining the dialogue segment corresponding to the event.

In this step, the event correspondence is invoked according to the current event operation information to obtain the current dialogue segment.
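The event correspondence can be pictured as a one-to-one table from touch events to dialogue segments. In the sketch below, the four enum members mirror the four gestures named above, while the segment strings assigned to them and the function name `segment_for_event` are hypothetical choices made for illustration.

```python
from enum import Enum
from typing import Dict, Optional


class TouchEvent(Enum):
    SINGLE_TAP = "single_tap"
    DOUBLE_TAP = "double_tap"
    SINGLE_TRACK_SLIDE = "single_track_slide"
    DOUBLE_TRACK_SLIDE = "double_track_slide"


# Event correspondence: each event maps to exactly one dialogue segment.
# The segment strings are assumptions, not values from the patent.
EVENT_CORRESPONDENCE: Dict[TouchEvent, str] = {
    TouchEvent.SINGLE_TAP: "select",
    TouchEvent.DOUBLE_TAP: "zoom in",
    TouchEvent.SINGLE_TRACK_SLIDE: "next page",
    TouchEvent.DOUBLE_TRACK_SLIDE: "zoom out",
}


def segment_for_event(event: TouchEvent) -> Optional[str]:
    """S1032b: turn the received event (S1031b) into a dialogue segment."""
    return EVENT_CORRESPONDENCE.get(event)
```

The resulting segment can then be fed through the same semantic correspondence and semantic instruction set as spoken or typed input, which is what lets touch and voice share one dialogue flow.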

In another preferred embodiment of the user dialogue method in multimodal interaction of the present invention, as shown in FIG. 4, the plurality of local execution instructions include execution instructions that call local data and execution instructions that call remote data.

Step S106 includes:

Step S1061, calling local data.

In this step, if the current local execution instruction is an execution instruction that calls local data, the current local data is called, the current local execution instruction is executed according to the current local data, and the execution result is displayed on the user dialogue terminal; and

Step S1062, calling remote data.

In this step, if the current local execution instruction is an execution instruction that calls remote data, a remote data link is established, the current remote data is called remotely, the current local execution instruction is executed according to the current remote data, and the execution result is displayed on the user dialogue terminal.

Step S1062 further includes: if the remote data link cannot be established, periodically checking for a remote network signal; if a remote network signal is detected, establishing the remote data link.
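The branch between S1061 and S1062, including the periodic signal check, might look like the following sketch. The polling interval, the cap on the number of checks and all function names are assumptions; the text only says that the network signal is checked periodically and that the link is established once a signal is detected.

```python
import time
from typing import Callable


def fetch_remote_data(
    link_available: Callable[[], bool],
    fetch: Callable[[], str],
    interval_s: float = 1.0,
    max_checks: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """S1062: check for a network signal at a fixed interval, then fetch."""
    for check in range(max_checks):
        if link_available():
            return fetch()
        if check < max_checks - 1:
            sleep(interval_s)
    # Assumption: give up after max_checks; the patent does not set a limit.
    raise ConnectionError("remote data link could not be established")


def execute_instruction(uses_remote_data: bool, local_data: str, **remote_kwargs) -> str:
    """Execute with local data (S1061) or remote data (S1062) and return the result."""
    data = fetch_remote_data(**remote_kwargs) if uses_remote_data else local_data
    return f"displayed: {data}"
```

Passing `sleep` as a parameter keeps the polling loop testable without real delays.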

In an embodiment of the present invention, as shown in FIG. 5, a user dialogue method in multimodal interaction is disclosed, including the following steps:

The English abbreviations appearing in the present invention are defined as follows:

1. Usr: user.

2. Sys: system.

3. AVS: Alexa Voice Service, Amazon's intelligent dialogue service.

4. DuerOS: Baidu AI open platform.

Step 1:

When the user performs a multimodal operation, for example manually tapping the screen to carry out an action, a local program is invoked to generate processable information, for example text.

Step 2:

The client scheduling module passes the above text to the local semantic parsing module for offline processing, obtaining semantic data that the dialogue can process. Processing here means converting the valid values, according to the relevant protocol, into semantic data that the dialogue can process; the semantic data is JSON-format data conforming to a given protocol.
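As an illustration of Step 2, the sketch below turns locally generated text into JSON semantic data. The patent does not publish its protocol, so the field names `input`, `intent` and `slots`, the table contents and the function name `parse_offline` are all hypothetical.

```python
import json
from typing import Optional

# Hypothetical offline parsing table. The "intent" and "slots" field names
# are illustrative only and are not taken from the patent's protocol.
OFFLINE_PARSE_TABLE = {
    "the first one": {"intent": "select_item", "slots": {"index": 1}},
    "confirm": {"intent": "confirm", "slots": {}},
}


def parse_offline(text: str) -> Optional[str]:
    """Return JSON semantic data for the text, or None if it cannot be parsed locally."""
    entry = OFFLINE_PARSE_TABLE.get(text.strip().lower())
    if entry is None:
        return None
    return json.dumps({"input": text, **entry}, ensure_ascii=False)
```

A `None` result is one plausible signal that local interaction cannot satisfy the request and the cloud dialogue is needed.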

Step 3:

The client scheduling module applies in-depth customization to the above semantic data, for example customizing the dialogue reply text or the command to be executed locally.

Step 4:

The client scheduling module obtains the data returned by the customization and executes it directly offline, while caching locally the data obtained in the above steps.

If necessary, the client can repeat the above steps locally;

Step 5:

When a dialogue with the cloud service is needed, that is, when local interaction cannot meet the user's needs, the cloud dialogue must be used. To keep the dialogue consistent from turn to turn, the cache of the previous rounds of dialogue needs to be synchronized to the server.

The client first synchronizes the locally cached data to the cloud dialogue service and then proceeds with the normal voice dialogue. At this point the cloud scheduling service holds the information for the entire dialogue flow, even though earlier turns took place without the cloud. Synchronization can happen in real time, or it can be configured for certain scenarios: for example, when the network is poor and transmission fails, the data can be cached locally first and synchronized to the cloud later.
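The caching in Step 4 and the synchronization in Step 5 can be sketched as a small client-side buffer. The class name `DialogueCache`, the turn format and the `upload` callback are hypothetical; the only behaviour taken from the text is that turns handled offline are kept locally, sent to the cloud before the cloud dialogue continues, and retained if transmission fails.

```python
from typing import Callable, Dict, List


class DialogueCache:
    """Holds dialogue turns handled offline until they can be sent to the cloud."""

    def __init__(self) -> None:
        self._pending: List[Dict[str, str]] = []

    def record(self, role: str, content: str) -> None:
        """Cache one turn (Step 4). The role/content layout is an assumption."""
        self._pending.append({"role": role, "content": content})

    def pending(self) -> List[Dict[str, str]]:
        return list(self._pending)

    def sync(self, upload: Callable[[List[Dict[str, str]]], bool]) -> bool:
        """Upload all cached turns in order (Step 5); keep them if the upload fails."""
        if not self._pending:
            return True
        if upload(self.pending()):
            self._pending.clear()
            return True
        return False
```

A client would call `sync` before forwarding a new utterance to the cloud, so that the cloud side sees the offline turns first.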

During normal voice interaction, whether in the first round of dialogue or a later one, the client may need to perform a multimodal operation, as in the following usage scenario:

usr: Voice input: "Navigate to Suzhou City Center Plaza".

sys: The navigation interface displays a list of addresses and announces: "I found several addresses for you, please choose one". Meanwhile, the car enters a tunnel or another place where the network is unstable.

usr: Manually taps the "first" option on the screen.

sys: Displays the first entry in the list and announces: "Navigating to Suzhou City Center Plaza for you, do you confirm?" Meanwhile, the car returns to a place where the network is stable.

usr: Voice input: "Confirm".

sys: Navigating to Suzhou City Center Plaza for you.

As an embodiment of the present invention, the following alternative is provided:

The multimodal operation is performed locally and a "selection intent" is generated directly on the client, bypassing the semantic parsing process. The data is then transmitted to the cloud, where the cloud dialogue performs the dialogue processing and returns data to the client for corresponding handling. Here, a selection intent means that the relevant data stream is generated according to the protocol. The advantage of this alternative is that the tap response time is optimized to some extent, and the cloud dialogue needs no additional compatibility handling because it reuses the cloud's existing processing flow. However, it involves network operations, so the user experience is poor when the network is unstable.

Another embodiment of the present invention is:

1. The client downloads the dialogue data to the local device.

2. The user performs a multimodal operation, which skips the semantic processing and directly triggers a dialogue intent that is handed to the offline dialogue for processing.

3. The client scheduling module obtains the data produced by the offline dialogue processing and performs the corresponding operation.

The advantages of the above solution are:

1. Under existing technical conditions, offline interaction for voice-related flows is immature. This solution directly addresses the shortcomings of voice dialogue in scenarios where the network environment is unstable and improves the user experience;

2. It provides a solution in which multimodal operations such as touch screen and gestures work together with voice interaction, improving the user experience.

A further advantage of this scheme is:

1. It can support requirements such as dialogue recovery. For example:

Usr: What's the weather like in Suzhou?

Sys: The weather in Suzhou is very good.

At this point the dialogue between the user and the system has ended, so the system no longer turns on the microphone to wait for the user's voice input. Three minutes later, the user, who is about to leave on a business trip to Beijing, wakes up the system.

Usr: What about Beijing?

At this point, the client can first synchronize its cached data for the weather topic from the previous rounds of dialogue to the cloud service. The cloud service then holds the information from those previous rounds, so the dialogue recovery function can be realized.

Sys: The weather in Beijing is cloudy.
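The recovery flow in this example can be sketched as follows. The classes `CloudDialogueService` and `Client`, the turn layout (`topic` plus `slots`), and the rule that the cloud reuses the most recent topic are all assumptions made for illustration; the patent states only that the client synchronizes its cached dialogue state to the cloud before the follow-up is handled. For brevity the sketch synchronizes the whole cache rather than only the weather-related entries.

```python
from typing import Any, Dict, List, Optional

Turn = Dict[str, Any]


class CloudDialogueService:
    """Hypothetical cloud side: it can only resume what has been synced to it."""

    def __init__(self) -> None:
        self.history: List[Turn] = []

    def sync(self, turns: List[Turn]) -> None:
        # Add only the turns the cloud does not already hold.
        for turn in turns:
            if turn not in self.history:
                self.history.append(turn)

    def resolve(self, slots: Dict[str, str]) -> Optional[Turn]:
        # A follow-up such as "What about Beijing?" carries no topic of its own,
        # so the topic is taken from the most recent synced turn.
        if not self.history:
            return None
        last = self.history[-1]
        return {"topic": last["topic"], "slots": {**last["slots"], **slots}}


class Client:
    """Hypothetical client that caches dialogue turns locally."""

    def __init__(self, cloud: CloudDialogueService) -> None:
        self.cloud = cloud
        self.cache: List[Turn] = []

    def record(self, topic: str, slots: Dict[str, str]) -> None:
        self.cache.append({"topic": topic, "slots": dict(slots)})

    def follow_up(self, slots: Dict[str, str]) -> Optional[Turn]:
        # Synchronize the local cache first, then let the cloud resolve the turn.
        self.cloud.sync(self.cache)
        return self.cloud.resolve(slots)
```

With this sketch, recording a weather turn for Suzhou and then calling `follow_up({"city": "Beijing"})` yields a weather query for Beijing, whereas the cloud service alone, without the synchronization step, has nothing to resume from.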

The present invention also provides a user dialogue system in multimodal interaction which, as shown in Figure 6, includes:

A semantic correspondence unit 101, configured to establish a semantic correspondence, the semantic correspondence being a one-to-one correspondence between multiple dialogue segments and multiple pieces of semantic information.

A semantic instruction set unit 201, configured to establish a semantic instruction set, that is, the set formed by the associations between the multiple pieces of semantic information and multiple local execution instructions.

A current dialogue segment acquisition unit 301, configured to acquire the current dialogue segment through an interactive device, the interactive device being one capable of multimodal information input.

A current semantic information acquisition unit 401, configured to call the semantic correspondence in the semantic correspondence unit 101 according to the current dialogue segment and obtain the current semantic information.

A current local execution instruction acquisition unit 501, configured to call the semantic instruction set in the semantic instruction set unit 201 according to the current semantic information and obtain the current local execution instruction.

A dialogue unit 601, configured to execute the current local execution instruction and display the execution result on the user dialogue terminal. The user dialogue terminal is the user's local terminal.
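How units 101 to 601 chain together can be shown with a small sketch. The two lookup tables, the semantic labels, and the `handle` function are hypothetical; the text specifies only that a dialogue segment maps one-to-one to semantic information, which in turn is associated with a local execution instruction whose result is shown on the local terminal.

```python
from typing import Callable, Dict, Optional

# Unit 101: one-to-one mapping from dialogue segments to semantic information
# (hypothetical entries).
SEMANTIC_CORRESPONDENCE: Dict[str, str] = {
    "open map": "CALL_MAP",
    "zoom in": "SCREEN_ZOOM_IN",
}

# Unit 201: semantic information associated with local execution instructions
# (hypothetical entries; each instruction returns the text to display).
SEMANTIC_INSTRUCTION_SET: Dict[str, Callable[[], str]] = {
    "CALL_MAP": lambda: "local map displayed",
    "SCREEN_ZOOM_IN": lambda: "screen region enlarged",
}


def handle(segment: str) -> Optional[str]:
    """Units 401, 501 and 601 in sequence for one current dialogue segment."""
    semantic = SEMANTIC_CORRESPONDENCE.get(segment)   # unit 401
    if semantic is None:
        return None
    instruction = SEMANTIC_INSTRUCTION_SET[semantic]  # unit 501
    return instruction()                              # unit 601: execute locally
```

Both lookups are plain dictionary accesses, which is one way to keep the whole path on the local device; a segment with no entry returns `None` here, a case the text does not address.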

In a preferred embodiment of the present invention, the current dialogue segment acquisition unit 301 is further configured to: acquire current voice information and current text information through the interactive device; convert the current voice information into a current text field through speech recognition; and determine the current text information or the current text field as the current dialogue segment.

In a preferred embodiment of the present invention, the interactive device includes a touch screen that can serve as an input device.

The current dialogue segment acquisition unit 301 is further configured to: establish an event correspondence, the event correspondence being a one-to-one correspondence between multiple pieces of event operation information and multiple dialogue segments, where the event operation information is the information the touch screen can capture when the user single-taps, double-taps, slides along a single-line trajectory, or slides along a double-line trajectory on the touch screen; receive current event operation information through the interactive device, the current event operation information being one of the multiple pieces of event operation information; and call the event correspondence according to the current event operation information to obtain the current dialogue segment.
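The event correspondence described above can be sketched as a lookup table keyed by the four touch events the text names. The enum `TouchEvent`, the dialogue segments chosen as values, and the function `segment_for_event` are illustrative assumptions, not part of the patent.

```python
from enum import Enum
from typing import Dict


class TouchEvent(Enum):
    """The four touch operations named in the text."""
    SINGLE_TAP = "single_tap"
    DOUBLE_TAP = "double_tap"
    SINGLE_LINE_SLIDE = "single_line_slide"
    DOUBLE_LINE_SLIDE = "double_line_slide"


# Hypothetical one-to-one event correspondence: each event yields exactly one
# dialogue segment, and no two events share a segment.
EVENT_CORRESPONDENCE: Dict[TouchEvent, str] = {
    TouchEvent.SINGLE_TAP: "select this item",
    TouchEvent.DOUBLE_TAP: "zoom in",
    TouchEvent.SINGLE_LINE_SLIDE: "next page",
    TouchEvent.DOUBLE_LINE_SLIDE: "zoom out",
}


def segment_for_event(event: TouchEvent) -> str:
    """Turn the current event operation information into a dialogue segment."""
    return EVENT_CORRESPONDENCE[event]
```

The resulting segment can then be passed through the same semantic lookup as a spoken or typed segment, which is what allows a touch operation to take part in the dialogue.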

In a preferred embodiment of the present invention, the multiple local execution instructions include execution instructions that call local data and execution instructions that call remote data.

The dialogue unit 601 is further configured to: if the current local execution instruction is an execution instruction that calls local data, call the current local data, execute the current local execution instruction according to the current local data, and display the execution result on the user dialogue terminal; and, if the current local execution instruction is an execution instruction that calls remote data, establish a remote data link, call the current remote data from the remote end, execute the current local execution instruction according to the current remote data, and display the execution result on the user dialogue terminal.
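The branch between local and remote data can be sketched as below. The `Instruction` dataclass, the `LOCAL_DATA` store, and the `fetch_remote` callable standing in for the remote data link are hypothetical names introduced for this sketch; the text does not specify how the remote link is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Instruction:
    """A local execution instruction and the data it needs (hypothetical shape)."""
    name: str
    data_key: str
    uses_remote_data: bool


# Hypothetical data held on the user's local terminal.
LOCAL_DATA: Dict[str, str] = {"favorites": "Home, Office"}


def execute(instruction: Instruction, fetch_remote: Callable[[str], str]) -> str:
    """Execute on the local terminal, pulling data from the appropriate source."""
    if instruction.uses_remote_data:
        # Remote branch: fetch_remote stands in for establishing the remote
        # data link and calling the remote data.
        data = fetch_remote(instruction.data_key)
    else:
        # Local branch: no network access is needed.
        data = LOCAL_DATA[instruction.data_key]
    return f"{instruction.name}: {data}"
```

In both branches the instruction itself runs on the local terminal; only the source of the data differs, so instructions that use local data remain available when the network is poor.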

In other embodiments, the embodiments of the present invention further provide a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the speech signal processing and use method of any of the foregoing method embodiments.

As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions that are configured to:

establish a semantic correspondence, the semantic correspondence being a one-to-one correspondence between multiple dialogue segments and multiple pieces of semantic information;

establish a semantic instruction set, the semantic instruction set being the set formed by the associations between the multiple pieces of semantic information and multiple local execution instructions;

acquire the current dialogue segment through an interactive device, the interactive device being one capable of multimodal information input;

call the semantic correspondence according to the current dialogue segment to obtain the current semantic information;

call the semantic instruction set according to the current semantic information to obtain the current local execution instruction;

execute the current local execution instruction and display the execution result on the user dialogue terminal, the user dialogue terminal being the user's local terminal.

The non-volatile computer-readable storage medium can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the speech signal processing method in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the speech signal processing method of any of the above method embodiments.

The non-volatile computer-readable storage medium may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the user dialogue method in multimodal interaction. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory may be connected through a network to the apparatus implementing the user dialogue method in multimodal interaction. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

An embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above speech signal processing methods.

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 7, the device includes one or more processors 710 and a memory 720; FIG. 7 takes one processor 710 as an example. The device for the speech signal processing method may further include an input device 730 and an output device 740. The processor 710, the memory 720, the input device 730, and the output device 740 may be connected by a bus or in other ways; FIG. 7 takes connection by a bus as an example. The memory 720 is the non-volatile computer-readable storage medium described above. By running the non-volatile software programs, instructions, and modules stored in the memory 720, the processor 710 executes the various functional applications and data processing of the server, that is, it implements the user dialogue method in multimodal interaction of the above method embodiments. The input device 730 may receive input numeric or character information and generate key signal inputs related to the user settings and function control of the information delivery device. The output device 740 may include a display device such as a display screen.

The above product can execute the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to that method. For technical details not described exhaustively in this embodiment, refer to the method provided by the embodiments of the present invention.

As one embodiment, the above electronic device can be applied to an intelligent voice dialogue platform and includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smartphones (for example, the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example, the iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements for processing capability, stability, reliability, security, scalability, and manageability are higher.

(5) Other electronic devices with data interaction functions.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or of certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A user dialogue method in multimodal interaction, including:

Step S101, establishing a semantic correspondence, where the semantic correspondence is a one-to-one correspondence between multiple dialogue segments and multiple pieces of semantic information;

Step S102, establishing a semantic instruction set, where the semantic instruction set is the set formed by the associations between the multiple pieces of semantic information and multiple local execution instructions;

Step S103, obtaining the current dialogue segment through an interactive device, where the interactive device is one capable of multimodal information input;

Step S104, calling the semantic correspondence according to the current dialogue segment to obtain current semantic information;

Step S105, calling the semantic instruction set according to the current semantic information to obtain the current local execution instruction;

Step S106, executing the current local execution instruction and displaying the execution result on the user dialogue terminal, where the user dialogue terminal is the user's local terminal.

2. The dialogue method according to claim 1, wherein the step S103 comprises:

Step S1031a, obtaining current voice information and current text information through the interactive device;

Step S1032a, converting the current voice information into a current text field by voice recognition;

Step S1033a, determining the current text information or the current text field as the current dialogue segment.

3. The dialogue method according to claim 1, wherein the interaction device comprises a touch screen capable of serving as an input device;

The step S103 includes:

establishing an event correspondence, where the event correspondence is a one-to-one correspondence between multiple pieces of event operation information and multiple dialogue segments, and the multiple pieces of event operation information are the information the touch screen can capture when the user single-taps or double-taps the touch screen, or slides along a single-line trajectory or a double-line trajectory on it;

Step S1031b, receiving current event operation information through the interactive device, where the current event operation information is included in the event operation information;

Step S1032b, calling the event correspondence according to the current event operation information to obtain the current dialogue segment.

4. The dialogue method according to claim 1, wherein the multiple pieces of semantic information include: a character field for receiving a retrieval character, a character field for calling a map, a character field for calling a system operation, a screen zoom-in character field, and a screen zoom-out character field;

The multiple local execution instructions include: retrieving and displaying local data, retrieving and displaying a local map, displaying a local dialogue box, executing a system call, an instruction for enlarging the image of a partial area of the screen, and an instruction for reducing the image of a partial area of the screen.

5. The dialogue method according to claim 1, wherein the plurality of local execution instructions include an execution instruction for calling local data and an execution instruction for calling remote data;

The step S106 includes:

Step S1061, if the current local execution instruction is an execution instruction that calls local data, calling the current local data, executing the current local execution instruction according to the current local data, and displaying the execution result on the user dialogue terminal; and

Step S1062, if the current local execution instruction is an execution instruction that calls remote data, establishing a remote data link, calling the current remote data from the remote end, executing the current local execution instruction according to the current remote data, and displaying the execution result on the user dialogue terminal.

6. The dialogue method according to claim 5, wherein step S1062 further comprises:

if the remote data link cannot be established, periodically detecting the remote network signal;

if a remote network signal is detected, establishing the remote data link.

7. A user dialogue system in multimodal interaction, including:

a semantic correspondence unit, configured to establish a semantic correspondence, the semantic correspondence being a one-to-one correspondence between multiple dialogue segments and multiple pieces of semantic information;

a semantic instruction set unit, configured to establish a semantic instruction set, the semantic instruction set being the set formed by the associations between the multiple pieces of semantic information and multiple local execution instructions;

a current dialogue segment acquisition unit, configured to acquire the current dialogue segment through an interactive device, the interactive device being one capable of multimodal information input;

a current semantic information acquisition unit, configured to call the semantic correspondence in the semantic correspondence unit according to the current dialogue segment to obtain the current semantic information;

a current local execution instruction acquisition unit, configured to call the semantic instruction set in the semantic instruction set unit according to the current semantic information to obtain the current local execution instruction;

a dialogue unit, configured to execute the current local execution instruction and display the execution result on the user dialogue terminal, the user dialogue terminal being the user's local terminal.

8. The dialogue system according to claim 7, wherein the current dialogue segment acquisition unit is further configured to: acquire current voice information and current text information through the interactive device; convert the current voice information into a current text field through speech recognition; and determine the current text information or the current text field as the current dialogue segment.

9. The dialog system of claim 7, wherein the interaction device comprises a touch screen capable of serving as an input device;

The current dialogue segment acquisition unit is further configured to: establish an event correspondence, the event correspondence being a one-to-one correspondence between multiple pieces of event operation information and multiple dialogue segments, where the multiple pieces of event operation information are the information the touch screen can capture when the user single-taps, double-taps, slides along a single-line trajectory, or slides along a double-line trajectory on the touch screen; receive current event operation information through the interactive device, the current event operation information being one of the multiple pieces of event operation information; and call the event correspondence according to the current event operation information to obtain the current dialogue segment.

10. The dialogue system according to claim 7, wherein the plurality of local execution instructions include an execution instruction for calling local data and an execution instruction for calling remote data;

The dialogue unit is further configured to: if the current local execution instruction is an execution instruction that calls local data, call the current local data, execute the current local execution instruction according to the current local data, and display the execution result on the user dialogue terminal; and, if the current local execution instruction is an execution instruction that calls remote data, establish a remote data link, call the current remote data from the remote end, execute the current local execution instruction according to the current remote data, and display the execution result on the user dialogue terminal.