CN108877792A - Method, apparatus, electronic device, and computer-readable storage medium for processing voice dialogue - Google Patents
Method, apparatus, electronic device, and computer-readable storage medium for processing voice dialogue
- Publication number
- CN108877792A (application CN201810541680.6A)
- Authority
- CN
- China
- Prior art keywords
- reply
- speech
- recognition result
- voice
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
According to example embodiments of the present disclosure, a method, an apparatus, an electronic device, and a computer-readable storage medium for processing a voice dialogue are provided. The method includes providing, in response to receiving a first speech from a user, a first reply to the first speech, where the first reply is generated based on a first recognition result of the first speech. The method also includes receiving a second speech from the user, where the second speech is used to correct or supplement the first recognition result. The method further includes generating, based on the first speech and the second speech, a second reply to be provided to the user, where the second reply matches the user's intention better than the first reply. According to embodiments of the present disclosure, when a chatbot cannot accurately recognize the content of the user's speech because of a speech recognition anomaly, the user can actively correct or supplement it through further voice dialogue, so that the anomaly in speech recognition can be resolved.
Description
Technical Field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly to methods, apparatuses, electronic devices, and computer-readable storage media for processing voice dialogues.
Background
In recent years, the concept of "Conversation as a Platform" has gained wide acceptance, and a growing number of online products and applications have adopted conversational human-computer interaction. A chatbot is a computer program or piece of software that interacts with humans through text, voice, or images; it can understand what the user says and respond automatically. To some extent, chatbots can replace human agents in conversation, and they can be integrated into dialogue systems as automatic online assistants for scenarios such as casual chat, customer service, and information inquiry.
Voice dialogue is a common form of human-computer interaction. Compared with text dialogue, voice dialogue additionally involves processing of the speech signal, such as front-end recognition, speech recognition, and speech synthesis. Because the dialogue system operates on the speech recognition output, it places high demands on recognition accuracy. Application scenarios of voice dialogue include intelligent voice assistants, smart speakers, in-vehicle navigation, and so on.
Summary
According to example embodiments of the present disclosure, a method, an apparatus, an electronic device, and a computer-readable storage medium for processing a voice dialogue are provided.
In a first aspect of the present disclosure, a method for processing a voice dialogue is provided. The method includes: in response to receiving a first speech from a user, providing a first reply to the first speech, where the first reply is generated based on a first recognition result of the first speech; receiving a second speech from the user, where the second speech is used to correct or supplement the first recognition result; and generating, based on the first speech and the second speech, a second reply to be provided to the user, where the second reply matches the user's intention better than the first reply.
In a second aspect of the present disclosure, an apparatus for processing a voice dialogue is provided. The apparatus includes: a first providing module configured to provide, in response to receiving a first speech from a user, a first reply to the first speech, where the first reply is generated based on a first recognition result of the first speech; a speech receiving module configured to receive a second speech from the user, where the second speech is used to correct or supplement the first recognition result; and a second providing module configured to generate, based on the first speech and the second speech, a second reply to be provided to the user, where the second reply matches the user's intention better than the first reply.
In a third aspect of the present disclosure, an electronic device is provided, which includes one or more processors and a storage device for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement the methods or processes according to embodiments of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored. The program, when executed by a processor, implements the methods or processes according to embodiments of the present disclosure.
It should be understood that the content described in this Summary is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, identical or similar reference numerals denote identical or similar elements, in which:
FIG. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
FIG. 2 shows a flowchart of a method for processing a voice dialogue according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a process for processing voice messages according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of a method for resolving character recognition errors through dialogue according to an embodiment of the present disclosure;
FIG. 5 shows a flowchart of a method for resolving digit recognition errors through dialogue according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of a method for supplementing a recognition result through dialogue according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an apparatus for processing a voice dialogue according to an embodiment of the present disclosure; and
FIG. 8 shows a block diagram of an electronic device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "include" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions may also be included below.
In voice dialogue scenarios, environmental noise or the user's accent often leads to speech recognition anomalies (such as recognition errors or failures to recognize). To address this problem, one approach is to improve the accuracy of speech recognition itself, and another is to increase the error tolerance of semantic understanding. However, even with both of these improvements, there are still scenarios in which an accurate conversation cannot be carried out because of speech recognition anomalies. In general, semantic understanding is based on the speech recognition result, and when a chatbot fails to recognize, or misrecognizes, the user's intention, unexpected consequences may follow.
Embodiments of the present disclosure propose a solution for processing voice dialogues. According to embodiments of the present disclosure, when a chatbot misrecognizes or fails to recognize the content of the user's speech because of a speech recognition anomaly, the user can actively correct or supplement it through further voice dialogue. Therefore, a semantic understanding platform according to embodiments of the present disclosure can resolve speech recognition anomalies and thereby improve the user experience during a chat. Some example embodiments of the present disclosure are described in detail below with reference to FIGS. 1-8.
FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In the example environment 100, a user 110 is having a voice dialogue with a chatbot 120 (also referred to as a "chat engine"). Optionally, the user 110 may be local to the chatbot 120, that is, the user 110 may talk to the chatbot 120 directly. Alternatively, the user 110 may use a local device (such as a laptop computer, a desktop computer, a smartphone, or a tablet) to hold the voice dialogue with the chatbot 120 over a network. It should be understood that the chatbot 120 may be deployed on a local electronic device, in the cloud, or in a distributed manner.
Referring to FIG. 1, the user 110 sends a speech 121 (referred to as the "first speech") to the chatbot 120; the chatbot 120 processes the speech 121 and provides a corresponding reply 122 (referred to as the "first reply") to the user 110. At this point, the first round of dialogue between the user 110 and the chatbot 120 is complete. In some embodiments, the text of the speech 121 may also be displayed on the display of the user device at the same time, so that the user has a clearer view of the current dialogue content.
In embodiments of the present disclosure, when the reply 122 fails to meet the needs of the user 110 (for example, a recognition error in the speech 121 prevents the chatbot 120 from accurately recognizing the intention of the user 110), the user can send a further speech 123 (referred to as the "second speech") to the chatbot 120 for correction or supplementation; the chatbot 120 processes the speech 123 and provides a corresponding reply 124 (referred to as the "second reply") to the user 110. According to embodiments of the present disclosure, because the speech 123 corrects or supplements the recognition result of the speech 121, the chatbot 120 can, by combining the speeches 121 and 123, recognize the intention of the user 110 more accurately. Table 1 below shows an example voice dialogue in which the recognition result of the first speech is corrected.
Table 1
For example, the recognition result of the speech 121 of the user 110 is "Find the contact information of 王峰 (Wang Feng)", and the chatbot 120 generates the corresponding reply 122 "Looking up the contact information of 王峰, please wait". Since the user 110 actually meant "Find the contact information of 王丰" (a homophone of 王峰), the user corrects the recognition result of the speech 121 in the speech 123 by saying "是丰收的丰" ("it is 丰 as in 丰收, harvest"). Based on the corrected content, the chatbot 120 generates the reply 124 "Looking up the contact information of 王丰, please wait". In this way, the chatbot 120 can accurately recognize the real intention of the user 110.
FIG. 2 shows a flowchart of a method 200 for processing a voice dialogue according to an embodiment of the present disclosure. It should be understood that the method 200 may be performed by the chatbot 120 described above with reference to FIG. 1.
At block 202, in response to receiving a first speech from a user, a first reply to the first speech is provided, where the first reply is generated based on a first recognition result of the first speech. For example, after the chatbot 120 receives the speech 121 from the user 110, it provides the corresponding reply 122 to the user 110. In embodiments of the present disclosure, the first reply fails to accurately capture the user's intention because of a speech recognition anomaly; for example, it may be a wrong reply, or a "not recognized" prompt asking the user to repeat.
In some embodiments, the reply 122 may be provided to the user 110 by voice only. In other embodiments, to give the user a more intuitive view of how the speech was recognized, the recognition result of the speech 121 and the text of the reply 122 may also be presented visually (for example, on a display device).
At block 204, a second speech is received from the user, where the second speech is used to correct or supplement the first recognition result. For example, because the reply 122 (the first reply) fails to correctly capture the user's intention, the chatbot 120 further receives the speech 123 from the user 110 for correction or supplementation. That is, when the chatbot 120 misrecognizes or fails to recognize the content of the speech of the user 110 due to a speech recognition error, the user 110 can actively clarify through further voice dialogue. For example, the user can actively correct or actively supplement, including correcting one or more characters and/or digits by voice, or adding information by voice.
At block 206, based on the first speech and the second speech, a second reply to be provided to the user is generated, where the second reply matches the user's intention better than the first reply. For example, based on the recognition result of the speech 123 together with the recognition result of the speech 121, the chatbot 120 provides the reply 124 to the user 110. Because the speech 123 corrects or supplements the recognition result of the speech 121 and thus allows the chatbot 120 to better understand the intention of the user 110, the reply 124 (the second reply) matches the intention of the user 110 better than the reply 122 (the first reply), which resolves the anomaly in speech recognition and improves the user experience of the chat.
In some embodiments, if the second reply has correctly captured the user's intention, an action associated with the second reply may be executed. For example, because the second speech corrects or supplements the first recognition result of the first speech and thereby enables the chatbot to accurately recognize the user's intention, the chatbot may execute, or instruct the execution of, the action associated with the second reply, such as placing a phone call or starting map navigation. In some embodiments, if no further speech is received from the user within a threshold time after the second reply is provided, it may be assumed by default that the second reply matches the user's intention, and the action associated with the second reply may then be executed directly. It should be understood that the action associated with the second reply may also be executed directly while, before, or after the second reply is generated, regardless of whether the second reply already matches the user's intention.
In some embodiments, if the second reply still fails to capture the user's intention, a third speech may be received from the user. Then, based at least in part on the third speech, a third reply is provided to the user. For example, although the second reply matches the user's intention better than the first reply, it may still not fully satisfy the user's needs. In that case, the user can utter further speech to continue correcting or supplementing the previous recognition results.
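As a minimal sketch (not part of the disclosure), the two-turn flow of blocks 202-206 can be outlined as follows. The class `DialogueSession` and the helper `merge_results` are hypothetical names introduced only for this example, and the merging logic is a trivial placeholder rather than the patented method.

```python
from dataclasses import dataclass, field
from typing import List

# Toy sketch of method 200 (blocks 202-206). The ASR step is omitted; the
# functions below receive already-recognized text.
@dataclass
class DialogueSession:
    history: List[str] = field(default_factory=list)   # recognition results so far

    def first_reply(self, first_result: str) -> str:
        # Block 202: reply generated from the first recognition result only.
        self.history.append(first_result)
        return f"You said: '{first_result}'."

    def second_reply(self, second_result: str) -> str:
        # Blocks 204-206: the second speech corrects or supplements the first
        # recognition result, so both utterances are combined before replying.
        first_result = self.history[-1]
        merged = merge_results(first_result, second_result)
        self.history.append(merged)
        return f"Understood: '{merged}'."

def merge_results(first: str, second: str) -> str:
    # Placeholder merge: treat the second utterance as a supplement by default.
    return f"{first} {second}".strip()

session = DialogueSession()
print(session.first_reply("I want to go to Peking University"))
print(session.second_reply("the west gate"))
```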
FIG. 3 shows a schematic diagram of a process 300 for processing voice messages according to an embodiment of the present disclosure. It should be understood that the process 300 may be performed by the chatbot 120 described above with reference to FIG. 1, and the process 300 may be an example implementation of providing a reply based on received speech as described above with reference to FIG. 2.
At block 302, speech from the user is input, and at block 304 the input speech is converted into text through automatic speech recognition (ASR). At block 306, the text is converted through natural language understanding (NLU) into a representation the computer can work with. At block 308, the intent and slots in the text are extracted and integrated with the historical dialogue state through dialogue state tracking (DST). At block 310, based on the current dialogue state, the action that best fits the current state is selected through action candidate ranking. After the action is obtained, natural language is generated (NLG) at block 312, and the generated natural language is synthesized into speech (TTS) at block 314. Then, at block 316, the speech is output to the user. In the process 300, blocks 302, 304, 314, and 316 involve speech processing, while blocks 306, 308, 310, and 312 involve natural language processing; dialogue state tracking and action candidate ranking together constitute dialogue management, which generates the action to be executed based on the semantic representation of the speech and the current context, and updates the context.
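The chain of blocks 302-316 can be outlined as below. This is an illustrative sketch only: every stage function (`asr`, `nlu`, `dst`, `rank_actions`, `nlg`, `tts`) is a stand-in stub assumed for the example, not an actual ASR, NLU, dialogue-management, NLG, or TTS component.

```python
# Outline of the ASR -> NLU -> DST -> ranking -> NLG -> TTS chain of process 300.
# Every stage is a stub that merely tags its input, so the example runs end to end.
def asr(audio: bytes) -> str:               # block 304: speech to text
    return audio.decode("utf-8")

def nlu(text: str) -> dict:                 # block 306: text to intent + slots
    return {"intent": "navigate", "slots": {"destination": text}}

def dst(frame: dict, state: dict) -> dict:  # block 308: merge with dialogue history
    state.update(frame["slots"])
    return state

def rank_actions(state: dict) -> str:       # block 310: pick the best-fitting action
    return f"navigate_to:{state.get('destination', '?')}"

def nlg(action: str) -> str:                # block 312: action to natural language
    return f"Starting {action.replace(':', ' ')}."

def tts(text: str) -> bytes:                # block 314: text to speech (stubbed)
    return text.encode("utf-8")

state: dict = {}
audio_in = "West Gate of Peking University".encode("utf-8")          # block 302
audio_out = tts(nlg(rank_actions(dst(nlu(asr(audio_in)), state))))   # block 316
print(audio_out.decode("utf-8"))
```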
In some embodiments, to give the user a better interactive experience, the speech recognition result may be shown on a display device (for example, the display of the user device). In that case, the recognition result of the speech and its reply may be presented on the display device at the same time, so that the user knows how the speech was recognized.
FIG. 4 shows a flowchart of a method 400 for resolving character recognition errors through dialogue according to an embodiment of the present disclosure. It should be understood that the method 400 may be performed by the chatbot 120 described above with reference to FIG. 1; block 402 may be an example implementation of block 204 described above with reference to FIG. 2, and blocks 404 and 406 may be example implementations of block 206 described above with reference to FIG. 2.
At block 402, a second speech for correcting one or more characters in the first recognition result is received, where the first recognition result of the first speech contains a character recognition error. At block 404, the first recognition result is corrected using the second recognition result of the second speech. At block 406, based on the corrected first recognition result, the second reply is presented through the display device. For example, Tables 2-3 below show example voice dialogues in which one or more character errors in the first recognition result are corrected.
Table 2
Table 3
In the example of Table 2, the user corrects in the second speech a single wrong character, "奇", in the recognition result of the first speech; in the example of Table 3, the user corrects in the second speech multiple wrong characters, "习" and "奇", in the recognition result of the first speech. The chatbot then provides the second reply based on the corrected recognition result; in the examples of Tables 2 and 3, the corrected recognition result is "我要去西二旗" ("I want to go to Xi'erqi"). It should be understood that although Chinese characters are corrected in the embodiments of Tables 2-3 above, embodiments of the present disclosure can also be used to correct speech recognition errors in other languages.
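A minimal sketch of the kind of correction in block 404 is shown below, assuming the clarification takes the form "是<word>的<char>" ("it is <char> as in <word>"). The parsing pattern and the small homophone table are assumptions made for this example and are not the algorithm disclosed in the patent.

```python
import re

# Illustrative sketch of block 404: applying a spoken clarification such as
# "是丰收的丰" ("it is 丰 as in 丰收") to the previous recognition result.
# The homophone table below is a toy assumption for the example.
HOMOPHONES = {"丰": ["峰", "锋", "风"], "旗": ["奇", "齐"], "西": ["习", "席"]}

def apply_character_correction(first_result: str, clarification: str) -> str:
    match = re.search(r"是.*的(.)", clarification)
    if not match:
        return first_result
    correct_char = match.group(1)
    corrected = first_result
    # Replace any homophone of the clarified character found in the first result.
    for wrong_char in HOMOPHONES.get(correct_char, []):
        corrected = corrected.replace(wrong_char, correct_char)
    return corrected

print(apply_character_correction("查找王峰的联系方式", "是丰收的丰"))
# -> 查找王丰的联系方式
```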
FIG. 5 shows a flowchart of a method 500 for resolving digit recognition errors through dialogue according to an embodiment of the present disclosure. It should be understood that the method 500 may be performed by the chatbot 120 described above with reference to FIG. 1; block 502 may be an example implementation of block 204 described above with reference to FIG. 2, and blocks 504 and 506 may be example implementations of block 206 described above with reference to FIG. 2.
At block 502, a second speech for correcting one or more digits in the first recognition result is received, where the first recognition result of the first speech contains a digit recognition error. At block 504, the first recognition result is corrected using one or more digits in the second recognition result of the second speech. At block 506, based on the corrected first recognition result, the second reply is presented through the display device. For example, Table 4 below shows an example voice dialogue in which a digit error in the first recognition result is corrected.
Table 4
In the example of Table 4, the user corrects in the second speech a single wrong digit, "6", in the recognition result of the first speech; that is, the sixth digit of the phone number should be 0 rather than 6. In some embodiments, while the user is correcting digits, the portion of digits the user intends to correct may be determined based on the maximum match between the second speech and the recognition result of the first speech. For example, in the example of Table 4, the digits "110" match "116" in the recognition result most closely, so it can be determined that the digits "110" are meant to correct "116" from the previous round of dialogue. In some embodiments, the user can also correct both wrong characters and wrong digits in the speech recognition result at the same time.
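A minimal sketch of this maximum-match digit correction is given below; the per-position match score and the substitution policy are assumptions for the example, not necessarily the matching used in the disclosure.

```python
# Illustrative sketch of blocks 502-506: find the substring of the previously
# recognized number that best matches the digits just spoken, and replace it.
def correct_digits(first_digits: str, spoken_digits: str) -> str:
    n = len(spoken_digits)
    if n == 0 or n > len(first_digits):
        return first_digits
    best_start, best_score = 0, -1
    # Slide a window of the spoken length over the first result and score it
    # by the number of positions whose digits already agree.
    for start in range(len(first_digits) - n + 1):
        window = first_digits[start:start + n]
        score = sum(a == b for a, b in zip(window, spoken_digits))
        if score > best_score:
            best_start, best_score = start, score
    return first_digits[:best_start] + spoken_digits + first_digits[best_start + n:]

print(correct_digits("13811678900", "110"))   # "116" -> "110": 13811078900
```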
The inventors of the present application have observed that some speech recognition errors can be discovered only through a visual display (for example, character recognition errors caused by homophones), while other speech recognition errors can also be discovered by voice alone, for example when one or more digits of a phone number are recognized incorrectly. In some embodiments, the recognition result of the first speech and the first reply may not be shown on a display device, and the first reply may be provided by voice only. In that case, based on a group of digits read aloud in the first reply, the user can utter a speech for correcting one or more digits in the group. The chatbot then provides, based on the corrected group of digits, a second reply that better matches the user's intention. It should be understood that the scenario in which the recognition result is not displayed may be one without a display (for example, a smart speaker without a screen) or one with a display that is not used to show the speech recognition result (for example, a voice dialogue while the smartphone screen is off).
FIG. 6 shows a flowchart of a method 600 for supplementing a recognition result through dialogue according to an embodiment of the present disclosure. It should be understood that the method 600 may be performed by the chatbot 120 described above with reference to FIG. 1; block 602 may be an example implementation of block 204 described above with reference to FIG. 2, and blocks 604 and 606 may be example implementations of block 206 described above with reference to FIG. 2.
At block 602, a second speech for supplementing the first recognition result is received. For example, the first recognition result of the first speech does not fully reflect the user's needs. At block 604, in response to the content of the second speech semantically supplementing the content of the first speech, the first recognition result and the second recognition result of the second speech are combined to generate a third recognition result.
Whether the contents of two speeches have a supplementary or explanatory relationship can be determined in various ways. For example, in some embodiments, it may be determined whether the content of the second speech can semantically continue the content of the first speech. If the contents of the two speeches can be parsed together as a whole, the two can be considered to have a semantic continuation relationship, and on that basis the content of the second speech can be judged to be a supplement to the first speech. Alternatively or additionally, in some embodiments, it may be determined whether the probability that the contents of the two speeches occur together is greater than a predetermined threshold. For example, if the contents of the two speeches co-occur most of the time, the content of the second speech can be judged to be a supplement to the first speech.
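The co-occurrence test can be sketched as follows, assuming some corpus phrase counts are available; the counts and the 0.5 threshold are placeholders invented for the example, not values given in the disclosure.

```python
# Illustrative sketch of block 604: decide whether the second utterance
# supplements the first by checking how often the combined phrase occurs
# relative to the first phrase alone. PHRASE_COUNTS is a toy stand-in for
# real corpus statistics.
PHRASE_COUNTS = {"北京大学": 100, "北京大学 西门": 60, "北京大学 今天": 2}

def is_supplement(first: str, second: str, threshold: float = 0.5) -> bool:
    combined = f"{first} {second}"
    base = PHRASE_COUNTS.get(first, 0)
    if base == 0:
        return False
    co_occurrence = PHRASE_COUNTS.get(combined, 0) / base
    return co_occurrence > threshold

def merge_if_supplement(first: str, second: str) -> str:
    # Combine the two recognition results into a third one (block 604) when
    # the second is judged to supplement the first.
    return f"{first}{second}" if is_supplement(first, second) else second

print(merge_if_supplement("北京大学", "西门"))   # -> 北京大学西门
```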
At block 606, the second reply is presented through the display device based on the third recognition result. For example, Table 5 below shows an example dialogue in which the first recognition result is supplemented.
Table 5
In the example of Table 5, the user's intention is to go to the West Gate of Peking University (北京大学西门), but the speech recognition system starts recognizing as soon as it has received "北京大学" ("Peking University") and therefore fails to capture the user's exact intention. In that case, the user can supplement the information through natural-language dialogue, the recognition results of the two speeches are combined into a new recognition result, "我要去北京大学西门" ("I want to go to the West Gate of Peking University"), and a corresponding reply is then generated based on the new recognition result. In some embodiments, for an incompletely expressed recognition result, the user can supplement the information directly, without waiting for the recognition and parsing results to be returned and without restating the content of the previous dialogue.
FIG. 7 shows a block diagram of an apparatus 700 for processing a voice dialogue according to an embodiment of the present disclosure. As shown in FIG. 7, the apparatus 700 includes a first providing module 710, a speech receiving module 720, and a second providing module 730. The first providing module 710 is configured to provide, in response to receiving a first speech from a user, a first reply to the first speech, where the first reply is generated based on a first recognition result of the first speech. The speech receiving module 720 is configured to receive a second speech from the user, where the second speech is used to correct or supplement the first recognition result. The second providing module 730 is configured to generate, based on the first speech and the second speech, a second reply to be provided to the user, where the second reply matches the user's intention better than the first reply.
In some embodiments, the first providing module 710 includes a first presenting module configured to present the first recognition result and the first reply through a display device.
In some embodiments, the second providing module 730 includes a first correcting module configured to correct a character recognition error in the first recognition result using the second recognition result of the second speech.
In some embodiments, the second providing module 730 includes a second correcting module configured to correct a digit recognition error in the first recognition result using one or more digits in the second recognition result of the second speech.
In some embodiments, the second speech is used to supplement the first recognition result, and the second providing module 730 includes: a combining module configured to combine, in response to determining that the content of the second speech semantically supplements the content of the first speech, the first recognition result and the second recognition result of the second speech to generate a third recognition result; and a second presenting module configured to present the second reply through a display device based on the third recognition result.
In some embodiments, the first providing module 710 includes a first voice providing module configured to provide the first reply by voice only, where the first reply includes a group of digits from the first recognition result.
In some embodiments, the second speech is used to correct one or more digits in the group of digits, and the second providing module 730 includes: a third correcting module configured to correct the group of digits using one or more digits in the second recognition result of the second speech; and a second voice providing module configured to provide the second reply by voice based on the corrected group of digits.
In some embodiments, the apparatus 700 further includes an action executing module configured to execute, in response to the second reply having captured the user's intention, an action associated with the second reply.
In some embodiments, the apparatus 700 further includes: a third speech receiving module configured to receive a third speech from the user in response to the second reply still failing to capture the user's intention; and a third providing module configured to provide a third reply to the user based at least in part on the third speech.
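For orientation only, the module structure of the apparatus 700 can be sketched as a single class with three methods; the bodies below are trivial placeholders, not the patented implementation.

```python
# Structural outline of the apparatus 700 as three cooperating modules.
class VoiceDialogueApparatus:
    def __init__(self) -> None:
        self.last_result = ""

    def provide_first_reply(self, first_result: str) -> str:      # first providing module 710
        self.last_result = first_result
        return f"Reply based on: {first_result}"

    def receive_second_speech(self, second_result: str) -> str:   # speech receiving module 720
        # The second speech corrects or supplements the stored first result.
        return second_result

    def provide_second_reply(self, second_result: str) -> str:    # second providing module 730
        merged = f"{self.last_result} {second_result}".strip()
        return f"Reply based on: {merged}"
```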
It should be understood that the first providing module 710, the speech receiving module 720, and the second providing module 730 shown in FIG. 7 may be included in the chatbot 120 described with reference to FIG. 1. Moreover, it should be understood that the modules shown in FIG. 7 may perform the steps or actions of the methods or processes of the embodiments of the present disclosure.
FIG. 8 shows a schematic block diagram of an example device 800 that can be used to implement embodiments of the present disclosure. It should be understood that the device 800 may be used to implement the apparatus 700 for processing a voice dialogue or the chatbot 120 described in the present disclosure. As shown, the device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A plurality of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard or a mouse; an output unit 807 such as various types of displays or speakers; a storage unit 808 such as a magnetic disk or an optical disc; and a communication unit 809 such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 801 performs the methods and processes described above, such as the method 200, the process 300, the method 400, the method 500, and the method 600. For example, in some embodiments, the method 200, the process 300, the method 400, the method 500, and the method 600 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions or steps of the method 200, the process 300, the method 400, the method 500, and the method 600 described above may be performed. Alternatively, in other embodiments, the CPU 801 may be configured to perform the method 200, the process 300, the method 400, the method 500, and the method 600 in any other suitable manner (for example, by means of firmware).
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the actions or steps are depicted in a particular order, this should not be understood as requiring that such actions or steps be performed in the particular order shown or in a sequential order, or that all illustrated actions or steps be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations, separately or in any suitable sub-combination.
Although the embodiments of the present disclosure have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810541680.6A CN108877792B (en) | 2018-05-30 | 2018-05-30 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810541680.6A CN108877792B (en) | 2018-05-30 | 2018-05-30 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN108877792A | 2018-11-23 |
| CN108877792B (en) | 2023-10-24 |
Family
ID=64335845
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810541680.6A Active CN108877792B (en) | 2018-05-30 | 2018-05-30 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108877792B (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109712616A (en) * | 2018-11-29 | 2019-05-03 | 平安科技(深圳)有限公司 | Telephone number error correction method, device and computer equipment based on data processing |
| CN109922371A (en) * | 2019-03-11 | 2019-06-21 | 青岛海信电器股份有限公司 | Natural language processing method, equipment and storage medium |
| CN110223694A (en) * | 2019-06-26 | 2019-09-10 | 百度在线网络技术(北京)有限公司 | Method of speech processing, system and device |
| CN110299152A (en) * | 2019-06-28 | 2019-10-01 | 北京猎户星空科技有限公司 | Interactive output control method, device, electronic equipment and storage medium |
| CN110347815A (en) * | 2019-07-11 | 2019-10-18 | 上海蔚来汽车有限公司 | Multi-task processing method and multitasking system in speech dialogue system |
| CN110738997A (en) * | 2019-10-25 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | information correction method, device, electronic equipment and storage medium |
| CN112002321A (en) * | 2020-08-11 | 2020-11-27 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
| WO2023040658A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Speech interaction method and electronic device |
- 2018-05-30: Application CN201810541680.6A filed in China (CN); granted as CN108877792B; legal status: Active
Patent Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012037820A (en) * | 2010-08-11 | 2012-02-23 | Murata Mach Ltd | Voice recognition apparatus, voice recognition apparatus for picking, and voice recognition method |
| CN105094315A (en) * | 2015-06-25 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for smart man-machine chat based on artificial intelligence |
| CN105468582A (en) * | 2015-11-18 | 2016-04-06 | 苏州思必驰信息科技有限公司 | Method and device for correcting numeric string based on human-computer interaction |
| CN107305483A (en) * | 2016-04-25 | 2017-10-31 | 北京搜狗科技发展有限公司 | A kind of voice interactive method and device based on semantics recognition |
| US9728188B1 (en) * | 2016-06-28 | 2017-08-08 | Amazon Technologies, Inc. | Methods and devices for ignoring similar audio being received by a system |
| JP2018004976A (en) * | 2016-07-04 | 2018-01-11 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | Voice interactive method, voice interactive device and voice interactive program |
| CN107870977A (en) * | 2016-09-27 | 2018-04-03 | 谷歌公司 | Form chatbot output based on user state |
| CN106710592A (en) * | 2016-12-29 | 2017-05-24 | 北京奇虎科技有限公司 | Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment |
| CN107045496A (en) * | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition |
| CN107943914A (en) * | 2017-11-20 | 2018-04-20 | 渡鸦科技(北京)有限责任公司 | Voice information processing method and device |
Non-Patent Citations (2)
| Title |
|---|
| 盛世流光中: "苹果Siri对比三星Bixby,语音助手都成精了" ("Apple Siri vs. Samsung Bixby: voice assistants have become smart"), iQIYI (爱奇艺), 23 November 2017 (2017-11-23), page 19 * |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109712616B (en) * | 2018-11-29 | 2023-11-14 | 平安科技(深圳)有限公司 | Telephone number error correction method and device based on data processing and computer equipment |
| CN109712616A (en) * | 2018-11-29 | 2019-05-03 | 平安科技(深圳)有限公司 | Telephone number error correction method, device and computer equipment based on data processing |
| CN109922371A (en) * | 2019-03-11 | 2019-06-21 | 青岛海信电器股份有限公司 | Natural language processing method, equipment and storage medium |
| CN109922371B (en) * | 2019-03-11 | 2021-07-09 | 海信视像科技股份有限公司 | Natural language processing method, device and storage medium |
| CN110223694B (en) * | 2019-06-26 | 2021-10-15 | 百度在线网络技术(北京)有限公司 | Speech processing method, system and device |
| CN110223694A (en) * | 2019-06-26 | 2019-09-10 | 百度在线网络技术(北京)有限公司 | Method of speech processing, system and device |
| CN110299152A (en) * | 2019-06-28 | 2019-10-01 | 北京猎户星空科技有限公司 | Interactive output control method, device, electronic equipment and storage medium |
| CN110347815A (en) * | 2019-07-11 | 2019-10-18 | 上海蔚来汽车有限公司 | Multi-task processing method and multitasking system in speech dialogue system |
| CN110738997B (en) * | 2019-10-25 | 2022-06-17 | 百度在线网络技术(北京)有限公司 | Information correction method and device, electronic equipment and storage medium |
| CN110738997A (en) * | 2019-10-25 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | information correction method, device, electronic equipment and storage medium |
| CN112002321A (en) * | 2020-08-11 | 2020-11-27 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
| CN112002321B (en) * | 2020-08-11 | 2023-09-19 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
| WO2023040658A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Speech interaction method and electronic device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108877792B (en) | 2023-10-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108877792B (en) | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations | |
| US11842045B2 (en) | Modality learning on mobile devices | |
| US10438593B2 (en) | Individualized hotword detection models | |
| US11217236B2 (en) | Method and apparatus for extracting information | |
| US20190279622A1 (en) | Method for speech recognition dictation and correction, and system | |
| US20180189628A1 (en) | Determining semantically diverse responses for providing as suggestions for inclusion in electronic communications | |
| US9099091B2 (en) | Method and apparatus of adaptive textual prediction of voice data | |
| CN107680588B (en) | Intelligent voice navigation method, device and storage medium | |
| CN108763548A (en) | Collect method, apparatus, equipment and the computer readable storage medium of training data | |
| CN113743127B (en) | Task-based dialogue method, device, electronic device and storage medium | |
| US20190073994A1 (en) | Self-correcting computer based name entity pronunciations for speech recognition and synthesis | |
| JP2018063271A (en) | Voice dialogue apparatus, voice dialogue system, and control method of voice dialogue apparatus | |
| CN110232920B (en) | Voice processing method and device | |
| US9779722B2 (en) | System for adapting speech recognition vocabulary | |
| US10600405B2 (en) | Speech signal processing method and speech signal processing apparatus | |
| CN113111658B (en) | Method, device, equipment and storage medium for checking information | |
| CN112669839B (en) | Voice interaction method, device, equipment and storage medium | |
| US11308936B2 (en) | Speech signal processing method and speech signal processing apparatus | |
| CN111916085A (en) | Human-machine dialogue matching method, device and medium based on pronunciation similarity | |
| KR20200109995A (en) | A phising analysis apparatus and method thereof | |
| CN114154500B (en) | Text proofreading method, device, equipment, medium and program product | |
| CN117521609A (en) | Form filling method, device, equipment and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |