
CN116016779A - Speech call translation assistance method, system, computer equipment and storage medium - Google Patents


Publication number
CN116016779A
Authority
CN
China
Prior art keywords
audio
text
call
target
reply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211650023.8A
Other languages
Chinese (zh)
Inventor
张猷健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202211650023.8A
Publication of CN116016779A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The present application provides a voice call translation assistance method, system, computer device and storage medium. The method includes: acquiring call audio of a target client; if the call audio is in a target language, cutting off the voice path between the target client and a target server; processing the call audio, outputting translated text in a preset language corresponding to the call audio, and displaying the translated text on a display screen of the target server; and acquiring reply information in the preset language corresponding to the translated text, processing the reply information to generate reply audio in the target language, and transmitting the reply audio to the target client. The voice call translation assistance method provided by the embodiments of the present application automates the entire call interaction between the target client and the target server without bringing in third-party personnel, and effectively improves the service efficiency of a call center after it receives a call in a non-specific language.

Description

Voice call translation assistance method, system, computer device and storage medium

Technical Field

The present application relates to the technical field of call processing, and in particular to a voice call translation assistance method, system, computer device and storage medium (computer-readable storage medium).

Background

A call center, also known as a customer service center, routes users' telephone calls to a relatively centralized location, where a team of service agents communicates with users by telephone and serves their needs.

Existing call centers can handle telephone service requests, but they are limited by the languages their agents can work in and can usually serve only calls in specific languages. When a call in a language outside that set (a "non-specific" language) is connected, the agent must first raise a translation request and then contact a third-party interpreter through the three-way calling function. The caller is kept waiting, and service efficiency suffers.

Summary of the Invention

In view of the above technical problem, it is necessary to provide a voice call translation assistance method, system, computer device and storage medium that address the low service efficiency of existing call centers when they receive service requests in non-specific languages.

In a first aspect, the present application provides a voice call translation assistance method, including:

acquiring call audio of a target client;

if the language type corresponding to the call audio is a target language, cutting off the voice path between the target client and a target server;

processing the call audio, outputting translated text in a preset language corresponding to the call audio, and displaying the translated text on a display screen of the target server;

acquiring reply information in the preset language corresponding to the translated text, processing the reply information to generate reply audio in the target language, and transmitting the reply audio to the target client.
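The four steps of the first aspect can be sketched as a small orchestration class. This is only an illustrative sketch under stated assumptions: the class name `CallAssistant` and the injected callables `recognize`, `translate` and `synthesize` are hypothetical placeholders for whatever recognition, translation and synthesis components an implementation uses; the application does not name any such interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CallAssistant:
    """Hypothetical orchestrator for the four method steps."""
    target_language: str                        # caller language that triggers assistance
    preset_language: str                        # language the agent works in
    recognize: Callable[[bytes, str], str]      # (audio, language) -> text (assumed interface)
    translate: Callable[[str, str, str], str]   # (text, source, destination) -> text (assumed)
    synthesize: Callable[[str, str], bytes]     # (text, language) -> audio (assumed)
    voice_path_open: bool = True
    screen: List[str] = field(default_factory=list)  # stands in for the server display

    def on_call_audio(self, audio: bytes, language: str) -> bool:
        """Steps 1 to 3: check the language, cut the path, show translated text."""
        if language != self.target_language:
            return False                        # ordinary call, no assistance needed
        self.voice_path_open = False            # cut the direct voice path
        text = self.recognize(audio, language)
        self.screen.append(self.translate(text, language, self.preset_language))
        return True

    def on_agent_reply(self, reply: str) -> bytes:
        """Step 4: translate the agent's reply and synthesize reply audio."""
        translated = self.translate(reply, self.preset_language, self.target_language)
        return self.synthesize(translated, self.target_language)
```

Passing the three components in as callables keeps the sketch independent of any particular speech or translation engine.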

In a feasible embodiment of the present application, processing the call audio and outputting the translated text in the preset language corresponding to the call audio includes:

querying a preset database according to the target language to obtain an audio processing model associated with the target language;

inputting the call audio into the audio processing model for processing, and outputting the translated text in the preset language corresponding to the call audio.
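The per-language model lookup can be pictured as a simple registry. The sketch below is an assumption-laden illustration: `ModelRegistry` and its methods are hypothetical names, an in-memory dictionary stands in for the "preset database", and each model is reduced to a callable that maps audio bytes to translated text.

```python
from typing import Callable, Dict

# Assumed shape of an audio processing model: audio bytes in, translated text out.
AudioModel = Callable[[bytes], str]


class ModelRegistry:
    """In-memory stand-in for the preset database of per-language models."""

    def __init__(self) -> None:
        self._models: Dict[str, AudioModel] = {}

    def register(self, language: str, model: AudioModel) -> None:
        self._models[language] = model

    def process(self, language: str, audio: bytes) -> str:
        """Look up the model associated with the language and run it on the audio."""
        try:
            model = self._models[language]
        except KeyError:
            raise LookupError(f"no audio processing model for {language!r}") from None
        return model(audio)
```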

In a feasible embodiment of the present application, after the translated text in the preset language corresponding to the call audio is output and displayed on the display screen of the target server, the method further includes:

processing the translated text to obtain semantic information corresponding to the translated text;

querying a preset database according to the semantic information to obtain associated text that is associated with the semantic information, and displaying the associated text on the display screen;

in response to a selection instruction for the associated text on the display screen, using the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.
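As a rough illustration of the associated-text embodiment, the sketch below uses a keyword match as a toy stand-in for semantic analysis and a dictionary as a stand-in for the preset database. The table contents, the keyword approach and all function names are assumptions made for illustration; the application does not specify how semantic information is extracted.

```python
from typing import Dict, List

# Hypothetical table: semantic key -> candidate replies in the preset language.
ASSOCIATED_TEXT: Dict[str, List[str]] = {
    "refund": [
        "Refunds are processed within 7 working days.",
        "Please provide your order number.",
    ],
    "billing": ["Your bill is issued on the first day of each month."],
}


def extract_semantics(translated_text: str) -> List[str]:
    """Toy keyword matcher standing in for a real semantic model."""
    lowered = translated_text.lower()
    return [key for key in ASSOCIATED_TEXT if key in lowered]


def associated_text(translated_text: str) -> List[str]:
    """Collect every candidate reply associated with the extracted semantics."""
    candidates: List[str] = []
    for key in extract_semantics(translated_text):
        candidates.extend(ASSOCIATED_TEXT[key])
    return candidates


def select_reply(candidates: List[str], index: int) -> str:
    """The agent's selection instruction picks one candidate as the reply information."""
    return candidates[index]
```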

In a feasible embodiment of the present application, acquiring the reply information corresponding to the translated text includes:

in response to a first operation instruction on a preset button on the display screen, starting a preset recording device;

in response to a second operation instruction on the preset button on the display screen, stopping the preset recording device;

generating the reply information corresponding to the translated text from the audio information collected by the preset recording device.
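The start/stop behavior of the recording button can be sketched as a two-state toggle. The class `ReplyRecorder`, its `feed` method and the injected `transcribe` callable are hypothetical; in particular, treating the first and second operation instructions as successive presses of the same button is one possible reading, not something the application fixes.

```python
from typing import Callable, List, Optional


class ReplyRecorder:
    """Toggle recorder: first press starts capture, second press stops it."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe      # assumed speech-to-text component
        self._recording = False
        self._chunks: List[bytes] = []

    def press_button(self) -> Optional[str]:
        """Return None when recording starts, or the reply text when it stops."""
        if not self._recording:
            self._chunks.clear()
            self._recording = True
            return None
        self._recording = False
        return self._transcribe(b"".join(self._chunks))

    def feed(self, chunk: bytes) -> None:
        """Audio arriving while the recorder is off is ignored."""
        if self._recording:
            self._chunks.append(chunk)
```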

In a feasible embodiment of the present application, generating the reply information corresponding to the translated text from the audio information collected by the preset recording device includes:

processing the audio information collected by the preset recording device to generate reply text in the preset language corresponding to the audio information, and displaying the reply text on the display screen;

in response to a modification instruction for the reply text on the display screen, generating modified reply text and using the modified reply text as the reply information corresponding to the translated text.

In a feasible embodiment of the present application, after the reply information in the preset language corresponding to the translated text is acquired, the method further includes:

processing the reply information to generate reply text in the target language, and transmitting the reply text in the target language to the target client.

In a feasible embodiment of the present application, after the call audio of the target client is acquired, the method further includes:

acquiring a call interval corresponding to the call audio and preset reminder voice information corresponding to the target language;

if the call interval is greater than a preset duration threshold, transmitting the preset reminder voice information to the target client before the step of generating the reply audio in the target language.
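The reminder rule reduces to a threshold comparison plus a per-language lookup. In the sketch below, the `REMINDERS` table, its placeholder byte strings and the function name are assumptions; the application only states that a preset reminder exists for the target language and that it is sent when the interval exceeds the threshold.

```python
from typing import Dict, Optional

# Hypothetical pre-recorded reminder prompts, keyed by target language.
REMINDERS: Dict[str, bytes] = {
    "fr": b"<fr please-hold prompt>",
    "es": b"<es please-hold prompt>",
}


def reminder_if_needed(interval_s: float, threshold_s: float,
                       language: str) -> Optional[bytes]:
    """Return the reminder audio to send, or None if no reminder is due."""
    if interval_s > threshold_s:          # strictly greater, as in the text
        return REMINDERS.get(language)    # None if no prompt exists for the language
    return None
```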

In a feasible embodiment of the present application, cutting off the voice path between the target client and the target server means cutting off only the one-way voice path from the target server to the target client.
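The difference between cutting the whole voice path and cutting only the server-to-client direction can be modeled with two direction flags. `VoicePath` and its methods are hypothetical names used purely to illustrate the distinction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoicePath:
    """Two independent directions of a call's voice path."""
    client_to_server: bool = True
    server_to_client: bool = True

    def cut(self, one_way: bool = True) -> None:
        """Always block server-to-client audio; block the other direction only if asked."""
        self.server_to_client = False
        if not one_way:
            self.client_to_server = False

    def forward(self, sender: str, audio: bytes) -> Optional[bytes]:
        """Return the audio if the sender's direction is open, otherwise None."""
        if sender == "client":
            is_open = self.client_to_server
        elif sender == "server":
            is_open = self.server_to_client
        else:
            raise ValueError(f"unknown sender: {sender!r}")
        return audio if is_open else None
```

With `one_way=True` the agent can still hear the caller, while the agent's own untranslated speech never reaches the caller.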

In a second aspect, the present application provides a voice call translation assistance system, including:

a voice call subsystem, configured to acquire call audio of a target client and, if the language type corresponding to the call audio is a target language, cut off the voice path between the target client and a target server;

a speech translation subsystem, configured to process the call audio, output translated text in a preset language corresponding to the call audio, and display the translated text on a display screen of the target server;

a speech synthesis subsystem, configured to acquire reply information in the preset language corresponding to the translated text, process the reply information to generate reply audio in the target language, and transmit the reply audio to the target client.

In a third aspect, the present application further provides a computer device, including:

a processor; and

one or more application programs, wherein the one or more application programs are stored in the processor, and the processor, when executing the one or more application programs, implements the voice call translation assistance method according to any one of the above.

In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voice call translation assistance method according to any one of the above.

In the voice call translation assistance method provided by the embodiments of the present application, after the call audio of the target client is acquired, if the language type corresponding to the call audio is detected to be the target language, the voice path between the target client and the target server is cut off, and translated text in the preset language corresponding to the call audio is output and shown on the display screen of the target server. Once the service agent supplies reply information in the preset language based on the translated text, the reply information is processed to generate reply audio in the target language, which is provided to the target client. The method thus automates the entire call interaction between the target client and the target server without bringing in third-party personnel, and effectively improves the service efficiency of a call center after it receives a call in a non-specific language.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application scenario of a voice call translation assistance method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of the steps of a voice call translation assistance method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of the steps of processing call audio with an audio processing model obtained according to the language type, provided by an embodiment of the present application;

FIG. 4 shows an implementation, provided by an embodiment of the present application, in which associated text is displayed synchronously on the display screen to help the customer service agent reply;

FIG. 5 is a schematic flowchart of the steps of acquiring reply information by voice, provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of the steps of displaying reply text so that the reply information can be modified, provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of the steps of providing reminder voice information to the target client, provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a voice call translation assistance system provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the present application.

In the description of the present application, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality of" means two or more, unless otherwise expressly and specifically defined.

In the description of the present application, the term "for example" means "serving as an example, instance or illustration". Any embodiment described as "for example" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is given to enable any person skilled in the art to make and use the invention, and details are set forth for purposes of explanation. Those of ordinary skill in the art will recognize that the invention can be practiced without these specific details. In other instances, well-known structures and processes are not described in detail, so that unnecessary detail does not obscure the description. The invention is therefore not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed in the present application.

To make the complete implementation of the voice call translation assistance method easier to follow, the scenario in which the method is implemented is described first.

The voice call translation assistance method provided by the embodiments of the present application is mainly used in call center business scenarios. A customer uses the communication function between a client on a terminal device and the call center's server to talk with a customer service agent and obtain the service required. In most cases, however, agents can serve only specific languages. When an agent answers a call from a target customer speaking a non-specific language, the agent must first initiate a translation request and then use the call center's three-way calling function to bring a third-party interpreter into the call. This process is cumbersome and usually keeps the caller waiting, which lowers the call center's service efficiency.

To solve the above problem, the embodiments of the present application provide a voice call translation assistance method, system, computer device and storage medium. The voice call translation assistance method is installed in the voice call translation assistance system in the form of a program, and the system is provided in the computer device in the form of a processor. The system in the computer device implements the method by executing the corresponding software program. The voice call translation assistance system may be provided by the call center, in which case it exchanges data with the call center's server; it may also be provided by a communications operator, in which case it communicates with both the client and the server.

Specifically, FIG. 1 is a schematic diagram of an application scenario of a voice call translation assistance method provided by an embodiment of the present application. The scenario includes a target client 100 on the user's terminal device, a voice call translation assistance system 200 provided by the call center, and a target server 300 located at the call center. Call audio entered by the user through the target client 100 is transmitted to the voice call translation assistance system 200. Once the language type of the call audio is recognized as a target language that requires processing, the system 200 processes the call audio, converts it into translated text in the preset language, and transmits the text to the target server 300, which shows it to the customer service agent on its display screen. The system then receives the reply information that the agent provides based on the displayed translated text, synthesizes reply audio in the target language, and sends it back to the target client 100.

It should be noted that the application scenario described above is only one feasible implementation and does not limit the solutions of the embodiments of the present application. For example, FIG. 1 illustrates a single target client 100, but the voice call translation assistance system provided by the embodiments can in fact serve multiple different target clients 100 at the same time; details are not repeated here.

In addition, the target client 100 on the user's terminal device may take different forms, for example a web page, an app or a mini program. The voice call translation assistance system 200 may be a server, a cloud server, or a server cluster composed of multiple servers; details are likewise not repeated here.

For the above application scenario, an embodiment of the present application provides a schematic flowchart of the steps of a voice call translation assistance method, which includes steps 201 to 204:

201. Acquire call audio of a target client.

In the embodiments of the present application, after the target customer initiates a communication request from the target client to the call center's target server, and the target server accepts the request and establishes communication with the target client, the call information that the target customer enters through the target client is sent to the voice call translation assistance system as an audio stream. This audio stream is the call audio.

202. If the language type corresponding to the call audio is the target language, cut off the voice path between the target client and the target server.

In the embodiments of the present application, once the voice call translation assistance system has acquired the call audio, it can detect the language type of the call audio with a trained language identification model. The model may be trained with a supervised neural network algorithm, for example by using deep learning so that the model learns the association between language types and the acoustic spectral features or phoneme features of audio; details are not repeated here. Besides automatic detection by the system, the target customer may also select a language type from the options shown by the target client while initiating the communication request. In that case the selected language type is sent to the voice call translation assistance system and used as the language type of the call audio acquired afterwards.

Once the language type of the call audio has been determined, if it is a preset target language, that is, a non-specific language outside the specific languages the call center can serve, the voice call translation assistance system cuts off the voice path between the target client and the target server. This prevents the call audio from the target client from being passed directly to the target server, and prevents audio that the target server replies in a non-target language from being passed directly to the target client, which would cause unnecessary misunderstanding. In a feasible embodiment of the present application, cutting off the voice path here means cutting off only the one-way voice path from the target server to the target client: the target client's call audio is still transmitted to the target server and played, which lowers the risk of errors in recognizing and translating the call audio, while non-target-language audio from the target server is not passed directly to the target client.
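The language check in step 202 combines two sources (automatic detection and the caller's own selection) with a membership test against the preset target languages. The sketch below is illustrative only: `resolve_language`, `is_target_language` and the injected `detect` callable are hypothetical, and giving the caller's selection priority over detection is an assumption about how the two sources would be combined.

```python
from typing import Callable, Optional, Set


def resolve_language(audio: bytes,
                     detect: Callable[[bytes], str],
                     user_selected: Optional[str] = None) -> str:
    """Use the caller's selection when present, otherwise run the detector."""
    if user_selected is not None:
        return user_selected
    return detect(audio)  # stand-in for a trained language identification model


def is_target_language(language: str, target_languages: Set[str]) -> bool:
    """Target languages are the preset languages the agents cannot serve directly."""
    return language in target_languages
```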

203. Process the call audio, output translated text in the preset language corresponding to the call audio, and display the translated text on the display screen of the target server.

In the embodiments of the present application, so that the customer service agent can understand call audio in the target language, the voice call translation assistance system first processes the call audio to output translated text in the preset language and provides it to the target server. The processing consists mainly of an audio recognition stage and a text translation stage: the former recognizes the target-language call audio as target-language call text, and the latter translates that call text into translated text in the preset language. Both stages can be implemented with trained models. For example, speech recognition can use an existing speech recognition model, which determines the content of a piece of audio from its spectral feature information.

Specifically, in a feasible embodiment of the present application, the speech recognition model may be trained in advance on sample data, for example an HMM (Hidden Markov Model) or a DNN (Deep Neural Network) model. Training the speech recognition model roughly involves the following steps: build a network model with initialized parameters; input sample audio carrying label text into this initial model to obtain predicted text; update the model parameters based on the difference between the predicted text and the label text, so that the predicted text produced by the updated model is closer to the label text; and repeat this process until the difference between the predicted text and the label text is smaller than a preset difference threshold. The model at that point is the trained speech recognition model, which can accurately convert a piece of audio into the corresponding text.

Similarly, text translation can be implemented with models that encode and decode text, such as a Transformer model or an RNN (Recurrent Neural Network) model; details are not repeated here.
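The training procedure described above (initialize parameters, predict, measure the difference from the labels, update, stop once the difference falls below a preset threshold) can be illustrated with a deliberately tiny numeric example. This is not a speech recognition model: a one-parameter linear fit trained by gradient descent stands in for the network, mean squared error stands in for the difference between predicted and label text, and every name and hyperparameter below is an assumption.

```python
from typing import List, Tuple


def train(samples: List[Tuple[float, float]],
          threshold: float = 1e-6,
          lr: float = 0.01,
          max_steps: int = 10_000) -> Tuple[float, float]:
    """Fit y = w * x, stopping once the loss drops below the threshold."""
    w = 0.0                       # parameter initialization
    loss = float("inf")
    for _ in range(max_steps):
        # Difference between predictions and labels (mean squared error).
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if loss < threshold:      # preset difference threshold reached
            break
        # Gradient of the loss with respect to w, then one update step.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w, loss
```

The `max_steps` cap is an added safeguard so the loop terminates even if the threshold is never reached; the text itself only describes the threshold condition.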

当然,需要说明的是,由于不同语种的音频中所包含的频谱特征往往不同,因此,单个的语音识别模型往往难以完成对不同语种的音频识别,同理,单个的文本翻译模型同样也难以完成对不同语种的文本翻译,因此,为提高本申请实施例提供的语音通话翻译辅助方法的适用性,作为本申请实施例的一种可行实现方案,语音通话翻译辅助系统的数据库中部署有与不同语种关联的音频处理模型,以便于在对通话音频进行处理时,能够选择合适的音频处理模型进行处理,提高对通话音频的翻译效果,具体的实现方案可以参阅后续图3及其解释说明的内容。Of course, it should be noted that since the spectral features contained in audio in different languages are often different, it is often difficult for a single speech recognition model to complete audio recognition for different languages. Similarly, it is also difficult for a single text translation model to complete For text translation in different languages, therefore, in order to improve the applicability of the voice call translation assistance method provided by the embodiment of the present application, as a feasible implementation scheme of the embodiment of the application, the database of the voice call translation assistance system is deployed with different Language-related audio processing model, so that when processing the call audio, an appropriate audio processing model can be selected for processing, and the translation effect of the call audio can be improved. For the specific implementation plan, please refer to the following figure 3 and its explanation. .

In addition, as another feasible solution of the present application, to further improve the service efficiency of customer service personnel, the display screen of the target server shows not only the translated text but also, at the same time, associated text related to the translated text, for the customer service personnel to use when replying. For the specific implementation, refer to FIG. 4 and its explanation below.

204. Obtain reply information in the preset language corresponding to the translated text, process the reply information to generate reply audio in the target language, and transmit the reply audio to the target client.

In the embodiments of the present application, because the voice path between the target client and the target server has been cut off, the reply information in the preset language that the customer service personnel give based on the translated text on the display screen of the target server is not transmitted directly to the target client. Instead, it is first transmitted to the voice call translation assistance system for processing. Only after the system has processed the reply information, that is, translated it into reply text in the target language and then synthesized that text into reply audio in the target language, is the reply audio transmitted to the target client.
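
The two-stage processing of the reply (translate first, then synthesize) can be sketched as follows. This is a minimal illustration under stated assumptions: `build_reply_audio`, `fake_translate`, and `fake_synthesize` are hypothetical names, and the stand-in functions only mimic a translation model and a speech synthesis model so that the example runs.

```python
from typing import Callable


def build_reply_audio(reply_text: str,
                      translate: Callable[[str], str],
                      synthesize: Callable[[str], bytes]) -> bytes:
    """Turn a preset-language reply into target-language reply audio."""
    target_text = translate(reply_text)  # reply text in the target language
    return synthesize(target_text)       # reply audio in the target language


# Stand-ins for real models (assumptions, for illustration only).
def fake_translate(text: str) -> str:
    return f"[target]{text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


audio = build_reply_audio("Your order has shipped.", fake_translate, fake_synthesize)
```

Passing the models in as callables keeps the pipeline independent of any particular translation or synthesis engine.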

Specifically, the reply information in the preset language corresponding to the translated text can be entered into the target server in several ways. For example, after the associated text is displayed, the customer service personnel can make a selection on the display screen, and the text information corresponding to the selection instruction is used as the reply information. As another simple and feasible implementation, different operation instructions on a button on the display screen can start and stop a recording device to capture the customer service personnel's spoken reply. For the specific implementation, refer to FIG. 5 and its explanation below.

Further, as another optional embodiment of the present application, when processing the reply information the voice call translation assistance system first generates reply text in the target language and then synthesizes reply audio in the target language for the target client. To further improve the service quality of the call center, the system can also transmit the reply text in the target language to the target client.

In addition, because the embodiments of the present application add speech recognition, text translation, and speech synthesis steps, the target customer may be kept waiting. To eliminate the adverse effect of this waiting, after acquiring the call audio of the target client the voice call translation assistance system records the call interval corresponding to the call audio, that is, the time elapsed since the target customer finished speaking. If the call interval exceeds a preset duration threshold and the reply audio in the target language has still not been generated, the system plays preset reminder information to the target client. For the specific implementation, refer to FIG. 7 and its explanation below.

In the voice call translation assistance method provided by the embodiments of the present application, after the call audio of the target client is acquired, if the language type of the call audio is detected to be the target language, the voice path between the target client and the target server is cut off, and translated text in the preset language corresponding to the call audio is output and shown on the display screen of the target server. After the service personnel give reply information in the preset language based on the translated text, the reply information is processed to generate reply audio in the target language, which is provided to the target client. The method thus automates the entire call interaction between the target client and the target server without introducing third-party personnel, effectively improving the service efficiency of a call center after it receives a call in a language other than its designated language.
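
The routing decision summarized above can be sketched as follows. The `CallSession` class, its fields, and the language codes are assumptions introduced for illustration; in particular, the sketch models "the target language" as membership in a configured set and assumes the language of the call audio has already been detected elsewhere.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CallSession:
    # Languages that trigger translation assistance (hypothetical configuration).
    target_languages: Set[str] = field(default_factory=set)
    # True while audio flows directly between target client and target server.
    voice_path_open: bool = True


def route_call(session: CallSession, detected_language: str) -> bool:
    """Cut the direct voice path if the call is in a target language.

    Returns True when the translation assistance system takes over the call.
    """
    if detected_language in session.target_languages:
        session.voice_path_open = False
        return True
    return False


session = CallSession(target_languages={"fr", "de"})
took_over = route_call(session, "fr")
```

Once `voice_path_open` is false, all audio in either direction would pass through the recognition, translation, and synthesis steps rather than being relayed directly.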

As shown in FIG. 3, FIG. 3 is a schematic flowchart of the steps for obtaining the audio processing model corresponding to a language type and using it to process the call audio, according to an embodiment of the present application. Specifically, it includes steps 301 to 302:

301. Query a preset database according to the target language to obtain the audio processing model associated with the target language.

In the embodiments of the present application, the database of the voice call translation assistance system stores in advance audio processing models associated with different languages, each including a speech recognition model and a text translation model. Each audio processing model is trained on training text in its associated language: for example, the audio processing model associated with language A is trained on training text in language A, and the audio processing model associated with language B is trained on training text in language B. The details are not repeated here.

Further, in addition to the speech recognition model and the text translation model described above, and as follows from the description of step 204, the voice call translation assistance system can also use a speech synthesis model to synthesize reply text in the target language into reply audio in the target language. The audio processing model may therefore also include a speech synthesis model, which is likewise trained on training text in the corresponding language.

302. Input the call audio into the audio processing model for processing, and output translated text in the preset language corresponding to the call audio.

In the embodiments of the present application, the call audio is input into the audio processing model for processing. Because the audio processing model is built on training text in the target language, it can effectively recognize and translate the call audio, which helps ensure the accuracy of the output translated text.
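
Steps 301 and 302 can be sketched as a lookup in a per-language registry followed by recognition and translation. Everything named here is a hypothetical illustration: the `AudioProcessingModel` class, the in-memory dictionary standing in for the preset database, and the lambda stand-ins for real recognition and translation models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AudioProcessingModel:
    recognize: Callable[[bytes], str]  # speech recognition: audio -> target-language text
    translate: Callable[[str], str]    # text translation: target language -> preset language


def translate_call_audio(audio: bytes, target_language: str,
                         registry: Dict[str, AudioProcessingModel]) -> str:
    """Step 301: look up the model for the language; step 302: run it."""
    model = registry.get(target_language)
    if model is None:
        raise KeyError(f"no audio processing model for language {target_language!r}")
    return model.translate(model.recognize(audio))


# Stand-in registry (assumption): "recognition" decodes bytes, "translation" upper-cases.
registry = {
    "fr": AudioProcessingModel(
        recognize=lambda audio: audio.decode("utf-8"),
        translate=lambda text: text.upper(),
    ),
}

translated = translate_call_audio(b"bonjour", "fr", registry)
```

Raising an error for an unregistered language is one possible design choice; a deployment might instead fall back to a default model or hand the call to a human interpreter.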

As shown in FIG. 4, FIG. 4 illustrates an implementation, provided by an embodiment of the present application, in which associated text is shown on the display screen at the same time to help customer service personnel reply. Specifically, it includes steps 401 to 403:

401. Process the translated text to obtain semantic information corresponding to the translated text.

In the embodiments of the present application, to improve the service quality of customer service personnel, the voice call translation assistance system further performs semantic recognition on the translated text, that is, it determines the semantic information of the translated text by extracting keywords from it. This is usually done with a trained semantic recognition model; semantic recognition models based on NLP (Natural Language Processing) technology are a common example. The details are not repeated here.

In the embodiments of the present application, the semantic information obtained by processing the translated text generally describes the type of problem, for example a daily-life problem or a work problem, about which the target customer is seeking help or giving feedback.

402. Query a preset database according to the semantic information, obtain associated text associated with the semantic information, and show the associated text on the display screen.

In the embodiments of the present application, after the semantic information corresponding to the translated text has been identified, the voice call translation assistance system queries the corresponding database based on that semantic information, obtains the associated text directly from the database, and provides it to the customer service personnel on the display screen. For example, the associated text may be policies, regulatory documents, and similar material related to the semantic information of the translated text.

Of course, when the translated text and the associated text are shown together, they can be presented differently so that customer service personnel can tell them apart. For example, they can be shown in different positions, with the translated text in the upper area of the display screen and the associated text in the lower area, or they can be distinguished by different colors. The details are not repeated here.

403. In response to a selection instruction on the associated text on the display screen, use the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.

In the embodiments of the present application, after the associated text of the translated text has been shown on the display screen, if parts of that text can be used to answer the call audio, the call center's customer service personnel can enter a selection instruction directly on the display screen, for example by box selection or by clicking. The voice call translation assistance system then automatically uses the selected text information as the reply information corresponding to the translated text for subsequent processing, so as to generate the corresponding reply audio.
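
Steps 401 to 403 can be sketched with a keyword table in place of a trained semantic recognition model. The keyword map, the topics, the sample associated texts, and the function names are all made up for illustration; a real system would use a trained model and its own database.

```python
from typing import Dict, List, Optional

# Hypothetical keyword -> topic map standing in for a semantic recognition model.
KEYWORD_TO_TOPIC: Dict[str, str] = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account",
}

# Hypothetical database of associated text per topic (made-up sample entries).
ASSOCIATED_TEXT: Dict[str, List[str]] = {
    "billing": ["Refunds are processed within 7 days.", "Invoices are sent by email."],
    "account": ["Passwords can be reset from the login page."],
}


def semantic_topic(translated_text: str) -> Optional[str]:
    """Step 401: derive semantic information by keyword extraction."""
    for word in translated_text.lower().split():
        topic = KEYWORD_TO_TOPIC.get(word.strip(".,?!"))
        if topic is not None:
            return topic
    return None


def associated_text(topic: Optional[str]) -> List[str]:
    """Step 402: look up the text associated with the semantic information."""
    return ASSOCIATED_TEXT.get(topic, []) if topic else []


def select_reply(candidates: List[str], selected_index: int) -> str:
    """Step 403: the selected entry becomes the reply information."""
    return candidates[selected_index]


topic = semantic_topic("I want a refund, please.")
reply = select_reply(associated_text(topic), 0)
```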

Of course, the above is only one simplified way of replying to the translated text. In practice, besides directly selecting associated text, customer service personnel can also reply based on their own business experience. To make this easier, the target server may further include a recording device that records the customer service personnel's speech, which is then used as the reply information. Specifically, FIG. 5 is a schematic flowchart of the steps for obtaining reply information by voice, according to an embodiment of the present application. It includes steps 501 to 503:

501. In response to a first operation instruction on a preset button on the display screen, start a preset recording device.

502. In response to a second operation instruction on the preset button on the display screen, stop the preset recording device.

In the embodiments of the present application, there may be a single preset button on the display screen or several, for example two. When only one preset button is provided, the first operation instruction can be a first click, which starts the recording device of the target server and begins recording the speech being input. The second operation instruction on the preset button is then usually a second click. To distinguish the two clicks, the preset button can be shown in different styles. For example, before the first click the button shows a triangle pattern; after the first operation instruction is received, the button shows a circle pattern and the recording device is started; and after the second operation instruction is received, the button shows the triangle pattern again and the recording device is stopped.

Of course, the above only takes a single preset button as an example. The preset buttons may also number two or more. With two buttons, for example, the first operation instruction can be a click on the first button and the second operation instruction can be a click on the second button. There may also be more preset buttons, each corresponding to a different function of the recording device, such as noise reduction, waveform enhancement, or echo cancellation. The details are not repeated here.

503. Generate the reply information corresponding to the translated text according to the audio information collected by the preset recording device.

In the embodiments of the present application, the speech recorded by the recording device while it is running, that is, between the first operation instruction and the second operation instruction on the preset button on the display screen, is the reply information in the preset language corresponding to the translated text.
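
The single-button variant of steps 501 to 503 amounts to a two-state toggle. The sketch below is an illustration only: the `RecordButton` class, the `feed` method that stands in for audio arriving from a microphone, and the icon strings are hypothetical names chosen to mirror the triangle and circle patterns described above.

```python
from typing import List, Optional


class RecordButton:
    """One preset button: first click starts recording, second click stops it."""

    def __init__(self) -> None:
        self.recording = False
        self.icon = "triangle"          # shown while idle
        self._chunks: List[bytes] = []

    def feed(self, chunk: bytes) -> None:
        """Audio from the recording device; ignored unless recording."""
        if self.recording:
            self._chunks.append(chunk)

    def click(self) -> Optional[bytes]:
        """Toggle recording. Returns the captured audio when recording stops."""
        if not self.recording:
            # First operation instruction: start the recording device.
            self.recording = True
            self.icon = "circle"
            self._chunks = []
            return None
        # Second operation instruction: stop and hand back what was recorded.
        self.recording = False
        self.icon = "triangle"
        return b"".join(self._chunks)


button = RecordButton()
button.feed(b"ignored")        # not recording yet
button.click()                 # start
button.feed(b"hel")
button.feed(b"lo")
captured = button.click()      # stop
```

The audio returned by the second click is what step 503 would treat as the reply information in the preset language.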

Building on the above solution, to further improve the quality of the customer service personnel's reply, the audio information is first converted into text in the preset language and shown on the display screen so that the customer service personnel can revise it. Specifically, FIG. 6 is a schematic flowchart of the steps for displaying the reply text so that the reply information can be revised, according to an embodiment of the present application. It includes steps 601 to 602:

601. Process the audio information collected by the preset recording device, generate reply text in the preset language corresponding to the audio information, and show it on the display screen.

In the embodiments of the present application, the voice call translation assistance system first performs speech recognition on the audio information collected by the recording device, converts it into reply text in the preset language, and shows it on the display screen. The speech recognition process here can be similar to the one in the preceding steps and use a speech recognition model, but it should be noted that this speech recognition model is trained on training text in the preset language. The details are not repeated here.

602. In response to a modification operation instruction on the reply text on the display screen, generate modified reply text, and use the modified reply text as the reply information corresponding to the translated text.

In the embodiments of the present application, in addition to showing the customer service personnel's initial reply text on the display screen, if the customer service personnel want to change part of its content, they can do so through a modification operation instruction on the reply text on the display screen. The modified reply text is then used as the final reply information for generating the reply audio that is provided to the target client.

As shown in FIG. 7, FIG. 7 is a schematic flowchart of the steps for providing reminder voice information to the target client, according to an embodiment of the present application. Specifically, it includes steps 701 to 702:

701. Obtain the call interval corresponding to the call audio and the preset reminder voice information corresponding to the target language.

In the embodiments of the present application, after the voice call translation assistance system acquires the call audio of the target client, it records the call interval of the call audio with a timing device and then obtains the preset reminder information, which is stored in the target language, for example, "Your query is being processed, please wait." Other reminder messages are of course also possible. The details are not repeated here.

702. If the call interval is greater than a preset duration threshold, transmit the preset reminder voice information to the target client before the step of generating the reply audio in the target language.

In the embodiments of the present application, if the call interval exceeds the preset duration threshold, for example 2 minutes, and the voice call translation assistance system has not yet generated the reply audio in the target language, the system transmits the preset reminder voice information to the target client to ask the customer to wait. Once the reply audio in the target language has been generated, playback of the reminder voice information stops and the reply audio is transmitted to the target client.
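
The decision in steps 701 and 702 can be sketched as a single function. The function name and parameters are assumptions for illustration; the 120-second default reflects the 2-minute example given above, and a deployment would choose its own threshold.

```python
from typing import Optional


def next_audio_for_client(seconds_since_customer_spoke: float,
                          reply_audio: Optional[bytes],
                          reminder_audio: bytes,
                          threshold_seconds: float = 120.0) -> Optional[bytes]:
    """Decide what, if anything, to send to the target client right now."""
    if reply_audio is not None:
        # The reply is ready: it replaces any reminder playback.
        return reply_audio
    if seconds_since_customer_spoke > threshold_seconds:
        # Still waiting past the threshold: play the preset reminder.
        return reminder_audio
    return None  # keep waiting silently
```

A caller would invoke this periodically, passing the elapsed call interval from the timing device and the reply audio once synthesis has finished.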

Specifically, to give a clearer understanding of the voice call translation assistance method provided by the embodiments of the present application, and building on that method, the embodiments of the present application also provide a voice call translation assistance system. FIG. 8 is a schematic structural diagram of such a system, which includes subsystems 810 to 830:

A voice call subsystem 810, configured to acquire the call audio of the target client and, if the language type corresponding to the call audio is the target language, cut off the voice path between the target client and the target server;

A speech translation subsystem 820, configured to process the call audio, output translated text in the preset language corresponding to the call audio, and show it on the display screen of the target server;

A speech synthesis subsystem 830, configured to obtain reply information in the preset language corresponding to the translated text, process the reply information to generate reply audio in the target language, and transmit the reply audio to the target client.

In the embodiments of the present application, the speech translation subsystem is further configured to query a preset database according to the target language to obtain the audio processing model associated with the target language, input the call audio into the audio processing model for processing, and output translated text in the preset language corresponding to the call audio.

In the embodiments of the present application, after outputting the translated text in the preset language corresponding to the call audio and showing it on the display screen of the target server, the speech translation subsystem is further configured to: process the translated text to obtain semantic information corresponding to the translated text; query a preset database according to the semantic information, obtain associated text associated with the semantic information, and show the associated text on the display screen; and, in response to a selection instruction on the associated text on the display screen, use the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.

In the embodiments of the present application, the speech synthesis subsystem is further configured to: start a preset recording device in response to a first operation instruction on a preset button on the display screen; stop the preset recording device in response to a second operation instruction on the preset button on the display screen; and generate the reply information corresponding to the translated text according to the audio information collected by the preset recording device.

In the embodiments of the present application, the speech synthesis subsystem is further configured to: process the audio information collected by the preset recording device, generate reply text in the preset language corresponding to the audio information, and show it on the display screen; and, in response to a modification operation instruction on the reply text on the display screen, generate modified reply text and use the modified reply text as the reply information corresponding to the translated text.

In the embodiments of the present application, after obtaining the reply information in the preset language corresponding to the translated text, the speech synthesis subsystem is further configured to process the reply information, generate reply text in the target language, and transmit the reply text in the target language to the target client.

In the embodiments of the present application, after acquiring the call audio of the target client, the voice call subsystem is further configured to obtain the call interval corresponding to the call audio and the preset reminder voice information corresponding to the target language. In this case, the speech synthesis subsystem is configured to transmit the preset reminder voice information to the target client, before the step of generating the reply audio in the target language, if the call interval is greater than a preset duration threshold.

For the specific limitations of the voice call translation assistance system, refer to the description of the voice call translation assistance method above, which is not repeated here. Each step of the above voice call translation assistance method can be implemented wholly or partly in software, hardware, or a combination of the two. The above modules can be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

In some embodiments of the present application, the voice call translation assistance system 200 may be implemented as a computer program that can run on a computer device as shown in FIG. 9. The memory of the computer device can store the program modules that make up the voice call translation assistance system 200, and the computer program formed by these program modules causes the processor to execute the steps of the voice call translation assistance method of the embodiments of the present application described in this specification.

For example, the computer device shown in FIG. 9 includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with external computer devices over a network connection. When executed by the processor, the computer program implements a voice call translation assistance method.

Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In some embodiments of the present application, a computer device is provided, including one or more processors, a memory, and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the processor to implement the following steps:

acquiring the call audio of the target client;

if the language type corresponding to the call audio is the target language, cutting off the voice path between the target client and the target server;

processing the call audio, outputting translated text in the preset language corresponding to the call audio, and showing it on the display screen of the target server;

obtaining reply information in the preset language corresponding to the translated text, processing the reply information to generate reply audio in the target language, and transmitting the reply audio to the target client.

In some embodiments of the present application, the processor further implements the following steps when executing the computer program: querying a preset database according to the target language to obtain the audio processing model associated with the target language; and inputting the call audio into the audio processing model for processing and outputting translated text in the preset language corresponding to the call audio.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the translated text to obtain semantic information corresponding to the translated text; querying a preset database according to the semantic information to obtain associated text related to the semantic information, and displaying the associated text on the display screen; and, in response to a selection instruction for the associated text on the display screen, using the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.
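As a rough illustration of the suggested-reply flow above, the sketch below uses simple keyword matching as a stand-in for semantic analysis, which this document does not specify. The intent names, keyword lists, and canned replies are invented for the example, and English is assumed as the preset language purely for readability.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table standing in for real semantic analysis.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "billing": ["bill", "invoice", "charge"],
    "outage": ["outage", "no signal", "not working"],
}

# Stand-in for the preset database: semantic label -> associated reply texts.
ASSOCIATED_TEXT: Dict[str, List[str]] = {
    "billing": [
        "Your latest bill is available in the app.",
        "I can send the invoice by email.",
    ],
    "outage": ["We are aware of the outage and are working on it."],
}


def extract_semantics(translated_text: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the translated text."""
    lowered = translated_text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None


def associated_texts(translated_text: str) -> List[str]:
    """Look up the candidate replies to show on the agent's screen."""
    intent = extract_semantics(translated_text)
    if intent is None:
        return []
    return list(ASSOCIATED_TEXT.get(intent, []))


def select_reply(candidates: List[str], index: int) -> str:
    """Apply the agent's selection; the chosen text becomes the reply information."""
    return candidates[index]
```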

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: in response to a first operation instruction on a preset button on the display screen, starting a preset recording device; in response to a second operation instruction on the preset button on the display screen, stopping the preset recording device; and generating the reply information corresponding to the translated text from the audio information collected by the preset recording device.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the audio information collected by the preset recording device to generate reply text in the preset language corresponding to the audio information, and displaying the reply text on the display screen; and, in response to a modification instruction for the reply text on the display screen, generating modified reply text and using the modified reply text as the reply information corresponding to the translated text.
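The recording and editing steps in the two embodiments above can be sketched as follows. Treating the first and second operation instructions as two presses of the same toggle button is an assumption, as are the names `ReplyRecorder` and `finalize_reply`; the `transcribe` callable stands in for whatever speech recognition the system uses.

```python
from typing import Callable, List, Optional


class ReplyRecorder:
    """Hypothetical recorder driven by a single on-screen button."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe
        self._chunks: List[bytes] = []
        self.recording = False

    def on_button(self) -> Optional[str]:
        """First press starts recording; second press stops and returns draft text."""
        if not self.recording:
            self._chunks = []
            self.recording = True
            return None
        self.recording = False
        return self._transcribe(b"".join(self._chunks))

    def feed(self, chunk: bytes) -> None:
        """Collect microphone audio; chunks arriving while stopped are dropped."""
        if self.recording:
            self._chunks.append(chunk)


def finalize_reply(draft: str, edited: Optional[str] = None) -> str:
    """Use the agent's edited text when present, otherwise the draft as recognized."""
    return edited if edited is not None else draft
```

Returning the draft text from the second press lets the caller display it for editing before `finalize_reply` fixes the reply information.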

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the reply information to generate reply text in the target language, and transmitting the reply text in the target language to the target client.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: acquiring the call interval corresponding to the call audio and the preset reminder voice information corresponding to the target language; and, if the call interval is greater than a preset duration threshold, transmitting the preset reminder voice information to the target client before the step of generating the reply audio in the target language.
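The reminder check above reduces to a threshold comparison followed by a lookup of the reminder for the caller's language. In this sketch the call interval is assumed to be measured in seconds, and `PRESET_REMINDERS`, `maybe_send_reminder`, and the placeholder byte strings are hypothetical; real reminders would be prerecorded or synthesized audio.

```python
from typing import Callable, Dict

# Stand-in for preset reminder voice information, keyed by target language.
PRESET_REMINDERS: Dict[str, bytes] = {
    "en": b"<audio: please hold, your reply is being prepared>",
    "ja": b"<audio: reminder in Japanese>",
}


def maybe_send_reminder(
    call_interval_s: float,
    threshold_s: float,
    target_language: str,
    send: Callable[[bytes], None],
    reminders: Dict[str, bytes] = PRESET_REMINDERS,
) -> bool:
    """Send the preset reminder if the interval exceeds the threshold.

    Intended to be called before the reply audio is generated. Returns True
    only when a reminder was actually sent.
    """
    if call_interval_s <= threshold_s:
        return False
    reminder = reminders.get(target_language)
    if reminder is None:
        return False  # no reminder configured for this language
    send(reminder)
    return True
```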

In some embodiments of the present application, a computer-readable storage medium is provided, which stores a computer program. The computer program is loaded by a processor so that the processor performs the following steps:

acquiring the call audio of a target client;

if the language type corresponding to the call audio is the target language, cutting off the voice path between the target client and the target server;

processing the call audio, outputting translated text in a preset language corresponding to the call audio, and displaying the translated text on the display screen of the target server;

acquiring reply information in the preset language corresponding to the translated text, processing the reply information to generate reply audio in the target language, and transmitting the reply audio to the target client.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: querying a preset database according to the target language to obtain the audio processing model associated with the target language; and inputting the call audio into the audio processing model for processing, and outputting translated text in the preset language corresponding to the call audio.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the translated text to obtain semantic information corresponding to the translated text; querying a preset database according to the semantic information to obtain associated text related to the semantic information, and displaying the associated text on the display screen; and, in response to a selection instruction for the associated text on the display screen, using the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: in response to a first operation instruction on a preset button on the display screen, starting a preset recording device; in response to a second operation instruction on the preset button on the display screen, stopping the preset recording device; and generating the reply information corresponding to the translated text from the audio information collected by the preset recording device.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the audio information collected by the preset recording device to generate reply text in the preset language corresponding to the audio information, and displaying the reply text on the display screen; and, in response to a modification instruction for the reply text on the display screen, generating modified reply text and using the modified reply text as the reply information corresponding to the translated text.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: processing the reply information to generate reply text in the target language, and transmitting the reply text in the target language to the target client.

In some embodiments of the present application, when executing the computer program, the processor further implements the following steps: acquiring the call interval corresponding to the call audio and the preset reminder voice information corresponding to the target language; and, if the call interval is greater than a preset duration threshold, transmitting the preset reminder voice information to the target client before the step of generating the reply audio in the target language.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination of these technical features is not contradictory, it should be considered within the scope of this specification.

The voice call translation assistance method, system, computer device, and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (11)

1. A voice call translation assistance method, characterized by comprising:
acquiring the call audio of a target client;
if the language type corresponding to the call audio is the target language, cutting off the voice path between the target client and the target server;
processing the call audio, outputting translated text in a preset language corresponding to the call audio, and displaying the translated text on the display screen of the target server;
acquiring reply information in the preset language corresponding to the translated text, processing the reply information to generate reply audio in the target language, and transmitting the reply audio to the target client.

2. The voice call translation assistance method according to claim 1, characterized in that processing the call audio and outputting the translated text in the preset language corresponding to the call audio comprises:
querying a preset database according to the target language to obtain the audio processing model associated with the target language;
inputting the call audio into the audio processing model for processing, and outputting the translated text in the preset language corresponding to the call audio.

3. The voice call translation assistance method according to claim 1, characterized in that, after outputting the translated text in the preset language corresponding to the call audio and displaying it on the display screen of the target server, the method further comprises:
processing the translated text to obtain semantic information corresponding to the translated text;
querying a preset database according to the semantic information to obtain associated text related to the semantic information, and displaying the associated text on the display screen;
in response to a selection instruction for the associated text on the display screen, using the text information in the associated text that corresponds to the selection instruction as the reply information corresponding to the translated text.

4. The voice call translation assistance method according to claim 1, characterized in that acquiring the reply information corresponding to the translated text comprises:
in response to a first operation instruction on a preset button on the display screen, starting a preset recording device;
in response to a second operation instruction on the preset button on the display screen, stopping the preset recording device;
generating the reply information corresponding to the translated text from the audio information collected by the preset recording device.

5. The voice call translation assistance method according to claim 4, characterized in that generating the reply information corresponding to the translated text from the audio information collected by the preset recording device comprises:
processing the audio information collected by the preset recording device to generate reply text in the preset language corresponding to the audio information, and displaying the reply text on the display screen;
in response to a modification instruction for the reply text on the display screen, generating modified reply text and using the modified reply text as the reply information corresponding to the translated text.

6. The voice call translation assistance method according to claim 1, characterized in that, after acquiring the reply information in the preset language corresponding to the translated text, the method further comprises:
processing the reply information to generate reply text in the target language, and transmitting the reply text in the target language to the target client.

7. The voice call translation assistance method according to claim 1, characterized in that, after acquiring the call audio of the target client, the method further comprises:
acquiring the call interval corresponding to the call audio and the preset reminder voice information corresponding to the target language;
if the call interval is greater than a preset duration threshold, transmitting the preset reminder voice information to the target client before the step of generating the reply audio in the target language.

8. The voice call translation assistance method according to any one of claims 1 to 7, characterized in that cutting off the voice path between the target client and the target server means cutting off the one-way voice path sent from the target server to the target client.

9. A voice call translation assistance system, characterized by comprising:
a voice call subsystem, configured to acquire the call audio of a target client and, if the language type corresponding to the call audio is the target language, to cut off the voice path between the target client and the target server;
a speech translation subsystem, configured to process the call audio, output translated text in a preset language corresponding to the call audio, and display the translated text on the display screen of the target server;
a speech synthesis subsystem, configured to acquire reply information in the preset language corresponding to the translated text, process the reply information to generate reply audio in the target language, and transmit the reply audio to the target client.

10. A computer device, characterized in that the computer device comprises:
a processor; and
one or more application programs, wherein the one or more application programs are stored in the processor, and the processor, when executing the one or more application programs, implements the voice call translation assistance method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a processor, implements the voice call translation assistance method according to any one of claims 1 to 8.
CN202211650023.8A 2022-12-21 2022-12-21 Speech call translation assistance method, system, computer equipment and storage medium Pending CN116016779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211650023.8A CN116016779A (en) 2022-12-21 2022-12-21 Speech call translation assistance method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211650023.8A CN116016779A (en) 2022-12-21 2022-12-21 Speech call translation assistance method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116016779A true CN116016779A (en) 2023-04-25

Family

ID=86027503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211650023.8A Pending CN116016779A (en) 2022-12-21 2022-12-21 Speech call translation assistance method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116016779A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561277A (en) * 2023-05-05 2023-08-08 科大讯飞股份有限公司 Knowledge quiz method, device, equipment and storage medium
CN117195916A (en) * 2023-09-25 2023-12-08 杭州龙席网络科技股份有限公司 A translation system and method based on online communication interaction
CN118939127A (en) * 2024-10-14 2024-11-12 中楹青创科技有限公司 A wearable artificial intelligence wireless IoT security system
CN119204043A (en) * 2024-10-31 2024-12-27 深圳大学 Content translation method, system and wearable translation device
CN119580703A (en) * 2024-11-12 2025-03-07 珠海市泛圈网络科技有限公司 A cross-language voice call system integrating AI voice cloning and real-time translation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274831A (en) * 2018-11-01 2019-01-25 科大讯飞股份有限公司 A kind of audio communication method, device, equipment and readable storage medium storing program for executing
CN110427455A (en) * 2019-06-24 2019-11-08 卓尔智联(武汉)研究院有限公司 A kind of customer service method, apparatus and storage medium
US20210312143A1 (en) * 2020-04-01 2021-10-07 Smoothweb Technologies Limited Real-time call translation system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
常耀明;刘德培,王辰: "中华医学百科全书 军事与特种医学军事人机工效学", 31 January 2021, pages: 63 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561277A (en) * 2023-05-05 2023-08-08 科大讯飞股份有限公司 Knowledge quiz method, device, equipment and storage medium
CN117195916A (en) * 2023-09-25 2023-12-08 杭州龙席网络科技股份有限公司 A translation system and method based on online communication interaction
CN118939127A (en) * 2024-10-14 2024-11-12 中楹青创科技有限公司 A wearable artificial intelligence wireless IoT security system
CN119204043A (en) * 2024-10-31 2024-12-27 深圳大学 Content translation method, system and wearable translation device
CN119204043B (en) * 2024-10-31 2025-04-15 深圳大学 Content translation method, system and wearable translation device
CN119580703A (en) * 2024-11-12 2025-03-07 珠海市泛圈网络科技有限公司 A cross-language voice call system integrating AI voice cloning and real-time translation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination