
WO2017166495A1 - Voice signal processing method and device - Google Patents

Voice signal processing method and device

Info

Publication number
WO2017166495A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voice signal
sound source
determined
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/088981
Other languages
English (en)
French (fr)
Inventor
赵宪浩
刘子超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Le Holdings Beijing Co Ltd
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Original Assignee
Le Holdings Beijing Co Ltd
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Le Holdings Beijing Co Ltd and Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to US15/247,841 (US20170278523A1)
Publication of WO2017166495A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/20 Arrangements for preventing acoustic feed-back

Definitions

  • the embodiments of the present invention relate to the field of signal processing technologies, and in particular, to a voice signal processing method and apparatus.
  • the existing multi-microphone terminals are mainly two-microphone, three-microphone and four-microphone terminals; in all of them, one microphone is usually set as the main microphone and the other microphones as auxiliary microphones.
  • the main microphone is mainly used to collect vocal signals, and other microphones mainly collect noise signals for voice processing to achieve noise reduction.
  • the existing two-microphone, three-microphone and four-microphone terminals use a microphone preset by the terminal as the main microphone for each voice application (APP).
  • for WeChat voice, for example, the microphone set at the bottom is used as the main microphone, and the other microphones are used as auxiliary microphones.
  • the embodiment of the invention provides a method and a device for processing a voice signal, which are used to solve the problem that the collected voice signal is relatively noisy in the prior art.
  • An embodiment of the present invention provides a voice signal processing method, applied to a terminal that includes at least two voice collection devices, including:
  • determining, according to a preset first correspondence, a voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners;
  • the embodiment of the invention further provides a voice signal processing device, comprising:
  • At least two voice collection modules, each configured to collect a first voice signal, where the at least two voice collection modules are located at different positions of the voice signal processing device;
  • a calculation module configured to determine a sound source characteristic value of the first voice signal collected by each of the at least two voice collection modules
  • a processing mode determining module configured to determine, according to the preset first correspondence, a voice processing manner corresponding to the sound source feature value of the first voice signal collected by the at least two voice collection modules determined by the calculating module,
  • the preset first corresponding relationship includes a correspondence between a range of sound source feature values corresponding to the at least two voice collection modules and a voice processing mode;
  • the signal processing module is configured to process the first voice signal collected by the at least two voice collection modules according to the voice processing manner determined by the determining module.
  • An embodiment of the present invention provides a voice signal processing apparatus, including a memory, a processor, and a voice collection device.
  • the processor may be configured to read a program in the memory and perform the following process: collecting a first voice signal through the at least two voice collection devices; determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices; determining, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners; and processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
  • Embodiments of the present invention provide a voice signal processing method and device that determine the sound source feature value of the first voice signal collected by each of the at least two voice collection devices, then determine the voice processing manner corresponding to those sound source feature values, and process the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
  • because the correspondence between the sound source feature value ranges corresponding to the at least two voice collection modules and voice processing manners is preset, the sound source feature values are used to match the best voice processing manner and to switch to the best input and output devices.
  • this achieves a good noise reduction effect and can give the user a better sound experience. It also reduces misoperation caused by the user not knowing where the terminal's main microphone is located.
  • FIG. 1 is a flow chart of a method for processing a voice signal according to the present invention
  • FIG. 2 is a flow chart of a voice signal processing apparatus provided by the present invention.
  • noise reduction on mobile phones with two, three or four microphones is designed for call scenarios or for voice-based applications, such as APPs installed on mobile phones: voice chat in WeChat and QQ, walkie-talkie applications, voice recording applications, voice notepads, etc.
  • each APP corresponds to one main microphone, and the other microphones are used for noise reduction.
  • a user who does not know which microphone is the main one may speak into a microphone that the terminal has preset as an auxiliary microphone; because the auxiliary microphone is mainly responsible for collecting environmental noise, noise reduction becomes less effective. The technical solutions described below are therefore proposed, without being limited to the embodiments described below.
  • the embodiment of the invention provides a method and a device for processing a voice signal, which are used to solve the problem that the collected voice signal is relatively noisy in the prior art.
  • the method and the device are based on the same inventive concept. Since the principles of the method and the device for solving the problem are similar, the implementation of the device and the method can be referred to each other, and the repeated description is not repeated.
  • An embodiment of the present invention provides a voice signal processing method, applied to a terminal that includes at least two voice collection devices, where the at least two voice collection devices are located at different positions of the terminal.
  • the voice collection device may be a microphone; the embodiment of the present invention does not limit the form of the microphone, which may for example be a headset microphone.
  • the method includes:
  • the preset first corresponding relationship includes a correspondence between a range of sound source feature values corresponding to the at least two voice collection devices and a voice processing mode.
  • S104 Process the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
  • the sound source feature value of the first voice signal collected by each of the at least two voice collection devices may be determined periodically.
  • the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices is then determined once per period according to the preset first correspondence, which avoids switching the voice processing manner too frequently.
  • the voice processing mode corresponding to the sound source feature value of the first voice signal collected by the at least two voice collection devices is determined according to the preset first correspondence, which may be, but is not limited to, implemented as follows:
  • the voice collection device with the highest sound source feature value of the first voice signal collected in the at least two voice collection devices is selected to collect the voice signal of the primary sound source, and the other voice collection devices collect the external environment noise.
  • the sound source characteristic values of the two voice collection devices are respectively represented by MKF1 and MKF2, and the first correspondence relationship can be set as shown in Table 1.
  • the at least two voice collection devices may be multiple microphones. When the user makes a normal voice call using the microphone at the lower end of the terminal, that microphone mainly picks up the speaker's voice, while the microphones at other positions of the terminal mainly pick up external environmental noise. Filtering the environmental noise collected by the other microphones out of the sound collected by the lower microphone yields a clear human voice and thus achieves noise reduction.
  • Two voice collection devices with the highest sound source feature value of the first voice signal collected in the at least two voice collection devices are selected to collect voice signals of the primary sound source, and other voice collection devices collect external environmental noise.
  • the second implementation is applicable to terminals including three or more voice collection devices.
  • processing the first voice signal according to the determined voice processing manner may be implemented as follows:
  • when the voice processing manner determined this time differs from the one determined last time, and the last-determined manner has been in use for a duration that reaches a preset duration threshold, the first voice signal collected by the at least two voice collection devices is processed according to the manner determined this time.
  • for example, the user initially uses the microphone at the lower end of the terminal as the main microphone to pick up the user's voice, with the other microphones picking up ambient noise, and then changes posture and speaks toward the microphone at the upper end of the terminal.
  • once this has lasted for the preset duration threshold, the microphone at the upper end of the terminal can become the main microphone for picking up the user's voice, and the other microphones are used to pick up ambient noise.
  • when the voice processing manner determined this time differs from the one determined last time but the last-determined manner has been in use for a duration that has not reached the preset duration threshold, the first voice signal collected by the at least two voice collection devices is processed according to the manner determined last time.
  • for example, if the user passes through a noisy environment during a call but stays in it only briefly, the voice processing manner need not be switched.
  • before determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices, the method includes:
  • determining that the voice processing mode used to indicate automatic selection of the voice processing manner is on.
  • when the voice processing mode used to indicate automatic selection of the voice processing manner is off, the sound source feature value of the first voice signal is not determined, and the voice processing manner is not determined in the way provided by the embodiment of the present invention.
  • a prior-art approach may be used instead, for example a voice processing manner that corresponds to each application.
  • the embodiment of the present invention may also be applied to a voice output device.
  • the terminal includes at least one voice output device.
  • the voice output device may be a speaker.
  • for example, while the speaker is playing music, if the sounds other than the music collected by the at least two voice collection devices are loud, the volume can be turned up.
  • as another example, the terminal includes two speakers and pre-stores the distances between the at least two voice collection devices and the two speakers. While music is playing, if the noise other than the music collected by the at least two voice collection devices is loud, and the noise collected by the voice collection device near the left channel is the louder, the volume of the right channel can be turned up and the volume of the left channel turned down.
  • the feature values of the voice signals collected by the voice collection devices are used to match the best voice processing manner and to switch to the best input and output devices, which achieves a good noise reduction effect and can give the user a better sound experience.
  • misoperation caused by the user not knowing where the terminal's main microphone is located is reduced.
  • a voice signal processing device is also provided in the embodiment of the present invention. Since the principle and method for solving the problem are similar, the implementation of the device may refer to the implementation of the method, and the repeated description is not repeated.
  • the embodiment of the invention further provides a speech signal processing device, and the speech signal processing device is applied to a terminal.
  • the device comprises:
  • at least two voice collection modules; the embodiment of the present invention takes two as an example, namely the first voice collection module 201a and the second voice collection module 201b.
  • the first voice collection module 201a and the second voice collection module 201b are each configured to collect the first voice signal.
  • the first voice collection module and the second voice collection module are located at different positions of the terminal.
  • the calculation module 202 is configured to determine sound source feature values of the first voice signals respectively collected by the first voice collection module 201a and the second voice collection module 201b.
  • the processing manner determining module 203 is configured to determine, according to the preset first correspondence, the voice processing manner corresponding to the sound source feature values, determined by the calculation module 202, of the first voice signals respectively collected by the first voice collection module 201a and the second voice collection module 201b.
  • the preset first corresponding relationship includes a correspondence between a range of sound source feature values corresponding to the first voice collection module 201a and the second voice collection module 201b and a voice processing mode.
  • the signal processing module 204 is configured to process the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b according to the voice processing mode determined by the processing mode determining module 203.
  • the processing manner determining module 203 is configured to: select, from the first voice collection module 201a and the second voice collection module 201b, the voice collection module with the largest sound source feature value as the primary device for collecting the voice signal of the primary sound source, with the other voice collection module serving as the auxiliary device for collecting environmental noise.
  • the calculating module 202 is specifically configured to:
  • the sound source characteristic value of the first voice signal collected by each of the at least two voice collection devices is periodically determined.
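  • the signal processing module 204 is specifically configured to:
  • when the voice processing manner determined this time differs from the one determined last time and the last-determined manner has been in use for a duration that reaches the preset duration threshold, process the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b according to the voice processing manner determined this time.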
  • the device further includes:
  • the state determining module 205 is configured to determine, before the calculation module 202 determines the sound source feature values of the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b, that the voice processing mode used to indicate automatic selection of the voice processing manner is on.
  • the device may further include:
  • At least one voice output module 206 configured to output a second voice signal
  • the first voice collection module 201a and the second voice collection module 201b are further configured to: when the at least one voice output module outputs the second voice signal, acquire a third voice signal, where the third voice signal includes at least the second voice signal;
  • the calculation module 202 is further configured to determine sound source feature values of the third voice signal collected by the first voice collection module 201a and the second voice collection module 201b;
  • the output mode determining module 207 is configured to determine, according to the preset second correspondence, a voice output mode corresponding to the sound source feature value of the third voice signal collected by the first voice collecting module 201a and the second voice collecting module 201b,
  • the preset second corresponding relationship includes a correspondence between a sound source characteristic value range and a voice output mode corresponding to the first voice collection module 201a and the second voice collection module 201b;
  • control module configured to control the at least one voice output module 206 to output the second voice signal according to the determined voice output manner.
  • the above parts are respectively divided into modules (or units) according to functions.
  • the functions of the various modules (or units) may be implemented in one or more software or hardware in the practice of the invention.
  • the device identification device may be disposed in a server.
  • a voice signal processing device includes a memory, a processor and voice collection devices, where the processor is configured to read a program in the memory and perform the following process: collecting the first voice signal through the at least two voice collection devices; determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices; determining, according to the preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners;
  • and processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
  • the feature values of the voice signals collected by the voice collection devices are used to match the best voice processing manner and to switch to the best input and output devices, which achieves a good noise reduction effect and can give the user a better sound experience.
  • misoperation caused by the user not knowing where the terminal's main microphone is located is reduced.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Abstract

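The present invention provides a voice signal processing method and device, which solve the prior-art problem that collected voice signals are relatively noisy and can give the user a better sound experience. The voice signal processing method includes: collecting a first voice signal through at least two voice collection devices; determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices; determining, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners; and processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.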

Description

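Voice signal processing method and device
This application claims priority to Chinese patent application No. 201610184725.X, entitled "Voice signal processing method and device" and filed with the Chinese Patent Office on March 28, 2016, the entire contents of which are incorporated herein by reference.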
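Technical Field
Embodiments of the present invention relate to the field of signal processing technologies, and in particular to a voice signal processing method and device.
Background
To improve the quality of voice applications on mobile phones, many manufacturers increase the number of microphones. Existing multi-microphone terminals are mainly two-microphone, three-microphone and four-microphone terminals. In all of them, one microphone is usually set as the main microphone and the others as auxiliary microphones. The main microphone mainly collects the human voice signal, and the other microphones mainly collect noise signals that are used in voice processing to achieve noise reduction.
However, existing two-, three- and four-microphone terminals use a microphone preset by the terminal as the main microphone for each voice application (APP). For WeChat voice, for example, the microphone at the bottom is used as the main microphone and the other microphones as auxiliary microphones.
In the course of implementing the present invention, the inventors found that most users do not know which microphone is set as the main microphone for a particular APP. A user may therefore speak into a microphone that the terminal has preset as an auxiliary microphone. Because that auxiliary microphone is mainly responsible for collecting environmental noise, the voice signal collected for the user's communication is relatively noisy.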
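Summary of the Invention
Embodiments of the present invention provide a voice signal processing method and device to solve the prior-art problem that collected voice signals are relatively noisy.
An embodiment of the present invention provides a voice signal processing method, applied to a terminal that includes at least two voice collection devices, including:
collecting a first voice signal through the at least two voice collection devices;
determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices;
determining, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners;
processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
An embodiment of the present invention further provides a voice signal processing device, including:
at least two voice collection modules, each configured to collect a first voice signal, the at least two voice collection modules being located at different positions of the voice signal processing device;
a calculation module, configured to determine the sound source feature value of the first voice signal collected by each of the at least two voice collection modules;
a processing manner determining module, configured to determine, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values, determined by the calculation module, of the first voice signal collected by the at least two voice collection modules, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection modules and voice processing manners;
a signal processing module, configured to process the first voice signal collected by the at least two voice collection modules according to the voice processing manner determined by the processing manner determining module.
An embodiment of the present invention provides a voice signal processing device including a memory, a processor and voice collection devices, where the processor may be configured to read a program in the memory and perform the following process: collecting a first voice signal through the at least two voice collection devices; determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices; determining, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners; and processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
The voice signal processing method and device provided by the embodiments of the present invention determine the sound source feature value of the first voice signal collected by each of the at least two voice collection devices, then determine the voice processing manner corresponding to those sound source feature values, and process the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner. Because the correspondence between the sound source feature value ranges corresponding to the at least two voice collection modules and voice processing manners is preset, the sound source feature values are used to match the best voice processing manner and to switch to the best input and output devices, which achieves a good noise reduction effect and can give the user a better sound experience. It also reduces misoperation caused by the user not knowing where the terminal's main microphone is located.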
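Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a voice signal processing method provided by the present invention;
FIG. 2 is a flowchart of a voice signal processing device provided by the present invention.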
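Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Noise reduction on mobile phones with two, three or four microphones is designed for call scenarios or for various voice-based applications, such as APPs installed on mobile phones: voice chat in WeChat and QQ, walkie-talkie applications, voice recording applications, voice notepads and so on. Each APP corresponds to one main microphone, and the other microphones are used for noise reduction. Because a fixed main microphone is used for a given application, a user who does not know which microphone that is may speak into a microphone that the terminal has preset as an auxiliary microphone. Since the auxiliary microphone is mainly responsible for collecting environmental noise, noise reduction becomes less effective. The technical solutions described below are therefore proposed, without being limited to the embodiments described below.
Embodiments of the present invention provide a voice signal processing method and device to solve the prior-art problem that collected voice signals are relatively noisy. The method and the device are based on the same inventive concept. Because they solve the problem on similar principles, the implementations of the device and the method may refer to each other, and repeated details are omitted.
An embodiment of the present invention provides a voice signal processing method, applied to a terminal that includes at least two voice collection devices located at different positions of the terminal. A voice collection device may be a microphone; the embodiments of the present invention do not limit the form of the microphone, which may for example be a headset microphone.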
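As shown in FIG. 1, the method includes:
S101: Collect a first voice signal through the at least two voice collection devices.
S102: Determine the sound source feature value of the first voice signal collected by each of the at least two voice collection devices.
S103: Determine, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices.
The preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners.
S104: Process the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.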
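Steps S101 to S104 can be pictured with a minimal sketch. The source does not define how the sound source feature value is computed, so the root-mean-square level used in `rms` is purely an assumption, and `rms`, `choose_primary` and `run_pipeline` are hypothetical names. S104 is reduced to labelling device roles; the actual noise-reduction filtering is out of scope here.

```python
import math
from typing import Dict, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Assumed feature value: root-mean-square level of one captured frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def choose_primary(features: List[float]) -> int:
    """Stand-in for S103: index of the device with the largest feature value."""
    return max(range(len(features)), key=lambda i: features[i])


def run_pipeline(frames: List[Sequence[float]]) -> Dict[str, object]:
    """frames[i] is the first voice signal captured by device i (S101)."""
    features = [rms(frame) for frame in frames]   # S102
    primary = choose_primary(features)            # S103
    # S104 (simplified): report which device is primary and which are auxiliary.
    auxiliary = [i for i in range(len(frames)) if i != primary]
    return {"primary": primary, "auxiliary": auxiliary, "features": features}
```

Calling `run_pipeline` with one frame per microphone returns the index of the device that would act as the main microphone under this assumed feature.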
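Optionally, the sound source feature value of the first voice signal collected by each of the at least two voice collection devices may be determined periodically. The voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices is then determined once per period according to the preset first correspondence, which avoids switching the voice processing manner too frequently.
Optionally, determining, according to the preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices may be implemented in, but is not limited to, the following ways:
First implementation
The voice collection device whose collected first voice signal has the largest sound source feature value is selected to collect the voice signal of the primary sound source, and the other voice collection devices collect external environmental noise.
Taking two voice collection devices as an example, with their sound source feature values denoted MKF1 and MKF2, the first correspondence may be set as shown in Table 1.
Table 1
Figure PCTCN2016088981-appb-000001
In this technical solution, the at least two voice collection devices may be multiple microphones. When the user makes a normal voice call using the microphone at the lower end of the terminal, that microphone mainly picks up the speaker's voice, while the microphones at other positions of the terminal mainly pick up external environmental noise. Filtering the environmental noise collected by the other microphones out of the sound collected by the lower microphone yields a clear human voice and thus achieves noise reduction.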
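A preset correspondence of this kind can be sketched as an ordered rule table. Table 1 survives only as an image reference, so the two rows below are hypothetical and merely follow the first implementation's rule that the device with the larger feature value becomes primary. The manner names and `lookup_manner` are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Each rule pairs a condition on (MKF1, MKF2) with a voice processing manner.
Rule = Tuple[Callable[[float, float], bool], str]

# Hypothetical rows; the real contents of Table 1 are not reproduced here.
FIRST_CORRESPONDENCE: List[Rule] = [
    (lambda mkf1, mkf2: mkf1 >= mkf2, "device1_primary_device2_noise"),
    (lambda mkf1, mkf2: mkf1 < mkf2, "device2_primary_device1_noise"),
]


def lookup_manner(mkf1: float, mkf2: float) -> str:
    """Return the manner of the first rule whose condition matches."""
    for matches, manner in FIRST_CORRESPONDENCE:
        if matches(mkf1, mkf2):
            return manner
    raise ValueError("no rule matches the given feature values")
```

Keeping the rules in data rather than in branching code makes it easy to swap in a different table, for instance one with explicit numeric ranges.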
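Second implementation
The two voice collection devices whose collected first voice signals have the largest sound source feature values are selected to collect the voice signal of the primary sound source, and the other voice collection devices collect external environmental noise.
The second implementation applies to terminals that include three or more voice collection devices.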
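The second implementation amounts to a top-two selection over the per-device feature values. The sketch below assumes the feature values are already available as a list indexed by device; `split_primary_auxiliary` and its `primary_count` parameter are hypothetical names.

```python
from typing import List, Tuple


def split_primary_auxiliary(
    features: List[float], primary_count: int = 2
) -> Tuple[List[int], List[int]]:
    """Return (primary device indices, auxiliary device indices)."""
    if not 0 < primary_count <= len(features):
        raise ValueError("primary_count must be between 1 and the device count")
    # Rank device indices from largest to smallest feature value.
    ranked = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    primary = sorted(ranked[:primary_count])
    auxiliary = sorted(ranked[primary_count:])
    return primary, auxiliary
```

With `primary_count=1` the same function covers the first implementation, where only the single largest device collects the primary sound source.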
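Optionally, processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner may be implemented as follows:
When it is determined that the voice processing manner determined this time differs from the one determined last time, and the last-determined manner has been in use for a duration that reaches a preset duration threshold, the first voice signal collected by the at least two voice collection devices is processed according to the manner determined this time.
For example, while using WeChat the user initially uses the microphone at the lower end of the terminal as the main microphone to pick up the user's voice, with the other microphones picking up environmental noise. If the user then changes posture and speaks toward the microphone at the upper end of the terminal for a duration that reaches the preset duration threshold, the upper microphone can become the main microphone for picking up the user's voice, with the other microphones picking up environmental noise.
Optionally, when it is determined that the voice processing manner determined this time differs from the one determined last time, and the last-determined manner has been in use for a duration that has not reached the preset duration threshold, the first voice signal collected by the at least two voice collection devices is processed according to the manner determined last time.
This implementation avoids switching the voice processing manner too frequently. For example, if the user passes through a noisy environment during a phone call but stays in it only briefly, the voice processing manner need not be switched.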
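The duration-threshold rule can be sketched as a small state holder. It follows the wording above: a newly determined manner replaces the current one only if the current one has been in use for at least the threshold. `ModeSwitcher`, `min_hold_seconds` and the string manner names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to check.

```python
from typing import Optional


class ModeSwitcher:
    """Keeps the active voice processing manner and applies the hold threshold."""

    def __init__(self, min_hold_seconds: float) -> None:
        self.min_hold_seconds = min_hold_seconds
        self.current: Optional[str] = None
        self.since: float = 0.0  # time at which `current` became active

    def update(self, candidate: str, now: float) -> str:
        """Feed the manner determined this time; return the manner to use."""
        if self.current is None:
            # First determination: adopt it directly.
            self.current, self.since = candidate, now
        elif candidate != self.current and now - self.since >= self.min_hold_seconds:
            # Different manner and the previous one was held long enough: switch.
            self.current, self.since = candidate, now
        # Otherwise keep the last-determined manner.
        return self.current
```

A brief disturbance, such as walking past a noisy spot, therefore leaves the active manner unchanged until the threshold has elapsed.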
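Optionally, before determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices, the method includes:
determining that the voice processing mode used to indicate automatic selection of the voice processing manner is on.
When it is determined that the voice processing mode used to indicate automatic selection of the voice processing manner is off, the sound source feature value of the first voice signal is not determined and the voice processing manner is not determined in the way provided by the embodiments of the present invention. A prior-art approach may be used instead, for example a voice processing manner that corresponds to each application.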
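The on/off check can be sketched as a guard in front of the feature computation. Passing the measurement as a callable reflects the point that, when the automatic mode is off, the feature values are not determined at all. `determine_primary`, `measure_features` and `app_default_primary` are hypothetical names, and the fallback stands in for the per-application preset of the prior art.

```python
from typing import Callable, Sequence


def determine_primary(
    auto_select_on: bool,
    measure_features: Callable[[], Sequence[float]],
    app_default_primary: int,
) -> int:
    """Return the index of the device to treat as the main microphone."""
    if not auto_select_on:
        # Mode is off: skip feature measurement and use the per-app preset.
        return app_default_primary
    features = measure_features()
    return max(range(len(features)), key=lambda i: features[i])
```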
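Optionally, the embodiments of the present invention may also be applied to voice output devices. The terminal includes at least one voice output device.
While the at least one voice output device outputs a second voice signal, a third voice signal is collected through the at least two voice collection devices, the third voice signal including at least the second voice signal;
the sound source feature value of the third voice signal collected by each of the at least two voice collection devices is determined;
the voice output manner corresponding to the sound source feature values of the third voice signal collected by the at least two voice collection devices is determined according to a preset second correspondence, where the preset second correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice output manners;
the at least one voice output device is controlled to output the second voice signal according to the determined voice output manner.
In the embodiments of the present invention, the voice output device may be a speaker. For example, while a speaker is playing music, if the sounds other than the music collected by the at least two voice collection devices are loud, the volume may be turned up. As another example, suppose the terminal includes two speakers and pre-stores the distances between the at least two voice collection devices and the two speakers. While music is playing, if the noise other than the music collected by the at least two voice collection devices is loud, and the noise collected by the voice collection device near the left channel is the louder, the volume of the right channel may be turned up and the volume of the left channel turned down.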
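The two-speaker example can be sketched as a simple rule on per-channel noise estimates. Only the case where the left side is noisier comes from the text; the mirrored right-side case, the equal-noise case, the threshold, the step size and the 0 to 100 volume scale are all assumptions, and `adjust_stereo_volume` is a hypothetical name.

```python
from typing import Tuple


def clamp_percent(value: int) -> int:
    """Keep a volume setting within 0..100."""
    return max(0, min(100, value))


def adjust_stereo_volume(
    left_noise: float,
    right_noise: float,
    left_volume: int,
    right_volume: int,
    noise_threshold: float = 0.5,
    step: int = 10,
) -> Tuple[int, int]:
    """Return the new (left_volume, right_volume) in percent."""
    if max(left_noise, right_noise) < noise_threshold:
        return left_volume, right_volume  # noise is not loud: leave as is
    if left_noise > right_noise:
        # From the text: noisier near the left channel, so favour the right.
        return clamp_percent(left_volume - step), clamp_percent(right_volume + step)
    if right_noise > left_noise:
        # Assumed mirror case.
        return clamp_percent(left_volume + step), clamp_percent(right_volume - step)
    # Assumed: equally loud noise on both sides, turn both channels up.
    return clamp_percent(left_volume + step), clamp_percent(right_volume + step)
```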
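With the approach provided by the embodiments of the present invention, the feature values of the voice signals collected by the voice collection devices are used to match the best voice processing manner and to switch to the best input and output devices, which achieves a good noise reduction effect and can give the user a better sound experience. It also reduces misoperation caused by the user not knowing where the terminal's main microphone is located.
Based on the same inventive concept, an embodiment of the present invention further provides a voice signal processing device. Because the device solves the problem on a principle similar to the method, the implementation of the device may refer to that of the method, and repeated details are omitted.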
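An embodiment of the present invention further provides a voice signal processing device applied to a terminal. As shown in FIG. 2, the device includes:
at least two voice collection modules; this embodiment takes two as an example, namely a first voice collection module 201a and a second voice collection module 201b, each configured to collect a first voice signal.
The first voice collection module and the second voice collection module are located at different positions of the terminal.
A calculation module 202, configured to determine the sound source feature values of the first voice signals collected by the first voice collection module 201a and the second voice collection module 201b respectively.
A processing manner determining module 203, configured to determine, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values, determined by the calculation module 202, of the first voice signals collected by the first voice collection module 201a and the second voice collection module 201b respectively, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the first voice collection module 201a and the second voice collection module 201b and voice processing manners.
A signal processing module 204, configured to process the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b according to the voice processing manner determined by the processing manner determining module 203.
Optionally, the processing manner determining module 203 is specifically configured to: select, from the first voice collection module 201a and the second voice collection module 201b, the voice collection module with the largest sound source feature value as the primary device for collecting the voice signal of the primary sound source, with the other voice collection module serving as the auxiliary device for collecting environmental noise.
Optionally, the calculation module 202 is specifically configured to:
periodically determine the sound source feature value of the first voice signal collected by each of the at least two voice collection devices.
Optionally, the signal processing module 204 is specifically configured to:
when it is determined that the voice processing manner determined this time differs from the one determined last time and the last-determined manner has been in use for a duration that reaches a preset duration threshold, process the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b according to the voice processing manner determined this time.
Optionally, the device further includes:
a state determining module 205, configured to determine, before the calculation module 202 determines the sound source feature values of the first voice signal collected by the first voice collection module 201a and the second voice collection module 201b, that the voice processing mode used to indicate automatic selection of the voice processing manner is on.
The device may further include:
at least one voice output module 206, configured to output a second voice signal;
the first voice collection module 201a and the second voice collection module 201b, further configured to collect a third voice signal while the at least one voice output module outputs the second voice signal, the third voice signal including at least the second voice signal;
the calculation module 202, further configured to determine the sound source feature values of the third voice signal collected by the first voice collection module 201a and the second voice collection module 201b;
an output manner determining module 207, configured to determine, according to a preset second correspondence, the voice output manner corresponding to the sound source feature values of the third voice signal collected by the first voice collection module 201a and the second voice collection module 201b, where the preset second correspondence includes the correspondence between the sound source feature value ranges corresponding to the first voice collection module 201a and the second voice collection module 201b and voice output manners;
a control module, configured to control the at least one voice output module 206 to output the second voice signal according to the determined voice output manner.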
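For ease of description, the parts above are described as modules (or units) divided by function. In implementing the present invention, the functions of the modules (or units) may of course be implemented in one or more pieces of software or hardware. In a specific implementation, the above device identification apparatus may be deployed in a server.
In the embodiments of the present invention, the functional modules shown in FIG. 2 other than the voice collection modules may be implemented by a hardware processor. Specifically, a voice signal processing device includes a memory, a processor and voice collection devices, where the processor may be configured to read a program in the memory and perform the following process: collecting a first voice signal through the at least two voice collection devices; determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices; determining, according to a preset first correspondence, the voice processing manner corresponding to the sound source feature values of the first voice signal collected by the at least two voice collection devices, where the preset first correspondence includes the correspondence between the sound source feature value ranges corresponding to the at least two voice collection devices and voice processing manners; and processing the first voice signal collected by the at least two voice collection devices according to the determined voice processing manner.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
With the approach provided by the embodiments of the present invention, the feature values of the voice signals collected by the voice collection devices are used to match the best voice processing manner and to switch to the best input and output devices, which achieves a good noise reduction effect and can give the user a better sound experience. It also reduces misoperation caused by the user not knowing where the terminal's main microphone is located.
From the description of the implementations above, a person skilled in the art can clearly understand that the implementations may be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the technical solutions above, in essence or in the part contributing to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.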

Claims (10)

  1. 一种语音信号处理方法,其特征在于,所述方法应用包括至少两个语音采集设备的终端,所述至少两个语音采集设备设置在所述终端的位置不同,包括:
    通过所述至少两个语音采集设备采集第一语音信号;
    确定所述至少两个语音采集设备中每个语音采集设备采集到的第一语音信号的声源特征值;
    根据预设的第一对应关系确定所述至少两个语音采集设备采集到的第一语音信号的声源特征值对应的语音处理方式,所述预设的第一对应的关系包括所述至少两个语音采集设备所对应的声源特征值范围与语音处理方式之间的对应关系;
    根据所述确定的语音处理方式对所述至少两个语音采集设备采集的第一语音信号进行处理。
  2. 根据权利要求1所述的方法,其特征在于,所述根据预设的第一对应关系确定所述至少两个语音采集设备采集到的第一语音信号的声源特征值对应的语音处理方式,包括:
    在所述至少两个语音采集设备中选择声源特征值最大的语音采集设备作为用于采集主声源语音信号的主设备,其他语音采集设备作为用于采集环境噪声的辅设备。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述确定的语音处理方式对所述至少两个语音采集设备采集的第一语音信号进行处理,包括:
    确定本次确定的语音处理方式与上一次确定的语音处理方式不同且采用上一次确定的语音处理方式的时长达到预设时长阈值时,根据本次确定的语音处理方式对所述至少两个语音采集设备采集的第一语音信号进行处理。
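  4. The method according to claim 1, characterized in that, before determining the sound source feature value of the first voice signal collected by each of the at least two voice collection devices, the method comprises:
    determining that a voice processing mode, which indicates automatic selection of the voice processing manner, is in an enabled state.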
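  5. The method according to claim 1, characterized by further comprising:
    collecting, while at least one voice output device outputs a second voice signal, a third voice signal through the at least two voice collection devices, the third voice signal comprising at least the second voice signal;
    determining a sound source feature value of the third voice signal collected by each of the at least two voice collection devices;
    determining, according to a preset second correspondence, a voice output manner corresponding to the sound source feature values of the third voice signal collected by the at least two voice collection devices, the preset second correspondence comprising a correspondence between sound source feature value ranges corresponding to the at least two voice collection devices and voice output manners; and
    controlling, according to the determined voice output manner, the at least one voice output device to output the second voice signal.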
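  6. A voice signal processing apparatus, characterized by comprising:
    at least two voice collection modules, each configured to collect a first voice signal, the at least two voice collection modules being located at different positions on the voice signal processing apparatus;
    a calculation module, configured to determine a sound source feature value of the first voice signal collected by each of the at least two voice collection modules;
    a processing manner determination module, configured to determine, according to a preset first correspondence, a voice processing manner corresponding to the sound source feature values, determined by the calculation module, of the first voice signal collected by the at least two voice collection modules, the preset first correspondence comprising a correspondence between sound source feature value ranges corresponding to the at least two voice collection modules and voice processing manners; and
    a signal processing module, configured to process, according to the voice processing manner determined by the processing manner determination module, the first voice signal collected by the at least two voice collection modules.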
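  7. The apparatus according to claim 6, characterized in that the processing manner determination module is specifically configured to: select, from the at least two voice collection modules, the voice collection module with the largest sound source feature value as a primary device for collecting the voice signal of the main sound source, and use the other voice collection modules as auxiliary devices for collecting ambient noise.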
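  8. The apparatus according to claim 6 or 7, characterized in that the signal processing module is specifically configured to:
    when it is determined that the currently determined voice processing manner differs from the previously determined voice processing manner and that the previously determined voice processing manner has been in use for a duration reaching a preset duration threshold, process, according to the currently determined voice processing manner, the first voice signal collected by the at least two voice collection modules.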
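  9. The apparatus according to claim 6, characterized by further comprising:
    a state determination module, configured to determine, before the calculation module determines the sound source feature value of the first voice signal collected by each of the at least two voice collection modules, that a voice processing mode, which indicates automatic selection of the voice processing manner, is in an enabled state.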
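  10. The apparatus according to claim 6, characterized by further comprising:
    at least one voice output module, configured to output a second voice signal;
    the at least two voice collection modules being further configured to collect a third voice signal while the at least one voice output module outputs the second voice signal, the third voice signal comprising at least the second voice signal;
    the calculation module being further configured to determine a sound source feature value of the third voice signal collected by each of the at least two voice collection modules;
    an output manner determination module, configured to determine, according to a preset second correspondence, a voice output manner corresponding to the sound source feature values of the third voice signal collected by the at least two voice collection modules, the preset second correspondence comprising a correspondence between sound source feature value ranges corresponding to the at least two voice collection modules and voice output manners; and
    a control module, configured to control, according to the determined voice output manner, the at least one voice output module to output the second voice signal.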
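The switching condition recited in claims 3 and 8 (adopt a newly determined processing manner only if it differs from the previous one and the previous one has been in use for at least a preset duration) can be sketched as a small state holder. The class name `MannerSwitcher`, the field `min_hold_s`, and the use of plain string labels for processing manners are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MannerSwitcher:
    min_hold_s: float              # preset duration threshold, in seconds
    current: Optional[str] = None  # processing manner currently in use
    since: float = 0.0             # time at which `current` was adopted

    def update(self, candidate: str, now: float) -> str:
        """Return the manner to use at time `now`, given the newly determined one."""
        if self.current is None:
            # First determination: adopt it immediately.
            self.current, self.since = candidate, now
        elif candidate != self.current and now - self.since >= self.min_hold_s:
            # The manner differs and the old one has been held long enough: switch.
            self.current, self.since = candidate, now
        return self.current
```

Holding the previous manner for a minimum duration keeps the terminal from toggling between microphones when the feature values fluctuate around a range boundary.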
PCT/CN2016/088981 2016-03-28 2016-07-06 Voice signal processing method and device Ceased WO2017166495A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/247,841 US20170278523A1 (en) 2016-03-28 2016-08-25 Method and device for processing a voice signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610184725.XA CN105847497A (zh) 2016-03-28 2016-03-28 Voice signal processing method and device
CN201610184725.X 2016-03-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/247,841 Continuation US20170278523A1 (en) 2016-03-28 2016-08-25 Method and device for processing a voice signal

Publications (1)

Publication Number Publication Date
WO2017166495A1 true WO2017166495A1 (zh) 2017-10-05

Family

ID=56583746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088981 Ceased WO2017166495A1 (zh) 2016-03-28 2016-07-06 Voice signal processing method and device

Country Status (2)

Country Link
CN (1) CN105847497A (zh)
WO (1) WO2017166495A1 (zh)

Also Published As

Publication number Publication date
CN105847497A (zh) 2016-08-10


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16896267

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16896267

Country of ref document: EP

Kind code of ref document: A1