CN116959475A - A speech denoising method based on improved spectral subtraction - Google Patents
A speech denoising method based on improved spectral subtraction
- Publication number
- CN116959475A (application CN202310837044.9A)
- Authority
- CN
- China
- Prior art keywords
- speech
- signal
- noise
- spectral subtraction
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a speech denoising method based on improved spectral subtraction, comprising the following steps: input noisy speech and preprocess the signal, converting it from the time-domain dimension to the frame dimension; apply the Fourier transform to obtain the full-length spectrum of the signal; compute, frame by frame, the proportion of spectral energy falling in the sub-band occupied by the human voice; generate a speech-masking mask and estimate the noise; divide the signal into voiced speech, unvoiced speech, and noise according to an adjustment factor; and apply post-spectral-subtraction weight correction to the voiced, unvoiced, and noise segments. The method suppresses stationary and non-stationary noise effectively and in real time, significantly improves the speech signal-to-noise ratio, and ensures that the useful part of the speech signal is not distorted by front-end denoising. It is suitable as front-end denoising for systems in a wide range of speech tasks, improving system processing speed and performance.
Description
Technical Field
The invention belongs to the technical field of speech enhancement, and specifically relates to a speech denoising method based on improved spectral subtraction.
Background
Speech denoising is deployed at the front end of a speech system to suppress the degradation and distortion of speech signals under noise and to improve their auditory quality and intelligibility. It addresses a problem common to tasks such as speech recognition, voiceprint recognition, scene recording, and hearing assistance: in certain extreme scenarios, noise heavily pollutes the speech spectrum and degrades speech quality, which in turn hurts back-end task performance. Noise is therefore a central concern of the field, and many speech denoising methods have been proposed and applied.
Although existing single-channel signal-processing methods, microphone-array methods, and deep-learning methods solve the noise problem in some scenarios, they still have technical shortcomings that limit their practical application. These shortcomings can be summarized as follows:
(1) Among single-channel methods, Wiener filtering requires the expected clean speech to be known; the wavelet transform has a limited scope of application and generally mediocre denoising performance; and statistics- and subspace-based methods are computationally expensive and unsuitable for real-time tasks;
(2) Microphone-array methods face practical deployment problems: deploying multiple microphones is costly, and in some extreme scenarios the beam formed under the array configuration provides only mediocre noise suppression, or is not feasible at all;
(3) Deep-learning methods are mostly data-driven: their performance depends on the quality and scale of the training data, their parameter counts are huge, and the resulting deployment barriers limit their application.
Therefore, there is an urgent need for a speech denoising method with broad applicability, low computational load, high real-time performance, and low deployment cost.
Summary of the Invention
In view of this, the object of the present invention is to provide a speech denoising method based on improved spectral subtraction. The invention aims to solve the problems of existing speech denoising methods for tasks such as speech recognition, voiceprint recognition, scene recording, and hearing assistance: narrow scope of application, heavy computation, poor real-time performance, and high deployment cost.
To achieve the above object, the present invention provides a speech denoising method based on improved spectral subtraction, comprising the following steps:
S1. Input noisy speech, preprocess the noisy-speech signal, and convert the signal from the time-domain dimension to the frame dimension;
S2. Apply the Fourier transform to obtain the full-length spectrum of the signal;
S3. Compute, frame by frame, the proportion of spectral energy in the sub-band occupied by the human voice;
S4. Generate a speech-masking mask and estimate the noise;
S5. Divide the signal into voiced speech, unvoiced speech, and noise according to the adjustment factor;
S6. Apply post-spectral-subtraction weight correction to voiced speech, unvoiced speech, and noise.
Further, in step S1, the preprocessing operations include pre-emphasis, framing, and windowing.
Further, in step S1, the Hanning window function is used for windowing.
Further, step S3 comprises the following sub-steps:
S3.1 Divide out, in the frequency domain, the sub-band where human-voice energy is concentrated; the frequency range of the sub-band is 300 to 3400 Hz;
S3.2 Compute the spectral energy Esub of the sub-band where the human voice is located and the spectral energy Eall of the signal over the full frequency domain;
S3.3 Compute the spectral energy proportion Ren of the sub-band as Ren = Esub / Eall.
Further, in step S3.2, the spectral energy Esub of the sub-band where the human voice is located is computed as
Esub = Σ_{f = fstart}^{fend} S(f)²
where fstart and fend are the cutoff frequencies, fstart = 300 Hz and fend = 3400 Hz, and S(f) is the spectral amplitude at frequency f.
Further, step S4 comprises the following sub-steps:
S4.1 Set an energy-proportion threshold Ten and compare Ren against it to produce the frame-level masking value Maskn of the speech segments: Maskn = 0 if Ren > Ten (the frame is speech-dominated), and Maskn = 1 otherwise;
S4.2 Multiply the speech-masking mask by the spectral energy to obtain the noise-spectrum estimate.
Further, in step S4.1, the energy-proportion threshold Ten is 0.6.
Further, step S5 comprises the following sub-steps:
S5.1 Over the whole signal, compute frame by frame two features, the average amplitude and the zero-crossing rate, and take their ratio as the adjustment factor of the suppression function:
F = AA / ZCR
where AA is the average amplitude of the audio signal, ZCR is the short-time zero-crossing rate, and F is the adjustment factor of the suppression function;
S5.2 Set the threshold values T1 and T2 separating noise, voiced sound, and unvoiced sound according to the adjustment factor:
T1 = α1·max(F) + α2·min(F)
T2 = β1·max(F) + β2·min(F)
where α1 and α2 are the weighting coefficients of T1, with α1 + α2 = 1; β1 and β2 are the weighting coefficients of T2, with β1 + β2 = 1.
Frames with F < T1 are classified as noise; T1 < F < T2 as unvoiced speech; T2 < F as voiced speech.
Further, step S6 proceeds as follows:
Set the post-subtraction spectral weights: the correction weight of the noise segments is w1, that of the unvoiced speech segments is w2, and that of the voiced segments is w3; then apply the weight correction to the spectrally subtracted signal to obtain the denoised signal spectrum:
Ŝ'(f) = w1·Ŝ(f) in noise segments; Ŝ'(f) = w2·Ŝ(f) in unvoiced speech segments; Ŝ'(f) = w3·Ŝ(f) in voiced segments,
where Ŝ'(f) is the denoised signal spectrum and Ŝ(f) is the spectrally subtracted spectrum of the corresponding noise, unvoiced, or voiced segment.
Further, in step S6, the correction weight w1 of the noise segments is 0.6, the correction weight w2 of the unvoiced speech segments is 0.8, and the correction weight w3 of the voiced segments is 1.1.
The beneficial effects of the present invention are as follows:
The invention discloses a speech denoising method based on improved spectral subtraction. Aimed at the noise problems of tasks such as speech recognition, voiceprint recognition, scene recording, and hearing assistance, it suppresses stationary and non-stationary noise effectively and in real time, significantly improves the speech signal-to-noise ratio, and ensures that the useful part of the speech signal is not distorted by front-end denoising. It is suitable as front-end denoising for systems in a wide range of speech tasks, improving system processing speed and performance.
Other advantages, objects, and features of the invention will be set forth to some extent in the following description and, to some extent, will become apparent to those skilled in the art upon study of what follows, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the following description.
Description of the Drawings
Figure 1 is a flow chart of the speech denoising method of the present invention;
Figure 2 is a schematic diagram of the sub-band region where the human voice is located;
Figure 3 is a schematic diagram of the relationship between the average amplitude, the zero-crossing rate, and speech.
Detailed Description
To make the technical solutions, advantages, and purposes of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the described embodiments without creative effort fall within the protection scope of this application.
As shown in Figure 1, the present invention provides a speech denoising method based on improved spectral subtraction, comprising the following steps:
S1. Input noisy speech, preprocess the noisy-speech signal, and convert the signal from the time-domain dimension to the frame dimension.
The noisy signal to be processed can be human-voice data from speech tasks such as speech recognition, voiceprint recognition, scene recording, and hearing assistance. In this embodiment, the selected data set is the CN-Celeb corpus, which contains 600,000 utterances from 3,000 speakers, recorded without genre restrictions and containing real noise from a variety of scenes; it simulates speech denoising under a voiceprint-recognition task and is processed in .wav format.
The time-domain speech data undergoes pre-emphasis, framing, and windowing in turn, converting the signal from the time-domain dimension to the frame dimension; the frame length win_length and frame shift hop_length are set to 2048 and 512, respectively, and this embodiment uses the Hanning window function.
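The preprocessing chain of step S1 (pre-emphasis, framing with win_length = 2048 and hop_length = 512, Hanning windowing) can be sketched as follows; the function name and the pre-emphasis coefficient of 0.97 are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def preprocess(signal, win_length=2048, hop_length=512, alpha=0.97):
    """Step S1 sketch: pre-emphasis, framing and Hanning windowing."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: slice the time-domain signal into overlapping frames
    n_frames = 1 + (len(emphasized) - win_length) // hop_length
    frames = np.stack([emphasized[i * hop_length : i * hop_length + win_length]
                       for i in range(n_frames)])
    # Windowing: taper each frame to reduce spectral leakage
    return frames * np.hanning(win_length)

frames = preprocess(np.random.randn(16000))
print(frames.shape)  # (28, 2048)
```

Each windowed frame can then be fed to a Fourier transform of size NFFT = 2048, as described in step S2.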
S2. Apply the Fourier transform to obtain the full-length spectrum of the signal.
The Fourier transform is applied with an FFT size NFFT of 2048, yielding the spectrum of the whole signal.
S3. Compute, frame by frame, the proportion of spectral energy in the sub-band occupied by the human voice.
Based on the acoustic finding that the speech segments of a signal carry a higher proportion of energy in the low-frequency sub-bands than non-speech segments do, a specific sub-band in which human-voice energy is concentrated is divided out in the frequency domain; the spectral energy of each frame within that sub-band is accumulated, and its ratio to the total spectral energy of the signal over the full frequency domain is computed.
S3.1 Divide out, in the frequency domain, the sub-band where human-voice energy is concentrated.
According to acoustic experiments, human-voice energy is concentrated in the 300 to 3400 Hz frequency range shown in Figure 2, so the frequency range of the sub-band is 300 to 3400 Hz.
S3.2 Compute the spectral energy Esub of the sub-band where the human voice is located and the spectral energy Eall of the signal over the full frequency domain.
The amplitude of the signal spectrum is squared to obtain the energy of each frame of the original signal, and the original phase of the signal is retained. The numpy.fft.fftfreq function of the numpy scientific-computing library is called to set the center frequencies of multiple equally spaced frequency intervals adaptively for the signal, and the spectral energy of the whole signal is accumulated at these frequencies.
The spectral energy Esub of the sub-band where the human voice is located is computed as
Esub = Σ_{f = fstart}^{fend} S(f)²
where fstart and fend are the cutoff frequencies, fstart = 300 Hz and fend = 3400 Hz, and S(f) is the spectral amplitude at frequency f.
S3.3 Compute the spectral energy proportion Ren of the sub-band as Ren = Esub / Eall.
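The voice-band energy ratio of step S3 can be sketched as below, assuming a frames-by-bins spectrum; the function and variable names are hypothetical:

```python
import numpy as np

def subband_energy_ratio(spectrum, freqs, f_start=300.0, f_end=3400.0):
    """Step S3 sketch: per-frame ratio R_en = E_sub / E_all."""
    energy = np.abs(spectrum) ** 2                 # per-bin energy |S(f)|^2
    band = (freqs >= f_start) & (freqs <= f_end)   # voice sub-band 300-3400 Hz
    e_sub = energy[:, band].sum(axis=1)            # E_sub
    e_all = energy.sum(axis=1) + 1e-12             # E_all (guard against /0)
    return e_sub / e_all

# A 1000 Hz tone at 16 kHz: almost all energy lies inside the voice band
sr, n = 16000, 2048
tone = np.sin(2 * np.pi * 1000 * np.arange(n) / sr)
spec = np.fft.rfft(tone)[None, :]
r = subband_energy_ratio(spec, np.fft.rfftfreq(n, 1 / sr))
```

Here r[0] is close to 1 for the tone, whereas a frame of broadband noise would yield a markedly lower ratio.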
S4. Generate a speech-masking mask and estimate the noise.
S4.1 Set the energy-proportion threshold Ten = 0.6 and compare Ren against it to produce the frame-level masking value Maskn of the speech segments: Maskn = 0 if Ren > Ten (the frame is speech-dominated), and Maskn = 1 otherwise.
S4.2 Multiply the speech-masking mask by the spectral energy to obtain the noise-spectrum estimate.
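Step S4 might then be realized as below. The patent states only that the mask is multiplied with the spectral energy; averaging the surviving (noise-dominated) frames into a single noise spectrum is an added assumption of this sketch, as are the names:

```python
import numpy as np

def estimate_noise(spectrum, r_en, t_en=0.6):
    """Step S4 sketch: mask out speech frames, average the rest as noise."""
    mask = (r_en <= t_en).astype(float)   # Mask_n = 1 for noise-dominated frames
    energy = np.abs(spectrum) ** 2
    masked = energy * mask[:, None]       # speech frames zeroed by the mask
    n_noise = max(mask.sum(), 1.0)        # avoid division by zero
    return masked.sum(axis=0) / n_noise   # averaged noise energy spectrum

noise = estimate_noise(np.array([[2.0, 2.0], [1.0, 1.0]]),
                       r_en=np.array([0.9, 0.3]))
print(noise)  # [1. 1.]
```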
S5. Divide the signal into voiced speech, unvoiced speech, and noise according to the adjustment factor.
For abrupt, non-stationary noise, a suppression function with a factor fixed over the whole range is not reasonable enough; some attribute of the signal must be computed dynamically to adjust the suppression function. The following relationships hold: noise segments and unvoiced speech have a high zero-crossing rate; noise with an excitation source has a larger average amplitude than unvoiced segments; and voiced speech has large energy and clear vocal-fold excitation, so its amplitude-to-zero-crossing ratio exceeds that of the unvoiced and noise segments. Accordingly, the ratio of two features computed frame by frame, the average amplitude and the zero-crossing rate, is taken as the adjustment factor of the suppression function, and decision thresholds separating noise, voiced sound, and unvoiced sound are set from this ratio.
S5.1 Over the whole signal, compute frame by frame the average amplitude and the zero-crossing rate, as shown in Figure 3, and take their ratio as the adjustment factor of the suppression function:
F = AA / ZCR
where AA is the average amplitude of the audio signal, ZCR is the short-time zero-crossing rate, and F is the adjustment factor of the suppression function;
S5.2 Set the threshold values T1 and T2 separating noise, voiced sound, and unvoiced sound according to the adjustment factor:
T1 = α1·max(F) + α2·min(F)
T2 = β1·max(F) + β2·min(F)
where α1 and α2 are the weighting coefficients of T1, with α1 + α2 = 1; β1 and β2 are the weighting coefficients of T2, with β1 + β2 = 1.
Frames with F < T1 are classified as noise; T1 < F < T2 as unvoiced speech; T2 < F as voiced speech.
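Step S5 can be sketched as a per-frame decision on F = AA / ZCR; the threshold coefficients α1 = 0.3 and β1 = 0.7 are arbitrary example values (the patent fixes only α1 + α2 = 1 and β1 + β2 = 1):

```python
import numpy as np

def classify_frames(frames, alpha1=0.3, beta1=0.7):
    """Step S5 sketch: label frames noise / unvoiced / voiced via F = AA / ZCR."""
    aa = np.mean(np.abs(frames), axis=1)            # average amplitude AA
    crossings = np.abs(np.diff(np.sign(frames), axis=1)) > 0
    zcr = crossings.mean(axis=1)                    # short-time zero-crossing rate
    f = aa / (zcr + 1e-12)                          # adjustment factor F
    t1 = alpha1 * f.max() + (1 - alpha1) * f.min()  # T1: noise / unvoiced boundary
    t2 = beta1 * f.max() + (1 - beta1) * f.min()    # T2: unvoiced / voiced boundary
    return np.where(f < t1, "noise", np.where(f < t2, "unvoiced", "voiced"))

t = np.arange(2048)
voiced = np.sin(2 * np.pi * 5 * t / 2048)          # high amplitude, few crossings
hiss = 0.05 * np.sin(2 * np.pi * 500 * t / 2048)   # low amplitude, many crossings
print(classify_frames(np.stack([voiced, hiss])))   # ['voiced' 'noise']
```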
S6. Apply post-spectral-subtraction weight correction to voiced speech, unvoiced speech, and noise.
Noise segments are shrunk and suppressed as far as possible to reduce residuals and eliminate musical noise; the suppression of unvoiced speech segments is relatively weakened, keeping the transition into the voiced segments smooth when the signal is reconstructed; and for voiced speech segments the gain is increased to subtract further. Setting the post-subtraction correction weights of the noise, unvoiced, and voiced segments yields the denoised signal spectrum.
Set the post-subtraction spectral weights: the correction weight of the noise segments is w1 = 0.6, that of the unvoiced speech segments is w2 = 0.8, and that of the voiced segments is w3 = 1.1; then apply the weight correction to the spectrally subtracted signal to obtain the denoised spectrum:
Ŝ'(f) = w1·Ŝ(f) in noise segments; Ŝ'(f) = w2·Ŝ(f) in unvoiced speech segments; Ŝ'(f) = w3·Ŝ(f) in voiced segments,
where Ŝ'(f) is the denoised signal spectrum and Ŝ(f) is the spectrally subtracted spectrum of the corresponding noise, unvoiced, or voiced segment.
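The weight correction of step S6 after plain spectral subtraction might look like the following; flooring the subtracted magnitude at zero is a conventional half-wave-rectification assumption, not stated in the patent, and all names are hypothetical:

```python
import numpy as np

def weighted_spectral_subtraction(mag, noise_mag, labels, w=(0.6, 0.8, 1.1)):
    """Step S6 sketch: subtract the noise magnitude, then scale each frame
    by its class weight (w1 noise, w2 unvoiced, w3 voiced)."""
    weights = {"noise": w[0], "unvoiced": w[1], "voiced": w[2]}
    gains = np.array([weights[l] for l in labels])[:, None]
    subtracted = np.maximum(mag - noise_mag[None, :], 0.0)  # floor at zero
    return gains * subtracted

out = weighted_spectral_subtraction(np.array([[3.0, 3.0], [2.0, 2.0]]),
                                    np.array([1.0, 1.0]),
                                    ["voiced", "noise"])
print(out)
```

With these inputs the voiced frame (3 − 1) is amplified by 1.1 and the noise frame (2 − 1) is attenuated by 0.6.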
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently substituted without departing from its purpose and scope, and all such modifications and substitutions fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310837044.9A CN116959475A (en) | 2023-07-10 | 2023-07-10 | A speech denoising method based on improved spectral subtraction |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116959475A true CN116959475A (en) | 2023-10-27 |
Family
ID=88445517
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310837044.9A Pending CN116959475A (en) | 2023-07-10 | 2023-07-10 | A speech denoising method based on improved spectral subtraction |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116959475A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119669648A (en) * | 2024-11-26 | 2025-03-21 | 重庆大学 | Fourier transform denoising method and system based on hybrid mask strategy |
- 2023-07-10: application CN202310837044.9A filed; patent CN116959475A/en, status active, pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||