CN113362840B - General voice information recovery device and method based on undersampled data of built-in sensor - Google Patents
General voice information recovery device and method based on undersampled data of built-in sensor
- Publication number
- CN113362840B CN113362840B CN202110615983.XA CN202110615983A CN113362840B CN 113362840 B CN113362840 B CN 113362840B CN 202110615983 A CN202110615983 A CN 202110615983A CN 113362840 B CN113362840 B CN 113362840B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Abstract
The invention discloses a general voice information recovery device and method based on undersampled data from built-in sensors. The device comprises a signal preprocessing module, a fundamental frequency estimation module, a spectrum reconstruction module, and a spectrum-to-speech conversion module, connected in sequence. The device can not only recover the speech information hidden in sensor data that is extremely narrowband and severely aliased, but also avoids the poor transferability of learning-based models. Data collected by a phone's built-in sensors has different characteristics in different scenarios. Starting from the intrinsic features of the sensor data and the characteristics of speech signals, the invention constructs a voice information recovery system directly, without model training on any dataset; the system adapts to changes in users, environments, and devices, and effectively recovers hidden speech signals from a phone's built-in sensors.
Description
Technical Field
The invention discloses a general voice information recovery device and method based on undersampled data from built-in sensors.
Background
With the development of smartphone assistants, voice has become increasingly common in human-computer interaction, and has even become the first choice for special groups such as the blind, the elderly, and children. As a result, more and more IoT and mobile devices deploy voice assistants: on mobile devices, Apple's Siri, Google Assistant, and Samsung's Bixby; on smart speakers, Amazon's Alexa and Google Home; on traditional personal computers, Apple's Siri and Microsoft's Cortana. Research suggests that by 2023 the global voice assistant market will reach roughly 7.8 billion US dollars. However, owing to the personalized nature of voice interaction, sensitive information is often embedded in speech during human-computer interaction, such as spoken passwords used for authentication and the content of voice chats. When a smart device plays such speech through its loudspeaker, it inevitably leaks this sensitive information. The invention uses the phone's built-in accelerometer, gyroscope, and magnetometer to collect the vibration and magnetic-field signals leaked while the loudspeaker plays speech, and recovers the speech information hidden in them.
Disadvantages of the prior art:
While a phone's loudspeaker plays speech, the phone's built-in sensors (accelerometer, gyroscope, magnetometer) can pick up the leaked signal. There are two main ways to recover high-resolution speech information from low-resolution sensor data. The first is speech super-resolution, which converts a low-bandwidth sensor signal into a high-bandwidth speech signal; however, because the sensor data is extremely narrowband and severely aliased, speech super-resolution cannot effectively recover the speech information in it. The second is learning-based speech recovery (machine learning or deep learning), which has two major drawbacks: first, it unavoidably requires collecting and labeling sensor data and spends considerable time on model training; second, the performance of a learning-based recovery model drops significantly when it is transferred to different subjects, environments, and devices.
The present invention provides a model-driven device for recovering speech information from sensor data. It dispenses with model training, effectively recovers hidden speech information from sensor data, and remains sufficiently robust.
Summary of the Invention
The present invention improves upon the deficiencies of the prior art by providing a general voice information recovery device and method based on undersampled data from built-in sensors. The invention is realized through the following technical solutions.
The invention discloses a general voice information recovery device based on undersampled built-in sensor data. The device comprises a signal preprocessing module, a fundamental frequency estimation module, a spectrum reconstruction module, and a spectrum-to-speech conversion module, connected in sequence.
As a further improvement, the modules of the present invention are as follows. Signal preprocessing module: collects data from the phone's accelerometer, gyroscope, and magnetometer, and feeds the collected sensor data into a high-pass filter. Fundamental frequency estimation module: estimates the fundamental frequency of the speech signal hidden in the sensor data. Spectrum reconstruction module: reconstructs the harmonics of the high band, corrects the abnormal harmonics of the low band, and restores the low-resolution spectrum to a high-resolution spectrum. Spectrum-to-speech conversion module: converts the recovered high-resolution spectrum into a speech signal audible to the human ear using the Griffin-Lim algorithm.
The invention also discloses a recovery method using the general voice information recovery device based on undersampled built-in sensor data, characterized by the following steps:
1) Through the signal preprocessing module, collect data from the z-axis of the phone's accelerometer, the y-axis of its gyroscope, and the z-axis of its magnetometer, then feed the collected sensor data into a high-pass filter to remove meaningless low-frequency noise.
2) Through the fundamental frequency estimation module, estimate the fundamental frequency with an aliasing-aware estimation algorithm that simultaneously considers the normal harmonics of the speech signal and the abnormal harmonics produced by undersampling, thereby estimating the fundamental frequency of the speech signal hidden in the sensor data.
3) Based on the estimated fundamental frequency, use the spectrum reconstruction module to reconstruct the harmonics of the high band and correct the abnormal harmonics of the low band, restoring the low-resolution spectrum to a high-resolution spectrum.
4) Through the spectrum-to-speech conversion module, convert the recovered high-resolution spectrum into a speech signal audible to the human ear using the Griffin-Lim algorithm.
As a further improvement, in step 2) of the present invention, the fundamental frequency estimation module uses the short-time Fourier transform to convert the filtered sensor time-domain signal into a magnitude spectrum M(t, f). When the frequency of the original signal exceeds half the sensor sampling rate, the signal actually captured by the sensor is aliased by undersampling. The relationship between the frequency before and after undersampling is

A(f) = |f − SR · round(f / SR)|    (1)

where f is the original frequency, SR is the sampling rate, and A(f) is the frequency after the change.
The aliasing-aware harmonic summation measures the likelihood H(f) that frequency f is the fundamental:

H(f) = Σ_{k=1..n} M(t, k·f) + Σ_{k=n+1..m} M(t, A(k·f))

where M(t, f) is the magnitude spectrum of the sensor data, t is the frame index, f is the frequency, k is the harmonic order, n is the order of the highest harmonic whose frequency is below SR/2, and m is the order of the highest harmonic whose frequency is below 1250 Hz.
As a further improvement, because of this frequency change, the magnitude spectrum M(t, f) contains both the normal harmonics of the speech signal and abnormal harmonics generated by undersampling.
As a further improvement, in H(f) as described in the present invention, the first term accumulates the energy of the normal harmonics in the spectrum, and the second term accumulates the energy of the abnormal harmonics generated by undersampling. Since a larger H(f) means f is more likely to be the fundamental, the fundamental frequency f_p of each frame of the spectrum is

f_p = argmax_{85 Hz ≤ f ≤ 255 Hz} H(f)

where 85 Hz to 255 Hz covers the fundamental frequency range of most adult speech. Considering both the normal harmonics in the spectrum and the abnormal harmonics generated by undersampling improves the accuracy of the fundamental frequency estimate.
As a further improvement, in step 3), the spectrum reconstruction module reconstructs the spectrum of the sensor signal, expanding the bandwidth of the sensor magnitude spectrum with an alias-corrected super-resolution algorithm. Denote the reconstructed magnitude spectrum as M_new(t, f) (initially a zero matrix) and the original magnitude spectrum as M_old(t, f). The specific steps are:

a. The algorithm traverses every frame of the original spectrum M_old(t, f). In each pass it first estimates the fundamental frequency f_p with the fundamental frequency estimation module, then obtains the frequency k·f_p of each harmonic from the integer-multiple relationship between the fundamental and its harmonics.

b. The algorithm reconstructs the spectrum. For the band from 0 Hz to SR/2, the speech harmonics lie at k·f_p < SR/2, and M_old(t, k·f_p) is assigned directly to M_new(t, k·f_p) (preserving the normal low-band speech harmonics and removing low-band aliasing). For the band from SR/2 to F_end Hz, the speech harmonics lie at SR/2 ≤ k·f_p < F_end, where F_end is the highest frequency of the reconstructed spectrum.

c. The position and energy of these harmonics (from SR/2 to F_end Hz) are estimated from the frequency-folding relationship under undersampling: an original normal speech harmonic at k·f_p Hz is shifted by undersampling to A(k·f_p) Hz according to formula (1).

d. The unknown normal harmonic spectrum M_new(t, k·f_p) is filled in with the known aliased harmonic spectrum M_old(t, A(k·f_p)). After traversing every frame t of M_old(t, f), the system has generated the reconstructed magnitude spectrum M_new(t, f).
As a further improvement, in step 4), the spectrum-to-speech conversion module recovers a speech signal audible to the human ear from the reconstructed spectrum M_new(t, f) using the Griffin-Lim algorithm, which estimates the speech signal from M_new(t, f) over N iterations.

As a further improvement, the specific steps of step 4) of the present invention are:

e. The Griffin-Lim algorithm randomly generates a phase spectrum P_0, then converts the phase spectrum P_0 and the magnitude spectrum M_new(t, f) into a speech signal x_0 using the inverse short-time Fourier transform.

f. A short-time Fourier transform of the speech signal x_0 yields a phase spectrum P_1 and a magnitude spectrum. Since this magnitude spectrum differs somewhat from the reconstructed spectrum M_new(t, f), the algorithm keeps only the phase spectrum P_1 and passes P_1 to the next iteration.

g. The Griffin-Lim algorithm refines the phase spectrum P_i over N iterations until the generated magnitude spectrum is sufficiently similar to the reconstructed spectrum M_new(t, f); the algorithm has then generated the speech signal corresponding to the given reconstructed spectrum M_new(t, f).
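Steps e–g above amount to the standard Griffin-Lim iteration. The sketch below is a minimal illustration built on SciPy's STFT; the window length, iteration count, and function names are assumptions for the example, not the patent's exact configuration:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, fs, nperseg=256, n_iter=30, seed=0):
    """Recover a time-domain signal from a magnitude spectrogram.

    mag: |STFT| laid out as scipy's stft output (freq bins x frames).
    Starts from a random phase spectrum P0 and, on each iteration,
    resynthesizes a signal, re-analyzes it, and keeps only the phase.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random P0
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)  # inverse STFT
        _, _, spec = stft(x, fs=fs, nperseg=nperseg)       # re-analyze
        frames = min(spec.shape[1], mag.shape[1])          # guard frame count
        phase[:, :frames] = np.exp(1j * np.angle(spec[:, :frames]))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x

# Toy check: the magnitude spectrogram of a 100 Hz tone at fs = 1000 Hz.
fs = 1000
t = np.arange(fs) / fs
_, _, Z = stft(np.sin(2 * np.pi * 100 * t), fs=fs, nperseg=256)
y = griffin_lim(np.abs(Z), fs)
dom = np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)  # dominant frequency of y
```

After the iterations, the dominant frequency of the resynthesized signal matches the tone encoded in the magnitude spectrogram.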
The beneficial effects of the present invention are as follows:
The present invention builds a training-free device for recovering the information hidden in sensor data. It can recover the speech information in sensor data that is extremely narrowband and severely aliased, and it avoids the poor transferability of learning-based models. Data collected by a phone's built-in sensors has different characteristics in different scenarios. Specifically, the loudspeaker speech captured by the built-in sensors may come from different users and carry different textual content; the data may be collected in different environments, whose noise and layout affect the quality of the collected data; and sensors built into phones of different brands and models produce data with different characteristics. All three situations prevent the data collected by built-in sensors from having a unified signature, which makes it difficult for data-driven machine learning and deep learning methods to train a robust model that recovers the speech hidden in the sensor data. Starting instead from the intrinsic features of sensor data and the characteristics of speech signals, the present invention constructs a voice information recovery system directly, without model training on any dataset; it adapts to changes in users, environments, and devices, and effectively recovers hidden speech signals from a phone's built-in sensors.
In the spectrum reconstruction module, the present invention expands the bandwidth of the raw sensor data's magnitude spectrum with an alias-corrected super-resolution algorithm. The bandwidth of the raw sensor data spectrum is SR/2 (half the sensor sampling rate). The invention not only reconstructs the missing speech harmonics in the high band (SR/2 to 1000 Hz), but also removes the aliasing present in the low band (0 Hz to SR/2), making the reconstructed spectrum more similar to the original speech spectrum and thereby improving the quality of the recovered speech. The invention achieves a comparatively low LSD (log-spectral distance) and recovers the speech information in low-resolution sensor data more effectively than traditional speech super-resolution methods. In addition, the invention runs effectively on different devices and removes the interference of user interaction on the sensor measurements, achieving effective voice information recovery.
Because the undersampled sensor data is severely aliased, the sensor data contains not only normal harmonics but also abnormal aliased harmonics (high-frequency speech harmonics folded down to low-frequency abnormal harmonics), which makes it difficult to find the correct speech fundamental. Therefore, in the fundamental frequency estimation module, the present invention estimates the fundamental with the aliasing-aware estimation algorithm: H(f) measures the likelihood that f is the fundamental at time t, and the maximizer of H(f) determines the fundamental, where the first term of H(f) accumulates the energy of the normal harmonics and the second term accumulates the energy of the abnormal harmonics. Considering both the normal speech harmonics and the abnormal aliased harmonics improves the accuracy of the fundamental frequency estimate, and this accurate estimate effectively improves the quality of the recovered speech.
Brief Description of the Drawings
Fig. 1 is a system block diagram of the present invention;
Fig. 2 compares the accelerometer's response to the speech signal leaked by the loudspeaker in different scenarios;
Fig. 3 shows the magnitude spectrograms of the magnetometer signal, the real speech signal, and the reconstructed speech signal;
Fig. 4 compares the overall performance of the present invention;
Fig. 5 compares the performance of the present invention across different users;
Fig. 6 compares the performance of the present invention across different textual content;
Fig. 7 compares the performance of the present invention on different devices;
Fig. 8 compares the performance of the present invention in different environments;
Fig. 9 compares the performance of the present invention under different interaction modes.
Detailed Description
The present invention discloses a general voice information recovery device and method based on undersampled built-in sensor data. Fig. 1 is the system block diagram, comprising four parts: the signal preprocessing module, the fundamental frequency estimation module, the spectrum reconstruction module, and the spectrum-to-speech conversion module, connected in sequence.
In signal preprocessing, the system first collects data from the z-axis of the phone's accelerometer, the y-axis of its gyroscope, and the z-axis of its magnetometer, then feeds the collected sensor data into a high-pass filter to remove meaningless low-frequency noise. In fundamental frequency estimation, an aliasing-aware algorithm estimates the fundamental, simultaneously considering the normal harmonics of the speech signal and the abnormal harmonics produced by undersampling. Based on the estimated fundamental, spectrum reconstruction both rebuilds the harmonics of the high band and corrects the abnormal harmonics of the low band, finally restoring the low-resolution spectrum to a high-resolution one. In spectrum-to-speech conversion, the system uses the Griffin-Lim algorithm to convert the recovered high-resolution spectrum into a speech signal audible to the human ear.
The complete signal preprocessing procedure is as follows. While the phone's loudspeaker plays a speech signal, the system collects data from the z-axis of the accelerometer, the y-axis of the gyroscope, and the z-axis of the magnetometer. Because the user's behavior while using the phone may interfere with the sensor measurements, the system feeds the collected sensor signals into a high-pass filter with a cutoff frequency of 80 Hz, removing meaningless low-frequency noise while retaining the signal leaked during speech playback. Fig. 2 compares the accelerometer's response to the leaked speech signal in different scenarios, showing the readings of the built-in accelerometer while the loudspeaker plays a single tone. Figs. 2(a) and 2(b) show the accelerometer readings with the phone on a desk and held in the hand, respectively; when hand-held, the motion of the user's hand severely interferes with the acceleration readings and masks the signal leaked during playback. Fig. 2(c) shows the accelerometer signal after the high-pass filter: the signal leaked by the loudspeaker has been successfully separated.
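The preprocessing step above can be sketched as follows. This is a minimal illustration: the text specifies only the 80 Hz cutoff, so the Butterworth family, filter order, and zero-phase filtering are assumed design choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def highpass(sensor_signal, fs, cutoff=80.0, order=4):
    """Remove low-frequency motion noise from a sensor trace.

    cutoff=80.0 Hz matches the cutoff described in the text; the
    Butterworth design and order=4 are assumptions for this sketch.
    """
    b, a = butter(order, cutoff / (fs / 2.0), btype="highpass")
    return filtfilt(b, a, sensor_signal)  # zero-phase filtering

# Example: a 10 Hz "hand motion" component plus a 150 Hz leaked tone,
# sampled at an assumed 500 Hz sensor rate.
fs = 500
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 150 * t)
y = highpass(x, fs)
spec = np.abs(np.fft.rfft(y)) / (len(y) / 2)  # per-bin amplitude, 1 Hz bins
# The 150 Hz tone survives while the 10 Hz motion component is suppressed.
```

The same filter is applied to each sensor axis before the spectra are computed.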
Next, the fundamental frequency estimation module estimates the fundamental frequency of the speech signal hidden in the sensor data. Using the short-time Fourier transform, the filtered sensor time-domain signal is converted into a magnitude spectrum M(t, f). Since the sampling rate of the phone's built-in sensors is far below the actual frequency of the speech signal, the signal actually captured by the sensors is aliased by undersampling. The frequency relationship before and after undersampling can be described by formula (1): A(f) = |f − SR · round(f / SR)|.
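The folding relationship A(f) can be written as a one-line helper; a minimal sketch (the function name is illustrative):

```python
def aliased_frequency(f, sr):
    """Frequency observed after undersampling a tone of frequency f at rate sr.

    A component at f Hz folds to |f - sr*round(f/sr)| Hz, which always
    lies in the observable band [0, sr/2].
    """
    return abs(f - sr * round(f / sr))

# With a 500 Hz sensor sampling rate, a 600 Hz speech harmonic is
# observed at 100 Hz and a 720 Hz harmonic at 220 Hz.
```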
Specifically, a signal whose original frequency is f becomes a signal of frequency A(f) after undersampling at sampling rate SR. Because of this frequency change, the magnitude spectrum M(t, f) contains not only the normal harmonics of the speech signal but also abnormal harmonics generated by undersampling. The system therefore uses aliasing-aware harmonic summation to measure the likelihood H(f) that frequency f is the fundamental, which can be expressed as H(f) = Σ_{k=1..n} M(t, k·f) + Σ_{k=n+1..m} M(t, A(k·f)),
where M(t, f) is the magnitude spectrum of the sensor data, t is the frame index, f is the frequency, and k is the harmonic order. n is the order of the highest harmonic whose frequency lies below SR/2, and m is the order of the highest harmonic whose frequency lies below 1250 Hz. The first term of H(f) covers the normal harmonics in the spectrum; the second covers the abnormal harmonics whose frequencies were changed nonlinearly by undersampling. Considering both parts together improves the accuracy of fundamental-frequency estimation. Since a larger H(f) means f is more likely to be the fundamental at time t, the fundamental frequency f_p of each frame of the spectrum can be expressed as

f_p = argmax_{85 Hz ≤ f ≤ 255 Hz} H(f),

where 85 Hz to 255 Hz covers the fundamental-frequency range of most adult speech.
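A minimal single-frame sketch of the aliasing-based harmonic summation described above; the 1 Hz candidate grid, the synthetic harmonic series, and the function names are illustrative assumptions:

```python
import numpy as np

def alias(f, sr):
    return abs(f - sr * round(f / sr))

def estimate_f0(mag, freqs, sr, fmax=1250.0):
    """Score each candidate f0 in [85, 255] Hz by summing the magnitude
    at its normal harmonics (below sr/2) and at the aliased positions of
    its higher harmonics (up to fmax); return the best-scoring f0."""
    def bin_of(f):
        return int(np.argmin(np.abs(freqs - f)))
    best_f, best_h = 85.0, -1.0
    for f0 in np.arange(85.0, 256.0, 1.0):
        h, k = 0.0, 1
        while k * f0 <= fmax:
            fk = k * f0
            # normal harmonics appear directly, the rest are folded
            h += mag[bin_of(fk if fk < sr / 2 else alias(fk, sr))]
            k += 1
        if h > best_h:
            best_f, best_h = f0, h
    return best_f

# one frame of an undersampled harmonic series with f0 = 130 Hz
sr = 500
t = np.arange(0, 2.0, 1 / sr)
x = sum((1 / k) * np.sin(2 * np.pi * k * 130 * t) for k in range(1, 10))
mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / sr)
f0_hat = estimate_f0(mag, freqs, sr)
```

Summing over both the direct and the folded harmonic positions is what lets the score peak at the true fundamental even though most harmonics lie above the sensor's Nyquist frequency.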
Building on the fundamental-frequency estimate, the system then reconstructs the spectrum of the sensor signal. Speech super-resolution techniques exploit the relationship between the fundamental and its harmonics to widen a narrowband signal. However, because of the aliasing caused by undersampling, such techniques cannot be applied directly to recover audible speech from low-resolution sensor signals. The system therefore uses an aliasing-corrected super-resolution algorithm capable of extending the bandwidth of the sensor magnitude spectrum. Denote the reconstructed magnitude spectrum by M_new(t, f) (initialized as a zero matrix) and the original magnitude spectrum by M_old(t, f). The algorithm comprises harmonic reconstruction in the high band and aliasing removal in the low band. It first traverses each frame of the original spectrum M_old(t, f); in each pass it estimates the fundamental frequency f_p with the estimation module and then, from the integer-multiple relationship between the fundamental and its harmonics, obtains the frequency k·f_p of each harmonic.
Next, the algorithm reconstructs the spectrum. For the band from 0 Hz to SR/2 Hz, where SR is the sensor sampling rate, the speech harmonics are those with k·f_p below SR/2. In this band the algorithm keeps only the bins where the speech harmonics lie, i.e. the spectrum at M_old(t, k·f_p): the system assigns M_old(t, k·f_p) directly to M_new(t, k·f_p), preserving the normal low-band speech harmonics and removing the low-band aliasing. For the band from SR/2 Hz to F_end Hz (F_end being the highest frequency of the reconstructed spectrum), the speech harmonics are those with k·f_p at or above SR/2; the device estimates their positions and energies from the frequency-change relationship under undersampling. Specifically, a normal speech harmonic originally at k·f_p Hz has been shifted to A(k·f_p) Hz by formula (1) due to undersampling, so the system fills the unknown normal-harmonic spectrum M_new(t, k·f_p) with the known aliased-harmonic spectrum M_old(t, A(k·f_p)). After traversing every frame t of M_old(t, f), the system has generated the reconstructed magnitude spectrum M_new(t, f). In the present invention, if the bandwidth of the sensor data reaches 250 Hz, the bandwidth F_end of the reconstructed spectrum can reach 1000 Hz.
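The per-frame reconstruction described above can be sketched as follows; the 1 Hz bin grid, the toy magnitudes, and the name `reconstruct_frame` are illustrative assumptions:

```python
import numpy as np

def alias(f, sr):
    return abs(f - sr * round(f / sr))

def reconstruct_frame(m_old, freqs_old, freqs_new, f0, sr, f_end=1000.0):
    """One frame of the aliasing-corrected super-resolution step:
    below sr/2, keep only the bins at the normal harmonics k*f0;
    above sr/2, unfold each harmonic from its aliased position."""
    def bin_of(freqs, f):
        return int(np.argmin(np.abs(freqs - f)))
    m_new = np.zeros(len(freqs_new))  # starts as a zero spectrum
    k = 1
    while k * f0 <= f_end:
        fk = k * f0
        src = fk if fk < sr / 2 else alias(fk, sr)  # where fk landed
        m_new[bin_of(freqs_new, fk)] = m_old[bin_of(freqs_old, src)]
        k += 1
    return m_new

sr, f0 = 500, 130.0
freqs_old = np.arange(0.0, sr / 2 + 1)   # sensor band, 1 Hz bins
freqs_new = np.arange(0.0, 1001.0)       # reconstructed band up to F_end
m_old = np.zeros(len(freqs_old))
m_old[130], m_old[240], m_old[110] = 1.0, 0.5, 0.3  # leaked harmonics
m_old[200] = 0.8                                    # unrelated aliasing
m_new = reconstruct_frame(m_old, freqs_old, freqs_new, f0, sr)
```

The fundamental stays in place, the second and third harmonics are moved from their aliased positions (240 Hz, 110 Hz) up to 260 Hz and 390 Hz, and energy at non-harmonic bins is zeroed, matching the low-band cleanup plus high-band rebuild described in the text.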
Based on the reconstructed spectrum M_new(t, f), the system uses the Griffin-Lim algorithm to recover an audible speech signal from it. The algorithm estimates the speech signal from the reconstructed spectrum M_new(t, f) through repeated iterations. It first generates a random phase spectrum P_0, then applies an inverse short-time Fourier transform to the phase spectrum P_0 and the magnitude spectrum M_new(t, f) to obtain a speech signal x_0. It then applies a short-time Fourier transform to x_0, yielding a phase spectrum P_1 and a magnitude spectrum M_1. Because M_1 differs somewhat from the reconstructed spectrum M_new(t, f), the algorithm keeps only the phase spectrum P_1 and feeds P_1 into the next iteration. Proceeding in this way, the Griffin-Lim algorithm keeps correcting the phase spectrum P_i until the generated magnitude spectrum M_i is sufficiently close to the reconstructed spectrum M_new(t, f). After a certain number of iterations, the Griffin-Lim algorithm has generated the speech signal corresponding to the given reconstructed spectrum M_new(t, f).
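A compact sketch of the Griffin-Lim iteration described here, built on SciPy's STFT pair; the frame length, iteration count, and test tone are assumptions, not values from the patent:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, fs, nperseg=256, n_iter=60, seed=0):
    """Recover a time-domain signal whose STFT magnitude matches `mag`:
    start from a random phase, then alternate ISTFT/STFT, keeping only
    the phase of each round-trip while resetting the magnitude to `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x

# target: magnitude spectrum of a 440 Hz tone
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
_, _, Z = stft(np.sin(2 * np.pi * 440 * t), fs=fs, nperseg=256)
target = np.abs(Z)
y = griffin_lim(target, fs)
_, _, Zy = stft(y, fs=fs, nperseg=256)
err = np.linalg.norm(np.abs(Zy) - target) / np.linalg.norm(target)
```

Each round trip keeps the magnitude pinned to the target and lets only the phase evolve, so the magnitude of the generated signal's STFT drifts toward M_new(t, f) over the iterations.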
The invention constructs a training-free device for recovering information hidden in sensor data. It not only recovers the speech information in sensor data that is extremely narrowband and severely aliased, but also avoids the poor transferability of learning-based models. The invention adapts to changes in users, environments, and devices, and effectively recovers hidden speech signals from a phone's built-in sensors.
To verify the effectiveness of the invention, it was deployed on three phones (Huawei P40, Xiaomi 10, OPPO Find X2). While each phone's loudspeaker played speech, measurement data were collected directly from the three phones' motion sensors (accelerometer and gyroscope); the motion-sensor sampling rates of the Huawei P40, Xiaomi 10, and OPPO Find X2 are 500 Hz, 397 Hz, and 418 Hz, respectively. Because the sampling rate of the phones' built-in magnetometers is too low (100 Hz), an external magnetometer (model MMC3416xPJ, sampled at 500 Hz) was also attached to the phone's surface to capture the magnetic field leaked during speech playback. The collected sensor data are fed into the back-end hidden-information recovery device to recover the speech signal. The log-spectral distance (LSD) was chosen to measure the difference between the speech spectrum recovered by the system and the corresponding real speech spectrum; LSD can be expressed as

LSD = (1/T) · sum_{t=1..T} sqrt( (1/F) · sum_{f=1..F} ( X(t, f) − X^(t, f) )^2 )
where x^ and x are the reconstructed speech signal and the original speech signal, and X^ and X are their log-energy spectra, respectively. A smaller LSD indicates better speech-recovery quality. Figure 3 shows the magnitude spectrograms of the magnetometer signal, the real speech signal, and the reconstructed speech signal; Figures 3(a) and 3(b) show the original spectrum of the magnetometer data and the corresponding real speech spectrum (below 1000 Hz), respectively. The bandwidth of the original magnetometer spectrum is only a quarter of that of the real speech spectrum, and it suffers from severe aliasing. Figure 3(c) shows the speech spectrum the invention reconstructs from the magnetometer data: the reconstructed spectrum not only rebuilds the harmonic structure in the high band but also eliminates the aliasing present in the low band.
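The LSD metric can be sketched as follows; the STFT parameters and the floor `eps` (which keeps the logarithm finite) are illustrative choices:

```python
import numpy as np
from scipy.signal import stft

def lsd(x_hat, x, fs, nperseg=256, eps=1e-10):
    """Log-spectral distance: RMS over frequency of the difference
    between log power spectra, averaged over STFT frames."""
    _, _, S_hat = stft(x_hat, fs=fs, nperseg=nperseg)
    _, _, S = stft(x, fs=fs, nperseg=nperseg)
    lp_hat = np.log10(np.abs(S_hat) ** 2 + eps)
    lp = np.log10(np.abs(S) ** 2 + eps)
    return float(np.mean(np.sqrt(np.mean((lp - lp_hat) ** 2, axis=0))))

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
a = np.sin(2 * np.pi * 440 * t)   # reference signal
b = np.sin(2 * np.pi * 880 * t)   # a spectrally different signal
```

Identical signals give an LSD of zero, and the distance grows as the spectra diverge, consistent with "smaller LSD means better recovery".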
The performance of the invention was first compared with two existing speech super-resolution techniques (spectral folding and spectral shifting). Figure 4 shows the overall performance comparison between the invention and these two techniques. The invention achieves a comparatively low LSD: for the magnetometer, accelerometer, and gyroscope, it outperforms the spectral-folding and spectral-shifting methods by 29.5%, 32.1%, and 13.5%, respectively. This shows that, compared with conventional speech super-resolution methods, the invention recovers the speech information in low-resolution sensor data more effectively.
To verify the transferability of the invention, it was deployed in a variety of scenarios.
Figure 5 compares the invention's performance when the speech played by the loudspeaker comes from different speakers. Across speech from 20 different speakers, the LSD shows no obvious change: the standard deviations of the LSD for magnetometer, accelerometer, and gyroscope data are 0.09, 0.11, and 0.21, respectively. This reflects the invention's robustness to speech from different speakers. Figure 6 compares its performance when the played speech contains different texts. For speech covering 10 different texts, the LSD again shows no obvious change: the standard deviations for magnetometer, accelerometer, and gyroscope data are 0.16, 0.12, and 0.01, respectively. This reflects the invention's robustness to different speech content.
Figure 7 compares the invention's performance when it is deployed on different phone models. For data collected from the accelerometer and the magnetometer, the LSD stays below 1.5, indicating that the invention runs effectively on different devices. Because a gyroscope is less sensitive than an accelerometer or magnetometer, the LSD of speech spectra recovered from gyroscope data is comparatively high.
Figure 8 compares the invention's performance when it is deployed in different environments. Across a laboratory (noise level 45.3 dB SPL), a dormitory (48.9 dB SPL), and a canteen (74.6 dB SPL), the LSD remains close and relatively low: the LSD ranges for magnetometer, accelerometer, and gyroscope data are 0.101, 0.051, and 0.311, respectively. This shows that the invention operates effectively in different environments. Figure 9 compares its performance under different user-interaction scenarios. In the desk scenario (the user puts the phone on a desk), the handheld scenario (the user holds the phone), and the tapping scenario (the user continuously taps the screen), the invention still achieves similar LSDs; for magnetometer and accelerometer data, the LSD range is below 0.1. This shows that the invention can remove the interference of user interaction with sensor measurements and achieve effective speech-information recovery.
It will be apparent to those skilled in the art that the invention may be modified in various ways; such changes are not considered to depart from the scope of the invention, and all such modifications obvious to those skilled in the art are intended to fall within the scope of the claims.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110615983.XA CN113362840B (en) | 2021-06-02 | 2021-06-02 | General voice information recovery device and method based on undersampled data of built-in sensor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113362840A CN113362840A (en) | 2021-09-07 |
| CN113362840B true CN113362840B (en) | 2022-03-29 |
Family
ID=77531451
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110615983.XA Active CN113362840B (en) | 2021-06-02 | 2021-06-02 | General voice information recovery device and method based on undersampled data of built-in sensor |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118969004B (en) * | 2024-09-30 | 2025-03-14 | 浙江芯劢微电子股份有限公司 | Voice noise reduction method and device and electronic equipment |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2090948A1 (en) * | 1992-03-09 | 1993-09-10 | Brian C. Gibson | Musical entertainment system |
| EP1130577A2 (en) * | 2000-03-02 | 2001-09-05 | Volkswagen Aktiengesellschaft | Method for the reconstruction of low speech frequencies from mid-range frequencies |
| US7474712B1 (en) * | 2002-12-31 | 2009-01-06 | Radioframe Networks, Inc. | Digital undersampling |
| CN102054480A (en) * | 2009-10-29 | 2011-05-11 | 北京理工大学 | Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8311840B2 (en) * | 2005-06-28 | 2012-11-13 | Qnx Software Systems Limited | Frequency extension of harmonic signals |
Non-Patent Citations (1)
| Title |
|---|
| Implementation of a formant vocoder based on the speech spectrum; Wang Kunchi et al.; Modern Electronics Technique; 2007-11-01 (No. 21); full text * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||