
CN111078185A - Method and equipment for recording sound - Google Patents

Publication number
CN111078185A
Authority
CN
China
Prior art keywords
sound
target object
recording
sound signal
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911366657.9A
Other languages
Chinese (zh)
Inventor
李鼎逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201911366657.9A
Publication of CN111078185A
Legal status: Pending

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a method and a device for recording sound. The method includes: determining the direction in which a target object is located; adjusting the recording direction to the determined direction; and recording sound signals from the determined direction in real time and denoising the recorded sound signals. The invention allows only sound from the direction of the target object to be captured during recording, which effectively excludes most extraneous noise, while the denoising step filters out noise that is still recorded, such as echoes, thereby ensuring the quality of the recording.

Description

Method and equipment for recording sound
Technical Field
The invention relates to the technical field of electroacoustics, and in particular to a method and equipment for recording sound.
Background
With the rapid development of networks, live video streaming, video calls and similar applications have become a normal part of daily life. When people use instant video communication software, or hold a meeting, they often want to capture or play back only the sound of a target object and not the noise of other people or objects. However, in noisy environments, for example where multiple sound sources are active at once, the prior art cannot guarantee that only the sound of the target object (a single sound source) is recorded when recording video or audio; sound emitted by other sources is inevitably recorded as well.
Therefore, how to reduce or eliminate the capture of noise other than the sound of the target object in a noisy recording environment is a problem to be solved in the field of electroacoustics.
Disclosure of Invention
In order to solve the technical problem that in the prior art, in the environment of multi-sound-source sounding, the input of noise except for the sound of a target object cannot be reduced or eliminated, the invention provides a method and equipment for recording sound.
According to a first aspect of the present invention, there is provided a method of recording sound, the method comprising:
determining the direction in which the target object is located;
adjusting the recording direction to the determined direction;
and recording the sound signal from the determined direction in real time, and denoising the recorded sound signal.
Preferably, the denoising process includes at least one of: echo cancellation processing, beamforming processing, noise suppression processing, and dereverberation processing.
As an embodiment, determining the direction of the target object comprises the following steps:
S1: acquiring an image of one area of the current recording environment, wherein the current recording environment comprises a plurality of areas;
S2: comparing the image of the target object with the acquired image of the area, and outputting a comparison result, wherein the image of the target object comprises a face image of the target object;
S3: judging, according to the comparison result, whether the image of the area contains the image of the target object,
and, when the comparison result indicates that the image of the area contains the image of the target object, executing step S4; otherwise, updating the area to be acquired and returning to step S1;
S4: determining the direction of the target object in the world coordinate system according to the image of the area and a camera calibration algorithm.
As another embodiment, determining the direction of the target object includes:
when the target object emits sound, determining the position of the target object in a world coordinate system based on the time difference and the intensity difference of the sound emitted by the target object reaching two sound recording devices;
and determining the direction of the target object in the world coordinate system based on the position of the target object in the world coordinate system.
Preferably, adjusting the direction of recording the sound to the determined direction includes:
adjusting the recording direction of the recording equipment according to the direction of the target object in the world coordinate system so that the adjusted recording direction of the recording equipment is consistent with the direction of the target object in the world coordinate system,
wherein the recording device includes a directional microphone.
Preferably, the echo cancellation process includes:
converting the recorded sound signal to the frequency domain by a conventional fast Fourier transform algorithm or a modulated complex lapped transform algorithm;
filtering out echo portions of the sound signal in the frequency domain by a plurality of adaptive acoustic echo cancellation filters,
and the beamforming process comprises:
recording the sound signals through a recording device array consisting of a plurality of recording devices;
performing phase delay compensation on the sound signals recorded by each recording device in the recording device array;
and superposing (summing) the phase-compensated sound signals recorded by the plurality of recording devices, so that the amplitude of the sound signal from the direction of the target object is increased after the superposition.
Preferably, the noise suppression processing includes:
determining a frequency band in which a noise signal in the sound signal is located by using spectral subtraction;
filtering the noise signal of the frequency band in which the noise signal is located,
the dereverberation process includes:
estimating a frequency spectrum of the reverberant part of the sound signal using a delayed frequency spectrum of the sound signal and a parameter indicating the decay of the reverberant part over time,
and filtering out the frequency spectrum of the reverberant part with a filter.
Preferably, the method further comprises:
inputting or pre-storing a historical sound signal of a target object;
comparing the historical sound signal of the target object with the sound signal after denoising by utilizing voiceprint recognition to obtain the frequency band of the sound signal of the target object in the sound signal,
and filtering out the sound signals of the frequency bands except the frequency band of the sound of the target object in the sound signals subjected to the denoising processing by a filter.
Preferably, the method is applied to an apparatus for recording audio or video.
According to a second aspect of the present invention, there is provided an apparatus for recording sound, comprising:
a sound capture device;
a processor; and
a memory storing executable code which, when executed by the processor, determines the direction in which a target object is located, sends instructions to the sound capture device to control it to adjust the recording direction to the determined direction and to record the sound signal from the determined direction in real time, and denoises the recorded sound signal.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
the method and the device for recording the sound are applicable to a noisy recording environment with multiple sound sources, and the recording direction is adjusted, so that the recording device only collects or records the sound in the direction of the target object and does not record the sound in other directions except the direction of the target object, most of the noise is effectively eliminated during recording, and the recorded noise such as echo is also filtered through denoising processing, and the recording quality is ensured.
Further, in order to enable the recorded sound to be more accurate, the embodiment of the invention performs fine screening on the sound signal subjected to denoising processing through voiceprint recognition, and achieves the effect of completely filtering out other sound signals except the sound signal of the target object.
Further, the embodiment of the invention can also process recorded audio or video containing the sound signals of a plurality of sound sources (target objects) and other noise, to obtain the denoised sound signal of each individual sound source. In addition, the embodiment of the invention can label the sound signal of each individual sound source to identify the target object to which it belongs, which facilitates fast editing later.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flowchart of a method of recording sound according to an embodiment of the present invention.
Fig. 2 schematically shows an apparatus for recording sound according to an embodiment of the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
The invention provides a method and equipment for recording sound, aiming at solving the technical problem that in the prior art, in the environment of multi-sound-source sounding, the noise except the sound of a target object cannot be reduced or eliminated.
Fig. 1 is a flowchart of a method of recording sound according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S11: determining the direction in which the target object is located;
step S12: adjusting the recording direction to the determined direction;
step S13: recording the sound signal from the determined direction in real time, and denoising the recorded sound signal.
First, it should be noted that the direction in which the target object is located refers to a direction of the target object relative to a center point of the sound recording apparatus or the sound recording apparatuses, for example, the direction in which the target object is located is 45 degrees to the left relative to the center point of the sound recording apparatus or the sound recording apparatuses.
As an embodiment, in step S11, for example, a face image of the target object may be input or stored in advance, and the stored face image of the target object may be compared with an image of the recording environment acquired by the image acquisition device through image recognition, so as to find the target object.
Considering that the range of the current recording environment may be large in practice, and the image acquisition device may only acquire images of a part of the area of the current recording environment at a time, the embodiment of the present invention preferably determines the direction of the target object in the world coordinate system by using a loop iteration method.
Specifically, step S11 includes the steps of:
S1: acquiring an image of one area of the current recording environment, wherein the current recording environment comprises a plurality of areas;
S2: comparing the image of the target object with the acquired image of the area, and outputting a comparison result, wherein the image of the target object comprises a face image of the target object;
S3: judging, according to the comparison result, whether the image of the area contains the image of the target object,
and, when the comparison result indicates that the image of the area contains the image of the target object, executing step S4; otherwise, updating the area to be acquired and returning to step S1;
S4: determining the direction of the target object in the world coordinate system according to the image of the area and a camera calibration algorithm.
Taking a video camera equipped with a recording device as an example, in step S1 the camera captures an image of only one area of the current recording environment at a time. In step S2, the stored face image of the target object is compared with the image of that area acquired in step S1, for example by an image recognition technique running on the camera's processor, and the comparison result is output. In step S3, whether the image of the area contains the target object is judged according to the comparison result; when it does, step S4 is executed; otherwise the area to be acquired is updated and the process returns to step S1, until the target object is found. In step S4, based on the image of the area containing the target object, the embodiment of the present invention preferably determines the direction of the target object by using a camera calibration algorithm. Specifically, for example, the direction of the target object in the world coordinate system is determined by converting between the image coordinate system and the world coordinate system.
For example, suppose the current recording environment includes three areas: a first area, a second area, and a third area. Updating the area to be acquired in step S3 then means, for example, moving on to the second area or the third area when the image of the first area does not contain the target object.
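The S1 to S4 loop can be sketched roughly as follows. This is only an illustrative sketch: `capture_region` and `find_face` are hypothetical callables standing in for the image acquisition device and the face comparison, and the pinhole-camera conversion (focal length `fx` and principal point `cx` in pixels, plus a known pan angle per area) is one simple assumption about how a calibrated camera could yield a direction. The source does not specify the calibration method. `max_sweeps` is also an added safeguard so the loop cannot run forever.

```python
import math
from typing import Callable, Optional, Sequence


def pixel_to_azimuth_deg(u: float, cx: float, fx: float) -> float:
    """Horizontal angle of pixel column u relative to the optical axis (pinhole model)."""
    return math.degrees(math.atan2(u - cx, fx))


def locate_target(
    region_pan_deg: Sequence[float],
    capture_region: Callable[[float], object],
    find_face: Callable[[object], Optional[float]],
    cx: float,
    fx: float,
    max_sweeps: int = 3,
) -> Optional[float]:
    """Scan the areas one by one (S1-S3) and return the target direction in degrees (S4).

    region_pan_deg lists the camera pan angle used for each area (an assumption).
    find_face returns the pixel column of the matched face, or None if absent.
    """
    for _ in range(max_sweeps):
        for pan in region_pan_deg:
            image = capture_region(pan)   # S1: acquire an image of one area
            u = find_face(image)          # S2: compare with the stored face image
            if u is not None:             # S3: the area contains the target
                # S4: pan angle of the area plus the angle inside the image
                return pan + pixel_to_azimuth_deg(u, cx, fx)
    return None                           # target not found in any area
```

Adding the pan angle to the in-image angle assumes the camera rotates about a vertical axis through its optical center; a real device would need its own calibrated transform.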
As another implementation manner, in step S11, the embodiment of the present invention preferably determines the direction of the target object in the world coordinate system according to the time difference and the intensity difference between the sounds emitted by the target object and reaching the two sound recording devices.
Specifically, the recording apparatus may employ two or more microphones having a focusing function. When the target object makes a sound, the position of the target object (the specified sound source) is determined using the binaural effect. The binaural effect is a spatial localization principle. In humans, for example, the ears are placed symmetrically on either side of the head, and the auricles and the head partially mask incoming sound. Because the direct and reflected sound from a source reach the two ears with different timing and different frequency-dependent intensity, there is a clear time difference and intensity difference between the two ears for the same source, and this difference allows the position of the sound source to be judged clearly and accurately. This is the binaural effect.
The embodiment of the invention preferably determines the position of the target object under the world coordinate system through a binaural effect algorithm. After the position of the target object is determined, the direction of the target object relative to the central points of the two sound recording devices in the world coordinate system can be determined according to the positions of the central points of the two sound recording devices and the position of the target object.
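As a rough illustration of the time-difference part of this idea, the sketch below estimates the arrival-time difference between two microphones by cross-correlation and converts it to an angle. It is a simplification under stated assumptions: a far-field source, a known microphone spacing, and a nominal speed of sound of 343 m/s. The intensity difference mentioned in the text is not used here, and the function names are illustrative only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value


def estimate_delay(left, right, fs):
    """Delay of `right` relative to `left` in seconds (positive: right hears it later)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    corr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)  # lag in samples at the correlation peak
    return lag / fs


def direction_from_delay(delay_s, mic_spacing_m):
    """Angle in degrees from broadside, positive toward the left microphone (far field)."""
    s = np.clip(SPEED_OF_SOUND * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

The angle is measured from the perpendicular to the line joining the two microphones, which corresponds to a direction relative to their center point.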
Returning to fig. 1, in step S12, the recording apparatus is controlled to adjust the recording direction to coincide with the direction of the target object in the world coordinate system.
Specifically, the sound recording apparatus includes, for example, a directional microphone, such as a cardioid microphone or a hypercardioid microphone. In step S12, the recording angle or direction of the directional microphone is adjusted to be consistent with the direction of the target object in the world coordinate system. Taking a camera with a directional microphone as an example, the processor sends a rotation instruction to the rotation mechanism of the camera so that the mechanism rotates until the recording direction of the directional microphone on the camera body is consistent with the direction of the target object. During recording, the directional microphone then captures only sound from the direction of the target object and does not record sound from other directions, so that most noise is effectively excluded.
In step S13, the sound signal from the determined direction is recorded in real time, and the recorded sound signal is subjected to a denoising process.
In order to filter out noise in the recorded sound, the embodiment of the present invention preferably performs a denoising process on the recorded sound using one or more of an echo cancellation process, a beamforming process, a noise suppression process, and a dereverberation process in step S13.
The echo cancellation process, the beamforming process, the noise suppression process, and the dereverberation process employed in the embodiment of the present invention will be described one by one below.
1) Echo cancellation processing
When a microphone records sound, the captured sound includes sound emitted directly by the sound source (target object) as well as echoes of the sound emitted by the sound source and/or a loudspeaker after one or more reflections. Many high-noise environments, such as noisy conference rooms or lounges and hands-free telephones in automobiles, require effective echo cancellation.
In the embodiment of the present invention, the recorded sound signal is converted into the frequency domain by, for example, a conventional fast fourier transform FFT algorithm or a modulated complex lapped transform MCLT algorithm. The sound signal in the frequency domain is then processed by a plurality of adaptive acoustic echo cancellation filters to cancel the echo in the frequency domain.
Specifically, for the recorded sound signal, it is converted into a frequency domain sound signal by a conventional fast fourier transform algorithm or a modulated complex lapped transform algorithm. For each frequency in the frequency domain sound signal, a plurality of acoustic echo cancellation filters, e.g., N filters, are computed, each using different parameters of a different adaptation technique. For each frequency in the frequency domain sound signal, a linear combination of the outputs of the N filters is calculated. The linear combination of the N filter outputs for each frequency is then combined for all frequencies, converting it back to the time domain.
In an embodiment of the present invention, for the N acoustic echo cancellation filters, the momentum normalized least mean square MNLMS algorithm and the normalized least mean square NLMS algorithm are preferably used to provide the adaptation. The momentum normalization least mean square algorithm has a smooth characteristic and is suitable for a static environment, such as an environment in which nothing in a room moves excessively. The NLMS algorithm is applied in a dynamic environment, for example, in an environment where people often move.
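To make the NLMS adaptation concrete, here is a minimal single-filter sketch. It is deliberately simpler than the scheme described above: it runs in the time domain rather than per frequency bin, uses one filter instead of a linear combination of N filters, and assumes a loudspeaker reference signal `ref` is available. The parameter names and default values are illustrative assumptions.

```python
import numpy as np


def nlms_echo_cancel(mic, ref, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic`; returns the residual."""
    mic = np.asarray(mic, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(taps)    # adaptive estimate of the echo path
    buf = np.zeros(taps)  # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        e = mic[n] - w @ buf                   # residual after removing predicted echo
        w += mu * e * buf / (buf @ buf + eps)  # step normalized by reference energy
        out[n] = e
    return out
```

When the microphone also picks up the target's own voice, the returned residual is that voice plus whatever echo the filter has not yet learned to predict.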
2) Beamforming processing
Since the sound emitted from the sound source (target object) gradually weakens as distance increases, it may already be weak when it reaches the microphone, resulting in a less than ideal recording. In addition, even if the recording direction of the microphone is adjusted to the direction in which the target object is located, interference from noise in other directions or other areas cannot be completely eliminated.
Therefore, in order to compensate for the loss along the sound propagation path and reduce interference from noise in directions other than the intended direction, the present invention preferably employs beamforming when recording the sound.
Specifically, for example, a microphone array composed of a plurality of microphones is used, and the gain of the relative phase and amplitude of the sound signal received by each microphone is concentrated in one direction (i.e., the direction in which the target object is located) based on the principle of mutual interference of waves. For each microphone, a specific phase delay is added to compensate for the phase of the sound signal received by that microphone. After phase compensation, the effective signals in the sound signals received by each microphone can be aligned in phase, so that the effective signals received by different microphones become large in amplitude after being added. On the other hand, when the interference signals propagating in other directions reach the microphone array, the delay corresponding to each microphone does not coincide with the time difference of arrival of the signals at the microphones, and thus the amplitude does not become large after the summation. In this way, during the recording process, the microphone array can increase the strength of the effective signal in the direction of the target object by matching a plurality of microphones with a specific delay, and simultaneously weaken the interference signals in other directions, thereby effectively blocking the interference signals from other directions except the intended direction.
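The phase compensation and summation described above correspond to what is commonly called delay-and-sum beamforming. The sketch below is one possible form, assuming a linear array with known microphone positions, a far-field plane wave, and a nominal speed of sound of 343 m/s. It applies the compensation as a phase shift in the frequency domain, which treats each block as periodic. None of these specifics come from the source.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value


def delay_and_sum(channels, mic_x, angle_deg, fs):
    """Steer a linear array toward angle_deg (measured from broadside).

    channels: array of shape (n_mics, n_samples).
    mic_x: microphone positions in metres along the array axis.
    """
    channels = np.asarray(channels, dtype=float)
    n = channels.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Arrival delay of a plane wave from angle_deg at each microphone.
    tau = -np.asarray(mic_x, dtype=float) * np.sin(np.radians(angle_deg)) / SPEED_OF_SOUND
    spectra = np.fft.rfft(channels, axis=1)
    # Phase compensation: advance each channel by its own arrival delay.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    # Summation (here an average) reinforces the steered direction only.
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

Signals from the steered direction add in phase and keep their amplitude, while signals from other directions add with mismatched phases and are attenuated.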
3) Noise suppression processing
Noise suppression, as the name implies, is the removal of noise components from an audio signal. Specifically, the frequency band of the noise is determined, then the noise in the noise frequency band is filtered out, and the effective signal is reserved. In the embodiment of the present invention, it is preferable to determine the frequency band where the noise exists by using spectral subtraction and filter the noise in the noise frequency band.
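A basic magnitude spectral subtraction can be sketched as below. It assumes a separate noise-only recording is available for estimating the noise spectrum, and uses 50% overlapping Hann-windowed frames with overlap-add. The frame length and the spectral floor are illustrative choices, not values from the source.

```python
import numpy as np


def spectral_subtract(noisy, noise_only, frame=256, floor=0.02):
    """Reduce stationary noise by subtracting an average noise magnitude spectrum."""
    noisy = np.asarray(noisy, dtype=float)
    noise_only = np.asarray(noise_only, dtype=float)
    hop = frame // 2
    win = np.hanning(frame + 1)[:-1]  # periodic Hann: 50%-overlapped frames sum to 1

    def frames(x):
        count = 1 + (len(x) - frame) // hop
        return np.stack([x[i * hop:i * hop + frame] * win for i in range(count)])

    # Average noise magnitude per frequency bin, from the noise-only recording.
    noise_mag = np.abs(np.fft.rfft(frames(noise_only), axis=1)).mean(axis=0)
    spec = np.fft.rfft(frames(noisy), axis=1)
    # Subtract the noise magnitude, keeping a small floor to avoid negative values.
    mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
    clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame, axis=1)
    out = np.zeros(len(noisy))
    for i, f in enumerate(clean):  # overlap-add the processed frames
        out[i * hop:i * hop + frame] += f
    return out
```

Bins dominated by noise are strongly attenuated, while bins dominated by the wanted signal are left almost unchanged; the first and last half-frames are only partially reconstructed.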
4) Dereverberation processing
As described above, when a microphone records sound, the captured sound signal may include reverberation or echoes from different surfaces. For example, in a room, sound signals (such as speech or music) are reflected by the walls, ceiling and floor. Thus, the sound signal captured by the microphone is a combination of the desired signal (received directly from the sound source) and an interfering signal (reflected via the reflecting surfaces). This interfering signal is referred to as the reverberant part of the sound signal.
In the embodiment of the invention, dereverberation proceeds as follows: first, the frequency spectrum of the reverberant part of the received signal is estimated using a delayed frequency spectrum of the received signal and a parameter indicating the decay of the reverberant part over time; then the frequency spectrum of the reverberant part is filtered out with a filter, while the frequency spectrum of the effective part is estimated from the frequency spectrum of the reverberant part by spectral subtraction; finally, the effective signal is reconstructed from the estimated frequency spectrum of the effective part.
In an embodiment of the invention, the parameter indicative of the decay of the reverberation part over time is preferably:
[Two equation images defining the parameter a (Figure BDA0002338607450000081 and Figure BDA0002338607450000082) are not reproduced in this text.]
where a denotes the parameter indicating the decay of the reverberant part over time, e is the base of the natural logarithm (about 2.718), fs is the sampling frequency, ln is the natural logarithm (3 ln 10 is about 6.9), T60 is the reverberation time, i.e. the length of time after which the signal level has dropped by 60 dB relative to the original signal level, and k denotes the number of samples contained in each frame, e.g. splitting a recorded sound signal into n frames, each frame being divided into k samples at a preset frequency segment.
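The two equation images for the parameter a are missing from this text, so the exact formulas cannot be confirmed. The symbols listed (e, fs, 3 ln 10, T60, k) are consistent with a commonly used exponential-decay model in which a = exp(-2·Δ·k/fs) with Δ = 3·ln(10)/T60. The sketch below uses that form purely as an assumption, and `delay_frames` and `floor` are illustrative parameters. It then shows how a delayed power spectrum scaled by a could serve as the reverberant-part estimate for spectral subtraction.

```python
import math

import numpy as np


def decay_parameter(t60_s, frame_samples, fs):
    """Assumed form: power decay factor of the reverberant tail over one frame."""
    delta = 3.0 * math.log(10.0) / t60_s  # decay rate; 3 ln 10 is about 6.9
    return math.exp(-2.0 * delta * frame_samples / fs)


def suppress_late_reverb(power_frames, a, delay_frames=4, floor=0.05):
    """Return per-frame, per-bin gains that subtract the estimated reverberant power.

    power_frames: array of shape (n_frames, n_bins) of short-time power spectra.
    """
    power_frames = np.asarray(power_frames, dtype=float)
    reverb = np.zeros_like(power_frames)
    # Reverberant estimate: the spectrum delay_frames earlier, decayed by a per frame.
    reverb[delay_frames:] = (a ** delay_frames) * power_frames[:-delay_frames]
    gain = 1.0 - reverb / np.maximum(power_frames, 1e-12)
    return np.maximum(gain, floor)
```

Under this assumed form, the factor a applied over T60 seconds of frames gives a power drop of 10^-6, that is 60 dB, which matches the definition of T60 in the text.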
Through the steps S11 to S13, the embodiment of the invention can weaken and substantially filter other noises except for the sound of the target object, and only retain and amplify the sound of the target object in the noisy environment.
Further, in order to make the recorded sound more accurate, the embodiment of the present invention preferably performs fine screening on the sound signal subjected to the denoising processing by using voiceprint recognition, so as to completely filter out other sound signals except the sound signal of the target object.
For example, a historical sound signal of the target object is input or stored in advance. Voiceprint recognition is then used to compare the stored historical sound signal with the denoised sound signal, for example based on one or more of the frequency, loudness, pitch and timbre of the target object's voice, so as to determine the frequency band occupied by the target object's sound signal within the denoised signal. A filter then removes the sound signals of non-target objects in the frequency bands outside that band.
Preferably, a prototype spectral model of the target object is established in advance, the denoised sound signal is matched against the prototype spectral model, for example by spectral comparison and spectral analysis, and sound signals that do not fit the prototype spectral model are filtered out according to the result.
In this way, voiceprint recognition allows the sound signal of the target object to be identified more accurately and clearly, and noise other than the sound signal of the target object is removed.
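The band-selection step can be illustrated with a very crude stand-in for voiceprint recognition: take the frequency band that holds most of the energy of the stored historical signal, then zero everything outside that band in the denoised signal. Real voiceprint recognition is far more involved; the energy-quantile rule, the `keep` fraction and the function names below are assumptions made only for this sketch.

```python
import numpy as np


def dominant_band(reference, fs, keep=0.95):
    """Frequency band (lo, hi) in Hz holding the central `keep` share of the energy."""
    reference = np.asarray(reference, dtype=float)
    power = np.abs(np.fft.rfft(reference)) ** 2
    freqs = np.fft.rfftfreq(len(reference), d=1.0 / fs)
    cdf = np.cumsum(power) / power.sum()
    tail = (1.0 - keep) / 2.0
    lo = freqs[np.searchsorted(cdf, tail)]
    hi = freqs[np.searchsorted(cdf, 1.0 - tail)]
    return lo, hi


def keep_band(signal, fs, band):
    """Zero all frequency components of `signal` outside the given band."""
    signal = np.asarray(signal, dtype=float)
    lo, hi = band
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(signal))
```

A typical use would be `keep_band(denoised, fs, dominant_band(historical, fs))`, where `denoised` and `historical` are hypothetical arrays holding the denoised recording and the stored voice sample.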
As a more preferable implementation, the embodiment of the present invention may further process recorded audio or video that contains the sound signals of multiple sound sources (target objects) and other noise, and obtain the denoised sound signal of each individual sound source.
Specifically, for example, the frequency band of the sound signal of each individual sound source in the audio or video is determined by voiceprint recognition, and the sound signal in that band is extracted to obtain the sound signal of each individual sound source. Subsequently, the sound signal of each individual sound source is denoised using one or more of echo cancellation, beamforming, noise suppression, and dereverberation. Alternatively, the audio or video may be denoised first and the sound signal of each individual sound source extracted afterwards by voiceprint recognition, or denoising may be omitted; the choice is made flexibly according to the actual situation and is not limited by the present invention.
Preferably, the extracted sound signal of each individual sound source can be labeled to indicate which target object it belongs to, which facilitates fast editing later.
Fig. 2 schematically shows an apparatus for recording sound according to an embodiment of the present invention. As shown in fig. 2, the apparatus includes:
a sound capture device 201;
a processor 202; and
a memory 203 storing executable code which, when executed by the processor 202, determines the direction in which the target object is located, issues instructions to the sound capture device 201 that cause it to adjust its recording direction to the determined direction and to record the sound signal from that direction in real time, and denoises the recorded sound signal.
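The control flow carried out by the executable code can be sketched as follows. This is a hedged illustration rather than the apparatus's actual software: the class `SoundCaptureDevice`, its methods `steer` and `record_frame`, and the function `record_target` are hypothetical names, and re-determining the direction before every frame is an assumption made so that a moving target could be followed.

```python
from typing import Callable, List


class SoundCaptureDevice:
    """Hypothetical stand-in for sound capture device 201."""

    def __init__(self, read_frame: Callable[[float], List[float]]):
        # read_frame(direction_deg) returns one frame of samples from that direction.
        self._read_frame = read_frame
        self.direction_deg = 0.0

    def steer(self, direction_deg: float) -> None:
        """Adjust the recording direction to the given direction."""
        self.direction_deg = direction_deg

    def record_frame(self) -> List[float]:
        """Record one frame from the current recording direction."""
        return self._read_frame(self.direction_deg)


def record_target(device: SoundCaptureDevice,
                  locate_target: Callable[[], float],
                  denoise: Callable[[List[float]], List[float]],
                  num_frames: int) -> List[List[float]]:
    """Determine the direction, steer, record, and denoise, frame by frame."""
    recorded = []
    for _ in range(num_frames):
        direction = locate_target()      # e.g. from image comparison or sound localization
        device.steer(direction)          # instruction issued to the capture device
        frame = device.record_frame()    # real-time recording from the determined direction
        recorded.append(denoise(frame))  # denoising of the recorded signal
    return recorded
```

Passing `locate_target` and `denoise` in as callables keeps the sketch independent of which localization and denoising techniques are chosen.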
The sound capture device 201 includes a microphone, for example a directional microphone such as a cardioid microphone or a hypercardioid microphone.
The apparatus further includes a filter that receives and executes instructions from the processor 202 to filter out signals in a specified frequency band (e.g., a frequency band in which noise is present).
For details of the operation of the sound capture device 201, the processor 202 and the memory 203, refer to the above description of the method of the present invention given with reference to Fig. 1; they are not repeated here.
In summary, embodiments of the present invention provide a method and an apparatus for recording sound that are suitable for noisy recording environments with multiple sound sources. By adjusting the recording direction, the recording apparatus acquires or records only the sound from the direction in which the target object is located and does not record sound from other directions, so that most of the noise is removed at the time of recording. Noise that is nevertheless recorded, such as echo, is filtered out by the denoising processing, which ensures the quality of the recording.
Further, to make the recorded sound more accurate, the embodiment of the present invention uses voiceprint recognition to finely screen the denoised sound signal, so that sound signals other than the sound signal of the target object are completely filtered out.
Further, the embodiment of the present invention can also process recorded audio or video that contains the sound signals of multiple sound sources (target objects) together with other noise, to obtain the sound signal of each individual sound source with the noise removed. In addition, the embodiment of the present invention can label the sound signal of each individual sound source to identify the target object to which it belongs, which facilitates quick editing at a later stage.
Those skilled in the art will appreciate that the modules or steps of the invention described above can be implemented on general-purpose computing devices, either concentrated on a single computing device or distributed across a network of computing devices. Optionally, they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. Alternatively, they can be fabricated as separate integrated circuit modules, or fabricated together as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for recording sound, characterized by comprising:
determining the direction in which a target object is located;
adjusting the direction of sound recording to the determined direction;
recording the sound signal from the determined direction in real time, and denoising the recorded sound signal.

2. The method according to claim 1, wherein the denoising comprises at least one of: echo cancellation, beamforming, noise suppression, and dereverberation.

3. The method according to claim 1, wherein determining the direction in which the target object is located comprises the following steps:
S1: capturing an image of one region of the current recording environment, wherein the current recording environment includes multiple regions;
S2: comparing an image of the target object with the captured image of the region and outputting a comparison result, wherein the image of the target object includes a face image of the target object;
S3: determining from the comparison result whether the image of the region contains the image of the target object; when the comparison result indicates that the image of the region contains the image of the target object, performing step S4; otherwise updating the region to be captured and returning to step S1;
S4: determining the direction of the target object in the world coordinate system according to the image of the region and a camera calibration algorithm.

4. The method according to claim 1, wherein determining the direction in which the target object is located comprises:
when the target object emits a sound, determining the position of the target object in the world coordinate system based on the time difference and the intensity difference with which the sound emitted by the target object reaches two recording devices;
determining the direction of the target object in the world coordinate system based on the position of the target object in the world coordinate system.

5. The method according to claim 3 or 4, wherein adjusting the direction of sound recording to the determined direction comprises:
adjusting the recording direction of the recording device according to the direction of the target object in the world coordinate system, so that the adjusted recording direction of the recording device is consistent with the direction of the target object in the world coordinate system,
wherein the recording device includes a directional microphone.

6. The method according to claim 2, wherein
the echo cancellation comprises:
converting the recorded sound signal to the frequency domain by a conventional fast Fourier transform algorithm or a modulated complex lapped transform algorithm;
filtering out the echo portion of the frequency-domain sound signal by a plurality of adaptive acoustic echo cancellation filters;
and the beamforming comprises:
recording the sound signal with a recording device array composed of a plurality of recording devices;
performing phase delay compensation on the sound signal recorded by each recording device in the recording device array;
superimposing the phase-delay-compensated sound signals recorded by the plurality of recording devices, so that the amplitude of the sound signal from the direction in which the target object is located is increased after the superposition.

7. The method according to claim 2, wherein
the noise suppression comprises:
using spectral subtraction to determine the frequency band in which the noise signal in the sound signal is located;
filtering out the noise signal in the frequency band in which the noise signal is located;
and the dereverberation comprises:
estimating the spectrum of the reverberant portion of the sound signal using the delayed spectrum of the sound signal and a parameter indicating the decay of the reverberant portion over time;
filtering out the reverberant portion with a filter according to the estimated spectrum of the reverberant portion.

8. The method according to claim 1, further comprising:
inputting or pre-storing a historical sound signal of the target object;
comparing the historical sound signal of the target object with the denoised sound signal by voiceprint recognition, to obtain the frequency band in which the sound signal of the target object is located within the sound signal;
filtering out, with a filter, the sound signals in the denoised sound signal that lie in frequency bands other than the frequency band in which the sound of the target object is located.

9. The method according to claim 1, wherein the method is applied to a device that records audio or video.

10. An apparatus for recording sound, characterized by comprising:
a sound capture device;
a processor; and
a memory having executable code stored thereon, wherein the executable code, when executed by the processor, enables determining the direction in which a target object is located, issuing to the sound capture device instructions that control the sound capture device to adjust the direction of sound recording to the determined direction and to record the sound signal from the determined direction in real time, and denoising the recorded sound signal.
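
The time-difference part of the localization recited in claim 4 can be illustrated by the following minimal sketch. It is not the claimed method: it ignores the intensity difference, assumes a far-field source and two recording devices at a known spacing, and yields only an angle relative to the broadside of the pair rather than a position in the world coordinate system. The names `estimate_delay_samples` and `direction_from_delay` and the default speed of sound are assumptions introduced for illustration.

```python
import math


def estimate_delay_samples(reference, delayed, max_lag):
    """Lag (in samples) at which `delayed` best aligns with `reference`, by cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(reference)):
            u = t + lag
            if 0 <= u < len(delayed):
                score += reference[t] * delayed[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_delay(delay_samples, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field angle of arrival in degrees, measured from the broadside of the microphone pair."""
    tau = delay_samples / sample_rate
    # Clamp to [-1, 1] so that measurement noise cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical usage: the same pulse reaches the second recording device 3 samples later.
first_mic = [0.0] * 10 + [1.0, 2.0, 1.0] + [0.0] * 10
second_mic = [0.0] * 3 + first_mic[:-3]
lag = estimate_delay_samples(first_mic, second_mic, max_lag=5)
angle_deg = direction_from_delay(lag, sample_rate=16000, mic_spacing_m=0.2)
```

With only two recording devices, the time difference constrains the source to a cone around the axis of the pair, which is one reason the claim also draws on the intensity difference.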
CN201911366657.9A 2019-12-26 2019-12-26 Method and equipment for recording sound Pending CN111078185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911366657.9A CN111078185A (en) 2019-12-26 2019-12-26 Method and equipment for recording sound

Publications (1)

Publication Number Publication Date
CN111078185A true CN111078185A (en) 2020-04-28

Family

ID=70318140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911366657.9A Pending CN111078185A (en) 2019-12-26 2019-12-26 Method and equipment for recording sound

Country Status (1)

Country Link
CN (1) CN111078185A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107360387A (en) * 2017-07-13 2017-11-17 广东小天才科技有限公司 Video recording method and device and terminal equipment
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970625A (en) * 2020-08-28 2020-11-20 Oppo广东移动通信有限公司 Recording method and device, terminal and storage medium
CN111970625B (en) * 2020-08-28 2022-03-22 Oppo广东移动通信有限公司 Recording method and device, terminal and storage medium
CN112309449A (en) * 2020-10-26 2021-02-02 维沃移动通信(深圳)有限公司 Audio recording method and device
CN112309449B (en) * 2020-10-26 2024-10-29 维沃移动通信(深圳)有限公司 Audio recording method and device
CN112509597A (en) * 2020-11-19 2021-03-16 珠海格力电器股份有限公司 Recording data identification method and device and recording equipment
CN113422865A (en) * 2021-06-01 2021-09-21 维沃移动通信有限公司 Directional recording method and device
CN113676593A (en) * 2021-08-06 2021-11-19 Oppo广东移动通信有限公司 Video recording method, device, electronic device and storage medium
CN113676593B (en) * 2021-08-06 2022-12-06 Oppo广东移动通信有限公司 Video recording method, video recording device, electronic equipment and storage medium
CN113689873A (en) * 2021-09-07 2021-11-23 联想(北京)有限公司 Noise suppression method, device, electronic equipment and storage medium
CN114171030A (en) * 2021-10-20 2022-03-11 深圳市鸿合创新信息技术有限责任公司 Sound amplifying method, device, equipment and storage medium
CN115440254A (en) * 2022-08-29 2022-12-06 火星语盟(深圳)科技有限公司 Noise data identification method and equipment
CN115440254B (en) * 2022-08-29 2025-09-12 深圳火星语盟科技股份有限公司 A method for identifying noisy data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428