
WO2020110228A1 - Information processing device, program, and information processing method - Google Patents

Information processing device, program, and information processing method

Info

Publication number
WO2020110228A1
WO2020110228A1 (PCT/JP2018/043747)
Authority
WO
WIPO (PCT)
Prior art keywords
sound
observation
microphone
time
spectral component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2018/043747
Other languages
English (en)
Japanese (ja)
Inventor
訓 古田
松岡 文啓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp filed Critical Mitsubishi Electric Corp
Priority to PCT/JP2018/043747 priority Critical patent/WO2020110228A1/fr
Priority to JP2020557460A priority patent/JP6840302B2/ja
Publication of WO2020110228A1 publication Critical patent/WO2020110228A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K - SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 - Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18 - Methods or devices for transmitting, conducting or directing sound
    • G10K11/26 - Sound-focusing or directing, e.g. scanning
    • G10K11/34 - Sound-focusing or directing, e.g. scanning using electrical steering of transducer arrays, e.g. beam steering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 - Details of transducers, loudspeakers or microphones
    • H04R1/20 - Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 - Circuits for transducers, loudspeakers or microphones

Definitions

  • The present invention relates to an information processing device, a program, and an information processing method.
  • Hands-free voice operation systems, hands-free call systems, and abnormal sound monitoring systems are used under various noise environments, such as in a moving vehicle, a factory, an office, or a living room at home, and a microphone is installed in them to collect a target sound such as a voice or an abnormal sound.
  • However, such a microphone collects not only the target sound but also ambient noise and voices other than the target sound (hereinafter referred to as interfering sound).
  • Conventional methods for separating the target sound include beamforming, which uses signal processing to steer directivity toward the target sound or to point a blind spot at the interfering sound, and methods that estimate a mixing matrix by independent component analysis.
  • While beamforming is effective at suppressing noise, it is not very effective at separating voices, and independent component analysis has the problem that its performance deteriorates under the influence of reverberation or noise.
  • Moreover, the number of interfering noise sources is not limited to one, and there is the constraint that it is difficult to separate more sound sources than the number of microphones.
  • Binary masking is an effective method for suppressing directional interfering sound and is easy to implement.
  • Patent Document 1 discloses a method of increasing the accuracy of binary masking for mixed speech whose sparseness is not guaranteed, by intentionally creating an amplitude difference between the power spectra.
  • However, this conventional method has the problem that errors occur in the mask coefficients, because a power difference is intentionally created between the power spectra of the main-microphone input signal and the sub-microphone input signal.
  • One or more aspects of the present invention have been made to solve such a problem, and an object thereof is to make it possible to easily obtain a high-quality target signal.
  • An information processing apparatus according to one aspect of the present invention includes: an analog/digital conversion unit that receives a first observed analog signal, generated by a first microphone from an observed sound including a target sound arriving from a first direction, and a second observed analog signal, generated by a second microphone from the observed sound, and converts each of them into a digital signal, thereby generating a first observed digital signal and a second observed digital signal; a time/frequency conversion unit that converts each of the first observed digital signal and the second observed digital signal into a frequency-domain signal, thereby generating a first spectral component and a second spectral component; a mask generation unit that, using a cross-correlation function of the first spectral component and the second spectral component, calculates a filtering coefficient for masking spectral components of sound arriving from directions different from the first direction, based on the time difference between the time at which the observed sound arrives at the first microphone and the time at which it arrives at the second microphone; a masking filter unit that separates a spectral component by masking the first spectral component using the filtering coefficient; and a time/frequency inverse conversion unit that generates an output digital signal by converting the separated spectral component into a time-domain signal.
  • A program according to one aspect of the present invention causes a computer to function as the analog/digital conversion unit, the time/frequency conversion unit, the mask generation unit, the masking filter unit, and the time/frequency inverse conversion unit described above.
  • An information processing method according to one aspect of the present invention converts a first observed analog signal, generated by a first microphone from an observed sound including a target sound arriving from a first direction, and a second observed analog signal, generated by a second microphone from the observed sound, into a first observed digital signal and a second observed digital signal; converts each of these into a frequency-domain signal to generate a first spectral component and a second spectral component; calculates, using a cross-correlation function of the first spectral component and the second spectral component, a filtering coefficient for masking spectral components of sound arriving from directions different from the first direction, according to the time difference between the time at which the observed sound arrives at the first microphone and the time at which it arrives at the second microphone; separates a spectral component by masking the first spectral component using the filtering coefficient; and generates an output digital signal by converting the separated spectral component into a time-domain signal.
  • FIG. 1 is a block diagram schematically showing the configuration of a sound source separation device according to the first embodiment. FIG. 2 is a block diagram schematically showing the internal configuration of the mask generation unit in the first to third embodiments. FIG. 3 is a schematic diagram for explaining the arrangement of the first and second microphones.
  • FIGS. 4A to 4C are graphs for explaining the utterance amount ratio when the target speaker and the interfering speaker speak.
  • FIGS. 5A and 5B are graphs for explaining the effect of the first embodiment.
  • FIG. 6 is a block diagram showing a first hardware configuration example of the sound source separation device, and FIG. 7 is a block diagram showing a second hardware configuration example. FIG. 8 is a flowchart showing the operation of the sound source separation device.
  • FIG. 9 is a block diagram schematically showing the configuration of an information processing system including a sound source separation device according to the second embodiment.
  • FIG. 10 is a schematic diagram showing an example of a method of excluding the influence of noise other than the target sound and the interfering sound.
  • FIG. 1 is a block diagram schematically showing the configuration of a sound source separation device 100 as an information processing device according to the first embodiment.
  • The sound source separation device 100 includes an analog/digital conversion unit (hereinafter referred to as A/D conversion unit) 103, a time/frequency conversion unit (hereinafter referred to as T/F conversion unit) 104, a mask generation unit 105, a masking filter unit 110, a time/frequency inverse conversion unit (hereinafter referred to as T/F inverse conversion unit) 111, and a digital/analog conversion unit (hereinafter referred to as D/A conversion unit) 112.
  • the sound source separation device 100 is connected to a first microphone 101 and a second microphone 102.
  • FIG. 2 is a block diagram schematically showing the internal configuration of the mask generation unit 105.
  • the mask generation unit 105 includes a mask coefficient calculation unit 106, an utterance amount ratio calculation unit 107, a gain calculation unit 108, and a mask correction unit 109.
  • The sound source separation device 100 forms a masking filter from the frequency-domain signals generated from the time-domain signals acquired by the first microphone 101 and the second microphone 102, and multiplies the frequency-domain signal corresponding to the signal acquired by the first microphone 101 by this masking filter, thereby obtaining an output signal of the target sound from which the interfering sound has been removed.
  • The first observed analog signal acquired by the first microphone 101 is also referred to as the first channel Ch1, and the second observed analog signal acquired by the second microphone 102 is also referred to as the second channel Ch2.
  • It is assumed that the first microphone 101 and the second microphone 102 are located on the same horizontal plane, and that their positions are known and do not change over time. It is further assumed that the ranges of directions from which the target sound and the interfering sound can arrive do not change over time. The direction from which the target sound arrives is also called the first direction, and the direction from which the interfering sound arrives is also called the second direction. Here, it is assumed that the target sound and the interfering sound are voices of different single speakers.
  • the first microphone 101 generates a first observation analog signal by converting the observation sound into an electric signal.
  • the first observed analog signal is given to the A/D conversion unit 103.
  • the second microphone 102 generates a second observed analog signal by converting the observed sound into an electric signal.
  • the second observed analog signal is provided to the A/D conversion unit 103.
  • The A/D conversion unit 103 performs analog/digital conversion (hereinafter referred to as A/D conversion) on each of the first observed analog signal given from the first microphone 101 and the second observed analog signal given from the second microphone 102, converting each into a digital signal and thereby generating a first observed digital signal and a second observed digital signal.
  • Specifically, the A/D conversion unit 103 samples the first observed analog signal given from the first microphone 101 at a predetermined sampling frequency and converts it into a digital signal divided into frames, thereby generating the first observed digital signal. Likewise, the A/D conversion unit 103 samples the second observed analog signal given from the second microphone 102 at the predetermined sampling frequency and converts it into a digital signal divided into frames, thereby generating the second observed digital signal.
  • The sampling frequency is, for example, 16 kHz, and the frame length is, for example, 16 ms.
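As a rough illustration of this framing step, the following sketch splits a 16 kHz signal into non-overlapping 16 ms frames (256 samples each). The function and constant names are hypothetical, and the non-overlapping framing is an assumption for illustration; the description does not specify whether frames overlap.

```python
import numpy as np

FS = 16000                          # sampling frequency [Hz]
FRAME_MS = 16                       # frame length [ms]
FRAME_LEN = FS * FRAME_MS // 1000   # 16 ms at 16 kHz = 256 samples per frame

def split_into_frames(x: np.ndarray) -> np.ndarray:
    """Split a sampled signal into consecutive, non-overlapping frames."""
    n_frames = len(x) // FRAME_LEN
    return x[:n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
```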
  • The first observed digital signal generated from the first observed analog signal in the frame interval corresponding to sample number t is denoted by x_1(t), and the second observed digital signal generated from the second observed analog signal in the same frame interval is denoted by x_2(t).
  • The first observed digital signal x_1(t) and the second observed digital signal x_2(t) are given to the T/F conversion unit 104.
  • The T/F conversion unit 104 receives the first observed digital signal x_1(t) and the second observed digital signal x_2(t), and converts these time-domain signals into a first short-time spectral component X_1(ω, τ) and a second short-time spectral component X_2(ω, τ) in the frequency domain. Here, ω represents a spectrum number, which is a discrete frequency, and τ represents a frame number.
  • For example, the T/F conversion unit 104 performs a 512-point fast Fourier transform on the first observed digital signal x_1(t) to generate the first short-time spectral component X_1(ω, τ). Similarly, the T/F conversion unit 104 generates the second short-time spectral component X_2(ω, τ) from the second observed digital signal x_2(t).
  • Hereinafter, the short-time spectral component of the current frame is simply referred to as a spectral component, and the qualifier "short-time" is omitted.
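Continuing the sketch above, a minimal illustration of this time/frequency conversion: each 256-sample frame is zero-padded to a 512-point FFT, and only the non-negative frequency bins are kept, which is sufficient for real-valued signals. Whether the patent applies a window function is not stated, so none is used here.

```python
N_FFT = 512  # FFT length; 256-sample frames are zero-padded to 512 points

def to_spectral_components(frames: np.ndarray) -> np.ndarray:
    """512-point FFT per frame. Rows index the frame number tau,
    columns index the discrete frequency (spectrum number) omega."""
    return np.fft.rfft(frames, n=N_FFT, axis=1)

# X1 = to_spectral_components(split_into_frames(x1))  # first channel Ch1
# X2 = to_spectral_components(split_into_frames(x2))  # second channel Ch2
```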
  • The mask generation unit 105 receives the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), and calculates the time-frequency filter coefficient b_mod(ω, τ), which is the filtering coefficient for the masking that separates the target sound.
  • Specifically, using a cross-correlation function of the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), the mask generation unit 105 calculates, based on the time difference between the time at which the observed sound arrives at the first microphone 101 and the time at which it arrives at the second microphone 102, a filtering coefficient for masking spectral components of sound arriving from directions different from the first direction, from which the target sound arrives.
  • In obtaining the time-frequency filter coefficient b_mod(ω, τ), it is assumed, as shown in FIG. 3, that in the horizontal plane in which the first microphone 101 and the second microphone 102 are placed, the target sound arrives from a direction within a predetermined angle θ of the perpendicular direction V1 of the first microphone 101 and the perpendicular direction V2 of the second microphone 102, and that the interfering sound arrives from the side of V1 and V2 opposite to the target sound.
  • The perpendicular directions V1 and V2 are perpendicular to the straight line connecting the first microphone 101 and the second microphone 102; they are predetermined reference directions and need not be vertical.
  • The distance between the first microphone 101 and the second microphone 102 is denoted by d.
  • In order to determine whether the sound collected by the first microphone 101 and the second microphone 102 is the target sound or the interfering sound, it is necessary to estimate, using the signals from the first microphone 101 and the second microphone 102, whether the direction of arrival of the sound is within the desired range.
  • Since the time difference that occurs between the signals arriving at the first microphone 101 and the second microphone 102 is determined by the angle θ, the direction of arrival can be estimated using this time difference. This is described below with reference to FIGS. 2 and 3.
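The description only states that the time difference is determined by the angle θ; under the usual far-field model (an assumption here, not stated in the patent text), a source at angle θ from the perpendicular direction reaches the two microphones with a relative delay of about d·sin(θ)/c:

```python
SPEED_OF_SOUND = 343.0  # [m/s], approximate speed of sound in air

def arrival_time_difference(theta_rad: float, d: float) -> float:
    """Far-field inter-microphone delay for a source at angle theta_rad
    (radians) from the perpendicular of a microphone pair spaced d metres."""
    return d * np.sin(theta_rad) / SPEED_OF_SOUND
```

The sign of this delay indicates on which side of the perpendicular the source lies, which is what distinguishes the target direction from the interfering direction.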
  • The mask coefficient calculation unit 106 first calculates the cross spectrum D(ω, τ), which is a cross-correlation function of the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), as shown in equation (1).
  • The mask coefficient calculation unit 106 gives the calculated cross spectrum D(ω, τ) to the utterance amount ratio calculation unit 107.
  • Next, the mask coefficient calculation unit 106 obtains the phase θ_D(ω, τ) of the cross spectrum D(ω, τ) using equation (2). Here, Q(ω, τ) and K(ω, τ) represent the imaginary part and the real part of the cross spectrum D(ω, τ), respectively.
  • The phase θ_D(ω, τ) obtained by equation (2) is the phase angle between the spectral components of the first channel Ch1 and the second channel Ch2 at the discrete frequency ω, and dividing it by the frequency yields the time delay between the two signals. That is, the time difference δ(ω, τ) between the first channel Ch1 and the second channel Ch2 can be expressed by equation (3).
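The equation images of (1) to (3) are not reproduced in this text, so the following sketch uses the standard definitions, which are assumptions consistent with the description: the cross spectrum as X_1 times the complex conjugate of X_2, the phase via atan2 of its imaginary and real parts, and the delay as the phase divided by the angular frequency of each bin.

```python
def cross_spectrum(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """Assumed form of eq. (1): per-bin cross spectrum D(omega, tau)."""
    return X1 * np.conj(X2)

def time_difference(D: np.ndarray, fs: float = FS, n_fft: int = N_FFT) -> np.ndarray:
    """Assumed form of eqs. (2)-(3): the phase of D divided by the angular
    frequency of each bin gives the inter-channel delay delta(omega, tau)."""
    theta_D = np.arctan2(D.imag, D.real)      # eq. (2): Q is imag, K is real
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_fft, d=1.0 / fs)
    omega[0] = np.inf                         # avoid division by zero at DC
    return theta_D / omega                    # eq. (3): delta(omega, tau)
```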
  • Using the time difference δ(ω, τ), the mask coefficient b(ω, τ) for performing the masking that separates the target sound can be expressed by equation (5).
  • In other words, using a cross-correlation function of the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), the mask coefficient calculation unit 106 calculates, based on a first time difference between the time at which the target sound arrives at the first microphone 101 and the time at which it arrives at the second microphone 102 and on a second time difference between the time at which the interfering sound arrives at the first microphone 101 and the time at which it arrives at the second microphone 102, a mask coefficient for separating the spectral components of sound arriving from directions within the first range from the spectral components of sound arriving from directions within the second range.
  • The mask coefficient b(ω, τ) in equation (5) is 1 when the component is estimated to be the target sound and M when it is estimated to be the interfering sound.
  • When M = 0, the mask coefficient takes the binary values 1 and 0, so a filter having such mask coefficients is called a binary mask.
  • A fractional value other than 0 and 1 may also be used as the filter coefficient; a filter of this kind is called a soft mask.
  • The mask coefficient calculation unit 106 gives the mask coefficient b(ω, τ) to the mask correction unit 109.
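A sketch of a mask of the kind in equation (5). The exact decision rule of equation (5) is not reproduced in this text, so the rule below, which assigns 1 to bins whose delay δ(ω, τ) falls on the target side of the array (between 0 and a maximum delay delta_max implied by the angular range θ) and M elsewhere, is an assumption for illustration:

```python
def mask_coefficient(delta: np.ndarray, delta_max: float, M: float = 0.0) -> np.ndarray:
    """Assumed form of eq. (5): 1 where the inter-channel delay indicates
    the target side, M elsewhere. M = 0 yields a binary mask; 0 < M < 1
    yields a soft mask."""
    return np.where((delta >= 0.0) & (delta <= delta_max), 1.0, M)
```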
  • The utterance amount ratio calculation unit 107 calculates, from the first spectral component X_1(ω, τ) of the first channel Ch1, the second spectral component X_2(ω, τ) of the second channel Ch2, and the cross spectrum D(ω, τ), the utterance amount ratio, which is the ratio of the utterance amount of the target speaker to the utterance amount of the interfering speaker.
  • In other words, the utterance amount ratio is the ratio of the amount of spectral components in X_1(ω, τ) of sound arriving from the first range, which includes the first direction from which the target sound arrives, to the amount of spectral components of sound arriving from the second range, which includes the second direction from which the interfering sound arrives.
  • Specifically, the utterance amount ratio calculation unit 107 obtains the first power spectrum P_1(ω, τ) of the first channel Ch1 from the first spectral component X_1(ω, τ) by equation (6), where X_Re is the real part of X_1(ω, τ) and X_Im is the imaginary part of X_1(ω, τ).
  • Next, the utterance amount ratio calculation unit 107 uses the sign of the imaginary part Q(ω, τ) of the cross spectrum D(ω, τ) in equation (1) to determine whether the observed analog signal arrives from the target sound side or from the interfering sound side. Then, according to this sign determination, the utterance amount ratio calculation unit 107 accumulates the first power spectrum P_1(ω, τ) as shown in equation (7) to obtain the utterance amount s_Tgt(τ) of the target speaker and the utterance amount s_Int(τ) of the interfering speaker.
  • The utterance amount ratio calculation unit 107 then obtains the utterance amount ratio SR(τ) from the two utterance amounts s_Tgt(τ) and s_Int(τ) by equation (8).
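A sketch of equations (6) to (8) under the same assumptions as before: the power spectrum as the squared magnitude, the target/interferer assignment by the sign of the imaginary part of D (which side corresponds to a positive sign depends on the microphone geometry and is assumed here), and the ratio of the two per-frame accumulations. The small eps guard is an addition to avoid division by zero.

```python
def utterance_amount_ratio(X1: np.ndarray, D: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Assumed form of eqs. (6)-(8), evaluated per frame tau."""
    P1 = X1.real**2 + X1.imag**2              # eq. (6): power spectrum of Ch1
    from_target = D.imag > 0.0                # sign of Q(omega, tau) picks the side
    s_tgt = np.where(from_target, P1, 0.0).sum(axis=1)    # eq. (7), target side
    s_int = np.where(~from_target, P1, 0.0).sum(axis=1)   # eq. (7), interferer side
    return s_tgt / (s_int + eps)              # eq. (8): SR(tau)
```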
  • FIGS. 4A to 4C are graphs for explaining the utterance amount ratio SR(τ) when the target speaker and the interfering speaker speak.
  • FIG. 4A is a graph showing an example of the time waveform of the observed analog signal acquired by the first microphone 101.
  • FIG. 4B is a graph showing an example of the time variation of the utterance amounts of the target speaker and the interfering speaker.
  • FIG. 4C is a graph showing an example of the time variation of the utterance amount ratio SR(τ) obtained from the utterance amount of the target speaker and the utterance amount of the interfering speaker.
  • By correcting the mask coefficient according to the utterance amount ratio SR(τ), the target sound can be separated with high accuracy and little distortion. More specifically, in a frame in which the utterance amount ratio SR(τ) is small, the masking is strengthened to strongly suppress the interfering sound and enhance separation performance, while in a frame in which SR(τ) is large, the distortion of the target sound can be reduced by weakening the masking.
  • Using the utterance amount ratio SR(τ) obtained by equation (8), the gain calculation unit 108 calculates, by equation (9), a correction gain g(ω, τ) for correcting the constant M in the mask coefficient b(ω, τ) of equation (5).
  • Here, G_Tgt, G_Int, and G_DT are predetermined correction gain constants: G_Tgt is the constant used when the observed analog signal is highly likely to contain only the target sound, G_Int is the constant used when it is highly likely to contain only the interfering sound, and G_DT is the constant used when it is highly likely to contain both the target sound and the interfering sound.
  • When the observed analog signal is highly likely to be only the target sound, M in equation (5) is corrected to be larger, in other words, the suppression by the mask is made smaller; the corrected M is limited to a value of 1 or less. When the observed analog signal is highly likely to be only the interfering sound, M in equation (5) is corrected to be smaller, in other words, the suppression of the interfering sound is made larger. That is, the gain calculation unit 108 calculates a correction gain for correcting the mask coefficient so that the higher the utterance amount ratio is, the lower the masking strength becomes.
  • The calculation cost is low, because only the utterance amount ratio, obtained by a simple calculation of the power of the observed analog signals, and a conditional expression comparing the utterance amount ratio against thresholds are required, so the mask coefficient can be corrected efficiently.
  • K(ω) is a frequency correction coefficient represented by a positive number of 1 or less, and is set so that its value increases as the frequency increases, as shown in equation (10). However, the frequency correction coefficient is not limited to this example.
  • The constant values of the correction gain and the thresholds on the utterance amount ratio SR(τ) described above are not limited to those of equation (9) and can be adjusted appropriately according to the nature of the target sound or the interfering sound. Further, the condition for determining the correction gain is not limited to three stages as in equation (9) and may be set in more stages; a three-stage version is sketched below.
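A sketch of a three-stage correction gain in the spirit of equation (9). The thresholds and gain constants below are placeholders (the actual constants of equation (9) are not reproduced in this text), and multiplying the selected constant by the frequency correction coefficient K(ω) of equation (10) is an assumption about how the two are combined.

```python
G_TGT, G_INT, G_DT = 2.0, 0.5, 1.0   # placeholder correction gain constants
SR_HI, SR_LO = 4.0, 0.25             # placeholder thresholds on SR(tau)

def correction_gain(sr: float, K_omega: np.ndarray) -> np.ndarray:
    """Assumed form of eq. (9): choose a gain constant from SR(tau),
    then apply the frequency correction K(omega) of eq. (10)."""
    if sr > SR_HI:         # likely target sound only: weaken the masking
        g = G_TGT
    elif sr < SR_LO:       # likely interfering sound only: strengthen it
        g = G_INT
    else:                  # likely double talk of target and interferer
        g = G_DT
    return g * K_omega     # g(omega, tau) for one frame
```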
  • The mask correction unit 109 corrects the mask coefficient b(ω, τ) obtained by equation (5) using the correction gain g(ω, τ) obtained by equation (9), thereby obtaining the time-frequency filter coefficient b_mod(ω, τ) of equation (11).
  • As shown in equation (12), the masking filter unit 110 multiplies the first spectral component X_1(ω, τ) on the first microphone 101 side by the time-frequency filter coefficient b_mod(ω, τ) obtained by equation (11), thereby calculating the spectral component Y(ω, τ).
  • The masking filter unit 110 sends the calculated spectral component Y(ω, τ) to the T/F inverse transform unit 111.
  • The spectral component Y(ω, τ) separated here is also referred to as the target spectral component, that is, a spectral component containing the target sound.
  • The T/F inverse transform unit 111 performs, for example, an inverse fast Fourier transform on the spectral component Y(ω, τ) to calculate the output digital signal y(t), and gives the calculated output digital signal y(t) to the D/A conversion unit 112.
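Putting the pieces together, a sketch of equations (11) and (12) followed by the inverse FFT. How b_mod combines b and g, here by replacing M with M·g clipped to at most 1, is an assumption consistent with the description of the mask correction; the simple frame-by-frame reconstruction without overlap-add matches the non-overlapping framing assumed earlier.

```python
def separate(X1: np.ndarray, b: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Assumed form of eqs. (11)-(12) plus the inverse transform."""
    b_mod = np.where(b == 1.0, 1.0, np.minimum(b * g, 1.0))  # eq. (11): corrected M <= 1
    Y = b_mod * X1                                           # eq. (12): masked spectrum
    frames = np.fft.irfft(Y, n=N_FFT, axis=1)                # back to the time domain
    return frames[:, :FRAME_LEN].reshape(-1)                 # y(t), frame by frame
```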
  • the D/A conversion unit 112 generates an output signal by converting the output digital signal y(t) into an analog signal.
  • the generated output signal is output to an external device such as a voice recognition device, a hands-free communication device, or an abnormal sound monitoring device.
  • FIGS. 5A and 5B are graphs for explaining the effect of the first embodiment. Like FIG. 4A, FIG. 5A is a graph showing an example of the time waveform of the observed analog signal acquired by the first microphone 101, and FIG. 5B is a graph showing an example of the time variation of the output signal output from the D/A conversion unit 112. As is clear from FIGS. 5A and 5B, the interfering sound is almost entirely removed from the output signal and only the target sound is separated.
  • The sound source separation device 100 described above can be realized by a computer with a built-in CPU (Central Processing Unit), such as a tablet computer, or by a microcomputer embedded in a device such as a car navigation system.
  • Alternatively, the sound source separation device 100 may be realized by an LSI (Large Scale Integration) circuit such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
  • FIG. 6 is a block diagram showing a hardware configuration example of the sound source separation device 100 configured using an LSI such as a DSP, an ASIC, or an FPGA.
  • the sound source separation device 100 includes a signal input/output unit 131, a signal processing circuit 132, a recording medium 133, and a signal path 134 such as a bus.
  • the signal input/output unit 131 is an interface circuit that realizes a connection function with the microphone circuit 140 and the external device 141.
  • the microphone circuit 140 corresponds to the first microphone 101 and the second microphone 102, and for example, a device that captures acoustic vibration and converts it into an electric signal can be used.
  • Each function of the T/F conversion unit 104, the mask generation unit 105, the masking filter unit 110, and the T/F inverse conversion unit 111 shown in FIG. 1 can be realized by the signal processing circuit 132 and the recording medium 133. Further, the A/D conversion unit 103 and the D/A conversion unit 112 in FIG. 1 can be realized by the signal input/output unit 131.
  • the recording medium 133 is used to store various setting data of the signal processing circuit 132 and various data such as signal data.
  • As the recording medium 133, a volatile memory such as SDRAM (Synchronous Dynamic Random Access Memory) or a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) can be used.
  • the recording medium 133 can store the initial state of the sound source separation process, various setting data, constant data for control, and the like.
  • the output digital signal subjected to the sound source separation processing in the signal processing circuit 132 is sent from the signal input/output unit 131 to the external device 141.
  • The external device 141 corresponds to, for example, a voice recognition device, a hands-free call device, or an abnormal sound monitoring device.
  • FIG. 7 is a block diagram showing a hardware configuration example of the sound source separation device 100 configured by using a computing device such as a computer.
  • the sound source separation device 100 includes a signal input/output unit 131, a processor 136 including a CPU 135, a memory 137, a recording medium 138, and a signal path 134 such as a bus.
  • the signal input/output unit 131 is an interface circuit that realizes a connection function with the microphone circuit 140 and the external device 141.
  • The memory 137 includes a program memory that stores the various programs for implementing the sound source separation processing, a work memory used when the processor 136 performs data processing, and memory into which signal data is expanded, realized by, for example, ROM (Read Only Memory) and RAM (Random Access Memory).
  • the functions of the T/F conversion unit 104, the mask generation unit 105, the masking filter unit 110, and the T/F inverse conversion unit 111 can be realized by the processor 136, the memory 137, and the recording medium 138. Further, the A/D conversion unit 103 and the D/A conversion unit 112 can be realized by the signal input/output unit 131.
  • the recording medium 138 is used for accumulating various data such as various setting data and signal data of the processor 136.
  • As the recording medium 138, a volatile memory such as SDRAM or a non-volatile memory such as an HDD or an SSD can be used. The recording medium 138 can store programs including an OS (Operating System), various setting data, and various data such as audio signal data.
  • the data in the memory 137 can be stored in the recording medium 138.
  • The processor 136 uses the memory 137 as a working memory and operates according to a computer program read from the memory 137, thereby functioning as the T/F conversion unit 104, the mask generation unit 105, the masking filter unit 110, and the T/F inverse conversion unit 111.
  • the output signal generated by the sound source separation processing performed by the processor 136 is sent from the signal input/output unit 131 to the external device 141.
  • Examples of the external device 141 include a voice recognition device, a hands-free call device, and an abnormal sound monitoring device.
  • The program executed by the processor 136 may be stored in a storage device of the computer that executes it, or may be distributed in a storage medium such as a CD-ROM. The program can also be acquired from another computer through a wireless or wired network such as a LAN (Local Area Network), and may be provided as a program product.
  • Further, various data may be transmitted and received as digital signals through a wireless or wired network, without conversion between analog signals and digital signals.
  • The program executed by the processor 136 may also be combined with a program of the external device 141, for example, a program that causes a computer to function as a voice recognition device, a hands-free call device, or an abnormal sound monitoring device; the combined programs may run on the same computer or be distributed and run on multiple computers.
  • the external device 141 may include the sound source separation device 100. That is, the voice recognition device, the hands-free communication device, or the abnormal sound monitoring device may be configured to include the sound source separation device 100.
  • FIG. 8 is a flowchart showing the operation of the sound source separation device 100.
  • First, the A/D conversion unit 103 takes in the first observed analog signal and the second observed analog signal input from the first microphone 101 and the second microphone 102 at predetermined frame intervals, A/D converts each of them to generate the first observed digital signal x_1(t) and the second observed digital signal x_2(t), and gives these to the T/F conversion unit 104 (S10).
  • The processing of step S10 is repeated while the sample number t is smaller than the predetermined value T (No in S11).
  • In step S12, the T/F conversion unit 104 performs, for example, a 512-point fast Fourier transform on each of the first observed digital signal x_1(t) and the second observed digital signal x_2(t) to calculate the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ). Then, the T/F conversion unit 104 gives the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ) to the mask generation unit 105, and gives the first spectral component X_1(ω, τ) to the masking filter unit 110.
  • Next, the mask generation unit 105 calculates, from the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), the time-frequency filter coefficient b_mod(ω, τ) used for masking (S13). The calculation of the time-frequency filter coefficient in step S13 consists of steps S13A to S13D below.
  • In step S13A, the mask coefficient calculation unit 106 obtains the cross spectrum D(ω, τ) from the cross-correlation function of the first spectral component X_1(ω, τ) and the second spectral component X_2(ω, τ), and calculates the mask coefficient b(ω, τ) based on the obtained cross spectrum D(ω, τ).
  • The mask coefficient calculation unit 106 gives the cross spectrum D(ω, τ) to the utterance amount ratio calculation unit 107, and gives the mask coefficient b(ω, τ) to the mask correction unit 109. Then, the process proceeds to step S13B.
  • In step S13B, the utterance amount ratio calculation unit 107 calculates, from the first spectral component X_1(ω, τ), the second spectral component X_2(ω, τ), and the cross spectrum D(ω, τ), the utterance amount ratio SR(τ), which is the ratio of the utterance amount of the target speaker to the utterance amount of the interfering speaker.
  • The utterance amount ratio calculation unit 107 gives the utterance amount ratio SR(τ) to the gain calculation unit 108. Then, the process proceeds to step S13C.
  • In step S13C, the gain calculation unit 108 calculates the correction gain g(ω, τ) for correcting the mask coefficient b(ω, τ) using the utterance amount ratio SR(τ), and gives the correction gain g(ω, τ) to the mask correction unit 109. Then, the process proceeds to step S13D.
  • In step S13D, the mask correction unit 109 corrects the mask coefficient b(ω, τ) using the correction gain g(ω, τ) to obtain the time-frequency filter coefficient b_mod(ω, τ), and gives the time-frequency filter coefficient b_mod(ω, τ) to the masking filter unit 110.
  • The masking filter unit 110 multiplies the first spectral component X_1(ω, τ) by the time-frequency filter coefficient b_mod(ω, τ) to calculate the spectral component Y(ω, τ) of the output digital signal y(t) (S14), and gives the spectral component Y(ω, τ) to the T/F inverse transform unit 111.
  • The T/F inverse transform unit 111 transforms the spectral component Y(ω, τ) into the output digital signal y(t) in the time domain by performing an inverse fast Fourier transform on it (S15).
  • The D/A conversion unit 112 converts the output digital signal y(t) into an output signal, which is an analog signal, by D/A conversion, and outputs it to the outside (S16). The above processing is repeated while the sample number t is smaller than the predetermined value T (Yes in S17).
  • As described above, the sound source separation device 100 can create a masking filter with high separation performance at low calculation cost. The target sound can therefore be acquired accurately, making it possible to provide a highly accurate voice recognition device, a high-quality hands-free call device, and an abnormal sound monitoring device with high detection accuracy.
  • Embodiment 2. While the first embodiment assumes that the observed sound consists of voices, the second embodiment can also be applied when there is noise other than the voices that form the target sound and the interfering sound.
  • FIG. 9 is a block diagram schematically showing a configuration of an information processing system 250 including the sound source separation device 200 according to the second embodiment.
  • The information processing system 250 shown here is an example of a car navigation system, for a case where a speaker seated in the driver's seat and a speaker seated in the passenger seat speak in a moving automobile. The speaker seated in the driver's seat is referred to as the target speaker, and the speaker seated in the passenger seat is referred to as the interfering speaker.
  • the information processing system 250 includes a first microphone 101, a second microphone 102, a sound source separation device 200, and an external device 141.
  • the first microphone 101 and the second microphone 102 in the second embodiment are the same as the first microphone 101 and the second microphone 102 in the first embodiment.
  • the external device 141 is similar to the external device 141 described with reference to FIG. 6 or 7.
  • In the vehicle, noise includes running noise, the received voice of the far-end speaker played from the loudspeaker during a hands-free call, the guidance voice produced by the car navigation system, and the acoustic echo of music from the car audio system or the like. Any sound other than the voices of the target speaker and the interfering speaker is noise, and its signal is referred to as a noise signal.
  • The sound source separation device 200 calculates the utterance amount ratio after excluding the spectral components of sound arriving from directions not included in the first range, which includes the first direction from which the target sound arrives, or in the second range, which includes the second direction from which the interfering sound arrives.
  • the external device 141 is, for example, a voice recognition device, a hands-free communication device, or an abnormal sound monitoring device, as described above.
  • the external device 141 performs, for example, a voice recognition process, a hands-free call process, or an abnormal sound detection process, and obtains an output result according to each process.
  • the sound source separation device 200 includes an A/D conversion unit 103, a T/F conversion unit 104, a mask generation unit 205, a masking filter unit 110, and a T/F inverse conversion unit 111.
  • The A/D conversion unit 103, the T/F conversion unit 104, the masking filter unit 110, and the T/F inverse conversion unit 111 of the sound source separation device 200 according to the second embodiment are the same as the A/D conversion unit 103, the T/F conversion unit 104, the masking filter unit 110, and the T/F inverse conversion unit 111 of the sound source separation device 100 according to the first embodiment. However, in the sound source separation device 200 according to the second embodiment, the output digital signal y(t) generated by the T/F inverse conversion unit 111 is given to the external device 141.
  • the mask generation unit 205 includes a mask coefficient calculation unit 106, an utterance amount ratio calculation unit 207, a gain calculation unit 108, and a mask correction unit 109.
  • The mask coefficient calculation unit 106, the gain calculation unit 108, and the mask correction unit 109 of the mask generation unit 205 according to the second embodiment are the same as the mask coefficient calculation unit 106, the gain calculation unit 108, and the mask correction unit 109 of the mask generation unit 105 according to the first embodiment.
  • The utterance amount ratio calculation unit 207 excludes noise signals from the calculation of the utterance amount ratio SR(τ) by using equation (13), which is a modification of equation (7) described in the first embodiment.
  • As in the first embodiment, the direction of arrival of the target sound is determined by the sign of the imaginary part Q(ω, τ) of the cross spectrum D(ω, τ) of equation (1). In this way, the influence of noise other than the target speaker and the interfering speaker can be excluded from the calculation of the utterance amounts.
  • Here, δ_DT and δ_DN are thresholds on the time difference of the observed analog signals for exclusion from the calculation of the utterance amounts, and are predetermined constants obtained by converting direction-of-arrival angles into time differences.
  • δ_DT is a threshold for excluding from the calculation of the utterance amounts the cases where the arrival time difference of the observed analog signals is so small that it is difficult to determine whether the direction of arrival is the target sound direction or the interfering sound direction, or where noise is assumed to arrive from the front direction.
  • δ_DN is a threshold for excluding from the calculation of the utterance amounts the cases where the direction of arrival is highly likely to deviate from the expected arrival directions of the target sound and the interfering sound, in other words, where the observed analog signal is highly likely to be directional noise such as wind noise entering through a window or music played from a loudspeaker.
  • FIG. 10 is a schematic diagram illustrating an example of the method of excluding, by equation (13), the influence of noise other than the target sound and the interfering sound. In FIG. 10, the exclusion ranges are described with reference to the first channel Ch1; a sketch of the corresponding thresholding follows.
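A sketch of the exclusion of equation (13), under the assumption (consistent with the description of δ_DT and δ_DN above) that bins whose delay magnitude |δ(ω, τ)| falls below δ_DT (direction ambiguous, or frontal noise) or above δ_DN (outside the expected target/interferer directions) are simply left out of the accumulation of equation (7):

```python
def utterance_amounts_excluding_noise(X1, D, delta, delta_dt, delta_dn):
    """Assumed form of eq. (13): eq. (7) restricted to bins whose
    inter-channel delay lies between the two thresholds."""
    P1 = X1.real**2 + X1.imag**2
    usable = (np.abs(delta) > delta_dt) & (np.abs(delta) < delta_dn)
    from_target = D.imag > 0.0
    s_tgt = np.where(usable & from_target, P1, 0.0).sum(axis=1)
    s_int = np.where(usable & ~from_target, P1, 0.0).sum(axis=1)
    return s_tgt, s_int    # s_Tgt(tau), s_Int(tau) with noise bins excluded
```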
  • With the above configuration, the influence of noise other than the target sound and the interfering sound can be excluded, so the calculation accuracy of the utterance amount ratio is improved and an even higher-quality sound source separation device can be configured.
  • Since the sound source separation device 200 is configured as described above, a masking filter with high separation performance can be created at low calculation cost even under various noise conditions. Therefore, since the target sound can be accurately acquired even under in-vehicle noise, it is possible to provide a highly accurate voice recognition device, a high-quality hands-free call device, or an abnormal sound monitoring device that detects abnormal sounds in the vehicle.
  • Embodiment 3. In the first and second embodiments, only the information of the current frame is used to calculate the utterance amount ratio, but the embodiments are not limited to this example; information of past frames can also be used in the calculation.
  • The sound source separation device 300 includes an A/D conversion unit 103, a T/F conversion unit 104, a mask generation unit 305, a masking filter unit 110, a T/F inverse conversion unit 111, and a D/A conversion unit 112.
  • The A/D conversion unit 103, the T/F conversion unit 104, the masking filter unit 110, the T/F inverse conversion unit 111, and the D/A conversion unit 112 of the sound source separation device 300 according to the third embodiment are the same as the corresponding units of the sound source separation device 100 according to the first embodiment.
  • the mask generation unit 305 includes a mask coefficient calculation unit 106, a speech amount ratio calculation unit 307, a gain calculation unit 108, and a mask correction unit 109.
  • The mask coefficient calculation unit 106, the gain calculation unit 108, and the mask correction unit 109 of the mask generation unit 305 according to the third embodiment are the same as the mask coefficient calculation unit 106, the gain calculation unit 108, and the mask correction unit 109 of the mask generation unit 105 according to the first embodiment.
  • The utterance amount ratio calculation unit 307 calculates the utterance amount ratio SR(τ) using equation (8), and further smooths the calculated SR(τ) with the utterance amount ratio SR(τ-1) of the previous frame using equation (14). Here, α is a smoothing coefficient.
  • Since the most recently calculated utterance amount ratio is smoothed using utterance amount ratios calculated in the past, the utterance amount ratio can be calculated stably even if noise is mixed into the observed analog signal, and sound source separation can be performed with higher accuracy.
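A sketch of the smoothing of equation (14), assuming a first-order recursive form (the exact form of equation (14) and the value of the smoothing coefficient α are not reproduced in this text, so both are assumptions):

```python
def smooth_ratio(sr_now: float, sr_prev: float, alpha: float = 0.7) -> float:
    """Assumed form of eq. (14): first-order smoothing of SR(tau)
    with the previous frame's SR(tau-1)."""
    return alpha * sr_prev + (1.0 - alpha) * sr_now
```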
  • In the second embodiment, the utterance amount ratio calculation unit 207 calculates the utterance amount of each signal using equation (13). As a modification, by extending this calculation over a predetermined frame section, in other words, by calculating the integral value of the power spectrum over a predetermined frame section, it is possible to analyze the occupancy of the target sound and the interfering sound in that section, specifically, which speaker is speaking longer or which speaker is louder. It is therefore possible to determine which voice is dominant in double talk between the target sound and the interfering sound, and sound source separation can be performed with higher accuracy.
  • The information processing system 250 can also be applied to a remote voice recognition system such as a smart speaker or a television installed in a home or an office, the voice call system of a TV conference system, the voice recognition dialogue system of a robot, or a factory abnormal sound monitoring system. In such cases as well, the effects described in the second embodiment are similarly obtained against the noise or acoustic echo generated in these acoustic environments.
  • In the above description, the frequency band of the input signal corresponds to a sampling frequency of 16 kHz, but the first to third embodiments are not limited to this example; they can also be applied to wider-band acoustic signals, such as those sampled at 24 kHz.
  • Since the sound source separation devices 100 to 300 can perform high-quality sound source separation at low calculation cost, they can be introduced into any of a voice recognition system, a voice call system, or an abnormal sound monitoring system. As a result, it is possible to improve the recognition rate of a remote voice recognition system such as a car navigation system or a television, and to improve the quality of a hands-free call system such as a mobile phone or an intercom, a TV conference system, or an abnormal sound monitoring system.
  • 100, 200, 300 sound source separation device, 101 first microphone, 102 second microphone, 103 A/D conversion unit, 104 T/F conversion unit, 105, 205, 305 mask generation unit, 106 mask coefficient calculation unit, 107, 207, 307 utterance amount ratio calculation unit, 108 gain calculation unit, 109 mask correction unit, 110 masking filter unit, 111 T/F inverse conversion unit, 112 D/A conversion unit, 250 information processing system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention includes: a time/frequency conversion unit (104) that generates a first spectral component and a second spectral component by converting each of a first observed digital signal and a second observed digital signal, generated from an observed sound, into a frequency-domain signal; a mask generation unit (105) that, using a cross-correlation function of the first spectral component and the second spectral component and based on a time difference between the arrival time of the observed sound at a first microphone and the arrival time of the observed sound at a second microphone, calculates a filtering coefficient for masking a spectral component of a sound arriving from a direction different from a first direction from which the target sound arrives; and a masking filter unit (110) that separates the spectral component by performing masking on the first spectral component using the filtering coefficient.
PCT/JP2018/043747 2018-11-28 2018-11-28 Information processing device, program, and information processing method Ceased WO2020110228A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/043747 WO2020110228A1 (fr) 2018-11-28 2018-11-28 Information processing device, program, and information processing method
JP2020557460A JP6840302B2 (ja) 2018-11-28 2018-11-28 情報処理装置、プログラム及び情報処理方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/043747 WO2020110228A1 (fr) 2018-11-28 2018-11-28 Information processing device, program, and information processing method

Publications (1)

Publication Number Publication Date
WO2020110228A1 (fr) 2020-06-04

Family

ID=70854207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/043747 Ceased WO2020110228A1 (fr) Information processing device, program, and information processing method

Country Status (2)

Country Link
JP (1) JP6840302B2 (fr)
WO (1) WO2020110228A1 (fr)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004289762A (ja) * 2003-01-29 2004-10-14 Toshiba Corp 音声信号処理方法と装置及びプログラム
JP2011113044A (ja) * 2009-11-30 2011-06-09 Internatl Business Mach Corp <Ibm> 目的音声抽出方法、目的音声抽出装置、及び目的音声抽出プログラム
JP2013061421A (ja) * 2011-09-12 2013-04-04 Oki Electric Ind Co Ltd 音声信号処理装置、方法及びプログラム
JP2013097273A (ja) * 2011-11-02 2013-05-20 Toyota Motor Corp 音源推定装置、方法、プログラム、及び移動体

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11205416B2 (en) * 2018-12-04 2021-12-21 Fujitsu Limited Non-transitory computer-read able storage medium for storing utterance detection program, utterance detection method, and utterance detection apparatus
JP2023549411A (ja) 2021-01-21 2023-11-24 Tencent Technology (Shenzhen) Co., Ltd. Voice call control method, apparatus, computer program, and electronic device
JP7548482B2 (ja) 2021-01-21 2024-09-10 Tencent Technology (Shenzhen) Co., Ltd. Voice call control method, apparatus, computer program, and electronic device
JP2024510347A (ja) 2021-03-22 2024-03-06 Dolby Laboratories Licensing Corporation Robustness/performance improvement for deep-learning-based speech enhancement against artifacts and distortion
JP7562878B2 (ja) 2021-03-22 2024-10-07 Dolby Laboratories Licensing Corporation Robustness/performance improvement for deep-learning-based speech enhancement against artifacts and distortion
WO2022244173A1 (fr) * 2021-05-20 2022-11-24 Mitsubishi Electric Corp Sound collection device, sound collection method, and sound collection program
JPWO2022244173A1 (fr) * 2021-05-20 2022-11-24
JP7286057B2 (ja) 2021-05-20 2023-06-02 Mitsubishi Electric Corp Sound collection device, sound collection method, and sound collection program
DE112021007311B4 (de) 2021-05-20 2024-12-19 Mitsubishi Electric Corporation Sound collection device, sound collection method, and sound collection program
US12477272B2 (en) 2021-05-20 2025-11-18 Mitsubishi Electric Corporation Sound collection device, sound collection method, and storage medium storing sound collection program

Also Published As

Publication number Publication date
JP6840302B2 (ja) 2021-03-10
JPWO2020110228A1 (ja) 2021-03-11

Similar Documents

Publication Publication Date Title
JP5528538B2 (ja) Noise suppression device
CN103238183B (zh) Noise suppression device
EP2773137B1 (fr) Microphone sensitivity difference correction device
JP6279181B2 (ja) Acoustic signal enhancement device
US20170140771A1 (en) Information processing apparatus, information processing method, and computer program product
JP6780644B2 (ja) Signal processing device, signal processing method, and signal processing program
CN110383798B (zh) Acoustic signal processing device, acoustic signal processing method, and hands-free call device
JP5834088B2 (ja) Dynamic microphone signal mixer
KR101475864B1 (ko) Noise removal device and noise removal method
WO2015196729A1 (fr) Microphone array speech enhancement method and device
US11380312B1 (en) Residual echo suppression for keyword detection
JP6840302B2 (ja) Information processing device, program, and information processing method
US10951978B2 (en) Output control of sounds from sources respectively positioned in priority and nonpriority directions
US11984132B2 (en) Noise suppression device, noise suppression method, and storage medium storing noise suppression program
CN106031196A (zh) Signal processing device, method, and program
US11386911B1 (en) Dereverberation and noise reduction
JP4448464B2 (ja) Noise reduction method, device, program, and recording medium
JP2005514668A (ja) Speech enhancement system having a spectral power ratio dependent processor
EP1913591B1 (fr) Enhancement of speech intelligibility in a mobile communication device by controlling the operation of a vibrator as a function of background noise
JP6631127B2 (ja) Voice determination device, method, and program, and voice processing device
JP7013789B2 (ja) Computer program for voice processing, voice processing device, and voice processing method
US20130226568A1 (en) Audio signals by estimations and use of human voice attributes
Hao et al. L3C-DeepMFC: Low-Latency Low-Complexity Deep Marginal Feedback Cancellation with Closed-Loop Fine Tuning for Hearing Aids
JP6221463B2 (ja) Audio signal processing device and program
CN119601026A (zh) Noise estimation method, device, electronic apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18941765

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020557460

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18941765

Country of ref document: EP

Kind code of ref document: A1