
CN105206281B - Speech enhancement method based on distributed microphone array network - Google Patents


Info

Publication number
CN105206281B
Authority
CN
China
Prior art keywords
node
network
channel
microphone array
signal
Prior art date
Legal status
Expired - Fee Related
Application number
CN201510582363.5A
Other languages
Chinese (zh)
Other versions
CN105206281A (en)
Inventor
胡旻波
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CN201510582363.5A
Publication of CN105206281A
Application granted
Publication of CN105206281B


Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a speech enhancement method based on a distributed microphone array network, comprising the following steps: establishing a distributed microphone array network based on an Ad-hoc network; synchronizing the sampling rates of the network nodes; dividing the signal of each node into frames; performing speech enhancement at each node with a multichannel Wiener filter; transmitting the enhanced speech signal to all other nodes of the network; and, at each node, performing speech enhancement again with a multichannel Wiener filter using both the node's own multi-channel microphone array observations and the single-channel enhanced speech signals of all other nodes, to obtain the node's updated single-channel enhanced speech signal. By interconnecting otherwise isolated microphone arrays through a wireless communication network, the invention forms a microphone array network, which helps improve the speech enhancement performance of each individual node.

Description

Speech enhancement method based on distributed microphone array network
Technical field
The present invention relates to speech enhancement methods, and in particular to a speech enhancement method based on a distributed microphone array network.
Background technique
The environments we occupy are usually filled with noise: the television and fans in a room, the engine inside a car, traffic on the road, babble noise in a coffee shop, and so on. Noise degrades many speech processing systems. In voice communication it can interfere with or even mask the other party's voice and reduce speech quality; in speech recognition it lowers the recognition rate and can render the system entirely ineffective. Estimating the clean speech from the observed noisy speech signal is therefore of great importance; this task is called speech enhancement.
Traditional speech enhancement algorithms process the observation of a single microphone and include the single-channel Wiener filter, spectral subtraction, and maximum-likelihood or maximum a posteriori methods based on statistical models. Although such methods can remove noise to some extent, they have two problems. First, removing noise also removes speech components, i.e. it causes speech distortion. Second, after noise removal the spectrum usually contains randomly scattered peaks that the listener perceives as "musical noise". Both factors keep the intelligibility of the enhanced speech below expectation and prevent an effective improvement in speech recognition performance.
To address these problems, two or more microphones are combined into a "microphone array" in search of better multi-channel speech enhancement. The microphones in an array are at different spatial positions but share a common clock and sampling rate. Multiple microphones therefore provide temporal redundancy and spatial diversity of speech and noise, and this additional information makes improved enhancement performance possible. To enhance speech, a spatial filter known as a "beamformer" can be designed to extract the signal from the direction of the target source while suppressing noise from other directions. The simplest beamformer is the delay-and-sum beamformer, while the MVDR and LCMV beamformers can, in theory, reduce noise while avoiding speech distortion. Besides simple beamformers, the generalized sidelobe canceller (GSC) structure is also widely used. Although the GSC can be shown to be theoretically equivalent to the LCMV beamformer, its implementation is simpler and its computational complexity lower. All of these beamformers require the speech direction (and sometimes the noise direction) to be known, but in practice the speaker's direction is rarely fixed and is hard to estimate under noise and reverberation. To avoid source localization, the single-channel Wiener filter has been generalized to multiple channels, so that the optimal multichannel Wiener filter can be designed from the spatio-temporal statistics of the noise alone, and these statistics can be estimated and updated with the help of speech presence probability or voice activity detection. Compared with single-channel algorithms, even a two-channel enhancement method yields a clear performance gain.
Speech enhancement with microphone arrays is becoming mainstream. However, once the array hardware is built, parameters such as microphone spacing and the number of microphones are difficult to change. Because of space constraints in devices such as handhelds, an array cannot contain many microphones or large spacings. When a microphone array occupies only a small region of space, it cannot capture the ambient noise and reverberation accurately and comprehensively, whereas, in theory, more microphones and larger microphone spacings can markedly improve the performance of multi-channel enhancement algorithms. Traditional microphone array speech enhancement is therefore limited by the scalability and physical extent of the array itself.
Summary of the invention
To overcome the shortcomings of the prior art, the invention discloses a speech enhancement method based on a distributed microphone array network.
The technical solution of the invention is as follows:
A speech enhancement method based on a distributed microphone array network, comprising the following steps:
Step a: establish a distributed microphone array network consisting of multiple microphone arrays and based on an Ad-hoc network; any two network nodes can communicate with each other.
Step b: initialize the distributed microphone array network, i.e. synchronize the sampling rates of the network nodes.
Step c: divide the signal of each node into frames to obtain the framed multi-node, multi-channel microphone array observation signals.
Step d: at each node, for each frame of the multi-channel microphone array observations, perform speech enhancement with a multichannel Wiener filter according to the node's own multi-channel microphone array observation signals, obtaining a single-channel enhanced speech signal.
Step e: at each node, transmit the single-channel enhanced speech signal obtained in step d to all other nodes of the network.
Step f: at each node, using both the node's multi-channel microphone array observation signals and the single-channel enhanced speech signals of all other nodes, perform speech enhancement again with a multichannel Wiener filter, obtaining the node's updated single-channel enhanced speech signal.
Step g: iterate steps e and f. When the single-channel enhanced speech signal obtained at a node converges, that node's single-channel enhanced speech signal is no longer updated; when the single-channel enhanced speech signals of all nodes no longer change, processing of the current frame ends, and finally each node obtains its own enhanced speech signal.
As a further technical solution, the microphone array includes an audio acquisition module and a communication module.
As a further technical solution, the Ad-hoc network in step a has a flat structure or a hierarchical structure; the Ad-hoc network uses a proactive, reactive, or hybrid routing protocol to realize communication between any two node devices in the network.
As a further technical solution, step b further includes time synchronization of the network nodes;
the distributed microphone array includes a network device clock, and the time synchronization is carried out through this network device clock based on the NTP network time protocol.
As a further technical solution, step b specifically comprises the following steps:
Step b1: initialize the network sampling rate with K = 1, i.e. the network sampling rate f_0 equals the device sampling rate f_1 of node 1.
Step b2: the device sampling rate of node K is f_K; transmit f_K to node K+1.
Step b3: if the device sampling rate of node K+1 satisfies f_{K+1} > f_K, then f_0 = f_K; otherwise f_0 = f_{K+1}.
Step b4: set K = K + 1.
Step b5: repeat steps b2 to b4 until all nodes have been traversed, so that the network sampling rate f_0 is the minimum device sampling rate over all nodes of the network.
Step b6: the final node transmits the current network sampling rate f_0 to all other nodes, so that the device sampling rate of every node is f_0.
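The traversal in steps b1 to b6 is a single minimum-finding pass over the nodes followed by a broadcast of the result. The following minimal Python sketch illustrates the idea under the assumption that the device sampling rates have already been collected into a list; the function name and the in-memory "broadcast" are illustrative only, not part of the claimed method.

```python
def synchronize_sampling_rate(device_rates):
    """Steps b1-b6: pass the running minimum from node to node,
    then let the final node broadcast it back to every node.

    device_rates: list of device sampling rates f_1 ... f_P (Hz),
    indexed by node order. Purely illustrative of the traversal."""
    assert device_rates, "network must contain at least one node"

    # Step b1: initialize the network rate with node 1's device rate.
    f0 = device_rates[0]

    # Steps b2-b5: each node K forwards f0 to node K+1, which keeps
    # the smaller of the two rates.
    for f_next in device_rates[1:]:
        f0 = min(f0, f_next)

    # Step b6: the final node sends f0 to all other nodes, so every
    # node configures its capture (or resamples) to f0.
    network_rates = [f0] * len(device_rates)
    return f0, network_rates


if __name__ == "__main__":
    rates = [48000, 44100, 48000]          # three-node example network
    f0, per_node = synchronize_sampling_rate(rates)
    print(f0)                               # -> 44100
```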
As a further technical solution, the signal framing in step c uses a Hamming window or a Hanning window to suppress spectral leakage, and step c adopts an overlapping framing strategy.
As a further technical solution, step d uses a time-domain multichannel Wiener filter or a frequency-domain multichannel Wiener filter to filter the multi-channel microphone array observation signals so as to achieve speech enhancement:
At node K, the time-domain multichannel Wiener filter is
$h_{W,K}(t) = [R_{xx,K}(t) + \lambda R_{nn,K}(t)]^{-1} R_{xx,K}(t)\,u$;
in this expression, $R_{xx,K}(t) = R_{yy,K}(t) - R_{nn,K}(t)$;
$R_{xx,K}(t)$ is the time-domain autocorrelation matrix of the clean speech vector $x_K(t) = [x_{1,K}(t), x_{2,K}(t), \ldots, x_{M,K}(t)]^T$ of the current node;
$R_{nn,K}(t)$ is the time-domain autocorrelation matrix of the noise vector $n_K(t) = [n_{1,K}(t), n_{2,K}(t), \ldots, n_{M,K}(t)]^T$ of the current node;
$R_{yy,K}(t)$ is the time-domain autocorrelation matrix of the multi-channel microphone array observation vector $y_K(t) = [y_{1,K}(t), y_{2,K}(t), \ldots, y_{M,K}(t)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion: the larger $\lambda$, the more strongly the noise is suppressed and the more speech distortion is introduced;
the time-domain filter output of node K is the single-channel enhanced speech signal $\hat{x}_K(t) = h_{W,K}^T(t)\,y_K(t)$.
At node K, the frequency-domain multichannel Wiener filter is
$H_{W,K}(\omega) = [R_{XX,K}(\omega) + \lambda R_{NN,K}(\omega)]^{-1} R_{XX,K}(\omega)\,u$;
in this expression, $R_{XX,K}(\omega) = R_{YY,K}(\omega) - R_{NN,K}(\omega)$;
$R_{XX,K}(\omega)$ is the frequency-domain autocorrelation matrix of the clean speech vector $X_K(\omega) = [X_{1,K}(\omega), X_{2,K}(\omega), \ldots, X_{M,K}(\omega)]^T$ of the current node;
$R_{NN,K}(\omega)$ is the frequency-domain autocorrelation matrix of the noise vector $N_K(\omega) = [N_{1,K}(\omega), N_{2,K}(\omega), \ldots, N_{M,K}(\omega)]^T$ of the current node;
$R_{YY,K}(\omega)$ is the frequency-domain autocorrelation matrix of the multi-channel microphone array observation vector $Y_K(\omega) = [Y_{1,K}(\omega), Y_{2,K}(\omega), \ldots, Y_{M,K}(\omega)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion, as above;
the frequency-domain filter output of node K is the single-channel enhanced speech signal $\hat{X}_K(\omega) = H_{W,K}^H(\omega)\,Y_K(\omega)$.
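The core computation of step d is a regularized linear solve. The sketch below shows a time-domain multichannel Wiener filter in numpy under the assumption that the noisy and noise autocorrelation matrices have already been estimated; the variable names, toy statistics, and the choice of numpy are illustrative, not prescribed by the invention.

```python
import numpy as np

def multichannel_wiener_filter(R_yy, R_nn, lam=1.0):
    """Compute h_W = (R_xx + lam * R_nn)^-1 R_xx u for one node.

    R_yy : (M, M) autocorrelation matrix of the noisy observations
    R_nn : (M, M) autocorrelation matrix of the noise
    lam  : trade-off between noise reduction and speech distortion (> 0)
    Returns the length-M filter that estimates the speech at microphone 1.
    """
    M = R_yy.shape[0]
    R_xx = R_yy - R_nn                 # speech autocorrelation estimate
    u = np.zeros(M); u[0] = 1.0        # select the first (reference) channel
    # Solve (R_xx + lam*R_nn) h = R_xx u instead of forming the inverse.
    return np.linalg.solve(R_xx + lam * R_nn, R_xx @ u)

def enhance_frame(Y, h):
    """Apply the filter to one frame: Y is (M, L) samples, h is (M,)."""
    return h @ Y                        # single-channel enhanced frame

# Toy usage with random statistics (illustrative only)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); R_nn = A @ A.T + 1e-3 * np.eye(4)
B = rng.standard_normal((4, 4)); R_yy = R_nn + B @ B.T
h = multichannel_wiener_filter(R_yy, R_nn, lam=2.0)
```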
As a further technical solution, step e includes adding to the data packets used for signal transmission the information of the transmitting node number, the receiving node number, and the number of multichannel Wiener filter passes.
As a further technical solution, step f includes filtering the current node's multi-channel observation signals and the enhanced signals of the other nodes with a time-domain or frequency-domain multichannel Wiener filter.
In the time-domain multichannel Wiener filter,
the joint vector formed by the multi-channel microphone array observations of the current node K and the enhanced signals of all other nodes is
$\tilde{y}_K^{(N_i)}(t) = [\,y_K^T(t),\ \hat{x}_{-K}^{(N_i)T}(t)\,]^T$;
in this expression, $\hat{x}_{-K}^{(N_i)}(t)$ is the vector formed by the enhanced time-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{x}_K^{(N_i)}(t)$ is the clean speech component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{n}_K^{(N_i)}(t)$ is the noise component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{R}_{xx,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the clean speech component $\tilde{x}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{nn,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the noise vector $\tilde{n}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{yy,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the joint vector $\tilde{y}_K^{(N_i)}(t)$ at the current node;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, where P is the total number of nodes in the network;
the time-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then
$h_{W,K}^{(N_i+1)}(t) = [\tilde{R}_{xx,K}^{(N_i)}(t) + \lambda \tilde{R}_{nn,K}^{(N_i)}(t)]^{-1}\,\tilde{R}_{xx,K}^{(N_i)}(t)\,u$.
In the frequency-domain multichannel Wiener filter,
the joint vector formed by the multi-channel observations of the current node K and the enhanced signals of all other nodes is
$\tilde{Y}_K^{(N_i)}(\omega) = [\,Y_K^T(\omega),\ \hat{X}_{-K}^{(N_i)T}(\omega)\,]^T$;
in this expression, $\hat{X}_{-K}^{(N_i)}(\omega)$ is the vector formed by the enhanced frequency-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{X}_K^{(N_i)}(\omega)$ is the clean speech component of $\tilde{Y}_K^{(N_i)}(\omega)$;
$\tilde{N}_K^{(N_i)}(\omega)$ is the noise component of $\tilde{Y}_K^{(N_i)}(\omega)$;
the frequency-domain autocorrelation matrices of the clean speech vectors, of the background noise vectors, and of the observation vectors of the nodes other than node K are computed;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, where P is the total number of nodes in the network; the frequency-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then obtained by substituting these frequency-domain autocorrelation matrices into the filter expression of step d.
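As a rough illustration of step f, the sketch below stacks a node's own M-channel frame with the enhanced single-channel frames received from the other P-1 nodes and reapplies the Wiener filter to the extended observation. It reuses multichannel_wiener_filter from the sketch after step d, and the joint noise statistics are assumed to be estimated elsewhere; all names are illustrative.

```python
import numpy as np

def build_joint_frame(own_frame, other_enhanced):
    """own_frame: (M, L) local microphone frame.
    other_enhanced: list of P-1 length-L enhanced signals from other nodes.
    Returns the (M+P-1, L) joint observation used in step f."""
    return np.vstack([own_frame] + [np.atleast_2d(sig) for sig in other_enhanced])

def iterate_node(own_frame, other_enhanced, R_nn_joint, lam=1.0):
    """One step-f update: re-estimate the joint statistics and re-apply
    the multichannel Wiener filter of the earlier sketch."""
    Y = build_joint_frame(own_frame, other_enhanced)
    R_yy = (Y @ Y.T) / Y.shape[1]              # sample joint autocorrelation
    h = multichannel_wiener_filter(R_yy, R_nn_joint, lam)
    return h @ Y                                # updated single-channel output
```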
As a further technical solution, step g includes a step of judging, from the norm of the difference between the signal vectors before and after filtering together with the signal energy, whether the single-channel enhanced speech signal obtained at a node has converged, as follows:
at node K, the single-channel time-domain signal vector obtained by the previous filtering pass is $\hat{x}_K^{(N_i)}(t)$;
the single-channel time-domain signal vector obtained by the current filtering pass is $\hat{x}_K^{(N_i+1)}(t)$;
when $\|\hat{x}_K^{(N_i+1)}(t) - \hat{x}_K^{(N_i)}(t)\|_p \,/\, \|\hat{x}_K^{(N_i+1)}(t)\|_p < \eta$, the current filter output has converged;
in this expression, $\|\cdot\|_p$ denotes the p-norm and $\eta$ is a threshold.
The beneficial effects of the invention are as follows:
First, the invention proposes a completely new framework for speech enhancement with microphone arrays. Unlike traditional methods, it interconnects otherwise isolated microphone arrays through a wireless communication network to form a microphone array network.
Second, every node of the microphone network can, directly or indirectly, make use of all microphones in the network. This breaks through the spatial limits of each individual device, greatly extends the spatial observation range of a single node, and improves the speech enhancement performance of that node. Even a single-channel device, once connected to the microphone array network, can reach the enhancement performance of a multi-channel system.
Third, the microphone array network makes no assumptions about the number of network nodes, their relative positions, or the number and spatial placement of microphones within each node, giving it great scalability and freedom.
Fourth, through Ad-hoc networking the network does not depend on a central node and the computation can be performed in a distributed manner, which improves the fault tolerance of the network.
Fifth, every node of the microphone array network obtains its own locally optimal filter output, providing a differentiated user experience for each node in the network.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is a diagram of the distributed microphone array network based on an Ad-hoc network.
Fig. 3 is the flow chart of sampling rate synchronization in the distributed microphone array network.
Fig. 4 is the flow chart of single-node speech enhancement based on the multichannel Wiener filter.
Fig. 5 is the flow chart of iterative multi-node speech enhancement based on the multichannel Wiener filter.
Specific embodiment
Fig. 1 is the flow chart of the invention.
The core content of the invention consists of three parts: (1) the establishment of the Ad-hoc network in step a and the initialization of the audio acquisition modules in step b; (2) the single-node speech enhancement based on the multichannel Wiener filter in step d; (3) the iterative multi-node speech enhancement based on the multichannel Wiener filter in step f.
As shown in Fig. 1, the invention specifically comprises the following steps:
(1) Establishment and initialization of the Ad-hoc network
Step a: set up multiple microphone arrays and establish the distributed microphone array network based on an Ad-hoc network formed by these arrays; any two network nodes can communicate with each other.
An Ad-hoc network is also known as a temporary self-organizing network. Because it requires no additional network infrastructure and is easy to build and extend, the invention uses it to construct the distributed microphone array network.
Fig. 2 shows the distributed microphone array network based on an Ad-hoc network. In the microphone array network, each network node is a microphone array. The microphone array equipment of each node contains at least one microphone, and also includes an audio acquisition module, a communication module, and a computing module, which are interconnected. The audio acquisition module captures the sound in the node's local environment, the communication module handles data transmission with the communication modules of the other nodes, and the computing module carries out the node's speech enhancement computation.
The Ad-hoc network can use a hierarchical or a flat network structure. In a hierarchical structure, the network nodes are divided into "clusters"; the nodes of each cluster elect a cluster head by some election algorithm, the cluster heads maintain the routing information within the cluster and between cluster heads, and communication between any two nodes in the network is realized jointly through communication between cluster heads and between a cluster head and the nodes of its cluster. In a flat network structure, all nodes have equal status and each independently maintains routing information to all other nodes. In general, a hierarchical structure is used when there are many network nodes and a flat structure when there are few.
As shown in Fig. 2, this embodiment contains only three network nodes, so a flat network structure is used.
This embodiment uses a standardized Ad-hoc communication mode: the nodes of the Ad-hoc network communicate via the IEEE 802.11 protocol. When the network is set up, the user designates a node as the start node through software, and this node transmits a wireless signal requesting networking. A network node that wants to join searches for this signal and joins the network after the start node confirms it. After all nodes have joined, the start node switches off the networking request signal, completing the establishment of the network. Each node is assigned a node number in the order in which it joined the network.
Step b: initialize the distributed microphone array network, i.e. synchronize the sampling rates of the network nodes.
Fig. 3 is the flow chart of sampling rate synchronization in the distributed microphone array network.
It specifically comprises the following steps:
Step b1: initialize the network sampling rate with K = 1, i.e. the network sampling rate f_0 equals the device sampling rate f_1 of node 1.
Step b2: the device sampling rate of node K is f_K; transmit f_K to node K+1.
Step b3: if the device sampling rate of node K+1 satisfies f_{K+1} > f_K, then f_0 = f_K; otherwise f_0 = f_{K+1}.
Step b4: set K = K + 1.
Step b5: repeat steps b2 to b4 until all nodes have been traversed, so that the network sampling rate f_0 is the minimum device sampling rate over all nodes of the network.
Step b6: the final node, i.e. the last node in the traversal, transmits the current network sampling rate f_0 to all other nodes, so that the device sampling rate of every node is f_0.
The network sampling rate in step b is the software sampling rate of the whole network; the device sampling rate of a node is the sampling rate at which the node acquires the speech signal through its hardware.
Step b also includes clock synchronization.
The microphone array also includes a network device clock, located on the communication module. Time synchronization is carried out through this network device clock based on the NTP network time protocol. Each node takes the order in which it joined the network as its node number, the initial number being 1. The communication modules of the nodes keep their clocks synchronized with the start node, numbered 1, using the high-precision network time protocol NTP. The audio acquisition module of a node reads the network device clock in the communication module and aligns the start time of audio acquisition with a specific time instant T_s of the communication module. The value of T_s is specified by the user and is sent to the whole network by the start node.
(2) Single-node speech enhancement based on the multichannel Wiener filter.
Step c: divide the signal of each node into frames to obtain the framed multi-node, multi-channel microphone array observation signals. The signal framing in step c uses a Hamming or Hanning window to suppress spectral leakage, and step c adopts an overlapping framing strategy.
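A minimal sketch of the overlapped, windowed framing of step c for one channel; the frame length, hop size, and the specific window are illustrative choices, not values prescribed by the method.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256, window="hann"):
    """Split a 1-D signal into overlapping, windowed frames.

    Returns an array of shape (n_frames, frame_len); consecutive frames
    overlap by frame_len - hop samples, and each frame is multiplied by
    a Hamming or Hanning window to suppress spectral leakage."""
    win = np.hamming(frame_len) if window == "hamming" else np.hanning(frame_len)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = x[i * hop : i * hop + frame_len] * win
    return frames

# Example: frame 1 second of a 16 kHz signal into 32 ms frames with 50% overlap
x = np.random.default_rng(1).standard_normal(16000)
frames = frame_signal(x, frame_len=512, hop=256)   # -> (61, 512)
```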
Step d: at each node, for each frame of the multi-channel microphone array observations, perform speech enhancement with a multichannel Wiener filter according to the node's own multi-channel observation signals, obtaining the single-channel enhanced speech signal.
Compared with beamformers and the generalized sidelobe canceller, a clear advantage of the multichannel Wiener filter is that it can enhance speech effectively without estimating the speech direction. Because the direction of the target speech often changes in practice, and tracking a changing direction in a noisy, reverberant environment is particularly difficult, the invention uses the multichannel Wiener filter for speech enhancement.
The Wiener filter can be computed in the time domain or in the frequency domain. In theory the time-domain and frequency-domain algorithms are equivalent, but in practice the noise estimates in the two domains differ, so the algorithm outputs are not exactly identical. The two transform-domain algorithms also differ in computational complexity.
Fig. 4 is the flow chart of single-node speech enhancement based on the multichannel Wiener filter. As shown in Fig. 4, voice activity is first detected or the speech presence probability estimated; the noise autocorrelation matrix is then estimated; next the autocorrelation matrix of the noisy signal is computed; finally the multichannel Wiener filter is computed.
The method of filtering the original multi-channel signals with the time-domain or frequency-domain multichannel Wiener filter is as follows:
At node K, the time-domain multichannel Wiener filter is
$h_{W,K}(t) = [R_{xx,K}(t) + \lambda R_{nn,K}(t)]^{-1} R_{xx,K}(t)\,u$;
in this expression, $R_{xx,K}(t) = R_{yy,K}(t) - R_{nn,K}(t)$;
$R_{xx,K}(t)$ is the time-domain autocorrelation matrix of the clean speech vector $x_K(t) = [x_{1,K}(t), x_{2,K}(t), \ldots, x_{M,K}(t)]^T$ of the current node;
$R_{nn,K}(t)$ is the time-domain autocorrelation matrix of the noise vector $n_K(t) = [n_{1,K}(t), n_{2,K}(t), \ldots, n_{M,K}(t)]^T$ of the current node;
$R_{yy,K}(t)$ is the time-domain autocorrelation matrix of the multi-channel microphone array observation vector $y_K(t) = [y_{1,K}(t), y_{2,K}(t), \ldots, y_{M,K}(t)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion: the larger $\lambda$, the more strongly the noise is suppressed and the more speech distortion is introduced;
the time-domain filter output of node K is the single-channel enhanced speech signal $\hat{x}_K(t) = h_{W,K}^T(t)\,y_K(t)$.
At node K, the frequency-domain multichannel Wiener filter is
$H_{W,K}(\omega) = [R_{XX,K}(\omega) + \lambda R_{NN,K}(\omega)]^{-1} R_{XX,K}(\omega)\,u$;
in this expression, $R_{XX,K}(\omega) = R_{YY,K}(\omega) - R_{NN,K}(\omega)$;
$R_{XX,K}(\omega)$ is the frequency-domain autocorrelation matrix of the clean speech vector $X_K(\omega) = [X_{1,K}(\omega), X_{2,K}(\omega), \ldots, X_{M,K}(\omega)]^T$ of the current node;
$R_{NN,K}(\omega)$ is the frequency-domain autocorrelation matrix of the noise vector $N_K(\omega) = [N_{1,K}(\omega), N_{2,K}(\omega), \ldots, N_{M,K}(\omega)]^T$ of the current node;
$R_{YY,K}(\omega)$ is the frequency-domain autocorrelation matrix of the multi-channel microphone array observation vector $Y_K(\omega) = [Y_{1,K}(\omega), Y_{2,K}(\omega), \ldots, Y_{M,K}(\omega)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion, as above;
the frequency-domain filter output of node K is the single-channel enhanced speech signal $\hat{X}_K(\omega) = H_{W,K}^H(\omega)\,Y_K(\omega)$.
When a node contains only one microphone, the single-channel enhanced speech signal output by the node is its original observation signal.
A key issue of the multichannel Wiener filter is the estimation of the noise autocorrelation matrix. In the time domain, this matrix can be estimated with the help of voice activity detection: if the current frame is judged to be noise, then
$R_{nn,K}(t) \leftarrow \alpha R_{nn,K}(t) + (1-\alpha)\,y_K(t)\,y_K^T(t)$,
where $0 < \alpha < 1$ is an updating factor; otherwise the matrix is kept unchanged. Similarly, in the frequency domain the matrix can be estimated with the help of the speech presence probability: if the speech presence probability of frequency band $\omega$ in the current frame is $p(\omega)$, then $R_{NN,K}(\omega)$ is updated as
$R_{NN,K}(\omega) \leftarrow \alpha_p R_{NN,K}(\omega) + (1-\alpha_p)\,Y_K(\omega)\,Y_K^H(\omega)$,
where $\alpha_p = \alpha + p(\omega)(1-\alpha)$ and, again, $0 < \alpha < 1$ is an updating factor. The noise autocorrelation matrix, in the time domain or in the frequency domain, is initialized as the average of the matrices over the first few frames.
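A small sketch of the recursive frequency-domain noise update driven by the speech presence probability; the probability p(ω) is assumed to come from an external estimator, and the smoothing constant is illustrative.

```python
import numpy as np

def update_noise_psd(R_nn, Y, p_speech, alpha=0.9):
    """Recursive update of the noise autocorrelation matrix for one
    frequency bin.

    R_nn     : (M, M) current noise autocorrelation estimate
    Y        : (M,) complex observation vector of this bin and frame
    p_speech : speech presence probability p(w) in [0, 1]
    alpha    : base smoothing factor, 0 < alpha < 1

    When speech is almost surely present (p ~ 1) the old estimate is kept;
    when speech is almost surely absent (p ~ 0) the estimate moves toward
    the outer product of the current observation."""
    alpha_p = alpha + p_speech * (1.0 - alpha)
    return alpha_p * R_nn + (1.0 - alpha_p) * np.outer(Y, np.conj(Y))

# Toy usage for a 4-microphone node, one frequency bin
rng = np.random.default_rng(2)
R_nn = np.eye(4, dtype=complex)
Y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
R_nn = update_noise_psd(R_nn, Y, p_speech=0.1)
```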
When a node contains only one microphone, in order to avoid distorting the node's enhanced speech signal, the single-channel enhanced speech signal output by the node is its original microphone observation signal.
(3) Iterative multi-node speech enhancement based on the multichannel Wiener filter.
Step e: at each node, transmit the single-channel enhanced speech signal obtained in step d to all other nodes of the network. In step e, information such as the transmitting node number, the receiving node number, and the number of multichannel Wiener filter passes can also be added to the data packets carrying the signal, so that the packets can be distinguished from one another.
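One possible packet layout for step e, sketched as a Python dataclass; the field names, frame length, and serialization are assumptions made for illustration and are not part of the claimed method.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EnhancedFramePacket:
    """One frame of single-channel enhanced speech exchanged between nodes."""
    tx_node: int                 # transmitting node number
    rx_node: int                 # receiving node number (or a broadcast id)
    wiener_pass: int             # how many Wiener-filter passes produced it
    frame_index: int             # frame position within the utterance
    samples: np.ndarray = field(default_factory=lambda: np.zeros(512))

# Node 2 sends its first-pass output for frame 17 to node 3
pkt = EnhancedFramePacket(tx_node=2, rx_node=3, wiener_pass=1,
                          frame_index=17, samples=np.zeros(512))
```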
Step f: at each node, using both the multi-channel observation signals of the node's own microphone array and the single-channel enhanced speech signals of all other nodes, perform speech enhancement again with a multichannel Wiener filter, obtaining the node's updated single-channel enhanced speech signal.
Each node can obtain a single-channel enhanced speech signal from its own microphone array observations. The enhanced signals of the different nodes suppress the noise around their respective nodes and at the same time provide redundant copies of the clean speech, so they can be used by other nodes to further improve the enhancement. From the point of view of network communication, transmitting the enhanced single-channel speech signal rather than the raw multi-channel signals observed by a node greatly saves bandwidth and keeps the format of the data exchanged between nodes consistent.
In this stage, the single-channel enhanced speech signals of the remaining nodes and the multi-channel observation signals of the current node together form a new observation vector. The enhanced single-channel speech of the remaining nodes can be regarded as new observation channels of the local node; similarly, a multichannel Wiener filter can be applied to this new observation vector to obtain the node's updated single-channel enhanced speech signal.
Step g: iterate steps e and f. When the single-channel enhanced speech signal obtained at a node converges, that node's single-channel enhanced speech signal is no longer updated; when the single-channel enhanced speech signals of all nodes no longer change, processing of the current frame ends, and finally each node obtains its own enhanced speech signal.
Fig. 5 is the flow chart of iterative multi-node speech enhancement based on the multichannel Wiener filter. First, the joint vector formed by the current node's multi-channel observation signals and the enhanced signals of all other nodes is constructed; next, voice activity is detected or the speech presence probability estimated; the noise autocorrelation matrix is then updated; after that the autocorrelation matrix of the noisy signal is computed; finally the multichannel Wiener filter is computed.
The specific calculation is as follows:
At node K, in the time-domain multichannel Wiener filter,
the joint vector formed by the multi-channel observation signals of the current node K and the enhanced signals of all other nodes is
$\tilde{y}_K^{(N_i)}(t) = [\,y_K^T(t),\ \hat{x}_{-K}^{(N_i)T}(t)\,]^T$;
in this expression, $\hat{x}_{-K}^{(N_i)}(t)$ is the vector formed by the enhanced time-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{x}_K^{(N_i)}(t)$ is the clean speech component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{n}_K^{(N_i)}(t)$ is the noise component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{R}_{xx,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the clean speech component $\tilde{x}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{nn,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the noise component $\tilde{n}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{yy,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the joint vector $\tilde{y}_K^{(N_i)}(t)$ at the current node;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, where P is the total number of nodes in the network;
the time-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then
$h_{W,K}^{(N_i+1)}(t) = [\tilde{R}_{xx,K}^{(N_i)}(t) + \lambda \tilde{R}_{nn,K}^{(N_i)}(t)]^{-1}\,\tilde{R}_{xx,K}^{(N_i)}(t)\,u$.
At node K, in the frequency-domain multichannel Wiener filter,
the joint vector formed by the multi-channel observation signals of the current node K and the enhanced signals of all other nodes is
$\tilde{Y}_K^{(N_i)}(\omega) = [\,Y_K^T(\omega),\ \hat{X}_{-K}^{(N_i)T}(\omega)\,]^T$;
in this expression, $\hat{X}_{-K}^{(N_i)}(\omega)$ is the vector formed by the enhanced frequency-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{X}_K^{(N_i)}(\omega)$ is the clean speech component of the joint vector $\tilde{Y}_K^{(N_i)}(\omega)$;
$\tilde{N}_K^{(N_i)}(\omega)$ is the noise component of the joint vector $\tilde{Y}_K^{(N_i)}(\omega)$;
the frequency-domain autocorrelation matrices of the clean speech vectors, of the background noise vectors, and of the observation vectors of the nodes other than node K are computed;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, where P is the total number of nodes in the network; the frequency-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then obtained by substituting these frequency-domain autocorrelation matrices into the filter expression of step d.
After all nodes have obtained their updated single-channel enhanced speech signals, the updated signals can likewise be transmitted to the other nodes so that those nodes can update their single-channel enhanced speech once more. The above steps can therefore be repeated over the distributed microphone array network. When the single-channel enhanced speech signal obtained at a node converges, its enhanced speech signal is no longer updated; when the single-channel speech signals of all nodes no longer change, processing of the current frame ends, and finally each node obtains its own enhanced speech signal.
Step g may also include a step of judging whether the speech signal has converged. Whether the single-channel enhanced speech signal obtained at a node has converged can be judged comprehensively from the norm of the difference between the signal vectors before and after filtering together with the signal energy, as follows:
at node K, the single-channel time-domain signal vector obtained by the previous filtering pass is $\hat{x}_K^{(N_i)}(t)$;
the single-channel time-domain signal vector obtained by the current filtering pass is $\hat{x}_K^{(N_i+1)}(t)$;
when $\|\hat{x}_K^{(N_i+1)}(t) - \hat{x}_K^{(N_i)}(t)\|_p \,/\, \|\hat{x}_K^{(N_i+1)}(t)\|_p < \eta$, the current filter output is considered to have converged;
here $\|\cdot\|_p$ denotes the p-norm and $\eta$ is a threshold.
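A minimal sketch of the convergence test of step g using the relative-change criterion described above; the norm order and the threshold value are illustrative.

```python
import numpy as np

def has_converged(prev, curr, eta=1e-3, p=2):
    """Return True when the enhanced frame has stopped changing:
    ||curr - prev||_p / ||curr||_p < eta."""
    denom = np.linalg.norm(curr, ord=p)
    if denom == 0.0:
        return True                      # silent frame: nothing left to update
    return np.linalg.norm(curr - prev, ord=p) / denom < eta

# Example: two successive filter outputs for one frame
prev = np.ones(512)
curr = prev + 1e-5 * np.random.default_rng(3).standard_normal(512)
print(has_converged(prev, curr))         # -> True
```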
What has been described above is only a preferred embodiment of the invention, and the invention is not limited to this embodiment. Other improvements and changes that a person skilled in the art derives directly from, or associates with, this disclosure without departing from the spirit and concept of the invention are all considered to fall within the protection scope of the invention.

Claims (9)

1. A speech enhancement method based on a distributed microphone array network, characterized by comprising the following steps:
Step a: establishing a distributed microphone array network consisting of multiple microphone arrays and based on an Ad-hoc network; any two network nodes can communicate with each other;
Step b: initializing the distributed microphone array network, i.e. synchronizing the sampling rates of the network nodes;
Step c: dividing the signal of each node into frames to obtain the framed multi-node, multi-channel microphone array observation signals;
Step d: at each node, for each frame of the multi-channel microphone array observations, performing speech enhancement with a multichannel Wiener filter according to the node's own multi-channel microphone array observation signals, to obtain a single-channel enhanced speech signal;
Step e: at each node, transmitting the single-channel enhanced speech signal obtained in step d to all other nodes of the network;
Step f: at each node, using both the node's multi-channel microphone array observation signals and the single-channel enhanced speech signals of all other nodes, performing speech enhancement again with a multichannel Wiener filter to obtain the node's updated single-channel enhanced speech signal;
Step g: iterating steps e and f; when the single-channel enhanced speech signal obtained at a node converges, that node's single-channel enhanced speech signal is no longer updated; when the single-channel enhanced speech signals of all nodes no longer change, processing of the current frame ends; finally each node obtains its own enhanced speech signal;
wherein step b specifically comprises the following steps:
Step b1: initializing the network sampling rate with K = 1, i.e. the network sampling rate f_0 equals the device sampling rate f_1 of node 1;
Step b2: the device sampling rate of node K being f_K, transmitting f_K to node K+1;
Step b3: if the device sampling rate of node K+1 satisfies f_{K+1} > f_K, setting f_0 = f_K, otherwise f_0 = f_{K+1};
Step b4: setting K = K + 1;
Step b5: repeating steps b2 to b4 until all nodes have been traversed, so that the network sampling rate f_0 is the minimum device sampling rate over all nodes of the network;
Step b6: transmitting, through the final node, the current network sampling rate f_0 to all other nodes, so that the device sampling rate of every node is f_0.
2. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that the microphone array comprises an audio acquisition module and a communication module.
3. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that the Ad-hoc network in step a has a flat structure or a hierarchical structure, and the Ad-hoc network uses a proactive, reactive, or hybrid routing protocol to realize communication between any two node devices in the network.
4. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that step b further comprises time synchronization of the network nodes; the distributed microphone array includes a network device clock; the time synchronization is carried out through the network device clock based on the NTP network time protocol.
5. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that the signal framing in step c uses a Hamming window or a Hanning window to suppress spectral leakage, and step c adopts an overlapping framing strategy.
6. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that step d uses a time-domain multichannel Wiener filter or a frequency-domain multichannel Wiener filter to filter the multi-channel microphone array observation signals so as to achieve speech enhancement:
at node K, the time-domain multichannel Wiener filter is
$h_{W,K}(t) = [R_{xx,K}(t) + \lambda R_{nn,K}(t)]^{-1} R_{xx,K}(t)\,u$,
where $R_{xx,K}(t) = R_{yy,K}(t) - R_{nn,K}(t)$;
$R_{xx,K}(t)$ is the time-domain autocorrelation matrix of the clean speech vector $x_K(t) = [x_{1,K}(t), x_{2,K}(t), \ldots, x_{M,K}(t)]^T$ of the current node;
$R_{nn,K}(t)$ is the time-domain autocorrelation matrix of the noise vector $n_K(t) = [n_{1,K}(t), n_{2,K}(t), \ldots, n_{M,K}(t)]^T$ of the current node;
$R_{yy,K}(t)$ is the time-domain autocorrelation matrix of the multi-channel microphone array observation vector $y_K(t) = [y_{1,K}(t), y_{2,K}(t), \ldots, y_{M,K}(t)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion: the larger $\lambda$, the more strongly the noise is suppressed and the more speech distortion is introduced;
the time-domain filter output of node K is $\hat{x}_K(t) = h_{W,K}^T(t)\,y_K(t)$;
at node K, the frequency-domain multichannel Wiener filter is
$H_{W,K}(\omega) = [R_{XX,K}(\omega) + \lambda R_{NN,K}(\omega)]^{-1} R_{XX,K}(\omega)\,u$,
where $R_{XX,K}(\omega) = R_{YY,K}(\omega) - R_{NN,K}(\omega)$;
$R_{XX,K}(\omega)$ is the frequency-domain autocorrelation matrix of the clean speech vector $X_K(\omega) = [X_{1,K}(\omega), X_{2,K}(\omega), \ldots, X_{M,K}(\omega)]^T$ of the current node;
$R_{NN,K}(\omega)$ is the frequency-domain autocorrelation matrix of the noise vector $N_K(\omega) = [N_{1,K}(\omega), N_{2,K}(\omega), \ldots, N_{M,K}(\omega)]^T$ of the current node;
$R_{YY,K}(\omega)$ is the frequency-domain autocorrelation matrix of the multi-channel microphone array observation vector $Y_K(\omega) = [Y_{1,K}(\omega), Y_{2,K}(\omega), \ldots, Y_{M,K}(\omega)]^T$ of the current node;
$u = [1, 0, \ldots, 0]^T$, of length M;
M is the number of microphones of the current node;
$\lambda > 0$ controls the trade-off between noise elimination and speech distortion, as above;
the frequency-domain filter output of node K is $\hat{X}_K(\omega) = H_{W,K}^H(\omega)\,Y_K(\omega)$.
7. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that step e comprises adding to the data packets used for signal transmission the information of the transmitting node number, the receiving node number, and the number of multichannel Wiener filter passes.
8. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that step f comprises filtering the current node's multi-channel observation signals and the enhanced signals of the other nodes with a time-domain or frequency-domain multichannel Wiener filter;
in the time-domain multichannel Wiener filter,
the joint vector formed by the multi-channel microphone array observation signals of the current node K and the enhanced signals of all other nodes is
$\tilde{y}_K^{(N_i)}(t) = [\,y_K^T(t),\ \hat{x}_{-K}^{(N_i)T}(t)\,]^T$,
where $\hat{x}_{-K}^{(N_i)}(t)$ is the vector formed by the enhanced time-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{x}_K^{(N_i)}(t)$ is the clean speech component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{n}_K^{(N_i)}(t)$ is the noise component of $\tilde{y}_K^{(N_i)}(t)$;
$\tilde{R}_{xx,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the clean speech component $\tilde{x}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{nn,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the noise vector $\tilde{n}_K^{(N_i)}(t)$ at the current node;
$\tilde{R}_{yy,K}^{(N_i)}(t)$ is the time-domain autocorrelation matrix of the joint vector $\tilde{y}_K^{(N_i)}(t)$ at the current node;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, P being the total number of nodes in the network;
the time-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then
$h_{W,K}^{(N_i+1)}(t) = [\tilde{R}_{xx,K}^{(N_i)}(t) + \lambda \tilde{R}_{nn,K}^{(N_i)}(t)]^{-1}\,\tilde{R}_{xx,K}^{(N_i)}(t)\,u$;
in the frequency-domain multichannel Wiener filter,
the joint vector formed by the multi-channel observation signals of the current node K and the enhanced signals of all other nodes is
$\tilde{Y}_K^{(N_i)}(\omega) = [\,Y_K^T(\omega),\ \hat{X}_{-K}^{(N_i)T}(\omega)\,]^T$,
where $\hat{X}_{-K}^{(N_i)}(\omega)$ is the vector formed by the enhanced frequency-domain single-channel speech of the nodes other than node K;
$N_i$ is the iteration count of step g;
$\tilde{X}_K^{(N_i)}(\omega)$ is the clean speech component of $\tilde{Y}_K^{(N_i)}(\omega)$;
$\tilde{N}_K^{(N_i)}(\omega)$ is the noise component of $\tilde{Y}_K^{(N_i)}(\omega)$;
the frequency-domain autocorrelation matrix of the clean speech vectors of the nodes other than node K, the frequency-domain autocorrelation matrix of the background noise vectors of the nodes other than node K, and the frequency-domain autocorrelation matrix of the observation vectors of the nodes other than node K are formed;
$u = [1, 0, \ldots, 0]^T$, of length M+P-1, P being the total number of nodes in the network; the frequency-domain multichannel Wiener filter of node K at iteration $N_i+1$ is then obtained by substituting these frequency-domain autocorrelation matrices into the filter expression of claim 6.
9. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that step g comprises a step of judging, from the norm of the difference between the signal vectors before and after filtering and the norm of the filtered signal vector, whether the single-channel enhanced speech signal obtained at a node has converged, as follows:
at node K, the single-channel time-domain signal vector obtained by the previous filtering pass is $\hat{x}_K^{(N_i)}(t)$;
the single-channel time-domain signal vector obtained by the current filtering pass is $\hat{x}_K^{(N_i+1)}(t)$;
when $\|\hat{x}_K^{(N_i+1)}(t) - \hat{x}_K^{(N_i)}(t)\|_p \,/\, \|\hat{x}_K^{(N_i+1)}(t)\|_p < \eta$, the current filter output converges;
in the above, $\|\cdot\|_p$ denotes the p-norm and $\eta$ is the threshold.
CN201510582363.5A 2015-09-14 2015-09-14 Speech enhancement method based on distributed microphone array network Expired - Fee Related CN105206281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510582363.5A CN105206281B (en) 2015-09-14 2015-09-14 Speech enhancement method based on distributed microphone array network

Publications (2)

Publication Number Publication Date
CN105206281A CN105206281A (en) 2015-12-30
CN105206281B true CN105206281B (en) 2019-02-15

Family

ID=54953910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510582363.5A Expired - Fee Related CN105206281B (en) 2015-09-14 2015-09-14 Speech enhancement method based on distributed microphone array network

Country Status (1)

Country Link
CN (1) CN105206281B (en)



Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014519071A (en) * 2011-03-28 2014-08-07 アンビエンツ Search system and method using acoustic context

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101292547A (en) * 2005-10-13 2008-10-22 Motorola, Inc. Method and apparatus for synchronizing a node within an ad-hoc communication system
CN101772983A (en) * 2007-07-31 2010-07-07 Motorola, Inc. System and method of resource allocation within a communication system
CN101587712A (en) * 2008-05-21 2009-11-25 Institute of Acoustics, Chinese Academy of Sciences A Directional Speech Enhancement Method Based on Small Microphone Array

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Distributed Adaptive Node-Specific Signal Estimation in Fully Connected Sensor Networks—Part I: Sequential Node Updating";A. Bertrand , ect.;《IEEE Transactions on Signal Processing》;20101031;第58卷(第10期);第5页第1栏第2段-第6页第1栏第4段,图3
"Distributed GSC Beamforming Using the Relative Transfer Function";M. G. Shmulik, ect.;《20th European Signal Processing Conference》;20120831;第1页摘要,第1部分第4段-第4页第5部分
"on Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction";S. Mehrez, ect.;《IEEE Transactions on Audio, speech, and language processing》;20100228;第2卷(第18期);第3页第3段-第5页,第7页第IV部分

Also Published As

Publication number Publication date
CN105206281A (en) 2015-12-30

Similar Documents

Publication Publication Date Title
CN105206281B (en) Speech enhancement method based on distributed microphone array network
US9584909B2 (en) Distributed beamforming based on message passing
CN108172235B (en) LS wave beam forming reverberation suppression method based on wiener post filtering
CN109273021B (en) RNN-based real-time conference noise reduction method and device
Zeng et al. Distributed delay and sum beamformer for speech enhancement via randomized gossip
CN107316648A (en) A kind of sound enhancement method based on coloured noise
Heusdens et al. Distributed MVDR beamforming for (wireless) microphone networks using message passing
CN108986832B (en) Method and device for binaural speech de-reverberation based on speech occurrence probability and consistency
CN111312269B (en) Rapid echo cancellation method in intelligent loudspeaker box
JP5337072B2 (en) Model estimation apparatus, sound source separation apparatus, method and program thereof
WO2015196729A1 (en) Microphone array speech enhancement method and device
CN103026738A (en) Signal processing method in hearing aid system and hearing aid system
CN105388459A (en) Robustness sound source space positioning method of distributed microphone array network
CN114863944B (en) A low-delay audio signal overdetermined blind source separation method and separation device
CN106373589A (en) A Binaural Mixed Speech Separation Method Based on Iterative Structure
CN110739004B (en) Distributed voice noise elimination system for WASN
CN112581970A (en) System and method for audio signal generation
Velasco et al. Novel GCC-PHAT model in diffuse sound field for microphone array pairwise distance based calibration
Zeng et al. Distributed delay and sum beamformer for speech enhancement in wireless sensor networks via randomized gossip
CN113948101A (en) A noise suppression method and device based on spatial discrimination detection
Meng et al. Deep Kronecker Product Beamforming for Large-Scale Microphone Arrays
CN112201276A (en) Microphone array speech separation method based on TC-ResNet network
Hassani et al. Distributed node-specific direction-of-arrival estimation in wireless acoustic sensor networks
Tavakoli et al. Ad hoc microphone array beamforming using the primal-dual method of multipliers
CN104464745A (en) Two-channel speech enhancement system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190215

CF01 Termination of patent right due to non-payment of annual fee