CN111863012B

CN111863012B - Audio signal processing method, device, terminal and storage medium

Info

Publication number: CN111863012B
Application number: CN202010763471.3A
Authority: CN
Inventors: 李炯亮
Original assignee: Beijing Xiaomi Pinecone Electronic Co Ltd
Current assignee: Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2024-07-16
Anticipated expiration: 2040-07-31
Also published as: CN111863012A

Abstract

The disclosure relates to an audio signal processing method, an audio signal processing device, a terminal and a storage medium, wherein the audio signal processing method comprises the following steps: converting original sound source signals received by at least two microphones into original sound source signals in a plurality of beam directions; wherein the original sound source signal of a beam direction has a null direction, and the null directions of the original sound source signals of different beam directions are different; superposing the original sound source signals of the plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals; wherein at least one of the preprocessed sound source signals has at least two null directions for suppressing interference. The method can suppress interference or reduce reverberation and improve the quality of the audio signal of the target sound source.

Description

Audio signal processing method, device, terminal and storage medium

Technical Field

The disclosure relates to the technical field of communication, and in particular relates to an audio signal processing method, an audio signal processing device, a terminal and a storage medium.

Background

In the related art, a smart device may improve the quality of an audio signal by applying beamforming across a plurality of microphones. When the number of microphones is limited, however, the gain of the beam is limited and it is difficult to obtain an ideal spatial gain, so the audio signal quality cannot be improved well. In addition, in a scene with a plurality of interfering sound sources, there are not only interfering signals from the interfering sources but also reverberation signals generated by refraction and reflection from walls and the like, which makes it more difficult for the device to obtain a high-quality audio signal of the target sound source.

Disclosure of Invention

The disclosure provides an audio signal processing method, an audio signal processing device, a terminal and a storage medium.

According to a first aspect of embodiments of the present disclosure, there is provided an audio signal processing method, the method comprising:

Converting original sound source signals received by at least two microphones into original sound source signals in a plurality of beam directions; wherein, the original sound source signal of a beam direction has a null direction, and the null directions of the original sound source signals of different beam directions are different;

Superposing original sound source signals of a plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals; wherein at least one of said preprocessed sound source signals has at least two null directions for suppressing interference.

In the above scheme, the method further comprises:

And performing blind source separation on at least two of the preprocessed sound source signals to obtain at least two target sound source signals.

In the above scheme, the method further comprises:

and obtaining at least two corrected target sound source information based on the at least two target sound source signals and the respective weight coefficients.

In the above-mentioned scheme, the converting the original sound source signals received by the at least two microphones into the original sound source signals with multiple beam directions includes:

Converting the original sound source signal into original sound source signals of a plurality of beam directions which are all directed to the direction of a target sound source;

The superimposing the original sound source signals of the plurality of beam directions based on the null direction to obtain at least two preprocessed sound source signals includes:

superposing original sound source signals of at least two beam directions in the original sound source signals of a plurality of beam directions based on the null directions to obtain one preprocessed sound source signal with at least two null directions and at least one preprocessed sound source signal with one null direction; or alternatively

And superposing original sound source signals of at least two beam directions in the original sound source signals of a plurality of beam directions based on the null directions so as to obtain at least two preprocessed sound source signals with at least two null directions.

In the above solution, the superimposing the original sound source signals of at least two beam directions in the original sound source signals of multiple beam directions based on the null direction to obtain at least two preprocessed sound source signals with at least two null directions includes:

Dividing the original sound source signals in a plurality of beam directions into at least two parts;

And respectively superposing the original sound source signals of at least two beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.

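In the above solution, the converting the original sound source signals received by the at least two microphones into the original sound source signals with multiple beam directions, and the superimposing the original sound source signals of the plurality of beam directions based on the null directions, include: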

Converting the original sound source signals into original sound source signals of a plurality of beam directions pointing to fixed directions, wherein the fixed directions corresponding to the original sound source signals of different beam directions are different; the fixed direction corresponding to the original sound source signal of one beam direction is the direction pointing to the target sound source;

superposing original sound source signals of at least two beam directions of which the fixed directions are not the directions of the target sound source based on the null directions to obtain at least one preprocessed sound source signal with at least two null directions;

the original sound source signal whose fixed direction is one beam direction directed to the direction of the target sound source is taken as one of the pre-processed sound source signals.

In the above scheme, if the at least two microphones form a linear array, one null direction of the preprocessed sound source signal is: the two opposite phase angles corresponding to one direction of the preprocessed sound source signal;

If the at least two microphones form an annular array, one null direction of the preprocessed sound source signal is: one phase angle corresponding to one direction of the preprocessed sound source signal.

According to a second aspect of embodiments of the present disclosure, there is provided an audio signal processing apparatus, the apparatus comprising:

The conversion module is used for converting the original sound source signals received by the at least two microphones into original sound source signals in a plurality of beam directions; wherein, the original sound source signal of a beam direction has a null direction, and the null directions of the original sound source signals of different beam directions are different;

The superposition module is used for superposing the original sound source signals in a plurality of beam directions based on the null direction so as to obtain at least two preprocessed sound source signals; wherein at least one of said pre-processed sound source signals has at least two null directions suppressing disturbances.

In the above scheme, the device further includes:

And the separation module is used for carrying out blind source separation on at least two preprocessed sound source signals so as to obtain at least two target sound source signals.

In the above scheme, the device further includes:

and the correction module is used for obtaining at least two corrected target sound source information based on at least two target sound source signals and respective weight coefficients.

In the above scheme, the conversion module is configured to convert the original sound source signal into original sound source signals in a plurality of beam directions that all point to a direction of a target sound source;

the superposition module is configured to superimpose original sound source signals in at least two beam directions among the original sound source signals in multiple beam directions based on the null directions, so as to obtain one preprocessed sound source signal with at least two null directions and at least one preprocessed sound source signal with one null direction; or alternatively, so as to obtain at least two preprocessed sound source signals with at least two null directions.

In the above scheme, the superposition module is configured to divide the original sound source signals in multiple beam directions into at least two parts; and respectively superposing the original sound source signals of at least two beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.

In the above scheme, the conversion module is configured to convert the original sound source signal into original sound source signals in multiple beam directions pointing to a fixed direction, where the fixed directions corresponding to the original sound source signals in different beam directions are different; the fixed direction corresponding to the original sound source signal of one beam direction is the direction pointing to the target sound source;

The superposition module is used for superposing original sound source signals of at least two beam directions with fixed directions not pointing to the direction of the target sound source based on the null directions so as to obtain at least one preprocessed sound source signal with at least two null directions; the original sound source signal whose fixed direction is one beam direction directed to the direction of the target sound source is taken as one of the pre-processed sound source signals.

According to a third aspect of embodiments of the present disclosure, there is provided a terminal comprising:

A processor;

a memory for storing processor-executable instructions;

Wherein the processor is configured to implement, when executing the executable instructions, the audio signal processing method according to any embodiment of the disclosure.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium storing an executable program, wherein the executable program when executed by a processor implements the audio signal processing method according to any embodiment of the present disclosure.

The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:

In the embodiment of the disclosure, original sound source signals received by at least two microphones can be converted into original sound source signals in a plurality of beam directions, and the original sound source signal in each beam direction has a null direction; the original sound source signals of the beam directions are superposed based on the null directions to obtain at least two preprocessed sound source signals, wherein at least one of the preprocessed sound source signals has at least two null directions for suppressing interference. In this way, the embodiment of the disclosure can suppress interfering sound sources from at least two directions, so that the influence of interference, reverberation and the like on the target sound source can be greatly reduced, and the quality of the audio signal of the target sound source can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic diagram showing an application scenario of an audio signal processing method.

Fig. 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 3 is a beam schematic of an original sound source signal of a beam direction according to an exemplary embodiment.

Fig. 4 is a beam schematic of an original sound source signal of a beam direction according to an exemplary embodiment.

Fig. 5 is a beam schematic of an original sound source signal of a beam direction according to an exemplary embodiment.

Fig. 6 is a beam schematic of an original sound source signal of a beam direction according to an exemplary embodiment.

Fig. 7 is a beam schematic of an original sound source signal of a beam direction, according to an exemplary embodiment.

Fig. 8 is a beam schematic of an original sound source signal of a beam direction according to an exemplary embodiment.

Fig. 9 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 10 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 11 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 12 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 13 is a schematic diagram illustrating an audio signal processing method according to an exemplary embodiment.

Fig. 14 is a schematic diagram of an audio signal processing apparatus according to an exemplary embodiment.

Fig. 15 is a block diagram of a terminal according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.

Fig. 1 is a schematic diagram of an application scenario of an audio signal processing method. As shown in fig. 1, in this application scenario there is one target sound source and a plurality of interfering sound sources, where the plurality of interfering sound sources includes: television interference, human voice interference, outdoor noise, background noise, household noise, musical echoes, room reverberation, and the like. Here, the room reverberation may be a signal of the target sound source that has been reflected or refracted by the cavity of the room, an obstacle in the room, or the like; the room reverberation may also be a signal of an interfering sound source, such as human voice interference or television interference, that has been reflected or refracted by the cavity of the room or an obstacle in the room. Thus, under the interference of a plurality of interfering sound sources, the quality of the audio signal of the target sound source collected by the microphones is not high.

Fig. 2 is a flowchart illustrating a method of audio signal processing according to an exemplary embodiment, the method including the steps of:

Step S21: converting original sound source signals received by at least two microphones into original sound source signals in a plurality of beam directions;

Wherein, the original sound source signal of a beam direction has a null direction, and the null directions of the original sound source signals of different beam directions are different;

Step S22: superposing original sound source signals of a plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals;

Wherein at least one of said pre-processed sound source signals has at least two null directions suppressing disturbances.

Here, a null direction is a direction indicating an interfering sound source; that is, the beam forms a null in the direction of the interfering sound source.

Here, the interfering sound sources include a first type of interfering sound source and a second type of interfering sound source. The first type of interfering sound source may be sound emitted by various interference sources, for example, sound emitted by a person, an electronic device, or the like. The second type of interfering sound source may be reverberation or echo; for example, the echo or the reverberation produced when the first type of interfering sound source or the target sound source is reflected or refracted by an obstacle.

Here, the disturbing sound source includes, but is not limited to, at least one of: human voice interference sound sources, television interference sound sources, background noise, household noise, echoes and reverberation.

Here, the at least two microphones may be two or more. For example, the number of microphones is 2, 4 or 6, etc.

Here, the at least two microphones may be linear arrays or may be annular arrays. For example, as shown in fig. 3, the number of microphones is 4, and the 4 microphones are in a linear array. As another example, as shown in fig. 4, the number of microphones is 6, and the 6 microphones are in a circular array.

Here, the original sound source signals received by the at least two microphones are omnidirectional original sound source signals; for example, as shown in fig. 5, the at least two microphones receive an original sound source signal covering 360 degrees.

In one embodiment, the number of original sound source signals of the plurality of beam directions is more than 2. For example, the 360-degree original sound source signal in fig. 5 may be converted into original sound source signals of 4, 6, 8, etc. beam directions. Here, the angular range covered by each beam direction may be the same or different.

Here, one way to implement the step S21 is: determining a direction indicating each interfering sound source; an original sound source signal of one beam direction is obtained based on the direction of one interfering sound source, wherein the original sound source signal of one beam direction forms nulls in the direction of the interfering sound source.

Illustratively, as shown in fig. 6, the original sound source signal of 360 degrees is converted into original sound source signals of 4 beam directions, wherein the original sound source signals of one beam direction are represented by one beam; the original sound source signals in the 4 beam directions are beam 1, beam 2, beam 3, and beam 4, respectively. The 360-degree original sound source signal has an interference sound source at 90 degrees, 135 degrees, 180 degrees and 315 degrees, so that the null direction of the beam 1 is 90 degrees, the null direction of the beam 2 is 135 degrees, the null direction of the beam 3 is 180 degrees and the null direction of the beam 4 is 315 degrees.
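As a rough illustration of how a beam can keep the target direction while nulling one interferer, the following minimal sketch solves for the two weights of a narrowband two-microphone beamformer. The two-microphone geometry, the quarter-wavelength spacing, the single-frequency model and all function names are assumptions made for illustration only; the disclosure does not specify how the beams of fig. 6 are computed.

```python
import cmath
import math


def steering(theta_deg, spacing_wavelengths=0.25):
    """Steering vector of a two-microphone array lying on the 0-180 degree axis."""
    phase = 2 * math.pi * spacing_wavelengths * math.cos(math.radians(theta_deg))
    return (1 + 0j, cmath.exp(-1j * phase))


def null_steering_weights(target_deg, null_deg, spacing_wavelengths=0.25):
    """Solve w . a(target) = 1 and w . a(null) = 0 for the two weights."""
    t = steering(target_deg, spacing_wavelengths)
    n = steering(null_deg, spacing_wavelengths)
    det = t[0] * n[1] - t[1] * n[0]
    if abs(det) < 1e-12:
        raise ValueError("target and null directions are indistinguishable for this array")
    # Cramer's rule for [[t0, t1], [n0, n1]] @ [w0, w1] = [1, 0]
    return (n[1] / det, -n[0] / det)


def gain_db(weights, theta_deg, spacing_wavelengths=0.25):
    """Beam gain in dB toward theta_deg (floored to avoid log of zero)."""
    a = steering(theta_deg, spacing_wavelengths)
    response = weights[0] * a[0] + weights[1] * a[1]
    return 20 * math.log10(max(abs(response), 1e-12))


# Interferer directions taken from the fig. 6 example; target assumed at 0 degrees.
NULL_DIRECTIONS = (90, 135, 180, 315)
beams = {i + 1: null_steering_weights(0, n) for i, n in enumerate(NULL_DIRECTIONS)}
for beam_id, weights in beams.items():
    null = NULL_DIRECTIONS[beam_id - 1]
    print(f"beam {beam_id}: {gain_db(weights, 0):.1f} dB at 0 deg, "
          f"{gain_db(weights, null):.1f} dB at {null} deg")
```

Each of the four weight pairs keeps unit gain toward 0 degrees and places a null in one interferer direction, which mirrors the description of beam 1 to beam 4 above under the stated assumptions.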

Here, another way to implement the step S21 is as follows: the original sound source signals in the 360-degree direction are evenly divided into original sound source signals in a plurality of angle ranges, wherein the original sound source signals in each angle range are original sound source signals in one beam direction.

Illustratively, as shown in fig. 7, the 360-degree original sound source signal is converted into original sound source signals of 6 angular ranges; wherein the 6 angular ranges are respectively: 330 to 30 degrees (including 330 to 360 degrees and 0 to 30 degrees), 30 to 90 degrees, 90 to 150 degrees, 150 to 210 degrees, 210 to 270 degrees, and 270 to 330 degrees; the original sound source signals corresponding to the 6 angular ranges are beam 1, beam 2, beam 3, beam 4, beam 5 and beam 6, respectively. The null direction of beam 1 is 180 degrees, the null direction of beam 2 is 240 degrees, the null direction of beam 3 is 300 degrees, the null direction of beam 4 is 0 degrees, the null direction of beam 5 is 60 degrees, and the null direction of beam 6 is 120 degrees.
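The angular ranges and null directions in this example follow a regular pattern: each null lies opposite the centre of its range. A minimal sketch that reproduces the table is given below; the function name and the rule "null = centre + 180 degrees" are assumptions inferred from the listed values, not a rule stated in the disclosure.

```python
def equal_sectors(num_beams, first_center_deg=0.0):
    """Return (start, end, null) in degrees for each beam of an even 360-degree split.

    Assumption: the null of each beam is placed opposite the centre of its sector.
    """
    width = 360.0 / num_beams
    table = []
    for k in range(num_beams):
        center = (first_center_deg + k * width) % 360.0
        start = (center - width / 2) % 360.0
        end = (center + width / 2) % 360.0
        null = (center + 180.0) % 360.0
        table.append((start, end, null))
    return table


for beam_id, (start, end, null) in enumerate(equal_sectors(6), start=1):
    print(f"beam {beam_id}: {start:.0f} to {end:.0f} degrees, null at {null:.0f} degrees")
```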

Here, still another way to implement the step S21 is: the original sound source signal of 360 degrees direction is divided into original sound source signals of a plurality of angle ranges, wherein the plurality of angle ranges are unequal angle ranges.

Illustratively, the 360-degree original sound source signal is divided into original sound source signals of 4 unequal angular ranges; wherein the 4 angular ranges are respectively: 0 to 80 degrees, 80 to 180 degrees, 180 to 250 degrees, and 250 to 360 degrees.

In some embodiments, if the at least two microphones form a linear array, one null direction of the preprocessed sound source signal is: the two opposite phase angles corresponding to one direction of the preprocessed sound source signal.

Illustratively, the at least two microphones form a linear array, and the original sound source signals received by the at least two microphones are converted into original sound source signals in a plurality of beam directions, as shown in fig. 6. Here, the original sound source signals in the 4 beam directions are beam 1, beam 2, beam 3, and beam 4, respectively. One null direction of each of the 4 beams comprises the two opposite phase angles corresponding to one direction. For example, the 90-degree direction of beam 1 corresponds to two opposite phase angles, 90 degrees and 270 degrees, and both phase angles are null directions. That is, one null direction consists of two angles in one direction that are 180 degrees apart.

Illustratively, the at least two microphones form an annular array, and the original sound source signals received by the at least two microphones are converted into original sound source signals in a plurality of beam directions, as shown in fig. 8. Here, one null direction of each of the 4 beams is one phase angle of one direction. For example, the 90-degree direction of beam 1 corresponds to two opposite phase angles, 90 degrees and 270 degrees, but beam 1 forms a null only at the phase angle of 90 degrees.

Here, based on the characteristics of sound source signals collected by a linear microphone array, one null direction consists of the two opposite phase angles of one direction; based on the characteristics of sound source signals collected by an annular microphone array, one null direction consists of one phase angle of one direction. Thus, the embodiment of the disclosure can determine the beam pattern of the original sound source signal of each beam direction according to the actual acquisition conditions.
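One way to see why a linear array yields paired nulls while an annular array does not is to compare far-field steering vectors. In the sketch below, a linear array placed on the 0-180 degree axis produces identical steering vectors at 90 and 270 degrees, so any weighting that nulls one angle also nulls the other; an annular array distinguishes the two angles. The geometries, the quarter-wavelength dimensions and the narrowband far-field model are illustrative assumptions only.

```python
import cmath
import math


def steering_vector(mic_positions, theta_deg, wavelength=1.0):
    """Far-field narrowband steering vector for microphones at (x, y) positions."""
    theta = math.radians(theta_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    k = 2 * math.pi / wavelength
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in mic_positions]


def same_vector(a, b, tol=1e-9):
    """True when two steering vectors agree element by element."""
    return all(abs(p - q) < tol for p, q in zip(a, b))


# Hypothetical geometries, with distances expressed in wavelengths.
linear = [(0.25 * i, 0.0) for i in range(4)]  # 4 microphones on the x-axis
circular = [(0.25 * math.cos(math.radians(60 * i)),
             0.25 * math.sin(math.radians(60 * i))) for i in range(6)]  # 6 on a circle

linear_mirrored = same_vector(steering_vector(linear, 90),
                              steering_vector(linear, 270))
circular_mirrored = same_vector(steering_vector(circular, 90),
                                steering_vector(circular, 270))
print("linear array cannot tell 90 from 270 degrees:", linear_mirrored)
print("annular array cannot tell 90 from 270 degrees:", circular_mirrored)
```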

In the embodiment of the disclosure, original sound source signals received by at least two microphones can be converted into original sound source signals in a plurality of beam directions, and the original sound source signal in each beam direction has a null direction; the original sound source signals of the beam directions are superposed based on the null directions to obtain at least two preprocessed sound source signals, wherein at least one of the preprocessed sound source signals has at least two null directions for suppressing interference. In this way, the embodiment of the disclosure can suppress interfering sound sources from at least two directions and reduce the influence of the interfering sound sources on the target sound source, so that the quality of the audio signal of the target sound source can be improved.

In addition, the interfering sound source in the present disclosure may be sound emitted by various interference sources, or may be echo or reverberation produced when the sound emitted by various interference sources or by the target sound source is reflected or refracted; as such, the embodiments of the present disclosure can suppress interference from various interfering sound sources and reduce reverberation.

As shown in fig. 9, in some embodiments, the step S21 includes:

step S211: converting the original sound source signal into original sound source signals of a plurality of beam directions which are all directed to the direction of a target sound source;

The step S22 includes:

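Step S221: superposing original sound source signals of at least two beam directions in the original sound source signals of a plurality of beam directions based on the null directions to obtain one preprocessed sound source signal with at least two null directions and at least one preprocessed sound source signal with one null direction; or alternatively, superposing original sound source signals of at least two beam directions in the original sound source signals of a plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.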

Here, the original sound source signal of each beam direction is directed in the direction of the target sound source. For example, as shown in fig. 6, the direction of the target sound source is 0 degrees, and beams 1, 2, 3, and 4 are all directed to 0 degrees.

Here, the original sound source signal of each beam direction has one null direction, and the null directions of the original sound source signals of different beam directions are different. For example, referring again to fig. 6, the null direction for beam 1 is 90 degrees, the null direction for beam 2 is 135 degrees, the null direction for beam 3 is 180 degrees, and the null direction for beam 4 is 315 degrees.

Here, in the step S221, original sound source signals of at least two beam directions among the original sound source signals of the plurality of beam directions may be superimposed based on the null directions, and the original sound source signals of the remaining beam directions may or may not be superimposed.

Here, one way to implement the superposition based on the null direction may be: superimposing any at least two of the original sound source signals of the beam directions, each of which has one null direction.

For example, as shown in fig. 6, beam 1 and beam 2 may be superimposed, and beam 3 and beam 4 may not be superimposed; or beam 1, beam 2 and beam 3 are overlapped, and beam 4 is not overlapped; or beam 1 and beam 3 are superimposed and beam 2 and beam 4 are superimposed; etc.

Another way to implement the superposition based on the null direction may be: superimposing the original sound source signals of any at least two beam directions whose null directions differ in angle by less than a predetermined threshold. For example, if the angular difference between the null directions of beam 1 and beam 2 is less than a threshold angle of 90 degrees, beam 1 and beam 2 are determined to be superimposed; for another example, if the largest angular difference between the null directions of any two of beam 1, beam 2 and beam 3 is less than the threshold angle of 90 degrees, beam 1, beam 2 and beam 3 are determined to be superimposed.
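The threshold rule can be sketched as a pairwise check on null directions. The function names, the circular (wrap-around) definition of angular difference and the strict "less than" comparison are assumptions for illustration; the null directions are those of the fig. 6 example.

```python
from itertools import combinations


def angular_difference(a_deg, b_deg):
    """Smallest angle between two directions, in the range 0 to 180 degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def may_superimpose(null_dirs_deg, threshold_deg):
    """True when every pair of null directions differs by less than the threshold."""
    return all(angular_difference(a, b) < threshold_deg
               for a, b in combinations(null_dirs_deg, 2))


# Null directions of beams 1 to 4 in the fig. 6 example.
nulls = {1: 90, 2: 135, 3: 180, 4: 315}

pair_ok = may_superimpose([nulls[1], nulls[2]], 90)              # difference is 45 degrees
triple_ok = may_superimpose([nulls[1], nulls[2], nulls[3]], 90)  # largest difference is 90
print("beams 1 and 2:", pair_ok)
print("beams 1, 2 and 3:", triple_ok)
```

With the fig. 6 values, the largest difference among beams 1 to 3 is exactly 90 degrees, so under a strict comparison this group would only qualify with a threshold somewhat larger than 90 degrees.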

The case of superimposing beam 1 and beam 2 is exemplified below. Before superposition, both beam 1 and beam 2 correspond to 0 dB in the 0-degree direction; beam 1 is attenuated by 40 dB in the 90-degree direction (shown as -40 dB in fig. 6) and by about 8 dB in the 135-degree direction (shown as -8 dB in fig. 6); beam 2 is attenuated by about 8 dB in the 90-degree direction (shown as -8 dB in fig. 6) and by 40 dB in the 135-degree direction (shown as -40 dB in fig. 6). After superposition, the superimposed beam is not attenuated in the 0-degree direction, is attenuated by at least 8 dB in the 90-degree direction, and is attenuated by about 8 dB in the 135-degree direction; thus, the superimposed beam is attenuated in both the 90-degree and 135-degree directions, i.e., the superimposed beam suppresses two interfering sound sources.
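The arithmetic behind this example can be checked with a small calculation. The sketch below assumes that superposition is an equal-weight average of the two beams and that their responses add in phase, which gives an upper bound on the gain of the superimposed beam; the disclosure does not state the weighting or the phase relationship, so both are assumptions, as are the function names.

```python
import math


def db_to_amplitude(db):
    """Convert a gain in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude):
    """Convert a linear amplitude ratio to a gain in dB."""
    return 20 * math.log10(amplitude)


def superimposed_gain_db(gains_db):
    """Upper bound on the gain of an equal-weight average, assuming in-phase addition."""
    amplitudes = [db_to_amplitude(g) for g in gains_db]
    return amplitude_to_db(sum(amplitudes) / len(amplitudes))


# (beam 1, beam 2) gains in each direction, as read from the fig. 6 example.
at_0 = superimposed_gain_db([0, 0])
at_90 = superimposed_gain_db([-40, -8])
at_135 = superimposed_gain_db([-8, -40])
print(f"0 deg: {at_0:.1f} dB, 90 deg: {at_90:.1f} dB, 135 deg: {at_135:.1f} dB")
```

Under these assumptions the target direction stays at 0 dB while both interferer directions end up more than 8 dB down, which agrees in order of magnitude with the attenuation described above.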

Therefore, the embodiment of the disclosure can suppress a plurality of interfering sound sources on the premise of ensuring that the target sound source is not attenuated. Compared with the prior art, which suppresses only one interfering sound source, the method can greatly increase the number of interfering sound sources suppressed and can reduce reverberation and the like. For example, when there are several interfering sound sources, the embodiment of the disclosure can suppress not only the interfering sound sources emitted by human voices, televisions and the like, and background noise, but also the echoes, reverberation and the like caused by reflection, refraction and the like of the target sound source and the various interfering sound sources, so that the quality of the audio signal of the target sound source can be improved.

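In some embodiments, the superimposing the original sound source signals of at least two beam directions of the original sound source signals of the plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals having at least two null directions includes:

Dividing the original sound source signals in a plurality of beam directions into at least two parts;

Respectively superposing the original sound source signals of at least two beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.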

Here, the original sound source signals of the plurality of beam directions may be equally divided into two, three, four, five or more parts.

For example, as shown in fig. 6, 4 beams may be split into two parts, one of which is beam 1 and beam 2 and the other of which is beam 3 and beam 4; beam 1 and beam 2 are subsequently superimposed and beam 3 and beam 4 are superimposed.

For another example, if the original sound source signals of the beam directions comprise 5 beams, beam 1 to beam 5, then beam 1 to beam 5 can be divided into two parts, one part being beam 1 and beam 2, and the other part being beam 3, beam 4 and beam 5.

For another example, if the original sound source signals of the beam directions comprise 9 beams, beam 1 to beam 9, then beam 1 to beam 9 may be divided into three parts, wherein the first part is beam 1 to beam 3, the second part is beam 4 to beam 6, and the third part is beam 7 to beam 9.

In this way, in the embodiment of the present disclosure, the sound source signals in multiple beam directions may be equally divided as much as possible, so as to obtain at least two preprocessed sound source signals with substantially identical numbers of null directions; on one hand, each preprocessing sound source signal can inhibit the audio signals of a plurality of interference sound sources as much as possible, and on the other hand, the separation of the interference sound sources and the target sound sources during the subsequent blind source separation is also facilitated.
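The division described above can be sketched as splitting an ordered list of beams into contiguous parts of nearly equal size. The function name and the choice to give any leftover beams to the last parts (which matches the 5-beam example of 2 then 3) are assumptions for illustration.

```python
def split_beams(beams, num_parts):
    """Split beams into num_parts contiguous groups whose sizes differ by at most one."""
    if not 1 <= num_parts <= len(beams):
        raise ValueError("num_parts must be between 1 and the number of beams")
    base, extra = divmod(len(beams), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The last `extra` parts receive one additional beam.
        size = base + (1 if i >= num_parts - extra else 0)
        parts.append(beams[start:start + size])
        start += size
    return parts


print(split_beams([1, 2, 3, 4], 2))
print(split_beams([1, 2, 3, 4, 5], 2))
print(split_beams(list(range(1, 10)), 3))
```

Each resulting part would then be superimposed separately to give one preprocessed sound source signal per part.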

In an embodiment, the superimposing the original sound source signals of at least two beam directions of the original sound source signals of the plurality of beam directions based on the null directions to obtain at least two preprocessed sound source signals having at least two null directions includes:

Dividing the original sound source signals in a plurality of beam directions into two parts;

And respectively superposing the two original sound source signals based on the null directions to obtain two preprocessed sound source signals with at least two null directions.

In the embodiment of the present disclosure, since the number of preprocessed sound source signals obtained is two, the original sound source signals of as many beam directions as possible can be superimposed, so that each preprocessed sound source signal suppresses as many interfering sound sources as possible; moreover, in the subsequent blind source separation of the preprocessed sound source signals, the input is reduced to the minimum of two input signals, so that the complexity of the blind source separation calculation can be greatly reduced.

As shown in fig. 10, in some embodiments, the step S21 includes:

Step S212: converting the original sound source signals into original sound source signals of a plurality of beam directions pointing to fixed directions, wherein the fixed directions corresponding to the original sound source signals of different beam directions are different; the fixed direction corresponding to the original sound source signal of one beam direction is the direction pointing to the target sound source;

The step S22 includes:

Step S222: superposing original sound source signals of at least two beam directions of which the fixed directions are not the direction of the target sound source based on the null directions to obtain at least one preprocessed sound source signal with at least two null directions; the original sound source signal of the beam direction whose fixed direction points to the target sound source is taken as one of the preprocessed sound source signals.

Here, the fixed direction may be an arbitrary specified direction.

For example, if the original sound source signal is converted into 6 beam directions, the original sound source signals of the 6 beams may point in fixed directions of 30 degrees, 90 degrees, 150 degrees, 210 degrees, 270 degrees, and 330 degrees, respectively. In that case, 360 degrees may be divided equally among the 6 beams, which then cover 0 to 60 degrees, 60 to 120 degrees, 120 to 180 degrees, 180 to 240 degrees, 240 to 300 degrees, and 300 to 360 degrees, respectively.

Here, the fixed direction of the original sound source signal of one beam direction among the original sound source signals of the plurality of beam directions is a direction directed toward the target sound source.

For example, if the original sound source signal is converted into 6 beam directions and the direction of the target sound source is 0 degrees, one of the fixed directions is 0 degrees, and the other 5 fixed directions may be 60 degrees, 120 degrees, 180 degrees, 240 degrees, and 300 degrees. In that case, 360 degrees can be divided equally among the 6 beams; as shown in fig. 7, the 6 beams may be: beam 1 covering 330 to 360 degrees and 0 to 30 degrees, beam 2 covering 30 to 90 degrees, beam 3 covering 90 to 150 degrees, beam 4 covering 150 to 210 degrees, beam 5 covering 210 to 270 degrees, and beam 6 covering 270 to 330 degrees.
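The equal division in this example can be reproduced with a short calculation. The sketch below is illustrative only: the function name `beam_layout` is hypothetical, and it assumes that each beam covers a sector of equal width centered on its fixed direction, with the first beam centered on the target direction.

```python
def beam_layout(num_beams, target_deg=0.0):
    """Return (fixed_direction, sector_start, sector_end) in degrees per beam.

    Assumption: sectors have equal width and are centered on the fixed
    direction; angles are wrapped into the range [0, 360).
    """
    width = 360.0 / num_beams
    layout = []
    for i in range(num_beams):
        center = (target_deg + i * width) % 360.0
        start = (center - width / 2) % 360.0
        end = (center + width / 2) % 360.0
        layout.append((center, start, end))
    return layout


# Six beams with the target sound source at 0 degrees.
for number, (center, start, end) in enumerate(beam_layout(6, 0.0), start=1):
    print(f"beam {number}: fixed direction {center:g}, covers {start:g} to {end:g}")
```

Because angles are wrapped, a sector that crosses 0 degrees is reported with a start value larger than its end value (for beam 1 above, from 330 to 30 degrees).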

Of course, in other embodiments, it is also possible that the fixed direction of the original sound source signal with one beam direction among the original sound source signals with multiple beam directions is the direction pointing to the target sound source; the null direction of the original sound source signal of the other beam directions should correspond as much as possible to the direction of the interfering sound source.

Of course, in other embodiments, the angular ranges covered by the original sound source signals of multiple beam directions pointing in different fixed directions may also be unequal; for example, 360 degrees may be divided unequally among 6 beams, such as beam 1 covering 0 to 65 degrees, beam 2 covering 65 to 95 degrees, beam 3 covering 95 to 182 degrees, beam 4 covering 182 to 220 degrees, beam 5 covering 220 to 300 degrees, and beam 6 covering 300 to 360 degrees.

In this way, the omnidirectional (i.e., 360-degree) original sound source signal is converted into original sound source signals of a plurality of beam directions, each having one null direction, and the original sound source signal of one of those beam directions points toward the target sound source; the target sound source can thus be preliminarily separated.

Here, in the step S222, the original sound source signals of at least two beam directions whose fixed directions are not the directions of the target sound source are superimposed based on the null directions, which may be:

superposing original sound source signals of at least two beam directions of which all fixed directions are not the directions of the target sound source based on the null directions; or alternatively

Dividing the original sound source signals of the beam directions whose fixed directions are not the direction of the target sound source into at least two parts, and superposing the original sound source signals of at least two beam directions within each part.

For example, as shown in fig. 7, the fixed direction of the beam 1 is a direction pointing to the target sound source; the fixed directions of beams 2 to 6 are not the directions pointing to the target sound source; beams 2 to 6 may be superimposed; alternatively, beams 2 to 6 may be divided into two parts, and beam 2 and beam 3 may be superimposed, and beam 4, beam 5, and beam 6 may be superimposed, respectively.

Here, when beams 2 to 6 are superimposed, the superimposed signal is attenuated in at least the 90-degree, 150-degree, 210-degree, 270-degree, and 330-degree directions.
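The two grouping options of step S222 can be sketched as simple bookkeeping over beam numbers. The names `group_for_s222` and `superpose` are hypothetical, and treating superposition as a sample-by-sample sum is an assumption made for illustration; the disclosure does not fix the superposition operation in this passage.

```python
def group_for_s222(beam_ids, target_beam, split=False):
    """Return groups of beam ids; each group yields one preprocessed signal.

    The beam pointing at the target is kept on its own. The remaining beams
    are superposed together, or divided into two parts when `split` is True.
    """
    others = [b for b in beam_ids if b != target_beam]
    needed = 4 if split else 2  # every superposed group needs >= 2 beams
    if len(others) < needed:
        raise ValueError("not enough non-target beams for this grouping")
    if not split:
        return [[target_beam], others]
    half = len(others) // 2
    return [[target_beam], others[:half], others[half:]]


def superpose(signals):
    """Sample-by-sample sum of equally long signals (assumed operation)."""
    return [sum(samples) for samples in zip(*signals)]


beams = [1, 2, 3, 4, 5, 6]
print(group_for_s222(beams, target_beam=1))
print(group_for_s222(beams, target_beam=1, split=True))
```

With six beams and beam 1 pointing at the target, the split option reproduces the grouping of the example: beams 2 and 3 in one part, and beams 4, 5, and 6 in the other.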

Of course, in other embodiments, in step S222, the original sound source signals of the multiple beam directions may be directly divided into at least two parts, and the at least two parts of original sound source signals may be respectively superimposed. Here, instead of extracting the original sound source signals in the beam direction directed to the target sound source alone, the original sound source signals in the opposite directions of the plurality of beams may be superimposed in a grouped manner, and a plurality of interfering sound sources may be suppressed.

In the disclosed embodiment, the influence of reverberation, echo, interference sources and the like on the target sound source can be reduced by obtaining at least one preprocessed sound source signal with at least two null directions, so that a plurality of interference sound sources can be restrained. In addition, an original sound source signal with a fixed direction being the beam direction pointing to the direction of the target sound source can be extracted, so that the target sound source can be initially extracted, and further processing of the target sound source is facilitated.

As shown in fig. 11, in some embodiments, the method further comprises:

Step S23: performing blind source separation on at least two of the preprocessed sound source signals to obtain at least two target sound source signals.

Here, the at least two target sound source signals include a first type target sound source signal and at least one second type target sound source signal; the first type of target sound source signals are audio signals comprising target sound sources, and the second type of target sound source signals are audio signals comprising interference sound sources.

Here, one implementation of step S23 is: obtaining at least two mixed observation signals based on the at least two preprocessed sound source signals; obtaining an estimation matrix based on the preprocessed sound source signals; obtaining a separation matrix at each frequency domain point based on the estimation matrix; obtaining a target separation matrix based on the separation matrices at the frequency domain points; and obtaining at least two target sound source signals based on the target separation matrix and the at least two observation signals.
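As an illustration of the last of these steps only, the sketch below applies a 2x2 separation matrix to two observation signals at each frequency domain point. How the separation matrix is estimated is not shown; here it is simply assumed to be known, and it is taken as the inverse of a made-up mixing matrix so that the result can be checked. All names and numbers are hypothetical.

```python
def invert_2x2(m):
    """Invert a 2x2 (possibly complex) matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_matrix(m, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def apply_separation(separation, observations):
    """Apply one separation matrix per frequency point to the observations."""
    return [apply_matrix(w, x) for w, x in zip(separation, observations)]


# Made-up example with two frequency points and two sources.
mixing = [[[1.0, 0.5], [0.3, 1.0]],
          [[1.0, 0.2j], [0.4, 1.0]]]
sources = [[2 + 1j, -1j], [0.5, 1 - 2j]]
observations = [apply_matrix(a, s) for a, s in zip(mixing, sources)]

# Assumption: the separation matrix is the exact inverse of the mixing matrix.
separation = [invert_2x2(a) for a in mixing]
recovered = apply_separation(separation, observations)
print(recovered)
```

In a real blind source separation the mixing matrix is unknown, so the separation matrix would have to be estimated from the observation signals rather than computed by direct inversion.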

Of course, in other embodiments, any other blind source separation technique may be used in step S23, provided that the preprocessed sound source signals are separated into the target sound source signals of the respective sound sources; a target sound source signal may be an audio signal of an interfering sound source or an audio signal of the target sound source.

In the embodiment of the disclosure, the sound source signals of the plurality of sound sources can be separated based on the blind source separation technology, and the audio signal of the target sound source can be separated from the sound source signals, so that the audio signal of the target sound source with high quality can be obtained.

Of course, in the above embodiment, if the number of preprocessed sound source signals obtained in step S22 is 2, or is less than a predetermined number, the computational complexity of the blind source separation in step S23 can be reduced. For example, if the number of preprocessed sound source signals is 2, the blind source separation has only 2 input signals (i.e., the preprocessed sound source signals); the fewer the inputs, the smaller the amount of calculation, so the amount of calculation in the blind source separation can be greatly reduced.

Referring again to fig. 11, in some embodiments, the method further comprises:

Step S24: obtaining at least two corrected target sound source signals based on the at least two target sound source signals and their respective weight coefficients.

Here, if a target sound source signal includes an audio signal of the target sound source, its weight coefficient is a first weight coefficient; if a target sound source signal includes an audio signal of an interfering sound source, its weight coefficient is a second weight coefficient; wherein the first weight coefficient is greater than the second weight coefficient.

For example, there are 2 target sound source signals, S1 and S2, wherein S1 is an audio signal of the target sound source and S2 is an audio signal of an interfering sound source; it may be determined that the first weight coefficient of S1 is 80% and the second weight coefficient of S2 is 20%. Thus, in the corrected target sound source signals, the audio signal of the target sound source is weighted four times as heavily as the audio signal of the interfering sound source; that is, the interfering sound source is kept at one fourth of the weight of the target sound source. The signal-to-noise ratio can therefore be greatly improved; and even if a small amount of interfering sound source signal remains in the audio signal of the target sound source, its influence on the target sound source is very small.

Of course, the target sound source signals may include audio signals of a plurality of interfering sound sources. For example, there are 3 target sound source signals, S1, S2, and S3, wherein S1 is the audio signal of the target sound source, and S2 and S3 are audio signals of interfering sound sources. The corresponding weight coefficients of S1, S2, and S3 may also be determined; for example, the first weight coefficient of S1 is 80%, the second weight coefficient of S2 is 15%, and the second weight coefficient of S3 is 5%. It is only required that the first weight coefficient be larger than each second weight coefficient.

Thus, in the embodiment of the disclosure, the signal-to-noise ratio of the target sound source signal can be adjusted by determining the corresponding weight for each target sound source signal; if the weight coefficient of the audio signal of the target sound source is larger than that of the interference sound source, the signal to noise ratio of the target sound source signal can be greatly improved, and therefore the audio signal of the target sound source with higher quality is obtained.
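The weighting of step S24 can be sketched as a per-signal scaling. The function name `apply_weights` and the sample values are hypothetical; the sketch assumes that the first signal in the list is the one containing the target sound source and that a weight coefficient is applied by multiplying every sample of the signal.

```python
def apply_weights(signals, weights):
    """Scale each separated signal by its weight coefficient.

    Assumption: signals[0] is the signal of the target sound source, so its
    weight (the first weight coefficient) must exceed all other weights.
    """
    if len(signals) != len(weights):
        raise ValueError("one weight coefficient is needed per signal")
    if any(weights[0] <= w for w in weights[1:]):
        raise ValueError("the first weight coefficient must be the largest")
    return [[w * x for x in signal] for signal, w in zip(signals, weights)]


# Made-up samples: S1 is the target sound source, S2 an interfering source.
s1 = [1.0, -0.5, 0.25]
s2 = [0.2, 0.4, -0.6]
corrected_s1, corrected_s2 = apply_weights([s1, s2], [0.8, 0.2])
print(corrected_s1, corrected_s2)
```

With weight coefficients of 80% and 20%, the relative level of the target signal with respect to the interfering signal grows by the ratio of the two weights.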

A specific example is provided below in connection with any of the embodiments described above:

As shown in fig. 12, an embodiment of the present disclosure discloses an audio signal processing method, which is applied to a terminal, and includes the following steps:

Step S41: acquiring original sound source signals received by two microphones;

In an alternative embodiment, as shown in fig. 13, the terminal acquires two original sound source signals (i.e., x_1 and x_2) and inputs the two original sound source signals to the beam forming module.

Step S42: converting the original sound source signals into original sound source signals of 6 beam directions; dividing the original sound source signals of the 6 beam directions into two parts; and superposing the original sound source signals within each of the two parts to obtain two preprocessed sound source signals;

Here, the original sound source signal of one beam direction has one null direction, and the null directions of different original sound source signals are different. Each preprocessed sound source signal has three null directions.
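To make the notion of a null direction concrete, the sketch below evaluates the response of a simple two-microphone null-steering beam at a single frequency. This particular beamformer (subtracting a phase-aligned copy of the second microphone from the first) is an assumption chosen for illustration and is not stated in the disclosure; the microphone spacing, frequency, and speed of sound are likewise made-up values.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # meters per second (assumed)


def phase_lag(angle_deg, freq_hz, spacing_m):
    """Phase by which microphone 2 lags microphone 1 for a far-field source.

    The angle is measured from the axis of the two-microphone linear array.
    """
    delay = spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
    return 2 * math.pi * freq_hz * delay


def beam_gain(source_deg, null_deg, freq_hz=1000.0, spacing_m=0.05):
    """Magnitude of y = x1 - x2 * exp(+j * lag(null)) for a unit plane wave."""
    x1 = 1.0
    x2 = cmath.exp(-1j * phase_lag(source_deg, freq_hz, spacing_m))
    y = x1 - x2 * cmath.exp(1j * phase_lag(null_deg, freq_hz, spacing_m))
    return abs(y)


# One beam with its null steered to 60 degrees.
for angle in (0, 60, 120, 180, 240, 300):
    print(f"source at {angle:3d} degrees: gain {beam_gain(angle, 60):.3f}")
```

For a linear array this response depends only on the cosine of the angle, so the beam also vanishes at the mirror angle (here 300 degrees), which is consistent with the document's statement that, for a linear array, one null direction corresponds to two opposite angles.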

Referring again to fig. 13, in an alternative embodiment, in the beam forming module, the two original sound source signals (i.e., x_1 and x_2) are converted into original sound source signals of 6 beam directions; the original sound source signals of the 6 beam directions are respectively beam 1 (beamform_1), beam 2 (beamform_2), beam 3 (beamform_3), beam 4 (beamform_4), beam 5 (beamform_5), and beam 6 (beamform_6); beamform_1, beamform_2, and beamform_3 are superimposed to obtain one preprocessed sound source signal (Y1), and beamform_4, beamform_5, and beamform_6 are superimposed to obtain another preprocessed sound source signal (Y2).

Step S43: performing blind source separation on the two preprocessed sound source signals to obtain two target sound source signals;

Referring again to fig. 13, in an alternative embodiment, Y1 and Y2 are input to a blind source separation module. In the blind source separation module, blind source separation is performed on Y1 and Y2, and two target sound source signals, namely S1 and S2, are output.

Step S44: obtaining two corrected target sound source signals based on the two target sound source signals and their respective weight coefficients.

Referring again to fig. 13, in an alternative embodiment, S1 and S2 are input to a post-processing module. In the post-processing module, the weight coefficients of S1 and S2 are obtained respectively; the corrected target sound source signal S1' is obtained based on S1 and the weight coefficient of S1, and the corrected target sound source signal S2' is obtained based on S2 and the weight coefficient of S2.

In the embodiment of the disclosure, the original sound source signals are converted into original sound source signals of a plurality of beam directions, which are then superimposed to obtain preprocessed sound source signals with a plurality of null directions, so that interfering sound sources in more directions can be suppressed; that is, the influence of interfering sound sources, echo, reverberation, and the like on the target sound source can be reduced. Moreover, the audio signal of the target sound source can be separated by blind source separation, so that a high-quality audio signal of the target sound source can be obtained. In addition, because the post-processing module can further correct each target sound source signal, the audio signal of the target sound source is further enhanced and the audio signal of the interfering sound source is further attenuated, which improves the signal-to-noise ratio of the separated target sound source and thereby further improves the quality of the audio signal of the target sound source.
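The data flow of steps S41 to S44 can be sketched as a chain of modules. This is a structural sketch only: the beam forming and blind source separation stages are passed in as callables because their internals are not given here, the stand-in callables in the usage example perform no real processing, and superposition is assumed to be a sample-by-sample sum. All names are hypothetical.

```python
def run_example(x1, x2, beamform, separate, weights):
    """Chain the modules of the example: beams -> Y1, Y2 -> S1, S2 -> S1', S2'.

    `beamform(x1, x2)` must return 6 beam signals; `separate(y1, y2)` must
    return two separated signals. Both are supplied by the caller.
    """
    beams = beamform(x1, x2)
    if len(beams) != 6:
        raise ValueError("this example expects exactly 6 beam directions")
    # Superpose beams 1-3 and beams 4-6 (assumed to be a sample-wise sum).
    y1 = [sum(samples) for samples in zip(*beams[:3])]
    y2 = [sum(samples) for samples in zip(*beams[3:])]
    s1, s2 = separate(y1, y2)
    w1, w2 = weights
    # Post-processing: scale each separated signal by its weight coefficient.
    return [w1 * v for v in s1], [w2 * v for v in s2]


# Stand-ins that only make the sketch runnable; they do no real processing.
def fake_beamform(x1, x2):
    return [[a + b for a, b in zip(x1, x2)] for _ in range(6)]


def fake_separate(y1, y2):
    return y1, y2


out1, out2 = run_example([1.0, 2.0], [0.5, -1.0],
                         fake_beamform, fake_separate, (0.8, 0.2))
print(out1, out2)
```

Passing the stages in as callables keeps the sketch independent of any particular beamformer or separation algorithm, which matches the document's remark that other blind source separation techniques may be used.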

Fig. 14 is a block diagram of an audio signal processing apparatus according to an exemplary embodiment. Referring to fig. 14, the audio signal processing apparatus includes:

A conversion module 61, configured to convert original sound source signals received by at least two microphones into original sound source signals in multiple beam directions; wherein, the original sound source signal of a beam direction has a null direction, and the null directions of the original sound source signals of different beam directions are different;

a superposition module 62, configured to superimpose the original sound source signals in multiple beam directions based on the null direction, so as to obtain at least two preprocessed sound source signals; wherein at least one of said pre-processed sound source signals has at least two null directions suppressing disturbances.

Referring again to fig. 14, in some embodiments, the apparatus further comprises:

a separation module 63, configured to perform blind source separation on at least two of the preprocessed sound source signals, so as to obtain at least two target sound source signals; and

a correction module 64, configured to obtain at least two corrected target sound source signals based on the at least two target sound source signals and their respective weight coefficients.

In some embodiments, the conversion module 61 is configured to convert the original sound source signal into original sound source signals of a plurality of beam directions each pointing in a direction of a target sound source;

the superimposing module 62 is configured to superimpose the original sound source signals of at least two beam directions among the original sound source signals of the plurality of beam directions based on the null directions, to obtain one preprocessed sound source signal having at least two null directions and at least one preprocessed sound source signal having one null direction.

In some embodiments, the superposition module 62 is configured to divide the original sound source signals of multiple beam directions equally into at least two parts; and respectively superposing the original sound source signals of at least two beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.

In some embodiments, the conversion module 61 is configured to convert the original sound source signal into original sound source signals in a plurality of beam directions pointing in a fixed direction, where the fixed directions corresponding to the original sound source signals in different beam directions are different; the fixed direction corresponding to the original sound source signal of one beam direction is the direction pointing to the target sound source;

The superimposing module 62 is configured to superimpose the original sound source signals of at least two beam directions whose fixed directions are not the direction of the target sound source, based on the null directions, so as to obtain at least one preprocessed sound source signal having at least two null directions; the original sound source signal of the beam direction whose fixed direction points to the target sound source is taken as one of the preprocessed sound source signals.

The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

The embodiment of the disclosure also provides a terminal, comprising:

A processor;

a memory for storing processor-executable instructions;

Wherein the processor is configured to implement the audio signal processing method according to any embodiment of the disclosure when executing the executable instructions.

The memory may include various types of storage media, which are non-transitory computer storage media capable of retaining the information stored thereon after the communication device is powered down.

The processor may be coupled to the memory via a bus or the like for reading an executable program stored on the memory, for example, implementing at least one of the methods shown in fig. 2, 8-12.

Embodiments of the present disclosure also provide a computer-readable storage medium storing an executable program, wherein the executable program when executed by a processor implements the audio signal processing method according to any embodiment of the present disclosure. For example, at least one of the methods shown in fig. 2, 9 to 12 is implemented.

Fig. 15 is a block diagram illustrating a terminal 800 according to an example embodiment. For example, the terminal 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.

Referring to fig. 15, the terminal 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls overall operation of the terminal 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the terminal 800. Examples of such data include instructions for any application or method operating on the terminal 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The power supply component 806 provides power to the various components of the terminal 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 800.

The multimedia component 808 includes a screen that provides an output interface between the terminal 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the terminal 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the terminal 800. For example, the sensor assembly 814 may detect an on/off state of the terminal 800 and the relative positioning of components, such as a display and keypad of the terminal 800. The sensor assembly 814 may also detect a change in position of the terminal 800 or a component of the terminal 800, the presence or absence of user contact with the terminal 800, an orientation or acceleration/deceleration of the terminal 800, and a change in temperature of the terminal 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communication between the terminal 800 and other devices, either wired or wireless. The terminal 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the terminal 800 can be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of terminal 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method of audio signal processing, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

3. The method according to claim 2, wherein the method further comprises:

and obtaining at least two corrected target sound source signals based on the at least two target sound source signals and the respective weight coefficients.

4. A method according to claim 1 or 2, characterized in that,

The converting the original sound source signals received by at least two microphones into original sound source signals of a plurality of beam directions includes:

5. The method of claim 4, wherein the superimposing the original sound source signals of at least two beam directions of the plurality of beam directions based on the null directions to obtain at least two of the preprocessed sound source signals having at least two null directions comprises:

6. A method according to claim 1 or 2, characterized in that,

The at least two microphones receiving an original sound source signal and converting the original sound source signal into original sound source signals of a plurality of beam directions, including:

7. A method according to claim 1 or 2, wherein if at least two of said microphones are in a linear array, one null direction of said pre-processed sound source signal is: the two opposite phase angles corresponding to the preprocessed sound source signal in one direction are null directions;

8. An audio signal processing apparatus, the apparatus comprising:

9. The apparatus of claim 8, wherein the apparatus further comprises:

10. The apparatus of claim 9, wherein the apparatus further comprises:

And the correction module is used for obtaining at least two corrected target sound source signals based on the at least two target sound source signals and the respective weight coefficients.

11. The device according to claim 8 or 9, wherein,

The conversion module is used for converting the original sound source signals into original sound source signals of a plurality of beam directions which are all directed to the direction of the target sound source;

12. The apparatus of claim 11, characterized in that,

The superposition module is used for equally dividing the original sound source signals in a plurality of beam directions into at least two parts; and respectively superposing the original sound source signals of at least two beam directions based on the null directions to obtain at least two preprocessed sound source signals with at least two null directions.

13. The device according to claim 8 or 9, wherein,

The conversion module is used for converting the original sound source signals into original sound source signals of a plurality of beam directions pointing to fixed directions, wherein the fixed directions corresponding to the original sound source signals of different beam directions are different; the fixed direction corresponding to the original sound source signal of one beam direction is the direction pointing to the target sound source;

14. The device according to claim 8 or 9, wherein,

If at least two microphones are linear arrays, one null direction of the preprocessed sound source signal is: the two opposite phase angles corresponding to the preprocessed sound source signal in one direction are null directions;

15. A terminal, comprising:

A processor;

a memory for storing processor-executable instructions;

Wherein the processor is configured to implement the audio signal processing method of any one of claims 1-7 when executing the executable instructions.

16. A computer-readable storage medium, characterized in that the readable storage medium stores an executable program, wherein the executable program, when executed by a processor, implements the audio signal processing method of any one of claims 1 to 7.