
WO2021004067A1 - Display device (Dispositif d'affichage) - Google Patents


Info

Publication number
WO2021004067A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
voice
circuit
sound
display device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/075958
Other languages
English (en)
Chinese (zh)
Inventor
李本友
于云涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910620438.2A (CN110349582B)
Priority claimed from CN201910619184.2A (CN110223707A)
Application filed by Hisense Visual Technology Co Ltd
Publication of WO2021004067A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • This application relates to the field of electronic technology, and in particular to a display device.
  • With the continuous development of electronic technology, more and more display devices, such as mobile phones, tablet computers, and televisions, can interact with users through voice. The user can directly speak the instruction to be executed to the display device.
  • The display device collects the voice signal of its surrounding environment through a microphone, recognizes the instruction spoken by the user in the voice signal, and then executes the function corresponding to the instruction.
  • Some display devices also implement a far-field sound pickup function on top of the voice interaction function. After receiving the voice signal, such a display device applies processing such as filtering, denoising, and echo cancellation, and then recognizes the instruction in the processed voice signal to obtain higher far-field voice recognition accuracy.
  • When the display device performs echo cancellation on the received voice signal, the echo itself is caused by the original playback signal determined by the System on Chip (SOC) controller in the display device.
  • The display device can therefore directly use the original playback signal in the SOC controller as the echo reference signal when performing echo cancellation on the received voice signal.
  • the present application provides a display device to improve the echo cancellation effect when the display device performs echo cancellation.
  • the display device includes: a voice processing circuit, a power amplifier, a speaker, a voice collection circuit, and an echo processing circuit;
  • the voice processing circuit, the power amplifier and the loudspeaker are connected in sequence; the voice processing circuit is connected to the voice collection circuit and the echo processing circuit respectively;
  • the voice processing circuit is used to send an original playback signal to the power amplifier;
  • the power amplifier is used to process the original playback signal, and then send the obtained signal to be played to the speaker for playback;
  • the voice collection circuit is used to collect voice signals to be processed in the environment where the display device is located;
  • the echo processing circuit is configured to obtain the signal to be played sent by the power amplifier to the speaker;
  • the voice processing circuit is further configured to perform echo cancellation processing on the voice signal to be processed according to the signal to be played.
  • the echo processing circuit is also used to preprocess the signal to be played; the voice processing circuit is specifically used to perform echo cancellation processing on the voice signal to be processed according to the preprocessed signal to be played.
  • the preprocessing includes: amplitude reduction processing.
  • the power amplifier is also used to obtain, through differential processing of the signal to be played, the left channel signal and the right channel signal corresponding to the signal to be played, and send them to the speaker for playback;
  • the preprocessing also includes: converting to single-ended processing.
  • the left channel signal includes: a left channel positive differential signal and a left channel negative differential signal;
  • the right channel signal includes: a right channel positive differential signal and a right channel negative differential signal;
  • the echo processing circuit includes: a left channel processing circuit and a right channel processing circuit;
  • the left channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the left channel positive differential signal and the left channel negative differential signal; wherein the left channel processing circuit includes: a first input resistor, a first feedback resistor, and a first operational amplifier; the left channel positive differential signal is connected to the non-inverting input terminal of the first operational amplifier, the left channel negative differential signal is connected to the inverting input terminal of the first operational amplifier through the first input resistor, and the output terminal of the first operational amplifier is connected to the inverting input terminal of the first operational amplifier through the first feedback resistor.
  • the right channel signal includes: a right channel positive differential signal and a right channel negative differential signal;
  • the echo processing circuit includes: a right channel processing circuit;
  • the right channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the right channel positive differential signal and the right channel negative differential signal; wherein the right channel processing circuit includes: a second input resistor, a second feedback resistor, and a second operational amplifier; the right channel positive differential signal is connected to the non-inverting input terminal of the second operational amplifier, the right channel negative differential signal is connected to the inverting input terminal of the second operational amplifier through the second input resistor, and the output terminal of the second operational amplifier is connected to the inverting input terminal of the second operational amplifier through the second feedback resistor.
  • the voice collection circuit is composed of a MIC array, and the MIC array includes a plurality of MICs; the voice signal to be processed is a pulse density modulated PDM signal collected by the MIC array.
  • the MIC array includes a first MIC, a second MIC, a third MIC, and a fourth MIC arranged in sequence;
  • the first MIC and the third MIC are recorded as the first group of MICs, and the second MIC and the fourth MIC are recorded as the second group of MICs;
  • the MIC array collects the to-be-processed voice signal through the first group of MICs and the second group of MICs in turn.
  • the voice processing circuit is also used to perform sampling and analog-to-digital conversion processing on the PDM signal.
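The source does not say how the PDM signal is turned into multi-bit samples. As a rough illustration only, one common approach is to low-pass filter and decimate the 1-bit stream. The sketch below uses a block average as the filter; the helper names (`pdm_encode`, `pdm_to_pcm`) and the decimation factor of 64 are assumptions made for this example, and `pdm_encode` exists only to generate test input.

```python
def pdm_encode(samples):
    """First-order sigma-delta modulator (test-input generator only).

    Maps samples in [-1, 1] to a list of 0/1 bits whose local density
    tracks the input amplitude.
    """
    bits, acc = [], 0.0
    for s in samples:
        acc += s
        if acc >= 0.0:
            bits.append(1)
            acc -= 1.0   # feedback of +1 for a "1" bit
        else:
            bits.append(0)
            acc += 1.0   # feedback of -1 for a "0" bit
    return bits


def pdm_to_pcm(bits, decimation=64):
    """Convert a PDM bit stream to PCM by averaging each block of bits.

    Each bit is mapped to +1/-1; the block mean is one output sample.
    The decimation factor is a hypothetical choice.
    """
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        pcm.append(sum(2 * b - 1 for b in block) / decimation)
    return pcm


# A constant input of 0.5 should come back as roughly 0.5 per output sample.
bits = pdm_encode([0.5] * 640)
pcm = pdm_to_pcm(bits)
```

A real design would normally use a better filter (for example a CIC stage followed by an FIR), but the block average is enough to show the bit-density-to-amplitude idea.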
  • the voice processing circuit is further configured to recognize the instruction in the to-be-processed voice signal after echo cancellation processing, and execute the function corresponding to the instruction.
  • the voice processing circuit is further configured to send the to-be-processed voice signal after echo cancellation processing to a server, so that the server recognizes the instruction in the voice signal and then sends an instruction message to the voice processing circuit;
  • the voice processing circuit receives the instruction message sent by the server, and executes the function corresponding to the instruction message.
  • the present application provides a display device.
  • the display device includes a speaker and a far-field voice processing circuit; the far-field voice processing circuit includes:
  • Speaker used to play the sound output by the device
  • a sound pickup circuit for picking up far-field sounds where the far-field sounds include far-field voices emitted by a user and sounds played by the speaker and transmitted to the sound pickup circuit;
  • a preprocessing circuit connected to the sound pickup circuit to receive the picked-up far-field sound, and the preprocessing circuit is connected to the front end of the speaker to obtain the playback sound recovery signal;
  • the echo processing circuit is connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound to obtain the far-field voice uttered by the user.
  • the preprocessing circuit includes:
  • the pre-processing circuit is coupled with the sound pickup circuit and the front end of the speaker to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit.
  • the preprocessing circuit is further used to adjust the phase of the picked-up far-field sound and the playback sound recovery signal, so that the playback sound recovery signal leads the picked-up far-field sound in phase by no more than a preset duration.
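The phase requirement above can be checked in software by estimating how many samples the picked-up sound trails the playback sound recovery signal. The following is a minimal sketch using brute-force cross-correlation; the source does not describe the adjustment method, and the names `estimate_lag` and `PRESET_MAX_LEAD`, the 7-sample delay, and the 16-sample limit are all hypothetical. The search covers only non-negative lags, so it cannot detect a reference that trails the microphone signal.

```python
import random


def estimate_lag(ref, mic, max_lag):
    """Return the lag (0..max_lag samples) by which mic trails ref.

    Picks the lag with the largest cross-correlation score.
    """
    n = min(len(ref), len(mic)) - max_lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(ref[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


random.seed(1)
ref = [random.uniform(-1.0, 1.0) for _ in range(500)]   # playback sound recovery signal
true_delay = 7                                          # hypothetical acoustic delay in samples
mic = [0.0] * true_delay + [0.5 * r for r in ref[:-true_delay]]

lag = estimate_lag(ref, mic, max_lag=32)
PRESET_MAX_LEAD = 16                                    # hypothetical preset duration in samples
within_preset = 0 <= lag <= PRESET_MAX_LEAD
```

If `within_preset` were false, the reference could be delayed (or the microphone signal advanced) by buffering until the lag falls inside the allowed window.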
  • the preprocessing circuit further includes:
  • a first encoder; the preprocessing circuit is connected to the front end of the speaker through the first encoder, and the first encoder performs analog-to-digital conversion on the playback sound recovery signal.
  • the display device includes a power amplifier; the power amplifier is connected between the speaker and the echo processing circuit, and is used to provide the speaker with the multi-channel sound output by the device; and the playback sound recovery signal includes the multi-channel sound obtained from the front end of the speaker;
  • the first encoder is also used for synthesizing multiple sounds obtained from the front end of the speaker.
  • the sound pickup circuit includes a microphone array and a second encoder electrically connected to the microphone array, wherein the microphone array is used for picking up the far-field sound, and the second encoder is used to perform analog-to-digital conversion on the far-field sound;
  • the second encoder is also used for synthesizing multiple far-field sounds picked up by the microphone array.
  • the far-field sound processing circuit further includes a speech enhancement circuit and a sound source localization circuit, and the echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit and the sound source localization circuit respectively.
  • the speech enhancement circuit is connected to the sound source localization circuit to receive the sound source localization result output by the sound source localization circuit and, according to the sound source localization result, enhance the far-field sound after echo cancellation to generate the far-field voice to be uploaded.
  • the display device further includes a voice engine circuit connected to the output terminal of the voice enhancement circuit; the voice engine circuit performs wake-up word recognition on the far-field voice to be uploaded and, when the preset wake-up word is recognized, encodes the far-field voice to be uploaded and transmits it to the designated terminal;
  • the voice engine circuit is also used to receive an instruction corresponding to the far-field voice returned from a designated terminal.
  • the display device has a main control chip, and the echo processing circuit, voice enhancement circuit, sound source localization circuit, and voice engine circuit are all integrated in the main control chip.
  • a far-field speech processing circuit includes:
  • a sound pickup circuit for picking up far-field sounds where the far-field sounds include far-field voices emitted by a user and sounds played by the speaker and transmitted to the sound pickup circuit;
  • a preprocessing circuit connected to the sound pickup circuit to receive the picked-up far-field sound, and the preprocessing circuit is connected to the front end of the speaker to obtain the playback sound recovery signal;
  • the echo processing circuit is connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound to obtain the far-field voice uttered by the user.
  • The power amplifier processes the sound signal to be played, so that signal undergoes non-linear changes between the input and the output of the power amplifier. This solution therefore obtains the playback sound recovery signal from the back end of the power amplifier, at the front end of the speaker. Even after non-linear signal processing such as equalization and amplification in the power amplifier, the playback sound recovery signal obtained by the preprocessing circuit is very close to the speaker playback sound picked up by the sound pickup circuit.
  • Using the playback sound recovery signal to cancel the echo in the picked-up far-field sound can therefore greatly reduce the echo interference in the far-field voice uttered by the user and improve the accuracy of recognizing the far-field voice, thereby improving the sensitivity of far-field sound pickup to interrupt wake-up and improving the user experience;
  • This embodiment sets up a preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that many existing display device SOC chips lack the corresponding interfaces and cannot receive the far-field sound transmitted by the microphone array. The technical solution of this application therefore promotes the adoption of far-field voice human-computer interaction technology on display devices.
  • The display device provided by the present application can obtain, through the echo processing circuit, the signal to be played output from the power amplifier to the speaker, and use it as the echo reference signal to perform echo cancellation on the voice signal to be processed received by the voice collection circuit. This reference more accurately represents the sound actually played by the speaker that is contained in the voice signal to be processed, so the voice processing circuit achieves a better echo cancellation effect, which improves the accuracy of subsequent voice signal recognition.
  • Figure 1 is a schematic diagram of the application scenario of this application.
  • FIG. 2 is a schematic diagram of the processing flow of the voice signal by the electronic device display device of this application;
  • FIG. 3 is a schematic diagram of a structure of an electronic device display device in the related art.
  • FIG. 5 is a schematic structural diagram of an embodiment of an electronic device display device provided by this application.
  • Figure 6 is a schematic diagram of the structure of a power amplifier
  • FIG. 7 is a schematic structural diagram of an embodiment of an electronic device display device provided by this application.
  • FIG. 8 is a schematic structural diagram of an embodiment of an echo processing circuit provided by this application.
  • FIG. 9 is a schematic structural diagram of an embodiment of an electronic device display device provided by this application.
  • FIG. 10 is a schematic structural diagram of an embodiment of an arrangement of a MIC array in a voice collection circuit provided by this application;
  • Figure 11 is a schematic circuit diagram of an embodiment of a voice collection circuit provided by this application.
  • FIG. 12 is a schematic diagram of the processing flow of the voice signal to be processed and the signal to be played by the voice processing circuit provided by this application;
  • FIG. 13 is a front view of an embodiment of a display device of the present application.
  • Figure 14 is an exploded view of part of the structure of Figure 13;
  • Fig. 16 is a circuit connection block diagram of an embodiment of the far-field speech processing circuit of the present application.
  • FIG. 17 is a circuit connection block diagram of another embodiment of the far-field speech processing circuit of the present application.
  • FIG. 18 is a circuit connection block diagram of still another embodiment of the far-field speech processing circuit of the present application.
  • Figure 19 is a circuit diagram of the interface between the microphone array and the second encoder
  • FIG. 20 is a block diagram of the functional structure of an embodiment of the main control chip
  • Fig. 21 is a block diagram of partial circuit connections of an embodiment of the far-field speech processing circuit of the present application.
  • FIG. 1 is a schematic diagram of the application scenario of this application, in which each embodiment of this application is applied to a display device 1 with a voice interaction function, that is, the display device 1 can execute related functions corresponding to the instructions according to the instructions spoken by the user 2.
  • the display device 1 includes: mobile phones, tablet computers, notebook computers, televisions, and other smart devices with related signal processing functions or data processing functions, such as smart watches, smart speakers, smart appliances, etc.
  • In the following, the display device 1 is described as a television set by way of example, without limitation.
  • the flow of processing the voice signal by the display device 1 can be referred to as shown in FIG. 2, where FIG. 2 is a schematic diagram of the processing flow of the voice signal by the display device of this application.
  • the display device 1 collects sound signals of the surrounding environment where the display device 1 is located through a microphone to obtain a voice signal, and then performs detection processing on the collected voice signal.
  • the key words are used to detect the instructions issued by the user 2 in the voice signal that need to be executed by the display device, such as "turn on”, “change channel”, “increase or decrease volume”, and “turn off”.
  • the display device 1 executes the function corresponding to the recognized instruction. For example, after the display device 1 recognizes that the command in the collected voice signal is "shutdown", it executes the shutdown operation.
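The keyword-to-function step described above can be pictured as a small lookup table. This is only an illustrative sketch: the handler functions, the `HANDLERS` table, and the assumption that recognition yields plain text are not taken from the source.

```python
def shut_down():
    """Hypothetical handler for the 'shutdown' instruction."""
    return "shutting down"


def change_channel():
    """Hypothetical handler for the 'change channel' instruction."""
    return "changing channel"


# Hypothetical mapping from detected keywords to device functions.
HANDLERS = {
    "shutdown": shut_down,
    "turn off": shut_down,
    "change channel": change_channel,
}


def execute_instruction(recognized_text, handlers=HANDLERS):
    """Run the first handler whose keyword appears in the recognized text."""
    text = recognized_text.lower()
    for keyword, handler in handlers.items():
        if keyword in text:
            return handler()
    return None  # no known instruction detected


result = execute_instruction("Please SHUTDOWN now")
```

Substring matching is the simplest possible detector; a real device would work on recognizer output with confidence scores rather than raw text.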
  • In general, the display device 1 can detect the instructions spoken by the user 2 in the voice signal with a high recognition rate.
  • However, when the distance between the user 2 and the display device 1 is relatively long, the instructions given by the user 2 reach the device only weakly, and the voice signal includes interference such as noise and echo, so the rate at which the display device 1 recognizes instructions spoken by a distant user 2 is low. Therefore, some display devices 1 also have a far-field sound pickup function.
  • After receiving a voice signal that includes instructions uttered by a distant user 2, such a device also applies filtering, denoising, and echo cancellation to the voice signal, and then detects the instruction of the user 2 in the processed voice signal, which improves the recognition accuracy for instructions spoken by a user 2 far away from the display device 1.
  • When the display device 1 is a television, the television itself may be playing sound through its speakers while, in order to recognize the instructions spoken by the user 2, the display device 1 is collecting the voice signals of its environment. The collected voice signal then inevitably includes the sound played by the display device 1 through the speaker. Therefore, for the display device 1 to accurately detect the instructions spoken by the user 2, it needs to eliminate from the collected voice signal the sound played by the display device 1 itself. This processing is called "echo cancellation" in some technologies.
  • FIG. 3 is a schematic structural diagram of a display device in the related art.
  • The display device 1 shown in FIG. 3 includes: a MIC board 11, a main board 12, a left channel speaker 123 and a right channel speaker 124.
  • The SOC121 on the main board 12 is used to determine the original playback signal to be played by the display device 1 and send it to the AMP122 through the I2S interface for processing; the AMP122 then amplifies the original playback signal and converts it from a single-ended signal into a left channel signal and a right channel signal, which are sent to the left channel speaker 123 and the right channel speaker 124, respectively, for playback.
  • the microphone 111 on the MIC board 11 is used to collect the voice signal of the environment where the display device 1 is located, and send the voice signal to the codec unit 112 for codec processing, convert it into a voice signal in I2S format and send it to the MIC board MCU113. After the voice signal is further processed by the MCU113, it is sent to the SOC121 on the main board 12 through the USB interface.
  • the SOC121 can determine the original playback signal that the display device needs to play, and it needs to perform echo cancellation processing on the received voice signal, so the SOC121 can directly use the original playback signal that needs to be played as the echo reference signal.
  • the SOC121 After the voice signal sent by the MCU113 undergoes echo cancellation processing, the SOC121 then performs keyword detection, instruction recognition, and execution of functions corresponding to the instruction according to the voice signal after the echo cancellation processing.
  • Because the SOC121 in the display device 1 can determine the original playback signal to be played, it can use the original playback signal as the echo reference signal to perform echo cancellation on the collected voice signal.
  • However, before playback through the speaker, the original playback signal determined by the SOC121 is amplified and subjected to some non-linear processing by the AMP122 to obtain the signal that is actually played. As a result, the signal actually played by the speaker after AMP122 processing differs considerably from the original playback signal determined by the SOC121.
  • The echo that needs to be eliminated from the voice signal received by the SOC121 is the signal actually played by the speaker. If the SOC121 still performs echo cancellation based only on the original playback signal that has not been processed by the AMP122, the reference signal does not closely reproduce the signal actually played by the speaker, which weakens the echo cancellation effect and may affect the accuracy of subsequent voice signal recognition.
  • The present application provides a display device that collects the sound signal output to the speaker after AMP processing as the echo reference signal and performs echo cancellation on the voice signal, thereby improving the echo cancellation effect on the voice signal and, in turn, the accuracy of subsequent recognition of the voice signal.
  • the display device 3 includes: a voice processing circuit 31, a power amplifier 32, a speaker 33, a voice collection circuit 34, and an echo processing circuit 35. Among them, the voice processing circuit 31, the power amplifier 32, and the speaker 33 are connected in sequence, and the voice processing circuit 31 is connected to the voice collecting circuit 34 and the echo processing circuit 35 respectively.
  • the voice processing circuit 31, the power amplifier 32 and the speaker 33 are jointly used to implement the voice playback function of the display device 3.
  • The voice processing circuit 31 may be a circuit on a system-on-chip (SOC) on the motherboard of the display device 1, or a circuit on another device with processing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). This application does not limit the specific implementation of the voice processing circuit.
  • the voice processing circuit 31 is used to determine the original playback signal corresponding to the sound to be played by the display device 1 and send the original playback signal to the power amplifier 32 for processing.
  • the processing includes: amplifying the original playback signal.
  • The power amplifier 32 may also be called an operational amplifier or an amplifier (AMP for short). After the power amplifier 32 receives the original playback signal sent by the voice processing circuit 31, it amplifies that signal to obtain the signal to be played. The amplified signal to be played is sent to the speaker 33, and the speaker 33 in the display device 3 plays it.
  • The voice collection circuit 34, the echo processing circuit 35, and the voice processing circuit 31 are jointly used to implement the voice signal collection and voice signal processing functions of the display device 3.
  • The voice collection circuit 34 may be a microphone (MIC for short) provided in the display device 3. The voice collection circuit 34 is used to collect sound signals in the surrounding environment of the display device 3 as the voice signals to be processed, and to send the collected voice signals to be processed to the voice processing circuit 31 for subsequent processing.
  • the echo processing circuit 35 is used to collect the to-be-played signal amplified by the power amplifier and output from the power amplifier 32 to the speaker 33, and send the collected to-be-played signal to the voice processing circuit 31 for subsequent processing.
  • After receiving the to-be-processed voice signal sent by the voice collection circuit 34 and the to-be-played signal sent by the echo processing circuit 35, the voice processing circuit 31 uses the to-be-played signal as the echo reference signal and performs echo cancellation processing on the to-be-processed voice signal.
  • the present application does not limit the specific manner in which the voice processing circuit 31 performs echo cancellation; and the echo cancellation can be implemented by a hardware circuit in the voice processing circuit 31, or it can also be implemented by a processor in the voice processing circuit 31 in a way of program software.
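Since the application leaves the echo cancellation method open, the following is only one possible software illustration: a normalized least-mean-squares (NLMS) adaptive filter that takes the signal tapped after the amplifier as its reference. The function name, tap count, step size, echo path, and test signals are all assumptions made for this sketch.

```python
import math
import random


def nlms_echo_cancel(mic, ref, taps=16, mu=0.2, eps=1e-8):
    """Subtract an adaptive estimate of the echo (filtered ref) from mic."""
    w = [0.0] * taps      # adaptive estimate of the speaker-to-microphone path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo
        e = d - y                                    # echo-cancelled sample
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


def power(sig):
    """Mean squared value of a signal."""
    return sum(s * s for s in sig) / len(sig)


random.seed(0)
n = 4000
ref = [random.uniform(-1.0, 1.0) for _ in range(n)]    # to-be-played signal (echo reference)
echo_path = [0.6, 0.3, -0.2, 0.1]                      # hypothetical acoustic path
echo = [sum(h * ref[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)]
voice = [0.05 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for the user
mic = [e + v for e, v in zip(echo, voice)]             # to-be-processed voice signal

cleaned = nlms_echo_cancel(mic, ref)
# After convergence, what remains besides the voice should be far weaker than the echo.
residual = [c - v for c, v in zip(cleaned[n // 2:], voice[n // 2:])]
```

The closer the reference is to what the speaker actually emits, the simpler the path the filter has to model, which is the motivation the application gives for tapping the signal after the amplifier.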
  • FIG. 5 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • the embodiment shown in FIG. 5 is a specific arrangement of various circuits in the display device based on the embodiment shown in FIG. 4.
  • the voice collection circuit 34 can be provided on the MIC board 301 of the display device 3, and the voice processing circuit 31, the power amplifier 32 and the echo processing circuit 35 can all be provided on the main board 302 of the display device 3.
  • The display device provided in the embodiments shown in FIG. 4 and FIG. 5 can obtain, through the echo processing circuit, the signal to be played that the power amplifier outputs to the speaker, and then use it as the echo reference signal to perform echo cancellation processing on the voice signal to be processed received by the voice collection circuit.
  • Because the to-be-played signal output by the power amplifier has already undergone processing such as amplification and can be played directly through the speaker, the difference between the to-be-played signal collected by the echo processing circuit and the sound actually played by the speaker of the display device is small.
  • the voice processing circuit uses the to-be-played signal output by the power amplifier as the echo reference signal, and performs echo cancellation processing on the to-be-processed voice signal, it can more accurately represent the sound signal actually played by the speaker in the to-be-processed voice signal, thereby This enables the voice processing circuit to achieve a better echo cancellation effect when performing echo cancellation processing on the voice signal to be processed, thereby improving the accuracy of subsequent voice signal recognition.
  • the power amplifiers in some display devices also need to be specially configured.
  • For example, when the display device is a television, the audio signal played by the speakers of the television needs to meet requirements such as audio power. Therefore, the amplitude of the signal to be played output from the power amplifier to the speaker is relatively large. For example, if the audio power requirement of a 55-inch TV is 10 W, the amplitude of the signal to be played can reach 9 V.
  • In contrast, the signal amplitude that the voice processing circuit in the display device can receive is small. When the voice processing circuit is an SOC, the upper limit of the effective (RMS) value of the signal amplitude that the SOC can receive is generally 1 V.
  • the echo processing circuit also needs to perform amplitude reduction processing on the collected signal to be played before sending it to the voice processing circuit for processing.
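To get a feel for how much amplitude reduction these figures imply, the short calculation below assumes that the 9 V figure is the peak amplitude of a sinusoid (the source does not say whether it is a peak or an effective value) and compares its RMS value with the 1 V limit. The function name is hypothetical.

```python
import math


def required_attenuation(peak_volts, rms_limit_volts):
    """Return (linear ratio, dB) needed to bring a sine wave under an RMS limit."""
    rms = peak_volts / math.sqrt(2)          # RMS of a sine wave with the given peak
    ratio = rms / rms_limit_volts
    return ratio, 20 * math.log10(ratio)


ratio, db = required_attenuation(9.0, 1.0)
print(f"attenuate by at least {ratio:.2f}x ({db:.1f} dB)")
```

Under that assumption the reference path needs to attenuate by a factor of a little over 6, or about 16 dB; if 9 V were already an RMS value, the factor would be 9 instead.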
  • When the speakers specifically include a left-channel speaker and a right-channel speaker, the power amplifier is also used to convert the signal to be played into differential signals and output them to the speakers for playback.
  • Specifically, the power amplifier converts the signal to be played from a single-ended signal into a differential left channel signal and a differential right channel signal, and then sends them to the left-channel speaker and the right-channel speaker, respectively, for playback.
  • Figure 6 is a schematic structural diagram of a power amplifier. The power amplifier receives a signal to be played sent by the voice processing circuit, and the signal to be played includes a left channel signal and a right channel signal. The power amplifier amplifies the left channel signal and the right channel signal separately and converts each of them from a single-ended signal into a differential signal; the two differential left channel signals are then sent to the left channel speaker for playback, and the two differential right channel signals are sent to the right channel speaker for playback.
  • The left channel signal includes a differential AMP-Lout- signal and an AMP-Lout+ signal, and the right channel signal includes a differential AMP-Rout+ signal and an AMP-Rout- signal.
  • FIG. 7 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • The echo processing circuit 35 also needs to receive the left channel differential signal and the right channel differential signal sent by the power amplifier 32. After receiving them, it converts the differential signals received from the power amplifier into single-ended signals and sends the single-ended signal of the left channel and the single-ended signal of the right channel to the voice processing circuit.
  • This embodiment provides a specific implementation of the echo processing circuit, which reduces the amplitude of the signal to be played from the power amplifier and converts it to single-ended; the processed left channel signal and right channel signal are then sent to the voice processing circuit for processing.
  • FIG. 8 is a schematic structural diagram of an embodiment of the echo processing circuit provided by this application. As shown in FIG. 8, the echo processing circuit specifically includes a right channel processing circuit and a left channel processing circuit.
  • the input end of the first operational amplifier N1A can be connected to the AMP-Rout+ signal and the AMP-Rout- signal output by the power amplifier as shown in FIG. 6.
  • the AMP-Rout- signal of the right channel is processed by the first capacitor C11, and then connected to the inverting input terminal IN- of the first operational amplifier N1A through the first input resistor R11; the AMP-Rout+ of the right channel
  • the output terminal OUT of the first operational amplifier N1A is also connected through the first feedback resistor R12.
  • The positive input terminal IN+ of the first operational amplifier N1A is grounded, and the positive input terminal IN+ and the inverting input terminal IN- of the first operational amplifier N1A are "virtually shorted", so that the voltages at the positive input terminal IN+ and the inverting input terminal IN- are both zero.
  • Because the input resistance at the inverting input terminal IN- is high, a "virtual open circuit" is formed, so that almost no current flows into or out of the inverting input terminal IN-.
  • the first input resistor R11 and the first feedback resistor R12 are connected in series, and the current flowing through the first input resistor R11 and the first feedback resistor R12 is the same.
  • The ratio of the voltage at the output terminal OUT of the first operational amplifier N1A to the input voltage applied to the first input resistor R11 therefore equals, in magnitude, the ratio of the first feedback resistor R12 to the first input resistor R11.
  • The voltage of the single-ended signal AMP-RIN output by the first operational amplifier N1A is smaller than the voltage of the differential signals AMP-Rout- and AMP-Rout+ at the input of the first operational amplifier N1A. That is, the first operational amplifier N1A performs differential-to-single-ended conversion and amplitude reduction at the same time, and the single-ended signal AMP-RIN output by the first operational amplifier N1A can be used as the right channel signal of the signal to be played and sent directly to the voice processing circuit for processing.
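The amplitude reduction set by the feedback and input resistors can be illustrated with the ideal inverting-amplifier relation. This is only a sketch under the ideal op-amp assumptions described above (virtual short, virtual open circuit); the resistor values below are hypothetical and are not taken from the application.

```python
def inverting_gain(r_feedback: float, r_input: float) -> float:
    """Closed-loop gain of an ideal inverting amplifier: Vout / Vin = -Rf / Rin."""
    return -r_feedback / r_input

# Hypothetical component values, chosen so that |gain| < 1 (amplitude reduction).
R11_OHM = 100e3   # first input resistor (value assumed)
R12_OHM = 10e3    # first feedback resistor (value assumed)

gain = inverting_gain(R12_OHM, R11_OHM)
v_in = 9.0                  # amplitude quoted in the text for the power amplifier output
v_out = gain * v_in         # sign flips because the stage is inverting

print(f"gain = {gain:.2f}, output amplitude = {abs(v_out):.2f} V")
```

With these assumed values the 9 V input is brought below the 1 V limit mentioned for the SOC; any pair with R12 smaller than R11 reduces the amplitude in the same way.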
  • the input end of the second operational amplifier N1B can be connected to the AMP-Lout+ signal and the AMP-Lout- signal output by the power amplifier as shown in FIG. 6.
  • the AMP-Lout- signal of the left channel is processed by the third capacitor C31 and is connected to the inverting input terminal IN- of the second operational amplifier N1B through the third input resistor R31; the AMP-Lout+ of the left channel
  • the output terminal OUT of the second operational amplifier N1B is also connected through the second feedback resistor R32.
  • The positive input terminal IN+ of the second operational amplifier N1B is grounded, and the positive input terminal IN+ and the inverting input terminal IN- of the second operational amplifier N1B are "virtually shorted", so that the voltages at the positive input terminal IN+ and the inverting input terminal IN- are both zero.
  • Because the input resistance at the inverting input terminal IN- is high, a "virtual open circuit" is formed, so that almost no current flows into or out of the inverting input terminal IN-.
  • the third input resistor R31 and the second feedback resistor R32 are connected in series, and the current flowing through the third input resistor R31 and the second feedback resistor R32 is the same.
  • The ratio of the voltage at the output terminal OUT of the second operational amplifier N1B to the input voltage applied to the third input resistor R31 therefore equals, in magnitude, the ratio of the second feedback resistor R32 to the third input resistor R31.
  • The second operational amplifier N1B likewise performs differential-to-single-ended conversion and amplitude reduction at the same time, and the single-ended signal AMP-LIN output by the second operational amplifier N1B can be used as the left channel signal of the signal to be played and sent directly to the voice processing circuit for processing.
  • Since the voice collection circuit 34 provided by the present application is only used to collect the voice data to be processed, all subsequent processing of the voice data is executed by the voice processing circuit 31. Therefore, the voice collection circuit 34 provided in the present application may be a MIC array, and the to-be-processed voice signal received by the voice processing circuit is a pulse density modulation (Pulse Density Modulation, PDM) signal collected directly by the MIC array.
  • FIG. 9 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • the voice collection circuit 34 of the display device 3 is a 4MIC array.
  • The MIC array is described here taking 4MIC as an example.
  • The voice collection circuit 34 may also be 2MIC, 8MIC, or 16MIC; only the number of MICs changes and the implementation principle is the same, so this is not described again.
  • FIG. 10 is a schematic structural diagram of an embodiment of the arrangement of the MIC array in the voice collection circuit provided by this application.
  • The 4 MICs of the MIC array can be arranged in order from left to right inside the display device 1. The figure also takes the case where the display device 1 is a television as an example.
  • FIG. 11 is a schematic circuit diagram of an embodiment of the voice acquisition circuit provided by this application, in which the four MICs (MIC1, MIC2, MIC3, and MIC4) are arranged in parallel in the circuit structure; MIC1 and MIC3 are denoted as the first group D0, and MIC2 and MIC4 are denoted as the second group D1.
  • the collected PDM signals are used as voice signals to be processed and sent to the voice processing circuit for processing.
  • the four MICs of MIC1, MIC2, MIC3 and MIC4 can be controlled through the PDM_CLK signal.
  • the L/R pin of MIC1 is directly connected to VDD through the resistor R1
  • the L/R pin of MIC1 is set to a high level by VDD.
  • the L/R pin of MIC2 is directly grounded through resistor R879.
  • resistor R9 is not connected in the figure, the L/R pin of MIC2 is set to low level.
  • the L/R pin of MIC3 is set to high level
  • the L/R pin of MIC4 is set to low level.
  • The CLK pins of the four MICs (MIC1, MIC2, MIC3 and MIC4) are connected to the PDM_CLK signal, which is a square wave. Between a rising edge of the PDM_CLK signal and the next falling edge, MIC1 and MIC3, the first group D0, collect the voice signal to be processed and send the collected PDM_D0 signal and PDM_D1 signal to the voice processing circuit. Between a falling edge of the PDM_CLK signal and the next rising edge, MIC2 and MIC4, the second group D1, collect the voice signal to be processed and send the collected PDM_D0 signal and PDM_D1 signal to the voice processing circuit.
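The clock-phase selection described above means that a receiver sampling a data line on every half cycle of PDM_CLK sees the bits of two microphones interleaved. The sketch below shows how such a stream could be separated. It assumes, for illustration only, that one microphone with its L/R pin high and one with its L/R pin low share a data line and that capture starts on the clock-high half cycle; the function name is hypothetical.

```python
def split_pdm_line(bits):
    """Split a data line sampled once per clock half cycle into two bit streams.

    Even positions are taken as the clock-high half cycle (L/R pin high),
    odd positions as the clock-low half cycle (L/R pin low).
    """
    high_phase = bits[0::2]
    low_phase = bits[1::2]
    return high_phase, low_phase

# Eight half-cycle samples captured from one data line (made-up data).
line = [1, 0, 1, 1, 0, 0, 1, 0]
mic_high, mic_low = split_pdm_line(line)

print(mic_high)  # bits attributed to the microphone active while the clock is high
print(mic_low)   # bits attributed to the microphone active while the clock is low
```

Which physical microphone maps to which half cycle depends on the L/R pin wiring and on the microphone part, so the mapping here should be read as an assumption.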
  • the to-be-processed voice signals collected by different groups of MICs will be received at different times, and in the embodiments of the present application, the to-be-processed voice signals received by the voice processing circuit are PDM signals.
  • the voice processing circuit can perform echo cancellation processing on the voice signal to be processed based on the received signal to be played, and the voice processing circuit can further perform operations such as voice recognition and semantic understanding on the voice signal to be processed.
  • FIG. 12 is a schematic diagram of the processing flow of the voice signal to be processed and the signal to be played by the voice processing circuit provided in this application.
  • After the voice processing circuit 31 receives the to-be-processed voice signal from the voice collection circuit, the to-be-processed voice signal is first filtered and then sampled at 16 kHz to obtain the digitized voice signal to be processed. After the digitized voice signal to be processed undergoes gain control and delay control, it is sent to a direct memory access (Direct Memory Access, DMA) unit for processing.
  • When the preprocessed signal to be played is received from the echo processing circuit, it is first subjected to analog-to-digital conversion and 16 kHz sampling to obtain the digitized signal to be played. The digitized signal to be played is then also subjected to gain control and delay control, and is sent to the DMA unit for processing.
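The filtering and 16 kHz sampling applied to the PDM voice signal can be pictured as low-pass filtering followed by decimation. The sketch below uses a plain block average (boxcar filter), which is a swapped-in simplification rather than the filter used by the voice processing circuit; the PDM clock rate and the function name are assumptions.

```python
def pdm_to_pcm(bits, decimation):
    """Average each block of `decimation` PDM bits and map density 0..1 to -1..1."""
    samples = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        density = sum(block) / decimation      # fraction of ones in the block
        samples.append(2.0 * density - 1.0)    # 0 -> -1.0, 0.5 -> 0.0, 1 -> +1.0
    return samples

PDM_CLOCK_HZ = 1_024_000   # assumption: the clock rate is not given in the text
PCM_RATE_HZ = 16_000       # 16 kHz sampling as described above
decimation = PDM_CLOCK_HZ // PCM_RATE_HZ

# Three blocks: all ones, all zeros, and an alternating pattern.
bits = [1] * decimation + [0] * decimation + [1, 0] * (decimation // 2)
pcm = pdm_to_pcm(bits, decimation)
print(decimation, pcm)
```

A production decimator would normally use a sharper multi-stage filter; the block average is used here only because it keeps the relationship between pulse density and sample value easy to see.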
  • The DMA unit is the memory of the voice processing circuit, and it may take the form of DDR memory.
  • The two signals obtained by the DMA unit are stored in the static random access memory (Static Random-Access Memory, SRAM) of the voice processing circuit, and the SRAM can be the hard disk of the voice processing circuit.
  • the voice processing circuit uses the signal to be played stored in the SRAM as an echo reference signal, and performs echo cancellation processing on the voice data to be processed to obtain the final voice data.
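The text does not name the echo cancellation algorithm. As one common way of using a reference signal for this purpose, the sketch below applies a normalized least-mean-squares (NLMS) adaptive filter; it is an illustrative substitute, not the method of the application, and the tap count, step size, and synthetic signals are all assumptions.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate     # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual

# Synthetic test: the "microphone" hears only the reference, attenuated and
# delayed by two samples, so a good canceller should leave almost nothing.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * r for r in ref[:-2]]

residual = nlms_echo_cancel(mic, ref)
tail_echo = sum(s * s for s in mic[-200:])
tail_residual = sum(s * s for s in residual[-200:])
print(f"echo energy {tail_echo:.3f}, residual energy {tail_residual:.6f}")
```

The closer the reference is to what the speaker actually plays, the shorter and simpler the path the adaptive filter has to model, which is the motivation given above for taking the reference from the power amplifier output.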
  • Before performing echo cancellation, the voice processing circuit also needs to set the amplitude of the voice signal to be processed and the amplitude of the signal to be played, in order to improve the efficiency of echo cancellation processing.
  • Delay control is applied to the voice signal to be processed and the signal to be played because the voice processing circuit receives the two signals from different circuits, and because the echo cancellation performed by the voice processing circuit is an asynchronous operation that lags behind the signals collected in real time. Therefore, after the voice processing circuit receives the to-be-processed voice signal and the to-be-played signal, it needs to synchronize the two.
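One way to synchronize the two signals is to estimate their relative delay by cross-correlation and then shift the reference accordingly. The sketch below is an assumption about how delay control could be realized, not a description of the circuit's actual mechanism; the function names and test data are hypothetical.

```python
import random

def estimate_delay(mic, ref, max_lag):
    """Return the lag in samples (0..max_lag) at which ref best matches mic."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate ref against mic shifted back by `lag` samples.
        score = sum(m * r for m, r in zip(mic[lag:], ref))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_reference(ref, lag, length):
    """Delay ref by `lag` samples and trim it to `length` samples."""
    return ([0.0] * lag + list(ref))[:length]

rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(500)]
mic = [0.0] * 5 + [0.5 * r for r in ref[:-5]]   # echo arrives 5 samples late

lag = estimate_delay(mic, ref, max_lag=20)
aligned_ref = delay_reference(ref, lag, len(mic))
print(f"estimated delay: {lag} samples")
```

After alignment, each reference sample lines up with the microphone sample that contains its echo, which keeps the echo path that the canceller has to model short.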
  • After obtaining the voice data that has undergone echo cancellation processing, the voice processing circuit may further detect the user's instruction in that voice data and, once the instruction is detected, execute the function corresponding to it. For example, when this embodiment is applied in the scene shown in FIG. 1 and the display device is a TV, if the user says the instruction "shut down" to the TV, the to-be-processed voice data collected by the TV includes the "shut down" instruction.
  • After the TV performs echo cancellation processing on the to-be-processed voice data according to the method provided in any of the foregoing embodiments of this application, it further recognizes that the to-be-processed voice data contains the "shut down" instruction and executes the action of turning off the TV.
  • The voice processing circuit may also send the voice data after echo cancellation processing to a server on the network side through the communication circuit. The server further detects the user's instruction in the voice data and returns a corresponding message to the voice processing circuit according to the instruction, so that the voice processing circuit performs the corresponding function according to the received message.
  • For example, the voice data after the echo cancellation processing is sent to the server.
  • After the server recognizes the "shut down" instruction in the voice data, it sends a shutdown message to the TV. Finally, after the TV receives the shutdown message sent by the server, it executes the shutdown action.
  • the display device proposed in this embodiment has a human-machine voice interaction function.
  • FIG. 13 is a front view of the display device of this embodiment
  • FIG. 14 is an exploded view of the structure of the display device of this embodiment.
  • the display device includes a panel 41, a backlight assembly 42, a main board 43, a power supply board 44, a rear case 45, a base 46, and a pickup circuit 47.
  • the panel 41 is used to present images to the user;
  • The backlight assembly 42 is located below the panel 41 and usually consists of optical components that supply a light source with sufficient brightness and uniform distribution, so that the panel 41 can display images normally. The backlight assembly 42 also includes a back plate 4201; the main board 43 and the power supply board 44 are arranged on the back plate 4201, on which some convex structures are usually stamped.
  • The main board 43 and the power supply board 44 are fixed on the convex structures by screws or hooks. The rear case 45 covers the panel 41 to hide display device components such as the backlight assembly 42, the main board 43, and the power supply board 44, giving a tidy appearance. The base 46 is used to support the display device. The pickup circuit 47 is provided with microphones for picking up far-field voice.
  • the pickup circuit 47 can be arranged on the lower side of the rear case, and roughly located in the middle of the entire display device.
  • The pickup circuit 47 and the rear case 45 are an integrated structure, or can be detachably connected by screws, buckles, etc.
  • a microphone is provided on the remote control to pick up the voice uttered by the user.
  • When the user needs to perform voice interaction with the display device, the user must hold the remote control and speak into it. Therefore, when the remote control is not nearby, the user first has to look for it, and while holding the remote control to speak, the user's hand is occupied and cannot do other things. This is very inconvenient for the user; in particular, some users with hand disabilities cannot fully use the human-machine voice interaction function of the display device.
  • a display device with a far-field sound pickup function appears.
  • The microphone array for picking up the user's voice is set on the display device. Therefore, the user can speak without the remote control and be picked up by the display device directly. This frees the user's hands and greatly facilitates use.
  • The far-field pickup interruption and recognition performance deteriorate, thereby affecting the user experience. This is because the user's far-field voice is often accompanied by local sounds, such as songs or videos, that the display device itself plays through the speakers. The microphone array therefore actually collects both the local sounds emitted by the display device's speakers and the user's actual speech, and the purpose of echo cancellation is to remove the local sound from the speakers and keep only the user's voice.
  • The main board SOC of the display device sends the sound signal to be played to the power amplifier, which amplifies it and then outputs it to the speaker for playing. Therefore, a sound recovery signal is usually led out at the output end of the SOC chip as the reference signal for cancellation.
  • The power amplifier performs related processing on the sound signal that needs to be played, so the sound signal has already undergone non-linear changes between the input and the output of the power amplifier. There is therefore a certain gap between the collected sound recovery signal and the actual sound of the speaker. As a result, even if the accuracy of the echo cancellation algorithm is high, the actual sound of the speaker cannot be completely eliminated, and the problem of incomplete echo cancellation has never been solved.
  • the motherboard 43 of the display device in this embodiment includes a SOC (System on Chip), and a power amplifier 550 connected to the SOC.
  • the output terminal of the power amplifier 550 is connected with a speaker 540, and the SOC outputs the audio signal to be played into the power amplifier 550.
  • The power amplifier 550 amplifies the audio signal and performs digital-to-analog conversion processing to drive the speaker 540 to play. Specifically, two or more speakers 540 may be provided.
  • the pickup circuit 47 includes a microphone board 58 on which a microphone array 511 is arranged.
  • the microphone array 511 includes a plurality of microphones arranged at intervals, and the distance between two adjacent microphones is approximately the same.
  • the microphone board 58 is also provided with a first encoder 522 for encoding the playback sound recovery signal obtained from the back end of the power amplifier 550, and a second encoder 512 for encoding the microphone output signal.
  • the main board 43 and the microphone board 58 need to transmit signals through the interface socket.
  • the far-field sound picked up by the microphone array 511 and the playback sound recovery signal acquired from the back end of the power amplifier 550 are all transmitted through the USB interface.
  • the interface socket can be a USB port, or a dedicated USB interface designed with the UAC (USB Audio Class) protocol of the USB as the interface protocol.
  • the embodiment of the present application proposes a far-field speech processing circuit of a device.
  • the device may be a smart terminal, such as a display device.
  • the application of the far-field speech processing circuit to the display device is taken as an example for description.
  • the far-field voice processing circuit includes a speaker 540, a sound pickup circuit 510, a preprocessing circuit 520, and a main control chip (not shown in the figure), and the main control chip integrates an echo processing circuit 531.
  • the speaker 540 is used to play the sound output by the device.
  • The sound pickup circuit 510 is used for picking up far-field sounds. The far-field sounds are a mixture of the far-field voice emitted by the user and the sound played by the speaker 540 that is transmitted to the sound pickup circuit 510.
  • the preprocessing circuit 520 is connected to the sound pickup circuit 510 to receive the picked up far-field sound, and the preprocessing circuit 520 is connected to the front end of the speaker 540 to obtain the playback sound recovery signal.
  • the echo processing circuit 531 is connected to the preprocessing circuit 520 to receive the picked up far-field voice and the playback sound recovery signal, and use the playback sound recovery signal to echo cancel the picked far-field sound to obtain the far-field voice from the user.
  • the echo processing circuit 531 may be a separate circuit.
  • The user implements human-computer interaction with the display device by speaking, and the display device itself plays music, the audio of videos, and other sounds through the speaker 540 when it is working. Therefore, the sound pickup circuit 510 inevitably picks up both the far-field voice of the user and the sound played by the speaker 540.
  • The main control chip of the display device transmits the sound signal to be played to the power amplifier (referred to as the power amplifier 550), and the power amplifier 550 amplifies the sound signal to be played to drive the speaker 540 to play a sound.
  • The power amplifier 550 processes the sound signals that need to be played, so these signals have undergone non-linear changes between the input and the output of the power amplifier 550. Therefore, the sound acquired at the back end of the power amplifier 550 and the front end of the speaker 540 is closer to the real sound played by the speaker 540.
  • The playback sound recovery signal is obtained from the back end of the power amplifier 550 and the front end of the speaker 540, so it is very close to the sound played by the speaker 540 that is picked up by the sound pickup circuit 510. Therefore, performing echo cancellation on the picked-up far-field sound based on the playback sound recovery signal can greatly reduce the echo mixed into the far-field voice of the user (the echo refers to the sound played by the speaker 540) and improve the accuracy of recognizing far-field voice, thereby improving the sensitivity of far-field pickup interruption and wake-up and improving the user experience.
  • the "sound" in this embodiment may specifically refer to the sound wave signal corresponding to the sound and the analog signal and digital signal corresponding to the sound.
  • the sound pickup circuit 510 picks up the sound wave signal of the far-field sound, which is processed to form the digital signal of the far-field sound, and then is transmitted to the preprocessing circuit 520.
  • Those skilled in the art have the ability to judge some format changes that occur when sound is transmitted to different circuits.
  • the preprocessing circuit 520 includes a preprocessing circuit 521 and a first encoder 522.
  • the pre-processing circuit 521 may be an MCU, a single-chip microcomputer, or some other digital processing chips with audio interfaces.
  • the preprocessing circuit 521 is an MCU as an example for description.
  • The preprocessing circuit 521 is connected to the front end of the speaker 540 through the first encoder 522, and the first encoder 522 performs analog-to-digital conversion on the playback sound recovery signal.
  • The playback sound recovery signal output at the back end of the power amplifier 550 and the front end of the speaker 540 is an analog signal, so the first encoder 522 performs analog-to-digital conversion on the playback sound recovery signal and transmits the converted playback sound recovery signal to the MCU (that is, the pre-processing circuit 521).
  • The first encoder 522 can perform analog-to-digital conversion on the playback sound recovery signals output to the multiple speakers 540 and convert them into one channel of digital signal output.
  • The output terminal of an audio signal corresponds to "one channel" here; the multiple analog signals output to the multiple speakers can undergo analog-to-digital conversion in the encoder and be output through one channel.
  • The first encoder 522 may specifically adopt the AC108 of X-POWER Company. The AC108 can convert the analog signals output to the two speakers 540 into one channel of digital signal output.
  • the far-field voice processing circuit includes a power amplifier, which is connected between the speaker 540 and the main control chip of the display device.
  • the playback sound recovery signal includes multiple sounds obtained from the front ends of the multiple speakers 540.
  • the far-field voice processing circuit further includes a signal processing circuit 570.
  • the input end of the signal processing circuit 570 is connected to the back end of the power amplifier 550 and the front end of the speaker 540.
  • The output terminal of the signal processing circuit 570 is connected to the first encoder 522. That is, the playback sound recovery signal output from the power amplifier 550 is input to the first encoder 522 after the signal processing circuit performs voltage reduction and filtering processing.
  • The signal processing circuit 570 can use a BUCK step-down circuit or a resistor divider circuit to step down the playback sound recovery signal output from the power amplifier 550; it can also use an RC filter circuit to filter the playback sound recovery signal after the step-down.
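For the resistor divider and RC filter options mentioned above, the output level and the filter corner frequency follow from two standard formulas. The sketch below evaluates them for hypothetical component values; none of the values or names comes from the application, and the 9 V input is used only as an illustrative amplifier output level.

```python
import math

def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Output of an unloaded resistor divider: Vin * Rbottom / (Rtop + Rbottom)."""
    return v_in * r_bottom / (r_top + r_bottom)

def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB frequency of a first-order RC low-pass: 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

# Hypothetical values for illustration only.
v_out = divider_output(9.0, r_top=90e3, r_bottom=10e3)
f_c = rc_cutoff_hz(r_ohm=1e3, c_farad=10e-9)

print(f"divider output: {v_out:.2f} V, RC cutoff: {f_c:.0f} Hz")
```

In a real circuit the divider is loaded by the following stage, so the actual attenuation and corner frequency would shift from these idealized figures.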
  • the sound pickup circuit 510 (refer to FIGS. 16 and 18) includes a microphone array 511, and a second encoder 512 electrically connected to the microphone array 511.
  • the microphone array 511 includes multiple microphones, each of which can pick up far-field sounds; multiple microphones simultaneously pick up far-field sounds to generate multiple analog signals of far-field sounds.
  • The multiple microphones are arranged in a linear array; they collect the original far-field sound signals, convert them into analog electrical signals, and output them to the second encoder 512 at the back end.
  • the second encoder 512 is used for analog-to-digital conversion of the analog signal of the far-field sound.
  • After performing analog-to-digital conversion on the analog signals of the far-field sounds, the second encoder 512 is also used to convert the resulting multi-channel digital signals into one channel of audio signal for transmission to the MCU.
  • The second encoder 512 can use X-POWER's AC108.
  • The AC108 contains a four-channel analog-to-digital converter, which can perform analog-to-digital conversion on the four analog signals output by the four microphones and convert them into a one-channel digital signal output.
  • the one-channel digital audio signal converted by the first encoder 522 and the second encoder 512 may be in the IIS audio format or the TDM audio format.
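Carrying several microphone signals over one channel in a TDM-style format amounts to placing one sample per microphone into successive slots of each frame. The sketch below shows this at the sample level only; bit clocks, frame sync, and word widths are omitted, and the function names are hypothetical.

```python
def tdm_interleave(channels):
    """Merge equal-length per-channel sample lists into one stream of frames.

    Each frame holds one sample per channel, in slot order.
    """
    if len({len(ch) for ch in channels}) > 1:
        raise ValueError("all channels must have the same number of samples")
    stream = []
    for frame in zip(*channels):
        stream.extend(frame)
    return stream

def tdm_deinterleave(stream, n_channels):
    """Recover the per-channel sample lists from an interleaved stream."""
    return [stream[slot::n_channels] for slot in range(n_channels)]

# Four microphones, three samples each (made-up values).
mics = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
stream = tdm_interleave(mics)
print(stream)
print(tdm_deinterleave(stream, 4) == mics)
```

The receiver only needs to know the number of slots per frame to separate the channels again, which is why a single audio interface on the MCU is sufficient for all four microphones.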
  • The linear microphone array 511 should be kept synchronized as far as possible during signal transmission, so that the phase difference between the transmitted waveforms does not exceed 180°.
  • A 1 kHz single-frequency electrical signal can be fed into the microphone array 511 for testing, so as to better observe the phase difference between the output signals of the microphones.
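The phase difference observed in such a 1 kHz test can be computed from the digitized outputs with a single-bin discrete Fourier transform. The sketch below is one way to do this and is not taken from the application; the 16 kHz sampling rate, the 30° offset of the synthetic second channel, and the function names are assumptions.

```python
import cmath
import math

def tone_phase(samples, freq_hz, rate_hz):
    """Phase in radians of the freq_hz component, from a single-bin DFT."""
    acc = 0j
    for n, s in enumerate(samples):
        acc += s * cmath.exp(-2j * math.pi * freq_hz * n / rate_hz)
    return cmath.phase(acc)

def phase_difference_deg(a, b, freq_hz, rate_hz):
    """Phase of b relative to a in degrees, wrapped to [-180, 180)."""
    diff = math.degrees(tone_phase(b, freq_hz, rate_hz) - tone_phase(a, freq_hz, rate_hz))
    return (diff + 180.0) % 360.0 - 180.0

RATE_HZ = 16_000     # assumed sampling rate
TONE_HZ = 1_000      # test tone from the text
N = 160              # exactly ten periods of the tone, which avoids leakage

mic_a = [math.sin(2 * math.pi * TONE_HZ * n / RATE_HZ) for n in range(N)]
mic_b = [math.sin(2 * math.pi * TONE_HZ * n / RATE_HZ - math.radians(30)) for n in range(N)]

diff = phase_difference_deg(mic_a, mic_b, TONE_HZ, RATE_HZ)
print(f"mic_b relative to mic_a: {diff:.1f} degrees")
```

A negative result means the second channel lags the first; comparing each microphone against a common channel in this way gives the spread that has to stay within the 180° bound.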
  • The four microphones correspondingly output four analog signals of the far-field sound to the second encoder 512, and the second encoder 512 processes these four analog signals of the far-field sound.
  • The one-channel audio signal essentially carries the signals output by the 4 microphones.
  • CON1-CON4 are interfaces for four microphones.
  • The microphones are placed equidistantly in a straight line, with a spacing of approximately 35 mm between adjacent microphones, to form a linear four-microphone array that meets the spacing requirements of the algorithm.
  • The analog signals of the four microphones are input directly into the second encoder 512, which completes signal processing such as analog-to-digital conversion and low-pass filtering and converts them into a 1-channel IIS format audio signal; the audio signal is then transmitted through the IIS interface to the corresponding IIS interface of the MCU.
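The 35 mm spacing determines the largest arrival-time difference that can occur between microphones, which is the quantity array algorithms work with. The sketch below computes it; the speed of sound of 343 m/s and the 16 kHz sampling rate are assumptions used for illustration, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly room temperature
RATE_HZ = 16_000             # assumed sampling rate

def max_delay_samples(distance_m: float, rate_hz: float) -> float:
    """Largest arrival-time difference over distance_m, expressed in samples.

    The maximum occurs when the sound travels along the axis of the array.
    """
    return distance_m / SPEED_OF_SOUND_M_S * rate_hz

SPACING_M = 0.035            # spacing between adjacent microphones, from the text
N_MICS = 4

adjacent = max_delay_samples(SPACING_M, RATE_HZ)
end_to_end = max_delay_samples(SPACING_M * (N_MICS - 1), RATE_HZ)

print(f"adjacent microphones: up to {adjacent:.2f} samples")
print(f"first to last microphone: up to {end_to_end:.2f} samples")
```

Because these differences amount to only a few samples, timing errors between channels of even one sample are significant, which is consistent with the synchronization requirement stated for the array.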
  • the pre-processing circuit 521 is coupled to the sound pickup circuit 510 and the front end of the speaker 540 to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit 531.
  • the pre-processing circuit 521 may be an MCU.
  • After the MCU receives the far-field sound signal converted into one channel and the playback sound recovery signal converted into one channel, it synthesizes the far-field sound signal and the playback sound recovery signal to form an audio signal in a format compatible with the echo processing circuit 531, so that the MCU can transmit the processed far-field sound signal and the playback sound recovery signal to the echo processing circuit 531.
  • The echo processing circuit 531 is integrated in the SOC of the display device. Therefore, the MCU needs to synthesize the far-field sound signal and the playback sound recovery signal into an audio signal in a format compatible with the SOC.
  • The MCU converts the far-field sound signal and the playback sound recovery signal into a USB data format, so that audio data can be transmitted between the MCU and the SOC over a standard USB data cable using the UAC (USB Audio Class) protocol of the USB interface.
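Synthesizing the two signals into one audio stream can be pictured as interleaving them as channels of a PCM payload. The sketch below packs the far-field samples and the playback sound recovery samples as 16-bit little-endian frames; the two-channel layout, the sample width, and the function names are assumptions for illustration and do not describe the device's actual USB descriptors.

```python
import struct

def pack_frames(far_field, playback_ref):
    """Interleave two 16-bit PCM channels into one little-endian byte payload."""
    payload = bytearray()
    for mic_sample, ref_sample in zip(far_field, playback_ref):
        payload += struct.pack("<hh", mic_sample, ref_sample)
    return bytes(payload)

def unpack_frames(payload):
    """Split the payload back into the far-field and reference channels."""
    pairs = list(struct.iter_unpack("<hh", payload))
    return [p[0] for p in pairs], [p[1] for p in pairs]

far_field = [100, -200, 300]       # made-up microphone samples
playback_ref = [7, 8, -9]          # made-up playback sound recovery samples

payload = pack_frames(far_field, playback_ref)
print(len(payload), unpack_frames(payload))
```

Keeping both signals in the same frame means they travel with a fixed sample-to-sample relationship, so the alignment established by the MCU is preserved on the SOC side.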
  • The pre-processing circuit 521 is provided to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that many existing display device SOC chips do not have corresponding audio transmission interfaces and cannot receive the far-field sound transmitted by the microphone array 511. Therefore, the technical solution of the present application improves the popularity of far-field voice human-computer interaction technology on display devices.
  • the MCU before the format conversion, is also used to adjust the phase of the picked-up far-field sound and the playback sound recovery signal, so that the phase of the playback sound recovery signal is ahead of the phase of the picked-up far-field sound.
  • The phase of the playback sound recovery signal leads the phase of the picked-up far-field sound by no more than 20 ms, so that the sound played by the speaker 540 can be better eliminated.
  • The MCU is also used to perform low-pass filtering, by means of an algorithm, on the picked-up far-field sound and the playback sound recovery signal to filter out audio with a frequency higher than 8 kHz, so that the far-field sound and the playback sound recovery signal finally output by the MCU have no harmonics and no aliasing. This improves the preprocessing of the far-field sound and the playback sound recovery signal, thereby improving the echo processing effect.
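The low-pass filtering step can be sketched with a windowed-sinc FIR filter. This is an illustrative choice, since the text does not specify the filter design; the 48 kHz input rate, the tap count, the Hamming window, and the function names are all assumptions.

```python
import math

def lowpass_taps(cutoff_hz, rate_hz, n_taps=63):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity DC gain."""
    fc = cutoff_hz / rate_hz
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(samples, taps):
    """Direct-form FIR convolution, treating samples before the start as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

RATE_HZ = 48_000      # assumed input rate
taps = lowpass_taps(8_000, RATE_HZ)
tone_1k = [math.sin(2 * math.pi * 1_000 * n / RATE_HZ) for n in range(960)]
tone_16k = [math.sin(2 * math.pi * 16_000 * n / RATE_HZ) for n in range(960)]

# Skip the first len(taps) outputs so the start-up transient is not measured.
pass_rms = rms(fir_filter(tone_1k, taps)[len(taps):])
stop_rms = rms(fir_filter(tone_16k, taps)[len(taps):])
print(f"1 kHz rms {pass_rms:.3f}, 16 kHz rms {stop_rms:.5f}")
```

Removing content above 8 kHz before the signals are handled at a 16 kHz rate is what prevents the aliasing mentioned above, since 8 kHz is half of that rate.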
  • the far-field sound and the playback sound recovery signal can be low-pass filtered through the algorithm, and then the phase between the two can be adjusted, and finally the format conversion can be performed; the far-field sound can also be converted first. Perform phase adjustment with the playback sound recovery signal, then filter, and finally perform format conversion.
  • the MCU receives the digitized playback sound retrieving signal output by the front-end first encoder 522 and the digitized far-field sound signal output by the second encoder 512, it first performs low-pass filtering on them to prevent aliasing. Affect the recognition of the echo cancellation algorithm, and then control and adjust the phase difference between the far-field sound signal and the playback sound recovery signal.
  • The processed far-field sound and playback sound recovery signal are combined into a USB-format audio signal and transmitted to the back-end SOC for processing.
  • The far-field speech processing circuit further includes an encryption chip 580, which stores the key of the far-field speech recognition algorithm. The MCU communicates with the encryption chip 580, and the far-field speech recognition algorithm can be started only when this communication succeeds. Specifically, after the display device is powered on, the MCU communicates with the encryption chip 580. Once the communication succeeds, the far-field voice obtained after the SOC performs echo processing on the far-field sound can be passed to the subsequent far-field speech recognition algorithm, which recognizes it further to analyze the semantics of the far-field speech.
  • The echo processing algorithm removes the part of the picked-up far-field sound that corresponds to the playback sound recovery signal, so as to preserve the user's far-field voice.
  • Any existing echo processing algorithm can be applied in this embodiment; no specific limitation is imposed here.
  • The echo cancellation algorithm in the voice service program (voice server APK) integrated in the SOC dynamically determines the energy difference and phase difference between the voice signal picked up by the microphone array 511 and the playback sound signal output from the speaker 540, so that the far-field voice can be extracted from the signal picked up by the microphone array 511, thereby eliminating the echo interference caused by the sound played by the display device.
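Since the description leaves the choice of echo algorithm open, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique swapped in here purely for illustration; it is not stated to be the algorithm used in the voice service program. The function name, tap count, and step size are assumptions.

```python
def nlms_echo_cancel(mic, ref, num_taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic: samples picked up by the microphone (user voice plus echo).
    ref: playback sound recovery (reference) samples, same length as mic.
    Returns (echo-reduced samples, final filter weights).
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps            # most recent reference sample first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate         # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm          # step size normalised by reference energy
        weights = [w + step * h for w, h in zip(weights, history)]
        out.append(error)
    return out, weights
```

The adaptive weights play the role of the dynamically determined energy and phase differences: each tap scales and delays the reference so that its sum tracks the echo path from speaker to microphone.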
  • The echo-processed far-field voice needs further processing to restore, as far as possible, the far-field voice actually uttered by the user; see Figure 20 and Figure 21.
  • The SOC also includes a speech enhancement circuit 633 and a sound source localization circuit 632.
  • The echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit 633 and the sound source localization circuit 632 respectively; the speech enhancement circuit 633 is connected to the sound source localization circuit 632 to receive the sound source localization result it outputs and, according to that result, enhances the echo-cancelled far-field sound.
  • The speech enhancement circuit 633 may include one or more of a beam forming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333.
  • The speech enhancement circuit 633 also includes a beam forming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333 connected in sequence, which perform beam forming, de-reverberation, and noise reduction on the echo-cancelled far-field sound to generate the far-field voice to be uploaded.
  • The sound source localization circuit 632 is used to identify the source location of the user's far-field voice and feed this location back to the speech enhancement circuit 633.
  • Based on the determined source location of the user's far-field voice, the speech enhancement circuit 633 performs beam forming, suppresses sound in the corresponding areas based on the formed beam, and further performs noise reduction to finally obtain the far-field voice to be uploaded.
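How a localization result can steer a beam is sketched below with a delay-and-sum beamformer, a standard technique chosen here as an assumption; the source does not name the beam forming method. The linear microphone layout, whole-sample delays, and the names `steering_delays` and `delay_and_sum` are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def steering_delays(mic_positions_m, angle_deg, sample_rate):
    """Whole-sample delays that align a far-field source on a linear array.

    angle_deg is measured from broadside toward the positive x axis, so
    microphones with larger x hear the source earlier and are delayed more.
    """
    theta = math.radians(angle_deg)
    raw = [x * math.sin(theta) / SPEED_OF_SOUND * sample_rate
           for x in mic_positions_m]
    base = min(raw)
    return [int(round(r - base)) for r in raw]

def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the result."""
    length = len(channels[0])
    out = [0.0] * length
    for channel, d in zip(channels, delays):
        for n in range(d, length):
            out[n] += channel[n - d]
    return [v / len(channels) for v in out]
```

Sound arriving from the steered direction adds coherently, while sound from other directions is averaged out of phase and so attenuated, which is one way the formed beam can suppress other areas.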
  • The far-field voice to be uploaded that is obtained in this embodiment is already very close to the real far-field voice uttered by the user.
  • The SOC also includes a speech engine circuit 634.
  • The speech engine circuit 634 is connected to the output terminal of the speech enhancement circuit 633.
  • The speech engine circuit 634 performs wake-up word recognition on the far-field sound to be uploaded. When a preset wake-up word is recognized, a wake-up event is triggered, and the far-field sound to be uploaded is encoded and transmitted to the designated terminal 660; the speech engine circuit 634 is also used to receive the instruction corresponding to the far-field sound returned from the designated terminal 660.
  • The designated terminal 660 may be the cloud, or may be another processing circuit in the display device. Taking upload to the cloud as an example, voice recognition and semantic understanding are performed in the cloud, and the instruction corresponding to the far-field sound is generated through online voice synthesis. Executing the instruction completes the human-machine voice interaction of the display device.
  • The instructions received by the speech engine circuit 634 from the cloud may include voice response messages that answer questions raised by the user, and these messages may be broadcast through the power amplifier 550 and the speaker 540 of the display device.
  • The instruction may also be a control instruction that directs the display device to respond to a control request in the user's far-field voice; the SOC of the display device controls the relevant circuit to respond accordingly. For example, if the control instruction is shutdown, the SOC coordinates the power supply system of the display device to stop supplying power to the display system.
  • The voice to be uploaded is synchronously passed to the voice service program (voice server APK), which reports it to the algorithm provider's cloud service back end to realize closed-loop optimization of wake-up; this can improve the sensitivity with which wake-up words spoken in different timbres and pronunciations are recognized.
  • The echo processing circuit 531, the speech enhancement circuit 633, the sound source localization circuit 632, and the speech engine circuit 634 may be separate circuits. In this embodiment, they are all algorithm circuits and are stored in the SOC.
  • The power amplifier 550 processes the sound signal to be played, so that signal changes nonlinearly between the input and the output of the power amplifier 550. This solution therefore obtains the playback sound recovery signal from the back end of the power amplifier 550, at the front end of the speaker 540.
  • The playback sound recovery signal obtained by the pre-processing circuit 521 is therefore very close to the playback sound picked up by the sound pickup circuit 510, so echo cancellation of the picked-up far-field sound based on the playback sound recovery signal can greatly reduce echo interference in the far-field voice uttered by the user and improve the accuracy of far-field voice recognition, thereby improving the sensitivity of far-field pickup to interrupt wake-up and improving the user experience.
  • The pre-processing circuit 521 is provided to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that many existing display device SOC chips lack the corresponding interfaces and cannot receive the far-field sound transmitted by the microphone array 511. The technical solution of the present application therefore makes far-field voice human-computer interaction practical on a wider range of display devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to a display device (3) which obtains, by means of an echo processing circuit (35), a signal to be played that is output to a speaker (33) by a power amplifier (32), and then, using said signal as an echo reference signal, performs echo cancellation on a to-be-processed voice signal received by a voice acquisition circuit (34), so that a voice processing circuit (31) can achieve a good echo cancellation effect when performing echo cancellation on the to-be-processed voice signal, thereby improving the accuracy of subsequent recognition of the voice signal.
PCT/CN2020/075958 2019-07-10 2020-02-20 Display device Ceased WO2021004067A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910620438.2 2019-07-10
CN201910619184.2 2019-07-10
CN201910620438.2A CN110349582B (zh) 2019-07-10 2019-07-10 Display device and far-field speech processing circuit
CN201910619184.2A CN110223707A (zh) 2019-07-10 2019-07-10 Display device

Publications (1)

Publication Number Publication Date
WO2021004067A1 (fr) 2021-01-14

Family

ID=74114937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075958 Ceased WO2021004067A1 (fr) 2019-07-10 2020-02-20 Display device

Country Status (1)

Country Link
WO (1) WO2021004067A1 (fr)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190172463A1 (en) * 2014-09-10 2019-06-06 Fred Bargetzi Acoustic sensory network
CN105825862A (zh) * 2015-01-05 2016-08-03 沈阳新松机器人自动化股份有限公司 Echo cancellation system for robot human-machine dialogue
US20170092272A1 (en) * 2015-09-10 2017-03-30 Crestron Electronics, Inc. System and method for determining recipient of spoken command in a control system
CN106782589A (zh) * 2016-12-12 2017-05-31 奇酷互联网络科技(深圳)有限公司 Mobile terminal and voice input method and device thereof
CN106782591A (zh) * 2016-12-26 2017-05-31 惠州Tcl移动通信有限公司 Device and method for improving the speech recognition rate under background noise
CN109545237A (zh) * 2018-10-24 2019-03-29 广东思派康电子科技有限公司 Computer-readable storage medium and voice interaction speaker using the medium
CN109360562A (zh) * 2018-12-07 2019-02-19 深圳创维-Rgb电子有限公司 Echo cancellation method, device, and medium, and voice wake-up method and equipment
CN209017204U (zh) * 2018-12-25 2019-06-21 深圳创维-Rgb电子有限公司 Speech recognition system
CN110223707A (zh) * 2019-07-10 2019-09-10 青岛海信电器股份有限公司 Display device
CN110349582A (zh) * 2019-07-10 2019-10-18 青岛海信电器股份有限公司 Display device and far-field speech processing circuit

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112929722A (zh) * 2021-03-31 2021-06-08 杭州国芯科技股份有限公司 External device for set-top box voice control
CN112929722B (zh) * 2021-03-31 2024-05-28 杭州国芯科技股份有限公司 External device for set-top box voice control
CN116189646A (zh) * 2022-10-25 2023-05-30 Oppo广东移动通信有限公司 Speech processing method, apparatus, electronic device, and storage medium
CN116389974A (zh) * 2023-03-23 2023-07-04 合肥智能语音创新发展有限公司 Audio loopback capture device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20836706

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20836706

Country of ref document: EP

Kind code of ref document: A1