WO2017113370A1 - Voiceprint detection method and apparatus - Google Patents
- Publication number
- WO2017113370A1 (PCT/CN2015/100286)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal portion
- preset
- feature
- characteristic
- audio signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Definitions
- the present invention relates to the field of electronic technologies, and in particular, to a method and apparatus for voiceprint detection.
- terminal devices have become an indispensable part of people's daily lives.
- most terminal devices are provided with a password protection unlock function.
- when the terminal device is in the locked state, the user can unlock it only by entering the correct password.
- because voice unlocking offers higher security than other unlocking methods, it has become a widely used unlocking method.
- the terminal device or application software provides a voice unlocking function that verifies the user by voice and then unlocks the terminal device or provides a service.
- voice unlocking mainly authenticates the user through the voiceprint: at unlock time, the sound signal input by the user is compared with a preset sound signal, and if the user's voiceprint is determined to match the preset voiceprint, the speaker is judged to be the legitimate user and the device is unlocked.
- however, this cannot prevent a recording attack: an attacker records the user speaking the recognition text and plays the recording back, and the voiceprint unlock succeeds.
- voiceprint unlocking therefore has a security hazard, and its security is low.
- the invention provides a voiceprint detection method and apparatus that improve the security of voiceprint unlocking.
- the voiceprint detection method comprises: the terminal detects whether a sound signal is present; if it detects one, the terminal receives the sound signal and extracts its audio signal portion and its judgment signal portion; the terminal compares the voiceprint feature of the audio signal portion with a preset voiceprint feature, and compares the expiratory airflow characteristic of the judgment signal portion with the expiratory airflow characteristic of the audio signal portion.
- if the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, and the matching degree between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion exceeds a preset threshold, the voiceprint detection result is determined to be successful.
- when the terminal recognizes the sound signal, it divides the sound signal into an audio signal portion and a judgment signal portion, achieving double recognition of the sound signal. This effectively defeats an attacker who blows into the microphone while playing a recording, and improves the security of voiceprint unlocking.
- optionally, the terminal receives the expiratory airflow characteristic in the judgment signal portion that is greater than a preset airflow threshold, quantizes it, and compares the quantized expiratory airflow characteristic with the expiratory airflow characteristic corresponding to the text of the audio signal portion. If the degree of matching between the two exceeds a preset threshold, the matching degree between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion is determined to exceed the preset threshold.
- specifically, the expiratory airflow characteristic is compared with the preset airflow threshold: if it is greater than the threshold, it is quantized to 1; otherwise, it is quantized to 0. A quantized value agrees with the audio signal portion when either of the following holds: the characteristic is quantized to 1 and the corresponding text of the audio signal portion is an aspirated sound; or the characteristic is quantized to 0 and the corresponding text is an unaspirated sound. When the degree of matching computed this way exceeds a preset threshold, the expiratory airflow characteristic of the judgment signal portion is determined to match that of the audio signal portion.
- the expiratory airflow characteristic is thus quantized by comparing it with the preset airflow threshold.
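The quantize-and-match rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the judgment signal has already been segmented into one airflow value (in L/s) per character or syllable of the recognized text, and that a hypothetical `aspirated` list marks which of those characters is an aspirated sound. The 0.10 L/s default is the example airflow threshold given later in this document; the 0.9 match threshold is an illustrative placeholder.

```python
from typing import List, Sequence


def quantize_airflow(flows_lps: Sequence[float], airflow_threshold_lps: float = 0.10) -> List[int]:
    """Quantize each per-syllable airflow value: 1 if above the threshold, else 0."""
    return [1 if flow > airflow_threshold_lps else 0 for flow in flows_lps]


def airflow_match_degree(quantized: Sequence[int], aspirated: Sequence[bool]) -> float:
    """Fraction of syllables whose quantized airflow agrees with the expected aspiration.

    A syllable agrees if it is quantized to 1 and aspirated, or quantized to 0 and unaspirated.
    """
    if not quantized or len(quantized) != len(aspirated):
        raise ValueError("expected one airflow value per syllable of the recognized text")
    agreeing = sum(1 for q, asp in zip(quantized, aspirated) if (q == 1) == asp)
    return agreeing / len(quantized)


def airflow_matches(
    flows_lps: Sequence[float],
    aspirated: Sequence[bool],
    match_threshold: float = 0.9,  # illustrative placeholder for the preset threshold
    airflow_threshold_lps: float = 0.10,
) -> bool:
    """True if the match degree exceeds the preset match threshold."""
    quantized = quantize_airflow(flows_lps, airflow_threshold_lps)
    return airflow_match_degree(quantized, aspirated) > match_threshold


if __name__ == "__main__":
    expected = [False, False, True, False]  # hypothetical aspiration pattern of a 4-syllable password
    print(airflow_matches([0.03, 0.05, 0.22, 0.04], expected))  # live speaker -> True
    print(airflow_matches([0.30, 0.30, 0.30, 0.30], expected))  # constant blowing over a replay -> False
```

Under this sketch, blowing steadily over a replayed recording quantizes every syllable to 1, so it fails wherever the text calls for an unaspirated sound.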
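- if, in addition, the pointing direction characteristic of the judgment signal portion and the pointing direction characteristic of the audio signal portion are within the preset range, the voiceprint detection result is judged successful.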
- optionally, the angle of the pointing direction of the judgment signal portion and the angle of the pointing direction of the audio signal portion are each compared with a preset pointing angle threshold; if both angles are smaller than the threshold, the pointing direction characteristic of the judgment signal portion and that of the audio signal portion are determined to be within the preset range.
- comparing each of the two angles with the preset pointing angle threshold thus determines whether both pointing direction characteristics are within the preset range.
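A sketch of the pointing-direction check, in Python. The document says only that the direction is obtained through a microphone array; estimating the angle from the time difference of arrival (TDOA) between two microphones is a swapped-in, commonly used technique, and the speed of sound, microphone spacing, and 30-degree threshold below are illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def angle_from_tdoa(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the source angle (degrees from the array's broadside axis) for a two-mic array.

    Far-field model: sin(theta) = c * delay / d, clamped to [-1, 1] to absorb measurement noise.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def directions_within_range(judgment_angle_deg: float, audio_angle_deg: float,
                            max_angle_deg: float = 30.0) -> bool:
    """Both pointing angles must be smaller than the preset pointing angle threshold."""
    return abs(judgment_angle_deg) < max_angle_deg and abs(audio_angle_deg) < max_angle_deg
```

A zero delay means the speaker is straight ahead of the array (0 degrees); the largest physically possible delay, spacing divided by the speed of sound, means the sound arrives along the array axis (90 degrees).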
- optionally, the perceived temperature characteristic of the judgment signal portion is also compared with a preset temperature threshold. The voiceprint detection result is determined to be successful when all of the following hold: the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold; the matching degree between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion exceeds a preset threshold; the pointing direction characteristics of the judgment signal portion and of the audio signal portion are within the preset range; and the perceived temperature characteristic of the judgment signal portion is greater than or equal to the preset temperature threshold.
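The combined decision can be written as a single predicate. In this Python sketch the field names and default thresholds are hypothetical placeholders, the match scores are assumed to be normalized to [0, 1], and the angles are assumed to be measured from the microphone axis; the document fixes none of these values.

```python
from dataclasses import dataclass


@dataclass
class LivenessMeasurements:
    voiceprint_match: float    # audio portion vs preset voiceprint, in [0, 1]
    airflow_match: float       # judgment portion vs audio portion airflow, in [0, 1]
    judgment_angle_deg: float  # pointing direction of the judgment signal portion
    audio_angle_deg: float     # pointing direction of the audio signal portion
    sensed_temp_c: float       # perceived temperature of the judgment signal portion


def detection_succeeds(m: LivenessMeasurements,
                       match_threshold: float = 0.9,      # placeholder
                       angle_threshold_deg: float = 30.0,  # placeholder
                       temp_threshold_c: float = 30.0) -> bool:  # placeholder
    """All four conditions must hold for the voiceprint detection to succeed."""
    return (
        m.voiceprint_match > match_threshold
        and m.airflow_match > match_threshold
        and abs(m.judgment_angle_deg) < angle_threshold_deg
        and abs(m.audio_angle_deg) < angle_threshold_deg
        and m.sensed_temp_c >= temp_threshold_c  # "greater than or equal to" per the text
    )
```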
- the method further includes: the terminal separates the sound signal into the audio signal portion and the judgment signal portion. Specifically, the terminal filters the sound signal with a filter of a first preset frequency to obtain the audio signal portion, and filters the sound signal with a filter of a second preset frequency to obtain the judgment signal portion, where the filter of the first preset frequency is a high-pass filter and the filter of the second preset frequency is a low-pass filter.
- passing the sound signal through filters of preset frequencies thus separates it into the audio signal portion and the judgment signal portion.
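The high-pass/low-pass split can be illustrated with first-order filters in plain Python. This is a sketch under assumptions: the 300 Hz crossover is taken from the document's statement that speech audio lies at roughly 300-3000 Hz while blowing is mainly low-frequency, and the audio portion is formed as the complement of the low-pass output (input minus low-pass), which is a simple first-order high-pass. A real device would likely use steeper filters.

```python
import math
from typing import List, Sequence, Tuple


def split_sound_signal(samples: Sequence[float], sample_rate_hz: float,
                       crossover_hz: float = 300.0) -> Tuple[List[float], List[float]]:
    """Return (audio_portion, judgment_portion) via a complementary first-order split.

    judgment_portion: one-pole low-pass output (slow airflow, i.e. blowing energy).
    audio_portion: input minus the low-pass output, i.e. a first-order high-pass.
    """
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * crossover_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the one-pole low-pass
    low: List[float] = []
    high: List[float] = []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)
        low.append(state)
        high.append(x - state)
    return high, low
```

Because the two outputs sum back to the input sample by sample, no energy is lost in the split; only its assignment to "audio" or "judgment" depends on the crossover.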
- the voiceprint feature of the audio signal portion includes at least one of a voiceprint waveform and a signal frequency. The comparison is performed in at least one of two ways: comparing the voiceprint waveform of the audio signal portion with the characteristic waveform of a preset voiceprint sample, and comparing the signal frequency of the audio signal portion with the characteristic frequency of the preset voiceprint sample. If the match between the voiceprint waveform and the characteristic waveform of the preset voiceprint sample exceeds a preset threshold, and/or the match between the signal frequency and the characteristic frequency of the preset voiceprint sample exceeds a preset threshold, the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature is determined to exceed the preset threshold.
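The two comparisons can be sketched as follows. The document does not say how waveform or frequency similarity is scored, so this Python sketch swaps in two simple, well-known measures as assumptions: zero-lag normalized correlation (cosine similarity) for the waveform, and the ratio of dominant frequencies found with a naive discrete Fourier transform for the frequency. The `use_waveform`/`use_frequency` flags model the "and/or" wording; the 0.9 threshold is a placeholder.

```python
import math
from typing import Sequence


def waveform_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized correlation of two waveforms (truncated to equal length)."""
    n = min(len(a), len(b))
    dot = sum(a[i] * b[i] for i in range(n))
    norm_a = math.sqrt(sum(a[i] * a[i] for i in range(n)))
    norm_b = math.sqrt(sum(b[i] * b[i] for i in range(n)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dominant_frequency(signal: Sequence[float], sample_rate_hz: float) -> float:
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT; O(n^2), for illustration only."""
    n = len(signal)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate_hz / n


def frequency_similarity(f1: float, f2: float) -> float:
    """Ratio of the smaller to the larger frequency, in [0, 1]."""
    top = max(f1, f2)
    return 1.0 if top == 0 else min(f1, f2) / top


def voiceprint_matches(wave: Sequence[float], sample_wave: Sequence[float],
                       sample_rate_hz: float, threshold: float = 0.9,
                       use_waveform: bool = True, use_frequency: bool = True) -> bool:
    """Every enabled comparison must exceed the threshold; at least one must be enabled."""
    if not (use_waveform or use_frequency):
        raise ValueError("enable at least one comparison")
    if use_waveform and waveform_similarity(wave, sample_wave) <= threshold:
        return False
    if use_frequency:
        f = dominant_frequency(wave, sample_rate_hz)
        f_sample = dominant_frequency(sample_wave, sample_rate_hz)
        if frequency_similarity(f, f_sample) <= threshold:
            return False
    return True
```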
- the method further includes: the terminal collects a sound signal uttered by the user, performs feature analysis on it to obtain the preset voiceprint feature, and stores the result. Collecting the user's sound signal in advance through the terminal and storing its analyzed features as the preset voiceprint feature ensures that the preset feature is accurate, which improves the accuracy of matching the voiceprint feature of the audio signal portion against it and thus the security of voiceprint unlocking.
- the method further includes: the terminal acquires the features of the airflow exhaled when the user outputs the sound corresponding to the sound signal.
- acquiring the expiratory airflow characteristic of the judgment signal portion makes it possible to compare it with the expiratory airflow characteristic of the audio signal portion.
- the method further includes: the terminal acquires the direction in which the user outputs the sound corresponding to the sound signal.
- acquiring the pointing direction characteristic of the judgment signal portion makes it possible to determine whether it and the pointing direction characteristic of the audio signal portion are within the preset range.
- the method further includes: the terminal acquires the temperature when the user outputs the sound corresponding to the sound signal.
- acquiring the perceived temperature characteristic of the judgment signal portion makes it possible to compare it with the preset temperature threshold.
- the terminal includes: a detecting module, configured to detect whether a sound signal is present; a receiving module, configured to receive the sound signal; an extracting module, configured to extract the audio signal portion and the judgment signal portion of the sound signal; a matching module, configured to compare the voiceprint feature of the audio signal portion with the preset voiceprint feature and to compare the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion, where the expiratory airflow characteristic is a feature of the airflow exhaled when the user outputs the sound corresponding to the sound signal; and a determining module, configured to determine that the voiceprint detection succeeds when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the matching degree between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold.
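The module decomposition above maps naturally onto a small class. This Python sketch is illustrative only: the energy-based detection, the injected scoring callables, and the placeholder thresholds are assumptions, and the actual extracting and matching logic is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Samples = Sequence[float]


@dataclass
class VoiceprintTerminal:
    # Extracting module: splits a sound signal into (audio portion, judgment portion).
    extract: Callable[[Samples], Tuple[Samples, Samples]]
    # Matching module: voiceprint match degree of the audio portion, in [0, 1].
    voiceprint_score: Callable[[Samples], float]
    # Matching module: airflow match degree between judgment and audio portions, in [0, 1].
    airflow_score: Callable[[Samples, Samples], float]
    threshold: float = 0.9      # placeholder preset threshold
    energy_floor: float = 1e-4  # placeholder detection level

    def detect(self, frame: Samples) -> bool:
        """Detecting module: a frame counts as a sound signal if its mean energy exceeds the floor."""
        return len(frame) > 0 and sum(x * x for x in frame) / len(frame) > self.energy_floor

    def run(self, frame: Samples) -> Optional[bool]:
        """Returns None if no sound is detected, else whether voiceprint detection succeeds."""
        if not self.detect(frame):
            return None
        received = list(frame)                    # receiving module
        audio, judgment = self.extract(received)  # extracting module
        return (self.voiceprint_score(audio) > self.threshold  # determining module
                and self.airflow_score(judgment, audio) > self.threshold)
```

Injecting the extracting and matching steps as callables keeps the control flow of the determining module separate from any particular signal-processing choice.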
- by dividing the sound signal into the audio signal portion and the judgment signal portion, the terminal achieves double recognition of the sound signal, effectively defeats an attacker who blows into the microphone while playing a recording, and improves the security of voiceprint unlocking.
- the terminal includes a microphone and a processor. The microphone is configured to detect whether there is a sound signal and, if one is detected, to receive it. The processor is configured to extract the audio signal portion and the judgment signal portion of the sound signal; compare the voiceprint feature of the audio signal portion with the preset voiceprint feature; compare the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion, where the expiratory airflow characteristic is a feature of the airflow exhaled when the user outputs the sound corresponding to the sound signal; and determine that the voiceprint detection succeeds when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the matching degree between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold.
- the present invention provides a non-transitory computer readable storage medium storing computer instructions for causing an apparatus for controlling a cache to perform an operation in the above method.
- in the voiceprint detection method and apparatus, the terminal detects whether there is a sound signal; if it detects one, it receives the sound signal, extracts the audio signal portion and the judgment signal portion, compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and compares the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion.
- if the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, and the matching degree between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold, the voiceprint detection result is determined to be successful.
- because the terminal divides the sound signal into the audio signal portion and the judgment signal portion when recognizing it, the sound signal is recognized twice; this effectively defeats an attacker who blows into the microphone while playing a recording, and improves the security of voiceprint unlocking.
- FIG. 1A is a schematic diagram of a scene for unlocking a voiceprint according to an embodiment of the present invention
- FIG. 1B is a schematic diagram of a scenario for setting a voiceprint password according to an embodiment of the present invention
- FIG. 2 is a flowchart of a method for detecting voiceprint according to Embodiment 1 of the present invention
- FIG. 3A is a schematic diagram of quantization of a blowing signal according to Embodiment 1 of the present invention.
- FIG. 3B is a schematic diagram of quantization of a blowing signal according to Embodiment 2 of the present invention.
- FIG. 4 is a schematic diagram of a process of voiceprint detection according to Embodiment 1 of the present invention.
- FIG. 5 is a flowchart of a method for voiceprint detection according to Embodiment 2 of the present invention.
- FIG. 6 is a schematic diagram of an angle of a pointing direction of a sound signal according to Embodiment 1 of the present invention.
- FIG. 7 is a flowchart of a method for voiceprint detection according to Embodiment 3 of the present invention.
- FIG. 8 is a flowchart of a method for voiceprint detection according to Embodiment 4 of the present invention.
- FIG. 9 is a schematic structural diagram of a terminal according to Embodiment 1 of the present invention.
- FIG. 10 is a schematic structural diagram of a terminal according to Embodiment 2 of the present invention.
- FIG. 11 is a schematic structural diagram of a terminal according to Embodiment 3 of the present invention.
- FIG. 12 is a schematic structural diagram of a terminal according to Embodiment 4 of the present invention.
- FIG. 13 is a schematic structural diagram of a device for detecting a voiceprint according to Embodiment 1 of the present invention.
- FIG. 1A is a schematic diagram of a voiceprint unlocking scenario according to an embodiment of the present invention.
- the terminal device or the application software provides a voiceprint unlocking function.
- the user speaks the corresponding voiceprint password; the terminal verifies the user by voiceprint and then unlocks the device or provides a service.
- voiceprint recognition generally falls into two types: 1.
- the text content to be recognized is preset: either the same user-preset text is repeated at every unlock (for example, "open sesame"), or
- the electronic device randomly generates text or digit passwords and the user reads out the prompted random password, which ensures the security of voiceprint recognition; 2.
- FIG. 1B is a schematic diagram of a voiceprint password setting scenario according to an embodiment of the present invention. As shown in FIG. 1B, the user can define a voiceprint password in advance, for example "open sesame". After the user speaks the voiceprint password, the terminal records it through the microphone. When the user later logs in to the account with the voiceprint password, the terminal verifies the voiceprint password spoken by the user to decide whether to allow the login.
- FIG. 2 is a flowchart of a method for voiceprint detection according to Embodiment 1 of the present invention. As shown in FIG. 2, the method provided by the embodiment of the present invention includes:
- S201 The terminal detects whether there is a sound signal.
- the terminal has the function of receiving voice, and the terminal may include, but is not limited to, a mobile communication device such as a mobile phone or a tablet computer.
- when the user needs to pass unlock verification, the user utters a sound signal (speech signal) to the terminal.
- the voice signal uttered by the user may be the preset voiceprint password "open sesame", or the name of a voice assistant such as "Xiaoice" or "hello google"; it may also be a randomly generated text or digit password that the terminal prompts the user to read, or any passage the user chooses to say.
- when the terminal is in the locked state, it detects whether there is a voice signal from the user. If it detects one while locked, that is, if a voiceprint recognition signal is detected, it recognizes the user's voice signal.
- the terminal is not always in the live voiceprint recognition mode; it recognizes the user's voice signal only after it detects the voiceprint recognition signal and enters that mode.
- the terminal is in a locked (standby) state.
- when voiceprint recognition is required, for example when the terminal shows the lock screen, application software is waiting to be unlocked, the user's mouth is close to the microphone, or the user is identified, the terminal enters the live voiceprint recognition mode.
- whether the mouth is close to the microphone can be judged by a sensor such as a proximity sensor, an ultrasonic sensor, or an infrared sensor, after which the terminal enters the live voiceprint recognition mode.
- in the live voiceprint recognition mode, the terminal turns on the corresponding modules to analyze and process the received voiceprint recognition signal; these may include any one or a combination of a recording module, a voiceprint recognition module, a thermometer module, a light sensor module, a directional monitoring module, an ultrasonic sensor, and an infrared sensor.
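The entry condition for the live voiceprint recognition mode reduces to a simple predicate. In this Python sketch the field names and the 5 cm proximity distance are hypothetical; the document lists the triggers (lock screen shown, application awaiting unlock, mouth near the microphone as reported by a proximity, ultrasonic, or infrared sensor) but gives no numbers. The "user is identified" trigger is omitted because the document does not say how it is detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerState:
    lock_screen_shown: bool = False
    app_awaiting_unlock: bool = False
    # Distance reported by a proximity/ultrasonic/infrared sensor, or None if unavailable.
    mouth_distance_cm: Optional[float] = None


def should_enter_live_mode(state: TriggerState, near_cm: float = 5.0) -> bool:
    """Enter live voiceprint recognition mode if any listed trigger is present."""
    mouth_near = state.mouth_distance_cm is not None and state.mouth_distance_cm <= near_cm
    return state.lock_screen_shown or state.app_awaiting_unlock or mouth_near
```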
- the terminal in the embodiments of the present invention may also already be in the live voiceprint recognition mode, in which case it recognizes the user's voice signal as soon as the voiceprint recognition signal is detected.
- the embodiments mainly describe how the terminal enters the live voiceprint recognition mode when it detects a sound signal, but the invention is not limited thereto.
- a voiceprint is the sound wave spectrum of a sound signal (speech signal) as displayed by an electroacoustic instrument. Because people have different speaking habits and their vocal airflow differs, sound quality and tone differ, so every person's voiceprint is different.
- voiceprint recognition is a type of biometric recognition that confirms whether a given utterance was spoken by a designated person.
- the voiceprint recognition signal is a sound signal (speech signal) detected while the terminal is in the locked state. It contains the user's voiceprint, from which the terminal can recognize whether the voiceprint is that of the designated user and thereby confirm whether the detected voice was spoken by that user.
- when the terminal detects a sound signal, it can receive the sound signal through the microphone.
- the terminal receives the sound signal and stores the received sound signal.
- the terminal may always be in a listening state and buffer the received sound signal, so that when it enters the live voiceprint recognition mode, the complete voiceprint recognition signal is available for analysis and processing.
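Continuous listening with a bounded buffer is commonly done with a ring buffer. This Python sketch uses `collections.deque` with `maxlen`, which silently discards the oldest samples once full; the class name, sample rate, and buffer length are illustrative assumptions, not values from the document.

```python
from collections import deque
from typing import Iterable, List


class ListeningBuffer:
    """Keeps the most recent `seconds` of audio so a full utterance is available on mode entry."""

    def __init__(self, sample_rate_hz: int, seconds: float) -> None:
        self._buf: deque = deque(maxlen=int(sample_rate_hz * seconds))

    def push(self, samples: Iterable[float]) -> None:
        """Append newly captured samples; the oldest ones are dropped once the buffer is full."""
        self._buf.extend(samples)

    def snapshot(self) -> List[float]:
        """Copy of the buffered signal, oldest sample first."""
        return list(self._buf)
```

For example, at 16 kHz a 3-second buffer holds 48,000 samples.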
- S203 The terminal extracts an audio signal portion and a judgment signal portion of the sound signal.
- the sound signal may include the audio signal of the user's speech together with the perceived temperature when the user speaks; or the audio signal together with the direction of the sound signal; or the audio signal together with the signal of air exhaled while the user speaks.
- the terminal may divide the sound signal into an audio signal portion and a judgment signal portion. The audio signal portion may include the voiceprint feature of the audio signal in the sound signal; the judgment signal portion may include at least one of the perceived temperature when the user speaks, the direction of the sound signal, and the signal of air exhaled when the user speaks. For example, the terminal can obtain the perceived temperature of the voice signal; it can obtain the pointing direction of the sound signal through a microphone array; and it can obtain the exhalation signal through a filter of a preset frequency (a low-pass filter).
- the terminal compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and determines whether the voiceprint feature of the audio signal portion matches the preset voiceprint feature.
- before the terminal enters the standby state, the user may set up live voiceprint recognition on the terminal. For example, given the text "open sesame", the user reads the preset text aloud and the terminal records the user's voice signal. That voice signal contains the audio signal of the user reading the preset text; the audio signal carries voiceprint recognition features, which are used as the preset voiceprint feature.
- the voiceprint feature may include at least one of the voiceprint waveform of the audio signal and the signal frequency of the audio signal. The voiceprint feature of the audio signal portion may be compared with the preset voiceprint feature in at least one of two ways:
- the voiceprint waveform of the audio signal portion is compared with the characteristic waveform of the preset voiceprint sample; and/or
- the signal frequency of the audio signal portion is compared with the characteristic frequency of the preset voiceprint sample.
- S205 Compare the characteristics of the expiratory airflow of the judging signal portion with the characteristics of the expiratory airflow of the audio signal portion.
- the expiratory airflow characteristic is a feature of the airflow exhaled when the user outputs the sound corresponding to the sound signal.
- the terminal captures the microphone input, for example with a recorder, by detecting the sound signal received by the microphone.
- to produce sound, the exhaled airflow must open the glottis. Because of the Bernoulli effect the glottis closes again, and the subglottal pressure is large enough to open it repeatedly.
- this repeated opening and closing forms a periodic vibration, so air is exhaled whenever a sound is pronounced. This airflow is referred to herein as a blowing signal; that is, the blowing signal is the expiratory airflow characteristic corresponding to the sound output by the user.
- the microphone picks up the effective airflow as a blowing signal. Since the audio in the sound signal lies at roughly 300-3000 Hertz (Hz), while air blown onto the microphone is mainly a low-frequency signal, low-pass filtering can remove the high-frequency components that are not due to blowing and yield the blowing signal, thereby separating the audio signal from the blowing signal.
- when the terminal detects a sound signal and the extracted judgment signal portion includes a blowing signal, the terminal converts the audio signal into the corresponding text, determines for each character or word of the text whether its expiratory airflow characteristic should be that of an aspirated or an unaspirated sound, and compares this with the expiratory airflow characteristic of the judgment signal portion to determine whether the user's blowing signal matches the audio signal. For example, if the user pronounced a certain word with aspiration when recording the preset voiceprint sample but pronounces it without aspiration during voiceprint verification, the user's blowing signal is determined not to match the audio signal.
- the terminal may learn the expiratory airflow characteristics of each user's audio signal from at least one of the user's everyday calls and voice-assistant use, according to the user's speaking habits. For example, one user may blow harder when saying a particular character or word, while other users blow less on the same word. This improves the accuracy of the user's expiratory airflow characteristics.
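Learning per-user airflow habits from everyday use could be approximated with a running average per word. The document does not specify the learning method; the exponential moving average, the `alpha` smoothing factor, and the class name in this Python sketch are assumptions.

```python
from typing import Dict, Optional


class AirflowProfile:
    """Per-user running estimate of exhaled airflow (L/s) for each word or character."""

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._mean: Dict[str, float] = {}

    def update(self, word: str, flow_lps: float) -> None:
        """Blend a new observation (e.g. from a call or voice-assistant use) into the estimate."""
        prev = self._mean.get(word)
        self._mean[word] = flow_lps if prev is None else prev + self.alpha * (flow_lps - prev)

    def expected(self, word: str) -> Optional[float]:
        """Learned airflow for the word, or None if it has never been observed."""
        return self._mean.get(word)
```

A small `alpha` makes the profile change slowly, so a single unusually loud or quiet utterance does not overwrite the learned habit.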
- when the terminal compares the voiceprint feature of the audio signal portion with the preset voiceprint feature and determines that they match, and compares the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion and determines that they also match, the voiceprint detection result is successful.
- the terminal is then unlocked, and the user can complete the corresponding operations on the terminal, such as unlocking the phone or logging in to WeChat.
- specifically, the terminal may first determine whether the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold. If it does not, the voiceprint feature of the audio signal portion does not match the preset voiceprint feature; the terminal determines that voiceprint detection fails and can exit the voiceprint detection mode directly.
- if it does, the terminal further determines whether the matching degree between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion exceeds a preset threshold. If it does, both the voiceprint feature and the expiratory airflow characteristic match; the terminal determines that voiceprint detection succeeds and unlocks. If it does not, the expiratory airflow characteristics of the judgment signal portion and the audio signal portion do not match; the terminal determines that voiceprint detection fails and exits the voiceprint detection mode.
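The staged order above can be sketched so that the airflow comparison is computed only when the voiceprint check has passed. In this Python sketch the enum names are hypothetical, the airflow score is passed as a zero-argument callable to make the short-circuit explicit, and the 0.95 default is the example threshold given in the text.

```python
from enum import Enum
from typing import Callable


class DetectionResult(Enum):
    SUCCESS = "success"                          # terminal unlocks
    VOICEPRINT_MISMATCH = "voiceprint mismatch"  # exit voiceprint detection mode
    AIRFLOW_MISMATCH = "airflow mismatch"        # exit voiceprint detection mode


def staged_detection(voiceprint_match: float,
                     airflow_match: Callable[[], float],
                     threshold: float = 0.95) -> DetectionResult:
    """Check the voiceprint first; evaluate the airflow match only if that passes."""
    if voiceprint_match <= threshold:
        return DetectionResult.VOICEPRINT_MISMATCH
    if airflow_match() <= threshold:
        return DetectionResult.AIRFLOW_MISMATCH
    return DetectionResult.SUCCESS
```

Passing `threshold=0.90` corresponds to the lower threshold the text suggests for terminals with lower matching accuracy.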
- the preset threshold may be chosen according to actual conditions. For example, if the voiceprint feature matching accuracy of the terminal is high, the preset threshold may be set to 95%; if it is low, the preset threshold may be set to 90%.
- in this voiceprint detection method, the terminal detects whether there is a sound signal; if it detects one, it receives the sound signal, extracts the audio signal portion and the judgment signal portion, compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and compares the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion.
- if the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, and the matching degree between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold, the voiceprint detection result is determined to be successful.
- because the terminal divides the sound signal into an audio signal portion and a judgment signal portion when recognizing it, the sound signal is recognized twice; this effectively defeats an attacker who blows into the microphone while playing a recording, and improves the security of voiceprint unlocking.
- the method for voiceprint detection further includes:
- receiving the expiratory airflow characteristic of the determination signal portion that is greater than a preset airflow threshold;
- quantizing the expiratory airflow characteristic;
- comparing the quantized expiratory airflow characteristic with the expiratory airflow characteristic of the audio signal portion.
- the matching degree between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold includes:
- the matching degree between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold.
- the terminal determines whether the magnitude of the expiratory airflow of the blowing signal is greater than the preset airflow threshold, and receives the part of the determination signal portion whose expiratory airflow is greater than the preset airflow threshold.
- the expiratory airflow is quantized according to its magnitude.
- the preset airflow threshold in the embodiment of the present invention may be 0.10 liters per second (L/s).
- quantizing the expiratory airflow characteristic includes:
- comparing the expiratory airflow characteristic with the preset airflow threshold. If the expiratory airflow characteristic is greater than the preset airflow threshold, it is quantized to 1; otherwise, it is quantized to 0.
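A minimal sketch of this quantization rule, assuming one expiratory airflow magnitude (in L/s) has already been measured per syllable; the function name and the per-syllable measurement are hypothetical. The 0.10 L/s default is the example preset airflow threshold given above, and a value exactly equal to the threshold is quantized to 0 because the rule requires the airflow to be strictly greater.

```python
from typing import List

# Example preset airflow threshold from the text, in liters per second.
PRESET_AIRFLOW_THRESHOLD_LPS = 0.10


def quantize_airflow(flows_lps: List[float],
                     threshold_lps: float = PRESET_AIRFLOW_THRESHOLD_LPS) -> List[int]:
    """Quantize each (hypothetical) per-syllable airflow magnitude to 1 or 0."""
    # Strictly greater than the threshold -> 1; otherwise (including equal) -> 0.
    return [1 if flow > threshold_lps else 0 for flow in flows_lps]


print(quantize_airflow([0.02, 0.05, 0.31, 0.10]))  # [0, 0, 1, 0]
```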
- the matching degree between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold includes at least one of the following two cases:
- the expiratory airflow characteristic is quantized to 1, and the text corresponding to the audio signal portion is an aspirated sound;
- the expiratory airflow characteristic is quantized to 0, and the text corresponding to the audio signal portion is an unaspirated sound.
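The two cases can be turned into a numeric matching degree. The text does not define how the degree is computed; the sketch below assumes it is the fraction of syllables that fall into one of the two cases, which can then be compared with the preset threshold. The function name is hypothetical.

```python
from typing import List


def airflow_match_degree(quantized: List[int], aspirated: List[bool]) -> float:
    """Fraction of syllables whose blow bit agrees with the text's aspiration."""
    if not quantized or len(quantized) != len(aspirated):
        raise ValueError("expected one quantized value per syllable")
    agreeing = 0
    for bit, is_aspirated in zip(quantized, aspirated):
        # Case 1: quantized to 1 and the text is an aspirated sound.
        # Case 2: quantized to 0 and the text is an unaspirated sound.
        if (bit == 1 and is_aspirated) or (bit == 0 and not is_aspirated):
            agreeing += 1
    return agreeing / len(quantized)


degree = airflow_match_degree([0, 0, 1, 0], [False, False, True, False])
print(degree > 0.95)  # True
```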
- the blowing airflow may be divided into several levels, for example, 10 levels.
- if the airflow of the received blowing signal is at or above the fifth level, the preset threshold is determined to be met and the blowing signal is determined to be 1.
- if the airflow of the received blowing signal is below the fifth level, the preset threshold is determined not to be reached and the blowing signal is determined to be 0.
- FIG. 3A is a schematic diagram of quantization of a blowing signal according to Embodiment 1 of the present invention.
- FIG. 3B is a schematic diagram of quantization of a blowing signal according to Embodiment 2 of the present invention. As shown in FIG. 3B, when the airflow of the blowing signal reaches level 8, the blowing signal is determined to be 1.
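The level-based variant can be sketched as below. The text does not say how airflow maps onto levels; the linear scale up to a hypothetical maximum of 1.0 L/s is an assumption, as are the function names. The threshold level is a parameter so that both the fifth-level example and the eighth-level example of FIG. 3B can be expressed.

```python
def flow_to_level(flow_lps: float, max_flow_lps: float = 1.0, num_levels: int = 10) -> int:
    """Map an airflow magnitude linearly onto levels 0..num_levels (assumed scale)."""
    clipped = min(max(flow_lps, 0.0), max_flow_lps)
    # Truncate so a level counts only once the flow fully reaches it.
    return int(clipped / max_flow_lps * num_levels)


def quantize_by_level(level: int, threshold_level: int = 5) -> int:
    """Blowing signal is 1 at or above the threshold level, otherwise 0."""
    return 1 if level >= threshold_level else 0


print([quantize_by_level(flow_to_level(f)) for f in (0.45, 0.5)])  # [0, 1]
print(quantize_by_level(flow_to_level(0.7), threshold_level=8))    # 0
```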
- FIG. 4 is a schematic diagram of a process of voiceprint detection according to Embodiment 1 of the present invention.
- a user utters the voiceprint recognition signal "Open Sesame". After the microphone of the terminal receives it, the separation module separates the voiceprint recognition signal into an audio signal and a blowing signal, and the audio signal is sent to the voiceprint recognition module to complete voiceprint recognition.
- the audio-to-text module converts the audio into the corresponding text and determines whether the blowing signal corresponding to each character or word of the text should be an aspirated or an unaspirated sound.
- the blowing signal recognition module quantizes the received blowing signal, defining a blowing signal at or above the threshold as 1 and one below the threshold as 0, and outputs the resulting binary signal. The characters or words output by the audio-to-text module are then compared with the binary signal output by the blowing signal recognition module. For example, the blowing signal for the four syllables of "Open Sesame" spoken by the user is "0", "0", "1", "0"; the blowing signal when the user says "top" is "1", "1"; and the blowing signal when the user says "sport" is "0", "1".
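The comparison performed by the judgment module can be sketched for Mandarin input, assuming the "sesame" phrase in the example (Open Sesame, 芝麻开门) corresponds to the four pinyin syllables zhi ma kai men and that the audio-to-text module already returns pinyin. Treating only the aspirated initials p, t, k, q, c and ch as aspirated sounds is an assumption about how the text is labeled; all function names are hypothetical. Under these assumptions the expected bits for the example phrase reproduce the "0", "0", "1", "0" pattern above.

```python
from typing import List

# Assumption: only these pinyin initials count as "aspirated sounds".
ASPIRATED_INITIALS = {"p", "t", "k", "q", "c", "ch"}
TWO_LETTER_INITIALS = ("zh", "ch", "sh")
SINGLE_LETTER_INITIALS = set("bpmfdtnlgkhjqxrzcsyw")


def initial_of(syllable: str) -> str:
    """Return the pinyin initial of a syllable, or "" for a zero initial."""
    s = syllable.lower()
    for initial in TWO_LETTER_INITIALS:  # check zh/ch/sh before z/c/s
        if s.startswith(initial):
            return initial
    if s and s[0] in SINGLE_LETTER_INITIALS:
        return s[0]
    return ""


def expected_blow_bits(syllables: List[str]) -> List[int]:
    """1 for a syllable with an aspirated initial, otherwise 0."""
    return [1 if initial_of(s) in ASPIRATED_INITIALS else 0 for s in syllables]


def bits_match(expected: List[int], observed: List[int]) -> bool:
    """Judgment step: text-derived bits must equal the quantized blow bits."""
    return expected == observed


expected = expected_blow_bits(["zhi", "ma", "kai", "men"])
print(expected)                            # [0, 0, 1, 0]
print(bits_match(expected, [0, 0, 1, 0]))  # True
```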
- FIG. 5 is a flowchart of a method for voiceprint detection according to Embodiment 2 of the present invention. As shown in FIG. 5, this further specific implementation of the method provided by the embodiment of the present invention includes:
- S501 The terminal detects whether there is a sound signal.
- S503 The terminal extracts an audio signal portion and a judgment signal portion of the sound signal.
- S504 Compare the voiceprint feature of the audio signal portion with the preset voiceprint feature.
- S501, S502, S503, and S504 are the same as S201, S202, S203, and S204, respectively. For details, refer to the descriptions of S201, S202, S203, and S204; details are not described herein again.
- the determination signal portion may include a pointing direction feature, where the pointing direction feature is the direction in which the user corresponding to the sound signal emits the sound.
- the sound signal received by the terminal may have the problem that the audio signal and the blowing signal come from different directions: another user plays a recording to produce the audio signal while using his or her own mouth to produce the blowing signal, so the audio signal and the blowing signal do not come from the same voice signal and their directions are inconsistent. For example, the attacker mouths the words "Open Sesame" with the corresponding blowing but without pronouncing them, so the direction of the played recording differs from the direction of the blown air.
- the terminal therefore determines whether the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range, to determine whether the audio signal and the blowing signal come from the same direction, thereby preventing recording attacks.
- for example, suppose the voice signal of user A can unlock the terminal. If user B plays a voice recording of user A while blowing through the mouth without pronouncing the words, the direction of the played recording and that of the mouth blowing may differ; with ordinary voiceprint unlocking, however, user B might still unlock the terminal successfully, which is a security risk. In the embodiment of the present invention, it is determined whether the two directivity directions are within a preset range of the microphone array.
- if the two directivity directions are within the preset range of the microphone array, the sound signals come from the same direction and there is no recording attack; if they are not within the preset range, the sound signals come from different directions and there is a recording attack.
- determining whether the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range includes: comparing the angle of the pointing direction of the determination signal portion and the angle of the pointing direction of the audio signal portion, respectively, with a preset pointing angle threshold.
- the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion being within the preset range includes: the angle of the pointing direction of the determination signal portion and the angle of the pointing direction of the audio signal portion both being smaller than the preset pointing angle threshold.
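The per-angle comparison can be sketched as below. The text reads as comparing each pointing-direction angle with the preset threshold separately; the sketch assumes both angles have already been estimated relative to the microphone's listening axis, and the 30-degree default and the function name are hypothetical.

```python
def within_preset_range(audio_angle_deg: float,
                        blow_angle_deg: float,
                        pointing_angle_threshold_deg: float = 30.0) -> bool:
    """True only if both pointing-direction angles are below the threshold.

    Angles are assumed to be measured from the axis of the microphone's
    listening beam, so their sign (left or right of the axis) is ignored.
    """
    # Each angle is compared with the preset pointing angle threshold on its own.
    audio_ok = abs(audio_angle_deg) < pointing_angle_threshold_deg
    blow_ok = abs(blow_angle_deg) < pointing_angle_threshold_deg
    return audio_ok and blow_ok


# A recording played from 50 degrees off-axis while the attacker blows on-axis:
print(within_preset_range(50.0, 5.0))  # False
```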
- microphone directional reception technology can be used to prevent recording attacks.
- the terminal can provide a mode in which the microphone receives signals directionally, that is, the microphone enters a directional listening mode and receives only audio and blowing signals within a preset angular range. Limiting the range from which the microphone receives audio and blowing signals prevents recording attacks.
- directional reception by the microphone is realized with sound source localization technology, which can be implemented with a microphone array.
- sound arriving from different directions can be captured, and an algorithm steers the microphone array toward a specific direction, forming a "beam" that points at the sound source so that sound from that direction is captured.
- the audio signal can thus be used to make the microphone receive the voice signal directionally.
- because sound waves reach each microphone in the array at slightly different times, a microphone array can achieve better directivity than a single microphone.
- in a specific implementation, the microphone array can point the sound beam toward a range of angles: the time delay between microphones is estimated, for example by the generalized cross-correlation method, the smoothed coherence transform, the phase transform, or maximum likelihood estimation, and the receiving direction is then adjusted according to the delay and the positions of the microphones in the array.
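The delay estimation step can be illustrated with plain time-domain cross-correlation, the unweighted member of the generalized cross-correlation family; the phase-transform or smoothed-coherence weightings mentioned above are deliberately omitted to keep the sketch dependency-free. The function name, the brute-force search over lags, and the test pulses are illustrative choices, not the method described in the text. Dividing the lag by the sampling rate gives the inter-microphone delay in seconds.

```python
from typing import Sequence


def estimate_delay_samples(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which y trails x, found by brute-force cross-correlation.

    A positive result means the sound reached the microphone recording x first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x[i] with y[i + lag] over the overlapping samples.
        score = sum(x[i] * y[i + lag]
                    for i in range(len(x))
                    if 0 <= i + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


fs_hz = 16000
mic_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # same pulse, 2 samples later
lag = estimate_delay_samples(mic_a, mic_b, max_lag=4)
print(lag, lag / fs_hz)  # 2 0.000125
```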
- the effective receiving direction is a cone with angle θ, and it is further determined whether the audio signal and the blowing signal received from the sound source S both arrive from a direction at angle θ1 that lies within the cone, that is, whether θ1 is smaller than θ.
- FIG. 6 is a schematic diagram of an angle of a pointing direction of a sound signal according to Embodiment 1 of the present invention.
- for example, a mobile phone has two microphones A and B separated by a fixed, known distance d; the speed of sound is the constant C, and the difference between the times at which the sound reaches microphones A and B is τ.
- the angle θ1 between the sound source (the sound signal) and microphone B can then be calculated, and whether θ1 lies within the cone of angle θ that defines the effective sound source direction is judged. This determines whether the audio signal and the blowing signal from the sound source are effective signals received by the directional microphone.
- τ is the delay of the sound reaching the two microphones
- d is the distance between the two microphones
- θ1 is the directivity direction angle of the speech signal
- C is the speed of the sound.
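The formula relating these quantities is not reproduced in the text. Writing τ for the delay, d for the microphone spacing, C for the speed of sound, θ1 for the arrival angle and θ for the cone angle, the usual far-field (plane-wave) assumption for a two-microphone pair gives a path difference C·τ = d·cos θ1, so θ1 = arccos(Cτ/d). The sketch below uses that standard relation rather than any formula confirmed by the source; the function names, the 0.15 m spacing and the 343 m/s speed of sound are assumptions, and treating "inside the cone" as θ1 < θ follows the description of the effective cone above.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def direction_angle_deg(tau_s: float, d_m: float, c_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Far-field arrival angle theta1 (degrees) from the delay tau between mics A and B."""
    ratio = c_m_s * tau_s / d_m
    # |C * tau| cannot physically exceed d; clamp measurement noise into acos's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))


def inside_effective_cone(theta1_deg: float, cone_theta_deg: float) -> bool:
    """Effective signal if theta1 is smaller than the cone angle theta."""
    return theta1_deg < cone_theta_deg


d = 0.15  # hypothetical microphone spacing in meters
theta1 = direction_angle_deg(0.5 * d / SPEED_OF_SOUND_M_S, d)
print(round(theta1, 3), inside_effective_cone(theta1, 70.0))  # 60.0 True
```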
- in addition, a limit on the distance between the sound source and the microphone can be set.
- the distance between the sound source and the microphone is measured with a light sensor, an infrared sensor, an ultrasonic sensor, or the like, and a distance threshold is set so that it can be determined whether the directions of a recording attack and the blowing signal are consistent, because if the sound source is close to the microphone, the played recording and the blowing signal will appear to come from the same direction.
- S506 When the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, and the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range, determine that the voiceprint detection result is success.
- that is, when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the audio signal and the blowing signal in the voiceprint recognition signal come from the same direction, the voiceprint detection result is determined to be success.
- in the voiceprint detection method of this embodiment, the terminal detects whether there is a sound signal. If it detects one, the terminal receives the sound signal, extracts the audio signal portion and the determination signal portion of the sound signal, compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and determines whether the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range. When the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the two pointing direction features are within the preset range, the voiceprint detection result is determined to be success.
- thus, when recognizing the sound signal, the terminal divides it into an audio signal portion and a determination signal portion to realize double recognition of the sound signal, while effectively guarding against the case in which the direction of a played recording is inconsistent with the direction of mouth blowing, which improves the security of voiceprint unlocking.
- FIG. 7 is a flowchart of a method for voiceprint detection according to Embodiment 3 of the present invention.
- as shown in FIG. 7, this further implementation of the method provided by the embodiment of the present invention includes:
- S701 The terminal detects whether there is a sound signal.
- S703 The terminal extracts an audio signal portion and a judgment signal portion of the sound signal.
- S704 Compare the voiceprint feature of the audio signal portion with the preset voiceprint feature.
- S701, S702, S703, and S704 are implemented in the same manner as S201, S202, S203, and S204, respectively.
- for S201, S202, S203, and S204, refer to the foregoing descriptions; details are not described herein again.
- the determination signal portion may include a sensed temperature characteristic, where the sensed temperature characteristic is the temperature sensed when the user corresponding to the sound signal emits the sound.
- the sensed temperature characteristic of the determination signal portion is compared with a preset temperature threshold to determine whether it is greater than or equal to the preset temperature threshold.
- the terminal can sense the temperature near the microphone with an infrared sensor to determine that the voice signal comes from a human body, such as the user, rather than from an electronic device playing a recording.
- the preset temperature threshold may be determined according to the temperature range of the human body; it is generally set to the lowest temperature within the normal human range, for example, 36 degrees Celsius.
- in this way, whether the voice signal received by the terminal comes from a human body can be determined.
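The temperature check reduces to a threshold comparison. A minimal sketch, assuming the infrared sensor reading is already available in degrees Celsius; the 36 °C default is the example threshold above and the function name is hypothetical.

```python
# Example preset temperature threshold from the text (lower end of normal body temperature).
PRESET_TEMPERATURE_THRESHOLD_C = 36.0


def sensed_temperature_ok(sensed_temp_c: float,
                          threshold_c: float = PRESET_TEMPERATURE_THRESHOLD_C) -> bool:
    """True if the infrared reading near the microphone suggests a human source."""
    # "Greater than or equal to" the preset threshold counts as a pass.
    return sensed_temp_c >= threshold_c


print(sensed_temperature_ok(36.4), sensed_temperature_ok(24.0))  # True False
```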
- in the voiceprint detection method of this embodiment, the terminal detects whether there is a sound signal. If it detects one, the terminal receives the sound signal, extracts the audio signal portion and the determination signal portion of the sound signal, compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and compares the sensed temperature characteristic of the determination signal portion with the preset temperature threshold. When the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the sensed temperature characteristic of the determination signal portion is greater than or equal to the preset temperature threshold, the voiceprint detection result is determined to be success.
- the preset temperature threshold is used to determine that the voice signal received by the terminal comes from the user rather than from an electronic device playing a recording, thereby preventing recording attacks and improving the security of voiceprint unlocking.
- FIG. 8 is a flowchart of a method for voiceprint detection according to Embodiment 4 of the present invention.
- the method provided by this embodiment of the present invention is another specific implementation of the method provided in Embodiment 1 shown in FIG. 2.
- the method provided by the embodiment of the present invention includes:
- before the terminal detects the voiceprint recognition signal and enters the living voiceprint recognition mode, the method further includes:
- the terminal detects whether there is a voiceprint recognition signal, where the voiceprint recognition signal is a sound signal detected while the terminal is in the locked (not yet unlocked) state.
- the terminal detecting whether there is a voiceprint recognition signal includes: in the locked state, the terminal detects whether there is a sound signal; if the terminal detects a sound signal, that sound signal is the voiceprint recognition signal.
- S802 The terminal receives the voiceprint recognition signal and stores it.
- S803 The terminal extracts an audio signal portion and a judgment signal portion of the voiceprint recognition signal.
- S804 The terminal determines whether the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold. If the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, then S805 is performed; otherwise, S808 is performed.
- the terminal may compare the voiceprint feature of the audio signal portion with the preset voiceprint feature to determine whether the matching degree between them exceeds the preset threshold.
- S805 The terminal determines whether the audio signal of the audio signal portion and the blowing signal of the determination signal portion come from the same direction. If they do, S806 is performed; otherwise, S808 is performed.
- the terminal may determine whether the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range, to determine whether the audio signal of the audio signal portion and the blowing signal of the determination signal portion come from the same direction.
- S806 The terminal determines whether the text corresponding to the audio signal portion matches the expiratory airflow in the determination signal portion. If it matches, S807 is performed; otherwise, S808 is performed.
- the terminal may compare the expiratory airflow characteristic of the determination signal portion with the expiratory airflow characteristic of the audio signal portion to determine whether the text corresponding to the audio signal portion matches the expiratory airflow in the determination signal portion.
- the method further includes: determining whether the sensed temperature characteristic of the determination signal portion is greater than or equal to a preset temperature threshold. When the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, the matching degree between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold, the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range, and the sensed temperature characteristic of the determination signal portion is greater than or equal to the preset temperature threshold, the living voiceprint detection is determined to be successful.
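The flow of S804 to S806, together with the temperature check just described, can be sketched as a chain of early exits that returns the label of the next step. The sketch assumes each feature has already been reduced to a score or an angle; the `Features` class, the function name and every threshold default are hypothetical, and treating the temperature check as optional (skipped when no reading is available) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Features:
    voiceprint_score: float                # matching degree vs. preset voiceprint, in [0, 1]
    audio_angle_deg: float                 # pointing-direction angle of the audio portion
    blow_angle_deg: float                  # pointing-direction angle of the blowing signal
    airflow_score: float                   # text vs. expiratory airflow matching degree
    sensed_temp_c: Optional[float] = None  # infrared reading, if one is available


def embodiment4_next_step(f: Features,
                          voiceprint_thr: float = 0.95,
                          angle_thr_deg: float = 30.0,
                          airflow_thr: float = 0.95,
                          temp_thr_c: float = 36.0) -> str:
    """Return "S807" when every check passes, otherwise "S808"."""
    # S804: voiceprint matching degree must exceed the preset threshold.
    if not f.voiceprint_score > voiceprint_thr:
        return "S808"
    # S805: audio and blowing signals must both lie within the pointing angle threshold.
    if not (abs(f.audio_angle_deg) < angle_thr_deg and abs(f.blow_angle_deg) < angle_thr_deg):
        return "S808"
    # S806: the text of the audio portion must match the expiratory airflow.
    if not f.airflow_score > airflow_thr:
        return "S808"
    # Temperature variant: the sensed temperature must reach the preset threshold.
    if f.sensed_temp_c is not None and f.sensed_temp_c < temp_thr_c:
        return "S808"
    return "S807"


print(embodiment4_next_step(Features(0.97, 10.0, 5.0, 1.0, 36.5)))  # S807
```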
- in the voiceprint detection method of this embodiment, when a voiceprint recognition signal is detected, the terminal enters the living voiceprint recognition mode, receives and stores the voiceprint recognition signal, and extracts its audio signal portion and determination signal portion. When the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the judgment feature of the determination signal portion satisfies a preset determination condition, the voiceprint detection result is determined to be success. Because the terminal divides the voiceprint recognition signal into an audio signal portion and a determination signal portion, double recognition of the voiceprint recognition signal is realized and the security of voiceprint unlocking is improved.
- before the terminal extracts the audio signal portion and the determination signal portion of the voiceprint recognition signal, the method further includes:
- the terminal separates the voiceprint recognition signal into an audio signal portion and a judgment signal portion
- the terminal separates the voiceprint recognition signal into an audio signal portion and a judgment signal portion, including:
- the terminal filters the voiceprint recognition signal by using a filter of a first preset frequency to obtain an audio signal portion
- the terminal filters the voiceprint recognition signal by using a filter of a second preset frequency to obtain a judgment signal part
- the filter of the first preset frequency is a high pass filter
- the filter of the second preset frequency is a low pass filter
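The separation into an audio signal portion and a determination signal portion can be sketched with first-order RC-style filters. The text specifies only that a high-pass and a low-pass filter are used; the first-order design, the 300 Hz and 100 Hz cutoff defaults, and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def low_pass(x: Sequence[float], cutoff_hz: float, fs_hz: float) -> List[float]:
    """First-order RC low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for sample in x:
        y += alpha * (sample - y)
        out.append(y)
    return out


def high_pass(x: Sequence[float], cutoff_hz: float, fs_hz: float) -> List[float]:
    """First-order RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    a = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for sample in x:
        y = a * (y + sample - prev_x)
        prev_x = sample
        out.append(y)
    return out


def separate(signal: Sequence[float], fs_hz: float,
             audio_cutoff_hz: float = 300.0,
             blow_cutoff_hz: float = 100.0) -> Tuple[List[float], List[float]]:
    """Split into (audio signal portion, determination signal portion)."""
    audio_part = high_pass(signal, audio_cutoff_hz, fs_hz)   # first preset frequency
    judgment_part = low_pass(signal, blow_cutoff_hz, fs_hz)  # second preset frequency
    return audio_part, judgment_part


fs = 8000.0
audio, judgment = separate([1.0] * 2000, fs)  # constant offset, like a steady blow
print(round(audio[-1], 3), round(judgment[-1], 3))  # 0.0 1.0
```

A steady offset (slow airflow) ends up almost entirely in the determination portion, while fast oscillations pass mainly through the high-pass branch.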
- FIG. 9 is a schematic structural diagram of a terminal according to Embodiment 1 of the present invention.
- the terminal provided by the embodiment of the present invention includes: a detecting module 901, a receiving module 902, an extracting module 903, a first matching module 904, and a determining module 905.
- the detecting module 901 is configured to detect whether there is a sound signal.
- the receiving module 902 is configured to receive a sound signal.
- the extraction module 903 is configured to extract an audio signal portion and a determination signal portion of the sound signal.
- the first matching module 904 is configured to compare the voiceprint feature of the audio signal portion with the preset voiceprint feature; and compare the expiratory airflow characteristic of the determination signal portion with the expiratory airflow characteristic of the audio signal portion.
- the expiratory airflow characteristic is a feature of the airflow exhaled when the user corresponding to the sound signal emits the sound.
- the determining module 905 is configured to determine that the voiceprint detection result is success when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the matching degree between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold.
- the terminal of the embodiment of the present invention is used to perform the technical solution of the method embodiment shown in FIG. 2, and the implementation principle and the technical effect are similar, and details are not described herein again.
- the receiving module 902 is further configured to receive an expiratory airflow feature that is greater than a preset airflow threshold in the determination signal portion.
- the terminal further includes a quantization module.
- the quantization module is configured to quantize the expiratory airflow characteristic.
- the first matching module 904 is further configured to compare the quantized expiratory airflow characteristics with the expiratory airflow characteristics of the audio signal portion.
- the matching degree, as determined by the determining module 905, between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold includes: the matching degree between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold.
- the first matching module 904 is specifically configured to compare the expiratory airflow characteristic with the preset airflow threshold: if the expiratory airflow characteristic is greater than the preset airflow threshold, it is quantized to 1; otherwise, it is quantized to 0.
- the matching degree, as determined by the determining module 905, between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeding a preset threshold includes at least one of the following two cases:
- the expiratory airflow characteristic is quantized to 1, and the text corresponding to the audio signal portion is an aspirated sound.
- the expiratory airflow characteristic is quantized to 0, and the text corresponding to the audio signal portion is an unaspirated sound.
- FIG. 10 is a schematic structural diagram of a terminal according to Embodiment 2 of the present invention. As shown in FIG. 10, the terminal provided by the embodiment of the present invention further includes: a second matching module 906, based on the foregoing embodiment.
- the second matching module 906 is configured to determine whether the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range.
- the determining module 905 is further configured to determine that the voiceprint detection result is success when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, the matching degree between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold, and the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion are within a preset range.
- the terminal of the embodiment of the present invention is used to implement the technical solution of the method embodiment shown in FIG. 5, and the implementation principle and the technical effect are similar, and details are not described herein again.
- the second matching module 906 is specifically configured to: respectively compare an angle of a pointing direction of the determining signal portion and an angle of a pointing direction of the audio signal portion with a preset pointing angle threshold.
- the pointing direction feature of the determination signal portion and the pointing direction feature of the audio signal portion being within a preset range, as determined by the determining module 905, includes: the angle of the pointing direction of the determination signal portion and the angle of the pointing direction of the audio signal portion both being smaller than the preset pointing angle threshold.
- FIG. 11 is a schematic structural diagram of a terminal according to Embodiment 3 of the present invention. As shown in FIG. 11, the terminal provided by the embodiment of the present invention further includes: a third matching module 907, based on the foregoing embodiment.
- the third matching module 907 is configured to compare the perceived temperature feature of the determination signal portion with a preset temperature threshold.
- the determining module 905 is further configured to determine that the voiceprint detection result is success when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, the matching degree between the expiratory airflow characteristic of the determination signal portion and that of the audio signal portion exceeds a preset threshold, the pointing direction feature of the determination signal portion and that of the audio signal portion are within a preset range, and the sensed temperature characteristic of the determination signal portion is greater than or equal to the preset temperature threshold.
- the terminal of the embodiment of the present invention is used to implement the technical solution of the method embodiment shown in FIG. 7; the implementation principle and technical effects are similar, and details are not described herein again.
- the terminal further includes: a separation module.
- a separation module configured to separate the sound signal into an audio signal portion and a determination signal portion before the extraction module extracts the audio signal portion of the sound signal and the determination signal portion.
- the separation module is specifically configured to: filter the sound signal by using a filter of a first preset frequency to obtain an audio signal portion; and filter the sound signal by using a filter of a second preset frequency to obtain a judgment signal portion.
- the filter of the first preset frequency is a high pass filter
- the filter of the second preset frequency is a low pass filter
- FIG. 12 is a schematic structural diagram of a terminal according to Embodiment 4 of the present invention.
- a terminal provided by an embodiment of the present invention includes: a microphone 1201, a memory 1202, and a processor 1203.
- the microphone 1201 may correspond to the detecting module 901 of the terminal and is used to detect whether there is a sound signal and, if a sound signal is detected, to receive it.
- the microphone 1201 can also be configured to receive the expiratory airflow characteristic of the determination signal portion that is greater than a preset airflow threshold.
- the memory 1202 is configured to store execution instructions. The processor 1203 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. When the terminal is running, the processor 1203 communicates with the memory 1202 and invokes the execution instructions to perform the following operations:
- the expiratory airflow characteristic is a feature of the airflow exhaled when the user corresponding to the sound signal emits the sound; when the matching degree between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, and the matching degree between the expiratory airflow characteristic of the determination signal portion and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold, determining that the voiceprint detection result is success.
- the terminal may further include: a recorder 1204.
- the recorder 1204 can be used to collect sound signals emitted by the user, perform feature analysis on the sound signals, and acquire preset voiceprint features and store them.
- the processor 1203 is further configured to perform the following operations:
- quantizing the expiratory airflow characteristic, and comparing the quantized expiratory airflow characteristic with the expiratory airflow characteristic of the audio signal portion;
- that the degree of matching, as determined by the processor 1203, between the expiratory airflow characteristic of the judgment signal portion and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold includes: the degree of matching between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeds the preset threshold.
- the processor 1203 is further configured to perform the following operations:
- the degree of matching between the quantized expiratory airflow characteristic determined by the processor 1203 and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold, including: at least one of the following two cases:
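- the expiratory airflow characteristic is quantized to 1 and the text corresponding to the audio signal portion is an aspirated sound; or the expiratory airflow characteristic is quantized to 0 and the text corresponding to the audio signal portion is an unaspirated sound.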
- the processor 1203 is further configured to perform the following operations:
- that the processor 1203 determines the pointing direction feature of the judgment signal portion and the pointing direction feature of the audio signal portion to be within a preset range includes: both the angle of the pointing direction of the judgment signal portion and the angle of the pointing direction of the audio signal portion are smaller than the preset pointing angle threshold.
- the processor 1203 is further configured to perform the following operations:
- the signal frequency of the audio signal portion is compared with the preset voiceprint sample characteristic frequency.
- the processor 1203 is further configured to perform the following operations:
- the sound signal is separated into an audio signal portion and a judgment signal portion.
- the sound signal is filtered by a filter of a first preset frequency to obtain the audio signal portion, and by a filter of a second preset frequency to obtain the judgment signal portion, where the filter of the first preset frequency is a high-pass filter and the filter of the second preset frequency is a low-pass filter.
- FIG. 13 is a schematic structural diagram of an apparatus for detecting a voiceprint according to Embodiment 1 of the present invention.
- the apparatus provided by the embodiments of the present invention can be implemented as a standalone device or integrated into various voice assistant devices, such as a set-top box, a mobile phone, a tablet personal computer, a laptop computer, a multimedia player, a digital camera, a personal digital assistant (PDA), a navigation device, a mobile Internet device (MID), or a wearable device.
- the apparatus provided by the embodiment of the present invention may include one or more of the following units: an input unit, a storage unit, a processor unit, a communication unit, a peripheral interface, an output unit, and a power source.
- the microphone can serve as the input unit, which receives audio signals so that the terminal can detect whether a voiceprint recognition signal is present.
- the memory can serve as the storage unit, which can store execution instructions, such as operating programs and application programs, or execution instructions for modules such as a blow signal recognition module, a blow signal and audio signal separation module, and a blow signal determination module.
- the processor can serve as the processor unit, which may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
- when the terminal is running, the processor unit communicates with the storage unit and invokes the execution instructions to perform the operations in the foregoing method embodiments.
- the communication unit can be used for wired or wireless communication between the terminal and other devices.
- the peripheral interface can be used to provide an interface between the terminal and the peripheral interface module, wherein the peripheral interface module can be a keyboard, a button, or the like.
- the output unit can be used to output an audio signal.
- the power supply can be used to provide power to the various units of the terminal.
- Embodiments of the present invention also provide a non-transitory computer readable storage medium, such as a storage unit including instructions that are executable by a processor of a voiceprint detecting device to perform the above method.
- the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
- a non-transitory computer readable storage medium storing computer instructions for causing an apparatus for controlling a cache to perform an operation in the above-described method embodiments.
- when the instructions in the storage medium are executed by the processor of the terminal, the terminal is enabled to perform the operations in the foregoing method embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Telephone Function (AREA)
- Toys (AREA)
Abstract
Description
The present invention relates to the field of electronic technologies, and in particular, to a voiceprint detection method and apparatus.
With the development of mobile terminals and intelligent interaction, terminal devices have become an indispensable part of people's daily lives. To protect the private information that users store on terminal devices, most terminal devices provide a password-protected unlock function: when a terminal device is in the locked state, the user can unlock it only by entering the correct password. Many terminal unlocking methods currently exist. Because voice unlocking offers higher security than other unlocking methods, it has become widely used: the terminal device or application software provides a voice unlock function that verifies the user by voice and then unlocks the terminal device or provides a service.
At present, voice unlocking mainly authenticates the user by voiceprint. During unlocking, the sound signal input by the user is compared with a preset sound signal; if the voiceprint input by the user matches the preset voiceprint, that is, both are determined to come from the same person, the terminal is unlocked.
However, the current voiceprint unlocking method cannot prevent recording attacks: an attacker can record the user speaking the voiceprint recognition text and play the recording back to the terminal, and the voiceprint unlock still succeeds. Voiceprint unlocking therefore carries a security risk, and its security is low.
Summary of the Invention
The present invention provides a voiceprint detection method and apparatus that improve the security of voiceprint unlocking.
In a first aspect, the present invention provides a voiceprint detection method, including: a terminal detects whether there is a sound signal; if the terminal detects a sound signal, the terminal receives the sound signal and extracts an audio signal portion and a judgment signal portion from it; the terminal compares the voiceprint feature of the audio signal portion with a preset voiceprint feature, and compares the expiratory airflow characteristic of the judgment signal portion with the expiratory airflow characteristic of the audio signal portion; when the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, and the degree of matching between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion exceeds a preset threshold, the terminal determines that the voiceprint detection succeeds. In this way, when recognizing a sound signal, the terminal splits it into an audio signal portion and a judgment signal portion and recognizes the sound signal twice. This effectively prevents the case in which a user plays a recording while lip-syncing and blowing air toward the terminal, and improves the security of voiceprint unlocking.
In a possible implementation, the terminal receives the expiratory airflow characteristic of the judgment signal portion that is greater than a preset airflow threshold; quantizes the expiratory airflow characteristic; and compares the quantized expiratory airflow characteristic with the expiratory airflow characteristic corresponding to the text of the audio signal portion. If the degree of matching between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeds a preset threshold, the degree of matching between the expiratory airflow characteristic of the judgment signal portion and that of the audio signal portion exceeds the preset threshold. Comparing the quantized characteristic with that of the audio signal portion determines whether their degree of matching exceeds the preset threshold, and thus improves the accuracy of blow-signal recognition.
In a possible implementation, the expiratory airflow characteristic is compared with a preset airflow threshold: if it is greater than the threshold, it is quantized to 1; otherwise, it is quantized to 0. If at least one of the following two cases holds, the degree of matching between the quantized expiratory airflow characteristic and the expiratory airflow characteristic of the audio signal portion exceeds the preset threshold: (1) the characteristic is quantized to 1 and the text corresponding to the audio signal portion is an aspirated sound; (2) the characteristic is quantized to 0 and the text corresponding to the audio signal portion is an unaspirated sound. Comparing the expiratory airflow characteristic with the preset airflow threshold is how the characteristic is quantized.
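The quantize-and-compare rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation; it assumes that one airflow level has already been measured and aligned per syllable (the alignment step is not shown), that the text is available as Mandarin pinyin, and that a syllable counts as aspirated when its initial is p, t, k, q, c or ch. Nasals such as "m" fall outside the aspirated/unaspirated contrast, and the sketch simply treats them as unaspirated (no strong burst expected). All function names and threshold values are hypothetical.

```python
from typing import List, Sequence

# Assumption: aspirated Mandarin pinyin initials are p, t, k, q, c and ch,
# so any syllable whose first letter is one of these is treated as aspirated.
ASPIRATED_INITIAL_LETTERS = ("p", "t", "k", "q", "c")


def is_aspirated(pinyin_syllable: str) -> bool:
    """Return True if the pinyin syllable begins with an aspirated initial."""
    return pinyin_syllable.lower().startswith(ASPIRATED_INITIAL_LETTERS)


def quantize_airflow(airflow_levels: Sequence[float], airflow_threshold: float) -> List[int]:
    """Quantize each per-syllable airflow level: 1 if above the threshold, else 0."""
    return [1 if level > airflow_threshold else 0 for level in airflow_levels]


def airflow_match_ratio(quantized: Sequence[int], syllables: Sequence[str]) -> float:
    """Fraction of syllables where a 1 coincides with aspirated text and a 0 with unaspirated text."""
    if len(quantized) != len(syllables):
        raise ValueError("expected one airflow value per syllable")
    if not syllables:
        return 0.0
    hits = sum(1 for q, s in zip(quantized, syllables) if (q == 1) == is_aspirated(s))
    return hits / len(syllables)


def airflow_matches(airflow_levels: Sequence[float], syllables: Sequence[str],
                    airflow_threshold: float, match_threshold: float) -> bool:
    """True when the match ratio exceeds the preset matching threshold."""
    quantized = quantize_airflow(airflow_levels, airflow_threshold)
    return airflow_match_ratio(quantized, syllables) > match_threshold
```

For the password "zhi ma kai men" ("Open Sesame"), only "kai" is aspirated, so a live speaker should produce a strong burst on the third syllable alone. A loudspeaker replay paired with blowing at the wrong moments lowers the ratio.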
In a possible implementation, the terminal determines whether the pointing direction feature of the judgment signal portion and the pointing direction feature of the audio signal portion are within a preset range. The voiceprint detection succeeds when the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, the degree of matching between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold, and the pointing direction features of the judgment signal portion and the audio signal portion are within the preset range. Checking whether the audio signal and the blow signal in the sound signal come from the same direction effectively counters attacks in which the played recording and the lip-synced blowing arrive from different directions, improving the security of voiceprint unlocking.
In a possible implementation, the angle of the pointing direction of the judgment signal portion and the angle of the pointing direction of the audio signal portion are each compared with a preset pointing angle threshold. If both angles are smaller than the threshold, the pointing direction features of the judgment signal portion and the audio signal portion are within the preset range. This pair of comparisons is how the terminal determines whether the two pointing direction features are within the preset range.
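The angle check above can be sketched as follows. The patent does not say how the terminal estimates the direction of each signal portion (a microphone array with a direction-of-arrival estimator is one common option), so this sketch assumes that each portion already comes with a 3-D direction vector, and it measures each angle against a hypothetical microphone axis of (0, 0, 1). The threshold value in the usage note is illustrative only.

```python
import math
from typing import Sequence


def angle_to_axis_deg(direction: Sequence[float],
                      axis: Sequence[float] = (0.0, 0.0, 1.0)) -> float:
    """Angle in degrees between an estimated source direction and the (assumed) microphone axis."""
    dot = sum(d * a for d, a in zip(direction, axis))
    norm = math.hypot(*direction) * math.hypot(*axis)
    if norm == 0.0:
        raise ValueError("direction and axis must be non-zero vectors")
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against floating-point rounding
    return math.degrees(math.acos(cos_angle))


def directions_within_range(judgment_angle_deg: float, audio_angle_deg: float,
                            pointing_angle_threshold_deg: float) -> bool:
    """Both pointing angles must be smaller than the preset pointing angle threshold."""
    return (judgment_angle_deg < pointing_angle_threshold_deg
            and audio_angle_deg < pointing_angle_threshold_deg)
```

With a 30-degree threshold, a voice arriving 10 degrees off-axis together with a breath 15 degrees off-axis passes. A recording played from the side (45 degrees) while someone blows straight at the microphone fails.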
In a possible implementation, the perceived temperature feature of the judgment signal portion is compared with a preset temperature threshold. The voiceprint detection succeeds when the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, the degree of matching between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold, the pointing direction features of the two portions are within the preset range, and the perceived temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold. Checking whether the perceived temperature reaches the preset threshold establishes that the sound signal received by the terminal comes from the user rather than from an electronic device playing a recording. This defeats recording attacks and improves the security of voiceprint unlocking.
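The combined decision described above can be sketched as one predicate over the measured quantities. This is a minimal sketch under stated assumptions: every matching degree is already available as a number, and the default threshold values in `Thresholds` are hypothetical placeholders, since the patent leaves them as "preset". Following the text, "exceeds" is read as a strict `>`, and the temperature check uses `>=`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    voiceprint_match: float      # degree of matching with the preset voiceprint feature
    airflow_match: float         # degree of matching of the expiratory airflow characteristics
    judgment_angle_deg: float    # pointing angle of the judgment signal portion
    audio_angle_deg: float       # pointing angle of the audio signal portion
    breath_temperature_c: float  # perceived temperature of the judgment signal portion


@dataclass
class Thresholds:
    # Hypothetical default values, chosen only for illustration.
    voiceprint: float = 0.8
    airflow: float = 0.8
    pointing_angle_deg: float = 30.0
    temperature_c: float = 30.0


def detection_succeeds(obs: Observations, th: Optional[Thresholds] = None) -> bool:
    """Voiceprint, airflow, direction and temperature checks must all pass."""
    th = th or Thresholds()
    return (obs.voiceprint_match > th.voiceprint
            and obs.airflow_match > th.airflow
            and obs.judgment_angle_deg < th.pointing_angle_deg
            and obs.audio_angle_deg < th.pointing_angle_deg
            and obs.breath_temperature_c >= th.temperature_c)
```

A replayed recording that matches the voiceprint and is paired with well-timed blowing from the right direction can still fail on temperature, because air pushed out by a loudspeaker is not warmed the way exhaled breath is.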
In a possible implementation, before extracting the audio signal portion and the judgment signal portion, the terminal separates the sound signal into these two portions. Specifically, the terminal filters the sound signal with a filter of a first preset frequency to obtain the audio signal portion, and with a filter of a second preset frequency to obtain the judgment signal portion. The filter of the first preset frequency is a high-pass filter and the filter of the second preset frequency is a low-pass filter. Passing the sound signal through filters of preset frequencies thus separates it into the audio signal portion and the judgment signal portion.
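The high-pass/low-pass separation above can be sketched with first-order RC filters in plain Python. The patent gives no cutoff values, so the 300 Hz audio cutoff and 100 Hz breath cutoff below are hypothetical. First-order filters roll off gently, which means the two portions overlap considerably; a production design would likely use steeper filters. The sketch only shows the structure of the split: the voice goes through the high-pass path and the low-frequency breath component goes through the low-pass path.

```python
import math
from typing import List, Sequence, Tuple


def _rc(cutoff_hz: float) -> float:
    """RC time constant corresponding to a first-order cutoff frequency."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def low_pass(samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order RC low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    alpha = dt / (_rc(cutoff_hz) + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = _rc(cutoff_hz)
    alpha = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


def separate(samples: Sequence[float], sample_rate_hz: float,
             audio_cutoff_hz: float = 300.0,
             breath_cutoff_hz: float = 100.0) -> Tuple[List[float], List[float]]:
    """Split a sound signal into (audio_signal_portion, judgment_signal_portion)."""
    audio = high_pass(samples, audio_cutoff_hz, sample_rate_hz)
    judgment = low_pass(samples, breath_cutoff_hz, sample_rate_hz)
    return audio, judgment
```

A constant (DC-like) input ends up almost entirely in the judgment portion, and a rapidly alternating input ends up almost entirely in the audio portion.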
In a possible implementation, the voiceprint feature of the audio signal portion includes at least one of a voiceprint waveform and a signal frequency, and the comparison uses at least one of the following: comparing the voiceprint waveform of the audio signal portion with a preset voiceprint sample feature waveform, and comparing the signal frequency of the audio signal portion with a preset voiceprint sample feature frequency. If the degree of matching between the voiceprint waveform and the preset sample feature waveform exceeds a preset threshold, and/or the degree of matching between the signal frequency and the preset sample feature frequency exceeds a preset threshold, the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold. These waveform and/or frequency comparisons are how the voiceprint feature of the audio signal portion is compared with the preset voiceprint feature.
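The waveform and frequency comparisons above can be sketched with deliberately simple stand-ins. Cosine similarity of aligned waveforms serves as the "waveform matching degree", and a zero-crossing frequency estimate serves as the "signal frequency". Neither technique is named in the patent, and real speaker verification usually relies on spectral features or learned embeddings. The sketch reads "and/or" as "every enabled comparison must pass", and the function names are hypothetical.

```python
import math
from typing import Sequence


def waveform_similarity(signal: Sequence[float], template: Sequence[float]) -> float:
    """Normalized correlation (cosine similarity) of two already-aligned waveforms."""
    n = min(len(signal), len(template))
    a, b = signal[:n], template[:n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def zero_crossing_frequency(signal: Sequence[float], sample_rate_hz: float) -> float:
    """Rough dominant-frequency estimate: sign changes / (2 * duration in seconds)."""
    crossings = sum(1 for p, c in zip(signal, signal[1:]) if (p < 0) != (c < 0))
    return crossings / (2.0 * len(signal) / sample_rate_hz)


def frequency_similarity(freq_hz: float, template_freq_hz: float) -> float:
    """1.0 for identical frequencies, falling linearly to 0.0 at 100% relative deviation."""
    return max(0.0, 1.0 - abs(freq_hz - template_freq_hz) / template_freq_hz)


def voiceprint_matches(signal: Sequence[float], template: Sequence[float],
                       template_freq_hz: float, sample_rate_hz: float, threshold: float,
                       use_waveform: bool = True, use_frequency: bool = True) -> bool:
    """Every enabled comparison must exceed the preset threshold."""
    checks = []
    if use_waveform:
        checks.append(waveform_similarity(signal, template) > threshold)
    if use_frequency:
        freq = zero_crossing_frequency(signal, sample_rate_hz)
        checks.append(frequency_similarity(freq, template_freq_hz) > threshold)
    if not checks:
        raise ValueError("enable at least one comparison")
    return all(checks)
```

Setting `use_waveform=False` or `use_frequency=False` selects the "or" branches of the described implementation, in which only one of the two comparisons is used.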
In a possible implementation, the method further includes: the terminal collects a sound signal uttered by the user, performs feature analysis on it to obtain the preset voiceprint feature, and stores that feature. Because the preset voiceprint feature is derived in advance from the user's own sound signal, it is accurate. This makes matching it against the voiceprint feature of the audio signal portion more accurate and thus improves the security of voiceprint unlocking.
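The enrollment step above can be sketched as "extract a feature vector from each enrollment utterance, average the vectors, and store the result". The two-element feature vector (zero-crossing rate and RMS level) is a toy stand-in chosen only to keep the example self-contained. The JSON layout, including the `preset_voiceprint_feature` key, is hypothetical, and the patent does not say where or how the template is stored.

```python
import json
import math
import statistics
from typing import List, Sequence


def extract_features(samples: Sequence[float]) -> List[float]:
    """Toy feature vector: [zero-crossing rate per adjacent sample pair, RMS level]."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for p, c in zip(samples, samples[1:]) if (p < 0) != (c < 0))
    zcr = crossings / (len(samples) - 1)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return [zcr, rms]


def enroll(recordings: Sequence[Sequence[float]]) -> List[float]:
    """Average the feature vectors of several enrollment utterances into one template."""
    if not recordings:
        raise ValueError("need at least one enrollment recording")
    features = [extract_features(r) for r in recordings]
    return [statistics.fmean(column) for column in zip(*features)]


def template_to_json(template: List[float]) -> str:
    """Serialize the preset voiceprint feature for storage (hypothetical format)."""
    return json.dumps({"preset_voiceprint_feature": template})


def template_from_json(text: str) -> List[float]:
    """Load a previously stored preset voiceprint feature."""
    return json.loads(text)["preset_voiceprint_feature"]
```

Averaging over several utterances is one plausible reading of "feature analysis". It smooths out the variation of any single recording, which supports the accuracy argument made above.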
In a possible implementation, the method further includes: the terminal acquires the characteristics of the airflow exhaled by the user while producing the sound corresponding to the sound signal. This provides the expiratory airflow characteristic of the judgment signal portion, so that it can be compared with the expiratory airflow characteristic of the audio signal portion.
In a possible implementation, the method further includes: the terminal acquires the direction in which the user outputs the sound corresponding to the sound signal. This provides the pointing direction feature of the judgment signal portion, so that the terminal can check whether it and the pointing direction feature of the audio signal portion are within the preset range.
In a possible implementation, the method further includes: the terminal acquires the temperature while the user outputs the sound corresponding to the sound signal. This provides the perceived temperature feature of the judgment signal portion, so that it can be compared with the preset temperature threshold.
In a second aspect, the present invention provides a terminal, including: a detection module, configured to detect whether there is a sound signal; a receiving module, configured to receive the sound signal; an extraction module, configured to extract the audio signal portion and the judgment signal portion of the sound signal; a first matching module, configured to compare the voiceprint feature of the audio signal portion with a preset voiceprint feature and to compare the expiratory airflow characteristic of the judgment signal portion with the expiratory airflow characteristic of the audio signal portion, where the expiratory airflow characteristic is the characteristic of the airflow exhaled by the user while producing the sound corresponding to the sound signal; and a judgment module, configured to determine that the voiceprint detection succeeds when the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the degree of matching between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold. When recognizing a sound signal, the terminal thus splits it into an audio signal portion and a judgment signal portion for double recognition. This effectively prevents a user from playing a recording while lip-syncing and blowing air, and improves the security of voiceprint unlocking.
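The module structure of the second aspect can be sketched as a small class in which each module is an injected callable. This is an illustrative architecture sketch, not the patent's implementation. The class name, field names and default thresholds are hypothetical, and real detection, extraction and matching logic would be plugged in where the test uses stub lambdas.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Signal = Sequence[float]


@dataclass
class VoiceprintTerminal:
    """Wires the second-aspect modules together; each module is an injected callable."""
    detect: Callable[[], bool]                          # detection module
    receive: Callable[[], Signal]                       # receiving module
    extract: Callable[[Signal], Tuple[Signal, Signal]]  # extraction module -> (audio, judgment)
    match_voiceprint: Callable[[Signal], float]         # first matching module: voiceprint part
    match_airflow: Callable[[Signal, Signal], float]    # first matching module: airflow part
    voiceprint_threshold: float = 0.8                   # hypothetical preset thresholds
    airflow_threshold: float = 0.8

    def run(self) -> Optional[bool]:
        """Return None if no sound was detected, otherwise the judgment module's verdict."""
        if not self.detect():
            return None
        audio, judgment = self.extract(self.receive())
        voiceprint_score = self.match_voiceprint(audio)
        airflow_score = self.match_airflow(judgment, audio)
        # Judgment module: both matching degrees must exceed their thresholds.
        return (voiceprint_score > self.voiceprint_threshold
                and airflow_score > self.airflow_threshold)
```

Injecting the modules as callables mirrors the claim language, in which each module has a single responsibility. It also lets each stage be swapped out or tested independently.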
In a third aspect, the present invention provides a terminal, including a microphone and a processor. The microphone is configured to detect whether there is a sound signal and, if one is detected, to receive it. The processor is configured to: extract the audio signal portion and the judgment signal portion of the sound signal; compare the voiceprint feature of the audio signal portion with a preset voiceprint feature; compare the expiratory airflow characteristic of the judgment signal portion with the expiratory airflow characteristic of the audio signal portion, where the expiratory airflow characteristic is the characteristic of the airflow exhaled by the user while producing the sound corresponding to the sound signal; and determine that the voiceprint detection succeeds when the degree of matching between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold and the degree of matching between the expiratory airflow characteristics of the judgment signal portion and the audio signal portion exceeds a preset threshold. When recognizing a sound signal, the terminal thus splits it into an audio signal portion and a judgment signal portion for double recognition. This effectively prevents a user from playing a recording while lip-syncing and blowing air, and improves the security of voiceprint unlocking.
In a fourth aspect, the present invention provides a non-volatile computer-readable storage medium that stores computer instructions. The computer instructions cause an apparatus for controlling cache flushing to perform the operations in the foregoing method.
In the voiceprint detection method and apparatus provided by the present invention, the terminal detects whether there is a sound signal. If so, it receives the sound signal, extracts the audio signal portion and the judgment signal portion, compares the voiceprint feature of the audio signal portion with a preset voiceprint feature, and compares the expiratory airflow characteristic of the judgment signal portion with that of the audio signal portion. When the degree of matching between the voiceprint feature and the preset voiceprint feature exceeds a preset threshold, and the degree of matching between the two expiratory airflow characteristics exceeds a preset threshold, the voiceprint detection succeeds. Splitting the sound signal into an audio signal portion and a judgment signal portion gives double recognition of the sound signal, effectively prevents a user from playing a recording while lip-syncing and blowing air, and improves the security of voiceprint unlocking.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Evidently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of a voiceprint unlocking scenario according to an embodiment of the present invention;
FIG. 1B is a schematic diagram of a voiceprint password setting scenario according to an embodiment of the present invention;
FIG. 2 is a flowchart of a voiceprint detection method according to Embodiment 1 of the present invention;
FIG. 3A is a schematic diagram of blow signal quantization according to Embodiment 1 of the present invention;
FIG. 3B is a schematic diagram of blow signal quantization according to Embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of a voiceprint detection process according to Embodiment 1 of the present invention;
FIG. 5 is a flowchart of a voiceprint detection method according to Embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of the angle of the pointing direction of a sound signal according to Embodiment 1 of the present invention;
FIG. 7 is a flowchart of a voiceprint detection method according to Embodiment 3 of the present invention;
FIG. 8 is a flowchart of a voiceprint detection method according to Embodiment 4 of the present invention;
FIG. 9 is a schematic structural diagram of a terminal according to Embodiment 1 of the present invention;
FIG. 10 is a schematic structural diagram of a terminal according to Embodiment 2 of the present invention;
FIG. 11 is a schematic structural diagram of a terminal according to Embodiment 3 of the present invention;
FIG. 12 is a schematic structural diagram of a terminal according to Embodiment 4 of the present invention;
FIG. 13 is a schematic structural diagram of a voiceprint detection apparatus according to Embodiment 1 of the present invention.
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1A is a schematic diagram of a voiceprint unlocking scenario according to an embodiment of the present invention. As shown in FIG. 1A, a terminal device or application software provides a voiceprint unlock function: the user speaks the corresponding voiceprint password, and the terminal verifies the user by voiceprint and then unlocks the device or provides a service. Voiceprint recognition generally takes one of two forms. (1) The text to be recognized is preset: at each unlock, the user repeats the same preset text (for example, "Open Sesame"); alternatively, to improve security, the electronic device randomly generates a text or numeric password at recognition time, and the user reads out the prompted random password. (2) The text to be recognized is arbitrary: during setup, the user says a few random phrases and the electronic device extracts the user's characteristic parameters; when voiceprint recognition is later required, the user is recognized as the device owner simply by speaking.
FIG. 1B is a schematic diagram of a voiceprint password setting scenario according to an embodiment of the present invention. As shown in FIG. 1B, the user can set a voiceprint password in advance. For example, in WeChat the user defines a voiceprint password: after the user says the voiceprint password "Open Sesame", the terminal records it through the microphone. The user then logs in to the account with this voiceprint password, and the terminal verifies the voiceprint password input by the user to decide whether to let the user log in.
FIG. 2 is a flowchart of a voiceprint detection method according to Embodiment 1 of the present invention. As shown in FIG. 2, the method includes the following steps:
S201: The terminal detects whether there is a sound signal.
It should be noted that the terminal in the embodiments of the present invention is capable of receiving voice input and may include, but is not limited to, mobile communication devices such as mobile phones and tablet computers.
Specifically, when the user needs unlock verification, the user emits a sound signal (speech signal) toward the terminal. For example, the sound signal may be the user speaking the preset voiceprint password "open sesame", calling the name of a voice assistant such as "Xiaoice" or "hello google", reading out a text or numeric password randomly generated by the terminal, or simply saying an arbitrary passage. While locked, the terminal detects whether the user has emitted a speech signal; if it detects one, that is, if it detects a voiceprint recognition signal, it performs recognition on the sound signal emitted by the user.
Optionally, in this embodiment the terminal is not always in the liveness voiceprint recognition mode; it enters that mode only when it detects a voiceprint recognition signal, and then performs recognition on the sound signal emitted by the user. The terminal is in the locked (standby) state and enters the liveness voiceprint recognition mode when voiceprint recognition is needed, for example when the terminal is showing the lock screen, when application software is awaiting voiceprint unlocking, when the user's mouth is detected approaching the microphone, when the user is detected speaking toward the microphone, or in any combination of these scenarios. Whether the mouth is approaching the microphone can be determined by sensors such as a proximity sensor, an ultrasonic sensor, or an infrared sensor. In the liveness voiceprint recognition mode, the terminal turns on the corresponding modules to analyze and process the received voiceprint recognition signal, for example any one or any combination of a recording module, a voiceprint recognition module, a thermometer module, a light sensor module, a directional listening module, an ultrasonic sensor, and an infrared sensor. Optionally, the terminal may instead remain in the liveness voiceprint recognition mode at all times and perform recognition on the user's sound signal whenever a voiceprint recognition signal is detected. The embodiments mainly describe the case in which the terminal enters the liveness voiceprint recognition mode upon detecting a sound signal, but are not limited thereto.
It should be noted that a voiceprint is the acoustic spectrum of a sound signal (speech signal) as displayed by an electroacoustic instrument. Because people have different vocal habits, their phonatory airflow differs, which produces differences in voice quality, timbre, and so on, so every person's voiceprint is different. Voiceprint recognition is a form of biometric recognition used to confirm whether a given piece of speech was spoken by a specified person. The voiceprint recognition signal is the sound signal (speech signal) detected while the terminal is locked, and it contains speech carrying the user's voiceprint. From it, the terminal can determine whether the voiceprint in the sound signal belongs to the specified user, and thus whether the detected sound signal was spoken by that user.
S202: If the terminal detects a sound signal, the terminal receives the sound signal.
Specifically, when a sound signal is detected, the terminal can receive it through the microphone. Optionally, the terminal stores the received sound signal. To avoid missing part of the sound signal that constitutes the voiceprint recognition signal, the terminal may stay in a listening state and buffer the received sound, so that a complete voiceprint recognition signal is available for analysis and processing when the terminal enters the liveness voiceprint recognition mode.
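The continuous listening and buffering described above is commonly done with a fixed-size ring buffer, so that the beginning of an utterance is not lost while the terminal switches into the liveness mode. The minimal sketch below assumes samples arrive as chunks of numbers; the class name `PreRollBuffer` and its capacity are hypothetical and not part of the original method.

```python
from collections import deque


class PreRollBuffer:
    """Keeps only the most recent `capacity` audio samples while listening."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest samples once full.
        self._buf = deque(maxlen=capacity)

    def push(self, samples):
        """Append one chunk of microphone samples (e.g. from a driver callback)."""
        self._buf.extend(samples)

    def snapshot(self):
        """Return the buffered samples, oldest first, e.g. on entering liveness mode."""
        return list(self._buf)
```

For example, a two-second buffer at 16 kHz would be `PreRollBuffer(2 * 16000)`; memory stays bounded no matter how long the terminal listens.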
S203: The terminal extracts the audio signal portion and the judgment signal portion of the sound signal.
Specifically, besides the audio signal of the user's voice, the sound signal may carry the temperature sensed while the user speaks, the direction of the sound signal, or the signal of the user's exhalation while speaking. The terminal can divide the sound signal into an audio signal portion and a judgment signal portion. The audio signal portion may include the voiceprint features of the audio signal; the judgment signal portion may include at least one of the sensed temperature while the user speaks, the direction of the sound signal, and the exhalation signal while the user speaks. For example, the terminal can obtain the sensed temperature through a temperature sensor, obtain the direction of the sound signal through a microphone array, and obtain the exhalation signal through a filter with a preset frequency (a low-pass filter).
S204: Compare the voiceprint features of the audio signal portion with the preset voiceprint features.
Specifically, the terminal compares the voiceprint features of the audio signal portion with the preset voiceprint features to determine whether they match.
In this embodiment, before the terminal enters the standby state, the user can configure liveness voiceprint recognition on the terminal, which includes receiving a speech signal preset by the user. For example, the terminal presents the four-character phrase "芝麻开门" ("open sesame"), the user reads out this preset text, and the terminal records the user's speech signal. This speech signal contains the audio of the user reading the preset text; that audio has voiceprint recognition features, which are stored as the preset voiceprint features.
Optionally, the voiceprint features may include at least one of the voiceprint waveform of the audio signal and the signal frequency of the audio signal. Comparing the voiceprint features of the audio signal portion with the preset voiceprint features may then be done in at least one of the following two ways:
In one case, the voiceprint waveform of the audio signal portion is compared with a preset voiceprint sample feature waveform.
In the other case, the signal frequency of the audio signal portion is compared with a preset voiceprint sample feature frequency.
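The two comparison options above can be illustrated with deliberately simple measures: Pearson correlation between time-aligned waveforms, and the dominant frequency from a naive discrete Fourier transform. This is a hedged sketch only; the function names and the 5% tolerance are hypothetical, and practical voiceprint systems usually compare richer features (such as MFCC-based speaker embeddings) rather than raw waveforms.

```python
import math


def waveform_similarity(a, b):
    """Pearson correlation of two time-aligned waveforms, in [-1, 1]."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n


def frequency_match(f_sample, f_ref, tolerance=0.05):
    """True if the measured frequency is within a relative tolerance of the reference."""
    return abs(f_sample - f_ref) <= tolerance * f_ref
```

Each function returns a score or a boolean that would feed the "matching degree" used in S206; the naive DFT is O(n^2) and is used here only for readability.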
S205: Compare the exhalation airflow features of the judgment signal portion with the exhalation airflow features of the audio signal portion.
Here, the exhalation airflow feature characterizes the airflow exhaled by the user while producing the sound corresponding to the sound signal.
In this embodiment, the terminal detects the sound signal received by the microphone and captures the microphone input, for example with a recorder. Normal phonation depends on the strength of the airflow and on whether the vocal folds vibrate. During phonation, the exhaled airflow must force the glottis open; owing to the Bernoulli effect the glottis then closes again, and once the subglottal pressure is high enough it is forced open once more. This repeated opening and closing produces periodic vibration, so air is exhaled whenever a person speaks. This exhaled airflow is referred to here as the blowing signal; that is, the blowing signal is the exhaled-airflow feature corresponding to the sound output by the user. For example, the Chinese character "开" (kāi, "open") is aspirated: to pronounce it, the user must exhale airflow that forces the glottis open, so some air is carried out of the mouth, and the microphone picks up this effective airflow as a blowing signal. Because the speech audio in the sound signal lies roughly between 300 Hz and 3000 Hz, whereas air blown onto the microphone is mainly low-frequency, low-pass filtering can remove the high-frequency components that are not blowing and yield the blowing signal, thereby separating the audio signal from the blowing signal.
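The low-pass separation just described can be sketched with a first-order (one-pole) IIR low-pass filter, taking the filter residual as the speech band. The 100 Hz cutoff is an assumption chosen to sit below the roughly 300-3000 Hz speech band cited above; the original does not give a cutoff, and a steeper filter (for example, a higher-order Butterworth design) would separate the bands more cleanly.

```python
import math


def one_pole_lowpass(x, sample_rate, cutoff_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    y, prev = [], 0.0
    for s in x:
        prev = prev + alpha * (s - prev)
        y.append(prev)
    return y


def split_blow_and_audio(x, sample_rate, cutoff_hz=100.0):
    """Return (audio, blow): blow is the low band, audio is what remains."""
    blow = one_pole_lowpass(x, sample_rate, cutoff_hz)
    audio = [s - b for s, b in zip(x, blow)]
    return audio, blow


def rms(x):
    """Root-mean-square level of a signal (0.0 for an empty list)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0
```

With an 8 kHz sample rate, a 20 Hz component passes into `blow` almost unchanged while a 1 kHz component is attenuated to roughly a tenth, which is the separation the paragraph relies on.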
Specifically, after the terminal detects a sound signal and the extracted judgment signal portion includes a blowing signal, the terminal converts the audio signal into the corresponding text and determines, for each character or word, whether its exhalation airflow feature is aspirated or unaspirated. It then compares the exhalation airflow features of the judgment signal portion with those of the audio signal portion to determine whether the user's blowing signal matches the audio signal. For example, if a certain character was pronounced as aspirated when the user recorded the preset voiceprint sample but is pronounced as unaspirated during voiceprint verification, the terminal determines that the user's blowing signal does not match the audio signal.
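For Mandarin text, whether each syllable is aspirated can be read off its pinyin initial: p, t, k, q, ch and c are aspirated, while their counterparts b, d, g, j, zh and z are not. The sketch below assumes the audio-to-text step already yields toneless pinyin syllables and, as a simplification, marks every other initial (m, f, h, s and so on) as 0. It reproduces the 0 0 1 0 pattern the FIG. 4 example gives for 芝麻开门.

```python
# Aspirated initials of Mandarin pinyin; their unaspirated partners are b, d, g, j, zh, z.
ASPIRATED_INITIALS = {"p", "t", "k", "q", "ch", "c"}


def initial_of(syllable):
    """Return the pinyin initial of a toneless syllable ('' for zero-initial)."""
    syllable = syllable.lower()
    for two in ("zh", "ch", "sh"):
        if syllable.startswith(two):
            return two
    first = syllable[:1]
    return first if first in "bpmfdtnlgkhjqxrzcsyw" else ""


def expected_aspiration(syllables):
    """1 where a breath burst is expected (aspirated initial), else 0."""
    return [1 if initial_of(s) in ASPIRATED_INITIALS else 0 for s in syllables]
```

A user-specific enrollment sample, as in the example above, could replace this static table with the pattern actually observed at setup.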
It should be noted that, because users have different speaking habits, the terminal can learn the exhalation airflow features of each user's audio signal from at least one of the user's everyday phone calls and voice-assistant use. For example, some users exhale strongly when saying a particular word or character, while other users exhale only lightly on the same word or character. Learning these habits improves the accuracy of each user's exhalation airflow features.
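Learning each user's habits can be sketched as a per-word exponential moving average of measured airflow, updated whenever the user speaks during calls or to the voice assistant. The class, the smoothing factor, and the choice of an exponential moving average are all assumptions for illustration; the text does not say how the learning is performed.

```python
class AirflowProfile:
    """Per-user running estimate of exhaled airflow for each word or character."""

    def __init__(self, smoothing=0.2):
        self.smoothing = smoothing  # weight of each new observation (hypothetical)
        self._mean = {}             # token -> smoothed airflow estimate

    def observe(self, token, airflow):
        """Update the estimate from one observed utterance of `token`."""
        if token not in self._mean:
            self._mean[token] = airflow
        else:
            old = self._mean[token]
            self._mean[token] = old + self.smoothing * (airflow - old)

    def expected(self, token, default=None):
        """Learned airflow for `token`, or `default` if it has never been seen."""
        return self._mean.get(token, default)
```

A learned value per token could then set a user-specific gate in place of a single global one.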
S206: When the matching degree between the voiceprint features of the audio signal portion and the preset voiceprint features exceeds a preset threshold, and the matching degree between the exhalation airflow features of the judgment signal portion and those of the audio signal portion exceeds a preset threshold, determine that voiceprint detection has succeeded.
Specifically, the terminal compares the voiceprint features of the audio signal portion with the preset voiceprint features and determines that they match, and compares the exhalation airflow features of the judgment signal portion with those of the audio signal portion and determines that they also match; the voiceprint detection result is then success. At this point the terminal unlocks, and the user can perform the corresponding operation on the terminal, such as unlocking the phone or logging in to WeChat.
For example, the terminal may first determine whether the matching degree between the voiceprint features of the audio signal portion and the preset voiceprint features exceeds the preset threshold. If it does not, the voiceprint features do not match; the terminal determines that voiceprint detection has failed and may exit the voiceprint detection mode directly. If it does, the terminal further determines whether the matching degree between the exhalation airflow features of the judgment signal portion and those of the audio signal portion exceeds the preset threshold. If so, both the voiceprint features and the exhalation airflow features match, so the terminal determines that voiceprint detection has succeeded and unlocks. If not, the exhalation airflow features do not match, so the terminal determines that voiceprint detection has failed and exits the voiceprint detection mode.
It should be noted that the preset threshold can be set according to actual conditions. For example, if the terminal's voiceprint feature matching is highly accurate, the preset threshold may be set to 95%; if it is less accurate, the threshold may be set to 90%.
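The sequential decision in S206, with the example thresholds above, can be sketched as follows. Matching degrees are assumed to be normalized to [0, 1], "exceeds" is read as strictly greater than, and the function name and return strings are hypothetical.

```python
def voiceprint_detect(voiceprint_score, airflow_score, threshold=0.95):
    """Two-stage check: voiceprint first, then exhalation airflow."""
    if voiceprint_score <= threshold:
        # Voiceprint mismatch: exit detection without checking airflow.
        return "fail: voiceprint"
    if airflow_score <= threshold:
        # Voice matches but the breath pattern does not.
        return "fail: airflow"
    # Both matching degrees exceed the threshold: unlock.
    return "success"
```

Checking the voiceprint first lets the terminal skip the airflow analysis entirely for obvious mismatches.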
In the voiceprint detection method provided by this embodiment, the terminal detects whether there is a sound signal; if so, it receives the sound signal and extracts its audio signal portion and judgment signal portion. It compares the voiceprint features of the audio signal portion with the preset voiceprint features, and compares the exhalation airflow features of the judgment signal portion with those of the audio signal portion. When both matching degrees exceed the preset thresholds, it determines that voiceprint detection has succeeded. Because the terminal splits the sound signal into an audio signal portion and a judgment signal portion, the sound signal is verified twice. This effectively prevents an attacker from playing a recording while mouthing the words and blowing air, and improves the security of voiceprint unlocking.
Further, in the embodiment shown in FIG. 2, the voiceprint detection method further includes:
receiving the exhalation airflow features in the judgment signal portion that are greater than a preset airflow threshold;
quantizing the exhalation airflow features; and
comparing the quantized exhalation airflow features with the exhalation airflow features of the audio signal portion.
Correspondingly, the matching degree between the exhalation airflow features of the judgment signal portion and those of the audio signal portion exceeding the preset threshold includes:
the matching degree between the quantized exhalation airflow features and the exhalation airflow features of the audio signal portion exceeding the preset threshold.
In this embodiment, after extracting the audio signal portion and the judgment signal portion of the sound signal, the terminal determines whether the exhalation airflow of the blowing signal is greater than the preset airflow threshold, receives the exhalation airflow in the judgment signal portion that exceeds this threshold, and quantizes it according to its magnitude. The preset airflow threshold in this embodiment may be 0.10 liters per second (L/s).
Further, in the embodiment shown in FIG. 2, quantizing the exhalation airflow features includes:
comparing the exhalation airflow feature with a preset airflow gate value; if the feature is greater than the gate value, quantizing it as 1; otherwise, quantizing it as 0.
The matching degree between the quantized exhalation airflow features and the exhalation airflow features of the audio signal portion exceeding the preset threshold includes at least one of the following two cases:
In one case, the exhalation airflow feature is quantized as 1 and the text corresponding to the audio signal portion is an aspirated sound.
In the other case, the exhalation airflow feature is quantized as 0 and the text corresponding to the audio signal portion is an unaspirated sound.
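The quantization and the two matching cases above can be combined into a single matching degree. This sketch assumes one airflow reading (in L/s) per syllable, already aligned with the recognized text. It reuses the 0.10 L/s example airflow threshold as the quantization gate and defines the matching degree as the fraction of syllables that satisfy either case; these choices are illustrative assumptions.

```python
def quantize_airflow(airflow_lps, gate_lps=0.10):
    """1 if the exhaled airflow exceeds the gate value, otherwise 0."""
    return 1 if airflow_lps > gate_lps else 0


def airflow_match_degree(airflows_lps, expected_bits, gate_lps=0.10):
    """Fraction of syllables where quantized airflow agrees with the text:
    1 with an aspirated syllable, or 0 with an unaspirated one."""
    if len(airflows_lps) != len(expected_bits) or not expected_bits:
        return 0.0  # misaligned input is treated as a non-match
    bits = [quantize_airflow(a, gate_lps) for a in airflows_lps]
    agree = sum(1 for b, e in zip(bits, expected_bits) if b == e)
    return agree / len(expected_bits)
```

The resulting degree can be compared with the preset threshold exactly like the voiceprint matching degree.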
In this embodiment, to quantize the blowing signal, the airflow can be divided into several levels, for example 10. When the airflow of the received blowing signal is at or above the 5th level, it is deemed to meet the preset gate and the blowing signal is set to 1; when it is below the 5th level, it is deemed not to reach the gate and the blowing signal is set to 0. Quantizing the blowing signal improves the accuracy of blowing-signal recognition. FIG. 3A is a schematic diagram of blowing-signal quantization according to Embodiment 1 of the present invention; as shown in FIG. 3A, when the airflow of the blowing signal reaches only level 3, the blowing signal is determined to be 0. FIG. 3B is a schematic diagram of blowing-signal quantization according to Embodiment 2 of the present invention; as shown in FIG. 3B, when the airflow of the blowing signal reaches level 8, the blowing signal is determined to be 1.
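The level-based variant (10 levels, with the 5th level as the gate) can be sketched as below. The linear mapping from an airflow reading to a level and the `max_airflow` calibration value are hypothetical; the checks mirror FIG. 3A (level 3 gives 0) and FIG. 3B (level 8 gives 1).

```python
import math


def airflow_level(airflow, max_airflow, levels=10):
    """Map an airflow reading onto levels 1..levels on a linear scale."""
    if max_airflow <= 0:
        raise ValueError("max_airflow must be positive")
    ratio = min(max(airflow / max_airflow, 0.0), 1.0)  # clamp to [0, 1]
    return max(1, math.ceil(ratio * levels))


def level_to_bit(level, gate_level=5):
    """At or above the gate level gives 1; below it gives 0."""
    return 1 if level >= gate_level else 0
```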
FIG. 4 is a schematic diagram of a voiceprint detection process according to Embodiment 1 of the present invention. As shown in FIG. 4, suppose the user utters the voiceprint recognition signal "芝麻开门" ("open sesame"). After the terminal's microphone receives it, a separation module splits the signal into an audio signal and a blowing signal. The audio signal is sent to a voiceprint recognition module for voiceprint recognition. Once voiceprint recognition passes, an audio-to-text module converts the audio into the corresponding text and determines whether the blowing signal for each character or word should be aspirated or unaspirated. A blowing module quantizes the received blowing signal, defining a blowing signal at or above the gate as 1 and one below the gate as 0, and outputs the blowing signal as a binary sequence. A judgment module compares the characters or words output by the audio-to-text module with the binary sequence output by the blowing-signal recognition module. For example, the blowing signal when the user says "芝麻开门" (zhī má kāi mén) is "0" "0" "1" "0"; when the user says "top" it is "1" "1"; and when the user says "sport" it is "0" "1".
FIG. 5 is a flowchart of a voiceprint detection method according to Embodiment 2 of the present invention, which is another specific implementation of the method provided by the embodiments. As shown in FIG. 5, the method includes:
S501: The terminal detects whether there is a sound signal.
S502: If the terminal detects a sound signal, the terminal receives the sound signal.
S503: The terminal extracts the audio signal portion and the judgment signal portion of the sound signal.
S504: Compare the voiceprint features of the audio signal portion with the preset voiceprint features.
It should be noted that S501, S502, S503, and S504 are implemented in the same way as S201, S202, S203, and S204, respectively; for details, see the descriptions of those steps, which are not repeated here.
S505: Determine whether the pointing direction features of the judgment signal portion and of the audio signal portion are within a preset range.
Specifically, the judgment signal portion may include a pointing direction feature, which is the direction from which the user corresponding to the sound signal outputs the sound. In practice, the audio signal and the blowing signal in the sound signal received by the terminal may come from different directions: an attacker plays a recording to supply the audio signal while producing the blowing signal separately, so the audio signal and the blowing signal do not originate from the same speech and arrive from different directions. For example, the attacker blows air as if saying "芝麻开门" without actually voicing it, so the direction of the played recording differs from the direction of the blowing. By determining whether the pointing direction features of the judgment signal portion and of the audio signal portion are within the preset range, the terminal can determine whether the audio signal and the blowing signal come from the same direction, and thereby resist recording attacks.
For example, suppose user A's speech signal can unlock the terminal. If user B plays a recording of user A's voice while mouthing the words and blowing air without voicing them, the direction of the played recording and the direction of the blowing may differ; yet with ordinary voiceprint unlocking, user B might still succeed in unlocking the terminal, which is a security risk. In this embodiment, the terminal determines whether both directions fall within the preset range of the microphone array. If they do, the sound signal comes from a single direction and there is no recording attack; if they do not, the sound signal comes from different directions, indicating a recording attack.
Optionally, determining whether the pointing direction features of the judgment signal portion and of the audio signal portion are within the preset range includes: comparing the angle of the pointing direction of the judgment signal portion and the angle of the pointing direction of the audio signal portion, each with a preset pointing angle threshold.
Optionally, the pointing direction features of the judgment signal portion and of the audio signal portion being within the preset range includes: both the angle of the pointing direction of the judgment signal portion and the angle of the pointing direction of the audio signal portion being smaller than the preset pointing angle threshold.
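The preset-range test in S505 reduces to comparing both angles with the preset pointing angle threshold. A minimal sketch, assuming angles are expressed in degrees and measured from the axis of the directional pickup beam (the text does not fix the reference direction); the function name is hypothetical:

```python
def directions_within_range(blow_angle_deg, audio_angle_deg, max_angle_deg):
    """True only if both the blowing-signal angle and the audio-signal angle
    are smaller than the preset pointing angle threshold."""
    return abs(blow_angle_deg) < max_angle_deg and abs(audio_angle_deg) < max_angle_deg
```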
In this embodiment, microphone directional reception can be used to prevent recording attacks. When recognizing the sound signal, the terminal may use a directional reception mode in which the microphone enters a directional listening mode and accepts only audio and blowing signals within a preset angular range. Limiting the range from which the microphone accepts audio and blowing signals prevents recording attacks.
Directional reception is based on sound source localization and can be implemented with a microphone array. The array captures sound arriving from different directions; signal processing then steers the array toward a specific direction, forming a "beam" for sound pickup and amplifying the audio captured from that direction, so that the speech signal is received directionally. Because a sound wave reaches each microphone in the array at slightly different times, a microphone array achieves better directivity than a single microphone. In a specific implementation, the array points its pickup beam over a certain angular range: the time delay is estimated by the generalized cross-correlation method with a weighting such as the smoothed coherence transform, the phase transform, or maximum likelihood, and the pickup direction is then adjusted according to the delay and the geometric positions of the microphones in the array. By tuning the algorithm parameters, the directional reception region is shaped into a cone of angle θ, and the terminal further determines whether the audio signal and the blowing signal from the received sound source S both arrive from a direction θ1 inside this cone, that is, whether they are valid signals.
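The time difference of arrival that steers the beam can be estimated by finding the lag that maximizes the cross-correlation between two microphone channels. The sketch below uses plain time-domain cross-correlation, which is the unweighted special case; the generalized cross-correlation weightings named above (smoothed coherence transform, phase transform, maximum likelihood) apply a frequency-domain weighting before this peak search and are not implemented here. The function name and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(sig_a, sig_b, max_lag):
    """Lag in samples that maximizes sum_n a[n] * b[n + lag].
    A positive lag means the sound reached microphone A first."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                val += a * sig_b[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

Dividing the returned lag by the sample rate gives the delay τ in seconds.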
For example, FIG. 6 is a schematic diagram of the pointing-direction angle of a sound signal according to Embodiment 1 of the present invention. As shown in FIG. 6, a mobile phone has two microphones, A and B, separated by a fixed, known distance d, and the speed of sound is taken as a constant C. From the time difference τ between the arrivals of the sound at microphones A and B, the angle θ1 between the sound source and microphone B can be calculated, and the terminal then determines whether θ1 lies within the cone of valid source directions defined by angle θ. On that basis it can be determined whether the audio signal and the blowing signal from the sound source are valid signals received by the directional microphone. The direction of the sound source can also be calculated by a formula, where τ is the delay between the sound's arrivals at the two microphones, d is the distance between the two microphones, θ1 is the direction angle of the speech signal, and C is the speed of sound.
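The formula itself does not appear in this text, so the following sketch substitutes the standard far-field (plane-wave) relation for a two-microphone pair, τ = d·cos(θ1)/C, that is θ1 = arccos(C·τ/d), with θ1 measured from the line through microphones A and B. That relation, the 343 m/s value for C, the assumption that the valid cone is centred on the same reference direction, and the helper names are all assumptions rather than details taken from the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value of C: air at roughly 20 degrees Celsius


def arrival_angle_deg(tau_s, mic_spacing_m, c=SPEED_OF_SOUND_M_S):
    """Far-field direction angle theta1 in degrees from the inter-mic delay tau.

    Assumes a plane wave, so tau = d * cos(theta1) / C. The ratio is clamped
    to [-1, 1] because measurement noise can push |C * tau / d| slightly past 1.
    """
    ratio = c * tau_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))


def within_valid_cone(theta1_deg, cone_angle_deg):
    """True if the source direction lies inside the valid-direction cone (theta1 < theta)."""
    return theta1_deg < cone_angle_deg


# Example: 2 cm spacing and half of the end-fire delay give 60 degrees.
d = 0.02
tau = 0.5 * d / SPEED_OF_SOUND_M_S
theta1 = arrival_angle_deg(tau, d)
print(round(theta1, 3), within_valid_cone(theta1, 75.0))  # 60.0 True
```

The clamp is a defensive choice: without it, a slightly noisy delay estimate would make `math.acos` raise a `ValueError`.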
Further, a distance threshold between the sound source and the microphone can be set, with the source-to-microphone distance measured by, for example, a light sensor, an infrared sensor, or an ultrasonic sensor. Setting this distance threshold helps ensure that the check of whether the recording attack and the blowing signal come from consistent directions remains reliable, because when the sound source is close to the microphone, a replayed recording and a blowing signal may appear to come from the same direction.
S506: When the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds a preset threshold, and the pointing-direction feature of the judgment signal portion and the pointing-direction feature of the audio signal portion are within a preset range, the voiceprint detection result is determined to be successful.
Specifically, when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, and the audio signal and the blowing signal in the voiceprint recognition signal come from the same direction, the voiceprint detection result is determined to be successful.
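The S506 decision can be sketched as a single predicate. The sketch assumes the voiceprint comparison has already produced a numeric match score and that each portion's pointing direction has been reduced to an angle; "within the preset range" is interpreted as in the device embodiment of FIG. 10, namely that both angles are below a preset pointing-angle threshold. The function name and the example numbers are hypothetical.

```python
def s506_detection_succeeds(voiceprint_score, match_threshold,
                            audio_angle_deg, judgment_angle_deg,
                            pointing_angle_threshold_deg):
    """Hypothetical S506 check: voiceprint match AND consistent pointing direction."""
    # "exceeds the preset threshold" is read as a strict comparison
    voiceprint_ok = voiceprint_score > match_threshold
    # both the speech and the blowing signal must arrive inside the valid cone
    direction_ok = (audio_angle_deg < pointing_angle_threshold_deg
                    and judgment_angle_deg < pointing_angle_threshold_deg)
    return voiceprint_ok and direction_ok


# A recording replayed from the side fails even though the voiceprint matches.
print(s506_detection_succeeds(0.92, 0.8, 12.0, 15.0, 30.0))  # True
print(s506_detection_succeeds(0.92, 0.8, 55.0, 15.0, 30.0))  # False
```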
In the voiceprint detection method provided by this embodiment, the terminal detects whether there is a sound signal; if it detects one, it receives the sound signal and extracts the audio signal portion and the judgment signal portion of that signal. It then compares the voiceprint feature of the audio signal portion with the preset voiceprint feature and determines whether the pointing-direction feature of the judgment signal portion and that of the audio signal portion are within the preset range. When the degree of match exceeds the preset threshold and the pointing-direction features are within the preset range, the voiceprint detection result is determined to be successful. Because the terminal splits the sound signal into an audio signal portion and a judgment signal portion, the sound signal is verified twice. At the same time, the method effectively guards against attacks in which a recording is played back while the attacker lip-syncs and blows air, since the recording and the blown air may come from different directions; this improves the security of voiceprint unlocking.
FIG. 7 is a flowchart of a voiceprint detection method according to Embodiment 3 of the present invention. As shown in FIG. 7, this is yet another specific implementation of the method provided by the embodiments of the present invention, and it includes the following steps:
S701: The terminal detects whether there is a sound signal.
S702: If the terminal detects a sound signal, the terminal receives the sound signal.
S703: The terminal extracts the audio signal portion and the judgment signal portion of the sound signal.
S704: The voiceprint feature of the audio signal portion is compared with the preset voiceprint feature.
It should be noted that S701, S702, S703, and S704 are implemented in the same way as S201, S202, S203, and S204, respectively; see the descriptions of S201 to S204 for details, which are not repeated here.
S705: The sensed temperature feature of the judgment signal portion is compared with a preset temperature threshold.
Specifically, the judgment signal portion can include a sensed temperature feature, which is the temperature sensed while the user corresponding to the sound signal produces the sound. The sensed temperature feature of the judgment signal portion is compared with the preset temperature threshold to determine whether it is greater than or equal to that threshold. For example, the terminal can use an infrared sensor to sense the temperature near the microphone and thereby determine whether the speech signal comes from a human body, such as the user, rather than from an electronic device playing a recording. The preset temperature threshold can be set according to the temperature range of the human body; it is generally set to the lowest temperature in the normal human range, for example 36 degrees Celsius.
S706: When the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold, the voiceprint detection result is determined to be successful.
Specifically, when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold, it can be determined that the speech signal received by the terminal comes from the user rather than from an electronic device playing a recording. This prevents recording attacks and improves the security of voiceprint unlocking.
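Steps S705 and S706 can be sketched together as follows. The 36 °C default comes from the example in the text; treating the voiceprint comparison as a single numeric score and the function name itself are assumptions made for illustration.

```python
HUMAN_TEMP_THRESHOLD_C = 36.0  # example threshold given in the text


def s706_detection_succeeds(voiceprint_score, match_threshold, sensed_temp_c,
                            temp_threshold_c=HUMAN_TEMP_THRESHOLD_C):
    """Hypothetical S705/S706 check: voiceprint match AND body-temperature source."""
    voiceprint_ok = voiceprint_score > match_threshold   # "exceeds the preset threshold"
    live_ok = sensed_temp_c >= temp_threshold_c          # "greater than or equal to"
    return voiceprint_ok and live_ok


print(s706_detection_succeeds(0.9, 0.8, 36.4))  # True: warm source near the microphone
print(s706_detection_succeeds(0.9, 0.8, 24.0))  # False: e.g. a loudspeaker at room temperature
```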
In the voiceprint detection method provided by this embodiment, the terminal detects whether there is a sound signal; if it detects one, it receives the sound signal, extracts the audio signal portion and the judgment signal portion, compares the voiceprint feature of the audio signal portion with the preset voiceprint feature, and compares the sensed temperature feature of the judgment signal portion with the preset temperature threshold. When the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold, the voiceprint detection result is determined to be successful. In other words, once the voiceprint matches, checking whether the sensed temperature reaches the preset threshold establishes that the speech signal received by the terminal comes from the user rather than from an electronic device playing a recording, which prevents recording attacks and improves the security of voiceprint unlocking.
FIG. 8 is a flowchart of a voiceprint detection method according to Embodiment 4 of the present invention. The method of this embodiment is a further specific implementation of the method of Embodiment 1 shown in FIG. 2. As shown in FIG. 8, the method includes the following steps:
S801: When a voiceprint recognition signal is detected, the terminal enters a liveness voiceprint recognition mode.
Before the terminal enters the liveness voiceprint recognition mode upon detecting a voiceprint recognition signal, the method further includes:
The terminal detects whether there is a voiceprint recognition signal, where the voiceprint recognition signal is a sound signal detected while the terminal is in the locked (not yet unlocked) state.
Detecting whether there is a voiceprint recognition signal includes: while the terminal is locked, it detects whether there is a sound signal; if it detects one, that sound signal is the voiceprint recognition signal.
S802: The terminal receives and stores the voiceprint recognition signal.
S803: The terminal extracts the audio signal portion and the judgment signal portion of the voiceprint recognition signal.
S804: The terminal determines whether the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold. If it does, S805 is performed; otherwise, S808 is performed.
Optionally, in this implementation, the terminal can compare the voiceprint feature of the audio signal portion with the preset voiceprint feature to determine whether their degree of match exceeds the preset threshold.
S805: The terminal determines whether the audio signal in the audio signal portion and the blowing signal in the judgment signal portion come from the same direction. If they do, S806 is performed; otherwise, S808 is performed.
Optionally, in this implementation, the terminal can make this determination by checking whether the pointing-direction feature of the judgment signal portion and the pointing-direction feature of the audio signal portion are within the preset range.
S806: The terminal determines whether the text corresponding to the audio signal portion matches the exhaled airflow in the judgment signal portion. If they match, S807 is performed; otherwise, S808 is performed.
Optionally, in this implementation, the terminal can make this determination by comparing the exhaled-airflow feature of the judgment signal portion with the exhaled-airflow feature of the audio signal portion.
S807: Liveness voiceprint detection succeeds.
S808: Liveness voiceprint detection fails.
It should be noted that, optionally, in this implementation, after it is determined that the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, and before it is determined whether the audio signal of the audio signal portion and the blowing signal of the judgment signal portion come from the same direction, the method further includes determining whether the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold. In that case, liveness voiceprint detection succeeds when all of the following hold: the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold; the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds a preset threshold; the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range; and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold.
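The FIG. 8 flow, including the optional temperature step, can be sketched as an ordered sequence of early-exit checks. Every threshold value below is a hypothetical placeholder (the text only says "preset"), the airflow comparison is assumed to have been reduced to one match score, and the pointing-direction test follows the "both angles below a threshold" reading of the device embodiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurements:
    voiceprint_score: float
    audio_angle_deg: float
    judgment_angle_deg: float
    airflow_match: float
    sensed_temp_c: Optional[float] = None  # None: optional temperature step not used


@dataclass
class Thresholds:  # hypothetical preset values
    voiceprint: float = 0.8
    pointing_angle_deg: float = 30.0
    airflow: float = 0.7
    temperature_c: float = 36.0


def liveness_voiceprint_flow(m, t):
    """Return "S807" (success) or "S808" (failure), checking in FIG. 8 order."""
    if not m.voiceprint_score > t.voiceprint:                        # S804
        return "S808"
    if m.sensed_temp_c is not None and m.sensed_temp_c < t.temperature_c:
        return "S808"                                                # optional temperature step
    if not (m.audio_angle_deg < t.pointing_angle_deg
            and m.judgment_angle_deg < t.pointing_angle_deg):        # S805
        return "S808"
    if not m.airflow_match > t.airflow:                              # S806
        return "S808"
    return "S807"


print(liveness_voiceprint_flow(Measurements(0.9, 10.0, 12.0, 0.85), Thresholds()))  # S807
```

Ordering the cheap voiceprint comparison first mirrors the flowchart: any failed check short-circuits to S808 without evaluating the later ones.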
In the voiceprint detection method provided by this embodiment, when a voiceprint recognition signal is detected, the terminal enters the liveness voiceprint recognition mode, receives and stores the voiceprint recognition signal, and extracts its audio signal portion and judgment signal portion. When the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the judgment feature of the judgment signal portion satisfies a preset judgment condition, the voiceprint detection result is determined to be successful. Because the terminal splits the voiceprint recognition signal into an audio signal portion and a judgment signal portion, the signal is verified twice, which improves the security of voiceprint unlocking. In addition, detection is declared successful only when the voiceprint feature of the audio signal portion matches the preset voiceprint feature, the audio signal and the blowing signal in the voiceprint recognition signal come from the same direction, and the text corresponding to the audio signal matches the exhaled airflow of the blowing signal. This effectively guards against attacks in which a recording is played back while the attacker lip-syncs and blows air from a different direction, further improving the security of voiceprint unlocking.
Further, in the foregoing embodiment, before the terminal extracts the audio signal portion and the judgment signal portion of the voiceprint recognition signal, the method further includes:
The terminal separates the voiceprint recognition signal into an audio signal portion and a judgment signal portion.
Separating the voiceprint recognition signal into an audio signal portion and a judgment signal portion includes:
The terminal filters the voiceprint recognition signal with a filter of a first preset frequency to obtain the audio signal portion;
The terminal filters the voiceprint recognition signal with a filter of a second preset frequency to obtain the judgment signal portion;
The filter of the first preset frequency is a high-pass filter, and the filter of the second preset frequency is a low-pass filter.
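The separation step can be sketched with first-order RC filters: a high-pass branch keeps the speech band as the audio signal portion, and a low-pass branch keeps the slow pressure changes produced by blowing as the judgment signal portion. The 100 Hz cutoffs, the first-order filter design, and the helper names are assumptions for illustration; the text specifies only that the preset frequencies exist, and a real system would likely use higher-order filters.

```python
import math


def _rc(cutoff_hz):
    """RC time constant of a first-order filter with the given cutoff."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def low_pass(samples, fs_hz, cutoff_hz):
    """First-order RC low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / fs_hz
    alpha = dt / (_rc(cutoff_hz) + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, fs_hz, cutoff_hz):
    """First-order RC high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    dt = 1.0 / fs_hz
    rc = _rc(cutoff_hz)
    alpha = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


def separate(signal, fs_hz, audio_cutoff_hz=100.0, breath_cutoff_hz=100.0):
    """Return (audio_part, judgment_part) from one recorded signal."""
    return (high_pass(signal, fs_hz, audio_cutoff_hz),
            low_pass(signal, fs_hz, breath_cutoff_hz))
```

For example, a constant offset (a crude stand-in for steady blowing pressure) ends up almost entirely in the judgment part, while a rapidly alternating signal ends up almost entirely in the audio part.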
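FIG. 9 is a schematic structural diagram of a terminal according to Embodiment 1 of the present invention. As shown in FIG. 9, the terminal provided by this embodiment includes a detecting module 901, a receiving module 902, an extracting module 903, a first matching module 904, and a determining module 905.
The detecting module 901 is configured to detect whether there is a sound signal.
The receiving module 902 is configured to receive the sound signal.
The extracting module 903 is configured to extract the audio signal portion and the judgment signal portion of the sound signal.
The first matching module 904 is configured to compare the voiceprint feature of the audio signal portion with the preset voiceprint feature, and to compare the exhaled-airflow feature of the judgment signal portion with the exhaled-airflow feature of the audio signal portion.
The exhaled-airflow feature is a feature of the airflow exhaled while the user corresponding to the sound signal produces the sound.
The determining module 905 is configured to determine that the voiceprint detection result is successful when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds a preset threshold.
The terminal of this embodiment is configured to perform the technical solution of the method embodiment shown in FIG. 2; its implementation principle and technical effects are similar and are not repeated here.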
Further, in the embodiment shown in FIG. 9, the receiving module 902 is further configured to receive the exhaled-airflow features in the judgment signal portion that exceed a preset airflow threshold.
The terminal further includes a quantization module.
The quantization module is configured to quantize the exhaled-airflow feature.
The first matching module 904 is further configured to compare the quantized exhaled-airflow feature with the exhaled-airflow feature of the audio signal portion.
The determining module 905 determining that the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold includes: the degree of match between the quantized exhaled-airflow feature and the exhaled-airflow feature of the audio signal portion exceeds the preset threshold.
Further, in the example shown in FIG. 2, the first matching module 904 is specifically configured to compare the exhaled-airflow feature with a preset airflow gate value: if the exhaled-airflow feature is greater than the preset airflow gate value, it is quantized to 1; otherwise, it is quantized to 0.
The determining module 905 determining that the degree of match between the quantized exhaled-airflow feature and the exhaled-airflow feature of the audio signal portion exceeds the preset threshold includes at least one of the following two cases:
Case 1: the exhaled-airflow feature is quantized to 1, and the text corresponding to the audio signal portion is an aspirated sound.
Case 2: the exhaled-airflow feature is quantized to 0, and the text corresponding to the audio signal portion is an unaspirated sound.
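The quantization and the two matching cases can be sketched per syllable. The sketch assumes that the recognized text is available as Mandarin pinyin initials, one per syllable, aligned with one airflow measurement per syllable; that representation, the "fraction of agreeing syllables" match degree, and the helper names are assumptions. The aspirated/unaspirated initial pairs themselves are standard Mandarin phonology; syllables with other initials carry no aspiration contrast and are skipped.

```python
# Mandarin pinyin initials that contrast in aspiration (p/b, t/d, k/g, q/j, ch/zh, c/z).
ASPIRATED_INITIALS = {"p", "t", "k", "q", "ch", "c"}
UNASPIRATED_INITIALS = {"b", "d", "g", "j", "zh", "z"}


def quantize_airflow(airflow, gate):
    """Quantize an exhaled-airflow measurement: 1 if above the gate value, else 0."""
    return 1 if airflow > gate else 0


def syllable_matches(airflow_bit, initial):
    """Case 1: bit 1 with an aspirated initial; case 2: bit 0 with an unaspirated one."""
    if airflow_bit == 1:
        return initial in ASPIRATED_INITIALS
    return initial in UNASPIRATED_INITIALS


def airflow_match_degree(airflows, initials, gate):
    """Fraction of aspiration-contrast syllables whose airflow bit agrees with the text."""
    pairs = [(quantize_airflow(a, gate), i) for a, i in zip(airflows, initials)
             if i in ASPIRATED_INITIALS or i in UNASPIRATED_INITIALS]
    if not pairs:
        return 0.0
    return sum(syllable_matches(bit, i) for bit, i in pairs) / len(pairs)


print(airflow_match_degree([0.8, 0.1, 0.9, 0.05], ["p", "b", "t", "d"], gate=0.5))  # 1.0
```

The resulting degree would then be compared against the preset threshold; a replayed recording with no matching breath bursts on "p", "t", "k" syllables scores low.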
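FIG. 10 is a schematic structural diagram of a terminal according to Embodiment 2 of the present invention. As shown in FIG. 10, on the basis of the foregoing embodiment, the terminal provided by this embodiment further includes a second matching module 906.
The second matching module 906 is configured to determine whether the pointing-direction feature of the judgment signal portion and the pointing-direction feature of the audio signal portion are within the preset range.
The determining module 905 is further configured to determine that the voiceprint detection result is successful when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold, and the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range.
The terminal of this embodiment is configured to perform the technical solution of the method embodiment shown in FIG. 5; its implementation principle and technical effects are similar and are not repeated here.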
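Further, in the embodiment shown in FIG. 10, the second matching module 906 is specifically configured to compare the pointing-direction angle of the judgment signal portion and the pointing-direction angle of the audio signal portion, respectively, with a preset pointing-angle threshold.
The determining module 905 determining that the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range includes: the pointing-direction angle of the judgment signal portion and the pointing-direction angle of the audio signal portion are both smaller than the preset pointing-angle threshold.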
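FIG. 11 is a schematic structural diagram of a terminal according to Embodiment 3 of the present invention. As shown in FIG. 11, on the basis of the foregoing embodiments, the terminal provided by this embodiment further includes a third matching module 907.
The third matching module 907 is configured to compare the sensed temperature feature of the judgment signal portion with the preset temperature threshold.
The determining module 905 is further configured to determine that the voiceprint detection result is successful when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold, the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range, and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold.
The terminal of this embodiment is configured to perform the technical solution of the method embodiment shown in FIG. 7; its implementation principle and technical effects are similar and are not repeated here.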
Further, in the embodiment shown in FIG. 11, the terminal further includes a separation module.
The separation module is configured to separate the sound signal into an audio signal portion and a judgment signal portion before the extracting module extracts the audio signal portion and the judgment signal portion of the sound signal.
The separation module is specifically configured to filter the sound signal with a filter of a first preset frequency to obtain the audio signal portion, and to filter the sound signal with a filter of a second preset frequency to obtain the judgment signal portion.
The filter of the first preset frequency is a high-pass filter, and the filter of the second preset frequency is a low-pass filter.
FIG. 12 is a schematic structural diagram of a terminal according to Embodiment 4 of the present invention. As shown in FIG. 12, the terminal provided by this embodiment includes a microphone 1201, a memory 1202, and a processor 1203.
It should be noted that the microphone 1201 can correspond to the detecting module 901 of the terminal and is configured to detect whether there is a sound signal and, if one is detected, to receive it. The microphone 1201 can further be configured to receive the exhaled-airflow features in the judgment signal portion that exceed the preset airflow threshold. The memory 1202 is configured to store execution instructions. The processor 1203 can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. When the terminal runs, the processor 1203 communicates with the memory 1202 and invokes the execution instructions to perform the following operations:
Extracting the audio signal portion and the judgment signal portion of the sound signal; comparing the voiceprint feature of the audio signal portion with the preset voiceprint feature; comparing the exhaled-airflow feature of the judgment signal portion with the exhaled-airflow feature of the audio signal portion, where the exhaled-airflow feature is a feature of the airflow exhaled while the user corresponding to the sound signal produces the sound; and, when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold and the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold, determining that the voiceprint detection result is successful.
Optionally, the terminal can further include a recorder 1204.
It should be noted that the recorder 1204 can be used to capture sound signals produced by the user, analyze their features to obtain the preset voiceprint feature, and store it.
The processor 1203 is further configured to perform the following operations:
Quantizing the exhaled-airflow feature; comparing the quantized exhaled-airflow feature with the exhaled-airflow feature of the audio signal portion;
The processor 1203 determining that the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold includes: the degree of match between the quantized exhaled-airflow feature and the exhaled-airflow feature of the audio signal portion exceeds the preset threshold.
The processor 1203 is further configured to perform the following operations:
Comparing the exhaled-airflow feature with the preset airflow gate value: if the exhaled-airflow feature is greater than the preset airflow gate value, quantizing it to 1; otherwise, quantizing it to 0;
The processor 1203 determining that the degree of match between the quantized exhaled-airflow feature and the exhaled-airflow feature of the audio signal portion exceeds the preset threshold includes at least one of the following two cases:
Case 1: the exhaled-airflow feature is quantized to 1, and the text corresponding to the audio signal portion is an aspirated sound;
Case 2: the exhaled-airflow feature is quantized to 0, and the text corresponding to the audio signal portion is an unaspirated sound.
The processor 1203 is further configured to perform the following operations:
Determining whether the pointing-direction feature of the judgment signal portion and the pointing-direction feature of the audio signal portion are within the preset range; and when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold, and the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range, determining that the voiceprint detection result is successful.
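The processor 1203 is further configured to perform the following operations:
Comparing the pointing-direction angle of the judgment signal portion and the pointing-direction angle of the audio signal portion, respectively, with the preset pointing-angle threshold;
The processor 1203 determining that the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range includes: the pointing-direction angle of the judgment signal portion and the pointing-direction angle of the audio signal portion are both smaller than the preset pointing-angle threshold.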
The processor 1203 is further configured to perform the following operations:
Comparing the sensed temperature feature of the judgment signal portion with the preset temperature threshold; and when the degree of match between the voiceprint feature of the audio signal portion and the preset voiceprint feature exceeds the preset threshold, the degree of match between the exhaled-airflow feature of the judgment signal portion and that of the audio signal portion exceeds the preset threshold, the pointing-direction features of the judgment signal portion and the audio signal portion are within the preset range, and the sensed temperature feature of the judgment signal portion is greater than or equal to the preset temperature threshold, determining that the voiceprint detection result is successful.
The processor 1203 is further configured to perform the following operations:
Comparing the voiceprint waveform of the audio signal portion with a preset voiceprint sample feature waveform;
and/or
Comparing the signal frequency of the audio signal portion with a preset voiceprint sample characteristic frequency.
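The text does not say how the waveform or frequency comparison is carried out, so the following is only an illustrative sketch of two simple stand-ins: cosine similarity (normalized correlation at zero lag) for the waveform comparison, and a zero-crossing frequency estimate for the frequency comparison. Both techniques and the helper names are assumptions; practical voiceprint systems usually compare spectral features instead.

```python
import math


def waveform_similarity(signal, template):
    """Normalized correlation at zero lag (cosine similarity), in [-1, 1]."""
    n = min(len(signal), len(template))
    dot = sum(signal[i] * template[i] for i in range(n))
    norm_s = math.sqrt(sum(v * v for v in signal[:n]))
    norm_t = math.sqrt(sum(v * v for v in template[:n]))
    if norm_s == 0.0 or norm_t == 0.0:
        return 0.0
    return dot / (norm_s * norm_t)


def zero_crossing_frequency(signal, fs_hz):
    """Rough frequency estimate: a sinusoid crosses zero twice per period."""
    crossings = sum(1 for i in range(1, len(signal))
                    if (signal[i - 1] < 0.0) != (signal[i] < 0.0))
    return crossings * fs_hz / (2.0 * len(signal))


fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(fs)]
print(round(zero_crossing_frequency(tone, fs)))  # 200
```

Either score would then be tested against a preset tolerance, for example requiring a similarity above some value and a frequency within some number of hertz of the enrolled sample.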
The processor 1203 is further configured to perform the following operations:
Separating the sound signal into an audio signal portion and a judgment signal portion.
Specifically, the sound signal is filtered with a filter of a first preset frequency to obtain the audio signal portion, and with a filter of a second preset frequency to obtain the judgment signal portion, where the filter of the first preset frequency is a high-pass filter and the filter of the second preset frequency is a low-pass filter.
FIG. 13 is a schematic structural diagram of a voiceprint detection apparatus according to Embodiment 1 of the present invention. The apparatus provided by this example can be implemented as a standalone device or integrated into various voice-assistant devices, such as a set-top box, a mobile phone, a tablet personal computer, a laptop computer, a multimedia player, a digital camera, a personal digital assistant (PDA), a navigation device, a mobile Internet device (MID), or a wearable device. As shown in FIG. 13, the apparatus can include one or more of the following units: an input unit, a storage unit, a processor unit, a communication unit, a peripheral interface, an output unit, and a power supply.
In this example, a microphone can serve as the input unit, which can input audio signals and detect whether the terminal has a voiceprint recognition signal. A memory can serve as the storage unit, which can store execution instructions, for example those of operating programs and applications, or those of specific modules such as a blowing-signal recognition module, a blowing-signal and audio-signal separation module, and a blowing-signal judgment module. A processor can serve as the processor unit, which can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. When the terminal runs, the processor unit communicates with the storage unit and invokes the execution instructions to perform the operations in the foregoing method embodiments. The communication unit can be used for wired or wireless communication between the terminal and other devices. The peripheral interface can provide an interface between the terminal and peripheral interface modules, such as a keyboard or buttons. The output unit can be used to output audio signals. The power supply can provide power to the units of the terminal.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, for example a storage unit including instructions, where the instructions can be executed by the processor of the voiceprint detection apparatus to perform the foregoing method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
A non-transitory computer-readable storage medium stores computer instructions for causing an apparatus for controlling cache flushing to perform the operations in the foregoing method embodiments. When the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the operations in the foregoing method embodiments.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some or all of the technical features thereof, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (22)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/100286 WO2017113370A1 (en) | 2015-12-31 | 2015-12-31 | Voiceprint detection method and apparatus |
| CN201580079562.2A CN107533415B (en) | 2015-12-31 | 2015-12-31 | Method and device for voiceprint detection |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/100286 WO2017113370A1 (en) | 2015-12-31 | 2015-12-31 | Voiceprint detection method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017113370A1 (en) | 2017-07-06 |
Family ID: 59224366
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/100286 WO2017113370A1 (en), ceased | 2015-12-31 | 2015-12-31 | Voiceprint detection method and apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107533415B (en) |
| WO (1) | WO2017113370A1 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019141139A1 (en) * | 2018-01-17 | 2019-07-25 | Huawei Technologies Co., Ltd. | Echoprint user authentication |
| CN113707182A (en) * | 2021-09-17 | 2021-11-26 | 北京声智科技有限公司 | Voiceprint recognition method and device, electronic equipment and storage medium |
| CN113744431A (en) * | 2020-05-14 | 2021-12-03 | 大富科技(安徽)股份有限公司 | Shared bicycle lock control device, method, equipment and medium |
| CN116092167A (en) * | 2023-02-23 | 2023-05-09 | 唯思电子商务(深圳)有限公司 | Human face living body detection method based on reading |
| CN119207432A (en) * | 2024-09-14 | 2024-12-27 | 维沃移动通信有限公司 | Voiceprint verification method, device, electronic device and readable storage medium |
| GB2635257A (en) * | 2021-09-07 | 2025-05-07 | Pi A Creative Systems Ltd | Method for detecting user input to a breath input configured user interface |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115346340B (en) * | 2022-07-21 | 2023-11-17 | 浙江极氪智能科技有限公司 | Devices and methods for improving driving fatigue |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH06105388A (en) * | 1992-09-18 | 1994-04-15 | Matsushita Electric Ind Co Ltd | Exhalation airflow sensor |
| US20060036441A1 (en) * | 2004-08-13 | 2006-02-16 | Canon Kabushiki Kaisha | Data-managing apparatus and method |
| CN101441869A (en) * | 2007-11-21 | 2009-05-27 | 联想(北京)有限公司 | Method and terminal for speech recognition of terminal user identification |
| CN102737634A (en) * | 2012-05-29 | 2012-10-17 | 百度在线网络技术(北京)有限公司 | Authentication method and device based on voice |
| CN102866844A (en) * | 2012-08-13 | 2013-01-09 | 上海华勤通讯技术有限公司 | Mobile terminal and unlocking method thereof |
| CN202841290U (en) * | 2012-06-04 | 2013-03-27 | 百度在线网络技术(北京)有限公司 | Unlocking device of mobile terminal and mobile terminal having unlocking device |
| CN104021790A (en) * | 2013-02-28 | 2014-09-03 | 联想(北京)有限公司 | Sound control unlocking method and electronic device |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101897627B (en) * | 2010-06-30 | 2012-10-24 | 广州医学院第一附属医院 | Method for establishing and detecting mouse cough model |
| CN102523347A (en) * | 2011-12-16 | 2012-06-27 | 广东步步高电子工业有限公司 | Air blowing control method and device applied to electronic products |
| CN103886861B (en) * | 2012-12-20 | 2017-03-01 | 联想(北京)有限公司 | A kind of method of control electronics and electronic equipment |
2015
- 2015-12-31: CN201580079562.2A, patent CN107533415B (active)
- 2015-12-31: PCT/CN2015/100286, patent WO2017113370A1 (not active, ceased)
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019141139A1 (en) * | 2018-01-17 | 2019-07-25 | Huawei Technologies Co., Ltd. | Echoprint user authentication |
| US10853463B2 (en) | 2018-01-17 | 2020-12-01 | Futurewei Technologies, Inc. | Echoprint user authentication |
| US11461447B2 (en) | 2018-01-17 | 2022-10-04 | Futurewei Technologies, Inc. | Echoprint user authentication |
| CN113744431A (en) * | 2020-05-14 | 2021-12-03 | 大富科技(安徽)股份有限公司 | Shared bicycle lock control device, method, equipment and medium |
| CN113744431B (en) * | 2020-05-14 | 2024-04-09 | 大富科技(安徽)股份有限公司 | Shared bicycle lock control device, method, equipment and medium |
| GB2635257A (en) * | 2021-09-07 | 2025-05-07 | Pi A Creative Systems Ltd | Method for detecting user input to a breath input configured user interface |
| GB2635257B (en) * | 2021-09-07 | 2025-12-03 | Pi A Creative Systems Ltd | Method for detecting user input to a breath input configured user interface |
| CN113707182A (en) * | 2021-09-17 | 2021-11-26 | 北京声智科技有限公司 | Voiceprint recognition method and device, electronic equipment and storage medium |
| CN116092167A (en) * | 2023-02-23 | 2023-05-09 | 唯思电子商务(深圳)有限公司 | Human face living body detection method based on reading |
| CN119207432A (en) * | 2024-09-14 | 2024-12-27 | 维沃移动通信有限公司 | Voiceprint verification method, device, electronic device and readable storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107533415B (en) | 2020-09-11 |
| CN107533415A (en) | 2018-01-02 |
Similar Documents
| Publication | Title |
|---|---|
| US11735191B2 (en) | Speaker recognition with assessment of audio frame contribution |
| CN108305615B (en) | Object identification method and device, storage medium and terminal thereof |
| US11475899B2 (en) | Speaker identification |
| CN107533415B (en) | Method and device for voiceprint detection |
| US11735189B2 (en) | Speaker identification |
| Wang et al. | Voicepop: A pop noise based anti-spoofing system for voice authentication on smartphones |
| US8589167B2 (en) | Speaker liveness detection |
| US20250225983A1 (en) | Detection of replay attack |
| US10950245B2 (en) | Generating prompts for user vocalisation for biometric speaker recognition |
| US11042616B2 (en) | Detection of replay attack |
| Wang et al. | Secure your voice: An oral airflow-based continuous liveness detection for voice assistants |
| GB2608710A (en) | Speaker identification |
| JP6220304B2 (en) | Voice identification device |
| Gupta et al. | Deep convolutional neural network for voice liveness detection |
| Sahidullah et al. | Robust speaker recognition with combined use of acoustic and throat microphone speech |
| Zhang et al. | A phoneme localization based liveness detection for text-independent speaker verification |
| Chang et al. | Vogue: Secure user voice authentication on wearable devices using gyroscope |
| KR20110079161A (en) | Speaker authentication method and device in mobile terminal |
| WO2019006587A1 (en) | Speaker recognition system, speaker recognition method, and in-ear device |
| WO2023047893A1 (en) | Authentication device and authentication method |
| Zhang et al. | A Continuous Liveness Detection System for Text-independent Speaker Verification |
| Korshunov et al. | Presentation attack detection in voice biometrics |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15911995; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 15911995; Country of ref document: EP; Kind code of ref document: A1 |