
WO2018113526A1 - Interactive authentication system and method based on face recognition and voiceprint recognition - Google Patents


Info

Publication number
WO2018113526A1
WO2018113526A1 · PCT/CN2017/114928 · CN2017114928W
Authority
WO
WIPO (PCT)
Prior art keywords
voice
recognition
user
terminal
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/114928
Other languages
English (en)
French (fr)
Inventor
刘�东
李晓冬
杨震泉
彭世伟
孙云松
孟庆康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Publication of WO2018113526A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861: Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26: Speech to text systems
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination

Definitions

  • The invention relates to authentication technology, and in particular to authentication technology based on face recognition and voiceprint recognition.
  • Biometric technology authenticates identity through physiological or behavioral characteristics of the human body, such as fingerprints, irises, facial images, and DNA sequence matching.
  • Fingerprint recognition is easy to forge: an attacker only needs to lift a fingerprint from the victim's everyday belongings in order to reproduce it. Fingerprint identification is therefore used mainly in low-security settings such as daily attendance records.
  • Iris recognition captures, through camera equipment, the annular region between the black pupil and the white sclera, which contains many interlaced spots, filaments, crowns, stripes and crypts. Its hardware requirements are therefore relatively high, and it is not easy to commercialize on a large scale or promote to ordinary users.
  • Face recognition verification based on a single image is also easy to defeat with a static picture (a photograph), while DNA sequence matching requires direct contact with the human body, so it is not suited to the "short, flat, fast" pace of Internet platforms.
  • The human voice carries information in multiple dimensions, such as speech content, tone, and acoustic characteristics.
  • Voiceprint recognition is a technique for distinguishing speakers by their voice characteristics; differences in vocal tract structure make each voiceprint unique.
  • The object of the present invention is to solve the problem that face recognition authentication is easily defeated by impersonation, by providing an interactive authentication system and method based on face recognition and voiceprint recognition.
  • the interactive authentication system based on face recognition and voiceprint recognition comprises a terminal and a server, and the terminal and the server are connected through a network, wherein
  • The terminal is configured to acquire a facial video of the detected user, collect voice audio data input by the user, send the data to the server, and display prompt information sent by the server;
  • The server is configured to match the user facial feature parameters and the user voiceprint feature vector, and to take the intersection of the voiceprint recognition result and the face recognition result. If the intersection contains exactly one result, verification succeeds and verification success information is returned to the terminal.
  • Matching the user facial feature parameters and the user voiceprint feature vector means that the server extracts the user facial feature parameters from the received facial video and matches them against all user facial feature parameters stored in advance on the server. If the match succeeds, the face recognition result is obtained and the preset voice password text is sent to the terminal. After receiving the voice audio data sent by the terminal's voice collection module, the server converts the audio into text and matches that text against the previously sent voice password text. If that match succeeds, the voiceprint feature vector is extracted from the voice audio data and matched against all user voiceprint feature vectors stored in advance on the server; a successful match yields the voiceprint recognition result.
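The two-stage matching described in this paragraph can be sketched in code. This is a minimal sketch only: the Euclidean-distance matcher, the thresholds, and all function and variable names are illustrative assumptions, not details taken from the patent.

```python
# Sketch of the server-side flow: face matching yields a candidate set of
# enrolled users, the spoken password is checked against the prompted text,
# voiceprint matching yields a second candidate set, and verification
# succeeds only when the intersection holds exactly one user.

def match_candidates(probe, enrolled, threshold):
    """IDs of enrolled users whose stored feature vector is within
    `threshold` Euclidean distance of the probe (smaller = more similar)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return {uid for uid, vec in enrolled.items() if dist(probe, vec) < threshold}

def verify(face_probe, voice_probe, spoken_text, password_text,
           enrolled_faces, enrolled_voices, face_thr=0.5, voice_thr=0.5):
    faces = match_candidates(face_probe, enrolled_faces, face_thr)
    if not faces:
        return "face recognition failed"
    if spoken_text != password_text:          # transcript vs. prompted password
        return "voice password input incorrect"
    voices = match_candidates(voice_probe, enrolled_voices, voice_thr)
    if not voices:
        return "voice recognition failed"
    both = faces & voices                     # intersect the two results
    if len(both) == 1:
        return "verified: " + both.pop()
    return "verification failed" if not both else "retry: ambiguous voiceprint"
```

With two enrolled users whose face and voice vectors are well separated, a probe close to one user's stored vectors verifies as that user; a probe far from all stored faces fails at the first stage.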
  • the terminal includes a display module, a face video capture module, a voice collection module, and a first communication module
  • the server includes a face recognition module, a voice recognition module, a verification module, a database, and a second communication module
  • The display module, the face video capture module, and the voice collection module are each connected to the first communication module; the face recognition module, the voice recognition module, and the verification module are each connected to the second communication module; and the face recognition module and the voice recognition module are each connected to the verification module.
  • The database is connected to the face recognition module, the voice recognition module, and the verification module, and the first communication module and the second communication module are connected through a network.
  • the face video capture module is configured to acquire a facial video of the detected user and send the video to the face recognition module through the first communication module and the second communication module;
  • the voice collection module is configured to collect voice audio data input by the user and send the voice audio data to the voice recognition module through the first communication module and the second communication module;
  • The display module is configured to display prompt information sent by the server, including face recognition failure information, voice-password-input-incorrect information, verification failure information, voice password text, and verification success information;
  • the first communication module and the second communication module are used for information interaction between the terminal and the server;
  • The face recognition module is configured to filter and denoise the face video of the detected user, extract key frames, obtain the user facial feature parameters from the key frames, and match the key feature parameters against all user facial feature parameters stored in the database. If the match succeeds, the successful match result, which is the face recognition result, is sent to the verification module; if the match fails, face recognition failure information is returned to the terminal;
  • The voice recognition module is configured to: after receiving a voice recognition request from the verification module, send the preset voice password text to the terminal so that the terminal displays it through the display module; convert the voice audio data sent by the terminal's voice collection module into text and match that text against the previously sent voice password text; if that match fails, recognition fails and voice-password-input-incorrect information is returned to the terminal; if it succeeds, extract the voiceprint feature vector from the voice audio data and match it against all user voiceprint feature vectors stored in the database; if that match fails, recognition fails and voice recognition failure information is returned to the terminal; if it succeeds, the successful match result, which is the voiceprint recognition result, is sent to the verification module;
  • The verification module is configured to send a voice recognition request to the voice recognition module after receiving the successful match result from the face recognition module, and, after receiving the successful match result from the voice recognition module, to intersect it with the face recognition result. If the intersection is empty, the current user verification fails and verification failure information is returned to the terminal. If the intersection contains exactly one result, verification succeeds and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint features are not distinctive enough, and the voice recognition request is re-sent to the voice recognition module; if a predetermined number of voice recognition requests has already been sent, user authentication is deemed to have failed and verification failure information is returned to the terminal.
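The verification module's intersection-and-retry policy can be sketched as a small loop. Here `request_voice` is a hypothetical callback standing in for one voice-password round trip with the voice recognition module, and the retry limit is an illustrative assumption:

```python
# Sketch of the retry policy: when the intersection of face and voiceprint
# candidates holds more than one user, a fresh voice password round is
# requested, up to a preset number of attempts.

def verify_with_retries(face_candidates, request_voice, max_attempts=3):
    for _ in range(max_attempts):
        voice_candidates = request_voice()   # one voice-password round trip
        both = face_candidates & voice_candidates
        if not both:
            return None                      # empty intersection: fail
        if len(both) == 1:
            return next(iter(both))          # exactly one user: success
        # more than one user: voiceprint not distinctive enough, retry
    return None                              # retries exhausted: fail
```

A caller would translate `None` into the terminal's verification failure message and a returned user ID into the verification success message.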
  • the face video capture module is a camera module
  • the voice capture module is a pickup.
  • The face recognition module is configured with an image similarity preset value. When the key feature parameters of the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is deemed successful if the similarity distance to a stored user's facial feature parameters is smaller than the image similarity preset value; otherwise the match is deemed to fail.
  • the successful matching result of the face recognition module includes user information, where the user information includes user age information.
  • The voice recognition request sent by the verification module to the voice recognition module includes the user age information, or requests that a voice password text recorded at registration be sent.
  • If the verification module has already sent the voice recognition request to the voice recognition module the preset number of times, the voice recognition request asks for a voice password text recorded at registration to be sent.
  • The preset voice password text is an easy-to-read passage, a string of digits, a piece of news text, or a voice password text corresponding to the user information.
  • The voice recognition module further selects the preset voice password text according to the voice recognition request. If the request asks for a voice password text from registration, the selected preset voice password text is a registered voice password text corresponding to the user information. If the request carries user age information, the user's age is determined from it: if the user is elderly or a minor, the selected preset voice password text is an easy-to-read passage or a string of digits; otherwise it is a piece of news text.
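The selection rule above might look like the following in code. The age boundaries, the sample texts, and the request layout are all assumptions made for illustration; the patent specifies only the categories, not concrete values:

```python
import random

# Sketch of the password-text selection rule: registered passwords are
# reused when the request asks for them; otherwise elderly or minor users
# get easy-to-read text or digits, and other users get news-style text.

def choose_password_text(request, user_age=None, registered=None):
    if request.get("use_registered") and registered:
        return random.choice(registered)          # reuse a registered text
    # assumed boundaries: 60+ treated as elderly, under 18 as minor
    if user_age is not None and (user_age >= 60 or user_age < 18):
        return random.choice(["the sun rises in the east", "3 7 2 9 5 1"])
    # default: a snippet of news-style text
    return random.choice(["markets closed higher on Tuesday"])
```

In practice the text pools would be drawn from a corpus on the server rather than hard-coded lists.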
  • After sending the preset voice password text to the terminal, the voice recognition module starts a timer and determines whether voice audio data is received from the terminal within a preset time. If the timer reaches the preset time without voice audio data arriving from the terminal, the module replaces the preset voice password text, re-sends the replacement to the terminal, restarts the timer, and again waits for voice audio data from the terminal within the preset time.
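The timeout-and-replace behaviour can be sketched as a small loop. `send_text` and `wait_for_audio` are hypothetical callbacks; `wait_for_audio` is assumed to return `None` when no audio arrives within the preset time:

```python
# Sketch of the timeout behaviour: after sending a password text the server
# waits a preset time for audio; on timeout it swaps in a replacement text
# and restarts the clock, until audio arrives or the texts run out.

def prompt_until_audio(passwords, send_text, wait_for_audio, timeout=30.0):
    for text in passwords:             # each retry uses a replacement text
        send_text(text)
        audio = wait_for_audio(timeout)
        if audio is not None:
            return text, audio         # proceed to transcription/matching
    return None, None                  # caller treats this as a failure
```

The patent itself loops indefinitely on timeout; bounding the loop by the list of replacement texts is a choice made here so the sketch terminates.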
  • An interactive authentication method based on face recognition and voiceprint recognition is applied to the above interactive authentication system, and comprises the following steps:
  • Step 1 The user uses the terminal to perform user registration with the server, and the server stores the user information, the facial feature parameters of the user, and the user voiceprint feature vector in the database;
  • Step 2 When authenticating, the terminal acquires a facial video of the detected user and sends the video to the server;
  • Step 3 The server filters and denoises the facial video of the detected user, extracts the key frames, obtains the user facial feature parameters from the key frames, and matches the key feature parameters against all user facial feature parameters stored in the database. If the match succeeds, the face recognition result is obtained and the process proceeds to step 5; if the match fails, the process proceeds to step 4;
  • Step 4 the server returns the terminal face recognition failure information, the terminal displays the face recognition failure and prompts the user, and returns to step 2;
  • Step 5 The server generates and sends a preset voice password text to the terminal.
  • Step 6 the terminal displays the voice password text, and collects the voice audio data input by the user and uploads it to the server;
  • Step 7 The server converts the received voice audio data into text and matches it against the previously sent voice password text. If the match fails, recognition fails, voice-password-input-incorrect information is returned to the terminal, and the process goes to step 8; if the match succeeds, go to step 9;
  • Step 8 the terminal displays the voice password input incorrect information, return to step 2;
  • Step 9 The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If the match fails, recognition fails, voice recognition failure information is returned to the terminal, and the process proceeds to step 10; if the match succeeds, the speech recognition result is obtained and the process proceeds to step 11;
  • Step 10 The terminal displays the voice recognition failure information, and returns to step 2;
  • Step 11 The server intersects the face recognition result and the voice recognition result. If the intersection is empty, the current user verification is deemed to fail, verification failure information is returned to the terminal, and the process proceeds to step 12. If the intersection contains exactly one result, verification is deemed successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint features are deemed not distinctive; the server determines whether the preset number of voice password texts has been sent during the current authentication. If so, user verification is deemed to fail, verification failure information is returned to the terminal, and the process proceeds to step 12; otherwise the server regenerates and sends a preset voice password text to the terminal and returns to step 6;
  • step 12 the terminal displays the verification failure information, and returns to step 2.
  • step 1 includes the following steps:
  • Step 101 The user inputs user information to the terminal, and collects a face video or a plurality of face images through the terminal, and the terminal uploads the user information and the face video or the plurality of face images to the server;
  • Step 102 The server intercepts multiple face images from the face video or uses the received multiple images as face samples to obtain the facial feature parameters of the user, and performs face modeling and associates with the user information. Stored in the database, and randomly generated voice password text is sent to the terminal;
  • Step 103 The terminal displays the voice password text, and collects voice audio data of the user, and uploads the collected voice and audio data to the server;
  • Step 104 The server performs voiceprint feature vector extraction on the voice audio data, and associates the extracted voiceprint feature vector, voice audio data, and corresponding voice password text with the user information, and stores the data in the database.
  • In step 102, the randomly generated voice password text sent to the terminal comprises at least one voice password text, randomly generated and sent to the terminal in sequence;
  • In step 103, the terminal displays the voice password texts in sequence: after a user's voice audio data has been collected three times for one voice password text, the next voice password text is displayed, until three voice audio recordings have been obtained for every voice password text and sent to the server.
  • In step 104, after receiving all the voice audio data, the server extracts a voiceprint feature vector from each recording and, for each voice password text, selects the recording whose voiceprint feature vector is most distinctive. The voice password text, the selected recording, and its voiceprint feature vector are associated with the user information and stored in the database.
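The "most distinctive voiceprint" selection in step 104 might be sketched as below. The patent does not define the selection criterion; the vector's Euclidean norm is used here purely as a stand-in score, and `extract_voiceprint` is a hypothetical feature extractor:

```python
# Registration sketch: three recordings are collected per password text and
# the one with the highest-scoring voiceprint vector is kept. Vector norm
# is an assumed scoring rule, not one specified by the patent.

def select_best_recording(recordings, extract_voiceprint):
    """Return the (audio, voiceprint_vector) pair to store for one text."""
    scored = [(audio, extract_voiceprint(audio)) for audio in recordings]
    return max(scored, key=lambda av: sum(x * x for x in av[1]) ** 0.5)
```

A production system would more likely score recordings by signal-to-noise ratio or model likelihood, but the shape of the selection step is the same.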
  • In step 11, the regenerated preset voice password text sent to the terminal is one of the voice password texts recorded at registration corresponding to the user information.
  • In step 3, an image similarity preset value is set in the server. When the key feature parameters of the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is deemed successful if the similarity distance to a stored user's facial feature parameters is smaller than the image similarity preset value; otherwise the match is deemed to fail.
  • The preset voice password text is a randomly generated easy-to-read passage, a randomly generated string of digits, a randomly generated piece of news-style text, or a registered voice password text corresponding to the user information.
  • the user information includes user age information
  • step 3 the face recognition result includes user information
  • In step 5, when the server generates and sends a preset voice password text to the terminal, if the user information in the face recognition result indicates an elderly person or a minor, the selected preset voice password text is an easy-to-read passage or a string of digits; otherwise it is a piece of news text.
  • In step 9, if the match fails, it is further determined whether the preset number minus one of voice password texts has been generated; if so, recognition fails, voice recognition failure information is returned to the terminal, and the process proceeds to step 10; otherwise the server regenerates and sends a preset voice password text to the terminal and returns to step 6.
  • The preset voice password text that is regenerated and sent to the terminal is a randomly generated easy-to-read passage, string of digits, or piece of news text, whose length is greater than that of the previously generated voice password text.
  • In step 9, a voiceprint similarity preset value is set in the server. When the server matches the voiceprint feature vector extracted from the voice audio data against all user voiceprint feature vectors stored in the database, the match is deemed successful if the similarity distance to a stored user's voiceprint feature vector is smaller than the voiceprint similarity preset value; otherwise the match is deemed to fail.
  • step 5 after the server generates and sends the preset voice password text to the terminal, the timing is also started;
  • step 9 after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;
  • step 11 after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;
  • After step 5, the following steps are further included:
  • Step A The server determines whether voice audio data is received from the terminal within a preset time. If the preset time elapses without receiving voice audio data from the terminal, the process proceeds to step B; otherwise it proceeds to step 7;
  • Step B The server replaces the preset voice password text, re-sends the replacement to the terminal, restarts the timer, and returns to step A. The replacement preset voice password text is a newly randomly generated easy-to-read passage, string of digits, or piece of news text.
  • step 9 if the matching fails, after returning the terminal speech recognition failure information, the server also proceeds to step 13;
  • In step 11, if verification succeeds, after returning verification success information to the terminal the server also proceeds to step 13; if the current user verification is deemed to fail, after returning verification failure information to the terminal the server also proceeds to step 13;
  • Step 13 The server optimizes the face modeling corresponding to the user information in the face recognition result by using the face image received in the current authentication.
  • The beneficial effect of the invention is that, through the above interactive authentication system and method based on face recognition and voiceprint recognition, the two recognition techniques are combined to achieve higher-security authentication and improve security.
  • FIG. 1 is a system block diagram of an interactive authentication system based on face recognition and voiceprint recognition according to an embodiment of the present invention.
  • An interactive authentication system based on face recognition and voiceprint recognition according to the present invention is shown in the system block diagram of FIG. 1. It includes a terminal and a server connected through a network. The terminal is configured to acquire a facial video of the detected user, collect voice audio data input by the user, send the data to the server, and display prompt information sent by the server. The server is configured to match the user facial feature parameters and the user voiceprint feature vector and to intersect the voiceprint recognition result with the face recognition result; if the intersection contains exactly one result, verification succeeds and verification success information is returned to the terminal.
  • the interactive authentication method based on face recognition and voiceprint recognition is applied to the above-mentioned interactive authentication system based on face recognition and voiceprint recognition.
  • The user uses the terminal to register with the server, and the server stores the user information, the user facial feature parameters, and the user voiceprint feature vector in the database.
  • the terminal acquires the facial video of the detected user and sends the video to the server, and the server filters and denoises the facial video of the detected user.
  • If the match fails, the server returns face recognition failure information to the terminal; the terminal displays the failure, prompts the user, and returns to the authentication step to re-authenticate.
  • If the match succeeds, the face recognition result is obtained, and a preset voice password text is generated and sent to the terminal. The terminal displays the voice password text, collects the voice audio data input by the user, and uploads it to the server. The server converts the received voice audio data into text and matches it against the previously sent voice password text. If that match fails, recognition fails, voice-password-input-incorrect information is returned, the terminal displays it, and the process returns to the authentication step to re-authenticate. If the match succeeds, the server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If that match fails, recognition fails and voice recognition failure information is returned to the terminal.
  • The terminal displays the voice recognition failure information and returns to the authentication step to re-authenticate; if the match succeeds, the speech recognition result is obtained.
  • The server intersects the face recognition result and the voice recognition result. If the intersection is empty, user verification is deemed to fail, verification failure information is returned, and the terminal displays it and returns to the authentication step to re-authenticate. If the intersection contains exactly one result, verification is deemed successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint features are deemed not distinctive; the server determines whether the preset number of voice password texts has been sent during this authentication. If so, user authentication fails, verification failure information is returned, and the terminal displays it and returns to the authentication step; otherwise the server regenerates and sends a preset voice password text to the terminal, which again displays the voice password text.
  • FIG. 1 An interactive authentication system based on face recognition and voiceprint recognition according to an embodiment of the present invention is shown in FIG. 1 , which includes a terminal and a server.
  • the terminal and the server are connected through a network, and the terminal may include a display module and a face video capture module.
  • the voice collection module and the first communication module, the server may include a face recognition module, a voice recognition module, a verification module, a database, and a second communication module, and the display module, the face video collection module, and the voice collection module are respectively connected to the first communication module.
  • the face recognition module, the voice recognition module and the verification module are respectively connected with the second communication module, and the face recognition module and the voice recognition module are respectively connected with the verification module, and the database module is respectively connected with the face recognition module, the voice recognition module and the verification module.
  • the first communication module and the second communication module are connected through a network.
  • The terminal is configured to acquire the facial video of the detected user, collect the voice audio data input by the user, send the data to the server, and display prompt information sent by the server.
  • the terminal may include a display module, a face video acquisition module, a voice collection module, and a first communication module.
  • the face video capture module is configured to obtain the face video of the detected user and send it to the face recognition module through the first communication module and the second communication module;
  • The face video capture module may be a camera module, such as a camera.
  • The voice collection module is configured to collect voice audio data input by the user and send it to the voice recognition module through the first communication module and the second communication module; it may be a pickup, such as a microphone.
  • The display module is configured to display prompt information sent by the server, including face recognition failure information, voice-password-input-incorrect information, verification failure information, voice password text, and verification success information.
  • the first communication module is used for information interaction between the terminal and the server.
  • The server is configured to match the user facial feature parameters and the user voiceprint feature vector and to intersect the voiceprint recognition result with the face recognition result. If the intersection contains exactly one result, verification succeeds and verification success information is returned to the terminal.
  • the matching of the facial feature parameters and the voiceprint feature vector preferably proceeds as follows: the server extracts the user's facial feature parameters from the received facial video and matches them against all facial feature parameters pre-stored on the server; a successful match yields the face recognition result. The server then sends a preset voice password text to the terminal. After receiving the voice audio data from the terminal's voice collection module, the server converts the audio into text and matches that text against the previously sent voice password text. If they match, the server extracts the voiceprint feature vector from the audio data and matches it against all pre-stored user voiceprint feature vectors; a successful match yields the voiceprint recognition result.
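The content check above (comparing the recognized speech against the sent password text) can be sketched as follows. The helper name and the normalization rules (ignoring case, whitespace, and punctuation) are assumptions for illustration; the patent does not specify how strictly the recognized text must match.

```python
import re

def password_text_matches(recognized: str, expected: str) -> bool:
    """Compare ASR output against the voice password text sent to the terminal.

    Case, whitespace, and punctuation are ignored so that harmless
    transcription differences do not fail the content check.
    """
    def normalize(s: str) -> str:
        return re.sub(r"[\W_]+", "", s).lower()
    return normalize(recognized) == normalize(expected)

print(password_text_matches("Open, sesame!", "open sesame"))  # True
```

Only when this content check passes does the server go on to the more expensive voiceprint comparison.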
  • the server may include a face recognition module, a voice recognition module, a verification module, a database, and a second communication module.
  • the second communication module is used for information interaction between the terminal and the server.
  • the face recognition module is configured to filter and denoise the received facial video of the user under test, extract key frames, derive the user's facial feature parameters from those frames, and match the key feature parameters against all user facial feature parameters stored in the database. If the match succeeds, the successful result is sent to the verification module as the face recognition result; if it fails, face-recognition-failure information is returned to the terminal.
  • the face recognition module may hold a preset image-similarity value: when the key feature parameters are matched against the facial feature parameters stored in the database, the match is judged successful if each matched user's facial-feature similarity threshold is smaller than the preset image-similarity value, and judged failed otherwise.
  • the matching result of the face recognition module may include user information, and the user information includes user age information.
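The threshold comparison described for the face recognition module can be sketched as follows. The patent treats the per-user score as one where a value below the preset image-similarity value counts as a match (i.e., a distance-like score); the Euclidean distance, the feature vectors, and the preset of 0.5 here are illustrative assumptions, not values from the patent.

```python
def match_faces(query, gallery, preset=0.5):
    """Return user IDs whose stored facial feature vector is close enough to the query.

    A per-user score below the preset image-similarity value counts as a match,
    mirroring the patent's 'threshold smaller than preset value' rule.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return {uid for uid, vec in gallery.items() if distance(query, vec) < preset}

gallery = {"alice": [0.1, 0.2], "bob": [0.9, 0.8]}
print(match_faces([0.12, 0.19], gallery))  # {'alice'}
```

Note that more than one stored user can fall under the preset; that is why the verification module later intersects this candidate set with the voiceprint candidates.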
  • the voice recognition module is configured to: upon receiving a voice recognition request from the verification module, send the preset voice password text to the terminal so the terminal displays it through its display module; after receiving the voice audio data from the terminal's voice collection module, convert the audio into text and match that text against the previously sent voice password text. If this match fails, recognition fails and incorrect-voice-password information is returned to the terminal. If it succeeds, the module extracts the voiceprint feature vector from the audio data and matches it against all user voiceprint feature vectors stored in the database. If that match fails, recognition fails and voice-recognition-failure information is returned to the terminal; if it succeeds, the successful result is sent to the verification module as the voiceprint recognition result.
  • the preset voice password text may be a passage of easy-to-read text, a string of digits, a passage of news text, or the voice password text recorded at registration for the corresponding user information. Before sending the preset voice password text to the terminal, the voice recognition module may inspect the voice recognition request: if the request asks for the registration-time voice password text, the module selects the registration-time text corresponding to the user information; if the request carries the user's age, the module judges the user's age, choosing easy-to-read text or a string of digits for elderly or underage users and a passage of news text otherwise.
  • in addition, after sending the preset voice password text, the voice recognition module may start a timer and check whether the terminal's voice audio data arrives within a preset time (for example, 10 seconds). If the timer expires without any voice audio data from the terminal, the module replaces the preset voice password text, resends the replacement to the terminal, restarts the timer, and returns to the step of waiting for the terminal's voice audio data within the preset time.
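The timeout-and-replace behavior above can be sketched as a polling loop. All names are hypothetical, and the `max_swaps` bound is an added assumption so the sketch terminates; the patent itself only says the password text is replaced and the timer restarted on each timeout.

```python
import time

def await_voice_audio(poll, send_password, next_password, timeout=10.0, max_swaps=3):
    """Send a voice password text, then wait for the terminal's audio.

    If no audio arrives within `timeout` seconds, replace the password text,
    resend it, restart the timer, and wait again, up to `max_swaps` passwords.
    """
    for _ in range(max_swaps):
        send_password(next_password())
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            audio = poll()          # returns audio bytes, or None if nothing yet
            if audio is not None:
                return audio
            time.sleep(0.01)
    return None                     # no audio after exhausting all passwords
```

`time.monotonic()` is used for the deadline so that wall-clock adjustments cannot shorten or lengthen the preset window.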
  • the verification module is configured to send a voice recognition request to the voice recognition module after receiving the successful matching result from the face recognition module, and, after receiving the successful matching result from the voice recognition module, to intersect it with the face recognition module's result. If the intersection is empty, this user verification fails and verification-failure information is returned to the terminal. If the intersection contains exactly one result, verification succeeds and verification-success information is returned. If the intersection contains more than one result, the voiceprint feature is considered indistinct and the voice recognition request is resent to the voice recognition module; if a preset number of voice recognition requests has already been sent, this user verification fails and verification-failure information is returned to the terminal.
  • the voice recognition request sent by the verification module may carry the user's age or a request for the registration-time voice password text; in particular, when the request being sent is the preset-number-th one (for example, the third request when the preset number is 3), it includes the request for the registration-time voice password text.
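The verification module's decision rule above can be sketched as a single function. The function name and return conventions are hypothetical; the preset request count of 3 follows the example in the text.

```python
def verify(face_ids, voice_ids, requests_sent, preset=3):
    """Decide the outcome of one round of two-factor verification.

    Returns one of: ("success", user_id), ("fail", reason), ("retry", None).
    """
    common = face_ids & voice_ids          # intersect the two candidate sets
    if len(common) == 1:
        return ("success", common.pop())   # exactly one surviving user
    if not common:
        return ("fail", "empty intersection")
    if requests_sent >= preset:            # preset number of requests used up
        return ("fail", "voiceprint still ambiguous")
    return ("retry", None)                 # resend a voice recognition request
```

The key design point is that success requires *exactly* one user to survive both checks: a photo alone or a recorded voice alone cannot pin the intersection to a single identity.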
  • in use, the processing method is as follows:
  • Step 1: The user registers with the server through the terminal; the server stores the user information, the user's facial feature parameters, and the user's voiceprint feature vector in the database.
  • in this step, the user information preferably includes the user's age; the step may specifically include the following steps:
  • Step 101: The user enters user information into the terminal and captures a facial video or several face images through it; the terminal uploads the user information together with the video or images to the server.
  • Step 102: The server extracts several face images from the facial video (or uses the received images directly) as face samples, derives the user's facial feature parameters, builds a face model, associates it with the user information, stores it in the database, and sends randomly generated voice password text to the terminal.
  • when sending randomly generated voice password text to the terminal, at least one passage of voice password text may be generated and sent in order; for example, three passages may be randomly generated, randomly sorted, and sent to the terminal in sequence. How many passages are generated depends on the security level of the service being authenticated: in general, the higher the security requirement, the more voice password passages are generated at registration.
  • Step 103: The terminal displays the voice password text, collects the user's voice audio data, and uploads the collected data to the server.
  • when the terminal receives several voice password passages in sequence, it displays them in order: after one passage has had the user's voice audio data collected three times, the next passage is displayed, and once three recordings have been obtained for every passage they are all sent to the server. For example, with two passages, the terminal first displays the first passage and records the user reading it three times, then displays the second passage and records it three times, and finally sends all six recordings to the server together.
  • Step 104 The server performs voiceprint feature vector extraction on the voice audio data, and associates the extracted voiceprint feature vector, voice audio data, and corresponding voice password text with the user information, and stores the data in the database.
  • if the server receives several voice audio recordings, it extracts a voiceprint feature vector from each after all have arrived and, for each voice password passage, selects the recording whose voiceprint feature vector is most distinctive; that passage, the selected recording, and its voiceprint feature vector are associated with the user information and stored in the database. That is, each voice password passage corresponds to one recording, and the other two recordings can be deleted.
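A minimal sketch of selecting the "most distinctive" recording among the three taken per passage. The patent does not define distinctiveness, so as an assumption it is read here as the recording whose voiceprint vector lies farthest from the centroid of the three; other readings (e.g., the most consistent recording) are equally possible.

```python
def pick_most_distinctive(vectors):
    """Of several voiceprint vectors for one password text, pick one index.

    'Most distinctive' is interpreted (as an assumption) as farthest
    from the centroid of all recordings for this passage.
    """
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    def dist(v):
        return sum((x - c) ** 2 for x, c in zip(v, centroid)) ** 0.5
    return max(range(n), key=lambda i: dist(vectors[i]))

recordings = [[0.1, 0.1], [0.1, 0.2], [0.9, 0.9]]
print(pick_most_distinctive(recordings))  # 2
```

Whichever criterion is used, only the selected recording and its vector are kept, which is what lets the other two recordings be deleted.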
  • Step 2: At authentication time, the terminal acquires facial video of the user under test and sends it to the server.
  • Step 3: The server filters and denoises the received facial video, extracts key frames, derives the user's facial feature parameters from them, and matches the key feature parameters against all user facial feature parameters stored in the database. If the match succeeds, the face recognition result is obtained and the process proceeds to step 5; if it fails, the process proceeds to step 4.
  • a preset image-similarity value may be set in the server; when the key feature parameters are matched against the stored facial feature parameters, the match is judged successful if each matched user's facial-feature similarity threshold is smaller than the preset image-similarity value, and judged failed otherwise. The face recognition result preferably includes the user information which, as seen in step 1, preferably includes the user's age.
  • Step 4: The server returns face-recognition-failure information; the terminal displays the failure, prompts the user, and returns to step 2.
  • Step 5 The server generates and sends a preset voice password text to the terminal.
  • the preset voice password text may be a randomly generated passage of easy-to-read text, a randomly generated string of digits, a randomly generated passage of news text, or the registration-time voice password text corresponding to the user information.
  • when the server generates and sends the preset voice password text, if the user information in the face recognition result (judged from the user's age) indicates an elderly or underage user, the chosen text is a passage of easy-to-read text or a string of digits, so that the user can understand and read it aloud; otherwise the user information indicates an adult, who can generally understand and read the password text, so a passage of news text is chosen to increase recognition accuracy.
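The age-based selection above can be sketched as follows. The cutoffs of 18 and 60 and the digit length are illustrative assumptions; the patent only distinguishes "elderly or minor" from "adult".

```python
import random

def choose_password_text(age, easy_texts, news_texts, digit_len=6,
                         minor_cutoff=18, elderly_cutoff=60):
    """Pick a voice password text based on the recognized user's age."""
    if age < minor_cutoff or age >= elderly_cutoff:
        # Easy-to-read text or a string of digits for minors/elderly users.
        if random.random() < 0.5:
            return random.choice(easy_texts)
        return "".join(random.choice("0123456789") for _ in range(digit_len))
    return random.choice(news_texts)  # adults read a news passage
```

Choosing longer, harder text for adults gives the voiceprint matcher more material to work with, which is the stated reason news text improves recognition accuracy.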
  • Step 6 The terminal displays the voice password text, and collects the voice audio data input by the user and uploads it to the server.
  • Step 7: The server converts the received voice audio data into text and matches it against the previously sent voice password text. If the match fails, recognition fails, incorrect-voice-password information is returned to the terminal, and the process proceeds to step 8; if it succeeds, the process proceeds to step 9.
  • Step 8 The terminal displays the voice password input incorrect information, and returns to step 2.
  • Step 9: The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If the match fails, recognition fails, voice-recognition-failure information is returned to the terminal, and the process proceeds to step 10; if it succeeds, the voice recognition result is obtained and the process proceeds to step 11.
  • in this step, when the match fails, the server may also check whether one fewer than the preset number of voice password texts has already been generated (for example, with a preset number of 3, whether 2 texts have been generated). If so, recognition fails, voice-recognition-failure information is returned to the terminal, and the process proceeds to step 10; otherwise the server regenerates and sends a preset voice password text to the terminal and returns to step 6. The regenerated text is a randomly generated passage of easy-to-read text, string of digits, or passage of news text whose length is greater than the previously generated password text; as can be seen, this corresponds to the generation method in step 5.
  • a preset voiceprint-similarity value may also be set in the server: when the extracted voiceprint feature vector is matched against all stored user voiceprint feature vectors, the match is judged successful if each matched user's voiceprint similarity threshold is smaller than the preset voiceprint-similarity value, and judged failed otherwise.
  • Step 10 The terminal displays the voice recognition failure information, and returns to step 2.
  • Step 11: The server intersects the face recognition result with the voice recognition result. If the intersection is empty, this user verification fails, verification-failure information is returned to the terminal, and the process proceeds to step 12. If the intersection contains exactly one result, verification succeeds and verification-success information is returned. If it contains more than one result, the voiceprint feature is considered indistinct; the server then checks whether the preset number of voice password texts has already been sent in this authentication. If so, this user verification fails, verification-failure information is returned, and the process proceeds to step 12; otherwise the server regenerates and sends a preset voice password text to the terminal and returns to step 6.
  • here the regenerated preset voice password text is one of the registration-time voice password texts corresponding to the user information, that is, one of the texts randomly generated in step 102 of this example; when only one exists, that text is selected directly. If no random voice password text was generated as in step 102, the user's voice audio data is collected directly, the user's voiceprint feature vector is obtained from it, and the voice password text corresponding to that audio can then be selected (obtainable by converting the user's voice audio data into text data).
  • Step 12: The terminal displays the verification-failure information and returns to step 2.
  • in step 5, after the server generates and sends the preset voice password text to the terminal, a timer is also started. Whether this is the first time the server sends the preset voice password text in the current authentication or a regeneration, the timer starts whenever the server generates and sends a preset voice password text to the terminal.
  • between step 5 and step 7, the following steps may also be included:
  • Step A: The server checks whether the voice audio data sent by the terminal is received within a preset time; if the timer reaches the preset time without the data arriving, the process proceeds to step B, otherwise it proceeds to step 7;
  • Step B: The server replaces the preset voice password text, resends the replacement to the terminal, restarts the timer, and returns to step A. The replacement text is a newly randomly generated passage of easy-to-read text, string of digits, or passage of news text.
  • in step 9, if the match fails, after returning voice-recognition-failure information the server may also proceed to step 13, while the terminal still proceeds to step 10; in step 11, whether verification succeeds (verification-success information returned) or this user verification fails (verification-failure information returned), the server may likewise proceed to step 13, while the terminal still proceeds to step 12.
  • Step 13 may be: the server uses the face images received in this authentication to optimize the face model corresponding to the user information in the face recognition result. The purpose is that a successful face recognition indicates the face images or captured facial video used for recognition are correct, so that correct image information can be used to optimize the face model, improve face-recognition accuracy, and delete invalid facial feature parameters to improve computational efficiency.
  • similarly, the server may use the voice audio data received in this authentication to optimize the voiceprint feature data corresponding to the user information in the face recognition result.
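One simple way to realize "optimize the model with data from a successful authentication" is an exponential moving-average update of the stored feature template. The update rule and the blend factor `alpha` are assumptions; the patent does not specify the optimization method.

```python
def refine_template(stored, verified_sample, alpha=0.1):
    """Blend a newly verified sample's feature vector into the stored template.

    alpha controls how quickly the template drifts toward recent,
    successfully verified samples.
    """
    return [(1 - alpha) * s + alpha * v for s, v in zip(stored, verified_sample)]
```

A small `alpha` keeps the template stable against a single atypical capture while still letting it track gradual changes in the user's appearance or voice.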
  • in the above flow, face recognition is performed first and voiceprint recognition afterwards. The reasons are: first, face recognition has been developed over several decades, the technology is relatively mature, the algorithms are efficient, and processing is fast; second, voiceprint recognition differs from other physiological-feature recognition in that the features it recognizes must be "personalized" to the speaker (the user undergoing voiceprint recognition), while for any given speaker the recognizer still requires stable "common characteristics" across utterances.
  • such features include: 1) acoustic features related to the anatomy of the human vocal mechanism (such as spectrum, cepstrum, formants, pitch, and reflection coefficients), plus nasal sounds, deep-breath sounds, hoarseness, laughter, and so on; 2) semantics, rhetoric, pronunciation, and other speech habits influenced by socioeconomic status, education level, place of birth, and the like; 3) personal characteristics such as rhythm, speed, intonation, and volume influenced by one's parents.
  • the features currently available to automatic voiceprint recognition models include: 1) acoustic features (cepstrum); 2) lexical features (speaker-specific word n-grams and phoneme n-grams); 3) prosodic features (pitch and energy "postures" described by n-grams); 4) language, dialect, and accent information; 5) channel information (which channel is used). Therefore, in the solution of the present invention, the preset voice password text may be randomly generated based on the user information.
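As a toy illustration of the lexical n-gram features listed above: real systems extract word or phoneme n-grams from a recognizer's output lattice, but the counting itself looks like the sketch below (character n-grams over a transcript are used here purely as a stand-in).

```python
def char_ngrams(text, n=2):
    """Character n-grams, standing in for the word/phoneme n-grams listed above."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_profile(text, n=2):
    """Frequency profile of n-grams: a toy 'lexical feature' for a speaker."""
    profile = {}
    for g in char_ngrams(text, n):
        profile[g] = profile.get(g, 0) + 1
    return profile

print(char_ngrams("abcd"))  # ['ab', 'bc', 'cd']
```

Comparing two speakers' n-gram profiles is what makes such features speaker-specific despite being derived from text rather than audio.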
  • since the specific face recognition and voiceprint recognition methods mentioned in the present invention are relatively mature technologies, they are not described in detail here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Collating Specific Patterns (AREA)
  • Telephone Function (AREA)

Abstract

The present invention relates to authentication technology. It aims to solve the problem that the detection result of existing face-recognition authentication is easily impersonated, and provides an interactive authentication system and method based on face recognition and voiceprint recognition. The technical solution can be summarized as follows: the interactive authentication system based on face recognition and voiceprint recognition comprises a terminal and a server connected through a network. The terminal acquires facial video of the user under test, collects voice audio data input by the user, sends both to the server, and displays the prompt information returned by the server. The server matches the user's facial feature parameters and voiceprint feature vector, then intersects the voiceprint recognition result with the face recognition result; if the intersection contains exactly one result, verification succeeds and verification-success information is returned to the terminal. The beneficial effect of the present invention is improved security, and it is suitable for authentication systems.

Description

Interactive authentication system and method based on face recognition and voiceprint recognition
Technical Field
The present invention relates to authentication technology, and in particular to authentication technology using face recognition and voiceprint recognition.
Background Art
With the arrival of the Internet+ era, networked management, paperless offices, and electronic transactions have permeated every part of daily life. Virtual living and virtual marketplaces have gradually become the main shopping and leisure channels for office workers. But while the Internet makes life convenient, it is also a double-edged sword: all activity and transactions take place in a virtual network, with no direct person-to-person contact and often no written exchange, so mutual trust and credentials rely on passwords, keys, or SMS verification codes, while the Internet remains an open, level platform and an unruly child. Anything transmitted over the network may be stolen, and for convenience users typically reuse a single password everywhere even though platforms vary widely in quality and security; once leaked in one place, it is broken everywhere. Random SMS codes have gradually been proposed to replace fixed passwords, yet statistics show the mobile phone is one of the most easily lost items of personal property.
With the development of hardware technology and the popularity of smartphones and personal computers, biometric recognition has recently become a growing focus of attention. Biometric technology authenticates legitimate identity through physiological or behavioral characteristics, such as fingerprints, irises, facial images, and DNA sequence matching.
Fingerprint recognition is easily forged: a fingerprint can be faked simply by lifting it from the target's everyday belongings, so fingerprint recognition is suitable only for low-security applications such as daily attendance records.
Iris recognition captures the ring between the black pupil and the white sclera, which contains many interleaved details such as spots, filaments, crowns, stripes, and crypts; it therefore demands high-end camera hardware and is not easy to commercialize at scale or promote to ordinary users.
Single image-based verification (face recognition alone) is also easily impersonated with a static image (photograph), while DNA sequence matching has a high barrier and requires direct physical contact, making it unsuitable for the "short, flat, fast" Internet platform.
The human voice is rich in information along several dimensions, such as spoken content, tone, and vocal characteristics. Voiceprint recognition is a technique for distinguishing speakers by their vocal characteristics; differing vocal-tract structures make each voiceprint unique.
Summary of the Invention
The purpose of the present invention is to solve the problem that the detection result of current face-recognition authentication is easily impersonated, and to provide an interactive authentication system and method based on face recognition and voiceprint recognition.
To solve this technical problem, the present invention adopts the following technical solution: an interactive authentication system based on face recognition and voiceprint recognition, comprising a terminal and a server connected through a network, characterized in that:
the terminal is configured to acquire facial video of the user under test, collect voice audio data input by the user, send both to the server, and display the prompt information returned by the server;
the server is configured to match the user's facial feature parameters and voiceprint feature vector, then intersect the voiceprint recognition result with the face recognition result; if the intersection contains exactly one result, verification succeeds and verification-success information is returned to the terminal.
Further, matching the facial feature parameters and the voiceprint feature vector means: the server extracts the user's facial feature parameters from the received facial video and matches them against all facial feature parameters pre-stored on the server; a successful match yields the face recognition result. The server then sends a preset voice password text to the terminal; after receiving the voice audio data from the terminal's voice collection module, the server converts it to text and matches that text against the previously sent voice password text. If they match, the server extracts the voiceprint feature vector from the audio data and matches it against all pre-stored user voiceprint feature vectors; a successful match yields the voiceprint recognition result.
Specifically, the terminal comprises a display module, a face video capture module, a voice collection module, and a first communication module, and the server comprises a face recognition module, a voice recognition module, a verification module, a database, and a second communication module. The display, face video capture, and voice collection modules are each connected to the first communication module; the face recognition, voice recognition, and verification modules are each connected to the second communication module; the face recognition and voice recognition modules are each connected to the verification module; the database is connected to the face recognition, voice recognition, and verification modules; and the first and second communication modules are connected through the network.
The face video capture module is configured to acquire facial video of the user under test and send it to the face recognition module through the first and second communication modules.
The voice collection module is configured to collect the voice audio data input by the user and send it to the voice recognition module through the first and second communication modules.
The display module is configured to display the prompt information returned by the server, including face-recognition-failure information, incorrect-voice-password information, verification-failure information, voice password text, and verification-success information.
The first and second communication modules handle information exchange between the terminal and the server.
The face recognition module is configured to filter and denoise the received facial video, extract key frames, derive the user's facial feature parameters from them, and match the key feature parameters against all user facial feature parameters stored in the database; on success the matching result is sent to the verification module as the face recognition result, and on failure face-recognition-failure information is returned to the terminal.
The voice recognition module is configured to: upon receiving a voice recognition request from the verification module, send the preset voice password text to the terminal so the terminal displays it through its display module; after receiving the voice audio data from the terminal's voice collection module, convert it to text and match it against the previously sent voice password text. If this match fails, recognition fails and incorrect-voice-password information is returned to the terminal; if it succeeds, the module extracts the voiceprint feature vector and matches it against all user voiceprint feature vectors stored in the database. If that match fails, recognition fails and voice-recognition-failure information is returned; if it succeeds, the result is sent to the verification module as the voiceprint recognition result.
The verification module is configured to send a voice recognition request to the voice recognition module after receiving the face recognition module's successful result and, after receiving the voice recognition module's successful result, to intersect the two. If the intersection is empty, this user verification fails and verification-failure information is returned to the terminal; if it contains exactly one result, verification succeeds and verification-success information is returned; if it contains more than one result, the voiceprint feature is considered indistinct and the voice recognition request is resent, and if a preset number of requests has already been sent, this user verification fails and verification-failure information is returned.
Still further, the face video capture module is a camera module and the voice collection module is a pickup.
Specifically, a preset image-similarity value is set in the face recognition module; when the key feature parameters are matched against the stored facial feature parameters, the match is judged successful if each matched user's facial-feature similarity threshold is smaller than the preset image-similarity value, and judged failed otherwise.
Still further, the face recognition module's successful matching result includes user information, which includes the user's age.
Specifically, the voice recognition request sent by the verification module carries the user's age or a request for the registration-time voice password text.
Still further, when the verification module sends its preset-number-th voice recognition request, that request includes the request for the registration-time voice password text.
Specifically, in the voice recognition module, the preset voice password text is a passage of easy-to-read text, a string of digits, a passage of news text, or the registration-time voice password text corresponding to the user information.
Still further, before sending the preset voice password text, the voice recognition module inspects the voice recognition request: if it asks for the registration-time text, the registration-time text corresponding to the user information is selected; if it carries the user's age, easy-to-read text or a string of digits is selected for elderly or underage users, and a passage of news text otherwise.
Specifically, after sending the preset voice password text, the voice recognition module also starts a timer and checks whether the terminal's voice audio data arrives within a preset time; if the timer expires without the data, it replaces the password text, resends the replacement to the terminal, restarts the timer, and returns to the step of waiting for the terminal's voice audio data within the preset time.
An interactive authentication method based on face recognition and voiceprint recognition, applied to the above interactive authentication system, characterized by comprising the following steps:
Step 1: The user registers with the server through the terminal; the server stores the user information, the user's facial feature parameters, and the user's voiceprint feature vector in the database.
Step 2: At authentication time, the terminal acquires facial video of the user under test and sends it to the server.
Step 3: The server filters and denoises the received facial video, extracts key frames, derives the user's facial feature parameters, and matches the key feature parameters against all user facial feature parameters stored in the database; on success the face recognition result is obtained and the process proceeds to step 5, on failure to step 4.
Step 4: The server returns face-recognition-failure information; the terminal displays the failure, prompts the user, and returns to step 2.
Step 5: The server generates and sends a preset voice password text to the terminal.
Step 6: The terminal displays the voice password text, collects the voice audio data input by the user, and uploads it to the server.
Step 7: The server converts the received voice audio data into text and matches it against the previously sent voice password text; on failure recognition fails, incorrect-voice-password information is returned, and the process proceeds to step 8; on success to step 9.
Step 8: The terminal displays the incorrect-voice-password information and returns to step 2.
Step 9: The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database; on failure recognition fails, voice-recognition-failure information is returned, and the process proceeds to step 10; on success the voice recognition result is obtained and the process proceeds to step 11.
Step 10: The terminal displays the voice-recognition-failure information and returns to step 2.
Step 11: The server intersects the face recognition result with the voice recognition result. If the intersection is empty, this user verification fails, verification-failure information is returned, and the process proceeds to step 12. If it contains exactly one result, verification succeeds and verification-success information is returned. If it contains more than one result, the voiceprint feature is considered indistinct and the server checks whether the preset number of voice password texts has been sent in this authentication; if so, this user verification fails, verification-failure information is returned, and the process proceeds to step 12, otherwise the server regenerates and sends a preset voice password text and returns to step 6.
Step 12: The terminal displays the verification-failure information and returns to step 2.
Specifically, step 1 comprises the following steps:
Step 101: The user enters user information into the terminal and captures a facial video or several face images through it; the terminal uploads the user information and the video or images to the server.
Step 102: The server extracts several face images from the video (or uses the received images) as face samples, derives the user's facial feature parameters, builds a face model, associates it with the user information, stores it in the database, and sends randomly generated voice password text to the terminal.
Step 103: The terminal displays the voice password text, collects the user's voice audio data, and uploads it to the server.
Step 104: The server extracts voiceprint feature vectors from the voice audio data and stores the extracted vectors, the audio data, and the corresponding voice password text in the database, associated with the user information.
Further, in step 102, at least one passage of voice password text is randomly generated and sent to the terminal in order; in step 103, the terminal displays the passages in order, collecting the user's voice audio data three times per passage before displaying the next, and sends the three recordings obtained for every passage to the server.
Specifically, in step 104, after receiving all the voice audio data the server extracts a voiceprint feature vector from each recording and, for each voice password passage, selects the recording whose voiceprint feature vector is most distinctive, storing that passage, the selected recording, and its vector in the database, associated with the user information.
Still further, in step 11, the regenerated preset voice password text is one of the registration-time voice password texts corresponding to the user information.
Specifically, in step 3, a preset image-similarity value is set in the server; when the key feature parameters are matched against the stored facial feature parameters, the match is judged successful if each matched user's facial-feature similarity threshold is smaller than the preset image-similarity value, and judged failed otherwise.
Still further, in step 5, the preset voice password text is a randomly generated passage of easy-to-read text, a randomly generated string of digits, a randomly generated passage of news text, or the registration-time voice password text corresponding to the user information.
Specifically, in step 1 the user information includes the user's age; in step 3 the face recognition result includes the user information; and in step 5, if the user information in the face recognition result indicates an elderly or underage user, the chosen preset voice password text is a passage of easy-to-read text or a string of digits, otherwise a passage of news text.
Still further, in step 9, on a failed match the server also checks whether one fewer than the preset number of voice password texts has been generated; if so, recognition fails, voice-recognition-failure information is returned, and the process proceeds to step 10, otherwise a preset voice password text is regenerated and sent and the process returns to step 6, the regenerated text being a randomly generated passage of easy-to-read text, string of digits, or news text longer than the previously generated one.
Still further, in step 9, a preset voiceprint-similarity value is set in the server; when the extracted voiceprint feature vector is matched against all stored user voiceprint feature vectors, the match is judged successful if each matched user's voiceprint similarity threshold is smaller than the preset value, and judged failed otherwise.
Specifically, a timer is also started after the server generates and sends the preset voice password text in step 5; and/or after it regenerates and sends the text in step 9; and/or after it regenerates and sends the text in step 11.
Between step 5 and step 7, the following steps are also included:
Step A: The server checks whether the voice audio data sent by the terminal is received within a preset time; if the timer reaches the preset time without the data arriving, the process proceeds to step B, otherwise it proceeds to step 7.
Step B: The server replaces the preset voice password text, resends the replacement to the terminal, restarts the timer, and returns to step A; the replacement text is a newly randomly generated passage of easy-to-read text, string of digits, or news text.
Still further, in step 9, if the match fails, after returning voice-recognition-failure information the server also proceeds to step 13; in step 11, whether verification succeeds (verification-success information returned) or this user verification fails (verification-failure information returned), the server also proceeds to step 13.
Step 13: The server uses the face images received in this authentication to optimize the face model corresponding to the user information in the face recognition result.
The beneficial effect of the present invention is that, through the above interactive authentication system and method based on face recognition and voiceprint recognition, face recognition and voiceprint recognition are combined to achieve higher-security authentication and improve safety.
Brief Description of the Drawings
Fig. 1 is a system block diagram of the interactive authentication system based on face recognition and voiceprint recognition in an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the drawing and the embodiment.
The interactive authentication system based on face recognition and voiceprint recognition of the present invention, whose block diagram is shown in Fig. 1, comprises a terminal and a server connected through a network. The terminal acquires facial video of the user under test, collects voice audio data input by the user, sends both to the server, and displays the prompt information returned by the server; the server matches the user's facial feature parameters and voiceprint feature vector, then intersects the voiceprint recognition result with the face recognition result, and if the intersection contains exactly one result, verification succeeds and verification-success information is returned to the terminal.
The interactive authentication method of the present invention, applied to the above system, proceeds as follows. First, the user registers with the server through the terminal, and the server stores the user information, the user's facial feature parameters, and the user's voiceprint feature vector in the database. At authentication time, the terminal acquires facial video of the user under test and sends it to the server, which filters and denoises it, extracts key frames, derives the user's facial feature parameters, and matches the key feature parameters against all stored facial feature parameters. On failure the server returns face-recognition-failure information, the terminal displays it and prompts the user, and authentication restarts; on success the face recognition result is obtained and a preset voice password text is generated and sent to the terminal. The terminal displays the text and uploads the user's voice audio data; the server converts the audio to text and matches it against the sent password text. On failure, incorrect-voice-password information is returned and displayed and authentication restarts; on success the server extracts the voiceprint feature vector and matches it against all stored user voiceprint feature vectors. On failure, voice-recognition-failure information is returned and displayed and authentication restarts; on success the voice recognition result is obtained and the server intersects the face and voice recognition results. If the intersection is empty, this user verification fails, verification-failure information is returned and displayed, and authentication restarts; if it contains exactly one result, verification succeeds and verification-success information is returned; if it contains more than one result, the voiceprint feature is considered indistinct and the server checks whether the preset number of voice password texts has been sent, failing the verification if so and otherwise regenerating and sending a preset voice password text and returning to the step where the terminal displays the password text.
Embodiment
The interactive authentication system based on face recognition and voiceprint recognition of this embodiment, whose block diagram is shown in Fig. 1, comprises a terminal and a server connected through a network. The terminal may comprise a display module, a face video capture module, a voice collection module, and a first communication module; the server may comprise a face recognition module, a voice recognition module, a verification module, a database, and a second communication module. The display, face video capture, and voice collection modules are each connected to the first communication module; the face recognition, voice recognition, and verification modules are each connected to the second communication module; the face recognition and voice recognition modules are each connected to the verification module; the database is connected to the face recognition, voice recognition, and verification modules; and the first and second communication modules are connected through the network.
The terminal acquires facial video of the user under test, collects voice audio data input by the user, sends both to the server, and displays the prompt information returned by the server.
The face video capture module acquires facial video of the user under test and sends it to the face recognition module through the first and second communication modules; it may be a camera or similar imaging device.
The voice collection module collects the voice audio data input by the user and sends it to the voice recognition module through the first and second communication modules; it may be a pickup such as a microphone.
The display module displays the prompt information returned by the server, including face-recognition-failure information, incorrect-voice-password information, verification-failure information, voice password text, and verification-success information.
The first communication module handles information exchange between the terminal and the server.
The server matches the user's facial feature parameters and voiceprint feature vector and intersects the voiceprint recognition result with the face recognition result; if the intersection contains exactly one result, verification succeeds and verification-success information is returned to the terminal. Here, the matching preferably proceeds as follows: the server extracts the user's facial feature parameters from the received facial video and matches them against all facial feature parameters pre-stored on the server; a successful match yields the face recognition result. The server then sends a preset voice password text to the terminal; after receiving the voice audio data from the terminal's voice collection module, it converts the audio to text and matches that text against the previously sent voice password text. If they match, the server extracts the voiceprint feature vector from the audio and matches it against all pre-stored user voiceprint feature vectors; a successful match yields the voiceprint recognition result.
The server may comprise a face recognition module, a voice recognition module, a verification module, a database, and a second communication module.
The second communication module handles information exchange between the terminal and the server.
The face recognition module filters and denoises the received facial video, extracts key frames, derives the user's facial feature parameters from them, and matches the key feature parameters against all user facial feature parameters stored in the database; on success the matching result is sent to the verification module as the face recognition result, and on failure face-recognition-failure information is returned to the terminal. A preset image-similarity value may be set in the face recognition module: when the key feature parameters are matched against the stored facial feature parameters, the match is judged successful if each matched user's facial-feature similarity threshold is smaller than the preset value, and judged failed otherwise. The successful matching result may include user information, which includes the user's age.
The voice recognition module, upon receiving a voice recognition request from the verification module, sends the preset voice password text to the terminal so it is displayed through the display module; after receiving the voice audio data from the terminal's voice collection module, it converts the audio to text and matches it against the previously sent voice password text. On failure, recognition fails and incorrect-voice-password information is returned to the terminal; on success the module extracts the voiceprint feature vector and matches it against all user voiceprint feature vectors stored in the database. On failure, recognition fails and voice-recognition-failure information is returned; on success the result is sent to the verification module as the voiceprint recognition result. The preset voice password text may be a passage of easy-to-read text, a string of digits, a passage of news text, or the registration-time voice password text corresponding to the user information. Before sending the text, the voice recognition module may inspect the voice recognition request: if it asks for the registration-time text, that text is selected; if it carries the user's age, easy-to-read text or digits are selected for elderly or underage users, and news text otherwise. In addition, after sending the text, the module may start a timer and check whether the terminal's voice audio data arrives within a preset time (for example, 10 seconds); if the timer expires without data, it replaces the password text, resends the replacement, restarts the timer, and returns to the waiting step.
The verification module sends a voice recognition request to the voice recognition module after receiving the face recognition module's successful result and, after receiving the voice recognition module's successful result, intersects the two. If the intersection is empty, this user verification fails and verification-failure information is returned to the terminal; if it contains exactly one result, verification succeeds and verification-success information is returned; if it contains more than one result, the voiceprint feature is considered indistinct and the voice recognition request is resent, and if a preset number of requests has already been sent, this user verification fails and verification-failure information is returned. The request may carry the user's age or a request for the registration-time voice password text; in particular, when the request being sent is the preset-number-th one (for example, the third when the preset number is 3), it includes the request for the registration-time voice password text.
使用时,其处理方法如下:
步骤1、用户采用终端向服务器进行用户注册,服务器在数据库中存储用户信息、该用户面部特征参数及该用户声纹特征向量。
本步骤中,用户信息优选包括用户年龄信息,本步骤具体可包括以下步骤:
步骤101、用户向终端输入用户信息,并通过终端采集人脸视频或多张人脸图像,终端将用户信息及人脸视频或多张人脸图像上传至服务器。
步骤102、服务器从人脸视频中截取多张人脸图像或将接收到的多张图像作为人脸样本,得到该用户面部特征参数,并进行人脸建模,并将其与用户信息关联后存储于数据库中,并随机生成声音口令文本发送给终端。
这里,随机生成声音口令文本发送给终端中,可随机生成至少一段声音口令文本,并按顺序发送给终端,例如随机生成三段声音口令文本,随机对其排序后按顺序发送给终端。其中,随机生成多少段声音口令文本根据业务认证的安全度来确定,一般来说,安全度需求越高的业务认证,在注册时,随机生成的声音口令文本的数量越多。
Step 103: the terminal displays the passphrase text, captures the user's voice audio data, and uploads the captured audio to the server.
Here, when the terminal receives multiple passphrase passages in sequence, it displays them in that order, showing the next passage only after three recordings of the current one have been captured, and uploads all recordings to the server once every passage has its three recordings. For example, on receiving two passages in sequence, the terminal first displays the first passage and captures three recordings of the user reading it, then displays the second passage and captures three further recordings, and finally sends all six recordings, three per passage, to the server together.
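The enrollment capture order above can be sketched as follows. `record_once` is a hypothetical capture callback; the disclosure only fixes the ordering (one passphrase at a time, three recordings each, uploaded together at the end).

```python
def collect_enrollment_audio(passphrases, record_once, repeats=3):
    """Show one passphrase, record it `repeats` times, then move on.

    passphrases: the passage texts, in the order received from the server.
    record_once(passphrase, i): hypothetical callback returning one
    recording of the user reading `passphrase` (attempt index i).
    Returns all (passphrase, recording) pairs for a single upload.
    """
    samples = []
    for pw in passphrases:           # next passage only after the current
        for i in range(repeats):     # one has all of its recordings
            samples.append((pw, record_once(pw, i)))
    return samples
```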
Step 104: the server extracts voiceprint feature vectors from the voice audio data and stores the extracted voiceprint feature vectors, the voice audio data, and the corresponding passphrase text in the database in association with the user information.
Here, if the server receives multiple recordings, it waits until all of them have arrived, extracts a voiceprint feature vector from each, and for every passphrase passage selects the one recording whose voiceprint features are most distinct; it then stores that passphrase text, the selected recording, and its voiceprint feature vector in the database in association with the user information. That is, each passphrase passage ends up associated with a single recording, and the other two recordings can be deleted.
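One way to pick the "most distinct" of the three recordings is sketched below. The disclosure leaves the selection criterion open; choosing the embedding closest to the centroid of the three (i.e., the most representative one) is an assumption made for illustration.

```python
def pick_best_sample(embeddings):
    """Return the index of the recording to keep for one passphrase.

    embeddings: list of equal-length voiceprint feature vectors, one per
    recording. The vector closest to the centroid of all of them is used
    here as a proxy for the clearest sample (an illustrative criterion).
    """
    n, dim = len(embeddings), len(embeddings[0])
    centroid = [sum(v[i] for v in embeddings) / n for i in range(dim)]

    def dist2(v):  # squared Euclidean distance to the centroid
        return sum((a - b) ** 2 for a, b in zip(v, centroid))

    return min(range(n), key=lambda k: dist2(embeddings[k]))
```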
Step 2: at authentication time, the terminal captures a facial video of the user under test and sends it to the server.
Step 3: the server filters and denoises the received facial video of the user under test, extracts key frames, derives the user's facial feature parameters from the key frames, and matches selected key feature parameters against all user facial feature parameters stored in the database; on success it obtains the face recognition result and proceeds to step 5, and on failure it proceeds to step 4.
In this step the server may be configured with a preset image similarity value: when matching the selected key feature parameters against the stored user facial feature parameters, the match is deemed successful if the similarity measure of each matched facial feature parameter is below the preset image similarity value, and deemed failed otherwise. Here the face recognition result preferably includes the user information, which, as seen in step 1, preferably includes the user's age.
Step 4: the server returns a face-recognition-failure message; the terminal displays the failure, prompts the user, and the flow returns to step 2.
Step 5: the server generates a preset spoken-passphrase text and sends it to the terminal.
In this step the preset passphrase text may be a randomly generated passage of easy-to-read text, a randomly generated string of digits, a randomly generated passage of news-style text, or the registration-time passphrase text associated with the user information.
Here, when generating and sending the preset passphrase text, if the user information in the face recognition result (judged from the user's age) indicates an elderly or underage user, the server selects a passage of easy-to-read text or a string of digits, the goal being to ensure that the user can both read and speak the passphrase aloud; otherwise, that is, when the user information indicates an adult, who can generally read and speak any passphrase aloud, the server selects a news-style passage to increase recognition accuracy.
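The age-based selection above can be sketched as follows. The age thresholds are assumptions (the disclosure does not fix the boundaries for "elderly" or "underage"); only the rule itself comes from the text.

```python
def choose_passphrase_kind(age, minor_below=18, senior_from=60):
    """Pick the passphrase category from the age in the face result.

    Minors and the elderly get short, easy-to-read text or digits, so the
    passphrase is sure to be readable aloud; adults get a longer
    news-style passage, which raises voiceprint accuracy. The boundary
    ages (18, 60) are illustrative assumptions.
    """
    if age < minor_below or age >= senior_from:
        return "easy_or_digits"
    return "news"
```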
Step 6: the terminal displays the passphrase text, captures voice audio data input by the user, and uploads it to the server.
Step 7: the server converts the received voice audio data to text and matches that text against the passphrase text previously sent; if the match fails, recognition is deemed failed, an incorrect-passphrase message is returned to the terminal, and the flow proceeds to step 8; if it succeeds, the flow proceeds to step 9.
Step 8: the terminal displays the incorrect-passphrase message and the flow returns to step 2.
Step 9: the server extracts the voiceprint feature vector from the audio and matches it against all user voiceprint feature vectors stored in the database; if the match fails, recognition is deemed failed, a voice-recognition-failure message is returned to the terminal, and the flow proceeds to step 10; if it succeeds, the voiceprint recognition result is obtained and the flow proceeds to step 11.
In this step, on a failed match the server may additionally check whether it has already generated the preset number minus one passphrase texts (for example, with a preset number of 3, whether 2 passphrase texts have been generated); if so, recognition is deemed failed, a voice-recognition-failure message is returned to the terminal, and the flow proceeds to step 10; otherwise the server regenerates a preset passphrase text, sends it to the terminal, and returns to step 6. The regenerated passphrase text is a randomly generated passage of easy-to-read text, a randomly generated string of digits, or a randomly generated news-style passage, longer than the previously generated preset passphrase text; its generation thus mirrors the method of step 5.
In this step the server may also be configured with a preset voiceprint similarity value: when the server matches the voiceprint feature vector extracted from the voice audio data against all user voiceprint feature vectors stored in the database, the match is deemed successful if the similarity measure of each matched user voiceprint feature vector is below the preset voiceprint similarity value, and deemed failed otherwise.
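Threshold-based voiceprint matching against all enrolled vectors can be sketched as follows. The text deems a match successful when the (dis)similarity measure falls below the preset value, so a distance metric is used here; cosine distance and the preset value 0.35 are assumptions for illustration.

```python
import math

def voiceprint_matches(probe, enrolled, preset=0.35):
    """Match a probe voiceprint vector against all enrolled vectors.

    probe: voiceprint feature vector from the captured audio.
    enrolled: dict mapping user ID -> stored voiceprint vector.
    A candidate matches when its cosine distance to the probe is below
    the preset value (metric and threshold are illustrative).
    Returns the set of matching user IDs.
    """
    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / (nu * nv)

    return {uid for uid, vec in enrolled.items() if cos_dist(probe, vec) < preset}
```

Returning a set of candidates, rather than a single identity, is what makes the later intersection step meaningful.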
Step 10: the terminal displays the voice-recognition-failure message and the flow returns to step 2.
Step 11: the server takes the intersection of the face recognition result set and the voiceprint recognition result set. If the intersection is empty, this user verification is deemed failed, a verification-failure message is returned to the terminal, and the flow proceeds to step 12. If the intersection contains exactly one result, verification is deemed successful and a verification-success message is returned to the terminal. If the intersection contains more than one result, the voiceprint features are deemed insufficiently distinctive; the server then checks whether the preset number of passphrase texts has already been sent in this authentication, and if so, this user verification is deemed failed, a verification-failure message is returned to the terminal, and the flow proceeds to step 12; otherwise the server regenerates a preset passphrase text, sends it to the terminal, and returns to step 6.
In this step, the regenerated preset passphrase text is one of the registration-time passphrase texts associated with the user information, that is, one of the texts randomly generated in step 102 of this example; when there is only one such text, that text is selected directly. If, instead of generating random passphrase texts as in step 102, the user's voice audio had been captured directly and the voiceprint feature vector derived from that audio, the passphrase text corresponding to that audio (obtainable by converting the audio to text) may be selected here.
Step 12: the terminal displays the verification-failure message and the flow returns to step 2.
In this example, after the server generates and sends a preset passphrase text to the terminal, it also starts a timer. This applies whether the server is generating and sending a preset passphrase text for the first time in this authentication or regenerating one; that is, the timer starts whenever the server generates and sends a preset passphrase text.
Between step 5 and step 7, the following steps may then be included:
Step A: the server checks whether voice audio data arrives from the terminal within the preset time; if the timer reaches the preset time without receiving voice audio data from the terminal, the flow proceeds to step B, otherwise it proceeds to step 7;
Step B: the server replaces the preset passphrase text, resends the replacement to the terminal, restarts the timer, and returns to step A; the replacement preset passphrase text is a freshly randomly generated passage of easy-to-read text, a randomly generated string of digits, or a randomly generated news-style passage.
In this example, in step 9, after a failed match and the return of the voice-recognition-failure message, the server may additionally proceed to step 13 while the terminal still proceeds to step 10.
In step 11, after verification is deemed successful and the verification-success message is returned, the server may additionally proceed to step 13; likewise, after this user verification is deemed failed and the verification-failure message is returned, the server may additionally proceed to step 13, while the terminal still proceeds to step 12.
Step 13 may then be: the server uses the facial images received in this authentication to optimize the face model associated with the user information in the face recognition result. The rationale is that a successful face recognition shows the facial images, or the captured facial video, used for recognition to be correct; this correct image information can therefore be used to refine the face model and improve face recognition accuracy, and invalid facial feature parameters can be deleted to improve computational efficiency.
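One simple form such a post-authentication template refresh could take is sketched below. The disclosure does not specify the update rule; folding the freshly verified sample into the stored feature vector with an exponential moving average is an assumption for illustration.

```python
def refresh_template(old_vec, new_vec, alpha=0.1):
    """Fold a freshly verified sample into the stored feature template.

    After a successful authentication the captured sample is trusted, so
    the stored vector drifts toward it. alpha controls how strongly the
    new sample is weighted (the value 0.1 is an illustrative choice).
    """
    return [(1 - alpha) * o + alpha * n for o, n in zip(old_vec, new_vec)]
```

The same update could be applied to the voiceprint template after the voice audio of a successful authentication, as described for step 11.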
Similarly, in step 11, after verification is deemed successful and the verification-success message is returned, the server may also use the voice audio data received in this authentication to optimize the voiceprint feature data associated with the user information in the face recognition result.
In this example, as the processing above shows, face recognition preferably runs first and voiceprint recognition second, for the following reasons. First, after several decades of development, face recognition is a relatively mature technology with efficient algorithms and fast processing. Voiceprint recognition, moreover, differs from other physiological biometrics: the features it uses must be "individualized", yet the features to be recognized for a given speaker (the user undergoing voiceprint recognition) must be "common" across that speaker's utterances. Although most current voiceprint recognition systems use acoustic-level features, the features that characterize a person span multiple levels, including: 1) acoustic features tied to the anatomy of the human vocal apparatus (spectrum, cepstrum, formants, pitch, reflection coefficients, etc.), as well as nasality, breathiness, hoarseness, laughter, and the like; 2) semantics, rhetoric, pronunciation, and speech habits shaped by socioeconomic status, education level, and birthplace; 3) prosody, rhythm, speed, intonation, and volume shaped by personal traits or parental influence. From the standpoint of what can be modeled mathematically, features currently usable by automatic voiceprint recognition models include: 1) acoustic features (cepstrum); 2) lexical features (speaker-dependent word n-grams and phoneme n-grams); 3) prosodic features (pitch and energy "gestures" described with n-grams); 4) language, dialect, and accent information; 5) channel information (which channel is used); and so on. In the scheme of the present invention, therefore, the preset passphrase text can be generated at random on the basis of the user information. Since the specific face recognition and voiceprint recognition techniques mentioned herein are existing, relatively mature technologies, they are not described in further detail.

Claims (22)

  1. An interactive authentication system based on face recognition and voiceprint recognition, comprising a terminal and a server connected via a network, characterized in that
    the terminal is configured to capture a facial video of the user under test and voice audio data input by the user, send them to the server, and display the prompt information sent by the server;
    the server is configured to match user facial feature parameters and match user voiceprint feature vectors, and to take the intersection of the voiceprint recognition result set and the face recognition result set; if the intersection contains exactly one result, verification is deemed successful and a verification-success message is returned to the terminal.
  2. The interactive authentication system based on face recognition and voiceprint recognition of claim 1, characterized in that
    said matching of user facial feature parameters and user voiceprint feature vectors means: the server derives the user's facial feature parameters from the received facial video of the user under test and matches them against all user facial feature parameters stored in advance on the server, a successful match yielding the face recognition result; it then sends a preset spoken-passphrase text to the terminal, and after receiving the voice audio data sent by the terminal's voice capture module, converts it to text and matches that text against the passphrase text previously sent; on a successful match it extracts the voiceprint feature vector from the audio and matches it against all user voiceprint feature vectors stored in advance on the server, a successful match yielding the voiceprint recognition result.
  3. The interactive authentication system based on face recognition and voiceprint recognition of claim 2, characterized in that the terminal comprises a display module, a facial-video capture module, a voice capture module, and a first communication module, and the server comprises a face recognition module, a speech recognition module, a verification module, a database, and a second communication module; the display module, facial-video capture module, and voice capture module are each connected to the first communication module; the face recognition module, speech recognition module, and verification module are each connected to the second communication module; the face recognition module and speech recognition module are each connected to the verification module; the database module is connected to the face recognition module, the speech recognition module, and the verification module; and the first communication module and the second communication module are connected via a network,
    the facial-video capture module being configured to capture a facial video of the user under test and send it to the face recognition module via the first and second communication modules;
    the voice capture module being configured to capture voice audio data input by the user and send it to the speech recognition module via the first and second communication modules;
    the display module being configured to display prompt information sent by the server, including face-recognition-failure messages, incorrect-passphrase messages, verification-failure messages, passphrase texts, and verification-success messages;
    the first and second communication modules being configured for the exchange of information between the terminal and the server;
    the face recognition module being configured to filter and denoise the received facial video of the user under test, extract key frames, derive the user's facial feature parameters from the key frames, and match selected key feature parameters against all user facial feature parameters stored in the database, on success sending the successful match result, which constitutes the face recognition result, to the verification module, and on failure returning a face-recognition-failure message to the terminal;
    the speech recognition module being configured, upon receiving a speech recognition request from the verification module, to send a preset spoken-passphrase text to the terminal for display via the display module, and upon receiving the voice audio data sent by the terminal's voice capture module, to convert it to text and match that text against the passphrase text previously sent, on failure deeming recognition failed and returning an incorrect-passphrase message to the terminal, and on success extracting the voiceprint feature vector from the audio and matching it against all user voiceprint feature vectors stored in the database, on failure deeming recognition failed and returning a voice-recognition-failure message to the terminal, and on success sending the successful match result, which constitutes the voiceprint recognition result, to the verification module;
    the verification module being configured, upon receiving the successful match result from the face recognition module, to send a speech recognition request to the speech recognition module, and upon receiving the successful match result from the speech recognition module, to take the intersection of that result set and the face recognition module's result set: if the intersection is empty, this user verification is deemed failed and a verification-failure message is returned to the terminal; if the intersection contains exactly one result, verification is deemed successful and a verification-success message is returned to the terminal; if the intersection contains more than one result, the voiceprint features are deemed insufficiently distinctive and a new speech recognition request is sent to the speech recognition module, and if a preset number of speech recognition requests has already been sent, this user verification is deemed failed and a verification-failure message is returned to the terminal.
  4. The interactive authentication system based on face recognition and voiceprint recognition of claim 3, characterized in that the face recognition module is configured with a preset image similarity value, and when the selected key feature parameters among the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is deemed successful if the similarity measure of each matched user facial feature parameter is below the preset image similarity value, and deemed failed otherwise.
  5. The interactive authentication system based on face recognition and voiceprint recognition of claim 3, characterized in that the successful match result of the face recognition module includes user information, and said user information includes the user's age.
  6. The interactive authentication system based on face recognition and voiceprint recognition of claim 5, characterized in that the speech recognition request sent by the verification module to the speech recognition module carries the user's age or a request to send the registration-time passphrase text.
  7. The interactive authentication system based on face recognition and voiceprint recognition of claim 6, characterized in that, in the speech recognition requests sent by the verification module to the speech recognition module, if the current request is the preset-number-th speech recognition request sent to the speech recognition module, that request includes a request to send the registration-time passphrase text.
  8. The interactive authentication system based on face recognition and voiceprint recognition of claim 6, characterized in that, in the speech recognition module, the preset passphrase text is a passage of easy-to-read text, a string of digits, a passage of news-style text, or the registration-time passphrase text associated with the user information.
  9. The interactive authentication system based on face recognition and voiceprint recognition of claim 8, characterized in that the speech recognition module, before sending the preset passphrase text to the terminal, further inspects the speech recognition request: if the request asks for the registration-time passphrase text, the module selects the registration-time passphrase text associated with the user information; if the request carries the user's age, the module judges the user's age from it, selecting a passage of easy-to-read text or a string of digits if the user is elderly or underage, and a passage of news-style text otherwise.
  10. The interactive authentication system based on face recognition and voiceprint recognition of any one of claims 3-9, characterized in that the speech recognition module, after sending the preset passphrase text to the terminal, further starts a timer and checks whether voice audio data arrives from the terminal within a preset time; if the timer reaches the preset time without receiving voice audio data from the terminal, the module replaces the preset passphrase text, resends the replacement to the terminal, restarts the timer, and returns to the step of checking whether voice audio data arrives within the preset time.
  11. An interactive authentication method based on face recognition and voiceprint recognition, applied to the interactive authentication system based on face recognition and voiceprint recognition of any one of claims 1-10, characterized by comprising the following steps:
    step 1: the user registers with the server via the terminal, and the server stores the user information, the user's facial feature parameters, and the user's voiceprint feature vectors in the database;
    step 2: at authentication time, the terminal captures a facial video of the user under test and sends it to the server;
    step 3: the server filters and denoises the received facial video of the user under test, extracts key frames, derives the user's facial feature parameters from the key frames, and matches selected key feature parameters against all user facial feature parameters stored in the database; on success it obtains the face recognition result and proceeds to step 5, and on failure it proceeds to step 4;
    step 4: the server returns a face-recognition-failure message, the terminal displays the failure and prompts the user, and the flow returns to step 2;
    step 5: the server generates a preset spoken-passphrase text and sends it to the terminal;
    step 6: the terminal displays the passphrase text, captures voice audio data input by the user, and uploads it to the server;
    step 7: the server converts the received voice audio data to text and matches that text against the passphrase text previously sent; on failure recognition is deemed failed, an incorrect-passphrase message is returned to the terminal, and the flow proceeds to step 8; on success the flow proceeds to step 9;
    step 8: the terminal displays the incorrect-passphrase message and the flow returns to step 2;
    step 9: the server extracts the voiceprint feature vector from the audio and matches it against all user voiceprint feature vectors stored in the database; on failure recognition is deemed failed, a voice-recognition-failure message is returned to the terminal, and the flow proceeds to step 10; on success the voiceprint recognition result is obtained and the flow proceeds to step 11;
    step 10: the terminal displays the voice-recognition-failure message and the flow returns to step 2;
    step 11: the server takes the intersection of the face recognition result set and the voiceprint recognition result set; if the intersection is empty, this user verification is deemed failed, a verification-failure message is returned to the terminal, and the flow proceeds to step 12; if the intersection contains exactly one result, verification is deemed successful and a verification-success message is returned to the terminal; if the intersection contains more than one result, the voiceprint features are deemed insufficiently distinctive, and the server checks whether the preset number of passphrase texts has already been sent in this authentication; if so, this user verification is deemed failed, a verification-failure message is returned to the terminal, and the flow proceeds to step 12, otherwise the server regenerates a preset passphrase text, sends it to the terminal, and returns to step 6;
    step 12: the terminal displays the verification-failure message and the flow returns to step 2.
  12. The interactive authentication method based on face recognition and voiceprint recognition of claim 11, characterized in that step 1 comprises the following steps:
    step 101: the user enters user information into the terminal and captures a facial video or multiple facial images through the terminal, and the terminal uploads the user information and the facial video or facial images to the server;
    step 102: the server extracts multiple facial images from the facial video, or uses the received images, as face samples, derives the user's facial feature parameters, performs face modeling, stores these in the database in association with the user information, and randomly generates passphrase text which it sends to the terminal;
    step 103: the terminal displays the passphrase text, captures the user's voice audio data, and uploads the captured audio to the server;
    step 104: the server extracts voiceprint feature vectors from the voice audio data and stores the extracted voiceprint feature vectors, the voice audio data, and the corresponding passphrase text in the database in association with the user information.
  13. The interactive authentication method based on face recognition and voiceprint recognition of claim 12, characterized in that, in step 102, in said randomly generating passphrase text and sending it to the terminal, at least one passage of passphrase text is randomly generated and sent to the terminal in sequence;
    in step 103, in said displaying the passphrase text, capturing the user's voice audio data, and uploading the captured audio to the server, the terminal displays the passphrase texts in sequence, showing the next passage only after three recordings of the current one have been captured, and sends all recordings, three per passphrase text, to the server once obtained.
  14. The interactive authentication method based on face recognition and voiceprint recognition of claim 13, characterized in that, in step 104, after receiving all the voice audio data the server extracts a voiceprint feature vector from each recording, selects for every passphrase text the one recording whose voiceprint features are most distinct, and stores the passphrase text, the selected recording, and its voiceprint feature vector in the database in association with the user information.
  15. The interactive authentication method based on face recognition and voiceprint recognition of claim 14, characterized in that, in step 11, in said regenerating a preset passphrase text and sending it to the terminal, the regenerated preset passphrase text is one of the registration-time passphrase texts associated with the user information.
  16. The interactive authentication method based on face recognition and voiceprint recognition of claim 11, characterized in that, in step 3, the server is configured with a preset image similarity value, and when the selected key feature parameters among the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is deemed successful if the similarity measure of each matched user facial feature parameter is below the preset image similarity value, and deemed failed otherwise.
  17. The interactive authentication method based on face recognition and voiceprint recognition of claim 11, characterized in that, in step 5, the preset passphrase text is a randomly generated passage of easy-to-read text, a randomly generated string of digits, a randomly generated passage of news-style text, or the registration-time passphrase text associated with the user information.
  18. The interactive authentication method based on face recognition and voiceprint recognition of claim 17, characterized in that, in step 1, said user information includes the user's age;
    in step 3, said face recognition result includes the user information;
    in step 5, when the server generates and sends the preset passphrase text to the terminal, if the user information in the face recognition result indicates an elderly or underage user, the selected preset passphrase text is a passage of easy-to-read text or a string of digits; otherwise the selected preset passphrase text is a passage of news-style text.
  19. The interactive authentication method based on face recognition and voiceprint recognition of claim 11, characterized in that, in step 9, on a failed match the server further checks whether the preset number minus one passphrase texts have already been generated; if so, recognition is deemed failed, a voice-recognition-failure message is returned to the terminal, and the flow proceeds to step 10; otherwise the server regenerates a preset passphrase text, sends it to the terminal, and returns to step 6, the regenerated preset passphrase text being a randomly generated passage of easy-to-read text, a randomly generated string of digits, or a randomly generated passage of news-style text, longer than the previously generated preset passphrase text.
  20. The interactive authentication method based on face recognition and voiceprint recognition of claim 11, characterized in that, in step 9, the server is configured with a preset voiceprint similarity value, and when the server matches the voiceprint feature vector extracted from the voice audio data against all user voiceprint feature vectors stored in the database, the match is deemed successful if the similarity measure of each matched user voiceprint feature vector is below the preset voiceprint similarity value, and deemed failed otherwise.
  21. The interactive authentication method based on face recognition and voiceprint recognition of any one of claims 11-20, characterized in that, in step 5, after the server generates and sends the preset passphrase text to the terminal, it further starts a timer;
    and/or, in step 9, after the server regenerates and sends a preset passphrase text to the terminal, it further starts a timer;
    and/or, in step 11, after the server regenerates and sends a preset passphrase text to the terminal, it further starts a timer;
    and between step 5 and step 7 the method further comprises the following steps:
    step A: the server checks whether voice audio data arrives from the terminal within the preset time; if the timer reaches the preset time without receiving voice audio data from the terminal, the flow proceeds to step B, otherwise it proceeds to step 7;
    step B: the server replaces the preset passphrase text, resends the replacement to the terminal, restarts the timer, and returns to step A; said replacement preset passphrase text is a freshly randomly generated passage of easy-to-read text, a randomly generated string of digits, or a randomly generated passage of news-style text.
  22. The interactive authentication method based on face recognition and voiceprint recognition of any one of claims 11-20, characterized in that, in step 9, after a failed match and the return of the voice-recognition-failure message to the terminal, the server further proceeds to step 13;
    in step 11, after verification is deemed successful and the verification-success message is returned to the terminal, the server further proceeds to step 13, and after this user verification is deemed failed and the verification-failure message is returned to the terminal, the server further proceeds to step 13;
    step 13: the server uses the facial images received in this authentication to optimize the face model associated with the user information in the face recognition result.
PCT/CN2017/114928 2016-12-20 2017-12-07 基于人脸识别和声纹识别的交互式认证系统及方法 Ceased WO2018113526A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611181543.3A CN106790054A (zh) 2016-12-20 2016-12-20 基于人脸识别和声纹识别的交互式认证系统及方法
CN201611181543.3 2016-12-20

Publications (1)

Publication Number Publication Date
WO2018113526A1 true WO2018113526A1 (zh) 2018-06-28

Family

ID=58890935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/114928 Ceased WO2018113526A1 (zh) 2016-12-20 2017-12-07 基于人脸识别和声纹识别的交互式认证系统及方法

Country Status (2)

Country Link
CN (1) CN106790054A (zh)
WO (1) WO2018113526A1 (zh)


Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106790054A (zh) * 2016-12-20 2017-05-31 四川长虹电器股份有限公司 基于人脸识别和声纹识别的交互式认证系统及方法
CN106878344A (zh) * 2017-04-25 2017-06-20 北京洋浦伟业科技发展有限公司 一种生物特征认证、注册方法及装置
CN109147770B (zh) * 2017-06-16 2023-07-28 阿里巴巴集团控股有限公司 声音识别特征的优化、动态注册方法、客户端和服务器
CN107358699B (zh) * 2017-07-17 2020-04-24 深圳市斑点猫信息技术有限公司 一种安全验证方法及系统
CN107481449A (zh) * 2017-08-25 2017-12-15 南京真格邦软件有限公司 一种基于人脸识别和语音识别的vtm机
CN107564541B (zh) * 2017-09-04 2018-11-02 南方医科大学南方医院 一种便携式婴儿啼哭声识别器及其识别方法
CN107832720B (zh) * 2017-11-16 2022-07-08 北京百度网讯科技有限公司 基于人工智能的信息处理方法和装置
CN108154884A (zh) * 2017-12-07 2018-06-12 浙江海洋大学 一种防替考的身份识别系统
CN108074310B (zh) * 2017-12-21 2021-06-11 广东汇泰龙科技股份有限公司 基于语音识别模块的语音交互方法及智能锁管理系统
CN108171137B (zh) * 2017-12-22 2021-12-28 深圳市泛海三江科技发展有限公司 一种人脸识别方法及系统
CN110022454B (zh) 2018-01-10 2021-02-23 华为技术有限公司 一种在视频会议中识别身份的方法及相关设备
CN108600627A (zh) * 2018-04-25 2018-09-28 东莞职业技术学院 一种智慧校园视频处理系统
CN108734114A (zh) * 2018-05-02 2018-11-02 浙江工业大学 一种结合面部和声纹的宠物识别方法
CN110555918B (zh) * 2018-06-01 2022-04-26 杭州海康威视数字技术股份有限公司 考勤管理的方法和考勤管理设备
CN110634472B (zh) * 2018-06-21 2024-06-04 中兴通讯股份有限公司 一种语音识别方法、服务器及计算机可读存储介质
CN110647729A (zh) * 2018-06-27 2020-01-03 深圳联友科技有限公司 一种登录验证方法及系统
CN110875905A (zh) * 2018-08-31 2020-03-10 百度在线网络技术(北京)有限公司 账号管理方法、装置及存储介质
CN109450850B (zh) * 2018-09-26 2022-10-11 深圳壹账通智能科技有限公司 身份验证方法、装置、计算机设备和存储介质
CN108965341A (zh) * 2018-09-28 2018-12-07 北京芯盾时代科技有限公司 登录认证的方法、装置及系统
CN109542216B (zh) * 2018-10-11 2022-11-22 平安科技(深圳)有限公司 人机交互方法、系统、计算机设备及存储介质
CN111083278A (zh) * 2018-10-21 2020-04-28 内蒙古龙腾睿昊智能有限公司 基于智能手机监测呼吸、步伐及定位人员信息的采集识别
CN109560941A (zh) * 2018-12-12 2019-04-02 深圳市沃特沃德股份有限公司 会议记录方法、装置、智能终端及存储介质
CN109767335A (zh) * 2018-12-15 2019-05-17 深圳壹账通智能科技有限公司 双录质检方法、装置、计算机设备及存储介质
CN109815806B (zh) * 2018-12-19 2024-06-28 平安科技(深圳)有限公司 人脸识别方法及装置、计算机设备、计算机存储介质
CN109769099B (zh) * 2019-01-15 2021-01-22 三星电子(中国)研发中心 通话人物异常的检测方法和装置
CN113947376B (zh) * 2019-01-16 2024-06-18 北京影谱科技股份有限公司 基于多重生物特征的c/s打卡方法和装置
CN109658579A (zh) * 2019-02-28 2019-04-19 中新智擎科技有限公司 一种门禁控制方法、系统、设备及存储介质
CN110210935B (zh) * 2019-05-22 2022-05-17 未来(北京)黑科技有限公司 安全认证方法及装置、存储介质、电子装置
CN110472485A (zh) * 2019-07-03 2019-11-19 华为技术有限公司 识别身份的方法和装置
CN110349583A (zh) * 2019-07-15 2019-10-18 高磊 一种基于语音识别的游戏教育方法及系统
CN110599325A (zh) * 2019-08-27 2019-12-20 杭州深景数据技术有限公司 一种告知书读取的方法、装置、设备及存储介质
CN112446395B (zh) 2019-08-29 2023-07-25 杭州海康威视数字技术股份有限公司 网络摄像机、视频监控系统及方法
CN111124109B (zh) * 2019-11-25 2023-05-05 北京明略软件系统有限公司 一种交互方式的选择方法、智能终端、设备及存储介质
CN110963382B (zh) * 2019-12-31 2022-03-15 界首市迅立达电梯有限公司 一种基于语音助手的电梯选层控制系统及方法
CN111401218B (zh) * 2020-03-12 2023-05-26 上海虹点智能科技有限公司 一种智慧城市监控方法及系统
CN111417018A (zh) * 2020-04-29 2020-07-14 苏州思必驰信息科技有限公司 用于智能视频播放设备的智能遥控注册和使用方法及装置
WO2021257000A1 (en) * 2020-06-19 2021-12-23 National University Of Singapore Cross-modal speaker verification
CN111882739B (zh) * 2020-07-21 2022-05-17 中国工商银行股份有限公司 门禁验证方法、门禁装置、服务器及系统
CN112016452A (zh) * 2020-08-27 2020-12-01 四川卫宁软件有限公司 一种医疗行为分析方法及其分析系统、计算机终端
CN112214298B (zh) * 2020-09-30 2023-09-22 国网江苏省电力有限公司信息通信分公司 基于声纹识别的动态优先级调度方法及系统
CN112491844A (zh) * 2020-11-18 2021-03-12 西北大学 一种基于可信执行环境的声纹及面部识别验证系统及方法
CN112466057B (zh) * 2020-12-01 2022-07-29 上海旷日网络科技有限公司 基于人脸识别和语音识别的交互式认证取件系统
CN112863513A (zh) * 2021-01-21 2021-05-28 中国南方电网有限责任公司超高压输电公司柳州局 一种通过面部语音识别结合身份验证下达控制指令的方法
CN113160826B (zh) * 2021-03-01 2022-09-02 特斯联科技集团有限公司 一种基于人脸识别的家庭成员通联方法和系统
CN113329013A (zh) * 2021-05-28 2021-08-31 南京国网电瑞系统工程有限公司 基于数字证书的电力调度数据网安全加密方法及系统
CN113271587B (zh) * 2021-06-11 2023-12-26 北京白龙马云行科技有限公司 一种用于车辆的物联网可信认证系统
CN113658357A (zh) * 2021-08-11 2021-11-16 四川长虹电器股份有限公司 基于声音和图像识别的远程控制智能门锁的方法
CN113806703B (zh) * 2021-08-26 2025-02-18 江苏苏商银行股份有限公司 一种多方位身份识别认证系统与方法
CN114710328A (zh) * 2022-03-18 2022-07-05 中国建设银行股份有限公司 一种身份识别处理方法和装置
CN115830683A (zh) * 2022-12-08 2023-03-21 石家庄职业技术学院 一种基于人工智能的互联网大数据处理系统及方法
CN115981184A (zh) * 2023-03-20 2023-04-18 太原重工股份有限公司 基于人脸和语音双重认证的远程急停控制系统及方法
CN116416726B (zh) * 2023-04-10 2024-06-25 深圳智慧空间信息技术有限公司 基于多重特征验证的高安全性门禁识别方法和系统

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708867A (zh) * 2012-05-30 2012-10-03 北京正鹰科技有限责任公司 一种基于声纹和语音的防录音假冒身份识别方法及系统
CN103634118A (zh) * 2013-12-12 2014-03-12 山东神思电子技术股份有限公司 基于证卡和复合生物特征识别的生存认证方法
CN103841108A (zh) * 2014-03-12 2014-06-04 北京天诚盛业科技有限公司 用户生物特征的认证方法和系统
KR20140093459A (ko) * 2013-01-18 2014-07-28 한국전자통신연구원 자동 통역 방법
WO2014117583A1 (en) * 2013-01-29 2014-08-07 Tencent Technology (Shenzhen) Company Limited User authentication method and apparatus based on audio and video data
CN104834849A (zh) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 基于声纹识别和人脸识别的双因素身份认证方法及系统
CN105426723A (zh) * 2015-11-20 2016-03-23 北京得意音通技术有限责任公司 基于声纹识别、人脸识别以及同步活体检测的身份认证方法及系统
CN106790054A (zh) * 2016-12-20 2017-05-31 四川长虹电器股份有限公司 基于人脸识别和声纹识别的交互式认证系统及方法


Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751471A (zh) * 2018-07-06 2020-02-04 上海博泰悦臻网络技术服务有限公司 基于声纹识别的车内支付方法与云端服务器
CN108694767A (zh) * 2018-07-13 2018-10-23 北京工业职业技术学院 身份认证装置和智能门禁系统
CN108846676B (zh) * 2018-08-02 2023-07-11 平安科技(深圳)有限公司 生物特征辅助支付方法、装置、计算机设备及存储介质
CN108846676A (zh) * 2018-08-02 2018-11-20 平安科技(深圳)有限公司 生物特征辅助支付方法、装置、计算机设备及存储介质
WO2020029496A1 (zh) * 2018-08-10 2020-02-13 珠海格力电器股份有限公司 信息推送方法及装置
CN109543377A (zh) * 2018-10-17 2019-03-29 深圳壹账通智能科技有限公司 身份验证方法、装置、计算机设备和存储介质
CN111103966A (zh) * 2018-10-25 2020-05-05 安徽黑洞科技有限公司 一种智能展品统筹控制系统
CN109842805A (zh) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 视频看点的生成方法、装置、计算机设备及存储介质
CN109842805B (zh) * 2019-01-04 2022-10-21 平安科技(深圳)有限公司 视频看点的生成方法、装置、计算机设备及存储介质
CN110074519A (zh) * 2019-04-10 2019-08-02 南京启诺信息技术有限公司 一种语言识别手环
CN111803955A (zh) * 2019-04-12 2020-10-23 奇酷互联网络科技(深圳)有限公司 通过可穿戴设备管理账号的方法及系统、存储装置
CN110163630A (zh) * 2019-04-15 2019-08-23 中国平安人寿保险股份有限公司 产品监管方法、装置、计算机设备及存储介质
CN110163630B (zh) * 2019-04-15 2024-04-05 中国平安人寿保险股份有限公司 产品监管方法、装置、计算机设备及存储介质
CN110287363A (zh) * 2019-05-22 2019-09-27 深圳壹账通智能科技有限公司 基于深度学习的资源推送方法、装置、设备及存储介质
CN110309570B (zh) * 2019-06-21 2022-11-04 济南大学 一种具有认知能力的多模态仿真实验容器及方法
CN110309570A (zh) * 2019-06-21 2019-10-08 济南大学 一种具有认知能力的多模态仿真实验容器及方法
CN110427468A (zh) * 2019-07-10 2019-11-08 深圳市一恒科电子科技有限公司 一种基于儿童云服务的学习方法及学习机
CN110363278A (zh) * 2019-07-23 2019-10-22 广东小天才科技有限公司 一种亲子互动方法、机器人、服务器及亲子互动系统
CN110472394A (zh) * 2019-07-24 2019-11-19 天脉聚源(杭州)传媒科技有限公司 一种预留信息处理方法、系统、装置和存储介质
CN110442033A (zh) * 2019-07-30 2019-11-12 恒大智慧科技有限公司 家居设备的权限控制方法、装置、计算机设备及存储介质
CN110807630A (zh) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 基于人脸识别的支付方法、装置、计算机设备和存储介质
CN111128144A (zh) * 2019-10-16 2020-05-08 国网浙江省电力有限公司金华供电公司 一种语音电网调度系统及方法
CN111063358A (zh) * 2019-12-18 2020-04-24 浙江中辰城市应急服务管理有限公司 一种具有生命体识别功能的早期火灾预警和逃生指示系统
CN111079443A (zh) * 2019-12-26 2020-04-28 上海传英信息技术有限公司 一种视频通话人机交互方法及装置
CN111368737A (zh) * 2020-03-04 2020-07-03 开放智能机器(上海)有限公司 一种自动分析员工工作行为的系统及方法
CN111341464A (zh) * 2020-03-25 2020-06-26 北京金和网络股份有限公司 疫情信息采集与分析方法及系统
CN111767805A (zh) * 2020-06-10 2020-10-13 云知声智能科技股份有限公司 多模态数据自动清洗与标注方法与系统
CN114038087B (zh) * 2020-07-20 2024-03-15 阜阳万瑞斯电子锁业有限公司 一种用于电子锁语音识别的开锁系统及方法
CN114038087A (zh) * 2020-07-20 2022-02-11 阜阳万瑞斯电子锁业有限公司 一种用于电子锁语音识别的开锁系统及方法
CN112000939A (zh) * 2020-08-04 2020-11-27 叶兵 一种基于数字证书认证的律师远程法律服务系统及方法
CN112000939B (zh) * 2020-08-04 2023-10-27 叶兵 一种基于数字证书认证的律师远程法律服务系统及方法
CN112202912A (zh) * 2020-10-12 2021-01-08 安徽兴安电气设备股份有限公司 一种二次供水远程自动监控系统
CN112202912B (zh) * 2020-10-12 2022-08-09 安徽兴安电气设备股份有限公司 一种二次供水远程自动监控系统
CN112185363A (zh) * 2020-10-21 2021-01-05 北京猿力未来科技有限公司 音频处理方法及装置
CN112185363B (zh) * 2020-10-21 2024-02-13 北京猿力未来科技有限公司 音频处理方法及装置
CN112069484A (zh) * 2020-11-10 2020-12-11 中国科学院自动化研究所 基于多模态交互式的信息采集方法及系统
CN112235682A (zh) * 2020-11-17 2021-01-15 歌尔科技有限公司 耳机通话保密方法以及通话装置
CN112651610A (zh) * 2020-12-17 2021-04-13 韦福瑞 一种基于声音判断与识别模拟环境适应能力的检查方法和系统
CN112651610B (zh) * 2020-12-17 2024-02-02 韦福瑞 一种基于声音判断与识别模拟环境适应能力的检查方法和系统
CN112819061B (zh) * 2021-01-27 2024-05-10 北京小米移动软件有限公司 口令信息识别方法、装置、设备及存储介质
CN112819061A (zh) * 2021-01-27 2021-05-18 北京小米移动软件有限公司 口令信息识别方法、装置、设备及存储介质
CN113112664A (zh) * 2021-02-23 2021-07-13 广州李博士科技研究有限公司 一种人脸识别立式门禁设备
CN114979543A (zh) * 2021-02-24 2022-08-30 中国联合网络通信集团有限公司 一种智能家居控制方法及装置
CN113032758A (zh) * 2021-03-26 2021-06-25 平安银行股份有限公司 视讯问答流程的身份识别方法、装置、设备及存储介质
CN113034110A (zh) * 2021-03-30 2021-06-25 泰康保险集团股份有限公司 基于视频审核的业务处理方法、系统、介质与电子设备
CN113034110B (zh) * 2021-03-30 2023-12-22 泰康保险集团股份有限公司 基于视频审核的业务处理方法、系统、介质与电子设备
CN113221672A (zh) * 2021-04-22 2021-08-06 国网安徽省电力有限公司 一种用于电力仪表库房的面部识别设备
CN113127827B (zh) * 2021-05-08 2024-03-08 上海日羲科技有限公司 一种基于ai系统的用户指令处理方法
CN113239041A (zh) * 2021-05-13 2021-08-10 大连交通大学 一种计算机大数据处理的采集系统及方法
CN113343211A (zh) * 2021-06-24 2021-09-03 工银科技有限公司 数据处理方法、处理系统、电子设备及存储介质
CN113469012A (zh) * 2021-06-28 2021-10-01 广州云从鼎望科技有限公司 用户刷脸验证的方法、系统、介质及装置
CN113469012B (zh) * 2021-06-28 2024-05-03 广州云从鼎望科技有限公司 用户刷脸验证的方法、系统、介质及装置
CN115690866A (zh) * 2021-07-23 2023-02-03 青岛聚看云科技有限公司 一种图像识别方法、服务器及显示设备
CN114007043A (zh) * 2021-10-27 2022-02-01 北京鼎普科技股份有限公司 基于视频数据指纹特征的视频解码方法、装置及系统
CN114007043B (zh) * 2021-10-27 2023-09-26 北京鼎普科技股份有限公司 基于视频数据指纹特征的视频解码方法、装置及系统
CN113890736A (zh) * 2021-11-22 2022-01-04 国网四川省电力公司成都供电公司 一种基于国密sm9算法的移动终端身份认证方法及系统
CN113890736B (zh) * 2021-11-22 2023-02-28 国网四川省电力公司成都供电公司 一种基于国密sm9算法的移动终端身份认证方法及系统
CN114168722A (zh) * 2021-11-23 2022-03-11 安徽经邦软件技术有限公司 基于人工智能技术的财务问答机器人
CN114187630A (zh) * 2021-11-29 2022-03-15 华人运通(上海)云计算科技有限公司 一种人脸特征的比对方法及系统
CN114511941A (zh) * 2022-02-16 2022-05-17 中国工商银行股份有限公司 防作弊签到方法、装置、设备、介质和程序产品
CN114580034A (zh) * 2022-03-10 2022-06-03 合肥工业大学 一种基于fpga的ro puf双重身份认证系统及其控制方法
CN114842513A (zh) * 2022-04-02 2022-08-02 湖南麓木和择科技有限公司 一种基于互联网的数据采集系统及信息采集装置
CN114876321A (zh) * 2022-05-23 2022-08-09 江苏德普尔门控科技有限公司 一种智能化自动感应式带家居系统的入户门
CN115189911A (zh) * 2022-05-30 2022-10-14 平安科技(深圳)有限公司 面签文件的生成方法、装置、设备及存储介质
CN115412284A (zh) * 2022-07-04 2022-11-29 国网浙江省电力有限公司杭州市临安区供电公司 一种电力现场故障信息安全传输方法
CN115311713A (zh) * 2022-08-05 2022-11-08 南京甄视智能科技有限公司 终端提取深度学习网络人脸特征值并网内高效复用的方法与系统
CN115641105A (zh) * 2022-12-01 2023-01-24 中网道科技集团股份有限公司 一种监控社区矫正对象请假外出的数据处理方法
CN115641105B (zh) * 2022-12-01 2023-08-08 中网道科技集团股份有限公司 一种监控社区矫正对象请假外出的数据处理方法
CN116259095A (zh) * 2023-03-31 2023-06-13 南京审计大学 一种基于计算机的识别系统及方法
CN116189680B (zh) * 2023-05-04 2023-09-26 北京水晶石数字科技股份有限公司 一种展演智能设备的语音唤醒方法
CN116189680A (zh) * 2023-05-04 2023-05-30 北京水晶石数字科技股份有限公司 一种展演智能设备的语音唤醒方法
CN117273747B (zh) * 2023-09-28 2024-04-19 广州佳新智能科技有限公司 基于人脸图像识别的支付方法、装置、存储介质和设备
CN117273747A (zh) * 2023-09-28 2023-12-22 广州佳新智能科技有限公司 基于人脸图像识别的支付方法、装置、存储介质和设备
CN117376854A (zh) * 2023-10-30 2024-01-09 深圳中网讯通技术有限公司 多媒体短信内容的生成方法、装置、设备及存储介质
CN120410783A (zh) * 2025-07-03 2025-08-01 遇见美好文旅科技集团有限公司 一种中老年旅游erp业务全流程管理方法及系统

Also Published As

Publication number Publication date
CN106790054A (zh) 2017-05-31

Similar Documents

Publication Publication Date Title
WO2018113526A1 (zh) 基于人脸识别和声纹识别的交互式认证系统及方法
CN104834849B (zh) 基于声纹识别和人脸识别的双因素身份认证方法及系统
US8812319B2 (en) Dynamic pass phrase security system (DPSS)
CN106782572B (zh) 语音密码的认证方法及系统
CN108075892B (zh) 一种语音处理的方法、装置和设备
KR101997371B1 (ko) 신원 인증 방법 및 장치, 단말기 및 서버
CN106850648B (zh) 身份验证方法、客户端和服务平台
US10276168B2 (en) Voiceprint verification method and device
US9979721B2 (en) Method, server, client and system for verifying verification codes
US11665153B2 (en) Voice biometric authentication in a virtual assistant
WO2017197953A1 (zh) 基于声纹的身份识别方法及装置
WO2016123900A1 (zh) 基于动态密码语音的具有自学习功能的身份认证系统及方法
US9721079B2 (en) Image authenticity verification using speech
CN111611568A (zh) 一种人脸声纹复核终端及其身份认证方法
CN103841108A (zh) 用户生物特征的认证方法和系统
CN106373575A (zh) 一种用户声纹模型构建方法、装置及系统
CN110866234B (zh) 一种基于多生物特征的身份验证系统
CN103714282A (zh) 一种互动式的基于生物特征的识别方法
EP3001343B1 (en) System and method of enhanced identity recognition incorporating random actions
CN103177238A (zh) 终端和用户识别方法
CN107451185B (zh) 录音方法、朗读系统、计算机可读存储介质和计算机装置
CN110110513A (zh) 基于人脸和声纹的身份认证方法、装置和存储介质
CN109785834B (zh) 一种基于验证码的语音数据样本采集系统及其方法
CN112417412A (zh) 一种银行账户余额查询方法、装置及系统
CN109727342A (zh) 门禁系统的识别方法、装置、门禁系统及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17884689

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17884689

Country of ref document: EP

Kind code of ref document: A1