
TW202450270A - A system, method and computer-readable medium thereof for interaction biometrics identity verification - Google Patents

A system, method and computer-readable medium thereof for interaction biometrics identity verification

Info

Publication number
TW202450270A
Authority
TW
Taiwan
Prior art keywords
answer
user
interactive
module
video stream
Prior art date
Application number
TW112120359A
Other languages
Chinese (zh)
Other versions
TWI906620B (en)
Inventor
酈頤芳
邱彥霖
吳玉善
柳恆崧
Original Assignee
中華電信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中華電信股份有限公司 filed Critical 中華電信股份有限公司
Priority to TW112120359A priority Critical patent/TWI906620B/en
Priority claimed from TW112120359A external-priority patent/TWI906620B/en
Publication of TW202450270A publication Critical patent/TW202450270A/en
Application granted granted Critical
Publication of TWI906620B publication Critical patent/TWI906620B/en

Landscapes

  • Collating Specific Patterns (AREA)

Abstract

The present invention provides a system, method, and computer-readable medium for interactive liveness-based identity verification, comprising an interactive determination module, a face recognition module, an interactive question-and-answer module, and a liveness recognition module. The invention presents a verification question to a user based on existing question-and-answer data and uses the user's recorded response to judge both whether the answer is correct and whether the user is a live person. In this way, the system accurately confirms the correctness and authenticity of the answer video stream and prevents malicious actors from passing identity verification with AI-generated fake videos, thereby enhancing the security of online services and reducing the cost of identity verification.

Description

一種互動式活體識別身分驗證系統、方法及其電腦可讀媒介 An interactive liveness identification authentication system, method and computer-readable medium thereof

本發明關於一種身分驗證技術,尤其指一種互動式活體識別身分驗證系統、方法及其電腦可讀媒介。 The present invention relates to an identity verification technology, and more particularly to an interactive liveness identification identity verification system, method, and computer-readable medium thereof.

隨著線上服務的蓬勃發展，網路個人資訊安全更加重要，因而身分驗證相當重要。 With the rapid development of online services, the security of personal information on the Internet has become increasingly important, making reliable identity verification essential.

現有技術所採用的活體識別身分驗證,需要使用者拍攝並註冊一段影片,以預先儲存於一活體資訊資料庫中,其中,該影片內容為使用者所唸出其帳號之影像,如:123456。因此,當需要驗證時,使用者必須對著影音通訊設備(如手機),唸出經註冊之帳號(如123456),使系統比對使用者之人臉、聲紋及內容是否與活體資訊資料庫中之影音相符。 The existing technology for biometric identity verification requires the user to shoot and register a video to be pre-stored in a biometric information database, where the video content is the image of the user reading out his account number, such as: 123456. Therefore, when verification is required, the user must face the audio and video communication device (such as a mobile phone) and read out the registered account number (such as 123456) so that the system can compare the user's face, voiceprint and content with the audio and video in the biometric information database to see if they match.

然而,現有技術必須另外建立使用者之活體資訊資料庫,以記錄使用者的人臉、聲紋等生物資訊,不但要花費相當大的成本,還會造成使用者在 註冊上的不便。再者,在資訊安全性上,該活體資訊資料庫若發生資訊外洩之情勢,其造成的資安問題更是無法估計。 However, the existing technology must establish a user's living body information database to record the user's face, voiceprint and other biometric information, which not only costs a lot of money, but also causes inconvenience to the user in registration. Furthermore, in terms of information security, if the living body information database is leaked, the security problems it causes are even more immeasurable.

因此，如何提供一種活體識別身分驗證技術，在無需建立新的活體資訊資料庫之情況下，透過活體識別技術對使用者進行身分驗證，以大幅降低人力及物力成本，且提升資訊安全。 Therefore, there is a need for a liveness-recognition identity verification technology that can verify a user's identity without building a new liveness information database, thereby greatly reducing labor and material costs while improving information security.

為解決上述問題，本發明提供一種互動式活體識別身分驗證系統，係包含：一互動判別模組，係接收到來自一使用者裝置之身分驗證需求後，發出至少一驗證問題至該使用者裝置，且接收該使用者裝置所回覆的一使用者之答案影音串流；一互動式問答模組，係通訊連接該互動判別模組，以接收來自該互動判別模組之該答案影音串流，以取得該答案影音串流中的該使用者之驗證回答，再由該互動式問答模組利用一人員編號取得一正確答案後，判斷該使用者之驗證回答是否正確；以及一活體辨識模組，係通訊連接該互動判別模組，以接收來自該互動判別模組之該答案影音串流，俾判別該答案影音串流中之影像及聲音之相似程度是否達到影音同步，進而判斷該答案影音串流中之該使用者是否為活體，其中，於該互動式問答模組判斷該使用者為回答正確，且該活體辨識模組判別該使用者為活體時，該互動判別模組確認該使用者通過身分識別認證，以將驗證成功之身分驗證結果傳送至該使用者裝置；反之，於該互動式問答模組判斷該使用者回答錯誤，或該活體辨識模組判別該使用者為非活體時，該互動判別模組確認該使用者未通過身分識別認證，以將驗證失敗之身分驗證結果傳送至該使用者裝置。 To solve the above problems, the present invention provides an interactive liveness-recognition identity verification system, comprising: an interactive determination module that, after receiving an identity verification request from a user device, sends at least one verification question to the user device and receives a user's answer video stream replied by the user device; an interactive question-and-answer module communicatively connected to the interactive determination module to receive the answer video stream and obtain the user's verification answer from it, after which the interactive question-and-answer module uses a personnel number to obtain a correct answer and judges whether the user's verification answer is correct; and a liveness recognition module communicatively connected to the interactive determination module to receive the answer video stream and judge whether the similarity between the image and the audio in the answer video stream reaches audio-visual synchronization, thereby determining whether the user in the answer video stream is a live person. When the interactive question-and-answer module judges that the user answered correctly and the liveness recognition module judges that the user is a live person, the interactive determination module confirms that the user has passed identity authentication and transmits a successful verification result to the user device; conversely, when the interactive question-and-answer module judges that the user answered incorrectly, or the liveness recognition module judges that the user is not a live person, the interactive determination module confirms that the user has failed identity authentication and transmits a failed verification result to the user device.

本發明又提供一種互動式活體識別身分驗證方法，係包含：由一互動判別模組接收到來自一使用者裝置之身分驗證需求後，發出至少一驗證問題至該使用者裝置，且接收該使用者裝置所回覆的一使用者之答案影音串流；由一互動式問答模組接收來自該互動判別模組之該答案影音串流，以取得該答案影音串流中的該使用者之驗證回答；由該互動式問答模組利用一人員編號取得一正確答案後，判斷該使用者之驗證回答是否正確；以及由一活體辨識模組接收來自該互動判別模組之該答案影音串流，以判別該答案影音串流中之影像及聲音之相似程度是否達到影音同步，進而判斷該答案影音串流中之該使用者是否為活體，其中，於該互動式問答模組判斷該使用者為回答正確，且該活體辨識模組判別該使用者為活體時，該互動判別模組確認該使用者通過身分識別認證，以將驗證成功之身分驗證結果傳送至該使用者裝置；反之，於該互動式問答模組判斷該使用者回答錯誤，或該活體辨識模組判別該使用者為非活體時，該互動判別模組確認該使用者未通過身分識別認證，以將驗證失敗之身分驗證結果傳送至該使用者裝置。 The present invention further provides an interactive liveness-recognition identity verification method, comprising: after an interactive determination module receives an identity verification request from a user device, sending at least one verification question to the user device and receiving a user's answer video stream replied by the user device; receiving, by an interactive question-and-answer module, the answer video stream from the interactive determination module to obtain the user's verification answer from it; after the interactive question-and-answer module uses a personnel number to obtain a correct answer, judging whether the user's verification answer is correct; and receiving, by a liveness recognition module, the answer video stream from the interactive determination module to judge whether the similarity between the image and the audio in the answer video stream reaches audio-visual synchronization, thereby determining whether the user in the answer video stream is a live person. When the interactive question-and-answer module judges that the user answered correctly and the liveness recognition module judges that the user is a live person, the interactive determination module confirms that the user has passed identity authentication and transmits a successful verification result to the user device; conversely, when the interactive question-and-answer module judges that the user answered incorrectly, or the liveness recognition module judges that the user is not a live person, the interactive determination module confirms that the user has failed identity authentication and transmits a failed verification result to the user device.

於前述實施例中,更包含一人臉辨識模組,係通訊連接該互動判別模組,以接收來自該互動判別模組之該答案影音串流,且對該答案影音串流中之該使用者進行人臉辨識,其中,當該使用者通過人臉辨識後從一資料庫取得該人員編號,以通知該互動判別模組該使用者通過人臉辨識。 In the aforementioned embodiment, a face recognition module is further included, which is communicatively connected to the interactive determination module to receive the answer video stream from the interactive determination module and perform face recognition on the user in the answer video stream, wherein when the user passes the face recognition, the person number is obtained from a database to notify the interactive determination module that the user passes the face recognition.

於前述實施例中，該活體辨識模組從該影像擷取該使用者之連續人臉影像及從該聲音擷取出該使用者之連續MFCC特徵，以計算出該連續人臉影像及該連續MFCC特徵之間的複數餘弦相似性之數值，俾從該複數餘弦相似性之數值中得到一同步相似值，進而依據該同步相似值判別該答案影音串流是否為同步。 In the aforementioned embodiment, the liveness recognition module extracts the user's continuous facial images from the image and the user's continuous MFCC features from the audio, calculates a plurality of cosine-similarity values between the continuous facial images and the continuous MFCC features, obtains a synchronization similarity value from the plurality of cosine-similarity values, and then determines whether the answer video stream is synchronized based on the synchronization similarity value.

於前述實施例中，該活體辨識模組更包含一活體辨識模型，以由該活體辨識模型將該連續人臉影像及該連續MFCC特徵經深度學習運算後，輸出相對應的多維度空間之影像向量組及聲音向量組，進而依據該影像向量組及該聲音向量組計算出該連續人臉影像及該連續MFCC特徵之間的該複數餘弦相似性之數值。 In the aforementioned embodiment, the liveness recognition module further includes a liveness recognition model that, through deep-learning operations, converts the continuous facial images and the continuous MFCC features into a corresponding image vector group and audio vector group in a multi-dimensional space, and then calculates the plurality of cosine-similarity values between the continuous facial images and the continuous MFCC features based on the image vector group and the audio vector group.

於前述實施例中,該互動判別模組判斷是否有下一題驗證問題,且於具有下一題驗證問題時,由該互動式問答模組及該活體辨識模組判斷該使用者裝置所回覆之另一答案影音串流中之該使用者是否回答正確且為活體。 In the aforementioned embodiment, the interactive determination module determines whether there is a next verification question, and when there is a next verification question, the interactive question-and-answer module and the liveness recognition module determine whether the user in the other answer video stream replied by the user device has answered correctly and is alive.

由上可知,本發明之互動式活體識別身分驗證系統、方法及其電腦可讀媒介,於接收到使用者裝置之身分驗證需求後,對使用者提出問題,藉此得到一答案影音串流,並對該答案影音串流進行人臉辨識後取得正確答案,以確認使用者之驗證回答是否正確,同時判別該答案影音串流是否為影音同步,藉此確認該答案影音串流中之使用者是否為活體。 As can be seen from the above, the interactive liveness identification authentication system, method and computer-readable medium of the present invention, after receiving the authentication request from the user device, asks the user a question to obtain an answer video stream, and performs facial recognition on the answer video stream to obtain the correct answer to confirm whether the user's authentication answer is correct, and at the same time determines whether the answer video stream is synchronized, thereby confirming whether the user in the answer video stream is alive.

是以,本發明藉由判斷驗證回答之正確性及答案影音串流之同步性,能準確地確認答案影音串流的真實性,以避免有心人士利用人工智慧所生成之偽造影音而通過身分認證,更可使系統服務提供者直接使用原先資料庫中之問答資料進行身分驗證,而無須花費更大的成本建立用戶的活體資料庫,藉此提升線上服務之安全性及降低身分驗證之成本。 Therefore, the present invention can accurately confirm the authenticity of the answer video stream by judging the correctness of the verification answer and the synchronization of the answer video stream, so as to prevent malicious people from using fake videos generated by artificial intelligence to pass identity authentication. It can also enable system service providers to directly use the question and answer data in the original database for identity authentication without spending more costs to establish a user's live database, thereby improving the security of online services and reducing the cost of identity authentication.

1:互動式活體識別身分驗證系統 1: Interactive liveness identification authentication system

11:互動判別模組 11: Interaction identification module

12:人臉辨識模組 12: Face recognition module

13:互動式問答模組 13:Interactive question and answer module

14:活體辨識模組 14: Liveness recognition module

14a:活體辨識模型 14a: Liveness recognition model

15:資料庫 15: Database

9:使用者裝置(智慧型手機) 9: User device (smartphone)

9a:鏡頭 9a: Lens

S11至S16、S21至S211:步驟 S11 to S16, S21 to S211: Steps

圖1係為本發明之互動式活體識別身分驗證系統之架構示意圖。 Figure 1 is a schematic diagram of the architecture of the interactive liveness identification authentication system of the present invention.

圖1A係為本發明之影音同步辨識流程示意圖。 Figure 1A is a schematic diagram of the video and audio synchronization recognition process of the present invention.

圖2係為本發明之互動式活體識別身分驗證方法之流程示意圖。 Figure 2 is a schematic diagram of the process of the interactive liveness identification authentication method of the present invention.

圖3A至圖3C係為本發明之互動式活體識別身分驗證系統之具體實施例之示意圖。 Figures 3A to 3C are schematic diagrams of a specific embodiment of the interactive liveness identification authentication system of the present invention.

以下藉由特定的具體實施例說明本發明之實施方式,熟悉此技藝之人士可由本說明書所揭示之內容輕易地瞭解本發明之其他優點及功效。 The following is a specific and concrete example to illustrate the implementation of the present invention. People familiar with this technology can easily understand other advantages and effects of the present invention from the content disclosed in this manual.

須知,本說明書所附圖式所繪示之結構、比例、大小等,均僅用以配合說明書所揭示之內容,以供熟悉此技藝之人士之瞭解與閱讀,並非用以限定本發明可實施之限定條件,故不具技術上之實質意義,任何結構之修飾、比例關係之改變或大小之調整,在不影響本發明所能產生之功效及所能達成之目的下,均應仍落在本發明所揭示之技術內容得能涵蓋之範圍內。同時,本說明書中所引用之如「一」、「第一」、「第二」、「上」及「下」等之用語,亦僅為便於敘述之明瞭,而非用以限定本發明可實施之範圍,其相對關係之改變或調整,在無實質變更技術內容下,當視為本發明可實施之範疇。 It should be noted that the structures, proportions, sizes, etc. depicted in the drawings attached to this specification are only used to match the contents disclosed in the specification for understanding and reading by people familiar with this technology, and are not used to limit the restrictive conditions for the implementation of the present invention. Therefore, they have no substantial technical significance. Any modification of the structure, change of the proportion relationship or adjustment of the size should still fall within the scope of the technical content disclosed by the present invention without affecting the effects and purposes that can be achieved by the present invention. At the same time, the terms such as "one", "first", "second", "upper" and "lower" used in this specification are only used to facilitate the clarity of the description, and are not used to limit the scope of the implementation of the present invention. The changes or adjustments in their relative relationships shall be regarded as the scope of the implementation of the present invention without substantially changing the technical content.

圖1係為本發明之互動式活體識別身分驗證系統1之架構示意圖,係包括一互動判別模組11、一人臉辨識模組12、一互動式問答模組13、一具有活體辨識模型14a之活體辨識模組14及一資料庫15。 FIG1 is a schematic diagram of the structure of the interactive liveness recognition identity verification system 1 of the present invention, which includes an interactive judgment module 11, a face recognition module 12, an interactive question-and-answer module 13, a liveness recognition module 14 having a liveness recognition model 14a, and a database 15.

具體而言，該互動式活體識別身分驗證系統1係建立於伺服器(網路伺服器、通用型伺服器、檔案型伺服器、儲存單元型伺服器等)或電腦等具有適當演算機制之電子設備中，其中，該互動式活體識別身分驗證系統1中之各個模組(該互動判別模組11、該人臉辨識模組12、該互動式問答模組13及該活體辨識模組14)均可為軟體、硬體或韌體；若為硬體，則可為具有資料處理與運算能力之處理單元、處理器、電腦或伺服器；若為軟體或韌體，則可包括處理單元、處理器、電腦或伺服器可執行之指令，且可安裝於同一硬體裝置或分布於不同的複數硬體裝置。此外，該資料庫15係包含硬碟(機械硬碟、固態硬碟等)，且該資料庫15可為SQL(Structured Query Language)或NoSQL(Non-Structured Query Language)資料庫，於此不限。 Specifically, the interactive liveness-recognition identity verification system 1 is built on a server (a web server, general-purpose server, file server, storage server, etc.), a computer, or another electronic device with suitable computing capability. Each module in the interactive liveness-recognition identity verification system 1 (the interactive determination module 11, the face recognition module 12, the interactive question-and-answer module 13, and the liveness recognition module 14) may be software, hardware, or firmware; if hardware, it may be a processing unit, processor, computer, or server with data-processing and computing capability; if software or firmware, it may comprise instructions executable by a processing unit, processor, computer, or server, and may be installed on the same hardware device or distributed across multiple hardware devices. In addition, the database 15 includes a hard disk (mechanical hard disk, solid-state disk, etc.), and the database 15 may be an SQL (Structured Query Language) or NoSQL (Non-Structured Query Language) database, without limitation.

所述之互動判別模組11,係通訊(或電性)連接一使用者裝置9,以於接收到來自該使用者裝置9之身分驗證需求後,由該互動判別模組11依據一資料庫15中預設之複數驗證問題,以透過該使用者裝置9向一使用者發出至少一(包含一個、複數個)驗證問題,且該使用者裝置9將該使用者所回覆之一答案影音串流回傳至該互動判別模組11。 The interactive determination module 11 is communicatively (or electrically) connected to a user device 9, so that after receiving an identity verification request from the user device 9, the interactive determination module 11 sends at least one (including one or more) verification questions to a user through the user device 9 according to a plurality of verification questions preset in a database 15, and the user device 9 returns a video and audio stream of an answer replied by the user to the interactive determination module 11.
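
For illustration, the following is a minimal Python sketch of how such an interactive determination module might select a verification question from the service's existing question-and-answer data and track the pending exchange. All class, method, and field names here are hypothetical; the patent does not prescribe this structure.

```python
import random
import uuid
from dataclasses import dataclass, field

@dataclass
class VerificationSession:
    """Tracks one identity-verification exchange with a user device."""
    user_account: str
    question: str  # the verification question sent to the user device
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))

class InteractiveDeterminationModule:
    """Hypothetical sketch of the module that issues verification questions."""

    def __init__(self, qa_database):
        # qa_database: mapping of user_account -> {question: correct_answer},
        # i.e. the service's pre-existing question-and-answer data.
        self.qa_database = qa_database
        self.sessions = {}

    def handle_verification_request(self, user_account: str) -> VerificationSession:
        """On a verification request, pick one preset question for this user."""
        questions = list(self.qa_database[user_account].keys())
        session = VerificationSession(user_account, random.choice(questions))
        self.sessions[session.session_id] = session
        return session  # the question is then shown on the user device

# Example usage with toy data.
qa_db = {"alice": {"Which elementary school did you attend?": "Zhongli Elementary School"}}
module = InteractiveDeterminationModule(qa_db)
print(module.handle_verification_request("alice").question)
```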

所述之人臉辨識模組12,係通訊(或電性)連接該互動判別模組11,以接收來自該互動判別模組11之答案影音串流,且該人臉辨識模組12對該答案影音串流中之該使用者進行人臉辨識,且當該使用者通過人臉辨識後,從一資料庫15取得該使用者相對應之人員編號,以提供給該互動判別模組11,以通知該互動判別模組11該使用者通過人臉辨識。 The face recognition module 12 is connected to the interactive judgment module 11 by communication (or electrical connection) to receive the answer video stream from the interactive judgment module 11, and the face recognition module 12 performs face recognition on the user in the answer video stream, and when the user passes the face recognition, obtains the corresponding personnel number of the user from a database 15 to provide it to the interactive judgment module 11 to notify the interactive judgment module 11 that the user passes the face recognition.

在一實施例中,由該人臉辨識模組12採用人臉辨識技術,以取得該使用者之人臉特徵資料,且與該資料庫15中之人臉資料進行比對,藉此確認該使用者之身分,其中,該人臉辨識技術包含但不限於基於人臉特徵點的識別演算法(feature-based recognition algorithms)、基於整幅人臉圖像的識別演算法 (appearance-based recognition algorithms)、基於模板的識別演算法(template-based recognition algorithms)、利用神經網絡進行識別的演算法(recognition algorithms using neural network)及利用支持向量機進行識別的演算法(recognition algorithms using SVM)等。 In one embodiment, the face recognition module 12 uses face recognition technology to obtain the user's facial feature data and compare it with the facial data in the database 15 to confirm the identity of the user, wherein the face recognition technology includes but is not limited to recognition algorithms based on facial features (feature-based recognition algorithms), recognition algorithms based on the entire face image (appearance-based recognition algorithms), template-based recognition algorithms (template-based recognition algorithms), recognition algorithms using neural networks (recognition algorithms using neural networks) and recognition algorithms using support vector machines (recognition algorithms using SVM), etc.
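
The paragraph above deliberately leaves the face-recognition algorithm open. As one hedged illustration of the comparison step only, the sketch below assumes an embedding-based recognizer (each face already reduced to a fixed-length vector by some model) and matches a probe embedding against enrolled embeddings by cosine similarity to return a personnel number; the 128-dimensional embeddings and the 0.6 threshold are arbitrary assumptions, not values from the patent.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_user(probe_embedding: np.ndarray,
                  enrolled: dict,
                  threshold: float = 0.6):
    """Return the personnel number of the best-matching enrolled face,
    or None if no enrolled face is similar enough (face recognition fails)."""
    best_id, best_score = None, -1.0
    for personnel_id, ref_embedding in enrolled.items():
        score = cosine_similarity(probe_embedding, ref_embedding)
        if score > best_score:
            best_id, best_score = personnel_id, score
    return best_id if best_score >= threshold else None

# Toy example: random 128-d vectors stand in for a real face-embedding model.
rng = np.random.default_rng(0)
enrolled_faces = {"P0001": rng.normal(size=128), "P0002": rng.normal(size=128)}
probe = enrolled_faces["P0001"] + 0.05 * rng.normal(size=128)  # same person, slight noise
print(identify_user(probe, enrolled_faces))  # -> "P0001"
```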

在一實施例中,該資料庫15中已具有一般進行身分驗證時所需要之正確答案,且該正確答案係為使用者註冊帳號時所輸入之答案。 In one embodiment, the database 15 already has the correct answer required for general identity verification, and the correct answer is the answer entered by the user when registering an account.

所述之互動式問答模組13,係通訊(或電性)連接該互動判別模組11,以於該互動判別模組11確認該使用者通過人臉辨識後,該互動判別模組11提供該答案影音串流至該互動式問答模組13,且由該互動式問答模組13利用影音辨識技術、語音辨識技術或唇語辨識技術取得該答案影音串流中的該使用者之驗證回答,再由該互動式問答模組13依據該人臉辨識模組12所取得之人員編號從該資料庫15中取得一正確答案,以比對該使用者之驗證回答與該正確答案使否一致,藉此判斷該使用者是否回答正確。 The interactive question-and-answer module 13 is communicatively (or electrically) connected to the interactive determination module 11, so that after the interactive determination module 11 confirms that the user has passed the face recognition, the interactive determination module 11 provides the answer video stream to the interactive question-and-answer module 13, and the interactive question-and-answer module 13 uses the video recognition technology, voice recognition technology or lip recognition technology to obtain the user's verification answer in the answer video stream, and then the interactive question-and-answer module 13 obtains a correct answer from the database 15 according to the personnel number obtained by the face recognition module 12, so as to compare the user's verification answer with the correct answer to see if they are consistent, thereby judging whether the user's answer is correct.

在一實施例中,語音辨識技術包含但不限於深度卷積神經網路(deep convolutional neural network)或長短記憶神經網路(long-short memory neural network)之自動語音識別工具(Auto Speech Recognition Tool,ASRT);以及該唇語辨識技術包含但不限於基於卷積神經網路(Convolutional Neural Networks,CNN)之LipNet,其中,LipNet架構的輸入為連續嘴唇影像,經過基於卷積神經網路(Convolutional Neural Networks,CNN)之深度學習模型後,再經過CTC(Connectionist Temporal Classification)計算,以得到唇語的輸出結果。 In one embodiment, the speech recognition technology includes but is not limited to an automatic speech recognition tool (ASRT) based on a deep convolutional neural network or a long-short memory neural network; and the lip reading recognition technology includes but is not limited to LipNet based on a convolutional neural network (CNN), wherein the input of the LipNet architecture is a continuous lip image, which is calculated by a deep learning model based on a convolutional neural network (CNN) and then by CTC (Connectionist Temporal Classification) to obtain the output result of lip reading.
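
Whichever speech- or lip-recognition model produces the transcript, the question-and-answer check itself reduces to comparing the recognized answer with the correct answer stored under the personnel number. A minimal sketch is given below, in which `transcribe` is a placeholder standing in for a real ASR or lip-reading model and `qa_store` is a hypothetical view of the existing answer database.

```python
import unicodedata

def normalize(text: str) -> str:
    """Normalize a transcript for comparison: NFKC form, lower case, letters/digits only."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())

def verify_answer(transcript: str, correct_answer: str) -> bool:
    """True when the recognized answer matches the stored correct answer."""
    return normalize(transcript) == normalize(correct_answer)

def transcribe(answer_stream) -> str:
    """Placeholder for a real ASR / lip-reading model."""
    return answer_stream  # in this toy example the 'stream' is already text

def check_response(answer_stream, personnel_id: str, qa_store: dict) -> bool:
    """End-to-end answer check for one verification question."""
    transcript = transcribe(answer_stream)               # ASR or lip-reading output
    correct = qa_store[personnel_id]["correct_answer"]   # answer stored at registration
    return verify_answer(transcript, correct)

print(check_response("中壢國小", "P0001", {"P0001": {"correct_answer": "中壢國小"}}))  # True
```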

所述之活體辨識模組14,係通訊(或電性)連接該互動判別模組11,以於該互動判別模組11確認該使用者通過人臉辨識後,該互動判別模組11提供 該答案影音串流至該活體辨識模組14,且由該活體辨識模組14執行一影音同步辨識流程,以辨識該答案影音串流中之影像及聲音之相似程度,再依據該聲音及該影像之相似程度判別影音是否為同步,藉此判斷該答案影音串流中之該使用者是否為活體。 The liveness recognition module 14 is connected to the interactive judgment module 11 by communication (or electrical connection), so that after the interactive judgment module 11 confirms that the user has passed the face recognition, the interactive judgment module 11 provides the answer video stream to the liveness recognition module 14, and the liveness recognition module 14 executes a video and audio synchronization recognition process to identify the similarity between the image and sound in the answer video stream, and then judges whether the video and audio are synchronized according to the similarity between the sound and the image, thereby judging whether the user in the answer video stream is alive.

在一實施例中,如圖1A所示,該影音同步辨識流程係包括以下步驟S11至S16: In one embodiment, as shown in FIG. 1A , the video and audio synchronization recognition process includes the following steps S11 to S16:

於步驟S11中,由該活體辨識模組14接收該答案影音串流,且將該答案影音串流切分為該影像及該聲音之檔案。 In step S11, the liveness recognition module 14 receives the answer video stream and divides the answer video stream into the image and sound files.

於步驟S12中,由該活體辨識模組14取得該影像後,從該影像中擷取該使用者之連續人臉影像。 In step S12, after the liveness recognition module 14 obtains the image, the continuous facial image of the user is captured from the image.
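
As a concrete but non-authoritative illustration of steps S11 and S12, the sketch below splits an answer clip into an audio file and a sequence of cropped face frames using ffmpeg and OpenCV's stock Haar cascade; the patent does not name these tools, and the frame size and detector parameters are assumptions.

```python
import subprocess
import cv2  # opencv-python

def split_and_extract_faces(video_path: str, wav_path: str = "answer_audio.wav"):
    """Split an answer clip into an audio file and a list of cropped face frames."""
    # Audio track -> 16 kHz mono WAV via ffmpeg (step S11).
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
                    "-ar", "16000", wav_path], check=True)

    # Video track -> per-frame face crops with a stock Haar cascade (step S12).
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces, cap = [], cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) > 0:
            x, y, w, h = boxes[0]
            faces.append(cv2.resize(gray[y:y + h, x:x + w], (64, 64)))
    cap.release()
    return wav_path, faces  # the continuous face images used in the later steps
```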

於步驟S13中,由該活體辨識模組14取得該聲音後,對該聲音進行語音處理,以擷取出MFCC(Mel-Frequency Cepstral Coefficient,梅爾頻率倒譜係數)特徵,其中,該活體辨識模組14將該聲音經由預強調(Pre-emphasis)、擷取音框(Frame)、漢明(Hamming)窗、快速傅立葉轉換(Fast Fourier Transform,FFT)、三角帶通濾波器(Triangular Bandpass Filters)、對數(Log)轉換及離散餘弦轉換(Discrete Cosine Transformation,DCT),藉此擷取出該連續MFCC特徵。 In step S13, after the liveness recognition module 14 obtains the sound, it performs speech processing on the sound to extract MFCC (Mel-Frequency Cepstral Coefficient) features, wherein the liveness recognition module 14 processes the sound through pre-emphasis, frame capture, Hamming window, Fast Fourier Transform (FFT), Triangular Bandpass Filters, Log transform and Discrete Cosine Transform (DCT), thereby extracting the continuous MFCC features.
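
The sketch below mirrors the MFCC pipeline just described (pre-emphasis, framing, Hamming window, FFT, triangular mel band-pass filters, logarithm, DCT) with NumPy and SciPy. The frame length, hop size, filter count, and the 13 retained coefficients are common defaults assumed for illustration, not values fixed by the patent.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Compute a (frames x n_coeffs) MFCC matrix following the steps in the text."""
    # 1. Pre-emphasis
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Framing
    flen, fstep = int(sr * frame_len), int(sr * frame_step)
    n_frames = 1 + max(0, len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx]

    # 3. Hamming window
    frames = frames * np.hamming(flen)

    # 4. FFT -> power spectrum
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # 5. Triangular mel band-pass filter bank
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    filter_energies = np.maximum(power @ fbank.T, np.finfo(float).eps)

    # 6. Log, then 7. DCT, keeping the first n_coeffs coefficients
    return dct(np.log(filter_energies), type=2, axis=1, norm="ortho")[:, :n_coeffs]

# Example: one second of a 440 Hz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(mfcc(tone).shape)  # roughly (98, 13)
```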

於步驟S14中,由該活體辨識模組14將擷取後的該連續人臉影像及該連續MFCC特徵輸入一基於類神經網路之活體辨識模型14a,以由該活體辨識模型14a將該連續人臉影像及該連續MFCC特徵經深度學習運算後,輸出相對應的多維度空間之影像向量組及聲音向量組,其中,該連續人臉影像透過該活體辨識模型14a轉換為該影像向量組,而該連續MFCC特徵透過該活體辨識模型14a轉換為該聲音向量組。 In step S14, the liveness recognition module 14 inputs the captured continuous facial image and the continuous MFCC features into a liveness recognition model 14a based on a neural network, so that the liveness recognition model 14a performs deep learning operations on the continuous facial image and the continuous MFCC features and outputs a corresponding image vector group and sound vector group in a multi-dimensional space, wherein the continuous facial image is converted into the image vector group through the liveness recognition model 14a, and the continuous MFCC features are converted into the sound vector group through the liveness recognition model 14a.
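
The liveness recognition model is characterized only as a neural network that maps both modalities into a shared multi-dimensional vector space. The PyTorch sketch below shows one plausible shape for such a dual encoder, with a small CNN for a window of face frames and a small MLP for the matching window of MFCC frames, both L2-normalized so that cosine similarity is meaningful; the layer sizes and window lengths are assumptions, and training of the model is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Maps a window of face frames and a window of MFCC frames into one embedding space."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Image branch: input (batch, 5 grayscale face frames, 64, 64)
        self.image_net = nn.Sequential(
            nn.Conv2d(5, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Audio branch: input (batch, 20 MFCC frames x 13 coefficients)
        self.audio_net = nn.Sequential(
            nn.Linear(20 * 13, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, face_window, mfcc_window):
        img_vec = F.normalize(self.image_net(face_window), dim=-1)
        aud_vec = F.normalize(self.audio_net(mfcc_window.flatten(1)), dim=-1)
        return img_vec, aud_vec

# One window: 5 face frames (~0.2 s at 25 fps) and the 20 MFCC frames covering the same span.
model = DualEncoder()
img_vec, aud_vec = model(torch.randn(1, 5, 64, 64), torch.randn(1, 20, 13))
print(F.cosine_similarity(img_vec, aud_vec).item())  # untrained, so close to 0
```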

於步驟S15中，由該活體辨識模組14利用一公式(1)，以依據該影像向量組及該聲音向量組計算出該連續人臉影像及該連續MFCC特徵之間的複數餘弦相似性(Cosine-similarity)之數值(介於-1至1之間)，且利用統計學方式依據該複數餘弦相似性之數值得到一同步相似值，藉此計算出維度空間之距離。 In step S15, the liveness recognition module 14 uses formula (1) to calculate, from the image vector group and the audio vector group, a plurality of cosine-similarity values (each between -1 and 1) between the continuous facial images and the continuous MFCC features, and obtains a synchronization similarity value from the plurality of cosine-similarity values by statistical means, thereby measuring the distance between the two modalities in the vector space.

cosine-similarity(A, B) = (A · B) / (‖A‖ × ‖B‖) ………… (1)

其中，A係指該連續人臉影像所轉換之影像向量組；以及B係指該連續MFCC特徵所轉換之聲音向量組。 where A denotes the image vector group converted from the continuous facial images, and B denotes the audio vector group converted from the continuous MFCC features.

在一實施例中，該活體辨識模組14從該複數餘弦相似性之數值中取中位數，以得到該同步相似值；或是，該活體辨識模組14採用平均數、加權平均數等方式，以依據該複數餘弦相似性之數值計算出該同步相似值。 In one embodiment, the liveness recognition module 14 takes the median of the plurality of cosine-similarity values to obtain the synchronization similarity value; alternatively, the liveness recognition module 14 computes the synchronization similarity value from the plurality of cosine-similarity values using the mean, a weighted mean, or a similar statistic.

舉例而言,當取樣率為100Hz時,每秒取100幀之MFCC特徵,且該連續MFCC特徵係為13維之特徵向量,再取每秒25幀之連續人臉影像,且每1幀人臉影像係對應到4幀MFCC特徵。再者,由該活體辨識模型14a進行向量轉換後,該活體辨識模組14逐一計算該影像向量組與該聲音向量組之間的複數餘弦相似性之數值,例如:計算「第1幀至第5幀的人臉影像之影像向量」與「第1幀至第20幀的MFCC特徵之聲音向量」之間的餘弦相似性之數值、「第2幀至第6幀的人臉影像之影像向量」與「第5幀至第24幀的MFCC特徵之聲音向量」之間的餘弦相似性之數值、「第3幀至第7幀的人臉影像之影像向量」與「第9幀至第28幀的MFCC特徵之聲音向量」之間的餘弦相似性之數值,以此類推,藉此計算出該複數餘弦相似性之數值,進而透過取中位數、平均數、加權平均數等統計學方式得到該同步相似值。 For example, when the sampling rate is 100 Hz, 100 frames of MFCC features are taken per second, and the continuous MFCC features are 13-dimensional feature vectors. Then, 25 frames of continuous facial images are taken per second, and each frame of facial image corresponds to 4 frames of MFCC features. Furthermore, after the liveness recognition model 14a performs vector conversion, the liveness recognition module 14 calculates the values of the complex cosine similarities between the image vector group and the sound vector group one by one, for example: calculating the values of the cosine similarity between the "image vector of the facial image from the 1st to the 5th frame" and the "sound vector of the MFCC features from the 1st to the 20th frame", the "image vector of the facial image from the 2nd to the 6th frame" and the "sound vector of the MFCC features from the 2nd to the 6th frame". The value of the cosine similarity between the sound vector of the MFCC features of the 5th to 24th frames, the value of the cosine similarity between the image vector of the face image of the 3rd to 7th frames and the sound vector of the MFCC features of the 9th to 28th frames, and so on, is calculated to obtain the value of the complex cosine similarity, and then the synchronous similarity value is obtained by taking the median, average, weighted average and other statistical methods.
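
Given per-window embeddings from such a model, the sliding-window pairing in this example (five face frames against twenty MFCC frames, advancing one face frame and four MFCC frames per step) can be sketched as follows; `encode_image` and `encode_audio` stand in for the trained encoders, and the toy data merely demonstrates the window indexing.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def windowed_similarities(face_frames, mfcc_frames, encode_image, encode_audio,
                          img_win=5, mfcc_win=20, mfcc_per_frame=4):
    """One cosine similarity per window position, pairing face frames i..i+4
    with MFCC frames 4*i..4*i+19 (25 fps video against 100 Hz MFCC)."""
    sims = []
    n_windows = min(len(face_frames) - img_win + 1,
                    (len(mfcc_frames) - mfcc_win) // mfcc_per_frame + 1)
    for i in range(n_windows):
        img_vec = encode_image(face_frames[i:i + img_win])
        aud_vec = encode_audio(mfcc_frames[i * mfcc_per_frame:i * mfcc_per_frame + mfcc_win])
        sims.append(cosine(img_vec, aud_vec))
    return sims

# Toy stand-ins for the trained encoders: average each window into one vector.
# Both toy feature streams use the same dimensionality so the stand-ins are comparable.
encode_image = lambda w: np.asarray(w).mean(axis=0).ravel()
encode_audio = lambda w: np.asarray(w).mean(axis=0).ravel()

faces = np.random.rand(25, 64)   # 1 second of "face features" at 25 fps
mfccs = np.random.rand(100, 64)  # 1 second of "MFCC features" at 100 Hz
print(len(windowed_similarities(faces, mfccs, encode_image, encode_audio)))  # 21 windows
```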

於步驟S16中,由該活體辨識模組14比對該同步相似值是否大於一門閥值,以判斷該答案影音串流是否達到影音同步,藉此判斷該答案影音串流中之使用者是否為活體,其中,若該同步相似值大於該門閥值,則該答案影音串流視為影音同步;反之,若該同步相似值小於該門閥值,則該答案影音串流視為影音不同步。例如:同步相似值之閥值可設置0.5,甲樣本經運算後,其同步相似值為0.9係大於0.5,故甲樣本為影音同步;以及乙樣本經運算後,其同步相似值為0.2係小於0.5,故乙樣本為影音不同步。 In step S16, the liveness recognition module 14 compares whether the synchronization similarity value is greater than a threshold value to determine whether the answer video stream has achieved video synchronization, thereby determining whether the user in the answer video stream is a live body, wherein if the synchronization similarity value is greater than the threshold value, the answer video stream is considered to be video synchronized; conversely, if the synchronization similarity value is less than the threshold value, the answer video stream is considered to be video asynchronous. For example: the threshold value of the synchronization similarity value can be set to 0.5, after calculation, the synchronization similarity value of sample A is 0.9, which is greater than 0.5, so sample A is video synchronized; and after calculation, the synchronization similarity value of sample B is 0.2, which is less than 0.5, so sample B is video asynchronous.
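
Step S16 then reduces to one aggregate statistic and a threshold comparison. A minimal sketch, assuming the median as the statistic and the 0.5 threshold used in the example:

```python
import statistics

def is_av_synchronized(similarities, threshold: float = 0.5) -> bool:
    """Liveness decision for one answer video stream: the stream counts as
    audio-visually synchronized (the speaker is treated as a live person)
    when the aggregate similarity exceeds the threshold."""
    sync_similarity = statistics.median(similarities)
    return sync_similarity > threshold

print(is_av_synchronized([0.85, 0.92, 0.88]))  # sample A, ~0.9 -> True (synchronized)
print(is_av_synchronized([0.15, 0.25, 0.20]))  # sample B, ~0.2 -> False (not synchronized)
```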

在一實施例中,於該互動式問答模組13判斷該使用者為回答正確,且該活體辨識模組14判別該使用者為活體時,該互動判別模組11確認該使用者通過身分識別認證,且將身分驗證結果(即身分驗證成功)傳送至該使用者裝置9,以透過該使用者裝置9通知該使用者其身分驗證成功;反之,當該互動式問答模組13判斷該使用者回答錯誤,或該活體辨識模組14判別該使用者為非活體時,該互動判別模組11確認該使用者未通過身分識別認證,且將身分驗證結果(即身分驗證失敗)傳送至該使用者裝置9,以透過該使用者裝置9通知該使用者其身分驗證失敗。 In one embodiment, when the interactive question-and-answer module 13 determines that the user's answer is correct and the liveness recognition module 14 determines that the user is alive, the interactive recognition module 11 confirms that the user has passed the identity recognition authentication and transmits the identity verification result (i.e., identity verification success) to the user device 9 to notify the user of the success of the identity verification through the user device 9; on the contrary, when the interactive question-and-answer module 13 determines that the user's answer is incorrect or the liveness recognition module 14 determines that the user is not alive, the interactive recognition module 11 confirms that the user has not passed the identity recognition authentication and transmits the identity verification result (i.e., identity verification failure) to the user device 9 to notify the user of the failure of the identity verification through the user device 9.

在一實施例中,由該互動判別模組11判斷是否有下一題驗證問題,其中,於具有下一題驗證問題時,該互動式活體識別身分驗證系統1再判斷該使用者所回覆之另一答案影音串流是否回答正確且影音同步。 In one embodiment, the interactive determination module 11 determines whether there is a next verification question, wherein when there is a next verification question, the interactive liveness identification authentication system 1 determines whether the other answer video stream replied by the user is correct and the video and audio are synchronized.

圖2係為本發明之互動式活體識別身分驗證方法之流程示意圖,且包含以下步驟S21至S211。 FIG2 is a schematic diagram of the process of the interactive liveness identification authentication method of the present invention, and includes the following steps S21 to S211.

於步驟S21中,由一使用者利用一使用者裝置9向一互動判別模組11發出一身分驗證需求。 In step S21, a user uses a user device 9 to send an identity verification request to an interactive determination module 11.

於步驟S22中,由該互動判別模組11接收來自該使用者裝置9之身分驗證需求,且依據一資料庫15中預設之複數問題透過該使用者裝置9向一使用者發出至少一驗證問題。 In step S22, the interaction determination module 11 receives an identity verification request from the user device 9, and issues at least one verification question to a user through the user device 9 based on a plurality of questions preset in a database 15.

於步驟S23中,由該使用者透過該使用者裝置9接受來自該互動判別模組11之驗證問題,且該使用者利用該使用者裝置9錄製一答案影音串流,以由該使用者裝置9將該使用者所回覆之答案影音串流回傳至該互動判別模組11。 In step S23, the user accepts the verification question from the interactive determination module 11 through the user device 9, and the user uses the user device 9 to record an answer video stream, so that the user device 9 returns the answer video stream replied by the user to the interactive determination module 11.

於步驟S24中,由該互動判別模組11接收來自該使用者裝置9之答案影音串流後,將該答案影音串流傳送至一人臉辨識模組12。 In step S24, after the interaction determination module 11 receives the answer video stream from the user device 9, the answer video stream is transmitted to a face recognition module 12.

於步驟S25中,由該人臉辨識模組12對該答案影音串流中之該使用者進行人臉辨識,且於該使用者通過人臉辨識後,該人臉辨識模組12從該資料庫15取得該使用者相對應之人員編號,並提供給該互動判別模組11,藉此通知該互動判別模組11該使用者通過人臉辨識。 In step S25, the face recognition module 12 performs face recognition on the user in the answer video stream, and after the user passes the face recognition, the face recognition module 12 obtains the personnel number corresponding to the user from the database 15 and provides it to the interaction determination module 11, thereby notifying the interaction determination module 11 that the user passes the face recognition.

於步驟S26中,由該互動判別模組11確認該使用者是否通過人臉辨識,其中,若該使用者通過人臉辨識,則執行步驟S27;反之,若該使用者並未通過人臉辨識,則執行步驟S29。 In step S26, the interaction determination module 11 confirms whether the user has passed the facial recognition. If the user has passed the facial recognition, step S27 is executed; otherwise, if the user has not passed the facial recognition, step S29 is executed.

於步驟S27中,由該互動判別模組11提供該答案影音串流至該互動式問答模組13及該活體辨識模組14,以由該互動式問答模組13取得該答案影音串流中的該使用者之驗證回答,且依據該人員編號從該資料庫15中取得一正確答案,以判斷該使用者是否回答正確,以及由該活體辨識模組14依據該答案影音串流中之聲音及影像之相似程度判別影音是否為同步,其中,若該使用者為回答正確且該答案影音串流為影音同步,則執行步驟S28;反之,若該使用者為回答錯誤或/及該答案影音串流為影音不同步,則執行步驟S29。 In step S27, the interactive judgment module 11 provides the answer video stream to the interactive question-answer module 13 and the liveness recognition module 14, so that the interactive question-answer module 13 obtains the user's verification answer in the answer video stream, and obtains a correct answer from the database 15 according to the personnel number to determine whether the user's answer is correct, and the liveness recognition module 14 determines whether the video is synchronized according to the similarity between the sound and image in the answer video stream. If the user's answer is correct and the answer video stream is synchronized, step S28 is executed; otherwise, if the user's answer is incorrect or/and the answer video stream is not synchronized, step S29 is executed.

於步驟S28中,由該互動判別模組11判斷是否有下一題驗證問題,其中,若具有下一題驗證問題,則判斷該使用者所回覆之另一答案影音串流是否回答正確且影音同步;反之,若無下一題驗證問題,則執行步驟S210。 In step S28, the interaction determination module 11 determines whether there is a next verification question. If there is a next verification question, it determines whether the other answer video stream replied by the user is correct and the video and audio are synchronized; otherwise, if there is no next verification question, step S210 is executed.

於步驟S29中,由該互動判別模組11確認該使用者未通過身分識別認證,以將驗證失敗之身分驗證結果傳送至該使用者裝置9。 In step S29, the interaction determination module 11 confirms that the user has not passed the identity authentication, and transmits the identity authentication result of the failed authentication to the user device 9.

於步驟S210中,由該互動判別模組11確認該使用者通過身分識別認證,以將驗證成功之身分驗證結果傳送至該使用者裝置9。 In step S210, the interaction determination module 11 confirms that the user has passed the identity authentication, and transmits the successful identity authentication result to the user device 9.

於步驟S211中,由該使用者裝置9接收來自該互動判別模組11之身分驗證結果(包含身分驗證成功或身分驗證失敗),以通知該使用者。 In step S211, the user device 9 receives the identity verification result (including identity verification success or identity verification failure) from the interaction determination module 11 to notify the user.
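
Putting steps S21 to S211 together, the server-side control flow can be summarized by the sketch below. The module objects and their method names (`recognize_face`, `check_answer`, `is_live`) are hypothetical stand-ins for the modules described above, not an interface defined by the patent.

```python
def verify_identity(answer_streams, face_module, qa_module, liveness_module, questions):
    """Server-side flow for one verification session (steps S22 to S210, simplified).
    `answer_streams` holds one answer video stream per verification question."""
    personnel_id = None
    for question, stream in zip(questions, answer_streams):
        # S24-S26: face recognition must succeed before the answer is evaluated.
        if personnel_id is None:
            personnel_id = face_module.recognize_face(stream)
            if personnel_id is None:
                return {"verified": False, "reason": "face recognition failed"}
        # S27: the answer must be correct AND the stream must be audio-visually live.
        if not qa_module.check_answer(stream, personnel_id, question):
            return {"verified": False, "reason": "wrong answer"}
        if not liveness_module.is_live(stream):
            return {"verified": False, "reason": "liveness check failed"}
        # S28: continue with the next verification question, if any.
    # S210: every question answered correctly by a live user.
    return {"verified": True, "personnel_id": personnel_id}

# Minimal stubs so the flow can be exercised end to end.
class _Stub:
    def recognize_face(self, s): return "P0001"
    def check_answer(self, s, pid, q): return True
    def is_live(self, s): return True

print(verify_identity(["stream1"], _Stub(), _Stub(), _Stub(),
                      ["Which elementary school did you attend?"]))
```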

此外,本發明還揭示一種電腦可讀媒介,係應用於具有處理器(例如,CPU、GPU等)及/或記憶體的計算裝置或電腦中,且儲存有指令,並可利用此計算裝置或電腦透過處理器及/或記憶體執行此電腦可讀媒介,以於執行此電腦可讀媒介時執行上述之方法及各步驟。 In addition, the present invention also discloses a computer-readable medium, which is applied to a computing device or computer having a processor (e.g., CPU, GPU, etc.) and/or a memory, and stores instructions, and the computing device or computer can execute the computer-readable medium through the processor and/or memory to execute the above-mentioned method and each step when executing the computer-readable medium.

圖3A至圖3C係為本發明之互動式活體識別身分驗證系統1之具體實施例之示意圖,且此實施例與上述實施例相同處,不再贅述。 Figures 3A to 3C are schematic diagrams of a specific embodiment of the interactive liveness identification authentication system 1 of the present invention, and the similarities between this embodiment and the above embodiment will not be repeated.

於本實施例中,在一使用者要使用如線上客服等服務功能前,該使用者利用一智慧型手機9(即使用者裝置)先透過網路連接至一互動式活體識別身分驗證系統1,以進行身份驗證,其中,該智慧型手機9向該互動式活體識別身分驗證系統1中之互動判別模組11發出一身分驗證需求,且由該互動判別模組11再回傳至少一驗證問題至該智慧型手機9。舉例而言,如圖3A所示,該互動判別模組11回傳「請問你就讀什麼國小?」之驗證問題至該智慧型手機9,以顯示該驗證問題於該智慧型手機9之螢幕上。此時,如圖3B所示,該使用者開啟該智慧型 手機9之鏡頭9a,以錄製回覆為「中壢國小」之答案影音串流,且該答案影音串流中包含該使用者之人臉。 In this embodiment, before a user wants to use service functions such as online customer service, the user first uses a smart phone 9 (i.e., a user device) to connect to an interactive biometric identity verification system 1 through a network to perform identity verification, wherein the smart phone 9 sends an identity verification request to the interactive determination module 11 in the interactive biometric identity verification system 1, and the interactive determination module 11 then returns at least one verification question to the smart phone 9. For example, as shown in FIG. 3A , the interactive determination module 11 returns the verification question “What elementary school do you attend?” to the smart phone 9, so that the verification question is displayed on the screen of the smart phone 9. At this time, as shown in FIG. 3B , the user turns on the camera 9a of the smart phone 9 to record the answer video stream of "Zhongli Elementary School", and the answer video stream includes the user's face.

接著,該智慧型手機9將該答案影音串流傳送回傳至該互動判別模組11,而該互動判別模組11將該答案影音串流傳送至一人臉辨識模組12,以由該人臉辨識模組12對該答案影音串流中之該使用者進行人臉辨識,其中,於該使用者通過人臉辨識後,該人臉辨識模組12從一資料庫15取得該使用者相對應之人員編號,以提供給該互動判別模組11,並通知該使用者通過人臉辨識。 Then, the smart phone 9 transmits the answer video stream back to the interactive determination module 11, and the interactive determination module 11 transmits the answer video stream to a face recognition module 12, so that the face recognition module 12 performs face recognition on the user in the answer video stream. After the user passes the face recognition, the face recognition module 12 obtains the corresponding personnel number of the user from a database 15, provides it to the interactive determination module 11, and notifies the user that the face recognition has been passed.

對此,該互動判別模組11提供該答案影音串流至該互動式問答模組13及該活體辨識模組14,其中,由該互動式問答模組13取得該答案影音串流中的該使用者之驗證回答(即「中壢國小」),再依據該人員編號從該資料庫15中取得一正確答案(即「中壢國小」),藉此該互動式問答模組13比對出該驗證回答與正確答案為一致,故判斷該使用者回答正確。 In this regard, the interactive determination module 11 provides the answer video stream to the interactive question-and-answer module 13 and the liveness recognition module 14, wherein the interactive question-and-answer module 13 obtains the user's verification answer (i.e. "Zhongli Elementary School") in the answer video stream, and then obtains a correct answer (i.e. "Zhongli Elementary School") from the database 15 according to the personnel number, whereby the interactive question-and-answer module 13 compares the verification answer and the correct answer to be consistent, and thus determines that the user's answer is correct.

再者,該活體辨識模組14依據該答案影音串流中之聲音及影像之相似程度判別是否為影音同步。具言之,由該活體辨識模組14將該答案影音串流切分為該影像及該聲音之檔案後,從該影像擷取該使用者之連續人臉影像及從該聲音擷取出該使用者之連續MFCC特徵,其中,該連續人臉影像及該連續MFCC特徵之精細度視其取樣率而定,例如:每秒取25張連續人臉影像及25組連續MFCC特徵(即聲音特徵),且每一組連續MFCC特徵包含4張MFCC特徵,以使每張人臉影像及每組MFCC特徵可精細至40毫秒。 Furthermore, the liveness recognition module 14 determines whether the video and audio are synchronized according to the similarity between the sound and the image in the answer video and audio stream. Specifically, after the liveness recognition module 14 divides the answer video and audio stream into the image and the sound files, it captures the user's continuous facial image from the image and the user's continuous MFCC features from the sound. The precision of the continuous facial image and the continuous MFCC features depends on the sampling rate, for example: 25 continuous facial images and 25 sets of continuous MFCC features (i.e., sound features) are taken per second, and each set of continuous MFCC features contains 4 MFCC features, so that each facial image and each set of MFCC features can be as precise as 40 milliseconds.

之後，由該活體辨識模組14計算出該連續人臉影像及該連續MFCC特徵之間的複數餘弦相似性之數值，且利用統計學方式得到一同步相似值，以判斷該同步相似值是否大於一門閥值，且於該連續人臉影像及該連續MFCC特徵之間的該同步相似值大於一門閥值時，判斷該答案影音串流達到影音同步，故該活體辨識模組14判別該使用者為活體。此時，由該互動判別模組11再判斷是否有下一題驗證問題，例如：「你養的第一隻寵物叫甚麼名字？」，其中，若無下一題驗證問題，則如圖3C所示，由該互動判別模組11確認該使用者通過身分識別認證，以將驗證成功之身分驗證結果傳送至該智慧型手機9，以通知該使用者身分驗證成功，藉此令該使用者能繼續使用後續的線上客服等服務。 Thereafter, the liveness recognition module 14 calculates the plurality of cosine-similarity values between the continuous facial images and the continuous MFCC features, obtains a synchronization similarity value by statistical means, and checks whether the synchronization similarity value is greater than a threshold; when the synchronization similarity value between the continuous facial images and the continuous MFCC features is greater than the threshold, the answer video stream is judged to be audio-visually synchronized, so the liveness recognition module 14 determines that the user is a live person. The interactive determination module 11 then checks whether there is a next verification question, for example, "What is the name of the first pet you raised?" If there is no next verification question, as shown in FIG. 3C, the interactive determination module 11 confirms that the user has passed identity authentication and transmits the successful verification result to the smartphone 9 to notify the user that identity verification succeeded, so that the user can continue to use subsequent services such as online customer service.

綜上所述,本發明之互動式活體識別身分驗證系統、方法及其電腦可讀媒介,係接收到使用者裝置之身分驗證需求後,對使用者提出問題,藉此得到一答案影音串流,並對該答案影音串流進行人臉辨識後取得正確答案,以確認使用者之驗證回答是否正確,同時判別該答案影音串流是否為影音同步,藉此確認該答案影音串流中之使用者是否為活體。是以,本發明藉由判斷驗證回答之正確性及答案影音串流之同步性,能準確地確認答案影音串流的真實性,以避免有心人士利用人工智慧所生成之偽造影音而通過身分認證,更可使系統服務提供者直接使用原先資料庫中之問答資料進行身分驗證,而無須花費更大的成本建立用戶的活體資料庫,藉此提升線上服務之安全性及降低身分驗證之成本。 In summary, the interactive liveness identification authentication system, method and computer-readable medium of the present invention, after receiving the authentication request from the user device, asks the user a question to obtain an answer video stream, and obtains the correct answer after performing facial recognition on the answer video stream to confirm whether the user's authentication answer is correct, and at the same time determines whether the answer video stream is synchronized with the video, thereby confirming whether the user in the answer video stream is alive. Therefore, the present invention can accurately confirm the authenticity of the answer video stream by judging the correctness of the verification answer and the synchronization of the answer video stream, so as to prevent malicious people from using fake videos generated by artificial intelligence to pass identity authentication. It can also enable system service providers to directly use the question and answer data in the original database for identity authentication without spending more costs to establish a user's live database, thereby improving the security of online services and reducing the cost of identity authentication.

再者,本發明之互動式活體識別身分驗證系統、方法及其電腦可讀媒介至少具有以下技術差異及其功效: Furthermore, the interactive liveness identification authentication system, method and computer-readable medium of the present invention have at least the following technical differences and functions:

一、本發明係將影音中一連串影像(如連續人臉影像)與聲音特徵(連續MFCC特徵)輸入基於類神經網路之活體辨識模型，以將影像及聲音轉成多維度空間上的向量，再計算聲音和影像於維度空間之距離，以判斷是否影像同步，藉此確認該使用者是否為活體。是以，相較於現有影音同步判別技術僅針對唇語及語音之單字元，以相對應之時間點做識別，本發明可更精細地對影音進行分析，大幅提升判斷的準確度。 1. The present invention feeds a series of images in the video (e.g., continuous facial images) and audio features (continuous MFCC features) into a neural-network-based liveness recognition model to convert the images and audio into vectors in a multi-dimensional space, and then computes the distance between the audio and the images in that space to judge whether they are synchronized, thereby confirming whether the user is a live person. Compared with existing audio-visual synchronization techniques, which only recognize single characters of lip movement and speech at corresponding time points, the present invention analyzes the audio and video at a finer granularity, greatly improving the accuracy of the judgment.

二、本發明之互動式活體識別身分驗證系統、方法及其電腦可讀媒介能廣泛地應用於各個產業,例如:通訊、醫療、金融、學校、網路銷售產業,且應用的線上服務產品也很豐富,例如:電信客服、線上金融服務系統、線上銷售客服系統、線上就醫系統,以快速且便利地提供更安全的線上驗證服務環境。 2. The interactive liveness identification authentication system, method and computer-readable medium of the present invention can be widely used in various industries, such as communications, medical, finance, schools, and online sales industries, and the applied online service products are also rich, such as telecommunications customer service, online financial service system, online sales customer service system, and online medical treatment system, so as to quickly and conveniently provide a safer online authentication service environment.

上述實施形態僅例示性說明本發明之原理及其功效,而非用於限制本發明。任何熟習此項技藝之人士均可在不違背本發明之精神及範疇下,對上述實施形態進行修飾與改變。因此,本發明之權利保護範圍應如申請專利範圍所列。 The above implementation forms are merely illustrative of the principles and effects of the present invention, and are not intended to limit the present invention. Anyone familiar with this art may modify and change the above implementation forms without violating the spirit and scope of the present invention. Therefore, the scope of protection of the present invention should be as listed in the scope of the patent application.

1:互動式活體識別身分驗證系統 1: Interactive liveness identification authentication system

11:互動判別模組 11: Interaction identification module

12:人臉辨識模組 12: Face recognition module

13:互動式問答模組 13:Interactive question and answer module

14:活體辨識模組 14: Liveness recognition module

14a:活體辨識模型 14a: Liveness recognition model

15:資料庫 15: Database

9:使用者裝置(智慧型手機) 9: User device (smartphone)

Claims (11)

一種互動式活體識別身分驗證系統,係包含: An interactive liveness identification authentication system comprises: 一互動判別模組,係接收到來自一使用者裝置之身分驗證需求後,發出至少一驗證問題至該使用者裝置,且接收該使用者裝置所回覆的一使用者之答案影音串流; An interactive identification module, after receiving an identity verification request from a user device, sends at least one verification question to the user device, and receives a video stream of a user's answer replied by the user device; 一互動式問答模組,係通訊連接該互動判別模組,以接收來自該互動判別模組之該答案影音串流,以取得該答案影音串流中的該使用者之驗證回答,再由該互動式問答模組利用一人員編號取得一正確答案後,判斷該使用者之驗證回答是否正確;以及 An interactive question-and-answer module is communicatively connected to the interactive determination module to receive the answer video stream from the interactive determination module to obtain the user's verification answer in the answer video stream, and then the interactive question-and-answer module obtains a correct answer using a personnel number to determine whether the user's verification answer is correct; and 一活體辨識模組,係通訊連接該互動判別模組,以接收來自該互動判別模組之該答案影音串流,俾判別該答案影音串流中之影像及聲音之相似程度是否達到影音同步,進而判斷該答案影音串流中之該使用者是否為活體, A liveness recognition module is connected to the interactive identification module in communication to receive the answer video stream from the interactive identification module, so as to determine whether the similarity between the image and sound in the answer video stream reaches video synchronization, and further determine whether the user in the answer video stream is live. 其中,於該互動式問答模組判斷該使用者為回答正確,且該活體辨識模組判別該使用者為活體時,該互動判別模組確認該使用者通過身分識別認證,以將驗證成功之身分驗證結果傳送至該使用者裝置;反之,於該互動式問答模組判斷該使用者回答錯誤,或該活體辨識模組判別該使用者為非活體時,該互動判別模組確認該使用者未通過身分識別認證,以將驗證失敗之身分驗證結果傳送至該使用者裝置。 When the interactive question-and-answer module determines that the user's answer is correct and the liveness recognition module determines that the user is alive, the interactive recognition module confirms that the user has passed the identity recognition authentication, and transmits the identity verification result of successful authentication to the user device; on the contrary, when the interactive question-and-answer module determines that the user's answer is incorrect, or the liveness recognition module determines that the user is not alive, the interactive recognition module confirms that the user has not passed the identity recognition authentication, and transmits the identity verification result of failed authentication to the user device. 如請求項1所述之互動式活體識別身分驗證系統,更包含一人臉辨識模組,係通訊連接該互動判別模組,以接收來自該互動判別模組之該答案影音串流,且對該答案影音串流中之該使用者進行人臉辨識,其中,當該使用者通 過人臉辨識後從一資料庫取得該人員編號,以通知該互動判別模組該使用者通過人臉辨識。 The interactive liveness identification authentication system as described in claim 1 further includes a face recognition module, which is communicatively connected to the interactive determination module to receive the answer video stream from the interactive determination module, and perform face recognition on the user in the answer video stream, wherein when the user passes the face recognition, the person number is obtained from a database to notify the interactive determination module that the user passes the face recognition. 
如請求項1所述之互動式活體識別身分驗證系統,其中,該活體辨識模組從該影像擷取該使用者之連續人臉影像及從該聲音擷取出該使用者之連續MFCC特徵,以計算出該連續人臉影像及該連續MFCC特徵之間的複數餘弦相似性之數值,俾從該複數餘弦相似性之數值中得到一同步相似值,進而依據該同步相似值判別該答案影音串流是否為同步。 The interactive liveness identification authentication system as described in claim 1, wherein the liveness identification module captures the user's continuous facial images from the image and the user's continuous MFCC features from the voice to calculate the value of the complex cosine similarity between the continuous facial images and the continuous MFCC features, so as to obtain a synchronization similarity value from the value of the complex cosine similarity, and then determine whether the answer video stream is synchronized based on the synchronization similarity value. 如請求項3所述之互動式活體識別身分驗證系統,其中,該活體辨識模組更包含一活體辨識模型,以由該活體辨識模型將該連續人臉影像及該連續MFCC特徵經深度學習運算後,輸出相對應的多維度空間之影像向量組及聲音向量組,進而依據該影像向量組及該聲音向量組計算出該連續人臉影像及該連續MFCC特徵之間的該複數餘弦相似性之數值。 The interactive liveness identification authentication system as described in claim 3, wherein the liveness identification module further includes a liveness identification model, which outputs the corresponding image vector group and sound vector group in the multi-dimensional space after deep learning operation on the continuous face image and the continuous MFCC feature, and then calculates the value of the complex cosine similarity between the continuous face image and the continuous MFCC feature according to the image vector group and the sound vector group. 如請求項1所述之互動式活體識別身分驗證系統,其中,該互動判別模組判斷是否有下一題驗證問題,且於具有下一題驗證問題時,由該互動式問答模組及該活體辨識模組判斷該使用者裝置所回覆之另一答案影音串流中之該使用者是否回答正確且為活體。 The interactive liveness identification authentication system as described in claim 1, wherein the interactive judgment module determines whether there is a next verification question, and when there is a next verification question, the interactive question-answering module and the liveness identification module determine whether the user in the other answer video stream replied by the user device has answered correctly and is alive. 一種互動式活體識別身分驗證方法,係包含: An interactive liveness identification authentication method includes: 由一互動判別模組接收到來自一使用者裝置之身分驗證需求後,發出至少一驗證問題至該使用者裝置,且接收該使用者裝置所回覆的一使用者之答案影音串流; After receiving an identity verification request from a user device, an interactive identification module sends at least one verification question to the user device, and receives a video stream of a user's answer replied by the user device; 由一互動式問答模組接收來自該互動判別模組之該答案影音串流,以取得該答案影音串流中的該使用者之驗證回答; An interactive question-and-answer module receives the answer video stream from the interactive judgment module to obtain the user's verification answer in the answer video stream; 由該互動式問答模組利用一人員編號取得一正確答案後,判斷該使用者之驗證回答是否正確;以及 After the interactive question-and-answer module obtains a correct answer using a person ID, it determines whether the user's verification answer is correct; and 由一活體辨識模組接收來自該互動判別模組之該答案影音串流,以判別該答案影音串流中之影像及聲音之相似程度是否達到影音同步,進而判斷該答案影音串流中之該使用者是否為活體, A liveness recognition module receives the answer video stream from the interaction determination module to determine whether the similarity between the image and sound in the answer video stream reaches video and audio synchronization, and further determines whether the user in the answer video stream is alive. 
其中,於該互動式問答模組判斷該使用者為回答正確,且該活體辨識模組判別該使用者為活體時,該互動判別模組確認該使用者通過身分識別認證,以將驗證成功之身分驗證結果傳送至該使用者裝置;反之,於該互動式問答模組判斷該使用者回答錯誤,或該活體辨識模組判別該使用者為非活體時,該互動判別模組確認該使用者未通過身分識別認證,以將驗證失敗之身分驗證結果傳送至該使用者裝置。 When the interactive question-and-answer module determines that the user's answer is correct and the liveness recognition module determines that the user is alive, the interactive recognition module confirms that the user has passed the identity recognition authentication, and transmits the identity verification result of successful authentication to the user device; on the contrary, when the interactive question-and-answer module determines that the user's answer is incorrect, or the liveness recognition module determines that the user is not alive, the interactive recognition module confirms that the user has not passed the identity recognition authentication, and transmits the identity verification result of failed authentication to the user device. 如請求項6所述之互動式活體識別身分驗證方法,更包含由一人臉辨識模組接收來自該互動判別模組之該答案影音串流,以對該答案影音串流中之該使用者進行人臉辨識,其中,當該使用者通過人臉辨識後從一資料庫取得該人員編號,以通知該互動判別模組該使用者通過人臉辨識。 The interactive liveness identification authentication method as described in claim 6 further includes receiving the answer video stream from the interactive determination module by a face recognition module to perform face recognition on the user in the answer video stream, wherein after the user passes the face recognition, the person number is obtained from a database to notify the interactive determination module that the user has passed the face recognition. 如請求項6所述之互動式活體識別身分驗證方法,更包含由該活體辨識模組從該影像擷取該使用者之連續人臉影像及從該聲音擷取出該使用者之連續MFCC特徵,以計算出該連續人臉影像及該連續MFCC特徵之間的複數餘弦相似性之數值,俾從該複數餘弦相似性之數值中得到一同步相似值,進而依據該同步相似值判別該答案影音串流是否為同步。 The interactive liveness identification authentication method as described in claim 6 further includes the liveness identification module capturing the user's continuous facial images from the image and the user's continuous MFCC features from the voice to calculate the value of the complex cosine similarity between the continuous facial images and the continuous MFCC features, so as to obtain a synchronization similarity value from the value of the complex cosine similarity, and then determine whether the answer video stream is synchronized based on the synchronization similarity value. 如請求項8所述之互動式活體識別身分驗證方法,更包含由該活體辨識模組將該連續人臉影像及該連續MFCC特徵經深度學習運算後,輸出相對 應的多維度空間之影像向量組及聲音向量組,進而依據該影像向量組及該聲音向量組計算出該連續人臉影像及該連續MFCC特徵之間的該複數餘弦相似性之數值。 The interactive liveness identification authentication method as described in claim 8 further includes the liveness identification module performing deep learning operations on the continuous facial images and the continuous MFCC features, outputting corresponding image vector groups and sound vector groups in a multi-dimensional space, and then calculating the value of the complex cosine similarity between the continuous facial images and the continuous MFCC features based on the image vector group and the sound vector group. 如請求項6所述之互動式活體識別身分驗證方法,更包含由該互動判別模組判斷是否有下一題驗證問題,且於具有下一題驗證問題時,由該互動式問答模組及該活體辨識模組判斷該使用者裝置所回覆之另一答案影音串流中之該使用者是否回答正確且為活體。 The interactive liveness identification authentication method as described in claim 6 further includes the interactive determination module determining whether there is a next verification question, and when there is a next verification question, the interactive question-and-answer module and the liveness identification module determine whether the user in the other answer video stream replied by the user device has answered correctly and is alive. 
一種電腦可讀媒介,應用於計算裝置或電腦中,係儲存有指令,以執行如請求項6至10之任一者所述之互動式活體識別身分驗證方法。 A computer-readable medium, used in a computing device or a computer, stores instructions for executing an interactive liveness identification authentication method as described in any one of claims 6 to 10.
TW112120359A 2023-05-31 A system, method and computer-readable medium thereof for interaction biometrics identity verification TWI906620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112120359A TWI906620B (en) 2023-05-31 A system, method and computer-readable medium thereof for interaction biometrics identity verification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW112120359A TWI906620B (en) 2023-05-31 A system, method and computer-readable medium thereof for interaction biometrics identity verification

Publications (2)

Publication Number Publication Date
TW202450270A 2024-12-16
TWI906620B TWI906620B (en) 2025-12-01


Similar Documents

Publication Publication Date Title
US6810480B1 (en) Verification of identity and continued presence of computer users
US10565433B2 (en) Age invariant face recognition using convolutional neural networks and set distances
CN109726624B (en) Identity authentication method, terminal device and computer readable storage medium
CN113515988B (en) Palm print recognition method, feature extraction model training method, device and medium
US10268910B1 (en) Authentication based on heartbeat detection and facial recognition in video data
CN104834849B (en) Dual-factor identity authentication method and system based on Application on Voiceprint Recognition and recognition of face
CN113656761B (en) Business processing method and device based on biological recognition technology and computer equipment
CN110853646A (en) Method, apparatus, device and readable storage medium for distinguishing speech roles at conference
KR20140138991A (en) Authentication method, device and system based on biological characteristics
WO2023273616A1 (en) Image recognition method and apparatus, electronic device, storage medium
WO2021191659A1 (en) Liveness detection using audio-visual inconsistencies
WO2020007191A1 (en) Method and apparatus for living body recognition and detection, and medium and electronic device
CN114398611A (en) Bimodal identity authentication method, device and storage medium
WO2022268183A1 (en) Video-based random gesture authentication method and system
Chou Presentation attack detection based on score level fusion and challenge-response technique
CN116883900A (en) A video authenticity identification method and system based on multi-dimensional biometric features
CN111783939A (en) Voiceprint recognition model training method, device, mobile terminal and storage medium
CN204576520U (en) Based on the Dual-factor identity authentication device of Application on Voiceprint Recognition and recognition of face
WO2023109551A1 (en) Living body detection method and apparatus, and computer device
CN105138886B (en) Robot biometric identification system
TWI906620B (en) A system, method and computer-readable medium thereof for interaction biometrics identity verification
TW202450270A (en) A system, method and computer-readable medium thereof for interaction biometrics identity verification
CN111611569A (en) Face voiceprint rechecking terminal and identity authentication method thereof
CN116847061A (en) Audio lip synchronization detection method, device, equipment and storage medium
CN111507124A (en) Non-contact video lie detection method and system based on deep learning