KR20210026982A

KR20210026982A - Method and apparatus for recognizing user based on on-device training

Info

Publication number: KR20210026982A
Application number: KR1020190127239A
Authority: KR
Inventors: 이도환; 김규홍; 한재준
Original assignee: 삼성전자주식회사
Priority date: 2019-09-02
Filing date: 2019-10-14
Publication date: 2021-03-10

Abstract

A user recognition method and apparatus based on on-device training are disclosed. A user recognition method according to an embodiment includes performing on-device training on a feature extractor based on user data and reference data corresponding to generalized users, determining a registered feature vector based on an output of the feature extractor in response to input of the user data, determining a test feature vector based on an output of the feature extractor in response to input of test data, and performing user recognition on a test user based on a comparison between the registered feature vector and the test feature vector.

Description

User recognition method and device based on on-device training {METHOD AND APPARATUS FOR RECOGNIZING USER BASED ON ON-DEVICE TRAINING}

The following embodiments relate to a user recognition method and apparatus based on on-device training.

Technical automation of the recognition process has been implemented, for example, through a processor-implemented neural network model serving as a specialized computational architecture, which, after considerable training, can provide a computationally intuitive mapping between input patterns and output patterns. The trained ability to generate such a mapping may be called the learning ability of the neural network. Moreover, because of specialized training, such a specially trained neural network may have a generalization ability to generate a relatively accurate output for, for example, an input pattern on which it has not been trained.

According to an embodiment, a recognition method includes: receiving user data input by a legitimate user for user registration; performing on-device training on a feature extractor based on the user data and reference data corresponding to generalized users; determining a registered feature vector based on an output of the feature extractor in response to input of the user data; receiving test data input by a test user for user recognition; determining a test feature vector based on an output of the feature extractor in response to input of the test data; and performing user recognition on the test user based on a comparison between the registered feature vector and the test feature vector.

The feature extractor may include a first neural network having fixed parameters and a second neural network having adjustable parameters, and the adjustable parameters of the second neural network may be adjusted by the on-device training. The first neural network may be pre-trained, based on a large-scale user database, to extract features from input data. The performing of the on-device training may include: assigning labels of different values to the user data and the reference data, respectively; and performing the on-device training based on a comparison between the labels and outputs of the feature extractor in response to input of the user data and the reference data.
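The labeling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the toy weights `W1`, the single sigmoid output of the second network, the learning rate, and the function names are all hypothetical. The sketch also assumes that the reference data are already feature vectors in the output space of the first network, so they are fed directly into the second network.

```python
import math

# First network: parameters are fixed (frozen). A toy linear map followed by
# tanh stands in for a pre-trained network; the values are hypothetical.
W1 = [[0.5, -0.2], [0.1, 0.8]]

def first_network(x):
    """Frozen feature extraction; W1 is never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def second_network(h, w2, b2):
    """Adjustable layer; a single sigmoid output is an assumption."""
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def train_on_device(user_data, reference_features, epochs=200, lr=0.5):
    """Adjust only the second network, using label 1 for user data and label 0
    for reference data (labels of different values)."""
    w2, b2 = [0.0, 0.0], 0.0
    samples = [(first_network(x), 1.0) for x in user_data]
    samples += [(list(r), 0.0) for r in reference_features]
    for _ in range(epochs):
        for h, label in samples:
            # Gradient of the binary cross-entropy loss with respect to z.
            err = second_network(h, w2, b2) - label
            w2 = [w - lr * err * hi for w, hi in zip(w2, h)]
            b2 -= lr * err
    return w2, b2
```

After training, the second network's output for the legitimate user's features should move toward the user label while its output for the reference vectors moves toward the reference label, and `W1` stays unchanged.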

The feature extractor may include a first neural network having fixed parameters and a second neural network having adjustable parameters, and the performing of the on-device training may include: inputting the user data into the first neural network; inputting, into the second neural network, the reference data and an output of the first neural network in response to the input of the user data; and performing the on-device training based on an output of the second neural network. The reference data may include generalized feature vectors corresponding to the generalized users, and the generalized feature vectors may be generated by clustering feature vectors corresponding to a plurality of general users.
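As one possible reading of the clustering step, the sketch below reduces many per-user feature vectors to a few generalized feature vectors by taking cluster centroids. The text does not name a clustering algorithm; k-means, the deterministic initialization from the first `k` vectors, and the function names are assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def generalized_vectors(vectors, k, iterations=20):
    """Cluster feature vectors with a basic k-means and return the centroids,
    which play the role of generalized feature vectors."""
    # Deterministic initialization: use the first k vectors as centroids.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

Storing only the centroids keeps the reference data small, which suits a device with limited memory.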

The performing of the user recognition may include performing the user recognition based on a comparison between a threshold and a distance between the registered feature vector and the test feature vector. The distance between the registered feature vector and the test feature vector may be determined based on either a cosine distance or a Euclidean distance between the registered feature vector and the test feature vector. The recognition method may further include, when the registered feature vector is determined, storing the determined registered feature vector in a registered user database.
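The distance comparison can be illustrated with a short sketch. The function names, the convention that a distance below the threshold counts as a match, and the assumption of non-zero vectors for the cosine distance are illustrative choices, not details fixed by the text above.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(registered_vector, test_vector, threshold, metric=cosine_distance):
    """Accept the test user when the distance is below the threshold."""
    return metric(registered_vector, test_vector) < threshold
```

The threshold would in practice be tuned to the chosen metric, since cosine and Euclidean distances have different ranges.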

According to another embodiment, a recognition method includes: obtaining a feature extractor including a first neural network having fixed parameters and a second neural network having adjustable parameters; performing on-device training on the feature extractor based on user data corresponding to a legitimate user and reference data corresponding to generalized users; and, when the on-device training is completed, performing user recognition using the feature extractor.

According to an embodiment, an on-device training method for a feature extractor that is mounted on a user device and includes a first neural network having pre-trained, fixed parameters and a second neural network having adjustable parameters includes: obtaining user data input by a legitimate user; inputting the user data into the first neural network; and adjusting the parameters of the second neural network by inputting, into the second neural network, predetermined reference data and an output of the first neural network in response to the input of the user data.

The reference data may include 1000 or fewer feature vectors, 500 or fewer feature vectors, or 100 or fewer feature vectors.

According to an embodiment, a recognition apparatus includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor receives user data input by a legitimate user for user registration, performs on-device training on a feature extractor based on the user data and reference data corresponding to generalized users, determines a registered feature vector based on an output of the feature extractor in response to input of the user data, receives test data input by a test user for user recognition, determines a test feature vector based on an output of the feature extractor in response to input of the test data, and performs user recognition on the test user based on a comparison between the registered feature vector and the test feature vector.

According to another embodiment, a recognition apparatus includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor obtains a feature extractor including a first neural network having fixed parameters and a second neural network having adjustable parameters, performs on-device training on the feature extractor based on user data corresponding to a legitimate user and reference data corresponding to generalized users, and, when the on-device training is completed, performs user recognition using the feature extractor.

FIG. 1 is a diagram illustrating an operation of a recognition device for user registration and user recognition according to an embodiment.
FIG. 2 is a diagram illustrating processes for pre-training, user registration, and user recognition according to an embodiment.
FIG. 3 is a diagram illustrating a detailed process of pre-training according to an embodiment.
FIG. 4 is a diagram illustrating an operation of a recognition device for on-device training and user registration according to an embodiment.
FIG. 5 is a diagram illustrating a detailed process of on-device training according to an embodiment.
FIG. 6 is a diagram illustrating a process of generating a generalized user model according to an embodiment.
FIG. 7 is a diagram illustrating an operation of a recognition device for user recognition according to an embodiment.
FIGS. 8 and 9 are diagrams illustrating a change in the distribution of feature vectors before and after on-device training according to an embodiment.
FIG. 10 is a flowchart illustrating a recognition method based on on-device training according to an embodiment.
FIG. 11 is a flowchart illustrating a recognition method based on on-device training according to another embodiment.
FIG. 12 is a block diagram illustrating a recognition apparatus based on on-device training according to an embodiment.
FIG. 13 is a block diagram illustrating a user device according to an embodiment.

The specific structures and functions disclosed below are presented only to illustrate technical concepts. They may be implemented in various forms other than those disclosed below and do not limit the embodiments of the present specification.

Terms such as "first" and "second" may be used to describe various components, but these terms should be understood only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

A singular expression includes the plural unless the context clearly indicates otherwise. In the present specification, terms such as "include" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant technical field. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present specification.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same elements.

FIG. 1 is a diagram illustrating an operation of a recognition device for user registration and user recognition according to an embodiment. Referring to FIG. 1, the recognition device 110 may register a legitimate user 101 in the recognition device 110 based on user data of the legitimate user 101. There may be one or more legitimate users 101, and one or more legitimate users 101 may be registered in the recognition device 110. A legitimate user 101 is a person who has the right to use the recognition device 110, such as the owner or administrator of the device on which the recognition device 110 is mounted, and may also be referred to as a genuine user. Registering the legitimate user 101 in the recognition device 110 may be referred to as a user registration process. Through the user registration process, identification information of the legitimate user 101 (e.g., a registered feature vector) may be stored in the recognition device 110 or in another device associated with the recognition device 110. Once registration of the legitimate user 101 is complete, the legitimate user 101 may be referred to as a registered user.

The test user 102 is a person whose identity is unknown and who attempts user recognition on the recognition device 110 in order to use it. The test user 102 may be either a legitimate user 101 or an intruder (imposter). An intruder is a person who does not have the right to use the recognition device 110. The recognition device 110 may perform user recognition on the test user 102 by comparing test data of the test user 102 with the user data, and may output a recognition result. Performing user recognition on the test user 102 may be referred to as a user recognition process. The user recognition process may be performed after the user registration process.

User recognition may include user identification and user verification. User verification determines whether the test user 102 is a registered user, and user identification determines which of a plurality of users the test user 102 is. If a plurality of registered users exist and the test user 102 is one of them, user identification may determine which of the registered users the test user 102 is.

The recognition result may include at least one of an identification result and a verification result. For example, when the test user 102 is a registered user, the recognition device 110 may output a verification result indicating successful recognition. In this case, when a plurality of registered users exist, the recognition result may include an identification result indicating which of the registered users the test user 102 corresponds to. When the test user 102 is an intruder, the recognition device 110 may output a verification result indicating failed recognition.

The user data belongs to the legitimate user 101, and the test data belongs to the test user 102. The user data may be input to the recognition device 110 by the legitimate user 101, input to another device that includes the recognition device 110 and then transferred to the recognition device 110, or input to yet another device separate from the recognition device 110 and then transferred to the recognition device 110. Likewise, the test data may be input to the recognition device 110 by the test user 102, input to another device that includes the recognition device 110 and then transferred to the recognition device 110, or input to yet another device separate from the recognition device 110 and then transferred to the recognition device 110.

Data input to the recognition device 110, such as the user data and the test data, may be referred to as input data. The input data may include voice or an image. For example, in speaker recognition, the input data may include speech, voice, or audio. In face recognition the input data may include a face image, in fingerprint recognition a fingerprint image, and in iris recognition an iris image. The recognition device 110 may perform user verification based on at least one of these verification methods. The modality of the user data, test data, reference data, and training data may correspond to the at least one verification method used by the recognition device 110. Speaker recognition is used below as a representative embodiment, but this is only for convenience of description, and the following description may also be applied to verification methods other than speaker recognition.

The recognition device 110 may perform user verification using the feature extractor 120. The feature extractor 120 may include neural networks such as a first neural network 121 and a second neural network 122. At least part of a neural network may be implemented in software, in hardware including a neural processor, or in a combination of software and hardware. For example, the neural network may correspond to a deep neural network (DNN), such as a fully connected network, a deep convolutional network, or a recurrent neural network. A DNN may include a plurality of layers. The plurality of layers may include an input layer, at least one hidden layer, and an output layer.

A neural network may be trained to perform a given operation by mapping input data and output data that are in a nonlinear relationship to each other, based on deep learning. Deep learning is a machine learning technique for solving a given problem from a big data set. Deep learning may be understood as an optimization process that searches for a point at which energy is minimized while training a neural network with prepared training data. Through supervised or unsupervised deep learning, weights corresponding to the structure or model of the neural network may be obtained, and the input data and output data may be mapped to each other through these weights. Although FIG. 1 shows the feature extractor 120 as located outside the recognition device 110, the feature extractor 120 may be located inside the recognition device 110.

The recognition device 110 may input the input data into the feature extractor 120 and, based on the output of the feature extractor 120 in response to that input, register a user in the recognition device 110 or generate a recognition result. According to an embodiment, the recognition device 110 may apply certain preprocessing to the input data and input the preprocessed input data into the feature extractor 120. Through preprocessing, the input data may be transformed into a form suitable for feature extraction by the feature extractor 120. For example, when the input data is a speech waveform, the waveform may be converted into a frequency spectrum through preprocessing.
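As a rough illustration of such preprocessing, the sketch below converts a short waveform into a magnitude spectrum with a naive discrete Fourier transform. The function name `magnitude_spectrum` and the use of a plain DFT are assumptions; the description does not specify the transform, and a practical system would likely add framing and windowing.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Return DFT magnitudes for frequency bins 0 .. N/2 of a real signal."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum

# Usage: a cosine with 2 cycles over 8 samples concentrates energy in bin 2.
wave = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = magnitude_spectrum(wave)
```

The naive DFT is quadratic in the number of samples; it is used here only to keep the example dependency-free.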

The feature extractor 120 may output output data in response to input of the input data. The output data of the feature extractor 120 may be referred to as a feature vector. Alternatively, because it contains identification information of the user, the output data of the feature extractor 120 may be referred to as an embedding vector. In the user registration process of the legitimate user 101, the feature extractor 120 may output a feature vector in response to input of the user data. This feature vector may be referred to as a registered feature vector, and may be stored, as identification information of the legitimate user 101, in the recognition device 110 or in another device associated with the recognition device 110. In the user recognition process of the test user 102, the feature extractor 120 may output a feature vector in response to input of the test data. This feature vector may be referred to as a test feature vector.

The recognition device 110 may generate a recognition result by comparing the registered feature vector with the test feature vector. For example, the recognition device 110 may determine a distance between the registered feature vector and the test feature vector, and generate the recognition result based on a comparison between the determined distance and a threshold. When the determined distance is smaller than the threshold, the registered feature vector and the test feature vector may be said to match each other.

When there are a plurality of registered users, there may be a plurality of registered feature vectors, one for each registered user. In this case, the recognition device 110 may generate a recognition result by comparing the test feature vector with each registered feature vector. For example, when the test feature vector matches any one of the plurality of registered feature vectors, the recognition device 110 may output a recognition result indicating successful recognition. The recognition result may then include an identification result for the registered user corresponding to the registered feature vector that matched the test feature vector, that is, an identification result indicating which of the plurality of registered users the test user 102 corresponds to.
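A minimal sketch of identification across several registered users follows. The dictionary layout, the user names, the Euclidean metric, and the rule of returning the closest registered user among those under the threshold are assumptions for illustration; the text only requires that the test feature vector match one of the registered feature vectors.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(registered, test_vector, threshold):
    """Compare the test vector with every registered feature vector.

    `registered` maps a (hypothetical) user id to its registered feature
    vector. Returns the id of the closest registered user if that distance is
    below the threshold, or None when recognition fails.
    """
    best_id, best_distance = None, float("inf")
    for user_id, vector in registered.items():
        distance = euclidean_distance(vector, test_vector)
        if distance < best_distance:
            best_id, best_distance = user_id, distance
    return best_id if best_distance < threshold else None

# Usage with two hypothetical registered users.
registered_users = {"alice": [0.0, 0.0], "bob": [5.0, 5.0]}
```

Returning `None` corresponds to a failed recognition, for example when the test user is an intruder.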

The feature extractor 120 may include the first neural network 121 and the second neural network 122. The first neural network 121 may be trained in advance based on a large-scale user database, and the second neural network 122 may be additionally trained based on the user data in the user registration process. Here, "in advance" may refer to a point in time before the user registration process is performed, for example, when the feature extractor 120 is developed and produced. The large-scale user database may correspond to unspecified users, and the user data may correspond to a specific user such as the legitimate user 101. Training of the first neural network 121 may be performed by a server in the development and production stage of the feature extractor 120, and may be referred to as pre-training or primary training. Training of the second neural network 122 may be performed in the user registration process by the device that includes the recognition device 110, and may be referred to as on-device training or secondary training. In on-device training, "device" may refer to the user device on which the recognition device 110 is mounted.

The first neural network 121 may have fixed parameters, and the second neural network 122 may have adjustable parameters. The parameters may include weights. Once the first neural network 121 has been trained by pre-training, its parameters are fixed and are not changed by on-device training. Fixed parameters may also be described as frozen. The parameters of the second neural network 122 may be adjusted by on-device training. If the first neural network 121 extracts features from the input data in a general manner, the second neural network 122 may be understood as remapping the features extracted by the first neural network 121 so that they are specialized to the users of an individual device.

In user recognition, a mismatch between training data and actual user data may degrade recognition performance. For example, because actual user data is not used in the pre-training of the first neural network 121, the recognition performance of a feature extractor 120 consisting only of the first neural network 121 may not be high. Because the on-device training of the second neural network 122 is performed based on actual user data, it may help resolve this mismatch. For example, when a general feature extractor to which only pre-training has been applied is used, it may be difficult to distinguish users with similar features, such as family members. In contrast, when the feature extractor 120 according to the embodiment is used, the actual user data of each user is used for on-device training, so users with similar features may be distinguished relatively accurately.

Furthermore, the on-device training according to the embodiment may use not only user data but also reference data corresponding to generalized users. Through on-device training using the user data and the reference data, the feature extractor 120 may extract, from the user data, features that are distinguished from the reference data. Accordingly, the feature vector of an intruder and the feature vector of a registered user may be distinguished more clearly, and recognition performance may be improved. On-device training using user data and reference data is described in detail later.

FIG. 2 is a diagram illustrating processes for pre-training, user registration, and user authentication according to an embodiment. Referring to FIG. 2, pre-training is performed in step 210. The pre-training may be performed based on a large-scale user database corresponding to unspecified users, and the first neural network 201 of the feature extractor 200 may be trained through the pre-training. The pre-training may be performed on the server side, and after step 210 is performed, the feature extractor 200 may be mounted on a device and distributed to a user.

When user data is input by a legitimate user for user registration, on-device training is performed in step 220 and user registration is performed in step 230. Steps 220 and 230 may be referred to as the user registration process. On-device training may be understood as being performed within the user registration process. The on-device training may be performed based on user data of a specific user, such as the legitimate user, and reference data corresponding to generalized users, and the second neural network 202 of the feature extractor 200 may be trained through the on-device training. Before the on-device training is performed, the second neural network 202 may be in a state initialized with an identity matrix.
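One consequence of the identity initialization can be illustrated with a short sketch: before any on-device training, the second network passes the first network's feature vector through unchanged. The sketch assumes, for illustration only, that the second network is a single linear layer; the feature values and helper names are hypothetical.

```python
def identity(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical feature vector produced by the pre-trained first network.
first_output = [0.2, -1.5, 0.7]

# Second network before on-device training: identity-initialized weights.
second_weights = identity(len(first_output))
extractor_output = matvec(second_weights, first_output)
```

Starting from the identity means the untrained extractor behaves exactly like the pre-trained first network, so on-device training only has to learn a correction.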

After step 220, the feature extractor 200 may be specialized to the registered user. After the on-device training is completed, the user data of the legitimate user may be input to the feature extractor 200 in step 230. A registered feature vector may be determined based on the output of the feature extractor 200 in response to the input of the user data. Once determined, the registered feature vector may be stored in a registered user database.

User recognition is performed in step 240. Step 240 may be referred to as the user recognition process. Test data input by a test user for user recognition may be input to the feature extractor 200. A test feature vector may be determined based on the output of the feature extractor 200 in response to the input of the test data. User recognition for the test user may be performed based on a comparison between the registered feature vector and the test feature vector. Steps 220 to 240 may be performed by the device.

FIG. 3 is a diagram illustrating a detailed process of pre-training according to an embodiment. Referring to FIG. 3, the training apparatus 310 may train the neural network 330 using a large-scale user database 320 so that it extracts features from input data. For example, the large-scale user database 320 may include training data for a large number of unspecified users, and a label may be assigned to each item of training data. The training data may include voice or images. For example, in the case of speaker recognition, the input data may include utterances, voice, or audio.

The neural network 330 includes an input layer 331, at least one hidden layer 332, and an output layer 333. For example, the input layer 331 may correspond to the training data, and the output layer 333 may correspond to an activation function such as softmax. The parameters (e.g., weights) of the at least one hidden layer 332 may be adjusted through pre-training of the neural network 330. A different label may be assigned to each item of training data, and through pre-training based on the training data and the labels, the neural network 330 may acquire the ability to output different output data for different input data. This ability of the neural network 330 may be understood as a feature extraction function.

For example, assume that a first label is assigned to first training data and a second label is assigned to second training data. In this case, the neural network 330 may output first output data in response to the input of the first training data, and may output second output data in response to the input of the second training data. The training apparatus 310 may compare the first output data with the first label and adjust the parameters of the at least one hidden layer 332 in a direction that makes the first output data match the first label. Likewise, the training apparatus 310 may compare the second output data with the second label and adjust the parameters of the at least one hidden layer 332 in a direction that makes the second output data match the second label. The training apparatus 310 may pre-train the neural network 330 by repeating this process over the large-scale user database 320.
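The compare-and-adjust loop described above can be sketched with a softmax classifier trained by gradient descent on a cross-entropy loss. The loss function, the learning rate, the single weight matrix, and the two toy samples are all assumptions made for illustration; the description does not specify them.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def loss(weights, x, label):
    # Cross-entropy between the softmax output and the assigned label.
    return -math.log(forward(weights, x)[label])

def train_step(weights, x, label, lr=0.5):
    # Move the weights in the direction that brings the output closer
    # to the label (gradient of cross-entropy w.r.t. the weights).
    probs = forward(weights, x)
    for k, row in enumerate(weights):
        err = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * err * x[j]

# Hypothetical first and second training data with their labels.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]

loss_before = sum(loss(weights, x, y) for x, y in samples)
for _ in range(20):
    for x, y in samples:
        train_step(weights, x, y)
loss_after = sum(loss(weights, x, y) for x, y in samples)
```

After the loop, each sample's output places most of its probability on its own label, which is the "output matches label" direction the paragraph describes.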

According to an embodiment, the training process may be performed in appropriate batch units. For example, the process of inputting one item of training data to the neural network 330 and obtaining one item of output data corresponding to the output of the neural network 330 for that input may be executed per batch, and pre-training using the large-scale user database 320 may be performed by repeating this batch-wise execution.

The output layer 333 may serve to convert the feature vector output from the at least one hidden layer 332 into a form corresponding to a label. Through pre-training, the parameters of the at least one hidden layer 332 may be set to values according to the training objective, and when the pre-training is completed, the parameters of the at least one hidden layer 332 may be fixed. Thereafter, the output layer 333 may be removed from the neural network 330, and the part 340 including the input layer 331 and the at least one hidden layer 332 may constitute the first neural network of the feature extractor.

When pre-training is completed, the neural network 330 may perform a feature extraction function that outputs different output data for different input data. This feature extraction function may perform best when the training data is identical to the actual data used in the user registration process and the user recognition process. In general, however, the training data and the actual data may differ. In theory, the mismatch between the training data and the actual data could be narrowed, and recognition performance improved, by adding the actual data to the training data and performing retraining.

However, the neural network 330 must be trained with the large-scale user database 320 before it acquires the feature extraction function, and this training requires large-scale computing resources. Because the computing resources of a user device are generally limited, such training may be performed on a high-capacity server. Accordingly, the embodiment provides a two-stage training scheme consisting of pre-training and on-device training, in which the neural network 330 is trained with the large-scale user database 320 to generate the first neural network of the feature extractor, and the second neural network of the feature extractor is generated based on actual data. As a result, the mismatch between the training data and the actual data may be resolved, and a feature extractor specialized to the user device may be provided.

FIG. 4 is a diagram illustrating the operation of an authentication apparatus for on-device training and user registration according to an embodiment. Referring to FIG. 4, a legitimate user may input user data for user registration. The recognition device 410 may perform on-device training of the feature extractor 420 based on the user data. The feature extractor 420 includes a first neural network 421 and a second neural network 422. The parameters of the first neural network 421 may be fixed by pre-training, and the parameters of the second neural network 422 may be adjusted by on-device training. Reference data may be used for the on-device training. The recognition device 410 may obtain the reference data from the generalized user model 430 and input it to the second neural network 422. The user data may correspond to the legitimate user, and the reference data may correspond to generalized users. The generalized user model 430 is described in detail later.

The recognition device 410 may assign a different label to each item of user data and each item of reference data, compare the outputs of the feature extractor 420 in response to the input of each item of user data and reference data with the labels, and adjust the parameters of the second neural network 422. In this way, the recognition device 410 may train the feature extractor 420 so that it outputs mutually distinguishable feature vectors for each item of user data and each item of reference data. Because the feature extractor 420 is trained using the user data and the reference data together, not only may the registered feature vectors of registered users be clearly distinguished from one another, but the registered feature vector of a registered user and the feature vector of an intruder may also be clearly distinguished from each other. Accordingly, on-device training may give the feature extractor 420 both an identification capability, which distinguishes registered users from one another, and an authentication capability, which authenticates a registered user by distinguishing the registered user from an intruder.

When the on-device training is completed, the recognition device 410 may input the user data to the feature extractor 420 and obtain the feature vector output by the feature extractor 420 in response to that input. The recognition device 410 may store the feature vector output by the feature extractor 420 in the registered user database 440 as a registered feature vector. The registered feature vector may later be used in the user authentication process.

FIG. 5 is a diagram illustrating a detailed process of on-device training according to an embodiment. Referring to FIG. 5, the recognition device 510 may perform on-device training of the feature extractor 520 using user data and reference data. The user data may be input to the first neural network 521, and the reference data may be input to the second neural network 522. The reference data may be obtained from the generalized user model 540. The recognition device 510 may input the user data to the first neural network 521, and when the first neural network 521 outputs a feature vector in response to the input of the user data, may input that feature vector to the second neural network 522. The reference data may have been generated using a neural network that performs feature extraction in the same way as the first neural network 521. The output of the first neural network 521 may also be understood as being passed from the first neural network 521 to the second neural network 522 without any particular control by the recognition device 510.

The second neural network 522 may be trained through a process similar to that of the neural network 330 of FIG. 3. For example, the training data of FIG. 3 may be understood as being replaced with the feature vectors corresponding to the user data and the reference vectors. The second neural network 522 includes an input layer 523, at least one hidden layer 524, and an output layer 525. For example, the input layer 523 may correspond to input data including the feature vectors corresponding to the user data and the reference data, and the output layer 525 may correspond to an activation function such as softmax. The parameters (e.g., weights) of the at least one hidden layer 524 may be adjusted through on-device training.

A different label may be assigned to each item of user data and each item of reference data, and through on-device training based on the user data, the reference data, and the labels, the feature extractor 520 may acquire the ability to output different output data for different user data and different reference data. For example, if the first neural network 521 extracts features from input data in a general manner, the second neural network 522 may be understood as remapping the features extracted by the first neural network 521 so that they are specialized to the users of an individual device.

According to an embodiment, the training process may be performed in appropriate batch units. For example, the process of inputting one item of the user data or the reference data to the feature extractor 520 and obtaining one item of output data corresponding to the output of the feature extractor 520 may be executed per batch, and on-device training using the user data and the reference data may be performed by repeating this batch-wise execution. When the on-device training is completed, the parameters of the at least one hidden layer 524 may be fixed. Thereafter, the output layer 525 may be removed from the second neural network 522, and the second neural network 522 may be finalized with the output layer 525 removed.
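The on-device training described for FIG. 5 can be sketched as follows, under several assumptions that are not part of the description: the hidden layer 524 is a single linear layer initialized to the identity, the output layer 525 is a linear layer followed by softmax and is trained alongside it, the loss is cross-entropy, and the four input vectors are made-up values. User feature vectors (already passed through the frozen first network) and reference vectors each receive their own label.

```python
import math

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def total_loss(hidden, output, samples):
    return sum(-math.log(softmax(matvec(output, matvec(hidden, f)))[y])
               for f, y in samples)

def train(hidden, output, samples, epochs=200, lr=0.05):
    for _ in range(epochs):
        for f, label in samples:
            h = matvec(hidden, f)
            probs = softmax(matvec(output, h))
            err = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
            # Back-propagate to the hidden activations before updating `output`.
            grad_h = [sum(err[k] * output[k][i] for k in range(len(err)))
                      for i in range(len(h))]
            for k, row in enumerate(output):
                for i in range(len(row)):
                    row[i] -= lr * err[k] * h[i]
            for i, row in enumerate(hidden):
                for j in range(len(row)):
                    row[j] -= lr * grad_h[i] * f[j]

# Hypothetical data: two user feature vectors and two reference vectors,
# each assigned a different label.
samples = [([1.0, 0.2, 0.0], 0), ([0.8, 0.6, 0.0], 1),
           ([0.0, 0.0, 1.0], 2), ([-1.0, 0.0, 0.5], 3)]
dim, classes = 3, len(samples)
hidden = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
output = [[0.0] * dim for _ in range(classes)]  # removed after training

loss_before = total_loss(hidden, output, samples)
train(hidden, output, samples)
loss_after = total_loss(hidden, output, samples)

def extract(first_network_feature):
    # The finalized second network: hidden layer only, output layer dropped.
    return matvec(hidden, first_network_feature)
```

Only `hidden` survives into the deployed extractor; `output` exists solely to turn feature vectors into label-shaped predictions during training.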

As described above, the mismatch between training data and actual data may be resolved through on-device training. For example, the user data may improve the ability to discriminate among registered feature vectors, and the reference data may improve the ability to discriminate between a registered feature vector and the feature vector of an intruder.

FIG. 6 is a diagram illustrating a process of generating a generalized user model according to an embodiment. Referring to FIG. 6, input data is extracted from a large-scale user database 610 and input to a neural network 620. For example, the neural network 620 may correspond to the first neural network 421 of FIG. 4 and may output a feature vector based on the input data. The large-scale user database 610 may be the same as or different from the large-scale user database 320 of FIG. 3.

The feature vectors output by the neural network 620 are shown as points on the vector plane 630. These feature vectors may correspond to the many general users included in the large-scale user database 610 and may be referred to as basic feature vectors. Representative feature vectors (θ1, θ2, ..., θc) may be selected as vectors that represent these basic feature vectors. For example, the representative feature vectors (θ1, θ2, ..., θc) may be selected by clustering the basic feature vectors. The representative feature vectors (θ1, θ2, ..., θc) may correspond to generalized users and may be referred to as generalized feature vectors. In addition, the representative feature vectors (θ1, θ2, ..., θc) may constitute the generalized user model 640 as reference data and may be used for on-device training. The representative feature vectors may number in the tens to hundreds, for example 1000 or fewer, 500 or fewer, or 100 or fewer, which is an amount of data that a user device can realistically handle in deep learning training. For example, a database of about 1 million utterances may be built by collecting 10 utterances from each of about 100,000 users, and about 100 representative feature vectors may be generated based on that database.
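Selecting representative feature vectors by clustering can be sketched with k-means, taking each cluster centroid as one representative vector. The choice of k-means, the deterministic initialization, and the toy two-dimensional data are assumptions; the description says only that the basic feature vectors may be clustered.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    # Deterministic initialization: evenly spaced input vectors.
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k),
                          key=lambda c: squared_distance(v, centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Hypothetical basic feature vectors forming two groups of users.
basic_vectors = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
                 [5.0, 5.1], [5.1, 5.0], [5.1, 5.1]]
representatives = kmeans(basic_vectors, k=2)
```

In the scale described above, the same reduction would take roughly a million basic feature vectors down to about a hundred centroids, which is small enough to ship with the device as reference data.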

FIG. 7 is a diagram illustrating the operation of an authentication apparatus for user authentication according to an embodiment. Referring to FIG. 7, the recognition device 710 inputs test data to the feature extractor 720. The feature extractor 720 may be in a state in which on-device training has been completed. The feature extractor 720 outputs a test feature vector in response to the input of the test data. The test data may have been input by a test user in the user authentication process. The test user is a person of unknown identity who attempts user recognition on the recognition device 710 in order to use it, and may be either a legitimate user or an intruder (imposter).

The recognition device 710 may obtain the registered feature vector from the registered user database 730, compare the registered feature vector with the test feature vector to perform user recognition for the test user, and generate a recognition result. For example, the recognition device 710 may determine the distance between the registered feature vector and the test feature vector, and generate the recognition result based on a comparison between the determined distance and a threshold. For example, the distance between the registered feature vector and the test feature vector may be determined based on the cosine distance, the Euclidean distance, or the like between the two vectors.
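The distance-and-threshold comparison can be sketched as follows, using cosine distance. The function names, the example vectors, and the threshold value of 0.3 are hypothetical; in practice the threshold would be tuned to the extractor and the desired error rates.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def recognize(registered_vector, test_vector, threshold=0.3):
    # Accept the test user when the test feature vector is close enough
    # to the registered feature vector.
    return cosine_distance(registered_vector, test_vector) <= threshold

registered = [1.0, 0.0]
accepted = recognize(registered, [0.9, 0.1])   # nearly the same direction
rejected = recognize(registered, [0.0, 1.0])   # orthogonal direction
```

Swapping in Euclidean distance only changes `cosine_distance`; the threshold comparison stays the same, although the threshold itself would need a different scale.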

FIGS. 8 and 9 are diagrams illustrating the change in the distribution of feature vectors before and after on-device training according to an embodiment. Referring to FIG. 8, registered feature vectors are shown on vector planes 810 and 820. The registered feature vectors are shown as dots, and dots with the same pattern represent registered feature vectors of the same registered user. The registered feature vectors of the vector plane 810 were obtained with a feature extractor to which on-device training has not been applied, and the registered feature vectors of the vector plane 820 were obtained with a feature extractor to which on-device training has been applied. As shown in FIG. 8, the registered feature vectors may be remapped through on-device training so that they are specialized to the registered users. Accordingly, registered users, in particular registered users with similar features such as family members, may be clearly distinguished from one another.

Referring to FIG. 9, vector planes 910 and 920 additionally include intruder feature vectors compared with the vector planes 810 and 820 of FIG. 8. The intruder feature vectors are shown as stars. The registered feature vectors and intruder feature vectors of the vector plane 910 were obtained with a feature extractor to which on-device training has not been applied, and those of the vector plane 920 were obtained with a feature extractor to which on-device training has been applied. As shown in FIG. 9, not only the registered feature vectors but also the intruder feature vectors may be remapped through on-device training so that they are specialized to the registered users. Accordingly, a registered user and an intruder may be clearly distinguished from each other, and the registered user may be accurately authenticated.

FIG. 10 is a flowchart illustrating an authentication method based on on-device training according to an embodiment. Referring to FIG. 10, the authentication apparatus receives, in step 1010, user data input by a legitimate user for user registration; performs, in step 1020, on-device training of a feature extractor based on the user data and reference data corresponding to generalized users; determines, in step 1030, a registered feature vector based on the output of the feature extractor in response to the input of the user data; receives, in step 1040, test data input by a test user for user recognition; determines, in step 1050, a test feature vector based on the output of the feature extractor in response to the input of the test data; and performs, in step 1060, user recognition for the test user based on a comparison between the registered feature vector and the test feature vector. In addition, the matters described with reference to FIGS. 1 to 9 may be applied to the authentication method based on on-device training.

FIG. 11 is a flowchart illustrating an authentication method based on on-device training according to another embodiment. Referring to FIG. 11, the authentication apparatus obtains, in step 1110, a feature extractor including a first neural network having fixed parameters and a second neural network having adjustable parameters; performs, in step 1120, on-device training of the feature extractor based on user data corresponding to a legitimate user and reference data corresponding to generalized users; and, in step 1130, performs user recognition using the feature extractor once the on-device training is completed. In addition, the matters described with reference to FIGS. 1 to 10 may be applied to the authentication method based on on-device training.

FIG. 12 is a block diagram illustrating an authentication device based on on-device training according to an embodiment. Referring to FIG. 12, the recognition device 1200 may receive input data including user data and test data, and may process an operation of a neural network related to the input data. For example, the operation of the neural network may include a user recognition operation. The recognition device 1200 may perform one or more operations described or illustrated herein in connection with the processing of the neural network, and may provide a processing result of the neural network to a user.

The recognition device 1200 may include one or more processors 1210 and a memory 1220. The memory 1220 is connected to the processor 1210 and may store instructions executable by the processor 1210, data to be operated on by the processor 1210, or data processed by the processor 1210. The memory 1220 may include a non-transitory computer-readable medium, such as high-speed random access memory, and/or a non-volatile computer-readable storage medium (e.g., one or more disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).

The processor 1210 may execute instructions for performing one or more of the operations described with reference to FIGS. 1 to 11. According to an embodiment, when the instructions stored in the memory 1220 are executed by the processor 1210, the processor 1210 may receive user data input by a legitimate user for user registration, perform on-device training on the feature extractor 1225 based on the user data and reference data corresponding to generalized users, determine a registered feature vector based on an output of the feature extractor 1225 in response to an input of the user data, receive test data input by a test user for user recognition, determine a test feature vector based on an output of the feature extractor 1225 in response to an input of the test data, and perform user recognition on the test user based on a comparison between the registered feature vector and the test feature vector.
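The comparison between the registered feature vector and the test feature vector is described in the claims as a distance, either cosine or Euclidean, checked against a threshold. A minimal Python sketch of that comparison follows; the function names, the "less than or equal" acceptance rule, and the threshold values are assumptions for illustration.

```python
import math
from typing import Callable, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine similarity; 0 means the same direction."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equally sized vectors."""
    return math.dist(a, b)


def is_same_user(registered: Sequence[float],
                 test: Sequence[float],
                 threshold: float,
                 metric: Callable[[Sequence[float], Sequence[float]], float] = cosine_distance,
                 ) -> bool:
    """Accept the test user when the distance does not exceed the threshold."""
    return metric(registered, test) <= threshold
```

The two metrics are not interchangeable at a fixed threshold: cosine distance ignores vector magnitude, while Euclidean distance does not, so a threshold has to be chosen for the metric actually in use.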

According to another embodiment, when the instructions stored in the memory 1220 are executed by the processor 1210, the processor 1210 may obtain the feature extractor 1225 including a first neural network having fixed parameters and a second neural network having adjustable parameters, perform on-device training on the feature extractor 1225 based on user data corresponding to a legitimate user and reference data corresponding to generalized users, and, when the on-device training is completed, perform user recognition using the feature extractor 1225.
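The claims characterize the reference data as generalized feature vectors generated by clustering feature vectors of a plurality of general users. The source does not name a clustering algorithm, so the Python sketch below uses plain k-means purely as an assumed example; the `kmeans` function, its deterministic initialization from the first `k` vectors, and the fixed iteration count are illustrative choices.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Return k centroids, used here as generalized feature vectors."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic initialization (an assumption made to keep the sketch simple).
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(list(v))
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


general_user_features = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
                         [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
generalized = kmeans(general_user_features, k=2)
```

Reducing many per-user feature vectors to a handful of centroids keeps the reference set small enough to store and train against on the device, in line with the bounds on the number of reference feature vectors recited in the claims.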

FIG. 13 is a diagram illustrating a user device according to an embodiment. Referring to FIG. 13, the user device 1300 may receive input data and process an operation of a neural network related to the input data. For example, the operation of the neural network may include a user recognition operation. The user device 1300 may include the recognition device described with reference to FIGS. 1 to 12, or may perform the functions of the recognition device described with reference to FIGS. 1 to 12.

The user device 1300 may include a processor 1310, a memory 1320, a camera 1330, a storage device 1340, an input device 1350, an output device 1360, and a network interface 1370. The processor 1310, the memory 1320, the camera 1330, the storage device 1340, the input device 1350, the output device 1360, and the network interface 1370 may communicate with one another through a communication bus 1380. For example, the user device 1300 may include a smartphone, a tablet PC, a laptop, a desktop PC, a wearable device, a smart home appliance, a smart speaker, a smart car, and the like.

The processor 1310 executes functions and instructions to be executed within the user device 1300. For example, the processor 1310 may process instructions stored in the memory 1320 or the storage device 1340. The processor 1310 may perform one or more of the operations described with reference to FIGS. 1 to 12.

The memory 1320 stores information for processing the operation of the neural network. The memory 1320 may include a computer-readable storage medium or a computer-readable storage device. The memory 1320 may store instructions to be executed by the processor 1310, and may store related information while software or an application is being executed by the user device 1300.

The camera 1330 may capture a still image, a video image, or both. The camera 1330 may capture a face region that the user presents in order to attempt face authentication. The camera 1330 may also provide a 3D image including depth information about objects.

The storage device 1340 includes a computer-readable storage medium or a computer-readable storage device. According to an embodiment, the storage device 1340 may store a larger amount of information than the memory 1320 and may store the information for a long period of time. For example, the storage device 1340 may include a magnetic hard disk, an optical disk, a flash memory, a floppy disk, or another form of non-volatile memory known in the art.

The input device 1350 may receive input from a user through traditional input methods, such as a keyboard and a mouse, and through newer input methods, such as touch input, voice input, and image input. For example, the input device 1350 may include a keyboard, a mouse, a touch screen, a microphone, or any other device capable of detecting input from a user and passing the detected input to the user device 1300. Data such as the user's fingerprint, iris, speech, voice, and audio may be input through the input device 1350.

The output device 1360 may provide the output of the user device 1300 to the user through a visual, auditory, or tactile channel. The output device 1360 may include, for example, a display, a touch screen, a speaker, a vibration generating device, or any other device capable of providing output to the user. The network interface 1370 may communicate with an external device through a wired or wireless network.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

A recognition method comprising:
receiving user data input by a legitimate user for user registration;
performing on-device training on a feature extractor based on the user data and reference data corresponding to generalized users;
determining a registered feature vector based on an output of the feature extractor in response to an input of the user data;
receiving test data input by a test user for user recognition;
determining a test feature vector based on an output of the feature extractor in response to an input of the test data; and
performing user recognition on the test user based on a comparison between the registered feature vector and the test feature vector.

The recognition method of claim 1,
wherein the feature extractor comprises a first neural network having fixed parameters and a second neural network having adjustable parameters, and
the adjustable parameters of the second neural network are adjusted by the on-device training.

The recognition method of claim 2,
wherein the first neural network is pre-trained, based on a large user database, to extract features from input data.

The recognition method of claim 1,
wherein the performing of the on-device training comprises:
allocating labels of different values to the user data and the reference data, respectively; and
performing the on-device training based on a comparison between the labels and outputs of the feature extractor in response to inputs of the user data and the reference data.

The recognition method of claim 1,
wherein the feature extractor comprises a first neural network having fixed parameters and a second neural network having adjustable parameters, and
the performing of the on-device training comprises:
inputting the user data into the first neural network;
inputting, into the second neural network, the reference data and an output of the first neural network in response to the input of the user data; and
performing the on-device training based on an output of the second neural network.

The recognition method of claim 1,
wherein the reference data includes generalized feature vectors corresponding to the generalized users, and
the generalized feature vectors are generated by clustering feature vectors corresponding to a plurality of general users.

The recognition method of claim 1,
wherein the performing of the user recognition comprises
performing the user recognition based on a comparison between a threshold and a distance between the registered feature vector and the test feature vector.

The recognition method of claim 7,
wherein the distance between the registered feature vector and the test feature vector is determined based on any one of a cosine distance and a Euclidean distance between the registered feature vector and the test feature vector.

The recognition method of claim 1,
further comprising, when the registered feature vector is determined, storing the determined registered feature vector in a registered user database.

A recognition method comprising:
obtaining a feature extractor comprising a first neural network having fixed parameters and a second neural network having adjustable parameters;
performing on-device training on the feature extractor based on user data corresponding to a legitimate user and reference data corresponding to generalized users; and
performing user recognition using the feature extractor when the on-device training is completed.

The recognition method of claim 10,
wherein the adjustable parameters of the second neural network are adjusted by the on-device training.

The recognition method of claim 10,
wherein the performing of the on-device training comprises:
inputting the user data into the first neural network;
inputting, into the second neural network, the reference data and an output of the first neural network in response to the input of the user data; and
performing the on-device training based on an output of the second neural network.

An on-device training method of a feature extractor that is mounted on a user device and comprises a first neural network having pre-trained, fixed parameters and a second neural network having adjustable parameters, the method comprising:
obtaining user data input by a legitimate user;
inputting the user data into the first neural network; and
adjusting the parameters of the second neural network by inputting, into the second neural network, predetermined reference data and an output of the first neural network in response to the input of the user data.

The on-device training method of claim 13,
wherein the reference data includes 1000 or fewer feature vectors.

The on-device training method of claim 13,
wherein the reference data includes 500 or fewer feature vectors.

The on-device training method of claim 13,
wherein the reference data includes 100 or fewer feature vectors.

The on-device training method of claim 13,
wherein the reference data includes generalized feature vectors corresponding to generalized users.

The on-device training method of claim 17,
wherein the generalized feature vectors are generated by clustering feature vectors corresponding to a plurality of general users.

A computer-readable storage medium storing one or more programs including instructions for performing the method of any one of claims 1 to 18.

A recognition device comprising:
a processor; and
a memory containing instructions executable by the processor,
wherein, when the instructions are executed by the processor, the processor:
receives user data input by a legitimate user for user registration,
performs on-device training on a feature extractor based on the user data and reference data corresponding to generalized users,
determines a registered feature vector based on an output of the feature extractor in response to an input of the user data,
receives test data input by a test user for user recognition,
determines a test feature vector based on an output of the feature extractor in response to an input of the test data, and
performs user recognition on the test user based on a comparison between the registered feature vector and the test feature vector.

The recognition device of claim 20,
wherein the feature extractor comprises a first neural network having fixed parameters and a second neural network having adjustable parameters, and
the adjustable parameters of the second neural network are adjusted by the on-device training.

The recognition device of claim 21,
wherein the first neural network is pre-trained, based on a large user database, to extract features from input data.

The recognition device of claim 20,
wherein the processor:
allocates labels of different values to the user data and the reference data, respectively, and
performs the on-device training based on a comparison between the labels and outputs of the feature extractor in response to inputs of the user data and the reference data.

The recognition device of claim 20,
wherein the feature extractor comprises a first neural network having fixed parameters and a second neural network having adjustable parameters, and
the processor:
inputs the user data into the first neural network,
inputs, into the second neural network, the reference data and an output of the first neural network in response to the input of the user data, and
performs the on-device training based on an output of the second neural network.

The recognition device of claim 20,
wherein the reference data includes generalized feature vectors corresponding to the generalized users, and
the generalized feature vectors are generated by clustering feature vectors corresponding to a plurality of general users.

The recognition device of claim 20,
wherein the processor
performs the user recognition based on a comparison between a threshold and a distance between the registered feature vector and the test feature vector.

The recognition device of claim 26,
wherein the distance between the registered feature vector and the test feature vector is determined based on any one of a cosine distance and a Euclidean distance between the registered feature vector and the test feature vector.

The recognition device of claim 20,
wherein, when the registered feature vector is determined, the processor stores the determined registered feature vector in a registered user database.

A recognition device comprising:
a processor; and
a memory containing instructions executable by the processor,
wherein, when the instructions are executed by the processor, the processor:
obtains a feature extractor comprising a first neural network having fixed parameters and a second neural network having adjustable parameters,
performs on-device training on the feature extractor based on user data corresponding to a legitimate user and reference data corresponding to generalized users, and
performs user recognition using the feature extractor when the on-device training is completed.

The recognition device of claim 29,
wherein the adjustable parameters of the second neural network are adjusted by the on-device training.

The recognition device of claim 29,
wherein the processor:
inputs the user data into the first neural network,
inputs, into the second neural network, the reference data and an output of the first neural network in response to the input of the user data, and
performs the on-device training based on an output of the second neural network.