
CN112446408A - Method and apparatus for identifying user based on-device training - Google Patents


Info

Publication number
CN112446408A
CN112446408A (application CN202010637175.9A)
Authority
CN
China
Prior art keywords
user
neural network
data
identification
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010637175.9A
Other languages
Chinese (zh)
Other versions
CN112446408B (en)
Inventor
李焕
金圭洪
韩在濬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020190127239A (published as KR20210026982A)
Application filed by Samsung Electronics Co Ltd
Publication of CN112446408A
Application granted
Publication of CN112446408B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G06F 18/41 Interactive pattern learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for identifying a user based on on-device training are provided. The on-device training-based user identification method includes: performing on-device training on a feature extractor based on user data and reference data corresponding to a generalized user; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; determining a test feature vector based on an output from the feature extractor in response to an input of test data; and performing user identification on a test user based on a result of comparing the registration feature vector with the test feature vector.

Description

Method and apparatus for identifying user based on-device training
This application claims the benefit of Korean Patent Application No. 10-2019-0108199, filed on September 2, 2019, and Korean Patent Application No. 10-2019-0127239, filed on October 14, 2019, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
Technical Field
The following description relates to an on-device training-based user identification method and apparatus.
Background
Technological automation of recognition has been achieved through, for example, neural network models implemented by processors as specialized computing architectures that, after substantial training, can provide computationally intuitive mappings between input patterns and output patterns. The trained ability to generate such mappings may be referred to as the learning ability of the neural network. Further, because of its specialized training, such a specially trained neural network may have a generalization ability to generate a relatively accurate output for an input pattern for which the neural network has not been trained.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, an identification method includes: receiving user data for user registration input by a valid user; performing on-device training on a feature extractor based on the user data and reference data corresponding to a generalized user; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; receiving test data for user identification input by a test user; determining a test feature vector based on an output from the feature extractor in response to an input of the test data; and performing user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
The feature extractor may include a first neural network having set parameters and a second neural network having adjustable parameters. The adjustable parameters of the second neural network may be adjusted through on-device training. The first neural network may be pre-trained to extract features from input data based on a large user Database (DB). The step of performing on-device training may comprise: assigning labels of different values to the user data and the reference data, respectively; and performing on-device training based on a result of comparing the label with an output from the feature extractor in response to input of the user data and the reference data.
The feature extractor may include a first neural network having set parameters and a second neural network having adjustable parameters. The step of performing on-device training may comprise: inputting user data into a first neural network; inputting reference data and an output from the first neural network to the second neural network in response to the input of the user data; and performing on-device training based on output from the second neural network. The reference data may include generalized feature vectors corresponding to generalized users. The generalized feature vector may be generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
The step of performing user identification may comprise: performing user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold. The distance between the registration feature vector and the test feature vector may be determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector. When the registration feature vector is determined, the identification method may further include: storing the determined registration feature vector in a registered user database.
In another general aspect, an identification method includes: obtaining a feature extractor comprising a first neural network with set parameters and a second neural network with adjustable parameters; performing on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to a generalized user; and performing user recognition on test data using the feature extractor when the on-device training is complete.
In another general aspect, an on-device training method for a feature extractor provided in a user device, the feature extractor including a first neural network that is pre-trained and has set parameters and a second neural network having adjustable parameters, the on-device training method comprising: obtaining user data entered by a valid user; inputting user data into a first neural network; and adjusting the adjustable parameters of the second neural network by inputting preset reference data to the second neural network and an output from the first neural network in response to the input of the user data.
The reference data may include 1,000 or fewer feature vectors, 500 or fewer feature vectors, or 100 or fewer feature vectors.
In another general aspect, an identification device includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor may be configured to: receive user data for user registration input by a valid user; perform on-device training on a feature extractor based on the user data and reference data corresponding to a generalized user; determine a registration feature vector based on an output from the feature extractor in response to an input of the user data; receive test data for user identification input by a test user; determine a test feature vector based on an output from the feature extractor in response to an input of the test data; and perform user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
In another general aspect, an identification device includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor may be configured to: obtain a feature extractor comprising a first neural network with set parameters and a second neural network with adjustable parameters; perform on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to a generalized user; and perform user recognition on test data using the feature extractor when the on-device training is complete.
In another general aspect, a method includes: pre-training a first neural network of a feature extractor at a server side; providing the feature extractor in a device after the first neural network is pre-trained; training a second neural network of the feature extractor on the device using data input to the device; and performing user identification on test data input to the device using the feature extractor.
The data input to the device may include user data for user registration input by a valid user and reference data corresponding to a generalized user.
The method may comprise: user identification is performed by comparing the registration feature vector corresponding to the user data with the test feature vector corresponding to the test data.
Other features and aspects will be apparent from the following detailed description, the accompanying drawings, and the claims.
Drawings
Fig. 1 is a diagram showing an example of operations for user registration and user identification to be performed by an identification device.
Fig. 2 is a diagram showing an example of processing for pre-training, user registration, and user recognition to be performed.
Fig. 3 is a diagram illustrating an example of pre-training.
Fig. 4 is a diagram illustrating an example of operations for on-device training and user registration to be performed by the recognition apparatus.
Fig. 5 is a diagram illustrating an example of on-device training.
Fig. 6 is a diagram showing an example of generating a generalized user model.
Fig. 7 is a diagram showing an example of an operation for user identification to be performed by the identification device.
Fig. 8 and 9 are diagrams illustrating examples of changes in the distribution of feature vectors based on on-device training.
FIG. 10 is a flow chart illustrating an example of a recognition method based on on-device training.
FIG. 11 is a flow chart illustrating another example of a recognition method based on on-device training.
Fig. 12 is a diagram showing an example of a recognition apparatus based on on-device training.
Fig. 13 is a diagram showing an example of a user apparatus.
Throughout the drawings and detailed description, the same drawing reference numerals will be understood to refer to the same elements, features and structures unless otherwise described or provided. The figures may not be to scale and the relative sizes, proportions and depictions of the elements in the figures may be exaggerated for clarity, illustration and convenience.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, devices, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and/or systems described herein will be apparent to those skilled in the art after understanding the present disclosure. For example, the order of operations described herein is merely an example and is not limited to the order of operations set forth herein, but may be changed as is apparent after understanding the disclosure of the present application, except for operations that must occur in a particular order. Furthermore, descriptions of features known after understanding the disclosure of the present application may be omitted for the sake of clarity and conciseness.
The features described herein may be implemented in different forms and are not to be construed as limited to the examples described herein. Rather, the examples described herein have been provided to illustrate only some of the many possible ways to implement the methods, devices, and/or systems described herein, which will be apparent after understanding the disclosure of the present application.
Throughout the specification, when an element such as a layer, region or substrate is described as being "on," "connected to" or "coupled to" another element, the element may be directly on, connected to or coupled to the other element or one or more other elements may be present therebetween. In contrast, when an element is referred to as being "directly on," "directly connected to" or "directly coupled to" another element, there may be no other elements present between them. As used herein, the term "and/or" includes any one of the associated listed items and any combination of any two or more.
Although terms such as "first," "second," and "third" may be used herein to describe various elements, components, regions, layers or sections, these elements, components, regions, layers or sections should not be limited by these terms. Rather, these terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section referred to in the examples described herein could also be referred to as a second element, component, region, layer or section without departing from the teachings of the examples.
The terminology used herein is for the purpose of describing various examples only and is not intended to be limiting of the disclosure. The singular is intended to include the plural unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" specify the presence of stated features, quantities, operations, elements, components, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and/or combinations thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs and as commonly understood after understanding the disclosure of this application. Unless explicitly defined as such herein, terms (such as those defined in general dictionaries) will be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and will not be interpreted in an idealized or overly formal sense.
Further, in the description of the examples, a detailed description of structures or functions known after an understanding of the disclosure of the present application will be omitted when it is deemed that such a description would make the examples ambiguous.
Examples will be described in detail below with reference to the accompanying drawings, and in the drawings, like reference numerals refer to like elements throughout.
Fig. 1 is a diagram showing an example of operations for user registration and user identification to be performed by an identification device. Referring to fig. 1, the recognition apparatus 110 registers the valid user 101 in the recognition apparatus 110 based on the user data of the valid user 101. The valid user 101 may be one or more users, and one or more users may be registered in the identification device 110. The valid user 101 may be a person having the right to use the identification device 110 (e.g., an owner or administrator of the apparatus in which the identification device 110 is disposed or embedded). The valid user 101 may also be referred to as a genuine user. The registration of the valid user 101 in the identification device 110 may be referred to herein as a user registration process. Through the user registration process, identification information (e.g., a registration feature vector) of the valid user 101 is stored in the recognition device 110 or in another device or apparatus associated with the recognition device 110. When the valid user 101 is registered through the user registration process, the valid user 101 may subsequently be referred to as a registered user.
The test user 102 may be an unidentified person who has not yet been identified, and the test user 102 attempts user identification by the identification device 110 to use the identification device 110. The test user 102 may be a valid user 101 or an imposter indicating a person who does not have the right to use the identification device 110. The identification device 110 may perform user identification on the test user 102 by comparing the test data of the test user 102 with the user data. Performing user identification on the test user 102 may be referred to herein as a user identification process. The user identification process may be performed after the user registration process is performed.
User identification may include user verification and user recognition. User verification may be performed to determine whether the test user 102 is a registered user, and user recognition may be performed to determine which of a plurality of users is the test user 102. For example, when there are multiple registered users and the test user 102 is one of the multiple registered users, user recognition may be performed to determine the one registered user corresponding to the test user 102.
The result of the user identification (simply referred to as "identification result" herein) may include at least one of a result of the user verification (verification result) and a result of the user recognition (recognition result). For example, when the test user 102 is a registered user, the identification device 110 may output a verification result corresponding to a successful verification. In this example, when there are multiple registered users, the identification result may include a recognition result indicating which of the multiple registered users corresponds to the test user 102. However, when the test user 102 is an imposter, the identification device 110 may output a verification result corresponding to an unsuccessful verification.
User data may be associated with active users 101 and test data may be associated with test users 102. The user data may be input to the identification device 110 by the valid user 101, the user data may be input to another device or apparatus including the identification device 110 to be transmitted to the identification device 110, or the user data may be input to another device or apparatus separate from the identification device 110 to be transmitted to the identification device 110. Similarly, test data may be input to the identification device 110 by the test user 102, test data may be input to another device or apparatus that includes the identification device 110 to be transmitted to the identification device 110, or test data may be input to another device or apparatus that is separate from the identification device 110 to be transmitted to the identification device 110.
The data input to the identification device 110, such as user data and test data, may be referred to as input data. The input data may include voice or images. For example, in the case of speaker recognition, the input data may include speech or audio. In the case of face recognition, the input data may include a face image. In the case of fingerprint recognition, the input data may include a fingerprint image. In the case of iris recognition, the input data may include an iris image. The identification device 110 may perform user verification based on at least one of such various verification methods. The modality of each of the user data, the test data, the reference data, and the training data may correspond to the at least one verification method used by the identification device 110. Hereinafter, for convenience of description, examples will be described with respect to speaker recognition. However, the examples are also applicable to verification methods other than speaker recognition.
The recognition device 110 may use the feature extractor 120 to perform user recognition. The feature extractor 120 includes neural networks (e.g., a first neural network 121 and a second neural network 122). At least a portion of the neural network may be implemented by software, hardware including a neural processor, or a combination thereof. For example, the neural network may be a Deep Neural Network (DNN) including, for example, a fully-connected network, a deep convolutional network, and a Recurrent Neural Network (RNN). The DNN may include a plurality of layers including an input layer, at least one hidden layer, and an output layer.
The neural network may be trained to perform a given operation by mapping input data and output data that are in a nonlinear relationship to each other based on deep learning. Deep learning may be a type of machine learning that is performed based on large data sets to solve a given problem. Deep learning may be interpreted as an optimization process that finds a point at which energy is minimized. Through supervised or unsupervised deep learning, weights corresponding to an architecture or model of the neural network may be obtained, and the input data and the output data may be mapped to each other through the weights obtained as described above. Although the feature extractor 120 is shown in fig. 1 as being located outside the recognition device 110, the feature extractor 120 may be located inside the recognition device 110.
The recognition device 110 may input data to the feature extractor 120, and in response to the input of the input data, register a user in the recognition device 110 or generate a recognition result based on an output from the feature extractor 120. In one example, the recognition device 110 may apply preprocessing to the input data and input the input data obtained by applying the preprocessing to the feature extractor 120. Through preprocessing, the input data may be changed into a form suitable for the feature extractor 120 to extract features therefrom. For example, when the input data corresponds to an audio wave, the audio wave may be converted into a frequency spectrum by preprocessing.
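By way of illustration, such preprocessing might look like the following Python sketch, which converts a raw audio wave into a magnitude spectrogram via a short-time Fourier transform. The frame length, hop size, and window are illustrative assumptions; the disclosure does not specify preprocessing parameters.

```python
import numpy as np

def wave_to_spectrum(wave, frame_len=400, hop=160):
    """Convert a raw audio waveform into a magnitude spectrogram.
    frame_len/hop correspond to 25 ms / 10 ms at 16 kHz (assumed values)."""
    window = np.hanning(frame_len)
    frames = [np.abs(np.fft.rfft(wave[s:s + frame_len] * window))
              for s in range(0, len(wave) - frame_len + 1, hop)]
    return np.stack(frames)  # shape: (num_frames, frame_len // 2 + 1)

# e.g., one second of 16 kHz audio -> input suitable for the feature extractor
spectrum = wave_to_spectrum(np.random.randn(16000))
```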
The feature extractor 120 may output data in response to an input of input data. The output data of the feature extractor 120 may be referred to herein as a feature vector. Alternatively, the output data of the feature extractor 120 may be referred to as an embedding vector, indicating that the output data includes identification information of the user. In the user registration process for the valid user 101, the feature extractor 120 may output a feature vector in response to an input of the user data. The output feature vector may be referred to herein as a registration feature vector and is stored as identification information of the valid user 101 in the identification device 110 or in another device or apparatus associated with the identification device 110. In the user recognition process for the test user 102, the feature extractor 120 may output a feature vector in response to an input of the test data. The output feature vector may be referred to herein as a test feature vector.
The recognition device 110 may generate a recognition result by comparing the registration feature vector with the test feature vector. For example, the recognition device 110 may determine a distance between the registration feature vector and the test feature vector, and generate the recognition result based on a result of comparing the determined distance with a threshold. In this example, the registration feature vector and the test feature vector may be regarded as matching each other when the determined distance is less than the threshold, and as not matching each other when the determined distance is not less than the threshold.
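By way of illustration, the comparison step might be sketched as follows. The cosine distance used here is one of the two distance measures named later in this description, and the threshold value of 0.4 is a placeholder assumption, not a value taken from the disclosure.

```python
import numpy as np

def cosine_distance(a, b):
    # 0 for identical directions, up to 2 for opposite directions
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def verify(registration_vec, test_vec, threshold=0.4):
    """Match succeeds when the distance falls below the threshold."""
    return cosine_distance(registration_vec, test_vec) < threshold
```

A smaller distance indicates a closer match, so verification succeeds only when the distance falls below the threshold.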
For example, when there are multiple registered users, there may be a registration feature vector for each registered user. In this example, the recognition device 110 may generate the recognition result by comparing the test feature vector with each of the registration feature vectors. When the test feature vector matches one of the registration feature vectors, the recognition apparatus 110 may output an identification result corresponding to a successful identification. The identification result may include a recognition result associated with the registered user corresponding to the registration feature vector matching the test feature vector, that is, a recognition result indicating which one of the registered users corresponds to the test user 102.
The feature extractor 120 includes a first neural network 121 and a second neural network 122. The first neural network 121 may be trained in advance, or pre-trained, based on a large user database (DB) (which may also be referred to as a non-specific general user database), and the second neural network 122 may be additionally trained based on the user data in the user registration process. Here, the term "pre" may indicate a point in time before the user registration process is performed (e.g., a point in time of development and production of the feature extractor 120). The large user database may correspond to non-specific general users, and the user data may correspond to a specific user (e.g., the valid user 101). In one example, the training of the first neural network 121 may be performed by a server in the development and production stages of the feature extractor 120, and is referred to as pre-training or first training. In addition, the training of the second neural network 122 may be performed by a device or apparatus including the recognition device 110 in the user registration process, and is referred to as on-device training or second training. Here, the "device" in the term "on-device training" may indicate a user device in which the recognition apparatus 110 is provided or embedded.
The first neural network 121 may have set parameters and the second neural network 122 may have adjustable parameters. The parameters used herein may include weights. When the first neural network 121 is trained through pre-training, the parameters of the first neural network 121 may be set and not changed through on-device training. Setting a parameter can also be described as the parameter being frozen, and setting a parameter can also be referred to as freezing a parameter. The parameters of the second neural network 122 may be adjusted through on-device training. The first neural network 121 may extract features from the input data in a general manner, and the second neural network 122 may remap the features extracted by the first neural network 121 so that the features are specific to the user of the individual device.
In user recognition, a mismatch between training data and actual user data may result in degraded recognition performance. For example, when actual user data is not used for the pre-training of the first neural network 121, the level of recognition performance of a feature extractor including only the first neural network 121 may not be satisfactory. However, the on-device training of the second neural network 122 is performed based on actual user data, and thus may help reduce such a mismatch. For example, when a general feature extractor to which only pre-training has been applied is used, users with similar features (e.g., family members) may not be easily distinguished. However, when the feature extractor 120 described herein is used, the actual user data of each user may be used for on-device training, and thus multiple users with similar features may be distinguished relatively accurately.
In addition, for on-device training, reference data corresponding to a generalized user may be used in addition to the user data. For example, a generalized user may be understood as a typical or representative user among non-specific general users. Through on-device training using the user data and the reference data, the feature extractor 120 may extract features from the user data that are distinguishable from features in the reference data. Thus, the feature vector of an imposter and the feature vector of a registered user can be more accurately distinguished, improving the recognition performance. On-device training using user data and reference data is described in more detail below.
Fig. 2 is a diagram showing an example of processing for pre-training, user registration, and user recognition to be performed. Referring to fig. 2, in operation 210, pre-training is performed. Pre-training may be performed based on a large user database corresponding to non-specific general users. Through pre-training, the first neural network 201 of the feature extractor 200 may be trained. The pre-training may be performed on the server side. After operation 210 is performed, the feature extractor 200 may be provided or embedded in a device and distributed to users.
In operation 220, on-device training is performed when user data is input by a valid user for user enrollment. In operation 230, user registration is performed. Operations 220 and 230 may be collectively referred to as a user registration process. On-device training may be performed in a user registration process. On-device training may be performed based on user data corresponding to a particular user (e.g., an active user) and reference data corresponding to a generalized user. Through on-device training, the second neural network 202 of the feature extractor 200 is trained. The second neural network 202 may be initialized with an identity matrix before performing on-device training.
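By way of illustration, the split into a first neural network with set (frozen) parameters and a second neural network initialized with an identity matrix might be sketched as follows. PyTorch is assumed, `pretrained_first_net` stands for the pre-trained first neural network, and the second network is reduced to a single bias-free linear layer for brevity (the description allows one or more hidden layers).

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, pretrained_first_net, feat_dim):
        super().__init__()
        self.first = pretrained_first_net
        for p in self.first.parameters():
            p.requires_grad = False       # parameters set/frozen by pre-training
        # second network: initialized to the identity matrix, so that before
        # on-device training the extractor behaves like the first network alone
        self.second = nn.Linear(feat_dim, feat_dim, bias=False)
        with torch.no_grad():
            self.second.weight.copy_(torch.eye(feat_dim))

    def forward(self, x):
        return self.second(self.first(x))  # remap general features per user
```

Because of the identity initialization, the extractor initially reproduces the output of the first neural network exactly; on-device training then moves the second network away from the identity to specialize the features.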
After operation 220 is performed, the feature extractor 200 may become specific to the registered user. In operation 230, the user data of the valid user is input to the feature extractor 200 after the on-device training is complete. The registration feature vector is determined based on an output from the feature extractor 200 in response to the input of the user data. When the registration feature vector is determined, the determined registration feature vector is stored in a registered user database.
In operation 240, user identification is performed. Here, operation 240 may be referred to as a user identification process. In this operation, test data for user recognition input by a test user is input to the feature extractor 200, and a test feature vector is determined based on an output of the feature extractor 200 in response to the input of the test data. Based on the result of comparing the registered feature vector with the test feature vector, user identification is performed on the test user. Operations 220 through 240 may be performed by a device.
Fig. 3 is a diagram illustrating an example of pre-training. Referring to fig. 3, a training device 310 trains a neural network 330 using a large user database 320 to extract features from input data. For example, the large user database 320 may include training data associated with a plurality of non-specific general users, and a label may be assigned to each item or each set of training data. The training data may include speech or images; for example, in the case of speaker recognition, the training data may include speech or audio.
The neural network 330 includes an input layer 331, at least one hidden layer 332, and an output layer 333. For example, the input layer 331 may correspond to the training data, and the output layer 333 may correspond to an activation function (such as, for example, softmax). Through pre-training of the neural network 330, parameters (e.g., weights) of the hidden layer 332 may be adjusted. When assigning labels, a different label may be assigned to each item of training data, and through pre-training based on the labels and the training data, the neural network 330 may output different output data in response to different input data. For example, different labels may be assigned to the training data of respective groups, and through pre-training based on the labels and the training data, the neural network 330 may output different groups of output data in response to different groups of input data. This capability of the neural network 330 may be interpreted as a feature extraction function.
For example, a first label may be assigned to first training data and a second label may be assigned to second training data. In this example, the neural network 330 may be responsive to input of the first training data and output the first output data, and responsive to input of the second training data and output the second output data. The training device 310 may then compare the first output data to the first label and adjust the parameters of the hidden layer 332 such that the first output data and the first label may become identical to each other. Similarly, the training device 310 may compare the second output data with the second label and adjust the parameters of the hidden layer 332 such that the second output data and the second label may become identical to each other. The training device 310 may pre-train the neural network 330 by repeating such processing as described above based on the large user database 320.
In one example, the training process may be performed by a batch unit. For example, a process of inputting training data to the neural network 330 and obtaining output data corresponding to an output from the neural network 330 in response to the input of the training data (for example, inputting a set of training data and obtaining a corresponding set of output data) may be performed by the batch unit, and pre-training using the large user database 320 may be performed by repeating such a process by the batch unit.
The output layer 333 may convert the feature vectors output from the hidden layer 332 into a form corresponding to the tag. With the pre-training, the parameters of the hidden layer 332 may be set to values suitable for the training target, and when the pre-training is completed, the parameters of the hidden layer 332 may be set or fixed. Subsequently, the output layer 333 may be removed from the neural network 330, and a first neural network of the feature extractor may be configured using the portion 340 including the input layer 331 and the hidden layer 332.
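By way of illustration, this pre-train-then-remove-the-output-layer procedure might be sketched as follows, again assuming PyTorch. The layer sizes, the optimizer, and the random stand-in for the large user database 320 are illustrative assumptions.

```python
import torch
import torch.nn as nn

input_dim, feat_dim, num_train_users = 201, 256, 1000

backbone = nn.Sequential(                    # input layer + hidden layers
    nn.Linear(input_dim, 512), nn.ReLU(),
    nn.Linear(512, feat_dim), nn.ReLU(),
)
head = nn.Linear(feat_dim, num_train_users)  # output layer, removed later
model = nn.Sequential(backbone, head)

loader = [(torch.randn(32, input_dim),                 # stand-in batches for
           torch.randint(0, num_train_users, (32,)))   # the large user DB
          for _ in range(10)]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()              # compares outputs with the labels

for x, y in loader:                          # batch-unit pre-training
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

first_network = backbone                     # keep only input + hidden layers
```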
When the pre-training is complete, the neural network 330 may perform a feature extraction function to output different output data in response to different input data (e.g., different sets of output data in response to different sets of input data). Such a feature extraction function may exhibit maximum performance when the training data is the same as the actual data used in the user registration process and the user identification process. However, the training data and the actual data may typically differ from each other. In theory, the mismatch between the training data and the actual data could be reduced, and the recognition performance thereby improved, by including the actual data in the training data and performing retraining.
However, the neural network 330 may need to be trained using the large user database 320 until the neural network 330 has the feature extraction function, and such training may require significant computational resources. User devices typically have limited computing resources, so such training may be performed on a large-scale server. Thus, according to one example, a dual training method is provided that includes pre-training and on-device training. The dual training method may generate the first neural network of the feature extractor by training the neural network 330 using the large user database 320, and generate the second neural network of the feature extractor based on actual data. Thus, the mismatch between the training data and the actual data may be reduced or minimized, and a feature extractor specific to the user device may be provided.
Fig. 4 is a diagram illustrating an example of operations for on-device training and user registration to be performed by the recognition apparatus. Referring to fig. 4, the active user inputs user data for user registration. The recognition device 410 performs on-device training on the feature extractor 420 based on the user data. The feature extractor 420 includes a first neural network 421 and a second neural network 422. The parameters of the first neural network 421 may be set or fixed by pre-training, and the parameters of the second neural network 422 may be adjusted by on-device training. For on-device training, reference data may be used. The recognition device 410 obtains reference data from the generalized user model 430 and inputs the obtained reference data to the second neural network 422. The user data may correspond to valid users and the reference data may correspond to generalized users. The generalized user model 430 will be described in more detail below.
The identification device 410 adjusts the parameters of the second neural network 422 by assigning a different label to each item or each set of the user data and the reference data and comparing the labels with the output from the feature extractor 420 in response to the input of the user data and the reference data. For example, the parameters of the second neural network 422 may be adjusted such that the output of the feature extractor 420 in response to the input of the user data (i.e., the output of the second neural network 422) becomes the same as the label assigned to the user data, and the output of the feature extractor 420 in response to the input of the reference data becomes the same as the label assigned to the reference data. As described above, the recognition device 410 may train the feature extractor 420 such that the feature extractor 420 outputs different feature vectors corresponding to the user data and the reference data, respectively. By training the feature extractor 420 using the user data and the reference data, the registration feature vectors of the registered users can be more accurately distinguished from one another, and the registration feature vectors of the registered users can be more accurately distinguished from the feature vectors of imposters. Thus, through on-device training, the feature extractor 420 may have a recognition capability to identify each registered user, as well as a verification capability to distinguish registered users from imposters and verify registered users.
When the on-device training is completed, the recognition apparatus 410 inputs user data to the feature extractor 420, and obtains a feature vector output from the feature extractor 420 in response to the input of the user data. The recognition device 410 stores the feature vector output by the feature extractor 420 as a registered feature vector in the registered user database 440. The registered feature vectors may then be used in a user identification process.
Fig. 5 is a diagram illustrating an example of on-device training. Referring to fig. 5, the recognition apparatus 510 performs on-device training on the feature extractor 520 using the user data and the reference data. The user data is input to the first neural network 521, and the reference data is input to the second neural network 522. The reference data is obtained from the generalized user model 540, and may be generated using a neural network configured to perform feature extraction similarly to the first neural network 521. The recognition device 510 inputs the user data to the first neural network 521; when the first neural network 521 outputs a feature vector in response to the input of the user data, the recognition device 510 inputs the output feature vector to the second neural network 522. Alternatively, this may be understood as the output from the first neural network 521 being input directly to the second neural network 522 without being controlled by the recognition device 510.
The second neural network 522 may be trained through processing similar to that performed on the neural network 330 of FIG. 3; this may be understood as the training data of FIG. 3 being replaced, in the example of FIG. 5, with the feature vectors corresponding to the user data and the reference vectors. The second neural network 522 includes an input layer 523, at least one hidden layer 524, and an output layer 525. For example, the input layer 523 may correspond to input data including the feature vectors corresponding to the user data and the reference data, and the output layer 525 may correspond to an activation function (such as softmax). The parameters (e.g., weights) of the hidden layer 524 may be adjusted through on-device training. For example, the parameters of the hidden layer 524 may be adjusted such that the output of the feature extractor 520 in response to the input of the user data (i.e., the output of the second neural network 522) becomes the same as the label assigned to the user data, and the output of the feature extractor 520 in response to the input of the reference data becomes the same as the label assigned to the reference data. The second neural network 522 may be constructed using a portion 530 that includes the input layer 523 and the hidden layer 524.
A label may be assigned to each item of the user data and the reference data. Through on-device training based on the user data, the reference data, and the different labels assigned to the respective data, the feature extractor 520 may become capable of outputting different output data in response to different user data and reference data. For example, a label may be assigned to each group of the user data and the reference data; through on-device training based on the user data, the reference data, and the different labels assigned to the data of the respective groups, the feature extractor 520 may become capable of outputting different groups of output data in response to different groups of user data and reference data. For example, the first neural network 521 may extract features from the input data in a general manner, while the second neural network 522 may remap the features extracted by the first neural network 521 so that the features become specific to the user of the individual device.
In one example, the training process may be performed by a batch unit. For example, a process of inputting one or a set of the user data and the reference data to the feature extractor 520 and obtaining one or a set of output data corresponding to an output from the feature extractor 520 may be performed by the batch unit, and on-device training using the user data and the reference data may be performed by repeating such a process via the batch unit. When the on-device training is complete, the parameters of the hidden layer 524 may be set or fixed. Subsequently, the output layer 525 may be removed from the second neural network 522, and the second neural network 522, with the output layer 525 removed, may be used as the final second neural network.
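By way of illustration, the on-device training loop described above might be sketched as follows, assuming PyTorch. The stand-in tensors replace the outputs of the frozen first neural network and the generalized feature vectors of the generalized user model, and the two labels (one for the valid user, one for the generalized user) are a simplification; with several registered users, each user's data would receive its own label.

```python
import torch
import torch.nn as nn

feat_dim = 256
user_feats = torch.randn(5, feat_dim)   # stand-in: first-network outputs for
                                        # five items of valid-user data
ref_vecs = torch.randn(100, feat_dim)   # stand-in: generalized feature vectors

x = torch.cat([user_feats, ref_vecs])
y = torch.cat([torch.zeros(len(user_feats), dtype=torch.long),  # label 0: user
               torch.ones(len(ref_vecs), dtype=torch.long)])    # label 1: ref.

second_net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())  # hidden
train_head = nn.Linear(feat_dim, 2)     # output layer, removed after training

opt = torch.optim.Adam(list(second_net.parameters()) +
                       list(train_head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                    # small on-device training loop
    opt.zero_grad()
    loss_fn(train_head(second_net(x)), y).backward()
    opt.step()
# only second_net is kept inside the feature extractor afterwards
```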
Through on-device training as described above, the mismatch between the training data and the actual data may be reduced or minimized. For example, the recognition capability of distinguishing the registration feature vectors from one another, which is obtained from the user data, and the verification capability of distinguishing the registration feature vectors from the feature vectors of imposters, which is obtained from the reference data, can be improved.
FIG. 6 is a diagram illustrating an example of generating a generalized user model. Referring to fig. 6, input data is extracted from a large user database 610 and input to a neural network 620. For example, the neural network 620 may correspond to the first neural network 421 of fig. 4, and the neural network 620 may output a feature vector based on the input data. The large user database 610 may be the same as or different from the large user database 320 of fig. 3.
In the example of fig. 6, the feature vectors output by the neural network 620 are indicated by small circles on the vector plane 630. These feature vectors may correspond to a plurality of generalized users included in the large user database 610, and are also referred to as base feature vectors. As vectors representing the base feature vectors, representative feature vectors (e.g., θ1, θ2, …, θc) may be selected. For example, the representative feature vectors θ1, θ2, …, θc may be selected by grouping the base feature vectors into clusters (e.g., one or more feature vectors are selected as representative feature vectors from each of some or all of the clusters). The representative feature vectors θ1, θ2, …, θc may correspond to generalized users and are also referred to as generalized feature vectors. In addition, the representative feature vectors θ1, θ2, …, θc may be included as reference data in the generalized user model 640 and used for on-device training. There may be, for example, tens or hundreds of such representative feature vectors; the number of representative feature vectors may be 1,000 or fewer, 500 or fewer, or 100 or fewer, corresponding to an amount of data that a user device can realistically process through deep learning or training. For example, 10 utterances may be collected from each of approximately one hundred thousand users to configure a database including approximately one million utterances, and approximately 100 representative feature vectors may be generated based on this database.
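By way of illustration, the clustering step might be sketched with k-means, one common way of grouping feature vectors into clusters (the description does not mandate a particular clustering algorithm). The cluster centers then serve as the representative, generalized feature vectors; the input here is a random stand-in for base feature vectors extracted by the first network.

```python
import numpy as np
from sklearn.cluster import KMeans

# stand-in for base feature vectors produced by the first network from the
# large user database (e.g., ~one million utterances in the example above)
base_vectors = np.random.randn(10_000, 256).astype(np.float32)

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(base_vectors)
generalized_user_model = kmeans.cluster_centers_  # ~100 generalized vectors
```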
Fig. 7 is a diagram showing an example of an operation for user identification to be performed by the identification device. Referring to fig. 7, the recognition apparatus 710 inputs test data to the feature extractor 720. In the example of fig. 7, the feature extractor 720 is in a state in which the on-device training is complete. The feature extractor 720 outputs a test feature vector in response to the input of the test data. The test data may be input by a test user in the user identification process. The test user may be an unidentified person attempting user identification through the identification device 710 in order to use the identification device 710. The test user may be a valid user or an imposter.
The recognition apparatus 710 obtains the registration feature vector from the registered user database 730, performs user identification on the test user by comparing the registration feature vector with the test feature vector, and generates an identification result. For example, the recognition device 710 determines a distance between the registration feature vector and the test feature vector, and generates the identification result based on a result of comparing the determined distance with a threshold. For example, the distance between the registration feature vector and the test feature vector may be determined based on a cosine distance or a Euclidean distance between the registration feature vector and the test feature vector.
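By way of illustration, extending the earlier verification sketch to several registered users, recognition may select the closest registered feature vector and then verify it against the threshold. The cosine distance and the threshold value remain placeholder assumptions.

```python
import numpy as np

def identify(test_vec, registered_db, threshold=0.4):
    """registered_db: dict mapping a registered user id to a registration
    feature vector. Returns the recognized user id, or None (likely imposter)."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best = min(registered_db, key=lambda u: cos_dist(test_vec, registered_db[u]))
    return best if cos_dist(test_vec, registered_db[best]) < threshold else None
```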
Fig. 8 and 9 are diagrams illustrating examples of changes in the distribution of feature vectors based on on-device training. In the example of fig. 8, registration feature vectors are indicated on vector planes 810 and 820. The registration feature vectors are indicated by small circles, and small circles having the same pattern indicate registration feature vectors of the same registered user. In one example, the registration feature vectors on the vector plane 810 are obtained by a feature extractor to which on-device training has not been applied, and the registration feature vectors on the vector plane 820 are obtained by a feature extractor to which on-device training has been applied. As shown in fig. 8, the registration feature vectors may be remapped through on-device training to be specific to the registered users. Accordingly, registered users having similar characteristics (e.g., family members) can be more accurately distinguished from each other.
Referring to fig. 9, in contrast to the vector planes 810 and 820 shown in fig. 8, the vector planes 910 and 920 also include feature vectors of imposters. In the example of fig. 9, an imposter's feature vector is simply referred to as an imposter feature vector and is indicated by a six-pointed star. In this example, the registration feature vectors and imposter feature vectors on the vector plane 910 are obtained by a feature extractor to which on-device training has not been applied, and those on the vector plane 920 are obtained by a feature extractor to which on-device training has been applied. As shown in fig. 9, through on-device training, the imposter feature vectors, in addition to the registration feature vectors, may be remapped so that they are more clearly distinguished from the registration feature vectors. Accordingly, a registered user and an imposter can be more accurately distinguished from each other, so that the registered user can be more accurately verified.
Fig. 10 is a flowchart illustrating an example of a recognition method based on on-device training. Referring to fig. 10, in operation 1010, the recognition apparatus receives user data for user registration input by a valid user. In operation 1020, the recognition apparatus performs on-device training on the feature extractor based on the user data and reference data corresponding to generalized users. In operation 1030, the recognition apparatus determines a registration feature vector based on an output from the feature extractor in response to an input of the user data. In operation 1040, the recognition apparatus receives test data input by a test user for user identification. In operation 1050, the recognition apparatus determines a test feature vector based on an output from the feature extractor in response to an input of the test data. In operation 1060, the recognition apparatus performs user identification on the test user based on a result of comparing the registration feature vector with the test feature vector. For a more detailed description of the recognition method based on on-device training, reference may be made to the descriptions provided above with reference to fig. 1 to 9.
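The following is a minimal sketch of operations 1030 to 1060, assuming PyTorch; averaging the extractor outputs in determine_feature_vector and the threshold of 0.4 are illustrative assumptions rather than details taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def determine_feature_vector(extractor: torch.nn.Module,
                             data: torch.Tensor) -> torch.Tensor:
    # Operations 1030 and 1050: derive a single feature vector from the
    # extractor output; averaging over several inputs is one plausible choice.
    with torch.no_grad():
        return extractor(data).mean(dim=0)

def identify_user(registration_vector: torch.Tensor,
                  test_vector: torch.Tensor,
                  threshold: float = 0.4) -> bool:
    # Operation 1060: accept when the cosine distance between the
    # registration and test feature vectors does not exceed the threshold.
    distance = 1.0 - F.cosine_similarity(registration_vector,
                                         test_vector, dim=0).item()
    return distance <= threshold
```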
Fig. 11 is a flowchart illustrating another example of a recognition method based on on-device training. Referring to fig. 11, in operation 1110, the recognition apparatus obtains a feature extractor including a first neural network having set parameters and a second neural network having adjustable parameters. In operation 1120, the recognition apparatus performs on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users. In operation 1130, when the on-device training is completed, the recognition apparatus performs user identification using the feature extractor. For a more detailed description of the recognition method based on on-device training, reference may be made to the descriptions provided above with reference to fig. 1 to 10.
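As a concrete illustration of operations 1110 and 1120, the following is a minimal PyTorch sketch and not the disclosed implementation: the feature dimension of 128, the binary labels (1 for the valid user, 0 for the generalized reference vectors), and the temporary scoring head are all assumptions, and FeatureExtractor and on_device_train are hypothetical names. Only the second network (and the head) receives gradient updates; the first network's parameters remain set.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """First network: pre-trained, parameters kept fixed ("set").
    Second network: small trainable head adjusted on the device.
    Assumes the first network outputs vectors of size feat_dim."""
    def __init__(self, first_net: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.first_net = first_net
        for p in self.first_net.parameters():
            p.requires_grad = False  # set (fixed) parameters
        self.second_net = nn.Sequential(  # adjustable parameters
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.second_net(self.first_net(x))

def on_device_train(extractor: FeatureExtractor,
                    user_data: torch.Tensor,
                    reference_vectors: torch.Tensor,
                    feat_dim: int = 128,
                    epochs: int = 5) -> None:
    """Label user data 1 and generalized reference vectors 0, then adjust
    only the second network (plus a training-only scoring head).
    reference_vectors are assumed to lie in the first network's output space."""
    head = nn.Linear(feat_dim, 1)  # hypothetical training-only scoring head
    params = list(extractor.second_net.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=1e-3)
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        with torch.no_grad():
            user_feat = extractor.first_net(user_data)  # frozen first network
        feats = torch.cat([user_feat, reference_vectors], dim=0)
        labels = torch.cat([torch.ones(user_feat.size(0)),
                            torch.zeros(reference_vectors.size(0))])
        logits = head(extractor.second_net(feats)).squeeze(1)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Keeping the first network frozen limits the number of trainable parameters, which is consistent with the small amount of user data and the small set of reference vectors that a user device can realistically process, as discussed with reference to fig. 6.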
Fig. 12 is a diagram showing an example of a recognition apparatus based on on-device training. The recognition apparatus 1200 may receive input data including user data and test data, and process operations of a neural network associated with the received input data. For example, the operations of the neural network may include user identification. The recognition apparatus 1200 may perform one or more or all of the operations or methods described herein with respect to processing the neural network and provide a result of processing the neural network to a user.
Referring to fig. 12, the recognition apparatus 1200 includes at least one processor 1210 and a memory 1220. The memory 1220 may be connected to the processor 1210 and store instructions executable by the processor 1210 and data to be processed by the processor 1210 or data processed by the processor 1210. The memory 1220 may include a non-transitory computer-readable medium (e.g., high speed Random Access Memory (RAM)) and/or a non-volatile computer-readable storage medium (e.g., at least one disk storage device, flash memory device, and other non-volatile solid-state memory devices).
Processor 1210 may execute instructions to perform one or more or all of the operations or methods described above with reference to fig. 1-11. In one example, when instructions stored in memory 1220 are executed by processor 1210, processor 1210 may receive user data for user registration input by a valid user, perform on-device training for feature extractor 1225 based on the user data and reference data corresponding to a generalized user, determine a registration feature vector based on output from feature extractor 1225 in response to input of the user data, receive test data for user identification input by a test user, determine a test feature vector based on output from feature extractor 1225 in response to input of the test data, and perform user identification for the test user based on a result of comparing the registration feature vector to the test feature vector.
In another example, when instructions stored in the memory 1220 are executed by the processor 1210, the processor 1210 may obtain the feature extractor 1225 including a first neural network having set parameters and a second neural network having adjustable parameters, perform on-device training on the feature extractor 1225 based on user data corresponding to a valid user and reference data corresponding to a generalized user, and perform user recognition using the feature extractor 1225 when the on-device training is completed.
Fig. 13 is a diagram showing an example of a user device. The user device 1300 may receive input data and process operations of a neural network associated with the received input data. For example, the operations of the neural network may include user identification. The user device 1300 may include the recognition apparatus described above with reference to fig. 1 to 12 and may perform the operations or functions of the recognition apparatus as described above with reference to fig. 1 to 12.
Referring to fig. 13, the user device 1300 includes a processor 1310, a memory 1320, a camera 1330, a storage device 1340, an input device 1350, an output device 1360, and a network interface 1370. The processor 1310, memory 1320, camera 1330, storage 1340, input device 1350, output device 1360, and network interface 1370 may communicate with each other through a communication bus 1380. For example, user device 1300 may include a smart phone, a tablet Personal Computer (PC), a laptop computer, a desktop computer, a wearable device, a smart home appliance, a smart speaker, a smart car, and so forth.
Processor 1310 may execute functions and instructions in user device 1300. For example, the processor 1310 may process instructions stored in the memory 1320 or the storage device 1340. The processor 1310 may perform one or more or all of the operations or methods described above with reference to fig. 1-12.
The memory 1320 may store information to be used in processing the operation of the neural network. Memory 1320 may include a computer-readable storage medium or a computer-readable storage device. Memory 1320 may store instructions to be executed by processor 1310 and store relevant information while software or applications are being executed by user device 1300.
The camera 1330 may capture still images, video images, or both. For example, the camera 1330 may capture an image of a face region input by a user for face verification. The camera 1330 may also provide a three-dimensional (3D) image including depth information of an object.
Storage 1340 may include a computer-readable storage medium or a computer-readable storage device. Storage device 1340 may store larger amounts of information for longer periods of time than memory 1320. For example, storage 1340 may include a magnetic hard disk, optical disk, flash memory, floppy disk and other types of non-volatile memory known in the relevant art.
The input device 1350 may receive an input from a user through a conventional input method (e.g., a keyboard and a mouse) or a newer input method (e.g., a touch input, a voice input, and an image input). For example, the input device 1350 may include a keyboard, a mouse, a touch screen, a microphone, and other devices that may detect an input from a user and transmit the detected input to the user device 1300. Through the input device 1350, data such as a user's fingerprint, iris, voice, or other audio may be input.
The output device 1360 may provide output to a user from the user device 1300 through visual, auditory, or tactile channels. For example, output devices 1360 may include displays, touch screens, speakers, vibration generators, and other devices that may provide output to a user. The network interface 1370 may communicate with an external device through a wired network or a wireless network.
The recognition apparatus, training apparatus, user devices and other apparatus, devices, units, modules and other components described herein with respect to fig. 1, 12 and 13 are implemented by hardware components. Examples of hardware components that may be used to perform the operations described in this application include, where appropriate: a controller, a sensor, a generator, a driver, a memory, a comparator, an arithmetic logic unit, an adder, a subtractor, a multiplier, a divider, an integrator, and any other electronic component configured to perform the operations described herein. In other examples, one or more of the hardware components that perform the operations described in this application are implemented by computing hardware (e.g., by one or more processors or computers). A processor or computer may be implemented by one or more processing elements, such as an array of logic gates, a controller and arithmetic logic unit, a digital signal processor, a microcomputer, a programmable logic controller, a field programmable gate array, a programmable logic array, a microprocessor, or any other device or combination of devices that is configured to respond to and execute instructions in a defined manner to achieve a desired result. In one example, a processor or computer includes or is connected to one or more memories that store instructions or software for execution by the processor or computer. A hardware component implemented by a processor or a computer may execute instructions or software (such as an Operating System (OS) and one or more software applications running on the OS) for performing the operations described in this application. The hardware components may also access, manipulate, process, create, and store data in response to execution of instructions or software. For simplicity, the singular terms "processor" or "computer" may be used in the description of the examples described in this application, but in other examples, multiple processors or computers may be used, or a processor or computer may include multiple processing elements or multiple types of processing elements, or both. For example, a single hardware component or two or more hardware components may be implemented by a single processor, or two or more processors, or a processor and a controller. One or more hardware components may be implemented by one or more processors, or processors and controllers, and one or more other hardware components may be implemented by one or more other processors, or other processors and other controllers. One or more processors, or processors and controllers, may implement a single hardware component or two or more hardware components. The hardware components may have any one or more of different processing configurations, examples of which include: single processors, independent processors, parallel processors, Single Instruction Single Data (SISD) multiprocessing, Single Instruction Multiple Data (SIMD) multiprocessing, Multiple Instruction Single Data (MISD) multiprocessing, and Multiple Instruction Multiple Data (MIMD) multiprocessing.
The methods illustrated in fig. 2-11 to perform the operations described in the present application are performed by computing hardware (e.g., by one or more processors or computers) implemented as executing instructions or software as described above to perform the operations described in the present application as performed by the methods. For example, a single operation or two or more operations may be performed by a single processor or two or more processors, or a processor and a controller. One or more operations may be performed by one or more processors, or processors and controllers, and one or more other operations may be performed by one or more other processors, or other processors and other controllers. One or more processors, or a processor and a controller, may perform a single operation or two or more operations.
Instructions or software for controlling a processor or computer to implement hardware components and perform methods as described above are written as computer programs, code segments, instructions, or any combination thereof, to individually or collectively instruct or configure the processor or computer to operate as a machine or special purpose computer to perform operations performed by hardware components and methods as described above. In one example, the instructions or software include machine code that is directly executed by a processor or computer (such as machine code generated by a compiler). In another example, the instructions or software comprise high-level code that is executed by a processor or computer using an interpreter. Instructions or software can be readily written by a programmer of ordinary skill in the art based on the block and flow diagrams illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations performed by the hardware components and methods as described above.
Instructions or software for controlling a processor or computer to implement hardware components and perform methods as described above, as well as any associated data, data files, and data structures, are recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid-state drives (SSD), card-type memory (such as a multimedia card or a mini-card (e.g., Secure Digital (SD) or extreme digital (XD))), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device configured to store and provide instructions or software and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the instructions.
Although the present disclosure includes specific examples, it will be apparent to those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered merely as illustrative and not restrictive. The description of features or aspects in each example will be considered applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order and/or if components in the described systems, architectures, devices, or circuits are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the present disclosure is defined not by the detailed description but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the present disclosure.

Claims (34)

1. An identification method, comprising: receiving user data for user registration input by a valid user; performing on-device training on a feature extractor based on the user data and reference data corresponding to generalized users; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; receiving test data for user identification input by a test user; determining a test feature vector based on an output from the feature extractor in response to an input of the test data; and performing user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
2. The identification method of claim 1, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
3. The identification method of claim 2, wherein the first neural network is pre-trained to extract features from input data based on a large user database.
4. The identification method of claim 1, wherein performing the on-device training comprises: assigning labels of different values to the user data and the reference data, respectively; and performing the on-device training based on a result of comparing the labels with outputs from the feature extractor in response to inputs of the user data and the reference data.
5. The identification method of claim 1, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein performing the on-device training comprises: inputting the user data to the first neural network; inputting, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and performing the on-device training based on an output from the second neural network.
6. The identification method of claim 1, wherein the reference data comprises generalized feature vectors corresponding to generalized users, and wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
7. The identification method of claim 1, wherein performing the user identification comprises: performing the user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold.
8. The identification method of claim 7, wherein the distance between the registration feature vector and the test feature vector is determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector.
9. The identification method of claim 1, further comprising: storing the determined registration feature vector in a registered user database.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the identification method of claim 1.
11. An identification method, comprising: obtaining a feature extractor comprising a first neural network having set parameters and a second neural network having adjustable parameters; performing on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users; and performing user identification on test data using the feature extractor when the on-device training is completed.
12. The identification method of claim 11, wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
13. The identification method of claim 11, wherein performing the on-device training comprises: inputting the user data to the first neural network; inputting, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and performing the on-device training based on an output from the second neural network.
14. An on-device training method for a feature extractor provided in a user device, the feature extractor comprising a pre-trained first neural network having set parameters and a second neural network having adjustable parameters, the on-device training method comprising: obtaining user data input by a valid user; inputting the user data to the first neural network; and adjusting the adjustable parameters of the second neural network by inputting, to the second neural network, preset reference data and an output from the first neural network in response to the input of the user data.
15. The on-device training method of claim 14, wherein the reference data comprises 1000 or fewer feature vectors.
16. The on-device training method of claim 14, wherein the reference data comprises 500 or fewer feature vectors.
17. The on-device training method of claim 14, wherein the reference data comprises 100 or fewer feature vectors.
18. The on-device training method of claim 14, wherein the reference data comprises generalized feature vectors corresponding to generalized users.
19. The on-device training method of claim 18, wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
20. An identification device, comprising: a processor; and a memory comprising instructions executable by the processor, wherein, when the instructions are executed by the processor, the processor is configured to: receive user data for user registration input by a valid user; perform on-device training on a feature extractor based on the user data and reference data corresponding to generalized users; determine a registration feature vector based on an output from the feature extractor in response to an input of the user data; receive test data for user identification input by a test user; determine a test feature vector based on an output from the feature extractor in response to an input of the test data; and perform user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
21. The identification device of claim 20, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
22. The identification device of claim 21, wherein the first neural network is pre-trained to extract features from input data based on a large user database.
23. The identification device of claim 20, wherein the processor is configured to: assign labels of different values to the user data and the reference data, respectively; and perform the on-device training based on a result of comparing the labels with outputs from the feature extractor in response to inputs of the user data and the reference data.
24. The identification device of claim 20, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the processor is configured to: input the user data to the first neural network; input, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and perform the on-device training based on an output from the second neural network.
25. The identification device of claim 20, wherein the reference data comprises generalized feature vectors corresponding to generalized users, and wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
26. The identification device of claim 20, wherein the processor is configured to: perform the user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold.
27. The identification device of claim 26, wherein the distance between the registration feature vector and the test feature vector is determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector.
28. The identification device of claim 20, wherein the processor is configured to store the determined registration feature vector in a registered user database.
29. An identification device, comprising: a processor; and a memory comprising instructions executable by the processor, wherein, when the instructions are executed by the processor, the processor is configured to: obtain a feature extractor comprising a first neural network having set parameters and a second neural network having adjustable parameters; perform on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users; and perform user identification on test data using the feature extractor when the on-device training is completed.
30. The identification device of claim 29, wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
31. The identification device of claim 29, wherein the processor is configured to: input the user data to the first neural network; input, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and perform the on-device training based on an output from the second neural network.
32. An identification method, comprising: pre-training a first neural network of a feature extractor at a server; setting the feature extractor in a device after the first neural network is pre-trained; training a second neural network of the feature extractor on the device using data input to the device; and performing user identification on test data input to the device using the feature extractor.
33. The identification method of claim 32, wherein the data input to the device comprises user data for user registration input by a valid user and reference data corresponding to generalized users.
34. The identification method of claim 33, further comprising: performing the user identification by comparing a registration feature vector corresponding to the user data with a test feature vector corresponding to the test data.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2019-0108199 2019-09-02
KR20190108199 2019-09-02
KR1020190127239A KR20210026982A (en) 2019-09-02 2019-10-14 Method and apparatus for recognizing user based on on-device training
KR10-2019-0127239 2019-10-14

Publications (2)

Publication Number Publication Date
CN112446408A (en) 2021-03-05
CN112446408B (en) 2025-05-13

Family

ID=72243014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010637175.9A Active CN112446408B (en) 2019-09-02 2020-07-03 Method and apparatus for identifying users based on on-device training

Country Status (4)

Country Link
US (2) US11900246B2 (en)
EP (1) EP3798925A1 (en)
JP (1) JP7534045B2 (en)
CN (1) CN112446408B (en)

Also Published As

Publication number Publication date
US20240135177A1 (en) 2024-04-25
EP3798925A1 (en) 2021-03-31
US11900246B2 (en) 2024-02-13
US20210064923A1 (en) 2021-03-04
US20240232619A9 (en) 2024-07-11
JP7534045B2 (en) 2024-08-14
JP2021039749A (en) 2021-03-11
CN112446408B (en) 2025-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OR01 Other related matters