
CN112446408A - Method and apparatus for identifying user based on-device training - Google Patents


Info

Publication number
CN112446408A
CN112446408A (application CN202010637175.9A)
Authority
CN
China
Prior art keywords
user
neural network
data
identification
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010637175.9A
Other languages
Chinese (zh)
Other versions
CN112446408B (en)
Inventor
李焕
金圭洪
韩在濬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020190127239A (published as KR20210026982A)
Application filed by Samsung Electronics Co Ltd
Publication of CN112446408A
Application granted
Publication of CN112446408B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G06F 18/41 Interactive pattern learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for identifying a user based on on-device training are provided. The on-device training-based user identification method includes: performing on-device training on a feature extractor based on user data and reference data corresponding to a generalized user; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; determining a test feature vector based on an output from the feature extractor in response to an input of test data; and performing user identification on a test user based on a result of comparing the registration feature vector with the test feature vector.

Description

Method and apparatus for identifying user based on-device training
This application claims the benefit of Korean Patent Application No. 10-2019-0108199, filed on September 2, 2019, and Korean Patent Application No. 10-2019-0127239, filed on October 14, 2019, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
Technical Field
The following description relates to an on-device training-based user identification method and apparatus.
Background
Technological automation of recognition has been achieved through, for example, neural network models implemented by processors as specialized computing architectures that, after substantial training, can provide computationally intuitive mappings between input patterns and output patterns. The trained ability to generate such mappings may be referred to as the learning ability of the neural network. Further, because of its specialized training, such a specially trained neural network may have a generalization ability to generate a relatively accurate output for an input pattern for which the neural network has not been trained.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, an identification method includes: receiving user data for user registration input by a valid user; performing on-device training on a feature extractor based on the user data and reference data corresponding to a generalized user; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; receiving test data for user identification input by a test user; determining a test feature vector based on an output from the feature extractor in response to an input of the test data; and performing user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
The feature extractor may include a first neural network having set parameters and a second neural network having adjustable parameters. The adjustable parameters of the second neural network may be adjusted through on-device training. The first neural network may be pre-trained to extract features from input data based on a large user Database (DB). The step of performing on-device training may comprise: assigning labels of different values to the user data and the reference data, respectively; and performing on-device training based on a result of comparing the label with an output from the feature extractor in response to input of the user data and the reference data.
The feature extractor may include a first neural network having set parameters and a second neural network having adjustable parameters. The step of performing on-device training may comprise: inputting user data into a first neural network; inputting reference data and an output from the first neural network to the second neural network in response to the input of the user data; and performing on-device training based on output from the second neural network. The reference data may include generalized feature vectors corresponding to generalized users. The generalized feature vector may be generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
The step of performing user identification may comprise: performing user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold. The distance between the registration feature vector and the test feature vector may be determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector. When the registration feature vector is determined, the identification method may further include: storing the determined registration feature vector in a registered user database.
In another general aspect, an identification method includes: obtaining a feature extractor comprising a first neural network with set parameters and a second neural network with adjustable parameters; performing on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to a generalized user; and performing user recognition on test data using the feature extractor when the on-device training is complete.
In another general aspect, an on-device training method for a feature extractor provided in a user device, the feature extractor including a first neural network that is pre-trained and has set parameters and a second neural network having adjustable parameters, the on-device training method comprising: obtaining user data entered by a valid user; inputting user data into a first neural network; and adjusting the adjustable parameters of the second neural network by inputting preset reference data to the second neural network and an output from the first neural network in response to the input of the user data.
The reference data may include 1,000 or fewer feature vectors, 500 or fewer feature vectors, or 100 or fewer feature vectors.
In another general aspect, an identification device includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor may be configured to: receive user data for user registration input by a valid user; perform on-device training on a feature extractor based on the user data and reference data corresponding to a generalized user; determine a registration feature vector based on an output from the feature extractor in response to an input of the user data; receive test data for user identification input by a test user; determine a test feature vector based on an output from the feature extractor in response to an input of the test data; and perform user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
In another general aspect, an identification device includes: a processor; and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor may be configured to: obtain a feature extractor comprising a first neural network with set parameters and a second neural network with adjustable parameters; perform on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to a generalized user; and perform user recognition on test data using the feature extractor when the on-device training is complete.
In another general aspect, a method includes: pre-training a first neural network of a feature extractor at a server side; providing the feature extractor in a device after the first neural network is pre-trained; training a second neural network of the feature extractor on the device using data input to the device; and performing user identification on test data input to the device using the feature extractor.
The data input to the device may include user data for user registration input by a valid user and reference data corresponding to a generalized user.
The method may comprise: user identification is performed by comparing the registration feature vector corresponding to the user data with the test feature vector corresponding to the test data.
Other features and aspects will be apparent from the following detailed description, the accompanying drawings, and the claims.
Drawings
Fig. 1 is a diagram showing an example of operations for user registration and user identification to be performed by an identification device.
Fig. 2 is a diagram showing an example of processing for pre-training, user registration, and user recognition to be performed.
Fig. 3 is a diagram illustrating an example of pre-training.
Fig. 4 is a diagram illustrating an example of operations for on-device training and user registration to be performed by the recognition apparatus.
Fig. 5 is a diagram illustrating an example of on-device training.
Fig. 6 is a diagram showing an example of generating a generalized user model.
Fig. 7 is a diagram showing an example of an operation for user identification to be performed by the identification device.
Fig. 8 and 9 are diagrams illustrating examples of changes in the distribution of feature vectors based on on-device training.
FIG. 10 is a flow chart illustrating an example of a recognition method based on on-device training.
FIG. 11 is a flow chart illustrating another example of a recognition method based on on-device training.
Fig. 12 is a diagram showing an example of a recognition apparatus based on on-device training.
Fig. 13 is a diagram showing an example of a user apparatus.
Throughout the drawings and detailed description, the same drawing reference numerals will be understood to refer to the same elements, features and structures unless otherwise described or provided. The figures may not be to scale and the relative sizes, proportions and depictions of the elements in the figures may be exaggerated for clarity, illustration and convenience.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, devices, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and/or systems described herein will be apparent to those skilled in the art after understanding the present disclosure. For example, the order of operations described herein is merely an example and is not limited to the order of operations set forth herein, but may be changed as is apparent after understanding the disclosure of the present application, except for operations that must occur in a particular order. Furthermore, descriptions of features known after understanding the disclosure of the present application may be omitted for the sake of clarity and conciseness.
The features described herein may be implemented in different forms and are not to be construed as limited to the examples described herein. Rather, the examples described herein have been provided to illustrate only some of the many possible ways to implement the methods, devices, and/or systems described herein, which will be apparent after understanding the disclosure of the present application.
Throughout the specification, when an element such as a layer, region or substrate is described as being "on," "connected to" or "coupled to" another element, the element may be directly on, connected to or coupled to the other element or one or more other elements may be present therebetween. In contrast, when an element is referred to as being "directly on," "directly connected to" or "directly coupled to" another element, there may be no other elements present between them. As used herein, the term "and/or" includes any one of the associated listed items and any combination of any two or more.
Although terms such as "first," "second," and "third" may be used herein to describe various elements, components, regions, layers or sections, these elements, components, regions, layers or sections should not be limited by these terms. Rather, these terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section referred to in the examples described herein could also be referred to as a second element, component, region, layer or section without departing from the teachings of the examples.
The terminology used herein is for the purpose of describing various examples only and is not intended to be limiting of the disclosure. The singular is intended to include the plural unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" specify the presence of stated features, quantities, operations, elements, components, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and/or combinations thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs and as commonly understood after understanding the disclosure of this application. Unless explicitly defined as such herein, terms (such as those defined in general dictionaries) will be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and will not be interpreted in an idealized or overly formal sense.
Further, in the description of the examples, a detailed description of structures or functions known after an understanding of the disclosure of the present application will be omitted when it is deemed that such a description would make the examples ambiguous.
Examples will be described in detail below with reference to the accompanying drawings, and in the drawings, like reference numerals refer to like elements throughout.
Fig. 1 is a diagram showing an example of operations for user registration and user identification to be performed by an identification device. Referring to fig. 1, the recognition apparatus 110 registers the valid user 101 in the recognition apparatus 110 based on the user data of the valid user 101. The valid user 101 may be one or more users, and one or more users may be registered in the identification device 110. The valid user 101 may be a person having the right to use the identification device 110 (e.g., an owner or administrator of the apparatus in which the identification device 110 is disposed or embedded). The valid user 101 may also be referred to as a genuine user. The registration of the valid user 101 in the identification device 110 may be referred to herein as a user registration process. Through the user registration process, identification information (e.g., a registration feature vector) of the valid user 101 is stored in the recognition device 110 or in another device or apparatus associated with the recognition device 110. When the valid user 101 is registered through the user registration process, the valid user 101 may subsequently be referred to as a registered user.
The test user 102 may be an unidentified person who has not yet been identified, and the test user 102 attempts user identification by the identification device 110 to use the identification device 110. The test user 102 may be a valid user 101 or an imposter indicating a person who does not have the right to use the identification device 110. The identification device 110 may perform user identification on the test user 102 by comparing the test data of the test user 102 with the user data. Performing user identification on the test user 102 may be referred to herein as a user identification process. The user identification process may be performed after the user registration process is performed.
User identification may include user verification and user recognition. User verification may be performed to determine whether the test user 102 is a registered user, and user recognition may be performed to determine which of a plurality of users is the test user 102. For example, when there are multiple registered users and the test user 102 is one of the multiple registered users, user recognition may be performed to determine the one registered user corresponding to the test user 102.
The result of the user identification (simply referred to as "identification result" herein) may include at least one of a result of the user verification (verification result) and a result of the user recognition (recognition result). For example, when the test user 102 is a registered user, the identification device 110 may output a verification result corresponding to a successful verification. In this example, when there are multiple registered users, the identification result may include a recognition result indicating which of the multiple registered users corresponds to the test user 102. However, when the test user 102 is an imposter, the identification device 110 may output a verification result corresponding to an unsuccessful verification.
User data may be associated with active users 101 and test data may be associated with test users 102. The user data may be input to the identification device 110 by the valid user 101, the user data may be input to another device or apparatus including the identification device 110 to be transmitted to the identification device 110, or the user data may be input to another device or apparatus separate from the identification device 110 to be transmitted to the identification device 110. Similarly, test data may be input to the identification device 110 by the test user 102, test data may be input to another device or apparatus that includes the identification device 110 to be transmitted to the identification device 110, or test data may be input to another device or apparatus that is separate from the identification device 110 to be transmitted to the identification device 110.
The data input to the identification device 110, such as user data and test data, may be referred to as input data. The input data may include voice or images. For example, in the case of speaker recognition, the input data may include speech or audio. In the case of face recognition, the input data may include a face image. In the case of fingerprint recognition, the input data may include a fingerprint image. In the case of iris recognition, the input data may include an iris image. The identification device 110 may perform user verification based on at least one of such various verification methods. The modality of each of the user data, the test data, the reference data, and the training data may correspond to the at least one verification method used by the identification device 110. Hereinafter, for convenience of description, examples will be described with respect to speaker recognition. However, the examples are also applicable to verification methods other than speaker recognition.
The recognition device 110 may use the feature extractor 120 to perform user recognition. The feature extractor 120 includes neural networks (e.g., a first neural network 121 and a second neural network 122). At least a portion of the neural network may be implemented by software, hardware including a neural processor, or a combination thereof. For example, the neural network may be a Deep Neural Network (DNN) including, for example, a fully-connected network, a deep convolutional network, and a Recurrent Neural Network (RNN). The DNN may include a plurality of layers including an input layer, at least one hidden layer, and an output layer.
The neural network may be trained to perform a given operation by mapping input data and output data that are in a nonlinear relationship to each other based on deep learning. Deep learning may be a type of machine learning that is performed based on large data sets to solve a given problem. Deep learning may be interpreted as an optimization process that finds a point at which energy is minimized. Through supervised or unsupervised deep learning, weights corresponding to an architecture or model of the neural network may be obtained, and the input data and the output data may be mapped to each other through the weights obtained as described above. Although the feature extractor 120 is shown in fig. 1 as being located outside the recognition device 110, the feature extractor 120 may be located inside the recognition device 110.
The recognition device 110 may input data to the feature extractor 120, and in response to the input of the input data, register a user in the recognition device 110 or generate a recognition result based on an output from the feature extractor 120. In one example, the recognition device 110 may apply preprocessing to the input data and input the input data obtained by applying the preprocessing to the feature extractor 120. Through preprocessing, the input data may be changed into a form suitable for the feature extractor 120 to extract features therefrom. For example, when the input data corresponds to an audio wave, the audio wave may be converted into a frequency spectrum by preprocessing.
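By way of illustration, such preprocessing might look like the following Python sketch, which converts a raw audio wave into a magnitude spectrogram via a short-time Fourier transform. The frame length, hop size, and window are illustrative assumptions; the disclosure does not specify preprocessing parameters.

```python
import numpy as np

def wave_to_spectrum(wave, frame_len=400, hop=160):
    """Convert a raw audio waveform into a magnitude spectrogram.
    frame_len/hop correspond to 25 ms / 10 ms at 16 kHz (assumed values)."""
    window = np.hanning(frame_len)
    frames = [np.abs(np.fft.rfft(wave[s:s + frame_len] * window))
              for s in range(0, len(wave) - frame_len + 1, hop)]
    return np.stack(frames)  # shape: (num_frames, frame_len // 2 + 1)

# e.g., one second of 16 kHz audio -> input suitable for the feature extractor
spectrum = wave_to_spectrum(np.random.randn(16000))
```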
The feature extractor 120 may output data in response to an input of input data. The output data of the feature extractor 120 may be referred to herein as a feature vector. Alternatively, the output data of the feature extractor 120 may be referred to as an embedding vector, indicating that the output data includes identification information of the user. In the user registration process for the valid user 101, the feature extractor 120 may output a feature vector in response to an input of the user data. The output feature vector may be referred to herein as a registration feature vector and is stored as identification information of the valid user 101 in the identification device 110 or in another device or apparatus associated with the identification device 110. In the user recognition process for the test user 102, the feature extractor 120 may output a feature vector in response to an input of the test data. The output feature vector may be referred to herein as a test feature vector.
The recognition device 110 may generate a recognition result by comparing the registration feature vector with the test feature vector. For example, the recognition device 110 may determine a distance between the registration feature vector and the test feature vector, and generate the recognition result based on a result of comparing the determined distance with a threshold. In this example, the registration feature vector and the test feature vector may be regarded as matching each other when the determined distance is less than the threshold, and as not matching each other when the determined distance is not less than the threshold.
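By way of illustration, the comparison step might be sketched as follows. The cosine distance used here is one of the two distance measures named later in this description, and the threshold value of 0.4 is a placeholder assumption, not a value taken from the disclosure.

```python
import numpy as np

def cosine_distance(a, b):
    # 0 for identical directions, up to 2 for opposite directions
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def verify(registration_vec, test_vec, threshold=0.4):
    """Match succeeds when the distance falls below the threshold."""
    return cosine_distance(registration_vec, test_vec) < threshold
```

A smaller distance indicates a closer match, so verification succeeds only when the distance falls below the threshold.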
For example, when there are multiple registered users, there may be a registration feature vector for each registered user. In this example, the recognition device 110 may generate the recognition result by comparing the test feature vector with each of the registration feature vectors. When the test feature vector matches one of the registration feature vectors, the recognition apparatus 110 may output an identification result corresponding to a successful identification. The identification result may include a recognition result associated with the registered user corresponding to the registration feature vector matching the test feature vector, that is, a recognition result indicating which one of the registered users corresponds to the test user 102.
The feature extractor 120 includes a first neural network 121 and a second neural network 122. The first neural network 121 may be trained in advance, or pre-trained, based on a large user database (DB) (which may also be referred to as a non-specific general user database), and the second neural network 122 may be additionally trained based on the user data in the user registration process. Here, the term "pre" may indicate a point in time before the user registration process is performed (e.g., a point in time of development and production of the feature extractor 120). The large user database may correspond to non-specific general users, and the user data may correspond to a specific user (e.g., the valid user 101). In one example, the training of the first neural network 121 may be performed by a server in the development and production stages of the feature extractor 120, and is referred to as pre-training or first training. In addition, the training of the second neural network 122 may be performed by a device or apparatus including the recognition device 110 in the user registration process, and is referred to as on-device training or second training. Here, the "device" in the term "on-device training" may indicate a user device in which the recognition apparatus 110 is provided or embedded.
The first neural network 121 may have set parameters and the second neural network 122 may have adjustable parameters. The parameters used herein may include weights. When the first neural network 121 is trained through pre-training, the parameters of the first neural network 121 may be set and not changed through on-device training. Setting a parameter can also be described as the parameter being frozen, and setting a parameter can also be referred to as freezing a parameter. The parameters of the second neural network 122 may be adjusted through on-device training. The first neural network 121 may extract features from the input data in a general manner, and the second neural network 122 may remap the features extracted by the first neural network 121 so that the features are specific to the user of the individual device.
In user recognition, a mismatch between training data and actual user data may result in degraded recognition performance. For example, when actual user data is not used for the pre-training of the first neural network 121, the level of recognition performance of a feature extractor including only the first neural network 121 may not be satisfactory. However, the on-device training of the second neural network 122 is performed based on actual user data, and thus may help reduce such a mismatch. For example, when a general feature extractor to which only pre-training has been applied is used, users with similar features (e.g., family members) may not be easily distinguished. However, when the feature extractor 120 described herein is used, the actual user data of each user may be used for on-device training, and thus multiple users with similar features may be distinguished relatively accurately.
In addition, for on-device training, reference data corresponding to a generalized user may be used in addition to the user data. For example, a generalized user may be understood as a typical or representative user among non-specific general users. Through on-device training using the user data and the reference data, the feature extractor 120 may extract features from the user data that are distinguishable from features in the reference data. Thus, the feature vector of an imposter and the feature vector of a registered user can be more accurately distinguished, improving the recognition performance. On-device training using user data and reference data is described in more detail below.
Fig. 2 is a diagram showing an example of processing for pre-training, user registration, and user recognition to be performed. Referring to fig. 2, in operation 210, pre-training is performed. Pre-training may be performed based on a large user database corresponding to non-specific general users. Through pre-training, the first neural network 201 of the feature extractor 200 may be trained. The pre-training may be performed on the server side. After operation 210 is performed, the feature extractor 200 may be provided or embedded in a device and distributed to users.
In operation 220, on-device training is performed when user data is input by a valid user for user enrollment. In operation 230, user registration is performed. Operations 220 and 230 may be collectively referred to as a user registration process. On-device training may be performed in a user registration process. On-device training may be performed based on user data corresponding to a particular user (e.g., an active user) and reference data corresponding to a generalized user. Through on-device training, the second neural network 202 of the feature extractor 200 is trained. The second neural network 202 may be initialized with an identity matrix before performing on-device training.
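By way of illustration, the split into a first neural network with set (frozen) parameters and a second neural network initialized with an identity matrix might be sketched as follows. PyTorch is assumed, `pretrained_first_net` stands for the pre-trained first neural network, and the second network is reduced to a single bias-free linear layer for brevity (the description allows one or more hidden layers).

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, pretrained_first_net, feat_dim):
        super().__init__()
        self.first = pretrained_first_net
        for p in self.first.parameters():
            p.requires_grad = False       # parameters set/frozen by pre-training
        # second network: initialized to the identity matrix, so that before
        # on-device training the extractor behaves like the first network alone
        self.second = nn.Linear(feat_dim, feat_dim, bias=False)
        with torch.no_grad():
            self.second.weight.copy_(torch.eye(feat_dim))

    def forward(self, x):
        return self.second(self.first(x))  # remap general features per user
```

Because of the identity initialization, the extractor initially reproduces the output of the first neural network exactly; on-device training then moves the second network away from the identity to specialize the features.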
After operation 220 is performed, the feature extractor 200 may become specific to the registered user. In operation 230, the user data of the valid user is input to the feature extractor 200 after the on-device training is complete. The registration feature vector is determined based on an output from the feature extractor 200 in response to the input of the user data. When the registration feature vector is determined, the determined registration feature vector is stored in a registered user database.
In operation 240, user identification is performed. Here, operation 240 may be referred to as a user identification process. In this operation, test data for user recognition input by a test user is input to the feature extractor 200, and a test feature vector is determined based on an output of the feature extractor 200 in response to the input of the test data. Based on the result of comparing the registered feature vector with the test feature vector, user identification is performed on the test user. Operations 220 through 240 may be performed by a device.
Fig. 3 is a diagram illustrating an example of pre-training. Referring to fig. 3, a training device 310 trains a neural network 330 using a large user database 320 to extract features from input data. For example, the large user database 320 may include training data associated with a plurality of non-specific general users, and a label may be assigned to each item or each set of training data. The training data may include speech or images; for example, in the case of speaker recognition, the training data may include speech or audio.
The neural network 330 includes an input layer 331, at least one hidden layer 332, and an output layer 333. For example, the input layer 331 may correspond to the training data, and the output layer 333 may correspond to an activation function (such as, for example, softmax). Through pre-training of the neural network 330, parameters (e.g., weights) of the hidden layer 332 may be adjusted. When assigning labels, a different label may be assigned to each item of training data, and through pre-training based on the labels and the training data, the neural network 330 may output different output data in response to different input data. For example, different labels may be assigned to the training data of respective groups, and through pre-training based on the labels and the training data, the neural network 330 may output different groups of output data in response to different groups of input data. This capability of the neural network 330 may be interpreted as a feature extraction function.
For example, a first label may be assigned to first training data and a second label may be assigned to second training data. In this example, the neural network 330 may be responsive to input of the first training data and output the first output data, and responsive to input of the second training data and output the second output data. The training device 310 may then compare the first output data to the first label and adjust the parameters of the hidden layer 332 such that the first output data and the first label may become identical to each other. Similarly, the training device 310 may compare the second output data with the second label and adjust the parameters of the hidden layer 332 such that the second output data and the second label may become identical to each other. The training device 310 may pre-train the neural network 330 by repeating such processing as described above based on the large user database 320.
In one example, the training process may be performed by a batch unit. For example, a process of inputting training data to the neural network 330 and obtaining output data corresponding to an output from the neural network 330 in response to the input of the training data (for example, inputting a set of training data and obtaining a corresponding set of output data) may be performed by the batch unit, and pre-training using the large user database 320 may be performed by repeating such a process by the batch unit.
The output layer 333 may convert the feature vectors output from the hidden layer 332 into a form corresponding to the tag. With the pre-training, the parameters of the hidden layer 332 may be set to values suitable for the training target, and when the pre-training is completed, the parameters of the hidden layer 332 may be set or fixed. Subsequently, the output layer 333 may be removed from the neural network 330, and a first neural network of the feature extractor may be configured using the portion 340 including the input layer 331 and the hidden layer 332.
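By way of illustration, this pre-train-then-remove-the-output-layer procedure might be sketched as follows, again assuming PyTorch. The layer sizes, the optimizer, and the random stand-in for the large user database 320 are illustrative assumptions.

```python
import torch
import torch.nn as nn

input_dim, feat_dim, num_train_users = 201, 256, 1000

backbone = nn.Sequential(                    # input layer + hidden layers
    nn.Linear(input_dim, 512), nn.ReLU(),
    nn.Linear(512, feat_dim), nn.ReLU(),
)
head = nn.Linear(feat_dim, num_train_users)  # output layer, removed later
model = nn.Sequential(backbone, head)

loader = [(torch.randn(32, input_dim),                 # stand-in batches for
           torch.randint(0, num_train_users, (32,)))   # the large user DB
          for _ in range(10)]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()              # compares outputs with the labels

for x, y in loader:                          # batch-unit pre-training
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

first_network = backbone                     # keep only input + hidden layers
```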
When the pre-training is complete, the neural network 330 may perform a feature extraction function to output different output data in response to different input data (e.g., different sets of output data in response to different sets of input data). Such a feature extraction function may exhibit maximum performance when the training data is the same as the actual data used in the user registration process and the user identification process. However, the training data and the actual data may typically differ from each other. In theory, the mismatch between the training data and the actual data could be reduced, and the recognition performance thereby improved, by including the actual data in the training data and performing retraining.
However, the neural network 330 may need to be trained using the large user database 320 until the neural network 330 has the feature extraction function, and such training may require significant computational resources. User devices typically have limited computing resources, so such training may be performed on a large-scale server. Thus, according to one example, a dual training method is provided that includes pre-training and on-device training. The dual training method may generate the first neural network of the feature extractor by training the neural network 330 using the large user database 320, and generate the second neural network of the feature extractor based on actual data. Thus, the mismatch between the training data and the actual data may be reduced or minimized, and a feature extractor specific to the user device may be provided.
Fig. 4 is a diagram illustrating an example of operations for on-device training and user registration to be performed by the recognition apparatus. Referring to fig. 4, the active user inputs user data for user registration. The recognition device 410 performs on-device training on the feature extractor 420 based on the user data. The feature extractor 420 includes a first neural network 421 and a second neural network 422. The parameters of the first neural network 421 may be set or fixed by pre-training, and the parameters of the second neural network 422 may be adjusted by on-device training. For on-device training, reference data may be used. The recognition device 410 obtains reference data from the generalized user model 430 and inputs the obtained reference data to the second neural network 422. The user data may correspond to valid users and the reference data may correspond to generalized users. The generalized user model 430 will be described in more detail below.
The identification device 410 adjusts the parameters of the second neural network 422 by assigning a different label to each item or each set of the user data and the reference data and comparing the labels with the output from the feature extractor 420 in response to the input of the user data and the reference data. For example, the parameters of the second neural network 422 may be adjusted such that the output of the feature extractor 420 in response to the input of the user data (i.e., the output of the second neural network 422) becomes the same as the label assigned to the user data, and the output of the feature extractor 420 in response to the input of the reference data becomes the same as the label assigned to the reference data. As described above, the recognition device 410 may train the feature extractor 420 such that the feature extractor 420 outputs different feature vectors corresponding to the user data and the reference data, respectively. By training the feature extractor 420 using the user data and the reference data, the registration feature vectors of the registered users can be more accurately distinguished from one another, and the registration feature vectors of the registered users can be more accurately distinguished from the feature vectors of imposters. Thus, through on-device training, the feature extractor 420 may have a recognition capability to identify each registered user, as well as a verification capability to distinguish registered users from imposters and verify registered users.
When the on-device training is completed, the recognition apparatus 410 inputs user data to the feature extractor 420, and obtains a feature vector output from the feature extractor 420 in response to the input of the user data. The recognition device 410 stores the feature vector output by the feature extractor 420 as a registered feature vector in the registered user database 440. The registered feature vectors may then be used in a user identification process.
Fig. 5 is a diagram illustrating an example of on-device training. Referring to fig. 5, the recognition apparatus 510 performs on-device training on the feature extractor 520 using the user data and the reference data. The user data is input to the first neural network 521, and the reference data is input to the second neural network 522. The reference data is obtained from the generalized user model 540, and may be generated using a neural network configured to perform feature extraction similarly to the first neural network 521. The recognition device 510 inputs the user data to the first neural network 521; when the first neural network 521 outputs a feature vector in response to the input of the user data, the recognition device 510 inputs the output feature vector to the second neural network 522. Alternatively, this may be understood as the output from the first neural network 521 being input directly to the second neural network 522 without being controlled by the recognition device 510.
The second neural network 522 may be trained through processing similar to that performed on the neural network 330 of FIG. 3; this may be understood as the training data of FIG. 3 being replaced, in the example of FIG. 5, with the feature vectors corresponding to the user data and the reference vectors. The second neural network 522 includes an input layer 523, at least one hidden layer 524, and an output layer 525. For example, the input layer 523 may correspond to input data including the feature vectors corresponding to the user data and the reference data, and the output layer 525 may correspond to an activation function (such as softmax). The parameters (e.g., weights) of the hidden layer 524 may be adjusted through on-device training. For example, the parameters of the hidden layer 524 may be adjusted such that the output of the feature extractor 520 in response to the input of the user data (i.e., the output of the second neural network 522) becomes the same as the label assigned to the user data, and the output of the feature extractor 520 in response to the input of the reference data becomes the same as the label assigned to the reference data. The second neural network 522 may be constructed using a portion 530 that includes the input layer 523 and the hidden layer 524.
A label may be assigned to each item of the user data and the reference data. Through on-device training based on the user data, the reference data, and the different labels assigned to the respective data, the feature extractor 520 may become capable of outputting different output data in response to different user data and reference data. For example, a label may be assigned to each group of the user data and the reference data; through on-device training based on the user data, the reference data, and the different labels assigned to the data of the respective groups, the feature extractor 520 may become capable of outputting different groups of output data in response to different groups of user data and reference data. For example, the first neural network 521 may extract features from the input data in a general manner, while the second neural network 522 may remap the features extracted by the first neural network 521 so that the features become specific to the user of the individual device.
In one example, the training process may be performed by a batch unit. For example, a process of inputting one or a set of the user data and the reference data to the feature extractor 520 and obtaining one or a set of output data corresponding to an output from the feature extractor 520 may be performed by the batch unit, and on-device training using the user data and the reference data may be performed by repeating such a process via the batch unit. When the on-device training is complete, the parameters of the hidden layer 524 may be set or fixed. Subsequently, the output layer 525 may be removed from the second neural network 522, and the second neural network 522, with the output layer 525 removed, may be used as the final second neural network.
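By way of illustration, the on-device training loop described above might be sketched as follows, assuming PyTorch. The stand-in tensors replace the outputs of the frozen first neural network and the generalized feature vectors of the generalized user model, and the two labels (one for the valid user, one for the generalized user) are a simplification; with several registered users, each user's data would receive its own label.

```python
import torch
import torch.nn as nn

feat_dim = 256
user_feats = torch.randn(5, feat_dim)   # stand-in: first-network outputs for
                                        # five items of valid-user data
ref_vecs = torch.randn(100, feat_dim)   # stand-in: generalized feature vectors

x = torch.cat([user_feats, ref_vecs])
y = torch.cat([torch.zeros(len(user_feats), dtype=torch.long),  # label 0: user
               torch.ones(len(ref_vecs), dtype=torch.long)])    # label 1: ref.

second_net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())  # hidden
train_head = nn.Linear(feat_dim, 2)     # output layer, removed after training

opt = torch.optim.Adam(list(second_net.parameters()) +
                       list(train_head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                    # small on-device training loop
    opt.zero_grad()
    loss_fn(train_head(second_net(x)), y).backward()
    opt.step()
# only second_net is kept inside the feature extractor afterwards
```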
Through on-device training as described above, the mismatch between the training data and the actual data may be reduced or minimized. For example, the recognition capability of distinguishing the registration feature vectors from one another, which is obtained from the user data, and the verification capability of distinguishing the registration feature vectors from the feature vectors of imposters, which is obtained from the reference data, can be improved.
FIG. 6 is a diagram illustrating an example of generating a generalized user model. Referring to fig. 6, input data is extracted from a large user database 610 and input to a neural network 620. For example, the neural network 620 may correspond to the first neural network 421 of fig. 4, and the neural network 620 may output a feature vector based on the input data. The large user database 610 may be the same as or different from the large user database 320 of fig. 3.
In the example of fig. 6, the feature vectors output by the neural network 620 are indicated by small circles on the vector plane 630. These feature vectors may correspond to a plurality of generalized users included in the large user database 610, and are also referred to as base feature vectors. As vectors representing the base feature vectors, representative feature vectors (e.g., θ1, θ2, …, θc) may be selected. For example, the representative feature vectors θ1, θ2, …, θc may be selected by grouping the base feature vectors into clusters (e.g., one or more feature vectors are selected as representative feature vectors from each of some or all of the clusters). The representative feature vectors θ1, θ2, …, θc may correspond to generalized users and are also referred to as generalized feature vectors. In addition, the representative feature vectors θ1, θ2, …, θc may be included as reference data in the generalized user model 640 and used for on-device training. There may be, for example, tens or hundreds of such representative feature vectors; the number of representative feature vectors may be 1,000 or fewer, 500 or fewer, or 100 or fewer, corresponding to an amount of data that a user device can realistically process through deep learning or training. For example, 10 utterances may be collected from each of approximately one hundred thousand users to configure a database including approximately one million utterances, and approximately 100 representative feature vectors may be generated based on this database.
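By way of illustration, the clustering step might be sketched with k-means, one common way of grouping feature vectors into clusters (the description does not mandate a particular clustering algorithm). The cluster centers then serve as the representative, generalized feature vectors; the input here is a random stand-in for base feature vectors extracted by the first network.

```python
import numpy as np
from sklearn.cluster import KMeans

# stand-in for base feature vectors produced by the first network from the
# large user database (e.g., ~one million utterances in the example above)
base_vectors = np.random.randn(10_000, 256).astype(np.float32)

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(base_vectors)
generalized_user_model = kmeans.cluster_centers_  # ~100 generalized vectors
```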
Fig. 7 is a diagram showing an example of an operation for user identification to be performed by the identification device. Referring to fig. 7, the recognition apparatus 710 inputs test data to the feature extractor 720. In the example of fig. 7, the feature extractor 720 is in a state in which the on-device training is complete. The feature extractor 720 outputs a test feature vector in response to the input of the test data. The test data may be input by a test user in the user identification process. The test user may be an unidentified person attempting user identification through the identification device 710 in order to use the identification device 710. The test user may be a valid user or an imposter.
The recognition apparatus 710 obtains the registration feature vector from the registered user database 730, performs user identification on the test user by comparing the registration feature vector with the test feature vector, and generates an identification result. For example, the recognition device 710 determines a distance between the registration feature vector and the test feature vector, and generates the identification result based on a result of comparing the determined distance with a threshold. For example, the distance between the registration feature vector and the test feature vector may be determined based on a cosine distance or a Euclidean distance between the registration feature vector and the test feature vector.
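By way of illustration, extending the earlier verification sketch to several registered users, recognition may select the closest registered feature vector and then verify it against the threshold. The cosine distance and the threshold value remain placeholder assumptions.

```python
import numpy as np

def identify(test_vec, registered_db, threshold=0.4):
    """registered_db: dict mapping a registered user id to a registration
    feature vector. Returns the recognized user id, or None (likely imposter)."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best = min(registered_db, key=lambda u: cos_dist(test_vec, registered_db[u]))
    return best if cos_dist(test_vec, registered_db[best]) < threshold else None
```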
Fig. 8 and 9 are diagrams illustrating examples of changes in the distribution of feature vectors based on on-device training. In the example of fig. 8, registration feature vectors are indicated on vector planes 810 and 820. The registration feature vectors are indicated by small circles, and small circles having the same pattern indicate registration feature vectors of the same registered user. In one example, the registration feature vectors on the vector plane 810 are obtained by a feature extractor to which on-device training has not been applied, and the registration feature vectors on the vector plane 820 are obtained by a feature extractor to which on-device training has been applied. As shown in fig. 8, the registration feature vectors may be remapped through on-device training to be specific to the registered users. Accordingly, registered users having similar characteristics (e.g., family members) can be more accurately distinguished from each other.
Referring to fig. 9, in contrast to the vector planes 810 and 820 shown in fig. 8, the vector planes 910 and 920 also include feature vectors of imposters. In the example of fig. 9, an imposter's feature vector is simply referred to as an imposter feature vector and is indicated by a six-pointed star. In this example, the registration feature vectors and imposter feature vectors on the vector plane 910 are obtained by a feature extractor to which on-device training has not been applied, and those on the vector plane 920 are obtained by a feature extractor to which on-device training has been applied. As shown in fig. 9, through on-device training, the imposter feature vectors, in addition to the registration feature vectors, may be remapped so that they are more clearly distinguished from the registration feature vectors. Accordingly, a registered user and an imposter can be more accurately distinguished from each other, so that the registered user can be more accurately verified.
Fig. 10 is a flowchart illustrating an example of a recognition method based on on-device training. Referring to fig. 10, in operation 1010, the recognition apparatus receives user data for user registration input by a valid user. In operation 1020, the recognition apparatus performs on-device training on the feature extractor based on the user data and reference data corresponding to generalized users. In operation 1030, the recognition apparatus determines a registration feature vector based on an output from the feature extractor in response to an input of the user data. In operation 1040, the recognition apparatus receives test data input by a test user for user identification. In operation 1050, the recognition apparatus determines a test feature vector based on an output from the feature extractor in response to an input of the test data. In operation 1060, the recognition apparatus performs user identification on the test user based on a result of comparing the registration feature vector with the test feature vector. For a more detailed description of the recognition method based on on-device training, reference may be made to the descriptions provided above with reference to fig. 1 to 9.
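The following is a minimal sketch of operations 1030 to 1060, assuming PyTorch; averaging the extractor outputs in determine_feature_vector and the threshold of 0.4 are illustrative assumptions rather than details taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def determine_feature_vector(extractor: torch.nn.Module,
                             data: torch.Tensor) -> torch.Tensor:
    # Operations 1030 and 1050: derive a single feature vector from the
    # extractor output; averaging over several inputs is one plausible choice.
    with torch.no_grad():
        return extractor(data).mean(dim=0)

def identify_user(registration_vector: torch.Tensor,
                  test_vector: torch.Tensor,
                  threshold: float = 0.4) -> bool:
    # Operation 1060: accept when the cosine distance between the
    # registration and test feature vectors does not exceed the threshold.
    distance = 1.0 - F.cosine_similarity(registration_vector,
                                         test_vector, dim=0).item()
    return distance <= threshold
```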
Fig. 11 is a flowchart illustrating another example of a recognition method based on on-device training. Referring to fig. 11, in operation 1110, the recognition apparatus obtains a feature extractor including a first neural network having set parameters and a second neural network having adjustable parameters. In operation 1120, the recognition apparatus performs on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users. In operation 1130, when the on-device training is completed, the recognition apparatus performs user identification using the feature extractor. For a more detailed description of the recognition method based on on-device training, reference may be made to the descriptions provided above with reference to fig. 1 to 10.
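As a concrete illustration of operations 1110 and 1120, the following is a minimal PyTorch sketch and not the disclosed implementation: the feature dimension of 128, the binary labels (1 for the valid user, 0 for the generalized reference vectors), and the temporary scoring head are all assumptions, and FeatureExtractor and on_device_train are hypothetical names. Only the second network (and the head) receives gradient updates; the first network's parameters remain set.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """First network: pre-trained, parameters kept fixed ("set").
    Second network: small trainable head adjusted on the device.
    Assumes the first network outputs vectors of size feat_dim."""
    def __init__(self, first_net: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.first_net = first_net
        for p in self.first_net.parameters():
            p.requires_grad = False  # set (fixed) parameters
        self.second_net = nn.Sequential(  # adjustable parameters
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.second_net(self.first_net(x))

def on_device_train(extractor: FeatureExtractor,
                    user_data: torch.Tensor,
                    reference_vectors: torch.Tensor,
                    feat_dim: int = 128,
                    epochs: int = 5) -> None:
    """Label user data 1 and generalized reference vectors 0, then adjust
    only the second network (plus a training-only scoring head).
    reference_vectors are assumed to lie in the first network's output space."""
    head = nn.Linear(feat_dim, 1)  # hypothetical training-only scoring head
    params = list(extractor.second_net.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=1e-3)
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        with torch.no_grad():
            user_feat = extractor.first_net(user_data)  # frozen first network
        feats = torch.cat([user_feat, reference_vectors], dim=0)
        labels = torch.cat([torch.ones(user_feat.size(0)),
                            torch.zeros(reference_vectors.size(0))])
        logits = head(extractor.second_net(feats)).squeeze(1)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Keeping the first network frozen limits the number of trainable parameters, which is consistent with the small amount of user data and the small set of reference vectors that a user device can realistically process, as discussed with reference to fig. 6.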
Fig. 12 is a diagram showing an example of a recognition apparatus based on on-device training. The recognition apparatus 1200 may receive input data including user data and test data, and process operations of a neural network associated with the received input data. For example, the operations of the neural network may include user identification. The recognition apparatus 1200 may perform one or more or all of the operations or methods described herein with respect to processing the neural network and provide a result of processing the neural network to a user.
Referring to fig. 12, the recognition apparatus 1200 includes at least one processor 1210 and a memory 1220. The memory 1220 may be connected to the processor 1210 and store instructions executable by the processor 1210 and data to be processed by the processor 1210 or data processed by the processor 1210. The memory 1220 may include a non-transitory computer-readable medium (e.g., high speed Random Access Memory (RAM)) and/or a non-volatile computer-readable storage medium (e.g., at least one disk storage device, flash memory device, and other non-volatile solid-state memory devices).
Processor 1210 may execute instructions to perform one or more or all of the operations or methods described above with reference to fig. 1-11. In one example, when instructions stored in memory 1220 are executed by processor 1210, processor 1210 may receive user data for user registration input by a valid user, perform on-device training for feature extractor 1225 based on the user data and reference data corresponding to a generalized user, determine a registration feature vector based on output from feature extractor 1225 in response to input of the user data, receive test data for user identification input by a test user, determine a test feature vector based on output from feature extractor 1225 in response to input of the test data, and perform user identification for the test user based on a result of comparing the registration feature vector to the test feature vector.
In another example, when instructions stored in the memory 1220 are executed by the processor 1210, the processor 1210 may obtain the feature extractor 1225 including a first neural network having set parameters and a second neural network having adjustable parameters, perform on-device training on the feature extractor 1225 based on user data corresponding to a valid user and reference data corresponding to a generalized user, and perform user recognition using the feature extractor 1225 when the on-device training is completed.
Fig. 13 is a diagram showing an example of a user device. The user device 1300 may receive input data and process operations of a neural network associated with the received input data. For example, the operations of the neural network may include user identification. The user device 1300 may include the recognition apparatus described above with reference to fig. 1 to 12 and may perform the operations or functions of the recognition apparatus as described above with reference to fig. 1 to 12.
Referring to fig. 13, the user device 1300 includes a processor 1310, a memory 1320, a camera 1330, a storage device 1340, an input device 1350, an output device 1360, and a network interface 1370. The processor 1310, memory 1320, camera 1330, storage 1340, input device 1350, output device 1360, and network interface 1370 may communicate with each other through a communication bus 1380. For example, user device 1300 may include a smart phone, a tablet Personal Computer (PC), a laptop computer, a desktop computer, a wearable device, a smart home appliance, a smart speaker, a smart car, and so forth.
Processor 1310 may execute functions and instructions in user device 1300. For example, the processor 1310 may process instructions stored in the memory 1320 or the storage device 1340. The processor 1310 may perform one or more or all of the operations or methods described above with reference to fig. 1-12.
The memory 1320 may store information to be used in processing the operation of the neural network. Memory 1320 may include a computer-readable storage medium or a computer-readable storage device. Memory 1320 may store instructions to be executed by processor 1310 and store relevant information while software or applications are being executed by user device 1300.
The camera 1330 may capture still images, video images, or both. For example, the camera 1330 may capture an image of a face region input by a user for face verification. The camera 1330 may also provide a three-dimensional (3D) image including depth information of an object.
Storage 1340 may include a computer-readable storage medium or a computer-readable storage device. Storage device 1340 may store larger amounts of information for longer periods of time than memory 1320. For example, storage 1340 may include a magnetic hard disk, optical disk, flash memory, floppy disk and other types of non-volatile memory known in the relevant art.
The input device 1350 may receive an input from a user through a conventional input method (e.g., a keyboard and a mouse) or a newer input method (e.g., a touch input, a voice input, and an image input). For example, the input device 1350 may include a keyboard, a mouse, a touch screen, a microphone, and other devices that may detect an input from a user and transmit the detected input to the user device 1300. Through the input device 1350, data such as a user's fingerprint, iris, voice, or other audio may be input.
The output device 1360 may provide output to a user from the user device 1300 through visual, auditory, or tactile channels. For example, output devices 1360 may include displays, touch screens, speakers, vibration generators, and other devices that may provide output to a user. The network interface 1370 may communicate with an external device through a wired network or a wireless network.
The recognition apparatus, training apparatus, user devices and other apparatus, devices, units, modules and other components described herein with respect to fig. 1, 12 and 13 are implemented by hardware components. Examples of hardware components that may be used to perform the operations described in this application include, where appropriate: a controller, a sensor, a generator, a driver, a memory, a comparator, an arithmetic logic unit, an adder, a subtractor, a multiplier, a divider, an integrator, and any other electronic component configured to perform the operations described herein. In other examples, one or more of the hardware components that perform the operations described in this application are implemented by computing hardware (e.g., by one or more processors or computers). A processor or computer may be implemented by one or more processing elements, such as an array of logic gates, a controller and arithmetic logic unit, a digital signal processor, a microcomputer, a programmable logic controller, a field programmable gate array, a programmable logic array, a microprocessor, or any other device or combination of devices that is configured to respond to and execute instructions in a defined manner to achieve a desired result. In one example, a processor or computer includes or is connected to one or more memories that store instructions or software for execution by the processor or computer. A hardware component implemented by a processor or a computer may execute instructions or software (such as an Operating System (OS) and one or more software applications running on the OS) for performing the operations described in this application. The hardware components may also access, manipulate, process, create, and store data in response to execution of instructions or software. For simplicity, the singular terms "processor" or "computer" may be used in the description of the examples described in this application, but in other examples, multiple processors or computers may be used, or a processor or computer may include multiple processing elements or multiple types of processing elements, or both. For example, a single hardware component or two or more hardware components may be implemented by a single processor, or two or more processors, or a processor and a controller. One or more hardware components may be implemented by one or more processors, or processors and controllers, and one or more other hardware components may be implemented by one or more other processors, or other processors and other controllers. One or more processors, or processors and controllers, may implement a single hardware component or two or more hardware components. The hardware components may have any one or more of different processing configurations, examples of which include: single processors, independent processors, parallel processors, Single Instruction Single Data (SISD) multiprocessing, Single Instruction Multiple Data (SIMD) multiprocessing, Multiple Instruction Single Data (MISD) multiprocessing, and Multiple Instruction Multiple Data (MIMD) multiprocessing.
The methods illustrated in fig. 2-11 to perform the operations described in the present application are performed by computing hardware (e.g., by one or more processors or computers) implemented as executing instructions or software as described above to perform the operations described in the present application as performed by the methods. For example, a single operation or two or more operations may be performed by a single processor or two or more processors, or a processor and a controller. One or more operations may be performed by one or more processors, or processors and controllers, and one or more other operations may be performed by one or more other processors, or other processors and other controllers. One or more processors, or a processor and a controller, may perform a single operation or two or more operations.
Instructions or software for controlling a processor or computer to implement hardware components and perform methods as described above are written as computer programs, code segments, instructions, or any combination thereof, to individually or collectively instruct or configure the processor or computer to operate as a machine or special purpose computer to perform operations performed by hardware components and methods as described above. In one example, the instructions or software include machine code that is directly executed by a processor or computer (such as machine code generated by a compiler). In another example, the instructions or software comprise high-level code that is executed by a processor or computer using an interpreter. Instructions or software can be readily written by a programmer of ordinary skill in the art based on the block and flow diagrams illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations performed by the hardware components and methods as described above.
Instructions or software for controlling a processor or computer to implement hardware components and perform methods as described above, as well as any associated data, data files, and data structures, are recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid-state drives (SSD), card-type memory (such as a multimedia card or a mini-card (e.g., Secure Digital (SD) or extreme digital (XD))), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device configured to store and provide instructions or software and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the instructions.
Although the present disclosure includes specific examples, it will be apparent to those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered merely as illustrative and not restrictive. The description of features or aspects in each example will be considered applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order and/or if components in the described systems, architectures, devices, or circuits are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the present disclosure is defined not by the detailed description but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the present disclosure.

Claims (34)

1. An identification method, comprising: receiving user data for user registration input by a valid user; performing on-device training on a feature extractor based on the user data and reference data corresponding to generalized users; determining a registration feature vector based on an output from the feature extractor in response to an input of the user data; receiving test data for user identification input by a test user; determining a test feature vector based on an output from the feature extractor in response to an input of the test data; and performing user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
2. The identification method of claim 1, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
3. The identification method of claim 2, wherein the first neural network is pre-trained to extract features from input data based on a large user database.
4. The identification method of claim 1, wherein performing the on-device training comprises: assigning labels of different values to the user data and the reference data, respectively; and performing the on-device training based on a result of comparing the labels with outputs from the feature extractor in response to inputs of the user data and the reference data.
5. The identification method of claim 1, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein performing the on-device training comprises: inputting the user data to the first neural network; inputting, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and performing the on-device training based on an output from the second neural network.
6. The identification method of claim 1, wherein the reference data comprises generalized feature vectors corresponding to generalized users, and wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
7. The identification method of claim 1, wherein performing the user identification comprises: performing the user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold.
8. The identification method of claim 7, wherein the distance between the registration feature vector and the test feature vector is determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector.
9. The identification method of claim 1, further comprising: storing the determined registration feature vector in a registered user database.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the identification method of claim 1.
11. An identification method, comprising: obtaining a feature extractor comprising a first neural network having set parameters and a second neural network having adjustable parameters; performing on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users; and performing user identification on test data using the feature extractor when the on-device training is completed.
12. The identification method of claim 11, wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
13. The identification method of claim 11, wherein performing the on-device training comprises: inputting the user data to the first neural network; inputting, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and performing the on-device training based on an output from the second neural network.
14. An on-device training method for a feature extractor provided in a user device, the feature extractor comprising a pre-trained first neural network having set parameters and a second neural network having adjustable parameters, the on-device training method comprising: obtaining user data input by a valid user; inputting the user data to the first neural network; and adjusting the adjustable parameters of the second neural network by inputting, to the second neural network, preset reference data and an output from the first neural network in response to the input of the user data.
15. The on-device training method of claim 14, wherein the reference data comprises 1000 or fewer feature vectors.
16. The on-device training method of claim 14, wherein the reference data comprises 500 or fewer feature vectors.
17. The on-device training method of claim 14, wherein the reference data comprises 100 or fewer feature vectors.
18. The on-device training method of claim 14, wherein the reference data comprises generalized feature vectors corresponding to generalized users.
19. The on-device training method of claim 18, wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
20. An identification device, comprising: a processor; and a memory comprising instructions executable by the processor, wherein, when the instructions are executed by the processor, the processor is configured to: receive user data for user registration input by a valid user; perform on-device training on a feature extractor based on the user data and reference data corresponding to generalized users; determine a registration feature vector based on an output from the feature extractor in response to an input of the user data; receive test data for user identification input by a test user; determine a test feature vector based on an output from the feature extractor in response to an input of the test data; and perform user identification on the test user based on a result of comparing the registration feature vector with the test feature vector.
21. The identification device of claim 20, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
22. The identification device of claim 21, wherein the first neural network is pre-trained to extract features from input data based on a large user database.
23. The identification device of claim 20, wherein the processor is configured to: assign labels of different values to the user data and the reference data, respectively; and perform the on-device training based on a result of comparing the labels with outputs from the feature extractor in response to inputs of the user data and the reference data.
24. The identification device of claim 20, wherein the feature extractor comprises a first neural network having set parameters and a second neural network having adjustable parameters, and wherein the processor is configured to: input the user data to the first neural network; input, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and perform the on-device training based on an output from the second neural network.
25. The identification device of claim 20, wherein the reference data comprises generalized feature vectors corresponding to generalized users, and wherein the generalized feature vectors are generated by grouping feature vectors corresponding to a plurality of generalized users into clusters.
26. The identification device of claim 20, wherein the processor is configured to: perform the user identification based on a result of comparing a distance between the registration feature vector and the test feature vector with a threshold.
27. The identification device of claim 26, wherein the distance between the registration feature vector and the test feature vector is determined based on one of a cosine distance and a Euclidean distance between the registration feature vector and the test feature vector.
28. The identification device of claim 20, wherein the processor is configured to store the determined registration feature vector in a registered user database.
29. An identification device, comprising: a processor; and a memory comprising instructions executable by the processor, wherein, when the instructions are executed by the processor, the processor is configured to: obtain a feature extractor comprising a first neural network having set parameters and a second neural network having adjustable parameters; perform on-device training on the feature extractor based on user data corresponding to a valid user and reference data corresponding to generalized users; and perform user identification on test data using the feature extractor when the on-device training is completed.
30. The identification device of claim 29, wherein the adjustable parameters of the second neural network are adjusted through the on-device training.
31. The identification device of claim 29, wherein the processor is configured to: input the user data to the first neural network; input, to the second neural network, the reference data and an output from the first neural network in response to the input of the user data; and perform the on-device training based on an output from the second neural network.
32. An identification method, comprising: pre-training a first neural network of a feature extractor at a server; setting the feature extractor in a device after the first neural network is pre-trained; training a second neural network of the feature extractor on the device using data input to the device; and performing user identification on test data input to the device using the feature extractor.
33. The identification method of claim 32, wherein the data input to the device comprises user data for user registration input by a valid user and reference data corresponding to generalized users.
34. The identification method of claim 33, further comprising: performing the user identification by comparing a registration feature vector corresponding to the user data with a test feature vector corresponding to the test data.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2019-0108199 2019-09-02
KR20190108199 2019-09-02
KR1020190127239A KR20210026982A (en) 2019-09-02 2019-10-14 Method and apparatus for recognizing user based on on-device training
KR10-2019-0127239 2019-10-14

Publications (2)

Publication Number Publication Date
CN112446408A (en) 2021-03-05
CN112446408B (en) 2025-05-13

Family

ID=72243014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010637175.9A Active CN112446408B (en) 2019-09-02 2020-07-03 Method and apparatus for identifying users based on on-device training

Country Status (4)

Country Link
US (2) US11900246B2 (en)
EP (1) EP3798925A1 (en)
JP (1) JP7534045B2 (en)
CN (1) CN112446408B (en)

Also Published As

Publication number Publication date
US20240135177A1 (en) 2024-04-25
EP3798925A1 (en) 2021-03-31
US11900246B2 (en) 2024-02-13
US20210064923A1 (en) 2021-03-04
US20240232619A9 (en) 2024-07-11
JP7534045B2 (en) 2024-08-14
JP2021039749A (en) 2021-03-11
CN112446408B (en) 2025-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OR01 Other related matters