NL2035809B1

NL2035809B1 - Determining a relationship between a probe and multiple references under encryption

Info

Publication number: NL2035809B1
Application number: NL2035809A
Authority: NL
Inventors: Bassit Amina; Nicolaas Johan Veldhuis Raymond; Werner Hahn Florian; Peter Andreas
Original assignee: Univ Twente
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2025-03-25
Also published as: WO2025058514A1

Abstract

5 1 A method comprises receiving (201) probe information (88) which indicates, for a probe, per feature, an index which represents a value of the feature. The method further comprises selecting (203), per feature, from reference vectors associated with the feature, a reference vector (45 1-45 3) corresponding to the index indicated for the feature. The pluralities of reference vectors (423) are stored in a database. Each of the reference vectors comprises feature comparison scores relating to the references and are fully homomorphically encrypted. Feature comparison scores relating to a reference have been 10 obtained from a lookup table based on a value of the corresponding feature for the reference. The method further comprises calculating (205) a vector (455) of evaluations by summing up the selected reference vectors without decrypting them and determining or enabling to determine (207) a relationship between the probe and each of the references based on the vector of evaluations. 15 FIG. 14

Description

DETERMINING A RELATIONSHIP BETWEEN A PROBE AND MULTIPLE

REFERENCES UNDER ENCRYPTION

FIELD OF THE INVENTION

The invention relates to a computer-implemented method of determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption.

The invention also relates to computer program products enabling a computer system to perform such a method.

The invention further relates to a system for determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption,

BACKGROUND OF THE INVENTION

Biometric search is the process of comparing a biometric probe against many biometric references in a database to return either 1) a ranked list of the most similar candidates or 2) a comparison decision that the probe does or does not match with one or more references. More precisely, given a biometric probe and a biometric reference database, the goal is to efficiently evaluate a score function (e.g. cosine similarity) by pair-wise comparing all references in the database against a probe and return a list of scores, after which several filter functions (e.g. top-N ranking) can evaluate this list to yield a search decision.

For accurately automating the recognition process, biometric samples are often represented by distinctive feature vectors from well-trained deep neural networks. In large biometric reference databases, the biometric search problem becomes particularly challenging due to the number of references in the database, the vectors’ dimensionality, and the scalability of the score function with respect to both the number of references and the dimensionality.

Although biometric recognition systems achieve high recognition performance, this comes at the cost of individuals’ privacy. The biometric feature vector representations used in such systems contain personally identifiable information, and their analysis can reveal individuals’ soft biometrics, making them privacy-sensitive. Moreover, those feature vectors are invertible to the raw biometric samples. which can lead to identity fraud. Therefore, biometric feature vectors are sensitive and need to be protected in their three stages, namely at rest, in transit, and in use.

Fully homomorphic encryption (FHE) technology demonstrates very promising and practical solutions. The use of FHE guarantees the protection of biometric probes and references in their three stages since their FHE encrypted form allows arithmetic operations under encryption; thus. a score function can be evaluated on their encrypted form without decryption. Despite the heavy computational nature of homomorphic operations,

FHE can encrypt a vector of plaintext values and process it at once in a single-instruction multiple-data (SIMD) manner, which facilitates the search over encrypted biometric reference databases. Hence, for sensitivity and practicality reasons, biometric reference databases are homomorphically encrypted since the references are not only sensitive but also useful for evaluating a score function at a later stage.

Under encryption, the biometric search problem becomes even harder to tackle due to the heavy additional computational workload brought in by homomorphic operations. To cope with the burden of large biometric reference databases containing high dimensional vectors, traditional biometric search approaches, in both the cleartext and under encryption, avoid exhaustively parsing the full database by employing techniques such as indexing, hierarchical clustering, and hashing, to preselect and thus reduce the search space at the cost of their biometric search accuracy. Thus, their search results are only approximate since the probe is not evaluated across the entire database but only over a subset estimated to contain relevant references.

However, several applications, such as biometric forensics, demand high precision to avoid severe consequences. In such cases, an accurate full search across all references in the database is essential to prevent wrongful convictions of innocent people due to incorrect matches of traces like facial surveillance images or finger marks. Biometric search solutions exist that focus on the exhaustive biometric search using FHE supporting

SIMD by adapting their encrypted reference database using a packing technique to combine a maximum number of references per ciphertext, to allow the parallelization of several comparisons at the cost of one. However, the computational complexity of these solutions is relatively high. For example, the system disclosed in US2019/140818 Al encrypts the probe vector x and thus they are obliged to perform at least d homomorphic multiplications, where d is the dimension of the vector x.

SUMMARY OF THE INVENTION

It is the first objective of the invention to provide a method. which can be used to accurately determine a relationship between a probe and multiple references under encryption with reduced computational complexity.

It is the second objective of the invention to provide a system, which can accurately determine a relationship between a probe and multiple references under encryption with reduced computational complexity.

In the first aspect of the invention, a computer-implemented method of determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption comprises receiving probe information, said probe information indicating, for said probe. an index per feature of a plurality of features. said index having been determined per respective feature of said plurality of features based on a value of said respective feature for said probe, selecting, per respective feature of said plurality of features, from a plurality of reference vectors associated with said respective feature. a reference vector corresponding to said index indicated for said respective feature, said pluralities of reference vectors being stored in a database, each of said reference vectors comprising feature comparison scores relating to said plurality of references and being fully homomorphically encrypted, feature comparison scores relating to a respective reference of said plurality of references having been obtained from a lookup table based on a value of said respective feature for said respective reference. calculating a vector of evaluations by summing up said selected reference vectors without decrypting said selected reference vectors, said vector of evaluations being fully homomorphically encrypted. and determining or enabling to determine said relationship between said probe and each of said plurality of references based on said vector of evaluations. The method may be performed by software running on a programmable device. This software may be provided as a computer program product.

By using a lookup table-based method for comparing a probe and a reference, which methods are known to be multiplication-free and preserve the accuracy of the comparisons, and organizing the fully homomorphically encrypted database such that several references are packed in a way that enables them to be pair-wise compared against the same probe at once and such that it becomes possible to use the full plaintext packing capacity of ciphertexts to store the encrypted reference templates, it becomes possible to accurately determine a relationship between a probe and multiple references under encryption with reduced computational complexity. For example, an exhaustive search over encrypted references with zero homomorphic encryption multiplications may be realized.

Said probe may be a biometric probe, said references may be biometric references. and said features may be biometric features. for example. Summing up said selected reference vectors may comprise performing a vector summation in a Single

Instruction Multiple Data manner. Thus, the use of the SIMD property of vector summation under fully homomorphic encryption may be maximized to obtain an exhaustive biometric search with reduced computation complexity. The method may also be used in a domain other than the biometric domain.

Said pluralities of references vectors may belong to a first chunk of said database and said method may comprises selecting, per respective feature of said plurality of features, from a further plurality of reference vectors associated with said respective feature, a further reference vector corresponding to said index indicated for said respective feature, said pluralities of further reference vectors being stored in said database and belonging to a second chunk of said database, each of said further reference vectors comprising feature comparison scores relating to said plurality of further references and being fully homomorphically encrypted, feature comparison scores relating to a respective further reference of said plurality of further references having been obtained from said lookup table based on a value of said respective feature for said respective further reference, calculating a further vector of evaluations by summing up said selected further reference vectors without decrypting said selected further reference vectors, said further vector of evaluations being fully homomorphically encrypted. and determining or enabling to determine said relationship between said probe and each of said plurality of references further based on said further vector of evaluations.

The plaintext packing capacity of ciphertexts, which depends on the chosen security level for the FHE scheme, defines a maximum number of references that may be compared with a probe with the same operation. To compare a large number of references with a probe, the database may be partitioned into chunks, each comprising a maximum number of references determined by the plaintext packing capacity. When used for a biometric search, the use of the SIMD property of vector summation under fully homomorphic encryption may be maximized to obtain an optimal exhaustive biometric search with a complexity dominated by the number of chunks instead of the number of references.

Determining said relationship between said probe and said plurality of references may comprise applying a filter function to said homomorphically encrypted vector of evaluations. This is referred to as an encrypted decision. Alternatively, determining said relationship between said probe and said plurality of references may comprise decrypting said homomorphically encrypted vector of evaluations and applying a filter function to said decrypted vector of evaluations. This is referred to as a cleartext search decision.

Determining said relationship between said probe and said plurality of references may comprise determining a similarity or a dissimilarity between said probe and 5 each of said plurality of references. Determining said similarity between said probe and said plurality of references may comprise identifying references with an evaluation higher than a threshold and/or identifving references with the highest evaluation. Determining said dissimilarity between said probe and said plurality of references may comprise identifying references with an evaluation lower than said threshold and/or identifying references with the lowest evaluation.

Said pluralities of reference vectors and said vector of evaluations may be fully homomorphically encrypted with the public key of a privacy-compliant service provider and said privacy-compliant service provider may be allowed to learn only said relationship between said probe and each of said plurality of references by transmitting result information to said privacy-compliant service provider, said result information comprising said fully homomorphically encrypted vector of evaluations filtered or not.

Said method may further comprise obtaining pluralities of temporary reference vectors, each temporary reference vector of said pluralities of temporary reference vectors comprising one or more feature comparison scores relating to one or more new references at one or more positions, said one or more positions being unused in said pluralities of reference vectors, each of said one or more new references being associated with one of said one or more positions, each temporary reference vector of said pluralities of temporary reference vectors further comprising zeros at other positions and being fully homomorphically encrypted. said other positions being used for said plurality of references in said pluralities of reference vectors, and modifying said pluralities of reference vectors by adding each temporary reference vector of said pluralities of temporary reference vectors to a corresponding reference vector of said pluralities of reference vectors.

Said method may further comprise creating pluralities of further temporary reference vectors by multiplying a multiplication vector with each reference vector of said pluralities of reference vectors, said multiplication vector comprising one or more values equaling minus one at one or more positions which are used in said pluralities of reference vectors for one or more references which are to be removed from said database and zeros at other positions, said other positions being used in said pluralities of reference vectors for references which are not to be removed from said database, and modifving said pluralities of reference vectors by adding each further temporary reference vector of said pluralities of further temporary reference vectors to a corresponding reference vector of said pluralities of reference vectors.

This allows references to be added and/or removed from the database under encryption, thereby providing flexibility. Many existing biometric search approaches under encryption that try to cope with the burden of large biometric reference databases containing high dimensional vectors lack flexibility. They are static with respect to the addition and deletion of encrypted references since thev assume to have the entire reference database beforehand, structure it to allow efficient search, and then encrypt the structured database; thus, adding or deleting a reference requires changing the structure of the database.

Said indices indicated in said probe information may have been created by mapping a respective identifier selected from an ordered set of identifiers for each respective feature of said plurality of features to a permuted identifier from a permutation of said ordered set of identifiers, said identifiers identifying lines of cells in said lookup table, said respective identifier having a certain order number in said ordered set of identifiers, and said permuted identifier having said same certain order number in said permutation, said order of said permutation being dependent on said probe.

A first order in which said feature comparison scores are stored in different reference vectors of a respective plurality of reference vectors for a respective reference of said plurality of references may be different from a second order in which said feature comparison scores are stored in said lookup table. said first order being dependent on said respective reference.

By using permuted identifiers in the probe information, others, including the receiver of the probe information, may be prevented from determining from the probe information of multiple probes which probes have similar features. If the probe and a reference are the same subject, then the same permutation function would have been used on the probe information and the reference information of this subject. If the probe and a reference are not the same subject, then a different permutation function would have been used on the probe information and the reference information of this subject. This has the additional advantage of an improved recognition performance and an improved search performance.

Said feature comparison scores relating to a respective reference of said plurality of references and a respective feature of said plurality of features may have been obtained from said lookup table based a quantized actual value of said respective feature for said respective reference and may have been determined for quantized potential values of said respective feature for any future probe.

Said feature comparison scores may be obtained from a single lookup table for said plurality of features or from a lookup table per feature of said plurality of features.

In a fifth aspect of the invention, a system for determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption comprises at least one processor configured to receive probe information, said probe information indicating, for said probe, an index per feature of a plurality of features, said index having been determined per respective feature of said plurality of features based on a value of said respective feature for said probe, select, per respective feature of said plurality of features, from a plurality of reference vectors associated with said respective feature, a reference vector corresponding to said index indicated for said respective feature. said pluralities of reference vectors being stored in a database, each of said reference vectors comprising feature comparison scores relating to said plurality of references and being fully homomorphically encrypted. feature comparison scores relating to a respective reference of said plurality of references having been obtained from a lookup table based on a value of said respective feature for said respective reference, calculate a vector of evaluations by summing up said selected reference vectors without decrypting said selected reference vectors, said vector of evaluations being fully homomorphically encrypted, and determine or enable to determine said relationship between said probe and each of said plurality of references based on said vector of evaluations.

Moreover, a computer program for carrying out the methods described herein, as well as a non-transitory computer readable storage-medium storing the computer program are provided. A computer program may, for example, be downloaded by or uploaded to an existing device or be stored upon manufacturing of these systems.

A non-transitory computer-readable storage medium stores at least a first software code portion, the first software code portion, when executed or processed by a computer, being configured to perform executable operations for determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption.

The executable operations comprise receiving probe information, said probe information indicating, for said probe, an index per feature of a plurality of features, said index having been determined per respective feature of said plurality of features based on a value of said respective feature for said probe, selecting, per respective feature of said plurality of features, from a plurality of reference vectors associated with said respective feature, a reference vector corresponding to said index indicated for said respective feature, said pluralities of reference vectors being stored in a database. each of said reference vectors comprising feature comparison scores relating to said plurality of references and being fully homomorphically encrypted, feature comparison scores relating to a respective reference of said plurality of references having been obtained from a lookup table based on a value of said respective feature for said respective reference, calculating a vector of evaluations by summing up said selected reference vectors without decrypting said selected reference vectors, said vector of evaluations being fully homomorphically encrypted, and determining or enabling to determing said relationship between said probe and each of said plurality of references based on said vector of evaluations.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a device, a method. or a computer program product.

Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system." Functions described in this disclosure may be implemented as an algorithm executed by a processor/microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied, ¢.g., stored, thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.

More specific examples of a computer readable storage medium may include, but are not limited to, the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store, a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in a baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms,

including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.

A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber, cable, RF, etc. or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java(TM), Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any tvpe of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an

Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor, in particular a microprocessor or a central processing unit (CPU), of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).

It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention are apparent from and will be further elucidated, by way of example, with reference to the drawings, in which:

Fig. 1 is a block diagram of an embodiment of the system for determining or enabling to determine a relationship between a probe and multiple references under encryption;

Fig. 2 is a flow chart of a first embodiment of the method of determining or enabling to determine a relationship between a probe and multiple references under encryption;

Fig. 3 is a flow chart of an example method of creating a lookup table;

Fig. 4 illustrates the projection of a point belonging to the unit ball of a space onto an axis;

Fig. 5 illustrates the determination of border values of cells of the lookup table in the method of Fig. 3;

Fig. 6 shows an example of a lookup table created with the method of Fig. 3:

Fig. 7 is a flow chart of a first example method of creating reference information;

Fig. 8 is a flow chart of a second example method of creating reference information;

Fig. 9 shows an example of the pluralities of temporary reference vectors which may be created to add a reference to the database;

Fig. 10 shows steps for the addition of a reference to the database in a second embodiment of the method;

Fig. 11 shows an example of the pluralities of reference vectors stored in the database after the modifications in step 223 of Fig. 9;

Fig. 12 a flow chart of a first example method of creating probe information;

Fig. 13 a flow chart of a second example method of creating probe information;

Fig. 14 illustrates the method of Fig. 2 being performed on the database of

Fig. 11;

Fig. 15 shows steps for the removal of a reference from the database in a third embodiment of the method;

Fig. 16 shows an example of the pluralities of further temporary reference vectors created in step 231 of Fig. 15;

Fig. 17 is a flow chart of a fourth embodiment of the method;

Fig. 18 shows an example of a database with multiple chunks used in the method of Fig. 17;

Figs. 19-21 show the biometric performance of the MFIP comparator for the verification task and for the search task; and

Fig. 22 is a block diagram of an exemplary data processing system for performing the methods of the invention.

Corresponding elements in the drawings are denoted by the same reference numeral.

DETAILED DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of an embodiment of the system for creating a lookup table of feature comparison scores of a plurality of features for determining a relationship between a probe and a reference under encryption: a system 1. Fig. 1 further shows a client device 11, a search server 21 of a privacy-compliant service provider, and a database server 31.

In an honest-but-curious scenario, client devices may be used, for example, by biometric data owners who want to use a service (e.g. a social service) offered by a service provider (e.g. healthcare, pension, or public assistance programs). The service provider (i.e. the search server) gives access to its services only after checking if a client is present in its biometric reference database. On the one hand. the client wants to keep its biometric data protected, and the service provider wants to comply with the data protection regulations, such as the European GDPR. On the other hand. the service provider needs to store the protected biometric reference during enrollment and learn the search outcome before granting access.

Hence, the service provider should be able to read the search outcome after searching over a protected biometric reference database; at the same time, the service provider should not be able to learn the unprotected biometric references. One way to achieve this 1s by homomorphically encrypting the biometric reference database using the public key (pk) of the service provider and delegating the search to a separate non-colluding database server, which returns the encrypted search outcome to the service provider who uses its private/secret key (sk) to decrypt it.

Thus, the client devices encrypt their biometric reference templates using pk but send them to the database server, which does not know sk, to update the encrypted reference database with these templates, search over the reference templates while encrypted, and send to the service provider only the search outcome encrypted. Therefore, the clients’ biometric data is protected during the process, the service provider complies with the regulations and can learn the search outcome, and the database server is processing only encrypted data.

The client devices 11, 41, and 42 are configured to create probe information and reference information for determining the relationship under encryption. The database server 31 is configured to enable the search server 21 to determine the relationship. The probe may be a biometric probe, the reference may be a biometric reference, and the features may be biometric features, for example.

The system | comprises a receiver 3, a transmitter 4, a processor 5, and a memory 7. The processor 5 is configured to create one or more lookup tables. The processor 5 may be configured to transmit, via the transmitter 4, the one or more lookup tables to the client devices 11, 41, and 42, for example. As a first example, the method disclosed in the paper “‘Multiplication-free biometric recognition for faster processing under encryption” by

A. Bassit, F. Hahn, R. Veldhuis, and A. Peter, published in 2022 IEEE International Joint

Conference on Biometrics (IJCB), 2022, pp. 1-9, may be used to create a single lookup table for all features.

For example, the processor 5 may be configured to determine a probability density function corresponding to a projection of a point belonging to the unit ball of a space onto an axis, determine a cumulative distribution function of the probability density function, determine an inverse of the cumulative distribution function, and determine border values of cells of the lookup table based on the inverse of the cumulative distribution function. The space has a number of dimensions which is equal to the number of features in the plurality of features. This unit ball is also referred to as the unit d-ball, where d is the number of dimensions of the space. The border values apply to both a first axis and a second axis of the lookup table.

In this example, the processor is further configured to determine a first score component for each respective cell of the cells and determine a second score component for each respective cell of the cells. The first score component is an integration of the probability density function over normalized possible values of the plurality of features in an interval between the border values of the respective cell on the first axis of the lookup table. The second score component is an integration of the probability density function over normalized possible values of the plurality of features in an interval between the border values of the respective cell on the second axis of the lookup table.

The integrations may be calculated as part of the score determination or may be calculated before the scores are determined. In this example, the processor 5 is further configured to calculate, for each respective cell of the cells, a feature comparison score by performing a mathematical operation on the first score component calculated for the respective cell and the second score component calculated for the respective cell.

As a second example, a lookup table may be created per feature using the method disclosed in the paper “Fast and accurate likelihood ratio-based biometric verification secure against malicious adversaries” by A. Bassit, F. Hahn, J. Peeters, T.

Kevenaar, R. Veldhuis, and A. Peter, published in IEEE Transactions on Information

Forensics and Security, vol. 16, 2021.

The client devices 11, 41, and 42 each comprise a receiver 13, a transmitter 14, a processor 15, and a memory 17. The processor 15 is configured to obtain, ¢.g. receive via receiver 13, one or more lookup tables from the system 1, obtain a normalized and quantized value of each of a plurality of features for the reference, select, from the applicable lookup table, a line of cells per respective feature of the plurality of features on the first axis of the one or more lookup tables based on the normalized and quantized value of the respective feature and the border values on the second axis of this lookup table, create a vector for each of the selected lines of cells, and create the reference information by homomorphically encrypting each of the vectors, typically with a public key associated with the privacy-compliant service provider. The processor 15 is configured to transmit, via the transmitter 14, the reference information to the search server 21, i.e. to the service provider.

In an alternative embodiment, the processor 15 is configured to transmit the reference information directly to the database server 31.

The processor 15 is further configured to obtain a normalized and quantized value of each of a plurality of features for the probe and select an identifier per respective feature of the plurality of features based on the normalized and quantized value of the respective feature and the border values on the first axis of the lookup table which is applicable to the respective feature. The identifier identifies a line of cells on the second axis of this lookup table. The processor 15 is further configured to determine, for inclusion in the probe information. an index per feature of the plurality of features based on the selected identifiers. The processor 15 is configured to transmit, via the transmitter 14, the probe information to the search server 21, i.e. the service provider. The transmitted probe information includes the index per feature. In an alternative embodiment, the processor 15 is configured to transmit the probe information directly to the database server 31.

The search server 21 comprises a receiver 23, a transmitter 24, a processor 25, and a memory 27. The processor 25 is configured to receive, via receiver 23, reference information from a client device, and to transmit, via transmitter 24, the reference information to the database server 31. The processor 25 is further configured to receive, via receiver 23, probe information from a client device and transmit, via transmitter 24, the probe information to the database server 31, receive, via receiver 23, result information from the database server 31 in response to providing the probe information, and determine the relationship between the probe and the reference based on the result information.

The database server 31 comprises a receiver 33, a transmitter 34, a processor 35, and a memory 37. The processor 35 is configured to receive, via receiver 33, items of reference information from the search server 21. Each item of reference information comprises a plurality of homomorphically encrypted vectors. Each vector comprises a plurality of feature comparison scores. The items of reference information are provided by one or more of the client devices 11, 41, and 42 to the search server 21 and by the search server 21 to the database server 31 in the enrollment phase.

The processor 35 is further configured to store the received reference information in a database stored in memory 37. The database stores at least a first chunk comprising multiple pluralities of reference vectors for a plurality of references. Each plurality of reference vectors is associated with one of the features. Each of the reference vectors comprises feature comparison scores relating to the plurality of references and is fully homomorphically encrypted. Feature comparison scores relating to a respective reference of the plurality of references have been obtained from a lookup table based on a value of the respective feature for the respective reference.

The database may store multiple chunks for multiple pluralities of references.

If n features are extracted for each reference and each probe, each chunk comprises n pluralities of reference vectors, and if the lookup table 71 of Fig. 6 is used, each plurality of reference vectors comprises eight reference vectors. The processor 35 is further configured to receive, via receiver 33, the probe information from the search server 21. The probe information comprises an index per feature of a plurality of features. The index has been determined per respective feature of the plurality of features based on a value of the respective feature for the probe.

The processor 35 is further configured to select. per respective feature of the plurality of features, from the plurality of reference vectors associated with the respective feature, a reference vector corresponding to the index indicated for the respective feature in the probe information, and calculate a vector of evaluations by summing up the selected reference vectors without decrypting the selected reference vectors. The vector of evaluations is fully homomorphically encrypted (FHE).

FHE is a revolutionary encryption technology that enables secure computation over encrypted data, preserving its privacy during processing. Its security against Indistinguishable Chosen-Plaintext Attack (IND-CPA) prevents honest-but-curious attackers from learning the protected data by observing or manipulating ciphertexts.

However, its practical implementation is challenging due to its high computational complexity.

The heaviest homomorphic operations are multiplications and rotations, while homomorphic additions come for free. Thus, given a security level of X bits, a ciphertext has a plaintext capacity of e(2) := c slots, which means it can encrypt a plaintext vector of up to ¢ slots into one ciphertext and perform ¢ homomorphic operations at the cost of one homomorphic operation. By exploiting the SIMD property of FHE, FHE computations can be executed more efficiently, making it suitable for searching large encrypted reference databases via a packing approach. Such a packing approach allows the parallelization of several comparisons at the cost of one. The database server 31 leverages the SIMD property of FHE by using a vertical packing approach, where each ciphertext represents the encryption of a vector element of several references.

Given a d-dimensional vector x=(xi,*-,Xq), the encryption of x using traditional HE schemes implies an elementwise encryption [x] = (xl, ..., [x4]) resulting in d ciphertexts, while using FHEs supporting SIMD, it becomes a single ciphertext packing the entire vector [x] = (kl, ..., [x4]). Thus, arithmetic operations ©. such as additions and multiplications, applied to ciphertexts packing a vector will be spread over its vector elements (a.k.a plaintext slots) [x] ® [yl = [Gi ® yi, .... xg © ya)]. In addition, the

SIMD property allows the homomorphic rotation of a vector under encryption

Rot; (I(x, 2x) = [Ci Xi wor Xa, X15 oo, Xi]

The processor 35 is optionally configured to apply a filter function to the homomorphically encrypted vector of evaluations. The processor 35 is further configured to transmit, via transmitter 34, result information to the search server 21 to enable the search server 21 to determine the relationship between the probe and the reference based on the result information.

As the database server 31 is preferably not able to decrypt the fully homomorphically encrypted vector of evaluations, the pluralities of reference vectors and the vector of evaluations are preferably fully homomorphically encrypted with the public key of the privacy-compliant service provider which operates the search server 21, and the result information preferably comprises the fully homomorphically encrypted vector of evaluations, filtered or not. In this case, the processor 25 of the search server 21 is configured to decrypt the (filtered or unfiltered) homomorphically encrypted vector of evaluations.

If the vector of evaluations is not already filtered, the processor 25 of the search server 21 is configured to apply a filter function to the decrypted vector of evaluations, thereby making a cleartext search decision. If the vector of evaluations is already filtered, an encrypted decision is made.

In the embodiments shown in Fig. 1, the systems 1, 21, and 31 comprise one processor 5, one processor 25, and one processor 35, respectively. In an alternative embodiment, one or more of the systems 1, 21, and 31 comprise multiple processors. The processors 5, 25, and 35 may be general-purpose processors, e.g.. ARM, Qualcomm, AMD, or Intel processors. or application-specific processors. The processors 5, 25, and 35 may run

Google Android, Apple 10S, a Unix-based operating system, or Windows as operating system, for example.

The receivers 3, 23, and 33 and the transmitters 4, 24, and 34 of the systems

I, 21, and 31, respectively, may use one or more wired or wireless communication technologies such as Ethernet, Wi-Fi, LTE, and/or 5G New Radio to communicate with other devices on the Internet. The receiver and the transmitter of a system may be combined in a transceiver. The systems 1, 21, and 31 may comprise other components typical for a digital device such as a server.

In the embodiment shown in Fig. 1, the client devices 11, 41, and 42 comprise one processor 15. In an alternative embodiment, one or more of the client devices 11, 41, and 42 comprise multiple processors. The processor 15 may be a general-purpose processor, ¢.g., an ARM, Qualcomm, AMD, or Intel processor, or an application-specific processor. The processor 15 may run Google Android, Apple 108, a Unix-based operating system, or Windows as an operating system, for example.

The receiver 13 and the transmitter 14 of the client devices 11, 41, and 42 may use one or more wired or wireless communication technologies such as Ethernet, Wi-Fi,

LTE, and/or 5G New Radio to communicate with other devices on the Internet. The receiver 13 and the transmitter 14 may be combined in a transceiver. The client devices 11. 41, and 42 may comprise other components typical for a client device, ¢.g., a battery and/or a power connector.

A first embodiment of the computer-implemented method of determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption is shown in Fig. 2. This method may be performed by the database server 31 of Fig. 1, for example. The probe may be a biometric probe, the reference may be a biometric reference, and the features may be biometric features, for example.

A step 201 comprises receiving probe information. The probe information indicates, for the probe, an index per feature of a plurality of features. The index has been determined per respective feature of the plurality of features based on a value of the respective feature for the probe.

A step 203 comprises selecting, per respective feature of the plurality of features, from a plurality of reference vectors associated with the respective feature, a reference vector corresponding to the index indicated for the respective feature in the probe information received in step 201. The pluralities of reference vectors have been stored in a database. Each of the reference vectors comprise feature comparison scores relating to the plurality of references and are fully homomorphically encrypted.

Feature comparison scores relating to a respective reference of the plurality of references have been obtained from a lookup table based on a value of the respective feature for the respective reference. The feature comparison scores are obtained from a single lookup table for the plurality of features or from a lookup table per feature of the plurality of features, as described in relation to Fig. 1.

A step 205 comprises calculating a vector of evaluations by summing up the selected reference vectors without decrypting the selected reference vectors. The vector of evaluations is fully homomorphically encrypted. A step 207 comprises determining or enabling to determine the relationship between the probe and each of the plurality of references based on the vector of evaluations calculated in step 205.

Determining the relationship between the probe and the plurality of references may comprise applying a filter function to the homomorphically encrypted vector of evaluations. Alternatively, determining the relationship between the probe and the plurality of references may comprise decrypting the homomorphically encrypted vector of evaluations and applying a filter function to the decrypted vector of evaluations.

Determining the relationship between the probe and the plurality of references may comprise determining a similarity or a dissimilarity between the probe and each of the plurality of references. Determining the similarity between the probe and the plurality of references may comprise identifying references with an evaluation higher than a threshold and/or identifying references with the highest evaluation. Determining the dissimilarity between the probe and the plurality of references may comprise identifying references with an evaluation lower than the threshold and/or identifying references with the lowest evaluation.

Fig. 3 shows an example method of creating a lookup table. This example method is described in more detail in the above-mentioned paper “Multiplication-free biometric recognition for faster processing under encryption”. This method may be performed by the system 1 of Fig. 1, for example.

A step 101 comprises determining a probability density function (PDF) corresponding to a projection of a point belonging to the unit ball of a space onto an axis.

The space has a number of dimensions which is equal to the number of features in the plurality of features. This unit ball is also referred to as the unit d-ball, where d is the number of dimensions of the space. This PDF is the PDF of every coordinate of a d-dimensional normalized feature vector. Preferably, the probability density function corresponds to a projection, onto the axis, of points which are uniformly distributed over the unit ball. This results in the best performance.

Fig. 4 illustrates the projection of a point belonging to the surface area of the unit d-ball onto the x-axis. In the example of Fig. 4, the projection axis is the x axis 51. Let

X denote the random variable after projection on the x-axis 51. Let x denote a realization 55 of that random variable. Let g(x) denote the angle of a point on the d-ball that is projected d onto x. Let Sa-1 denote the surface 53 of a d-ball given by Sq-1(T) = 9 121 wherer 2 denotes the radius and I" denotes the Gamma function T(z) = | 0 e ttZ-ldt. The probability mass of the points on the d-ball is uniformly distributed over the d-ball. The probability mass that a point on the d-ball is projected onto the x-axis is the mass contained on the surface of the (d-1)-ball with center x and radius 56: V1 — x? = sin (¢(x)).

In order to derive the PDF fx (x). the equation f(x) = =~ PR{X < x} may 1 n - 1. be used. Fig. 4 shows that PR{X < x} = IG Soto S4-2(sin(£))dt, where SD isa normalization factor. The same probability density function can be used for both the x-axis and the y-axis. Thus, f(x) = fy(x) = f, (x). In view of the above, the probability density function f(x) may be calculated with the following equation: 1 (d-3) fC) = omar (VI- 2%) (1) 20 2 where the Beta function B(x,y) = I TI — NVT dt, xel-11], and d is the number of dimensions of the space.

A step 103 comprises determining a cumulative distribution function (CDF) of the probability density function. The cumulative distribution function may be calculated with the following equation:

F(x) = — |= (z) arcsin(t) +

BGI 1273

Tn ; . x tt 774 n dt n-2kf_ ot (* TT 2k 3 {n-2k-j) | 9 2-1 do (x) (n—2k) (37 oda( 1) j (v 1-t ) t 9 (2) where B(x, y) = fy tT H1 — £)¥"1 dt. d is the number of dimensions of the space, and n= d-2.

A step 105 comprises determining an inverse of the cumulative distribution function. For calculating the inverse CDF, Brent’s method implemented in SciPy (e.g. (release 1.9.0) may be used to compute the numerical inverse of the CDF at a specific point, see e.g. htips://scipv org.

A step 107 comprises determining border values of cells of the lookup table based on the inverse of the cumulative distribution function. The border values apply to both a first axis and a second axis of the lookup table. A single lookup table is used for all features, which is possible, because all features follow the same PDF. The raw values of different features may have different ranges but they are normalized to fall within the same range [-1,1]. The raw values are further quantized to accommodate the lookup table. A feature quantization maps the continuous domain of real numbers to a finite set of bins, which is needed to limit the possible feature values. The size of the lookup table depends on the chosen number of feature quantization levels N = 2", where n is the number of bits upon which the feature is encoded.

The bins’ borders, and thus the cell borders, may be determined with a variation on the algorithm shown below. The below algorithm is taken from the paper “Fast and accurate likelihood ratio-based biometric verification secure against malicious adversaries” by A. Bassit, F. Hahn, J. Peeters, T. Kevenaar, R. Veldhuis, and A. Peter, IEEE

Transactions on Information Forensics and Security, vol. 16, 2021. In the below algorithm,

ICDF(p,0,1) is the inverse of the CDF of a NM (0,1) at the cumulative probability p. In the variation on this algorithm, the inverse CDF of the point projection on the unit d-ball is used stead of ICDF(p.0.1).

Algorithm 1:

Input: N: number of feature quantization levels

Output: Bn: array containing the bins’ borders of size N —1; forj <= ltoN-1do p=i/N:

Bn[j]=ICDF(p.0,1); end for

Fig. 5 illustrates the determination of border values of cells of the lookup table. Fig. 5 shows an example of a lookup table 61 with four feature quantization levels and feature vectors of dimension d. The border values 63, 64, and 65 are the same for both axes and each correspond to the inverse CDF of the projection of a point belonging to the unit d- ball onto an axis. The calculation of the feature comparison scores, including the feature comparison score of cell 69, will be described in relation to steps 109-113.

Fig. 6 shows an example of the lookup table created with the method of Fig. 3. In general, one of the axes of the lookup table corresponds to the potential reference and the other axis corresponds to the potential probe. In the example of Fig. 6, the y axis of the lookup table 71 represents possible values of the reference and the x axis of the lookup table 71 represents possible features values of the probe. In the example of Fig. 6, the lookup table 71 has eight feature quantization levels and feature vectors of dimension d. There is only one lookup table for all dimensions/features and a higher number of dimensions/features does not increase the size of the lookup table. The lookup table 71 contains 8x8=64 feature comparison scores.

A step 109 comprises determining a first score component for each respective cell of the cells. The first score component is an integration of the probability density function over normalized possible values of the plurality of features in an interval between the border values of the respective cell on the first axis of the lookup table. The integrations may be calculated as part of the score determination or may be calculated before the scores are determined (either in step 109 or in a separate step). If the first axis is the x axis, a first score component Ex[x|Bn,] may be calculated as follows:

Ey[x|Bn,] = Jom, xf (x)dx (3)

Bn is the interval between the border values of the cell x,y with respect to the x-axis. The interval Bn, has one of the three forms [—1,Bn[1]). [Bn[j].Bn[j+1], or [Bn|N- 1].1]. For example, the border value 65 of Fig. 5 corresponds to Bn[1], the border value 64 of

Fig. 5 corresponds to Bn[2], and the border value 63 of Fig. 5 corresponds to Bn[3].

Equation (3) defines the expectation over the cell’s borders with respect to the x-axis Bn.

A step 111 comprises determining a second score component for each respective cell of the cells. The second score component is an integration of the probability density function over normalized possible values of the plurality of features in an interval between the border values of the respective cell on the second axis of the lookup table. The integrations may be calculated as part of the score determination or may be calculated before the scores are determined (either in step 111 or in a separate step). If the second axis Is the y axis, a second score component Ey [v|Bny] may be calculated as follows:

EylylBny] = fo, vf Ody 4)

Bn, is the interval between the border values of the cell x,v with respect to the y-axis. The interval Bn, has one of the three forms [-1,Bn[1]). [Bn[j].Bn[j+1], or [Bn|N-

11.1]. Equation (3) defines the expectation over the cell’s borders with respect to the y-axis

Bn,.

A step 113 comprises calculating, for each respective cell of the cells, a feature comparison score by performing a mathematical operation on the first score component calculated for the respective cell and the second score component calculated for the respective cell. By performing step 113, the creation of the lookup table is completed.

The feature comparison score may be calculated, for example, for each respective cell of the cells, by multiplying the first score component calculated for the respective cell and the second score component calculated for the respective cell, i.e. by calculating the inner product. as shown below in Equation (5). Equation (6) represents the probability of a cell.

Ex bx, y|Bn] = Xr bin] 5)

P(Bn) == (6)

Alternatively, the feature comparison score may be calculated, for example, for each respective cell of the cells, by calculating the square of a difference between the first score component calculated for the respective cell and the second score component calculated for the respective cell, i.e. by calculating the squared Euclidian distance, as shown below in Equation (7).

Buy, y1Bn) = Eel Bellen) 7

The feature comparison scores Ey y[x, y|Bn] may be intermediate comparison scores which are then quantized to integers. Given that the feature comparison scores Exy[x, ylBn] are real-valued, another quantization mapping may be applied, that may be referred to as cell quantization, to convert the real values of the intermediate comparison scores to integers, making them suitable for homomorphic encryption (HE) schemes with an integer plaintext space. The cell quantization takes the cell’s intermediate comparison score, divides it by a quantization step A, and rounds it to the nearest integer. The resulting lookup table, e.g. lookup table 71 of Fig. 6, consists of precomputed integer values. An example of a final feature comparison score is score 77 (“5”) of Fig. 6.

In summary, to generate the lookup table, first, the borders of a cell are determined by equiprobably cutting the x-axis and y-axis according to the PDF of the projection of a point belonging to the unit d-ball onto an axis. Then, a table of NxN cells is defined, where N denotes the feature quantization levels, and each cell's borders are determined according to the bins Bn for both axes x and y. Subsequently, the cell’s feature comparison score is calculated with the joint conditional expectation Equation (5) or the joint conditional expectation Equation (7) for independent X and Y . A lookup table generated with Equation (5) is also referred to as an MFIP (Multiplication-Free Inner Product) lookup table and a lookup table generated with Equation (7) is also referred to as an MFSED (Multiplication-Free Squared Euclidean Distance) lookup table.

Fig. 7 shows a first example of a method of creating reference information.

This method may be performed by the client device 11 of Fig. 1, for example. A step 131 comprises obtaining a lookup table created with the method of Fig. 3. A step 133 comprises obtaining a normalized and quantized value of each of a plurality of features for the reference. The same normalization and feature quantization performed in step 107 of Fig. 3 is performed in step 133.

A step 135 comprises selecting, from the lookup table, a line of cells per respective feature of the plurality of features on the first axis of the lookup table based on the normalized and quantized value of the respective feature, as obtained in step 133, and the border values on the second axis of the lookup table, as obtained in step 131. A step 137 comprises creating a vector for each of the lines of cells selected in step 135. A step 139 comprises creating the reference information by homomorphically encrypting each of the vectors created in step 137, e.g. with a public key associated with a privacy-compliant service provider.

Fig. 8 shows a second example of a method of creating reference information. The second example of Fig. 8 is an extension of the first example of Fig. 7. In the example of Fig. 8, step 137 of Fig. 7 is implemented by a step 141. Step 131 comprises obtaining a lookup table created with the method of Fig. 3, e.g. the lookup table 71 of Fig. 6.

Step 133 comprises obtaining a normalized and quantized value of each of a plurality of features for the reference. The result is reference feature vector 81, i.e. (fli Te).

Step 135 comprises selecting, from the lookup table, a line of cells per respective feature of the plurality of features on the first axis of the lookup table based on the normalized and quantized value of the respective feature, as obtained in step 133, and the border values on the second axis of the lookup table, as obtained in step 131. In the example of Fig. 6, the y axis of the lookup table 71 represents possible values of the reference and the x axis of the lookup table 71 represents possible features values of the probe. This means that rows of the lookup table 71 would be selected in step 135. In Fig. 6, row 73 corresponds to bin 1. In the example of Fig. 8. rows 82, which comprise rows #2, #5, and #7 of lookup table 71, are selected. Row #7 (the row corresponding to bin #7) is selected for feature #1, row #5 1s selected for feature #1, and row #2 is selected for feature #d.

Step 141 comprises creating a vector for each of the lines of cells selected in step 135 such that the order in which the feature comparison scores of the lines of cells are stored in the vectors is different from an order in which the feature comparison scores are stored in the lines of cells on the first axis in the lookup table. Step 141 may comprise applying a permutation function to the selected lines of cells. Different permutation functions may be applied to different lines of cells. The same permutation function(s) will be used in relation to the probe.

In the example of Fig. 8, a permutation function 7; is applied to the row selected for feature fi (row #7), a permutation function 7; is applied to the row selected for feature fi (row #5), and a permutation function 7 is applied to the row selected for feature fy (row #2). This results in the creation of vectors 83. In the example of Fig. 8, of the vectors 83, the vector 401 which corresponds to feature fi 1s (0,2.1,-2,3,-1,-3,0), which is a permutation of row #7 of the lookup table 71 (-3,-2,-1,0,0,1,2,3). The vectors 402 and 403 correspond to features fi and fa. respectively. The permuted reference template created in step 141 for the reference/subject u can be written as (rt) = (sh) Kel vg? ie[1,4]- If a single lookup table is used for all features, then Ni=N for ie[1,d].

Step 139 comprises creating the reference information by homomorphically encrypting each of the vectors created in step 141. The reference information for multiple references is used to construct a reference database (DB), e.g. a biometric reference database.

By using one or more lookup tables, homomorphic additions may be used instead of homomorphic multiplications. Homomorphic additions are the cheapest operations under encryption. As described above, the probe is represented by a set of ciphertext indexes to be summed up (requiring low space and leading to homomorphic additions) instead of a ciphertext (requiring high space and leading to homomorphic multiplications and rotations).

A ciphertext under an FHE scheme supporting SIMD property can encrypt a vector of dimension up to c as a single ciphertext, where c=c(2) depends on the security level x. For a database of m references, such as m>>c, the database may be partitioned into K =

BE chunks where each chunk comprises ciphertexts of capacity c.

For example, the database may comprise m references where, if each piece of reference information comprises d vectors of length N, the database comprises K chunks of d*N ciphertexts of capacity c. This is a better solution than using m ciphertexts of length d*N, as it makes optimal use of the plaintext packing capacity of ciphertexts and optimal use of (SIMD) vector summarization.

When permutation functions are used to protect probes, as will described in relation to step 163 of Fig. 13, the same permutation functions should be used stored the reference information in the reference database. In this case, a first order in which the feature comparison scores are stored (in the database) in different reference vectors of a respective plurality of reference vectors for a respective reference of the plurality of references is different from a second order in which the feature comparison scores are stored in the lookup table. The first order is dependent on the respective reference.

In this embodiment, each subject has its own permutations or set of permutations to prevent linking subjects with similar features. If a lookup table per feature is used (instead of the same lookup table for all features), a different permutation m may be used per feature i, e.g. to shuffle the order of the cells within a row from lookup table T;. In this case, the permuted reference template becomes (m (RY), Tg (R19). where 7 permutes the Ri associated with the i-th feature and selected from the table T; and each permuted row is encrypted in its permuted order.

Many existing biometric search approaches under encryption that try to cope with the burden of large biometric reference databases containing high dimensional vectors lack flexibility. They are static with respect to the addition and deletion of encrypted references since they assume to have the entire reference database beforehand, structure it to allow efficient search, and then encrypt the structured database; thus, adding or deleting a reference requires changing the structure of the database. When using the method of Fig. 2, references may be added or deleted under encryption, thereby providing flexibility.

A new reference is added to the database at a position which is unused in the pluralities of reference vectors stored in the database; this position is referred to as the subject offset. This position is the same for all ciphertexts within the same chunk. In a first implementation, the client device receives an encrypted subject offset from the search server.

The new reference will be added to the database at this subject offset. In a second implementation, the client device does not receive the encrypted subject offset.

In the first implementation, step 139 comprises creating pluralities of temporary reference vectors to make this addition under encryption possible. Each temporary reference vector of the pluralities of temporary reference vectors is fully homomorphically encrypted. Each temporary reference vector of the pluralities of temporary reference vectors comprises one or more feature comparison scores relating to one or more new references at one or more positions which are unused in the pluralities of reference vectors stored in the database. In the first implementation, pluralities of temporary reference vectors 422 are created in step 139 based on the vectors 83. These pluralities of temporary reference vectors 422 are shown in more detail in Fig. 9.

In the example of Fig. 9, a single new reference is added to the pluralities of reference vectors stored in the database at position #9. In an alternative example, multiple new references are added at the same time by including the feature comparison scores relating to the multiple new references at multiple positions which are unused in the pluralities of reference vectors stored in the database. Each of these multiple new references is associated with one of the multiple positions.

If the method of Fig. 8 is performed by the client device 11 of Fig. 1, the client device 11 may request an offset number from the search server 21 of Fig. 1 and may in response receive this offset number (#9 in the example of Fig. 9) from the search server 21.

The search server 21 may keep track of the available offsets per chunk, e.g. as illustrated with the following table:

Table 1

Chunk | Available

The search server 21 may consult this mapping to get the available offset Ou at chunk J for a new reference/subject U. Then, search server 21 sends to the client device 11 the available offset at the chunk J. ¢.g. as a ciphertext of the form cto, = [(0,...,0,1] where 1 is at the position Ox.

Fig. 9 shows the pluralities of temporary reference vectors 422 after the feature comparison scores of the vectors 83 of Fig. 8 have been inserted at the subject offset

Ou and each of the temporary reference vectors has been fully homomorphically encrypted.

The first ciphertext comprises the first value of vector 401 (which corresponds to feature fi) in encrypted form at position #9 and the last ciphertext comprises the last value of vector 403 {which corresponds to feature fy) in encrypted form at position #9.

Each temporary reference vector of the pluralities of temporary reference vectors further comprises zeros at other positions which are used for the plurality of references in the pluralities of reference vectors. Each temporary reference vector of the pluralities of temporary reference vectors may also comprise zeros at unused positions (not shown in Fig. 9). Alternatively, each temporary reference vector of the pluralities of temporary reference vectors may have a length Oy.

In the first implementation, step 139 may comprise generating a set of CT ciphertexts Eje | cach encrypting one element of [[(rt,). If a single lookup table is used for all features. then N=N for ie[1,d] and CT=d*N. ct} is the ciphertext encrypting the k-th element of the permuted row mi (Ry) corresponding to the 1-th feature. This is done by scalar multiplying each element from [](rt,) by the offset ciphertext Co. If a single lookup table is used for all features, then Ti=T for 1e[L.d].

Fig. 10 shows steps for the addition of a reference to the database in a second embodiment of the method. These steps may be performed by the database server 31 of Fig.

I, for example. Initially, the database may be completely empty, e.g. each chunk may comprise s ciphertexts encrypting zeros, or partially filled, e.g. some indices of one or more chunks may be filled with encrypted references.

In the example of Fig. 10, the database 410 is partially filled and already comprises eight references (in the pluralities of reference vectors 421). In the example of

Fig. 9. the pluralities of reference vectors 421, e.g. ciphertexts ct} in chunk J, are represented by a matrix and the reference vectors/ciphertexts are rows in this matrix. In an alternative example, the reference vectors/ciphertexts may be columns in this matrix.

A step 221 comprises obtaining pluralities of temporary reference vectors, e.g. the pluralities of temporary reference vectors 422 of Fig. 9. The pluralities of temporary reference vectors may be created by the client device 11 of Fig. 1 and received by the database server 31 of Fig. 1. for example. In the first implementation described above, step 221 comprises receiving the above-mentioned set of CT ciphertexts (Ghar along with the chunk id J, ¢.g. from the client device 11, possibly via the search server 21.

A step 223 comprises modifying the pluralities of reference vectors 421 by adding each temporary reference vector of the pluralities of temporary reference vectors 422 obtained in step 221 to a corresponding reference vector of the pluralities of reference vectors 421. In the first implementation, chunk J of the database is updated by adding each ciphertext cti with its counterpart ciphertext ctf in chunk J. The search server 21 may be notified that the addition was successful such that the search server 21 can update its mapping, e.g. as illustrated in Table 1 above. Fig. 11 shows an example of pluralities of reference vectors 423 (e.g. chunk J) stored in the database 410 after the modifications in step 223.

In the second implementation. the client device is not informed at which subject offset the reference will be added. In this second implementation, step 221 comprises receiving the concatenation of the permuted reference template [[(rt,) as one single . ~ . ~ . ~ i yi€l1,d] ciphertext cty and transforming ct; into a set of padded ciphertexts Gi 1 ĳ as ‚Ni described in relation to the first implementation and exemplified in Fig. 9.

This transformation may be done by extracting each element through a scalar multiplication of the ciphertext with the plaintext vector filled with zeros and a one in the element’s position. Then, the resulting ciphertexts are all rotated to the available offset Oy, ~ jyieltd generating the set of padded ciphertexts Eje oe The same procedures may be followed

HAVE to add multiple encrypted references simultaneously by transforming them into padded ciphertexts for different available subject offsets. Next, step 223 is then performed as described above. Algorithm 2 gives a high-level overview of a procedure for the addition in both implementations.

Algorithm 2:

Input: [ot = (si) eon) ie[1,d] permuted reference template of U with an available offset Oy on the J-th chunk’s [DB]

Output: J-th chunk’s [DB] {otk} epa] te1d] updated if Oy is known then

Search server sends to client cto, = Je 0, 1 —_—

Ou // Loop executed by the client: for i[1.d] and k&[1.Ni] do ctl = sk Xs cto, end for . ~ iy i€l1d] en

Client sends {Ct vering] to the DB server: end if if Ou is unknown then

Client sends cty = [[[(rty)] to the DB server:

// Loop executed by the DB server; fori€[1,d] and k&[1,Ni] do

Ct}, = Rot}! (0 1,0, .] Xs Cty); ki end for endif /f Loop executed by the DB server; for ctl € [DB] do cth « ctl + ctl; end for return [DB] updated

Fig. 12 shows a first example of a method of creating probe information. This method may be performed by the client device 11 of Fig. 1, for example. A step 151 comprises obtaining a lookup table created with the method of Fig. 3. A step 153 comprises obtaining a normalized and quantized value of each of a plurality of features for the probe.

The same normalization and feature quantization performed in step 107 of Fig. 3 is performed in step 153.

A step 155 comprises selecting an identifier per respective feature of the plurality of features based on the normalized and quantized value of the respective feature, as obtained in step 151, and the border values on the first axis of the lookup table, as obtained in step 153. The identifier identifies a line of cells on the second axis of the lookup table. A step 157 comprises determining, for inclusion in the probe information, an index per feature of the plurality of features based on the identifiers selected in step 155.

Fig. 13 shows a second example of the method of creating probe information.

The second example of Fig. 13 is an extension of the first example of Fig. 12. In the example of Fig. 13, step 155 of Fig. 12 is implemented by a step 161 and step 157 of Fig. 12 is implemented by a step 163. Step 151 comprises obtaining a lookup table created with the method of Fig. 3. e.g. the lookup table 71 of Fig. 6. Step 153 comprises obtaining a normalized and quantized value of each of a plurality of features for the probe. The result is probe feature vector 86, i.e. (P1.....Di,....Pd).

Step 161 comprises selecting an identifier per respective feature of the plurality of features based on the normalized and quantized value of the respective feature, as obtained in step 151, and the border values on the first axis of the lookup table. as obtained in step 153. The identifier identifies a line of cells on the second axis of the lookup table. In the example of Fig. 6, the y axis of the lookup table 71 represents possible values of the reference and the x axis of the lookup table 71 represents possible features values of the probe. This means that columns of the lookup table 71 would be selected in step 161. In Fig. 5, column 75 corresponds to bin 1.

In step 161, the identifiers are selected from an ordered set of identifiers. The identifiers are ordered according to feature value. For example, in the lookup table 71, the bins of possible feature values are numbered 1 to 8, as there are eight feature quantization levels. In the example of Fig. 13, column #2 (the column corresponding to bin #2) is selected for feature #1, column #6 is selected for feature #1. and column #4 is selected for feature #d.

In the example of Fig. 13, the vector 87 of selected identifiers is (2,....6.....4). While the identifiers are selected from a set ordered according to feature value, the vector itself is ordered according to feature.

Step 163 comprises determining, for inclusion in the probe information, an index per feature of the plurality of features based on the identifiers selected in step 161. In step 163, the indices are created by mapping a respective identifier selected for a respective feature of the plurality of features to a permuted identifier from a permutation of the ordered set of identifiers. The respective identifier has a certain order number in the ordered set of identifiers and the permuted identifier has the same certain order number in the permutation.

Step 163 may comprise applying a permutation function to the identifiers selected in step 161. Different permutation functions may be applied to different identifiers.

The one or more used permutation functions are the same one or more permutation functions that were used in relation to the reference.

In the example of Fig. 13, a permutation function 7: is applied to the identifier of the column selected for feature fi (column #2), a permutation function 7; is applied to the identifier of the column selected for feature fi (column #6), and a permutation function 7: is applied to the identifier of the column selected for feature fy (column #4). This results in the creation of a vector 88, which comprises (4,....1,....5).

For example. permutation function 7; permutes a vector (ai.a»,a3,a4,a5,85,a7,a5) Into (4:,47,4,42,48,93,41,,95). Applving this permutation function to the row selected for feature fi (row #7), i.e. (-3.-2.-1,0,0,1,2.3) creates a permutation (0,2,1,-2.3 - 1.-3,0). Applying this permutation function 7; to the identifier of the column selected for feature fi. i.e. 2, creates a permuted identifier 4 (a; is the fourth element of the permuted vector).

Fig. 14 illustrates the method of Fig. 2 being performed on the database of

Fig. 11. Step 201 comprises receiving probe information. The probe information indicates, for the probe, an index per feature of a plurality of features. The index has been determined per respective feature of the plurality of features based on a value of the respective feature for the probe. The probe information may be determined as described in relation to Fig. 13.

In the example of Fig. 14, the probe information 88 of Fig. 13 is received in step 201.

In the embodiment of Fig. 14, the indices indicated in the probe information have been created by mapping a respective identifier selected from an ordered set of identifiers for each respective feature of the plurality of features to a permuted identifier from a permutation of the ordered set of identifiers, as described in relation to step 163 of

Fig. 13. Each respective identifier identifies a lines of cells in the lookup table applicable to the respective feature. The respective identifier has a certain order number in the ordered set of identifiers and the permuted identifier has the same certain order number in the permutation, ¢.g. because a permutation function is used. The order of the permutation is dependent on the probe, i.e. the subject. In other words, a permutation function applied in step 141 of Fig. 8 to lines of cells selected with respect to a reference is different from a permutation function applied in step 163 of Fig. 13 to the identifiers selected with respect to a probe if the probe and the reference are not the same subject. The permutations are normally generated randomly independently from the probe.

Step 203 comprises selecting, per respective feature of the plurality of features, from a plurality of reference vectors associated with the respective feature, a reference vector corresponding to the index indicated for the respective feature in the probe information received in step 201. The pluralities of reference vectors have been stored in a database. In the example of Fig. 14, the reference vectors are selected from pluralities of reference vectors 423.

In the example of Fig. 14, row/ciphertext 451 1s selected for feature #1, which corresponds to index #4 indicated for feature f; in the probe information, row/ciphertext 452 is selected for feature fi, which corresponds to index #1 indicated for feature #i in the probe information, and row/ciphertext 453 is selected for feature #d. which corresponds to index #5 indicated for feature fa in the probe information.

Each of the reference vectors comprise feature comparison scores relating to the plurality of references and are fully homomorphically encrypted. Feature comparison scores relating to a respective reference of the plurality of references have been obtained from a lookup table based on a value of the respective feature for the respective reference.

The feature comparison scores are obtained from a single lookup table for the plurality of features or from a lookup table per feature of the plurality of features.

In the embodiment of Fig. 14, the feature comparison scores relating to a respective reference of the plurality of references and a respective feature of the plurality of features have been obtained from the lookup table(s) based a quantized actual value of the respective feature for the respective reference and have been determined for quantized potential values of the respective feature for any future probe, as described, for example, in relation to Fig. 8.

In the mated case. the permuted probe is equivalent to the non-permuted probe because 7; +(x;(pi)) = p;. Thus, the permutations preserve the mated score distribution. However, in the non-mated case, they are distinct since the permutations are associated with different users implying different permutations, resulting in 7; 7 (rr; (pi)) # pi. which positively affects the non-mated score distribution as it increases incorrect selections. This is because the permutation used to generate the reference template and the probe template should be the same so that both can point to the same cells they were supposed to point to without permutation.

Step 205 comprise calculating a vector of evaluations by summing up the selected reference vectors without decrypting the selected reference vectors. The vector of evaluations are fully homomorphically encrypted. Summing up the selected reference vectors comprises performing a vector summation in a Single Instruction Multiple Data manner. Per chunk, ¢ evaluations may be computed simultaneously by using only d-1 homomorphic additions.

In the example of Fig. 14, the pluralities of reference vectors 423 comprise nine references and the calculated vector of evaluations (ev_vec) 455 therefore comprises nine relevant valuations evi..evo (in encrypted form). In the example of Fig. 14, the vector of evaluations 455 further comprises zeros for unused positions of the pluralities of reference vectors 423.

Step 207 comprises determining or enabling to determine the relationship between the probe and each of the plurality of references based on the vector of evaluations calculated in step 205. Determining the relationship between the probe and the plurality of references may comprise applying a filter function to the homomorphically encrypted vector of evaluations, 1.e. making an encrypted search decision. Alternatively, determining the relationship between the probe and the plurality of references may comprise decrypting the homomorphically encrypted vector of evaluations and applying a filter function to the decrypted vector of evaluations, i.e. making a cleartext search decision.

Determining the relationship between the probe and the plurality of references may comprise determining a similarity or a dissimilarity between the probe and each of the plurality of references. Determining the similarity between the probe and the plurality of references may comprise identifying references with an evaluation higher than a threshold and/or identifving references with the highest evaluation. Determining the dissimilarity between the probe and the plurality of references may comprise identifving references with an evaluation lower than the threshold and/or identifying references with the lowest evaluation.

The position of an evaluation in the vector of evaluations is associated with a reference. If the goal is to determine a specific identity, then the order of the determined vector of evaluations should be preserved; otherwise, a random permutation under encryption of the homomorphically encrypted vector of evaluations may be applied before transmitting result information which comprises the homomorphically encrypted vector of evaluations/ciphertext, filtered or not.

To make a cleartext decision, the database server sends the vector of evaluations of all the chunks to the search server who decrypts it to get a list of evaluations/final scores per chunk. Then, the database server continues the decision procedure in the cleartext. Depending on the decision type, the database server can evaluate any filter function fe over the list of evaluations/final scores to make its decision. Algorithm 3 shows the procedure for the filter function greater than or equal to f= :={® <"} The evaluation fe (FSJ[j])) may return the index(es) of the reference(s) with the highest score(s) or a binary vector, where 0 means that the evaluation/final score is greater than or equal to © and | means that the evaluation/final score is smaller than ©.

Algorithm 3:

Input: {ct]s} Jerri SEE of evaluations/final scores ciphertexts and fo:={ ® <"} filter function parametrized by © threshold

Output: L list of candidates or list of match/no match

L=] for J--1 to K do

FS; =Decrypt(ct/);

L=[]: for j==1 toc do

Ly.append(fs (FS;[j])): end for

L.append(L;): end for return L

To make an encrypted decision, a filter function fe is evaluated over the evaluations/final scores ciphertext for each of the K chunks in a blinded manner. In order to do this, the database server receives the encrypted threshold [0] from the search server and receives the number of possible evaluations/scores between the threshold @ and the maximum evaluation/score Smax, i.e. [=Smax @+1, from the search server. For a biometric search, a biometric threshold filter function f::={ ® St may be used. as this is the building block of most filter functions used in biometrics. Algorithm 4 shows this procedure.

Then, from [0], the database server computes the set of threshold ciphertexts by creating 1 ciphertexts of the form [8] := [(68,..,9)] replicating the value 0=0+i where 1< [0,1-1] over the c plaintext slots. Then, the subtraction of the evaluations/final scores ciphertext with each threshold ciphertext is blinded with a vector of ¢ different random values to produce the blinded comparison ciphertexts. These blindings count for scalar multiplications * only, cheaper than homomorphic multiplications. The set of blinded comparison ciphertexts {the}, is randomly permuted by the database server before sending it to the search server. When the search server receives this set, it decrypts each ciphertext to obtain a vector composed of either zero values or random values that are not equal to zero.

If the i-th element of the vector is zero, it indicates that the evaluation/final score associated with the i-th slot is equal to or greater than the threshold value. On the other hand, if a non-zero random value is found in the i-th slot, it means that the evaluation/final score associated with that slot is strictly below the threshold. Hence, the positions of the zeros represent the subjects for which the final score is above the threshold, representing the list of potential candidates.

In order for the search server to know which position corresponds to which identity, the search server needs to keep track of the registered subjects, as described in relation to Table |.Sometimes, the desired result from a biometric search is to know whether a given probe belongs to the database without knowing its identity. In this case, the search server should not keep track of the registered references and their associated plaintext slot in a chunk. Thus, finding a zero in a certain position will not mean it corresponds to the probes identity.

Algorithm 4:

Input: {ct]s} Jerri SEE of evaluations/final scores ciphertexts and fo:={ ®@ SV filter function parametrized by © threshold

Output: L list of candidates or list of match/no match /f Loop executed by the DB server for Chunk J<1 to K do foreach [0] <{[6]}s do

Sample 7g a vector of s random scalars: ct) g = Te xs (etfs — [O): end foreach

Send {ct] }, - {TMranafeth oh, end for /{ Loop executed by the search server

L=[1; for J-1 to K do

L:=[]: for I-1 to Smax ® +1 do

D/ =Decrypt(ct} )): if 0€D/ then

L;append(D/ Index{(0}}: end if end for

L.append(Ly); end for return L

Fig. 15 shows steps for the removal of a reference from the database in a third embodiment of the method. These steps may be performed by the database server 31 of

Fig. 1, for example. To delete one or more encrypted references from the database (e.g. from a specific chunk J), the database server receives from the search server (e.g. in addition to an identifier of chunk J} either the offsets of the references that need to be deleted or an encrypted vector indicating their corresponding positions.

A step 231 comprises creating pluralities of further temporary reference vectors, ¢.g. pluralities of further temporary reference vectors 424, by multiplying a multiplication vector v, ¢.g. multiplication vector 457, with each reference vector of the pluralities of reference vectors, e.g. pluralities of reference vectors 423 stored in the database 410.

The multiplication vector v comprises one or more values equaling minus one at one or more positions which are used in the pluralities of reference vectors for one or more references which are to be removed from the database and zeros at other positions which are used in the pluralities of reference vectors for references which are not to be removed from the database. The database server may create this multiplication/deletion vector v or the database server may receive this multiplication/deletion vector v formed and encrypted by the search server so that the database server does not know what he is deleting. The multiplication vector v may also comprise zeros in unused positions.

The pluralities of further temporary reference vectors 424 are shown in more detail in Fig. 16. For each ciphertext ctl in the pluralities of reference vectors 423, the database server multiplies the multiplication/deletion vector v with a copy of the ciphertexts to inverse the sign of the positions of interest in the pluralities of further temporary reference vectors 424.

A step 233 comprises modifving the pluralities of reference vectors by adding each further temporary reference vector of the pluralities of further temporary reference vectors, e.g. the pluralities of further temporary reference vectors 424, to a corresponding reference vector of the pluralities of reference vectors, ¢.g. the pluralities of reference vectors 423. In other words, the ciphertexts that result from the multiplications in step 231 are added to ct}. This results in the modified pluralities of reference vectors 425 as shown in Fig. 15. The updated database has zeros in the plaintext slots at the offset position (#4 in the example of Fig. 15). Algorithm 5 shows a deletion procedure in pseudo-code.

Algorithm 5:

Input: {0y,, es Oy, }

set of subjects’ offsets to be deleted, J chunk’s index where the subjects’ reference templates are stored, and the chunk’s local encrypted etl} os

Output: J-th chunk’s [DB] updated v=(0 z1 0,0, 1 ,0,..,0);

Ou, ou, if {0y,, es 0p, } is known by the DB server then

The DB server receives v and the chunk id J from the search

Server; // Loop executed by the DB server; for i[1,d] and KE [1,Ni] do ct - ct + vx; ct; end for end if if {0p,, oe Oy, } is unknown by the DB server then

The DB server receives [[v] and the chunk id J from the search server; /f Loop executed by the DB server; fori€[1.d] andk&[1,N;] do ct ctj + [lvl xgg ctk: end for end if return [DB] updated

Another operation that may be performed besides addition and deletion is deduplication. In biometrics, deduplication refers to identifying and removing duplicate biometric references from a database to ensure that each reference is unique and the database has no redundant information. It is essential for accurate and reliable biometric searches and to prevent the creation of multiple identities based on the same biometric data. The process of identifying duplicates varies depending on the biometric application. Duplicate enrollment checks may be performed during the enrollment phase, for example. The deduplication operation may comprise a (e.g. biometric) search followed by either a cleartext search decision or an encrypted search decision to identify the list of duplicates and then a deletion of the list of duplicates.

A fourth embodiment of the computer-implemented method of determining or enabling to determine a relationship between a probe and each of a plurality of references under encryption is shown in Fig. 17.

Step 201 comprises receiving probe information. The probe information indicates, for the probe, an index per feature of a plurality of features. The index has been determined per respective feature of the plurality of features based on a value of the respective feature for the probe.

A step 241 comprises selecting a first or next chunk of the K chunks stored in a database. Each chunk comprises pluralities of reference vectors, e.g. d*N reference vectors of length s. for s references. Each of the reference vectors stored in a chunk comprises feature comparison scores relating to the references stored in the chunk and are fully homomorphically encrypted. In the first iteration of step 241, step 241 comprises selecting the first chunk of the K chunks.

Feature comparison scores relating to a respective reference of the plurality of references have been obtained from a lookup table based on a value of the respective feature for the respective reference. The feature comparison scores are obtained from a single lookup table for the plurality of features or from a lookup table per feature of the plurality of features.

Step 203 comprises selecting, per respective feature of the plurality of features, from the plurality of reference vectors associated with the respective feature in the chunk selected in step 241, a reference vector corresponding to the index indicated for the respective feature in the probe information received in step 201. Step 205 comprises calculating a vector of evaluations for the selected chunk by summing up the selected reference vectors of the selected chunk without decrypting the selected reference vectors.

The vector of evaluations are fully homomorphically encrypted.

Next, a step 243 comprises checking whether a vector of evaluations has been calculated for all K chunks. If not, step 241 is repeated and the next chunk is selected. If so, a step 245 is performed next. Step 245 comprises determining or enabling to determine the relationship between the probe and each of the m references stored in the K chunks based on the K vectors of evaluations calculated in step 205 for the K chunks.

Determining the relationship between the probe and the plurality of references may comprise applying a filter function to the homomorphically encrypted vector of evaluations. In this case, result information transmitted by the database server to the search server may comprise the K vectors of evaluation, e.g. ev_veci, ev_vec,, and ev_vec:

for three chunks. Fig. 18 shows an example of a database 410 with three chunks 423, 431, and 441.

Alternatively, determining the relationship between the probe and the plurality of references may comprise decrypting the homomorphically encrypted vector of evaluations and applying a filter function to the decrypted vector of evaluations. In this case, result information transmitted by the database server to the search server may comprise K filtered vectors of evaluation, e.g. fg(ev_veci), fglev vec), and fg(ev_vecs) for three chunks.

The above methods may be used, for example, in the honest-but-curious scenario where the client device, database server, and search server are none colluding parties that interact in accordance with the prescribed protocol and try to learn extra information merely from observing the protocol’s transcripts (such as the received and sent messages). In an embodiment, the search server has an FHE key pair (pk,sk) and shares its public key pk with the database server and the client device. The search server can learn the search outcome only and does not have access to the encrypted references and the protected probes. All the search server can learn are the final evaluations/scores in case the search was established in the cleartext search decision setting for the sake of gaining efficiency.

In an embodiment, the client device, in addition to pk, possesses a set of pseudo-random permutations, a different set per subject, as a user-specific secret he owns and does not share with both servers. During the search and the addition with an unknown subject offset, the client device receives nothing from both servers since it sends to the database server its permuted probe template and its permuted reference template encrypted with pk. While during addition with a known subject offset, the client device receives the offset encrypted as a mask, based on which the client device generates his encrypted reference template. Thus, the client device does not learn any useful information regarding the encrypted reference database besides what he knows.

The database server plays a central role in the embodiments described above, as It is the database server which holds the encrypted reference database and performs the search, addition, deletion, and deduplication under encryption and interacts with the client device and the search server. In the honest-but-curious setting, the database server strictly complies with the protocols; however, it attempts to learn the raw reference templates, the raw probe templates, the search final evaluations/scores, the search outcome, or any other extra information not permitted by the protocols, just by observing the messages received from and sent to the client device and the search server. During the (e.g. biometric) search, the database server receives the indexes of ciphertexts it needs to select for the summation to form the final evaluations/scores and does not know their meaning, which is prevented in an embodiment by the use of one or more pseudo-random permutations, which are only known bv the client device and different from one reference/subject U to the other. The meaning of ciphertext indexes is not known by the database server. To the database server, these indexes only appear as random values that point to a ciphertext that has been encrypted probabilistically. This means that even if two ciphertexts are encrypting the same plaintext value, they will be completely distinct from each other. As such, by observing those ciphertexts and their indexes, the database server cannot deduce useful information to assist in mounting a statistical frequency analysis.

In an embodiment, during the addition of a new reference, the database server receives the reference encrypted, with which it updates the encrypted reference database; whether the offset is specified or not, the database server is unable to link an offset to the actual identity of a reference/subject. Similarly, when, in an embodiment, the database server receives a deletion request from the search server, it only receives the offsets to delete as a cleartext vector of —1s and Os or a ciphertext encrypting this vector; in both cases, it deletes without learning anything meaningful from executing the deletion procedure. This holds from leveraging the IND-CPA security of FHE schemes that allows the indistinguishability between ciphertexts. This entails that indistinguishability is also ensured between two different chunks. The same applies to the deduplication procedure since it combines a search and deletion. Therefore, these embodiments provides security in the honest-but-curious scenario.

Figs. 19-21 show the biometric performance of the MFIP comparator for the verification task and the search task in closed-set and open-set scenarios. Figs. 19-21 distinguish between the with permutation case and the without permutation case. In the with permutation case, for each subject, a different permutation was used for generating its reference-probe MFIP-based templates, whereas in the without permutation case, no permutations were used, i.e. the same permutation was used for all users. To detach the performance assessment from the dataset, two datasets were used: the VGGFace2 dataset and the LFWdataset from which normalized facial feature vectors of dimension 512 were extracted using ResNet-100 trained with ArcFace. a) Verification task: Fig. 19 depicts the DET curves, representing the FNMR as a function of the FMR, corresponding to the performance of the MFIP comparator in conducting the verification task compared to the baseline IP comparator that operates directly on the normalized vectors without pre-computation. quantization, and permutation.

For both datasets, Fig. 19 shows that the DETs corresponding to the MFIP comparator without permutation (the dashed-line curves) and the baseline IP DET (the solid-line curves) completely overlap and thus achieve similar performance, confirming that biometric accuracy is unaffected by the pre-computation and quantization. In contrast, the DETs corresponding to the MFIP comparator with permutation (the dotted-line curves) remain consistently below the overlapping DETs (the dashed-line and solid-line curves), showing the positive effect of the permutation on the recognition performance as it helps to outperform even the baseline IP. Hence, using different permutations per subject significantly lowers the false non-match rate compared to the baseline IP and without using permutation for a similar false match rate. b) Closed-set search task: Fig. 20 shows the performance of the MFIP comparator when conducting the search task in the closed-set scenario compared to the baseline IP and expresses it in terms of CMC curves from rank-1 to rank-20. In the closed-set scenario, only probes of subjects enrolled in the reference DB were considered. Like with the verification task, the permutation positively affects the search accuracy in the closed-set scenario in both datasets. This is observed in the CMC's corresponding to the MFIP comparator in the permutation case (the dotted-line curves) that remain above the other overlapping CMCs representing the baseline and the MFIP without permutation, improving over the baseline (the dashed-line and solid-line curves). In particular, the rank-1 recognition rate measured for the VGGFace2 (resp. LFW) dataset equals —92.8% (resp. 99.2%) for

MFIP with permutations, while the baseline IP and the MFIP without permutations have a rate for rank-1 around ~84.7% (resp. ~98.1%). c) Open-set search task: Fig. 21 shows the performance of the MFIP comparator for conducting the search task in the open-set scenario compared to the baseline

IP expressed in terms of DET curves representing the trade-off between the false negative identification rate (FNIR) and the false positive identification rate (FPIR). A 10-fold cross- validation protocol was followed over subject identities in an independent and identically distributed manner. For each fold, 10% of the identities were considered as the non-enrolled subjects, from which the non-mated probe templates were generated, which were excluded from the reference DB. The remaining 90% of the identities were used to form the reference

DB and generate mated probe templates. As a result, each fold produces ~ 18260 (resp. ~ 1391) non-mated scores and 8218 (resp. 756) mated scores for the VGGFace2 (resp. LFW) dataset. Identically to the results shown in Figs. 19 and 20, Fig. 21 shows that the DET corresponding to the IP baseline and MFIP without permutation are completely overlapping, while the DET relative to MFIP with permutation is constantly below both DETs, outperforming them.

In summary, without using permutation, the lookup table-based comparator meets the baseline performance, while using different permutations per subject leads to a significant performance gain for the biometric search task in both the closed-set and open-set scenarios. Figs. 20-21 show that that the lookup table-based approach (MFIP) preserves the search accuracy of the baseline (IP) comparator. Table 2 provided below shows the performance advantage of the embodiment of Fig. 14, which contains zero homomorphic multiplications and rotations, compared to state of the art biometric search methods. This embodiment performs a full (¢.g. biometric) search under encryption in a database of one million references with 512-dim vectors in 37.01 seconds on commodity hardware (A 64-bit computer Intel(R) Core 17-10750H CPU with 4 cores rated at 2.60GHz and 16GB of memory), which is 32 times faster than SOTA4 that runs in 1200 seconds for vectors of the same dimension for a cleartext search decision.

Table 2

Exhaustive; | This 512 IP BFVrns 128 8192 0.023 2.82 cleartext patent 192 8192 0.023 2.82 decision specificat 256 16384 0.039 2.41 ion

Exhaustive; | This 512 IP BFVms 128 8192 0.225 27.67 encrypted patent 192 8192 0.2226 27.79 decision specificat 256 16384 0.597 37.01 ion

Exhaustive; SOTAI 64 SED | BFV 128 1000 0.82 8.2x102 cleartext 512 46.32 4.63x10% decision

Exhaustive; SOTA2 64 IP FV 128 4096 0.6 1.48x10? cleartext 512 4.96 1.21x10° decision

Exhaustive; SOTA3 128 IP CKKS 128 16384 2.35 1.43x10? cleartext decision

Exhaustive; SOTA3 128 IP CKKS 128 16384 9.35 5.70x10? encrypted decision

Pre-selection | SOTA4 512 HD NTRU 128 4096 17 4.15 x10°

Hl 0 a

SOTAI: P. BauspieB, J. Olafsson, J. Kolberg, P. Drozdowski, C. Rathgeb, and C. Busch, “Improved homomorphically encrypted biometric identification using coefficient packing,” in 2022 International Workshop on Biometrics and Forensics (IWBF).

IEEE, 2022, pp. 1-6.

SOTA2: J.J. Engelsma, A. K. Jain, and V.N. Boddeti, “HERS:

Homomorphically encrypted representation search,” IEEE Transactions on Biometrics,

Behavior, and Identity Science, 2022.

SOTA3: A. Ibarrondo, H. Chabanne, V. Despiegel, and M. "Onen, “Grote:

Group testing for privacy-preserving face identification,” 2023.

SOTA4: P. Drozdowski, F. Stockhardt, C. Rathgeb, D. Osorio-Roig, and C.

Busch, “Feature fusion methods for indexing and retrieval of biometric data: Application to face recognition with privacy protection,” IEEE Access, vol. 9, pp. 139 361-139 378, 2021.

Fig. 22 depicts a block diagram illustrating an exemplary data processing system that may perform the method as described with reference to Figs. 2, 10, 14, 15, and 17.

As shown in Fig. 22, the data processing system 300 may include at least one processor 302 coupled to memory elements 304 through a system bus 306. As such, the data processing system may store program code within memory elements 304. Further, the processor 302 may execute the program code accessed from the memory elements 304 via a system bus 306. In one aspect, the data processing system may be implemented as a computer that is suitable for storing and/or executing program code. It should be appreciated, however, that the data processing system 300 may be implemented in the form of any system including a processor and a memory that is capable of performing the functions described within this specification.

The memory elements 304 may include one or more physical memory devices such as, for example, local memory 308 and one or more bulk storage devices 310.

The local memory may refer to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. A bulk storage device may be implemented as a hard drive or other persistent data storage device. The processing system 300 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from the bulk storage device 310 during execution.

Input/output (I/O) devices depicted as an input device 312 and an output device 314 optionally can be coupled to the data processing system. Examples of input devices may include, but are not limited to, a kevboard, a pointing device such as a mouse, or the like. Examples of output devices may include, but are not limited to, a monitor or a display, speakers, or the like. Input and/or output devices may be coupled to the data processing system either directly or through intervening I/O controllers.

In an embodiment, the input and the output devices may be implemented as a combined input/output device (illustrated in Fig. 22 with a dashed line surrounding the input device 312 and the output device 314). An example of such a combined device is a touch sensitive display, also sometimes referred to as a “touch screen display” or simply “touch screen”. In such an embodiment, input to the device may be provided by a movement of a physical object, such as e.g. a stylus or a finger of a user, on or near the touch screen display.

A network adapter 316 may also be coupled to the data processing system to enable it to become coupled to other systems, computer systems, remote network devices, and/or remote storage devices through intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to the data processing system 300, and a data transmitter for transmitting data from the data processing system 300 to said systems, devices and/or networks. Modems, cable modems, and Ethernet cards are examples of different types of network adapter that may be used with the data processing system 300.

As pictured in Fig. 22, the memory elements 304 may store an application 318. In various embodiments, the application 318 may be stored in the local memory 308. he one or more bulk storage devices 310, or separate from the local memory and the bulk storage devices. It should be appreciated that the data processing system 300 may further execute an operating system (not shown in Fig. 22) that can facilitate execution of the application 318. The application 318, being implemented in the form of executable program code, can be executed by the data processing system 300, ¢.g., by the processor 302.

Responsive to executing the application, the data processing system 300 may be configured to perform one or more operations or method steps described herein.

Various embodiments of the invention may be implemented as a program product for use with a computer system, where the program(s) of the program product define functions of the embodiments (including the methods described herein). In one embodiment,

the program(s) can be contained on a variety of non-transitory computer-readable storage media, where, as used herein, the expression “non-transitory computer readable storage media” comprises all computer-readable media, with the sole exception being a transitory, propagating signal. In another embodiment, the program(s) can be contained on a variety of transitory computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., flash memory, floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. The computer program may be run on the processor 302 described herein.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a." “an,” and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of embodiments of the present invention has been presented for purposes of illustration, but is not intended to be exhaustive or limited to the implementations in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the present invention.

The embodiments were chosen and described in order to best explain the principles and some practical applications of the present invention, and to enable others of ordinary skill in the art to understand the present invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

CONCLUSIONS

1. A computer-implemented method for determining or enabling the determination of a relationship between a probe and each of a plurality of encrypted references, said method comprising: — receiving (201) probe information (88), said probe information (88) indicating, for said probe, an index per property of a plurality of properties, said index being determined per respective property of said plurality of properties based on a value of said respective property for said probe; - selecting (203), per respective property of said plurality of properties, from a plurality of reference vectors associated with said respective property, a reference vector (451-453) corresponding to said index indicated for said respective property, said plurality of reference vectors (421, 423) being stored in a database (410), each of said reference vectors comprising property comparison scores associated with said plurality of references and being fully homomorphically encrypted, said property comparison scores associated with a respective reference of said plurality of references being obtained from a lookup table (71) based on a value of said respective property for said respective reference; - calculating (205) a vector (455) of evaluations by said selected reference vectors

(451-453) to enumerate without decrypting said selected reference vectors (451-453), said vector (455) of evaluations being fully homomorphically encrypted; and -— determining (207) or enabling the determination of said relationship between said probe and each of said plurality of references based on said vector (455) of evaluations.

A computer-implemented method as claimed in claim 1, wherein determining said relationship between said probe and said plurality of references comprises applying a filter function to said homomorphically encrypted vector (455) of evaluations.

A computer-implemented method as claimed in claim 1, wherein determining said relationship between said probe and said plurality of references comprises decrypting said homomorphically encrypted vector (455) of evaluations and applying a filter function to said decrypted vector of evaluations.

4. A computer-implemented method according to the preceding claims, wherein determining said relationship between said probe and said plurality of references comprises determining a similarity or a difference between said probe and each of said plurality of references.

5. A computer-implemented method according to claim 4, wherein determining said similarity between said probe and said plurality of references comprises identifying references having an evaluation higher than a threshold and/or identifying references having the highest evaluation, or determining said difference between said probe and said plurality of references comprises identifying references having an evaluation lower than said threshold and/or identifying references having the lowest evaluation.

6. A computer-implemented method according to the preceding claims, wherein said plurality of reference vectors and said vector of evaluations are fully homomorphically encrypted with the public key of a privacy-compliant service provider and wherein said privacy-compliant service provider is only enabled to learn said relationship between said probe and each of said plurality of references by transmitting result information to said privacy-compliant service provider, said result information comprising said fully homomorphically encrypted vector of evaluations, filtered or not.

7. A computer-implemented method according to any preceding claim, wherein said probe is a biometric probe, said references are biometric references, and said properties are biometric properties.

8. A computer-implemented method according to any preceding claim, wherein enumerating said selected reference vectors comprises performing vector summation in a Single Instruction Multiple Data manner.

9. A computer-implemented method according to the preceding claims, further comprising: - obtaining (221) pluralities of temporary reference vectors (422), each temporary reference vector of said plurality of temporary reference vectors (422) comprising one or more feature comparison scores associated with one or more new references at one or more positions, said one or more positions being unused in said plurality of reference vectors (421), each of said one or more new references being associated with one of said one or more positions, each temporary reference vector of said plurality of temporary reference vectors further comprising zeros at other positions and being fully homomorphically encrypted, said other positions being used for said plurality of references in said plurality of reference vectors, and - modifying (223) said plurality of reference vectors (421) by adding each temporary reference vector of said plurality of temporary reference vectors (422) to a corresponding reference vector of the mentioned multiplicities of reference vectors (421).

10. A computer-implemented method according to the preceding claims, further comprising: - creating (231) pluralities of further temporary reference vectors (424) by multiplying a multiplication vector (457) by each reference vector of said pluralities of reference vectors (423), said multiplication vector (424) comprising one or more values equal to minus one at one or more positions used in said pluralities of reference vectors (423) for one or more references to be removed from said database (410) and zeros at other positions, said other positions being used in said pluralities of reference vectors (423) for references not to be removed from said database (410), and - modifying (223) said pluralities of reference vectors (421) by adding each further temporary reference vector of said pluralities of further temporary reference vectors (422) to a corresponding reference vector of said pluralities of further temporary reference vectors (423) multiplicities of reference vectors (423).

A computer-implemented method according to the preceding claims, wherein a first order in which said property comparison scores are stored in different reference vectors of a respective plurality of reference vectors for a respective reference of said plurality of references differs from a second order in which said property comparison scores are stored in said lookup table (71), said first order being dependent on said respective reference.

A computer-implemented method according to the preceding claims, wherein said indices in said probe information (88) are created by mapping a respective identifier selected from an ordered set of identifiers for each respective property of said plurality of properties to a permuted identifier from a permutation of said ordered set of identifiers, said identifiers identifying lines of cells (75) in said lookup table (71), said respective identifier having a particular order number in said ordered set of identifiers, and said permuted identifier having the same said particular order number in said permutation, said order of said permutation being dependent on said probe.

13. A computer-implemented method according to the preceding claims, wherein said property comparison scores associated with a respective reference of said plurality of references and a respective property of said plurality of properties are obtained from said lookup table (71) based on a quantized actual value of said respective property for said respective reference and are determined for quantized potential values of said respective property for each future probe.

14. A computer-implemented method according to the preceding claims, wherein said property comparison scores are obtained from a single lookup table (71) for said plurality of properties or from a per-property lookup table of said plurality of properties.

15. A computer-implemented method according to the preceding claims, wherein said plurality of reference vectors are associated with a first segment (423) of said database and wherein said method comprises: - selecting, per respective property of said plurality of properties, from a further plurality of reference vectors associated with said respective property, a further reference vector corresponding to said index indicated for said respective property, said plurality of further reference vectors being stored in said database and belonging to a second segment (431, 441) of said database, each of said further reference vectors comprising property comparison scores associated with said plurality of further references and being fully homomorphically encrypted, said property comparison scores associated with a respective further reference of said plurality of further references being obtained from said lookup table based on a value of said respective property for said respective further reference, - calculating a further vector of evaluations by said enumerating selected further reference vectors without decrypting said selected further reference vectors, said further vector of evaluations being fully homomorphically encrypted, and - determining or enabling the determination of said relationship between said probe and each of said plurality of references further based on said further vector of evaluations.

16. A computer program or suite of computer programs comprising at least one software code portion or a computer program product storing at least one software code portion, the software code portion, when executed on a computer system, being adapted to perform the method of claims 1 to 15.

17. A system (31) for determining or enabling the determination of a relationship between a probe and each of a plurality of encrypted references, said system (31) comprising at least one processor (35) configured to: — receive probe information, said probe information indicating, for said probe, an index per property of a plurality of properties, said index being determined per respective property of said plurality of properties based on a value of said respective property for said probe, - selecting, per respective property of said plurality of properties, from a plurality of reference vectors associated with said respective property, a reference vector corresponding to said index indicated for said respective property, said plurality of reference vectors being stored in a database, each of said reference vectors comprising property comparison scores associated with said plurality of references and being fully homomorphically encrypted, said property comparison scores associated with a respective reference of said plurality of references are obtained from a lookup table based on a value of said respective property for said respective reference, - computing a vector of evaluations by enumerating said selected reference vectors without decrypting said selected reference vectors, said vector of evaluations being fully homomorphically encrypted, and - determining or enabling the determination of said relationship between said probe and each of said plurality of references based on said vector of evaluations.