Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a privacy protection method, a terminal and a storage medium of a DBSCAN algorithm.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
in a first aspect, in one embodiment provided by the present invention, there is provided a privacy protection method of a DBSCAN algorithm, the method including the steps of:
Respectively establishing keys between a client and the servers A and B through DH protocol;
wherein, the key between the first client and the server A is set as The key with the server B is
The client divides the data point into two parts by utilizing addition secret sharing, encrypts the data point by using corresponding keys respectively and then sends the encrypted data point to corresponding servers A and B;
The server A, B decrypts the secret sharing value of the received data point respectively, obtain the distance between any two points and the size of M;
the server A obtains a point set in epsilon-neighbor of each point through the size relation between the distance between two points and epsilon, and the server A obtains a clustering result through a DBSCAN algorithm.
As a further aspect of the invention, data points are usedAndFor illustration. Assume that server a gets the data points AndSecret shared value of (2) isData points obtained by server BAndSecret shared value of (2) is The method for obtaining the distance between any two points and the size of M comprises the following steps:
s301, calculating by server A Server B computing
S302, the servers A and B respectively obtain data through multiplication secret sharingAnd is also provided withAndSatisfy the following requirements
S303, calculating by the server A to obtain data The server B calculates the data
S304, the server B sends sign (V B) to the server A, and the server A preliminarily judgesAnd epsilon size;
S305, comparing the magnitudes of |V A | and |V B | by adopting a method OT_Compare (|V A|,|VB |) between the servers A and B, thereby obtaining And epsilon.
As a further scheme of the invention, the server B sends sign (V B) to the server A, and the server A preliminarily judgesAnd epsilon, which includes the following cases:
I. V A and V B are 0 or more, and can be obtained
II. V A and V B are 0 or less, and can be obtained
III, V A is greater than or equal to 0, V B is less than 0, V A is less than or equal to 0, and V B is greater than 0, then adopting a method OT_Compare (I V A|,|VB I) to Compare the magnitudes of I V A I and I V B I between the servers A and B, thereby obtainingAnd epsilon.
As a further aspect of the present invention, the sizes of |v A | and |v B | are compared between servers a, B using the method ot_compound (|v A|,|VB |), thereby obtainingAnd epsilon, which includes the following cases:
I. if sign (V A)·sign(|VA|-|VB |) is not less than 0, it can be obtained
II. If sign (V A)·sign(|VA|-|VB |) is less than or equal to 0, then
As a further aspect of the present invention, the server a obtains a clustering result through a DBSCAN algorithm, which includes:
Input sample set Neighborhood parameters (. Epsilon., M).
The output is class division c= { C 1,C2,…,Ck }.
As a further scheme of the invention, the server A obtains a clustering result through a DBSCAN algorithm, and the method comprises the following steps:
S401, all core points are obtained;
if at some point (e.g ) The number of points in the epsilon-neighborhood of (i.e.)) Setting the set of all core points as omega;
s402, marking all core points as unvisited;
s403, taking out a core point from omega (After removal)) Marking the point as accessed, setting the current class sequence number k, and creating a queue of the current core pointCreating a set of points of a current category
S404, taking out an element from the core point queue omega k (After removal)) Acquiring all atNon-visited point set in epsilon-neighborhood of (c-c)Non-accessed core point setWill beThese points within epsilon-neighborhood are marked as accessed, and the current class point set is updated toThe core points of the points are collectedTo the core point queue (after joining));
S405, repeating the step S404 until the core point queue omega k is empty after the step S404 is executed, wherein the current category point set C k forms a category, and eliminating the core points marked as accessed from omega;
S406, repeating the steps S403-S405 until the element in omega is null, and obtaining the category division of C= { C 1,C2,…,Ck }.
As a further aspect of the present invention, the servers a, B do not cross each other.
As a further aspect of the present invention, client dataWherein, the The value range of i is 1-m, and the values of m and n are positive integers.
In a second aspect, in yet another embodiment provided by the present invention, a terminal is provided, including a memory storing a computer program and a processor implementing the steps of the privacy preserving method of the DBSCAN algorithm when the computer program is loaded and executed by the processor.
In a third aspect, in yet another embodiment of the present invention, a storage medium is provided storing a computer program which, when loaded and executed by a processor, implements the steps of the privacy preserving method of the DBSCAN algorithm.
The technical scheme provided by the invention has the following beneficial effects:
According to the privacy protection method, the terminal and the storage medium of the DBSCAN algorithm, too many matrix operations are not needed to be carried out on the client, the number of parameters uploaded to cloud services is greatly reduced, and the calculation and communication cost of the client is saved. And the algorithm does not need the cloud server to perform extremely complex operation on the ciphertext, so that the calculation cost of the cloud server is effectively reduced. The data provider may be a plurality of clients. After the data is divided by adopting an addition secret sharing technology, each divided part is respectively encrypted and sent to different servers, so that interception of other people can be prevented. And then, a multiplication secret sharing technology and an unintentional transmission technology are used between the two servers to acquire the size relation between the distance between the two points and the neighborhood range threshold, so that the contents such as core points, density reachable relations, density connection relations and the like in the DBCSCAN algorithm can be acquired, and further, the safe calculation of the DBSCAN algorithm is realized. Compared with the existing results, the method and the system save the calculation and communication costs of the client and reduce the calculation costs of the cloud server.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a privacy protection method of a DBSCAN algorithm according to an embodiment of the present invention, and as shown in fig. 1, the privacy protection method of the DBSCAN algorithm includes steps S10 to S40. The privacy protection method of the DBSCAN algorithm is applied to a system, wherein the system comprises at least one client and at least one server A and at least one server B.
S10, secret keys between the client and the servers A and B are respectively established through DH protocol.
Wherein, the key between the first client and the server A is set asThe key with the server B is
And S20, the client divides the data point into two parts by utilizing addition secret sharing, encrypts the data point by using the corresponding secret key and sends the encrypted data point to the corresponding servers A and B.
In embodiments of the invention, for example, data pointsT represents matrix transposition, divided into by addition secretAnd then respectively using the corresponding key pairsAnd after encryption, the encrypted data are sent to the servers A and B.
S30, the server A and the server B respectively decrypt the secret sharing values of the received data points, and the distance between any two points and the size of M are obtained by the following steps.
We use the data pointsAndFor illustration.
Suppose that server A getsAndSecret shared value of (2) isObtained by server BAndSecret shared value of (2) is The method comprises the following specific steps:
s301, calculating by server A Server B computing
S302, the servers A and B respectively obtain data through multiplication secret sharingThey satisfy
S303, calculating by the server A to obtain data The server B calculates the data
It can be seen that V A+VB results inSo long as the relation between V A+VB and 0 can be judgedAnd epsilon.
S304, the server B sends sign (V B) to the server A, and the server A preliminarily judgesAnd epsilon. The following conditions are adopted:
I. V A and V B are 0 or more, and can be obtained
II. V A and V B are 0 or less, and can be obtained
III, V A is greater than or equal to 0, V B is less than 0, V A is less than or equal to 0, and V B is greater than 0, and the process proceeds to step S305.
S305, comparing the magnitudes of |V A | and |V B | by adopting a method OT_Compare (|V A|,|VB |) between the servers A and B, thereby obtainingAnd epsilon. The following cases are divided into
I. if sign (V A)·sign(|VA|-|VB |) is not less than 0, it can be obtained
II. If sign (V A)·sign(|VA|-|VB |) is less than or equal to 0, then
S40, the server A obtains a point set in the epsilon-neighborhood of each point according to the size relation between the distance between the two points and epsilon, and the server A obtains a clustering result according to a DBSCAN algorithm.
The invention firstly divides the data into two parts by an addition secret sharing technology, encrypts the two parts and sends the encrypted data to the corresponding cloud server. After the cloud server decrypts the data, the secret sharing value of the square of the distance between two points can be obtained by using the addition and multiplication secret sharing. And then, obtaining the magnitude relation between the square of the distance between the two points and the square of the distance threshold value by using the unintentional transmission, and further obtaining the magnitude relation between the distance between the two points and the distance threshold value. Then, one server can obtain a core point, a boundary point and a noise point in the DBSCAN algorithm by utilizing the distance size relation, and obtain the density direct, density reachable and density connection relation of the data points by the core point, the boundary point and the noise point, so as to realize the secret calculation of the DBSCAN algorithm.
The DBSCAN algorithm describes the degree of compactness of a sample set based on the number of samples in the neighborhood. We first describe the parameters associated with the DBSCAN algorithm.
Assuming that the data point set isThere are the following relevant definitions
Epsilon-neighborhood. For the followingIts epsilon-neighborhood refers to the sum of the data point set DIs not greater than epsilon, i.e.The number of this set is counted as
Core points. For any data pointIf it is epsilon-neighborhoodComprises at least M samples, i.eThenIs the core point.
The density is directly reached. If it isIs positioned atWithin epsilon-neighborhood of (2), andIs the core point, then calledFrom the following componentsThe density is directly reached. The opposite is not necessarily the case, unlessIs also the core point. The density is directly asymmetric.
The density can be achieved. For the followingAndIf a sample sequence is presentSatisfy the following requirements AndFrom the following componentsThe density is direct and is calledFrom the following componentsThe density can be achieved. It is easy to see that the density is achieved with transitivity. As with the direct density, the density can be achieved without symmetry.
And (3) density connection. For the followingAndIf there is a core objectMake the following stepsAndAre all provided withThe density is up toAndAnd (3) density connection. Density-connected symmetry can be obtained.
Wherein the servers A and B are not mutually communicated.
Client dataWherein the method comprises the steps ofT represents the matrix transpose, i.e., all data points are n-dimensional vectors. Neighborhood parameters (. Epsilon., M).
In the embodiment of the present invention, the server a obtains a clustering result through a DBSCAN algorithm, which includes:
Input sample set Neighborhood parameters (. Epsilon., M).
The output is class division c= { C 1,C2,…,Ck }, where the value of k is not known prior to program execution.
In the embodiment of the invention, the server A obtains the clustering result through the DBSCAN algorithm, and the method comprises the following steps:
s401, all core points are obtained. If at some point (e.g ) The number of points in the epsilon-neighborhood of (i.e.)) This is then the core point. Let the set of all core points be Ω.
S402, marking all core points as unvisited.
S403, taking out a core point from omega(After removal)) Marking the point as accessed, setting the current class sequence number k, and creating a queue of the current core pointCreating a set of points of a current category
S404, taking out an element from the core point queue omega k (After removal)) Acquiring all atNon-visited point set in epsilon-neighborhood of (c-c)Non-accessed core point setWill beThese points within epsilon-neighborhood are marked as accessed, and the current class point set is updated toThe core points of the points are collectedTo the core point queue (after joining))。
S405, repeating the step S404 until the core point queue Ω k is empty after the step S404 is completed. At this time, the current category point set C k forms a category, and core points marked as accessed are removed from Ω.
S406, repeating the steps S403-S405 until the element in omega is null, and obtaining the category division of C= { C 1,C2,…,Ck }.
The key to the confidentiality of the DBSCAN algorithm is the confidentiality of user data information. It can be seen that in the above algorithm, the secret calculation of the DBSCAN algorithm can be realized as long as we can realize the secret judgment of the distance and epsilon of the point. With respect to the measurement of the distance of the data points, we use Euclidean distance in the present invention.
Among them, secret sharing is an important technology in multiparty security computing, and is widely used in the field of privacy protection because it is relatively simple to use. Only two-way additive secret sharing is used in the present invention, which is briefly described below.
In additive secret sharing, data x is randomly split into the sum of two data (x 0,x1), i.e., x=x 0+x1, and then x 0,x1 is stored separately at the participants P 0,P1. The recovery of data x must be performed in combination with the data of party P 0,P1.
The party P 0 may be provided with the data x and the other party P 1 with the data y. The additive secret sharing may be implemented as follows.
1. P 0 generates a random number x 0 and computes x 1=x-x0, then sends x 1 to P 1.
2. P 1 generates a random number y 0 and calculates y 1=y-y0, then sends y 0 to P 0.
3. P 0 calculated z 0=x0+y0,P1 calculated z 1=x1+y1.
At this time, the sum of x and y is calculated, and only z 0 and z 1 are needed to be summarized to the demander, and the result is z 0+z1. It can be seen that in this process, the data x and y are not revealed, so that the data privacy is well protected.
In addition, if data information is eavesdropped during the data transmission process, the data privacy still has the risk of leakage, the data can be encrypted before the sender sends the data, and for the sake of simplicity of calculation, a traditional symmetric encryption algorithm can be used. With respect to the transmission of the key, we use Diffie-Hellman (hereinafter referred to as DH) key exchange method. The DH key exchange protocol is briefly described next.
Alice and Bob want to share a single key for symmetric encryption. But the communication channel between them is not secure. All information passing through this channel is seen by the adversary Eve. How do they exchange information to let Eve know this key?
The security of the DH algorithm depends on the degree of difficulty in computing discrete logarithms. The concept of primitive root is needed in the following schemes, and we give a definition of this.
Definition 1a is the primitive root of n if the smallest positive power m, which makes a m =1 mod n hold, satisfies m=Φ n. Where Φ n is the Euler function.
Thus, for any primitive root, a, of integer b and prime number, p, there is a unique power, i, such that b=a i modp, 0≤i≤p-1. The discrete logarithm difficulty problem is that given a, b, p, it is difficult to calculate i.
The following is a DH protocol scheme:
alice and Bob agree on p and g first, where p is a large prime number, g is the primitive root of p, and p and g are disclosed. Eve also knows their values.
Alice takes a private integer a, not known to anyone, and gives Bob the calculation result a=g a modp. Eve also sees the value of A.
3. Similarly, bob takes a private integer B and sends Alice the result b=g b modp. Also Eve will see what is passing B.
Alice calculates s=b amodp=(gb)amodp=gab modp.
Bob can also calculate s=a bmodp=(ga)bmodp=gab modp.
Alice and Bob now have a common key S. Although Eve sees p, g, A, B, she cannot know the specific values of a and B given the difficulty in computing discrete logarithms. Eve has no knowledge of what is the key S.
The 1-out-of-N OT can be implemented as follows:
1. And (5) a preparation stage. The protocol is in the order of the group of big prime number q The upper operation (i.e. the operation results in the protocol are all the results in the modulo q sense), the selection groupIs a primitive root g. A random predictive function H (e.g., SHA-1) is selected. The parameters q, g, H are shared by Alice and Bob.
2. In the initialization stage, alice selects N-1 random numbers C 1,C2,…,CN-1. A random number r is then selected and g r is calculated and then C 1,C2,…,CN-1,gr is sent to Bob. Alice pre-calculates (C 1)r,(C2)r,…,(CN-1)r. (Bob cannot obtain the discrete logarithm of C 1,C2,…,CN-1 and the value of r due to the discrete logarithm difficulty).
3. On-line computing stage:
a. Bob selects a random number k, sets
PK 0 then sends Alice. (Alice cannot obtain the value of k).
B. Alice calculates (PK 0)r, then calculates (PK i)r=(Ci/PK0)r, 1. Ltoreq.i≤n-1. Then selects a random string R (where R is selected long enough to ensure that the Hash values for the two different data pairs are different) and encrypts each M i, 0≤i≤n-1 with H ((PK i)r,R,i)⊕Mi), then sends the encryption result and R to Bob.
C. Bob can calculate (PKσ)r=(Cσ/PK0)r=(gk)r=(gr)k, and then decrypt using H ((PK σ)r, R, σ) to yield M σ.
The two numbers can then be safely compared in size using this protocol.
Suppose Alice owns data x and Bob owns data y. And x, y e {0,... We can compare the x, y magnitudes using the following procedure.
1. Alice constructs N plaintext messages
2. Bob obtains the value of m y through 1-out-of-N OT, and can obtain a result
Bob then sends the size result of both to Alice.
If x, y is a generally real number, x can be expressed in N-ary form, e.g., x=x p-1…x0.x-1…x-q, i.e.The integer has p bits and the decimal has q bits. y can also be expressed as the form y=y p-1…y0·y-1…y-q of N-ary. The two sizes are then compared starting from the most significant bit,
1. If x i=yi, -q≤i≤p, x=y.
2. If k is present, x i=yi is present when i > k, and x i>yi is present when i=k, then x > y.
3. If k is present, x i=yi is present when i > k, and x i<yi is present when i=k, x < y.
We denote the above procedure of comparing the two sizes by ot_component (x, y).
Secret sharing is an important technology in multiparty security computing, and is widely used in the field of privacy protection because it is relatively simple to use.
In two-party addition secret sharing, data x is randomly split into the sum of two data (x 0,x1), i.e., x=x 0+x1, and then x 0,x1 is held at the participants P 0,P1, respectively. The recovery of data x must be performed in combination with the data of party P 0,P1.
The party P 0 may be provided with the data x and the other party P 1 with the data y. The addition secret sharing can be implemented as follows.
1. P 0 generates a random number x 0 and computes x 1=x-x0, then sends x 1 to P 1.
2. P 1 generates a random number y 0 and calculates y 1=y-y0, then sends y 0 to P 0.
3. P 0 calculated z 0=x0+y0,P1 calculated z 1=x1+y1.
At this time, the sum of x and y is calculated, and only z 0 and z 1 are needed to be summarized to the demander, and the result is z 0+z1. It can be seen that in this process, the data x and y are not revealed, so that the data privacy is well protected.
Two-party multiplicative secret sharing can be implemented with 1-out-of-N OT.
Let party P 0 possess data x and party P 1 possess data y. X is expressed as N-ary form x=x p-1…x0·x-1…x-q, i.eThe integer has p bits and the decimal has q bits. y can also be expressed as the form y=y p-1…y0·y-1…y-q of N-ary. When i= -q, p-1, the following methods were used for calculation, respectively
1. Bob generates (m i,0,…,mi,N-1), where m i,0 is a random number, m i,j=Nijy-mi,0.
2. Alice uses 1-out-of-N OT acquisition
3. Alice calculationBob calculation
It is easy to see that z A+zB =x·y, i.e. to obtain the value of x·y, only the value of z A,zB needs to be summarized, and the values of x, y do not need to be revealed.
The invention adopts the addition secret sharing technology to divide the data, then encrypts each divided part and sends the encrypted part to different servers, thereby preventing others from eavesdropping. And then, a multiplication secret sharing technology and an unintentional transmission technology are used between the two servers to acquire the size relation between the distance between the two points and the neighborhood range threshold, so that the contents such as core points, density reachable relations, density connection relations and the like in the DBCSCAN algorithm can be acquired, and further, the safe calculation of the DBSCAN algorithm is realized. Compared with the existing results, the method saves the calculation and communication cost of the client and reduces the calculation cost of the cloud server.
It should be understood that although described in a certain order, the steps are not necessarily performed sequentially in the order described. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, some steps of the present embodiment may include a plurality of steps or stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the steps or stages in other steps or other steps.
In one embodiment, referring to fig. 2, a terminal is further provided in an embodiment of the present invention, where the terminal includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 perform communication with each other through the communication bus 504.
A memory 503 for storing a computer program;
The processor 501 is configured to execute the privacy protection method of the DBSCAN algorithm when executing the computer program stored in the memory 503, and the processor executes the instructions to implement the steps in the method embodiments described above:
The communication bus mentioned by the above terminal may be a peripheral component interconnect standard (PERIPHERAL COMPONENTINTERCONNECT, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry StandardArchitecture, abbreviated as EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the figures are shown with only one bold line, but not with only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor including a central Processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processor (DIGITAL SIGNAL Processing, DSP), an application specific integrated circuit (Application SpecificIntegrated Circuit, ASIC), a Field-Programmable GATE ARRAY, FPGA, or other Programmable logic device, discrete gate or transistor logic device, or discrete hardware components.
The terminal comprises user equipment and network equipment. The user equipment comprises, but is not limited to, a computer, a smart phone, a PDA and the like, and the network equipment comprises, but is not limited to, a single network server, a server group formed by a plurality of network servers or Cloud Computing (Cloud Computing) based Cloud formed by a large number of computers or network servers, wherein the Cloud Computing is one of distributed Computing and is a super virtual computer formed by a group of loosely coupled computer sets. The terminal can independently operate to realize the invention, and can also access the network and realize the invention through the interaction operation with other terminals in the network. The network where the terminal is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In one embodiment of the invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the above described embodiment methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The foregoing embodiment of the present invention has been disclosed with reference to the number of embodiments for the purpose of description only, and does not represent the advantages or disadvantages of the embodiments.
It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples, that technical features of the above embodiments or different embodiments may be combined and that many other variations of the different aspects of the embodiments of the invention as described above exist within the spirit of the embodiments of the invention, which are not provided in detail for clarity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.