CN116094708B

CN116094708B - Privacy protection method, terminal and storage medium of DBSCAN algorithm

Info

Publication number: CN116094708B
Application number: CN202310080172.3A
Authority: CN
Inventors: 王小伟; 张旭; 吴睿振; 孙华锦; 王凛
Original assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Current assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date: 2023-01-30
Filing date: 2023-01-30
Publication date: 2025-07-15
Anticipated expiration: 2043-01-30
Also published as: CN116094708A

Abstract

The present invention relates to the field of information technology, and in particular to a privacy protection method, terminal and storage medium for a DBSCAN algorithm. The method uses a DH protocol to establish keys between a client and servers A and B respectively; wherein, the key between the first client and server A is The key between server B is The client uses additive secret sharing to divide the data points into two parts, and encrypts them with corresponding keys and sends them to the corresponding servers A and B. Servers A and B decrypt the secret sharing values of the received data points to obtain the distance between any two points and the size of M. Server A obtains the point set in the ε-neighborhood of each point through the relationship between the distance between the two points and ε. Server A obtains the clustering result through the DBSCAN algorithm. The present invention saves the computing and communication costs of the client and reduces the computing costs of the cloud server.

Description

Privacy protection method, terminal and storage medium of DBSCAN algorithm

Technical Field

The present invention relates to the field of information technologies, and in particular, to a privacy protection method, a terminal, and a storage medium for a DBSCAN algorithm.

Background

With the rapid development of cloud services, people have made great changes in the way mass data is processed and calculated. When the computing capacity of the local computer is limited and the computing capacity required for solving the problem is huge, the data can be uploaded to the cloud server, and the work of the user can be completed by means of the powerful computing capacity of the cloud server. Today's cloud servers involve content that encompasses aspects of life, such as personal physical status, family member information, company member constitution, etc., that may be related to personal privacy or business confidentiality, which if uploaded directly to the server would result in the risk of revealing the privacy or business confidentiality. For the application scenario of machine learning, since the data volume and the calculation volume of the application scenario tend to be relatively large, the application scenario is often outsourced to a server for operation. Common unsupervised machine learning algorithms are the K-means algorithm, the DBSCAN algorithm, and the like. For the K-means algorithm, there are already some efficient and safe ways of computation. Compared with the K-means algorithm, the DBSCAN algorithm can find any cluster in any shape under the condition of noisy points, and has a remarkable effect of removing malicious data.

In the prior art, a BGV homomorphic encryption algorithm is adopted to secret user data related to a DBSCAN algorithm. The user encrypts the data and then uploads the data to the cloud server, and as the BGV algorithm meets the property of full homomorphism, the cloud server sends the calculation result to the user after performing a series of calculations on the ciphertext, and the user can obtain the correct algorithm result after decrypting the calculation result.

Due to the complexity of fully homomorphic encryption algorithms, there are many operations that increase the computational expense and network expense of clients as well as greatly increasing the amount of server computation during operation of the algorithm. For example, before data encryption, data is first encoded, matrix operation is performed on each piece of data, the encrypted ciphertext may be several times as large as the original data, and the ciphertext needs to be sent to a cloud server through a network. In order to solve the technical problem, a privacy protection method, a terminal and a storage medium of a DBSCAN algorithm are provided.

Disclosure of Invention

In order to solve the technical problems in the prior art, the invention provides a privacy protection method, a terminal and a storage medium of a DBSCAN algorithm.

In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:

in a first aspect, in one embodiment provided by the present invention, there is provided a privacy protection method of a DBSCAN algorithm, the method including the steps of:

Respectively establishing keys between a client and the servers A and B through DH protocol;

wherein, the key between the first client and the server A is set as The key with the server B is

The client divides the data point into two parts by utilizing addition secret sharing, encrypts the data point by using corresponding keys respectively and then sends the encrypted data point to corresponding servers A and B;

The server A, B decrypts the secret sharing value of the received data point respectively, obtain the distance between any two points and the size of M;

the server A obtains a point set in epsilon-neighbor of each point through the size relation between the distance between two points and epsilon, and the server A obtains a clustering result through a DBSCAN algorithm.

As a further aspect of the invention, data points are usedAndFor illustration. Assume that server a gets the data points AndSecret shared value of (2) isData points obtained by server BAndSecret shared value of (2) is The method for obtaining the distance between any two points and the size of M comprises the following steps:

s301, calculating by server A Server B computing

S302, the servers A and B respectively obtain data through multiplication secret sharingAnd is also provided withAndSatisfy the following requirements

S303, calculating by the server A to obtain data The server B calculates the data

S304, the server B sends sign (V _B) to the server A, and the server A preliminarily judgesAnd epsilon size;

S305, comparing the magnitudes of |V _A | and |V _B | by adopting a method OT_Compare (|V _A|,|V_B |) between the servers A and B, thereby obtaining And epsilon.

As a further scheme of the invention, the server B sends sign (V _B) to the server A, and the server A preliminarily judgesAnd epsilon, which includes the following cases:

I. V _A and V _B are 0 or more, and can be obtained

II. V _A and V _B are 0 or less, and can be obtained

III, V _A is greater than or equal to 0, V _B is less than 0, V _A is less than or equal to 0, and V _B is greater than 0, then adopting a method OT_Compare (I V _A|,|V_B I) to Compare the magnitudes of I V _A I and I V _B I between the servers A and B, thereby obtainingAnd epsilon.

As a further aspect of the present invention, the sizes of |v _A | and |v _B | are compared between servers a, B using the method ot_compound (|v _A|,|V_B |), thereby obtainingAnd epsilon, which includes the following cases:

I. if sign (V _A)·sign(|V_A|-|V_B |) is not less than 0, it can be obtained

II. If sign (V _A)·sign(|V_A|-|V_B |) is less than or equal to 0, then

As a further aspect of the present invention, the server a obtains a clustering result through a DBSCAN algorithm, which includes:

Input sample set Neighborhood parameters (. Epsilon., M).

The output is class division c= { C ₁,C₂,…,C_k }.

As a further scheme of the invention, the server A obtains a clustering result through a DBSCAN algorithm, and the method comprises the following steps:

S401, all core points are obtained;

if at some point (e.g ) The number of points in the epsilon-neighborhood of (i.e.)) Setting the set of all core points as omega;

s402, marking all core points as unvisited;

s403, taking out a core point from omega (After removal)) Marking the point as accessed, setting the current class sequence number k, and creating a queue of the current core pointCreating a set of points of a current category

S404, taking out an element from the core point queue omega _k (After removal)) Acquiring all atNon-visited point set in epsilon-neighborhood of (c-c)Non-accessed core point setWill beThese points within epsilon-neighborhood are marked as accessed, and the current class point set is updated toThe core points of the points are collectedTo the core point queue (after joining));

S405, repeating the step S404 until the core point queue omega _k is empty after the step S404 is executed, wherein the current category point set C _k forms a category, and eliminating the core points marked as accessed from omega;

S406, repeating the steps S403-S405 until the element in omega is null, and obtaining the category division of C= { C ₁,C₂,…,C_k }.

As a further aspect of the present invention, the servers a, B do not cross each other.

As a further aspect of the present invention, client dataWherein, the The value range of i is 1-m, and the values of m and n are positive integers.

In a second aspect, in yet another embodiment provided by the present invention, a terminal is provided, including a memory storing a computer program and a processor implementing the steps of the privacy preserving method of the DBSCAN algorithm when the computer program is loaded and executed by the processor.

In a third aspect, in yet another embodiment of the present invention, a storage medium is provided storing a computer program which, when loaded and executed by a processor, implements the steps of the privacy preserving method of the DBSCAN algorithm.

The technical scheme provided by the invention has the following beneficial effects:

According to the privacy protection method, the terminal and the storage medium of the DBSCAN algorithm, too many matrix operations are not needed to be carried out on the client, the number of parameters uploaded to cloud services is greatly reduced, and the calculation and communication cost of the client is saved. And the algorithm does not need the cloud server to perform extremely complex operation on the ciphertext, so that the calculation cost of the cloud server is effectively reduced. The data provider may be a plurality of clients. After the data is divided by adopting an addition secret sharing technology, each divided part is respectively encrypted and sent to different servers, so that interception of other people can be prevented. And then, a multiplication secret sharing technology and an unintentional transmission technology are used between the two servers to acquire the size relation between the distance between the two points and the neighborhood range threshold, so that the contents such as core points, density reachable relations, density connection relations and the like in the DBCSCAN algorithm can be acquired, and further, the safe calculation of the DBSCAN algorithm is realized. Compared with the existing results, the method and the system save the calculation and communication costs of the client and reduce the calculation costs of the cloud server.

These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a privacy preserving method of DBSCAN algorithm of an embodiment of the present invention;

fig. 2 is a block diagram of a terminal according to an embodiment of the present invention.

In the figure, a processor-501, a communication interface-502, a memory-503 and a communication bus-504 are shown.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a flowchart of a privacy protection method of a DBSCAN algorithm according to an embodiment of the present invention, and as shown in fig. 1, the privacy protection method of the DBSCAN algorithm includes steps S10 to S40. The privacy protection method of the DBSCAN algorithm is applied to a system, wherein the system comprises at least one client and at least one server A and at least one server B.

S10, secret keys between the client and the servers A and B are respectively established through DH protocol.

Wherein, the key between the first client and the server A is set asThe key with the server B is

And S20, the client divides the data point into two parts by utilizing addition secret sharing, encrypts the data point by using the corresponding secret key and sends the encrypted data point to the corresponding servers A and B.

In embodiments of the invention, for example, data pointsT represents matrix transposition, divided into by addition secretAnd then respectively using the corresponding key pairsAnd after encryption, the encrypted data are sent to the servers A and B.

S30, the server A and the server B respectively decrypt the secret sharing values of the received data points, and the distance between any two points and the size of M are obtained by the following steps.

We use the data pointsAndFor illustration.

Suppose that server A getsAndSecret shared value of (2) isObtained by server BAndSecret shared value of (2) is The method comprises the following specific steps:

s301, calculating by server A Server B computing

S302, the servers A and B respectively obtain data through multiplication secret sharingThey satisfy

It can be seen that V _A+V_B results inSo long as the relation between V _A+V_B and 0 can be judgedAnd epsilon.

S304, the server B sends sign (V _B) to the server A, and the server A preliminarily judgesAnd epsilon. The following conditions are adopted:

I. V _A and V _B are 0 or more, and can be obtained

II. V _A and V _B are 0 or less, and can be obtained

III, V _A is greater than or equal to 0, V _B is less than 0, V _A is less than or equal to 0, and V _B is greater than 0, and the process proceeds to step S305.

S305, comparing the magnitudes of |V _A | and |V _B | by adopting a method OT_Compare (|V _A|,|V_B |) between the servers A and B, thereby obtainingAnd epsilon. The following cases are divided into

I. if sign (V _A)·sign(|V_A|-|V_B |) is not less than 0, it can be obtained

II. If sign (V _A)·sign(|V_A|-|V_B |) is less than or equal to 0, then

S40, the server A obtains a point set in the epsilon-neighborhood of each point according to the size relation between the distance between the two points and epsilon, and the server A obtains a clustering result according to a DBSCAN algorithm.

The invention firstly divides the data into two parts by an addition secret sharing technology, encrypts the two parts and sends the encrypted data to the corresponding cloud server. After the cloud server decrypts the data, the secret sharing value of the square of the distance between two points can be obtained by using the addition and multiplication secret sharing. And then, obtaining the magnitude relation between the square of the distance between the two points and the square of the distance threshold value by using the unintentional transmission, and further obtaining the magnitude relation between the distance between the two points and the distance threshold value. Then, one server can obtain a core point, a boundary point and a noise point in the DBSCAN algorithm by utilizing the distance size relation, and obtain the density direct, density reachable and density connection relation of the data points by the core point, the boundary point and the noise point, so as to realize the secret calculation of the DBSCAN algorithm.

The DBSCAN algorithm describes the degree of compactness of a sample set based on the number of samples in the neighborhood. We first describe the parameters associated with the DBSCAN algorithm.

Assuming that the data point set isThere are the following relevant definitions

Epsilon-neighborhood. For the followingIts epsilon-neighborhood refers to the sum of the data point set DIs not greater than epsilon, i.e.The number of this set is counted as

Core points. For any data pointIf it is epsilon-neighborhoodComprises at least M samples, i.eThenIs the core point.

The density is directly reached. If it isIs positioned atWithin epsilon-neighborhood of (2), andIs the core point, then calledFrom the following componentsThe density is directly reached. The opposite is not necessarily the case, unlessIs also the core point. The density is directly asymmetric.

The density can be achieved. For the followingAndIf a sample sequence is presentSatisfy the following requirements AndFrom the following componentsThe density is direct and is calledFrom the following componentsThe density can be achieved. It is easy to see that the density is achieved with transitivity. As with the direct density, the density can be achieved without symmetry.

And (3) density connection. For the followingAndIf there is a core objectMake the following stepsAndAre all provided withThe density is up toAndAnd (3) density connection. Density-connected symmetry can be obtained.

Wherein the servers A and B are not mutually communicated.

Client dataWherein the method comprises the steps ofT represents the matrix transpose, i.e., all data points are n-dimensional vectors. Neighborhood parameters (. Epsilon., M).

In the embodiment of the present invention, the server a obtains a clustering result through a DBSCAN algorithm, which includes:

Input sample set Neighborhood parameters (. Epsilon., M).

The output is class division c= { C ₁,C₂,…,C_k }, where the value of k is not known prior to program execution.

In the embodiment of the invention, the server A obtains the clustering result through the DBSCAN algorithm, and the method comprises the following steps:

s401, all core points are obtained. If at some point (e.g ) The number of points in the epsilon-neighborhood of (i.e.)) This is then the core point. Let the set of all core points be Ω.

S402, marking all core points as unvisited.

S403, taking out a core point from omega(After removal)) Marking the point as accessed, setting the current class sequence number k, and creating a queue of the current core pointCreating a set of points of a current category

S404, taking out an element from the core point queue omega _k (After removal)) Acquiring all atNon-visited point set in epsilon-neighborhood of (c-c)Non-accessed core point setWill beThese points within epsilon-neighborhood are marked as accessed, and the current class point set is updated toThe core points of the points are collectedTo the core point queue (after joining))。

S405, repeating the step S404 until the core point queue Ω _k is empty after the step S404 is completed. At this time, the current category point set C _k forms a category, and core points marked as accessed are removed from Ω.

The key to the confidentiality of the DBSCAN algorithm is the confidentiality of user data information. It can be seen that in the above algorithm, the secret calculation of the DBSCAN algorithm can be realized as long as we can realize the secret judgment of the distance and epsilon of the point. With respect to the measurement of the distance of the data points, we use Euclidean distance in the present invention.

Among them, secret sharing is an important technology in multiparty security computing, and is widely used in the field of privacy protection because it is relatively simple to use. Only two-way additive secret sharing is used in the present invention, which is briefly described below.

In additive secret sharing, data x is randomly split into the sum of two data (x ₀,x₁), i.e., x=x ₀+x₁, and then x ₀,x₁ is stored separately at the participants P ₀,P₁. The recovery of data x must be performed in combination with the data of party P ₀,P₁.

The party P ₀ may be provided with the data x and the other party P ₁ with the data y. The additive secret sharing may be implemented as follows.

1. P ₀ generates a random number x ₀ and computes x ₁＝x-x₀, then sends x ₁ to P ₁.

2. P ₁ generates a random number y ₀ and calculates y ₁＝y-y₀, then sends y ₀ to P ₀.

3. P ₀ calculated z ₀＝x₀+y₀,P₁ calculated z ₁＝x₁+y₁.

At this time, the sum of x and y is calculated, and only z ₀ and z ₁ are needed to be summarized to the demander, and the result is z ₀+z₁. It can be seen that in this process, the data x and y are not revealed, so that the data privacy is well protected.

In addition, if data information is eavesdropped during the data transmission process, the data privacy still has the risk of leakage, the data can be encrypted before the sender sends the data, and for the sake of simplicity of calculation, a traditional symmetric encryption algorithm can be used. With respect to the transmission of the key, we use Diffie-Hellman (hereinafter referred to as DH) key exchange method. The DH key exchange protocol is briefly described next.

Alice and Bob want to share a single key for symmetric encryption. But the communication channel between them is not secure. All information passing through this channel is seen by the adversary Eve. How do they exchange information to let Eve know this key?

The security of the DH algorithm depends on the degree of difficulty in computing discrete logarithms. The concept of primitive root is needed in the following schemes, and we give a definition of this.

Definition 1a is the primitive root of n if the smallest positive power m, which makes a ^m =1 mod n hold, satisfies m=Φ _n. Where Φ _n is the Euler function.

Thus, for any primitive root, a, of integer b and prime number, p, there is a unique power, i, such that b=a ⁱ modp, 0≤i≤p-1. The discrete logarithm difficulty problem is that given a, b, p, it is difficult to calculate i.

The following is a DH protocol scheme:

alice and Bob agree on p and g first, where p is a large prime number, g is the primitive root of p, and p and g are disclosed. Eve also knows their values.

Alice takes a private integer a, not known to anyone, and gives Bob the calculation result a=g ^a modp. Eve also sees the value of A.

3. Similarly, bob takes a private integer B and sends Alice the result b=g ^b modp. Also Eve will see what is passing B.

Alice calculates s=b ^amodp＝(g^b)^amodp＝g^ab modp.

Bob can also calculate s=a ^bmodp＝(g^a)^bmodp＝g^ab modp.

Alice and Bob now have a common key S. Although Eve sees p, g, A, B, she cannot know the specific values of a and B given the difficulty in computing discrete logarithms. Eve has no knowledge of what is the key S.

The 1-out-of-N OT can be implemented as follows:

1. And (5) a preparation stage. The protocol is in the order of the group of big prime number q The upper operation (i.e. the operation results in the protocol are all the results in the modulo q sense), the selection groupIs a primitive root g. A random predictive function H (e.g., SHA-1) is selected. The parameters q, g, H are shared by Alice and Bob.

2. In the initialization stage, alice selects N-1 random numbers C ₁,C₂,…,C_N-1. A random number r is then selected and g ^r is calculated and then C ₁,C₂,…,C_N-1,g^r is sent to Bob. Alice pre-calculates (C ₁)^r,(C₂)^r,…,(C_N-1)^r. (Bob cannot obtain the discrete logarithm of C ₁,C₂,…,C_N-1 and the value of r due to the discrete logarithm difficulty).

3. On-line computing stage:

a. Bob selects a random number k, sets

PK ₀ then sends Alice. (Alice cannot obtain the value of k).

B. Alice calculates (PK ₀)^r, then calculates (PK _i)^r＝(C_i/PK₀)^r, 1. Ltoreq.i≤n-1. Then selects a random string R (where R is selected long enough to ensure that the Hash values for the two different data pairs are different) and encrypts each M _i, 0≤i≤n-1 with H ((PK _i)^r,R,i)⊕M_i), then sends the encryption result and R to Bob.

C. Bob can calculate (PK_σ)^r＝(C_σ/PK₀)^r＝(g^k)^r＝(g^r)^k, and then decrypt using H ((PK _σ)^r, R, σ) to yield M _σ.

The two numbers can then be safely compared in size using this protocol.

Suppose Alice owns data x and Bob owns data y. And x, y e {0,... We can compare the x, y magnitudes using the following procedure.

1. Alice constructs N plaintext messages

2. Bob obtains the value of m _y through 1-out-of-N OT, and can obtain a result

Bob then sends the size result of both to Alice.

If x, y is a generally real number, x can be expressed in N-ary form, e.g., x=x _p-1…x₀.x_-1…x_-q, i.e.The integer has p bits and the decimal has q bits. y can also be expressed as the form y=y _p-1…y₀·y_-1…y_-q of N-ary. The two sizes are then compared starting from the most significant bit,

1. If x _i＝y_i, -q≤i≤p, x=y.

2. If k is present, x _i＝y_i is present when i > k, and x _i＞y_i is present when i=k, then x > y.

3. If k is present, x _i＝y_i is present when i > k, and x _i＜y_i is present when i=k, x < y.

We denote the above procedure of comparing the two sizes by ot_component (x, y).

Secret sharing is an important technology in multiparty security computing, and is widely used in the field of privacy protection because it is relatively simple to use.

In two-party addition secret sharing, data x is randomly split into the sum of two data (x ₀,x₁), i.e., x=x ₀+x₁, and then x ₀,x₁ is held at the participants P ₀,P₁, respectively. The recovery of data x must be performed in combination with the data of party P ₀,P₁.

The party P ₀ may be provided with the data x and the other party P ₁ with the data y. The addition secret sharing can be implemented as follows.

3. P ₀ calculated z ₀＝x₀+y₀,P₁ calculated z ₁＝x₁+y₁.

Two-party multiplicative secret sharing can be implemented with 1-out-of-N OT.

Let party P ₀ possess data x and party P ₁ possess data y. X is expressed as N-ary form x=x _p-1…x₀·x_-1…x_-q, i.eThe integer has p bits and the decimal has q bits. y can also be expressed as the form y=y _p-1…y₀·y_-1…y_-q of N-ary. When i= -q, p-1, the following methods were used for calculation, respectively

1. Bob generates (m _i,0,…,m_i,N-1), where m _i,0 is a random number, m _i,j＝Nⁱjy-m_i,0.

2. Alice uses 1-out-of-N OT acquisition

3. Alice calculationBob calculation

It is easy to see that z _A+z_B =x·y, i.e. to obtain the value of x·y, only the value of z _A,z_B needs to be summarized, and the values of x, y do not need to be revealed.

The invention adopts the addition secret sharing technology to divide the data, then encrypts each divided part and sends the encrypted part to different servers, thereby preventing others from eavesdropping. And then, a multiplication secret sharing technology and an unintentional transmission technology are used between the two servers to acquire the size relation between the distance between the two points and the neighborhood range threshold, so that the contents such as core points, density reachable relations, density connection relations and the like in the DBCSCAN algorithm can be acquired, and further, the safe calculation of the DBSCAN algorithm is realized. Compared with the existing results, the method saves the calculation and communication cost of the client and reduces the calculation cost of the cloud server.

It should be understood that although described in a certain order, the steps are not necessarily performed sequentially in the order described. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, some steps of the present embodiment may include a plurality of steps or stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the steps or stages in other steps or other steps.

In one embodiment, referring to fig. 2, a terminal is further provided in an embodiment of the present invention, where the terminal includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 perform communication with each other through the communication bus 504.

A memory 503 for storing a computer program;

The processor 501 is configured to execute the privacy protection method of the DBSCAN algorithm when executing the computer program stored in the memory 503, and the processor executes the instructions to implement the steps in the method embodiments described above:

The communication bus mentioned by the above terminal may be a peripheral component interconnect standard (PERIPHERAL COMPONENTINTERCONNECT, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry StandardArchitecture, abbreviated as EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the figures are shown with only one bold line, but not with only one bus or one type of bus.

The communication interface is used for communication between the terminal and other devices.

The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor including a central Processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processor (DIGITAL SIGNAL Processing, DSP), an application specific integrated circuit (Application SpecificIntegrated Circuit, ASIC), a Field-Programmable GATE ARRAY, FPGA, or other Programmable logic device, discrete gate or transistor logic device, or discrete hardware components.

The terminal comprises user equipment and network equipment. The user equipment comprises, but is not limited to, a computer, a smart phone, a PDA and the like, and the network equipment comprises, but is not limited to, a single network server, a server group formed by a plurality of network servers or Cloud Computing (Cloud Computing) based Cloud formed by a large number of computers or network servers, wherein the Cloud Computing is one of distributed Computing and is a super virtual computer formed by a group of loosely coupled computer sets. The terminal can independently operate to realize the invention, and can also access the network and realize the invention through the interaction operation with other terminals in the network. The network where the terminal is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.

It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

In one embodiment of the invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the above described embodiment methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory.

It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The foregoing embodiment of the present invention has been disclosed with reference to the number of embodiments for the purpose of description only, and does not represent the advantages or disadvantages of the embodiments.

It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples, that technical features of the above embodiments or different embodiments may be combined and that many other variations of the different aspects of the embodiments of the invention as described above exist within the spirit of the embodiments of the invention, which are not provided in detail for clarity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims

1. A privacy protection method for a DBSCAN algorithm, characterized in that the method comprises:

Use the DH protocol to establish keys between the client and servers A and B respectively;

Among them, let the key between the lth client and server A be , the key between server B is ;

The client uses additive secret sharing to divide the data point into two parts, encrypts them with the corresponding keys and sends them to the corresponding servers A and B ;

Servers A and B decrypt the secret shared values of the received data points respectively to obtain the distance between any two points and the size of M ;

Server A obtains the value of each point by the relationship between the distance between two points and ε . The set of points in the neighborhood;

Server A obtains clustering results using the DBSCAN algorithm;

Server A gets the data point and The secret sharing value is , server B gets the data point and The secret sharing value is ; T is the matrix transpose, and the method of obtaining the distance between any two points and the size of M includes the following steps:

Server A calculates , server B calculates ;

Through multiplicative secret sharing, the server Get data separately , ,and satisfy ;

server Calculate the data , server B calculates the data , where ε is the area radius of the data point;

Server B sends sign( V _B ) to Server A , which makes a preliminary judgment and the size of ε ;

Between servers A and B , the method Compare and The size of and the size of ε ;

Among them, ε is the domain radius of the data point, and M is the number of samples.

2. The privacy protection method of the DBSCAN algorithm as claimed _in claim 1, characterized in that the server B sends sign( VB ) to the server A , and the server A preliminarily determines With the size of ε , it includes the following cases:

If _both VA and VB are greater than or _equal to 0, we get ;

If _both VA and VB are less than or _equal to 0, we get ;

_If VA is greater than or equal to 0 and VB is less than 0, _or VA _is less than or equal to 0 and VB _is greater than 0, then the method is used between servers A and B. Compare and The size of with the size of ε .

3. The privacy protection method of the DBSCAN algorithm as claimed in claim 1, characterized in that a method is used between servers A and B Compare and The size of With the size of ε , it includes the following cases:

if , then we get ,

if , then we get .

4. The privacy protection method of the DBSCAN algorithm according to claim 1, wherein the server A obtains the clustering result through the DBSCAN algorithm, which comprises:

Input: Sample set , neighborhood parameter ( ε , M );

Output: Classification .

5. The privacy protection method of the DBSCAN algorithm according to claim 1, characterized in that the server A obtains the clustering result through the DBSCAN algorithm, comprising the following steps:

S401, finding all core points;

If a certain point If the number of points in the neighborhood is greater than M , then this point is a core point. Let the set of all core points be Ω ;

S402, marking the core point as unvisited;

S403. Take a core point from Ω , mark this point as visited, set the current category number k , and create a queue for the current core point , create the current category point set ;

S404, take out an element from the core point queue Ω _k , after taking out , get all of The set of points not visited in the neighborhood , the set of unvisited core points ;Will of These points in the neighborhood are marked as visited, and the current category point set is updated to , the core point set among these points Join the core point queue;

S405, repeat step S404 until after step S404 is executed, the core point queue Ω _k is empty; the current category point set C _k constitutes one category, and the core points marked as visited are removed from Ω ;

S406, repeat steps S403-S405 until the element in Ω is empty, and the category division is obtained. .

6. The privacy protection method of the DBSCAN algorithm as described in claim 1 is characterized in that the servers A and B do not collude with each other.

7. The privacy protection method of the DBSCAN algorithm as claimed in claim 1, characterized in that the client data ,in, , is an n- dimensional vector, the value range of i is 1-m; the values of m and n are both positive integers.

8. A terminal comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the privacy protection method of the DBSCAN algorithm as described in any one of claims 1 to 7 when loading and executing the computer program.

9. A storage medium storing a computer program, wherein when the computer program is loaded and executed by a processor, the steps of the privacy protection method of the DBSCAN algorithm as described in any one of claims 1 to 7 are implemented.