Disclosure of Invention
In view of this, the invention provides a data element privacy protection method based on zero-knowledge machine learning and on-chain verification. By means of ZK-SNARK technology, model computation tasks are offloaded to off-chain oracle servers, which significantly reduces on-chain computation cost and effectively protects privacy. A zero-knowledge machine learning model uplink and parameter update algorithm is designed: model parameters processed with differential privacy are rapidly synchronized to multiple oracle nodes through a zero-knowledge gossip protocol, and the parameter update process is recorded on the blockchain to ensure the security and traceability of the model parameters.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The invention provides a data element privacy protection method based on zero-knowledge machine learning and on-chain verification, comprising the following steps:
constructing a system model consisting of a decentralized application, contract users, data providers, a distributed oracle architecture, and on-chain verification nodes;
offloading the computation-intensive tasks in the model to off-chain oracles for execution by means of ZK-SNARK technology;
based on a zero-knowledge machine learning model uplink and parameter update algorithm, synchronizing model parameters processed with differential privacy to multiple oracle nodes through a zero-knowledge gossip protocol, and recording the parameter update process on the blockchain.
Preferably, in the system model:
the decentralized application provides services to contract users in the form of smart contracts on a blockchain;
the contract users comprise model trainers and common users, wherein a common user provides private, real data as input in exchange for the decentralized service offered by the decentralized application, and a model trainer constructs the circuits used in the zero-knowledge proofs;
the data provider, independent of the blockchain, serves as the source of user data and business data; it attests to the authenticity of the data by signing it for the user with its private key, and the signed data can be verified with the corresponding public key;
the oracle adopts a distributed architecture and is used to acquire data off-chain from data providers according to requests from user contracts, generate a proof through ZK-SNARK, and transmit the proof onto the chain;
and the on-chain verification nodes are used to verify the proof and the model submitted to the smart contract.
Preferably, the trusted data feed process for zero-knowledge off-chain machine learning comprises:
starting from a request by the contract user, a data feed request triggers an event request from the user contract to the oracle contract, then to the oracle server, which requests data from an external data source; a model proof is generated in the oracle, model inference produces a result, and finally the user receives the returned result.
Preferably, the zero-knowledge off-chain machine learning application flow mainly comprises the following steps:
the contract user submits a service request, carrying identity authentication information, according to the functions provided by the decentralized application; the user's identity and service data are uploaded to the service contract, which, upon receiving the request, forwards the information to an oracle;
a contract event trigger starts automatically within the time specified by the service contract and requests the oracle to acquire data;
the oracle requests the user's private data from the data source through an HTTP request carrying the user's identity authentication information;
the data source verifies the authenticity of the user's identity according to the identity authentication information, signs the user's private data with its private key, and returns the signed data to the oracle;
the oracle verifies the integrity of the data, desensitizes it according to the specific business contract model information and circuit information to obtain desensitized data, and then sends the business contract model information, circuit information, desensitized data, and current model parameters to a model contract;
the model trainer sends a data request (Drequest) to the model contract to obtain the model information, circuit information, model parameters, and desensitized data;
the model trainer trains on the data to obtain new model parameters, generates a public witness of the model from the public parameters together with a proof, generates a verification key and a proving key of the model circuit, and returns the training result and the generated data to the model contract on the blockchain;
a verification node of the blockchain periodically and actively verifies the model proof transactions uploaded to the blockchain, thereby determining valid model parameters, and fuses the valid model parameters to obtain new model parameters;
the circuit proof generation step is then repeated to produce a new set of model proofs for subsequent verification nodes to verify the model parameters;
when verification passes, the machine learning model circuit parameters of the service are approved by the blockchain and the oracle server for service calls by common users.
Preferably, after the oracle server updates the model parameters, the user's personal data is converted into a private input witness of the model circuit and a corresponding zero-knowledge proof is generated;
the verification node verifies the zero-knowledge proof to ensure the credibility and accuracy of the model inference process;
after the verification node verifies successfully, the model inference result is deemed correct, and the user's service request can continue to be processed.
Preferably, the zero-knowledge machine learning model uplink and parameter update algorithm comprises the following steps:
model initialization: for each node, an InitializeModel() subroutine is called to initialize the model parameters;
distributed training: in each training period, each node calls a TrainModel() subroutine to update its model parameters;
model synchronization and aggregation: each node invokes a SynchronizeModel() subroutine to synchronize its model parameters and obtain its final model parameters, and the final model parameters of all nodes are aggregated into global final model parameters.
Preferably, the model initialization specifically includes:
initial model parameters are generated by a central node or a preselected trusted node; during initialization the initial parameters are either generated randomly or set based on prior historical data or an existing pre-trained model;
model proof generation: the node generates a zero-knowledge proof related to the initial model parameters;
the trusted node broadcasts the generated initial model parameters, verification key, and zero-knowledge proof peer-to-peer using the Gossip protocol, so that the initial model parameters and their proof rapidly cover the entire distributed network;
upon receiving the initial model parameters and the zero-knowledge proof, each participating node verifies the received proof with the verification key, initializes its local model with parameters that pass verification, and rejects parameters that fail verification while recording the related anomaly;
and after all nodes have verified successfully and accepted the initial model parameters, the model initialization process is determined to be complete.
Preferably, in the distributed training process, each local training round executes the following steps:
obtaining the model circuit structure and initial model parameters from the blockchain or an oracle server node;
training the model with local data, computing the gradient of the current model parameters, and updating the model parameters via a gradient descent algorithm according to the computed gradient and the learning rate;
adding noise to the updated model parameters to ensure differential privacy, generating a witness from the updated model parameters and the private input, generating a zero-knowledge proof based on the witness and the proving key, and completing training;
after training, the node uploads the updated model parameters, the zero-knowledge proof, and the verification key to the blockchain.
Preferably, in the zero-knowledge machine learning model uplink and parameter update algorithm, the distributed model parameter update is executed through the following steps:
each node sends its updated model parameters, zero-knowledge proof, verification key, and witness to the network through an oracle contract, and these are propagated among the nodes using the Gossip protocol;
a receiving node asynchronously receives model parameters and zero-knowledge proofs from other nodes and verifies them to ensure the validity and reliability of the computation results;
if verification passes, the model parameters are added to a designated set; if verification fails, the model parameters are discarded;
after each round of synchronization, the node computes new global model parameters from the model parameters in the designated set using an aggregation algorithm;
global model parameter optimization is completed through multiple rounds of iteration; in each round, every node repeats the broadcasting, verification, and aggregation steps to gradually improve the convergence and accuracy of the model and achieve network-wide consistency;
after all scheduled iteration rounds are completed, the model parameters across the network reach a converged state and form the final global model, ensuring that every node holds a consistent, optimized model;
after distributed model training and synchronization are finished, the system records the final aggregated model parameters and the corresponding zero-knowledge proofs on the blockchain to provide an evidentiary record.
The invention has at least the following beneficial effects:
1. Model computation tasks are offloaded to off-chain oracle servers for execution through ZK-SNARK technology, which significantly reduces on-chain computation cost and effectively protects privacy.
2. A zero-knowledge machine learning model uplink and parameter update algorithm is designed: model parameters processed with differential privacy are rapidly synchronized to multiple oracle nodes through a zero-knowledge gossip protocol, and the parameter update process is recorded on the blockchain. This ensures the security and traceability of the model parameters, allows the correctness of model parameters to be verified in the distributed machine learning training of a multi-node oracle without revealing training data, and prevents security problems such as poisoning attacks.
3. Distributed oracle technology ensures that the on-chain DApp remains efficient and scalable when handling complex tasks.
4. Zero-knowledge proof and differential privacy technology solve the problem of user data privacy leakage during on-chain/off-chain interaction, providing a safe and reliable machine learning scheme for privacy-sensitive fields such as healthcare and finance.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The invention provides a data element privacy protection method based on zero-knowledge machine learning and on-chain verification, following the principle of "off-chain computation, on-chain verification". Model computation tasks are offloaded to off-chain oracle servers for execution through ZK-SNARK technology, significantly reducing on-chain computation cost and effectively protecting privacy. In addition, to ensure that the correctness of model parameters can be verified in the distributed machine learning training of a multi-node oracle without revealing training data, and to prevent security problems such as poisoning attacks, a zero-knowledge machine learning model uplink and parameter update algorithm is designed: model parameters processed with differential privacy are rapidly synchronized to multiple oracle nodes through a zero-knowledge gossip protocol, and the parameter update process is recorded on the blockchain, ensuring the security and traceability of the model parameters.
By combining ZK-SNARK with distributed oracle technology, an efficient and secure solution is provided for the fusion of blockchain and machine learning. First, the off-chain computation / on-chain verification framework moves computation-intensive tasks off-chain, greatly relieving the pressure on blockchain computing resources. Distributed oracle technology ensures that the on-chain DApp remains efficient and scalable when handling complex tasks. Zero-knowledge proof and differential privacy technology solve the problem of user data privacy leakage during on-chain/off-chain interaction, providing a safe and reliable machine learning scheme for privacy-sensitive fields such as healthcare and finance. These results demonstrate the synergistic potential of blockchain technology and machine learning in practical applications, expand the application boundary of blockchain in intelligent decision-making, and provide a technical basis for more powerful smart contracts and decentralized applications.
The overall architecture of the system model is shown in FIG. 1. It consists of the following entities: on-chain verification nodes, the decentralized application (DApp), contract users, model trainers, oracles, and data providers (government authorities and non-authoritative institutions). Wherein:
1) Decentralized application: the service is provided on the blockchain in the form of smart contracts.
2) Contract user: contract users are divided into model trainers and general users. A general user can directly access the decentralized service provided by the DApp, which requires the user's private, real data as input. For privacy reasons, DApp users wish to keep their data private while enjoying the decentralized service. As in a typical blockchain, each DApp user (DU) has one or more public/private key pairs. The model trainer can construct the circuits used in the zero-knowledge proofs.
3) Data provider: data providers comprise government authorities and non-authoritative institutions. Government authorities, such as hospitals and courts, can provide highly specialized and sensitive private data; non-authoritative institutions, such as factories and weather forecast providers, provide non-sensitive data such as business data and guarantee its authenticity. The data provider is independent of the blockchain and is typically the data source that generates the user data and service data. It can attest to the data by signing it for the user with its private key. The data provider (DA) also knows when the data was generated and with whom it is associated, but it never reveals the data to anyone. The signed data can be verified by anyone holding the corresponding public key pk_a. For example, a physical examination report is generated and signed by a trusted hospital and associated with a particular individual; the hospital plays the role of the DA in this case and is trusted not to reveal the report content to others.
4) Oracle: the oracle is the bridge linking on-chain and off-chain, giving the blockchain the ability to actively acquire off-chain data. It can actively fetch data from the data provider and transmit it onto the chain. This ensures the security of the data during transmission, but cannot by itself guarantee the reliability of the data source. The oracle server receives requests from user contracts on the chain, actively acquires data from the data provider, generates a proof of that data through ZK-SNARK, and returns it to the chain, thereby preserving data privacy.
5) On-chain verification node: the verification node may be any node on the blockchain; the verifier checks the zero-knowledge proof with the verification key, and the on-chain node verifies the proof and model submitted to the smart contract.
In one embodiment, the system model is instantiated as a blockchain-based decentralized insurance contract platform that primarily provides flight insurance services for airline passengers. The platform incorporates multiple roles: contract users (passengers), model trainers (airline engineers), data providers (authoritative and non-authoritative), and oracles (which acquire off-chain data and provide privacy protection). Specifically:
a) Decentralized application: insurance contract DApp
The DApp provides flight insurance services for users, including insurance purchase, compensation prediction, flight risk analysis, and other functions. A user can conveniently check the risk assessment of their own flight through the DApp and decide whether to purchase insurance based on the assessment. The DApp uses smart contracts to perform all insurance transactions, including the signing of policies and the setting of compensation trigger conditions. To ensure the privacy of user data, the DApp employs encryption and zero-knowledge proof techniques, so that even when a user's private data is used for model training, it is not revealed to any unauthorized third party. The personal information, flight details, health status, and other data provided by the user are encrypted or processed via zero-knowledge proofs when fed into the model, ensuring data privacy. Throughout the process, the user's concrete data content remains protected from any leakage.
b) Contract user: passenger
Contract users in this role are airline passengers who purchase insurance. On the platform, passengers obtain risk assessments and insurance quotes by entering data on flights, personal health, travel plans, and so on. Because passenger data is privacy-sensitive, passengers wish to ensure the security and privacy of their personal data while enjoying the decentralized insurance service. A passenger authenticates with public/private keys, ensuring that only authorized users can access their private data and transaction records. After selecting an insurance product, the passenger provides the required data (such as personal health status and flight information), and the insurance purchase and payment request is completed on the blockchain through the smart contract, without relying on any central entity. During data transmission and processing, the passenger's data remains encrypted, ensuring that privacy is effectively protected.
c) Model trainer: airline engineers
Airline personnel act as model trainers, responsible for collecting passengers' desensitized data (e.g., flight information and historical insurance compensation records) and using it to train a machine learning model that predicts flight risk and insurance compensation amounts. The model trainer verifies the correctness of model training and prediction by means of zero-knowledge proofs (ZKP) on the blockchain platform, ensuring that the model output is based on real and fair data. The airline trains risk assessment and compensation prediction models on the passenger-provided desensitized data (e.g., flight delay records and historical accident data). After training completes, the airline generates a zero-knowledge proof demonstrating that the model was trained on compliant, real data without revealing passengers' private information. The verified training proof is uploaded to and checked by the blockchain smart contract, ensuring the transparency and compliance of the whole process.
d) Data provider: authoritative and non-authoritative institutions
Data providers are categorized as authoritative or non-authoritative. Authorities, such as hospitals and health data providers, supply sensitive data related to passenger health, while non-authoritative institutions, such as weather forecast companies and flight data providers, supply real-time flight information, weather forecasts, and the like. Regardless of type, the data provider's core responsibility is to ensure the authenticity and accuracy of the data and to sign the provided data with its private key so that the validity and trustworthiness of the data source can be verified. Specifically, an authority (such as a hospital) provides passenger health data (such as disease history or fitness to fly) and signs it with its private key to guarantee reliability and accuracy. The hospital does not disclose the passenger's specific health condition, but provides an encrypted attestation to protect the passenger's privacy. Non-authoritative institutions (e.g., weather forecast companies) provide real-time weather information (e.g., conditions on the day of the flight) that is likewise signed so that its source can be trusted. After signing, the data is transmitted to the blockchain through the oracle for use by passengers and model trainers. The encrypted, verified data is used for risk assessment and insurance compensation calculation, ensuring that the model's predictions rest on real, credible data while passenger privacy is preserved.
e) Oracle: bridge between on-chain and off-chain data
The oracle acts as the bridge between on-chain and off-chain data, responsible for actively retrieving data from external data providers (e.g., weather forecasters and airlines) and transmitting it to the blockchain. It provides external data support for the blockchain system, enabling model prediction and the execution of smart contracts. The oracle ensures the security of data during transmission and protects data privacy through zero-knowledge proofs (ZK-SNARK), but it cannot fully verify the reliability of the data source. Specifically, after receiving a request from the smart contract, the oracle actively goes off-chain to acquire information such as weather and flight data and encrypts it. The oracle then generates a zero-knowledge proof, which is uploaded to the blockchain once the privacy and security of the encrypted data are ensured. Although the oracle can secure the data transfer process, it must establish a trust mechanism with trusted data providers to ensure data reliability, since the accuracy of the data source cannot be fully verified.
f) On-chain verification node: verifier in the blockchain network
The verification node is an important participant in the blockchain network, responsible for verifying model training results, zero-knowledge proofs, and the authenticity of data. Each node ensures the accuracy and compliance of contract execution by checking the smart contract's parameters, model update proofs, and data signatures. When a verification node receives on-chain information such as smart contracts, model training results, and data signatures, it verifies them using the verification key and the zero-knowledge proof. The verification node confirms whether the model training and data provision processes are compliant and ensures that all operations are performed in a transparent, tamper-proof environment. If verification succeeds, the node records the transaction and execution continues on-chain; if verification fails, the contract's exception handling mechanism is triggered, ensuring the security and reliability of the system.
In one embodiment, the business process of the system comprises:
A. System initialization
1. Parameter initialization: global model parameters {θ, info_model, info_circuit, pk_a, sk_a} are set, where θ is the weight parameter of the model, representing its training state. info_model contains the model's input and output formats, the model type (e.g., regression or classification), and other information about the model architecture. info_circuit is the configuration associated with the zero-knowledge proof circuit; it specifically includes the number and type of public inputs (e.g., structural information of the model), the number and type of private inputs (e.g., user data or training data), the circuit's proof logic (i.e., how the circuit verifies the correctness of inputs and outputs), and so on. pk_a and sk_a are the public and private keys used by the data source for signing. The global parameters θ, info_model, info_circuit, and pk_a are broadcast and stored in the consortium blockchain and are stored and managed by way of smart contracts. These parameters not only provide shared base information to the system's participants but also ensure mutual trust and transparency. On this basis, all participants (e.g., model trainers, data providers, and oracles) can access and use these global parameters via the smart contracts to perform the relevant operations (e.g., data validation, model training and verification, and zero-knowledge proof generation). In addition, the system must ensure the security of these parameters, preventing malicious tampering and data leakage.
At system initialization, the smart contracts define and deploy the basic rules and protocols, so that participants execute operations according to the preset contract terms. For example, the data provider supplies encrypted data as required by the contract, while the model trainer uses the published model information and circuit configuration to train and generate zero-knowledge proofs. All these operations are recorded on the blockchain, ensuring the transparency and tamper-resistance of every step.
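As a minimal sketch of how these global parameters might be represented, assuming a Python encoding chosen purely for illustration (the field names mirror the notation θ, info_model, info_circuit, pk_a above; nothing here is a normative on-chain format):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CircuitInfo:
    """Mirrors info_circuit: configuration of the zero-knowledge proof circuit."""
    public_inputs: List[str]    # e.g. structural information of the model
    private_inputs: List[str]   # e.g. user data or training data
    proof_logic: str            # description of how correctness is checked

@dataclass
class ModelInfo:
    """Mirrors info_model: model type and I/O formats."""
    model_type: str             # e.g. "regression" or "classification"
    input_format: Dict[str, str]
    output_format: Dict[str, str]

@dataclass
class GlobalParams:
    """Global parameters {θ, info_model, info_circuit, pk_a} published on-chain.
    The signing secret key sk_a stays with the data source and is never published."""
    theta: List[float]          # model weight parameters θ
    info_model: ModelInfo
    info_circuit: CircuitInfo
    pk_a: bytes                 # data source's public signing key
```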
2. Circuit initialization: circuit initialization is a critical step in a zero-knowledge proof system; it ensures that the circuit can correctly verify the validity of the computation and produce proper proving and verification keys. The process sets a group of circuit-related parameters {λ, ξ, pk, vk, C} and proceeds as follows:
Security parameter λ: first, a security parameter λ, typically a positive integer, is set, indicating the security level required of the zero-knowledge proof system. The security parameter determines the complexity of the circuit and the strength of the encryption algorithms used. A larger λ generally provides greater security but also increases the computational cost.
Circuit setup Setup(λ) → ξ: a set of basic common parameters ξ is generated on a bilinear group (or other suitable mathematical structure) from the security parameter λ. These parameters include the keys and algorithms required to build the zero-knowledge proof and provide the basis for subsequent circuit compilation and proof generation. The common parameter ξ is shared by all participants of the circuit.
Circuit compilation Compile(ξ, info_circuit) → C: next, the circuit is compiled using the generated common parameters ξ and the circuit's detailed configuration (i.e., info_circuit, including public inputs, private inputs, and proof logic). This step converts the circuit from an abstract description into executable proof logic, yielding a circuit description C that contains the concrete rules for processing input data and verifying the correctness of the computation.
Key generation GenKey(C) → (pk, vk): after circuit compilation, the keys for proving and verification are generated. The proving key pk and the verification key vk are produced from the compiled circuit C by the key generation algorithm GenKey(C). The proving key pk is used to generate a zero-knowledge proof, enabling a prover to show that a computation is valid without exposing private data. The verification key vk is used by the verifier to check the validity of a zero-knowledge proof and ensure the correctness of the computation. The proving key pk is typically held by the circuit's constructor or the model trainer, who provides the necessary key material when generating the zero-knowledge proof. The verification key vk is typically held by the verifying party (e.g., a verification node in the blockchain network) and can also be made public so that anyone can verify the authenticity of the computation.
Circuit parameters and key storage: once these keys and the circuit description are generated, the relevant information (e.g., ξ, pk, vk, and C) is typically stored on the blockchain or in another decentralized storage system. It is critical that participants (e.g., model trainers and data providers) can access this information. The circuit's common parameters and verification key ensure that all participants can perform correct verification at different stages without exposing sensitive data.
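The circuit initialization pipeline can be exercised end to end with the following sketch. The functions setup, compile_circuit, and gen_key are deterministic stand-ins for a real ZK-SNARK backend (e.g., a Groth16 toolchain); only the data flow Setup(λ) → ξ, Compile(ξ, info_circuit) → C, GenKey(C) → (pk, vk) is taken from the text above.

```python
import hashlib
from typing import NamedTuple, Tuple

class CircuitArtifacts(NamedTuple):
    xi: bytes       # common parameters ξ shared by all participants
    circuit: bytes  # compiled circuit description C
    pk: bytes       # proving key, held by the circuit constructor / model trainer
    vk: bytes       # verification key, held (and publishable) by verifiers

# Stand-ins for a real ZK-SNARK backend: they derive deterministic tags so the
# pipeline's data flow can be run, but perform no actual cryptography.

def setup(lam: int) -> bytes:                                 # Setup(λ) → ξ
    return hashlib.sha256(f"setup:{lam}".encode()).digest()

def compile_circuit(xi: bytes, info_circuit: str) -> bytes:   # Compile(ξ, info_circuit) → C
    return hashlib.sha256(xi + info_circuit.encode()).digest()

def gen_key(circuit: bytes) -> Tuple[bytes, bytes]:           # GenKey(C) → (pk, vk)
    return (hashlib.sha256(b"pk" + circuit).digest(),
            hashlib.sha256(b"vk" + circuit).digest())

def initialize_circuit(lam: int, info_circuit: str) -> CircuitArtifacts:
    xi = setup(lam)
    circuit = compile_circuit(xi, info_circuit)
    pk, vk = gen_key(circuit)
    # ξ, C, pk, vk would then be stored on-chain or in decentralized storage.
    return CircuitArtifacts(xi, circuit, pk, vk)
```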
B. Trusted data feed process for zero-knowledge off-chain machine learning
In this flow, the circuit model has already been initialized and trained and can directly perform the model's inference task. The overall process: a user issues a request; the data feed request triggers an event request from the user contract to the oracle contract, then to the oracle server, which requests data from an external data source; a model proof is generated in the oracle; model inference produces a result; and finally the user receives the returned result.
As shown in FIG. 2, the zero-knowledge off-chain machine learning application flow mainly includes the following steps:
1. User request: when accessing the decentralized application (DApp), the user submits a service request according to the functions the DApp provides. The request must carry identity authentication information, which may include the personal identity IDAuth_i, authentication credentials, and other business-related private data (e.g., health data or flight information). In this step, the user's identity and business data are uploaded to the business contract. Upon receiving the request, the business contract forwards this information to the oracle server. Although the user's private data is transferred here, it is not directly disclosed; it is handled in a decentralized manner, ensuring that privacy is not leaked.
2. Oracle data acquisition: a contract event trigger starts automatically within the time specified by the business contract and requests the oracle server to acquire data. The oracle requests the user's private data from the data source through an HTTP request carrying the user's authentication information IDAuth_i. The data sources may include authorities such as hospitals and airlines, or non-authoritative institutions such as weather forecast companies. The data source verifies the authenticity of the user's identity and signs the user's private data with its private key sk_a. The signature sig = Sig(IDAuth_i, Hash(data), sk_a), together with the data and pk_a, is returned to the oracle server. The oracle server verifies the integrity of the data and desensitizes it according to the specific business contract model information (info_model) and circuit information (info_circuit) to obtain desensitized data m. The oracle then sends info_model, info_circuit, m, and the current server's model parameters θ_1 to the model contract. At this point the data is still in an encrypted or desensitized state, ensuring that user privacy is not exposed.
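As a concrete illustration of the signature step sig = Sig(IDAuth_i, Hash(data), sk_a), here is a minimal sketch using Ed25519 from the `cryptography` package. The choice of Ed25519 and the exact message layout are assumptions made for illustration; the text does not fix a particular signature scheme.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

def sign_user_data(sk_a: Ed25519PrivateKey, id_auth: bytes, data: bytes) -> bytes:
    """Data source side: sign (IDAuth_i, Hash(data)) with the private key sk_a."""
    message = id_auth + hashlib.sha256(data).digest()
    return sk_a.sign(message)

def verify_user_data(pk_a: Ed25519PublicKey, id_auth: bytes, data: bytes,
                     sig: bytes) -> bool:
    """Oracle side: check the integrity of the returned data against pk_a."""
    message = id_auth + hashlib.sha256(data).digest()
    try:
        pk_a.verify(sig, message)
        return True
    except InvalidSignature:
        return False

# Example round trip (keys generated locally, for illustration only):
sk_a = Ed25519PrivateKey.generate()
pk_a = sk_a.public_key()
sig = sign_user_data(sk_a, b"IDAuth_1", b"passenger health record")
assert verify_user_data(pk_a, b"IDAuth_1", b"passenger health record", sig)
```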
3. Model training: the model trainer sends a data request (Drequest) to the model contract to obtain the model information info_model, the circuit information info_circuit, the model parameters θ_1, and the desensitized data m. The public data m is converted into a private witness of the model circuit. With θ_1, m, and related data as input parameters of Algorithm 1, the model trainer trains to obtain new model parameters θ_2, generates a public witness x of the model from the common parameters ξ, θ_1, and θ_2, generates a proof π_2 together with the verification key and proving key of the model circuit, and returns the training result and the generated data to the model contract on the blockchain. The verification node of the blockchain periodically and actively verifies the model proof transactions uploaded to the chain: if Verify(vk, x, π_2) returns true, the model inference is correct and the model parameters θ_2 are valid; the model parameters are then fused to obtain new model parameters θ, the preceding circuit proof generation step is repeated to generate a new set of model proofs π_2, and subsequent verification nodes verify the model parameters. After verification passes, the machine learning model circuit parameters of the service are approved by the blockchain and the oracle server and can be used for service calls by common users.
4. Verification and business process: after the oracle server updates the model parameters, the user's personal data is converted into a private input witness of the model circuit, and a corresponding zero-knowledge proof is generated. The proof is then verified to ensure the credibility and accuracy of the model inference process. After the verification node verifies successfully, the model inference result is deemed correct, and the user's service request can continue to be processed. For example, suppose the service is flight delay insurance prediction: the oracle server feeds the flight's real-time information and the user's health data into the trained model, which returns the probability of flight delay and the corresponding insurance compensation amount. If the model inference succeeds, the contract continues to execute the compensation flow and pays the corresponding amount to the user. Similarly, for agricultural insurance compensation caused by severe weather, the user's data is used for model inference, ultimately producing the compensation amount.
All of these operations are executed automatically by smart contracts, ensuring the decentralization and transparency of the business process. Through blockchain and zero-knowledge proof technology, the business contracts ensure that every step is trustworthy, data privacy is protected, and the model inference process is tamper-proof.
In a specific embodiment, under a centralized oracle architecture, all training data, computing resources, and model updates are managed and executed by a single central node; model parameter update and verification operations during training are concentrated at that node, no inter-node synchronization needs to be considered, and no additional algorithm is needed to coordinate parameter updates across multiple nodes. After model training ends, the central node directly updates the parameters and stores the updated model.
In the distributed oracle architecture, however, different nodes hold independent datasets and each node trains independently on local data, so the final model parameters must be synchronized through an effective parameter update algorithm and merged into a globally consistent model through a parameter aggregation operation. Each node's local training result and updated model parameters must be guaranteed legitimate and consistent through corresponding mechanisms, preventing malicious nodes from tampering with or erroneously updating the global model.
To ensure the security of the model update process and the protection of data privacy, parameter updating and aggregation in a distributed environment require not only an effective algorithm for inter-node parameter synchronization but also privacy protection mechanisms such as zero-knowledge proofs, ensuring that each node's update is legitimate and that sensitive data is not exposed. In addition, because training progress differs across distributed nodes, some nodes may fail to update their model parameters in time due to network delay or failure, so the system as a whole must provide a degree of fault tolerance and consistency guarantees.
Therefore, to achieve the above objectives, a zero-knowledge machine learning model uplink and parameter update algorithm is proposed to ensure the security, efficiency, and consistency of the model training and update process; see FIG. 3. The algorithm has the following core characteristics and innovations:
1. Ensuring correctness and privacy protection of model parameters
By introducing zero-knowledge proof technology, the algorithm ensures that every model parameter update in the multi-node distributed oracle is strictly verified, guaranteeing the accuracy of the training process. Throughout the process, the zero-knowledge proof can verify the validity of the training result without exposing the specific contents of the training data or model parameters. This design not only protects the privacy of participating nodes but also effectively prevents security risks such as poisoning attacks by malicious nodes, ensuring the credibility of the system.
2. Protection against poisoning attacks and malicious updates
By generating and verifying zero-knowledge proofs during parameter updating, the algorithm can identify and reject model parameters that were erroneously updated or deliberately tampered with by malicious nodes, eliminating at the root the threat poisoning attacks pose to the global model. After each node finishes local training, the submitted model parameters automatically carry corresponding zero-knowledge proofs, and other nodes verify these proofs before synchronizing the parameters, ensuring that only legitimate updates can influence the global model.
3. Distributed synchronization and gossip protocol
The algorithm achieves rapid distributed synchronization of model parameters through the gossip protocol. In a multi-node environment, each node progressively synchronizes the model parameters produced by local training and the corresponding zero-knowledge proofs to other nodes peer-to-peer, avoiding the single point of failure of a traditional centralized architecture. In addition, applying differential privacy to the model parameters further strengthens the system's privacy protection during distributed synchronization.
4. Uplink logging and traceability
To further improve security and transparency, the algorithm records the model parameter update process and the corresponding zero-knowledge proofs on the blockchain. Through the blockchain's tamper-resistance and traceability, the update histories of all nodes can be securely stored and verified. This design not only makes the parameter synchronization process in the distributed environment fully transparent but also provides technical support for any subsequent audit or traceability requirements.
5. Consistency and security assurance of global model
After parameter synchronization completes, each node performs a parameter aggregation operation (e.g., weighted averaging or another method) on the collected model parameters and their zero-knowledge proofs to generate globally consistent model parameters. The blockchain-recorded update history, combined with the zero-knowledge proof verification mechanism, guarantees the security, consistency, and legitimacy of the global model parameters under multi-node cooperation.
The algorithm comprises three subprocesses: model initialization; local training and uploading of the model by the model trainer; and synchronization, updating, and aggregation of model parameters.
Algorithm 1, the main ZKML training and synchronization function, describes the whole process (see FIG. 4):
1) Model initialization: for each node i, the InitializeModel() subroutine is called to initialize the model parameters M_i.
2) Distributed training: in each training period t, each node i invokes the TrainModel() subroutine to update its model parameters M_i.
3) Model synchronization and aggregation: each node i invokes the SynchronizeModel() subroutine to synchronize its model parameters and obtain its final model parameters; the final model parameters of all nodes are then aggregated into the global final model parameters M.
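A compact sketch of this control flow, assuming illustrative Python signatures for the three subroutines named above (the node count, period count, and parameter representation are arbitrary):

```python
from typing import Callable, List

Params = List[float]   # illustrative parameter representation

def zkml_train_and_sync(
    num_nodes: int,
    num_periods: int,
    initialize_model: Callable[[int], Params],
    train_model: Callable[[int, Params], Params],
    synchronize_model: Callable[[int, Params], Params],
    aggregate: Callable[[List[Params]], Params],
) -> Params:
    # 1) Model initialization: each node i obtains initial parameters M_i.
    params = [initialize_model(i) for i in range(num_nodes)]
    for t in range(num_periods):
        # 2) Distributed training: each node updates its M_i on local data.
        params = [train_model(i, params[i]) for i in range(num_nodes)]
        # 3) Synchronization: gossip + verification yields each node's final
        #    parameters for this period.
        params = [synchronize_model(i, params[i]) for i in range(num_nodes)]
    # Aggregate all nodes' final parameters into the global model M.
    return aggregate(params)
```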
In one particular embodiment, the distributed ZKML model initialization flow includes:
Model initialization is the first step in training the distributed zero-knowledge machine learning (ZKML) system. Its main goal is to ensure that all participating nodes in the network hold consistent, reliable initial model parameters, laying the foundation for the subsequent distributed training process. The specific flow is given by Algorithm 2 (see FIG. 5) and includes:
Generating initial model parameters: initialization of the model parameters is completed by a central node or a preselected trusted node. The initial parameters M may be generated randomly or set based on prior knowledge (e.g., historical data) or an existing pre-trained model. The generation process must ensure that the initial model meets the basic requirements of subsequent training and is sufficiently robust.
Generating the model proof: to guarantee the correctness and privacy of the model parameters, the node generates a zero-knowledge proof bound to the initial model parameters. Specifically, the node generates a witness w from the private input and the public input (the private data E, the zero-knowledge circuit initialization parameters ξ, and the initial model parameters M), then generates the zero-knowledge proof π from the witness and the proving key pk. The circuit is defined to compute the model's loss on the private data E from the model parameters M, proving that the model is valid when the required condition on the loss holds.
Broadcasting via the Gossip protocol: to ensure that all nodes receive consistent initial model parameters, the trusted node broadcasts the generated initial model parameters M, verification key vk, and zero-knowledge proof π using the Gossip protocol. Through peer-to-peer propagation, the Gossip protocol lets the initial model parameters and their proof rapidly cover the whole distributed network while also strengthening fault tolerance during propagation.
Verifying the model proof: each participating node performs the following operations after receiving the initial model parameters M and the zero-knowledge proof π:
A. Verify the zero-knowledge proof: the node verifies the received proof π using the verification key vk. The verification process relies on the public circuit structure and the model parameters M, without accessing the private data E. In this process, participants know only the structure of the public dataset and circuit and the public witness (the model parameters M); they can determine whether the model parameters are reliable without learning the concrete values of real users' private data.
B. Accept or reject the model parameters: parameters that pass verification are accepted and used to initialize the local model M_i; if verification fails, the node rejects the parameters and records the related anomaly.
After all nodes have verified successfully and accepted the initial model parameters M, the model initialization process is complete. By combining zero-knowledge proofs with the Gossip protocol, the security, consistency, and traceability of the initial model parameters are guaranteed. Each node can be confident of the credibility and validity of the initial model parameters without accessing any other node's private data.
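The accept/reject logic on a receiving node might look like the following sketch; verify_proof is a placeholder standing in for the SNARK verifier Verify(vk, π, M), which the text treats as a black box.

```python
import logging
from typing import List, Optional

def verify_proof(vk: bytes, proof: bytes, public_params: List[float]) -> bool:
    """Stand-in for the SNARK verifier Verify(vk, π, M); a real backend would
    check π against the circuit and the public witness M."""
    return bool(vk) and bool(proof)   # placeholder acceptance rule

def on_initial_params(vk: bytes, proof: bytes,
                      params: List[float]) -> Optional[List[float]]:
    """Handle (M, vk, π) received via gossip during model initialization."""
    if verify_proof(vk, proof, params):
        return params        # accept: use M to initialize the local model M_i
    logging.warning("initial model proof rejected; recording anomaly")
    return None              # reject: do not initialize from these parameters
```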
In a specific embodiment, local training in the distributed zero-knowledge machine learning (ZKML) system is the step in which each participating node independently updates the model parameters based on its own data. The process is designed to ensure the credibility and privacy of training through zero-knowledge proofs, while preventing data leakage through differential privacy. As shown in FIG. 3, during local training the model trainer interacts with the blockchain and the oracle nodes to obtain the necessary information; after training completes, the generated proof and the updated model parameters are uploaded to the blockchain for network-wide verification and synchronization.
The specific details are given by Algorithm 3, with reference to FIG. 6:
1. Acquiring circuit and model information
The model trainer obtains the model circuit structure and initial model parameters from the blockchain or an oracle server node. This information defines the computational flow required during training, including the definition of the loss function, the gradient computation method, and the construction of the zero-knowledge circuit.
2. Local gradient computation and update
Model trainer i trains on the local data E_i and computes the gradient of the current model parameters M_i:

g_i = ∇L(M_i; E_i)

where L denotes the loss function and M_i the current model parameters. The model parameters are then updated according to the computed gradient g_i and the learning rate η:

M_i ← M_i − η · g_i

This step is called a gradient descent step; the learning rate η is a hyperparameter.
3. Differential privacy handling
To protect data privacy, noise N_i must be added to the updated model parameters to ensure differential privacy. The noise N_i is drawn from a Gaussian distribution,

N_i ~ N(0, σ²), with σ proportional to Δ/ε,

where Δ is the sensitivity and ε is the privacy budget. The model parameters after adding noise are expressed as:

M′_i = M_i + N_i
The purpose of adding noise is to make it difficult for an external attacker to infer the specific content of the original data E_i even if the model parameters M′_i are obtained, thereby achieving privacy protection.
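A sketch of the local update with Gaussian differential-privacy noise, using NumPy. The calibration σ = Δ/ε is the simplest reading of the reconstructed formula above (a full Gaussian mechanism would also involve a δ term), so treat it as illustrative rather than a complete privacy analysis.

```python
import numpy as np

def local_dp_update(M_i: np.ndarray, grad: np.ndarray, eta: float,
                    delta_sens: float, epsilon: float,
                    rng: np.random.Generator) -> np.ndarray:
    """One local training step: gradient descent, then Gaussian noise for DP."""
    M_i = M_i - eta * grad                            # M_i ← M_i − η · g_i
    sigma = delta_sens / epsilon                      # σ ∝ Δ/ε (illustrative)
    noise = rng.normal(0.0, sigma, size=M_i.shape)    # N_i ~ N(0, σ²)
    return M_i + noise                                # M′_i = M_i + N_i

# Example with toy values:
rng = np.random.default_rng(0)
M = np.zeros(4)
g = np.array([0.2, -0.1, 0.05, 0.0])
M_prime = local_dp_update(M, g, eta=0.1, delta_sens=1.0, epsilon=0.5, rng=rng)
```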
4. Generating the zero-knowledge proof
The model trainer uses the updated model parameters M′_i and the private input ξ to generate the witness w_i:

w_i = GenWitness(M′_i, ξ)

Based on the witness w_i and the proving key pk, the zero-knowledge proof π_i is generated:

π_i = Prove(M′_i, pk, w_i)

This proof attests to the correctness of the training process and the parameter update while hiding the content of the private data.
5. Model proof upload and verification
After training completes, the node uploads the updated model parameters M′_i (public input), the zero-knowledge proof π_i, and the verification key vk to the blockchain. Upon receiving the data, the blockchain verification node uses the verification key vk to verify the zero-knowledge proof π_i:
A. Verification passes: the node's computation is deemed correct and reliable, and the updated model parameters are recorded and synchronized to the oracle nodes.
B. Verification fails: the node's update is rejected, preventing unreliable data from polluting the global model.
6. Interaction and synchronization
Through the above process, the model parameters updated by all local nodes, together with their proofs, are verified and recorded on the blockchain. The verified parameters are synchronized into the distributed network through the oracle contract, so that nodes across the network can perform further global model aggregation based on trusted local update results.
In a specific embodiment, updating the distributed ZKML model parameters is the key step in achieving global model consistency and optimization. It is completed mainly through repeated iterations of broadcasting, verification, and aggregation, improving computational efficiency and security while preserving the system's data privacy.
As shown in FIG. 3, the model parameters M′_i, proof π_i, verification key vk, and witness w_i are sent to the oracle nodes via an oracle contract and propagated to all nodes via the Gossip protocol. T1, T2, and T3 in the figure are the rounds of Gossip propagation; each propagation carries these parameters. In distributed machine learning, model synchronization is an iterative process that achieves consistency of the global model parameters through multiple rounds of broadcasting, receiving, verifying, and aggregating.
Specifically, per Algorithm 4 and with reference to FIG. 7:
1. Broadcasting of parameters
Each node sends its updated model parameters M′_i, zero-knowledge proof π_i, verification key vk, and witness w_i to the network via the oracle contract, and these propagate among the nodes using the Gossip protocol. Propagation proceeds round by round, following rounds T1, T2, T3 in FIG. 3; in each round the broadcasting node's peers are selected at random by the system, while ensuring that every node in the network has the opportunity to receive the information. This randomization mechanism not only effectively avoids network isolation but also significantly reduces the risk of single-point failure, improving the robustness and reliability of the whole system.
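A sketch of one gossip broadcast round with random peer selection, as described above; the fan-out value and the message shape are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Message = Tuple[bytes, bytes, bytes, bytes]   # (M′_i, π_i, vk, w_i) as opaque blobs

def gossip_round(node_id: int, peers: List[int], msg: Message,
                 inboxes: Dict[int, List[Message]], fanout: int = 3) -> None:
    """Push msg to a random subset of peers (one round, e.g. T1 in FIG. 3)."""
    targets = random.sample(peers, k=min(fanout, len(peers)))
    for peer in targets:
        inboxes[peer].append(msg)   # asynchronous delivery: peer reads later

# Example: node 0 broadcasts to 3 of its 5 peers in round T1.
inboxes = {i: [] for i in range(6)}
gossip_round(0, peers=[1, 2, 3, 4, 5], msg=(b"M'", b"pi", b"vk", b"w"),
             inboxes=inboxes)
```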
2. Model parameter reception and verification
A. Asynchronous reception: a node P asynchronously receives model parameters and zero-knowledge proofs π from other nodes; the number and timing of receptions vary with network conditions. This flexibility improves the fault tolerance of the system, allowing normal operation in a dynamic, uncertain network environment.
B. Zero-knowledge verification: each node strictly verifies the received model parameters and zero-knowledge proofs to ensure the validity and reliability of the computation results. The verification step is as follows:
IsValid_j ← Verify(w_j, vk, π_j)
Using the witness w_j, the verification key vk, and the proof π_j, the node checks whether the model parameters satisfy the circuit conditions. If verification passes, the model parameters M′_j are added to the set R; if it fails, the parameters are discarded, defending against malicious attacks.
3. Aggregation of parameters
After each round of synchronization, the node computes new global model parameters through an aggregation algorithm. Common aggregation methods include:
Simple averaging: the model parameters of all nodes participate in aggregation with equal weight:

M = (1/|R|) · Σ_{j∈R} M′_j

This method is computationally simple and suits scenarios where nodes have similar data volumes and computing capacity.
Weighted averaging: each node j is assigned a weight ω_j according to its characteristics (such as local data volume or computing capacity), and the model parameters are combined by weighted computation:

M = (Σ_{j∈R} ω_j · M′_j) / (Σ_{j∈R} ω_j)

where ω_j > 0 is the weight of the j-th node, typically computed from the node's local data volume or computing power, and the denominator is the weight normalization factor that ensures the correctness of the result.
Weighted averaging reflects differences in data distribution and computing resources more effectively, improving the efficiency and fairness of model training. The aggregated global model parameters M not only integrate the computation results of all nodes more comprehensively but also reduce the influence of abnormal nodes on the overall model, ensuring the accuracy and stability of the global model. Over multiple rounds of iteration, continued optimization of the parameter aggregation gradually converges the global model, ultimately achieving high-quality distributed machine learning.
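The two aggregation rules can be written directly from the formulas above, with the verification filter that builds the set R following the IsValid_j check. NumPy is used, and the verify argument is the same kind of stand-in verifier as in the earlier sketches.

```python
import numpy as np
from typing import Callable, List, Tuple

def aggregate_simple(R: List[np.ndarray]) -> np.ndarray:
    """Simple average: M = (1/|R|) · Σ_{j∈R} M′_j."""
    return np.mean(R, axis=0)

def aggregate_weighted(R: List[np.ndarray], weights: List[float]) -> np.ndarray:
    """Weighted average: M = Σ ω_j·M′_j / Σ ω_j, with every ω_j > 0."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w, np.stack(R), axes=1) / w.sum()

def collect_valid(received: List[Tuple[np.ndarray, bytes, bytes]], vk: bytes,
                  verify: Callable[[bytes, bytes, bytes], bool]) -> List[np.ndarray]:
    """Build R: keep only M′_j whose proof passes IsValid_j ← Verify(w_j, vk, π_j)."""
    return [M for (M, w_j, pi_j) in received if verify(w_j, vk, pi_j)]
```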
4. Multi-round iterative synchronization
Optimization of the global model parameters is completed through N rounds of iterative synchronization. In each iteration, every node repeats the broadcasting, verification, and aggregation steps, gradually improving the model's convergence and accuracy and achieving network-wide consistency. Specifically:
A. Iterative broadcasting: in each iteration, every node again transmits its currently updated model parameters M′_i and zero-knowledge proof π_i to randomly selected neighbor nodes via the Gossip protocol. This ensures, with high probability, that each node receives the model updates of all other nodes, avoiding information islands.
B. Round-by-round verification: in each round, a node independently verifies the received model parameters and zero-knowledge proofs and accepts only those that pass verification. This mechanism effectively resists attacks by malicious nodes and prevents the global model from being polluted by erroneous parameters.
C. Iterative aggregation: after each round of synchronization completes, each node aggregates the model parameters that passed verification in that round to generate new global model parameters. This process progressively integrates the computation results of all nodes, driving the global model toward convergence.
D. Improving convergence: through N rounds of synchronized iteration, the global model parameters in the system steadily approach the optimal solution. In a distributed environment, multiple iterations also effectively mitigate problems caused by network delay or node asynchrony, ensuring the accuracy and consistency of the final result.
After all scheduled iteration rounds are completed, the model parameters across the network reach a converged state and form the final global model, ensuring that every node holds a consistent, optimized model.
5. On-chain recording of the final result
After distributed model training and synchronization finish, the system records the final aggregated model parameters M and the corresponding zero-knowledge proof π on the blockchain to provide an evidentiary record. This not only improves the transparency and trustworthiness of the distributed system but also provides security assurance and traceability for subsequent verification and audit. Specifically:
A. Model parameter recording: the final global model parameters M are uploaded to the blockchain for storage after aggregation and verification. The blockchain's tamper-proof property ensures that the recorded parameters are complete and reliable, preventing parameter pollution from malicious modification or system failure.
B. Zero-knowledge proof recording: the zero-knowledge proof π is uploaded along with the model parameters to attest to the legitimacy of the parameters and the correctness of the computation. A blockchain verification node can verify π with the public verification key vk, confirming that the final model satisfies the circuit conditions without needing access to any node's private data.
C. Security and traceability: the recorded model parameters and proof provide a publicly trusted basis for subsequent use. They supply a reliable data source for auditing the operation of the distributed system or verifying model results, enhancing the security and transparency of the system.
D. Multiparty verification: once the evidentiary information is stored on the blockchain, any authorized party can read the relevant records and verify the validity of the model parameters. This mechanism not only enhances the openness of the system but also further improves the level of trust in cross-organizational collaboration.
Through the blockchain's evidentiary function, every step of distributed model training can be strictly recorded and verified, providing important support for building a secure, transparent, and efficient distributed system.
Finally, it is noted that the above-mentioned preferred embodiments are only intended to illustrate rather than limit the invention, and that, although the invention has been described in detail by means of the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the appended claims.