WO2024239593A1 - Hybrid federated logistic regression method based on homomorphic encryption

Info

Publication number: WO2024239593A1
Application number: PCT/CN2023/136359
Authority: WO
Inventors: 骆双阳; 章庆; 贺伟
Original assignee: China Telecom Bestpay Co Ltd
Current assignee: China Telecom Bestpay Co Ltd
Priority date: 2023-05-25
Filing date: 2023-12-05
Publication date: 2024-11-28
Anticipated expiration: 2025-11-25
Also published as: CN116776293A

Abstract

The present invention relates to the technical field of computers and provides a hybrid federated logistic regression method based on homomorphic encryption. The method comprises: performing module segmentation on data to form data data1 and data data2; performing vertical federated logistic regression training on data1 and data2 respectively; for data1 and data2, using a random number r as a seed to generate a homomorphic public-private key pair; encrypting w1, intercept1, w2, and intercept2, and performing secure aggregation on the encrypted parameters; and sending the aggregated intermediate result parameters to each participant, so that each participant decrypts them and performs parameter iteration until a requirement is met, thereby obtaining a final trained model. The method breaks through the bottleneck of modeling across the internal and external data of a single institution: local data does not leave its database, yet federated learning model training can be performed jointly on multi-party data, thereby improving the accuracy of the federated learning model.

Description

A hybrid federated logistic regression method based on homomorphic encryption

Technical Field

The present invention relates to the field of computer technology, and in particular to a hybrid federated logistic regression method based on homomorphic encryption.

Background Art

Privacy computing (Privacy Compute) refers to a collection of technologies that enable data analysis and computation while protecting the data itself from being leaked. Compared with traditional ways of using data, the encryption mechanisms of privacy computing strengthen data protection and reduce the risk of data leakage, so privacy computing is regarded as one way of implementing "data minimization". Traditional data security measures, such as data desensitization or anonymization, sacrifice some data dimensions, so that the information in the data cannot be used effectively. Privacy computing offers a different approach: it maximizes the value of data as far as possible while keeping the data secure.

Secure Multi-Party Computation, generally abbreviated as MPC, refers to joint computation over the data of multiple parties while protecting data security and privacy. In a network environment, the participants in a task each hold their own data and jointly compute a given function in a distributed manner through a communication protocol. Each participant provides its own input to the function, and the participants obtain the correct output of the function. The process also protects private user data: apart from the output it is entitled to, a participant learns nothing about the inputs of the other participants. Secure multi-party computation therefore enables privacy-preserving sharing of user data, which is of great significance for the effective use of data. In particular, laws on information security impose strict requirements on the protection of user data, and the traditional approach of sharing data directly can no longer meet those requirements.

Traditional secure multi-party computation is implemented through complex interactive cryptographic protocols. Each participant encrypts its input data and passes it to the other participants according to the protocol, and the participants obtain the output of the original computing task through a series of computations and transformations on the ciphertexts. Because the participants cannot compute directly on the original data, both computing efficiency and computing functionality are severely limited. Secure multi-party computation implemented through traditional protocols must therefore trade functionality against efficiency. Protocols of the first type support only specific, relatively simple functions and cannot handle complex or flexible computing tasks. Protocols of the second type support general computing tasks but, because of their low efficiency, can process only small amounts of data.

Federated Learning (FL), also known as federated machine learning, completes joint multi-party machine learning training by exchanging and processing encrypted intermediate data, while the original data never leaves the local database. Depending on how the data involved in the computation is distributed among the data parties, federated learning can be divided into horizontal federated learning, vertical federated learning, and federated transfer learning. In horizontal federated learning, the data of the different participants overlaps heavily in features, but overlaps little in samples, that is, in the records to which those features belong. In vertical federated learning, the data of the different participants overlaps heavily in samples, but overlaps little in features. Horizontal and vertical federated learning have both been deployed and applied in many scenarios. As more application scenarios emerge, however, there are cases in which horizontal and vertical federated learning must be used together. With current single-mode (purely horizontal or purely vertical) federated learning, a mixed data segmentation scenario (the data is split both horizontally and vertically, for example, A and B hold the same samples with different features, while C and D hold the same features with different samples) can only be handled by building several federated learning models in several stages, implementing horizontal and vertical federated learning separately. This approach involves cumbersome operation steps and usually a large volume of data exchange, and it occupies substantial network and computing resources, which costs time and effort and reduces efficiency.

Summary of the Invention

The purpose of the present invention is to provide a hybrid federated logistic regression method based on homomorphic encryption. By means of privacy-preserving distributed hybrid federated machine learning, the method uses the data of multiple institutions, whether in the same industry or in different industries, to break through the bottleneck of modeling across the internal and external data of a single institution. Local data does not leave its database, yet federated learning model training can be performed jointly on multi-party data, so that all participants take part in the joint modeling as equals and benefit from it together, thereby improving the accuracy of the federated learning model.

The embodiments of the present invention are implemented as follows:

In a first aspect, an embodiment of the present application provides a hybrid federated logistic regression method based on homomorphic encryption, which comprises the following steps:

performing module segmentation on the data of a first data party to obtain a first data block and a second data block, where the first data block and a second data party form data data1, and the second data block and a third data party form data data2;

performing vertical federated logistic regression training on data1 and merging the parameters to obtain w1, intercept1, and loss1, and performing vertical federated logistic regression training on data2 and merging the parameters to obtain w2, intercept2, and loss2;

performing horizontal federated logistic regression calculations on data1 and data2 respectively according to the merged parameters;

for data1 or data2, each participant invokes a key exchange protocol to obtain the same random number r, and uses the random number r as a seed to generate a homomorphic public-private key pair;

according to an aggregation interval parameter, encrypting w1 and intercept1 and encrypting w2 and intercept2 to obtain encrypted parameters, and sending the encrypted parameters to a collaborating party for secure aggregation to obtain aggregated intermediate result parameters;

the collaborating party sends the aggregated intermediate result parameters to each participant, and each participant decrypts the aggregated intermediate result parameters and performs the next round of parameter iteration based on the decrypted parameters, until the requirements are met and the final trained model is obtained.

In some embodiments of the present invention, the step of performing vertical federated logistic regression training on data1 includes:

based on a homomorphic encryption algorithm, the first data block generates a homomorphic encryption public key and sends it to the second data party, and the second data party receives the homomorphic encryption public key;

the first data block and the second data party each initialize their fitting parameters, where the first data block initializes the weight wa1 and the bias intercept1, and the second data party initializes the weight wb;

the first data block computes the dot product of its features and parameters to obtain ua1, and the second data party computes the dot product of its features and parameters to obtain ub;

the second data party sends ub to the first data block, and the first data block receives ub, computes the total fitted value ue1 using the formula ue1 = ua1 + ub, and computes the total loss loss1 from the total fitted value ue1 and the true value using a loss function;

the first data block computes the gradient factor gdf from the total fitted value ue1 and the true value, and encrypts the gradient factor gdf to obtain the encrypted gdf;

the first data block sends the encrypted gdf to the second data party, and the second data party computes the corresponding encrypted gradient gdhb from the encrypted gdf;

based on the homomorphic encryption algorithm, the second data party generates a random number Rb and encrypts it to obtain the encrypted Rb, and the second data party sends the sum of the encrypted gradient gdhb and the encrypted Rb to the first data block;

the first data block decrypts the sum to obtain gdhd = gdhb + Rb, and sends gdhd to the second data party;

the second data party receives gdhd, computes the gradient gdhb according to the formula gdhb = gdhd - Rb, and updates the weight wb according to the gradient gdhb;

the first data block obtains the gradient gdha1 by multiplying its feature matrix by gdf, and updates the weight wa1 and the bias intercept1 according to the gradient gdha1.
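The exchange above can be sketched in Python as a single iteration. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name the homomorphic scheme, so a toy Paillier cryptosystem with deliberately tiny, insecure primes stands in for it; the fixed-point scale `SCALE`, the learning rate `lr`, the division of the gradient by the sample count, the inclusion of the intercept in the first data block's share, and all function names are hypothetical. The loss computation is omitted.

```python
import math
import random

SCALE = 10**6  # fixed-point scale for encoding floats as integers (assumption)

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from two small known primes; NOT secure key sizes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))  # public n, private (lambda, mu)

def enc(n, m):
    """Encrypt integer m (may be negative) with generator g = n + 1."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n * n) * pow(r, n, n * n) % (n * n)

def dec(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def dot(row, w):
    return sum(x * v for x, v in zip(row, w))

def vertical_step(Xa, y, wa, intercept, Xb, wb, lr=0.1):
    """One iteration between the first data block (Xa, y) and the second party (Xb)."""
    n, priv = keygen()                      # first data block; only n is shared
    n2, m = n * n, len(y)
    ub = [dot(row, wb) for row in Xb]       # second party -> first data block
    ue = [dot(row, wa) + intercept + u for row, u in zip(Xa, ub)]
    gdf = [u - t for u, t in zip(ue, y)]    # gradient factor, fitted minus true
    enc_gdf = [enc(n, round(g * SCALE)) for g in gdf]   # -> second party

    # Second party: encrypted gradient [gdhb_j] = sum_i Xb[i][j] * [gdf_i]
    enc_gdhb = []
    for j in range(len(wb)):
        acc = enc(n, 0)
        for c, row in zip(enc_gdf, Xb):
            acc = acc * pow(c, round(row[j] * SCALE) % n, n2) % n2
        enc_gdhb.append(acc)
    Rb = [random.randrange(2**64) for _ in wb]           # random masks
    masked = [c * enc(n, r) % n2 for c, r in zip(enc_gdhb, Rb)]  # -> first block

    gdhd = [dec(n, priv, c) for c in masked]   # first block sees only gdhb + Rb
    gdhb = [(d - r) / SCALE**2 for d, r in zip(gdhd, Rb)]  # second party unmasks
    wb = [w - lr * g / m for w, g in zip(wb, gdhb)]

    # First data block: plaintext gradient from its own features and gdf
    gdha = [sum(row[j] * g for row, g in zip(Xa, gdf)) for j in range(len(wa))]
    wa = [w - lr * g / m for w, g in zip(wa, gdha)]
    intercept -= lr * sum(gdf) / m
    return wa, intercept, wb, gdhb

Xa = [[1.0, 2.0], [0.5, -1.0], [1.5, 0.0]]   # features held by the first data block
Xb = [[0.2], [1.0], [-0.7]]                  # features held by the second data party
y = [1.0, 0.0, 1.0]
wa, intercept, wb, gdhb = vertical_step(Xa, y, [0.1, -0.2], 0.0, Xb, [0.3])
print(wa, intercept, wb)
```

In this sketch the mask Rb is what keeps the first data block from learning the second party's gradient when it decrypts, while the second party never holds the private key and so never sees gdf in the clear.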

In some embodiments of the present invention, the step in which the first data block computes the gradient factor gdf from the total fitted value ue1 and the true value includes:

computing the gradient factor gdf using the formula gdf = xw^T - y, where xw^T denotes the fitted value and y denotes the true value.

In some embodiments of the present invention, the step in which the second data party computes the corresponding encrypted gradient gdhb from the encrypted gdf includes:

computing the encrypted gradient gdhb according to the formula encrypted gdhb = encrypted gdf * Xb, where Xb denotes the feature column matrix.
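The two formulas can be written out per sample to show why the second data party can compute its gradient without decrypting anything. The notation here is an assumption introduced for illustration: [·] denotes a ciphertext, ⊕ denotes ciphertext addition, ⊗ denotes multiplication of a ciphertext by a plaintext scalar (both available in additively homomorphic schemes), i indexes the m samples, and j indexes the feature columns of Xb.

```latex
\begin{aligned}
gdf_i &= x_i w^{T} - y_i, \qquad i = 1,\dots,m \\
[\,gdhb_j\,] &= \bigoplus_{i=1}^{m} \bigl( X_b[i,j] \otimes [\,gdf_i\,] \bigr)
\quad\Longrightarrow\quad
gdhb_j = \sum_{i=1}^{m} X_b[i,j]\, gdf_i
\end{aligned}
```

Only plaintext features of the second data party multiply the ciphertexts, so no ciphertext-by-ciphertext product is needed.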

In some embodiments of the present invention, the step of performing vertical federated logistic regression training on data2 includes:

based on a homomorphic encryption algorithm, the second data block sends a homomorphic encryption public key to the third data party, and the third data party receives the homomorphic encryption public key;

the second data block and the third data party each initialize their fitting parameters, where the second data block initializes the weight wa2 and the bias intercept2, and the third data party initializes the weight wc;

the second data block computes the dot product of its features and parameters to obtain ua2, and the third data party computes the dot product of its features and parameters to obtain uc;

the third data party sends uc to the second data block, and the second data block receives uc, computes the total fitted value ue2 using the formula ue2 = ua2 + uc, and computes the total loss loss2 from the total fitted value ue2 and the true value using a loss function;

the second data block computes the gradient factor gdf from the total fitted value ue2 and the true value, and encrypts the gradient factor gdf to obtain the encrypted gdf;

the second data block sends the encrypted gdf to the third data party, and the third data party computes the encrypted gradient gdhc from the encrypted gdf;

based on the homomorphic encryption algorithm, the third data party generates a random number Rc and encrypts it to obtain the encrypted Rc, and the third data party sends the sum of the encrypted gradient gdhc and the encrypted Rc to the second data block;

the second data block decrypts the sum of the encrypted gradient gdhc and the encrypted Rc to obtain gdhd = gdhc + Rc, and sends gdhd to the third data party;

the third data party receives gdhd, and updates the weight wc according to gdhd minus the random number Rc;

the second data block obtains the gradient gdha2 by multiplying its feature matrix by gdf, and updates the weight wa2 and the bias intercept2 according to the gradient gdha2.

In a second aspect, an embodiment of the present application provides an electronic device, comprising a memory for storing one or more programs, and a processor. When the one or more programs are executed by the processor, the method according to any one of the first aspect is implemented.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method according to any one of the first aspect.

Compared with the prior art, the embodiments of the present invention have at least the following advantages or beneficial effects:

The present invention proposes a hybrid federated logistic regression method based on homomorphic encryption, which includes the following steps. Module segmentation is performed on the data of the first data party to obtain a first data block and a second data block; the first data block and the second data party form data data1, and the second data block and the third data party form data data2. Vertical federated logistic regression training is performed on data1 and the parameters are merged to obtain w1, intercept1, and loss1; vertical federated logistic regression training is performed on data2 and the parameters are merged to obtain w2, intercept2, and loss2. Horizontal federated logistic regression calculations are performed on data1 and data2 respectively according to the merged parameters. For data1 or data2, each participant invokes a key exchange protocol to obtain the same random number r, and uses r as a seed to generate a homomorphic public-private key pair. According to the aggregation interval parameter, w1 and intercept1 are encrypted, as are w2 and intercept2, to obtain encrypted parameters, which are sent to the collaborating party for secure aggregation to obtain aggregated intermediate result parameters. The collaborating party sends the aggregated intermediate result parameters to each participant, and each participant decrypts them and performs the next round of parameter iteration based on the decrypted parameters, until the requirements are met and the final trained model is obtained.

This realizes the training of a hybrid federated logistic regression model in a mixed data segmentation scenario. It addresses scenarios in which horizontal and vertical federated learning must be used together, while improving computing efficiency. It simplifies the operation flow, avoids excessive data exchange, and reduces the use of network and computing resources. It also overcomes the computing limits encountered in real scenarios with large amounts of data: each party computes locally without its data leaving its database, yet federated learning model training can still be completed, so that modeling tasks across the internal and external data of a single institution can be carried out. It thereby realizes secure multi-party computation for arbitrary computing functions and moves beyond the centralized, local-only modeling of traditional machine learning to distributed federated machine learning modeling. The distributed hybrid federated logistic regression algorithm is mainly applied to mixed data segmentation scenarios in which horizontal and vertical federated learning must be used together. It solves the problem that, in such scenarios, single-mode federated learning must build several federated learning models in several stages to implement horizontal and vertical federated learning separately. It breaks through the bottleneck of modeling across the internal and external data of a single institution: local data does not leave its database, yet federated learning model training can be performed jointly on multi-party data, so that all participants take part in the joint modeling as equals and benefit from it together, thereby improving the accuracy of the federated learning model.

Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a flowchart of a hybrid federated logistic regression method based on homomorphic encryption provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of data segmentation provided by an embodiment of the present invention;

FIG. 3 is a flowchart of the interaction between a first data block and a second data party provided by an embodiment of the present invention;

FIG. 4 is a flowchart of the interaction between a second data block and a third data party provided by an embodiment of the present invention;

FIG. 5 is a flowchart of horizontal federation provided by an embodiment of the present invention;

FIG. 6 is a schematic structural block diagram of an electronic device provided by an embodiment of the present invention.

Reference numerals: 101 - memory; 102 - processor; 103 - communication interface.

Detailed Description

In order to make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently some, not all, of the embodiments of the present application. The components of the embodiments of the present application that are described and shown in the drawings can generally be arranged and designed in a variety of different configurations.

Embodiment

Referring to FIG. 1, FIG. 1 is a flowchart of a hybrid federated logistic regression method based on homomorphic encryption provided by an embodiment of the present invention. An embodiment of the present application provides a hybrid federated logistic regression method based on homomorphic encryption, which includes the following steps:

S110: performing module segmentation on the data of the first data party to obtain a first data block and a second data block, where the first data block and the second data party form data data1, and the second data block and the third data party form data data2;

For example, referring to FIG. 2, FIG. 2 is a schematic diagram of data segmentation provided by an embodiment of the present invention. Taking three parties A, B, and C as an example, where party A holds the labels, the data is first segmented into modules: data1 consists of A's data block A1 and the data of party B, and data2 consists of A's data block A2 and the data of party C.
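The segmentation in S110 can be illustrated with a short Python sketch. It assumes that party A's data is split by sample ID, so that A1 holds the samples A shares with B and A2 holds the samples A shares with C; the text does not spell out the splitting rule, and the function names and record layout are hypothetical. In the actual protocol each party's features stay with that party; the pairing below only shows how the samples line up.

```python
def split_party_a(a_rows, b_ids, c_ids):
    """Split A's records into block A1 (samples shared with B) and A2 (shared with C)."""
    a1 = {sid: row for sid, row in a_rows.items() if sid in b_ids}
    a2 = {sid: row for sid, row in a_rows.items() if sid in c_ids}
    return a1, a2

def align(a_block, other_rows):
    """List, per shared sample ID, the records held by A's block and by the other party."""
    ids = sorted(a_block.keys() & other_rows.keys())
    return [(sid, a_block[sid], other_rows[sid]) for sid in ids]

# Party A: label plus its own features, for all samples
party_a = {1: {"y": 1, "a_feat": 0.3}, 2: {"y": 0, "a_feat": 1.2},
           3: {"y": 1, "a_feat": -0.5}, 4: {"y": 0, "a_feat": 0.8}}
# Parties B and C: the same feature columns, but different samples
party_b = {1: {"bank_feat": 10.0}, 2: {"bank_feat": 7.5}}
party_c = {3: {"bank_feat": 2.0}, 4: {"bank_feat": 9.1}}

a1, a2 = split_party_a(party_a, party_b.keys(), party_c.keys())
data1 = align(a1, party_b)   # A1 + B: trained with vertical federation
data2 = align(a2, party_c)   # A2 + C: trained with vertical federation
print([sid for sid, _, _ in data1], [sid for sid, _, _ in data2])
```

Because data1 and data2 share feature columns but cover different samples, the two vertical models can later be combined in a horizontal federation step.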

S120: performing vertical federated logistic regression training on data1 and merging the parameters to obtain w1, intercept1, and loss1, and performing vertical federated logistic regression training on data2 and merging the parameters to obtain w2, intercept2, and loss2;

For example, vertical federated logistic regression training is performed on data1 and data2 separately, that is, A1 and B form one vertical federation, and A2 and C form another.

S130: performing horizontal federated logistic regression calculations on data1 and data2 respectively according to the merged parameters;

For example, referring to FIG. 5, FIG. 5 is a flowchart of horizontal federation provided by an embodiment of the present invention. In the parameter merging, party A1 and party B merge their parameters to obtain w1, intercept1, and loss1, and party A2 and party C merge their parameters to obtain w2, intercept2, and loss2. Horizontal federated logistic regression calculations are then performed on data1 and data2 according to the merged parameters.

S140: for data1 or data2, each participant invokes a key exchange protocol to obtain the same random number r, and uses the random number r as a seed to generate a homomorphic public-private key pair;

Specifically, taking the three parties Party1, Party2, and Arbiter as an example of the horizontal federation part, each participant invokes the key exchange protocol to obtain the same random number r, and uses the random number as a seed to generate a homomorphic public-private key pair.
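A minimal Python sketch of S140 follows. The text says only "key exchange protocol", so textbook Diffie-Hellman over a toy-sized prime group is assumed here, with the exchange taking place between Party1 and Party2 only; the key pair is a toy Paillier pair derived from a generator seeded with r. The group parameters `P` and `G`, the 128-bit prime size, and all function names are hypothetical, and `random.Random` is not a cryptographically secure generator, so this shows the idea rather than a usable construction.

```python
import math
import random
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized group for illustration only
G = 3

def dh_public(secret):
    return pow(G, secret, P)

def dh_shared(secret, other_public):
    return pow(other_public, secret, P)

def is_probable_prime(n, rng, rounds=20):
    """Miller-Rabin primality test driven by the supplied generator."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def seeded_prime(rng, bits):
    while True:
        cand = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # odd, full length
        if is_probable_prime(cand, rng):
            return cand

def keypair_from_seed(r, bits=128):
    """Derive a toy Paillier key pair deterministically: same r, same keys."""
    rng = random.Random(r)
    p = seeded_prime(rng, bits)
    q = seeded_prime(rng, bits)
    while q == p:
        q = seeded_prime(rng, bits)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))   # public n, private (lambda, mu)

# Party1 and Party2 pick private exponents and exchange only public values
s1 = secrets.randbelow(P - 2) + 1
s2 = secrets.randbelow(P - 2) + 1
r1 = dh_shared(s1, dh_public(s2))   # computed by Party1
r2 = dh_shared(s2, dh_public(s1))   # computed by Party2
keys1, keys2 = keypair_from_seed(r1), keypair_from_seed(r2)
print(r1 == r2, keys1 == keys2)
```

Deriving the keys from a shared seed means both participants can decrypt the aggregate later, while a party that never learns r holds no private key.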

S150: according to the aggregation interval parameter, encrypting w1 and intercept1 and encrypting w2 and intercept2 to obtain encrypted parameters, and sending the encrypted parameters to the collaborating party for secure aggregation to obtain aggregated intermediate result parameters;

Specifically, according to the aggregation interval parameter, Party1 encrypts its parameters to obtain [w1] and [intercept1], Party2 encrypts its parameters to obtain [w2] and [intercept2], and each participant sends its encrypted parameters to the collaborating party for secure aggregation.

S160: the collaborating party sends the aggregated intermediate result parameters to each participant, and each participant decrypts the aggregated intermediate result parameters and performs the next round of parameter iteration based on the decrypted parameters, until the requirements are met and the final trained model is obtained.

Specifically, the collaborating party sends the aggregated intermediate results [w] and [intercept] to each participant, and each participant decrypts them to obtain w and intercept and performs the next round of parameter iteration based on the aggregated, decrypted parameters w and intercept. The final trained model is obtained once the requirements are met.
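Steps S150 and S160 can be sketched in Python as one aggregation round. The assumptions are stated plainly: a toy Paillier scheme with tiny, insecure primes stands in for the unnamed homomorphic scheme; "secure aggregation" is taken to mean that the collaborating party adds ciphertexts and the participants average after decryption, which the text does not specify; and `SCALE` and all function names are hypothetical. The key pair is the one shared by the participants, so the collaborating party only ever sees the public modulus and ciphertexts.

```python
import math
import random

SCALE = 10**6  # fixed-point scale for encoding floats as integers (assumption)

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from two small known primes; NOT secure key sizes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def enc(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n * n) * pow(r, n, n * n) % (n * n)

def dec(n, priv, c):
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m      # back to a signed integer

def encrypt_params(n, w, intercept):
    """Participant side: encrypt the weights followed by the intercept."""
    return [enc(n, round(v * SCALE)) for v in list(w) + [intercept]]

def arbiter_aggregate(n, uploads):
    """Collaborating party: add ciphertexts element-wise; it holds no private key."""
    n2 = n * n
    agg = uploads[0]
    for up in uploads[1:]:
        agg = [c1 * c2 % n2 for c1, c2 in zip(agg, up)]
    return agg

def decrypt_average(n, priv, agg, num_parties):
    """Participant side: decrypt the aggregate and average it."""
    vals = [dec(n, priv, c) / SCALE / num_parties for c in agg]
    return vals[:-1], vals[-1]

n, priv = keygen()                                    # shared by Party1 and Party2
up1 = encrypt_params(n, [0.2, -0.4], 0.1)             # Party1: [w1], [intercept1]
up2 = encrypt_params(n, [0.6, 0.0], -0.3)             # Party2: [w2], [intercept2]
agg = arbiter_aggregate(n, [up1, up2])                # [w], [intercept]
w, intercept = decrypt_average(n, priv, agg, num_parties=2)
print(w, intercept)
```

Each participant would run this round once per aggregation interval and continue local iteration from the returned w and intercept.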

In the above implementation, the method uses homomorphic encryption (Homomorphic Encryption) to further increase data security. The data is first divided into multiple vertical blocks; vertical logistic regression model training is then performed on the blocks of vertical data, and horizontal federated logistic regression model training is performed across the multiple vertical datasets. This realizes the training of a hybrid federated logistic regression model in a mixed data segmentation scenario. It addresses scenarios in which horizontal and vertical federated learning must be used together, while improving computing efficiency. It simplifies the operation flow, avoids excessive data exchange, and reduces the use of network and computing resources. It also overcomes the computing limits encountered in real scenarios with large amounts of data: each party computes locally without its data leaving its database, yet federated learning model training can still be completed, so that modeling tasks across the internal and external data of a single institution can be carried out. It thereby realizes secure multi-party computation for arbitrary computing functions and moves beyond the centralized, local-only modeling of traditional machine learning to distributed federated machine learning modeling. The distributed hybrid federated logistic regression algorithm is mainly applied to mixed data segmentation scenarios in which horizontal and vertical federated learning must be used together. It solves the problem that, in such scenarios, single-mode federated learning must build several federated learning models in several stages to implement horizontal and vertical federated learning separately. It breaks through the bottleneck of modeling across the internal and external data of a single institution: local data does not leave its database, yet federated learning model training can be performed jointly on multi-party data, so that all participants take part in the joint modeling as equals and benefit from it together, thereby improving the accuracy of the federated learning model.

For example, the method can be applied when a telecom operator works with several commercial banks on a communications-plus-finance anti-fraud scenario, making federated model training feasible under mixed data partitioning. The banks' business types almost completely overlap and their data dimensions are essentially the same. Using only one bank's data would inevitably give a narrow sample distribution and make it hard to identify fraud that involves other banks' products. Combining user behavior data from multiple commercial banks and the telecom operator would greatly enrich the user population, but the banks' user data and the operator's communication data are all private user data and cannot flow directly to a third-party institution for sharing without the permission of users and regulators. In this setting the telecom operator holds the full set of samples but only some of the features, while the commercial banks and other participants hold the same features as one another (different from the operator's features) but each holds only part of the samples. For this mixed data partition scenario, the hybrid federated approach based on homomorphic encryption proposed by the present invention makes it straightforward to use all parties' data for federated model training.

The platform nodes authorize one another and upload their local resources. The task creator obtains the data set resources of the participating nodes, locks the selected resources, and launches the hybrid federated logistic regression computation task. The nodes then compute collaboratively to carry out the algorithm, thereby completing joint modeling and training on multi-party data.
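The mixed partition described in this example can be pictured with a small Python sketch. It assumes that samples are aligned by a shared user ID and that each party's data is a dictionary from ID to feature list; the text does not state either representation, and the function names `segment` and `pair_up` are hypothetical.

```python
def segment(operator_rows, bank_b_rows, bank_c_rows):
    """Split the operator's rows into blocks A1 and A2 that match each bank's samples."""
    a1 = {uid: feats for uid, feats in operator_rows.items() if uid in bank_b_rows}
    a2 = {uid: feats for uid, feats in operator_rows.items() if uid in bank_c_rows}
    return a1, a2


def pair_up(block, bank_rows):
    """Form one vertically partitioned data set: same samples, disjoint features."""
    return {uid: (block[uid], bank_rows[uid]) for uid in block}


if __name__ == "__main__":
    operator = {"u1": [1.0], "u2": [2.0], "u3": [3.0], "u4": [4.0]}  # all samples
    bank_b = {"u1": [10.0], "u2": [20.0]}                            # part of the samples
    bank_c = {"u3": [30.0], "u4": [40.0]}                            # the other part
    a1, a2 = segment(operator, bank_b, bank_c)
    data1 = pair_up(a1, bank_b)   # trained with longitudinal federation (A1 with B)
    data2 = pair_up(a2, bank_c)   # trained with longitudinal federation (A2 with C)
    print(data1)
    print(data2)
```

In a real deployment the paired rows would never be materialized in one place; the pairing here only shows which party contributes which features for which samples.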

In some implementations of this embodiment, the step of performing longitudinal federated logistic regression training on the data data1 includes:

The first data block decrypts the sum as gdhd = gdhb + Rb and sends gdhd to the second data party;

The second data party receives gdhd, calculates the gradient gdhb according to the formula gdhb = gdhd - Rb, and updates the weight wb according to the gradient gdhb;
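The masked exchange in these steps can be sketched in Python as follows. The sketch assumes a Paillier-style additively homomorphic scheme, integer (already fixed-point scaled) values, and a 40-bit mask range; none of these is specified in the text. All names, including `encrypted_gradient` and `masked_exchange`, are hypothetical, and the toy primes are not secure.

```python
import math
import secrets

# Toy Paillier key (demo-sized primes, NOT secure). The first data block
# holds the private values LAM and MU; the second data party only uses N.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def _rand_unit() -> int:
    """Random r in [1, N) that is coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def encrypt(m: int) -> int:
    return (1 + (m % N) * N) * pow(_rand_unit(), N, N2) % N2


def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m     # signed decoding


def encrypted_gradient(enc_gdf, xb):
    """Second data party: [gdhb_j] = sum over samples i of xb[i][j] * [gdf_i]."""
    grads = []
    for j in range(len(xb[0])):
        acc = encrypt(0)
        for c, row in zip(enc_gdf, xb):
            # Raising a ciphertext to k multiplies its plaintext by k.
            acc = acc * pow(c, row[j] % N, N2) % N2
        grads.append(acc)
    return grads


def masked_exchange(gdf, xb):
    enc_gdf = [encrypt(g) for g in gdf]                      # first block -> second party
    enc_gdhb = encrypted_gradient(enc_gdf, xb)               # second party
    rb = [secrets.randbelow(2**40) for _ in enc_gdhb]        # second party: mask Rb
    enc_sum = [c * encrypt(r) % N2 for c, r in zip(enc_gdhb, rb)]  # [gdhb] + [Rb]
    gdhd = [decrypt(c) for c in enc_sum]                     # first block: gdhd = gdhb + Rb
    gdhb = [d - r for d, r in zip(gdhd, rb)]                 # second party: gdhb = gdhd - Rb
    return gdhb, gdhd, rb


if __name__ == "__main__":
    gdhb, gdhd, rb = masked_exchange([3, -2, 5], [[1, 0], [2, -1], [0, 4]])
    print(gdhb)
```

The first data block sees only gdhd, which is offset by a mask it does not know, while the second data party never sees gdf in the clear.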

For example, refer to Figure 3, which shows the interaction between the first data block and the second data party according to an embodiment of the present invention. A1 and B carry out longitudinal (vertical) federated training as follows. A1 sends the homomorphic encryption public key to B. A1 and B each initialize their parameters: A1 initializes the weight wa1 and the bias intercept1, and B initializes the weight wb. A1 and B each compute the dot product of their own features and parameters. B sends its dot product ub to A1, and A1 computes the total fitting value ue1 of A1 and B. A1 computes the total loss from the total fitting value and the true value using the loss function, and at the same time computes the gradient factor gdf = xwT - y = y' - y. A1 sends the encrypted gradient factor [gdf] to B, and B computes its encrypted gradient [gdhb] from [gdf], where gradient = (feature columns) matrix-multiplied by (gradient factor). B generates a random number Rb and encrypts it as [Rb], then sends the sum [gdhb] + [Rb] to A1. A1 decrypts the sum to obtain gdhd = gdhb + Rb and sends gdhd to B. A1 multiplies its own features by gdf to obtain its gradient gdha1 and updates wa1 and intercept1 according to gdha1. B subtracts the random number Rb from gdhd to recover gdhb and updates its weight wb, which completes one round of longitudinal federated parameter updates.
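To make the data flow of one round easier to follow, here is a plaintext Python simulation with encryption and masking left out. It follows the gradient factor exactly as written in the text (total fitting value minus true value). The squared-error loss, the averaging over samples, the placement of the intercept in A1's score, and the learning rate `lr` are assumptions, since the text does not specify them; the function names are hypothetical.

```python
def dot(row, weights):
    return sum(x * w for x, w in zip(row, weights))


def vertical_round(xa, xb, y, wa, intercept, wb, lr=0.1):
    """One round between A1 (features xa, labels y) and B (features xb)."""
    n = len(y)
    ua = [dot(row, wa) + intercept for row in xa]   # A1: local score ua1
    ub = [dot(row, wb) for row in xb]               # B: local score ub, sent to A1
    ue = [a + b for a, b in zip(ua, ub)]            # A1: total fitting value ue1
    gdf = [u - t for u, t in zip(ue, y)]            # A1: gradient factor
    loss = sum(g * g for g in gdf) / (2 * n)        # assumed squared-error loss

    # Gradient = feature columns times gradient factor (averaged, by assumption).
    gdha = [sum(row[j] * g for row, g in zip(xa, gdf)) / n for j in range(len(wa))]
    gdhb = [sum(row[j] * g for row, g in zip(xb, gdf)) / n for j in range(len(wb))]
    g_intercept = sum(gdf) / n

    wa = [w - lr * g for w, g in zip(wa, gdha)]     # A1 updates wa1
    wb = [w - lr * g for w, g in zip(wb, gdhb)]     # B updates wb after unmasking
    intercept -= lr * g_intercept                   # A1 updates intercept1
    return wa, intercept, wb, loss


if __name__ == "__main__":
    xa = [[0.0], [1.0], [0.0], [1.0]]
    xb = [[1.0], [1.0], [0.0], [0.0]]
    y = [0.0, 1.0, 0.0, 1.0]
    wa, b, wb = [0.0], 0.0, [0.0]
    for _ in range(200):
        wa, b, wb, loss = vertical_round(xa, xb, y, wa, b, wb)
    print(wa, b, wb, loss)
```

In the protocol itself, B never receives gdf in the clear: it works on the encrypted [gdf] and learns its gradient only after the masked round trip through A1.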

In some implementations of this embodiment, the step in which the first data block calculates the gradient factor gdf based on the total fitting value ue1 and the true value includes:

The gradient factor gdf is calculated using the formula gdf = xwT - y, where xwT represents the fitting value and y represents the true value.

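In some implementations of this embodiment, the step in which the second data party calculates the corresponding encrypted gradient gdhb according to the encrypted gdf includes:

The encrypted gradient gdhb is calculated according to the formula encrypted gradient gdhb = gradient factor gdf * Xb, where Xb represents the feature column matrix.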

In some implementations of this embodiment, the step of performing longitudinal federated logistic regression training on the data data2 includes:

The third data party sends uc to the second data block; the second data block receives uc, calculates the total fitting value ue2 using the formula ue2 = ua2 + uc, and calculates the total loss loss2 from the total fitting value ue2 and the true value using the loss function;

For example, refer to Figure 4, which shows the interaction between the second data block and the third data party according to an embodiment of the present invention. The data exchange between A2 and C is the same as that between A1 and B. A2 and C carry out longitudinal (vertical) federated training as follows. A2 sends the homomorphic encryption public key to C. A2 and C each initialize their parameters: A2 initializes the weight wa2 and the bias intercept2, and C initializes the weight wc. A2 and C each compute the dot product of their own features and parameters. C sends its dot product uc to A2, and A2 computes the total fitting value ue2 of A2 and C. A2 computes the total loss from the total fitting value and the true value using the loss function, and at the same time computes the gradient factor gdf = xwT - y = y' - y. A2 sends the encrypted gradient factor [gdf] to C, and C computes its encrypted gradient [gdhc] from [gdf], where gradient = (feature columns) matrix-multiplied by (gradient factor). C generates a random number Rc and encrypts it as [Rc], then sends the sum [gdhc] + [Rc] to A2. A2 decrypts the sum to obtain gdhd = gdhc + Rc and sends gdhd to C. A2 multiplies its own features by gdf to obtain its gradient gdha2 and updates wa2 and intercept2 according to gdha2. C subtracts the random number Rc from gdhd to recover gdhc and updates its weight wc, which completes one round of longitudinal federated parameter updates.

Refer to Figure 6, which is a schematic structural block diagram of an electronic device provided in an embodiment of the present application. The electronic device includes a memory 101, a processor 102 and a communication interface 103, which are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these components may be electrically connected through one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules, and the processor 102 performs various functional applications and data processing by executing the software programs and modules stored in the memory 101. The communication interface 103 may be used for signaling or data communication with other node devices.

The memory 101 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.

The processor 102 may be an integrated circuit chip with signal processing capability. The processor 102 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It can be understood that the structure shown in Figure 6 is only illustrative. The electronic device may include more or fewer components than shown in Figure 6, or have a configuration different from that shown in Figure 6. Each component shown in Figure 6 may be implemented in hardware, software, or a combination thereof.

In the embodiments provided in the present application, it should be understood that the disclosed devices and methods may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

In addition, the functional modules in the embodiments of the present application may be integrated together to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.

If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above and can be implemented in other specific forms without departing from its spirit or essential features. The embodiments should therefore be regarded in all respects as exemplary and non-limiting. The scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present application. No reference numeral in a claim should be construed as limiting the claim concerned.

Claims

A hybrid federated logistic regression method based on homomorphic encryption, characterized by comprising the following steps:

Perform module segmentation on the data of the first data party to obtain a first data block and a second data block, form data data1 from the first data block and the second data party, and form data data2 from the second data block and the third data party;

Perform longitudinal federated logistic regression training on the data data1, merge parameters to obtain w1, intercept1 and loss1, perform longitudinal federated logistic regression training on the data data2, merge parameters to obtain w2, intercept2 and loss2;

According to the merge parameters, perform horizontal federated logistic regression calculations on data1 and data2 respectively;

For data data1 or data data2, each participant calls the key exchange protocol to obtain the same random number r, and uses the random number r as a seed to generate a homomorphic public and private key pair;

According to the aggregation interval parameter, encrypt w1 and intercept1, and encrypt w2 and intercept2 at the same time to obtain encryption parameters, and send the encryption parameters to the collaborating party for secure aggregation to obtain aggregation intermediate result parameters;

The collaborating party sends the aggregated intermediate result parameters to each participating party, and each participating party decrypts the aggregated intermediate result parameters and performs the next round of parameter iteration according to the decrypted aggregated intermediate result parameters until the requirements are met to obtain the final training model.

The hybrid federated logistic regression method based on homomorphic encryption according to claim 1 is characterized in that the step of performing longitudinal federated logistic regression training on the data data1 comprises:

Based on the homomorphic encryption algorithm, the first data block generates a homomorphic encryption public key, and sends the homomorphic encryption public key to the second data party, and the second data party receives the homomorphic encryption public key;

The first data block and the second data party each fit the initial values of the parameters, wherein the first data block initializes the weight wa1 and the bias intercept1, and the second data party initializes the weight wb;

The first data block calculates the dot product of the feature and the parameter to obtain ua1, and the second data party calculates the dot product of the feature and the parameter to obtain ub;

The second data party transmits ub to the first data block, and the first data block receives ub, calculates the total fitting value ue1 using the formula ue1=ua1+ub, and calculates the total loss loss1 based on the total fitting value ue1 and the true value using the loss function;

The first data block calculates a gradient factor gdf based on the total fitting value ue1 and the true value, and encrypts the gradient factor gdf to obtain an encrypted gdf;

The first data block sends the encrypted gdf to the second data party, and the second data party calculates the corresponding encryption gradient gdhb according to the encrypted gdf;

Based on the homomorphic encryption algorithm, the second data party generates a random number Rb and encrypts Rb to obtain encrypted Rb, and the second data party sends the sum of the encrypted gradient gdhb and the encrypted Rb to the first data block;

The first data block decrypts the sum into gdhd=gdhb+Rb, and sends gdhd to the second data party;

The second data party receives gdhd, calculates a gradient gdhb according to a formula gdhb=gdhd-Rb, and updates a weight wb according to the gradient gdhb;

The first data block obtains the gradient gdha1 by multiplying the feature and the gdf matrix, and updates the weight wa1 and the bias intercept1 according to the gradient gdha1.

The hybrid federated logistic regression method based on homomorphic encryption according to claim 2 is characterized in that the step in which the first data block calculates the gradient factor gdf based on the total fitting value ue1 and the true value comprises:

The gradient factor gdf is calculated using the formula gdf=xwT-y, where xwT represents the fitting value and y represents the true value.

The hybrid federated logistic regression method based on homomorphic encryption according to claim 2 is characterized in that the step in which the second data party calculates the corresponding encrypted gradient gdhb according to the encrypted gdf comprises:

The encrypted gradient gdhb is calculated according to the formula encrypted gradient gdhb=gradient factor gdf*Xb, where Xb represents the feature column matrix.

The hybrid federated logistic regression method based on homomorphic encryption according to claim 1 is characterized in that the step of performing longitudinal federated logistic regression training on the data data2 comprises:

Based on the homomorphic encryption algorithm, the second data block sends the homomorphic encryption public key to the third data party, and the third data party receives the homomorphic encryption public key;

The second data block and the third data party each fit the initial values of the parameters, wherein the second data block initializes the weight wa2 and the bias intercept2, and the third data party initializes the weight wc;

The second data block calculates the dot product of the feature and the parameter to obtain ua2, and the third data party calculates the dot product of the feature and the parameter to obtain uc;

The third data party transmits uc to the second data block, the second data block receives uc, calculates the total fitting value ue2 using the formula ue2=ua2+uc, and calculates the total loss loss2 based on the total fitting value ue2 and the true value using the loss function;

The second data block calculates the gradient factor gdf based on the total fitting value ue2 and the true value, and encrypts the gradient factor gdf to obtain the encrypted gdf;

The second data block sends the encrypted gdf to the third data party, and the third data party calculates the encrypted gradient gdhc according to the encrypted gdf;

Based on the homomorphic encryption algorithm, the third data party generates a random number Rc and encrypts Rc to obtain an encrypted Rc, and the third data party sends the sum of the encrypted gradient gdhc and the encrypted Rc to the second data block;

The second data block decrypts the sum of the encrypted gradient gdhc and the encrypted Rc into gdhd=gdhc+Rc, and sends gdhd to the third data party;

The third data party receives gdhd and updates the weight wc according to gdhd minus the random number Rc;

The second data block obtains the gradient gdha2 by multiplying the feature and the gdf matrix, and updates the weight wa2 and the bias intercept2 according to the gradient gdha2.

An electronic device, comprising:

A memory for storing one or more programs;

A processor;

When the one or more programs are executed by the processor, the method according to any one of claims 1 to 5 is implemented.

A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, the method according to any one of claims 1 to 5 is implemented.