
CN117521778B - A cost-optimization approach for split federated learning - Google Patents


Info

Publication number
CN117521778B
Authority
CN
China
Prior art keywords
client
model
server
energy consumption
data
Prior art date
Legal status
Active
Application number
CN202311398490.0A
Other languages
Chinese (zh)
Other versions
CN117521778A (en)
Inventor
黄旭民
杨锐彬
吴茂强
钟伟锋
谢胜利
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202311398490.0A
Publication of CN117521778A
Application granted
Publication of CN117521778B


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols


Abstract

The invention discloses a cost optimization method for split federated learning, comprising the following steps. S1: construct a split federated learning system model comprising a server and a plurality of clients, each client having a local dataset. S2: calculate the computation and communication latency and energy consumption of all clients. S3: using the clients' computation and communication latency and energy consumption, with the model split layer and the bandwidth as decision variables, analyze the objective function and related constraints and establish the cost optimization problem of the split federated learning system. S4: solve the optimization problem to obtain the model splitting and bandwidth allocation strategy. S5: based on the obtained model splitting and bandwidth allocation strategy, organize all clients to train the given complete model in cooperation with the server and obtain the trained complete model. The invention computes the optimal model splitting and bandwidth allocation strategy so that the weighted sum of the total time and total energy consumption of all clients in the split federated learning process is minimized, reducing the time and energy cost.

Description

A cost-optimization approach for split federated learning

Technical Field

The present invention relates to the field of artificial intelligence, and more specifically, to a cost optimization method and system for split federated learning.

Background Art

Federated Learning (FL) and Split Learning (SL) are distributed model training techniques in deep learning that are premised on not sharing user data. In federated learning, the server first initializes a global model. In each round of global training, the server sends the global model to all clients; each client trains the model locally on its own data and then uploads the updated model parameters to the server, which aggregates them to update the global model. This process iterates until the global model accuracy converges or the number of global training rounds reaches a preset value. In split learning between the server and a client, the complete model is first split at a specified position (the split layer) into a front sub-model and a rear sub-model; the front sub-model is assigned to each client and the rear sub-model is deployed on the server. In each round of training, the client performs forward propagation up to the split layer on its local data, obtains the smashed data and uploads it to the server, which completes the forward propagation of the remaining layers; the server then performs back propagation down to the split layer, obtains the gradient of the smashed data and returns it to the client, which completes the back propagation of the remaining layers. These forward and backward passes iterate until the client has processed all of its local data, after which the updated client sub-model parameters are sent to the next client participating in training, and the procedure repeats until the model converges. The advantage of federated learning is that it allows multiple clients to train the model in parallel while protecting data privacy, whereas split learning allows each client to train only a sub-model. To combine the advantages of both, the existing literature [Thapa, Chandra, et al. "Splitfed: When federated learning meets split learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 8. 2022.] proposed Split Federated Learning (SFL), in which all clients complete model training with the server via split learning, and the server uses federated learning to update a unified client sub-model.

However, when energy-constrained IoT devices participate in split federated learning, each client still has to continuously compute, send and receive data during training, even though most of the computational load of model training is offloaded to the server. This incurs substantial time and communication overhead and places considerable pressure on IoT devices. Therefore, to ensure the practicality of split federated learning, cost optimization covering both time and energy consumption for all participating clients is essential.

Summary of the Invention

To overcome the above defect of the prior art, namely the high cost of model training in terms of time and energy consumption, the present invention provides a cost optimization method for split federated learning.

The primary purpose of the present invention is to solve the above technical problem. The technical solution of the present invention is as follows:

A first aspect of the present invention provides a cost optimization method for split federated learning, comprising the following steps:

S1: Construct a split federated learning system model comprising one server and several clients, each client owning its own local dataset.

S2: Calculate the computation and communication latency and energy consumption of all clients.

S3: Using the clients' computation and communication latency and energy consumption, take the model split layer and the bandwidth as decision variables, analyze the objective function and related constraints, and formulate the cost optimization problem of the split federated learning system.

S4: Solve the optimization problem to obtain the model splitting and bandwidth allocation strategy.

S5: In the split federated learning system, organize all clients to train the given complete model in cooperation with the server, based on the obtained model splitting and bandwidth allocation strategy, and obtain the trained complete model.

Further, the calculation in step S2 of the computation and communication latency and energy consumption of all clients first computes each client's local computation energy consumption and computation latency, and then computes the clients' energy cost and communication latency in the split federated learning system.

Further, the specific process of calculating a client's local computation energy consumption and computation latency is as follows:

The local computation load of client i for training the client sub-model on a unit of data is:

where γ(·) and ξ(·) denote the numbers of operations, in FLOPs, required for the forward and backward computation of a single sample through the client sub-model and the server sub-model, respectively, and C denotes the number of layers contained in the client sub-model; the computation load that requires server cooperation is defined analogously.

The local computation energy consumption and computation latency of client i are, respectively:

where f_i is the computation frequency of client i in cycle/s and k_i is the computation intensity of client i in FLOPs/cycle, so the computation speed of the client is defined as φ_i = f_i × k_i in FLOPS; ε_i is the client's computation energy coefficient in J/FLOPs; D_i is the local dataset owned by client i; and ψ is the computation speed at which the server cooperates with client i.
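
For illustration, the computation cost terms just defined can be expressed as a short Python sketch. It assumes the straightforward composition suggested by the definitions above: the per-sample client load is the sum of forward and backward FLOPs of the first C layers, local energy is the client-side FLOPs scaled by ε_i, and latency is FLOPs divided by the respective computation speeds φ_i and ψ. The function and variable names, and the exact composition of the formulas (whose images are not reproduced here), are assumptions made for this example rather than the patent's literal equations.

```python
def client_compute_cost(gamma, xi, C, num_samples, phi_i, eps_i, psi):
    """Per-round local computation cost of client i (illustrative sketch).

    gamma[l], xi[l]: forward / backward FLOPs of layer l for one sample
    C:               number of layers kept on the client (split layer)
    num_samples:     |D_i|, the size of the client's local dataset
    phi_i:           client computation speed in FLOPS (phi_i = f_i * k_i)
    eps_i:           client computation energy coefficient in J/FLOPs
    psi:             server computation speed (FLOPS) serving this client
    """
    # Assumed split of the per-sample load: layers 1..C on the client,
    # the remaining layers on the server.
    load_client = sum(gamma[:C]) + sum(xi[:C])    # FLOPs computed on the client
    load_server = sum(gamma[C:]) + sum(xi[C:])    # FLOPs computed on the server

    e_comp = eps_i * num_samples * load_client                        # local energy (J)
    t_comp = num_samples * (load_client / phi_i + load_server / psi)  # latency (s)
    return e_comp, t_comp
```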

Further, the client energy cost and communication latency of the split federated learning system are calculated as follows:

When transmitting the smashed data and the updated client sub-model parameters, the uplink data transmission rate r_i^up of client i is:

where P_i^up is the transmission power of client i when uploading data, b_i is the bandwidth allocated by the server to client i, σ² is the Gaussian channel noise, and the channel gain between the server and client i is determined by α_0, the channel gain at distance d = 1 m, and d_i, the Euclidean distance between the server and the client.

The downlink data transmission rate r_i^down at which the server transmits the smashed-data gradient information to client i is:

where B_D is the fixed bandwidth the server uses to broadcast to each user, and P_B is the average transmission power. The upload latency and download latency of client i are, respectively:

where the smashed data of a single sample has size D(C), the corresponding gradient has size G(C), β is the size of a data label, and the remaining term is the size of the client sub-model parameters.

The communication latency T_i^comm is:

The communication energy consumption of client i is:

where P_i^down is the receiving power of the client and P_i^up is the average transmission power of client i.

The total time overhead T_i^total and energy overhead of client i are, respectively:

T_i^total = T_i^comm + T_i^comp

Over the specified M global model training rounds, the client energy cost E and the computation-and-communication delay T of the split federated learning system are, respectively:
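
The communication terms can be sketched in the same style. The sketch assumes Shannon-type rates r = b · log2(1 + P·g/σ²) with a channel gain g = α_0 · d_i^(-2) (the path-loss exponent of 2 is an assumption), that the uplink of one round carries the smashed data and labels of all samples plus one client sub-model upload, and that the downlink carries the smashed-data gradients; these choices follow the quantities defined above but are not the patent's literal formulas.

```python
import math

def client_comm_cost(b_i, p_up, p_down, alpha0, d_i, sigma2,
                     B_D, P_B, num_samples, D_C, G_C, beta, S_client):
    """Per-round communication cost of client i (illustrative sketch).

    b_i, B_D:  uplink bandwidth of client i and server broadcast bandwidth (Hz)
    D_C, G_C:  smashed-data / gradient size per sample at split layer C (bit)
    beta:      size of one data label (bit); S_client: client sub-model size (bit)
    """
    g_i = alpha0 * d_i ** (-2)                         # assumed path-loss model
    r_up = b_i * math.log2(1 + p_up * g_i / sigma2)    # uplink rate (bit/s)
    r_down = B_D * math.log2(1 + P_B * g_i / sigma2)   # broadcast (downlink) rate

    t_up = (num_samples * (D_C + beta) + S_client) / r_up   # upload latency (s)
    t_down = num_samples * G_C / r_down                     # download latency (s)
    t_comm = t_up + t_down

    e_comm = p_up * t_up + p_down * t_down             # communication energy (J)
    return e_comm, t_comm
```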

Further, in step S3, the established system cost optimization problem is as follows:

subject to:

C1: C_min ≤ C ≤ C_max

C2: b_min ≤ b_i ≤ b_max

C3: Σ_{i=1}^{I} b_i ≤ B_total

where B_total is the uplink bandwidth available to all clients; C1 specifies the range of the model split layer and thereby constrains the number of layers in the client sub-model; C2 specifies the preset range of bandwidth values that can be allocated to each client; C3 ensures that the total bandwidth allocated to the clients does not exceed B_total; and a weighting coefficient represents the weight between the total energy consumption and the total time delay.
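
Since the objective function itself is not reproduced above, the sketch below assumes the weighted-sum form described in the abstract, with a single weight λ in [0, 1] balancing total energy against total delay, the total delay taken as M times the slowest client's per-round time (the max-over-clients form appears in the claims) and the total energy as M times the sum of the clients' per-round energies; the second helper checks constraints C1 to C3 for a candidate decision. The λ parametrization is an assumption for this example.

```python
def system_cost(per_client_costs, lam, M):
    """Weighted system cost over M rounds (illustrative sketch).

    per_client_costs: list of (e_total_i, t_total_i) per-round costs, where
                      e_total_i = e_comp_i + e_comm_i and
                      t_total_i = t_comp_i + t_comm_i.
    lam:              assumed weight between total energy and total delay.
    """
    E = M * sum(e for e, _ in per_client_costs)   # total client energy cost
    T = M * max(t for _, t in per_client_costs)   # total delay (slowest client)
    return lam * E + (1 - lam) * T

def feasible(C, b, C_min, C_max, b_min, b_max, B_total):
    """Check constraints C1-C3 for a candidate split layer C and bandwidths b."""
    return (C_min <= C <= C_max                        # C1: split-layer range
            and all(b_min <= bi <= b_max for bi in b)  # C2: per-client bandwidth range
            and sum(b) <= B_total)                     # C3: total bandwidth budget
```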

Further, in step S4, solving the optimization problem specifically includes: discretizing the bandwidth variable with a discretization interval Δ, so that b_i takes values from the discrete set {b_min + k·Δ, k = 0, 1, 2, ...} within [b_min, b_max], where b_i is the bandwidth allocated by the server to client i, and b_min and b_max denote the minimum and maximum bandwidth values that can be allocated to a client, respectively.

Further, the optimal solution of the resulting integer programming problem is obtained using the SciPy library in Python or an existing optimization solver.

Further, in step S5, all clients are organized, based on the obtained model splitting and bandwidth allocation strategy, to train the given complete model in cooperation with the server; the specific process is as follows:

The given complete model is split into a global client sub-model and a global server sub-model. In each round of global model training, the global client sub-model and the global server sub-model are both trained, which involves two aspects. First, each client receives the global client sub-model issued by the server, trains this sub-model jointly with the server, and then uploads the updated client sub-model parameters to the server, which aggregates them to update the global client sub-model. In addition, the server internally assigns each client an associated server sub-model, which is updated while the server cooperates with that client to train its client sub-model; finally, the global server sub-model is updated by aggregating all of the associated server sub-models.

Further, the specific process of step S5 is as follows:

S51: The server selects a model splitting strategy to determine the number of layers C contained in the client sub-model, splits the given complete model, and then issues the client sub-model W_client to each client.

S52: Each client receives W_client and updates this sub-model based on its local data in cooperation with the server; the server is responsible for updating the server sub-model.

S53: Any client i owns a local dataset D_i; with a training batch size of H, the local dataset is divided into batches accordingly.

S54: Any client i uses the current batch of data to perform the forward propagation of the client sub-model and sends the smashed data, together with the data labels of the current batch, to the server, where the smashed data is the output of the split layer.

S55: The server uses the received smashed data to perform the forward propagation of the server sub-model associated with client i, computes the loss function from the received data labels, performs back propagation on the associated server sub-model to obtain the gradient of the smashed data, and updates the associated server sub-model based on that gradient; these steps are executed in parallel inside the server for all clients.

S56: The server sends the gradient information of all smashed data to the corresponding clients, respectively.

S57: Each client receives the gradient information of its smashed data, performs back propagation on its client sub-model, and updates it.

S58: Repeat S54 to S57 until all local data has been used.

S59: Each client uploads its updated local client sub-model parameters to the server; the server aggregates them by weighted averaging to update the global client sub-model and then re-issues the global client sub-model to all clients for a new round of local training; the server also aggregates all associated server sub-models by weighted averaging to update the global server sub-model.

A second aspect of the present invention provides a cost optimization system for split federated learning. The system comprises a memory and a processor, the memory stores a program of the cost optimization method for split federated learning, and the program, when executed by the processor, implements the following steps:

S1: Construct a split federated learning system model comprising one server and several clients, each client owning its own local dataset.

S2: Calculate the computation and communication latency and energy consumption of all clients.

S3: Using the clients' computation and communication latency and energy consumption, take the model split layer and the bandwidth as decision variables, analyze the objective function and related constraints, and formulate the cost optimization problem of the split federated learning system.

S4: Solve the optimization problem to obtain the model splitting and bandwidth allocation strategy.

S5: In the split federated learning system, organize all clients to train the given complete model in cooperation with the server, based on the obtained model splitting and bandwidth allocation strategy, and obtain the trained complete model.

Compared with the prior art, the technical solution of the present invention has the following beneficial effects:

The present invention first constructs a split federated learning system model comprising one server and several clients, each client owning its own local dataset; it then calculates the computation and communication latency and energy consumption of all clients; using these quantities, with the model split layer and the bandwidth as decision variables, it analyzes the objective function and related constraints and formulates the cost optimization problem of the split federated learning system; it then solves the optimization problem to obtain the optimal model splitting and bandwidth allocation strategy; finally, in the split federated learning system, all clients are organized to train the given complete model in cooperation with the server based on the obtained strategy, yielding the trained complete model. In this way the clients' data privacy is protected while the clients complete the model training task in a cost-optimized manner.

Brief Description of the Drawings

FIG. 1 is a flow chart of a cost optimization method for split federated learning provided by an embodiment of the present invention.

FIG. 2 is a diagram of the split federated learning system model provided by an embodiment of the present invention.

FIG. 3 is a workflow diagram of the split federated learning system provided by an embodiment of the present invention.

FIG. 4 is a flow chart of the model splitting and bandwidth allocation strategy calculation provided by an embodiment of the present invention.

FIG. 5 is a schematic diagram of splitting the ResNet-18 model provided by an embodiment of the present invention.

FIG. 6 shows simulation results provided by an embodiment of the present invention.

Detailed Description

In order to make the above objectives, features and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention; however, the present invention may also be implemented in ways different from those described herein, and therefore the protection scope of the present invention is not limited to the specific embodiments disclosed below.

Example 1

As shown in FIG. 1, the first aspect of the present invention provides a cost optimization method for split federated learning, comprising the following steps:

S1: Construct the split federated learning system model, which comprises one server and I clients, each client owning its own local dataset; the split federated learning system model is shown in FIG. 2.

More specifically, the split federated learning system model, as shown in FIG. 2, comprises one server and I clients. The server performs the server sub-model training, aggregates and updates the client sub-model shared by all clients, and allocates different communication bandwidths to different clients. Each client owns its own local dataset for training its own client sub-model and communicates data with the server over a wireless channel.

S2: Calculate the computation and communication latency and energy consumption of all clients.

More specifically, the client's local computation energy consumption and computation latency are calculated as follows:

The local computation load of client i for training the client sub-model on a unit of data is determined by γ(·) and ξ(·), the numbers of operations, in FLOPs, required for the forward and backward computation of a single sample of the network model, respectively, where C denotes the number of layers contained in the client sub-model; the computation load that requires server cooperation is defined analogously. The computation frequency of client i is f_i in cycle/s and k_i is the computation intensity of client i in FLOPs/cycle, so the computation speed of the client is defined as φ_i = f_i × k_i in FLOPS; the client's computation energy coefficient is ε_i in J/FLOPs; D_i is the local dataset owned by client i; and the computation speed at which the server cooperates with client i is ψ. The client's local computation energy consumption and computation latency T_i^comp are then, respectively:

The client energy cost and communication latency of the split federated learning system are calculated as follows:

P_i^up is the transmission power of client i when uploading data, b_i is the bandwidth allocated by the server to client i, σ² is the Gaussian channel noise, and the channel gain between the server and client i is determined by α_0, the channel gain at distance d = 1 m, and d_i, the Euclidean distance between the server and the client. When transmitting the smashed data and the updated client sub-model parameters, the uplink data transmission rate r_i^up of client i is:

Similarly, the downlink data transmission rate r_i^down at which the server transmits the smashed-data gradient information to client i is:

where B_D is the fixed bandwidth the server uses to broadcast to each user, and P_B is the average transmission power.

For a single sample, the smashed data has size D(C) and its gradient has size G(C); β is the size of a data label and the client sub-model parameters have a corresponding size. The upload latency and download latency of client i are then, respectively:

The communication latency T_i^comm is:

P_i^down is the receiving power of the client and P_i^up is the average transmission power of client i, so the communication energy consumption of client i is:

The total time overhead and energy overhead of client i are, respectively:

T_i^total = T_i^comm + T_i^comp

Over the specified M global model training rounds, the client energy cost E and the computation-and-communication delay T of the split federated learning system are, respectively:

S3: Using the clients' computation and communication latency and energy consumption, take the model split layer and the bandwidth as decision variables, analyze the objective function and related constraints, and formulate the cost optimization problem of the split federated learning system.

More specifically, in step S3, the established system cost optimization problem is as follows:

subject to:

C1: C_min ≤ C ≤ C_max

C2: b_min ≤ b_i ≤ b_max

C3: Σ_{i=1}^{I} b_i ≤ B_total

where B_total is the uplink bandwidth available to all clients; C1 specifies the range of the model split layer and thereby constrains the number of layers in the client sub-model; C2 specifies the preset range of bandwidth values that can be allocated to each client; C3 ensures that the total bandwidth allocated to the clients does not exceed B_total; and a weighting coefficient represents the weight between the total energy consumption and the total time delay.

S4: Solve the optimization problem to obtain the model splitting and bandwidth allocation strategy.

More specifically, to simplify the problem solving in practical applications, the bandwidth variable is discretized with a discretization interval Δ, so that b_i takes values from the discrete set {b_min + k·Δ, k = 0, 1, 2, ...} within [b_min, b_max]. The original problem is thereby transformed into a conventional integer programming problem, and the optimal solution of this integer programming problem is then obtained using the SciPy library in Python or an existing optimization solver (such as Lingo).
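
As one concrete way to realize this step, the sketch below enumerates the candidate split layers and, for each one, optimizes the bandwidth vector with scipy.optimize.minimize (SLSQP) on the continuous relaxation before rounding the result onto the Δ-grid. The enumerate-relax-round strategy and the cost_fn interface are illustrative assumptions; any integer-programming solver over the discretized bandwidths would serve the same purpose.

```python
import numpy as np
from scipy.optimize import minimize

def solve_split_and_bandwidth(cost_fn, split_layers, num_clients,
                              b_min, b_max, B_total, delta):
    """Search for the best (split layer, bandwidth allocation) pair (sketch).

    cost_fn(C, b): weighted system cost for split layer C and bandwidth vector b,
                   e.g. assembled from the cost helpers sketched earlier.
    """
    best = (None, None, float("inf"))
    for C in split_layers:
        x0 = np.clip(np.full(num_clients, B_total / num_clients), b_min, b_max)
        res = minimize(lambda b: cost_fn(C, b), x0, method="SLSQP",
                       bounds=[(b_min, b_max)] * num_clients,
                       constraints=[{"type": "ineq",
                                     "fun": lambda b: B_total - np.sum(b)}])
        # Round the relaxed solution onto the discrete grid b_min + k * delta.
        b = np.clip(b_min + np.round((res.x - b_min) / delta) * delta, b_min, b_max)
        if np.sum(b) <= B_total and cost_fn(C, b) < best[2]:
            best = (C, b, cost_fn(C, b))
    return best   # (optimal split layer C*, bandwidth allocation b*, cost)
```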

S5: In the split federated learning system, organize all clients to train the given complete model in cooperation with the server, based on the obtained model splitting and bandwidth allocation strategy, and obtain the trained complete model.

More specifically, the given complete model is split into a global client sub-model and a global server sub-model. In each round of global model training, the global client sub-model and the global server sub-model are both trained, which involves two aspects. First, each client receives the global client sub-model issued by the server, trains this sub-model jointly with the server, and then uploads the updated client sub-model parameters to the server, which aggregates them to update the global client sub-model. In addition, the server internally assigns each client an associated server sub-model, which is updated while the server cooperates with that client to train its client sub-model; finally, the global server sub-model is updated by aggregating all of the associated server sub-models.

More specifically, FIG. 3 shows the workflow of the split federated learning system; the specific process of step S5 is as follows (a sketch of one such training round in code is given after this list):

S51: The server selects a model splitting strategy to determine the number of layers C contained in the client sub-model, splits the given complete model, and then issues the client sub-model W_client to each client.

S52: Each client receives W_client and updates this sub-model based on its local data in cooperation with the server; the server is responsible for updating the server sub-model.

S53: Any client i owns a local dataset D_i; with a training batch size of H, the local dataset is divided into batches accordingly.

S54: Any client i uses the current batch of data to perform the forward propagation of the client sub-model and sends the smashed data, together with the data labels of the current batch, to the server, where the smashed data is the output of the split layer.

S55: The server uses the received smashed data to perform the forward propagation of the server sub-model associated with client i, computes the loss function from the received data labels, performs back propagation on the associated server sub-model to obtain the gradient of the smashed data, and updates the associated server sub-model based on that gradient; these steps are executed in parallel inside the server for all clients.

S56: The server sends the gradient information of all smashed data to the corresponding clients, respectively.

S57: Each client receives the gradient information of its smashed data, performs back propagation on its client sub-model, and updates it.

S58: Repeat S54 to S57 until all local data has been used.

S59: Each client uploads its updated local client sub-model parameters to the server; the server aggregates them by weighted averaging to update the global client sub-model and then re-issues the global client sub-model to all clients for a new round of local training; the server also aggregates all associated server sub-models by weighted averaging to update the global server sub-model.
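
The sketch below illustrates one global round of steps S51 to S59 in PyTorch-style Python. It makes several assumptions that the text does not prescribe: the sub-models are plain torch.nn.Module objects with purely floating-point state, SGD is used on both sides, the per-client server sub-models are processed sequentially rather than in parallel, and the weighted averaging uses local dataset sizes as weights. It is a minimal illustration under those assumptions, not the patent's reference implementation.

```python
import copy
import torch

def sfl_global_round(global_client, global_server, client_models, server_models,
                     datasets, lr=0.01, batch_size=32):
    """One global round of split federated learning (illustrative sketch).

    datasets: list of (x, y) tensor pairs, one local dataset per client.
    """
    loss_fn = torch.nn.CrossEntropyLoss()
    sizes = [len(x) for x, _ in datasets]
    for i, (x_all, y_all) in enumerate(datasets):
        client_models[i].load_state_dict(global_client.state_dict())  # S51/S52
        server_models[i].load_state_dict(global_server.state_dict())
        opt_c = torch.optim.SGD(client_models[i].parameters(), lr=lr)
        opt_s = torch.optim.SGD(server_models[i].parameters(), lr=lr)
        for s in range(0, len(x_all), batch_size):                     # S53: batches
            x, y = x_all[s:s + batch_size], y_all[s:s + batch_size]
            smashed = client_models[i](x)                    # S54: forward to split layer
            received = smashed.detach().requires_grad_(True) # "transmit" smashed data
            loss = loss_fn(server_models[i](received), y)    # S55: server forward + loss
            opt_s.zero_grad(); loss.backward(); opt_s.step() # server backprop and update
            opt_c.zero_grad()
            smashed.backward(received.grad)                  # S56/S57: returned gradient,
            opt_c.step()                                     # client backprop and update
    # S59: weighted-average aggregation of both global sub-models (FedAvg-style).
    for global_m, local_ms in ((global_client, client_models),
                               (global_server, server_models)):
        state = copy.deepcopy(global_m.state_dict())
        for key in state:
            state[key] = sum(m.state_dict()[key] * n
                             for m, n in zip(local_ms, sizes)) / sum(sizes)
        global_m.load_state_dict(state)
    return global_client, global_server
```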

A second aspect of the present invention provides a cost optimization system for split federated learning. The system comprises a memory and a processor, the memory stores a program of the cost optimization method for split federated learning, and the program, when executed by the processor, implements the following steps:

S1: Construct a split federated learning system model comprising one server and several clients, each client owning its own local dataset.

S2: Calculate the computation and communication latency and energy consumption of all clients.

S3: Using the clients' computation and communication latency and energy consumption, take the model split layer and the bandwidth as decision variables, analyze the objective function and related constraints, and formulate the cost optimization problem of the split federated learning system.

S4: Solve the optimization problem to obtain the model splitting and bandwidth allocation strategy.

S5: In the split federated learning system, organize all clients to train the given complete model in cooperation with the server, based on the obtained model splitting and bandwidth allocation strategy, and obtain the trained complete model.

Example 2

FIG. 4 shows the flow chart for implementing the cost optimization of split federated learning. The effect of the method is demonstrated below through a concrete simulation. A split federated learning system with 5 clients is considered; the server is located at the center and the clients are uniformly distributed 100 to 200 meters around it. The training task is the classic CIFAR-10 classification task, and the number of global iterations is M = 100. The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 categories, with 6,000 images per category; 50,000 training images and 10,000 test images are used. The model is the convolutional neural network ResNet-18, whose number of layers is defined as 10; the candidate split positions are shown in FIG. 5. To account for heterogeneity, the clients' local data sizes are uniformly distributed over [8000, 10000] images, the computation speeds φ_i are uniformly distributed over [0.01, 0.03] TFLOPS, the client energy coefficients ε_i are uniformly distributed over [0.5, 0.7] J/TFLOPs, and the transmission powers P_i^up are uniformly distributed over [0.1, 0.2] W; for simplicity, P_i^up = P_i^down. The computation speed ψ with which the server cooperates with each client is 2 TFLOPS, the total system bandwidth B_total is 8 MHz, the channel gain α_0 is set to -50 dB, the server transmission power is 1 W, and the bandwidth B_D used for broadcasting is 15 MHz. A fixed weighting coefficient is used in the objective function.

The bandwidth discretization interval is Δ = 0.1 MHz, the minimum bandwidth b_min allocated to a user is 0.5 MHz, and the maximum bandwidth is b_max = 3.5 MHz, so that b_i ∈ {0.5 + k·Δ | k = 0, 1, 2, ..., 30} MHz. The simulation results are shown in FIG. 6.
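
For these simulation values, the discrete candidate set for each client's bandwidth can be generated directly; the snippet below only reproduces the grid stated above (values in Hz).

```python
delta, b_min, b_max = 0.1e6, 0.5e6, 3.5e6   # 0.1 MHz grid over [0.5, 3.5] MHz
bandwidth_grid = [b_min + k * delta
                  for k in range(int(round((b_max - b_min) / delta)) + 1)]
assert len(bandwidth_grid) == 31            # k = 0, 1, ..., 30
```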

When C = 10, the scheme reduces to the original federated learning scheme. The results show that the proposed method attains its optimal solution at split layer C = 6, with the bandwidth allocation (1.3, 2.6, 1.8, 1.1, 1.2) MHz, and its objective value is better than that of the original federated learning scheme.

Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and are not intended to limit its embodiments. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (8)

1. A cost optimization method for split federated learning, comprising the steps of:
S1: constructing a split federated learning system model, wherein the model comprises a server and a plurality of clients, and each client is provided with a local dataset;
S2: calculating the computation latency, computation energy consumption, communication latency and communication energy consumption of all clients;
S3: using the computation latency, computation energy consumption, communication latency and communication energy consumption of the clients, taking the model split layer and the bandwidth as decision variables, and establishing an optimization objective for minimizing the cost of the split federated learning system according to an objective function and constraint conditions;
S4: solving the optimization problem to obtain a model splitting and bandwidth allocation strategy;
S5: in the split federated learning system, organizing all clients to train a given complete model in cooperation with the server based on the obtained model splitting and bandwidth allocation strategy, to obtain a trained complete model;
wherein calculating the computation latency, computation energy consumption, communication latency and communication energy consumption of all clients in step S2 comprises first calculating each client's local computation energy consumption and computation latency, and then calculating the clients' communication energy consumption and communication latency in the split federated learning system;
the specific process of calculating a client's local computation energy consumption and computation latency is as follows:
the local computation load of client i for training the client sub-model on a unit of data is:
wherein γ_client(C) denotes the number of operations, in FLOPs, required for the forward computation of the model on a single client-side sample at split point C, ξ_client(C) denotes the number of operations required for the backward computation of the model on a single client-side sample at split point C, and C denotes the number of layers contained in the client sub-model; the computation load requiring server cooperation is defined analogously;
the client's local computation energy consumption and computation latency T_i^comp are, respectively:
wherein the computation frequency of client i is f_i in cycle/s, k_i is the computation intensity of client i in FLOPs/cycle, and the computation speed of the client is defined as φ_i = f_i × k_i in FLOPS; the client's computation energy coefficient is ε_i in J/FLOPs; D_i is the local dataset owned by client i; and the computation speed at which the server cooperates with client i is ψ.
2. The cost optimization method for split federated learning according to claim 1, wherein the communication energy consumption and communication latency of a client of the split federated learning system are calculated through the following process:
when transmitting the smashed data and the updated client sub-model parameters, the uplink data transmission rate r_i^up of client i is:
wherein P_i^up is the transmission power of client i when uploading data, b_i is the bandwidth allocated by the server to client i, σ² is the Gaussian channel noise, and the channel gain between the server and the client is determined by α_0, the channel gain at distance d = 1 m, and d_i, the Euclidean distance between the server and the client;
the downlink data transmission rate r_i^down at which the server transmits the smashed-data gradient information to client i is:
wherein B_D is the fixed bandwidth used by the server for broadcasting to each user, and P_B is the average transmission power;
the upload latency and download latency of client i are, respectively:
wherein the smashed data of a single sample has size D(C), the corresponding gradient has size G(C), β is the size of a data label, and the remaining term is the size of the client sub-model parameters;
the communication latency T_i^comm is:
the communication energy consumption of the client is:
wherein P_i^down is the receiving power of the client and P_i^up is the average transmission power of client i;
the time overhead T_i^total and energy consumption overhead of client i are, respectively:
T_i^total = T_i^comm + T_i^comp
in the specified M global model training rounds, the total time overhead T and the total energy overhead E of all clients of the split federated learning system are, respectively:
T = M · max(T_1^total, T_2^total, ..., T_I^total)
3. The cost optimization method for split federated learning according to claim 2, wherein in step S3 the optimization objective for minimizing the cost of the split federated learning system is established according to the objective function and constraint conditions as follows:
the objective function is:
the constraint conditions are:
subject to:
C1: C_min ≤ C ≤ C_max
C2: b_min ≤ b_i ≤ b_max
C3: Σ_{i=1}^{I} b_i ≤ B_total
wherein B_total is the uplink bandwidth available to all clients; C1 specifies the range of the model split layer and constrains the number of layers of the client sub-model; C2 constrains the range of bandwidth values that can be allocated to each client; C3 constrains the sum of the bandwidth resources allocated to the clients; a weighting coefficient represents the weight between the total energy consumption overhead and the total latency overhead; b_i is the bandwidth allocated by the server to client i; and C is the number of layers contained in the client sub-model.
4. The cost optimization method for split federated learning according to claim 1, wherein in step S4, solving the optimization problem specifically comprises: discretizing the bandwidth variable with a discretization interval Δ, so that b_i takes values from the discrete set {b_min + k·Δ}, wherein b_i is the bandwidth allocated by the server to client i, b_min and b_max respectively denote the minimum and maximum bandwidth values that can be allocated to a client, and k indexes the k-th discretization step.
5. The method of claim 4, wherein the optimal solution of the integer programming problem is obtained using the SciPy library in Python or an existing mathematical programming solver.
6. The cost optimization method for split federated learning according to claim 1, wherein in step S5, all clients are organized to train a given complete model in cooperation with the server based on the obtained model splitting and bandwidth allocation policy, and the specific process is as follows:
splitting the given complete model into a global client sub-model and a global server sub-model, and training both sub-models in each round of global model training, which comprises the following two aspects: first, each client receives the global client sub-model issued by the server, trains this sub-model jointly with the server, and then uploads the updated client sub-model parameters to the server to aggregate and update the global client sub-model; in addition, the server internally assigns each client an associated server sub-model, which is updated as the server cooperates with that client to train its client sub-model, and finally the global server sub-model is updated by aggregating all of the associated server sub-models.
7. The cost optimization method for split federated learning according to claim 6, wherein the specific process of step S5 is as follows:
S51: the server selects a model splitting strategy to determine the number of layers C contained in the client sub-model, splits the given complete model, and then issues the global client sub-model W_client to each client;
S52: each client receives W_client and updates this sub-model based on local data in cooperation with the server, and the server is responsible for updating the server sub-model;
S53: any client i owns a local dataset D_i with a training batch size of H, and the local dataset is divided into batches accordingly;
S54: any client i uses the current batch of data to perform forward propagation of the client sub-model, and sends the smashed data and the data labels of the current batch to the server, wherein the smashed data is the output of the split layer;
S55: the server uses the received smashed data to perform forward propagation of the server sub-model associated with client i, computes the loss function from the received data labels, performs back propagation on the associated server sub-model to obtain the gradient of the smashed data, and updates the associated server sub-model based on the smashed-data gradient; the above steps are performed in parallel inside the server for all clients;
S56: the server respectively sends the gradient information of all smashed data to the corresponding clients;
S57: each client receives the gradient information of its smashed data, performs back propagation on its client sub-model and updates it;
S58: repeating S54 to S57 until all local data is exhausted;
S59: each client uploads the updated local client sub-model parameters to the server, the server aggregates and updates the global client sub-model by weighted averaging, and then re-issues the global client sub-model to all clients for a new round of local client sub-model training; the server aggregates all associated server sub-models by weighted averaging to update the global server sub-model.
8. A cost optimization system for split federated learning, the system comprising a memory and a processor, wherein the memory stores a program of a cost optimization method for split federated learning, and the program, when executed by the processor, implements the following steps:
S1: constructing a split federated learning system model, wherein the model comprises a server and a plurality of clients, and each client is provided with a local dataset;
S2: calculating the computation latency, computation energy consumption, communication latency and communication energy consumption of all clients;
S3: using the computation latency, computation energy consumption, communication latency and communication energy consumption of the clients, taking the model split layer and the bandwidth as decision variables, and establishing an optimization objective for minimizing the cost of the split federated learning system according to an objective function and constraint conditions;
S4: solving the optimization problem to obtain a model splitting and bandwidth allocation strategy;
S5: in the split federated learning system, organizing all clients to train a given complete model in cooperation with the server based on the obtained model splitting and bandwidth allocation strategy, to obtain a trained complete model;
wherein calculating the computation latency, computation energy consumption, communication latency and communication energy consumption of all clients in step S2 comprises first calculating each client's local computation energy consumption and computation latency, and then calculating the clients' communication energy consumption and communication latency in the split federated learning system;
the specific process of calculating a client's local computation energy consumption and computation latency is as follows:
the local computation load of client i for training the client sub-model on a unit of data is:
wherein γ_client(C) denotes the number of operations, in FLOPs, required for the forward computation of the model on a single client-side sample at split point C, ξ_client(C) denotes the number of operations required for the backward computation of the model on a single client-side sample at split point C, and C denotes the number of layers contained in the client sub-model; the computation load requiring server cooperation is defined analogously;
the client's local computation energy consumption and computation latency T_i^comp are, respectively:
wherein the computation frequency of client i is f_i in cycle/s, k_i is the computation intensity of client i in FLOPs/cycle, and the computation speed of the client is defined as φ_i = f_i × k_i in FLOPS; the client's computation energy coefficient is ε_i in J/FLOPs; D_i is the local dataset owned by client i; and the computation speed at which the server cooperates with client i is ψ.
CN202311398490.0A 2023-10-25 2023-10-25 A cost-optimization approach for split federated learning Active CN117521778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311398490.0A CN117521778B (en) 2023-10-25 2023-10-25 A cost-optimization approach for split federated learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311398490.0A CN117521778B (en) 2023-10-25 2023-10-25 A cost-optimization approach for split federated learning

Publications (2)

Publication Number Publication Date
CN117521778A CN117521778A (en) 2024-02-06
CN117521778B true CN117521778B (en) 2024-11-05

Family

ID=89757567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311398490.0A Active CN117521778B (en) 2023-10-25 2023-10-25 A cost-optimization approach for split federated learning

Country Status (1)

Country Link
CN (1) CN117521778B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2641040A (en) * 2024-05-13 2025-11-19 Nokia Technologies Oy Methods, apparatus and computer programs
CN119031415B (en) * 2024-09-06 2025-09-12 南京邮电大学 A 6G computing network adaptive splitting federated learning method for distributed AI training business
CN120196451B (en) * 2025-05-26 2025-08-12 湖南科技大学 A batch-based parallel split federated learning method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907038A (en) * 2022-09-09 2023-04-04 南开大学 A Multivariate Control Decision-Making Method Based on Federated Split Learning Framework

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150612A (en) * 2021-11-15 2023-05-23 华为技术有限公司 Method and communication device for model training
CN115130548B (en) * 2022-05-24 2025-07-18 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic equipment
CN116418687A (en) * 2023-03-03 2023-07-11 哈尔滨工业大学(深圳) Parameter optimization and resource allocation method for low-energy wireless federated learning system
CN116418589A (en) * 2023-04-19 2023-07-11 湖南大学 Abnormal traffic detection method for IoT heterogeneous devices based on federated split learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907038A (en) * 2022-09-09 2023-04-04 南开大学 A Multivariate Control Decision-Making Method Based on Federated Split Learning Framework

Also Published As

Publication number Publication date
CN117521778A (en) 2024-02-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant