WO2025126219A1 - Methods, apparatus and systems for federated learning - Google Patents
- Publication number
- WO2025126219A1 (PCT/IN2023/051160)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- computing device
- model
- global
- local
- local computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
Definitions
- the plot in Figure 5 shows a comparison of the effectiveness of a FL system implementing a method in accordance with embodiments (solid line), and a known FL system not utilizing a method in accordance with embodiments (dashed line).
- the model updates from the local computing devices are combined by simple averaging at the global computing device to generate the updated global ML model; there is no weighting or modification of the model updates.
- the x axis of the plot shows a number of training iterations, while the y axis of the plot shows the accuracy of the global ML model generated using the model updates from the local computing devices.
- the private user data at the local computing devices includes three labels {l1, l2, l3}.
- the label noise is as follows: User 1: Noise level of zero i.e. the pattern of labels of this user is the same as the pattern in the verified dataset.
- Embodiments may be utilised in any system wherein FL is employed.
- embodiments may be utilised in systems wherein the first local computing device and second local computing device are controllers for autonomous vehicles, and wherein the global ML model is an autonomous vehicle control model.
- the global ML model may be used in the control of the autonomous vehicles.
- Embodiments may be particularly well suited to this application; it is likely that autonomous vehicles in a given area will have been provided by different vendors. The vendors may use their own labels and use their domain expertise to label the data, leading to inconsistencies in the training data used by different local computing devices.
- FL may be used to create a global model.
- embodiments may also be used in systems wherein the first local computing device and second local computing device are communication network controllers, and wherein the global ML model is a network anomaly detection model.
- the global ML model may be used in the detection of network anomalies. Anomaly detection coming from the AI can effectively augment and automate early detection, predictions, and decision-making regarding operations and business processes. When the time taken to detect a deviation is reduced, incidents may be resolved more rapidly, reducing costs associated with the interruption.
- Embodiments may be used to understand incorrect or poisoned labels and to analyse multiple dimensions of data sources, looking at the cell, subscriber, and device-level Key Performance Indicators (KPIs), fault monitoring in network equipment, and correlating alerts across domains for noise reduction and root cause analysis.
- KPIs Key Performance Indicators
- examples of the present disclosure may be virtualised, such that the methods and processes described herein may be run in a cloud environment.
- the methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
- a computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
- the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some embodiments may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto.
- While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as nonlimiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the exemplary embodiments of the disclosure may be practiced in various components such as integrated circuit chips and modules. It should thus be appreciated that the exemplary embodiments of this disclosure may be realized in an apparatus that is embodied as an integrated circuit, where the integrated circuit may comprise circuitry (as well as possibly firmware) for embodying at least one or more of a data processor, a digital signal processor, baseband circuitry and radio frequency circuitry that are configurable so as to operate in accordance with the exemplary embodiments of this disclosure.
- exemplary embodiments of the disclosure may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices.
- program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device.
- the computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc.
- the function of the program modules may be combined or distributed as desired in various embodiments.
- the function may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like.
Abstract
Methods, apparatus and systems for use in the updating of a global ML model in a federated learning system are provided. A computer-implemented method comprises receiving first and second model updates. The method further comprises estimating a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update, and determining first and second weightings based on the first and second levels of label noise respectively. The method also comprises modifying the first model update using the first weighting and the second model update using the second weighting. The method still further comprises initiating transmission of the modified first and second model updates to a global computing device for use in updating the global ML model.
Description
METHODS, APPARATUS AND SYSTEMS FOR FEDERATED LEARNING
Technical Field
Embodiments described herein relate to computer-implemented methods and apparatus for Machine Learning (ML), in particular for use in the updating of a global ML model in a federated learning system.
Background
ML is becoming increasingly integral to a broad variety of technical fields, including natural language processing, computer vision, speech recognition, and Internet of Things (IoT) applications related to automation and digitalization tasks. Successful implementation of ML systems often relies upon collecting and processing large amounts of data in suitable environments. For some applications of ML, the process of collecting data may raise privacy concerns, in particular related to the sharing of potentially private and/or confidential data with outside entities (such as those providing ML models) for the purposes of model training.
Some ML techniques may be implemented in such a way as to help address privacy concerns. One such technique is Federated Learning (FL), a machine learning approach in which training data need not necessarily leave a user's device at all. FL is a collaborative form of machine learning where the training process is distributed among many local users. Typically, in FL systems, a single global user (for example, a server) communicates with plural local users (for example, users of local devices); the local users typically do not communicate directly with one another. The global user (for example, server) may have the role of coordinating between local users, but most if not all of the model training is not performed by the global user (such as a server) and is instead performed by the local users collectively. Instead of sharing data, local users themselves may compute ML model updates (for example, node weight and bias updates for the nodes and edges within a ML model, such as a Neural Network, NN) for a ML model using their locally available data. The ML model updates from the local users may then be shared with another entity (typically the global user), without requiring any sharing of the local user data itself.
The typical operational process of FL systems is as follows. A global user provides a ML model to at least some of a plurality of local users. After the ML model is initialized by the local users, a certain number of the local users (potentially all of the local users) may be selected to improve the model. Each selected local user may use its locally available data to train the model, thereby computing a model update. The computed model updates are then sent back to the global user where they are combined in some way (for example: averaged, potentially weighted by the number of training examples that the local user used). The global user then applies the combined update to the ML model, typically by using some form of gradient descent, to generate an updated ML model. The updated ML model may then be shared with the local users again for use and/or further training.
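By way of illustration only (this sketch does not form part of the original application text), the aggregation step described above may be read as an average of the local model updates weighted by the number of training examples each local user contributed; representing each model update as a flat array is an assumption made purely for brevity.

```python
import numpy as np

def aggregate_updates(updates, num_examples):
    """Combine local model updates into a single global update.

    updates      -- list of 1-D arrays, one flattened model update per local user
    num_examples -- number of training examples each local user used
    """
    weights = np.asarray(num_examples, dtype=float)
    weights /= weights.sum()  # each local user's share of the total training data
    return np.average(np.stack(updates), axis=0, weights=weights)

# Example: three local users with differently sized local datasets
rng = np.random.default_rng(0)
local_updates = [rng.normal(size=10) for _ in range(3)]
combined = aggregate_updates(local_updates, num_examples=[100, 400, 250])
```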
ML applications, including deep learning, have achieved success in numerous domains through the use of large amounts of data. However, the lack of high-quality labels (that is, accurate labels) in many real-world scenarios may be a cause for concern. Low quality labels (that is, labels that are inaccurate or use non-standard terminology, for example) may severely degrade the performance of deep neural networks; however, in some situations it may be necessary to use data having low quality labels for training. Accordingly, learning accurately from data having low quality labels (also referred to as noisy labels) is becoming increasingly important in modern deep learning applications.
“Optimal User Selection for High-Performance and Stabilized Energy-Efficient Federated Learning Platforms” by Joohyung, J. et al., available at https://arxiv.org/abs/2204.04677 as of 2 November 2023, discloses FedCorr, a general multi-stage framework to tackle heterogeneous label noise in FL. FedCorr is intended to deal with data heterogeneity and to increase training stability using an adaptive local proximal regularization term that is based on estimated local noise levels.
It is desirable to provide methods for effectively estimating the label noise on a per local user basis and using the label noise estimates to assign relative weights to the contributions from the local models for use in updating a global model.
Summary
It is an object of the present disclosure to provide methods, apparatus and computer-readable media which at least partially address one or more of the challenges discussed above. In particular, it is an object of the present disclosure to allow label noise estimation without requiring access to local user data outside the local users.
The present disclosure provides methods, apparatuses and systems for use in the updating of a global ML model in a federated learning system. Embodiments may support accurate label noise estimation, and hence support efficient generation of accurate global ML models, through use of data impressions that may correspond to local user data.
An embodiment provides a computer-implemented method for use in the updating of a global ML model in a federated learning system. The method comprises receiving a first model update from a first local computing device of the federated learning system and a second model update from a second local computing device of the federated learning system. The method further comprises estimating a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update, and determining a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise. The method also comprises modifying the first model update using the first weighting and the second model update using the second weighting; and initiating transmission of the modified first model update and the modified second model update to a global computing device for use in updating the global ML model.
In some embodiments the method may further comprise, by the global computing device, initiating transmission of an initial global model to the first local computing device and the second local computing device. In some embodiments the method may also comprise, by the first local computing device, training the initial global model using first local data to generate the first model update, and by the second local computing device, training the initial global model using second local data to generate the second model update.
A further embodiment provides a moderator for use in the updating of a global ML model in a federated learning system. The moderator comprises processing circuitry, one or more interfaces and a memory containing instructions executable by the processing circuitry. The moderator is operable to receive a first model update from a first local computing device of the federated learning system and a second model update from a second local computing device of the federated learning system. The moderator is further operable to estimate a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update, and to determine a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise. The moderator is also operable to modify the first model update using the first weighting and the second model update using the second weighting, and initiate transmission of the modified first model update and the
modified second model update to a global computing device for use in updating the global ML model.
A federated learning system in accordance with some embodiments may comprise the moderator, and may further comprise the global computing device, the first local computing device and the second local computing device. The global computing device may be configured to initiate transmission of an initial global model to the first local computing device and the second local computing device. The first local computing device may be configured to train the initial global model using first local data to generate the first model update. The second local computing device may be configured to train the initial global model using second local data to generate the second model update.
Embodiments of the disclosure are discussed below. The scope of the disclosure is defined by the claims.
Brief Description of Drawings
The present disclosure is described, by way of example only, with reference to the following figures, in which:-
Figure 1 is a flowchart of a method in accordance with embodiments;
Figure 2A is a schematic diagram of a moderator in accordance with embodiments;
Figure 2B is a schematic diagram of a further moderator in accordance with embodiments;
Figure 2C is a schematic diagram of a global computing device in accordance with embodiments;
Figure 2D is a schematic diagram of a local computing device in accordance with embodiments;
Figure 3 is a schematic illustration of an example FL system in accordance with embodiments;
Figure 4 is a sequence diagram showing an example method in accordance with embodiments; and
Figure 5 is a plot comparing the performance of an embodiment and a traditional FL system.
Detailed Description
For the purpose of explanation, details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed. It will be apparent, however, to those skilled in the art that the embodiments may be implemented without these specific details or with an equivalent arrangement.
In existing FL systems, it is typically assumed that the data is similarly distributed across all local users and that the data does not have any label noise. However, in many cases this assumption may not be valid, and relying on it may worsen the performance of the FL system. Also, the label used for given data may be dependent on the perspective of the user labelling the data. For example, a first user may label a pattern of features with label ‘A’ while a second user may label the same pattern of features with label ‘B’, thereby resulting in label noise.
In some known systems label noise may be estimated by analysing data, however this requires the sharing of data and may introduce privacy concerns as discussed above; this option is therefore not typically compatible with FL systems where it is intended to provide data privacy.
Embodiments provide systems for effectively taking into account label noise in FL. Embodiments may result in global ML models being more efficiently trained to a given accuracy level and/or providing a higher accuracy level within a given number of training epochs. Robust ML modelling that has reduced sensitivity to label noise may be provided.
In some embodiments, to calculate the label noise in a federated learning system, a verified dataset may be used. The verified dataset may be publicly available to all local users (that is, users of local computing devices) and a global user (that is, a user of a global computing device). In some embodiments, instead of sending the ML model updates directly to the global computing device, the ML model updates may be sent to a moderator. The moderator may then create anonymized data impressions and thereby estimate the label noise level by comparing the constructed data impressions with the actual features in the verified data. The global user may then use the estimated label noise levels to assign relative weightings to the local device ML model updates, for use when updating a global ML model. Accordingly, label noise levels may be accurately estimated without requiring access to the original user data outside the local users.
A method in accordance with embodiments is illustrated by Figure 1, which is a flowchart showing a computer-implemented method for use in the updating of a global ML model in a FL system. Embodiments are applicable to a wide variety of ML model types, such as NN, random forest, and so on. The computer-implemented method may be performed by any suitable apparatus, for example, by a moderator 20A, 20B such as those shown in Figure 2A and Figure 2B. Where the FL system is implemented in a communications network (or forms part of one), the method may be performed by an apparatus (such as a moderator) that is or forms part of a network node, such as a base station or core network node (or may be incorporated in a base station or core network node). The communications network may be for example a 5G network or 6G network. Alternatively, the moderator may be or form part of a system for the control of autonomous and/or semi-autonomous vehicles. In some embodiments, the method may be performed at least in part by a global computing device 31 and/or plural local computing devices 32, such as those shown in Figure 2C and Figure 2D.
As shown in step S102 of Figure 1, the method comprises receiving a first model update from a first local computing device of the federated learning system and a second model update from a second local computing device of the federated learning system. The step of receiving first and second model updates may be performed in accordance with a computer program stored in a memory 23, executed by a processor 21 in conjunction with one or more interfaces 22 of moderator 20A, as illustrated by Figure 2A. Alternatively, the step of receiving first and second model updates may be performed by transceiver 24 of moderator 20B, as shown in Figure 2B. In some embodiments, the model updates may comprise updated node weights and biases for a ML model (such as a neural network, for example) that have been obtained through training at the local computing devices (using local computing device data). The term “node” in this context is used to refer to software constructs within a ML model, rather than a physical device. In some embodiments, the method further comprises initiating transmission by the global computing device 31 of an initial global ML model to the local computing devices 32; typically the same initial global model may be sent to all of the local computing devices. The step of initiating transmission of the initial global ML model may be performed in accordance with a computer program stored in a memory 35, executed by a processor 33 in conjunction with one or more interfaces 34 of global computing device 31, as illustrated by Figure 2C. The initial global model may then be used as the starting point for the training by the local computing devices; by way of example, the first local computing device may train the initial global model using first local data to generate the
first model update, and the second local computing device may train the initial global model using second local data to generate the second model update. For completeness, it is noted that all of the embodiments discussed herein may comprise additional local computing devices beyond the first and second local computing devices, that is, equivalent steps to those performed by first and second local computing devices may also be performed by third local computing devices, fourth local computing devices, and so on. The number of local devices involved in a FL system is dependent on the specific functioning of the system; some FL systems may include tens or hundreds of local computing devices. The local data of the local computing devices may, in some embodiments, be kept confidential at the local computing devices; in other embodiments some or all of the local data may be shared with other local computing devices or the global computing device.
The ML model may be trained at the local computing devices in any suitable way. An example of a suitable training method is utilising Reinforcement Learning, RL. RL allows a ML model to learn by attempting to maximise an expected cumulative reward for a series of actions utilising trial-and-error. RL agents (that is, systems which use RL in order to improve ML model performance in a given task over time) may be closely linked to the system (environment) they are being used to model/control, and learn through experiences of performing actions that alter the state of the environment. Additionally or alternatively, RL may be used to train a ML model wherein a simulation of an environment is used, before the trained ML model is used to control an actual environment; this option may be of use where it is desirable to avoid having an untrained or partially trained ML model controlling an actual operating environment. Following a training epoch, the updated ML model parameters (ML model update) from each of the local computing devices performing model training may then be transmitted to the global computing device (which may comprise a moderator), or to a separate moderator, depending on the system configuration. The step of training the initial global ML model and initiating transmission of the ML model updates to the global computing device 31 or moderator 20 may be performed in accordance with a computer program stored in a memory 38, executed by a processor 36 in conjunction with one or more interfaces 37 of local computing device 32, as illustrated by Figure 2D.
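To make the local training step concrete, the following is a minimal sketch (not part of the application text, and under the simplifying assumption that the shared model is a single-layer softmax classifier rather than the more general ML model contemplated above) of a local computing device training the received global model on its private data and returning a model update.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_training_epoch(global_weights, X_local, y_local, lr=0.1, steps=100):
    """Train a copy of the global model on local data and return the model update.

    global_weights -- (num_features, num_classes) weights of the initial global model
    X_local        -- (num_samples, num_features) private local features
    y_local        -- (num_samples,) integer class labels (possibly noisy)
    """
    W = global_weights.copy()
    Y = np.eye(W.shape[1])[y_local]                # one-hot targets from local labels
    for _ in range(steps):
        P = softmax(X_local @ W)                   # class probabilities
        grad = X_local.T @ (P - Y) / len(X_local)  # gradient of the cross-entropy loss
        W -= lr * grad
    return W - global_weights                      # the model update sent onwards
```

In a real deployment the update would be the change in all trainable parameters of whatever model architecture the global user distributed.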
When the first and second ML model updates have been received (for example, by the moderator, which may be part of a global computing device), the method continues with estimating a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update,
as shown in step S104 of Figure 1. As discussed above, in some embodiments the estimate can be provided without the moderator having any access to the local data of the local computing devices. The step of estimating the levels of label noise may be performed in accordance with a computer program stored in a memory 23, executed by a processor 21 in conjunction with one or more interfaces 22 of moderator 20A, as illustrated by Figure 2A. Alternatively, the step of estimating the levels of label noise may be performed by estimator 25 of moderator 20B, as shown in Figure 2B.
In some embodiments, the moderator may use a verified data set when estimating the first level of label noise and the second level of label noise. The verified data set may be used to provide a benchmark data set, against which model updates generated using unverified local data at the local computing devices may be compared. In some embodiments, the verified data set may be publicly available. The verified data set may be hosted by the moderator, thereby providing easy access for the moderator to the verified data set; alternatively, the moderator may retrieve the verified data set from a server 28 separate from but connected to the moderator (as shown in Figure 2B).
In embodiments wherein the moderator uses a verified data set when estimating the label noise levels, the moderator may process data from the verified data set using the given model update to generate a given output, then reverse process the given output using the given model update to generate a given data impression corresponding to the given model update. The moderator may then compare the given data impression to the data from the verified data set and estimate the label noise for the given model update based on the divergence between the given data impression and the data from the verified data set.
An example of how the label noise on the model updates may be estimated, in accordance with embodiments, is as follows. In the following example, the public data set is (x_p, y_p); in this example the public data set is hosted at the moderator and is available to the local computing devices and the global computing device. In the example, the moderator first sends the initial global model G_0 to the local computing devices. Subsequently, at iteration i, the local computing devices update the model G_i by training the model using local data, and send model updates m_ij, j = 1, ..., N to the moderator.
The moderator then estimates the level of label noise in the local data used to generate the model updates. In examples according to embodiments, the level of label noise may be estimated using an anonymized data impression. Given a model M which relates between the input X (which is the local user data) and the output y, where X ∈ R^(M×N) is the set of the features available (with N being the number of features in the local user data) and y ∈ R^M, the anonymized data impression is an anonymized feature set X* which has the same properties as X. In order to determine X*, softmax values from the Dirichlet distribution are sampled. To make the obtained softmax values match a realistic scenario, a Class Similarity matrix is used to control the sampling distribution. The class similarity matrix contains important information on how similar the classes are to each other. If the classes are similar, typically the softmax values are concentrated uniformly over these labels, and vice-versa. The class similarity matrix is obtained by considering the weights of the last layer of the model (which has been trained using local user data). In general, any classification model has a fully connected layer with a softmax non-linearity as its final layer. If the classes are similar, similar weights are found on the connections from the previous layer to the nodes of those classes. The class similarity matrix (C) is obtained using C(i, j) = (w_i · w_j) / (||w_i|| ||w_j||), where w_i is the vector of the weights connecting the previous layer nodes to the class node i, and C ∈ R^(K×K) is the similarity matrix for the K classes in the data. When the class similarity matrix (C) has been obtained, the softmax values can be sampled as Softmax ~ Dir(K, C), where the concentration parameter of the distribution controls the spread of the softmax values over the classes in the data.
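An illustrative sketch of the class similarity matrix and the Dirichlet sampling step follows (not part of the application text; the positivity adjustment of the similarity values and the concentration parameter beta are assumptions, since Dirichlet parameters must be strictly positive).

```python
import numpy as np

def class_similarity_matrix(last_layer_weights):
    """Cosine similarity between the final-layer weight vectors of each class.

    last_layer_weights -- (num_previous_nodes, num_classes) weight matrix
    """
    W = last_layer_weights
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return (W.T @ W) / (norms.T @ norms)       # C[i, j], shape (K, K)

def sample_softmax_targets(C, k, beta=1.0, eps=1e-3):
    """Sample a softmax (probability) vector for class k from a Dirichlet
    distribution whose concentration is shaped by row k of the similarity matrix."""
    alpha = beta * np.clip(C[k], eps, None)    # Dirichlet parameters must be > 0
    return np.random.dirichlet(alpha)

# Example with a random 16-input, 3-class final layer
rng = np.random.default_rng(1)
C = class_similarity_matrix(rng.normal(size=(16, 3)))
y_target = sample_softmax_targets(C, k=0)
```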
When the softmax values have been determined, these values may then be used to generate the anonymised data impressions. Where the softmax values (Y_k) corresponding to class k are calculated as Y_k = [y_1k, y_2k, ..., y_Nk] ∈ R^(K×N), sampled from the Dirichlet distribution constructed as discussed above, the data impressions X* may be obtained by solving the optimization problem X* = arg min_X L_CE(y_ik, M(X)) using the model M and the softmax values Y_k. In order to solve the problem, the input X is set to a random input, and is then iterated until the change in the cross-entropy loss (L_CE) between two iterations is less than a threshold value. The process is repeated for each of the K categories, thereby obtaining data impressions for each class, which thereby provide anonymized data features for each class.
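As an illustrative sketch of this optimization (again assuming the single-layer softmax classifier used in the earlier sketch, so that the gradient of the cross-entropy loss with respect to the input can be written in closed form; the function name and stopping parameters are illustrative assumptions), a data impression for one sampled softmax target could be recovered as follows.

```python
import numpy as np

def generate_data_impression(W, y_target, lr=0.5, tol=1e-6, max_iters=5000, seed=0):
    """Optimize a synthetic input x so that the model's softmax output matches
    y_target, i.e. x* = argmin_x L_CE(y_target, softmax(W^T x)).

    W        -- (num_features, num_classes) weights of the (locally trained) model
    y_target -- (num_classes,) softmax values sampled from the Dirichlet distribution
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=W.shape[0])            # start from a random input
    prev_loss = np.inf
    for _ in range(max_iters):
        logits = W.T @ x
        p = np.exp(logits - logits.max())
        p /= p.sum()                           # softmax output of the model
        loss = -np.sum(y_target * np.log(p + 1e-12))
        if abs(prev_loss - loss) < tol:        # stop when the loss change is negligible
            break
        x -= lr * (W @ (p - y_target))         # closed-form gradient of L_CE w.r.t. x
        prev_loss = loss
    return x
```

Repeating this for targets sampled for each of the K classes yields the per-class anonymized data impressions referred to above.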
When the anonymized data impressions have been calculated, the difference between the input data (from the verified data set) and the anonymized data impressions may then be obtained.
To summarise the above example, the model updates m_ij, j = 1, ..., N from the N local computing devices are used to process verified data x_p from the verified data set to generate outputs y_ij, j = 1, ..., N. The outputs are then used to generate reconstructions of the input verified data (the reconstructions are the anonymized data impressions, X_ij, j = 1, ..., N). The anonymized data impressions X_ij, j = 1, ..., N may then be compared to the verified data x_p that was input, with the level of divergence D between the two calculated using D_ij = || X_ij − x_p ||^2 (hereinafter referred to as Equation 1).
The level of divergence D for a given model update correlates with the level of label noise in the data used to generate the model update, and hence correlates with how reliable the model update is. Accordingly, the level of label noise for local computing device j at iteration i may be obtained.
When the first level of label noise in the first local data used to generate the first model update and the second level of label noise in the second local data used to generate the second model update have been estimated (for example, as discussed above, along with any further levels of label noise for further sets of local data), weightings for the model updates may then be determined as shown in step S106 of Figure 1. The step of determining a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise may be performed in accordance with a computer program stored in a memory 23, executed by a processor 21 in conjunction with one or more interfaces 22 of moderator 20A, as illustrated by Figure 2A. Alternatively, the step of determining a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise may be performed by determinator 26 of moderator 20B, as shown in Figure 2B. Typically, the determined weighting for a given model update is inversely proportional to the estimated level of label noise corresponding to the model update, such that model updates generated using local data having more noisy labels are given lower weightings and are thus less influential when updating the global ML model.
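A minimal sketch of this weighting step is given below (not part of the application text; the exact normalisation is an assumption, the description only requiring that the weighting be inversely related to the estimated label noise).

```python
import numpy as np

def noise_based_weightings(data_impressions, x_p):
    """Equation 1 followed by inverse-noise weighting.

    data_impressions -- list of arrays X_ij, one anonymized impression per local device
    x_p              -- array of verified features the impressions are compared against
    """
    D = np.array([np.sum((X_ij - x_p) ** 2) for X_ij in data_impressions])  # Equation 1
    w = 1.0 / (D + 1e-12)       # more label noise -> larger divergence -> smaller weight
    return w / w.sum()          # normalized weightings for the model updates

weights = noise_based_weightings(
    [np.array([0.9, 1.1]), np.array([3.0, -2.0])], x_p=np.array([1.0, 1.0]))
```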
When the first and second weightings have been determined, the method continues with modifying the first model update using the first weighting and the second model update using the second weighting, as shown in step S108 of Figure 1. In some embodiments, the modification to the model updates may comprise appending the determined weighting on to the model update, or applying the determined weighting to the model update (for example, as a multiplier). The step of modifying the model updates may be performed in accordance with a computer program stored in a memory 23, executed by a processor 21 in conjunction with one or more interfaces 22 of moderator 20A, as illustrated by Figure 2A. Alternatively, the step of modifying the model updates may be performed by modifier 27 of moderator 20B, as shown in Figure 2B.
Subsequent to the modification of the model updates based on the weightings, the method continues with initiating transmission of the modified first model update and the modified second model update to the global computing device for use in updating the global ML model, as shown in step S110 of Figure 1. In some embodiments, the moderator may perform the transmission itself; alternatively, the moderator may instruct the transmission to be performed. Where the moderator forms part of the global computing device, the transmission may be within said device. Where the moderator is separate from but connected to the global computing device, any suitable wired or wireless transmission method may be used. The step of initiating transmission of the modified model updates may be performed in accordance with a computer program stored in a memory 23, executed by a processor 21 in conjunction with one or more interfaces 22 of moderator 20A, as illustrated by Figure 2A. Alternatively, the step of initiating transmission of the modified model updates may be performed by transceiver 24 of moderator 20B, as shown in Figure 2B.
In some embodiments, when the global computing device has been provided with the modified model updates (for example, modified first model update, modified second model update, modified third model update, and so on), the global computing device may generate an updated global ML model using the modified first model update, the modified second model update, and so on. By way of example, the updated global model may be generated in accordance with G_i = Σ_{j=1}^{N} w_ij m_ij (hereinafter referred to as Equation 2), using the nomenclature set out in the example above, where w_ij is the weighting determined for the model update m_ij from local computing device j at iteration i. In some embodiments, the global computing device may then initiate transmission (either by transmitting itself or by instructing a further device to transmit) of the updated global ML model to the local computing devices. The steps of generating the updated global ML model and initiating transmission of the updated global ML model may be performed in accordance with a computer program stored in a memory 35, executed by a processor 33 in conjunction with one or more interfaces 34 of global computing device 31, as illustrated by Figure 2C. The local computing devices may then use the updated global ML model and/or initiate a further training epoch of the updated global ML model as discussed above. The step of using and/or further training the updated global ML model may be performed in accordance with a computer program stored in a memory 38, executed by a processor 36 in conjunction with one or more interfaces 37 of local computing device 32, as illustrated by Figure 2D.
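A minimal sketch of the aggregation of Equation 2 is given below, assuming the modified updates already carry their weightings (as in the multiplier example above); the element-wise summation over state dicts is a reconstruction from context, not a definitive implementation.

```python
def aggregate_modified_updates(modified_updates):
    """Sum the weighted (modified) model updates into updated global parameters (sketch of Equation 2)."""
    device_ids = list(modified_updates)
    first = modified_updates[device_ids[0]]
    return {
        name: sum(modified_updates[j][name] for j in device_ids)
        for name in first
    }
```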
Figure 3 is a schematic diagram illustrating an example FL system in accordance with embodiments. Figure 4 is a sequence diagram corresponding to Figure 3. In the example shown in Figure 3 and Figure 4, the global user 31 (that is, the user of the global computing device) sends the initial global ML model to the local users 32A, 32B and 32C (collectively 32), that is, the users of the local computing devices (see S401, S402 and S403 in Figure 4). In the Figure 3 and Figure 4 example, local users 1, 2 and N are shown; as explained above, the number of local computing devices participating in the FL system varies depending on the specific implementation. The local users 32 then train the global ML model using (private) local data (see S404, S405 and S406) and send the model updates to the moderator 20 (see S407, S408 and S409). The local data is not sent with the model updates to the moderator. In the example FL system shown in Figure 3, the moderator is separate from the global computing device. The moderator then estimates the levels of label noise in the (private) local data used to generate the model updates (using Equation 1 as discussed above) and modifies the model updates accordingly (see S410 and S411). The modified model updates are then sent to the global computing device (see S412) and used to generate an updated global ML model using Equation 2 as discussed above (see S413), which may then be sent by the global user to the local users as shown in Figure 3 to restart the cycle.
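Purely for illustration, one round of the cycle of Figures 3 and 4 could be orchestrated as follows, reusing the illustrative helpers sketched above. The local_devices objects and their train_locally method are assumptions introduced only to make the sketch self-contained.

```python
def federated_round(global_model, local_devices, verified_x, num_classes):
    """One illustrative training round of the FL cycle in Figures 3 and 4."""
    # S401-S403: the global user sends the current global model to the local users;
    # S404-S409: each local user trains on private data and returns only an update.
    updates = {j: device.train_locally(global_model)
               for j, device in enumerate(local_devices)}

    # S410-S411: the moderator estimates label noise and modifies the updates.
    divergences = estimate_divergences(global_model, updates, verified_x, num_classes)
    weightings = determine_weightings(divergences)
    modified = {j: modify_update(u, weightings[j]) for j, u in updates.items()}

    # S412-S413: the modified updates are aggregated into the updated global model.
    global_model.load_state_dict(aggregate_modified_updates(modified))
    return global_model
```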
The plot in Figure 5 shows a comparison of the effectiveness of an FL system implementing a method in accordance with embodiments (solid line) and a known FL system not utilizing a method in accordance with embodiments (dashed line). In the known FL system, the model updates from the local computing devices are combined by simple averaging at the global computing device to generate the updated global ML model; there is no weighting or modification of the model updates. The x axis of the plot shows the number of training iterations, while the y axis of the plot shows the accuracy of the global ML model generated using the model updates from the local computing devices.
In the Figure 5 example, the private user data at the local computing devices includes three labels {l_1, l_2, l_3}. Label noise is introduced by artificially modifying the labels corresponding to the same features. The label noise is as follows (a sketch of this label-corruption step is given after the list):
User 1: Noise level of zero, i.e. the pattern of labels of this user is the same as the pattern in the verified dataset.
User 2: Noise level of 50%, i.e. 50% of labels are mislabeled manually.
User 3: Noise level of 30%, i.e. 30% of labels are mislabeled manually.
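A minimal sketch of how such label corruption could be injected for the experiment is shown below; the uniform relabelling to a different class and the fixed seed are assumptions made only to reproduce the style of noise described above.

```python
import numpy as np

def corrupt_labels(labels, noise_level, num_classes=3, seed=0):
    """Artificially mislabel a fraction `noise_level` of the labels (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n_noisy = int(noise_level * len(labels))
    for i in rng.choice(len(labels), size=n_noisy, replace=False):
        # Replace the true label with a different class chosen uniformly at random.
        labels[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return labels
```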
To generate the results, the global user shares the initial global model with all three local users. The model used is a 3-layer Convolutional Neural Network (CNN). The results in Figure 5 were obtained using 10 training iterations. The final accuracies obtained at the end of the 10 iterations with the different methods are as shown in Table 1 below:
As can be seen from Figure 5 and Table 1, the example method in accordance with embodiments results in higher accuracies when compared with typical FL. Embodiments may handle label-noise robustly (in particular, when compared to typical FL methods), and may provide high accuracy and rapid convergence.
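For concreteness, a 3-layer CNN of the kind referred to in the experiment above might be defined as in the sketch below; the channel counts, kernel sizes and number of input channels are assumptions and are not taken from the disclosure.

```python
import torch.nn as nn

class ThreeLayerCNN(nn.Module):
    """Illustrative 3-layer CNN; all layer sizes are assumptions."""

    def __init__(self, in_channels=3, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        # Flatten the pooled feature map before the final classification layer.
        return self.classifier(self.features(x).flatten(1))
```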
Embodiments may be utilised in any system wherein FL is employed. By way of example, embodiments may be utilised in systems wherein the first local computing device and second local computing device are controllers for autonomous vehicles, and wherein the global ML model is an autonomous vehicle control model. In such embodiments, the global ML model may be used in the control of the autonomous vehicles. Embodiments may be particularly well suited to this application; it is likely that autonomous vehicles in a given area will have been provided by different vendors. The vendors may use their own labels and use their domain expertise to label the data, leading to inconsistencies in the training data used by different local computing devices. To address the privacy and security concerns, FL may be used to create a global model. If the inherent label differences are not taken into account, it is likely that the performance of the global ML model generated by the FL system will not be satisfactory. Using methods in accordance with embodiments may help to circumvent potential data label issues, providing a satisfactorily performing global ML model.
As a further example, embodiments may also be used in systems wherein the first local computing device and second local computing device are communication network controllers,
and wherein the global ML model is a network anomaly detection model. In such embodiments, the global ML model may be used in the detection of network anomalies. Anomaly detection provided by AI can effectively augment and automate early detection, prediction, and decision-making regarding operations and business processes. When deviations are detected earlier, incidents may be resolved more rapidly, reducing the costs associated with the interruption. Embodiments may be used to identify incorrect or poisonous labels and to analyse multiple dimensions of data sources, looking at cell-, subscriber- and device-level Key Performance Indicators (KPIs), fault monitoring in network equipment, and correlating alerts across domains for noise reduction and root cause analysis.
It will be appreciated that examples of the present disclosure may be virtualised, such that the methods and processes described herein may be run in a cloud environment.
The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
In general, the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some embodiments may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as nonlimiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
As such, it should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be practiced in various components such as integrated circuit chips and modules. It should thus be appreciated that the exemplary embodiments of
this disclosure may be realized in an apparatus that is embodied as an integrated circuit, where the integrated circuit may comprise circuitry (as well as possibly firmware) for embodying at least one or more of a data processor, a digital signal processor, baseband circuitry and radio frequency circuitry that are configurable so as to operate in accordance with the exemplary embodiments of this disclosure.
It should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. As will be appreciated by one of skill in the art, the function of the program modules may be combined or distributed as desired in various embodiments. In addition, the function may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like.
References in the present disclosure to “one embodiment”, “an embodiment” and so on, indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
It should be understood that, although the terms “first”, “second” and so on may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of the disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed terms.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms “a”,
“an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof. The terms “connect”, “connects”, “connecting” and/or “connected” used herein cover the direct and/or indirect connection between two elements.
The present disclosure includes any novel feature or combination of features disclosed herein either explicitly or any generalization thereof. Various modifications and adaptations to the foregoing exemplary embodiments of this disclosure may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings. However, any and all modifications will still fall within the scope of the non-limiting and exemplary embodiments of this disclosure. For the avoidance of doubt, the scope of the disclosure is defined by the claims.
Claims
1. A computer-implemented method for use in the updating of a global machine learning, ML model in a federated learning system, the method comprising: receiving (S102) a first model update from a first local computing device of the federated learning system and a second model update from a second local computing device of the federated learning system; estimating (S104) a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update; determining (S106) a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise; modifying (S108) the first model update using the first weighting and the second model update using the second weighting; and initiating transmission (S110) of the modified first model update and the modified second model update to a global computing device for use in updating the global ML model.
2. The method of claim 1, wherein a verified data set is used when estimating the first level of label noise and the second level of label noise.
3. The method of claim 2, wherein a moderator (20) hosts the verified data set, or wherein the verified data set is retrieved from a server (28).
4. The method of any of claims 2 and 3 wherein, when estimating a given level of label noise, data from the verified data set is processed using the given model update to generate a given output, then the given output is reverse processed using the given model update to generate a given data impression corresponding to the given model update.
5. The method of claim 4 wherein, when estimating the given level of label noise, the given data impression is compared to the data from the verified data set, and the label noise for the given model update is estimated based on the divergence between the given data impression and the data from the verified data set.
6. The method of any preceding claim further comprising: by the global computing device, initiating transmission of an initial global model to the first local computing device and the second local computing device;
by the first local computing device, training the initial global model using first local data to generate the first model update; and by the second local computing device, training the initial global model using second local data to generate the second model update.
7. The method of claim 6, wherein the first local data is kept confidential at the first local computing device, and the second local data is kept confidential at the second local computing device.
8. The method of any preceding claim further comprising, at the global computing device, generating an updated global model using the modified first model update and the modified second model update.
9. The method of claim 8 further comprising, at the global computing device, initiating transmission of the updated global model to at least one of the first local computing device and the second local computing device.
10. The method of any preceding claim, wherein the first model update comprises first node weights and first biases, and the second model update comprises second node weights and second biases.
11. The method of any preceding claim, wherein a moderator forms part of the global computing device, or wherein the moderator is separate from but connected to the global computing device.
12. The method of any preceding claim, wherein the first local computing device and second local computing device are controllers for autonomous vehicles, and wherein the global ML model is an autonomous vehicle control model.
13. The method of claim 12 further comprising, at one or more of the first local computing device and second local computing device, using the global ML model in the control of autonomous vehicles.
14. The method of any of claims 1 to 11, wherein the first local computing device and second local computing device are communication network controllers, and wherein the global ML model is a network anomaly detection model.
15. The method of claim 14 further comprising, at one or more of the first local computing device and second local computing device, using the global ML model in the detection of network anomalies.
16. A moderator (20A) for use in the updating of a global machine learning, ML model in a federated learning system, the moderator comprising processing circuitry (21), one or more interfaces (22) and a memory (23) containing instructions executable by the processing circuitry (21), whereby the moderator (20A) is operable to: receive a first model update from a first local computing device (32A) of the federated learning system and a second model update from a second local computing device (32B) of the federated learning system; estimate a first level of label noise in first local data used to generate the first model update and a second level of label noise in second local data used to generate the second model update; determine a first weighting based on the estimated first level of label noise and a second weighting based on the estimated second level of label noise; modify the first model update using the first weighting and the second model update using the second weighting; and initiate transmission of the modified first model update and the modified second model update to a global computing device (31) for use in updating the global ML model.
17. The moderator (20A) of claim 16, wherein the moderator (20A) is configured to use a verified data set to estimate the first level of label noise and the second level of label noise.
18. The moderator (20A) of claim 17, wherein the moderator (20A) hosts the verified data set, or wherein the moderator (20A) is configured to retrieve the verified data set from a server (28).
19. The moderator (20A) of any of claims 17 and 18, wherein the moderator (20A) is configured, when estimating a given level of label noise, to process data from the verified data set using the given model update to generate a given output, and to reverse process the given output using the given model update to generate a given data impression corresponding to the given model update.
20. The moderator (20A) of claim 19, wherein the moderator (20A) is configured, when estimating the given level of label noise, to compare the given data impression to the data from the verified data set, and to estimate the label noise for the given model update based on the divergence between the given data impression and the data from the verified data set.
21. A federated learning system comprising the moderator (20A) of any of claims 16 to 20, and further comprising the global computing device (31), the first local computing device (32A) and the second local computing device (32B), wherein: the global computing device (31) is configured to initiate transmission of an initial global model to the first local computing device (32A) and the second local computing device (32B); the first local computing device (32A) is configured to train the initial global model using first local data to generate the first model update; and the second local computing device (32B) is configured to train the initial global model using second local data to generate the second model update.
22. The federated learning system of claim 21, wherein the first local computing device (32A) is configured to keep the first local data confidential at the first local computing device (32A), and the second local computing device (32B) is configured to keep the second local data confidential at the second local computing device (32B).
23. The federated learning system of any of claims 21 and 22, wherein the global computing device (31) is configured to generate an updated global model using the modified first model update and the modified second model update.
24. The federated learning system of claim 23, wherein the global computing device (31) is configured to initiate transmission of the updated global model to at least one of the first local computing device (32A) and the second local computing device (32B).
25. The federated learning system of any of claims 21 to 24, wherein the first model update comprises first node weights and first biases, and the second model update comprises second node weights and second biases.
26. The federated learning system of any of claims 21 to 25, wherein the moderator (20) forms part of the global computing device (31), or wherein the moderator (20) is separate from but connected to the global computing device (31).
27. The federated learning system of any of claims 21 to 26, wherein the first local computing device (32A) and second local computing device (32B) are controllers for autonomous vehicles, and wherein the global ML model is an autonomous vehicle control model.
28. The federated learning system of claim 27, wherein one or more of the first local computing device (32A) and second local computing device (32B) are configured to use the global ML model in the control of autonomous vehicles.
29. The federated learning system of any of claims 21 to 26, wherein the first local computing device (32A) and second local computing device (32B) are communication network controllers, and wherein the global ML model is a network anomaly detection model.
30. The federated learning system of claim 29, wherein one or more of the first local computing device (32A) and second local computing device (32B) are configured to use the global ML model in the detection of network anomalies.
31. A computer-readable medium comprising instructions which, when executed on a computer, cause the computer to perform a method in accordance with any of claims 1 to 15.