
EP4566008A1 - Methods for training models in a federated system - Google Patents

Methods for training models in a federated system

Info

Publication number
EP4566008A1
Authority
EP
European Patent Office
Prior art keywords
model
computing device
client computing
computer
data
Prior art date
Legal status
Pending
Application number
EP23745545.6A
Other languages
German (de)
French (fr)
Inventor
Tao Xu
Current Assignee
F Hoffmann La Roche AG
Original Assignee
F Hoffmann La Roche AG
Priority date
Filing date
Publication date
Application filed by F Hoffmann La Roche AG
Publication of EP4566008A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning

Definitions

  • the present disclosure relates generally to machine learning. More particularly, the present disclosure relates to federated learning. Moreover, the present invention relates to a system for federated learning, a computer program and a computer-readable medium. Background of the invention: Access to high-quality data is one of the key components for the success of building machine learning models. However, owing to increasingly strict regulations on data protection and tighter controls on data privacy, it is becoming more difficult to use a centralized approach to collect all data under one party. For example, healthcare datasets are often siloed across the different providers in the healthcare system, as the data collectors usually have concerns about the intended use of the data, and sharing it in a central system would compromise access control.
  • Federated data systems were developed to solve these data access problems.
  • a user on one node can generate insights and build models without direct access to the data of other nodes, so data does not need to be moved to a central place for computation. Only model parameters and summary statistics are communicated in this setting. However, this comes at a cost: many traditional modeling methods are not applicable in this setting, since they need direct access to the datasets. Models have to be trained in each node separately, and the challenge is how the final model can be built from the individual models. It is therefore desirable to provide methods and devices which address the above-mentioned technical challenges.
  • the present invention is directed to providing a method for aggregating model parameters from individual models that demonstrates robust performance across the datasets on individual nodes.
  • One example aspect of the present disclosure is directed to a computer-implemented method for model aggregation in a federated system.
  • the method comprises obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model.
  • the method comprises training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the method comprises communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device.
  • the method comprises computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average from the original parameter matrices, and a low-rank approximation of the residual matrix is calculated via singular value decomposition.
  • the method comprises outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo-inverse of the sum of the global parameter and said low-rank approximated residual matrix.
  • Another example aspect of the present disclosure is directed to a computer-implemented method for model aggregation in a federated system.
  • the method comprises obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device.
  • the method comprises training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the method comprises communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device.
  • the method comprises computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices [Pg,1,...,Pg,N], and a residual parameter matrix [Rg,1,...,Rg,N] is calculated by subtracting the average parameter matrix (P̄g) from said updated matrices [Pg,1,...,Pg,N], and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1',...,Rg,N']).
  • the method comprises outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo-inverse of the sum of the global parameter (P̄g) and said low-rank approximated residual matrices [Rg,1',...,Rg,N'].
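  • for readability, one way of writing the aggregation rule described in the two aspects above in explicit notation is sketched below; this is a reconstruction of the claim language, and the exact normalization applied to the pseudo-inverse is not further specified in this text.

% Sketch of the aggregation rule (reconstruction, not a definitive formulation)
\bar{P}_g = \frac{1}{N}\sum_{i=1}^{N} P_{g,i}                       % global parameter: average of the client updates
R_{g,i} = P_{g,i} - \bar{P}_g, \qquad R = [R_{g,1},\dots,R_{g,N}]   % residual parameter matrix
R = U \Sigma V^{\top}, \qquad R' = U_k \Sigma_k V_k^{\top}          % low-rank (top-k) SVD approximation
P_{\mathrm{agg}} \propto \bigl(\bar{P}_g + R'\bigr)^{+}             % normalized pseudo-inverse of the sum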
  • Another example aspect of the present disclosure is directed to a client computing device. The client device includes at least one processor and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations.
  • the operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model.
  • the operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the operations comprise communicating, by said client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device.
  • Another example aspect of the present disclosure is directed to a client computing device.
  • the client device includes at least one processor and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations.
  • the operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device.
  • the operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the operations comprise communicating, by said client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device.
  • Another example aspect of the present disclosure is directed to at least one computer-readable medium that stores instructions that, when executed by a client computing device, cause the client computing device to perform operations.
  • the operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model.
  • the operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the operations comprise communicating, by said client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device.
  • Another example aspect of the present disclosure is directed to at least one computer-readable medium that stores instructions that, when executed by a client computing device, cause the client computing device to perform operations.
  • the operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device.
  • the operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the operations comprise communicating, by said client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device.
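  • as an illustration only, a minimal sketch of the client-side operations described above is given below; it assumes an ordinary-least-squares model as a stand-in for the model type named in the training specification, and the function and variable names are hypothetical rather than part of this disclosure.

import numpy as np

def client_round(global_params, X, y):
    """Hypothetical client-side step: train on the local dataset (which never
    leaves the device) and return updated parameters plus evaluation metrics."""
    # global_params could seed an iterative solver; it is unused in this
    # closed-form ordinary-least-squares stand-in.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Goodness of fit (R^2) as one possible model evaluation metric.
    ss_res = float(np.sum((y - X @ beta) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    metrics = {"r2": 1.0 - ss_res / ss_tot}
    # Only parameters and metrics are communicated to the server.
    return beta, metrics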
  • Other aspects of the present disclosure are directed to various systems, apparatuses, computer readable media, user interfaces, and electronic devices.
  • Figure 1 depicts a schema of an example federated learning system according to example embodiments of the present disclosure.
  • Figure 2 depicts a block diagram of an example computing system according to example embodiments of the present disclosure.
  • Figure 3 depicts a flow chart diagram of an example model aggregation process according to example embodiments of the present disclosure.
  • Figure 4 depicts a flow chart diagram of an example model aggregation process according to example embodiments of the present disclosure.
  • Figure 5 depicts the experimental setup of an assessment to evaluate the performance of the method of the invention.
  • Figure 6 shows the results of applying the method of the invention to different train/test splits.
  • the x- axis indicates the clinical trial that was used as the testing case.
  • the other trials were used as training data for the corresponding testing case.
  • Figure 7 shows an example in which an LRD-based model of the invention performs better than the model created on the pooled dataset.
  • the terms “have”, “comprise” or “include” or any arbitrary grammatical variations thereof are used in a non-exclusive way.
  • these terms may both refer to a situation in which, besides the feature introduced by these terms, no further features are present in the entity described in this context and to a situation in which one or more further features are present.
  • the expressions “A has B”, “A comprises B” and “A includes B” may both refer to a situation in which, besides B, no other element is present in A (i.e. a situation in which A solely and exclusively consists of B) and to a situation in which, besides B, one or more further elements are present in entity A, such as element C, elements C and D or even further elements.
  • the terms “at least one”, “one or more” or similar expressions indicating that a feature or element may be present once or more than once typically will be used only once when introducing the respective feature or element.
  • the expressions “at least one” or “one or more” will not be repeated, notwithstanding the fact that the respective feature or element may be present once or more than once.
  • the terms “preferably”, “more preferably”, “particularly”, “more particularly”, “specifically”, “more specifically” or similar terms are used in conjunction with optional features, without restricting alternative possibilities. Thus, features introduced by these terms are optional features and are not intended to restrict the scope of the claims in any way.
  • machine learning as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to a method of using artificial intelligence (AI) for automatically building analytical models.
  • machine learning system as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to a system comprising at least one processing unit such as a processor, microprocessor, or computer system configured for machine learning, in particular for executing a logic in a given algorithm.
  • the machine learning system may be configured for performing and/or executing at least one machine learning algorithm, wherein the machine learning algorithm is configured for building the at least one analysis model based on the training data.
  • the term “mobile device” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term may specifically refer, without limitation, to a mobile electronics device, more specifically to a mobile communication device comprising at least one processor.
  • the mobile device may specifically be a cell phone or smartphone.
  • the mobile device may also refer to a tablet computer or any other type of portable computer.
  • the mobile device may comprise a data acquisition unit which may be configured for data acquisition.
  • the mobile device may be configured for detecting and/or measuring, either quantitatively or qualitatively, physical parameters and transforming them into electronic signals, such as for further processing and/or analysis.
  • the mobile device may comprise at least one sensor.
  • the sensor may be at least one sensor selected from the group consisting of: at least one gyroscope, at least one magnetometer, at least one accelerometer, at least one proximity sensor, at least one thermometer, at least one pedometer, at least one fingerprint detector, at least one touch sensor, at least one voice recorder, at least one light sensor, at least one pressure sensor, at least one location data detector, at least one camera, at least one GPS, and the like.
  • the mobile device may comprise the processor and at least one database as well as software which is tangibly embedded to said device and, when running on said device, carries out a method for data acquisition.
  • the mobile device may comprise a user interface, such as a display and/or at least one key, e.g. for performing at least one task requested in the method for data acquisition.
  • the term “predicting” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to determining at least one numerical or categorical value indicative of the disease status for the at least one state variable.
  • the state variable may be filled in the analysis as input and the analysis model may be configured for performing at least one analysis on the state variable for determining the at least one numerical or categorical value indicative of the disease status.
  • the analysis may comprise using the at least one trained algorithm.
  • input data as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to experimental data used for model building.
  • the input data comprises the set of historical digital biomarker feature data.
  • biomarker as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to a measurable characteristic of a biological state and/or biological condition.
  • feature as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to a measurable property and/or characteristic of a symptom of the disease on which the prediction is based. In particular, all features from all tests may be considered and the optimal set of features for each prediction is determined.
  • digital biomarker feature data is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to experimental data determined by at least one digital device such as by a mobile device which comprises a plurality of different measurement values per subject relating to symptoms of the disease.
  • the digital biomarker feature data may be determined by using at least one mobile device. With respect to the mobile device and determining of digital biomarker feature data with the mobile device reference is made to the description of the determination of the state variable with the mobile device above.
  • the set of historical digital biomarker feature data comprises a plurality of measured values per subject indicative of the disease status to be predicted.
  • the term “historical” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to the fact that the digital biomarker feature data was determined and/or collected before model building such as during at least one test study.
  • the input data may be determined in at least one active test and/or in at least one passive monitoring.
  • the input data may be determined in an active test using at least one mobile device, such as at least one cognition test and/or at least one hand motor function test and/or at least one mobility test.
  • machine learning model or “machine-learned model” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to at least one trainable algorithm.
  • the model unit may comprise a plurality of machine learning models, e.g. different machine learning models for building the regression model and machine learning models for building the classification model.
  • the analysis model may be a regression model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: Generalized linear models (GLM); partial least-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN).
  • the analysis model may be a classification model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN).
  • training data set as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a subset of the input data used for training the machine learning model.
  • test data set as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to another subset of the input data used for testing the trained machine learning model.
  • the training data set may comprise a plurality of training data sets.
  • the training data set comprises a training data set per subject of the input data.
  • the test data set may comprise a plurality of test data sets.
  • the test data set comprises a test data set per subject of the input data.
  • the processing unit may be configured for generating and/or creating per subject of the input data a training data set and a test data set, wherein the test data set per subject may comprise data only of that subject, whereas the training data set for that subject comprises all other input data.
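  • a minimal sketch of the per-subject split described above (a leave-one-subject-out scheme) is given below, assuming the input data is held as a list of (subject_id, features, target) records; the names are illustrative only and not part of this disclosure.

def leave_one_subject_out(records):
    """Yield (train, test) pairs in which the test set contains only one
    subject's data and the training set contains all other subjects' data."""
    subjects = sorted({subject_id for subject_id, _, _ in records})
    for held_out in subjects:
        test = [r for r in records if r[0] == held_out]
        train = [r for r in records if r[0] != held_out]
        yield train, test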
  • performance as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to suitability of the determined analysis model for predicting the target variable. The performance may be characterized by deviations between predicted target variable and true value of the target variable.
  • the machine learning system may comprise at least one output interface.
  • the output interface may be designed identical to the communication interface and/or may be formed integral with the communication interface.
  • the output interface may be configured for providing at least one output.
  • the output may comprise at least one information about the performance of the determined analysis model.
  • the information about the performance of the determined analysis model may comprise one or more of at least one scoring chart, at least one predictions plot, at least one correlations plot, and at least one residuals plot.
  • the model unit may comprise a plurality of machine learning models, wherein the machine learning models are distinguished by their algorithm.
  • the model unit may comprise the following algorithms: Generalized linear models (GLM); partial least-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN).
  • the model unit may comprise the following algorithms: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN).
  • the processing unit may be configured for determining an analysis model for each of the machine learning models by training the respective machine learning model with the training data set and for predicting the target variables on the test data set using the determined analysis models.
  • training the machine learning model is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning.
  • the term specifically may refer, without limitation, to a process of determining parameters of the algorithm of machine learning model on the training data set.
  • the training may comprise at least one optimization or tuning process, wherein a best parameter combination is determined.
  • the training may be performed iteratively on the training data sets of different subjects.
  • the processing unit may be configured for considering different numbers of features for determining the analysis model by training the machine learning model with the training data set.
  • the algorithm of the machine learning model may be applied to the training data set using a different number of features, e.g. depending on their ranking.
  • the training may comprise n-fold cross validation to get a robust estimate of the model parameters.
  • the training of the machine learning model may comprise at least one controlled learning process, wherein at least one hyper-parameter is chosen to control the training process. If necessary, the training step is repeated to test different combinations of hyper-parameters.
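  • as one possible realisation of the controlled learning process with n-fold cross validation described above, the sketch below uses scikit-learn and a ridge regression with a small hyper-parameter grid; the library choice, the synthetic data, and all names are assumptions for illustration, not part of this disclosure.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic stand-in for a local training data set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = X_train @ np.array([1.0, 2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

# The hyper-parameter grid controls the training process; 5-fold cross
# validation gives a robust performance estimate for each combination.
search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)
best_model = search.best_estimator_  # best hyper-parameter combination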
  • the terms “computer-readable data carrier” and “computer-readable storage medium” specifically may refer to non-transitory data storage means, such as a hardware storage medium having stored thereon computer-executable instructions.
  • the computer-readable data carrier or storage medium specifically may be or may comprise a storage medium such as a random-access memory (RAM) and/or a read-only memory (ROM).
  • a computer program product having program code means, in order to perform the method according to the present invention in one or more of the embodiments enclosed herein when the program is executed on a computer or computer network.
  • the program code means may be stored on a computer-readable data carrier and/or on a computer-readable storage medium.
  • a data carrier having a data structure stored thereon, which, after loading into a computer or computer network, such as into a working memory or main memory of the computer or computer network, may execute the method according to one or more of the embodiments disclosed herein.
  • a computer program product with program code means stored on a machine-readable carrier, in order to perform the method according to one or more of the embodiments disclosed herein, when the program is executed on a computer or computer network.
  • a computer program product refers to the program as a tradable product.
  • the product may generally exist in an arbitrary format, such as in a paper format, or on a computer-readable data carrier and/or on a computer-readable storage medium.
  • the computer program product may be distributed over a data network.
  • a modulated data signal which contains instructions readable by a computer system or computer network, for performing the method according to one or more of the embodiments disclosed herein.
  • the present disclosure provides systems and methods for model aggregation in federated learning systems.
  • the method can be applied in a broad area for building models where data privacy is a major concern and/or where data transfer is costly.
  • federated learning is a machine learning framework that enables training of a high-quality centralized model based on training data distributed over a large number of client computing devices.
  • the clients typically have low computational power and/or slow/unstable connections to the network.
  • Federated learning may also be referred to as "federated optimization" in certain instances. Federated learning is often used in scenarios where direct data access is not feasible.
  • the challenge is how to build a global model based on the individual local models.
  • the present invention as disclosed herein provides a method which solves this problem by aggregating local models trained on individual client computing devices.
  • the method is efficient as it minimizes the communication needed between the server and client computing devices.
  • the method also helps to improve the generalizability of the global model to unseen data.
  • the method for model aggregation in a federated system as disclosed herein is advantageous over other methods in the art in several aspects. For instance, unlike the methods proposed in the art, which require communication across all client devices for each iteration of training, the method of the present invention only requires model combination after the individual models have been trained on the client devices, so the communication during the model fitting step can be minimized.
  • the methods in the art utilize the Federated Average approach or a variation thereof, such as a weighted average approach, for aggregating the coefficients.
  • the method of the present invention utilizes a fundamentally different approach, which is based on finding a "best fit" to a system of linear equations, where each equation may define a certain aspect of the underlying ground truth based on the observed local data. This concept is entirely different from the averaging approach.
  • the model selection criteria implemented in the method of the invention guarantee the quality of the input for model aggregation; they also check for other potential data issues on the nodes, which may contribute to biases in the model. The data selection and training data assumption check is a good approach to make sure the model is trained for its intended use.
  • except for the model aggregation step, all the other calculations are done in the local environments. No intermediate results need to be communicated before the model aggregation step, so there is no risk of a data privacy breach compared to other methods, which either need communication of intermediate results, e.g. the gradients, or an anonymized dataset. Additionally, models built by this method also show good generalizability compared to some modeling methods (e.g. empirical risk minimization based methods) when a full dataset can be accessed.
  • the method can be developed as a software module for a federated data management and analytics system, allowing building up models using the datasets that the system has access to.
  • the methods of the invention described herein for model aggregation in a federated system comprise the following method steps which, specifically, may be performed in the given order. Still, a different order is also possible. It is further possible to perform two or more of the method steps fully or partially simultaneously.
  • each client computing device obtains information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; each of said client computing devices then trains the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; each client computing device communicates information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; said server computing device computes a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices [Pg,1,...,Pg,N], and a residual parameter matrix [Rg,1,...,Rg,N] is calculated by subtracting the average parameter matrix (P̄g) from said updated matrices, and a low-rank approximation of the residual matrix [Rg,1',...,Rg,N'] is calculated via singular value decomposition; the server computing device then outputs an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo-inverse of the sum of the global parameter (P̄g) and said low-rank approximated residual matrices [Rg,1',...,Rg,N'].
  • the information specifying the model training process comprises the type of machine-learned model, such as a generalized linear model or a Cox proportional hazard model.
  • the information specifying the model training process comprises the variables for model training, such as age, gender, or biomarker values.
  • the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset.
  • the set of model evaluation metrics comprises the performance metrics returned to the server computing device.
  • the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R²), AUC (Area Under the Curve) of ROC (Receiver Operating Characteristic) from cross validation, and/or the acceptance criteria.
  • the acceptance criterion for the goodness of fit is a value greater than 0.6.
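  • as an illustration of how such an acceptance criterion could be applied on the server side before aggregation, a minimal sketch is given below; the layout of the metrics dictionary and the function name are assumptions, not part of this disclosure.

def select_models(client_results, r2_threshold=0.6):
    """Keep only client models whose reported goodness of fit meets the
    acceptance criterion; the remaining models are excluded from aggregation."""
    return [
        params
        for params, metrics in client_results
        if metrics.get("r2", 0.0) > r2_threshold
    ]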
  • the federated system comprises a network of hospitals.
  • at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument.
  • the method can be used in a hospital network where patient electronic medical records (EMR) are not directly shared with each other but are managed in a federated system. It can be used to build clinical prediction models using data that resides in different hospitals which have data firewalls in between.
  • a motivating example for federated learning arises when the training data is kept locally on users’ mobile computing devices and such mobile computing devices are used as nodes for performing computation on their local data in order to update a global model.
  • the federated learning framework differs from conventional distributed machine learning due to the large number of clients, data that is highly unbalanced and not independent and identically distributed (“IID"), and unreliable network connections.
  • the method can be used for personal health data that are collected on an edge device, such as a smartphone or a smart watch, but not gathered by any centralized data system.
  • the method can help to build individual models on each edge device and then aggregate individual models to get insights at a population level.
  • the method can be used for building models that can help to identify system anomalies. For example, for the diagnostic instruments that have been deployed, we may need to get data regarding the system running status to detect abnormalities or predict potential abnormalities in the near future.
  • a client computing device comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the methods of the invention described herein for model aggregation in a federated system.
  • a server computing device comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the server computing device to perform the methods of the invention described herein for model aggregation in a federated system.
  • a system for federated learning comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the methods of the invention described herein for model aggregation in a federated system.
  • a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the methods of the invention described herein for model aggregation in a federated system.
  • a computer-readable medium having stored thereon said computer program product.
  • the aggregation of models may include the three steps as described below.
  • in step i, average parameters Āc for a model including n parameters are calculated based on A, which represents a D × n matrix of D sets of model parameters returned from D client computing devices in the federated system.
  • in step ii, singular value decomposition (SVD) is performed on the residual matrix of A after extracting the average of the parameters Āc. Only the top k components from the SVD will be retained to reconstruct a new residual matrix, As'.
  • in step iii, the aggregated model parameters are obtained as a normalized pseudo-inverse of the sum of the average parameters Āc and the reconstructed residual matrix As'.
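  • a minimal numpy sketch of the three steps is given below. It assumes A is the D × n matrix of returned parameter sets; because the text does not specify how the "normalized pseudo inverse" is reduced to a single parameter vector, the final combination and normalization shown here are one possible reading, not a definitive implementation.

import numpy as np

def aggregate(A, k=1):
    """Aggregate D locally trained parameter sets (rows of the D x n matrix A)
    into a single parameter vector, following steps i-iii above."""
    # Step i: average parameters across the D client models.
    a_bar = A.mean(axis=0)
    # Step ii: SVD of the mean-adjusted (residual) matrix; keep the top k components.
    R = A - a_bar
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Step iii: pseudo-inverse of the sum of the average and the reconstructed
    # residual matrix; collapsing the n x D result over clients and normalizing
    # it is an assumption made for this sketch.
    M = a_bar + R_hat                   # a_bar broadcasts over the D rows
    p = np.linalg.pinv(M).sum(axis=1)
    return p / np.linalg.norm(p)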
  • Example Systems: Figure 1 depicts an example system 100 for aggregating a multiplicity of machine-learned models trained using respective training data 108A, 108B, 108C stored locally on a plurality of client devices 102A, 102B, 102C.
  • System 100 can include a server device 104.
  • Server 104 can be configured to access machine learning model 106, and to provide model 106 to a plurality of client devices 102A, 102B, 102C.
  • Model 106 can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other suitable model.
  • server 104 can be configured to communicate with client devices 102A, 102B, 102C over one or more networks.
  • Client devices 102A, 102B, 102C can each be configured to determine one or more local updates associated with model 106 based at least in part on training data 108A, 108B, 108C.
  • training data 108A, 108B, 108C can be data that is respectively stored locally on the client devices 102A, 102B, 102C.
  • the training data 108A, 108B, 108C can include audio files, image files, video files, a typing history, location history, and/or various other suitable data.
  • the training data can be any data derived through a user interaction with a client device 102A, 102B, 102C.
  • a user may be provided with controls allowing the user to make an election as to both if and when systems, programs or features described herein may enable collection, storage, and/or use of user information (e.g., training data 108A, 108B, 108C), and if the user is sent content or communications from a server.
  • certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed.
  • a user’s identity may be treated so that no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.
  • location information such as to a city, ZIP code, or state level
  • the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
  • the training data 108A, 108B, 108C consists of data that is respectively stored at each device 102A, 102B, 102C.
  • the training data (108A, 108B, 108C) is highly unbalanced and not independent and identically distributed.
  • Client devices 102A, 102B, 102C can be configured to provide the local updates to server 104.
  • training data 108A, 108B, 108C may be privacy sensitive.
  • the local updates can be performed and provided to server 104 without compromising the privacy of training data 108A, 108B, 108C.
  • training data 108A, 108B, 108C is not provided to server 104.
  • the local update does not include training data 108A, 108B, 108C.
  • some privacy sensitive data may be able to be derived or inferred from the model parameters.
  • server 104 can receive each local update from client device 102A, 102B, 102C, and can aggregate the local updates to determine a global update to the model 106.
  • Figure 2 depicts an example computing system 200 that can be used to implement the methods and systems of the present disclosure.
  • the system 200 can be implemented using a client-server architecture that includes a server 210 that communicates with one or more client devices 230 over a network 242.
  • Figure 2 provides an example system 200 that can implement the scheme illustrated by system 100 of Figure 1.
  • the system 200 includes a server 210, such as a web server.
  • the server 210 can be implemented using any suitable computing device(s).
  • the server 210 can have one or more processors 212 and one or more memory devices 214.
  • the server 210 can be implemented using one server device or a plurality of server devices. In implementations in which a plurality of devices are used, such plurality of devices can operate according to a parallel computing architecture, a sequential computing architecture, or a combination thereof.
  • the server 210 can also include a network interface used to communicate with one or more client devices 230 over the network 242.
  • the network interface can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • the one or more processors 212 can include any suitable processing device, such as a microprocessor, microcontroller, integrated circuit, logic device, or other suitable processing device.
  • the one or more memory devices 214 can include one or more computer-readable media, including, but not limited to, non-transitory computer-readable media, RAM, ROM, hard drives, flash drives, or other memory devices.
  • the one or more memory devices 214 can store information accessible by the one or more processors 212, including computer-readable instructions 216 that can be executed by the one or more processors 212.
  • the instructions 216 can be any set of instructions that when executed by the one or more processors 212, cause the one or more processors 212 to perform operations.
  • the instructions 216 can be executed by the one or more processors 212 to implement a global updater 220.
  • the global updater 220 can be configured to receive one or more local updates and to determine a global model based at least in part on the local updates.
  • the instructions 216 can further include instructions that cause the server 210 to implement a decoder 222.
  • the decoder 222 can decode an update that has been encoded by a client device 230 (e.g., according to one of the encoding techniques known in the art such as subsampling, quantization, random rotation, etc.).
  • the one or more memory devices 214 can also store data 218 that can be retrieved, manipulated, created, or stored by the one or more processors 212.
  • the data 218 can include, for instance, local updates, global parameters, and other data.
  • the data 218 can be stored in one or more databases.
  • the one or more databases can be connected to the server 210 by a high bandwidth LAN or WAN, or can also be connected to server 210 through network 242.
  • the one or more databases can be split up so that they are located in multiple locales.
  • the server 210 can exchange data with one or more client devices 230 over the network 242. Any number of client devices 230 can be connected to the server 210 over the network 242.
  • Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, smartphone, tablet, wearable computing device, gaming console, a display with one or more processors, or other suitable computing device. Similar to the server 210, a client device 230 can include one or more processor(s) 232 and a memory 234. The one or more processor(s) 232 can include, for example, one or more central processing units (CPUs), graphics processing units (GPUs) dedicated to efficiently rendering images or performing other specialized calculations, and/or other processing devices.
  • the memory 234 can include one or more computer-readable media and can store information accessible by the one or more processors 232, including instructions 236 that can be executed by the one or more processors 232 and data 238.
  • the instructions 236 can include instructions for implementing a local updater configured to determine one or more local updates according to example aspects of the present disclosure.
  • the local updater can perform one or more training techniques such as, for example, backwards propagation of errors to re-train or otherwise update the model based on the locally stored training data.
  • the local updater can be configured to perform structured updates, sketched updates, or other techniques.
  • the local updater can be included in an application or can be included in the operating system of the device 230.
  • the instructions 236 can further include instructions for implementing an encoder.
  • the encoder can perform one or more of the encoding techniques known in the art (e.g., subsampling, quantization, random rotation, etc.).
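  • purely as an illustration of one of the encoding techniques mentioned above (uniform quantization), a minimal encode/decode sketch is given below; the bit width, the scheme, and all names are assumptions for illustration and not part of this disclosure.

import numpy as np

def quantize(update, num_bits=8):
    """Uniformly quantize a parameter update vector to reduce its upload size."""
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0   # avoid a zero scale
    codes = np.round((update - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Server-side decoder: reconstruct an approximate update vector."""
    return codes.astype(np.float64) * scale + lo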
  • the data 238 can include one or more training data examples to be used in solving one or more optimization problems.
  • the training data examples can be distributed unevenly among the client devices 230, such that no client device 230 includes a representative sample of the overall distribution of the training data examples.
  • the data 238 can further include updated parameters to be communicated to the server 210.
  • the client device 230 of Figure 2 can include various input/output devices for providing and receiving information from a user, such as a touch screen, touch pad, data entry keys, speakers, and/or a microphone suitable for voice recognition.
  • the client device 230 can also include a network interface used to communicate with one or more remote computing devices (e.g. server 210) over the network 242.
  • the network interface can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components.
  • the network 242 can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof.
  • the network 242 can also include a direct connection between a client device 230 and the server 210. In general, communication between the server 210 and a client device 230 can be carried via the network interface using any type of wired and/or wireless connection, using a variety of communication protocols.
  • Example Method: Figure 3 depicts a flow chart diagram of an example method 300 to perform model aggregation according to example embodiments of the present disclosure.
  • method 300 can be performed by a client computing device.
  • a client computing device obtains information specifying a model training process, and global values for a set of parameters of a machine-learned model.
  • the information may include the type of model that will be used (e.g. generalized linear model, Cox proportional hazard model), the variables that will be used for model training (e.g. age, gender, biomarker values), and the inclusion and exclusion criteria for the local dataset.
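  • a hypothetical example of how such a training specification might be represented when sent to a client is given below; the field names and values are illustrative assumptions, not defined by this disclosure.

# Hypothetical training specification sent from the server to each client.
training_spec = {
    "model_type": "cox_proportional_hazard",        # type of machine-learned model
    "variables": ["age", "gender", "biomarker_1"],  # variables used for model training
    "inclusion_criteria": {"min_age": 18},          # inclusion criteria for the local dataset
    "exclusion_criteria": {"missing_outcome": True} # exclusion criteria for the local dataset
}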
  • the client computing device trains the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device.
  • the method applies in general to models that use a linear combination of the features to predict the outcome.
  • the analysis model may be a regression model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: Generalized linear models (GLM); partial least-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN).
  • the analysis model may be a classification model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN).
  • the training data set may comprise a plurality of training data sets.
  • the training data set comprises a training data set per subject of the input data.
  • the update matrix is restricted to be a low-rank matrix.
  • training the machine-learned model at step 304 can include training the machine-learned model based at least in part on the local dataset such that updated values are determined only for a pre-selected portion of the set of parameters.
  • the update matrix can be descriptive of only the updated values for the pre-selected portion of the set of parameters.
  • the client computing device communicates information descriptive of the update matrix to the server computing device, e.g. via a network.
  • the network can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof.
  • the network can also include a direct connection between the client device and the server.
  • model aggregation is performed by the server computing device, e.g. as in Figure 4.
  • Figure 4 depicts a flow chart diagram of an example method 308 to perform model aggregation according to example embodiments of the present disclosure.
  • the server computer device selects the models at least in part by model evaluation metrics thereof. In an embodiment, only models that meet the pre-specified performance acceptance criteria will be used in the next step for model aggregation.
  • the server computer device computes average model coefficients across selected models.
  • the server computer device selects top eigenvector(s) from the mean-adjusted matrix of coefficients.
  • the server computer device obtains a new set of general coefficients as a linear combination of the selected eigenvectors.
  • the server computer device calculates a normalized pseudo inverse of the sum of original average model coefficients and the new set of coefficients generated in 3084. This calculated value will be used as the parameter for the aggregated model.
  • Example Experiments: Example experiments were conducted to evaluate the performance of a model trained under the method of the invention when individual datasets were not accessible vs. the performance of a model trained when all data are accessible. For this assessment, data from the following clinical trials was used to build prognostic models for patients with Non-Small Cell Lung Cancer (NSCLC) regarding their overall survival.
  • the clinical trials comprised randomized controlled trials RCT 1 to RCT 7.
  • Prognostic models were built under two scenarios: 1) when all data were pooled together, which simulates the case when access to patient data in all RCTs were feasible; 2) when no direct access to patient data was feasible, which mimics the scenario in a federated data system.
  • a model could be fitted based on the pooled full dataset.
  • a model would be fitted on each of the datasets and given as input to the LRD model aggregation method, with which a final aggregated model would be created.
  • the final models built under the two scenarios would be tested on the same holdout testing RCT.
  • the model performance, as measured by the Concordance index (C-index), of the two final models would then be compared (Figure 5).
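  • as a reference for the performance metric used in this assessment, a minimal sketch of the concordance index for right-censored survival data is given below; it ignores tied event times for brevity and is an illustration, not the evaluation code used in the experiments.

def concordance_index(time, event, risk):
    """C-index: fraction of comparable subject pairs in which the subject with
    the shorter observed time was assigned the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable if subject i had an event before time j.
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")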
  • Embodiments A: 1A. A computer-implemented method for model aggregation in a federated system comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average from the original parameter matrices, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo-inverse of the sum of the global parameter and said low-rank approximated residual matrix.
  • 2A. A computer-implemented method for model aggregation in a federated system consisting of the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average from the original parameter matrices, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo-inverse of the sum of the global parameter and said low-rank approximated residual matrix.
  • 3A. The method of any of the preceding embodiments is disclosed, wherein the steps are performed sequentially.
  • 4A. The method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model.
  • 5A. The method of embodiment 4A is disclosed, wherein the type of machine-learned model is a generalized linear model.
  • 6A. The method of embodiment 4A is disclosed, wherein the type of machine-learned model is a Cox proportional hazard model.
  • 7A. The method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values.
  • 8A. The method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset.
  • 9A. The method of any of the preceding embodiments is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device.
  • 10A. The method of embodiment 9A is disclosed, wherein the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R²), AUC of ROC from cross validation, and/or other acceptance criteria.
  • 11A. The method of embodiment 10A is disclosed, wherein the acceptance criterion for the goodness of fit is a value greater than 0.6.
  • 12A. The method of any of the preceding embodiments is disclosed, wherein the federated system comprises a network of hospitals.
  • 13A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument.
  • 14A. In an embodiment, a client computing device is disclosed, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments.
  • 15A. A client computing device consisting of: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments.
  • 16A. A system for federated learning is disclosed, comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the preceding embodiments.
  • 17A. A system for federated learning is disclosed, consisting of a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the preceding embodiments.
  • 18A. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1A-13A.
  • 19A. A computer program product is disclosed, consisting of instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1A-13A.
  • 20A. A computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 18A or 19A.
  • 1B. a computer-implemented method for model aggregation in a federated system comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from the said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); and v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter P̄g and said low-rank approximation of the residual matrix ([Rg,1’,…,Rg,N’]).
  • 2B. a computer-implemented method for model aggregation in a federated system consisting of the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from the said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); and v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter P̄g and said low-rank approximation of the residual matrix ([Rg,1’,…,Rg,N’]).
  • 3B. the method of embodiment 1B or embodiment 2B is disclosed, wherein the steps are performed sequentially.
  • 4B. the method of any of the embodiments 1B – 3B is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model.
  • 5B. the method of embodiment 4B is disclosed, wherein the type of machine-learned model is a generalized linear model.
  • 6B. the method of embodiment 4B is disclosed, wherein the type of machine-learned model is a Cox proportional hazard model.
  • 7B. the method of any of the embodiments 1B – 6B is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values.
  • 8B. the method of any of the embodiments 1B – 7B is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset.
  • 9B. the method of any of the embodiments 1B – 8B is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device.
  • 10B. the method of embodiment 9B is disclosed, wherein the performance metrics returned to the server computing device comprise one or more of the metrics such as goodness of fit (R2), AUC of ROC from cross validation, and/or the other acceptance criteria.
  • 11B. the method of embodiment 10B is disclosed, wherein the acceptance criteria for the R2 goodness of fit is greater than 0.6.
  • 12B. the method of any of the embodiments 1B – 11B is disclosed, wherein the federated system comprises a network of hospitals.
  • 13B. the method of any of the embodiments 1B – 12B is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument.
  • 14B. a client computing device is disclosed, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the embodiments 1B – 13B.
  • 15B. a client computing device consisting of: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the embodiments 1B – 13B.
  • 16B. a system for federated learning is disclosed, comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the embodiments 1B – 13B.
  • 17B. a system for federated learning is disclosed, consisting of a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the embodiments 1B – 13B.
  • 18B. a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1B – 13B.
  • 19B. a computer program product is disclosed, consisting of instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1B – 13B.
  • 20B. a computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 18B or 19B.

Embodiments C
  • 1C. a computer-implemented method for model aggregation in a federated system comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average of the parameter matrix from the original parameter matrix, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; and v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter and said low-rank approximation of the residual matrix.
  • 2C. the method of embodiment 1C is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model, such as a generalized linear model or a Cox proportional hazard model.
  • 3C. the method of embodiment 1C or 2C is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values.
  • 4C. the method of any of embodiments 1C-3C is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset.
  • 5C. the method of any of embodiments 1C-4C is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device.
  • 6C. the method of embodiment 5C is disclosed, wherein the performance metrics returned to the server computing device comprise one or more of the metrics such as goodness of fit (R2), AUC of ROC from cross validation, and/or the other acceptance criteria.
  • 7C. the method of embodiment 6C is disclosed, wherein the acceptance criteria for the goodness of fit is greater than 0.6.
  • 8C. the method of any of embodiments 1C-7C is disclosed, wherein the federated system comprises a network of hospitals.
  • 9C. the method of any of embodiments 1C-8C is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument.
  • 10C. a client computing device comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments.
  • 11C. a system for federated learning comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of embodiments 1C to 9C.
  • 12C. a computer program product is disclosed, comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of embodiments 1C-9C.
  • 13C. a computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 12C.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to federated learning. Moreover, the present invention relates to a system for federated learning, a computer program and a computer-readable medium.

Description

Methods for training models in a federated system Field of the invention The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to federated learning. Moreover, the present invention relates to a system for federated learning, a computer program and a computer-readable medium. Background of the invention Access to high quality data is one of the key components for the success of building machine learning models. However, due to the improved regulations on data protection and stricter control for data privacy, it is becoming more difficult to use a centralized approach to collect all data under one party. For example, healthcare datasets are quite often siloed in the different providers in the whole healthcare system, as the data collectors usually have the concerns about the intended use of the data and by sharing it in a central system the access control will be compromised. Federated data systems were developed to solve these data access problems. In a federated system, a user on one node can generate insights and build models without direct access to the data of other nodes, so data does not need to be moved to a central place for computation. Only model parameters and summary statistics are communicated in this setting. However, this also comes with a cost that many traditional modeling methods are not applicable in the new setting, since they will need direct access to the datasets. Models have to be trained in each node separately, and the challenge is on how the final model can be built from the individual models. It is therefore desirable to provide methods and devices which address the above mentioned technical challenges. The present invention is directed to provide a method for aggregating model parameters from individual models that demonstrate a robust performance across datasets on individual nodes. Summary Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments. One example aspect of the present disclosure is directed to a computer-implemented method for model aggregation in a federated system. The method comprises obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine- learned model. The method comprises training, by each of said client computing device, the machine- learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The method comprises communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device. The method comprises computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average of the parameter matrix from the original parameter matrix, and a low-rank approximated of the residual matrix is calculated via singular value decomposition. 
The method comprises outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter and said residual matrix thereof. Another example aspect of the present disclosure is directed to a computer-implemented method for model aggregation in a federated system. The method comprises obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device. The method comprises training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The method comprises communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device. The method comprises computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from the said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]). The method comprises outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter P̄g and said low-rank approximation of the residual matrix thereof ([Rg,1’,…,Rg,N’]). Another example aspect of the present disclosure is directed to a client computing device. The client device includes at least one processor and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations. The operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model. The operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The operations comprise communicating, by said client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device. Another example aspect of the present disclosure is directed to a client computing device. The client device includes at least one processor and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations. 
The operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device. The operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The operations comprise communicating, by said client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device. Another example aspect of the present disclosure is directed to at least one computer-readable medium that stores instructions that, when executed by a client computing device, cause the client computing device to perform operations. The operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model. The operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The operations comprise communicating, by said client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device. Another example aspect of the present disclosure is directed to at least one computer-readable medium that stores instructions that, when executed by a client computing device, cause the client computing device to perform operations. The operations comprise obtaining, by the client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device. The operations comprise training, by said client computing device, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The operations comprise communicating, by said client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device. Other aspects of the present disclosure are directed to various systems, apparatuses, computer readable media, user interfaces, and electronic devices. Brief description of the drawings Figure 1 depicts a schema of an example federated learning system according to example embodiments of the present disclosure. Figure 2 depicts a block diagram of an example computing system according to example embodiments of the present disclosure. Figure 3 depicts a flow chart diagram of an example model aggregation process according to example embodiments of the present disclosure. Figure 4 depicts a flow chart diagram of an example model aggregation process according to example embodiments of the present disclosure. Figure 5 depicts the experimental setup of an assessment to evaluate the performance of the method of the invention. 
Figure 6 shows the results of the application of the method of invention on different train/test splits. The x- axis indicates the clinical trial that was used as the testing case. The other trials were used as training data for the corresponding testing case. Figure 7 shows an example of LRD based-model of the invention that performs better than the model created on the pooled dataset. Detailed description of the invention Definitions As used in the following, the terms “have”, “comprise” or “include” or any arbitrary grammatical variations thereof are used in a non-exclusive way. Thus, these terms may both refer to a situation in which, besides the feature introduced by these terms, no further features are present in the entity described in this context and to a situation in which one or more further features are present. As an example, the expressions “A has B”, “A comprises B” and “A includes B” may both refer to a situation in which, besides B, no other element is present in A (i.e. a situation in which A solely and exclusively consists of B) and to a situation in which, besides B, one or more further elements are present in entity A, such as element C, elements C and D or even further elements. Further, it shall be noted that the terms “at least one”, “one or more” or similar expressions indicating that a feature or element may be present once or more than once typically will be used only once when introducing the respective feature or element. In the following, in most cases, when referring to the respective feature or element, the expressions “at least one” or “one or more” will not be repeated, non- withstanding the fact that the respective feature or element may be present once or more than once. Further, as used in the following, the terms "preferably", "more preferably", "particularly", "more particularly", "specifically", "more specifically" or similar terms are used in conjunction with optional features, without restricting alternative possibilities. Thus, features introduced by these terms are optional features and are not intended to restrict the scope of the claims in any way. The invention may, as the skilled person will recognize, be performed by using alternative features. Similarly, features introduced by "in an embodiment of the invention" or similar expressions are intended to be optional features, without any restriction regarding alternative embodiments of the invention, without any restrictions regarding the scope of the invention and without any restriction regarding the possibility of combining the features introduced in such way with other optional or non-optional features of the invention. The term “machine learning” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a method of using artificial intelligence (AI) for automatically model building of analytical models. The term “machine learning system” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a system comprising at least one processing unit such as a processor, microprocessor, or computer system configured for machine learning, in particular for executing a logic in a given algorithm. 
The machine learning system may be configured for performing and/or executing at least one machine learning algorithm, wherein the machine learning algorithm is configured for building the at least one analysis model based on the training data. The term “mobile device” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term may specifically refer, without limitation, to a mobile electronics device, more specifically to a mobile communication device comprising at least one processor. The mobile device may specifically be a cell phone or smartphone. The mobile device may also refer to a tablet computer or any other type of portable computer. The mobile device may comprise a data acquisition unit which may be configured for data acquisition. The mobile device may be configured for detecting and/or measuring either quantitatively or qualitatively physical parameters and transform them into electronic signals such as for further processing and/or analysis. For this purpose, the mobile device may comprise at least one sensor. It will be understood that more than one sensor can be used in the mobile device, i.e. at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine or at least ten or even more different sensors. The sensor may be at least one sensor selected from the group consisting of: at least one gyroscope, at least one magnetometer, at least one accelerometer, at least one proximity sensor, at least one thermometer, at least one pedometer, at least one fingerprint detector, at least one touch sensor, at least one voice recorder, at least one light sensor, at least one pressure sensor, at least one location data detector, at least one camera, at least one GPS, and the like. The mobile device may comprise the processor and at least one database as well as software which is tangibly embedded to said device and, when running on said device, carries out a method for data acquisition. The mobile device may comprise a user interface, such as a display and/or at least one key, e.g. for performing at least one task requested in the method for data acquisition. The term “predicting” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to determining at least one numerical or categorical value indicative of the disease status for the at least one state variable. In particular, the state variable may be filled in the analysis as input and the analysis model may be configured for performing at least one analysis on the state variable for determining the at least one numerical or categorical value indicative of the disease status. The analysis may comprise using the at least one trained algorithm. The term “input data” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to experimental data used for model building. The input data comprises the set of historical digital biomarker feature data. 
The term “biomarker” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a measurable characteristic of a biological state and/or biological condition. The term “feature” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a measurable property and/or characteristic of a symptom of the disease on which the prediction is based. In particular, all features from all tests may be considered and the optimal set of features for each prediction is determined. Thus, all features may be considered for each disease. The term “digital biomarker feature data” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to experimental data determined by at least one digital device such as by a mobile device which comprises a plurality of different measurement values per subject relating to symptoms of the disease. The digital biomarker feature data may be determined by using at least one mobile device. With respect to the mobile device and determining of digital biomarker feature data with the mobile device reference is made to the description of the determination of the state variable with the mobile device above. The set of historical digital biomarker feature data comprises a plurality of measured values per subject indicative of the disease status to be predicted. The term “historical” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to the fact that the digital biomarker feature data was determined and/or collected before model building such as during at least one test study. The input data may be determined in at least one active test and/or in at least one passive monitoring. For example, the input data may be determined in an active test using at least one mobile device such as at least one cognition test and/or at least one hand motor function test and/or or at least one mobility test. The term “machine learning model” or “machine-learned model” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to at least one trainable algorithm. The model unit may comprise a plurality of machine learning models, e.g. different machine learning models for building the regression model and machine learning models for building the classification model. For example, the analysis model may be a regression model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: Generalized linear models (GLM); partial last-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN). 
For example, the analysis model may be a classification model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN). The term “training data set” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a subset of the input data used for training the machine learning model. The term “test data set” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to another subset of the input data used for testing the trained machine learning model. The training data set may comprise a plurality of training data sets. In particular, the training data set comprises a training data set per subject of the input data. The test data set may comprise a plurality of test data sets. In particular, the test data set comprises a test data set per subject of the input data. The processing unit may be configured for generating and/or creating per subject of the input data a training data set and a test data set, wherein the test data set per subject may comprise data only of that subject, whereas the training data set for that subject comprises all other input data. The term “performance” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to suitability of the determined analysis model for predicting the target variable. The performance may be characterized by deviations between predicted target variable and true value of the target variable. The machine learning system may comprises at least one output interface. The output interface may be designed identical to the communication interface and/or may be formed integral with the communication interface. The output interface may be configured for providing at least one output. The output may comprise at least one information about the performance of the determined analysis model. The information about the performance of the determined analysis model may comprises one or more of at least one scoring chart, at least one predictions plot, at least one correlations plot, and at least one residuals plot. The model unit may comprise a plurality of machine learning models, wherein the machine learning models are distinguished by their algorithm. For example, for building a regression model the model unit may comprise the following algorithms: Generalized linear models (GLM); partial last-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN). For example, for building a classification model the model unit may comprise the following algorithms: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN). 
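Because the model unit may hold several candidate algorithms for regression and for classification, one way to organise them on a node is a simple registry keyed by the model type named in the training instructions. The sketch below is illustrative only: it uses scikit-learn estimators as stand-ins for the algorithm families listed above, and the registry keys and the build_model helper are invented names, not part of the disclosed system.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Candidate algorithms, grouped roughly as in the lists above.
# GAM is omitted because scikit-learn has no native implementation.
REGRESSION_MODELS = {
    "glm": LinearRegression,
    "pls": PLSRegression,
    "nn": MLPRegressor,
}
CLASSIFICATION_MODELS = {
    "glm": LogisticRegression,
    "lda": LinearDiscriminantAnalysis,
    "qda": QuadraticDiscriminantAnalysis,
    "svm": SVC,
    "nn": MLPClassifier,
}

def build_model(task: str, model_type: str, **kwargs):
    """Instantiate the algorithm requested in the model-training instructions."""
    registry = REGRESSION_MODELS if task == "regression" else CLASSIFICATION_MODELS
    return registry[model_type](**kwargs)
```

For example, build_model("classification", "glm") would return a logistic regression estimator, which corresponds to the generalized linear model case emphasised throughout this disclosure.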
The processing unit may be configured for determining an analysis model for each of the machine learning models by training the respective machine learning model with the training data set and for predicting the target variables on the test data set using the determined analysis models. The term “training the machine learning model” as used herein is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is not to be limited to a special or customized meaning. The term specifically may refer, without limitation, to a process of determining parameters of the algorithm of machine learning model on the training data set. The training may comprise at least one optimization or tuning process, wherein a best parameter combination is determined. The training may be performed iteratively on the training data sets of different subjects. The processing unit may be configured for considering different numbers of features for determining the analysis model by training the machine learning model with the training data set. The algorithm of the machine learning model may be applied to the training data set using a different number of features, e.g. depending on their ranking. The training may comprise n-fold cross validation to get a robust estimate of the model parameters. The training of the machine learning model may comprise at least one controlled learning process, wherein at least one hyper-parameter is chosen to control the training process. If necessary, the training is step is repeated to test different combinations of hyper-parameters. As used herein, the terms “computer-readable data carrier” and “computer-readable storage medium” specifically may refer to non-transitory data storage means, such as a hardware storage medium having stored thereon computer-executable instructions. The computer-readable data carrier or storage medium specifically may be or may comprise a storage medium such as a random-access memory (RAM) and/or a read-only memory (ROM). Further disclosed and proposed herein is a computer program product having program code means, in order to perform the method according to the present invention in one or more of the embodiments enclosed herein when the program is executed on a computer or computer network. Specifically, the program code means may be stored on a computer-readable data carrier and/or on a computer-readable storage medium. Further disclosed and proposed herein is a data carrier having a data structure stored thereon, which, after loading into a computer or computer network, such as into a working memory or main memory of the computer or computer network, may execute the method according to one or more of the embodiments disclosed herein. Further disclosed and proposed herein is a computer program product with program code means stored on a machine-readable carrier, in order to perform the method according to one or more of the embodiments disclosed herein, when the program is executed on a computer or computer network. As used herein, a computer program product refers to the program as a tradable product. The product may generally exist in an arbitrary format, such as in a paper format, or on a computer-readable data carrier and/or on a computer- readable storage medium. Specifically, the computer program product may be distributed over a data network. 
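The n-fold cross-validation and hyper-parameter tuning described above can be expressed compactly. The snippet below is a minimal sketch assuming scikit-learn is available; the grid of C values is only an example of a controlled learning process, not a recommended setting, and the function name is invented for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

def tune_and_fit(X_train, y_train, n_splits=5):
    """n-fold cross-validation over a small hyper-parameter grid."""
    search = GridSearchCV(
        estimator=LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hyper-parameters controlling training
        cv=KFold(n_splits=n_splits, shuffle=True, random_state=0),
        scoring="roc_auc",
    )
    search.fit(X_train, y_train)
    # Best estimator, its hyper-parameters, and the cross-validated score.
    return search.best_estimator_, search.best_params_, search.best_score_
```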
Finally, disclosed and proposed herein is a modulated data signal which contains instructions readable by a computer system or computer network, for performing the method according to one or more of the embodiments disclosed herein. Generally, in embodiments the present disclosure provides systems and methods for model aggregation in federated learning systems. The method can be applied in a broad area for building models where data privacy is a major concern and/or where data transfer is costly. More particularly, federated learning is a machine learning framework that enables training of a high-quality centralized model based on training data distributed over a large number of client computing devices. The clients typically have low computational power and/or slow/unstable connections to the network. Federated learning may also be referred to as "federated optimization" in certain instances. Federated learning is often used in scenarios where direct data access is not feasible. The challenge is on how to build global model based on individual local models. The present invention as disclosed herein provides a method which solves this problem by aggregating local models trained on individual client computing devices. The method is efficient as it minimizes the communication needed between the server and client computing devices. The method also helps to improve the generalizability of the global model to unseen data. The method for model aggregation in a federated system as disclosed herein is advantageous to the other methods in the art in several aspects. For instance, unlike the methods proposed in the art, which require communication across all client devices for each iteration of training, the model of the present invention advantageously needs the model combination after individual models are trained on client devices, so the communication during the model fitting step can be minimized. Moreover, the methods in the art utilize the Federated Average approach or a variation thereof, such as weighted average approach for aggregating the coefficients. The method of the present invention utilizes a totally different approach, which is based on finding a "best fit" to a system of linear equations, where each may define a certain aspect of the underlying ground truth based on the observed local data. The concept is totally different from the averaging approach. Furthermore, the model selection criteria implemented in the method of the invention guarantees the input for model aggregation; it also checks for other potential data issues on the nodes, which may contribute to biases in the model. The data selection and training data assumption check is a good approach to make sure the model is trained for the intended use purposes. Moreover, computationally, it requires only the transmission of model parameters, so saving the bandwidth for communication between different nodes in the federated system. Furthermore, besides the model aggregation step, all the other calculations are done on the local environments. No intermediate results need to be communicated before the model aggregation step, so there is no risk of data privacy break compared to other methods which either need communication of intermediate results, e.g. the gradients, or an anonymized dataset. Additionally, models built by this method also show a good generalizability compared to some modeling methods (e.g. empirical risk minimization based methods) when a full dataset can be accessed. 
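As an illustration of the point that only model parameters and summary statistics cross node boundaries, a client update in such a system might be no more than a small serialisable record, as sketched below. The field names and the ClientUpdate structure are invented for illustration and are not prescribed by the method.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List

@dataclass
class ClientUpdate:
    """What a node sends back: fitted parameters and evaluation metrics only."""
    node_id: str
    parameters: List[float]    # updated coefficient values, e.g. Pg,i
    metrics: Dict[str, float]  # e.g. {"r2": 0.71} or {"cv_auc": 0.68}

update = ClientUpdate(node_id="site-3",
                      parameters=[0.12, -0.8, 1.4],
                      metrics={"r2": 0.71})
payload = json.dumps(asdict(update))  # no patient-level data is included
```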
Indeed, we have compared the present method with other methods for the case where the model is simple and contains a small number of parameters, such as the generalized linear model, which represents a majority of clinical use cases. The performance of the current method is superior to the iterative methods with regard to generalizability. The method can be developed as a software module for a federated data management and analytics system, allowing models to be built using the datasets that the system has access to. In an aspect, the methods of the invention described herein for model aggregation in a federated system comprise the following method steps which, specifically, may be performed in the given order. Still, a different order is also possible. It is further possible to perform two or more of the method steps fully or partially simultaneously. Further, one or more or even all of the method steps may be performed once or may be performed repeatedly, such as repeated once or several times. Further, the method may comprise additional method steps which are not listed. Said method steps are: each client computing device obtains information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; each of said client computing devices then trains the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; each client computing device communicates information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; said server computing device computes a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from the said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); the server computing device outputs an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter P̄g and said low-rank approximation of the residual matrix thereof ([Rg,1’,…,Rg,N’]). In some embodiments, the information specifying the model training process comprises the type of machine-learned model, such as a generalized linear model or a Cox proportional hazard model. In some embodiments, the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values. In some embodiments, the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset. In some embodiments, the set of model evaluation metrics comprises the performance metrics returned to the server computing device. In some embodiments, the performance metrics returned to the server computing device comprise one or more of the metrics such as goodness of fit (R2), AUC (Area Under the Curve) of ROC (Receiver Operating Characteristic) from cross validation, and/or the acceptance criteria. 
In some embodiments, the acceptance criteria for the goodness of fit is greater than 0.6. In some embodiments, the federated system comprises a network of hospitals. In some embodiments, at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument. In an example embodiment, the method can be used in a hospital network where patient electronic medical records (EMR) are not directly shared with each other, but managed in a federated system. They can be used to build clinical prediction models using data that resides on different hospitals which have data firewalls in between. A motivating example for federated learning arises when the training data is kept locally on users’ mobile computing devices and such mobile computing devices are used as nodes for performing computation on their local data in order to update a global model. Thus, the federated learning framework differs from conventional distributed machine learning due to the large number of clients, data that is highly unbalanced and not independent and identically distributed ("IID"), and unreliable network connections. In an example embodiment, the method can be used for personal health data that are collected on an edge device, such as a smartphone or a smart watch, but not gathered by any centralized data system. In an example embodiment, the method can help to build individual models on each edge device and then aggregate individual models to get insights at a population level. In an example embodiment, the method can be used for building models that can help to identify system anomalies. For example, for the diagnostic instruments that have been deployed, we may need to get data regarding the system running status to detect abnormalities or predict potential abnormalities in the near future. The instrument status data are usually in large amounts, so costly in transferring, and they are also not always shared with the manufacturers. The method of the present invention can be used in this scenario to build local models for each machine and return the models to the manufacturer to better understand the instrument performance on the field. In an aspect, provided herein is a client computing device, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the methods of the invention described herein for model aggregation in a federated system. In an aspect, provided herein is a server computing device, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the server computing device to perform the methods of the invention described herein for model aggregation in a federated system. In an aspect, provided herein is a system for federated learning comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the methods of the invention described herein for model aggregation in a federated system. In an aspect, provided herein is a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the methods of the invention described herein for model aggregation in a federated system. 
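To make the client-side portion of the workflow described above more concrete, the sketch below shows how a node in, for example, a hospital network might fit a local generalized linear model, evaluate it with cross-validation, apply an acceptance check analogous to the R2 > 0.6 criterion (here using cross-validated AUC), and return only coefficients and metrics to the server. It is a minimal illustration assuming scikit-learn is available; the function name, the payload structure, and the use of AUC as the acceptance metric are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def train_local_model(X, y, acceptance_threshold=0.6):
    """Fit a local GLM and report coefficients plus evaluation metrics.

    Only the fitted parameters and summary metrics leave the node;
    the local dataset (X, y) is never transmitted.
    """
    model = LogisticRegression(max_iter=1000)
    # Cross-validated AUC of the ROC curve, as in the evaluation metrics above.
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    model.fit(X, y)

    params = np.concatenate([model.intercept_, model.coef_.ravel()])
    return {
        "parameters": params.tolist(),
        "metrics": {"cv_auc": float(auc)},
        "meets_acceptance": bool(auc > acceptance_threshold),
    }
```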
In an aspect, provided herein is a computer-readable medium having stored thereon said computer program product. Examples Example Model Aggregation In an example federated learning framework, the aggregation of models may include the three steps as described below. In step i, average parameters αc for a model including n parameters are calculated based on A, which represents a D×n matrix of D sets of model parameters returned from D client computing devices in the federated system. In step ii, singular value decomposition (SVD) is performed on the residual matrix of A after extracting the average of the parameters αc. Only the top k components from the SVD will be retained to reconstruct a new residual matrix, As′. In step iii, a normalized pseudo inverse of the sum of the αc and As′ gives the aggregated model parameter αc,new, which will be used as the final parameter for the aggregated model. Example Systems Figure 1 depicts an example system 100 for aggregating a multiplicity of machine-learned models 102A, 102B, 102C using respective training data 108A, 108B, 108C stored locally on a plurality of client devices 102A, 102B, 102C. System 100 can include a server device 104. Server 104 can be configured to access machine learning model 106, and to provide model 106 to a plurality of client devices 102A, 102B, 102C. Model 106 can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other suitable model. In some implementations, server 104 can be configured to communicate with client devices 102A, 102B, 102C over one or more networks. Client devices 102A, 102B, 102C can each be configured to determine one or more local updates associated with model 106 based at least in part on training data 108A, 108B, 108C. For instance, training data 108A, 108B, 108C can be data that is respectively stored locally on the client devices 102A, 102B, 102C. The training data 108A, 108B, 108C can include audio files, image files, video files, a typing history, location history, and/or various other suitable data. In some implementations, the training data can be any data derived through a user interaction with a client device 102A, 102B, 102C. Further to the descriptions above, a user may be provided with controls allowing the user to make an election as to both if and when systems, programs or features described herein may enable collection, storage, and/or use of user information (e.g., training data 108A, 108B, 108C), and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user’s identity may be treated so that no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user. The training data 108A, 108B, 108C consists of data that is respectively stored at each device 102A, 102B, 102C. Thus, in some implementations, the training data (108A, 108B, 108C) is highly unbalanced and not independent and identically distributed. 
Client devices 102A, 102B, 102C can be configured to provide the local updates to server 104. As indicated above, training data 108A, 108B, 108C may be privacy sensitive. In this manner, the local updates can be performed and provided to server 104 without compromising the privacy of training data 108A, 108B, 108C. For instance, in such implementations, training data 108A, 108B, 108C is not provided to server 104. The local update does not include training data 108A, 108B, 108C. In some implementations in which a locally updated model is provided to server 104, some privacy sensitive data may be able to be derived or inferred from the model parameters. In such implementations, one or more of encryption techniques, random noise techniques, and/or other security techniques can be added to the training process to obscure any inferable information. As indicated above, server 104 can receive each local update from client device 102A, 102B, 102C, and can aggregate the local updates to determine a global update to the model 106. Figure 2 depicts an example computing system 200 that can be used to implement the methods and systems of the present disclosure. The system 200 can be implemented using a client-server architecture that includes a server 210 that communicates with one or more client devices 230 over a network 242. Thus, Figure 2 provides an example system 200 that can implement the scheme illustrated by system 100 of Figure 1. The system 200 includes a server 210, such as a web server. The server 210 can be implemented using any suitable computing device(s). The server 210 can have one or more processors 212 and one or more memory devices 214. The server 210 can be implemented using one server device or a plurality of server devices. In implementations in which a plurality of devices are used, such plurality of devices can operate according to a parallel computing architecture, a sequential computing architecture, or a combination thereof. The server 210 can also include a network interface used to communicate with one or more client devices 230 over the network 242. The network interface can include any suitable components for interfacing with one more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components. The one or more processors 212 can include any suitable processing device, such as a microprocessor, microcontroller, integrated circuit, logic device, or other suitable processing device. The one or more memory devices 214 can include one or more computer-readable media, including, but not limited to, non- transitory computer-readable media, RAM, ROM, hard drives, flash drives, or other memory devices. The one or more memory devices 214 can store information accessible by the one or more processors 212, including computer-readable instructions 216 that can be executed by the one or more processors 212. The instructions 216 can be any set of instructions that when executed by the one or more processors 212, cause the one or more processors 212 to perform operations. For instance, the instructions 216 can be executed by the one or more processors 212 to implement a global updater 220. The global updater 220 can be configured to receive one or more local updates and to determine a global model based at least in part on the local updates. The instructions 216 can further include instructions that cause the server 210 to implement a decoder 222. 
The decoder 222 can decode an update that has been encoded by a client device 230 (e.g., according to one of the encoding techniques known in the art such as subsampling, quantization, random rotation, etc.). As shown in Figure 2, the one or more memory devices 214 can also store data 218 that can be retrieved, manipulated, created, or stored by the one or more processors 212. The data 218 can include, for instance, local updates, global parameters, and other data. The data 218 can be stored in one or more databases. The one or more databases can be connected to the server 210 by a high bandwidth LAN or WAN, or can also be connected to server 210 through network 242. The one or more databases can be split up so that they are located in multiple locales. The server 210 can exchange data with one or more client devices 230 over the network 242. Any number of client devices 230 can be connected to the server 210 over the network 242. Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, smartphone, tablet, wearable computing device, gaming console, a display with one or more processors, or other suitable computing device. Similar to the server 210, a client device 230 can include one or more processor(s) 232 and a memory 234. The one or more processor(s) 232 can include, for example, one or more central processing units (CPUs), graphics processing units (GPUs) dedicated to efficiently rendering images or performing other specialized calculations, and/or other processing devices. The memory 234 can include one or more computer- readable media and can store information accessible by the one or more processors 232, including instructions 236 that can be executed by the one or more processors 232 and data 238. The instructions 236 can include instructions for implementing a local updater configured to determine one or more local updates according to example aspects of the present disclosure. For example, the local updater can perform one or more training techniques such as, for example, backwards propagation of errors to re-train or otherwise update the model based on the locally stored training data. The local updater can be configured to perform structured updates, sketched updates, or other techniques. The local updater can be included in an application or can be included in the operating system of the device 230. The instructions 236 can further include instructions for implementing an encoder. For example, the encoder can perform one or more of the encoding techniques known in the art (e.g., subsampling, quantization, random rotation, etc.). The data 238 can include one or more training data examples to be used in solving one or more optimization problems. The training data examples of each client device 230 can be distributed unevenly among the client devices, such that no client device 230 includes a representative sample of the overall distribution of the training data examples. The data 238 can further include updated parameters to be communicated to the server 210. The client device 230 of Figure 2 can include various input/output devices for providing and receiving information from a user, such as a touch screen, touch pad, data entry keys, speakers, and/or a microphone suitable for voice recognition. The client device 230 can also include a network interface used to communicate with one or more remote computing devices (e.g. 
server 210) over the network 242. The network interface can include any suitable components for interfacing with one or more networks, including, for example, transmitters, receivers, ports, controllers, antennas, or other suitable components. The network 242 can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof. The network 242 can also include a direct connection between a client device 230 and the server 210. In general, communication between the server 210 and a client device 230 can be carried via the network interface using any type of wired and/or wireless connection, using a variety of communication protocols (e.g. TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g. HTML, XML), and/or protection schemes (e.g. VPN, secure HTTP, SSL). Example method Figure 3 depicts a flow chart diagram of an example method 300 to perform model aggregation according to example embodiments of the present disclosure. For example, method 300 can be performed by a client computing device. At step 302, a client computing device obtains information specifying a model training process, and global values for a set of parameters of a machine-learned model. The information may include the type of model that will be used (e.g. generalized linear model, Cox proportional hazard model), the variables that will be used for model training (e.g. age, gender, biomarker values), the inclusion and exclusion criteria for the selection of training data and/or instructions on the metrics for the assessment of the machine-learned model. At step 304, the client computing device trains the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device. The method applies in general to models that use a linear combination of the features to predict the outcome. For example, the analysis model may be a regression model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: Generalized linear models (GLM); partial least-squares (PLS); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Generalized Additive Model (GAM); Neural Networks (NN). For example, the analysis model may be a classification model and the algorithm of the machine learning model may be at least one algorithm selected from the group consisting of: support vector machines (SVM); linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); Neural Networks (NN). The training data set may comprise a plurality of training data sets; in particular, it may comprise one training data set per subject of the input data. In some implementations, the update matrix is restricted to be a low-rank matrix. In some implementations, training the machine-learned model at step 304 can include training the machine-learned model based at least in part on the local dataset such that updated values are determined only for a pre-selected portion of the set of parameters. In such implementations, the update matrix can be descriptive of only the updated values for the pre-selected portion of the set of parameters. At step 306, the client computing device communicates information descriptive of the update matrix to the server computing device, e.g. via a network. 
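By way of illustration only, the sketch below shows how steps 302-306 might be carried out on one client in Python, assuming the lifelines package for fitting a Cox proportional hazard model. The dictionary keys, column names and helper function are hypothetical and do not originate from the present disclosure; the sketch only illustrates that the model is fitted on the locally stored dataset and that solely the updated parameter values and the agreed evaluation metric leave the client.

```python
import pandas as pd
from lifelines import CoxPHFitter


def local_update(local_df: pd.DataFrame, spec: dict) -> dict:
    """Illustrative client-side routine for steps 302-306 (hypothetical names)."""
    # Step 302 (continued): apply the inclusion/exclusion criteria that the
    # server specified for the selection of the local training data.
    criteria = spec.get("inclusion")
    df = local_df.query(criteria) if criteria else local_df

    # Step 304: train the specified model (here a Cox proportional hazard
    # model) on the locally stored dataset only; the covariates are assumed
    # to be numerically encoded already.
    columns = spec["variables"] + [spec["duration_col"], spec["event_col"]]
    cph = CoxPHFitter()
    cph.fit(df[columns], duration_col=spec["duration_col"],
            event_col=spec["event_col"])

    # Step 306: return only the updated parameter values and the agreed
    # evaluation metric for communication to the server -- never the
    # patient-level training data itself.
    return {
        "coefficients": cph.params_.to_dict(),
        "metrics": {"c_index": cph.concordance_index_},
    }
```

A call such as local_update(site_df, {"variables": ["age", "sex", "ecog"], "duration_col": "os_months", "event_col": "os_event", "inclusion": "age >= 18"}) would then yield the payload communicated to the server at step 306; the column and key names are again purely illustrative.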
The network can be any type of communications network, such as a local area network (e.g. intranet), wide area network (e.g. Internet), cellular network, or some combination thereof. The network can also include a direct connection between the client device and the server. At step 308, model aggregation is performed by the server computing device, e.g. as in Figure 4. Figure 4 depicts a flow chart diagram of an example method 308 to perform model aggregation according to example embodiments of the present disclosure. At 3081, the server computing device selects the models based at least in part on their model evaluation metrics. In an embodiment, only models that meet the pre-specified performance acceptance criteria are used in the next step for model aggregation. At 3082, the server computing device computes average model coefficients across the selected models. At 3083, the server computing device selects the top eigenvector(s) from the mean-adjusted matrix of coefficients. At 3084, the server computing device obtains a new set of general coefficients as a linear combination of the selected eigenvectors. At 3085, the server computing device calculates a normalized pseudo inverse of the sum of the original average model coefficients and the new set of coefficients generated in 3084. This calculated value is used as the parameter for the aggregated model (an illustrative sketch of steps 3081-3085 is given after the results below). Example Experiments Example experiments were conducted to evaluate the performance of a model trained under the method of the invention when individual datasets were not accessible versus the performance of a model trained when all data were accessible. For this assessment, data from the following clinical trials were used to build prognostic models for patients with Non-Small Cell Lung Cancer (NSCLC) with respect to their overall survival. Seven randomized controlled trials (RCTs) for NSCLC were selected for this study, referred to herein as RCT 1-7. The following variables were included to build the models, based on a previous literature report (Alexander M, et al. Br J Cancer. 2017):
- Sex
- Age
- ECOG performance status (ECOG = 0/1; other categories were too small (N < 10) and were therefore included in ECOG = 1)
- Smoker (Current, Former, Never)
- Stage (I, II, IIIA, IIIB, IV)
Experiment setup The different RCTs were split into training and testing sets: one RCT was held out for testing, and the remaining trials were included in the training dataset. Prognostic models were built under two scenarios: 1) when all data were pooled together, which simulates the case in which access to patient data in all RCTs is feasible; 2) when no direct access to patient data was feasible, which mimics the scenario in a federated data system. In scenario 1, a model could be fitted on the pooled full dataset. In scenario 2, a model was fitted on each of the datasets and given as input to the LRD model aggregation method, with which a final aggregated model was created. The final models built under the two scenarios were tested on the same held-out testing RCT. The model performance, as measured by the concordance index (C-index), of the two final models was then compared (Figure 5). Results The aggregated models built using the method of the invention (LRD method) performed very similarly to the models that were built with access to the entire pooled dataset (Figure 6). In two cases, the performance of the LRD-based model was even better than that of the model built on the pooled data. 
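For illustration, the server-side aggregation of Figure 4 (steps 3081-3085) can be sketched as follows in Python. The acceptance threshold, the number of retained components, the weights used for the linear combination in step 3084 and the exact normalization in step 3085 are not fully prescribed above; the specific choices below (a fixed metric threshold, singular-value weighting, and unit-norm rescaling of the pseudo-inverse) are assumptions made only for the purpose of this sketch.

```python
import numpy as np


def lrd_aggregate(coefficients, metrics, threshold=0.6, rank=1):
    """Illustrative LRD aggregation (Figure 4, steps 3081-3085).

    coefficients : list of 1-D arrays, one coefficient vector per client model
    metrics      : list of scalar evaluation metrics reported by the clients
    """
    # 3081: select models at least in part by their evaluation metrics;
    # only models meeting the acceptance criterion are aggregated.
    selected = [c for c, m in zip(coefficients, metrics) if m > threshold]
    P = np.column_stack(selected)              # d x K matrix of coefficients

    # 3082: average model coefficients across the selected models.
    p_bar = P.mean(axis=1)                     # length-d vector

    # 3083: top eigenvector(s) of the mean-adjusted matrix of coefficients,
    # obtained here via a singular value decomposition of the residuals.
    R = P - p_bar[:, None]                     # mean-adjusted (residual) matrix
    U, s, _ = np.linalg.svd(R, full_matrices=False)

    # 3084: new set of general coefficients as a linear combination of the
    # selected eigenvectors; the weights (singular value divided by the
    # number of selected models) are an assumption of this sketch.
    new_coef = (U[:, :rank] * (s[:rank] / P.shape[1])).sum(axis=1)

    # 3085: normalized pseudo-inverse of the sum of the average coefficients
    # and the new coefficients; for a single vector this amounts to keeping
    # its direction at unit norm, which leaves rank-based metrics such as
    # the C-index unchanged.
    combined = p_bar + new_coef
    pinv = np.linalg.pinv(combined.reshape(1, -1)).ravel()
    return pinv / np.linalg.norm(pinv)
```

The rank argument of this sketch corresponds to the number of retained components discussed in connection with Figure 5 below; in the experiments, rank = 1 or 2 was found to work well.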
The influence of the number of retained components during the decomposition step on the performance of the LRD-based model was assessed in one of the testing cases. In general, rank = 1 or 2 gave better performance than the cases in which a higher rank of components was retained (Figure 5). Additionally, both the LRD-based and the pooled-data-based models were compared with an "Oracle" model, which was fitted directly on the test dataset and thus represents the best achievable fit to the testing data apart from the error of model fitting. The LRD-based model was much closer in performance to the "Oracle" model, further demonstrating the advantage of this method (Figure 7). Additional disclosure The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination. Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel. While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation, not limitation of the disclosure. Those skilled in the art, upon attaining an understanding of the foregoing, can readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the present disclosure cover such alterations, variations, and equivalents. Embodiments In the following, further particular embodiments of the present invention are listed. Embodiments A 1A. 
In an embodiment, a computer-implemented method for model aggregation in a federated system is disclosed, the method comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average of the parameter matrix from the original parameter matrix, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter and said residual matrix thereof. 2A. In an embodiment, a computer-implemented method for model aggregation in a federated system is disclosed, the method consisting of the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average of the parameter matrix from the original parameter matrix, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter and said residual matrix thereof. 3A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the steps are performed sequentially. 4A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model. 5A. 
In an embodiment, the method of embodiment 4A is disclosed, wherein the type of machine-learned model is a generalized linear model. 6A. In an embodiment, the method of embodiment 4A is disclosed, wherein the type of machine-learned model is a Cox proportional hazard model. 7A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values. 8A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset. 9A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device. 10A. In an embodiment, the method of embodiment 9A is disclosed, wherein the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R2), AUC of the ROC from cross-validation, and/or other acceptance criteria. 11A. In an embodiment, the method of embodiment 10A is disclosed, wherein the acceptance criterion for the goodness of fit is greater than 0.6. 12A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein the federated system comprises a network of hospitals. 13A. In an embodiment, the method of any of the preceding embodiments is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument. 14A. In an embodiment, a client computing device is disclosed, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments. 15A. In an embodiment, a client computing device is disclosed, consisting of: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments. 16A. In an embodiment, a system for federated learning is disclosed, comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the preceding embodiments. 17A. In an embodiment, a system for federated learning is disclosed, consisting of a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the preceding embodiments. 18A. In an embodiment, a computer program product is disclosed, comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1A-13A. 19A. In an embodiment, a computer program product is disclosed, consisting of instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1A-13A. 20A. In an embodiment, a computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 18A or 19A. Embodiments B 1B. 
In an embodiment, a computer-implemented method for model aggregation in a federated system is disclosed, the method comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter (P̄g) and said low-rank approximation of the residual matrix thereof ([Rg,1’,…,Rg,N’]). 2B. In an embodiment, a computer-implemented method for model aggregation in a federated system is disclosed, the method consisting of the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter (P̄g) and said low-rank approximation of the residual matrix thereof ([Rg,1’,…,Rg,N’]). 3B. 
In an embodiment, the method of embodiment 1B or embodiment 2B is disclosed, wherein the steps are performed sequentially. 4B. In an embodiment, the method of any of the embodiments 1B – 3B is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model. 5B. In an embodiment, the method of embodiment 4B is disclosed, wherein the type of machine-learned model is a generalized linear model. 6B. In an embodiment, the method of embodiment 4B is disclosed, wherein the type of machine-learned model is a Cox proportional hazard model. 7B. In an embodiment, the method of any of the embodiments 1B – 6B is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values. 8B. In an embodiment, the method of any of the embodiments 1B – 7B is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset. 9B. In an embodiment, the method of any of the embodiments 1B – 8B is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device. 10B. In an embodiment, the method of embodiment 9B is disclosed, wherein the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R2), AUC of the ROC from cross-validation, and/or other acceptance criteria. 11B. In an embodiment, the method of embodiment 10B is disclosed, wherein the acceptance criterion for the goodness of fit is greater than 0.6. 12B. In an embodiment, the method of any of the embodiments 1B – 11B is disclosed, wherein the federated system comprises a network of hospitals. 13B. In an embodiment, the method of any of the embodiments 1B – 12B is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument. 14B. In an embodiment, a client computing device is disclosed, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the embodiments 1B – 13B. 15B. In an embodiment, a client computing device is disclosed, consisting of: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the embodiments 1B – 13B. 16B. In an embodiment, a system for federated learning is disclosed, comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the embodiments 1B – 13B. 17B. In an embodiment, a system for federated learning is disclosed, consisting of a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of the embodiments 1B – 13B. 18B. In an embodiment, a computer program product is disclosed, comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1B – 13B. 19B. 
In an embodiment, a computer program product is disclosed, consisting of instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the embodiments 1B – 13B. 20B. In an embodiment, a computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 18B or 19B. Embodiments C 1C. In an embodiment, a computer-implemented method for model aggregation in a federated system is disclosed, the method comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters of a machine-learned model; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain an update matrix descriptive of updated values for the set of parameters of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the update matrix, and a set of model evaluation metrics, to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average of said update matrices, and a residual parameter matrix is calculated by subtracting the average of the parameter matrix from the original parameter matrix, and a low-rank approximation of the residual matrix is calculated via singular value decomposition; v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter and said residual matrix thereof. 2C. In an embodiment, the method of embodiment 1C is disclosed, wherein the information specifying the model training process comprises the type of machine-learned model, such as a generalized linear model or a Cox proportional hazard model. 3C. In an embodiment, the method of embodiment 1C or 2C is disclosed, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values. 4C. In an embodiment, the method of any of embodiments 1C-3C is disclosed, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset. 5C. In an embodiment, the method of any of embodiments 1C-4C is disclosed, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device. 6C. In an embodiment, the method of embodiment 5C is disclosed, wherein the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R2), AUC of the ROC from cross-validation, and/or other acceptance criteria. 7C. In an embodiment, the method of embodiment 6C is disclosed, wherein the acceptance criterion for the goodness of fit is greater than 0.6. 8C. In an embodiment, the method of any of embodiments 1C-7C is disclosed, wherein the federated system comprises a network of hospitals. 9C. In an embodiment, the method of any of embodiments 1C-8C is disclosed, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument. 
10C. In an embodiment, a client computing device is disclosed, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any one of the preceding embodiments. 11C. In an embodiment, a system for federated learning is disclosed, comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any one of embodiments 1C to 9C. 12C. In an embodiment, a computer program product is disclosed, comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of embodiments 1C-9C. 13C. In an embodiment, a computer-readable medium is disclosed, having stored thereon the computer program product of embodiment 12C.

Claims

1. A computer-implemented method for model aggregation in a federated system, the method comprising the steps of: i) Obtaining, by each client computing device, information specifying a model training process, and global values for a set of parameters (Pg) of a machine-learned model from a server computing device; ii) Training, by each of said client computing devices, the machine-learned model based at least in part on a local dataset to obtain updated values for the set of parameters (Pg,i) of the machine-learned model, wherein the local dataset is stored locally by said client computing device; iii) Communicating, by each client computing device, information descriptive of the updated values for the set of parameters (Pg,i), and a set of model evaluation metrics (Ei), to a server computing device; iv) Computing, by said server computing device, a global parameter based on updated values for each set of parameters (Pg,i) obtained from a multiplicity of said client computing devices selected at least in part by model evaluation metrics thereof, wherein the global parameter is an average (P̄g) of said update matrices ([Pg,1,…,Pg,N]), and a residual parameter matrix ([Rg,1,…,Rg,N]) is calculated by subtracting the average of the parameter matrix (P̄g) from said updated matrices ([Pg,1,…,Pg,N]), and a low-rank approximation of the residual matrix is calculated via singular value decomposition ([Rg,1’,…,Rg,N’]); v) Outputting, by the server computing device, an aggregated model parameter, wherein the aggregated model parameter is a normalized pseudo inverse of the sum of the global parameter (P̄g) and said low-rank approximation of the residual matrix thereof ([Rg,1’,…,Rg,N’]).
2. The method of claim 1, wherein the information specifying the model training process comprises the type of machine-learned model, such as a generalized linear model or a Cox proportional hazard model.
3. The method of claim 1 or 2, wherein the information specifying the model training process comprises the variables for model training, such as age, gender, and/or biomarker values.
4. The method of any of claims 1-3, wherein the information specifying the model training process comprises the inclusion and exclusion criteria for the local dataset.
5. The method of any of claims 1-4, wherein the set of model evaluation metrics comprises the performance metrics returned to the server computing device.
6. The method of claim 5, wherein the performance metrics returned to the server computing device comprise one or more metrics such as goodness of fit (R2), AUC of the ROC from cross-validation, and/or other acceptance criteria.
7. The method of claim 6, wherein the acceptance criterion for the goodness of fit is greater than 0.6.
8. The method of any of claims 1-7, wherein the federated system comprises a network of hospitals.
9. The method of any of claims 1-8, wherein at least one of the client computing devices is an edge device, such as a smartphone or a smart watch and/or a diagnostic instrument.
10. A client computing device, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform the method of any of claims 1-9.
11. A server computing device, comprising: at least one processor; and at least one computer-readable medium that stores instructions that, when executed by the at least one processor, cause the server computing device to perform the method of any of claims 1-9.
12. A system for federated learning comprising a server computing device and a plurality of client computing devices, wherein the system is configured to perform the method of any of claims 1-9.
13. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of claims 1-9.
14. A computer-readable medium having stored thereon the computer program product of claim 13.
15. The invention as hereinbefore described.
EP23745545.6A 2022-08-02 2023-07-27 Methods for training models in a federated system Pending EP4566008A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22188239 2022-08-02
PCT/EP2023/070795 WO2024028196A1 (en) 2022-08-02 2023-07-27 Methods for training models in a federated system

Publications (1)

Publication Number Publication Date
EP4566008A1 true EP4566008A1 (en) 2025-06-11

Family

ID=82786529

Family Applications (1)

Application Number Title Priority Date Filing Date
EP23745545.6A Pending EP4566008A1 (en) 2022-08-02 2023-07-27 Methods for training models in a federated system

Country Status (2)

Country Link
EP (1) EP4566008A1 (en)
WO (1) WO2024028196A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117954082B (en) * 2024-03-26 2024-06-18 中国人民解放军总医院 Co-disease clinical auxiliary decision-making method and system between complex diseases based on federal large model
CN119849558B (en) * 2024-12-25 2025-08-08 中铁二院工程集团有限责任公司 Lightweight federal learning aggregation method based on genetic algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553484B (en) * 2020-04-30 2023-09-08 同盾控股有限公司 Federal learning method, device and system
US20210365841A1 (en) * 2020-05-22 2021-11-25 Kiarash SHALOUDEGI Methods and apparatuses for federated learning
CN112862011A (en) * 2021-03-31 2021-05-28 中国工商银行股份有限公司 Model training method and device based on federal learning and federal learning system
CN114494771B (en) * 2022-01-10 2024-06-07 北京理工大学 Federal learning image classification method capable of defending back door attack

Also Published As

Publication number Publication date
WO2024028196A1 (en) 2024-02-08
