
WO2025085632A1 - Method of federated external control arms for privacy-enhanced causal inference on distributed data - Google Patents

Method of federated external control arms for privacy-enhanced causal inference on distributed data

Info

Publication number
WO2025085632A1
WO2025085632A1 PCT/US2024/051768 US2024051768W WO2025085632A1 WO 2025085632 A1 WO2025085632 A1 WO 2025085632A1 US 2024051768 W US2024051768 W US 2024051768W WO 2025085632 A1 WO2025085632 A1 WO 2025085632A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
training
centers
data
hessian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/051768
Other languages
English (en)
Inventor
Jean Du TERRAIL
Quentin KLOPFENSTEIN
Honghao Li
Imke MAYER
Mohammad HALLAL
Nicolas Loiseau
Mathieu ANDREUX
Felix BALAZARD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Owkin Inc
Original Assignee
Owkin Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Owkin Inc filed Critical Owkin Inc
Publication of WO2025085632A1
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for simulation or modelling of medical disorders
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for mining of medical data, e.g. analysing previous cases of other patients
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for electronic clinical trials or questionnaires
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for patient-specific data, e.g. for electronic patient records
    • G16H20/00: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10: ICT specially adapted for therapies or health-improving plans, relating to drugs or medications, e.g. for ensuring correct administration to patients

Definitions

  • This invention relates generally to machine learning and more particularly to using federated learning methods to adapt external control arms techniques for privacy enhanced causal inference on distributed data.
  • BACKGROUND OF THE INVENTION [0002] Correctly inferring causal relationships, such as treatment effects, has been an active research topic and remains an open problem in clinical research. The current gold standard to establish the effect of a new drug is to randomly assign patients to either a treated group (or arm) or to a control group.
  • Treatment given to patients in the control arm can be placebo or the current standard of care (SOC), against which the new experimental drug will be compared.
  • SOC current standard of care
  • the populations of different treatment groups are directly comparable and the effect of the drug in terms of target outcomes can be assessed without biases.
  • RCT randomized controlled trials
  • RWE real-world evidence
  • ECA external control arm
  • IPTW inverse probability weighted treatment effect estimation
  • FL Federated Learning
  • PET privacy-enhancing technology
  • IP intellectual property
  • FL approaches enable the use of ECA in the settings where the external control arm and the experimental treatment arm are held in different institutions. Such settings have received little attention from the community, with no working methodology in the case of time-to-event outcomes that are central in oncology.
  • the device trains a first model using a plurality of training centers, wherein each training center includes a private set of training data and a compute unit, where the first model estimates the probability of receiving the experimental treatment for each sample given the distributed private set of training data.
  • the device may train a second model using the plurality of training centers, wherein the training of the second model includes computing weights for each sample using the estimated probabilities from the first model, and the second model then estimates the effect of the drug on time-to-event outcomes on weighted data.
  • the device may use the second model in addition to a third distributed procedure to test whether the experimental drug has a significant effect on the survival outcome in a robust fashion.
  • Figure 1 is an illustration of one embodiment of a system for computing a causal effect using federated learning, which allows private data to remain on the premises of its owners while learning global insights. This enhances the privacy of the training and keeps participants in control of their data, as opposed to pooling data in a central place, while giving equivalent results.
  • Figure 2 is a flow diagram of one embodiment of a process to compute a causal effect using federated learning that allows for private data to remain on the premises of its owners while learning global insights.
  • Figure 3 is a flow diagram of one embodiment of a process to train a propensity score model.
  • Figure 4 is a flow diagram of one embodiment of a process to train a Cox model using federated IPTW.
  • Figure 5 is a flow diagram of one embodiment of a process to perform federated IPTW.
  • Figure 6 is a flow diagram of one embodiment of a process to compute gradient and hessian components at each center.
  • Figure 7 is a comparison, in one embodiment, between pooled IPTW and the method called federated external control arm (FedECA), or federated IPTW.
  • Figure 8 is a flow diagram of one embodiment of a process to test the Cox model.
  • Figure 9 illustrates one example of a typical computer system, which may be used in conjunction with the embodiments described herein.
  • DETAILED DESCRIPTION A method and apparatus of a device that computes a federated IPTW to leverage multicentric real-world data or historical clinical trials as a control group for a single-arm trial while accounting for sampling biases due to the lack of randomization is described.
  • federated external control arm FedECA
  • federated IPTW federated external control arm
  • numerous specific details are set forth to provide a thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments of the present invention may be practiced without these specific details. In other instances, well-known components, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. [0019] Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.
  • Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • pooled will be used to indicate a setting with no privacy constraints where all data can be pooled in a third-party entity. This simpler “pooled” setting will act as a reference to validate the results of the method.
  • processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general- purpose computer system or a dedicated machine), or a combination of both.
  • processing logic comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general- purpose computer system or a dedicated machine), or a combination of both.
  • ECA external control arms
  • IPTW inverse probability of treatment weighting
  • Federated Learning is used to circumvent this data bottleneck in a privacy-enhancing fashion.
  • This can be described as federated IPTW for time-to-event outcomes and also is denoted as “federated external control arm” (FedECA).
  • FedECA federated external control arm
  • FedECA is equivalent or very close to IPTW on pooled data and provides the same or similar statistical power and type I error for robust estimation.
  • the addition of differential privacy to federated IPTW is studied.
  • ECA Historical clinical trials, insurance claims, and EHR have been considered as the data source for building an ECA.
  • a major obstacle to the feasibility of ECA is data sharing. Due to its sensitivity, health data is strictly regulated by, e.g., the General Data Protection Regulation (GDPR) in the EU and the Health Insurance Portability and Accountability Act (HIPAA) in the US and cannot be shared without being anonymized. Even in cases where compliant sharing is technically possible, data can be considered as a strategic asset by the pharmaceutical companies and healthcare centers that could take part in the ECA, which also drastically limits the ability to pool data in a single place.
  • GDPR General Data Protection Regulation
  • HIPAA Health Insurance Portability and Accountability Act
  • MAIC (matching-adjusted indirect comparison) is also applicable as it only assumes access to individual patient data from one arm and access to summary statistics from the other arm.
  • an algorithm is presented to adapt the IPTW methodology to federated datasets, under varying privacy constraints.
  • the proposed methods FedECA (Federated External Control Arm) and DP-FedECA (Differentially Private FedECA), extend the framework of WebDISCO (Web Distributed Cox model training) to use propensity models in a federated setting with time-to-event outcomes to provide treatment effect and associated robust variance estimation.
  • WebDISCO Web Distributed Cox model training
  • Substra an open-source FL software hosted by the Linux Foundation for AI that was previously used in similar privacy sensitive contexts in healthcare, can be used.
  • FIG. 1 is an illustration of one embodiment of a system 100 for computing a causal effect using federated learning that allows for private data to remain on the premises of its owners while learning global insights and gives equivalent results.
  • the system 100 includes a central aggregator 102 coupled to multiple training centers 104A-N and a “pharma” node 106, where the label “pharma” indicates the owner of the treated arm.
  • the system 100 trains the machine learning model(s) to leverage multicentric real- world data or historical clinical trials coming from 104A-N as a control group for a single-arm trial of the new drug elaborated by the organization owning the data in the “pharma” node 106, while accounting for sampling biases due to the lack of randomization.
  • each of the central aggregator 102 and training centers 104A-N and 106 can include one of a personal computer, laptop, server (on premise or in the cloud), mobile device (e.g., smartphone, laptop, personal digital assistant, music playing device, gaming device, etc.), and/or any device capable of processing data.
  • each of the aggregator 102 and training centers 104A-N and 106 can include either a virtual or a physical device.
  • a machine learning model (or simply a model) is a potentially large file containing the parameters of a trained model.
  • each of the training centers 104A-N includes data from historical patients that can be used for training the model(s) using FedECA and training center 106 owns the data from the treated arm.
  • one or more of the training centers are used to construct the external control arm.
  • each of the training centers 104A-N from the control arm can be hospitals or medical institutions that collect the patient training data.
  • FedECA uses two different models, a propensity score model and a Cox model. As both models are generalized linear models, a model refers to a set of weights that describes the model.
  • a trained model is the result of training a model with a given set of training data.
  • the aggregator 102 manages the communication across centers and the updates of the model.
  • a third-party might be responsible for the orchestration of the distributed computation.
  • the central aggregator 102 receives the quantities computed in each center and aggregates them before redistributing the updated quantities, which now include collective insights from all the training centers 104A-N and 106, to each of the individual training centers 104A-N and 106.
  • each of the training centers 104A-N and 106 are used to compute components used for training of the two models.
  • the training centers 104A-N and 106 each receive the same propensity model initialization, which is a deterministic set of weights or a random set of weights that each center will use for computing the components used to train the models; each center computes these components and returns them to the aggregator 102.
  • each of the training centers 104A-N and 106 receives a vector full of zeros as initialization for the propensity model and uses it to compute a local gradient and Hessian of the current model using the training data that each of the training centers 104A-N and 106 holds.
  • the raw training data of each of the training centers 104A-N and 106 remains undisclosed to any of the other participants involved including the aggregator 102.
  • the aggregated quantities (gradients, hessians and building blocks thereof) are shared, whereas the private data for each of the training centers 104A-N remains private to that training center 104A-N.
  • the aggregator averages the local Hessians and gradients and redistributes them back to the training centers 104A-N and 106. This procedure is repeated multiple times until convergence.
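  • As a non-limiting illustration of this aggregation pattern, the following Python/NumPy sketch shows a server-side Newton loop under the assumptions stated above (zero initialization, averaging, repeat until convergence). The method compute_local_grad_hess and the stopping rule are hypothetical placeholders for the center-side computation, not the patented implementation.

```python
import numpy as np

def federated_newton(centers, dim, max_rounds=20, tol=1e-6):
    """Aggregator-side sketch: average local gradients/Hessians received from
    the centers, take a Newton-Raphson step, and redistribute the parameters."""
    theta = np.zeros(dim)  # deterministic (all-zeros) initialization
    for _ in range(max_rounds):
        # each center computes its gradient/Hessian on its own private data;
        # only these aggregate quantities leave the center
        grads, hessians = zip(*(c.compute_local_grad_hess(theta) for c in centers))
        grad = np.mean(grads, axis=0)      # aggregated gradient
        hess = np.mean(hessians, axis=0)   # aggregated Hessian
        step = np.linalg.solve(hess, grad)
        theta = theta - step               # Newton-Raphson update
        if np.linalg.norm(step) < tol:     # one possible convergence criterion
            break
    return theta
```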
  • each of the training centers 104A-N and 106 computes components (scores or weightings) that are used to train the Cox model in a similar fashion, although the components shared by the training centers are not exactly the same. While in one embodiment two training centers 104A-N are illustrated, in other embodiments there can be more or fewer training centers. [0036] To illustrate the performance of the proposed FL implementation, simulations with synthetic data are relied upon. Covariates and related time-to-event outcomes respecting the proportional hazards (PH) assumption, with the baseline hazard function derived from a Weibull distribution, are simulated.
  • PH proportional hazards
  • a constant treatment allocation is assumed across the population within a training center but in another embodiment training centers could have patient data with both treated and untreated patients.
  • the data generation process consists of several consecutive steps that are described below assuming our target is a dataset with p covariates and n samples.
  • the synthetic data generation procedure of this embodiment is only used to illustrate the capabilities of the method, which should operate the same way on any synthetic or non-synthetic data distribution irrespective of the details of the generation. [0037] First, a design matrix $X \in \mathbb{R}^{n \times p}$ is drawn from a multivariate normal distribution $\mathcal{N}(0, \Sigma)$ to obtain (baseline) observations for n individuals described by p covariates.
  • the covariance matrix $\Sigma$ is taken to be a Toeplitz matrix, such that the correlation coefficients of the covariates decay geometrically. This implies that, for a fixed $\rho > 0$, the correlation between two variables $X_i$ and $X_j$ is given by $\rho^{|i-j|}$.
  • Such a covariance matrix implies a locally and hierarchically grouped structure underlying the covariates, which we choose to mimic the potentially complex structure of real-world data.
  • the coefficients $\beta_i$ of the linear combination used to build the hazard ratio are drawn from a standard normal distribution.
  • the treatment allocation variable A is simulated in such a way that it depends on the covariates X1, . . . , Xp. More precisely, it is assumed that A follows a Bernoulli distribution, where the probability of being treated (the propensity score) q depends on a linear combination of the covariates, connected by a logit link function g.
  • the coefficients $\gamma_i$ of the linear combination are drawn from a uniform distribution over a range that is symmetric around 0 and normalized by the number of covariates.
  • the treatment allocation variable $A_i$ is composed with the constant treatment effect, defined here as the hazard ratio, to obtain the final hazard ratio $h_i$ for each individual.
  • the time-to-event $T_i^*$ of each sample is then drawn from a Weibull distribution with a fixed shape parameter and a scale depending on $h_i$ and that shape. Meanwhile, for all samples, a constant dropout (or censoring) rate $d$ across time is assumed, resulting in a censoring time that follows an exponential distribution.
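  • A minimal Python sketch of this synthetic data generation is given below for illustration; the parameter names (rho, shape, scale, dropout) and default values are illustrative assumptions, not prescribed by the embodiment.

```python
import numpy as np
from scipy.linalg import toeplitz

def simulate_trial(n=1000, p=10, rho=0.5, hazard_ratio=0.7,
                   shape=1.5, scale=1.0, dropout=0.01, seed=0):
    """Sketch of the data generation: Toeplitz-correlated covariates, logistic
    treatment allocation, Weibull event times, exponential censoring."""
    rng = np.random.default_rng(seed)
    cov = toeplitz(rho ** np.arange(p))                 # geometric correlation decay
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    gamma = rng.uniform(-1.0 / p, 1.0 / p, size=p)      # symmetric range, scaled by p
    prop = 1.0 / (1.0 + np.exp(-X @ gamma))             # propensity scores (logit link)
    A = rng.binomial(1, prop)                           # treatment allocation
    beta = rng.standard_normal(p)                       # hazard coefficients
    h = np.exp(X @ beta) * hazard_ratio ** A            # individual hazard
    # Weibull event times via inverse-transform sampling; the scale depends on h
    T_event = scale * (-np.log(rng.uniform(size=n)) / h) ** (1.0 / shape)
    T_cens = rng.exponential(1.0 / dropout, size=n)     # constant censoring rate
    T = np.minimum(T_event, T_cens)
    delta = (T_event <= T_cens).astype(int)             # event indicator
    return X, A, T, delta
```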
  • FedECA comprises three major steps: (1) training a propensity score model; (2) training a federated weighted Cox model; and (3) using the quantities produced by the Cox model training in a third federated procedure in order to compute a robust p-value, which gives a significance score of the treatment effect.
  • event of interest e.g., death or disease relapse
  • Let $n$ denote the total number of patients, indexed by $i$. [0042]
  • For a given event time $t_s$, let $D_s$ denote the set of patients with an event at this time, i.e., $D_s = \{\, i : T_i = t_s,\ \delta_i = 1 \,\}$ (4), where $\delta_i$ is the event indicator, and let $R_s$ denote the set of patients at risk at this time (also called the “risk set”), i.e., $R_s = \{\, i : T_i \ge t_s \,\}$ (5). Further, let $\mathcal{T}^*$ denote the set of times where at least one true event occurs, i.e., $\mathcal{T}^* = \{\, t_s : D_s \ne \emptyset \,\}$ (6).
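  • The sets above can be made concrete with a short sketch (variable names are illustrative): for each distinct observed event time, the death set collects the patients whose event occurs at that time and the risk set collects the patients still under observation.

```python
import numpy as np

def death_and_risk_sets(T, delta):
    """Return the true event times and, for each, the death set D_s and risk set R_s."""
    event_times = np.unique(T[delta == 1])              # the set of true event times
    D = {t: np.where((T == t) & (delta == 1))[0] for t in event_times}
    R = {t: np.where(T >= t)[0] for t in event_times}
    return event_times, D, R
```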
  • FIG. 2 is a flow diagram of one embodiment of a process 200 to compute a causal effect using federated learning that maintains privacy of the training data.
  • process 200 is performed by an aggregator of a federated learning system, such as the aggregator 102.
  • Figure 2 begins by process 200 receiving input data at block 202.
  • process 200 receives configuration information that is used to perform the FedECA computation.
  • process 200 trains a propensity score model. In one embodiment, due to the lack of randomization, for each sample, the probability of being assigned the treatment A depends on the covariates X.
  • a logistic model is used for the propensity score, e.g., $p_\theta(x_i) = \mathbb{P}(A_i = 1 \mid X_i = x_i) = \frac{1}{1 + \exp(-\theta^\top x_i)}$ (8)
  • Propensity score model training is described further below with reference to Figure 3. [0047]
  • process 200 trains a Cox model using federated IPTW to generate a final set of weights and a Hessian.
  • an IPTW weight $w_i \in (0, +\infty)$ is defined, based on the propensity score model trained in the previous block, as $w_i = \frac{A_i}{\hat{p}_\theta(x_i)} + \frac{1 - A_i}{1 - \hat{p}_\theta(x_i)}$ (9). [0048] In order to limit extreme values of the weights $w_i$, 99% quantiles are obtained using federated quantile computations (federated averaging applied on the pinball loss). In this embodiment, weights are clipped at $10^{-16}$ (or another clipping threshold) to avoid overflow errors.
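  • A minimal sketch of the IPTW weighting of Eq. (9) is shown below; the clipping constant and the optional quantile cap mirror the description above, but their exact values are assumptions.

```python
import numpy as np

def iptw_weights(A, prop_scores, eps=1e-16, q_hi=None):
    """Inverse probability of treatment weights: treated samples get 1/e(x),
    controls get 1/(1-e(x)). eps guards against overflow in the division;
    q_hi optionally caps extreme weights (e.g., at a 99% quantile)."""
    p = np.clip(prop_scores, eps, 1.0 - eps)
    w = A / p + (1 - A) / (1.0 - p)
    if q_hi is not None:
        w = np.minimum(w, q_hi)
    return w
```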
  • a weighted Cox proportional hazards (CoxPH) model is trained with parameters $\beta \in \mathbb{R}^q$, related to patient-specific variables $z_i \in \mathbb{R}^q$.
  • the variables z i are not the same as the covariates xi.
  • additional covariates can be used, especially if they are known confounders. For simplicity, and in one embodiment, vector notations are used to support both cases.
  • the CoxPH model is fitted by maximizing a data-fidelity term consisting of the partial likelihood $L(\beta)$ with the Breslow approximation for ties.
  • $L(\beta) = \prod_{s \in \mathcal{T}^*} \frac{\exp\!\big(\sum_{i \in D_s} w_i \beta^\top z_i\big)}{\big(\sum_{j \in R_s} w_j \exp(\beta^\top z_j)\big)^{\sum_{i \in D_s} w_i}}$ (10), where the second form has been rewritten using the sets $D_s$ and $R_s$.
  • the log-likelihood $\ell(\beta) = \log L(\beta)$ is used, which reads $\ell(\beta) = \sum_{s \in \mathcal{T}^*} \Big[ \sum_{i \in D_s} w_i \beta^\top z_i - \big(\sum_{i \in D_s} w_i\big) \log\!\big(\sum_{j \in R_s} w_j \exp(\beta^\top z_j)\big) \Big]$ (11)
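  • For illustration only, a pooled (non-federated) sketch of the quantity in Eq. (11) is shown below; it makes the objective concrete but is not the federated computation described later.

```python
import numpy as np

def weighted_cox_loglik(beta, Z, T, delta, w):
    """Weighted CoxPH partial log-likelihood with the Breslow convention for ties."""
    ll = 0.0
    for t in np.unique(T[delta == 1]):
        D = (T == t) & (delta == 1)          # death set at time t
        R = T >= t                           # risk set at time t
        ll += w[D] @ (Z[D] @ beta)           # weighted linear predictor of the events
        ll -= w[D].sum() * np.log(np.sum(w[R] * np.exp(Z[R] @ beta)))
    return ll
```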
  • Process 200 tests the Cox model at block 208.
  • process 200 uses the fitted parameters $\hat\beta$ and, in block 208, estimates the variance matrix of $\hat\beta$ using the Hessian of $\ell$ at $\hat\beta$, as well as the aggregated risk-set quantities $\Sigma^{(0)}_s$, $\Sigma^{(1)}_s$ and the weighted death-set sums, by constructing a robust (sandwich) variance estimator $\widehat{V}(\hat\beta) = H^{-1} M H^{-1}$ (12), where $H = -\nabla^2 \ell(\hat\beta)$ and $M = \sum_{i=1}^{n} \varphi_i \varphi_i^\top$ (13), and where $\varphi_i$ is the score residual of patient $i$, computed from its weight $\hat{w}_i$, its covariates $z_i$, and the aggregated risk-set quantities (14).
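  • The sandwich assembly of Eqs. (12)-(13) can be sketched as follows, with the per-patient score residuals phi taken as given (their computation per Eq. (14) is detailed in the specification); this is a sketch, not the distributed implementation.

```python
import numpy as np

def robust_variance(hessian, phi):
    """Sandwich estimator V = H^{-1} M H^{-1}, where `hessian` is the Hessian of the
    log-likelihood at beta_hat (so H = -hessian) and phi has shape (n_patients, q)."""
    H_inv = np.linalg.inv(-hessian)
    M = phi.T @ phi                   # sum_i phi_i phi_i^T
    return H_inv @ M @ H_inv
```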
  • FIG. 3 is a flow diagram of one embodiment of a process 300 to train a propensity score model.
  • process 300 fits a model to obtain the propensity scores used in Eq. (9), based on the distributed data.
  • this strategy computes full-batch gradients and Hessians, in time $O(n_k)$ on each center, and each communication with the aggregator requires the exchange of $O(p^2)$ floating-point numbers.
  • For ECAs in this embodiment, we might have $n_k \approx 10{,}000$ and $p \approx 50$, making such a second-order approach tractable.
  • Figure 3 begins by receiving the initial propensity score model at block 302.
  • process 300 initializes the propensity model parameters.
  • process 300 zeroes the parameters of the propensity model.
  • Process 300 sets the initialized propensity model to the current model at block 306.
  • process 300 sends the current model parameters to each of the training centers, where each training center computes a gradient and Hessian of the model using the current model parameters.
  • each training center $k$ computes the local gradient of the propensity model's log-likelihood on its own data using Eq. (17): $\nabla \ell_k(\theta) = \sum_{i \in \text{center } k} \big(A_i - p_\theta(x_i)\big)\, x_i$ (17)
  • the $k$-th training center likewise computes the local Hessian $\nabla^2 \ell_k(\theta) = -\sum_{i \in \text{center } k} p_\theta(x_i)\big(1 - p_\theta(x_i)\big)\, x_i x_i^\top$ (18)
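  • A plausible instantiation of this local computation for a logistic propensity model is sketched below (the function name and array layout are assumptions); only the resulting gradient and Hessian would be shared with the aggregator.

```python
import numpy as np

def local_logistic_grad_hess(theta, X_k, A_k):
    """One center's gradient and Hessian of the logistic log-likelihood,
    computed on local data (X_k: covariates, A_k: treatment indicators)."""
    p = 1.0 / (1.0 + np.exp(-X_k @ theta))            # predicted propensity scores
    grad = X_k.T @ (A_k - p)                          # score vector
    hess = -(X_k * (p * (1 - p))[:, None]).T @ X_k    # negative-definite Hessian
    return grad, hess
```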
  • process 300 can compute a gradient without computing the Hessian by adding noise to the gradient computation.
  • process 300 modifies the gradient computation $\nabla \ell_k(\theta)$ by stochastically sampling with replacement a batch of size $B \le \min_k n_k$ in each training center and then restricting the computation of the gradient $\nabla \ell_k(\theta)$ to this batch.
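  • A heavily simplified sketch of this noisy, subsampled gradient is given below; the paragraphs above only specify sampling with replacement and the addition of noise, so the Gaussian noise model and its scale are assumptions that would need to be calibrated to the desired privacy guarantee.

```python
import numpy as np

def noisy_local_gradient(theta, X_k, A_k, batch_size, noise_scale, rng):
    """Sample a batch with replacement, compute the logistic gradient on that
    batch only, and perturb it with noise before sharing it."""
    idx = rng.choice(len(A_k), size=batch_size, replace=True)
    p = 1.0 / (1.0 + np.exp(-X_k[idx] @ theta))
    grad = X_k[idx].T @ (A_k[idx] - p)
    return grad + rng.normal(0.0, noise_scale, size=grad.shape)
```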
  • process 300 receives the gradients and Hessians computed by each training center.
  • process 300 receives the gradients and not the Hessian.
  • Process 300 computes the global gradient and Hessian from the individual gradient and Hessian components received from the training centers at block 312. In one embodiment, process 300 computes an average of the gradient and Hessian components using Eqs. (19) and (20): $\nabla \ell(\theta) = \frac{1}{K} \sum_{k=1}^{K} \nabla \ell_k(\theta)$ (19) and $H(\theta) = \frac{1}{K} \sum_{k=1}^{K} \nabla^2 \ell_k(\theta)$ (20). In another embodiment, process 300 receives only the (noisy) gradients from each of the training centers, and process 300 just computes the global gradient using (19) above.
  • process 300 computes the updated propensity score model parameters.
  • process 300 computes the updated propensity score model parameters with a Newton-Raphson step based on the aggregated quantities of Eqs. (19) and (20): $\theta^{(t+1)} = \theta^{(t)} - H(\theta^{(t)})^{-1}\, \nabla \ell(\theta^{(t)})$
  • process 300 determines if the model has converged. In one embodiment, process 300 determines that the model has converged if the number of loops has reached a max loop threshold.
  • process 300 can determine if the model has converged using a different scheme, such as inspecting the norm of the hessian, the difference between the current likelihood and the likelihood from the round just before, or another scheme to determine convergence. If the model has converged, process 300 outputs the updated propensity score model at 318. If the model has not converged, process 300 sets the updated model to the current model at block 320. Execution proceeds to block 308 above.
  • the propensity score model that is computed in Figure 3 is used for the training of the Cox model.
  • Figure 4 is a flow diagram of one embodiment of a process 400 to train a Cox model using federated IPTW. As described above with reference to Eq.
  • the log-likelihood equation has cross-terms that make it difficult to spread the computation of the log-likelihood across different training centers.
  • a method to minimize the regularized weighted CoxPH model in a federated fashion is described. Since the non-separability of the weighted CoxPH log-likelihood $\ell(\beta)$ prevents the use of straightforward FL algorithms, a modified WebDISCO (Web Distributed Cox model) approach is used to build a pooled-equivalent second-order method.
  • DISCO Web Distributed Cox model
  • the main difficulty of federating Equation (11) stems from the non-separability of the log-likelihood component because of the cross-center terms.
  • the risk set $R_s$ is a union of per-center terms, $R_s = \bigcup_{k=1}^{K} R_{k,s}$ (22)
  • the modified WebDISCO includes performing an iterative server-level Newton-Raphson descent on L.
  • the gradient $\nabla \ell(\beta)$ and Hessian $\nabla^2 \ell(\beta)$ thus need to be computed in a federated fashion.
  • For each center $k$ and event time $s$, local death-set sums and risk-set aggregates are defined: $\Sigma^{(0)}_{k,s}(\beta) = \sum_{j \in R_{k,s}} w_j e^{\beta^\top z_j}$ (29), $\Sigma^{(1)}_{k,s}(\beta) = \sum_{j \in R_{k,s}} w_j e^{\beta^\top z_j} z_j$ (30), $\Sigma^{(2)}_{k,s}(\beta) = \sum_{j \in R_{k,s}} w_j e^{\beta^\top z_j} z_j z_j^\top$ (31), $W_{k,s} = \sum_{i \in D_{k,s}} w_i$ (32) and $Z_{k,s} = \sum_{i \in D_{k,s}} w_i z_i$ (33), where $D_{k,s}$ and $R_{k,s}$ are the death set and risk set restricted to center $k$. Writing $\Sigma^{(m)}_s = \sum_k \Sigma^{(m)}_{k,s}$, $W_s = \sum_k W_{k,s}$ and $Z_s = \sum_k Z_{k,s}$, Equations (23) and (24) can be re-written as $\nabla \ell(\beta) = \sum_{s \in \mathcal{T}^*} \Big[ Z_s - W_s \frac{\Sigma^{(1)}_s(\beta)}{\Sigma^{(0)}_s(\beta)} \Big]$ (34), and (25) as $\nabla^2 \ell(\beta) = -\sum_{s \in \mathcal{T}^*} W_s \Big[ \frac{\Sigma^{(2)}_s(\beta)}{\Sigma^{(0)}_s(\beta)} - \frac{\Sigma^{(1)}_s(\beta)\, \Sigma^{(1)}_s(\beta)^\top}{\Sigma^{(0)}_s(\beta)^2} \Big]$ (35). In one embodiment, the gradient and Hessian can therefore be expressed in terms of per-center quantities, e.g., $\nabla \ell(\beta) = \sum_{s \in \mathcal{T}^*} \Big[ \sum_{k=1}^{K} Z_{k,s} - \big(\sum_{k=1}^{K} W_{k,s}\big) \frac{\sum_{k=1}^{K} \Sigma^{(1)}_{k,s}(\beta)}{\sum_{k=1}^{K} \Sigma^{(0)}_{k,s}(\beta)} \Big]$ (36) and analogously for the Hessian (37), assuming the set of all true event times $\mathcal{T}^*$ is shared with the centers.
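  • The aggregation of Eqs. (36)-(37) can be illustrated with the following sketch, which combines the per-center summaries into the global gradient and Hessian; the dictionary layout of per_center is an illustrative assumption.

```python
import numpy as np

def assemble_cox_grad_hess(per_center, event_times, q):
    """Combine per-center summaries into the global weighted-Cox gradient and Hessian.
    per_center[k][s] holds (W_ks, Z_ks, S0_ks, S1_ks, S2_ks) for center k at event
    time s, with zeros when the center has no contribution at that time."""
    grad = np.zeros(q)
    hess = np.zeros((q, q))
    for s in event_times:
        W = sum(per_center[k][s][0] for k in per_center)    # pooled death-set weight
        Z = sum(per_center[k][s][1] for k in per_center)    # pooled weighted covariates
        S0 = sum(per_center[k][s][2] for k in per_center)
        S1 = sum(per_center[k][s][3] for k in per_center)
        S2 = sum(per_center[k][s][4] for k in per_center)
        grad += Z - W * S1 / S0
        hess -= W * (S2 / S0 - np.outer(S1, S1) / S0 ** 2)
    return grad, hess
```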
  • Figure 4 begins by initializing the weights of the Cox model at block 402.
  • process 400 initializes the weights by setting the weights to zero.
  • process 400 could initialize the weights to any other deterministic value or generate random weights.
  • process 400 sets the initial weights to be the current weights.
  • Process 400 computes an updated gradient and Hessian at block 406. In one embodiment, process 400 computes the updated gradient and Hessian as described in Figure 5 below. At block 408, process 400 computes the updated weights using the updated gradient and Hessian.
  • process 400 computes the updated weights using a Newton-Raphson step: $\beta^{(t+1)} = \beta^{(t)} - \big[\nabla^2 \mathcal{L}(\beta^{(t)})\big]^{-1} \nabla \mathcal{L}(\beta^{(t)})$ (38), where the gradient $\nabla \mathcal{L}(\beta^{(t)})$ and Hessian $\nabla^2 \mathcal{L}(\beta^{(t)})$ of the regularized objective are obtained by adding the gradient and Hessian of the regularization term to $\nabla \ell(\beta^{(t)})$ and $\nabla^2 \ell(\beta^{(t)})$ (39), and $\nabla \ell(\beta^{(t)})$ and $\nabla^2 \ell(\beta^{(t)})$ are the gradient and Hessian components that have been computed using a distributed procedure by the training centers illustrated in Fig. 5 described below.
  • process 400 determines if the model has converged. In one embodiment, process 400 determines that the model has converged if the number of loops has reached a max loop threshold. In another embodiment, process 400 determines if the model has converged using a different scheme such as inspecting the norm of the global Hessian or computing the improvement over the current log-likelihood. If the model has converged, process 400 outputs the updated model weights and Hessian at block 412. If the model has not converged, process 400 sets the updated model to the current model at block 414. Execution proceeds to block 406 above. [0066] As described above, a distributed procedure is used to compute the gradient and Hessian components.
  • FIG. 5 is a flow diagram of one embodiment of a process 500 to compute gradient and Hessian components for a federated IPTW.
  • process 500 is performed by an aggregator to compute the gradient and Hessian components, such as the aggregator 102 as described in Figure 1 above.
  • process 500 begins by receiving the current weights and the set S at block 502.
  • process 500 requests a computation of $W_{k,s}$, $Z_{k,s}$, and the aggregated risk-set components ($\Sigma^{(0)}_{k,s}$, $\Sigma^{(1)}_{k,s}$, $\Sigma^{(2)}_{k,s}$) from the training centers.
  • The computation of $W_{k,s}$, $Z_{k,s}$, and the risk-set components ($\Sigma^{(0)}_{k,s}$, $\Sigma^{(1)}_{k,s}$, $\Sigma^{(2)}_{k,s}$) is further described in Figure 6 below.
  • Process 500 receives $W_{k,s}$, $Z_{k,s}$, and the risk-set components ($\Sigma^{(0)}_{k,s}$, $\Sigma^{(1)}_{k,s}$, $\Sigma^{(2)}_{k,s}$) from the training centers at block 506. With these components, process 500 computes the global gradient and Hessian at block 508. In one embodiment, process 500 uses Equations (36) and (37) to compute the gradient and Hessian with $W_{k,s}$, $Z_{k,s}$, and the risk-set components. At block 510, process 500 determines if the computation is complete. In one embodiment, the computation is completed based on achieving a maximum number of steps.
  • each training center can compute intermediate quantities that are used by the aggregator to compute a Hessian and risk set components for the Cox model.
  • Figure 6 is a flow diagram of one embodiment of a process 600 to compute Hessian and risk-set components of the Cox model at each center. In one embodiment, Figure 6 begins by receiving the weights and the set $\mathcal{T}^*$ that are used to compute the intermediate data at block 602.
  • process 600 computes $W_{k,s}$ and $Z_{k,s}$ for each event time in the set $\mathcal{T}^*$, using its own training data.
  • process 600 computes $W_{k,s}$ and $Z_{k,s}$ using Equations (32) and (33) above.
  • For each event time in the set $\mathcal{T}^*$ such that $W_{k,s} > 0$, process 600 computes the risk-set components.
  • process 600 computes ($\Sigma^{(0)}_{k,s}$, $\Sigma^{(1)}_{k,s}$, $\Sigma^{(2)}_{k,s}$) using Equations (29), (30), and (31) respectively.
  • Process 600 returns $W_{k,s}$, $Z_{k,s}$, and the risk-set components ($\Sigma^{(0)}_{k,s}$, $\Sigma^{(1)}_{k,s}$, $\Sigma^{(2)}_{k,s}$) to the aggregator at block 608.
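  • A sketch of this center-side computation is given below, mirroring the aggregator-side sketch above; array names are illustrative and only the returned summaries would leave the center.

```python
import numpy as np

def center_summaries(beta, Z_k, T_k, delta_k, w_k, event_times):
    """One center's per-event-time summaries: W_ks, Z_ks and the risk-set
    aggregates S0, S1, S2. Raw patient data never leaves the center."""
    out = {}
    for t in event_times:
        D = (T_k == t) & (delta_k == 1)              # local death set at time t
        R = T_k >= t                                  # local risk set at time t
        r = w_k[R] * np.exp(Z_k[R] @ beta)
        out[t] = (w_k[D].sum(),                       # W_ks
                  w_k[D] @ Z_k[D],                    # Z_ks
                  r.sum(),                            # S0_ks
                  r @ Z_k[R],                         # S1_ks
                  (Z_k[R] * r[:, None]).T @ Z_k[R])   # S2_ks
    return out
```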
  • the Cox model is tested. In one embodiment, the testing of the Cox model is distributed across the multiple training centers.
  • Figure 8 is a flow diagram of one embodiment of a process 800 to test the Cox model. In one embodiment, process 800 begins by receiving the outputted Cox model weights, the Hessian and the risk set components at block 802. In this embodiment, process 800 receives the outputted Cox model weights, the Hessian and the risk set components as described in Figure 4, block 412 above.
  • process 800 computes the per-center quantities $M_k$ defined in Eq. (43).
  • process 800 computes the $M_k$ by sending the Cox model weights $\hat\beta$, the Hessian, and the risk-set components to the training centers and requesting each of the training centers to compute a corresponding $M_k$ using these quantities.
  • each training center computes its $M_k$ using Eq. (43) below.
  • the training centers compute their $M_k$ by means of $\Sigma^{(0)}_s$, $\Sigma^{(1)}_s$, the Hessian, and their own private data, and send it to the aggregator 102 for it to reconstruct $M$ by summing all the $M_k$, which then allows it to compute the robust variance $\widehat{V}(\hat\beta)$ and to perform a Wald test.
  • process 800 receives the $M_k$ computed using the $\Sigma^{(0)}_s$ and $\Sigma^{(1)}_s$ aggregates in block 804.
  • each of the training centers computes its $M_k$ using Eq. (43) and returns the corresponding $M_k$ to process 800.
  • Process 800 computes $\widehat{V}(\hat\beta)$ with the gathered $M_k$ at block 808 using Eq. (42).
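  • Once the robust variance has been assembled, the final Wald test reduces to a few lines; the sketch below assumes the treatment coefficient is the entry of interest and uses the standard normal reference distribution.

```python
import math

def wald_test(beta_hat, var_robust, idx=0):
    """Wald test for one coefficient using the robust variance:
    z = beta / se, two-sided p-value from the standard normal distribution."""
    se = math.sqrt(var_robust[idx, idx])
    z = beta_hat[idx] / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value
```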
  • FIG. 7 is an illustration of a plot 700, in one embodiment, comparing pooled IPTW and FedECA.
  • the error between the pooled IPTW and FedECA is small, on the order of $10^{-5}$ for propensity scores, $10^{-5}$ (or smaller) for the hazard ratio, $10^{-4}$ for p-values, and $10^{-6}$ for the partial likelihood.
  • the errors between pooled IPTW and FedECA are small.
  • Figure 9 shows one example of a data processing system 900, which may be used with one embodiment of the present invention.
  • system 900 may be implemented by any of the training centers 104A-N, 106 and aggregator 102 as shown in Figure 1 above.
  • Figure 9 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components as such details are not germane to the present invention. It will also be appreciated that network computers and other data processing systems or other consumer electronic devices, which have fewer components or perhaps more components, may also be used with the present invention.
  • the computer system 900, which is a form of a data processing system, includes a bus 903 which is coupled to a microprocessor(s) 905 and a ROM (Read Only Memory) 907 and volatile RAM 909 and a non-volatile memory 911.
  • the microprocessor 905 may include one or more CPU(s), GPU(s), a specialized processor, and/or a combination thereof.
  • the microprocessor 905 may retrieve the instructions from the memories 907, 909, 911 and execute the instructions to perform operations described above.
  • the bus 903 interconnects these various components together and also interconnects these components 905, 907, 909, and 911 to a display controller and display device 917 and to peripheral devices 915 such as input/output (I/O) devices which may be mice, keyboards, modems, network interfaces, printers and other devices which are well known in the art.
  • peripheral devices 915 such as input/output (I/O) devices which may be mice, keyboards, modems, network interfaces, printers and other devices which are well known in the art.
  • I/O input/output
  • the volatile RAM (Random Access Memory) 909 is typically implemented as dynamic RAM (DRAM), which requires power continually in order to refresh or maintain the data in the memory.
  • DRAM dynamic RAM
  • the mass storage 911 is typically a magnetic hard drive or a magnetic optical drive or an optical drive or a DVD RAM or a flash memory or other types of memory systems, which maintain data (e.g., large amounts of data) even after power is removed from the system.
  • the mass storage 911 will also be a random-access memory although this is not required. While Figure 9 shows that the mass storage 911 is a local device coupled directly to the rest of the components in the data processing system, it will be appreciated that the present invention may utilize a non-volatile memory which is remote from the system, such as a network storage device which is coupled to the data processing system through a network interface such as a modem, an Ethernet interface or a wireless network.
  • the bus 903 may include one or more buses connected to each other through various bridges, controllers and/or adapters as is well known in the art.
  • Portions of what was described above may be implemented with logic circuitry such as a dedicated logic circuit or with a microcontroller or other form of processing core that executes program code instructions.
  • program code such as machine-executable instructions that cause a machine that executes these instructions to perform certain functions.
  • a “machine” may be a machine that converts intermediate form (or “abstract”) instructions into processor-specific instructions (e.g., an abstract execution environment such as a “virtual machine” (e.g., a Java Virtual Machine), an interpreter, a Common Language Runtime, a high-level language virtual machine, etc.), and/or electronic circuitry disposed on a semiconductor chip (e.g., “logic circuitry” implemented with transistors) designed to execute instructions such as a general-purpose processor and/or a special-purpose processor. Processes taught by the discussion above may also be performed by (in the alternative to a machine or in combination with a machine) electronic circuitry designed to perform the processes (or a portion thereof) without the execution of program code.
  • processor specific instructions e.g., an abstract execution environment such as a “ virtual machine” (e.g., a Java Virtual Machine), an interpreter, a Common Language Runtime, a high-level language virtual machine, etc.
  • electronic circuitry disposed on a semiconductor chip, e.g., “logic circuitry” implemented with transistors
  • the present invention also relates to an apparatus for performing the operations described herein.
  • This apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), RAMs, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
  • a machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
  • a machine- readable medium includes read only memory (“ROM” ); random access memory (“RAM” ); magnetic disk storage media; optical storage media; flash memory devices; etc.
  • An article of manufacture may be used to store program code.
  • An article of manufacture that stores program code may be embodied as, but is not limited to, one or more memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions.
  • Program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)).
  • a remote computer e.g., a server
  • a requesting computer e.g., a client
  • a communication link e.g., a network connection
  • these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. [0080] It should be kept in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and apparatus of a device that performs a robust federated IPTW analysis to leverage multicentric real-world data or historical clinical trials as a control group for a single-arm trial while accounting for sampling biases due to the lack of randomization. In an exemplary embodiment, the device trains a first model using a plurality of training centers, each training center including a private set of training data and a compute unit, the first model estimating the probability of receiving the experimental treatment for each sample given the private set of training data. In addition, the device may train a second model using the plurality of training centers, the training of the second model including computing weights for each sample using the estimated probabilities from the first model, with the second model then estimating the effect of the drug on time-to-event outcomes on the weighted data. Furthermore, the device may use the second model together with a third federated procedure to test, in a robust fashion, whether the experimental drug has a significant effect on the survival outcome.
PCT/US2024/051768 2023-10-20 2024-10-17 Method of federated external control arms for privacy-enhanced causal inference on distributed data Pending WO2025085632A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP23306845.1 2023-10-20
EP23306845 2023-10-20

Publications (1)

Publication Number Publication Date
WO2025085632A1 true WO2025085632A1 (fr) 2025-04-24

Family

ID=88697746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/051768 Pending WO2025085632A1 (fr) 2023-10-20 2024-10-17 Method of federated external control arms for privacy-enhanced causal inference on distributed data

Country Status (1)

Country Link
WO (1) WO2025085632A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190371468A1 (en) * 2015-09-30 2019-12-05 Inform Genomics, Inc. Systems and Methods for Predicting Treatment-Regimen-Related Outcomes
US20200411199A1 (en) * 2018-01-22 2020-12-31 Cancer Commons Platforms for conducting virtual trials
US20210142910A1 (en) * 2019-11-08 2021-05-13 Tempus Labs, Inc. Evaluating effect of event on condition using propensity scoring

Similar Documents

Publication Publication Date Title
Ying et al. Two‐stage residual inclusion for survival data and competing risks—An instrumental variable approach with application to SEER‐Medicare linked data
Rigdon et al. Randomization inference for treatment effects on a binary outcome
Cronin et al. strmst2 and strmst2pw: new commands to compare survival curves using the restricted mean survival time
Amico et al. The single‐index/Cox mixture cure model
Bar‐Yam The limits of phenomenology: from behaviorism to drug testing and engineering design
CN115943397A (zh) Federated doubly stochastic kernel learning for vertically partitioned data
Zenil‐Ferguson et al. chromploid: An R package for chromosome number evolution across the plant tree of life
Wang et al. A quantum approximate optimization algorithm with metalearning for maxcut problem and its simulation via tensorflow quantum
Knudson et al. Likelihood‐based inference for generalized linear mixed models: Inference with the R package glmm
Saarela et al. Predictive Bayesian inference and dynamic treatment regimes
WO2021174881A1 (fr) Method and apparatus for predicting multi-dimensional information combinations, computer device, and medium
Cai et al. CAPITAL: Optimal subgroup identification via constrained policy tree search
Zhang et al. Adaptively leveraging external data with robust meta‐analytical‐predictive prior using empirical Bayes
Kivaranovic et al. Conformal prediction intervals for the individual treatment effect
Li et al. Targeted learning on variable importance measure for heterogeneous treatment effect
Golchi et al. Estimating the sampling distribution of posterior decision summaries in Bayesian clinical trials
Renson et al. Identifying and estimating effects of sustained interventions under parallel trends assumptions
Griswold et al. Practical marginalized multilevel models
WO2025085632A1 (fr) Method of federated external control arms for privacy-enhanced causal inference on distributed data
Chen et al. A semiparametric mixture cure survival model for left‐truncated and right‐censored data
KR102892152B1 (ko) Method for training a model that derives medical prediction information through federated learning, and system and computer-readable medium for performing the same
Swihart et al. A Unifying Framework for Marginalised Random‐Intercept Models of Correlated Binary Outcomes
Tang Size and power estimation for the Wilcoxon–Mann–Whitney test for ordered categorical data
Smith et al. Performance of Cross-Validated Targeted Maximum Likelihood Estimation
Choi et al. Efficient semiparametric mixture inferences on cure rate models for competing risks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24880575

Country of ref document: EP

Kind code of ref document: A1