WO2023237874A1 - Health prediction method and apparatus for patients with COPD - Google Patents
- Publication number
- WO2023237874A1 (PCT/GB2023/051478)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- prediction
- patient
- features
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/63—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- the present invention relates to a method and apparatus for predicting at least one health outcome, for example for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD).
- COPD chronic obstructive pulmonary disease
- the predicting of the health outcome may be calibrated, fair and explainable.
- COPD is characterised by frequent exacerbations that may result in a reduction in quality of life and disease progression for patients.
- An exacerbation may comprise an acute deterioration of signs and symptoms associated with COPD, for example breathlessness, coughing and fatigue. In some cases, an exacerbation may require hospital treatment.
- COPD exacerbations are responsible for a large proportion of the disease burden, adverse outcomes and healthcare costs associated with COPD. COPD exacerbations account for one in eight of all UK hospital admissions and are projected to cost the NHS £2.5bn per year by 2030.
- COPD management may typically be based on a reactive approach instead of a preventative approach.
- delays in recognising treatable opportunities may result in COPD care quality gaps.
- Such care quality gaps may limit provision of cost-effective evidence-based interventions which may improve quality of life, exacerbation outcomes and admission rates.
- a method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease comprising: receiving a clinical history of the patient; receiving patient-reported outcomes (PRO) data submitted by the patient; receiving sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and applying at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
- At least some of the sensor data may be captured by a wearable device worn by the patient.
- the PRO data may comprise wellbeing data.
- the wellbeing data may comprise data that is self-reported by the patient.
- the wellbeing data may comprise data on at least one of cough, phlegm, shortness of breath, wheezing, night sleeping, night waking, chest illness, smoking, chest tightness, activity, confidence in leaving the home, sleep quality, energy, mobility, self-care, pain, discomfort, anxiety, depression.
- the sensor data may comprise patient activity data, heart rate data and data indicative of restless sleep.
- the prediction may be based at least partially on decline in patient- reported wellbeing in combination with a change in patient activity data, data indicative of restless sleep and heart rate data.
- the prediction may comprise a prediction of an exacerbation of COPD occurring within a time period.
- the time period for which exacerbation of COPD is predicted may be described as a short-term or near-term time period.
- An extent of the time period may be between 24 hours and 1 week.
- An extent of the time period may be between 48 hours and 120 hours.
- the time period may be 72 hours.
- the prediction may comprise a prediction of mortality.
- the prediction of mortality may be a prediction of mortality occurring within a mortality time period.
- An extent of the mortality time period may be between 6 months and 5 years.
- the mortality time period may be 12 months.
- the prediction may comprise a prediction of readmission to hospital.
- the prediction of readmission to hospital may be a prediction of readmission to hospital occurring within a readmission time period.
- An extent of the readmission time period may be between 1 month and 2 years.
- the readmission time period may be 3 months.
- Readmissions may be all-cause readmissions or respiratory related readmissions.
- the prediction may be expressed as a score.
- the method may further comprise displaying or communicating the prediction to a clinician.
- the method may further comprise displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model.
- the method may further comprise displaying or communicating to the clinician a rationale for the prediction.
- the method may comprise obtaining a rationale for the prediction by applying one or more explainability procedures.
- the method may further comprise issuing an alert in dependence on the prediction.
- the method may further comprise obtaining predictions for the at least one health outcome for a plurality of patients, and displaying or communicating the predictions for at least some of the plurality of patients to a clinician.
- the method may further comprise filtering the patients in dependence on the predictions.
- the method may further comprise ordering the patients in dependence on the predictions.
- the method may further comprise providing messaging functionality for communication between the patient and one or more clinicians.
- the clinical history may be obtained or derived from the patient’s electronic health record (EHR).
- EHR electronic health record
- At least part of the clinical history may be input by one or more clinicians. At least part of the clinical history may be input as part of a patient onboarding process.
- the at least part of the clinical history may comprise at least one of: presence of one or more comorbidities of a set list of comorbidities, a number of exacerbations in the previous 12 months, a number of hospital admissions in the previous 12 months.
- the clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history.
- the PRO data may comprise data obtained by presenting a survey or questionnaire to a patient and recording the patient’s responses.
- the survey or questionnaire may comprise questions on the patient’s perceived wellbeing.
- the sensor data may comprise at least one of: activity data, respiratory data, heart rate, sleep data, energy expenditure, oxygen saturation.
- the sensor data may comprise data received from a home respiratory sensor.
- the sensor data may comprise data received from a non-invasive ventilator (NIV) mask.
- the sensor data may comprise respiratory rate data.
- the sensor data may comprise data on one or more ventilation parameters.
- Applying the at least one trained machine learning model may comprise applying a supervised trained machine learning model to input data for the patient, the input data representing a predetermined set of features representing and/or related to the clinical history, the PRO data and the sensor data, wherein the predetermined set of features are determined using an unsupervised machine learning procedure.
- the unsupervised machine learning procedure may comprise a principal component analysis and/or a clustering procedure.
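By way of illustration only, with synthetic data standing in for real patient features (the patent does not specify a library; scikit-learn is an assumption), such an unsupervised feature-determination step might be sketched as:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for a per-patient feature table (rows = patients)
X = rng.normal(size=(200, 12))

# Principal component analysis: keep components explaining 90% of variance
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X_scaled)

# Clustering the reduced representation can reveal patient subgroups,
# whose cluster label may itself serve as a derived input feature
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
```

The retained components and/or cluster labels could then form part of the predetermined set of features supplied to the supervised model.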
- the at least one trained machine learning model may comprise a supervised machine learning model or a model obtained using a supervised machine learning procedure.
- the supervised trained machine learning model may comprise a tree-based procedure, for example, a decision tree.
- the supervised trained machine learning model may comprise a k-means clustering procedure.
- the supervised algorithm may comprise a one-versus-rest procedure.
- the output may comprise a calibrated prediction of a health outcome.
- the output of the at least one model may comprise a probability of a health outcome.
- the probability may comprise a calibrated probability determined based on a calibration procedure.
- the probability may be obtained by applying a pre-determined transformation to the output of the model.
- Applying the trained machine learning model may comprise outputting a probability of the at least one health outcome together with the input numerical value for one or more input features and a contributing factor or score for said input feature.
- Applying the at least one trained machine learning model may comprise applying a plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome and wherein at least one of: the plurality of predictions are combined to obtain a combined prediction; the plurality of machine learning models are calibrated; an explainability procedure is applied to each of the plurality of trained models to obtain a contribution of the input to the prediction.
- the explainability procedure may be applied to obtain a rationale for the prediction.
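A minimal sketch of combining a plurality of predictions by averaging predicted probabilities (synthetic data; the particular models and the averaging rule are illustrative assumptions, not specified by the patent):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary outcome standing in for e.g. a 72-hour exacerbation label
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A plurality of trained models
models = [
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr),
    LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
]

# One simple combination: average each model's predicted probability
probs = np.mean([m.predict_proba(X_te)[:, 1] for m in models], axis=0)
combined = (probs >= 0.5).astype(int)
```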
- the method may comprise applying a demographic and class balancing process.
- the method may comprise determining one or more metrics representing an expected model behaviour at a probability threshold, optionally wherein the probability threshold is selected by a user.
- the method may comprise obtaining user input representing a selected probability threshold for the model.
- an apparatus comprising processing circuitry configured to: receive a clinical history of the patient; receive patient-reported outcomes (PRO) data submitted by the patient; receive sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and apply at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
- PRO patient-reported outcomes
- a method for training at least one machine learning model to predict at least one health outcome for patients with COPD comprising: receiving a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome, the training comprising a feature selection process to select a plurality of features relevant to the at least one health outcome.
- the machine learning model may comprise a tree-based model.
- the training may comprise boosting and bagging.
- the training may comprise using domain-driven feature interaction constraints.
- the training may comprise optimizing the model to minimize false positives.
- the training may comprise optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
- the method may further comprise performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model.
- the feature selection process may comprise selecting at least some of the initial set of features.
- the feature selection process may comprise selecting at least one further feature generated during the training process.
- the method may further comprise data cleansing of the training data sets.
- the method may further comprise exploring the training data sets for potential bias.
- an apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and train a machine learning model to predict at least one health outcome for patients with COPD from the plurality of features, the training comprising performing a feature selection process to select a plurality of features relevant to the at least one health outcome.
- processing circuitry configured to: receive a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and train a machine learning model to predict at least one health outcome for patients with COPD from the plurality of features, the training comprising performing a feature selection process to select a plurality of features relevant to the at least one health outcome.
- a method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease comprising: receiving data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient; and applying at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD.
- the method may further comprise obtaining a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features.
- the method may further comprise obtaining other information explaining and/or regarding the prediction.
- the set of features may be predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure.
- the at least one trained machine learning model may comprise a calibrated model such that the obtained prediction comprises a calibrated prediction.
- the clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history.
- the clinical history may be represented as or comprise clinical history data.
- the set of features may comprise at least one of patient identification features, demographic features, prescribing features, admission features and laboratory test features.
- the received data may comprise receiving patient-reported outcomes (PRO) data submitted by the patient and/or sensor data.
- the prediction may comprise a prediction of an exacerbation of COPD occurring within a time period, wherein an extent of the time period is between 24 hours and 1 week, optionally wherein the time period is 72 hours.
- the prediction may comprise a prediction of mortality occurring within the next 12 months.
- the prediction may comprise a prediction of readmission to hospital within the next 3 months.
- the rationale and/or contribution to the prediction may comprise a feature importance score.
- the method may comprise ranking and/or filtering and/or selecting one or more features of the set of features based on the determined importance.
- the method may comprise displaying the most important features.
- the importance scores may comprise global and/or local importance scores.
- the global feature importance score may be indicative of the importance of a feature to the model in general.
- the local feature importance score may be indicative of the importance of a feature to a specific prediction.
- Obtaining the rationale for the prediction and/or the contribution to the prediction of the one or more health outcomes for at least one of the set of features may comprise applying an explainability procedure to the at least one model and/or to the received data.
- the explainability procedure may comprise a SHAP (SHapley Additive exPlanation), LIME (Local Interpretable Model-Agnostic Explanations) or ELI5 (Explain Like I'm 5) based procedure.
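The patent names SHAP, LIME and ELI5. As a library-neutral sketch of the global/local distinction only, permutation importance gives a global score and, for a linear model, the per-feature terms of the log-odds give a simple local attribution; both are stand-ins, not the named procedures:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Global importance: how much shuffling each feature degrades the model overall
global_imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Local attribution for one patient: for a linear model, each feature's
# contribution to the log-odds is coefficient * (value - mean value)
x = X[0]
local_contrib = model.coef_[0] * (x - X.mean(axis=0))
```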
- the at least one trained machine learning model may comprise a supervised machine learning model or a machine learning model trained using supervised learning.
- the trained supervised machine learning model may be configured to output one or more labels of a set of labels representing a prediction of a health outcome wherein the set of labels is obtained using the unsupervised machine learning procedure.
- the method may comprise applying the unsupervised learning procedure to determine at least some of the set of features for the training of the at least one machine learning model.
- the method may comprise performing the unsupervised machine learning procedure on at least one training data set to obtain said labels.
- the unsupervised machine learning procedure may comprise a principal component analysis and/or a clustering procedure.
- the at least one trained machine learning model may comprise a tree-based procedure, for example, a decision tree and/or a k-means clustering procedure and/or a one-versus-rest procedure.
- the method may further comprise applying a demographic and class balancing process.
- the at least one trained model may be calibrated such that the predicted outcome substantially corresponds to a real-world probability of said health outcome.
- the at least one trained model may be calibrated using a Platt calibration method.
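A Platt-calibrated model might be obtained as in the following sketch, where scikit-learn's `method="sigmoid"` implements Platt scaling (synthetic data; the choice of base model is an illustrative assumption):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Platt scaling: a logistic curve is fitted to the raw model scores so that
# the outputs approximate real-world probabilities of the health outcome
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
calibrated.fit(X_tr, y_tr)
probs = calibrated.predict_proba(X_te)[:, 1]
```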
- the method may further comprise applying an unsupervised topic modelling process to at least part of the clinical history data to obtain one or more labelled topics for use as one or more features of the set of features.
- the unsupervised topic modelling process may be applied to a diagnosis history across a specified time window.
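An unsupervised topic-modelling step over diagnosis text might be sketched as follows (the diagnosis snippets and the use of latent Dirichlet allocation are illustrative assumptions):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical diagnosis-history snippets within a time window (illustrative only)
histories = [
    "copd exacerbation breathlessness steroid course",
    "pneumonia antibiotics admission oxygen",
    "copd exacerbation wheeze salbutamol",
    "heart failure oedema diuretic",
]

counts = CountVectorizer().fit_transform(histories)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each patient's topic mixture can be used as features; topics can be
# labelled by inspecting their top-weighted terms
topic_mix = lda.transform(counts)
```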
- the at least one trained model may comprise a plurality of trained models and applying the at least one trained model may comprise applying the plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome.
- the method may comprise combining the plurality of predictions to obtain a combined prediction.
- the plurality of machine learning models may be calibrated.
- the method may comprise applying an explainability procedure to each of the plurality of trained models to obtain a rationale and/or contribution for each of the plurality of trained models.
- the method may further comprise determining expected model behaviour and/or one or more metrics representing said behaviour at a probability threshold.
- the probability threshold may be selected by a user.
- the method may further comprise displaying or communicating the prediction to a clinician.
- the method may further comprise displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model.
- the method may further comprise displaying or communicating to the clinician the obtained rationale for the prediction and/or the obtained contribution.
- the method may further comprise issuing an alert in dependence on the prediction.
- the received data may further comprise sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often.
- the received data may comprise patient-reported outcomes (PRO) data submitted by the patient;
- the method may comprise obtaining the clinical history data, wherein obtaining the received data comprises at least one of: performing at least one laboratory test to obtain laboratory data.
- an apparatus comprising processing circuitry configured to: receive data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient; apply at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD.
- the processing circuitry may be configured to obtain a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features.
- the processing circuitry may be configured to obtain other information explaining and/or regarding the prediction.
- the set of features may be predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure.
- the at least one trained machine learning model may comprise a calibrated model such that the obtained prediction comprises a calibrated prediction.
- the clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history.
- the clinical history may be represented as or comprise clinical history data.
- a method for training at least one machine learning model to predict at least one health outcome for patients with COPD comprising: receiving a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
- the machine learning model may comprise a tree-based model and/or the training comprises boosting and bagging.
- the training may comprise using custom loss functions.
- the training may comprise optimizing the model to minimize false positives.
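One simple way to bias training away from false positives is to up-weight the negative class, sketched below (class weighting is an illustrative stand-in for whatever custom loss the patent contemplates; data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, weights=[0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Up-weighting the negative class makes false positives costlier during
# training, pushing the model toward fewer false alarms
weighted = LogisticRegression(class_weight={0: 5.0, 1: 1.0}, max_iter=1000).fit(X_tr, y_tr)
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# confusion_matrix[0, 1] counts false positives (true negative, predicted positive)
fp_weighted = confusion_matrix(y_te, weighted.predict(X_te))[0, 1]
fp_plain = confusion_matrix(y_te, plain.predict(X_te))[0, 1]
```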
- the training may comprise optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
- the training may comprise performing a calibration process to obtain at least one calibrated machine learning model from at least one un-calibrated machine learning model.
- the calibration process may comprise obtaining a mapping from the output of the at least one trained model to a probability distribution.
- the calibration process may comprise a Platt regression or other Platt scaling process.
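Platt scaling itself amounts to fitting a one-dimensional logistic regression that maps raw model scores to probabilities, as in this sketch (synthetic data; the un-calibrated base model is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, random_state=0)

# The un-calibrated model produces raw decision scores, not probabilities
raw = LinearSVC().fit(X_tr, y_tr)
scores = raw.decision_function(X_cal).reshape(-1, 1)

# Platt scaling: fit a logistic regression on held-out scores to obtain
# a mapping from model output to a probability
platt = LogisticRegression().fit(scores, y_cal)
probs = platt.predict_proba(scores)[:, 1]
```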
- the method may further comprise performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model, and wherein the feature selection process comprises selecting at least some of the initial set of features and/or selecting at least one further feature generated during the training process.
- the method may further comprise data cleansing of the training data sets and/or exploring the training data sets for potential bias.
- an apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and use the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
- Figure 1 is a schematic illustration of a computer system in accordance with an embodiment.
- Figure 2 is a flow chart illustrating in overview a training method in accordance with an embodiment.
- Figure 3 is a schematic illustration of a system in accordance with an embodiment.
- Figure 4 is a flow chart illustrating in overview a model deployment method in accordance with an embodiment.
- Figure 5 is a flow chart illustrating in overview a model deployment method in accordance with a further embodiment.
- Figure 6 is a flow chart illustrating in overview a model deployment method in accordance with an embodiment.
- Figure 1 is a schematic diagram illustrating in brief overview a computer system 10 in accordance with an embodiment.
- the computer system is configured to train a model to predict COPD outcomes, and to use the trained model to predict COPD outcomes.
- a first system may be configured to train the model and one or more second, different systems may be configured to use the trained model to predict COPD outcomes.
- the computer system comprises a computing apparatus 12, for example a server or workstation, which is connected to one or more display screens 14 and one or more input devices 16 (for example, a keyboard and mouse).
- the computing apparatus 12 comprises a processor 18 and memory 20.
- the processor 18 includes data processing circuitry 22 for processing data, including obtaining features from the data; training circuitry 24 for training one or more models; prediction circuitry 26 for using the one or more models to obtain one or more predictions; and display circuitry 28 for displaying the data and/or the prediction(s).
- the circuitries are each implemented in the processor 18 by means of a computer program having computer-readable instructions that are executable to perform the method of the embodiment.
- the circuitries may be implemented as one or more ASICs (application specific integrated circuits) or FPGAs (field programmable gate arrays).
- the computing apparatus 12 also includes a hard drive and other components including RAM, ROM, a data bus, an operating system including various device drivers, and hardware devices including a graphics card. Such components are not shown in Figure 1 for clarity.
- the computer system 10 may comprise a plurality of computing apparatuses. Functionality of the circuitries may be divided between multiple computing apparatuses and/or multiple processors. Functionality of the circuitries may be divided across a network or cloud-based system.
- each of the data processing circuitry 22, training circuitry 24 and prediction circuitry 26 may be implemented as a respective one or more cloud-based resources.
- the cloud-based resources comprise cloud computing resources implemented remotely, for example at one or more data centres, and accessed via a network.
- Figure 2 is a flow chart illustrating in overview a method of training a plurality of models to predict a plurality of outcomes using the computer system 10.
- functionality of at least part of the computer system 10 may be provided by one or more cloud-based resources.
- the outcomes for which the models are trained are COPD outcomes.
- the outcomes comprise a measure of a risk of mortality due to COPD occurring within the next 12 months (12-month COPD mortality), a measure of a risk of hospital readmission occurring within the next 3 months (3-month hospital readmission) and a risk of a COPD exacerbation occurring within the next 72 hours (72-hour COPD exacerbation).
- the outcomes may relate to different time periods. For example, 24-month COPD mortality may be predicted rather than 12-month COPD mortality
- the risk of COPD exacerbation may be determined for any suitable short-term time period of hours or days, for example exacerbation within 24 hours, 48 hours or one week.
- different outcomes may be predicted.
- the outcomes that are predicted may not be COPD outcomes.
- the data processing circuitry 22 receives a plurality of training data sets.
- the plurality of training data sets form part of a data cohort.
- the data cohort comprises over 50,000 patients with over 900,000 admissions including patient demographics, diagnoses, admission history, length of stay, prescribing and labs data.
- the data cohort further comprises 15 months of data from around 550 patients which further includes patient reported outcome (PRO) submissions, daily steps, daily resting heart rate, sleep and home NIV data and a verified set of clinical events for each patient.
- the data cohort is divided into training, test and validation data.
- the test data forms 15% of the entire data cohort and is randomly selected and statistically tested to ensure it is representative of the full cohort.
- the remaining 85% of the data cohort is split 80/20 into training data and validation data for model development. Patients that appear in test data do not appear in validation data or training data. Patients that appear in validation data do not appear in test data or training data. Patients that appear in training data do not appear in validation data or test data.
- the split between training, validation and test data is intended to ensure that models are validated on independent data. Class imbalance and key-features balances are maintained across the training data, validation data and test data to avoid data leakages and mitigate against biases, with the aim of ensuring generalisability to new patients.
- the data is divided with the aim that the training data, validation data and test data are each representative of the full population in terms of baseline demographic information and in terms of disease severity. This division is achieved using bespoke functions and statistical methods such as Kolmogorov-Smirnov tests.
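The representativeness check might be sketched with a two-sample Kolmogorov-Smirnov test (synthetic severity scores stand in for real baseline characteristics; the split proportions follow the text):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Synthetic per-patient severity scores standing in for baseline characteristics
severity = rng.normal(loc=2.0, scale=0.5, size=1000)

# 15% test split; the remainder is split 80/20 into training and validation
idx = rng.permutation(len(severity))
test, rest = idx[:150], idx[150:]
cut = int(0.8 * len(rest))
train, val = rest[:cut], rest[cut:]

# Kolmogorov-Smirnov test: a large p-value gives no evidence that the
# test split differs in distribution from the training split
stat, p = ks_2samp(severity[train], severity[test])
```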
- any suitable data cohort may be used.
- the data cohort may be divided into training data, validation data and test data in any suitable manner and in any suitable proportions.
- the training data received by the data processing circuitry 22 comprises a plurality of training data sets. Each training data set relates to a respective patient.
- Each training data set comprises historical clinical information relating to a clinical history of the patient.
- the training data set also comprises patient reported outcome (PRO) data submitted by the patient, and sensor data that comprises or is representative of physiological measurements for the patient.
- the clinical history comprises historical information about the patient’s health, and may include specific information about the patient’s history with COPD.
- the clinical history may comprise information about the whole of a patient’s life, or information for a selected period.
- At least part of the clinical history is obtained or derived from historical Electronic Health Record (EHR) data for the patient.
- EHR data includes patient demographics, prescribing data, laboratory test data, and hospital admissions data.
- the clinical history may include data on any one or more of hospital admissions, episodes of severe illness, treatments, laboratory data, prescribing data and diagnosis history, which may be obtained from an EHR or from any suitable records.
- the clinical history may further comprise data that is input by a clinician or other user.
- the data may be input by a clinician as part of a patient onboarding procedure.
- the data that is input may comprise, for example, data on hospital admissions and treatment.
- the data may comprise data relating to the presence of one or more comorbidities of a set list of comorbidities, a number of exacerbations in the previous 12 months, a number of hospital admissions in the previous 12 months, or any other suitable parameters.
- the clinical history comprises data on exacerbation events.
- the data on exacerbation events comprises data on exacerbation events that were managed in the home and on exacerbation events that required hospital admissions.
- the data on exacerbation events is recorded as part of a COPD service and is manually verified by at least one clinician.
- the PRO data comprises data submitted by the patient which comprises a self-assessment of one or more aspects of the patient’s condition, for example the patient’s perceived wellbeing.
- the PRO data is obtained from a patient application (app).
- the patient accesses the patient app via a smartphone.
- the PRO data is submitted by the patient through the patient app.
- the PRO data may be submitted by the patient in response to a patient questionnaire or survey.
- the app may present a questionnaire to the patient, which may comprise for example questions on the patient’s perceived wellbeing.
- the questionnaire may comprise questions from standard question sets, for example questions from one or more of a MRC Dyspnoea Scale (Medical Research Council) question set, a CAT (COPD Assessment Test) question set or an EQ-5D question set.
- the MRC Dyspnoea Scale (https://www.ukri.org/councils/mrc/facilities-and- resources/find-an-mrc-facility-or-resource/mrc-dyspnoea-scale/) is used to assess the degree of baseline functional disability due to dyspnoea.
- the MRC Dyspnoea Scale includes, amongst other questions, questions about cough, phlegm, shortness of breath, wheezing, night waking, chest illnesses and smoking.
- the CAT (https://www.catestonline.org/) quantifies impact of COPD symptoms on patient overall health.
- the CAT includes questions about cough, phlegm, chest tightness, breathlessness, activity, confidence in leaving the home, sleep quality and energy.
- EQ-5D https://euroqol.org/
- At least some of the questions presented to the patient may be questions that do not form part of any of the standard question sets.
- questions may be set by a clinician.
- the PRO data may comprise data relating to any of the measures of wellbeing used in any of the above-mentioned question sets or any other suitable measures of wellbeing.
- the measures of wellbeing may be related to symptoms such as cough or breathlessness and/or to more general aspects of wellbeing such as energy or anxiety.
- the app may then record the patient’s responses to questions of the questionnaire, which the patient inputs into the app via the smartphone.
- the patient may input PRO data as part of a symptom diary comprising symptoms self-reported by the patient.
- any suitable device may be used to receive the PRO data, for example a laptop, tablet, desktop or other computing device.
- the PRO data may be gathered using any suitable method, which in some embodiments may not comprise an app.
- the PRO data is gathered over a study period which may be, for example, a period of weeks, months or years.
- the PRO data is gathered regularly, for example at least daily or at least weekly.
- the sensor data comprises data obtained using a plurality of sensors which are implemented in a plurality of devices.
- One or more of the devices may be a wearable device worn by the patient.
- the sensor data may be obtained outside a traditional acute medical setting, for example in the patient’s home or wherever the patient goes.
- the devices comprise a wearable device and a NIV (non- invasive ventilation) device.
- the wearable device may also be referred to as a wearable fitness tracker.
- the wearable device may comprise, for example, a Fitbit, Garmin or Apple Watch.
- the NIV device may comprise, for example a ResMed NIV device.
- any suitable one or more sensors in one or more devices may be used to obtain the sensor data.
- Sensor data is gathered over a study period which may be, for example, a period of weeks, months or years.
- the sensor data is gathered at regular intervals, for example at least hourly or at least daily.
- the sensor data may provide information about a patient’s condition and about changes in condition that occur over a period of hours or days.
- the sensor data comprises activity data which is obtained using the wearable device.
- the activity data may comprise data obtained by measuring steps taken by the patient.
- the activity data may comprise a step count, for example a daily step count.
- activity data may be obtained in any suitable manner and by any suitable device or devices.
- the sensor data further comprises heart rate data measured by the wearable device, where the heart rate data comprises resting heart rate data.
- the heart rate data comprises resting heart rate data.
- any suitable heart rate data may be obtained.
- the heart rate data may be obtained in any suitable manner and by any suitable device or devices.
- the sensor data further comprises sleep data measured using the wearable device.
- the wearable device is configured to measure when the patient is awake and when the patient is asleep.
- the sleep data is representative of periods of wakefulness and sleep. Granular activity and sleep data may be used to map a typical pattern for each user, allowing a change from baseline to be identified. Sleep data may also include identification of periods of sleep and restlessness throughout a sleep cycle. In other embodiments, sleep data may be obtained in any suitable manner and by any suitable device or devices.
- the sensor data further comprises respiratory data.
- Respiratory data may comprise, for example, respiratory rate and/or ventilation parameters.
- respiratory data is measured using a home NIV (non-invasive ventilation) device, for example a ResMed NIV device.
- a home NIV device may measure respiratory events at a relatively high frequency. For example, the home NIV device may obtain data as frequently as every second.
- the home NIV device may measure one or more respiratory parameters at least every hour, at least every 10 minutes, at least every minute, at least every 30 seconds, or at least every 10 seconds.
- the home NIV device may provide inferred information on tidal volume, minute ventilation and respiratory rate.
- respiratory data may be measured using a respiratory rate sensor, for example a PneumoWave respiratory rate sensor.
- the respiratory rate sensor may be a chest-worn patient wearable device.
- respiratory data may be obtained in any suitable manner and by any suitable device or devices.
- any suitable sensor data may be obtained from any suitable sensors.
- the sensor data may comprise any one or more of activity data, respiratory data, heart rate data, sleep data, energy expenditure data or oxygen saturation data.
- the sensor data may be acquired at any suitable intervals, for example every minute, every 10 minutes, every half hour, or every hour.
- the data processing circuitry 22 cleanses the training data.
- the cleansing of the training data comprises checking the training data for erroneous data.
- the cleansing of the training data comprises checking that a data type of the training data is consistent, for example checking that a data type of a training data set is consistent with other data within that training data set or a further training data set, or checking that a data type of a training data set is consistent with an expected data type.
- the cleansing of the training data further comprises checking that data values of the training data fall into one or more expected ranges. In other embodiments, any suitable data cleansing process may be performed.
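A minimal sketch of the cleansing checks described above, covering data-type consistency and expected-range checks; the field names and plausibility ranges are hypothetical, not values from the embodiment.

```python
import math

# Hypothetical plausibility ranges used for range checks during cleansing.
EXPECTED_RANGES = {
    "resting_heart_rate": (30, 200),   # beats per minute
    "daily_steps": (0, 100_000),
    "spo2": (50, 100),                 # oxygen saturation, %
}

def cleanse_record(record):
    """Return a copy of the record with erroneous values set to None.

    A value is nulled if its data type is inconsistent with the expected
    numeric type, or if it falls outside the expected range for the field."""
    clean = {}
    for field, value in record.items():
        expected = EXPECTED_RANGES.get(field)
        if expected is None:
            clean[field] = value           # no check defined for this field
            continue
        lo, hi = expected
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            clean[field] = None            # inconsistent data type
        elif math.isnan(float(value)) or not (lo <= value <= hi):
            clean[field] = None            # outside the expected range
        else:
            clean[field] = value
    return clean

record = {"resting_heart_rate": 250, "daily_steps": 4_200, "spo2": "err"}
cleaned = cleanse_record(record)
```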
- Potential biases may comprise biases that occur in readings obtained from one or more individual sensors. For example, an individual sensor may consistently output values that are over or under a true value.
- biases may additionally or alternatively comprise biases in the training data as a whole.
- the training data may comprise societal, demographical and/or systematic biases. For example, different disease presentations in men versus women or across different ethnicities may result in training data that is biased by gender or ethnicity. Different access to care for different members of society may result in bias in the training data. Varying care practices for different patient groups may result in bias in the training data. For example, someone who has diabetes may be more likely to have a BMI (body mass index) measurement recorded than someone who does not have diabetes.
- BMI body mass index
- data cleansing and/or exploration for biases may be performed before the training data is provided to the data processing circuitry 22.
- Stage 32 may be omitted.
- the data processing circuitry 22 processes the training data to obtain, for each of the training data sets, values for a plurality of features.
- the plurality of features for which values are obtained at stage 34 are features that have been pre-selected for use in initial model training. Certain features are described below with reference to Figure 2. In other embodiments, a method described with reference to Figure 2 may be applied to any features described herein.
- Raw data may refer to data as it is received, for example received from a sensor.
- Raw data may refer to data after cleansing and/or pre-processing, but before aggregation or transformation is performed.
- Some features are obtained by aggregation or transformation of data. Some clinician-driven domain specific feature engineering is performed to obtain domain specific features. The domain specific features are measures that clinicians and/or literature associate with deterioration in patient condition.
- the domain specific features include Neutrophil to Lymphocyte ratio; Maximum Eosinophils count in the previous year; number of exacerbations in the previous 12 months; and number of hospital exacerbations in the previous 12 months, where a hospital exacerbation is an exacerbation for which the patient was admitted to hospital.
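The domain specific features listed above might be computed as in the following sketch. The input structures (day-indexed lab and exacerbation records) are simplified assumptions, not formats specified in the embodiment.

```python
def domain_features(labs, exacerbations, as_of_day):
    """Compute the clinician-driven features named above.

    labs: dict with 'neutrophils', 'lymphocytes' and 'eosinophils', where
    'eosinophils' is a list of (day, count) tuples. exacerbations: list of
    (day, admitted_to_hospital) tuples. Days are integer day indices."""
    year_ago = as_of_day - 365
    eos_last_year = [c for d, c in labs["eosinophils"] if d > year_ago]
    ex_last_year = [(d, h) for d, h in exacerbations if d > year_ago]
    return {
        # Neutrophil to Lymphocyte ratio
        "nlr": labs["neutrophils"] / labs["lymphocytes"],
        # Maximum eosinophil count in the previous year
        "max_eosinophils_1y": max(eos_last_year, default=None),
        # Exacerbations and hospital exacerbations in the previous 12 months
        "n_exacerbations_12m": len(ex_last_year),
        "n_hospital_exacerbations_12m": sum(h for _, h in ex_last_year),
    }
```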
- At least one feature relates to hospital admissions.
- One or more features may distinguish COPD/respiratory admissions versus other admissions.
- One or more features may distinguish winter admissions versus non-winter admissions.
- Features may count the number of admissions in each category (for example, COPD/respiratory, other, winter, non-winter).
- Features may include lengths of stay for hospital admissions.
- Raw heart-rate data are enriched using simple descriptive statistics and aggregations, together with more advanced clinician-driven feature calculation.
- An example of a clinician-driven feature is elevated heart rate for x minutes during periods of low activity.
- a value for the number of minutes, x, may be set by the clinician or based on values in the clinical literature.
- One or more features may calculate time spent above or below a predetermined heart rate threshold value. The features are calculated from sensor data, for example from data from a Fitbit.
- a feature to measure the restlessness of the patient’s sleep is calculated from sensor data, for example from data from a Fitbit.
- data from a wearable device is analysed at 1 minute granularity to mine insights from the data. It has been found that daily aggregates may mask key events during a day, which may lead to lower predictive ability. For example, a subject may have had a 10-minute interval during the day where their heart rate was greater than 200 bpm. This effect would not be seen by purely considering averages.
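The benefit of minute-level granularity can be illustrated with a sketch of an "elevated heart rate during low activity" feature; the thresholds and the example series are illustrative, not values from the embodiment.

```python
def minutes_elevated_while_inactive(heart_rate, steps, hr_threshold=100,
                                    step_threshold=10):
    """Count minutes in which heart rate exceeds a threshold while the
    per-minute step count indicates low activity. Both series are
    per-minute and of equal length; thresholds are illustrative and
    would in practice be set by a clinician or from the literature."""
    return sum(1 for hr, s in zip(heart_rate, steps)
               if hr > hr_threshold and s < step_threshold)

# A short spike that a daily average would mask:
hr = [70] * 50 + [120] * 10    # 10-minute elevated spell in a 60-minute window
steps = [0] * 60               # patient inactive throughout
avg_hr = sum(hr) / len(hr)     # about 78 bpm: the spike is invisible here
elevated = minutes_elevated_while_inactive(hr, steps)
```

The per-minute feature detects the 10-minute spell even though the average stays well under the threshold, which is the masking effect described above.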
- Temporal features may be calculated from the raw sensor data, for example raw Fitbit data.
- Features may be designed to mitigate effects of data drift from sensors. Every wearable device, for example a Fitbit, Garmin or Apple Watch, has sensors accurate to a certain tolerance. Individual sensors have variability which may drift over time (unless a calibration means exists).
- Data drift may be mitigated through feature engineering approaches used to extract meaning from sensor data.
- Approaches may be used that consider data relative to the individual user. For example, rather than considering absolute activity, features such as differences between today’s value and the values for a predetermined number of previous days may be used. Smoothing techniques may be used.
- Features may be obtained that are substantially sensor-invariant, meaning that an effect of a biased sensor on model performance may be insignificant.
- One or more features may be calculated using var_r3, which is the ratio of a current day’s value to an average of values for the previous three days. By using this ratio, data is normalised relative to the individual patient and to the previous three days to detect changes specific to that patient.
- One or more features may be calculated using var_ratio, which is a ratio of a current day’s value to a value for the previous day. By using this ratio, data is normalised relative to data for the individual person the previous day, for example to the individual patient’s behaviour the previous day.
- a ratio of a current day’s value to values for any suitable number of preceding days may be used.
- a ratio of a value for any current time period to any preceding time period may be used, for example comparing a value for a current hour to a value for one or more preceding hours.
- One or more features may be calculated using var_diff, which is a difference between a value for a current day and a value for a previous day or other previous time period. In some situations, some data may be missing. Data loss may occur due to technical or connectivity issues, or from a drop in user engagement.
- Feature engineering may include variables such as a count to capture the completeness of the data, and may null certain variables if there are not enough data to calculate the feature. Features may also include days since the previous exacerbation.
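The var_ratio, var_r3, var_diff and count variables described above might be sketched as follows, including nulling a feature when there are not enough data to calculate it; the exact definitions in the embodiment may differ.

```python
def relative_features(series, min_count=3):
    """Compute sensor-invariant relative features for the latest day in a
    per-day series. Returns None for a feature when there are not enough
    data to calculate it."""
    today = series[-1]
    prev3 = series[-4:-1]                  # up to three preceding days
    return {
        # ratio of today's value to the previous day's value
        "var_ratio": today / series[-2] if len(series) >= 2 else None,
        # difference between today's value and the previous day's value
        "var_diff": today - series[-2] if len(series) >= 2 else None,
        # ratio of today's value to the average of the previous three days
        "var_r3": (today / (sum(prev3) / len(prev3))
                   if len(prev3) >= min_count else None),
        # completeness of the data window
        "count": len(series),
    }

daily_steps = [4000, 4400, 3600, 2000]     # hypothetical daily step counts
f = relative_features(daily_steps)
```

Because each feature is normalised to the individual's own recent values, a consistently biased sensor affects numerator and denominator alike, which is the sensor-invariance property described above.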
- a feature measures change in patient engagement based on engagement by a patient with a process of obtaining PRO data.
- the feature may be based on whether or how regularly a PRO questionnaire is completed.
- the feature, or a further feature may be based on how often the patient performs messaging.
- an increase in messaging may indicate that there has been a change in a patient’s condition.
- a decrease in messaging may indicate that there has been a change in a patient’s condition.
- the features include a last labs test value and a complementary feature to capture the number of days since the last labs test was taken.
- the days since labs feature is only allowed to interact with its corresponding labs feature, the patient's age, and sex.
- One or more features may aggregate labs test results. For example, one or more features may aggregate a previous one year’s labs test results as, for example, a minimum, maximum, median, 10th and 90th percentile.
- training data is split into 10 folds, which are 10 groups of unique patients. Each fold is representative of the full population.
- BNF British National Formulary
- rare values are filtered out. For example rare prescription codes seen by less than 100 unique patients may be filtered out.
- Each fold may then be split into all possible categories of BNF code (which may for example be more than 1000 codes).
- an average of the target variable (0 or 1) for each group is calculated.
- the value for each group is averaged across the 10 folds.
- the result is then a numerical representation for each of the 1000+ BNF codes which represents the relationship between that BNF code and the target variable of interest.
- This technique may also be used for other categorical features, for example diagnosis, ethnicity and post code sector.
- the target encoded features may be further enriched by applying aggregations over the numerical representations of the categorical feature (for example, maximum encoded BNF value in the previous year).
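A simplified sketch of the cross-fold target encoding described above. Round-robin fold assignment and a small rarity cut-off stand in for the representative 10-fold split and the 100-patient cut-off of the embodiment.

```python
from collections import defaultdict

def target_encode(rows, n_folds=10, min_patients=2):
    """Cross-fold target encoding for a categorical code such as a BNF code.

    rows: list of (patient_id, code, target) tuples, target in {0, 1}.
    Unique patients are assigned to folds; the mean target per code is
    computed within each fold, then averaged across folds, giving one
    numerical representation per code. Rare codes (seen by fewer than
    min_patients unique patients) are filtered out."""
    patients_per_code = defaultdict(set)
    for pid, code, _ in rows:
        patients_per_code[code].add(pid)
    kept = {c for c, p in patients_per_code.items() if len(p) >= min_patients}

    # Assign unique patients to folds round-robin (the embodiment instead
    # builds folds that are representative of the full population).
    fold_of = {pid: i % n_folds
               for i, pid in enumerate(sorted({r[0] for r in rows}))}

    sums = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # fold -> code -> [sum, n]
    for pid, code, y in rows:
        if code in kept:
            acc = sums[fold_of[pid]][code]
            acc[0] += y
            acc[1] += 1

    encoding = {}
    for code in kept:
        fold_means = [s / n for f in sums.values()
                      for s, n in [f.get(code, [0, 0])] if n > 0]
        encoding[code] = sum(fold_means) / len(fold_means)
    return encoding
```

The same function could be applied to other categorical features such as diagnosis, ethnicity or post code sector, and the resulting per-code values aggregated further (for example, maximum encoded value in the previous year).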
- any suitable features may be obtained from the training data at stage 34.
- the features may include descriptive statistics.
- the features may comprise aggregations.
- the features may comprise, for example, an average, minimum or maximum values for one or more parameters, for example an average, minimum or maximum heart rate.
- the features may comprise or be representative of a change or rate of change of one or more parameters.
- the features may include features derived from any or all of the clinical history, PRO data and sensor data over any appropriate time scale.
- the set of initial features for which values are obtained at stage 34 is pre-determined.
- model training is iterative and may involve creation of new features based on the initial findings at previous model training iterations.
- a feature list resulting from the training described below may therefore be different from the initial features for which values are determined at stage 34.
- the training circuitry 24 trains a first machine learning model, model A, to predict 12-month COPD mortality.
- the training of model A uses values for at least some of the plurality of features that are output from stage 34 and ground truth death data.
- the ground truth death data is obtained from the National Records of Scotland. In other embodiments, any suitable source of ground truth death data may be used.
- the training of model A may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 12-month COPD mortality may be selected for use in initial model training of model A.
- model A uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
- the training is iterative and involves the creation of new features based on findings at previous model training iterations.
- a large feature list (for example, 25 to 250 features) may be generated.
- the feature list is then reduced using feature selection techniques to optimize model performance.
- the training circuitry 24 outputs a trained first model A, which is trained to predict 12-month COPD mortality from patient data and to output a risk score that is representative of a risk of 12-month COPD mortality.
- the risk score may be a value on a scale from 0 to 1.
- the trained first model A predicts 12-month COPD mortality from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and may include further features generated during training.
- the training circuitry 24 may also output a set of features that are used as input by the trained first model A.
- the training circuitry 24 trains a second machine learning model, model B, to predict 3-month hospital readmission.
- model B uses values for at least some of the plurality of features that are output from stage 34 and ground truth data.
- the ground truth data used to train the second machine learning model comprises ground truth hospital readmission data which is derived from EHR data. In other embodiments, any suitable source of ground truth hospital readmission data may be used.
- the training of model B may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 3-month hospital readmission may be selected for use in initial model training of model B.
- the selected subset of features used in training model B may be different from the selected subset of features used in training model A.
- model B uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
- the training is iterative and involves the creation of new features based on findings at previous model training iterations.
- a large feature list (for example, 25 to 250 features) may be generated.
- the feature list is then reduced using feature selection techniques to optimize model performance.
- the resulting features for model B may be different than the features resulting from the training of model A.
- the training circuitry 24 outputs a trained second model B, which is trained to predict 3-month hospital readmission from patient data and to output a risk score that is representative of a risk of 3-month hospital readmission.
- the risk score may be a value on a scale from 0 to 1.
- the trained second model B predicts 3-month hospital readmission from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and may include further features generated during training.
- the training circuitry 24 may also output a set of features that are used as input by the trained second model B.
- the training circuitry 24 trains a third machine learning model, model C, to predict 72-hour COPD exacerbation based on the values for the plurality of features that are output from stage 34 and ground truth data.
- the COPD exacerbation may be any suitable near-term COPD exacerbation, which may not be 72-hour.
- the COPD exacerbation may be an exacerbation over a specified time period which is between 48 hours and 120 hours.
- the ground truth data used to train the third machine learning model comprises ground truth exacerbation data which in the embodiment of Figure 2 is obtained from a combination of clinician verified events, patient reported events, and information derived from EHR data. In other embodiments, any suitable source of ground truth exacerbation data may be used.
- the training of model C may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 72-hour COPD exacerbation may be selected for use in initial model training of model C.
- the training of model C uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
- the training is iterative and involves the creation of new features based on findings at previous model training iterations.
- a large feature list (for example, 25 to 250 features) may be generated.
- the feature list is then reduced using feature selection techniques to optimize model performance.
- model C has multiple forms.
- One form uses XGBoost as in model A and model B.
- Another form is based on time-series approaches.
- alternative models may be trained that use different data types as inputs. For example, a first version of model C may be trained to predict 72-hour COPD exacerbation using clinical history, activity data, heart rate data, sleep data and respiratory data, and a second version of model C may predict 72-hour COPD exacerbation using clinical history, activity data, heart rate data and sleep data without respiratory data.
- Model C may make use of data from a wearable device.
- An alternative version of model C may be trained to obtain a prediction in the absence of data from the wearable device.
- the alternative version of model C may be used if data from the wearable device is unavailable or of poor quality, for example if there is a level of missing data that is deemed unacceptable.
- the training circuitry 24 outputs at least one trained third model C, which is trained to predict 72-hour COPD exacerbation from patient data and to output a risk score that is representative of a risk of 72-hour COPD exacerbation.
- the risk score may be a value on a scale from 0 to 1.
- the or each trained third model C predicts 72-hour COPD exacerbation from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and may include further features generated during training.
- the training circuitry 24 may also output a set of features that are used as input by the or each trained third model C.
- the trained third model C may predict any suitable near-term COPD exacerbation, which may not be 72-hour exacerbation.
- the near-term COPD exacerbation may be a 24-hour, 48-hour or 120-hour exacerbation, or an exacerbation over a period of one week.
- the first model A, second model B, and third model C are machine learning models.
- the first model A, second model B and third model C are trained separately, to perform different tasks.
- the first, second and third models may be trained together, or a single model may be trained to perform multiple tasks.
- all of the models A, B and C may be trained using the same training data and/or the same initial plurality of features.
- different training data and/or different initial features may be used in the training of different models.
- the trained models may use different features as input.
- the training of model C uses some features that are also used in model A and/or model B, but may mostly use features that are unique to model C.
- Each of the models starts with a larger feature set at the start of model training, which is then reduced using feature elimination techniques during training.
- the training data includes data that has been provided during the Covid-19 pandemic.
- the models A, B and C are trained on features obtained from clinical history, from PRO data and from sensor data including activity data, heart rate data, sleep data and respiratory data. In other embodiments, different features may be used.
- the machine learning models are not deep learning models.
- the use of deep learning algorithms is avoided on the basis that they may add additional abstraction and/or reduce interpretability.
- any suitable machine learning model(s) may be used.
- Tree-based approaches comprising boosting and bagging methods are used.
- the tree-based approaches are selected due to their interpretability.
- the machine learning models are trained using custom loss functions.
- the custom loss functions are developed with the aim of ensuring that the model produced in training is inherently fair.
- Custom code plus third-party packages such as fairlearn (https://fairlearn.org/) may be used to identify potential inequalities between different groups within the data. If potential inequalities are identified in a model, the model may be re-trained, tuning the loss functions to ensure parity between groups.
- the custom loss functions may learn from an initial inequality in predictions.
- the custom loss functions may be modified to allow clinician users, or other users, to tune a model for a specific use case.
- a model may be optimised to minimise false positives.
- Models may be calibrated as described below.
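One way to tune a loss function for parity between groups, as described above, is to add a penalty on the gap in mean predicted risk between groups. This is a deliberately simplified stand-in, not the embodiment's custom loss; fairlearn provides principled metrics and mitigation methods for real use.

```python
import math

def fair_log_loss(y_true, y_pred, groups, parity_weight=1.0):
    """Binary cross-entropy plus a penalty on the difference in mean
    predicted risk between two groups 'a' and 'b'. Raising parity_weight
    pushes training toward parity between the groups."""
    eps = 1e-12
    bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
               for y, p in zip(y_true, y_pred)) / len(y_true)
    preds_a = [p for p, g in zip(y_pred, groups) if g == "a"]
    preds_b = [p for p, g in zip(y_pred, groups) if g == "b"]
    gap = abs(sum(preds_a) / len(preds_a) - sum(preds_b) / len(preds_b))
    return bce + parity_weight * gap
```

If an initial model shows an inequality in predictions between groups, re-training with a larger parity_weight trades a little accuracy for a smaller gap, which mirrors the tuning-for-parity step described above.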
- the hyperparameters of each model are optimized using a grid-search approach utilizing cross validation. In other embodiments, any suitable optimisation method may be used.
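Grid search with cross validation can be sketched generically as follows; train_fn, score_fn and the grid contents are placeholders, not the embodiment's actual hyperparameter space.

```python
from itertools import product

def grid_search(train_fn, score_fn, folds, grid):
    """Exhaustive grid search with k-fold cross validation.

    grid: dict mapping hyperparameter name to a list of candidate values.
    folds: list of (train_set, val_set) pairs. train_fn(params, data)
    returns a fitted model; score_fn(model, data) returns a score where
    higher is better. Returns the best parameters and their mean score."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        # Average the validation score over all folds for this setting.
        scores = [score_fn(train_fn(params, tr), va) for tr, va in folds]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```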
- Model dropout analysis may be used to help determine which sensors and features add value to risk prediction scoring and enhanced remote monitoring.
- the trained models are evaluated on the test data, which comprises a plurality of test data sets.
- the test data is a hold-out dataset which is intended to be representative of the full population (assessed using Kolmogorov-Smirnov tests).
- precision-recall AUC area under the precision recall curve
- ROC-AUC area under the receiver operating characteristic curve
- Model performance may be tested on models trained using varying degrees of data completeness in order to fully capture the change in performance as a result of missing data, and determine what is an acceptable level of missing data. From a machine learning point of view, sparsity-aware algorithms may be used for cases where missing data is deemed to be acceptable. When the level of missing data is deemed unacceptable, the method may revert to a model which does not use the wearable data. For example, a first version of model C may be used when wearable data is available and a second, alternative version of model C may be used when the wearable data is missing or inadequate.
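The fallback between model versions based on the level of missing wearable data might look like the following sketch; the window length, cut-off and model names are illustrative, since the acceptable level of missing data would be determined empirically as described above.

```python
def select_model(wearable_days_present, window_days=30, max_missing=0.3):
    """Choose between the full model C and the alternative version that
    does not use wearable data, based on the fraction of missing days of
    wearable data in a recent window."""
    missing_fraction = 1 - wearable_days_present / window_days
    if missing_fraction <= max_missing:
        return "model_C_full"          # wearable data adequate
    return "model_C_no_wearable"       # revert to the wearable-free model
```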
- wearable data may improve performance and accuracy of prediction models, and in particular may improve performance and accuracy of a 72-hour or near-term COPD exacerbation risk model.
- the use of wearable data may improve performance relative to a model that only uses PRO data and EHR data.
- wearable device data along with PRO data, patient demographics data, and historical EHR data may improve the predictive ability of the trained models when compared with the predictive ability of a model that is trained to use wearable device data alone.
- PRO data, demographics data and EHR data may give context to the wearable device data. For example, it may be expected that the baseline activity levels for a non-smoker in their 20s should be vastly different to the baseline activity levels for a smoker in their 80s.
- a prediction of COPD exacerbation by machine learning model C is based at least partially on a decline in patient-reported wellbeing in combination with a change in patient activity data (for example, steps data recorded by a steps counter), with data indicative of restless sleep, and with heart rate data over a preceding few days.
- any suitable model or algorithm for example any suitable boosting algorithm, may be used to predict COPD exacerbation from a combination of wellbeing, activity data, restless sleep and heart rate data.
- the model or algorithm may not use machine learning.
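A hedged sketch of how a boosting algorithm might combine wellbeing, activity, sleep and heart-rate features into an exacerbation risk score, using scikit-learn's `GradientBoostingClassifier` on synthetic data. The features, labels and hyperparameters are illustrative stand-ins, not the patent's actual model C.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 200
# Hypothetical feature columns: drop in self-reported wellbeing, relative change
# in daily steps, restless-sleep indicator, mean heart rate over preceding days.
X = np.column_stack([
    rng.normal(0, 1, n),
    rng.normal(0, 1, n),
    rng.integers(0, 2, n),
    rng.normal(70, 10, n),
])
# Synthetic label: exacerbation more likely with wellbeing decline plus restless sleep.
y = (X[:, 0] + X[:, 2] + rng.normal(0, 0.5, n) > 1).astype(int)

model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
model.fit(X, y)
risk = model.predict_proba(X[:5])[:, 1]  # per-patient exacerbation risk scores in [0, 1]
```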
- Models trained according to the method of Figure 2 may be used to predict outcomes for a patient using data obtained for that patient.
- Figure 3 is a schematic diagram illustrating in brief overview a system used to capture data relating to a patient 50, to process the data relating to the patient and predict outcomes for the patient 50, and to display the data and predicted outcomes to a clinician 66.
- At least one sensor device 52 is used to capture sensor data relating to the patient.
- Wearables, sensors and/or other devices 52 may be used to capture sensor data relating to the patient 50.
- each patient using a COPD digital service is supplied with a Fitbit Charge 4 device to support collection and transmission of step count, heart rate and sleep data.
- the patient may also be supplied with a NIV device and/or respiratory rate sensor.
- a communications device 54 for example a smartphone, is used to capture data input by the patient.
- a patient application (app) is accessible via the smartphone.
- the patient app is configured to provide device-agnostic data capture and recording of patient-reported outcomes.
- the patient app enables daily prompted recording of patient reported outcomes. For example, the patient may be prompted to record a self-assessment of their wellbeing via a questionnaire or survey.
- the patient app further provides a personalised self-management and care plan for the patient.
- the patient app further provides the patient with the ability to exchange asynchronous messages with a clinical team comprising one or more clinicians 66 that are involved in the patient’s care.
- the communications device 54 may be further configured to communicate with the one or more sensor devices 52 to capture data from the one or more sensor devices 52.
- Open APIs may be provided to support GDPR-compliant health data exchange from a range of apps and wearables devices. Access may be provided to a range of pre-existing integrations such as, for example, Google Fit, iOS Health Kit and Fitbit, to consume data from one or more wearable devices.
- the one or more sensor devices 52 may be configured to communicate data directly to another computing device or to a cloud-based resource rather than communicating via the communications device 54.
- a wearable sensor device 52 may communicate with a proprietary cloud platform using 3G, 4G, 5G or any other suitable communications method.
- At least some of the functionality of the communications device 54 may be provided by the sensor device 52, or at least some of the functionality of the sensor device 52 may be provided by the communications device 54.
- a single device acts as both a sensor device 52 and a communications device 54.
- a primary care system 56 may be used to obtain data relating to the patient. For example, one or more clinicians may input and/or verify details of COPD exacerbations that the patient has experienced.
- the primary care system provides processes and services provided to patients as part of day-to-day healthcare by a healthcare provider. In other embodiments, any one or more care systems, for example primary and/or secondary health care systems, may be used. In some embodiments, the primary care system is used to access data relating to the patient but is not used to input data relating to the patient.
- a clinical portal 58 is used to obtain clinical history data, which may comprise data from an EHR.
- the clinical portal 58 may comprise, or have access to, a repository containing electronic health records.
- the clinical portal 58 may be used in a secondary health care setting.
- the clinical portal 58 comprises an interface through which a clinician or clinical administrative user may access and securely authenticate patient health data.
- the patient health data may be aggregated from multiple sources including EHR, patient generated inputs and data from consumer or medical devices.
- the primary care system 56 and clinical portal 58 may be part of the same apparatus(es) or system(s). Functionality of the primary care system 56 and/or clinical portal 58 may be provided by the same apparatus(es) or system(s) as provide the clinical dashboard 64 as described below.
- data may be input to the clinical portal 58.
- data concerning a COPD event may be recorded in the clinical portal 58.
- At least some of the data may be communicated from the clinical portal 58 to the primary care system 56.
- Data from some or all of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58 is received by a cloud-based service 60, which may also be described as a cloud-based platform.
- the cloud-based service 60 may be implemented using any suitable cloud-based resources.
- the cloud-based service 60 provides an infrastructure layer that offers identity, consent, security, data capture, curation, storage and integration with systems of record.
- the cloud-based service 60 provides a mechanism for managing user consent.
- a patient is invited to use the cloud-based service 60 to access services to manage the patient’s COPD.
- the patient provides authentication using personal information.
- the patient may also provide identification using an identifier such as the Community Health Index (CHI) number which is used in Scotland to identify individuals in health care.
- the CHI number is obtained from the clinical portal 58 or from an EHR.
- the cloud-based service 60 allows the patient to consent for their data to be stored and used, for example in accordance with a COPD management process.
- the patient may withdraw consent at any time using the cloud-based service 60.
- the cloud-based service 60 provides FHIR (Fast Healthcare Interoperability Resources) APIs.
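For illustration, a minimal FHIR R4 Observation resource for a heart-rate reading might be constructed as below. The helper function and patient identifier are hypothetical; the source does not specify which resource profiles its FHIR APIs use.

```python
import json

def heart_rate_observation(patient_id, bpm, timestamp):
    """Build a minimal FHIR R4 Observation for a heart-rate reading (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4",
                             "display": "Heart rate"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": timestamp,
        "valueQuantity": {"value": bpm, "unit": "beats/minute",
                          "system": "http://unitsofmeasure.org", "code": "/min"},
    }

obs = heart_rate_observation("example-patient", 72, "2023-06-01T09:00:00Z")
payload = json.dumps(obs)  # body for a POST to a FHIR server's Observation endpoint
```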
- the cloud-based service 60 may interact with proprietary cloud-based systems that are used in conjunction with wearable sensor devices or other devices.
- the cloud-based service 60 further provides data storage services to store data relating to patients.
- the cloud-based service 60 may provide a secure, API-driven data exchange which may enforce data interoperability and standards.
- the cloud-based service 60 provides data collection, aggregation and processing via methods that may use application programming interfaces (API) or human data entry forms, storing and processing the data within an application or series of inter-connected applications.
- At least part of the functionality of the cloud-based service 60 may be provided by any one or more computer apparatuses, which in some embodiments may not be cloud-based.
- the functionality of the cloud-based service 60 may be provided by computing apparatus 10 of Figure 1.
- An AI insights module 62 is configured to provide an analytics layer which may be used to obtain predictions of outcomes from patient data as described below in relation to Figure 4.
- the analytics layer exploits structured datasets across patient and clinical systems to create AI-driven actionable insights.
- the AI insights module 62 is implemented by one or more cloud-based resources, which may differ from those used to implement the cloud-based system 60.
- the AI insights module 62 may be implemented by any one or more computer apparatuses which may not be cloud-based, for example, by computing apparatus 10 of Figure 1.
- the AI insights module 62 receives data about a patient that has been gathered by the cloud-based service 60, for example from data that has been received from some or all of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58 and communicated to the cloud-based service.
- the AI insights module 62 applies one or more models to the data to obtain one or more predictions.
- a clinical dashboard 64 is provided.
- the AI insights module 62 outputs the one or more predictions to the clinical dashboard 64, where the predictions may be displayed.
- the clinical dashboard 64 may receive patient data from the cloud-based system 60, either directly or via the AI insights module 62.
- the clinical dashboard 64 may receive data directly from any of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58.
- the clinical dashboard may be used by one or more clinicians 66 who are involved in the management of the patient 50.
- the clinical dashboard 64 links data from the patient app with data from sensor devices 52 comprising a patient wearable device, for example a Fitbit, and a home ventilation therapy device, for example a ResMed device.
- the clinical dashboard 64 may provide curated data visualisation to enhance patient management and inform care planning.
- Data that is displayed on the dashboard may include any one or more of clinical history data, PRO data and sensor data from any appropriate sensor. Predictions for outcomes may also be displayed on the clinical dashboard 64. The predictions for outcomes may be displayed as a risk score.
- the display of the clinical dashboard 64 may be such that a clinician 66 or other user can interrogate the clinical history, PRO data, sensor data or any other relevant data in greater detail.
- the clinical dashboard 64 may initially display a summary for the patient that comprises one or more predictions and a summary of results for one or more parameters, for example heart rate and activity. The clinician or other user may then interact with the clinical dashboard 64 to obtain more detailed information about the patient, or to obtain information that is not displayed on the initial display.
- Data from sensors may be displayed as a trend over time with certain level of aggregation.
- the level of aggregation may be predetermined or may be selected by a user.
- the clinical dashboard display may offer a holistic view of all the relevant information that a clinician may need to quickly assess a state of the patient’s health.
- data relating to multiple patients may be displayed through the same clinical dashboard 64.
- the clinical dashboard 64 is configured to prioritise patients in accordance with a respective prediction for each patient.
- the clinical dashboard 64 may order patients in a priority order in dependence on the predictions, such that patients having predictions that may be a cause for concern (for example, high risk of exacerbation) may be reviewed most quickly.
- the clinical dashboard 64 is further configured to issue an alert in dependence on the prediction for a patient.
- the clinical dashboard 64 may trigger an alarm if a risk score is high, for example if a risk score exceeds a predetermined threshold.
- any suitable alert may be used.
- a patient record or a prediction for a patient may be highlighted in any suitable manner. An audible or visual alarm may be issued.
- the clinical dashboard 64 and the patient app are configured to provide asynchronous messaging for communication between the clinician 66 and the patient 50.
- the asynchronous messaging communication functionality may also be described as chat functionality.
- the information available to the clinician 66 through the clinical dashboard 64 is different from the information available to the patient 50 through use of the patient app on communications device 54.
- the predictions are visible to the clinician 66 or other user of the clinical dashboard 64.
- the predictions are not visible to the patient 50. If a prediction is of concern, a clinician 66 or other user may be alerted to the prediction and may then notify the patient 50.
- an integrated system may be provided to capture patient data, make predictions, and provide useful information to clinicians in a transparent manner.
- Services across the platform can access a range of pre-existing integrations such as Google Fit, iOS Health Kit and Fitbit to consume data from different wearable devices and aggregate these with other clinical patient data.
- This interoperability may provide the facility to include data from consumer and healthcare medical technology devices.
- Figure 4 is a flow chart illustrating in overview a method of obtaining predictions from patient data using models trained in accordance with the method of Figure 2.
- various stages of the method are described as being performed using the data processing circuitry 22, prediction circuitry 26 and display circuitry 28 that are shown in Figure 1.
- in other embodiments, at least some stages of the method may be performed by the AI insights module 62, which is implemented using one or more cloud-based resources.
- the prediction circuitry 26 receives a set of patient data for a patient, which may also be described as inference data.
- the patient is a patient for whom it is desired to predict COPD outcomes.
- the patient data comprises clinical history data, PRO data and sensor data.
- the clinical history data comprises EHR data and demographic data.
- the clinical history data may also comprise details of exacerbations that have been verified by a clinician.
- the PRO data comprises daily wellbeing data that has been input by the patient to a communication device 54 using a patient app.
- the wellbeing data may have been input over a period of days, weeks or months.
- the sensor data comprises activity data, heart rate data and sleep data obtained from a wearable device 52 worn by the patient, for example a Fitbit.
- the sensor data further comprises respiratory data obtained from a home NIV device 52.
- the sensor data may have been obtained over a period of days, weeks or months.
- any suitable combination of data relating to the patient may be received.
- Sensor data may be obtained using any suitable sensor or sensors 52.
- PRO data may be reported on any suitable communication device 54.
- the data processing circuitry 22 cleanses the patient data.
- the cleansing of the patient data comprises checking the patient data for erroneous data; checking that a data type of the patient data is consistent; and checking that data values of the patient data fall into one or more expected ranges. In other embodiments, any suitable data cleansing process may be performed.
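The cleansing checks listed above (erroneous values, consistent types, expected ranges) might be sketched as follows. Field names and the expected ranges are illustrative assumptions, not values from the source.

```python
def cleanse_record(record, expected_ranges):
    """Null out values that have the wrong type or fall outside an expected range;
    return the cleaned record together with a list of issues found."""
    cleaned, issues = {}, []
    for field, (lo, hi) in expected_ranges.items():
        value = record.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            issues.append(f"{field}: wrong type or missing")
            cleaned[field] = None
        elif not lo <= value <= hi:
            issues.append(f"{field}: {value} outside [{lo}, {hi}]")
            cleaned[field] = None
        else:
            cleaned[field] = value
    return cleaned, issues

ranges = {"heart_rate": (30, 220), "daily_steps": (0, 100000)}
cleaned, issues = cleanse_record({"heart_rate": 400, "daily_steps": 4200}, ranges)
```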
- the data processing circuitry 22 may also check the patient data for bias, for example bias in readings from one or more sensors.
- the data processing circuitry 22 processes the patient data to obtain values for a plurality of features.
- the plurality of features comprises features used as input by trained model A, features used as input by trained model B, and features used as input by trained model C.
- the features used as input to models A, B and C were determined in training as described above in relation to Figure 2.
- the prediction circuitry 26 inputs to trained model A the determined values for the features used as input for trained model A.
- Trained model A uses the determined values for the features to obtain a prediction of 12-month COPD mortality risk for the patient.
- the prediction circuitry 26 outputs the determined prediction of 12-month COPD mortality risk.
- the prediction circuitry 26 inputs to trained model B the determined values for the features used as input for trained model B.
- Trained model B uses the determined values for the features to obtain a prediction of 3-month hospital readmission risk for the patient.
- the prediction circuitry 26 outputs the determined prediction of 3-month hospital readmission risk.
- the prediction circuitry 26 inputs to trained model C the determined values for the features used as input for trained model C.
- Trained model C uses the determined values for the features to obtain a prediction of 72-hour COPD exacerbation risk for the patient.
- the prediction circuitry 26 outputs the determined prediction of 72-hour COPD exacerbation risk.
- risk stratification is performed based on one or more of the determined predictions.
- the prediction circuitry 26 may perform a risk stratification in which the patient is assigned to a risk category, for example high, medium or low risk, in accordance with the prediction. For example, the prediction circuitry 26 may determine whether the patient is high risk by comparing the risk score to a predetermined threshold value, and identifying the patient as high risk if the risk score exceeds the predetermined threshold value.
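A minimal sketch of this threshold-based stratification, assuming illustrative thresholds of 0.7 and 0.3 (the source does not specify the threshold values used):

```python
def stratify_risk(score, high=0.7, medium=0.3):
    """Map a risk score in [0, 1] to a category; thresholds are illustrative."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```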
- the patient is classified into one of four categories A-D in accordance with the Global Initiative for Chronic Obstructive Lung Disease 2017 (GOLD) classification. Categories A-D are stratified by symptom burden and exacerbation risk.
- the display circuitry 28 displays values for the predictions to a clinician 66 using the clinical dashboard 64.
- Each of the predictions may be displayed as a respective risk score, for example as a value between 0 and 1.
- the display circuitry 28 may also display a category to which each prediction has been assigned, for example a stratification into high, medium or low categories.
- the display circuitry 28 may also display on the clinical dashboard 64 any other information that may be useful to the clinician, for example demographic or clinical history data.
- the display circuitry 28 may display sensor data and/or PRO data, for example in the form of a summary.
- the display circuitry 28 may also display information that explains the predictions. For example, the display circuitry 28 may display values for certain features that were used in the prediction.
- the display circuitry 28 may display explainable information about the overall model performance and the global model feature importance.
- the display circuitry 28 may display an expected model performance on one or more demographics of interest.
- the display circuitry 28 may display a probability for the individual prediction to help a clinician gauge a level of certainty of the prediction made by the model.
- the display circuitry 28 may display a local feature importance for individual predictions together with a true feature value. Features may be ranked in importance and the most important features displayed to the clinician. For example, if number of exacerbations was the most important feature in the prediction of an outcome for an individual, the display circuitry 28 may display the information that number of exacerbations is the most important feature, and may also present the number of exacerbations to the clinician.
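Ranking local feature importances and pairing each with its true value for display might be sketched as below; the feature names and importance scores are invented for illustration.

```python
def top_features(feature_values, importances, k=3):
    """Rank features by absolute importance; pair each with its true value for display."""
    ranked = sorted(importances, key=lambda f: abs(importances[f]), reverse=True)
    return [(f, importances[f], feature_values[f]) for f in ranked[:k]]

values = {"num_exacerbations": 4, "age": 71, "mean_heart_rate": 88, "daily_steps": 1500}
imps = {"num_exacerbations": 0.42, "age": 0.10, "mean_heart_rate": -0.18, "daily_steps": -0.05}
display = top_features(values, imps)  # e.g. number of exacerbations shown first
```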
- the display circuitry 28 determines whether any predicted risk is high. If any of the predicted risks is determined to be high, the display circuitry 28 issues an alert, for example a visual or audible alert.
- asynchronous communication takes place between the patient and the clinician. For example, the clinician on receiving the alert of stage 100 may message the patient through the clinical dashboard 64 using an asynchronous messaging facility. The patient may reply asynchronously to the clinician.
- the stages of Figure 4 may be performed in any suitable order. Certain stages may be omitted. Certain stages may occur repeatedly. For example, one or more of the predictions may be recalculated regularly, for example daily.
- Model insights may enable optimised care delivery by clinical teams by highlighting patients who would benefit from anticipatory care planning or a targeted intervention including equipping them with wearables e.g. respiratory sensor, to reduce the risk of an unscheduled hospital admission.
- Risk scores provided via the clinical dashboard may be used to identify patients who have significant chronic risk and recommend preventative care regimes to support anticipatory care planning. Risk scores provided via the clinical dashboard may be used to identify patients who have significant risk of hospital admission and recommend immediate changes to a care regime to avoid the consequential health risk to the patient and the cost of hospital admission.
- An AI system is provided that is capable of remotely risk-stratifying patients with chronic illnesses, specifically patients with Chronic Obstructive Pulmonary Disease (COPD).
- the technology predicts the risk of short-term exacerbation (extreme deterioration in condition) using an AI model trained on data generated as part of the service together with historical patient Electronic Health Record (EHR) data.
- the methods described above with relation to Figures 1 to 4 may provide a reliable tool used to remotely predict short term exacerbation in COPD patients.
- the tool may reduce the burden of clinicians.
- the tool may act as a 24/7 monitor for the patient. Reductions in exacerbation and hospitalisations are the outcomes rated as most important by clinicians and people with COPD.
- Machine learning models may be used to stratify risk and predict COPD exacerbation and mortality. Methods as described above may address the transformation of COPD service provision from a reactive approach to one focused on prevention, anticipation and supported self-management.
- Figures 5 and 6 relate to further embodiments, in which an unsupervised machine learning procedure is used as part of a feature reduction process to obtain a reduced set of features on which to train a further machine learning model.
- the unsupervised machine learning procedure identifies the labels to be used for training the further machine learning model.
- Model E comprises an unsupervised machine learning model.
- Model E is trained for the purpose of grouping COPD patients into clusters. By grouping COPD patients into clusters, a risk stratification may be performed. For example, patients may be assigned to risk categories such as high, medium or low risk, or assigned to categories A-D in accordance with the Global Initiative for Chronic Obstructive Lung Disease 2017 (GOLD) classification.
- the model may be used to, for example, identify under-medicated patients or patients in need of a clinical review.
- Use of an unsupervised model may offer advantages. For example, one advantage is that accurate ground truth labels are not required.
- a further advantage here is that the unsupervised method can discover relationships and/or groupings or "disease phenotypes" that previously were unknown.
- the data processing circuitry 22 receives training data.
- the training data comprises EHR data from 1985 to 2021 for COPD patients in a Greater Glasgow & Clyde (GG&C) cohort.
- the raw EHR data includes admission history, lab tests, prescriptions, comorbidities, and demographics.
- the training data may be cleansed, for example as described above in relation to stage 32 of Figure 2.
- the data processing circuitry 22 pre-processes the EHR data into a format suitable for machine learning, resulting in 52 features with one row of data per patient per 12-month period.
- the features comprise bespoke features derived from prescribing datasets. At least one feature is indicative of how many mono, dual and triple therapies a patient is taking. In some circumstances, a patient may take more than one prescription for a given condition, for example for breathlessness. A mono therapy is where the patient takes one type of inhaler for the condition. A dual therapy is where the patient takes two types of inhaler for the condition. A triple therapy is where the patient takes three types of inhaler for the condition. In some circumstances, the number of mono, dual and triple therapies may be predictive of risk. In other embodiments, features based on the number of mono, dual and/or triple therapies may be used in any models described above.
- At least one feature captures how much reliever inhaler medication the patient has needed, for example over a predetermined timescale. At least one feature captures how much rescue medication the patient has needed, for example over a predetermined timescale.
- At least one feature captures how many anxiety and depression related episodes and prescriptions the patient has had.
- any of the features described above in relation to stage 134 may be used in training Model A, Model B and/or Model C as described above in relation to Figure 2, or in training any other COPD prediction model.
- the training circuitry 24 proceeds to stage 136 of Figure 5.
- the data is scaled and passed through Principal Components Analysis (PCA) to perform feature importance and selection, reducing features from 52 to 14. From these 14 features, six principal components (projections into a lower dimensional space) were created for model training using PCA. In other embodiments, any suitable feature reduction may be used. Any suitable number of principal components may be identified.
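The scaling and projection step might be sketched with scikit-learn as follows, using random stand-in data for the 14 selected features (the real feature values and variance structure are not reproduced here):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14))  # stand-in: one row per patient-year, 14 selected features

X_scaled = StandardScaler().fit_transform(X)       # scale features to zero mean, unit variance
pca = PCA(n_components=6, random_state=0)
components = pca.fit_transform(X_scaled)           # projection into a 6-dimensional space
```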
- the training circuitry 24 trains a K-Means clustering algorithm to cluster data by risk. Any suitable method of training a K-Means clustering algorithm may be used.
- the training circuitry 24 outputs the trained K-Means clustering algorithm.
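Training and applying the K-Means model on the principal components might look like the sketch below, assuming three risk clusters (random stand-in data replaces the real principal components):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
components = rng.normal(size=(150, 6))  # stand-in for six principal components per patient-year

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(components)
labels = kmeans.labels_                               # cluster assignment per patient-year
new_label = kmeans.predict(rng.normal(size=(1, 6)))   # inference on data for a new patient
```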
- Results of the K-Means clustering algorithm may be plotted for inspection, for example using display circuitry 28.
- at stage 142, the training circuitry 24 evaluates the trained K-Means clustering algorithm.
- stage 142 corresponds substantially to stage 48.
- the evaluation process includes examining key events in the following year for patients classified into each of the three clusters. These key events include time-to-death, time-to-admission and rescue medication usage.
- the prediction circuitry 26 applies the trained K-Means clustering algorithm to new data to obtain a prediction.
- the prediction circuitry 26 may apply the trained K-Means clustering algorithm to data relating to a new patient in order to obtain a predicted risk for said patient.
- the input data may be stratified by risk.
- the patient may be assigned to a risk category such as high, medium or low risk, or to a category A-D in accordance with the GOLD classification.
- Figure 6 illustrates the training of a second formulation of Model E. Stages 130 to 136 of Figure 6 are the same as stages 130 to 136 of Figure 5.
- the training circuitry 24 uses the six principal components (or in other embodiments, any suitable number of principal components) to train an unsupervised machine learning model comprising Hierarchical clustering.
- Hierarchical clustering is used to cluster the training data to obtain a plurality of trained clusters.
- Hierarchical clustering alone does not allow for prediction on new data.
- the training circuitry 24 trains a second, supervised machine learning model on the labels of the trained clusters from stage 150; the second model is deliberately overfit to those labels. This creates a method for predicting on new data.
- new data is fed into the second, supervised machine learning model to obtain a prediction.
- the output of this second model is a cluster label (i.e. a label identified by the clustering algorithm) that is used to classify the new data.
- the second, supervised machine learning model may be a tree-based method such as a decision tree classifier.
- the second, supervised machine learning model may comprise a one versus rest approach whereby the problem is modelled as cluster 1 versus other clusters, cluster 2 versus other clusters, cluster 3 versus other clusters and the highest scoring value from each of these sub-models is used to determine the label of new predictions.
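The clustering-plus-surrogate approach might be sketched as follows, with a decision tree deliberately overfit to the cluster labels so that new data can be classified. The data is a random stand-in, and scikit-learn's `AgglomerativeClustering` stands in for whatever hierarchical method the embodiment uses.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
components = rng.normal(size=(120, 6))  # stand-in principal components

# Hierarchical clustering labels the training data but cannot predict on new data.
cluster_labels = AgglomerativeClustering(n_clusters=3).fit_predict(components)

# Deliberately overfit a supervised model to the cluster labels to enable inference;
# an unconstrained tree reproduces the clustering exactly on the training data.
surrogate = DecisionTreeClassifier(max_depth=None, random_state=0)
surrogate.fit(components, cluster_labels)
predicted = surrogate.predict(rng.normal(size=(1, 6)))  # cluster label for new data
```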
- an additional calibration process may be provided, in some embodiments, to map or transform the label of the new prediction to a probability distribution, for example a probability distribution that represents a real-world probability of the occurrence of the predicted health outcome.
- the training circuitry 24 outputs the trained models from stages 150 and 152.
- the training circuitry 24 evaluates the trained models.
- the prediction circuitry 26 uses the trained models to obtain a prediction for input data relating to a patient.
- the prediction circuitry 26 may apply the trained model to data relating to a new patient in order to obtain a predicted risk for said patient.
- Input data may be stratified by risk.
- the patient may be assigned to a risk category such as high, medium or low risk, or to a category A-D in accordance with the GOLD classification.
- the combination of an unsupervised clustering model plus a second, supervised machine learning model may be viewed as an explainable clustering model.
- the prediction is displayed by display circuitry 28 along with information regarding the prediction.
- the information regarding the prediction is obtained by applying one or more explainability procedures to the second, supervised machine learning model.
- the explainability procedures may include at least one of SHAP (SHapley Additive exPlanation), LIME (Local Interpretable Model-Agnostic Explanations) or ELI5 (Explain Like I'm 5).
- the information regarding the prediction may explain which features were most important to the prediction.
- the information regarding the prediction may comprise a global and/or local feature importance as described above.
- in some embodiments, the explainability procedure is applied at stage 158 together with obtaining the prediction.
- Stage 160 may include displaying reasons or a rationale for a prediction, for example, in a human readable or any suitable format.
- the explainability procedure is applied to obtain importance scores and/or a rationale for the prediction.
- obtaining the rationale and/or the importance scores includes determining the effect of a change in the value of an input feature on the output of the model for each of the set of input features.
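One simple way to realise this, shown purely as a stand-in for SHAP or LIME rather than the method actually used, is occlusion-style scoring: replace each input feature with a baseline value and measure the resulting change in the model output.

```python
def local_importance(predict_fn, x, baseline):
    """Occlusion-style local importance: the effect on the model output of
    replacing each input feature with a baseline value (SHAP/LIME stand-in)."""
    base_pred = predict_fn(x)
    scores = {}
    for name in x:
        perturbed = dict(x)
        perturbed[name] = baseline[name]
        scores[name] = base_pred - predict_fn(perturbed)
    return scores

# Toy risk model: risk rises with exacerbation count, falls with activity.
def toy_risk(features):
    return 0.1 * features["num_exacerbations"] - 0.00001 * features["daily_steps"]

x = {"num_exacerbations": 4, "daily_steps": 2000}
baseline = {"num_exacerbations": 0, "daily_steps": 5000}
scores = local_importance(toy_risk, x, baseline)  # exacerbation count dominates
```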
- information explaining and/or regarding the prediction is obtained.
- the rationale and/or importance scores may be represented as a filtered and/or ranked list of features that are most important.
- the rationale may comprise human readable text based on the importance scores and/or presentation of the importance scores in a suitable format that allows a clinician to understand the output of the model and the impact and/or effect of input data on the output.
- Model E for example as described above with relation to Figure 5 or Figure 6, may be used to stratify risk in a patient population.
- Model E may be used to identify under-medicated patients.
- Model E may be used to identify patients in need of a clinical review.
- a calibration process is used to calibrate an inference output of a prediction model, for example an output of any of models A, B, C or E.
- it is important that the inference output of the prediction can be interpreted as a probability of an event. It is important to quantify to what extent an output measure is reflective of a true probability of an event, for example a probability of an exacerbation or a probability of hospitalization in a given timescale.
- High class imbalance may cause models to be poorly calibrated and may affect the utility of models in a live setting.
- the high imbalance may mean that there are many more training samples in one class than in another class. It will be understood that the imbalance reflects what is occurring in reality. For example, predicting mortality is an imbalanced problem because far fewer people die than survive in any given year.
- the calibration method used is Platt scaling (a form of logistic regression) applied on folds of patient data that are stratified to be demographically and class-balanced.
- the calibration process may comprise transforming the output of the at least one trained model, for example, a classification, into a probability distribution.
- the training process may involve obtaining the mapping between possible outputs of the model and probabilities.
- Model A, B, C or E may comprise a combination of several constituent models, which may be described as base classifiers.
- the training of a model may comprise training several base classifiers.
- the base classifiers are fitted on different folds of patient data. Probabilities are calibrated using the hold-out test data set for that group of folds. When predicting on unseen data, inference is run with each base classifier and the average of their outputs is taken.
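The fold-based calibration described above may be sketched as follows (an illustrative reconstruction in Python; the model type, fold count and synthetic dataset are assumptions, not taken from the source):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Illustrative sketch, not the patented implementation: fit one base
# classifier per fold split, Platt-scale it on the held-out fold, and
# average the calibrated probabilities when predicting on unseen data.
X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)

calibrated_members = []
for train_idx, holdout_idx in StratifiedKFold(n_splits=5).split(X, y):
    base = GradientBoostingClassifier(n_estimators=30, random_state=0)
    base.fit(X[train_idx], y[train_idx])
    # Platt scaling: a logistic regression fitted on held-out scores.
    scores = base.predict_proba(X[holdout_idx])[:, [1]]
    platt = LogisticRegression().fit(scores, y[holdout_idx])
    calibrated_members.append((base, platt))

def predict_calibrated(X_new):
    """Average the Platt-calibrated probability from each base classifier."""
    probs = [platt.predict_proba(base.predict_proba(X_new)[:, [1]])[:, 1]
             for base, platt in calibrated_members]
    return np.mean(probs, axis=0)
```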
- Model calibration allows the presentation of expected model behaviour for a multidisciplinary team (MDT) for example a 100 patient MDT.
- MDT multidisciplinary team
- the multidisciplinary team will be understood to describe, in a healthcare setting, a team comprising several disciplines (for example, in the care of an elderly adult with dementia, an MDT might comprise a nurse, psychologist, psychiatrist, physiotherapist, etc.).
- Expected model behaviour may be presented at various levels of calibrated probability threshold.
- a user for example, a clinician, can select an expected model behaviour that fits the user’s needs.
- the user can select a threshold, see the performance metrics at that threshold, and be presented with how the model would perform, scaled to 100 patients, in terms of true and false positives and true and false negatives.
- the system may be suitable for use in a clinical setting.
- a distribution of probabilities for a binary classification problem may differ greatly depending on the model performance and class imbalance. For each model, the expected behaviour is presented for the case in which a patient is classified as "at risk" when the predicted probability exceeds a given threshold (a probability value selected between 0 and 1). This is repeated for a number of different threshold values between 0 and 1 (for example, 6 or 7), and the results are presented in terms of how the model would perform on 100 patients: true and false positives, and true and false negatives.
- the expected model behaviour may refer to a proportion of true and false positives, and true and false negatives, for a given probability threshold. This allows a user to see how the model will perform for 100 (or however many) patients. For example, in the embodiment for Model A, as the threshold moves closer to 1, the model becomes very precise and will therefore minimize false positives, but at the trade-off of many people being missed (false negatives). This may be provided visually.
- the metrics corresponding to the above scenario may also be determined and displayed.
- the performance metrics may include, for example, accuracy, precision, recall, specificity, F1-score, ROC-AUC, precision-recall AUC.
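One possible way of computing the expected behaviour per 100 patients at several probability thresholds is sketched below (illustrative only; the specific threshold values are assumptions):

```python
import numpy as np

# Illustrative sketch: scale a model's thresholded behaviour to a nominal
# 100-patient cohort, as in the MDT presentation described above.
def behaviour_per_100(y_true, y_prob, thresholds=(0.1, 0.25, 0.5, 0.75, 0.9)):
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    out = {}
    for t in thresholds:
        pred = y_prob >= t                # classify "at risk" above threshold
        scale = 100.0 / len(y_true)       # rescale counts to 100 patients
        out[t] = {
            "TP": np.sum(pred & (y_true == 1)) * scale,   # correctly flagged
            "FP": np.sum(pred & (y_true == 0)) * scale,   # over-called
            "FN": np.sum(~pred & (y_true == 1)) * scale,  # missed patients
            "TN": np.sum(~pred & (y_true == 0)) * scale,
        }
    return out
```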
- a standard procedure for implementing explainability may not work with a calibrated model.
- One reason is because multiple models are used to provide the calibrated prediction rather than just a single model.
- an explainability procedure comprises iterating over a plurality of base classifiers which form the calibrated model, calculating local and global explainability for each of the base classifiers, and averaging the results across each of the models.
- Local explainability explains the contribution of a model input to a particular prediction, for example a prediction associated with a single patient.
- Global explainability explains the contribution of a model input to prediction in general, for example over a cohort of patients.
- a calibrated model comprises a plurality of constituent models, which may be described as base classifiers.
- Each base classifier is configured to receive a plurality of model inputs (which may also be referred to as features) and to output a respective prediction, for example as a probability.
- an explainability procedure is used to assess the contribution of each feature to the base classifier’s prediction.
- the explainability procedure is SHAP (SHapley Additive exPlanation), which uses a game-theoretic approach to assess the contribution of each feature.
- the explainability procedure outputs a respective contributing factor for each model input for each base classifier.
- the contributing factors obtained from all of the base classifiers are combined to obtain a combined contributing factor, for example by averaging.
- the combined contributing factor is representative of the contribution that the model input makes to the overall prediction that is output by the model, for example to a probability that is output by the model.
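The combination of per-classifier contributing factors by averaging may be sketched as follows (an illustrative sketch; it assumes the contributing factors, for example SHAP values, have already been computed as one array per base classifier):

```python
import numpy as np

# Illustrative sketch: combine per-feature contributing factors from each
# base classifier of a calibrated ensemble into a single averaged factor.
def combine_contributions(per_classifier_contribs):
    """per_classifier_contribs: list of (n_samples, n_features) arrays.

    Returns (local, global_importance):
      local             - averaged per-sample, per-feature contributions
      global_importance - mean absolute contribution per feature (cohort level)
    """
    stacked = np.stack(per_classifier_contribs)      # (n_models, n_samples, n_features)
    local = stacked.mean(axis=0)                     # average across base classifiers
    global_importance = np.abs(local).mean(axis=0)   # cohort-level importance
    return local, global_importance
```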
- the prediction circuitry 26 outputs the combined contributing factor of each model input to the overall probability on an individual patient prediction level.
- the display circuitry 28 may present each model input value together with the contribution that the model input value had to the overall probability score. For example, the contributing factors may be displayed along with a prediction.
- the output of contributing factors may give transparency to the user. This output may also give information as to why a feature is important, for example, by indicating to the user that low albumin correlates to a higher mortality risk. This extends a typical approach of giving feature importance order to include richer information around what characteristics of that feature make it important.
- a machine learning model used comprises one or more Glass-box methods, for example Explainable Boosting Machines.
- Glass-box methods are inherently interpretable which removes the black box element of a lot of machine learning algorithms.
- the framework described above outputs both a prediction and model explainability. Due to the additive nature of the algorithm, it is possible to see the contribution each feature has on the final prediction, and the method is therefore considered a Glass-box method.
- Custom code plus third-party packages such as fairlearn (https://fairlearn.org/) are used to identify potential inequalities between different groups within the data used for model training. Using the outcome of this analysis, custom loss functions are developed to minimize performance disparities between the groups. This resulted in fair, performant, and calibrated models.
- Fair and explainable models may be provided.
- the models may be calibrated and transparent. By providing models that are calibrated, fair and explainable, models may be provided that are trusted by end users. Models that are not trusted by end users may not be used.
- the models are not limited to use of the XGBoost algorithm. In some embodiments, additional advanced Glass-box methods such as Explainable Boosting Machines were used. These models are inherently interpretable, which removes the black-box element of many machine learning algorithms.
- loss functions are used throughout the model learning process to evaluate how the model is performing observation by observation. They typically optimize the learning to minimize a quantity. While a wide variety of loss functions are already implemented out of the box, sometimes there is a specific need for a particular model that is not covered by the pre-existing loss functions. As a first example, a model may need to be trained to be inherently fair.
- implementing a custom loss function in XGBoost involves the following steps: 1) pen-and-paper mathematics to describe and represent the custom metric to optimise for (ideally, this will be continuous and differentiable); 2) obtaining the first and second derivatives of this function (the gradient and Hessian); 3) supplying these to the XGBoost algorithm, which makes use of a Taylor expansion and therefore requires the first and second derivatives.
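These steps may be sketched as follows for a weighted logistic loss (an illustrative example; the weighting scheme and the factor `w_pos` are assumptions, not taken from the source):

```python
import numpy as np

# Illustrative sketch: a custom weighted logistic loss expressed as the
# (gradient, hessian) pair that XGBoost's custom-objective hook expects.
# `w_pos` (an assumed parameter) up-weights errors on the positive class,
# which is one way of training a model toward fewer false negatives.
def weighted_logistic_obj(w_pos=3.0):
    def obj(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of the raw margin
        w = np.where(y == 1, w_pos, 1.0)   # per-class weight
        grad = w * (p - y)                 # first derivative of the loss
        hess = w * p * (1.0 - p)           # second derivative of the loss
        return grad, hess
    return obj

# Usage (assuming xgboost is available):
#   booster = xgb.train(params, dtrain, obj=weighted_logistic_obj(3.0))
```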
- any suitable bespoke features may be used. Further bespoke features are described below. Features may be used in any suitable combination. For example, any of the further bespoke features described below may be used as an input to Model A, to Model B, to Model C, to Model E, or to any other model described herein.
- BNF British National Formulary
- prescribing risk mapping is used to create a numerical representation of risk for prescribed medication.
- each medication is given a respective code which may be referred to as a BNF code.
- the data processing circuitry 22 creates numerical representation of risk for each of a plurality of BNF codes, which may be BNF codes that are present in the training data.
- Each trimmed BNF code (first 9 digits) other than the rarest 100 is assigned a number. This number is a floating-point number between 0 and 1.
- the data processing circuitry 22 removes the last 3 characters from each BNF code, thereby removing the generic information.
- the data processing circuitry 22 removes codes that have been prescribed to 100 patients or fewer, to avoid information leakage.
- the data processing circuitry 22 splits the training data into 10 age, sex, and disease class stratified folds. In other embodiments, any suitable number of folds may be used.
- the data processing circuitry 22 uses k-fold target encoding to convert each code in each fold to a numerical representation of disease likelihood given that code. The data processing circuitry 22 then performs averaging over all folds. In some embodiments, the numerical representation of risk is enriched by considering other factors, for example, maximum mapped BNF encoded value in the previous 12 months.
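The k-fold target encoding may be sketched as follows (an illustrative sketch; for simplicity the folds here are stratified on the outcome only, rather than on age, sex and disease class as described above):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Illustrative sketch of k-fold target encoding: each code is replaced by
# the outcome rate observed for that code in the *other* folds, avoiding
# leakage from a row's own outcome.
def kfold_target_encode(codes, outcome, n_splits=10, seed=0):
    codes = pd.Series(codes).reset_index(drop=True)
    outcome = pd.Series(outcome, dtype=float).reset_index(drop=True)
    encoded = pd.Series(np.nan, index=codes.index)
    prior = outcome.mean()  # fallback for codes unseen in the fitting folds
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in skf.split(codes, outcome):
        # Mean outcome per code, computed only on the fitting folds.
        means = outcome.iloc[fit_idx].groupby(codes.iloc[fit_idx]).mean()
        encoded.iloc[enc_idx] = (
            codes.iloc[enc_idx].map(means).fillna(prior).to_numpy()
        )
    return encoded
```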
- a similar mapping approach may also be applied, in some embodiments, to, for example, ICD-10 diagnosis mapping or ICD-10 diagnosis to comorbidity mapping.
- for ICD-10 diagnosis mapping, the same approach as that described above is adopted, but using ICD-10 codes in place of BNF codes.
- for ICD-10 diagnosis to comorbidity mapping, a bespoke mapping from ICD-10 codes to 30 different comorbidities is performed, and those 30 comorbidities are then used as model features.
- the domain features include prescribing features.
- the features may include, for example, rescue medication features or reliever inhaler features.
- a feature of a sum of COPD related prescriptions and a sum of non-COPD related prescriptions may be used.
- the sum may be based on prescriptions from a pre-determined time period, such as a year. In some embodiments, only a lifetime total is considered.
- the features include laboratory features, for example, the neutrophil-lymphocyte ratio from the patient's most recent tests, or the maximum eosinophil value the patient has had in their lifetime.
- a feature may correspond to the number of days since a laboratory test. In such embodiments, the model is aware of how recent the test data is and is then able to weight the associated risk or lack of risk relative to the data recency using custom interaction constraints.
- the interaction constraints are applied during model training or when applying the trained models. These constraints allow a user to specify which features are allowed to interact with each other and/or to restrict which features are not allowed to interact. In some embodiments, no interaction constraints are enforced and the model finds the relationships between features; in such embodiments, the generalizability of the model may be reduced and the model may be less explainable. In some embodiments, bespoke constraints on laboratory test features are imposed such that the days-since-test feature is only allowed to interact with the test value itself and the age and sex of the patient.
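Such a constraint may be expressed, for example, in the index-list form accepted by XGBoost's `interaction_constraints` parameter (the feature names below are illustrative, not taken from the source):

```python
# Illustrative sketch: restrict a "days since test" feature so it may only
# interact with the test value itself and the patient's age and sex.
features = ["age", "sex", "albumin_value", "days_since_albumin_test",
            "presc_per_year"]

def constraint_group(names, group):
    """Translate feature names into the index lists XGBoost expects."""
    return [names.index(f) for f in group]

interaction_constraints = [
    constraint_group(features,
                     ["days_since_albumin_test", "albumin_value", "age", "sex"]),
]
# Passed to the model as, e.g.:
#   xgb.XGBClassifier(interaction_constraints=str(interaction_constraints))
```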
- NLP natural language processing
- unsupervised topic modelling may be applied to, for example, the patient's ICD-10 diagnosis history across a specified time window (for example, 2 years). This produces a specified number of topics/stories, to which a label is assigned.
- a one-hot encoding process is performed to produce new features to be used within the models. It will be understood that this process does not provide a separate output to the model but may produce new features in some variants of the models. As a non-limiting example, a story may look like "J441-J449-J47", which can be considered a "story" of ICD-10 codes.
- condition specific features are used.
- the number of community-managed exacerbations in the previous year may be used.
- the total number of COPD-respiratory 3-month readmissions may be used.
- COPD hospital admissions over a lifetime may be used.
- data from patients with incomplete socioeconomic data is used.
- the training data includes data collected during the COVID-19 pandemic.
- the steps of training, testing, validation and cross-validation include checking, using statistical tests, that each of the sets is balanced in terms of key demographic features and class balance.
- a combination of bespoke data splitting methods and known splitting methods may be used.
- an appropriate statistical test, such as a Kolmogorov-Smirnov test, may be used to ensure the splits are representative of the full population. This may be performed iteratively until adequate fits are found.
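A representativeness check of this kind may be sketched as follows (illustrative only, using synthetic data and a single feature):

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative sketch: compare a feature's distribution in a candidate
# split against the full cohort with a two-sample Kolmogorov-Smirnov test.
rng = np.random.default_rng(0)
population_age = rng.normal(70, 8, size=5000)               # synthetic cohort ages
split_age = rng.choice(population_age, size=500, replace=False)

stat, p_value = ks_2samp(population_age, split_age)
# Failing to reject the null suggests the split looks representative;
# in practice this would be repeated per feature and per split.
representative = p_value > 0.05
```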
- the statistical tests are carried out on any data that is used for training, testing and validation (including any k-fold methods). Validation is then typically evaluated in terms of performance, fairness, calibration and explainability, and can be done on a hold-out validation dataset or in a k-fold manner.
- evaluation stages 48, 142, 156 described may include application of a set of tests that evaluate the trained model in terms of performance in addition to fairness, calibration and explainability.
- the model can predict on respiratory hospitalization or all-cause hospitalization.
- Model C predicts based on a deterioration in condition.
- the output of the model is a calibrated probability allowing a clinician to reliably interpret how the model will perform for a given threshold.
- an explanation for each prediction is also provided. To obtain a calibrated prediction, during model training the data is split into k folds (k could be, for example, 5). The model is trained on 4 of the folds and then calibrated against the 5th using Platt scaling in an iterative manner.
- the prediction may be based on events which could be community managed exacerbations and/or the number and/or type of medications prescribed prior to the prediction date.
- the features include, for example, "max Eosinophil", "Neutrophil Lymphocyte ratio" and/or features relating to patient sleep.
- the features include patient identification features, demographic features, prescribing features, admission features and lab features.
- the data is scaled and passed through a Principal Components Analysis (PCA) to perform feature importance and selection, reducing features from 52 to 14.
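One possible way of using PCA loadings for feature importance and selection is sketched below (an illustrative sketch on synthetic data; the exact selection procedure used is not specified in the source):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative sketch: scale the data, fit PCA, and rank the original
# features by the magnitude of their loadings on the leading components,
# weighted by the variance each component explains. The 52 -> 14 reduction
# from the source is reproduced here in spirit only, on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 52))                  # 200 patients, 52 features

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.9).fit(X_scaled)       # keep 90% of the variance

importance = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
top_14 = np.argsort(importance)[::-1][:14]      # indices of the 14 kept features
```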
- the 52 feature data set includes the following features: ['SafeHavenlD', 'eoy', 'adm_per_year', 'total_hosp_days', 'meanjos', 'anxiety_depression_per_year', 'days_since_adm', 'adm_to_date', 'copd_to_date', 'resp_to_date', 'anxiety_depression_to_date', 'comorb_per_year', 'presc_to_date', 'days_since_rescue', 'rescue_to_date', 'anxiety_depression_presc_to_date', 'salbutamol_per_year', 'rescue_meds_per_
- the extracted feature set includes the following set of 14 features:
- the 14 features for this model are ['comorb_per_year', 'days_since_adm', 'ggc_years', 'days_since_copd_resp', 'days_since_rescue', 'presc_per_year', 'age', 'presc_to_date', 'estimated_gfr_med_2yr', 'albumin_med_2yr', 'labs_to_date', 'haemoglobin_med_2yr', 'red_blood_count_med_2yr', 'labs_per_year'].
- While the above list of features relate to identification features, demographic features, prescribing features, admission features and lab features, it will be understood that a subset of these features may be used in certain embodiments. At least some of the demographic, prescribing, admission and lab features may be referred to as clinical history features.
- additional data may be used together with these features as input to the model and/or as training data. For example, in some embodiments, these features may be combined with PRO data and/or sensor data. It will be understood that the data described above may comprise numerical values representing the set of features.
Abstract
A method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD) comprising: receiving data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient; applying at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD, wherein at least one of a), b) and c): a) the method further comprises obtaining a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features; b) wherein the set of features are predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure; c) wherein the at least one trained machine learning model comprises a calibrated model such that the obtained prediction comprises a calibrated prediction.
Description
HEALTH PREDICTION METHOD AND APPARATUS FOR PATIENTS WITH COPD
Field
The present invention relates to a method and apparatus for predicting at least one health outcome, for example for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD). The predicting of the health outcome may be calibrated, fair and explainable.
Background
Chronic obstructive pulmonary disease (COPD) is a common, progressive, preventable and treatable respiratory disorder that affects 1.2 million people in the UK and 384 million people worldwide. It is forecast to become the third leading cause of death worldwide by 2030.
COPD is characterised by frequent exacerbations that may result in a reduction in quality of life and disease progression for patients. An exacerbation may comprise an acute deterioration of signs and symptoms associated with COPD, for example breathlessness, coughing and fatigue. In some cases, an exacerbation may require hospital treatment.
COPD exacerbations are responsible for a large proportion of the disease burden, adverse outcomes and healthcare costs associated with COPD. COPD exacerbations account for one in eight of all UK hospital admissions and are projected to cost the NHS £2.5bn per year by 2030.
Current COPD management may typically be based on a reactive approach instead of a preventative approach. In some circumstances, delays in recognising treatable opportunities may result in COPD care quality gaps. Such care quality gaps may limit provision of cost-effective evidence-based interventions which may improve quality of life, exacerbation outcomes and admission rates.
Summary
In a first aspect of the invention, there is provided a method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD) comprising: receiving a clinical history of the patient; receiving patient-reported outcomes (PRO) data submitted by the patient; receiving sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and applying at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
At least some of the sensor data may be captured by a wearable device worn by the patient.
The PRO data may comprise wellbeing data. The wellbeing data may comprise data that is self-reported by the patient. The wellbeing data may comprise data on at least one of cough, phlegm, shortness of breath, wheezing, night sleeping, night waking, chest illness, smoking, chest tightness, activity, confidence in leaving the home, sleep quality, energy, mobility, self-care, pain, discomfort, anxiety, depression.
The sensor data may comprise patient activity data, heart rate data and data indicative of restless sleep. The prediction may be based at least partially on decline in patient- reported wellbeing in combination with a change in patient activity data, data indicative of restless sleep and heart rate data.
The prediction may comprise a prediction of an exacerbation of COPD occurring within a time period. The time period for which exacerbation of COPD is predicted may be described as a short-term or near-term time period. An extent of the time period may be between 24 hours and 1 week. An extent of the time period may be between 48 hours and 120 hours. The time period may be 72 hours.
The prediction may comprise a prediction of mortality. The prediction of mortality may be a prediction of mortality occurring within a mortality time period. An extent of the mortality time period may be between 6 months and 5 years. The mortality time period may be 12 months.
The prediction may comprise a prediction of readmission to hospital. The prediction of readmission to hospital may be a prediction of readmission to hospital occurring within a readmission time period. An extent of the readmission time period may be between 1 month and 2 years. The readmission time period may be 3 months. Readmissions may be all-cause readmissions or respiratory-related readmissions.
The prediction may be expressed as a score.
The method may further comprise displaying or communicating the prediction to a clinician. The method may further comprise displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model. The method may further comprise displaying or communicating to the clinician a rationale for the prediction.
The method may comprise obtaining a rationale for the prediction by applying one or more explainability procedures.
The method may further comprise issuing an alert in dependence on the prediction.
The method may further comprise obtaining predictions for the at least one health outcome for a plurality of patients, and displaying or communicating the predictions for at least some of the plurality of patients to a clinician. The method may further comprise filtering the patients in dependence on the predictions. The method may further comprise ordering the patients in dependence on the predictions.
The method may further comprise providing messaging functionality for communication between the patient and one or more clinicians.
The clinical history may be obtained or derived from the patient’s electronic health record (EHR).
At least part of the clinical history may be input by one or more clinicians. At least part of the clinical history may be input as part of a patient onboarding process. The at least part of the clinical history may comprise at least one of: presence of one or more comorbidities of a set list of comorbidities, a number of exacerbations in the previous 12 months, a number of hospital admissions in the previous 12 months.
The clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history.
The PRO data may comprise data obtained by presenting a survey or questionnaire to a patient and recording the patient’s responses. The survey or questionnaire may comprise questions on the patient’s perceived wellbeing.
The sensor data may comprise at least one of: activity data, respiratory data, heart rate, sleep data, energy expenditure, oxygen saturation.
The sensor data may comprise data received from a home respiratory sensor. The sensor data may comprise data received from a non-invasive ventilator (NIV) mask. The sensor data may comprise respiratory rate data. The sensor data may comprise data on one or more ventilation parameters.
Applying the at least one trained machine learning model may comprise applying a supervised trained machine learning model to input data for the patient, the input data representing a predetermined set of features representing and/or related to the clinical history, the PRO data and the sensor data, wherein the predetermined set of features are determined using an unsupervised machine learning procedure. The unsupervised machine learning procedure may comprise a principal component analysis and/or a clustering procedure. The at least one trained machine learning model may comprise a supervised machine learning model or a model obtained using a supervised machine learning procedure. The supervised trained machine learning model may comprise a tree-based procedure, for example, a decision tree. The supervised trained machine learning model may comprise a k-means clustering procedure. The supervised algorithm may comprise a one-versus-rest procedure.
The method may comprise determining a contribution to the predicted health outcome for features representing and/or related to the clinical history, the PRO data and the sensor data, and outputting a representation of the determined contribution.
Determining the contribution may comprise applying an explainability process, for example, a local or global explainability process, to obtain the contribution. The output may comprise a calibrated prediction of a health outcome.
The output of the at least one model may comprise a probability of a health outcome. The probability may comprise a calibrated probability determined based on a calibration procedure. The probability may be obtained by applying a pre-determined transformation to the output of the model. Applying the trained machine learning model may comprise outputting a probability of the at least one health outcome together with the input numerical value for one or more input features and a contributing factor or score for said input feature.
Applying the at least one trained machine learning model may comprise applying a plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome and wherein at least one of: the plurality of predictions are combined to obtain a combined prediction; the plurality of machine learning models are calibrated; an explainability procedure is applied to each of the plurality of trained models to obtain a contribution of the input to the prediction. The explainability procedure may be applied to obtain a rationale for the prediction. The method may comprise applying a demographic and class balancing process.
The method may comprise determining one or more metrics representing an expected model behaviour at a given probability threshold, optionally wherein the probability threshold is selected by a user.
The method may comprise obtaining user input representing a selected probability threshold for the model.
In a second aspect, which may be provided independently, there is provided an apparatus comprising processing circuitry configured to: receive a clinical history of the patient; receive patient-reported outcomes (PRO) data submitted by the patient; receive sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and apply at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
In a third aspect, which may be provided independently, there is provided a method for training at least one machine learning model to predict at least one health outcome for patients with COPD, the method comprising: receiving a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome, the training comprising a feature selection process to select a plurality of features relevant to the at least one health outcome.
The machine learning model may comprise a tree-based model. The training may comprise boosting and bagging. The training may comprise using domain-driven feature interaction constraints. The training may comprise optimizing the model to minimize false positives. The training may comprise optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
The method may further comprise performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model. The feature selection process may comprise selecting at least some of the initial set of features. The feature selection process may comprise selecting at least one further feature generated during the training process.
The method may further comprise data cleansing of the training data sets.
The method may further comprise exploring the training data sets for potential bias.
In a fourth aspect, which may be provided independently, there is provided an apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and train a machine learning model to predict at least one health outcome for patients with COPD from the plurality of features, the training comprising performing a feature selection process to select a plurality of features relevant to the at least one health outcome.
In a fifth aspect, which may be provided independently, there is provided a computer program product comprising computer readable instructions that are executable to perform a method as claimed or described herein.
In a sixth aspect, which may be provided independently, there is provided a method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD) comprising: receiving data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient; and applying at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD. The method may further comprise obtaining a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features. The method may further comprise obtaining other information explaining and/or regarding the prediction. The set of features may be predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure. The at least one trained machine learning model may comprise a calibrated model such that the obtained prediction comprises a calibrated prediction. The clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history. The clinical history may be represented as or comprise clinical history data.
The set of features may comprise at least one of patient identification features, demographic features, prescribing features, admission features and laboratory test features. The received data may comprise patient-reported outcomes (PRO) data submitted by the patient and/or sensor data.
The prediction may comprise a prediction of an exacerbation of COPD occurring within a time period, wherein an extent of the time period is between 24 hours and 1 week, optionally wherein the time period is 72 hours. The prediction may comprise a prediction of mortality occurring within the next 12 months. The prediction may comprise a prediction of readmission to hospital within the next 3 months.
The rationale and/or contribution to the prediction may comprise a feature importance score. The method may comprise ranking and/or filtering and/or selecting one or more features of the set of features based on the determined importance. The method may comprise displaying the most important features.
The importance scores may comprise global and/or local importance scores. The global feature importance score may be indicative of the importance of a feature to the model in general. The local feature importance score may be indicative of the importance of a feature to a specific prediction.
Obtaining the rationale for the prediction and/or the contribution to the prediction of one or more health outcomes for at least one of the set of features may comprise applying an explainability procedure to the at least one model and/or to the received data.
The explainability procedure may comprise a SHAP (SHapley Additive exPlanation), LIME (Local Interpretable Model-Agnostic Explanations) or ELI5 (Explain Like I'm 5) based procedure.
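By way of non-limiting illustration, the attribution underlying such explainability procedures can be sketched with an exact Shapley-value calculation over a small feature set (a standard-library sketch, not the SHAP library itself; the function names, the fixed-baseline value function and the toy linear model are assumptions for illustration only):

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for a single prediction.

    Features absent from a coalition are replaced by baseline values
    (a common simplification; SHAP proper uses expectations over a
    background dataset instead).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley kernel weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy risk model: weighted sum of three features.
model = lambda v: 0.5 * v[0] + 0.3 * v[1] + 0.2 * v[2]
phi = shapley_values(model, x=[2.0, 1.0, 4.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each attribution reduces to weight x (feature - baseline), and the attributions sum to the difference between the prediction and the baseline prediction (the "efficiency" property SHAP relies on). Practical tooling approximates the sum, since the exact computation is exponential in the number of features.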
The at least one trained machine learning model may comprise a supervised machine learning model or a machine learning model trained using supervised learning. The trained supervised machine learning model may be configured to output one or more labels of a set of labels representing a prediction of a health outcome wherein the set of labels is obtained using the unsupervised machine learning procedure.
The method may comprise applying the unsupervised learning procedure to determine at least some of the set of features for the training of the at least one machine learning model. The method may comprise performing the unsupervised machine learning procedure on at least one training data set to obtain said labels.
The unsupervised machine learning procedure may comprise a principal component analysis and/or a clustering procedure. The at least one trained machine learning model may comprise a tree-based procedure, for example, a decision tree and/or a k-means clustering procedure and/or a one-versus-rest procedure.
The method may further comprise applying a demographic and class balancing process.
The at least one trained model may be calibrated such that the predicted outcome substantially corresponds to a real-world probability of said health outcome. The at least one trained model may be calibrated using a Platt calibration method.
The method may further comprise applying an unsupervised topic modelling process to at least part of the clinical history data to obtain one or more labelled topics for use as one or more features of the set of features. The unsupervised topic modelling process may be applied to a diagnosis history across a specified time window.
The at least one trained model may comprise a plurality of trained models and applying the at least one trained model may comprise applying the plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome. The method may comprise combining the plurality of predictions to obtain a combined prediction. The plurality of machine learning models may be calibrated. The method may comprise applying an explainability procedure to each of the plurality of trained models to obtain a rationale and/or contribution for each of the plurality of trained models.
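By way of non-limiting illustration, combining the plurality of calibrated per-model probabilities may be as simple as an optionally weighted average (the function name and weighting scheme are assumptions for illustration):

```python
def combine_predictions(probs, weights=None):
    """Combine calibrated per-model probabilities into a single
    prediction via an optionally weighted average."""
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total

# Equal weighting across three models, then a weighted variant.
p_equal = combine_predictions([0.2, 0.4, 0.6])
p_weighted = combine_predictions([0.0, 1.0], weights=[1.0, 3.0])
```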
The method may further comprise determining expected model behaviour and/or one or more metrics representing said behaviour at a probability threshold. The probability threshold may be selected by a user.
The method may further comprise displaying or communicating the prediction to a clinician. The method may further comprise displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model. The method may further comprise displaying or communicating to the clinician the obtained rationale for the prediction and/or the obtained contribution. The method may further comprise issuing an alert in dependence on the prediction.
The received data may further comprise sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often. The received data may comprise patient-reported outcomes (PRO) data submitted by the patient.
The method may comprise obtaining the clinical history data, wherein obtaining the received data comprises at least one of: performing at least one laboratory test to obtain laboratory data.
In a seventh aspect, which may be provided independently, there is provided an apparatus comprising processing circuitry configured to: receive data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient; apply at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD. The processing circuitry may be configured to obtain a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features. The processing circuitry may be configured to obtain other information explaining and/or regarding the prediction. The set of features may be predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure. The at least one trained machine learning model may comprise a calibrated model such that the obtained prediction comprises a calibrated prediction. The clinical history may comprise information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history. The clinical history may be represented as or comprise clinical history data.
In an eighth aspect, which may be provided independently, there is provided a method for training at least one machine learning model to predict at least one health outcome for patients with COPD, the method comprising: receiving a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
The machine learning model may comprise a tree-based model and/or the training may comprise boosting and/or bagging. The training may comprise using custom loss functions. The training may comprise optimizing the model to minimize false positives.
The training may comprise optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
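By way of non-limiting illustration, grid search with cross validation can be sketched as follows (a standard-library sketch rather than, say, scikit-learn's GridSearchCV; the callback signature and the toy scoring function are assumptions for illustration):

```python
from itertools import product
from statistics import mean

def grid_search_cv(train_eval, param_grid, n_folds, data):
    """Minimal grid search with k-fold cross validation.

    train_eval(params, train_split, val_split) -> validation score,
    higher being better; param_grid maps each hyperparameter name to
    a list of candidate values.
    """
    names = list(param_grid)
    folds = [data[i::n_folds] for i in range(n_folds)]
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for k in range(n_folds):
            val = folds[k]
            train = [row for j, f in enumerate(folds) if j != k for row in f]
            scores.append(train_eval(params, train, val))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective whose optimum is depth=3, lr=0.1 regardless of fold.
toy = lambda p, tr, va: -abs(p["depth"] - 3) - abs(p["lr"] - 0.1)
best, _ = grid_search_cv(toy, {"depth": [2, 3, 5], "lr": [0.01, 0.1]}, 3, list(range(30)))
```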
The training may comprise performing a calibration process to obtain at least one calibrated machine learning model from at least one un-calibrated machine learning model. The calibration process may comprise obtaining a mapping between the output of the at least one trained model and a probability distribution. The calibration process may comprise a Platt regression or other Platt scaling process.
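By way of non-limiting illustration, Platt scaling fits a sigmoid mapping from raw model scores to probabilities; a minimal sketch using gradient descent on the log loss is shown below (the classical fit uses a Newton-style optimiser, and the function name and hyperparameters here are assumptions for illustration):

```python
from math import exp

def platt_fit(scores, labels, lr=0.01, iters=5000):
    """Fit Platt scaling parameters (A, B) so that
    P(y=1 | s) = 1 / (1 + exp(A*s + B)),
    by gradient descent on the log loss."""
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + exp(A * s + B))
            # Gradient of the per-sample log loss w.r.t. A and B.
            gA += (p - y) * (-s)
            gB += (p - y) * (-1.0)
        A -= lr * gA / n
        B -= lr * gB / n
    return A, B

# Higher raw scores correspond to the positive class.
A, B = platt_fit([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
p_high = 1.0 / (1.0 + exp(A * 2.0 + B))   # calibrated probability for score 2
p_low = 1.0 / (1.0 + exp(A * -2.0 + B))   # calibrated probability for score -2
```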
The method may further comprise performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model, and wherein the feature selection process comprises selecting at least some of the initial set of features and/or selecting at least one further feature generated during the training process.
The method may further comprise data cleansing of the training data sets and/or exploring the training data sets for potential bias.
In a ninth aspect, which may be provided independently, there is provided an apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and use the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
Features in one aspect may be provided as features in any other aspect. For example, any one of apparatus, method or computer program product features may be provided as any one other of apparatus, method or computer program product features.
Brief description of the drawings
Embodiments are now described, by way of non-limiting example, and are illustrated in the following figures, in which:
Figure 1 is a schematic illustration of a computer system in accordance with an embodiment;
Figure 2 is a flow chart illustrating in overview a training method in accordance with an embodiment;
Figure 3 is a schematic illustration of a system in accordance with an embodiment;
Figure 4 is a flow chart illustrating in overview a model deployment method in accordance with an embodiment;
Figure 5 is a flow chart illustrating in overview a model deployment method in accordance with a further embodiment; and
Figure 6 is a flow chart illustrating in overview a model deployment method in accordance with an embodiment.
Detailed description
Figure 1 is a schematic diagram illustrating in brief overview a computer system 10 in accordance with an embodiment. The computer system is configured to train a model to predict COPD outcomes, and to use the trained model to predict COPD outcomes. In other embodiments, a first system may be configured to train the model and one or more second, different systems may be configured to use the trained model to predict COPD outcomes.
The computer system comprises a computing apparatus 12, for example a server or workstation, which is connected to one or more display screens 14 and one or more input devices 16 (for example, a keyboard and mouse). The computing apparatus 12 comprises a processor 18 and memory 20.
The processor 18 includes data processing circuitry 22 for processing data, including obtaining features from the data; training circuitry 24 for training one or more models; prediction circuitry 26 for using the one or more models to obtain one or more predictions; and display circuitry 28 for displaying the data and/or the prediction(s).
In the present embodiment, the circuitries are each implemented in the processor 18 by means of a computer program having computer-readable instructions that are executable to perform the method of the embodiment. In other embodiments, the circuitries may be implemented as one or more ASICs (application specific integrated circuits) or FPGAs (field programmable gate arrays).
The computing apparatus 12 also includes a hard drive and other components including RAM, ROM, a data bus, an operating system including various device drivers, and hardware devices including a graphics card. Such components are not shown in Figure 1 for clarity.
In further embodiments, the computer system 10 may comprise a plurality of computing apparatuses. Functionality of the circuitries may be divided between multiple computing apparatuses and/or multiple processors. Functionality of the circuitries may be divided across a network or cloud-based system. For example, each of the data processing circuitry 22, training circuitry 24 and prediction circuitry 26 may be implemented as a respective one or more cloud-based resources. The cloud-based resources comprise cloud computing resources implemented remotely, for example at one or more data centres, and accessed via a network.
Figure 2 is a flow chart illustrating in overview a method of training a plurality of models to predict a plurality of outcomes using the computer system 10. In other embodiments, functionality of at least part of the computer system 10 may be provided by one or more cloud-based resources.
In the embodiment of Figure 2, the outcomes for which the models are trained are COPD outcomes. The outcomes comprise a measure of a risk of mortality due to COPD occurring within the next 12 months (12-month COPD mortality), a measure of a risk of hospital readmission occurring within the next 3 months (3-month hospital readmission) and a risk of a COPD exacerbation occurring within the next 72 hours (72-hour COPD exacerbation). In other embodiments, the outcomes may relate to different time periods. For example, 24-month COPD mortality may be predicted rather than 12-month COPD mortality. The risk of COPD exacerbation may be determined for any suitable short-term time period of hours or days, for example exacerbation within 24 hours, 48 hours or one week. In further embodiments, different outcomes may be predicted. In some embodiments, the outcomes that are predicted may not be COPD outcomes.
At stage 30, the data processing circuitry 22 receives a plurality of training data sets. The plurality of training data sets form part of a data cohort. In one embodiment, the data cohort comprises over 50,000 patients with over 900,000 admissions including patient demographics, diagnoses, admission history, length of stay, prescribing and labs data.
The data cohort further comprises 15 months of data from around 550 patients which further includes patient reported outcome (PRO) submissions, daily steps, daily resting heart rate, sleep and home NIV data and a verified set of clinical events for each patient.
The data cohort is divided into training, test and validation data. In the embodiment of Figure 2, the test data forms 15% of the entire data cohort and is randomly selected and statistically tested to ensure it is representative of the full cohort. The remaining 85% of the data cohort is split 80/20 into training data and validation data for model development. No patient appears in more than one of the training data, validation data and test data. The split between training, validation and test data is intended to ensure that models are validated on independent data. Class imbalance and key-feature balances are maintained across the training data, validation data and test data to avoid data leakage and mitigate against biases, with the aim of ensuring generalisability to new patients.
The data is divided with the aim that the training data, validation data and test data are each representative of the full population in terms of baseline demographic information and in terms of disease severity. This division is achieved using bespoke functions and statistical methods such as Kolmogorov-Smirnov tests.
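By way of non-limiting illustration, the representativeness of a split can be checked per feature with the two-sample Kolmogorov-Smirnov statistic (a standard-library sketch; in practice a library routine such as scipy's ks_2samp, which also supplies a p-value, would typically be used):

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    gap between the empirical CDFs of the two samples. A near-zero
    value suggests the two splits are similarly distributed for the
    feature being checked (for example, patient age)."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, t):
        # Proportion of sample xs less than or equal to t.
        return bisect_right(xs, t) / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in sorted(set(a) | set(b)))

# Identical distributions give 0; fully disjoint ones give 1.
d_same = ks_statistic([55, 61, 67, 72], [55, 61, 67, 72])
d_far = ks_statistic([55, 61], [80, 85])
```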
In other embodiments, any suitable data cohort may be used. The data cohort may be divided into training data, validation data and test data in any suitable manner and in any suitable proportions.
The training data received by the data processing circuitry 22 comprises a plurality of training data sets. Each training data set relates to a respective patient.
Each training data set comprises historical clinical information relating to a clinical history of the patient. For at least some of the training data sets, the training data set also comprises patient reported outcome (PRO) data submitted by the patient, and sensor data that comprises or is representative of physiological measurements for the patient.
The clinical history comprises historical information about the patient’s health, and may include specific information about the patient’s history with COPD. The clinical history
may comprise information about the whole of a patient’s life, or information for a selected period.
In the embodiment of Figure 2, at least part of the clinical history is obtained or derived from historical Electronic Health Record (EHR) data for the patient. The EHR data includes patient demographics, prescribing data, laboratory test data, and hospital admissions data. In other embodiments, the clinical history may include data on any one or more of hospital admissions, episodes of severe illness, treatments, laboratory data, prescribing data and diagnosis history, which may be obtained from an EHR or from any suitable records.
The clinical history may further comprise data that is input by a clinician or other user. For example, the data may be input by a clinician as part of a patient onboarding procedure.
The data that is input may comprise, for example, data on hospital admissions and treatment. The data may comprise data relating to the presence of one or more comorbidities of a set list of comorbidities, a number of exacerbations in the previous 12 months, a number of hospital admissions in the previous 12 months, or any other suitable parameters. In the present embodiment, the clinical history comprises data on exacerbation events. The data on exacerbation events comprises data on exacerbation events that were managed in the home and on exacerbation events that required hospital admissions. The data on exacerbation events is recorded as part of a COPD service and is manually verified by at least one clinician.
The PRO data comprises data submitted by the patient comprising a self-assessment of one or more aspects of the patient's condition, for example the patient's perceived wellbeing.
In the embodiment of Figure 2, the PRO data is obtained from a patient application (app). The patient accesses the patient app via a smartphone. The PRO data is submitted by the patient through the patient app. For example, the PRO data may be submitted by the patient in response to a patient questionnaire or survey. The app may present a questionnaire to the patient, which may comprise for example questions on the patient’s perceived wellbeing.
The questionnaire may comprise questions from standard question sets, for example questions from one or more of a MRC Dyspnoea Scale (Medical Research Council) question set, a CAT (COPD Assessment Test) question set or an EQ-5D question set.
The MRC Dyspnoea Scale (https://www.ukri.org/councils/mrc/facilities-and- resources/find-an-mrc-facility-or-resource/mrc-dyspnoea-scale/) is used to assess the degree of baseline functional disability due to dyspnoea. The MRC Dyspnoea Scale includes, amongst other questions, questions about cough, phlegm, shortness of breath, wheezing, night waking, chest illnesses and smoking.
The CAT (https://www.catestonline.org/) quantifies impact of COPD symptoms on patient overall health. The CAT includes questions about cough, phlegm, chest tightness, breathlessness, activity, confidence in leaving the home, sleep quality and energy.
EQ-5D (https://euroqol.org/) is an instrument that assesses mobility, self-care, usual activities, pain/discomfort and anxiety/depression.
In other embodiments, at least some of the questions presented to the patient may be questions that do not form part of any of the standard question sets. For example, questions may be set by a clinician.
The PRO data may comprise data relating to any of the measures of wellbeing used in any of the above-mentioned question sets or any other suitable measures of wellbeing. The measures of wellbeing may be related to symptoms such as cough or breathlessness and/or to more general aspects of wellbeing such as energy or anxiety.
The app may then record the patient’s responses to questions of the questionnaire, which the patient inputs into the app via the smartphone. The patient may input PRO data as part of a symptom diary comprising symptoms self-reported by the patient. In other embodiments, any suitable device may be used to receive the PRO data, for example a laptop, tablet, desktop or other computing device. The PRO data may be gathered using any suitable method, which in some embodiments may not comprise an app.
The PRO data is gathered over a study period which may be, for example, a period of weeks, months or years. The PRO data is gathered regularly, for example at least daily or at least weekly.
The sensor data comprises data obtained using a plurality of sensors which are implemented in a plurality of devices. One or more of the devices may be a wearable device worn by the patient. The sensor data may be obtained outside a traditional acute medical setting, for example in the patient’s home or wherever the patient goes. In the embodiment of Figure 2, the devices comprise a wearable device and a NIV (non- invasive ventilation) device. The wearable device may also be referred to as a wearable fitness tracker. The wearable device may comprise, for example, a Fitbit, Garmin or Apple Watch. The NIV device may comprise, for example a ResMed NIV device. In other embodiments, any suitable one or more sensors in one or more devices may be used to obtain the sensor data.
Sensor data is gathered over a study period which may be, for example, a period of weeks, months or years. The sensor data is gathered at regular intervals, for example at least hourly or at least daily. The sensor data may provide information about a patient’s condition and about changes in condition that occur over a period of hours or days.
In the embodiment of Figure 2, the sensor data comprises activity data which is obtained using the wearable device. The activity data may comprise data obtained by measuring steps taken by the patient. The activity data may comprise a step count, for example a daily step count. In other embodiments, activity data may be obtained in any suitable manner and by any suitable device or devices.
The sensor data further comprises heart rate data measured by the wearable device, where the heart rate data comprises resting heart rate data. In other embodiments, any suitable heart rate data may be obtained. The heart rate data may be obtained in any suitable manner and by any suitable device or devices.
The sensor data further comprises sleep data measured using the wearable device. The wearable device is configured to measure when the patient is awake and when the patient is asleep. The sleep data is representative of periods of awake and sleep activity. Granular activity and sleep data may be used to map a typical pattern for each user,
allowing a change from baseline to be identified. Sleep data may also include identification of periods of sleep and restlessness throughout a sleep cycle. In other embodiments, sleep data may be obtained in any suitable manner and by any suitable device or devices.
The sensor data further comprises respiratory data. Respiratory data may comprise, for example, respiratory rate and/or ventilation parameters. In the embodiment of Figure 2, respiratory data is measured using a home NIV (non-invasive ventilation) device, for example a ResMed NIV device. A home NIV device may measure respiratory events at a relatively high frequency. For example, the home NIV device may obtain data as frequently as every second. In other examples, the home NIV device may measure one or more respiratory parameters at least every hour, at least every 10 minutes, at least every minute, at least every 30 seconds, or at least every 10 seconds. The home NIV device may provide inferred information on tidal volume, minute ventilation and respiratory rate.
Alternatively or additionally, respiratory data may be measured using a respiratory rate sensor, for example a PneumoWave respiratory rate sensor. The respiratory rate sensor may be a chest-worn patient wearable device. In other embodiments, respiratory data may be obtained in any suitable manner and by any suitable device or devices.
In other embodiments, any suitable sensor data may be obtained from any suitable sensors. For example, the sensor data may comprise any one or more of activity data, respiratory data, heart rate data, sleep data, energy expenditure data or oxygen saturation data. The sensor data may be acquired at any suitable intervals, for example for every minute, every 10 minutes, every half hour, or every hour.
At stage 32, the data processing circuitry 22 cleanses the training data. The cleansing of the training data comprises checking the training data for erroneous data. The cleansing of the training data comprises checking that a data type of the training data is consistent, for example checking that a data type of a training data set is consistent with other data within that training data set or a further training data set, or checking that a data type of a training data set is consistent with an expected data type. The cleansing of the training data further comprises checking that data values of the training data fall into one or more
expected ranges. In other embodiments, any suitable data cleansing process may be performed.
At stage 32, the data processing circuitry 22 may also explore the training data for potential biases. Potential biases may comprise biases that occur in readings obtained from one or more individual sensors. For example, an individual sensor may consistently output values that are over or under a true value.
Potential biases may additionally or alternatively comprise biases in the training data as a whole. The training data may comprise societal, demographical and/or systematic biases. For example, different disease presentations in men versus women or across different ethnicities may result in training data that is biased by gender or ethnicity. Different access to care for different members of society may result in bias in the training data. Varying care practices for different patient groups may result in bias in the training data. For example, someone who has diabetes may be more likely to have a BMI (body mass index) measurement recorded than someone who does not have diabetes.
In other embodiments, data cleansing and/or exploration for biases may be performed before the training data is provided to the data processing circuitry 22. Stage 32 may be omitted.
At stage 34, the data processing circuitry 22 processes the training data to obtain, for each of the training data sets, values for a plurality of features. The plurality of features for which values are obtained at stage 34 are features that have been pre-selected for use in initial model training. Certain features are described below with reference to Figure 2. In other embodiments, a method described with reference to Figure 2 may be applied to any features described herein.
In the embodiment of Figure 2, feature engineering is performed to enrich the raw training data. Raw data may refer to data as it is received, for example received from a sensor. Raw data may refer to data after cleansing and/or pre-processing, but before aggregation or transformation is performed.
Some features are obtained by aggregation or transformation of data. Some clinician- driven domain specific feature engineering is performed to obtain domain specific
features. The domain specific features are measures that clinicians and/or literature associate with deterioration in patient condition.
In the embodiment of Figure 2, the domain specific features include Neutrophil to Lymphocyte ratio; Maximum Eosinophils count in the previous year; number of exacerbations in the previous 12 months; and number of hospital exacerbations in the previous 12 months, where a hospital exacerbation is an exacerbation for which the patient was admitted to hospital.
At least one feature relates to hospital admissions. One or more features may distinguish COPD/respiratory admissions versus other admissions. One or more features may distinguish winter admissions versus non-winter admissions. Features may count the number of admissions in each category (for example, COPD/respiratory, other, winter, non-winter). Features may include lengths of stay for hospital admissions.
Raw heart-rate data are enriched using simple descriptive statistics and aggregations, together with more advanced clinician-driven feature calculation. An example of a clinician-driven feature is elevated heart rate for x-minutes during periods of low activity. A value for number of minutes, x, may be set by the clinician or based on values in the clinical literature. One or more features may calculate time spent above or below a predetermined heart rate threshold value. The features are calculated from sensor data, for example from data from a Fitbit.
A feature to measure the restlessness of the patient’s sleep is calculated from sensor data, for example from data from a Fitbit.
In the embodiment of Figure 2, data from a wearable device, for example a Fitbit, is analysed at 1 minute granularity to mine insights from the data. It has been found that daily aggregates may mask key events during a day, which may lead to lower predictive ability. For example, a subject may have had a 10-minute interval during the day where their heart rate was greater than 200 bpm. This effect would not be seen by purely considering averages. Temporal features may be calculated from the raw sensor data, for example raw Fitbit data.
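By way of non-limiting illustration, the elevated-heart-rate-during-low-activity feature described above might be computed from per-minute series as follows (the function name, thresholds and low-activity definition are assumptions for illustration; in the embodiment, the duration x and the heart rate threshold would be set by the clinician or taken from the clinical literature):

```python
def minutes_elevated_at_rest(heart_rate, steps, hr_threshold=100, max_steps=5):
    """Count minutes in a day where heart rate exceeds hr_threshold
    while activity is low (at most max_steps steps in that minute).

    heart_rate and steps are parallel per-minute series for one day.
    """
    return sum(
        1
        for hr, st in zip(heart_rate, steps)
        if hr > hr_threshold and st <= max_steps
    )

# Minute 2 counts (HR 110 at 2 steps); minute 3 does not (HR 120 but
# 50 steps indicates exercise rather than an at-rest elevation).
elevated = minutes_elevated_at_rest([80, 110, 120, 95], [0, 2, 50, 0])
```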
Features may be designed to mitigate effects of data drift from sensors. Every wearable device, for example a Fitbit, Garmin or Apple Watch, has sensors accurate to a certain tolerance. Individual sensors have variability which may drift over time (unless a calibration means exists). As a result, it may be the case that no two sensors measure the exact same value. For example, one wearable device may overcount steps by 2% and a second may undercount by 2%. Firmware updates may be an additional source of data drift whereby a sensor may read a slightly different value before and after a firmware update.
Data drift may be mitigated through feature engineering approaches used to extract meaning from sensor data. Approaches may be used that consider data relative to the individual user. For example, rather than considering absolute activity, features such as differences between today’s value and the values for a predetermined number of previous days may be used. Smoothing techniques may be used. Features may be obtained that are substantially sensor-invariant, meaning that an effect of a biased sensor on model performance may be insignificant.
One or more features may be calculated using var_r3, which is the ratio of a current day’s value to an average of values for the previous three days. By using this ratio, data is normalised relative to the individual patient and to the previous three days to detect changes specific to that patient.
One or more features may be calculated using var_ratio, which is a ratio of a current day’s value to a value for the previous day. By using this ratio, data is normalised relative to data for the individual person the previous day, for example to the individual patient’s behaviour the previous day.
In other embodiments, a ratio of a current day’s value to values for any suitable number of preceding days may be used. A ratio of a value for any current time period to any preceding time period may be used, for example comparing a value for a current hour to a value for one or more preceding hours.
One or more features may be calculated using var_diff, which is a difference between a value for a current day and a value for a previous day or other previous time period.
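By way of non-limiting illustration, the var_r3, var_ratio and var_diff features might be computed for the latest day of a daily series as follows (the null-handling and exact windowing are assumptions for illustration):

```python
def rolling_features(values):
    """Relative per-day features for the latest day of a daily series:
    var_r3 (ratio of today's value to the mean of the previous three
    days), var_ratio (ratio to yesterday's value) and var_diff
    (difference from yesterday's value). A feature is None when there
    is not enough history to compute it."""
    today = values[-1]
    feats = {"var_r3": None, "var_ratio": None, "var_diff": None}
    if len(values) >= 4:
        mean3 = sum(values[-4:-1]) / 3
        if mean3:
            feats["var_r3"] = today / mean3
    if len(values) >= 2:
        yesterday = values[-2]
        feats["var_diff"] = today - yesterday
        if yesterday:
            feats["var_ratio"] = today / yesterday
    return feats

# Daily step counts: today's 3000 steps against a 5000-step baseline.
feats = rolling_features([4000, 5000, 6000, 3000])
```

Because each feature is expressed relative to the same individual's recent history, a consistently biased sensor tends to cancel out of the ratio, which is what makes these features substantially sensor-invariant.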
In some situations, some data may be missing. Data loss may occur due to technical or connectivity issues, or from a drop in user engagement. Feature engineering may include variables such as a count to capture the completeness of the data and a days-since-previous-exacerbation variable, and may null certain variables if there are not enough data to calculate the feature.
A feature may measure change in patient engagement based on engagement by a patient with a process of obtaining PRO data. For example, the feature may be based on whether or how regularly a PRO questionnaire is completed. The feature, or a further feature, may be based on how often the patient sends messages. In some circumstances, an increase in messaging may indicate that there has been a change in a patient's condition. In some circumstances, a decrease in messaging may indicate that there has been a change in a patient's condition.
The features include a last labs test value and a complementary feature that captures the number of days since the last labs test was taken. In some embodiments, the days since labs feature is only allowed to interact with its corresponding labs feature, the patient's age, and sex. One or more features may aggregate labs test results. For example, one or more features may aggregate a previous one year's labs test results as, for example, minimum, maximum, median, and 10th and 90th percentile values.
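The labs aggregation features may be sketched as follows, using a simple linear-interpolation percentile; the function and feature names are illustrative, not from the specification:

```python
import statistics

def labs_aggregates(lab_values):
    """Aggregate one patient's labs test results over a period (e.g. the
    previous year) into summary features: min, max, median and the 10th
    and 90th percentiles."""
    s = sorted(lab_values)

    def percentile(p):
        # Linear interpolation between the two closest ranks
        k = (len(s) - 1) * p / 100.0
        lo, hi = int(k), min(int(k) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (k - lo)

    return {
        "labs_min": s[0],
        "labs_max": s[-1],
        "labs_median": statistics.median(s),
        "labs_p10": percentile(10),
        "labs_p90": percentile(90),
    }
```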
Further features calculating presence or absence of a comorbidity (for example, for 30 different comorbidities) are derived from the clinical history. Corresponding features to calculate the number of years that the patient has had each comorbidity are derived from the clinical history.
Features may additionally be obtained using K-fold group target encoding. In one example, training data is split into 10 folds, which are 10 groups of unique patients. Each fold is representative of the full population. For a variable of interest, for example British National Formulary (BNF) prescription code, rare values are filtered out. For example, rare prescription codes seen for fewer than 100 unique patients may be filtered out. Each fold may then be split into all possible categories of BNF code (which may for example be more than 1000 codes). Across each BNF group within each fold, an average of the target variable (0 or 1) for each group is calculated. The value for each group is averaged across the 10 folds. The result is then a numerical representation for each of the 1000+
BNF codes which represents the relationship between that BNF code and the target variable of interest. This technique may also be used for other categorical features, for example diagnosis, ethnicity and post code sector. The target encoded features may be further enriched by applying aggregations over the numerical representations of the categorical feature (for example, maximum encoded BNF value in the previous year).
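The K-fold group target encoding described above may be sketched as follows. This is a minimal illustration: the fold-assignment rule, the data layout and all names are simplifying assumptions, not details from the specification.

```python
from collections import defaultdict

def kfold_target_encode(records, n_folds=10, min_patients=100):
    """Sketch of K-fold group target encoding. `records` is a list of
    (patient_id, bnf_code, target) tuples with target in {0, 1}.
    Patients are grouped so that no patient spans two folds."""
    # Assign each unique patient to exactly one fold
    patients = sorted({pid for pid, _, _ in records})
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}

    # Filter out rare codes seen for fewer than `min_patients` patients
    patients_per_code = defaultdict(set)
    for pid, code, _ in records:
        patients_per_code[code].add(pid)
    kept = {c for c, pts in patients_per_code.items() if len(pts) >= min_patients}

    # Average the target per (fold, code), then average across folds
    sums = defaultdict(lambda: [0, 0])  # (fold, code) -> [target sum, count]
    for pid, code, target in records:
        if code in kept:
            key = (fold_of[pid], code)
            sums[key][0] += target
            sums[key][1] += 1
    per_code = defaultdict(list)
    for (fold, code), (total, count) in sums.items():
        per_code[code].append(total / count)
    return {code: sum(means) / len(means) for code, means in per_code.items()}

# Tiny worked example: code "A" is kept, code "B" is filtered as rare
records = [("p0", "A", 1), ("p1", "A", 0), ("p2", "A", 1), ("p3", "A", 0), ("p0", "B", 1)]
encoded = kfold_target_encode(records, n_folds=2, min_patients=2)
```

The resulting value for each code is the numerical representation of its relationship to the target variable, which may then be aggregated further (for example, maximum encoded BNF value in the previous year).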
In other embodiments, any suitable features may be obtained from the training data at stage 34. The features may include descriptive statistics. The features may comprise aggregations. The features may comprise, for example, average, minimum or maximum values for one or more parameters, for example an average, minimum or maximum heart rate. The features may comprise or be representative of a change or rate of change of one or more parameters.
In summary, the features may include features derived from any or all of the clinical history, PRO data and sensor data over any appropriate time scale.
The set of initial features for which values are obtained at stage 34 is pre-determined. As described below, model training is iterative and may involve creation of new features based on the initial findings at previous model training iterations. A feature list resulting from the training described below may therefore be different from the initial features for which values are determined at stage 34.
At stage 36, the training circuitry 24 trains a first machine learning model, model A, to predict 12-month COPD mortality. The training of model A uses values for at least some of the plurality of features that are output from stage 34 and ground truth death data. The ground truth death data is obtained from the National Records of Scotland. In other embodiments, any suitable source of ground truth death data may be used.
The training of model A may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 12-month COPD mortality may be selected for use in initial model training of model A.
The training of model A uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example
XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
The training is iterative and involves the creation of new features based on findings at previous model training iterations. A large feature list (for example, 25 to 250 features) may be generated. The feature list is then reduced using feature selection techniques to optimize model performance.
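The specification does not detail the feature selection technique used to reduce the feature list. One plausible sketch is backward elimination driven by feature importances, where the importance function stands in for retraining the gradient-boosted model and reading its importances; the function names, threshold and stopping rule are all assumptions:

```python
def reduce_features(features, importance_fn, min_features=10, threshold=0.01):
    """Iteratively drop the least important feature until every remaining
    feature's importance reaches `threshold` or `min_features` is hit.
    `importance_fn(features)` is a stand-in for retraining the model on
    the current feature set and returning {feature: importance}."""
    current = list(features)
    while len(current) > min_features:
        importances = importance_fn(current)
        worst = min(current, key=lambda f: importances[f])
        if importances[worst] >= threshold:
            break
        current.remove(worst)
    return current

# Hypothetical fixed importances, standing in for a trained model's output
_importances = {"a": 0.5, "b": 0.3, "c": 0.005, "d": 0.001}
selected = reduce_features(["a", "b", "c", "d"],
                           lambda feats: {f: _importances[f] for f in feats},
                           min_features=2)
```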
At stage 38, the training circuitry 24 outputs a trained first model A, which is trained to predict 12-month COPD mortality from patient data and to output a risk score that is representative of a risk of 12-month COPD mortality. For example, the risk score may be a value on a scale from 0 to 1. The trained first model A predicts 12-month COPD mortality from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and may include further features generated during training. The training circuitry 24 may also output a set of features that are used as input by the trained first model A.
At stage 40, the training circuitry 24 trains a second machine learning model, model B, to predict 3-month hospital readmission. The training of model B uses values for at least some of the plurality of features that are output from stage 34 and ground truth data. The ground truth data used to train the second machine learning model comprises ground truth hospital readmission data which is derived from EHR data. In other embodiments, any suitable source of ground truth hospital readmission data may be used.
The training of model B may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 3-month hospital readmission may be selected for use in initial model training of model B. The selected subset of features used in training model B may be different from the selected subset of features used in training model A.
The training of model B uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other
embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
The training is iterative and involves the creation of new features based on findings at previous model training iterations. A large feature list (for example, 25 to 250 features) may be generated. The feature list is then reduced using feature selection techniques to optimize model performance. The resulting features for model B may be different than the features resulting from the training of model A.
At stage 42, the training circuitry 24 outputs a trained second model B, which is trained to predict 3-month hospital readmission from patient data and to output a risk score that is representative of a risk of 3-month hospital readmission. For example, the risk score may be a value on a scale from 0 to 1. The trained second model B predicts 3-month hospital readmission from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and may include further features generated during training. The training circuitry 24 may also output a set of features that are used as input by the trained second model B.
At stage 44, the training circuitry 24 trains a third machine learning model, model C, to predict 72-hour COPD exacerbation based on the values for the plurality of features that are output from stage 34 and ground truth data. In other embodiments, the COPD exacerbation may be any suitable near-term COPD exacerbation, which may not be 72-hour. For example, the COPD exacerbation may be an exacerbation over a specified time period which is between 48 hours and 120 hours.
The ground truth data used to train the third machine learning model comprises ground truth exacerbation data which in the embodiment of Figure 2 is obtained from a combination of clinician verified events, patient reported events, and information derived from EHR data. In other embodiments, any suitable source of ground truth exacerbation data may be used.
The training of model C may initially use all of the features for which values were obtained at stage 34, or a selected subset of the features. For example, features which are considered by a clinician or other expert to be relevant to 72-hour COPD exacerbation may be selected for use in initial model training of model C.
The training of model C uses a machine learning classification algorithm, which in the embodiment of Figure 2 is a tree-based gradient boosting algorithm, for example XGBoost. Boosting and bagging (bootstrap aggregation) are used. In other embodiments, any suitable machine learning algorithm may be used, for example any suitable gradient boosting algorithm.
The training is iterative and involves the creation of new features based on findings at previous model training iterations. A large feature list (for example, 25 to 250 features) may be generated. The feature list is then reduced using feature selection techniques to optimize model performance.
In some embodiments, model C has multiple forms. One form uses XGBoost as in model A and model B. Another form is based on time-series approaches.
In some embodiments, alternative models may be trained that use different data types as inputs. For example, a first version of model C may be trained to predict 72-hour COPD exacerbation using clinical history, activity data, heart rate data, sleep data and respiratory data, and a second version of model C may predict 72-hour COPD exacerbation using clinical history, activity data, heart rate data and sleep data without respiratory data.
Alternative models may be trained for use where data is not available. For example, one version of model C may make use of data from a wearable device. An alternative version of model C may be trained to obtain a prediction in the absence of data from the wearable device. The alternative version of model C may be used if data from the wearable device is unavailable or of poor quality, for example if there is a level of missing data that is deemed unacceptable.
At stage 46, the training circuitry 24 outputs at least one trained third model C, which is trained to predict 72-hour COPD exacerbation from patient data and to output a risk score that is representative of a risk of 72-hour COPD exacerbation. For example, the risk score may be a value on a scale from 0 to 1. The or each trained third model C predicts 72-hour COPD exacerbation from values for a plurality of features which may include some or all of the initial features for which values are obtained at stage 34 and
may include further features generated during training. The training circuitry 24 may also output a set of features that are used as input by the or each trained third model C. In other embodiments, the trained third model C may predict any suitable near-term COPD exacerbation, which may not be 72-hour exacerbation. For example, the near-term COPD exacerbation may be 24-hour exacerbation, 48-hour exacerbation, 120-hour exacerbation, or exacerbation over a period of one week.
The first model A, second model B, and third model C are machine learning models. In the embodiment of Figure 2, the first model A, second model B and third model C are trained separately, to perform different tasks. In other embodiments, the first, second and third models may be trained together, or a single model may be trained to perform multiple tasks.
In some embodiments, all of the models A, B and C may be trained using the same training data and/or the same initial plurality of features. In other embodiments, different training data and/or different initial features may be used in the training of different models. The trained models may use different features as input. In the embodiment of Figure 2, there is a significant overlap in the features that are used for training model A and model B, but each of model A and model B has features that are unique to that model. The training of model C uses some features that are also used in model A and/or model B, but may mostly use features that are unique to model C. Each of the models starts off with a larger feature set at the start of model training, which is then reduced using feature elimination techniques during training.
In some embodiments, the training data includes data that has been provided during the Covid-19 pandemic.
In the embodiment of Figure 2, the models A, B and C are trained on features obtained from clinical history, from PRO data and from sensor data including activity data, heart rate data, sleep data and respiratory data. In other embodiments, different features may be used.
In the embodiment of Figure 2, the machine learning models are not deep learning models. The use of deep learning algorithms is avoided on the basis that it may add additional abstraction and/or reduce interpretability. In other embodiments, any suitable machine learning model(s) may be used.
Tree-based approaches comprising boosting and bagging methods are used. The tree-based approaches are selected due to their interpretability.
In the embodiment of Figure 2, the machine learning models are trained using custom loss functions. The custom loss functions are developed with the aim of ensuring that the model produced in training is inherently fair. Custom code plus third-party packages such as fairlearn (https://fairlearn.org/) may be used to identify potential inequalities between different groups within the data. If potential inequalities are identified in a model, the model may be re-trained, tuning the loss functions to ensure parity between groups. The custom loss functions may learn from an initial inequality in predictions.
In some embodiments, the custom loss functions may be modified to allow clinician users, or other users, to tune a model for a specific use case. For example, a model may be optimised to minimise false positives. Models may be calibrated as described below.
In the embodiment of Figure 2, the hyperparameters of each model are optimized using a grid-search approach utilizing cross validation. In other embodiments, any suitable optimisation method may be used.
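A minimal sketch of grid search over hyperparameter combinations follows, with cross-validated scoring abstracted into a callable; the parameter names and scoring function shown are illustrative only:

```python
import itertools

def grid_search(param_grid, cv_score):
    """Try every combination in `param_grid` and keep the one with the
    best score. `cv_score(params)` is a stand-in for training the model
    with those hyperparameters and averaging its score across CV folds."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scorer that prefers max_depth=4 and learning_rate=0.1
best, score = grid_search(
    {"max_depth": [2, 4], "learning_rate": [0.1, 0.3]},
    lambda p: -abs(p["max_depth"] - 4) - abs(p["learning_rate"] - 0.1))
```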
Model dropout analysis may be used to help determine which sensors and features add value to risk prediction scoring and enhanced remote monitoring.
In the embodiment of Figure 2, separate models are trained to predict each of the outcomes. Different models predict 12 month mortality, 3 month readmission and 72 hour (or near-term) exacerbation respectively. In other embodiments, a single model may be trained to predict multiple outcomes. In further embodiments, any suitable number of machine learning models may be trained to predict any suitable number of outcomes.
At stage 48, the trained models are evaluated on the test data, which comprises a plurality of test data sets. The test data is a hold-out dataset which is intended to be representative of the full population (assessed using Kolmogorov-Smirnov tests).
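The two-sample Kolmogorov-Smirnov statistic used to assess representativeness may be sketched as the maximum absolute difference between the two empirical CDFs (p-value computation is omitted; names are illustrative):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic. A small value suggests
    the hold-out set's distribution matches the full population's."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of the sample with values <= x
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)
```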
When reporting on model performance, several appropriate metrics may be used. In the embodiment of Figure 2, precision-recall AUC (area under the precision-recall curve) is reported in addition to a full classification report (for all sub-groups) to ensure full transparency on model performance. It is noted that the use of ROC-AUC (area under the receiver operating characteristic curve) may result in inflated accuracy metrics being reported which do not reflect true performance.
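Precision-recall AUC is commonly summarised by average precision, sketched below. This is equivalent in spirit to scikit-learn's average_precision_score; tie handling is simplified here, and the function name is an assumption:

```python
def average_precision(y_true, y_score):
    """Average precision: the mean of the precision values at the rank
    of each true positive, when examples are ranked by score. Score ties
    are broken by input order in this sketch."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / n_pos
```

Unlike ROC-AUC, this metric degrades sharply when positives are rare and ranked poorly, which is why it is preferred for imbalanced outcomes such as exacerbation events.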
Tools such as fairlearn (https://fairlearn.org/) may be used to assess model performance on sub-demographics and sub-groups of interest to ensure the model doesn't perform badly on certain groups.
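Sub-group assessment in the spirit of fairlearn's MetricFrame may be sketched as a per-group metric calculation; a per-group recall is shown here, with all names illustrative:

```python
from collections import defaultdict

def group_recall(y_true, y_pred, groups):
    """Recall computed separately for each sub-group (e.g. sex or age
    band), so under-performance on any group is visible rather than
    hidden in an overall average."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [true positives, positives]
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            tallies[g][1] += 1
            if p == 1:
                tallies[g][0] += 1
    return {g: tp / pos for g, (tp, pos) in tallies.items()}
```

A large gap between the per-group values would flag a potential inequality and could trigger re-training with tuned loss functions, as described above.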
Model performance may be tested on models trained using varying degrees of data completeness in order to fully capture the change in performance as a result of missing data, and determine what is an acceptable level of missing data. From a machine learning point of view, sparsity-aware algorithms may be used for cases where missing data is deemed to be acceptable. When the level of missing data is deemed unacceptable, the method may revert to a model which does not use the wearable data. For example a first version of model C may be used when wearable data is available and a second, alternative version of model C may be used when the wearable data is missing or inadequate.
The use of wearable data may improve performance and accuracy of prediction models, and in particular may improve performance and accuracy of a 72-hour or near-term COPD exacerbation risk model. The use of wearable data may improve performance relative to a model that only uses PRO data and EHR data.
The use of wearable device data along with PRO data, patient demographics data, and historical EHR data may improve the predictive ability of the trained models when compared with the predictive ability of a model that is trained to use wearable device data alone. PRO data, demographics data and EHR data may give context to the wearable device data. For example, it may be expected that the baseline activity levels for a non-smoker in their 20s should be vastly different to the baseline activity levels for a smoker in their 80s.
In some embodiments, a prediction of COPD exacerbation by machine learning model C is based at least partially on a decline in patient-reported wellbeing in combination with a change in patient activity data (for example, steps data recorded by a steps counter), with data indicative of restless sleep, and with heart rate data over a preceding few days. In other embodiments, any suitable model or algorithm, for example any suitable boosting algorithm, may be used to predict COPD exacerbation from a combination of wellbeing, activity data, restless sleep and heart rate data. In some embodiments, the model or algorithm may not use machine learning.
Models trained according to the method of Figure 2 may be used to predict outcomes for a patient using data obtained for that patient. Figure 3 is a schematic diagram illustrating in brief overview a system used to capture data relating to a patient 50, to process the data relating to the patient and predict outcomes for the patient 50, and to display the data and predicted outcomes to a clinician 66.
At least one sensor device 52 is used to capture sensor data relating to the patient. Wearables, sensors and/or other devices 52 may be used to capture sensor data relating to the patient 50. For example, in one embodiment, each patient using a COPD digital service is supplied with a Fitbit Charge 4 device to support collection and transmission of step count, heart rate and sleep data. The patient may also be supplied with a NIV device and/or respiratory rate sensor.
A communications device 54, for example a smartphone, is used to capture data input by the patient. In the embodiment of Figure 3, a patient application (app) is accessible via the smartphone. The patient app is configured to provide device agnostic data capture and patient reported outcomes. The patient app enables daily prompted recording of patient reported outcomes. For example, the patient may be prompted to record a self-assessment of their wellbeing via a questionnaire or survey.
The patient app further provides a personalised self-management and care plan for the patient. The patient app further provides the patient with the ability to conduct asynchronous messaging to a clinical team comprising one or more clinicians 66 that are involved in the patient’s care.
The communications device 54 may be further configured to communicate with the one or more sensor devices 52 to capture data from the one or more sensor devices 52. Open APIs may be provided to support GDPR-compliant health data exchange from a range of apps and wearables devices. Access may be provided to a range of pre-existing integrations such as, for example, Google Fit, iOS Health Kit and Fitbit, to consume data from one or more wearable devices.
In other embodiments, the one or more sensor devices 52 may be configured to communicate data directly to another computing device or to a cloud-based resource rather than communicating via the communications device 54. For example, a wearable sensor device 52 may communicate with a proprietary cloud platform using 3G, 4G, 5G or any other suitable communications method.
In further embodiments, at least some of the functionality of the communications device 54 may be provided by the sensor device 52, or at least some of the functionality of the sensor device 52 may be provided by the communications device 54. In some embodiments, a single device acts as both a sensor device 52 and a communications device 54.
A primary care system 56 may be used to obtain data relating to the patient. For example, one or more clinicians may input and/or verify details of COPD exacerbations that the patient has experienced. The primary care system provides processes and services provided to patients as part of day-to-day healthcare by a healthcare provider. In other embodiments, any one or more care systems, for example primary and/or secondary health care systems, may be used. In some embodiments, the primary care system is used to access data relating to the patient but is not used to input data relating to the patient.
A clinical portal 58 is used to obtain clinical history data, which may comprise data from an EHR. The clinical portal 58 may comprise, or have access to, a repository containing electronic health records. The clinical portal 58 may be used in a secondary health care setting. The clinical portal 58 comprises an interface through which a clinician or clinical administrative user may access and securely authenticate patient health data. The patient health data may be aggregated from multiple sources including EHR, patient generated inputs and data from consumer or medical devices.
In some embodiments, the primary care system 56 and clinical portal 58 may be part of the same apparatus(es) or system(s). Functionality of the primary care system 56 and/or clinical portal 58 may be provided by the same apparatus(es) or system(s) as provide the clinical dashboard 64 as described below.
In some embodiments, data may be input to the clinical portal 58. For example, data concerning a COPD event may be recorded in the clinical portal 58. At least some of the data may be communicated from the clinical portal 58 to the primary care system 56.
Data from some or all of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58 is received by a cloud-based service 60, which may also be described as a cloud-based platform. The cloud-based service 60 may be implemented using any suitable cloud-based resources.
The cloud-based service 60 provides an infrastructure layer that offers identity, consent, security, data capture, curation, storage and integration with systems of record. In particular, the cloud-based service 60 provides a mechanism for managing user consent. A patient is invited to use the cloud-based service 60 to access services to manage the patient’s COPD. The patient provides authentication using personal information. The patient may also provide identification using an identifier such as the Community Health Index (CHI) number which is used in Scotland to identify individuals in health care. In some embodiments, the CHI number is obtained from the clinical portal 58 or from an EHR. The cloud-based service 60 allows the patient to consent for their data to be stored and used, for example in accordance with a COPD management process. The patient may withdraw consent at any time using the cloud-based service 60.
The cloud-based service 60 provides FHIR (Fast Healthcare Interoperability Resources) APIs. For example, the cloud-based service 60 may interact with proprietary cloud-based systems that are used in conjunction with wearable sensor devices or other devices.
The cloud-based service 60 further provides data storage services to store data relating to patients.
The cloud-based service 60 may provide a secure, API-driven data exchange which may enforce data interoperability and standards. The cloud-based service 60 provides data collection, aggregation and processing via methods that may use application programming interfaces (API) or human data entry forms, storing and processing the data within an application or series of inter-connected applications.
In alternative embodiments, at least part of the functionality of the cloud-based service 60 may be provided by any one or more computer apparatuses, which in some embodiments may not be cloud-based. For example, the functionality of the cloud-based service 60 may be provided by computing apparatus 10 of Figure 1.
An AI insights module 62 is configured to provide an analytics layer which may be used to obtain predictions of outcomes from patient data as described below in relation to Figure 4. The analytics layer exploits structured datasets across patient and clinical systems to create AI-driven actionable insights.
The AI insights module 62 is implemented by one or more cloud-based resources, which may differ from those used to implement the cloud-based service 60. In other embodiments, the AI insights module 62 may be implemented by any one or more computer apparatuses which may not be cloud-based, for example, by computing apparatus 10 of Figure 1.
The AI insights module 62 receives data about a patient that has been gathered by the cloud-based service 60, for example from data that has been received from some or all of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58 and communicated to the cloud-based service 60. The AI insights module 62 applies one or more models to the data to obtain one or more predictions.
A clinical dashboard 64 is provided. The AI insights module 62 outputs the one or more predictions to the clinical dashboard 64, where the predictions may be displayed. The clinical dashboard 64 may receive patient data from the cloud-based service 60, either directly or via the AI insights module 62. In some embodiments, the clinical dashboard 64 may receive data directly from any of the sensor device(s) 52, the communication device 54, the primary care system 56 and the clinical portal 58.
The clinical dashboard may be used by one or more clinicians 66 who are involved in the management of the patient 50. In the embodiment of Figure 3, the clinical dashboard 64 links data from the patient app with data from sensor devices 52 comprising a patient wearable device, for example a Fitbit, and a home ventilation therapy device, for example a ResMed device. The clinical dashboard 64 may provide curated data visualisation to enhance patient management and inform care planning.
Data that is displayed on the dashboard may include any one or more of clinical history data, PRO data and sensor data from any appropriate sensor. Predictions for outcomes may also be displayed on the clinical dashboard 64. The predictions for outcomes may be displayed as a risk score.
The display of the clinical dashboard 64 may be such that a clinician 66 or other user can interrogate the clinical history, PRO data, sensor data or any other relevant data in greater detail. For example, the clinical dashboard 64 may initially display a summary for the patient that comprises one or more predictions and a summary of results for one or more parameters, for example heart rate and activity. The clinician or other user may then interact with the clinical dashboard 64 to obtain more detailed information about the patient, or to obtain information that is not displayed on the initial display.
Data from sensors may be displayed as a trend over time with a certain level of aggregation. For example, the level of aggregation may be predetermined or may be selected by a user.
The clinical dashboard display may offer a holistic view of all the relevant information that a clinician may need to quickly assess a state of the patient’s health.
In the embodiment of Figure 3, data relating to multiple patients may be displayed through the same clinical dashboard 64. The clinical dashboard 64 is configured to prioritise patients in accordance with a respective prediction for each patient. For example, the clinical dashboard 64 may order patients in a priority order in dependence on the predictions, such that patients having predictions that may be a cause for concern (for example, high risk of exacerbation) may be reviewed most quickly.
In the embodiment of Figure 3, the clinical dashboard 64 is further configured to issue an alert in dependence on the prediction for a patient. The clinical dashboard 64 may trigger an alarm if a risk score is high, for example if a risk score exceeds a predetermined threshold. In other embodiments, any suitable alert may be used. For example, a patient record or a prediction for a patient may be highlighted in any suitable manner. An audible or visual alarm may be issued.
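The prioritisation and alerting behaviour of the dashboard may be sketched as follows; the threshold value and field names are illustrative assumptions, not details from the specification:

```python
def prioritise(patients, alert_threshold=0.8):
    """Order patients so the highest risk scores are reviewed first, and
    flag any patient whose score exceeds a predetermined threshold."""
    ordered = sorted(patients, key=lambda p: p["risk_score"], reverse=True)
    for p in ordered:
        p["alert"] = p["risk_score"] > alert_threshold
    return ordered

queue = prioritise([{"id": 1, "risk_score": 0.3}, {"id": 2, "risk_score": 0.9}])
```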
In the embodiment of Figure 3, the clinical dashboard 64 and the patient app are configured to provide asynchronous messaging for communication between the clinician 66 and patient 50. The asynchronous messaging communication functionality may also be described as chat functionality.
In the embodiment of Figure 3, the information available to the clinician 66 through the clinical dashboard 64 is different from the information available to the patient 50 through use of the patient app on communications device 54. The predictions are visible to the clinician 66 or other user of the clinical dashboard 64. The predictions are not visible to the patient 50. If a prediction is of concern, a clinician 66 or other user may be alerted to the prediction and may then notify the patient 50.
By using the system of Figure 3, an integrated system may be provided to capture patient data, make predictions, and provide useful information to clinicians in a transparent manner. Services across the platform can access a range of pre-existing integrations such as Google Fit, iOS Health Kit and Fitbit to consume data from different wearable devices and aggregate these with other clinical patient data. This interoperability may provide the facility to include data from consumer and healthcare medical technology devices.
Figure 4 is a flow chart illustrating in overview a method of obtaining predictions from patient data using models trained in accordance with the method of Figure 2. In the embodiment of Figure 4, various stages of the method are described as being performed using the data processing circuitry 22, prediction circuitry 26 and display circuitry 28 that are shown in Figure 1. In the system described above with reference to Figure 3, some or all of the stages of the method of Figure 4 are performed by the AI insights module 62, which is implemented using one or more cloud-based resources.
At stage 80, the prediction circuitry 26 receives a set of patient data for a patient, which may also be described as inference data. The patient is a patient for whom it is desired to predict COPD outcomes.
The patient data comprises clinical history data, PRO data and sensor data. In the embodiment of Figure 4, the clinical history data comprises EHR data and demographic data. The clinical history data may also comprise details of exacerbations that have been verified by a clinician. The PRO data comprises daily wellbeing data that has been input by the patient to a communication device 54 using a patient app. The wellbeing data may have been input over a period of days, weeks or months. The sensor data comprises activity data, heart rate data and sleep data obtained from a wearable device 52 worn by the patient, for example a Fitbit. The sensor data further comprises respiratory data obtained from a home NIV device 52. The sensor data may have been obtained over a period of days, weeks or months.
In other embodiments, any suitable combination of data relating to the patient may be received. Sensor data may be obtained using any suitable sensor or sensors 52. PRO data may be reported on any suitable communication device 54.
At stage 82, the data processing circuitry 22 cleanses the patient data. The cleansing of the patient data comprises checking the patient data for erroneous data; checking that a data type of the patient data is consistent; and checking that data values of the patient data fall into one or more expected ranges. In other embodiments, any suitable data cleansing process may be performed. The data processing circuitry 22 may also check the patient data for bias, for example bias in readings from one or more sensors.
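As an illustration only, the cleansing checks described above (erroneous data, data type consistency, expected ranges) might be sketched as follows. The field names, types and ranges here are assumptions chosen for the example, not values taken from the embodiments:

```python
# Illustrative sketch of the cleansing checks of stage 82.
# Field names and expected ranges are assumptions, not from the embodiments.
EXPECTED = {
    "heart_rate": (float, 20.0, 250.0),   # beats per minute
    "spo2":       (float, 50.0, 100.0),   # oxygen saturation, %
    "steps":      (int,   0,    100000),  # daily step count
}

def cleanse(record):
    """Return (clean_fields, issues) after type and range checks."""
    clean, issues = {}, []
    for field, (ftype, lo, hi) in EXPECTED.items():
        value = record.get(field)
        if value is None:
            issues.append(f"{field}: missing")
            continue
        if not isinstance(value, ftype):      # data type consistency check
            issues.append(f"{field}: expected {ftype.__name__}")
            continue
        if not (lo <= value <= hi):           # expected-range check
            issues.append(f"{field}: {value} outside [{lo}, {hi}]")
            continue
        clean[field] = value
    return clean, issues
```

A record that passes all checks is retained unchanged, while any flagged fields are excluded and reported.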
At stage 84, the data processing circuitry 22 processes the patient data to obtain values for a plurality of features. The plurality of features comprises features used as input by trained model A, features used as input by trained model B, and features used as input by trained model C. The features used as input to models A, B and C were determined in training as described above in relation to Figure 2.
At stage 86, the prediction circuitry 26 inputs to trained model A the determined values for the features used as input for trained model A. Trained model A uses the determined
values for the features to obtain a prediction of 12-month COPD mortality risk for the patient.
At stage 88, the prediction circuitry 26 outputs the determined prediction of 12-month COPD mortality risk.
At stage 90, the prediction circuitry 26 inputs to trained model B the determined values for the features used as input for trained model B. Trained model B uses the determined values for the features to obtain a prediction of 3-month hospital readmission risk for the patient.
At stage 92, the prediction circuitry 26 outputs the determined prediction of 3-month hospital readmission risk.
At stage 94, the prediction circuitry 26 inputs to trained model C the determined values for the features used as input for trained model C. Trained model C uses the determined values for the features to obtain a prediction of 72-hour COPD exacerbation risk for the patient.
At stage 96, the prediction circuitry 26 outputs the determined prediction of 72-hour COPD exacerbation risk.
In some embodiments, risk stratification is performed based on one or more of the determined predictions. The prediction circuitry 26 may perform a risk stratification in which the patient is assigned to a risk category, for example high, medium or low risk, in accordance with the prediction. For example, the prediction circuitry 26 may determine whether the patient is high risk by comparing the risk score to a predetermined threshold value, and identifying the patient as high risk if the risk score exceeds the predetermined threshold value. In some embodiments, the patient is classified as one of four categories A-D in accordance with the Global Initiative for Chronic Obstructive Lung Disease 2017 (GOLD) classification. Categories A-D are stratified by symptom burden and exacerbation risk.
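The threshold-based stratification described above can be sketched as follows; the two cut-off values are illustrative assumptions, as the embodiments do not specify the predetermined threshold values:

```python
# Sketch of risk stratification by comparing a risk score in [0, 1]
# to predetermined thresholds. Threshold values are illustrative only.
HIGH_THRESHOLD = 0.7
MEDIUM_THRESHOLD = 0.4

def stratify(risk_score):
    """Map a model risk score to a high/medium/low risk category."""
    if risk_score > HIGH_THRESHOLD:
        return "high"
    if risk_score > MEDIUM_THRESHOLD:
        return "medium"
    return "low"
```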
At stage 98, the display circuitry 28 displays values for the predictions to a clinician 66 using the clinical dashboard 64. Each of the predictions may be displayed as a respective
risk score, for example as a value between 0 and 1. The display circuitry 28 may also display a category to which each prediction has been assigned, for example a stratification into high, medium or low categories.
The display circuitry 28 may also display on the clinical dashboard 64 any other information that may be useful to the clinician, for example demographic or clinical history data. The display circuitry 28 may display sensor data and/or PRO data, for example in the form of a summary.
The display circuitry 28 may also display information that explains the predictions. For example, the display circuitry 28 may display values for certain features that were used in the prediction.
For example, the display circuitry 28 may display explainable information about the overall model performance and the global model feature importance. The display circuitry 28 may display an expected model performance on one or more demographics of interest. The display circuitry 28 may display a probability for the individual prediction to help a clinician gauge a level of certainty of the prediction made by the model.
The display circuitry 28 may display a local feature importance for individual predictions together with a true feature value. Features may be ranked in importance and the most important features displayed to the clinician. For example, if number of exacerbations was the most important feature in the prediction of an outcome for an individual, the display circuitry 28 may display the information that number of exacerbations is the most important feature, and may also present the number of exacerbations to the clinician.
Clinicians may be presented with importance figures (or scores) which indicate how important each feature is to the model in general (global feature importance) and/or to a specific prediction (local feature importance).
At stage 100, the display circuitry 28 determines whether any predicted risk is high. If any of the predicted risks is determined to be high, the display circuitry 28 issues an alert, for example a visual or audible alert.
At stage 102, asynchronous communication takes place between the patient and the clinician. For example, the clinician on receiving the alert of stage 100 may message the patient through the clinical dashboard 64 using an asynchronous messaging facility. The patient may reply asynchronously to the clinician.
In other embodiments, the stages of Figure 4 may be performed in any suitable order. Certain stages may be omitted. Certain stages may occur repeatedly. For example, one or more of the predictions may be recalculated regularly, for example daily.
Patients that have been stratified as high risk by models may be reviewed regularly, for example bi-weekly. Model insights may enable optimised care delivery by clinical teams by highlighting patients who would benefit from anticipatory care planning or a targeted intervention, including equipping them with wearables, e.g. a respiratory sensor, to reduce the risk of an unscheduled hospital admission.
Risk scores provided via the clinical dashboard may be used to identify patients who have significant chronic risk and recommend preventative care regimes to support anticipatory care planning. Risk scores provided via the clinical dashboard may be used to identify patients who have significant risk of hospital admission and recommend immediate changes to a care regime to avoid the consequential health risk to the patient and the cost of hospital admission.
An AI system is provided that is capable of remotely risk-stratifying patients with chronic illnesses, specifically patients with Chronic Obstructive Pulmonary Disease (COPD). The technology predicts the risk of short term exacerbation (extreme deterioration in condition) using an AI trained on data generated as part of the service together with historical patient Electronic Health Record (EHR) data. Limitations of other comparable AI systems may be addressed in that the system is designed to be inherently fair and ethical, which is relevant to the safe and trusted adoption of AI prediction technology.
These data generated as part of the service include patient reported outcomes data (PROs) submitted electronically, activity, heart-rate and sleep data, together with home NIV (non-invasive ventilation) respiratory data. Machine learning models combine structured data sets with longitudinal patient record data. The technology produces a risk
score for each patient on the service together with local explainability allowing the clinician to view the bio-plausibility of the features driving the model decision.
The methods described above with relation to Figures 1 to 4 may provide a reliable tool used to remotely predict short term exacerbation in COPD patients. The tool may reduce the burden of clinicians. The tool may act as a 24/7 monitor for the patient. Reductions in exacerbation and hospitalisations are the outcomes rated as most important by clinicians and people with COPD.
Machine learning models may be used to stratify risk and predict COPD exacerbation and mortality. Methods as described above may address the transformation of COPD service provision from a reactive approach to one focused on prevention, anticipation and supported self-management.
Accurate, explainable and clinically actionable predictive machine learning models for key outcomes may be provided. Personalised risk scores may be provided within a clinical dashboard which may help predict deterioration and mortality.
In the above described embodiments, use of machine learning models was described. Figures 5 and 6 relate to further embodiments, in which an unsupervised machine learning procedure is used as part of a feature reduction process to obtain a reduced set of features on which to train a further machine learning model. In some embodiments, the unsupervised machine learning procedure identifies the labels to be used for training the further machine learning model.
Figure 5 illustrates the training of a first formulation of a further machine learning model, which may be referred to as Model E. Model E comprises an unsupervised machine learning model. Model E is trained for the purpose of grouping COPD patients into clusters. By grouping COPD patients into clusters, a risk stratification may be performed. For example, patients may be assigned to risk categories such as high, medium or low risk, or assigned to categories A-D in accordance with the Global Initiative for Chronic Obstructive Lung Disease 2017 (GOLD) classification. The model may be used to, for example, identify under-medicated patients or patients in need of a clinical review.
Use of an unsupervised model may offer advantages. For example, one advantage is that accurate ground truth labels are not required. A further advantage is that the unsupervised method can discover relationships and/or groupings, or "disease phenotypes", that were previously unknown.
At stage 130, the data processing circuitry 22 receives training data. In one embodiment, the training data comprises EHR data from 1985 to 2021 for COPD patients in a Greater Glasgow & Clyde (GG&C) cohort. The raw EHR data includes admission history, lab tests, prescriptions, comorbidities, and demographics.
At stage 132, the training data may be cleansed, for example as described above in relation to stage 32 of Figure 2.
At stage 134, the data processing circuitry 22 pre-processes the EHR data into a format suitable for machine learning, resulting in 52 features with one row of data per patient per 12-month period.
The features comprise bespoke features derived from prescribing datasets. At least one feature is indicative of how many mono, dual and triple therapies a patient is taking. In some circumstances, a patient may take more than one prescription for a given condition, for example for breathlessness. A mono therapy is where the patient takes one type of inhaler for the condition. A dual therapy is where the patient takes two types of inhaler for the condition. A triple therapy is where the patient takes three types of inhaler for the condition. In some circumstances, the number of mono, dual and triple therapies may be predictive of risk. In other embodiments, features based on the number of mono, dual and/or triple therapies may be used in any models described above.
At least one feature captures how much reliever inhaler medication the patient has needed, for example over a predetermined timescale. At least one feature captures how much rescue medication the patient has needed, for example over a predetermined timescale.
At least one feature captures how many anxiety and depression related episodes and prescriptions the patient has had.
In other embodiments, any of the features described above in relation to stage 134 may be used in training Model A, Model B and/or Model C as described above in relation to Figure 2, or in training any other COPD prediction model.
Once pre-processing is complete, the training circuitry 24 proceeds to stage 136 of Figure 5. At stage 136, the data is scaled and passed through Principal Components Analysis (PCA) to perform feature importance analysis and selection, reducing the features from 52 to 14. From these 14 features, six principal components (projections into a lower-dimensional space) were created for model training using PCA. In other embodiments, any suitable feature reduction may be used. Any suitable number of principal components may be identified.
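The scaling and projection of stage 136 might be sketched as follows. The synthetic data and the SVD-based PCA implementation are illustrative assumptions (the embodiments do not specify a particular PCA implementation); only the shapes (14 retained features, six components) follow the text:

```python
import numpy as np

# Sketch of stage 136: standard-scale the retained features, then
# project onto the top principal components. Data is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))                    # one row per patient-year

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # standard scaling

# PCA via SVD of the scaled data matrix
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
components = Vt[:6]                               # top 6 principal directions
X_pca = X_scaled @ components.T                   # 6-dimensional projection
```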
At stage 138, the training circuitry 24 trains a K-Means clustering algorithm to cluster data by risk. Any suitable method of training a K-Means clustering algorithm may be used.
At stage 140, the training circuitry 24 outputs the trained K-Means clustering algorithm. Results of the K-Means clustering algorithm may be plotted for inspection, for example using display circuitry 28.
At stage 142, the training circuitry 24 evaluates the trained K-Means clustering algorithm. In some embodiments, stage 142 corresponds substantially to stage 48. The evaluation process includes examining key events in the following year for patients classified into each of the three clusters. These key events include time-to-death, time-to-admission and rescue medication usage.
At stage 144, the prediction circuitry 26 applies the trained K-Means clustering algorithm to new data to obtain a prediction. For example, the prediction circuitry 26 may apply the trained K-Means clustering algorithm to data relating to a new patient in order to obtain a predicted risk for said patient. In some embodiments, the input data may be stratified by risk. For example, the patient may be assigned to a risk category such as high, medium or low risk, or to a category A-D in accordance with the GOLD classification.
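A minimal K-Means (Lloyd's algorithm) sketch of stages 138 and 144 is given below. The three-cluster setting follows the evaluation described at stage 142; the data is synthetic and the initialisation scheme is an assumption, as the embodiments permit any suitable training method:

```python
import numpy as np

# Minimal K-Means sketch: cluster the principal-component projections
# into three risk groups. Data and settings are illustrative only.
def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 6))          # stand-in for the 6 PCA projections
centroids, labels = kmeans(X, k=3)
```

New patient data would be assigned to the nearest trained centroid to obtain its cluster (and hence risk category).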
In other embodiments, the six principal components obtained from principal component analysis, or any other suitable number of features, are used with a different clustering algorithm.
Figure 6 illustrates the training of a second formulation of Model E. Stages 130 to 136 of
Figure 6 are the same as stages 130 to 136 of Figure 5.
At stage 150 of Figure 6, the training circuitry 24 uses the six principal components (or in other embodiments, any suitable number of principal components) to train an unsupervised machine learning model comprising Hierarchical clustering. Hierarchical clustering is used to cluster the training data to obtain a plurality of trained clusters.
Hierarchical clustering alone does not allow for prediction on new data. At stage 152, the training circuitry 24 trains a second, supervised machine learning model on the labels of the trained clusters from stage 150; the second model is deliberately overfit to the labels of the trained clusters. This creates a method for predicting on new data. In use, new data is fed into the second, supervised machine learning model to obtain a prediction. The output of this second model is a cluster label (i.e. a label identified by the clustering algorithm) that is used to classify the new data.
The second, supervised machine learning model may be a tree-based method such as a decision tree classifier. The second, supervised machine learning model may comprise a one-versus-rest approach whereby the problem is modelled as cluster 1 versus other clusters, cluster 2 versus other clusters, and cluster 3 versus other clusters, and the highest scoring value from each of these sub-models is used to determine the label of new predictions. As described below, an additional calibration process may be provided, in some embodiments, to map or transform the label of the new prediction to a probability distribution, for example a probability distribution that represents a real world probability of the occurrence of a predicted health outcome.
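The one-versus-rest pattern of stage 152 might be sketched as follows. The embodiments use a tree-based method; to keep the sketch self-contained, a trivial nearest-centroid scorer stands in for each sub-model, which is an assumption made purely to show the pattern of fitting supervised sub-models to cluster labels and scoring new data:

```python
import numpy as np

# Sketch of the one-versus-rest stage: one sub-model per cluster,
# highest score wins. Nearest-centroid scoring is a stand-in for the
# tree-based sub-models of the embodiments.
def fit_one_vs_rest(X, cluster_labels):
    """One 'sub-model' per cluster: score = -distance to cluster mean."""
    return {c: X[cluster_labels == c].mean(axis=0)
            for c in np.unique(cluster_labels)}

def predict(models, x):
    """The highest-scoring sub-model determines the cluster label."""
    scores = {c: -np.linalg.norm(x - centre) for c, centre in models.items()}
    return max(scores, key=scores.get)

# Toy data: two well-separated clusters found by the unsupervised stage
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2)) * 5])
labels = np.array([0] * 10 + [1] * 10)
models = fit_one_vs_rest(X, labels)
```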
At stage 154, the training circuitry 24 outputs the trained models from stages 150 and 152. At stage 156, the training circuitry 24 evaluates the trained models.
At stage 158, the prediction circuitry 26 uses the trained models to obtain a prediction for input data relating to a patient. For example, the prediction circuitry 26 may apply the trained model to data relating to a new patient in order to obtain a predicted risk for said patient. Input data may be stratified by risk. For example, the patient may be assigned to a risk category such as high, medium or low risk, or to a category A-D in accordance with the GOLD classification.
The combination of an unsupervised clustering model plus a second, supervised machine learning model may be viewed as an explainable clustering model.
At stage 160, the prediction is displayed by display circuitry 28 along with information regarding the prediction. The information regarding the prediction is obtained by applying one or more explainability procedures to the second, supervised machine learning model. For example, the explainability procedures may include at least one of SHAP (SHapley Additive exPlanation), LIME (Local Interpretable Model-Agnostic Explanations) or ELI5 (Explain Like I'm 5). For example, the information regarding the prediction may explain which features were most important to the prediction. The information regarding the prediction may comprise a global and/or local feature importance as described above. In the present embodiment, the explainability procedures are applied at stage 158 together with obtaining the prediction. Stage 160 may include displaying reasons or a rationale for a prediction, for example in a human-readable or any suitable format.
In some embodiments, the explainability procedure is applied to obtain importance scores and/or a rationale for the prediction. In some embodiments, obtaining the rationale and/or the importance scores includes determining the effect of a change in the value of an input feature on the output of the model for each of the set of input features. In some embodiments, information explaining and/or regarding the prediction is obtained. The rationale and/or importance scores may be represented as a filtered and/or ranked list of features that are most important. In some embodiments, the rationale may comprise human readable text based on the importance scores and/or presentation of the importance scores in a suitable format that allows a clinician to understand the output of the model and the impact and/or effect of input data on the output.
Model E, for example as described above with relation to Figure 5 or Figure 6, may be used to stratify risk in a patient population. For example Model E may be used to identify under-medicated patients. Model E may be used to identify patients in need of a clinical review.
In some embodiments, a calibration process is used to calibrate an inference output of a prediction model, for example an output of any of models A, B, C or E. For use in a clinical setting, it is important that the inference output of the prediction can be interpreted as a probability of an event. It is important to quantify to what extent an output measure is reflective of a true probability of an event, for example a probability of an exacerbation or a probability of hospitalization in a given timescale.
High class imbalance may cause models to be poorly calibrated and may affect the utility of models in a live setting. The high imbalance may mean that there are many more training samples in one class than in another class. It will be understood that the imbalance reflects what is occurring in reality. For example, predicting mortality is an imbalanced problem, as far fewer people die than survive on a per-year basis.
In some embodiments, the calibration method used is Platt scaling, applied on folds of patient data that are demographically and class-balance stratified. The calibration process may comprise transforming the output of the at least one trained model, for example a classification, into a probability distribution. The training process may involve obtaining the mapping between possible outputs of the model and probabilities.
Any of the models described above, for example Model A, B, C or E, may comprise a combination of several constituent models, which may be described as base classifiers. The training of a model may comprise training several base classifiers. The base classifiers are fitted on different folds of patient data. Probabilities are calibrated using the hold-out test data set for that group of folds. When predicting on unseen data, each base classifier is inferenced on and the average is taken.
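The calibrated-ensemble inference described above might be sketched as follows. Each base classifier is represented here by a raw-score function plus fitted Platt (sigmoid) parameters; the functions and parameter values are illustrative stand-ins, since the actual fitted models and parameters are not specified:

```python
import math

# Sketch of calibrated ensemble inference: each base classifier's raw
# score is passed through its own Platt sigmoid, and the calibrated
# probabilities are averaged. All parameters are illustrative stand-ins.
def platt(raw_score, a, b):
    """Platt scaling: map a raw model score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * raw_score + b))

# (raw-score function, fitted Platt parameters a, b) per base classifier
base_classifiers = [
    (lambda x: 2.0 * x - 1.0, -3.0, 0.5),
    (lambda x: 1.5 * x - 0.8, -2.5, 0.2),
    (lambda x: 2.2 * x - 1.1, -3.2, 0.6),
]

def predict_calibrated(x):
    """Average the calibrated probability over all base classifiers."""
    probs = [platt(f(x), a, b) for f, a, b in base_classifiers]
    return sum(probs) / len(probs)
```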
Model calibration allows the presentation of expected model behaviour for a multidisciplinary team (MDT), for example a 100 patient MDT. In other embodiments, any suitable size of MDT may be used. The multidisciplinary team will be understood to describe, in a healthcare setting, a team comprising several disciplines (for example, in the care of an elderly adult with dementia, an MDT might comprise a nurse, psychologist, psychiatrist, physiotherapist etc.).
Expected model behaviour may be presented at various levels of calibrated probability threshold. In some embodiments, a user, for example, a clinician, can select an expected model behaviour that fits the user’s needs. For example, the users can select a threshold, see the performance metrics at this threshold, and are presented with how this model would perform scaled to 100 patients in terms of True and False positive, and True and False Negatives. By providing calibrated outputs, the system may be suitable for use in a clinical setting.
In further detail, it will be understood that a distribution of probabilities for a binary classification problem may differ greatly depending on the model performance and class imbalance. For each model, the expected behaviour is presented were one to classify "at risk" as being a probability greater than a given threshold (a probability value selected between 0 and 1). This is repeated for a number of different values between 0 and 1 (for example, 6 or 7) and the results are presented in terms of how the model would perform on 100 patients in terms of true and false positives, and true and false negatives.
The expected model behaviour may refer to a proportion of true and false positives, and true and false negatives for a given probability threshold. This allows a user to see how this model will perform for 100 (or however many) patients. For example, in the embodiment for Model A, as the threshold moves closer to 1, the model becomes very precise and will therefore minimize false positives, but at the trade-off of many people being missed (false negatives). This may be provided visually. The metrics corresponding to the above scenario (precision, recall, specificity etc.) may also be determined and displayed.
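The "per 100 patients" presentation can be sketched as follows: for a chosen probability threshold, count the confusion-matrix cells and scale to 100 patients. The probabilities and labels below are synthetic examples:

```python
# Sketch of expected model behaviour at a threshold, scaled to 100
# patients. Example probabilities and ground-truth labels are synthetic.
def confusion_per_100(probs, labels, threshold):
    tp = sum(p > threshold and y == 1 for p, y in zip(probs, labels))
    fp = sum(p > threshold and y == 0 for p, y in zip(probs, labels))
    fn = sum(p <= threshold and y == 1 for p, y in zip(probs, labels))
    tn = sum(p <= threshold and y == 0 for p, y in zip(probs, labels))
    scale = 100.0 / len(labels)
    return {k: round(v * scale) for k, v in
            [("TP", tp), ("FP", fp), ("FN", fn), ("TN", tn)]}

probs  = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.15, 0.1, 0.05, 0.02]
labels = [1,   1,   0,   1,   0,    0,   0,    1,   0,    0]

# Repeat for several thresholds, as described in the text
table = {t: confusion_per_100(probs, labels, t) for t in (0.25, 0.5, 0.75)}
```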
The performance metrics may include, for example, accuracy, precision, recall, specificity, F1-score, ROC-AUC and precision-recall AUC.
In some circumstances, a standard procedure for implementing explainability may not work with a calibrated model. One reason is because multiple models are used to provide the calibrated prediction rather than just a single model.
In some embodiments, an explainability procedure is applied which comprises iterating over a plurality of base classifiers which form the calibrated model, calculating local and
global explainability for each of the base classifiers, and averaging the results across each of the models.
Local explainability explains the contribution of a model input to a particular prediction, for example a prediction associated with a single patient. Global explainability explains the contribution of a model input to prediction in general, for example over a cohort of patients.
In one embodiment, a calibrated model comprises a plurality of constituent models, which may be described as base classifiers. Each base classifier is configured to receive a plurality of model inputs (which may also be referred to as features) and to output a respective prediction, for example as a probability.
For each base classifier, an explainability procedure is used to assess the contribution of each feature to the base classifier's prediction. In some embodiments, the explainability procedure is SHAP (SHapley Additive exPlanation), which uses a game-theoretic approach to assess the contribution of each feature.
The explainability procedure outputs a respective contributing factor for each model input for each base classifier. For each model input, the contributing factors obtained from all of the base classifiers are combined to obtain a combined contributing factor, for example by averaging. The combined contributing factor is representative of the contribution that the model input makes to the overall prediction that is output by the model, for example to a probability that is output by the model.
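The averaging of contributing factors across base classifiers might be sketched as follows. To keep the example self-contained, a simple linear attribution (weight times deviation from a baseline) stands in for SHAP values; this substitution, and all numbers below, are assumptions for illustration:

```python
# Sketch of combining per-classifier feature contributions by averaging.
# A linear attribution stands in for SHAP values here.
def linear_contributions(weights, x, baseline):
    """Contribution of each feature for one linear base classifier."""
    return [w * (xi - bi) for w, xi, bi in zip(weights, x, baseline)]

def combined_contributions(base_weights, x, baseline):
    """Average each feature's contribution across all base classifiers."""
    per_model = [linear_contributions(w, x, baseline) for w in base_weights]
    n = len(per_model)
    return [sum(col) / n for col in zip(*per_model)]

# Three base classifiers, three features; values are illustrative
base_weights = [[0.5, -1.0, 2.0], [0.6, -0.8, 1.8], [0.4, -1.2, 2.2]]
x        = [1.0, 2.0, 0.5]
baseline = [0.0, 1.0, 0.0]
combined = combined_contributions(base_weights, x, baseline)
```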
In some embodiments, the prediction circuitry 26 outputs the combined contributing factor of each model input to the overall probability on an individual patient prediction level. The display circuitry 28 may present each model input value together with the contribution that the model input value had to the overall probability score. For example, the contributing factors may be displayed along with a prediction.
The output of contributing factors may give transparency to the user. This output may also give information as to why a feature is important, for example, by indicating to the user that low albumin correlates to a higher mortality risk. This extends a typical
approach of giving feature importance order to include richer information around what characteristics of that feature make it important.
In some embodiments, a machine learning model used comprises one or more Glass-box methods, for example Explainable Boosting Machines. Glass-box methods are inherently interpretable, which removes the black-box element of many machine learning algorithms. The framework described above outputs both a prediction and model explainability. Due to the additive nature of the algorithm, it is possible to see the contribution each feature has on the final prediction, and the algorithm is therefore considered a Glass-box method.
Custom code plus third-party packages such as fairlearn (https://fairlearn.org/) are used to identify potential inequalities between different groups within the data used for model training. Using the outcome of this analysis, custom loss functions are developed to minimize performance disparities between the groups. This resulted in fair, performant, and calibrated models.
Fair and explainable models may be provided. The models may be calibrated and transparent. By providing models that are calibrated, fair and explainable, models may be provided that are trusted by end users. Models that are not trusted by end users may not be used.
The models are not limited to using the XGBoost algorithm. In some embodiments, advanced Glass-box methods such as Explainable Boosting Machines were additionally used. These models are inherently interpretable, which removes the black-box element of many machine learning algorithms.
It will be understood that loss functions are used throughout the model learning process to evaluate how the model is performing observation by observation. They typically optimize the learning to minimize a quantity. While a wide variety of loss functions are already implemented out of the box, sometimes there is a specific need for a particular model that is not covered by the pre-existing loss functions. As a first example, a model may need to be trained to be inherently fair.
To implement a custom loss function in XGBoost involves the following steps: 1) pen-and-paper mathematics to describe and represent the custom metric to optimise for (ideally this will be continuous and differentiable); 2) obtaining the first and second derivative of this function (i.e. the gradient and hessian); 3) supplying these to the XGBoost algorithm, which makes use of a Taylor expansion and therefore requires a first and second derivative to compute.
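The three steps above can be illustrated with the standard logistic loss as a worked example (the custom fairness losses of the embodiments are not specified, so the familiar logistic objective is used here): its first derivative with respect to the raw score is p - y and its second derivative is p(1 - p), returned in the (grad, hess) form a custom XGBoost objective callback is expected to produce.

```python
import math

# Worked example of steps 1-3: gradient and hessian of the logistic
# loss, in the (grad, hess) shape expected by a custom XGBoost objective.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_obj(raw_preds, labels):
    """Per-row first and second derivatives of the logistic loss."""
    probs = [sigmoid(z) for z in raw_preds]
    grad = [p - y for p, y in zip(probs, labels)]   # dL/dz
    hess = [p * (1.0 - p) for p in probs]           # d2L/dz2
    return grad, hess
```

A fairness-aware custom loss would follow the same pattern, with the penalty term differentiated in step 1 folded into the gradient and hessian.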
A number of bespoke features are described above in relation to various embodiments. In other embodiments, any suitable bespoke features may be used. Further bespoke features are described below. Features may be used in any suitable combination. For example, any of the further bespoke features described below may be used as an input to Model A, to model B, to model C, to model E, or to any other model described herein.
In some embodiments, BNF (British National Formulary) prescribing risk mapping is used to create a numerical representation of risk for prescribed medication. In the BNF, each medication is given a respective code which may be referred to as a BNF code.
The data processing circuitry 22 creates a numerical representation of risk for each of a plurality of BNF codes, which may be BNF codes that are present in the training data. Each trimmed BNF code (first 9 digits) other than the rarest 100 is assigned a number. This number is a floating point number between 0 and 1.
Firstly, the data processing circuitry 22 removes the last 3 characters from each BNF code, thereby removing the generic information. The data processing circuitry 22 removes codes that have been prescribed to 100 patients or fewer, to avoid information leakage.
The data processing circuitry 22 splits the training data into 10 age, sex, and disease class stratified folds. In other embodiments, any suitable number of folds may be used.
The data processing circuitry 22 uses k-fold target encoding to convert each code in each fold to a numerical representation of disease likelihood given that code. The data processing circuitry 22 then performs averaging over all folds.
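The trimming and out-of-fold target encoding described above might be sketched as follows. For brevity the demographic stratification of the folds and the rare-code filtering are omitted, and the codes and outcomes are synthetic:

```python
from collections import defaultdict

# Sketch of k-fold target encoding of trimmed BNF codes: each code in a
# fold is encoded using disease likelihoods computed from the OTHER
# folds (avoiding leakage), then the per-fold encodings are averaged.
def trim(code):
    return code[:9]          # drop the trailing generic-information digits

def encode_out_of_fold(folds):
    """folds: list of [(bnf_code, outcome)] lists -> {code: encoding}."""
    encodings = defaultdict(list)
    for i, fold in enumerate(folds):
        counts, positives = defaultdict(int), defaultdict(int)
        for j, other in enumerate(folds):
            if j == i:
                continue                     # exclude the current fold
            for code, outcome in other:
                c = trim(code)
                counts[c] += 1
                positives[c] += outcome
        for code, _ in fold:
            c = trim(code)
            if counts[c]:
                encodings[c].append(positives[c] / counts[c])
    # average the encoding over all folds
    return {c: sum(v) / len(v) for c, v in encodings.items()}

folds = [[("0301011R0AAAA", 1)],
         [("0301011R0BBBB", 0)],
         [("0301011R0CCCC", 1)]]
enc = encode_out_of_fold(folds)
```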
In some embodiments, the numerical representation of risk is enriched by considering other factors, for example, maximum mapped BNF encoded value in the previous 12 months.
It will be understood that the above described approach to risk mapping, may also be applied, in some embodiments to, for example, ICD-10 diagnosis mapping or ICD-10 diagnosis to comorbidity mapping. For ICD-10 diagnosis mapping, the same approach as that described above is adopted but using ICD-10 codes in place of BNF codes. For an ICD-10 diagnosis to comorbidity mapping, a bespoke mapping for ICD-10 codes to 30 different comorbidities is performed and those 30 comorbidities are then used as model features.
As described above, some of the features are domain specific. In some embodiments, the domain features include prescribing features, for example rescue medication features or reliever inhaler features. In some embodiments, a feature of a sum of COPD related prescriptions and a sum of non-COPD related prescriptions may be used. In some embodiments, the sum may be based on prescriptions from a pre-determined time period, such as a year. In some embodiments, only a lifetime total is considered.
In some embodiments, the features include laboratory features, for example the neutrophil-lymphocyte ratio from the patient's most recent tests, or the maximum eosinophil value the patient has had in their lifetime. In some embodiments, a feature corresponds to the number of days since the last laboratory test. In such embodiments, the model is aware of how recent the test data is and is then able to weight the associated risk, or lack of risk, relative to the data recency using custom interaction constraints.
In some embodiments, the interaction constraints are applied during model training or when applying the trained models. These constraints allow a user to specify which features are allowed to interact with each other and/or to restrict which features are not allowed to interact. In some embodiments, no interaction constraints are enforced and the model finds the relationships between features itself. In such embodiments, the generalizability of the model may be reduced and the model may be less explainable. In some embodiments, bespoke constraints on laboratory test features are imposed such that the days-since-test feature is only allowed to interact with the test value itself and the age and sex of the patient.
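The bespoke laboratory-test constraint can be expressed as constraint groups of the kind accepted by common gradient-boosting libraries (for example, XGBoost's `interaction_constraints` parameter). The feature names below are illustrative assumptions.

```python
# One bespoke constraint group: the days-since-test feature may interact only
# with the test value itself and with the patient's age and sex.
# Feature names are illustrative, not taken from the text.
INTERACTION_CONSTRAINTS = [
    ["days_since_eosinophil_test", "eosinophil_value", "age", "sex"],
]

def allowed_to_interact(feature_a, feature_b, constraints):
    """Two features may interact only if some constraint group contains both."""
    return any(feature_a in group and feature_b in group for group in constraints)
```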
In some embodiments, natural language processing (NLP) techniques are used, including deep learning word-embedding methods and Latent Dirichlet Allocation, to find topics and stories within a patient's diagnosis and prescribing history.
In a separate pre-processing step, unsupervised topic modelling may be applied to, for example, the patient's ICD-10 diagnosis history across a specified time window (for example, 2 years). This produces a specified number of topics/stories, to each of which a label is assigned. The labels are then one-hot encoded as new features to be used within the models. It will be understood that this process does not provide a separate output to the model but may produce new features in some variants of the models. As a non-limiting example, a story may look like "J441-J449-J47", which can be considered a "story" of ICD-10 codes.
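A minimal sketch of the story construction and one-hot encoding follows; the story vocabulary here stands in for the hypothetical output of the topic-modelling step.

```python
def diagnosis_story(icd10_codes):
    """Join the ICD-10 codes in a patient's time window into a 'story' string,
    e.g. ["J441", "J449", "J47"] -> "J441-J449-J47"."""
    return "-".join(icd10_codes)

def one_hot_stories(patient_stories, story_vocabulary):
    """One-hot encode each patient's story against a fixed vocabulary of
    topics/stories, producing new model features."""
    return [[1 if story == known else 0 for known in story_vocabulary]
            for story in patient_stories]

story = diagnosis_story(["J441", "J449", "J47"])
encoded = one_hot_stories([story, "J440-J189"], ["J441-J449-J47", "J440-J189"])
```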
In some embodiments, condition-specific features are used. The number of community-managed exacerbations in the previous year may be used. In some embodiments, the total number of COPD-respiratory 3-month readmissions may be used. In some embodiments, COPD hospital admissions over a lifetime may be used.
In some embodiments, data from patients with incomplete socioeconomic data is used. In some embodiments, the training data includes data collected during the COVID-19 pandemic.
In some embodiments, the steps of training, testing, validation and cross-validation include checking, using statistical tests, that each of the sets is balanced in terms of key demographic features and class balance. In some embodiments, a combination of bespoke data splitting methods and known splitting methods may be used. In addition, an appropriate statistical test, such as the Kolmogorov-Smirnov test, may be used to ensure the splits are representative of the full population. This may be performed iteratively until adequate splits are found. The statistical tests are carried out on any data that is used for training, testing and validation (including any k-fold methods).
Validation is typically then performed in terms of performance, fairness, calibration and explainability, and can be done on a hold-out validation dataset or in a k-fold manner.
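The iterative, Kolmogorov-Smirnov-checked splitting can be sketched as follows. This is a minimal illustration: the class-balance check is omitted, and the threshold and the use of age as the checked demographic are assumptions.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance between
    the two empirical cumulative distribution functions."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def representative_split(values, test_frac=0.2, threshold=0.3, max_tries=200, seed=0):
    """Re-draw train/test splits until the KS statistic on a key demographic
    feature (here, age) falls below a threshold, so the split is
    representative of the full population."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        shuffled = list(values)
        rng.shuffle(shuffled)
        k = int(len(shuffled) * test_frac)
        test, train = shuffled[:k], shuffled[k:]
        if ks_statistic(train, test) < threshold:
            return train, test
    raise RuntimeError("no representative split found")

ages = [40 + (i % 50) for i in range(200)]
train_ages, test_ages = representative_split(ages)
```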
In further detail, the evaluation stages 48, 142, 156 described may include application of a set of tests that evaluate the trained model in terms of performance in addition to fairness, calibration and explainability.
In some embodiments, the model can predict respiratory hospitalization or all-cause hospitalization. For example, Model C predicts based on a deterioration in condition. In addition, in some embodiments, the output of the model is a calibrated probability allowing a clinician to reliably interpret how the model will perform for a given threshold. In some embodiments, an explanation for each prediction is also provided. To obtain a calibrated prediction, during model training, the data is split into k folds (k could be, for example, 5). The model is trained on 4 of the folds and then calibrated against the 5th using Platt scaling in an iterative manner.
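A sketch of the Platt-scaling calibration step applied to a held-out fold follows. The raw scores and labels are toy data, and a full pipeline would, per the text above, rotate this fit over the k folds.

```python
import math

def platt_fit(scores, labels, lr=0.1, epochs=2000):
    """Fit Platt scaling parameters (A, B) on a held-out fold, mapping a raw
    model score s to a calibrated probability 1 / (1 + exp(A*s + B)), by
    plain gradient descent on the log loss."""
    a = b = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # Log-loss gradients w.r.t. A and B for this parameterisation.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(a, b, s):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * s + b))

# Toy held-out fold: higher raw scores correspond to positive outcomes.
a, b = platt_fit([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 0, 1, 1, 1])
p_high = platt_predict(a, b, 2.0)
p_low = platt_predict(a, b, -2.0)
```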
In some embodiments, the prediction may be based on events, which could be community-managed exacerbations, and/or the number and/or type of medications prescribed prior to the prediction date. In some embodiments, the features include, for example, "max Eosinophil", "Neutrophil Lymphocyte ratio" and/or features relating to patient sleep.
The following variables or features may be used in the models. The features include patient identification features, demographic features, prescribing features, admission features and lab features.
In the above described embodiments, at stage 136, the data is scaled and passed through a Principal Components Analysis (PCA) to perform feature importance assessment and selection, reducing the features from 52 to 14. The 52-feature data set includes the following features:
['SafeHavenID', 'eoy', 'adm_per_year', 'total_hosp_days', 'mean_los', 'anxiety_depression_per_year', 'days_since_adm', 'adm_to_date', 'copd_to_date', 'resp_to_date', 'anxiety_depression_to_date', 'comorb_per_year', 'presc_to_date', 'days_since_rescue', 'rescue_to_date', 'anxiety_depression_presc_to_date', 'salbutamol_per_year', 'rescue_meds_per_year', 'presc_per_year', 'anxiety_depression_presc_per_year', 'alt_med_2yr', 'ast_med_2yr', 'albumin_med_2yr', 'alkaline_phosphatase_med_2yr', 'basophils_med_2yr', 'c_reactive_protein_med_2yr', 'chloride_med_2yr', 'creatinine_med_2yr', 'eosinophils_med_2yr', 'estimated_gfr_med_2yr', 'haematocrit_med_2yr', 'haemoglobin_med_2yr', 'lymphocytes_med_2yr', 'mch_med_2yr', 'mean_cell_volume_med_2yr', 'monocytes_med_2yr', 'neutrophils_med_2yr', 'platelets_med_2yr', 'potassium_med_2yr', 'red_blood_count_med_2yr', 'sodium_med_2yr', 'total_bilirubin_med_2yr', 'urea_med_2yr', 'white_blood_count_med_2yr', 'neut_lymph_med_2yr', 'labs_to_date', 'labs_per_year', 'days_since_copd_resp', 'ggc_years', 'age', 'single_inhaler', 'double_inhaler', 'triple_inhaler', 'copd_resp_per_year']
In the above described embodiments, a model in which 14 features were extracted was described. The 14 features for this model are ['comorb_per_year', 'days_since_adm', 'ggc_years', 'days_since_copd_resp', 'days_since_rescue', 'presc_per_year', 'age', 'presc_to_date', 'estimated_gfr_med_2yr', 'albumin_med_2yr', 'labs_to_date', 'haemoglobin_med_2yr', 'red_blood_count_med_2yr', 'labs_per_year'].
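The scaling-plus-PCA reduction described above can be sketched with plain NumPy SVD; toy random data stands in here for the real 52-feature set.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Scale each feature to zero mean and unit variance, then project the
    data onto the top principal components."""
    mu, sd = x.mean(axis=0), x.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    z = (x - mu) / sd
    # Rows of vt are the principal axes, ordered by explained variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    return z @ vt[:n_components].T, explained[:n_components]

# Toy stand-in for a 100-patient, 52-feature data set reduced to 14 components.
rng = np.random.default_rng(0)
reduced, explained = pca_reduce(rng.normal(size=(100, 52)), 14)
```

In practice, feature selection (which of the original 52 features to keep) would then be guided by each feature's loading on the retained components.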
While the above list of features relates to identification features, demographic features, prescribing features, admission features and lab features, it will be understood that a subset of these features may be used in certain embodiments. At least some of the demographic, prescribing, admission and lab features may be referred to as clinical history features. In addition, it will be understood that, in some embodiments, additional data may be used together with these features as input to the model and/or as training data. For example, in some embodiments, these features may be combined with PRO data and/or sensor data. It will be understood that the data described above may comprise numerical values representing the set of features.
Whilst certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the invention.
Claims
1. A method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD) comprising: receiving data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient, wherein the clinical history comprises information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history; applying at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD, wherein at least one of a), b) and c): a) the method further comprises obtaining a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features; b) wherein the set of features are predetermined by at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure; c) wherein the at least one trained machine learning model comprises a calibrated model such that the obtained prediction comprises a calibrated prediction.
2. The method of claim 1, wherein the prediction comprises a prediction of an exacerbation of COPD occurring within a time period, wherein an extent of the time period is between 24 hours and 1 week, optionally wherein the time period is 72 hours.
3. The method according to any preceding claim, wherein the prediction comprises a prediction of mortality occurring within the next 12 months.
4. The method according to any preceding claim, wherein the prediction comprises a prediction of readmission to hospital within the next 3 months.
5. The method of any preceding claim, wherein the rationale and/or contribution to the prediction comprises a feature importance score, optionally wherein the method comprises ranking and/or filtering and/or selecting one or more features of the set of features based on the determined importance.
SUBSTITUTE SHEET (RULE 26)
6. The method of any preceding claim, wherein the importance scores comprise global and/or local importance scores.
7. The method of any preceding claim, wherein obtaining the rationale for the prediction and/or the contribution to the prediction of one or more health outcomes for at least one of the set of features comprises applying an explainability procedure to the at least one model and/or the received data.
8. The method of claim 7, wherein the explainability procedure comprises a SHAP (SHapley Additive exPlanation), LIME (Local Interpretable Model-Agnostic Explanations) or ELI5 (Explain Like I'm 5) based procedure.
9. The method of any preceding claim, wherein the trained machine learning procedure comprises a supervised machine learning model and/or a machine learning model trained using supervised learning and is configured to output one or more labels representing a prediction of a health outcome of a set of predetermined labels, wherein the set of labels is obtained using the unsupervised machine learning procedure.
10. The method of any preceding claim, wherein the unsupervised machine learning procedure comprises a principal component analysis and/or a clustering procedure.
11. The method of any preceding claim, wherein the trained machine learning model comprises a tree-based procedure, for example, a decision tree and/or a k-means clustering procedure and/or a one-versus-rest procedure.
12. The method of any preceding claim, wherein the at least one trained model is calibrated such that the predicted outcome substantially corresponds to a real-world probability of said health outcome.
13. The method of any preceding claim further comprising applying an unsupervised topic modelling process to at least part of the clinical history data to obtain one or more labelled topics for use as one or more features of the set of features.
14. The method of any preceding claim, wherein the at least one trained model comprise a plurality of trained models and applying the at least one trained model comprises applying the plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome and wherein at least one of a), b), c): a) the method comprises combining the plurality of predictions to obtain a combined prediction; b) the plurality of machine learning models are calibrated; c) an explainability procedure is applied to each of the plurality of trained models and/or the received data to obtain a rationale and/or contribution for each of the plurality of trained models.
15. The method of any preceding claim, wherein the method further comprises determining expected model behaviour and/or one or more metrics representing said behaviour at a probability threshold, optionally wherein the probability threshold is selected by a user.
16. The method according to any preceding claim, further comprising at least one of a) to d):- a) displaying or communicating the prediction to a clinician; b) displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model; c) displaying or communicating to the clinician the rationale for the prediction and/or the contribution; d) issuing an alert in dependence on the prediction.
17. An apparatus comprising processing circuitry configured to: receive data for a patient representing values for a set of features, wherein the set of features comprise features representing or associated with a clinical history of the patient, wherein the clinical history comprises information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history; apply at least one trained machine learning model to the received data to obtain a prediction of one or more health outcomes relating to COPD, wherein at least one of a), b), c):
a) the processing circuitry is configured to obtain a rationale for the prediction and/or a contribution to the prediction for one or more features of the set of features; b) wherein the set of features are predetermined by applying at least one further machine learning procedure wherein the at least one further machine learning procedure comprises an unsupervised machine learning procedure; c) wherein the at least one trained machine learning model comprises a calibrated model such that the obtained prediction comprises a calibrated prediction.
18. A method for training at least one machine learning model to predict at least one health outcome for patients with COPD, the method comprising: receiving a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
19. The method according to claim 18, wherein the machine learning model comprises a tree-based model and/or the training comprises boosting and bagging.
20. The method according to claim 18 or claim 19, wherein the training comprises using custom loss functions and/or optimizing the model to minimize false positives and/or optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
21. The method according to any of claims 18 to 20, wherein the training method comprises performing a calibration process to obtain at least one calibrated machine learning model from at least one un-calibrated machine learning model.
22. The method of claim 21, wherein the calibration process comprises obtaining a mapping between the output of the at least one trained model and a probability distribution.
23. The method of claim 21 or claim 22, wherein the calibration process comprises a Platt regression or other Platt scaling process.
24. The method according to any of claims 18 to 23, wherein the method further comprises performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model, and wherein the feature selection process comprises selecting at least some of the initial set of features and/or selecting at least one further feature generated during the training process.
25. The method according to any of claims 18 to 24, further comprising data cleansing of the training data sets and/or exploring the training data sets for potential bias.
26. An apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising at least clinical history data for a respective patient; and use the plurality of training data sets to train at least one machine learning model to predict at least one health outcome relating to COPD, wherein the training comprises a feature selection process to select a plurality of features relevant to the at least one health outcome.
27. A computer program product comprising computer readable instructions that are executable by a processor to perform a method according to any of claims 1 to 16 or 18 to 25.
28. A method for predicting at least one health outcome for a patient with chronic obstructive pulmonary disease (COPD) comprising: receiving a clinical history of the patient; receiving patient-reported outcomes (PRO) data submitted by the patient; receiving sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and applying at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
29. The method according to claim 28, wherein at least some of the sensor data is captured by a wearable device worn by the patient.
30. The method according to claim 28 or claim 29, wherein the PRO data comprises wellbeing data and the sensor data comprises patient activity data, heart rate data and data indicative of restless sleep; and wherein the prediction is based at least partially on decline in patient-reported wellbeing in combination with a change in patient activity data, data indicative of restless sleep and heart rate data.
31. The method according to any of claims 28 to 30, wherein the prediction comprises a prediction of an exacerbation of COPD occurring within a time period, wherein an extent of the time period is between 24 hours and 1 week, optionally wherein the time period is 72 hours.
32. The method according to any of claims 28 to 31, wherein the prediction comprises a prediction of mortality occurring within the next 12 months.
33. The method according to any of claims 28 to 32, wherein the prediction comprises a prediction of readmission to hospital within the next 3 months.
34. The method according to any of claims 28 to 33, further comprising at least one of a) to d):- a) displaying or communicating the prediction to a clinician; b) displaying or communicating to the clinician a set of features used as, or related to, an input to the machine learning model; c) displaying or communicating to the clinician a rationale for the prediction; d) issuing an alert in dependence on the prediction.
35. The method according to any of claims 28 to 34, comprising obtaining predictions for the at least one health outcome for a plurality of patients, and displaying or communicating the predictions for at least some of the plurality of patients to a clinician, optionally further comprising filtering and/or ordering the patients in dependence on the predictions.
36. The method according to any of claims 28 to 35, further comprising providing messaging functionality for communication between the patient and one or more clinicians.
37. The method according to any of claims 28 to 36, wherein the clinical history is obtained or derived from the patient’s electronic health record (EHR).
38. The method according to any of claims 28 to 37, wherein at least part of the clinical history is input by one or more clinicians, optionally wherein at least part of the clinical history is input as part of a patient onboarding process and comprises at least one of: presence of one or more comorbidities of a set list of comorbidities, a number of exacerbations in the previous 12 months, a number of hospital admissions in the previous 12 months.
39. The method according to any of claims 28 to 38, wherein the clinical history comprises information on at least one of: hospital admissions, episodes of severe illness, COPD exacerbations, treatment, laboratory data, prescribing data, diagnosis history.
40. The method according to any of claims 28 to 39, wherein the PRO data comprises data obtained by presenting a survey or questionnaire to a patient and recording the patient’s responses, optionally wherein the survey or questionnaire comprises questions on the patient’s perceived wellbeing.
41. The method according to any of claims 28 to 40, wherein the sensor data comprises at least one of: activity data, respiratory data, heart rate, sleep data, energy expenditure, oxygen saturation, optionally wherein the sensor data includes data received from a home respiratory sensor, data received from a non-invasive ventilator (NIV) mask, respiratory rate and/or ventilation parameters.
42. The method according to any of claims 28 to 37, wherein applying the at least one trained machine learning model comprises applying a supervised trained machine learning model to input data for the patient, the input data representing a predetermined set of features representing and/or related to the clinical history, the PRO data and the sensor data, wherein the predetermined set of features is determined using an unsupervised machine learning procedure.
43. The method according to any of claims 28 to 42, further comprising determining a contribution to the predicted health outcome for features representing and/or related to
the clinical history, the PRO data and the sensor data, and outputting a representation of the determined contribution.
44. The method according to any of claims 28 to 43, wherein determining the contribution comprises applying an explainability process, for example, a local or global explainability process, to obtain the contribution.
45. The method according to any of claims 28 to 44, wherein the output comprises a calibrated prediction of a health outcome.
46. The method according to any of claims 28 to 45, wherein applying the at least one trained machine learning model comprises applying a plurality of trained models to obtain a corresponding plurality of predictions for the at least one health outcome and wherein at least one of: a) the plurality of predictions are combined to obtain a combined prediction; b) the plurality of machine learning models are calibrated; c) an explainability procedure is applied to each of the plurality of trained models to obtain a contribution of the input to the prediction.
47. The method according to any of claims 28 to 46 further comprising determining one or more metrics representing an expected model behaviour at a probability threshold, optionally wherein the probability threshold is selected by a user.
48. An apparatus comprising processing circuitry configured to: receive a clinical history of a patient; receive patient-reported outcomes (PRO) data submitted by the patient; receive sensor data comprising or representative of physiological measurements for the patient, wherein the sensor data is captured daily or more often; and apply at least one trained machine learning model to the clinical history, the PRO data and the sensor data to obtain a prediction of one or more health outcomes relating to COPD.
49. A method for training at least one machine learning model to predict at least one health outcome for patients with COPD, the method comprising:
receiving a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and using the plurality of training data sets to train the at least one machine learning model to predict the at least one health outcome, the training comprising a feature selection process to select a plurality of features relevant to the at least one health outcome.
50. A method according to claim 49, wherein the machine learning model comprises a tree-based model and/or the training comprises boosting and bagging.
51. A method according to any of claims 49 to 50, wherein the training comprises using custom loss functions.
52. A method according to any of claims 49 to 51 , wherein the training comprises optimizing the model to minimize false positives.
53. A method according to any of claims 49 to 52, wherein the training comprises optimizing hyperparameters of the model using a grid-search approach utilizing cross validation.
54. A method according to any of claims 49 to 53, wherein the method further comprises performing feature engineering to determine an initial set of features for initial training of the at least one machine learning model, and wherein the feature selection process comprises selecting at least some of the initial set of features and/or selecting at least one further feature generated during the training process.
55. A method according to any of claims 49 to 54, further comprising data cleansing of the training data sets and/or exploring the training data sets for potential bias.
56. An apparatus comprising processing circuitry configured to: receive a plurality of training data sets, each training data set comprising clinical history, PRO data and sensor data for a respective patient; and train a machine learning model to predict the at least one health outcome for patients with COPD from a plurality of features, the training comprising performing a
feature selection process to select the plurality of features relevant to the at least one health outcome.
57. A computer program product comprising computer readable instructions that are executable by a processor to perform a method according to any of claims 28 to 47 or 49 to 55.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2208262.2 | 2022-06-06 | ||
| GBGB2208262.2A GB202208262D0 (en) | 2022-06-06 | 2022-06-06 | Health prediction method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023237874A1 true WO2023237874A1 (en) | 2023-12-14 |
Family
ID=82404614
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2023/051478 Ceased WO2023237874A1 (en) | 2022-06-06 | 2023-06-06 | Health prediction method and apparatus for patients with copd |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB202208262D0 (en) |
| WO (1) | WO2023237874A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119418898A (en) * | 2024-09-27 | 2025-02-11 | 山西医科大学第二医院(山西医科大学第二临床医学院) | Prognostic/cost-utility evaluation methods and models for chronic obstructive pulmonary disease |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116738172A (en) * | 2023-06-14 | 2023-09-12 | 广西医科大学 | Large-scale mixed exposure data analysis method based on machine learning |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021148967A1 (en) * | 2020-01-23 | 2021-07-29 | Novartis Ag | A computer-implemented system and method for outputting a prediction of a probability of a hospitalization of patients with chronic obstructive pulmonary disorder |
- 2022-06-06: GB GBGB2208262.2A patent/GB202208262D0/en, not_active Ceased
- 2023-06-06: WO PCT/GB2023/051478 patent/WO2023237874A1/en, not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| SCOTT LUNDBERG ET AL: "A Unified Approach to Interpreting Model Predictions", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 May 2017 (2017-05-22), XP081281358 * |
Also Published As
| Publication number | Publication date |
|---|---|
| GB202208262D0 (en) | 2022-07-20 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23732647; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23732647; Country of ref document: EP; Kind code of ref document: A1 |