WO2020142035A1

WO2020142035A1 - Disease diagnosis system

Info

Publication number: WO2020142035A1
Application number: PCT/TR2019/051101
Authority: WO
Inventors: Ali Cakmak; Muhammed Hasan CELİK
Original assignee: Istanbul Sehir University
Current assignee: Istanbul Sehir University
Priority date: 2018-12-31
Filing date: 2019-12-18
Publication date: 2020-07-09
Anticipated expiration: 2021-06-30

Abstract

The invention is a disease diagnosis system (100) in which the disease of a test individual is predicted based on metabolomic measurements taken from said test individual. The system comprises a processor unit (110), characterized in that said processor unit (110) is configured to perform the following steps: accessing a metabolic network model (121) stored in a memory unit, in which a metabolic network (400) comprising a plurality of metabolic pathways (410) is defined by metabolites (411) and the metabolic reactions (412) in which the metabolites (411) are involved; accessing a reference database (122) in which, for each of the healthy individuals and the diseased individuals associated with at least one disease, the metabolite (411) fold-change values of said individuals are stored in relation to the reactions (412) involving the metabolites (411); determining an average metabolic flux value for each metabolic pathway (410) in the metabolic network model according to the metabolite (411) fold-change values in the reference database (122); generating, for each individual in the reference database, a set of metabolic flux values comprising individual-specific minimum and maximum metabolic flux values of each metabolic pathway (410) in the metabolic network model (121); generating a set of reference difference values for each individual according to the deviations of said metabolic flux value sets from the average metabolic flux values; receiving, as input, test data comprising the fold-change values of the metabolites (411) in the metabolomic measurements taken from said test individual; associating the metabolites (411) in the test data with the metabolic pathways (410) containing the reactions (412) of said metabolites and determining a minimum and maximum metabolic flux value for each of these metabolic pathways (410); generating test difference values according to the deviations of the determined metabolic flux values from the average metabolic flux values; and applying a classification algorithm that predicts to which of the sets of reference difference values the generated test difference values belong.

Description

DISEASE DIAGNOSIS SYSTEM

TECHNICAL FIELD

The invention relates to a computer-based disease diagnosis system and method in which a disease of a test individual is predicted based on metabolomic measurements taken from said test individual.

BACKGROUND ART

Accurate and early diagnosis of diseases such as cancer is important for ensuring the correct treatment and the successful continuation of the treatment process. Accurate diagnosis reduces both the patient's loss of time and the patient's financial losses.

Phenotypes of diseases often have implications for patients' metabolism.

Under the effect of a disease, the activity of some metabolic pathways in the metabolic network increases while that of others decreases. These changes can provide information about the etiology of the disease.

It is known in the art to detect irregularities in metabolic pathways in tumor specimens, to generate a score for each metabolic pathway, and to relate specific metabolic pathways to certain diseases (Drier, Y., Sheffer, M., & Domany, E. (2013). Pathway-based personalized analysis of cancer. Proceedings of the National Academy of Sciences, 110(16), 6388-6393.).

The calculation and analysis of metabolic flux in the metabolic network are also known in the art (Orth, J. D., Thiele, I., & Palsson, B. Ø. (2010). What is flux balance analysis? Nature Biotechnology, 28(3), 245-248.).

It is known in the art that certain metabolic pathways are significantly altered in cancer patients (Vaske, C. J., Benz, S. C., Sanborn, J. Z., et al. (2010). Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 26(12), i237-i245.). In the patent application with number US2015006445A1, whether the patient responds appropriately to treatment is determined by analyzing specific metabolic pathways.

The above-mentioned methods treat each pathway in isolation, one at a time. None of the pathway analysis methods in the literature (including Pathifier and Paradigm) takes into account that pathways are part of a large biological network and interact with each other.

As a result, the issues mentioned above have made an innovation in the relevant technical field necessary.

BRIEF DESCRIPTION OF THE INVENTION

The present invention relates to a disease diagnosis system and method that eliminate the above-mentioned disadvantages and bring new advantages to the relevant technical field.

It is an object of the invention to provide a disease diagnosis system and method for diagnosing diseases with increased accuracy.

The present invention is a disease diagnosis system in which the disease of a test individual is predicted based on metabolomic measurements taken from said test individual, in order to accomplish all the objects mentioned above and those that will emerge from the following detailed description. Accordingly, the invention comprises a processor unit, and said processor unit is configured to perform the following steps:

- accessing a metabolic network model stored in a memory unit in which a metabolic network is defined, consisting of a number of metabolic pathways formed by metabolites and metabolic reactions that produce and consume these metabolites,

- accessing a reference database in which the metabolite fold-change values of said individuals are stored in relation to the reactions involving the metabolites for each of the healthy individuals and the diseased individuals associated with at least one disease,

- determining average, minimum, and maximum metabolic flux values for each metabolic pathway in the metabolic network model according to the metabolite fold-change values in the reference database,

- creating a set of metabolic flux values for each individual in the reference database containing individual-specific minimum and maximum metabolic flux values of each metabolic pathway in the metabolic network model,

- generating a set of reference difference values for each individual according to the deviations of said metabolic flux value sets from the average metabolic flux values,

- receiving, as input, test data of the fold-change values of the metabolites in the metabolomic measurements taken from said test individual,

- linking said metabolites to the metabolic pathways that contain their reactions in the database,

- determining a minimum and maximum metabolic flux value for each of the associated metabolic pathways,

- generating test difference values according to the deviations of the determined metabolic flux values from the average metabolic flux values,

- applying a classification algorithm that predicts to which of the sets of reference difference values the generated test difference values belong.

The main novelty of the method described in this specification is that it performs a holistic analysis of the entire metabolic network rather than analyzing each pathway in isolation. The main advantage of this approach is that it allows the detection of key pathways that have no (or few) associated gene/metabolite measurements in the analyzed omics data for a given disease. In contrast, Pathifier and Paradigm can only evaluate and score pathways that have at least a few metabolites in the analyzed metabolomics data set. Therefore, they may miss many important key pathways.

Diseases not only cause significant changes in specific metabolic pathways but also alter the metabolic network as a whole. Detecting a disease by considering the metabolic network as a whole and analyzing the changes in it is not described in the background art. With the system and method of the invention, whether a test individual falls into the category of diseased or healthy individuals is determined by considering the whole metabolic network of healthy and diseased individuals.

A preferred embodiment of the invention is characterized in that the processor unit is configured to display the disease(s) associated with the diseased individual if the test difference values are predicted to belong to a diseased individual.

Another preferred embodiment of the invention is characterized in that the said classification algorithm is a machine-learning based classification algorithm. Another preferred embodiment of the invention is characterized in that the said classification algorithm is a logistic regression algorithm. Thus, the accuracy of disease prediction is further increased.

Another preferred embodiment of the invention is characterized in that it comprises an input/output unit for receiving the test data as input.

The invention is also a computer-based disease diagnosis method in which the disease of a test individual is predicted based on metabolomic measurements taken from said test individual. Accordingly, the method comprises the following steps performed by a processor unit:

- accessing a metabolic network model stored in a memory unit, in which a metabolic network of multiple metabolic pathways is defined by metabolites and their metabolic reactions,

- determining an average metabolic flux value for each metabolic pathway in the metabolic network model according to the metabolite fold-change values in the reference database,

- generating a set of metabolic flux values for each individual in the reference database containing individual-specific metabolic flux values of each metabolic pathway in the metabolic network model,

- generating a set of reference difference values for each individual according to the deviations of said metabolic flux value sets from the average metabolic flux values,

- receiving, as input, test data of the fold-change values of metabolites in metabolomic measurements taken from said test individual, the metabolites being associated with the metabolic pathways that contain their reactions,

- determining a metabolic flux value for each of the metabolic pathways in the test data,

BRIEF DESCRIPTION OF THE FIGURE

Figure 1 shows a representative view of the disease diagnosis system.

Figure 2 shows a representative view of the memory unit.

Figure 3 shows a representative heat map of the difference values of metabolic pathways for diseased and healthy individuals.

Figure 4 shows a representative view of the metabolic network and its components.

DETAILED DESCRIPTION OF THE INVENTION

In this detailed description, the subject matter of the invention is described by way of non-limiting examples, given only for a better understanding.

With reference to Figure 1, the disease diagnosis system (100), which is the subject matter of the invention, comprises a processor unit (110) and a memory unit (120) with which said processor unit (110) is associated to perform read/write operations. Said processor unit (110) may be a microprocessor. The memory unit (120) may comprise a memory capable of storing data permanently, or a suitable combination of memories that store data permanently or temporarily. The system may comprise an input/output unit (130) associated with the processor unit (110) for inputting data to and receiving data from the processor unit (110).

The disease diagnosis system (100) may also comprise a communication unit (150) enabling the processor unit (110) to exchange data with a communication network (200) such as the internet. The disease diagnosis system (100) may comprise a bus (140) that connects the processor unit (110), memory unit (120), input/output unit (130), communication unit (150), and other components known in the art but not mentioned herein, so that data can be transferred between them.

A metabolic network model (121) is stored in the memory unit (120). Figure 4 shows a representative view of a metabolic network (400). The metabolic network model (121) comprises a plurality of interconnected metabolic pathways (410). Metabolic pathways (410) comprise reactions (412) and the metabolites (411) (such as alanine, glutamine, and fatty acids) involved in these reactions (412).

Each reaction (412) in a metabolic pathway (410) can be expressed as a linear equation in terms of the metabolites (411) involved in it. Below are examples of reactions (412) written as such equations:

A ↔ B + C    (Reaction #1)

B + 2C → D    (Reaction #2)

The metabolic network model (121) can also be defined in terms of these reactions (412). More specifically, it can be defined as a stoichiometric matrix S. This matrix may contain, for example, the metabolites (411) in its rows and the reactions (412) in its columns. Each cell holds the stoichiometric coefficient of the corresponding metabolite (411) in the corresponding reaction (412); by the usual convention, the coefficient is negative for a consumed metabolite and positive for a produced one.
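As an illustration only, the S matrix for Reactions #1 and #2 above can be assembled in a few lines of Python with NumPy. The dictionary layout and the helper `stoichiometric_matrix` are hypothetical; Reaction #1 is read as reversible and Reaction #2 as forward (the arrows are partly garbled in this text, but the matrix entries do not depend on reversibility), and the sign convention is the common one of negative coefficients for consumed and positive for produced metabolites.

```python
import numpy as np

# Example reactions from the text; negative = consumed, positive = produced.
METABOLITES = ["A", "B", "C", "D"]
REACTIONS = {
    "R1": {"A": -1, "B": 1, "C": 1},   # Reaction #1: A <=> B + C
    "R2": {"B": -1, "C": -2, "D": 1},  # Reaction #2: B + 2C => D
}


def stoichiometric_matrix(metabolites, reactions):
    """Return S with metabolites as rows and reactions as columns (hypothetical helper)."""
    row_of = {m: i for i, m in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(reactions)))
    for j, coefficients in enumerate(reactions.values()):
        for metabolite, coefficient in coefficients.items():
            S[row_of[metabolite], j] = coefficient
    return S


S = stoichiometric_matrix(METABOLITES, REACTIONS)
print(S)
```

Row C of the resulting matrix is [1, -2]: C is produced once by Reaction #1 and consumed twice by Reaction #2.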

Metabolomic measurement data of healthy individuals and of diseased individuals associated with at least one disease are stored in a reference database (122) in the memory unit (120).

The processor unit (110) executes a disease diagnosis module (123), comprising instructions stored in the memory unit (120), to perform the following steps as an innovative aspect of the invention. It accesses the metabolic network model (121) and the reference database (122). For each metabolic pathway (410) in the metabolic network model (121), average, minimum, and maximum metabolic flux values are determined based on the metabolite fold-change values in the reference database (122). For each individual in the reference database (122), a set of metabolic flux values is generated, including individual-specific minimum and maximum metabolic flux values for each metabolic pathway (410) in the metabolic network model (121). A set of reference difference values is generated for each individual based on the deviations of said metabolic flux value sets from the average metabolic flux values. Test data including the fold-change values of metabolites (411) in metabolomic measurements taken from a test individual are received as input. The metabolites (411) in the test data are associated with the metabolic pathways (410) containing their reactions (412), and a minimum and maximum metabolic flux value is determined for each of these associated metabolic pathways (410). Test difference values are determined based on the deviations of the determined metabolic flux values from the average metabolic flux values. A classification algorithm is applied to the generated test difference values to predict to which set of reference difference values they belong. If the test difference values are determined to belong to a diseased individual's set of reference difference values, the one or more diseases associated with that diseased individual are output by the input/output unit (130).
Thus, the metabolic network is evaluated holistically, the test individual is matched with diseased or healthy individuals, and the disease is diagnosed with increased accuracy.

The calculations behind the steps described above and the resulting prediction are explained in detail below. The flux variables of the reactions in the metabolic network are represented by a vector V. Metabolic flux values are calculated by solving the steady-state equation S × V = 0. For the solution, the equation S × V = 0 is transformed into an optimization problem. The steady-state assumptions are represented as constraints of the optimization problem. The objective function, which is specific to the test individual, maximizes the metabolic flux values of the reactions (412) that produce metabolites observed to increase in the test individual and minimizes the metabolic flux values of the reactions (412) that produce metabolites observed to decrease in the test individual. More specifically, the objective function of the optimization problem is defined in terms of the following quantities:

M: the metabolomic measurement values provided as input (metabolite fold-change values),

m_R: all reactions (412) that produce metabolite (411) m,

m_c: the measured fold-change value of metabolite (411) m,

Flux(R_i): the metabolic flux variable of reaction (412) R_i,

Stoichiometry(R_i, m): the stoichiometry of metabolite (411) m in reaction (412) R_i,

TotalStoichiometry(m): the total stoichiometry of metabolite (411) m in the reactions (412) producing m.
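The exact objective formula is not reproduced in this text. One plausible reading of the definitions above, offered only as an assumption, is a linear objective that maximizes the sum, over the measured metabolites m in M and the reactions R_i in m_R, of m_c × Stoichiometry(R_i, m) / TotalStoichiometry(m) × Flux(R_i). With signed fold-changes (positive for increased, negative for decreased metabolites), maximizing such an objective raises flux through reactions producing increased metabolites and lowers it through reactions producing decreased ones. The sketch below computes the resulting per-reaction weights; the function name and its dictionary input are hypothetical.

```python
import numpy as np


def objective_weights(S, fold_change):
    """Per-reaction objective weights under the ASSUMED weighting scheme.

    S           -- stoichiometric matrix (metabolites x reactions)
    fold_change -- {metabolite row index: signed fold-change m_c}
    """
    S = np.asarray(S, dtype=float)
    c = np.zeros(S.shape[1])
    for m, fc in fold_change.items():
        producing = S[m] > 0             # m_R: reactions with a positive coefficient for m
        total = S[m, producing].sum()    # TotalStoichiometry(m)
        if total > 0:
            # m_c * Stoichiometry(R_i, m) / TotalStoichiometry(m)
            c[producing] += fc * S[m, producing] / total
    return c


# Reactions #1 and #2 (rows A, B, C, D; columns R1, R2)
S = [[-1, 0], [1, -1], [1, -2], [0, 1]]
# Hypothetical measurement: D increased (+1.0), C decreased (-0.5)
c = objective_weights(S, {3: 1.0, 2: -0.5})
```

Here c is [-0.5, 1.0]: Reaction #2, which produces the increased metabolite D, gets a positive weight, and Reaction #1, which produces the decreased metabolite C, gets a negative one. Reversible reactions would need extra handling, since a positive coefficient only means "produces m" in the forward direction.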

To calculate the fold-changes, the average intensities of all metabolites (411) are first calculated from the metabolomic measurements of the healthy individuals held in the memory unit (120). The fold-change value of each metabolite (411) of each individual is then determined from its deviation from this average intensity.
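A minimal sketch of this step, assuming the measurements are arranged as an individuals × metabolites array and that the deviation from the healthy mean is expressed as a log2 ratio. The text only speaks of a deviation from the average; the log2 ratio is an assumption chosen because it is positive for increased and negative for decreased metabolites. The function name `fold_changes` and the `eps` guard are hypothetical.

```python
import numpy as np


def fold_changes(healthy, samples, eps=1e-12):
    """Signed log2 fold-change of each metabolite relative to the healthy mean.

    healthy -- (n_healthy x n_metabolites) measured intensities of healthy individuals
    samples -- (n_samples x n_metabolites) intensities of the individuals to convert
    eps     -- small constant guarding against division by zero (assumption)
    """
    mean_healthy = np.asarray(healthy, dtype=float).mean(axis=0)
    samples = np.asarray(samples, dtype=float)
    return np.log2((samples + eps) / (mean_healthy + eps))


healthy = [[2.0, 4.0], [2.0, 4.0]]
fc = fold_changes(healthy, [[4.0, 2.0]])  # first metabolite doubled, second halved
```

For this toy input the result is approximately [[1.0, -1.0]].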

Thus, the objective function changes dynamically for each individual. Since the number of reactions (412) in the metabolic network is greater than the number of metabolites (411), the system S × V = 0 is underdetermined, and the optimization problem described above may have many alternative optimal solutions. Different alternative solutions may correspond to the cell pursuing different targets. Flux Variability Analysis (FVA) is performed to cover all alternative solutions. The following steps are taken for FVA:

- The value of the objective function is determined by solving the optimization problem.

- The value of the objective function is added as an additional constraint to the optimization problem.

- For a reaction R (412) in the metabolic network, a new objective function is created to maximize the metabolic flux of that reaction, and the optimization problem is solved again. The resulting objective function value is recorded as the upper limit (maximum) of the metabolic flux value of reaction R.

- For the same reaction R (412), the objective function is redefined to minimize its metabolic flux value, and the optimization problem is solved again. The resulting objective function value is recorded as the lower limit (minimum) of the metabolic flux value of reaction R.

These steps are repeated for each reaction (412) in the metabolic network and the metabolic flux value limits of the reactions (412) are determined.
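The four FVA steps above can be sketched with SciPy's `linprog` using the HiGHS solver. This is an illustrative sketch, not the applicant's implementation: the toy network, its flux bounds, the hand-set objective weights, and the small tolerance `tol` used when the optimal objective value is added back as a constraint are all assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def flux_variability(S, c, bounds, tol=1e-9):
    """Flux Variability Analysis: min/max flux of every reaction at the optimum.

    S      -- stoichiometric matrix (metabolites x reactions), steady state S v = 0
    c      -- objective weights; the primary problem maximizes c @ v
    bounds -- (lower, upper) flux bound for each reaction
    """
    S = np.asarray(S, dtype=float)
    c = np.asarray(c, dtype=float)
    n = S.shape[1]
    b_eq = np.zeros(S.shape[0])

    # Step 1: solve the primary problem (linprog minimizes, so negate c).
    primary = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if not primary.success:
        raise RuntimeError(primary.message)
    z_opt = -primary.fun

    # Step 2: add the optimum as a constraint, c @ v >= z_opt - tol.
    A_ub = -c.reshape(1, -1)
    b_ub = np.array([-(z_opt - tol)])

    lower, upper = np.empty(n), np.empty(n)
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        # Step 3: maximize the flux of reaction i (minimize -v_i).
        hi = linprog(-e_i, A_ub=A_ub, b_ub=b_ub, A_eq=S, b_eq=b_eq,
                     bounds=bounds, method="highs")
        # Step 4: minimize the flux of reaction i.
        lo = linprog(e_i, A_ub=A_ub, b_ub=b_ub, A_eq=S, b_eq=b_eq,
                     bounds=bounds, method="highs")
        upper[i], lower[i] = -hi.fun, lo.fun
    return z_opt, lower, upper


# Hypothetical toy network. Rows: A, B, C, D.
# Columns: EX_A (-> A), R1 (A -> B + C), R2 (B + 2C -> D), R3 (A -> D),
#          EX_B (B ->), EX_C (-> C), EX_D (D ->)
S = [
    [1, -1,  0, -1,  0, 0,  0],
    [0,  1, -1,  0, -1, 0,  0],
    [0,  1, -2,  0,  0, 1,  0],
    [0,  0,  1,  1,  0, 0, -1],
]
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000), (0, 10), (0, 1000)]
# Assume D was measured as increased: weight the two reactions that produce D.
c = [0, 0, 1, 1, 0, 0, 0]

z_opt, lower, upper = flux_variability(S, c, bounds)
```

In this toy network the optimum (10) can be reached through R2 or R3 in any proportion, so FVA reports the range [0, 10] for R1, R2, R3, and EX_C, while EX_A and EX_D are fixed at 10 and EX_B at 0. Such alternative optima are exactly what the averaging of lower and upper limits is meant to summarize.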

After FVA, the average of the lower and upper limits of each reaction's metabolic flux value is calculated, giving the mean metabolic flux values. For a test individual, the lower and upper limits of the metabolic flux values are determined from that individual's metabolomic measurements, the average of these limits is calculated, and a metabolic flux value is determined for each metabolic pathway (410). Test difference values are then generated from the difference between the metabolic flux values determined for the test individual and the average metabolic flux values.
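A sketch of this step under two assumptions that the text does not state: each reaction's representative flux is the midpoint of its FVA range, and a pathway's flux value is the mean of the midpoints of the reactions assigned to it. The pathway names, the reaction-to-pathway mapping, the reference values, and all helper functions are hypothetical.

```python
import numpy as np


def pathway_flux(lower, upper, pathways):
    """Mean of the reaction FVA midpoints within each pathway (assumed aggregation)."""
    midpoint = (np.asarray(lower, dtype=float) + np.asarray(upper, dtype=float)) / 2.0
    return {name: float(midpoint[idx].mean()) for name, idx in pathways.items()}


def average_pathway_flux(flux_sets):
    """Average pathway flux over the individuals of the reference database."""
    return {p: float(np.mean([fs[p] for fs in flux_sets])) for p in flux_sets[0]}


def difference_values(flux_set, average):
    """Deviation of one individual's pathway fluxes from the average."""
    return {p: flux_set[p] - average[p] for p in average}


# Hypothetical reaction-to-pathway mapping (reaction column indices).
pathways = {"pathway_1": [1, 2], "pathway_2": [3, 6]}
test = pathway_flux([10, 0, 0, 0, 0, 0, 10], [10, 10, 10, 10, 0, 10, 10], pathways)
reference = [{"pathway_1": 3.0, "pathway_2": 7.5}, {"pathway_1": 5.0, "pathway_2": 7.5}]
diff = difference_values(test, average_pathway_flux(reference))
```

With these invented inputs, the test individual's pathway fluxes are 5.0 and 7.5, the reference averages are 4.0 and 7.5, and the test difference values are therefore 1.0 and 0.0.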

In a possible embodiment of the invention, a statistical significance analysis of these difference values is carried out using the difference values of diseased and healthy individuals, so that the main metabolic pathway (410) changes associated with the disease can be detected. For each metabolic pathway (410), statistical significance is determined by associating the test difference value with an F-value and a P-value calculated by analysis of variance (ANOVA).
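The per-pathway significance test can be sketched with SciPy's one-way ANOVA, `scipy.stats.f_oneway`. The grouping (one group per class of individual), the array layout, and the function name are assumptions, and the numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import f_oneway


def pathway_anova(differences_by_group):
    """One-way ANOVA per pathway.

    differences_by_group -- {group label: (n_individuals x n_pathways) difference values}
    Returns a list of (F-value, P-value) pairs, one per pathway.
    """
    groups = [np.asarray(g, dtype=float) for g in differences_by_group.values()]
    results = []
    for j in range(groups[0].shape[1]):
        res = f_oneway(*(g[:, j] for g in groups))
        results.append((float(res.statistic), float(res.pvalue)))
    return results


# Invented difference values for two pathways.
differences = {
    "healthy":  [[0.1, 0.0], [-0.2, 0.2], [0.0, -0.1]],
    "diseased": [[1.0, 0.1], [1.2, -0.1], [0.9, 0.05]],
}
anova_results = pathway_anova(differences)
```

In this toy data only the first pathway separates the two groups, so it receives a large F-value and a small P-value, whereas the second pathway does not reach significance.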

Following these procedures, a machine-learning-based classification algorithm is trained using the difference values calculated for the diseased and healthy individuals. This algorithm predicts whether the result of a metabolomic analysis, i.e., the test difference values and the significance values associated with them, belongs to a diseased or a healthy individual. More specifically, it determines which of the reference difference value sets of diseased and healthy individuals are most similar to the result. If the result belongs to a diseased individual, the disease associated with that diseased individual is displayed on the input/output unit (130). In this possible embodiment, the classification algorithm is logistic regression.
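The text names logistic regression but not a library or training procedure. The sketch below is a plain NumPy binary logistic regression (healthy = 0, diseased = 1) trained by batch gradient descent on reference difference values; the learning rate, number of epochs, L2 penalty, helper names, and toy data are all assumptions. A setting with several diseases would need a multiclass variant.

```python
import numpy as np


def _with_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.hstack([np.ones((X.shape[0], 1)), X])


def train_logistic_regression(X, y, lr=0.5, epochs=2000, l2=1e-3):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    Xb, y = _with_bias(X), np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probability of "diseased"
        grad = Xb.T @ (p - y) / len(y)      # gradient of the mean log-loss
        grad[1:] += l2 * w[1:]              # L2 penalty; the bias is not penalized
        w -= lr * grad
    return w


def predict_proba(w, X):
    """Probability that each row of difference values belongs to a diseased individual."""
    return 1.0 / (1.0 + np.exp(-_with_bias(X) @ w))


# Invented reference difference values for two pathways.
X_ref = [[0.1, -0.1], [0.0, 0.05], [-0.1, 0.0],   # healthy
         [1.0, 0.8], [1.1, 0.9], [0.9, 0.7]]      # diseased
y_ref = [0, 0, 0, 1, 1, 1]
w = train_logistic_regression(X_ref, y_ref)
p_test = predict_proba(w, [[0.95, 0.75], [0.05, 0.0]])
```

The first test vector lies near the diseased reference sets and receives a probability above 0.5; the second lies near the healthy sets and receives a probability below 0.5. In practice a maintained implementation such as scikit-learn's logistic regression would normally replace this hand-written loop.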

Figure 3 shows the heat map of the difference values of the metabolic pathways (410) of healthy and diseased individuals. Values for healthy individuals are marked with a circumscribed symbol. Diseased individuals have metabolic pathways whose difference values are clearly distinct from those of healthy individuals.

In a possible embodiment, the metabolomic measurements of each test individual are recorded in the disease database, i.e., the reference database (122), according to the analysis result, so that the reference database (122) is continuously updated. As an example, Table 1 shows the difference values of the metabolic pathways (410) of a patient with breast cancer, together with the F- and P-values for the statistical significance of these difference values.

Table 1:

In a possible embodiment of the invention, the disease diagnosis system (100) is provided on a server. The server is connected to a communication network (200), receives as input the test data from client devices (300) connected to the same communication network (200), and determines whether the test data belong to a diseased or a healthy individual. Beyond diagnosis, the obtained analysis values can be used to determine possible diseases that the measured individual may develop; to identify the main metabolic mechanisms of a disease and present them visually to health researchers and clinicians; to plan appropriate personalized treatments according to the main metabolic mechanism of the disease; to perform comparative analyses of the metabolic mechanisms of diseases and, on that basis, to recommend to medical and drug researchers the use of an existing treatment for another disease with the same or a similar mechanism; and to recommend which proteins could be metabolic drug targets for drugs to be developed against diseases that do not yet have drug treatments.

The scope of protection of the invention is set forth in the annexed claims and cannot be limited to the exemplary explanations in this detailed description. It is evident that one skilled in the art can make similar embodiments in light of the explanations above without departing from the main theme of the invention.

REFERENCE NUMBERS IN THE FIGURE

100: Disease diagnosis system

110: Processor unit

120: Memory unit

121: Metabolic network model

122: Reference database

123: Disease diagnosis module

130: Input/output units

140: Bus

150: Communication unit

200: Communication network

300: Client device

400: Metabolic network

410: Metabolic pathway

411: Metabolite

412: Reaction

Claims

1. A disease diagnosis system (100) in which the disease of a test individual is predicted based on metabolomic measurements taken from the test individual, characterized in that it comprises a processor unit (110), and said processor unit (110) is configured to perform the following steps:

- accessing a metabolic network model (121) stored in a memory unit (120) in which a metabolic network (400) of multiple metabolic pathways (410) is defined by metabolites (411) and the metabolic reactions (412) of the metabolites (411),

- accessing a reference database (122) in which the metabolite fold-change values of said individuals are stored in relation to the reactions (412) involving the metabolites (411) for each of the healthy individuals and the diseased individuals associated with at least one disease,

- determining average, minimum, and maximum metabolic flux values for each metabolic pathway (410) in the metabolic network model according to the metabolite fold-change values in the reference database (122),

- generating a set of metabolic flux values for each individual in the reference database comprising individual-specific metabolic flux values of each metabolic pathway (410) in the metabolic network model (121),

- receiving, as input, the test data of the fold-change values of metabolites (411) in metabolomic measurements taken from said test individual, the metabolites (411) being associated with the metabolic pathways (410) containing the reactions (412) of said metabolites (411),

- determining a metabolic flux value for each of the metabolic pathways (410) in the test data,

2. A disease diagnosis system (100) for predicting the disease of a patient based on metabolomic measurements in accordance with Claim 1, characterized in that the processor unit (110) is configured to display the disease(s) with which the diseased individual is associated, if the test difference values are predicted to belong to a diseased individual.

3. A disease diagnosis system (100) for predicting the disease of a patient based on metabolomic measurements in accordance with Claim 1 characterized in that said classification algorithm is a machine-learning based classification algorithm.

4. A disease diagnosis system (100) for predicting the disease of a patient based on metabolomic measurements in accordance with Claim 1, characterized in that said classification algorithm is a logistic regression algorithm.

5. A disease diagnosis system (100) for predicting the disease of a patient based on metabolomic measurements in accordance with Claim 1 characterized in that it comprises an input/output unit (130) for receiving the test data as input.

6. A computer-based disease diagnosis method in which a test individual's disease is predicted based on metabolomic measurements taken from said test individual, characterized in that it comprises the following steps performed by a processor unit (110):

- accessing a metabolic network model (121) stored in a memory unit (120), in which a metabolic network comprising a plurality of metabolic pathways (410) is defined by metabolites (411) and the metabolic reactions (412) in which the metabolites (411) are involved,

- accessing a reference database (122) in which the metabolite (411) fold-change values of said individuals are stored in relation to the reactions (412) involving the metabolites (411) for each of the healthy individuals and the diseased individuals associated with at least one disease,

- determining an average metabolic flux value for each metabolic pathway (410) in the metabolic network model according to the metabolite (411) fold-change values in the reference database (122),

- generating a set of metabolic flux values for each individual in the reference database comprising individual-specific minimum and maximum metabolic flux values of each metabolic pathway (410) in the metabolic network model (121),

- receiving, as input, test data of the fold-change values of the metabolites (411) in the metabolomic measurements taken from said test individual,

- associating the metabolites (411) in the test data with the metabolic pathways (410) that contain the reactions (412) of said metabolites and are stored in the memory unit (120),

- determining a minimum and maximum metabolic flux value for each of the metabolic pathways (410) in the test data,