
WO2022162440A1 - Method for predictive analysis of a biological system - Google Patents


Info

Publication number
WO2022162440A1
Authority
WO
WIPO (PCT)
Prior art keywords
biological system
metabolic
phenotype
biological
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2021/055527
Other languages
French (fr)
Inventor
Mauro DI NUZZO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netabolics Srl
Original Assignee
Netabolics Srl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netabolics Srl
Publication of WO2022162440A1
Anticipated expiration
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/30 Dynamic-time models
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Definitions

  • This invention relates to the technical sector of bioinformatics and biological logic.
  • the invention relates to a method for the predictive analysis of a biological system, especially with regard to processes involving cellular metabolism.
  • the most commonly used approaches rely on models of non-equilibrium stationary phenotypes (that is, models without time dynamics). These include constraint-based models of metabolic reactions, the best known of which is Flux Balance Analysis (FBA), and its variants (for example, dynamic Flux Balance Analysis (dFBA), Thermodynamic-Based Flux Analysis (TFA), Group Contribution Model (GCM), Elementary Flux Modes Analysis (EFM), Flux Variability Analysis (FVA) and others), as well as network-based models of intracellular signalling, such as Pathway Analysis (PA) and its variants (Over Representation Analysis (ORA), Functional Class Scoring (FCS), Pathway Topology-Based Analysis (PTB) and others).
  • FBA Flux Balance Analysis
  • dFBA dynamic Flux Balance Analysis
  • TFA Thermodynamic-Based Flux Analysis
  • GCM Group Contribution Model
  • EFM Elementary Flux Modes Analysis
  • FVA Flux Variability Analysis
  • PA Pathway Analysis
  • in these models, each chemical reaction or biological interaction is assigned a numerical value that is independent of time.
  • the experimental data are commonly provided as comparative measurements of two or more phenotypes, such as treated/untreated, healthy/ill, or the like.
  • these techniques yield no information about the transition between phenotypes, which may have a variable duration (from a few seconds to several years) and could indicate which elements of the living system take part in it.
  • in kinetic models, by contrast, each of the chemical reactions or biological relations that characterise cellular metabolism is represented by a time-dependent parametric equation rather than by a stationary value.
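In constraint-based models such as FBA, the time-independent value assigned to each reaction is a flux. That flux vector must satisfy the steady-state mass balance S·v = 0, where S is the stoichiometric matrix. The sketch below checks only that constraint, for a hypothetical three-reaction pathway. It does not perform the linear-programming optimisation that FBA adds on top, and the network is an illustrative assumption.

```python
# Toy stoichiometric matrix S (rows: metabolites, columns: reactions) for a
# hypothetical linear pathway R1: -> A, R2: A -> B, R3: B ->
STOICHIOMETRY = [
    [1, -1, 0],  # metabolite A: produced by R1, consumed by R2
    [0, 1, -1],  # metabolite B: produced by R2, consumed by R3
]


def mass_balance(s, v):
    """Return S·v, the net production rate of each metabolite."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in s]


def is_steady_state(s, v, tol=1e-9):
    """A stationary flux vector is admissible only if S·v = 0 (within tol)."""
    return all(abs(x) <= tol for x in mass_balance(s, v))


print(is_steady_state(STOICHIOMETRY, [2.0, 2.0, 2.0]))  # True
print(is_steady_state(STOICHIOMETRY, [2.0, 1.0, 1.0]))  # False: A accumulates
```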
  • the experimental data can be acquired continuously, which makes it possible to follow the time development of a treatment, a pathology, or the like.
  • Kinetic models are the best tool for simulating the metabolism of a biological system from a biochemical point of view, since they allow, for example, for the concentrations of various components and their rate of variation to be predicted.
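A kinetic model produces a time course rather than a single stationary value. The following minimal sketch shows this for one hypothetical irreversible enzyme reaction S -> P with Michaelis-Menten kinetics, integrated with the explicit Euler method. The parameter values, units and step size are illustrative assumptions; a production model would normally use an adaptive ODE solver.

```python
def michaelis_menten_rate(s, vmax, km):
    """Irreversible Michaelis-Menten rate law: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def simulate_s_to_p(s0, p0, vmax, km, dt, steps):
    """Explicit Euler integration of dS/dt = -v(S), dP/dt = +v(S).

    Returns a list of (t, S, P) tuples, i.e. a time course of the
    concentrations and not a single stationary value.
    """
    s, p = s0, p0
    trajectory = [(0.0, s, p)]
    for i in range(1, steps + 1):
        v = michaelis_menten_rate(s, vmax, km)
        s -= v * dt
        p += v * dt
        trajectory.append((i * dt, s, p))
    return trajectory


# Hypothetical parameters (arbitrary units); dt is small enough for stability.
traj = simulate_s_to_p(s0=10.0, p0=0.0, vmax=1.0, km=0.5, dt=0.01, steps=1000)
for t, s, p in traj[::250]:
    print(f"t={t:5.2f}  S={s:6.3f}  P={p:6.3f}")
```

Because S and P change by the same amount at each step, S + P is conserved, which is a quick sanity check on the balance equations.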
  • the technical purpose which forms the basis of the invention is to provide a method for the predictive analysis of a biological system which overcomes at least some of the above-mentioned drawbacks of the prior art.
  • the aim of the invention is to provide a method for the predictive analysis of a biological system that can identify and define, particularly quickly and efficiently, a final phenotype of at least one component of the biological system to be analysed.
  • the invention describes a method which can be implemented by a computer for analysing a biological system.
  • the method is executed by acquiring a plurality of data representing an initial phenotype of the biological system.
  • the initial phenotype defines one or more features and values representative of at least one component of the biological system at the start of the analysis.
  • a metabolic network of the biological system is constructed according to the plurality of data.
  • the metabolic network represents the components of interest of the biological system, the relations between them and any sequence of chemical reactions to which these components are subject.
  • a time development of the metabolic network is then simulated in such a way as to identify a final phenotype of the biological system.
  • the final phenotype defines one or more features and values representing at least one component of the biological system after a predetermined period of time or upon reaching predetermined conditions.
  • a report that represents and uniquely identifies the final phenotype obtained from the simulation is then generated and provided to a user.
  • the proposed method yields more, and more precise, information than prior-art techniques because it assesses the time development of the system: besides the individual components and their specific features, it also assesses their mutual interactions as they develop over time.
  • FIG. 3 shows in general terms some of the steps that lead to the generation of the report associated with the final phenotype of the biological system to be analysed;
  • FIGS. 7 and 8 show in more detail block diagrams identifying some specific analysis procedures that can be performed;
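The steps summarised above (acquire the initial phenotype, build the network, simulate until a final phenotype, report) can be outlined as a small driver loop. This is a minimal sketch under stated assumptions, not the patented implementation. The `Phenotype` class, the `step` function and the one-component decay model are hypothetical placeholders. The stop rule mirrors the two termination criteria in the text: a predetermined period of time (`max_steps * dt`) or predetermined conditions (`stop`).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

State = Dict[str, float]


@dataclass
class Phenotype:
    """Values of the components of interest (e.g. concentrations) at time t."""
    t: float
    values: State


def simulate_to_final_phenotype(
    initial: Phenotype,
    step: Callable[[State, float], State],
    dt: float,
    max_steps: int,
    stop: Optional[Callable[[Phenotype], bool]] = None,
) -> Phenotype:
    """Advance the model until max_steps * dt has elapsed or the
    predetermined condition `stop` is met, whichever comes first."""
    current = initial
    for _ in range(max_steps):
        if stop is not None and stop(current):
            break
        current = Phenotype(current.t + dt, step(current.values, dt))
    return current


def make_report(initial: Phenotype, final: Phenotype) -> Dict[str, object]:
    """Summarise the final phenotype and its change from the initial one."""
    return {
        "t_final": final.t,
        "final_values": dict(final.values),
        "changes": {k: final.values[k] - v for k, v in initial.values.items()},
    }


# Hypothetical one-component model: first-order decay of component "A".
def decay(values: State, dt: float) -> State:
    return {"A": values["A"] - 1.0 * values["A"] * dt}


start = Phenotype(0.0, {"A": 1.0})
final = simulate_to_final_phenotype(start, decay, dt=0.1, max_steps=100,
                                    stop=lambda ph: ph.values["A"] < 0.5)
print(make_report(start, final))
```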
  • FIG. 9 shows a different representation of the logic flows of the processes which can be performed using one of the components of the apparatus.
  • the method which can be implemented by a computer according to the invention makes it possible to perform an accurate analysis of a biological system, in particular with reference to its metabolic processes and the variations in its phenotype.
  • phenotype is used to mean the set of all the features manifested in the biological system, that is, by way of example and without limiting the scope of the invention, its morphology, its development, its biochemical and physiological properties, the concentrations of all the substances (metabolites) present inside it and the set of enzymes expressed and their characteristics in terms of levels of expression and kinetic properties.
  • the biological system may be, for example, an organotypic culture of tissues or organs, a cellular culture (such as a suspension of individual cells), a biological fluid (blood, saliva, etc.), bioptic fragments of organs or tissues, or organs (cerebral, thymic, hepatic, pancreatic, pulmonary, intestinal, renal, and others).
  • the method is performed by acquiring a plurality of data which contain information by means of which it is possible to represent an initial phenotype of the biological system.
  • the acquired data define the features of the biological system as completely as possible and can preferably comprise: genomic data 201, transcriptomic data 202, proteomic data 203, metabolomic data 204, flow data 205 and biometric data.
  • the data may be acquired in the form of tabulated data which can be obtained from suitable databases 200 and public repositories, preferably databases 200 available and accessible by the Internet.
  • Examples of available databases 200 from which the tabulated data of interest can be acquired include, in particular: Human Metabolome Database (HMDB, https://www.hmdb.ca), Kinetic MOdels of biological SYStems (KiMoSys, https://kimosys.org), Biochemical Reaction Kinetic Database (SABIO-RK, http://sabiork.h-its.org), BRaunschweig ENzyme DAtabase (BRENDA, https://www.brenda-enzymes.org), Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/), Virtual Metabolic Human (VMH, https://www.vmh.life), Interaction Reference Index (iRefIndex, https://irefindex.vib.be).
  • HMDB Human Metabolome Database
  • KiMoSys Kinetic MOdels of biological SYStems
  • SABIO-RK Biochemical Reaction Kinetic Database
  • the data of interest may also be acquired as a plurality of experimental data 207, likewise representing the initial phenotype, through a digital twin 206 of the biological system.
  • the digital twin 206 is a virtual representation of a physical entity, in this case the biological system, and constitutes an actual digital replica of the phenotype and metabolism of the replicated biological system.
  • the digital twin 206 can provide the experimental data 207, such as, for example, laboratory measurements, by means of communication protocols 208 which allow the connection, if necessary in real time, between the digital twin 206 and the computer which implements the method described here.
  • the method can generate the digital twin 206 of the sample of interest, for example on the same computer that performs the method.
  • the generation of the digital twin 206 is performed by activating at least one sensor configured to measure at least one parameter of the biological system.
  • This parameter provides information concerning at least one feature of the biological system as a function of which the digital twin 206 is generated.
  • the digital twin 206 defines a dynamic computational representation of the biological system and can use advanced technologies, for example virtual or augmented reality and IoT (Internet of Things) sensors, to monitor, share and update the experimental data 207 dynamically (that is, data that evolve over time), by means of suitable communication protocols 208, between the biological system, in terms of laboratory measurements, and its virtual model.
  • the time scale for the data acquisition, data communication, and the updating of the model may vary from seconds, or fractions of seconds, up to hours, or multiples of hours.
  • the plurality of data acquired from databases 200 and/or digital twin 206 is used to construct the metabolic network 209 of the biological system.
  • the metabolic network 209 identifies the components of interest of the biological system, their specific features, their correlations and the rate of change of these quantities over time.
  • the components of the biological system may comprise enzymes, receptors, ion channels, transport proteins, hormones, growth factors, DNA/RNA, ribosomes and, in general, any metabolite produced by or interacting with the biological system.
  • the metabolic network 209 is validated in terms of its completeness, for example using algorithms known in the prior art for gap-filling or functional annotation of the encoded regions of the genome of the biological system (obtained from the database 200 or calculated on the basis of the information provided by the digital twin 206).
  • the method simulates a time development of the metabolic network 209 in such a way as to identify a final phenotype of the biological system.
  • the method performs a simulation 222 of the metabolic system model which makes it possible to rapidly define the behaviour of the biological system over time, identifying one or more features associated with a final phenotype.
  • the term final phenotype means a status calculated according to one or more conditions of interest, which may be time conditions (for example, a predetermined time interval) or physiological conditions of the biological system (the reaching or manifestation of predetermined features in the phenotype or in the components of the sample).
  • the biological system comprises a plurality of biological targets 705, each defining a condition of the respective component, and the final phenotype is a set of biological targets which, considered as a whole, provide information on the final status of the individual components of interest of the biological system.
  • the accuracy of the simulation 222 is controlled and validated.
  • the acquisition of the experimental data 207 is preferably performed in real time and the simulation 222 of the time development of the metabolic network 209 is validated by means of a comparison with the experimental data 207 and, if necessary, with tabulated data which are periodically acquired.
  • the validation of the simulation 222 with respect to its coupling with the digital twin 206 may be assessed by means of quantitative indicators drawn, by way of non-limiting example, from publications such as those of NAFEMS (National Agency for Finite Element Methods and Standards) on best practice in modelling, analysis and simulation, or from the guidelines of the ISO/IEC Guide to the Expression of Uncertainty in Measurement (GUM).
  • NAFEMS National Agency for Finite Element Methods and Standards
  • the simulation 222 can be regenerated whenever the acquired data (experimental or tabulated) deviate from it by a percentage greater than a predetermined threshold.
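The regeneration rule above reduces to a percentage comparison between simulated and acquired values. The following sketch assumes that both are available as name-to-value dictionaries and that any single component exceeding the threshold triggers regeneration. Both the aggregation rule and the example values are assumptions made for illustration.

```python
from typing import Dict


def percent_deviation(simulated: float, measured: float) -> float:
    """Relative deviation of a simulated value from a measurement, in percent."""
    if measured == 0:
        raise ValueError("percent deviation is undefined for a zero measurement")
    return abs(simulated - measured) / abs(measured) * 100.0


def needs_regeneration(simulated: Dict[str, float],
                       measured: Dict[str, float],
                       threshold_pct: float) -> bool:
    """True if any component present in both dicts deviates by more than
    threshold_pct percent, i.e. the simulation should be regenerated."""
    return any(
        percent_deviation(simulated[name], value) > threshold_pct
        for name, value in measured.items()
        if name in simulated
    )


sim = {"glucose": 5.0, "lactate": 1.1}   # hypothetical simulated values, mM
meas = {"glucose": 5.0, "lactate": 1.0}  # hypothetical measured values, mM
print(needs_regeneration(sim, meas, threshold_pct=5.0))   # True (lactate ~10 %)
print(needs_regeneration(sim, meas, threshold_pct=15.0))  # False
```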
  • the step of constructing the metabolic network 209 of the biological system and/or the step of simulating the time development of the metabolic network 209 are performed as a function of a sub-set of the plurality of data.
  • the accuracy of the simulation 222 is ensured by applying Data Reduction methods, for example machine learning techniques based on Principal Component Analysis (PCA), Generative Adversarial Networks (GAN), Gaussian process modelling, or other methods that compensate for the inaccuracies arising from the use of fewer items of information to construct the metabolic network 209.
  • PCA Principal Component Analysis
  • GAN Generative Adversarial Networks
  • the procedure described above may also be performed with a larger quantity of data, or by reintegrating previously rejected data, provided that the execution times of the above-mentioned steps remain below a further predetermined time interval.
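PCA is one of the Data Reduction methods named above. The following sketch illustrates the idea in pure Python. It keeps only the leading principal component, found by power iteration on the sample covariance matrix, and reduces each sample to one score along it. The data are hypothetical. In practice a linear-algebra library would usually be used, and several components would be retained.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract the column means so each feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows: Matrix, iters: int = 200,
                              seed: int = 0) -> List[float]:
    """Leading eigenvector of the covariance matrix, via power iteration."""
    cov = covariance(center(rows))
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in cov]  # positive random start vector
    for _ in range(iters):
        nxt = [sum(c * x for c, x in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def project(rows: Matrix, component: List[float]) -> List[float]:
    """Reduce each sample to a single score along the component."""
    return [sum(a * b for a, b in zip(r, component)) for r in center(rows)]


# Hypothetical measurements of two strongly correlated features.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
pc1 = first_principal_component(data)
print(pc1, project(data, pc1))
```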
  • a report is generated representing and identifying the final phenotype.
  • a document is generated which identifies the features of the final phenotype preferably identifying at least one biological target 705, even more preferably a feature of the biological target 705 such as, for example, a concentration, in the final phenotype.
  • Figure 3 shows a diagram for implementing the method according to the invention, proceeding from the reconstruction of the metabolic network 209, based on the data acquired for the biological system to be analysed, to the simulation 222.
  • the metabolic network 209 is represented in terms of a problem of satisfying constraints 210 (indicated in the drawings for simplicity also as “constraints 210”), which may include matrix representations known in the prior art, such as the stoichiometric matrix. This representation allows the formalisation and the resolution of the problem and the sampling of the metabolic flows 212, using, for example, artificial centering hit-and-run techniques (ACHR) or constraint logic programming (CLP).
  • ACHR artificial centering hit-and-run techniques
  • CLP constraint logic programming
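Sampling of steady-state fluxes, as in item 212 above, can be illustrated with a plain hit-and-run sampler. The text names ACHR, which refines this scheme by drawing directions from the centre of previously sampled points. The sketch uses isotropic random directions instead, so it is a simpler relative of ACHR and not ACHR itself. The branched network, the hand-written null-space basis of S and the flux bounds are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def combine(basis: List[Vector], z: Vector) -> Vector:
    """Flux vector v = sum_k z[k] * basis[k], a point in the null space of S."""
    return [sum(zk * b[i] for zk, b in zip(z, basis)) for i in range(len(basis[0]))]


def hit_and_run(basis: List[Vector], lb: Vector, ub: Vector, z0: Vector,
                n_samples: int, seed: int = 0) -> List[Vector]:
    """Sample steady-state flux vectors uniformly within lb <= v <= ub.

    Every point is built from the null-space basis, so S·v = 0 holds by
    construction; the bounds limit the feasible segment along each direction.
    """
    rng = random.Random(seed)
    z = list(z0)  # must be a strictly feasible starting point
    samples = []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, 1.0) for _ in z]  # random isotropic direction
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]
        v, u = combine(basis, z), combine(basis, d)
        t_min, t_max = -math.inf, math.inf
        for vi, ui, lo, hi in zip(v, u, lb, ub):
            if abs(ui) < 1e-12:
                continue  # this flux does not change along d
            a, b = (lo - vi) / ui, (hi - vi) / ui
            t_min, t_max = max(t_min, min(a, b)), min(t_max, max(a, b))
        t = rng.uniform(t_min, t_max)  # uniform point on the feasible chord
        z = [zi + t * di for zi, di in zip(z, d)]
        samples.append(combine(basis, z))
    return samples


# Hypothetical branched network: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
S = [[1, -1, -1, 0, 0],
     [0, 1, 0, -1, 0],
     [0, 0, 1, 0, -1]]
NULL_BASIS = [[1, 1, 0, 1, 0],  # route through B
              [1, 0, 1, 0, 1]]  # route through C
samples = hit_and_run(NULL_BASIS, lb=[0.0] * 5, ub=[10.0] * 5,
                      z0=[2.0, 3.0], n_samples=500)
print(samples[-1])
```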
  • the concentration intervals 403 are used to determine the solutions that maximise the distribution of the metabolic flows 213 determined previously, thus yielding the distribution of the concentrations 216 of the metabolites, that is, of the components of interest of the biological system.
  • The operating principle underlying the machine learning algorithms used to determine the distribution of the metabolic flows 213 is schematically illustrated in Figure 4: the data from the database 200, together with the properties of the metabolites 401, are provided as input to an artificial neural network 402 defined by the optimisation engine 211.
  • the properties of the metabolites 401 contribute to identifying the biological targets and can include physicochemical properties, such as molecular weight, hydrophobicity, electric charge and others, or quantities calculated from the participation of the metabolites in metabolic pathways, for example derived from analysis of the topology of the metabolic networks 209.
  • biological targets defined by one or more properties, of which those listed above are possible examples, are identified and supplied to the neural network 402.
  • the neural network 402 provides as output the ranges 403 of the concentrations of metabolites whose properties have been provided to the neural network 402.
  • the intervals 403, together with the constraints 210 defined by the reformulation of the metabolic network 209, are passed to a non-linear optimiser 404 that minimises an objective function 405, in order to obtain a preliminary estimate of the concentrations 406 of the metabolites.
  • the estimates 406 are supplied as input to a generative adversarial network comprising a first neural network (generator) 407 and a second neural network (discriminator) 411.
  • the generator 407 transforms the concentrations 406 into a code 408 of flows and parameters.
  • the code 408 is used by a computer 409 to generate (constructed) concentrations 410 through rate equations, which can be based on Michaelis-Menten kinetics, mass action, power-law, LinLog and other forms, in the context of so-called hybrid models.
  • the estimated concentrations 406 and those constructed 410 by the generator 407 are supplied at input 412 to the discriminator 411.
  • the discriminator 411 constructs a prediction of the flows and, from the comparison of the concentrations, modifies the generator network 407 by back-propagation of the error 414 (competitive training), as shown in the “prediction + feedback” block 413 of the diagram of Figure 4.
  • the distribution of the metabolic flows 213 may be optimised by comparing with the result produced by ACHR, CLP algorithms and others.
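The rate-equation families listed above (mass action, Michaelis-Menten, power-law, LinLog) have standard textbook forms. The functions below write those forms out. The parameter names and example values are illustrative assumptions. The text does not specify which form is applied to which reaction or how the parameters are encoded in the code 408.

```python
import math
from typing import Sequence


def mass_action(k: float, concentrations: Sequence[float],
                orders: Sequence[float]) -> float:
    """Mass action: v = k * prod(x_i ** n_i)."""
    v = k
    for x, n in zip(concentrations, orders):
        v *= x ** n
    return v


def michaelis_menten(vmax: float, km: float, s: float) -> float:
    """Michaelis-Menten: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def power_law(alpha: float, concentrations: Sequence[float],
              kinetic_orders: Sequence[float]) -> float:
    """Power-law (GMA / S-system) rate: v = alpha * prod(x_i ** g_i),
    where the kinetic orders g_i may be non-integer."""
    return mass_action(alpha, concentrations, kinetic_orders)


def linlog(j_ref: float, e_ratio: float, concentrations: Sequence[float],
           ref_concentrations: Sequence[float],
           elasticities: Sequence[float]) -> float:
    """Lin-log: v = J_ref * (e / e_ref) * (1 + sum eps_i * ln(x_i / x_i_ref))."""
    return j_ref * e_ratio * (1.0 + sum(
        eps * math.log(x / x_ref)
        for eps, x, x_ref in zip(elasticities, concentrations, ref_concentrations)))


# Hypothetical parameter values, arbitrary units.
print(mass_action(2.0, [2.0, 3.0], [1, 1]),
      michaelis_menten(1.0, 0.5, 0.5),
      power_law(1.5, [4.0], [0.5]),
      linlog(2.0, 1.0, [2.0], [1.0], [0.5]))
```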
  • the method further comprises the step of determining a time dynamic of the concentrations of the components of the biological system, identified in the drawings as a distribution of the concentrations 216, as a function of the time dynamics of the metabolic flows.
  • the generation of the simulation 222 is performed also as a function of the time dynamics of the concentrations of the components of the biological system.
  • the distribution of the metabolic flows 213 is supplied as input to a generative adversarial network comprising a first neural network (generator) 501 and a second neural network (discriminator) 505.
  • the generator 501 transforms the distribution of the metabolic flows 213 into a code 502 of concentrations and parameters.
  • the code 502 is used by a flow calculator 503, through rate equations, similar to those identified above to generate the distributions of the flows (constructed) 504.
  • the distribution of the metabolic flows 213 and the distribution of the flows (constructed) 504 are provided at the input 506 to the discriminator 505.
  • the discriminator 505 constructs a prediction of the flows and, from the comparison of the concentrations, modifies the neural network of the generator 501 by back-propagation of the error 508 (competitive training), as shown in the “prediction + feedback” block 507 of the diagram of Figure 5.
  • This process, which concludes when the discriminator can no longer distinguish between the distribution of the metabolic flows 213 and the constructed flow distributions 504, produces a distribution of the concentrations 216 that identifies a time development of the concentrations of the components of the biological system.
  • the time dynamics of the metabolic flows 213 and of the concentrations 216 of the components are determined by applying predictive and/or machine learning algorithms that define kinetic parameters 220 of the biological system.
  • differential equations 221 are identified which parameterise the time development of the simulation of the biological system as a function of the kinetic parameters 220.
  • differential equations 221 comprise balance and rate equations, as described above, and contain a number of kinetic parameters 220 which are defined as a function of the distribution of the metabolic flows 213 and of the distribution of the concentrations 216.
  • Differential equations 221 and parameters 220 constitute a model which is used to implement computational simulations 222 of the time dynamics of the biological system.
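A model made of balance equations, rate equations and kinetic parameters can be written as dx/dt = S · v(x, θ) and integrated numerically. The sketch below builds that right-hand side for a hypothetical three-reaction network and integrates it with the classical fourth-order Runge-Kutta method. The network, the mass-action rate laws, the parameter values and the choice of solver are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]
RateFn = Callable[[Sequence[float], Params], float]


def make_rhs(stoich, rate_fns, params):
    """Balance equations dx/dt = S · v(x, params)."""
    def rhs(x):
        v = [f(x, params) for f in rate_fns]
        return [sum(c * vj for c, vj in zip(row, v)) for row in stoich]
    return rhs


def rk4_step(rhs, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(x)
    k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(rhs, x0, dt, n_steps):
    """Return the trajectory [x(0), x(dt), ..., x(n_steps * dt)]."""
    x = list(x0)
    traj = [x]
    for _ in range(n_steps):
        x = rk4_step(rhs, x, dt)
        traj.append(x)
    return traj


# Hypothetical network: R1: -> A (constant influx), R2: A -> B, R3: B ->
STOICH = [[1, -1, 0],   # A
          [0, 1, -1]]   # B
RATES: List[RateFn] = [
    lambda x, p: p["k_in"],       # zero-order influx
    lambda x, p: p["k1"] * x[0],  # mass action A -> B
    lambda x, p: p["k2"] * x[1],  # mass action B -> out
]
PARAMS = {"k_in": 1.0, "k1": 1.0, "k2": 0.5}  # hypothetical kinetic parameters

trajectory = integrate(make_rhs(STOICH, RATES, PARAMS), x0=[0.0, 0.0],
                       dt=0.01, n_steps=5000)
print(trajectory[-1])  # approaches the steady state A = k_in/k1, B = k_in/k2
```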
  • Figure 6 illustrates in more detail the process for defining the differential equations 221.
  • a monitoring, optimisation and simulation engine 218 uses machine learning algorithms to build a reinforcement-learning procedure 219 comprising an agent 601 and an environment 605.
  • the agent 601 may be implemented through a neural network 603 which receives as input the observation of a status 606 from the environment 605.
  • the status 606 is represented by the vector of the distribution of the metabolic flows 213 and of the concentrations 216 deriving from the simulation 222 of the time dynamics of the metabolic network 209.
  • the neural network 603 transforms the input status 606 into a probability vector in which each component represents a particular action 604.
  • the action 604 is implemented as a modification of the environment 605, as described below.
  • the environment 605 comprises the metabolic network 209 and the simulation 222, in terms of a system of differential equations 221.
  • the effect of the action 604 is to modify one or more kinetic parameters 220 and to re-determine the simulation 222 based on the modifications made by the action.
  • the simulation 222 is determined each time on the basis of random disturbances which involve specific classes of components of the biological system.
  • examples of such disturbances involve adenosine mono-, di- and triphosphate and adenylate kinase (which reflect the cellular energy status), and oxidised and reduced nicotinamide adenine dinucleotide (phosphate) (which reflect the cellular redox and oxidative stress status).
  • thermodynamic constraints 217 are calculated, such as chemical potentials and the distance from thermodynamic equilibrium (for example, using the Onsager relations).
  • The degree of deviation of the thermodynamic constraints 217 from the experimental data 207 during the time development of the biological system is translated into a compensation (reward) value 607.
  • This compensation is used to update the neural network 603 so as to maximise the total positive compensation.
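The agent-environment loop described above can be outlined as follows. A policy maps the status vector to action probabilities through a linear layer followed by a softmax. An action is sampled and applied by rescaling one kinetic parameter, and a compensation value scores the result. This is a hypothetical sketch: the action set, the untrained weights and the squared-error compensation are assumptions, and the step that updates the network weights from the compensation (for example, a policy-gradient method) is deliberately omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical action set: scale one kinetic parameter up or down by 10 %.
ACTIONS: List[Tuple[str, float]] = [("k1", 1.1), ("k1", 0.9),
                                    ("k2", 1.1), ("k2", 0.9)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, weights, biases):
    """Single linear layer + softmax: one probability per action."""
    logits = [sum(w * s for w, s in zip(row, state)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def choose_action(probs, rng):
    """Sample an action index according to the policy probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def apply_action(params, action_index):
    """Return a copy of the kinetic parameters with one of them rescaled."""
    name, factor = ACTIONS[action_index]
    updated = dict(params)
    updated[name] *= factor
    return updated


def compensation(simulated, measured):
    """Hypothetical reward: the closer the simulation to the data, the higher."""
    return -sum((s - m) ** 2 for s, m in zip(simulated, measured))


rng = random.Random(0)
state = [1.0, 2.0]  # e.g. [flux, concentration] taken from the simulation
weights = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1], [-0.1, 0.0]]  # untrained
biases = [0.0, 0.0, 0.0, 0.0]
probs = policy(state, weights, biases)
action = choose_action(probs, rng)
new_params = apply_action({"k1": 1.0, "k2": 0.5}, action)
print(probs, action, new_params, compensation([1.0, 2.0], [1.1, 1.8]))
```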
  • each neural network may comprise a series of functional elements, for example computation units whose inputs are each scaled by a weight parameter and summed together with a bias, the result being passed through an activation function.
  • the activation function may be of various types, including logistic (sigmoidal), softmax, Gaussian, absolute value, linear, rectified linear, sine, squared, square root, polyharmonic and others, as well as their inverse functions.
  • the hyper-parameters of the neural network are adjusted in order to minimise the error between the predictions of the model and the experimental data sets.
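A single computation element, as described above, forms a weighted sum of its inputs, adds a bias and applies an activation function. The sketch implements a few of the listed activations and treats the choice of activation as a hyper-parameter selected by minimising the error against data. The use of mean squared error as the error measure, and the toy data, are assumptions.

```python
import math
from typing import Callable, Dict

# A subset of the activation functions listed in the text.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "gaussian": lambda z: math.exp(-z * z),
    "absolute": abs,
    "linear": lambda z: z,
    "relu": lambda z: max(0.0, z),
    "sine": math.sin,
    "squared": lambda z: z * z,
}


def unit(inputs, weights, bias, activation="logistic"):
    """Computation element: weighted sum of inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return ACTIVATIONS[activation](z)


def mean_squared_error(predictions, targets):
    """Assumed error measure between model predictions and experimental data."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def select_activation(samples, targets, weights, bias):
    """Treat the activation as a hyper-parameter: keep the lowest-error one."""
    def error(name):
        preds = [unit(x, weights, bias, name) for x in samples]
        return mean_squared_error(preds, targets)
    return min(ACTIVATIONS, key=error)


# Hypothetical data generated by a rectified-linear unit.
samples = [[-1.0], [0.5], [2.0]]
targets = [0.0, 0.5, 2.0]
print(select_activation(samples, targets, [1.0], 0.0))  # relu
```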
  • the method may be advantageously implemented for identifying biological targets 705 generated/influenced in response to a disturbance, for example exposure of the biological system to a chemical compound (such as a drug).
  • the initial phenotype, that is, the reference phenotype 701, is determined and a target phenotype 702 to be reached is defined.
  • the phenotypes 701, 702 each consist of vectors of the concentrations of the metabolites and metabolic flows.
  • these phenotypes may, but need not, come entirely or partly from databases 200, public repositories or experimental data 207.
  • the simulation 222 is generated and the flows are altered by means of reinforcement-learning algorithms 219 until the current status approaches the target status within a desired tolerance.
  • one or more biological targets 705 are identified on the basis of the amplitude of the changes made to the flows, in increasing order, and the data obtained from the simulation 222 are stored 704.
  • the biological targets 705 identified and a summary of the computational data supporting the identification are used to generate the report 706.
  • the method is performed defining a target phenotype 702 and simulating the time development until the final phenotype coincides with the target phenotype 702.
  • the report is generated by providing information on one or more biological targets 705 which have led to the definition of the target phenotype 702 of interest.
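The identification step described above, which orders the changes made to the flows by amplitude, can be sketched as a sort on the absolute flux differences between the reference and the perturbed state. The text states an increasing order, so that is the default here. The reaction names, flux values and the `min_change` cut-off used to pick candidate targets are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_flux_changes(reference: Dict[str, float], perturbed: Dict[str, float],
                      descending: bool = False) -> List[Tuple[str, float]]:
    """Rank reactions by the amplitude |v_perturbed - v_reference|.

    Increasing order by default, as stated in the text; descending=True
    gives the reverse ordering.
    """
    changes = [(name, abs(perturbed[name] - reference[name])) for name in reference]
    return sorted(changes, key=lambda item: item[1], reverse=descending)


def candidate_targets(reference: Dict[str, float], perturbed: Dict[str, float],
                      min_change: float) -> List[str]:
    """Hypothetical filter: reactions whose flux changed by at least min_change."""
    return [name for name, delta in rank_flux_changes(reference, perturbed)
            if delta >= min_change]


reference = {"R1": 1.0, "R2": 1.0, "R3": 2.0}   # hypothetical reference fluxes
perturbed = {"R1": 1.05, "R2": 0.4, "R3": 2.9}  # hypothetical fluxes after RL edits
print(rank_flux_changes(reference, perturbed))
print(candidate_targets(reference, perturbed, min_change=0.5))
```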
  • the method may be implemented for executing the identification of the final phenotype of a biological system generated/determined in response to a disturbance, for example the exposure of the biological system to a chemical compound (a drug).
  • the reference phenotype 801 is determined. This status may, but need not, come entirely or partly from databases 200, public repositories or experimental data 207.
  • Definition 802 of one or more biological targets is also carried out. This definition may derive from the identification of one or more biological targets, as described above.
  • the method is performed by defining at least one objective biological target and simulating the time development of the biological system until the objective biological target is obtained, that is to say, until a biological target is detected which coincides with the objective biological target.
  • the method described also makes it possible to determine the efficacy and/or any degree of toxicity which a disturbance may have relative to the metabolic network of the biological system being analysed.
  • a disturbance is applied to the metabolic network such as to modify the distribution of the metabolic flows 213 and/or the concentrations 216 of the components.
  • the procedure is performed using a monitoring, optimisation and simulation engine 218 which comprises a solver 902 for the differential equations 221.
  • the solver 902 outputs the time dynamics of the concentrations 903 of the metabolites and of the flows 904 of the metabolic reactions, grouped together in suitable vectors that constitute the simulation 222 of the metabolism of the biological system described by the differential equations 221, which are in turn derived from the metabolic network 209.
  • the kinetic parameters 220 of the model, as described above, together with efficacy and toxicity thresholds 901, are directed to a computer for calculating a prediction of efficacy and toxicity 905.
  • These thresholds may, but need not, come entirely or partly from databases 200 or from experimental data 207.
  • the determination of efficacy and toxicity of a chemical compound (as already indicated, for example, a drug), or of a genetic or environmental disturbance of different types, on the behaviour of the biological system with reference to one or more biological targets, identified algorithmically 705 or defined 802 in advance, is performed by calculating distance functions 906 from the reference phenotype based on the parameters of the simulations and on the definition of efficacy and toxicity thresholds.
  • These distance functions include, by way of non-limiting example, the Kullback-Leibler divergence, the variational distance, and others.
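The two distance functions named above can be computed between a reference phenotype and a perturbed one once both are expressed as probability distributions. This sketch assumes that a non-negative concentration vector is normalised to sum to 1. That normalisation is an illustrative assumption, as are the example vectors. The variational distance is taken with the common 1/2 factor (total variation); some texts omit it.

```python
import math
from typing import List, Sequence


def normalise(values: Sequence[float]) -> List[float]:
    """Hypothetical step: turn a non-negative concentration vector into a distribution."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum p_i * ln(p_i / q_i); terms with p_i = 0 contribute 0.

    Raises ZeroDivisionError if some q_i = 0 where p_i > 0 (infinite divergence).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Variational (total variation) distance: 0.5 * sum |p_i - q_i|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


reference = normalise([2.0, 1.0, 1.0])  # hypothetical reference phenotype
perturbed = normalise([1.0, 1.0, 2.0])  # hypothetical perturbed phenotype
print(kl_divergence(reference, perturbed), total_variation(reference, perturbed))
```

Note that the Kullback-Leibler divergence is not symmetric, so the order of the reference and perturbed phenotypes matters, while the variational distance is symmetric.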
  • the method also comprises an activation step wherein one or more actuators are operated in such a way as to modify at least one parameter of the biological system as a function of the report.
  • the parameter in question may be a biological target and the activation of the actuators causes a variation in the phenotype of the biological system.
  • the invention achieves the stated aims and overcomes the drawbacks of the prior art by providing the user with a method for the predictive analysis of a biological system that yields complete and accurate information on its metabolism.
  • the invention also relates to an apparatus for analysing a biological system which is particularly suitable for performing the method described above.
  • the apparatus comprises a user interface 107 and a computer, made available to the user for example by means of a cloud-computing system 100, configured to execute a method in accordance with any combination of the features identified above.
  • the apparatus may comprise a cloud-computing system 100 of the client-server type, in which specific communication hardware/software 106, for example WebSockets, allows two-way data transfer between processes on the same device or on separate devices.
  • the client may send requests to the server, which processes them and responds to the client, if necessary with the transfer of data of various types in both directions.
  • the user can connect to the server using a Uniform Resource Locator (URL) and start a request based on the Hyper Text Transfer Protocol (HTTP) or other protocols, which include the transfer of web pages in the Hyper Text Markup Language (HTML) which are rendered on the client system.
  • the user interface 107 includes input devices 109 through which the user enters data which the client can send to the server.
  • the user interface 107 includes output devices 108 through which the data sent by the server to the client are displayed to the user.
  • the cloud-computing system 100 includes one or more data processing units 101 (CPUs) and one or more memory units 102.
  • the memory 102 contains the algorithms which implement the steps of the method described above.
  • the apparatus, in particular the computer, comprises a predictor module of the metabolic flows (predictor of the flows of the metabolic reactions 103 in the drawings), which is configured to construct the metabolic network of the biological system as a function of the plurality of data and to determine the time dynamics of the metabolic flows of the biological system as a function of the metabolic network.
  • the apparatus can further comprise a predictor module of the concentrations of the components of the biological system (predictor of the concentrations of the metabolites 104 in the drawings), which is configured to determine a time dynamic of the concentrations of the components of the biological system as a function of the time dynamic of the metabolic flows.
  • the apparatus may further comprise a predictor module of the kinetic model 105 of the biological system, which is configured to simulate the time development of the metabolic network and thereby identify the final phenotype of the biological system.
  • the apparatus may further comprise at least one sensor configured to measure at least one parameter of the biological system.
  • the computer is further configured to generate a digital twin 206 of the biological system as a function of the at least one parameter, from which experimental data 207 useful for determining the metabolic network and for simulating its time development can subsequently be acquired.
  • the computer may be further connected with a database 200 (for example one or more of the databases 200 identified above) for acquiring a plurality of tabulated data representing the initial phenotype of the biological system.
  • the apparatus may also comprise a plurality of actuators configured to modify at least one parameter of the biological system.
  • the actuators can be operated as a function of and in response to the results of the operations which lead to the generation of the simulation representing the time development of the biological system, and in particular as a function of the report.
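By way of illustration only, the two distance functions mentioned above can be computed for discrete distributions, for example the distribution of a metabolite concentration in the initial and in the final phenotype, binned into the same intervals. The following Python sketch assumes that both inputs are already normalised probability vectors; the function names and example values are hypothetical and are not taken from the method described here. The variational distance is implemented with the 1/2 normalisation (total variation distance, range 0 to 1); some texts omit the factor 1/2.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p assigns mass to a bin where q has none
        total += pi * math.log(pi / qi)
    return total


def variational_distance(p, q):
    """Total variation distance, 0.5 * sum |p_i - q_i| (ranges from 0 to 1)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical binned concentration distributions of one metabolite
initial_phenotype = [0.1, 0.4, 0.4, 0.1]
final_phenotype = [0.05, 0.25, 0.5, 0.2]
print(kl_divergence(final_phenotype, initial_phenotype))
print(variational_distance(final_phenotype, initial_phenotype))
```

The Kullback-Leibler divergence is not symmetric, so the order of the arguments (final against initial, or the reverse) is itself a modelling choice.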

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Described is a method, which can be implemented by a computer, for the predictive analysis of a biological system, in which a plurality of data representing an initial phenotype of the biological system is acquired and a metabolic network (209) identifying the components of the biological system and their mutual relations is constructed on the basis of the data. A time development of the metabolic network (209) is simulated by applying predictive and machine-learning algorithms so as to identify a final phenotype of the biological system. Lastly, a report is generated which represents and identifies the final phenotype.

Description

DESCRIPTION
METHOD FOR PREDICTIVE ANALYSIS OF A BIOLOGICAL SYSTEM
This invention relates to the technical sector of bioinformatics and biological logic.
More specifically, the invention relates to a method for the predictive analysis of a biological system, especially from the point of view of the processes which involve the cellular metabolism.
The prediction of cellular behaviour under various conditions is a problem of enormous importance for the pharmaceutical and biotechnological sectors.
For example, clinical trials of potential new medicines in humans have a failure rate of close to 90%, mainly due to poor efficacy or high toxicity of the compounds tested. For this reason, predictive analysis of the metabolic processes which characterise a biological system, in particular of the effects of its interaction with compounds potentially usable as drugs, is extremely useful for assessing the efficacy and/or toxicity of these compounds.
However, in predictive analyses of the biological effect of potential new medicines, the number of factors involved is immense, owing to the enormous complexity and the degree of interconnection between gene expression and chemical reactions.
The same considerations apply in general with reference to the study of the metabolic behaviour of biological samples in order to assess the response to certain stimuli or disturbances of the ambient conditions to which they are exposed.
In this context there are in fact numerous levels at which intricate biophysical processes and sophisticated control and regulation mechanisms are organised.
The degree of complexity of biological samples makes it extremely difficult to infer these processes and mechanisms from experimental measurements of an initial phenotype of one or more components of interest of the sample, which are inevitably incomplete. In fact, the amount of data needed to characterise the overall metabolic activity makes its acquisition with current technologies prohibitive in terms of cost and time.
In short, despite the biomedical and/or industrial interest, it is currently practically impossible to establish in advance the effect of various types of disturbances, such as genetic mutations, medicines or environmental factors, on the metabolic behaviour of the samples being analysed, especially when these disturbances act synergistically, that is to say, in combination rather than in isolation, and/or when different disturbances occur on different time scales.
Accurate predictions of the effects of genetic, pharmacological or environmental disturbances on the status of the components of the biological system would make it possible to improve the efficiency of the processes for discovering new medicines and for metabolic engineering, which currently require considerable resources in terms of costs and time.
In this context, computational methods exist which make it possible to mitigate the problems involved in analysing biological samples.
Amongst the computational methods developed for analysing the metabolism of biological samples, the most commonly used approaches relate to models which operate on non-equilibrium stationary phenotypes (that is, in the absence of time dynamics). These include constraint-based models of metabolic reactions, of which the best-known approach is the so-called Flux Balance Analysis (FBA) and its variants (for example, dynamic Flux Balance Analysis (dFBA); Thermodynamic-Based Flux Analysis (TFA); Group Contribution Model (GCM); Elementary Flux Modes Analysis (EFM); Flux Variability Analysis (FVA); and others), and network-based models of intracellular signalling, such as the so-called Pathway Analysis (PA) and its variants (Over Representation Analysis (ORA); Functional Class Scoring (FCS); Pathway Topology-Based Analysis (PTB); and others).
In the approaches described above, such as, for example, FBA and PA and variants, each chemical reaction or biological interaction is assigned a numerical value which is independent of time.
In these approaches, the experimental data is commonly provided as comparative measurements of two or more phenotypes, such as, for example, treated/untreated, healthy /ill, or the like.
These techniques are therefore intrinsically stationary, in the sense that they provide static information regarding the cellular phenotype at a predetermined instant or in a specific condition.
This makes it impossible to analyse situations in which one or more time-dependent variables change or are altered, with the effect of making the living system pass from an initial phenotype to a final phenotype.
For this reason, these techniques do not provide any information on the transition itself, which may have a variable duration (for example, from a few seconds to several years), nor can they indicate which elements of the living system participate in it.
Consequently, the prior art techniques cannot take advantage of the currently available experimental data on the concentrations of the elements which make up a vast range of biological samples of interest for the study of metabolic processes.
Other approaches present in the prior art are able to analyse the cellular behaviour in a dynamic manner (that is to say, depending on time), and include enzyme kinetics-based models and variants (Sensitivity Analysis; Metabolic Control Analysis (MCA); Universal Method Analysis; Metabolic Design Analysis (MDA); Lin-Log + MCA; Metabolic Flux Analysis (MFA) or isotope-MFA; Stochastic Kinetic Modelling; Deterministic Kinetic Modelling; and others).
In accordance with these techniques, each of the chemical reactions or biological relations which characterise the cellular metabolism is represented by a time-dependent parametric equation rather than by a stationary value.
In this approach, the experimental data can be acquired continuously, which makes it possible to follow the time development of a treatment, a pathology, or the like.
Kinetic models are the best tool for simulating the metabolism of a biological system from a biochemical point of view, since they allow, for example, for the concentrations of various components and their rate of variation to be predicted.
On the other hand, formalising the equations which describe the behaviour of each single component, and developing the system of coupled ordinary differential equations (ODEs) which makes it possible to simulate the metabolic behaviour in real time, is extremely complex and still limits the development and use of kinetic models of cellular activity. The known kinetic techniques therefore do not make it possible to assess all the parameters of interest in an overall and organic manner, and are consequently unable to accurately predict or describe the effect of various types of disturbances on the metabolism of the biological system.
More specifically, although the prior art techniques are able to provide indications relative to the behaviour of individual components, they are not able to efficiently analyse the interrelations and the consequences that the variations undergone by a predetermined component have on the rest of the elements making up the biological system to be analysed.
There is therefore a strongly felt need in the sector to develop a method which, on the one hand, permits the construction of kinetic models of the cellular activity, and, on the other hand, the analysis of large quantities of data and the calculation of a large number of variables in order to provide accurate predictions on the metabolic behaviour even in the absence of direct experimental measurements, that is to say, in the presence of necessarily incomplete experimental measurements.
In this context, the technical purpose which forms the basis of the invention is to provide a method for the predictive analysis of a biological system which overcomes at least some of the above-mentioned drawbacks of the prior art.
More specifically, the aim of the invention is to provide a method for the predictive analysis of a biological system which is able to identify and define in a particularly fast and efficient manner a final phenotype of at least one component of a biological system to be analysed.
The technical purpose indicated and the aims specified are substantially achieved by a method for the predictive analysis of a biological system comprising the technical features described in one or more of the appended claims.
The invention describes a method which can be implemented by a computer for analysing a biological system.
The method is executed by acquiring a plurality of data representing an initial phenotype of the biological system.
The initial phenotype defines one or more features and values representative of at least one component of the biological system at the start of the analysis.
A metabolic network of the biological system is constructed according to the plurality of data.
The metabolic network represents the components of interest of the biological system, the relations between them and any sequence of chemical reactions to which these components are subject.
A time development of the metabolic network is then simulated in such a way as to identify a final phenotype of the biological system.
The final phenotype defines one or more features and values representing at least one component of the biological system after a predetermined period of time or upon reaching predetermined conditions.
Lastly, a report is generated and provided to a user which represents and uniquely identifies the final phenotype obtained from the simulation.
Advantageously, the method proposed makes it possible to obtain greater and more precise information with respect to the prior art techniques since an assessment is performed on the time development, which as well as considering the individual components and their specific features, also assesses the mutual interactions simultaneously with their development over time.
The dependent claims, incorporated herein for reference, relate to different embodiments of the invention.
Further features and advantages of the invention are more apparent in the detailed description below, with reference to a preferred, non-restricting, embodiment of a method for the predictive analysis of a biological system as illustrated in the accompanying drawings, in which:
- Figures 1 and 2 show with different levels of detail some of the components of an apparatus which is able to perform the method for the analysis of biological samples according to the invention;
- Figure 3 shows in a general manner some of the steps which lead to the generation of the report associated with the final phenotype of the biological system to be analysed;
- Figures 4, 5 and 6 show the logic flows of the processes which can be performed using some of the components of the apparatus;
- Figures 7 and 8 show in more detail a block diagram identifying some specific analysis procedures which can be performed;
- Figure 9 shows a different representation of the logic flows of the processes which can be performed using one of the components of the apparatus.
The method which can be implemented by a computer according to the invention makes it possible to perform an accurate analysis of a biological system, in particular with reference to its metabolic processes and the variations in its phenotype.
For the purposes of this description, the term "phenotype" is used to mean the set of all the features manifested in the biological system, that is, by way of example and without limiting the scope of the invention, its morphology, its development, its biochemical and physiological properties, the concentrations of all the substances (metabolites) present inside it and the set of enzymes expressed and their characteristics in terms of levels of expression and kinetic properties.
The biological system may be, for example, an organotypic culture of tissues or organs, a cell culture (such as a suspension of individual cells), a biological fluid (blood, saliva...), bioptic fragments of organs or tissues, or organs (cerebral, thymic, hepatic, pancreatic, pulmonary, intestinal, renal, and others).
Operatively, the method is performed by acquiring a plurality of data which contain information by means of which it is possible to represent an initial phenotype of the biological system.
In other words, the data acquired define in a manner as complete as possible the features of the biological system and can preferably comprise: genome data 201, transcriptomic data 202, protein data 203, metabolomic data 204, flow data 205, biometric data.
The data may be acquired in the form of tabulated data obtainable from suitable databases 200 and public repositories, preferably databases 200 accessible via the Internet.
Examples of available databases 200 from which the tabulated data of interest can be acquired include in particular: Human Metabolome Database (HMDB, https://www.hmdb.ca), Kinetic MOdels of biological SYStems (KiMoSys, https://kimosys.org), Biochemical Reaction Kinetic Database (SABIO-RK, http://sabiork.h-its.org), BRaunschweig ENzyme DAtabase (BRENDA, https://www.brenda-enzymes.org), Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/), Virtual Metabolic Human (VMH, https://www.vmh.life), Interaction Reference Index (iRefIndex, https://irefindex.vib.be).
Alternatively or in addition, the data of interest are acquired as a plurality of experimental data 207, likewise representing the initial phenotype, obtained through a digital twin 206 of the biological system.
The digital twin 206 is the virtual representation of a physical entity, in this case the biological system, and constitutes an actual digital replica of the phenotype and of the metabolism of the biological system being replicated.
In more detail, the digital twin 206 can provide the experimental data 207, such as, for example, laboratory measurements, by means of communication protocols 208 which allow the connection, if necessary in real time, between the digital twin 206 and the computer which implements the method described here.
Preferably, the method is performed in such a way as to generate, for example by the same computer which performs the method, the digital twin 206 of the sample of interest.
In detail, the generation of the digital twin 206 is performed by activating at least one sensor configured to measure at least one parameter of the biological system. This parameter provides information concerning at least one feature of the biological system as a function of which the digital twin 206 is generated. In other words, the digital twin 206 defines a dynamic computational representation of the biological system and can use advanced technologies, for example virtual or augmented reality and Internet-of-Things (IoT) sensors, for monitoring, sharing and updating the experimental data 207 dynamically (that is to say, data which evolve over time), by means of suitable communication protocols 208, between the biological system, in terms of laboratory measurements, and its virtual model.
The time scale for the data acquisition, data communication, and the updating of the model may vary from seconds, or fractions of seconds, up to hours, or multiples of hours.
The plurality of data acquired from databases 200 and/or digital twin 206 is used to construct the metabolic network 209 of the biological system.
The metabolic network 209 identifies the components of interest of the biological system, their specific features, their correlations and the rate of change of these quantities over time.
By way of a non-limiting example, the components of the biological system may comprise enzymes, receptors, ion channels, transport proteins, hormones, growth factors, DNA/RNA, ribosomes and in general any metabolite producible/interacting with the biological system.
Preferably, the metabolic network 209 is validated in terms of its completeness, for example using algorithms known in the prior art for gap-filling or functional annotation of the encoded regions of the genome of the biological system (obtained from the database 200 or calculated on the basis of the information provided by the digital twin 206).
Once the metabolic network 209 associated with the biological system to be analysed is reconstructed in a precise and accurate manner, the method simulates a time development of the metabolic network 209 in such a way as to identify a final phenotype of the biological system.
In other words, the method performs a simulation 222 of the metabolic system model which makes it possible to rapidly define the behaviour of the biological system over time, identifying one or more features associated with a final phenotype.
As described in more detail below, the term "final phenotype" means identifying a status calculated in accordance with one or more conditions of interest which can be time conditions (for example, a predetermined time interval) or physiological conditions of the biological system (reaching or manifestation of predetermined features in the phenotype or in the components of the sample).
More specifically, the biological system identifies a plurality of biological targets 705 defining a condition of the respective components and the final phenotype represents a set of biological targets which considered in their entirety provide information concerning the final status of the individual components of interest of the biological system.
Preferably, the accuracy of the simulation 222 is controlled and validated.
For this purpose, the acquisition of the experimental data 207 is preferably performed in real time and the simulation 222 of the time development of the metabolic network 209 is validated by means of a comparison with the experimental data 207 and, if necessary, with tabulated data which are periodically acquired.
More specifically, the validation of the simulation 222 with respect to its coupled digital twin 206 may be assessed by means of quantitative indicators, including, by way of non-limiting example, those set out in the publications of NAFEMS (National Agency for Finite Element Methods and Standards) on best practice in modelling, analysis and simulation, or in the guidelines of the ISO/IEC Guide to the Expression of Uncertainty in Measurement (GUM).
In any case, the simulation 222 can be regenerated whenever it deviates from the acquired (experimental or tabulated) data by a percentage greater than a predetermined threshold.
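A minimal sketch of the regeneration criterion, assuming (this is not specified above) that the deviation is taken as the largest relative deviation, in percent, over all compared variables. The names `max_relative_deviation_percent`, `needs_regeneration` and `threshold_percent` are hypothetical.

```python
def max_relative_deviation_percent(simulated, measured):
    """Largest relative deviation, in percent, between simulated and measured values."""
    worst = 0.0
    for s, m in zip(simulated, measured):
        if m == 0.0:
            continue  # skip zero measurements to avoid division by zero
        worst = max(worst, abs(s - m) / abs(m) * 100.0)
    return worst


def needs_regeneration(simulated, measured, threshold_percent):
    """True when the simulation deviates from the acquired data beyond the threshold."""
    return max_relative_deviation_percent(simulated, measured) > threshold_percent


# Hypothetical simulated vs. measured concentrations of two metabolites
simulated = [1.0, 2.0]
measured = [1.0, 2.5]
if needs_regeneration(simulated, measured, threshold_percent=10.0):
    print("deviation above threshold: regenerate the simulation")
```

Variables whose measured value is zero are simply skipped here; an absolute tolerance could be applied to them instead.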
At the same time, in order to prevent the process for generating the metabolic network 209 and/or the subsequent simulation from taking excessive time, for example so long that the final phenotype only becomes available once the information is obsolete, the necessary computational time can be reduced by reducing the quantity of data used.
In particular, if the construction of the metabolic network 209 and/or the simulation of the time development of the metabolic network 209 is not completed within a predetermined time interval, the step of constructing the metabolic network 209 of the biological system and/or the step of simulating the time development of the metabolic network 209 are performed as a function of a sub-set of the plurality of data.
In this context, the accuracy of the simulation 222 is ensured by applying Data Reduction methods such as, for example, machine learning techniques based on Principal Component Analysis (PCA), Generative Adversarial Networks (GAN), Gaussian process modelling, or other methods which compensate for the inaccuracies deriving from the use of a smaller amount of information to construct the metabolic network 209.
The procedure described above may also be performed in the opposite direction, increasing the quantity of data considered or reintegrating previously rejected data, if the execution times of the above-mentioned steps remain below a further predetermined time interval.
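The adaptive use of a sub-set of the data can be sketched as a simple controller on the fraction of data used. In this hypothetical Python example, `step_fn` stands for the construction or simulation step, the sub-set is simply the first part of the record list, and the two time limits `max_s` and `fast_s` play the roles of the predetermined time interval and the further predetermined time interval. The data-reduction methods (PCA, GAN, Gaussian processes) that compensate for the missing information are not shown.

```python
import math
import time


def adapt_data_fraction(fraction, elapsed_s, max_s, fast_s, step=0.1):
    """Return the fraction of data to use in the next run.

    Shrinks the fraction if the last run took longer than max_s, and grows it
    again (re-integrating previously rejected data) if it finished within fast_s.
    """
    if elapsed_s > max_s:
        return max(step, fraction - step)
    if elapsed_s < fast_s:
        return min(1.0, fraction + step)
    return fraction


def run_with_time_budget(step_fn, records, fraction, max_s, fast_s, step=0.1):
    """Run step_fn on a sub-set of records, time it and adapt the fraction."""
    n = max(1, math.ceil(fraction * len(records)))
    subset = records[:n]  # placeholder selection: a real system would rank the data
    start = time.perf_counter()
    result = step_fn(subset)
    elapsed = time.perf_counter() - start
    return result, adapt_data_fraction(fraction, elapsed, max_s, fast_s, step)


# Hypothetical usage: len stands in for the network construction or simulation step
result, next_fraction = run_with_time_budget(len, list(range(100)), 0.5, max_s=60.0, fast_s=0.0)
print(result, next_fraction)
```

The lower limit `step` keeps at least some data in use even after repeated slow runs.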
Once the features of the final phenotype have been precisely defined, a report is generated representing and identifying the final phenotype.
In other words, a document is generated which identifies the features of the final phenotype preferably identifying at least one biological target 705, even more preferably a feature of the biological target 705 such as, for example, a concentration, in the final phenotype.
In more detail, Figure 3 shows a diagram for implementing the method according to the invention which starts with the reconstruction of the metabolic network 209 starting from the data acquired relative to the biological system to be analysed until obtaining the simulation 222.
The metabolic network 209 is represented as a constraint satisfaction problem 210 (indicated in the drawings for simplicity also as “constraints 210”), which may include matrix representations known in the prior art, such as the stoichiometric matrix. This representation allows the problem to be formalised and solved and the metabolic flows 212 to be sampled, using, for example, artificial centering hit-and-run (ACHR) techniques or constraint logic programming (CLP).
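To illustrate the constraint-based representation and the sampling of flows, the following sketch builds the stoichiometric matrix S of a hypothetical network with two metabolites and four reactions and samples flow vectors v satisfying S·v = 0 and lb ≤ v ≤ ub. It uses plain hit-and-run, which draws random directions from the null space of S; ACHR differs in that its directions are taken towards the centre of a set of warm-up points. The network, bounds and function names are assumptions made for the example.

```python
import random
from fractions import Fraction


def null_space_basis(S):
    """Exact basis of {v : S v = 0}, via reduced row echelon form on fractions."""
    rows = [[Fraction(x) for x in row] for row in S]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for i, pc in enumerate(pivot_cols):
            v[pc] = -rows[i][free]
        basis.append([float(x) for x in v])
    return basis


def hit_and_run(S, lb, ub, start, n_samples, seed=0):
    """Plain hit-and-run sampling of {v : S v = 0, lb <= v <= ub}.

    `start` must be feasible and strictly inside the (finite) bounds.
    """
    rng = random.Random(seed)
    basis = null_space_basis(S)
    v = list(start)
    if not basis:
        return [list(v) for _ in range(n_samples)]  # trivial null space: no direction to move in
    samples = []
    for _ in range(n_samples):
        # A random combination of null-space vectors keeps S v = 0 along the line
        coeffs = [rng.gauss(0.0, 1.0) for _ in basis]
        d = [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(len(v))]
        # Feasible step interval [t_lo, t_hi] such that lb <= v + t d <= ub
        t_lo, t_hi = -float("inf"), float("inf")
        for vj, dj, lo, hi in zip(v, d, lb, ub):
            if abs(dj) > 1e-12:
                t1, t2 = (lo - vj) / dj, (hi - vj) / dj
                t_lo, t_hi = max(t_lo, min(t1, t2)), min(t_hi, max(t1, t2))
        t = rng.uniform(t_lo, t_hi)
        v = [vj + t * dj for vj, dj in zip(v, d)]
        samples.append(v)
    return samples


# Hypothetical network: R1 uptake -> A, R2 A -> B, R3 B -> out, R4 A -> out
S = [[1, -1, 0, -1],   # balance of metabolite A
     [0, 1, -1, 0]]    # balance of metabolite B
lb = [0.0, 0.0, 0.0, 0.0]
ub = [10.0, 10.0, 10.0, 10.0]
start = [1.0, 0.5, 0.5, 0.5]  # feasible interior point: S . start = 0
samples = hit_and_run(S, lb, ub, start, n_samples=1000)
mean_flows = [sum(s[j] for s in samples) / len(samples) for j in range(4)]
print(mean_flows)
```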
These solutions are used by machine-learning algorithms to obtain the distribution of the metabolic flows 213, that is to say, a time dynamic of the metabolic flows of the components of interest of the biological system, which is in turn used to generate the simulation of the time development of the metabolic network 209.
Similarly, artificial neural networks trained with machine-learning algorithms on the data of the database 200 define physiological intervals 403 (lower and upper limits) for the concentrations of the metabolites.
The intervals 403 are used to determine the solutions which maximise the distribution of the metabolic flows 213 determined previously, thus making it possible to obtain the distribution of the concentrations 216 of the metabolites, that is, of the components of interest of the biological system.
The operating principle which forms the basis of the machine learning algorithms implemented for determining the distribution of the metabolic flows 213 is schematically illustrated in Figure 4, which shows that the data coming from the database 200, together with the properties of the metabolites 401, are provided at input to an artificial neural network 402 defined by the optimisation engine 211.
The properties of the metabolites 401 (together, in particular, with the flows 213) contribute to identifying the biological targets and can include chemical-physical properties, such as molecular weight, hydrophobicity, electric charge and others, or quantities calculated on the basis of the participation of the metabolites in metabolic pathways, for example derived from analysis of the topology of the metabolic networks 209.
In other words, biological targets defined by one or more properties, of which those listed above are possible examples, are identified and supplied to the neural network 402.
The neural network 402 provides as output the ranges 403 of the concentrations of metabolites whose properties have been provided to the neural network 402.
These intervals 403, together with the constraints 210 defined by the reformulation of the metabolic network 209, are directed to a non-linear optimisation device 404 which minimises a target function 405, in order to obtain a preliminary estimate of the concentrations 406 of the metabolites.
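The role of the non-linear optimisation device 404 can be sketched, under simplifying assumptions, as a box-constrained minimisation in which the intervals 403 act as lower and upper bounds and the target function 405 is supplied by the caller. The example uses projected gradient descent with a central-difference gradient; the equality constraints 210 are omitted, and the quadratic target function and all names are hypothetical. In practice a dedicated solver (for example SciPy's `scipy.optimize.minimize` with `bounds` and `constraints`) would more likely be used.

```python
def projected_gradient_descent(target, x0, lower, upper, lr=0.1, n_iter=500, h=1e-6):
    """Minimise `target` over the box lower <= x <= upper.

    The gradient is estimated by central differences and every step is
    projected back onto the box (here standing in for the intervals 403).
    """
    def project(x):
        return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

    x = project(x0)
    for _ in range(n_iter):
        grad = []
        for i in range(len(x)):
            up, down = list(x), list(x)
            up[i] += h
            down[i] -= h
            grad.append((target(up) - target(down)) / (2 * h))
        x = project([xi - lr * gi for xi, gi in zip(x, grad)])
    return x


# Hypothetical target function 405 on two log-concentrations, bounded by hypothetical intervals
def target(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2


estimate = projected_gradient_descent(target, x0=[0.5, 0.5], lower=[0.0, 0.0], upper=[1.0, 1.0])
print(estimate)
```

Because the unconstrained minimum of this target lies outside the box, the estimate ends on the bounds, which is the situation in which the physiological intervals actively shape the preliminary concentrations.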
The estimates 406 are supplied as input to a generative adversarial network which comprises a first neural network (generator) 407 and a second neural network (discriminator) 411.
More specifically, the generator 407 transforms the concentrations 406 into a code 408 of flows and parameters.
The code 408 is used by a computer 409 to generate (constructed) concentrations 410 using rate equations, which can be based on Michaelis-Menten kinetics, mass action, power-law, LinLog and others, in the context of so-called hybrid models.
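For the Michaelis-Menten case, constructing a concentration from a flow and its kinetic parameters amounts to inverting the rate equation v = Vmax·S/(Km + S), which gives S = Km·v/(Vmax − v) for 0 ≤ v < Vmax. A minimal sketch, assuming each entry of the code 408 is a hypothetical (flow, Vmax, Km) tuple describing an irreversible single-substrate reaction:

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def substrate_for_flux(v, vmax, km):
    """Invert the Michaelis-Menten equation: S = Km * v / (Vmax - v)."""
    if not 0.0 <= v < vmax:
        raise ValueError("the flow must satisfy 0 <= v < Vmax")
    return km * v / (vmax - v)


def construct_concentrations(code):
    """Map a hypothetical code of (flow, Vmax, Km) tuples to substrate concentrations."""
    return [substrate_for_flux(v, vmax, km) for v, vmax, km in code]


code = [(5.0, 10.0, 2.0), (1.0, 4.0, 0.3)]  # hypothetical flows and kinetic parameters
concentrations = construct_concentrations(code)
# Round trip: the forward rate equation reproduces the original flows
print([michaelis_menten(s, vmax, km) for s, (_, vmax, km) in zip(concentrations, code)])
```

The inversion diverges as v approaches Vmax, so flows near saturation translate into very large and poorly determined concentrations.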
The estimated concentrations 406 and those constructed 410 by the generator 407 are supplied at input 412 to the discriminator 411.
The discriminator 411 constructs a prediction of the flows and, by comparing the concentrations, modifies the generator 407 by back-propagation of the error 414 (competitive training), as shown in the “prediction + feedback” block 413 of the diagram of Figure 4.
This process, which concludes when the discriminator is unable to distinguish between estimated 406 and constructed 410 concentrations, produces the distribution of the metabolic flows 213.
The distribution of the metabolic flows 213 may be optimised by comparing with the result produced by ACHR, CLP algorithms and others.
The method further comprises the step of determining a time dynamic of the concentrations of the components of the biological system, identified in the drawings as a distribution of the concentrations 216, as a function of the time dynamics of the metabolic flows.
Advantageously, the generation of the simulation 222 is performed also as a function of the time dynamics of the concentrations of the components of the biological system.
The process for generating the distribution of the concentrations 216 is shown in more detail in Figure 5, which illustrates an optimisation engine 214, which implements the machine-learning algorithms described below for optimising the sampling of the concentrations.
More specifically, the distribution of the metabolic flows 213 is supplied as input to a generative adversarial network which comprises a first neural network (generator) 501 and a second neural network (discriminator) 505.
More specifically, the generator 501 transforms the distribution of the metabolic flows 213 into a code 502 of concentrations and parameters.
The code 502 is used by a flow calculator 503, which applies rate equations similar to those identified above to generate the (constructed) distributions of the flows 504.
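The flow calculator 503 performs the opposite operation, evaluating rate equations on concentrations. The sketch below covers three of the rate laws named earlier (mass action, power-law and Michaelis-Menten); the dictionary format used to describe each reaction and the function names are hypothetical choices for the example, and LinLog kinetics is not shown.

```python
import math


def power_law(alpha, concentrations, orders):
    """Power-law rate alpha * prod(x_i ** g_i); mass action uses stoichiometric orders."""
    return alpha * math.prod(x ** g for x, g in zip(concentrations, orders))


def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate."""
    return vmax * s / (km + s)


def flow_calculator(concentrations, reactions):
    """Evaluate one rate equation per reaction.

    `reactions` uses a hypothetical format:
    {"law": ..., "species": [...], "params": {...}}.
    """
    flows = []
    for r in reactions:
        x = [concentrations[name] for name in r["species"]]
        p = r["params"]
        if r["law"] == "mass_action":
            flows.append(power_law(p["k"], x, p["stoichiometry"]))
        elif r["law"] == "power_law":
            flows.append(power_law(p["alpha"], x, p["orders"]))
        elif r["law"] == "michaelis_menten":
            flows.append(michaelis_menten(x[0], p["vmax"], p["km"]))
        else:
            raise ValueError(f"unknown rate law: {r['law']}")
    return flows


concentrations = {"A": 2.0, "B": 3.0}  # hypothetical concentrations
reactions = [
    {"law": "mass_action", "species": ["A", "B"], "params": {"k": 0.5, "stoichiometry": [1, 1]}},
    {"law": "power_law", "species": ["A"], "params": {"alpha": 2.0, "orders": [0.5]}},
    {"law": "michaelis_menten", "species": ["A"], "params": {"vmax": 10.0, "km": 2.0}},
]
print(flow_calculator(concentrations, reactions))
```

Mass action is treated as the special case of the power law with integer (stoichiometric) orders, which keeps the two laws in a single function.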
The distribution of the metabolic flows 213 and the distribution of the flows (constructed) 504 are provided at the input 506 to the discriminator 505.
The discriminator 505 constructs a prediction of the flows and, by comparing the flows, modifies the neural network of the generator 501 by back-propagation of the error 508 (competitive training), as shown in the “prediction + feedback” block 507 of the diagram of Figure 5.
This process, which concludes when the discriminator is unable to distinguish between distribution of the metabolic flows 213 and distributions of the flows (constructed) 504, produces a distribution of the concentrations 216 which identifies a time development of the concentrations of the components of the biological system.
In light of the above, the time dynamics of the metabolic flows 213 and of the concentrations 216 of the components are therefore determined by applying predictive and/or machine-learning algorithms which define kinetic parameters 220 of the biological system.
In other words, the differential equations 221 are identified which parameterise the time development of the simulation of the biological system as a function of the kinetic parameters 220.
In detail, the differential equations 221 comprise balance and rate equations, as described above, and contain a number of kinetic parameters 220 which are defined as a function of the distribution of the metabolic flows 213 and of the distribution of the concentrations 216. Differential equations 221 and parameters 220 constitute a model which is used to implement computational simulations 222 of the time dynamics of the biological system.
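The differential equations 221 are not written out explicitly here. Assuming, for illustration only, a toy network (uptake → A → B → export) with mass-action kinetics, the balance equations take the form dc/dt = N·v(c, k), where N is the stoichiometric matrix, v are the rate equations and k the kinetic parameters 220. A fourth-order Runge-Kutta integration of such a model might be sketched as follows; the network, parameter values and function names are hypothetical.

```python
import numpy as np

# Hypothetical toy network: uptake -> A, A -> B, B -> export.
# Stoichiometric matrix N: rows are metabolites (A, B), columns are reactions.
N = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

def rates(c, k):
    """Rate equations v(c, k): constant uptake k0, first-order conversion and export."""
    a, b = c
    return np.array([k[0], k[1] * a, k[2] * b])

def dcdt(c, k):
    """Balance equations: dc/dt = N v(c, k)."""
    return N @ rates(c, k)

def simulate(c0, k, t_end, dt=0.01):
    """Integrate the ODE system with the classical fourth-order Runge-Kutta method."""
    n_steps = int(round(t_end / dt))
    c = np.asarray(c0, dtype=float)
    trajectory = [c.copy()]
    for _ in range(n_steps):
        s1 = dcdt(c, k)
        s2 = dcdt(c + 0.5 * dt * s1, k)
        s3 = dcdt(c + 0.5 * dt * s2, k)
        s4 = dcdt(c + dt * s3, k)
        c = c + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
        trajectory.append(c.copy())
    times = np.linspace(0.0, n_steps * dt, n_steps + 1)
    return times, np.array(trajectory)

times, traj = simulate(c0=[0.0, 0.0], k=[1.0, 0.5, 2.0], t_end=40.0)
```

With k = (1.0, 0.5, 2.0) the trajectory relaxes to the analytical steady state A* = k0/k1 = 2.0 and B* = k0/k2 = 0.5, which offers a simple check of the integrator.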
Where necessary, a connection with the digital twin 206 can also be established by means of suitable hardware/software communication elements 106, so that experimental data 207 obtained from the biological system can be used in the simulations 222, for example to predict 905 the efficacy and/or toxicity of a drug or of another genetic or environmental disturbance, as described below.
Figure 6 illustrates in more detail the process for defining the differential equations 221.
A monitoring, optimisation and simulation engine 218 uses machine-learning algorithms to implement a reinforcement-learning procedure 219, which comprises an agent 601 and an environment 605.
The agent 601 may be implemented through a neural network 603 which receives as input the observation of a status 606 from the environment 605. The status 606 is represented by the vector of the distributions of the metabolic flows 213 and of the concentrations 216 resulting from the simulation 222 of the time dynamics of the metabolic network 209.
The neural network 603 transforms the input status 606 into a probability vector in which each component represents a particular action 604.
The action 604 is implemented as a modification of the environment 605, as described below.
More specifically, the environment 605 comprises the metabolic network 209 and the simulation 222, in terms of a system of differential equations 221.
The effect of the action 604 is to modify one or more kinetic parameters 220 and to re-determine the simulation 222 based on the modifications made by the action. The simulation 222 is determined each time on the basis of random disturbances which involve specific classes of components of the biological system.
Examples of disturbances include changes in adenosine mono-, di- and triphosphate and in adenylate kinase (which reflect the cellular energy status), and in oxidised and reduced nicotinamide adenine dinucleotide (phosphate) (which reflect the cellular redox and oxidative stress status).
On the basis of the simulations 222 obtained from these disturbances, thermodynamic constraints 217 are calculated, such as chemical potentials and the distance from thermodynamic equilibrium, for example by means of Onsager relations.
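As an illustration of the thermodynamic quantities mentioned above, and not as the patented computation, the following sketch computes the reaction Gibbs energy ΔG = ΔG°' + RT ln Q, checks the second-law condition that a net flux must run in the direction of negative ΔG, and evaluates a near-equilibrium linear flux-force relation of the Onsager type. Unit stoichiometric coefficients, concentrations in mol/L, the ΔG°' value in the example and the function names are assumptions.

```python
import math

R = 8.314462618e-3  # molar gas constant, kJ/(mol K)

def reaction_gibbs_energy(dg0, products, substrates, temperature=310.15):
    """Delta G = Delta G0' + R T ln Q, with Q the mass-action ratio (unit stoichiometry)."""
    q = math.prod(products) / math.prod(substrates)
    return dg0 + R * temperature * math.log(q)

def is_feasible(flux, dg, tol=1e-9):
    """Second-law check: a net flux must run towards decreasing free energy (v * dG <= 0)."""
    return flux * dg <= tol

def linear_flux(dg, conductance, temperature=310.15):
    """Near-equilibrium linear flux-force relation of the Onsager type: v = -L * dG / (R T)."""
    return -conductance * dg / (R * temperature)

# Hypothetical reaction with dG0' = -5 kJ/mol, product at 0.1 mM, substrate at 1 mM.
dg = reaction_gibbs_energy(-5.0, products=[1e-4], substrates=[1e-3])
```

The magnitude of ΔG then gives a simple measure of the distance from equilibrium for each reaction, and flows that violate `is_feasible` could be penalised.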
The degree of deviation of the thermodynamic constraints 217 from the experimental data 207 during the time development of the biological system is translated into a compensation (reward) value 607.
This compensation, which may be negative (a penalty), is used to update the neural network 603 so as to maximise the total positive compensation.
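A minimal sketch of the agent-environment loop, under strong simplifying assumptions: the simulation is replaced by the analytical steady state A* = k0/k1 of a toy network, the status is the log-deviation of A* from a hypothetical target, the two actions raise or lower the kinetic parameter k1 by 10 %, and the compensation is the reduction in deviation. The policy is a linear softmax trained with the REINFORCE policy-gradient rule rather than a deep network; all names and values are hypothetical.

```python
import numpy as np

# Hypothetical constants: uptake rate k0, target steady-state level of A, and the two actions.
K0, TARGET_A = 1.0, 2.0
ACTIONS = (1.1, 1.0 / 1.1)  # action 0 raises k1 by 10 %, action 1 lowers it by 10 %

def observe(k1):
    """Status: log-deviation of the steady state A* = K0 / k1 from the target, plus a bias feature."""
    return np.array([np.log((K0 / k1) / TARGET_A), 1.0])

def policy(w, state):
    """Linear softmax policy: maps the status to a probability vector over the actions."""
    logits = w @ state
    p = np.exp(logits - logits.max())
    return p / p.sum()

def train(steps=2000, lr=0.5, seed=0):
    """REINFORCE on one-step episodes; the reward is the reduction in |log-deviation|."""
    rng = np.random.default_rng(seed)
    w = np.zeros((len(ACTIONS), 2))
    for _ in range(steps):
        k1 = 0.5 * np.exp(rng.uniform(-2.0, 2.0))  # random disturbance of the kinetic parameter
        s = observe(k1)
        p = policy(w, s)
        a = rng.choice(len(ACTIONS), p=p)
        s_next = observe(k1 * ACTIONS[a])  # modify k1 and re-evaluate the model
        reward = abs(s[0]) - abs(s_next[0])  # positive if the deviation shrinks, negative otherwise
        grad_log_pi = np.outer(np.eye(len(ACTIONS))[a] - p, s)
        w += lr * reward * grad_log_pi
    return w

w = train()
```

After training, the policy should prefer raising k1 when A* lies above the target and lowering it when A* lies below.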
In general, each neural network may comprise a series of functional elements, for example computation elements whose inputs are each scaled by a weight parameter, summed together with a bias, and passed through an activation function. The activation function may be of various types, including logistic (sigmoid), softmax, Gaussian, absolute value, linear, rectified linear, sine, squared, square root, polyharmonic, and so on, including their inverse functions. During training, the parameters of the neural network (weights and biases) are adjusted in order to minimise the error between the predictions of the model and the experimental data sets. Various prior-art techniques can be used to define and minimise the error function, including gradient descent on sum-of-squared-error (residual) or hinge-loss functions, and second-order or approximate methods such as Hessian-free optimisation, Nesterov momentum, or the like.
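The computation element and the training procedure described above can be sketched in NumPy as follows. The sketch shows a dense layer (weighted sum plus bias followed by an activation), a numerically stable softmax, and plain gradient descent on the mean squared error of a single linear layer; the selection of activations and the function names are illustrative assumptions.

```python
import numpy as np

# A few of the activation functions listed above.
ACTIVATIONS = {
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "relu": lambda z: np.maximum(z, 0.0),
    "gaussian": lambda z: np.exp(-z ** 2),
    "linear": lambda z: z,
}

def dense(x, w, b, activation="logistic"):
    """Computation elements: inputs weighted by w, summed with a bias b, passed through an activation."""
    return ACTIVATIONS[activation](x @ w + b)

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fit_linear(x, y, lr=0.1, epochs=2000):
    """Gradient descent on the mean squared error (up to a constant factor) of y ~ x w + b."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        err = x @ w + b - y  # residuals
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

x = np.array([[0.0], [1.0], [2.0], [3.0]])
w, b = fit_linear(x, 2.0 * x[:, 0] + 1.0)
```

On this noise-free example the fitted weight and bias converge to 2 and 1.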
According to one aspect of the invention, as illustrated schematically in Figure 7, the method may be advantageously implemented for identifying biological targets 705 generated/influenced in response to a disturbance, for example exposure of the biological system to a chemical compound (such as a drug).
Following acquisition of the metabolic network 209, the initial phenotype, that is, the reference phenotype 701, is determined, and a target phenotype 702 to be reached is defined.
The phenotypes 701, 702 each consist of vectors of the concentrations of the metabolites and metabolic flows.
These phenotypes may, but need not, come entirely or partly from databases 200, public repositories or experimental data 207.
It is then determined whether the flows of the target status 702 differ from those of the reference status 701; if so, an alteration 703 of the flows in the model is performed directly.
After this step, or directly if the flows do not differ, the simulation 222 is generated and the flows are altered by means of reinforcement-learning algorithms 219 until the current status approaches the target status within a desired tolerance.
One or more biological targets 705 are identified on the basis of the amplitude of the flow changes performed, in increasing order, and the data obtainable from the simulation 222 are stored 704.
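The ranking step and the tolerance check of Figure 7 might be sketched as follows, assuming that the flows are stored as dictionaries keyed by reaction identifier and that the tolerance is relative. The reaction identifiers in the example (HK, PFK, PK) and the function names are hypothetical.

```python
def rank_targets(reference_flows, altered_flows, threshold=0.0):
    """Return (reaction, |change|) pairs whose flux change exceeds threshold, in increasing order."""
    changes = {r: abs(altered_flows[r] - reference_flows[r]) for r in reference_flows}
    return sorted(((r, d) for r, d in changes.items() if d > threshold), key=lambda item: item[1])

def within_tolerance(current_flows, target_flows, rel_tol=0.05):
    """True when every flow is within rel_tol (relative) of its target value."""
    return all(
        abs(current_flows[r] - target) <= rel_tol * max(abs(target), 1e-12)
        for r, target in target_flows.items()
    )

reference = {"HK": 1.0, "PFK": 1.0, "PK": 2.0}
altered = {"HK": 1.5, "PFK": 1.0, "PK": 0.5}
targets = rank_targets(reference, altered)
```

Unchanged reactions (PFK here) are excluded, and the remaining candidates are listed from the smallest to the largest flow change.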
The biological targets 705 identified and a summary of the computational data supporting the identification are used to generate the report 706.
In other words, the method is performed defining a target phenotype 702 and simulating the time development until the final phenotype coincides with the target phenotype 702.
The report is generated by providing information on one or more biological targets 705 which have led to the definition of the target phenotype 702 of interest.
Alternatively or additionally, with reference to Figure 8, the method may be implemented to identify the final phenotype of a biological system generated/determined in response to a disturbance, for example exposure of the biological system to a chemical compound (such as a drug).
Following acquisition of the metabolic network 209, the reference phenotype 801 is determined. This status may, but need not, come entirely or partly from databases 200, public repositories or experimental data 207.
Definition 802 of one or more biological targets is also carried out. This definition may derive from the identification of one or more biological targets, as described above.
In the case of a pharmacological disturbance, the pharmacokinetic (PK) and pharmacodynamic (PD) parameters 803 are first defined and then altered before the simulation 222, as described below, until all the desired PK/PD parameters have been covered. The biological effect of the disturbance is characterised in terms of efficacy and toxicity, as described below, by storing the data 804 obtainable from the simulation 222.
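The PK and PD models are not specified here. Assuming, for illustration only, a one-compartment intravenous-bolus PK model, C(t) = (D/V)·exp(-(CL/V)·t), and a sigmoid Emax PD model, E = Emax·C^n/(EC50^n + C^n), a sweep over doses could be sketched as follows; all parameter values and function names are hypothetical.

```python
import math

def concentration(t, dose, volume, clearance):
    """One-compartment IV bolus PK: C(t) = (dose / V) * exp(-(CL / V) * t)."""
    return dose / volume * math.exp(-clearance / volume * t)

def effect(c, emax, ec50, hill=1.0):
    """Sigmoid Emax PD model: E = Emax * C^n / (EC50^n + C^n)."""
    return emax * c ** hill / (ec50 ** hill + c ** hill)

# Hypothetical sweep: effect 2 h after dosing for increasing doses (V = 10 L, CL = 2 L/h).
doses = (1.0, 10.0, 100.0)
effects = [effect(concentration(2.0, d, 10.0, 2.0), emax=1.0, ec50=1.0) for d in doses]
```

In the method, each PK/PD setting of such a sweep would drive a separate simulation 222, whose output is then stored for the efficacy and toxicity assessment.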
The efficacy and the toxicity of the drug, that is, of the disturbance, and a summary of the computational data supporting the determination of the biological effect are used to generate detailed reports 806.
In other words, the method is performed by defining at least one objective biological target and simulating the time development of the biological system until the objective biological target is obtained, that is to say, until a biological target is detected which coincides with the objective biological target.
The method described also makes it possible to determine the efficacy and/or any degree of toxicity which a disturbance may have relative to the metabolic network of the biological system being analysed.
More specifically, it is possible to define a reference phenotype which identifies a predetermined and desired combination of parameters for the components of interest of the biological system.
A disturbance is applied to the metabolic network such as to modify the distribution of the metabolic flows 213 and/or the concentrations 216 of the components.
Depending on the effect obtained from this disturbance, a disturbed phenotype of the biological system is identified.
It is therefore possible to determine an efficacy and toxicity parameter associated with the disturbance as a function of the distance between the final phenotype and the reference phenotype.
In more detail, with reference to Figure 9, the procedure is performed using a monitoring, optimisation and simulation engine 218 which comprises a solver 902 of the differential equations 221. The solver 902 returns as output the time dynamics of the concentrations 903 of the metabolites and of the flows 904 of the metabolic reactions, grouped together in suitable vectors that constitute the simulation 222 of the metabolism of the biological system described by the differential equations 221, which are in turn derived from the metabolic network 209.
The kinetic parameters 220 of the model, as described above, together with efficacy and toxicity thresholds 901, are passed to a calculator that computes a prediction of efficacy and toxicity 905.
These thresholds may, but need not, come entirely or partly from databases 200 or from experimental data 207.
The efficacy and toxicity of a chemical compound (as already indicated, for example a drug), or of a genetic or environmental disturbance of another type, on the behaviour of the biological system with respect to one or more biological targets, either identified algorithmically 705 or defined 802 in advance, are determined by calculating distance functions 906 from the reference phenotype on the basis of the simulation parameters and of the defined efficacy and toxicity thresholds.
These distance functions include, by way of non-limiting example, the Kullback-Leibler divergence, the total variation distance, and others.
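The two distance functions named above can be sketched for phenotype vectors as follows, assuming that each vector is non-negative and is normalised to a probability distribution before comparison (an assumption; the normalisation used in the method is not stated). The function names are hypothetical.

```python
import numpy as np

def _normalise(v):
    """Scale a non-negative phenotype vector so that it sums to one."""
    v = np.asarray(v, dtype=float)
    if np.any(v < 0) or v.sum() <= 0:
        raise ValueError("expected a non-negative vector with a positive sum")
    return v / v.sum()

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) in nats; eps guards against zeros in Q."""
    p, q = _normalise(p), _normalise(q)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def total_variation(p, q):
    """Total variation distance: half the L1 distance between the normalised vectors."""
    p, q = _normalise(p), _normalise(q)
    return 0.5 * float(np.abs(p - q).sum())
```

The Kullback-Leibler divergence is not symmetric, so the order of the disturbed and reference phenotypes matters; the total variation distance is symmetric and bounded in [0, 1].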
Preferably, the method also comprises an activation step wherein one or more actuators are operated in such a way as to modify at least one parameter of the biological system as a function of the report.
In particular, the parameter in question may be a biological target and the activation of the actuators causes a variation in the phenotype of the biological system.
Advantageously, the invention achieves the stated aims and overcomes the drawbacks of the prior art by providing the user with a method for the predictive analysis of a biological system that makes it possible to obtain complete and accurate information regarding its metabolism.
The invention also relates to an apparatus for analysing a biological system which is particularly suitable for performing the method described above.
From a structural point of view, the apparatus comprises a user interface 107 and a computer, made available to the user for example by means of a cloud-computing system 100, configured to execute a method in accordance with any combination of the features identified above.
More in detail, as illustrated with different levels of detail in Figures 1 and 2, the apparatus may comprise a cloud-computing system 100 of the client-server type, in which specific communication hardware/software 106, for example WebSockets, allows the two-way transfer of data between various processes on the same device or on separate devices.
More specifically, using a user interface 107, the client may send requests to the server, which processes them and responds to the client, if necessary transferring data of various types in both directions.
For example, the user can connect to the server using a Uniform Resource Locator (URL) and issue a request based on the Hypertext Transfer Protocol (HTTP) or other protocols, including the transfer of web pages in the Hypertext Markup Language (HTML) which are rendered on the client system. Alternatively, the user can use a Graphical User Interface (GUI) to send data to the server.
The user interface 107 includes input devices 109 through which the user enters data which the client can send to the server, and output devices 108 through which the user views the data which the server can send to the client.
The cloud-computing system 100 includes one or more data processing units 101 (CPUs) and one or more memory units 102.
The memory 102, as well as storing the input and output data, contains the algorithms which implement the steps of the method described above.
In more detail, the apparatus, in particular the computer, comprises a predictor module of the metabolic flows (predictor of the flows of the metabolic reactions 103 in the drawings), which is configured to construct the metabolic network of the biological system as a function of the plurality of data and to determine the time dynamics of the metabolic flows of the biological system as a function of the metabolic network.
The apparatus can further comprise a predictor module of the concentrations of the components of the biological samples (predictor of the concentrations of the metabolites 104 in the drawings), which is configured for determining a time dynamic of the concentrations of the components of the biological system as a function of the time dynamic of the metabolic flows.
The apparatus may further comprise a predictor module of the kinetic model 105 of the biological system, which is configured for simulating the time development of the metabolic network identifying the final phenotype of the biological system. The apparatus may further comprise at least one sensor configured to measure at least one parameter of the biological system.
In this context, the computer is further configured to generate a digital twin 206 of the biological system as a function of the at least one parameter, from which experimental data 207 useful for determining the metabolic network and for simulating its time development can subsequently be acquired.
The computer may be further connected with a database 200 (for example one or more of the databases 200 identified above) for acquiring a plurality of tabulated data representing the initial phenotype of the biological system.
The apparatus may also comprise a plurality of actuators configured to modify at least one parameter of the biological system.
More specifically, the actuators can be operated as a function of and in response to the results of the operations which lead to the generation of the simulation representing the time development of the biological system, and in particular as a function of the report.

Claims

1. A method which can be implemented by a computer for the predictive analysis of a biological system comprising the following steps:
- acquiring a plurality of data representing an initial phenotype of the biological system;
- constructing a metabolic network (209) of the biological system as a function of said plurality of data;
- simulating (222) a time development of said metabolic network (209) in such a way as to identify a final phenotype of the biological system;
- generating (706, 806) a report representing and identifying said final phenotype.
2. The method according to claim 1, wherein the step of acquiring the plurality of data comprises at least one of the following steps:
- acquiring from a database (200) a plurality of tabulated data representing the initial phenotype;
- acquiring from a digital twin (206) of the biological system a plurality of experimental data (207) representing the initial phenotype.
3. The method according to claim 2, wherein the acquisition of the experimental data (207) is performed in real time and the simulation of the time development of the metabolic network (209) is validated by comparing it with the experimental data (207).
4. The method according to claim 2 or 3, comprising the following steps:
- activating at least one sensor for measuring at least one parameter of a biological system;
- generating as a function of said at least one parameter a digital twin (206) of the biological system.
5. The method according to any one of the preceding claims, wherein the plurality of data comprises at least one of: genomic data (201), transcriptomic data (202), proteomic data (203), metabolic data (204), flow data (205), biometric data.
6. The method according to any one of the preceding claims, wherein if the step of simulating the time development of the metabolic network (209) is not completed within a predetermined time interval, the step of constructing the metabolic network (209) of the biological system is performed as a function of a sub-set of the plurality of data.
7. The method according to any one of the preceding claims, wherein the biological system identifies a plurality of biological targets defining a condition of respective components of the biological system and wherein the report identifies at least one biological target in the final phenotype.
8. The method according to claim 7, comprising the following steps:
- defining at least one objective biological target;
- simulating the time development until a biological target coincides with the objective target; said report being generated according to at least said objective biological target.
9. The method according to claim 7 or 8, comprising the following steps:
- defining an objective phenotype (702);
- simulating the time development until the final phenotype coincides with the objective phenotype (702); said report being generated according to at least said objective phenotype (702).
10. The method according to any one of claims 7 to 9, wherein said components of the biological system comprise at least one of: enzymes, receptors, ion channels, transport proteins, hormones, growth factors, DNA/RNA, ribosomes or metabolites.
11. The method according to any one of the preceding claims, comprising the step of activating at least one actuator in such a way as to modify at least one parameter of said biological system according to the report.
12. The method according to any one of the preceding claims, comprising a step of applying to the metabolic network (209) a gap-filling algorithm in such a way as to validate said metabolic network (209).
13. The method according to any one of claims 7 to 12, comprising the following steps:
- representing the metabolic network (209) in terms of the problem of satisfying constraints (210);
- acquiring one or more biological targets;
- determining a time dynamic of the metabolic flows of the components of the biological system as a function of the problem of satisfying constraints (210) and biological targets; said simulating steps being performed as a function of the time dynamic of the metabolic flows.
14. The method according to claim 13, comprising the step of determining a time dynamic of the concentrations of the components of the biological system as a function of the time dynamic of the metabolic flows; said simulating steps being performed as a function of the time dynamic of the concentrations of the components of the biological system.
15. The method according to claim 14, wherein the determining of the time dynamic of the metabolic flows and of the concentrations of the components is performed by applying predictive and/or automatic learning algorithms defining kinetic parameters (220) of the biological system.
16. The method according to claim 15, comprising the following steps:
- defining a reference phenotype;
- applying a disturbance to the metabolic network (209) such as to modify the time dynamic of the metabolic flows and/or of the concentrations of the components;
- identifying, as a function of said disturbance, a disturbed phenotype of the biological system;
- determining an efficacy and toxicity parameter of said disturbance as a function of the distance (906) between the final phenotype and the reference phenotype.
17. An apparatus for analysing a biological system comprising:
- a computer configured to execute a method according to any one of the preceding claims;
- a user interface (107).
18. The apparatus according to claim 17, comprising at least one sensor configured for measuring at least one parameter of the biological system, the computer preferably being configured to generate a digital twin (206) of said biological system as a function of the at least one parameter.
19. The apparatus according to claim 17 or 18, wherein the computer is connected or connectable to a database (200) for acquiring a plurality of tabulated data representing the initial phenotype of the biological system.
20. The apparatus according to any of claims 17 to 19, comprising a plurality of actuators configured for modifying at least one parameter of the biological system.
21. The apparatus according to any one of claims 17 to 20, comprising:
- a predictor module of the metabolic flows configured to construct the metabolic network (209) of the biological system as a function of said plurality of data and for determining the time dynamic of the metabolic flows of the components of the biological system as a function of said metabolic network (209);
- a predictor module of the concentrations of the components of the biological samples configured for determining a time dynamic of the concentrations of the components of the biological system as a function of the time dynamic of the metabolic flows;
- a predictor module of the kinetic model (105) of the biological system configured for simulating the time development of the metabolic network (209) identifying the final phenotype of the biological system.
PCT/IB2021/055527 2021-02-01 2021-06-23 Method for predictive analysis of a biological system Ceased WO2022162440A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT102021000002033A IT202100002033A1 (en) 2021-02-01 2021-02-01 METHOD FOR PREDICTIVE ANALYSIS OF A BIOLOGICAL SYSTEM
IT102021000002033 2021-02-01

Publications (1)

Publication Number Publication Date
WO2022162440A1 true WO2022162440A1 (en) 2022-08-04

Family

ID=75936968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2021/055527 Ceased WO2022162440A1 (en) 2021-02-01 2021-06-23 Method for predictive analysis of a biological system

Country Status (2)

Country Link
IT (1) IT202100002033A1 (en)
WO (1) WO2022162440A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118942530A (en) * 2024-10-08 2024-11-12 中南大学 A biological fluid flow fitting method, system, terminal and medium
WO2025230601A1 (en) * 2024-04-29 2025-11-06 X Development Llc Automated kinetic model generation for biochemical pathways

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2010098865A1 (en) * 2009-02-26 2010-09-02 Gt Life Sciences, Inc. Mammalian cell line models and related methods
WO2014072950A1 (en) * 2012-11-09 2014-05-15 Cellworks Research India Private Limited System and method for development of therapeutic solutions
WO2020224779A1 (en) * 2019-05-08 2020-11-12 Insilico Biotechnology Ag Method and means for optimizing biotechnological production


Non-Patent Citations (1)

Title
CSERMELY PETER ET AL: "Structure and dynamics of molecular networks: A novel paradigm of drug discovery A comprehensive review", PHARMACOLOGY & THERAPEUTICS, ELSEVIER, GB, vol. 138, no. 3, 4 February 2013 (2013-02-04), pages 333 - 408, XP029177912, ISSN: 0163-7258, DOI: 10.1016/J.PHARMTHERA.2013.01.016 *


Also Published As

Publication number Publication date
IT202100002033A1 (en) 2022-08-01

Similar Documents

Publication Publication Date Title
Hartig et al. Statistical inference for stochastic simulation models–theory and application
van Rosmalen et al. Model reduction of genome-scale metabolic models as a basis for targeted kinetic models
Tejada-Lapuerta et al. Causal machine learning for single-cell genomics
Brooks et al. Challenges and best practices in omics benchmarking
DiRenzo et al. A practical guide to understanding and validating complex models using data simulations
MacLeod et al. Modeling systems-level dynamics: Understanding without mechanistic explanation in integrative systems biology
Eriksson et al. Uncertainty quantification, propagation and characterization by Bayesian analysis combined with global sensitivity analysis applied to dynamical intracellular pathway models
Cannon et al. The ion channel inverse problem: neuroinformatics meets biophysics
KR20250111274A (en) System and method for optimizing general purpose biological network for drug response prediction using meta-reinforcement learning agent
Trensch et al. Rigorous neural network simulations: a model substantiation methodology for increasing the correctness of simulation results in the absence of experimental validation data
WO2022162440A1 (en) Method for predictive analysis of a biological system
Cao et al. Evolving cell models for systems and synthetic biology
Lotterhos et al. Analysis validation has been neglected in the Age of Reproducibility
Kulkarni et al. Scientific hypothesis generation and validation: Methods, datasets, and future directions
Mousa et al. Incorporating Physical Constraints inside Neural Networks to Improve their Accuracy and Physical Reliability for Chemical Engineering Unit Operations Modeling
EP4169021A1 (en) Kinematic modeling of biochemical pathways
EP4169022A1 (en) Kinematic modeling of biochemical pathways
Sharma et al. Estimate Reliability of Component Based Software System Using Modified Neuro Fuzzy Model
Barberousse et al. How Do the Validations of Simulations and Experiments Compare?
Li et al. Modeling biological networks: a systematic review of computational approaches to network dynamics
Xiaolin Intelligent educational cyber‐physical systems for college English course
Adiamah et al. Streamlining the construction of large-scale dynamic models using generic kinetic equations
Lee et al. Discovering sparse control strategies in neural activity
Gjerga et al. Literature and data-driven based inference of signalling interactions using time-course data
Gore et al. INSIGHT: understanding unexpected behaviours in agent-based simulations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21734231

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/11/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21734231

Country of ref document: EP

Kind code of ref document: A1