[go: up one dir, main page]

WO2014015196A2 - Techniques permettant de prédire le phénotype à partir du génotype sur la base d'un modèle de calcul correspondant à une cellule entière - Google Patents

Techniques permettant de prédire le phénotype à partir du génotype sur la base d'un modèle de calcul correspondant à une cellule entière Download PDF

Info

Publication number
WO2014015196A2
WO2014015196A2 PCT/US2013/051167 US2013051167W WO2014015196A2 WO 2014015196 A2 WO2014015196 A2 WO 2014015196A2 US 2013051167 W US2013051167 W US 2013051167W WO 2014015196 A2 WO2014015196 A2 WO 2014015196A2
Authority
WO
WIPO (PCT)
Prior art keywords
cell
sub
protein
rna
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2013/051167
Other languages
English (en)
Other versions
WO2014015196A3 (fr
Inventor
Markus COVERT
Jonathan KARR
Jayodita SANGHVI
Jared Jacobs
Derek MACKLIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leland Stanford Junior University
Original Assignee
Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leland Stanford Junior University filed Critical Leland Stanford Junior University
Publication of WO2014015196A2 publication Critical patent/WO2014015196A2/fr
Publication of WO2014015196A3 publication Critical patent/WO2014015196A3/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/10Boolean models
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • a method comprises retrieving on a processor cell state data that indicates chromosome data that represents over 40% of genes in a single cell of an organism, and which also indicates populations of gene products in multiple functional compartments.
  • the functional compartments include at least an external conditions compartment, a membrane compartment, a cytosol compartment, and a DNA interactions compartment.
  • the method includes executing on a processor multiple sub-models that each simulate, for a same time duration called a time step, a different cell process selected from a group of cell processes comprising at least transcription, translation, ribosome assembly, protein translocation, and metabolism.
  • the method also includes updating on a processor values for the cell state data based on results from executing the plurality of sub-models; and, repeating, for multiple time steps, the steps of executing the plurality of sub-models and updating the values for the cell state data.
  • the process further includes storing, on a computer-readable medium, values of the cell state data after the plurality of time steps.
  • the chromosome data represents all genes in a single cell of an organism.
  • the method also includes determining a first cell phenotype, such as cell mass doubling time or cell division time, based on the values of the cell state data after the plurality of time steps.
  • the method also includes determining whether the first cell phenotype matches a target phenotype; and, if so, then genetically modifying an actual cell of the organism to correspond to the
  • the sub-models include a variety of types of models, including models based on ordinary differential equations, Boolean networks models, constraint-based modeling, and flux mass balance models.
  • an apparatus, a system, or a computer-readable medium is configured to cause an apparatus to perform one or more steps of the above methods.
  • Still other aspects, features, and advantages of the invention are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the invention.
  • the invention is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
  • FIG. 1 is a block diagram that illustrates example processes in a cell to be modeled, according to an embodiment
  • FIG. 2 is a block diagram that illustrates useful compartments, according to some embodiments.
  • FIG. 3 is a block diagram that illustrates example simulations using cell state variables and sub-models, according to an embodiment
  • FIG. 4 is a flowchart that illustrates an example method for forming a whole-cell model, according to an embodiment
  • FIG. 5A through FIG. 5C are flowcharts that illustrate an example use of whole- cell model simulation, according to an embodiment
  • FIG. 6A through FIG. 6C are bock diagrams that illustrate example fields and arrays and initial values for representing chromosome data in the Chromosome state variable class, according to an embodiment
  • FIG. 7 is a block diagram that illustrates example fields in the Transcript state variable class, according to an embodiment
  • FIG, 8A is a block diagram that illustrates example stages of RNA maturation and change tracked in the RNA state variable class, according to an embodiment
  • FIG. 8B is a block diagram that illustrates example fields in the RNA state variable class, accordins to an embodiment
  • FIG. 8C is a table that lists example interactions between different cell sub-models and the RNA state variable class, according to an embodiment
  • FIG. 9 is a block diagram that illustrates example fields in the Polypeptide state variable class, according to an embodiment.
  • FIG. 10A is a block diagram that illustrates example stages of protein monomer maturation and change tracked in the Protein Monomer state variable class, according to an embodiment;
  • FIG. 10B is a block diagram that illustrates example fields in the Protein Monomer state variable class, according to an embodiment
  • FIG. IOC is a cable that list example interactions between different cell sub-models and the Protein Monomer state variable class, according to an embodiment
  • FIG. 1 1 A is a block diagram that illustrates example fields in the Protein Complex state variable class, according to an embodiment
  • FIG. I IB is a table that list example interactions between different cell sub-models and the Protein Complex state variable class, according to an embodiment
  • FIG, 12 is a block diagram that illustrates example fields in the RNA Polymerase state variable class, according to an embodiment
  • FIG. 13 is a block diagram that illustrates example fields in the Ribosome state variable class, according to an embodiment
  • FIG. 14 is a block diagram that illustrates example fields in the Cytokinesis state variable class, according to an embodiment
  • FIG. 15 is a block diagram that illustrates example fields in the Metabolic Reaction state variable class, according to an embodiment
  • FIG. 16 is a block diagram that illustrates example fields in the Metabolite state variable class, according to an embodiment
  • FIG. 17 A is a block diagram that illustrates example cell geometry with a cell division septum region, according to an embodiment
  • FIG. 17B is a block diagram that illustrates example fields in the Geometry state variable class, according to an embodiment
  • FIG, 18 is a block diagram that illustrates example fields in the Host state variable class, according to an embodiment
  • FIG. 19 is a block diagram that illustrates example fields in the Mass state variable class, according to an embodiment.
  • FIG. 20 is a block diagram that illustrates example fields in the Stimulus state variable class, according to an embodiment;
  • FIG, 21 is a block diagram that illustrates example fields in the Time state variable class, according to an embodiment
  • FIG. 22A is a block diagram that illustrates features of DNA condensation, according to an embodiment
  • FIG. 22B is a flowchart that illustrates an example DNA Condensation sub-model, according to an embodiment
  • FIG. 23A is a flowchart that illustrates an example DNA Segregation sub-model, according to an embodiment
  • FIG. 23B is a list showing the relationship of the DNA Segregation sub-model to cell state variables, according to an embodiment
  • FIG. 24A is a flowchart that illustrates an example DNA Damage sub-model, according to an embodiment
  • FIG. 24B is a list of instructions that illustrates an example DNA Damage submodel for M. genitalium, according to an embodiment
  • FIG. 25A is a flowchart that illustrates an example DNA Repair sub-model, according to an embodiment
  • FIG. 25B is a list of instructions that illustrates an example DNA Repair sub-model for M. genitalium, according to an embodiment
  • FIG. 26A is block diagram that illustrates different regions for defining helicity on a replicating chromosome, according to an embodiment
  • FIG. 26B through FIG. 26D are graphs that illustrate the effect of superhelical density on the activity of certain DNA binding proteins, according to an embodiment
  • FIG. 26E and FIG. 26F are flow charts that illustrate an example method for a DNA Supercoiling sub-model, according to an embodiment
  • FIG. 26G is a list of parameter values set for the DNA Supercoiling sub-model, according to an embodiment.
  • FIG. 27A through FIG. 27C are a block diagram that illustrates the progress of DNA replication and components of replication machinery in a cell, modeled according to an embodiment;
  • FIG. 27D through FIG. 27F are flowcharts that illustrate an example method for a DNA Replication sub-model, according to an embodiment
  • FIG. 27G is a list of parameter values set for the DNA Replication sub-model, according to an embodiment
  • FIG. 27H is a list of cell state variables and other sub-models interacting with the DNA Replication sub-model, according to an embodiment
  • FIG. 271 through FIG. 27K are lists of instructions that illustrate an example DNA Replication sub-model, according to an embodiment
  • FIG. 28A is a list of parameters and values for an example DNA Replication Initiation sub-model, according to an embodiment
  • FIG. 28B and FIG. 28C are block diagrams that illustrate binding state
  • FIG. 28D is a list of cell state variables and other sub-models interacting with the DNA Replication Initiation sub-model, according to an embodiment
  • FIG. 29A is a flowchart that illustrates an example method for a Transcriptional Regulation sub-model, according to an embodiment
  • FIG. 29B is a list of instructions that illustrates an example Transcriptional Regulation sub-model, according to an embodiment
  • FIG. 30A and FIG. 30B are flowcharts that illustrate an example method for a Transcription sub-model, according to an embodiment
  • FIG. 30C is a list of cell state variables and other sub-models interacting with the Transcription sub-model, according to an embodiment
  • FIG. 30D is a list of parameters and values for a Transcription sub-model, according to an embodiment.
  • FIG. 31A is a flowchart that illustrates an example method for a RNA Processing sub-model, according to an embodiment;
  • FIG. 3 IB is a list of instructions that illustrates an example RNA Processing submodel, according to an embodiment
  • FIG. 32A is a flowchart that illustrates an example method for a RNA Modification sub-model, according to an embodiment
  • FIG. 32B is a list of instructions that illustrates an example RNA Modification submodel, according to an embodiment
  • FIG. 33A is a flowchart that illustrates an example method for an RNA
  • Aminoacylation sub-model according to an embodiment
  • FIG. 33B is a list of parameters for an RNA Aminoacylation sub-model, according to an embodiment
  • FIG. 34A is a flowchart that illustrates an example method for an RNA Decay submodel, according to an embodiment
  • FIG. 34B is a list of cell state variables and other sub-models interacting with the RNA Decay sub-model, according to an embodiment
  • FIG. 34C is a list of parameters and values for an RNA Decay sub-model, according to an embodiment
  • FIG. 35A and FIG. 35B are flowcharts that illustrate an example method for a Translation sub-model, according to an embodiment
  • FIG. 35C is a list of cell state variables and other sub-models interacting with the Translation sub-model, according to an embodiment
  • FIG. 35D is a list of parameters and values for a Translation sub-model, according to an embodiment
  • FIG. 35E is a list of instructions that illustrates an example method for initializing ribosomes for a Translation sub-model, according to an embodiment
  • FIG. 36A is a flowchart that illustrates an example method for a Protein Processing I sub-model, according to an embodiment
  • FIG. 36B is a list of instructions that illustrates an example Protein Processing I sub-model, according to an embodiment
  • FIG. 37A is a flowchart that illustrates an example method for a Protein
  • FIG. 37B is a list of instructions that illustrates an example Protein Translocation sub-model, according to an embodiment
  • FIG. 38A is a flowchart that illustrates an example method for a Protein Processing II sub-model, according to an embodiment
  • FIG. 38B is a list of instructions that illustrates an example Protein Processing II sub-model, according to an embodiment
  • FIG. 39A is a flowchart that illustrates an example method for a Protein Folding sub-model, according to an embodiment
  • FIG. 39B is a list of instructions that illustrates an example Protein Folding submodel, according to an embodiment
  • FIG. 40A is a flowchart that illustrates an example method for a Protein
  • FIG. 40B is a list of instructions that illustrates an example Protein Modification sub-model, according to an embodiment
  • FIG. 41A is a flowchart that illustrates an example method for a Protein
  • FIG. 41B is a list of instructions that illustrates an example Protein Complexation sub-model, according to an embodiment
  • FIG. 42A is a flowchart that illustrates an example method for a Ribosome Assembly sub-model, according to an embodiment
  • FIG. 42B is a list of instructions that illustrates an example Ribosome Assembly sub-model, according to an embodiment
  • FIG. 43A is a flowchart that illustrates an example method for a Organelle Assembly sub-model, according to an embodiment
  • FIG. 43B is a block diagram of hierarchical protein organization of a Terminal Organelle, according to an embodiment
  • FIG. 43C is a list of instructions that illustrates an example Terminal Organelle Assembly sub-model, according to an embodiment
  • FIG. 44A is a flowchart that illustrates an example method for a Protein Activation sub-model, according to an embodiment
  • FIG. 44B is a list of instructions that illustrates an example Protein Activation submodel, according to an embodiment
  • FIG. 45A through FIG. 45D are flowcharts that illustrate an example method for a Protein Decay sub-model, according to an embodiment
  • FIG. 45E through FIG. 45G are lists of instructions that illustrate an example Protein Decay sub-model, according to an embodiment
  • FIG. 46 is a list of parameters and values for an FtsZ polymerization sub-model useful for modeling cell division, according to an embodiment
  • FIG. 47A and FIG. 47B are block diagrams that illustrate a metabolic network of M. genitalium, according to an embodiment
  • FIG. 47C and FIG. 47D are block diagrams that illustrate an integrated flux balance analysis for M. genitalium, according to an embodiment
  • FIG. 47E and FIG. 47F are lists of instructions that illustrate an example
  • Metabolism sub-model according to an embodiment
  • FIG. 48A is a list of parameters and values for a Cytokinesis sub-model useful for modeling cell division, according to an embodiment
  • FIG. 48B is a block diagram that illustrates an example cycle of filament bending during cytokinesis in the Cytokinesis sub-model, according to an embodiment
  • FIG. 48C is a list of state variables and sub-models that interact with the
  • Cytokinesis sub-model according to an embodiment
  • FIG. 49 is a list of instructions that illustrates an example Host Interaction submodel, according to an embodiment.
  • FIG. 50 is a graph that shows example comparisons between measured and predicted growth rates for wild-type and 12 single-gene disrupted strains, according to an embodiment.
  • FIG. 51 is a block diagram that illustrates a computer system 5100 upon which an embodiment of the invention may be implemented.
  • a whole-cell model is a processor-based method that generates populations of gene products in multiple functional compartments in each of multiple time steps from over 40% of genes in all chromosomes of a cell, preferably over 50% of the genes, more preferably over 90% of the genes, and, in some embodiments, from every gene in all chromosomes.
  • Such models are used to predict the evolution of an individual cell with a particular genotype through cell division, or the variability among a population of cells with the same genotype, or a range of genotypes, including a tissue constructed from a population of such cells, for time scales in a range from one time step to a duration corresponding to many thousands of cell mass doublings and cell divisions.
  • . genitalium a human urogenital prokaryotic cell parasite whose genome contains 525 genes.
  • the invention is not limited to this context.
  • a whole-cell model is formed and used for other prokaryotic cells (cells without a nucleus in which genetic material is free in the cytoplasm) as well as eukaryotic cells (that carry out more internal chemical mechanisms and so utilize a nuclear membrane to separate DNA processes), using one or more techniques described herein.
  • a population of cells is simulated by rerunning the whole-cell model multiple times, either in parallel at each time step, or separately.
  • population simulations may be used, in various embodiments, to determine variability within the population or interactions between cells for cells of the same tissue or of the same organism, or for cell populations of mixed organisms, or some combination.
  • ADP Adenosine Di-Phosphate a molecule that results from hydrolysis of
  • ATP Adenosine Tri- Phosphate a molecule used by ceils to store energy
  • cell compartment A portion of cell in which a particular set of processes are performed and populated by molecules that serve as reactants and products of those processes.
  • chromosome A single piece of coiled DNA containing many genes, regulatory
  • DNA Deoxyribonucleic acid is a usually double-stranded long
  • nucleotides or “bases.” There are four bases: adenine, thymine, cytosine, and guanine, represented by the letters A, T, C and G, respectively.
  • Adenine on one strand of DNA always binds to thymine on the other strand of DNA
  • guanine on one strand always binds to cytosine on the other strand; such bonds are called base pairs.
  • Any order of A, T, C and G is allowed on one strand, and that order determines the complementary order on the other strand.
  • the actual order encodes other shorter molecules, such as proteins, used to build and control all living organisms and thus determines the function of that portion of the DNA molecule.
  • DNA-binding Proteins that are composed of DNA-binding domains that have a proteins specific or general affinity for either single or double stranded DNA.
  • DNA-binding proteins include transcription factors which modulate the process of transcription, various polymerases, nucleases which cleave DNA molecules, and histones which are involved in chromosome packaging and transcription in the cell nucleus.
  • GDP Guanosine diphosphate a molecule that results from hydrolysis of GTP to release energy.
  • a gene includes, without limitation, regions preceding and following the coding region, such as the promoter and 3'-untranslated region, respectively, as well as intervening sequences (introns) between individual coding segments (exons).
  • DNA or, for many types of viruses, in RNA includes both the genes and the non-coding sequences of the DNA/RNA.
  • genotype The genetic composition of an individual organism.
  • GTP Guanosine-5'-triphosphate a molecule used by cells to store energy.
  • chromosome that can be used to determine the chromosome location of an arbitrary sequenced portion of DNA.
  • NF-s B nuclear factor kappa-light-chain-enhancer of activated B cells nucleic acid A molecule comprising a sequence of one or more repeating chemical units known as “nucleotides” or “bases” of “base pairs” (bp).
  • DNA deoxyribonucleic acid
  • adenine, thymine, cytosine, and guanine represented by the letters A, T, C and G, respectively.
  • RNA ribonucleic acid
  • U replaces the base thymine (T).
  • phenotype An individual organism's observable characteristics or traits, such as its morphology, development, biochemical or physiological properties, phenology, behavior, and products of behavior.
  • pi Isoelectric point that indicates the pH at which a particular molecule or surface carries no net electrical charge.
  • protein A generic term referring to molecules made up of a chain of amino acids and including native protein, fragments, peptides, or analogs of a polypeptide sequence. Synonym used herein includes "polypeptide.” Hence, native protein, fragments, and analogs are species of a polypeptide genus. R/M restriction/modification
  • RNA Ribonucleic acid comprises a single stranded chain of
  • sample Includes any biological specimen obtained from a subject.
  • SNP Single-nucleotide polymorphism a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a biological species or paired
  • chromosomes in an individual.
  • the vast majority of polymorphic sites are bi-allelic, meaning that no more than two possible distinct alleles can be observed at these sites.
  • a 0 indicates one nucleotide at the position of the SNP and a 1 indicates a different nucleotide.
  • mammals e.g., humans, dogs, cows, horses, kangaroos, pigs, sheep, goats, cats, mice, rabbits, rats, and transgenic non-human animals as well as plants and protists. Synonyms used herein include “patient” and "animal.”
  • steps to obtain beneficial or desired results including clinical results, such as alleviating or ameliorating one or more symptoms of a disease; diminishing the extent of disease; delaying or slowing disease progression; ameliorating and palliating or stabilizing a metric
  • whole-cell model A computational process that generates populations of gene products in (also, whole cell multiple functional compartments in each of multiple time steps from computational over 40% of genes in all chromosomes of a cell, preferably over 50% model) of the genes, more preferably over 90% of the genes, and when
  • a useful goal is to predict the complex phenotypes of individual cells in terms of individual molecules and their interactions.
  • the primary challenges to building a unified, whole-cell model of a single cell are threefold: (1) complexity, processes relevant to cellular behavior are diverse and span a wide range of length and time scales, (2) heterogeneity, cellular networks have heterogeneous mathematical structures and are typically investigated using heterogeneous experimental methods, and (3) sparsity, little quantitative data rigorously describing single cell physiology is available. The flexibility of a hybrid model best allowed navigating these challenges.
  • Bio systems are herein viewed as modular at certain time scales.
  • the cell is modeled herein by (1) dividing cellular processes into separate submodels corresponding to separate functional processes, (2) independently running each submodel on a short time scale, and (3) integrating longer time scales by using de-conflicted sub-model outputs at the end of one time step as input to one or more sub-models for the next time step.
  • the sub-models span six areas of cell biology: (1) transport and metabolism, (2) DNA replication and maintenance, (3) RNA synthesis and maturation, (4) protein synthesis and maturation and (6) interaction, if any.
  • the protein synthesis and maturation sub-models include an additional area for cell-type specific functions, such as cell-specific organelle functions. For example, M.
  • the sub-models include a variety of types of models, including models based on ordinary differential equations, Boolean networks models, constraint-based modeling, and flux mass balance models.
  • FIG. 1 is a block diagram that illustrates example processes in a cell to be modeled, according to an embodiment.
  • FIG. 1 depicts 28 sub-models grouped by category as metabolic 140, RNA 120, protein 130, and DNA 110— in the context of a single cell. Sub-models are connected through common metabolites, RNA, protein, and the chromosome which are depicted as solid filled arrows. It was discovered that much of the cell phenotype, including growth and cell division is described with these 28 sub models listed in Table 2. These sub-models are described in more detail below with reference to FIGs.
  • sub-models are excluded. For example, if cell division is not modeled, sub-models directed to replication and cell division are omitted. In other embodiments, other sub-models are added, such as sub-models for producing other organelles.
  • Protein Protein Activation Causing a protein to be active for performing a function based on external conditions falling in a particular range
  • Cytokines Pinching of the cell membrane in a certain location until the membrane is separated and division is complete host interaction
  • Host Interaction Determining external conditions caused by host, if any, in response to proteins and
  • compartments are a portion of cell in which a particular set of processes are performed and populated by molecules that serve as reactants and products of those processes.
  • the compartment might correspond to one or more locations in a cell, such as the membrane. It was found that useful whole-cell modeling could be performed with four or more compartments representing, at least, external conditions, cell membrane, cytosol and DNA interactions. Additional compartments are included in some embodiments, such as one or more compartments for special organelles and a nucleus membrane compartment for cells with a nucleus.
  • FIG. 2 is a block diagram that illustrates useful compartments 200, according to some embodiments.
  • the compartments 200 include a DNA interactions compartment 210, a cytosol compartment 220, a cell membrane compartment 230 and an external conditions compartment 240. Also depicted are a compartment 250 for a nucleus membrane a compartment 260 for special organelles, and sub-compartment 262 for organelle membranes in the organelle compartment 260 and sub-compartment 232 for lipoprotein interactions in cell membrane compartment 230.
  • a whole-cell model includes the gene products of a large fraction, if not all, of the genome.
  • the term whole-cell model means a computational process that generates populations of gene products in multiple functional compartments in each of multiple time steps from over 40% of genes in all chromosomes of a cell, preferably over 50% of the genes, more preferably over 90% of the genes, and when possible every gene in all chromosomes.
  • the entire genome of a simple one-celled organism, M. genitalium is included in the whole cell mode.
  • a method includes determining sub-models for each compartment including: transcription regulation, transcription and RNA maturation in the DNA interactions compartment; translation, protein maturation, protein translocation and ribosome assembly in the cytosol compartment; and metabolism in the cell membrane compartment.
  • the inputs to each submodel and the outputs of each sub-model are determined.
  • Variable inputs used by more than one sub model and outputs of each model are collected as state variables.
  • the values of the state variables describe the phenotype of the cell at a particular time.
  • Constants used by the sub-models are considered model parameters, but values of those constants often have to be reconciled among multiple sub-models.
  • the 28 models use the populations and
  • Chromosome For each gene: location, start coordinate, length,
  • tRNA genes codon and amino acid, essential, expression shock, half life of each RNA transcript, sequence, G/C content, weight of each transcript, pi of each transcript, half-life RNA, decay rate RNA, synthesis rate RNA, polymerase binding probability RNA, homologs, database ref, transcription unit, reactions involving gene product.
  • R/M sites Short tandem repeats, coordinate, direction, sequence and length, progress of the RNA polymerase along the transcription unit, whether aborted at this location, calculated dry weight.
  • RNA species copy count, weights,
  • Protein Monomers For each gene coded protein monomer species: copy count, DNA-bound copy count, result of successful translation. Localization, signal sequence, N-terminal methionine cleavage, prosthetic groups, chaperone, topology, catalysis, DNA footprint (instability, stability, aliphatic index, GRAVY, extinction coefficient and absorption of gene), folded or unfolded, modified or unmodified, nascent or processed. Processing enzyme copy count, cytosolic- and membrane- and organelle cytosol- and organelle membrane-localized copy counts.
  • Protein Complex For each protein complex species: copy count, DNA- bound copy number, complex subunit composition, molecular weights, amino acid composition, half-lives, and localization, minimum expression of each complex for each process with a minimum, folded and unfolded copy counts, and modified and unmodified copy counts.
  • RNA polymerase Copy count chromosome location of DNA-bound copy, expected probabilities of a polymerase being in four conditions (free, non-specific bound, specific bound, actively transcribing), fold changes to the base binding probabilities.
  • Ribosome Copy count For each copy: (1) the status - actively translating or stalled, (2) the mRNA (or tmRNA if stalled) species bound to and translating, and (3) the position, in codons from the start codon.
  • Cytokines is protein Number of edges and status of each edge (bent or not bent) in a protein complex associated with cell division, such as a FtsZ ring.
  • Geometry Diameter of cell midline (progress of cytosis), length of cylinder- shaped cell, width, volume (assuming constant density), surface area, septum length.
  • Mass Cell mass and mass of each component water mass, chromosome mass, RNA mass, protein monomer and complex masses, transcript mass, polypeptide mass, metabolites mass, mass of bi-lipid membrane (based on mass of other components and density).
  • FIG. 3 is a block diagram that illustrates example simulations using cell state variables and sub-models, according to an embodiment. Values of 16 multidimensional cell state variables 310 at one time are used as input to the 28 cell process sub-models 320. Cell functions and variables are grouped into 5 physiologic categories: DNA, RNA, protein, metabolite, and other. Lines between the variables and sub-models indicate the cell variables predicted by each sub-model. The number in parentheses next to each sub-model indicates the number of genes involved in the sub-model processes in the example organism, M.
  • the sub-models are executed (in random order in some embodiments to fairly apportion limited resources) to model corresponding processes for the same time duration (called the time step) to produce a phenotype made up of new values of the cell state variables at a later time. That phenotype is compared to a simulation end condition 330, such as the cell dividing (occurs when a septum diameter output by cytokinesis sub-model equals zero), or a certain number of rounds of executing the sub-models. If the condition is satisfied, the simulation ends. Otherwise the values of the cell state variables output by the sub-models are used as input to the sub-models for another round of execution to advance the cell state another time step.
  • a whole-cell model includes more or fewer multidimensional cell state variables or sub-models or some combination.
  • An advantage of the whole-cell model approach is that uncertain values for any variables or parameters can be determined in order to produce molecular closure and temporal closure.
  • Molecular closure means that the populations of gene products are limited by the availability of the molecules that make them up. When all molecular resources are consumed, no new gene products or derivative complexes can be generated, until the resident gene products are broken down into their constituent molecules for reuse or new molecules are imported through the cell membrane.
  • FIG. 4 is a flowchart that illustrates an example method for forming a whole-cell model, according to an embodiment. Although steps are depicted in FIG. 4, and in subsequent flowcharts, as integral steps in a particular order for purposes of illustration, in other embodiments, one or more steps, or portions thereof, are performed in a different order, or overlapping in time, in series or in parallel, or are omitted, or one or more additional steps are added, or the method is changed in some combination of ways.
  • compartments are selected for a cell type.
  • the compartments include at least the DNA interactions compartment, cytosol compartment, cell membrane compartment; and external conditions compartment.
  • other compartments such as those depicted in FIG. 2 are also included.
  • the special organelle compartment is a so-called terminal organelle used for attaching to the urogenital passage of a host organism.
  • the determination of compartments includes performing a comprehensive search of the available literature on the subject. In many cases, direct information about an organism compartment is not available. In such cases information about a closely related organism is used. For example, for instances in which the knowledge base of M. genitalium is lacking, information about Escherichia coli (E. coli) was substituted. In some cases other Mycoplasmas are used; in other cases, E. coli was used; and in yet other cases organisms were not limited to Mycoplasmas or E. coli.
  • E. coli Escherichia coli
  • sub-models are determined for modeling the processes in each compartment including, at least: transcription regulation, transcription and RNA maturation in the DNA interactions compartment; translation, protein maturation, and ribosome assembly in the cytosol compartment; and, protein translocation and metabolism in the cell membrane compartment.
  • more sub-models are included for simulating other processes in various compartments, such as one or more of the sub-models listed above in Table 2, or others not listed there.
  • the determination of processes or the determination of existing sub-models for those processes includes performing a comprehensive search of the available literature on the subject. In many cases, direct information about an organism genome and processes is not available.
  • variable inputs and outputs are determined for each sub-model (such as chromosome and copy counts of molecules in compartment and cell mass) and parameters (such as reaction rate statistics, percentage of cell mass associated with an important molecule).
  • the determination of parameter values for those processes includes performing a comprehensive search of the available literature on the subject.
  • cell states variables are determined based on sub-model inputs and outputs. Because each cell state variable is multidimensional, each is stored in a data structure that includes one or more Boolean, scalar or array fields, or some combination.
  • the cell state data structure is generated to store each multidimensional attribute of cell state and to indicate cell state methods to initialize, update, retrieve or derive values for every attribute of the cell state.
  • object-oriented programming structures provide a computational mechanism for storing multidimensional variable values and methods for manipulation of those values, such as to initialize, update, retrieve or derive values.
  • Objects of object-oriented programming are instances that can inherit methods and attributes, such as parameter values, from an object-oriented class.
  • one or more of the cell state variable data structures are objects of a cell state variable class.
  • step 409 for each sub-model, methods are generated to utilize cell state data structures as input or output or both, and to set parameter values.
  • object oriented programming methods are implemented as a method class.
  • step 409 includes generating or update a cell process sub-model class for each sub-model so that the class utilizes values stored in cell state variable objects.
  • step 411 a time scale for which input and output state variables for each submodel are well mixed in every compartment where the sub-model's processes occur is determined.
  • steady state conditions are determined for at least some sub-model.
  • Steady state conditions are those values of the cell state variables utilized by the sub-model that cause the model to output cell state variable values that are substantively the same as the input cell state variable values. Steady state values are useful for comparison with observations to ensure that sub-model parameters are properly set.
  • step 421 it is determined whether there is a another sub-model for which to perform steps 405 through 413. If so, control passes back to step 405. If not, control passes to step 431.
  • step 431 a time step for simulation of the cell is determined based on the well mixed time scale. The time step should be short enough to resolve the evolution of most cell state variables over time, but not so short that many extraneous time steps are taken to model even the fastest processes. To keep the time step from being too long, the time step it is selected to be shorter than the time for at least some sub-model processes to complete. To keep the time step from being too short, the time step it is taken to be about the same as the fastest well mixed time scale. In the example embodiment, the time step is selected to be 1 second. [0129] As a result of method 400, data structures are constructed to store knowledge base for much or most of the genome (40% to 100%) including sub-models for separate processes and state variables in multiple compartments of the single cell.
  • FIG. 5A through FIG. 5C are flowcharts that illustrate an example use of whole-cell model simulation, according to an embodiment.
  • FIG. 5A is directed either to training the whole-cell model based on observations, or to engineering a cell that exhibits a desired phenotype, e.g., the production of a certain product, such as an enzyme for digesting oil, or behavior, such as growth under conditions of abnormally high temperatures.
  • a desired phenotype e.g., the production of a certain product, such as an enzyme for digesting oil, or behavior, such as growth under conditions of abnormally high temperatures.
  • step 501 it is determined what behavior or product is desired in an organism and what known subject organism is suitable as a starting organism because that subject organism phenotype is close enough to the desired phenotype.
  • the desired phenotype is the actual phenotype of a subject organism.
  • step 511 a whole-cell model of the subject organism is constructed, or retrieved if already constructed.
  • a model can be constructed as described above with reference to FIG. 4.
  • the sub-models included in step 511 have not been optimized for use in a particular organism.
  • the values of parameters of the sub-models such as kinetic rates, are aligned so as not to be contradictory with parameter values used in other sub-models or so as to produce concentrations and proportions of gene products in a normal range under normal conditions.
  • the parameter values are set to provide molecular closure. In some embodiments, this optimization has already occurred in step 511, and step 513 is omitted.
  • initial conditions variance control is determined.
  • Cell theory states that all cells are created from old cells, or, more mathematically, that on the time scale of a single generation, mother and daughter cells are statistically similar. This principle was used to relate the initial and final cell state distributions of the whole-cell model, providing temporal closure and a rigorous way to define the initial cell state distribution in terms of the dynamic model.
  • the total cell mass variance am, and initial copy counts ⁇ of all RNA and ⁇ of all protein species, are used to control initial conditions variance.
  • FIG. 5B A method that calculates the expectation value of each cell state variable was developed based on the whole-cell model of step 511, This method is depicted in FIG. 5B.
  • FIG. 5C shows how to use the method of FIG. 5B to apply the principal of statistical equivalence of mother and daughter cells.
  • the inputs of FIG, 5B were determined initially with osn, ⁇ and ⁇ all set equal to zero.
  • the distribution of each cell state variable was approximated by a standard, well-behaved statistical distribution, and set the mean of each distribution to its calculated value. For example, it was assumed that the copy number of each RNA and protein species is multinomially distributed, and that the total cell mass is normally distributed.
  • step 523 the elapsed time is set, e.g., initially the elapsed time set to zero, the time of the initial conditions set in step 521; but, subsequently, the elapsed time is incremented by one time step. In an illustrated embodiment, the time step is 1 second.
  • step 525 the external conditions are set, so that time varying external conditions can be simulated, such as rising temperatures or falling levels of metabolites.
  • step 530 shared resources, such as metabolites, are divided among the processes competing for those resources.
  • step 530 includes steps 531 through 539.
  • the shared resources are not evenly distributed but can be differentially concentrated in one compartment compared to another.
  • Such embodiments include step 531 to determine the next compartment with shared resource to divide.
  • step 531 is omitted.
  • step 533 the next resource and amount available (e.g., in cell or compartment) is determined.
  • step 535 the next process that uses the shared resource is determined. In order to prevent one or a few processes form always getting the resources, the process is determined at random.
  • one sub-model may include multiple processes; but the sub-model is configured to apportion the amount of the shared resource allotted to the sub-model among the processes included in the sub-model. Because the sub-models run independently, process 530 divides the resources among multiple submodels before any sub-model is invoked.
  • step 537 the demand for the resource by the sub-model is determined. For example, one sub-model may not require a particular resource at all, another requires only one copy of the resource, and another may require multiple copies of the resource, e.g for large reactions or a large number of small reactions included in one sub-model. Steps 535 and 537 are repeated until all sub-models have been considered, steps 533 through 537 are repeated until all resources have been considered, and steps 531 through 537 are repeated until all compartments have been considered. When the demand for every shared resource is known for all compartments, control passes to step 539.
  • each shared resource is divided among the sub-models based on the relative demand by those sub-models, i.e., proportional to the demand.
  • a first sub-model that demands twice as many ATP molecules as a second sub-model will receive twice as many as the second sub-model.
  • step 550 the evolution of values for each cell state variable is determined.
  • step 530 includes steps 551 through step 555.
  • step 551 the next sub-model to be executed is determined.
  • the next sub-model is chosen at random so that a second sub-model that depends on an unshared output of a first sub-model will, at least on occasion, be executed after the first sub-model and not have to wait until the next time step.
  • step 553 the next sub-model is executed by retrieving the values of any unshared cell state variables and retrieving the divided part of any shared cell state variable.
  • step 555 the contributions (negative or positive) to the values of the cell state variables by executing the process are combined with the values already in those cell state variables an the values of those cell state variables updated.
  • the effects are not additive; step 555 includes any switching, saturation, or multiplicative effects of each contribution to the value of the cell state variable.
  • step 573 it is determined if the simulated phenotype is the desired phenotype. For example, does the simulation indicate that the genotype used in step 511 produces an organism with the desired behavior or product? If so, then an organism should be genetically engineered with the genotype used in step 511. If not, then the genotype should be adjusted in step 575 and the model run again with the new genotype by returning control to step 511.
  • Any method may be used to genetically engineer the organism. Introducing genes into microorganisms or cell culture lines is foundational to the biotechnology industry, enabling the production of proteins, biofuels, and biochemical. This process known as genetic engineering is often accomplished by inserting genes into a larger piece of DNA, such as a plasmid, that then gets transferred to a host cell. Genetic engineering alters the genetic makeup of an organism using techniques that remove heritable material or that introduce DNA prepared outside the organism either directly into the host or into a cell that is then fused or hybridized with the host.
  • Genetic perturbations include the introduction of a plasmid or virus containing heterologous genes, knock-outs and knock-ins of chromosomal genes, and modifications of functional and regulatory elements encoded in DNA (e.g., promoters, transcription-factor binding sites, ribosomal binding sites, regions of secondary structure in the transcribed RNA, termination sites).
  • modifications of functional and regulatory elements encoded in DNA e.g., promoters, transcription-factor binding sites, ribosomal binding sites, regions of secondary structure in the transcribed RNA, termination sites.
  • an entire chromosome can be synthesized and transplanted into a competent recipient host.
  • Whole-cell modeling incorporates known molecular biology into a quantitative framework for running computational simulations. These simulations are capable of predicting the multifactorial effects of the addition or removal of one or more genes in a cell. Additional genes can be modeled either by defining additional genes on one of the existing chromosomes, including possibly extending the sequence of the existing chromosome, or by modeling extrachromosomal plasmids analogously to genomic DNA. Genes can be removed from the model either functionally by eliminating their RNA and protein expression, or structurally by removing the gene's transcription start and stop sites and possibly altering the chromosome sequences.
  • step 581 it is determined whether the actual phenotype of an organism, such as the genetically engineered organism, matches the simulated phenotype. If so, then the process is done. If not, then the model is flawed and should be adjusted in step 583, and run again with the adjustments by returning control to step 511.
  • the model is trained by comparisons between simulated phenotypes and observed phenotypes, as described in more detail below with reference to an example whole-cell model for M. genitalium. In such embodiments, there is considered to be no desired phenotype different from the simulated phenotype, and step 573 is omitted.
  • a state variable class is often referred to as a state class in the following.
  • data structures and fields are depicted in many of the diagrams in this section as integral blocks in a particular order for purposes of illustration, in other embodiments, one or more data structures or messages or fields, or portions thereof, are arranged in a different order, in the same or different number of data structures or databases in one or more hosts or messages, or are omitted, or one or more additional fields are included, or the data structures and messages are changed in some combination of ways.
  • Chromosomes encode the structure and function of every RNA and protein, and thereby control cellular behavior. Due to their critical function and large size, cells dedicate considerable resources to chromosome replication, maintenance, and compaction. 'This state represents the polymerization, winding, modification, protein occupancy, and (de)catenation status of the chromosome(s). This state represents the polymerization, winding, modification, and protein occupancy of each nucleotide of each strand of each copy of the organism chromosome, and the (decatenation status of the two sister chromosomes following replication.
  • FIG. 6A through FIG. 6B are bock diagrams that illustrate example fields and arrays for representing chromosome data in the Chromosome state variable class, according to an embodiment. Many of the properties associated with these fields are described in more detail below with reference to the sub-models.
  • FIG. 6A depicts example fields according to one embodiment.
  • Chromosome ID field 601 holds data that indicates an identifier (ID) for a particular chromosome.
  • ID identifier
  • the subject organism has only one
  • Chromosome length field 602 holds data that indicates length of the chromosome in nucleotides (bases).
  • OriC position field 603 holds data that indicates a position along the chromosome where replication commences;
  • TerC position field 604 holds data that indicates a position along the chromosome where replication is completed
  • Chromosome sequence field 605 holds data that indicates the sequence of nucleotides of the forward strand. For brevity going forward, the expression that a field "indicates" means that the field "holds data that indicates.”
  • the regions accessible for protein binding field 606 indicates the position ranges in bases where the chromosome can be accessed for binding with proteins of sufficient affinity.
  • the locations of DNA-bound proteins field 607 indicates the position in bases where a protein is actually bound and the species of the bound protein.
  • the DNA footprints of proteins field 608 indicates the DNA footprints of the bound protein species.
  • the DnaA complex at OriC field 610 indicates whether that complex is currently bound to the chromosome at that location, signaling that replication can begin.
  • the polymerized regions field indicates the range of bases that have been polymerized (copied by the growing complementary strand during replication).
  • the superhelicity field 612 indicates the change in linking number from relaxed coiling for the chromosome or a replicating portion of the chromosome.
  • the strand breaks 613 field indicates locations of strand break, such as introduced when fragments are formed during replicating of a lagging strand or during chromosome damage.
  • the unbound bases field 614 and damaged bases field 615 indicate types of chromosome configurations that invoke chromosome repair.
  • Segregation status field 616 indicates whether replicated chromosomes have separated and moved to opposite poles of the cell.
  • the chromosome includes many genes, transcribed in groups of one or more genes called transcription units. In various embodiments, properties of each gene or transcription unit are also included in this state class. For example each gene is described by a field 620 with multiple component fields, which is repeated for each gene or transcription unit, as indicated be ellipsis below field 720.
  • Gene ID field 621 indicates a unique identifier for a gene, such as gene nomenclature name or MG number.
  • Gene start coordinate field 622 indicates the location on the chromosome where the gene begins, the direction field 623 the direction of the gene from that location, and the length field 624 the number of bases in the gene.
  • the transcription unit ID field 625 indicates the transcription unit to which the gene belongs with a unique identifier for the transcription unit.
  • the essentiality field 626 indicates, e.g., with a Boolean value, whether expression of the gene is essential for the cell.
  • the type field 627 indicates the type of RNA produced from the gene (mRNA, rRNA, sRNA, tRNA) and the promoter field 628 indicates the location and base sequence of the promoter where the RNA polymerase binds to the DNA.
  • the chromosome state variable class includes a number of wild variations field 631 that indicates how many variations of this gene are found in nature and the sequence of wild type field 632 or single nucleotide polymorphisms (SNPs) field 633 indicates the bases in the wild variation.
  • the G/C content field 634 indicates the percent of the bases that are either guanine or cytosine, indicative of increased probability for recombination.
  • the Chromosome state variable class includes a polymerase ID field 635 indicates uniquely one of the polymerases produced and the polymerase binding probability field 636 indicates the probability that polymerase will bind to the promoter for the current gene.
  • the weight of the transcript produced, the rate at which the transcript is synthesized, the half life of the transcript, and the decay rate of the transcript are indicated by the contents of field 641, field 642, field 643 and field 644.
  • the product of this gene, either RNA or protein from mRNA is then involved in one or more reactions modeled by the sub-models. So in the illustrated embodiment, the
  • Chromosome state variable class 600 includes field 634 to indicate the reactions involving the products of this gene. If the gene is for tRNA, the associated codon and amino acid are indicated by field 646. The tendency of the gene to be expressed under shock conditions, such as temperature shock, chemical shock, or radiation shock, e.g., at four ultraviolet wavelengths, is indicated by the contents of field 647.
  • shock conditions such as temperature shock, chemical shock, or radiation shock, e.g., at four ultraviolet wavelengths
  • Homologs field 651 holds data that indicates genes in other organisms with very similar sequences, which can be used in some embodiments to determine the properties of the gene in the subject organism.
  • the database references field 652 indicates what databases were the sources of the values placed in one or more fields for this gene. Other fields for the gene, if any, are indicated by ellipsis.
  • the methods of the Chromosome state class include: a method 691 to retrieve values from an object of the class; a method 692 to update values in one or more fields; and a method 693 to initialize values in one or more fields.
  • the illustrated embodiment includes one or more methods 694 to derive a value in one field from values in one or more other fields (such as a method to dissociate DnaA from the chromosome after initiation of replication).
  • Other methods for the chromosome state variable class are indicated by ellipses below method 694.
  • FIG. 6B summarizes the mathematical representation of several properties of chromosome(s) including the size and type of multiple attributes, according to another embodiment.
  • properties are each represented by a tensor for base (nucleotide) i on strand / of chromosome k,
  • the index i varies from 1 to L indicating the length of the chromosome in bases.
  • the index j varies from 1 up to 2 strands; and the index k varies from 1 up to the number of chromosomes (2 for replication of an organism with a single chromosome, such as M. genitalium),
  • Properties of bound metabolites, proteins or protein complexes include a fourth dimension with index / to indicate the bound species.
  • Column 671 indicates the name of the physical property.
  • Column 672 indicates a
  • Column 673 indicates the size of the array specifying the property at each base of the chromosome and column 674 indicates the type o value that fills the array, either Boolean (True or False) or integer or real.
  • Each property except winding, base and sugar- hosphate modification, protein occupancy and catenation, is represented as a 3-dimensional Boolean tensor.
  • Winding w in the mathematical notation is represented as a 3-dimensional real tensor which indicates the linking number density of each nucleotide.
  • Protein monomer and protein complex occupancy (b 'n and h c , respectively in this notation) are
  • Boolean scalar The other properties represented by 3-D Boolean tensors include polymerization (p), gap site modification (m*), abasic site modification (m"), intra-strand cross link modification (rn ), strand break modification (rn ), and Holliday junction modification (m h ).
  • the total size is within the data structures easily maintained by modem computational equipment.
  • the Metabolite state describes the identity of each modified base and sugar- phosphate.
  • the Protein Monomer and Protein Complex states represent the total DNA-bound copy count of each protein monomer and macromolecular complex.
  • the RNA Polymerase state redundantly represents the chromosomal location of each bound RNA polymerase.
  • the mass of the chromosome(s), including all modifications and all DNA-bound proteins, is included in the cell mass calculated by the Mass state.
  • the Replication Initiation sub-model models DnaA DNA-binding and formation of the oriC DnaA complex which promotes replication initiation.
  • the Replication sub-model models bidirectional DNA polymerization from the oriC, continuously on the leading strand and discontinuous])'' on the lagging strand, and Okazald fragment ligation.
  • the Chromosome Segregation sub-model models sister chromosome decatenation following successful
  • the Cytokinesis sub-model models cell division following successful decatenation.
  • the Chromosome Condensation and DNA Supercoiling sub-models model SMC-and supercoiling-mediated chromosome compaction,
  • the DNA Damage submodel models stochastic and radiation- and chemically-induced DNA damage.
  • the DNA Repair sub-model models three DNA repair pathways— base excision repair, nucleotide excision repair, and homologous recombination.
  • the DNA Repair sub-model also models methylation and restriction of Muni (MG184) restriction/modification sites.
  • the Transcriptional Regulation and Transcription sub-models model the binding of transcription factors and RNA polymerases to promoters and the synthesis of RN A.
  • RNA polymerase for each promoter can be fit to provide the gene products required by the 28 modeled processes to reproduce the observed mass doubling time
  • the reconstructed example M. genitali m chromosome contains 525 genes based on the Comprehensive Microbial Resource (CMR) genomic annotation organized into 335 transcription units based on the M. pneumoniae operon organization experimentally defined and the M. genitalium. promoters computationally predicted.
  • the reconstructed genome also contains 17 transcriptional regulatory elements based on 15 studies and databases, 2,283 DnaA binding sites were computationally identified based on the consensus binding motif reported. Muni restriction/modification (R/M) sites were computationally identified based on the reported Muni binding motif, and 19 short tandem repeats were identified. Additionally, the location of the replication origin and the predicted DnaA box sites were reconstructed based on studies. Finally, the reconstructed steady-state superhelicity is based on the observed equilibrium helical repeat length of plasmid DNA and the observed superhelicity of E. coli DNA.
  • RNA polymerase for each promoter was first reconstructed from the observed RN A composition of E. coli, the expression of each M, genitaiium m N A reported, the half-life of each E. coli mRN A reported, the observed amino acid composition of M. gallisepticum, and the Mycoplasma genetic code, Subsequently, the affinity of RNA polymerase for each promoter was fit to match the additional data used to train the 28 modeled cellular processes.
  • the chemical composition of M. genitaiium. and the objective of the flux -balance analysis metabolic model were fit to match the mass and dN MP composition of the M. genitaiium. chromosome.
  • the affinity of RNA polymerase for each promoter was fit to provide the gene products required by the 28 modeled processes to reproduce the observed nine -hour mass doubling time of M, genitaiium.
  • oriC DnaA DNA-binding cooperativity was fit to match expectations for the duration of the replication initiation cell cycle phase,
  • the expression and activity of DNA gyrase and topoisomerse I were balanced to match the observed steady state superhelical density, 3.2 Transcript state variable
  • an RNA polymerase binds to a gene promoter and facilitates the synthesis of an RNA transcript.
  • This state stores all of the information that pertains to nascent RNA transcripts and aids in the time evolution of Transcription. Transcription proceeds at a maximal rate of about 50 nucleotides per second. Therefore, it may take several time steps to synthesize a transcript, and the progress of the RNA polymerase along the transcription unit must be stored.
  • This state stores the transcription unit to which each RNA polymerase is bound, including which chromosome it is bound to and the progress of the RNA polymerase along the transcription unit.
  • RNA polymerase may stall in the transcription process or be knocked off of a gene by another protein. In these cases, an incomplete transcript may result, and will be targeted for degradation by the RNA Decay process. Therefore, the Transcript state variable also holds the sequence of each aborted transcript, such that the nucleotides can correctly be accounted for upon degradation.
  • RNA polymerases may be bound to a given transcription unit, and this state can calculate the total number of polymerases on each unit.
  • the Transcript state also calculates the dry weight of the nascent transcripts.
  • FIG, 7 is a block diagram that illustrates example fields in the Transcript state variable class 700, according to an embodiment.
  • a copy count field 701 indicates how many transcription units populate the chromosome.
  • An ID field 703 uniquely identifies a particular transcription unit.
  • the length, start coordinate, direction of transcription and nucleotide sequence are indicated in fields 703,704,705 and 706, respectively.
  • the genes included in the transcription unit are listed in field 71 1, the location of DnaA boxes are indicated in field 712, the location of R/M sites are indicated in field 713, and the locations of short tandem repeats are indicated in field 714.
  • the current location of an RNA polymerase along the transcription unit, in number of nucleotides from the start coordinate, is given in field 721.
  • the abort status field 722 indicates whether the polymerase is progressing normally or has stalled, e.g., with a Boolean value.
  • Other fields for a particular transcription unit are indicated by ellipses to the right of field 722, Fields 702 through 722, and any following fields, are repeated for the next transcription unit, as indicated by ellipsis below field 721.
  • the methods 790 of the transcript state class include: a method 791 to retrieve values from an object of the class; a method 792 to update values in one or more fields; and a method 793 to initialize values in one or more fields.
  • the illustrated embodiment includes one or more methods 794 to derive a value in one field from values in one or more other fields, such as calculating the dry weight of nascent transcripts.
  • Other methods for the chromosome state variable class are indicated by ellipses below method 794. For brevity, similar methods for the remaining state variables are represented by a single method field.
  • the Transcript state class retrieves the existence of an aborted sequence from the RNA polymerase state class. Upon RNA polymerase displacement, the Transcript state class updates the Transcript state in the Chromosome state class. It also records the start sites and directions of transcription units in the Chromosome state class.
  • the Transcription sub-model class uses all of the fixed properties and time evolving properties housed in the Transcript state class.
  • RNA polymerase may be initialized in the actively transcribing state. For all such polymerases, the growing transcript is accounted for in the Transcript state class.
  • FIG. 8 A is a block diagram that illustrates example stages of RNA maturation and change tracked in the RNA state variable class, according to an embodiment.
  • RNAs 761 resulting from transcription undergo various steps towards maturation, including cleavage from polycistronic (transcription unit) to single RNA forms such as intergenic RNA segments and processed RNA forms 765, including tRNA, mmRNA, sRNA, tRNA and tmRNA. Further processes of methylation and thiolation produce mature RNA 767 of the same types. Maturation is carried out by the RNA Processing and RNA Modification process classes.
  • RNAs may exist with proteins in a macromolecular complex, and when the complex is marked for decay by the Protein Decay process class, its RNA subunits are marked as damaged in the Protein
  • RNA Decay process class Monomer state class. Damaged RNAs are degraded by the RNA Decay process class.
  • RNA state variable class RNA state variable class
  • this state class holds information about RNAs such as their weights, compositions, lengths, and localizations. It also holds important parameters for the maturation pathway of RNA, such as a number of matrices that map one form of RNA to another, and the gene composition of transcription units. Other parameters such as expected RNA half-lives and RNA weight fractions are included in this class, and are used to determine the RNA Poiymerase-DNA binding parameters associated with gene expression.
  • FIG. 8B is a block diagram that illustrates example fields in the RNA state variable class 800, according to an embodiment.
  • Field 801 indicates a number of RNA species with unique nucleotide sequences populating the cell. For each RNA species there is a field 810 with multiple component fields. Other fields 810 are indicated by ellipsis below field 810.
  • State class methods 890 are also included in the class 800.
  • Each RNA species field 810 includes an RNA species ID field 811, uniquely identifying the species, e.g., by an alphanumeric code or name or name of gene of which the RNA is a product.
  • the length of the RNA in number of nucleotides and the nucleotide sequence are indicated in fields 813 and 815.
  • the RNA is identified by its sequence in field 815; and, field 81 1 is omitted.
  • the RNA composition during several stages is tracked by fields 821, 831 and 841, among others indicated by ellipsis, for unmodified (nascent) RNA, cleaved (processed) RNA composition, and modified (mature) RNA composition, and for fields implied by ellipsis, aminoacylated or bound or damaged RNA compositions, or some combination.
  • the compartment localization if any, is indicated in fields 823, 833 and 843, among others, respectively: the copy count is indicated in fields 825, 835, 845, among others, respectively; the molecular weight is indicated in fields 827, 837, 847, among others, respectively; and whether marked for decay and decay rate or half-life is indicated in fields 829, 839, 849, among others, respectively.
  • RNA decay rates are calculated from the experimentally measured half-lives.
  • the actual decay rates and RNA expression values used in the model are fit from the experimentally measured values such that the cell will double its mass and successfully divide by the end of the cell cycle.
  • FIG. 8C is a table that lists example interactions between different cell processes and the fields in the RNA state variable class.
  • RNAs are expressed and at what quantities. The determination of which RNAs are expressed and at what quantities is determined randomly based on expected RN A expression and expected total RNA mass fraction.
  • the Polypeptide state variable class keeps track of the progress of translation. In some embodiments, binding of free ribosomes to mature mRNAs is determined by mRNA availability. Once bound, a polypeptide is synthesized at a rate of up to 16 amino acids per second. Therefore, polypeptides may take multiple time steps to be completed as a protein monomer.
  • the Polypeptide state class holds information about which mRNAs are ribosome bound, as well as ribosomal progress of translating a transcript across time steps. For reasons such as limited resources, a ribosome may stall resulting in an incomplete polypeptide.
  • a proteolysis tag (a short peptide added to the end of a nascent polypeptide) may be synthesized to mark the ribosome for release and mark the aborted polypeptide for degradation. This state also houses the progress of proteolysis tag synthesis and the amino acid sequences of the incomplete polypeptides that are to be degraded.
  • the Polypeptide state class serves as a "support system" for translation, providing it with information that is required to translate polypeptides.
  • the state holds fixed information such as the length, tRNA sequence, and amino acid sequence of every protein monomer for the cell, and the length, molecular weight, tRNA sequence, and amino acid sequence of every possible proteolysis tag for the cell.
  • FIG. 9 is a block diagram that illustrates example fields in the Polypeptide state variable class 900, according to an embodiment.
  • a polypeptide field 910 is utilized for every ribosome bound to an mRNA, whether the ribosome is actively translating or not.
  • field 901 indicates the copy count for bound ribosomes. This is redundant with information included in the Ribosome state variable; thus, field 901 is omitted in some embodiments.
  • the ribosome field 911 uniquely identifies a particular ribosome.
  • the associated mRNA species chat the ribosome is bound to is indicated in field 913, such as with a unique species or copy identifier.
  • the length is indicated in field 915.
  • the corresponding tRNA sequence and amino acid sequence are explicitly stored in fields 917 and 919, respectively.
  • Progress is indicated by the current translated length in field 921, and polypeptide weight in field 923, A unique identifier for a proteolysis tag species or copy, if any, is indicated in field 925.
  • a non-null value in field 925 indicates the translation has stalled or aborted and the corresponding polypeptide is marked for degradation. Similar sets of fields for other polypeptides being formed by ribosomes are indicated by ellipsis below field 910.
  • proteolysis tags available for marking an aborted or stalled polypeptide are provided in following fields. For example a unique identifier for the proteolysis species or copy is indicated in fields 931a and 931b, among others indicted by ellipsis, collectively referenced as field 931 , for multiple species or copies. Similarly, the proteolysis tag length is indicated in fields 933a and 933b, among others indicted by ellipsis, collectively referenced as field 933, for multiple species or copies.
  • the tRNA and amino acid sequences for the proteolysis are indicated in fields 935a and 935b, among others indicted by ellipsis, collectively referenced as field 935 for the tRNA sequences, and in fields 937a and 937b, among others indicted by ellipsis, collectively referenced as field 937 for the amino acid sequences.
  • the proteolysis tag weight is indicated in fields 939a and 939b, among others indicted by ellipsis, collectively referenced as field 939, for multiple species or copies.
  • the Polypeptide state variable class includes methods 990.
  • polypeptides and proteolysis tags both reads and updates the time evolving parameters describing polypeptides and proteolysis tags.
  • the Protein Decay process class reads the sequence and length of each aborted polypeptide.
  • the state is initialized so that ribosomes are already bound to mRNAs and in the process of elongating a polypeptide. Each growing polypeptide is accounted for in the Polypeptide state class. No proteolysis tags exist at the start of the example simulation.
  • FIG. 10A is a block diagram that illustrates example stages of protein monomer maturation and change tracked in the Protein Monomer state variable class, accordins to an embodiment. As diagrammed in FIG.
  • 10A protein monomers are found in nascent form 1081 after translation, processed (I) form 1082 after deformylation or methionine cleavage, translocated into or through the cell membrane 1083, processed (II) to remain in the membrane 1084 or through the membrane as signal sequences 1084, folded 1086, and mature 1087 after phosphorylation, lipoate ligation or glutamate ligation as the protein monomer moves through the maturation pipeline.
  • Mature protein monomers can be combined with other macromolecules to form protein complexes 1089, as described in more detail next.
  • a monomer non-functional resulting- in the misfolded 1088a, inactivated 1088b, or damaged 1088c forms.
  • a mature monomer may be freely floating in the cytoplasm or bound to another molecule in the cell to generate a bound form 1088d.
  • a translation factor may be bound to an mRNA, or a topoisomerase may be bound to a chromosome. The counts of monomers in each of the maturation phases, functional forms, and non-functional forms are scored in the Protein Monomer state variable class.
  • a purpose of the Protein onomer state variable class is to hold the counts and attributes of the monomelic species in the system.
  • This state class holds important information describing protein monomers such as their molecular weights, amino acids composition, length (in amino acids), and half-lives. A subset of monomers is translocated to different compartments, such as the membrane or terminal organelle.
  • the Protein Monomer state class contains information about where each monomer localizes.
  • the Protein Monomer state class also houses information that is important for the fitting and initialization of the model. Some essential functions require a certain abundance of particular proteins. For example, the Cell Division sub-model may require at least some threshold abundance of the division protein FtsZ for cell division to ever be possible.
  • the system is then fit such that in an unperturbed state, at least this threshold amount of FtsZ is produced.
  • This state stores the minimum expression of each monomer.
  • the current version of this model involves a stochastic generation of the initial protein abundances in the cell. The initial abundance of certain monomers is more rigid than others for the maintenance of a stable system that can grow and divide.
  • the Protein Monomer state class stores the degree of variation allowed in the initial abundances of each monomer.
  • FIG, I OB is a block diagram that illustrates example fields in the Protein Monomer state variable class 1000, according to an embodiment.
  • the number of protein monomer species, each species with a different amino acid chain, is indicated in field 1001.
  • a protein monomer species field 1010 includes multiple component fields. Similar fields 1010 for other species are indicated by the ellipsis below field 1010.
  • One or more class methods 1090 are also included in the class 1000.
  • the component fields of field 1010 include the following.
  • Field 1011 indicates a unique identifier for the monomer species.
  • the length, amino acid sequence and weight of the monomer are indicated in fields 1013, 1015 and 1017, respectively.
  • Fields 1021 , 1023, 1025 and 1027 indicate the catalysis reactions, prosthetic groups, chaperones that assist in folding the monomer, if any, and toplogy in addition
  • fields 1013, 1015, and 1017 indicate the reactions catalyzed by each monomer (or parent complex), the prosthetic groups required by each monomer, chaperones that assist in folding the monomer, and each monomer's topology.
  • Catalysis may be defined in this instance as the reaction catalyzed by the protein monomer (or parent complex).
  • Prosthetic groups include the metal ion cofactors that provide metal ion chemistry critical for chemical catalysis.
  • Topology can be defined as the 3D- dimensional structure (e.g. domains).
  • the DNA footprint of the monomer is indicated in field 1031.
  • the processing enzyme that advances the nascent monomer to a processed monomer is indicated in field 1033.
  • the count for the processing enzyme is indicated in field 1035.
  • the counts for the various forms in the various compartments are indicated in the next fields.
  • the total count, nascent count, processed count, folded count, and mature count in the cytosol compartment are indicated in fields 1041, 1043, 1045, 1047 and 1049, respectively.
  • the same breakdown counts for the membrane compartment are indicated in fields 1051, 1053, 1055, 1057 and 1059, respectively.
  • the same breakdown counts for the organelle compartment are indicated in fields 1061, 1063, 1065, 1067 and 1069, respectively.
  • a purpose of the Protein Monomer state variable class is to hold the counts and attributes of the monomelic species in the system. This state class holds important information describing protein monomers such as their molecular weights, amino acids composition, length (in amino acids), and half-lives. A subset of monomers is translocated to different compartments, such as the membrane or terminal organelle.
  • the Protein Monomer state class contains information about where each monomer localizes. The Protein Monomer state class also houses information that is important for the fitting and initialization of the model. Some essential functions require a certain abundance of particular proteins.
  • the Cell Division sub-model may require at least some threshold abundance of the division protein FtsZ for cell division to ever be possible. 'The system is then fit such that in an unperturbed state, at least this threshold amount of FtsZ is produced, This state stores the minimum expression of each monomer.
  • the current version of this model involves a stochastic generation of the initial protein abundances in the cell. The initial abundance of certain monomers is more rigid than others for the maintenance of a stable system that can grow and divide, The Protein Monomer state class stores the degree of variation allowed in the initial abundances of each monomer.
  • FIG. 8C is a table that lists example interactions between different cell sub-models and the Protein Monomer state variable class, according to an embodiment
  • the system is initialized with a set of proteins. The determination of which proteins are expressed and at what counts is determined randomly based on expected protein expression and expected total protein mass fraction.
  • Protein complexes are used by almost all of the sub-models in the system to perform their respective functions.
  • a protein complex could be as small as transketolase, a dimer, or as large as pyruvate dehydrogenase that has 192 monomelic subunits. In organ sms other than M. genitalium, a protein complex could be larger than 192 monomelic subunits.
  • a complex may contain a mixture of protein and RNA subunits (e.g. ribosome), metabolites (e.g. DnaA protein bound to ATP), or ions (e.g. oxidized form of the protein thioredoxin).
  • a comple can also exist in various locations within a cell including in the cytoplasm, membrane, special organelle, or bound to the chromosome.
  • the Protein Complex state class holds information about complex composition, as well as the counts of each complex in the ceil, except for complexes, such as ribosomes and RNA polymerase that are specifically handled in other state variable classes.
  • the Protein Complex state class also holds values for fixed parameters used by multiple sub-models, including the complex subunit composition, molecular weights, amino acid composition, half-lives, and localization. Some functions essential to cell survival require a certain threshold abundance of particular protein complexes, and so the Protein Complex state class also holds the minimum expression of each complex.
  • FIG. 11 A is a block diagram that illustrates example fields in the Protein Complex state variable class 1100, according to an embodiment.
  • Field 110.1 indicates a number of protein complex species that populate the cell.
  • a protein complex species field 1110 holds data for several component fields, including a complex species identifier (ID), molecular weight, half-life, number of subunits and subunit identifier in field 1111 , field 1113, field 1115, field 1131 and field 1133.
  • ID complex species identifier
  • the species copy counts in various compartments are indicated in cytosolic count field 1121, membrane count field 1123, organelle count field 1125, among others indicated by ellipsis to the right of field 1125.
  • FIG. 1 1 B is a table that lists example interactions between different cell sub-models and the Protein Complex state variable class, according to an embodiment.
  • RN A polymerases are protein complexes that bind to gene promoters on the chromosomes and mediate the synthesis of RNA transcripts. This state class helps keep track of the conditions and chromosomal locations of RNA polymerases in the cell. In the illustrated embodiment, the condition of the RNA is tracked among four conditions: free (not bound to DNA); non-specifically bound (bound to DNA but not to a specific gene promoter and not transcribing); specifically bound (bound to a specific gene promoter); and, actively transcribing (moving along a gene adding RNA bases to produce an RNA molecule).
  • the RNA Polymerase state class stores the condition of each RNA polymerase in the simulated cell. Transition between the conditions may involve RNA polymerase association and dissociation from the cell's chromosome(s). This state class holds the precise positions on the chromosome(s) where polymerases are bound.
  • the RNA Polymerase state class also handles basic accounting. For example, upon an RNA polymerase decay event, a free polymerase is decremented. Further, this class records the premature release of RNA polymerases from the chromosome(s) and passes information about the aborted transcript to the Transcript state class. Further, the Transcription submodel requires the RNA Polymerase-promoter binding probability for each transcription unit.
  • RNA Polymerase state class can also calculate the total number of RNA polymerases in each condition
  • FIG, 12 is a block diagram that illustrates example fields in the RNA Polymerase state variable class 1200, according to an embodiment, The number of RNA Polymerase molecules in the cell is indicated in field 1201. Details about each copy are indicated in field 1210 which is repeated for each copy as indicated by ellipsis below field 1210. Class methods 1290 are included in the RNA Polymerase state class 1200.
  • Field 1210 includes component field 1211 that identifies a particular copy, and field 1213 that indicates a location on the chromosome where the RNA polymerase is bound. Also included are values for some parameters, such as the probability that an unfolded RNA polymerase is free, bound, specifically bound or actively transcribing, as indicated in field 1221, field 1223, field 1225 and field 1227, respectively. Different probabilities may apply if the RNA polymerase is folded. The probability that a folded RNA polymerase is free, bound, specifically bound or actively transcribing, is indicated in field 1231, field 1233, field 1235 and field 1237, respectively. In some embodiments, the probability that unfolded RNA is free, bound, specifically bound or actively transcribing, is also included in other fields represented by ellipsis.
  • This state class interacts with the sub-models and other state classes as follows.
  • the Chromosome state class updates the chromosomal positions of RNA polymerases in the RNA Polymerase state class.
  • the Transcript state class reads the aborted RNA sequences from and writes the updated aborted sequences to the RNA Polymerase state class.
  • the Transcription sub-model reads the RNA polymerase conditions and progress from this state class and updates the RNA polymerase conditions and progress to the RNA Polymerase state class.
  • the Transcription sub-model also reads the RNA polymerase condition transition probabilities and RNA polymerase binding probabilities from the RNA Polymerase state class.
  • the Transcriptional Regulation and DNA Supercoiling sub-models record the fold change in RNA polymerase binding probabilities to the RNA Polymerase state class.
  • RNA polymerases are initialized as follows: 1. Each RNA polymerase is randomly assigned (with replacement) to one of the actively transcribing, specifically bound, non- specifically bound, or free states weighted by the expected occupancy of each state. 2.
  • Actively transcribing and specifically bound polymerases are randomly assigned to transcription units weighted by the transcription unit binding probabilities (Ptu). 3. Actively transcribing polymerases are randomly assigned to positions within the assigned segment of their assigned transcription unit (positions near the segment border are not allowed to prevent polymerases from being too close to each other) with uniform probability. 4. Non- specifically bound polymerases are randomly assigned to an accessible region on the chromosome.
  • Ribosomes are a specific kind of protein complex handled separately. Ribosomes are large ribonucleoproteins which synthesize polypeptides. Following ribosome assembly, ribosomes synthesize amino acid polymers according to nucleic acid templates specified by mRNA and decoded by tRNA. Upon reaching the stop codon UAG, ribosome release factor binds to the ribosome, recognizes the stop codon, hydrolvzes the peptidyi-tRNA bond, and releases the terminal tRNA, Finally, an elongation factor and an initiation factor dissociate the ribosomal subunits, the mRNA, and the ribosome release factor.
  • tmRNA Upon prolonged translational stalling, tmRNA can displace both ribosome-bound tRNA and mRNA, leading to the generation of chimeric polypeptides containing an N- terminal mRNA-coded domain and a C-terminai tmRNA-coded degradation domain, or SsrA tag.
  • the protein degradation machinery recognizes the SsrA tag and degrades the chimeric polypeptide.
  • the Ribosome state represents (1) the status actively translating or stalled- of each ribosome, (2) the mRNA, or tmRNA in the case of stalled ribosomes, species each ribosome is bound to and translating, and (3) the position, in codons, of each ribosome from the start, codon,
  • the status, bound (t)mRNA, and (t)mRNA position of each ribosome are implemented as scalar integer variables in the illustrated embodiment.
  • FIG. 13 is a block diagram that illustrates example fields in the Ribosome state variable class 1300, according to an embodiment. The number of ribosome copies is indicated in field 1301.
  • the field 1310 For each ribosome copy there is a field 1310 that is repeated for every copy as indicated by ellipsis.
  • the field 1310 includes a copy identifier (ID) in field 131 1, the status of the copy in field 1313, RNA bound to the ribosome in field 1315 and the position of the ribosome from the star codon in field 1317, among others indicated by ellipsis to the right of field 1317.
  • ID copy identifier
  • the Ribosome state class is related to other state classes and sub-models as follows.
  • the Polypeptide state represents the sequence of each nascent polypeptide.
  • the Protein Complex state represents the copy numbers of free ribosomal subunits (e.g., 308 and SOS in M. genitalium), and of mRNA-bound ribosomes (e.g., 70S in M. genitalium).
  • the RN A state represents the copy number of each mRN A species and of tmRNA.
  • Three submodel Translation, RN A Decay, and Protein Decay access and modify the Ribosome state.
  • the Translation sub-model initializes the values in the state class and models the formation of ribosomes on mRNA, polypeptide synthesis catalyzed by ribosomes, and ribosome disassembly following translation termination.
  • the Translation sub-model also models ribosome stalling, tmRNA substitution, and SsrA degradation tag synthesis.
  • the Translation sub-model predicts the state of each ribosome, the mRNA or tmRNA species each ribosome is bound to, and the elongation rate of each nascent polypeptide,
  • the mRNA and tmRN A degradation events modeled by the RNA Decay submodel trigger early ribosome disassembly resulting in incomplete polypeptides
  • ribosome degradation events modeled by the Protein Decay sub-model result in incomplete polypeptides
  • Both sub-models decrement the copy count of mRNA-bound ribosomes represented by the Protein Complex and Ribosome states.
  • the Ribosome Assembly submodel models the assembly of ribosomal subunits from rRNA and ribosomal protein monomers.
  • the Protein Activation sub-model models the effect of antibiotics on the catalytic activity of ribosomal particles.
  • RNA and Protein Monomer states initialize the total copy count of each RNA and protein monomer species and the Ribosome Assembly sub-model initializes the total copy count of the ribosomal subunits
  • the Translation sub-model initializes the RNA and Protein Monomer states
  • Ribosome and Polypeptide states, and forms ribosomes equal to the minimum of the subun.it copy counts, and randomly positions each ribosome on mRNA weighted by the copy count of each mRNA species.
  • Cytokinesis is the division of a cell into two daughter cells.
  • Many cells including example organism M. genitalium, contain a cytokinesis protein called FtsZ that assembles into long filaments that are implicated in cell pinching during cytokinesis. These filaments bind to the cell membrane at the midline of the cell.
  • FtsZ is a GTPase, and when the GTPs bound in a membrane-bound FtsZ filament hydrolyze to GDP, the filaments bend, thus constricting the membrane.
  • a geometric process of iterative filament bending is used in a cytokinesis sub-model, modified from a model proposed by Li et al.
  • the FtsZ ring at the midline of the cell can exist in many states of various numbers of filaments in the bent and straight configurations.
  • the purpose of the FtsA ring state class is to keep track of the state of the FtsZ ring across time steps.
  • the FtsZ ring is represented as a polygon of FtsZ filaments. As the circumference of the midline of the cell decreases with time, the number of edges in the inscribed FtsZ polygon decreases.
  • the FtsZ filaments are of a fixed length (Fl)
  • NE rounddown ⁇ ⁇ / arc sin ( i/CD) ⁇ (1)
  • rounddown is a function that rounds the value of its argument to the greatest lower integer.
  • the Cytokinesis sub-model determines how many straight and/or bent filaments reside at each edge of the FtsZ polygon. At each edge, any of the following occupancies are possible: 1 straight filament; 2 straight filaments; 1 bent filament; 2 bent filaments; 1 bent filament and 1 straight filament; 1 bent filament and 2 straight filaments
  • the FtsZ Ring state class keeps track of the filament occupancy of each edge. As the filament occupancy in a time step is highly dependent on the occupancy at the previous time step, the tracking of the state of the FtsZ ring is considered important in the time evolution of Cytokinesis.
  • FIG. 14 is a block diagram that illustrates example fields in the Cytokinesis state variable class 140, according to an embodiment.
  • the copy count of a particular cytokinesis protein such as the number of FtsZ edges, is indicated in field 1401.
  • a cytokinesis protein copy field 1310 indicates an identifier (ID) for the copy in field 141 1 and the cytokinesis phase (e.g., number of bent or straight edges or both) of the copy in field 1413, among other properties indicated by ellipsis to the right of field 1413.
  • ID identifier
  • the properties of other copies, as well as the properties of other species of cytokinesis proteins, are indicated by ellipsis below field 1400.
  • the class 1400 also includes class methods 1490.
  • the cytokinesis protein state class is represented by a FtsZ Ring state class
  • Integration with other states and sub models are as follows, A method 1409 of this class reads the diameter of the cell's midline (CD) from the Geometry state class and computes NE using Equation 1.
  • the Cytokinesis sub-model reads the number of FtsZ ring edges (NE) and the filament phase at each edge from the FtsZ Ring state class.
  • the Cytokinesis sub-mode! updates the filament phase at each edge and stores the updated value in the FtsZ Ring state class,
  • the values of the field are initially set to zero, because a FtsZ ring is not assembled at the beginning of the simulation.
  • the Metabolism sub-model models the dynamics of the transport and chemical reactions whic provide the metabolic building blocks required for macromolecular synthesis and drive cellular growth.
  • M. genitalium for example, is modeled with metabolic 645 reactions.
  • the Metabolic Reaction state class records the instantaneous flux of each metabolic reaction in reactions per second as a floating point vector. This state also records the instantaneous cellular growth rate predicted by the Metabolism sub-model in cells per second as a floating point scalar.
  • FIG. 15 is a block diagram that illustrates example fields in the Metabolic Reaction state variable class 1500, according to an embodiment.
  • the number of different metabolic reactions is indicated in field 1501.
  • field 1510 indicates a unique identifier (ID) for the reaction in field 151 1 , a description of the reaction in field 1513, such as the stoichiometry of the reaction Mmr, which indicates the number of particular metabolite m in reaction r, and the calculated flux in field 1515, among other properties indicated by ellipsis.
  • Field 1510 is repeated for each different reaction species, as indicated by ellipsis under field 1510,
  • One or more class methods 1590 are included in state class 1500. 3.11 Metabolite state variable
  • Cells use nucleic and amino acids to synthesize DNA, RNA, and protein. Cells also use ions and other prosthetic groups to stabilize macromolecules. In addition, cells use small molecule bonds and gradients to store energy and drive cellular processes. In particular, ceils drive many energetically unfavorable reactions through hydrolysis of the high energy intermediates ATP and Guanosine-5'-triphosphate (GTP). Furthermore, cells use coenzyme functional moieties to facilitate chemical catalysis. Additionally, cells use small molecules
  • the Metabolic state variable class represents the copy number of each metabolite in each of 3 compartments— cytosol, membrane, and extracellular space— as an integer matrix.
  • FIG. 16 is a block diagram that illustrates example fields in the Metabolite state variable class 1600, according to an embodiment.
  • a unique identifier for the metabolite species is indicated in field 1611, a
  • Metabolite class methods 1690 are also included in class 1600.
  • the Metabolite state class is integrated with other state classes and sub-models as follows.
  • a Simulation object e.g., method 500 depicted in FIG. 5A
  • the Simulation object also allocates shared metabolites among processes at each time step, during step 530. Cytosolic- and membrane-localized metabolites are included in the cell mass calculated by the Mass state.
  • the Metabolism sub-model models the import of extracellular nutrients into the cytosol and membrane, the conversion of those metabolites into the precursors of DNA, RNA, protein, and lipids, and the export of byproducts into the extracellular space.
  • the exchange rate of each metabolite of the flux -balance analysis metabolic model is limited by its extracellular copy number.
  • 23 additional sub-models access and modify the Metabolite state, primarily drawing cytosolic metabolites to support macromolecule synthesis.
  • Four processes do not directly interact with the Metabolite state: Host Interaction, Macromolecular Complexation, Special Organelle Assembly, and
  • the Metabolite state initializes the extracellular copy number of each metabolite according to the reconstructed medium composition.
  • the Chromosome, RNA, Protein Monomer, Protein Complex, Transcript, and Polypeptide states initialize the copy number of each macromolecule species, and decrement the metabolite copy numbers to maintain the initial cell mass.
  • M. genitalium the chemical compositions of M. genitaiium and the extracellular medium were initially reconstructed based on the experimentally characterized compositions of M. genitalium. and SP-4 medium.
  • the cell and medium composition were both supplemented to support the metabolic demands of the 28 sub-models, and included 722 metabolite species that serve many important functions in over 1,100 chemical reactions across three compartments— cytosol, membrane, and extracellular space.
  • the empirical formula and structure of each metabolite was curated based on an extensive review of the primary literature including two genome-scale metabolic models of M. genitaiium and E.
  • Binary cell division is modeled as the cell varying from a spherical to short rod shape.
  • the cell is modeled as a cylinder with two hemispherical caps, and growth is modeled in the cylinder length. Once cell pinching commences at the midline of the cell, the shape and size of a "septum region" is also modeled.
  • the Geometry state class calculates and keeps track of the physical shape of the cell including its width, length, volume, and surface area. This state also keeps track of the progress of cytokinesis, and the completion of cytokinesis is the trigger to end the entire simulation,
  • Geometry state uses a set of geometric equations to calculate the shape of the cell, given the width (C ), density (Cp), and cytokinesis progress of the cell.
  • the state methods provide for the calculations of cell length (CI), volume (CV), surface area (CA), and progress of cell pinching.
  • the geometric representation involves an assumption that the cell density (Cp) and cell width (CW) are constant across the cell cycle.
  • Cp cell density
  • CW cell width
  • Various sources have indicated that the volumetric density of a cell remains constant over time.
  • the cell width is calculated based on the initial cell mass (CMO) and density and the assumption that the cell is a sphere.
  • the geometric equations 2 through 8 use the fixed width and fixed ceil density assumptions to calculate all the other aspects of the cell geometry.
  • the mass (CM) and density (Cp) enable calculation of the volume (CV) at all time points as given by Equation 2.
  • the cell geometry is approximated as two hemispheres separated by a cylinder.
  • Equation 4 lc is the length of the cylinder between the hemispheres and is initially equal to zero, and can be stated in terms of the cell volume (CV) based on rearranging Equation 3 and substituted into Equation 4.
  • FIG. 1 7A is a block diagram that illustrates example cel l geometry with a cell division septum region, according to an embodiment.
  • the cell retains the two hemispheric caps, the cylinder is split in half and a septum region is inserted at the cell center line.
  • the distance from the cell center to the edge of the septum region is Is.
  • the cylinder length (lc) is the combined length of the two separated cylinders.
  • the septum length (Is) is the length from the cell midpoint to the edge of the septum region. This septum length is calculated from the "pinched diameter" property that is calculated in the Cytokinesis submodel.
  • Each half of the septum volume is calculated as the area of two quarter circles (radius: Is) and one rectangle (width: Is, height: CW ⁇ lis), integrated around the midline cylindrically.
  • Is radius
  • width width
  • Is height
  • CW ⁇ lis surface area
  • CV two hemisphers + cylinder 4- septum region
  • the cell length lc can be stated in terms of the cell volume (CV) based on rearranging
  • Equation 6 Equation 6 and substituted into Equation 7.
  • FIG, 17B is a block diagram that illustrates example fields in the Geometry state variable class 1700, according to an embodiment.
  • the cell length lc is indicated in field 171 1 , width CW in field 171 3, volume CV in field 1715, surface area CA I field 1717 and progress of cytosis, e.g.. septum length Is, in field 1719. Other fields are indicated by ellipsis.
  • On or more Geometry state class methods 1790 are included in Geometry state class 1700.
  • the Geometry state variable clss interacts with other classes and sub-models as follows,
  • the Geometry state class obtains the initial mass of the cell (to determine the cell width) from the Mass state class. It also obtains the mass of the cell from the Mass state class at all time points to determine the cell volume.
  • the Cytokinesis sub-model class retrieves the pinched diameter Is from the Geometry state class, and sends back the updated pinched diameter.
  • initial conditions are determined as follows.
  • the initial cell volume is calculated based on the initial cell mass (CMO) and the fixed density (Cp).
  • the initial cell width CWO is calculated from the volume and the assumption that the cell is a sphere. All of the other parameters describing cell geometry are evaluated from the mass (CM), volume (CV), density ( Cp ). and width (CW).
  • the cell density (Cp) used is that of E. coli, which has been estimated as 1 100 grams per liter (g/L).
  • a cell hosted by another organism can change the state of the host, such as by an eliciting an immune response by the host to one or more proteins in the membrane or secreted by the cell. Modeling such a response is supported by the properties stored in the Host state variable class.
  • the Host state class represents the instantaneous configuration of the host.
  • FIG. 18 is a block diagram that illustrates example fields in the Flost state variable class 1800, according to an embodiment. A host interaction is indicated in field 1811 and additional interactions are indicated in additional fields represented by ellipses.
  • the class 1800 includes one or more Host state class methods 1890.
  • the M. genitalium terminal organelle is believed to mediate attachment to the human urogenital epithelium and enable its parasitic lifestyle. Upon attachment, lipoproteins are believed to activate host Toll-like receptors (TLRs)l, 2, and 6, the host transcriptional regulator NF- ⁇ , and ultimately the host inflammatory response.
  • TLRs Toll-like receptors
  • the Host state variable class uses six Boolean variables to represent the instantaneous configuration of the human host: Adherent (true if M genitalium is attached to its human host and false otherwise); TLR 1 activation (true if TLR receptor 1 is active and false otherwise): TLR 2 activation (true if TLR receptor 2 is active and false otherwise): TLR 6 activation (true if TLR receptor 6 is active and false otherwise); NF-s B activation (true if NF-KB is active and false otherwise); and Inflammatory response activation (true if the host inflammatory response is active and false otherwise).
  • Cells are composed primarily of water, DNA, RNA, and protein enclosed in a bilipid membrane.
  • the Metabolite, Chromosome, RNA, Protein Monomer, and Protein Complex states represent the detailed molecular composition of the cell.
  • the Mass state calculates the total cell mass from those states. The total cell mass is used as a proxy for membrane surface area in several processes which are based on mathematical models originally developed outside the whole-cell modeling framework. The Mass state calculates the total cell mass. 'The total ceil mass is equal to the sum of the masses represented by the Chromosome, RNA, Protein Monomer, Protein Complex, Transcript, Polypeptide, and Metabolite states.
  • FIG. 19 is a block diagram that illustrates example fields in the Mass state variable class 1900, according to an embodiment.
  • the water mass is indicated in field 191, the chromosome mass in field 1913, the RNA mass in field 1915, the transcript mass in field 1917, the polypeptide mass in field 1921 , the protein monomer mass in field 1923, the protein complex mass in field 1925, the metabolites mass in field 1931 and the bilipid membrane mass in field 1933, among other fields indicated by ellipsis. Though some of these values are redundant with information in other state classes, in the illustrated embodiment, these fields are included in the Mads state for computational efficiency.
  • One or more mass state class methods 1990 are also included in the class 1900.
  • the interaction of the Mass state class with other state classes and sub-models is as follows,
  • the Mass state interacts with the Geometry state which calculates the cell shape, surface area, and volume from the calculated cell mass and observed density. None of the sub-models directly modify the Mass state.
  • the Metabolism sub-model uses the cell mass to bound the rates of exchange reactions.
  • the Replication Initiation sub-model models DnaA- ADP reactivation to DnaA-ATP as a function of membrane lipid mass.
  • the Metabolism and Replication Initiation sub-models use the cell mass rather than the cell volume because these processes are based on mathematical models that were originally developed outside the whole-cell modeling framework.
  • the values of one or more fields are initialized hierarchically. First, the total cell mass is initialized according to a normal distribution wit mean and standard deviation
  • a mean of 13.1 femtograms, fg, 1 fg 10 " grams, and standard deviation 0.66 fg for M. genitali m).
  • the mean initial cell mass was set to the reconstructed value.
  • the variance was fit to match that of the predicted final cell mass following cell division.
  • the total cell mass is divided among individual metabolites according to the reconstructed chemical composition.
  • the Chromosome, RNA, Protein Monomer, Protein Complex, Transcript, and Polypeptide states are initialized as functions of the metabolite copy counts, and the metabolite copy counts are decremented to maintain the initial cell mass.
  • the mass and chemical composition of M. genitalium were reconstructed based on an extensive review of the primary literature and fit to match the 28 sub-models.
  • the molecular composition of the dry mass was hierarchically reconstructed: 1. The M.
  • genitalium dry mass was divided into eight classes— DNA/dNMPs, RNA/NMPs, protein/amino acids, lipids, carbohydrates, polyamines, vitamins & cofactors, and ions. 2. The fractional mass and molecular composition of each class was reconstructed from the literature. 3. The complete molecular composition of M. genitalium was assembled. 4. The dry mass composition was fit to match the 28 sub-models.
  • the mean initial cell mass was fit to the reconstructed value.
  • the variance was fit to match that of the predicted final cell mass following cell division.
  • Cellular composition is the balance of the production and usage of molecules by cellular processes, Consequently, it was necessary to reconcile the reconstructed cellular composition with that predicted by the 28 sub-models.
  • the DNA dry mass fraction including both chromosomal DNA and free dNTP, was increased 322 to be consistent with the predicted mass of the chromosome and the free dNTP pool. The predicted chromosome mass accounted for the observed
  • RNA dry mass fraction including both RNA and free NTPs, was increased 39% to be consistent with the amounts of mRNA, rRNA, sRNA, and tR A desirable to support the observed nine-hour mass doubling time.
  • the NMP composition of the R.NA fraction was set equal to that predicted by the expression, sequence, and modification of each RNA species (as implemented in the 'Transcription, RNA
  • the vitamin & cofactor and ion dry mass fractions were increased to match the amounts of vitamins, cofactors, and ions needed for protein folding and catalysis (as implemented in the Protein Folding sub-model).
  • the protein dry mass fraction, including both proteins and free amino acids was decreased 23% to offset the increased DNA and RNA mass fractions.
  • the amino acid composition of the protein fraction was set equal to that predicted by the expression, sequence, and modification of each protein species (as implemented in the Translation, Protein Processing I, Protein Processing II, and Protein Modification sub-models).
  • the Stimulus state class represents ten properties: the temperature in °C; six types (wavelengths) of radiation in units of Watts per square meter
  • each property is implemented as a floating point scalar.
  • FIG. 20 is a block diagram that illustrates example fields in the Stimulus state variable class 2000, according to an embodiment.
  • the temperature is indicated in field 2011, a first radiation type in field 2013, a radiation level in field 2015 for the radiation type specified in field 2013, and ellipses indicating other radiation types.
  • Stress conditions are indicated in field 2021 for pH induced stress, and other stress conditions are indicated in field 2023 and 2025 and other fields indicated by ellipsis. More stimulus conditions are represented by other fields indicated by ellipsis below field 2021.
  • Stimulus state class methods 2090 are also included in class 2000, In an illustrated embodiment, the stress factors of Table 4 are included in the Stimulus state variable class.
  • the Geometry, Metabolite, Protein Monomer, Protein Complex, and Host states represent additional properties of the extracellular environment.
  • the Geometry state represents the volume of the external environment.
  • the Metabolite state represents the copy counts of extracellular metabolites including antibiotics.
  • the Protein Monomer and Protein Complex states represent the copy counts of extracellularly localized proteins.
  • the Host state represents six properties of the host urogenital epithelium. Three sub-models— Protein Activation, DNA Damage, and Metabolism— access the Stimulus state.
  • the Protein Activation sub-model regulates the activity of four transcription factors as functions of temperature and three stress conditions.
  • the DNA Damage sub-model models the rates of several types of DNA damage as functions of radiation.
  • the Metabolism sub-model models the generation of hydroxyl radicals from water and ⁇ -radiation. None of the sub-models modify the 10 properties of the external environment. [0253] In an example simulation of the M. genitalium example model, temperature was set at 37 degrees C, electron radiation at 2xl0_n Gy/s and gamma radiation at 2.8x10 11 Gy/s. The contents of the other state variable fields were set to zero.
  • the physical, chemical, and biological processes relevant to cell physiology span a wide range of time scales.
  • the whole-cell modeling becomes tractable.
  • the effects of the faster processes are averaged over the time step.
  • the time step was determined as described in method 400 depicted in FIG. 4 at step 41 1 .
  • a time step of about 1 second is effective in the example whole-cell modeling of M. genitalium, which can be approximated as a well-mixed system at this time scale.
  • This state represents the time elapsed from the start of the simulation in seconds as a single integer variable
  • FIG. 21 is a block diagram that illustrates example fields in the Time state variable class 2100, according to an embodiment. Elapsed time is indicated in field 21 1 1 . Other fields are indicated by ellipses. One or more Time state variable class methods 2190 are included.
  • the Time state directly interacts with only three parts of the simulation:
  • the Simulation model 500 linearly advances time in even time increments (e.g., 1 second increments for the example organism). If the cell has not yet divided, each simulation terminates at a predetermined maximum time approximately equal to 150% of the observed mean mass doubling time of the organism (for example at of 50,000 s for M. genitalium).
  • the concentrations of extracellular metabolites and the values of extracellular stimuli can be functions of elapsed time (although held constant in the example simulations). None of the processes or other states depends directly on the Time state.
  • indexes i, j, k, I, m, n, etc. are used repeatedly and are not to be given a particular meaning.
  • the variable letter preceding the index is also used repeatedly and must be interpreted in the light of the local context.
  • the indices i, j, k are reserved as much as is convenient for nucleotide, strand and
  • variable names are capital italicized letters. However, the descriptions in this section and the instruction listings in some figures reuse indices and variable letters more heavily.
  • Cell chromosomes are compacted (e.g., bacterial cell chromosomes are compacted
  • Compaction is also advantageous to support various cellular processes including chromosome replication, cell division, and chromosome segregation to opposing poles of the cell.
  • Cells such as bacteria, employ several mechanisms to compact DNA including "clamping" of the DN A by structural maintenance of chromosome (SMC) proteins, DN A supercoiling, macromolecular crowding, and charge neutralization.
  • SMC structural maintenance of chromosome
  • DN A supercoiling DN A supercoiling
  • macromolecular crowding macromolecular crowding
  • charge neutralization DNA Supercoiling is modeled in a separate sub-model.
  • the Condensation sub-model explicitly models DNA clamping by SMC complexes.
  • FIG. 22A is a block diagram that illustrates features of DNA condensation, according to an embodiment.
  • SMC complexes 2210a and 2210b consist of an SMC core protein 221 1 (e.g., product of gene MG298) and segregation and condensation proteins A 2212 and B 2213 (e.g., Sep A of gene MG213, ScpB of gene MG214).
  • SMC complex 2210 is a "V" shaped structure (with, a head and two legs) that induces positive supercoils in double stranded DNA 2290.
  • the complexes are believed to work with a lock and key mechanism in which first DNA is looped around the legs of the SMC complex, and then an ATP is hydrolyzed to ADP to bind, between the two tails to lock the SMC complex in place.
  • the complexes 2210 bind and clamp the DNA at corresponding different locations, causing many loops in the DNA and resulting in DNA compaction.
  • the loops 2222 around each leg occupy 90 base pairs (bp).
  • a loop 2224 of about 450 bp forms between the two SMC complex legs.
  • there is about one S MC complex 221.0 bound for an average spacing S 7130 bp of DNA.
  • base pairs on a double strand and nucleotides are used
  • An SMC complex threshold spacing parameter (I) 2220 prevents this phenomenon by inhibiting S MC complex binding where the gap between existing SMC complexes 2210 on the DNA 2290 is already much less than the desired spacing.
  • This spacing parameter 2220 was fit such that the average SMC complex spacing S in the model is 7130 bp. The fit of this parameter was assessed by various tests to assure an average SMC spacing S of 7130 bp.
  • SMCs are bound at an average spacing (S) of 7130 bases. Since it is unknown whether SMC complexes bind to specific DNA motifs, at each time step, SMC complexes are bound to random positions on the chromosomes. The binding is weighed by calculated probabilities (P) of a free SMC complex binding to each accessible double-stranded DNA base. The probability distribution is calculated as a step function: bases within a threshold spacing parameter (/) from another SMC complex have a zero probability of being bound, and bases beyond the threshold spacing parameter distance from other SMC complexes bind at a probability of 1/5, SMC comple disassociation from the DNA occurs upon interaction with other DNA binding proteins and is handled by the Chromosome state class.
  • S average spacing
  • the DNA Condensation sub-model models the contribution of SMC proteins to chromosome condensation.
  • SMC-related DNA condensing could be considered along with the effects of topoisomerases in some embodiments, but the effect of the topoisomerases is modeled by tracking the effect of each topoisomerase activity on the DNA linking number (in the DNA supercoiling sub-model).
  • SMCs do not act by strand-passing events (the causing of a nick in the DNA, enabling two double-stranded DNA regions to pass through each other, and the re- ligation of the DNA), and since the exact effect of SMC complex activity on the DNA linking number is not known, SMC condensation is considered as compacting the DNA at a different level and therefore it is modeled independently of the DN A linking number calculations in the DN A Supercoiling sub-model. Macromoleeular crowding and charge neutralization are not presently included in any sub-model of the illustrated embodiment, but these processes are modeled in one or more new or extant submodels in other embodiments. An advantage of computational efficiency is achieved by omitting these other processes, without apparent hindrance to obtaining good agreement with experiments, as described in more detail in another section.
  • FIG. 22B is a flowchart that illustrates an example method 2250 for a DNA Condensation sub-model, according to an embodiment.
  • SMC chromosome
  • step 2261 a maximum number of SMC that can bind during time step is determined.
  • SMC binding limit is the minimum of: a count of free SMC complexes and a count of available ATP molecules.
  • step 2271 SMC complexes are randomly bound to a chromosome.
  • step 2271 includes steps 2273 through 2277.
  • step 2273 it is determined based on data in Chromosome state, free bases where SMC complexes can bind that are more than / bases from a different, already bound SMC complex.
  • step 2275 sites are randomly chosen for binding SMC complexes up to the limit determined in step 2261. For any site, the probability of binding an SMC complex is 1 / (S -21). If the SMC binds, then in step 2277 an SMC- ATP complex is formed and hydrolyzed to an SMC-ADP complex. The copy count of free SMC complexes I reduced and the metabolites involved in hydrolysis are updated (e.g., decrease H20 count and ATP count and increase ADP count and H+ count).
  • the DNA Condensation sub-model reads from and writes to the Chromosome state class. It reads in the regions of DNA that SMC complexes can bind to, and writes back the positions of SMC complexes on the chromosome.
  • M. genitalium has the machinery to perform all of these levels of DNA compaction and was modeled using the method 2250 depicted in FIG. 22B.
  • DNA replication produces two chromosomes that are catenated, or connected like links of a chain. Before cell division can take place, these chromosomes need to migrate to opposing sides of the cell and be decatenated. Chromosome segregation is believed to result from both en tropic factors and enzymatic activity.
  • the Chromosome Segregation sub-model assumes that the chromosomes entropically segregate during replication. That is, it is unfavorable for the two copies of replicated DNA to exist too close to each other. Thus as the replication fork moves, the replicated DNA starts to migrate towards the poles.
  • Enzymatic segregation activity is also modeled.
  • Five chromosome segregation proteins are modeled, including a nucleotide binding domain (e.g., CobQ/CobB/MinD/ParA from MG470), an MraZ protein (e.g., MraZ from MG221), a GTP binding protein (Era from MG387), a GTPase (Obg from MG384), and Topoisomerase IV (ParE from MG203, ParC from MG204).
  • Another segregation protein, FtsZ is accounted for in the Cytokinesis submodel.
  • the energetic cost of segregation has not been experimentally characterized, The energetic cost is set to a nominal value of 1 GTP per segregation event.
  • the superhelical density tolerance is also unknown, so the allowed range of DNA superhe!Ica! density was set to be quite wide, The chromosome state is initialized with I chromosome with superhelical density within the acceptable tolerance.
  • FIG. 23A is a flowchart that illustrates an example method 2300 for a DNA
  • Segregation sub-model Segregation sub-model, according to an embodiment.
  • parameters for DNA segregation are set.
  • the energy for segregation Eseg is set equal to the GTP cost of segregation ( ⁇ 1 GTP); otol, the superhelical density tolerance is set ( ⁇ 0.1); ⁇ , the equilibrium superhelical density is set ( ⁇ -0.06); segregation enzymes are indicated by their protein monomer identifiers (e.g., the proteins from genes
  • step 2302 the state of the chromosome is checked to determine whether the chromosome is fully replicated. If not, the process ends without completing segregation. If so, then in step 2305 it is determined whether the chromosome winding is within range for segregation, e.g., between ⁇ +/- otol. If not, the process ends without competing segregation. If so, then in step 2307 it is determined whether there is at least on free copy of each of the segregation enzymes. If not, the process ends without competing segregation. If so, then in step 2309 it is determined whether there is sufficient energy (e.g., available GTP greater than Eseg demand for GTP). If not, the process ends without competing segregation. If so, then in step 2311 the chromosomes are marked as segregated, e.g., the status in field 616 is updated to segregated.
  • the chromosome winding is within range for segregation, e.g., between ⁇ +/- o
  • FIG. 23B is a list showing the relationship of the DNA Segregation sub-model to cell state variables, according to an embodiment.
  • the sub-model interacts with the Chromosome state class for the properties listed in FIG. 23 B.
  • DNA is not absolutely stable. Rather, DNA is susceptible to spontaneous modification including base loss and deamination, DN A is also susceptible to modification by many exotic agents including UV-A and UV-B radiation, ⁇ -, ⁇ -, and ⁇ -radiation, oxygen, hydroxyl, carbon, and nitrogen radicals, and alkylating agents. These diverse DNA modification modes generate equally diverse DNA structures including missing and modified sugar-phosphates, missing and modified nucleotides, intra- and inter-strand cross links, and strand breaks. DNA damage is an evolutionarily important source of genetic variation.
  • each DNA damage reaction models each DNA damage reaction as an independent Poisson process.
  • the rate, ri, of each spontaneous DNA damage reaction is given by the observed specific rate ki.
  • the effective rate, ri, of each DN A damage reaction i caused by radiation j is given by the product of the experimentally observed specific rate, ki, and the radiation flux, sj.
  • Simi lar rates are determined in some embodiments for chemically induced damage, caused by chemical agent n, given by the product of the experimentally observed specific rate, ki, and the concentration cn of chemical agent n,
  • the Chromosome state initializes all chromosome bases, fully methylated at R/M sites and otherwise unmodified.
  • Several other processes including Transcription sub-models initialize the protein occupancy of the chromosome.
  • FIG. 24A is a flowchart that illustrates an example method 2400 for a DNA Damage sub-model, according to an embodiment.
  • a DNA damage reaction n refers to damage to any nucleotide in the chromosome. So the specific reaction rate for damage to a nucleotide by reaction n is Kn.
  • the amount of stimulus s associated with the reaction n is Ss and refers not only to radiation flux, but also chemical concentration of a chemical agent, or temperature, or any other real valued stimulus.
  • the effective rate Rn Kn * Ss.
  • Rn Kn.
  • step 2401 parameters are set for reaction n to stimulus s in stimulus state class.
  • step 2411 it is determined if the next base i in strand j of chromosome copy k is susceptible to reaction n. If not, control passes back to step 2403 for the next reaction, if any. If so, then in step 2413, the copy count Nm of metabolite m is retrieved from Metabolite state variable for every metabolite in reaction n. In step 2415, it is determined if there are insufficient metabolites for the reaction to occur, e.g., it is determined whether Nm ⁇ Mmn for any metabolite m. If there are insufficient metabolites, control passes back to step 2403 for the next reaction, if any. If there are sufficient metabolites, then control passes to step 2421.
  • step 2423 it is determined whether the random number X > 1. If not, no damage is done to the base and control passes back to step 2411 for the next base. If so, then damage takes the form of one or more modifications to the base as indicated by Zn, e.g., one or more of gap, abasci, sugar- phosphate, base, cross-link and strand break modifications. Control passes back to step 2411 for the next base.
  • the Chromosome state represents the copy count, modification status, and protein occupancy of each chromosome.
  • the Metabolite state represents the copy number of each metabolite.
  • the Stimulus state represents the stimulus, such as fluxes of six types of radiation: ⁇ -, ⁇ -, and ⁇ -particles, electrons, protons, and UV-A and UV-B in an example embodiment.
  • the Metabolism sub-model models the generation of hydroxyl radicals.
  • the DNA Repair sub-model models four DNA repair pathways: direct damage reversal, base excision repair, nucleotide excision repair, and homologous recombination.
  • the DNA Repair sub-model also models DNA methylation at restriction/modification (R/M) sites. All DNA modifications except methylations of R/M sites are assumed to impede protein DNA-binding
  • FIG. 24B is a list of instructions that illustrates an example DNA Damage submodel for M. genitalium, according to an embodiment.
  • DNA is susceptible to several intrinsic and extrinsic modes of damage. Because damaged DNA is deleterious to many processes including transcription and replication, organisms have evolved specialized machinery to detect and repair damaged DNA.
  • the machinery includes proteins serving as enzymes for detection and repair of damage, including damage sensor DisA and four DNA repair pathways: direct damage reversal (DDR), base excision repair (BER), global genomic nucleotide excision repair (GG-NER), and homologous recombination double strand break repair (HR-DSBR) and type II restriction/modification (R/M) system Muni to selectively degrade foreign DNA.
  • DDR direct damage reversal
  • BER base excision repair
  • GG-NER global genomic nucleotide excision repair
  • HR-DSBR homologous recombination double strand break repair
  • R/M type II restriction/modification
  • DisA is believed to signal DNA damage through the absence of the second messenger cycIic-di-AMP, In the absence of DNA damage, DisA cyclizes ATP to cyclic-di-AMP. In the presence of DNA damage, DisA is believed to bind DNA, adopt a catalytically inactive conformation, and via an unknown mechanism possibly involving cyclic-di-AMP and the transcriptional regulator SpoOA, to induce expression of genes.
  • BER The primary role of BER is to repair small patches of oxidized nucleobases.
  • One BER pathway involves four steps. First, glycosylase Fpg(MG498) or Ung(MG097) hydrolytically cleaves the glycosidic bond between the damaged nucleobase and the DNA backbone, creating an abasic site. Second, two parallel subpathways cleave the
  • the first subpathway begins with cleavage of the 3' phosphodiester bond site by AP lyase Fpg and ends with cleavage of the 5' phosphodiester bond by 5 '-deoxyribosephosphodiesterase Nfo.
  • the second subpathway begins with cleavage of the 5' phosphodiester bond by 5'-AP endonuclease Nfo (MG235) and ends with cleavage of the 3' phosphodiester bond by 3'-(deoxyribose-5'- phosphate) lyase DnaN (MG001).
  • DNA polymerase DnaN restores the missing base using the template provided by the opposite strand.
  • DNA ligase ligates the inserted base.
  • NER The primary role of NER is to repair bulky distortions in the shape of the DNA helix of up to 12-13 bases in length caused by UV radiation and nitricoxide. Compared to BER, NER employs less specific endonucleases and consequently has broader repair capacity. NER repairs DNA via a four-step mechanism. First, UvrABC (MG421, MG073, MG206) identifies a DNA lesion and cleaves the phosphodiester bonds 6-7 bases 3' and 5' to the lesion. Second, helicase PrcA excises the cleaved DNA. Third, DNA polymerase DnaN (MG001) restores the missing bases using the template provided by the opposite strand. Finally, DNA ligase ligates the inserted base.
  • UvrABC MG421, MG073, MG206
  • HR repairs double strand breaks caused by ionizing radiation and stalled replication forks. HR also repairs strand gaps generated by the interaction of replication forks with unrepaired lesions.
  • Recombination repair is modeled as an eight-step process: 1. Initiation: 5 '-3' exonuclease removes dNMPs from the 5' ends of the strand break, leaving 3' overhangs of at least 8 bases. 2. Strand exchange: RecA (MG339) catalyzes the formation of a Holliday junction between one of the damaged 3' overhands and the undamaged homologous chromosome. 3. Polymerization: DnaN (MG001) polymerizes DNA guided by the undamaged chromosome template. 4.
  • Ligation LigA (MG254) ligates the newly produced DNA. 5. Second strand exchange: RecA catalyzes the formation of a second Holliday junction. 6. Holliday junction migration: RuvA and RuvB widen the distance between the two strand crossover points by moving the Holliday junctions to the preferred sequence 5'-[AT]TTN[GC]-3'532 . 7. Resolution: RecU nicks the DNA at the cross over points, creating four single strand breaks. 8. Ligation: LigA ligates the four strand breaks.
  • Organisms employ R/M systems to distinguish foreign from self DNA. These systems maintain self DNA in a fully methylated configuration by methylating self DNA during chromosomal replication and cleaving unmethylated foreign DNA which has not been exposed to self DNA methylases. Muni methylates the third base of both strands of the palindromic sequence 5'-CAATTG-3', producing N6-imethyladenine.
  • oligonucleotide salvage genes include pcrA (MG244), mgpA (MG190), and polA (MG262). Of these genes, we assumed that mgpA is responsible for oligonucleotide salvage. Individual damaged nucleotides are further metabolized and exported.
  • the DNA Repair sub-model models four DNA repair pathways and methylation and restriction of MunlR/Msites. Because DNA repair is not well understood on the genomic level, this process makes several simplifying assumptions. First, this sub-model represents each step in DNA repair, methylation, and restriction as a separate reaction, and assumes that each reaction is independent. Consequently, the rate of each reaction is determined only by the configuration of the DNA. Second, this sub-model assumes that the mean arrive rate, vi, of each reaction i is independently limited b y(l) the copy counts of intracellular metabolites, mj, and (2) the DNA repair enzyme copy numbers, ej. Based on these assumptions, the function form of vi is given by Equation 9.
  • Mji is the stoichiometry of metabolite / in reaction i
  • M max(0,-M) is the negative part of M
  • Kji is the experimentally observed catalytic rate of enzyme j in reaction i
  • At is the simulation time step. This process also stochastically models the binding of free DisA to D A lesions.
  • the Chromosome state initializes one set of chromosomes (e.g., one Chromosome for M. genitalium), modified only at R/M sites.
  • chromosomes e.g., one Chromosome for M. genitalium
  • FIG. 25A is a flowchart that illustrates an example method 2500 for a DNA Repair sub-model, according to an embodiment.
  • a repair reaction is indicated by index r
  • a metabolite by index m and enzymes by index e.
  • the mean arrive rate of repair reaction r is Vr.
  • the copy count is indicated by variable N.
  • Mmr stoichiometry of metabolite m in reaction r
  • step 2503 the next repair reaction r is selected at random. If none, the method is complete and the sub-model ends. If a reaction remains, then in step 2505 it is determined whether there is a next randomly selected base susceptible to the repair reaction. If none remain, control returns to step 2503 to check the next reaction. If there is a base for the reaction, then in step 2511, copy counts Nm for metabolites and Ne for repair enzymes are retrieved from the cell state variables. In step 2513, the mean arrive rate Vr is determined based on Equation 9.
  • step 2521 it is determined whether Vr > 1. If not, control returns to step 2505. If so, then in step 2523 the repair reaction r is run on the current base and the metabolite counts are updated and enzymatic capacity decreased. Control returns to step 2505 for the next base, if any.
  • the M. genitalium genome contains the DNA damage sensor DisA, the four DNA repair pathways described above, and the type II restriction/modification (R/M) system
  • Muni. M. genitalium employs only one dedicated machine— DisA (MG105)— to recognize DNA damage. M. genitalium, however, does not have most of the downstream signaling machinery associated with B, suhtiUs DisA-dependent induction of expression of genes.
  • DNA ligation is the only DDR pathway employed by M. genitalium, DNA ligation is catalyzed by DNA ligase LigA (MG254) by a Nicotinamide adenine dinucleotide (NAD)-dependent mechanism. DNA ligase repairs single strand breaks as well as ligates DNA synthesized during replication, BER, NER, and HR.
  • DNA ligase LigA MG254
  • NAD Nicotinamide adenine dinucleotide
  • M. genitalium BER repairs single damaged nucleobases.
  • M. genitalium does not undergo long patch BER because M. genitalium does not have a flap endonuclease.
  • M. genitalium NER provides global protection against small DN A lesions.
  • M. genitalium does not undergo transcription-coupled NER because M. genitalium does have the transcription-repair coupling factor mfd. M. genitalium NER repairs DNA via the four-step mechanism described above,
  • M. genitalium HR has a very reduced complement of recombination repair proteins, and follows the eight steps described above with modifications. No traditional initiation gene has been identified in M. genitalium; so, it was assumed that poll-like 5'- 3'exonuclease(PolA, MG262) is the M. genitalium HR initiator. Additionally, HR is believed to be critical for M. genitalium antigenic variation.
  • M. genitalium contains a reduced R/M system consisting of the type I DNA recognition subunit EcoD (MG438) and the type II methylase Muni (MG184).
  • EcoD is the DNA binding domain of a type I multimeric methylation and restriction complex which recognizes the sequence 5'-TCARTTC-3'. Because M. genitalium contains only the DNA recognition subunit and not the methylation and restriction subunits, type I R/M was not modeled. Because M. genitalium does not contain a separate type II endonuclease and because type II R/M systems are often monomeric, it was assumed Muni also has restriction activity.
  • FIG. 25B is a list of instructions that illustrates an example DNA Repair sub-model for M. genitalium, according to an embodiment.
  • DNA naturally exists at a certain level of helicity, and this level of helical density is important for the DNA's stability, its ability to fit in the cell, and its ability to bind proteins. DNA supercoiling can also have an effect on RNA transcription rates as the helicity of the DNA may make given genes more or less accessible.
  • topoisomerase proteins DNA gyrase, Topoisomerase I, and Topoisomerase IV— transiently break a DNA strand to wind (Topoisomerase I) or unwind (Topoisomerase IV, gyrase) the DNA.
  • the opposing actions of the topoisomerase enzymes help maintain a stable level of DNA helicity. This is especially important during replication because while the replication loops progress along the chromosome unwinding the DNA, the coils that previously existed in the DNA persist and accumulate downstream of the replication loop. This over-coiled region is relieved by the actions of topoisomerases.
  • DNA gyrase and Topoisomerase IV each require 2 ATP to induce 2 negative supercoils each time they act. Topoisomerase IV induces positive supercoils relatively rarely and its positive coiling activity is therefore not considered in some embodiments.
  • Topoisomerase I acts to induce positive supercoils. Gyrase, Topoisomerase IV, and
  • Topoisomerase I act at races of 1.2, 2.5, and 1 strand passing events per second, respectively.
  • the supercoiling of the DNA is quantified using a metric known as the DNA linking number, LK, defined as (# of base pairs)/10.5, where 10.5 is the number of bases per turn in a relaxed double helix.
  • the relaxed LK is designated LKrelaxed, and indicates how many complete turns are on the portion of DNA of interest.
  • the current LK, designated LKcurrent may deviate from the LKrelaxed due to the actions of topoisomerases and the progression of replication.
  • the ALK is defined as the difference between the current level of DNA supercoiling and the relaxed level of DNA supercoiling.
  • FIG. 26A is block diagram that illustrates different regions for defining helicity on a replicating chromosome, according to an embodiment.
  • the unreplicated region 2600 of DNA region downstream of the two replication loops
  • the LKrelaxed decreases.
  • FIG. 26B through FIG. 26D are graphs that illustrate the effect of superhelical density on the activity of certain DNA binding proteins, according to an embodiment.
  • FIG. 26B is graph 2610 that shows the activity of Topoisomerase I as curve 2616.
  • the horizontal axis 2612 represents superhelical density ⁇ , which is dimensionless; and the vertical axis 2614 indicates activity in relative units.
  • FIG. 26C is graph 2620 that shows the activity of Gyrase as curve 2626.
  • the horizontal axis 2622 represents superhelical density ⁇ , which is dimensionless; and the vertical axis 2624 indicates activity in relative units.
  • FIG. 26D is graph 2630 that shows the activity of
  • Topoisomerase IV as curve 2636.
  • the horizontal axis 2632 represents superhelical density ⁇ , which is dimensionless; and the vertical axis 2634 indicates activity in relative units
  • topoisomerases are unable to act beyond certain ⁇ thresholds (gyrase can only act at ⁇ above threshold designated Tgyr, Topoisomerase IV can only act at ⁇ above threshold designated TtopIV, and Topoisomerase I can only act at ⁇ threshold designated Ttopl), so the activities of the topoisomerases outside of the thresholds are set to zero.
  • the exact enzyme activity profiles within these bounds is unknown. Fitting a logistic function within the thresholds of gyrase and Topoisomerase I allows the maintenance of ⁇ approximately near ⁇ . Topoisomerase IV is not active near ⁇ and therefore acts more rarely than the other topoisomerases.
  • the sub-model keeps track of enzyme processivity, or how long an enzyme can remain bound to the DNA and continue to act.
  • Topoisomerase IV and DNA gyrase are highly processive and may perform several strand-passing events before falling off of the DNA.
  • Topoisomerase I is not highly processive; it acts on the DN A and falls back off right away.
  • DNA coiling enzymes such as gyrase, Topoisomerase I, and Topoisomerase IV
  • the progress of replication loops can all affect the linking number (and superhelical density) of the DNA.
  • the DNA Supercoiling sub-model evaluates the effects of all the free coiling enzymes in random order, determines what regions they can bind in, and randomly binds them to large enough open positions on the DNA.
  • the linking number is then adjusted based on the actions of all the bound enzymes, and the usage of ATP is determined.
  • the processivity of gyrase and Topoisomerase IV are also tracked so that they can be unbound from the DNA when appropriate.
  • Region 1 includes unreplicated DNA.
  • Region 2 includes Replicated DNA, Chromosome 2 (upstream of replication loops, increasing in length over time);
  • Region 3 includes Unreplicated DNA (downstream of replication loops, decreasing in length over time).
  • Region 1 includes Replicated DNA, Chromosome 1; Region 2 includes Replicated DNA, Chromosome 2.
  • FIG. 26E and FIG. 26F are flow charts that illustrate an example method 2640 for a DNA Supercoiling sub-model, according to an embodiment.
  • step 2643 the next region is selected.
  • step 2645 length of region LR and LKcurrent are determined; and bound coiling enzymes are determiend based on
  • step 2647 superhelical density ⁇ is determined using
  • step 2651 it is determined whether superhelical density is greater than an activity threshold (e.g., 0). If not, then in step 2653 unbind coiling enzymes not active below this threshold such as all parC and parE. In either case control passes to step 2661.
  • an activity threshold e.g., 0
  • step 2671 the next free coiling enzyme that binds at superhelical density greater than zero is selected at random, if any. If found, and superhelical density is greater than zero, then the enzyme is bound to the DNA in step 2677 and the copy counts are updated, if it is determined in step 2675 that there is a length available on the DNA greater than the footprint of the enzyme. Otherwise the next free coiling enzyme, if any, is selected. When there are no more such free enzymes, control passes to step 2681.
  • step 2685 LKcurrent is updated by adding the product of Se and SPe.
  • step 2687 counts of metabolites ATP and H20 consumed and metabolites ADP, PI and H+ produced are updated.
  • step 2689 all non processive enzymes, such as topA, are unbound. Control passes back to step 2681 to determine the next enzyme, if any.
  • step 2691 If no other coiling enzyme remains, then in step 2691, ⁇ for the region is updated and fold change in probability is determined based on response curves for effect of superhelical density on gene expression. Also, the polymerase binding probability field is updated in Chromosome state for genes in transcription unit with any of the coiling enzymes. In step 2693, it is determined if there is another region. If so, control returns to step 2643. Otherwise the method ends.
  • M. genitalium has the three topoisomerase proteins: DNA gyrase, Topoisomerase I, and Topoisomerase IV.
  • the logistic function parameters (Lgyr, LtopJ) were fit such that the equilibrium superhelical density, ⁇ , could be maintained while still allowing both gyrase and Topoisomerase I to act at the equilibrium ⁇ .
  • Linear fits were approximated for gene expression data experimentally measured at varying ⁇ .
  • the upper and lower bounds of ⁇ for applying the linear fit of the ⁇ -gene expression data (olower, oupper) were set in the range where a linear fit was appropriate for the data.
  • FIG. 26G is a list of parameter values set for the DNA Supercoiling sub-model, according to an embodiment for M. genitalium.
  • DNA replication is an integral part of the cell cycle which produces a complete chromosome for each daughter cell.
  • DNA replication is init ated by the formation of a large multimeric DnaA complex at the origin of replication (oriC), which enables the recruiting of the DNA replication machinery to the oriC.
  • Replication proceeds bidirectionally from the oriC to the terminus (terC), half way around the circular chromosome.
  • the chromosome has two strands of base pairing DNA called the leading and lagging strands, Replication of the leading strand proceeds continuously in the 5' to 3' direction. Replication of the lagging strand occurs in short Okazaki fragments, also in the 5' to 3' direction.
  • the DNA replication machinery 2700 consists of multiple replisome proteins, and there is one set of machinery for each of the two replication bubbles (on each side of the oriC).
  • the replicative DNA helicase 2704 serves to unwind the coiled DNA.
  • primase 2705 makes short primers 2707 that help initiate the polymerization of a long stretch of DNA 2709a and 2709b (collectively referenced hereinafter as DNA 2709). Primers are made once at the origin for the replication of the leading strand, and one primer is made to start each Okazaki fragment.
  • DNA polymerization is carried out by DNA polymerase III molecules, consisting of two core subunits 2701 a and 2701b (collectively referenced as core subunits 2701), a gamma complex 2702, and beta-clamps 2703a and 2703b (collectively referenced as beta clamps 2703),
  • core subunits 2701 a and 2701b
  • beta-clamps 2703a and 2703b a and 2703b
  • beta clamps 2703 beta-clamps 2703a and 2703b
  • One core 2701a or 2701b resides on each of the leading strand 2709a or lagging strand 2709b, respectively, of the DNA 2709, and is made up of two alpha subunits.
  • the two cores 2701 are held together by a gamma complex 2702, consisting of delta, delta prime, gamma, and tau subunits.
  • the core 2701 is also bound to a sliding beta-clamp 2703 which helps anchor the polymerase to the DNA and maintain processivity.
  • beta-clamps 2703b are swapped out each time a new DNA loop 2710 is created to make a new Okazaki fragment.
  • a "back-up" beta-clamp 2703c binds downstream of the lagging DNA loop 2710 to facilitate this switch and formation of the next loop. (The lagging stand polymerization requires the formation of DNA loops, so that the DNA can be
  • DNA ligases 2708 connect the separate polymerized Okazaki fragments together.
  • SSBs 2706 stabilize and protect single-stranded DNA, which is created as the DNA is unwound to make room for the DNA machinery.
  • the lagging strand DNA loop 2710 often has long stretches of unwound DN A waiting to be polymerized. All of the replication proteins and all of the parameters used in replication of M. genitalium are described in FIG. 27G, FIG.
  • 27G is a list of parameter values set for the DNA Replication sub-model, according to an embodiment, [0328]
  • the DNA Replication sub-model (also called the Replication sub-model for brevity) tracks the progression of the replication proteins on the known chromosome sequences specified in the Chromosome state variable.
  • the Okazaki fragment lengths are randomly determined based on a Poisson distribution centered around the mean fragment length (JOF)
  • Polymerization of both the leading and lagging strands is limited by the average rate of polymerization (kei) in the organism (e.g., Mycoplasmas), the availability of nucleotides and energy, and the available protein binding sites on the chromosome. Further, polymerization of the leading strand is prevented if additional unwinding will lead to too many unprotected single-stranded bases on the lagging stand, or if the leading polymerase will progress a distance that is greater than two Okazaki fragment lengths beyond the lagging strand polymerase.
  • the Replication sub-model outputs the progress of replication, which has a big impact on many other processes in the system and corresponding sub-models.
  • the copy count of each gene plays a role in RNA polymerase binding in the Transcription sub- model, and affects the cell' s gene expression.
  • the duplicated regions of the chromosome have to be wound appropriately by the DNA Supercoiling sub-model, and the completion of replication triggers other cell cycle events such as the decatenation of the chromosomes by the Chromosome Segregation sub-model.
  • RNA polymerase is traveling in a direction opposite to the helicase
  • co-directionally RNA polymerase is traveling in the same direction as the helicase
  • RNA polymerase If the helicase hits an RNA polymerase, polymerization pauses, the RNA polymerase falls off, and its transcript is degraded, if it is a head-on collision, the impact will stall progression of the replication bubble for some amount of time (tstall). If it is a co-directional collision, polymerization will continue at full speed the following time step.
  • Okazaki fragment lengths were chosen randomly based on a Poisson distribution around the mean length lOF (e.g., about 1500 nucleotides). From this, the start position of each Okazaki fragment is obtained to be used to determine primer, beta-clamp, and polymerase core binding during the simulation.
  • the Replication sub-model involves keeping track of the specific positions of the replication proteins on the chromosome.
  • the position on the chromosome where the proteins bind is determined based on the following rules.
  • the Helicase is centered on the boundary between double stranded DNA (wound) and single-stranded DNA (unwound). The position over which it is centered is the next position to be unwound.
  • the positions over which the polymerase cores are centered are the next positions to be polymerized.
  • polymerase cores are centered at oriC ⁇ 1 base and the he!icases are a distance Iprirn (e.g., 11 nucleotides) ahead,
  • the lagging polymerase, beta-clamp, and primase are accounted for as part of a complex on the leading strand (containing also the helicase, leading polymerase, gamma complex, and leading beta-clamp). At all other times, the lagging polymerase, lagging beta-clamp, and primase are accounted for as a comple on a different strand. This allows for separate tracking of the leading and lagging polymerase positions.
  • SSBs are bound at a fixed spacing (sssb) In the single-stranded regions of DNA.
  • the Replication sub-model is built up of several sub-functions which move the replication proteins along the chromosomes and evolve the chromosome copy count of the cell to double (e.g., from 1 to 2 for Mycoplasmas). These sub-functions are evaluated in random order, so that shared resources, such as energy, are allocated fairly across sub-functions.
  • FIG. 27D through FIG. 27F are flowcharts that illustrate an example method 2720 for a DNA Replication sub-model, according to an embodiment.
  • parameters for the model are set, such as a list of the protein monomers that serve as replisome enzymes, the SSB dissociation rate.
  • the other sub-functions represented by steps
  • 2722, 2730, 2740, 2750, 2760, 2770, 2780) are performed in random order.
  • Step 2722 is a sub-function for starting the replication. Step 2722 includes steps
  • step 2723 it is determined if conditions are satisfied for replication to start, such as an initiation protein at the OriC sites, and sufficient enzymes and metabolites (including nucleic acid bases and ATP and water) to make two replisome complexes, each including 1 helicase, 1 primase, 2 polymerase cores, 1 beta clamp and 1 gamma complex.
  • step 2725 enough DNA base pairs are unwound to accommodate the DNA footprint of and thus allow binding of the two replisome complexes, and those locations are marked as unwound in the Chromosome state class.
  • step 2727 the copy counts of the metabolites are updated, e.g., the copy counts of ATP and water (H 2 0, also H20) are decremented by the ATP energy cost for each unwound base times the number of unwound bases plus twice the ATP energy cost for binding each beta-clamp, and the copy counts of reaction products ADP and inorganic Phosphate (Pi, also PI) and hydrogen ion (H+) are incremented by a like number.
  • the copy counts of the metabolites are updated, e.g., the copy counts of ATP and water (H 2 0, also H20) are decremented by the ATP energy cost for each unwound base times the number of unwound bases plus twice the ATP energy cost for binding each beta-clamp, and the copy counts of reaction products ADP and inorganic Phosphate (Pi, also PI) and hydrogen ion (H+) are incremented by a like number.
  • Step 2730 is a sub-function for advancing the replisome complexes along the DNA.
  • Step 2730 includes steps 2731, 2733, 2735 and 2737.
  • step 2731 it is determined if a Okazaki fragment is starting, e.g., by the current base being a multiple of the fixed length determined during initialization. If not, then control passes to step 2735 to select the next strand. If so, then in step 2733 a primase and polymerase core are bound to the lagging strand and control passes to step 2735 to select the next strand.
  • step 2737 the unwinding and polymerizing of the DNA is modeled by advancing the positions of the leading and lagging polymerases and helicases as far as allowed while following conditions are met, where N indicates the number of polymerized/unwound bases on a given strand in the given time step, and P indicates the last polymerized position varying from 0 at OriC to L/2 at TerC, and L is the length of the chromosome.
  • Step 2740 is a sub-function for binding and unbinding SSBs along the unwound DNA.
  • Step 2740 includes steps 2741, 2743, 2745 and 2747 in which the positions are determined at fixed spacing (sssb), randomly bind a free SSB to each site randomly release a bound SSB based on the SSB dissociation rate, and dissociate free SSB 8-mers into two SSB tetramers, respectively.
  • sssb fixed spacing
  • Step 2750 is a sub-function for binding a back-up beta clamp.
  • Step 2750 includes step 2751 through 2755.
  • conditions that favor binding a free back-up beta clamp are evaluated.
  • the conditions include: copy count for available beta-clamp monomers > 2; copy count for ATP > ATP cost of beta-clamp binding; position on DNA is accessible; helicase has passed beta-clamp binding site; Okazaki fragment length > minimum for beta-clamp binding.
  • step 2753 it is determined if all conditions are satisfied. If not, the sub-function ends.
  • step 2755 the state of the beta-clamp protein complex is marked bound, the copy counts of metabolites are updated, e.g., counts are decremented for ATP and H20 based on the ATP costs of binding, and the counts of metabolites produced ADP, PI and H+ are incremented by a like number.
  • Step 2760 is a sub-function for terminating Okazaki fragments.
  • Step 2760 includes step 2761 through 2765.
  • conditions that favor terminating are evaluated.
  • step 2763 it is determined if all conditions are satisfied. If not, the sub-function ends.
  • step 2765 the state of the beta-clamp protein complex is marked unbound, the Okazaki fragment is marked with a break to be ligated, the lagging strand primase and polymerase are associated with the released clamp, and the copy counts of metabolites are updated.
  • Step 2770 is a sub-function for ligating terminated Okazaki fragments.
  • Step 2770 includes step 2771 through 2775.
  • step 2771 the next single strand with a break, if any, is selected randomly. If none, the sub-function ends. If such a strand is found, then, in step 2773, conditions that favor ligation are evaluated.
  • step 2775 it is determined if all conditions are satisfied. If not, the sub-function ends. If so, then in step 2777, the strand break is repaired by ligation, and the copy counts of metabolites are updated, e.g., copy count of NAD is decremented and copy counts of adenosine
  • AMP monophosphate
  • NMM N-methyl mesoporphorin IX
  • Step 2780 is a sub-function for terminating replication.
  • Step 2780 includes step 2781 through 2785.
  • conditions that favor terminating are evaluated.
  • the conditions include: leading strand polymerized to one side of terC; lagging strand polymerized to other side of terC.
  • step 2783 it is determined if all conditions are satisfied. If not, the sub-function ends. If so, then in step 2785, release all replisome complexes from chromosome so complexes are free and not bound, mark terC with a single strand break for subsequent repair in the Chromosome and Protein Complex state variables.
  • M. genitalium For example, some bacterial species are known to have multiple replication initiation events during their life cycles. This has never been demonstrated in M. genitalium, so in example embodiments for M. genitalium the sub-model performs up to 1 chromosome duplication event per cell life cycle. Further, the exact mechanism of replication initiation in M. genitalium. is unknown. M. genitalium does not include a DnaC homolog, which in other species is an essential cue for the binding of the replication machinery to the oriC. For the M. genitalium. whole-cell model, the binding of 29 DnaA-ATP molecules to the oriC is the cue for replication initiation. [0345] FIG.
  • FIG. 27H is a list of cell state variables and other sub-models interacting with the DNA Replication sub-model, according to an embodiment.
  • FIG. 271 through FIG. 27K are lists of instructions that illustrate an example DNA Replication sub-model, according to an embodiment. This example uses the parameter and values listed in FIG. 27G for the M.
  • the Replication Initiation sub-model determines when during the cell cycle chromosome duplication begins, This replication initiation time is therefore very important in determining the cell's division time.
  • the mechanism of replication initiation used here is modeled after that described for E. coli involving the protein DnaA (MG469). Chromosome Replication begins when a complex of 29 DnaA -ATP molecules assembles near the replication origin, OriC, at specific DNA motifs called R1-R5.
  • DnaA The assembly of this complex is rare because the DnaA molecules are titrated out by approximately 2000 additional binding sites, or "DnaA boxes," that exist all around the chromosome, DnaA needs to be bound to ATP or ADP to bind to these sites, and binds on and off these sites throughout the cell cycle. Binding of the DnaA -ATP at the R 1-R5 sites is cooperative, enabling the large complex to form at this specific location on the chromosome.
  • FIG. 28A is a list of parameters and values for an example DNA Replication Initiation sub-model, according to an embodiment. AH of the parameters used in the
  • Replication Initiation sub-model are described in FIG. 28A. All of the rate constants used in this sub-model were obtained from previous models of replication initiation. The site and state cooperativity constants were fit according to the cell cycle length. The site and state cooperatively constants directly affect the time required for replication initiation. The total cell cycle length has been experimentally measured, and the time required for Cytokinesis and Chromosome Replication can be estimated from their sub-models. By Equation 13,
  • the site and state cooperativity constants were fit such that on average the simulated cell divides in the experimentally measured amount of time, [0348]
  • DnaA rapidly associates with ATP, and in some cases DnaA-ATP can be converted to DnaA-ADP.
  • DnaA-ATP and DnaA-ADP molecules bind to and release from the binding sites on the chromosome.
  • the number of binding/unbinding events that occur in a given period of time is determined by the number of available DnaA-ATP and DnaA-ADP molecules and calculated binding/unbinding rates.
  • DnaA binding sites In addition to the R1- 5 sites at the origin, there are DnaA binding sites all around the chromosome. There are 2227 DnaA binding sites based on the M. genitalium motifs, which are 9 bases long, A high affinity site is defined as one that exactly matches the motif or its reverse complement (e.g., 148 sites for M. genitalium). A medium affinity motif is defined as one that matches 8 of the 9 bases in the motif or its reverse complement (e.g., 2079 sites for M. genitalium); a low affinity site is one that only matches 7 of 9 bases.
  • the replication initiation sub-model allows binding to all of the high and medium affinity sites outside of the oriC region.
  • the sub-model recognizes 5 DnaA binding sites in the OriC region: one high affinity (R4), three medium affinity (R1-R3), and one low affinity (R5), to mimic E. coif s 5 R-sites and the 5 R- sites predicted for M. genitalium. These sites reside between the genes MG469 and MG470 (bases: 578581-579224). There are 9 high and medium affinity motifs in this region, but the sub-model only recognizes 4 to match the E. coli pattern, and therefore ignore the sites at genome positions 578837, 578855, 578881, 578966, and 579139. R5 matches 7 of 9 bases of the motif, so it is a very weak binder of DnaA.
  • the sub-model binds this site after a 28-mer DnaA-ATP initiator complex at R1-R4 is formed, and the R5 binding triggers initiation. This is the only- low affinity binding site included in the sub-model,
  • the sub-model recognizes 7 states at each binding site.
  • the 28-mer initiator complex consists of 7 DnaA-ATP molecules bound to each of the R1-R4 sites. This binding is split up into 7 serial states, and in each state one additional DnaA-ATP molecule binds to each of the R1-R4 sites. All four sites must be bound before transition to the next state.
  • DnaA-ATP binding at the origin is cooperative. There are two forms of
  • FIG. 28B and FIG. 28C are block diagrams that illustrate binding state cooperativity for an example DNA Replication
  • the first type of cooperativity is state cooperativity, which helps increase the probability of transitioning into higher states. Once the binding within a state is completed, the cooperativity to transition into the next state is calculated as
  • Cstate is a fixed cooperativity constant.
  • the second type of cooperativity is site cooperativity, which describes the effects of the binding of one or more of the R 1-R.4 sites, on the probability of binding the other sites.
  • the sice cooperativity factor of a given site is 1 unless certain other R-sites are already bound, in which case the cooperativity factor changes to a variable Csite, R4 is a high affinity site, so it generally is the first to bind and has the highest cooperative effect on the other sites.
  • the site cooperativity rules can be seen in FIG. 28C.
  • R5 is also bound by cooperativity, given the presence of a complete 28-mer DnaA- ATP initiator complex at the R1-R4 sites.
  • the resulting DnaA-ATP29-mer is the trigger to begin the Replication of the chromosome.
  • the DnaA- ATP complex at the origin dissociates and hydrolyzes. DnaA molecules can continue to bind around the chromosome.
  • the formation of a second complete initiation complex is rare because the time left in the cell cycle is generally less than that needed for an initiation event and because DnaA molecules are further titrated with a second set of binding sites on the second chromosome.
  • the sub-model does not include a second DNA replication event.
  • FIG. 28D is a list of cell state variables and other sub-models interacting with the DNA Replication Initiation sub-model, according to an embodiment. State classes interacting with the DNA Replication Initiation sub-model are listed in FIG. 28D.
  • DnaA activation Deterministically form DnaA- ATP complexes up to the limit of available DnaA and ATP. The kinetics of DnaA activation are not known, and are not modeled in this embodiment.
  • Number of activation events min ⁇ Number of available ATP molecules
  • DnaA- ATP complex dissociation.
  • a complex of multiple bound DnaA-ATPs is only found near the oriC. This complex can be released from the chromosome by the DNA polymerase at the start of DNA replication. The resulting DnaA-ADP molecules are able to re-bind to the chromosome or reactivate to DnaA- ATP. If 29-mer DnaA-ATP complex is disrupted by the replisome machinery, then the following steps are taken: Dissociate the complex to individual DnaA-ATP molecules; Hydrolyze the ATP to form free DnaA-ADP molecules; Decrement the counts of DnaA-ATP complexes and H20 molecules; Increment
  • Binding DnaA-ATP and DnaA-ADP Up to 7 DnaA-ATPs can polymerize at each of the origin sites and only one DnaA-ATP can bind at each site outside of the origin. Only one DnaA-ADP can bind at each site. If the chromosomes are sufficiently supercoiled, then do the following: Stochastically select both high and low affinity sites at the origin and around the chromosome to bind DnaA-ATP and DnaA-ADP molecules. Binding proceeds such that the rate of binding each site is as follows (where the cooperativity factor, depends on the polymerization status of the R1-R4 boxes):
  • State Cooperativity Factor C state (State-1), (14f) Calculate the Site Cooperativity. 'The Site Cooperativity Factor for a given site is 1 or Csite depending on the occupancy of the other R-sites, Assign the total Cooperativity Factor for sites Rl, R2, and R3 as their respective Site Cooperativity Factors. Assign the total
  • rate of DnaA-ATP displacement kdlATP (14g)
  • rate of DnaA-ADP displacement kdlADP ( 14h)
  • membraneConcentration 14i
  • membrane concentration is the grams of membrane in the cell divided by the volume of the cell.
  • Replication-dependent bound DnaA-ATP release include a global term for the effect of active beta-clamps (a part of the DNA polymerase machinery) on bound DnaA-ATP inactivation. If the model predicts the exact position of the active beta-clamps on the chromosome, a global term is not needed; instead, model the release of a DnaA-ATP molecule exactly when the beta-clamp encounters it on the DNA. 'The free DnaA- ATP in the cytosol is not distinguished from the DnaA-ATP that has been recently released by a beta- clamp. Therefore, the release is modeled in the form of DnaA- ATP and not the hydrolysis to DnaA-ADP. This beta-clamp-dependent release is handled by the DNA Replication submodel described above and Chromosome state class.
  • Transcription factors are one of many mechanisms cells employ to respond to external signals and maintain homeostasis. Transcription factors regulate the synthesis rates of RNA by modulating the affinity of RNA polymerase for promoters. Transcriptional enhancers stabilize RNA polymerase-promoter complexes by contributing negative tree energy to the complex, for example by providing additional surfaces for RNA polymerase binding. Transcriptional repressors destabilize RNA polymerase-promoter complexes, for example by sterically blocking promoters.
  • the Transcriptional Regulation sub-model models the binding of transcriptional regulators to promoters and the fold-change effect of transcriptional regulators on RNA polymerase-promoter binding for individual promoters. Because transcript onal regulation is not always well characterized, the sub-model makes several simplifying assumptions. First, it is assumed that transcriptional reguIator-DNA binding is kinetically fast and energetically favorable, and therefore proceeds to completion within the simulation time step. Second, because transcriptional regulator-promoter affinities have not been systematically characterized, it is assumed that transcript onal regulators bind promoters with affinity proportional to their fold change effect. Third, it is assumed that only one copy of each transcriptional regulator can bind at each promoter. Fourth, it is assumed that transcriptional regulators stably bind DNA. Consequently, transcriptional regulator dissociation is ignored except for displacement by other DNA-binding proteins, which is modeled by the
  • Transcription sub-model and Replication sub-model It is also assumed that transcriptional regulators independently affect RNA polymerase, and thus their fold change effects add multiplicatively.
  • the Transcriptional Regulation sub-model initializes the status— free or DNA-bound-— and chromosomal location of the transcriptional regulators to a steady- state of the transcriptional regulatory network by iteratively evaluating sub-model results until convergence. Because the transcriptional regulatory model assumes that transcriptional regulator-promoter binding proceeds to completion within the simulation time step and is stable, the sub-model converges in one iteration.
  • FIG. 29A is a flowchart that illustrates an example method for a Transcriptional Regulation sub-model, according to an embodiment.
  • parameters are set for the sub-model, including identifying the transcription regulation enzymes e for the organism, e.g., by the protein monomer species or protein complex species or copy identifier.
  • For each transcription regulation enzyme e its binding site Xep on promoter p, as well as its motif, affinity, and fold-change effect Fep is also retrieved.
  • step 2903 the next promoter p expressed in chromosome k is selected; and, in step 2905, the count CNe of that enzyme in the cytosol compartment is retrieve and used with Fep to compute a transcription rate Repk for transcription of the gene at promoter p on chromosome k by enzyme e. This step is repeated until there are no more promoters. Then control passes to step 2910.
  • step 2910 enzymes are bound to the chromosome in proportion to the rates Repk.
  • Step 2910 includes steps 2911, 2913 and 2915.
  • step 2911 a random value for enzyme e, promoter p and chromosome k is selected from a multinomial distribution, with relative probabilities based on the rates Repk.
  • step 2920 the fold change effect ⁇ ? of the bound enzymes on the affinity of RNA polymerase for the promoter p is determined
  • Step 2920 includes steps 2921, 2923 and 2925.
  • step 2923 the next promoter bound to an enzyme is selected. If none are left, then the method ends. Otherwise the fold change effect of all enzymes on the promoter is accumulated multiplicatively— fp is updated with the product of fp and Fep.
  • the Protein Monomer and Protein Complex states represent the free and DNA- bound copy numbers of each transcriptional regulator.
  • the Chromosome state represents the exact chromosomal location of each DNA-bound transcriptional regulator.
  • FIG. 29B is a list of instructions that illustrates an example Transcriptional Regulation sub-model, according to an embodiment. These instructions re-use different variable names and indices, than used in FIG. 29A.
  • M. genitalium For application to M. genitalium, the applicable transcriptional regulatory network was reconstructed based on an extensive review of the primary literature and the proteomic database DBTBS. M. genitali m has few homologs to reported transcription factors. The reconstructed network contains five regulators which regulate 54 genes through 29 regulatory interactions, including one regulator (Spx, MG 127) which interacts directly with RNA polymerase. The five reconstructed transcriptional regulators and the binding site, motif, affinity, and fold-change effect of each regulatory interaction were curated.
  • RNA polymerase is the first step in the synthesis of functional gene products where RNA polymerase and several accessory enzymes translate transcription units (regions of the DNA containing 1 or more genes) into RNA molecules.
  • An RNA polymerase can bind on and off of the DNA either specifically to a gene promoter or non-speeifieaily to a non- promoter site. Transcription begins with the recruitment of RN A polymerase to a gene promoter with the help of a sigma initiation factor and possibly transcription factors. Next elongation factors are recruited, RN A begins to be polymerized, and the sigma factor is released. In this stage, the RNA polymerase is said to be in the actively transcribing state. Finally the RN A polymerase reaches a terminator at the end of the transcription unit, and with the help of termination factors releases the polymerized RNA and dissociates from the DNA.
  • the niRNA transcripts may be bound by ribosomes and translated to polypeptides.
  • the transcription sub-model does not represent this phenomenon, allowing translation only of completed mRNAs.
  • tRNA expression estimates analogous to the mRNA microarray data are useful. Since no microarray data is available for many organisms, cell composition data was used to approximate tRNA expression. The total tRNA expression is approximated as the total mRNA expression multiplied by the ratio of the tRNA to mRNA weight fractions of the cell. To approximate the expression of each individual tRNA, measurements of the relative abundances of each amino acid in the cell were used. A tRNA expression is the total tRNA expression multiplied by the fraction of the total amino acid weight represented by its amino acid.
  • alanine accounts for about 8% of the total amino acid mass. Then the expression of tRNA for alanine, MG471, is 8% of the total tRNA expression. In cases of degeneracy, where multiple tRNAs bind to the same amino acid, the amino acid weight fractions is split evenly between the tRNAs.
  • the total rRNA expression is calculated as the total mRNA expression multiplied by the ratio of the rRNA to mRNA weight fractions of the cell. This total rRNA expression is split between the rRNA species based on their relative abundances in the cell. Also sRNA expression is calculated in the same way, such that expression is proportional to R.NA abundance in the cell.
  • RNAi The rate of change of the concentration of an RNA species i, designated RNAi, is its synthesis net its degradation, as given by Equations 15a.
  • dIRNAil/dt ksynth, i - kdeg, i ⁇ RNAiU (15a) where ksynth is the rate of synthesis and kdeg is the rate of degradation.
  • the degradation rate is described by the first-order degradation constant of RNAi with half-life hi given by Equation 15b.
  • IRNA/I IRNAilo exp( ln(2) t / ⁇ ) (15f)
  • dIRNAil/dt ksynth, i, 0 exp( ln(2) t / ⁇ ) - kdeg, i IRNAil 0 exp( ln(2) t / ⁇ ) ( 15g) Assuming that mother and daughter cells are identically distributed:
  • the probability of an R.NA polymerase binding a given transcription unit is based on the synthesis rate of transcription unit i relative to the synthesis rate of all the other transcription units:
  • the probability of binding can be derived from the half-lives and the relative concentrations Ei of RNA.
  • RNA polymerase may exist in any of the four following states: 1. Free (FS, not bound to a chromosome); 2. Non-specifically bound somewhere on a chromosome (NSB); 3.
  • a transcription unit SB
  • AT Actively transcribing a transcription unit
  • Transition probabilities between these states are designed to maintain the occupancy of each state within a narrow window around their expected values.
  • the transition probabilities (Ptrans) are determined by four logistic control functions, tuned by the RNA polymerase state expectations.
  • RNA polymerases are in the free state. From the free state, a polymerase can transition to the non-specifically bound state, transition to the specifically bound state, or remain in the free state. From the non-specifically bound state, a polymerase can transition to the free state, transition to the specifically bound state, or remain in the non-specifically bound state, A random position on a chromosome for the non- specifically bound polymerase is selected from the polymerase accessible sites as determined by the Chromosome state. From the specifically bound state, a polymerase can transition to the free state, non-specifically bound state, or actively transcribing state, or remain in the specifical ly bound state.
  • a polymerase can only transition into the specifically bound state if a free sigma factor is available. Upon transition a polymerase into this state, a transcription unit to bind to is picked randomly. The probability of binding each transcription unit is based on experimentally measured gene expression and half-life data, as derived above. Once in the actively transcribing state, a polymerase will remain in the actively transcribing state until transcription is terminated, the RNA polymerase is displaced by another protein, or the RNA polymerase falls off due to a stall. In all these cases, the RNA polymerase falls off into the free state.
  • any bound sigma factors are released and the transcript is elongated according to nucleic acid limits (given the transcript sequence) and the elongation rate kel.
  • the transcript and polymerase are released if transcription is complete and the necessary termination factors are avai lable.
  • RNA polymerases are initialized as follows: Each RNA polymerase is randomly assigned (with replacement) to one of the actively transcribing, specifically bound, non-specifically bound, or free states weighted by the expected occupancy of each state. Actively transcribing and specifically bound polymerases are randomly assigned to transcription units weighted by the transcription unit binding probabilities (Ptu). Each transcription unit to which one or more actively transcribing polymerases have been assigned is divided into 1 segment for each polymerase. Actively transcribing polymerases are randomly assigned to positions within the assigned segment of their assigned transcription unit (positions near the segment border are not allowed to prevent polymerases from being too close to each other) with uniform probability. Non-specifically bound polymerases are randomly assigned to an accessible region on the chromosome.
  • FIG. 30A and FIG. 30B are flowcharts that illustrate an example method 3000 for a Transcription sub-model, according to an embodiment.
  • step 3000 parameters are set for the Transcription sub-model, including the allowed states and probabilities of transitioning form one state to the next.
  • Markov chain parameters are set to preserve observed
  • RNA species and states Protein monomers and complexes that serve as transcription enzymes are identified, including at least one each of sigma factors, elongation factors, termination factors and anti-termination factors.
  • the RNA polymerase elongation rate kel is also set for the organism.
  • step 3003 and 3005 the FS state is increment upon the occurrence of a new RNA polymerase.
  • step 3011 a Markov model weighted by the transition probabilities is used to change counts in each state.
  • step 3013 respond to increase in NSB by updating
  • chromosome state for a random binding site.
  • step 3015 a specific transcription unit and promoter p are determined based on transcription unit binding probabilities Ptu and binding probability fold changes ⁇ ? for each promoter. If promoter o is accessible and there is a sigma factor determined in step 3021, then in step 3023 update the chromosome stat to bind RNA polymerase at the promoter and in step 3025, decrement copy count for free sigma factor.
  • next AT polymerase with elongation factor is selected in step 3031, and a sigma factor is released in step 3035 if elongation is in its first time step. If the next position on DNA is accessible and the needed RNA base is available and the elongation rate is not exceeded, as determined in step 3041, then in step 3043 an RNA base is added to the transcript. If there is another non-limited AT polymerase, as determined in step 3045, then control returns to step 3031 to get the next AT polymerase. If not, then it is determined in step 3051 whether transcription by an RNA polymerase is completed and termination factors are available. If not, the sub-model ends for the current time step. If so, then in step 3053, the RNA transcript is released and its count incremented; and the RNA is released to the free state and the counts of AT and FS states are adjusted.
  • FIG. 30C is a list of cell state variables and other sub-models interacting with the Transcription sub-model, according to an embodiment.
  • Rho is not essential in B. subtilis (gram positive) or M. genitalium, and therefore only Rho-independent. termination was included in the M. genitalium specific transcription sub-model. Rho- independent termination occurs via the intrinsic properties of RNA which disrupt RNA polymerase-DN A bindin g.
  • the microarray data was presented in normalized log form. Since the exact method of normalization was unknown to us, to re-derive expression levels the value 2 ⁇ ⁇ presented value) was calculated.
  • the M. genitalium genome is contained in the larger M. pneumonie genome and we were able to map a M. pneumonia gene to all but six M. genitalium genes. The genes were mapped based on matching gene names, matching gene descriptions, and Bio-cyc's Align Multi-Genome Browser.
  • the half-lives of mRNA for M. genitalium are obtained from measurements in E. coli performed at 30 C in M9 minimal media. Additional sets of E. coli half-life data are available, but the set was chosen with the most comprehensive list of genes mapping to homologous M. genitalium genes, and whose average half-life was closest to the reported bulk E. coli mRNA half-life.
  • the gene mapping between E. coli and M. genitalium was based on common gene names and annotations, Bio-Cyc's Align Multi-Genome Browser, and UniProt. A half-life estimate for 274 M. genitalium genes was thus obtained. For the remaining genes, the average half-life, 4.425 minutes, was assigned. Separate sources were used to acquire the half-lives of tRNAs (45 minutes), rRNAs (150 minutes), and sRNAs (89 minutes,
  • the transcription unit structure (sets of genes that are transcribed together following a single promoter binding event) was compiled from several sources including primary reports and databases of co-transcribed genes, and a computational model that predicts promoter and transcription unit start sites. Transcription units were also predicted using information on the conservation of gene order across multiple species, the related functions of adjacent genes, the similar expression levels of adjacent genes as measured by
  • microarrays and the strandedness (transcription direction) of adjacent genes.
  • the 525 M. genitalium genes were transcribed in 355 transcription units. 294 genes fall in transcription units of more than one gene. The longest transcription unit contains 4 genes.
  • FIG. 30D is a list of parameters and values for a Transcription sub-model, according to an embodiment. At each time step, the sub-model for M. genitalium performs the following steps.
  • RNA polymerase specific binding can only occur if there is an available sigma factor, and a free sigma factor is accordingly decremented,
  • RNA polymerization by actively transcribing RN A polymerases with the aid of elongation factors. (a) Release the sigma factor if this is the first second of elongation, and increment the free sigma factor count,
  • RNA Processing submodel models operonic RNA cleavage into individual RNA gene products.
  • RNA processing sub-model models the cleavage of operonic non-coding RNA into individual gene products and intercistronic RNA fragments. Because the kinetics of RNA cleavage are not well characterized, this process makes several simplifying assumptions. First, this process assumes that each RNA is fully cleaved in a single time step. Consequently, this process collapses the cleavage of each RNA into a single reaction, and only represents uncleaved and fully cleaved RNA. Intermediate cleavage configurations are not represented.
  • Mj ' i is the stoichiometry of metabolite j in the processing of RNA species i including NTP
  • M max(0,-M) is the negative part of M
  • Kji is the experimentally observed catalytic rate of enzyme j in the processing of RN A species i
  • At is the simulation time step.
  • the RNA Processing sub-model implements a stochastic model of the arrival of RNA processing events with relative rates vi. Until RNA, metabolic, and/or enzymatic resources are exhausted, the model iteratively (1) computes the arrival rate, vi, of each processing event, (2) selects a single processing event to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected processing reaction, updating the copy numbers of RNA and metabolites and decrementing the available enzymatic capacity.
  • the RNA state initializes the total copy count of each RNA species and initializes all RNA to their mature -processed and modified— conf guration
  • the tRNA Aminoacylation sub-model initializes tRNA to the aminoacylated configuration.
  • the Macromolecular Complexation and Ribosome Assembly sub-models initialize the copy count of each ribonucleoprotein complex.
  • the Translation sub-model initializes the mRNA codon location of each ribosome.
  • the expression of the RNA processing enzymes was fit to provide sufficient enzymes to quickly process RNA, or more specifically to prevent sustained accumulation of unprocessed RNA.
  • FIG. 31 A is a flowchart that illustrates an example method 3100 for a RNA
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is r for RNA processing reactions.
  • the protein monomers and complexes that serve as enzymes for RNA processing reactions are identified as is the stoichiometry Mmr of metabolites in the reactions, and catalytic rate Ker of enzymes in the reaction.
  • the enzyme capacity Ce during a single time step is set to the count of that enzyme.
  • Vr a Vr has been calculated for every unprocessed RNA species
  • step 3123 the RNA processing reaction is run to process one copy of
  • RNA species r unprocessed RNA species r.
  • the copy count of the RNA is updated (unprocessed RNA count is decremented and processed RNA count incremented).
  • the metabolite counts for metabolites m are decremented by Mmr and the enzyme capacity Ce is also decremented. Control passes back to step 3111 to recomputed Vr for all r based on the new counts.
  • the RNA state variable represents the copy numbers of unprocessed, processed, and intercistronic RNA.
  • the Metabolite state variable represents the copy number of each intracellular metabolite.
  • the Protein Monomer and Protein Complex state variables represent the copy number of each RNA processing enzyme.
  • RNA are synthesized and matured in four steps. This process models the cleavage of RNA transcripts produced by the Transcription sub-model.
  • the RNA Modification sub-model models the modification of transcripts cleaved by this process.
  • the Macromolecular Complexation and Ribosome Assembly sub-models model the formation of macromolecular complexes, including the ribosomal sub-units.
  • the Translation sub-model models the function of m-, r ⁇ , s-, and tRNA in translation.
  • the RNA Decay sub-model models the degradation of cleaved intercistronic RNA fragments.
  • genitalium chromosome was reconstructed by mapping the "suboperon" structure of M. pneumoniae chromosome on to that of M. genitalium with two modifications.
  • the in silico M. genitalium chromosome is organized into 335 transcription units containing 525 genes.
  • Bacteria also transcribe 3' and 5' leader sequences before and after each non- coding RNA gene. These leader sequences must be cleaved to produce functional non-coding RNA.
  • M. genitalium mRNA cleavage is not well described and is not modeled. The model assumes M. genitalium polycistronic mRNA are not cleaved.
  • E. coli rRNA have been shown to be transcribed as a single 30S transcript which is cleaved into individual 5S, 16S, and 23S rRNA transcripts by the action of several ribonucleases. The E. coli rRNA cleavage scheme was adapted and simplified for the reduced ribonuclease complement of M.
  • ribonuclease III hydrolytically cleaves the 30S rRNA into 5S, 16S, and 23S rRNA precursors.
  • hydrolytic ribonuclease J MG139
  • phosphorolytic ribonucleases RsgA phosphorolytic ribonucleases
  • M. genitalium doesn't contain a homolog of the E. coli 9S RNA cleavage enzyme ribonuclease E.
  • E. The mechanisms of 3' and 5' cleavage of the M. genitalium 5S rRNA precursor are unknown.
  • 3' and 5' cleavage of the 5S rRNA precursor are modeled as a spontaneous process.
  • the reconstructed M. genitalium small non-coding RNA (sRNA) precursor cleavage was based on that of E. coli.
  • ffs (MGOOOl) is cleaved at both its 3' and 5' ends by ribonuclease III.
  • Ribonuclease P cleaves the 5' end of rpnB (MG0003) and ssrA (MG0004).
  • the reconstructed M. genitalium tRNA precursor cleavage scheme was based on that of E. coli.
  • Ribonucleases III (MG367) and P (MG0003, MG465) hydrolytically cleave the 3' and 5' ends of each pre-tRNA, removing the 3' and 5' leader regions and intercistronic regions to produce individual tRNA.
  • the stoichiometry, kinetics, and energetics of each non-coding RNA cleavage reaction were reconstructed based on extensive review of the primary literature.
  • FIG. 3 IB is a list of instructions that illustrates an example RNA Processing submodel, according to an embodiment for M. genitalium.
  • Post-transcriptional base modification occurs in order to degenerately encode amino acid triplet codes using far fewer tRNA species than possible using only Watson-Crick base pairing. Modification of wobble position is believed to be most important for improving codon recognition. Modifications distant from the anticodon are believed to help tRNAs properly fold and stabilize their catalytically active structures. Many cells also post- transcriptionally modify rRNA. rRNA modifications are believed to help stabilize rRNA, participate in protein synthesis, and confer resistance to ribosomal inhibitors. The exact role of rRNA modification is unknown. The RNA Modification sub-model models tRNA and rRNA modifications.
  • RNA Modification sub-model models non-coding RNA modification. Because the kinetics of RNA modification are not well characterized, this sub-model makes several simplifying assumptions. First, this sub-model assumes that each RNA is fully modified in a single time step. Consequently, this process collapses the modification of each RNA into a single reaction, and only represents unmodified and fully modified RNA. Intermediate modification configurations are not represented. Second, this sub-model assumes that the mean arrival rate, vi of modification events of each RNA species i is independently limited by ( 1 ) the copy count of unmodified RNA, r, (2) the copy counts of intracellular metabolites, mj, and (3) the copy counts of RNA modification enzymes, ej. Based on these assumptions, the functional form of vi is given by Equation 17, which parallels Equation 16.
  • Mji is the stoichiometry of metabolite j in the modification of RNA species i
  • M max(0,---M) is the negative part of M
  • Kji is the experimentally observed catalytic rate of enzyme j in the modification of RNA species i
  • At is the simulation time step.
  • This sub-model implements a stochastic model of the arrival of RNA modification events with relative rates vi. Until RN A, metabolic, and/or enzymatic resources are exhausted, the model iterative! y (1) computes the arrival rate, vi, of each modification event, (2) selects a single modification to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected modification reaction, updating the copy numbers of RN A and metabolites and decrementing the available enzymatic capacity.
  • FIG. 32A is a flowchart that illustrates an example method 3200 for a RNA Modification sub-model, according to an embodiment.
  • This flow chart uses the notation of FIG. 31A that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is r for RNA processing reactions, and follows the same flow, with the only change being the steps in FIG. 32A applies to RNA modification instead of RNA processing. For brevity these steps will not be narrated here.
  • the RNA state represents the unmodified and modified copy numbers of each RNA species.
  • the Metabolite state represents the copy number of each intracellular metabolite.
  • the Protein Monomer and Protein Complex states represent the copy number of each RNA modification enzyme.
  • RNA are synthesized and matured in four steps. This sub-model models the modification of individual non-coding RNA following RNA.
  • the RNA state represents the unmodified and modified copy numbers of each RNA species.
  • the Metabolite state
  • Macromolecu!ar Complexation and Ribosonie Assembly p sub-models model the formation of macromolecular complexes, including the 30S and 50S ribosomal particles.
  • Translation sub-model models the function of ITS-, r-, s-, and tRNA in translation.
  • the RNA initial conditions were described above with respect to the RNA Processing sub-model.
  • tRNA modification complement was reconstructed based on the observed complement of E. coli modifications.
  • E. coli modification complement was reconstructed.
  • modifications situated in conserved motifs were transferred to M. genitalium.
  • the M. genitalium rRNA modification complement was similarly reconstructed. The stoichiometry and kinetics of each RNA modification reaction were reconstructed based on an extensive review of the literature.
  • FIG. 32B is a list of instructions that illustrates an example RNA Modification submodel, according to an embodiment.
  • tRNA Aminoacylation sub-model simulates the conjugation of amino acids to the tRNAs.
  • tRNAs serve as mediators between the ribosonie and the amino acids which form polypeptides.
  • tRNAs are composed of short RNA sequences which recognize specific codons (triplets of bases) on mRNAs. Each tRNA binds to a specific amino acid, and then interacts with a ribosome to deliver this amino acid to an elongating polypeptide chain, according to the mRN.A code.
  • tmRNAs are short RN A structures that add a proteolysis tag to the end of incomplete polypeptides upon ribosomal stalling, signaling the polypeptides for degradation.
  • the tRNA Aminoacylation sub-model also simulates the aminoacylation of the taiRNA which delivers the amino acid alanine to stalled ribosomes. These aminoacylation reactions are both enzyme and energy dependent.
  • the tR NA Aminoacylation sub-model maximizes the number of aminoacylation reactions (tRNA aminoacylations, tmRNA ammoacylations, and tRNA modifications) up to the limits of available RNAs, enzymes, and metabolites.
  • the enzymatic bounds are calculated from the catalytic turnover constant (kcat) of each enzyme.
  • the required metabolites include, among others, amino acids and ATP.
  • reactions to occur within a given time step are randomly selected using a probability distribution that is weighted by the abundances of the reaction requirements. That is, the limits of each reaction are calculated assuming that all of the available required resources would be allocated to the given reaction. The probability distribution for selecting given reactions is weighted by these calculated limits. Reactions are performed one by one until insufficient resources exist to perform any additional reactions. Intermediate steps in the aminoacylation of tRNAs are not represented.
  • the tRNA Aminoacylation sub-model reads from and writes to the RNA state class, it reads in the counts of free and aminoacylated tRN As and tmRNA, and writes back the updated counts. All of the tRNAs are initialized to an aminoacylated state.
  • FIG. 33A is a flowchart that illustrates an example method 3300 for an RNA Aminoacylation sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is r for tRNA aminoacylation reactions.
  • the protein monomers and complexes that serve as enzymes e for tRNA aminoacylation reactions are identified as is the stoichiometry Mmr of metabolites in the reactions, and catalytic rate Ker of enzymes in the reaction.
  • the enzyme capacity Ce during a single time step is set to the count of that enzyme, Ne.
  • control passes to step 3321 to randomly select a reaction r to implement using a multinomial distribution with probabilities based on the reaction limits RLr.
  • step 3323 the RNA aminoacylation reaction r is run to process one copy of the tRNA or tmRNA species r.
  • the copy count of the RNA is updated (free tRNA count or free tmRNA count is decremented and aminoacylated RNA count incremented).
  • the metabolite counts for metabolites m are decremented by Mmr and the enzyme capacity Ce is also decremented. Control passes back to step 3311 to recomputed RLr for all r based on the new counts.
  • M. genitalium is believed to have 36 tRNA aminoacylation reactions and 1 tmRNA aminoacylation reaction, and all of the reactions require 1 ATP. There are also two tRNA modification reactions that must occur. Since there is no glutaminyl tRNA synthetase (to add a glutamine to a tRNA), glutamyl- tRNA synthetase first adds a glutamate to tRNA MG502, and then Glu-tRNA Gin amido transferase converts the glutamate into a glutamine. Also, a methionylforaiyl transferase is used to add the formyl group onto the methionine on tRNA MG488 (formylmetbio ine is used as the start codon).
  • FIG. 33B is a list of parameters for an RNA Aminoacylation sub-model, according to an embodiment. The following list of instructions indicates a method for the example embodiment at each time step. For each of the 39 possible reactions (36 tRNA
  • the sub-model calculates the maximum number of reactions that can occur using Equation 18.
  • RNAs In presence of ribonucleases such as ribonuclease R, RNAs have relatively short half-lives compared to that of other macromolecules (e.g., protein, DNA) and the mass doubling time.
  • the relatively short half-lives of RNAs enables the small cells with very small pool of RNAs and particularly few mRNAs to sample a broader range of configurations of the RNA pool over a shorter period that would be possible with longer half-lives. This helps the cell more finely tune the expression of proteins, more efficiently execute cell-cycle dependent events, and respond to the external environment. However, this enhanced fitness due to short RNA half-lives comes at a large energetic cost.
  • aminoacylated RNAs require peptidyl tRNA hydrolase to release their conjugated amino acids. This sub-model decays all species of R.NA, and at all maturation states including aminoacvlated states.
  • RNAs are randomly selected for degradation by a Poisson probability distribution based on the RNA half-life.
  • This submodel contains no intermediate representation of RNA degradation. RNA degradation is treated as an all-or-nothing event that proceeds to completion in a single time step. Upon degradation, water is used to break the nucleotide -nucleotide bonds and the nucleotides are recycled. All aborted transcripts (due to stalled RNA polymerases or RNA polymerases that have been knocked off of the DNA) are also degraded by this process. No initialization steps are required for this sub-model.
  • FIG. 34A is a flowchart that illustrates an example method 3400 for an RNA Decay sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is r for decay reactions of RNA species r.
  • the protein monomers and complexes that serve as enzymes e for RNA decay reactions are identified as is the stoichiometry Mmr of metabolites in the reactions, and catalytic rate Ker of enzyme e in the reaction r.
  • steps 3421 values of iVr, the copy count for RNA species r, kdecay r and reactants and products of reaction r from one or more cell state variables are retrieved.
  • step 3431 and 3433 the next RNA species is selected that has associated decay enzymes, if any. If none, the method ends. Otherwise step 3435 is executed to determine if a copy of that RNA species is aborted. If so, the aborted RNA copy is slated for decay in step 3441, described below. Otherwise, the next RNA species is selected at random in step 3437. In step 3437 and 3439, it is determined whether a Poisson distribution with rate parameter equal to Nr*kdecay r provides a random number X > 1. If not, no copy of that species is decayed or degraded and control passes back to step 3431 and following to select the next RNA species.
  • Step 3441 includes decrement metabolite copy count Nm by Mmr for all m; increment copy count of produced metabolites; decrement copy count of RNA species r; decrement number of decay events DR (and DT if species is tRNA).
  • FIG. 34B is a list of cell state variables and other sub-models interacting with the RNA Decay sub-model, according to an embodiment.
  • RNA decay is modeled as requiring ribonuciease R (MG104). 'The decay of tRNAs also requires peptidyl tRNA hydrolase (Pth: MG083). Half-lives of all the RNA species are largely based on
  • E. coli genes are mapped to the M. genitalium genes by homology.
  • FIG. 34C is a list of parameters and values for an RNA Decay sub-model, according to an embodiment applied in the M. genitalium example. For each time step, do the following.
  • ribonuciease R has a high rate of activity, so as long as NR > 0, RNA decay can occur
  • RNAi kdecay i (19) where RNAi is the count of a given RNA species i
  • Translation is the process whereby the ribosome, accessory enzymes, and tRNAs transcode mRNAs and produce amino acid polymers, also called polypeptides, which when completed form protein monomers.
  • 'Translation begins with the recruitment of the 30S and SOS ribosomal particles (also called subunits) and an initiation factor (e.g., initiation factor 3, IF3) to an mRNA molecule.
  • the ribosomal particles assemble into a ribosome (e.g., 70S ribosome) on the mRNA molecule with the help of initiation factors.
  • the ribosome polymerizes amino acids presented by aminoacylated tRNAs with the help of elongation factors.
  • a release factor recognizes the stop codon UAG or UAA, hydrolyzes the peptidyl tRNA bond, and dissociates.
  • a ribosome recycling factor dissociates the E-site tRNA.
  • an eloneation factor G releases the release factor and 50S ribosome, and an initiation factor 3 dissociates the 30S ribosome, P-site tRNA, and mRNA,
  • a ribosome may stall for various reasons including collisions with other proteins and limited resources required of translation.
  • a ribosome stalls its last tRNA is expelled and replaced by the tRNA-like domain of a tmRNA molecule.
  • the mRNA-like domain of the tmRNA expels the bound mRNA.
  • the translation sub-model simulates protein translation by ribosomes and accessory initiation, elongation, and termination factors. Ribosomes transition from a free to active state if the necessary factors are present, and bind to an mRNA. The selection of a specific mRNA to bind to a ribosome is random and weighted by mRNA abundances. As the sequence of each gene is known, the codons presented by the mRN As are translated using aminoacylated tRN As and elongation factors at a rate of up to 16 amino acids per second (measured by radioactive labeling in E. coli). Once a stop codon has been reached, translation is terminated and the polypeptide will undergo further modifications by other processes in the whole model simulation.
  • the sub-model also simulates the identification of stalled ribosomes by the tmRNA, the replacement of the tRNA and mRNA with the tmRNA, and the synthesis of the proteolysis tag encoded by the tmRNA' s mRNA-like domain.
  • the simulation begins in a state in which ribosomes are already bound to mRNAs and in the process of elongating. Each ribosome is randomly assigned (without replacement) to an mRNA species, weighted by the current expression of the mRNAs. Then, each ribosome is assigned to positions within the assigned mRNA with uniform probability. No ribosomes are initialized to the stalled state, since the expected probability of stalling is negligible. No tmRNAs are initiated to the bound state.
  • FIG. 35A and FIG. 35B are flowcharts that illustrate an example method 3500 for a Translation sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is r for translation reactions of mRNA species r.
  • the protein monomers and complexes that serve as enzymes e for translation reactions are identified, as is the stoichiometry Mmr of metabolites in the reactions.
  • steps 3503 and 3505 the next free ribosome is selected at random for which there is both an initiation factor and available energy. If none, control passes to step 3521 to select the next bound ribosome. Otherwise, in step 3511 a random mRNA species r is selected based on a multinomial distribution with probability of each species based on the copy count of free members of that species. In step 3513, the free ribosome is bound to a copy of the selected mRNA species r. the free counts of both ribosomes and RNA species r are decremented and their bound counts are incremented. Control passes back to step 3503 to select the next free ribosome.
  • step 3521 and 3523 the next bound ribosome is selected at random for which there is both an elongation factor and needed metabolites. If none, control passes to step 3545 to test for stalling. Otherwise, elongation occurs. In step 3525 it is determined whether the ribosome was first bound in the current time step. If so, then, in step 3631, the initiation factor (IF) is released, and copy counts for the free and bound IF updated; and, in step 3532 a start amino acid, f-methionine, is associated as a first element of the polypeptide to be produced. Control passes back to step 3521 to select the next bound ribosome at random.
  • IF initiation factor
  • step 354 the elongation limit for the current time step is set to the elongation rate kelong for the time step. Control then passes to step 3533 in the loop for adding amino acids one by one to the polypeptide.
  • step 3533 the next amino acid to be added is determined from the mRNA sequence. If the sequence is finished, then the polypeptide-ribosome complex is marked completed; and, next bound ribosome is inspected at step 3521. If the amino acid sequence is not yet finished, then if there are enough resources (elongation factors and metabolites, and aminoacylated tRNA with the correct amino acid, and elongation limit is not exceeded), as determined in step 3541, add the amino acid in step 3543, update the counts for the affected molecules, and return to step 3533 for the next amino acid in the sequence.
  • resources elongation factors and metabolites, and aminoacylated tRNA with the correct amino acid, and elongation limit is not exceeded
  • step 3561 is a list of cell state variables and other sub-models interacting with the Translation sub-model, according to an embodiment.
  • FIG. 35D is a list of parameters and values for a Translation sub-model, according to an embodiment.
  • the example translation sub-model embodiment performs the following steps at each time step.
  • FIG. 35E is a list of instructions that illustrates an example method for initializing ribosomes for a Translation sub-model, according to an embodiment. Each ribosome exists in one of two states, free or actively transcribing.
  • Ribosomes are created in the free state and may transition to the actively transcribing state as follows: If ribosome factor A is present, initiation factors (IF-1, IF-2, and IF-3) are present, and one unit of energy (GTP) is available, then randomly select free ribosomes to initiate up to the limits of ribosome factor A, initiation factors, and GTP. Randomly select mRNA species for each initiating ribosome to bind to, weighted by the counts of each mRNA species. Update the state of each ribosome and mRNA, and the amount of available substrates.
  • initiation factors IF-1, IF-2, and IF-3
  • GTP one unit of energy
  • Elongation Next, the polypeptides are elongated. It is assumed that one of each elongation factor (EF-tu, TS, and G) is sufficient for each ribosome, but a separate set of factors may be useful for each ribosome. Elongation proceeds as follows: If elongation factors (EF-tu, TS, and G), aminoacylated tRNAs, and energy (GTP) are available, then randomly select actively transcribing ribosomes to elongate up to the limits of elongation factors. Available amino acids and energy are allocated among actively translating ribosomes.
  • EF-tu, TS, and G aminoacylated tRNAs
  • GTP energy
  • the ribosome is newly initiated, then release the initiation factors; associate a tRNA for f-methionine to bind the first amino acid, and release a free tRNA. Otherwise, derive the amino acid sequence of the translating gene by converting the genome sequence into a tRNA sequence using the amino acid code; associate aminoacylated tRNAs to bind amino acids to the growing polypeptide up to the aminoacylated tRNA limit, energy limit, and elongation limit, kelong. Release free tRNAs, then update the state of each ribosome, the progress of all actively translating ribosomes, and the amount of available substrates. Note, for cases in which translation of a peptide finishes partway through the time step, the elongation factors are released, but only available at the time step. For simplicity, the transition between P and E sites is not explicitly modeled in this embodiment.
  • Termination Once all amino acids of a protein have been translated, one release factor, one recycling factor, one elongation factor G, and one GTP molecule are sufficient to terminate the polypeptide, as follows: If all release factors, recycling factors, elongation factors G, and energy (GTP) are available, then randomly select completed polypeptide- mRNA-ribosome complexes to dissociate; and update the state of each ribosome and mRNA, monomer counts, and the amount of available substrates. The ribosome will be available to bind mRNA at the following iteration
  • the Protein Processing I sub-model simulates N-terminal formylmethionine deformylation and N-terminal. methionine cleavage, the first step of post-translational processing. Because the mechanisms of early protein processing are not well characterized on the genomic scale, this sub-model makes several simplifying assumptions. First, this process assumes that each protein is both deformylated and cleaved in a single time step. Consequently, this process collapses the processing of each protein into a single reaction, and only represents nascent and fully processed proteins. Intermediate protein configurations are not represented. Second, this sub-model assumes that the mean arrival rate, vi, of processing events of each protein species i is independently limited by (1) the copy number of unprocessed protein, p. , (2) the copy numbers of intracellular metabolites, mj, and (3) the copy numbers of processing enzymes, ej. Based on these assumptions, the functional form of vi is given by Equation
  • the Protein Processing I sub-model implements a stochastic model of the arrival of early protein monomer processing events with relative rates vi. Until, protein, metabolic, and/or enzymatic resources are exhausted, the model iteratively (1) computes the arrival rate, vi, of each processing event, (2) selects a single processing event to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected processing reaction, updating the copy numbers of protein monomers and metabolites and decrementing the available enzymatic capacity.
  • the Protein Monomer and Protein Complex states represent the copy number of each protein processing enzyme.
  • the Protein Monomer state also represents the nascent and processed copy numbers of each protein monomers species.
  • the Metabolite state represents the copv number of each intracellular metabolite. Proteins are synthesized and matured in six steps. This sub-model simulates the first step in post-translational processing following translation (see Translation sub-model) and preceding membrane and extracellular protein translocation and cytosolic protein folding (see Protein Translocation and Protein Folding sub-models).
  • the nascent and processed configurations of each protein monomer species are initialized with zero copy number because the typical occupancy these configurations is small compared to that of the mature configuration.
  • the expression of the protein processing enzymes was fit to provide sufficient enzymes to quickly process proteins, or more specifical ly to prevent sustained accumulation of unprocessed proteins.
  • FIG. 36A is a flowchart that illustrates an example method 3600 for a Protein Processing I sub-model, according to an embodiment.
  • This flowchart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is b for protein processing reactions.
  • a protein not yet processed by processing I is called a nascent protein.
  • the flow is similar to the flow of the stochastic models for RNA processing (FIG. 31A) and RNA modification (FIG. 32A).
  • step 3601 the protein monomers and complexes that serve as enzymes for protein processing I reactions are identified as is the stoichiometry Mmb of metabolites in the reactions, and catalytic rate Keb of enzymes in the reaction.
  • step 3603 the enzyme capacity Ce during a single time step is set to the count of that enzyme.
  • Vb corresponding to vi
  • step 3623 the RNA processing reaction is run to process one copy of nascent protein species b.
  • the copy count of the protein species is updated (nascent protein count is decremented and processed protein count incremented).
  • the metabolite counts for metabolites m are decremented by Mmb and the enzyme capacity Ce is also decremented. Control passes back to step 3611 to recompute Vb for all b based on the new counts.
  • M. genitalium peptide deformylase (MG106) was assumed to deformylate all protein monomers. The protein substrates of M.
  • MG172 genitalium methionine aminopeptidase
  • FIG. 36B is a list of instructions that illustrates an example Protein Processing I sub-model, according to an embodiment for the M. genitalium example.
  • Cell proteins are translated in the cytosol. However, many functionally important proteins including nutrient transporters, ATP synthase, metabolic enzymes, adhesins, receptors, transducers, virulence factors, and the protein translocation machinery itself must be embedded in the cell membrane or secreted in order to properly function. Cells employ short N-terminal and C-terminal signal sequences to selectively translocate proteins. This sub-model simulates membrane and extracellular protein localization, the second step in post-translational processing.
  • the Protein Translocation sub-model simulates integral membrane, lipoprotein, and secreted protein translocation into the cell membrane by a translocation enzyme (such as the Sec A preprotein translocase), the second step of post-translational processing. Proteins axe synthesized and matured in six steps.
  • the Protein Translocation sub-model simulates protein translocation following deformylation and N-terminal methionine cleavage and preceding lipoprotein diacylglyeeryl transfer and lipoprotein and secreted protein signal sequence cleavage. Because the mechanisms of protein translocation are not well characterized on the genomic scale, this sub-model makes several simplifying assumptions, similar to those made for other sub-models.
  • this process decouples translation, N-terminal formyimethionine processing, and translocation by assuming that translocation does not begin until after translation termination and N-terminal formyimethionine processing.
  • this sub-model assumes that each protein is fully translocated in a single time step, Consequently, this process collapses the translocation of each protein into a single reaction, and only represents un translocated and fully translocated proteins, Intermediate protein localizations are not represented.
  • this process assumes that the mean arrival rate, vi, of translocation events of each protein species i is independently limited by (1) the copy number of untranslocated protein, p., (2) the copy numbers of intracellular metabolites, mj, and (3) the copy numbers of the translocation enzymes, ej. Based on these assumptions, the functional form of vi is given by Equation 21,
  • the metabolites for this reaction include ATP and GTP hydrolysis.
  • the Protein Translocation sub-model implements a stochastic model of the arrival of protein monomer translocation events with relative rates vi. Until protein, metabolic, and/or enzymatic resources are exhausted, the model iterative!'- (1) computes the arrival rate, vi, of each translocation event, (2) selects a single translocation event to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected translocation reaction, updating the copy numbers of protein monomers and metabolites and decrementing the available enzymatic capacity.
  • FIG. 37A is a flowchart that illustrates an example method 3700 for a Protein Translocation sub-model, according to an embodiment. This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is b for protein processing reactions. A protein not yet translocated is called an untranslocated protein. The flow is similar to the flow of the stochastic models for protein processing I (FIG. 36A) and will not be repeated here. In step 3713, Equation 21 is evaluated as expressed in the revised notation.
  • M. genitalium employs two convergent SecA-dependent pathways— co-translational and post-translational— to translocate approximately 35-45% of all protein monomers into the plasma membrane.
  • M. genitalium does not contain a type I Tat transporter or sortase.
  • the co-translational SecA pathway translocates integral membrane proteins into the cell membrane.
  • First, GTP-dependent signal recognition particles (SRP; MG0001, MG048) and molecular chaperones co-translationally recognize the type I signal sequence of nascent integral membrane proteins.
  • SRPs deliver nascent proteins to their cognate receptor FtsY (MG297) which is associated with the SecA type II preprotein translocon.
  • the preprotein translocase SecA (MG072) iteratively pushes nascent proteins through the SecYEG (MG170, MG055, MG476) translocase pore by an ATP-dependent, step-wise mechanism.
  • SecDF MG277
  • YidC MG464
  • the M. genitalium post-translational SecA pathway translocates lipoproteins and secretes proteins.
  • Type II signal sequences are short, positively charged N-terminal sequences.
  • Type II signal sequences are composed of three regions: n, h, and c.
  • the n region is composed of 1-5 positively charged amino acids.
  • the h region is composed of 7-15 hydrophobic amino acids and forms an a-helix.
  • the c region is composed of 3-7 polar amino acids and forms a ⁇ -strand.
  • Signal peptidase II cleaves type II signal sequences at lipoboxes distinguished by the sequence L[ASI][GA]C inside the c region. Similarly to the co-translational pathway, SecA iteratively translocates nascent proteins into the plasma membrane.
  • M. genitalium does not have an I signal sequence protease and does not cleave the signal sequences of integral membrane proteins. Integral membrane protein signal peptides are believed to help anchor proteins to the membrane.
  • FIG. 37B is a list of instructions that illustrates an example Protein Translocation sub-model, according to an embodiment for the M. genitalium example.
  • This sub-model addresses the third step of post-translational processing: lipoprotein diacylglyceryl adduction and lipoprotein and secreted protein signal peptide cleavage.
  • the localization and N-terminal signal sequence length of each protein monomer was
  • This sub-model implements a stochastic model of the arrival of protein monomer processing events with relative rates vi. Until protein, metabolic, and/or enzymatic resources are exhausted, the model iteratively (1) computes the arrival rate, vi, of each processing event, (2) selects a single processing event to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected processing reaction, updating the copy numbers of protein monomers and metabolites and decrementing the available enzymatic capacity, The unprocessed and processed configurations of each protein monomer species are initialized with zero copy number because the typical occupancy these configurations is small compared to that of the mature configuration.
  • FIG. 38A is a flowchart that illustrates an example method 3800 for a Protein Processing II sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is b for protein processing reactions.
  • the flow is similar to the flow of the stochastic models for protein processing I (FIG. 36A) and will not be repeated here.
  • Equation 22 is evaluated as expressed in the revised notation.
  • M genitalium lipoproteins are translated in the cytosol and subsequently targeted to the plasma membrane by type II N- terminal signal sequences.
  • M. genitaiium lipoproteins are first anchored to the outer leaflet of the plasma membrane. 'This is achieved via covalent adduction of diacylgiyceryl to the sulfhydryl group of the lipobo cysteine by diacylgiyceryl transferase (MG086). Many bacteria further anchor lipoproteins through phospholipidation.
  • M. genitaiium does not have an apolipoprotein transacylase, and is not believed to phospholipidate lipoproteins.
  • lipoprotein N-termina! signal sequences are cleaved immediately C-terminal to the lipobox cysteine by type II signal peptidase (MG210).
  • Extracellular M. genitaiium proteins are transcribed in the cytosol and targeted to the plasma membrane by type II signal sequences. In contrast to lipoproteins, extracellular proteins do not undergo diacylgiyceryl transfer, and instead are cleaved immediately C- terminal to the cysteines of their lipoboxes by type II signal peptidase, releasing the resultant protein and signal peptide into the extracellular space.
  • M. genitaiium does not have a type I signal sequence protease and does not cleave the signal sequences of integral membrane proteins. Integral membrane protein signal peptides are believed to help anchor proteins to the membrane.
  • FIG. 38B is a list of instructions that illustrates an example Protein Processing II sub-model, according to an embodiment for the M. genitaiium example.
  • Proteins are synthesized as long, catalytically inactive linear chains of amino acids. Subsequently, proteins fold into energetically favorable, compact, and enzymatically competent three-dimensional structures. While some protein species are believed to fold spontaneously, other proteins are believed to require helper chaperone proteins to properly fold. In addition, some protein species require metal ions and other small molecule prosthetic groups to fold. This process models chaperone-mediated protein folding. The prosthetic group requirements for protein folding were reconstructed based on an extensive review of the literature and several databases.
  • the Protein Folding sub-model implements a model of the chaperone-mediated folding of processed, translocated polypeptides. Because the kinetics and energetics of protein folding are not well understood, this process represents the three-dimensional configuration of each protein as a two-state (folded, unfolded) Boolean variable.
  • this process makes the simplifying assumption that the folding rate, ri, of protein species i is a B lites and chaperones.
  • Protein folding is one of the last steps in protein maturation following protein processing and translocation and preceding protein modification.
  • the Protein Monomer and Protein Complex states initialize the total copy number of each protein species, and initially set all protein monomers and complexes to their mature— folded and modified—
  • FIG. 39A is a flowchart that illustrates an example method 3900 for a Protein Folding sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is b for protein processing reactions. The flow is similar to the flow of other stochastic models for protein processing.
  • step 3901 the protein monomers and complexes that serve as enzymes (chaperones) for protein folding reactions are identified, as is the stoichiometry Mmb of metabolites in the reactions, and Boolean chaperone requirement CReb for enzyme e to fold protein b in the reaction.
  • step 3911 the next unfolded protein species b, if any, is determined; and in step 3913 Equation 23 is evaluated to determine Vb (corresponding to vi) using the revised notation.
  • Vb minimum of ⁇ Nb; minimum over all m of Nm I MAX [ 0, - Mmb ];
  • step 3923 the protein folding reaction is run to fold one copy of unfolded protein species b.
  • the copy count of the protein species is updated (unfolded protein count is decremented and folded protein count incremented).
  • the metabolite counts for metabolites m are decremented by Mmb. Control passes back to step 3911 to recompute Vb for all b based on the new counts.
  • M. genitalium is believed to employ three chaperones to assist protein folding.
  • trigger factor Tig (MG238) co-translationally binds all nascent polypeptides at the ribosome exit site (L23) and assists in early protein folding.
  • GroEL and its co- chaperone GroES (MG393) are believed to help fold intermediate sized proteins (20-60 kDa). GroEL is believed to bind each protein for 30 s to 10 min, and to couple folding to ATP hydrolysis.
  • DnaK and its co-chaperones DnaJ (MG019) and GrpE (MG201) are believed to help fold large proteins (> 30 kDa).
  • DnaK is a monomeric protein which transiently ( ⁇ 2 min) binds to the backbones of short, linear, unfolded peptide segments.
  • GrpE is believed to couple peptide release to ATP hydrolysis.
  • DnaJ is believed to regulate the activity of DnaK, as well as bind the side chains of hydrophobic and aromatic residues and directly assist protein folding.
  • FtsH and the lipids phosphatidyl ethanolamine and phosphatidyl glycerol have been suggested to assist membrane protein folding.
  • FtsH as a molecular chaperone
  • FtsH has been associated with several additional functions. Consequently, it was decided not to model FtsH as a chaperone.
  • the contributions of phosphatidyl ethanolamine and phosphatidyl glycerol to protein folding are also poorly understood, and are not modeled in the example embodiments.
  • M. genitalium GroEL substrates were reconstructed based on two proteome-scale studies of E. coli and B. subtilis (see Table S3AG).
  • M. genitalium DnaK substrates were reconstructed based on a proteome scale study of E. coli.
  • FIG. 39B is a list of instructions that illustrates an example Protein Folding submodel, according to an embodiment for the M. genitalium example.
  • Post-translational protein modification serves several important functions.
  • post-translation modification can increase the structural and chemical diversity of the proteome by stabilizing alternative conformations and providing catalytic cofactors.
  • post-translational modification, and in particular phosphorylation provides a mechanism to regulate protein activity.
  • post-translational modification can be used to regulate protein expression through proteasome recruitment. This process models protein covalent modification including phosphorylation, iipoyl transfer, and a-glutamate ligation.
  • the Protein Modification sub-model simulates protein covalent modification, the fifth step of post-translational processing. In particular, this process models protein phosphorylation, lipoyl transfer, and a-glutamate ligation. Because the mechanisms of protein modification are not well c aracterized on the genomic scale, this sub-model makes several simplifying assumptions. First, this sub-model assumes that each protein is fully modified in a single time step, collapses the modification of each protein into a single reaction, and only represents unmodified and fully modified proteins. Intermediate protein configurations are not represented.
  • this sub-model assumes that the mean arrival rate, vi, of modification events of each protein species i is independently limited by (1) the copy number of unmodified protein, p, (2) the copy numbers of intracellular metabolites, mj, and (3) the copy numbers of the protein modification enzymes, ej. Based on these assumptions, the functional form of vi is given by Equation 24.
  • Mji Is the stoichiometry of metabolite j in the modification of protein species i
  • M max (0,-M) is the negative part of M
  • Kji is the experimentally observed catalytic rate of enzyme / in the modification of protein species i
  • At is the simulation time step.
  • This sub-model implements a stochastic model of the arrival of protein monomer and macromo!ecular complex modification events with relative rates vi. Until protein, metabolic, and/or enzymatic resources are exhausted, the model iteratively (1) computes the arrival rate, vi, of each modification event, (2) selects a single modification event to execute according to a multinomial distribution parameterized by vi, and (3) executes the selected modification reaction, updating the copy numbers of proteins and metabolites and decrementing the available enzymatic capacity.
  • the Protein Monomer state and Protein Complex state represent the copy number of each protein modification enzyme.
  • the Protein Monomer state also represents the unmodified and modified copy numbers of each protein monomers species.
  • the Metabolite state represents the copy number of each intracel lular metabolite.
  • FIG. 40A is a flowchart that illustrates an example method 4000 for a Protein Modification sub-model, according to an embodiment. This flow chart uses the notation that the count of metabolite index m is Nm, the count of enzyme index e is Ne, and the reaction index is b for protein processing reactions. A protein not yet modified is called an
  • step 4001 the protein monomers and complexes that serve as enzymes for protein modification reactions are identified as is the stoichiometry Mmb of metabolites in the reactions, and catalytic rate Keb of enzymes in the reaction.
  • step 4003 the enzyme capacity Ce during a single time step is set to the count of that enzyme.
  • step 4021 control passes to step 4021 to randomly select a reaction to implement using a multinomial distribution with probabilities based on the arrival rates Vb.
  • step 4023 the RNA processing reaction is run to process one copy of
  • unmodified protein species b unmodified protein species b.
  • the copy count of the protein species is updated (unmodified protein count is decremented and modified protein count incremented).
  • the metabolite counts for metabolites m are decremented by Mmb and the enzyme capacity Ce is also decremented. Control passes back to step 4011 to recompute Vb for all b based on the new counts.
  • M. genitalium protein modification complement was reconstructed in three steps. First, protein modifications that have been observed in M. genitalium, M. pneumoniae, or other related bacteria or which have been computationally predicted were curated. Second, curated protein modifications were mapped to M. genitalium homologs using sequence alignment. Third, based on the genome annotation, M. genitalium was determined to have three protein modification pathways: (1) phosphorylation catalyzed by serine/threonine kinase PrkC (MG109), (2) lipoyl transfer to lysine catalyzed by LplA (MG270), and (3) C- terminal a-glutamate ligation catalyzed by RimK (MG012).
  • MG109 serine/threonine kinase PrkC
  • MG270 lipoyl transfer to lysine catalyzed by LplA
  • RimK RimK
  • the reconstructed protein modification network contains one kinase that modifies 16 proteins, one lipoyl transferase that modifies the E2 subunit of pyruvate dehydrogenase (PdhC, MG272), providing an important organosulfur cofactor which contains a catalytic disulfide bond, and one a-glutamate ligase that modifies 50S ribosomal protein L6 (RplF, MG166).
  • the stoichiometry and kinetics of all three modeled protein modification reactions were based on a review of the primary literature. Catalytic disulfide bonds were separately reconstructed and are modeled by several chemical reactions in the Metabolism sub-model.
  • FIG. 40B is a list of instructions that illustrates an example Protein Modification sub-model, according to an embodiment for the M. genitalium example.
  • Macromolecular complexes are believed to farm quickly (kon ⁇ 10 - 10 M s ),
  • the Protein Complexation sub-model simulates the formation of macromolecular complexes as a stochastic process. Because macromolecular Complexation is poorly understood, this sub-model makes several simplifying assumptions. First, this process assumes that macromolecular complexes form spontaneously, uncoupled to other chemical reactions and without assistance from chaperones or other proteins. The contribution of chaperones to the three-dimensional folding of individual protein monomers is modeled by the Protein Folding process. Second, this sub-model assumes that macromolecular complexation is fast and proceeds to completion within the Is time step of the simulation. Third, this process makes the simplifying assumption that each macromolecuie complex forms with the same specific rate. Fourth, this sub-model assumes that complexes form by simultaneous collision of each subunit, Together these assumptions imply that the relative formation rate, ri, of each complex, / ' , is described by mass-action kinetics, as given by Equation 25.
  • the copy count of each macromolecular complex is initialized by iteratively evaluating the dynamic model until convergence, or more specifically until insufficient subunits are available to form additional complexes.
  • RNA and Protein Monomer states represent the copy count of each RNA and protein subunit synthesized by the RNA and protein synthesis pathways.
  • FIG. 41A is a flowchart that illustrates an example method 4100 for a Protein Complexation sub-model, according to an embodiment. This flow chart uses the notation that the count of metabolite index m is Nm, the count of gene product index g is Ng, where gene products include both protein monomers produced by translation and RNA produced by RNA polymerase reactions, and the reaction index is x for complex processing reactions to produce complex x. The gene products to be combined in the complex are also called subunits.
  • the flow is similar to the flow of the stochastic models for Protein Processing I, among others.
  • step 4101 the protein monomers and RNA that serve as subunits of complexation reactions are identified, as is the stoichiometry Sxg of gene products in the reaction x. There are no enzymes in the modeled complexation reactions.
  • control passes to step 4121 to randomly select a reaction to implement using a multinomial distribution with probabilities based on the arrival rates Vx.
  • step 4123 the Complexation reaction is run to assemble the subunits of one complex x.
  • the copy count of the complex and gene products are updated (free gene products counts are decremented by Sxg and complex copy count incremented).
  • Control passes back to step 4111 to recompute Vx for all x based on the new counts.
  • FIG. 41B is a list of instructions that illustrates an example Protein Complexation sub-model, according to an embodiment.
  • Ribosomes are large ribonucleoprotein s which synthesize polypeptides. Thus ribosomes are a special case of protein complexes.
  • the Ribosome Assembly sub-model makes several simplifying assumptions to model ribosomal particle assembly. First, the process only represents individual rRNA transcripts and protein monomers and fully formed ribosomal particles. Intermediate states of ribosomal particle formation are not represented. Second, this process assumes that ribosomal particle formation is kinetically fast and energetically favorable such that ribosomal particle formation proceeds to completion within the simulation time step. Finally, the process assumes that each GTPase hydrolyzes 1 GTP molecule per ribosomal particle. With these assumptions, the rate of formation of ribosomal particle i is given by Equation 26.
  • the sub-model implements a stochastic model of the arrival of ribosomal particle assembly events with relative rates Act Until RNA, protein, metabolic, and/or enzymatic resources are exhausted, the model iteratively (1) computes the arrival rate, Aci, of each assembly event, (2) selects an event to execute according to a binomial distribution parameterized by Aci, and (3) executes the selected assembly reaction, updating the copy numbers of RNA, protein, and metabolites and decrementing the available enzymatic capacity.
  • the RNA and Protein Monomer states represents the free copy numbers of each RNA and protein monomer species.
  • the Protein Complex state represents the copy numbers of ribosomal particles and the ribosome.
  • the Ribosome state represents the (t)mRNA location of each ribosome.
  • the RNA and Protein Monomer states initialize the total copy number of each rRNA and protein monomer species.
  • the Ribosome Assembly sub-model initializes the total copy counts of ribosomal particles to a steady-state of the ribosome assembly sub-model by evaluating the sub-model with excess GTP.
  • the Translation sub-model initializes the copy number of ribosomes, and randomly positions each ribosome on mRNA weighted by the expressed copy number of each codon. The expression of each ribosomal RNA and protein monomer was fit to provide sufficient ribosomes for translation and prevent sustained amino acid accumulation.
  • FIG. 42A is a flowchart that illustrates an example method 4200 for a Ribosome Assembly sub-model, according to an embodiment.
  • This flow chart uses the notation that the count of metabolite index m is Nm, the count of gene product index g is Ng, where a gene products include both protein monomers produced by translation and RNA produced by RNA polymerase reactions, and the reaction index is x for complex processing reactions to produce complex x.
  • the gene products to be combined in the complex are also called subunits.
  • metabolite m e.g., GTPases and H20
  • the M. genitalium 70S ribosome is composed of two subunits - the 30S and SOS ribosomal particles - which assemble at the mRNA Shine Dalgarno sequence with assistance from initiation factors.
  • the M. genitalium example embodiment of this sub-model simulates the enzyme-catalyzed formation of 30S and SOS ribosomal particles.
  • the 30S particle is composed of 1 RNA and 20 protein monomer subunits.
  • the SOS particle is composed of 2 RNA and 32 protem monomer subunits.
  • the 30S and SOS particles have both been shown to assemble in stereotyped patterns with assistance from several GTPases. Era (MG387) and RbfA (MG143) have been associated with 30S particle formation. EngA (MG329), EngB (MG335), Obg (MG384), and RbgA (MG442) have been associated with SOS particle formation. The exact functions, kinetics, and energetics of the six GTPases are unknown.
  • FIG. 42B is a list of instructions that illustrates an example Ribosome Assembly sub-model, according to an embodiment.
  • Various organisms have cells with one or more special organelles, such as Golgi bodies. These organelles are described with a hierarchical assembly of protein content.
  • Organelle Assembly sub-model implements a Boolean model of the observed hierarchical assembly of the protein content, for example of the M, genitalium. terminal organelle. After the total cell mass is initialized, the Protein Monomer state initializes the organelle copy number of each organelle protein.
  • FIG. 43A is a flowchart that illustrates an example method for a Organelle Assembly sub-model, according to an embodiment.
  • set parameters for special organelle assembly protein IDs of protein species found in special organelle (mitochondria, golgi bodies, terminal organelle...); protein IDs of organelle assembly enzymes;
  • step 4311 it is determined whether there is a next protein species b for translocation. If so, then in step 4313 it is determined if there are sufficient resources to translocate at least one protein, in terms of translocation enzymes and metabolites. If so, then in step 4315 copies of the protein species are localized in organelle; copy count of protein in organelle compartment is incremented; copy count of protein in cytosol is decremented; consumed metabolite counts are decrement and released metabolite counts are incremented. Control returns to step 4311 to select the next protein species. In some embodiments, the protein species are retrieved randomly.
  • step 4321 it is determined whether there is a next protein species b for assembly in the organelle compartment. If so, then in step 4323 it is determined if there are sufficient resources to assemble at least one protein, in terms of assembling enzymes and metabolites. If so, then in step 4325, copies of the protein species are assembled in organelle; copy counts of protein and metabolites are updated. Control returns to step 4321 to select the next protein species. In some embodiments, the protein species are retrieved randomly. If there are no more protein species to assemble or resources to assemble them, then the method ends.
  • M. genitalium maintains a flask shape with a single 300 x80 nm membrane-bound bleb or terminal attachment organelle throughout most of its life cycle.
  • the M. genitalium terminal organelle divides during cell division, producing a daughter organelle which subsequently migrates to the opposite pole.
  • the M. genitalium terminal organelle has been associated with several cellular processes including motility, adhesion, replication, and cytokinesis.
  • a Terminal Organelle sub-model models the assembly of the protein content of the terminal organelle.
  • the terminal organelle is composed of eight proteins - high molecular weight cytadherence accessory proteins (HMW) 1-3 (MG312, MG218, MG317), adhesins MgPa (MG191), P32 (MG318) and P65 (MG217), and proteins PI 10 (MG192) and P200 (MG386)406-409.
  • the terminal organelle protein content is believed to assemble in the stereotyped hierarchical pattern illustrated in FIG. 43B.
  • FIG. 43B is a block diagram of hierarchical protein organization of a Terminal Organelle, according to an embodiment.
  • HMW1 and HMW2 mutually recruit each other into the terminal organelle.
  • HMW1 recruits MgPa, HMW3, and P200 into the terminal organelle.
  • HMW3 recruits P32 which in turn recruits P65.
  • PI 10 independently localizes to the terminal organelle.
  • the kinetics of terminal organelle assembly are unknown.
  • the Terminal Organelle sub-model implements a Boolean model of the observed hierarchical assembly of the protein content of the ⁇ /. genitalium terminal organelle. After the total cell mass is initialized, the Protein Monomer state initializes the terminal organelle copy number of each terminal organelle protein.
  • FIG. 43C is a list of instructions that illustrates an example Terminal Organelle Assembly sub-model, according to an
  • Protein chemical regulation was reconstructed based on extensive review of the primary literature and several databases, with particular emphasis on antibiotics.
  • the reconstructed protein chemical regulation network contains 16 metabolites, three Boolean- valued pseudometabolites or stimuli, and temperature which regulate 10 proteins including four transcription factors, Topoisomerases II and IV, the 30S and 50S ribosomal particles.
  • the 10 putative chemically regulated proteins are critical for transcription, supercoiling, and translation.
  • the effects of physical properties such as temperature and pH on protein activity were curated from BRENDA, BioCyc, and UniProt.
  • the prosthetic groups and coenzymes required for maturation and catalysis were also curated.
  • mj an element of the vector m in Equation 26
  • sj an element of the vector s in Equation 26
  • T the temperature
  • the Protein Folding sub-model separately models the role of chaperones in protein folding and accounts for the small molecule prosthetic groups required for protein folding.
  • Several processes including Metabolism model the coenzymes required for catalytic activity. After the temperature, cell volume, pseudometabolite values, and metabolite and total protein copy numbers are initialized, the competency state of each of the regulated proteins is initialized to the steady state of the regulatory network. Because the regulatory network is acyclic, the network converges in one iteration.
  • FIG. 44A is a flowchart that illustrates an example method 4400 for a Protein Activation sub-model, according to an embodiment.
  • the index b is used for proteins, soft) corresponds to ft in Equation 26.
  • the sub-model sets parameters for protein activation: Boolean function fb determining whether concentration of metabolites Mm (based on counts Nm of metabolites divided by volume v of cell), temperature T and other stimuli Ss are within ranges for a protein of species b to be competent.
  • step 4411 the next protein species b is selected for activation, until all proteins that are not inherently competent have been selected and the method ends. When one is selected, then it is determeind in step 4413, whether Equation 26 evaluates to true. If so, then in step 4415 all copies of protein species b are marked competent (i.e., all activated none inactivated). If not, then in step 4417 no copies of protein species b are marked competent (i.e., none activated all inactivated). In either case, control returns to step 4411 to select the next protein species.
  • FIG. 44B is a list of instructions that illustrates an example Protein Activation submodel, according to an embodiment for an M. genitalium example.
  • Protein decay includes degradation, disassembly and refolding. Protein degradation serves two critically important functions. First, protein degradation eliminates abnormal and potentially toxic proteins including prematurely aborted polypeptides tagged with SsrA degradation signals, and salvages their valuable metabolic resources to support protein synthesis. Second, degradation controls protein expression, enabling cells to direct valuable amino acids away from ineffective or even harmful proteins toward productive proteins, regulate protein activity, and respond to external signals. The degradation rates of E. coli proteins are correlated with their N-terminal residue, suggesting that some cells target proteins for degradation by manipulating their N-termini. This relationship is known as the N-end rule.
  • Protein refolding is an efficient mechanism for eliminating abnormally folded proteins, requiring less energy than protein degradation and avoiding the large cost of re- synthesis.
  • the kinetics and energetics of protein refolding are not well characterized. This process models the degradation of protein monomers, macromolecular complexes, cleaved signal sequences, and prematurely aborted polypeptides as well as the mis-folding and refolding of protein monomers and complexes.
  • the half-life of each protein monomer was predicted using the N-end rule.
  • the half-life of each macromolecular complex was set equal to the weighted mean half-life of its RNA and protein subunits.
  • the half-lives of damaged and misfolded configurations of each protein species, cleaved signal sequences, and prematurely aborted polypeptides were set to zero to reflect their rapid degradation.
  • the sub-model stochastically degrades proteins as a function of the copy number and catalytic rate of each enzyme and the number of cytosolic ATP molecules.
  • the sub-model assumes (1) cytosol- localized nascent, mature, misfolded, and damaged proteins are degraded by protease La and six peptideases, (2) membrane and extracellular-localized proteins are not vulnerable to degradation by the cytosol localized protease La, (3) aborted polypeptides are degraded by protease FtsH and six peptidases, (4) protein degradation is kinetically fast and therefore this sub-model doesn't represent intermediate degradation states, and (5) aborted, misfolded, and damaged proteins are immediately degraded if proteases are expressed and energy is available.
  • the Protein Decay sub-model simulates protein unfolding as a Poisson process with unfolding rate parameter ku set to the nominal rate 10 6 per second per molecule.
  • Cytosol-localized protein re-folding is modeled as a single chemical event catalyzed by the cyto sol-localized chaperone ClpB and driven by the hydrolysis of a single ATP molecule. Because protein refolding is not well characterized, this process assumes that if ClpB is expressed and excess ATP is available, protein refolding proceeds to completion within the simulation time step. If ClpB is expressed, but ATP is not present in excess, the model stochastically re-folds proteins equal to the intracellular copy number of ATP. This model makes several simplifying assumptions. First, the model assumes that all protein species mis- fold and re-fold at the same rate. Second, the model assumes that the kinetics of protein mis- folding and re-folding are fast and therefore intermediate mis-folded states can be ignored.
  • the Protein Monomer and Protein Complex states represent the copy number of each protein monomer and macromolecular complex including the signal sequence, misfolded, and damaged copy numbers of each protein species and the copy number of each proteolytic enzyme in each of five compartments: cytosol, membrane, terminal organelle cytosol, terminal organelle membrane, and extracellular space.
  • the Polypeptide state represents the amino acid sequence of each prematurely aborted polypeptide.
  • Chromosome, FtsZ Ring, Ribosome, and RNA Polymerase states represent the detailed configurations of DNA-bound proteins, the FtsZ septal ring, RNA polymerases, and ribosomes, respectively. These detailed configurations are updated when DNA-bound proteins, FtsZ, RNA polymerase, or ribosomes are degraded to reflect decreased enzyme copy numbers.
  • the Transcript and Polypeptide states are also updated upon RNA polymerase or 70S ribosome degradation to reflect transcript and polypeptide abortion.
  • the RNA state represents the damaged copy number of each RNA species.
  • the Metabolite state represents the copy number of each intracellular metabolite.
  • the Protein Decay sub-model marks RNA subunits of degraded macromolecule complexes as "damaged" for immediate degradation by the RNA Decay process. Initially, misfolded and damaged configurations of each RNA and protein species are set with zero copy counts because the typical occupancies of these configurations are small compared to that of the mature configuration. Similarly, the cell is initialized with no prematurely aborted polypeptides.
  • the RNA, Protein Monomer, and Protein Complex states initialize the total copy counts of each RNA and protein species. The express on of the proteases and peptidases was fit to provide sufficient enzymes to quickly degrade damaged proteins, or more specifically to prevent sustained accumulation of damaged proteins.
  • FIG. 45A through FIG. 45D are flowcharts that illustrate an example method 4500 for a Protein Decay sub-model, according to an embodiment.
  • the index for target protein monomer species (gene product) is b
  • for target polypeptide (partial gene product) species is p
  • the index for enzyme (protein or complex) species is e
  • the index for target complex species is x
  • the index for target RNA species is r
  • the index for metabolites is m.
  • parameter values are set, including values for: unfolding rates KUb, KUx; kinetic rate Ke in units of amino acids released per second by enzyme e; degradation rates of proteins and complexes KDPb and KDCx, respectively, based on their half-lives; stoichiometry Mmb of metabolite m in reaction b to degrade protein monomer; stoichiometry Mmx of metabolite m in reaction x to degrade complex; stoichiometry Rrx of RNA species r in reaction x to degrade complex; stoichiometry Pbx of protein monomer species b in reaction x to degrade complex; stoichiometry Mmp of metabolite m (including metabolites of ATP hydrolysis) in reaction p to degrade polypeptide.
  • KU is KUb a function of the protein species (or some other factor or factors); but, in the illustrated embodiment, KU is a constant for gene product protein monomers.
  • the count of unfolded proteins is designated NUb and is incremented by the random number pulled from the Poisson distribution.
  • the count of unfolded complexes is designated NUx and is incremented by the random number pulled from the Poisson distribution.
  • unfolded proteins or complexes are randomly folded while folding resources are available (e.g., folding enzyme present and metabolites are sufficient for ATP hydrolyss, and unfolded proteins or complexes remain), as determined in step 4521 and 4535, respectively.
  • Copy counts of affected state variables are updated.
  • the selected unfolded protein or complex is based on a multinomial distribution with probability for each species based on the copy count of unfolded members of the species.
  • occurrence rate Nx*KDCx— provided resources are sufficient, as determined in step 4521 and step 4551.
  • Each RNA and protein subunit produced by degrading a complex is counted as damaged.
  • the M. genitalium chromosome contains one protein chaperone—the cytosol- localized and ATP-dependent protease ClpB (MG355)— dedicated to protein refolding.
  • the M. genitalium genome contains homologs of proteases FtsH (MG457) and La (MG239) and six peptidases: aminopeptidase (MG324), cytosol aminopeptidase (MG391), glycoprotease (MG208), metalloendopeptidase (MG046), oligoendopeptidase F (MG183), and proline iminopeptidase (MG020) 182,634 .
  • the membrane-localized protease FtsH is believed to cleave prematurely aborted polypeptides tagged with an N-terminal SsrA tag into
  • FIG. 45E through FIG.45G are lists of instructions that illustrate an example Protein Decay sub-model, according to an embodiment for an M. genitalium example.
  • FtsZ is a homologue of eukaryotic tubulin that assembles into long polymers. These polymers are typically localized to the center of the cell, forming a membrane-bound ring.
  • FtsZ is a GTPase, and GTP hydrolysis to GDP causes the FtsZ filaments to bend. This bending serves as one of the forces enabling cell division. GTP only hydrolyzes to GDP at an active site formed at the interface of two FtsZ molecules, meaning that FtsZ polymerization is required for GTPase activity.
  • the FtsZ polymerization sub-model addresses the assembly FtsZ monomers into long polymers.
  • FtsZ polymers of variable length are able to bind the cell membrane to form contract le rings.
  • the mean filament length in E. coli has been measured to be 100 nm (23 FtsZ monomers).
  • the minimum fragment length observed in C. crescentus is 40 nm (9 FtsZ monomers).
  • M.genitalium would have filaments on the lower end of what is sufficient for contractile force generation.
  • the sub-model uses polymers of a fixed length, 9 subunits. This sub-model simulates polymerization of FtsZ-GTP monomers to 9mers.
  • the sub-model uses a differential equation model to assemble the polymers .
  • FtsZ-GTP can exist in any intermediate state: lmer, 2mer, ... 8mer, 9mer.
  • the set of differential equations models the following reactions: Activation of FtsZ monomers (F) to FtsZ-GTP(FT); Deactivation of FtsZ-GTP(FT) to FtsZ(F) Exchange of FtsZ- GDP(FD) to FtsZ-GTP(FT); Reverse exchange of FtsZ-GTP(FT) to FtsZ-GDP(FD);
  • FIG. 46 is a list of parameters and values for an FtsZ polymerization sub-model useful for modeling cell division, according to an embodiment.
  • the count of full-length activated FtsZ (9mer) filaments is used by the Cytokinesis sub-model for cel l pinching.
  • the FtsZ polymerization differentia] equations are run a number of times (up to 50 iterations) before the simulation starts to ensure starting at a steady state distribution of polymer lengths. Iterations are run until the change in the solution across consecutive iterations is less than a set tolerance (e.g., about 0.2% of the enzymatic counts).
  • FtsZ can exist in one of multiple states: inactivated monomer, activated monomer (GTPbound), nucleated (dimer of two activated FtsZ molecules), elongated polymer of three or more GTPbound FtsZ molecules.
  • the hydrolyzed FtsZ polymers are only considered during cell membrane pinching in the Cytokinesis sub-model.
  • the FtsZ molecules can move to states of higher and lower polymerization at known rates.
  • FtsZ polymerization is modeled using a set of differential equations modified involving the activation, nucleation, and elongation of FtsZpolymers. The main modifications were that the equations were simplified to not include annealing and cyclization of FtsZ polymers.
  • the following differential equation model given by equations 27 is evaluated at each time step.
  • M. genitalium is believed to have adapted to the rich environment provided by its host human urogenital epithelium by massive degenerative evolution from low G+C content Gram positive bacteria, eliminating non-essential genes involved in oxidative
  • M. genitalium has evolved metabolic enzymes with relaxed substrate specificity.
  • M. genitalium is generally cultured on SP-4 medium, a complex and undefined medium based on CMRL- 1066.
  • Extracellular Medium The in silico medium was based on SP-4 medium with supplementary metabolites added to facilitate in silico growth.
  • the chemical composition of SP-4 medium was estimated from the characterized composition of each medium component.
  • additional metabolites were added until the flux-balance analysis (FBA) metabolic model predicted non-zero growth, Finally, the concentrations of several metabolites were increased to support sustained cell growth.
  • FBA flux-balance analysis
  • FIG. 47 A and FIG. 47B are block diagrams that illustrate part of a metabolic network of M. genitaUum, according to an embodiment.
  • the M, genitaUum metabolic network was reconstructed with guidance from the M. genitaUum flux-balance analysis (FBA) metabolic model. Briefly, the metabolic network was reconstructed based on homology to previously modeled organisms and an extensive review of the primary literature. In particular, the metabolic network was reconstructed to support the metabolic demands of all 27 other sub-models.
  • FBA flux-balance analysis
  • the Metabolism sub-model models several critical cellular functions including (1) the uptake of external nutrients, (2) the synthesis of the metabolic building blocks and intermediate energy carriers, (3) lipid assembly, membrane insertion, and maturation, and (4) the recycling and export of the metabolic byproducts of other cellular processes.
  • This process makes two key assumptions and uses FBA to model M. genitaUum metabolism. In particular, this process calculates the flux, v, of each metabolic reaction.
  • the sub-model assumes that the internal dynamics of the metabolome are fast compared to the time step of the M. genitaUum simulation, or, equivalently, that the metabolic network can be considered to be at steady- state on the Is simulation time scale. Second, the sub-model assumes that the M.
  • genitaUum metabolic network maximizes cellular metabolite production given the available extracellular nutrients and metabolic enzvmes.
  • the Metabolism sub-model has four differences from most FBA metabolic models.
  • the metabolic network, S, and the cellular mass production pseudoreaction are expanded to produce all of the metabolites required by the 27 other sub-model, where Sij is the stoichiometry of metabolite i in reaction j and Sib is the stoichiometry of metabolite i in the cellular mass production pseudoreaction, b.
  • the optimization objective, g is expanded to include the constmction of new cell mass as well as the recycling and export of the metabolic byproducts of the 27 other sub-models.
  • the objective g represents the relative fitness gains of intracellular metabolite recycling and novel cell mass production, that is, g represents how heavily the metabolic network favors novel cell mass production over metabolite recycling, g was set assuming that the production/exchange of each molecule contributes equally to cellular fitness. Specifically g was set such that the production/exchange of each individual molecule is weighted equally.
  • the flux bounds, vl are vu, are functions of the cell dry mass, m, the copy count and maximum exchange rate of each extracellular metabolite, nm and k+ex, the thermodynamics, ⁇ , of each reaction, and the copy number and maximum catalytic rate of each metabolic enzyme, ne and k+cat.
  • external exchange flux bounds are functions of the cell dry mass and the extracellular copy counts and maximum exchange rates of the exchanged metabolites
  • internal exchange flux bounds are functions of the intracellular copy numbers of the exchanged metabolites
  • transport and chemical reaction flux bounds are functions of the copy numbers and kinetics of the enzymes and the reaction thermodynamics.
  • FIG. 47C and FIG. 47D are block diagrams that illustrate an integrated flux balance analysis for M. genitalium, according to an embodiment.
  • Equation set 28 summarizes the FBA metabolic model.
  • the Metabolism sub-model imports extracellular nutrients and converts them into the building blocks of macro-molecular synthesis.
  • the Metabolite state represents both the extracellular and cellular copy numbers of each metabolite. 'The rates of extracellular nutrient exchange are functions of the total cell mass represented by the Mass state and the extracellular metabolite copy numbers. Cytosolic and membrane metabolite copy counts limit the rates of internal exchange/recycling. The rates of chemical and transport reactions are limited by the copy numbers and kinetics of metabolic enzymes synthesized by the protein synthesis and maturation pathway and represented by the Protein Monomer and Protein Complex states.
  • the Metabolic Reaction state records the calculated flux of each metabolic reaction.
  • the Simulation object allocates metabolites produced by the Metabolism sub-model to the other 27 sub-models to support several functions including DNA, NA, and protein synthesis and maturation. Additionally, the other 27 sub-models generate byproducts which the Metabolism sub-model either recycles or exports from the cell.
  • the Metabolite state initializes the copy count of each metabolite.
  • the Protein Monomer state initializes the total copy count of each protein monomer.
  • the Macromolecular Complexation sub-model initializes the copy count of each macromolecular complex, This process initializes the cellular growth rate and reaction fluxes to the steady-state of the metabolic network using the same FBA simulation used at each time step,
  • FIG. 47E and FIG. 47F are lists of instructions that illustrate an example Metabolism sub-model, according to an embodiment for the M. genitalium example.
  • Cytokinesis the division of the cell into two separate daughter cells, is the final step in the cell cycle,
  • the major step of cytokinesis is the pinching of the cell membrane in a certain location (typically the midline of the cell) until the membrane is separated and division is complete.
  • a membrane-bound ring of a tubulin homologue forms at the cell membrane around the pinching site, and the localization and assembly of this ring requires a set of accessory proteins. Both contractile forces of the ring filaments and external forces by the accessory proteins contribute to the pinching of the cell membrane,
  • M. genitalium has the tubulin homologue, FtsZ (MG224), but does not contain the accessory proteins required for cell division in other bacterial species, Therefore, cytokinesis in M. genitalium is not fully understood.
  • FtsZ tubulin homologue
  • the proposed mechanism invokes the formation of a contractile Z ring along the interior surface of the midline of the cell membrane.
  • the ring is comprised of GTP-activated FtsZ polymer filaments that bend when their GTP is hydrolvzed. The bending draws the membrane-bound ends of the filaments together, thereby constricting the cell membrane. Many such pinching events result in a complete constriction of the cell.
  • M. genitalium cytokinesis is thus a cycle of filament binding, bending, and dissociation. It has been suggested that M. genitalium cell division can occur in the absence of FtsZ through motility.
  • the Cytokinesis sub-model uses the iterative pinching model because M. genitalium motility is not accessible, and FtsZ is known to be an essential gene in M. genitalium.
  • FIG. 48A is a list of parameters and values for the Cytokinesis sub-model useful for modeling cell division, according to an embodiment. All of the parameters used in the Cytokinesis sub-model are described in FIG. 48 A.
  • FIG. 48B is a block diagram that illustrates an example cycle of filament bending during cytokinesis in the Cytokinesis sub-model, according to an embodiment.
  • First straight FtsZ-GTP filaments bind inside the cell membrane, such that they form an inscribed polygon (stage A).
  • the GTP molecules in these filaments hydrolyze to GDI 3 , causing the filaments to bend. Since the filaments are cell membrane bound, this bending pinches the membrane inwards.
  • the filaments bend just enough to approximately form a new circle.
  • the new circumference of the pinched region is now smaller, and equal to the perimeter of the inscribed polygon (stage C).
  • Cytokinesis was implemented as a process that is only initiated upon chromosomal segregation.
  • the iterative pinching is modified such that two sets of filaments bind (stage A). Multiple filament sets are involved to maintain the progress of membrane constriction. Indeed, it was previously proposed that the FtsZ filaments are more abundant and that a new ring of filaments attaches to stabilize the membrane and maintain progress before the depolymerization event. Others report that the FtsZ ring is 3-9 filaments thick depending on the bacterial strain. Using the lower bound, since M. genitali m is a very small bacterium, the sub-model assumes that the FtsZ maintains a thickness of up to 3 filaments.
  • One embodiment of the sub-model involves two sets of straight FtsZ filaments to inscribe the diameter of the cell. Then up to one set of filaments can depolymerize while two new sets of straight filaments bind the diameter of the cell. Only once the two new sets of straight filaments have bound, can the second bent set of filaments depolymerize.
  • the output of the Cytokinesis sub-model is an indication of the progress of cell pinching, including the determination of when the cell has successfully split into two daughter cells. Complete cell division is also the trigger to end the entire simulation.
  • FIG. 48C is a list of state variables and sub-models that interact with the Cytokinesis sub-model, according to an embodiment. Initialization steps are not required for this process, as
  • the sub-model bends a pair of full, length (9mer) FtsZ polymers along the circumference of the midplane of the cell.
  • FtsZ full, length
  • circle and arc formulas are used to calculate the new smaller circumference of the cell.
  • one or more of the following five steps is performed.
  • the sub-model attempts to bind two straight filaments at each polygon edge, forming a regular polygon that inscribes the pinched cell circumference.
  • Bent filament dissociation II (stage B): If there was a previous cycle and at least one straight filament has bound each polygon edge, then dissociate the bent filament from each edge.
  • dissociating each filament is equal to kunbind.
  • Filament bending (Schematic S6C): If all polygon edges have two filaments bound and all residual bent filaments have dissociated, bend the edges of the newly completed polygon. When the filaments bend, their length does not change, and the amount of bending is only sufficient to form a new circle. The new circumference is therefore equal to the old polygon's perimeter. Each fragment is now an arc. For simplicity, all of the GTP molecules in the pair of filaments at a particular edge hydrolyze at the same time. In this version of the model, al l filaments are of a fixed length. When they are joined end-to-end to form a regular polygon, the polygon may not fully inscribe the entire circumference. This remaining portion of the circumference does not bend when the polygon does. It is preserved and accounted for in the next iteration.
  • Bent filament dissociation I stage D: If all the filaments have been bent, dissociate one bent filament from each polygon edge. The other ring must remain to preserve pinching progress by maintaining the new smaller circumference,
  • dissociating each filament is equal to kunbind.

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/US2013/051167 2012-07-18 2013-07-18 Techniques permettant de prédire le phénotype à partir du génotype sur la base d'un modèle de calcul correspondant à une cellule entière Ceased WO2014015196A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201261672855P 2012-07-18 2012-07-18
US61/672,855 2012-07-18
US201361783826P 2013-03-14 2013-03-14
US61/783,826 2013-03-14

Publications (2)

Publication Number Publication Date
WO2014015196A2 true WO2014015196A2 (fr) 2014-01-23
WO2014015196A3 WO2014015196A3 (fr) 2014-03-27

Family

ID=49949385

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/051167 Ceased WO2014015196A2 (fr) 2012-07-18 2013-07-18 Techniques permettant de prédire le phénotype à partir du génotype sur la base d'un modèle de calcul correspondant à une cellule entière

Country Status (1)

Country Link
WO (1) WO2014015196A2 (fr)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015199614A1 (fr) * 2014-06-27 2015-12-30 Nanyang Technological University Systèmes et procédés pour la conception de biologie synthétique et simulation de cellules hôtes
WO2018183746A1 (fr) * 2017-03-30 2018-10-04 Monsanto Technology Llc Systèmes et procédés destinés à être utilisés pour identifier de multiples éditions génomiques et pour prédire les effets agrégés des éditions génomiques identifiées
WO2018236852A1 (fr) * 2017-06-19 2018-12-27 Jungla Inc. Interprétation de variants génétiques et génomiques par l'intermédiaire d'un système d'apprentissage mutationnel en profondeur expérimental et informatique intégré
US10453551B2 (en) 2016-06-08 2019-10-22 X Development Llc Simulating living cell in silico
CN111052252A (zh) * 2017-09-01 2020-04-21 X开发有限责任公司 用于对生化环境建模的异构方法
US10657179B2 (en) 2017-09-01 2020-05-19 X Development Llc Bipartite graph structure
US10839937B1 (en) 2018-07-19 2020-11-17 X Development Llc Whole cell circular delta viewer and navigator
US11024403B2 (en) 2018-01-22 2021-06-01 X Development Llc Method for analyzing and optimizing metabolic networks
CN113826167A (zh) * 2019-05-13 2021-12-21 格瑞尔公司 基于模型的特征化和分类
US11309058B2 (en) 2018-03-30 2022-04-19 X Development Llc Modeling the chemical composition of a biological cell wall
US11380420B1 (en) 2018-05-07 2022-07-05 X Development Llc Data structure, compilation service, and graphical user interface for rapid simulation generation
US11393555B1 (en) 2018-09-06 2022-07-19 X Development Llc Dynamic coordinating framework for model cell simulations
CN114912610A (zh) * 2022-05-26 2022-08-16 安徽理工大学 一种基于膜计算的单细胞膜最优解计算优化方法
US11456053B1 (en) 2017-07-13 2022-09-27 X Development Llc Biological modeling framework
US11508459B2 (en) 2018-01-31 2022-11-22 X Development Llc Modified FBA in a production network
US11636916B1 (en) 2018-05-10 2023-04-25 X Development Llc Biological cell simulation heuristics ranking
CN118588164A (zh) * 2024-06-07 2024-09-03 浙江工业大学 一种生物塑料降解的功能基因在线分析平台
EP4423749A4 (fr) * 2021-10-28 2025-09-03 Univ Ramot Système et procédé pour accélérer des simulations de cellules entières

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7415359B2 (en) * 2001-11-02 2008-08-19 Gene Network Sciences, Inc. Methods and systems for the identification of components of mammalian biochemical networks as targets for therapeutic agents
EP1495321A4 (fr) * 2002-03-29 2006-10-25 Genomatica Inc Modeles de metabolismes humains, et procedes associes
WO2004081862A2 (fr) * 2003-03-10 2004-09-23 Michael Ellison Systeme et procede pour simuler des processus pour des cellules vivantes
US20060223058A1 (en) * 2005-04-01 2006-10-05 Perlegen Sciences, Inc. In vitro association studies
US20110059861A1 (en) * 2009-09-08 2011-03-10 Nodality, Inc. Analysis of cell networks

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663146A (zh) * 2014-06-27 2017-05-10 南洋理工大学 合成生物学设计及宿主细胞模拟系统和方法
WO2015199614A1 (fr) * 2014-06-27 2015-12-30 Nanyang Technological University Systèmes et procédés pour la conception de biologie synthétique et simulation de cellules hôtes
US10453551B2 (en) 2016-06-08 2019-10-22 X Development Llc Simulating living cell in silico
US11990205B2 (en) 2017-03-30 2024-05-21 Monsanto Technology Llc Systems and methods for use in identifying multiple genome edits and predicting the aggregate effects of the identified genome edits
WO2018183746A1 (fr) * 2017-03-30 2018-10-04 Monsanto Technology Llc Systèmes et procédés destinés à être utilisés pour identifier de multiples éditions génomiques et pour prédire les effets agrégés des éditions génomiques identifiées
WO2018236852A1 (fr) * 2017-06-19 2018-12-27 Jungla Inc. Interprétation de variants génétiques et génomiques par l'intermédiaire d'un système d'apprentissage mutationnel en profondeur expérimental et informatique intégré
US11456053B1 (en) 2017-07-13 2022-09-27 X Development Llc Biological modeling framework
US10657179B2 (en) 2017-09-01 2020-05-19 X Development Llc Bipartite graph structure
US11183273B2 (en) 2017-09-01 2021-11-23 X Development Llc Heterogeneous method for modelling a biochemical environment
CN111052252B (zh) * 2017-09-01 2023-06-30 X开发有限责任公司 用于对生化环境建模的异构方法
CN111052252A (zh) * 2017-09-01 2020-04-21 X开发有限责任公司 用于对生化环境建模的异构方法
US11397769B2 (en) 2017-09-01 2022-07-26 X Development Llc Bipartite graph structure
US11024403B2 (en) 2018-01-22 2021-06-01 X Development Llc Method for analyzing and optimizing metabolic networks
US11508459B2 (en) 2018-01-31 2022-11-22 X Development Llc Modified FBA in a production network
US11309058B2 (en) 2018-03-30 2022-04-19 X Development Llc Modeling the chemical composition of a biological cell wall
US11380420B1 (en) 2018-05-07 2022-07-05 X Development Llc Data structure, compilation service, and graphical user interface for rapid simulation generation
US11636916B1 (en) 2018-05-10 2023-04-25 X Development Llc Biological cell simulation heuristics ranking
US10839937B1 (en) 2018-07-19 2020-11-17 X Development Llc Whole cell circular delta viewer and navigator
US11393555B1 (en) 2018-09-06 2022-07-19 X Development Llc Dynamic coordinating framework for model cell simulations
CN113826167A (zh) * 2019-05-13 2021-12-21 格瑞尔公司 基于模型的特征化和分类
EP4423749A4 (fr) * 2021-10-28 2025-09-03 Univ Ramot Système et procédé pour accélérer des simulations de cellules entières
CN114912610A (zh) * 2022-05-26 2022-08-16 安徽理工大学 一种基于膜计算的单细胞膜最优解计算优化方法
CN114912610B (zh) * 2022-05-26 2024-04-09 安徽理工大学 一种基于膜计算的单细胞膜最优解计算优化方法
CN118588164A (zh) * 2024-06-07 2024-09-03 浙江工业大学 一种生物塑料降解的功能基因在线分析平台

Also Published As

Publication number Publication date
WO2014015196A3 (fr) 2014-03-27

Similar Documents

Publication Publication Date Title
WO2014015196A2 (fr) Techniques permettant de prédire le phénotype à partir du génotype sur la base d'un modèle de calcul correspondant à une cellule entière
Thomas et al. Direct nanopore sequencing of individual full length tRNA strands
Monk et al. Multi-omics quantification of species variation of Escherichia coli links molecular features with strain phenotypes
Haas et al. How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes?
Gerdes et al. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655
Zhou et al. Obtaining a panel of cascade promoter-5′-UTR complexes in Escherichia coli
Xavier et al. Systems biology perspectives on minimal and simpler cells
Weber In silico tools for the analysis of antibiotic biosynthetic pathways
Fang et al. How essential are nonessential genes?
Mellor et al. Semisupervised Gaussian process for automated enzyme search
Li et al. Development of a synthetic 3-dehydroshikimate biosensor in Escherichia coli for metabolite monitoring and genetic screening
Wang et al. MinGenome: an in silico top-down approach for the synthesis of minimized genomes
González-González et al. Adaptive mutations in RNA polymerase and the transcriptional terminator rho have similar effects on Escherichia coli gene expression
Charlop-Powers et al. Selective enrichment of environmental DNA libraries for genes encoding nonribosomal peptides and polyketides by phosphopantetheine transferase-dependent complementation of siderophore biosynthesis
Qi et al. Altered metabolic strategies: elaborate mechanisms adopted by Oenococcus oeni in response to acid stress
Shuler et al. Modeling a minimal cell
Freed et al. Genome-wide tuning of protein expression levels to rapidly engineer microbial traits
Jack et al. Transcript degradation and codon usage regulate gene expression in a lytic phage
Choudhury et al. Determinants for efficient editing with Cas9-mediated recombineering in Escherichia coli
Sonnenberg et al. The Pseudoalteromonas multipartite genome: distribution and expression of pangene categories, and a hypothesis for the origin and evolution of the chromid
Williams et al. Extreme mitochondrial reduction in a novel group of free-living metamonads
Paul et al. MLDSPP: Bacterial promoter prediction tool using DNA structural properties with machine learning and explainable AI
Mendum et al. Interrogation of global mutagenesis data with a genome scale model of Neisseria meningitidis to assess gene fitness in vitro and in sera
Shao et al. DpCoA tagSeq: barcoding dpCoA-Capped RNA for direct nanopore sequencing via maleimide-thiol reaction
US20150127317A1 (en) Method for in silico Modeling of Gene Product Expression and Metabolism

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13819768

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13819768

Country of ref document: EP

Kind code of ref document: A2