WO2020095035A1 - Genomic analysis - Google Patents

Info

Publication number
WO2020095035A1
WO2020095035A1 PCT/GB2019/053128 GB2019053128W WO2020095035A1 WO 2020095035 A1 WO2020095035 A1 WO 2020095035A1 GB 2019053128 W GB2019053128 W GB 2019053128W WO 2020095035 A1 WO2020095035 A1 WO 2020095035A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
genomic variation
genomic
variation
individual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2019/053128
Other languages
English (en)
Inventor
Tamas Korcsmaros
Johanne BROOKS
Simon R. CARDING
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Earlham Institute
University of East Anglia
Quadram Institute Bioscience
Original Assignee
Earlham Institute
University of East Anglia
Quadram Institute Bioscience
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Earlham Institute, University of East Anglia, Quadram Institute Bioscience

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Definitions

  • the present invention relates to analysis of genomic data.
  • the present invention relates to a method of identifying the contribution of a genomic variation to a phenotypic feature.
  • the present invention also relates to identifying, from a group of individuals with a particular phenotypic feature, individuals with one or more genomic variations in common.
  • the present invention also relates to identifying, from a group of individuals with a particular phenotypic feature, an individual with a susceptibility to one or more treatment pathways.
  • the present invention also relates to determining susceptibility of an individual to a particular treatment pathway, and in particular, a method of determining the susceptibility of an individual with one or more phenotypic features to a particular treatment pathway which may be associated with one or more of the phenotypic features.
  • the present invention also relates to a method of assigning an individual to one of a certain number of treatment pathways on the basis of their genetic profile, and in particular, a method of assigning an individual with one or more phenotypic features to one of a certain number of treatment pathways on the basis of their genetic profile, wherein the treatment pathway is relevant to one or more aspects of their genetic profile.
  • GWAS Genome-Wide Association Studies
  • a computer-implemented method of identifying contribution of a genomic variation to a phenotypic feature comprises determining a degree of a genomic variation in each individual in a set of individuals and recording the degree of genomic variation in each individual in a database, for example, in a first table.
  • the method comprises determining a location of the genomic variation to allow determination, by virtue of the location, of whether the genomic variation affects a gene product directly and/or regulates production of a gene product, for example, from a table, such as a second table.
  • the method comprises determining a first gene affected by the genomic variation.
  • when the genomic variation affects a gene product directly, recording the gene, in the coding sequence of which the genomic variation is located, as the first gene, for example, in a second table.
  • when the genomic variation regulates the production of a gene product, recording the gene, in the promoter region or other regulatory region of which the genomic variation is located, as the first gene, for example, in the second table.
  • the method comprises determining an outcome of the genomic variation on the first gene and recording the outcome of the genomic variation in the database, for example, in the second table, and determining the presence or absence of one or more other gene products (i) interacting with the first gene, (ii) encoded by the first gene; and/or (iii) regulating the first gene and, if present, recording the identity of the gene encoding the one or more other gene products in the database as the second gene for example, in the second table.
  • a computer-implemented method comprises determining a degree of a genomic variation in each individual in a set of individuals and recording the degree of genomic variation in each individual in a database; determining a location of the genomic variation to allow determination, by virtue of the location, of whether the genomic variation affects a gene product directly and/or regulates production of a gene product; determining a first gene affected by the genomic variation wherein, when the genomic variation affects a gene product directly, recording the gene, in the coding sequence of which the genomic variation is located, as the first gene, and, when the genomic variation regulates the production of a gene product, recording the gene, in the promoter region or other regulatory region of which the genomic variation is located, as the first gene; and determining an outcome of the genomic variation on the first gene and recording the outcome of the genomic variation in the database.
  • a computer-implemented method of determining the presence or absence of one or more other gene products. A degree of genomic variation in each individual in a set of individuals has been recorded in a database, a location of the genomic variation to allow determination, by virtue of the location, of whether the genomic variation affects a gene product directly and/or regulates production of a gene product has been determined, a first gene affected by the genomic variation has been determined wherein, when the genomic variation affects a gene product directly, the gene has been recorded, in the coding sequence of which the genomic variation is located, as the first gene and, when the genomic variation regulates the production of a gene product, the gene has been recorded, in the promoter region or other regulatory region of which the genomic variation is located, as the first gene, and an outcome of the genomic variation on the first gene has been determined and recorded in the database.
  • the method comprises determining the presence or absence of one or more other gene products: (i) interacting with the first gene, (ii) encoded by the first gene; and/or (iii) regulating the first gene, and, if present, recording the identity of the gene encoding the one or more other gene products in the database as the second gene, optionally in the second database.
  • genomic variation means a difference in any aspect of the coding sequence that determines the genetic makeup of an individual compared to a comparable aspect of code from another individual or group of individuals.
  • genomic variations include allelic variations, polymorphisms or mutations in DNA or RNA, such as hereditary mutations and somatic mutations, missense or non-synonymous, synonymous and nonsense mutations, insertions, deletions, substitutions, inversions, frameshift mutations, repeat expansions, duplications, copy number variations, point mutations and single nucleotide polymorphisms (SNPs). A mutation may be within an extra-chromosomal nucleotide sequence (such as a plasmid) or a chromosomal nucleotide sequence.
  • SNPs single nucleotide polymorphisms
  • Genomic variations may also include epigenetic modifications, for example DNA methylation (for example of CpG regions/islands) or histone modifications (for example methylation, acetylation, phosphorylation, ubiquitination).
  • phenotypic feature means an identifiable trait or condition. It includes observable characteristics, such as one or more aspects of morphology, for example bone length; physiology, for example metabolic rate; or behaviour, such as aggression. It also includes diseases, clinical conditions and/or pathologies in any stage or state, or a marker of a disease, clinical condition or pathology, or a marker of a response to treatment of a disease. It also includes desirable traits (for example increased milk yield in a cow), or undesirable traits, such as biofilm formation in a bacterium or bacterial resistance to an antibiotic.
  • the phenotypic feature may be a disease, clinical condition or pathology, or a stage of a disease, clinical condition or pathology; or a marker of a disease, clinical condition or pathology.
  • the phenotypic feature may be a marker of a response to treatment of a disease, clinical condition or pathology or a stage of a disease, clinical condition or pathology.
  • Examples include elevation of one or more markers of inflammation; depression of a metabolite or hormone, for example depression of insulin levels as an indicator of diabetes; presence or absence of biomarkers associated with a disease or condition, for example CD34 or CD38 as prognostic biomarkers for acute B lymphoblastic leukemia; elevation or depression of expression of transcripts, proteins and/or metabolites, for example elevation of phospholipid metabolites as an indicator of cancer cell growth, or altered levels of cell death markers, such as apoptotic markers, as an indicator of neurodegenerative conditions or cancer.
  • the term “individual” means any organism, for example eukaryotes such as animals, plants and protists, prokaryotes such as bacteria and Archaea, viruses and fungi.
  • the degree of genomic variation in each individual in a set of individuals is then determined.
  • the step of determining the degree of a genomic variation in each individual in a set of individuals may be carried out by comparing the code or a part of the code that determines the genetic makeup of each individual to the comparable code, or comparable part of the code from a control group.
  • the individual has or displays one or more phenotypic features of interest.
  • the control group does not have and/or display the phenotypic feature(s) of interest.
  • the code or part of the code that determines the genetic makeup of each individual may be in a dataset stored, for example, in an appropriate storage. One or more of the datasets may be publicly available and/or are available to members of the research community.
  • the code may have been generated from genome-wide association studies, and thus the data may be genotype-phenotype associated data.
  • the code or part of the code may therefore be from individuals with a recognised phenotypic feature.
  • the phenotypic feature may be the phenotypic feature of interest.
  • datasets include Immunochip, or databases such as Immunobase, the database of Genotypes and Phenotypes (dbGaP), OMIM, COSMIC, PharmGKB and bacterial databases such as SalComMac, PhenoLink, ProTraits; epigenetic databases such as IHEC Data Portal, ROADMAP Epigenomics, DiseaseMeth; and genome-wide association databases (Oz et al Mol. Biol and Evol. (2014); Lazar et al Mol. Sys. Biol. (2013) and Lazar et al Nature Comm. (2014)).
  • the comparable code or part of the code for the control group may be a dataset stored for example in an appropriate storage means such as one or more databases or one or more chips.
  • One or more of the datasets may be publicly available or available to members of the research community.
  • databases include the International Genome Sequencing Consortium; TSC (The SNP Consortium ltd); the International HapMap Project, Decipher and ancestral allele databases, such as db ancestral allele database (Sherry et al., 2001), Human Gene Mutation Database (HGVbase), ExPASy, GeneSNPs, ClinBar, Geneatlas, GeneCards Database, Genome Variation Server (GVS), Human Organised Whole Genome Database (HOWDY), jSNP (Japanese SNPs), Leelab SNP database, The Human SNP database, OMIM (Online Mendelian Inheritance in
  • All of the code that determines the genetic makeup of an individual may be compared to the comparable code of the control, and the degree of genomic variation between the codes may be recorded in the database.
  • part of the code that determines the genetic makeup of an individual may be compared to the comparable part of the control code.
  • the computer may be programmed to analyse only a proportion of the code of the individual and comparable control, such as regions of code known or thought to comprise genomic variations associated with the phenotypic feature of interest. The degree of genomic variation between the codes in those regions may be recorded in the database.
  • the degree of genomic variation may be binary. For example, a SNP may be found, as compared to the control, in a particular location in part of the code that determines the genetic makeup of an individual. This may be recorded in the database as the ‘presence’ or ‘absence’ of a SNP.
  • the degree of genomic variation may be non-binary, that is quantitative, for example the number of mutations, size of insertion, number of CNVs, size of deletion, or graded, for example the degree of gene expression, methylation or non-coding RNA regulatory effect.
  • Non-binary genomic variation may be measured using a continuous scale. Accordingly, by the phrase “degree of genomic variation” as used herein is meant a quantitative and/or qualitative measurement of the difference between any aspect of the code that determines the genetic makeup of an individual compared to a comparable aspect of code from another individual or group of individuals.
  • the step of recording the degree of genomic variation in each individual in a database may occur as an integral part of the comparison step.
  • variant identifiers may be recorded in the database.
  • Examples of such identifiers include the accession number, the reference SNP cluster ID (rsID), TCGA IDs, the chromosomal location ID, ssID, HGVS, COSMIC ID, HGMD, ClinVar ID, UniProt ID, DGVa variant call ID and dbVar variant call.
  • Information about an individual, or each individual in the set of individuals may also be recorded.
  • Such information may include clinically relevant information and/or demographic data, for example, gender, weight, height, age, symptoms, date of onset of symptoms, drug or treatment regime.
  • Such information may be recorded in the database.
  • the location of the genomic variation is then determined.
  • the location of the genomic variation may be determined by comparing the genomic variation with a database of genetic information and/or by identifying the genomic variation within one or more databases of genetic information.
  • One or more of the databases may be publicly available or available to members of the research community.
  • the information provided in the database may be reordered or reconfigured, for example for ease of reference or use.
  • Comparison of the identified genomic variation with a database of genetic information for the purpose of determining the location of the genomic variation may be carried out by a computer using software adapted for the purpose. For example, the Regulatory Sequence Analysis Tool (RSAT) matrix-scan (Turatsinze et al., 2008) and ReMap with integrated ChIP-seq peak analysis (Cheneby et al., 2018) may be used.
  • RSAT Regulatory Sequence Analysis Tool
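  • the purpose of identifying the location of the genomic variation is to allow determination, by virtue of the location, of whether the genomic variation affects a gene product directly and/or regulates production of a gene product.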
  • the term “gene product” means an entity resulting from transcription, or transcription and translation of genetic code. It includes proteins and RNA, including non-coding RNAs (ncRNAs), rRNA, tRNA, tmRNA, antisense RNA, messenger RNA (mRNA), microRNA (miRNA), small nuclear RNA (snoRNA), short interfering RNA (siRNA), rasiRNA, piwi RNA (piRNA) and long non-coding RNA (lncRNA).
  • ncRNAs non-coding RNAs
  • mRNA messenger RNA
  • miRNA microRNA
  • siRNA small interfering RNA
  • piRNA piwi RNA
  • lncRNA long non-coding RNA
  • Gene expression is the process by which the code that determines the genetic makeup of an individual is used as a template for the synthesis of a gene product.
  • the process is comprised of two independent processes: transcription and translation.
  • a gene product can result from either transcription alone (an RNA species), or transcription followed by translation (a protein).
  • pre-mRNA precursor messenger RNA
  • RNA polymerase RNA polymerase
  • the pre-mRNA then undergoes further processing (splicing to remove non-coding introns, addition of a 5’ cap and a 3’ poly(A) tail) and in eukaryotes (and their viruses), the mature mRNA is translocated out of the nucleus (in prokaryotes and their viruses, transcription and translation occur simultaneously).
  • gene products that do not encode proteins are processed further and directed to their downstream pathways.
  • Translation occurs within the ribosome and is the process by which the mature mRNA is used as a template for the directed assembly of an amino acid or polypeptide chain. With the aid of chaperone proteins, the polypeptide chain folds into the characteristic three-dimensional structure of a protein.
  • the transcription or transcription and translation of gene products is tightly controlled. Ensuring the controlled transcription or transcription and translation of a gene is important for the maintenance of homeostasis within the cellular environment. This can be affected by the presence of a genomic variation within the code that determines the genetic makeup of an individual.
  • the resultant gene product will be modified as a result of the variation.
  • the phrase “affects a gene product directly” as used herein means the genomic variation is located within a coding region of a gene, and transcription, or transcription and translation of the coding sequence results in modification of the gene product in comparison to the gene product that would result from transcription or transcription and translation of the coding region without the genomic variation.
  • a genomic variation within a coding region for a mRNA could result in an amino acid substitution or an alteration to the length of the translated polypeptide.
  • the resultant protein would thus be modified in comparison to a protein resulting from transcription and translation of the coding region with an alternative genomic variant that does not lead to an amino acid substitution.
  • the genomic variation is present within the mRNA, in a target site for another gene product (such as but not limited to a miRNA).
  • the modification within the target site is recognised by miRNA, which then directs the gene product to an RNA degradation pathway.
  • genomic variation can be present on the miRNA and either changes the miRNA’s mature region responsible for binding to the target site on the mRNA, or changes the miRNA sequences targeted by lncRNAs or other regulatory RNAs or other competitive endogenous RNAs (ceRNAs).
  • genomic variations can occur outside of a coding region of a gene and have the effect of modifying the production of the gene product.
  • a genomic variation can also occur at an epigenetic site on the DNA in the promoter region or other regulatory regions, and thus modify the transcription rate of nearby gene(s).
  • the phrase “regulating production of a gene product” as used herein means that the genomic variation is located outside of a coding region of a gene, and within the promoter region or other regulatory region of said gene, and transcription or transcription and translation of the gene has the effect of modifying the production of the gene product.
  • the promoter region includes sequences recognised by gene products such as transcription factors and enhancers, which are required for the initiation of transcription. Genomic variations within this region can, for example, prevent the binding of the factors required for the activation of transcription.
  • a genomic variation within the promoter region can prevent the production of a transcript and thus prevent production of a gene product; or enhance the production of a transcript thus resulting in an increase in the gene product.
  • An epigenetic (methylation) marker within a promoter or other regulatory region as a result of a genomic variation can, for example, result in modification of the transcription rate of nearby gene(s).
  • the aberrant addition of a methyl group to cytosine (hyper-methylation) within CpG islands found in the promoter region of a gene can prevent the binding of transcription factors.
  • Methyl groups may also be removed (hypo-methylation), allowing transcription factors to bind and allow transcription to occur (for example removal of methylation on oncogenes promoting the development of cancer).
  • an epigenetic marker on key histone residues as a result of a genomic variation may modify the transcription rate of nearby gene(s); for example, the addition of a methyl group to histone H3 at lysine 27 causes re-modelling of the local chromatin structure and blocks access to the promoter region of a gene (heterochromatin state).
  • This results in gene silencing and prevents the transcription of a gene product.
  • histone modifications may also result in gene activation and local re-modelling of chromatin structure allowing transcription factors to bind (euchromatin state).
  • the genomic variation is thereby determined to be located either within a coding region or a non-coding region.
  • a genomic variation such as a single nucleotide polymorphism in a human DNA sequence may be located within a region which is known to encode for a protein; a further SNP within the DNA sequence may be found to be located within a region known to encode for a regulatory entity, such as miRNA or a transcription factor binding site.
  • Location of an epigenetic genomic variation may also be determined, for example remodelling of chromatin or altering/regulating expression of gene products in a specific location, for example by methylation of DNA at CpG sites or modification of histone residues.
  • the location of the genomic variation is recorded in the database.
  • the outcome may be, for example, production of a modified protein: a SNP present in the coding region of the first gene may result in an amino acid substitution in the resulting protein. The outcome may also be production of a modified miRNA: a SNP in the coding region of the first gene may result in a modified miRNA which may, for example, affect the half-life of mRNA and reduce gene expression. As a further example, the outcome of the genomic variation may be the presence or absence of an epigenetic marker within the promoter region of the first gene, for example, the addition of a methyl group to a cytosine base within the CpG region of the promoter of the first gene, which may cause the DNA to adopt a heterochromatin state, silencing the first gene and preventing transcription, or transcription and translation, of a gene product.
  • the first gene may be identified by assessing the flanking sequences of the genomic variation for transcription factor binding sites (TFBS) and/or miRNA target sites (miRNA-TS). Further to this, the effect of the genomic variation on the TFBS or miRNA-TS may be classified as loss or gain of binding site/target or a neutral change. The gene corresponding to the loss or gain effect may then be identified as the first gene.
  • the flanking sequences are 50 bases upstream and downstream of the genomic variation.
  • the gene product(s) encoded by the first gene may be recorded in the database.
  • the second gene may be referred to herein as “the effecting gene”.
  • the gene product of the second gene may have an effect on the product of the first gene.
  • a transcription factor that is encoded by a second gene may be the transcription factor required for the transcription of the first gene.
  • the gene product(s) encoded by the second gene may be recorded in the database.
  • One or more further details about the genomic variation may be recorded in the database, for example, the identification and/or name of the gene affected by the genomic variation.
  • a genetic profile may be generated for an individual, or for each individual, using the information obtained according to the methods discussed herein.
  • a genetic profile may be generated comprising one or more of: the degree of genomic variation; the location of the genomic variation; the outcome of the genomic variation; the first gene; the second gene; or the gene product(s) encoded by the first and/or second genes.
  • the genetic profile may also contain additional data such as transcriptomics and proteomics data. Individuals within a set of individuals may then be clustered based on similarities in their genetic profile.
  • the methods discussed herein provide a means of analysis which is applicable to many areas. They allow the grouping of individuals within a cohort on the basis of a profile created as a result of their genetic code.
  • they allow the grouping of individuals within a cohort having a particular phenotypic feature on the basis of common features identified in a profile created as a result of genomic variations in their genetic code. This may allow determination of common or consistent pathways affected, for example, common biological processes or pathways in such individuals, and thus allow identification of targets for treatment. Such information can be particularly relevant to disease or disease states or clinical or pathological conditions, for example where individuals with the condition or state display common phenotypic features but respond differently to the same treatment(s). Accordingly, there is provided a method of identifying, from a group of individuals with a particular phenotypic feature, individuals with one or more common genomic variations.
  • the term “genetic profile” means a profile created as a result of the genomic variations of said individual.
  • the term “individual” means any organism, for example eukaryotes such as animals, plants and protists; prokaryotes such as bacteria; Archaea; viruses and fungi.
  • the individual may be a plant or an animal.
  • the individual may be a host, for example to a microbe.
  • the individual may be a mammal.
  • the individual may be a human, a domesticated animal, a microbe or a bacterium.
  • the phenotypic feature may be a disease or clinical condition.
  • the phenotypic feature may be antibiotic resistance or virulence.
  • the phenotypic feature may be host-microbe interaction.
  • the method may allow extraction of meaningful information relevant to one or more phenotypic traits from genetic information.
  • the method may allow insight into a microbe-host relationship.
  • the method may allow identification of biological pathways and processes which had previously not been associated with a particular phenotypic feature, or individuals with a particular phenotypic feature, and/or biological pathways and processes which had previously not been associated with a particular cohort of individuals with a particular phenotypic feature. For example, in a cohort of human patients suffering from a particular pathological condition, it may have been found that certain individuals respond positively to a particular treatment, for example by demonstrating an improvement in one or more pathological markers, whilst other individuals within the cohort may show no response or limited response to the same treatment.
  • the methods described herein may be used to reveal the biological basis for the difference in response; for example, the individuals within the cohort who do not respond to the treatment may have one or more genomic variations which affect certain gene products associated with a particular cellular pathway, which means that treatment aimed at regulating that pathway will (or will not) be efficacious.
  • the methods described herein may inform the decision as to whether an individual in a group of individuals with a particular phenotypic feature may be a candidate for treatment in a particular manner. For example, the genetic profile of an individual may be established using the methods described herein, which may then be used to determine the susceptibility of that individual to a particular treatment.
  • UC ulcerative colitis
  • MAML2 Mastermind-like protein 2
  • MAML2 is known to activate the NOTCH1 receptor, thereby increasing activation of NOTCH1, a key player in the activation of inflammation in UC.
  • treatment with inhibitors of NOTCH1 or MAML2 would be appropriate.
  • the methods described herein may also allow identification of individuals with a phenotypic feature who do not have a genomic variation(s) known or thought to be associated with that phenotypic feature.
  • NFKB1 nuclear factor kappa light chain enhancer of activated B cells
  • PRKCB protein kinase C beta type; also denoted by PKCB
  • NFKB1 indirectly (Kang et al., 2001).
  • a prominent position for NFkB and PRKCB in ulcerative colitis would therefore be expected.
  • the methods described herein may allow identification of a patient with UC who does not have a genomic variation associated with an NFkB pathway member and may not, therefore, be an appropriate candidate for treatment with a drug which targets the NFkB pathway.
  • a Notch pathway inhibitor or MAML2 inhibitor for use in treating individuals with ulcerative colitis having a genomic variation, such as a gain of function mutation, in a Notch signalling pathway member, but not a genomic variation, such as a mutation, in an NFkB pathway member.
  • the Notch signalling pathway member may be MAML2, and/or the NFkB pathway member may be NFkB or PRKCB, and/or the inhibitor may be a gamma-secretase inhibitor.
  • a Notch pathway inhibitor or MAML2 inhibitor for use in a method of treating individuals with ulcerative colitis wherein the method comprises (i) determining whether a test sample from the patient comprises a genomic variation, such as a mutation, in a Notch signalling pathway member; and (ii) determining whether a test sample from the same individual does not comprise a genomic variation, such as a mutation, in an NFkB pathway member, establishing whether the genomic variation is a loss or gain of function variation, and if the test sample from the patient comprises a gain of function genomic variation in a Notch signalling pathway member and not a genomic variation in an NFkB pathway member, administering to the patient an effective amount of a Notch pathway or MAML2 inhibitor.
  • the Notch pathway member may be MAML2, and/or the NFkB pathway member may be NFkB or PRKCB, and/or the inhibitor may be a gamma-secretase inhibitor.
  • an inhibitor for use in treating individuals with ulcerative colitis having a genomic variation in a member of a specific pathway associated with a cell type wherein the genomic variation increases function of the cell type, and the inhibitor inhibits the function or number of the cell type, wherein the cell type is a fibroblast, myofibroblast, regulatory T-cell, B-cell, macrophage or dendritic cell; or an activator for use in treating individuals with ulcerative colitis having a genomic variation in a member of a specific pathway associated with a cell type, wherein the genomic variation decreases function of the cell type, and the activator increases the function or number of the cell type, wherein the cell type is a fibroblast, myofibroblast, regulatory T-cell, B-cell, macrophage or dendritic cell.
  • a B-cell inhibitor or activator for use in treating individuals with ulcerative colitis having a genomic variation, such as a mutation, in a B-cell pathway member.
  • a B-cell inhibitor or activator for use in a method of treating individuals with ulcerative colitis wherein the method comprises (i) determining whether a test sample from the patient comprises a genomic variation in a B-cell pathway member; (ii) if the test sample comprises a genomic variation in a B-cell pathway member establishing if the genomic variation is a gain or loss of function variation; (iii) if the test sample from the patient comprises: (a) a gain of function genomic variation, administering to the patient an effective amount of a B-cell inhibitor; or (b) a loss of function genomic variation, administering to the patient an effective amount of a B-cell activator.
  • a computer program which, when executed by a computer, causes the computer to perform the method.
  • a computer readable medium (which may be non-transitory) which stores the computer program.
  • apparatus, for example one or more computer systems, configured to perform the method.
  • the at least one computer system comprises at least one processor and memory.
  • a computer-readable table storing, for each of a plurality of genomic variations: an identity of the genomic variation, an identity of a product of an effecting gene, an identity of an interaction type, if present, an identity of one or more first genes directly affected by the genomic variation and an identity of the type of genomic variation and, for a set of individuals, a respective degree of variation of the genomic variation.
  • a system comprising at least one computer system and a database storing the table.
  • Figure 1 is a block diagram of a system for identifying contribution of a genomic variation to a phenotypic feature
  • Figure 2 is a process flow diagram of a method of identifying contribution of a genomic variation to a phenotypic feature
  • Figure 3 is a first table used in a method of identifying contribution of a genomic variation to a phenotypic feature
  • Figure 4 is a second table used in a method of identifying contribution of a genomic variation to a phenotypic feature
  • Figure 5 illustrates three examples of gene products
  • Figure 6 is a block diagram of a system for extracting and processing single nucleotide polymorphisms (SNPs) to uncover hidden pathways and to identify mediators;
  • SNPs single nucleotide polymorphisms
  • Figure 7 is a block diagram of a high-performance computing cluster used in extracting and processing single nucleotide polymorphisms to uncover hidden pathways and to identify mediators;
  • Figure 8 is a block diagram of a local computing device used in extracting and processing single nucleotide polymorphisms to uncover hidden pathways and to identify mediators;
  • Figure 10 illustrates an SNP matrix
  • Figure 11 illustrates an interaction matrix
  • Figure 12 schematically illustrates extraction of patient-specific UC-omes from a UC-ome.
  • GWAS genome-wide association studies
  • GWAS compare the distributions of SNPs to make meaningful conclusions and comparisons between genomes.
  • the study of genomic variation was limited to the study of genetic linkage within families to identify heritable traits and genetic disorders. Whilst this worked well for single gene conditions, it was more challenging to solve complex disease patterns with multiple allelic variants.
  • the concept of genetic association, wherein the frequencies of genetic variants of a particular allele are evaluated and compared between individuals with and without the phenotype of interest (e.g., a disease state), was proposed and led to the further development of GWAS.
  • the database PharmGKB facilitates the study of how different people respond to different pharmacological agents depending upon which genetic variant they possess.
  • COSMIC the Catalogue of Somatic Mutations in Cancer, curates data on the genetic variants amongst cancer types, enabling identification of common cancer sub-types and potentially informing treatment.
  • the “missing heritability” problem identifies that a single genomic variation may not account for much of a phenotypic feature.
  • This is a problem that has significant implications for medicine, since a person's susceptibility to disease may depend more on "the combined effect of all the genes in the background than on the disease genes in the foreground", or the role of particular genomic variations may have been overestimated.
  • a phenotypic feature may be determined by more than the concerted effect of disease associated genes, so that the direct role of individual genes/gene variants in determining phenotypic variation could prove insignificant.
  • it has been identified that not all genetic effects are easily attributed to the SNPs identified by GWAS.
  • genomic variants are often maintained at low frequencies by natural selection and would require WGS methods to identify the specific mutations.
  • GWAS also fails to address the combinatorial effect of very different genomic variants/genetic loci and the effect these have on the phenotypic output. Further, the presence of genomic variants that impact expression but have no impact on disease risk remains unclear. The effects of combinatorial genetic variants on important cellular networks and pathways are also uncertain.
  • there is provided herein a systems approach to enable the analysis of the effects of genomic variations on both the genome and the network of regulatory interactions. Such an approach can provide insight into the cumulative effects of genomic variations, including multiple regulatory genomic variations, on the central developmental and cell proliferation pathways in the core region of the network.
  • the system 1 may include first and second computer systems 2, 3 in communication with storage 4 which stores first and second tables 8, 9.
  • the first system 2 may take the form of a high-performance computer cluster and the second system 3 may take the form of a non-HPC system, such as a desktop computer.
  • the first system 2 includes first, second, third and fourth modules 21, 22, 23, 24.
  • the first module 21 determines a degree 31 of genomic variation 51 for a plurality of genomic variations 51 for each individual in a set of individuals and records the value 31 (i.e., the degree) in the first table 8 (step S1).
  • the degree 31 of genomic variation may be the presence or absence of the genomic variation 51 and so may be stored using ‘0’ or ‘1’.
  • the second module 22 determines the location of the genomic variation 51 which is stored in the second table 9 (step S2).
  • the third module 23 determines the identity of a first gene 52 directly affected by the genomic variation 51 and, if the genomic variation affects a gene product directly, records the gene, in the coding sequence of which the genomic variation is located, as the first gene (steps S3 to S5).
  • the fourth module 24 determines whether the genomic variation regulates the production of a gene product 53, 54 and, if so, records the gene, in the promoter region or other regulatory region of which the genomic variation is located, as the first gene (steps S6 to S7).
  • the fourth module 24 also determines the outcome of the genomic variation on the first gene and records the outcome of the genomic variation in the second table 7 (step S8).
  • the second system 3 includes fifth and sixth modules 25, 26.
  • the fifth module 25 determines whether one or more other gene products are present and, if so, records the identity of the gene encoding the one or more other gene products in the second table 7 (steps S9 & S10).
  • the one or more other gene products may interact with the first gene, may be encoded by the first gene and/or regulate the first gene.
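As an illustration only, the two tables and steps S1 to S10 described above could be held in memory along the following lines. This is a minimal sketch, not the applicant's implementation: the class `VariationRecord`, the functions `record_degree` and `annotate`, all field names and the example identifiers are hypothetical, and the degree of variation is assumed to be a 0/1 presence flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariationRecord:
    """One row of the second table (hypothetical field names)."""
    variation_id: str
    location: Optional[str] = None      # step S2, e.g. "coding" or "regulatory"
    first_gene: Optional[str] = None    # steps S3 to S7
    outcome: Optional[str] = None       # step S8, e.g. "gain" or "loss"
    second_genes: List[str] = field(default_factory=list)  # steps S9 and S10


# First table: individual -> variation -> degree (assumed 0 = absent, 1 = present).
first_table: Dict[str, Dict[str, int]] = {}


def record_degree(individual: str, variation_id: str, present: bool) -> None:
    """Step S1: store the degree of a variation for one individual."""
    first_table.setdefault(individual, {})[variation_id] = 1 if present else 0


def annotate(record: VariationRecord, location: str, gene: str,
             outcome: str, other_genes: List[str]) -> VariationRecord:
    """Steps S2 to S10: fill in one row of the second table."""
    record.location = location
    record.first_gene = gene
    record.outcome = outcome
    # Second genes are the other genes linked to the first gene, without duplicates.
    record.second_genes = sorted(set(other_genes) - {gene})
    return record
```

A row would then be built with, for example, `annotate(VariationRecord("snp_a"), "regulatory", "GENE1", "loss", ["GENE2"])`, where every identifier is a placeholder.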
  • the following example is provided to illustrate an embodiment of the present invention and should not be construed as limiting thereof.
  • Example 1 Identification of an aberrant Notch signalling pathway in a well-defined cohort of individuals suffering from ulcerative colitis
  • the system and method can be used to identify contribution of a genomic variation, in the form of processing single nucleotide polymorphisms (SNPs), to ulcerative colitis (UC).
  • CD Crohn’s disease
  • UC-associated SNPs and their associated ‘risk’ allele were identified using Immunochip data (Jostins et al., 2012) and the dbSNP ancestral allele database (Sherry et al., 2001).
  • UC-specific SNP data for 377 UC patients were compiled from seven centres across East Anglia, UK (Cambridge, Norwich, Ipswich, Welwyn Garden City, Luton, Bedford, and West Suffolk).
  • the location of the genomic variation was determined: the location of the SNPs was recorded as either exonic (missense, synonymous), intronic/non-translated regions or intergenic.
  • Flanking nucleotide sequences were obtained from dbSNP (Sherry et al., 2001). The analyzed SNPs are shown in Figure 9.
  • Assessing effect of SNPs on transcription factor binding sites and miRNA target sites
  • From the JASPAR database we downloaded 396 human transcription factors’ binding profiles represented by Position Specific Scoring Matrices (PSSMs) (Mathelier et al., 2016). The downloaded PSSMs in JASPAR format were converted to the TRANSFAC format to ease handling of results. To assess the effect of the SNP on the gain or loss of putative TF binding sites, flanking sequences 50 bases upstream and downstream of the SNPs were extracted.
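The extraction of 50 bases on either side of a SNP, and the construction of an ancestral and a variant copy of that window, can be sketched as follows. This is an assumption-laden illustration rather than the pipeline used in the source: the function names `flanking_window` and `substitute_allele` are hypothetical, positions are 0-based, and the window is simply truncated at the ends of the sequence.

```python
from typing import Tuple


def flanking_window(sequence: str, snp_index: int, flank: int = 50) -> Tuple[str, int]:
    """Return the window around a SNP and the SNP's offset inside that window."""
    start = max(0, snp_index - flank)
    end = min(len(sequence), snp_index + flank + 1)
    return sequence[start:end], snp_index - start


def substitute_allele(window: str, offset: int, allele: str) -> str:
    """Return a copy of the window carrying the given single-base allele."""
    return window[:offset] + allele + window[offset + 1:]


# Toy sequence with the SNP at index 60 (placeholder data, not from the study).
reference = "A" * 60 + "C" + "G" * 60
ancestral, offset = flanking_window(reference, 60)
variant = substitute_allele(ancestral, offset, "T")
```

Both `ancestral` and `variant` can then be passed to a motif scanner so that hits in the two alleles are compared like for like.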
  • PSSMs Position Specific Scoring Matrices
  • the Regulatory Sequence Analysis Tool (RSAT) matrix-scan (Turatsinze et al., 2008) was used to search for potential TFBS in the ancestral and patient-specific mutant alleles.
  • the background model estimation was determined by using residue probabilities from the input sequences with a Markov order of 1. The search was subject to both strands of the sequences. Hits with a P-value < 1e-05 were considered as putative binding sites. Other parameters were set at default values.
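Once both alleles have been scanned, gained and lost binding sites follow from a set comparison. The sketch below assumes each hit has already been parsed into a `(tf_name, start, end, p_value)` tuple with coordinates relative to the flanking window. That tuple layout, the function names and the strict `<` comparison are assumptions for illustration and do not reproduce the RSAT output format. Only hits that cover the SNP position are counted.

```python
from typing import Iterable, Set, Tuple

Hit = Tuple[str, int, int, float]  # (tf_name, start, end, p_value), hypothetical layout

P_VALUE_CUTOFF = 1e-5


def putative_sites(hits: Iterable[Hit], snp_offset: int,
                   cutoff: float = P_VALUE_CUTOFF) -> Set[str]:
    """Names of TFs with a significant hit whose span covers the SNP position."""
    return {tf for tf, start, end, p in hits
            if p < cutoff and start <= snp_offset <= end}


def gained_and_lost(ancestral_hits: Iterable[Hit], variant_hits: Iterable[Hit],
                    snp_offset: int) -> Tuple[Set[str], Set[str]]:
    """Return (gained, lost) TF binding sites for the variant allele."""
    before = putative_sites(ancestral_hits, snp_offset)
    after = putative_sites(variant_hits, snp_offset)
    return after - before, before - after
```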
  • the 22bp sequences of mature miRNAs were retrieved from miRBase (Kozomara and Griffiths-Jones, 2011).
  • flanking sequences of SNPs were assessed for the presence of miRNA target sites using miRanda (Enright et al., 2003). Hits predicted to occur in the seed region (2’-8’) of the miRNAs and with alignment scores ≥ 90 and energy threshold < -16 kcal/mol were considered as target sites. Other parameters were set to default settings. A final manual check was performed to ensure that the SNPs overlapped with the predicted TF or miRNA binding sites. We also considered gain or loss of the regulatory interactions between TFs and protein-coding genes in our analysis, where the protein-coding gene was within 10 kb upstream or downstream of the SNP-affected TFBS.
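The miRNA target-site criteria can be written as a simple filter over parsed hits. This is a hedged sketch: the `MirnaHit` record and its fields are hypothetical and do not mirror miRanda's output, and "occurring in the seed region" is interpreted here as the alignment covering miRNA positions 2 to 8. The direction of the energy comparison is also an assumption, because the symbol is damaged in the source text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MirnaHit:
    mirna: str
    score: float        # alignment score
    energy: float       # predicted free energy in kcal/mol
    mirna_start: int    # first miRNA position in the alignment (1-based)
    mirna_end: int      # last miRNA position in the alignment (1-based)


def is_target_site(hit: MirnaHit, min_score: float = 90.0,
                   max_energy: float = -16.0,
                   seed: Tuple[int, int] = (2, 8)) -> bool:
    """Apply the score, energy and seed-region criteria to one hit."""
    covers_seed = hit.mirna_start <= seed[0] and hit.mirna_end >= seed[1]
    return hit.score >= min_score and hit.energy < max_energy and covers_seed
```

The manual overlap check with the SNP position described in the text is not modelled here.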
  • We call the genes corresponding to such SNPs ‘SNP-affected genes’ from here onwards.
  • Protein-protein interactions of the proteins encoded by SNP-affected genes were obtained from OmniPath in January 2017 (Türei et al., 2016).
  • the set of proteins encoded by SNP-affected genes and their first interactors were defined as the UC-specific network footprint of a particular patient.
  • the union of all network footprints, the UC-ome, was analyzed and visualized in Cytoscape 3.3.0 (Su et al., 2014) using the inverted self-organizing map layout.
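The definitions of a patient's network footprint and of the UC-ome translate directly into set operations. The following is a minimal sketch under stated assumptions: interactions are taken to be an undirected edge list of protein pairs, and the function names and protein labels are placeholders rather than anything taken from OmniPath.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def first_neighbours(interactions: Iterable[Edge], proteins: Set[str]) -> Set[str]:
    """Proteins that interact directly with any protein in the given set."""
    neighbours: Set[str] = set()
    for a, b in interactions:
        if a in proteins:
            neighbours.add(b)
        if b in proteins:
            neighbours.add(a)
    return neighbours


def network_footprint(interactions: Iterable[Edge], snp_affected: Set[str]) -> Set[str]:
    """SNP-affected proteins of one patient plus their first interactors."""
    affected = set(snp_affected)
    return affected | first_neighbours(list(interactions), affected)


def union_of_footprints(footprints: Dict[str, Set[str]]) -> Set[str]:
    """Union of all patient footprints (the UC-ome in the text)."""
    result: Set[str] = set()
    for footprint in footprints.values():
        result |= footprint
    return result
```

Going the other way, a patient-specific footprint is the subset of the union reachable from that patient's own SNP-affected proteins.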
  • the scikit-learn package was used for hierarchical clustering (Pedregosa et al., 2011) of the patient-specific clusters.
  • the constructed distance matrix between patients was based on the Hamming distance (Hamming 1950). If a protein was directly or indirectly affected by a SNP, then it was assigned a “1” in a patient. If the protein was not affected, then it was scored as “0”. Multidimensional scaling was conducted in the KNIME environment using the MSA KNIME node (Berthold et al. 2008; Kruskal, 1964). We retained only the first three dimensions. The first two dimensions were plotted in Microsoft Excel. For the obtained clusters and sub-clusters we compared the occurrence of therapeutic upscaling performance with Fisher exact tests in Matlab version 8.4 (MATLAB, 2014).
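To make the distance and test concrete, the sketch below computes a Hamming distance matrix over binary patient profiles and a two-sided Fisher exact test for a 2x2 table, using only the standard library. It is not the scikit-learn, KNIME or Matlab code referred to in the text. The function names and toy profiles are hypothetical, and the distance is returned as a count of differing positions rather than a proportion.

```python
from math import comb
from typing import Dict, List, Sequence, Tuple


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary profiles differ."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(u, v))


def distance_matrix(profiles: Dict[str, Sequence[int]]) -> Tuple[List[str], List[List[int]]]:
    """Pairwise Hamming distances, with patients ordered by identifier."""
    ids = sorted(profiles)
    return ids, [[hamming(profiles[i], profiles[j]) for j in ids] for i in ids]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

In this setting the rows of the 2x2 table would be cluster membership and the columns the presence or absence of therapy escalation.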
  • a Random Forest classifier was implemented to predict therapeutic outcome.
  • the outcome variable was whether a patient needs additional immunomodulation therapy besides mesalazine, encoded as a binary variable, for the 370 patients for whom we had therapeutic information.
  • the SNP-affected genes and the first neighbour protein-protein interactors of multiple SNP-affected genes were the binary features and the redundant features were removed. There were 15 such genes/proteins.
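The text does not say how redundant features were identified. One plausible reading, shown here purely as an assumption, is that features whose binary columns are identical across all patients are collapsed to a single representative before training. The function name and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def drop_redundant_features(matrix: Sequence[Sequence[int]],
                            names: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Keep the first of each group of features with identical binary columns."""
    seen = set()
    kept: List[int] = []
    for j in range(len(names)):
        column = tuple(row[j] for row in matrix)
        if column not in seen:      # the first feature with this pattern is kept
            seen.add(column)
            kept.append(j)
    reduced = [[row[j] for j in kept] for row in matrix]
    return reduced, [names[j] for j in kept]
```

The reduced matrix and the binary outcome could then be passed to any Random Forest implementation. Which implementation and which hyperparameters the source used is not stated in this passage.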
  • the model was
  • UC-associated regulatory SNPs localized within transcription factor binding sites (TFBS) or miRNA target sites (miRNA-TS) based on integrating Immunochip data (Jostins et al., 2012) and the dbSNP ancestral allele database (Sherry et al., 2001) with regulatory network resources (see Methods for details).
  • TFBS transcription factor binding sites
  • miRNA-TS miRNA target sites
  • the UC network consisted of 247 protein nodes, 1269 protein-protein interactions, 631 TF-target gene and 66 miRNA-mRNA regulatory connections. The two most central proteins were NFKB1 and PRKCB, both of which are known to be involved in UC (Burkitt et al., 2015; Gould et al., 2016).
  • NFKB1 is the central mediator of inflammation (Caamano and Hunter, 2002) and PRKCB activates NFKB1 indirectly (Kang et al., 2001). The prominent position for NFKB1 and PRKCB in the UC network was therefore expected.
  • the UC network consists of six distinct but intertwined network modules according to Girvan-Newman clustering (Newman and Girvan, 2004). These modules are distinguishable by visualizing the whole network in a force-directed layout (Figure 2a). Each module is centred around a key signalling protein directly affected by a SNP (Figure 2b). The three most abundant modules are formed mainly by the interactors of 1) PRKCB and FCGR2A (88 proteins), 2) NFKB1 (51 proteins), and 3) the binding partners of LSP1 and GNA12, a module that contains many interactors of both NFKB1 and PRKCB (71 proteins).
  • HDAC7 histone deacetylase 7
  • DNMT3B DNA methyltransferase 3 beta
  • the sixth module contains members of the Notch pathway.
  • the Notch pathway is connected to the UC-ome through MAML2, an important NOTCH protein co-activator (McElhinny et al., 2008; Wu and Griffin, 2004).
  • MAML2 expression could be affected by loss of the miR-4495 target site as a result of SNP ⁇ 543104, which occurs in 40% of the examined patients.
  • NOTCH proteins are involved in gastrointestinal stem cell homeostasis, driving differentiation towards absorptive epithelial cells (rather than secretory cells, like goblet cells) (Chen et al., 2017; Katoh and Katoh, 2007).
  • While malfunction of these processes is associated with UC, the direct involvement of NOTCH proteins in UC pathogenesis is unclear (Kini et al., 2015).
  • the present method identified several major signalling proteins linked to UC with a novel systems level overview identifying cross-talk between these proteins and the pathways they reside within. In addition to connecting known components of UC, this systems genomics approach, by revealing SNP-affected signalling proteins and processes, extends our understanding of UC pathogenesis.
  • the network footprint of each patient contained the proteins encoded by the SNP-affected genes and the interactors of these proteins, i.e. their first neighbour proteins.
  • the first cluster contained the profiles for patients whose mutations were related to PRKCB, with the second cluster containing profiles for patients with mutations related to NFKB1.
  • in the third cluster, the profiles contained both PRKCB and NFKB1 SNPs, while the network footprints of the fourth cluster had neither PRKCB nor NFKB1 affected.
  • FCGR2A low affinity immunoglobulin gamma Fc region receptor II-a
  • MAML2 Mastermind-like protein 2
  • the model’s selection of MAML2 can be rationalized by its ability to bind NOTCH proteins. Through NOTCH proteins, MAML2 can modulate the activity of already described UC-associated pathways, including the NFKB pathway. Thus, an unbiased machine learning approach has also confirmed the importance of MAML2 in influencing therapeutic outcome, due to its role in regulating the Notch pathway.
  • NF-kappaB family of transcription factors central regulators of innate and adaptive immune functions. Clin Microbiol Rev 15,
  • Clinical guideline CG166 Ulcerative colitis: management (2013). National Institute for Health and Care Excellence. Core Team, R. (2015). R: A Language and Environment for Statistical Computing.
  • MicroRNA targets in Drosophila. Genome Biol 5, R1.
  • PKCbeta modulates antigen receptor signaling via regulation of Btk membrane localization.
  • MicroRNA-34b inhibits pancreatic cancer metastasis through repressing Smad3. Curr Mol Med 13, 467-478.
  • dbSNP the NCBI database of genetic variation. Nucleic Acids Res 29, 308-311.
  • MicroRNAs are differentially expressed in ulcerative colitis and alter expression of macrophage inflammatory peptide-2 alpha. Gastroenterology 135, 1624-1635.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Physiology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method of identifying the contribution of a genomic variation to a phenotypic feature. The method comprises determining a degree of a genomic variation (51; Fig.) in each individual in a set of individuals and recording the degree of genomic variation in each individual in a database, for example in a first table (6). The method comprises determining a location of the genomic variation so that it can be determined, from the location, whether the genomic variation affects a gene product directly and/or regulates the production of a gene product (53, 54; Fig. 5). The method comprises determining a first gene (51; Fig. 5) affected by the genomic variation. Where the genomic variation affects a gene product directly, the gene in whose coding sequence the genomic variation is located is recorded as the first gene, for example in a second table (7). Where the genomic variation regulates the production of a gene product, the gene in whose promoter region or other regulatory region the genomic variation is located is recorded as the first gene. The method comprises determining an outcome of the genomic variation on the first gene and recording the outcome of the genomic variation in the database, for example in the second table, and determining the presence or absence of at least one other gene product (i) interacting with the first gene, (ii) encoded by the first gene and/or (iii) regulating the first gene and, if present, recording the identity of the gene encoding the at least one other gene product in the database as a second gene, for example in the second table.
PCT/GB2019/053128 2018-11-05 2019-11-05 Genomic analysis Ceased WO2020095035A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1818024.0A GB2578727A (en) 2018-11-05 2018-11-05 Genomic analysis
GB1818024.0 2018-11-05

Publications (1)

Publication Number Publication Date
WO2020095035A1 true WO2020095035A1 (fr) 2020-05-14

Family

ID=64655494

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2019/053128 Ceased WO2020095035A1 (fr) 2018-11-05 2019-11-05 Genomic analysis

Country Status (2)

Country Link
GB (1) GB2578727A (fr)
WO (1) WO2020095035A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628685A (zh) * 2021-07-27 2021-11-09 广东省农业科学院水稻研究所 Genome-wide association analysis method based on comparison of multiple genomes and second-generation sequencing data

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102878B (zh) * 2020-09-16 2024-01-26 张云鹏 LncRNA learning system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144799A1 (en) * 2001-09-17 2003-07-31 Volker Nowotny Regulatory single nucleotide polymorphisms and methods therefor
US20130116930A1 (en) * 2011-08-22 2013-05-09 The Board Of Trustees Of The Leland Stanford Junior University Method and System for Assessment of Regulatory Variants in a Genome
US20160048634A1 (en) * 2013-03-15 2016-02-18 Ali Torkamani Systems and methods for genomic annotation and distributed variant interpretation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101395281B (zh) * 2006-01-04 2013-05-01 骆树恩 Methods for nucleic acid mapping and for identifying fine structural variations in nucleic acids, and uses thereof
CA2671267A1 (fr) * 2006-11-30 2008-06-05 Navigenics Inc. Methods and systems for genetic analysis
US8140270B2 (en) * 2007-03-22 2012-03-20 National Center For Genome Resources Methods and systems for medical sequencing analysis
EP2901345A4 (fr) * 2012-09-27 2016-08-24 Childrens Mercy Hospital System for genome analysis and genetic disease diagnosis

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144799A1 (en) * 2001-09-17 2003-07-31 Volker Nowotny Regulatory single nucleotide polymorphisms and methods therefor
US20130116930A1 (en) * 2011-08-22 2013-05-09 The Board Of Trustees Of The Leland Stanford Junior University Method and System for Assessment of Regulatory Variants in a Genome
US20160048634A1 (en) * 2013-03-15 2016-02-18 Ali Torkamani Systems and methods for genomic annotation and distributed variant interpretation

Non-Patent Citations (38)

* Cited by examiner, † Cited by third party
Title
"UniProt Consortium (2015). UniProt: a hub for protein information", NUCLEIC ACIDS RES, vol. 43, 2015, pages D204 - 12
BURKITT, M.D.HANEDI, A.F.DUCKWORTH, C.A.WILLIAMS, J.M.TANG, J.M.O'REILLY, L.A.PUTOCZKI, T.L.GERONDAKIS, S.DIMALINE, R.CAAMANO, J.H: "NF-κB1, NF-κB2 and c-Rel differentially regulate susceptibility to colitis-associated adenoma development in C57BL/6 mice", J PATHOL, vol. 236, 2015, pages 326 - 336
CAAMANO, J.HUNTER, C.A.: "NF-kappaB family of transcription factors: central regulators of innate and adaptive immune functions", CLIN MICROBIOL REV, vol. 15, 2002, pages 414 - 429
CROFT, D.MUNDO, A.F.HAW, R.MILACIC, M.WEISER, J.WU, G.CAUDY, M.GARAPATI, P.GILLESPIE, M.KAMDAR, M.R. ET AL.: "The Reactome pathway knowledgebase", NUCLEIC ACIDS RES, vol. 42, 2014, pages D472 - 7
DE LANGE, K.M.MOUTSIANAS, L.LEE, J.C.LAMB, C.A.LUO, Y.KENNEDY, N.A.JOSTINS, L.RICE, D.L.GUTIERREZ-ACHURY, J.JI, S.-G. ET AL.: "Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease", NAT GENET, vol. 49, 2017, pages 256 - 261
DEZSO MÓDOS ET AL: "Neighbours of cancer-related proteins have key influence on pathogenesis and could increase the drug target space for anticancer therapies", NPJ SYSTEMS BIOLOGY AND APPLICATIONS, vol. 3, no. 1, 24 January 2017 (2017-01-24), pages 1 - 13, XP055660890, DOI: 10.1038/s41540-017-0003-6 *
ENRIGHT, A.J.JOHN, B.GAUL, U.TUSCHL, T.SANDER, C.MARKS, D.S.: "MicroRNA targets in Drosophila", GENOME BIOL, vol. 5, 2003, pages R1, XP021012829, DOI: 10.1186/gb-2003-5-1-r1
GINI, C., VARIABILITÀ E MUTABILITÀ (BOLOGNA: C. CUPPINI, 1912)
GONG, Y.WU, C.N.XU, J.FENG, G.XING, Q.H.FU, W.LI, C.HE, L.ZHAO, X.Z.: "Polymorphisms in microRNA target sites influence susceptibility to schizophrenia by altering the binding of miRNAs to their targets", EUR NEUROPSYCHOPHARMACOL, vol. 23, 2013, pages 1182 - 1189
GOULD, N.J.DAVIDSON, K.L.NWOKOLO, C.U.ARASARADNAM, R.P.: "A systematic review of the role of DNA methylation on inflammatory genes in ulcerative colitis", EPIGENOMICS, vol. 8, 2016, pages 667 - 684
HUANG, H.FANG, M.JOSTINS, L.UMICEVIC MIRKOV, M.BOUCHER, G.ANDERSON, C.A.ANDERSEN, V.CLEYNEN, I.CORTES, A.CRINS, F. ET AL.: "Fine-mapping inflammatory bowel disease loci to single-variant resolution", NATURE, vol. 547, 2017, pages 173 - 178
JIANMING WU ET AL: "Human FasL Gene Is a Target of [beta]-Catenin/T-Cell Factor Pathway and Complex FasL Haplotypes Alter Promoter Functions", PLOS ONE, vol. 6, no. 10, 11 October 2011 (2011-10-11), pages e26143, XP055661194, DOI: 10.1371/journal.pone.0026143 *
JOSTINS, L.RIPKE, S.WEERSMA, R.K.DUERR, R.H.MCGOVERN, D.P.HUI, K.Y.LEE, J.C.SCHUMM, L.P.SHARMA, Y.ANDERSON, C.A. ET AL.: "Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease", NATURE, vol. 491, 2012, pages 119 - 124, XP055484363, DOI: 10.1038/nature11582
KANG, S.W.WAHL, M.I.CHU, J.KITAURA, J.KAWAKAMI, Y.KATO, R.M.TABUCHI, R.TARAKHOVSKY, A.KAWAKAMI, T.TURCK, C.W. ET AL.: "PKCbeta modulates antigen receptor signaling via regulation of Btk membrane localization", EMBO J, vol. 20, 2001, pages 5692 - 5702
KATOH, M.KATOH, M.: "Notch signaling in gastrointestinal tract (review)", INT J ONCOL, vol. 30, 2007, pages 247 - 251, XP009100925
KENT, W.J.SUGNET, C.W.FUREY, T.S.ROSKIN, K.M.PRINGLE, T.H.ZAHLER, A.M.HAUSSLER, D.: "The human genome browser at UCSC", GENOME RES, vol. 12, 2002, pages 996 - 1006, XP007901725, DOI: 10.1101/gr.229102. Article published online before print in May 2002
KIM, Y.S.HO, S.B.: "Intestinal goblet cells and mucins in health and disease: recent insights and progress", CURR GASTROENTEROL REP, vol. 12, 2010, pages 319 - 330
KINI, A.T.THANGARAJ, K.R.SIMON, E.SHIVAPPAGOWDAR, A.THIAGARAJAN, D.ABBAS, S.RAMACHANDRAN, A.VENKATRAMAN, A.: "Aberrant niche signaling in the etiopathogenesis of ulcerative colitis", INFLAMM BOWEL DIS, vol. 21, 2015, pages 2549 - 2561
KOZOMARA, A.GRIFFITHS-JONES, S.: "miRBase: integrating microRNA annotation and deep-sequencing data", NUCLEIC ACIDS RES, vol. 39, 2011, pages D152 - 7
KRUSKAL, J.B.: "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", PSYCHOMETRIKA, vol. 29, 1964, pages 1 - 27, XP008130844
LAZAR ET AL., MOL. SYS. BIOL., 2013
LAZAR ET AL., NATURE COMM., 2014
LIU, C.CHENG, H.SHI, S.CUI, X.YANG, J.CHEN, L.CEN, P.CAI, X.LU, Y.WU, C. ET AL.: "MicroRNA-34b inhibits pancreatic cancer metastasis through repressing Smad3", CURR MOL MED, vol. 13, 2013, pages 467 - 478
MATHELIER, A.FORNES, O.ARENILLAS, D.J.CHEN, C.-Y.DENAY, G.LEE, J.SHI, W.SHYR, C.TAN, G.WORSLEY-HUNT, R. ET AL.: "JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles", NUCLEIC ACIDS RES, vol. 44, 2016, pages D110 - 5
MCELHINNY, A.S.LI, J.L.WU, L.: "Mastermind-like transcriptional co-activators: emerging roles in regulating cross talk among multiple signaling pathways", ONCOGENE, vol. 27, 2008, pages 5138 - 5147
MORRIS, J.H.APELTSIN, L.NEWMAN, A.M.BAUMBACH, J.WITTKOP, T.SU, G.BADER, G.D.FERRIN, T.E.: "clusterMaker: a multi-algorithm clustering plugin for Cytoscape", BMC BIOINFORMATICS, vol. 12, 2011, pages 436, XP021093164, DOI: 10.1186/1471-2105-12-436
NEWMAN, M.E.J.GIRVAN, M.: "Finding and evaluating community structure in networks", PHYS REV E STAT NONLIN SOFT MATTER PHYS, vol. 69, 2004, pages 026113
OZ, L MOL. BIOL. AND EVOL., 2014
PATRICIA SARLOS ET AL: "Genetic update on inflammatory factors in ulcerative colitis: Review of the current literature", WORLD JOURNAL OF GASTROINTESTINAL PATHOPHYSIOLOGY, vol. 5, no. 3, 1 January 2014 (2014-01-01), pages 304, XP055498556, ISSN: 2150-5330, DOI: 10.4291/wjgp.v5.i3.304 *
PEDREGOSA, F.VAROQUAUX, G.GRAMFORT, A.MICHEL, V.THIRION, B.GRISEL, O.BLONDEL, M.PRETTENHOFER, P.WEISS, R.DUBOURG, V. ET AL.: "Scikit-learn: Machine Learning in Python", JOURNAL OF MACHINE LEARNING RESEARCH, 2011
PRAGER, M.BUETTNER, J.BUENING, C.: "Genes involved in the regulation of intestinal permeability and their role in ulcerative colitis", J DIG DIS, vol. 16, 2015, pages 713 - 722
SHERRY, S.T.WARD, M.H.KHOLODOV, M.BAKER, J.PHAN, L.SMIGIELSKI, E.M.SIROTKIN, K.: "dbSNP: the NCBI database of genetic variation", NUCLEIC ACIDS RES, vol. 29, 2001, pages 308 - 311, XP055125042, DOI: 10.1093/nar/29.1.308
TÜREI, D.KORCSMAROS, T.SAEZ-RODRIGUEZ, J.: "OmniPath: guidelines and gateway for literature-curated signaling pathway resources", NAT METHODS, vol. 13, 2016, pages 966 - 967
TURATSINZE, J.-V.THOMAS-CHOLLIER, M.DEFRANCE, M.VAN HELDEN, J.: "Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules", NAT PROTOC, vol. 3, 2008, pages 1578 - 1588
WU, F.HUANG, Y.DONG, F.KWON, J.H.: "Ulcerative Colitis-Associated Long Noncoding RNA, BC012900, Regulates Intestinal Epithelial Cell Apoptosis", INFLAMM BOWEL DIS, vol. 22, 2016, pages 782 - 795
WU, F.ZIKUSOKA, M.TRINDADE, A.DASSOPOULOS, T.HARRIS, M.L.BAYLESS, T.M.BRANT, S.R.CHAKRAVARTI, S.KWON, J.H.: "MicroRNAs are differentially expressed in ulcerative colitis and alter expression of macrophage inflammatory peptide-2 alpha", GASTROENTEROLOGY, vol. 135, 2008, pages 1624 - 1635
WU, L.GRIFFIN, J.D.: "Modulation of Notch signaling by mastermind-like (MAML) transcriptional co-activators and their involvement in tumorigenesis", SEMIN CANCER BIOL, vol. 14, 2004, pages 348 - 356
YUNGUO GONG ET AL: "Polymorphisms in microRNA target sites influence susceptibility to schizophrenia by altering the binding of miRNAs to their targets", EUROPEAN NEUROPSYCHOPHARMACOLOGY., vol. 23, no. 10, 1 October 2013 (2013-10-01), NL, pages 1182 - 1189, XP055660932, ISSN: 0924-977X, DOI: 10.1016/j.euroneuro.2012.12.002 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
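CN113628685A (en) * 2021-07-27 2021-11-09 Rice Research Institute, Guangdong Academy of Agricultural Sciences A genome-wide association analysis method based on multiple-genome comparison and next-generation sequencing data
CN113628685B (en) * 2021-07-27 2022-03-15 Rice Research Institute, Guangdong Academy of Agricultural Sciences A genome-wide association analysis method based on multiple-genome comparison and next-generation sequencing data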

Also Published As

Publication number Publication date
GB201818024D0 (en) 2018-12-19
GB2578727A (en) 2020-05-27

Similar Documents

Publication Publication Date Title
Zhang et al. Genetic analyses support the contribution of mRNA N 6-methyladenosine (m6A) modification to human disease heritability
Levy et al. Novel diagnostic DNA methylation episignatures expand and refine the epigenetic landscapes of Mendelian disorders
Hannon et al. Leveraging DNA-methylation quantitative-trait loci to characterize the relationship between methylomic variation, gene expression, and complex traits
Chowdhary et al. Long non-coding RNAs: mechanisms, experimental, and computational approaches in identification, characterization, and their biomarker potential in cancer
US11367508B2 (en) Systems and methods for detecting cellular pathway dysregulation in cancer specimens
Reddy et al. Genetic and functional drivers of diffuse large B cell lymphoma
Lee et al. Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage
Mossman et al. Mitochondrial-nuclear interactions mediate sex-specific transcriptional profiles in Drosophila
Yoon et al. Genetics and regulatory impact of alternative polyadenylation in human B-lymphoblastoid cells
WO2021119311A1 (en) Systems and methods for predicting homologous recombination deficiency status of a specimen
Larson et al. Comprehensively evaluating cis-regulatory variation in the human prostate transcriptome by using gene-level allele-specific expression
Rheinbay et al. Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes
Mortazavi et al. SNPs, short tandem repeats, and structural variants are responsible for differential gene expression across C57BL/6 and C57BL/10 substrains
Oak et al. Framework for microRNA variant annotation and prioritization using human population and disease datasets
Clark et al. Novel and haplotype specific microRNAs encoded by the major histocompatibility complex
EP3976829A1 (en) Method of treatment or prophylaxis
WO2020095035A1 (en) Genomic analysis
Berthold et al. Bridging the gap: Short structural variants in the genetics of anorexia nervosa
Aydın et al. A hybrid approach to assess the structural impact of long noncoding RNA mutations uncovers key NEAT1 interactions in colorectal cancer
Moody et al. Profiling of transcribed cis-regulatory elements in single cells
Elfman et al. Discovery of a polymorphic gene fusion via bottom-up chimeric RNA prediction
Memon et al. In silico prediction of housekeeping long intergenic non-coding RNAs reveals HKlincR1 as an essential player in lung cancer cell survival
Wu et al. Identification of infertility-associated topologically important genes using weighted co-expression network analysis
Hamba et al. Topologically associating domain underlies tissue specific expression of long intergenic non-coding RNAs
Morales-Vicente et al. The human developing cerebral cortex is characterized by an elevated de novo expression of long noncoding RNAs in excitatory neurons

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19804775

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19804775

Country of ref document: EP

Kind code of ref document: A1