WO2021061697A1 - Predicting neonatal complications using genetic variation - Google Patents

Predicting neonatal complications using genetic variation

Info

Publication number
WO2021061697A1
WO2021061697A1 PCT/US2020/052099 US2020052099W WO2021061697A1 WO 2021061697 A1 WO2021061697 A1 WO 2021061697A1 US 2020052099 W US2020052099 W US 2020052099W WO 2021061697 A1 WO2021061697 A1 WO 2021061697A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
exome
sequencing
genes
accumulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/052099
Other languages
French (fr)
Inventor
Leif NELIN
William CL STEWART
Mark KLEBANOFF
Komia GNONA
Irina A. Buhimschi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nationwide Childrens Hospital Inc
Original Assignee
Nationwide Childrens Hospital Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • NC neonatal complications
  • BPD bronchopulmonary dysplasia
  • ROP retinopathy of prematurity
  • NC may also be influenced by gestational age (GA), sex, and race.
  • GA gestational age
  • Draper et al. showed that GA is negatively correlated with risk for NC.
  • Trembath A et al. showed that preterm males have increased risk for NC relative to preterm females.
  • Ryan et al. showed that, conditional on GA, Non-Hispanic White infants have increased risk relative to Non-Hispanic Black infants. Ryan et al., J Pediatr, 207:130-135 (2019).
  • a variety of methods are used for the screening of newborns for various genetic conditions. These tests typically involve separate assays for particular metabolite or enzymatic activity associated with particular diseases.
  • screening of premature infants is complex, and test parameters are not optimized for these patient groups. For example, newborn intensive care unit infants are more likely to generate false positive or negative results. Accordingly, improved methods for identifying premature infants at risk of neonatal complications are needed.
  • Figure 1 provides a diagram summarizing the inclusion and exclusion criteria used for selecting subjects for exome sequencing.
  • Figure 2 provides Manhattan plots of -log10 p-values obtained from the logistic regression of preterm infant status onto burden. After correcting for multiple tests, no single gene in Non-Hispanic White (left) or Non-Hispanic Black (right) preterm infants is statistically significant at the exome-wide level (dashed line).
  • Figure 3 provides graphs showing the exome-wide distribution of burden-based p-values by gene in Non-Hispanic White (NHW) preterm infants (left panel), and Non-Hispanic Black (NHB) preterm infants (right panel).
  • Figure 4 provides a graph showing ROC curves for three predictors of NC: Polygenic Risk Score (PRS) + Gestational Age (GA) (solid), GA alone (dashed), and PRS alone (dot-dashed), together with a random predictor (dotted).
  • the corresponding AUCs are 96%, 84%, 78%, and 50%, respectively.
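The exome-wide significance line in a Manhattan plot is often derived from a Bonferroni correction. The source does not say which correction was applied, so the sketch below assumes Bonferroni purely for illustration; the gene count and the p-values are made-up values, not data from the study.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def neg_log10(p: float) -> float:
    """Height of a point (or of the threshold line) on a Manhattan plot."""
    return -math.log10(p)

# Hypothetical inputs: 20,000 genes tested at a family-wise alpha of 0.05.
n_genes = 20_000
threshold = bonferroni_threshold(0.05, n_genes)
line_height = neg_log10(threshold)  # where the dashed line would sit

# Hypothetical per-gene p-values; a gene is a hit only if p < threshold.
p_values = {"GENE_A": 3e-4, "GENE_B": 0.02, "GENE_C": 1e-5}
significant = [gene for gene, p in p_values.items() if p < threshold]

print(f"threshold={threshold:.2e}, line at {line_height:.2f}, hits={significant}")
```

With these illustrative numbers no gene clears the line, which mirrors the situation described for Figure 2.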
  • the average AUC of PRS+GA dropped to 87%
  • the average AUC of PRS alone dropped to 67%.
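The AUC values above can be read as the probability that a randomly chosen infant with NC receives a higher score than a randomly chosen infant without NC. The sketch below computes AUC directly from that pairwise definition; the function name and the scores are hypothetical and are not taken from the study.

```python
def auc(scores_cases, scores_controls):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case scores higher, 0.5 on a tie, 0 otherwise.
    """
    wins = 0.0
    for case_score in scores_cases:
        for control_score in scores_controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))

# Hypothetical predictor scores (e.g. from a PRS + GA model).
cases = [0.9, 0.8, 0.55]
controls = [0.6, 0.3, 0.2]
print(auc(cases, controls))  # 8 of 9 pairs are ranked correctly
```

A predictor that ranks at random gives 0.5 under this definition, which is why the random curve corresponds to 50%.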
  • the present invention relates to a method of predicting the likelihood that a premature infant will develop a neonatal complication.
  • the method includes conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.
  • diagnosis can encompass determining the likelihood that a subject will develop a disease, or the existence or nature of disease in a subject.
  • diagnosis as used herein also encompasses determining the severity and probable outcome of disease or episode of disease or prospect of recovery, which is generally referred to as prognosis.
  • diagnosis can also encompass diagnosis in the context of rational therapy, in which the diagnosis guides therapy, including initial selection of therapy, modification of therapy (e.g., adjustment of dose or dosage regimen), and the like.
  • treatment refers to obtaining a desired pharmacologic or physiologic effect.
  • the effect may be therapeutic in terms of a partial or complete cure for a disease or an adverse effect attributable to the disease.
  • Treatment covers any treatment of a disease in a mammal, particularly in a human, and can include inhibiting the disease or condition, i.e., arresting its development; and relieving the disease, i.e., causing regression of the disease.
  • Prevention or prophylaxis refers to preventing the disease or a symptom of a disease from occurring in a subject which may be predisposed to the disease but has not yet been diagnosed as having it (e.g., including diseases that may be associated with or caused by a primary disease). Prevention may include completely or partially preventing a disease or symptom.
  • exons mean short, functionally important sequences of DNA which represent the regions in genes that are translated into protein and the untranslated region (UTR) flanking them.
  • exome sequencing means an efficient strategy to selectively sequence the coding regions of the genome as a cheaper but still effective alternative to whole genome sequencing.
  • UTRs are usually not included in exome studies. In the human genome there are about 180,000 exons: these constitute about 1% of the human genome, which translates to about 30 megabases (Mb) in length. It is estimated that the protein coding regions of the human genome harbor about 85 percent of the disease-causing mutations.
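The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the three numbers stated in the text (about 180,000 exons, about 1% of the genome, about 30 Mb); the derived genome size and mean exon length are consequences of those rounded figures, not values given by the source.

```python
# Figures as stated in the text (all approximate).
n_exons = 180_000
exome_bp = 30_000_000     # about 30 Mb
exome_fraction = 0.01     # about 1% of the genome

# Implied total genome size: 30 Mb is 1% of roughly 3,000 Mb.
genome_bp = exome_bp / exome_fraction

# Implied mean exon length if 180,000 exons span 30 Mb.
mean_exon_bp = exome_bp / n_exons

print(f"implied genome size: {genome_bp / 1e9:.1f} Gb")
print(f"implied mean exon length: {mean_exon_bp:.0f} bp")
```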
  • the term "gene," as used herein, means one or more sequence(s) of nucleotides in a genome that together encode one or more expressed molecule, e.g., an RNA, or polypeptide.
  • the gene can include coding sequences that are transcribed into RNA which may then be translated into a polypeptide sequence, and can include associated structural or regulatory sequences that aid in replication or expression of the gene.
  • the term "genotype" refers to the alleles present in DNA from a subject or patient, where an allele can be defined by the particular nucleotide(s) present in a nucleic acid sequence at a particular site(s). Often a genotype is the nucleotide(s) present at a single polymorphic site known to vary in the human population.
  • allele refers to a variant form of a given gene.
  • the variation relates to the nucleotide sequence of the gene, and may or may not result in different observable phenotypic traits.
  • the sequence variants may be single or multiple base changes, including without limitation insertions, deletions, or substitutions, or may be a variable number of sequence repeats.
  • Allele frequency (i.e., allele accumulation) refers to the fraction of gene copies that are of a particular allele in a defined population.
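As a minimal sketch of this definition, assume diploid genotypes coded as the number of copies of the counted allele per individual (0, 1, or 2). That coding, the function names, and the sample data are illustrative assumptions, not part of the source.

```python
def allele_frequency(genotypes):
    """Fraction of gene copies that carry the counted allele.

    Each diploid individual contributes two gene copies.
    """
    return sum(genotypes) / (2 * len(genotypes))

def minor_allele_frequency(genotypes):
    """Frequency of whichever of the two alleles is rarer."""
    freq = allele_frequency(genotypes)
    return min(freq, 1.0 - freq)

# Hypothetical population of 8 individuals at one biallelic site.
genotypes = [0, 1, 0, 2, 0, 0, 1, 0]
print(allele_frequency(genotypes))  # 4 of 16 gene copies
```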
  • “Increased risk” or “increased likelihood” refers to a statistically higher frequency of occurrence of the disease or condition in an individual carrying a higher minor allele burden in comparison to the frequency of occurrence of the disease or condition in a member of a population that does not carry the higher minor allele burden.
  • the increased risk can be referred to as a percentage increase.
  • an increased risk can refer to a 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300% or higher frequency of occurrence.
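One way to read "percentage increase" is as a relative increase in the frequency of occurrence between the high-burden group and the reference group. The sketch below assumes that reading; the function name and the example frequencies are hypothetical.

```python
def percent_increase(freq_high_burden: float, freq_reference: float) -> float:
    """Relative increase in frequency of occurrence, in percent."""
    return 100.0 * (freq_high_burden - freq_reference) / freq_reference

# Hypothetical frequencies: 30% of high-burden infants develop the
# condition versus 20% of the reference group, a 50% relative increase.
print(round(percent_increase(0.30, 0.20), 1))
```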
  • nucleic acid refers to polynucleotides or oligonucleotides such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA).
  • DNA deoxyribonucleic acid
  • RNA ribonucleic acid
  • the term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs (e.g., peptide nucleic acids) and as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides.
  • polymorphism refers to the coexistence of more than one form of a gene or portion (e.g., allelic variant) thereof.
  • a portion of a gene of which there are at least two different forms, i.e., two different nucleotide sequences, is referred to as a "polymorphic region of a gene".
  • a specific genetic sequence at a polymorphic region of a gene is an allele.
  • a polymorphic region can be a single nucleotide, the identity of which differs in different alleles.
  • a polymorphic region can also be several nucleotides long.
  • One aspect of the invention provides a method of predicting the likelihood that a premature infant will develop a neonatal complication. Predicting includes calculating or estimating the probability that neonatal complications will occur later in time. The method includes the steps of conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.
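The characterization step can be sketched as follows, assuming that "minor allele accumulation" is measured as a simple count of minor alleles across the sequenced genes and that "average" means the mean burden in a reference cohort. Both are illustrative assumptions; the source does not fix the exact burden statistic here, and the gene names and numbers are made up.

```python
from statistics import mean

def minor_allele_burden(genotypes_by_gene):
    """Total number of minor alleles carried across all listed genes.

    genotypes_by_gene maps a gene name to per-variant minor allele
    counts (0, 1, or 2) for one infant.
    """
    return sum(sum(calls) for calls in genotypes_by_gene.values())

def has_increased_risk(infant_burden, reference_burdens):
    """True if the infant's burden is higher than the reference average."""
    return infant_burden > mean(reference_burdens)

# Hypothetical exome calls for one infant and burdens for a reference cohort.
infant = {"GENE_A": [0, 1, 2], "GENE_B": [1, 0], "GENE_C": [0, 0, 1]}
reference = [3, 4, 2, 5]

burden = minor_allele_burden(infant)
print(burden, has_increased_risk(burden, reference))
```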
  • Biological samples to be analyzed using the methods provided herein can be derived from biological material including nucleic acid such as a cell or tissue sample.
  • a biological sample can be obtained in some cases from a hospital, laboratory, clinical or medical laboratory.
  • the biological sample can comprise various types of nucleic acid, e.g., RNA or DNA.
  • Nucleic acids can be extracted from a biological sample by methods known to those of ordinary skill in the art.
  • the sample can be aqueous humor, vitreous humor, bile, whole blood, blood serum, blood plasma, breast milk, cerebrospinal fluid, cerumen, lymphatic fluid, gastric juice, mucus, peritoneal fluid, saliva, sebum, semen, sweat, perspiration, tears, vaginal secretion, vomit, feces, or urine.
  • the sample can be obtained from a hospital, laboratory, clinical or medical laboratory.
  • the nucleic acid of the sample can be, e.g., mitochondrial DNA, genomic DNA, mRNA, siRNA, miRNA, cRNA, single-stranded DNA, double-stranded DNA, single-stranded RNA, double-stranded RNA, tRNA, rRNA, or cDNA.
  • the sample can comprise cell-free nucleic acid.
  • the sample can be a cell line, genomic DNA, cell-free plasma, formalin fixed paraffin embedded (FFPE) sample, or flash frozen sample.
  • FFPE formalin fixed paraffin embedded
  • a formalin fixed paraffin embedded sample can be deparaffinized before nucleic acid is extracted.
  • the biological sample is a blood sample.
  • the sample may be processed to render it competent for amplification.
  • Exemplary sample processing can include lysing cells of the sample to release nucleic acid, purifying the sample (e.g., to isolate nucleic acid from other sample components, which may inhibit amplification), diluting/concentrating the sample, and/or combining the sample with reagents for amplification, such as a DNA/RNA polymerase (e.g., a heat-stable DNA polymerase for PCR amplification), dNTPs (e.g., dATP, dCTP, dGTP, and dTTP (and/or dUTP)), a primer set for each allele sequence or polymorphic locus to be amplified, probes (e.g., fluorescent probes, such as TAQMAN probes or molecular beacon probes, among others) capable of hybridizing specifically to each allele sequence to be amplified, Mg²⁺, DMSO, BSA, a buffer, or any combination thereof, among others.
  • the method further includes the step of obtaining a biological sample from the premature infant.
  • Methods of obtaining a biological sample from the subject include use of a needle stick, needle biopsy, swab, and the like.
  • the biological sample is a blood sample, which may be obtained for example by venipuncture.
  • the biological sample is obtained in utero.
  • an in utero sample can be obtained using percutaneous umbilical cord blood sampling.
  • Neonatal Complications
  • A variety of neonatal complications are known to be associated with premature birth. These include neurological problems such as retinopathy of prematurity, cardiovascular complications such as intraventricular hemorrhage, respiratory problems such as bronchopulmonary dysplasia, gastrointestinal and metabolic issues such as necrotizing enterocolitis, hematological complications such as anemia of prematurity, and infection.
  • the neonatal complication is selected from the group consisting of bronchopulmonary dysplasia, intraventricular hemorrhage, retinopathy of prematurity, and necrotizing enterocolitis.
  • Bronchopulmonary dysplasia is a chronic lung disease which mainly arises in premature children with very low weight at birth. It may result in long-lasting damage to the lung that persists into early adulthood or, in the case of progressive pulmonary change, it may lead to death.
  • a premature lung with a lack of surfactant is a significant risk factor for BPD.
  • Children who are under artificial ventilation over a long period of time, e.g., to treat infant respiratory distress syndrome (IRDS), are also at risk of developing BPD.
  • Development of BPD is the result of reorganization processes, with inflammatory formation of connective tissue after initial water retention in a premature lung which is exposed to chemical (oxygen radicals), mechanical (pressure trauma, volume trauma), and biological (microbial agent) damage.
  • BPD in very early preterm infants, born at a gestational age of less than 28 weeks, is characterized in particular by a disorder or an arrest of the formation of the alveoli, whereby the total area available for diffusion of oxygen and carbon dioxide is reduced.
  • the respiratory tract is also affected: it becomes narrowed, which increases the airway resistance. The blood vessels of the lung are affected as well, and the resulting vasoconstriction may cause an increased pressure in the pulmonary circulation and a strain on the right ventricle.
  • Clinical symptoms of BPD may include an increased breathing rate; deeper, labored breathing with chest retractions; increased bronchial secretion; growth retardation; and bluish skin and mucosa.
  • In the X-ray image of the lung, areas of overinflation next to areas with insufficient ventilation, as well as fibrotic connective tissue remodeling, can be found, inter alia.
  • Diagnosis and classification are made on the basis of the oxygen requirement which is necessary at a certain age of the child for a sufficient oxygen saturation of the blood. A distinction is made between a mild, a moderate, and a severe progressive form.
  • BPD is treated by the administration of oxygen to maintain the physiological oxygen saturation of the blood.
  • Corticosteroids counteract the chronic inflammatory process.
  • Diuretic medications may be used to treat pulmonary edema. In case of a narrowing of the respiratory tract, the inhalation of bronchospasmolytics may be considered.
  • Vasodilating medications may decrease any increased pressure in the pulmonary circulation.
  • IVH intraventricular hemorrhage
  • Intraventricular hemorrhage (IVH), also known as intraventricular bleeding, is a bleeding into the brain's ventricular system, where the cerebrospinal fluid is produced and circulates through towards the subarachnoid space.
  • IVH in the preterm brain usually arises from the germinal matrix, whereas IVH in term infants originates from the choroid plexus. IVH is particularly common in premature infants or those of very low birth weight. The cause of IVH in premature infants, unlike that in older infants, children or adults, is rarely trauma. Instead it is thought to result from changes in perfusion of the delicate cellular structures that are present in the growing brain, augmented by the immaturity of the cerebral circulatory system, which is especially vulnerable to hypoxic ischemic encephalopathy. The lack of blood flow results in cell death and subsequent breakdown of the blood vessel walls, leading to bleeding. While this bleeding can result in further injury, it is itself a marker for injury that has already occurred. Most intraventricular hemorrhages occur in the first 72 hours after birth.
  • ICP intracranial pressure
  • Retinopathy of prematurity (ROP), also called retrolental fibroplasia (RLF) and Terry syndrome, is a disease of the eye affecting prematurely born babies who have generally received intensive neonatal care, in which oxygen therapy is used on them due to the premature development of their lungs. It is thought to be caused by disorganized growth of retinal blood vessels which may result in scarring and retinal detachment. ROP can be mild and may resolve spontaneously, but it may lead to blindness in serious cases. As such, all preterm babies are at risk for ROP, and very low birth-weight is an additional risk factor. Both oxygen toxicity and relative hypoxia can contribute to the development of ROP.
  • Peripheral retinal ablation is the mainstay of ROP treatment.
  • the destruction of the avascular retina can be performed with a solid-state laser photocoagulation device.
  • Cryotherapy, an earlier technique in which regional retinal destruction was done using a probe to freeze the desired areas, has also been evaluated in multi-center clinical trials as an effective modality for prevention and treatment of ROP.
  • cryotherapy is no longer preferred for routine avascular retinal ablation in premature babies, due to the side effects of inflammation and lid swelling.
  • Additional methods for treating ROP include vitrectomy, intravitreal injection of bevacizumab, and oral administration of propranolol.
  • Necrotizing enterocolitis is a medical condition where a portion of the bowel dies. Symptoms may include poor feeding, bloating, decreased activity, blood in the stool, or vomiting of bile. The underlying mechanism is believed to involve a combination of poor blood flow and infection of the intestines. Diagnosis is based on symptoms and confirmed with medical imaging.
  • Prevention includes feeding the infant with breast milk and probiotics. Small amounts of oral feeds of human milk starting as soon as possible, while the infant is being primarily fed intravenously, primes the immature gut to mature and become ready to receive greater intake by mouth. Treatment consists primarily of supportive care including providing bowel rest by stopping enteral feeds, gastric decompression with intermittent suction, fluid repletion to correct electrolyte abnormalities and third-space losses, support for blood pressure, parenteral nutrition, and prompt antibiotic therapy. Surgery is required in those who have free air in the abdomen.
  • the method of the invention is used to evaluate premature human infants.
  • a premature infant is a baby born earlier than normal (e.g., at fewer than 37 weeks of gestational age).
  • Gestational age is a risk factor associated with likelihood that an infant will develop a neonatal complication. There is little chance of survival for infants born at fewer than 22 weeks of gestational age. Accordingly, in some embodiments, the premature infant has a gestational age of 22 to 36 weeks. In other embodiments, the premature infant has a gestational age of 24 to 35 weeks. In another embodiment, the premature infant has a gestational age from 25 to 33 weeks. In yet another embodiment, the premature infant has a gestational age from 26 to 31 weeks.
  • race of the premature infant can also affect the risk of developing a neonatal complication. Races include Caucasian, Hispanic, Asian, Black, Polynesian, and Native American.
  • the method of the invention includes conducting exome sequencing of a plurality of genes of the DNA exome
  • Exome sequencing is a process by which the entire coding region of the DNA of an organism is sequenced. Exons are flanked by untranslated regions (UTR) that are usually not included in exome studies. In exome sequencing, only the DNA of the exons is sequenced. The untranslated regions of the genome are not included in exome sequencing. See, e.g., Choi, M. et al., Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. 2009. PNAS. 106(45): 19096-19101, the disclosure of which is incorporated herein by reference. In the human genome, there are about 180,000 exons.
  • Exome sequencing includes two main steps.
  • the first step is to select the subset of DNA that encodes the exons (i.e., the DNA exome), which is also referred to as target enrichment.
  • the second step is sequencing the exonic DNA using high-throughput DNA sequencing technology.
  • Target-enrichment methods allow one to selectively capture genomic regions of interest (e.g., exons) from a DNA sample prior to sequencing.
  • Examples of target enrichment methods include array-based capture, and in-solution capture.
  • microarrays contain single-stranded oligonucleotides with sequences from the human genome to tile the region of interest fixed to the surface. Genomic DNA is sheared to form double-stranded fragments. The fragments undergo end-repair to produce blunt ends and adaptors with universal priming sequences are added. These fragments are hybridized to oligos on the microarray. Un-hybridized fragments are washed away and the desired fragments are eluted. The fragments are then amplified using the polymerase chain reaction (PCR) method or other amplification methods. See Elsharawy et al., Brief Funct Genomics. 10 (6): 374-386 (2011).
  • PCR polymerase chain reaction
  • Amplification techniques are known to those of skill in the art and include, but are not limited to, cloning, polymerase chain reaction (PCR), polymerase chain reaction of specific alleles (ASA), ligase chain reaction (LCR), nested polymerase chain reaction, self-sustained sequence replication (Guatelli, J. C. et al., 1990, Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh, D. Y. et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177), and Q-Beta Replicase (Lizardi, P. M. et al., 1988, Bio/Technology 6:1197).
  • ASA polymerase chain reaction of specific alleles
  • LCR ligase chain reaction
  • In-solution capture involves the use of a pool of custom oligonucleotides (probes) that are synthesized and hybridized in solution to a fragmented genomic DNA sample.
  • the probes (labeled with beads) selectively hybridize to the genomic regions of interest after which the beads (now including the DNA fragments of interest) can be pulled down and washed to clear excess material.
  • the beads are then removed and the genomic fragments can be sequenced allowing for selective DNA sequencing of genomic regions (e.g., exons) of interest. See Mamanova et al., Nature Methods. 7 (2): 111-118 (2010).
  • hybridization probes can be linked to magnetic beads to pull down the sequences belonging to the exome.
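The net effect of target enrichment can be pictured as keeping only those fragments that overlap a region of interest. The sketch below models that outcome with interval overlap on half-open genomic coordinates. This is a deliberate simplification: real capture depends on probe hybridization chemistry rather than on known coordinates, and all names and coordinates here are hypothetical.

```python
def overlaps(fragment, target):
    """True if two half-open intervals (start, end) share at least one base."""
    (frag_start, frag_end), (tgt_start, tgt_end) = fragment, target
    return frag_start < tgt_end and tgt_start < frag_end

def capture(fragments, targets):
    """Keep fragments that overlap any target region (e.g. an exon)."""
    return [f for f in fragments if any(overlaps(f, t) for t in targets)]

# Hypothetical exon coordinates and sheared fragments on one chromosome.
exons = [(100, 200), (500, 650)]
fragments = [(0, 90), (150, 400), (640, 900), (700, 800)]

captured = capture(fragments, exons)
print(captured)  # fragments that would be pulled down with the beads
```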
  • conducting the exome sequencing includes target enrichment using the paired end sequencing protocol.
  • Paired-end sequencing allows users to sequence both ends of a fragment and generate high-quality, alignable sequence data. Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts. Paired-end DNA sequencing also detects common DNA rearrangements such as insertions, deletions, and inversions. See Fullwood et al., Genome Res., 19(4):521-32 (2009).
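To make "sequencing both ends of a fragment" concrete, the sketch below derives the two reads from one fragment: the first read is taken from the start of the forward strand, and the second is the reverse complement of the fragment's end, as if read from the opposite strand. The function names, read length, and fragment sequence are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment: str, read_len: int):
    """Return (read1, read2) covering the two ends of a fragment."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2

# Hypothetical 16-base fragment, read 5 bases from each end.
fragment = "ACGTTGCAAGGCTTAC"
r1, r2 = paired_end_reads(fragment, 5)
print(r1, r2)
```

Because the two reads point toward each other and are separated by a roughly known distance, pairs that map in an unexpected orientation or spacing can indicate rearrangements of the kind listed above.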
  • the exonic DNA is sequenced using high-throughput (e.g., “Next Generation”) DNA sequencing technology.
  • methods of high-throughput DNA sequencing are known to those skilled in the art. See Grada A, Weinbrecht K., J Invest Dermatol. 133 (8): e11 (2013).
  • Examples of high-throughput DNA sequencing include massively parallel signature sequencing, polony sequencing, 454 pyrosequencing, Illumina sequencing, combinatorial probe anchor sequencing (cPAS), SOLiD sequencing, ion torrent semiconductor sequencing, DNA snowball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, and nanopore DNA sequencing.
  • the high-throughput DNA sequencing technique is 454 sequencing (Roche) (see e.g., Margulies, M. et al. (2005) Nature 437: 376-380).
  • 454 sequencing can involve two steps. In the first step, DNA can be sheared into fragments of approximately 300-800 base pairs, and the fragments can be blunt ended. Oligonucleotide adaptors can then be ligated to the ends of the fragments. The adaptors can serve as sites for hybridizing primers for amplification and sequencing of the fragments.
  • the fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which can contain a 5'-biotin tag.
  • the fragments can be attached to DNA capture beads through hybridization.
  • a single fragment can be captured per bead.
  • the fragments attached to the beads can be PCR amplified within droplets of an oil-water emulsion. The result can be multiple copies of clonally amplified DNA fragments on each bead.
  • the emulsion can be broken while the amplified fragments remain bound to their specific beads.
  • the beads can be captured in wells (pico-liter sized; PicoTiterPlate (PTP) device).
  • the surface can be designed so that only one bead fits per well.
  • the PTP device can be loaded into an instrument for sequencing. Pyrosequencing can be performed on each DNA fragment in parallel.
  • Addition of one or more nucleotides can generate a light signal that can be recorded by a CCD camera in a sequencing instrument.
  • the signal strength can be proportional to the number of nucleotides incorporated.
  • Pyrosequencing can make use of pyrophosphate (PPi) which can be released upon nucleotide addition.
  • PPi can be converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate.
  • Luciferase can use ATP to convert luciferin to oxyluciferin, and this reaction can generate light that is detected and analyzed.
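Because the light signal is proportional to the number of nucleotides incorporated, a series of per-flow intensities can be turned into a sequence by rounding each intensity to a homopolymer length. The sketch below assumes a cyclic nucleotide flow order of T, A, C, G and intensities already normalized so that 1.0 corresponds to one incorporation. Both are assumptions made for illustration, and the signal values are invented.

```python
def decode_flowgram(flow_order: str, signals) -> str:
    """Convert per-flow light intensities into a base sequence.

    Each intensity is rounded to the nearest whole number of incorporated
    nucleotides; a value near 0 means the flowed base was not incorporated.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        bases.append(base * round(signal))
    return "".join(bases)

# Hypothetical normalized intensities for two cycles of T, A, C, G flows.
signals = [1.02, 0.05, 0.0, 2.1, 0.1, 0.97, 1.0, 0.0]
read = decode_flowgram("TACG", signals)
print(read)
```

The intensity of about 2 on the first G flow is decoded as two consecutive G bases, which is how homopolymer runs are read in a single flow.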
  • the high-throughput DNA sequencing technique is SOLiD technology (Applied Biosystems; Life Technologies).
  • In SOLiD sequencing, genomic DNA can be sheared into fragments, and adaptors can be attached to the 5' and 3' ends of the fragments to generate a fragment library.
  • internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library.
  • clonal bead populations can be prepared in microreactors containing beads, primers, template, and PCR components.
  • the templates can be denatured and beads can be enriched to separate the beads with extended templates. Templates on the selected beads can be subjected to a 3' modification that permits bonding to a glass slide.
  • a sequencing primer can bind to adaptor sequence.
  • a set of four fluorescently labeled di-base probes can compete for ligation to the sequencing primer. Specificity of the di-base probe can be achieved by interrogating every first and second base in each ligation reaction.
  • the sequence of a template can be determined by sequential hybridization and ligation of partially random oligonucleotides with a determined base (or pair of bases) that can be identified by a specific fluorophore.
  • the ligated oligonucleotide can be cleaved and removed and the process can be then repeated.
  • the extension product can be removed and the template can be reset with a primer complementary to the n-1 position for a second round of ligation cycles. Five rounds of primer reset can be completed for each sequence tag.
  • Through the primer reset process, most of the bases can be interrogated in two independent ligation reactions by two different primers. Up to 99.99% accuracy can be achieved by sequencing with an additional primer using a multi-base encoding scheme.
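Di-base probing means that each fluorescent color reports on a pair of adjacent bases rather than on a single base, so every base contributes to two colors. The sketch below uses a commonly described form of two-base encoding in which bases are numbered A=0, C=1, G=2, T=3 and the color is the XOR of the two base numbers. The color numbering is an assumption for illustration and is not specified in the source.

```python
BASES = "ACGT"  # assumed numbering: A=0, C=1, G=2, T=3

def encode_colors(seq: str):
    """Color for each adjacent base pair: XOR of the two base numbers."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(seq, seq[1:])]

def decode_colors(first_base: str, colors) -> str:
    """Rebuild the sequence from a known first base and the color calls."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ color])
    return "".join(seq)

colors = encode_colors("ATGGC")
print(colors, decode_colors("A", colors))
```

Under this scheme a true single-base variant changes two adjacent colors, while an isolated color change is more likely a measurement error, which is one reason each base is interrogated twice.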
  • the high-throughput DNA sequencing technique is SOLEXA sequencing (Illumina sequencing).
  • SOLEXA sequencing can be based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers.
  • SOLEXA sequencing can involve a library preparation step. Genomic DNA can be fragmented, and sheared ends can be repaired and adenylated. Adaptors can be added to the 5’ and 3’ ends of the fragments. The fragments can be size selected and purified.
  • SOLEXA sequencing can comprise a cluster generation step. DNA fragments can be attached to the surface of flow cell channels by hybridizing to a lawn of oligonucleotides attached to the surface of the flow cell channel.
  • the fragments can be extended and clonally amplified through bridge amplification to generate unique clusters.
  • the fragments become double stranded, and the double stranded molecules can be denatured.
  • Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell.
  • Reverse strands can be cleaved and washed away. Ends can be blocked, and primers can be hybridized to DNA templates.
  • SOLEXA sequencing can comprise a sequencing step. Hundreds of millions of clusters can be sequenced simultaneously.
  • Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides can be used to perform sequential sequencing. All four bases can compete with each other for the template. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3' terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated. A single base can be read each cycle.
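The one-base-per-cycle readout can be sketched as picking, for each cycle, the base whose fluorophore gives the strongest signal for a given cluster. The data layout (one dictionary of per-base intensities per cycle), the function name, and the intensity values are hypothetical; real base callers also correct for effects such as channel cross-talk, which this sketch ignores.

```python
def call_bases(cycles) -> str:
    """Call one base per cycle as the channel with the highest intensity.

    cycles is a list of dicts mapping base -> fluorescence intensity
    measured for a single cluster in that cycle.
    """
    return "".join(max(intensities, key=intensities.get) for intensities in cycles)

# Hypothetical intensities for one cluster over three cycles.
cycles = [
    {"A": 0.91, "C": 0.05, "G": 0.02, "T": 0.02},
    {"A": 0.03, "C": 0.04, "G": 0.88, "T": 0.05},
    {"A": 0.10, "C": 0.07, "G": 0.03, "T": 0.80},
]
read = call_bases(cycles)
print(read)
```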
  • the high-throughput DNA sequencing technique comprises real time (SMRT™) technology by Pacific Biosciences.
  • each of four DNA bases can be attached to one of four different fluorescent dyes. These dyes can be phospholinked.
  • a single DNA polymerase can be immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW).
  • ZMW can be a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that can rapidly diffuse in and out of the ZMW (in microseconds). It can take several milliseconds to incorporate a nucleotide into a growing strand.
  • the fluorescent label can be excited and produce a fluorescent signal, and the fluorescent tag can be cleaved off.
  • the ZMW can be illuminated from below. Attenuated light from an excitation beam can penetrate the lower 20-30 nm of each ZMW. A microscope with a detection limit of 20 zeptoliters (10⁻²¹ liters) can be created. The tiny detection volume can provide 1000-fold improvement in the reduction of background noise. Detection of the corresponding fluorescence of the dye can indicate which base was incorporated. The process can be repeated.
  • the high-throughput DNA sequencing technique is nanopore sequencing (See e.g., Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001).
  • a nanopore can be a small hole, of the order of about one nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it can result in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows can be sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule can obstruct the nanopore to a different degree.
  • the nanopore sequencing technology can be from Oxford Nanopore Technologies; e.g., a GridION system.
  • a single nanopore can be inserted in a polymer membrane across the top of a microwell. Each microwell can have an electrode for individual sensing.
  • the microwells can be fabricated into an array chip, with 100,000 or more microwells per chip.
  • An instrument (or node) can be used to analyze the chip. Data can be analyzed in real-time. One or more instruments can be operated at a time.
  • the nanopore can be a protein nanopore, e.g., the protein alpha-hemolysin, a heptameric protein pore.
  • the nanopore can be a solid-state nanopore, e.g., a nanometer sized hole formed in a synthetic membrane (e.g., SiNx or SiO2).
  • the nanopore can be a hybrid pore (e.g., an integration of a protein pore into a solid-state membrane).
  • the nanopore can be a nanopore with integrated sensors (e.g., tunneling electrode detectors, capacitive detectors, or graphene based nano-gap or edge state detectors (see e.g., Garaj et al. (2010) Nature vol. 467, doi:10.1038/nature09379)).
  • A nanopore can be functionalized for analyzing a specific type of molecule (e.g., DNA, RNA, or protein).
  • Nanopore sequencing can comprise "strand sequencing" in which intact DNA polymers can be passed through a protein nanopore with sequencing in real time as the DNA translocates the pore.
  • An enzyme can separate strands of a double stranded DNA and feed a strand through a nanopore.
  • the DNA can have a hairpin at one end, and the system can read both strands.
  • the high-throughput DNA sequencing technique is ion semiconductor sequencing (e.g., using technology from Life Technologies (Ion Torrent)).
  • Ion semiconductor sequencing can take advantage of the fact that when a nucleotide is incorporated into a strand of DNA, an ion can be released.
  • a high density array of micromachined wells can be formed. Each well can hold a single DNA template. Beneath the well can be an ion sensitive layer, and beneath the ion sensitive layer can be an ion sensor.
  • When a nucleotide is added to a DNA strand, H+ is released, which can be measured as a change in pH. The H+ ion can be converted to voltage and recorded by the semiconductor sensor.
  • An array chip can be sequentially flooded with one nucleotide after another. No scanning, light, or cameras can be required.
  • the high-throughput DNA sequencing technique is DNA nanoball sequencing (as performed, e.g., by Complete Genomics; see e.g., Drmanac et al. (2010) Science 327: 78-81).
  • DNA can be isolated, fragmented, and size selected.
  • DNA can be fragmented (e.g., by sonication) to a mean length of about 500 bp.
  • Adaptors (Ad1) can be attached to the ends of the fragments.
  • the adaptors can be used to hybridize to anchors for sequencing reactions.
  • DNA with adaptors bound to each end can be PCR amplified.
  • the adaptor sequences can be modified so that complementary single strand ends bind to each other forming circular DNA.
  • the DNA can be methylated to protect it from cleavage by a Type IIS restriction enzyme used in a subsequent step.
  • An adaptor e.g., the right adaptor
  • An adaptor can have a restriction recognition site, and the restriction recognition site can remain non-methylated.
  • the non-methylated restriction recognition site in the adaptor can be recognized by a restriction enzyme (e.g., AcuI), and the DNA can be cleaved by AcuI 13 bp to the right of the right adaptor to form linear double stranded DNA.
  • a second round of right and left adaptors (Ad2) can be ligated onto either end of the linear DNA, and all DNA with both adaptors bound can be amplified (e.g., by PCR).
  • Ad2 sequences can be modified to allow them to bind each other and form circular DNA.
  • the DNA can be methylated, but a restriction enzyme recognition site can remain non-methylated on the left Ad1 adaptor.
  • a restriction enzyme e.g., AcuI
  • a third round of right and left adaptors (Ad3) can be ligated to the right and left flank of the linear DNA, and the resulting fragment can be PCR amplified.
  • the adaptors can be modified so that they can bind to each other and form circular DNA.
  • a type III restriction enzyme e.g., EcoP15
  • EcoP15 can be added; EcoP15 can cleave the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2. This cleavage can remove a large segment of DNA and linearize the DNA once again.
  • a fourth round of right and left adaptors (Ad4) can be ligated to the DNA, the DNA can be amplified (e.g., by PCR), and modified so that they bind each other and form the completed circular DNA template.
  • Rolling circle replication e.g., using Phi 29 DNA polymerase
  • the four adaptor sequences can contain palindromic sequences that can hybridize and a single strand can fold onto itself to form a DNA nanoball (DNB™) which can be approximately 200-300 nanometers in diameter on average.
  • a DNA nanoball can be attached (e.g., by adsorption) to a microarray (sequencing flowcell).
  • the flow cell can be a silicon wafer coated with silicon dioxide, titanium and hexamethyldisilazane (HMDS) and a photoresist material. Sequencing can be performed by unchained sequencing by ligating fluorescent probes to the DNA. The color of the fluorescence of an interrogated position can be visualized by a high resolution camera. The identity of nucleotide sequences between adaptor sequences can be determined.
  • the high-throughput DNA sequencing technique is Helicos True Single Molecule Sequencing (tSMS) (see e.g., Harris T. D. et al. (2008) Science 320:106-109).
  • a DNA sample can be cleaved into strands of approximately 10 to 200 nucleotides, and a polyA sequence can be added to the 3' end of each DNA strand.
  • Each strand can be labeled by the addition of a fluorescently labeled adenosine nucleotide.
  • the DNA strands can then be hybridized to a flow cell, which can contain millions of oligo-T capture sites immobilized to the flow cell surface.
  • the templates can be at a density of about 100 million templates/cm².
  • the flow cell can then be loaded into an instrument, e.g., HELISCOPE™ sequencer, and a laser can illuminate the surface of the flow cell, revealing the position of each template.
  • a CCD camera can map the position of the templates on the flow cell surface.
  • the template fluorescent label can then be cleaved and washed away.
  • the sequencing reaction can begin by introducing a DNA polymerase and a fluorescently labeled nucleotide.
  • the oligo-T nucleic acid can serve as a primer.
  • the DNA polymerase can incorporate the labeled nucleotides to the primer in a template directed manner. The DNA polymerase and unincorporated nucleotides can be removed.
  • the templates that have directed incorporation of the fluorescently labeled nucleotide can be detected by imaging the flow cell surface. After imaging, a cleavage step can remove the fluorescent label, and the process can be repeated with other fluorescently labeled nucleotides until a desired read length is achieved. Sequence information can be collected with each nucleotide addition step.
  • the sequencing can be asynchronous. The sequencing can comprise at least 1 billion bases per day or per hour.
  • the method includes the step of determining the level of minor allele accumulation in the genes of the DNA exome.
  • Accumulation, in this context, refers to the number of minor alleles present in the DNA of a subject.
  • a minor allele is an allele which differs from the most common allele for a given gene in a given population.
  • a minor allele often exhibits single nucleotide polymorphism relative to the predominant, or “major” allele.
  • the level of minor allele accumulation is an aggregate of the minor allele frequencies (MAF) for the genes evaluated using exome sequencing. Minor alleles can be identified using algorithmic sequence analysis and alignment.
  • the level of minor allele accumulation is evaluated using the single-gene burden method, while in other embodiments the level of minor allele accumulation is evaluated using the exome-wide burden method.
  • the premature infant is characterized as having an increased risk of developing a neonatal complication.
  • Average minor allele accumulation levels are known to those skilled in the art, and can readily be determined through evaluation of the minor allele accumulation in a pool of premature infants.
  • the amount higher than average can be 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300%, or more above the average minor allele accumulation.
  • the level of minor allele accumulation is used to generate a polygenic risk score.
  • the polygenic risk score can be used to characterize the likelihood that the premature infant will develop a neonatal complication.
  • the number of minor alleles at each polymorphic site can be scored as 0 if no minor allele is present, scored as 1 if a single minor allele is present, and scored as 2 if both alleles present are minor. These scores can then be algorithmically combined to generate a polygenic risk score.
  • polygenic risk scores see Gibson, G., PLoS Genet., 15(4):e1008060 (2019).
  • Exome DNA sequencing is carried out on a plurality of genes of the premature infant.
  • the exome sequencing is whole exome sequencing, in which case all of the exons of the premature infant are evaluated.
  • a more limited set of exons is evaluated.
  • These exons can represent a set of genes that have been previously identified as being more effective for predicting the likelihood that the premature infant will develop neonatal complications.
  • the inventors have demonstrated that the genes RANBP2, CCDC138, GUCY1A3, RNU6-66P, SPINK1, ZCCHC2, DQ579288, FAM47E-STBD1, RSF24D1, and FTE are more effective for predicting the likelihood that neonatal complications will develop in premature infants.
  • the minor allele accumulation for 10 to 100 genes of the DNA exome is evaluated using exome sequencing, while in other embodiments the minor allele accumulation for 5 to 20 genes of the DNA exome is evaluated using exome sequencing.
  • the genes evaluated can optionally be used to calculate a polygenic risk score for the premature infant.
  • the exome sequencing is conducted using a microarray.
  • a microarray is a multiplex lab-on-a-chip. It is a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput, miniaturized, multiplexed, and parallel processing and detection methods.
  • Microarrays are known in the art and available commercially from companies such as Affymetrix, Agilent, Applied Microarrays, Arrayit, Illumina, and others.
  • the array contains probes complementary to at least one single nucleotide polymorphism identified herein; preferably, probes are included for hybridization to the target mutations.
  • a wide variety of array formats can be employed in accordance with the present disclosure.
  • One example includes a linear array of oligonucleotide bands, generally referred to in the art as a dipstick.
  • Another suitable format includes a two-dimensional pattern of discrete cells (such as 4096 squares in a 64 by 64 array).
  • other array formats including, but not limited to slot (rectangular) and circular arrays are equally suitable for use (see U.S. Pat. No. 5,981,185).
  • the array is formed on a polymer medium, which is a thread, membrane or film.
  • An example of an organic polymer medium is a polypropylene sheet having a thickness on the order of about 1 mil (0.001 inch) to about 20 mil, although the thickness of the film is not critical and can be varied over a fairly broad range.
  • Biaxially oriented polypropylene (BOPP) films are also suitable in this regard; in addition to their durability, BOPP films exhibit a low background fluorescence.
  • the array is a solid phase, Allele-Specific Oligonucleotides (ASO) based nucleic acid array.
  • one or more traditional risk factors are also used to evaluate the likelihood that a premature infant will develop a neonatal complication.
  • traditional risk factors include gender, race, and gestational age. For example, decreasing gestational age is associated with an increased risk that the premature infant will develop neonatal complications. The increased risk known to be associated with these traditional risk factors can be mathematically factored into an overall analysis of the likelihood that the premature infant will develop a neonatal complication.
  • the level of minor allele accumulation can be displayed in a variety of ways.
  • the levels can be displayed graphically on a display as numeric values or proportional bars (i.e., a bar graph) or any other display method known to those skilled in the art.
  • the graphic display can provide a visual representation of the amount of the variant gene or protein in the biological sample being evaluated.
  • a report is generated which summarizes the identified variants.
  • the report lists the genes in which each variant was found.
  • the report lists the genomic location (e.g., chromosome number and numerical location) in which each variant was found.
  • the report lists the type of variant (e.g., single nucleotide change or deletion).
  • the report lists the identity of the variant (e.g., an A to G mutation).
  • the report provides information on which variants are pathogenic or likely to be pathogenic.
  • the report provides information on which variants are associated with a disease or condition.
  • the report provides information on the disease or condition, including, but not limited to symptoms, pathology, diagnostic testing, and treatment.
  • the method includes treating premature infants identified as having an increased risk of developing a neonatal complication to reduce the likelihood or effect of the neonatal complication.
  • a preferred method for preventing the development of a neonatal complication is treatment with oxygen therapy.
  • methods of preventing or decreasing the risk that the complication will develop can be used. For example, in premature infants identified as having an increased risk of developing necrotizing enterocolitis, breast milk and/or probiotics can be provided to the premature infant to decrease the likelihood that the infant will develop necrotizing enterocolitis.
  • the objective of this study is to test the hypothesis that, among preterm infants, the accumulation of genetic variation across coding regions of the genome (i.e., the exome) influences risk for NC.
  • the inventors sought to determine if any observed sex and race disparities in NC relate to the burden as defined by the gene-specific accumulation of minor alleles found by whole exome sequencing (WES).
  • NC neonatal complications
  • SUS susceptible
  • BPD was defined as a requirement for oxygen and/or positive airway pressure at 36 weeks post-menstrual age. Jobe et al., Am J Respir Crit Care Med, 163:1723-1729 (2001).
  • NEC was defined as Bell’s Stage 2 or greater. Bell et al, J Pediatr Surg, 14:1-4 (1979).
  • ROP was defined as stage 2 or greater according to the ICROP. International Committee for the Classification of Retinopathy of Prematurity, Arch Ophthalmol, 123:991-999 (2005).
  • Severe IVH was defined as grade 3 or greater according to the Papile classification. Papile et al, J Pediatr, 92:529-534 (1978).
  • the inventors defined “resilient” (RES) as those infants that had none of the NC listed above. For each group, they recorded the following: gestational age at birth, the birthweight, Apgar scores (at 1 min and 5 min), race, sex, delivery route, any exposure to antenatal steroids, whether surfactant was given, and maternal characteristics initiating birth.
  • Resulting VCF files were downloaded from GenomeNext for subsequent analysis.
  • the inventors used GRCh37/hg19 from the University of California at Santa Cruz database (Lander et al., Nature, 409:860-921 (2001)) and the 1000 Genomes Project phase 3 (Auton et al., Nature, 526:68-74 (2015)) for reference human genome annotation.
  • Burden can be tested on multiple levels (i.e., at the level of individual genes, the whole exome, and across a selected set of genes).
  • AUC observed area under the curve
  • Burden-based Polygenic Risk Scores [0088] Using the results of their whole-exome association study (WEAS) of NC in Non-Hispanic Whites, they derived a polygenic risk score based on burden. Specifically, for the $i$-th infant they computed the polygenic risk score (PRS) (Dudbridge F., PLoS Genet, 9:e1003348 (2013)) as $\mathrm{PRS}_i = \sum_k \hat{\beta}_k B_{ik}$, where for the $k$-th gene, $\hat{\beta}_k$ is the estimated coefficient for burden and $B_{ik}$ is the observed burden in the $i$-th infant. The summation is taken over the burden of the 10 genes showing the strongest evidence for association.
  • Table 1 Clinical outcomes of the study group
  • the SUS group had a much lower birthweight (p < 0.001) and gestational age (p < 0.001) at birth.
  • the SUS group also had more (p < 0.05) male newborns than did the RES group.
  • the Apgar scores at 1 min and 5 min also were statistically different (p < 0.009 and p < 0.03, respectively) between the two groups of infants, with the SUS group having somewhat lower scores. There was no significant difference in delivery route (p < 0.38), in antenatal steroids (p < 0.20), or surfactant administration (p < 0.96) between the two groups.
  • the inventors compared the ROC curves of PRS alone and their composite biomarker (PRS+GA) in terms of AUC (Figure 4). Recall that their composite biomarker combines gestational age and PRS into a single predictor of NC, where PRS is computed from the top ten genes (Table 3) of the exome-wide analysis for Non-Hispanic White infants. However, this comparison does not account for over-fitting. To compare the predictive power of PRS alone and PRS+GA without over-fitting, they averaged the AUC of each predictor in a 10-fold cross-validation procedure. The average AUC for PRS alone was 0.67 (p < 0.003), and increased to 0.87 (p < 0.001) when GA was combined with PRS (Table 4).
  • Another gene among the top ten with direct biological plausibility is GUCY1A3. This gene encodes the α1 subunit of the nitric oxide/cGMP signaling pathway, a pathway that regulates sensitivity to nitric oxide. Note that nitric oxide is an established mediator of newborn lung development, and when inhaled, it has been proposed to exert therapeutic benefit to prevent BPD. While randomized clinical trials have yielded conflicting results on the protective effect of inhaled nitric oxide in the general population, there is the suggestion of a possible subgroup benefit in non-white infants. Furthermore, a polymorphism in GUCY1A3 has previously been associated with decreased risk for pulmonary hypertension in a high-altitude population.

Abstract

A method of predicting the likelihood that a premature infant will develop a neonatal complication is described. The method includes conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.

Description

PREDICTING NEONATAL COMPLICATIONS USING GENETIC VARIATION
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 62/904,113, filed September 23, 2019, which is incorporated herein by reference.
BACKGROUND
[0002] Preterm birth is the leading cause of mortality and morbidity in young children (age < 5 years) worldwide, with more than a million deaths per year arising from neonatal complications (NC) (Liu et al., Lancet, 388:3027-3035 (2016)). Therefore, to improve the health outcomes of preterm infants in the era of precision medicine and to reduce medical costs, better predictive models for NC need to be developed. One attractive approach is to combine well-established risk factors (e.g., gestational age at birth) with genetic risk factors (e.g., exonic mutations) into a single predictor of NC.
[0003] Based on evidence from twin studies, it is known that genes play a role in individual neonatal complications like bronchopulmonary dysplasia (BPD) and retinopathy of prematurity (ROP). In particular, Bhandari et al. estimated the heritability of BPD at 53% in a multicenter twin study using logistic regression with mixed effects; and in a retrospective twin study of ROP, Bizzarro et al. estimated the heritability at 70%. Bhandari et al., Semin Perinatol., 219-26 (2006); Bizzarro et al., Pediatrics, 118:1858-1863 (2006). However, pinpointing the specific genes that account for the impact of sex and race on susceptibility for NC has been much more difficult.
[0004] Epidemiological studies demonstrate that NC may also be influenced by gestational age (GA), sex, and race. For example, Draper et al. showed that GA is negatively correlated with risk for NC. Draper et al., BMJ, 319(7217):1093-7 (1999). In addition, Trembath et al. showed that preterm males have increased risk for NC relative to preterm females. Trembath, Laughon MM., Clin Perinatol, 39:585-601 (2012). Finally, Ryan et al. showed that, conditional on GA, Non-Hispanic White infants have increased risk relative to Non-Hispanic Black infants. Ryan et al., J Pediatr, 207:130-135 (2019).
[0005] A variety of methods are used for the screening of newborns for various genetic conditions. These tests typically involve separate assays for a particular metabolite or enzymatic activity associated with particular diseases. However, screening of premature infants is complex, and test parameters are not optimized for these patient groups. For example, newborn intensive care unit infants are more likely to generate false positive or negative results. Accordingly, improved methods for identifying premature infants at risk of neonatal complications are needed.
SUMMARY OF THE INVENTION
[0006] The impact of accumulated genetic variation (burden) on NC in Non-Hispanic White (NHW) and Non-Hispanic Black (NHB) preterm infants was investigated. The inventors sequenced 182 exomes from infants with gestational ages from 26 to 31 weeks. These infants were cared for in the same time period and hospital environment. Eighty-one preterm infants did not develop NC, whereas 101 developed at least one severe complication. The effect of burden at the single-gene and exome-wide levels was measured, and a polygenic risk score (PRS) was derived from the top 10 genes to predict NC.
[0007] Burden across the exome was associated with NC in NHW (p=0.05) preterm infants suggesting that multiple genes influence susceptibility. In a post hoc analysis, it was found that PRS alone predicts NC (AUC=0.67) and that PRS is uncorrelated with GA (r =0.05; p=0.53). When PRS and GA at birth are combined, the AUC is 0.87. These results support the hypothesis that genetic burden influences NC in NHW preterm infants.
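As a minimal sketch of how a burden-based PRS of this kind could be computed, the example below scores each polymorphic site as 0, 1, or 2 according to the number of minor alleles present, sums those scores within a gene to obtain a burden, and then forms a weighted sum of burdens across genes. The gene names are taken from this document, but the genotypes, the designated minor alleles, and the coefficients are hypothetical placeholders, not values from the study.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # the two alleles an infant carries at one site


def minor_allele_count(genotype: Genotype, minor: str) -> int:
    """Score a site as 0, 1, or 2 by counting copies of the minor allele."""
    return sum(1 for allele in genotype if allele == minor)


def gene_burden(sites: List[Tuple[Genotype, str]]) -> int:
    """Burden of one gene: total minor alleles over its polymorphic sites."""
    return sum(minor_allele_count(g, minor) for g, minor in sites)


def polygenic_risk_score(burden: Dict[str, int], beta: Dict[str, float]) -> float:
    """Weighted sum of per-gene burdens; genes without data contribute 0."""
    return sum(beta[gene] * burden.get(gene, 0) for gene in beta)


# Hypothetical data: (genotype, minor allele) pairs for each gene.
infant_sites = {
    "GUCY1A3": [(("A", "G"), "G"), (("C", "C"), "T")],
    "SPINK1": [(("T", "T"), "T")],
}
burden = {gene: gene_burden(sites) for gene, sites in infant_sites.items()}

# Hypothetical coefficients; in practice these would be estimated by regression.
beta = {"GUCY1A3": 0.5, "SPINK1": 0.25}
prs = polygenic_risk_score(burden, beta)
```

In this toy input the infant carries one minor allele in GUCY1A3 and two in SPINK1, so the score is 0.5 × 1 + 0.25 × 2.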
BRIEF DESCRIPTION OF THE FIGURES
[0008] The present invention may be more readily understood by reference to the following figures, wherein:
[0009] Figure 1 provides a diagram summarizing the inclusion and exclusion criteria used for selecting subjects for exome sequencing.
[0010] Figure 2 provides graphs of a Manhattan plot of -log10 p-values obtained from the logistic regression of preterm infant status onto burden. After correcting for multiple tests, no single gene in Non-Hispanic White (left) and Non-Hispanic Black (right) preterm infants is statistically significant at the exome-wide level (dashed line).
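To illustrate how an exome-wide significance line on a Manhattan plot can be placed, the sketch below converts p-values to the -log10 scale and applies a Bonferroni correction. The text does not state which multiple-testing correction was used, so Bonferroni is an assumption here, as are the gene labels, the p-values, and the number of genes tested.

```python
import math


def neg_log10(p: float) -> float:
    """Height of a point on a Manhattan plot for p-value p."""
    return -math.log10(p)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical per-gene p-values and an assumed number of genes tested.
gene_pvalues = {"GENE_A": 1e-3, "GENE_B": 0.04, "GENE_C": 0.5}
n_genes_tested = 20000

threshold = bonferroni_threshold(0.05, n_genes_tested)
line_height = neg_log10(threshold)  # where the dashed line would be drawn
significant = [g for g, p in gene_pvalues.items() if p < threshold]
```

With these placeholder inputs no gene clears the threshold, which mirrors the situation the figure describes: individually modest p-values can fall well short of an exome-wide cutoff.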
[0011] Figure 3 provides graphs showing the exome-wide distribution of burden-based p-values by gene in Non-Hispanic White (NHW) preterm infants (left panel), and Non-Hispanic Black (NHB) preterm infants (right panel). The distribution of p-values in NHWs (n=75 SUS, n=56 RES) shows a statistically significant excess of low p-values (p=0.05), suggesting that genetic burden influences neonatal complications in NHWs. By contrast, the distribution of p-values in NHBs (n=25 SUS, n=26 RES) is inconclusive (p=0.4).
[0012] Figure 4 provides a graph showing ROC curves for three predictors of NC, namely Polygenic Risk Score (PRS) + Gestational Age (GA) (solid), GA alone (dashed), and PRS alone (dot-dashed), together with a random predictor (dotted). The corresponding AUCs are 96%, 84%, 78%, and 50%, respectively. After correcting for over-fitting, the average AUC of PRS+GA dropped to 87%, and the average AUC of PRS alone dropped to 67%. The difference in predictive power between PRS+GA and PRS alone is significant (p=0.0012).
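The AUC values quoted for Figure 4 can be read as the probability that a randomly chosen susceptible infant receives a higher predictor value than a randomly chosen resilient infant. The sketch below computes AUC directly from that pairwise definition; the scores are hypothetical, and this is not necessarily the procedure the inventors used to draw their ROC curves.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predictor values for susceptible (SUS) and resilient (RES) infants.
sus_scores = [0.9, 0.7, 0.4]
res_scores = [0.6, 0.3, 0.2]
example_auc = auc(sus_scores, res_scores)

# A predictor that cannot separate the groups scores 0.5, like the random line.
uninformative_auc = auc([1.0, 1.0], [1.0, 1.0])
```

Averaging this quantity over held-out folds, rather than computing it on the data used to fit the predictor, is what guards against the over-fitting mentioned in the figure description.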
DETAILED DESCRIPTION OF THE INVENTION
[0013] The present invention relates to a method of predicting the likelihood that a premature infant will develop a neonatal complication. The method includes conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.
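The characterizing step can be sketched as a comparison of an infant's minor allele count against the average of a reference pool. This is only an illustration under stated assumptions: the function names, the reference counts, and the optional percentage margin are hypothetical, and the document does not prescribe a particular cutoff.

```python
from statistics import mean
from typing import Sequence


def percent_above_average(count: float, reference_counts: Sequence[float]) -> float:
    """Percentage by which count exceeds the reference average (negative if below)."""
    average = mean(reference_counts)
    return 100.0 * (count - average) / average


def has_increased_risk(
    count: float, reference_counts: Sequence[float], min_percent: float = 0.0
) -> bool:
    """Flag an infant whose accumulation exceeds the average by more than min_percent."""
    return percent_above_average(count, reference_counts) > min_percent


# Hypothetical minor allele counts from a reference pool of premature infants.
reference = [40, 50, 60]
pct = percent_above_average(60, reference)
flagged = has_increased_risk(60, reference, min_percent=10.0)
```

Setting `min_percent` to a value such as 5, 10, or 25 corresponds to requiring that the accumulation be that much higher than average before the infant is flagged.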
Definitions
[0014] As used herein, the term "diagnosis" can encompass determining the likelihood that a subject will develop a disease, or the existence or nature of disease in a subject. The term diagnosis, as used herein, also encompasses determining the severity and probable outcome of disease or episode of disease or prospect of recovery (which is generally referred to as prognosis). "Diagnosis" can also encompass diagnosis in the context of rational therapy, in which the diagnosis guides therapy, including initial selection of therapy, modification of therapy (e.g., adjustment of dose or dosage regimen), and the like.
[0015] As used herein, the terms "treatment," "treating," and the like, refer to obtaining a desired pharmacologic or physiologic effect. The effect may be therapeutic in terms of a partial or complete cure for a disease or an adverse effect attributable to the disease. "Treatment," as used herein, covers any treatment of a disease in a mammal, particularly in a human, and can include inhibiting the disease or condition, i.e., arresting its development; and relieving the disease, i.e., causing regression of the disease.
[0016] Prevention or prophylaxis, as used herein, refers to preventing the disease or a symptom of a disease from occurring in a subject which may be predisposed to the disease but has not yet been diagnosed as having it (e.g., including diseases that may be associated with or caused by a primary disease). Prevention may include completely or partially preventing a disease or symptom.
[0017] The term "exons," as used herein, means short, functionally important sequences of DNA which represent the regions in genes that are translated into protein and the untranslated region (UTR) flanking them.
[0018] The term "exome sequencing" (also known as targeted exome capture), as used herein, means an efficient strategy to selectively sequence the coding regions of the genome as a cheaper but still effective alternative to whole genome sequencing. UTRs are usually not included in exome studies. In the human genome there are about 180,000 exons: these constitute about 1% of the human genome, which translates to about 30 megabases (Mb) in length. It is estimated that the protein coding regions of the human genome harbor about 85 percent of the disease-causing mutations.
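As a rough check on the figures in this definition, assuming a haploid human genome of about 3 × 10^9 base pairs (an approximation that is not stated in the text), 1% of the genome works out to about 30 Mb:

```latex
\[
0.01 \times 3 \times 10^{9}\ \text{bp} \;=\; 3 \times 10^{7}\ \text{bp} \;=\; 30\ \text{Mb}
\]
```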
[0019] The term "gene," as used herein, means one or more sequence(s) of nucleotides in a genome that together encode one or more expressed molecules, e.g., an RNA, or polypeptide. The gene can include coding sequences that are transcribed into RNA which may then be translated into a polypeptide sequence, and can include associated structural or regulatory sequences that aid in replication or expression of the gene.
[0020] The term "genotype" refers to the alleles present in DNA from a subject or patient, where an allele can be defined by the particular nucleotide(s) present in a nucleic acid sequence at a particular site(s). Often a genotype is the nucleotide(s) present at a single polymorphic site known to vary in the human population.
[0021] The term “allele” refers to a variant form of a given gene. The variation relates to the nucleotide sequence of the gene, and may or may not result in different observable phenotypic traits. The sequence variants may be single or multiple base changes, including without limitation insertions, deletions, or substitutions, or may be a variable number of sequence repeats. Allele frequency (i.e., allele accumulation) refers to the fraction of gene copies that are of a particular allele in a defined population.
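The definition of allele frequency as a fraction of gene copies can be made concrete with a small sketch. It assumes diploid genotypes, so each individual contributes two gene copies at the site; the genotypes shown are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def allele_frequencies(genotypes: List[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of all gene copies in the population carrying each allele."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(freqs: Dict[str, float]) -> str:
    """The least common allele at the site in this population."""
    return min(freqs, key=freqs.get)


# Hypothetical genotypes at one polymorphic site for four individuals.
genotypes = [("A", "A"), ("A", "G"), ("A", "A"), ("G", "G")]
freqs = allele_frequencies(genotypes)
minor = minor_allele(freqs)
```

Here four individuals carry eight gene copies, five of which are A and three of which are G, so G is the minor allele in this toy population.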
[0022] "Increased risk" or “increased likelihood” refers to a statistically higher frequency of occurrence of the disease or condition in an individual carrying a higher minor allele burden in comparison to the frequency of occurrence of the disease or condition in a member of a population that does not carry the higher minor allele burden. The increased risk can be referred to as a percentage increase. For example, an increased risk can refer to a 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300% or higher frequency of occurrence.
[0023] As used herein, the term "nucleic acid" refers to polynucleotides or oligonucleotides such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs (e.g., peptide nucleic acids) and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides.
[0024] The term "polymorphism" refers to the coexistence of more than one form of a gene or portion (e.g., allelic variant) thereof. A portion of a gene of which there are at least two different forms, i.e., two different nucleotide sequences, is referred to as a "polymorphic region of a gene". A specific genetic sequence at a polymorphic region of a gene is an allele. A polymorphic region can be a single nucleotide, the identity of which differs in different alleles. A polymorphic region can also be several nucleotides long.
[0025] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
[0026] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0027] As used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a sample" also includes a plurality of such samples and reference to "a neonatal complication" includes reference to one or more neonatal complications, and so forth.
Predicting Neonatal Complications
[0028] One aspect of the invention provides a method of predicting the likelihood that a premature infant will develop a neonatal complication. Predicting includes calculating or estimating the probability that neonatal complications will occur later in time. The method includes the steps of conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.
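The characterizing step above can be sketched as a comparison against an average. The sketch below assumes that "higher than average" means above the arithmetic mean of accumulation levels in a reference cohort; the function name and all numeric values are hypothetical and are not taken from the specification.

```python
from statistics import mean


def has_increased_risk(infant_level, reference_levels):
    """Return True if the infant's minor allele accumulation level exceeds
    the mean level of the reference cohort (an assumed reading of
    "higher than average")."""
    return infant_level > mean(reference_levels)


# Hypothetical accumulation levels for a reference cohort of three infants.
reference = [100, 110, 120]

print(has_increased_risk(130, reference))  # above the cohort mean
print(has_increased_risk(105, reference))  # below the cohort mean
```

A practical implementation would likely also account for covariates such as gestational age, which the specification identifies as a separate risk factor.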
Biological Samples
[0029] Biological samples to be analyzed using the methods provided herein can be derived from biological material including nucleic acid such as a cell or tissue sample. A biological sample can be obtained in some cases from a hospital, laboratory, clinical or medical laboratory. The biological sample can comprise various types of nucleic acid, e.g., RNA or DNA. Nucleic acids can be extracted from a biological sample by methods known to those of ordinary skill in the art.
[0030] The sample can be aqueous humor, vitreous humor, bile, whole blood, blood serum, blood plasma, breast milk, cerebrospinal fluid, cerumen, lymphatic fluid, gastric juice, mucus, peritoneal fluid, saliva, sebum, semen, sweat, perspiration, tears, vaginal secretion, vomit, feces, or urine. The sample can be obtained from a hospital, laboratory, clinical or medical laboratory. The nucleic acid of the sample can be, e.g., mitochondrial DNA, genomic DNA, mRNA, siRNA, miRNA, cRNA, single-stranded DNA, double-stranded DNA, single-stranded RNA, double-stranded RNA, tRNA, rRNA, or cDNA. The sample can comprise cell-free nucleic acid. The sample can be a cell line, genomic DNA, cell-free plasma, formalin fixed paraffin embedded (FFPE) sample, or flash frozen sample. A formalin fixed paraffin embedded sample can be deparaffinized before nucleic acid is extracted. In some embodiments, the biological sample is a blood sample.
[0031] The sample may be processed to render it competent for amplification. Exemplary sample processing can include lysing cells of the sample to release nucleic acid, purifying the sample (e.g., to isolate nucleic acid from other sample components, which may inhibit amplification), diluting/concentrating the sample, and/or combining the sample with reagents for amplification, such as a DNA/RNA polymerase (e.g., a heat-stable DNA polymerase for PCR amplification), dNTPs (e.g., dATP, dCTP, dGTP, and dTTP (and/or dUTP)), a primer set for each allele sequence or polymorphic locus to be amplified, probes (e.g., fluorescent probes, such as TAQMAN probes or molecular beacon probes, among others) capable of hybridizing specifically to each allele sequence to be amplified, Mg2+, DMSO, BSA, a buffer, or any combination thereof, among others. In some examples, the sample may be combined with a restriction enzyme, uracil-DNA glycosylase (UNG), reverse transcriptase, or any other enzyme of nucleic acid processing.
[0032] In some embodiments, the method further includes the step of obtaining a biological sample from the premature infant. Methods of obtaining a biological sample from the subject include use of a needle stick, needle biopsy, swab, and the like. In an exemplary method, the biological sample is a blood sample, which may be obtained for example by venipuncture. In some embodiments, the biological sample is obtained in utero. For example, an in utero sample can be obtained using percutaneous umbilical cord blood sampling.
Neonatal Complications
[0033] A variety of neonatal complications are known to be associated with premature birth. These include neurological problems such as retinopathy of prematurity, cardiovascular complications such as intraventricular hemorrhage, respiratory problems such as bronchopulmonary dysplasia, gastrointestinal and metabolic issues such as necrotizing enterocolitis, hematological complications such as anemia of prematurity, and infection. In some embodiments, the neonatal complication is selected from the group consisting of bronchopulmonary dysplasia, intraventricular hemorrhage, retinopathy of prematurity, and necrotizing enterocolitis.
[0034] Bronchopulmonary dysplasia (BPD) is a chronic lung disease which mainly arises in premature children with very low weight at birth. It may result in long-lasting damage to the lung that persists into early adulthood, or, in the case of progressive pulmonary change, it may lead to death. A premature lung with a lack of surfactant is a significant risk factor for BPD. Children who are under artificial ventilation over a long period of time, e.g., to treat infant respiratory distress syndrome (IRDS), are also at risk of developing BPD. On the one hand, the development of BPD is the result of reorganization processes with an inflammatory formation of connective tissue after an initial water retention in a premature lung which is exposed to chemical (oxygen radicals), mechanical (pressure trauma, volume trauma), and biological (microbial agent) damage. On the other hand, BPD in very early preterm infants who are born at a gestational age of less than 28 weeks is in particular characterized by a disorder or an arrest of the formation of the alveoli, in which the total diffusion area for oxygen and carbon dioxide is reduced. In addition to the pulmonary alveoli, the respiratory tract is also affected; it becomes narrowed, which increases the airway resistance. The blood vessels of the lung are affected as well, and the resulting vasoconstriction may cause an increased pressure in the pulmonary circulation and a strain on the right ventricle.
[0035] Clinical symptoms of BPD may include an increased breathing rate, deeper, labored breathing with retractions of the chest, increased bronchial secretion, growth retardation, and livid skin and mucosa. The X-ray image of the lung may show, inter alia, areas of hyperinflation next to areas with insufficient ventilation, and fibrotic remodeling of connective tissue. Diagnosis and classification are made on the basis of the oxygen requirement which is necessary at a certain age of the child for a sufficient oxygen saturation of the blood. A distinction is made between a mild, a moderate, and a severe progressive form.
[0036] Currently BPD is treated by the administration of oxygen to maintain the physiological oxygen saturation of the blood. Corticosteroids counteract the chronic inflammatory process. Diuretic medications are used to treat pulmonary edema. In case of a narrowing of the respiratory tract, the inhalation of bronchospasmolytics may be considered. Vasodilating medications may decrease any increased pressure in the pulmonary circulation.
[0037] Another neonatal complication is intraventricular hemorrhage (IVH). Intraventricular hemorrhage, also known as intraventricular bleeding, is bleeding into the brain's ventricular system, where the cerebrospinal fluid is produced and circulates toward the subarachnoid space.
[0038] IVH in the preterm brain usually arises from the germinal matrix, whereas IVH in term infants originates from the choroid plexus. IVH is particularly common in premature infants or those of very low birth weight. The cause of IVH in premature infants, unlike that in older infants, children or adults, is rarely trauma. Instead it is thought to result from changes in perfusion of the delicate cellular structures that are present in the growing brain, augmented by the immaturity of the cerebral circulatory system, which is especially vulnerable to hypoxic ischemic encephalopathy. The lack of blood flow results in cell death and subsequent breakdown of the blood vessel walls, leading to bleeding. While this bleeding can result in further injury, it is itself a marker for injury that has already occurred. Most intraventricular hemorrhages occur in the first 72 hours after birth.
[0039] Treatment of IVH focuses on monitoring intracranial pressure (ICP) via an intraventricular catheter and on medications to maintain ICP, blood pressure, and coagulation. In more severe cases an external ventricular drain may be required to maintain ICP and evacuate the hemorrhage, and in extreme cases an open craniotomy may be required. In cases of unilateral IVH with small intraparenchymal hemorrhage, the combined method of stereotaxy and open craniotomy has produced promising results.
[0040] Another neonatal complication is retinopathy of prematurity (ROP). Retinopathy of prematurity, also called retrolental fibroplasia (RLF) and Terry syndrome, is a disease of the eye affecting prematurely born babies generally having received intensive neonatal care, in which oxygen therapy is used on them due to the premature development of their lungs. It is thought to be caused by disorganized growth of retinal blood vessels which may result in scarring and retinal detachment. ROP can be mild and may resolve spontaneously, but it may lead to blindness in serious cases. As such, all preterm babies are at risk for ROP, and very low birth-weight is an additional risk factor. Both oxygen toxicity and relative hypoxia can contribute to the development of ROP.
[0041] Peripheral retinal ablation is the mainstay of ROP treatment. The destruction of the avascular retina can be performed with a solid-state laser photocoagulation device. Cryotherapy, an earlier technique in which regional retinal destruction was done using a probe to freeze the desired areas, has also been evaluated in multi-center clinical trials as an effective modality for prevention and treatment of ROP. However, when laser treatment is available, cryotherapy is no longer preferred for routine avascular retinal ablation in premature babies, due to the side effects of inflammation and lid swelling. Additional methods for treating ROP include vitrectomy, intravitreal injection of bevacizumab, and oral administration of propranolol.
[0042] Another neonatal complication is necrotizing enterocolitis (NEC). Necrotizing enterocolitis (NEC) is a medical condition where a portion of the bowel dies. Symptoms may include poor feeding, bloating, decreased activity, blood in the stool, or vomiting of bile. The underlying mechanism is believed to involve a combination of poor blood flow and infection of the intestines. Diagnosis is based on symptoms and confirmed with medical imaging.
[0043] Prevention includes feeding the infant with breast milk and probiotics. Small amounts of oral feeds of human milk starting as soon as possible, while the infant is being primarily fed intravenously, primes the immature gut to mature and become ready to receive greater intake by mouth. Treatment consists primarily of supportive care including providing bowel rest by stopping enteral feeds, gastric decompression with intermittent suction, fluid repletion to correct electrolyte abnormalities and third-space losses, support for blood pressure, parenteral nutrition, and prompt antibiotic therapy. Surgery is required in those who have free air in the abdomen.
Subjects
[0044] The method of the invention is used to evaluate premature human infants. A premature infant is a baby born earlier than normal (e.g., at fewer than 37 weeks of gestational age). Gestational age is a risk factor associated with the likelihood that an infant will develop a neonatal complication. There is little chance of survival for infants born at fewer than 22 weeks of gestational age. Accordingly, in some embodiments, the premature infant has a gestational age of 22 to 36 weeks. In other embodiments, the premature infant has a gestational age of 24 to 35 weeks. In another embodiment, the premature infant has a gestational age from 25 to 33 weeks. In yet another embodiment, the premature infant has a gestational age from 26 to 31 weeks.
[0045] The race of the premature infant can also affect the risk of developing a neonatal complication. Races include Caucasian, Hispanic, Asian, Black, Polynesian, and Native American.
Exome sequencing
[0046] The method of the invention includes conducting exome sequencing of a plurality of genes of the DNA exome. "Exome sequencing" is a process by which the entire coding region of the DNA of an organism is sequenced. Exons are flanked by untranslated regions (UTR) that are usually not included in exome studies. In exome sequencing, only the DNA corresponding to the exons is sequenced. The untranslated regions of the genome are not included in exome sequencing. See, e.g., Choi, M. et al., Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. 2009. PNAS. 106(45): 19096-19101, the disclosure of which is incorporated herein by reference. In the human genome, there are about 180,000 exons.
[0047] Exome sequencing includes two main steps. The first step is to select the subset of DNA that encodes the exons (i.e., the DNA exome), which is also referred to as target-enrichment. The second step is sequencing the exonic DNA using high-throughput DNA sequencing technology.
[0048] Target-enrichment methods allow one to selectively capture genomic regions of interest (e.g., exons) from a DNA sample prior to sequencing. Examples of target enrichment methods include array-based capture, and in-solution capture.
[0049] In array-based capture, microarrays contain single-stranded oligonucleotides, fixed to the surface, with sequences from the human genome that tile the region of interest. Genomic DNA is sheared to form double-stranded fragments. The fragments undergo end-repair to produce blunt ends, and adaptors with universal priming sequences are added. These fragments are hybridized to oligos on the microarray. Un-hybridized fragments are washed away and the desired fragments are eluted. The fragments are then amplified using the polymerase chain reaction (PCR) method or other amplification methods. See Elsharawy et al., Brief Funct Genomics. 10 (6): 374-386 (2011).
[0050] Amplification techniques are known to those of skill in the art and include, but are not limited to, cloning, polymerase chain reaction (PCR), polymerase chain reaction of specific alleles (ASA), ligase chain reaction (LCR), nested polymerase chain reaction, self-sustained sequence replication (Guatelli, J. C. et al., 1990, Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh, D. Y. et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177), and Q-Beta Replicase (Lizardi, P. M. et al., 1988, Bio/Technology 6:1197).
[0051] In-solution capture involves the use of a pool of custom oligonucleotides (probes) that are synthesized and hybridized in solution to a fragmented genomic DNA sample. The probes (labeled with beads) selectively hybridize to the genomic regions of interest, after which the beads (now including the DNA fragments of interest) can be pulled down and washed to clear excess material. The beads are then removed and the genomic fragments can be sequenced, allowing for selective DNA sequencing of genomic regions (e.g., exons) of interest. See Mamanova et al., Nature Methods. 7 (2): 111-118 (2010). For example, hybridization probes can be linked to magnetic beads to pull down the sequences belonging to the exome.
[0052] In some embodiments, conducting the exome sequencing includes target enrichment using the paired end sequencing protocol. Paired-end sequencing allows users to sequence both ends of a fragment and generate high-quality, alignable sequence data. Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts. Paired-end DNA sequencing also detects common DNA rearrangements such as insertions, deletions, and inversions. See Fullwood et al, Genome Res., 19(4):521-32 (2009).
High-throughput DNA sequencing
[0053] After target enrichment, the exonic DNA is sequenced using high-throughput (e.g., “Next Generation”) DNA sequencing technology. A variety of methods of high-throughput DNA sequencing are known to those skilled in the art. See Grada A, Weinbrecht K., J Invest Dermatol. 133 (8): e11 (2013). Examples of high-throughput DNA sequencing include massively parallel signature sequencing, polony sequencing, 454 pyrosequencing, Illumina sequencing, combinatorial probe anchor sequencing (cPAS), SOLiD sequencing, ion torrent semiconductor sequencing, DNA nanoball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, and nanopore DNA sequencing.
[0054] In some embodiments, the high-throughput DNA sequencing technique is 454 sequencing (Roche) (see e.g., Margulies, M. et al. (2005) Nature 437: 376-380). 454 sequencing involves two steps. In the first step, DNA can be sheared into fragments of approximately 300-800 base pairs, and the fragments can be blunt ended. Oligonucleotide adaptors can then be ligated to the ends of the fragments. The adaptors can serve as sites for hybridizing primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which can contain a 5'-biotin tag. The fragments can be attached to DNA capture beads through hybridization. A single fragment can be captured per bead. The fragments attached to the beads can be PCR amplified within droplets of an oil-water emulsion. The result can be multiple copies of clonally amplified DNA fragments on each bead. The emulsion can be broken while the amplified fragments remain bound to their specific beads. In a second step, the beads can be captured in wells (pico-liter sized; PicoTiterPlate (PTP) device). The surface can be designed so that only one bead fits per well. The PTP device can be loaded into an instrument for sequencing. Pyrosequencing can be performed on each DNA fragment in parallel. Addition of one or more nucleotides can generate a light signal that can be recorded by a CCD camera in a sequencing instrument. The signal strength can be proportional to the number of nucleotides incorporated. Pyrosequencing can make use of pyrophosphate (PPi) which can be released upon nucleotide addition. PPi can be converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase can use ATP to convert luciferin to oxyluciferin, and this reaction can generate light that is detected and analyzed.
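Because the light signal in pyrosequencing can be proportional to the number of nucleotides incorporated, a series of per-nucleotide signals can be converted into a base sequence. The following is a minimal sketch under stated assumptions: the cyclic flow order "TACG", the normalization of signals so that one incorporation is about 1.0, and the simple rounding rule are all illustrative choices and do not describe the instrument's actual base-calling software.

```python
def decode_flowgram(flow_order, signals):
    """Convert normalized light signals into a base sequence.

    flow_order: nucleotides in the order they are flowed, repeated cyclically
                (assumed here to be "TACG").
    signals:    one normalized intensity per flow; a value near n means that
                n identical nucleotides were incorporated in that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))  # nearest whole number of bases
        bases.append(base * run_length)
    return "".join(bases)


# Hypothetical normalized signals for two cycles of the T, A, C, G flows.
signals = [1.02, 0.05, 2.1, 0.0, 0.0, 0.97, 0.1, 3.05]
sequence = decode_flowgram("TACG", signals)
print(sequence)
```

For these hypothetical signals the decoded read is TCCAGGG: a signal near 2 in the first C flow yields "CC", and a signal near 3 in the second G flow yields "GGG".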
[0055] In some embodiments, the high-throughput DNA sequencing technique is SOLiD technology (Applied Biosystems; Life Technologies). In SOLiD sequencing, genomic DNA can be sheared into fragments, and adaptors can be attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations can be prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates can be denatured and beads can be enriched to separate the beads with extended templates. Templates on the selected beads can be subjected to a 3' modification that permits bonding to a glass slide. A sequencing primer can bind to adaptor sequence. A set of four fluorescently labeled di-base probes can compete for ligation to the sequencing primer. Specificity of the di-base probe can be achieved by interrogating every first and second base in each ligation reaction. The sequence of a template can be determined by sequential hybridization and ligation of partially random oligonucleotides with a determined base (or pair of bases) that can be identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide can be cleaved and removed and the process can be then repeated. Following a series of ligation cycles, the extension product can be removed and the template can be reset with a primer complementary to the n-1 position for a second round of ligation cycles. Five rounds of primer reset can be completed for each sequence tag. Through the primer reset process, most of the bases can be interrogated in two independent ligation reactions by two different primers. 
Up to 99.99% accuracy can be achieved by sequencing with an additional primer using a multi-base encoding scheme.
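The di-base probing described above means that each recorded color reflects a pair of adjacent bases, so each base is interrogated twice. The following sketch illustrates one common way to represent such a two-base encoding, in which the color for a pair is the exclusive-or of two-bit base codes. The numeric color labels, the base-to-code assignment, and the function names are assumptions for illustration and are not taken from the specification.

```python
# Assumed two-bit codes for the four bases; the color for an adjacent pair
# is the XOR of the two codes, so identical neighbors always give color 0.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return one color (0-3) per overlapping pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Recover the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
decoded = decode_colors("A", colors)
print(colors, decoded)
```

Under this scheme a true single-base variant changes two adjacent colors, whereas an isolated measurement error changes only one, which is one reason a two-base encoding can help distinguish errors from variants.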
[0056] In some embodiments, the high-throughput DNA sequencing technique is SOLEXA sequencing (Illumina sequencing). SOLEXA sequencing can be based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. SOLEXA sequencing can involve a library preparation step. Genomic DNA can be fragmented, and sheared ends can be repaired and adenylated. Adaptors can be added to the 5' and 3' ends of the fragments. The fragments can be size selected and purified. SOLEXA sequencing can comprise a cluster generation step. DNA fragments can be attached to the surface of flow cell channels by hybridizing to a lawn of oligonucleotides attached to the surface of the flow cell channel. The fragments can be extended and clonally amplified through bridge amplification to generate unique clusters. The fragments become double stranded, and the double stranded molecules can be denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell. Reverse strands can be cleaved and washed away. Ends can be blocked, and primers can be hybridized to DNA templates. SOLEXA sequencing can comprise a sequencing step. Hundreds of millions of clusters can be sequenced simultaneously. Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides can be used to perform sequential sequencing. All four bases can compete with each other for the template. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3' terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated. A single base can be read each cycle.
[0057] In some embodiments, the high-throughput DNA sequencing technique comprises single molecule real time (SMRT™) technology by Pacific Biosciences. In SMRT, each of four DNA bases can be attached to one of four different fluorescent dyes. These dyes can be phospholinked. A single DNA polymerase can be immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW can be a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that can rapidly diffuse in and out of the ZMW (in microseconds). It can take several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label can be excited and produce a fluorescent signal, and the fluorescent tag can be cleaved off. The ZMW can be illuminated from below. Attenuated light from an excitation beam can penetrate the lower 20-30 nm of each ZMW. A microscope with a detection limit of 20 zeptoliters (20 × 10⁻²¹ liters) can be created. The tiny detection volume can provide 1000-fold improvement in the reduction of background noise. Detection of the corresponding fluorescence of the dye can indicate which base was incorporated. The process can be repeated.
[0058] In some embodiments, the high-throughput DNA sequencing technique is nanopore sequencing (see e.g., Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001). A nanopore can be a small hole, of the order of about one nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it can result in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows can be sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule can obstruct the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore can represent a reading of the DNA sequence. The nanopore sequencing technology can be from Oxford Nanopore Technologies; e.g., a GridION system. A single nanopore can be inserted in a polymer membrane across the top of a microwell. Each microwell can have an electrode for individual sensing. The microwells can be fabricated into an array chip, with 100,000 or more microwells per chip. An instrument (or node) can be used to analyze the chip. Data can be analyzed in real-time. One or more instruments can be operated at a time. The nanopore can be a protein nanopore, e.g., the protein alpha-hemolysin, a heptameric protein pore. The nanopore can be a solid-state nanopore, e.g., a nanometer sized hole formed in a synthetic membrane (e.g., SiNx or SiO2). The nanopore can be a hybrid pore (e.g., an integration of a protein pore into a solid-state membrane). The nanopore can be a nanopore with integrated sensors (e.g., tunneling electrode detectors, capacitive detectors, or graphene based nano-gap or edge state detectors (see e.g., Garaj et al. (2010) Nature vol. 467, doi:10.1038/nature09379)). A nanopore can be functionalized for analyzing a specific type of molecule (e.g., DNA, RNA, or protein).
Nanopore sequencing can comprise "strand sequencing" in which intact DNA polymers can be passed through a protein nanopore with sequencing in real time as the DNA translocates the pore. An enzyme can separate strands of a double stranded DNA and feed a strand through a nanopore. The DNA can have a hairpin at one end, and the system can read both strands.
[0059] In some embodiments, the high-throughput DNA sequencing technique is ion semiconductor sequencing (e.g., using technology from Life Technologies (Ion Torrent)). Ion semiconductor sequencing can take advantage of the fact that when a nucleotide is incorporated into a strand of DNA, an ion can be released. To perform ion semiconductor sequencing, a high density array of micromachined wells can be formed. Each well can hold a single DNA template. Beneath the well can be an ion sensitive layer, and beneath the ion sensitive layer can be an ion sensor. When a nucleotide is added to a DNA, H+ is released, which can be measured as a change in pH. The H+ ion can be converted to voltage and recorded by the semiconductor sensor. An array chip can be sequentially flooded with one nucleotide after another. No scanning, light, or cameras are required.
[0060] In some embodiments, the high-throughput DNA sequencing technique is DNA nanoball sequencing (as performed, e.g., by Complete Genomics; see e.g., Drmanac et al. (2010) Science 327: 78-81). DNA can be isolated, fragmented, and size selected. For example, DNA can be fragmented (e.g., by sonication) to a mean length of about 500 bp. Adaptors (Ad1) can be attached to the ends of the fragments. The adaptors can be used to hybridize to anchors for sequencing reactions. DNA with adaptors bound to each end can be PCR amplified. The adaptor sequences can be modified so that complementary single strand ends bind to each other forming circular DNA. The DNA can be methylated to protect it from cleavage by a Type IIS restriction enzyme used in a subsequent step. An adaptor (e.g., the right adaptor) can have a restriction recognition site, and the restriction recognition site can remain non-methylated. The non-methylated restriction recognition site in the adaptor can be recognized by a restriction enzyme (e.g., AcuI), and the DNA can be cleaved by AcuI 13 bp to the right of the right adaptor to form linear double stranded DNA. A second round of right and left adaptors (Ad2) can be ligated onto either end of the linear DNA, and all DNA with both adaptors bound can be amplified (e.g., by PCR). Ad2 sequences can be modified to allow them to bind each other and form circular DNA. The DNA can be methylated, but a restriction enzyme recognition site can remain non-methylated on the left Ad1 adaptor. A restriction enzyme (e.g., AcuI) can be applied, and the DNA can be cleaved 13 bp to the left of the Ad1 to form a linear DNA fragment. A third round of right and left adaptor (Ad3) can be ligated to the right and left flank of the linear DNA, and the resulting fragment can be PCR amplified. The adaptors can be modified so that they can bind to each other and form circular DNA.
A type III restriction enzyme (e.g., EcoP15) can be added; EcoP15 can cleave the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2. This cleavage can remove a large segment of DNA and linearize the DNA once again. A fourth round of right and left adaptors (Ad4) can be ligated to the DNA, the DNA can be amplified (e.g., by PCR), and modified so that they bind each other and form the completed circular DNA template. Rolling circle replication (e.g., using Phi 29 DNA polymerase) can be used to amplify small fragments of DNA. The four adaptor sequences can contain palindromic sequences that can hybridize and a single strand can fold onto itself to form a DNA nanoball (DNB™) which can be approximately 200-300 nanometers in diameter on average. A DNA nanoball can be attached (e.g., by adsorption) to a microarray (sequencing flowcell). The flow cell can be a silicon wafer coated with silicon dioxide, titanium and hexamethyldisilazane (HMDS) and a photoresist material. Sequencing can be performed by unchained sequencing by ligating fluorescent probes to the DNA. The color of the fluorescence of an interrogated position can be visualized by a high resolution camera. The identity of nucleotide sequences between adaptor sequences can be determined.
[0061] In some embodiments, the high-throughput DNA sequencing technique is Helicos True Single Molecule Sequencing (tSMS) (see e.g., Harris T. D. et al. (2008) Science 320:106-109). In the tSMS technique, a DNA sample can be cleaved into strands of approximately 10 to 200 nucleotides, and a polyA sequence can be added to the 3' end of each DNA strand. Each strand can be labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands can then be hybridized to a flow cell, which can contain millions of oligo-T capture sites immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell can then be loaded into an instrument, e.g., HELISCOPE™ sequencer, and a laser can illuminate the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label can then be cleaved and washed away. The sequencing reaction can begin by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid can serve as a primer. The DNA polymerase can incorporate the labeled nucleotides to the primer in a template directed manner. The DNA polymerase and unincorporated nucleotides can be removed. The templates that have directed incorporation of the fluorescently labeled nucleotide can be detected by imaging the flow cell surface. After imaging, a cleavage step can remove the fluorescent label, and the process can be repeated with other fluorescently labeled nucleotides until a desired read length is achieved. Sequence information can be collected with each nucleotide addition step. The sequencing can be asynchronous. The sequencing can comprise at least 1 billion bases per day or per hour.
Minor allele analysis
[0062] Once the genes have been sequenced, the method includes the step of determining the level of minor allele accumulation in the genes of the DNA exome. Accumulation, in this context, refers to the number of minor alleles present in the DNA of a subject. A minor allele is an allele which differs from the most common allele for a given gene in a given population. A minor allele often exhibits single nucleotide polymorphism relative to the predominant, or “major” allele. The level of minor allele accumulation is an aggregate of the minor allele frequencies (MAF) for the genes evaluated using exome sequencing. Minor alleles can be identified using algorithmic sequence analysis and alignment. In some embodiments, the level of minor allele accumulation is evaluated using the single-gene burden method, while in other embodiments the level of minor allele accumulation is evaluated using the exome-wide burden method.
[0063] If the genes of the DNA exome have a higher than average minor allele accumulation level, the premature infant is characterized as having an increased risk of developing a neonatal complication. Average minor allele accumulation levels are known to those skilled in the art, and can readily be determined through evaluation of the minor allele accumulation in a pool of premature infants. The amount higher than average can include a 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300% or higher amount than average, in comparison with the average minor allele accumulation.
[0064] In some embodiments, the level of minor allele accumulation is used to generate a polygenic risk score. The polygenic risk score can be used to characterize the likelihood that the premature infant will develop a neonatal complication. When calculating a polygenic risk score, the number of minor alleles at each polymorphic site can be scored as 0 if no minor allele is present, scored as 1 if a single minor allele is present, and scored as 2 if both alleles present are minor. These scores can then be algorithmically combined to generate a polygenic risk score. For a discussion of the use of polygenic risk scores, see Gibson, G., PLoS Genet., 15(4):e1008060 (2019).
[0065] Exome DNA sequencing is carried out on a plurality of genes of the premature infant. In some embodiments, the exome sequencing is whole exome sequencing, in which case all of the exomes of the premature infant are evaluated. In other embodiments, a more limited set of exomes is evaluated. These exomes can represent a set of genes that have been previously identified as being more effective for predicting the likelihood that the premature infant will develop neonatal complications. For example, the inventors have demonstrated that the genes RANBP2, CCDC138, GUCY1A3, RNU6-66P, SPINK1, ZCCHC2, DQ579288, FAM47E-STBD1, RSF24D1, and FTE are more effective for predicting the likelihood that neonatal complications will develop in premature infants. In some embodiments, the minor allele accumulation for 10 to 100 genes of the DNA exome is evaluated using exome sequencing, while in other embodiments the minor allele accumulation for 5 to 20 genes of the DNA exome is evaluated using exome sequencing. The genes evaluated can optionally be used to calculate a polygenic risk score for the premature infant.
[0066] In some embodiments, the exome sequencing is conducted using a microarray. A microarray is a multiplex lab-on-a-chip. It is a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput screening with miniaturized, multiplexed, and parallel processing and detection methods. Microarrays are known in the art and available commercially from companies such as Affymetrix, Agilent, Applied Microarrays, Arrayit, Illumina, and others. The array contains probes complementary to at least one single nucleotide polymorphism identified herein; preferably, probes are included for hybridization to the target mutations.
[0067] A wide variety of array formats can be employed in accordance with the present disclosure. One example includes a linear array of oligonucleotide bands, generally referred to in the art as a dipstick. Another suitable format includes a two-dimensional pattern of discrete cells (such as 4096 squares in a 64 by 64 array). As is appreciated by those skilled in the art, other array formats including, but not limited to, slot (rectangular) and circular arrays are equally suitable for use (see U.S. Pat. No. 5,981,185). In one example, the array is formed on a polymer medium, which is a thread, membrane or film. An example of an organic polymer medium is a polypropylene sheet having a thickness on the order of about 1 mil (0.001 inch) to about 20 mil, although the thickness of the film is not critical and can be varied over a fairly broad range. Biaxially oriented polypropylene (BOPP) films are also suitable in this regard; in addition to their durability, BOPP films exhibit a low background fluorescence. In a particular example, the array is a solid phase, Allele-Specific Oligonucleotides (ASO) based nucleic acid array.
[0068] In some embodiments, one or more traditional risk factors are also used to evaluate the likelihood that a premature infant will develop a neonatal complication. Examples of traditional risk factors include gender, race, and gestational age. For example, decreasing gestational age is associated with an increased risk that the premature infant will develop neonatal complications. The increased risk known to be associated with these traditional risk factors can be mathematically factored into an overall analysis of the likelihood that the premature infant will develop a neonatal complication.
[0069] Once the presence of allelic variants (i.e., minor allele accumulation) has been determined, the level of minor allele accumulation can be displayed in a variety of ways. For example, the levels can be displayed graphically on a display as numeric values or proportional bars (i.e., a bar graph) or any other display method known to those skilled in the art. The graphic display can provide a visual representation of the amount of the variant gene or protein in the biological sample being evaluated.
[0070] In some embodiments, a report is generated which summarizes the identified variants. In some embodiments, the report lists the genes in which each variant was found. In some embodiments, the report lists the genomic location (e.g., chromosome number and numerical location) in which each variant was found. In some embodiments, the report lists the type of variant (e.g., single nucleotide change or deletion). In some embodiments, the report lists the identity of the variant (e.g., an A to G mutation). In some embodiments, the report provides information on which variants are pathogenic or likely to be pathogenic. In some embodiments, the report provides information on which variants are associated with a disease or condition. In some embodiments, the report provides information on the disease or condition, including, but not limited to, symptoms, pathology, diagnostic testing, and treatment.
Therapeutic Methods
[0071] In some embodiments, the method includes treating premature infants identified as having an increased risk of developing a neonatal complication to reduce the likelihood or effect of the neonatal complication. A preferred method for preventing the development of a neonatal complication is treatment with oxygen therapy. When a specific neonatal complication has been identified, methods of preventing or decreasing the risk that the complication will develop can be used. For example, in premature infants identified as having an increased risk of developing necrotizing enterocolitis, breast milk and/or probiotics can be provided to the premature infant to decrease the likelihood that the infant will develop necrotizing enterocolitis.
[0072] An example has been included to more clearly describe a particular embodiment of the invention and its associated cost and operational advantages. However, there are a wide variety of other embodiments within the scope of the present invention, which should not be limited to the particular examples provided herein.
EXAMPLE
Prediction of Short-Term Neonatal Complications in Preterm Infants Using Exome-Wide Genetic Variation and Gestational Age
[0073] The objective of this study is to test the hypothesis that, among preterm infants, the accumulation of genetic variation across coding regions of the genome (i.e., the exome) influences risk for NC. As a corollary, the inventors sought to determine if any observed sex and race disparities in NC relate to the burden as defined by the gene-specific accumulation of minor alleles found by whole exome sequencing (WES). Their study was partially motivated by several promising examples in the study of complex traits for adult diseases and morbidities (e.g. schizophrenia, Parkinson’s disease, and obesity), where investigators have found associations with the accumulation of minor alleles (Zhu et al, PLoS One, 10:e0133421 (2015)). Lastly, in a post hoc analysis, they compared the predictive power of a burden-based polygenic risk score (PRS) and a composite biomarker that combines PRS and GA into a single predictor of NC. Overall, they (1) demonstrated that NC is influenced by the accumulation of minor alleles found by WES in Non-Hispanic White preterm infants; (2) did not detect an effect of burden on NC in Non-Hispanic Black preterm infants; (3) confirmed the effects of previously reported traditional risk factors (e.g. GA) and showed that the impact of minor allele accumulation is independent of GA (at least within the GA range studied); and (4) showed that susceptibility to NC can be accurately predicted by a composite biomarker that combines GA and PRS into a single predictor of NC.
METHODS
Study design, patient population and samples
[0074] This study was approved by the Institutional Review Board at Nationwide Children’s Hospital. The study utilized the Perinatal Research Repository (PRR), which is a data and biospecimen repository of preterm neonates admitted to the neonatal intensive care unit (NICU) at Nationwide Children’s Hospital (NCH). Parents of infants eligible for inclusion in the PRR (i.e. infants <37 completed weeks of gestation) were approached to provide written informed consent for their participation and for the participation of their child. Once consent was obtained, blood or buccal swabs were obtained for DNA extraction. Samples were processed for DNA extraction at the Nationwide Children’s Hospital Biopathology Center and stored at -80°C until analysis. Clinical data were abstracted from the electronic medical record upon death or discharge by research personnel who were not directly involved in the present study.
[0075] Of the infants enrolled in PRR, eligible preterm infants for this study were singletons with gestational age at birth between 26 and 31 weeks, inclusive. Infants with known chromosomal abnormalities or congenital anomalies were a priori excluded. The gestational age range was chosen on the rationale that for infants with gestational age less than 26 weeks, extreme immaturity could potentially overwhelm any genetic component. Conversely, severe NC are far less common in infants born at 32 weeks gestation or more. Because both outcomes tend to reduce statistical power, restricting gestational age to the range of 26 to 31 weeks would likely provide the greatest power to detect genetic factors influencing NC.
Clinical phenotype of relevant neonatal complications (NC) and study groups
[0076] Among eligible infants with samples and data available as of December 2015, the inventors defined as “susceptible” (SUS) the infants who were diagnosed at death or discharge with at least one of the following severe short-term neonatal outcomes.
• BPD was defined as a requirement for oxygen and/or positive airway pressure at 36 weeks post-menstrual age. Jobe et al, Am J Respir Crit Care Med, 163:1723-1729 (2001).
• NEC was defined as Bell’s Stage 2 or greater. Bell et al, J Pediatr Surg, 14:1-4 (1979).
• ROP was defined as stage 2 or greater according to the ICROP. International Committee for the Classification of Retinopathy of Prematurity, Arch Ophthalmol, 123:991-999 (2005).
• Severe IVH was defined as grade 3 or greater according to the Papile classification. Papile et al, J Pediatr, 92:529-534 (1978).
[0077] The inventors defined “resilient” (RES) as those infants that had none of the NC listed above. For each group, they recorded the following: gestational age at birth, the birthweight, Apgar scores (at 1 min and 5 min), race, sex, delivery route, any exposure to antenatal steroids, whether surfactant was given, and maternal characteristics initiating birth.
[0078] They used Wilcoxon rank sum tests to compare quantitative variables and chi-squared tests to compare categorical variables. A p-value <0.05 was considered significant.
Sequencing
[0079] WES was performed on each infant using the SureSelectXT Target Enrichment System for Illumina Paired End Sequencing Protocol (Agilent Technologies, CA). DNA libraries were captured and enriched for exons using the SureSelect Clinical Research Exome version 1 kit (Agilent). Paired-end 96 base pair reads were generated for exome-enriched libraries sequenced across eight Illumina HiSeq 2500 runs. Samples were sequenced to an average of 76X depth of coverage, with a minimum depth of 50X targeted region coverage.
[0080] Following sequencing, primary data analysis consisted of using Illumina’s Real-Time Analysis software to perform base calling and quality scoring from the raw intensity files. The resulting base call format files were then converted and demultiplexed using Illumina’s bcl2fastq2 Conversion Software into the standard FASTQ file format appropriate for secondary analysis.
[0081] Secondary analysis was performed using Churchill, a pipeline developed in house for the discovery of human genetic variation that implements a best practices workflow for variant discovery and genotyping. Kelly et al, Genome Biol, 16:6 (2015). Churchill utilizes the Burrows-Wheeler Aligner to align sequence data to the GRCh37/hg19 reference genome. Duplicate sequence reads were removed using PicardTools (version 1.104). Local realignment was performed on the aligned sequence data using the Genome Analysis Toolkit (version 3.3-0). Churchill’s own deterministic implementation of base quality score recalibration was used. The GATK’s HaplotypeCaller was used to call variants. All analysis was performed by uploading FASTQ files to GenomeNext LLC, which automated execution of the Churchill pipeline for the entire dataset.
[0082] Resulting VCF files were downloaded from GenomeNext for subsequent analysis. The inventors used GRCh37/hg19 from the University of California at Santa Cruz database (Lander et al, Nature, 409:860-921 (2001)) and the 1000 Genomes Project phase 3 (Auton et al., Nature, 526:68-74 (2015)) for reference human genome annotation.
Assessment of genetic burden
[0083] Burden can be tested on multiple levels (i.e. at the level of individual genes, the whole exome, and across a selected set of genes).
Single-gene Burden
[0084] They defined single-gene burden as the total count of minor alleles across a given gene region, which includes 7.5 kilobases of upstream and downstream flanking sequence. Gene regions were further filtered to remove genes with excessive amounts of missing data (i.e. genes with more than 90% missing sequence data), variants with extremely low reads (i.e. number of reads less than two), and variants with extremely high reads (i.e. number of reads exceeds 2.5 standard deviations from the median). These analyses were conducted for each infant and for all genes in the human genome.
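The filtering criteria above can be illustrated with a minimal sketch. The function names, the use of the sample standard deviation, and the reading of "2.5 standard deviations from the median" as an upper bound of median + 2.5 SD are assumptions made for illustration; the source does not specify how the inventors implemented these filters.

```python
from statistics import median, stdev


def filter_variants_by_depth(depths, min_reads=2, n_sd=2.5):
    """Keep read depths that are neither extremely low nor extremely high.

    Assumption: a variant is 'extremely high' when its depth exceeds
    median + n_sd * (sample standard deviation) of all depths.
    """
    upper = median(depths) + n_sd * stdev(depths)
    return [d for d in depths if min_reads <= d <= upper]


def keep_gene(n_missing, n_total, max_missing_fraction=0.90):
    """Drop a gene when more than 90% of its sequence data is missing."""
    return n_missing / n_total <= max_missing_fraction


# Hypothetical read depths: twenty typical variants, one with a single
# read, and one with an implausibly high depth.
example_depths = [30] * 20 + [1, 500]
kept = filter_variants_by_depth(example_depths)
```

In this toy input only the twenty variants with depth 30 are retained; the thresholds are parameters so that other cutoffs can be tried.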
[0085] However, because their sequence data were organized around the count of alternate alleles at polymorphic sites, and because alternate alleles are not necessarily minor alleles, computing the single-gene burden is not trivial. Specifically, they scored the number of minor alleles at each site as follows:
• 0 if the site is homozygous for the alternate allele and the alternate allele is major (i.e., no minor alleles are present)
• 1 if the site is heterozygous (i.e., the site carries exactly 1 minor allele)
• 2 if the site is homozygous for the alternate allele and the alternate allele is minor (i.e., both alleles present are minor)
[0086] Note that in their data, sites which are homozygous for the reference allele are not recorded. That said, if they index infants by i, genes by j, and polymorphic sites by k, then the single-gene burden of the jth gene in the ith infant is $B_{ij} \equiv \sum_k M_{ijk}$, where $M_{ijk}$ is the minor allele score (0, 1, or 2) at the kth polymorphic site of the jth gene in the ith infant. For each gene, they used a logistic regression (with GA as covariate) to test single-gene burden for association to NC; SUS is coded as 1 (i.e. high risk) and RES is coded as 0 (i.e. low risk). To correct for the number of multiple tests they implemented a Bonferroni procedure.
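The site-level scoring rules and the per-gene sum described above can be sketched as follows. This is an illustrative sketch only: the genotype labels ("het", "hom_alt"), the function names, and the example sites are hypothetical, and the logistic regression step is not shown. The Bonferroni helper simply divides the significance level by the number of tests.

```python
def minor_allele_score(genotype, alt_is_minor):
    """Score one recorded polymorphic site as 0, 1, or 2 minor alleles.

    genotype: 'het' or 'hom_alt' (homozygous-reference sites are not
    recorded in the data, so they never reach this function).
    alt_is_minor: True when the alternate allele is the minor allele.
    """
    if genotype == "het":
        return 1
    if genotype == "hom_alt":
        return 2 if alt_is_minor else 0
    raise ValueError(f"unexpected genotype: {genotype!r}")


def single_gene_burden(sites):
    """Sum the minor allele scores over the sites of one gene in one infant."""
    return sum(minor_allele_score(g, m) for g, m in sites)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical gene with three recorded sites in one infant.
sites = [("het", True), ("hom_alt", True), ("hom_alt", False)]
burden = single_gene_burden(sites)  # 1 + 2 + 0
```

For instance, with the 23,854 gene regions retained for Non-Hispanic White infants, `bonferroni_threshold(0.05, 23854)` gives the per-gene threshold implied by a 0.05 family-wise level.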
Exome-wide Burden
[0087] In contrast to the single-gene test described above, the joint effect of all genes was assessed by looking for an excess of low p-values (i.e. p<0.05) among the observed burden p-values. (Note that burden p-values were corrected for heteroscedasticity (Barton et al, BMC Genomics, 14:161 (2013)).) Because burden is discrete, the distribution of burden p-values is not uniform under the null hypothesis of “no association”. Therefore, to determine the statistical significance of the observed burden p-values, they performed a permutation test. Ludbrook J, Dudley H., American Statistician, 52:127-132 (1998). Specifically, they first computed the observed area under the curve (AUC) from the cumulative distribution of the burden p-values. Then, they permuted preterm infant status (i.e. RES and SUS) 10,000 times to obtain a permutation distribution for AUC. Because a large observed AUC is evidence for an excess of low p-values, they used the proportion of permutations with AUC larger than (or equal to) the observed AUC to estimate the permutation p-value.
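The permutation procedure above can be sketched in a few lines. Several parts are assumptions made to keep the sketch self-contained: the per-gene test is a simple two-sample normal-approximation test standing in for the covariate-adjusted logistic regression used in the study, the heteroscedasticity correction is omitted, and all names and toy data are hypothetical. The area under the empirical cumulative distribution of p-values on [0, 1] equals one minus their mean, which is what `ecdf_auc` computes.

```python
import math
import random
from statistics import mean, pvariance


def ecdf_auc(pvalues):
    """Area under the empirical CDF of p-values on [0, 1] (= 1 - mean)."""
    return 1.0 - mean(pvalues)


def two_sample_pvalue(x, y):
    """Stand-in test (assumption): two-sided normal approximation."""
    se = math.sqrt(pvariance(x) / len(x) + pvariance(y) / len(y))
    if se == 0:
        return 1.0
    z = (mean(x) - mean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def gene_pvalues(burden, labels):
    """burden[i][j]: burden of gene j in infant i; labels[i]: 1=SUS, 0=RES."""
    pvals = []
    for j in range(len(burden[0])):
        sus = [row[j] for row, y in zip(burden, labels) if y == 1]
        res = [row[j] for row, y in zip(burden, labels) if y == 0]
        pvals.append(two_sample_pvalue(sus, res))
    return pvals


def permutation_pvalue(burden, labels, n_perm=10000, seed=0):
    """Proportion of label permutations with AUC >= the observed AUC."""
    rng = random.Random(seed)
    observed = ecdf_auc(gene_pvalues(burden, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute SUS/RES status across infants
        if ecdf_auc(gene_pvalues(burden, shuffled)) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: six infants, two genes.
toy_burden = [[2, 1], [3, 0], [2, 2], [0, 1], [1, 0], [0, 2]]
toy_labels = [1, 1, 1, 0, 0, 0]
obs_auc, perm_p = permutation_pvalue(toy_burden, toy_labels, n_perm=200)
```

Because the labels are shuffled rather than resampled, each permutation keeps the original numbers of SUS and RES infants.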
Burden-based Polygenic Risk Scores
[0088] Using the results of their whole-exome association study (WEAS) of NC in Non-Hispanic Whites, they derived a polygenic risk score based on burden. Specifically, for the ith infant they computed the polygenic risk score (PRS) (Dudbridge F., PLoS Genet, 9:e1003348 (2013)) as $\mathrm{PRS}_i = \sum_k \beta_k B_{ik}$, where for the kth gene, $\beta_k$ is the estimated coefficient for burden and $B_{ik}$ is the observed burden of that gene in the ith infant. The summation is taken over the burden of the 10 genes showing the strongest evidence for association. Furthermore, they used the ROCR package (Sing et al, Bioinformatics, 21:3940-3941 (2005)) to compute the AUC from Receiver Operating Characteristic (ROC) curves (Hajian-Tilaki K., Caspian J Intern Med, 4:627-635 (2013)) of PRS and their composite biomarker, which combines PRS and GA into a single predictor of NC. Then, they averaged the AUC over a 10-fold cross-validation procedure (Stone M., J R Stat Soc Series B Stat Methodol, 36:111-147 (1974)) to mitigate the negative effects of over-fitting. Finally, they used the average AUC and a 95% confidence interval to compare the predictive power of each predictor.
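As a rough illustration of the weighted sum and of the ROC AUC used to evaluate it, the sketch below computes a burden-based score and a rank-based AUC in plain Python. It is not the ROCR-based analysis used in the study: the function names, coefficients, and toy data are hypothetical, and the 10-fold cross-validation step is not shown.

```python
def polygenic_risk_score(betas, burdens):
    """Weighted sum of per-gene burdens for one infant."""
    return sum(b * x for b, x in zip(betas, burdens))


def roc_auc(scores, labels):
    """AUC as the probability that a SUS infant outscores a RES infant.

    Ties between a SUS score and a RES score count as one half.
    labels: 1 for SUS, 0 for RES.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients for two genes and burdens for four infants.
betas = [0.5, 1.0]
burden_by_infant = [[1, 2], [2, 1], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]
scores = [polygenic_risk_score(betas, b) for b in burden_by_infant]
auc = roc_auc(scores, labels)
```

The pairwise formulation is quadratic in the number of infants, which is acceptable for a cohort of a few hundred; a cross-validated estimate would fit the coefficients on the training folds only and average the AUC across held-out folds.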
RESULTS
Clinical characteristics of the study groups
[0089] The workflow of the subjects with inclusion and exclusion criteria is summarized in Figure 1. There were 287 eligible babies, of which 94 fulfilled SUS criteria and 125 were classified as RES. Of these 219 infants, high quality exome sequencing data was successfully generated on 209 newborns (SUS: n=90, RES: n=119). In anticipation of future meta-analyses, the inventors did not exclude eligible candidates based on race or sex. However, due to analytical limitations of understanding the clinical significance of burden in admixed populations with small sample sizes, they restricted their analyses to Non-Hispanic White (n=131) and Non-Hispanic Black (n=51) infants. Two infants in their study died as a result of their complications. Table 1 summarizes the clinical outcomes, and Table 2 summarizes the clinical characteristics of the SUS and RES groups limited to Non-Hispanic Whites and Non-Hispanic Blacks.
[0090] Table 1: Clinical outcomes of the study group
Figure imgf000027_0001
Figure imgf000028_0001
* Individual morbidities not mutually exclusive. 17 non-Hispanic White and 6 non-Hispanic Black children had multiple morbidities.
** Two susceptible infants died before BPD and ROP could be assessed.
[0091] Table 2: Clinical characteristics of the study group
Figure imgf000028_0002
Figure imgf000029_0001
* Gestational hypertension, preeclampsia, or HELLP (hemolysis, elevated liver enzymes, low platelet count)
[0092] As anticipated, the SUS group had a much lower birthweight (p<0.001) and gestational age (p<0.001) at birth. The SUS group also had more (p<0.05) male newborns than did the RES group. In addition, the Apgar scores at 1 min and 5 min were statistically different (p<0.009 and p<0.03, respectively) between the two groups of infants, with the SUS group having somewhat lower scores. There was no significant difference in delivery route (p<0.38), antenatal steroids (p<0.20), or surfactant administration (p<0.96) between the two groups.
Confirmation of known risk factors in their study groups
[0093] Several authors have previously reported associations between NC and gestational age (Ward RM, Beachy JC., BJOG, 110 Suppl 20:8-16 (2003)), between NC and sex (Morse et al, Pediatrics, 117:e106-112 (2006)), and between NC and race (Schieve LA, Handler A., Obstet Gynecol, 88:356-363 (1996)). Therefore, they sought to confirm these findings in their sample of 182 preterm infants. First, they found that gestational age (GA) is associated with NC in Non-Hispanic Whites [OR_GA = 0.43; 95% CI: (0.31, 0.56)], and in Non-Hispanic Blacks [OR_GA = 0.42; 95% CI: (0.24, 0.63)]. Second, they tested for an association between sex and NC with gestational age as a covariate. Although the evidence for association was not statistically significant, the effect of male sex may be stronger in Non-Hispanic Blacks [OR_Sex = 3.6; 95% CI: (0.84, 17.3)] than in Non-Hispanic Whites [OR_Sex = 1.9; 95% CI: (0.77, 4.83)]. Within each race, male preterm infants appear to have higher risk of NC than female preterm infants. Third, they found suggestive evidence for an association between race and NC with gestational age as a covariate [OR_Race = 2.2; 95% CI: (0.98, 5.22)], implying that Non-Hispanic White preterm infants may be more susceptible to NC than Non-Hispanic Black preterm infants.
Exome-wide burden in genetic variation associates with NC in Non-Hispanic White infants
[0094] Using the human genome reference annotations (GRCh37/hg19) from the University of California at Santa Cruz database (Lander et al., Nature, 409:860-921 (2001)) they identified a total of 27,939 gene regions. After filtering of the WES data, they retained a total of 23,854 gene regions for their analysis of data from Non-Hispanic White infants and 20,232 gene regions for analysis of Non-Hispanic Black infants. On average, each infant provided sequence data at about 273,168 variants.
[0095] Although no single gene was statistically significant at the exome-wide level of significance for either Black or White preterm infants (Figure 2), they did find an excess of low p-values across all genes among non-Hispanic White babies (Figure 3). Specifically, after 10,000 permutations of infant status (i.e., SUS and RES) the observed AUC was statistically significant (p=0.05). This implies that when analyzed in aggregate, the burden of minor alleles influences the risk for NC in Non-Hispanic White newborns.
[0096] The inventors compared the ROC curves of PRS alone and their composite biomarker (PRS+GA) in terms of AUC (Figure 4). Recall that their composite biomarker combines gestational age and PRS into a single predictor of NC, where PRS is computed from the top ten genes (Table 3) of the exome-wide analysis for Non-Hispanic White infants. However, this comparison does not account for over-fitting. To compare the predictive power of PRS alone and PRS+GA without over-fitting, they averaged the AUC of each predictor in a 10-fold cross-validation procedure. The average AUC for PRS alone was 0.67 (p<0.003), and increased to 0.87 (p<0.001) when GA was combined with PRS (Table 4). Note that the average AUC based on PRS+GA is significantly larger than the average AUC based on PRS alone (p=0.0012). Moreover, for each gene in the PRS-gene set, the average burden was higher among SUS preterm infants than RES preterm infants. This suggests that large values of PRS are associated with increased risk of NC. Interestingly, PRS and GA were not correlated [ρ=0.05; p=0.53]. This may explain why the predictive power of their composite biomarker, which combines PRS and GA into a single predictor of NC, is so high compared to the predictive power of PRS alone.
[0097] Table 3: Top 10 genes used to construct PRS
Figure imgf000030_0001
Figure imgf000031_0001
The top 10 genes from our exome-wide burden-based association study in Non-Hispanic White preterm infants; p-values are also shown.
Table 4: AUC Averaged over 10 Cross-Validation Sets
Figure imgf000031_0002
Summary table of the ROC curve analysis: For each risk factor: PRS (polygenic risk score, which is based on burden) and the combination of PRS and gestational age (GA), we measured the AUC* (average area under the curve), and constructed the corresponding 95% confidence intervals (CIs) and p-values from the non-parametric bootstrap procedure. Note that AUC was averaged over 10 cross-validation sets to mitigate the negative effects of over-fitting.
Discussion
[0098] Preterm infants are at increased risk for NC (neonatal complications), and as such, predicting the health outcomes of preterm infants could have a tremendous impact on their outcomes, and furthermore could be extremely useful in the development of preventative therapies by identifying high-risk populations. In this study, the inventors confirm that gestational age is an important predictor of NC; and they find suggestive evidence for an effect of race. Their data also indicates that male sex may play a stronger role in Non-Hispanic Blacks than Non-Hispanic Whites. Furthermore, they show that the combination of traditional risk factors and genetic risk factors can substantially improve prediction. In what follows, they discuss each of these findings, their implications, and any relevant limitations of their study.
Genetic Predictors of Neonatal Complications
[0099] The results demonstrate that in aggregate (i.e. across genes) the accumulation of exonic variants is positively correlated with NC, but as with most whole-exome association studies, there are potential limitations. First, there is always the question of how one should summarize variation across potentially overlapping, nested, and alternatively spliced genomic regions. Here, they took the simplest possible approach and summarized exonic variation across each gene, where start and end positions were given by the UCSC GRCh38/hg38 annotation file. Second, because exons are highly conserved, exonic variation in human populations is typically low (Irimia et al, Mol Biol Evol, 25:375-382 (2008)). As such, WEAS often require large sample sizes to detect an association (Hong EP, Park JW., Genomics Inform, 10:117-122 (2012)). Because it is often difficult to collect large samples of preterm infants (Torgerson et al, Am J Physiol Lung Cell Mol Physiol, 315:L858-L869 (2018)), most WEAS of preterm infants are underpowered. For example, given the size of the genetic effects that they see in their Non-Hispanic White infants, their Non-Hispanic Black sample (n=51) is probably too small to detect an association at either the single-gene level or the exome-wide level.
[00100] In a post hoc analysis, they tested their newly discovered PRS-gene set, which contains the 10 genes showing the strongest evidence for association. There is almost always some difficulty in deciding exactly how many genes to include in a polygenic risk score (Dudbridge F., PLoS Genet, 9:e1003348 (2013)). Nevertheless, they decided to use the top ten genes because the degree of over-fitting seemed acceptably small. To further mitigate the potentially negative effects of over-fitting, they chose to implement a 10-fold cross-validation procedure. Interestingly, their polygenic risk score does not appear to be correlated with GA, which likely stems from the fact that their logistic regression included GA as a covariate, but may also suggest that NC and prematurity have different genetic etiologies.
Race and Sex as Predictors of NC
[00101] To date, the strongest predictors for NC are gestational age [MANUCK, 2016], birth weight (Gooden et al, Am J Perinatol, 31:441-446 (2014)), sex (Zisk et al, Am J Perinatol, 28:241-246 (2011)) and race (Ryan et al, J Pediatr, 207:130-135 (2019)). While the evidence for an association between race and NC is only suggestive in their data (p=0.06), when they imputed the percentage of African ancestry in each preterm infant from the available exome sequence data, they found that imputed African ancestry was negatively correlated with risk for NC (p=0.049). This suggests that preterm infants of African descent may be less likely to develop NC. Furthermore, the results (and the results of Morse et al, Pediatrics, 117:e106-112 (2006)) suggest that the effect of sex may depend on race. Lastly, their PRS gene set predicted NC poorly in Non-Hispanic Black infants. Here, the PRS was computed from the PRS gene set (and the corresponding regression coefficients) identified in Non-Hispanic Whites, and from the observed burden of Non-Hispanic Black infants. Although the relatively small number of Non-Hispanic Black infants in their study (n=51) could explain their inability to detect an association, another possible explanation is that NC in Non-Hispanic Blacks and Non-Hispanic Whites is influenced by different genes.
Candidate Genes and Improved Biomarkers for NC
[00102] Among the ten genes composing the PRS for NC, the top score was assigned to RANBP2 (Ran-binding protein 2), a protein located on the cytoplasmic surface of the nuclear pore complex that plays a role in intracellular trafficking. Although additional studies need to investigate the mechanistic role of RANBP2 in the pathogenesis of NCs, there is some biological plausibility based on what is known so far about this protein. First, the mouse RANBP2 knockout is embryonically lethal, suggesting that RANBP2 plays an important role in development. In addition, studies on conditional knockout mice linked the cause of lethality to inadequate nuclear import, although whether this affected a broad spectrum of proteins or a small subset remains to be determined. Hamada et al, J. Cell. Biol., 194, 597-612 (2011). Second, several RANBP2 mutations in children are known to cause acute necrotizing encephalopathy, a disorder where previously normal children develop encephalopathy in response to common viral infection. Therefore, the possible involvement of RANBP2 in a broader adaptive response, such as the ones required of premature newborns in the context of the NICU environment, needs to be studied.
[00103] Another gene among the top ten with direct biological plausibility is GUCY1A3. This gene encodes the α1 subunit of soluble guanylate cyclase in the nitric oxide/cGMP signaling pathway - a pathway that regulates sensitivity to nitric oxide. Note that nitric oxide is an established mediator of newborn lung development, and when inhaled, it has been proposed to exert therapeutic benefit to prevent BPD. While randomized clinical trials have yielded conflicting results on the protective effect of inhaled nitric oxide in the general population, there is the suggestion of a possible subgroup benefit in non-white infants. Furthermore, a polymorphism in GUCY1A3 has previously been associated with decreased risk for pulmonary hypertension in a high-altitude population. Wilkins et al., Circ. Cardiovasc. Genet. 7, 920-929 (2014).
[00104] Although the discussion of candidate genes has been limited to RANBP2 and GUCY1A3, future mechanistic investigations of NCs should be extended to other genes as well, especially because genetic burden could have additive effects in genes with seemingly unrelated function. Furthermore, specifically with regard to BPD, it would be interesting to determine whether the accumulation of minor alleles in GUCY1A3 modulates the therapeutic response to inhaled nitric oxide. Hasan et al., JAMA Pediatr. 171, 1081-1089 (2017).
[00105] Interestingly, while the burden-based PRS predicts NC in Non-Hispanic Whites, it is a poor predictor of GA. This implies (and the AUC analyses confirm) that the inventors' composite predictor of NC - which combines information from both PRS and GA - should perform better than a predictor based on PRS alone. Overall, the inventors believe that the composite predictor will facilitate the design of individualized treatments for preterm infants that, in turn, could substantially improve health outcomes and reduce hospitalization costs.
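The comparison between a PRS-only predictor and a composite of PRS and GA can be sketched with a rank-based AUC. This is a minimal sketch under stated assumptions: the `auc` and `composite_score` helpers, the linear form of the composite, its weights, and every number in the toy data are hypothetical and were invented for illustration; they say nothing about the performance reported for the actual cohort.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def composite_score(prs, ga_weeks, w_prs=1.0, w_ga=-0.5):
    """Linear combination of PRS and gestational age.

    The weights are placeholders; the negative GA weight encodes the
    assumption that lower gestational age means higher risk.
    """
    return w_prs * prs + w_ga * ga_weeks


# Toy data (invented): PRS, gestational age in weeks, and NC status for six infants.
toy_prs = [0.2, 0.9, 0.6, 0.4, 1.1, 0.5]
toy_ga = [31, 27, 30, 26, 28, 29]
toy_nc = [0, 1, 0, 1, 1, 0]

toy_composite = [composite_score(p, g) for p, g in zip(toy_prs, toy_ga)]
print("AUC, PRS alone:", auc(toy_prs, toy_nc))
print("AUC, PRS + GA :", auc(toy_composite, toy_nc))
```

Because the PRS is described as nearly uncorrelated with GA, the two inputs carry largely separate information, which is the condition under which combining them can be expected to raise the AUC.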
Conclusion
[00106] This work conclusively demonstrates their ability to predict NC among Non-Hispanic White preterm infants using information from both genetic risk factors and traditional risk factors (e.g., gestational age). Provided that future studies of NC continue to collect genetic data on preterm infants from under-represented populations - where the chance of preterm birth is higher - more efficacious composite biomarkers could be developed and implemented for members of these extremely vulnerable populations.
[00107] The complete disclosure of all patents, patent applications, and publications, and electronically available material cited herein are incorporated by reference. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The invention is not limited to the exact details shown and described, for variations obvious to one skilled in the art will be included within the invention defined by the claims.

Claims

What is claimed is:
1. A method of predicting the likelihood that a premature infant will develop a neonatal complication, comprising: conducting exome sequencing of a plurality of genes of the DNA exome of a biological sample from the premature infant; determining the level of minor allele accumulation in the genes of the DNA exome, and characterizing the premature infant as having an increased risk of developing a neonatal complication if the genes of the DNA exome have a higher than average minor allele accumulation level.
2. The method of claim 1, further comprising the step of obtaining a biological sample from the premature infant.
3. The method of claim 2, wherein the biological sample is obtained in utero.
4. The method of claim 1, wherein the biological sample is a blood sample.
5. The method of claim 1, wherein the neonatal complication is selected from the group consisting of bronchopulmonary dysplasia, intraventricular hemorrhage, retinopathy of prematurity, and necrotizing enterocolitis.
6. The method of claim 1, wherein the infant is a Caucasian infant.
7. The method of claim 1, wherein the premature infant has a gestational age from 26 to 31 weeks.
8. The method of claim 1, wherein conducting the exome sequencing comprises target enrichment using the paired-end sequencing protocol.
9. The method of claim 1, wherein the exome sequencing is conducted using a microarray.
10. The method of claim 1, wherein the exome sequencing is whole exome sequencing.
11. The method of claim 1, wherein the level of minor allele accumulation is evaluated using the single-gene burden method.
12. The method of claim 1, wherein the level of minor allele accumulation is evaluated using the exome-wide burden method.
13. The method of claim 1, wherein the level of minor allele accumulation is used to generate a polygenic risk score, and the likelihood that the premature infant will develop a neonatal complication is characterized based on the polygenic risk score.
14. The method of claim 13, wherein the minor allele accumulation for 5 to 20 genes of the DNA exome is used to calculate the polygenic risk score.
15. The method of claim 14, wherein the genes of the DNA exome include one or more of the genes RANBP2, CCDC138, GUCY1A3, RNU6-66P, SPINK1, ZCCHC2, DQ579288, FAM47E-STBD1, RSL24D1, and FTL.
16. The method of claim 1, wherein one or more traditional risk factors selected from race and gestational age are also used to evaluate the likelihood that a premature infant will develop a neonatal complication.
17. The method of claim 1, further comprising treating premature infants identified as having an increased risk of developing a neonatal complication to reduce the risk or effect of the neonatal complication.
18. The method of claim 17, wherein the premature infant is treated with oxygen therapy.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962904113P 2019-09-23 2019-09-23
US62/904,113 2019-09-23

Publications (1)

Publication Number Publication Date
WO2021061697A1 true WO2021061697A1 (en) 2021-04-01

Family

ID=75166419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/052099 Ceased WO2021061697A1 (en) 2019-09-23 2020-09-23 Predicting neonatal complications using genetic variation

Country Status (1)

Country Link
WO (1) WO2021061697A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2778301C1 (en) * 2021-12-02 2022-08-17 Федеральное государственное бюджетное учреждение "Ивановский научно-исследовательский институт материнства и детства имени В.Н. Городкова" Министерства здравоохранения Российской Федерации Method for predicting intraventricular hemorrhages in deeply premature newborns
WO2024182491A1 (en) * 2023-02-28 2024-09-06 Virginia Commonwealth University Polygenic risk estimator for cervical length change during pregnancy

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160281166A1 (en) * 2015-03-23 2016-09-29 Parabase Genomics, Inc. Methods and systems for screening diseases in subjects

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BHANDARI ET AL.: "Familial and Genetic Susceptibility to Major Neonatal Morbidities in Preterm Twins", PEDIATRICS, vol. 117, no. 6, June 2006 (2006-06-01), pages 1901 - 1906 *
DRAPER E. S, MANKTELOW B., FIELD D. J, JAMES D.: "Prediction of survival for preterm births by weight and gestational age: retrospective population based study", BMJ, vol. 319, no. 7217, 23 October 1999 (1999-10-23), pages 1093 - 1097, XP055811388 *
T PATEL, K J BROOKES, J TURTON, S CHAUDHURY, T GUETTA-BARANES, R GUERREIRO, J BRAS, D HERNANDEZ, A SINGLETON, P T FRANCIS, : "Whole-exome sequencing of the BDR cohort: evidence to support the role of the PILRA gene in Alzheimer's disease", NEUROPATHOL APPL NEUROBIOL, vol. 44, no. 5, August 2018 (2018-08-01), pages 506 - 521, XP055811380 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20869164

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20869164

Country of ref document: EP

Kind code of ref document: A1