
WO2019070598A1 - Library preparation for whole genome sequencing - Google Patents

Library preparation for whole genome sequencing

Info

Publication number
WO2019070598A1
Authority
WO
WIPO (PCT)
Prior art keywords
adaptor
single stranded
nucleic acids
nucleic acid
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2018/053784
Other languages
English (en)
Inventor
Anna Vilborg HARTWIG
Austin SO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TOMA BIOSCIENCES Inc
Original Assignee
TOMA BIOSCIENCES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TOMA BIOSCIENCES Inc


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00 Preparation of compounds containing saccharide radicals
    • C12P19/26 Preparation of nitrogen-containing carbohydrates
    • C12P19/28 N-glycosides
    • C12P19/30 Nucleotides
    • C12P19/34 Polynucleotides, e.g. nucleic acids, oligoribonucleotides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups

Definitions

  • Cancer poses serious challenges for modern medicine. It has been estimated that cancer causes over 10% of all human deaths worldwide. Cancer encompasses a broad group of diseases, generally involving unregulated cell growth. In cancer, cells can divide and grow uncontrollably, can form malignant tumors, and can invade nearby parts of the body. Cancer can also spread to more distant parts of the body, for example, via the lymphatic system or bloodstream. There are over 200 different known cancers that afflict humans. Many cancers are associated with mutations, for example, mutations in cancer-related genes. The mutational status of a cancer can vary widely from one individual subject to another, and even from one tumor cell to another tumor cell in the same subject. Knowledge of these mutations can aid in the selection of cancer therapy, and can also aid in informing disease prognosis and/or disease status.
  • Next Generation Sequencing is increasingly used in translational cancer research and as a diagnostic test to identify actionable mutations in tumors of cancer patients.
  • most tumor specimens are only available as formalin-fixed, paraffin-embedded (FFPE) blocks, whether from patient biopsies or a part of archival biobanks.
  • FFPE-derived DNA is typically fragmented and has frayed ends, abasic positions, crosslinks, and modified bases.
  • libraries described herein are prepared from single stranded or double stranded nucleic acids.
  • Single-stranded nucleic acids can be prepared from a sample of double-stranded nucleic acid using any means known in the art or described herein.
  • Starting samples can be a biological sample obtained from a subject.
  • the biological sample can be formalin-fixed paraffin-embedded (FFPE) tissues, serum, blood, urine, cerebral spinal fluid, other bodily fluids, tissue (e.g., organs), cells, swabs, etc.
  • the nucleic acid can be obtained from the biological sample as RNA, DNA, or cDNA.
  • the nucleic acid can be obtained from biological samples, for example, using commercially available kits (e.g., those sold by Qiagen or Covaris).
  • Nucleic acids obtained from the biological sample can be fragmented to a desired size using, e.g., restriction enzymes, nuclease, sonication, shearing, other physical treatments that break the nucleic acids, or combinations of the foregoing.
  • the fragmented nucleic acids can be treated to remove crosslinks and damaged DNA nucleotides from the nucleic acids.
  • Damaged nucleotides can be removed using, for example, AP-endonuclease 1, Uracil DNA glycosylase (UDG), formamidopyrimidine [fapy]-DNA glycosylase (Fpg), bifunctional DNA glycosylase OGG1, other glycosylases, DNA polymerase β, X-ray repair cross-complementing group 1 (XRCC1), DNA ligase III, Poly(ADP-ribose) polymerase (PARP-1), Uvr proteins, Endo VIII, nucleotide excision repair enzymes (e.g., CETN2, DDB1, DDB2, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, ERCC6, ERCC8, LIG1, MNAT1, MMS19, RAD23A, RAD23B, RPA1, RPA2, TFIIH, XAB2, XPA, XPC).
  • the damaged nucleotides can be removed using the above enzymes (or combinations of
  • the nucleic acid fragments can then be treated to add a phosphate to the 5' end of the fragments and optionally to remove the phosphate on the 3' end of the fragments.
  • the nucleic acid fragments can be treated with T4 polynucleotide kinase and ATP to add phosphate to the 5' end, and remove phosphates from the 3' end of the nucleic acids.
  • nucleic acids can optionally be blocked with appropriate blocking groups, e.g., dideoxynucleotides can be added to the 3' end, or reversible protection groups can be added to the 3' end to prevent ligation reactions at the 3' end of the DNA fragments or strands.
  • Nucleic acid fragments with 5' phosphate and optionally dephosphorylated (and/or protected) 3' ends are ligated to 5'-adapters on the 5' end of the fragments. If desired, unligated 5'-adapters can be separated or removed from the fragment-5'-adapters by purification or other methods.
  • the 5'-adapter-fragments are then ligated with 3'-adapters on the 3' end of the 5'-adapter-fragments. If a protective group has been placed on the 3' end of the fragments to prevent ligation reactions, this protective group must be removed prior to the ligation of the 3'-adapter. If desired, after ligation of the 3'-adapters, unligated 3'-adapters can be separated or removed from the 5'-adapter-fragment-3'-adapter by purification or other methods.
  • the library of 5'-adapter-fragment-3'-adapter nucleic acids can be directly sequenced, or the library can be subject to amplification. The amplification can be performed using primers specific for sequences in the adapters, or a target directed amplification can be done.
  • Libraries made using the methods described herein can be made from genomic material or mRNA obtained from the biological sample that provide good coverage depth and high percentage coverage of the genome (or expressed genes). The libraries can provide a median coverage depth of at least 20, 25, 30, 35, 40, 50, 100, 500, 1000, 10,000 or 100,000 fold.
  • the libraries can also provide 80%, 90%, 95%, 99% or 100% coverage of a genome, expressed genes, or target sequence with a coverage depth of at least 20, 25, 30, 35, 40, 50, 100, 500, or 1000 fold.
  • the libraries can provide a sensitivity and/or precision in making sequence calls of 70%, 75%, 80%, 85%, 90%, 95%, 99%, 99.5% or 99.99%.
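The coverage and accuracy figures above can be made concrete with a small sketch. The text does not say how these metrics are computed, so the following uses the conventional definitions as an assumption: median of per-position read depth, fraction of positions at or above a depth threshold, sensitivity as TP / (TP + FN), and precision as TP / (TP + FP). The function names and the example depths are hypothetical.

```python
from statistics import median


def coverage_summary(depths, min_depth=20):
    """Summarize per-position read depths for a genome or target region.

    `depths` is a list with one read-depth value per reference position.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "median_depth": median(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


def sensitivity_and_precision(true_pos, false_pos, false_neg):
    """Conventional definitions (an assumption, not stated in the text)."""
    sensitivity = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return sensitivity, precision


# Hypothetical depths at ten positions (illustrative numbers only).
example_depths = [35, 42, 18, 50, 27, 31, 22, 40, 19, 33]
summary = coverage_summary(example_depths, min_depth=20)
```

In practice the per-position depths would come from an aligner or a depth-reporting tool; the sketch only shows how the stated thresholds (for example, 20-fold depth over 80% of positions) translate into a check.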
  • Libraries made using the methods described herein can be used to detect known and new mutations, detect new alleles, diagnose and/or monitor disease, diagnose and/or monitor disorders, monitor and/or improve the treatment of subjects suffering from a disease or disorder, and/or conduct retrospective studies for any of the preceding.
  • the methods described herein can also be used to investigate and identify mutations, sequence changes, and/or variants that are associated with and have predictive value for diagnosis and treatment of diseases.
  • adaptor-ligated is defined as a nucleic acid that has been ligated to an adaptor.
  • the adaptor can be ligated to a 5' end or a 3' end of a nucleic acid molecule, or can be added to an internal region of a nucleic acid molecule.
  • amplification of a nucleic acid sequence is defined as techniques for enzymatically increasing the number of copies of a nucleic acid. Amplification methods include both asymmetric methods (in which the predominant product is single-stranded) and symmetrical methods (in which the predominant product is double-stranded), such as, for example, PCR.
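As a rough illustration of the difference between symmetrical and asymmetric amplification, the sketch below uses two idealized textbook models that are assumptions here, not part of the definition above: exponential growth with a per-cycle efficiency for symmetrical PCR, and linear accumulation of one strand for an asymmetric reaction. The function names are hypothetical.

```python
def symmetric_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized symmetrical PCR: each cycle multiplies copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def asymmetric_copies(initial_templates, cycles, efficiency=1.0):
    """Idealized asymmetric (single-primer) reaction: one new strand per template per cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_templates * efficiency * cycles
```

Real reactions plateau and rarely reach an efficiency of 1.0, so these numbers are upper bounds for intuition only.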
  • As used herein, the terms “anneal,” “hybridize,” or “bind” are defined as two polynucleotide sequences, segments, or strands that have sufficient complementarity to each other to form a double-stranded nucleic acid, and these terms can be used interchangeably.
  • Two sequences with sufficient complementary bases e.g., DNA and/or RNA
  • barcode sequence is defined as a unique sequence of nucleotides that can encode information.
  • a barcode sequence can encode information relating to the identity of an interrogated allele, identity of a target polynucleotide or genomic locus, identity of a sample, a subject, or any combination thereof.
  • a barcode sequence can be a portion of a primer, a reporter probe, or both.
  • a barcode sequence may be at the 5'-end or 3'-end of an oligonucleotide, or may be located in any region of the oligonucleotide.
  • a barcode sequence generally is not part of a template sequence.
  • Barcode sequences may vary widely in size and composition; the following references provide guidance for selecting sets of barcode sequences appropriate for particular uses: Brenner, U.S. Pat. No. 5,635,400; Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Shoemaker et al, Nature Genetics, 14: 450-456 (1996); Morris et al, European patent publication 0799897A1; Wallace, U.S. Pat. No. 5,981,179, all of which are incorporated by reference in their entirety for all purposes.
  • a barcode sequence may have a length of about 4 to 36 nucleotides, about 6 to 30 nucleotides, or about 8 to 20 nucleotides.
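One way to picture a set of barcode sequences is a small generator that picks barcodes of a chosen length that differ from one another at several positions. The minimum Hamming distance criterion and the greedy selection below are assumptions made for illustration; the text only specifies the length ranges and points to the cited references for actual design guidance. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(length, count, min_distance=3):
    """Greedily pick `count` barcodes with pairwise Hamming distance >= min_distance."""
    if not 4 <= length <= 36:
        raise ValueError("length is expected to be about 4 to 36 nucleotides")
    chosen = []
    # Candidates are enumerated lazily in lexicographic order.
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
            if len(chosen) == count:
                return chosen
    raise ValueError("could not find enough barcodes for these settings")


sample_barcodes = pick_barcodes(length=6, count=8, min_distance=3)
```

A production design would usually also filter on GC content, homopolymer runs, and similarity to adaptor sequences, none of which is modeled here.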
  • As used herein, the terms “CNV,” “CNA,” “copy number alteration,” and “copy number variant” are used interchangeably, and are defined as sections of the genome that are repeated and whose number of copies can vary between individuals.
  • the term “complementary” is defined as a relationship between two antiparallel nucleic acid sequences in which the sequences are related by the base-pairing rules: A pairs with T or U and C pairs with G.
  • a first sequence or segment that is “perfectly complementary” to a second sequence or segment is complementary across its entire length and has no mismatches.
  • a first sequence or segment is “substantially complementary” to a second sequence or segment when a polynucleotide consisting of the first sequence is sufficiently complementary to specifically hybridize to a polynucleotide consisting of the second sequence.
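The base-pairing rules above translate directly into a check for perfect complementarity. This is a minimal sketch that assumes both sequences are written 5' to 3' and have equal length; "substantially complementary" depends on hybridization conditions and is not modeled. The function names are hypothetical.

```python
# Allowed pairs under the stated rules: A pairs with T or U, and C pairs with G.
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def count_mismatches(first, second):
    """Count non-pairing positions when `second` is aligned antiparallel to `first`."""
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        raise ValueError("this sketch compares sequences of equal length only")
    return sum((a, b) not in _PAIRS for a, b in zip(first, reversed(second)))


def is_perfectly_complementary(first, second):
    """True when the two antiparallel sequences pair at every position."""
    return len(first) == len(second) and count_mismatches(first, second) == 0
```

For example, `is_perfectly_complementary("ATGC", "GCAT")` holds because reading the second sequence in reverse gives T, A, C, G, which pairs position by position with A, T, G, C.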
  • deletion is defined as a mutation in which part of the genome or DNA sequence is lost or removed with respect to a human genome reference sequence. Deletions can be as small as 1 nucleotide or base pair, and can be as large as the loss of a chromosome.
  • genomic sequence is defined as a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequences that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.
  • Insertions are defined as a mutation in which part of the genome or DNA sequence includes a sequence not present in a human genome reference sequence. Insertions can be as small as 1 nucleotide or base pair, and can be as large as tens of millions of base pairs.
  • “library” or “sequencing library” are used interchangeably and are defined as a plurality of nucleic acid fragments obtained from a biological sample. Generally, the fragments are modified with an adaptor sequence which effects coupling (e.g., capture and/or immobilization) of the fragments to a sequencing platform, and the adaptors also include primer sequences for amplifying and/or sequencing of the nucleic acid.
  • ligating is defined as the enzyme catalyzed joining of the terminal nucleotide at the 5' end of a first DNA molecule to the terminal nucleotide at the 3' end of a second DNA molecule.
  • locus is defined as a location of a gene, nucleotide, or sequence on a chromosome.
  • An "allele” of a locus can refer to an alternative form of a nucleotide or sequence at the locus.
  • a “wild-type allele” refers to an allele that has the highest frequency in a population of subjects.
  • a “wild-type allele” generally is not associated with a disease.
  • mutation is defined as a change of the nucleotide sequence of a wild-type genome. Mutations can involve large sections of DNA (e.g., copy number variation). Mutations can involve whole chromosomes (e.g., aneuploidy). Mutations can involve small sections of DNA.
  • mutations involving small sections of DNA include, e.g., point mutations or single nucleotide polymorphisms, multiple nucleotide polymorphisms, insertions (e.g., insertion of one or more nucleotides at a locus), multiple nucleotide changes, deletions (e.g., deletion of one or more nucleotides at a locus), and inversions (e.g., reversal of a sequence of one or more nucleotides).
  • As used herein, the terms “polynucleotides,” “nucleic acid,” “nucleotides,” and “oligonucleotides” can be used interchangeably, and are defined to mean a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
  • Non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
  • a polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs.
  • modifications to the nucleotide structure may be imparted before or after assembly of the polymer.
  • the sequence of nucleotides may be interrupted by non-nucleotide components.
  • a polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
  • Rearrangement is defined as chromosome changes that result in a different structure of a native chromosome. Rearrangements can be, for example, deletions, duplications, inversions, and translocations.
  • a sample or “nucleic acid sample” are defined as any substance containing or presumed to contain nucleic acid.
  • the sample can be a biological sample obtained from a subject.
  • the nucleic acids can be RNA, DNA, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA.
  • the nucleic acids in a nucleic acid sample generally serve as templates for extension of a hybridized primer.
  • the biological sample is a liquid sample.
  • the liquid sample can be whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, or organ rinse.
  • the liquid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, tears, etc).
  • the biological sample is a solid biological sample, e.g., feces or tissue biopsy, e.g., a tumor biopsy.
  • a sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
  • sequence matching is defined as a method by which the identity of at least 10 consecutive nucleotides (e.g., the identity of at least 20, at least 50, at least 100, at least 200, or at least 500 or more consecutive nucleotides) of a polynucleotide are obtained.
  • single nucleotide variant or "SNV” is defined as a type of genomic sequence variation resulting from a single nucleotide substitution within a sequence.
  • “SNV alleles” or “alleles of a SNV” generally refer to alternative forms of the SNV at a particular locus.
  • small indels are defined as small insertions and small deletions in the genome. These small insertions and small deletions are 1 to 200 bp in length.
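The variant definitions above (SNV, insertion, deletion, small indel of 1 to 200 bp) can be sketched as a simple classifier. Representing a variant as a pair of reference and alternate allele strings is an assumption borrowed from common variant-call formats, not something the text prescribes, and the function name is hypothetical.

```python
def classify_variant(ref, alt, small_indel_max=200):
    """Classify a variant from its reference and alternate allele strings.

    The 200 bp cutoff for "small" indels follows the definition in the text.
    """
    if ref == alt:
        raise ValueError("ref and alt are identical; not a variant")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "multiple nucleotide change"
    size = abs(len(alt) - len(ref))  # net number of bases gained or lost
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    return f"small {kind}" if size <= small_indel_max else kind
```

Inversions, translocations, and copy number changes need positional or read-depth evidence and are outside what an allele-string comparison can decide.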
  • the term "subject" is defined to mean a biological entity containing expressed genetic materials.
  • the biological entity can be a plant, animal, or microorganism, including, e.g., bacteria, viruses, fungi, and protozoa.
  • the subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro.
  • the subject can be a mammal.
  • the mammal can be a human.
  • the human may be diagnosed or suspected of being at high risk for a disease.
  • the disease can be cancer.
  • target polynucleotide is defined as a polynucleotide of interest under study.
  • a target polynucleotide may contain one or more sequences that are of interest and under study.
  • a target polynucleotide can comprise, for example, a genomic sequence.
  • the target polynucleotide can comprise a target sequence whose presence, amount, and/or nucleotide sequence, or changes in these, are desired to be determined.
  • translocation is defined as the transfer of a segment from one chromosome to another chromosome, or transfer of the segment to a new location in the same chromosome.
  • wild-type is defined as a gene sequence that is most prevalent for a locus among a population of subjects of the same species.
  • libraries described herein are prepared from single stranded or double stranded nucleic acids.
  • Single-stranded nucleic acids can be prepared from a sample of double-stranded nucleic acid using any means known in the art or described herein.
  • Starting samples can be a biological sample obtained from a subject.
  • the biological sample can be tissues, cells and their progeny obtained from a subject.
  • the biological sample can be a liquid sample including, for example, whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, or organ rinse.
  • the liquid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, tears, etc).
  • the biological sample can also be a solid biological sample, e.g., feces, tissue biopsy, a tumor biopsy, FFPE tissues, etc.
  • a biological sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
  • the biological sample can be any of the above stored in a suitable way such as, for example, FFPE, lyophilized, stored in buffers, frozen, etc.
  • the nucleic acid can be DNA obtained from a biological sample.
  • the DNA can be obtained from formalin-fixed paraffin-embedded (FFPE) tissues (or cells) or circulating DNA.
  • DNA can be isolated from FFPE samples using commercially available kits (e.g., those sold by Qiagen or Covaris).
  • the DNA can also be cDNA generated from RNA isolated from a biological sample using random primed reverse transcription (RNaseH+) to generate randomly sized cDNA.
  • the DNA can be fragmented in situ to a desired size, e.g., the DNA can be sheared to an average size of 500-600 base pairs. Fragmented DNA can be treated with a base excision repair enzyme or enzyme cocktail (e.g., Endo VIII, formamidopyrimidine DNA glycosylase (FPG)) to excise damaged bases that can interfere with polymerization.
  • the DNA can also be treated with a proof-reading polymerase (e.g. T4 DNA polymerase) to polish ends and replace damaged nucleotides (e.g. abasic sites) and a heat-labile phosphatase to remove all phosphate groups from DNA.
  • the reaction mixture can be heated to 80 °C for 10 min to inactivate the phosphatase and/or polymerase and denature double stranded DNA to single strands.
  • the nucleic acid sample can also be enriched for target polynucleotides.
  • Target enrichment can be by any means known in the art.
  • the nucleic acid sample may be enriched by amplifying target sequences using target-specific primers.
  • the target amplification can occur in a digital PCR format, using any methods or systems known in the art.
  • the nucleic acid sample may be enriched by capture of target sequences onto an array of immobilized, target-selective oligonucleotides.
  • the nucleic acid sample may be enriched by hybridizing to target-selective oligonucleotides free in solution.
  • the oligonucleotides may comprise a capture moiety which enables capture by a capture reagent.
  • Other target capture methods are described in United States patent application Serial No. 15/099,525 filed April 14, 2016, which is incorporated by reference in its entirety for all purposes.
  • Nucleic acid libraries can be made using the methods disclosed herein.
  • double stranded DNA (or other nucleic acids) can be fragmented to a desired size using, e.g., restriction enzymes, nuclease, sonication, shearing, other physical treatments that break the nucleic acids, or combinations of the foregoing.
  • the fragmented nucleic acids are obtained from formalin-fixed paraffin-embedded (FFPE) samples or from other sources with damaged DNA.
  • Damage can include, for example, apurinic sites, apyrimidinic sites, thymine dimers, nicks, gaps, deaminated cytosine, and 8-oxoguanine.
  • Treatments for damaged nucleic acids include, for example, use of AP-endonuclease 1, Uracil DNA glycosylase (UDG), formamidopyrimidine [fapy]-DNA glycosylase (Fpg), bifunctional DNA glycosylase OGG1, other glycosylases, DNA polymerase β, X-ray repair cross-complementing group 1 (XRCC1), DNA ligase III, Poly(ADP-ribose) polymerase (PARP-1), Uvr proteins, Endo VIII, nucleotide excision repair enzymes (e.g., CETN2, DDB1, DDB2, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, ERCC6, ERCC8, LIG1, MNAT1, MMS
  • repair products include, for example, PreCR Repair Mix sold by New England Biolabs, NEBNext FFPE DNA Repair Mix sold by New England Biolabs, or the DNA Repair Kits (version B) catalog Nos. 51296 & 51796 sold by Active Motif North America.
  • the damaged nucleic acids can be treated using the above to remove the damaged base and the sugar to which it was attached resulting in a gap or break in the nucleic acid.
  • the double stranded DNA can be converted to single stranded DNA by denaturing the dsDNA with an appropriate treatment (e.g., heat, chaotropes, or combinations).
  • Heat denaturation can be achieved by heating a dsDNA sample to about 60 °C or above, about 65 °C or above, about 70 °C or above, about 75 °C or above, about 80 °C or above, about 85 °C or above, about 90 °C or above, about 95 °C or above, or about 98 °C or above.
  • the dsDNA sample can be heated by any means known in the art, including, e.g., incubation in a water bath, a temperature controlled heat block, or a thermal cycler. In some embodiments the sample is heated for 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 minutes.
  • Denaturation by incubation in basic pH can be achieved by, for example, incubation of a dsDNA sample in a solution comprising sodium hydroxide (NaOH) or potassium hydroxide (KOH).
  • the solution can comprise about 1 mM NaOH, 2 mM NaOH, 5 mM NaOH, 10 mM NaOH, 20 mM NaOH, 40 mM NaOH, 60 mM NaOH, 80 mM NaOH, 100 mM NaOH, 0.2M NaOH, about 0.3M NaOH, about 0.4M NaOH, about 0.5M NaOH, about 0.6M NaOH, about 0.7M NaOH, about 0.8M NaOH, about 0.9M NaOH, about 1.0M NaOH, or greater than 1.0M NaOH.
  • the solution can comprise about 1 mM KOH, 2 mM KOH, 5 mM KOH, 10 mM KOH, 20 mM KOH, 40 mM KOH, 60 mM KOH, 80 mM KOH, 100 mM KOH, 0.2M KOH, 0.5M KOH, 1M KOH, or greater than 1M KOH.
  • the dsDNA sample can be incubated in NaOH or KOH for 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, or more than 60 minutes.
  • the dsDNA can be incubated in Na-acetate following NaOH or KOH incubation.
  • the ssDNA fragments (or dsDNA prior to denaturation) can then be treated to add a phosphate to the 5' end of the fragments and optionally to remove the phosphate on the 3' end of the fragments.
  • the DNA can be treated with T4 polynucleotide kinase and ATP to add phosphate to the 5' end and remove phosphates from the 3' end of the DNA strands.
  • the 3' end can also be blocked with appropriate blocking groups, e.g., dideoxynucleotides can be added to the 3' end, or reversible protection groups can be added to the 3' end to prevent ligation reactions at the 3' end of the DNA fragments or strands.
  • Removal of 3' phosphates or blocking of the 3' end of the DNA fragments can minimize aberrant ligation of two library members. Accordingly, in some embodiments, 3' phosphates are removed and/or 3' ends are blocked in at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or greater than 95% of DNA fragments.
  • Substantially all phosphate groups can be removed and/or substantially all 3' ends can be blocked in the DNA fragments. Substantially all phosphates are removed and/or substantially all 3' ends are blocked in at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or greater than 95% of DNA fragments in a sample.
  • the 3'-end of a nucleic acid, adapter or other oligonucleotide can be blocked with a phosphate.
  • the 3'-phosphate can be removed, allowing the 3' end of the nucleic acid, adapter or other oligonucleotide to react in a chain polymerization reaction.
  • Other 3' end blocking groups include, for example, a 3'-O-azidomethyl group, a dideoxynucleotide, 3'-O-(α-methoxyethyl) ether, 3'-O-isovaleryl ester, 3'-ONH2 blocking groups, certain dyes, etc.
  • the ligase can be a DNA or RNA ligase.
  • Commercially sold DNA ligases include, for example, T3 DNA ligase, T4 DNA ligase, T7 DNA ligase, Taq DNA ligase, DNA ligase III, or Pfu DNA ligase.
  • the RNA ligase can be an Rnl 1 or Rnl 2 family ligase. Generally, Rnl 1 family ligases can repair single-stranded breaks in tRNA.
  • Exemplary Rnl 1 family ligases include, e.g., T4 RNA ligase, or thermostable RNA ligase 1 from Thermus scotoductus bacteriophage TS2126. These ligases generally catalyze the ATP-dependent formation of a phosphodiester bond between a nucleotide 3'-OH nucleophile and a 5' phosphate group. Generally, Rnl 2 family ligases can seal nicks in duplex RNAs. Exemplary Rnl 2 family ligases include, e.g., T4 RNA ligase 2.
  • the RNA ligase can be an Archaeal RNA ligase, e.g., an archaeal RNA ligase from the thermophilic archaeon Methanobacterium thermoautotrophicum (MthRnl).
  • the ligation of adaptors to the single-stranded or double stranded nucleic acid fragments can comprise preparing a reaction mixture comprising the DNA fragments, an adaptor, and ligase.
  • the reaction can be performed at room temperature, or at lower temperatures.
  • the reaction mixture can also be heated to effect ligation of the adaptors to the DNA fragments.
  • the reaction mixture can be heated to about 30 °C, about 35 °C, 37 °C, about 40 °C, about 45 °C, about 50 °C, about 55 °C, about 60 °C, about 65 °C, about 70 °C, or above 70 °C.
  • the reaction mixture can be heated to about 60-70 °C.
  • the reaction mixture can be heated for a sufficient time to effect ligation of the adaptor to the DNA fragment.
  • the reaction mixture can be heated for about 5 min, about 10 min, about 15 min, about 20 min, about 25 min, about 30 min, about 35 min, about 40 min, about 45 min, about 50 min, about 55 min, about 60 min, about 70 min, about 80 min, about 90 min, about 120 min, about 150 min, about 180 min, about 210 min, about 240 min, or more than 240 min.
  • the adaptors can be present at a concentration that is greater than the concentration of DNA fragments in the mixture.
  • the adaptors can be present at a concentration that is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% or more than 100% greater than the concentration of DNA fragments in the mixture.
  • the adaptors can be present at concentration that is at least 10-fold, 100-fold, 1000-fold, or 10000-fold greater than the concentration of DNA fragments in the mixture.
  • the adaptors can be present at a final concentration of 0.1 uM, 0.5 uM, 1 uM, 10 uM or greater.
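As a rough illustration of the concentration ranges above, the molar excess of adaptor over DNA fragments can be estimated from the input mass and the mean fragment length. The sketch below assumes single-stranded DNA at roughly 330 g/mol per nucleotide; the function names and the example input (100 ng of 550-nt fragments in a 50 µL reaction) are hypothetical and not taken from the disclosure.

```python
AVG_SSDNA_NT_MASS = 330.0  # g/mol per nucleotide; approximate value (assumption)


def fragment_conc_uM(mass_ng, mean_len_nt, volume_uL):
    """Molar concentration (in uM) of ssDNA fragments in the reaction mixture."""
    moles = (mass_ng * 1e-9) / (mean_len_nt * AVG_SSDNA_NT_MASS)  # grams -> mol
    molar = moles / (volume_uL * 1e-6)                            # mol per litre
    return molar * 1e6                                            # M -> uM


def adaptor_fold_excess(adaptor_uM, mass_ng, mean_len_nt, volume_uL):
    """Ratio of adaptor concentration to fragment concentration."""
    return adaptor_uM / fragment_conc_uM(mass_ng, mean_len_nt, volume_uL)


# Hypothetical reaction: 100 ng of 550-nt fragments in 50 uL, adaptor at 1 uM.
fragments = fragment_conc_uM(100, 550, 50)
excess = adaptor_fold_excess(1.0, 100, 550, 50)
print(f"fragments: {fragments:.4f} uM, adaptor excess: {excess:.0f}-fold")
```

Under these assumptions a 1 µM adaptor concentration corresponds to an excess of somewhat under 100-fold, which falls between the 10-fold and 100-fold thresholds listed above.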
  • the ligase can be present in the reaction mixture at any amount suitable for ligation, including for example, a saturating amount.
  • the reaction mixture can also comprise a high molecular weight inert molecule, e.g., PEG of MW 4000, 6000, or 8000.
  • the inert molecule can be present in an amount that is about 0.5%, 1%, 2%, 3%, 4%, 5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or greater than 50% weight/volume.
  • the inert molecule can be present in an amount that is about 0.5-2%, about 1-5%, about 2-15%, about 10-20%, about 15-30%, about 20-50%, or more than 50% weight/volume.
  • unligated 5'-adapters can be separated or removed from the fragment-5'-adapters by purification or other methods including, for example, filtration by molecular weight cutoff, size exclusion chromatography, use of a spin column, selective precipitation with polyethylene glycol (PEG), selective precipitation with PEG onto a silica matrix, alcohol precipitation, sodium acetate precipitation, PEG and salt precipitation, or high stringency washing.
  • the fragment-5'-adapters are then ligated with 3'-adapters on the 3' end of the 5'-adapter-fragment.
  • If a protective group has been placed on the 3' end of the fragments to prevent ligation reactions, this protective group must be removed prior to the ligation of the 3'-adapter. If desired, unligated 3'-adapters can be separated or removed from the 5'-adapter-fragment-3'-adapter by purification or other methods.
  • the library of 5'-adapter-fragment-3'-adapter nucleic acids can be directly sequenced, or the library can be subject to amplification.
  • the amplification can be performed using primers specific for sequences in the adapters, or a target-directed amplification can be done using approaches such as those described in, for example, United States patent application Serial No. 15/099,525 filed April 14, 2016, which is incorporated by reference in its entirety for all purposes.
  • the methods described herein can produce libraries for a variety of purposes including, for example, detection of mutations, detection of alleles, retrospective studies, disease diagnostics and monitoring, diagnostics and monitoring for disorders, research, etc.
  • the libraries can be made from any biological sample including, for example, a sample obtained from a biological entity containing expressed genetic materials.
  • the biological entity can be obtained from a plant, animal, or microorganism, including, e.g., bacteria, viruses, fungi, and protozoa.
  • the biological sample can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro.
  • the biological entity can be a mammal, and the mammal can be a human.
  • the human may be diagnosed or suspected of being at high risk for a disease.
  • the disease can be cancer.
  • the biological sample can be a liquid sample including, for example, whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, or organ rinse.
  • the liquid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, tears, etc).
  • the biological sample can be a solid biological sample, e.g., feces, tissue biopsy, tumor biopsy, FFPE tissues.
  • a biological sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
  • the biological sample can be any of the above stored in a suitable way such as, for example, FFPE, lyophilized, stored in buffers, frozen, etc.
  • Nucleic acids are obtained from the biological sample for use in making the libraries described herein.
  • Nucleic acids can be RNA, DNA, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA.
  • the nucleic acids in a nucleic acid sample can serve as templates for extension of a hybridized primer and/or can be substrates for attachment of adapters.
  • the nucleic acids in the library can be single stranded or double stranded.
  • the nucleic acids in the library can be directly sequenced or the nucleic acids can be amplified (selectively or non-selectively) followed by sequencing.
  • the nucleic acids are generally modified with an adaptor sequence which effects coupling (e.g., capture and/or immobilization) of the fragments to a sequencing platform, and which adapters can include sequences complementary to primers useful for amplification or sequencing of the nucleic acids.
  • the nucleic acids of the library can also be used to make whole exome libraries by generating a whole-genome library that can then be subject to capture of the known exon regions in the human or other organism genome, by methods known in the art such as hybridization with a set of biotinylated long oligonucleotide baits complementary to said regions and subsequent pull-down.
  • the nucleic acids in the library can be enriched for target polynucleotides.
  • Target enrichment can be by any means known in the art.
  • the nucleic acid sample may be enriched by amplifying target sequences using target-specific primers. The target amplification can occur in a digital PCR format, using any methods or systems known in the art.
  • the nucleic acid sample may be enriched by capture of target sequences onto an array having target-selective oligonucleotides immobilized thereon.
  • the nucleic acid sample may be enriched by hybridizing to target-selective oligonucleotides free in solution.
  • the oligonucleotides may comprise a capture moiety which enables capture by a capture reagent. Exemplary capture moieties and capture reagents are described herein.
  • Libraries can be made from genomic material or mRNA that provide good coverage depth and high percentage coverage of the genome (or expressed genes).
  • the libraries can provide a median coverage depth of at least 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 fold.
  • the libraries can provide a median coverage depth of at least 20-30, 20-35, 20-40, 20-50, 20-60, 20-70, 20-80, 20-90, 20-100, 30-40, 30-50, 30-60, 30-70, 30-80, 30-90, 30-100, 40-100, 50-100, 60-100, 70-100, 80-100, 90-100, 100-200, 100-300, 100-400, 100-500, 100-600, 100-700, 100-800, 100-900, 100-1000, 200-500, 500-1000, 1000-10,000, 10,000-50,000, or 50,000-100,000 fold.
  • the libraries can also provide 70%, 80%, 90%, 95%, 99% or 100% coverage of a genome, expressed genes, or target sequence with a coverage depth of 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 fold.
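The two coverage figures discussed above (median depth, and the percentage of positions covered at or above a given depth) can be computed from per-position read depths. The following is a minimal sketch; the function name and the toy depth values are hypothetical, and in practice the depths would come from an alignment of the sequenced library.

```python
from statistics import median


def coverage_summary(depths, min_depth):
    """Return (median depth, fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("depths must be non-empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return median(depths), covered / len(depths)


# Toy per-position depths for a ten-base target (illustrative values only).
depths = [0, 18, 22, 25, 30, 31, 35, 40, 42, 60]
med, breadth = coverage_summary(depths, min_depth=20)
print(f"median depth: {med}x, covered at >=20x: {breadth:.0%}")
```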
  • the libraries can provide a sensitivity and/or precision of 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, 99.5% or 99.99%.
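Sensitivity and precision are usually measured by comparing the variants called from a library against a truth set, for example one derived from a characterized reference standard. The sketch below assumes variants are represented as hashable tuples; the function name and the example variants are hypothetical.

```python
def sensitivity_precision(called, truth):
    """Compare called variants with a truth set.

    sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and in the truth set
    fn = len(truth - called)   # false negatives: in the truth set but missed
    fp = len(called - truth)   # false positives: called but not in the truth set
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return sensitivity, precision


# Hypothetical variants as (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 50, "C", "T"), ("chr3", 10, "T", "G")}
called = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
          ("chr2", 50, "C", "T"), ("chr2", 75, "A", "G")}
sens, prec = sensitivity_precision(called, truth)
print(f"sensitivity: {sens:.0%}, precision: {prec:.0%}")
```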
  • Any sequencing methodologies may be used with the nucleic acids disclosed herein.
  • Commercially available sequencing methods include, e.g., sequencing-by-synthesis, ion semiconductor sequencing, pyrosequencing, reversible dye terminator sequencing, sequencing by ligation, single-molecule sequencing, sequencing by hybridization, and nanopore sequencing.
  • Platforms for sequencing by synthesis are available from, e.g., Illumina, 454 Life Sciences, Helicos Biosciences, and Qiagen.
  • Illumina platforms can include, e.g., Illumina's Solexa platform and Illumina's Genome Analyzer, and are described in Gudmundsson et al. (Nat. Genet. 2009 41:1122-6), Out et al. (Hum. Mutat.
  • Platforms for ion semiconductor sequencing include, e.g., the Ion Torrent Personal Genome Machine (PGM) and are described in U.S. Pat. No. 7,948,015, which is incorporated by reference in its entirety for all purposes.
  • Platforms for pyrosequencing include the GS Flex 454 system and are described in U.S. Pat. Nos. 7,211,390; 7,244,559; 7,264,929, which are incorporated by reference in their entirety for all purposes.
  • Platforms and methods for sequencing by ligation include, e.g., the SOLiD sequencing platform from Thermo Fisher described in U.S. Pat. No.
  • the DNA sequencing technology can utilize the Ion Torrent sequencing platform, which pairs semiconductor technology with a sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip.
  • the Ion Torrent platform detects the release of a hydrogen ion as a change in pH. A detected change in pH can be used to indicate nucleotide incorporation.
  • the Ion Torrent platform comprises a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way.
  • Each well holds a different library member, which may be clonally amplified. Beneath the wells is an ion-sensitive layer and beneath that an ion sensor.
  • the platform sequentially floods the array with one nucleotide after another.
  • If a nucleotide, for example a C, is incorporated into the growing strand, a hydrogen ion will be released.
  • the charge from that ion will change the pH of the solution, which can be identified by Ion Torrent's ion sensor. If the nucleotide is not incorporated, no voltage change will be recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage will be double, and the chip will record two identical bases called. Direct identification allows recordation of nucleotide incorporation in seconds.
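The flow-by-flow logic described above (no signal when the flowed nucleotide is not incorporated, a doubled signal for two identical bases in a row) can be sketched as a small simulation. This is an idealized illustration, not the platform's actual signal processing: the flow order `TACG`, the function names, and the noise-free integer signals are all assumptions.

```python
FLOW_ORDER = "TACG"  # hypothetical order in which nucleotides are flowed


def simulate_flows(synthesized, flow_order=FLOW_ORDER, max_flows=400):
    """Return (nucleotide, signal) per flow for the strand being synthesized.

    The idealized signal equals the number of identical bases incorporated
    during that flow: 0 for no incorporation, 2 for a two-base homopolymer.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        n = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            n += 1
            pos += 1
        signals.append((base, n))
        flow += 1
    return signals


def call_bases(signals):
    """Translate per-flow signals back into a base sequence."""
    return "".join(base * n for base, n in signals)


signals = simulate_flows("TTACGGA")
print(signals)
print(call_bases(signals))
```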
  • Library preparation for the Ion Torrent platform generally involves ligation of two distinct adaptors at both ends of a DNA fragment.
  • Illumina products generally employ cluster amplification of library members onto a flow cell and a sequencing-by-synthesis approach.
  • Cluster-amplified library members are subjected to repeated cycles of polymerase-directed single base extension.
  • Single-base extension can involve incorporation of reversible-terminator dNTPs, each dNTP labeled with a different removable fluorophore.
  • the reversible-terminator dNTPs are generally 3' modified to prevent further extension by the polymerase.
  • the incorporated nucleotide can be identified by fluorescence imaging. Following fluorescence imaging, the fluorophore can be removed and the 3' modification can be removed resulting in a 3' hydroxyl group, thereby allowing another cycle of single base extension.
  • Library preparation for the Illumina platform generally involves ligation of two distinct adaptors at both ends of a DNA fragment.
  • HELICOS™ True Single Molecule Sequencing can employ sequencing-by-synthesis technology.
  • a polyA adaptor can be ligated to the 3' end of DNA fragments.
  • the adapted fragments can be hybridized to poly-T oligonucleotides immobilized on the TSMS™ flow cell.
  • the library members can be immobilized onto the flow cell at a density of about 100 million templates/cm².
  • the flow cell can then be loaded into an instrument, e.g., a HELISCOPE™ sequencer, and a laser can illuminate the surface of the flow cell, revealing the position of each template.
  • a CCD camera can map the position of the templates on the flow cell surface.
  • the library members can be subjected to repeated cycles of polymerase-directed single base extension.
  • the sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide.
  • the polymerase can incorporate the labeled nucleotides to the primer in a template directed manner.
  • the polymerase and unincorporated nucleotides can be removed.
  • the templates that have directed incorporation of the fluorescently labeled nucleotide can be discerned by imaging the flow cell surface. After imaging, a cleavage step can remove the fluorescent label, and the process can be repeated with other fluorescently labeled nucleotides until a desired read length is achieved. Sequence information can be collected with each nucleotide addition step.
  • the 454 sequencing platform (Roche) (e.g. as described in Margulies, M. et al. Nature 437:376-380 (2005), which is incorporated by reference in its entirety for all purposes) generally uses two steps.
  • DNA can be sheared into fragments.
  • the fragments can be blunt-ended.
  • Oligonucleotide adaptors can be ligated to the ends of the fragments.
  • the adaptors generally serve as primers for amplification and sequencing of the fragments.
  • At least one adaptor can comprise a capture reagent, e.g., a biotin.
  • the fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads.
  • the fragments attached to the beads can be PCR amplified within droplets of an oil-water emulsion, resulting in multiple copies of clonally amplified DNA fragments on each bead.
  • the beads can be captured in wells, which can be pico-liter sized. Pyrosequencing can be performed on each DNA fragment in parallel. Pyrosequencing generally detects release of pyrophosphate (PPi) upon nucleotide incorporation. PPi can be converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase can use ATP to convert luciferin to oxyluciferin, thereby generating a light signal that is detected.
  • the SOLiD™ platform generally utilizes a sequencing-by-ligation approach.
  • Library preparation for use with a SOLiD™ platform generally comprises ligation of adaptors to the 5' and 3' ends of the fragments to generate a fragment library.
  • internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library.
  • clonal bead populations can be prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates can be denatured. Beads can be enriched for beads with extended templates. Templates on the selected beads can be subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide can be removed and the process can then be repeated.
  • Single molecule, real-time (SMRT™) sequencing uses the continuous incorporation of dye-labeled nucleotides with imaging during DNA synthesis.
  • Single DNA polymerase molecules can be attached to the bottom surface of individual zero-mode waveguides (ZMWs), which obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand.
  • A ZMW generally refers to a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against a background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW on a microsecond scale.
  • incorporation of a nucleotide generally occurs on a milliseconds timescale.
  • the fluorescent label can be excited to produce a fluorescent signal, which is detected. Detection of the fluorescent signal can be used to generate sequence information. The fluorophore can then be removed, and the process repeated.
  • Library preparation for the SMRT™ platform generally involves ligation of hairpin adaptors to the ends of DNA fragments.
  • Nanopore sequencing DNA analysis techniques are being industrially developed by a number of companies, including Oxford Nanopore Technologies (Oxford, United Kingdom).
  • Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore.
  • a nanopore can be a small hole, of the order of 1 nanometer in diameter.
  • Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it can result in a slight electrical current due to conduction of ions through the nanopore.
  • the amount of current which flows is sensitive to the size and shape of the nanopore and to occlusion by, e.g., a DNA molecule.
  • each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree.
  • this change in the current as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.
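The idea that each nucleotide shifts the current by a characteristic amount can be illustrated with a deliberately simplified decoder that assigns each measured current to the nearest reference level. The current values below are invented for illustration, and the one-level-per-base model is an assumption; practical nanopore base calling is considerably more involved.

```python
# Hypothetical reference current levels (pA) for each nucleotide in the pore.
LEVELS_PA = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def decode_currents(currents):
    """Assign each measured current to the base with the nearest reference level."""
    return "".join(
        min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - x))
        for x in currents
    )


# One noisy measurement per translocated nucleotide (illustrative values).
trace = [51.2, 78.9, 69.0, 61.5, 49.0]
print(decode_currents(trace))
```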
  • the DNA sequencing technology can utilize a chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 20090026082, which is incorporated by reference in its entirety for all purposes).
  • DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned by a change in current by a chemFET.
  • An array can have multiple chemFET sensors.
  • single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.
  • the DNA sequencing technology can utilize transmission electron microscopy (TEM).
  • the method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), generally comprises single atom resolution transmission electron microscope imaging of high-molecular weight (150 kb or greater) DNA selectively labeled with heavy atom markers and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing.
  • the electron microscope is used to image the molecules on the films to determine the position of the heavy atom markers and to extract base sequence information from the DNA.
  • The method is further described in PCT patent publication WO 2009/046445, which is incorporated by reference in its entirety for all purposes. The method allows for sequencing complete human genomes in less than ten minutes.
  • Sequencing By Hybridization generally comprises contacting a plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can be optionally tethered to a substrate.
  • the substrate can be a flat surface comprising an array of known nucleotide sequences.
  • the pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample.
  • each probe is tethered to a bead, e.g., a magnetic bead or the like.
  • Hybridization to the beads can be identified and used to identify the plurality of polynucleotide sequences within the sample.
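How a hybridization pattern can yield a sequence may be sketched by treating the set of probes that hybridize as a k-mer spectrum and chaining probes that overlap by k-1 bases. This is a minimal illustration that only works when every overlap is unique; the function names and the example sequence are hypothetical, and repeated k-mers would require a more general method.

```python
def spectrum(seq, k):
    """Set of all k-mers in seq, i.e., the probes expected to hybridize."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence from its k-mer spectrum when all overlaps are unique."""
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    # The first k-mer is the only one whose prefix is not another k-mer's suffix.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or circular spectrum")
    seq = starts[0]
    remaining = kmers - {seq}
    while remaining:
        tail = seq[-(len(starts[0]) - 1):]
        nxt = [m for m in remaining if m[:-1] == tail]
        if len(nxt) != 1:
            raise ValueError("ambiguous spectrum")
        seq += nxt[0][-1]          # extend by the one new base
        remaining.remove(nxt[0])
    return seq


probes = spectrum("ATGGCGTGCA", 4)
print(sorted(probes))
print(reconstruct(probes))
```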
  • The length of the sequence read can vary depending on the particular sequencing technology utilized. Sequencing methodologies can provide sequence reads that vary in size from tens to hundreds, or thousands of base pairs. Using sequencing methods described herein, and others known in the art, the sequence reads can be about 20 bases long, about 25 bases long, about 30 bases long, about 35 bases long, about 40 bases long, about 45 bases long, about 50 bases long, about 55 bases long, about 60 bases long, about 65 bases long, about 70 bases long, about 75 bases long, about 80 bases long, about 85 bases long, about 90 bases long, about 95 bases long, about 100 bases long, about 110 bases long, about 120 bases long, about 130 bases long, about 140 bases long, about 150 bases long, about 200 bases long, about 250 bases long, about 300 bases long, about 350 bases long, about 400 bases long, about 450 bases long, about 500 bases long, about 600 bases long, about 700 bases long, about 800 bases long, about 900 bases long, about 1000 bases long, 2000 bases long, 3000 bases long, 4000 bases long, 5000 bases long
  • Mapping of the sequences can be achieved by comparing the sequence with the sequence of a reference genome to determine the chromosomal origin of the sequenced nucleic acid (e.g. cell free DNA) molecule, and specific genetic sequence information is not needed.
  • a number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, Calif., USA).
  • One end of the clonally expanded copies of the DNA molecule can be sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. Additional software includes SAMtools (SAMtools, Bioinformatics, 2009, 25(16):2078-9), and the Burrows-Wheeler block sorting compression procedure, which involves block sorting or preprocessing to make compression more efficient.
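The block-sorting step mentioned above can be illustrated with a naive Burrows-Wheeler transform built from sorted rotations. This is a teaching sketch under simple assumptions (a `$` sentinel that sorts before all other characters, quadratic memory use); production aligners use far more efficient index constructions.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive implementation)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"  # sentinel marks the end of the text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column):
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    row = next(r for r in table if r.endswith("$"))
    return row[:-1]


transformed = bwt("GATTACA")
print(transformed)
print(inverse_bwt(transformed))
```

The transform tends to group identical characters together, which is what makes the subsequent compression and indexing more efficient.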
  • Sequences obtained from multiple samples from a plurality of chromosomes can also be compared to identify sequence variants using the tools described herein. This direct sequence comparison can be done without the use of a reference genome.
  • An adaptor sequence can comprise a defined oligonucleotide sequence that effects coupling of a library member to a sequencing platform.
  • the adaptor can include a bar code, and sequences complementary to an immobilizing polynucleotide, primers for amplification, sequencing primers, and other primers.
  • An adaptor can include all of these sequences and others, or it can have a subset of these sequences.
  • the adaptor can comprise a sequence appropriate for a capture probe immobilized onto a solid support (e.g., a sequencing flow cell or bead).
  • the adaptor sequence for capture has sufficient complementarity so that the adaptor anneals to the capture probe under appropriate conditions.
  • An adaptor sequence can also comprise a defined oligonucleotide sequence appropriate for a sequencing primer (e.g., the adaptor sequence has sufficient complementarity or identity so the sequencing primer can anneal under the appropriate conditions).
  • the sequencing primer can enable nucleotide incorporation by a polymerase, wherein incorporation of the nucleotide is monitored to provide sequencing information.
  • the sequencing primer can be about 15-25 bases.
  • the sequencing primer can be conjugated to the 3' end of the adaptor.
  • An adaptor can comprise a sequence that has sufficient complementarity or identity to an oligonucleotide sequence immobilized onto a solid support so that the immobilized probe and the adaptor can anneal under appropriate conditions. Coupling can also be achieved through serially stitching adaptors together.
  • the number of adaptors that can be stitched can be 1, 2, 3, 4 or more.
  • the stitched adaptors can be at least 35 bases, 70 bases, 105 bases, 140 bases or more.
  • the adaptor can also comprise a barcode sequence.
  • Each fragment in the library can have a unique bar code, or fragments in the library can share a bar code, depending on the use and purpose of the bar code. For example, at least 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of sequencing library members in a library could comprise the same adaptor sequence.
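When fragments from one sample share a bar code, reads can be assigned back to their sample by inspecting the bar code carried in the adaptor. The sketch below assumes the bar code sits at the start of each read and that all bar codes have the same length; the bar code sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical bar codes; all are assumed to have the same length.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading bar code and strip it from assigned reads."""
    bins = defaultdict(list)
    k = len(next(iter(barcodes)))
    for read in reads:
        sample = barcodes.get(read[:k], "unassigned")
        # Keep unassigned reads intact so they can be inspected later.
        bins[sample].append(read[k:] if sample != "unassigned" else read)
    return dict(bins)


reads = ["ACGTGGGAAA", "TGCACCCTTT", "NNNNAAAA"]
print(demultiplex(reads))
```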
  • the adaptor sequence can be chosen by a user according to the sequencing platform used for sequencing.
  • an Illumina sequencing by synthesis platform comprises a solid support with a first and second population of surface-bound oligonucleotides immobilized thereon.
  • oligonucleotides comprise a sequence for hybridizing to a first and second Illumina-specific adaptor oligonucleotide and priming an extension reaction.
  • a DNA library member can comprise a first Illumina-specific adaptor that is partially or wholly complementary to a first population of surface bound oligonucleotides of an Illumina system.
  • the SOLiD, Ion Torrent, and GS FLEX systems each comprise a solid support in the form of a bead with a single population of surface-bound oligonucleotides immobilized thereon.
  • the ssDNA library member comprises an adaptor sequence that is complementary to a surface-bound oligonucleotide of a SOLiD system, Ion Torrent system, or GS Flex system.
  • Kits for performing methods are also disclosed herein.
  • the kit can comprise a 5'-adaptor, a 3'-adaptor, and a ligase.
  • the ligase can be a DNA ligase or an RNA ligase.
  • the kit can optionally include a DNA repair mix comprised of repair enzymes such as glycosylases, polymerases (e.g., proofreading DNA polymerases), nucleases, etc.
  • the kit can optionally include a solid support, e.g., a bead with a capture reagent.
  • the kit can optionally include a kinase and reagents for reacting the nucleic acids with the kinase.
  • the kit can optionally include a polymerase.
  • the polymerase can be a thermostable polymerase having a 5' to 3' exonuclease activity and not having a 3' to 5' exonuclease activity.
  • the kit can include a negative control sample.
  • the kit can also include a positive control sample.
  • kits can also include a packaging material.
  • packaging material refers to a physical structure housing the components of the kit.
  • the packaging material can maintain sterility of the kit components, and can be made of material commonly used for such purposes (e.g., paper, corrugated fiber, glass, plastic, foil, ampules, etc.).
  • Kits can also include a buffering agent, a preservative, or a protein/nucleic acid stabilizing agent.
  • the methods and kits described herein can be used for the sensitive detection of a mutation in a target polynucleotide.
  • the methods and kits of the invention can be used for the discrimination of alleles in a target tissue.
  • the invention provides methods and kits for the detection of mutant alleles in a background of high wild-type allelic ratio.
  • the methods and kits disclosed herein can be used for the detection of multiple alleles.
  • Kits can include one or more primer sets. Kits can further comprise instructions for use of the one or more primer sets, e.g., instructions for practicing a method described herein.
  • the kit includes a packaging material. Kits can also include a buffering agent, a preservative, or a protein/nucleic acid stabilizing agent. Kits can also include other components of a reaction mixture as described herein. For example, kits may include one or more aliquots of thermostable DNA polymerase, and/or one or more aliquots of dNTPs. Kits can also include control samples of known amounts of template DNA molecules harboring the individual alleles of a locus.
  • the kit can include a negative control sample, e.g., a sample that does not contain DNA molecules harboring the individual alleles of a locus.
  • the kit can also include a positive control sample, e.g., a sample containing known amounts of one or more of the individual alleles of a locus.
  • the libraries and methods disclosed herein can be used to detect new mutations and new alleles, and for retrospective studies, disease diagnostics and monitoring, diagnostics and monitoring for disorders, and research to study, monitor, and/or improve the treatment of subjects suffering from a disease or disorder.
  • High-throughput sequencing of the libraries described herein can provide the sequence of regions of interest and/or the entire genome.
  • the whole genome sequence information from one subject can identify the alleles and mutations carried by the subject in genes of interest from tissues of interest. This information on alleles and mutations can be used to diagnose the subject's disease and select treatment options with the best predicted outcomes.
  • the whole genome sequence information from a plurality of subjects who share a disease or diagnosis can also be compared to characterize common alleles and mutations in the subjects that correlate with disease, severity of disease, response to various treatment options, morbidity, mortality, etc.
  • This group of subjects can have known outcomes and responses to courses of treatment, and the whole genome sequencing of the subjects' nucleic acids can provide retrospective information on the genetic make-up of the subjects that influenced the outcomes in the subjects.
  • the disease can be a cancer, e.g., a tumor or a leukemia such as acute leukemia, acute t-cell leukemia, acute lymphocytic leukemia, acute myelocytic leukemia, myeloblastic leukemia, promyelocytic leukemia, myelomonocytic leukemia, monocytic leukemia, erythroleukemia, chronic leukemia, chronic myelocytic (granulocytic) leukemia, or chronic lymphocytic leukemia, polycythemia vera, lymphomas such as Hodgkin's lymphoma, follicular lymphoma or non-Hodgkin's lymphoma, multiple myeloma, Waldenstrom's macroglobulinemia, heavy chain disease, solid tumors, sarcomas, carcinomas such as, e.g., fibrosarcoma, myxosarcoma, liposarcoma,
  • the disease or disorder can be any that afflicts a subject including, for example, infectious diseases, hereditary diseases, autoimmune diseases, inflammatory syndromes or diseases, coronary artery diseases, cerebrovascular diseases, cognition diseases and disorders (e.g., Alzheimer's, dementia, Parkinson's, etc.), other disorders and diseases of the brain and central nervous system, substance abuse, etc.
  • the nucleic acids sequenced can include a region of a gene associated with a disease.
  • the nucleic acids can be obtained from tissue samples from subjects who have a disease (e.g., FFPE samples), or can be obtained from cell lines and organoids.
  • the genome sequenced can include druggable targets.
  • the term "druggable target" means a gene or cellular pathway that can be modulated by a disease therapy.
  • the disease can be cancer. Accordingly, the genome sequenced can contain known cancer-related genes.
  • Cancer-related genes can include, for example, ABCA1, BRAF, CHD5, EP300, FLT1, ITPA, MYC, PIK3R1, SKP2, TP53, ABCA7, BRCA1, CHEK1, EPHA3, FLT3, JAK1, MYCL1, PIK3R2, SLC19A1, TP73, ABCB1, BRCA2, CHEK2, EPHA5, FLT4, JAK2, MYCN, PKHD1, SLC1A6, TPM3, ABCC2, BRIP1, CLTC, EPHA6, FN1, JAK3, MYH2, PLCB1, SLC22A2, TPMT, ABCC3, BUB1B, COL1A1, EPHA7, FOS, JUN, MYH9, PLCG1, SLCO1B3, TPO, ABCC4, C1orf144, COPS5, EPHA8, FOXO1, KBTBD11, NAV3, PLCG2, SMAD2, TPR, ABCG2, CABLES1, CREB1, EPHB1, FOXO3, KDM
  • Example 1: Making a Library from an FFPE Biological Sample
  • A library was made from the Ashkenazim PGP Son reference standard in an FFPE format as sold by Horizon Discovery (Horizon Catalog ID GM24385).
  • The Ashkenazim PGP Son reference standard is a reference genome material selected by the Genome in a Bottle Consortium and developed by the National Institute of Standards and Technology (NIST).
  • DNA was isolated from the FFPE sample using the ReliaPrep FFPE gDNA miniprep kit from Promega. The DNA was fragmented to a size of about 550 base pairs using a Covaris sonicator. Damaged nucleotides in the DNA fragments were removed using the Repair mix in the TOMA Biosciences DNA Repair kit.
  • The DNA fragments were phosphorylated on the 5' ends and dephosphorylated on the 3' ends using the kinase mix from the TOMA Biosciences DNA Repair kit.
  • The DNA fragments were then isolated from the reaction mix and denatured to make single-stranded DNA (ssDNA) fragments.
  • The ssDNA fragments were ligated to 5'-adaptors using the TOMA Biosciences Adaptor Set, ligase, and the activation mix and AD buffer from the TOMA Biosciences Library Preparation Reagents. (See the TOMA OS-Seq Tumor Profiling System: Library Preparation Module, 2017.) After this ligation, the 5'-adaptor-ssDNA fragments were isolated.
  • The 5'-adaptor-ssDNA fragments were then ligated to a 3'-adaptor, which has a sequence complementary to the appropriate Illumina sequencing primers and a sequence suitable for flow cell binding, using ligase and the activation mix and AD buffer from the TOMA Biosciences Library Preparation Reagents. After this ligation step, the 5'-adaptor-ssDNA fragment-3'-adaptor constructs were isolated.
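
The order of operations in Example 1 (end treatment, denaturation, 5'-adaptor ligation, 3'-adaptor ligation) can be pictured with a small toy model. The following Python sketch is illustrative only and is not part of the disclosed method. It assumes standard ligation chemistry, in which a 5' phosphate is joined to a 3' hydroxyl. The adaptor sequences and all function and class names are hypothetical placeholders, because the text does not disclose the sequences in the TOMA Biosciences Adaptor Set.

```python
from dataclasses import dataclass, replace

# Hypothetical placeholder sequences: the actual adaptor sequences
# are not given in the text.
FIVE_PRIME_ADAPTOR = "AAAACCCC"
THREE_PRIME_ADAPTOR = "GGGGTTTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class Strand:
    """One DNA strand written 5'->3', plus the state of the fragment's ends."""
    seq: str
    five_prime_phosphate: bool = False
    three_prime_hydroxyl: bool = False


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kinase_treat(strand):
    """Phosphorylate the 5' end and dephosphorylate the 3' end."""
    return replace(strand, five_prime_phosphate=True, three_prime_hydroxyl=True)


def denature(top):
    """Split a duplex, given as its top strand, into both single strands."""
    bottom = replace(top, seq=reverse_complement(top.seq))
    return [top, bottom]


def ligate_5p_adaptor(strand, adaptor=FIVE_PRIME_ADAPTOR):
    """Attach the adaptor to the fragment's 5' end (needs a 5' phosphate)."""
    if not strand.five_prime_phosphate:
        raise ValueError("5' end is not phosphorylated; cannot ligate 5'-adaptor")
    # The end state of the adaptor's own outer end is not modeled.
    return replace(strand, seq=adaptor + strand.seq)


def ligate_3p_adaptor(strand, adaptor=THREE_PRIME_ADAPTOR):
    """Attach the adaptor to the fragment's 3' end (needs a 3' hydroxyl)."""
    if not strand.three_prime_hydroxyl:
        raise ValueError("3' end is not a free hydroxyl; cannot ligate 3'-adaptor")
    return replace(strand, seq=strand.seq + adaptor)


def build_library(duplex_top_strands):
    """Apply the steps in the order given in Example 1 to each fragment."""
    library = []
    for seq in duplex_top_strands:
        fragment = kinase_treat(Strand(seq))   # end treatment on the duplex
        for single in denature(fragment):      # two ssDNA fragments per duplex
            single = ligate_5p_adaptor(single)
            single = ligate_3p_adaptor(single)
            library.append(single.seq)
    return library


print(build_library(["ACGTTT"]))
```

In this toy model each duplex yields two library molecules, one per strand, each flanked by both adaptors. Skipping `kinase_treat` makes the first ligation raise an error, which mirrors why the end-treatment step precedes ligation in the example.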

Abstract

The invention relates to methods of preparing libraries for whole genome sequencing. The methods are particularly suited to preparing libraries from biological samples with damaged DNA, such as, for example, formalin-fixed paraffin-embedded tissues. The methods make it possible to prepare high-quality libraries that can be sequenced at the whole-genome level for tissues with damaged DNA.
PCT/US2018/053784 2017-10-04 2018-10-01 Library preparation for whole genome sequencing Ceased WO2019070598A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762567918P 2017-10-04 2017-10-04
US62/567,918 2017-10-04

Publications (1)

Publication Number Publication Date
WO2019070598A1 WO2019070598A1 (fr) 2019-04-11

Family

ID=65994926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/053784 Ceased WO2019070598A1 (fr) 2017-10-04 2018-10-01 Library preparation for whole genome sequencing

Country Status (1)

Country Link
WO (1) WO2019070598A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024047169A1 (fr) * 2022-08-31 2024-03-07 Saga Diagnostics Ab Library preparation from fixed samples

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160265027A1 (en) * 2013-10-17 2016-09-15 Illumina, Inc. Methods and compositions for preparing nucleic acid libraries
WO2017139492A1 (fr) * 2016-02-09 2017-08-17 Toma Biosciences, Inc. Systems and methods for analyzing nucleic acids

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DO, H ET AL.: "Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed cancer biopsies by treatment with uracil-DNA glycosylase", ONCOTARGET, vol. 3, no. 5, 24 May 2012 (2012-05-24), pages 546-558, XP055282871, DOI: 10.18632/oncotarget.503 *
HYKIN, SM ET AL.: "Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing", PLOS ONE, vol. 10, no. 10, 27 October 2015 (2015-10-27), pages 1-16, XP055588252, ISSN: 1932-6203, DOI: 10.1371/journal.pone.0141579 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18865236

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/08/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18865236

Country of ref document: EP

Kind code of ref document: A1