WO2018031588A1 - Nucleic acid adaptors with molecular identification sequences and use thereof - Google Patents

Publication number
WO2018031588A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
population
adaptor
stem
double
Prior art date
Legal status
Ceased
Application number
PCT/US2017/045976
Other languages
English (en)
Inventor
Fang Sun
Dmitry GORYUNOV
Konstantinos Charizanis
John LANGMORE
Emmanuel Kamberov
Current Assignee
Takara Bio USA Inc
Original Assignee
Takara Bio USA Inc
Priority date
Filing date
Publication date
Application filed by Takara Bio USA Inc
Publication of WO2018031588A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 Ligating adaptors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • Tagging DNA ends with additional short polynucleotide sequences is used in many areas of molecular biology such as whole genome amplification and sequencing.
  • Barcodes can be used to identify nucleic acid molecules, for example, where sequencing can reveal a certain barcode coupled to a nucleic acid molecule of interest.
  • a sequence-specific event can be used to identify a nucleic acid molecule, where at least a portion of the barcode is recognized in the sequence-specific event, e.g., at least a portion of the barcode can participate in a ligation or extension reaction.
  • the barcode can therefore allow identification, selection or amplification of DNA molecules that are coupled thereto.
  • gDNA genomic DNA
  • adaptors where at least one end of each fragment of genomic DNA is ligated to an adaptor including a barcode.
  • the ligated adaptors and gDNA fragments may be nick repaired, size selected, and amplified by PCR with primers directed to the adaptors to produce an amplified library.
  • ligation adaptors each including one of 16 different barcodes can be used to prepare 16 different gDNA samples, each with a unique barcode, such that either each sample can be amplified separately by PCR using the same PCR primers and then pooled (mixed together) or each sample can be pooled first and then simultaneously amplified using the same PCR primers.
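The pooling scheme above relies on assigning each sequenced read back to its sample by barcode. A minimal sketch of that step follows, assuming the barcode sits at the start of each read; the barcode sequences, sample names, and the function `demultiplex` are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Bin reads by their leading sample barcode.

    Assumption (hypothetical): the sample barcode occupies the first
    `barcode_length` bases of every read. Reads whose barcode is not in
    the lookup table are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 4-nt barcodes for two of the pooled samples.
barcode_to_sample = {"ACGT": "sample_01", "TGCA": "sample_02"}
pooled_reads = ["ACGTGGATTC", "TGCAGGATTC", "ACGTCCTTAA", "NNNNGGATTC"]
bins = demultiplex(pooled_reads, barcode_to_sample, barcode_length=4)
```

Exact-match lookup keeps the sketch short; a production demultiplexer would normally tolerate a small number of mismatches in the barcode.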
  • although there are many methods to use bar codes to tag samples, there are unmet needs to tag individual molecules within the input DNA sample.
  • UMI unique molecular identification
  • molecular tags are essential to identify and quantify different molecules in the same input sample that otherwise would be indistinguishable on the basis of sequence or other properties. Multiple sequences that have been amplified with the same molecular tag can be grouped together to remove artifacts that are created during the library amplification and sequencing processes.
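Grouping reads that share a molecular tag and removing amplification or sequencing artifacts can be illustrated with a per-position majority vote. This is only a sketch under simplifying assumptions: reads within a tag group are taken to be pre-aligned and of equal length, and the function name `consensus_by_tag` and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads):
    """Collapse reads sharing a molecular tag into one consensus sequence.

    reads: iterable of (tag, sequence) pairs. Sequences within one tag
    group are assumed (hypothetically) to be aligned and of equal length.
    """
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)

    consensus = {}
    for tag, seqs in groups.items():
        # Majority vote at each position across all copies of the molecule.
        consensus[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("ACGTAC", "TTGACCA"),
    ("ACGTAC", "TTGACCA"),
    ("ACGTAC", "TTGTCCA"),  # one copy carries an amplification/sequencing error
    ("GGATCC", "TTGACCA"),
]
result = consensus_by_tag(reads)
```

The error in the third read is outvoted by the two correct copies, so the tag group collapses to a single clean sequence.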
  • double-stranded nucleic acid adaptors comprising molecular identification sequences. Further provided are methods of using the double-stranded nucleic acid adaptors for generating nucleic acid libraries, e.g., for amplification and sequencing.
  • a first embodiment of the present disclosure provides a population of double-stranded nucleic acid adaptors for ligating to a population of nucleic acid target molecules, the double-stranded adaptors comprising a ligateable stem region having a terminal 5' end strand and a terminal 3' end strand; a non-ligateable stem region having a terminal 5' end strand and a terminal 3' end strand; and an asymmetric loop region between the ligateable stem region and the non-ligateable stem region, wherein the asymmetric loop region comprises a molecular identification sequence (MIS).
  • MIS molecular identification sequence
  • the double-stranded nucleic acid adaptors are further defined as a single-stranded nucleic acid molecule that under ligation conditions forms a stem-loop adaptor having a distal loop region attached to the non-ligateable stem region.
  • the distal loop region comprises a non-replicable base.
  • the non-replicable base comprises an abasic site.
  • the abasic site comprises a 1',2'-dideoxyribose.
  • the non-replicable base comprises a deoxyuridine or a ribonucleotide base.
  • the non-ligateable stem region comprises a primer binding site.
  • the double-stranded adaptors within the population comprise a mixture of double-stranded adaptors with a first primer binding site and double-stranded adaptors with a second primer binding site.
  • the non-ligateable stem region comprises one or more mismatched bases.
  • the ligateable stem region further comprises a variable stem region defined as a region whose length varies among the members of the population.
  • the variable stem comprises 4-15 nucleotides.
  • the variable stem comprises 8-11 nucleotides.
  • the variable stem comprises 8, 9, 10, or 11 nucleotides.
  • the molecular identification sequence is unique to a subset of the population. In certain aspects, the molecular identification sequence is degenerate to a subset of the population. In further aspects, the molecular identification sequence is unique to a subset of the population and degenerate to another subset of the population.
  • In certain aspects, the asymmetric loop region is formed between the terminal 3' end strand of the ligateable strand and the terminal 5' end strand of the non-ligateable strand. In other aspects, the asymmetric loop region is formed between the terminal 5' end strand of the ligateable strand and the terminal 3' end strand of the non-ligateable strand.
  • the double-stranded nucleic acid adaptors comprise DNA. In certain aspects, the double-stranded nucleic acid adaptors comprise RNA. In some aspects, the double-stranded nucleic acid adaptors comprise DNA and RNA.
  • the population of nucleic acid target molecules comprise genomic DNA, fragmented DNA, cDNA, amplified DNA, or a nucleic acid library.
  • a gap region on the strand opposite the asymmetric loop region has a length of at least one nucleotide shorter than that of the asymmetric loop region. In some aspects, the length of the gap region is at least 2 nucleotides shorter than that of the asymmetric loop region. In certain aspects, the length of the gap region is less than 5 nucleotides. In particular aspects, the length of the gap region is 1 nucleotide. In specific aspects, the gap region has a length of a bond between two adjacent nucleotides. In some aspects, the gap region comprises a spacer incapable of base-pairing. In certain aspects, the spacer comprises an abasic site. In particular aspects, the abasic site comprises a 1',2'-dideoxyribose.
  • the asymmetric loop region is 4 to 16 nucleotides in length. In certain aspects, the asymmetric loop region is 5 to 8 nucleotides in length. In particular aspects, the asymmetric loop region is 6 nucleotides in length.
  • the molecular identification sequence comprises 5-10 nucleotides. In particular aspects, the molecular identification sequence comprises 6, 7, or 8 nucleotides. In specific aspects, the molecular identification sequence comprises 6 nucleotides. In some aspects, the molecular identification sequence is unique throughout the population. In further aspects, the molecular identification sequence is partially degenerate within the population. In certain aspects, the molecular identification sequence is degenerate within the population.
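The MIS lengths listed above determine how many distinct tags a fully degenerate MIS can provide. The sketch below works through that arithmetic and adds a birthday-style estimate of tag collisions. It assumes every position is an equimolar mix of A, C, G, and T and that tags are assigned independently and uniformly, which is an idealization and not a statement about the adaptors of the disclosure; the function names are hypothetical.

```python
def mis_diversity(length):
    """Number of distinct sequences of a fully degenerate MIS (4 bases per position)."""
    return 4 ** length


def collision_probability(length, molecules):
    """Chance that at least two of `molecules` fragments receive the same MIS.

    Assumption (hypothetical): each fragment draws its MIS independently
    and uniformly from all 4**length possibilities.
    """
    n = mis_diversity(length)
    if molecules > n:
        return 1.0  # pigeonhole: more molecules than available tags
    p_all_unique = 1.0
    for i in range(molecules):
        p_all_unique *= (n - i) / n
    return 1.0 - p_all_unique


six_nt = mis_diversity(6)      # 4096 possible 6-nt tags
ten_nt = mis_diversity(10)     # 1048576 possible 10-nt tags
p_clash = collision_probability(6, 100)
```

In practice a collision only matters when two source molecules also share the same mapping coordinates, so the relevant `molecules` value is the number of fragments expected at one position, not the whole library.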
  • a 5' terminal end and/or 3' terminal end of the ligateable stem region comprise nucleotides having phosphorothioate linkages.
  • a 5' terminal end of the ligateable stem comprises a ligation block.
  • the ligation block is a dephosphorylated nucleotide, a 5' hydroxyl group, a dideoxy nucleotide, or an inverted dT.
  • the ligateable stem region further comprises one or more replication blocks or cleavable bases between a 3' terminal end or the 5' terminal end and the asymmetric loop or gap region.
  • the non-ligateable stem region further comprises one or more replication blocks or cleavable bases between the asymmetric loop or gap region and the distal loop region.
  • the cleavable base is inosine, uracil, or ribonucleotide.
  • a method for producing a library of adaptor-bound target nucleic acids comprising providing a population of target nucleic acid molecules and attaching to each end a double-stranded nucleic acid adaptor according to the embodiments, thereby generating a population of adapter-bound target nucleic acid molecules; and replacing one strand of the adaptor-bound target nucleic acid molecules by strand displacement or nick translation to make an exact copy of the asymmetric loop.
  • the resultant library of adaptor-bound target nucleic acids has an MIS domain on each strand of the resultant double stranded molecule, which results in the ability to determine, during amplification, which strands came from the same original molecule.
  • This ability may increase sequencing accuracy and the ability to identify/alleviate bias as compared to conventional methods in which a UMI (i.e. unique molecular identifier) is attached to only one strand.
  • a UMI i.e. unique molecular identifier
  • the UMI containing strand gets amplified and if errors arise, it is not possible to determine whether such errors arise from all Crick strand errors or also Watson strand errors.
  • bias may be identified and, if identified, compensation for such bias may be made.
  • the methods may include determining amplification, e.g., PCR bias, e.g., by counting molecules, based on two MIS sequences from the same double stranded molecule (or two complementary MIS sequences from the same double-stranded molecule).
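Counting reads from the two strands of one original molecule can be sketched by pairing each observed MIS with its reverse complement. This is an illustration only: it assumes the two strands are read out as complementary MIS sequences, which depends on read orientation, and the function names and example tags are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_counts(mis_reads):
    """Pair each MIS with its reverse complement and count reads per strand.

    Returns {canonical_mis: (reads_with_canonical_mis, reads_with_its_reverse_complement)},
    where the canonical form is the lexicographically smaller of the pair.
    A palindromic MIS equals its own reverse complement, so both counts
    are then the same number.
    """
    counts = Counter(mis_reads)
    paired = {}
    for mis in counts:
        key = min(mis, reverse_complement(mis))
        paired[key] = (counts[key], counts[reverse_complement(key)])
    return paired


# 8 reads from one strand, 2 from the complementary strand, and a
# molecule for which only one strand was recovered.
observed = ["AACGTG"] * 8 + ["CACGTT"] * 2 + ["GGGTTT"] * 3
pairs = strand_counts(observed)
```

A strongly skewed pair such as (8, 2) points to unequal amplification of the two strands, and a pair with a zero marks a molecule seen on one strand only.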
  • the adaptor-bound nucleic acid molecule comprises a nick having a 3' hydroxyl group.
  • the strand displacement or nick translation polymerization is further defined as polymerization that ceases at a non-replicable base or region in the asymmetric loop or in a region of the non-ligateable stem region adjacent to the asymmetric loop.
  • a further embodiment provides a library of adaptor-bound target nucleic acids produced by the methods of the embodiments.
  • the population of adaptor-bound target nucleic acid molecules comprises a first double-stranded nucleic acid adaptor with a first primer binding site attached on one end and a second double-stranded nucleic acid adaptor with a second primer binding site attached on the other end.
  • attaching is further defined as ligating. In some aspects, attaching is further defined as double strand ligation. In other aspects, attaching is further defined as single strand ligation. In particular aspects, attaching is further defined as ligating a double-stranded nucleic acid adaptor to complementary single strands. In specific aspects, attaching is further defined as blunt end ligation. In one particular aspect, attaching is further defined as ligation to an overhang.
  • a method for producing a library of adaptor-bound target nucleic acids comprising providing a population of target nucleic acid molecules and attaching to one end a double-stranded nucleic acid adaptor according to the embodiments, thereby generating a population of adapter-bound target nucleic acid molecules; and displacing one strand of the adaptor-bound target nucleic acid molecules by strand displacement or nick translation, such that a complementary copy of the asymmetric loop is incorporated into the replaced strand.
  • the adaptor-bound nucleic acid molecule comprises a nick having a 3' hydroxyl group.
  • the strand displacement or nick translation polymerization is further defined as polymerization that ceases at a non-replicable base or region in the asymmetric loop or in a region of the non-ligateable stem adjacent to the asymmetric loop.
  • a further embodiment provides a library of adaptor-bound target nucleic acids produced by the methods of the embodiments.
  • attaching is further defined as ligating. In some aspects, attaching is further defined as double strand ligation. In other aspects, attaching is further defined as single strand ligation. In particular aspects, attaching is further defined as ligating a double-stranded nucleic acid adaptor to complementary single strands. In specific aspects, attaching is further defined as blunt end ligation. In one particular aspect, attaching is further defined as ligation to an overhang.
  • Another embodiment provides a method for producing a library of adaptor-bound target nucleic acids comprising providing a population of target nucleic acid molecules, attaching to one end a first double-stranded nucleic acid adaptor according to the embodiments, and attaching to the other end a second double-stranded nucleic acid adaptor optionally comprising a MIS, thereby generating a population of adapter-bound target nucleic acid molecules; and replacing one strand of the adaptor-bound target nucleic acid molecules by strand displacement or nick translation such that a complementary copy of the second strand is incorporated into the replaced strand.
  • the adaptor-bound nucleic acid molecule comprises a nick having a 3' hydroxyl group.
  • the strand displacement or nick translation polymerization is further defined as polymerization that ceases at a non-replicable base or region in the loop or in a region of the stem adjacent to the loop.
  • the second double-stranded nucleic acid adaptor does not comprise a MIS. In certain aspects, the second double-stranded nucleic acid adaptor does not comprise an asymmetric loop.
  • the first double-stranded nucleic acid adaptor and/or second double-stranded nucleic acid adaptor are stem-loop adaptors.
  • the first double-stranded nucleic acid adaptor comprises a first primer binding site in the non-ligateable stem region and the second double-stranded nucleic acid adaptor comprises a second primer binding site in the non-ligateable stem region.
  • attaching is further defined as ligating. In some aspects, attaching is further defined as double strand ligation. In other aspects, attaching is further defined as single strand ligation. In particular aspects, attaching is further defined as ligating a double-stranded nucleic acid adaptor to complementary single strands. In specific aspects, attaching is further defined as blunt end ligation. In one particular aspect, attaching is further defined as ligation to an overhang.
  • the method further comprises preventing MIS switching.
  • the method further comprises preventing MIS switching by contacting the library of adaptor-bound target nucleic acids and excess double-stranded nucleic acid adaptors with terminal deoxyribonucleotidyl transferase (TdT).
  • the method further comprises preventing MIS switching by performing PCR purification on the adaptor-bound target nucleic acids and excess double-stranded nucleic acid adaptors.
  • the first or second double-stranded nucleic acid adaptors comprise one or more uracils within the ligateable stem region.
  • the method further comprises contacting the library of adaptor-bound target nucleic acids and excess double-stranded nucleic acid adaptors with a uracil excision reagent, e.g., a combination of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII (such as the USER™ enzyme).
  • a uracil excision reagent e.g., a combination of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII (such as the USER™ enzyme).
  • UDG Uracil DNA glycosylase
  • the method further comprises contacting the library and excess adaptors with exonuclease I and incubating at a non-denaturing temperature prior to performing PCR.
  • FIG. 1A-1D Schematic depicting designs of non-degradable bud adaptors (YB) comprising a molecular identification sequence (MIS) in an asymmetric loop, an abasic site (e.g., 1',2'-Dideoxyribose (idSp)) opposite the MIS to facilitate correct adaptor folding, abasic sites in the distal loop to function as polymerase terminators, a variable-length stem to mediate ligation to nucleic acid molecules and provide base diversity, and phosphorothioate bonds (*) to prevent degradation through exonuclease activity and decrease adaptor dimer formation.
  • MIS molecular identification sequence
  • idSp 1',2'-Dideoxyribose
  • FIG. 1B Schematic depicting design of non-degradable bubble design (BB) comprising a MIS in a symmetric loop, abasic site(s) opposite the MIS, abasic sites in the main loop, and phosphorothioate bonds.
  • FIG. 1C Schematic depicting design of degradable bubble with a self-complementary byproduct (RB) formed by "fold-back" synthesis that replicates the non-degraded bubble.
  • FIG. 1D Exemplary sequences of dephased stem region.
  • FIG. 3A-3B Data from a NextSeq™ sequencing run of libraries prepared with 6 different designs of bud adaptors with a 5' phosphorothioate bond. Sequencing was analyzed with the Illumina® sample barcode trimmed and the additional adaptor sequence including the molecular barcode remaining. Run480 Trimmed was mapped to the human genome and metrics measured after manually trimming the entire molecular barcode and adaptor sequence.
  • FIG. 3B Sequencing results as Q scores of the individual nucleotides starting with the MIS, proceeding across the ligatable stem and into the gDNA region, using bud adaptors comprised of two different ligatable stem lengths.
  • FIG. 4 Amplification curves of libraries using the bud adaptor with or without a 5' phosphorothioate bond (5PT) and various stem lengths (e.g., 8, 10, and 12 nucleotides). (NTC: No template control).
  • FIG. 5A-5C Sequencing data from a run including bud adaptors with variable length stems.
  • FIG. 5B-5C Input titration of 4-stem dephased bud adaptors (trimmed (FIG. 5B) and untrimmed (FIG. 5C)).
  • a significant problem in next generation sequencing (NGS) is distinguishing duplicate sequencing reads.
  • Duplicates can be broadly categorized into two families: amplification duplicates and biological duplicates.
  • Amplification duplicates also known as PCR duplicates, arise as a consequence of amplification during library preparation prior to sequencing, or are generated on the flow cell of the sequencer.
  • Biological duplicates or molecular duplicates may be the result of the generation of two identical DNA or RNA fragments arising from biological means such as the generation of multiple short, identical mRNA molecules, or as a result of random fragmentation or enzymatic fragmentation of a number of copies of the genome.
  • individual source molecules may be distinguished from one another, as each will have different unique identifiers.
  • errors arising from PCR or sequencing can be informatically determined, and differentiated from true biological mutations. Both types of data can then be used for interpretation of sequencing data.
  • the present disclosure overcomes challenges associated with current technologies by providing double-stranded nucleic acid adaptors, such as stem-loop nucleic acid adaptors, with molecular identification sequences (i.e., MIS).
  • MIS molecular identification sequences
  • These double-stranded nucleic acid adaptors can be used to tag individual template fragments with an MIS to create libraries which can be amplified and sequenced.
  • the MIS can be used to identify, distinguish, and use duplicated sequences arising from both the biological source and subsequent amplifications.
  • the MIS allow for the differentiation between PCR duplicates and true biological duplicates, enabling PCR error correction and quantitative detection of low-frequency alleles with high statistical confidence.
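The distinction between PCR duplicates and biological duplicates can be sketched as follows: reads with the same mapping coordinates and the same MIS are treated as copies of one source molecule, while reads with the same coordinates but different MIS are treated as distinct source molecules. The tuple layout, the function `classify_duplicates`, and the example coordinates are hypothetical and serve only to illustrate the logic.

```python
from collections import defaultdict


def classify_duplicates(reads):
    """Separate PCR duplicates from biological duplicates using the MIS.

    reads: iterable of (chrom, start, end, mis) tuples (hypothetical layout).
    Returns (number of source molecules, number of PCR duplicate reads,
    {position: molecule count} for positions holding more than one
    distinct source molecule).
    """
    family_sizes = defaultdict(int)
    for chrom, start, end, mis in reads:
        family_sizes[(chrom, start, end, mis)] += 1

    # Every read beyond the first in a family is an amplification copy.
    pcr_duplicates = sum(size - 1 for size in family_sizes.values())

    molecules_per_position = defaultdict(int)
    for chrom, start, end, _mis in family_sizes:
        molecules_per_position[(chrom, start, end)] += 1
    biological = {
        pos: n for pos, n in molecules_per_position.items() if n > 1
    }
    return len(family_sizes), pcr_duplicates, biological


reads = [
    ("chr1", 100, 250, "ACGTAC"),
    ("chr1", 100, 250, "ACGTAC"),  # PCR copy of the read above
    ("chr1", 100, 250, "ACGTAC"),  # PCR copy of the read above
    ("chr1", 100, 250, "TTGCAA"),  # same coordinates, different molecule
    ("chr2", 500, 640, "ACGTAC"),
]
n_molecules, n_pcr_duplicates, biological = classify_duplicates(reads)
```

Exact MIS matching is used here for brevity; sequencing errors inside the MIS itself would split a family unless near-identical tags are merged first.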
  • the adaptors may comprise replication stops such as non-replicable bases which can be used to stop replication at specific locations in the adaptor.
  • the unreacted stem-loop adaptors are self-complementary and therefore usually unable to participate in unwanted priming reactions that might otherwise lead to MIS switching.
  • Adaptors provided herein may be degradable or non-degradable adaptors.
  • the adaptors have an MIS, which may be located in an asymmetric loop (also referred to herein as a "bud") on either stem strand.
  • the nucleotides between the loops may comprise a mismatched base.
  • Across from the asymmetric loop may be an unpaired gap region which can comprise only the bond between adjacent nucleotides, unpaired nucleotides, or at least one non-replicable base or spacer, e.g., to reduce barcode bias, allow for correct folding of the adaptor and/or to prevent collapse of the asymmetric loop structure.
  • non-replicable bases include, but are not limited to: abasic lesions (e.g., a tetrahydrofuran derivative, 1',2'-Dideoxyribose (idSp), etc.); nucleotide adducts; iso-nucleotide bases (e.g., isocytosine, isoguanine, and the like), and any combination thereof, etc.
  • the non-replicable bases can also be present in the distal loop of the stem-loop adaptor, e.g., to function as polymerase terminators.
  • the stem-loop adaptor can also have phosphorothioate bonds on the 3' terminal end and/or the 5' terminal end to protect the adaptor from degradation by proofreading enzymes and prevent adaptor dimers to optimize the signal-to-noise ratio.
  • the adaptors can have variable stem lengths to provide sufficient base diversity in the library at the beginning of the read for cluster detection and intensity correction, such as by RTA software, without the use of a control nucleic acid library, such as the PhiX control nucleic acid library.
  • the variable stems can add further unique data which can add to the level of barcoding, e.g., where the variable stem may, where desired, be employed in combination with an MIS domain to provide a unique molecular barcode. Diversity in the stems allows for low amplification background and the generation of fewer unmapped reads during sequencing.
  • the variable stem lengths also provide extra unique sequence information for informatic analysis of sequencing data.
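The effect of variable stem lengths on base diversity at the start of a read can be illustrated by counting the distinct bases seen at each sequencing cycle. The stem sequences below are invented for the illustration (they are not the sequences of FIG. 1D), and the sketch looks only at the stem portion of the read.

```python
def distinct_bases_per_cycle(reads, n_cycles):
    """Count how many different bases appear at each of the first n_cycles.

    Every read must be at least n_cycles long. A count of 1 means all
    clusters show the same base in that cycle (no diversity).
    """
    return [len({read[i] for read in reads}) for i in range(n_cycles)]


# Hypothetical single fixed-length stem: every read starts identically.
fixed_stem_reads = ["ACGTACGT"] * 4

# Hypothetical dephased stems of 8, 9, 10, and 11 nucleotides.
dephased_stem_reads = [
    "ACGTACGT",
    "CGTACGTAC",
    "GTACGTACGT",
    "TACGTACGTAC",
]

fixed = distinct_bases_per_cycle(fixed_stem_reads, 8)
dephased = distinct_bases_per_cycle(dephased_stem_reads, 8)
```

With the fixed stem each cycle shows a single base, whereas the dephased set shows all four bases in every one of the first eight cycles, which is the kind of diversity cluster detection and intensity correction depend on.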
  • a population of adaptors may include a common barcode domain, e.g., a region or sequence of nucleic acids that is common or the same among the population of adaptors and serves as a barcode or identifier for a source of target nucleic acids to which the adaptors are ligated during use.
  • a barcode domain may serve as an identifier of a sample from which the target nucleic acids are obtained, such that it may be viewed as a sample barcode.
  • the barcode domain may be positioned at any convenient location in the adaptor, such as the non-ligateable stem region of the adaptor, the asymmetric loop of the adaptor, etc.
  • the barcode domain may be combined with the MIS domain, e.g., such that the adaptors include a barcode/MIS domain.
  • a barcode/MIS domain is made up of a series of interspersed barcode and MIS bases.
  • interspersed is meant that the bases which are barcode bases (i.e., the bases that collectively make up the barcode component of a barcode/MIS domain) are distributed or positioned among MIS bases (i.e., the bases that collectively make up the MIS domain of a barcode/MIS domain).
  • a given barcode/MIS domain is one that includes at least one MIS base positioned adjacent to at least one barcode base, where in those instances in which the barcode/MIS domain is made up of 3 or more bases, at least two bases of a first type (e.g., MIS or barcode) may be separated by at least one base of another type (e.g., MIS or barcode).
  • the length of a given barcode/MIS domain may vary, ranging in some instances from 4 to 50 nts, where in some instances the length ranges from 5 to 25 nts, e.g., 6 to 20 nts, where specific lengths of interest include, but are not limited to: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16 nts.
  • the barcode/MIS domain may be positioned at any convenient location in the adaptor, such as the non-ligateable stem region of the adaptor, the asymmetric loop of the adaptor, etc.
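An interspersed barcode/MIS domain can be separated back into its two components if the position of each base type is known. The sketch below assumes a layout string in which `B` marks a barcode base and `M` marks an MIS base; that notation, the function `split_barcode_mis`, and the example sequence are hypothetical.

```python
def split_barcode_mis(domain, layout):
    """Split an interspersed barcode/MIS domain into (barcode, mis).

    layout: string of 'B' (barcode base) and 'M' (MIS base), one
    character per base of `domain`. The notation is an assumption made
    for this sketch.
    """
    if len(domain) != len(layout):
        raise ValueError("domain and layout must have the same length")
    if set(layout) - {"B", "M"}:
        raise ValueError("layout may contain only 'B' and 'M'")
    barcode = "".join(b for b, kind in zip(domain, layout) if kind == "B")
    mis = "".join(b for b, kind in zip(domain, layout) if kind == "M")
    return barcode, mis


# 8-nt domain with 3 barcode bases distributed among 5 MIS bases.
barcode, mis = split_barcode_mis("ACGTTGCA", "BMMBMMBM")
```

Here the barcode bases (sample identity) are shared across the adaptor population while the MIS bases vary from molecule to molecule, so the same read position yields both pieces of information.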
  • the present disclosure provides methods for the production of libraries using the nucleic acid adaptors disclosed herein which are compatible with the major sequencing platforms including, but not limited to, Illumina's MiSeq™, NextSeq, and HiSeq™ using conventional flow cells. Libraries are also compatible with hybridization capture target enrichment platforms, such as those manufactured by Agilent and NimbleGen. Libraries produced with the non-degradable bud stem-loop adaptors performed well across a wide target input range, including low inputs which can be difficult to amplify and sequence.
  • the template fragments can include DNA, such as cell-free DNA (cfDNA) isolated from plasma, urine, or cerebrospinal fluid samples or isolated genomic DNA (gDNA) which has been subjected to fragmentation, or RNA.
  • cfDNA cell-free DNA
  • gDNA isolated genomic DNA
  • the template fragments are end repaired and at least one end of each template fragment is ligated to a stem-loop adaptor.
  • the template fragment may have an adaptor with a MIS on one end and an adaptor without an MIS on the other end.
  • the template fragment may have an adaptor with an MIS on both ends.
  • a stem-loop adaptor comprising a first primer binding site in the non-ligateable stem is ligated to one end of a template nucleic acid and a stem-loop adaptor comprising a second primer binding site in the non-ligateable stem is ligated to the other end of the template nucleic acid.
  • the two stem-loop adaptors may comprise different sequences or comprise a common sequence to promote suppression of amplification of short ligation products, i.e., adaptor dimers or short inserts that will be suppressed to reduce background from the unwanted adaptor dimers or short gDNA fragments.
  • the use of two stem-loop adaptors prevents suppression of amplification that is inherent to the use of a single adaptor, which facilitates sequencing on sequencing platforms such as those manufactured by Illumina.
  • the suppression prevents molecules which have two copies of the first adapter with the first primer binding site or two copies of the second adapter with the second primer binding site from amplification.
  • MIS switching is non-specific replacement of a MIS, originally assigned to a target molecule by the attachment chemistry of a library preparation. MIS switching may occur during PCR amplification, where residual adaptors or their byproducts carried over into the PCR step act as primers that randomly replace the original MIS with a different (non-specific) MIS.
  • TdT terminal deoxyribonucleotidyl transferase
  • TdT is then inactivated and will not interfere with PCR.
  • AMPure® clean-up can be used after ligation and prior to PCR to remove unreacted adaptors.
  • Another method to reduce MIS switching involves inactivation of unligated stem-loop adaptors.
  • the stem-loop adaptor comprises one or more uracil residues within the terminal 5' end strand of the ligateable stem region or within the terminal 3' end strand of the non-ligateable stem region which are converted to abasic sites and degraded by a suitable enzyme, such as USER™ (Uracil-Specific Excision Reagent), to produce several short, single-stranded oligonucleotide products from the terminal 5' end strand of the ligateable stem region or the terminal 3' end strand of the non-ligateable stem region as well as an intact single-stranded oligonucleotide from the terminal 3' end strand of the ligateable stem region or the terminal 5' end strand of the non-ligateable stem region.
  • Exonuclease I is then added to the PCR premix, which is incubated at a temperature too low to denature the DNA; thus, the short fragments from the unreacted stem-loop adaptors and adaptor dimers are degraded.
  • MIS switching is reduced by adding one or more uracils, such as 2, 3, or 4 uracils, near the 3' terminal end of the ligateable stem region of a first adaptor (e.g., the adaptor with a first primer binding site) having the MIS.
  • the second adaptor does not have a MIS and has the second primer binding site.
  • the residual activities of the repair enzymes will extend both 3' ends of the gDNA to make copies of the 5' ends of the adaptors even in the presence of uracil in one of the adaptors.
  • Upon addition of the PCR polymerase, the 5' end of the second adaptor will not be replicated due to the uracil in the template; however, the 5' end of the first adaptor will be replicated to make a copy of the MIS.
  • One strand will be replicated normally by PCR as it does not have uracil at its 5' end, and the second strand will not be PCR amplified as the PCR polymerase cannot read through uracil.
  • This method of making a nucleic acid library from one of the two strands overcomes challenges associated with duplex sequencing. Duplex sequencing requires enough reads from the forward and reverse strand to get accurate sequencing from both.
  • the sequencing resources can instead be devoted to sequencing the single strand more deeply or sequencing a strand from a different gDNA molecule.
  • this method can be used in cases where very deep sequencing is not possible or when small insertions or deletions and translocations are being detected.
  • un-ligated adaptor is removed by a 3' exonuclease active on both blunt and recessed 3' ends (e.g., E. coli exonuclease III), or a combination of a 5' exonuclease active on blunt and recessed 5' ends (e.g., E. coli exonuclease VIII) and a 3' exonuclease active on 3' protruding ends (e.g., exonuclease T).
  • the 5' exonuclease can expose a 3' extension that can be a substrate for the 3' exonuclease.
  • when adaptors of the disclosure are fully ligated, they may not have any free ends and therefore may be protected from cleavage by the exonucleases.
  • a 3'-protected, un-extendable blocker oligonucleotide can be added to the amplification reaction.
  • the blocker oligonucleotide can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides in length.
  • the blocker oligonucleotide can, in some instances, be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides in length.
  • the blocker oligonucleotide can be fully complementary to the 3' end of the 3' strand of the adaptor.
  • the blocker oligonucleotide is fully complementary to the adaptor stem sequence (e.g., dephase stem). An excess of such blocker may prevent priming by the free adaptor yet may not prevent priming by the PCR primer, which does not contain the stem sequence and is therefore still able to anneal to and prime target nucleic acid molecules.
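The relationship between a stem sequence and a fully complementary blocker can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the stem used is one of the variable-length stem sequences listed later in the text, and the 3' protection that makes the blocker un-extendable is a chemical modification that is not modeled here.

```python
# Minimal sketch: derive a blocker sequence that is fully complementary
# to an adaptor stem strand. Sequences are written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_blocker(stem_strand: str) -> str:
    """Return the 5'->3' sequence that pairs with every base of stem_strand.

    The 3' blocking group (which prevents extension) is not represented.
    """
    return reverse_complement(stem_strand)


stem = "TGAGCTAC"            # illustrative stem strand
blocker = design_blocker(stem)
print(blocker)               # GTAGCTCA
```

A PCR primer that lacks the stem sequence would not be a target of such a blocker, which is the selectivity the text relies on.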
  • the 3' terminal end of the ligateable stem region of the double-stranded nucleic acid adaptor is ligated to the 5' phosphate of the target fragment leaving a nick between the 3' end of target fragment and the 5' terminal end of the ligateable region of the double-stranded nucleic acid adaptor.
  • Polymerase extension is then performed on the adaptor-bound template by extending the 3' end of the template fragment end toward the end of the double-stranded nucleic acid adaptor, copying the molecular identification sequence, during strand displacement or nick translation.
  • the adaptor-bound target fragments are then amplified to create libraries and may then be sequenced.
  • the MIS adaptors described herein can be used for amplification and/or sequencing, such as to generate nucleic acid libraries for sequencing.

I. Definitions

  • Nucleotide is a term of art that refers to a base-sugar-phosphate combination. Nucleotides are the monomeric units of nucleic acid polymers, i.e., of DNA and RNA. The term includes ribonucleotide triphosphates, such as rATP, rCTP, rGTP, or rUTP, and deoxyribonucleotide triphosphates, such as dATP, dCTP, dUTP, dGTP, or dTTP.
  • a “nucleoside” is a base-sugar combination, i.e., a nucleotide lacking a phosphate. It is recognized in the art that there is a certain interchangeability in usage of the terms nucleoside and nucleotide.
  • the nucleotide deoxyuridine triphosphate, dUTP, is a deoxyribonucleoside triphosphate. After incorporation into DNA, it serves as a DNA monomer, formally being deoxyuridylate, i.e., dUMP or deoxyuridine monophosphate.
  • one may say that one incorporates deoxyuridine into DNA even though that is only a part of
  • nucleic acid or “polynucleotide” will generally refer to at least one molecule or strand of DNA, RNA, DNA-RNA chimera or a derivative or analog thereof, comprising at least one nucleobase, such as, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g. adenine "A,” guanine “G,” thymine “T” and cytosine “C”) or RNA (e.g. A, G, uracil "U” and C).
  • nucleic acid encompasses the terms “oligonucleotide” and “polynucleotide.”
  • oligonucleotide refers to at least one molecule of between about 3 and about 100 nucleobases in length.
  • polynucleotide refers to at least one molecule of greater than about 100 nucleobases in length.
  • a nucleic acid may encompass at least one double-stranded molecule or at least one triple-stranded molecule that comprises one or more complementary strand(s) or "complement(s)" of a particular sequence comprising a strand of the molecule.
  • a single stranded nucleic acid may be denoted by the prefix "ss”, a double-stranded nucleic acid by the prefix "ds”, and a triple stranded nucleic acid by the prefix "ts.”
  • nucleic acid molecule or “nucleic acid target molecule” refers to any single-stranded or double-stranded nucleic acid molecule including standard canonical bases, hypermodified bases, non-natural bases, or any combination of the bases thereof.
  • the nucleic acid molecule contains the four canonical DNA bases - adenine, cytosine, guanine, and thymine, and/or the four canonical RNA bases - adenine, cytosine, guanine, and uracil. Uracil can be substituted for thymine when the nucleoside contains a 2'-deoxyribose group.
  • the nucleic acid molecule can be transformed from RNA into DNA and from DNA into RNA.
  • mRNA can be converted into complementary DNA (cDNA) using reverse transcriptase, and DNA can be transcribed into RNA using RNA polymerase.
  • a nucleic acid molecule can be of biological or synthetic origin. Examples of nucleic acid molecules include genomic DNA, cDNA, RNA, a DNA/RNA hybrid, amplified DNA, a pre-existing nucleic acid library, etc.
  • a nucleic acid may be obtained from a human sample, such as blood, serum, plasma, cerebrospinal fluid, cheek scrapings, biopsy, semen, urine, feces, saliva, sweat, etc.
  • a nucleic acid molecule may be subjected to various treatments, such as repair treatments and fragmenting treatments. Fragmenting treatments include mechanical, sonic, and hydrodynamic shearing. Repair treatments include nick repair via extension and/or ligation, polishing to create blunt ends, removal of damaged bases, such as deaminated, derivatized, abasic, or crosslinked nucleotides, etc.
  • a nucleic acid molecule of interest may also be subjected to chemical modification (e.g., bisulfite conversion, methylation / demethylation), extension, amplification (e.g., PCR, isothermal, etc.), etc.
  • Analogous forms of purines and pyrimidines are well known in the art, and include, but are not limited to aziridinylcytosine, 4-acetylcytosine, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5-methoxyuracil, 2-methylthio-N6-isopentenyladen
  • the nucleic acid molecule can also contain one or more hypermodified bases, for example and without limitation, 5-hydroxymethyluracil, 5-hydroxyuracil, α-putrescinylthymine, 5-hydroxymethylcytosine, 5-hydroxycytosine, 5-methylcytosine, N4-methylcytosine, 2-aminoadenine, α-carbamoylmethyladenine, N6-methyladenine, inosine, xanthine, hypoxanthine, 2,6-diaminopurine, and N7-methylguanine.
  • the nucleic acid molecule can also contain one or more non-natural bases, for example and without limitation, 7-deaza-7-hydroxymethyladenine, 7-deaza-7-hydroxymethylguanine, isocytosine (isoC), 5-methylisocytosine, and isoguanine (isoG).
  • in the nucleic acid molecule containing only canonical, hypermodified, or non-natural bases, or any combination thereof, each linkage between nucleotide residues can consist of a standard phosphodiester linkage, and the molecule may in addition contain one or more modified linkages, for example and without limitation, substitution of the non-bridging oxygen atom with a nitrogen atom (i.e., a phosphoramidate linkage), a sulfur atom (i.e., a phosphorothioate linkage), or an alkyl or aryl group (i.e., alkyl or aryl phosphonates), substitution of the bridging oxygen atom with a sulfur atom (i.e., phosphorothiolate), substitution of the phosphodiester bond with a peptide bond (i.e., peptide nucleic acid or PNA), or formation of one or more additional covalent bonds (i.e., locked nucleic acid or LNA), which has an
  • Nucleic acid(s) that are “complementary” or “complement(s)” are those that are capable of base-pairing according to the standard Watson-Crick, Hoogsteen or reverse Hoogsteen binding complementarity rules.
  • the term “complementary” or “complement(s)” may refer to nucleic acid(s) that are substantially complementary, as may be assessed by the same nucleotide comparison set forth above.
  • substantially complementary may refer to a nucleic acid comprising at least one sequence of consecutive nucleobases, or semiconsecutive nucleobases if one or more nucleobase moieties are not present in the molecule, that is capable of hybridizing to at least one nucleic acid strand or duplex even if fewer than all nucleobases base pair with a counterpart nucleobase.
  • a "substantially complementary" nucleic acid contains at least one sequence in which about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, to about 100%, and any range therein, of the nucleobase sequence is capable of base-pairing with at least one single or double-stranded nucleic acid molecule during hybridization.
  • the term “substantially complementary” refers to at least one nucleic acid that may hybridize to at least one nucleic acid strand or duplex in stringent conditions.
  • a “partially complementary” nucleic acid comprises at least one sequence that may hybridize in low stringency conditions to at least one single or double-stranded nucleic acid, or contains at least one sequence in which less than about 70% of the nucleobase sequence is capable of base-pairing with at least one single or double-stranded nucleic acid molecule during hybridization.
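The percentage-based distinction between "substantially" and "partially" complementary sequences can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions: it considers only Watson-Crick pairs, requires two equal-length strands aligned antiparallel with no gaps, and ignores hybridization stringency, which the definitions above also invoke. The function names and the hard 70% cutoff (the text says "about 70%") are illustrative choices.

```python
# Minimal sketch: percent of positions that can form Watson-Crick pairs
# between two equal-length strands, both written 5'->3'.
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_paired(strand: str, other: str) -> float:
    """Percent of bases in `strand` that pair with `other` read antiparallel."""
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    antiparallel = other.upper()[::-1]
    paired = sum((a, b) in WC_PAIRS for a, b in zip(strand.upper(), antiparallel))
    return 100.0 * paired / len(strand)


def classify(strand: str, other: str, threshold: float = 70.0) -> str:
    """Label a strand pair using the roughly 70% boundary from the text."""
    if percent_paired(strand, other) >= threshold:
        return "substantially complementary"
    return "partially complementary"


print(percent_paired("TGAGCTAC", "GTAGCTCA"))   # 100.0
print(classify("AAAAAAAAAA", "TTTTTTGGGG"))     # partially complementary
```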
  • Oligonucleotide refers collectively and interchangeably to two terms of art, “oligonucleotide” and “polynucleotide.” Note that although oligonucleotide and polynucleotide are distinct terms of art, there is no exact dividing line between them and they are used interchangeably herein.
  • the term “adaptor” may also be used interchangeably with the terms “oligonucleotide” and “polynucleotide.”
  • Amplification refers to any in vitro process for increasing the number of copies of a nucleotide sequence or sequences. Nucleic acid amplification results in the incorporation of nucleotides into DNA or RNA. As used herein, one amplification reaction may consist of many rounds of DNA replication. For example, one PCR reaction may consist of 30-100 "cycles" of denaturation and replication.
  • “Polymerase chain reaction,” or “PCR,” means a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA.
  • PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates.
  • the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument.
  • Primer means an oligonucleotide, either natural or synthetic that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed.
  • the sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide.
  • primers are extended by a DNA polymerase.
  • Primers are generally of a length compatible with their use in synthesis of primer extension products, and are usually in the range of 8 to 100 nucleotides in length, such as 10 to 75, 15 to 60, 15 to 40, 18 to 30, 20 to 40, 21 to 50, 22 to 45, 25 to 40, and so on, more typically in the range of 18-40, 20-35, or 21-30 nucleotides long, and any length between the stated ranges.
  • Typical primers can be in the range of between 10-50 nucleotides long, such as 15-45, 18-40, 20-30, 21-25 and so on, and any length between the stated ranges.
  • the primers are usually not more than about 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, or 70 nucleotides in length.
  • stem-loop oligonucleotide refers to a structure formed by an oligonucleotide comprised of 5' and 3' terminal regions, which are intramolecular inverted repeats that form a double-stranded stem, and a non-self-complementary central region, which forms a single-stranded loop.
  • the stem-loop oligonucleotide further comprises a second or third single-stranded loop, such as within the 5' stem and/or the 3' stem.
  • An "asymmetric loop” refers to a single-stranded loop on only one stem strand with a "gap region" of unpaired bases across from the asymmetric loop.
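The "intramolecular inverted repeat" property of a simple stem-loop can be checked computationally. The sketch below is an assumption-laden illustration: it handles only a plain hairpin (no asymmetric loop, gap region, or modified bases), the loop sequence TTTT and the minimum loop size are made up for the example, and the function name is hypothetical.

```python
# Minimal sketch: measure how many terminal bases of an oligonucleotide
# can pair with each other to form the stem of a simple hairpin.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def stem_length(oligo: str, min_loop: int = 3) -> int:
    """Count paired positions between the 5' end and the 3' end.

    Walks inward from both ends while the 5' base is the Watson-Crick
    complement of the matching 3' base, leaving at least `min_loop`
    unpaired bases in the middle.
    """
    seq = oligo.upper()
    max_stem = max((len(seq) - min_loop) // 2, 0)
    n = 0
    while n < max_stem and COMPLEMENT.get(seq[n]) == seq[-1 - n]:
        n += 1
    return n


# 8-nt stem + hypothetical 4-nt loop + reverse complement of the stem
hairpin = "TGAGCTAC" + "TTTT" + "GTAGCTCA"
print(stem_length(hairpin))   # 8
```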
  • non-complementary refers to nucleic acid sequence that lacks the ability to form at least one Watson-Crick base pair through specific hydrogen bonds.
  • “Cleavable base,” as used herein, refers to a nucleotide that is generally not found in a sequence of DNA.
  • deoxyuridine is an example of a cleavable base.
  • dUTP is the triphosphate form of deoxyuridine.
  • the resulting deoxyuridine is promptly removed in vivo by normal processes, e.g., processes involving the enzyme uracil-DNA glycosylase (UDG) (U.S. Patent No. 4,873,192; Duncan, 1981; both references incorporated herein by reference in their entirety).
  • deoxyuridine occurs rarely or never in natural DNA.
  • Non-limiting examples of other cleavable bases include deoxyinosine, bromodeoxyuridine, 7-methylguanine, 5,6-dihydro-5,6-dihydroxydeoxythymidine, 3-methyldeoxyadenosine, etc. (see, Duncan, 1981).
  • Other cleavable bases will be evident to those skilled in the art.
  • degenerate refers to a nucleotide or series of nucleotides wherein the identity can be selected from a variety of choices of nucleotides, as opposed to a defined sequence. In specific embodiments, there can be a choice from two or more different nucleotides. In further specific embodiments, the selection of a nucleotide at one particular position comprises selection from only purines, only pyrimidines, or from non-pairing purines and pyrimidines.
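A degenerate position can be written with the standard IUPAC ambiguity codes, and a degenerate sequence then stands for every concrete sequence it permits. The sketch below expands such a pattern; mapping "only purines" to R, "only pyrimidines" to Y, and "non-pairing purines and pyrimidines" to K or M is this example's interpretation, not something the text specifies, and only a subset of the IUPAC codes is included.

```python
# Minimal sketch: expand a degenerate pattern into all concrete sequences.
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines only
    "Y": "CT",    # pyrimidines only
    "K": "GT",    # a purine/pyrimidine pair that does not base pair
    "M": "AC",    # a purine/pyrimidine pair that does not base pair
    "N": "ACGT",  # any base
}


def expand(pattern: str) -> list:
    """Return every sequence matching the degenerate pattern, in order."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("RY"))         # ['AC', 'AT', 'GC', 'GT']
print(len(expand("NNN")))   # 64
```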
  • non-replicable base refers to a position at which polymerization ceases.
  • the non-replicable base or sequence may comprise an abasic site or sequence, hexaethylene glycol, and/or a bulky chemical moiety attached to the sugar-phosphate backbone or the base.
  • an "abasic site” lacks a base at a position in the oligonucleotide, i.e., the sugar residue is present at the position in the probe, but the purine or pyrimidine (nucleobase) group has been removed or replaced.
  • One or more abasic sites may become incorporated into one or more locations in an oligonucleotide.
  • ligase refers to an enzyme that is capable of joining the 3' hydroxyl terminus of one nucleic acid molecule to a 5' phosphate terminus of a second nucleic acid molecule to form a single molecule.
  • the ligase may be a DNA ligase or RNA ligase.
  • DNA ligases include E. coli DNA ligase, T4 DNA ligase, and mammalian DNA ligases.
  • “MIS” refers to molecular identifier sequence(s).
  • a MIS can be added to a target nucleic acid by including the sequence in the adaptor to be ligated to the target.
  • a MIS can also be added to a target nucleic acid of interest during amplification by carrying out reverse transcription with a primer that contains a region comprising the barcode sequence and a region that is complementary to the target nucleic acid such that the barcode sequence is incorporated into the final amplified target nucleic acid product (i.e., amplicon).
  • the MIS may be any number of nucleotides of sufficient length to distinguish the MIS from other MIS.
  • a MIS may be anywhere from 4 to 20 nucleotides long, such as 5 to 11, or 12 to 20.
  • the MIS has a length of 6 random nucleotides.
  • the term "molecular identifier sequence,” "MIS,” “unique molecular identifier,” “UMI,” “molecular barcode,” “molecular identifier sequence”, “molecular tag sequence” and “barcode” are used interchangeably herein.
  • sample means a material obtained or isolated from a fresh or preserved biological sample or synthetically-created source that contains nucleic acids of interest.
  • a sample is the biological material that contains the variable immune region(s) for which data or information are sought.
  • Samples can include at least one cell, fetal cell, cell culture, tissue specimen, blood, serum, plasma, saliva, urine, tear, vaginal secretion, sweat, lymph fluid, cerebrospinal fluid, mucosa secretion, peritoneal fluid, ascites fluid, fecal matter, body exudates, umbilical cord blood, chorionic villi, amniotic fluid, embryonic tissue, multicellular embryo, lysate, extract, solution, or reaction mixture suspected of containing immune nucleic acids of interest. Samples can also include non-human sources, such as non-human primates, rodents and other mammals, other animals, plants, fungi, bacteria, and viruses.
  • the present disclosure provides synthetic oligonucleotides which form double-stranded adaptors for use in the generation of nucleic acid libraries.
  • the double-stranded adaptors are stem-loop adaptors comprising a distal loop.
  • the synthetic oligonucleotides which form the double-stranded adaptors can have a length of 20 to 100 nucleotides, such as 50 to 80 nucleotides, such as between 60 and 70 nucleotides.
  • Exemplary structures of the double-stranded nucleic acid adaptors, such as a bud adaptor are provided in FIGs. 1A-1C.
  • the synthetic oligonucleotides which form a bud adaptor comprise a double-stranded ligateable stem region and a double-stranded non-ligateable stem region, separated by a bud (also referred to herein as an asymmetric loop region).
  • Each double-stranded region has a 5' end stem strand and a 3' end stem strand. The 3' end and the 5' end can form a blunt end or a staggered end.
  • the double-stranded regions have blunt ends.
  • the asymmetric loop (i.e., bud) comprises a molecular identification sequence (MIS).
  • the double-stranded nucleic acid adaptor may further comprise a gap region on the strand opposite of the bud.
  • the gap region may only comprise the bond between adjacent nucleotides or a region of non-paired nucleotides.
  • the gap region can be on either strand between the ligateable and non-ligateable stem regions.
  • the gap region comprises a non-replicable base, such as an abasic site or spacer.
  • the double-stranded nucleic acid adaptor may further comprise the non-ligateable stem region, which in a stem-loop will be between the distal loop region and the asymmetric loop region.
  • this region comprises one or more mismatched bases.
  • the double-stranded nucleic acid adaptor may further comprise a primer binding site with a known sequence.
  • the primer binding site may be located in the non-ligateable stem region or the distal loop.
  • a forward primer binding site is located between the gap region and distal loop
  • a reverse primer binding site is located between the asymmetric loop and the main loop.
  • the adaptor may comprise flow cell binding sequences, such as P5 and/or P7, or fragments thereof.
  • a first adaptor comprises a P5 sequence and a second adaptor comprises a P7 sequence.
  • the adaptor can comprise part or all of sequencing primer sequences or their binding sites such as index sequencing primers for particular sequencing platforms (e.g., Illumina index primers).
  • an adaptor may include a barcode domain, e.g., a region or sequence of nucleic acids that serves as a barcode or identifier for a source of target nucleic acids to which the adaptor is ligated during use.
  • a barcode domain may serve as an identifier of a sample from which the target nucleic acids are obtained, such that it may be viewed as a sample barcode.
  • the barcode domain may be positioned at any convenient location in the adaptor, such as the non-ligateable stem region of the adaptor, the asymmetric loop of the adaptor, etc.
  • the barcode domain may be combined with the MIS domain, e.g., such that the adaptors include a barcode/MIS domain.
  • a barcode/MIS domain is made up of a series of interspersed barcode and MIS bases.
  • interspersed is meant that the bases which are barcode bases (i.e., the bases that collectively make up the barcode component of a barcode/MIS domain) are distributed or positioned among MIS bases (i.e., the bases that collectively make up the MIS domain of a barcode/MIS domain).
  • a given barcode/MIS domain is one that includes at least one MIS base positioned adjacent to at least one barcode base, where in those instances in which the barcode/MIS domain is made up of 3 or more bases, at least two bases of a first type (e.g., MIS or barcode) may be separated by at least one base of another type (e.g., MIS or barcode).
  • the length of a given barcode/MIS domain may vary, ranging in some instances from 4 to 50 nts, where in some instances the length ranges from 5 to 25 nts, e.g., 6 to 20 nts, where specific lengths of interest include, but are not limited to: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16 nts.
  • the barcode/MIS domain may be positioned at any convenient location in the adaptor, such as the non-ligateable stem region of the adaptor, the asymmetric loop of the adaptor, etc.
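One way to picture an interspersed barcode/MIS domain is as a layout string in which each position is marked as either a fixed sample-barcode base or a random MIS base. The sketch below builds and decodes such a domain. The layout notation ("B" for barcode, "M" for MIS), the function names, and the example barcode are all hypothetical conveniences, not part of the disclosure.

```python
# Minimal sketch: interleave fixed barcode bases with random MIS bases
# according to a layout string, then separate them again.
import random


def build_domain(layout: str, barcode: str, rng: random.Random) -> str:
    """Fill 'B' slots with barcode bases (in order) and 'M' slots randomly."""
    if set(layout) - {"B", "M"}:
        raise ValueError("layout may contain only 'B' and 'M'")
    if layout.count("B") != len(barcode):
        raise ValueError("barcode length must equal the number of 'B' slots")
    barcode_bases = iter(barcode)
    return "".join(
        next(barcode_bases) if slot == "B" else rng.choice("ACGT")
        for slot in layout
    )


def split_domain(layout: str, domain: str) -> tuple:
    """Recover (barcode, MIS) from a domain built with the same layout."""
    barcode = "".join(b for slot, b in zip(layout, domain) if slot == "B")
    mis = "".join(b for slot, b in zip(layout, domain) if slot == "M")
    return barcode, mis


layout = "BMBMMB"                       # barcode and MIS bases interspersed
domain = build_domain(layout, "ACG", random.Random(0))
barcode, mis = split_domain(layout, domain)
print(domain, barcode, mis)
```

Because the layout is shared by every adaptor for a given sample, the barcode component identifies the sample while the MIS component distinguishes individual source molecules.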
  • Embodiments of the methods described herein employ populations or collections of stem loop adaptors, e.g., as described above. Such populations or collections may be made up of a plurality of different, i.e., distinct, stem loop adaptors that differ from each other in terms of sequence.
  • the plurality of different stem loop adaptors is made up of adaptors that have regions of common sequence (e.g., the stem regions, the loop regions) and regions of differing sequence (e.g., the MIS containing regions, the dephased stem region, etc.).
  • a population of stem loop adaptors may be made up of stem loop adaptors in which the only region that differs among the population is the MIS containing region, e.g., the disparate distinct adaptor members of the population only differ from each other in terms of their MIS sequences.
  • the number of distinct stem loop adaptors in a given population that is employed in embodiments of the invention may vary, where in some instances the amount is 10 or more, such as 50 or more, 100 or more, 500 or more, 1,000 or more, 5,000 or more, 10,000 or more, 50,000 or more, 100,000 or more, 250,000 or more, 500,000 or more, 1,000,000 or more, 5,000,000 or more, 10,000,000 or more, 20,000,000 or more, where in some instances the number is 50,000,000 or less, such as 25,000,000 or less, including 20,000,000 or less, where in some instances the number is 10,000,000 or less, 5,000,000 or less, 1,000,000 or less, 500,000 or less, 100,000 or less, including 50,000 or less.
  • a molecular identification sequence within the stem-loop adaptors, particularly in an asymmetric loop of the adaptor, allows for the tagging of individual source molecules for subsequent informatic analysis, and provides diversity and balance to analyze samples of high complexity.
  • the barcode or molecular identifier sequence within the asymmetric loop can have a length of 4 to 15 nucleotides, such as 5 to 10 nucleotides, such as 5, 6, 7, 8, 9, or 10 nucleotides.
  • the asymmetric loop has a length of 6 nucleotides resulting in 16.8 × 10^6 total possible combinations of MIS adaptors within a library.
  • the random barcode sequence is generated by using a mixture of A, G, C, and/or T for incorporation of nucleotides into the MIS of the double-stranded nucleic acid adaptor.
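The size of the MIS sequence space follows directly from the number of random positions. The sketch below computes it and draws a random MIS from an equimolar A/C/G/T mixture. As a worked observation, a single 6-nucleotide MIS has 4^6 = 4,096 possible sequences, whereas 4^12 = 16,777,216 (about 16.8 × 10^6) is the number of pairings of two independent 6-nucleotide MIS; reading the figure of 16.8 million as a count of MIS pairs, for example one MIS at each end of a fragment, is an assumption of this example rather than something the text states. Function names are hypothetical.

```python
# Minimal sketch: size of the MIS sequence space and random MIS generation.
import random


def mis_space(length: int, alphabet: str = "ACGT") -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def random_mis(length: int, rng: random.Random, alphabet: str = "ACGT") -> str:
    """Draw one MIS, each position chosen uniformly from the alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


print(mis_space(6))        # 4096
print(mis_space(6) ** 2)   # 16777216
print(random_mis(6, random.Random(1)))
```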
  • the gap region across from the asymmetric loop has a length at least 1 nucleotide less than the length of the asymmetric loop. In some aspects, the gap region is at least 2, or up to at least 5, nucleotides shorter than the length of the asymmetric loop.
  • an adaptor with an asymmetric loop of 6 nucleotides would have a gap region of less than 6 nucleotides, such as 5, 4, 3, 2, 1, or 0 (e.g., nucleotide bond) nucleotides in length.
  • the gap region has a length of 1 nucleotide, such as one non-replicable base, particularly one abasic site.
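The constraint that the gap region is at least one nucleotide shorter than the asymmetric loop can be stated as a simple rule. The sketch below lists the gap lengths that rule allows for a given loop length; the function name is hypothetical, and a gap length of 0 stands for the case where the gap is just the bond between adjacent nucleotides.

```python
# Minimal sketch: gap-region lengths permitted for a given asymmetric-loop
# length, under the rule "gap is at least 1 nucleotide shorter than the loop".
from typing import List


def allowed_gap_lengths(loop_length: int) -> List[int]:
    """Return permitted gap lengths, longest first (0 = bond only)."""
    if loop_length < 1:
        raise ValueError("asymmetric loop must contain at least one nucleotide")
    return list(range(loop_length - 1, -1, -1))


print(allowed_gap_lengths(6))   # [5, 4, 3, 2, 1, 0]
```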
  • barcodes may be employed. Barcoding is described, e.g., in U.S. Pat. 7,902,122. Methods of using stem loop adaptor ligation and primer extension or PCR to add additional sequences are described, e.g., in U.S. Pat. 7,803,550, which is incorporated by reference herein in its entirety. Barcode incorporation by primer extension, for example via PCR, may be performed using methods described in U.S. 5,935,793 or US 2010/0227329. In some embodiments, a barcode may be incorporated into a nucleic acid via ligation, which can then be followed by amplification; for example, methods described in U.S. Pat. 5,858,656, U.S.
  • U.S. Pat. Publn. 2011/0319290, or U.S. Pat. Publn. 2012/0028814 may be used with the present invention.
  • one or more barcodes may be used, e.g., as described in U.S. Pat. Publn. 2007/0020640, U.S. Pat. Publn. 2009/0068645, U.S. Pat. Publn. 2010/0273219, U.S. Pat. Publn. 2011/0015096, or U.S. Pat. Publn. 2011/0257031.
  • the double-stranded nucleic acid adaptors further comprise a variable length stem region in the ligateable stem region between the terminal end (e.g., the 5' terminal end and/or the 3' terminal end) and the bud or gap region.
  • the variable stem provides sufficient diversity in the library at the beginning of the read for cluster detection and intensity correction without a control nucleic acid library, such as the PhiX control nucleic acid library.
  • the variable stems also provide more unique information for distinguishing between sequences bioinformatically.
  • the variable stems (also referred to herein as the dephased stems) can differ by a single nucleotide (e.g., FIG.
  • a population of double-stranded nucleic acid adaptors can comprise a mixture of double-stranded nucleic acid adaptors having a stem length of n, n+1, n+2, and n+3, such as 8, 9, 10, and 11.
  • the variable stems have a length n between 3-20 nucleotides, particularly 6-15 nucleotides, such as 6, 7, 8, 9, 10, 11, or 12 nucleotides.
  • dephased stems are described in Lundberg et al., 2013, and Wu et al., 2015.
  • variable-length stem sequences within one subset of double-stranded nucleic acid adaptors include TGAGCTAC, TGAGCTACT, TGAGCTACTG, and TGAGCTACTGA as well as the sequences disclosed in FIG. 1D.
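The effect of dephased stems on base diversity can be illustrated by placing the same insert behind stems of length n, n+1, n+2, and n+3 and counting the distinct bases seen at a given read position. The sketch below uses the four stem sequences listed above; the insert sequence, the assumption that every molecule carries an identical insert (a worst case for diversity), and the assumption that the read begins at the first stem base are simplifications made for this example only.

```python
# Minimal sketch: how stems of staggered length shift an identical insert
# so that different bases are read at the same sequencing cycle.
STEMS = ["TGAGCTAC", "TGAGCTACT", "TGAGCTACTG", "TGAGCTACTGA"]
INSERT = "ACGTACGT"   # hypothetical insert shared by every molecule


def bases_at_cycle(reads, cycle):
    """Set of bases observed at a 0-based read position across all reads."""
    return {read[cycle] for read in reads if cycle < len(read)}


dephased_reads = [stem + INSERT for stem in STEMS]
fixed_reads = [STEMS[0] + INSERT for _ in STEMS]   # no dephasing

insert_offsets = [len(stem) for stem in STEMS]
print(insert_offsets)                            # [8, 9, 10, 11]
print(len(bases_at_cycle(fixed_reads, 11)))      # 1
print(len(bases_at_cycle(dephased_reads, 11)))   # 4
```

With a single stem length, every read shows the same base at position 11; with the staggered stems, four different insert positions land on that cycle, which is the kind of diversity the variable stem is described as providing.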
  • the terminal end such as the 5' end, comprises a ligation block.
  • the ligation block is a dephosphorylated nucleotide, a 5' hydroxyl, or an inverted base, such as inverted dT.
  • a 5' end of the double-stranded nucleic acid adaptor oligonucleotide lacks a phosphate.
  • the 5' end and/or 3' end has at least one phosphorothioate bond.
  • the phosphorothioate bond can protect the adaptor from degradation by proofreading enzymes (e.g., 5'-3' exonuclease) and prevent unwanted ligation products or adaptor dimers.
  • the double-stranded nucleic acid adaptor has a phosphorothioate modification on the last 2 bases of the 3' terminal end, and the 1st base of the 5' terminal end to deter adapter dimer formation and optimize the signal-to-noise ratio.
  • exonuclease resistant modifications may include phosphorodithioates, methyl phosphonates and 2'-O-methyl sugars, either separately or in combination.
  • a number of other modifications are known to reduce the exonuclease degradation of single DNA strands, including phosphoramidites (P-NR2), phosphorofluoridates (P-F), boranophosphanes (P-BH3) or phosphoroselenoates (P-Se), and modifications to the sugar rings, such as 2'-O-alkyl groups, 2'-fluoro groups, 2'-amino groups such as 2-amino propyl.
  • the double-stranded nucleic acid adaptor comprises a replication stop or non-replicable base.
  • the gap region may comprise a non-replicable base or spacer, such as an abasic site or cleavable base.
  • the distal loop of the stem-loop oligonucleotide adaptor may comprise a non-replicable base or spacer, such as an abasic site or cleavable base.
  • the replication stop may be at the 5' end of the stem, the 3' end of the stem, or proximal to the distal loop.
  • the non-replicable base can function as a polymerase terminator and facilitates correct adaptor folding. Correct adaptor folding, facilitated by the use of non-replicable bases, also prevents spurious priming by excess stem loop adaptors as the folded, stem-loop conformation is thermodynamically favored rather than hybridization to library molecules.
  • the adaptor may comprise at least 2, 3, 4, 5, 6 or more non-replicable bases depending on the length of the adaptor.
  • Non-replicable bases include, but are not limited to, 1',2'-dideoxyribose (idSp), and deoxyuridine.
  • Cleavable bases include, but are not limited to: uracil, inosine or a ribonucleotide.
  • spacer means a hydrocarbon residue with preferably one to six carbon atoms, preferably an alkdiyl group with 2 to 4 carbon atoms, most preferred linear C3 (5'-C3-spacer).
  • Double-stranded nucleic acid adaptors comprising cleavable bases can be cleaved by enzymes or chemical reagents.
  • cleaving agents include DNA repair enzymes, glycosylases, DNA cleaving endonucleases, ribonucleases and silver nitrate.
  • cleavage at dU may be achieved using uracil DNA glycosylase and endonuclease VIII (USER™, NEB, Ipswich, Mass.) (U.S. Pat. No. 7,435,572).
  • where the modified nucleotide is a ribonucleotide, the adapter can be cleaved with an endoribonuclease.
  • Abasic sites can be recognized and cleaved by AP endonucleases and/or AP lyases.
  • Class II AP endonucleases cleave at AP sites to leave a 3' OH that can be used in polynucleotide polymerization.
  • AP endonucleases can remove moieties attached to the 3' OH that inhibit polynucleotide polymerization. For example a 3' phosphate can be converted to a 3' OH by E. coli endonuclease IV.
  • AP endonucleases can work in conjunction with glycosylases.
  • FIGS. 1A-1D provide depictions of illustrative embodiments of MIS containing adaptors according to certain embodiments of the invention.
  • a non-degradable bud adaptor is illustrated, where the adaptor comprises a molecular identification sequence (MIS) in an asymmetric loop (or bud), an abasic site (e.g., 1',2'-Dideoxyribose (idSp)) opposite the MIS to facilitate correct adaptor folding, abasic sites in the distal loop to function as polymerase terminators, a variable length stem to mediate ligation to nucleic acid molecules and provide base diversity, and phosphorothioate bonds (*) to prevent degradation through exonuclease activity and decrease adaptor dimer formation.
  • FIG. 1B provides a schematic depicting a non-degradable bubble adaptor (BB) comprising a MIS in a symmetric loop, abasic site(s) opposite the MIS, abasic sites in the main loop, and phosphorothioate bonds.
  • FIG. 1C provides a schematic depicting a degradable bubble with a self-complementary byproduct (RB).
  • FIG. 1D provides exemplary sequences of de-phased stem regions that may be present in adaptors of the invention.
  • Double-stranded nucleic acid adaptors and stem-loop oligonucleotides can be used as adaptors for preparing libraries for whole genome or whole transcriptome amplification for PCR analysis, microarray analysis, conventional Sanger or next generation sequencing, e.g., as described in U.S. Pat. No. 7,803,550.
  • a whole genome is amplified from a single cell.
  • a method of preparing a library of nucleic acid molecules. For example, libraries generated by DNA fragmentation and addition of a stem-loop adaptor to one or both DNA ends may be used to amplify (by PCR) and sequence DNA regions adjacent to a previously established DNA sequence (see, for example, U.S. Patent No. 6,777,187 and references therein, all of which are incorporated by reference herein in their entirety).
  • the double-stranded nucleic acid adaptor can be ligated to the 5' end, the 3' end, or both strands of DNA.
  • a plurality of nucleic acid molecules are amplified and sequenced by ligating the plurality of nucleic acid molecules to a population, e.g., as described above, of double-stranded nucleic acid adaptors.
  • One method comprises obtaining a population of target nucleic acid molecules and attaching at least one end of a double-stranded nucleic acid adaptor to at least one end of the target nucleic acid molecule and displacing one strand of the adaptor bound oligonucleotide by strand displacement or nick translation.
  • a bud stem-loop adaptor comprising a MIS is ligated to both ends of the target nucleic acid.
  • a bud stem-loop adaptor comprising a MIS is ligated to one end of the target nucleic acid and a stem-loop adaptor not comprising a MIS is ligated to the other end of the target nucleic acid.
  • the two adaptors ligated to each end of a target nucleic acid may comprise part or all of a first sequencing primer sequence or a second sequencing primer sequence, such that an adaptor on one end has part or all of a first sequencing primer sequence and an adaptor on the other end has part or all of a second sequencing primer sequence.
  • the adaptor may be ligated to one strand (i.e., single-stranded ligation) or to both strands (i.e., double-stranded ligation) of the target nucleic acid.
  • the target nucleic acid is a double-stranded DNA molecule.
  • the double-stranded DNA may be any type of DNA (or sub-type thereof) including, but not limited to, genomic DNA (e.g., prokaryotic genomic DNA (e.g., bacterial genomic DNA, archaea genomic DNA, etc.), eukaryotic genomic DNA (e.g., plant genomic DNA, fungi genomic DNA, animal genomic DNA (e.g., mammalian genomic DNA (e.g., human genomic DNA, rodent genomic DNA (e.g., mouse, rat, etc.), etc.), insect genomic DNA (e.g., drosophila), amphibian genomic DNA (e.g., Xenopus), etc.)), viral genomic DNA, mitochondrial DNA, cell-free DNA, such as NIPT DNA, including fetal and/or maternal cell free DNA, or any combination of DNA types thereof or subtypes thereof.
  • the method comprises attaching an adaptor to complementary single strands of the double-stranded DNA molecule.
  • the plurality of genomic DNA molecules are enzymatically digested or randomly fragmented to produce DNA fragments, a MIS stem-loop adaptor is ligated to at least one end of a plurality of the DNA fragments to produce adaptor-linked fragments, and the adaptor-linked fragments are then amplified.
  • the target nucleic acids are isolated from cell-free DNA (cfDNA), e.g., where the DNA is an NIPT DNA sample.
  • the isolated cfDNA may comprise fragments (e.g., of about 50 to 200 bp, particularly about 167 bp in length) and not need a fragmentation step prior to library preparation.
  • a MIS double-stranded nucleic acid adaptor may be coupled to one end of a target nucleic acid molecule or to both ends of a target nucleic acid molecule.
  • the double-stranded nucleic acid adaptor may be coupled to the nucleic acid molecule via ligation to the 5' end of the nucleic acid molecule, for example, by blunt-end ligation. Ligating the double-stranded nucleic acid adaptor to one or both ends of a target nucleic acid molecule may result in nick formation. Said one or more nicks may be removed from the ligated double-stranded nucleic acid adaptor and the nucleic acid target molecule.
  • the adaptor-bound nucleic acid molecule comprises a nick having a 3' hydroxyl group and strand displacement or nick translation polymerization is performed to extend the nucleic acid molecules to the adaptor.
  • the polymerization may cease at a non-replicable base, such as within the gap region.
  • polymerization may cease in the region between loops, and/or the main loop.
  • an extension reaction may extend the 3' end of the nucleic acid molecule through the stem-loop adaptor where the loop portion is cleaved at a cleavable replication stop.
  • methods of the present invention utilize a strand-displacing polymerase, such as Φ29 Polymerase, Bst Polymerase, Vent Polymerase, 9°Nm Polymerase, Klenow fragment of DNA Polymerase I, MMLV Reverse Transcriptase, AMV reverse transcriptase, HIV reverse transcriptase, a mutant form of T7 phage DNA polymerase that lacks 3'-5' exonuclease activity, or a mixture thereof.
  • Nucleic acids in a nucleic acid sample being analyzed (or processed) in accordance with the present invention can be from any nucleic acid source, e.g., as described above.
  • nucleic acids in a nucleic acid sample can be from virtually any nucleic acid source, including but not limited to genomic DNA, complementary DNA (cDNA), RNA (e.g., messenger RNA, ribosomal RNA, short interfering RNA, microRNA, etc.), plasmid DNA, mitochondrial DNA, cfDNA, etc.
  • any organism can be used as a source of nucleic acids to be processed in accordance with the present invention; no limitation in that regard is intended.
  • Exemplary organisms include, but are not limited to, plants, animals (e.g., reptiles, mammals, insects, worms, fish, etc.), bacteria, fungi (e.g., yeast), viruses, etc.
  • the nucleic acids in the nucleic acid sample are derived from a mammal, where in certain embodiments the mammal is a human.
  • a nucleic acid molecule of interest can be a single nucleic acid molecule or a plurality of nucleic acid molecules.
  • a nucleic acid molecule of interest can be of biological or synthetic origin. Examples of nucleic acid molecules include genomic DNA, cDNA, RNA, amplified DNA, a pre-existing nucleic acid library, etc.
  • a nucleic acid molecule of interest may be subjected to various treatments, such as repair treatments and fragmenting treatments. Fragmenting treatments include mechanical, sonic, chemical, enzymatic, degradation over time, etc. Repair treatments include nick repair via extension and/or ligation, polishing to create blunt ends, removal of damaged bases such as deaminated, derivatized, abasic, or crosslinked nucleotides, etc.
  • a nucleic acid molecule of interest may also be subjected to chemical modification (e.g., bisulfite conversion, methylation / demethylation), extension, amplification (e.g., PCR, isothermal, etc.), etc.
  • the reaction may or may not use a fragmentation step.
  • the plurality of nucleic acid molecules comprises nucleic acid fragments, such as gDNA subject to fragmentation.
  • the shear force may be a hydrodynamic shear force, such as those generated by acoustic or mechanical means.
  • Hydrodynamic shearing of a nucleic acid can occur by any method known in the art, including passing the nucleic acid through a narrow capillary or orifice, referred to as "point-sink" shearing (Oefner et al., 1996; Thorstenson et al., 1998; Quail, 2010), acoustic shearing, or sonication.
  • the commercially available focused-ultrasonicators in conjunction with miniTUBEs or microTUBEs (Covaris, Woburn, MA; U.S. Patent Nos. 8,459,121; 8,353,619; 8,263,005; 7,981,368; 7,757,561), can randomly fragment DNA with distributions centered between 2-5 kb and 0.1-1.5 kb, respectively.
  • Sonication subjects nucleic acid to hydrodynamic shearing forces (Grokhovsky, 2006; Sambrook et al, 2006).
  • the commercially available Bioruptor (Diagenode; Denville, NJ; U.S. Patent Publn. No. 2012/0264228) uses sonication to shear nucleic acids.
  • a nucleic acid fragment may have a size of about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 1000 bp, or about 2000 bp.
  • the nucleic acid fragments may have an average size of about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 1000 bp, or about 2000 bp.
  • a nucleic acid molecule may have a size of about 2000 bp, 5000 bp, 7500 bp, 10,000 bp, 20,000 bp, 30,000 bp, 40,000 bp, 50,000 bp, 60,000 bp, 70,000 bp, 80,000 bp, 90,000 bp, or 100,000 bp.
  • Nucleic acids may be, for example, RNA or DNA. Modified forms of RNA or DNA may also be used.
  • a given protocol may include a pooling step, e.g., where a first adaptor ligated composition is combined or pooled with the one or more additional adaptor ligated compositions.
  • nucleic acid fragments tagged according to aspects of the subject invention are pooled with nucleic acid fragments derived from a plurality of sources (e.g., a plurality of organisms, tissues, cells, or subjects), where by "plurality" is meant two or more.
  • the number of different tagged compositions produced from different sources that are combined or pooled in such embodiments may vary, where the number ranges in some instances from 2 to 50, such as 3 to 25, including 4 to 20 or 10,000, or more.
  • Prior to or after pooling, the different tagged compositions can be amplified, e.g., by polymerase chain reaction (PCR), such as described above.
  • PCR polymerase chain reaction
  • the RNA molecule may be obtained from a sample, such as a sample comprising total cellular RNA, a transcriptome, or both; the sample may be obtained from one or more viruses; from one or more bacteria; or from a mixture of animal cells, bacteria, and/or viruses, for example.
  • the sample may comprise mRNA, such as mRNA that is obtained by affinity capture.
  • Obtaining nucleic acid molecules may comprise generation of the cDNA molecule by reverse transcribing the mRNA molecule with a reverse transcriptase, such as, for example Tth DNA polymerase, HIV Reverse Transcriptase, AMV Reverse Transcriptase, MMLV Reverse Transcriptase, or a mixture thereof.
  • PCR™ polymerase chain reaction
  • two synthetic oligonucleotide primers which are complementary to two regions of the template DNA (one for each strand) to be amplified, are added to the template DNA (that need not be pure), in the presence of excess deoxynucleotides (dNTP's) and a thermostable polymerase, such as, for example, Taq (Thermus aquaticus) DNA polymerase.
  • the target DNA is repeatedly denatured (around 90°C), annealed to the primers (typically at 50-60°C) and a daughter strand extended from the primers (72°C). As the daughter strands are created they act as templates in subsequent cycles.
  • the template region between the two primers is amplified exponentially, rather than linearly.
  • a second barcode such as a sample barcode
  • One method involves annealing a primer to the first barcoded nucleic acid molecule, the primer including a first portion complementary to the first barcoded nucleic acid molecule and a second portion including a second barcode; and extending the annealed primer to form a dual barcoded nucleic acid molecule, the dual barcoded nucleic acid molecule including the second barcode, the first barcode, and at least a portion of the nucleic acid molecule.
  • the primer may include a 3' portion and a 5' portion, where the 3' portion may anneal to a portion of the first barcode and the 5' portion comprises the second barcode.
  • DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing-by-synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing-by-synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, and SOLiD sequencing.
  • the nucleic acid library may be generated with an approach compatible with Illumina sequencing, such as a Nextera™ DNA sample prep kit, and additional approaches for generating Illumina next-generation sequencing library preparation are described, e.g., in Oyola et al. (2012).
  • a nucleic acid library is generated with a method compatible with a SOLiD™ or Ion Torrent sequencing method (e.g., a SOLiD® Fragment Library Construction Kit, a SOLiD® Mate-Paired Library Construction Kit, a SOLiD® ChIP-Seq Kit, a SOLiD® Total RNA-Seq Kit, a SOLiD® SAGE™ Kit, an Ambion® RNA-Seq Library Construction Kit, etc.). Additional methods for next-generation sequencing, including various methods for library construction that may be used with embodiments of the present invention, are described, e.g., in Pareek (2011) and Thudi (2012).
  • the sequencing technologies used in the methods of the present disclosure include the HiSeq™ system (e.g., HiSeq™ 2000 and HiSeq™ 1000) and the MiSeq™ system from Illumina, Inc.
  • the HiSeq™ system is based on massively parallel sequencing of millions of fragments using attachment of randomly fragmented genomic DNA to a planar, optically transparent surface and solid phase amplification to create a high density sequencing flow cell with millions of clusters, each containing about 1,000 copies of template per sq. cm. These templates are sequenced using four-color DNA sequencing-by-synthesis technology.
  • the MiSeq™ system uses TruSeq™, Illumina's reversible terminator-based sequencing-by-synthesis.
  • 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains 5'-biotin tag.
  • the fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead.
  • the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.
  • In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library.
  • internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library.
  • clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide.
  • IonTorrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA template. Beneath the wells is an ion-sensitive layer and beneath that a proprietary Ion sensor. If a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by the proprietary ion sensor.
  • the sequencer will call the base, going directly from chemical information to digital information.
  • the Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change will be recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage will be double, and the chip will record two identical bases called. Because this is direct detection (no scanning, no cameras, no light), each nucleotide incorporation is recorded in seconds.
  • SMRT™ single molecule, real-time
  • each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked.
  • a single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW).
  • ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand.
  • the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.
  • a further sequencing platform includes the CGA Platform (Complete Genomics). The CGA technology is based on preparation of circular DNA libraries and rolling circle amplification (RCA) to generate DNA nanoballs that are arrayed on a solid support (Drmanac et al. 2010).
  • Complete Genomics' CGA Platform uses a novel strategy called combinatorial probe anchor ligation (cPAL) for sequencing. The process begins by hybridization between an anchor molecule and one of the unique adapters. Four degenerate 9-mer oligonucleotides are labeled with specific fluorophores that correspond to a specific nucleotide (A, C, G, or T) in the first position of the probe. Sequence determination occurs in a reaction where the correct matching probe is hybridized to a template and ligated to the anchor using T4 DNA ligase.
  • cPAL combinatorial probe anchor ligation
  • After imaging of the ligated products, the ligated anchor-probe molecules are denatured. The process of hybridization, ligation, imaging, and denaturing is repeated five times using new sets of fluorescently labeled 9-mer probes that contain known bases at the n + 1, n + 2, n + 3, and n + 4 positions.
  • FIG. 2 provides a schematic depiction of a process for library construction according to an embodiment of the invention.
  • fragmented double-stranded genomic DNA starting material is combined with a population of non-degradable bud adaptors each comprising an MIS.
  • the 3' end of the genomic DNA is extended along the stem of the bud adaptor, through the MIS containing bud, until it reaches a non-replicable base in the loop.
  • Resultant primer binding sites initially provided in the non-ligateable stem region of the adaptors are then employed to amplify the DNA, where the amplified DNA includes sample barcode and P5/P7 domains, e.g., for Illumina NGS.
  • kits for creating libraries of target nucleic acids in a sample; the term "kit" refers to a combination of physical elements.
  • a kit may include, for example, one or more components such as double-stranded nucleic acid adaptors or stem-loop adaptors, including without limitation specific primers, enzymes, reaction buffers, an instruction sheet, and other elements useful to practice the technology described herein.
  • These physical elements can be arranged in any way suitable for carrying out the invention.
  • the kit may further comprise a polymerase, such as a strand displacing polymerase, including, for example, Φ29 Polymerase, Bst Polymerase, Vent Polymerase, 9°Nm Polymerase, Klenow fragment of DNA Polymerase I, MMLV Reverse Transcriptase, a mutant form of T7 phage DNA polymerase that lacks 3'-5' exonuclease activity, or a mixture thereof.
  • kits may be packaged either in aqueous media or in lyophilized form.
  • the container means of the kits will generally include at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted (e.g., aliquoted into the wells of a microtiter plate). Where there is more than one component in the kit, the kit also will generally contain a second, third or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a single vial.
  • the kits of the present invention also will typically include a means for containing the nucleic acids, and any other reagent containers in close confinement for commercial sale. Such containers may include injection or blow molded plastic containers into which the desired vials are retained.
  • kits will also include instructions for employing the kit components as well as the use of any other reagent not included in the kit. Instructions may include variations that can be implemented. It is contemplated that such reagents are embodiments of kits of the invention. Such kits, however, are not limited to the particular items identified above and may include any reagent used for the manipulation or characterization of the methylation of a gene.
  • the container means of the kits will generally include at least one vial, test tube, flask, bottle, or other container means, into which a component may be placed, and preferably, suitably aliquoted. Where there is more than one component in the kit, the kit also will generally contain additional containers into which the additional components may be separately placed. However, various combinations of components may be comprised in a container.
  • the kits of the present invention also will typically include a means for packaging the component containers in close confinement for commercial sale. Such packaging may include injection or blow-molded plastic containers into which the desired component containers are retained.
  • Libraries were prepared from both individual and pooled plasma samples obtained from donors. Cell-free DNA was isolated from the pooled plasma samples using the Qiagen QIAamp Circulating Nucleic Acid kit. Libraries were prepared as in the ThruPLEX® Plasma-seq Kit (Rubicon Genomics®), including repairing the cfDNA to produce molecules with blunt ends, with the difference being ligation of the stem-loop adaptors depicted in FIG. 1A to the 5' end of the cfDNA, leaving a nick at the 3' end of the target fragment. Next, the 3' ends of the cfDNA were extended to complete library synthesis and Illumina-compatible indexes were added by amplification.
  • the library was then processed and sequenced on the Illumina MiSeq, NextSeq500 on both mid- and high-output flow cells, as well as the HiSeq2500 and HiSeq3000. Sequencing data was generated using PicardTools.
  • each of the three adaptor designs was evaluated using sequencing analysis metrics, particularly the percentage of unmapped reads. While the non-degradable bubble (BB) design (FIG. 1A) was found to lose diversity due to collapse of the structure, the non-degradable bud (YB) design (FIG. 1C) was shown to have a significant reduction in the percentage of unmapped reads (FIG. 3A). Further, the bud adaptor design with a 5' phosphorothioate bond in addition to the 3' phosphorothioate bond (5PTYB-idSp) showed a significant reduction in the percentage of unmapped reads compared to the bud adaptor design with only the 3' phosphorothioate (YBidSp).
  • the bud adaptor design (YB-5PT-idSp) with 2 abasic sites in the main loop and phosphorothioate modifications on the last 2 of the 3' bases and the first of the 5' bases to deter adapter dimer formation was used for the subsequent studies.
  • the bud stem-loop adaptors with dephased stems of 8, 10, and 12 nucleotides were ligated to 0.5 ng of pooled plasma input DNA and amplified for 14 cycles. All of the adaptors produced similar amplification results, with the 10 or 8 bp stem adaptors amplifying slightly better than the 12 bp stem adaptors (FIG. 4), possibly due to better strand displacement during PCR. All of the bud adaptors showed a clear delta Ct between the samples and NTC libraries.
  • a population of double-stranded nucleic acid adaptors for ligating to a population of nucleic acid target molecules comprising:
  • asymmetric loop region between the ligateable stem region and the non-ligateable stem region, wherein the asymmetric loop region comprises a molecular identification sequence (MIS).
  • MIS molecular identification sequence
  • double-stranded adaptors are further defined as a single-stranded nucleic acid molecule that under ligation conditions forms a stem-loop adaptor having a distal loop region attached to the non-ligateable stem region.
  • the double-stranded adaptors within the population comprise a mixture of double-stranded adaptors with a first primer binding site and double-stranded adaptors with a second primer binding site.
  • the ligateable stem region further comprises a variable stem region defined as a region whose length varies among the members of the population.
  • non-ligateable stem region comprises one or more mismatched bases.
  • nucleic acid target molecules comprise genomic DNA, fragmented DNA, cDNA, amplified DNA, or a nucleic acid library.
  • non-replicable base comprises a deoxyuridine or a ribonucleotide base.
  • a gap region on the strand opposite the asymmetric loop region has a length of at least one nucleotide shorter than that of the asymmetric loop region.
  • variable stem comprises 8-11 nucleotides.
  • variable stem comprises 8, 9, 10, or 11 nucleotides.
  • the ligateable stem region further comprises one or more replication blocks or cleavable bases between a 3' terminal end or the 5' terminal end and the asymmetric loop or gap region.
  • non-ligateable stem region further comprises one or more replication blocks or cleavable bases between the asymmetric loop or gap region and the distal loop region.
  • a method for producing a library of adaptor-bound target nucleic acids comprising:
  • strand displacement or nick translation polymerization is further defined as polymerization that ceases at a non-replicable base or region in the asymmetric loop or in a region of the non-ligateable stem region adjacent to the asymmetric loop.
  • a method for producing a library of adaptor-bound target nucleic acids comprising:
  • strand displacement or nick translation polymerization is further defined as polymerization that ceases at a non-replicable base or region in the asymmetric loop or in a region of the non-ligateable stem adjacent to the asymmetric loop.
  • a method for producing a library of adaptor-bound target nucleic acids comprising: (a) providing a population of target nucleic acid molecules, attaching to one end a first double-stranded nucleic acid adaptor according to any one of clauses 1-45, and attaching to the other end a second double-stranded nucleic acid adaptor optionally comprising a MIS, thereby generating a population of adapter-bound target nucleic acid molecules; and
  • first double-stranded nucleic acid adaptor comprises a first primer binding site in the non-ligateable stem region and the second double-stranded nucleic acid adaptor comprises a second primer binding site in the non-ligateable stem region.
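
The adaptor population recited above (a ligateable stem whose length varies among members of the population, with each member carrying a MIS in the asymmetric loop) can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the disclosed design: the 6-nucleotide MIS length, the use of random sequences for both the stem and the MIS, and all function and field names are hypothetical. Only the 8-11 nucleotide stem range comes from the text. Modified residues (abasic sites, phosphorothioate bonds) and the non-ligateable stem are not modeled.

```python
import random

DNA = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def random_seq(length, rng):
    """Draw a random DNA string of the given length."""
    return "".join(rng.choice(DNA) for _ in range(length))


def make_adaptor(rng, mis_length=6, stem_lengths=(8, 9, 10, 11)):
    """Build one hypothetical adaptor record.

    stem_lengths follows the 8-11 nt range in the text; mis_length and the
    random sequence content are assumptions made for illustration only.
    """
    stem_top = random_seq(rng.choice(stem_lengths), rng)
    return {
        "ligateable_stem_top": stem_top,
        # The opposite strand of the stem pairs with the top strand.
        "ligateable_stem_bottom": reverse_complement(stem_top),
        # The MIS sits in the asymmetric loop and is left unpaired.
        "mis": random_seq(mis_length, rng),
    }


def make_population(n, seed=0):
    """Build n adaptors with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [make_adaptor(rng) for _ in range(n)]


population = make_population(1000)
stem_lengths_seen = sorted({len(a["ligateable_stem_top"]) for a in population})
distinct_mis = len({a["mis"] for a in population})
print("stem lengths present:", stem_lengths_seen)
print("distinct MIS sequences:", distinct_mis)
```

Because the stem length differs between members, the first sequenced bases after the stem fall at different read positions across the library, which is one way to read the "base diversity" role the description assigns to the variable stem.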

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Double-stranded nucleic acid adaptors comprising molecular identification sequences are provided. Also provided are methods of using said double-stranded nucleic acid adaptors to generate nucleic acid libraries, e.g., for amplification and sequencing.
PCT/US2017/045976 2016-08-09 2017-08-08 Nucleic acid adaptors with molecular identification sequences and use thereof Ceased WO2018031588A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662372543P 2016-08-09 2016-08-09
US62/372,543 2016-08-09

Publications (1)

Publication Number Publication Date
WO2018031588A1 true WO2018031588A1 (fr) 2018-02-15

Family

ID=61162494

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/045976 Ceased WO2018031588A1 (fr) 2016-08-09 2017-08-08 Nucleic acid adaptors with molecular identification sequences and use thereof

Country Status (1)

Country Link
WO (1) WO2018031588A1 (fr)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113166742A (zh) * 2018-10-24 2021-07-23 University of Washington Methods and kits for depleting and enriching nucleic acid sequences
US11332784B2 (en) 2015-12-08 2022-05-17 Twinstrand Biosciences, Inc. Adapters, methods, and compositions for duplex sequencing
US11479807B2 (en) 2017-03-23 2022-10-25 University Of Washington Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
EP4230748A1 (fr) * 2018-03-02 2023-08-23 F. Hoffmann-La Roche AG Generation of double-stranded DNA templates for single molecule sequencing
US11739367B2 (en) 2017-11-08 2023-08-29 Twinstrand Biosciences, Inc. Reagents and adapters for nucleic acid sequencing and methods for making such reagents and adapters
CN116676175A (zh) * 2023-03-17 2023-09-01 四川大学 Multi-barcode direct RNA nanopore sequencing classifier
US11845985B2 (en) 2018-07-12 2023-12-19 Twinstrand Biosciences, Inc. Methods and reagents for characterizing genomic editing, clonal expansion, and associated applications
ES2958260R1 (es) * 2021-04-29 2024-09-23 4Basebio S L U Linear DNA with enhanced resistance against exonucleases
EP4294941B1 (fr) * 2021-02-18 2025-03-26 F. Hoffmann-La Roche AG Structure to prevent threading of nucleic acid templates through a nanopore during sequencing
ES2996704R1 (es) * 2021-07-30 2025-07-16 4Basebio S L U Linear DNA with enhanced resistance against exonucleases and methods for the production thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6175002B1 (en) * 1997-04-15 2001-01-16 Lynx Therapeutics, Inc. Adaptor-based sequence analysis
US20070212704A1 (en) * 2005-10-03 2007-09-13 Applera Corporation Compositions, methods, and kits for amplifying nucleic acids
US20120238738A1 (en) * 2010-07-19 2012-09-20 New England Biolabs, Inc. Oligonucleotide Adapters: Compositions and Methods of Use
US20120244525A1 (en) * 2010-07-19 2012-09-27 New England Biolabs, Inc. Oligonucleotide Adapters: Compositions and Methods of Use
WO2015134552A1 (fr) * 2014-03-03 2015-09-11 Swift Biosciences, Inc. Enhanced adaptor ligation
US20170211140A1 (en) * 2015-12-08 2017-07-27 Twinstrand Biosciences, Inc. Adapters, methods, and compositions for duplex sequencing

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11332784B2 (en) 2015-12-08 2022-05-17 Twinstrand Biosciences, Inc. Adapters, methods, and compositions for duplex sequencing
US12006532B2 (en) 2017-03-23 2024-06-11 University Of Washington Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
US11479807B2 (en) 2017-03-23 2022-10-25 University Of Washington Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
US11739367B2 (en) 2017-11-08 2023-08-29 Twinstrand Biosciences, Inc. Reagents and adapters for nucleic acid sequencing and methods for making such reagents and adapters
EP4230748A1 (fr) * 2018-03-02 2023-08-23 F. Hoffmann-La Roche AG Generation of double-stranded DNA templates for single molecule sequencing
US11845985B2 (en) 2018-07-12 2023-12-19 Twinstrand Biosciences, Inc. Methods and reagents for characterizing genomic editing, clonal expansion, and associated applications
CN113166742A (zh) * 2018-10-24 2021-07-23 华盛顿大学 Methods and kits for depleting and enriching nucleic acid sequences
CN113166742B (zh) * 2018-10-24 2024-10-01 华盛顿大学 Methods and kits for depleting and enriching nucleic acid sequences
EP4294941B1 (fr) * 2021-02-18 2025-03-26 F. Hoffmann-La Roche AG Structure to prevent threading of nucleic acid templates through a nanopore during sequencing
ES2958260R1 (es) * 2021-04-29 2024-09-23 4Basebio S L U Linear DNA with enhanced resistance against exonucleases
ES3008581R1 (es) * 2021-04-29 2025-04-07 4Basebio S L U Linear DNA with enhanced resistance against exonucleases
ES2996704R1 (es) * 2021-07-30 2025-07-16 4Basebio S L U Linear DNA with enhanced resistance against exonucleases and methods for the production thereof
CN116676175B (zh) * 2023-03-17 2024-04-09 四川大学 Multi-barcode direct RNA nanopore sequencing classifier
CN116676175A (zh) * 2023-03-17 2023-09-01 四川大学 Multi-barcode direct RNA nanopore sequencing classifier

Similar Documents

Publication Publication Date Title
WO2018031588A1 (fr) Nucleic acid adaptors with molecular identification sequences and use thereof
US10711269B2 (en) Method for making an asymmetrically-tagged sequencing library
US20220259638A1 (en) Methods and compositions for high throughput sample preparation using double unique dual indexing
KR102872035B1 (ko) Methods and compositions for paired-end sequencing using a single surface primer
US20190005193A1 (en) Digital measurements from targeted sequencing
EP3981884A1 (fr) Single cell whole genome libraries for methylation sequencing
CN108138228B (zh) High molecular weight DNA sample tracking tags for next-generation sequencing
US20220267848A1 (en) Detection and quantification of rare variants with low-depth sequencing via selective allele enrichment or depletion
US20220098642A1 (en) Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation
CN112654715B (zh) Methods for improving clonality priority of polynucleotide clusters
US20170175182A1 (en) Transposase-mediated barcoding of fragmented dna
EP4146663A1 (fr) Quantitative blocker displacement amplification (QBDA) sequencing for calibration-free and multiplexed variant allele frequency quantitation
US20220042100A1 (en) Quantifying foreign dna in low-volume blood samples using snp profiling
US20240336913A1 (en) Method for producing a population of symmetrically barcoded transposomes
US20250179568A1 (en) Target Enrichment
US20230250470A1 (en) Amplicon comprehensive enrichment
US20230340581A1 (en) Non-extensible oligonucleotides in dna amplification reactions
HK40062228A (en) Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation
HK40076229A (en) Methods and compositions for high throughput sample preparation using double unique dual indexing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17840167

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17840167

Country of ref document: EP

Kind code of ref document: A1