WO2025085618A1 - Methods for appending adaptors onto polynucleotides - Google Patents
- Publication number
- WO2025085618A1 (PCT/US2024/051747)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- adaptor
- polynucleotide
- strand
- polymerase
- flap
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- the disclosure relates to methods for appending adaptors to the 5’ and/or 3’ ends of polynucleotides.
- Library preparation aims to build a collection of DNA fragments for next-generation sequencing (NGS).
- a high-quality DNA library guarantees uniform and consistent genome coverage, thus delivering comprehensive and reliable sequencing data.
- the conversion of sample DNA to library DNA can be inefficient using standard ligation methodologies, however.
- Next generation sequencing typically requires library preparation, where known adaptor DNA sequences are added to the target DNA to be sequenced. Traditionally, this requires that sample DNA is fragmented, end-repaired, and then ligated to the adaptor DNA. While ligation-mediated library prep can yield the highest quality genomes, the conversion of sample DNA to library DNA can be inefficient. In cases where the quantity of sample DNA is in short supply, this poor efficiency makes ligation-mediated library prep more challenging or even infeasible.
- Traditional methods for target enrichment for NGS broadly fall into two categories: amplicon-based or probe-based hybridization/pulldown.
- the former employs primer pairs and PCR to amplify targets from a sample; it is simple and fast but limited in its ability to multiplex very high numbers of targets due to PCR mispriming events. It is also restricted in the size of amplicon that can be produced due to the limits of current PCR technology.
- Other disadvantages of PCR such as sequence bias or polymerase slippage can also impact the performance scope.
- Hybridization approaches are generally longer in practice than amplicon methods but are virtually limitless in the number of targets that can be enriched. Poorer specificity arising from hybridization of only a single probe is mitigated by additional rounds of pulldown and/or increasing the probe length and Tm.
- the disclosure provides methods to append adaptors to the 5’ and/or 3’ ends of polynucleotides.
- the resulting adaptor-polynucleotide constructs can be then used in various applications, including NGS.
- the disclosure provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides, comprising: fragmenting gDNA or cDNA into polynucleotides that are less than 1000 base pairs in length; end repairing and phosphorylating the polynucleotides; attaching adaptors to the 5’ and 3’ ends of the end-repaired polynucleotides using non-homologous end joining factors.
- the gDNA or cDNA is fragmented by enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing.
- the gDNA or cDNA is fragmented by sonication.
- the DNA fragments are enzymatically end repaired and phosphorylated by using T4 DNA polymerase and T4 polynucleotide kinase.
- a single ‘A’ deoxynucleotide is added to the end-repaired DNA fragments by use of Klenow enzyme which lacks exonuclease activity.
- the adaptors comprise a 3' overhang of a ‘T’ deoxynucleotide.
- the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch.
- the adaptors are Y-shaped or U-shaped.
- the single stranded regions of the adaptors comprise one or more of the following sequences: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) and P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
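As a small illustration of handling the P5 and P7 sequences listed above, the following Python sketch strips the triplet spacing and computes reverse complements. The helper names (`clean`, `reverse_complement`) are illustrative assumptions and are not part of the disclosure.

```python
# P5 and P7 as written in the disclosure (SEQ ID NO: 32 and 33), 5' to 3'.
P5 = "AAT GAT ACG GCG ACC ACC GA"
P7 = "CAA GCA GAA GAC GGC ATA CGA GAT"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove whitespace and upper-case a DNA sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return clean(seq).translate(_COMPLEMENT)[::-1]


p5 = clean(P5)
p7 = clean(P7)
p5_rc = reverse_complement(P5)
p7_rc = reverse_complement(P7)
```

The reverse complements are what a complementary strand in the double stranded region of an adaptor would carry if that region spanned these sequences; the actual adaptor layout is not specified in this excerpt.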
- oligonucleotides are added to the 3’ ends of the DNA fragments with terminal transferase.
- the adaptors comprise an overhang of base pairs that are complementary to the oligonucleotides added to the 3’ ends of the DNA fragments.
- the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch.
- the adaptors are Y-shaped or U-shaped.
- the single stranded regions of the adaptors comprise one or more of the following sequences: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) and P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the non-homologous end joining factors are LigD and Ku, or an engineered variant thereof.
- the LigD and Ku are from, or derived from, Mycobacterium.
- a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to SEQ ID NO: 1 to 20 and has LigD activity.
- a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to SEQ ID NO: 21 to 30 and has Ku activity.
- the engineered variant of LigD lacks exonuclease activity.
- the engineered variant has the sequence of SEQ ID NO: 1 with the substitution H373A.
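The substitution notation used above (wild-type residue, 1-based position, replacement residue) can be applied programmatically. The sketch below is a minimal illustration; the function name `apply_substitution` is hypothetical, and the sequence is a synthetic placeholder because SEQ ID NO: 1 is not reproduced in this excerpt.

```python
import re


def apply_substitution(protein: str, substitution: str) -> str:
    """Apply a point substitution such as 'H373A' (1-based position).

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the stated wild-type residue.
    """
    match = re.fullmatch(r"([A-Z])\s*(\d+)\s*([A-Z])", substitution.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    wild_type = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Synthetic placeholder with a histidine at position 373; NOT SEQ ID NO: 1.
placeholder = "M" + "G" * 371 + "H" + "G" * 10
variant = apply_substitution(placeholder, "H373A")
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when a sequence carries a tag or a signal peptide.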
- the disclosure also provides a method to append an adaptor to the 5’ end of a polynucleotide, comprising the steps of: (1) hybridizing a 5’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap; (2) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product; and (3) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide.
- the method further comprises appending a second adaptor to the 3’ end of the polynucleotide, comprising the steps of: (4) hybridizing a 3’ flap adaptor to the polynucleotide of (3) to form a second hybridized product comprising a 3’ flap; (5) contacting the second hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptors; and (6) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang of the clipped hybridized product to form a polynucleotide comprising adaptors at the 5’ and 3’ ends.
- the disclosure also provides a method of appending an adaptor to the 3’ end of a polynucleotide, comprising the steps of: (A) hybridizing a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 3’ flap; (B) contacting the hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptor; and (C) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang to form a polynucleotide with an adaptor appended to the 3’ end.
- the method further comprises appending a second adaptor to the 5’ end of the polynucleotide, comprising the steps of: (D) hybridizing a 5’ flap adaptor to the polynucleotide of (C) to form a second hybridized product comprising a 5’ flap; (E) contacting the second hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the second hybridized product to form a nicked hybridized product; and (F) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide.
- the disclosure further provides a method of appending adaptors to the 5’ and 3’ ends of a polynucleotide, comprising: (i) hybridizing a 5’ flap adaptor and a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap and a 3’ flap; (ii) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product; (iii) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide; (iv) contacting the ligated product with a second structure-specific endonuclease that has 3’ flap cle
- the 5’ flap adaptor comprises a double stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the double stranded adaptor region comprises a universal sequence.
- the base-pair of the double stranded adaptor region adjacent to the single stranded probe region also matches to the target sequence of the polynucleotide.
- the universal sequence is a sequence that is commonly used to generate sequence reads using a next generation sequencing platform.
- the structure-specific endonuclease that has 5’ flap cleavage activity is FEN1.
- the ligase is selected from T4 DNA ligase, T7 DNA ligase, and Hi-T4 DNA ligase.
- the 3’ flap adaptor comprises a single stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the single stranded adaptor region comprises a universal sequence.
- the universal sequence is a sequence that is commonly used to generate sequence reads using a next generation sequencing platform.
- the structure-specific endonuclease that has 3’ flap cleavage activity is XPF/MUS81.
- the 5’ flap adaptor and/or the 3’ flap adaptor comprises a bar code sequence.
- the polynucleotides comprising 3’ and/or 5’ adaptors come from different genetic or polynucleotide sources and the source of the polynucleotides can be identified based upon the bar code sequence.
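Identifying the source of a polynucleotide from its bar code can be sketched as a simple lookup on sequencing reads. The example below assumes, purely for illustration, equal-length bar codes at the start of each read and exact matching; the bar code sequences, sample names, and the `demultiplex` function are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

# Hypothetical bar codes; the disclosure does not list specific sequences.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading bar code and strip the bar code off.

    Reads whose prefix matches no bar code are kept intact under 'unassigned'.
    """
    length = len(next(iter(barcodes)))  # all bar codes assumed equal length
    bins = defaultdict(list)
    for read in reads:
        source = barcodes.get(read[:length])
        if source is None:
            bins["unassigned"].append(read)
        else:
            bins[source].append(read[length:])
    return dict(bins)


reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNGGGAAA"]
assigned = demultiplex(reads)
```

Practical pipelines usually tolerate one or more mismatches in the bar code; exact matching is used here only to keep the sketch short.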
- the disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising: an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions; (b) annealing a replacement oligonucleotide that comprises one or more locked nucleic acids (LNAs) to the polynucleotide comprising the 5' adaptor; (c) extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide using a non-strand displacing polymerase and dNTPs, wherein the extended product comprises a binding region
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the non-transferred strand can be denatured and removed using mild heat followed by a hot wash.
- the replacement oligonucleotide has a higher Tm than the non-transferred strand.
- the replacement oligonucleotide partially hybridizes to the same region as the non-transferred strand, resulting in the 5' portion of the polynucleotide being single strand upstream of the replacement oligonucleotide.
- the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
- the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
- the Tn5 transposome is immobilized on a streptavidin paramagnetic bead.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a replacement oligonucleotide that comprises locked nucleic acids (LNAs) that remains hybridized to the adaptor under moderate denaturing conditions, and a non-transferred strand that can be removed under moderate denaturing conditions; (b) denaturing under moderate denaturing conditions to remove the non-transferred strand and extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide comprising LNAs using a non-strand displacing poly
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the non-transferred strand is from 15 bp to 20 bp in length.
- the non-transferred strand can be denatured and removed using mild heat followed by a hot wash, wherein the replacement oligonucleotide remains hybridized to the adaptor under these conditions.
- the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
- the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand which contains linkage(s) that resist exonuclease activity that remains hybridized to the adaptor strand when the adaptor strand is appended to the 5' end of the polynucleotide; (b) extending the polynucleotide comprising the 5' adaptor with a polymerase with 5’ to 3’ exonuclease activity (“5' exo polymerase”), wherein the 5' exo polymerase digests the hybridized non-appending strand up
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s), carbophosphonate linkage(s), pyridylphosphonate (PyrP) functionalized linkage(s), aminomethyl (AMP) or aminoethyl phosphonate (AEP) functionalized linkages, boranophosphate (BP) linkage(s), methylphosphonothioates (MPS) linkage(s), phosphorodithioates (SPS) linkage(s), thiophosphoramidates (NPS) linkage(s), boranomethylphosphonates (BMP) linkage(s), guanidine (GUA) linkage(s), morpholino phosphorodiamidate (PMO) linkage(s), and/or carbamate linkage(s).
- the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s).
- the polymerase with 5’ to 3’ exonuclease activity is selected from a Taq-based polymerase and a Bst-based polymerase.
- the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
- the Tn5 transposome is immobilized on a streptavidin paramagnetic bead.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor is appended to the 5' end of the polynucleotide, and wherein the 5' adaptor strand and the non-transferred strand comprise a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases; (b) extending the polynucleotide comprising the
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
- the dNTPs and polymerase are removed prior to step (c).
- the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead.
- the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase.
- the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor strand is appended to the 5' end of the polynucleotide, and wherein the 5' adaptor strand and the non-transferred strand comprise a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases; (b) extending the polynucleotide comprising: (a) appending an adapt
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
- the dNTPs and polymerase are removed prior to step (e).
- the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead.
- the non-transferred strand is removed using moderate heat to selectively denature the strand, or application of a lambda exonuclease that selectively digests oligonucleotides containing 5’ phosphorylated ends.
- the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase.
- the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the polynucleotides are used as templates for sequencing.
- the disclosure provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions, wherein the adaptor comprises a complementary template switch binding domain; (b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end; (c) denaturing and annealing to the template switch binding domain
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase.
- the polymerase that has 3' to 5' exonuclease activity is selected from a Pfu-based polymerase, a phi29-based polymerase, and E. coli DNA polymerase II.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions, wherein the adaptor comprises a complementary template switch binding domain; (b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end; (c) denaturing and annealing to the template switch binding
- the adaptor strand may or may not have nucleotide modifications.
- the adaptor strand comprises one or more LNAs.
- the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase.
- the structure-specific endonuclease is XPF/MUS81.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- the disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that comprises a single stranded 3’ end that has a sequence complementary to an oligonucleotide comprising a 3’ complementary adaptor sequence; (b) annealing an oligonucleotide comprising the 3’ complementary adaptor sequence to the non-transferred strand to form a polynucleotide comprising the 5’ adaptor, the non-transferred strand, and the oligonucleotide comprising the 3’ complementary
- the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
- the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
- the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
- the adaptor-polynucleotide constructs are used as templates for sequencing.
- Figure 1 provides an overview of steps typically used for ligation-based library preparation that does not use PCR.
- Figure 2 demonstrates the process in prokaryotes of Ku bridging DNA fragments and recruiting LigD for DNA repair.
- Figure 3 provides an exemplary embodiment of the disclosure showing steps of an improved ligation-mediated library prep of the disclosure, where prokaryotic NHEJ factors Ku and LigD replace T4 ligase.
- FIG. 4 provides an exemplary embodiment of the disclosure wherein terminal transferase (TdT) is used to generate poly-nucleotide overhangs that could be joined, trimmed, and ligated by LigD.
- Figure 5 provides a sampling of prokaryotic wild-type sequences for LigD (SEQ ID NOs: 1 to 20).
- Figure 6 provides a sampling of prokaryotic wild-type sequences for Ku (SEQ ID NOs: 21 to 30).
- FIG. 8 demonstrates how FEN1 plays a central role in DNA replication both in eukaryotes and prokaryotes.
- FEN1 functions to remove single stranded 5’ flaps of DNA from Okazaki fragments that are generated on the lagging strand of the DNA replication fork. These flaps form when a primase generates an RNA primer that serves as a primer to extend a new DNA strand; multiple Okazaki fragments are generated, and when extension from the 3’ end of one fragment reaches the 5’ end of another, it displaces that 5’ end to form a flap structure.
- Figure 9 demonstrates how FEN1 binds to a 5’ flap and cleaves it at its base to leave a nick in the DNA.
- Figure 10 shows how a ligase seals the nick in DNA, thus generating a contiguous new long strand from the initial Okazaki fragments.
- Figure 11 presents the preferred substrate for FEN1, which is a double-flap structure in which the 5’ end of one strand and the 3’ end of the abutting strand overlap so that both ends form flaps, with the 3’ flap being a single nucleotide long.
- Figure 12 shows an embodiment of an adaptor that has a specific structure which is designed to work with FEN1.
- the adaptor comprises two oligonucleotides that when annealed together form a partially double stranded molecule.
- a ‘probe’ portion of this adaptor is single stranded and complementary to a target of interest in a genome.
- the double stranded portion of the adaptor comprises a universal sequence that can be, for instance, the sequences of adaptors for a DNA sequence platform. The last base-pair next to the single stranded probe portion may also match the target in the genome DNA.
- Figure 13 shows that when the adaptor of FIG. 12 is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms.
- Figure 14 demonstrates that the structure of FIG. 13 comprising a 5’ flap from the target DNA and a single nucleotide 3’ flap from the adaptor is a substrate for FEN1, which can cleave, leaving a nick that can be subsequently joined by a ligase. The result is an addition of an adaptor to the target DNA.
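The hybridize, cleave, and ligate sequence described for FIGS. 12 to 14 can be pictured with a toy string model. The sketch below is a deliberate simplification: it treats hybridization as an exact reverse-complement match, ignores the single nucleotide 3’ flap and the enzymology of FEN1 and the ligase, and uses a made-up target and function name (`append_5prime_adaptor`). Only the P5 sequence is taken from the disclosure, here standing in for the universal sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def append_5prime_adaptor(polynucleotide: str, probe: str, universal: str):
    """Toy model of 5' flap adaptor addition on a single stranded target.

    Returns (flap, ligated_product), or None if the probe finds no site.
    All sequences are written 5' to 3'.
    """
    site = reverse_complement(probe)     # region of the target the probe anneals to
    start = polynucleotide.find(site)
    if start == -1:
        return None                      # no hybridization, so no flap forms
    flap = polynucleotide[:start]        # unpaired 5' portion, removed by cleavage
    ligated = universal + polynucleotide[start:]  # nick sealed by ligation
    return flap, ligated


# Hypothetical single stranded target: 5' flap + probe site + downstream sequence.
target = "TTTTT" + "GATTACAGGC" + "CCATGG"
probe = reverse_complement("GATTACAGGC")
universal = "AATGATACGGCGACCACCGA"       # P5 used as the universal sequence
result = append_5prime_adaptor(target, probe, universal)
```

In this model the product carries the universal sequence directly upstream of the probe site, which mirrors the stated outcome that an adaptor is added to the target DNA.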
- FIGS. 15A-15B provide illustrations of structures utilized by flap endonucleases in the XPF/MUS81 family of proteins. These flap endonucleases play a role in damage repair caused by UV-light or DNA cross-linking.
- FIG. 15A Target structures for the XPF Flap endonuclease which comprise a fork in a DNA structure where the two branches of the fork comprise noncomplementary sequences.
- FIG. 15B The branches can be partially or fully double stranded or contiguous as in the example of a hairpin loop.
- Figure 16 demonstrates that cleavage by XPF/MUS81 flap endonuclease occurs within a few bases of the commencement of the 3’ flap, generating a nick.
- Figure 17 provides an embodiment of an exemplary adaptor having a specific structure that can be used with an XPF/MUS81 3’ flap endonuclease.
- the adaptor comprises, at a minimum, a single oligonucleotide comprising a 3’ sequence complementary to a target of interest in a genome and a 5’ universal sequence that can be, for instance, used with massively parallel sequencing platforms.
- FIG. 18 demonstrates that when the adaptor of FIG. 17 is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms that is a substrate for a XPF/MUS81 3’ flap endonuclease.
- Figure 19 shows that the flap structure formed in FIG. 18 can be cleaved by a XPF/MUS81 3’ flap endonuclease, leaving a nick that when extended with a polymerase copies the universal adaptor sequence to the target DNA.
- Figure 20 demonstrates embodiments in which, when an adaptor for FEN1 endonuclease complementary to a target #1 and an adaptor for XPF/MUS81 endonuclease complementary to a target #2, as shown in FIG. 13 and FIG. 18, are hybridized to DNA (for example, genomic DNA), a structure forms that contains a 5’ flap and a 3’ flap.
- Figure 21 demonstrates that when FEN1 and a ligase are added to the adaptors of FIG. 20, adaptor #1 will be appended to the 5’ end of the DNA of target #1.
- Figure 22 demonstrates that when XPF/MUS81 endonuclease and a polymerase are added to the ligated adaptor/target of FIG. 21, the adaptor target will be copied and adaptor #2 will be appended to the 3’ end of the DNA of target #2.
- Figure 23 demonstrates the possibility that achieving a double-flap structure may require sequential annealing of the individual oligos that comprise the FEN1 structure, such that a longer oligo that contains the probe #1 sequence anneals first to the target #1 followed by annealing of the shorter oligo complementary to the universal sequence of the adaptor.
- Such differential hybridization can be achieved by methods known to those skilled in the art, for example through design of the probe sequence and the universal sequences with different melting temperatures (Tm).
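As a rough illustration of designing oligos with different Tm values, the sketch below estimates melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C). The rule is only a coarse approximation intended for short oligos, and both sequences are hypothetical placeholders rather than sequences from the disclosure.

```python
def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Hypothetical sequences chosen only for illustration.
probe = "GCGTACCGGATCCGTTAGCCGTACGG"  # longer, target-specific probe portion
universal = "ACTGATTCAGTA"            # shorter oligo for the universal portion

probe_tm = wallace_tm(probe)
universal_tm = wallace_tm(universal)

# The oligo with the higher estimated Tm is expected to anneal first
# as the reaction cools from a denaturing temperature.
anneals_first = "probe" if probe_tm > universal_tm else "universal"
print(probe_tm, universal_tm, anneals_first)
```

For real designs a nearest-neighbor thermodynamic model with salt correction would be a more appropriate choice than this rule of thumb.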
- Figure 24 demonstrates that the methods of the disclosure which utilize Flap endonucleases can be multiplexed to include many targets.
- Figure 25 demonstrates a tagmentation process to append adaptors to 5' ends of polynucleotide fragments.
- the transposase enzyme Tn5 fragments polynucleotides and simultaneously appends adaptor sequences to the 5’ ends of the resulting polynucleotide fragments.
- Figure 26 demonstrates a process to append adaptors to the 3' ends of tagmented polynucleotides.
- the free 3’ end of polynucleotide fragments can be extended in the presence of a polymerase and dNTPs. Either heat (e.g., > 68°C), or use of a polymerase with strand displacement activity can be used to remove the ‘non-transferred’ strand of the fragment.
- the complement of the 5’ adaptor polynucleotide fragment is copied, and finally a PCR reaction with two distinct primers (e.g., P5-i5-A14 and P7-i7-B15) can be used to enrich for those dsDNA PCR products that have a 5' based adaptor on one end and a 3' based adaptor on the other end.
- Figure 27 demonstrates an alternate process to append adaptors to the 3' ends of tagmented polynucleotides.
- a single double-stranded ‘forked’ adaptor is employed in the transposome and a non-displacing polymerase is used at a temperature below the Tm of the nontransferred strand (e.g., ⁇ 55°C) to extend the free 3’ ends of the fragment until it reaches the 5’ end of the ‘non-transferred’ adaptor strand and then a ligase covalently connects the nontransferred strand to the fragment.
- FIG. 28A: DNA is tagmented with a transposome that may or may not have modifications to the transferred strand.
- the tagmented library is treated to de-anneal and remove the non-transferred strand of the transposome.
- a replacement oligo that has a higher Tm than the non-transferred strand is hybridized back to the appended adaptors. The replacement oligo does not hybridize in place of all the non-transferred strand, but instead does so partially.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
- FIG. 28B: A transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand.
- the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases). Following tagmentation and removal of the Tn5, moderate heat can be used to denature the non-transferred strand, leaving the ‘LNA containing replacement oligo’ still annealed.
- a non-displacing polymerase and polymerase reagents can be used in the foregoing denaturing step, or alternatively, in a separate step, to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the non-transferred strand but stopping when it reaches the hybridized replacement oligo.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein).
- FIG. 29 presents an example of data generated using the methods embodied in FIG. 28.
- a transposome was constructed that contained LNA modifications in its transferred strand and was used to tagment genomic DNA.
- the transposome was immobilized on a streptavidin paramagnetic bead via a 3’ biotin group on an ‘anchor’ oligo.
- the tagmentation was conducted in the presence of a ligase enzyme and an Illumina™ Indexing primer P5-i5-A14.
- This primer hybridized 5’ of the transferred strand and was ligated to it by virtue of the 5’ end of the transferred strand bearing a phosphate moiety by design.
- the Tn5 protein was then removed by denaturing it with a solution comprising the anionic detergent sodium dodecyl sulfate (SDS). Different SDS concentrations (%) were tested to effect complete removal of the Tn5 protein.
- a mixture of a non-displacing polymerase (tTaq608), dNTPs, Q5 polymerase and a template switching oligo was added and incubated at 47 °C to de-anneal the non-transferred strand and extend with tTaq608 pol as far as the anchor oligo.
- the temperature was then raised further (60-70 °C) to the point where the templates were rendered single stranded and no longer attached to the beads.
- the temperature was then lowered to 42 °C for 1 min to allow the template switch oligo containing the P7-i7-B15 sequences to hybridize.
- Figure 30 demonstrates additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo.
- a transposome is employed that contains one or more internal modifications in the non-transferred strand that prevents exonuclease digestion, for example a phosphorothioate linkage in the phosphodiester backbone of the oligo.
- a polymerase with a 5’ to 3’ exonuclease activity is employed to extend from the free 3’ end of the insert.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein).
- Figures 31A-31B demonstrate additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo.
- a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand.
- the polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead).
- a fresh aliquot of a strand displacing polymerase is added with just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end through the extension step described herein and comprising a sequence (e.g., 5’CTGTCTCTT3’ (SEQ ID NO:32)).
- if the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
- a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand.
- the polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead).
- the non-transferred strand is then removed, e.g., by moderate heat to selectively denature the strand, or by application of a lambda exonuclease that selectively digests oligos containing 5’ phosphorylated ends (as is the case with non-transferred strands), or by other means known to those skilled in the art.
- a fresh aliquot of a polymerase is added and just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end through the extension step described herein and comprising a sequence (e.g., 5’CTGTCTCTT3’ (SEQ ID NO:32)).
- if the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
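The effect of withholding dATP in the extension steps of FIGs. 31A-31B can be sketched in a few lines. The example assumes that the template being copied is the commonly published 19-base Tn5 mosaic-end transferred strand (5’-AGATGTGTATAAGAGACAG-3’), which is not spelled out in this passage. Under that assumption, extension stalls at the first position that requires dATP, leaving the nine-base partial adaptor named above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def extend(template_5to3, available_dntps):
    """Copy a template strand until a required nucleotide is unavailable.

    The polymerase reads the template 3'->5', so the template is walked in
    reverse; the newly synthesized strand is returned 5'->3'.
    """
    product = []
    for base in reversed(template_5to3.upper()):
        needed = COMPLEMENT[base]
        if needed not in available_dntps:
            break  # extension stalls when the needed dNTP is absent
        product.append(needed)
    return "".join(product)

# Assumption: standard Tn5 mosaic-end transferred strand, 5'->3'.
ME_TRANSFERRED = "AGATGTGTATAAGAGACAG"

full_copy = extend(ME_TRANSFERRED, {"A", "C", "G", "T"})
partial_copy = extend(ME_TRANSFERRED, {"C", "G", "T"})  # dATP absent

print(full_copy)     # complement of the whole mosaic end
print(partial_copy)  # stalls before the first A
```

With all four dNTPs the copy runs through the whole mosaic end, whereas omitting dATP yields only the CTGTCTCTT portion, consistent with the partial 3’ adaptor of SEQ ID NO:32.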
- Figure 32 demonstrates an embodiment of the disclosure demonstrating how an adaptor can be appended to the 3’ end following extension by hybridizing an oligo that is partially complementary to the 5’ adaptor but contains further sequences that are unique and not present in the 5’ adaptor.
- a ligation reaction covalently joins this oligo to the 3’ end of the insert and forms a ‘Y’ shaped adaptor construct.
- FIG. 33A demonstrates additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo, and data generated therefrom.
- a single transposome type comprising a P5 transferred-strand and a short non-transferred-strand is used to tagment DNA.
- a strand displacing polymerase is added to extend from the free 3’ end, create the complement of the P5 transferred-strand (i.e., creates P5’), and displace the short non-transferred strand.
- the temperature is then elevated to make the fragments single stranded.
- a P7 template switch oligo hybridizes forming a forked structure that is partially double-stranded. Then in the presence of a polymerase with 3’ exo activity, the single stranded 3’ end is degraded by this activity until there is no longer any single stranded 3’ end. The remaining 3’ end of the fragment then forms a primer template that extends and creates the complement of the P7 template switch oligo.
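The exonuclease-to-extension switch described above can be modelled with toy strings. This sketch assumes that the 3’ portion of the template switch oligo (here called the anchor) anneals to a site just upstream of the fragment's single-stranded 3’ tail; the passage does not state where the oligo hybridizes, and all sequences and names below are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def fork_switch(fragment, tso, anchor_len):
    """Trim the unpaired 3' tail of `fragment`, then copy the 5' part of `tso`.

    `fragment` and `tso` are written 5'->3'. The last `anchor_len` bases of
    `tso` are assumed to anneal to their reverse complement in `fragment`.
    """
    site = revcomp(tso[-anchor_len:])
    idx = fragment.rfind(site)
    if idx == -1:
        raise ValueError("template switch oligo does not anneal to the fragment")
    # 3' exonuclease activity removes the single-stranded tail up to the duplex.
    trimmed = fragment[: idx + len(site)]
    # The recessed 3' end then primes copying of the oligo's 5' portion.
    return trimmed + revcomp(tso[:-anchor_len])

# Toy sequences only; they stand in for insert, shared site, and P5' tail.
fragment = "ACGTTGCA" + "GACTCAGG" + "TTTAAACC"
tso = "GATTACAG" + "CCTGAGTC"  # 5' portion to be copied + 3' anchor

product = fork_switch(fragment, tso, anchor_len=8)
print(product)
```

In this toy run the eight-base 3’ tail is lost and replaced by the complement of the oligo's 5’ portion, mirroring the replacement of the P5’ end with P7’.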
- FIG. 33B: Experiments that demonstrate a ‘proof of concept’ of the ‘fork’-modulated switch in activity from exonuclease to extension activity. Simple P5 transposomes were immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA.
- the template switch oligo comprised either a free extendable -OH group at its 3’ end or a non-extendable blocked dideoxyC group at its 3’ end, or a non-extendable ‘inverted T’ blocking group at its 3’ end.
- the P5’ end of the template is digested and replaced with the P7’.
- FIG. 33C: Images of gel electrophoresis indicating that neither of the two 3’-blocked template switch oligos was consumed, indicating that the block is effective in preventing extension from the 3’ end of the template switch oligo, which would result in creating a copy of the entire template.
- the unblocked template switch oligo produced 1.5x as much product as a result of two mechanisms: (i) ‘fork’-modulated switch in activity from exonuclease to extension activity to append the P7’ adaptor to the 3’ end of the template, and (ii) extension from the 3’ end of the template switch oligo to append a copy of the template to the P7 template switch oligo.
- FIG. 33E: A simple P5 transposome was immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA.
- Figure 34 demonstrates additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo.
- a structure-specific endonuclease (e.g., XPF/MUS81) can be employed as an alternative to exonuclease degradation of the 3’ single-stranded end of the forked structure following hybridization of the template switch oligo.
- the endonuclease nicks the double stranded region of the duplex creating a free 3’ end that a polymerase extends and creates the complement of the P7 template switch oligo.
- the term “library” merely refers to a collection or plurality of template molecules, which at their 5' and 3' ends typically comprise added on adaptor sequences.
- Use of the term “library” to refer to a collection or plurality of template molecules should not be taken to imply that the templates making up the library are derived from a particular source, or that the “library” has a particular composition.
- use of the term “library” should not be taken to imply that the individual templates within the library must be of different nucleotide sequence or that the templates be related in terms of sequence and/or source.
- the disclosure encompasses formation of so-called “monotemplate” libraries, which comprise multiple copies of a single type of template molecule, each having added on adaptor sequences at their 5' ends and their 3' ends, as well as “complex” libraries wherein many, if not all, of the individual template molecules comprise different target sequences (as defined below), where each template molecule has added on adaptor sequences at their 5' ends and their 3' ends.
- complex template libraries may be prepared using the method of the disclosure starting from a complex mixture of target polynucleotides such as (but not limited to) random genomic DNA fragments, cDNA libraries etc.
- the disclosure also extends to “complex” libraries formed by mixing together several individual “monotemplate” libraries, each of which has been prepared separately using the method of the disclosure starting from a single type of target molecule (i.e., a monotemplate).
- more than 50%, or more than 60%, or more than 70%, or more than 80%, or more than 90%, or more than 95% of the individual polynucleotide templates in a complex library may comprise different target sequences.
- use of the term “template” to refer to individual polynucleotide molecules in the library merely indicates that one or both strands of the polynucleotides in the library are capable of acting as templates for template-dependent nucleic-acid polymerization catalyzed by a polymerase. Use of this term should not be taken as limiting the scope of the invention to libraries of polynucleotides which are actually used as templates in a subsequent enzyme-catalyzed polymerization reaction.
- the term “unmatched region” refers to a region of the adaptor wherein the sequences of the two polynucleotide strands forming the adaptor exhibit a degree of noncomplementarity such that the two strands are not capable of annealing to each other under standard annealing conditions for a primer extension or PCR reaction.
- the two strands in the unmatched region may exhibit some degree of annealing under standard reaction conditions for an enzyme-catalyzed ligation reaction, provided that the two strands revert to single stranded form under annealing conditions.
- a DNA sequencing library is generally formed by ligating adaptor polynucleotide molecules to the 5' and 3' ends of one or more target polynucleotide duplexes (which may be of known, partially known or unknown sequence) to form adaptor-target constructs and then carrying out an initial primer extension reaction in which extension products complementary to both strands of each individual adaptor-target construct are formed.
- the resulting primer extension products, and optionally amplified copies thereof, collectively provide a library of template polynucleotides.
- the library of template polynucleotides can then be sequenced using next generation sequencing. To save resources, multiple libraries can be pooled together and sequenced in the same run — a process known as multiplexing.
- unique index sequences, or “barcodes,” can be added to each library. These barcodes are used to distinguish between the libraries during data analysis.
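A minimal sketch of how index sequences let pooled libraries be separated during data analysis is shown below. It assumes exact barcode matching and uses hypothetical barcodes and reads; production demultiplexers typically also tolerate sequencing errors in the index read.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_library):
    """Group (index_read, insert_read) pairs by exact barcode match."""
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads whose index matches no known barcode are set aside.
        library = barcode_to_library.get(index_seq, "undetermined")
        bins[library].append(insert_seq)
    return dict(bins)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "library_1", "TGCATG": "library_2"}
pooled_reads = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("ACGTAC", "TTAGGCA"),
    ("NNNNNN", "AAAAAAA"),
]

by_library = demultiplex(pooled_reads, barcodes)
print(by_library)
```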
- the ends of the amplification products may differ somewhat from the products of the initial primer extension reaction, since the former will be determined in part by the sequence of the PCR primer used to prime synthesis of a polynucleotide strand complementary to the initial primer extension product, whereas the latter will be determined solely by copying of the adaptor sequences at the 3' ends of the adaptor-template constructs in the initial primer extension.
- the disclosure provides methods that utilize non-homologous end joining factors (nhEJF) to append adaptors to polynucleotides.
- the nhEJF adaptors added onto the double stranded polynucleotides typically comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch.
- the nhEJF adaptors have a Y-shape, where the region of sequence mismatch causes the arms of the adaptor to separate from each other.
- the “double-stranded region” of the nhEJF adaptor is a short double-stranded region, typically comprising 5 or more consecutive base pairs, formed by annealing of the two partially complementary oligonucleotide strands. This term simply refers to a double-stranded region of nucleic acid in which the two strands are annealed and does not imply any particular structural conformation.
- the nhEJF adaptors, instead of having a Y-shaped structure, are U-shaped, such that once the nhEJF adaptors are added to the ends of polynucleotides using nhEJFs in methods of the disclosure, they form a continuous loop at the 5’ and 3’ ends of the templates.
- the resulting polynucleotides comprising the 5’ and 3’ adaptors can be amplified using rolling circle amplification.
- it is advantageous for the double-stranded region of the nhEJF adaptors to be as short as possible without loss of function.
- by “function” in this context is meant that the double-stranded region forms a stable duplex under reaction conditions for the nhEJFs described herein, such that the two strands forming the nhEJF adaptor remain partially annealed during ligation of the nhEJF adaptor to a polynucleotide. It is not absolutely necessary for the double-stranded region to be stable under the conditions typically used in the annealing steps of primer extension or PCR reactions.
- nhEJF adaptors are added to both ends of each polynucleotide.
- the resulting polynucleotides will be flanked by complementary sequences derived from the double-stranded region of the nhEJF adaptors.
- the longer the double-stranded region (i.e., the complementary sequences of the adaptor-polynucleotide constructs), the greater the possibility that the adaptor-polynucleotide construct is able to fold back and base-pair to itself in these regions of internal self-complementarity when annealed for primer extension and/or PCR.
- the double-stranded region of the nhEJF adaptors comprises 5 base pairs (bps), 6 bps, 7 bps, 8 bps, 9 bps, 10 bps, 11 bps, 12 bps, 13 bps, 14 bps, 15 bps, 16 bps, 17 bps, 18 bps, 19 bps, 20 bps, or a range that includes or is between any two of the foregoing bps.
- the stability of the double-stranded region of the nhEJF adaptor may be increased, and hence its length potentially reduced, by the inclusion of non-natural nucleotides which exhibit stronger base-pairing than standard Watson-Crick base pairs.
- two strands of a nhEJF adaptor comprise base pairs that are 100% complementary to a sequence of the polynucleotide. It will be appreciated, however, that one or more nucleotide mismatches may be tolerated within the double-stranded region of the nhEJF adaptor, provided that the two strands are capable of forming a stable duplex under standard ligation conditions.
- the nhEJF adaptors added onto the double stranded templates using the non-homologous end joining factors in methods of the disclosure comprise double stranded complementary sequences.
- the resulting adaptor/template molecules can then be amplified by PCR to form the DNA library templates.
- a splint oligonucleotide can be used to join the ends of polynucleotides comprising adaptors to form a circle.
- An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.
- nhEJF adaptors for use in the methods disclosed herein will generally include a double-stranded region adjacent to the “ligatable” end of the nhEJF adaptor, i.e., the end that is joined to a target polynucleotide using the non-homologous end joining factors in methods of the disclosure.
- the ligatable end of a nhEJF adaptor may be blunt or, in other embodiments, short 5’ or 3' overhangs of one or more nucleotides may be present to facilitate/promote ligation.
- the 5' terminal nucleotide at the ligatable end of the nhEJF adaptor should be phosphorylated to enable phosphodiester linkage to a 3' hydroxyl group on the target polynucleotide.
- Different annealing conditions may be used for a single primer extension reaction not forming part of a PCR reaction (again see Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 3rd Ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Current Protocols, eds Ausubel et al.).
- Conditions for primer annealing in a single primer extension include, for example, exposure to a temperature in the range of from 30 to 37° C. in standard primer extension buffer. It will be appreciated that different enzymes, and hence different reaction buffers, may be used for a single primer extension reaction as opposed to a PCR reaction. There is no requirement to use a thermostable polymerase for a single primer extension reaction.
- the nhEJF adaptors comprise a double stranded region and an unmatched region.
- the lower limit on the length of the unmatched region will typically be determined by function, for example the need to provide a suitable sequence for binding of a primer for primer extension, PCR and/or sequencing.
- the length of unmatched region in each strand should be 20 nucleotides (nts), 25 nts, 30 nts, 35 nts, 40 nts, 45 nts, 50 nts in length, or have a range of lengths that includes or is between any two of the foregoing nucleotide lengths.
- the overall length of the two strands forming a nhEJF adaptor will typically be 25 nts, 30 nts, 35 nts, 40 nts, 45 nts, 50 nts, 55 nts, 60 nts, 65 nts, 70 nts, 75 nts, 80 nts, 85 nts, 90 nts, 95 nts, 100 nts, 105 nts, 110 nts, 115 nts, 120 nts, 125 nts, 130 nts, 135 nts, 140 nts, 145 nts, 150 nts, or a range that is between or includes any two of foregoing nucleotide lengths.
- the portions of the two strands forming the unmatched region of a nhEJF adaptor should preferably be of similar length, although this is not absolutely essential, provided that the length of each portion is sufficient to fulfil its desired function (e.g., primer binding). It has been shown by experiment that the portions of the two strands forming the unmatched region of a nhEJF may differ by up to 25 nucleotides without unduly affecting adaptor function.
- portions of the two polynucleotide strands forming an unmatched region of a nhEJF adaptor will be completely mismatched, or 100% non-complementary.
- some sequence “matches”, i.e., a lesser degree of non-complementarity, may be tolerated in this region without affecting function to a material extent.
- the extent of sequence mismatching or non-complementarity is such that the two strands in the unmatched region remain in single-stranded form under annealing conditions as defined above.
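One simple way to gauge how mismatched two adaptor arms are is an ungapped, position-by-position comparison outward from the duplex junction, sketched below. This is an illustrative check only: it does not model the thermodynamics that actually determine whether the arms stay single stranded, and the arm sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def fraction_complementary(arm_a, arm_b):
    """Fraction of facing positions in two adaptor arms that could base-pair.

    `arm_a` is the arm running 5'->3' into the double-stranded region and
    `arm_b` is the arm running 5'->3' out of it on the other strand, so the
    last base of `arm_a` faces the first base of `arm_b`. Positions are
    compared outward from that junction; any excess length is ignored.
    """
    pairs = list(zip(reversed(arm_a.upper()), arm_b.upper()))
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if COMPLEMENT[a] == b)
    return matches / len(pairs)

# Hypothetical arms for illustration.
print(fraction_complementary("AAAA", "AAAA"))  # completely mismatched
print(fraction_complementary("AACC", "TTTT"))  # half of the positions pair
```

A value of 0.0 corresponds to the completely mismatched case, while small non-zero values correspond to the tolerated sequence “matches” described above.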
- the precise nucleotide sequence of the nhEJF adaptors is generally not material to the disclosure and may be selected by the user such that the desired sequence elements are ultimately included in the common sequences of the library of polynucleotides comprising the adaptors, e.g., to provide binding sites for particular sets of universal amplification primers and/or sequencing primers (e.g., P7 or P5 primers). Additional sequence elements may be included, for example to provide binding sites for sequencing primers which will ultimately be used in sequencing of template molecules in the library, or products derived from amplification of the template library, for example on a solid support.
- the nhEJF adaptors may further include “bar code” sequences, which can be used to bar code polynucleotides derived from a particular source.
- the sequences of the individual strands in the unmatched region should be such that neither individual strand exhibits any internal self-complementarity which could lead to self-annealing, formation of hairpin structures, etc., under standard annealing conditions. Self-annealing of a strand in the unmatched region is to be avoided as it may prevent or reduce specific binding of an amplification primer to this strand.
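A candidate strand can be screened for internal self-complementarity by searching for short stems whose reverse complement occurs further downstream. The sketch below is a naive exact-match scan; the stem length, minimum loop size, and test sequences are illustrative assumptions, and dedicated secondary-structure tools would give a more reliable answer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def find_hairpin_stems(seq, stem_len=4, min_loop=3):
    """Return (stem_start, partner_start, stem) for each potential hairpin stem.

    A hit means the stem at `stem_start` has an exact reverse complement
    starting at `partner_start`, at least `min_loop` bases downstream.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem_len + 1):
        stem = seq[i : i + stem_len]
        partner = revcomp(stem)
        j = seq.find(partner, i + stem_len + min_loop)
        while j != -1:
            hits.append((i, j, stem))
            j = seq.find(partner, j + 1)
    return hits

# Hypothetical strands for illustration.
print(find_hairpin_stems("GGGGAAAACCCC"))  # one potential stem
print(find_hairpin_stems("AAAAAAAAAAAA"))  # no self-complementarity
```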
- nhEJF adaptors are preferably formed from two strands of DNA, but may include mixtures of natural and non-natural nucleotides (e.g., one or more ribonucleotides) linked by a mixture of phosphodiester and non-phosphodiester backbone linkages.
- Other non-nucleotide modifications may be included such as, for example, biotin moieties, blocking groups and capture moieties for attachment to a solid surface, as discussed in further detail below.
- polynucleotides to which the adaptors are appended may be polynucleotides that can be used with additional methodologies, including amplification by solid-phase PCR, next generation sequencing, subcloning, etc.
- Polynucleotides to which nhEJF adaptors are appended may originate in double-stranded DNA form (e.g., genomic DNA fragments) or may have originated in single-stranded form, as DNA or RNA, and been converted to dsDNA form prior to ligation.
- mRNA molecules may be copied into double-stranded cDNAs suitable for use with nhEJF adaptors disclosed herein.
- The precise sequence of the polynucleotides is generally not material to the disclosure, and may be known or unknown. Modified polynucleotides including polynucleotides comprising non-natural nucleotides and/or non-natural backbone linkages could also be utilized in the methods of the disclosure, provided that the modifications do not preclude adding on nhEJF adaptors and/or copying in a primer extension reaction.
- the non-homologous end joining factors and methods of the disclosure can be used with a single polynucleotide, or can be used with a mixture or plurality of polynucleotides.
- the non-homologous end joining factors in the methods of the disclosure may be used with multiple copies of the same polynucleotides (i.e., monotemplates) or with mixtures of different polynucleotides.
- the polynucleotides may differ from each other with respect to nucleotide sequence over the full length of the polynucleotide or only a part of the polynucleotide.
- a nhEJF-based method disclosed herein may be applied to a plurality of polynucleotides derived from a common source, for example a library of genomic DNA fragments derived from a particular individual.
- the target polynucleotides will comprise random fragments of human genomic DNA.
- the fragments may be derived from a whole genome or from part of a genome (e.g., a single chromosome or sub-fraction thereof), and from one individual or several individuals.
- the polynucleotides may be treated chemically or enzymatically either prior to, or subsequent to the ligation of the nhEJF adaptor sequences. Techniques for fragmentation of genomic DNA include, for example, enzymatic digestion or mechanical shearing.
- “Ligation” of nhEJF adaptors to 5' and 3' ends of each polynucleotide involves joining of the two polynucleotide strands of the nhEJF adaptor to a double-stranded target polynucleotide such that covalent linkages are formed between both strands of the two double-stranded molecules.
- “joining” means covalent linkage of two polynucleotide strands which were not previously covalently linked.
- such “joining” will take place by formation of a phosphodiester linkage between the two polynucleotide strands but other means of covalent linkage (e.g., non-phosphodiester backbone linkages) may be used.
- covalent linkages formed in the ligation reactions should allow for read-through of a polymerase, such that the resultant construct can be copied in a primer extension reaction using primers which bind to sequences in the regions of the adaptor-target construct that are derived from the nhEJF adaptor molecules.
- the ligation reactions will typically be enzyme-catalyzed.
- the ligation reactions will be by the non-homologous end joining factors of the disclosure.
- Non-enzymatic ligation techniques (e.g., chemical ligation) may also be used, provided that the non-enzymatic ligation leads to the formation of a covalent linkage which allows read-through of a polymerase, such that the resultant construct can be copied in a primer extension reaction.
- the desired products of the ligation reaction are adaptor-target constructs in which nhEJF adaptors are ligated at both ends of each target polynucleotide, given the structure adaptor-polynucleotide-adaptor.
- Conditions of the ligation reaction should therefore be optimized to maximize the formation of this product, in preference to targets having an adaptor at one end only.
- the products of the ligation reaction may be subjected to purification steps in order to remove unbound nhEJF adaptor molecules before the adaptor-polynucleotide constructs are processed further. Any suitable technique may be used to remove excess unbound nhEJF adaptors, examples of which will be described in further detail below.
- Adaptor-polynucleotide constructs formed in the ligation reaction as discussed above are then subject to an initial primer extension reaction in which a primer oligonucleotide is annealed to an adaptor portion of each of the adaptor-polynucleotide constructs and extended by sequential addition of nucleotides to the free 3' hydroxyl end of the primer to form extension products complementary to at least one strand of each of the adaptor-target constructs.
- the term “initial” primer extension reaction refers to a primer extension reaction in which primers are annealed directly to the adaptor-polynucleotide constructs, as opposed to either complementary strands formed by primer extension using the adaptor-polynucleotide construct as a template or amplified copies of the adaptor-polynucleotide construct.
- the initial primer extension reaction is carried out using a “universal” primer which binds specifically to a cognate sequence within an adaptor portion of the adaptor-polynucleotide construct, and is not carried out using a target-specific primer or a mixture of random primers.
- the use of an adaptor-specific primer for the initial primer extension reaction is key to formation of a library of polynucleotides which have common sequence at the 5' end and common sequence at the 3' end.
- the primers used for the initial primer extension reaction will be capable of annealing to each individual strand of adaptor-polynucleotide constructs having adaptors ligated at both ends, and can be extended so as to obtain two separate primer extension products, one complementary to each strand of the construct.
- the initial primer extension reaction will result in formation of primer extension products complementary to each strand of each adaptor-target construct.
- the primer used in the initial primer extension reaction will anneal to a primer-binding sequence (in one strand) in the unmatched region of the adaptor.
- annealing refers to sequence-specific binding/hybridization of the primer to a primer-binding sequence in an adaptor region of the adaptor-target construct under the conditions to be used for the primer annealing step of the initial primer extension reaction.
- the products of the primer extension reaction may be subjected to standard denaturing conditions in order to separate the extension products from strands of the adaptor-polynucleotide constructs.
- the strands of the adaptor-polynucleotide constructs may be removed at this stage.
- the extension products (with or without the original strands of the adaptor-target constructs) collectively form a library of template polynucleotides which can be used, e.g., as templates for solid-phase PCR.
- the initial primer extension reaction may be repeated one or more times, through rounds of primer annealing, extension and denaturation, in order to form multiple copies of the same extension products complementary to the adaptor-target constructs.
- the initial extension products may be amplified by conventional solution-phase PCR, as described in further detail below.
- the products of such further PCR amplification may be collected to form a library of templates comprising “amplification products derived from” the initial primer extension products.
- both primers used for further PCR amplification will anneal to different primer-binding sequences on opposite strands in the unmatched region of the adaptor.
- Other embodiments may, however, be based on the use of a single type of amplification primer which anneals to a primer-binding sequence in the double-stranded region of the adaptor.
- the “initial” primer extension reaction occurs in the first cycle of PCR.
- inclusion of an initial primer extension step (and optionally further rounds of PCR amplification) to form complementary copies of the adaptor-target constructs is advantageous for several reasons. Firstly, inclusion of the primer extension step, and subsequent PCR amplification, acts as an enrichment step to select for adaptor-target constructs with adaptors ligated at both ends. Only target constructs with adaptors ligated at both ends provide effective templates for whole genome or solid-phase PCR using common or universal primers specific for primer-binding sequences in the adaptors, hence it is advantageous to produce a template library comprising only double-ligated targets prior to solid-phase or whole genome amplification.
- the method disclosed herein to make a template library is PCR-free.
- By being PCR-free, the method reduces the library bias and coverage gaps that arise from preferential enrichment of certain adaptor/template constructs over others. The result is high data quality and optimal variant detection across the genome.
- inclusion of the initial primer extension step, and subsequent PCR amplification permits the length of the common sequences at the 5' and 3' ends of the target to be increased prior to solid-phase PCR or sequencing.
- Inclusion of the primer extension (and subsequent amplification) steps means that the length of the common sequences at one (or both) ends of the polynucleotides in the template library can be increased after ligation by inclusion of additional sequence at the 5' ends of the primers used for primer extension (and subsequent amplification).
- the use of such “tailed” primers is described in further detail below.
- FIG. 1 illustrates a process standardly used to generate a template library for sequencing.
- Next generation sequencing typically requires library preparation, where known adaptor DNA sequences are added to the target DNA to be sequenced. Traditionally, this requires that sample DNA is fragmented, end-repaired, and then ligated to the adaptor DNA (e.g., see FIG. 1).
- This library preparation is common to all major sequencing platforms, including those from Illumina™, Pacific Biosciences™, and Oxford Nanopore™.
- ligation-mediated library prep is currently the only option for library preparation.
- the starting DNA is fragmented, and the fragments purified.
- An end repair reaction is then performed with T4 Polynucleotide Kinase, rATP, and T4 DNA polymerase, dNTP, to form blunt ended double stranded templates.
- an A-tailing reaction is performed with Klenow exo-, dNTP.
- the adaptor is formed by annealing two single-stranded oligonucleotides prepared by conventional automated oligonucleotide synthesis.
- the oligonucleotides are partially complementary such that the 3' end of a first oligonucleotide is complementary to the 5' end of a second oligonucleotide.
- the 5' end of the first oligonucleotide and the 3' end of second oligonucleotide are not complementary to each other.
- the resulting structure is double stranded at one end (the double-stranded region) and single stranded at the other end (the unmatched region) and is referred to herein as a “Y-shaped adaptor” (see FIG. 1).
- the double-stranded region of the Y-shaped adaptor may be blunt-ended (see FIG. 1) or it may have an overhang. In the latter case, the overhang may be a 3' overhang or a 5' overhang, and may comprise a single nucleotide or more than one nucleotide.
- the Y-shaped adaptor is phosphorylated at its 5' end and the double-stranded portion of the duplex contains a single base 3' overhang comprising a ‘T’ deoxynucleotide (see FIG. 1).
- the adaptors are then ligated using T4 Ligase, rATP, to the ends of double stranded template molecules containing a single base 3’ overhang of an ‘A’ nucleotide (see FIG. 1).
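- The Y-shaped adaptor geometry described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosed method: the function name `y_adaptor_regions`, the toy sequences, and the choice to model the single-base ‘T’ overhang as the last base of the first oligonucleotide are all assumptions made for the example.

```python
# Toy model of a Y-shaped adaptor built from two 5'->3' oligonucleotides.
# Sequences and names are hypothetical; they are not real adaptor sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def y_adaptor_regions(oligo1: str, oligo2: str, duplex_len: int, overhang: str = "") -> dict:
    """Split the adaptor into its unmatched arms and its double-stranded stem.

    The 3' end of oligo1 (minus any 3' overhang) must pair with the 5' end of oligo2.
    """
    if overhang and not oligo1.endswith(overhang):
        raise ValueError("oligo1 does not end with the expected 3' overhang")
    paired = oligo1[: len(oligo1) - len(overhang)]
    stem1 = paired[-duplex_len:]   # 3' end of the first oligonucleotide
    stem2 = oligo2[:duplex_len]    # 5' end of the second oligonucleotide
    if stem1 != reverse_complement(stem2):
        raise ValueError("the oligonucleotides are not complementary over the duplex region")
    return {
        "arm_5p": paired[:-duplex_len],  # unmatched 5' arm of oligo1
        "stem": stem1,                   # double-stranded region (oligo1 strand)
        "arm_3p": oligo2[duplex_len:],   # unmatched 3' arm of oligo2
    }


oligo1 = "ACACACAC" + "GATCGGAAG" + "T"   # 5' arm + stem + single-base 3' 'T' overhang
oligo2 = "CTTCCGATC" + "GAGAGAGA"         # stem complement + 3' arm
regions = y_adaptor_regions(oligo1, oligo2, duplex_len=9, overhang="T")
print(regions)
```

- The check only confirms that the stem pairs; it does not model phosphorylation or ligation chemistry.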
- FIG. 2 illustrates how in prokaryotes, e.g., Mycobacterium, the Ku protein bridges DNA fragments and recruits an ATP dependent DNA ligase (LigD) for DNA repair.
- non-homologous end joining (NHEJ)
- NHEJ is a pathway that repairs double-strand breaks in DNA.
- NHEJ is referred to as "non-homologous" because the break ends are directly ligated without the need for a homologous template, in contrast to homology directed repair, which requires a homologous sequence to guide repair.
- the non-homologous end joining (NHEJ) pathway requires only two factors: Ku and LigD.
- Ku recognizes and binds free ends of double-stranded DNA (e.g., two fragments of DNA) and joins the two ends to form a DNA bridge complex (e.g., see FIG. 2).
- LigD uses its polymerase and exonuclease domains to repair the ends of the DNA. As it also possesses an ATP-dependent ligase domain, LigD then ligates the DNA.
- With Ku actively bringing the ends of the DNA together and recruiting LigD, the ligation conversion is boosted by as much as 30-fold compared to reactions with LigD alone.
- FIG. 3 provides an embodiment of a method of the disclosure which utilizes non-homologous end joining factors to ligate adaptors to double stranded template DNA.
- While ligation-mediated library prep can yield the highest quality genomes, the conversion of sample DNA to library DNA can be inefficient. In cases where the quantity of sample DNA is in short supply, this poor efficiency makes ligation-mediated library prep more challenging or even infeasible.
- Typical ligation-mediated library prep methods employ ligases that in nature serve to ligate nicked DNA. That is, their intended purpose is not to join and ligate two strands or ends of DNA, as is required by the library prep method.
- the ligation-mediated methods disclosed herein employ the use of prokaryotic end joining and repair factors for the ligation of two ends of DNA.
- the in vitro end-repair and A-tailing steps of traditional library prep are employed.
- one uses Ku and LigD (e.g., see FIG. 3).
- the LigD’s wild-type nuclease activity is unneeded and so a nuclease deficient mutant can be used (e.g., Mycobacterium tuberculosis LigD H373A).
- DNA (e.g., gDNA or cDNA) is fragmented into small molecules, typically less than 1000 base pairs in length. Fragmentation of DNA may be achieved by a number of methods including: enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing. Fragmented DNA may be made blunt-ended by a number of methods known to those skilled in the art. As shown in FIG. 3, the ends of the fragmented DNA are end repaired and phosphorylated using T4 DNA polymerase, dNTP, and T4 polynucleotide Kinase, rATP.
- a single ‘A’ deoxynucleotide is then added to both 3' ends of the DNA molecules using Klenow exo- enzyme, dATP, producing a one-base 3' overhang that is complementary to the one-base 3' ‘T’ overhang on the double-stranded end of the Y-shaped nhEJF adaptor.
- a ligation reaction between the Y-shaped nhEJF adaptor and the DNA fragments is then performed using Ku exo- (lacking exonuclease activity) and LigD, ATP, which joins two copies of the adaptor to each DNA fragment, one at either end, to form adaptor-polynucleotide constructs.
- the products of this reaction can be purified from unligated nhEJF adaptor by a number of means, including size-exclusion chromatography.
- After the excess nhEJF adaptor has been removed, unligated target DNA remains in addition to ligated adaptor-polynucleotide constructs, and this can be removed by selectively capturing only those target DNA molecules that have adaptor(s) attached.
- the presence of a biotin group on the 5' end of the adaptors enables any target DNA ligated to the adaptor to be captured on a surface coated with streptavidin, a protein that selectively and tightly binds biotin.
- Streptavidin can be coated onto a surface using well developed chemistries.
- commercially available magnetic beads (e.g., Dynabeads™) that are coated in streptavidin can be used to capture ligated adaptor-target constructs.
- the application of a magnet to the side of a vessel containing these beads immobilizes them such that they can be washed free of the unligated target DNA molecules.
- FIG. 4 provides another embodiment of a method of the disclosure which utilizes non-homologous end joining factors to ligate nhEJF adaptors to double stranded template DNA.
- DNA (e.g., gDNA or cDNA) is fragmented into small molecules, typically less than 1000 base pairs in length. Fragmentation of DNA may be achieved by a number of methods including: enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing. Fragmented DNA may be made blunt-ended by a number of methods known to those skilled in the art. As shown in FIG.
- T4 DNA polymerase dNTP
- T4 polynucleotide Kinase rATP
- Terminal transferase TdT
- Ku and LigD Terminal transferase
- unligated target DNA remains in addition to ligated adaptor-target constructs and this can be removed by selectively capturing only those target DNA molecules that have adaptor attached.
- a biotin group on the 5' end of the adaptors enables any target DNA ligated to the adaptor to be captured on a surface coated with streptavidin, a protein that selectively and tightly binds biotin.
- Streptavidin can be coated onto a surface using well developed chemistries.
- commercially available magnetic beads (e.g., Dynabeads™) that are coated in streptavidin can be used to capture ligated adaptor-target constructs. The application of a magnet to the side of a vessel containing these beads immobilizes them such that they can be washed free of the unligated target DNA molecules.
- non-homologous end joining factors like Ku and LigD
- the non-homologous end joining factors can be used to add nhEJF adaptors to double stranded template DNA for library preparation.
- the disclosure further provides for engineered variants of Ku and/or LigD, including, but not limited to, variants engineered to increase enzyme stability, to suppress exonuclease activity, or to increase enzymatic activity.
- the LigD ligase domain can be replaced with another ligase (e.g., T4 ligase), forming a fusion of LigD’s polymerase and nuclease domains with the chosen ligase. This allows the fusion ligase to be recruited by Ku to DNA ends.
- a LigD exonuclease deficient mutant can be used when this nuclease activity is not desired.
- the disclosure provides for polypeptides that exhibit non-homologous end joining factor activity.
- the polypeptide may be a wild-type enzyme, a homolog thereof, or an engineered variant of the wild-type enzyme.
- FIG. 5 provides a sampling of wild-type sequences for LigD (see SEQ ID NO: 1 to 20).
- FIG. 6 provides a sampling of wild-type sequences for Ku (see SEQ ID NO:21 to 30).
- the disclosure provides for a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 100% identical to any one of SEQ ID NO:1 to 20.
- the disclosure provides for a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 100% identical to any one of SEQ ID NO:21 to 30.
- the polypeptide can be a LigD in which exonuclease activity is suppressed by one or more appropriate substitutions in the exonuclease domain of LigD.
- An example of such a substitution is H373A of SEQ ID NO:1.
- Other substitutions are contemplated and can be quickly determined by in silico methods.
- the term “homologs,” used with respect to an original enzyme or gene of a first family or species, refers to distinct enzymes or genes of a second family or species which are determined by functional, structural or genomic analyses to be an enzyme or gene of the second family or species which corresponds to the original enzyme or gene of the first family or species. Most often, homologs will have functional, structural or genomic similarities. Techniques are known by which homologs of an enzyme or gene can readily be cloned using genetic probes and PCR. Identity of cloned sequences as homologs can be confirmed using functional assays and/or by genomic mapping of the genes.
- a protein has “homology” or is “homologous” to a second protein if the nucleic acid sequence that encodes the protein has a similar sequence to the nucleic acid sequence that encodes the second protein.
- a protein has homology to a second protein if the two proteins have “similar” amino acid sequences. (Thus, the term “homologous proteins” is defined to mean that the two proteins have similar amino acid sequences).
- two proteins are substantially homologous when the amino acid sequences have at least about 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity.
- the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes).
- amino acid or nucleic acid “identity” is equivalent to amino acid or nucleic acid “homology”.
- the percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
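- As a rough illustration of how percent identity follows from an alignment, the sketch below counts identical positions in two sequences that have already been aligned. The function name `percent_identity`, the use of `-` as the gap character, and the choice of the number of alignment columns (gap positions included) as the denominator are assumptions for this example; alignment programs differ in which denominator they report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical, non-gap columns count as matches. Every column in which at
    least one sequence has a residue counts toward the denominator, so each
    gap position lowers the reported identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)   # ignore columns that are gaps in both
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment: 5 identical columns out of 7 (one gap column, one mismatch).
example = percent_identity("MKT-LLV", "MKTALIV")
print(round(example, 1))
```

- Under this convention, a longer gap reduces identity more than a shorter one, which is one simple way of taking gap number and gap length into account.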
- a “conservative amino acid substitution” is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity).
- a conservative amino acid substitution will not substantially change the functional properties of a protein.
- the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art (see, e.g., Pearson et al., 1994, hereby incorporated herein by reference).
- a “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain.
- Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine).
- the following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D), Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
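- The six groups listed above lend themselves to a simple lookup. The following sketch encodes them as sets of one-letter codes and tests whether a substitution stays within a group; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and treating an unchanged residue as "not a substitution" is a choice made for this example.

```python
# The six conservative-substitution groups, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("ST"),      # 1) Serine, Threonine
    set("DE"),      # 2) Aspartic acid, Glutamic acid
    set("NQ"),      # 3) Asparagine, Glutamine
    set("RK"),      # 4) Arginine, Lysine
    set("ILMAV"),   # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    set("FYW"),     # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("S", "T"))  # same group
print(is_conservative("D", "K"))  # acidic to basic, different groups
```

- Residues that appear in none of the six groups (for example glycine or histidine) are never reported as conservative by this lookup.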
- Sequence homology for polypeptides is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using measures of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as “Gap” and “Bestfit” which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild type protein and a mutein thereof.
- BLAST (Altschul, 1990; Gish, 1993; Madden, 1996; Altschul, 1997; Zhang, 1997), especially blastp or tblastn (Altschul, 1997).
- Typical parameters for BLASTp are: Expectation value: 10 (default); Filter: seg (default); Cost to open a gap: 11 (default); Cost to extend a gap: 1 (default); Max. alignments: 100 (default); Word size: 11 (default); No. of descriptions: 100 (default); Penalty Matrix: BLOSUM62.
- polypeptide sequences can be compared using FASTA, a program in GCG Version 6.1.
- FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences (Pearson, 1990, hereby incorporated herein by reference).
- percent sequence identity between amino acid sequences can be determined using FASTA with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, hereby incorporated herein by reference.
- the disclosure further provides methods using structure specific endonucleases (SSEs) for appending flap adaptors to polynucleotides.
- These SSE-based methods can be used to selectively add flap adaptors in a sequence specific manner, thereby providing for enrichment of targeted polynucleotides that have a specific sequence (i.e., target enrichment).
- Traditional methods for target enrichment broadly fall into two categories: amplicon based or probe-based hybridization/pulldown.
- the former employs primer pairs and PCR to amplify targets from a sample; it is simple and fast but limited in its ability to multiplex very high numbers of targets due to PCR mispriming events. It is also restricted in the size of amplicon that can be produced due to the limits of current PCR technology.
- Other disadvantages of PCR such as sequence bias or polymerase slippage can also impact the performance scope.
- Hybridization methods involve making NGS library templates first, then using a probe to pull out library templates that cover the target region of a genome of interest. This approach is generally longer in practice than the amplicon methods but is virtually limitless in the number of targets that can be enriched. The poorer specificity that arises from hybridization of only a single probe is mitigated by additional rounds of pulldown and/or by increasing the probe length and Tm. In general, amplicon workflows are used for small panels of targets whereas hybridization workflows can be used for exome enrichment.
- the disclosure provides for the creation of a target-enriched polynucleotide library by methods using structure-specific nucleases. Accordingly, the disclosure provides an alternative to the methods known in the art for creating target-enriched libraries.
- the methods of the disclosure provide increased specificity over conventional probe-based hybridization/pulldown methods by employing two probes flanking the target instead of one. Unlike amplicon-based methods, the methods of the disclosure have limitless multiplexity.
- a pre-generated library is not required and a target of any size and sequence can be enriched. While the approach is similar to CRISPR/Cas9 approaches in that it employs two cleavage events, one on either side of a target sequence, it does, unlike CRISPR/Cas, append adaptors to either side of the target sequence.
- the resulting product comprising adaptors can be used to seed a flow cell directly or it can be further amplified by PCR if required. Amplification proceeds through primers that bind to the flanking adaptor sequences and thus is advantageous over multiplex PCR where amplification utilizes gene specific primers and efficiency varies between target amplicons.
- Structure-specific nucleases are a class of DNA binding/modifying enzymes that target structures in nucleic acids in vivo rather than sequences. These structures comprise deviations from the contiguous double helix structure that usually arise, for example, during DNA replication or in the process of damage repair. Structures such as Holliday junctions, replication forks, or single-stranded flaps require enzymes to resolve their topology to ultimately restore the canonical structure of the genome. Examples of structure-specific nucleases include, but are not limited to, Holliday junction resolvases and flap endonucleases.
- the disclosure provides for the creation of a target-enriched template library by the use of structure-specific nucleases, wherein the structure-specific nucleases comprise flap endonucleases.
- Flap endonuclease enzymes target junctions in DNA where a single-stranded stretch of DNA protrudes from the double-helix.
- a flap may be described as a 3’ flap or a 5’ flap depending on the polarity of the sequence (e.g., see FIG. 7).
- FEN1 is an example of a Flap endonuclease that targets and modifies a 5’ flap
- the XPF/MUS81 family of proteins are examples of Flap endonucleases that target and modify 3’ flap structures.
- FEN1 plays a central role in DNA replication both in eukaryotes and prokaryotes. It functions to remove single stranded 5’ flaps of DNA from Okazaki fragments that are generated on the lagging strand of the DNA replication fork. These flaps form when a primase generates an RNA primer that serves as a primer to extend a new DNA strand; multiple Okazaki fragments are generated and when the extending 3’ end of one abuts the 5’ end of another, it displaces it to form a flap structure (e.g., see FIG. 8). FEN1 binds to the 5’ flap and cleaves it at its base to leave a nick in the DNA (e.g., see FIG. 9).
- FEN1 can cleave single-stranded flaps of up to 200 nucleotides in length. It does not cleave single stranded DNA alone, such as the regions of single strands of the parental template strands at the replication fork; it only cleaves ssDNA strands in the structure of a flap.
- Its preferred substrate is a double flap structure, in which the 5’ end of one strand and the 3’ end of the abutting strand overlap so that both ends form flaps, and in which the 3’ flap is a single nucleotide long (e.g., see FIG. 11).
- double stranded regions within a 5’ flap are inhibitory for flap cleavage, even if the double stranded region is distant from the base of the flap.
- the disclosure provides compositions, methods, and kits directed to the use of FEN1 with a 5’ flap adaptor having a specific structure that is recognized by FEN1 (e.g., see FIG. 12).
- the 5’ flap adaptor comprises two oligonucleotides that when annealed together form a partially double stranded molecule.
- a ‘probe’ portion of this 5’ flap adaptor is single stranded and complementary to a targeted sequence.
- the double stranded portion of the 5’ flap adaptor comprises a universal sequence that can be, for instance, the sequences of adaptors for NGS. The last base-pair next to the single stranded probe portion may also match a targeted sequence.
- a flap structure forms (e.g., see FIG. 13).
- the structure comprising a 5’ flap from the target DNA and a single nucleotide 3’ flap from the adaptor is a substrate for FEN1, which can cleave, leaving a nick that can be subsequently joined by a ligase.
- the result is an addition of an adaptor to the 5’ end of a polynucleotide (e.g., see FIG. 14).
- Flap endonucleases in the XPF/MUS81 family of proteins play a role in the repair of damage caused by UV light or DNA cross-linking. Their target is illustrated in FIG.
- the disclosure provides embodiments directed to the utilization of XPF/MUS81 3’ flap endonuclease activity in conjunction with a 3’ flap adaptor having a specific structure (e.g., see FIG. 17).
- the 3’ flap adaptor comprises, at a minimum, a single oligonucleotide comprising a 3’ sequence complementary to a target of interest in a genome and a 5’ universal sequence that can be, for instance, the sequences of adaptors for NGS.
- the 5’ universal sequence may be double-stranded.
- the XPF/MUS81 3’ flap endonuclease then cleaves the DNA, leaving a nick that when extended with a polymerase copies the universal adaptor sequence to the target DNA (e.g., see FIG. 19).
- when the 5’ flap adaptor, complementary to a target #1, and the 3’ flap adaptor, complementary to a target #2, as outlined above are hybridized to DNA, for example genomic DNA, a structure forms that contains a 5’ flap and a 3’ flap (e.g., see FIG. 20).
- the genomic DNA is fragmented to a suitable size such that the 5’ flap is less than 200 nucleotides long.
- the sample can be applied directly to a sequencer for sequencing of the DNA intervening the target sequences.
- the adaptor sequences can be used to append additional sequences, such as step-out primers.
- the 5’ flap adaptor illustrated in FIG. 12 is shown hybridized already in a double-flap structure in FIG. 13. In practice, achieving this double-flap structure may require sequential annealing of the individual oligos that comprise the FEN1 structure, such that the longer oligo that contains the probe #1 sequence anneals first to the target #1 followed by annealing of the shorter oligo complementary to the universal sequence of the adaptor.
- differential hybridization can be achieved by methods known to those skilled in the art, for example through design of the probe sequence and the universal sequences with different Tm (e.g., see FIG. 23).
- the probe can have a higher Tm than the universal adaptor sequence such that at a particular temperature the probe, but not the universal adaptor sequence, anneals first; lowering the temperature then enables the shorter universal adaptor oligo to anneal, forming the structure illustrated in FIG. 13.
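- The idea of staggered annealing can be sketched by comparing estimated melting temperatures of the two regions: on cooling, the region with the higher Tm is expected to anneal first. The example below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a crude approximation that is only reasonable for short oligonucleotides and ignores salt and concentration effects; the function names and toy sequences are assumptions for illustration, and a nearest-neighbor model would be preferable for real designs.

```python
def wallace_tm(seq: str) -> float:
    """Rough Tm estimate in degrees C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("sequence may only contain A, C, G and T")
    return 2.0 * at + 4.0 * gc


def annealing_order(probe: str, universal: str) -> str:
    """Report which region is expected to anneal first as the temperature drops."""
    tm_probe, tm_universal = wallace_tm(probe), wallace_tm(universal)
    if tm_probe == tm_universal:
        return "simultaneous"
    return "probe first" if tm_probe > tm_universal else "universal first"


probe_region = "GGCCGGCCGGCCGGCC"     # hypothetical GC-rich probe
universal_region = "AATTAATTAA"       # hypothetical short, AT-rich universal region
print(wallace_tm(probe_region), wallace_tm(universal_region))
print(annealing_order(probe_region, universal_region))
```

- A design tool could use such a comparison to flag adaptor designs in which the two regions have Tm values too close together to be annealed sequentially.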
- the disclosure provides methods that utilize a structure-specific endonuclease that has 5’ flap cleavage activity and a 5’ flap adaptor in order to append an adaptor to the 5’ end of a polynucleotide.
- the 5’ flap adaptor is hybridized to a complementary sequence of a single stranded polynucleotide.
- the 5’ flap adaptor comprises a first oligonucleotide that has a single stranded region that can hybridize to a target sequence of a polynucleotide.
- the length of the single stranded region can vary but should be of sufficient length to bind with high fidelity to a targeted sequence.
- the single stranded region of the 5’ flap adaptor that can hybridize to a targeted sequence can comprise 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths.
- the first oligonucleotide of the 5’ flap adaptor further comprises a single stranded region that codes for a universal sequence
- the universal sequence is not complementary (i.e., cannot hybridize) to the sequence of the polynucleotide.
- An example of a universal sequence includes, but is not limited to, a sequence commonly used for NGS applications, such as a P5 or P7 sequence.
- the single stranded region that codes for a universal sequence can comprise 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths.
- the single stranded region that codes for a universal sequence can further comprise one or more barcode sequences, allowing for identification of the source of the polynucleotide if multiple sources of polynucleotides are being multiplexed in the same reaction.
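- The enumerated region lengths can be turned into a simple design check. The sketch below is only an illustration: it treats the listed values (15 to 35 nt for the target-hybridizing region, 6 to 25 nt for the universal region) as inclusive bounds, which is an assumption made for the example, and the function name `check_flap_adaptor_lengths` is hypothetical.

```python
from typing import List

# Inclusive bounds taken from the enumerated lengths; treating them as hard
# limits is an assumption made for this example.
PROBE_RANGE = (15, 35)       # region that hybridizes to the target sequence
UNIVERSAL_RANGE = (6, 25)    # region that codes for the universal sequence


def check_flap_adaptor_lengths(probe: str, universal: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means both lengths fit."""
    problems = []
    if not PROBE_RANGE[0] <= len(probe) <= PROBE_RANGE[1]:
        problems.append(
            f"probe region is {len(probe)} nt, outside {PROBE_RANGE[0]}-{PROBE_RANGE[1]} nt"
        )
    if not UNIVERSAL_RANGE[0] <= len(universal) <= UNIVERSAL_RANGE[1]:
        problems.append(
            f"universal region is {len(universal)} nt, outside "
            f"{UNIVERSAL_RANGE[0]}-{UNIVERSAL_RANGE[1]} nt"
        )
    return problems


print(check_flap_adaptor_lengths("ACGT" * 5, "GATTACAGAT"))   # both within range
print(check_flap_adaptor_lengths("ACGT" * 2, "G" * 30))       # both out of range
```

- A universal region that also carries one or more barcode sequences may need a wider bound than the one assumed here.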
- the first oligonucleotide of the 5’ flap adaptor can be hybridized directly to the target sequence of the polynucleotide and then a second oligonucleotide of the 5’ flap adaptor can be hybridized to the universal sequence of the first oligonucleotide of the 5’ flap adaptor.
- the second oligonucleotide of the 5’ flap adaptor comprises a sequence that is complementary to the universal sequence of the first oligonucleotide of the 5’ flap adaptor.
- the second oligonucleotide of the 5’ flap adaptor may further comprise a nucleotide on the 3’ end that is complementary to a nucleotide of the single stranded region of the first oligonucleotide that hybridizes to the target sequence of the polynucleotide.
- the second oligonucleotide of the 5’ flap adaptor may be hybridized to the first oligonucleotide of the 5’ flap adaptor so that there is a double stranded region comprising base pairs for the universal sequence, and a single stranded region from the first oligonucleotide that can hybridize with a target sequence from a polynucleotide.
- when the 5’ flap adaptor is bound to the target sequence of the polynucleotide, a 5’ flap is generated in the polynucleotide.
- a 1 nucleotide 3’ flap may also be generated if the second oligonucleotide comprises a nucleotide on the 3’ end that is complementary to a nucleotide of the single stranded region of the first oligonucleotide that hybridizes to a target sequence of the polynucleotide.
- the 5’ flap, or the 5’ flap and 1 nucleotide 3’ flap, of the polynucleotide-adaptor hybridized construct is then recognized by a structure-specific endonuclease that has 5’ flap cleavage activity.
- the structure-specific endonuclease binds the polynucleotide-adaptor construct and cleaves off the 5’ flap structure and forms a nick in the polynucleotide-adaptor hybridized construct. This nick may then be closed by use of a ligase.
- the end result is the 5’ flap adaptor being appended to the polynucleotide, such that the polynucleotide now contains a sequence for the universal adaptor.
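- The net effect of the 5’ flap workflow on a sequence can be sketched as a string operation. This is a simplified model under stated assumptions, not the enzymology itself: the first oligonucleotide is assumed to be laid out 5'-probe-universal-3', the optional single-nucleotide 3’ flap is ignored, cleavage is assumed to occur exactly at the base of the flap, and the function name, the 200 nt flap limit parameter, and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def append_5p_adaptor(polynucleotide: str, first_oligo: str, probe_len: int, max_flap: int = 200):
    """Model 5' flap cleavage followed by ligation of the second oligonucleotide.

    Returns (removed_flap, product), both written 5'->3'.
    """
    probe = first_oligo[:probe_len]          # hybridizes to the target sequence
    universal = first_oligo[probe_len:]      # templates the second oligonucleotide
    target = reverse_complement(probe)
    start = polynucleotide.find(target)
    if start == -1:
        raise ValueError("target sequence not found in polynucleotide")
    flap = polynucleotide[:start]            # unpaired 5' flap upstream of the target
    if len(flap) > max_flap:
        raise ValueError("5' flap is longer than the assumed cleavable length")
    second_oligo = reverse_complement(universal)
    # Cleavage leaves a nick between the second oligo and the target; ligation seals it.
    product = second_oligo + polynucleotide[start:]
    return flap, product


target_seq = "ACGTACGGATCCAGT"                       # hypothetical 15 nt target
fragment = "TTTTTT" + target_seq + "GGGCCC"          # flap + target + downstream sequence
universal_seq = "CAGTCAGTCA"                         # hypothetical universal region
oligo = reverse_complement(target_seq) + universal_seq
removed, appended = append_5p_adaptor(fragment, oligo, probe_len=len(target_seq))
print(removed, appended)
```

- In this model the sequence that ends up on the 5’ end of the polynucleotide is the second oligonucleotide, i.e. the complement of the universal region carried by the first oligonucleotide.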
- the disclosure provides methods that utilize a structure-specific endonuclease that has 3’ flap cleavage activity and a 3’ flap adaptor in order to append an adaptor to the 3’ end of a polynucleotide.
- the 3’ flap adaptor is hybridized to a complementary sequence of a single stranded polynucleotide.
- the 3’ flap adaptor comprises an oligonucleotide that has a single stranded region that can hybridize to a target sequence of a polynucleotide. The length of the single stranded region can vary but should be of sufficient length to bind with high fidelity to a targeted sequence.
- the single stranded region of the 3’ flap adaptor that can hybridize to a targeted sequence can comprise 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths.
- the oligonucleotide of the 3’ flap adaptor further comprises a single stranded region that codes for a universal sequence; the universal sequence is not complementary (i.e., cannot hybridize) to the sequence of the polynucleotide.
- a universal sequence includes, but is not limited to, a sequence commonly used for NGS applications, such as a P5 or P7 sequence.
- the universal sequence of the 3’ flap adaptor may be the same as the universal sequence of the 5’ flap adaptor, or alternatively be different.
- the single stranded region that codes for a universal sequence can comprise 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths.
- the single stranded region that codes for a universal sequence can further comprise one or more barcode sequences, allowing for identification of the source of the polynucleotide if multiple sources of polynucleotides are being multiplexed in the same reaction.
- the oligonucleotide of the 3’ flap adaptor is hybridized directly to a complementary target sequence of the polynucleotide.
- the 3’ flap adaptor and the 5’ flap adaptor hybridize to the same polynucleotide.
- the 5’ flap adaptor and the 3’ flap adaptor can be hybridized to the polynucleotide in a concurrent or sequential manner.
- a 3’ flap is generated in the polynucleotide.
- the 3’ flap of the polynucleotide-adaptor hybridized construct is then recognized by a structure-specific endonuclease that has 3’ flap cleavage activity.
- the structure-specific endonuclease binds the polynucleotide-adaptor hybridized construct and cleaves off the 3’ flap structure, forming a 3’ overhang that comprises the universal sequence region of the 3’ adaptor.
- the 3’ overhang may be filled in with a sequence complementary to the universal sequence of the 3’ adaptor.
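The hybridize, cleave, and fill-in sequence described above can be sketched as plain string operations. This is a simplified model under stated assumptions, not the disclosed protocol: it assumes the adaptor oligo is laid out 5'-universal region-hybridizing region-3', that cleavage occurs exactly at the 3' edge of the hybridized target, and that fill-in appends the reverse complement of the universal region to the polynucleotide. The function names and the example target and flap sequences are hypothetical; the P7 sequence is the one given in the Summary.

```python
def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


HYBRIDIZING_LENGTHS = range(15, 36)  # 15 to 35 nt, per the text
UNIVERSAL_LENGTHS = range(6, 26)     # 6 to 25 nt, per the text


def make_3prime_flap_adaptor(target, universal):
    """Build a hypothetical adaptor oligo: 5'-universal-revcomp(target)-3'."""
    if len(target) not in HYBRIDIZING_LENGTHS:
        raise ValueError("hybridizing region must be 15 to 35 nt")
    if len(universal) not in UNIVERSAL_LENGTHS:
        raise ValueError("universal region must be 6 to 25 nt")
    return universal + revcomp(target)


def append_3prime_adaptor(polynucleotide, target, universal):
    """Model flap cleavage followed by polymerase fill-in."""
    make_3prime_flap_adaptor(target, universal)  # validates region lengths
    position = polynucleotide.find(target)
    if position < 0:
        raise ValueError("target sequence not found in polynucleotide")
    # Everything 3' of the hybridized target is the flap and is cleaved off.
    clipped = polynucleotide[: position + len(target)]
    # Fill-in copies the adaptor's universal region onto the new 3' end.
    return clipped + revcomp(universal)


P7 = "CAAGCAGAAGACGGCATACGAGAT"          # from the Summary (SEQ ID NO: 33)
target = "ACGTTGCAAGCTAGGCTTAC"          # hypothetical 20 nt target
molecule = "GGCATTCAGG" + target + "TTTTTTTT"  # hypothetical, 8 nt flap
print(append_3prime_adaptor(molecule, target, P7))
```

Under this model the polynucleotide ends in the complement of the adaptor's universal region, so the strand orientation of the universal sequence in the adaptor determines which sequence is read out downstream.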
- the disclosure provides methods that utilize tagmentation and template switch oligonucleotides to append adaptors to polynucleotides.
- Tagmentation is an established workflow for making templates for polynucleotide applications. The process relies on the transposase enzyme Tn5 fragmenting and simultaneously appending adaptor sequences to the 5’ ends of polynucleotide fragments (see FIG. 25).
- a second, Tn5-independent step is used to further process the ‘adapted’ fragments to append a similar or different adaptor to the 3’ ends of the fragments, thus completing the library template in a form ready for polynucleotide applications such as sequencing.
- the 3’ end adaptors are added in one of two ways. One way for appending adaptors to 3' ends of Tn5 tagmented products is shown in FIG. 26.
- the free 3’ end of the fragment can be extended in the presence of a polymerase (e.g., a strand displacing polymerase), dNTPs and heat (e.g., >68 °C) to remove the ‘non-transferred’ strand of the transposome ds adaptor.
- the complement of the 5’ adaptor is copied, and finally a PCR reaction with two distinct primers (e.g., P5-i5-A14 and P7-i7-B15) can be used to enrich for those primary tagmentation molecules that have a P5 based adaptor on one end and a P7 based adaptor on the other end.
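Why the PCR step enriches only a subset of molecules can be seen by enumerating the adaptor combinations at the two fragment ends. The sketch below assumes, for illustration, that two adaptor types (labelled A14 and B15 after the primer names above) are inserted independently at each end with a fixed probability; that independence, and the helper names, are assumptions rather than statements from the disclosure.

```python
from fractions import Fraction
from itertools import product


def end_pair_fractions(p_a14):
    """Fraction of fragments carrying each (left, right) adaptor pair.

    Assumes each end independently receives A14 with probability p_a14
    and B15 otherwise.
    """
    probabilities = {"A14": p_a14, "B15": 1 - p_a14}
    return {
        (left, right): probabilities[left] * probabilities[right]
        for left, right in product(probabilities, repeat=2)
    }


def amplifiable_fraction(p_a14):
    """Fraction of fragments with different adaptors on their two ends."""
    pairs = end_pair_fractions(p_a14)
    return sum(value for (left, right), value in pairs.items() if left != right)


print(amplifiable_fraction(Fraction(1, 2)))  # 1/2
```

With equal loading, half of the fragments under this model carry the same adaptor on both ends and are not recovered by the two-primer PCR.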
- The other way for appending adaptors to 3' ends of Tn5 tagmented products is shown in FIG. 27.
- a single double-stranded ‘forked’ adaptor is employed in the transposome (see FIG. 27).
- a non-displacing polymerase is used at a temperature below the Tm of the non-transferred strand (e.g., <55 °C) to extend the free 3’ end of the fragment until it reaches the 5’ end of the ‘non-transferred’ adaptor strand, and then a ligase covalently connects the non-transferred strand to the fragment.
- the disclosure provides new and innovative methods for appending an adaptor oligo sequence to the 3’ end of a tagmented fragment that utilize partial extension from the free 3’ end of a tagmented fragment to generate a known sequence capable of hybridizing to, and extending off, a template switch oligo (see FIGs. 28-34).
- the disclosure additionally provides methods for processing a fully extended 3’ sequence to alter its base composition.
- An embodiment for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo is shown in FIG. 28.
- DNA is tagmented with a transposome that may or may not have modifications to the transferred strand.
- the tagmented library is treated to de-anneal and remove the non-transferred strand of the transposome. For example, if the transposome is attached to a solid surface (e.g., a bead), mild heat followed by a hot wash will remove the strand. Next a replacement oligo that has a higher Tm than the non-transferred strand is hybridized back to the appended adaptors.
- the oligo can be longer than the non-transferred strand it replaces, or it may contain modifications such as ‘Locked Nucleic Acids’ (LNAs). The modifications may also or only be present in the transferred strand.
- the replacement oligo does not hybridize in place of all the non-transferred strand, but instead does so partially. This results in a portion of the adaptor, 5’ of the replacement oligo, remaining single stranded.
- a non-displacing polymerase is used to extend from the 3’ ends of the insert filling in the ends of the insert and extending over the adaptor sequence but stopping when it reaches the hybridized replacement oligo.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
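The role of Tm in this embodiment (a replacement oligo that stays annealed where the non-transferred strand does not) can be made concrete with a rough estimate. The sketch below uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is a coarse approximation for short oligos and ignores salt, concentration, and modifications such as LNAs. The non-transferred strand is taken here to be the reverse complement of the 19-base Tn5 recognition sequence quoted later in this document; the six extra bases on the replacement oligo are purely hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wallace_tm(oligo):
    """Rough melting temperature in deg C by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


TN5_RECOGNITION = "AGATGTGTATAAGAGACAG"        # SEQ ID NO: 31 in the text
non_transferred = revcomp(TN5_RECOGNITION)     # assumed 19-base strand
replacement = non_transferred + "GCGTCA"       # hypothetical longer oligo

print(wallace_tm(non_transferred))  # 52
print(wallace_tm(replacement) > wallace_tm(non_transferred))  # True
```

A nearest-neighbor model would give more realistic values; the point of the sketch is only that lengthening the oligo raises its estimated Tm relative to the strand it replaces.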
- Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 28A-B. As shown in FIG.
- a transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand.
- the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases).
- moderate heat can be used to denature the non-transferred strand leaving the ‘LNA containing replacement oligo’ still annealed.
- a non-displacing polymerase is used to extend from the 3’ ends of the insert filling in the ends of the insert and extending over the non-transferred strand but stopping when it reaches the hybridized replacement oligo.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein).
- the strands can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
- a transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand.
- the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases).
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein).
- if the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
- Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 30.
- a transposome is employed that contains one or more internal modifications in the non-transferred strand that prevent exonuclease digestion, e.g., a phosphorothioate linkage in the phosphodiester backbone of the oligo.
- a polymerase with a 5’ to 3’ exonuclease activity is employed to extend from the free 3’ end of the insert.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
- Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIGs. 31A-31B.
- a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand.
- the polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead).
- a fresh aliquot of a strand displacing polymerase is added and just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent.
- This mix will continue to extend the 3’ end of the fragment across the Tn5 adaptor which for the first 19 bases is a known Tn5 recognition sequence (5’AGATGTGTATAAGAGACAG3’) (SEQ ID NO: 31) but will stop incorporating bases once it reaches a ‘T’ base in the template, due to the absence of dATP molecules.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein and comprising the sequence: 5’CTGTCTCTT3’).
- if the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment. As shown in FIG.
- a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand.
- the polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead).
- the non-transferred strand is then removed, e.g., by moderate heat to selectively denature the strand, or application of a lambda exonuclease that selectively digests oligos containing 5’ phosphorylated ends (as is the case with non-transferred strands), or other means known to those skilled in the art.
- a fresh aliquot of a polymerase is added and just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent.
- each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein and comprising the sequence: 5’CTGTCTCTT3’).
- if the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
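The three-dNTP extension described above can be checked with a short simulation: the polymerase reads the Tn5 recognition sequence from its 3' end and stops at the first template base whose complement is unavailable. The function name and the set-based representation of the dNTP pool are illustrative assumptions; the template sequence and the expected product are the ones quoted in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_with_limited_dntps(template, available_bases):
    """Return the strand synthesized across `template` (written 5'->3').

    The polymerase reads the template 3'->5' and halts at the first
    position whose complementary nucleotide is not in the pool.
    """
    product = []
    for template_base in reversed(template):
        needed = COMPLEMENT[template_base]
        if needed not in available_bases:
            break  # e.g. a template 'T' when dATP is absent
        product.append(needed)
    return "".join(product)  # written 5'->3'


TN5_RECOGNITION = "AGATGTGTATAAGAGACAG"  # SEQ ID NO: 31 in the text

# dCTP, dGTP and dTTP present; dATP absent.
print(extend_with_limited_dntps(TN5_RECOGNITION, {"C", "G", "T"}))  # CTGTCTCTT
```

The nine-base product matches the partial adaptor sequence 5’CTGTCTCTT3’ stated above; with all four dNTPs present, the same function returns the full 19-base complement.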
- the workflow described in the embodiments above may take place where the transposomes are attached to a surface such as a bead, or the transposomes may be free in solution.
- the 3’ end can also be completed following extension by hybridizing an oligo that is partially complementary to the 5’ adaptor but contains further sequences that are unique and not present in the 5’ adaptor.
- a ligation reaction covalently joins this oligo to the 3’ end of the insert and forms a ‘Y’ shaped adaptor construct (see FIG. 32).
- Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 33A.
- a single transposome type comprising a P5 transferred-strand and a short non-transferred-strand is used to tagment DNA.
- a strand displacing polymerase is added to extend from the free 3’ end, create the complement of the P5 transferred-strand (i.e., creates P5’), and displace the short non-transferred strand. The temperature is then elevated to make the fragments single stranded.
- a P7 template switch oligo hybridizes forming a forked structure that is partially double-stranded. Then in the presence of a polymerase with 3’ exo activity, the single stranded 3’ end is degraded by this activity until there is no longer any single stranded 3’ end. The remaining 3’ end of the fragment then forms a primer template that extends and creates the complement of the P7 template switch oligo.
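The trim-then-extend behaviour of this template switch can be sketched as string operations. The model is deliberately simplified and rests on assumptions that are not in the disclosure: the fragment is represented as P5 + recognition sequence + insert + their complements, with index and A14/B15 segments omitted; the template switch oligo is represented as P7 followed by the 19-base recognition sequence; and the 3’ exonuclease is assumed to stop exactly at the duplex boundary. The function names and the insert sequence are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def template_switch(fragment, switch_oligo, anneal_length):
    """Trim the unpaired 3' tail, then copy the oligo's 5' portion.

    The last `anneal_length` bases of `switch_oligo` are assumed to
    base-pair with the fragment; everything 3' of that site on the
    fragment is treated as a single-stranded tail and removed.
    """
    annealing_region = switch_oligo[-anneal_length:]
    site = revcomp(annealing_region)
    start = fragment.rfind(site)
    if start < 0:
        raise ValueError("switch oligo cannot anneal to this fragment")
    trimmed = fragment[: start + len(site)]      # 3' exonuclease step
    template = switch_oligo[:-anneal_length]     # unpaired 5' part of oligo
    return trimmed + revcomp(template)           # extension step


P5 = "AATGATACGGCGACCACCGA"        # SEQ ID NO: 32 in the text
P7 = "CAAGCAGAAGACGGCATACGAGAT"    # SEQ ID NO: 33 in the text
ME = "AGATGTGTATAAGAGACAG"         # SEQ ID NO: 31 in the text
insert = "GGATCCTTAGCAAGT"         # hypothetical insert

fragment = P5 + ME + insert + revcomp(ME) + revcomp(P5)
product = template_switch(fragment, P7 + ME, anneal_length=len(ME))
print(product.endswith(revcomp(P7)))  # True
```

In this model the P5’ tail is replaced by P7’, leaving a fragment with P5 at its 5’ end and P7’ at its 3’ end.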
- Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 34.
- a structure specific endonuclease e.g., XPF/MUS81
- the endonuclease nicks the double stranded region of the duplex creating a free 3’ end that a polymerase extends and creates the complement of the P7 template switch oligo.
- the adaptor-polynucleotide constructs prepared according to the methods disclosed herein can be used in any method of nucleic acid analysis, e.g., sequencing of the templates or amplification products thereof.
- Exemplary uses of the template libraries include, but are not limited to, providing templates for whole genome amplification, sequencing, subcloning, and PCR amplification (of either monotemplate or complex template libraries).
- Template libraries prepared according to a method of the disclosure from a complex mixture of genomic DNA fragments representing a whole or substantially whole genome provide suitable templates for so-called “whole-genome” amplification.
- the term “whole-genome amplification” refers to a nucleic acid amplification reaction (e.g., PCR) in which the template to be amplified comprises a complex mixture of nucleic acid fragments representative of a whole (or substantially whole) genome.
- solid-phase amplification refers to any nucleic acid amplification reaction carried out on or in association with a solid support such that all or a portion of the amplified products are immobilized on the solid support as they are formed.
- solid-phase PCR (solid-phase polymerase chain reaction) is a reaction analogous to standard solution phase PCR, except that one or both of the forward and reverse amplification primers is/are immobilized on the solid support.
- one amplification primer may be immobilized (the other primer usually being present in free solution).
- both the forward and the reverse primers may be immobilized.
- References herein to forward and reverse primers are to be interpreted accordingly as encompassing a “plurality” of such primers unless the context indicates otherwise.
- Amplification primers for solid-phase PCR are preferably immobilized by covalent attachment to the solid support at or near the 5' end of the primer, leaving the template-specific portion of the primer free for annealing to its cognate template and the 3' hydroxyl group free for primer extension.
- attachment chemistry will depend on the nature of the solid support, and any derivatization or functionalization applied to it.
- the primer itself may include a moiety, which may be a non-nucleotide chemical modification, to facilitate attachment.
- “cluster” and “colony” are used interchangeably herein to refer to a discrete site on a solid support comprised of a plurality of identical immobilized nucleic acid strands and a plurality of identical immobilized complementary nucleic acid strands.
- clustered array refers to an array formed from such clusters or colonies. In this context the term “array” is not to be understood as requiring an ordered arrangement of clusters.
- the disclosure further provides methods of sequencing amplified nucleic acids generated by whole genome or solid-phase amplification.
- the disclosure provides a method of nucleic acid sequencing comprising amplifying a library of nucleic acid templates using whole genome or solid-phase amplification as described above and carrying out a nucleic acid sequencing reaction to determine the sequence of the whole or a part of at least one amplified nucleic acid strand produced in the whole genome or solid-phase amplification reaction.
- Sequencing can be carried out using any suitable “sequencing-by-synthesis” technique, wherein nucleotides are added successively to a free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction.
- the nature of the nucleotide added is preferably determined after each nucleotide addition.
- the initiation point for the sequencing reaction may be provided by annealing of a sequencing primer to a product of the whole genome or solid-phase amplification reaction.
- one or both of the adaptors added during formation of the template library may include a nucleotide sequence which permits annealing of a sequencing primer to amplified products derived by whole genome or solid-phase amplification of the template library.
- bridged structures formed by annealing of pairs of immobilized polynucleotide strands and immobilized complementary strands, both strands being attached to the solid support (e.g., a flowcell) at the 5' end.
- Arrays comprised of such bridged structures provide inefficient templates for nucleic acid sequencing, since hybridization of a conventional sequencing primer to one of the immobilized strands is not favored compared to annealing of this strand to its immobilized complementary strand under standard conditions for hybridization.
- Bridged template structures may be linearized by cleavage of one or both strands with a restriction endonuclease or by cleavage of one strand with a nicking endonuclease.
- Other methods of cleavage can be used as an alternative to restriction enzymes or nicking enzymes, including inter alia chemical cleavage (e.g., cleavage of a diol linkage with periodate), cleavage of abasic sites by cleavage with endonuclease or by exposure to heat or alkali, cleavage of ribonucleotides incorporated into amplification products otherwise comprised of deoxyribonucleotides, photochemical cleavage, or cleavage of a peptide linker.
- a linearization step may not be essential if the solid-phase amplification reaction is performed with only one primer covalently immobilized and the other in free solution.
- the product of the cleavage reaction may be subjected to denaturing conditions in order to remove the portion(s) of the cleaved strand(s) that are not attached to the solid support.
- denaturing conditions will be apparent to the skilled reader with reference to standard molecular biology protocols (Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 3rd Ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Current Protocols, eds. Ausubel et al.).
- Denaturation results in the production of a sequencing template which is partially or substantially single-stranded.
- a sequencing reaction may then be initiated by hybridization of a sequencing primer to the single-stranded portion of the template.
- the nucleic acid sequencing reaction may comprise hybridizing a sequencing primer to a single-stranded region of a linearized amplification product, sequentially incorporating one or more nucleotides into a polynucleotide strand complementary to the region of amplified template strand to be sequenced, identifying the base present in one or more of the incorporated nucleotide(s) and thereby determining the sequence of a region of the template strand.
- One sequencing method which can be used in accordance with the disclosure relies on the use of modified nucleotides that can act as chain terminators. Once the modified nucleotide has been incorporated into the growing polynucleotide chain complementary to the region of the template being sequenced there is no free 3'-OH group available to direct further sequence extension and therefore the polymerase cannot add further nucleotides. Once the nature of the base incorporated into the growing chain has been determined, the 3' block may be removed to allow addition of the next successive nucleotide. By ordering the products derived using these modified nucleotides it is possible to deduce the DNA sequence of the DNA template.
- Such reactions can be done in a single experiment if each of the modified nucleotides has attached a different label, known to correspond to the particular base, to facilitate discrimination between the bases added at each incorporation step.
- a separate reaction may be carried out containing each of the modified nucleotides separately.
- the modified nucleotides may carry a label to facilitate their detection.
- this is a fluorescent label.
- Each nucleotide type may carry a different fluorescent label.
- the detectable label need not be a fluorescent label. Any label can be used which allows the detection of an incorporated nucleotide.
- One method for detecting fluorescently labelled nucleotides comprises using laser light of a wavelength specific for the labelled nucleotides, or the use of other suitable sources of illumination.
- the fluorescence from the label on the nucleotide may be detected by a CCD camera or other suitable detection means.
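The cycle of incorporation, detection, and deblocking described above can be sketched as a simple loop. This is a toy model under stated assumptions: each cycle incorporates exactly one 3'-blocked, labelled nucleotide with no errors, the dye assignments are hypothetical, and the sequencing primer is assumed to anneal to the last `primer_length` bases at the 3' end of the template.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical one-dye-per-base labelling scheme.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template, primer_length, cycles):
    """Return base calls for the strand synthesized on `template`.

    `template` is written 5'->3'; synthesis starts at the primer's
    3' end and reads the template 3'->5', one base per cycle.
    """
    unread = template[: len(template) - primer_length][::-1]
    calls = []
    for template_base in unread[:cycles]:
        incorporated = COMPLEMENT[template_base]  # blocked nucleotide added
        signal = DYE_FOR_BASE[incorporated]       # imaging step
        calls.append(BASE_FOR_DYE[signal])        # base call from the dye
        # Deblocking would restore the 3'-OH here before the next cycle.
    return "".join(calls)


example_template = "TTAGCCGA" + "ACGTACGT"  # hypothetical; last 8 nt bind the primer
print(sequence_by_synthesis(example_template, primer_length=8, cycles=5))  # TCGGC
```

Because the 3' block permits only one addition per cycle, the number of cycles sets the read length directly in this model.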
- the disclosure is not intended to be limited to use of the sequencing method outlined above, as essentially any sequencing methodology which relies on successive incorporation of nucleotides into a polynucleotide chain can be used. Suitable alternative techniques include, for example, Pyrosequencing™, FISSEQ (fluorescent in situ sequencing), MPSS (massively parallel signature sequencing) and sequencing by ligation-based methods.
- the target polynucleotide to be sequenced using the method of the disclosure may be any polynucleotide that it is desired to sequence.
- Using the template library preparation method described in detail herein it is possible to prepare template libraries starting from essentially any double or single-stranded target polynucleotide of known, unknown or partially known sequence. With the use of clustered arrays prepared by solid-phase amplification it is possible to sequence multiple targets of the same or different sequence in parallel.
- kits comprising the non-homologous end joining factors disclosed herein.
- the kits can be tailored for use in particular applications.
- the kits can be directed to the use of non-homologous end joining factors, or use of structure specific endonucleases in preparing libraries of adaptor-polynucleotide constructs using the methods of the disclosure.
- Such kits can comprise at least a supply of adaptors as defined.
- the kits can further comprise enzymes (e.g., structure specific endonucleases or non-homologous end joining factors) and/or amplification primers.
- the structure and properties of amplification primers will be well known to those skilled in the art.
- Adaptors included in the kit can be readily prepared using standard automated nucleic acid synthesis equipment and reagents in routine use in the art.
- the adaptors may be supplied in the kits ready for use, or more preferably as concentrates requiring dilution before use, or even in a lyophilized or dried form requiring reconstitution prior to use.
- the kits may further include a supply of a suitable diluent for dilution or reconstitution of the adaptors.
- the kits may further comprise supplies of reagents, buffers, enzymes, for use in carrying out the methods disclosed herein.
- Further components which may optionally be supplied in the kit include “universal” sequencing primers suitable for sequencing templates prepared using the adaptors and primers.
- a transposome was constructed that contained LNA modifications in its transferred strand and was used to tagment genomic DNA.
- the transposome was immobilized on a streptavidin paramagnetic bead via a 3’ biotin group on an ‘anchor’ oligo hybridized to the transferred strand.
- the tagmentation was conducted in the presence of a ligase enzyme and an Illumina™ Indexing primer P5-i5-A14. This primer hybridized 5’ of the transferred strand and was ligated to it by virtue of the 5’ end of the transferred strand bearing a phosphate moiety by design.
- the Tn5 protein was then removed by denaturing it with a solution of SDS. Different SDS concentrations (%) were tested to effect complete removal of the Tn5 protein.
- a mixture of a non-displacing polymerase (tTaq608), dNTPs, Q5 polymerase and a template switching oligo was added and incubated at 47 °C to deanneal the non-transferred strand and extend with tTaq608 pol as far as the anchor oligo. The temperature was then raised further (60-70°C) to the point where the templates were rendered single stranded and no longer attached to the beads.
- the temperature was then lowered to 42 °C for 1 min to allow the template switch oligo containing the P7-i7-B15 sequences to hybridize.
- Q5 polymerase then extended from the free 3’ end of the fragment to copy the P7-i7-B15, thus completing the template construct.
- a qPCR reaction was performed to quantify how much completed template was present. The graph indicates that up to 2000 pM of correct product was formed under SDS concentrations that removed all of the Tn5 from the tagmented product complex (see FIG. 29).
- the template switch oligo comprised either a free extendable -OH group at its 3’ end or a non-extendable blocked dideoxyC group at its 3’ end, or a non-extendable ‘inverted T’ blocking group at its 3’ end.
- the P5’ end of the template is digested and replaced with the P7’.
- the products of the reaction were subjected to analysis by ‘gel size exclusion’ electrophoresis and by a qPCR reaction which only amplifies and quantifies templates that have a P5 adaptor at their 5’ end and a P7’ adaptor at their 3’ end.
- FIG. 33C presents the image of the gel electrophoresis and indicates that neither of the two 3’ blocked template switch oligos was consumed, indicating that the block is effective in preventing extension from the 3’ end of the template switch oligo, which would result in creating a copy of the entire template.
- extension and copying of the entire template occurred as evident from the reduction in fluorescence intensity of the template switch oligo band and the appearance of higher molecular weight product labelled with FAM.
- FIG. 33D presents the results of the qPCR analysis that only detects product that is correctly appended with P5 on the 5’ end and P7’ on the 3’ end, but not product appended with P5 on both ends.
- Both blocked template switch oligos yielded the correct product at approximately 1,700 pM.
- the unblocked template switch oligo produced 1.5x as much product as a result of two mechanisms: (i) ‘fork’-modulated switch in activity from exonuclease to extension activity to append the P7’ adaptor to the 3’ end of the template, and (ii) extension from the 3’ end of the template switch oligo to append a copy of the template to the P7 template switch oligo.
- the P5’ end of the template is digested and replaced with the P7’ copied from the template switch oligo (see FIG. 33E).
- a control experiment was also performed using a transposome comprising a P5 transferred-strand hybridized to a bead via an anchor oligo and a non-transferred strand comprising a single stranded 3’ end that is complementary to a P7 oligo (see FIG. 33F).
- FIG. 33G indicates that the template-switch workflow of FIG. 33E produced a greater yield of library than the control workflow of FIG. 33F.
Abstract
The disclosure relates to methods for appending adaptors to the 5' and/or 3' ends of polynucleotides.
Description
METHODS FOR APPENDING ADAPTORS ONTO POLYNUCLEOTIDES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S. Provisional Application No. 63/592,016, filed October 20, 2023, which is incorporated herein by reference in its entirety.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0002] The content of the electronically submitted sequence listing (Name: 4213_010PC01_SequenceListing_ST26.xml; Size: 53,841 bytes; and Date of Creation: October 15, 2024), filed with the application, is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0003] The disclosure relates to methods for appending adaptors to the 5’ and/or 3’ ends of polynucleotides.
BACKGROUND
[0004] Library preparation aims to build a collection of DNA fragments for next-generation sequencing (NGS). A high-quality DNA library guarantees uniform and consistent genome coverage, thus delivering comprehensive and reliable sequencing data. The conversion of sample DNA to library DNA can be inefficient using standard ligation methodologies, however.
[0005] Next generation sequencing (NGS) typically requires library preparation, where known adaptor DNA sequences are added to the target DNA to be sequenced. Traditionally, this requires that sample DNA is fragmented, end-repaired, and then ligated to the adaptor DNA. While ligation-mediated library prep can yield the highest quality genomes, the conversion of sample DNA to library DNA can be inefficient. In cases where the quantity of sample DNA is in short supply, this poor efficiency makes ligation-mediated library prep more challenging or even infeasible.
[0006] Traditional methods for target enrichment for NGS broadly fall into two categories: amplicon based or probe-based hybridization/pulldown. The former employs primer pairs and PCR to amplify targets from a sample; it is simple and fast but limited in its ability to multiplex very high numbers of targets due to PCR mispriming events. It is also restricted in the size of amplicon that can be produced due to the limits of current PCR technology. Other disadvantages of PCR such as sequence bias or polymerase slippage can also impact the performance scope. Hybridization approaches are generally longer in practice than amplicon methods but are virtually limitless in the number of targets that can be enriched. Poorer specificity arising from hybridization of a single probe only is mitigated by additional rounds of pulldown and/or increasing the probe length and Tm.
SUMMARY
[0007] The disclosure provides methods to append adaptors to the 5’ and/or 3’ ends of polynucleotides. The resulting adaptor-polynucleotide constructs can be then used in various applications, including NGS.
[0008] In a particular embodiment, the disclosure provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides, comprising: fragmenting gDNA or cDNA into polynucleotides that are less than 1000 base pairs in length; end repairing and phosphorylating the polynucleotides; attaching adaptors to the 5’ and 3’ ends of the end-repaired polynucleotides using non-homologous end joining factors. In a further embodiment, the gDNA or cDNA is fragmented by enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing. In yet a further embodiment, the gDNA or cDNA is fragmented by sonication. In another embodiment, the DNA fragments are enzymatically end repaired and phosphorylated by using T4 DNA polymerase and T4 polynucleotide kinase. In yet another embodiment, prior to attaching adaptors to the polynucleotides, a single ‘A’ deoxynucleotide is added to the end- repaired DNA fragments by use of Klenow enzyme which lacks exonuclease activity. In a certain embodiment, the adaptors comprise a 3' overhang of a ‘T’ deoxynucleotide. In a further embodiment, the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch. In yet a further embodiment, the adaptors are Y- shaped or U-shaped. In another embodiment, the single stranded regions of the adaptors comprise one or more of the following sequences: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) and P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet another embodiment, prior to attaching adaptors to the end-repaired polynucleotides, oligonucleotides are added to the 3’ ends of the DNA fragments with terminal transferase. In a
further embodiment, the adaptors comprise an overhang of base pairs that are complementary to the oligonucleotides added to the 3’ ends of the DNA fragments. In yet a further embodiment, the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch. In another embodiment, the adaptors are Y-shaped or U-shaped. In yet another embodiment, the single stranded regions of the adaptors comprise one or more of the following sequences: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) and P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In a further embodiment, the non-homologous end joining factors are LigD and Ku, or an engineered variant thereof. In yet a further embodiment, the LigD and Ku are from, or derived from, Mycobacterium. In a certain embodiment, a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to SEQ ID NO: 1 to 20 and has LigD activity. In another embodiment, a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to SEQ ID NO: 21 to 30 and has Ku activity. In yet another embodiment, the engineered variant of LigD lacks exonuclease activity. In a further embodiment, the engineered variant has the sequence of SEQ ID NO: 1 with the following substitution: H373A.
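As an illustration of the complementary strand that a double stranded adaptor region would need, the following minimal sketch computes the reverse complement of the P5 and P7 sequences given above (SEQ ID NO: 32 and 33). The helper name `reverse_complement` is an assumption introduced for this example only and is not part of the disclosure.

```python
# P5 and P7 as written in the text (SEQ ID NO: 32 and 33), with spaces removed.
P5 = "AATGATACGGCGACCACCGA"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in (("P5", P5), ("P7", P7)):
        print(name, len(seq), "nt; reverse complement:", reverse_complement(seq))
```

A strand that pairs with P5 or P7 would carry the corresponding reverse complement; in a Y-shaped adaptor, only the double stranded region would be paired in this way, while the single stranded arms remain mismatched.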
[0009] In a particular embodiment, the disclosure also provides a method to append an adaptor to the 5’ end of a polynucleotide, comprising the steps of: (1) hybridizing a 5’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap; (2) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product; and (3) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide. In a further embodiment, the method further comprises appending a second adaptor to the 3’ end of the polynucleotide, comprising the steps of: (4) hybridizing a 3’ flap adaptor to the polynucleotide of (3) to form a second hybridized product comprising a 3’ flap; (5) contacting the second hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptors; and (6) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang of the clipped hybridized product to form a polynucleotide comprising adaptors at the 5’ and 3’ ends. In an alternate embodiment, the disclosure also provides a method of appending an adaptor to the 3’ end of a polynucleotide, comprising the steps
of: (A) hybridizing a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 3’ flap; (B) contacting the hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptor; and (C) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang to form a polynucleotide with an adaptor appended to the 3’ end. In a further embodiment, the method further comprises appending a second adaptor to the 5’ end of the polynucleotide, comprising the steps of: (D) hybridizing a 5’ flap adaptor to the polynucleotide of (C) to form a second hybridized product comprising a 5’ flap; (E) contacting the second hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the second hybridized product to form a nicked hybridized product; and (F) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide.
In yet another alternate embodiment, the disclosure further provides a method of appending adaptors to the 5’ and 3’ ends of a polynucleotide, comprising: (i) hybridizing a 5’ flap adaptor and a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap and a 3’ flap; (ii) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product; (iii) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide; (iv) contacting the ligated product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptors; and (v) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang of the clipped hybridized product to form a polynucleotide comprising adaptors at the 5’ and 3’ ends. In a further embodiment, the 5’ flap adaptor comprises a double stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the double stranded adaptor region comprises a universal sequence. In another embodiment, the base-pair of the double stranded adaptor region adjacent to the single stranded probe region also matches the target sequence of the polynucleotide. In yet another embodiment, the universal sequence is a sequence that is commonly used to generate sequence
reads using a next generation sequencing platform. In a further embodiment, the structure-specific endonuclease that has 5’ flap cleavage activity is FEN1. In yet a further embodiment, the ligase is selected from T4 DNA ligase, T7 DNA ligase, and Hi-T4 DNA ligase. In another embodiment, the 3’ flap adaptor comprises a single stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the single stranded adaptor region comprises a universal sequence. In yet another embodiment, the universal sequence is a sequence that is commonly used to generate sequence reads using a next generation sequencing platform. In a certain embodiment, the structure-specific endonuclease that has 3’ flap cleavage activity is XPF/MUS81. In another embodiment, the 5’ flap adaptor and/or the 3’ flap adaptor comprises a bar code sequence. In yet another embodiment, the polynucleotides comprising 3’ and/or 5’ adaptors come from different genetic or polynucleotide sources and the source of the polynucleotides can be identified based upon the bar code sequence.
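The 5’ flap steps (1) to (3) above can be pictured with a simplified string model. The sketch below is an illustration under stated assumptions, not the disclosed protocol: sequences are plain 5’ to 3’ strings, the probe must match the target exactly, the single-nucleotide overlap at the adaptor junction is ignored, and the function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def append_5prime_adaptor(target, probe, universal):
    """Model hybridization, 5' flap cleavage, and ligation on strings.

    target    -- single stranded polynucleotide, 5' to 3'
    probe     -- single stranded probe region of the adaptor, 5' to 3'
    universal -- adaptor strand that ends up joined to the target's new 5' end
    Returns (flap, product): the cleaved 5' flap and the ligated product.
    """
    site = reverse_complement(probe)   # target sequence the probe pairs with
    start = target.upper().find(site)  # step 1: hybridization position
    if start < 0:
        raise ValueError("probe is not complementary to any site in the target")
    flap = target[:start]              # step 2: unpaired 5' flap is cleaved off
    product = universal + target[start:]  # step 3: nick sealed by ligation
    return flap, product


if __name__ == "__main__":
    # Hypothetical target and probe; the universal sequence is P5 from the text.
    flap, product = append_5prime_adaptor(
        target="GGGTTTCATTGCAGCCCAAA",
        probe="CTGCAATG",
        universal="AATGATACGGCGACCACCGA",
    )
    print("cleaved flap:", flap)
    print("ligated product:", product)
```

In this model everything on the target that lies 5’ of the probe binding site is treated as the flap, so the choice of probe determines where the adaptor is joined.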
[0010] The disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form an adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising: an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions; (b) annealing a replacement oligonucleotide that comprises one or more locked nucleic acids (LNAs) to the polynucleotide comprising the 5' adaptor; (c) extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide using a non-strand displacing polymerase and dNTPs, wherein the extended product comprises a binding region for a template switch oligonucleotide; (d) denaturing and removing the replacement oligonucleotide to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region; (e) annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and (f) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide. In a further embodiment, the adaptor strand may or may not have nucleotide modifications. In yet a further embodiment, the adaptor strand comprises one or more LNAs. In another embodiment, the non-transferred strand can be denatured and removed using mild heat followed by a hot wash. In yet another embodiment, the replacement oligonucleotide has a higher Tm than the non-transferred strand. In a further embodiment, the
replacement oligonucleotide partially hybridizes to the same region as the non-transferred strand, resulting in the 5' portion of the polynucleotide being single stranded upstream of the replacement oligonucleotide. In yet a further embodiment, the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase. In another embodiment, the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP. In yet another embodiment, the Tn5 transposome is immobilized on a streptavidin paramagnetic bead. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
[0011] In a certain embodiment, the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a replacement oligonucleotide that comprises locked nucleic acids (LNAs) that remains hybridized to the adaptor under moderate denaturing conditions, and a non-transferred strand that can be removed under moderate denaturing conditions; (b) denaturing under moderate denaturing conditions to remove the non-transferred strand and extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide comprising LNAs using a non-strand displacing polymerase and dNTPs, wherein the extended product comprises a binding region for a template switch oligonucleotide; (c) denaturing and removing the replacement oligonucleotide comprising LNAs to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region; (d) annealing a template switch oligo that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and (e) extending from the template switch oligo at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide. In a further embodiment, the adaptor strand may or may not have nucleotide modifications. In yet a further embodiment, the adaptor strand comprises one or more LNAs. In another embodiment, the non-transferred strand is from 15 bp to 20 bp in length. In yet another embodiment, the non-transferred strand can be denatured and removed using mild heat followed by a hot wash, wherein the replacement oligonucleotide remains hybridized to the adaptor under these conditions. In a further embodiment, the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase. In yet a further embodiment, the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
[0012] The disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form an adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand which contains linkage(s) that resist exonuclease activity that remains hybridized to the adaptor strand when the adaptor strand is appended to the 5' end of the polynucleotide; (b) extending the polynucleotide comprising the 5' adaptor with a polymerase with 5’ to 3’ exonuclease activity ("5' exo polymerase"), wherein the 5' exo polymerase digests the hybridized non-appending strand up to the internal phosphorothioates linkage(s) to form a binding region for a template switch oligonucleotide; (c) denaturing and removing the nonappending strand to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region; (d) annealing a template switch oligo that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and (e) extending from the template switch oligo at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide. In another embodiment, the adaptor strand may or may not have nucleotide modifications. In yet another embodiment, the adaptor strand comprises one or more LNAs. In a further embodiment, the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s), carbophosphonate linkage(s), pyridylphosphonate (PyrP) functionalized linkage(s), aminomethyl (AMP) or aminoethyl phosphonate (AEP) functionalized linkages, boranophosphate (BP) linkage(s), methylphosphonothioates (MPS) linkage(s),
phosphorodithioates (SPS) linkage(s), thiophosphoramidates (NPS) linkage(s), boranomethylphosphonates (BMP) linkage(s), guanidine (GUA) linkage(s), morpholino phosphorodiamidate (PMO) linkage(s), and/or carbamate linkage(s). In a certain embodiment, the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s). In another embodiment, the polymerase with 5’ to 3’ exonuclease activity is selected from a Taq-based polymerase and a Bst-based polymerase. In yet another embodiment, the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP. In a further embodiment, the Tn5 transposome is immobilized on a streptavidin paramagnetic bead. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
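Step (b) of the method above relies on a 5' exo polymerase digesting the hybridized strand only up to the exonuclease-resistant linkage(s). The sketch below is a deliberately simplified model of that stopping behavior. It assumes a notation in which `*` after a base marks a resistant linkage on that base's 3' side, and it assumes digestion halts exactly at the first such linkage; real enzymes may stop a nucleotide or more before or after that point. The function name and example oligonucleotide are hypothetical.

```python
def digest_5_to_3(oligo):
    """Model 5'-to-3' exonuclease digestion of an oligonucleotide.

    `oligo` is written 5' to 3'; a '*' after a base marks a resistant
    linkage (for example a phosphorothioate) on the 3' side of that base.
    Returns (removed, remaining) as plain base strings.
    """
    bases = []
    resistant = []  # resistant[i] is True if the linkage 3' of base i resists cleavage
    for ch in oligo.upper():
        if ch == "*":
            if not bases:
                raise ValueError("'*' must follow a base")
            resistant[-1] = True
        elif ch in "ACGT":
            bases.append(ch)
            resistant.append(False)
        else:
            raise ValueError("unexpected character: " + ch)
    # Removing base i requires cleaving the linkage on its 3' side, so
    # digestion stops at the first base that is followed by a resistant linkage.
    stop = next((i for i, r in enumerate(resistant) if r), len(bases))
    return "".join(bases[:stop]), "".join(bases[stop:])


if __name__ == "__main__":
    removed, remaining = digest_5_to_3("ACGTAC*G*TTGCA")
    print("digested:", removed)
    print("protected remainder:", remaining)
```

Under these assumptions, the position of the first resistant linkage sets how much of the hybridized strand is removed, and therefore how much of the opposite strand is exposed.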
[0013] In a particular embodiment, the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form an adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor is appended to the 5' end of the polynucleotide, and wherein 5' adaptor strand and the non-transferred strand comprises a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases; (b) extending the polynucleotide comprising the 5' adaptor with a non-strand displacing polymerase and dNTPs up to the hybridized adaptor region; (c) extending the polynucleotide comprising the 5' adaptor with a strand displacing polymerase with the dNTPs for the base pairs found only in the template switch oligonucleotide binding region so as to form a polynucleotide that comprises a 5' adaptor and the template switch oligonucleotide binding region; (d) denaturing the polynucleotide of (c) and annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and (e) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the
polynucleotide. In another embodiment, the adaptor strand may or may not have nucleotide modifications. In yet another embodiment, the adaptor strand comprises one or more LNAs. In a further embodiment, the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase. In another embodiment, after step (b), the dNTPs and polymerase are removed prior to step (c). In yet another embodiment, the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead. In a further embodiment, the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase. In yet a further embodiment, the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
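The method above uses a template switch oligonucleotide binding region that lacks one of the four nucleobases, so that extension with a restricted dNTP mix is confined to that region. The sketch below illustrates one reading of that idea: it derives which dNTPs a template region requires and models extension stalling at the first template base whose complementary dNTP was not supplied. The function names and the example sequences are hypothetical, and the template is written in the order the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def dntps_for_region(template_region):
    """Return the set of dNTP bases needed to copy the given template region."""
    return {COMPLEMENT[base] for base in template_region.upper()}


def extend(template_as_read, supplied):
    """Copy the template base by base until a required dNTP is missing.

    template_as_read -- template bases in the order the polymerase meets them
    supplied         -- set of dNTP bases present in the reaction
    Returns the newly synthesized strand in the order of synthesis (5' to 3').
    """
    product = []
    for base in template_as_read.upper():
        needed = COMPLEMENT[base]
        if needed not in supplied:
            break  # extension stalls: this dNTP was left out of the mix
        product.append(needed)
    return "".join(product)


if __name__ == "__main__":
    binding_region = "ACCATCAAC"  # hypothetical region containing no G
    downstream = "GTAC"           # hypothetical sequence beyond the region
    mix = dntps_for_region(binding_region)
    print("dNTPs to supply:", sorted(mix))
    print("extension product:", extend(binding_region + downstream, mix))
```

Because the example region contains no G, dCTP is never needed to copy it, and leaving dCTP out of the mix stops the modeled extension at the first G beyond the region.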
[0014] In a particular embodiment, the disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form an adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor strand is appended to the 5' end of the polynucleotide, and wherein 5' adaptor strand and the non-transferred strand comprises a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases; (b) extending the polynucleotide comprising the 5' adaptor with a non-strand displacing polymerase and dNTPs up to the hybridized adaptor region; (c) removing the non-transferred strand by selective denaturation; (d) extending the polynucleotide comprising the 5' adaptor with a strand displacing polymerase with the dNTPs for the base pairs found only in the template switch oligonucleotide binding region, to form a polynucleotide that comprises a 5' adaptor and the template switch oligonucleotide binding region; (e) denaturing the polynucleotide of (d) and annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch
oligonucleotide binding region of the polynucleotide; and (f) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide. In another embodiment, the adaptor strand may or may not have nucleotide modifications. In yet another embodiment, the adaptor strand comprises one or more LNAs. In a further embodiment, the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase. In yet a further embodiment, after step (c), the dNTPs and polymerase are removed prior to step (e). In another embodiment, the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead. In yet another embodiment, for step (c) the non-transferred strand is removed using moderate heat to selectively denature the strand, or by application of a lambda exonuclease that selectively digests oligonucleotides containing 5’ phosphorylated ends. In a further embodiment, the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase. In yet a further embodiment, the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the polynucleotides are used as templates for sequencing.
[0015] In a particular embodiment, the disclosure provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form an adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions, wherein the adaptor comprises a complementary template switch binding domain; (b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end; (c) denaturing and annealing to the template switch binding domain a template switch oligonucleotide that is blocked at its 3' end, wherein the template switch oligonucleotide comprises a complementary sequence to the
template switch binding domain and comprises an adaptor region that is not complementary to the sequence of the polynucleotide, leaving the 3' end region of the polynucleotide unhybridized; (d) providing a polymerase that has 3' to 5' exonuclease activity that first removes the unhybridized 3' end region of the polynucleotide and then extends from the complementary adaptor region of the template switch oligonucleotide to form a 3' adaptor on the polynucleotide. In a further embodiment, the adaptor strand may or may not have nucleotide modifications. In yet a further embodiment, the adaptor strand comprises one or more LNAs. In another embodiment, the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase. In yet another embodiment, the polymerase that has 3' to 5' exonuclease activity is selected from a Pfu-based polymerase, a phi29-based polymerase, and E. coli DNA polymerase II. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
[0016] In a particular embodiment, the disclosure also provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions, wherein the adaptor comprises a complementary template switch binding domain; (b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end; (c) denaturing and annealing to the template switch binding domain a template switch oligonucleotide that is blocked at its 3' end, wherein the template switch oligonucleotide comprises a complementary sequence to the template switch binding domain and comprises an adaptor region that is not complementary to the sequence of the polynucleotide, leaving the 3' end region of the polynucleotide unhybridized; (d) providing a structure specific endonuclease that nicks the unhybridized 3' end region of the polynucleotide and then a polymerase extends from the complementary adaptor region of the template switch oligonucleotide to form a 3' adaptor on the polynucleotide. In another embodiment, the adaptor strand may or may not have nucleotide modifications. In yet another
embodiment, the adaptor strand comprises one or more LNAs. In a further embodiment, the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase. In yet a further embodiment, the structure specific endonuclease is XPF/Mus81. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
[0017] In a certain embodiment, the disclosure further provides a method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising: (a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that comprises a single stranded 3’ end that has a sequence complementary to an oligonucleotide comprising a 3’ complementary adaptor sequence; (b) annealing an oligonucleotide comprising the 3’ complementary adaptor sequence to the non-transferred strand to form a polynucleotide comprising the 5’ adaptor, the non-transferred strand, and the oligonucleotide comprising the 3’ complementary adaptor sequence; (c) incubating the polynucleotide comprising the 5’ adaptor and non-transferred strand of step (b) with a non-strand displacing polymerase to extend the 3' ends of the non-transferred strand and polynucleotide, and with a ligase to ligate the 3’ ends of the polynucleotide to the 5’ ends of the non-transferred strand to form a 3’ adaptor on the polynucleotide, wherein the ligation reaction and extension reaction are carried out in the same reaction; and (d) removing the oligonucleotide under denaturing conditions to form a polynucleotide that comprises a 5’ adaptor and a 3’ adaptor. In a further embodiment, the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase. In yet another embodiment, the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate. In a further embodiment, the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7: P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32) or P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33). In yet a further embodiment, the adaptor-polynucleotide constructs are used as templates for sequencing.
[0018] The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
[0019] Figure 1 provides an overview of steps typically used for ligation-based library preparation that does not use PCR.
[0020] Figure 2 demonstrates the process in prokaryotes of Ku bridging DNA fragments and recruiting LigD for DNA repair.
[0021] Figure 3 provides an exemplary embodiment of the disclosure showing steps of an improved ligation-mediated library prep of the disclosure, where prokaryotic NHEJ factors Ku and LigD replace T4 ligase.
[0022] Figure 4 provides an exemplary embodiment of the disclosure wherein terminal transferase (TdT) is used to generate poly-nucleotide overhangs that could be joined, trimmed, and ligated by LigD.
[0023] Figure 5 provides a sampling of prokaryotic wild-type sequences for LigD (SEQ ID NOs: 1 to 20).
[0024] Figure 6 provides a sampling of prokaryotic wild-type sequences for Ku (SEQ ID NOs: 21 to 30).
[0025] Figure 7 shows how a 3’ flap or a 5’ flap structure can be formed on a sequence, depending on the sequence polarity.
[0026] Figure 8 demonstrates how FEN1 plays a central role in DNA replication in both eukaryotes and prokaryotes. FEN1 functions to remove single stranded 5’ flaps of DNA from Okazaki fragments that are generated on the lagging strand of the DNA replication fork. These flaps form when a primase generates an RNA primer that is used to extend a new DNA strand; multiple Okazaki fragments are generated, and when extension from the 3’ end of one fragment abuts the 5’ end of another, it displaces that 5’ end to form a flap structure.
[0027] Figure 9 demonstrates how FEN1 binds to a 5’ flap and cleaves it at its base to leave a nick in the DNA.
[0028] Figure 10 shows how a ligase seals the nick in DNA, thus generating a contiguous new long strand from the initial Okazaki fragments.
[0029] Figure 11 presents the preferred substrate for FEN1, which is a double-flap structure in which the 5’ end of one strand and the 3’ end of the abutting strand overlap so that both ends form flaps, and in which the 3’ flap is a single nucleotide long.
[0030] Figure 12 shows an embodiment of an adaptor that has a specific structure which is designed to work with FEN1. The adaptor comprises two oligonucleotides that when annealed together form a partially double stranded molecule. A ‘probe’ portion of this adaptor is single stranded and complementary to a target of interest in a genome. The double stranded portion of the adaptor comprises a universal sequence that can be, for instance, the sequences of adaptors for a DNA sequencing platform. The last base-pair next to the single stranded probe portion may also match the target in the genomic DNA.
[0031] Figure 13 shows that when the adaptor of FIG. 12 is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms.
[0032] Figure 14 demonstrates that the structure of FIG. 13 comprising a 5’ flap from the target DNA and a single nucleotide 3’ flap from the adaptor is a substrate for FEN1, which can cleave, leaving a nick that can be subsequently joined by a ligase. The result is an addition of an adaptor to the target DNA.
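By way of a non-limiting illustration, the cleavage and ligation outcome described for FIGs. 12-14 can be sketched as a simplified string model in Python. The function names and all sequences below are hypothetical and are not part of the disclosure. The sketch assumes exact complementarity and a single probe site, treats FEN1 cleavage as occurring exactly at the base of the 5’ flap, and ignores the single nucleotide 3’ flap.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fen1_append_adaptor(target: str, probe: str, universal: str):
    """Model FEN1 cleavage followed by ligation on a single-stranded target.

    target    -- target strand, written 5'->3'
    probe     -- single-stranded probe portion of the adaptor, 5'->3'
    universal -- adaptor strand whose 3' end is ligated to the target, 5'->3'
    Returns (product, flap), or None if the probe has no site on the target.
    """
    site = reverse_complement(probe)   # region of the target bound by the probe
    start = target.find(site)
    if start == -1:
        return None
    flap = target[:start]              # displaced 5' flap removed by FEN1
    product = universal + target[start:]  # ligase seals the remaining nick
    return product, flap


# Hypothetical sequences, chosen only for illustration.
target = "TTTTTT" + "ACGGATCCAT" + "GGC"
probe = reverse_complement("ACGGATCCAT")
product, flap = fen1_append_adaptor(target, probe, "CAAGCAGA")
print(product, flap)
```

In this model the universal sequence ends up at the 5’ end of the retained target sequence, and everything 5’ of the probe site is discarded as the flap.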
[0033] Figures 15A-15B provide illustrations of structures utilized by flap endonucleases in the XPF/MUS81 family of proteins. These flap endonucleases play a role in the repair of damage caused by UV light or DNA cross-linking. (FIG. 15A) Target structures for the XPF flap endonuclease comprise a fork in a DNA structure where the two branches of the fork comprise noncomplementary sequences. (FIG. 15B) The branches can be partially or fully double stranded or contiguous as in the example of a hairpin loop.
[0034] Figure 16 demonstrates that cleavage by XPF/MUS81 flap endonuclease occurs within a few bases of the commencement of the 3’ flap, generating a nick.
[0035] Figure 17 provides an embodiment of an exemplary adaptor having a specific structure that can be used with an XPF/MUS81 3’ flap endonuclease. The adaptor comprises, at a minimum, a single oligonucleotide comprising a 3’ sequence complementary to a target of interest in a genome and a 5’ universal sequence that can be, for instance, used with massively parallel sequencing platforms.
[0036] Figure 18 demonstrates that when the adaptor of FIG. 17 is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms that is a substrate for an XPF/MUS81 3’ flap endonuclease.
[0037] Figure 19 shows that the flap structure formed in FIG. 18 can be cleaved by an XPF/MUS81 3’ flap endonuclease, leaving a nick that when extended with a polymerase copies the universal adaptor sequence to the target DNA.
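As a non-limiting illustration of FIGs. 17-19, the following Python sketch models the 3’ flap cleavage and polymerase extension as string operations. The function names, the `probe_len` parameter, and all sequences are hypothetical. The sketch assumes exact complementarity and a single probe site, and it places the cut exactly at the base of the 3’ flap, whereas FIG. 16 indicates that cleavage occurs within a few bases of that position.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def xpf_append_adaptor(target: str, adaptor: str, probe_len: int):
    """Model 3' flap cleavage followed by polymerase extension.

    target    -- single-stranded target, written 5'->3'
    adaptor   -- single oligo: 5' universal sequence followed by a 3' probe
    probe_len -- number of 3' bases of the adaptor that form the probe
    Returns (product, flap), or None if the probe has no site on the target.
    """
    if not 0 < probe_len < len(adaptor):
        raise ValueError("probe_len must leave both a probe and a universal part")
    universal, probe = adaptor[:-probe_len], adaptor[-probe_len:]
    site = reverse_complement(probe)   # region of the target bound by the probe
    start = target.find(site)
    if start == -1:
        return None
    end = start + len(site)
    flap = target[end:]                # unpaired 3' flap removed by the endonuclease
    # The new 3' end is extended using the universal sequence as template.
    product = target[:end] + reverse_complement(universal)
    return product, flap


# Hypothetical sequences, chosen only for illustration.
target = "GGC" + "ACGGATCCAT" + "TTTTT"
adaptor = "CAAGCAGA" + reverse_complement("ACGGATCCAT")
product, flap = xpf_append_adaptor(target, adaptor, probe_len=10)
print(product, flap)
```

Note that the target acquires the complement of the universal sequence at its 3’ end, because the sequence is copied by extension rather than ligated.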
[0038] Figure 20 demonstrates embodiments where, when an adaptor for FEN1 endonuclease that is complementary to a target #1 and an adaptor for XPF/MUS81 endonuclease that is complementary to a target #2, as shown in FIG. 13 and FIG. 18, are hybridized to DNA, for example genomic DNA, a structure forms that contains a 5’ flap and a 3’ flap.
[0039] Figure 21 demonstrates that when FEN1 and a ligase are added to the adaptors of FIG. 20, adaptor #1 will be appended to the 5’ end of the DNA of target #1.
[0040] Figure 22 demonstrates that when XPF/MUS81 endonuclease and a polymerase are added to the ligated adaptor/target of FIG. 21, the adaptor target will be copied and adaptor #2 will be appended to the 3’ end of the DNA of target #2.
[0041] Figure 23 demonstrates the possibility that achieving a double-flap structure may require sequential annealing of the individual oligos that comprise the FEN1 structure, such that a longer oligo that contains the probe #1 sequence anneals first to the target #1 followed by annealing of the shorter oligo complementary to the universal sequence of the adaptor. Such differential hybridization can be achieved by methods known to those skilled in the art, for example through design of the probe sequence and the universal sequences with different Tm.
[0042] Figure 24 demonstrates that the methods of the disclosure which utilize Flap endonucleases can be multiplexed to include many targets.
[0043] Figure 25 demonstrates a tagmentation process to append adaptors to 5' ends of polynucleotide fragments. The transposase enzyme Tn5 fragments polynucleotides and simultaneously appends adaptor sequences to the 5’ ends of the resulting polynucleotide fragments.
[0044] Figure 26 demonstrates a process to append adaptors to the 3' ends of tagmented polynucleotides. The free 3’ end of polynucleotide fragments can be extended in the presence of a polymerase and dNTPs. Either heat (e.g., > 68 °C) or a polymerase with strand displacement activity can be used to remove the ‘non-transferred’ strand of the fragment. The complement of the 5’ adaptor polynucleotide fragment is copied, and finally a PCR reaction with two distinct primers (e.g., P5-i5-A14 and P7-i7-B15) can be used to enrich for those dsDNA PCR products that have a 5' based adaptor on one end and a 3' based adaptor on the other end.
[0045] Figure 27 demonstrates an alternate process to append adaptors to the 3' ends of tagmented polynucleotides. A single double-stranded ‘forked’ adaptor is employed in the transposome and a non-displacing polymerase is used at a temperature below the Tm of the non-transferred strand (e.g., ~ 55 °C) to extend the free 3’ ends of the fragment until it reaches the 5’ end of the ‘non-transferred’ adaptor strand and then a ligase covalently connects the non-transferred strand to the fragment.
[0046] Figures 28A-28B demonstrate embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo. (FIG. 28A) DNA is tagmented with a transposome that may or may not have modifications to the transferred strand. The tagmented library is treated to de-anneal and remove the non-transferred strand of the transposome. Next a replacement oligo that has a higher Tm than the non-transferred strand is hybridized back to the appended adaptors. The replacement oligo does not hybridize in place of all the non-transferred strand, but instead does so partially. This results in a portion of the adaptor, 5’ of the replacement oligo, remaining single stranded. Next a non-displacing polymerase is used to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the adaptor sequence but stopping when it reaches the hybridized replacement oligo. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment. (FIG. 28B) A transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand. In this embodiment, the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases). Following tagmentation and removal of the Tn5, moderate heat can be used to denature the non-transferred strand leaving the ‘LNA containing replacement oligo’ still annealed. A non-displacing polymerase and polymerase reagents can be used in the foregoing denaturing step, or alternatively, in a separate step, to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the non-transferred strand but stopping when it reaches the hybridized replacement oligo. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
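The embodiments of FIGs. 23, 27 and 28 rely on oligos that de-anneal at different temperatures. As a rough, non-limiting illustration, the Python sketch below compares two oligos using the Wallace rule (2 °C per A/T and 4 °C per G/C), which is a swapped-in textbook approximation and not a method stated in the disclosure. It is only a coarse guide for short oligos, ignores salt and oligo concentration, and does not model the Tm increase contributed by LNA bases. The 16 base truncation shown is hypothetical, since the disclosure does not state which bases are omitted.

```python
def wallace_tm(seq: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    The estimate is the same for an oligo and its complement, so the
    recognition sequence can stand in for the non-transferred strand.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# 19 base Tn5 recognition sequence (SEQ ID NO:31).
full_length = "AGATGTGTATAAGAGACAG"
# Hypothetical 16 base truncation (first 16 bases), for illustration only.
truncated = full_length[:16]

print(wallace_tm(full_length), wallace_tm(truncated))
```

Under this approximation the shortened strand has the lower estimate, which is consistent with it being released by moderate heat while a more stable replacement oligo stays annealed.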
[0047] Figure 29 presents an example of data generated using the methods embodied in FIG. 28. In this example, a transposome was constructed that contained LNA modifications in its transferred strand and was used to tagment genomic DNA. The transposome was immobilized on a streptavidin paramagnetic bead via a 3’ biotin group on an ‘anchor’ oligo. The tagmentation was conducted in the presence of a ligase enzyme and an Illumina™ Indexing primer P5-i5-A14. This primer hybridized 5’ of the transferred strand and was ligated to it by virtue of the 5’ end of the transferred strand bearing a phosphate moiety by design. The Tn5 protein was then removed by denaturing it with a solution comprising the anionic detergent sodium dodecyl sulfate (SDS). Different SDS concentrations (%) were tested to effect complete removal of the Tn5 protein. Following a wash of the beads, a mixture of a non-displacing polymerase (tTaq608), dNTPs, Q5 polymerase and a template switching oligo was added and incubated at 47 °C to de-anneal the non-transferred strand and extend with tTaq608 pol as far as the anchor oligo. The temperature was then raised further (60-70 °C) to the point where the templates were rendered single stranded and no longer attached to the beads. The temperature was then lowered to 42 °C for 1 min to allow the template switch oligo containing the P7-i7-B15 sequences to hybridize. Q5 polymerase then extended from the free 3’ end of the fragment to copy the P7-i7-B15, thus completing the template construct. A qPCR reaction was performed to quantify how much completed template was present. The graph indicates that up to 2000 pM of correct product was formed under SDS concentrations that removed all of the Tn5 from the tagmented product complex.
[0048] Figure 30 demonstrates additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo. In this embodiment, a transposome is employed that contains one or more internal modifications in the non-transferred strand that prevent exonuclease digestion, for example a phosphorothioate linkage in the phosphodiester backbone of the oligo. Following tagmentation, a polymerase with a 5’ to 3’ exonuclease activity is employed to extend from the free 3’ end of the insert. As it extends it encounters the 5’ end of the non-transferred strand and digests the bases of this oligo until it encounters the modified linkage, at which point it extends no further. The remaining short portion of the non-transferred strand can be denatured by moderate heat. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
[0049] Figures 31A-31B demonstrate additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo. (FIG. 31A) A non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand. The polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead). A fresh aliquot of a strand displacing polymerase is added with just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent. This mix will continue to extend the 3’ end of the fragment across the Tn5 adaptor, which for the first 19 bases is a known Tn5 recognition sequence (e.g., 5’AGATGTGTATAAGAGACAG3’ (SEQ ID NO:31)), but will stop incorporating bases once it reaches a ‘T’ base in the template, due to the absence of dATP molecules. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end through the extension step described herein and comprising a sequence (e.g., 5’CTGTCTCTT3’ (SEQ ID NO:32)). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment. (FIG. 31B) A non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand. The polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead). The non-transferred strand is then removed, e.g., by moderate heat to selectively denature the strand, or by application of a lambda exonuclease that selectively digests oligos containing 5’ phosphorylated ends (as is the case with non-transferred strands), or by other means known to those skilled in the art. A fresh aliquot of a polymerase is added with just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent. This mix will continue to extend the 3’ end of the fragment across the Tn5 adaptor, which for the first 19 bases is a known invariant sequence (e.g., 5’AGATGTGTATAAGAGACAG3’ (SEQ ID NO:31)), but will stop incorporating bases once it reaches a ‘T’ base in the template, due to the absence of dATP molecules. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end through the extension step described herein and comprising a sequence (e.g., 5’CTGTCTCTT3’ (SEQ ID NO:32)). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
[0050] Figure 32 demonstrates an embodiment of the disclosure demonstrating how an adaptor can be appended to the 3’ end following extension by hybridizing an oligo that is partially complementary to the 5’ adaptor but contains further sequences that are unique and not present in the 5’ adaptor. A ligation reaction covalently joins this oligo to the 3’ end of the insert and forms a ‘Y’ shaped adaptor construct.
[0051] Figures 33A-33G demonstrate additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo, and data generated therefrom. (FIG. 33A) A single transposome type comprising a P5 transferred-strand and a short non-transferred-strand is used to tagment DNA. Following removal of the Tn5 post-tagmentation, a strand displacing polymerase is added to extend from the free 3’ end, create the complement of the P5 transferred-strand (i.e., create P5’), and displace the short non-transferred strand. The temperature is then elevated to make the fragments single stranded. A P7 template switch oligo hybridizes, forming a forked structure that is partially double-stranded. Then, in the presence of a polymerase with 3’ exo activity, the single stranded 3’ end is degraded by this activity until there is no longer any single stranded 3’ end. The remaining 3’ end of the fragment then forms a primer template that extends and creates the complement of the P7 template switch oligo. (FIG. 33B) Experiments that demonstrate a ‘proof of concept’ of the ‘fork’-modulated switch in activity from exonuclease to extension activity. Simple P5 transposomes were immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA. Following removal of the Tn5 and extension from the free 3’ end to create the complement of the P5 transferred-strand, the fragments were denatured and hybridized with a 5’ FAM fluorescently-labelled P7 template switch oligo. The template switch oligo comprised either a free extendable -OH group at its 3’ end, a non-extendable blocked dideoxyC group at its 3’ end, or a non-extendable ‘inverted T’ blocking group at its 3’ end. On addition of a polymerase and dNTPs, the P5’ end of the template is digested and replaced with the P7’. The products of the reaction were subjected to analysis by ‘gel size exclusion’ electrophoresis and by a qPCR reaction which only amplifies and quantifies templates that have a P5 adaptor at their 5’ end and a P7’ adaptor at their 3’ end. (FIG. 33C) Images of gel electrophoresis indicating that neither of the two 3’ blocked template switch oligos was consumed, indicating that the block is effective in preventing extension from the 3’ end of the template switch oligo, which would result in creating a copy of the entire template. In contrast, when a 3’ extendable template switch oligo was used, extension and copying of the entire template occurred, as evident from the reduction in fluorescence intensity of the template switch oligo band and the appearance of higher molecular weight product labelled with FAM. (FIG. 33D) Results of the qPCR analysis that only detects product that is correctly appended with P5 on the 5’ end and P7’ on the 3’ end, but not product appended with P5 on both ends. Both blocked template switch oligos yielded the correct product at approximately 1,700 pM. In contrast, the unblocked template switch oligo produced 1.5x as much product as a result of two mechanisms: (i) ‘fork’-modulated switch in activity from exonuclease to extension activity to append the P7’ adaptor to the 3’ end of the template, and (ii) extension from the 3’ end of the template switch oligo to append a copy of the template to the P7 template switch oligo. (FIG. 33E) A simple P5 transposome was immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA. Following removal of the Tn5 and extension from the free 3’ end to create the complement of the P5 transferred-strand, the fragments were denatured and hybridized with a P7 template switch oligo that comprised a 3’ end blocked with a ddC moiety that prevents incorporation and extension. On addition of a polymerase and dNTPs, the P5’ end of the template is digested and replaced with the P7’ copied from the template switch oligo. (FIG. 33F) A control experiment was performed using a transposome comprising a P5 transferred-strand hybridized to a bead via an anchor oligo and a non-transferred strand comprising a single stranded 3’ end that is complementary to a P7 oligo. Following tagmentation and removal of the Tn5 protein, a mixture of a polymerase, ligase and the P7 oligo was added and incubated to extend and ligate the free 3’ end of the template to the 5’ end of the non-transferred strand while simultaneously extending the 3’ end of the non-transferred strand to copy the P7 oligo, thus producing a completed template comprising a P5 5’ end and a P7’ 3’ end. Both libraries from the control and test transposomes were subjected to qPCR to assess yields. (FIG. 33G) The template-switch workflow of (FIG. 33E) produced a greater yield of library constructs than the control workflow of (FIG. 33F).
[0052] Figure 34 demonstrates additional embodiments of the disclosure to append adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo. A structure specific endonuclease (e.g., XPF/MUS81) can be employed as an alternative to exonuclease degradation of the 3’ single stranded of the forked structure following hybridization of the template switch oligo. In this embodiment the endonuclease nicks the double stranded region of the duplex creating a free 3’ end that a polymerase extends and creates the complement of the P7 template switch oligo.
[0053] The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the disclosure and, together with the detailed description, serve to explain the principles and implementations of the disclosure.
DETAILED DESCRIPTION
[0054] As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "an adaptor" includes a plurality of such adaptors and reference to "the DNA library" includes reference to one or more DNA libraries, and so forth.
[0055] Also, the use of "or" means "and/or" unless stated otherwise. Similarly, "comprise," "comprises," "comprising," "include," "includes," "including," "have," "has," and "having" are interchangeable and not intended to be limiting.
[0056] It is to be further understood that where descriptions of various embodiments use the term "comprising," those skilled in the art would understand that in some specific instances, an embodiment can be alternatively described using language "consisting essentially of" or "consisting of."
[0057] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice of the disclosed methods and compositions, the exemplary methods, devices and materials are described herein.
[0058] The term “library” merely refers to a collection or plurality of template molecules, which at their 5' and 3' ends typically comprise added on adaptor sequences. Use of the term “library” to refer to a collection or plurality of template molecules should not be taken to imply that the templates making up the library are derived from a particular source, or that the “library” has a particular composition. By way of example, use of the term “library” should not be taken to imply that the individual templates within the library must be of different nucleotide sequence or that the templates be related in terms of sequence and/or source.
[0059] In various embodiments the disclosure encompasses formation of so-called “monotemplate” libraries, which comprise multiple copies of a single type of template molecule, each having added on adaptor sequences at their 5' ends and their 3' ends, as well as “complex” libraries wherein many, if not all, of the individual template molecules comprise different target sequences (as defined below), where each template molecule has added on adaptor sequences at their 5' ends and their 3' ends. Such complex template libraries may be prepared using the method of the disclosure starting from a complex mixture of target polynucleotides such as (but not limited to) random genomic DNA fragments, cDNA libraries, etc. The disclosure also extends to “complex” libraries formed by mixing together several individual “monotemplate” libraries, each of which has been prepared separately using the method of the disclosure starting from a single type of target molecule (i.e., a monotemplate). In a particular embodiment more than 50%, or more than 60%, or more than 70%, or more than 80%, or more than 90%, or more than 95% of the individual polynucleotide templates in a complex library may comprise different target sequences.
[0060] Use of the term “template” to refer to individual polynucleotide molecules in the library merely indicates that one or both strands of the polynucleotides in the library are capable of acting as templates for template-dependent nucleic-acid polymerization catalyzed by a polymerase. Use of this term should not be taken as limiting the scope of the invention to libraries of polynucleotides which are actually used as templates in a subsequent enzyme-catalyzed polymerization reaction.
[0061] The term “unmatched region” refers to a region of the adaptor wherein the sequences of the two polynucleotide strands forming the adaptor exhibit a degree of noncomplementarity such that the two strands are not capable of annealing to each other under standard annealing conditions for a primer extension or PCR reaction. The two strands in the unmatched region may exhibit some degree of annealing under standard reaction conditions for an enzyme-catalyzed ligation reaction, provided that the two strands revert to single stranded form under annealing conditions.
[0062] A DNA sequencing library is generally formed by ligating adaptor polynucleotide molecules to the 5' and 3' ends of one or more target polynucleotide duplexes (which may be of known, partially known or unknown sequence) to form adaptor-target constructs and then carrying out an initial primer extension reaction in which extension products complementary to both strands of each individual adaptor-target construct are formed. The resulting primer extension products, and optionally amplified copies thereof, collectively provide a library of template polynucleotides. The library of template polynucleotides can then be sequenced using next generation sequencing. To save resources, multiple libraries can be pooled together and sequenced in the same run, a process known as multiplexing. During adaptor ligation, unique index sequences, or “barcodes,” can be added to each library. These barcodes are used to distinguish between the libraries during data analysis.
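As a non-limiting illustration of how barcodes distinguish pooled libraries during data analysis, the following Python sketch assigns reads to libraries by exact index match. The function name, the barcode sequences, the library names, and the read representation as (index, insert) pairs are all hypothetical. Practical demultiplexing tools typically also tolerate a limited number of index mismatches, which this sketch does not attempt.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_library):
    """Group insert sequences by the library their index sequence maps to.

    reads              -- iterable of (index_sequence, insert_sequence) pairs
    barcode_to_library -- dict mapping each barcode to a library name
    Reads whose index matches no barcode are placed in 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        library = barcode_to_library.get(index_seq, "undetermined")
        bins[library].append(insert_seq)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTACGT": "library_1", "TTAGGCAT": "library_2"}
pooled_reads = [
    ("ACGTACGT", "GATTACA"),
    ("TTAGGCAT", "CCCGGGA"),
    ("ACGTACGT", "TTGACCA"),
    ("GGGGGGGG", "AAAAAAA"),
]
print(demultiplex(pooled_reads, barcodes))
```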
[0063] It is generally advantageous for complex libraries of templates to be amplified by PCR (e.g., whole genome amplification) whether performed in solution or on a solid support, to include regions of “different” sequence at their 5' and 3' ends, which are nevertheless common to all template molecules in the library, especially if the amplification products are to be ultimately sequenced. For example, the presence of a common unique sequence at one end only of each template in the library can provide a binding site for a sequencing primer, enabling one strand of each template in the amplified form of the library to be sequenced in a single sequencing reaction using a single type of sequencing primer.
[0064] It will be appreciated that the ends of the amplification products may differ somewhat from the products of the initial primer extension reaction, since the former will be determined in part by the sequence of the PCR primer used to prime synthesis of a polynucleotide strand complementary to the initial primer extension product, whereas the latter will be determined solely by copying of the adaptor sequences at the 3' ends of the adaptor-template constructs in the initial primer extension.
[0065] In a particular embodiment, the disclosure provides methods that utilize non-homologous end joining factors (nhEJF) to append adaptors to polynucleotides. For such methods, the nhEJF adaptors added onto the double stranded polynucleotides typically comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch. In a particular embodiment, the nhEJF adaptors have a Y-shape, where the region of sequence mismatch causes the arms of the adaptor to separate from each other. The “double-stranded region” of the nhEJF adaptor is a short double-stranded region, typically comprising 5 or more consecutive base pairs, formed by annealing of the two partially complementary oligonucleotide strands. This term simply refers to a double-stranded region of nucleic acid in which the two strands are annealed and does not imply any particular structural conformation. In an alternate embodiment, the nhEJF adaptors, instead of having a Y-shape structure, are U-shaped, such that the nhEJF adaptors, once added to the ends of polynucleotides using nhEJFs in methods of the disclosure, form a continuous loop at the 5’ and 3’ ends of the templates. The resulting polynucleotides comprising the 5’ and 3’ adaptors can be amplified using rolling circle amplification.
[0066] Generally, it is advantageous for the double-stranded region of the nhEJF adaptors to be as short as possible without loss of function. By “function” in this context is meant that the double-stranded region forms a stable duplex under reaction conditions for the nhEJFs described herein, such that the two strands forming the nhEJF adaptor remain partially annealed during ligation of the nhEJF adaptor to a polynucleotide. It is not absolutely necessary for the double-stranded region to be stable under the conditions typically used in the annealing steps of primer extension or PCR reactions.
[0067] In another embodiment, identical nhEJF adaptors are added to both ends of each polynucleotide. The resulting polynucleotides will be flanked by complementary sequences derived from the double-stranded region of the nhEJF adaptors. The longer the double-stranded region (i.e., the complementary sequences of the adaptor-polynucleotide constructs), the greater the possibility that the adaptor-polynucleotide construct is able to fold back and base-pair to itself in these regions of internal self-complementarity when annealed for primer extension and/or PCR. Generally, the double-stranded region of the nhEJF adaptors comprises 5 base pairs (bps), 6 bps, 7 bps, 8 bps, 9 bps, 10 bps, 11 bps, 12 bps, 13 bps, 14 bps, 15 bps, 16 bps, 17 bps, 18 bps, 19 bps, 20 bps, or a range that includes or is between any two of the foregoing bps. The stability of the double-stranded region of the nhEJF adaptor may be increased, and hence its length potentially reduced, by the inclusion of non-natural nucleotides which exhibit stronger base-pairing than standard Watson-Crick base pairs.
[0068] In a particular embodiment, two strands of a nhEJF adaptor comprise base pairs that are 100% complementary to a sequence of the polynucleotide. It will be appreciated, however, that one or more nucleotide mismatches may be tolerated within the double-stranded region of the nhEJF adaptor, provided that the two strands are capable of forming a stable duplex under standard ligation conditions.
[0069] Alternatively, the nhEJF adaptors added onto the double stranded templates using the non-homologous end joining factors in methods of the disclosure comprise double stranded complementary sequences. The resulting adaptor/template molecules can then be amplified by PCR to form the DNA library templates. In a further embodiment, a splint oligonucleotide can be used to join the ends of polynucleotides comprising adaptors to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.
[0070] nhEJF adaptors for use in the methods disclosed herein will generally include a double-stranded region adjacent to the “ligatable” end of the nhEJF adaptor, i.e., the end that is joined to a target polynucleotide using the non-homologous end joining factors in methods of the disclosure. The ligatable end of a nhEJF adaptor may be blunt or, in other embodiments, short 5’ or 3' overhangs of one or more nucleotides may be present to facilitate/promote ligation. The 5' terminal nucleotide at the ligatable end of the nhEJF adaptor should be phosphorylated to enable phosphodiester linkage to a 3' hydroxyl group on the target polynucleotide.
[0071] The conditions encountered during the annealing steps of a primer extension reaction or PCR reaction will be generally known to one skilled in the art, although the precise annealing conditions will vary from reaction to reaction (see Sambrook el al.. 2001, Molecular Cloning, A Laboratory Manual, 3rd Ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor Laboratory Press, NY; Current Protocols, eds Ausubel et al.). Typically, such conditions may comprise, but are not limited to, (following a denaturing step at a temperature of about 94 °C for about one minute) exposure to a temperature in the range of from 40 °C to 72 °C (typically 50-68° C) for a period of about 1 minute in standard PCR reaction buffer.
[0072] Different annealing conditions may be used for a single primer extension reaction not forming part of a PCR reaction (again see Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 3rd Ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Current Protocols, eds. Ausubel et al.). Conditions for primer annealing in a single primer extension include, for example, exposure to a temperature in the range of from 30 °C to 37 °C in standard primer extension buffer. It will be appreciated that different enzymes, and hence different reaction buffers, may be used for a single primer extension reaction as opposed to a PCR reaction. There is no requirement to use a thermostable polymerase for a single primer extension reaction.
[0073] In a further embodiment, the nhEJF adaptors comprise a double stranded region and an unmatched region. The lower limit on the length of the unmatched region will typically be determined by function, for example the need to provide a suitable sequence for binding of a primer for primer extension, PCR and/or sequencing. Theoretically there is no upper limit on the length of the unmatched region, except that in general it is advantageous to minimize the overall length of the adaptor, for example in order to facilitate separation of unbound adaptors from adaptor-target constructs following the ligation step. Therefore, it is contemplated that the length of the unmatched region in each strand should be 20 nucleotides (nts), 25 nts, 30 nts, 35 nts, 40 nts, 45 nts, or 50 nts, or have a range of lengths that includes or is between any two of the foregoing nucleotide lengths.
[0074] In another embodiment, the overall length of the two strands forming a nhEJF adaptor will typically be 25 nts, 30 nts, 35 nts, 40 nts, 45 nts, 50 nts, 55 nts, 60 nts, 65 nts, 70 nts, 75 nts, 80 nts, 85 nts, 90 nts, 95 nts, 100 nts, 105 nts, 110 nts, 115 nts, 120 nts, 125 nts, 130 nts, 135 nts, 140 nts, 145 nts, 150 nts, or a range that is between or includes any two of the foregoing nucleotide lengths.
[0075] In a particular embodiment, the portions of the two strands forming the unmatched region of a nhEJF adaptor should preferably be of similar length, although this is not absolutely essential, provided that the length of each portion is sufficient to fulfil its desired function (e.g., primer binding). It has been shown by experiment that the portions of the two strands forming the unmatched region of a nhEJF adaptor may differ by up to 25 nucleotides without unduly affecting adaptor function.
[0076] In a particular embodiment, portions of the two polynucleotide strands forming an unmatched region of a nhEJF adaptor will be completely mismatched, or 100% non-complementary. However, some sequence “matches”, i.e., a lesser degree of non-complementarity, may be tolerated in this region without affecting function to a material extent. As aforesaid, the extent of sequence mismatching or non-complementarity is such that the two strands in the unmatched region remain in single-stranded form under annealing conditions as defined above.
[0077] The precise nucleotide sequence of the nhEJF adaptors is generally not material to the disclosure and may be selected by the user such that the desired sequence elements are ultimately included in the common sequences of the library of polynucleotides comprising the adaptors, e.g., to provide binding sites for particular sets of universal amplification primers and/or sequencing primers (e.g., P7 or P5 primers). Additional sequence elements may be included, for example to provide binding sites for sequencing primers which will ultimately be used in sequencing of template molecules in the library, or products derived from amplification of the template library, for example on a solid support. The nhEJF adaptors may further include “bar code” sequences, which can be used to bar code polynucleotides derived from a particular source.
[0078] Although the precise nucleotide sequence of the nhEJF adaptor is generally non-limiting to the disclosure, the sequences of the individual strands in the unmatched region should be such that neither individual strand exhibits any internal self-complementarity which could lead to self-annealing, formation of hairpin structures, etc., under standard annealing conditions. Self-annealing of a strand in the unmatched region is to be avoided as it may prevent or reduce specific binding of an amplification primer to this strand.
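As a rough illustration of the self-complementarity screen described above, the following minimal sketch scans a candidate unmatched-region strand for short segments that could pair with a downstream segment of the same strand and so seed a hairpin. The function names, the stem length of 4, and the minimum loop of 3 nucleotides are hypothetical choices made for this example only; they are not taken from the disclosure, and an exact-match scan of this kind is far cruder than a thermodynamic folding prediction.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def self_complementary_stems(seq: str, stem: int = 4, min_loop: int = 3):
    """List (i, j, kmer) where seq[i:i+stem] could base-pair with seq[j:j+stem].

    `stem` and `min_loop` are hypothetical thresholds chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        kmer = seq[i:i + stem]
        target = reverse_complement(kmer)
        # The partner segment must start after the stem plus a minimal loop.
        for j in range(i + stem + min_loop, len(seq) - stem + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j, kmer))
    return hits


# Made-up sequences: the first can fold back on itself, the second cannot.
hairpin_prone = self_complementary_stems("GGGCAAAAGCCC")
clean = self_complementary_stems("ACACACACACAC")
```

A strand that returns an empty list under this screen has no exact 4-nucleotide stem with the assumed loop size; a non-empty list flags positions worth redesigning before the strand is used as a primer-binding site.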
[0079] nhEJF adaptors are preferably formed from two strands of DNA, but may include mixtures of natural and non-natural nucleotides (e.g., one or more ribonucleotides) linked by a
mixture of phosphodiester and non-phosphodiester backbone linkages. Other non-nucleotide modifications may be included such as, for example, biotin moieties, blocking groups and capture moieties for attachment to a solid surface, as discussed in further detail below.
[0080] In general, a polynucleotide to which the adaptors are appended may be any polynucleotide that can be used with additional methodologies, including amplification by solid-phase PCR, next generation sequencing, subcloning, etc. Polynucleotides to which nhEJF adaptors are appended may originate in double-stranded DNA form (e.g., genomic DNA fragments) or may have originated in single-stranded form, as DNA or RNA, and been converted to dsDNA form prior to ligation. By way of example, mRNA molecules may be copied into double-stranded cDNAs suitable for use with nhEJF adaptors disclosed herein. The precise sequence of the polynucleotides is generally not material to the disclosure, and may be known or unknown. Modified polynucleotides including polynucleotides comprising non-natural nucleotides and/or non-natural backbone linkages could also be utilized in the methods of the disclosure, provided that the modifications do not preclude adding on nhEJF adaptors and/or copying in a primer extension reaction.
[0081] The non-homologous end joining factors and methods of the disclosure can be used with a single polynucleotide, or can be used with a mixture or plurality of polynucleotides. The non-homologous end joining factors in the methods of the disclosure may be used with multiple copies of the same polynucleotides (i.e., monotemplates) or with mixtures of different polynucleotides. The polynucleotides may differ from each other with respect to nucleotide sequence over the full length of the polynucleotide or only a part of the polynucleotide. A nhEJF-based method disclosed herein may be applied to a plurality of polynucleotides derived from a common source, for example a library of genomic DNA fragments derived from a particular individual. In one embodiment the target polynucleotides will comprise random fragments of human genomic DNA. The fragments may be derived from a whole genome or from part of a genome (e.g., a single chromosome or sub-fraction thereof), and from one individual or several individuals. The polynucleotides may be treated chemically or enzymatically either prior to, or subsequent to, the ligation of the nhEJF adaptor sequences. Techniques for fragmentation of genomic DNA include, for example, enzymatic digestion or mechanical shearing.
[0082] “Ligation” of nhEJF adaptors to 5' and 3' ends of each polynucleotide involves joining of the two polynucleotide strands of the nhEJF adaptor to a double-stranded target polynucleotide such that covalent linkages are formed between both strands of the two double-stranded molecules. In this context “joining” means covalent linkage of two polynucleotide strands which were not previously covalently linked. Preferably such “joining” will take place by formation of a phosphodiester linkage between the two polynucleotide strands but other means of covalent linkage (e.g., non-phosphodiester backbone linkages) may be used. However, the covalent linkages formed in the ligation reactions should allow for read-through of a polymerase, such that the resultant construct can be copied in a primer extension reaction using primers which bind to sequences in the regions of the adaptor-target construct that are derived from the nhEJF adaptor molecules.
[0083] The ligation reactions will typically be enzyme-catalyzed. In a particular embodiment, the ligation reactions will be catalyzed by the non-homologous end joining factors of the disclosure. Non-enzymatic ligation techniques (e.g., chemical ligation) may also be used, provided that the non-enzymatic ligation leads to the formation of a covalent linkage which allows read-through of a polymerase, such that the resultant construct can be copied in a primer extension reaction.
[0084] The desired products of the ligation reaction are adaptor-target constructs in which nhEJF adaptors are ligated at both ends of each target polynucleotide, giving the structure adaptor-polynucleotide-adaptor. Conditions of the ligation reaction should therefore be optimized to maximize the formation of this product, in preference to targets having an adaptor at one end only.
[0085] The products of the ligation reaction may be subjected to purification steps in order to remove unbound nhEJF adaptor molecules before the adaptor-polynucleotide constructs are processed further. Any suitable technique may be used to remove excess unbound nhEJF adaptors, examples of which will be described in further detail below.
[0086] Adaptor-polynucleotide constructs formed in the ligation reaction as discussed above are then subjected to an initial primer extension reaction in which a primer oligonucleotide is annealed to an adaptor portion of each of the adaptor-polynucleotide constructs and extended by sequential addition of nucleotides to the free 3' hydroxyl end of the primer to form extension products complementary to at least one strand of each of the adaptor-target constructs.
[0087] The term “initial” primer extension reaction refers to a primer extension reaction in which primers are annealed directly to the adaptor-polynucleotide constructs, as opposed to either complementary strands formed by primer extension using the adaptor-polynucleotide construct as a template or amplified copies of the adaptor-polynucleotide construct. In a certain embodiment, the initial primer extension reaction is carried out using a “universal” primer which binds specifically to a cognate sequence within an adaptor portion of the adaptor-polynucleotide construct, and is not carried out using a target-specific primer or a mixture of random primers. The use of an adaptor-specific primer for the initial primer extension reaction is key to formation of a library of polynucleotides which have a common sequence at the 5' end and a common sequence at the 3' end.
[0088] The primers used for the initial primer extension reaction will be capable of annealing to each individual strand of adaptor-polynucleotide constructs having adaptors ligated at both ends, and can be extended so as to obtain two separate primer extension products, one complementary to each strand of the construct. Thus, the initial primer extension reaction will result in formation of primer extension products complementary to each strand of each adaptor-target construct.
[0089] In a certain embodiment the primer used in the initial primer extension reaction will anneal to a primer-binding sequence (in one strand) in the unmatched region of the adaptor.
[0090] The term “annealing” as used in this context refers to sequence-specific binding/hybridization of the primer to a primer-binding sequence in an adaptor region of the adaptor-target construct under the conditions to be used for the primer annealing step of the initial primer extension reaction.
[0091] The products of the primer extension reaction may be subjected to standard denaturing conditions in order to separate the extension products from strands of the adaptor-polynucleotide constructs. Optionally the strands of the adaptor-polynucleotide constructs may be removed at this stage. The extension products (with or without the original strands of the adaptor-target constructs) collectively form a library of template polynucleotides which can be used, e.g., as templates for solid-phase PCR.
[0092] Optionally, the initial primer extension reaction may be repeated one or more times, through rounds of primer annealing, extension and denaturation, in order to form multiple copies of the same extension products complementary to the adaptor-target constructs.
[0093] In another embodiment, the initial extension products may be amplified by conventional solution-phase PCR, as described in further detail below. The products of such further PCR amplification may be collected to form a library of templates comprising “amplification products derived from” the initial primer extension products. In a certain embodiment both primers used for further PCR amplification will anneal to different primer-binding sequences on opposite strands in the unmatched region of the adaptor. Other embodiments may, however, be based on the use of a single type of amplification primer which
anneals to a primer-binding sequence in the double-stranded region of the adaptor. In embodiments of the method based on PCR amplification the “initial” primer extension reaction occurs in the first cycle of PCR.
[0094] Inclusion of an initial primer extension step (and optionally further rounds of PCR amplification) to form complementary copies of the adaptor-target constructs (prior to whole genome or solid-phase PCR) is advantageous, for several reasons. Firstly, inclusion of the primer extension step, and subsequent PCR amplification, acts as an enrichment step to select for adaptor-target constructs with adaptors ligated at both ends. Only target constructs with adaptors ligated at both ends provide effective templates for whole genome or solid-phase PCR using common or universal primers specific for primer-binding sequences in the adaptors, hence it is advantageous to produce a template library comprising only double-ligated targets prior to solid-phase or whole genome amplification. Alternatively, the method disclosed herein to make a template library is PCR-free. By being PCR-free, there is reduced library bias and fewer gaps due to preferential enrichment of certain adaptor/template constructs over others. The result is high data quality and optimal variant detection across the genome.
[0095] Secondly, inclusion of the initial primer extension step, and subsequent PCR amplification, permits the length of the common sequences at the 5' and 3' ends of the target to be increased prior to solid-phase PCR or sequencing. As outlined above, it is generally advantageous for the length of the adaptor molecules to be kept as short as possible, to maximize the efficiency of ligation and subsequent removal of unbound adaptors. However, for the purposes of solid-phase PCR or sequencing it may be an advantage to have longer common or “universal” sequences at the 5' and 3' ends of the templates to be amplified. Inclusion of the primer extension (and subsequent amplification) steps means that the length of the common sequences at one (or both) ends of the polynucleotides in the template library can be increased after ligation by inclusion of additional sequence at the 5' ends of the primers used for primer extension (and subsequent amplification). The use of such “tailed” primers is described in further detail below.
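The effect of a “tailed” primer on the length of the common sequence can be sketched in a few lines of code. The example below is a simplified string model, not a description of any particular primer design: the function name, the tail, the adaptor-specific portion, and the template are all made-up placeholders, and the model assumes perfect annealing at a single binding site and complete extension to the 5' end of the template strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def extend_with_tailed_primer(template: str, tail: str, specific: str) -> str:
    """Model one primer extension on `template` (written 5'->3').

    The primer is `tail + specific`; only `specific` anneals to the template.
    All sequences used here are hypothetical placeholders.
    """
    site = reverse_complement(specific)   # binding site as it appears on the template
    pos = template.rfind(site)
    if pos < 0:
        raise ValueError("primer-binding site not found in template")
    # The polymerase copies the template from the site toward the template's 5' end,
    # so the product is the primer followed by the complement of the upstream region.
    return tail + specific + reverse_complement(template[:pos])


SPECIFIC = "GGCATC"                     # made-up adaptor-specific 3' portion of the primer
TAIL = "TTTAAA"                         # made-up 5' tail that does not anneal
TEMPLATE = "AACCGGTTAC" + reverse_complement(SPECIFIC)

product = extend_with_tailed_primer(TEMPLATE, TAIL, SPECIFIC)
```

In this toy model the extension product is longer than the template strand by exactly the length of the tail, which is the sense in which the common end sequence grows after ligation while the ligated adaptor itself stays short.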
[0096] Various non-limiting specific embodiments of the method of the invention will now be described in further detail with reference to the accompanying drawings. Features described as being useful in relation to one specific embodiment of the disclosure apply mutatis mutandis to other embodiments of the disclosure unless stated otherwise.
[0097] FIG. 1 illustrates a process standardly used to generate a template library for sequencing. Next generation sequencing (NGS) typically requires library preparation, where
known adaptor DNA sequences are added to the target DNA to be sequenced. Traditionally, this requires that sample DNA is fragmented, end-repaired, and then ligated to the adaptor DNA (e.g., see FIG. 1). This library preparation is common to all major sequencing platforms, including those from Illumina™, Pacific Biosciences™, and Oxford Nanopore™. Furthermore, for samples composed of short DNA fragments like those for liquid biopsy or non-invasive prenatal testing applications, ligation-mediated library prep is currently the only option for library preparation.
[0098] As shown in FIG. 1, the starting DNA is fragmented, and the fragments purified. An end repair reaction is then performed with T4 Polynucleotide Kinase, rATP, and T4 DNA polymerase, dNTP, to form blunt ended double stranded templates. After end repair cleanup and size selection, an A-tailing reaction is performed with Klenow exo-, dNTP. The adaptor is formed by annealing two single-stranded oligonucleotides prepared by conventional automated oligonucleotide synthesis. The oligonucleotides are partially complementary such that the 3' end of a first oligonucleotide is complementary to the 5' end of a second oligonucleotide. The 5' end of the first oligonucleotide and the 3' end of the second oligonucleotide are not complementary to each other. When the two strands are annealed, the resulting structure is double stranded at one end (the double-stranded region) and single stranded at the other end (the unmatched region) and is referred to herein as a “Y-shaped adaptor” (see FIG. 1). The double-stranded region of the Y-shaped adaptor may be blunt-ended (see FIG. 1) or it may have an overhang. In the latter case, the overhang may be a 3' overhang or a 5' overhang, and may comprise a single nucleotide or more than one nucleotide.
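The partial complementarity that produces a Y-shaped adaptor can be checked computationally. The sketch below, offered only as an illustration, takes two oligonucleotides written 5' to 3' and reports how many bases at the 3' end of the first pair with the 5' end of the second, together with the two unpaired arms. The function name and both sequences are hypothetical, the model covers the blunt-ended case only, and it requires perfect Watson-Crick pairing in the duplex.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def describe_y_adaptor(first: str, second: str) -> dict:
    """Split two oligos (both 5'->3') into a duplex region and two unmatched arms.

    The 3' end of `first` is tested against the 5' end of `second`.
    """
    first, second = first.upper(), second.upper()
    duplex = 0
    for n in range(1, min(len(first), len(second)) + 1):
        # Antiparallel pairing: the last n bases of `first` must equal the
        # reverse complement of the first n bases of `second`.
        if first[-n:] != reverse_complement(second[:n]):
            break
        duplex = n
    return {
        "duplex_length": duplex,
        "first_arm": first[:len(first) - duplex],   # unpaired 5' arm of first oligo
        "second_arm": second[duplex:],              # unpaired 3' arm of second oligo
    }


# Made-up oligonucleotides for illustration only.
FIRST = "TTTTTTGATCGG"
SECOND = "CCGATCTTTTTT"
layout = describe_y_adaptor(FIRST, SECOND)
```

A non-zero duplex length with two non-empty arms corresponds to the Y shape; a duplex length equal to the full oligonucleotide length would instead indicate a fully double-stranded adaptor with no unmatched region.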
[0099] The Y-shaped adaptor is phosphorylated at its 5' end and the double-stranded portion of the duplex contains a single base 3' overhang comprising a ‘T’ deoxynucleotide (see FIG. 1). The adaptors are then ligated using T4 Ligase, rATP, to the ends of double stranded template molecules containing a single base 3' overhang of an ‘A’ nucleotide (see FIG. 1).
[0100] FIG. 2 illustrates how in prokaryotes, e.g., Mycobacterium, the Ku protein bridges DNA fragments and recruits an ATP-dependent DNA ligase (LigD) for DNA repair. The Ku and LigD proteins are used for non-homologous end joining (NHEJ). NHEJ is a pathway that repairs double-strand breaks in DNA. NHEJ is referred to as "non-homologous" because the break ends are directly ligated without the need for a homologous template, in contrast to homology directed repair, which requires a homologous sequence to guide repair. In prokaryotes, the NHEJ pathway requires only two factors: Ku and LigD. Ku recognizes and binds free ends of double-stranded DNA (e.g., two fragments of DNA) and joins the two ends to form a DNA bridge complex (e.g., see FIG. 2). Ku then recruits LigD to the Ku-DNA complex. LigD uses its polymerase and exonuclease domains to repair the ends of the DNA. As it also possesses an ATP-dependent ligase domain, LigD then ligates the DNA. Because Ku actively brings the ends of the DNA together and recruits LigD, the ligation conversion is boosted by as much as 30-fold, compared to reactions with LigD alone.
[0101] FIG. 3 provides an embodiment of a method of the disclosure which utilizes non-homologous end joining factors to ligate adaptors to double stranded template DNA. While ligation-mediated library prep can yield the highest quality genomes, the conversion of sample DNA to library DNA can be inefficient. In cases where the quantity of sample DNA is in short supply, this poor efficiency makes ligation-mediated library prep more challenging or even infeasible. Typical ligation-mediated library prep methods employ ligases that in nature serve to ligate nicked DNA. That is, their intended purpose is not to join and ligate two strands or ends of DNA, as is required by the library prep method. The ligation-mediated methods disclosed herein employ prokaryotic end joining and repair factors for the ligation of two ends of DNA.
[0102] In one embodiment, the in vitro end-repair and A-tailing steps of traditional library prep are employed. Then, instead of using T4 or other traditionally employed ligases, one uses Ku and LigD (e.g., see FIG. 3). In this case, LigD’s wild-type nuclease activity is unneeded and so a nuclease-deficient mutant can be used (e.g., Mycobacterium tuberculosis LigD H373A).
[0103] First, DNA (e.g., gDNA) or cDNA is fragmented into small molecules, typically less than 1000 base pairs in length. Fragmentation of DNA may be achieved by a number of methods including: enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing. Fragmented DNA may be made blunt-ended by a number of methods known to those skilled in the art. As shown in FIG. 3, the ends of the fragmented DNA are end repaired and phosphorylated using T4 DNA polymerase, dNTP, and T4 polynucleotide kinase, rATP. A single ‘A’ deoxynucleotide is then added to both 3' ends of the DNA molecules using Klenow exo- enzyme, dATP, producing a one-base 3' overhang that is complementary to the one-base 3' ‘T’ overhang on the double-stranded end of the Y-shaped nhEJF adaptor.
[0104] A ligation reaction between the Y-shaped nhEJF adaptor and the DNA fragments is then performed using Ku exo- (lacking exonuclease activity) and LigD, ATP, which joins two copies of the adaptor to each DNA fragment, one at either end, to form adaptor-polynucleotide
constructs. The products of this reaction can be purified from unligated nhEJF adaptor by a number of means, including size-exclusion chromatography.
[0105] After the excess nhEJF adaptor has been removed, unligated target DNA remains in addition to ligated adaptor-polynucleotide constructs and this can be removed by selectively capturing only those target DNA molecules that have adaptor(s) attached. For example, the presence of a biotin group on the 5' end of the adaptors enables any target DNA ligated to the adaptor to be captured on a surface coated with streptavidin, a protein that selectively and tightly binds biotin. Streptavidin can be coated onto a surface using well developed chemistries. In a particular embodiment, commercially available magnetic beads (e.g., Dynabeads™) that are coated in streptavidin can be used to capture ligated adaptor-target constructs. The application of a magnet to the side of a vessel containing these beads immobilizes them such that they can be washed free of the unligated target DNA molecules.
[0106] FIG. 4 provides another embodiment of a method of the disclosure which utilizes non-homologous end joining factors to ligate nhEJF adaptors to double stranded template DNA. First, DNA (e.g., gDNA) or cDNA is fragmented into small molecules, typically less than 1000 base pairs in length. Fragmentation of DNA may be achieved by a number of methods including: enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing. Fragmented DNA may be made blunt-ended by a number of methods known to those skilled in the art. As shown in FIG. 4, the ends of the fragmented DNA are end repaired and phosphorylated using T4 DNA polymerase, dNTP, and T4 polynucleotide kinase, rATP. Instead of Klenow exo-, terminal transferase (TdT) can be used to generate polynucleotide overhangs that can then be joined, trimmed, and ligated by Ku and LigD to adaptors having complementary polynucleotide overhangs. Because multiple bases hybridize upon joining, the joining is much more efficient than the joining of single-A tailed fragments to single-T overhang adaptors. However, because the exact length of the tail added by TdT cannot be easily controlled, there will be many cases where the overhangs from the sample DNA and the adaptor do not have the same number of overhanging bases. This imprecise junction normally cannot be ligated by basic ligases like T4 or E. coli ligase. To address this issue, Ku and LigD are employed. Again, Ku serves to bridge DNA ends and recruit LigD. Then, LigD’s wild-type exonuclease activity can trim excess overhanging single-stranded DNA to produce a precise junction that can be successfully and efficiently ligated. The products of this reaction can be purified from unligated adaptor by a number of means, including size-exclusion chromatography.
[0107] After the excess adaptor has been removed, unligated target DNA remains in addition to ligated adaptor-target constructs and this can be removed by selectively capturing only those target DNA molecules that have adaptor attached. For example, the presence of a biotin group on the 5' end of the adaptors enables any target DNA ligated to the adaptor to be captured on a surface coated with streptavidin, a protein that selectively and tightly binds biotin. Streptavidin can be coated onto a surface using well developed chemistries. In a particular embodiment, commercially available magnetic beads (e.g., Dynabeads™) that are coated in streptavidin can be used to capture ligated adaptor-target constructs. The application of a magnet to the side of a vessel containing these beads immobilizes them such that they can be washed free of the unligated target DNA molecules.
[0108] Accordingly, the disclosure provides for the use of non-homologous end joining factors, like Ku and LigD, to join various populations of dsDNA, or cDNA together for a variety of applications. For example, the non-homologous end joining factors can be used to add nhEJF adaptors to double stranded template DNA for library preparation.
[0109] The disclosure further provides for engineered variants of Ku and/or LigD including, but not limited to, variants engineered to increase enzyme stability, to suppress exonuclease activity, or to increase enzymatic activity. Additionally, the LigD ligase domain can be replaced with another ligase (e.g., T4 ligase), forming a fusion of LigD’s polymerase and nuclease domains with the chosen ligase. This allows the fusion ligase to be recruited by Ku to DNA ends. As described above, a LigD exonuclease-deficient mutant can be used when this nuclease activity is not desired.
[0110] In a particular embodiment, the disclosure provides for polypeptides that exhibit non-homologous end joining factor activity. In a particular embodiment, the polypeptide may be a wild-type enzyme, a homolog thereof, or an engineered variant of the wild-type enzyme. FIG. 5 provides a sampling of wild-type sequences for LigD (see SEQ ID NO:1 to 20). FIG. 6 provides a sampling of wild-type sequences for Ku (see SEQ ID NO:21 to 30). In a particular embodiment, the disclosure provides for a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 100% identical to any one of SEQ ID NO:1 to 20. In another embodiment, the disclosure provides for a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, or 100% identical to any one of SEQ ID NO:21 to 30. With regard to engineered variants, the polypeptide can be a LigD in which exonuclease activity is suppressed by one or more appropriate substitutions in the exonuclease domain of LigD. An example of such a substitution is H373A of SEQ ID NO:1. Other substitutions are contemplated and can be readily determined by in silico methods.
[0111] The term "homologs" used with respect to an original enzyme or gene of a first family or species refers to distinct enzymes or genes of a second family or species which are determined by functional, structural or genomic analyses to be an enzyme or gene of the second family or species which corresponds to the original enzyme or gene of the first family or species. Most often, homologs will have functional, structural or genomic similarities. Techniques are known by which homologs of an enzyme or gene can readily be cloned using genetic probes and PCR. Identity of cloned sequences as homologs can be confirmed using functional assays and/or by genomic mapping of the genes.
[0112] A protein has "homology" or is "homologous" to a second protein if the nucleic acid sequence that encodes the protein has a similar sequence to the nucleic acid sequence that encodes the second protein. Alternatively, a protein has homology to a second protein if the two proteins have "similar" amino acid sequences. (Thus, the term "homologous proteins" is defined to mean that the two proteins have similar amino acid sequences).
[0113] As used herein, two proteins (or a region of the proteins) are substantially homologous when the amino acid sequences have at least about 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity. To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In one embodiment, the length of a reference sequence aligned for comparison purposes is at least 30%, typically at least 40%, more typically at least 50%, even more typically at least 60%, and even more typically at least 70%, 80%, 90%, or 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared.
When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
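The position-by-position comparison described above can be illustrated with a short calculation on two sequences that have already been aligned. This is a minimal sketch under one stated assumption: percent identity is taken as identical positions divided by the number of alignment columns, with gapped columns counted in the denominator. The text does not fix a single formula for how gaps are taken into account, and alignment programs differ on this point, so the convention and the function name here are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention (not taken from the source text): identical positions
    divided by the number of alignment columns, where a column containing a
    gap in one sequence counts toward the total but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap; they hold no residue.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Made-up peptide fragments: one gap in the second sequence.
identity_with_gap = percent_identity("ACDEFG", "ACD-FG")
identity_same = percent_identity("ACDEFG", "ACDEFG")
```

Under this convention a single gapped column in a six-column alignment lowers the identity to five matches out of six; a convention that excluded gapped columns from the denominator would report 100% for the same pair, which is why the convention should be stated alongside any identity threshold.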
[0114] When "homologous" is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A "conservative amino acid substitution" is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a protein. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art (see, e.g., Pearson et al., 1994, hereby incorporated herein by reference).
[0115] A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D), Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
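The six substitution groups listed above translate directly into a lookup. The following sketch encodes exactly those six groups and tests whether a given replacement stays within one of them; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and residues that appear in none of the six groups (for example histidine or glycine) are treated as having no conservative partner under this particular grouping.

```python
# The six groups enumerated in the paragraph above, as one-letter codes.
CONSERVATIVE_GROUPS = [
    set("ST"),      # 1) Serine, Threonine
    set("DE"),      # 2) Aspartic Acid, Glutamic Acid
    set("NQ"),      # 3) Asparagine, Glutamine
    set("RK"),      # 4) Arginine, Lysine
    set("ILMAV"),   # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    set("FYW"),     # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same one of the six groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution at all
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


ser_to_thr = is_conservative("S", "T")
asp_to_lys = is_conservative("D", "K")
his_to_ala = is_conservative("H", "A")
```

A histidine-to-alanine change such as H373A falls outside all six groups, so this lookup reports it as non-conservative, in keeping with its role as a substitution intended to alter activity rather than preserve it.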
[0116] Sequence homology for polypeptides, which can also be referred to as percent sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using measures of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as "Gap" and "Bestfit" which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild type protein and a mutein thereof. See, e.g., GCG Version 6.1.
[0117] A typical algorithm used for comparing a molecule sequence to a database containing a large number of sequences from different organisms is the computer program BLAST (Altschul, 1990; Gish, 1993; Madden, 1996; Altschul, 1997; Zhang, 1997), especially blastp or tblastn (Altschul, 1997). Typical parameters for BLASTp are: Expectation value: 10 (default); Filter: seg (default); Cost to open a gap: 11 (default); Cost to extend a gap: 1 (default); Max. alignments: 100 (default); Word size: 11 (default); No. of descriptions: 100 (default); Penalty Matrix: BLOSUM62.
[0118] When searching a database containing sequences from a large number of different organisms, it is typical to compare amino acid sequences. Database searching using amino acid sequences can be measured by algorithms other than blastp known in the art. For instance, polypeptide sequences can be compared using FASTA, a program in GCG Version 6.1. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences (Pearson, 1990, hereby incorporated herein by reference). For example, percent sequence identity between amino acid sequences can be determined using FASTA with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, hereby incorporated herein by reference.
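As an illustration of what a percent sequence identity figure represents, the sketch below computes identity over the aligned (non-gap) columns of two pre-aligned sequences. It is not the GCG, FASTA, or BLAST implementation; the function name and the choice of denominator (columns with no gap in either sequence) are assumptions, and different tools define the denominator differently.

```python
def percent_identity(aln1: str, aln2: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aln1, aln2):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared
```

With this definition, `percent_identity("ACDE", "ACDF")` is 75.0, since three of four compared columns match.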
[0119] The disclosure further provides methods using structure-specific endonucleases (SSEs) for appending flap adaptors to polynucleotides. These SSE-based methods can be used to selectively add flap adaptors in a sequence-specific manner, thereby providing for enrichment of targeted polynucleotides that have a specific sequence (i.e., target enrichment). Traditional methods for target enrichment broadly fall into two categories: amplicon-based or probe-based hybridization/pulldown. The former employs primer pairs and PCR to amplify targets from a sample; it is simple and fast but limited in its ability to multiplex very high numbers of targets due to PCR mispriming events. It is also restricted in the size of amplicon that can be produced due to the limits of current PCR technology. Other disadvantages of PCR, such as sequence bias or polymerase slippage, can also impact the performance scope.
[0120] Hybridization methods involve making NGS library templates first, then using a probe to pull out library templates that cover the target region of a genome of interest. This approach generally takes longer in practice than the amplicon methods but is virtually limitless in the number of targets that can be enriched. Poorer specificity, arising from hybridization of a single probe only, is mitigated by additional rounds of pulldown and/or increasing the probe length and Tm.
[0121] In general, amplicon workflows are used for small panels of targets whereas hybridization workflows can be used for exome enrichment.
[0122] Alternative methods of enrichment are being developed and are becoming more popular, for example methods that employ CRISPR/Cas9 enzymes, where a predesigned guide RNA oligo can be used to direct Cas9 cleavage of the target regions. A pair of such guides will cut out a segment of DNA in a highly specific manner. When subsequent amplification is not required in the workflow, any size of target is possible. In theory this method can be highly multiplexed like hybridization-based enrichment, yet unlike the latter, specificity arises from the use of two CRISPR/Cas9/guide complexes flanking the desired targets. CRISPR/Cas9 enzymes do not append adaptor sequences to the fragments they cut; thus these methods require adaptors to be added to the enriched fragments, adding complexity to the workflow.
[0123] In another embodiment, the disclosure provides for the creation of a target enriched polynucleotide library by methods using structure-specific nucleases. Accordingly, the disclosure provides an alternative to known methods for creating target enriched libraries. The methods of the disclosure provide increased specificity over conventional probe-based hybridization/pulldown methods by employing two probes flanking the target instead of one. Unlike amplicon-based methods, the methods of the disclosure have limitless multiplexity. A pre-generated library is not required and a target of any size and sequence can be enriched. While the approach is similar to CRISPR/Cas9 approaches in that it employs two cleavage events on either side of a target sequence, it does, unlike CRISPR/Cas9, append adaptors on either side of the target sequence. The resulting product comprising adaptors can be used to seed a flow cell directly or it can be further amplified by PCR if required. Amplification proceeds through primers that bind to the flanking adaptor sequences and thus is advantageous over multiplex PCR, where amplification utilizes gene-specific primers and efficiency varies between target amplicons.
[0124] Structure-specific nucleases are a class of DNA binding/modifying enzymes that target structures in nucleic acids in vivo rather than sequences. These structures comprise deviations from the contiguous double helix structure that usually arise, for example, during DNA replication or in the process of damage repair. Structures such as Holliday junctions, replication forks, or single-stranded flaps require enzymes to resolve their topology to ultimately restore the canonical structure of the genome. Examples of structure-specific nucleases include, but are not limited to, Holliday junction resolvases and flap endonucleases. In a particular embodiment, the disclosure provides for the creation of a target enriched template library by the use of structure-specific nucleases, wherein the structure-specific nucleases comprise flap endonucleases. Flap endonuclease enzymes target junctions in DNA where a single-stranded stretch of DNA protrudes from the double helix. A flap may be described as a 3’ flap or a 5’ flap depending on the polarity of the sequence (e.g., see FIG. 7). FEN1 is an example of a flap endonuclease that targets and modifies a 5’ flap, whereas the XPF/MUS81 family of proteins are examples of flap endonucleases that target and modify 3’ flap structures.
[0125] FEN1 plays a central role in DNA replication both in eukaryotes and prokaryotes. It functions to remove single stranded 5’ flaps of DNA from Okazaki fragments that are generated on the lagging strand of the DNA replication fork. These flaps form when a primase generates an RNA primer that serves as a primer to extend a new DNA strand; multiple Okazaki fragments are generated and when the extending 3’ end of one abuts the 5’ end of another, it displaces it to form a flap structure (e.g., see FIG. 8). FEN1 binds to the 5’ flap and cleaves it at its base to leave a nick in the DNA (e.g., see FIG. 9). A ligase next seals the nick, thus generating a contiguous new long strand from the initial Okazaki fragments (e.g., see FIG. 10). FEN1 can cleave single-stranded flaps of up to 200 nucleotides in length. It does not cleave single stranded DNA alone, such as the single stranded regions of the parental template strands at the replication fork; it only cleaves ssDNA strands in the structure of a flap. Its preferred substrate structure is a double flap, where the 5’ end of one strand and the 3’ end of the other abutting strand overlap, both ends form flaps, and the 3’ flap is a single nucleotide long (e.g., see FIG. 11). A 5’ flap that contains double stranded regions is inhibitory for flap cleavage, even if the double stranded region is distant from the base of the flap.
[0126] The disclosure provides compositions, methods, and kits directed to the use of FEN1 with a 5’ flap adaptor having a specific structure that is recognized by FEN1 (e.g., see FIG. 12). The 5’ flap adaptor comprises two oligonucleotides that when annealed together form a partially double stranded molecule. A ‘probe’ portion of this 5’ flap adaptor is single stranded and complementary to a targeted sequence. The double stranded portion of the 5’ flap adaptor comprises a universal sequence that can be, for instance, the sequences of adaptors for NGS. The last base pair next to the single stranded probe portion may also match a targeted sequence. When the 5’ flap adaptor is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms (e.g., see FIG. 13). The structure comprising a 5’ flap from the target DNA and a single nucleotide 3’ flap from the adaptor is a substrate for FEN1, which can cleave it, leaving a nick that can be subsequently joined by a ligase. The result is the addition of an adaptor to the 5’ end of a polynucleotide (e.g., see FIG. 14).
[0127] Flap endonucleases in the XPF/MUS81 family of proteins play a role in the repair of damage caused by UV light or DNA cross-linking. Their target is illustrated in FIG. 15A and comprises a fork in a DNA structure where the two branches of the fork comprise noncomplementary sequences. The branches can be partially or fully double stranded or contiguous as in the example of a hairpin loop (see FIG. 15B). In all cases cleavage occurs within a few bases of the commencement of the 3’ flap, generating a nick (e.g., see FIG. 16).
[0128] The disclosure provides embodiments directed to the utilization of XPF/MUS81 3’ flap endonuclease activity in conjunction with a 3’ flap adaptor having a specific structure (e.g., see FIG. 17). The 3’ flap adaptor comprises, at a minimum, a single oligonucleotide comprising a 3’ sequence complementary to a target of interest in a genome and a 5’ universal sequence that can be, for instance, the sequences of adaptors for NGS. In one embodiment, the 5’ universal sequence may be double-stranded. When this oligo structure is hybridized to the target DNA molecule that has been previously made single stranded, a flap structure forms (e.g., see FIG. 18) which is a substrate for an XPF/MUS81 3’ flap endonuclease. The XPF/MUS81 3’ flap endonuclease then cleaves the DNA, leaving a nick that when extended with a polymerase copies the universal adaptor sequence to the target DNA (e.g., see FIG. 19).
[0129] When the 5’ flap adaptor, complementary to a target #1, and the 3’ flap adaptor, complementary to a target #2, as outlined above, are hybridized to DNA, for example genomic DNA, a structure forms that contains a 5’ flap and a 3’ flap (e.g., see FIG. 20). Ideally, the genomic DNA is fragmented to a suitable size such that the 5’ flap is less than 200 nucleotides long. Addition of FEN1 and a ligase will append adaptor #1 to the target #1 5’ end of the DNA (e.g., see FIG. 21). Subsequent addition of XPF/MUS81 endonuclease and a polymerase will copy and therefore append adaptor #2 to the target #2 3’ end of the DNA (e.g., see FIG. 22). The method can be multiplexed to include many targets (e.g., see FIG. 24). It will be appreciated that flap endonucleases other than FEN1 and XPF/MUS81 may be used with the methods and compositions of the disclosure.
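The fragment-size consideration above can be checked in silico before choosing a fragmentation size. The following is a minimal sketch under stated assumptions: the fragment is given as the single strand (5’ to 3’) to which the 5’ flap adaptor hybridizes, the 5’ flap is taken to be everything upstream of the target #1 site, and the function names and the 200 nt constant name are hypothetical.

```python
MAX_FLAP_NT = 200  # flap length limit taken from the text; the constant name is an assumption


def five_prime_flap_length(fragment: str, target_site: str) -> int:
    """Number of nucleotides 5' of the first occurrence of the target site."""
    idx = fragment.upper().find(target_site.upper())
    if idx < 0:
        raise ValueError("target site not found in fragment")
    return idx


def flap_within_limit(fragment: str, target_site: str,
                      max_flap: int = MAX_FLAP_NT) -> bool:
    """True if the predicted 5' flap is shorter than the stated limit."""
    return five_prime_flap_length(fragment, target_site) < max_flap
```

A fragment with 50 nt upstream of the target site would pass this check, whereas one with 250 nt upstream would suggest fragmenting to a smaller size.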
[0130] In the case where adaptor #1 and adaptor #2 comprise sequences chosen for cluster amplification on a sequencer, the sample can be applied directly to a sequencer for sequencing of the DNA between the target sequences. Alternatively, the adaptor sequences can be used to append additional sequences, such as step-out primers. For the sake of simplicity, the 5’ flap adaptor illustrated in FIG. 12 is shown hybridized already in a double-flap structure in FIG. 13. In practice, achieving this double-flap structure may require sequential annealing of the individual oligos that comprise the FEN1 structure, such that the longer oligo that contains the probe #1 sequence anneals first to the target #1, followed by annealing of the shorter oligo complementary to the universal sequence of the adaptor. Such differential hybridization can be achieved by methods known to those skilled in the art, for example through design of the probe sequence and the universal sequences with different Tm (e.g., see FIG. 23). In such a method the probe can have a higher Tm than the universal adaptor sequence such that at a particular temperature the probe, but not the universal adaptor sequence, anneals first; lowering the temperature then enables the shorter universal adaptor oligo to anneal, forming the structure illustrated in FIG. 13.
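A rough way to compare the Tm of the probe and universal regions during design is the Wallace rule of thumb (2 °C per A/T and 4 °C per G/C), which is only a crude estimate for short oligos. The sketch below is an illustration under that assumption; the function names and example sequences are hypothetical, and a real design would more likely use a nearest-neighbor model with the actual salt and oligo concentrations. On cooling, the region with the higher estimated Tm is expected to anneal first.

```python
def wallace_tm(seq: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


def annealing_order(regions: dict) -> list:
    """Return region names sorted from highest to lowest estimated Tm,
    i.e. the order in which they would be expected to anneal on cooling."""
    return sorted(regions, key=lambda name: wallace_tm(regions[name]), reverse=True)
```

For instance, `annealing_order({"probe": "GCGCGCGC", "universal": "ATATATAT"})` places the GC-rich region first because its estimated Tm is higher.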
[0131] In a particular embodiment, the disclosure provides methods that utilize a structure-specific endonuclease that has 5’ flap cleavage activity and a 5’ flap adaptor in order to append an adaptor to the 5’ end of a polynucleotide. For such methods, the 5’ flap adaptor is hybridized to a complementary sequence of a single stranded polynucleotide. Accordingly, the 5’ flap adaptor comprises a first oligonucleotide that has a single stranded region that can hybridize to a target sequence of a polynucleotide. The length of the single stranded region can vary but should be of sufficient length to bind with high fidelity to a targeted sequence. Accordingly, the single stranded region of the 5’ flap adaptor that can hybridize to a targeted sequence can comprise 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths. In a further embodiment, the first oligonucleotide of the 5’ flap adaptor further comprises a single stranded region that codes for a universal sequence, wherein the universal sequence is not complementary (i.e., cannot hybridize) to the sequence of the polynucleotide. An example of a universal sequence includes, but is not limited to, a sequence commonly used for NGS applications, such as a P5 or P7 sequence. The single stranded region that codes for a universal sequence can comprise 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths. The single stranded region that codes for a universal sequence can further comprise one or more barcode sequences, allowing for identification of the source of the polynucleotide if multiple sources of polynucleotides are being multiplexed in the same reaction.
The first oligonucleotide of the 5’ flap adaptor can be hybridized directly to the target sequence of the polynucleotide and then a second oligonucleotide of the 5’ flap adaptor can be hybridized to the universal sequence of the first oligonucleotide of the 5’ flap adaptor. Accordingly, the second oligonucleotide of the 5’ flap adaptor comprises a sequence that is complementary to the universal sequence of the first oligonucleotide of the 5’ flap adaptor. The second oligonucleotide of the 5’ flap adaptor may further comprise a base on the 3’ end that is complementary to a base of the single stranded region of the first oligonucleotide that hybridizes to the target sequence of the polynucleotide. Alternatively, the second oligonucleotide of the 5’ flap adaptor may be hybridized to the first oligonucleotide of the 5’ flap adaptor so that there is a double stranded region comprising base pairs for the universal sequence, and a single stranded region from the first oligonucleotide that can hybridize with a target sequence from a polynucleotide. When the 5’ flap adaptor is bound to the target sequence of the polynucleotide, a 5’ flap is generated in the polynucleotide. Additionally, a 1 nucleotide 3’ flap may also be generated if the second oligonucleotide comprises a base on the 3’ end that is complementary to a base of the single stranded region of the first oligonucleotide that hybridizes to a target sequence of the polynucleotide. The 5’ flap, or the 5’ flap and 1 nt 3’ flap, of the polynucleotide-adaptor hybridized construct is then recognized by a structure-specific endonuclease that has 5’ flap cleavage activity. The structure-specific endonuclease binds the polynucleotide-adaptor construct, cleaves off the 5’ flap structure, and forms a nick in the polynucleotide-adaptor hybridized construct. This nick may then be closed by use of a ligase. The end result is the 5’ flap adaptor being appended to the polynucleotide, such that the polynucleotide now contains a sequence for the universal adaptor.
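The length ranges recited above for the target-hybridizing region (15 to 35 nt) and the universal region (6 to 25 nt) lend themselves to a simple design check. The following is a minimal sketch, not a design tool from the disclosure; the function names are hypothetical, and it assumes the target site is supplied 5’ to 3’ so that the probe region is its reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

PROBE_RANGE = (15, 35)      # nt, target-hybridizing region lengths recited in the text
UNIVERSAL_RANGE = (6, 25)   # nt, universal region lengths recited in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_target(target_site: str) -> str:
    """Return a probe region complementary to the target site, enforcing the length range."""
    lo, hi = PROBE_RANGE
    if not lo <= len(target_site) <= hi:
        raise ValueError(f"probe length must be {lo}-{hi} nt, got {len(target_site)}")
    return reverse_complement(target_site)


def universal_length_ok(universal: str) -> bool:
    """True if the universal region length falls within the recited range."""
    lo, hi = UNIVERSAL_RANGE
    return lo <= len(universal) <= hi
```

A barcode, where used, would add to the universal region, so its length would need to be counted in the same check.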
[0132] In a certain embodiment, the disclosure provides methods that utilize a structure-specific endonuclease that has 3’ flap cleavage activity and a 3’ flap adaptor in order to append an adaptor to the 3’ end of a polynucleotide. For such methods, the 3’ flap adaptor is hybridized to a complementary sequence of a single stranded polynucleotide. Accordingly, the 3’ flap adaptor comprises an oligonucleotide that has a single stranded region that can hybridize to a target sequence of a polynucleotide. The length of the single stranded region can vary but should be of sufficient length to bind with high fidelity to a targeted sequence. Accordingly, the single stranded region of the 3’ flap adaptor that can hybridize to a targeted sequence can comprise 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths. In a further embodiment, the oligonucleotide of the 3’ flap adaptor further comprises a single stranded region that codes for a universal sequence, wherein the universal sequence is not complementary (i.e., cannot hybridize) to the sequence of the polynucleotide. An example of a universal sequence includes, but is not limited to, a sequence commonly used for NGS applications, such as a P5 or P7 sequence. The universal sequence of the 3’ flap adaptor may be the same as the universal sequence of the 5’ flap adaptor, or alternatively be different. The single stranded region that codes for a universal sequence can comprise 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, or a range of nucleotide lengths between or including any two of the foregoing nucleotide lengths. The single stranded region that codes for a universal sequence can further comprise one or more barcode sequences, allowing for identification of the source of the polynucleotide if multiple sources of polynucleotides are being multiplexed in the same reaction. The oligonucleotide of the 3’ flap adaptor is hybridized directly to a complementary target sequence of the polynucleotide. In a particular embodiment, the 3’ flap adaptor and the 5’ flap adaptor hybridize to the same polynucleotide. The 5’ flap adaptor and the 3’ flap adaptor can be hybridized to the polynucleotide in a concurrent or sequential manner. When the 3’ flap adaptor is bound to a target sequence of the polynucleotide, a 3’ flap is generated in the polynucleotide. The 3’ flap of the polynucleotide-adaptor hybridized construct is then recognized by a structure-specific endonuclease that has 3’ flap cleavage activity. The structure-specific endonuclease binds the polynucleotide-adaptor hybridized construct and cleaves off the 3’ flap structure, forming a 3’ overhang that comprises the universal sequence region of the 3’ adaptor. By using a polymerase, the 3’ overhang may be filled in with a sequence complementary to the universal sequence of the 3’ adaptor. The end result is the 3’ flap adaptor being appended to the polynucleotide, such that the polynucleotide now comprises a sequence for the universal adaptor.
[0133] In a particular embodiment, the disclosure provides methods that utilize tagmentation and template switch oligonucleotides to append adaptors to polynucleotides.
Tagmentation is an established workflow for making templates for polynucleotide applications. The process relies on the transposase enzyme Tn5 fragmenting and simultaneously appending adaptor sequences to the 5’ ends of polynucleotide fragments (see FIG. 25). A second, Tn5-independent step is used to further process the ‘adapted’ fragments to append a similar or different adaptor to the 3’ ends of the fragments, thus completing the library template in a form ready for polynucleotide applications, like sequencing. In existing tagmentation-based consumable kits, the 3’ end adaptors are added in one of two ways. One way for appending adaptors to the 3’ ends of Tn5 tagmented products is shown in FIG. 26. More specifically, the free 3’ end of the fragment can be extended in the presence of a polymerase (e.g., a strand displacing polymerase), dNTPs and heat (e.g., >68 °C) to remove the ‘non-transferred’ strand of the transposome ds adaptor. The complement of the 5’ adaptor is copied, and finally a PCR reaction with two distinct primers (e.g., P5-i5-A14 and P7-i7-B15) can be used to enrich for those primary tagmentation molecules that have a P5 based adaptor on one end and a P7 based adaptor on the other end. The other way for appending adaptors to the 3’ ends of Tn5 tagmented products is shown in FIG. 27. As shown, a single double-stranded ‘forked’ adaptor is employed in the transposome (see FIG. 27). A non-displacing polymerase is used at a temperature below the Tm of the non-transferred strand (e.g., <55 °C) to extend the free 3’ end of the fragment until it reaches the 5’ end of the ‘non-transferred’ adaptor strand, and then a ligase covalently connects the non-transferred strand to the fragment.
[0134] The disclosure provides new and innovative methods for appending an adaptor oligo sequence to the 3’ end of a tagmented fragment. These methods utilize partial extension from the free 3’ end of the tagmented fragment to generate a known sequence capable of hybridizing to, and extending off, a template switch oligo (see FIGs. 28-34). The disclosure additionally provides methods for processing a fully extended 3’ sequence to alter its base composition.
[0135] An embodiment for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo is shown in FIG. 28. As shown in FIG. 28, DNA is tagmented with a transposome that may or may not have modifications to the transferred strand. The tagmented library is treated to de-anneal and remove the non-transferred strand of the transposome. For example, if the transposome is attached to a solid surface (e.g., a bead), mild heat followed by a hot wash will remove the strand. Next, a replacement oligo that has a higher Tm than the non-transferred strand is hybridized back to the appended adaptors. A higher Tm may be conferred in a number of ways: for example, the oligo can be longer than the non-transferred strand it replaces, or it may contain modifications such as ‘Locked Nucleic Acids’ (LNAs). The modifications may also or only be present in the transferred strand. The replacement oligo does not hybridize in place of all of the non-transferred strand, but instead does so partially. This results in a portion of the adaptor, 5’ of the replacement oligo, remaining single stranded. Next, a non-displacing polymerase is used to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the adaptor sequence but stopping when it reaches the hybridized replacement oligo. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
[0136] Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 28A-B. As shown in FIG. 28A, a transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand. In this embodiment, the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases). Following tagmentation and removal of the Tn5, moderate heat can be used to denature the non-transferred strand, leaving the ‘LNA containing replacement oligo’ still annealed. Next, a non-displacing polymerase is used to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the non-transferred strand but stopping when it reaches the hybridized replacement oligo. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment. As shown in FIG. 28B, a transposome is employed that already has the ‘replacement oligo’ annealed next to the non-transferred strand. In this embodiment, the non-transferred strand can be shorter than its standard 19 base Tn5 recognition sequence (for example 16 bases). Following tagmentation and removal of the Tn5, moderate heat can be used to denature the non-transferred strand, leaving the ‘LNA containing replacement oligo’ still annealed. Next, a non-displacing polymerase is used to extend from the 3’ ends of the insert, filling in the ends of the insert and extending over the non-transferred strand but stopping when it reaches the hybridized replacement oligo.
In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
[0137] An additional embodiment for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo is shown in FIG. 30. As shown in FIG. 30, a transposome is employed that contains one or more internal modifications in the non-transferred strand that prevent exonuclease digestion, e.g., a phosphorothioate linkage in the phosphodiester backbone of the oligo. Following tagmentation, a polymerase with a 5’ to 3’ exonuclease activity is employed to extend from the free 3’ end of the insert. As it extends, it encounters the 5’ end of the non-transferred strand and digests the bases of this oligo until it encounters the modified linkage, at which point it extends no further. The remaining short portion of the non-transferred strand can be denatured by moderate heat. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
[0138] Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIGs. 31A-31B. As shown in FIG. 31A, a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand. The polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead). A fresh aliquot of a strand displacing polymerase is added with just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent. This mix will continue to extend the 3’ end of the fragment across the Tn5 adaptor, which for the first 19 bases is a known Tn5 recognition sequence (5’AGATGTGTATAAGAGACAG3’) (SEQ ID NO: 31), but will stop incorporating bases once it reaches a ‘T’ base in the template, due to the absence of dATP molecules. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein and comprising the sequence: 5’CTGTCTCTT3’). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment. As shown in FIG. 31B, a non-displacing polymerase and all four dNTPs (dATP, dCTP, dGTP, dTTP) are used to extend from the free 3’ end of the fragment up to the 5’ end of the non-transferred strand. The polymerase and dNTPs are then removed (e.g., purified on SPRI beads, or by magnetic bead-based washing if the adaptors are attached to a bead).
The non-transferred strand is then removed, e.g., by moderate heat to selectively denature the strand, by application of a lambda exonuclease that selectively digests oligos containing 5’ phosphorylated ends (as is the case with non-transferred strands), or by other means known to those skilled in the art. A fresh aliquot of a polymerase is added with just three out of the four dNTPs (dCTP, dGTP, dTTP); dATP is absent. This mix will continue to extend the 3’ end of the fragment across the Tn5 adaptor, which for the first 19 bases is a known invariant sequence (5’AGATGTGTATAAGAGACAG3’) (SEQ ID NO: 31), but will stop incorporating bases once it reaches a ‘T’ base in the template, due to the absence of dATP molecules. In this way, each strand of the insert now has an adaptor appended at its 5’ end (through tagmentation) and a partial adaptor at its 3’ end (through the extension step described herein and comprising the sequence: 5’CTGTCTCTT3’). If the strands are denatured, they can then participate in a template switch reaction where a 3’ blocked template switch oligo is annealed and serves as a template to further extend the 3’ end of the insert to add new sequences to the 3’ end of the fragment.
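The dATP-free extension described above can be reproduced with a short simulation, which also shows where the 5’CTGTCTCTT3’ partial adaptor comes from. This is an illustrative sketch only: the function name is hypothetical, and it models just the 19-base recognition sequence as template, copied from its 3’ end, ignoring enzyme kinetics and misincorporation.

```python
TN5_ME = "AGATGTGTATAAGAGACAG"  # 19-base Tn5 recognition sequence (SEQ ID NO: 31), 5'->3'
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_with_limited_dntps(template_5to3: str, available: set) -> str:
    """Copy the template starting from its 3' end and stop at the first template
    base whose complementary dNTP is not available. Returns the product 5'->3'."""
    product = []
    for base in reversed(template_5to3.upper()):
        needed = PAIR[base]
        if needed not in available:
            break  # e.g. a template 'T' cannot be copied without dATP
        product.append(needed)
    return "".join(product)


# Extension with dCTP, dGTP and dTTP only (no dATP)
partial = extend_with_limited_dntps(TN5_ME, {"C", "G", "T"})
print(partial)  # CTGTCTCTT
```

With all four dNTPs available, the same function returns the full 19-base complement, so the stopping point is determined solely by the first ‘T’ encountered in the template.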
[0139] It will be understood that the workflows described in the embodiments above may take place where the transposomes are attached to a surface such as a bead, or where the transposomes are free in solution. It will also be understood that in each workflow, the 3’ end can also be completed following extension by hybridizing an oligo that is partially complementary to the 5’ adaptor but contains further sequences that are unique and not present in the 5’ adaptor. A ligation reaction covalently joins this oligo to the 3’ end of the insert and forms a ‘Y’-shaped adaptor construct (see FIG. 32).
[0140] Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 33A. As shown in FIG. 33A, a single transposome type comprising a P5 transferred-strand and a short non-transferred-strand is used to tagment DNA. Following removal of the Tn5 post-tagmentation, a strand-displacing polymerase is added to extend from the free 3’ end, create the complement of the P5 transferred-strand (i.e., P5’), and displace the short non-transferred strand. The temperature is then elevated to make the fragments single stranded. A P7 template switch oligo hybridizes, forming a forked structure that is partially double-stranded. Then, in the presence of a polymerase with 3’ exonuclease activity, the single-stranded 3’ end is degraded until no single-stranded 3’ end remains. The remaining 3’ end of the fragment then serves as a primer that is extended to create the complement of the P7 template switch oligo.
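By way of illustration only, the trim-then-extend sequence of events described above can be sketched in Python. This is a toy model and not part of the disclosed methods: all sequences are hypothetical placeholders rather than real P5 or P7 sequences, the function name `trim_and_extend` is invented for this example, and the geometry (the 3’ portion of the template switch oligo pairing with an internal site of the fragment, leaving the fragment’s 3’ tail unpaired) is an assumption about the forked structure.

```python
# Toy model only: sequences and annealing geometry are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def trim_and_extend(fragment, tso, anneal_len):
    """Model 3' exonuclease trimming followed by extension along the TSO.

    fragment and tso are written 5'->3'. The last anneal_len bases of the
    TSO are assumed to pair with the fragment; the rest of the TSO is the
    single-stranded 5' overhang that is copied during extension.
    """
    tso_overhang, tso_anneal = tso[:-anneal_len], tso[-anneal_len:]
    site = reverse_complement(tso_anneal)
    start = fragment.rfind(site)
    if start < 0:
        raise ValueError("TSO does not anneal to the fragment")
    # 3' exonuclease activity removes the unpaired 3' tail of the fragment.
    trimmed = fragment[: start + len(site)]
    # The recessed 3' end is then extended, copying the TSO overhang.
    return trimmed + reverse_complement(tso_overhang)


FIVE_PRIME = "TTGGCCAA"   # placeholder for the 5' adaptor
INSERT = "ACGTTGCA"       # placeholder insert
SITE = "CTGTCTCTT"        # assumed annealing site on the fragment
TAIL = "GGTTAACC"         # placeholder unpaired 3' tail
OVERHANG = "CATCATCAT"    # placeholder for the new adaptor carried by the TSO

fragment = FIVE_PRIME + INSERT + SITE + TAIL
tso = OVERHANG + reverse_complement(SITE)
product = trim_and_extend(fragment, tso, len(SITE))
print(product)
```

In this model the unpaired tail is lost and the complement of the oligo overhang takes its place at the 3’ end of the fragment.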
[0141] Additional embodiments for appending adaptors to the ends of tagmented polynucleotides capable of hybridizing to, and extending off, a template switch oligo are shown in FIG. 34. As shown in FIG. 34, a structure specific endonuclease (e.g., XPF/MUS81) can be employed as an alternative to exonuclease degradation of the 3’ single-stranded end of the forked structure following hybridization of the template switch oligo. In this embodiment, the endonuclease nicks the double stranded region of the duplex, creating a free 3’ end that a polymerase extends to create the complement of the P7 template switch oligo.
[0142] The adaptor-polynucleotide constructs prepared according to the methods disclosed herein can be used in any method of nucleic acid analysis, e.g., sequencing of the templates or amplification products thereof. Exemplary uses of the template libraries include, but are not limited to, providing templates for whole genome amplification, sequencing, subcloning, and PCR amplification (of either monotemplate or complex template libraries).
[0143] Template libraries prepared according to a method of the disclosure from a complex mixture of genomic DNA fragments representing a whole or substantially whole genome provide suitable templates for so-called “whole-genome” amplification. The term “whole-genome amplification” refers to a nucleic acid amplification reaction (e.g., PCR) in which the template to be amplified comprises a complex mixture of nucleic acid fragments representative of a whole (or substantially whole) genome.
[0144] The library of templates prepared according to the methods described herein can be used for solid-phase nucleic acid amplification. The term “solid-phase amplification” as used herein refers to any nucleic acid amplification reaction carried out on or in association with a solid support such that all or a portion of the amplified products are immobilized on the solid support as they are formed. In particular, the term encompasses solid-phase polymerase chain reaction (solid-phase PCR), which is a reaction analogous to standard solution phase PCR, except that one or both of the forward and reverse amplification primers is/are immobilized on the solid support.
[0145] For “solid-phase” amplification methods, one amplification primer may be immobilized (the other primer usually being present in free solution). Alternatively, both the forward and the reverse primers may be immobilized. In practice, there will be a “plurality” of identical forward primers and/or a “plurality” of identical reverse primers immobilized on the solid support, since the PCR process requires an excess of primers to sustain amplification. References herein to forward and reverse primers are to be interpreted accordingly as encompassing a “plurality” of such primers unless the context indicates otherwise.
[0146] It is possible to carry out solid-phase amplification using only one type of primer, and such single-primer methods are encompassed within the scope of the invention. Other embodiments may use forward and reverse primers which contain identical template-specific sequences but which differ in some other structural features. For example, one type of primer may contain a non-nucleotide modification which is not present in the other. In other embodiments, the forward and reverse primers may contain template-specific portions of different sequence.
[0147] Amplification primers for solid-phase PCR are preferably immobilized by covalent attachment to the solid support at or near the 5' end of the primer, leaving the template-specific portion of the primer free for annealing to its cognate template and the 3' hydroxyl group free for primer extension. Any suitable covalent attachment means known in the art may be used for this purpose. The chosen attachment chemistry will depend on the nature of the solid support, and any derivatization or functionalization applied to it. The primer itself may include a moiety, which may be a non-nucleotide chemical modification, to facilitate attachment.
[0148] It is useful to use the library of templates prepared according to a method disclosed herein to prepare clustered arrays of nucleic acid colonies by solid-phase PCR amplification. The terms “cluster” and “colony” are used interchangeably herein to refer to a discrete site on a solid support comprised of a plurality of identical immobilized nucleic acid strands and a plurality of identical immobilized complementary nucleic acid strands. The term “clustered array” refers to an array formed from such clusters or colonies. In this context the term “array” is not to be understood as requiring an ordered arrangement of clusters.
[0149] In a particular embodiment, the disclosure further provides methods of sequencing amplified nucleic acids generated by whole genome or solid-phase amplification. Thus, the disclosure provides a method of nucleic acid sequencing comprising amplifying a library of nucleic acid templates using whole genome or solid-phase amplification as described above and carrying out a nucleic acid sequencing reaction to determine the sequence of the whole or a part of at least one amplified nucleic acid strand produced in the whole genome or solid-phase amplification reaction.
[0150] Sequencing can be carried out using any suitable “sequencing-by-synthesis” technique, wherein nucleotides are added successively to a free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction. The nature of the nucleotide added is preferably determined after each nucleotide addition.
[0151] The initiation point for the sequencing reaction may be provided by annealing of a sequencing primer to a product of the whole genome or solid-phase amplification reaction. In this connection, one or both of the adaptors added during formation of the template library may include a nucleotide sequence which permits annealing of a sequencing primer to amplified products derived by whole genome or solid-phase amplification of the template library.
[0152] The products of solid-phase amplification reactions wherein both forward and reverse amplification primers are covalently immobilized on the solid surface are so-called “bridged” structures formed by annealing of pairs of immobilized polynucleotide strands and immobilized complementary strands, both strands being attached to the solid support (e.g., a flowcell) at the 5' end. Arrays comprised of such bridged structures provide inefficient templates for nucleic acid sequencing, since hybridization of a conventional sequencing primer to one of the immobilized strands is not favored compared to annealing of this strand to its immobilized complementary strand under standard conditions for hybridization.
[0153] In order to provide more suitable templates for nucleic acid sequencing it is useful to remove substantially all or at least a portion of one of the immobilized strands in the “bridged” structure in order to generate a template which is at least partially single-stranded. The portion of the template which is single-stranded will thus be available for hybridization to a sequencing primer. The process of removing all or a portion of one immobilized strand in a “bridged” double-stranded nucleic acid structure may be referred to herein as “linearization”.
[0154] Bridged template structures may be linearized by cleavage of one or both strands with a restriction endonuclease or by cleavage of one strand with a nicking endonuclease. Other methods of cleavage can be used as an alternative to restriction enzymes or nicking enzymes, including inter alia chemical cleavage (e.g., cleavage of a diol linkage with periodate), cleavage of abasic sites with an endonuclease or by exposure to heat or alkali, cleavage of ribonucleotides incorporated into amplification products otherwise comprised of deoxyribonucleotides, photochemical cleavage, or cleavage of a peptide linker.
[0155] It will be appreciated that a linearization step may not be essential if the solid-phase amplification reaction is performed with only one primer covalently immobilized and the other in free solution.
[0156] In order to generate a linearized template suitable for sequencing it is necessary to remove “unequal” amounts of the complementary strands in the bridged structure formed by amplification so as to leave behind a linearized template for sequencing which is fully or partially single stranded. Most preferably one strand of the bridged structure is substantially or completely removed.
[0157] Following the cleavage step, regardless of the method used for cleavage, the product of the cleavage reaction may be subjected to denaturing conditions in order to remove the portion(s) of the cleaved strand(s) that are not attached to the solid support. Suitable denaturing conditions will be apparent to the skilled reader with reference to standard molecular biology protocols (Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 3rd Ed, Cold Spring Harbor Laboratory Press, NY; Current Protocols, eds. Ausubel et al.).
[0158] Denaturation (and subsequent re-annealing of the cleaved strands) results in the production of a sequencing template which is partially or substantially single-stranded. A sequencing reaction may then be initiated by hybridization of a sequencing primer to the single-stranded portion of the template.
[0159] Thus, the nucleic acid sequencing reaction may comprise hybridizing a sequencing primer to a single-stranded region of a linearized amplification product, sequentially incorporating one or more nucleotides into a polynucleotide strand complementary to the region of amplified template strand to be sequenced, identifying the base present in one or more of the incorporated nucleotide(s) and thereby determining the sequence of a region of the template strand.
[0160] One sequencing method which can be used in accordance with the disclosure relies on the use of modified nucleotides that can act as chain terminators. Once the modified nucleotide has been incorporated into the growing polynucleotide chain complementary to the region of the template being sequenced there is no free 3'-OH group available to direct further sequence extension and therefore the polymerase cannot add further nucleotides. Once the nature of the base incorporated into the growing chain has been determined, the 3' block may be removed to allow addition of the next successive nucleotide. By ordering the products derived using these modified nucleotides it is possible to deduce the DNA sequence of the DNA template. Such reactions can be done in a single experiment if each of the modified nucleotides has attached a different label, known to correspond to the particular base, to facilitate discrimination between the bases added at each incorporation step. Alternatively, a separate reaction may be carried out containing each of the modified nucleotides separately.
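By way of illustration only, the cycle of incorporating one blocked nucleotide, identifying it, and removing the block can be sketched as follows. This Python example is a simplified model and not part of the disclosed methods; the function name `sequence_by_synthesis`, the toy template, and the assumption that the sequencing primer anneals to the 3’ end of the template are all hypothetical.

```python
# Simplified model of sequencing with reversible chain terminators.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template_5to3, primer_site_len, n_cycles):
    """Return the bases called over n_cycles, one base per cycle.

    The primer is assumed to anneal to the last primer_site_len bases of
    the template, so synthesis proceeds toward the template's 5' end.
    """
    read = []
    # Index of the first template base 5' of the primer binding site.
    pos = len(template_5to3) - primer_site_len - 1
    for _ in range(n_cycles):
        if pos < 0:
            break  # reached the 5' end of the template
        # One 3'-blocked, labelled nucleotide is incorporated per cycle.
        incorporated = COMPLEMENT[template_5to3[pos]]
        # The label identifies the base, which is recorded as a call.
        read.append(incorporated)
        # The 3' block is removed so the next cycle can proceed.
        pos -= 1
    return "".join(read)


# Toy template: 7-base region to be sequenced plus a 4-base primer site.
print(sequence_by_synthesis("GATTACACCGG", 4, 3))  # TGT
```

Because each nucleotide carries a 3' block, exactly one base is added and called per cycle, so the read length equals the number of cycles until the template is exhausted.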
[0161] The modified nucleotides may carry a label to facilitate their detection. Preferably this is a fluorescent label. Each nucleotide type may carry a different fluorescent label. However, the detectable label need not be a fluorescent label. Any label can be used which allows the detection of an incorporated nucleotide.
[0162] One method for detecting fluorescently labelled nucleotides comprises using laser light of a wavelength specific for the labelled nucleotides, or the use of other suitable sources of illumination. The fluorescence from the label on the nucleotide may be detected by a CCD camera or other suitable detection means.
[0163] The disclosure is not intended to be limited to use of the sequencing method outlined above, as essentially any sequencing methodology which relies on successive incorporation of nucleotides into a polynucleotide chain can be used. Suitable alternative techniques include, for example, Pyrosequencing™, FISSEQ (fluorescent in situ sequencing), MPSS (massively parallel signature sequencing) and sequencing by ligation-based methods.
[0164] The target polynucleotide to be sequenced using the method of the disclosure may be any polynucleotide that it is desired to sequence. Using the template library preparation method described in detail herein it is possible to prepare template libraries starting from essentially any double or single-stranded target polynucleotide of known, unknown or partially known sequence. With the use of clustered arrays prepared by solid-phase amplification it is possible to sequence multiple targets of the same or different sequence in parallel.
[0165] The methods of the disclosure for appending adaptor(s) to a polynucleotide can be used in any number of applications outside of preparing sequencing libraries. For example, non-homologous end joining factors can be used for any ligation reaction where one needs (1) high efficiency or (2) to join damaged or other incompatible ends.
[0166] The disclosure further provides for kits comprising the non-homologous end joining factors disclosed herein. The kits can be tailored for use in particular applications. For example, the kits can be directed to the use of non-homologous end joining factors, or use of structure specific endonucleases in preparing libraries of adaptor-polynucleotide constructs using the methods of the disclosure. Such kits can comprise at least a supply of adaptors as defined. The kits can further comprise enzymes (e.g., structure specific endonucleases or non-homologous end joining factors) and/or amplification primers. The structure and properties of amplification primers will be well known to those skilled in the art. Adaptors included in the kit can be readily prepared using standard automated nucleic acid synthesis equipment and reagents in routine use in the art.
[0167] Adaptors may be supplied in the kits ready for use, or more preferably as concentrates requiring dilution before use, or even in a lyophilized or dried form requiring reconstitution prior to use. If required, the kits may further include a supply of a suitable diluent for dilution or reconstitution of the adaptors. Optionally, the kits may further comprise supplies of reagents, buffers, and enzymes for use in carrying out the methods disclosed herein. Further components which may optionally be supplied in the kit include “universal” sequencing primers suitable for sequencing templates prepared using the adaptors and primers.
EXAMPLES
[0168] Appending an adaptor to the 3' end of a tagmented polynucleotide that can react with a template switch oligo. A transposome was constructed that contained LNA modifications in its transferred strand and was used to tagment genomic DNA. The transposome was immobilized on a streptavidin paramagnetic bead via a 3’ biotin group on an ‘anchor’ oligo hybridized to the transferred strand. The tagmentation was conducted in the presence of a ligase enzyme and an Illumina™ Indexing primer P5-i5-A14. This primer hybridized 5’ of the transferred strand and was ligated to it by virtue of the 5’ end of the transferred strand bearing a phosphate moiety by design. The Tn5 protein was then removed by denaturing it with a solution of SDS. Different SDS concentrations (%) were tested to effect complete removal of the Tn5 protein. Following a wash of the beads, a mixture of a non-displacing polymerase (tTaq608), dNTPs, Q5 polymerase and a template switching oligo was added and incubated at 47 °C to de-anneal the non-transferred strand and extend with tTaq608 polymerase as far as the anchor oligo. The temperature was then raised further (60-70 °C) to the point where the templates were rendered single stranded and no longer attached to the beads. The temperature was then lowered to 42 °C for 1 min to allow the template switch oligo containing the P7-i7-B15 sequences to hybridize. Q5 polymerase then extended from the free 3’ end of the fragment to copy the P7-i7-B15, thus completing the template construct. A qPCR reaction was performed to quantify how much completed template was present. The graph indicates that up to 2000 pM of correct product was formed under SDS concentrations that removed all of the Tn5 from the tagmented product complex (see FIG. 29).
[0169] Experiments looking at ‘fork’-modulated switch in activity from exonuclease to extension activity of 3' adaptors that can react with a template switch oligo. Simple P5 transposomes were immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA. Following removal of the Tn5 and extension from the free 3’ end to create the complement of the P5 transferred-strand, the fragments were denatured and hybridized with a 5’ FAM fluorescently-labelled P7 template switch oligo. The template switch oligo comprised either a free extendable -OH group at its 3’ end, a non-extendable blocked dideoxyC group at its 3’ end, or a non-extendable ‘inverted T’ blocking group at its 3’ end. On addition of a polymerase and dNTPs, the P5’ end of the template is digested and replaced with the P7’. The products of the reaction were subjected to analysis by ‘gel size exclusion’ electrophoresis and by a qPCR reaction which only amplifies and quantifies templates that have a P5 adaptor at their 5’ end and a P7’ adaptor at their 3’ end. FIG. 33C presents the image of the gel electrophoresis and indicates that neither of the two 3’ blocked template switch oligos was consumed, indicating that the block is effective in preventing extension from the 3’ end of the template switch oligo, which would result in creating a copy of the entire template. In contrast, when a 3’ extendable template switch oligo was used, extension and copying of the entire template occurred, as evident from the reduction in fluorescence intensity of the template switch oligo band and the appearance of higher molecular weight product labelled with FAM. FIG. 33D presents the results of the qPCR analysis that only detects product that is correctly appended with P5 on the 5’ end and P7’ on the 3’ end, but not product appended with P5 on both ends. Both blocked template switch oligos yielded the correct product at approximately 1,700 pM. In contrast, the unblocked template switch oligo produced 1.5x as much product as a result of two mechanisms: (i) ‘fork’-modulated switch in activity from exonuclease to extension activity to append the P7’ adaptor to the 3’ end of the template, and (ii) extension from the 3’ end of the template switch oligo to append a copy of the template to the P7 template switch oligo.
[0170] Additional experiments looking at ‘fork’-modulated switch in activity from exonuclease to extension activity of 3' adaptors that can react with a template switch oligo. A simple P5 transposome was immobilized on a streptavidin bead by hybridization to an ‘anchor’ oligo and used to tagment DNA. Following removal of the Tn5 and extension from the free 3’ end to create the complement of the P5 transferred-strand, the fragments were denatured and hybridized with a P7 template switch oligo that comprised a 3’ end blocked with a ddC moiety that prevents incorporation and extension. On addition of a polymerase and dNTPs, the P5’ end of the template is digested and replaced with the P7’ copied from the template switch oligo (see FIG. 33E). By way of contrast, a control experiment was also performed using a transposome comprising a P5 transferred-strand hybridized to a bead via an anchor oligo and a non-transferred strand comprising a single stranded 3’ end that is complementary to a P7 oligo (see FIG. 33F). Following tagmentation and removal of the Tn5 protein, a mixture of a polymerase, ligase and the P7 oligo was added and incubated to extend and ligate the free 3’ end of the template to the 5’ end of the non-transferred strand while simultaneously extending the 3’ end of the non-transferred strand to copy the P7 oligo, thus producing a completed template comprising a P5 5’ end and a P7’ 3’ end. Both libraries from the control and test transposomes were subjected to qPCR to assess yields (see FIG. 33G). FIG. 33G indicates that the template-switch workflow of FIG. 33E produced a greater yield of library than the control workflow of FIG. 33F. The mechanism and data presented above were completely surprising and unexpected. The initial assumption was that extending a tagmented template to its end would only result in product with P5 adaptors at both ends and thus would not yield any product in a qPCR reaction. The gel experiment with 3’ blocked template switch oligos proved that the result was not due to an artefact, but must unexpectedly result instead from the switch from exonuclease to extension activity at the 3’ end of the template.
[0171] A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Accordingly, other embodiments are within the scope of the following claims.
Claims
1. A method to append adaptors to the 5’ and 3’ ends of polynucleotides, comprising: fragmenting gDNA or cDNA into polynucleotides that are less than 1000 base pairs in length; end repairing and phosphorylating the polynucleotides; attaching adaptors to the 5’ and 3’ ends of the end-repaired polynucleotides using non-homologous end joining factors.
2. The method of claim 1, wherein the gDNA or cDNA is fragmented by enzymatic digestion, chemical cleavage, sonication, nebulization, or hydroshearing.
3. The method of claim 1, wherein the gDNA or cDNA is fragmented by sonication.
4. The method of any one of claims 1 to 3, wherein the DNA fragments are enzymatically end repaired and phosphorylated by using T4 DNA polymerase and T4 polynucleotide kinase.
5. The method of any one of claims 1 to 4, wherein prior to attaching adaptors to the polynucleotides, a single ‘A’ deoxynucleotide is added to the end-repaired DNA fragments by use of Klenow enzyme which lacks exonuclease activity.
6. The method of any one of the preceding claims, wherein the adaptors comprise a 3' overhang of a ‘T’ deoxynucleotide.
7. The method of any one of the preceding claims, wherein the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch.
8. The method of claim 7, wherein the adaptors are Y-shaped or U-shaped.
9. The method of claim 7 or claim 8, wherein the single stranded regions of the adaptors comprise one or more of the following sequences:
P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32)
P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
10. The method of any one of claims 1 to 4, wherein prior to attaching adaptors to the end-repaired polynucleotides, oligonucleotides are added to the 3’ ends of the DNA fragments with terminal transferase.
11. The method of claim 10, wherein the adaptors comprise an overhang of base pairs that are complementary to the oligonucleotides added to the 3’ ends of the DNA fragments.
12. The method of claim 11, wherein the adaptors comprise a double stranded region of complementary sequence and a single stranded region of sequence mismatch.
13. The method of claim 12, wherein the adaptors are Y-shaped or U-shaped.
14. The method of claim 12 or claim 13, wherein the single stranded regions of the adaptors comprise one or more of the following sequences:
P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32)
P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
15. The method of any one of the preceding claims, wherein the non-homologous end joining factors are LigD and Ku, or an engineered variant thereof.
16. The method of claim 15, wherein the LigD and Ku are from, or derived from, Mycobacterium.
17. The method of any one of the preceding claims, wherein a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, 99% identical to SEQ ID NOs: 1 to 20 and has LigD activity.
18. The method of any one of the preceding claims, wherein a non-homologous end joining factor is encoded by a polypeptide that has a sequence that is at least 80%, 85%, 90%, 95%, 98%, 99% identical to SEQ ID NOs: 21 to 30 and has Ku activity.
19. The method of claim 15, wherein the engineered variant of LigD lacks exonuclease activity.
20. The method of claim 19, wherein the engineered variant has the sequence of SEQ ID NO: 1 with the following substitution H373A.
21. A method to append an adaptor to the 5’ end of a polynucleotide, comprising the steps of:
(1) hybridizing a 5’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap;
(2) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product; and
(3) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide.
22. The method of claim 21, wherein the method further comprises appending a second adaptor to the 3’ end of the polynucleotide, comprising the steps of:
(4) hybridizing a 3’ flap adaptor to the polynucleotide of (3) to form a second hybridized product comprising a 3’ flap;
(5) contacting the second hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptors; and
(6) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang of the clipped hybridized product to form a polynucleotide comprising adaptors at the 5’ and 3’ ends.
23. A method of appending an adaptor to the 3’ end of a polynucleotide, comprising the steps of:
(A) hybridizing a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 3’ flap;
(B) contacting the hybridized product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptor; and
(C) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang to form a polynucleotide with an adaptor appended to the 3’ end.
24. The method of claim 23, wherein the method further comprises appending a second adaptor to the 5’ end of the polynucleotide, comprising the steps of:
(D) hybridizing a 5’ flap adaptor to the polynucleotide of (C) to form a second hybridized product comprising a 5’ flap;
(E) contacting the second hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the second hybridized product to form a nicked hybridized product; and
(F) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide.
25. A method of appending adaptors to the 5’ and 3’ ends of a polynucleotide, comprising:
(i) hybridizing a 5’ flap adaptor and a 3’ flap adaptor to a single stranded polynucleotide to form a hybridized product comprising a 5’ flap and a 3’ flap;
(ii) contacting the hybridized product with a structure-specific endonuclease that has 5’ flap cleavage activity, wherein the structure-specific endonuclease cleaves off the 5’ flap of the hybridized product to form a nicked hybridized product;
(iii) contacting the nicked hybridized product with a ligase to form a ligated product comprising a 5’ flap adaptor appended to the 5’ end of the polynucleotide;
(iv) contacting the ligated product with a second structure-specific endonuclease that has 3’ flap cleavage activity, wherein the second structure-specific endonuclease cleaves off the 3’ flap to form a clipped hybridized product that has a 3’ overhang of base pairs from the 3’ flap adaptors; and
(v) contacting the clipped hybridized product with a polymerase, wherein the polymerase fills in the 3’ overhang of the clipped hybridized product to form a polynucleotide comprising adaptors at the 5’ and 3’ ends.
26. The method of any one of claims 21, 22, 24 and 25, wherein the 5’ flap adaptor comprises a double stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the double stranded adaptor region comprises a universal sequence.
27. The method of claim 26, wherein the base-pair of the double stranded adaptor region adjacent to the single stranded probe region also matches to the target sequence of the polynucleotide.
28. The method of claim 26 or claim 27, wherein the universal sequence is a sequence that is commonly used to generate sequence reads using a next generation sequencing platform.
29. The method of any one of claims 21, 22, and 24 to 28, wherein the structure-specific endonuclease that has 5’ flap cleavage activity is FEN1.
30. The method of any one of claims 21, 22, and 24 to 29, wherein the ligase is a ligase selected from T4 DNA ligase, T7 DNA ligase, and Hi-T4 DNA ligase.
31. The method of any one of claims 22 to 30, wherein the 3’ flap adaptor comprises a single stranded adaptor region and a single stranded probe region, wherein the single stranded probe region is complementary to a target sequence of the polynucleotide, and wherein the single stranded adaptor region comprises a universal sequence.
32. The method of claim 31, wherein the universal sequence is a sequence that is commonly used to generate sequence reads using a next generation sequencing platform.
33. The method of claim 32, wherein the next generation sequencing platform utilizes bridge amplification.
34. The method of any one of claims 22 to 33, wherein the structure-specific endonuclease that has 3’ flap cleavage activity is XPF/MUS81.
35. The method of any one of claims 21 to 34, wherein the 5’ flap adaptor and/or the 3’ flap adaptor comprises a bar code sequence.
36. The method of claim 35, wherein the polynucleotides comprising 3’ and/or 5’ adaptors come from different genetic or polynucleotide sources and the source of the polynucleotides can be identified based upon the bar code sequence.
37. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising: an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under mild denaturing conditions;
(b) annealing a replacement oligonucleotide that comprises one or more locked nucleic acids (LNAs) to the polynucleotide comprising the 5' adaptor;
(c) removing the non-transferred strand under mild denaturing conditions and extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide using a non-strand displacing polymerase and dNTPs, wherein the extended product comprises a binding region for a template switch oligonucleotide, and wherein the removal of non-transferred strand and extension reaction can be performed in the same reaction;
(d) denaturing and removing the replacement oligonucleotide to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region;
(e) annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and
(f) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide.
38. The method of claim 37, wherein the adaptor strand may or may not have nucleotide modifications.
39. The method of claim 38, wherein the adaptor strand comprises one or more LNAs.
40. The method of any one of claims 37 to 39, wherein the non-transferred strand can be denatured and removed using mild heat.
41. The method of any one of claims 37 to 40, wherein the replacement oligonucleotide has a higher Tm than the non-transferred strand.
42. The method of any one of claims 37 to 41, wherein the replacement oligonucleotide partially hybridizes to the same region as the non-transferred strand, resulting in the 5' portion of the polynucleotide being single strand upstream of the replacement oligonucleotide.
43. The method of any one of claims 37 to 42, wherein the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
44. The method of any one of claims 37 to 43, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
45. The method of any one of claims 37 to 44, wherein the Tn5 transposome is immobilized on a streptavidin paramagnetic bead.
46. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a replacement oligonucleotide that comprises locked nucleic acids (LNAs) that remains hybridized to the adaptor under moderate denaturing conditions, and a non-transferred strand that can be removed under moderate denaturing conditions;
(b) denaturing under moderate denaturing conditions to remove the non-transferred strand and extending the polynucleotide comprising the 5' adaptor up to the replacement oligonucleotide comprising LNAs using a non-strand displacing polymerase and dNTPs, wherein the extended product comprises a binding region for a template switch oligonucleotide;
(c) denaturing and removing the replacement oligonucleotide comprising LNAs to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region;
(d) annealing a template switch oligo that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and
(e) extending from the template switch oligo at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide.
47. The method of claim 46, wherein the adaptor strand may or may not have nucleotide modifications.
48. The method of claim 47, wherein the adaptor strand comprises one or more LNAs.
49. The method of any one of claims 46 to 48, wherein the non-transferred strand is from 15 bp to 20 bp in length.
50. The method of any one of claims 46 to 49, wherein the non-transferred strand can be denatured and removed using mild heat, and wherein the replacement oligonucleotide remains hybridized to the adaptor under these conditions.
51. The method of any one of claims 46 to 50, wherein the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
52. The method of any one of claims 46 to 51, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
53. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand which contains linkage(s) that resist exonuclease activity that remains hybridized to the adaptor strand when the adaptor strand is appended to the 5' end of the polynucleotide;
(b) extending the polynucleotide comprising the 5' adaptor with a polymerase with 5’ to 3’ exonuclease activity ("5' exo polymerase"), wherein the 5' exo polymerase digests the hybridized non-appending strand up to the internal phosphorothioate linkage(s) to form a binding region for a template switch oligonucleotide;
(c) denaturing and removing the non-appending strand to isolate a polynucleotide that comprises the 5' adaptor and the template switch oligonucleotide binding region;
(d) annealing a template switch oligo that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and
(e) extending from the template switch oligo at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide.
54. The method of claim 53, wherein the adaptor strand may or may not have nucleotide modifications.
55. The method of claim 54, wherein the adaptor strand comprises one or more LNAs.
56. The method of any one of claims 53 to 55, wherein the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s), carbophosphonate linkage(s), pyridylphosphonate (PyrP) functionalized linkage(s), aminomethyl (AMP) or aminoethyl phosphonate (AEP) functionalized linkages, boranophosphate (BP) linkage(s), methylphosphonothioates (MPS) linkage(s), phosphorodithioates (SPS) linkage(s), thiophosphoramidates (NPS) linkage(s), boranomethylphosphonates (BMP) linkage(s), guanidine (GUA) linkage(s), morpholino phosphorodiamidate (PMO) linkage(s), and/or carbamate linkage(s).
57. The method of claim 56, wherein the linkage(s) that resist exonuclease activity are phosphorothioate linkage(s).
58. The method of any one of claims 53 to 57, wherein the polymerase with 5’ to 3’ exonuclease activity is selected from a Taq-based polymerase, and a Bst-based polymerase.
59. The method of any one of claims 53 to 58, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
60. The method of any one of claims 53 to 59, wherein the Tn5 transposome is immobilized on a streptavidin paramagnetic bead.
61. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor is appended to the 5' end of the polynucleotide, and wherein the 5' adaptor strand and the non-transferred strand comprise a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases;
(b) extending the polynucleotide comprising the 5' adaptor with a non-strand displacing polymerase and dNTPs up to the hybridized adaptor region;
(c) extending the polynucleotide comprising the 5' adaptor with a strand displacing polymerase with the dNTPs for the base pairs found only in the template switch oligonucleotide binding region so as to form a polynucleotide that comprises a 5' adaptor and the template switch oligonucleotide binding region;
(d) denaturing the polynucleotide of (c) and annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and
(e) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide.
62. The method of claim 61, wherein the adaptor strand may or may not have nucleotide modifications.
63. The method of claim 62, wherein the adaptor strand comprises one or more LNAs.
64. The method of any one of claims 61 to 63, wherein the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
65. The method of any one of claims 61 to 64, wherein after step (b), the dNTPs and polymerase are removed prior to step (c).
66. The method of claim 65, wherein the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead.
67. The method of any one of claims 61 to 66, wherein the strand displacing polymerase is selected from a phi29-based polymerase, and a Bst (large fragment)-based polymerase.
68. The method of any one of claims 61 to 67, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
69. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor strand is appended to the 5' end of the polynucleotide, and wherein the 5' adaptor strand and the non-transferred strand comprise a template switch oligonucleotide binding region, wherein a portion of the sequence of the template switch oligonucleotide binding region does not contain one of the four types of nucleobases;
(b) extending the polynucleotide comprising the 5' adaptor with a non-strand displacing polymerase and dNTPs up to the hybridized adaptor region;
(c) removing the non-transferred strand by selective denaturation;
(d) extending the polynucleotide comprising the 5' adaptor with a strand displacing polymerase with the dNTPs for the base pairs found only in the template switch oligonucleotide binding region, to form a polynucleotide that comprises a 5' adaptor and the template switch oligonucleotide binding region;
(e) denaturing the polynucleotide of (d) and annealing a template switch oligonucleotide that is blocked at its 3' end and comprises a 3' adaptor sequence to the template switch oligonucleotide binding region of the polynucleotide; and
(f) extending from the template switch oligonucleotide at the 3' end of the polynucleotide with a polymerase and dNTPs to form a polynucleotide that comprises adaptors at the 5’ and 3’ ends of the polynucleotide.
70. The method of claim 69, wherein the adaptor strand may or may not have nucleotide modifications.
71. The method of claim 70, wherein the adaptor strand comprises one or more LNAs.
72. The method of any one of claims 69 to 71, wherein the non-strand displacing polymerase is selected from a T4-based polymerase, a T7-based polymerase, a Pfu-based polymerase, and a Taq-based polymerase.
73. The method of any one of claims 69 to 72, wherein after step (d), the dNTPs and polymerase are removed prior to step (e).
74. The method of claim 73, wherein the dNTPs and polymerase are removed by using SPRI beads, or by magnetic bead-based washing if the adaptors appended to the 5' end of the polynucleotide are attached to a bead.
75. The method of any one of claims 69 to 74, wherein for step (c) the non-transferred strand is removed using moderate heat to selectively denature the strand, or application of a lambda exonuclease that selectively digests oligonucleotides containing 5’ phosphorylated ends.
76. The method of any one of claims 69 to 75, wherein the strand displacing polymerase is selected from a phi29-based polymerase, and a Bst (large fragment)-based polymerase.
77. The method of any one of claims 69 to 76, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
78. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that remains hybridized to the adaptor when the adaptor strand is appended to the 5' end of the polynucleotide, wherein the adaptor comprises a complementary template switch binding domain;
(b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase that displaces the non-transferred strand to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end;
(c) denaturing and annealing to the template switch binding domain a template switch oligonucleotide that is blocked at its 3' end, wherein the template switch oligonucleotide comprises a complementary sequence to the template switch binding domain and comprises an adaptor region that is not complementary to the sequence of the polynucleotide, leaving the 3' end region of the polynucleotide unhybridized; and
(d) providing a polymerase that has 3' to 5' exonuclease activity that first removes the unhybridized 3' end region of the polynucleotide and then extends from the complementary adaptor region of the template switch oligonucleotide to form a 3' adaptor on the polynucleotide.
79. The method of claim 78, wherein the adaptor strand may or may not have nucleotide modifications.
80. The method of claim 79, wherein the adaptor strand comprises one or more LNAs.
81. The method of any one of claims 78 to 80, wherein the strand displacing polymerase is selected from a phi29-based polymerase, and a Bst (large fragment)-based polymerase.
82. The method of any one of claims 78 to 81, wherein the polymerase that has 3' to 5' exonuclease activity is selected from a Pfu-based polymerase, a phi29-based polymerase, and E. coli DNA polymerase II.
83. The method of any one of claims 78 to 82, wherein the template switch oligonucleotide is blocked at its 3' end by having an -OH group, an ‘inverted T’ group, or a dideoxy version of a dNTP.
84. A method to append adaptors to the 5’ and 3’ ends of polynucleotides to form adaptor-polynucleotide constructs, comprising:
(a) appending an adaptor to the 5' end of a polynucleotide by tagmenting the polynucleotide with a Tn5 transposome comprising an adaptor strand that is transferred to the 5' end of the polynucleotide, and a non-transferred strand that can be removed under denaturing conditions, wherein the adaptor comprises a complementary template switch binding domain;
(b) extending to the ends of the polynucleotide comprising the 5' adaptor with a strand displacing polymerase to form a polynucleotide comprising the 5' adaptor and a complementary 5' adaptor region comprising a template switch binding domain on the 3' end;
(c) denaturing and annealing to the template switch binding domain a template switch oligonucleotide that is blocked at its 3' end, wherein the template switch oligonucleotide comprises a complementary sequence to the template switch binding domain and comprises an adaptor region that is not complementary to the sequence of the polynucleotide, leaving the 3' end region of the polynucleotide unhybridized; and
(d) providing a structure specific endonuclease that nicks the unhybridized 3' end region of the polynucleotide and then a polymerase extends from the complementary adaptor region of the template switch oligonucleotide to form a 3' adaptor on the polynucleotide.
85. The method of claim 84, wherein the adaptor strand may or may not have nucleotide modifications.
86. The method of claim 85, wherein the adaptor strand comprises one or more LNAs.
87. The method of any one of claims 84 to 86, wherein the strand displacing polymerase is selected from a phi29-based polymerase and a Bst (large fragment)-based polymerase.
88. The method of any one of claims 84 to 87, wherein the structure specific endonuclease is XPF/Mus81.
89. The method of any one of claims 37 to 88, wherein the Tn5 transposome is immobilized on a solid substrate by the adaptor strand hybridizing to an anchor oligonucleotide attached to the solid substrate.
90. The method of any one of claims 37 to 89, wherein the 5' adaptor and the 3' adaptor have different sequences selected from either P5 or P7:
P5: 5' AAT GAT ACG GCG ACC ACC GA 3' (SEQ ID NO: 32)
P7: 5' CAA GCA GAA GAC GGC ATA CGA GAT 3' (SEQ ID NO: 33).
91. The method of any one of claims 37 to 90, wherein the adaptor-polynucleotide constructs are used as templates for sequencing.
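As an illustration of claim 90, the following is a minimal Python sketch that checks whether a construct carries different adaptors at its two ends. The P5 and P7 sequences are those recited in the claim. Everything else is an assumption made for illustration: the function names are hypothetical, the insert sequence is made up, and the sketch assumes the 3' adaptor appears on the same strand as the reverse complement of the recited sequence, which the claim itself does not specify.

```python
# P5 and P7 as recited in claim 90 (SEQ ID NO: 32 and 33), spaces removed.
P5 = "AAT GAT ACG GCG ACC ACC GA".replace(" ", "")
P7 = "CAA GCA GAA GAC GGC ATA CGA GAT".replace(" ", "")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def end_adaptors(construct):
    """Hypothetical helper: name the adaptor found at each end of a construct.

    Assumption: the 5' adaptor is read directly at the start of the strand,
    and the 3' adaptor is present as the reverse complement at the end.
    Returns a (five_prime, three_prime) tuple; an entry is None if no
    recited adaptor matches that end.
    """
    construct = construct.upper()
    adaptors = {"P5": P5, "P7": P7}
    five = next(
        (name for name, s in adaptors.items() if construct.startswith(s)), None
    )
    three = next(
        (
            name
            for name, s in adaptors.items()
            if construct.endswith(reverse_complement(s))
        ),
        None,
    )
    return five, three


def has_different_end_adaptors(construct):
    """True if both ends carry a recited adaptor and the two differ."""
    five, three = end_adaptors(construct)
    return five is not None and three is not None and five != three


if __name__ == "__main__":
    insert = "ACGTTGCAGGCT"  # made-up insert, for illustration only
    construct = P5 + insert + reverse_complement(P7)
    print(end_adaptors(construct))
    print(has_different_end_adaptors(construct))
```

A construct carrying the same adaptor at both ends, for example P5 at the 5' end and the reverse complement of P5 at the 3' end, fails the check, which corresponds to the claim's requirement that the two adaptors have different sequences.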
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363592016P | 2023-10-20 | 2023-10-20 | |
| US63/592,016 | 2023-10-20 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025085618A1 (en) | 2025-04-24 |
Family
ID=95449084
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/051747 (Pending; WO2025085618A1 (en)) | Methods for appending adaptors onto polynucleotides | 2023-10-20 | 2024-10-17 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025085618A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5888737A (en) * | 1997-04-15 | 1999-03-30 | Lynx Therapeutics, Inc. | Adaptor-based sequence analysis |
| US20100167954A1 (en) * | 2006-07-31 | 2010-07-01 | Solexa Limited | Method of library preparation avoiding the formation of adaptor dimers |
| US20210363596A1 (en) * | 2015-06-09 | 2021-11-25 | Life Technologies Corporation | Methods, Systems, Compositions, Kits, Apparatus and Computer-Readable Media for Molecular Tagging |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12071711B2 (en) | | Method of preparing libraries of template polynucleotides |
| US12385085B2 (en) | | Preparation of templates for methylation analysis |
| US9328378B2 (en) | | Method of library preparation avoiding the formation of adaptor dimers |
| US9012184B2 (en) | | End modification to prevent over-representation of fragments |
| AU2003223730B2 (en) | | Amplification of DNA to produce single-stranded product of defined sequence and length |
| EP1546355A2 (en) | | Methods of use for thermostable rna ligases |
| US20240191288A1 (en) | | Blocking oligonucleotides for the selective depletion of non-desirable fragments from amplified libraries |
| WO2025085618A1 (en) | | Methods for appending adaptors onto polynucleotides |
| HK40014831B (en) | | Method of preparing libraries of template polynucleotides |
| HK40014831A (en) | | Method of preparing libraries of template polynucleotides |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24880569; Country of ref document: EP; Kind code of ref document: A1 |