WO2025114477A1 - Adaptors for ligation - Google Patents
- Publication number
- WO2025114477A1 (PCT/EP2024/083987)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dna
- adaptor
- sequence
- seq
- site
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- the present invention relates to the field of ligation of oligonucleotides to DNA fragments.
- the ligation methods of the invention may be used for attaching oligonucleotides comprising for example adaptors, primer binding sites, promoters, tags, barcodes or any combination of the aforementioned, to DNA fragments.
- Chromatin immunoprecipitation (ChIP) coupled to next-generation sequencing (ChIP-seq) is used to map the genomic occupancies of chromatin factors and histone modifications.
- Multiplexing is possible only when individual samples are first barcoded with a unique DNA sequence in the form of an adaptor before combining with other samples.
- adaptor ligation to nucleosomes or other protein-bound DNA fragments typically suffers from low efficiency, which is often compensated for by using an excess of adaptor molecules.
- the present invention provides adaptors designed such that free adaptors or adaptor dimers can be discriminated from adaptors attached to DNA fragments by selective amplification.
- the design of the fill-in adaptors allows that only adaptor ligated to the target chromatin fragments can be carried over during selective amplification.
- the target chromatin fragments are in general broadly defined as transcription factor bound, which typically are ~50 bp, or nucleosomal, which typically are ~150 bp.
- the chromatin fragments may for example be prepared cellular chromatin or they may be cell free chromatin.
- ligation of adaptor to target DNA is also referred to as eventful ligation, as opposed to self-ligation of adaptors.
- the adaptors of the invention comprise a single stranded amplification sequence and a nicking site.
- the single stranded amplification sequence is positioned at one end of the adaptor and is only functional as an amplification initiation site in the form of double-stranded DNA, i.e. after synthesis of the complementary strand.
- the nicking site is positioned close to the other end of the adaptor. If the adaptor is incubated at an elevated temperature after nicking, the resulting short stretch of oligonucleotides between the nicking site and the adaptor end will dissociate from the adaptor. Similarly, if adaptor dimers are formed, they will dissociate if incubated at elevated temperature after nicking.
- an adaptor ligated to a DNA fragment such as the gDNA moiety of a chromatin fragment, will not dissociate.
- a strand displacing polymerase will be able to elongate the nicked strand, and thereby synthesise the complementary strand of the amplification initiation site.
- Figures 1B to 1E show specific examples of fill-in adaptors of the invention.
- Figure 2 illustrates the principle.
- the fill-in adaptor according to the invention consists of 2 strands.
- the upper strand consists of 3 regions denoted A, B and C, whereas the lower strand consists of 3 regions denoted A’, B’ and C’.
- A comprises a single-stranded region (A1), and either constitutes a promoter when bound to its complementary sequence, or is a sequence complementary to a primer binding site.
- A1’ is substantially non-complementary to A1.
- A is designed so that transcription of the fill-in adaptor can only occur if A is annealed to its complementary sequence.
- A’ is resistant to exonuclease.
- C’ or -B’-C’- comprises a nicking endonuclease recognition site or pre-recognition site.
- Said recognition site comprises or consists of a nicking site, i.e. the site, where said endonuclease nicks the adaptor.
- the nicking endonuclease recognition site consists of a single nucleotide (e.g. dU), and in such embodiments, the recognition site consists of a nicking site.
- the nicking site is positioned at the 3’ end of the C’ segment.
- nicking endonuclease recognition site which can be any nicking endonuclease recognition site. Alternatively, it can be a pre-recognition site, which can be transformed into a nicking endonuclease recognition site, e.g. by the action of a DNA glycosylase.
- a 3’ hydroxyl group may be generated after nicking at the nicking site allowing subsequent primer extension by a strand-displacing polymerase, which synthesizes a new bottom strand and eventually reconstitutes the promoter or primer binding functionality of A.
- the resulting “primer” at the bottom strand is rendered very unstable due to the short length, especially under heat challenge.
- the bottom strand primer length and hence its melting temperature is increased dramatically once the adaptor is ligated to another DNA fragment.
- the length and hence the resulting heat stability of the bottom strand primer provides an effective selection basis to specifically reconstitute the A promoter/primer binding site in case of an eventful ligation product, while free adaptor monomers remain inactive.
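The length dependence described above can be illustrated with a rough numerical sketch. It does not use the Tm method defined elsewhere in this document; as a stated substitution, it applies the simple Wallace rule to short oligonucleotides and a common GC-content approximation to longer ones. The function name `approx_tm` and both sequences are hypothetical.

```python
def approx_tm(seq: str) -> float:
    """Rough melting temperature (deg C) of a DNA oligo paired with its perfect complement.

    Assumption: Wallace rule, 2*(A+T) + 4*(G+C), below 14 nt, and the
    GC-content approximation 64.9 + 41*(G+C - 16.4)/N for longer oligos.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    n = gc + at
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    if n < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


# Hypothetical sequences, chosen only for illustration.
stub = "GATCACT"        # short bottom-strand primer left on a free adaptor after nicking
insert = "ACGT" * 12    # 48-nt stand-in for the strand of a ligated DNA fragment

free_tm = approx_tm(stub)              # primer on an unligated adaptor
ligated_tm = approx_tm(insert + stub)  # primer extended by the ligated fragment

print(f"free adaptor primer:    {free_tm:.1f} C")
print(f"ligated adaptor primer: {ligated_tm:.1f} C")
```

Under these assumptions the short primer melts far below the ligated one, so an incubation temperature between the two values would remove the primer from free adaptors only.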
- Adaptor dimers could present yet another challenge. Thanks to the fork structure at the tail end of the fill-in adaptor, the tail end does not support ligation, and because A1’ is exonuclease resistant, it is refractory to routine end-repairing enzymes. As such, adaptor dimers can only exist in a head-to-head configuration, as illustrated in Figure 2. In this case, RNase can create a nick at both the top and bottom strands, essentially cleaving the adaptor dimers to monomeric forms. If subjected to a heat challenge, the adaptor dimers will dissociate into monomeric forms. This is also illustrated in Figure 2.
- the fill-in adaptor of that example consists of a T7 promoter, partial SBS primer binding site allowing sequencing in the Illumina platform, a randomized 8-nucleotide Unique Molecule Identifier (UMI) followed by an 8-nucleotide sample-specific barcode.
- the T7 promoter is used for in-vitro transcription (IVT) of any downstream DNA fragment.
- the top strand of the T7 promoter is designed to be largely single stranded by default while the bottom strand is replaced by a stretch of seven consecutive cytosines interconnected by exonuclease-resistant phosphorothioate linkage.
- the mismatch hence creates a fork DNA structure that is refractory to DNA ligation. More importantly, such single-stranded T7 promoter is incapable of driving IVT by T7 RNA polymerase.
- the eighth nucleotide within the sample barcode is a deoxy-Uridine. Embedding a single deoxy-Uridine within a DNA duplex essentially creates a recognition site for DNA glycosylases. The DNA glycosylase removes the nitrogenous base while leaving the sugar phosphate backbone intact. This creates an apurinic/apyrimidinic site (i.e. an AP site).
- the terms “AP site” and “abasic site” are used interchangeably.
- the AP site (herein also referred to as a nicking site) is recognized by endonucleases, which produce a 1 nucleotide gap in C’.
- the nick at the nicking site allows subsequent primer extension by a strand-displacing Bst polymerase, which synthesizes a new bottom strand and eventually reconstitutes a functional double-strand T7 promoter.
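The dU-based nicking described above can be sketched as a simple string operation. This is a toy model under stated assumptions: the bottom strand is written 5’ to 3’, the dU is written as `U`, and glycosylase plus endonuclease action is modelled as removal of that one nucleotide. The function name `nick_at_du` and the sequences are hypothetical.

```python
def nick_at_du(bottom_5to3: str) -> tuple[str, str]:
    """Split a bottom strand (5'->3', dU written as 'U') at its single dU.

    The dU itself is dropped, modelling the 1-nucleotide gap left after base
    excision and backbone cleavage. Returns (primer, remainder), where
    `primer` is the fragment 5' of the gap, assumed to carry the new 3' end.
    """
    if bottom_5to3.count("U") != 1:
        raise ValueError("expected exactly one dU in the bottom strand")
    i = bottom_5to3.index("U")
    return bottom_5to3[:i], bottom_5to3[i + 1:]


# Hypothetical bottom strand with the dU as the eighth nucleotide from the 5' end.
bottom = "ACGTACGUTTGACC"
primer, remainder = nick_at_du(bottom)
print(primer, remainder)

# In a ligation product the 5' end of the bottom strand is joined to a DNA
# fragment, so the primer is longer by the fragment length (48-nt stand-in here).
ligated_primer = "ACGT" * 12 + primer
print(len(primer), len(ligated_primer))
```

The 7-nucleotide primer of the free adaptor is the piece expected to dissociate under heat, whereas the ligated primer stays annealed and can be extended.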
- Another example of a useful nicking strategy and associated adaptor design to be used with the present invention is shown in Figure 1D, where a nicking restriction enzyme variant is used. Restriction enzymes recognize a specific sequence and cut either one or both strands at defined positions relative to the recognized sequence.
- the Nb.BsrDI nicking restriction enzyme variant recognizes a GCAATG sequence on the top strand while introducing a nick in the bottom strand.
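A minimal sketch of locating such a recognition sequence on a top strand is shown below. It only reports where the GCAATG motif starts and does not model the position of the nick on the bottom strand. The function name `find_sites` and the example sequence are hypothetical.

```python
import re


def find_sites(top_5to3: str, motif: str = "GCAATG") -> list[int]:
    """Return 0-based start positions of `motif` on a top strand written 5'->3'.

    A lookahead is used so that overlapping occurrences would also be found.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(top_5to3.upper())]


# Hypothetical top-strand sequence containing two recognition sites.
sites = find_sites("AAGCAATGTTGCAATG")
print(sites)
```

An adaptor design would normally be checked this way to confirm that the motif occurs exactly once, so that nicking happens only at the intended location.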
- Another example of a useful nicking strategy and associated adaptor design to be used with the present invention is shown in Figure 1E, in which the D10A Cas9 variant is used to introduce a nick in the bottom strand guided by a gRNA recognizing a 21mer. A PAM sequence (here GG) is located adjacent to the gRNA binding site.
- the length and hence the resulting heat stability of the bottom strand primer provide an effective selection basis to specifically reconstitute the T7 promoter of eventful ligation product, while free adaptor monomers remained inactive for T7 transcription, hence failed to be carried over for downstream RNA adaptor ligation and cDNA conversion.
- Adaptor dimers could present yet another challenge. Thanks to the fork structure at the tail end of the adaptor, it does not support ligation. And because of the phosphorothioate linkage of the mismatch poly-C sequence, the fork structure is exonuclease resistant and hence refractory to routine end-repairing enzymes. As such adaptor dimer can only exist in a head-to-head configuration.
- the endonucleases can act as a restriction enzyme by nicking at both the top and bottom strands, essentially cleaving the adaptor dimers to monomeric forms, which dissociate from each other at elevated temperature, as demonstrated in Figure 2.
- a first aspect of the invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure: 5’ - A - B - C - 3’ 3’ - A’ - B’ - C’ - 5’ wherein a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially
- nicking endonuclease recognition site is not a ribonucleotide positioned at the 3’ end of C’.
- a second aspect of the invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
- A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
- a third aspect of the invention relates to a method of attaching adaptor(s) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to the invention; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) if C’ comprises a nicking endonuclease pre-recognition site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site incubating the sample with a nicking endonuclease recognising said nicking endonuclease
- a fourth aspect of the invention relates to a method of attaching adaptor(s) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to the invention, wherein C’ of said adaptor comprises a dU; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) incubating the sample with a Uracil-DNA glycosylase under conditions allowing for activity of said enzyme, thereby generating an AP site; ii) incubating the sample with a nicking endonuclease recognising an AP site under conditions allowing for activity of said enzyme, wherein steps i) and ii) may be performed simultaneously, partly simultaneously or sequentially, e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows ii) 5’ - C - 3’
- the AP site generated by the Uracil-DNA-glycosylase in step (d) is an apyrimidinic site.
- a fifth aspect of the invention relates to a method of amplification of DNA fragments, said method comprising the step of a) preparing DNA fragments attached to adaptors by the method as described herein; b) amplifying said DNA fragments attached to adaptors in vitro.
- An adaptor can be added at one or both ends of a DNA fragment.
- the same or different adaptor may be ligated at each end.
- two species of adaptor are provided, that is two different adaptors, such that a different adaptor may be ligated at each end.
- Such an embodiment is particularly useful in the case that the DNA amplification sequence in the adaptor is complementary to a primer binding site, and the adaptor-attached fragments are amplified by primer-directed amplification, e.g. by PCR.
- the two adaptor species may comprise (or provide) different primer binding sites.
- Figure 1 shows an example of a prior art adaptor described in Peter van Galen et al., Molecular Cell, 2016 (top strand: SEQ ID NO. 83; bottom strand SEQ ID NO. 84) and four examples of fill-in adaptors according to the invention (1 B-E).
- 1B shows an example of a fill-in adaptor with a nicking endonuclease recognition site on the lower strand at the 5th base pair from the 5’ end (top strand: SEQ ID NO. 85; bottom strand SEQ ID NO. 86).
- 1C shows an example of a fill-in adaptor with a deoxy-Uridine on the lower strand at the 8th base pair from the 5’ end (top strand: SEQ ID NO.
- 1D shows an example of a fill-in adaptor with a restriction endonuclease recognition site positioned between the UMI and Barcode (top strand: SEQ ID NO. 89; bottom strand SEQ ID NO. 90).
- 1E shows an example of a fill-in adaptor with a CRISPR-Cas9 nickase recognition site (top strand: SEQ ID NO. 91; bottom strand SEQ ID NO. 92).
- Figure 2 shows a schematic illustration of the concept of the invention.
- the products exist as a mixture of excessive unreacted adaptor monomers, adaptor dimers and adaptors ligated to target DNA fragments, such as genomic DNA fragments or cell free DNA.
- These molecules are then subjected to a nicking endonuclease, which specifically nicks at the nicking endonuclease recognition site or to a DNA glycosylase and a nicking endonuclease which specifically nicks at the nucleotide within the adaptor that is recognized by the DNA glycosylase, resulting in a 3’ priming site proximal to the ligation end of the adaptor.
- Heat challenge dissociates the small fragments generated by nicking in the unligated adaptors as well as any adaptor dimers.
- Subsequent primer extension by a strand-displacing DNA polymerase (in the figure exemplified by Bst polymerase, but it could be any strand-displacing DNA polymerase) relies on the existence of a heat-stable 3’ priming site, provided by the ligation to DNA fragments, such as genomic DNA fragments (which are typically ~50-150 bp) or cell free DNA.
- Primer extension by the strand-displacing DNA polymerase reconstitutes the DNA amplification sequence in the double-stranded form necessary to transcribe or amplify the ligated adaptor-DNA fragments, due to the presence of a stable 3’ priming site. Due to the lack of a free 3’ priming site for polymerase, any free adaptor contaminants are not elongated and thus in the free adaptors or adaptor dimers the DNA amplification sequence is not regenerated and remains inactive. Thus, only adaptors ligated to the targeted DNA fragments, e.g. genomic DNA fragments or cell free DNA can be amplified.
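The selection principle of Figure 2 can be summarised in a toy model. All numbers and names below are assumptions for illustration only: a 7-nucleotide primer stub, a hypothetical length cutoff standing in for the heat challenge, and a dimer junction taken to be held by roughly two stubs because both strands are nicked.

```python
from dataclasses import dataclass

STUB_NT = 7               # assumed length of the nicked bottom-strand stub
HEAT_STABLE_MIN_BP = 20   # hypothetical cutoff standing in for the heat challenge


@dataclass
class Molecule:
    kind: str           # "free_adaptor", "adaptor_dimer" or "ligated"
    insert_bp: int = 0  # length of the ligated DNA fragment, if any


def anchoring_bp(m: Molecule) -> int:
    """Base pairs holding the nicked 3' end on its template (toy model)."""
    if m.kind == "free_adaptor":
        return STUB_NT
    if m.kind == "adaptor_dimer":
        return 2 * STUB_NT  # head-to-head dimer nicked on both strands
    if m.kind == "ligated":
        return STUB_NT + m.insert_bp
    raise ValueError(f"unknown molecule kind: {m.kind}")


def is_filled_in(m: Molecule) -> bool:
    """True if the primer survives heating and can be extended by the polymerase."""
    return anchoring_bp(m) >= HEAT_STABLE_MIN_BP


pool = [
    Molecule("free_adaptor"),
    Molecule("adaptor_dimer"),
    Molecule("ligated", insert_bp=50),
    Molecule("ligated", insert_bp=150),
]
amplifiable = [m.kind for m in pool if is_filled_in(m)]
print(amplifiable)
```

In this sketch only the two ligated molecules pass the cutoff, mirroring the statement that free adaptors and adaptor dimers are not elongated.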
- Figure 3A+B show an example of how gDNA fragments may be tagged, amplified by linear amplification or PCR and sequenced in methods of the invention for determining the barcode (BC), UMI and/or gDNA sequence, for example as part of a method to profile and quantify chromatin modifications
- Fill-in adaptor of this invention is added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample. Adaptors may be added at one side or both sides of a given gDNA fragment. After pooling, optionally splitting pool into sub-pools, and submitting these to chromatin immunoprecipitation, gDNA fragments of each chromatin modification sub-pool are purified.
- a second tag is ligated onto the gDNA fragment.
- the second tag comprises an amplification sequence and may optionally comprise a second barcode sequence.
- the non-ligatable terminus of the fill-in adaptor ensures that second tag is not ligated onto the fill-in adaptor.
- the second tag serves as a primer binding site for reverse transcription.
- 6) The double-tagged gDNA fragments are amplified by PCR using primers specific for the amplification sequence of the fill-in adaptor and the amplification sequence of the second tag.
- part of the amplification sequence of the fill-in adaptor is identical to part of the amplification sequence of the second tag (diagonally shaded area). Sequencing platform-specific adaptors may be added with the primer sequences. 7) The UMI-BC-gDNA part is sequenced.
- Figure 3B shows a specific scenario in which adaptors are added to both ends of a given gDNA fragment and the gDNA fragment is still amplified as intended.
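Reading out the sequenced UMI-BC-gDNA part can be sketched as plain string slicing. The sketch assumes that the read starts at the first UMI base and that an 8-nucleotide UMI is followed directly by an 8-nucleotide barcode, as in the example adaptor described earlier; real read layouts may differ. `parse_read` and the example read are hypothetical.

```python
from typing import NamedTuple

UMI_LEN = 8   # randomized Unique Molecule Identifier
BC_LEN = 8    # sample-specific barcode


class ParsedRead(NamedTuple):
    umi: str
    barcode: str
    gdna: str


def parse_read(read: str) -> ParsedRead:
    """Split a read into UMI, barcode and gDNA, assuming the order UMI-BC-gDNA."""
    if len(read) < UMI_LEN + BC_LEN:
        raise ValueError("read is shorter than UMI plus barcode")
    return ParsedRead(
        umi=read[:UMI_LEN],
        barcode=read[UMI_LEN:UMI_LEN + BC_LEN],
        gdna=read[UMI_LEN + BC_LEN:],
    )


# Hypothetical read used only to show the slicing.
parsed = parse_read("AAAACCCC" + "GGGGTTTT" + "ACGTAC")
print(parsed)
```

The barcode would then assign the read to its sample, while the UMI lets reads from the same original molecule be counted once.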
- Figure 3C+D show an example of how gDNA fragments may be tagged, amplified by PCR and sequenced in methods of the invention for selectively amplifying gene-specific or locus-specific gDNA fragments, for example to determine gene-specific or locus-specific levels of chromatin modifications.
- 1) Fill-in adaptor is added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample. After pooling, splitting pool into sub-pools, and chromatin immunoprecipitation, gDNA fragments of each chromatin modification sub-pool are purified. 2) After heating the sample, the DNA polymerase is added for adaptor fill-in.
- a second tag with a locus-specific sequence serves as a primer binding site for reverse transcription
- Fill-in adaptor gDNA fragments are amplified by PCR using one primer specific for amplification sequence of the Fill-in adaptor and a primer specific for one or multiple loci of interest. Sequencing platform-specific adaptors may be added as part of the primers. Primers may also optionally comprise a second barcode sequence. 6) The UMI-BC-gDNA part is sequenced.
- Figure 3E shows an example of how gDNA fragments may be tagged, amplified by PCR and sequenced in methods of the invention for determining the barcode (BC), UMI and/or gDNA sequence, for example as part of a method to profile and quantify chromatin modifications.
- a mixture of two fill-in adaptor species (“A” and “B”) of this invention, sharing the same BC but carrying different primer binding sites A and B is used.
- A and B adaptors are added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample, so that a gDNA moiety is either ligated to an adaptor A or B on one end, or ligated on both ends to two adaptors in one of the possible combinations A-gDNA-A, A-gDNA-B, B-gDNA-A, B-gDNA-B.
- A-B double-tagged gDNA fragments are amplified by PCR using two primers specific for amplification sequence A and B, respectively. 4) The UMI-BC-gDNA part is sequenced.
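The combinations listed above can be enumerated to see which products a PCR with one A-specific and one B-specific primer would amplify. The sketch assumes that amplification requires exactly one A and one B primer binding site on the fragment; the function name `amplifiable_by_ab_pcr` is hypothetical.

```python
from itertools import product


def amplifiable_by_ab_pcr(left: str, right: str) -> bool:
    """True if the two ends carry one A and one B adaptor (assumed PCR requirement)."""
    return {left, right} == {"A", "B"}


# All double-tagged combinations of the two adaptor species.
combos = [(left, right) for left, right in product("AB", repeat=2)]
amplified = [
    f"{left}-gDNA-{right}"
    for left, right in combos
    if amplifiable_by_ab_pcr(left, right)
]
print(amplified)
```

Under this assumption A-gDNA-A and B-gDNA-B products are not carried through, and neither are fragments tagged at one end only.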
- Figure 4 shows a performance comparison between the adaptors shown in Figure 1B (r5) and Figure 1C (u5 and u5_III) and the prior art adaptor shown in Figure 1A (3C) using MINUTE-ChIP.
- the figure shows % of free adaptor compared to total sequences (read statistics from NGS analysis) of indicated libraries.
- r5, u5 and u5_III adaptors yielded a ~20-50-fold higher fraction of desired reads containing gDNA sequences.
- Figure 5 shows ligation of adaptors with an endonuclease nicking site according to this invention, to cell free DNA (cfDNA) fragments in the human blood.
- Figure 5A shows the schematic workflow used in the experiment: ligation was performed by directly adding a reaction mix containing T4 Polynucleotide Kinase and T4 Ligase to 200 µL of each plasma sample. The four barcoded plasma samples were pooled. 200 µL of the pool was subjected to DNA purification and library preparation using T7 amplification, reverse transcription and library PCR according to the MINUTE-ChIP protocol, yielding the “Input” library.
- Figure 5B shows a histogram of the fragment size distribution in each sequencing library.
- Figure 5C shows boxplots with the number of unique fragments (estimated library size) recovered from each of the four plasma samples that were barcoded and pooled, in the Input, H3-ChIP and H3K4me3-ChIP library.
- Figure 5D shows the reads in the libraries, which were mapped to the human genome (hg38) and plotted over 17644 known transcription start sites. The heatmaps show that H3K4me3-ChIP recovered predominantly reads mapping to the transcription start sites of genes.
- the term “adaptor” refers to an oligonucleotide, which is double-stranded at one end, and which thus can be ligated to a DNA fragment.
- amplification in relation to nucleic acids refers to any in vitro method for increasing the number of copies of a nucleotide sequence with the use of a polymerase.
- Amplification reactions include, for example, polymerase chain reactions (PCR), transcription, reverse transcription, replication or combinations of the aforementioned.
- DNA amplification comprises PCR.
- nucleotide sequences are considered to be “complementary” to each other, when said nucleotide sequences are able to hybridise to each other via formation of Watson-Crick base-pairing in a manner such that all nucleotides of one sequence are base paired with all nucleotides of the second sequence.
- DNA amplification sequence refers to a sequence, which promotes transcription or replication of DNA, wherein said transcription or replication only is promoted when the bottom strand of the DNA amplification sequence is available.
- the DNA amplification sequence may comprise or consist of a promoter or a primer binding site.
- melting temperature in terms of nucleic acids is the temperature at which 50% of two substantially complementary nucleotide sequences form a stable double helix and the other 50% is separated to single strand molecules.
- the melting temperature may also be referred to as Tm.
- the Tm as used herein is calculated using a nearest-neighbor method based on the method described in Breslauer et al., Proc. Natl. Acad. Sci. 83, 3746-50 (1986) using a salt concentration parameter of 50 mM and nucleotide sequence concentration of 900 nM.
- the method is implemented by the software "Multiple Primer Analyzer" from Life Technologies/Thermo Fisher Scientific Inc.
- nucleotide sequences are considered to be “non-complementary” if they are not capable of hybridising to each other, preferably under standard conditions for hybridization, such as in storage buffer with 10 mM Tris and 1 mM EDTA at a pH of 8.0 and a temperature of 5°C below the melting temperature of one of said nucleotide sequences with a complementary sequence forming Watson-Crick base pairs at all positions.
- two nucleotide sequences are considered to be ‘non-complementary’ to each other if at the most 30%, preferably at the most 20%, more preferably at the most 10% of the nucleotides of one sequence can form Watson-Crick base-pairs with nucleotides of the second sequence, when the sequences are aligned with each other.
- sequence identity describes the relatedness between two amino acid sequences or between two nucleotide sequences, i.e. a candidate sequence and a reference sequence based on their pairwise alignment.
- sequence identity is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet.
- the Needleman-Wunsch algorithm is also used to determine whether a given amino acid in a sequence other than the reference sequence corresponds to a given position of the reference sequence.
- strand displacing polymerase refers to a nucleic acid polymerase that has a strand displacement activity apart from its nucleic acid synthesis activity. That is, a strand displacing nucleic acid polymerase can continue nucleic acid synthesis on the basis of the sequence of a nucleic acid template strand (i.e., reading the template strand) while displacing a complementary strand that had been annealed to the template strand.
- nucleotide sequences are considered to be “substantially complementary” to each other, when said nucleotide sequences are able to hybridise to each other, preferably under standard conditions for hybridization, such as in storage buffer with 10 mM Tris and 1 mM EDTA at a pH of 8.0 and a temperature of 5°C below the melting temperature of one of said nucleotide sequences with a complementary sequence forming Watson-Crick base pairs at all positions.
- two nucleotide sequences are considered to be ‘substantially complementary’ to each other if at least 80%, preferably at least 85% of the nucleotides of one sequence can form Watson-Crick base-pairs with nucleotides of the second sequence, when the sequences are hybridised to each other.
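The percentage-based definitions of ‘non-complementary’ (at most 30%) and ‘substantially complementary’ (at least 80%) can be checked with a short sketch. It makes simplifying assumptions: both strands have equal length, are aligned without gaps, and the bottom strand is written 3’ to 5’ so that paired positions line up by index. The function names and sequences are hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(top_5to3: str, bottom_3to5: str) -> float:
    """Fraction of aligned positions that form Watson-Crick base pairs."""
    if not top_5to3 or len(top_5to3) != len(bottom_3to5):
        raise ValueError("strands must be non-empty and of equal length")
    pairs = sum(
        (a, b) in WATSON_CRICK
        for a, b in zip(top_5to3.upper(), bottom_3to5.upper())
    )
    return pairs / len(top_5to3)


def classify(top_5to3: str, bottom_3to5: str) -> str:
    """Apply the 80% and 30% thresholds stated in the definitions."""
    fraction = paired_fraction(top_5to3, bottom_3to5)
    if fraction >= 0.80:
        return "substantially complementary"
    if fraction <= 0.30:
        return "non-complementary"
    return "intermediate"


# Hypothetical 10-nt top strand against a poly-C bottom strand and a full complement.
fork = classify("TAATACGACT", "CCCCCCCCCC")
duplex = classify("ACGTACGTAC", "TGCATGCATG")
print(fork, "|", duplex)
```

The poly-C example pairs at one position out of ten, which is the kind of mismatch that keeps the fork end of the adaptor single-stranded.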
- top strand refers to the sense strand of DNA
- bottom strand refers to the anti-sense strand
- the present invention provides for the use of at least one adaptor in the methods described herein. This includes providing different adaptors such that a different adaptor may be ligated at each end of a DNA fragment. Accordingly, the term “at least one adaptor” may denote that at least one species of adaptor is provided. As noted above, and described further below, the methods herein may comprise the use of two adaptor species, or two different adaptors, such that a different adaptor is ligated at each end of DNA fragment.
- the term “at least one adaptor” may also include that a plurality of adaptors may be employed in the methods described herein. For example, two or more adaptors as described herein may be provided. In some embodiments, one, two, three, four, five, six, seven, eight, nine or ten adaptors may be provided. The skilled person would be capable of determining how many adaptors may be required.
Fill-in Adaptor
- the present invention relates to adaptors, which herein are also referred to as “fill-in” adaptors.
- the fill-in adaptors are at least partly double-stranded oligonucleotides of known sequence.
- the fill-in adaptors may also comprise stretches of unknown or random sequences, such as UMI sequences.
- the fill-in adaptors may be ligated to DNA fragments, and depending on the exact sequences of the fill-in adaptors, the ligation of adaptor may enable the generation of amplification-ready products of the target DNA fragments.
- the adaptor of the present invention is a partly double-stranded oligonucleotide, typically adopting a fork-like configuration.
- the majority of the adaptor is typically double-stranded, however one end is non-complementary, and thus the adaptor comprises single strands at one of the ends.
- the upper strand comprises or consists of 3 regions, which herein are denoted A, B and C, whereas the lower strand comprises or consists of 3 regions, which herein are denoted A’, B’ and C’.
- Each of A, B, C, A’, B’ and C’ consists of a nucleotide sequence.
- Each of A, B, C, A’, B’ and C’ is described in more detail below, and the fill-in adaptor of the invention may comprise any of the A, B, C, A’, B’ and C’ described in the following sections.
- the fill-in adaptor of the present invention is a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
- A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a nicking end
- the present invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
- A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
- the adaptor comprises or consists of an oligonucleotide of the general structure:
- the DNA amplification sequence is a promoter sequence of an RNA polymerase or it comprises a primer binding site.
- the adaptor does not comprise any ribonucleotides.
- the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 79 as one strand and SEQ ID NO: 80 as the other strand.
- the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 81 as one strand and SEQ ID NO: 82 as the other strand.
- the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 87 as one strand and SEQ ID NO: 88 as the other strand ( Figure 1C).
- the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 89 as one strand and SEQ ID NO: 90 as the other strand ( Figure 1 D).
- the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 91 as one strand and SEQ ID NO: 92 as the other strand ( Figure 1 E).
- A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’,
- A2 is not present, in which case A consists of A1.
- A is the top strand of a DNA amplification sequence.
- Said DNA amplification sequence may be any sequence, which promotes transcription or replication of DNA, when the bottom strand of the DNA amplification sequence is available.
- the fill-in adaptor comprises A, but does not comprise a sequence complementary to A1. Accordingly, the DNA amplification sequence of the free fill-in adaptor is not functional and does not promote transcription/replication.
- the DNA amplification sequence will promote transcription/replication.
- the bottom strand of fill-in adaptor can be generated with the aid of a strand-replacing polymerase using the nicked 3’ end as priming site, thereby reconstituting an active DNA amplification sequence.
- the DNA amplification sequence can be any sequence promoting transcription or replication only when the bottom strand has been reconstituted.
- the DNA amplification sequence is the promoter sequence of an RNA polymerase.
- A is recognised and bound by an RNA polymerase when in a double-stranded DNA form with its complementary sequence.
- A comprises or consists of the T7 promoter, the SP6 promoter, the T3 phage promoter or the Syn5 promoter.
- A, when bound to its complementary sequence, is recognized by T7 RNA polymerase, SP6 RNA polymerase, bacteriophage T3 RNA polymerase or cyanophage Syn5 polymerase.
- A is recognized by T7 RNA polymerase, when bound to its complementary sequence as a double-stranded DNA.
- Said T7 promoter preferably comprises or consists of a sequence of SEQ ID NO: 72 or a sequence sharing at least 90%, preferably at least 95% sequence identity therewith.
- Said SP6 RNA polymerase preferably recognizes the sequence of SEQ ID NO: 73 or a sequence sharing at least 95% sequence identity therewith.
- Said T3 RNA polymerase preferably recognizes the sequence of SEQ ID NO: 74 or a sequence sharing at least 95% sequence identity therewith.
- Said Cyanophage Syn5 polymerase preferably recognizes the sequence of SEQ ID NO: 75 or a sequence sharing at least 95% sequence identity therewith.
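The identity thresholds above (for example at least 90% or at least 95%) can be illustrated with a small calculation. The following is a minimal sketch only: it assumes two sequences of equal length compared position by position without gaps, which is a simplification of alignment-based sequence identity, and the sequences shown are hypothetical placeholders rather than any SEQ ID NO of the disclosure.

```python
def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    # Count positions at which both sequences carry the same nucleotide.
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(query: str, reference: str, threshold: float = 95.0) -> bool:
    """True if the query shares at least `threshold` percent identity."""
    return percent_identity(query, reference) >= threshold


# Hypothetical 20-nt reference and a variant carrying one substitution.
reference = "ACGTACGTACGTACGTACGT"
variant = "ACGTACGTACGTACGTACGA"
print(percent_identity(variant, reference))          # 19 of 20 positions match
print(meets_identity_threshold(variant, reference))  # compared against 95%
```

For sequences of different lengths, a proper pairwise alignment would be needed before counting identical positions.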
- A contains a sequence complementary to a primer binding site.
- the primer binding site may be any sequence complementary to a primer.
- neither said primer nor the primer binding site is prone to formation of secondary structure.
- A may be any suitable length.
- A comprises or consists of a promoter sequence
- A should at minimum be the length of said promoter, and frequently A is exactly the length of the promoter.
- when A is a sequence complementary to a primer binding site, A is preferably long enough to allow hybridisation of the primer to the primer binding site with high affinity.
- A consists of a sequence of nucleotides in the range of 10 to 100 nucleotides, such as in the range of 15 to 50 nucleotides, such as in the range of 15 to 40 nucleotides.
- said nucleotides are deoxyribonucleotides.
- A consists of a sequence of deoxyribonucleotides in the range of 10 to 100 deoxyribonucleotides, such as in the range of 15 to 50 deoxyribonucleotides, such as in the range of 15 to 40 deoxyribonucleotides.
- A’ of the adaptor of the present disclosure is part of the lower strand of the adaptor, and it is therefore described in the 3’->5’ direction herein.
- A’ consists of 3’ - A1’ - A2’ - 5’.
- A2’ is not present, in which case A’ consists of A1’. However, if A2 is present, then A2’ is also present, and if A2 is not present, then A2’ is also not present.
- A2 and A2’ are sequences of nucleotides substantially complementary to each other.
- the length of A2 and A2’ is not important, but typically, they will be the same length and relatively short, e.g. less than 10 nucleotides, such as less than 5 nucleotides.
- said nucleotides are deoxyribonucleotides.
- A2 and A2’ may comprise less than 10 deoxyribonucleotides, such as less than 5 deoxyribonucleotides.
- A1’ is a sequence of nucleotides, which is non-complementary to A1. Thus, A1 does not hybridise with A1’, which results in a fork-like structure at one end of the fill-in adaptor.
- the 3’ end of A1’ is exonuclease resistant and/or it contains a primer extension blocking group. That way, the 3’ end of A’ cannot serve as priming site for elongation. DNA amplification will then only take place once the complementarity of A is restored in a double-stranded DNA form.
- A1’ is exonuclease resistant. In that manner, A1’ will not be removed by exonucleases. If A1’ were to be removed by exonuclease, this could create a priming site for the polymerase, and allow elongation even when the adaptor is not ligated to a DNA fragment.
- A1’ comprises or consists of a sequence of nucleotides connected through exonuclease-resistant phosphorothioate linkages.
- A1’ may comprise or consist of a sequence of 3 to 35, such as in the range of 5 to 15 consecutive nucleotides connected through exonuclease-resistant phosphorothioate linkages.
- Said nucleotides may be any nucleotides, however in one embodiment, the nucleotides are deoxycytidine monophosphate.
- A1’ comprises or consists of a sequence of 3 to 35 consecutive cytosines connected through exonuclease-resistant phosphorothioate linkages.
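A stretch of consecutive cytosines joined by phosphorothioate linkages, as described above, can be written out programmatically when preparing an oligonucleotide order. This is a minimal sketch; the use of `*` between bases to mark a phosphorothioate linkage is an assumption borrowed from common oligonucleotide ordering notation, and the function name is hypothetical.

```python
def phosphorothioate_homopolymer(base: str = "C", length: int = 10) -> str:
    """Return a homopolymer with every internucleotide linkage marked '*'.

    Assumption: '*' denotes a phosphorothioate linkage between two
    neighbouring nucleotides (a convention, not part of the source text).
    """
    if base not in ("A", "C", "G", "T"):
        raise ValueError("base must be one of A, C, G or T")
    if not 3 <= length <= 35:
        raise ValueError("length is outside the 3 to 35 range given in the text")
    # Joining the bases with '*' yields length - 1 modified linkages.
    return "*".join(base * length)


print(phosphorothioate_homopolymer("C", 5))
```

A stretch of n nucleotides written this way contains n - 1 marked linkages.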
- A1’ may also comprise one or more nucleotide analogues or modifications which are exonuclease-resistant.
- Said nucleotide analogues or modifications may for example be selected from the group consisting of phosphorothioate linkages, phosphoramidite C3 spacer, inverted deoxythymidine bases, 2'-O-methyl and 2'-O-methoxyethyl nucleosides.
- the 3’ end of A1’ may contain a nucleotide that has been modified to block extension.
- the 3’ end of the lower strand of the adaptor will not be elongated, when the adaptor is not ligated to a DNA fragment.
- Said modification to block extension may be any modification known to the skilled person to block primer extension.
- the 3’ end of A1’ comprises a dideoxynucleotide. In one embodiment of the present disclosure, the 3’ end of A1’ comprises a phosphoramidite C3 spacer.
- A1’ comprises a sequence that prevents RNA polymerase engagement and function in other manners.
- A1’ may comprise a sequence that supports formation of a hairpin, loop or other secondary structure.
- A’ may be any desirable length.
- the length of A’ is independent from the length of A.
- A’ may be either shorter, longer or the same length as A.
- A1 and A1’ may be the same or different lengths.
- A’ is a sequence of nucleotides of in the range of 2 to 100 nucleotides, such as in the range of 2 to 35 nucleotides, such as in the range of 2 to 10 nucleotides. It is preferred that none of said nucleotides are ribonucleotides
- the fill-in adaptors of the present invention comprise the structures -B - C- on the top strand and the structure -B’ - C’- on the lower strand. Said structure may also be depicted as:
- B and B’ are sequences of nucleotides, which are substantially complementary to each other.
- B and B’ are typically the same length, however, the length of B and B’ is not so important and can be adjusted according to the specific needs of the adaptor.
- B and/or B’ may comprise one or more functionality, such as a primer binding site, a barcode, and/or a UMI.
- B and B’ may be in the range of 5 to 100 nucleotides long.
- Said nucleotides may preferably be deoxyribonucleotides.
- C and C’ are also sequences of nucleotides, which are substantially complementary to each other. It is preferred that at least 80%, such as at least 85% of the nucleotides of C can form Watson-Crick base-pairs with nucleotides of C’. In some embodiments it is preferred that C and C’ are complementary to each other except for that up to 3, preferably up to 2, for example up to 1 nucleotide of C cannot form Watson-Crick basepair with nucleotides of C’.
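The complementarity criteria for C and C’ (at least 80% Watson-Crick pairing, or at most a few unpaired positions) can be checked with a short script. This is a minimal sketch under stated assumptions: C is written 5’ to 3’ and C’ is written 3’ to 5’, so that position i of one string faces position i of the other; both have the same length; and a deoxy-uridine in C’ is written as `U` and treated as pairing with A. The sequences are hypothetical.

```python
# Watson-Crick pairs, with U (deoxy-uridine) treated as pairing with A.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def count_mismatches(c_top_5to3: str, c_bottom_3to5: str) -> int:
    """Number of positions of C that cannot pair with the facing base of C'."""
    if len(c_top_5to3) != len(c_bottom_3to5):
        raise ValueError("this sketch assumes C and C' have the same length")
    return sum((t, b) not in PAIRS for t, b in zip(c_top_5to3, c_bottom_3to5))


def paired_fraction(c_top_5to3: str, c_bottom_3to5: str) -> float:
    """Fraction of positions of C forming a Watson-Crick pair with C'."""
    if not c_top_5to3:
        raise ValueError("sequences must not be empty")
    mismatches = count_mismatches(c_top_5to3, c_bottom_3to5)
    return (len(c_top_5to3) - mismatches) / len(c_top_5to3)


c_top = "GATCAGTC"          # hypothetical C, written 5' to 3'
c_bottom = "CTAGUCAG"       # hypothetical C', written 3' to 5', with one dU
c_bottom_mm = "CTAGUCAA"    # same, with one non-pairing position at the end

print(paired_fraction(c_top, c_bottom))
print(paired_fraction(c_top, c_bottom_mm), count_mismatches(c_top, c_bottom_mm))
```

The second example has 7 of 8 positions paired, which would satisfy the 80% criterion and the "up to 1 nucleotide" criterion.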
- C’ comprises dU
- all other nucleotides of C’ are complementary to nucleotides of C. In other words, apart from the dU, C and C’ are complementary.
- C’ or B’-C’ comprises a nicking endonuclease recognition site
- C’ and C are complementary.
- C and C’ are typically the same length.
- C and C’ are sequences of up to 12 nucleotides.
- C’ is 2 or more nucleotides in length.
- C’ may be between 2 and 25 nucleotides in length, preferably C’ is between 2 and 20 nucleotides in length, such as between 2 to 12 nucleotides, most preferably between 5 to 12 nucleotides in length.
- C’ will dissociate from the fill-in adaptor if the adaptor is not ligated to a DNA fragment.
- C’ is between 3 and 12 nucleotides in length, for example between 2 and 10 nucleotides, such as between 4 to 10 nucleotides, for example between 5 to 8 nucleotides in length.
- B and/or B’ may comprise one or more functionalities. It is also comprised within the invention that -B-C- together and/or -B’-C’- together comprises one or more functionalities. Typically, most functionalities are comprised within B and/or B’.
- -B-C- and/or -B’-C’- may comprise one or more functionality, such as a primer binding site, a barcode, and/or a UMI.
- -B-C- and/or -B’-C’- contains a primer binding site.
- Said primer binding site may be any sequence complementary to a primer.
- neither said primer nor the primer binding site is prone to formation of secondary structure.
- ligation of the adaptor to the DNA fragments may facilitate later handling of the DNA fragments.
- the primer binding site may thus be any primer binding site, which is useful for later handling of the DNA fragments.
- the fill-in adaptor and in particular - B- C - or - B’ - C’- may comprise a primer binding site for said platform specific primer.
- -B-C- and/or -B’-C’- may contain a partial or full-length SBS3 primer binding site.
- -B-C- and/or -C’-B’- may contain a random DNA sequence acting as a unique molecular identifier, also referred to as a UMI sequence herein.
- a UMI sequence acting as a unique molecular identifier
- each UMI sequence is different.
- the UMI may comprise a random sequence of in the range of 4 to 20 nucleotides, for example in the range of 6 to 16 nucleotides.
- the UMIs each consist of in the range of 5 to 15 random nucleotides.
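The UMI lengths given above determine how many distinct identifiers are available. The following is a minimal sketch: `umi_diversity` computes the number of possible sequences for a given length, and `random_umi` simulates one random UMI in software. In practice random positions would typically be introduced during oligonucleotide synthesis; the simulation and both function names are assumptions for illustration only.

```python
import random
from typing import Optional


def umi_diversity(length: int) -> int:
    """Number of distinct UMI sequences of the given length (4 bases per position)."""
    return 4 ** length


def random_umi(length: int, rng: Optional[random.Random] = None) -> str:
    """Simulate one random UMI of `length` nucleotides."""
    if not 4 <= length <= 20:
        raise ValueError("length is outside the 4 to 20 range given in the text")
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


print(umi_diversity(10))                 # 4**10 possible sequences
print(random_umi(10, random.Random(0)))  # seeded only to make the run repeatable
```

Longer UMIs lower the chance that two different molecules carry the same identifier by coincidence.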
- a barcode sequence is a unique sequence comprised within all adaptors ligated to a specific selection of DNA fragments. Barcode sequences are particularly useful for multiplexing. Thus, different barcode sequences can e.g. be used to label DNA fragments from different samples, so that all adaptors ligated to DNA fragments of one sample contains the same barcode sequences, whereas all adaptors ligated to DNA fragments of another samples contains a different barcode sequence. In that manner, each DNA fragment ligated to an adaptor can be assigned to a specific sample, even if DNA fragments from different samples are mixed.
- Each barcode may comprise a sequence of in the range of 4 to 20 nucleotides, for example in the range of 6 to 16 nucleotides. Preferably, each barcode consists of in the range of 5 to 15 nucleotides.
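The use of barcodes for multiplexing described above can be sketched as a simple demultiplexing step. This is a minimal illustration under stated assumptions: each read begins with a fixed-length UMI followed by a fixed-length barcode and then the fragment sequence, barcodes are matched exactly, and all sequences, lengths and sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[str],
    barcode_to_sample: Dict[str, str],
    umi_len: int,
    barcode_len: int,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assign reads to samples by barcode; returns sample -> [(umi, insert)]."""
    by_sample: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for read in reads:
        umi = read[:umi_len]
        barcode = read[umi_len:umi_len + barcode_len]
        insert = read[umi_len + barcode_len:]
        # Reads whose barcode is not in the table are kept apart.
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append((umi, insert))
    return dict(by_sample)


barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = [
    "AAAA" + "ACGTAC" + "GGGCCC",
    "CCCC" + "TGCATG" + "TTTAAA",
    "GGGG" + "TTTTTT" + "ACACAC",  # barcode not in the table
]
result = demultiplex(reads, barcodes, umi_len=4, barcode_len=6)
for sample, entries in result.items():
    print(sample, entries)
```

A practical implementation would usually tolerate sequencing errors in the barcode, which this sketch does not attempt.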
- -B-C- and/or -B’-C’- in addition contains one or more random sequences, e.g. a random sequence of in the range of 5 to 15 nucleotides.
- the fill-in adaptors of the invention comprise a nicking endonuclease recognition site or a pre-recognition site.
- the nicking endonuclease recognition site or the pre-recognition site consists of a single nucleotide (e.g. dU)
- the nicking endonuclease recognition site comprises several nucleotides
- said recognition site may be positioned in C’ or it may be spread over both B’ and C’.
- -B’-C’- may comprise said recognition site.
- the nicking endonuclease recognition site comprises a nicking site. Said nicking site may preferably be positioned within C’, and may more preferably be positioned at 3’ end of C’.
- the adaptor is modified with tags that enable detection or purification.
- the tags may function as a marker.
- the tag is an affinity group and/or bioorthogonal group.
- affinity group refers to any identifiable tag, group, or moiety that is capable of being specifically bound by another compound or composition (optionally attached or linked to a solid support, such as a bead, a filter, a plate, a membrane, a chromatographic resin, etc) for detection, identification and purification purposes. It is understood that many different species of affinity groups are known in the art and may be used, either individually or a combination. An exemplary affinity group is biotin.
- Adaptors comprising a biotin tag may be used in affinity chromatography, fluorescent or electron microscopy, ELISA assays, ELISPOT assays, western blots and other immunoanalytical methods.
- Another exemplary affinity group is an antigen, which is specifically recognised by an antibody.
- biological molecules such as those present in a bacterial, yeast or mammalian cell.
- the biological molecules can be, e.g., proteins, nucleic acids, fatty acids, or cellular metabolites.
- Adaptors comprising the bioorthogonal Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags.
- Tetrazine tags may be used for click-chemistry reactions.
- bioorthogonal click-chemistry reactions are strain-promoted azide-alkyne cycloaddition (SPAAC), where an azide reacts with an alkyne, and tetrazine ligation, where a tetrazine reacts with a trans-cyclooctene.
- SPAAC Strain-promoted azide-alkyne cycloaddition
- the adaptor further comprises a tag, such as an affinity group and/or bioorthogonal group, wherein the tag is selected from the group of Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags.
- Adaptors comprising an N-Hydroxysuccinimide Ester (NHS ester) may be used in a variety of bioconjugation reactions, such as for labelling and/or purification.
- the adaptor comprises an NHS-based handle.
- the present invention relates to adaptors, which comprise a nicking endonuclease recognition site or a pre-recognition site.
- a nicking endonuclease pre-recognition site can be converted into a nicking endonuclease recognition site with the aid of a DNA glycosylase as described below in the section “DNA glycosylase”.
- nicking endonuclease refers to an enzyme capable of introducing a nick in a double stranded DNA sequence.
- the nicking endonuclease may for example be any of the nicking endonucleases described herein in this section.
- nicking refers to the cleavage of only one strand of a fully double-stranded nucleic acid molecule or a double-stranded portion of a partially double-stranded nucleic acid molecule.
- nicking endonucleases nick said DNA at a specific position relative to the nicking endonuclease recognition site, which is a nucleotide sequence that is recognized by the nicking endonuclease.
- nicking endonuclease recognition site as used herein is a sequence recognised by a given nicking endonuclease, wherein the nicking endonuclease nicks DNA at a specific position relative to said sequence.
- the nicking endonuclease recognition site consists of a single nucleotide (e.g. dU), in which case the recognition site consists of the nicking site.
- the nicking is performed by nicking endonucleases which may recognize a particular nucleotide sequence of a fully or partially double-stranded nucleic acid and cleave only one strand of the fully or partially double-stranded nucleic acid at a specific position relative to the location of the recognition sequence.
- the nicking endonuclease is a Type III endonuclease of SEQ ID NO: 24 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonuclease of SEQ ID NO: 24.
- the Type III endonuclease may also be a polypeptide of any one of SEQ ID NO: 24 to 27 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonucleases of SEQ ID NO: 24 to 27.
- the Type III endonuclease may also be a polypeptide of any one of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonucleases of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41.
- said functional homologues of Type III endonuclease produce a 3’-O-phosphate end, which allows the extension of the DNA by polymerases.
- the nicking endonuclease recognition site for a Type III endonuclease may be an AP site.
- the nicking endonuclease is a Type VIII endonuclease of SEQ ID NO: 28 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonuclease of SEQ ID NO: 28 to 41.
- the nicking endonuclease is a polypeptide of any one of SEQ ID NO: 28 to 31 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonuclease of SEQ ID NO: 28 to 31.
- the nicking endonuclease is a polypeptide of any one of SEQ ID NO: 28 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonuclease of SEQ ID NO: 28 to 41.
- said functional homologues of Type VIII endonuclease produce a 3’-OH end, which allows the extension of the DNA by polymerases.
- the nicking endonuclease recognition site for a Type VIII endonuclease may be an AP site.
- the nicking endonuclease is a DNA glycosylase-lyase Endonuclease IV of any one of SEQ ID NO: 31 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the glycosylase-lyase endonucleases of SEQ ID NO: 31 to 41.
- the nicking endonuclease is a Type IV endonuclease of SEQ ID NO: 42 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonuclease of SEQ ID NO: 42 - 50.
- said functional homologues of Type IV endonuclease produce an α,β-unsaturated aldehyde end.
- the nicking endonuclease recognition site for a Type IV endonuclease may be an AP site.
- the nicking endonuclease is a nicking endonuclease recognising an AP site, preferably an apyrimidinic site.
- the fill-in adaptor comprises a CRISPR recognition site.
- the nicking endonuclease is a CRISPR endonuclease and the nicking endonuclease recognition site is a CRISPR recognition site.
- the fill-in adaptor comprises a CRISPR recognition site and no UMI.
- the fill-in adaptor comprises a CRISPR recognition site and a UMI, wherein the UMI is positioned in either A or B, preferably in A2.
- the nicking endonuclease recognition site is a CRISPR recognition site, preferably a CRISPR Cas9 recognition site.
- the nicking endonuclease recognition site is CRISPR D10A nicking site comprising a PAM sequence (NGG) downstream of the nicking site.
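Candidate PAM positions in an adaptor sequence can be located with a short pattern search. This is a minimal sketch, assuming the sequence is written 5’ to 3’ on the strand that carries the PAM and that N stands for any of A, C, G or T; the example sequence is hypothetical, and the sketch does not model guide RNA design or the position of the nick.

```python
import re
from typing import List


def find_pam_sites(seq_5to3: str) -> List[int]:
    """Return 0-based start positions of every NGG motif, including overlaps."""
    # The lookahead matches without consuming characters, so overlapping
    # motifs such as the two in "AGGG" are both reported.
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq_5to3.upper())]


example = "ATGGCCAGGTT"  # hypothetical sequence
print(find_pam_sites(example))
```

Each reported index is the position of the N of an NGG motif on the strand that was searched; the opposite strand would need to be searched separately via its reverse complement.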
- C’ may comprise a PAM sequence
- the remainder of the CRISPR recognition site may be comprised in B’.
- the CRISPR recognition site may differ depending on the sequence of the guide RNA as well as the particular CRISPR endonuclease; the skilled person will be able to design a CRISPR recognition site.
- the CRISPR endonuclease is a CRISPR endonuclease of SEQ ID NO: 65 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the CRISPR endonuclease of SEQ ID NO: 65 - 67.
- the nicking endonuclease is a restriction endonuclease.
- a restriction endonuclease may be a restriction enzyme with nicking activity.
- the nicking endonuclease recognition site is a recognition site recognised by a restriction enzyme with nicking activity.
- useful restriction endonucleases include Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof.
- the nicking endonuclease recognition site is a nicking site recognised by Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof.
- the nicking endonuclease recognition site is GCTCTTC, CCD, GCAGTG, GGATC, CCTCAGC, GAATGC, CACGAG or GTCTC.
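The recognition sequences listed above can be searched for in a candidate adaptor strand, for example to confirm that an intended site is present or that no unintended site occurs elsewhere. This is a minimal sketch, assuming the IUPAC code D stands for A, G or T and that only the strand as written (5’ to 3’) is searched; the example sequence is hypothetical.

```python
import re
from typing import Dict, List

# Recognition sequences exactly as listed in the text.
RECOGNITION_SITES = [
    "GCTCTTC", "CCD", "GCAGTG", "GGATC", "CCTCAGC", "GAATGC", "CACGAG", "GTCTC",
]

# IUPAC symbols needed for the sites above (D = A, G or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}


def site_to_regex(site: str) -> str:
    """Translate a recognition sequence with IUPAC symbols into a regex."""
    return "".join(IUPAC[base] for base in site)


def find_sites(seq_5to3: str) -> Dict[str, List[int]]:
    """Return recognition site -> 0-based start positions found in the sequence."""
    seq = seq_5to3.upper()
    hits: Dict[str, List[int]] = {}
    for site in RECOGNITION_SITES:
        # Lookahead so that overlapping occurrences are all reported.
        pattern = re.compile("(?=" + site_to_regex(site) + ")")
        positions = [m.start() for m in pattern.finditer(seq)]
        if positions:
            hits[site] = positions
    return hits


print(find_sites("TTGCTCTTCAAGGATCTT"))  # hypothetical sequence
```

Because these sites are not palindromic, a complete check would also search the reverse complement of the sequence.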
- the nicking endonuclease is a restriction endonuclease of any one of SEQ ID NO: 51 - 64 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the restriction endonucleases of SEQ ID NO: 51 - 64.
- the adaptor comprises a nicking endonuclease pre-recognition site.
- a nicking endonuclease pre-recognition site is a site, which can be transformed into a nicking endonuclease recognition site, for example by the action of a DNA glycosylase.
- the pre-recognition site comprises or consists of deoxy-uridine (dU). In such cases the recognition site is the same as the nicking site.
- the dU may be recognized by a DNA glycosylase, preferably a Uracil DNA Glycosylase (UDG).
- the uracil DNA glycosylase can be used to cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site.
- UDG uracil DNA glycosylase
- the resulting apyrimidinic sites block replication by DNA polymerases and may be recognized by nicking endonucleases.
- the pre-nicking site may be converted into a nicking endonuclease recognition site by the action of a DNA glycosylase.
- C’ comprises one deoxy-Uridine positioned in C’ in a position selected from the group of position 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20.
- the deoxy-Uridine may be positioned in C’ at position 5, 6, 7, 8, 9 or 10.
- the deoxy-Uridine may be positioned in C’ at position 5, 6, 7, or 8.
- the deoxy-Uridine is positioned in C’ at position 5.
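The effect of the dU position on the fragments produced by nicking can be sketched as follows. This is an illustration under loudly stated assumptions: positions are counted from 1 starting at the 5’ end of C’ (the text does not state the counting direction), the dU is written as `U`, and glycosylase treatment followed by nicking is modelled simply as removal of the dU and a strand break at that position, ignoring the chemistry of the resulting ends. The sequence and function names are hypothetical.

```python
from typing import Tuple


def place_du(c_prime_5to3: str, position: int) -> str:
    """Return C' with a dU ('U') at the given 1-based position."""
    if not 5 <= position <= 20:
        raise ValueError("position is outside the 5 to 20 range given in the text")
    if position > len(c_prime_5to3):
        raise ValueError("position lies beyond the end of C'")
    i = position - 1
    return c_prime_5to3[:i] + "U" + c_prime_5to3[i + 1:]


def split_at_du(c_prime_with_du: str) -> Tuple[str, str]:
    """Model nicking at the dU: return the fragments on either side of it."""
    if c_prime_with_du.count("U") != 1:
        raise ValueError("this sketch expects exactly one dU in C'")
    i = c_prime_with_du.index("U")
    return c_prime_with_du[:i], c_prime_with_du[i + 1:]


c_prime = place_du("GACTTCAG", 5)  # hypothetical 8-nt C', written 5' to 3'
print(c_prime)
print(split_at_du(c_prime))
```

With the dU at position 5, the fragment on the 5’ side of the nick is only 4 nucleotides long in this model, and moving the dU further along lengthens it.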
- the glycosylase may be a monofunctional or bifunctional glycosylase.
- Monofunctional glycosylases have only glycosylase activity.
- monofunctional glycosylases may cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site.
- the DNA glycosylase is a monofunctional glycosylase.
- Bifunctional glycosylases have glycosylase activity and endonuclease activity.
- bifunctional glycosylases may cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site and nick at the nicking site.
- the DNA glycosylase is a bifunctional glycosylase.
- the DNA glycosylase for example the Uracil-DNA glycosylase and the nicking endonuclease are combined in one enzyme.
- DNA glycosylase is a Uracil DNA glycosylase of SEQ ID NO: 1 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonuclease of SEQ ID NO: 1 - 23.
- step d) comprises incubating the sample with Uracil DNA glycosylase (UDG) and with DNA glycosylase-lyase Endonuclease VIII.
- step d) comprises incubating the sample with Antarctic Thermolabile Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease III.
- step d) comprises incubating the sample with Afu Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease IV.
- the DNA glycosylase is a Uracil-DNA glycosylase, for example Antarctic Thermolabile Uracil DNA glycosylase (UDG) or Afu Uracil DNA glycosylase (UDG).
- UDG Antarctic Thermolabile Uracil DNA glycosylase
- UG Afu Uracil DNA glycosylase
- the nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV.
- the present disclosure also provides methods of attaching adaptor(s) (e.g. any of the fill-in adaptors described herein) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to any one of the preceding items; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) if C’ comprises an endonuclease pre-nicking site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site incubating the sample with a nicking endonuclease recognising
- the present disclosure also provides a method of attaching adaptor(s) (e.g. any of the fill-in adaptors described herein) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to any one of items as described herein; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) incubating the sample with a Uracil-DNA glycosylase under conditions allowing for activity of said enzyme, thereby generating an AP site; ii) incubating the sample with a nicking endonuclease recognising an AP site under conditions allowing for activity of said enzyme, wherein steps i) and ii) may be performed simultaneously, partly simultaneously or sequentially, e) incubating the sample at a temperature that is higher than
- the steps of incubating the sample at a temperature that is higher than the Tm of i) and ii) and incubating the samples with a strand-displacing DNA polymerase can be performed either sequentially or simultaneously.
- the sample is incubated with the strand-displacing DNA polymerase at a temperature that is higher than the Tm of i) and ii).
- the method further comprises a step of cold shock, wherein the sample comprising DNA fragments ligated to adaptor is quickly transferred to a low temperature after nicking and the heat treatment.
- Said cold shock usually comprises incubation at a temperature in the range of 0 °C to 4 °C, wherein said step is performed immediately after step e).
- step d) is performed at a temperature in the range of 20 °C to 80 °C.
- step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, for example in the range of 20 to 50°C, such as in the range of 25 to 37°C.
- the methods also comprise a step of heat treatment, which is performed in order to allow C’ to dissociate from unligated fill-in adaptors, and/or to allow -C’-C- to dissociate from any adaptor dimers after RNA-nicking.
- the step of heat treatment should be performed at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows: i) the duplex of 5’ - C - 3’ with 3’ - C’ - 5’; ii) 5’ - C - C’ - 3’
- step e) is performed at a temperature in the range of 40 °C to 80 °C, such as in the range of 45 °C to 70 °C, for example in the range of 50 °C to 70 °C.
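Whether a chosen heat-treatment temperature lies above the melting temperature of a short C/C’ duplex can be estimated roughly. This is a minimal sketch that uses the Wallace rule of thumb, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), which is an assumption introduced here and not a method named in the text; it is only a coarse estimate for short oligonucleotides and ignores salt, concentration, mismatches and modified bases. The sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


def heat_step_sufficient(c_seq: str, step_temperature_c: float) -> bool:
    """True if the step temperature exceeds the estimated Tm of the C/C' duplex."""
    return step_temperature_c > wallace_tm(c_seq)


c_top = "GATCAGTC"  # hypothetical 8-nt C, written 5' to 3'
print(wallace_tm(c_top))
print(heat_step_sufficient(c_top, 50.0))
```

The estimate covers only the short C/C’ duplex; the longer duplex formed at an adaptor dimer would need a more detailed thermodynamic model.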
- the methods of the invention also comprise a step of incubating the sample with a strand-displacing DNA polymerase.
- the strand-displacing DNA polymerase needs a 3’-hydroxyl group for primer extension. In other embodiments, a nick may be sufficient to start primer extension by a strand-displacing DNA polymerase. In some embodiments of this disclosure, the strand-displacing DNA polymerase has a strong activity at elevated temperatures to ensure the nicked strand does not reanneal. The person skilled in the art will appreciate that a variety of thermophilic strand-displacing polymerases can be used.
- the strand-displacing DNA polymerase may be any DNA polymerase with the ability to displace downstream DNA encountered during synthesis with newly synthesised DNA.
- the strand displacing DNA polymerase is a Bst polymerase, DNA Polymerase I, Large (Klenow) Fragment or DNA Polymerase from Thermococcus litoralis.
- the strand displacing DNA polymerase may be any DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with Bst DNA Polymerase of SEQ ID NO: 68.
- the strand displacing DNA polymerase is a Bst polymerase comprising a large fragment, wherein said large fragment comprises or consists of a sequence sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with the Large fragment of Bst polymerase SEQ ID NO: 69.
- the strand displacing DNA polymerase may be any DNA polymerase of SEQ ID NO: 76 or a strand displacing DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase I, Large (Klenow) Fragment of SEQ ID NO: 76.
- the strand displacing DNA polymerase may be any DNA polymerase of SEQ ID NO:77 or a strand displacing DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase from Thermococcus litoralis of SEQ ID NO: 77.
- the strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with phi 29 DNA polymerase of SEQ ID NO: 70 or Taq DNA polymerase of SEQ ID NO: 71.
- Said incubation with the strand displacing DNA polymerase is performed under conditions allowing for activity of said enzyme.
- the skilled person will be able to determine suitable conditions for the DNA polymerase of their choice.
- the step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, such as 20 to 50°C, for example in the range of 25 to 37°C.
- the strand displacing DNA polymerase is a Polymerase with 5'→3' exonuclease activity, such as Taq Polymerase.
- the adaptors are ligated at one end of the DNA fragments, and in other embodiments adaptors may be ligated at both ends.
- two different adaptors i.e. two different adaptor species
- the methods may involve 1-sided or 2-sided adaptor ligation. This may be the case whether the adaptor-ligated DNA fragments are amplified by linear amplification (e.g. by in vitro transcription) or by non-linear (e.g. exponential) amplification (e.g. by PCR), for example as depicted in Figure 3.
- two or more different adaptors may be provided for use in the method. These may be provided separately or together. Accordingly, in some embodiments, a mixture of fill-in adaptors (i.e. two, or two or more adaptor species) is provided.
- Different adaptors may differ in the amplification sequence that they contain. In particular, they may differ in the primer binding site that is provided to the adaptor-ligated DNA fragment, more particularly, at each end thereof.
- the different adaptor species may comprise complements of different primer binding sites (that is the sequence A in the adaptor may be different).
- each of the fill-in adaptor species may share the same barcode (BC) but comprise different primer binding sites.
- each of the adaptor species contain a sequence that is complementary to a primer binding site, wherein the primer binding site of each of the adaptor species is distinct.
- a plurality of fill-in adaptors may be provided which contain sequences that are complementary to distinct primer binding sites.
- two fill-in adaptors are provided for example as a mixture.
- the two adaptors may be referred to as “A” and “B” adaptors, A and B being representative of first and second adaptors.
- Two adaptors for use together to provide a pair of amplification primer binding sites may be regarded as paired, or cognate, adaptors
- two fill-in adaptor species may be added randomly to the DNA fragments as described herein.
- the fill-in adaptor species may bind to all or a fraction of DNA fragments from each sample.
- an A or B (i.e. first or second) adaptor may become ligated to just one end of a fragment.
- the DNA fragment may be ligated to an adaptor at each end, either the same or a different adaptor.
- one adaptor e.g. adaptor A
- the DNA fragment is ligated to both adaptors, one at each end (e.g. adaptors A and B).
- DNA fragments that have been ligated to two distinct fill-in adaptors may be termed “double-tagged DNA fragments”.
- in double-tagged DNA fragments, the DNA may be ligated to the two fill-in adaptors in several different combinations, for example A-DNA-A, A-DNA-B, B-DNA-A or B-DNA-B.
- double-tagged DNA fragments may be amplified during PCR by using primers that are specific for the distinct amplification sequences present in the distinct fill-in adaptor species. For example, if the double-tagged DNA fragment is ligated to distinct adaptor species at each end (A-DNA-B or B-DNA-A), the provided primers may be specific for the amplification sequence of A and B, respectively.
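The combinations listed above can be enumerated in a short sketch. This is an illustration only, not part of the claimed method: the labels "A" and "B" stand for the first and second adaptor species, and the function names `ligation_products` and `primers_needed` are hypothetical.

```python
from itertools import product

# Hypothetical labels for the first and second adaptor species.
ADAPTORS = ("A", "B")


def ligation_products(adaptors=ADAPTORS):
    """Enumerate every left/right adaptor combination on a double-tagged fragment."""
    return [f"{left}-DNA-{right}" for left, right in product(adaptors, repeat=2)]


def primers_needed(product_name):
    """Return the adaptor-specific primers whose binding sites occur on the product."""
    left, _, right = product_name.split("-")
    return sorted({left, right})


if __name__ == "__main__":
    for name in ligation_products():
        print(name, primers_needed(name))
```

Under these assumptions, fragments carrying distinct adaptors at each end (A-DNA-B or B-DNA-A) are the ones for which both an A-specific and a B-specific primer are relevant.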
- the UMI-BC-gDNA portion is sequenced following PCR amplification.
- adaptors with appropriate primer binding site sequences.
- a representative example of such adaptors is provided by SEQ ID NOs. 93 and 94 which set out the top and bottom strand sequences of a first adaptor, and SEQ ID NOs. 95 and 96 which set out the top and bottom strand sequences of a second adaptor.
- the DNA fragments ligated with such adaptors may be amplified by the forward and reverse primers set out in SEQ ID NOs. 97 and 98 respectively.
- the fill-in adaptors of the present invention are useful for ligation to any DNA fragments.
- Said DNA fragments may for example be chromatin fragments.
- Chromatin from any source of cells or tissues may be used.
- Said chromatin may for example be fragmented using mechanical or enzymatic means, and a specific antibody may be used to precipitate those chromatin fragments that associate with any desired antigen recognized by the specific antibody.
- Chromatin fragments from decomposing cells within an organism may also be present in cell-free material, such as liquid biopsies, such as extracellular fluid, blood, urine, lymph fluid, ascites fluid, and in such cases no further fragmentation may be needed.
- Such chromatin fragments may be precipitated.
- Specific antibodies against DNA or chromatin binding factors, histone modifications or any other desired molecule associating with DNA may be used, including antibodies against modification of the DNA itself, such as cytosine methylation-specific antibodies.
- the DNA fragments consist of or comprise genomic DNA (gDNA), such as gDNA fragments.
- the DNA fragments are protein-bound DNA fragments.
- the DNA fragments are gDNA fragments bound to proteins.
- said fragments may comprise or consist of nucleosomes and/or other genomic DNA fragments bound to chromatin proteins such as transcription factors.
- the majority of the gDNA fragments are in the form of nucleosomes, such as mononucleosomes.
- the DNA fragments are naked genomic DNA.
- the gDNA may be derived from any organism of interest, and thus the genomic DNA may for example be eukaryotic or prokaryotic.
- the DNA fragments comprise or consist of cell free DNA.
- Cell free DNA is typically already fragmented, and thus it is frequently not required to further fragment cell free DNA.
- the cell free DNA is bound by proteins.
- the cell free DNA is in the form of chromatin fragments.
- the cell free DNA may largely be in the form of nucleosomes.
- the cell free DNA is in the form of naked DNA, i.e. not bound to proteins.
- cell free DNA refers to a DNA molecule or a set of DNA molecules freely circulating in a biological sample, for example in blood.
- Cell free DNA is also known as “circulating DNA”.
- Cell free DNA is extracellular, and this term is used as opposed to the intracellular DNA, which can be found, for example, in the cell nucleus or mitochondria.
- the DNA fragments are selected from the group consisting of cDNA, DNA produced by whole genome amplification, primer extension products comprising at least one double-stranded terminus, and PCR amplicons.
- the DNA fragments are obtained by isolating chromatin from a cellular sample and fragmenting said chromatin.
- the DNA fragments are obtained by lysing cells of a cellular sample and fragmenting said chromatin. Said fragmenting may be done by any useful means, for example, the DNA fragments may have been prepared by mechanical shearing and/or enzymatic digestions, nebulisation, sonication, point-sink shearing, passage through a pressure cell, using French pressure cells, transposome mediated fragmentation and/or digestion with restriction enzymes and/or endonucleases.
- the genomic DNA is fragmented by MNase digestion. MNase digestion leads to fragmentation mainly into mononucleosomes and/or dinucleosomes.
- the fragmentation is preferably done in a manner so that the fragmented DNA comprises or essentially consists of chromatin fragments.
- the DNA fragments may have any desirable size. However, the methods of the invention are particularly useful for ligating adaptors to short DNA fragments. Frequently, the DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 500 base pairs, such as in the range of 20 to 200 base pairs.
- the chromatin fragments may comprise transcription factor bound fragments, which typically are smaller than 150 bp, for example approx. 50 bp, and/or mononucleosomes, which typically are 150-230 bp, for example approx. 150 bp, and/or dinucleosomes, which are larger than 300 bp, for example approx. 300 bp.
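The typical size classes mentioned above may be illustrated by a minimal sketch. The cut-offs are taken from the approximate ranges stated above; the function name `classify_fragment` is hypothetical, and the choices to treat exactly 300 bp as a dinucleosome and to leave 231-299 bp unclassified are assumptions made for illustration only.

```python
def classify_fragment(length_bp):
    """Assign a chromatin fragment length (in bp) to a typical size class."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    if length_bp < 150:
        return "transcription-factor-bound"
    if length_bp <= 230:
        return "mononucleosome"
    if length_bp >= 300:  # assumption: "approx. 300 bp" counts as a dinucleosome
        return "dinucleosome"
    return "unclassified"  # 231-299 bp is not covered by the stated ranges


if __name__ == "__main__":
    for size in (50, 150, 250, 320):
        print(size, classify_fragment(size))
```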
- the DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 15,000 base pairs, such as in the range of 10 to 10,000 base pairs, for example in the range of 10 to 5,000 base pairs, such as in the range of 10 to 500 base pairs.
- the adaptors may be attached to the DNA fragments by any useful means, however preferably attachment in step c) is done by ligation, such as by blunt end ligation.
- ligation may be performed by incubation with a ligase, for example a T4 DNA ligase. Said incubation with ligase is performed under conditions allowing for activity of said enzyme. The skilled person will be able to determine suitable conditions for the ligase of her choice.
- the methods of the invention may also comprise such steps.
- the adaptor contains a sample specific barcode, and the DNA fragments are obtained from the sample to be marked by said barcode.
- the ligated adaptors may be amplified.
- the invention also provides methods of amplification of DNA fragments.
- Said method comprises the step of a) preparing DNA fragments attached to adaptors by the methods described above, b) amplifying said DNA fragments attached to adaptors in vitro.
- at least one step of amplification is performed by RNA polymerase-driven transcription using said RNA polymerase.
- RNA polymerase may preferably be T7 RNA polymerase.
- the T7 RNA polymerase is a polymerase of SEQ ID NO: 78 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with SEQ ID NO: 78.
- A of the fill-in adaptor contains a sequence complementary to a primer binding site.
- at least one step of said amplification involves the use of a primer capable of binding to said primer binding site.
- the step of amplification may involve the use of two different primers, each capable of binding to the distinct primer binding sites provided by the two different adaptors ligated at each end of the DNA fragment.
- Figures 3A to 3E illustrate representative implementations of the method, including 1- or 2-sided adaptor ligation, and amplification by in vitro transcription or by PCR.
- Figures 3C and 3D show locus-specific amplification with locus-specific primers.
- a second tag can be introduced by ligation at the other end of the DNA fragment, in order to provide a primer binding site, e.g. for a primer for a reverse transcription step, or for further amplification.
- one or more primers may be generated for the reverse transcription step, for example as depicted in Figure 3B.
- two primer binding events may occur. This may be a result of the particular sequence requirements of the downstream sequencing platform to be used.
- similarities between the required forward and reverse sequencing primer may result in the generation of two binding sites for the reverse transcriptase primer. Such design considerations are within the routine skill of the person skilled in this art.
- sample(s) to be used in the method of the present invention refers to various samples that contain DNA fragments.
- Examples of such a sample include samples prepared from, comprising or consisting of cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material.
- mammalian material refers to every mammalian-derived biological material such as tissue or biopsies collected from a mammal (e.g., tissue collected after an operation) and/or body fluids such as blood, serum, blood plasma, urine, a spinal fluid, saliva, a lymph fluid, a lacrimal fluid, or a seminal fluid.
- the sample may comprise or consist of aforementioned cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material.
- the sample may also be prepared from cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material.
- the sample may comprise fragmented and/or isolated DNA from aforementioned material.
- the sample is prepared from any of the aforementioned materials comprising cells, by a method comprising lysing said cells and fragmenting the genomic DNA of said cells.
- the mammalian material may be obtained from any mammal.
- the mammal is a human.
- the DNA fragments are obtained by isolating or partly isolating DNA from mammalian tissue. In such embodiments the DNA is preferably subjected to fragmentation, which can be done before, after or simultaneously with isolation. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from body fluids. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from serum. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from blood plasma. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from urine. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from spinal fluid.
- the DNA fragments are obtained by isolating or partly isolating DNA from saliva. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from lymph fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from lacrimal fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from seminal fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from blood. In any of the aforementioned embodiments the DNA may be subjected to fragmentation, which for example can be done as described above either before, after or simultaneously with isolation.
- the sample comprises purified DNA. In some embodiments, the sample comprises purified nucleosomes. In some embodiments, the sample comprises purified chromatin. In some embodiments, the sample comprises cell lysate, for example cell lysate which has been subjected to fragmentation. In some embodiments, the sample comprises plasma. In some embodiments, the sample comprises blood. In some embodiments, the sample comprises serum. In some embodiments, the sample comprises urine. In some embodiments, the sample comprises spinal fluid. In some embodiments, the sample comprises saliva. In some embodiments, the sample comprises lymph fluid. In some embodiments, the sample comprises lacrimal fluid. In some embodiments, the sample comprises seminal fluid.
- the DNA fragments of a given sample are ligated to fill-in adaptors of the invention.
- the fill-in adaptors may contain a sample specific barcode, such that DNA fragments of a given sample can be identified by the barcode.
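How a sample specific barcode allows pooled fragments to be traced back to their sample can be sketched as follows. This is a simplified illustration under stated assumptions: the barcode is assumed to sit at the start of each read with a fixed length, only exact matches are assigned, and the function name `demultiplex` and the example barcodes are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact match on a leading barcode."""
    assigned = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        assigned[sample].append(insert)
    return dict(assigned)


if __name__ == "__main__":
    # Hypothetical 4-nt barcodes and toy reads.
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
    reads = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
    print(demultiplex(reads, barcodes, barcode_length=4))
```

A practical pipeline would typically also tolerate sequencing errors in the barcode; that is omitted here for brevity.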
- the invention may further be defined by any of the following items:
- a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
  5’ - A - B - C - 3’
  3’ - A’ - B’ - C’ - 5’
  wherein a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’; c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other;
- nicking endonuclease recognition site is a nicking site recognised by a restriction enzyme with nicking activity.
- nicking endonuclease recognition site is a nicking site recognised by Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof.
- the nicking endonuclease recognition site is GCTCTTC, CCD, GCAGTG, GGATC, CCTCAGC, GAATGC, CACGAG or GTCTC.
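Whether a candidate adaptor sequence contains one of the recognition sites listed above can be checked with a short sketch. This is an illustration only: it scans the given strand alone (not its reverse complement), it expands the IUPAC ambiguity code D to A, G or T, and the function names `site_to_regex` and `find_sites` are hypothetical.

```python
import re

# Recognition sites as listed above; D is the IUPAC code for A, G or T.
SITES = ("GCTCTTC", "CCD", "GCAGTG", "GGATC", "CCTCAGC", "GAATGC", "CACGAG", "GTCTC")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]", "N": "[ACGT]"}


def site_to_regex(site):
    """Translate a recognition site with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in site)


def find_sites(sequence, sites=SITES):
    """Return sorted (position, site, matched_text) tuples; positions are 0-based."""
    sequence = sequence.upper()
    hits = []
    for site in sites:
        # The lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile(f"(?=({site_to_regex(site)}))")
        for match in pattern.finditer(sequence):
            hits.append((match.start(), site, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_sites("AAGCTCTTCAA"))
```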
- nicking endonuclease recognition site is a CRISPR nicking endonuclease recognition site.
- nicking endonuclease recognition site is CRISPR D10A nicking site comprising a PAM sequence (NGG) downstream of the site of the nick.
- nicking endonuclease recognition site is an SP6 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 73 or a sequence sharing at least 95% sequence identity therewith.
- nicking endonuclease recognition site is a T3 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 74 or a sequence sharing at least 95% sequence identity therewith.
- nicking endonuclease recognition site is a Cyanophage Syn5 polymerase recognition site, preferably a sequence of SEQ ID NO: 75 or a sequence sharing at least 95% sequence identity therewith.
- a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
- A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’; c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
- A, B, C, A’, B’ and C’ are as defined in any one of items 1 or 12; and b) P is a 5’ phosphate.
- A consists of in the range of 10 to 100 nucleotides.
- A consists of in the range of 15 to 50 nucleotides.
- A consists of in the range of 15 to 40 nucleotides.
- A’ consists of in the range of 2 to 100 nucleotides.
- A’ consists of in the range of 2 to 35 nucleotides.
- A’ consists of in the range of 2 to 10 nucleotides.
- the adaptor according to any one of the preceding items, wherein the DNA amplification sequence is a promoter sequence of an RNA polymerase or it comprises a primer binding site.
- A1’ comprises or consists of a sequence of nucleotides connected through exonuclease-resistant phosphorothioate linkages.
- the adaptor according to any one of the preceding items wherein A1’ is non-complementary to A1.
- the adaptor according to any one of the preceding items wherein A1’ comprises or consists of a sequence of 3 to 35 consecutive nucleotides connected through exonuclease-resistant phosphorothioate linkages.
- A1’ comprises or consists of a sequence of 3 to 35 consecutive cytosines connected through exonuclease-resistant phosphorothioate linkages.
- A1 comprises one or more nucleotide analogues or modifications which are exonuclease-resistant, selected from the group consisting of phosphorothioate linkages, phosphoramidite C3 spacer, inverted deoxythymidine bases, 2'-O-methyl and 2'-O-methoxyethyl nucleosides.
- A1 comprises a sequence that prevents RNA polymerase engagement and function, for example, a sequence that supports formation of a hairpin, loop or other secondary structure.
- C is between 2 and 25 nucleotides in length, such as between 2 and 20 nucleotides, preferably between 2 and 12 nucleotides, more preferably between 2 and 10 nucleotides in length or between 5 to 12 nucleotides in length.
- C is between 3 and 12 nucleotides in length, such as between 4 to 10 nucleotides, for example between 5 to 8 nucleotides in length.
- the adaptor further comprises a tag, such as an affinity group and/or bioorthogonal group, wherein the tag is selected from the group of Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags.
- step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, for example in the range of 20 to 50°C, such as in the range of 25 to 37°C.
- the DNA glycosylase is a Uracil DNA glycosylase of SEQ ID NO: 1 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Uracil DNA glycosylases of SEQ ID NO: 1 - 23.
- nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV of any one of SEQ ID NO: 31 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the glycosylase-lyase endonucleases of SEQ ID NO: 31 to 41.
- step d) comprises incubating the sample with Uracil DNA glycosylase (UDG) and with DNA glycosylase-lyase Endonuclease VIII.
- step d) comprises incubating the sample with Antarctic Thermolabile Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease III.
- step d) comprises incubating the sample with Afu Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease IV.
- the DNA glycosylase is a Uracil-DNA glycosylase, for example Antarctic Thermolabile Uracil DNA glycosylase (UDG) or Afu Uracil DNA glycosylase (UDG).
- nicking endonuclease is a nicking endonuclease recognising an AP site, preferably an apyrimidinic site.
- nicking endonuclease is endonuclease VIII, Endonuclease III or Endonuclease IV.
- nicking endonuclease is a Type III endonuclease of any one of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonuclease of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41.
- nicking endonuclease is a Type VIII endonuclease of any one of SEQ ID NO: 28 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonuclease of SEQ ID NO: 28 to 41.
- nicking endonuclease is a Type IV endonuclease of SEQ ID NO: 42 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonuclease of SEQ ID NO: 42 - 50.
- nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV.
- nicking endonuclease is a CRISPR Nickase, preferably D10A.
- the CRISPR endonuclease is a CRISPR endonuclease of SEQ ID NO: 65 to 67 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the CRISPR endonuclease of SEQ ID NO: 65 - 67.
- nicking endonuclease is a restriction endonuclease of any one of SEQ ID NO: 51 - 64 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the restriction endonucleases of SEQ ID NO: 51 - 64.
- nicking endonuclease is Endonuclease VIII of SEQ ID NO: 28, Endonuclease III of SEQ ID NO: 24 or Endonuclease IV of SEQ ID NO: 42 or a functional variant thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of SEQ ID NO: 24 to 50.
- strand displacing DNA polymerase is a Bst polymerase, DNA Polymerase I, Large (Klenow) Fragment or DNA Polymerase from Thermococcus litoralis.
- strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with Bst DNA Polymerase of SEQ ID NO: 68.
- the strand displacing DNA polymerase is a Bst polymerase comprising a large fragment
- said large fragment comprises or consists of a sequence sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with the Large fragment of Bst polymerase SEQ ID NO: 69.
- the DNA polymerase is a polypeptide of SEQ ID NO:76 or a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase I, Large (Klenow) Fragment of SEQ ID NO: 76.
- the strand displacing DNA polymerase is a polypeptide of SEQ ID NO:77 or a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase from Thermococcus litoralis of SEQ ID NO: 77.
- the strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with phi 29 DNA polymerase of SEQ ID NO: 70 or Taq DNA polymerase of SEQ ID NO: 71.
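The sequence identity thresholds recited above (at least 70%, 80%, 85%, 90% or 95%) may be illustrated by a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it uses one of several possible conventions: identical positions divided by alignment columns, ignoring columns that are gaps in both sequences. The function names `percent_identity` and `meets_threshold` are hypothetical, and the text does not specify which identity convention or alignment tool is intended.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the aligned pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy 10-residue sequences differing at one position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```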
- DNA fragments are cell free DNA fragments obtained from blood, blood plasma, urine, or ascites fluid.
- DNA fragments comprise nucleosomes and/or genomic DNA fragments bound to chromatin proteins such as transcription factors.
- genomic DNA is eukaryotic or prokaryotic.
- the DNA fragments are selected from the group consisting of cDNA, DNA produced by whole genome amplification, primer extension products comprising at least one double-stranded terminus, and a PCR amplicon.
- the DNA fragments are obtained by lysing cells from a cell culture or from mammalian material and fragmenting the chromatin from the lysed cells.
- the DNA fragments are obtained by isolating, partly isolating and/or fragmenting DNA from mammalian material, wherein said mammalian material for example may be material such as tissue or biopsies collected from a mammal and/or body fluids such as blood, serum, blood plasma, urine, a spinal fluid, saliva, a lymph fluid, a lacrimal fluid, or a seminal fluid.
- the sample comprises purified DNA, purified nucleosomes, purified chromatin or cell lysate.
- the sample comprises or consists of plasma, blood, serum, urine, spinal fluid, saliva, lymph fluid, lacrimal fluid or seminal fluid.
- the DNA fragments are cell free DNA fragments.
- DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 15,000 base pairs, such as in the range of 10 to 10,000 base pairs, for example in the range of 10 to 5,000 base pairs, such as in the range of 10 to 500 base pairs.
- step c) is done by ligation, such as by blunt end ligation.
- step d) is performed at a temperature in the range of 20°C to 80°C.
- step e) is performed at a temperature in the range of 40°C to 80°C, such as in the range of 45°C to 70°C, for example in the range of 50°C to 70°C.
- a method of amplification of DNA fragments comprising the step of a) preparing DNA fragments attached to adaptors by the method according to items 49 to 109, b) amplifying said DNA fragments attached to adaptors in vitro.
- RNA polymerase is T7 RNA polymerase, Bacteriophage T3 RNA polymerase or Cyanophage Syn5 polymerase.
- T7 RNA polymerase is a polymerase of SEQ ID NO: 78 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with SEQ ID NO: 78.
- A of said adaptor contains a sequence complementary to a primer binding site, and wherein a primer capable of binding to said primer binding site is used for amplification.
- each of said adaptors contains a sequence complementary to a primer binding site, wherein the primer binding site of the two adaptors are distinct, and wherein two distinct primers, each capable of binding to one of the two primer binding sites, are used for amplification.
- the following example compares the adaptor contamination obtained in a MINUTE-ChIP experiment performed essentially as described in Kumar et al., 2019 except that the adaptors described herein are used. Thus, the following example is performed using the adaptor “r5” shown in Figure 1B. As control, a prior art adaptor known as “3C” adaptor shown in Figure 1A was used.
- Mouse embryonic stem cell pellets containing 1-2 × 10^6 cells were barcoded with either u5 (see Figure 1C), r5 (see Figure 1B) or prior art (3C) adaptors (Figure 1A) at 50 µM. All experiments were carried out in duplicate. Briefly, cells were first lysed and digested with MNase to enrich for the mononucleosome population. The digestion was quenched by EGTA-containing end-repair and ligation buffer, in which each sample was ligated to u5, r5 or 3C adaptor molecules carrying a unique barcode. Ligation was quenched by EDTA-containing lysis dilution buffer, before combining all samples in one tube. After centrifugation to remove insoluble cell debris, the supernatant was pooled and an aliquot was carried forward as input material to Proteinase K treatment at 65°C overnight.
- the u5 samples were subjected to enzymatic removal of adaptor contamination by sequential treatment with Uracil-DNA glycosylase and either an endonuclease VIII (NEB USER enzyme, available from New England Biolabs, US) (u5) or endonuclease IV (NEB USER enzyme III, available from New England Biolabs, US) (u5_III).
- the r5 samples were subjected to enzymatic removal of adaptor contamination by treatment with RNase HII.
- Amplification Buffer was then added to the digested material, which was incubated at 68°C for 2 hours for adaptor fill-in.
- RNA product was treated with DNase I, purified and then ligated to a pre-adenylated RNA 3’ adaptor (RA3), which served as a primer binding site for reverse transcription.
- the resulting cDNA was treated with RNase A and RNase H, purified and then used as a template for library PCR with barcoded primers compatible with the Illumina sequencing platform.
- the disclosed invention produced lower contamination in the final libraries and increased the percentage of mappable reads compared to the standard adaptor.
- the following example shows that the “fill-in” adaptors can be ligated to cell free DNA fragments.
- the cfDNA fragments can be amplified and sequenced by next generation sequencing.
- This experiment can be performed with fill-in adaptors comprising all endonuclease nicking sites that are described in this invention.
- Human plasma is obtained from whole blood samples by centrifugation (10 min, 800g, 4°C) and collecting the supernatant, which is used fresh or flash frozen and may be stored at -80°C before use.
- Four plasma samples, 200 µL each, are set up in parallel ligation reactions for 2 h at room temperature (with T4 Polynucleotide kinase (2.5 U) and T4 Ligase (2.5 U) 10x buffer, 3% PEG 4000, 0.2 mM ATP), to directly ligate the fill-in adaptors shown in Figure 1 onto cfDNA, whether in the form of nucleosomes or naked DNA in the plasma.
- the ligation reactions (250 µL each) are stopped by the addition of a stop buffer (50 mM Tris-HCl, 150 mM NaCl, 1% Triton X-100, 50 mM EGTA, 50 mM EDTA, 0.1% DOC) and the barcoded plasma samples are pooled (1.5 mL total volume). 150 µL of the resulting pool is collected as the “input” and the remaining pool is equally split into two ChIP reactions that are incubated overnight at 4°C, with magnetic beads coupled with antibodies against histone H3 (3 µL of Active Motif 39763) and histone H3K4me3 (3 µL of Millipore 04745).
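The pooling and splitting volumes stated above can be checked with a small worked calculation. The sketch uses only the stated figures (four reactions of 250 µL, a 1.5 mL pool, a 150 µL input aliquot and two ChIP reactions); the stop buffer volume per reaction is not stated and is derived here purely as an implied value, and the function names are hypothetical.

```python
def chip_volume_ul(pool_ul, input_ul, n_chip):
    """Volume per ChIP reaction after removing the input aliquot from the pool."""
    remaining = pool_ul - input_ul
    if remaining < 0 or n_chip < 1:
        raise ValueError("inconsistent volumes or reaction count")
    return remaining / n_chip


def implied_stop_buffer_ul(pool_ul, n_reactions, reaction_ul):
    """Stop buffer per reaction implied by the pool volume (an inference, not stated)."""
    return (pool_ul - n_reactions * reaction_ul) / n_reactions


if __name__ == "__main__":
    print(chip_volume_ul(pool_ul=1500, input_ul=150, n_chip=2))
    print(implied_stop_buffer_ul(pool_ul=1500, n_reactions=4, reaction_ul=250))
```

With the stated figures, each of the two ChIP reactions receives 675 µL, and the pool volume is consistent with roughly 125 µL of stop buffer per ligation reaction.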
- the precipitated material along with the input were subjected to sequential treatment with RNase HII and Bst polymerase, in vitro transcription by T7 RNA polymerase, cDNA conversion and library PCR as described in Example 1, yielding the Input, H3 and H3K4me3 libraries. Libraries are further diluted to 2 nM and pooled, before sequencing on the Illumina NextSeq 2000 platform.
- the adaptors of the present invention can be efficiently ligated onto cell-free DNA (cfDNA) fragments, irrespective of whether they are in the form of nucleosomes or not.
- cfDNA barcoded with the fill-in adaptors as described in the present invention can be amplified into a sequencing library after purification or can be subjected to ChIP in order to retrieve cfDNA fragments bound to nucleosomes (H3 ChIP) or nucleosomes with specific histone modifications present (H3K4me3 ChIP).
- SEQ ID NO: 1 >sp
- SEQ ID NO: 2 >sp
- SEQ ID NO: 4 >sp
- SEQ ID NO: 5 >tr
- SEQ ID NO: 6 >tr
- SEQ ID NO: 7 >tr
- SEQ ID NO: 8 >tr
- SEQ ID NO: 9 >tr
- SEQ ID NO: 10 >tr
- SEQ ID NO: 11 >tr
- SEQ ID NO: 12 >tr
- SEQ ID NO: 13 >tr
- SEQ ID NO: 14 >tr
- SEQ ID NO: 15 >tr
- SEQ ID NO: 16 >tr
- Type-4 uracil-DNA glycosylase from Methanosarcina mazei Gene Name:DU30_02025
- SEQ ID NO: 17 >tr
- Type-4 uracil-DNA glycosylase from Methanosarcina mazei Gene Name:DKM28_16410
- SEQ ID NO: 18 >tr
- SEQ ID NO: 19 >tr
- SEQ ID NO: 20 >tr
- SEQ ID NO: 21 >tr
- SEQ ID NO: 22 >tr
- SEQ ID NO: 23 >tr
- SEQ ID NO: 24 >sp
- SEQ ID NO: 25 >tr
- SEQ ID NO: 26 >sp
- SEQ ID NO: 27 >sp
- SEQ ID NO: 28 >sp
- SEQ ID NO: 29 >sp
- SEQ ID NO: 30 >sp
- SEQ ID NO: 31 >sp
- SEQ ID NO: 32 >sp
- SEQ ID NO: 33 >sp
- SEQ ID NO: 34 >sp
- ALKBH1 from Homo sapiens, Gene Name:ALKBH1
- SEQ ID NO: 35 >tr
- l1WZN9_HUMAN/1-238 DNA polymerase (Fragment) from Homo sapiens OX 9606
- SEQ ID NO: 36 >sp
- SEQ ID NO: 37 >tr
- Q862E1_BOVIN/1-147 Small ribosomal subunit protein uS3 (Fragment) from Bos taurus OX 9913
- SEQ ID NO: 39 >tr
- SEQ ID NO: 40 >tr
- SEQ ID NO: 41 >tr
- SEQ ID NO: 42 >sp
- SEQ ID NO: 43 >sp
- SEQ ID NO: 44 >sp
- SEQ ID NO: 45 >sp
- SEQ ID NO: 46 >sp
- SEQ ID NO: 47 >sp
- SEQ ID NO: 48 >sp
- SEQ ID NO: 50 >sp
- SEQ ID NO: 51 >tr
- SEQ ID NO: 52 >tr
- SEQ ID NO: 53 >tr
- SEQ ID NO: 54 >tr
- SEQ ID NO: 55 >tr
- SEQ ID NO: 56 >tr
- SEQ ID NO: 58 >tr
- SEQ ID NO: 59 >tr
- SEQ ID NO: 62 >tr
- SEQ ID NO: 64 >tr
- SEQ ID NO: 65 >sp
- SEQ ID NO: 68 [WP_033014420] Bst DNA Polymerase (DNA polymerase I from Geobacillus stearothermophilus)
- SEQ ID NO: 69 Bst DNA Polymerase Large Fragment (LF) 587 a. a. (290-876) (N- terminus truncated DNA polymerase I from Geobacillus stearothermophilus)
- SEQ ID NO: 70 Bacillus phage phi29 DNA polymerase
- SEQ ID NO: 73 SP6 RNA polymerase recognition sequence
- SEQ ID NO: 74 T3 RNA polymerase recognition sequence
- SEQ ID NO: 75 Cyanophage Syn5 polymerase recognition sequence
- SEQ ID NO: 78 T7 RNA polymerase
- SEQ ID NO: 79 Adaptor sequence s_USER_BC03
- SEQ ID NO: 80 Adaptor sequence as_USER_BC03
- SEQ ID NO: 83 Figure 1A top strand
- SEQ ID NO: 84 Figure 1A bottom strand
- SEQ ID NO: 88 Figure 1C bottom strand
- SEQ ID NO: 89 Figure 1D top strand
- SEQ ID NO: 90 Figure 1D bottom strand
- SEQ ID NO: 93 Mixed Adaptor A top strand
- SEQ ID NO: 94 Mixed Adaptor A bottom strand
- SEQ ID NO: 95 Mixed Adaptor B top strand
- SEQ ID NO: 96 Mixed Adaptor B bottom strand
- SEQ ID NO: 97 PCR forward primer (universal for all versions)
- SEQ ID NO: 98 PCR reverse primer (universal for all versions)
Abstract
The present invention relates to the field of ligation of oligonucleotides to DNA fragments. The ligation methods of the invention may be used for attaching oligonucleotides comprising for example adaptors, primer binding sites, promoters, tags, barcodes or any combination of the aforementioned to DNA fragments.
Description
Adaptors for ligation
Technical field
The present invention relates to the field of ligation of oligonucleotides to DNA fragments. The ligation methods of the invention may be used for attaching oligonucleotides comprising for example adaptors, primer binding sites, promoters, tags, barcodes or any combination of the aforementioned, to DNA fragments.
Background
Chromatin-Immunoprecipitation (ChIP) coupled to Next Generation Sequencing (ChIP-seq) is used to map the genomic occupancies of chromatin factors and histone modifications. To achieve high-throughput processing capacity and to reduce technical variation between samples, multiplexed ChIP-seq by sample pooling prior to immunoprecipitation has emerged recently. Multiplexing is possible only when individual samples are first barcoded with a unique DNA sequence in the form of an adaptor before being combined with other samples. However, adaptor ligation to nucleosomes or other protein-bound DNA fragments typically suffers from low efficiency, which is often compensated for by using an excessive amount of adaptor molecules. Owing to the small size differences between the adaptor (~60bp), transcription factor bound DNA (~50bp) and nucleosomal DNA (~150bp), the excess free adaptor cannot be removed efficiently by routine bead-based or column-based size selection clean-up methods. Adaptor dimers or concatemers formed when ligating adaptors at high concentration are also very challenging to remove. These uneventful adaptor forms are inevitably carried over through the ChIP workflow and contaminate later sequencing. Owing to their small size and relatively high copy number, contaminating adaptors can consume a large share of the sequencing reagents and hence significantly lower the usable number of reads per sequencing run.
Summary
Accordingly, there is an unmet need for adaptors which can be ligated or otherwise attached to DNA fragments with enhanced efficiency, in particular to protein-bound DNA fragments, while at the same time permitting discrimination between free adaptors or adaptor dimers and adaptors attached to said DNA.
In particular, there is an unmet need for re-designed sample-barcoding adaptors to allow enhanced ligation.
The current patent application describes a novel design of nucleotide-barcoding adaptors which in their unligated free form are not amplified under conditions where the ligated adaptors are amplified. Herein such adaptors are referred to as “fill-in adaptors”. In particular, adaptor functions of the fill-in adaptors of the invention are only restored when they are ligated to other DNA fragments, such as genomic DNA fragments or cell free DNA. By mitigating adaptor contamination, not only are the fill-in adaptors of the invention cost-effective, e.g. when used in Next Generation Sequencing, but they also promote higher library diversity due to more selective amplification of desired ligation products. Also, since a monomeric adaptor alone is non-functional, it permits a more aggressive size selection scheme to retain small DNA fragments. This is for example advantageous for mapping transcription factor footprints at high resolution using ChIP.
The present invention provides adaptors designed in a manner so free adaptor or adaptor dimers can be discriminated from adaptors attached to DNA fragments by selective amplification.
In particular, in embodiments of the invention relating to ligating fill-in adaptors to chromatin fragments, the design of the fill-in adaptors allows that only adaptor ligated to the target chromatin fragments can be carried over during selective amplification. The target chromatin fragments are in general broadly defined as transcription factor bound, which typically are ~50bp or nucleosomal, which typically are ~150bp. The chromatin fragments may for example be prepared cellular chromatin or they may be cell free chromatin. Herein ligation of adaptor to target DNA is also referred to as eventful ligation, as opposed to self-ligation of adaptors.
More specifically, the adaptors of the invention comprise a single stranded amplification sequence and a nicking site. The single stranded amplification sequence is positioned at one end of the adaptor and is only functional as an amplification initiation site in the form of double-stranded DNA, i.e. after synthesis of the complementary strand. Within the complementary strand, the nicking site is positioned close to the other end of the adaptor. If the adaptor is incubated at an elevated temperature after nicking, the resulting short stretch of oligonucleotides between the nicking site and the adaptor end will dissociate from the adaptor. Similarly, if adaptor dimers are formed, they will dissociate if incubated at elevated temperature after nicking.
In contrast, an adaptor ligated to a DNA fragment, such as the gDNA moiety of a chromatin fragment, will not dissociate. A strand displacing polymerase will be able to elongate the nicked strand, and thereby synthesise the complementary strand of the amplification initiation site.
The concept of the invention is illustrated in Figures 1 and 2. Figures 1B to 1E show specific examples of a fill-in adaptor of the invention, whereas Figure 2 illustrates the principle. The fill-in adaptor according to the invention consists of 2 strands. The upper strand consists of 3 regions denoted A, B and C, whereas the lower strand consists of 3 regions denoted A’, B’ and C’. A comprises a single stranded region (A1), and either it constitutes a promoter when bound to its complementary sequence, or it is a sequence complementary to a primer binding site. A1’ is substantially non-complementary to A1. Importantly, A is designed so that transcription of the fill-in adaptor can only occur if A is annealed to its complementary sequence. A’ is resistant to exonuclease. C’ or -B’-C’- comprises a nicking endonuclease recognition site or pre-recognition site. Said recognition site comprises or consists of a nicking site, i.e. the site where said endonuclease nicks the adaptor. In some embodiments, the nicking endonuclease recognition site consists of a single nucleotide (e.g. dU), and in such embodiments, the recognition site consists of a nicking site. Typically, the nicking site is positioned at the 3’ end of the C’ segment. The mismatch between A1 and A1’ hence creates a fork DNA structure that is refractory to DNA ligation. More importantly, such single-stranded A is incapable of driving/priming transcription. Another critical feature of the fill-in adaptor is the presence of the nicking endonuclease recognition site, which can be any nicking endonuclease recognition site. Alternatively, it can be a pre-recognition site, which can be transformed into a nicking endonuclease recognition site, e.g. by the action of a DNA glycosylase.
A 3’ hydroxyl group may be generated after nicking at the nicking site allowing subsequent primer extension by a strand-displacing polymerase, which synthesizes a new bottom strand and eventually reconstitutes the promoter or primer binding functionality of A. After nicking, the resulting “primer” at the bottom strand is rendered very unstable due to the short length, especially under heat challenge. However, the bottom strand primer length and hence its melting temperature is increased dramatically once the adaptor is ligated to another DNA fragment.
As such, the length and hence the resulting heat stability of the bottom strand primer provides an effective selection basis to specifically reconstitute the A promoter/primer binding site in case of an eventful ligation product, while free adaptor monomers remain inactive.
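The selection principle described above can be sketched numerically. The following is a minimal illustration only: it uses two common rule-of-thumb melting temperature formulas rather than the nearest-neighbor method preferred elsewhere in this document, and the sequences, the 8-nucleotide stub length, the 50-nucleotide target strand and the 50°C heat challenge are all hypothetical values chosen for the example.

```python
def basic_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Below 14 nt: 2*(A+T) + 4*(G+C).
    From 14 nt upwards: 64.9 + 41*(G+C - 16.4)/N.
    This is NOT the nearest-neighbor method; it is a crude stand-in.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    n = at + gc
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    if n < 14:
        return float(2 * at + 4 * gc)
    return 64.9 + 41.0 * (gc - 16.4) / n


def stays_annealed(primer: str, challenge_c: float) -> bool:
    """True if the estimated Tm of the primer exceeds the heat challenge."""
    return basic_tm(primer) > challenge_c


# Hypothetical sequences, not taken from the sequence listing.
NICKED_STUB = "GATCGTCA"            # bottom-strand stretch between nick and adaptor end
GDNA_STRAND = "ACGT" * 12 + "AC"    # 50 nt of target DNA ligated upstream of the stub
LIGATED_PRIMER = GDNA_STRAND + NICKED_STUB
HEAT_CHALLENGE_C = 50.0             # assumed temperature, for illustration only

for name, primer in [("free adaptor", NICKED_STUB), ("ligated product", LIGATED_PRIMER)]:
    print(name, len(primer), "nt, Tm ~", round(basic_tm(primer), 1),
          "stays annealed:", stays_annealed(primer, HEAT_CHALLENGE_C))
```

Under these assumptions the short stub of the free adaptor melts off well below the challenge temperature, whereas the same stub extended by the ligated DNA strand remains annealed and can prime the strand-displacing polymerase.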
Adaptor dimers could present yet another challenge. The fork structure at the tail end of the fill-in adaptor does not support ligation, and because A1’ is exonuclease resistant, it is refractory to routine end-repairing enzymes. As such, adaptor dimers can only exist in a head-to-head configuration as illustrated in Figure 2. In this case, RNase can create a nick at both the top and bottom strands, essentially cleaving the adaptor dimers to monomeric forms. If subjected to a heat challenge the adaptor dimers will dissociate into monomeric forms. This is also illustrated in Figure 2.
An example of the concept of the invention is presented in Figure 1C. The fill-in adaptor of that example consists of a T7 promoter, a partial SBS primer binding site allowing sequencing on the Illumina platform, and a randomized 8-nucleotide Unique Molecule Identifier (UMI) followed by an 8-nucleotide sample-specific barcode. The T7 promoter is used for in vitro transcription (IVT) of any downstream DNA fragment. The top strand of the T7 promoter is designed to be largely single stranded by default, while the bottom strand is replaced by a stretch of seven consecutive cytosines interconnected by exonuclease-resistant phosphorothioate linkages. The mismatch hence creates a fork DNA structure that is refractory to DNA ligation. More importantly, such a single-stranded T7 promoter is incapable of driving IVT by T7 RNA polymerase. Another critical feature of the fill-in adaptor of this example is that the eighth nucleotide within the sample barcode (counted from the ligation end of the adaptor) is a deoxy-Uridine. Embedding a single deoxy-Uridine within a DNA duplex essentially creates a recognition site for DNA glycosylases. The DNA glycosylase removes the nitrogenous base while leaving the sugar phosphate backbone intact. This creates an apurinic/apyrimidinic site (i.e. an AP site), or in other words an abasic site. As used herein, the terms “apurinic/apyrimidinic site”, “AP site” and “abasic site” are used interchangeably. The AP site (herein also referred to as a nicking site) is recognized by endonucleases, which produce a 1-nucleotide gap in C’. The nick at the nicking site allows subsequent primer extension by a strand-displacing Bst polymerase, which synthesizes a new bottom strand and eventually reconstitutes a functional double-stranded T7 promoter.
Another example of a useful nicking strategy and associated adaptor design to be used with the present invention is shown in Figure 1D, where a nicking restriction enzyme variant is used. Restriction enzymes recognize a specific sequence and cut either one or both strands at defined positions relative to the recognized sequence. In the specific example, the Nb.BsrDI nicking restriction enzyme variant recognizes a GCAATG sequence on the top strand while introducing a nick in the bottom strand.
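When designing an adaptor around a sequence-specific nicking enzyme, it can be useful to check where the recognition sequence occurs in a candidate top strand. The sketch below only locates the GCAATG motif named above; the example strand is hypothetical, and the code does not model where the enzyme places the nick relative to the motif.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(strand: str, motif: str = "GCAATG") -> list[int]:
    """Return 0-based start positions of every occurrence of motif in strand."""
    strand = strand.upper()
    hits = []
    start = strand.find(motif)
    while start != -1:
        hits.append(start)
        start = strand.find(motif, start + 1)
    return hits


# Hypothetical top-strand fragment, not taken from the sequence listing.
TOP = "TTGCAATGCC"
BOTTOM = reverse_complement(TOP)  # bottom strand written 5' to 3'

print("motif on top strand at:", find_sites(TOP))
print("motif on bottom strand at:", find_sites(BOTTOM))
```

Checking both strands separately makes it easy to confirm that a barcode or UMI choice has not accidentally introduced a second recognition site in the unintended orientation.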
Another example of a useful nicking strategy and associated adaptor design to be used with the present invention is shown in Figure 1E, in which the D10A Cas9 variant is used to introduce a nick in the bottom strand guided by a gRNA recognizing a 21mer. In order for the Cas9 enzyme to cut the gRNA-targeted sequence, a PAM sequence (here GG) adjacent to the gRNA binding site must be present.
As such, the length and hence the resulting heat stability of the bottom strand primer provide an effective selection basis to specifically reconstitute the T7 promoter of the eventful ligation product, while free adaptor monomers remain inactive for T7 transcription and hence fail to be carried over for downstream RNA adaptor ligation and cDNA conversion. Adaptor dimers could present yet another challenge. The fork structure at the tail end of the adaptor does not support ligation. And because of the phosphorothioate linkage of the mismatched poly-C sequence, the fork structure is exonuclease resistant and hence refractory to routine end-repairing enzymes. As such, adaptor dimers can only exist in a head-to-head configuration. In this case, the endonucleases can act as a restriction enzyme by nicking at both the top and bottom strands, essentially cleaving the adaptor dimers to monomeric forms, which dissociate from each other at elevated temperature, as demonstrated in Figure 2.
A first aspect of the invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’
wherein
a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’;
b) A’ consists of 3’ - A1’ - A2’ - 5’, and
c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups;
d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other;
e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and
f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a nicking endonuclease recognition site or a nicking endonuclease pre-recognition site, wherein said pre-nicking site can be converted into a nicking endonuclease recognition site, for example by the action of a DNA glycosylase.
It is preferred that the nicking endonuclease recognition site is not a ribonucleotide positioned at the 3’ end of C’.
A second aspect of the invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’
wherein
a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’;
b) A’ consists of 3’ - A1’ - A2’ - 5’, and
c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups;
d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other;
e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and
f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
A third aspect of the invention relates to a method of attaching adaptor(s) to DNA fragment(s), said method comprising:
a) providing at least one adaptor according to the invention;
b) providing a sample containing DNA fragments;
c) attaching the adaptor to the DNA fragments in the sample; and
d) i) if C’ comprises a nicking endonuclease pre-recognition site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or
ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site, incubating the sample with a nicking endonuclease recognising said nicking endonuclease recognition site under conditions allowing for activity of said enzyme;
e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows:
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
A fourth aspect of the invention relates to a method of attaching adaptor(s) to DNA fragment(s), said method comprising:
a) providing at least one adaptor according to the invention, wherein C’ of said adaptor comprises a dU;
b) providing a sample containing DNA fragments;
c) attaching the adaptor to the DNA fragments in the sample; and
d) i) incubating the sample with a Uracil-DNA glycosylase under conditions allowing for activity of said enzyme, thereby generating an AP site;
ii) incubating the sample with a nicking endonuclease recognising an AP site under conditions allowing for activity of said enzyme, wherein steps i) and ii) may be performed simultaneously, partly simultaneously or sequentially;
e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows:
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
It will be understood that where the adaptor comprises dU, the AP site generated by the Uracil-DNA-glycosylase in step (d) is an apyrimidinic site.
A fifth aspect of the invention relates to a method of amplification of DNA fragments, said method comprising the step of a) preparing DNA fragments attached to adaptors by the method as described herein; b) amplifying said DNA fragments attached to adaptors in vitro.
An adaptor can be added at one or both ends of a DNA fragment. The same or different adaptors may be ligated at each end. In an embodiment, two species of adaptor are provided, that is two different adaptors, such that a different adaptor may be ligated at each end. Such an embodiment is particularly useful in the case that the DNA amplification sequence in the adaptor is complementary to a primer binding site, and the adaptor-attached fragments are amplified by primer-directed amplification, e.g. by PCR. In particular, in the case of amplification by PCR or similar, the two adaptor species may comprise (or provide) different primer binding sites.
Description of Drawings
Figure 1: 1A shows an example of a prior art adaptor described in Peter van Galen et al., Molecular Cell, 2016 (top strand: SEQ ID NO. 83; bottom strand SEQ ID NO. 84) and four examples of fill-in adaptors according to the invention (1B-E). 1B shows an example of a fill-in adaptor with a nicking endonuclease recognition site on the lower strand at the 5th base pair from the 5’ end (top strand: SEQ ID NO. 85; bottom strand SEQ ID NO. 86). 1C shows an example of a fill-in adaptor with a deoxy-Uridine on the lower strand at the 8th base pair from the 5’ end (top strand: SEQ ID NO. 87; bottom strand SEQ ID NO. 88). 1D shows an example of a fill-in adaptor with a restriction endonuclease recognition site positioned between the UMI and Barcode (top strand: SEQ ID NO. 89; bottom strand SEQ ID NO. 90). 1E shows an example of a fill-in adaptor with a CRISPR-Cas9 nickase recognition site (top strand: SEQ ID NO. 91; bottom strand SEQ ID NO. 92).
Figure 2 shows a schematic illustration of the concept of the invention. Following the addition of the adaptor to the target DNA fragments and ligation thereof, the products exist as a mixture of excess unreacted adaptor monomers, adaptor dimers and adaptors ligated to target DNA fragments, such as genomic DNA fragments or cell free DNA. These molecules are then subjected either to a nicking endonuclease, which specifically nicks at the nicking endonuclease recognition site, or to a DNA glycosylase and a nicking endonuclease, which specifically nicks at the nucleotide within the adaptor that is recognized by the DNA glycosylase, resulting in a 3’ priming site proximal to the ligation end of the adaptor. Heat challenge dissociates the small fragments generated by nicking in the unligated adaptors as well as any adaptor dimers. Subsequent primer extension by a strand-displacing DNA polymerase (in the figure exemplified by Bst polymerase, but it could be any strand-displacing DNA polymerase) relies on the existence of a heat-stable 3’ priming site, provided by the ligation to DNA fragments, such as genomic DNA fragments (which are typically ~50-150bp) or cell free DNA. Primer extension by the strand-displacing DNA polymerase reconstitutes the DNA amplification sequence in the double-stranded form necessary to transcribe or amplify the ligated adaptor-DNA fragments, due to the presence of a stable 3’ priming site. Due to the lack of a free 3’ priming site for the polymerase, any free adaptor contaminants are not elongated, and thus in the free adaptors or adaptor dimers the DNA amplification sequence is not regenerated and remains inactive. Thus, only adaptors ligated to the targeted DNA fragments, e.g. genomic DNA fragments or cell free DNA, can be amplified.
Figure 3: Figure 3A+B show an example of how gDNA fragments may be tagged, amplified by linear amplification or PCR and sequenced in methods of the invention for determining the barcode (BC), UMI and/or gDNA sequence, for example as part of a method to profile and quantify chromatin modifications. 1) Fill-in adaptor of this invention is added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample. Adaptors may be added at one side or both sides of a given gDNA fragment. After pooling, optionally splitting the pool into sub-pools, and submitting these to chromatin immunoprecipitation, gDNA fragments of each chromatin modification sub-pool are purified. 2) After nicking and heating the sample, the DNA polymerase is added for adaptor fill-in. 3) T7-RNA polymerase-driven in vitro transcription. 4) A second tag is ligated onto the gDNA fragment. The second tag comprises an amplification sequence and may optionally comprise a second barcode sequence. The non-ligatable terminus of the fill-in adaptor ensures that the second tag is not ligated onto the fill-in adaptor. 5) The second tag serves as a primer binding site for reverse transcription. 6) The double tagged gDNA fragments are amplified by PCR using primers specific for the amplification sequence of the fill-in adaptor and the amplification sequence of the second tag. Optionally, part of the amplification sequence of the fill-in adaptor is identical to part of the amplification sequence of the second tag (diagonally shaded area). Sequencing platform-specific adaptors may be added with the primer sequences. 7) The UMI-BC-gDNA part is sequenced. Figure 3B shows a specific scenario where adaptors are added to both ends of a given gDNA fragment, in which the gDNA fragment is still amplified as intended.
Figure 3C+D show an example of how gDNA fragments may be tagged, amplified by PCR and sequenced in methods of the invention for selectively amplifying gene-specific or locus-specific gDNA fragments, for example to determine gene-specific or locus-specific levels of chromatin modifications. 1) Fill-in adaptor is added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample. After pooling, splitting the pool into sub-pools, and chromatin immunoprecipitation, gDNA fragments of each chromatin modification sub-pool are purified. 2) After heating the sample, the DNA polymerase is added for adaptor fill-in. 3) T7-RNA polymerase-driven in vitro transcription. 4) A second tag with a locus-specific sequence serves as a primer binding site for reverse transcription. 5) Fill-in adaptor gDNA fragments are amplified by PCR using one primer specific for the amplification sequence of the fill-in adaptor and a primer specific for one or multiple loci of interest. Sequencing platform-specific adaptors may be added as part of the primers. Primers may also optionally comprise a second barcode sequence. 6) The UMI-BC-gDNA part is sequenced.
Figure 3E shows an example of how gDNA fragments may be tagged, amplified by PCR and sequenced in methods of the invention for determining the barcode (BC), UMI and/or gDNA sequence, for example as part of a method to profile and quantify chromatin modifications. 1) A mixture of two fill-in adaptor species (“A” and “B”) of this invention, sharing the same BC but carrying different primer binding sites A and B, is used. A and B adaptors are added randomly to the gDNA moiety of all or a fraction of chromatin fragments from each sample, so that a gDNA moiety is either ligated to an adaptor A or B on one end, or ligated on both ends to two adaptors in one of the possible combinations A-gDNA-A, A-gDNA-B, B-gDNA-A, B-gDNA-B. After pooling, optionally splitting the pool into sub-pools, and submitting these to chromatin immunoprecipitation, gDNA fragments of each chromatin modification sub-pool are purified. 2) After nicking and heating the sample, the DNA polymerase is added for adaptor fill-in. 3) A-B double-tagged gDNA fragments are amplified by PCR using two primers specific for amplification sequence A and B, respectively. 4) The UMI-BC-gDNA part is sequenced.
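The combinatorics of the two-adaptor scheme can be written out explicitly. The sketch below enumerates the four double-ligated combinations listed above and flags those carrying one A and one B primer binding site, which are the ones amplified by PCR with an A-specific and a B-specific primer. The function name and the representation of adaptor ends as strings are illustrative assumptions.

```python
from itertools import product


def amplified_by_ab_primers(left: str, right: str) -> bool:
    """True if the fragment carries one A and one B adaptor (either order)."""
    return {left, right} == {"A", "B"}


# All double-ligated combinations of the two adaptor species.
combinations = list(product(["A", "B"], repeat=2))

for left, right in combinations:
    label = f"{left}-gDNA-{right}"
    print(label, "amplified" if amplified_by_ab_primers(left, right) else "not amplified")
```

If both adaptor species ligate with equal probability at each end (an assumption, not a statement from this document), two of the four double-ligated combinations carry both primer binding sites.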
Figure 4 shows a performance comparison between the adaptors shown in Figure 1B (r5) and Figure 1C (u5 and u5_ll I) and the prior art adaptor shown in Figure 1A (3C) using MINUTE-ChIP. The figure shows the % of free adaptor compared to total sequences (read statistics from NGS analysis) of the indicated libraries. R5, u5 and u5_l II adaptors yielded a ~20-50-fold higher fraction of desired reads containing gDNA sequences.
Figure 5 shows ligation of adaptors with an endonuclease nicking site according to this invention to cell free DNA (cfDNA) fragments in human blood. Figure 5A shows the schematic workflow used in the experiment: Ligation was performed by directly adding a reaction mix containing T4 Polynucleotide Kinase and T4 Ligase to 200uL each of plasma. The four barcoded plasma samples were pooled. 200uL of the pool were subjected to DNA purification and library preparation using T7 amplification, reverse transcription and library PCR according to the MINUTE-ChIP protocol, yielding the “Input” library. 200uL each of the pool were subjected to H3 and H3K4me3 ChIP using antibodies against histone H3 or histone H3K4me3, DNA was purified from the precipitated material and libraries were prepared. Figure 5B shows a histogram of the fragment size distribution in each sequencing library. Figure 5C shows boxplots with the number of unique fragments (estimated library size) recovered from each of the four plasma samples that were barcoded and pooled, in the Input, H3-ChIP and H3K4me3-ChIP library. Figure 5D shows the reads in the libraries, which were mapped to the human genome (hg38) and plotted over 17644 known transcription start sites. The heatmaps show that H3K4me3-ChIP recovered predominantly reads mapping to the transcription start sites of genes.
Detailed description
Definitions
As used herein, the term “adaptor” refers to an oligonucleotide, which is double-stranded at one end, and which thus can be ligated to a DNA fragment.
As used herein, the term “amplification” in relation to nucleic acids refers to any in vitro method for increasing the number of copies of a nucleotide sequence with the use of a polymerase. Amplification reactions include, for example, polymerase chain reactions (PCR), transcription, reverse transcription, replication or combinations of the aforementioned. Preferably, “DNA amplification” comprises PCR.
Herein two nucleotide sequences are considered to be “complementary” to each other when said nucleotide sequences are able to hybridise to each other via formation of Watson-Crick base pairs in such a manner that all nucleotides of one sequence are base paired with all nucleotides of the second sequence.
As used herein the term “DNA amplification sequence” refers to a sequence, which promotes transcription or replication of DNA, wherein said transcription or replication only is promoted when the bottom strand of the DNA amplification sequence is available. In particular, the DNA amplification sequence may comprise or consist of a promoter or a primer binding site.
As used herein the term “melting temperature” in terms of nucleic acids is the temperature at which 50% of two substantially complementary nucleotide sequences form a stable double helix and the other 50% is separated into single strand molecules. The melting temperature may also be referred to as Tm. Preferably, the Tm as used herein is calculated using a nearest-neighbor method based on the method described in Breslauer et al., Proc. Natl. Acad. Sci. 83, 3746-50 (1986) using a salt concentration parameter of 50 mM and nucleotide sequence concentration of 900 nM. For example, the method is implemented by the software "Multiple Primer Analyzer" from Life Technologies/Thermo Fisher Scientific Inc.
Herein two nucleotide sequences are considered to be “non-complementary” if they are not capable of hybridising to each other, preferably under standard conditions for hybridization, such as in storage buffer with 10 mM Tris and 1 mM EDTA at a pH of 8.0 and a temperature of 5°C below the melting temperature of one of said nucleotide sequences with a complementary sequence forming Watson-Crick base pairs at all positions. For example, two nucleotide sequences are considered to be ‘non-complementary’ to each other if at the most 30%, preferably at the most 20%, more preferably at the most 10% of the nucleotides of one sequence can form Watson-Crick base pairs with nucleotides of the second sequence, when the sequences are aligned with each other.
The term “sequence identity” as used herein describes the relatedness between two amino acid sequences or between two nucleotide sequences, i.e. a candidate sequence and a reference sequence based on their pairwise alignment. For purposes of the present invention, the sequence identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 5.0.0 or later (available at https://www.ebi.ac.uk/Tools/psa/emboss_needle/). The parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labelled “longest identity” (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:
(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
The Needleman-Wunsch algorithm is also used to determine whether a given amino acid in a sequence other than the reference sequence corresponds to a given position of the reference sequence.
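The percent identity formula above can be illustrated with a minimal Python sketch. It assumes that the pairwise alignment has already been produced (for example by Needle) and is supplied as two equal-length rows in which gaps are written as "-". The function name and the gap symbol are illustrative assumptions and are not part of EMBOSS.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Apply (identical residues x 100) / (alignment length - gaps) to an existing alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    # A column counts as a gap if either row holds the gap symbol.
    gaps = sum(1 for x, y in columns if x == gap or y == gap)
    # Identical residues are columns where both rows hold the same non-gap symbol.
    identical = sum(1 for x, y in columns if x == y and x != gap)
    denominator = len(columns) - gaps
    if denominator <= 0:
        raise ValueError("alignment contains no ungapped columns")
    return 100.0 * identical / denominator
```

For instance, the rows "ACGT-A" and "ACCTTA" contain six columns, one gap and four identical residues, so the sketch returns 80.0.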
As used herein, “strand displacing polymerase” refers to a nucleic acid polymerase that has a strand displacement activity apart from its nucleic acid synthesis activity. That is, a strand displacing nucleic acid polymerase can continue nucleic acid synthesis on the basis of the sequence of a nucleic acid template strand (i.e., reading the template strand) while displacing a complementary strand that had been annealed to the template strand.
Herein two nucleotide sequences are considered to be “substantially complementary” to each other, when said nucleotide sequences are able to hybridise to each other, preferably under standard conditions for hybridization, such as in storage buffer with 10 mM Tris and 1 mM EDTA at a pH of 8.0 and a temperature of 5°C below the melting temperature of one of said nucleotide sequences with a complementary sequence forming Watson-Crick base pairs at all positions. For example, two nucleotide sequences are considered to be ‘substantially complementary’ to each other if at least 80%, preferably at least 85% of the nucleotides of one sequence can form Watson-Crick base-pairs with nucleotides of the second sequence, when the sequences are hybridised to each other.
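The percentage examples given in the definitions of “complementary”, “substantially complementary” and “non-complementary” can be illustrated with the following Python sketch. It is a simplification under stated assumptions: the two sequences are compared position by position without gaps, the second sequence is written 3’->5’ so that it aligns with the first, and hybridisation conditions and melting temperature are not modelled. The function names and the label "intermediate" are hypothetical.

```python
# Watson-Crick pairs as (base in first sequence, base in second sequence).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(seq_5to3: str, other_3to5: str) -> float:
    """Fraction of nucleotides of the first sequence that pair with the aligned base of the second."""
    if not seq_5to3:
        raise ValueError("first sequence must not be empty")
    pairs = sum(1 for x, y in zip(seq_5to3.upper(), other_3to5.upper())
                if (x, y) in WATSON_CRICK)
    return pairs / len(seq_5to3)


def classify(seq_5to3: str, other_3to5: str) -> str:
    """Label a sequence pair using the example thresholds of 80% and 30%."""
    fraction = paired_fraction(seq_5to3, other_3to5)
    if fraction == 1.0 and len(seq_5to3) == len(other_3to5):
        return "complementary"
    if fraction >= 0.80:
        return "substantially complementary"
    if fraction <= 0.30:
        return "non-complementary"
    # The example thresholds in the text do not cover this range.
    return "intermediate"
```

With these assumptions, a 10-nucleotide pair with a single mismatch pairs at 90% of positions and is labelled substantially complementary.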
As used herein, the term “top strand” refers to the sense strand of DNA, while the term “bottom strand” refers to the anti-sense strand.
As discussed above, the present invention provides for the use of at least one adaptor in the methods described herein. This includes providing different adaptors such that a different adaptor may be ligated at each end of a DNA fragment. Accordingly, the term “at least one adaptor” may denote that at least one species of adaptor is provided. As noted above, and described further below, the methods herein may comprise the use of two adaptor species, or two different adaptors, such that a different adaptor is ligated at each end of a DNA fragment.
It is further included that different adaptors may be provided for ligation to different DNA fragments. Accordingly, the term “at least one adaptor” may also include that a plurality of adaptors may be employed in the methods described herein. For example, two or more adaptors as described herein may be provided. In some embodiments, one, two, three, four, five, six, seven, eight, nine or ten adaptors may be provided. The skilled person would be capable of determining how many adaptors may be required.
Fill-in Adaptor
The present invention relates to adaptors, which herein are also referred to as “fill-in” adaptors. In general, the fill-in adaptors are at least partly double-stranded oligonucleotides of known sequence. However, the fill-in adaptors may also comprise stretches of unknown or random sequences, such as UMI sequences.
The fill-in adaptors may be ligated to DNA fragments, and depending on the exact sequences of the fill-in adaptors, the ligation of the adaptor may enable the generation of amplification-ready products of the target DNA fragments. Preferably, the adaptor of the present invention is a partly double-stranded oligonucleotide, typically adopting a fork-like configuration. The majority of the adaptor is typically double-stranded, however one end is non-complementary, and thus the adaptor comprises single strands at one of the ends. The upper strand comprises or consists of 3 regions, which herein are denoted A, B and C, whereas the lower strand comprises or consists of 3 regions, which herein are denoted A’, B’ and C’. Each of A, B, C, A’, B’ and C’ consists of a nucleotide sequence. Each of A, B, C, A’, B’ and C’ is described in more detail below, and the fill-in adaptor of the invention may comprise any of the A, B, C, A’, B’ and C’ described herein in the following sections.
Preferably, the fill-in adaptor of the present invention is a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’ wherein a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’; b) A’ consists of 3’ - A1’ - A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and
f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a nicking endonuclease recognition site or a nicking endonuclease pre-recognition site, wherein said pre-nicking site can be converted into a nicking endonuclease recognition site by the action of a DNA glycosylase with the proviso that the nicking endonuclease recognition site is not a ribonucleotide positioned at the 3’ end of C’.
In another aspect, the present invention relates to a partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’-A-B-C -3’
3’-A’-B’-C’-5’ wherein a) A is the top strand of a DNA amplification sequence, and A consists of 5’ -A1-A2-3’; b) A’ consists of 3’ -A1’ -A2’ - 5’, and c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
In some embodiments of the present invention, the adaptor comprises or consists of an oligonucleotide of the general structure:
5’-A-B-C -3’
3’-A’-B’-C’-P-5’ wherein a) A, B, C, A’, B’ and C’ are as described herein above or below; and b) P is a 5’ phosphate.
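The general structures above can be sketched as a small Python data structure that checks the stated constraints. This is an illustration under stated assumptions, not a definition of the adaptor: top-strand regions are written 5’->3’, bottom-strand regions are written 3’->5’ so that they align with their partners, dU is written as "U", the 80% and 30% thresholds are the example values from the definitions section, and all sequences are hypothetical placeholders that are not taken from the sequence listing.

```python
from dataclasses import dataclass
from typing import List

# Allowed pairs as (top base, bottom base); dU in the bottom strand is written "U".
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the order (top 5'->3' gives bottom 3'->5')."""
    return "".join(COMPLEMENT[base] for base in seq)


def paired_fraction(top_5to3: str, bottom_3to5: str) -> float:
    """Fraction of top-strand positions that pair with the aligned bottom-strand base."""
    if not top_5to3:
        return 0.0
    pairs = sum(1 for x, y in zip(top_5to3, bottom_3to5) if (x, y) in PAIRS)
    return pairs / len(top_5to3)


@dataclass(frozen=True)
class FillInAdaptor:
    a1: str        # top strand, 5'->3'
    a2: str
    b: str
    c: str
    a1_prime: str  # bottom strand, 3'->5'
    a2_prime: str
    b_prime: str
    c_prime: str

    def top_strand(self) -> str:
        return self.a1 + self.a2 + self.b + self.c

    def problems(self) -> List[str]:
        """Return a list of violated constraints; an empty list means none were found."""
        found = []
        if bool(self.a2) != bool(self.a2_prime):
            found.append("A2 and A2' must both be present or both be absent")
        elif self.a2 and paired_fraction(self.a2, self.a2_prime) < 0.80:
            found.append("A2 and A2' are not substantially complementary")
        if paired_fraction(self.a1, self.a1_prime) > 0.30:
            found.append("A1' pairs with A1 at more than 30% of positions")
        if not 5 <= len(self.b) <= 100:
            found.append("B is outside the range of 5 to 100 nucleotides")
        if paired_fraction(self.b, self.b_prime) < 0.80:
            found.append("B and B' are not substantially complementary")
        if paired_fraction(self.c, self.c_prime) < 0.80:
            found.append("C and C' are not substantially complementary")
        return found


# Hypothetical example: A1' is a run of cytosines and C' carries a dU at its 3' end.
example = FillInAdaptor(
    a1="TAATACGACTCACTATAG", a2="",
    b="ACACGACGCTCTTCC", c="AGATCT",
    a1_prime="CCCCCCCC", a2_prime="",
    b_prime=complement("ACACGACGCTCTTCC"),
    c_prime="U" + complement("GATCT"),
)
```

Calling `example.problems()` returns an empty list for this placeholder adaptor, whereas an adaptor whose B region is shorter than 5 nucleotides is reported as outside the stated range.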
In some embodiments of the present invention, the DNA amplification sequence is a promoter sequence of an RNA polymerase or it comprises a primer binding site.
In some embodiments of the present invention, the adaptor does not comprise any ribonucleotides.
In some embodiments, the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 79 as one strand and SEQ ID NO: 80 as the other strand.
In some embodiments, the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 81 as one strand and SEQ ID NO: 82 as the other strand.
In some embodiments, the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 87 as one strand and SEQ ID NO: 88 as the other strand (Figure 1C).
In some embodiments, the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 89 as one strand and SEQ ID NO: 90 as the other strand (Figure 1D).
In some embodiments, the fill-in adaptor is an adaptor comprising or consisting of SEQ ID NO: 91 as one strand and SEQ ID NO: 92 as the other strand (Figure 1E).
Fill-in Adaptor: A
The following section describes A of the adaptor of the present disclosure.
A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’.
In some embodiments, A2 is not present, in which case A consists of A1.
It is a hallmark of the present invention that A is the top strand of a DNA amplification sequence. Said DNA amplification sequence may be any sequence, which promotes transcription or replication of DNA, when the bottom strand of the DNA amplification sequence is available. The fill-in adaptor comprises A, but does not comprise a sequence complementary to A1. Accordingly, the DNA amplification sequence of the free fill-in adaptor is not functional and does not promote transcription/replication.
However, if the bottom strand of the DNA amplification sequence is reconstituted, the DNA amplification sequence will promote transcription/replication. As described herein elsewhere, once the fill-in adaptor is ligated to a DNA fragment, and the adaptor has been nicked with the endonuclease, the bottom strand of the fill-in adaptor can be generated with the aid of a strand-displacing polymerase using the nicked 3’ end as priming site, thereby reconstituting an active DNA amplification sequence.
Thus, the DNA amplification sequence can be any sequence promoting transcription or replication only when the bottom strand has been reconstituted.
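The reconstitution of the bottom strand can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the polymerase is taken to copy the top strand from the position opposite the nick up to the 5’ end of the top strand, the displaced strand is ignored, and the function names and the `nick_index` parameter are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5to3: str) -> str:
    """Return the reverse complement, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3.upper()))


def fill_in_product(top_strand_5to3: str, nick_index: int) -> str:
    """Newly synthesised bottom-strand segment (5'->3') when extension starts opposite nick_index.

    The template is the part of the top strand that lies 5' of the nick position,
    i.e. top_strand_5to3[:nick_index]; this includes region A when the nick is in C'.
    """
    if not 0 < nick_index <= len(top_strand_5to3):
        raise ValueError("nick_index must lie within the top strand")
    template = top_strand_5to3[:nick_index]
    return reverse_complement(template)
```

In this model, the product contains the complement of A, so the DNA amplification sequence becomes double-stranded only after extension from the nicked 3’ end.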
In some embodiments, the DNA amplification sequence is the promoter sequence of an RNA polymerase. In such embodiments, A is recognised and bound by an RNA polymerase when in a double-stranded DNA form with its complementary sequence.
In a preferred embodiment, A comprises or consists of the T7 promoter, the SP6 promoter, the T3 phage promoter or the Syn5 promoter. Thus, in such embodiments, A when bound to its complementary sequence, is recognized by T7 RNA polymerase, the SP6 RNA polymerase, the Bacteriophage T3 RNA polymerase or Cyanophage Syn5 polymerase. In some embodiments of the present disclosure, A is recognized by T7 RNA polymerase, when bound to its complementary sequence as a double-stranded DNA. Said T7 promoter preferably comprises or consists of a sequence of SEQ ID NO: 72 or a sequence sharing at least 90%, preferably at least 95% sequence identity therewith. Said SP6 RNA polymerase preferably recognizes the sequence of SEQ ID NO: 73 or a sequence sharing at least 95% sequence identity therewith. Said T3 RNA polymerase preferably recognizes the sequence of SEQ ID NO: 74 or a sequence sharing at least 95% sequence identity therewith. Said Cyanophage Syn5 polymerase preferably recognizes the sequence of SEQ ID NO: 75 or a sequence sharing at least 95% sequence identity therewith.
In one embodiment of the present disclosure, A contains a sequence complementary to a primer binding site. Thus, the adaptor and any DNA fragment ligated thereto can be amplified using a primer binding to said primer binding site. The primer binding site may be any sequence complementary to a primer. Preferably, neither said primer nor the primer binding site is prone to formation of secondary structure.
A may be any suitable length. When A comprises or consists of a promoter sequence, A should at minimum be the length of said promoter, and frequently A is exactly the length of the promoter. When A is a sequence complementary to a primer binding site, A is preferably long enough to allow hybridisation of the primer to the primer binding site with high affinity.
In general A consists of a sequence of nucleotides in the range of 10 to 100 nucleotides, such as in the range of 15 to 50 nucleotides, such as in the range of 15 to 40 nucleotides. Preferably, said nucleotides are deoxyribonucleotides. Thus, preferably, A consists of a sequence of deoxyribonucleotides in the range of 10 to 100 deoxyribonucleotides, such as in the range of 15 to 50 deoxyribonucleotides, such as in the range of 15 to 40 deoxyribonucleotides.
Fill-in Adaptor: A’
The following section describes A’ of the adaptor of the present disclosure. A’ is part of the lower strand of the adaptor, and it is therefore described in the 3’->5’ direction herein.
A’ consists of 3’ - A1’ - A2’ - 5’.
In some embodiments, A2’ is not present, in which case A’ consists of A1’. However, if A2 is present, then A2’ is also present, and if A2 is not present, then A2’ is also not present.
If A2 and A2’ are present, A2 and A2’ are sequences of nucleotides substantially complementary to each other. The length of A2 and A2’ is not important, but typically, they will be the same length and relatively short, e.g. less than 10 nucleotides, such as less than 5 nucleotides. Preferably, said nucleotides are deoxyribonucleotides. Thus, A2 and A2’ may comprise less than 10 deoxyribonucleotides, such as less than 5 deoxyribonucleotides.
A1’ is a sequence of nucleotides, which is non-complementary to A1. Thus, A1 does not hybridise with A1’, which results in a fork-like structure at one end of the fill-in adaptor. The 3’ end of A1’ is exonuclease resistant and/or it contains a primer extension blocking group. That way, the 3’ end of A’ cannot serve as priming site for elongation.
DNA amplification will then only take place once the complementarity of A is restored in a double-stranded DNA form.
Preferably, A1’ is exonuclease resistant. In that manner, A1’ will not be removed by exonucleases. If A1’ were to be removed by exonuclease, this could create a priming site for the polymerase, and allow elongation even when the adaptor is not ligated to a DNA fragment.
The 3’ end of A1’ can be resistant to exonuclease in any manner known to the skilled person. In one embodiment of the present disclosure, A1’ comprises or consists of a sequence of nucleotides connected through exonuclease-resistant phosphorothioate linkages. For example, A1’ may comprise or consist of a sequence of 3 to 35, such as in the range of 5 to 15 consecutive nucleotides connected through exonuclease-resistant phosphorothioate linkages. Said nucleotides may be any nucleotides, however in one embodiment, the nucleotides are deoxycytidine monophosphate.
Thus, in one embodiment of the present disclosure, A1’ comprises or consists of a sequence of 3 to 35 consecutive cytosines connected through exonuclease-resistant phosphorothioate linkages.
A1’ may also comprise one or more nucleotide analogues or modifications which are exonuclease-resistant. Said nucleotide analogues or modifications may for example be selected from the group consisting of phosphorothioate linkages, phosphoramidite C3 spacer, inverted deoxythymidine bases, 2'-O-methyl and 2'-O-methoxyethyl nucleosides.
It is also comprised within the present invention that the 3’ end of A1’ may contain a nucleotide that has been modified to block extension. In that manner, the 3’ end of the lower strand of the adaptor will not be elongated, when the adaptor is not ligated to a DNA fragment. Said modification to block extension may be any modification known to the skilled person to block primer extension.
In one embodiment of the present disclosure, the 3’ end of A1’ comprises a dideoxynucleotide.
In one embodiment of the present disclosure, the 3’ end of A1’ comprises a phosphoramidite C3 spacer.
It is also possible that A1’ comprises a sequence that prevents RNA polymerase engagement and function in other manners. For example, A1’ may comprise a sequence that supports formation of a hairpin, loop or other secondary structure.
A’ may be any desirable length. The length of A’ is independent from the length of A. Thus A’ may be either shorter, longer or the same length as A. Similarly, A1 and A1’ may be the same or different lengths.
In some embodiments of the present disclosure, A’ is a sequence of nucleotides of in the range of 2 to 100 nucleotides, such as in the range of 2 to 35 nucleotides, such as in the range of 2 to 10 nucleotides. It is preferred that none of said nucleotides are ribonucleotides.
Fill-in Adaptor: -B - C- and -B’ - C’-
The fill-in adaptors of the present invention comprise the structures -B - C- on the top strand and the structure -B’ - C’- on the lower strand. Said structure may also be depicted as:
5’ - B - C - 3’
3’ - B’ - C’ - 5’
B and B’ are sequences of nucleotides, which are substantially complementary to each other. B and B’ are typically the same length, however, the length of B and B’ is not critical and can be adjusted according to the specific needs of the adaptor. For example, B and/or B’ may comprise one or more functionalities, such as a primer binding site, a barcode, and/or a UMI. Typically, B and B’ may be in the range of 5 to 100 nucleotides long. Said nucleotides may preferably be deoxyribonucleotides.
C and C’ are also sequences of nucleotides, which are substantially complementary to each other. It is preferred that at least 80%, such as at least 85% of the nucleotides of C can form Watson-Crick base-pairs with nucleotides of C’. In some embodiments it is preferred that C and C’ are complementary to each other except that up to 3, preferably up to 2, for example up to 1 nucleotide of C cannot form Watson-Crick base-pairs with nucleotides of C’.
In embodiments of the invention, wherein C’ comprises dU, it is preferred that all other nucleotides of C’ are complementary to nucleotides of C. In other words, apart from the dU, C and C’ are complementary.
In embodiments of the invention, wherein C’ or B’-C’ comprises a nicking endonuclease recognition site, it may be preferred that C’ and C are complementary.
C and C’ are typically the same length. Preferably, C and C’ are sequences of up to 12 nucleotides.
It is also preferred that C’ is 2 or more nucleotides in length. Thus, C’ may be between 2 and 25 nucleotides in length, preferably C’ is between 2 and 20 nucleotides in length, such as between 2 to 12 nucleotides, most preferably between 5 to 12 nucleotides in length. Upon heat treatment, C’ will dissociate from the fill-in adaptor if the adaptor is not ligated to a DNA fragment. In some embodiments of the present disclosure, C’ is between 3 and 12 nucleotides in length, for example between 2 and 10 nucleotides, such as between 4 to 10 nucleotides, for example between 5 to 8 nucleotides in length.
As noted above, B and/or B’ may comprise one or more functionalities. It is also comprised within the invention that -B-C- together and/or -B’-C’- together comprises one or more functionalities. Typically, most functionalities are comprised within B and/or B’.
For example, -B-C- and/or -B’-C’- may comprise one or more functionalities, such as a primer binding site, a barcode, and/or a UMI.
In one embodiment of the present disclosure, -B-C- and/or -B’-C’- contains a primer binding site. Said primer binding site may be any sequence complementary to a primer. Preferably, neither said primer nor the primer binding site is prone to formation of secondary structure. In some embodiments, ligation of the adaptor to the DNA fragments may facilitate later handling of the DNA fragments. The primer binding site may thus be any primer binding site, which is useful for later handling of the DNA fragments.
Many platforms for Next Generation Sequencing involve the use of platform specific primers. If the DNA fragments are to be analysed by Next Generation Sequencing, the fill-in adaptor, and in particular -B-C- or -B’-C’-, may comprise a primer binding site for said platform specific primer. For example, -B-C- and/or -B’-C’- may contain a partial or full-length SBS3 primer binding site.
It is also comprised within the invention that -B-C- and/or -C’-B’- may contain a random DNA sequence acting as a unique molecular identifier, also referred to as a UMI sequence herein. Thus, in principle each UMI sequence is different. The UMI may comprise a random sequence of in the range of 4 to 20 nucleotides, for example in the range of 6 to 16 nucleotides. Preferably, the UMIs each consist of in the range of 5 to 15 random nucleotides.
It is also comprised within the invention that -B-C- and/or -C’-B’- may contain a barcode sequence. A barcode sequence is a unique sequence comprised within all adaptors ligated to a specific selection of DNA fragments. Barcode sequences are particularly useful for multiplexing. Thus, different barcode sequences can e.g. be used to label DNA fragments from different samples, so that all adaptors ligated to DNA fragments of one sample contain the same barcode sequence, whereas all adaptors ligated to DNA fragments of another sample contain a different barcode sequence. In that manner, each DNA fragment ligated to an adaptor can be assigned to a specific sample, even if DNA fragments from different samples are mixed. Each barcode may comprise a sequence of in the range of 4 to 20 nucleotides, for example in the range of 6 to 16 nucleotides. Preferably, each barcode consists of in the range of 5 to 15 nucleotides.
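The use of UMI and barcode sequences can be illustrated with a minimal Python sketch. It assumes that a UMI is drawn uniformly from A, C, G and T within the stated range of 4 to 20 nucleotides, and that the barcode is found at a known offset in a sequencing read with an exact match. The function names, the offset parameter, the sample names and the barcode sequences are hypothetical.

```python
import random
from typing import Dict, Optional


def random_umi(length: int, rng: random.Random) -> str:
    """Draw a random UMI of the given length (4 to 20 nucleotides, as stated in the text)."""
    if not 4 <= length <= 20:
        raise ValueError("UMI length must be in the range of 4 to 20 nucleotides")
    return "".join(rng.choice("ACGT") for _ in range(length))


def assign_sample(read: str, barcodes: Dict[str, str], offset: int = 0) -> Optional[str]:
    """Return the sample whose barcode matches the read exactly at the offset, else None."""
    for barcode, sample in barcodes.items():
        if read.startswith(barcode, offset):
            return sample
    return None


# Hypothetical barcodes for two samples that were pooled after adaptor ligation.
BARCODES = {"ACGTTGCA": "sample_1", "TTGACCAG": "sample_2"}
```

A read beginning with "TTGACCAG" would be assigned to sample_2 in this sketch, while a read carrying neither barcode is left unassigned.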
In one embodiment of the present disclosure, -B-C- and/or -B’-C’- in addition contains one or more random sequences, e.g. a random sequence of in the range of 5 to 15 nucleotides.
The fill-in adaptors of the invention comprise a nicking endonuclease recognition site or a pre-recognition site. In embodiments of the invention, wherein the nicking endonuclease recognition site or the pre-recognition site consists of a single nucleotide (e.g. dU), it is preferred that it is positioned in C’. In embodiments wherein the nicking endonuclease recognition site comprises several nucleotides, said recognition site may be positioned in C’ or it may be spread over both B’ and C’. In other words, -B’-C’- may comprise said recognition site. As explained herein elsewhere, the nicking endonuclease recognition site comprises a nicking site. Said nicking site may preferably be positioned within C’, and may more preferably be positioned at the 3’ end of C’.
Fill-in Adaptor: Tags
In some embodiments, the adaptor is modified with tags that enable detection or purification. The tags may function as a marker. Thus, in some embodiments, the tag is an affinity group and/or bioorthogonal group. The term "affinity group" as used herein refers to any identifiable tag, group, or moiety that is capable of being specifically bound by another compound or composition (optionally attached or linked to a solid support, such as a bead, a filter, a plate, a membrane, a chromatographic resin, etc.) for detection, identification and purification purposes. It is understood that many different species of affinity groups are known in the art and may be used, either individually or in combination. An exemplary affinity group is biotin. Adaptors comprising a biotin tag may be used in affinity chromatography, fluorescent or electron microscopy, ELISA assays, ELISPOT assays, western blots and other immunoanalytical methods. Another exemplary affinity group is an antigen, which is specifically recognised by an antibody.
The term "bioorthogonal" as used herein with reference to a reaction, reagent, or functional group, indicates that such reaction, reagent, or functional group does not exhibit significant or detectable reactivity towards biological molecules such as those present in a bacterial, yeast or mammalian cell. The biological molecules can be, e.g., proteins, nucleic acids, fatty acids, or cellular metabolites. Adaptors comprising the bioorthogonal Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags may be used for click-chemistry reactions. Examples of bioorthogonal click-chemistry reactions are strain-promoted azide-alkyne cycloaddition (SPAAC), where an azide reacts with an alkyne, and tetrazine ligation, where a tetrazine reacts with a trans-cyclooctene. Hence, in some embodiments, the adaptor further comprises a tag, such as an affinity group and/or bioorthogonal group, wherein the tag is selected from the group of Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags. Adaptors comprising an N-Hydroxysuccinimide Ester (NHS ester) may be used in a variety of bioconjugation reactions, such as for labelling and/or purification. Hence, in some embodiments, the adaptor comprises an NHS-based handle.
Nicking endonuclease
The present invention relates to adaptors, which comprise a nicking endonuclease recognition site or a pre-recognition site. A nicking endonuclease pre-recognition site can be converted into a nicking endonuclease recognition site by the aid of a DNA glycosylase as described below in the section “DNA glycosylase”. As used herein the term “nicking endonuclease” refers to an enzyme capable of introducing a nick in a double-stranded DNA sequence. The nicking endonuclease may for example be any of the nicking endonucleases described herein in this section. As used herein, "nicking" refers to the cleavage of only one strand of a fully double-stranded nucleic acid molecule or a double-stranded portion of a partially double-stranded nucleic acid molecule. Nicking endonucleases nick said DNA at a specific position relative to the nicking endonuclease recognition site, which is a nucleotide sequence that is recognized by the nicking endonuclease. In other words, a “nicking endonuclease recognition site" as used herein is a sequence recognised by a given nicking endonuclease, wherein the nicking endonuclease nicks DNA at a specific position relative to said sequence. The specific position where the nucleic acid is nicked is referred to as the "nicking site". In some embodiments, the nicking endonuclease recognition site consists of a single nucleotide (e.g. dU), in which case the recognition site consists of the nicking site. The nicking is performed by nicking endonucleases which may recognize a particular nucleotide sequence of a fully or partially double-stranded nucleic acid and cleave only one strand of the fully or partially double-stranded nucleic acid at a specific position relative to the location of the recognition sequence.
In some embodiments of the present disclosure, the nicking endonuclease is a Type III endonuclease of SEQ ID NO: 24 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with the Type III endonuclease of SEQ ID NO: 24. The Type III endonuclease may also be a polypeptide of any one of SEQ ID NO: 24 to 27 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonucleases of SEQ ID NO: 24 to 27. The Type III endonuclease may also be a polypeptide of any one of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonucleases of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41. In particular it is preferred that said functional homologues of Type III endonuclease produce a 3’-O-phosphate end, which allows the extension of the DNA by polymerases. The nicking endonuclease recognition site for a Type III endonuclease may be an AP site.
In some embodiments of the present disclosure, the nicking endonuclease is a Type VIII endonuclease of SEQ ID NO: 28 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonucleases of SEQ ID NO: 28 to 41. In some embodiments of the present disclosure, the nicking endonuclease is a polypeptide of any one of SEQ ID NO: 28 to 31 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonucleases of SEQ ID NO: 28 to 31. In some embodiments of the present disclosure, the nicking endonuclease is a polypeptide of any one of SEQ ID NO: 28 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonucleases of SEQ ID NO: 28 to 41. In particular it is preferred that said functional homologues of Type VIII endonuclease produce a 3’-OH end, which allows the extension of the DNA by polymerases. The nicking endonuclease recognition site for a Type VIII endonuclease may be an AP site.
In some embodiments of the present disclosure, the nicking endonuclease is a DNA glycosylase-lyase Endonuclease IV of any one of SEQ ID NO: 31 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the glycosylase-lyase endonucleases of SEQ ID NO: 31 to 41.
In some embodiments of the present disclosure, the nicking endonuclease is a Type IV endonuclease of SEQ ID NO: 42 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonucleases of SEQ ID NO: 42 - 50. In particular it is preferred that said functional homologues of Type IV endonuclease produce an α,β-unsaturated aldehyde end. The nicking endonuclease recognition site for a Type IV endonuclease may be an AP site.
In some embodiments of the present disclosure, the nicking endonuclease is a nicking endonuclease recognising an AP site, preferably an apyrimidinic site.
In some embodiments of the present disclosure, the fill-in adaptor comprises a CRISPR recognition site. Thus, in some embodiments, the nicking endonuclease is a CRISPR endonuclease and the nicking endonuclease recognition site is a CRISPR recognition site. In some embodiments of the present disclosure, the fill-in adaptor comprises a CRISPR recognition site and no UMI. In other embodiments of the present disclosure, the fill-in adaptor comprises a CRISPR recognition site and a UMI, wherein the UMI is positioned in either A or B, preferably in A2. Hence, in some embodiments, the nicking endonuclease recognition site is a CRISPR recognition site, preferably a CRISPR Cas9 recognition site. In some embodiments of the present disclosure, the nicking endonuclease recognition site is a CRISPR D10A nicking site comprising a PAM sequence (NGG) downstream of the nicking site. Thus, C’ may comprise a PAM sequence, whereas the remainder of the CRISPR recognition site may be comprised in B’. Whereas the CRISPR recognition site may differ depending on the sequence of the guide RNA as well as the particular CRISPR endonuclease, the skilled person will be able to design a CRISPR recognition site.
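As an illustration of locating a candidate CRISPR recognition site with an NGG PAM, the following Python sketch scans a sequence written 5’->3’ and reports each position where a target region is immediately followed by NGG. It is a sketch under stated assumptions: the target length of 20 nucleotides is an assumed default and not a value given in the text, only the strand as written is scanned, and neither the guide RNA nor the exact nick position is modelled. The function name is hypothetical.

```python
import re
from typing import List, Tuple


def find_pam_sites(sequence_5to3: str, target_len: int = 20) -> List[Tuple[int, int]]:
    """Return (target_start, pam_start) for each NGG preceded by a full-length target region."""
    seq = sequence_5to3.upper()
    hits = []
    # The lookahead allows overlapping PAM candidates to be reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start >= target_len:
            hits.append((pam_start - target_len, pam_start))
    return hits
```

For a 20-nucleotide stretch followed directly by "TGG", the sketch returns a single hit with the target starting at position 0 and the PAM at position 20.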
In some embodiments of the present disclosure, the CRISPR endonuclease is a CRISPR endonuclease of SEQ ID NO: 65 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the CRISPR endonucleases of SEQ ID NO: 65 - 67.
In some embodiments of the present disclosure, the nicking endonuclease is a restriction endonuclease. A restriction endonuclease may be a restriction enzyme with nicking activity. Hence, in some embodiments of this disclosure, the nicking endonuclease recognition site is a recognition site recognised by a restriction enzyme with nicking activity.
Examples of useful restriction endonucleases include Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof. Hence, in some embodiments of this invention, the nicking endonuclease recognition site is a nicking site recognised by Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof. In some embodiments of this disclosure, the nicking endonuclease recognition site is GCTCTTC, CCD, GCAGTG, GGATC, CCTCAGC, GAATGC, CACGAG or GTCTC.
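As a non-limiting sketch, a sequence may be screened for the recognition sites listed above as follows. The motif list is taken from the text, with D read as the IUPAC code for A, G or T; the function names and the example sequence are hypothetical, and no assignment of motifs to particular enzymes is implied.

```python
import re

# Recognition motifs as listed in the text; D is the IUPAC code for A, G or T.
MOTIFS = ["GCTCTTC", "CCD", "GCAGTG", "GGATC", "CCTCAGC", "GAATGC", "CACGAG", "GTCTC"]
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_motifs(seq):
    """Map each motif found to the 0-based start positions of its matches on the given strand."""
    seq = seq.upper()
    hits = {}
    for motif in MOTIFS:
        # Lookahead so that overlapping matches are all reported.
        pattern = "(?=" + "".join(IUPAC[base] for base in motif) + ")"
        positions = [m.start() for m in re.finditer(pattern, seq)]
        if positions:
            hits[motif] = positions
    return hits

# Hypothetical example sequence; both strands are screened.
top = "AAGGATCTT"
print(find_motifs(top))
print(find_motifs(reverse_complement(top)))
```

Screening both strands may be relevant because a nicking enzyme cuts only one strand relative to its recognition site.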
In some embodiments of the present disclosure, the nicking endonuclease is a restriction endonuclease of any one of SEQ ID NO: 51 - 64 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the restriction endonucleases of SEQ ID NO: 51 - 64.
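For illustration only, a sequence identity threshold such as "at least 70%" may be checked as sketched below. The sketch assumes that the two sequences have already been aligned to equal length, with "-" marking gaps, and that identity is counted over all aligned columns; this particular definition, the function names and the example strings are assumptions and are not taken from the disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Hypothetical example: three of four aligned positions are identical.
print(percent_identity("ACGT", "ACGA"), meets_threshold("ACGT", "ACGA"))
```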
DNA glycosylase
In some embodiments, the adaptor comprises a nicking endonuclease pre-recognition site. A nicking endonuclease pre-recognition site is a site which can be transformed into a nicking endonuclease recognition site, for example by the action of a DNA glycosylase. In some embodiments, the pre-recognition site comprises or consists of deoxy-uridine (dU). In such cases the recognition site is the same as the nicking site. The dU may be recognized by a DNA glycosylase, preferably a Uracil DNA Glycosylase (UDG). The uracil DNA glycosylase (UDG) can be used to cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site. After UDG cleaves the uracil base from the phosphodiester backbone, the resulting apyrimidinic sites block replication by DNA polymerases and may be recognized by nicking endonucleases. Hence, the pre-nicking site may be converted into a nicking endonuclease recognition site by the action of a DNA glycosylase.
In some embodiments of the present disclosure, C’ comprises one deoxy-Uridine positioned in C’ in a position selected from the group of position 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20. For example, the deoxy-Uridine may be positioned in C’ at position 5, 6, 7, 8, 9 or 10. For example, the deoxy-Uridine may be positioned in C’ at position 5, 6, 7, or 8. In preferred embodiments, the deoxy-Uridine is positioned in C’ at position 5.
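The positioning of the deoxy-Uridine and the effect of base excision followed by nicking may be sketched, purely for illustration, as below. The sketch assumes that the deoxy-Uridine is written as "U", that positions are counted 1-based from the 5’ end of C’ (the text does not state the counting direction), and that the example sequence and function names are hypothetical.

```python
ALLOWED_POSITIONS = range(5, 21)  # positions 5 to 20, as listed in the text

def du_position(c_prime):
    """Return the 1-based position of the single dU (written 'U') in C'."""
    seq = c_prime.upper()
    if seq.count("U") != 1:
        raise ValueError("expected exactly one dU in C'")
    return seq.index("U") + 1

def is_allowed(c_prime):
    """True if the dU sits at one of the positions listed in the text."""
    return du_position(c_prime) in ALLOWED_POSITIONS

def split_at_du(c_prime):
    """Model base excision plus backbone cleavage: return the two fragments flanking the dU."""
    pos = du_position(c_prime)
    seq = c_prime.upper()
    return seq[:pos - 1], seq[pos:]

# Hypothetical C' sequence with the dU at position 5.
print(du_position("ACGTUACGT"), split_at_du("ACGTUACGT"))
```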
The glycosylase may be a monofunctional or bifunctional glycosylase. Monofunctional glycosylases have only glycosylase activity. Thus, monofunctional glycosylases may cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site. In some embodiments of the present disclosure, the DNA glycosylase is a monofunctional glycosylase. Bifunctional glycosylases have glycosylase activity and endonuclease activity. Thus, bifunctional glycosylases may cleave the glycosidic bond between the uracil base and the deoxyribose sugar to convert the pre-recognition site to a recognition site and nick at the nicking site. In some embodiments of the present disclosure, the DNA glycosylase is a bifunctional glycosylase. In some embodiments of the present disclosure, the DNA glycosylase, for example the Uracil-DNA glycosylase, and the nicking endonuclease are combined in one enzyme.
In some embodiments of the present disclosure, the DNA glycosylase is a Uracil DNA glycosylase of SEQ ID NO: 1 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonuclease of SEQ ID NO: 1 - 23.
In some embodiments of the present disclosure, step d) comprises incubating the sample with Uracil DNA glycosylase (UDG) and with DNA glycosylase-lyase Endonuclease VIII. In some embodiments of the present disclosure, step d) comprises incubating the sample with Antarctic Thermolabile Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease III. In some embodiments of the present disclosure, step d) comprises incubating the sample with Afu Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease IV.
In some embodiments of the present disclosure, the DNA glycosylase is a Uracil-DNA glycosylase, for example Antarctic Thermolabile Uracil DNA glycosylase (UDG) or Afu Uracil DNA glycosylase (UDG).
In some embodiments of the present disclosure, the nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV.
Method
The present disclosure also provides methods of attaching adaptor(s) (e.g. any of the fill-in adaptors described herein) to DNA fragment(s), said method comprising:
a) providing at least one adaptor according to any one of the preceding items;
b) providing a sample containing DNA fragments;
c) attaching the adaptor to the DNA fragments in the sample; and
d) i) if C’ comprises an endonuclease pre-nicking site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or
ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site, incubating the sample with a nicking endonuclease recognising said nicking endonuclease recognition site under conditions allowing for activity of said enzyme;
e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
In another aspect, the present disclosure also provides a method of attaching adaptor(s) (e.g. any of the fill-in adaptors described herein) to DNA fragment(s), said method comprising:
a) providing at least one adaptor according to any one of items as described herein;
b) providing a sample containing DNA fragments;
c) attaching the adaptor to the DNA fragments in the sample; and
d) i) incubating the sample with a Uracil-DNA glycosylase under conditions allowing for activity of said enzyme, thereby generating an AP site;
ii) incubating the sample with a nicking endonuclease recognising an AP site under conditions allowing for activity of said enzyme, wherein steps i) and ii) may be performed simultaneously, partly simultaneously or sequentially;
e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
The steps of incubating the sample at a temperature that is higher than the Tm of i) and ii) and incubating the samples with a strand-displacing DNA polymerase can be performed either sequentially or simultaneously.
In one embodiment of the present disclosure, the sample is incubated with the strand-displacing DNA polymerase at a temperature that is higher than the Tm of i) and ii).
In one embodiment of the present disclosure, the method further comprises a step of cold shock, wherein the sample comprising DNA fragments ligated to the adaptor is quickly transferred to a low temperature after nicking and the heat treatment. Said cold shock usually comprises incubation at a temperature in the range of 0 °C to 4 °C, wherein said step is performed immediately after step e).
The step of incubating the sample with a nicking endonuclease and/or a DNA glycosylase is performed under conditions allowing for activity of said enzyme. The skilled person will be able to determine suitable conditions for the enzymes of her choice. Typically, however, step d) is performed at a temperature in the range of 20°C to 80°C. In some embodiments of this disclosure, step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, for example in the range of 20 to 50°C, such as in the range of 25 to 37°C.
The methods also comprise a step of heat treatment, which is performed in order to allow C’ to dissociate from unligated fill-in adaptors, and/or to allow -C’-C- to dissociate from any adaptor dimers after RNA-nicking. Accordingly, the step of heat treatment should be performed at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’.
Typically, step e) is performed at a temperature in the range of 40°C to 80°C, such as in the range of 45°C to 70°C, for example in the range of 50°C to 70°C.
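As a rough, non-limiting illustration of how a heat-treatment temperature may be compared with the Tm of a short duplex, the sketch below uses the rule-of-thumb estimate Tm = 2 x (A+T) + 4 x (G+C) in °C. This rule is a coarse approximation for short oligonucleotides and is an assumption of this sketch, not a method stated in the disclosure; the function names and sequences are hypothetical, and a nearest-neighbour model would normally be preferred for actual design.

```python
def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return 2 * at + 4 * gc

def heat_step_is_sufficient(strands, heat_temp_c):
    """True if the heat-step temperature exceeds the estimated Tm of every listed strand."""
    return all(heat_temp_c > wallace_tm(strand) for strand in strands)

# Hypothetical short C sequences; 50 degrees C lies within the range given for step e).
print(wallace_tm("ACGTACGT"), heat_step_is_sufficient(["ACGTACGT", "GCGCGCGC"], 50))
```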
The methods of the invention also comprise a step of incubating the sample with a strand-displacing DNA polymerase.
In some embodiments of this disclosure, the strand-displacing DNA polymerase needs a 3’-hydroxyl group for primer extension. In other embodiments, a nick may be sufficient to start primer extension by a strand-displacing DNA polymerase. In some embodiments of this disclosure, the strand-displacing DNA polymerase has a strong activity at elevated temperatures to ensure the nicked strand does not reanneal. The person skilled in the art will appreciate that a variety of thermophilic strand-displacing polymerases can be used.
The strand-displacing DNA polymerase may be any DNA polymerase with the ability to displace downstream DNA encountered during synthesis with newly synthesised DNA. Multiple strand-displacing DNA polymerases are commercially available, and any one of these can be used with the invention. In one embodiment, the strand displacing DNA polymerase is a Bst polymerase, DNA Polymerase I, Large (Klenow) Fragment or DNA Polymerase from Thermococcus litoralis.
Thus, the strand displacing DNA polymerase may be any DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with Bst DNA Polymerase of SEQ ID NO: 68.
In one embodiment of the present disclosure, the strand displacing DNA polymerase is a Bst polymerase comprising a large fragment, wherein said large fragment comprises or consists of a sequence sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with the Large fragment of Bst polymerase SEQ ID NO: 69.
The strand displacing DNA polymerase may be any DNA polymerase of SEQ ID NO: 76 or a strand displacing DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase I, Large (Klenow) Fragment of SEQ ID NO: 76.
Thus, the strand displacing DNA polymerase may be any DNA polymerase of SEQ ID NO:77 or a strand displacing DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase from Thermococcus litoralis of SEQ ID NO: 77.
In one embodiment of the present disclosure the strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with phi 29 DNA polymerase of SEQ ID NO: 70 or Taq DNA polymerase of SEQ ID NO: 71. Said incubation with the strand displacing DNA polymerase is performed under conditions allowing for activity of said enzyme. The skilled person will be able to determine suitable conditions for the DNA polymerase of their choice. Typically, step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, such as 20 to 50°C, for example in the range of 25 to 37°C.
In some embodiments of this invention, the strand displacing DNA polymerase is a polymerase with 5'→3' exonuclease activity, such as Taq Polymerase.
As noted above, in some embodiments, the adaptors are ligated at one end of the DNA fragments, and in other embodiments adaptors may be ligated at both ends. In the latter case, two different adaptors (i.e. two different adaptor species) may be provided for use in the method. Thus, the methods may involve 1-sided or 2-sided adaptor ligation. This may be the case whether the adaptor-ligated DNA fragments are amplified by linear amplification (e.g. by in vitro transcription) or by non-linear (e.g. exponential) amplification (e.g. by PCR), for example as depicted in Figure 3.
More generally, two or more different adaptors may be provided for use in the method. These may be provided separately or together. Accordingly, in some embodiments, a mixture of fill-in adaptors (i.e. two, or two or more adaptor species) is provided.
Different adaptors may differ in the amplification sequence that they contain. In particular, they may differ in the primer binding site that is provided to the adaptor-ligated DNA fragment, more particularly, at each end thereof. Thus, the different adaptor species may comprise complements of different primer binding sites (that is, the sequence A in the adaptor may be different).
In some embodiments, each of the fill-in adaptor species may share the same barcode (BC) but comprise different primer binding sites.
Alternatively viewed, each of the adaptor species contains a sequence that is complementary to a primer binding site, wherein the primer binding site of each of the adaptor species is distinct. Thus, in some embodiments, a plurality of fill-in adaptors may be provided which contain sequences that are complementary to distinct primer binding sites.
In a particular embodiment, two fill-in adaptors are provided, for example as a mixture. The two adaptors may be referred to as “A” and “B” adaptors, A and B being representative of first and second adaptors. Two adaptors for use together to provide a pair of amplification primer binding sites may be regarded as paired, or cognate, adaptors.
Thus, in some embodiments, two fill-in adaptor species may be added randomly to the DNA fragments as described herein. The fill-in adaptor species may bind to all or a fraction of DNA fragments from each sample. In some cases, an A or B (i.e. first or second) adaptor may become ligated to just one end of a fragment. In other instances, the DNA fragment may be ligated to an adaptor at each end, either the same or a different adaptor. For example, one adaptor (e.g. adaptor A) may be ligated at both ends (i.e. A-DNA-A). In other instances, the DNA fragment is ligated to both adaptors, one at each end (e.g. adaptors A and B) (e.g. A-DNA-B). As used herein, DNA fragments that have been ligated to two distinct fill-in adaptors may be termed “double-tagged DNA fragments”. Thus, the DNA may be ligated to the two fill-in adaptors in several different combinations, for example A-DNA-A, A-DNA-B, B-DNA-A or B-DNA-B.
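The combinations arising from 2-sided ligation of two adaptor species may be enumerated, by way of illustration, as sketched below. The function names are hypothetical, and the labels "A" and "B" simply stand for the first and second adaptors referred to in the text.

```python
from itertools import product

def ligation_products(adaptors=("A", "B")):
    """List every ordered pairing of adaptors at the two ends of a DNA fragment."""
    return [f"{left}-DNA-{right}" for left, right in product(adaptors, repeat=2)]

def double_tagged(adaptors=("A", "B")):
    """Keep only the products carrying two distinct adaptors."""
    return [f"{left}-DNA-{right}"
            for left, right in product(adaptors, repeat=2) if left != right]

print(ligation_products())
print(double_tagged())
```

With two adaptor species, two of the four ordered combinations carry distinct adaptors at their ends.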
The use of two fill-in adaptors comprising distinct primer binding sites may find particular utility in methods employing PCR. A representative example of such methods can be seen in Figure 3E. In such embodiments, double-tagged DNA fragments may be amplified during PCR by using primers that are specific for the distinct amplification sequences present in the distinct fill-in adaptor species. For example, if the double-tagged DNA fragment is ligated to distinct adaptor species at each end (A-DNA-B or B-DNA-A), the provided primers may be specific for the amplification sequence of A and B, respectively. In such embodiments, the UMI-BC-gDNA portion is sequenced following PCR amplification.
The skilled person can readily design adaptors with appropriate primer binding site sequences. A representative example of such adaptors is provided by SEQ ID NOs. 93 and 94 which set out the top and bottom strand sequences of a first adaptor, and SEQ ID NOs. 95 and 96 which set out the top and bottom strand sequences of a second adaptor. The DNA fragments ligated with such adaptors may be amplified by the forward and reverse primers set out in SEQ ID NOs. 97 and 98 respectively.
DNA fragments
The fill-in adaptors of the present invention are useful for ligation to any DNA fragments.
Said DNA fragments may for example be chromatin fragments. Chromatin from any source of cells or tissues may be used. Said chromatin may for example be fragmented using mechanical or enzymatic means, and a specific antibody may be used to precipitate those chromatin fragments that associate with any desired antigen recognized by the specific antibody. Chromatin fragments from decomposing cells within an organism may also be present in cell-free material, such as liquid biopsies, such as extracellular fluid, blood, urine, lymph fluid, ascites fluid, and in such cases no further fragmentation may be needed. Such chromatin fragments may be precipitated. Specific antibodies against DNA or chromatin binding factors, histone modifications or any other desired molecule associating with DNA may be used, including antibodies against modification of the DNA itself, such as cytosine methylation-specific antibodies.
In one embodiment of the present disclosure the DNA fragments consist of or comprise genomic DNA (gDNA), such as gDNA fragments.
In one embodiment of the present disclosure, the DNA fragments are protein-bound DNA fragments.
In preferred embodiments, the DNA fragments are gDNA fragments bound to proteins. Thus, said fragments may comprise or consist of nucleosomes and/or other genomic DNA fragments bound to chromatin proteins such as transcription factors. Preferably, the majority of the gDNA fragments are in the form of nucleosomes, such as mononucleosomes.
In another embodiment of the present disclosure, the DNA fragments are naked genomic DNA.
The gDNA may be derived from any organism of interest, and thus the genomic DNA may for example be eukaryotic or prokaryotic.
In some embodiments, the DNA fragments comprise or consist of cell free DNA. Cell free DNA is typically already fragmented, and thus it is frequently not required to further fragment cell free DNA. In some embodiments, the cell free DNA is bound by proteins. Preferably, the cell free DNA is in the form of chromatin fragments. For example, the cell free DNA may largely be in the form of nucleosomes. In some embodiments the cell free DNA is in the form of naked DNA, i.e. not bound to proteins.
As used herein the term "cell free DNA" refers to a DNA molecule or a set of DNA molecules freely circulating in a biological sample, for example in blood. Cell free DNA is also known as "circulating DNA". Cell free DNA is extracellular, and this term is used as opposed to the intracellular DNA, which can be found, for example, in the cell nucleus or mitochondria.
In one embodiment of the present disclosure, the DNA fragments are selected from the group consisting of cDNA, DNA produced by whole genome amplification, primer extension products comprising at least one double-stranded terminus, and PCR amplicons.
In one embodiment of the present disclosure, the DNA fragments are obtained by isolating chromatin from a cellular sample and fragmenting said chromatin.
Alternatively, the DNA fragments are obtained by lysing cells of a cellular sample and fragmenting said chromatin. Said fragmenting may be done by any useful means, for example, the DNA fragments may have been prepared by mechanical shearing and/or enzymatic digestions, nebulisation, sonication, point-sink shearing, passage through a pressure cell, using French pressure cells, transposome mediated fragmentation and/or digestion with restriction enzymes and/or endonucleases. In one embodiment the genomic DNA is fragmented by MNase digestion. MNase digestion leads to fragmentation mainly into mononucleosomes and/or dinucleosomes.
The fragmentation is preferably done in a manner so that the fragmented DNA comprises or essentially consists of chromatin fragments. The DNA fragments may have any desirable size. However, the methods of the invention are particularly useful for ligating adaptors to short DNA fragments. Frequently, the DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 500 base pairs, such as in the range of 20 to 200 base pairs. The chromatin fragments may comprise transcription factor bound fragments, which typically are smaller than 150 bp, for example approx. 50 bp, and/or mononucleosomes, which typically are 150-230 bp, for example approx. 150 bp, and/or dinucleosomes, which are larger than 300 bp, for example approx. 300 bp. In some embodiments of this disclosure, the DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 15,000 base pairs, such as in the range of 10 to 10,000 base pairs, for example in the range of 10 to 5,000 base pairs, such as in the range of 10 to 500 base pairs.
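Purely as an illustration of the typical size ranges mentioned above, fragments may be binned by length as sketched below. The boundaries are taken from the typical values in the text (smaller than 150 bp, 150-230 bp, larger than 300 bp); the function name, the class labels and the treatment of intermediate sizes as "unclassified" are assumptions of this sketch.

```python
def classify_fragment(length_bp):
    """Assign a chromatin fragment to a size class based on its length in base pairs."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    if length_bp < 150:
        return "transcription-factor-bound"
    if length_bp <= 230:
        return "mononucleosome"
    if length_bp > 300:
        return "dinucleosome"
    # Lengths of 231 to 300 bp are not covered by the typical ranges in the text.
    return "unclassified"

# Hypothetical fragment lengths.
for length in (50, 150, 250, 320):
    print(length, classify_fragment(length))
```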
The adaptors may be attached to the DNA fragments by any useful means, however preferably attachment in step c) is done by ligation, such as by blunt end ligation. In particular, ligation may be performed by incubation with a ligase, for example a T4 DNA ligase. Said incubation with ligase is performed under conditions allowing for activity of said enzyme. The skilled person will be able to determine suitable conditions for the ligase of her choice.
Sometimes it may be beneficial to undertake one or more steps for preparing said DNA fragments for ligation, such as filling in or resecting ends with overhangs to blunt ends. Thus, the methods of the invention may also comprise such steps.
In one embodiment of the present disclosure, the adaptor contains a sample specific barcode, and the DNA fragments are obtained from the sample to be marked by said barcode.
Amplification
Once the fill-in adaptors of the invention are attached to DNA fragments, the ligated adaptors may be amplified.
Thus, the invention also provides methods of amplification of DNA fragments. Said method comprises the steps of a) preparing DNA fragments attached to adaptors by the methods described above, and b) amplifying said DNA fragments attached to adaptors in vitro.
In one embodiment of the present disclosure, at least one step of amplification is performed by RNA polymerase-driven transcription using said RNA polymerase. This may in particular be the case, when the adaptor comprises a DNA amplification sequence that is the promoter sequence of an RNA polymerase. Said RNA polymerase may preferably be T7 RNA polymerase. In some embodiments of the present disclosure, the T7 RNA polymerase is a polymerase of SEQ ID NO: 78 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with SEQ ID NO: 78.
In one embodiment of the present disclosure, A of the fill-in adaptor contains a sequence complementary to a primer binding site. In such embodiments it is preferred that at least one step of said amplification involves the use of a primer capable of binding to said primer binding site.
As noted above, in the case of 2-sided adaptor ligation, the step of amplification may involve the use of two different primers, each capable of binding to the distinct primer binding sites provided by the two different adaptors ligated at each end of the DNA fragment.
Figures 3A to 3E illustrate representative implementations of the method, including 1- or 2-sided adaptor ligation, and amplification by in vitro transcription or by PCR.
Figures 3C and 3D show locus-specific amplification with locus-specific primers. As depicted in Figure 3B, and described above, in the case of a 1-sided ligation, a second tag can be introduced by ligation at the other end of the DNA fragment, in order to provide a primer binding site, e.g. for a primer for a reverse transcription step, or for further amplification.
Depending on the specific design of the adaptors, and sequencing primer binding sites used etc., one or more primers may be generated for the reverse transcription step, for example as depicted in Figure 3B. For example, two primer binding events may occur. This may be a result of the particular sequence requirements of the downstream sequencing platform to be used. For example, in view of the particular design of Illumina sequencing primers, similarities between the required forward and reverse sequencing primer may result in the generation of two binding sites for the reverse transcriptase primer. Such design considerations are within the routine skill of the person skilled in this art.
Samples
The term "sample(s)" to be used in the method of the present invention refers to various samples that contain DNA fragments.
Examples of such a sample include samples prepared from, comprising or consisting of cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material. The term "mammalian material" refers to every mammalian-derived biological material such as tissue or biopsies collected from a mammal (e.g., tissue collected after an operation) and/or body fluids such as blood, serum, blood plasma, urine, a spinal fluid, saliva, a lymph fluid, a lacrimal fluid, or a seminal fluid. Preferably, such mammalian material is blood, serum or blood plasma.
As described above the sample may comprise or consist of aforementioned cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material.
The sample may also be prepared from cultured cells, a cultured cell lysate, a culture supernatant, and/or a mammalian material. For example, the sample may comprise fragmented and/or isolated DNA from aforementioned material. In one embodiment, the sample is prepared from any of the aforementioned materials comprising cells, by a method comprising lysing said cells and fragmenting the genomic DNA of said cells.
The mammalian material may be obtained from any mammal. In some embodiments, the mammal is a human.
In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from mammalian tissue. In such embodiments the DNA is preferably subjected to fragmentation, which can be done before, after or simultaneously with isolation. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from body fluids. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from serum. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from blood plasma. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from urine. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from spinal fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from saliva. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from lymph fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from lacrimal fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from seminal fluid. In some embodiments, the DNA fragments are obtained by isolating or partly isolating DNA from blood. In any of the aforementioned embodiments the DNA may be subjected to fragmentation, which for example can be done as described above either before, after or simultaneously with isolation.
In some embodiments, the sample comprises purified DNA. In some embodiments, the sample comprises purified nucleosomes. In some embodiments, the sample comprises purified chromatin. In some embodiments, the sample comprises cell lysate, for example cell lysate which has been subjected to fragmentation. In some embodiments, the sample comprises plasma. In some embodiments, the sample comprises blood. In some embodiments, the sample comprises serum. In some embodiments, the sample comprises urine. In some embodiments, the sample comprises spinal fluid. In some embodiments, the sample comprises saliva. In some embodiments, the sample comprises lymph fluid. In some embodiments, the sample comprises lacrimal fluid. In some embodiments, the sample comprises seminal fluid.
In one embodiment of the present disclosure, some or essentially all of the DNA fragments of a given sample are ligated to fill-in adaptors of the invention. The fill-in adaptors may contain a sample specific barcode, such that DNA fragments of a given sample can be identified by the barcode.
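By way of a non-limiting example, the identification of DNA fragments by a sample specific barcode may be sketched as follows. The sketch assumes, for simplicity, that each read begins with the barcode and that barcodes are matched exactly; the barcode sequences, sample names and function name are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by the sample whose barcode they start with; other reads go to 'unassigned'."""
    groups = defaultdict(list)
    for read in reads:
        read = read.upper()
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                # Store the read with the barcode trimmed off.
                groups[sample].append(read[len(barcode):])
                break
        else:
            groups["unassigned"].append(read)
    return dict(groups)

# Hypothetical barcodes and reads.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
print(demultiplex(["ACGTAAAA", "TGCACCCC", "GGGGGGGG"], barcodes))
```

In practice the position of the barcode within a read depends on the adaptor design and the sequencing set-up, and some tolerance for sequencing errors may be desirable.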
Items
The invention may further be defined by any of the following items:
1. A partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’
wherein
a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’;
b) A’ consists of 3’ - A1’ - A2’ - 5’, and
c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups;
d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other;
e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and
f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ or -B’-C’- comprises a nicking endonuclease recognition site or a nicking endonuclease pre-recognition site, wherein said pre-nicking site can be converted into a nicking endonuclease recognition site by the action of a DNA glycosylase, with the proviso that the nicking endonuclease recognition site is not a ribonucleotide positioned at the 3’ end of C’.
2. The adaptor according to item 1, wherein the pre-nicking site comprises or consists of a deoxy-Uridine (dU).
3. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a nicking site recognised by a restriction enzyme with nicking activity.
4. The adaptor according to any one of the preceding items, wherein the nicking endonuclease recognition site is a nicking site recognised by Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, such as thermostable variants thereof.
5. The adaptor according to any one of the preceding items, wherein the nicking endonuclease recognition site is GCTCTTC, CCD, GCAGTG, GGATC, CCTCAGC, GAATGC, CACGAG or GTCTC.
6. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a CRISPR nicking endonuclease recognition site.
7. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a CRISPR Cas9 nicking site.
8. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a CRISPR D10A nicking site comprising a PAM sequence (NGG) downstream of the site of the nick.
9. The adaptor according to item 1, wherein the nicking endonuclease recognition site is an SP6 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 73 or a sequence sharing at least 95% sequence identity therewith.
10. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a T3 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 74 or a sequence sharing at least 95% sequence identity therewith.
11. The adaptor according to item 1, wherein the nicking endonuclease recognition site is a Cyanophage Syn5 polymerase recognition site, preferably a sequence of SEQ ID NO: 75 or a sequence sharing at least 95% sequence identity therewith.
12. A partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’ wherein a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - Ai - A2 - 3’; b) A’ consists of 3’ - Ai’ - A2’ - 5’, and
c) Ai’ is a sequence of nucleotides, which is substantially non-complementary to Ai, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups; d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other; e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and f) C and C’ are sequences, which are substantially complementary to each other, wherein C’ comprises a deoxy-Uridine (dU).
13. The adaptor according to any one of the preceding items, wherein said adaptor comprises or consists of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - P - 5’ wherein a) A, B, C, A’, B’ and C’ are as defined in any one of items 1 or 12; and b) P is a 5’ phosphate.
14. The adaptor according to any one of the preceding items, wherein A consists of in the range of 10 to 100 nucleotides.
15. The adaptor according to any one of the preceding items, wherein A consists of in the range of 15 to 50 nucleotides.
16. The adaptor according to any one of the preceding items, wherein A consists of in the range of 15 to 40 nucleotides.
17. The adaptor according to any one of the preceding items, wherein A’ consists of in the range of 2 to 100 nucleotides.
18. The adaptor according to any one of the preceding items, wherein A’ consists of in the range of 2 to 35 nucleotides.
19. The adaptor according to any one of the preceding items, wherein A’ consists of in the range of 2 to 10 nucleotides.
20. The adaptor according to any one of the preceding items, wherein the DNA amplification sequence is a promoter sequence of an RNA polymerase or it comprises a primer binding site.
21. The adaptor according to any one of the preceding items wherein A, when bound to its complementary sequence as a double-stranded DNA, is recognized by an RNA polymerase.
22. The adaptor according to any one of the preceding items wherein A, when bound to its complementary sequence as a double-stranded DNA, is recognized by T7 RNA polymerase, by SP6 RNA polymerase, by Bacteriophage T3 RNA polymerase or Cyanophage Syn5 polymerase.
23. The adaptor according to any one of the preceding items wherein A, when bound to its complementary sequence as a double-stranded DNA, is recognized by T7 RNA polymerase.
24. The adaptor according to any one of the preceding items, wherein A contains a sequence complementary to a primer binding site.
25. The adaptor according to any one of the preceding items wherein Ai’ comprises or consists of a sequence of nucleotides connected through exonuclease-resistant phosphorothioate linkages.
26. The adaptor according to any one of the preceding items wherein Ai’ is non-complementary to Ai.
27. The adaptor according to any one of the preceding items wherein Ai’ comprises or consists of a sequence of 3 to 35 consecutive nucleotides connected through exonuclease-resistant phosphorothioate linkages.
28. The adaptor according to any one of the preceding items wherein Ai’ comprises or consists of a sequence of 3 to 35 consecutive cytosines connected through exonuclease-resistant phosphorothioate linkages.
29. The adaptor according to any one of the preceding items, wherein the 3’ end of Ai’ contains a nucleotide that has been modified to block primer extension.
30. The adaptor according to any one of the preceding items wherein the 3’ end of Ai’ comprises a dideoxynucleotide.
31. The adaptor according to any one of the preceding items wherein the 3’-end of Ai’ comprises a phosphoramidite C3 spacer.
32. The adaptor according to any one of the preceding items wherein Ai’ comprises one or more nucleotide analogues or modifications which are exonuclease-resistant, selected from the group consisting of phosphorothioate linkages, phosphoramidite C3 spacer, inverted deoxythymidine bases, 2'-O-methyl and 2'-O-methoxyethyl nucleosides.
33. The adaptor according to any one of the preceding items wherein Ai’ comprises a sequence that prevents RNA polymerase engagement and function, for example, a sequence that supports formation of a hairpin, loop or other secondary structure.
34. The adaptor according to any one of the preceding items, wherein -B-C- and/or -B’-C’- contains a primer binding site.
35. The adaptor according to any one of the preceding items, wherein -B-C- and/or -B’-C’- contains a partial or full-length SBS3 primer binding site.
36. The adaptor according to any one of the preceding items wherein -B-C- and/or -B’-C’- contains a randomized unique molecular identifier consisting of in the range of 5 to 15 nucleotides.
37. The adaptor according to any one of the preceding items wherein -B-C- and/or -B’-C’- contains a barcode sequence consisting of in the range of 5 to 15 nucleotides.
38. The adaptor according to any one of the preceding items wherein -B-C- and/or -B’-C’- in addition contains a random sequence of in the range of 5 to 15 nucleotides.
39. The adaptor according to any one of the preceding items wherein -B’-C’- comprises the nicking endonuclease recognition site.
40. The adaptor according to any one of the preceding items, wherein at least 80%, such as at least 85% of the nucleotides of C can form Watson-Crick base-pairs with nucleotides of C’.
41. The adaptor according to any one of the preceding items, wherein C’ is 2 or more nucleotides in length.
42. The adaptor according to any one of the preceding items, wherein C’ is between 2 and 25 nucleotides in length, such as between 2 and 20 nucleotides, preferably between 2 and 12 nucleotides, more preferably between 2 and 10 nucleotides in length or between 5 to 12 nucleotides in length.
43. The adaptor according to any one of the preceding items, wherein C’ is between 3 and 12 nucleotides in length, such as between 4 to 10 nucleotides, for example between 5 to 8 nucleotides in length.
44. The adaptor according to any one of the preceding items, wherein all nucleotides of A, A2’, B and B’ are deoxyribonucleotides.
45. The adaptor according to any one of the preceding items, wherein said adaptor does not comprise any ribonucleotides.
46. The adaptor according to any one of the preceding items, wherein C’ comprises one deoxy-Uridine positioned in C’ in a position selected from the group of position 5, 6, 7, 8, 9, 10, 11 and 12.
47. The adaptor according to any one of the preceding items, wherein the adaptor further comprises a tag, such as an affinity group and/or bioorthogonal group, wherein the tag is selected from the group of Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags.
48. The adaptor according to any one of the preceding items, wherein the adaptor comprises an NHS-based handle.
49. A method of attaching adaptor(s) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to any one of the preceding items; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) if C’ comprises a nicking endonuclease pre-recognition site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site, incubating the sample with an endonuclease recognising said nicking endonuclease recognition site under conditions allowing for activity of said enzyme; e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows:
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
50. A method of attaching adaptor(s) to DNA fragment(s), said method comprising: a) providing at least one adaptor according to any one of items 1 to 49; b) providing a sample containing DNA fragments; c) attaching the adaptor to the DNA fragments in the sample; and d) i) incubating the sample with a Uracil-DNA glycosylase under conditions allowing for activity of said enzyme, thereby generating an AP site; ii) incubating the sample with a nicking endonuclease recognising an AP site under conditions allowing for activity of said enzyme, wherein steps i) and ii) may be performed simultaneously, partly simultaneously or sequentially; e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows:
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C’ - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
51. The method according to any one of items 49 to 50, wherein steps e) and f) are performed simultaneously.
52. The method according to any one of items 49 to 51, wherein the sample is incubated with the strand-displacing DNA polymerase at a temperature that is higher than the Tm of i) and ii).
53. The method according to any one of items 49 to 52, wherein the method further comprises a step of incubation at a temperature in the range of 0°C to 4°C, wherein said step is performed immediately after step e).
54. The method according to any one of items 49 to 53, wherein step f) is performed at a temperature in the range of 20 to 80°C, such as in the range of 25 to 75°C, for example in the range of 20 to 50°C, such as in the range of 25 to 37°C.
55. The method according to any one of items 49 to 54, wherein the DNA glycosylase is a monofunctional glycosylase.
56. The method according to any one of items 49 to 54, wherein the DNA glycosylase is a bifunctional glycosylase.
57. The method according to any one of items 49 to 54, wherein the DNA glycosylase is a Uracil DNA glycosylase of SEQ ID NO: 1 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Uracil DNA glycosylases of SEQ ID NO: 1 - 23.
58. The method according to any one of items 49 to 54 wherein the DNA glycosylase, for example the Uracil-DNA glycosylase and the nicking endonuclease are combined in one enzyme.
59. The method according to item 58, wherein the nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV of any one of SEQ ID NO: 31 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the glycosylase-lyase endonucleases of SEQ ID NO: 31 to 41.
60. The method according to any one of items 49 to 59, wherein step d) comprises incubating the sample with Uracil DNA glycosylase (UDG) and with DNA glycosylase-lyase Endonuclease VIII.
61. The method according to any one of items 49 to 57, wherein step d) comprises incubating the sample with Antarctic Thermolabile Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease III.
62. The method according to any one of items 49 to 57, wherein step d) comprises incubating the sample with Afu Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease IV.
63. The method according to any one of items 49 to 57, wherein the DNA glycosylase is a Uracil-DNA glycosylase, for example Antarctic Thermolabile Uracil DNA glycosylase (UDG) or Afu Uracil DNA glycosylase (UDG).
64. The method according to any one of items 49 to 63, wherein the DNA glycosylase generates an AP site.
65. The method according to any one of items 49 to 64, wherein the nicking endonuclease is a nicking endonuclease recognising an AP site, preferably an apyrimidinic site.
66. The method according to any one of items 49 to 65, wherein the nicking endonuclease is endonuclease VIII, Endonuclease III or Endonuclease IV.
67. The method according to item 66, wherein the nicking endonuclease is a Type III endonuclease of any one of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type III endonucleases of SEQ ID NO: 24 to 27 or SEQ ID NO: 32 to 41.
68. The method according to item 66, wherein the nicking endonuclease is a Type VIII endonuclease of any one of SEQ ID NO: 28 to 41 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type VIII endonucleases of SEQ ID NO: 28 to 41.
69. The method according to item 66, wherein the nicking endonuclease is a Type IV endonuclease of SEQ ID NO: 42 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the Type IV endonucleases of SEQ ID NO: 42 - 50.
70. The method according to any one of items 49 to 66, wherein the nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase- lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV.
71. The method according to any one of items 49 to 66, wherein the nicking endonuclease is a CRISPR Nickase, preferably D10A.
72. The method according to item 71 , wherein the CRISPR endonuclease is a CRISPR endonuclease of SEQ ID NO: 65 to 67 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the CRISPR endonuclease of SEQ ID NO: 65 - 67.
73. The method according to any one of items 49 to 66, wherein the nicking endonuclease is a restriction endonuclease of any one of SEQ ID NO: 51 - 64 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of the restriction endonucleases of SEQ ID NO: 51 - 64.
74. The method according to any one of items 49 to 66, wherein the nicking endonuclease is endonuclease VIII of SEQ ID NO: 28, Endonuclease III of SEQ ID NO: 24 or Endonuclease IV of SEQ ID NO: 42 or a functional variant thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with any one of SEQ ID NO: 24 to 50.
75. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a Bst polymerase, DNA Polymerase I, Large (Klenow) Fragment or DNA Polymerase from Thermococcus litoralis.
76. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a phi29.
77. The method according to any of the items 49 to 74, wherein the strand displacing DNA polymerase is a Polymerase with 5'→3' exonuclease activity, such as Taq Polymerase.
78. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with Bst DNA Polymerase of SEQ ID NO: 68.
79. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a Bst polymerase comprising a large fragment, wherein said large fragment comprises or consists of a sequence sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with the Large fragment of Bst polymerase SEQ ID NO: 69.
80. The method according to any one of items 49 to 74, wherein the DNA polymerase is a polypeptide of SEQ ID NO: 76 or a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase I, Large (Klenow) Fragment of SEQ ID NO: 76.
81. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a polypeptide of SEQ ID NO: 77 or a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with DNA Polymerase from Thermococcus litoralis of SEQ ID NO: 77.
82. The method according to any one of items 49 to 74, wherein the strand displacing DNA polymerase is a DNA polymerase sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with phi 29 DNA polymerase of SEQ ID NO: 70 or Taq DNA polymerase of SEQ ID NO: 71.
83. The method according to any one of items 49 to 82, wherein the DNA fragments consist of or comprise genomic DNA.
84. The method according to any one of items 49 to 83, wherein the DNA fragments are protein-bound DNA fragments.
85. The method according to any of the items 49 to 82, wherein the DNA fragments are naked genomic DNA.
86. The method according to any one of the items 49 to 82, wherein the DNA fragments are cell free DNA.
87. The method according to any one of the items 85 to 86, wherein the DNA fragments are cell free DNA fragments obtained from blood, blood plasma, urine, or ascites fluid.
88. The method according to any one of the preceding items, wherein the DNA fragments comprise nucleosomes, for example wherein the majority of the DNA fragments are in the form of nucleosomes.
89. The method according to any one of items 85 to 88, wherein the majority of the cell free DNA fragments are in the form of nucleosomes.
90. The method according to any one of the items 85 to 88, wherein the DNA fragments are naked cell free DNA.
91. The method according to any one of items 49 to 82, wherein the DNA fragments comprise nucleosomes and/or genomic DNA fragments bound to chromatin proteins such as transcription factors.
92. The method according to any of the items 49 to 83, wherein genomic DNA is eukaryotic or prokaryotic.
93. The method according to any one of items 49 to 92, wherein the DNA fragments are selected from the group consisting of cDNA, DNA produced by whole genome amplification, primer extension products comprising at least one double-stranded terminus, and a PCR amplicon.
94. The method according to any one of items 49 to 93, wherein the DNA fragments are obtained by lysing cells from a cell culture or from mammalian material and fragmenting the chromatin from the lysed cells.
95. The method according to any one of items 49 to 94, wherein the DNA fragments are obtained by isolating chromatin from a cellular sample and fragmenting said chromatin.
96. The method according to any of the items 49 to 95, wherein the DNA fragments are obtained by isolating, partly isolating and/or fragmenting DNA from cultured cells, cultured cell lysate, cell culture supernatant, and/or mammalian material.
97. The method according to any of the items 49 to 96, wherein the DNA fragments are obtained by isolating, partly isolating and/or fragmenting DNA from mammalian material, wherein said mammalian material for example may be material such as tissue or biopsies collected from a mammal and/or body fluids such as blood, serum, blood plasma, urine, a spinal fluid, saliva, a lymph fluid, a lacrimal fluid, or a seminal fluid.
98. The method according to any of the items 49 to 97, wherein the sample comprises purified DNA, purified nucleosomes, purified chromatin, cell lysate.
99. The method according to any of the items 49 to 98, wherein the sample comprises or consists of plasma, blood, serum, urine, spinal fluid, saliva, lymph fluid, lacrimal fluid or seminal fluid.
100. The method according to any of the items 49 to 99, wherein the DNA fragments are cell free DNA fragments.
101. The method according to any one of items 49 to 100, wherein the DNA fragments have been prepared by mechanical shearing and/or enzymatic digestions.
102. The method according to any one of items 49 to 101, wherein the DNA fragments on average comprise more than 10 base pairs, such as more than 15 base pairs, such as more than 150 base pairs, for example in the range of 10 to 15,000 base pairs, such as in the range of 10 to 10,000 base pairs, for example in the range of 10 to 5,000 base pairs, such as in the range of 10 to 500 base pairs.
103. The method according to any one of items 49 to 102, wherein attaching of step c) is done by ligation, such as by blunt end ligation.
104. The method according to any one of items 49 to 103, wherein said ligation is performed by incubation with a ligase, for example a T4 DNA ligase.
105. The method according to any of the items 49 to 104, wherein the method further comprises a step of preparing said DNA fragments for ligation.
106. The method according to any one of items 49 to 105, wherein ligation is performed by incubating the DNA fragments and the adaptors with a ligase under conditions allowing for activity of said ligase.
107. The method according to any one of items 49 to 106, wherein the adaptor contains a sample specific barcode, and wherein the DNA fragments obtained from said sample are used.
108. The method according to any one of items 49 to 107, wherein step d) is performed at a temperature in the range of 20° C to 80° C.
109. The method according to any one of items 49 to 108, wherein step e) is performed at a temperature in the range of 40° C to 80° C, such as in the range of 45°C to 70°C, for example in the range of 50°C to 70°C.
110. A method of amplification of DNA fragments, said method comprising the step of a) preparing DNA fragments attached to adaptors by the method according to items 49 to 109, b) amplifying said DNA fragments attached to adaptors in vitro.
111. The method according to item 110, wherein the amplification is performed by RNA polymerase-driven transcription.
112. The method according to item 111 , wherein the RNA polymerase is T7 RNA polymerase, Bacteriophage T3 RNA polymerase or Cyanophage Syn5 polymerase.
113. The method according to item 112, wherein the T7 RNA polymerase is a polymerase of SEQ ID NO: 78 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 85%, such as at least 90%, for example at least 95%, such as 100% sequence identity with SEQ ID NO: 78.
114. The method according to any one of items 110 to 113, wherein A of said adaptor contains a sequence complementary to a primer binding site, and wherein a primer capable of binding to said primer binding site is used for amplification.
115. The method according to any one of items 110 to 114, wherein two adaptors are provided, each of which contains a sequence complementary to a primer binding site, wherein the primer binding sites of the two adaptors are distinct, and wherein two distinct primers, each capable of binding to one of the two primer binding sites, are used for amplification.
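A minimal Python sketch, given purely as an illustration and not as part of the items above, of how a candidate -B’-C’- strand could be screened for the recognition sequences listed in item 5, and how a rough melting temperature of the short C/C’ duplex could be estimated when choosing the temperature of step e). The function names and the use of the Wallace rule (2 °C per A/T, 4 °C per G/C) are assumptions; the Wallace rule is only a coarse approximation for short oligonucleotides and ignores buffer composition.

```python
import re

# Recognition sequences taken from item 5; D is the IUPAC code for A, G or T.
SITES = ["GCTCTTC", "CCD", "GCAGTG", "GGATC", "CCTCAGC", "GAATGC", "CACGAG", "GTCTC"]
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(strand):
    """Return sorted (start, site) pairs for every listed site found on the
    given strand (read 5'->3'). Overlapping matches are reported."""
    strand = strand.upper()
    hits = []
    for site in SITES:
        # A lookahead lets overlapping occurrences be found.
        pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
        hits.extend((m.start(), site) for m in re.finditer(pattern, strand))
    return sorted(hits)


def wallace_tm(seq):
    """Rough Tm estimate in degrees C by the Wallace rule (assumption:
    short oligonucleotide, perfectly matched duplex)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    candidate = "AAGCTCTTCAA"  # hypothetical strand, not a sequence from this document
    print(find_sites(candidate))
    print(find_sites(reverse_complement(candidate)))
    print(wallace_tm("GCTCTTC"))
```

Both strands should be checked, since only the strand carrying the site in the intended orientation would be nicked; a sequence-specific Tm prediction would be preferable to the Wallace estimate for an actual design.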
Examples
Example 1: Performance comparison of r5 and C3 adaptor
The following example compares the adaptor contamination obtained in a MINUTE-ChIP experiment performed essentially as described in Kumar et al, 2019 except that the adaptors described herein are used. Thus, the following example is performed using the adaptor “r5” shown in figure 1 B. As control, a prior art adaptor known as “3C” adaptor shown in figure 1A was used.
Materials and Methods
Mouse embryonic stem cell pellets containing 1-2 x 10^6 cells were barcoded with either u5 (see Figure 1 C), r5 (see Figure 1 B) or prior art (3C) adaptors (Figure 1 A) at 50 µM. All experiments were carried out in duplicate. Briefly, cells were first lysed and digested with MNase to enrich for the mononucleosome population. The digestion was quenched by EGTA-containing end-repair and ligation buffer, in which each sample was ligated to u5, r5 or 3C adaptor molecules carrying a unique barcode. Ligation was quenched by EDTA-containing lysis dilution buffer, before combining all samples in one tube. After centrifugation to remove insoluble cell debris, the supernatant was pooled and an aliquot was carried forward as input material to Proteinase K treatment at 65°C overnight.
The u5 samples were subjected to enzymatic removal of adaptor contamination by sequential treatment with Uracil-DNA glycosylase and either an endonuclease VIII (NEB USER enzyme, available from New England Biolabs, US) (u5) or endonuclease IV (NEB USER enzyme III, available from New England Biolabs, US) (u5_III). The r5 samples were subjected to enzymatic removal of adaptor contamination by treatment with RNase HII. Bst 3.0 DNA polymerase (NEB M0374S) reaction mix in the manufacturer’s Amplification Buffer was then added to the digested materials and incubated at 68°C for 2 hours for adaptor fill-in. The u5, u5_III and r5 adaptor-ligated DNA fragments were then purified and ready for T7 RNA polymerase-driven in vitro transcription, together with the 3C samples purified previously. The amplified RNA product was treated with DNase I, purified and then ligated to a pre-adenylated RNA 3’ adaptor (RA3), which served as a primer binding site for reverse transcription. The resulting cDNA was treated with RNase A and RNase H, purified and then used as a template for library PCR with barcoded primers compatible with the Illumina sequencing platform. All nucleic-acid purifications were carried out with the AMPure SPRI size selection method (Beckman Coulter), either with a standard or the small fragment (sf) purification protocol. Library size distribution was assessed by Agilent BioAnalyzer and libraries were quantified by Qubit DNA high sensitivity assay before dilution for paired-end sequencing on the Illumina platform.
Results
The u5, u5_III and r5 adaptors yielded ~20-50-fold lower adaptor contamination in the final libraries, i.e. ~20-50-fold less free adaptor, than the 3C adaptor (Figure 4).
Conclusion
The disclosed invention produced lower contamination in the final libraries and increased the percentage of mappable reads compared to the standard adaptor.
Example 2: Ligation of the fill-in adaptors to cell free DNA (cfDNA) fragments in human blood
The following example shows that the “fill-in” adaptors can be ligated to cell free DNA fragments. The cfDNA fragments can be amplified and sequenced by next generation sequencing. The person skilled in the art will appreciate that this experiment can be performed with fill-in adaptors comprising all endonuclease nicking sites that are described in this invention.
Materials and Methods
Human plasma is obtained from whole blood samples by centrifugation (10 min, 800g, 4°C) and collecting the supernatant, which is used fresh or flash frozen and may be stored at -80°C before use. Four plasma samples, 200uL each, are set up in parallel ligation reactions for 2h at room temperature (with T4 Polynucleotide kinase (2.5U) and T4 Ligase (2.5U) 10x buffer, 3% PEG 4000, 0.2mM ATP), to directly ligate the fill-in adaptors shown in figure 1 onto cfDNA, whether in the form of nucleosomes or naked DNA in the plasma. The ligation reactions (250uL each) are stopped by the addition of a stop buffer (50mM Tris-HCl, 150mM NaCl, 1% Triton X-100, 50mM EGTA, 50mM EDTA, 0.1% DOC) and the barcoded plasma samples are pooled (1.5 mL total volume). 150uL of the resulting pool is collected as the “input” and the remaining pool is equally split into two ChIP reactions that are incubated overnight at 4°C, with magnetic beads coupled with antibodies against histone H3 (3uL of Active Motif 39763) and histone H3K4me3 (3uL of Millipore 04745). Post-ChIP, the precipitated material along with the input are subjected to sequential treatment with RNase HII and Bst polymerase, in vitro transcription by T7 RNA polymerase, cDNA conversion and library PCR as described in Example 1, yielding the Input, H3 and H3K4me3 libraries.
Libraries are further diluted to 2nM and pooled, before sequencing on the Illumina NextSeq 2000 platform.
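The dilution to 2 nM can be sketched as follows, assuming the library concentration is available in ng/µL (for example from a Qubit assay) and the mean fragment length in base pairs (for example from a BioAnalyzer trace), as used in Example 1. The average mass of 660 g/mol per base pair of double-stranded DNA is a commonly used approximation; the function names and the numbers in the usage lines are hypothetical and are not measurements from this example.

```python
def library_molarity_nM(conc_ng_per_ul, mean_length_bp, bp_mass=660.0):
    """Convert a dsDNA concentration (ng/uL) to molarity (nM).

    Assumes an average of `bp_mass` g/mol per base pair (approximation).
    """
    if mean_length_bp <= 0:
        raise ValueError("mean_length_bp must be positive")
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_length_bp)


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock_ul, diluent_ul) needed to reach target_nM in
    final_volume_ul, using C1 * V1 = C2 * V2."""
    if stock_nM < target_nM:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical library: 1.32 ng/uL with a mean fragment length of 400 bp.
    stock = library_molarity_nM(1.32, 400)
    print(stock)
    print(dilution_volumes(stock, target_nM=2.0, final_volume_ul=20.0))
```

Libraries diluted to the same molarity can then be pooled in equal volumes to obtain an equimolar pool.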
Results
Figure 5 shows the results obtained using r5 (see Figure 1 B). The skilled person will appreciate that, based on these results, the adaptors of the present invention will also be efficiently ligated onto cell-free DNA (cfDNA) fragments in human plasma.
Conclusions
The adaptors of the present invention can be efficiently ligated onto cell-free DNA (cfDNA) fragments, irrespective of whether they are in the form of nucleosomes or not. cfDNA barcoded with the fill-in adaptors as described in the present invention can be amplified into a sequencing library after purification or can be subjected to ChIP in order to retrieve cfDNA fragments bound to nucleosomes (H3 ChIP) or nucleosomes with specific histone modifications present (H3K4me3 ChIP).
Sequence overview
SEQ ID NO: 1: >sp|O28007|UDGA_ARCFU Type-4 uracil-DNA glycosylase from Archaeoglobus fulgidus, Gene Name: afung
SEQ ID NO: 2: >sp|P12295|UNG_ECOLI Uracil-DNA glycosylase from Escherichia coli (strain K12), Gene Name:ung
SEQ ID NO: 3: >sp|P13051|UNG_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 4: >sp|Q9U221|UNG_CAEEL Uracil-DNA glycosylase from Caenorhabditis elegans, Gene Name:ung-1
SEQ ID NO: 5: >tr|A0A126LB28|A0A126LB28_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:U81
SEQ ID NO: 6: >tr|A0A3Q1M076|A0A3Q1M076_BOVIN Uracil-DNA glycosylase from Bos taurus, Gene Name:ACACB
SEQ ID NO: 7: >tr|A0A3Q1MNF9|A0A3Q1MNF9_BOVIN Uracil-DNA glycosylase from Bos taurus, Gene Name:ACACB
SEQ ID NO: 8: >tr|A0A3Q1MS56|A0A3Q1MS56_BOVIN Uracil-DNA glycosylase from Bos taurus, Gene Name:ACACB
SEQ ID NO: 9: >tr|A0A7J5YR18|A0A7J5YR18_DISMA Uracil-DNA glycosylase from Dissostichus mawsoni, Gene Name: UNG
SEQ ID NO: 10: >tr|A0A8V8TPS1|A0A8V8TPS1_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 11: >tr|A0A8V8TQ66|A0A8V8TQ66_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 12: >tr|E5KTA5|E5KTA5_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 13: >tr|E5KTA6|E5KTA6_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 14: >tr|Q17QB8|Q17QB8_BOVIN Uracil-DNA glycosylase from Bos taurus, Gene Name:ACACB
SEQ ID NO: 15: >tr|Q6FHS8|Q6FHS8_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 16: >tr|A0A0F8JSW2|A0A0F8JSW2_METMZ Type-4 uracil-DNA glycosylase from Methanosarcina mazei, Gene Name:DU30_02025
SEQ ID NO: 17: >tr|A0A4P8RB80|A0A4P8RB80_METMZ Type-4 uracil-DNA glycosylase from Methanosarcina mazei, Gene Name:DKM28_16410
SEQ ID NO: 18: >tr|A0A8H4BWE9|A0A8H4BWE9_YEASX Uracil-DNA glycosylase from Saccharomyces cerevisiae, Gene Name:UNG1
SEQ ID NO: 19: >tr|A0A8V8TNE1|A0A8V8TNE1_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 20: >tr|A0A8V8TNJ5|A0A8V8TNJ5_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 21: >tr|A0A8V8TNW2|A0A8V8TNW2_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 22: >tr|B4DRT6|B4DRT6_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
SEQ ID NO: 23: >tr|F5GYA2|F5GYA2_HUMAN Uracil-DNA glycosylase from Homo sapiens, Gene Name:UNG
Endonucleases (Type III)
SEQ ID NO: 24: >sp|P0AB83|END3_ECOLI/1-211 Endonuclease III from Escherichia coli (strain K12), Gene Name:nth
SEQ ID NO: 25: >tr|A0A8H4C029|A0A8H4C029_YEASX/1-399 Endonuclease III homolog from Saccharomyces cerevisiae, Gene Name:NTG1
SEQ ID NO: 26: >sp|Q2KID2|NTH_BOVIN/1-305 Endonuclease III-like protein 1 from Bos taurus, Gene Name:NTHL1
SEQ ID NO: 27: >sp|P78549|NTH_HUMAN/1-312 Endonuclease III-like protein 1 from Homo sapiens, Gene Name:NTHL1
Endonucleases (Type VIII)
SEQ ID NO: 28: >sp|P50465|END8_ECOLI/1-263 Endonuclease 8 from Escherichia coli (strain K12), Gene Name:nei
SEQ ID NO: 29: >sp|Q8TAT5|NEIL3_HUMAN/1-605 Endonuclease 8-like 3 from Homo sapiens, Gene Name:NEIL3
SEQ ID NO: 30: >sp|Q969S2|NEIL2_HUMAN/1-332 Endonuclease 8-like 2 from Homo sapiens, Gene Name:NEIL2
SEQ ID NO: 31: >sp|Q96FI4|NEIL1_HUMAN/1-390 Endonuclease 8-like 1 from Homo sapiens, Gene Name:NEIL1
Endonucleases
SEQ ID NO: 32: >sp|O15527|OGG1_HUMAN/1-345 N-glycosylase/DNA lyase from Homo sapiens, Gene Name:OGG1
SEQ ID NO: 33: >sp|P06746|DPOLB_HUMAN/1-335 DNA polymerase beta from Homo sapiens, Gene Name:POLB
SEQ ID NO: 34: >sp|Q13686|ALKB1_HUMAN/1-389 Nucleic acid dioxygenase ALKBH1 from Homo sapiens, Gene Name:ALKBH1
SEQ ID NO: 35: >tr|I1WZN9|I1WZN9_HUMAN/1-238 DNA polymerase (Fragment) from Homo sapiens OX=9606
SEQ ID NO: 36: >sp|P05523|FPG_ECOLI/1-269 Formamidopyrimidine-DNA glycosylase from Escherichia coli (strain K12), Gene Name:mutM
SEQ ID NO: 37: >tr|Q862E1|Q862E1_BOVIN/1-147 Small ribosomal subunit protein uS3 (Fragment) from Bos taurus OX=9913
SEQ ID NO: 38: >tr|A0A0F8GLC7|A0A0F8GLC7_METMZ/1-584 DNA polymerase beta from Methanosarcina mazei, Gene Name:DU34_00320
SEQ ID NO: 39: >tr|A0A7J5XNP8|A0A7J5XNP8_DISMA/1-727 DNA-(apurinic or apyrimidinic site) lyase from Dissostichus mawsoni, Gene Name:F7725_010514
SEQ ID NO: 40: >tr|A0A7J5Z3E3|A0A7J5Z3E3_DISMA/1-627 DNA-(apurinic or apyrimidinic site) lyase from Dissostichus mawsoni, Gene Name:F7725_022903
SEQ ID NO: 41: >tr|A0A7J5ZHK9|A0A7J5ZHK9_DISMA/1-1408 DNA-(apurinic or apyrimidinic site) lyase from Dissostichus mawsoni, Gene Name:F7725_000886
Endonucleases (Type IV)
SEQ ID NO: 42: >sp|P0A6C1|END4_ECOLI Endonuclease 4 from Escherichia coli (strain K12), Gene Name:nfo
SEQ ID NO: 43: >sp|A7GW36|END4_CAMC5 Probable endonuclease 4 from Campylobacter curvus (strain 525.92), Gene Name:nfo
SEQ ID NO: 44: >sp|P00641|ENDO_BPT7 Endonuclease I from Escherichia phage T7, Gene Name:3
SEQ ID NO: 45: >sp|P54476|END4_BACSU Probable endonuclease 4 from Bacillus subtilis (strain 168), Gene Name:nfo
SEQ ID NO: 46: >sp|Q31YW6|END4_SHIBS Probable endonuclease 4 from Shigella boydii serotype 4 (strain Sb227), Gene Name:nfo
SEQ ID NO: 47: >sp|Q72KH8|END4_THET2 Endonuclease 4 from Thermus thermophilus (strain ATCC BAA-163 / DSM 7039 / HB27), Gene Name:nfo
SEQ ID NO: 48: >sp|Q834D0|END4_ENTFA Probable endonuclease 4 from Enterococcus faecalis (strain ATCC 700802 / V583), Gene Name:nfo
SEQ ID NO: 49: >sp|Q9KD33|END4_HALH5 Probable endonuclease 4 from Halalkalibacterium halodurans (strain ATCC BAA-125 / DSM 18197 / FERM 7344 / JCM 9153 / C-125), Gene Name:nfo
SEQ ID NO: 50: >sp|Q1C982|END4_YERPA Probable endonuclease 4 from Yersinia pestis bv. Antiqua (strain Antiqua), Gene Name:nfo
SEQ ID NO: 51: >tr|A0A0K2Y1U9|A0A0K2Y1U9_9HELI AlwI restriction endonuclease from Helicobacter ailurogastricus, Gene Name:HAL07_13780
SEQ ID NO: 52: >tr|A1XI22|A1XI22_GEOSE Nb.BsrDI from Geobacillus stearothermophilus, Gene Name:bsrDIB
SEQ ID NO: 53: >tr|A3FEV7|A3FEV7_9BACI Heterodimeric restriction endonuclease R.BspD6I large subunit from Bacillus sp. D6, Gene Name:bspD6R1
SEQ ID NO: 54: >tr|A3FEV8|A3FEV8_9BACI Heterodimeric restriction endonuclease R.BspD6I small subunit from Bacillus sp. D6, Gene Name:bspD6IR2
SEQ ID NO: 55: >tr|D2D3S6|D2D3S6_LYSSH BspQI restriction endonuclease from Lysinibacillus sphaericus, Gene Name:bspQIR
SEQ ID NO: 56: >tr|Q2I0M0|Q2I0M0_PARTM RI.BtsI from Parageobacillus thermoglucosidasius, Gene Name:btsIR1
SEQ ID NO: 57: >tr|Q5D6Y4|Q5D6Y4_BREBE BbvCI endonuclease subunit 2 from Brevibacillus brevis, Gene Name:bbvCIR-2
SEQ ID NO: 58: >tr|Q5D6Y5|Q5D6Y5_BREBE BbvCI endonuclease subunit 1 from Brevibacillus brevis, Gene Name:bbvCIR-1
SEQ ID NO: 59: >tr|Q5Q1P6|Q5Q1P6_9PHYC CviPII top-strand DNA nicking endonuclease from Chlorella virus, Gene Name:cviPIINt
SEQ ID NO: 60: >tr|Q6UQ64|Q6UQ64_GEOSE BsmAI endonuclease from Geobacillus stearothermophilus, Gene Name:bsmAIR
SEQ ID NO: 61: >tr|Q8L3A5|Q8L3A5_GEOSE BssSI restriction endonuclease from Geobacillus stearothermophilus, Gene Name:bssSIR
SEQ ID NO: 62: >tr|Q8RLN4|Q8RLN4_GEOSE BsmI from Geobacillus stearothermophilus, Gene Name:bsmIR
SEQ ID NO: 63: >tr|Q9AM79|Q9AM79_GEOSE N.BstSEI from Geobacillus stearothermophilus, Gene Name:bstSEIN
SEQ ID NO: 64: >tr|D2D3S6|D2D3S6_LYSSH BspQI restriction endonuclease from Lysinibacillus sphaericus, Gene Name:bspQIR
Cas9 Enzymes
SEQ ID NO: 65: >sp|Q99ZW2|CAS9_STRP1 CRISPR-associated endonuclease Cas9/Csn1 from Streptococcus pyogenes serotype M1, Gene Name:cas9
SEQ ID NO: 66: >Cas9 D10A Nickase mutant
SEQ ID NO: 67: >Cas9 H840A Nickase mutant
SEQ ID NO: 68: [WP_033014420] Bst DNA Polymerase (DNA polymerase I from Geobacillus stearothermophilus)
SEQ ID NO: 69: Bst DNA Polymerase Large Fragment (LF) 587 a.a. (290-876) (N-terminus truncated DNA polymerase I from Geobacillus stearothermophilus)
SEQ ID NO: 70: Bacillus phage phi29 DNA polymerase
SEQ ID NO: 71: Taq DNA polymerase I
SEQ ID NO: 72: T7 promoter
SEQ ID NO: 73: SP6 RNA polymerase recognition sequence
SEQ ID NO: 74: T3 RNA polymerase recognition sequence
SEQ ID NO: 75: Cyanophage Syn5 polymerase recognition sequence
SEQ ID NO: 76: >sp|P00582|DPO1_ECOLI DNA polymerase I OS=Escherichia coli (strain K12) OX=83333 GN=polA PE=1 SV=1
SEQ ID NO: 77: >sp|P30317|DPOL_THELI DNA polymerase OS=Thermococcus litoralis OX=2265 GN=pol PE=1 SV=1
SEQ ID NO: 78: T7 RNA polymerase
SEQ ID NO: 79: Adaptor sequence s_USER_BC03
SEQ ID NO: 80: Adaptor sequence as_USER_BC03
SEQ ID NO: 81: s_USER_BC04
SEQ ID NO: 82: as_USER_BC04
SEQ ID NO: 83: Figure 1A top strand
SEQ ID NO: 84: Figure 1A bottom strand
SEQ ID NO: 85: Figure 1B top strand
SEQ ID NO: 86: Figure 1B bottom strand
SEQ ID NO: 87: Figure 1C top strand
SEQ ID NO: 88: Figure 1C bottom strand
SEQ ID NO: 89: Figure 1D top strand
SEQ ID NO: 90: Figure 1D bottom strand
SEQ ID NO: 91: Figure 1E top strand
SEQ ID NO: 92: Figure 1E bottom strand
SEQ ID NO: 93: Mixed Adaptor A top strand
SEQ ID NO: 94: Mixed Adaptor A bottom strand
SEQ ID NO: 95: Mixed Adaptor B top strand
SEQ ID NO: 96: Mixed Adaptor B bottom strand
SEQ ID NO: 97: PCR forward primer (universal for all versions)
SEQ ID NO: 98: PCR reverse primer (universal for all versions)
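The UniProt-derived entries in the listing above use headers of the form `>db|accession|entry name[/start-end] description`. As a minimal sketch (not part of the application), such a header could be split into its fields and the accession checked against the accession pattern documented by UniProt, which is a quick way to catch transcription errors such as a letter O in place of a zero. The function names `is_valid_accession` and `parse_header` are hypothetical, and the header layout is inferred only from the entries shown here.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Header layout inferred from the listing: db, accession, entry name,
# an optional /start-end residue range, then free-text description.
HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>[^\s/|]+)"
    r"(?:/(?P<start>\d+)-(?P<end>\d+))?\s+(?P<description>.*)$"
)


def is_valid_accession(accession):
    """Return True if the string matches the UniProt accession pattern."""
    return ACCESSION.match(accession) is not None


def parse_header(header):
    """Split one header line into a dict of fields (hypothetical helper)."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError("unrecognised header: " + header)
    fields = match.groupdict()
    # Flag accessions that cannot be valid, e.g. after OCR damage.
    fields["accession_ok"] = is_valid_accession(fields["accession"])
    return fields
```

Called on the header given for SEQ ID NO: 28, `parse_header` yields the database `sp`, the accession `P50465`, the entry name `END8_ECOLI` and the residue range 1 to 263.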
References
1. Breslauer KJ, Frank R, Blocker H, Marky LA. Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A. 1986 Jun;83(11):3746-50. doi: 10.1073/pnas.83.11.3746. PMID: 3459152; PMCID: PMC323600.
2. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000 Jun;16(6):276-7. doi: 10.1016/s0168-9525(00)02024-2. PMID: 10827456.
3. Kumar B, Elsasser SJ. Quantitative Multiplexed ChIP Reveals Global Alterations that Shape Promoter Bivalency in Ground State Embryonic Stem Cells. Cell Rep. 2019 Sep 17;28(12):3274-3284.e5. doi: 10.1016/j.celrep.2019.08.046. PMID: 31533047; PMCID: PMC6859498.
4. van Galen P, Viny AD, Ram O, Ryan RJ, Cotton MJ, Donohue L, Sievers C, Drier Y, Liau BB, Gillespie SM, Carroll KM, Cross MB, Levine RL, Bernstein BE. A Multiplexed System for Quantitative Comparisons of Chromatin Landscapes. Mol Cell. 2016 Jan 7;61(1):170-80. doi: 10.1016/j.molcel.2015.11.003. Epub 2015 Dec 10. PMID: 26687680; PMCID: PMC4707994.
Claims
1. A partly double-stranded adaptor comprising or consisting of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - 5’
wherein
a) A is the top strand of a DNA amplification sequence, and A consists of 5’ - A1 - A2 - 3’;
b) A’ consists of 3’ - A1’ - A2’ - 5’, and
c) A1’ is a sequence of nucleotides, which is substantially non-complementary to A1, wherein the 3’ end is exonuclease resistant and/or contains primer extension blocking groups;
d) A2 and A2’ are either not present or A2 and A2’ are sequences of nucleotides substantially complementary to each other;
e) B and B’ are sequences of in the range of 5 to 100 deoxyribonucleotides, which are substantially complementary to each other; and
f) C and C’ are sequences, which are substantially complementary to each other, wherein:
C’ or -B’-C’- comprises a nicking endonuclease recognition site or a nicking endonuclease pre-recognition site, wherein said pre-nicking site can be converted into a nicking endonuclease recognition site by the action of a DNA glycosylase, with the proviso that the nicking endonuclease recognition site is not a ribonucleotide positioned at the 3’ end of C’.
2. The adaptor according to claim 1, wherein the pre-nicking site comprises or consists of a deoxy-Uridine (dU).
3. The adaptor according to claim 1, wherein the nicking endonuclease recognition site is: i) a nicking site recognised by a restriction enzyme with nicking activity; ii) a nicking site recognised by Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, Nb.BssSI or Nt.BsmAI or variants of any of the aforementioned, optionally thermostable variants thereof; or iii) GCTCTTC, CCD, GCAGTG, GGATC, CCTCAGC, GAATGC, CACGAG or GTCTC.
4. The adaptor according to claim 1, wherein the nicking endonuclease recognition site is: i) a CRISPR nicking endonuclease recognition site; ii) a CRISPR Cas9 nicking site; iii) a CRISPR D10A nicking site comprising a PAM sequence (NGG) downstream of the site of the nick; iv) an SP6 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 73 or a sequence sharing at least 95% sequence identity therewith; v) a T3 RNA polymerase recognition site, preferably a sequence of SEQ ID NO: 74 or a sequence sharing at least 95% sequence identity therewith; or vi) a Cyanophage Syn5 polymerase recognition site, preferably a sequence of SEQ ID NO: 75 or a sequence sharing at least 95% sequence identity therewith.
5. The adaptor according to any one of claims 1 to 4, wherein said adaptor comprises or consists of an oligonucleotide of the general structure:
5’ - A - B - C - 3’
3’ - A’ - B’ - C’ - P - 5’
wherein a) A, B, C, A’, B’ and C’ are as defined in any one of claims 1; and b) P is a 5’ phosphate.
6. The adaptor according to any one of claims 1 to 5, wherein:
(A) A consists of in the range of: i) 10 to 100 nucleotides; ii) 15 to 50 nucleotides; or iii) 15 to 40 nucleotides; and/or
(B) A’ consists of in the range of: i) 2 to 100 nucleotides; ii) 2 to 35 nucleotides; or iii) 2 to 10 nucleotides.
7. The adaptor according to any one of claims 1 to 6, wherein the DNA amplification sequence is a promoter sequence of an RNA polymerase or it comprises a primer binding site.
8. The adaptor according to any one of claims 1 to 7, wherein A, when bound to its complementary sequence as a double-stranded DNA, is
(i) recognized by a polymerase selected from an RNA polymerase, T7 RNA polymerase, SP6 RNA polymerase, Bacteriophage T3 RNA polymerase, or Cyanophage Syn5 polymerase; or
(ii) contains a sequence complementary to a primer binding site.
9. The adaptor according to any one of claims 1 to 8, wherein A1’: i) comprises or consists of a sequence of nucleotides connected through exonuclease-resistant phosphorothioate linkages; ii) is non-complementary to A1; iii) comprises or consists of a sequence of 3 to 35 consecutive nucleotides connected through exonuclease-resistant phosphorothioate linkages; and/or iv) comprises or consists of a sequence of 3 to 35 consecutive cytosines connected through exonuclease-resistant phosphorothioate linkages.
10. The adaptor according to any one of claims 1 to 9, wherein the 3’ end of A1’: i) contains a nucleotide that has been modified to block primer extension; ii) comprises a dideoxynucleotide; and/or iii) comprises a phosphoramidite C3 spacer.
11. The adaptor according to any one of claims 1 to 10, wherein A1’ comprises: i) one or more nucleotide analogues or modifications which are exonuclease-resistant, selected from the group consisting of phosphorothioate linkages, phosphoramidite C3 spacer, inverted deoxythymidine bases, 2'-O-methyl and 2'-O-methoxyethyl nucleosides; and/or ii) a sequence that prevents RNA polymerase engagement and function, for example, a sequence that supports formation of a hairpin, loop or other secondary structure.
12. The adaptor according to any one of claims 1 to 11, wherein -B-C- and/or -B’-C’- contains: i) a primer binding site; ii) a partial or full-length SBS3 primer binding site; iii) a randomized unique molecular identifier consisting of in the range of 5 to 15 nucleotides; and/or iv) a barcode sequence consisting of in the range of 5 to 15 nucleotides.
13. The adaptor according to any one of claims 1 to 12, wherein: i) -B-C- and/or -B’-C’- in addition contains a random sequence of in the range of 5 to 15 nucleotides; and/or ii) -B’-C’- comprises the nicking endonuclease recognition site.
14. The adaptor according to any one of claims 1 to 13, wherein at least 80% of the nucleotides of C can form Watson-Crick base-pairs with nucleotides of C’.
15. The adaptor according to any one of claims 1 to 14, wherein C’: i) is 2 or more nucleotides in length; ii) is between 2 and 25 nucleotides in length; iii) is between 3 and 12 nucleotides in length; and/or iv) comprises one deoxy-Uridine positioned in C’ in a position selected from the group of position 5, 6, 7, 8, 9, 10, 11 and 12.
16. The adaptor according to any one of claims 1 to 15, wherein:
(i) all nucleotides of A, A2’, B and B’ are deoxyribonucleotides; or
(ii) said adaptor does not comprise any ribonucleotides.
17. The adaptor according to any one of claims 1 to 16, wherein the adaptor: i) further comprises a tag, optionally wherein the tag is selected from the group of Biotin-, Azide-, Alkyne-, Tetrazine-, Bicyclononyne-, Cyclopropene-, Trans-cyclooctene-, Norbornene-, Dibenzocyclooctyne- and Ketone-tags; and/or ii) comprises an NHS-based handle.
18. A method of attaching adaptor(s) to DNA fragment(s), said method comprising:
a) providing at least one adaptor according to any one of claims 1 to 17;
b) providing a sample containing DNA fragments;
c) attaching the adaptor to the DNA fragments in the sample; and
d) i) if C’ comprises a nicking endonuclease pre-recognition site, incubating the sample with a DNA glycosylase recognising said pre-nicking site and with a nicking endonuclease recognising the nicking endonuclease recognition site generated by the DNA glycosylase either sequentially, partly simultaneously or simultaneously under conditions allowing for activity of said enzymes; or
ii) if C’ or -B’-C’- comprises a nicking endonuclease recognition site, incubating the sample with an endonuclease recognising said nicking endonuclease recognition site under conditions allowing for activity of said enzyme;
e) incubating the sample at a temperature that is higher than the Tm of i) and ii), wherein i) and ii) are as follows:
i) 5’ - C - 3’
3’ - C’ - 5’,
ii) 5’ - C - C’ - 3’
3’ - C - C - 5’;
f) incubating the sample with a strand-displacing DNA polymerase.
19. The method according to claim 18, wherein the steps e) and f) are performed simultaneously.
20. The method according to claim 18 or 19, wherein
(i) the sample is incubated with the strand-displacing DNA polymerase at a temperature that is higher than the Tm of i) and ii);
(ii) the method further comprises a step of incubation at a temperature in the range of 0°C to 4°C, wherein said step is performed immediately after step e); and/or
(iii) step f) is performed at a temperature in the range of 20 to 80°C, 25 to 75°C, 20 to 50°C, or 25 to 37°C.
21. The method according to any one of claims 18 to 20, wherein the DNA glycosylase: i) is a monofunctional glycosylase; ii) is a bifunctional glycosylase; or iii) is a Uracil DNA glycosylase of any one of SEQ ID NO: 1-23 or a functional homologue thereof sharing at least 70% sequence identity with any one of the Uracil DNA glycosylases of SEQ ID NO: 1 - 23; and/or iv) the DNA glycosylase and the nicking endonuclease are combined in one enzyme, optionally wherein the nicking endonuclease is DNA glycosylase-lyase Endonuclease VIII, DNA glycosylase-lyase Endonuclease III or DNA glycosylase-lyase Endonuclease IV of any one of SEQ ID NO: 31 to 41 or a functional homologue thereof sharing at least 70% sequence identity with any one of the glycosylase-lyase endonucleases of SEQ ID NO: 31 to 41.
22. The method according to any one of claims 18 to 21 , wherein step d) comprises incubating the sample with: i) Uracil DNA glycosylase (UDG) and with DNA glycosylase-lyase Endonuclease VIII; ii) Antarctic Thermolabile Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease III; or iii) Afu Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease IV.
23. The method according to any one of claims 18 to 22, wherein the DNA glycosylase is a Uracil-DNA glycosylase, optionally wherein the Uracil-DNA glycosylase is Antarctic Thermolabile Uracil DNA glycosylase (UDG) or Afu Uracil DNA glycosylase (UDG).
24. The method according to any one of claims 18 to 23, wherein the DNA glycosylase generates an AP site.
25. The method according to any one of claims 18 to 24, wherein the nicking endonuclease is: i) a nicking endonuclease recognising an AP site, preferably an apyrimidinic site; and/or ii) endonuclease VIII, Endonuclease III or Endonuclease IV.
26. The method according to any one of claims 18 to 25, wherein the strand displacing DNA polymerase is: i) a Bst polymerase, DNA Polymerase I, Large (Klenow) Fragment or DNA Polymerase from Thermococcus litoralis; ii) a phi29 polymerase; iii) a Polymerase with 5'→3' exonuclease activity; or iv) Taq Polymerase.
27. The method according to any one of claims 18 to 26, wherein the DNA fragments: i) consist of or comprise genomic DNA; ii) are protein-bound DNA fragments; iii) are naked genomic DNA; iv) are cell-free DNA; v) are naked cell-free DNA; vi) are cell-free DNA fragments obtained from blood, blood plasma, urine, or ascites fluid; vii) comprise nucleosomes; viii) comprise genomic DNA fragments bound to chromatin proteins; and/or ix) comprise genomic DNA bound to transcription factors.
28. The method according to claim 27, wherein genomic DNA is eukaryotic or prokaryotic.
29. The method according to any one of claims 18 to 28, wherein the DNA fragments are: i) selected from the group consisting of cDNA, DNA produced by whole genome amplification, primer extension products comprising at least one double-stranded terminus, and a PCR amplicon; ii) have been prepared by mechanical shearing and/or enzymatic digestions; iii) obtained by lysing cells from a cell culture or from mammalian material and fragmenting the chromatin from the lysed cells; iv) obtained by isolating chromatin from a cellular sample and fragmenting said chromatin; v) obtained by isolating, partly isolating and/or fragmenting DNA from cultured cells, cultured cell lysate, cell culture supernatant, and/or mammalian material; and/or vi) obtained by isolating, partly isolating and/or fragmenting DNA from mammalian material, optionally wherein said mammalian material is tissue or biopsies collected from a mammal and/or body fluids.
30. The method according to any of the claims 18 to 29, wherein the sample: i) comprises purified DNA, purified nucleosomes, purified chromatin, cell lysate; and/or ii) comprises or consists of plasma, blood, serum, urine, spinal fluid, saliva, lymph fluid, lacrimal fluid or seminal fluid.
31. The method according to any one of claims 18 to 30, wherein attaching of step c) is done by ligation, preferably wherein said ligation is performed by incubation with a ligase, for example a T4 DNA ligase.
32. The method according to any of the claims 18 to 31, wherein the method further comprises a step of preparing said DNA fragments for ligation.
33. The method according to any one of claims 18 to 32, wherein the adaptor contains a sample specific barcode, and wherein the DNA fragments obtained from said sample are used.
34. The method according to any one of claims 18 to 33, wherein: i) step d) is performed at a temperature in the range of 20°C to 80°C; and/or ii) step e) is performed at a temperature in the range of 40°C to 80°C.
35. A method of amplification of DNA fragments, said method comprising the steps of: a) preparing DNA fragments attached to adaptors by the method according to any one of claims 18 to 34, and b) amplifying said DNA fragments attached to adaptors in vitro.
36. The method according to claim 35, wherein the amplification is performed by RNA polymerase-driven transcription.
37. The method according to claim 36, wherein the RNA polymerase is T7 RNA polymerase, Bacteriophage T3 RNA polymerase or Cyanophage Syn5 polymerase, optionally wherein the T7 RNA polymerase is a polymerase of SEQ ID NO: 78 or a functional homologue thereof sharing at least 70% sequence identity with SEQ ID NO: 78.
38. The method according to any one of claims 35 to 37, wherein A of said adaptor contains a sequence complementary to a primer binding site, and wherein a primer capable of binding to said primer binding site is used for amplification.
39. The method according to any one of claims 35 to 38, wherein two adaptors are provided, each of said adaptors containing a sequence complementary to a primer binding site, wherein the primer binding site of the two adaptors are distinct, and wherein two distinct primers, each capable of binding to one of the two primer binding sites, are used for amplification.
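Several of the claims above place checkable constraints on the C/C’ region: claim 3 lists recognition sequences, claim 14 requires at least 80% Watson-Crick pairing between C and C’, and claim 18 step e) refers to an incubation temperature above a duplex Tm. The following is a minimal sketch of how a candidate region might be screened against these constraints; it is an illustration, not the applicant's method. The function names are hypothetical, dU is written as `U` and assumed to pair with A, pairing is assessed position by position without gaps, and Tm is approximated with the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough rule of thumb for short oligonucleotides that is swapped in here for simplicity and is not necessarily the Tm definition intended by the claims.

```python
import re

# Recognition sequences copied from claim 3 iii); "D" is the IUPAC
# code for A, G or T.
SITES = ("GCTCTTC", "CCD", "GCAGTG", "GGATC",
         "CCTCAGC", "GAATGC", "CACGAG", "GTCTC")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}

# Allowed partners per base; "U" stands for dU (assumption: pairs like T).
PAIRS = {"A": "TU", "T": "A", "U": "A", "G": "C", "C": "G"}


def watson_crick_fraction(c_top, c_bottom):
    """Fraction of positions in C that pair with the facing base of C'.

    c_top is written 5'->3' and c_bottom 3'->5', so index i faces index i.
    """
    if not c_top or len(c_top) != len(c_bottom):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(b in PAIRS.get(t, "")
                 for t, b in zip(c_top.upper(), c_bottom.upper()))
    return paired / len(c_top)


def wallace_tm(strand):
    """Crude Tm estimate in deg C using the Wallace rule."""
    s = strand.upper()
    at = sum(s.count(b) for b in "ATU")
    gc = sum(s.count(b) for b in "GC")
    return 2 * at + 4 * gc


def find_sites(strand_5to3):
    """Return sorted (0-based start, site) hits on a strand written 5'->3'.

    Only plain A/C/G/T bases are matched; a dU written as "U" never matches.
    """
    s = strand_5to3.upper()
    hits = []
    for site in SITES:
        # Lookahead so that overlapping occurrences are all reported.
        pattern = "(?=" + "".join(IUPAC[b] for b in site) + ")"
        hits.extend((m.start(), site) for m in re.finditer(pattern, s))
    return sorted(hits)
```

For a hypothetical C of `GCTCTTCAGT` (5’ to 3’) facing a C’ of `CGAGAAGTCA` (3’ to 5’), the pairing fraction is 1.0, the Wallace estimate is 30 °C, and `find_sites` reports `GCTCTTC` at position 0 of C. To scan C’ itself, it would first have to be rewritten 5’ to 3’, since recognition sequences are conventionally given in that orientation.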
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23212573 | 2023-11-28 | | |
| EP23212573.2 | 2023-11-28 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025114477A1 (en) | 2025-06-05 |
Family
ID=88978349
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2024/083987 Pending WO2025114477A1 (en) | 2023-11-28 | 2024-11-28 | Adaptors for ligation |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025114477A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016049492A1 (en) * | 2014-09-26 | 2016-03-31 | The Usa, As Represented By The Secretary, Dept. Of Health And Human Services | Virus-based expression vectors and uses thereof |
| US20160130576A1 (en) * | 2011-10-19 | 2016-05-12 | Nugen Technologies, Inc. | Compositions and methods for directional nucleic acid amplification and sequencing |
| US20170088887A1 (en) * | 2014-03-03 | 2017-03-30 | Swift Biosciences, Inc. | Enhanced Adaptor Ligation |
| US20200115717A1 (en) * | 2014-06-17 | 2020-04-16 | Crown Laboratories, Inc. | Genetically modified bacteria and methods for genetic modification of bacteria |
| US20200199577A1 (en) * | 2018-12-19 | 2020-06-25 | New England Biolabs, Inc. | Target Enrichment |
| WO2021042047A1 (en) * | 2019-08-30 | 2021-03-04 | The General Hospital Corporation | C-to-g transversion dna base editors |
2024
- 2024-11-28 WO PCT/EP2024/083987 patent/WO2025114477A1/en active Pending
Non-Patent Citations (8)
| Title |
|---|
| BRESLAUER ET AL., PROC. NATL. ACAD. SCI., vol. 83, 1986, pages 3746 - 50 |
| BRESLAUER KJ, FRANK R, BLOCKER H, MARKY LA: "Predicting DNA duplex stability from the base sequence", PROC NATL ACAD SCI U S A., vol. 83, no. 11, June 1986 (1986-06-01), pages 3746 - 50, XP002034050, DOI: 10.1073/pnas.83.11.3746 |
| KUMAR B, ELSASSER SJ: "Quantitative Multiplexed ChIP Reveals Global Alterations that Shape Promoter Bivalency in Ground State Embryonic Stem Cells", CELL REP., vol. 28, no. 12, 17 September 2019 (2019-09-17), pages 3274 - 3284, XP055801885, DOI: 10.1016/j.celrep.2019.08.046 |
| NEEDLEMAN, WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453 |
| PETER VAN GALEN ET AL., MOLECULAR CELL, 2016 |
| RICE ET AL., TRENDS GENET., vol. 16, 2000, pages 276 - 277 |
| RICE P, LONGDEN I, BLEASBY A: "EMBOSS: the European Molecular Biology Open Software Suite", TRENDS GENET., vol. 16, no. 6, June 2000 (2000-06-01), pages 276 - 7, XP004200114, DOI: 10.1016/S0168-9525(00)02024-2 |
| VAN GALEN P, VINY AD, RAM O, RYAN RJ, COTTON MJ, DONOHUE L, SIEVERS C, DRIER Y, LIAU BB, GILLESPIE SM: "A Multiplexed System for Quantitative Comparisons of Chromatin Landscapes", MOL CELL, vol. 61, no. 1, 10 December 2015 (2015-12-10), pages 170 - 80, XP029381656, DOI: 10.1016/j.molcel.2015.11.003 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| AU2022200686B2 (en) | | Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using CRISPR/Cas system proteins |
| US10876108B2 (en) | | Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation |
| AU2019274949B2 (en) | | Method |
| US8609341B2 (en) | | Uniform fragmentation of DNA using binding proteins |
| CN114250274A (en) | | Amplification of primers with limited nucleotide composition |
| US20240318244A1 (en) | | Click-chemistry based barcoding |
| JP4601830B2 (en) | | Coupled polymerase chain reaction-restriction endonuclease digestion-ligase detection reaction method |
| WO2025114477A1 (en) | | Adaptors for ligation |
| WO2023227699A1 (en) | | Adaptor ligation |
| HK40064470A (en) | | Amplification with primers of limited nucleotide composition |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24816759; Country of ref document: EP; Kind code of ref document: A1 |