WO2024020410A1 - Systems and methods for dual-end sequencing - Google Patents
Systems and methods for dual-end sequencing
- Publication number
- WO2024020410A1 (PCT/US2023/070442)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- adaptor
- nucleic acid
- sequence
- segment
- strand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
- C12Q1/6869—Methods for sequencing
Definitions
- a method of preparing a double-stranded sample nucleic acid for sequencing wherein the double-stranded sample nucleic acid comprises a forward sample strand and a reverse sample strand, the method comprising: (a) contacting the sample nucleic acid with one or more first adaptors to form a sample-adaptor complex, wherein the one or more first adaptors prevent ligation between the 5’ end of the forward sample strand and the 5’ end of the reverse sample strand; (b) extending with a polymerase to form a forward double-stranded sample-adaptor complex comprising the forward strand, the first adaptor, and a sequence complementary to the forward strand and a reverse double-stranded sample-adaptor complex comprising the reverse sample strand, the first adaptor and a sequence complementary to the reverse sample strand; and (c) ligating one or more second adaptors to the forward double-stranded sample-adaptor complex and the reverse double-stranded sample-
- the forward double-stranded sample-adaptor complex and the reverse double-stranded sample-adaptor complex produced in step (b) comprise a blunt end and a looped end.
- the first adaptor comprises a first primer hybridization site.
- the second adaptor comprises a second primer hybridization site.
- the first primer hybridization site and the second primer hybridization site are different.
- the first adaptor and the second adaptor are different.
- the first adaptor comprises: a first nucleic acid strand comprising: a 3’ segment of the first nucleic acid strand comprising a self-complementary stem and a loop, and a 5’ segment of the first nucleic acid strand comprising a sequence that is not self-complementary; and a second nucleic acid strand comprising: a 3’ segment of the second nucleic acid strand comprising a sequence complementary to a portion of the 5’ segment of the first nucleic acid strand and a 5’ segment that blocks ligation of the 3’ end of the first nucleic acid strand to the 5’ end of the second nucleic acid strand, thereby generating a non-contiguous segment between the 5’ end of the second nucleic acid strand and the 3’ end of the first nucleic acid strand, wherein the 5’ segment comprises a flap or a blocking group.
- the first adaptor prevents ligation by a gap, a flap, or a blocking group.
- the second adaptor comprises a hairpin.
- the method further comprises contacting the circular forward-adaptor complex or the circular reverse-adaptor complex with a methyltransferase enzyme.
- a method comprising: (a) providing a double-stranded nucleic acid molecule comprising a first strand and a second strand hybridized to the first strand; (b) ligating adaptors to ends of the double-stranded nucleic acid molecule to yield a circular nucleic acid molecule; and (c) using the circular nucleic acid molecule as a template to generate a single-stranded nucleic acid molecule comprising copies of the first strand and the second strand.
- the first adaptor is ligated to a first end of the double-stranded nucleic acid molecule before a second adaptor is ligated to a second end of the double-stranded nucleic acid molecule.
- the double-stranded nucleic acid molecule is immobilized when the first adaptor is ligated to the first end of the double-stranded nucleic acid molecule.
- a first adaptor and a second adaptor are simultaneously ligated.
- the method further comprises selecting a circular nucleic acid molecule wherein the first adaptor and the second adaptor are different.
- step (b) comprises: ligating Y-adaptors to ends of the double-stranded nucleic acid molecule to produce an adaptor-nucleic acid complex; amplifying the adaptor-nucleic acid complex to produce a nucleic acid complex with a first adaptor sequence at a first end and a second adaptor sequence at a second end; and ligating the 5’ end of the first adaptor sequence with the 3’ end of the first adaptor sequence and ligating the 5’ end of the second adaptor sequence with the 3’ end of the second adaptor sequence to produce a circular nucleic acid molecule.
- the first adaptor comprises a first primer hybridization site.
- the second adaptor comprises a second primer hybridization site.
- the first primer hybridization site and the second primer hybridization site are different.
- the method further comprises contacting the single-stranded nucleic acid molecule with a methyltransferase enzyme.
- a method of paired end clonal amplification comprising: (a) contacting a sample double-stranded DNA molecule comprising a template sequence with a first adaptor comprising a first adaptor sequence and a second adaptor comprising a second adaptor sequence to produce a circular DNA molecule, wherein the first adaptor sequence comprises a first primer hybridization sequence and wherein the second adaptor sequence comprises a second primer hybridization sequence; (b) contacting the circular DNA molecule with a polymerizing enzyme to produce a single-stranded DNA molecule via rolling circle amplification, wherein the single-stranded DNA molecule comprises a sequence comprising at least the first adaptor sequence, the template sequence, the second adaptor sequence, and a sequence complementary to the template sequence; and (c) subjecting the single-stranded DNA molecule to clonal amplification from at least (i) the first primer hybridization sequence and (ii) the second primer hybridization sequence.
- subjecting the single stranded DNA molecule to clonal amplification from (i) the first primer hybridization sequence comprises contacting the ssDNA with a blocking molecule that prevents extension of a nascent sequence beyond the second adaptor; wherein the nascent sequence is complementary to the template sequence or the sequence complementary to the template sequence.
- subjecting the single stranded DNA molecule to clonal amplification from (ii) the second primer hybridization sequence comprises contacting the ssDNA with a blocking molecule that prevents extension of a nascent sequence beyond the first adaptor; wherein the nascent sequence is complementary to the template sequence or the sequence complementary to the template sequence.
- the blocking molecule comprises an oligonucleotide.
- the oligonucleotide comprises a locked nucleic acid (LNA), a psoralen modified nucleic acid, a MGB modified nucleic acid, or a G-quadruplex oligo.
- the blocking molecule comprises a peptide.
- the peptide comprises a sequence-specific DNA binding protein.
- the sequence-specific DNA binding protein is a Cas protein or a Tus protein.
- the clonal amplification in (c)(i) is performed before the clonal amplification in (c)(ii).
- the clonal amplification in (c)(ii) is performed before the clonal amplification in (c)(i).
- the single-stranded DNA molecule further comprises a second copy of the first adaptor sequence, a second copy of the sequence of interest, a second copy of the second adaptor sequence, and a second sequence complementary to the sequence of interest.
- step (a) occurs in solution.
- step (b) occurs in solution.
- step (c) occurs in a solid state.
- the first adaptor is ligated to the double-stranded DNA molecule before the second adaptor is ligated to the double-stranded DNA molecule.
- the first adaptor comprises: a first nucleic acid strand comprising: a 3’ segment of the first nucleic acid strand comprising a self-complementary stem and a loop, and a 5’ segment of the first nucleic acid strand comprising a sequence that is not self-complementary; and a second nucleic acid strand comprising: a 3’ segment of the second nucleic acid strand comprising a sequence complementary to a portion of the 5’ segment of the first nucleic acid strand and a 5’ segment that blocks ligation of the 3’ end of the first nucleic acid strand to the 5’ end of the second nucleic acid strand, thereby generating a non-contiguous segment between the 5’ end of the second nucleic acid strand and the 3’ end of the first nucleic acid strand, wherein the 5’ segment comprises a flap or a blocking group.
- the circular DNA molecule comprises either the forward strand of the sample double-stranded DNA molecule or the reverse strand of the sample double-stranded DNA molecule, but not both.
- the first adaptor and the second adaptor are ligated to the double-stranded DNA molecule simultaneously.
- the circular DNA molecule comprises the forward strand of the sample double-stranded DNA molecule and the reverse strand of the sample double-stranded DNA molecule.
- an adaptor comprising: a first nucleic acid strand comprising: a 3’ segment of the first nucleic acid strand comprising a self-complementary stem and a loop, and a 5’ segment of the first nucleic acid strand comprising a sequence that is not self-complementary; and a second nucleic acid strand comprising: a 3’ segment of the second nucleic acid strand comprising a sequence complementary to a portion of the 5’ segment of the first nucleic acid strand and a 5’ segment that blocks ligation of the 3’ end of the first nucleic acid strand to the 5’ end of the second nucleic acid strand, thereby generating a non-contiguous segment between the 5’ end of the second nucleic acid strand and the 3’ end of the first nucleic acid strand, wherein the 5’ segment comprises a flap or a blocking group.
- the adaptor comprises DNA. In some embodiments, the adaptor comprises a first sequencing site. In some embodiments, the 5’ segment of the second nucleic acid comprises a flap of greater than or equal to 1 nucleotide. In some embodiments, the 5’ segment of the second nucleic acid comprises a blocking group. In some embodiments, the blocking group is an oligonucleotide or a sequence for binding a sequence-specific DNA binding protein. In some embodiments, the oligonucleotide comprises a locked nucleic acid (LNA), a psoralen modified nucleic acid, a MGB modified nucleic acid, or a G-quadruplex oligo. In some embodiments, the sequence-specific DNA binding protein comprises a Cas protein or a Tus protein.
- the sequence-specific DNA binding protein comprises a Cas protein or a Tus protein.
- a DNA molecule comprising: a first DNA strand comprising: a first segment comprising a first sequence; wherein the first segment is ligated to a first hairpin segment comprising a first hairpin sequence at the 3’ end of the first hairpin segment; a second segment comprising a second sequence that is complementary to the first sequence; wherein the second segment is ligated to the first hairpin segment at the 5’ end of the second segment and wherein the second segment is ligated to a second hairpin segment comprising a second hairpin sequence at the 3’ end of the second segment; a third segment comprising the first sequence, wherein the third segment is ligated to the second hairpin sequence at the 5’ end of the third segment and wherein the third segment is ligated to a third hairpin segment comprising the first hairpin sequence at the 3’ end; a fourth segment comprising the second sequence; wherein the fourth segment is ligated to the third hairpin segment at the 5’ end of the fourth segment; and a
- FIG. 1 depicts a sample DNA ligated to 2 different adaptors.
- FIG. 2 depicts a single-stranded circular DNA formed by ligating two adaptors to a double-stranded DNA.
- FIG. 3 depicts a single-stranded concatemer produced by rolling circle amplification.
- FIG. 4 depicts the process of rolling circle amplification while a methyltransferase is present.
- FIG. 5 depicts a methylated concatemer.
- FIG. 6 depicts sequencing a methylated concatemer.
- FIG. 7 depicts sequencing a methylated concatemer after bisulfite conversion.
- FIGS. 8A-8B depict a method of sequencing using asymmetric adaptors.
- FIG. 9 depicts examples of asymmetric adaptors.
- FIG. 10 depicts asymmetric adaptors.
- FIGS. 11A-11D depict the process of generating asymmetric sequencing libraries.
- FIG. 12A depicts the structure of a molecule in an asymmetric library.
- FIG. 12B depicts the amplification of an asymmetric library.
- FIG. 13 depicts paired end sequencing of the first strand from adapter A.
- FIG. 14 depicts paired end sequencing of the second strand from adapter B.
- FIG. 15 depicts sequencing of the first and second strands.
- FIG. 16 depicts an overlap of the location of sequencing reads from the first strand and second strand.
- FIG. 17 depicts base calls from the first and second strand.
- the methods and compositions described herein provide benefits over other sequencing methods.
- the methods described herein allow for error correction on the sequencing by sequencing both the Watson and Crick strands.
- the methods described herein also eliminate the need for unique molecular identifiers (UMIs) and randomly sheared DNA while increasing the effective number of reads per flow cell, as PCR-free methods can be used for high-quality base calls.
- the methods further eliminate the need for circularization during library preparation of concatemers (CATs).
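As a rough illustration of how sequencing both the Watson and Crick strands can support error correction, the Python sketch below compares a read from one strand with the reverse complement of a read from the other and keeps only the bases on which the two agree. The function names and the simple agreement rule are assumptions made for illustration; the disclosure does not specify a particular consensus algorithm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(watson_read: str, crick_read: str) -> str:
    """Keep bases on which both strands agree; mark disagreements as 'N'.

    Hypothetical agreement rule, used here for illustration only.
    """
    crick_as_watson = reverse_complement(crick_read)
    if len(watson_read) != len(crick_as_watson):
        raise ValueError("reads must cover the same region")
    return "".join(
        w if w == c else "N" for w, c in zip(watson_read, crick_as_watson)
    )


# The Watson read carries one miscall (index 4); the Crick read does not.
consensus = duplex_consensus("ACGTTC", "GTACGT")
```

A position that is miscalled on only one strand is flagged rather than reported, which is the basic reason two-strand coverage can reduce the error rate.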
- the methods described herein may involve preparation of a library for sequencing.
- the methods describe ligating two separate adaptors to a double-stranded linear nucleic acid molecule to produce a single-stranded circular nucleic acid molecule.
- the single-stranded circular nucleic acid molecule is amplified to create a single-stranded concatemer containing both the adaptor sequences and both the forward and reverse sequences of the double-stranded linear nucleic acid molecule.
- a double-stranded sample nucleic acid is prepared for sequencing by ligating a first adaptor described herein and a second adaptor described herein to the double-stranded nucleic acid.
- two adaptors are ligated to a double-stranded nucleic acid molecule to produce a single-stranded circular nucleic acid molecule.
- the single-stranded circular nucleic acid molecule is amplified using rolling circle amplification.
- the single-stranded circular nucleic acid molecule is used as a template to create a concatemer for use in sequencing.
- the single-stranded circular nucleic acid molecule is used as a template to create a single-stranded nucleic acid molecule comprising copies of the first strand of the double-stranded nucleic acid molecule and copies of the second strand of the double-stranded nucleic acid molecule.
- the adaptors may comprise loops.
- the adaptors may comprise hairpins.
- the adaptors may comprise stem loops.
- the adaptors may comprise primer hybridization sites.
- the first adaptor may comprise a first primer hybridization site and the second adaptor may comprise a second primer hybridization site.
- the first adaptor and the second adaptor may comprise different sequences.
- adaptors are ligated to a linear double-stranded nucleic acid molecule to produce a single-stranded circular nucleic acid molecule.
- an adaptor is ligated to a first end of a double-stranded nucleic acid molecule.
- a second adaptor is ligated to a second end of a double-stranded nucleic acid molecule.
- a first adaptor comprising a hairpin and a second adaptor comprising a hairpin are ligated to a double stranded linear nucleic acid molecule, as depicted in Fig. 1.
- Ligating the two adaptors to the double-stranded linear nucleic acid molecules may produce a single-stranded circular nucleic acid molecule, such as depicted in Fig. 2.
- the single-stranded circular nucleic acid molecule may comprise both the sequence of the first adaptor, the second adaptor, the forward strand of the double-stranded linear nucleic acid molecule and the reverse strand of the double-stranded linear nucleic acid molecule.
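The topology described above can be sketched in Python: when a hairpin adaptor closes each end of a linear duplex, the two sample strands and the two adaptors form one continuous strand. The sketch is a minimal model under stated assumptions: the sequences are toy placeholders, the adaptors are treated as plain strings without modelling their stems, and the circle is written as a linear string whose last base is understood to join its first.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def circularize(forward: str, first_adaptor: str, second_adaptor: str) -> str:
    """Return the circle read 5'->3', starting at the forward strand.

    The first adaptor joins the 3' end of the forward strand to the 5' end
    of the reverse strand; the second adaptor joins the 3' end of the
    reverse strand back to the 5' end of the forward strand.
    """
    reverse = reverse_complement(forward)
    return forward + first_adaptor + reverse + second_adaptor


# Toy sequences, hypothetical and far shorter than real inserts or adaptors.
circle = circularize("ATGGCCTTA", "AAAA", "CCCC")
```

The resulting circle has length twice the insert plus the two adaptors, and it carries both the forward and the reverse sample sequence on a single strand.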
- the first adaptor may be ligated to the first end of the double-stranded nucleic acid molecule before the second adaptor is ligated to the second end of the double-stranded nucleic acid molecule.
- the nucleic acid may be immobilized on a solid support.
- the solid support may be a bead.
- a first adaptor may be ligated.
- the nucleic acid may then be cut or eluted from the bead and the second adaptor may be ligated.
- two single ligation steps with two different adaptors may occur, followed by enrichment of inserts with two different adaptors.
- the two different adaptors may be combined at a 1 : 1 ratio. Ligation may occur simultaneously.
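A short calculation shows why simultaneous ligation may be followed by selection of inserts that carry two different adaptors. The Python sketch below assumes that each end of an insert receives an adaptor independently and that both adaptors ligate with equal efficiency; neither assumption is stated in the source, and the function name is hypothetical.

```python
def adaptor_pair_fractions(fraction_first: float) -> dict:
    """Expected fractions of inserts by adaptor pairing.

    fraction_first is the share of the first adaptor in the ligation mix.
    Assumes independent ligation at each end with equal efficiency.
    """
    if not 0.0 <= fraction_first <= 1.0:
        raise ValueError("fraction_first must be between 0 and 1")
    p = fraction_first
    q = 1.0 - p
    return {
        "first/first": p * p,
        "second/second": q * q,
        "first/second": 2.0 * p * q,  # either orientation
    }


# A 1:1 mix corresponds to fraction_first = 0.5.
fractions = adaptor_pair_fractions(0.5)
```

Under these assumptions a 1:1 mix gives the highest share of mixed-adaptor inserts, and about half of all ligated inserts still carry two copies of the same adaptor.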
- a Y-shaped adaptor may be ligated. Following ligation of the Y-shaped adaptors to each end of the nucleic acid, a round of PCR amplification will produce a nucleic acid sequence with a different adaptor on each end.
- the method comprises: providing a double-stranded nucleic acid molecule comprising a first strand and a second strand hybridized to the first strand; ligating adaptors to ends of the double-stranded nucleic acid molecule to yield a circular nucleic acid molecule; and using the circular nucleic acid molecule as a template to generate a single-stranded nucleic acid molecule comprising copies of the first strand and the second strand.
- the first adaptor and the second adaptor are different.
- the methods comprise preparing two single-stranded circular molecules from one double-stranded linear molecule.
- the sample nucleic acid may be contacted with one or more first adaptors to form a sample-adaptor complex.
- the one or more first adaptors may be an adaptor described herein.
- the one or more first adaptors may prevent ligation between the 5’ end of the forward sample strand and the 5’ end of the reverse sample strand.
- the adaptor may block ligation via a gap, flap, or a blocking group as described herein.
- the sample-adaptor complex is contacted with a polymerase to form a forward double-stranded sample-adaptor complex comprising the forward strand, the first adaptor, and a sequence complementary to the forward strand.
- the sample-adaptor complex is contacted with a polymerase to form a reverse double-stranded sample-adaptor complex comprising the reverse sample strand, the first adaptor and a sequence complementary to the reverse sample strand.
- the polymerase uses the forward strand of the sample to extend the 5’ end of the adaptor.
- the polymerase uses the reverse strand of the sample to extend the 5’ end of the adaptor.
- both a forward double-stranded sample-adaptor complex and a reverse double-stranded sample-adaptor complex are created.
- the forward double-stranded sample-adaptor complex comprises a double-stranded DNA sequence comprising one blunt end and one looped end.
- the blunt end may comprise a 5’ end and a 3’ end of the sample nucleic acid.
- the looped end may comprise the first adaptor.
- the reverse double-stranded sample-adaptor complex comprises a double-stranded DNA sequence comprising one blunt end and one looped end.
- the blunt end may comprise a 5’ end and a 3’ end of the sample nucleic acid.
- the looped end may comprise the first adaptor.
- a second adaptor is ligated to the forward double-stranded adaptor complex.
- the second adaptor and the forward double-stranded adaptor complex are ligated to form a single-stranded circular DNA molecule comprising the first adaptor, the forward strand of the sample DNA, the second adaptor, and a sequence complementary to the forward strand of the sample DNA.
- a second adaptor is ligated to the reverse double-stranded adaptor complex.
- the second adaptor and the reverse double-stranded adaptor complex are ligated to form a single-stranded circular DNA molecule comprising the first adaptor, the reverse strand of the sample DNA, the second adaptor, and a sequence complementary to the reverse strand of the sample DNA.
- the second adaptor may comprise a second primer hybridization site.
- the second adaptor may comprise a hairpin.
- the second adaptor may be different than the first adaptor.
- the second adaptor may comprise a different primer hybridization site than the first adaptor.
- the second adaptor may comprise a 5’ end that is available for ligation.
- the second adaptor may comprise a 3’ end that is available for ligation.
- the methods comprise contacting the sample nucleic acid with one or more first adaptors to form a sample-adaptor complex, wherein the one or more first adaptors prevent ligation between the 5’ end of the forward sample strand and the 5’ end of the reverse sample strand; extending with a polymerase to form a forward double-stranded sample-adaptor complex comprising the forward strand, the first adaptor, and a sequence complementary to the forward strand and a reverse double-stranded sample-adaptor complex comprising the reverse sample strand, the first adaptor and a sequence complementary to the reverse sample strand; and ligating one or more second adaptors to the forward double-stranded sample-adaptor complex and the reverse double-stranded sample-adaptor complex to form a circular forward-adaptor complex and a circular reverse-adaptor complex.
- the methods described herein comprise producing a concatemer from the single-stranded circular nucleic acids produced by the methods described herein.
- rolling circle replication is used to produce the concatemer.
- the concatemer is a single-stranded nucleic acid molecule comprising copies of the first strand and the second strand of the nucleic acid molecule.
- a non-limiting example of the concatemer can be depicted in Fig. 3.
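The relationship between the circular template and the concatemer can be sketched in Python. Each pass of the polymerase around the circle appends one complement of the circle to the growing strand, so the product contains repeated copies of both sample strands. The sketch assumes whole-unit repeats and an arbitrary start position, and it uses toy sequences; real products start at the primer site and vary in length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, copies: int) -> str:
    """Return a concatemer made of `copies` complements of the circle.

    Simplified model: the start position is fixed at the end of the
    written circle and only complete passes are counted.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    unit = reverse_complement(circle)
    return unit * copies


# Toy circle: forward strand, first adaptor, reverse strand, second adaptor.
forward = "ATGGCCTTA"
circle = forward + "AAAA" + reverse_complement(forward) + "CCCC"
concatemer = rolling_circle_product(circle, 3)
```

Because the circle holds a strand and its reverse complement, the complement of the circle holds both as well, which is why every repeat unit of the concatemer includes the first strand and the second strand.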
- single molecules comprise concatemers of polynucleotides, usually polynucleotide analytes, i.e. target sequences, that have been produced in a conventional rolling circle replication (RCR) reaction.
- Guidance for selecting conditions and reagents for RCR reactions is available in many references available to those of ordinary skill, as evidenced by the following that are incorporated by reference: Kool, U.S. Pat. No. 5,426,180; Lizardi, U.S. Pat. Nos. 5,854,033 and 6,143,495; Landegren, U.S. Pat. No. 5,871,921; and the like.
- RCR reaction components comprise single stranded DNA circles, one or more primers that anneal to DNA circles, a DNA polymerase having strand displacement activity to extend the 3' ends of primers annealed to DNA circles, nucleoside triphosphates, and a conventional polymerase reaction buffer. Such components are combined under conditions that permit primers to anneal to DNA circles and be extended by the DNA polymerase to form concatemers of DNA circle complements.
- An exemplary RCR reaction protocol is as follows: In a 50 µL reaction mixture, the following ingredients are assembled: 2-50 pmol circular DNA, 0.5 units/µL phage φ29 DNA polymerase, 0.2 µg/µL BSA, 3 mM dNTP, 1× φ29 DNA polymerase reaction buffer (Amersham). The RCR reaction is carried out at 30° C. for 12 hours. In some embodiments, the concentration of circular DNA in the polymerase reaction may be selected to be low (approximately 10-100 billion circles per ml, or 10-100 circles per picoliter) to avoid entanglement and other intermolecular interactions.
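The two ways of stating the low circle concentration are the same quantity in different units, since one milliliter is 10^9 picoliters. The Python sketch below performs that conversion and also counts the circles in a reaction of a given volume; the function names are hypothetical and the 50 microliter volume is taken from the exemplary protocol.

```python
PICOLITERS_PER_MILLILITER = 1e9
MICROLITERS_PER_MILLILITER = 1e3


def circles_per_picoliter(circles_per_ml: float) -> float:
    """Convert a circle concentration from per-milliliter to per-picoliter."""
    return circles_per_ml / PICOLITERS_PER_MILLILITER


def circles_in_reaction(circles_per_ml: float, volume_ul: float) -> float:
    """Number of circles in a reaction of the given volume (microliters)."""
    return circles_per_ml * volume_ul / MICROLITERS_PER_MILLILITER


low = circles_per_picoliter(10e9)    # 10 billion circles per ml
high = circles_per_picoliter(100e9)  # 100 billion circles per ml
total_low = circles_in_reaction(10e9, 50.0)
```

At 10 billion circles per ml, a 50 microliter reaction would hold about 5 × 10^8 circles.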
- concatemers produced by RCR are approximately uniform in size; accordingly, in some embodiments, methods of making arrays of the invention may include a step of size-selecting concatemers.
- concatemers are selected that as a population have a coefficient of variation in molecular weight of less than about 30%; and in another embodiment, less than about 20%.
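The coefficient of variation is the standard deviation divided by the mean. The Python sketch below applies it to a set of concatemer sizes as a stand-in for molecular weight, which is roughly proportional to length. The sizes are invented for illustration, the population standard deviation is an assumed choice, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(sizes: list) -> float:
    """Population standard deviation divided by the mean."""
    return pstdev(sizes) / mean(sizes)


def passes_size_uniformity(sizes: list, max_cv: float = 0.30) -> bool:
    """True if the population's coefficient of variation is below max_cv."""
    return coefficient_of_variation(sizes) < max_cv


# Invented concatemer sizes in kilobases, for illustration only.
uniform_population = [90.0, 100.0, 110.0]
broad_population = [50.0, 100.0, 150.0]
```

With these invented values the first population has a coefficient of variation of roughly 8% and passes both the 30% and the 20% thresholds, while the second is near 41% and passes neither.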
- size uniformity is further improved by adding low concentrations of chain terminators, such as ddNTPs, to the RCR reaction mixture to reduce the presence of very large concatemers, e.g. produced by DNA circles that are synthesized at a higher rate by polymerases.
- concentrations of ddNTPs are used that result in an expected concatemer size in the range of from 50-250 Kb, or in the range of from 50-100 Kb.
- concatemers may be enriched for a particular size range using conventional separation techniques, e.g. size-exclusion chromatography, membrane filtration, or the like.
- Probe sequences of random arrays may be derived from virtually any population of nucleic acid fragments that can produce useful information in a hybridization assay.
- probe sequences of random arrays are extracted or derived from nucleic acids in a sample.
- Exemplary samples include, but are not limited to, samples from a population of individuals or organisms, a single patient, a single tissue from multiple patients, multiple tissues from one or more patients, an organism of economic interest, a community of microorganisms, a collection of synthetic nucleic acids (e.g. the set of all nucleic acid sequences having a length selected from the range of from 10-20), or the like.
- probe sequences may be derived from a genomic DNA library, cDNA library, cRNA library, siRNA library, or other classes of natural nucleic acids.
- the invention provides random arrays for comparing gene expression or copy number abundances among different biological samples; in such embodiment, probe sequences may be derived from a consensus or reference library of DNA fragments.
- the nucleotide sequences from a reference library are known and the sequences typically are listed in sequence databases, such as Genbank, Embl, or the like.
- a reference library of DNA may comprise a cDNA library or genomic library from a known cell type or tissue source.
- a reference library of DNA may comprise a cDNA library or a genomic library derived from the tissue of a healthy individual and a test library of DNA (from which target sequences are derived) may comprise a cDNA library or genomic library derived from the same tissue of a diseased individual.
- Reference libraries of DNA may also comprise an assembled collection of individual polynucleotides, cDNAs, genes, or exons thereof, e.g. genes or exons encoding all or a subset of known p53 variants, genes of a signal transduction pathway, or the like.
- the DNA used for making probes may be enriched through various procedures. For example, variable regions between 2 and 20 or between 20 and 2000 individuals may be collected using mismatch cutting enzymes or other procedures to make arrays enriched for polymorphisms.
- probe sequences are synthetic polynucleotides having predetermined sequences.
- synthetic probe sequences are selected for detecting protein-DNA binding, e.g. Gronostajski, Nucleic Acids Research, 15: 5545-5559 (1987); Oliphant et al, Gene, 44: 177-183 (1986); Oliphant et al, Meth. Enzymol., 155: 568-582 (1987); which references are incorporated by reference.
- probe sequences for such use may have the following form: “oligo1-NNN . . . NNN-oligo2”, where “oligo1” and “oligo2” are oligonucleotides of known sequence, e.g. primer binding sites, which sandwich a random sequence region “NNN . . . NNN”, which may vary in length and composition.
- the random sequence region has a length in the range of from 6 to 20, or in the range of from 8 to 16.
- “N” is any of the four natural nucleotides.
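The probe layout described above, two known oligonucleotides flanking a random region, can be sketched in Python. The flanking sequences, the function names, and the seeded random generator are assumptions for illustration; the only properties taken from the text are the three-part layout and that each “N” is one of the four natural nucleotides.

```python
import random

NUCLEOTIDES = "ACGT"


def make_probe(oligo1: str, oligo2: str, random_length: int,
               rng: random.Random) -> str:
    """Build one probe: known oligo, random region, known oligo."""
    random_region = "".join(
        rng.choice(NUCLEOTIDES) for _ in range(random_length)
    )
    return oligo1 + random_region + oligo2


def random_region_diversity(random_length: int) -> int:
    """Number of distinct random regions of the given length."""
    return 4 ** random_length


# Hypothetical flanking sequences standing in for primer binding sites.
rng = random.Random(0)
probe = make_probe("ACGTACGT", "TTGGCCAA", 8, rng)
```

The number of distinct random regions grows as 4 to the power of the length, from 4,096 at 6 nucleotides to about 1.1 × 10^12 at 20 nucleotides.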
- selected synthetic probes may be produced individually or in various pools.
- One pool example is 10-10,000 probes of different sequences mixed and extended with the same 5-15 base sequence in the same synthesis. These probes may be tagged for decoding or decoded directly by sequencing a portion of, or the entire, probe. 4-15 bases is sufficient for identifying thousands to millions of sequences.
- Genomic DNA is obtained using conventional techniques, for example, as disclosed in Sambrook et al., supra, 1999; Current Protocols in Molecular Biology, Ausubel et al., eds. (John Wiley and Sons, Inc., NY, 1999), or the like.
- Important factors for isolating genomic DNA include the following: 1) the DNA is free of DNA processing enzymes and contaminating salts; 2) the entire genome is equally represented; and 3) the DNA fragments are between about 5,000 and 100,000 bp in length. In many cases, no digestion of the extracted DNA is required because shear forces created during lysis and extraction will generate fragments in the desired range.
- shorter fragments (1-5 kb) can be generated by enzymatic fragmentation using restriction endonucleases.
- 10-100 genome-equivalents of DNA ensure that the population of fragments covers the entire genome.
- carrier DNA e.g. unrelated circular synthetic double-stranded DNA
- fragments may be derived from either an entire genome or a selected subset of a genome.
- Many techniques are available for isolating or enriching fragments from a subset of a genome, as exemplified by the following references that are incorporated by reference: Kandpal et al (1990), Nucleic Acids Research, 18: 1789-1795; Callow et al, U.S. patent publication 2005/0019776; Zabeau et al, U.S. Pat. No. 6,045,994; Deugau et al, U.S. Pat. No. 5,508,169; Sibson, U.S. Pat. No.
- an initial fragmentation of genomic DNA can be achieved by digestion with one or more “rare” cutting restriction endonucleases, such as Not I, Asc I, Bae I, CspC I, Pac I, Fse I, Sap I, Sfi I, Psr I, or the like.
- the resulting fragments can be used directly, or for genomes that have been sequenced, specific fragments may be isolated from such digested DNA for subsequent processing.
- Genomic DNA is digested with a rare cutting restriction endonuclease to generate fragments, after which the fragments are further digested for a short period (i.e. the reaction is not allowed to run to completion) with a 5' single stranded exonuclease, such as λ exonuclease, to expose sequences adjacent to restriction site sequences at the end of the fragments.
- the methods comprise clonally amplifying a single-stranded concatemer described herein.
- the single-stranded DNA molecule is produced by contacting a sample double-stranded DNA molecule comprising a template sequence with a first adaptor comprising a first adaptor sequence and a second adaptor comprising a second adaptor sequence to produce a circular DNA molecule, wherein the first adaptor sequence comprises a first primer hybridization sequence and wherein the second adaptor sequence comprises a second primer hybridization sequence; and contacting the circular DNA molecule with a polymerizing enzyme to produce a single-stranded DNA molecule via rolling circle amplification, wherein the single-stranded DNA molecule comprises a sequence comprising at least the first adaptor sequence, the template sequence, the second adaptor sequence, and a sequence complementary to the template sequence.
- the methods may involve hybridizing a first primer to the first primer hybridization sequence site on the first adaptor sequence in the single-stranded DNA molecule or concatemer.
- a polymerase is used to extend the first primer to produce a nascent sequence complementary to the template sequence.
- the methods involve hybridizing a plurality of first primers to a plurality of first primer hybridization sequence sites on a plurality of first adaptor sequences in the single-stranded DNA molecule or concatemer.
- the first adaptor sequence comprises a first blocking sequence.
- the second adaptor sequence comprises a second blocking sequence.
- the first blocking sequence is different than the second blocking sequence.
- the first blocking sequence is the same as the first primer hybridization sequence.
- the second blocking sequence is the same as the second primer hybridization sequence.
- the methods comprise contacting the single stranded DNA molecule with a blocking molecule that binds to the blocking sequence.
- the blocking molecule may prevent extension of a nascent sequence beyond the first adaptor.
- the blocking molecule may comprise an oligonucleotide.
- the oligonucleotide may comprise a locked nucleic acid, a psoralen-modified nucleic acid, an MGB-modified nucleic acid, or a G-quadruplex oligo.
- the blocking molecule may comprise a peptide or DNA binding protein.
- the peptide or protein may bind to the blocking sequence on the adaptor.
- the DNA binding protein may be a Cas protein.
- the DNA binding protein may be a Tus protein.
- Extending from the first primer site may result in a DNA molecule comprising: a first DNA strand comprising: a first segment comprising a first sequence; wherein the first segment is ligated to a first hairpin segment comprising a first hairpin sequence at the 3’ end of the first hairpin segment; a second segment comprising a second sequence that is complementary to the first sequence; wherein the second segment is ligated to the first hairpin segment at the 5’ end of the second segment and wherein the second segment is ligated to a second hairpin segment comprising a second hairpin sequence at the 3’ end of the second segment; a third segment comprising the first sequence, wherein the third segment is ligated to the second hairpin sequence at the 5’ end of the third segment and wherein the third segment is ligated to a third hairpin segment comprising the first hairpin sequence at the 3’ end; a fourth segment comprising the second sequence; wherein the fourth segment is ligated to the third hairpin segment at the 5’ end of the fourth segment;
- the concatemer is contacted with a second primer.
- the primer may hybridize to a second primer hybridization site sequence in the second adaptor sequence.
- the reverse template strand may be extended using a polymerase.
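The concatemer layout and the two primer hybridization sites described above can be illustrated with toy sequences. The following Python sketch is only an illustration: the adaptor and template sequences, the function names, and the choice to model each repeat unit as first adaptor, template, second adaptor, and reverse complement of the template are assumptions made for the example, not sequences or software from the disclosure.

```python
# Toy model of a dual-end concatemer; all sequences and names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_concatemer(adaptor1, template, adaptor2, copies):
    """Repeat the unit: adaptor 1, template, adaptor 2, template complement."""
    unit = adaptor1 + template + adaptor2 + reverse_complement(template)
    return unit * copies


def find_sites(concatemer, site):
    """Return every start position of a primer hybridization site."""
    positions = []
    start = concatemer.find(site)
    while start != -1:
        positions.append(start)
        start = concatemer.find(site, start + 1)
    return positions


ADAPTOR1 = "CCCTTTCCC"  # stands in for the first primer hybridization sequence
ADAPTOR2 = "GGGAAAGGG"  # stands in for the second primer hybridization sequence
TEMPLATE = "ATGGCA"

concatemer = build_concatemer(ADAPTOR1, TEMPLATE, ADAPTOR2, copies=3)
first_sites = find_sites(concatemer, ADAPTOR1)
second_sites = find_sites(concatemer, ADAPTOR2)
```

In this toy model each repeat unit contributes one site for each primer, so a concatemer with three copies offers three first-primer sites and three second-primer sites.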
- supports are rigid solids that have a surface, preferably a substantially planar surface so that single molecules to be interrogated are in the same plane. The latter feature permits efficient signal collection by detection optics, for example.
- solid supports of the invention are nonporous, particularly when random arrays of single molecules are analyzed by hybridization reactions requiring small volumes. Suitable solid support materials include materials such as glass, polyacrylamide-coated glass, ceramics, silica, silicon, quartz, various plastics, and the like.
- the area of a planar surface may be in the range of from 0.5 to 4 cm².
- the solid support is glass or quartz, such as a microscope slide, having a surface that is uniformly silanized.
- This may be accomplished using conventional protocols, e.g. acid treatment followed by immersion in a solution of 3-glycidoxypropyl trimethoxysilane, N,N-diisopropylethylamine, and anhydrous xylene (8:1:24 v/v) at 80° C., which forms an epoxysilanized surface, e.g. Beattie et al. (1995), Molecular Biotechnology, 4: 213.
- Such a surface is readily treated to permit end-attachment of capture oligonucleotides, e.g.
- capture oligonucleotides may comprise non-natural nucleosidic units and/or linkages that confer favorable properties, such as increased duplex stability; such compounds include, but are not limited to, peptide nucleic acids (PNAs), locked nucleic acids (LNA), oligonucleotide N3'→P5' phosphoramidates, oligo-2'-O-alkylribonucleotides, and the like.
- photolithography, electron beam lithography, nano imprint lithography, and nano printing may be used to generate such patterns on a wide variety of surfaces, e.g.
- surfaces containing a plurality of discrete spaced apart regions are fabricated by photolithography.
- a commercially available, optically flat, quartz substrate is spin coated with a 100-500 nm thick layer of photo-resist.
- the photo-resist is then baked on to the quartz substrate.
- An image of a reticle with a pattern of regions to be activated is projected onto the surface of the photo-resist, using a stepper. After exposure, the photo-resist is developed, removing the areas of the projected pattern which were exposed to the UV source. This is accomplished by plasma etching, a dry developing technique capable of producing very fine detail.
- the substrate is then baked to strengthen the remaining photo-resist. After baking, the quartz wafer is ready for functionalization.
- the wafer is then subjected to vapor-deposition of 3-aminopropyldimethylethoxysilane.
- the density of the amino functionalized monomer can be tightly controlled by varying the concentration of the monomer and the time of exposure of the substrate. Only areas of quartz exposed by the plasma etching process may react with and capture the monomer.
- the substrate is then baked again to cure the monolayer of aminofunctionalized monomer to the exposed quartz. After baking, the remaining photo-resist may be removed using acetone. Because of the difference in attachment chemistry between the resist and silane, aminosilane-functionalized areas on the substrate may remain intact through the acetone rinse.
- oligonucleotides can be prepared with a 5'-carboxy-modifier C10 linker (Glen Research). This technique allows the oligonucleotide to be attached directly to the amine modified support, thereby avoiding additional functionalization steps.
- surfaces containing a plurality of discrete spaced apart regions are fabricated by nano-imprint lithography (NIL).
- a quartz substrate is spin coated with a layer of resist, commonly called the transfer layer.
- a second type of resist is then applied over the transfer layer, commonly called the imprint layer.
- the master imprint tool then makes an impression on the imprint layer.
- the overall thickness of the imprint layer is then reduced by plasma etching until the low areas of the imprint reach the transfer layer. Because the transfer layer is harder to remove than the imprint layer, it remains largely untouched.
- the imprint and transfer layers are then hardened by heating.
- the substrate is then put into a plasma etcher until the low areas of the imprint reach the quartz.
- the substrate is then derivatized by vapor deposition as described above.
- surfaces containing a plurality of discrete spaced apart regions are fabricated by nano printing.
- This process uses photo, imprint, or e-beam lithography to create a master mold, which is a negative image of the features required on the print head.
- Print heads are usually made of a soft, flexible polymer such as polydimethyl siloxane (PDMS). This material, or layers of materials having different properties, are spin coated onto a quartz substrate. The mold is then used to emboss the features onto the top layer of resist material under controlled temperature and pressure conditions. The print head is then subjected to a plasma based etching process to improve the aspect ratio of the print head, and eliminate distortion of the print head due to relaxation over time of the embossed material.
- Random array substrates are manufactured using nano-printing by depositing a pattern of amine modified oligonucleotides onto a homogeneously derivatized surface. These oligonucleotides would serve as capture probes for the RCR products.
- One potential advantage to nano-printing is the ability to print interleaved patterns of different capture probes onto the random array support. This would be accomplished by successive printing with multiple print heads, each head having a differing pattern, and all patterns fitting together to form the final structured support pattern. Such methods allow for some positional encoding of DNA elements within the random array. For example, control concatemers containing a specific sequence can be bound at regular intervals throughout a random array.
- a high density array of capture oligonucleotide spots of sub micron size is prepared using a printing head or imprint-master prepared from a bundle, or bundle of bundles, of about 10,000 to 100 million optical fibers with a core and cladding material.
- a unique material is produced that has about 50-1000 nm cores separated by a similar or 2-5 fold smaller or larger size cladding material.
- by differential etching (dissolving) of cladding material, a nano-printing head is obtained having a very large number of nano-sized posts.
- This printing head may be used for depositing oligonucleotides or other biological (proteins, oligopeptides, DNA, aptamers) or chemical compounds such as silane with various active groups.
- the glass fiber tool is used as a patterned support to deposit oligonucleotides or other biological or chemical compounds. In this case only posts created by etching may be contacted with material to be deposited. Also, a flat cut of the fused fiber bundle may be used to guide light through cores and allow light-induced chemistry to occur only at the tip surface of the cores, thus eliminating the need for etching.
- the same support may then be used as a light guiding/collection device for imaging fluorescence labels used to tag oligonucleotides or other reactants.
- This device provides a large field of view with a large numerical aperture (potentially >1).
- Stamping or printing tools that perform active material or oligonucleotide deposition may be used to print 2 to 100 different oligonucleotides in an interleaved pattern. This process requires precise positioning of the print head to about 50-500 nm.
- This type of oligonucleotide array may be used for attaching 2 to 100 different DNA populations such as different source DNA. They also may be used for parallel reading from sublight resolution spots by using DNA specific anchors or tags.
- DNA specific tags e.g. 16 specific anchors for 16 DNAs and read 2 bases by a combination of 5-6 colors and using 16 ligation cycles or one ligation cycle and 16 decoding cycles. This way of making arrays is efficient if limited information (e.g. a small number of cycles) is required per fragment, thus providing more information per cycle or more cycles per surface.
- “inert” concatemers are used to prepare a surface for attachment of test concatemers.
- the surface is first covered by capture oligonucleotides complementary to the binding site present on two types of synthetic concatemers; one is a capture concatemer, the other is a spacer concatemer.
- the spacer concatemers do not have DNA segments complementary to the adapter used in preparation of test concatemers and they are used in about 5-50, preferably 10x excess to capture concatemers.
- the surface with capture oligonucleotide is “saturated” with a mix of synthetic concatemers (prepared by chain ligation or by RCR) in which the spacer concatemers are used in about 10-fold (or 5 to 50-fold) excess to capture concatemers.
- the capture concatemers are mostly individual islands in a sea of spacer concatemers.
- the 10:1 ratio provides that two capture concatemers are on average separated by two spacer concatemers. If concatemers are about 200 nm in diameter, then two capture concatemers are at about 600 nm center-to-center spacing. This surface is then used to attach test concatemers or other molecular structures that have a binding site complementary to a region of the capture concatemers but not present on the spacer concatemers. Capture concatemers may be prepared to have fewer copies than the number of binding sites in test concatemers to assure single test concatemer attachment per capture concatemer spot.
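The spacing arithmetic in the preceding passage can be checked with a short calculation. This Python sketch assumes concatemers packed edge to edge along a line, so the center-to-center distance is the concatemer diameter multiplied by the number of intervening spacers plus one; the function names are hypothetical.

```python
def capture_spacing_nm(concatemer_diameter_nm, spacers_between):
    """Center-to-center distance between two capture concatemers in a row.

    Assumes concatemers touch edge to edge, so each intervening spacer
    adds one full diameter to the distance.
    """
    return (spacers_between + 1) * concatemer_diameter_nm


def capture_fraction(spacer_excess):
    """Fraction of all concatemers that are capture concatemers."""
    return 1 / (spacer_excess + 1)


spacing = capture_spacing_nm(200, 2)  # 200 nm concatemers, two spacers
fraction = capture_fraction(10)       # 10-fold spacer excess
```

With 200 nm concatemers and two intervening spacers the spacing evaluates to 600 nm, matching the figure stated in the text; at a 10-fold spacer excess roughly one concatemer in eleven is a capture concatemer.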
- test concatemers may be prepared that have high site occupancy without congregation. Due to random attachment, some areas on the surface may not have any concatemers attached, but these areas with free capture oligonucleotide may not be able to bind test concatemers since they are designed not to have binding sites for the capture oligonucleotide.
- An array of individual test concatemers as described would not be arranged in a grid pattern. An ordered grid pattern should simplify data collection because fewer pixels and less sophisticated image analysis systems are needed.
- multiple arrays of the invention may be placed on a single surface.
- patterned array substrates may be produced to match the standard 96 or 384 well plate format.
- a production format can be an 8×12 pattern of 6 mm × 6 mm arrays at 9 mm pitch or a 16×24 pattern of 3.33 mm × 3.33 mm arrays at 4.5 mm pitch, on a single piece of glass or plastic or other optically compatible material.
- each 6 mm × 6 mm array consists of 36 million 250-500 nm square regions at 1 micrometer pitch. Hydrophobic or other surface or physical barriers may be used to prevent mixing different reactions between unit arrays.
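The array counts quoted above follow from simple geometry. A minimal Python sketch, assuming a square grid of regions that fills the whole array; the function names are hypothetical.

```python
def regions_per_array(side_mm, pitch_um):
    """Number of regions on a square array with a square grid of regions."""
    per_side = round(side_mm * 1000 / pitch_um)  # convert mm to micrometers
    return per_side * per_side


def arrays_per_plate(rows, columns):
    """Number of unit arrays in a rows x columns plate layout."""
    return rows * columns


regions = regions_per_array(6, 1)          # 6 mm array at 1 micrometer pitch
plate_96 = arrays_per_plate(8, 12)         # 8 x 12 layout
plate_384 = arrays_per_plate(16, 24)       # 16 x 24 layout
```

A 6 mm side at 1 micrometer pitch gives 6,000 regions per side and 36 million regions per array, and the two layouts correspond to the 96 and 384 well plate formats.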
- Cycled detection includes the binding and imaging of probes, such as antibodies or nucleotides, bound to detectable labels that are capable of emitting a visible light optical signal.
- deconvolution to resolve signals from densely packed substrates can be used effectively to identify individual optical signals from signals obscured due to the diffraction limit of optical imaging. After multiple cycles the precise location of the molecule will become increasingly more accurate. Using this information additional calculations can be performed to aid in crosstalk correction regarding known asymmetries in the crosstalk matrix occurring due to pixel discretization effects.
- the raw images are obtained using sampling that is at least at the Nyquist limit to facilitate more accurate determination of the oversampled image.
- Increasing the number of pixels used to represent the image by sampling in excess of the Nyquist limit (oversampling) increases the pixel data available for image processing and display.
- each image is taken with a pixel size no more than half the wavelength of light being observed.
- a pixel size of 162.5nm x 162.5 nm is used in detection to achieve sampling at or above the Nyquist limit.
- Sampling at a frequency of at least the Nyquist limit during raw imaging of the substrate is preferred to optimize the resolution of the system or methods described herein. This can be done in conjunction with the deconvolution methods and optical systems described herein to resolve features on a substrate below the diffraction limit with high accuracy.
- Pixelation error is present in raw images and prevents identification of information present from the optical signals due to pixelation.
- Sampling at least at the Nyquist frequency and generation of an oversampled image as described herein each assist in overcoming pixelation error.
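The pixel-size criterion stated above (a pixel size no more than half the wavelength of the light being observed) can be expressed as a simple check. This Python sketch uses that criterion as written in the text; the function names and the example wavelengths are assumptions for illustration.

```python
def max_pixel_size_nm(wavelength_nm):
    """Largest pixel size allowed by the half-wavelength criterion."""
    return wavelength_nm / 2


def meets_pixel_criterion(pixel_nm, wavelength_nm):
    """True if the pixel size is no more than half the observed wavelength."""
    return pixel_nm <= max_pixel_size_nm(wavelength_nm)


# A 162.5 nm pixel checked against a few example wavelengths (assumed values).
results = {wl: meets_pixel_criterion(162.5, wl) for wl in (300, 325, 500, 650)}
```

Under this criterion a 162.5 nm pixel is sufficient for any wavelength of 325 nm or longer, which covers the visible range.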
- pixel size below Nyquist
- center-to-center spacing is so small that crosstalk due to spatial overlap occurs.
- Nearest neighbor variable regression for center-to-center crosstalk
- this can be improved if we know the relative location of each analyte on the substrate and have good alignment of images of a field.
- Highly accurate relative positional information for each analyte can be achieved by overlaying images of the same field from different cycles to generate a distribution of measured peaks from optical signals of different probes bound to each analyte. This distribution can then be used to generate a peak signal that corresponds to a single relative location of the analyte. Images from a subset of cycles can be used to generate relative location information for each analyte. In some embodiments, this relative position information is provided in a localization file.
- the specific area imaged for a field for each cycle may vary from cycle to cycle.
- an alignment between images of a field across multiple cycles can be performed. From this alignment, offset information compared to a reference file can then be identified and incorporated into the deconvolution algorithms to further increase the accuracy of deconvolution and signal identification for optical signals obscured due to the diffraction limit. In some embodiments, this information is provided in a Field Alignment File.
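The overlay and alignment steps described above can be sketched as follows. This Python example is a simplified illustration: it assumes one detected peak per cycle for a single analyte, a known per-cycle field offset relative to a reference image, and the mean of the corrected peaks as the location estimate. The function name and data layout are hypothetical, and the disclosure does not specify this particular estimator.

```python
def localize(peaks_by_cycle, offsets_by_cycle):
    """Estimate one analyte's relative location from peaks over several cycles.

    peaks_by_cycle: list of (x, y) peak positions, one per cycle.
    offsets_by_cycle: list of (dx, dy) field offsets relative to the reference.
    Returns the mean of the offset-corrected peak positions.
    """
    corrected = [
        (x - dx, y - dy)
        for (x, y), (dx, dy) in zip(peaks_by_cycle, offsets_by_cycle)
    ]
    n = len(corrected)
    mean_x = sum(x for x, _ in corrected) / n
    mean_y = sum(y for _, y in corrected) / n
    return mean_x, mean_y


# Hypothetical peaks (in nm) for one analyte over three cycles.
peaks = [(100.0, 200.0), (102.0, 199.0), (101.0, 204.0)]
offsets = [(0.0, 0.0), (2.0, -1.0), (1.0, 4.0)]
location = localize(peaks, offsets)
```

In this toy data the apparent movement of the peak is entirely explained by the field offsets, so the corrected peaks coincide and the estimate is (100.0, 200.0).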
- Signal detection cross-talk / nearest neighbor
- a plurality of optical signals obscured by the diffraction limit of the optical system are identified for each of a plurality of biomolecules immobilized on a substrate and bound to probes comprising a detectable label.
- the probes are incorporated nucleotides and the series of cycles is used to determine a sequence of a polynucleotide immobilized on the array using single molecule sequencing by synthesis.
e. Simulations of deconvolution applied to images
- Molecular densities are limited by crosstalk from neighboring molecules. Acceptable crosstalk levels at or below 25% with 2X oversampling occur for pitches at or above 275 nm. Acceptable crosstalk levels at or below 25% with 4X deconvolution using the point spread function of the optical system occur for pitches at or above 210 nm.
- the physical size of the molecule will broaden the spot roughly half the size of the binding area. For example, for an 80 nm spot the pitch will be increased by roughly 40 nm. Smaller spot sizes may be used, but this will have the trade-off that fewer copies will be allowed and greater illumination intensity will be required. A single copy provides the simplest sample preparation but requires the greatest illumination intensity.
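The pitch adjustment described above can be worked through numerically. This Python sketch takes the stated rule (the pitch increases by roughly half the spot size) at face value; the conversion to molecules per square micrometer assumes a square grid, and the function names are hypothetical.

```python
def adjusted_pitch_nm(base_pitch_nm, spot_size_nm):
    """Pitch after adding roughly half the binding spot size."""
    return base_pitch_nm + spot_size_nm / 2


def density_per_um2(pitch_nm):
    """Molecules per square micrometer on a square grid (an assumption)."""
    return (1000 / pitch_nm) ** 2


pitch = adjusted_pitch_nm(210, 80)  # 210 nm base pitch, 80 nm spot
density = density_per_um2(pitch)
```

For the 210 nm pitch quoted for 4X deconvolution and an 80 nm spot, the adjusted pitch is 250 nm, which on a square grid corresponds to 16 molecules per square micrometer.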
- Methods for sub-diffraction limit imaging discussed to this point involve image processing techniques of oversampling, deconvolution and crosstalk correction. Described herein are methods and systems that incorporate determination of the precise relative location of analytes on the substrate using information from multiple cycles of probe optical signal imaging for the analytes. Using this information additional calculations can be performed to aid in crosstalk correction regarding known asymmetries in the crosstalk matrix occurring due to pixel discretization effects.
- the methods comprise detecting a methylation signature.
- a methyltransferase is used during sequence amplification to maintain the DNA methylation status.
- the methyltransferase may include a DNA methyltransferase (DNMT).
- Bisulfite conversion of DNA results in conversion of unmodified cytosine (C) to uracil (U) that will be read as thymine (T) upon sequencing of PCR amplified DNA. Both 5meC and 5hmC are protected against conversion and will not be converted to U. Therefore they will both be read as C upon sequencing.
- bisulfite conversion occurs after generation of the concatemer.
- bisulfite conversion occurs after sequencing the concatemer.
- an additional round of sequencing is performed after bisulfite conversion.
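The read-out rule for bisulfite conversion described above can be simulated on a toy sequence. In this Python sketch, protected positions stand for 5meC or 5hmC, which are not converted; the sequence, positions, and function names are hypothetical.

```python
def bisulfite_read(sequence, protected_positions):
    """Return the read expected after bisulfite conversion and amplification.

    Unprotected C is converted to U and read as T; C at a protected
    position (5meC or 5hmC) is still read as C.
    """
    protected = set(protected_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(sequence)
    )


def call_protected(reference, read):
    """Positions where a reference C is still read as C after conversion."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_read(reference, protected_positions=[1])
protected_calls = call_protected(reference, read)
```

Comparing the converted read with an unconverted reference recovers the protected position, although, as the text notes, this read-out does not distinguish 5meC from 5hmC.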
- the methods and compositions described herein comprise at least one adaptor molecule.
- the methods comprise a plurality of adaptor molecules.
- the methods comprise a first adaptor molecule and a second adaptor molecule, wherein the first and second adaptor molecules are different.
- the methods comprise an asymmetric adaptor molecule.
- the adaptor comprises a hairpin.
- the adaptor comprises a blocking group.
- the adaptor comprises at least a first primer hybridization site.
- an adaptor comprising: a first nucleic acid strand comprising a 5’ segment and a 3’ segment, wherein the 3’ segment of the first nucleic acid strand comprises a hairpin and the 5’ segment of the first nucleic acid strand comprises an overhang; and a second nucleic acid strand comprising a 3’ segment and a 5’ segment, wherein the 3’ segment of the second nucleic acid strand comprises a sequence complementary to the 5’ overhang of the first nucleic acid strand and wherein the 5’ segment of the second nucleic acid strand comprises a nucleic acid sequence that is not complementary to the first nucleic acid molecule, and wherein the 5’ segment of the second nucleic acid strand comprises a blocking group.
- the adaptor comprises a first nucleic acid strand and a second nucleic acid strand.
- the first nucleic acid strand may comprise a 5’ segment and a 3’ segment.
- the 3’ segment of the first nucleic acid strand comprises a hairpin.
- the hairpin may comprise a loop and a self-complementary stem region.
- the 5’ segment of the first nucleic acid strand may comprise a sequence that is not self-complementary.
- the adaptor may comprise a second nucleic acid strand, wherein the second nucleic acid strand is non-contiguous with the first nucleic acid strand.
- the second nucleic acid strand may comprise a 3’ segment that is complementary with the 5’ segment of the first nucleic acid strand.
- the second nucleic acid strand 104 may comprise a sequence 105 that is complementary to the 5’ segment of the first nucleic acid strand as depicted in FIG. X.
- the 5’ end of the adaptor molecule may comprise a region that blocks ligation of the 3’ end of the first nucleic acid strand to the 5’ end of the second nucleic acid strand.
- the region that blocks ligation may be a flap, a gap, a blocking group, or a combination thereof.
- the adaptor may comprise a gap 106 between the first nucleic acid strand 101 and the second nucleic acid strand 104.
- the second nucleic acid strand 104 may comprise a flap 107 that is not complementary to the first nucleic acid sequence.
- the second nucleic acid strand 104 may comprise a blocking group 108 at the 3’ end of the segment that is complementary to the 5’ segment of the first nucleic acid strand.
- an adaptor comprising: a first nucleic acid strand comprising: a 3’ segment of the first nucleic acid strand comprising a hairpin, and a 5’ segment of the first nucleic acid strand comprising a sequence that is not self-complementary; and a second nucleic acid strand comprising: a 3’ segment of the second nucleic acid strand comprising a sequence complementary to a portion of the 5’ segment of the first nucleic acid strand and a 5’ segment that blocks ligation of the 3’ end of the first nucleic acid strand to the 5’ end of the second nucleic acid strand, thereby generating a non-contiguous segment between the 5’ end of the second nucleic acid strand and the 3’ end of the first nucleic acid strand, wherein the 5’ segment comprises a flap or a blocking group.
- the adaptor molecule may comprise a blocking group designed to block ligation.
- the adaptor molecule may comprise a blocking group designed to block extension.
- the blocking group may comprise an oligonucleotide.
- the oligonucleotide may comprise a locked nucleic acid, a psoralen-modified nucleic acid, an MGB-modified nucleic acid, or a G-quadruplex oligo.
- the blocking group may comprise a sequence that binds to a peptide or DNA binding protein.
- the peptide or protein may bind to the blocking sequence on the adaptor.
- the DNA binding protein may be a Cas protein.
- the DNA binding protein may be a Tus protein.
8. Sequencing sites
- the adaptor molecules may comprise a sequencing site.
- the adaptor molecules may comprise a primer hybridization site.
- range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
- determining means determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing can be relative or absolute. “Detecting the presence of” can include determining the amount of something present in addition to determining whether it is present or absent depending on the context.
- a “subject” can be a biological entity containing expressed genetic materials.
- the biological entity can be a plant, animal, or microorganism, including, for example, bacteria, viruses, fungi, and protozoa.
- the subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro.
- the subject can be a mammal.
- the mammal can be a human.
- the subject may be diagnosed or suspected of being at high risk for a disease. In some cases, the subject is not necessarily diagnosed or suspected of being at high risk for the disease.
- the term “about” a number refers to that number plus or minus 10% of that number.
- the term “about” a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
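The two definitions of “about” given above translate directly into interval calculations. A minimal Python sketch; the function names are hypothetical.

```python
def about_number(value):
    """Interval for "about" a number: the number plus or minus 10%."""
    return (value - 0.1 * value, value + 0.1 * value)


def about_range(low, high):
    """Interval for "about" a range.

    The lower bound drops by 10% of the lowest value and the upper
    bound rises by 10% of the greatest value.
    """
    return (low - 0.1 * low, high + 0.1 * high)


number_interval = about_number(200)    # e.g. "about 200 nm"
range_interval = about_range(50, 500)  # e.g. "about 50-500 nm"
```

For example, “about 200” spans 180 to 220, and “about 50-500” spans 45 to 550.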
- the term “overlaying” refers to overlaying images from different cycles to generate a distribution of detected optical signals (e.g., position and intensity, or position of peak) from each analyte over a plurality of cycles.
- This distribution of detected optical signals can be generated by overlaying images, overlaying artificial processed images, or overlaying datasets comprising positional information.
- overlay images encompasses any of these mechanisms to generate a distribution of position information for optical signals from a single probe bound to a single analyte for each of a plurality of cycles.
- a “cycle” is defined by completion of one or more passes and stripping of the detectable label from the substrate. Subsequent cycles of one or more passes per cycle can be performed. For the methods and systems described herein, multiple cycles are performed on a single substrate or sample. For DNA sequencing, multiple cycles require the use of a reversible terminator and a removable detectable label from an incorporated nucleotide. For proteins, multiple cycles require that the probe removal (stripping) conditions either maintain proteins folded in their proper configuration, or that the probes used are chosen to bind to peptide sequences so that the binding efficiency is independent of the protein fold configuration.
- a "pass" in a detection assay refers to a process where a plurality of probes comprising a detectable label are introduced to the bound analytes, selective binding occurs between the probes and distinct target analytes, and a plurality of signals are detected from the detectable labels.
- a pass includes introduction of a set of antibodies that bind specifically to a target analyte.
- a pass can also include introduction of a set of labelled nucleotides for incorporation into the growing strand during sequencing by synthesis. There can be multiple passes of different sets of probes before the substrate is stripped of all detectable labels, or before the detectable label or reversible terminator is removed from an incorporated nucleotide during sequencing.
- an image refers to an image of a field taken during a cycle or a pass within a cycle.
- a single image is limited to detection of a single color of a detectable label.
- a “target analyte” or “analyte” refers to a single molecule, compound, complex, substance or component that is to be identified, quantified, or otherwise characterized.
- a target analyte can comprise, by way of example and not limitation, a single molecule (of any molecular size), a single biomolecule, a polypeptide, a protein (folded or unfolded), a polynucleotide molecule (RNA, cDNA, or DNA), a fragment thereof, a modified molecule thereof, such as a modified nucleic acid, or a combination thereof.
- a target polynucleotide comprises a hybridized primer to facilitate sequencing by synthesis.
- the target analytes are recognized by probes, which can be used to sequence, identify, and quantify the target analytes using optical detection methods described herein.
- a “probe” as used herein refers to a molecule that is capable of binding to other molecules (e.g., a complementary labelled nucleotide during sequencing by synthesis, polynucleotides, polypeptides or full-length proteins, etc.), cellular components or structures (lipids, cell walls, etc.), or cells for detecting or assessing the properties of the molecules, cellular components or structures, or cells.
- the probe comprises a structure or component that binds to the target analyte. In some embodiments, multiple probes may recognize different parts of the same target analyte.
- probes include, but are not limited to, a labelled reversible terminator nucleotide, an aptamer, an antibody, a polypeptide, an oligonucleotide (DNA, RNA), or any combination thereof.
- Antibodies, aptamers, oligonucleotide sequences and combinations thereof as probes are also described in detail below.
- the probe can comprise a detectable label that is used to detect the binding of the probe to a target analyte.
- the probe can be directly or indirectly bound to, hybridized to, conjugated to, or covalently linked to the target analyte.
- the term detectable label refers to a molecule bound to a probe that is capable of generating a detectable optical signal when the probe is bound to a target analyte and imaged using an optical imaging system.
- the detectable label can be directly or indirectly bound to, hybridized to, conjugated to, or covalently linked to the probe.
- the detectable label is a fluorescent molecule or a chemiluminescent molecule.
- the probe can be detected optically via the detectable label.
- the term optical distribution model refers to a statistical distribution of probabilities for light detection from a point source. These include, for example, a Gaussian distribution. The Gaussian distribution can be modified to include anticipated aberrations in detection to generate a point spread function as an optical distribution model.
- Example 1 A method of library preparation and sequencing
- DualSeq allows for paired-end sequencing of an insert without any turnaround chemistry, and can enable sequencing both strands of a duplex in a paired-end manner.
- Any library can be used and ligated on looped adapters.
- a CAT is created by landing a primer on the open loop.
- a sequencing primer can be hybridized after loading. DNA polymerases with strand-displacement activity for efficient nucleotide incorporation can be used.
- Adapters are asymmetrically ligated to enzymatically or sonically-sheared DNA. Mixtures of looped adapters can be ligated or various modalities can be used to select for inserts with two different adapters. Sequential ligation steps can be used to ligate two different adapters. Physical means such as beads in concert with sequential ligation steps can be used to ligate two different adapters.
- the circle in the case of DualSeq has two loop regions (L1, L2) that contain different primer hybridization sequences (P1, P2), as depicted in Fig. 2.
- the CAT molecule, as depicted in Fig. 3, obtained through rolling circle amplification of the circle in Fig. 2 will allow for the specific sequencing of both strands of the original dsDNA molecule via initiation at P1 or P2.
- Methylation signature of circle DNA is preserved in CATs by performing rolling circle amplification (RCA) in the presence of DNA methyltransferase (DNMT), as depicted in Fig. 4.
- Fig. 5 depicts a DualSeq CAT with preserved methylation signatures. Methylated (blue) and unmethylated (yellow) cytosines are shown.
- R1, R2: forward and reverse strands.
- Both methylated and unmethylated cytosines are read through their complementary guanine (G) base pairing.
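How a methyltransferase can carry a methylation pattern over to a newly synthesized strand can be sketched as a toy model of maintenance methylation. The sketch assumes methylation occurs only in a CpG context and that every hemimethylated CpG is copied; the source specifies neither, and the function names and the example sequence are hypothetical.

```python
def cpg_sites(seq):
    """0-based positions of the C in every CpG dinucleotide of `seq`."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def maintain_methylation(template, template_methylated):
    """Positions (in template coordinates) of new-strand cytosines to methylate.

    In a CpG at template position i, the C of the opposite-strand CpG pairs with
    the template G at position i + 1. Methylated positions outside a CpG are ignored.
    """
    sites = set(cpg_sites(template))
    return sorted(i + 1 for i in template_methylated if i in sites)

# Template with CpGs at positions 1, 5 and 8; only two of them are methylated.
template = "ACGTTCGACGA"
new_strand_marks = maintain_methylation(template, {1, 8})
```

The unmethylated CpG at position 5 produces no mark on the new strand, so the pattern of methylated and unmethylated sites is preserved across copies.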
- A clip adaptor, which is designed to permit 3' end extension from both ends of the molecule, is ligated.
- The clip adaptor consists of two oligos hybridized together.
- The ligated molecule is extended from the 3' end using a strand-displacing DNA polymerase.
- The newly synthesized complementary strand serves as a reference sequence, while the original strand carries the methyl signature of each strand.
- The CAT from a DualSeq circle can potentially form a duplex structure, which can inhibit the sequencing chemistry.
- The following workflow can be used:
- The first sequencing primer may be extended to fill the strand until it reaches the adaptor region.
- DNA inserts are ligated to novel adapters, as depicted in FIG. 10. These adapters contain two independent oligonucleotides. Oligo 1 has three regions: Region 1 (5’ end) is single stranded; Region 2 is double stranded; and Region 3 is single stranded. Regions 2 and 3 form a hairpin loop. Oligo 2 has two regions: Region 1 (5’ end) is a single-stranded flap, and Region 2 (3’ end) is complementary to Region 1 of oligo 1. An adapter is formed by hybridization of oligo 1 to oligo 2.
- The adapters are ligated to DNA inserts to produce adapted DNA molecules with two identical adapters on the two ends (see FIG. 11A).
- The 3’ end of oligo 1 in each of the two adapters is extended by a polymerase, which reads the template in the 3’ to 5’ direction, resulting in displacement of the flap (Region 1 of oligo 2).
- the extending DNA strand (dotted line) displaces the flap and the attached DNA strand.
- Extension produces a pair of double-stranded DNA molecules with a stem loop adapter on one end and an open adapter on the other end, as depicted in FIG. 11B.
- Asymmetric libraries are generated using the methods described herein.
- The asymmetric libraries comprise two different adapters. Each adapter has a different sequence in the loop region.
- the loop region contains a site for a blocking reagent to bind.
- the blocking reagent may be a Locked Nucleic Acid or a sequence specific binding protein such as dCas9.
- An extension primer binds to the asymmetric library and is extended by rolling circle amplification, producing a concatemer. Because both the forward and reverse-complement strands are present, the concatemer binds to itself, creating a double-stranded, accordion-like structure, as depicted in FIG. 12B.
- Paired-end sequencing of the first strand from adapter A comprises four steps, as depicted in FIG. 13.
- A blocking agent specifically attaches to the blocking region of adapter A.
- the blocking reagent is a Locked Nucleic Acid.
- the LNA binds to the blocking site on stem loop A only.
- the bottom strand is now single stranded and can hybridize to a sequencing primer.
- a sequencing primer that binds to a site on stem loop A only is hybridized. The sequencing primer is then extended by a sequencing polymerase in the process determining the sequence of bases on the bottom strand.
- A denaturing reagent is used to remove the blocking agent, the sequenced strand, and the previously extended strand from the previous sequencing reactions.
- A blocking agent specifically attaches to the blocking region of adapter A.
- the blocking reagent is a Locked Nucleic Acid.
- the LNA binds to the blocking site on stem loop A only.
- An extension primer that binds to a site on stem loop B only is hybridized. The extension primer is extended until it reaches the blocking agent, forming a double-stranded region (solid and dotted lines).
- the bottom strand is now single stranded and can hybridize to a sequencing primer.
- a sequencing primer that binds to a site on stem loop A only is hybridized. The sequencing primer is then extended by a sequencing polymerase in the process determining the sequence of bases on the bottom strand.
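The blocked extension step can be pictured with a toy string model: a primer is extended along a template until it meets the blocking agent, leaving the remainder of the template single stranded. The template is written in the order the polymerase traverses it (3' to 5'), the primer is assumed to sit at the start of that string, and the sequence, coordinates, and function name are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def extend_until_block(template_3to5, primer_len, block_start):
    """Return (new_strand, single_stranded_rest) for a blocked primer extension.

    template_3to5: template written 3'->5', i.e. in the order the polymerase reads it.
    primer_len:    number of template bases covered by the annealed primer.
    block_start:   index of the first template base covered by the blocking agent.
    The new strand is returned 5'->3', so no reversal is needed.
    """
    if not primer_len <= block_start <= len(template_3to5):
        raise ValueError("blocker must lie downstream of the primer")
    new_strand = template_3to5[primer_len:block_start].translate(COMPLEMENT)
    single_stranded_rest = template_3to5[block_start:]
    return new_strand, single_stranded_rest

# Hypothetical 16-base template: 4-base primer site, blocker starting at index 12.
extended, remaining = extend_until_block("TTGACCGTAAGCTTGG", primer_len=4, block_start=12)
```

Here `extended` is the newly synthesized strand and `remaining` is the template stretch that the blocker keeps from being copied.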
- Example 5 Paired end sequencing of the first and second strand
- FIGS. 15-17 depict images of the sequencing surface during sequencing. Each row is one sequencing cycle. The columns are the fluorescence for each cycle, separated into individual images representing each nucleotide. This demonstrates that both the forward and reverse strands were successfully sequenced.
- FIG. 16 depicts the location of the sequencing reads from the first strand (red) and the second strand (green). This indicates that the molecules were successfully attached and identified for forward and reverse sequencing.
- FIG. 17 shows base calls from the first and second strand.
- The sequences generated from the amplified concatemers are shown as base calls.
- The first column of base calls was from the first strand, and the second column was the reverse-complement base calls from the second strand.
- the sequences were identical, showing high accuracy between forward and reverse sequencing.
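The comparison described for FIG. 17 can be expressed as a short concordance check: reverse-complement the second-strand read and count matching positions against the first-strand read. This is a sketch that assumes both reads cover the same region and have equal length; the function names and the example reads are hypothetical and are not data from the source.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]

def strand_concordance(first_strand_read, second_strand_read):
    """Fraction of positions where the first read matches the reverse complement
    of the second read. Assumes equal-length reads over the same region."""
    if len(first_strand_read) != len(second_strand_read):
        raise ValueError("reads must have equal length in this simplified model")
    if not first_strand_read:
        raise ValueError("reads must not be empty")
    flipped = revcomp(second_strand_read)
    matches = sum(a == b for a, b in zip(first_strand_read, flipped))
    return matches / len(first_strand_read)

# Hypothetical reads: a perfect pair and a pair with one discordant base.
perfect = strand_concordance("ACGGTTCA", "TGAACCGT")
one_mismatch = strand_concordance("ACGGTTCA", "TGAACCGA")
```

A value of 1.0 corresponds to identical sequences after reverse complementation, as reported for the forward and reverse reads.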
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23843830.3A EP4558642A1 (en) | 2022-07-22 | 2023-07-18 | Systems and methods for dual-end sequencing |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263391664P | 2022-07-22 | 2022-07-22 | |
| US63/391,664 | 2022-07-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024020410A1 (en) | 2024-01-25 |
Family
ID=89618602
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/070442 Ceased WO2024020410A1 (en) | 2022-07-22 | 2023-07-18 | Systems and methods for dual-end sequencing |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4558642A1 (en) |
| WO (1) | WO2024020410A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170362639A1 (en) * | 2016-06-17 | 2017-12-21 | Pacific Biosciences Of California, Inc. | Methods and compositions for generating asymmetrically-tagged nucleic acid fragments |
| WO2018165366A1 (en) * | 2017-03-08 | 2018-09-13 | President And Fellows Of Harvard College | Methods of amplifying dna to maintain methylation status |
| WO2021078947A1 (en) * | 2019-10-25 | 2021-04-29 | Illumina Cambridge Limited | Methods for generating, and sequencing from, asymmetric adaptors on the ends of polynucleotide templates comprising hairpin loops |
- 2023
- 2023-07-18 EP EP23843830.3A patent/EP4558642A1/en active Pending
- 2023-07-18 WO PCT/US2023/070442 patent/WO2024020410A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| EP4558642A1 (en) | 2025-05-28 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23843830; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023843830; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023843830; Country of ref document: EP; Effective date: 20250224 |
| | WWP | Wipo information: published in national office | Ref document number: 2023843830; Country of ref document: EP |