WO2025128542A1 - Nanopore sequencing of DNA by replacement of nucleotides with analogs - Google Patents
- Publication number
- WO2025128542A1 (application PCT/US2024/059342)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- analog
- dna
- nucleobase
- sequencing
- dntp
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/483—Physical analysis of biological material
- G01N33/487—Physical analysis of biological material of liquid biological material
- G01N33/48707—Physical analysis of biological material of liquid biological material by electrical means
- G01N33/48721—Investigating individual macromolecules, e.g. by translocation through nanopores
Definitions
- Nanopore sequencing (e.g., the Oxford Nanopore Technologies® platform) is based on measuring changes in electrical signal generated due to DNA or RNA molecules passing through nano-scaled pores (“nanopores”).
- The approach offers long-read sequencing (reads in which the mean read length can exceed 10 kb and the maximal read length can reach 880 kb or more), as well as real-time analysis and a low initial investment cost.
- Although nanopore sequencing provides a generally effective approach for sequencing a variety of DNA and RNA fragments, it typically suffers from a higher error rate on raw sequence reads than other sequencing methods, such as standard next-generation sequencing (NGS) (e.g., the Illumina® platform). Improvements in the accuracy of nanopore sequencing are greatly needed.
- The disclosure provides a method for preparing a DNA library for high-accuracy nanopore sequencing, the method comprising: contacting a DNA strand comprising a target polynucleotide sequence with a DNA polymerase and a dNTP pool comprising an analog dNTP that comprises a modification relative to a corresponding reference dNTP that is absent from the dNTP pool; wherein the DNA polymerase incorporates the analog dNTP into a nascent DNA strand as an analog nucleotide in place of incorporation of the corresponding reference dNTP as a corresponding reference nucleotide.
- In embodiments, the electrical resistance of the corresponding reference nucleotide is greater than the electrical resistance of the analog nucleotide.
- In embodiments, the modification of the analog nucleotide increases the electrical charge of the analog nucleotide relative to the electrical charge of the corresponding reference nucleotide.
- In embodiments, the modification of the analog nucleotide decreases the electrical charge of the analog nucleotide relative to the electrical charge of the corresponding reference nucleotide.
- In embodiments, the method is isothermal.
- In embodiments, contacting the DNA strand is performed a plurality of times with a plurality of dNTP pools, wherein the dNTP pools comprise pool-specific analog dNTPs that comprise pool-specific modifications relative to corresponding reference dNTPs that are absent from the dNTP pools.
- In embodiments, contacting the DNA strand is performed three times.
- In embodiments, the method further comprises removing the previous dNTP pool from the reaction mixture between contacting steps.
- In embodiments, removing the previous dNTP pool comprises immobilizing the DNA strand and the nascent DNA strand and washing the reaction mixture such that the previous analog dNTP is removed from the reaction mixture.
- In embodiments, the immobilizing step comprises attaching a DNA molecule comprising the DNA strand, the nascent DNA strand, or both, to a substrate.
- In embodiments, a majority of the dNTPs of the dNTP pool are not analog dNTPs.
- In embodiments, no more than 25% of the dNTPs of the dNTP pool are analog dNTPs.
- In embodiments, a majority of the dNTPs of the dNTP pool are analog dNTPs.
- In embodiments, 100% of the dNTPs of the dNTP pool are analog dNTPs.
- In embodiments, the method further comprises ligating a sequencing adaptor to the DNA strand and the nascent DNA strand to configure the DNA library for nanopore sequencing.
- In embodiments, the sequencing adaptor comprises a barcode that corresponds to the identity of the analog dNTP.
- In embodiments, the method further comprises reverse transcribing an RNA strand to produce the DNA strand comprising the target polynucleotide sequence, wherein the DNA library represents at least part of the RNA fraction of a sample.
- The disclosure also provides a DNA library prepared according to a method of the disclosure.
- The disclosure further provides a method for high-accuracy analysis of nanopore sequencing data obtained from nanopore sequencing of a DNA library of the disclosure, the method comprising: aligning the nanopore sequencing data such that sequence data of the DNA strand is aligned with sequence data of the nascent DNA strand; computing an electrical property of the nanopore sequencing data that corresponds to an analog nucleobase of the nascent DNA strand, to obtain an analog nucleobase measurement; comparing the analog nucleobase measurement with a reference analog nucleobase measurement associated with a known analog nucleobase identity, and assigning the analog nucleobase a nucleobase identity consistent with the known analog nucleobase identity; and assigning, based on the nucleobase identity of the analog nucleobase, a nucleobase identity to an unknown nucleobase of the DNA strand that positionally corresponds to the analog nucleobase of the nascent DNA strand.
- In embodiments, a plurality of nucleobase identities of analog nucleobases of the nascent DNA strand correspond to a plurality of nucleobase identities of unknown nucleobases of the DNA strand.
- In embodiments, the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises one distinct nucleobase identity that corresponds to one contacting step used for DNA library preparation.
- In embodiments, the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises two distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- In embodiments, the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises three distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- In embodiments, the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises four distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- In embodiments, the method is performed at least in part by a programmable processor, processor circuitry, a computational device, a computational system, a computational network, or any combination thereof.
- The disclosure also provides a processor, processor circuitry, a computational device, a computational system, a computational network, or any combination thereof, comprising circuitry configured to perform all or part of a method of the disclosure, in any order or combination of steps.
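The analysis method above can be sketched in Python under stated assumptions. This is a hypothetical illustration, not the disclosed implementation. The sketch makes four assumptions:

- Alignment has already been performed.
- Each analog nucleobase measurement is a single mean ion current in pA.
- The reference currents and the tolerance are made-up values.
- The nascent strand is the Watson-Crick complement of the DNA strand, so the positionally corresponding template base is the complement of the canonical base the analog stands in for.

The names `AnalogReference`, `classify_analog`, and `assign_target_bases` are illustrative only.

```python
from dataclasses import dataclass

# Watson-Crick complements used to map a nascent-strand identity onto the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class AnalogReference:
    """Hypothetical reference measurement for one analog nucleobase."""
    name: str            # illustrative label, e.g. "G*"
    mean_current: float  # assumed electrical property: mean ion current (pA)
    canonical: str       # canonical nucleobase that the analog replaces


def classify_analog(measurement, references, tolerance):
    """Return the closest reference, or None if none lies within the tolerance."""
    best = min(references, key=lambda ref: abs(ref.mean_current - measurement))
    if abs(best.mean_current - measurement) <= tolerance:
        return best
    return None


def assign_target_bases(aligned_measurements, references, tolerance=2.0):
    """Assign DNA-strand bases from aligned analog measurements.

    aligned_measurements: (template_position, measurement) pairs, where the
    position is the DNA-strand position paired with the analog nucleobase.
    Returns {template_position: base}; unclassified positions are left out.
    """
    calls = {}
    for position, measurement in aligned_measurements:
        ref = classify_analog(measurement, references, tolerance)
        if ref is not None:
            # The template base pairs with the analog, so it is the complement
            # of the canonical base that the analog stands in for.
            calls[position] = COMPLEMENT[ref.canonical]
    return calls


refs = [AnalogReference("G*", 60.0, "G"), AnalogReference("A*", 40.0, "A")]
print(assign_target_bases([(0, 59.2), (3, 41.0), (5, 50.0)], refs))  # {0: 'C', 3: 'T'}
```

Position 5 stays unassigned because its measurement is far from every reference. A real pipeline would more likely use a probabilistic score than a hard tolerance.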
- DESCRIPTION OF THE DRAWINGS
- FIGs 1A-1D show examples of variable voltage nanopore sequencing, according to aspects of the disclosure.
- FIG. 1A: Schematic of the nanopore setup.
- FIG. 1B: Because DNA is elastic, different applied voltages (forces) stretch the DNA to differing degrees.
- FIG. 1C: (left) Constant-voltage sequencing yields information only about the average current; current-level degeneracies contribute significantly to sequencing errors. (right) Using a variable voltage to floss the DNA back and forth, one can sample various locations along the DNA and extract smooth curve segments. These smooth curve segments are what one would see if DNA were translocated smoothly and continuously through the pore.
- FIG. 1D: Evaluation of sequencing accuracy involves an alignment step in which the called bases are aligned to the known sequence.
- FIG. 2A shows examples of histograms of constant-voltage conductance caused by translocation of DNA substrates containing 16-nucleotide homopolymers of each Hachimoji base (excluding G), according to aspects of the disclosure.
- Homopolymers of P and Z give the lowest and highest conductance signals, respectively, evidence that the Hachimoji system has an expanded nanopore signal range relative to the standard alphabet.
- Two factors contribute to the difficulty of high-accuracy nanopore sequencing: the number of bases that contribute to each sequencing kmer, and the limited current range in which these kmer-induced currents lie. Using previous approaches, many kmers are difficult to distinguish from one another.
- FIGs 2B and 2C show, as an example (FIG. 2B), variable-voltage consensus patterns of Hachimoji single-base substitutions within a pseudorandom sequence, according to aspects of the disclosure.
- Variable-voltage nanopore sequencing captures more sequence information than the current levels obtained from constant-voltage analysis.
- The figure also shows an example (FIG. 2C) confusion matrix of the basecalling algorithm's accuracy. It shows that Hachimoji single-base substitutions are distinguished with high confidence using variable-voltage nanopore sequencing. Reads are included in base calling only if they align to one of the consensus patterns with confidence > 90%.
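The read-inclusion rule and the confusion matrix can be illustrated with a short Python sketch. The input format (one tuple per called substitution, carrying an alignment confidence) and the example values are assumptions for illustration, not data from FIG. 2C.

```python
from collections import Counter


def confusion_matrix(calls, min_confidence=0.90):
    """Tally (true base, called base) pairs and compute overall accuracy.

    calls: iterable of (true_base, called_base, alignment_confidence).
    Only reads with confidence strictly greater than min_confidence are kept,
    mirroring the "> 90%" inclusion rule.
    """
    kept = [(true, called) for true, called, conf in calls if conf > min_confidence]
    matrix = Counter(kept)
    if not kept:
        return matrix, float("nan")
    correct = sum(n for (true, called), n in matrix.items() if true == called)
    return matrix, correct / len(kept)


# Made-up calls: the low-confidence B read is excluded from the matrix.
example = [("P", "P", 0.95), ("Z", "Z", 0.99), ("P", "Z", 0.91), ("B", "B", 0.50)]
matrix, accuracy = confusion_matrix(example)
print(dict(matrix), round(accuracy, 3))
```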
- FIG. 3 shows an example scheme for boosting nanopore sequencing accuracy of supernumerary DNA using additional AEGIS alphabets, according to aspects of the disclosure.
- Many AEGIS bases are interchangeable with one another; i.e., there exist two versions of the base Z, four versions of the base C, etc., that are compatible with polymerases.
- Strands can be built that contain multiple copies of the original sequence using differing AEGIS alphabets.
- An A-tailing step is performed (step 2).
- Hairpin adapters can be ligated to the sequence of interest (step 3).
- The strand can then be zipped up using an alternative AEGIS alphabet (step 4).
- This hairpin construct can again be A-tailed (step 5), and another hairpin adapter ligated (step 6).
- This hairpin can again be zipped up with a strand-displacing polymerase using a third AEGIS alphabet. Then, as the strand is read in a nanopore sequencing read, comparisons of reads of the sense (S) and antisense (S') strands from each alphabet can be combined (FIGs 4A and 4B). This identifies individual bases within the read and significantly simplifies the sequencing problem.
- FIGs 4A and 4B show example schematic comparisons of reads of the same letter sequence (8 letters, ACGTPZKX, in FIG. 4A; 4 letters in FIG. 4B), according to aspects of the disclosure. The comparisons use different heterocyclic variants of an 8-letter AEGIS alphabet or of a 4-letter alphabet (S, S' in FIGs 3, 4A, and 4B). If a strand constructed as shown in FIG. 3 is read by the nanopore, several reads of the same "letter sequence," as identified by base-pairing behavior, can be obtained using different AEGIS alphabets. In the first comparison of the sequence "S," heterocyclic variants of X and P are used. These base "doppelgangers" are marked in the figure.
- The two reads can then be aligned based on shared ion currents (displayed here with a constant applied voltage for simplicity). Ion current increases indicate the locations of X, and current decreases indicate the locations of P, in the primary strand.
- Reads of the complementary sequence, "S'," using a third alphabet, can be used to gain sequence information on additional bases, in this case C and A. Combining this information significantly simplifies the sequencing problem for supernumerary DNA.
- A similar application to 4-letter DNA (FIG. 4B) uses a G base analog. This allows identification of the locations of all G's in the primary sequence and of all C's in the complementary sequence.
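Several partial reads can narrow down each position, and this can be sketched as set intersection over per-position candidate bases. This is an illustrative model, not the disclosed algorithm. It assumes each read reports the primary-strand positions at which one substituted base's analog was detected. Positions from complementary-strand reads are assumed to have already been converted to primary-strand coordinates. It also assumes every copy of that base was replaced, so an unmarked position cannot be that base. `combine_marker_reads` is a hypothetical name.

```python
def combine_marker_reads(length, marker_reads, alphabet="ACGT"):
    """Intersect per-position candidate sets using analog-marker reads.

    marker_reads maps a base to the set of primary-strand positions where its
    analog was detected. Assumes full substitution: a position without the
    marker cannot be that base.
    """
    candidates = [set(alphabet) for _ in range(length)]
    for base, positions in marker_reads.items():
        for i in range(length):
            if i in positions:
                candidates[i] &= {base}       # marker seen: identity is fixed
            else:
                candidates[i].discard(base)   # marker absent: rule this base out
    return candidates


# Hypothetical primary sequence "GACT": one read locates G, another locates C.
result = combine_marker_reads(4, {"G": {0}, "C": {2}})
print(["".join(sorted(c)) for c in result])  # ['G', 'AT', 'C', 'AT']
```

A further read marking A (or T) would resolve the remaining positions. This illustrates how combining alphabets shrinks the sequencing problem.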
- FIG. 5A shows examples of how rearranging hydrogen bond donor and acceptor groups on purine-pyrimidine pairs increases the number of independently replicable information units in a DNA-like evolvable system from 4 to 12, according to aspects of the disclosure.
- FIG. 5B shows a range of example AEGIS Z variants provided herein, according to aspects of the disclosure. All have the same general hydrogen bonding pattern, but may be implemented with different heterocycles and sugar substituents. This allows the tuning of the acid-base properties of the heterocycle, influencing Z contribution to AEGISZyme catalysis and influencing the Z contribution to the measured ion current, enabling base identification, according to aspects of the disclosure.
- FIG. 6A shows (left) a range of example AEGIS B provided herein, according to aspects of the disclosure, with the same general hydrogen bonding pattern implemented with different heterocycles. These allow the tuning of tautomerism properties of the heterocycle to prefer the keto or enol tautomer, (right) N- and C-glycoside variants of C and T. This perturbs the stability of the nucleotide to chemical degradation and repair enzymes, which can be desired or not depending on the application. Each nucleotide variant is configured to produce a different ion current when held in the pore constriction. These differences can be used to locate particular bases.
- FIG. 6B shows some examples of the functionalized AEGIS variants provided herein, according to aspects of the disclosure. These increase the catalytic potential of AEGIS libraries. It is noted how the boronate side chain (right) allows an AEGIS library to be enriched in components that bind glycoproteins. Each nucleotide variant is configured to produce a different ion current when held in the pore constriction. These differences can be used to locate particular bases.
- FIG. 6C shows an example generic synthesis of C-glycosides, illustrated for a variant of AEGIS Z, according to aspects of the disclosure.
- FIG. 7A shows validation of a strand with agarose gel electrophoresis, according to aspects of the disclosure.
- Lane 1: sequencing adapter. Lanes 2-4: result of elongation with normal cytosine (Lane 2), 5mC (Lane 3), or 5-hmC (Lane 4).
- Lanes 5, 6, and 7 show the full sequence after sequencing adaptor ligation for C, 5mC, and 5-hmC strands, respectively.
- Identified bands are a) HP2, ~70 bases (~35 bp); b) original sequence "S", ~40 bp.
- d) Extended sequence, 180 bp.
- FIG. 7B shows a kmer-map prediction for ACGT DNA (black) compared to nanopore read of DNA in which 5mC replaces all Cs, according to aspects of the disclosure.
- The first 45 bases of the read are from the sequencing adapter and contain C.
- The vertical dashed line in the lower figure indicates the boundary between the adapter and the primary sequence containing 5mC.
- The effect of 5mC on the ion current depends on the surrounding sequence context. A key element of this result is that substituting 5mC for all C significantly alters the kmer map. Reads of the same sequence containing C or 5mC therefore produce orthogonal information, which can be used to significantly boost sequencing accuracy.
- FIG. 8A shows results from performing one round of replication using an alphabet of dNTPs that produces a high-contrast current signal in the nanopore. This enables high-accuracy sequencing of 6-letter DNA (ACGTPZ); selection of various modified dNTPs enables bases to be easily distinguished from one another.
- FIG. 8B shows an example scheme for generating DNA strands which enable multiple reads of the same primary sequence in multiple different alphabets, according to aspects of the disclosure.
- FIG. 8C shows an ion current histogram for ACGT 4-mers with M2-MspA, according to aspects of the disclosure.
- FIG. 9A shows an example of library preparation to generate data for sequencing model training, according to aspects of the disclosure.
- FIG. 9B shows the joint distribution of kmer and anti-kmer ion currents, according to aspects of the disclosure.
- A slice through the distribution at the most populous kmer current bin shows how the anti-kmer current can help break sequencing degeneracies.
- Adding kmer currents in additional alternative alphabets could further resolve kmers. This could be done in particular by targeting the bases responsible for kmers that remain poorly resolved from one another in duplex sequencing (cluster in light gray).
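The degeneracy-breaking effect of a second data stream can be illustrated by binning currents and counting kmers that share a bin in every stream. This is a toy sketch. The fixed-width binning rule, the bin width, and the kmer currents below are invented for illustration and do not come from FIG. 9B.

```python
from collections import defaultdict


def unresolved_kmers(current_maps, bin_width):
    """Count kmers that cannot be told apart by binned currents.

    current_maps: list of {kmer: current_pA} dicts, one per data stream (for
    example, kmer and anti-kmer currents). Two kmers are degenerate if they
    fall in the same bin in every stream.
    """
    groups = defaultdict(list)
    for kmer in current_maps[0]:
        key = tuple(int(m[kmer] // bin_width) for m in current_maps)
        groups[key].append(kmer)
    return sum(len(members) for members in groups.values() if len(members) > 1)


# Made-up currents for three toy kmers (not measured values).
kmer_currents = {"AACG": 50.0, "CAGT": 50.5, "GTTA": 70.0}
anti_kmer_currents = {"AACG": 30.0, "CAGT": 45.0, "GTTA": 30.0}
print(unresolved_kmers([kmer_currents], 2.0))                      # 2
print(unresolved_kmers([kmer_currents, anti_kmer_currents], 2.0))  # 0
```

Adding a third map for an alternative alphabet only appends another coordinate to each key. The count of unresolved kmers therefore cannot increase as streams are added.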
- FIG. 9C shows sequencing with multiple data streams amounts to a multiple sequence alignment problem in which data stream A is a nanopore read of the sense strand and data stream B is a nanopore read of the antisense strand, according to aspects of the disclosure.
- The third axis of the "multiple sequence alignment" (MSA) is the kmer map. Steps in the horizontal plane amount to a Needleman-Wunsch alignment of A and B, while vertical steps in the cube are a Viterbi sequencing algorithm that takes match scores for both data streams.
- The faces of the cube reduce to alignment of A to B (top face), sequencing of A (right face), and sequencing of B (left face). Additional data streams can be used by performing higher-dimensional alignments, which are difficult to visualize but straightforward to implement on a computer.
- FIG. 9D shows a model demonstrating that one need only consider a small subset of the elements of the MSA hyper-cube, according to aspects of the disclosure. These elements can be determined from pairwise alignments of the reads and individual read sequencing which form the faces of the hyper-cube.
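The top face of the cube (alignment of stream A to stream B) can be sketched as a Needleman-Wunsch global alignment over current levels. Only that face is shown; the kmer-map axis and the Viterbi step are omitted. The sketch assumes both streams are already in the same orientation (an antisense read would first need to be reversed). The cost model (absolute current difference for a match, a fixed gap cost) and the name `align_streams` are illustrative assumptions, not the scoring used in the disclosure.

```python
def align_streams(a, b, gap=5.0):
    """Globally align two current-level streams (Needleman-Wunsch, minimizing cost).

    Matching a[i] with b[j] costs |a[i] - b[j]|; a gap in either stream costs
    `gap`. Returns (total cost, list of (index_in_a or None, index_in_b or None)).
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap  # leading gaps in b
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap  # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match/mismatch
                cost[i - 1][j] + gap,                          # gap in b
                cost[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return cost[n][m], pairs[::-1]


total, alignment = align_streams([10.0, 20.0, 30.0], [10.0, 30.0])
print(total, alignment)  # 5.0 [(0, 0), (1, None), (2, 1)]
```

Extending this to the full cube would add a third index over kmer-map states. Higher-dimensional variants add one index per data stream, as described above.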
- FIG. 10 shows an example diagram of sequencing of + and - strands (left), sequencing + and - sense strands with a second high-contrast alphabet (middle), and sequencing + and - sense strands with two or more high-contrast alphabets (right), according to aspects of the disclosure.
- The methods of the disclosure can be performed iteratively, using two or more high-contrast dNTP pools in turn for improved nanopore sequencing.
- FIG. 11A shows the set of Hachimoji DNA bases, which includes A, T, G, C, Z, P, S, and B, in a hydrogen bonded configuration, according to aspects of the disclosure.
- FIG. 11B shows an illustration of an example nanopore, according to aspects of the disclosure.
- The figure shows single-stranded DNA (ssDNA) in the nanopore and the motion of the DNA due to a voltage potential across the bilayer, Hel308 helicase, and MspA.
- The letter x designates a span of nucleotides (nt) positioned within the interior of the MspA protein.
- FIG. 11C shows ion current (pA) as a function of time (s), according to aspects of the disclosure.
- The figure shows ion current changes as the non-canonical bases of a DNA polynucleotide strand transition from the pore (left portion of the graph, between two vertical dotted lines) to the helicase (right portion of the graph, between two vertical dotted lines).
- Nucleic acids have a wide array of natural and synthetic applications. Synthetic nucleic acids can be comprised of the canonical four nucleobases, as well as additional, non-canonical nucleobase analogs. Nanopore sequencing of natural, as well as synthetic nucleic acids, is in significant need of improvements in accuracy.
- AEGIS: artificially expanded genetic information system
- This disclosure provides innovative tools to sequence full 12-letter AEGIS DNA de novo. These tools also make variable-voltage sequencing more accessible.
- Differentiation between canonical nucleobases of a target sequence can be facilitated by "doping" a copy of the target sequence with one or more nucleobase analogs. The "doped" copy (containing at least one nucleobase analog) then produces a distinct electric current signature relative to a corresponding "canonical-only" copy of the target sequence (containing only G, A, T, C).
- Alignment of sequences enables identification of the location of the nucleobase analog in the doped sequence, which corresponds to the location of a corresponding canonical nucleobase in the target sequence and enables identification of that corresponding canonical nucleobase in the target sequence. This increases the accuracy of nanopore sequencing builds and reduces the size of the sequencing problem.
- Nanopore sequencing as disclosed herein not only enhances nanopore sequencing of any target polynucleotide sequence, but also enables AEGIS-LIVE to produce a new class of research, diagnostics, and therapeutic tools at low cost with fast response (in just weeks) that are catered to specific needs.
- These tools include AEGISBodies and AEGIS-Zymes: 30-40-nucleotide DNA strands that, depending on how they are evolved, bind targets or, after binding, attach themselves to the targets, modify the targets, or are transported with the targets.
- AEGIS-Zymes can serve as "manipulating evolvable drugs" (MEDs) that activate, inactivate, or modulate targets. They can also cross bio-barriers to enter cells, brain, and other privileged tissues in "mirror image" stable and active forms. In addition, they allow cargo molecules (drugs) to acquire the pharmacodynamics of bound proteins, e.g., albumin or IgG. Finally, they can be made in weeks to target cells from individual patients, potentially providing a low-cost "personalized pharmacopeia".
- MEDs: manipulating evolvable drugs
- Analytical tools can determine what has emerged by evolution of AEGIS DNA. For example, with Rokumoji LIVE experiments, pools of Rokumoji DNA were sequenced using technology that used transliteration. Transliteration used PCR conditions that convert P to a mixture of A and G, and Z to a mixture of C and T. The mixtures were then subjected to deep sequencing. Bioinformatics identifies reads that descend from a single component in the pool, and deconvolutes the sequence. The sequence of the ancestral DNA molecule in the pool at positions where all of its descendants have G, A, C, or T are assigned to be G, A, C, or T. If a position in the descendant has both T and C, a Z is assigned at that position in the ancestral molecule in the pool. If a position in the descendant has both G and A, a P is assigned at that position in the ancestral molecule.
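The deconvolution rules above can be written as a short Python function. It assumes the descendant reads have already been grouped by ancestor and aligned to equal length. The `N` output for column patterns that the stated rules do not cover is an added assumption, and `deconvolute_ancestor` is a hypothetical name.

```python
def deconvolute_ancestor(descendant_reads):
    """Infer a 6-letter ancestral sequence from aligned, transliterated descendants.

    descendant_reads: equal-length strings over G/A/C/T that descend from one
    ancestral molecule. A column with a single base keeps that base; a C/T mix
    implies Z; a G/A mix implies P.
    """
    ancestor = []
    for column in zip(*descendant_reads):
        bases = set(column)
        if len(bases) == 1:
            ancestor.append(column[0])
        elif bases == {"C", "T"}:
            ancestor.append("Z")
        elif bases == {"A", "G"}:
            ancestor.append("P")
        else:
            ancestor.append("N")  # assumption: flag patterns the rules do not cover
    return "".join(ancestor)


print(deconvolute_ancestor(["GACT", "GGTT", "GACT"]))  # GPZT
```

With few descendants, a Z or P position can by chance show only one transliterated base and be mis-assigned as C, T, G, or A. This sketch does not model that.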
- The transliteration strategy can only be applied to 6-letter DNA. A broader sequencing approach is needed if additional orders of magnitude in performance are to be obtained from an AEGIS-LIVE that exploits more of the expanded genetic alphabet.
- Nanopore sequencing technology offers such a broader approach. It is "label free," and is thus less challenging to develop and use than the "cyclic reversible terminator strategy" used, for example, on an Illumina® platform.
- A challenge in adapting nanopore DNA sequencing for AEGIS systems is the sheer number of 4-base-long "kmers" for 8-, 10-, and 12-letter DNA (4096, 10000, and 20736, respectively). Add to that the large number of heterocyclic variations for many of the AEGIS nucleotides, and the parameter space becomes vast.
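The kmer counts quoted above follow from raising the alphabet size to the kmer length (4). A quick check in Python:

```python
def kmer_count(alphabet_size, k=4):
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


for letters in (4, 8, 10, 12):
    print(f"{letters}-letter DNA: {kmer_count(letters)} possible 4-mers")
```

For comparison, standard 4-letter DNA has only 256 possible 4-mers.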
- The pore and enzyme used by ONT are specifically tuned for sequencing 4-letter ATCG DNA and may not be readily tunable for sequencing with AEGIS bases.
- Initial results with MspA variable-voltage sequencing were unexpectedly good. Hachimoji DNA exhibits a broader signal range in nanopore sequencing than standard DNA alone. Information about the additional bases is encoded across a broader current range, facilitating base recognition. Hachimoji single-base substitutions are distinguishable with high confidence.
- Nanopore sequencing can use a helicase molecular motor to control the motion of DNA.
- The compatibility of the Hel308 motor enzyme with Hachimoji nucleotides was assessed by tracking the translocation of single Hel308 molecules along Hachimoji DNA. This monitored both the enzyme kinetics and premature enzyme dissociation from the DNA.
- Hel308 is compatible with Hachimoji DNA but can dissociate more frequently when walking over C-glycoside nucleosides than over N-glycosides.
- High-accuracy nanopore sequencing as per approaches of the disclosure can enable:
  - development of sub-picomolar receptors and ligands on demand, with turnaround times of weeks;
  - delivery of macromolecular receptors and ligands into cells, which is not possible today, with AEGISBodies that bind cell-surface proteins such as XRCC5 and are internalized, or that bind transferrin to move across the blood-brain barrier;
  - evolvable drugs that manipulate the targets they bind; and
  - "personalized pharmacopeias," where the low cost and rapid turnaround of AEGIS-LIVE allow, for example, anti-cancer drug delivery systems to evolve with the evolution of cancer in a specific patient.
- The disclosure provides example approaches for developing sequencing of the 8-letter AEGIS alphabet (ATCGPZKX), for which working polymerases exist.
- AEGIS-LIVE is possible with 10- and 12-letter DNA; accordingly, the disclosure also provides approaches for 10- and 12-letter sequencing.
- Variable-voltage sequencing and the sequence-dependent kinetics of the motor enzyme can be used to extract additional sequence information and gain high accuracy.
- The enzyme motor can be modified by single amino acid replacements, for example by mutagenesis of a gene encoding Hel308 from Thermococcus gammatolerans, codon-optimized for expression in E. coli. Sites for amino acid replacement can be chosen by their proximity to, and interaction with, the nucleic acid and the ATP-binding pocket. Variant forms of the protein can be characterized by expression level and activity on the nanopore.
- AEGIS components can be modified by attaching acetylene linkers at the 5-position of the small "pyrimidine" analogs, or at the 7-position of the large "purine" analogs. The linkers are attached by palladium-catalyzed coupling with the corresponding iodo-heterocycle, which is obtained by iodinating the base heterocycle with N-iodosuccinimide.
- Yet another aspect of improved nanopore sequencing involves building sequencing models of expanded DNA alphabets. This can include designing and building synthetic oligos containing the various kmers within the expanded DNA libraries. These oligos can be measured on the nanopore system and incorporated into sequencing models.
- The kmer size for MspA is ~4 nucleotides, with the central dimer contributing most of the sequence information. It is thus possible to generate a rough kmer map by measuring all possible dimers within a given alphabet. Even for 12-letter DNA, there exist only 144 possible dimers, which can be measured with a few synthetic oligos. While this map may give poor sequencing accuracy, it can serve as an initial guide for automated alignment of subsequent strands. It can also provide insight into which bases can be tweaked to improve sequencing accuracy.
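A rough kmer map of this kind can be sketched by assigning each 4-mer the current of its central dimer. This is a crude approximation under stated assumptions. The central dimer is taken to be the second and third bases of the 4-mer, and the contributions of the flanking bases are ignored. The dimer currents below are invented placeholders, not measurements, and `rough_kmer_map` is a hypothetical name.

```python
from itertools import product


def rough_kmer_map(dimer_currents, alphabet, k=4):
    """Approximate every k-mer's current by the current of its central dimer."""
    start = (k - 2) // 2  # for k = 4 the central dimer is at 0-based indices 1 and 2
    kmer_map = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        kmer_map[kmer] = dimer_currents[kmer[start:start + 2]]
    return kmer_map


# Placeholder dimer currents (pA) for a toy two-letter alphabet.
dimers = {"AA": 10.0, "AC": 20.0, "CA": 30.0, "CC": 40.0}
toy_map = rough_kmer_map(dimers, "AC")
print(len(toy_map), toy_map["CACA"], toy_map["ACCA"])  # 16 20.0 40.0

# Number of dimers that must be measured for a 12-letter alphabet.
print(len(list(product(range(12), repeat=2))))  # 144
```

Kmers sharing a central dimer receive identical currents in this map. That is why it can guide alignment but cannot by itself give high sequencing accuracy.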
- Motor enzymes used for nanopore sequencing have displayed kinetics that depend on the DNA sequence passing through the motor enzyme. It was found that AEGIS bases P, Z, S, and B also interact with the motor enzyme in a unique way, affecting both step dwell time and enzyme backsteps.
- Helicase mutants developed to increase the amount of sequence information contained within the enzyme’s kinetics can also be applied to the sequencing of AEGIS DNA.
- AEGIS nucleotides may also not work well with the standard motor enzyme. This problem can be addressed in either or both of two ways: 1) mutation of the helicase to accommodate C-glycosides; and 2) selection of AEGIS-nucleotides that are compatible with the motor enzyme.
- Hel308-MspA variable-voltage sequencing is performed on a single pore with a single amplifier.
- This research and development setup allows for flexibility and rapid prototyping but may lack throughput.
- Once sequencing of AEGIS DNA is established, one can build parallelized setups capable of collecting significantly larger volumes of data. These setups can be used to collect large datasets to train kmer models and, ultimately, to sequence large, diverse pools of AEGISBodies and AEGISZymes.
- Strands including several concatenated "dual alphabets" (alphabets with the same base-pairing scheme but different heterocyclic variants) can be combined and sequenced serially (FIGs 8A-8B).
- The base-pairing nature of each base is preserved, and reads are from single molecules.
- Various single-molecule re-reading strategies can also be employed to further improve single-molecule sequencing accuracy.
- the disclosure provides a method for preparing a DNA library for high-accuracy nanopore sequencing, the method comprising: contacting a DNA strand comprising a target polynucleotide sequence with a DNA polymerase and a dNTP pool comprising an analog dNTP that comprises a modification relative to a corresponding reference dNTP that is absent from the dNTP pool.
- the DNA polymerase incorporates the analog dNTP into a nascent DNA strand in place of the corresponding reference dNTP.
- a particular target polynucleotide sequence can be copied using one or more dNTP pools comprising one or more analog dNTPs that are “swapped out” for one or more canonical dNTPs.
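The "swap out" step can be illustrated with a toy model of template copying. This is a minimal sketch under stated assumptions, not the disclosed protocol: strands are plain strings written 5'→3', polymerase fidelity is assumed perfect, and the analog token `m` (standing in for 5mC) and the helper `copy_with_pool` are hypothetical names chosen for illustration.

```python
# Canonical Watson-Crick partners.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def copy_with_pool(template: str, swaps: dict[str, str]) -> str:
    """Return the nascent strand (5'->3') synthesized on `template` (5'->3').

    `swaps` maps a canonical base to the hypothetical analog token that
    replaces it in the dNTP pool, e.g. {"C": "m"} when dCTP is absent and
    d5mCTP is supplied instead.
    """
    nascent = []
    # The polymerase reads the template 3'->5', so walk it in reverse.
    for base in reversed(template):
        canonical = COMPLEMENT[base]
        # An analog is incorporated wherever its canonical dNTP would have been.
        nascent.append(swaps.get(canonical, canonical))
    return "".join(nascent)
```

For example, `copy_with_pool("ATGC", {"C": "m"})` yields `"GmAT"`, whereas an all-canonical pool yields the ordinary reverse complement `"GCAT"`; because the reference dNTP is absent from the pool, no canonical C appears in the copy.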
- the modified sequence(s) having the analog dNTPs incorporated therein and resulting from copying the target polynucleotide sequence can be compared with each other to increase statistical confidence in nucleotide base-calling for the original target polynucleotide sequence.
- an electrical resistance of the corresponding reference dNTP differs from an electrical resistance of the analog dNTP. In embodiments, the electrical resistance of the corresponding reference dNTP is greater than the electrical resistance of the analog dNTP. In embodiments, the electrical resistance of the corresponding reference dNTP is less than the electrical resistance of the analog dNTP. In embodiments, the modification of the analog dNTP increases an electrical charge of the analog dNTP relative to an electrical charge of the corresponding reference dNTP. In embodiments, the modification of the analog dNTP decreases an electrical charge of the analog dNTP relative to an electrical charge of the corresponding reference dNTP.
- the method is isothermal and does not require heat to be applied for various steps of the method to be performed.
- This can be advantageous for library preparations involving many different nucleic acid sequences. For example, if strands are melted and re-hybridized, they can form off-target duplexes; by using strand displacing polymerases, off-target hybridization can be avoided. This is also advantageous because instrumentation for thermocycling the reaction is not necessarily required.
- the contacting step is performed a plurality of times with a plurality of dNTP pools, wherein dNTP pools of the plurality of dNTP pools comprise pool-specific analog dNTPs that comprise pool-specific modifications relative to corresponding reference dNTPs that are absent from the dNTP pools.
- the contacting step can be performed two times, three times, four times, five times, or more. In at least some embodiments, the contacting step is performed three times.
- the method further comprises removing previous dNTP pools from a reaction mixture between contacting steps.
- the removing step comprises immobilizing the DNA strand and the nascent DNA strand and washing the reaction mixture such that a previous analog dNTP is removed from the reaction mixture.
- the immobilizing step comprises attachment of a DNA molecule comprising the DNA strand, the nascent DNA strand, or both, to a substrate. Any of various substrates can be used, provided the substrate is at least relatively immobile and configured to hold the DNA molecule comprising the DNA strand, the nascent DNA strand, or both.
- the method further comprises ligating a sequencing adaptor to the DNA strand and the nascent DNA strand to configure the DNA library for nanopore sequencing. While any sequencing adaptor can be used, in various embodiments, the sequencing adaptor comprises a barcode that corresponds with an identity of the analog dNTP.
- the method further comprises reverse transcribing an RNA strand to produce the DNA strand comprising the target polynucleotide sequence, wherein the DNA library represents at least part of an RNA fraction of a sample.
- the method comprises: aligning the nanopore sequencing data such that sequence data of the DNA strand is aligned with sequence data of the nascent DNA strand; computing an electrical property of the nanopore sequencing data that corresponds to an analog nucleobase of the nascent DNA strand for an analog nucleobase measurement; comparing the analog nucleobase measurement with a reference analog nucleobase measurement associated with a known analog nucleobase identity and assigning a nucleobase identity of the analog nucleobase consistent with the known analog nucleobase identity; and assigning, based on the nucleobase identity of the analog nucleobase, a nucleobase identity to an unknown nucleobase of the DNA strand that positionally corresponds to the analog nucleobase of the nascent DNA strand.
- the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises one distinct nucleobase identity that corresponds to one contacting step used for DNA library preparation.
- the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises two distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises three distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
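The analysis steps above (compute an electrical property for an analog nucleobase, compare it with reference measurements, assign the analog identity, then assign the positionally corresponding base of the original strand) can be sketched as follows. This is an illustrative sketch only: alignment is assumed to be done already, the electrical property is taken to be the mean ionic current of an event, and the analog labels (`"A*"` = analog replacing A, etc.) and their reference currents are hypothetical values, not measured data.

```python
from statistics import mean

# Hypothetical analog labels and made-up reference mean currents (pA). Real
# values depend on the pore, motor enzyme, applied voltage and k-mer context.
REFERENCE_CURRENT_PA = {"A*": 48.0, "C*": 62.0, "G*": 40.0, "T*": 71.0}
# Canonical base that each (hypothetical) analog replaces in the nascent strand.
ANALOG_TO_CANONICAL = {"A*": "A", "C*": "C", "G*": "G", "T*": "T"}
PAIRS_WITH = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_analog(samples: list[float]) -> str:
    """Assign the analog whose reference current is closest to the event mean."""
    measurement = mean(samples)
    return min(
        REFERENCE_CURRENT_PA,
        key=lambda label: abs(REFERENCE_CURRENT_PA[label] - measurement),
    )


def assign_template_bases(aligned_events: dict[int, list[float]]) -> dict[int, str]:
    """Map analog calls in the nascent strand to bases of the original strand.

    `aligned_events` is assumed to key current samples by the template
    position each nascent-strand analog pairs with (i.e. after alignment).
    """
    calls = {}
    for position, samples in aligned_events.items():
        canonical = ANALOG_TO_CANONICAL[call_analog(samples)]
        # The unknown template base is the pairing partner of the analog's base.
        calls[position] = PAIRS_WITH[canonical]
    return calls
```

With these toy numbers, events averaging about 62 pA, 48 pA and 70 pA are called as C*, A* and T*, so the template positions are assigned G, T and A. A practical implementation would replace nearest-mean matching with a likelihood model over k-mer context.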
- a method is performed at least in part by a programmable processor, a processor circuitry, a computational device, a computational system, a computational network, or any combination thereof.
- the disclosure also provides a kit comprising an instructional material and at least one element for performing a method of the disclosure.
Circuitry, Processor, and Computer Implementations
- circuitry includes circuits, such as, for example, microprocessors or portions of microprocessors, that require software, firmware, and the like for operation.
- circuitry includes an implementation comprising one or more processors or portions thereof and accompanying software, firmware, hardware, and the like.
- circuitry includes a baseband integrated circuit or applications processor integrated circuit or a similar integrated circuit in a server, a cellular network device, other network device, or other computing device.
- circuitry includes one or more remotely located components (e.g., server, server cluster, server farm, virtual private network, etc.); circuitry can also include non-remotely located components (e.g., desktop computer, workstation, mobile device, controller, etc.).
- remotely located components are operatively connected via one or more receivers, transmitters, transceivers, or the like.
- Embodiments include one or more data stores that, for example, store instructions and/or data.
- Non-limiting examples of one or more data stores include volatile memory (e.g., Random Access memory (RAM), Dynamic Random Access memory (DRAM), or the like), non-volatile memory (e.g., Read-Only memory (ROM), Electrically Erasable Programmable Read-Only memory (EEPROM), Compact Disc Read-Only memory (CD-ROM), or the like), persistent memory, or the like.
- Further non-limiting examples of one or more data stores include Erasable Programmable Read-Only memory (EPROM), flash memory, or the like.
- the one or more data stores can be connected to, for example, one or more computing devices by one or more instruction, data, or power buses.
- circuitry includes one or more computer-readable media drives, interface sockets, Universal Serial Bus (USB) ports, memory card slots, or the like, and one or more input/output components such as, for example, a graphical user interface, a display, a keyboard, a keypad, a trackball, a joystick, a touch-screen, a mouse, a switch, a dial, or the like, and any other peripheral device.
- circuitry includes one or more user input/output components that are operatively connected to at least one computing device to control (electrical, electromechanical, software-implemented, firmware-implemented, or other control, or combinations thereof) one or more aspects of the embodiment.
- Nonlimiting examples of signal-bearing media include a recordable type medium such as any form of flash memory, magnetic tape, floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), Blu-Ray Disc, a digital tape, a computer memory, or the like, as well as transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link e.g., transmitter, receiver, transceiver, transmission logic, reception logic, etc.).
- signal-bearing media include, but are not limited to, DVD-ROM, DVD-RAM, DVD+RW, DVD-RW, DVD-R, DVD+R, CD-ROM, Super Audio CD, CD-R, CD+R, CD+RW, CD-RW, Video Compact Discs, Super Video Discs, flash memory, magnetic tape, magneto-optic disk, MINIDISC, non-volatile memory card, EEPROM, optical disk, optical storage, RAM, ROM, system memory, web server, or the like.
- analog dNTP refers to any dNTP that is not a reference, naturally-occurring, or canonical dNTP.
- the one or more processors can be any type of processor(s), such as a microprocessor, a digital signal processor, a multicore processor, etc., coupled to the non-transitory computer readable medium.
- the communication interface can include hardware to enable communication within the computational device and/or between the computational device and one or more other devices.
- the hardware can include transmitters, receivers, and antennas, for example.
- the communication interface can be configured to facilitate communication with one or more other devices, in accordance with one or more wired or wireless communication protocols.
- the display can include a flat-panel display, such as a liquid-crystal display (LCD) or a light-emitting diode (LED) display.
- a user interface can be included as part of the computational device, and can include one or more pieces of hardware used to provide data and control signals to the computing device.
- the user interface can include a mouse or a pointing device, a keyboard or a keypad, a microphone, a touchpad, or a touchscreen, among other possible types of user input devices.
- the user interface can enable an operator to interact with a graphical user interface (GUI) provided by the computing device (e.g., displayed by the display).
- system refers to one or more computational devices, or one or more elements thereof, configured to perform one or more tasks, methods, or processes.
- a system of the disclosure can comprise a computing device.
- the computing device includes one or more processors, a non-transitory computer readable medium, a communication interface, a display, and a user interface. Components of the computing device are linked together by a system bus, network, or other connection mechanism.
- Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
- Example 1 Nanopore sequencing DNA by replacement of nucleotides with analogs.
- the disclosure provides approaches for generating DNA strands which enable multiple reads of the same primary sequence in multiple alternative alphabets.
- a short, self-priming hairpin structure was created.
- this strand included 4-letter ACGT DNA.
- the sequence was elongated using 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) triphosphates in place of normal cytosine (C) and was then ligated to the nanopore sequencing adaptor as in FIGs 5A-6C.
- the product was validated using gel electrophoresis which showed promising results (FIG. 7A). Unsuccessful ligation products were attributed to incomplete A-tailing.
- FIG. 7B shows one such read aligned to the predicted current values for unmodified C.
- Replacement of C with 5mC significantly alters nanopore ion currents, and reads of the target sequence in both ACGT and A(5mC)GT alphabets can yield unique sequence information and significantly increase sequencing accuracy.
- Other modifications can have larger effects, and this approach can be extended to use multiple modified bases, including with use of target sequences derived from plasmids and hairpin adapters synthesized enzymatically via PCR.
- Antibodies are frequently used for pull-down assays to interrogate chromatin organization and epigenetic markers, e.g. chromatin immuno-precipitation, CUT&RUN, etc.
- antibodies are used for diagnostics, as for example in liquid biopsies that aim to detect small numbers of cancer cells circulating in complex biological fluids.
- Antibodies are central to immunotherapy, as drugs themselves (e.g., herceptin), or conjugated to drugs (e.g., Kadcyla), where the antibody delivers toxic drugs (e.g., emtansine) to tumor cells.
- antibodies are generated by slow Darwinian evolution in animal immune systems. If the desired antibody is not in stock, a customer must pay to raise it (or an equivalent), a process that takes months and considerable expense. Furthermore, even when they are in hand, antibodies as protein biologics have severe limitations. In particular, antibodies are either polyclonal or converted to monoclonal form. Here, “one gets what one gets”: having a monoclonal antibody in hand allows researchers to do what that particular monoclonal can do, but only that. Researchers have little opportunity to modify an antibody to obtain a set of antibodies with a range of affinities, a reagent tool kit that would be valuable for analyzing systems over a dynamic range.
- This background drives the significance of various aspects of this disclosure, which provides nanopore sequencing approaches that can enable rapid and inexpensive creation and sequence verification of macromolecules that bind targets, with a significant impact in medicine and other industries.
- Such evolvable biomolecules could be raised against whole cells and counter-selected to not bind other cells; be able to chemically transform targets or attach themselves covalently to those targets; and/or be able to act after entering living cells or crossing the blood-brain barrier.
- Aptamers and catalysts can be rapidly obtained by applying selective pressure to libraries of natural DNA and RNA (nucleic acids; NA).
- in vitro selection, known as “Systematic Evolution of Ligands by Exponential enrichment” (SELEX) or “laboratory in vitro evolution” (Live), selects for desired traits, e.g., binding.
- the successful aptamer can be sequenced to “trim” it, optimize its performance, and attach other things. Further, an advantage of a sequenced aptamer over antibodies is that aptamers can later be synthesized as defined chemical entities and are no longer irreproducible biologics. Aptamers made from regular 4-letter NA were initially hoped to “rival antibodies” in binding, but this potential has not yet been realized. NA binders for targets such as carbohydrates, small molecules, and peptides are now known, and some have entered the clinic after modification of the species originally selected. But the affinities of NAs built from standard 4-letter DNA with unfunctionalized building blocks do not match the pM affinities of antibodies.
- the disclosure provides approaches for nanopore sequencing including results from re-engineering DNA to get a better evolvable platform, with more nucleotides (now 12), more functional groups, better folding, more opportunities for compact folds, sub-picomolar affinity, and increased catalytic power, complete with organic synthesis pipelines, replicating polymerases and analytical chemistry for this “artificially expanded genetic information system” (Aegis).
- Aegis-Live is delivering reagents, ligands, and catalysts with five orders of magnitude better performance than those emerging from standard Live based on a 4-letter DNA alphabet.
- Live applied to Rokumoji DNA libraries gave RokuBodies that bind toxins, breast cancer cells, liver Hep G2 cancer cells, engineered cells, and proteins such as VEGF.
- RokuZymes were evolved that cleave targeted RNA sequences.
- Aegis nanotrains that hold ~50 doxorubicin drug molecules self-attached to RokuBodies that bind liver cancer cells; the conjugate selectively delivered drugs to the liver cancer cells (not normal liver cells) and selectively killed them.
- This disclosure provides devices, systems and methods to sequence yet-more-expanded DNA molecules; increase the power of Aegis-Live to create sub-picomolar receptors and ligands on demand, with turnaround times of weeks, which is not possible today; create replacements for antibodies by chemicals that suffer none of their cost or challenges as biologics; get macromolecular receptors and ligands into cells, which is impossible to date; offer the possibility of evolvable drugs that manipulate targets that they bind; and offer the possibility of “personalized pharmacopeias”, where the low cost and rapid turnaround of Aegis-Live allows, for example, anti-cancer drug delivery systems to evolve with the evolution of cancer in a specific patient.
- Aegis alphabets and nanopore sequencing can be refined and dramatically improved to enable sequencing of expanded DNA alphabets. De novo sequencing of 8-, 10-, and 12-letter DNA is not possible using current techniques; the innovations described herein make it possible. This is a new and ambitious application of nanopore sequencing: de novo sequencing of the large kmer set implied by 6-, 8-, 10- and 12-letter DNA.
- Typical Live experiments yield a library of AegisBodies containing just a handful of the most dominant binders in the form of ~20-50-base sequences flanked by replication primers of known DNA sequence.
- the goal of this sub-aim is to obtain a consensus sequence for one such library with >90% accuracy.
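A consensus and its accuracy can be computed once reads are aligned. The sketch below is a simplified illustration, assuming the reads have already been aligned and gap-padded to a common length; it uses a column-wise majority vote, which is only one of several possible consensus rules, and the helper names are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over reads that are already aligned and padded."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to a common length")
    # most_common(1) returns the most frequent symbol in each column;
    # ties resolve to the symbol seen first.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads))


def identity(called: str, truth: str) -> float:
    """Fraction of positions where the called sequence matches the ground truth."""
    return sum(a == b for a, b in zip(called, truth)) / len(truth)
```

For three 6-letter ACGTPZ reads `["ACGTPZ", "ACGTPA", "AGGTPZ"]`, each containing one error at a different position, the consensus is `"ACGTPZ"` even though no single erroneous read exceeds 5/6 (about 83%) identity.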
- heterocyclic variants of each base retain the base-pairing identity of their respective bases but have different chemical characteristics, which have various uses in AegisBodies and AegisZymes.
- Modified bases can be readily distinguished from one another in the nanopore, and slight chemical differences can lead to profound differences in nanopore ion current, and as such, sequencing of alternative versions of P and Z can be performed.
- analogs such as 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxycytosine all base pair with G in the same way as C does; comparable analogs exist for all other natural bases and can be synthesized as triphosphates that are compatible with replicative polymerases.
- Chemists have developed still more modified bases for various biochemical techniques.
- the bases of Aegis-DNA have many heterocyclic variants (FIG. 3).
- base mods can be selected to ensure maximum sequencing accuracy.
- Such a scheme can enable direct sequencing of 6-letter DNA by choosing an alphabet that reduces the similarities in ion current between different kmers. This can be achieved by targeting bases which are particularly hard to distinguish (such as S and G in FIGs 2B-2C) to try to make them more distinguishable.
- a strategy involves obtaining “duplex reads,” in which both the sense strand S and antisense strand S’ are read by the nanopore. The errors inherent in the antisense read are often orthogonal to those of the sense read, facilitating construction of the sequence.
- the complementary strand is itself an alternative alphabet in which A is swapped for T, C for G, G for C, and T for A.
- This scheme is called multi-alphabet substitution sequencing.
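The observation that the complementary strand is itself a substituted alphabet can be made concrete. In this sketch the base-for-base swap is a single translation table (A↔T, C↔G, plus the AEGIS P↔Z and S↔B pairs), and the reversal reflects that the antisense strand S’ is read 5'→3'. The function names are illustrative, not taken from the disclosure.

```python
# Watson-Crick partners for standard DNA plus the AEGIS P:Z and S:B pairs.
_COMPLEMENT = str.maketrans("ACGTPZSB", "TGCAZPBS")


def complement(seq: str) -> str:
    """Apply the alphabet substitution A<->T, C<->G, P<->Z, S<->B."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Antisense strand S' written 5'->3': substitute, then reverse."""
    return complement(seq)[::-1]
```

For example, `reverse_complement("ACGTPZ")` gives `"PZACGT"`, and applying it twice returns the original sense sequence. This involution is what lets an S’ read be mapped back onto S.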
- Multialphabet sequencing is a novel library preparation technique in which iterative rounds of strand-displacing replication using base-analog dNTP alphabets generate a concatemer strand containing multiple successive copies of a sequence S and its complement S’ (FIG. 5B).
- a nanopore read of the concatemer strand will yield significantly more sequence information because the different alphabets’ constituents can be selected to complement one another’s weaknesses.
- Such a concatemer strand would include at least 4 duplicate reads of the original sequence: i) the primary sequence in alphabet 1, ii) the complement in alphabet 1, iii) the primary sequence in alphabet 2 and iv) the complementary sequence in alphabet 2.
- Information from a single read of a multi-alphabet concatemer can yield high single-molecule sequencing accuracy, leveraging the orthogonal information available in each of the four encodings of the original sequence. If still more information is needed, additional rounds of replication with additional alphabets can be made repeating steps 5 and 6.
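The four encodings in such a concatemer can be modeled as token lists. This is a sketch under simplifying assumptions: each alphabet is a hypothetical mapping from canonical bases to analog tokens (an empty mapping means canonical ACGT), the segment order S, S’ per alphabet is illustrative rather than the exact hairpin geometry of FIG. 5B, and synthesis is assumed error-free.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def encode(seq: str, alphabet: dict[str, str]) -> list[str]:
    """Rewrite a canonical sequence as tokens of one (hypothetical) alphabet."""
    return [alphabet.get(base, base) for base in seq]


def decode(segment: list[str], alphabet: dict[str, str]) -> str:
    """Map analog tokens back to the canonical bases they replace."""
    inverse = {token: base for base, token in alphabet.items()}
    return "".join(inverse.get(token, token) for token in segment)


def build_concatemer(sense: str, alphabets: list[dict[str, str]]) -> list[list[str]]:
    """Return segments S and S' for each alphabet, in order of synthesis."""
    antisense = reverse_complement(sense)
    segments = []
    for alphabet in alphabets:
        segments.append(encode(sense, alphabet))
        segments.append(encode(antisense, alphabet))
    return segments
```

With `build_concatemer("ACCT", [{}, {"C": "5mC"}])`, the third segment is `["A", "5mC", "5mC", "T"]`, and decoding every segment (reverse-complementing the S’ copies) recovers `"ACCT"` four times. A third or fourth alphabet simply appends more entries to `alphabets`.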
- This scheme can involve implementing sets of alternative alphabets with orthogonal sequence information and sequencing algorithms capable of combining reads from several alphabets. Once implemented, their use can be put to practice to demonstrate their utility.
- results from selectively modified bases can be used to create alphabets which complement one another’s weaknesses.
- the scheme of FIG. 6B enables nanopore reads of the same primary sequence in four separate alphabets. Modified Aegis components can be authenticated. Sequencing accuracy with a particular sequencer can be tested using blind strands generated by FfAME. 6-letter ACGTPZ DNA will serve as the proving ground for this technique.
- multialphabet sequencing is capable of achieving arbitrarily high single-molecule sequencing accuracies for short DNA strands such as AegisBodies, because one can continue to append additional copies of the sequences S and S’ in still more alphabets. Accuracy may be limited by the purity of modified dNTPs and by polymerase fidelity.
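A back-of-the-envelope model shows why appending encodings raises accuracy. The model assumes each of n encodings yields an independent call at a position with the same error probability p, and counts the consensus as correct when a strict majority of calls is correct, P = Σ_{k > n/2} C(n, k)(1 − p)^k p^(n−k). Independence is an optimistic assumption: correlated errors, dNTP impurity and polymerase infidelity cap real accuracy. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p_error: float, n_reads: int) -> float:
    """P(strict majority of n independent calls is correct) for per-call error p.

    Ties (possible when n_reads is even) count as failures, so even n is
    penalized relative to the next-lower odd n.
    """
    p_ok = 1.0 - p_error
    return sum(
        comb(n_reads, k) * p_ok**k * p_error ** (n_reads - k)
        for k in range(n_reads // 2 + 1, n_reads + 1)
    )
```

With p = 0.1, one call gives 0.9, three give 0.972, and five give about 0.991, illustrating the trend toward arbitrarily high accuracy under the independence assumption.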
- ultra-high fidelity PCR replicative polymerases exist for the ACGTPZ alphabet, and have been developed for PZSB alphabets, and transcriptive RNA polymerases exist for the GACUZPSB alphabet.
- a feature of Aegis is that its components are themselves continuously evolving and base “cores” replaced with those that interact better with enzymes, such as polymerases (for better PCR amplification) or ribosomes (to let expanded DNA/RNA alphabets encode extra amino acids in an expanded protein lexicon). This development is not slowing as more teams internationally seek to become involved in this new molecular biology.
- individual alphabets can then contribute information about a subset of the bases in 8-, 10- or 12-letter DNA. Combining this information can yield the sequence with good accuracy.
- Machine Learning (ML)-based sequencers serve as the primary sequencing models for nanopore sequencing and are fast and effective. Simultaneous use of data streams from multiple different alphabets to determine the primary sequence of a DNA strand is an ML problem in which multiple features are used as inputs to produce the desired output.
- each sequence produced by the microarray can have a unique barcode that can be used to establish the ground truth sequence and also include a sense and antisense sequence region and a self-priming hairpin.
- These micro-array libraries can then be subjected to one round of polymerase extension in which the primer is extended with a secondary alphabet (plus a third or fourth as may be used by larger alphabets as described above). Libraries can then be nanopore sequenced ensuring a broad random sampling of the available sequence space. Depending on the read throughput and library complexity, there may not be duplicate nanopore reads of any given training oligo.
- Strand barcodes can be long enough to enable easy identification of each read’s ground truth. As the training set grows, some training data can be set aside and used for benchmarking. The strand’s ground truth can be known from the barcode.
- ML approaches benefit significantly from large training sets which fully capture the range of variation present within the data.
- The need to generate sufficiently large, complex libraries containing expanded alphabets can be reduced by replacing ML sequencers with, or combining them with, hidden-Markov-model (HMM)-based sequencers, which need considerably less training data.
- HMM sequencers have the added benefit of not being “black box” sequencers so one can understand how raw data is converted into sequence.
- Such sequencing models can be developed with multiple data streams. Sequencers can be developed which simultaneously take into account measurements from various applied voltages, or which use information from both ion current and the sequence dependent stepping of the motor enzyme.
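The transparency of an HMM sequencer can be seen in a small Viterbi decoder over k-mer states. This is a simplified sketch, not the disclosure's model: it assumes exactly one event per enzyme step (no stays, skips or backsteps), Gaussian emissions with a shared standard deviation, uniform transitions between k-mers that overlap by k−1 bases, and a complete table of hypothetical k-mer mean currents.

```python
def viterbi_kmers(events: list[float], kmer_means: dict[str, float], sigma: float = 2.0) -> str:
    """Most likely base sequence for a list of event means, one event per step.

    `kmer_means` must contain every k-mer over the alphabet. Transitions are
    uniform over overlapping successors, so their log-probability is a shared
    constant and is omitted.
    """
    states = list(kmer_means)

    def log_emit(state: str, x: float) -> float:
        # Gaussian log-likelihood up to a constant shared by all states.
        return -0.5 * ((x - kmer_means[state]) / sigma) ** 2

    # Predecessors of t: k-mers whose last k-1 bases equal t's first k-1 bases.
    preds = {t: [s for s in states if s[1:] == t[:-1]] for t in states}

    score = {s: log_emit(s, events[0]) for s in states}
    backpointers = []
    for x in events[1:]:
        new_score, pointer = {}, {}
        for t in states:
            best = max(preds[t], key=lambda s: score[s])
            new_score[t] = score[best] + log_emit(t, x)
            pointer[t] = best
        score = new_score
        backpointers.append(pointer)

    # Trace back the best path, then collapse overlapping k-mers into bases.
    state = max(score, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path[0] + "".join(kmer[-1] for kmer in path[1:])
```

With a toy 2-mer table `{"AA": 40.0, "AC": 50.0, "CA": 60.0, "CC": 70.0}`, the events `[50, 60, 50]` decode to `"ACAC"`. Every score in the path is an interpretable log-likelihood, and extra data streams (other voltages or enzyme step timing) could be added as further emission terms.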
- Duplex sequencing via HMM frequently makes independent measurements of the individual reads and merges them by alignment of the base sequences. Discrepancies between the two reads are presumably resolved by evaluating the respective base-calling confidence. This is a suboptimal approach, because the joint distribution contains significantly more information than either the sense read or the antisense read alone.
- a multiple sequence alignment (MSA) approach can be computationally intensive, especially as more and more separate data streams are incorporated.
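Pairwise alignment is the building block both for merging duplex reads at the base level and for bounding which cells of a multi-read alignment need to be computed. The sketch below is a textbook Needleman-Wunsch global alignment of two base-called strings with assumed linear scores (match +1, mismatch −1, gap −1). It operates on base calls rather than raw current, so it corresponds to the simpler merge-by-alignment baseline rather than a joint signal-level model.

```python
def global_align(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(i, j), dp[i - 1][j] + gap, dp[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, aligning `"ACGT"` with `"AGT"` returns score 2 with `"ACGT"` over `"A-GT"`. A full MSA generalizes this table to one dimension per read, which is why pairwise scores are useful for excluding most hypercube cells.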
- the MSA problem has been explored extensively in the literature in the context of alignment of biological sequences. It has been shown that nearly all entries of an MSA hypercube can be excluded from calculation using pairwise alignments of all constituent reads (FIG. 9D). The accrued “cost” of stepping to the vast majority of hypercube elements is such that they need not be calculated.
Validate sequencing on real libraries and begin to plug it into the development of new binders, catalysts and reagents.
- AegisBodies and AEGISZymes have been created using the 6-letter alphabet ACGTPZ. The sequences of these AegisBodies and AEGISZymes are comprised within libraries that can enable validation of a nanopore sequencing work-flow.
- AegisBodies that bind to (i) Binding immunoglobulin protein (BiP) also known as 78kDa glucose-regulated protein (GRP-78), for which an AegisBody exists (“LH5b”), (ii) Ku90, encoded by the human XRCC5 gene, for which an AegisBody exists (“LZH8”), and (iii) Human serum albumin, a generic carrier protein for AegisBodies.
- Sequences can be validated by using them to generate a second generation of AegisBodies that can be tested for affinity vs. those derived from Aegis-Live.
- sequencing “fidelity” can be borne out in the binding affinity achieved by AegisBodies derived from nanopore sequencing reads. This can also enable determination of the AegisBody structure, enabling pruning of unnecessary loops and attachment of other cargo molecules for downstream applications. Because sequencing of Aegis-DNA is envisioned in this proposal as an enabling technology, success can be gauged by this metric.
- Table 2 Example analog base pairs, the sequencing of which is improved due to aspects and embodiments of the disclosure. Aspects and embodiments of this disclosure can aid in sequencing at least the following base pairs, including those in the art of synthetic biology.
- Embodiment 1 A method for preparing a DNA library for high-accuracy nanopore sequencing, the method comprising: contacting a DNA strand comprising a target polynucleotide sequence with a DNA polymerase and a dNTP pool comprising an analog dNTP that comprises a modification relative to a corresponding reference dNTP that is absent from the dNTP pool; wherein the DNA polymerase incorporates the analog dNTP into a nascent DNA strand as an analog nucleotide in place of incorporation of the corresponding reference dNTP as a corresponding reference nucleotide.
- Embodiment 2 The method of Embodiment 1 or any other Embodiment, wherein an electrical resistance of the corresponding reference nucleotide differs from an electrical resistance of the analog nucleotide.
- Embodiment 3 The method of Embodiment 2 or any other Embodiment, wherein the electrical resistance of the corresponding reference nucleotide is greater than the electrical resistance of the analog nucleotide.
- Embodiment 4 The method of Embodiment 2 or any other Embodiment, wherein the electrical resistance of the corresponding reference nucleotide is less than the electrical resistance of the analog nucleotide.
- Embodiment 6 The method of any of Embodiments 1-4 or any other Embodiment, wherein the modification of the analog nucleotide decreases an electrical charge of the analog nucleotide relative to an electrical charge of the corresponding reference nucleotide.
- Embodiment 7 The method of any of Embodiments 1-6 or any other Embodiment, wherein the method is isothermal.
- Embodiment 11 The method of any of Embodiments 1-10 or any other Embodiment, wherein the removing previous dNTP pools comprises immobilizing the DNA strand and the nascent DNA strand and washing the reaction mixture such that a previous analog dNTP is removed from the reaction mixture.
- Embodiment 12 The method of any of Embodiments 1-11 or any other Embodiment, wherein the immobilizing step comprises attachment of a DNA molecule comprising the DNA strand, the nascent DNA strand, or both, to a substrate.
- Embodiment 13 The method of any of Embodiments 1-12 or any other Embodiment, wherein a majority of dNTPs of the dNTP pool are not analog dNTPs.
- Embodiment 14 The method of any of Embodiments 1-13 or any other Embodiment, wherein no more than 25% of dNTPs of the dNTP pool are analog dNTPs.
- Embodiment 16 The method of any of Embodiments 1-12 and 15 or any other Embodiment, wherein 100% of dNTPs of the dNTP pool are analog dNTPs.
- Embodiment 17 The method of any of Embodiments 1-16 or any other Embodiment, further comprising ligating a sequencing adaptor to the DNA strand and the nascent DNA strand to configure the DNA library for nanopore sequencing.
- Embodiment 18 The method of any of Embodiments 1-17 or any other Embodiment, wherein the sequencing adaptor comprises a barcode that corresponds with an identity of the analog dNTP.
- Embodiment 19 The method of any of Embodiments 1-18 or any other Embodiment, further comprising reverse transcribing an RNA strand to produce the DNA strand comprising the target polynucleotide sequence, wherein the DNA library represents at least part of an RNA fraction of a sample.
- Embodiment 20 A DNA library prepared according to the method of any of Embodiments 1-19.
- Embodiment 21 A method for high-accuracy analysis of nanopore sequencing data obtained from nanopore sequencing of the DNA library of Embodiment 1 or any other Embodiment, the method comprising: aligning the nanopore sequencing data such that sequence data of the DNA strand is aligned with sequence data of the nascent DNA strand; computing an electrical property of the nanopore sequencing data that corresponds to an analog nucleobase of the nascent DNA strand for an analog nucleobase measurement; comparing the analog nucleobase measurement with a reference analog nucleobase measurement associated with a known analog nucleobase identity and assigning a nucleobase identity of the analog nucleobase consistent with the known analog nucleobase identity; and assigning, based on the nucleobase identity of the analog nucleobase, a nucleobase identity to an unknown nucleobase of the DNA strand that positionally corresponds to the analog nucleobase of the nascent DNA strand.
- Embodiment 22 The method of Embodiment 21 or any other Embodiment, wherein a plurality of nucleobase identities of analog nucleobases of the nascent DNA strand correspond to a plurality of nucleobase identities of unknown nucleobases of the DNA strand.
- Embodiment 23 The method of any of Embodiments 21-22 or any other Embodiment, wherein the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises one distinct nucleobase identity that corresponds to one contacting step used for DNA library preparation.
- Embodiment 24 The method of any of Embodiments 21-23 or any other Embodiment, wherein the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises two distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- Embodiment 25 The method of Embodiment 24 or any other Embodiment, wherein the plurality of nucleobase identities of analog nucleobases of the nascent DNA strand comprises three distinct nucleobase identities that correspond to one or more contacting steps used for DNA library preparation.
- Embodiment 27 The method of any of Embodiments 21-26 or any other Embodiment, wherein the method is performed at least in part by a programmable processor, a processor circuitry, a computational device, a computational system, a computational network, or any combination thereof.
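The analysis steps of Embodiment 21 can be illustrated with a nearest-reference classifier. This is a minimal sketch under stated assumptions, not the applicant's implementation: it assumes alignment has already paired each analog position in the nascent strand with a template position, that each analog position is summarized by a single mean current (real nanopore signals depend on the surrounding k-mer), that analogs pair with template bases by Watson-Crick rules, and that the reference currents are hypothetical placeholder values. The names `AlignedPosition`, `classify_analog`, and `call_template_bases` are illustrative only. The `references` mapping can hold one, two, or three analog identities, mirroring Embodiments 23 to 25.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed Watson-Crick pairing: the analog in the nascent strand is the
# complement of the template base at the aligned position.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class AlignedPosition:
    template_pos: int   # index of the unknown base in the original DNA strand
    nascent_pos: int    # index of the paired analog in the nascent strand (kept for traceability)
    current_pa: float   # hypothetical per-base mean current, in pA


def classify_analog(measurement: float, references: dict[str, float]) -> str:
    """Return the analog identity whose reference measurement is closest."""
    return min(references, key=lambda base: abs(references[base] - measurement))


def call_template_bases(aligned: list[AlignedPosition],
                        references: dict[str, float]) -> dict[int, str]:
    """Assign a base to each template position from its paired analog."""
    calls = {}
    for pos in aligned:
        analog_identity = classify_analog(pos.current_pa, references)
        calls[pos.template_pos] = COMPLEMENT[analog_identity]
    return calls


if __name__ == "__main__":
    # Hypothetical reference currents for analog dATP and analog dCTP.
    references = {"A": 62.0, "C": 48.0}
    aligned = [
        AlignedPosition(template_pos=0, nascent_pos=9, current_pa=61.2),
        AlignedPosition(template_pos=3, nascent_pos=6, current_pa=47.5),
        AlignedPosition(template_pos=5, nascent_pos=4, current_pa=60.1),
    ]
    print(call_template_bases(aligned, references))  # {0: 'T', 3: 'G', 5: 'T'}
```

Nearest-reference assignment is the simplest reading of "comparing ... with a reference analog nucleobase measurement"; a practical pipeline would more likely compare whole signal segments or use a trained model.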
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Provided are devices, kits, compositions, and methods for high-accuracy nanopore sequencing of polynucleotides. The methods comprise copying target sequences in polymerase reactions that contain analog deoxynucleotide triphosphates (dNTPs; e.g., dNTPs that differ chemically or physically from canonical dNTPs), which are incorporated into the copies of the target sequences in place of canonical dNTPs. Because the copy sequences contain analog dNTPs, they exhibit electrical signatures that differ from those of the target sequences and can be used to increase confidence in base calling and to reduce the complexity of the sequencing problem.
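One way to see how a second read with a distinct electrical signature can raise base-call confidence is to fuse per-position base probabilities from the template read and from its analog-containing copy. The sketch below uses a naive-Bayes style product, a generic technique chosen for illustration and not one described in the abstract. It assumes that the two reads have independent errors, that the prior over bases is uniform, and that the copy pairs with the template by Watson-Crick rules; `combine_calls` is a hypothetical name.

```python
from __future__ import annotations

# Assumed Watson-Crick pairing between the copy strand and the template.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def combine_calls(template_probs: dict[str, float],
                  copy_probs: dict[str, float]) -> dict[str, float]:
    """Fuse per-position base probabilities from a template read and its
    analog-containing copy, assuming independent errors and a uniform prior."""
    # The copy read reports the complementary base, so re-index it by template base.
    copy_as_template = {COMPLEMENT[b]: p for b, p in copy_probs.items()}
    # Under a uniform prior, the joint posterior is proportional to the product.
    joint = {b: template_probs[b] * copy_as_template[b] for b in template_probs}
    total = sum(joint.values())
    return {b: p / total for b, p in joint.items()}


if __name__ == "__main__":
    template_probs = {"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1}
    copy_probs = {"T": 0.7, "G": 0.1, "C": 0.1, "A": 0.1}
    fused = combine_calls(template_probs, copy_probs)
    print(round(fused["A"], 2))  # 0.91
```

In this toy case the probability of A rises from 0.60 on the template read alone to about 0.91 after fusion. The gain depends entirely on the independence assumption; correlated errors between the two reads would shrink it.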
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363608712P | 2023-12-11 | 2023-12-11 | |
| US63/608,712 | 2023-12-11 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025128542A1 (fr) | 2025-06-19 |
Family ID: 96058437
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/059342 Pending WO2025128542A1 (fr) | 2023-12-11 | 2024-12-10 | Nanopore sequencing of DNA by replacement of nucleotides with analogs |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025128542A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190309008A1 (en) * | 2012-04-09 | 2019-10-10 | The Trustees Of Columbia University In The City Of New York | Method of preparation of nanopore and uses thereof |
| US20210172013A1 (en) * | 2012-08-03 | 2021-06-10 | University Of Washington Through Its Center For Commercialization | Compositions and methods for improving nanopore sequencing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24904719; Country of ref document: EP; Kind code of ref document: A1 |