WO2025235623A1 - Methods and systems for nucleic acid sequencing using an error-buffering encoding scheme - Google Patents
- Publication number
- WO2025235623A1 (PCT/US2025/028165)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- different
- label
- codewords
- nucleotides
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- the present disclosure relates generally to methods for nucleic acid sequencing and more specifically to methods for nucleic acid sequencing based on an assignment scheme for detecting and/or correcting sequencing errors.
- Sequencing methods are susceptible to sequencing errors.
- next-generation sequencing methods can yield sequencing errors due to crosstalk between the clusters of polynucleotides being sequenced in parallel, and/or due to phasing or pre-phasing between the sequencing cycles.
- existing sequencing methods are limited in their ability to detect or correct sequencing errors. Improved methods are needed for detecting and/or correcting sequencing errors. The present disclosure addresses these needs.
- sequencing methods are subject to sequencing errors. Such sequencing errors can result in incorrect sequences, which in the case of biomedical applications, can yield significant clinical consequences. Despite the serious ramifications of sequencing errors, existing sequencing methods lack the ability to detect and/or correct sequencing errors.
- the methods described herein include sequencing methods that leverage an assignment scheme which allows for the detecting or correcting of sequencing errors, during the sequencing process, e.g., in real time, or after the sequencing process, e.g., post hoc.
- the assignment scheme imposes minimal Hamming distances between codewords that are assigned a nucleotide base and codewords that remain unassigned, such that in the event of a sequencing error, a recorded signal code sequence that matches an unassigned codeword can be identified as containing the sequencing error and is not decoded as a codeword assigned to a nucleotide base.
- the assignment scheme imposes minimal Hamming distances between any two different codewords that are assigned to nucleotide bases, such that in the event of a sequencing error, an erroneous recorded signal code sequence is unlikely to resemble the codeword for a different nucleotide base.
- the minimal Hamming distance between any two different assigned codewords is at least two, three, four, or more, such that one assigned codeword cannot be erroneously recorded as another assigned codeword by flipping just one code in the codeword (e.g., (1, 0, 0) for G cannot be decoded as (0, 1, 0) for A, as shown in FIG. 4A).
- the minimal Hamming distance between any two different assigned codewords is greater than the minimal Hamming distance between any assigned codeword and any unassigned codeword from a codebook.
- the minimal Hamming distance between any two different assigned codewords is 2 and the minimal Hamming distance between any assigned codeword and any unassigned codeword is 1. In some embodiments, the minimal Hamming distance between any two different assigned codewords (each assigned to a different nucleotide base) is less than the minimal Hamming distance between any assigned codeword and any unassigned codeword from a codebook. In some embodiments, the minimal Hamming distance between any two different assigned codewords (each assigned to a different nucleotide base) is equal to the minimal Hamming distance between any assigned codeword and any unassigned codeword from a codebook.
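The distance relationships in these embodiments can be checked numerically. The following is a minimal sketch, assuming the three-label, two-state assignment described for FIG. 4A (G = (1, 0, 0), A = (0, 1, 0), C = (0, 0, 1), T = (1, 1, 1)); the function and variable names are illustrative and are not part of the disclosure.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Assumed assignment (three labels, two states), following the FIG. 4A example.
ASSIGNED = {"G": (1, 0, 0), "A": (0, 1, 0), "C": (0, 0, 1), "T": (1, 1, 1)}

# Codebook: every combination of two states across three labels.
CODEBOOK = list(product((0, 1), repeat=3))
UNASSIGNED = [cw for cw in CODEBOOK if cw not in ASSIGNED.values()]

min_assigned = min(hamming(a, b) for a, b in combinations(ASSIGNED.values(), 2))
min_to_unassigned = min(hamming(a, u) for a in ASSIGNED.values() for u in UNASSIGNED)

print(min_assigned, min_to_unassigned)  # prints: 2 1
```

Under this assumed assignment, the result corresponds to the embodiment in which the minimal distance between assigned codewords is 2 and the minimal distance between an assigned and an unassigned codeword is 1.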
- the minimal Hamming distances help prevent a failure to identify a false positive nucleotide base call.
- errors in the recorded signal code sequences (which may be referred to as “recorded codewords,” which during decoding can be compared to the codewords in a codebook or “whitelist” generated based on the combinations of different labels and different states of each different label) can be detected and/or corrected.
- the methods described herein achieve their error-detecting and/or error-correcting functions not by implementing additional physical components to the sequencing method, but by the assigning and not assigning of codewords to nucleotide bases.
- a method for nucleic acid sequence analysis comprising generating a plurality of different codewords corresponding to combinations of: (i) multiple different labels and (ii) multiple different states associated with each different label, wherein the multiple different labels are configured to be optically distinguishable from each other.
- the method comprises assigning a different codeword from the plurality of different codewords to each one of four different bases, wherein the plurality of different codewords comprise the assigned codewords and one or more unassigned codewords, wherein: (i) any two assigned codewords have a first Hamming distance of at least 2, and (ii) any one of the assigned codewords and any one of the unassigned codewords have a second Hamming distance of at least 1.
- the method can comprise contacting a plurality of different polynucleotide templates with nucleotides of the four different bases, wherein the nucleotides comprise the multiple different labels according to the codeword assignment.
- the nucleotides of the four different bases can be contacted with the plurality of different polynucleotide templates in any suitable order, and any two or more of the four different bases can be contacted simultaneously or sequentially with the plurality of different polynucleotide templates.
- the method can comprise allowing binding and optional incorporation of the nucleotides based on complementarity to nucleotide residues in the plurality of different polynucleotide templates.
- the method can comprise imaging the plurality of different polynucleotide templates to record the multiple different states associated with each of the multiple different labels, thereby recording signal code sequences associated with the labels of the bound and optionally incorporated nucleotides.
- the method can comprise comparing the recorded signal code sequences to the plurality of different codewords, thereby identifying the bases corresponding to the recorded signal code sequences and/or checking for errors in the recorded signal code sequences.
- the method can comprise iterations with respect to the contacting, the allowing, the imaging, or the comparing. In any of the preceding embodiments, the method can comprise identifying a nucleic acid sequence comprising the identified bases.
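The comparing step, repeated over sequencing cycles, might be sketched as follows. This is an illustration only: it assumes the FIG. 4A example assignment, and both the function name `call_bases` and the use of `N` as a placeholder for a flagged cycle are hypothetical choices rather than anything specified by the disclosure.

```python
# Assumed codeword assignment (FIG. 4A example); names are illustrative.
CODEWORD_TO_BASE = {(1, 0, 0): "G", (0, 1, 0): "A", (0, 0, 1): "C", (1, 1, 1): "T"}

def call_bases(recorded_codes):
    """Compare each recorded signal code to the assigned codewords.

    Returns the called sequence (with 'N' where no assigned codeword matches)
    and the list of cycle indices flagged as containing an error.
    """
    sequence, error_cycles = [], []
    for cycle, code in enumerate(recorded_codes):
        base = CODEWORD_TO_BASE.get(tuple(code))
        if base is None:
            # The recorded code matches an unassigned codeword: flag it.
            sequence.append("N")
            error_cycles.append(cycle)
        else:
            sequence.append(base)
    return "".join(sequence), error_cycles

# One recorded signal code per sequencing cycle (hypothetical data).
recorded = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]
print(call_bases(recorded))  # prints: ('GANT', [2])
```

The third cycle records (1, 1, 0), which is not assigned to any base in this example, so it is reported as an error instead of being decoded as a base.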
- the four different bases can comprise adenine (A), thymine (T), cytosine (C), guanine (G), or uracil (U).
- nucleotides having any one or more of the four different bases can comprise a reversible terminator.
- the reversible terminator can comprise a 3’-O-blocked reversible terminator or a 3’-unblocked reversible terminator.
- the multiple different labels can comprise fluorophores or purifiable tags.
- a label from the multiple different labels can comprise multiple fluorophores, purifiable tags, or a combination thereof.
- the multiple different labels can be removed from the nucleotides comprising the multiple different labels.
- the plurality of different polynucleotide templates can be affixed to a substrate.
- the plurality of different polynucleotide templates can be rolling circle amplification products.
- the plurality of different polynucleotide templates can be in nucleic acid nanoballs.
- the plurality of different polynucleotide templates can be in clonal clusters on a flow cell.
- the plurality of different polynucleotide templates can comprise one or more clusters each comprising a different polynucleotide template.
- the one or more clusters can comprise an ordered array of clusters on the substrate. In any of the preceding embodiments, the one or more clusters can be formed via bridge amplification. In any of the preceding embodiments, the one or more clusters can comprise multiple molecules comprising: i) one or more adapter sequences and/or one or more primer binding sequences and ii) the same template sequence or complement thereof. In any of the preceding embodiments, the one or more clusters can be formed via rolling circle amplification (RCA). In any of the preceding embodiments, the one or more clusters can each comprise one or more RCA products (RCPs) each comprising: i) one or more adapter sequences and/or one or more primer binding sequences and ii) the same template sequence.
- a maximum number of different codewords of the plurality of different codewords can be greater than four.
- a codeword from the plurality of different codewords can correspond to a base of the four different bases.
- a maximum number of assigned codewords can be 1, 2, 3, or 4.
- a maximum number of unassigned codewords can be 1, 2, 3, or 4.
- the maximum number of unassigned codewords can be at least equal to or greater than the maximum number of assigned codewords.
- a state from the multiple different states corresponding to each different label can be discrete. In any of the preceding embodiments, the state can be based on a discretized analog signal from the signals corresponding to the labels of the bound and optionally incorporated nucleotides. In any of the preceding embodiments, a maximum number of states from the multiple different states can be at least two. In any of the preceding embodiments, a maximum number of states from the multiple different states can be at least three.
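Discretizing an analog signal into states can be illustrated with simple thresholding. The sketch below is an assumption-laden example: the intensity values, the threshold values, and the helper names `discretize` and `to_codeword` are all hypothetical, and the disclosure does not prescribe a particular discretization procedure.

```python
from bisect import bisect_right

def discretize(intensity, thresholds):
    """Map an analog intensity to a discrete state index.

    `thresholds` is an ascending list of cut points; n cut points give n + 1 states.
    """
    return bisect_right(thresholds, intensity)

def to_codeword(intensities, thresholds):
    """Discretize one intensity per label into a recorded signal code."""
    return tuple(discretize(v, thresholds) for v in intensities)

# Hypothetical normalized intensities for three labels at one cluster.
signal = (0.91, 0.07, 0.12)

print(to_codeword(signal, [0.5]))         # two states   -> (1, 0, 0)
print(to_codeword(signal, [0.33, 0.66]))  # three states -> (2, 0, 0)
```

With one cut point each label has two states; with two cut points each label has three states, which enlarges the codebook without adding labels.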
- At least one of the unassigned codewords can be at a Hamming distance of one from at least one of the assigned codewords.
- the unassigned codewords can be configured to detect an error.
- the unassigned codewords can be configured to be corrected after being subjected to the error.
- the error can comprise at least one change in the state of a label from the multiple different labels for the codeword.
- the error can comprise a result of a random event.
- the error can comprise a sequencing error.
- the error can comprise an error in base calling. In any of the preceding embodiments, the error can comprise an error in resolving a resolution. In any of the preceding embodiments, the error in resolving the resolution can comprise an error in resolving a spatial resolution. In any of the preceding embodiments, the error in resolving the spatial resolution can comprise an error in resolving signals from different areas on a flow cell of a next-generation sequencing platform. In any of the preceding embodiments, the error in resolving a resolution can comprise an error in resolving a temporal resolution.
- the error in resolving the temporal resolution can comprise an error in resolving signals across sequencing cycles of a next-generation sequencing platform.
- the error can be detected in real time.
- the error can be detected post hoc.
- a method for nucleic acid sequence analysis comprising: contacting a plurality of different polynucleotide templates with nucleotides comprising: nucleotides of a first base, each labeled with a first label, nucleotides of a second base, each labeled with a second label, nucleotides of a third base, each labeled with a third label, and nucleotides of a fourth base, comprising species labeled with the first label, species labeled with the second label, and species labeled with the third label, wherein the first, second, third, and fourth bases are different bases, and the first, second, and third labels are configured to be optically distinguishable from each other.
- the method comprises allowing binding and optional incorporation of the pool of nucleotides based on complementarity to nucleotide residues in the plurality of different polynucleotide templates.
- the method can comprise imaging the plurality of different polynucleotide templates to detect signals corresponding to labels of bound and optionally incorporated nucleotides, wherein the imaging comprises: detecting signals corresponding to the first label, detecting signals corresponding to the second label, and detecting signals corresponding to the third label.
- detecting signals corresponding to the first label, detecting signals corresponding to the second label, and detecting signals corresponding to the third label are performed sequentially, each using a different channel of a microscope, e.g., the different channels correspond to different “colors” of detection.
- the method can comprise iterations with respect to the contacting, the allowing, and/or the imaging.
- a method for nucleic acid sequence analysis comprising: contacting a plurality of different polynucleotide templates with nucleotides comprising: nucleotides of a first base, comprising species labeled with a first label and species labeled with a second label; nucleotides of a second base, comprising species labeled with the first label and species labeled with a third label; nucleotides of a third base, comprising species labeled with the second label and species labeled with the third label; and nucleotides of a fourth base, none of which is labeled with the first label, the second label, or the third label.
- the first, second, third, and fourth bases are different bases, and the first, second, and third labels are configured to be optically distinguishable from each other.
- the method can comprise allowing binding and optional incorporation of the pool of nucleotides based on complementarity to nucleotide residues in the plurality of different polynucleotide templates.
- the method can comprise imaging the plurality of different polynucleotide templates to detect signals corresponding to the labels or the absence thereof of bound and optionally incorporated nucleotides, wherein the imaging comprises: detecting signals corresponding to the first label, detecting signals corresponding to the second label, and detecting signals corresponding to the third label.
- detecting signals corresponding to the first label, detecting signals corresponding to the second label, and detecting signals corresponding to the third label are performed sequentially, each using a different channel of a microscope, e.g., the different channels correspond to different “colors” of detection.
- the method can comprise iterations with respect to the contacting, the allowing, and/or the imaging.
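The labeling scheme in which three bases each carry two of the three labels and the fourth base is unlabeled can be expressed as codewords and checked against the distance rule. This sketch assumes that a cluster bound by a mixture of labeled species yields a signal for every label present in that mixture; the names `base1` to `base4` are placeholders, since the scheme does not fix which base receives which combination.

```python
from itertools import combinations

# Labels carried by the nucleotide species of each base, per the scheme above.
# "base1" ... "base4" are placeholders for the four different bases.
LABELS_PER_BASE = {
    "base1": {1, 2},
    "base2": {1, 3},
    "base3": {2, 3},
    "base4": set(),   # none of the species is labeled
}

def codeword(labels, n_labels=3):
    """State 1 where a label is expected to give signal, else state 0."""
    return tuple(int(i in labels) for i in range(1, n_labels + 1))

codewords = {base: codeword(labels) for base, labels in LABELS_PER_BASE.items()}
distances = {
    (a, b): sum(x != y for x, y in zip(codewords[a], codewords[b]))
    for a, b in combinations(codewords, 2)
}

print(codewords["base1"], codewords["base4"])  # prints: (1, 1, 0) (0, 0, 0)
print(min(distances.values()))                 # prints: 2
```

Under the stated assumption, every pair of assigned codewords in this scheme differs in exactly two positions, so a single erroneous state change cannot turn one base's codeword into another's.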
- FIG. 1 provides an exemplary method for analyzing a nucleic acid sequence.
- FIG. 2 provides an exemplary schematic depicting the composition and significance of codewords.
- FIG. 3 provides an exemplary table of codeword possibilities, given two states and three labels.
- FIG. 4A provides an exemplary table of codeword possibilities and provides assignments corresponding to the codeword possibilities.
- any of the eight signal combinations can be assigned to any of the four types of DNA bases.
- the example shows one of the possible ways of the signal assignments.
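To illustrate how many assignments are possible for three labels with two states each, the sketch below enumerates the eight codewords and counts the assignments whose assigned codewords are pairwise at a Hamming distance of at least 2. The counting approach and all names are illustrative assumptions, not part of the disclosure.

```python
from itertools import combinations, permutations, product

codebook = list(product((0, 1), repeat=3))   # three labels, two states each

def min_distance(words):
    """Smallest pairwise Hamming distance within a set of codewords."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(words, 2))

# Sets of four codewords whose pairwise Hamming distance is at least 2.
valid_sets = [s for s in combinations(codebook, 4) if min_distance(s) >= 2]

# Each valid set can be mapped onto the four bases in 4! ways.
assignments = [dict(zip("ACGT", p)) for s in valid_sets for p in permutations(s)]

print(len(codebook), len(valid_sets), len(assignments))  # prints: 8 2 48
```

Only two sets of four codewords satisfy the distance rule here: the codewords with an odd number of 1s (used in the FIG. 4A example) and those with an even number of 1s. Each set can be mapped to the four bases in 24 ways.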
- FIG. 4B provides a schematic depicting the experimental implementation of a sequencing regimen corresponding to the table in FIG. 4A.
- the DNA templates to be sequenced, represented with X, Y, Z, and W, are anchored to the surface.
- An incorporation mixture of DNA polymerase and 3’ end-blocked deoxynucleotide triphosphates (dNTPs) is applied to the surface.
- dTTP is modified with three tags, yielding all three signals.
- dGTP is modified with Tag 1, which yields Signal 1.
- dATP is modified with Tag 2, which yields Signal 2.
- dCTP is modified with Tag 3, which yields Signal 3.
- the three signals are detected sequentially.
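The sequential detection described for FIG. 4B can be sketched as a loop over the three signals that builds one recorded code per template. The tag assignments follow the description above; which nucleotide is incorporated at each of the templates X, Y, Z, and W is a hypothetical choice made only for this example.

```python
# Tags carried by each modified nucleotide, as described for FIG. 4B.
TAGS = {"dTTP": {1, 2, 3}, "dGTP": {1}, "dATP": {2}, "dCTP": {3}}

# Hypothetical incorporation events for the four anchored templates.
incorporated = {"X": "dGTP", "Y": "dATP", "Z": "dCTP", "W": "dTTP"}

# Detect the three signals sequentially, one signal at a time.
recorded = {template: [] for template in incorporated}
for signal in (1, 2, 3):
    for template, nucleotide in incorporated.items():
        recorded[template].append(int(signal in TAGS[nucleotide]))

codes = {t: tuple(states) for t, states in recorded.items()}
print(codes["X"], codes["W"])  # prints: (1, 0, 0) (1, 1, 1)
```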
- FIG. 5 provides an exemplary table of codeword possibilities and provides assignments and decoding corresponding to the codeword possibilities.
- FIG. 6 provides an exemplary table of codeword possibilities and provides assignments and decoding corresponding to the codeword possibilities.
- FIG. 7 provides an exemplary table of codeword possibilities and provides assignments and decoding corresponding to the codeword possibilities.
- Methods and systems for analyzing a nucleic acid sequence are described. Different codewords are generated based on combinations of different labels and different states associated with each different label. The multiple different labels can be optically distinguishable from each other.
- a different codeword from the different codewords can be assigned to each of four different bases.
- the different codewords also include unassigned codewords. For the different codewords, a Hamming distance between any two assigned codewords can be at least 2. In addition, a Hamming distance between any one of the assigned codewords and any one of the unassigned codewords can be at least 1.
- Different polynucleotide templates can be contacted with nucleotides of the four different bases, and the nucleotides can comprise the different labels. Nucleotides can be allowed to bind and optionally be incorporated into the different polynucleotide templates, based on complementarity.
- the different polynucleotide templates can be imaged to record the multiple different states for the multiple different labels, to generate recorded signal codes for the labels of the bound and optionally incorporated nucleotides and therefore the recorded signal code sequences for the bound and optionally incorporated nucleotides.
- the signal code sequences can be compared to the different codewords (including assigned codewords and unassigned codewords from a codebook or “whitelist” corresponding to various combinations of different detectable labels and different states of each different detectable label), to identify the bases corresponding to the recorded signal code sequences and/or to check for errors in the recorded signal code sequences.
- Sequencing methods are subject to errors, such as errors during signal detection and during decoding. Such sequencing errors can result in incorrect sequences, which in the case of biomedical applications, can yield significant clinical consequences. Despite the serious ramifications of sequencing errors, existing sequencing methods largely fail to detect and/or correct sequencing errors. This failure is true of even next-generation sequencing methods.
- the methods described herein include sequencing methods that leverage an assignment scheme which allows for the detecting or correcting of errors, during the sequencing process, e.g., in real time, or after the sequencing process, e.g., post hoc.
- codewords (e.g., a combination of labels and their states)
- One key benefit of the methods described herein is that assigning the codewords that represent each nucleotide base requires no additional cost-prohibitive components from existing sequencing platforms. Existing sequencing platforms can readily incorporate the methods described herein. An experimenter need only decide which labels or combination of labels will be assigned to represent each nucleotide base, and provided that the assigning adheres to the Hamming distance rules described herein, the experimenter can conduct and interpret their sequencing experiment and, at the very least, detect and, to some degree, correct sequencing errors.
- an exclusive label 1 can be used to represent G nucleotides (i.e., the codeword for G)
- an exclusive label 2 can be used to represent A nucleotides (i.e., the codeword for A)
- an exclusive label 3 can be used to represent C nucleotides (i.e., the codeword for C)
- a combination of all of label 1, label 2, and label 3 can be used to represent T nucleotides (i.e., the codeword for T).
- the remaining unassigned codewords can be used for the error-checking functions, according to the methods described herein.
- the described example assignment scheme both demonstrates the disclosed methods’ compatibility with low-channel sequencing methods and adheres to the Hamming distance-based assignment rules described herein.
- the described methods are readily compatible with most existing sequencing platforms and schemes, including low-channel sequencing methods.
- the methods described herein provide the above advantages by leveraging a carefully crafted assignment scheme. More specifically, the assignment scheme adheres to two rules regarding minimum Hamming distances: a first Hamming distance between any two assigned codewords is at least 2, and a second Hamming distance between any one of the assigned codewords and any one of the unassigned codewords is at least 1.
- FIG. 1 depicts a table comprising two codewords: (0, 0, 0) and (1, 0, 0).
- Each index of the codeword refers to a type of label, such as label 1, label 2, and label 3, as depicted in FIG. 1. Therefore, in the case of FIG. 1, for the codeword (0, 0, 0), the first digit (e.g., 0) refers to label 1, the second digit (e.g., 0) refers to label 2, and the third digit (e.g., 0) refers to label 3.
- for the codeword (1, 0, 0), the first digit (e.g., 1) refers to label 1, the second digit (e.g., 0) refers to label 2, and the third digit (e.g., 0) refers to label 3.
- a label may correspond to a fluorescent label, such as a dye that emits green light (e.g., about 509 nm), and the labels may be optically distinguishable, such that one label emits at a wavelength dependably distinct from that of another label.
- the value from each label can be considered a state.
- for the codeword (0, 0, 0), the first digit, which is at label 1, has a state of 0; the second digit, which is at label 2, has a state of 0; and the third digit, which is at label 3, has a state of 0.
- for the codeword (1, 0, 0), the first digit, which is at label 1, has a state of 1; the second digit, which is at label 2, has a state of 0; and the third digit, which is at label 3, has a state of 0.
- the states can refer to discrete values, and need not only be discrete values of 0 and 1, and need not be limited to only two possible states.
- the state can represent a physical event, such as whether a signal is detected for a label, which may be assigned a state of 1, or whether a signal is not detected for a label, which may be assigned a state of 0.
- a codeword can be assigned to represent a nucleotide base, or a codeword can remain unassigned.
- the codeword (0, 0, 0) can remain unassigned and not refer to a nucleotide base.
- the codeword (1, 0, 0) can refer to a nucleotide base, such as a G nucleotide.
- the codeword (1, 0, 0) may refer to detecting a signal for label 1, but not label 2 or label 3, and such a combination of labels and their states may correspond to the detection of a G nucleotide for a nucleotide sequence.
- the codeword (0, 0, 0) for example, may refer to detecting no signals for label 1, label 2, or label 3, and such a combination of labels and their states may not directly correspond to a nucleotide base.
- the detection of an unassigned codeword may indicate the detection of a sequencing error.
- the methods described herein can comprise correcting the error.
- the correcting of the error need not be based on certainty but can comprise a less than 100% probability. For example, upon detecting a sequencing error, a 33% probability may exist that the correct base is the nucleotide A, a 33% probability may exist that the correct base is the nucleotide T, and a 33% probability may exist that the correct base is the nucleotide G.
- correcting an error can comprise reducing the possibility space of what the nucleotide base may have been prior to the error, when compared to the complete set of possible nucleotide bases.
- correcting an error can comprise determining that the nucleotide base was likely one of three possible nucleotide bases, as opposed to one of four possible nucleotide bases.
- increasing the amount of information (e.g., reducing the entropy, in the information theory sense)
- determining the original nucleotide base can be considered to be a form of correcting the error.
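Probabilistic correction by narrowing the possibility space can be sketched as a nearest-codeword lookup. The example assumes the FIG. 4A assignment and, as a simplification not stated in the disclosure, gives each nearest candidate base an equal share of the probability; the function name `candidate_bases` is hypothetical.

```python
# Assumed codeword assignment (FIG. 4A example).
ASSIGNED = {"G": (1, 0, 0), "A": (0, 1, 0), "C": (0, 0, 1), "T": (1, 1, 1)}

def candidate_bases(recorded):
    """Return the bases whose codewords are nearest to the recorded code,
    each with an equal share of probability (a simplifying assumption)."""
    dist = {b: sum(x != y for x, y in zip(cw, recorded)) for b, cw in ASSIGNED.items()}
    nearest = min(dist.values())
    candidates = sorted(b for b, d in dist.items() if d == nearest)
    return {b: 1 / len(candidates) for b in candidates}

print(sorted(candidate_bases((1, 1, 0))))  # prints: ['A', 'G', 'T']
print(candidate_bases((1, 0, 0)))          # prints: {'G': 1.0}
```

For the unassigned code (1, 1, 0), the candidates are A, G, and T, each one state change away, while C would require three state changes. The possibility space is thus reduced from four bases to three, each with roughly a one-in-three probability.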
- the detection of an unassigned codeword may indicate the detection of a sequencing error, provided that the assignment scheme adheres to the methods and systems described herein. For example, when four different codewords are assigned to four different nucleotide bases, the experimenter can expect, during the sequencing experiment, to not detect codewords that do not correspond to the four different nucleotide bases. The detection of a codeword outside of the four different codewords that correspond to the four different nucleotide bases — that is, an unassigned codeword — would therefore indicate a sequencing error.
- the methods and systems described herein further specify the codeword assignments by referencing minimum Hamming distances.
- the assigned codewords should be sufficiently different (e.g., the Hamming distance between any two assigned codewords should be at least 2), such that if a state for one of the assigned codeword labels were erroneously altered (e.g., a sequencing error occurs), then that altered assigned codeword would not resemble a different assigned codeword and would instead resemble only one of the unassigned codewords. In this way, any erroneous alteration of a single state in an assigned codeword can be reliably detected.
- the assigned codewords and the unassigned codewords should be different but sufficiently similar, such that if a state for one of the assigned codeword labels were erroneously altered (e.g., a sequencing error occurs), then that altered assigned codeword would resemble an unassigned codeword.
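The detection property can be checked exhaustively for a small codebook: every single-state alteration of an assigned codeword should land on an unassigned codeword. The sketch below assumes the FIG. 4A assignment with two states per label; the helper name `single_flips` is illustrative.

```python
# Assumed codeword assignment (FIG. 4A example).
ASSIGNED = {"G": (1, 0, 0), "A": (0, 1, 0), "C": (0, 0, 1), "T": (1, 1, 1)}
assigned_words = set(ASSIGNED.values())

def single_flips(codeword):
    """All codewords reachable by changing the state of exactly one label."""
    for i in range(len(codeword)):
        flipped = list(codeword)
        flipped[i] = 1 - flipped[i]
        yield tuple(flipped)

# Single-state errors that would turn one assigned codeword into another.
undetected = [
    (base, corrupted)
    for base, cw in ASSIGNED.items()
    for corrupted in single_flips(cw)
    if corrupted in assigned_words
]

print(len(undetected))  # prints: 0
```

No single-state alteration goes undetected under this assignment. Alterations of two states at once can still convert one assigned codeword into another, which is why larger minimal distances are contemplated.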
- a method for nucleic acid sequence analysis can comprise: (a) generating a plurality of different codewords based on combinations of multiple different labels and multiple different states associated with each different label, wherein the multiple different labels are configured to be optically distinguishable from each other; (b) assigning a different codeword from the plurality of different codewords to each one of four different bases, wherein the plurality of different codewords comprise the assigned codewords and one or more unassigned codewords, wherein (i) a first Hamming distance between any two assigned codewords is at least 2, and (ii) a second Hamming distance between any one of the assigned codewords and any one of the unassigned codewords is at least 1; (c) contacting a plurality of different polynucleotide templates with nucleotides of the four different bases, wherein the nucleotides comprise the multiple different labels according to the assigning in
- the plurality of different polynucleotide templates are contacted with nucleotides of the four different bases simultaneously. In some embodiments, the plurality of different polynucleotide templates are contacted with nucleotides of the four different bases sequentially in any order. In some embodiments, the plurality of different polynucleotide templates are contacted with nucleotides of two or three of the different bases simultaneously, while nucleotides of the remaining two bases or one base are contacted with the plurality of different polynucleotide templates either before or after the contacting with the nucleotides of the two or three different bases.
- “About” and “approximately” shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Exemplary degrees of error are within 20 percent (%), typically, within 10%, and more typically, within 5% of a given value or range of values.
- the terms “comprising” (and any form or variant of comprising, such as “comprise” and “comprises”), “having” (and any form or variant of having, such as “have” and “has”), “including” (and any form or variant of including, such as “includes” and “include”), or “containing” (and any form or variant of containing, such as “contains” and “contain”), are inclusive or open-ended and do not exclude additional, un-recited additives, components, integers, elements, or method steps.
- use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
- use of a), b), etc., or i), ii), etc. does not by itself connote any priority, precedence, or order of steps in the claims.
- the use of these terms in the specification does not by itself connote any required priority, precedence, or order.
- a nucleic acid can contain nucleotides having any of a variety of analogs of these sugar moieties that are known in the art.
- a nucleic acid can include native or non-native nucleotides.
- a native deoxyribonucleic acid can have one or more bases selected from the group consisting of adenine (A), thymine (T), cytosine (C), and guanine (G).
- a ribonucleic acid can have one or more bases selected from the group consisting of uracil (U), adenine (A), cytosine (C), and guanine (G).
- Useful non-native bases that can be included in a nucleic acid or nucleotide are known in the art.
- a “probe” or a “target,” when used in reference to a nucleic acid or sequence of a nucleic acid, is intended as a semantic identifier for the nucleic acid or sequence in the context of a method or composition, and does not limit the structure or function of the nucleic acid or sequence beyond what is expressly indicated.
- “oligonucleotide” and “polynucleotide” are used interchangeably to refer to a single-stranded multimer of nucleotides from about 2 to about 500 nucleotides in length. Oligonucleotides can be synthetic, made enzymatically (e.g., via polymerization), or made using a “split-pool” method. Oligonucleotides can include ribonucleotide monomers (e.g., can be oligoribonucleotides) and/or deoxyribonucleotide monomers (e.g., oligodeoxyribonucleotides).
- oligonucleotides can include a combination of both deoxyribonucleotide monomers and ribonucleotide monomers in the oligonucleotide (e.g., random or ordered combination of deoxyribonucleotide monomers and ribonucleotide monomers).
- An oligonucleotide can be 4 to 10, 10 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80, 80 to 100, 100 to 150, 150 to 200, 200 to 250, 250 to 300, 300 to 350, 350 to 400, or 400-500 nucleotides in length, for example.
- Oligonucleotides can include one or more functional moieties that are attached (e.g., covalently or non-covalently) to the multimer structure.
- an oligonucleotide can include one or more detectable labels (e.g., a radioisotope or fluorophore).
- detectable label refers to a directly or indirectly detectable moiety that is coupled to or may be coupled to another moiety, for example, a nucleotide or nucleotide analog.
- the detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a substrate compound or composition, which substrate compound or composition is directly detectable.
- the label can emit a signal or alter a signal delivered to the label so that the presence or absence of the label can be detected.
- coupling may be via a linker, which may be cleavable, such as photo-cleavable (e.g., cleavable under ultra-violet light), chemically cleavable (e.g., via a reducing agent, such as dithiothreitol (DTT) or tris(2-carboxyethyl)phosphine (TCEP)), or enzymatically cleavable (e.g., via an esterase, lipase, peptidase, or protease).
- a detectable label is or includes a fluorophore.
- fluorophores include, but are not limited to, fluorescent nanocrystals; quantum dots; d-Rhodamine acceptor dyes including dichloro[R110], dichloro[R6G], dichloro[TAMRA], dichloro[ROX], or the like; fluorescein donor dyes including fluorescein, 6-FAM, or the like; cyanine dyes such as Cy3B; Alexa dyes; SETA dyes; Atto dyes such as Atto 647N, which forms a FRET pair with Cy3B; and the like.
- Fluorophores include, but are not limited to, MDCC (7-diethylamino-3-[([(2-maleimidyl)ethyl]amino)carbonyl]coumarin), TET, HEX, Cy3, TMR, ROX, Texas Red, Cy5, LC red 705 and LC red 640.
- a detectable label is or includes a luminescent or chemiluminescent moiety.
- luminescent/chemiluminescent moieties include, but are not limited to, peroxidases such as horseradish peroxidase (HRP), soybean peroxidase (SP), alkaline phosphatase, and luciferase. These protein moieties can catalyze chemiluminescent reactions given the appropriate substrates (e.g., an oxidizing reagent plus a chemiluminescent compound). A number of compound families are known to provide chemiluminescence under a variety of conditions.
- Non-limiting examples of chemiluminescent compound families include 2,3-dihydro-1,4-phthalazinedione luminol, 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. These compounds can luminesce in the presence of alkaline hydrogen peroxide or calcium hypochlorite and base.
- chemiluminescent compound families include, e.g., 2,4,5-triphenylimidazoles, para-dimethylamino and -methoxy substituents, oxalates such as oxalyl active esters, p-nitrophenyl, N-alkyl acridinium esters, luciferins, lucigenins, or acridinium esters.
- a detectable label is or includes a metal-based or mass-based label.
- Pairing can be achieved by any process in which a nucleic acid sequence joins with a substantially or fully complementary sequence through base pairing to form a hybridization complex.
- two nucleic acid sequences are “substantially complementary” if at least 60% (e.g., at least 70%, at least 80%, or at least 90%) of their individual bases are complementary to one another.
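The percentage criterion above can be illustrated with a minimal Python sketch. It assumes two equal-length DNA strands compared position by position in antiparallel orientation, with no gaps or alignment step; the function names and the restriction to A/T/C/G are hypothetical choices made for illustration and are not part of the disclosure.

```python
# Hypothetical helper names; Watson-Crick DNA pairing only (assumption).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which seq_a pairs with antiparallel seq_b."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Base i of seq_a (read 5'->3') faces base i of seq_b read in reverse.
    pairs = zip(seq_a.upper(), reversed(seq_b.upper()))
    matches = sum(1 for a, b in pairs if COMPLEMENT.get(a) == b)
    return matches / len(seq_a)


def is_substantially_complementary(seq_a: str, seq_b: str,
                                   threshold: float = 0.60) -> bool:
    """Apply the 'at least 60%' criterion (threshold is adjustable)."""
    return complementary_fraction(seq_a, seq_b) >= threshold
```

The default threshold of 0.60 mirrors the lowest value quoted above; passing 0.70, 0.80, or 0.90 applies the stricter variants.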
- a “primer” is a single-stranded nucleic acid sequence having a 3’ end that can be used as a substrate for a nucleic acid polymerase in a nucleic acid extension reaction.
- RNA primers are formed of RNA nucleotides, and are used in RNA synthesis, while DNA primers are formed of DNA nucleotides and used in DNA synthesis.
- Primers can also include both RNA nucleotides and DNA nucleotides (e.g., in a random or designed pattern). Primers can also include other natural or synthetic nucleotides described herein that can have additional functionality.
- DNA primers can be used to prime RNA synthesis and vice versa (e.g., RNA primers can be used to prime DNA synthesis).
- Primers can vary in length. For example, primers can be about 6 bases to about 120 bases. For example, primers can include up to about 25 bases.
- a primer may, in some cases, refer to a primer binding sequence.
- a “nucleic acid extension” generally involves incorporation of one or more nucleic acids (e.g., A, G, C, T, U, nucleotide analogs, or derivatives thereof) into a molecule (such as, but not limited to, a nucleic acid sequence) in a template-dependent manner, such that consecutive nucleic acids are incorporated by an enzyme (such as a polymerase or reverse transcriptase), thereby generating a newly synthesized nucleic acid molecule.
- Enzymatic extension can be performed by an enzyme including, but not limited to, a polymerase and/or a reverse transcriptase.
- a primer that hybridizes to a complementary nucleic acid sequence can be used to synthesize a new nucleic acid molecule by using the complementary nucleic acid sequence as a template for nucleic acid synthesis.
- a 3’ polyadenylated tail of an mRNA transcript that hybridizes to a poly(dT) sequence can be used as a template for single-strand synthesis of a corresponding cDNA molecule.
- a poly(dT) sequence may be used as a sequencing primer for sequencing RNA molecules comprising poly(A) tails.
- a “non-terminating nucleotide” or “incorporating nucleotide” can include a nucleic acid moiety that can be attached to a 3' end of a polynucleotide using a polymerase or transcriptase, and that can have another non-terminating nucleic acid attached to it using a polymerase or transcriptase without the need to remove a protecting group or reversible terminator from the nucleotide.
- Naturally occurring nucleic acids are a type of non-terminating nucleic acid. Non-terminating nucleic acids may be labeled or unlabeled.
- a “PCR amplification” refers to the use of a polymerase chain reaction (PCR) to generate copies of genetic material, including DNA and RNA sequences. Suitable reagents and conditions for implementing PCR are described, for example, in U.S. Patent Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,512,462, the entire contents of each of which are incorporated herein by reference.
- the reaction mixture includes the genetic material to be amplified, an enzyme, one or more primers that are employed in a primer extension reaction, and reagents for the reaction.
- the oligonucleotide primers are of sufficient length to provide for hybridization to complementary genetic material under annealing conditions.
- the length of the primers generally depends on the length of the amplification domains, but will typically be at least 4 bases, at least 5 bases, at least 6 bases, at least 8 bases, at least 9 bases, at least 10 base pairs (bp), at least 11 bp, at least 12 bp, at least 13 bp, at least 14 bp, at least 15 bp, at least 16 bp, at least 17 bp, at least 18 bp, at least 19 bp, at least 20 bp, at least 25 bp, at least 30 bp, at least 35 bp, and can be as long as 40 bp or longer, where the length of the primers will generally range from 18 to 50 bp.
- the genetic material can be contacted with a single primer or a set of two primers (forward and reverse primers), depending upon whether primer extension, linear or exponential amplification of the genetic material is desired.
- the PCR amplification process uses a DNA polymerase enzyme.
- the DNA polymerase activity can be provided by one or more distinct DNA polymerase enzymes.
- the DNA polymerase enzyme is from a bacterium, e.g., the DNA polymerase enzyme is a bacterial DNA polymerase enzyme.
- the DNA polymerase can be from a bacterium of the genus Escherichia, Bacillus, Thermophilus, or Pyrococcus.
- PCR amplification can include reactions such as, but not limited to, a strand-displacement amplification reaction, a rolling circle amplification reaction, a ligase chain reaction, a transcription-mediated amplification reaction, an isothermal amplification reaction, and/or a loop-mediated amplification reaction.
- PCR amplification uses a single primer that is complementary to the 3’ tag of target DNA fragments.
- PCR amplification uses a first and a second primer, where at least a 3’ end portion of the first primer is complementary to at least a portion of the 3’ tag of the target nucleic acid fragments, and where at least a 3’ end portion of the second primer exhibits the sequence of at least a portion of the 5’ tag of the target nucleic acid fragments.
- a 5’ end portion of the first primer is non-complementary to the 3’ tag of the target nucleic acid fragments, and a 5’ end portion of the second primer does not exhibit the sequence of at least a portion of the 5’ tag of the target nucleic acid fragments.
- the first primer includes a first universal sequence and/or the second primer includes a second universal sequence.
- DNA polymerase includes not only naturally-occurring enzymes but also all modified derivatives thereof, including also derivatives of naturally-occurring DNA polymerase enzymes.
- the DNA polymerase can have been modified to remove 5’-3’ exonuclease activity.
- Sequence-modified derivatives or mutants of DNA polymerase enzymes that can be used include, but are not limited to, mutants that retain at least some of the functional, e.g. DNA polymerase activity of the wild-type sequence. Mutations can affect the activity profile of the enzymes, e.g. enhance or reduce the rate of polymerization, under different reaction conditions, e.g. temperature, template concentration, primer concentration, etc. Mutations or sequence-modifications can also affect the exonuclease activity and/or thermostability of the enzyme.
- DNA polymerases that can be used include, but are not limited to: E. coli DNA polymerase I, Bsu DNA polymerase, Bst DNA polymerase, Taq DNA polymerase, VENT™ DNA polymerase, DEEP VENT™ DNA polymerase, LongAmp® Taq DNA polymerase, LongAmp® Hot Start Taq DNA polymerase, Crimson LongAmp® Taq DNA polymerase, Crimson Taq DNA polymerase, OneTaq® DNA polymerase, OneTaq® Quick-Load® DNA polymerase, Hemo KlenTaq® DNA polymerase, REDTaq® DNA polymerase, Phusion® DNA polymerase, Phusion® High-Fidelity DNA polymerase, Platinum Pfx DNA polymerase, AccuPrime Pfx DNA polymerase, Phi29 DNA polymerase, Klenow fragment, Pwo DNA polymerase, Pfu DNA polymerase, T4 DNA polymerase and T7 DNA
- genetic material is amplified by reverse transcription polymerase chain reaction (RT-PCR).
- the desired reverse transcriptase activity can be provided by one or more distinct reverse transcriptase enzymes, suitable examples of which include, but are not limited to: M-MLV, MuLV, AMV, HIV, ArrayScript™, MultiScribe™, ThermoScript™, and SuperScript® I, II, III, and IV enzymes.
- Reverse transcriptase includes not only naturally occurring enzymes, but all such modified derivatives thereof, including also derivatives of naturally-occurring reverse transcriptase enzymes.
- reverse transcription can be performed using sequence-modified derivatives or mutants of M-MLV, MuLV, AMV, and HIV reverse transcriptase enzymes, including mutants that retain at least some of the functional, e.g. reverse transcriptase, activity of the wild-type sequence.
- the reverse transcriptase enzyme can be provided as part of a composition that includes other components, e.g. stabilizing components that enhance or improve the activity of the reverse transcriptase enzyme, such as RNase inhibitor(s), inhibitors of DNA-dependent DNA synthesis, e.g. actinomycin D.
- compositions including unmodified and modified enzymes are commercially available, e.g., ArrayScript™, MultiScribe™, ThermoScript™, and SuperScript® I, II, III, and IV enzymes.
- Certain reverse transcriptase enzymes can synthesize a complementary DNA strand using both RNA (cDNA synthesis) and single-stranded DNA (ssDNA) as a template.
- the reverse transcription reaction can use an enzyme (reverse transcriptase) that is capable of using both RNA and ssDNA as the template for an extension reaction, e.g., an AMV or MMLV reverse transcriptase.
- FIG. 1 The figures illustrate processes according to various embodiments.
- some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted.
- additional steps may be performed in combination with the exemplary processes. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature, and, as such, should not be viewed as limiting.
- FIG. 2 shows an exemplary schematic showing a general process 200 for analyzing a nucleic acid sequence.
- the method can include: generating a plurality of different codewords based on combinations of multiple different labels and multiple different states associated with each different label, wherein the multiple different labels are configured to be optically distinguishable from each other (202); assigning a different codeword from the plurality of different codewords to each one of four different bases, wherein the plurality of different codewords comprise the assigned codewords and one or more unassigned codewords, wherein (i) a first Hamming distance between any two assigned codewords is at least 2, and (ii) a second Hamming distance between any one of the assigned codewords and any one of the unassigned codewords is at least 1 (204); contacting a plurality of different polynucleotide templates with nucleotides of the four different bases, wherein the nucleotides comprise the multiple different labels according to the assigning in (b) (206); allowing binding and
- the method can further comprise iterating, over multiple iterations, the (c) contacting, the (d) allowing, the (e) imaging, or the (f) comparing. In some instances, the iterated method can further comprise identifying a nucleic acid sequence comprising the identified bases.
- in process 200, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 200. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- a plurality of different codewords based on combinations of multiple different labels and multiple different states associated with each different label are generated, wherein the multiple different labels are configured to be optically distinguishable from each other.
- the emission spectra for the multiple different labels can be largely non-overlapping, e.g., emission spectrum peaks between the multiple different labels are far enough apart such that different detectors can readily distinguish the emissions from the different types of labels.
- the multiple different labels can comprise fluorophores or purifiable tags.
- the fluorophores or purifiable tags can be small molecules.
- a label from the multiple different labels can comprise multiple fluorophores, purifiable tags, or a combination thereof.
- a codeword comprising multiple labels can comprise a homogeneous population of molecules all comprising the same multiple labels.
- a codeword comprising multiple labels can comprise a heterogeneous population of molecules comprising one or more combinations of multiple different labels.
- a codeword comprising label 1 and label 2 can correspond to a population comprising molecules with only label 1 and molecules with only label 2, and/or molecules with label 1 and label 2.
- the multiple different labels can be removed from the nucleotides comprising the multiple different labels.
- the maximum number of possible multiple different labels can be at least three. The maximum number of possible multiple different labels can determine the maximum length of the codewords.
- a state from the multiple different states corresponding to each different label can be discrete. That is, each different label may assume one of the multiple different states, and each label may have a same maximum possible number of the multiple different states.
- the state can be based on a discretized analog signal from the signals corresponding to the labels of the bound and optionally incorporated nucleotides. That is, the discretized analog signal may derive from an originally analog signal, such as the strength of an emitted fluorescence signal from the label (e.g., an indicator of the number of detected photons from the label).
- the discretization of the analog signal may comprise thresholding the analog signal into the different states.
- signals that are below one or more thresholds can be considered one state, and signals that are above the one or more thresholds can be considered a different state.
- the state can be LOW or HIGH, which can also be expressed as 0 or 1.
- the state can be LOW, MIDDLE, or HIGH, which can also be expressed as 0 or 1 or 2.
- Float values can be used to represent the different states, but the float values can be understood to represent discrete values, e.g., states, and need not imply the existence of intermediate values.
- a maximum number of states from the multiple different states can be at least two.
- a maximum number of states from the multiple different states can be at least three. States can be mutually exclusive, such that a label occupying a first state may preclude the label from occupying a second state.
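The thresholding of an analog signal into discrete, mutually exclusive states can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names, the intensity values, and the convention that a signal exactly equal to a threshold falls into the higher state are all assumptions.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def discretize(intensity: float, thresholds: Sequence[float]) -> int:
    """Map an analog intensity to a state index.

    With one threshold the result is 0 (LOW) or 1 (HIGH); with two
    thresholds it is 0 (LOW), 1 (MIDDLE), or 2 (HIGH).
    Assumption: an intensity equal to a threshold counts as the higher state.
    """
    return bisect_right(sorted(thresholds), intensity)


def to_code(intensities: Sequence[float],
            thresholds: Sequence[float]) -> Tuple[int, ...]:
    """Discretize one intensity per label into a recorded signal code."""
    return tuple(discretize(value, thresholds) for value in intensities)
```

For example, with a single hypothetical threshold of 100.0, the per-label intensities `[12.0, 340.0, 8.5]` would be recorded as the code `(0, 1, 0)`.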
- a codeword can be thought of as a vector or a list, as used in mathematics or computer science — that is, a data structure comprising ordered elements.
- the codeword (0, 0, 0) can be thought of as a vector with elements 0, 0, and 0, with corresponding indices of 0, 1, and 2 (if zero-based indexing is used).
- the codeword (1, 0, 0) can be thought of as a vector with elements 1, 0, and 0, with corresponding indices of 0, 1, and 2 (if zero-based indexing is used).
- a different codeword from the plurality of different codewords is assigned to each one of four different bases, wherein the plurality of different codewords comprise the assigned codewords and one or more unassigned codewords, wherein (i) a first Hamming distance between any two assigned codewords is at least 2, and (ii) a second Hamming distance between any one of the assigned codewords and any one of the unassigned codewords is at least 1.
- the four different bases can comprise adenine (A), thymine (T), cytosine (C), guanine (G), or uracil (U).
- a maximum number of different codewords of the plurality of different codewords can be greater than four.
- a codeword from the plurality of different codewords can correspond to a base of the four different bases.
- a maximum number of assigned codewords can be 1, 2, 3, or 4.
- a maximum number of unassigned codewords can be 1, 2, 3, or 4.
- the maximum number of unassigned codewords can be at least equal to the maximum number of assigned codewords.
- At least one of the unassigned codewords can be one Hamming distance away from at least one of the assigned codewords. The Hamming distance can be understood as a minimum number of state changes to the labels for a first codeword to resemble a second codeword.
- the first codeword (0, 0, 1) has a Hamming distance of 2 relative to (0, 1, 0), because the minimal number of state changes for the first codeword to resemble the second codeword comprises both the second label and the third label of the first codeword changing states — i.e., two changes — to resemble the second codeword.
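The Hamming distance described above can be computed by counting the positions at which two equal-length codewords differ. The sketch below represents codewords as Python tuples; the function name is a hypothetical choice.

```python
from typing import Sequence


def hamming_distance(first: Sequence[int], second: Sequence[int]) -> int:
    """Count the label positions whose states differ between two codewords."""
    if len(first) != len(second):
        raise ValueError("codewords must have the same number of labels")
    return sum(a != b for a, b in zip(first, second))


# The example from the text: the second and third labels both change state.
distance = hamming_distance((0, 0, 1), (0, 1, 0))  # 2
```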
- the number of possible codewords in the assignment scheme can be determined by: $(\text{states}_{\max})^{\text{labels}_{\max}}$, where $\text{states}_{\max}$ is the maximum number of possible states per label and $\text{labels}_{\max}$ is the maximum number of possible labels.
- An advantage of the disclosed methods can be seen from the expression $(\text{states}_{\max})^{\text{labels}_{\max}}$ for the number of possible codewords: the number of possible codewords grows exponentially with the maximum number of possible labels and as a power of the maximum number of possible states. Therefore, even a small increase in the maximum number of possible states or the maximum number of possible labels can result in a large increase in the number of possible codewords that can be assigned or unassigned.
- An increase in the number of possible codewords is beneficial, because the larger the number of possible codewords in the assignment space, the larger the maximum possible Hamming distance between codewords and the larger the number of unique unassigned codewords, which means that original codewords can be more confidently inferred (e.g., with higher probability), in the event of a sequencing error.
- the assignment scheme can be better able to check errors of greater magnitude than single-code errors (e.g., flipping of 0 to 1 or vice versa at a single code position in a codeword), e.g., two-code errors or three-code errors.
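The growth of the codeword space can be checked by enumerating every combination of states across labels. This is a small sketch with a hypothetical function name; it assumes every label can take the same number of states, as stated earlier in the text.

```python
from itertools import product
from typing import List, Tuple


def all_codewords(n_states: int, n_labels: int) -> List[Tuple[int, ...]]:
    """Enumerate every codeword: one state (0..n_states-1) per label."""
    return list(product(range(n_states), repeat=n_labels))


# The enumeration size equals n_states ** n_labels.
two_state_three_label = all_codewords(2, 3)    # 2**3 = 8 codewords
three_state_three_label = all_codewords(3, 3)  # 3**3 = 27 codewords
```

With four bases to encode, the two-state, three-label case leaves four codewords unassigned, while the three-state, three-label case leaves twenty-three.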
- a plurality of different polynucleotide templates is contacted with nucleotides of the four different bases, wherein the nucleotides comprise the multiple different labels according to the assigning in 204.
- the nucleotides of the four different bases comprise a reversible terminator.
- the reversible terminator can comprise a 3’-O-blocked reversible terminator or a 3’-unblocked reversible terminator.
- the plurality of different polynucleotide templates are affixed to a substrate.
- the substrate can be a nanoball or a flow cell.
- the plurality of different polynucleotide templates can comprise one or more clusters of different polynucleotide templates.
- the one or more clusters can comprise an ordered array of clusters on the substrate.
- the one or more clusters can be formed via bridge amplification.
- the one or more clusters can comprise multiple molecules comprising: i) one or more adapter sequences and/or one or more primer binding sequences and ii) the same template sequence or complement thereof.
- the one or more clusters can be formed via rolling circle amplification (RCA).
- the one or more clusters can each comprise one or more RCA products (RCPs) each comprising: i) one or more adapter sequences and/or one or more primer binding sequences and ii) the same template sequence.
- the one or more clusters can comprise an intermediate from a sequencing by synthesis method or a sequencing by avidity method.
- binding and optional incorporation of the nucleotides is allowed based on complementarity to nucleotide residues in the plurality of different polynucleotide templates.
- the plurality of different polynucleotide templates is imaged to record the multiple different states associated with each of the multiple different labels, thereby generating recorded signal code sequences associated with the labels of the bound and optionally incorporated nucleotides.
- the imaging can be done with a detector, such as a camera or photomultiplier tube.
- the recorded signal code sequences are compared to the plurality of different codewords, thereby identifying the bases corresponding to the recorded signal code sequences and/or checking for errors in the recorded signal code sequences.
- the unassigned codewords can be configured to detect an error.
- the error can comprise at least one change in the state of a label from the multiple different labels for the codeword.
- the error can comprise a result of a random event.
- the error can comprise a sequencing error.
- the sequencing error can be the result of the random event, or the sequencing error can be the result of a systematic deficiency in the sequencing method.
- the error can comprise an error in a single nucleotide.
- the error can comprise an error in resolving a resolution.
- the error in resolving the resolution can comprise an error in resolving a spatial resolution.
- the error in resolving the spatial resolution can comprise an error in resolving signals from different areas on a flow cell of a next-generation sequencing platform.
- the error in resolving a resolution can comprise an error in resolving a temporal resolution.
- the error in resolving the temporal resolution can comprise an error in resolving signals across sequencing cycles of a next-generation sequencing platform. For example, the signals may bleed across sequencing cycles, e.g., some indices being sequenced may fall behind or be ahead of other indices being sequenced, which can result in signals associated with the sequencing that are difficult to interpret.
- the error can be detected in real time.
- some sequencing methods can comprise determining the nucleic acid sequence, one or more nucleotide bases at a time, and the determining can be indicated to the experimenter in real time. Accordingly, the determining of the one or more nucleotide bases can comprise determining an unassigned base in real time, provided the specifications of the methods described herein. The error can also be detected post hoc. For example, some sequencing methods may determine signals related to the sequencing of the nucleic acid sequence, after at least some portion of the sequencing is complete.
- the determined signals may determine an unassigned base from the sequencing process, and the determined unassigned base may be recorded, along with its corresponding information, e.g., the index of the unassigned base.
- the recorded unassigned base and its corresponding information can then be analyzed, after at least some portion of the sequencing is complete, e.g., post hoc.
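The comparison of recorded signal codes against the codeword table, with unassigned codes flagged and their indices recorded for later analysis, can be sketched as below. The base-to-codeword mapping, the function name, and the use of "N" as a placeholder for an unassigned call are illustrative assumptions, not values taken from the disclosure.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical assignment: three two-state labels, four assigned codewords.
ASSIGNED: Dict[Tuple[int, ...], str] = {
    (1, 0, 0): "A",
    (0, 1, 0): "C",
    (0, 0, 1): "G",
    (1, 1, 1): "T",
}


def call_bases(recorded_codes: Iterable[Sequence[int]]) -> Tuple[str, List[int]]:
    """Return the called sequence and the indices of unassigned codes.

    A recorded code that matches no assigned codeword is treated as a
    detected error: it is called as "N" (assumption) and its index is kept.
    """
    bases: List[str] = []
    error_indices: List[int] = []
    for index, code in enumerate(recorded_codes):
        base = ASSIGNED.get(tuple(code))
        if base is None:
            bases.append("N")
            error_indices.append(index)
        else:
            bases.append(base)
    return "".join(bases), error_indices
```

Calling the function once per cycle approximates real-time detection, while running it over a stored list of codes after the run approximates the post hoc analysis.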
- a method for nucleic acid sequence analysis can comprise: (a) contacting a plurality of different polynucleotide templates with nucleotides comprising: (i) nucleotides of a first base, each labeled with a first label, (ii) nucleotides of a second base, each labeled with a second label, (iii) nucleotides of a third base, each labeled with a third label, and (iv) nucleotides of a fourth base, comprising species labeled with the first label, species labeled with the second label, and species labeled with the third label, wherein the first, second, third, and fourth bases are different bases, and the first, second, and third labels are configured to be optically distinguishable from each other; (b) allowing binding and optional incorporation of the pool of nucleotides based on complementarity to nucleotide residues in the plurality of different polynucleotide templates;
- a method for nucleic acid sequence analysis comprising: (a) contacting a plurality of different polynucleotide templates with nucleotides comprising: (i) nucleotides of a first base, comprising species labeled with a first label and species labeled with a second label, (ii) nucleotides of a second base, comprising species labeled with the first label and species labeled with a third label, (iii) nucleotides of a third base, comprising species labeled with the second label and species labeled with the third label, and (iv) nucleotides of a fourth base, none of which is labeled with the first label, the second label, or the third label, wherein the first, second, third, and fourth bases are different bases, and the first, second, and third labels are configured to be optically distinguishable from each other; (b) allowing binding and optional incorporation of the pool of nucleotides based on complement
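The two labeling arrangements above can be checked against the Hamming-distance conditions stated earlier. The sketch assumes each label is read as a two-state (absent = 0, present = 1) signal, so that the first arrangement corresponds to the codewords (1,0,0), (0,1,0), (0,0,1), (1,1,1) and the second to (1,1,0), (1,0,1), (0,1,1), (0,0,0); this encoding and all names in the code are assumptions made for illustration.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Codeword = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of label positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def check_scheme(assigned: List[Codeword]) -> Tuple[int, int, List[Codeword]]:
    """Return (min assigned-assigned distance,
               min assigned-unassigned distance,
               sorted unassigned codewords) for three two-state labels."""
    universe = set(product((0, 1), repeat=3))
    unassigned = sorted(universe - set(assigned))
    min_assigned = min(hamming(a, b) for a, b in combinations(assigned, 2))
    min_cross = min(hamming(a, u) for a in assigned for u in unassigned)
    return min_assigned, min_cross, unassigned


# First arrangement: three singly labeled bases, one base with all three labels.
SCHEME_ONE = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
# Second arrangement: three doubly labeled bases, one unlabeled base.
SCHEME_TWO = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]
```

Under this encoding, both arrangements keep every pair of assigned codewords at least two state changes apart, so any single-label state change lands on an unassigned codeword and can be detected.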
- the nucleic acid molecules used in the methods described herein may be obtained from any suitable biological source, for example a tissue sample, a blood sample, a plasma sample, a saliva sample, a fecal sample, or a urine sample.
- the polynucleotides may be DNA or RNA molecules.
- RNA molecules are reverse transcribed into DNA molecules prior to hybridizing the polynucleotide to a sequencing primer.
- RNA molecules are not reverse transcribed and are hybridized to a sequencing primer for direct RNA sequencing.
- the nucleic acid molecule is a cell-free DNA (cfDNA), such as a circulating tumor DNA (ctDNA) or a fetal cell-free DNA.
- nucleic acid molecules include DNA molecules such as single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), genomic DNA, methylated DNA, specific methylated DNA sequences, fragmented DNA, mitochondrial DNA, in situ synthesized PCR products, and RNA/DNA hybrids.
- the DNA analyte can be a transcript of another nucleic acid molecule (e.g., DNA or RNA such as mRNA) present in a tissue sample.
- nucleic acid molecules include RNA molecules, such as various types of coding and non-coding RNA, including viral RNAs.
- RNA molecules include messenger RNA (mRNA), including a nascent RNA, a pre-mRNA, a primary-transcript RNA, and a processed RNA, such as a capped mRNA (e.g., with a 5’ 7-methylguanosine cap), a polyadenylated mRNA (poly-A tail at the 3’ end), and a spliced mRNA in which one or more introns have been removed.
- the RNA analyte can be a transcript of another nucleic acid molecule (e.g., DNA or RNA such as viral RNA).
- a nucleic acid molecule may be a denatured nucleic acid, wherein the resulting denatured nucleic acid is single-stranded.
- the nucleic acid may be denatured, for example, optionally using formamide, heat, or both formamide and heat. In some embodiments, the nucleic acid is not denatured for use in a method disclosed herein.
- a nucleic acid molecule can be extracted from a cell, a virus, or a tissue sample comprising the cell or virus. Processing conditions can be adjusted to extract or release nucleic acid molecules (e.g., RNA) from a cell, a virus, or a tissue sample.
- a method disclosed herein comprises using one or more nucleotides or analogs thereof, including a native nucleotide or a nucleotide analog or modified nucleotide (e.g., labeled with one or more detectable labels).
- a nucleotide analog comprises a nitrogenous base, five-carbon sugar, and phosphate group, wherein any component of the nucleotide may be modified and/or replaced.
- a method disclosed herein may comprise but does not require using one or more non-incorporable nucleotides. Non-incorporable nucleotides may be modified to become incorporable at any point during the sequencing method.
- Nucleotide analogs include, but are not limited to, alpha-phosphate modified nucleotides, alpha-beta nucleotide analogs, beta-phosphate modified nucleotides, beta-gamma nucleotide analogs, gamma-phosphate modified nucleotides, caged nucleotides, or ddNTPs. Examples of nucleotide analogs are described in U.S. Patent No. 8,071,755, which is incorporated by reference herein in its entirety.
- a method disclosed herein may comprise but does not require using terminators that reversibly prevent nucleotide incorporation at the 3'-end of the primer.
- One type of reversible terminator is a 3'-O-blocked reversible terminator.
- the terminator moiety is linked to the oxygen atom of the 3'-OH end of the 5-carbon sugar of a nucleotide.
- U.S. Patent Nos. 7,544,794 and 8,034,923 (the disclosures of these patents are incorporated by reference) describe reversible terminator dNTPs having the 3'-OH group replaced by a 3'-ONH2 group.
- another type of reversible terminator is a 3'-unblocked reversible terminator, wherein the terminator moiety is linked to the nitrogenous base of a nucleotide.
- U.S. Patent No. 8,808,989 discloses particular examples of base-modified reversible terminator nucleotides that may be used in connection with the methods described herein.
- Other reversible terminators that similarly can be used in connection with the methods described herein include those described in U.S. Patent Nos. 7,956,171, 8,071,755, and 9,399,798, herein incorporated by reference.
- a method disclosed herein may comprise but does not require using nucleotide analogs having terminator moieties that irreversibly prevent nucleotide incorporation at the 3'-end of the primer.
- Irreversible nucleotide analogs include 2',3'-dideoxynucleotides, ddNTPs (ddGTP, ddATP, ddTTP, ddCTP). Dideoxynucleotides lack the 3'-OH group of dNTPs that is essential for polymerase-mediated synthesis.
- a method disclosed herein may comprise but does not require using non-incorporable nucleotides comprising a blocking moiety that inhibits or prevents the nucleotide from forming a covalent linkage to a second nucleotide (3'-OH of a primer) during the incorporation step of a nucleic acid polymerization reaction.
- the blocking moiety can be removed from the nucleotide, allowing for nucleotide incorporation.
- a method disclosed herein may comprise but does not require using 1, 2, 3, 4 or more nucleotide analogs present in the SBS reaction.
- a nucleotide analog is replaced, diluted, or sequestered during an incorporation step.
- a nucleotide analog is replaced with a native nucleotide.
- a nucleotide analog is modified during an incorporation step. The modified nucleotide analog can be similar to or the same as a native nucleotide.
- a method disclosed herein may comprise but does not require using a nucleotide analog having a different binding affinity for a polymerase than a native nucleotide.
- a nucleotide analog has a different interaction with a next base than a native nucleotide.
- Nucleotide analogs and/or non-incorporable nucleotides may basepair with a complementary base of a template nucleic acid.
- one or more nucleotides can be labeled with distinguishing and/or detectable tags or labels.
- the tags may be distinguishable by means of their differences in fluorescence, Raman spectrum, charge, mass, refractive index, luminescence, length, or any other measurable property.
- the tag may be attached to one or more different positions on the nucleotide, so long as the fidelity of binding to the polymerase-nucleic acid complex is sufficiently maintained to enable identification of the complementary base on the template nucleic acid correctly.
- the tag is attached to the nucleobase of the nucleotide.
- a tag is attached to the gamma phosphate position of the nucleotide.
- Detectable labels can be suitable for small-scale detection and/or suitable for high-throughput screening.
- suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.
- the detectable label can be qualitatively detected (e.g., optically or spectrally), or it can be quantified.
- Qualitative detection generally includes a detection method in which the existence or presence of the detectable label is confirmed, whereas quantifiable detection generally includes a detection method having a quantifiable (e.g., numerically reportable) value such as an intensity, duration, polarization, and/or other properties.
- the detectable label is bound to another moiety, for example, a nucleotide or nucleotide analog, and can include a fluorescent, a colorimetric, or a chemiluminescent label.
- a detectable label can be attached to another moiety, for example, a nucleotide or nucleotide analog.
- the detectable label is a fluorophore.
- the fluorophore can be from a group that includes: 7-AAD (7-Aminoactinomycin D), Acridine Orange (+DNA), Acridine Orange (+RNA), Alexa Fluor® 350, Alexa Fluor® 430, Alexa Fluor® 488, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 555, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 633, Alexa Fluor® 647, Alexa Fluor® 660, Alexa Fluor® 680, Alexa Fluor® 700, Alexa Fluor® 750, Allophycocyanin (APC), AMCA / AMCA-X, 7-Aminoactinomycin D (7-AAD), 7- Amin
- the detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a substrate compound or composition, which substrate compound or composition is directly detectable.
- the label can emit a signal or alter a signal delivered to the label so that the presence or absence of the label can be detected.
- coupling may be via a linker, which may be cleavable, such as photo-cleavable (e.g., cleavable under ultra-violet light), chemically-cleavable (e.g., via a reducing agent, such as dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP)) or enzymatically cleavable (e.g., via an esterase, lipase, peptidase, or protease).
- Polymerases that may be used to carry out the disclosed techniques include naturally occurring polymerases and any modified variations thereof, including, but not limited to, mutants, recombinants, fusions, genetic modifications, chemical modifications, synthetics, and analogs.
- Naturally occurring polymerases and modified variations thereof are not limited to polymerases that retain the ability to catalyze a polymerization reaction.
- the naturally occurring and/or modified variations thereof retain the ability to catalyze a polymerization reaction.
- the naturally-occurring and/or modified variations have special properties that enhance their ability to sequence DNA, including enhanced binding affinity to nucleic acids, reduced binding affinity to nucleic acids, enhanced catalysis rates, reduced catalysis rates, etc.
- Mutant polymerases include polymerases wherein one or more amino acids are replaced with other amino acids (naturally or non-naturally occurring), and insertions or deletions of one or more amino acids.
- a method disclosed herein may comprise but does not require using modified polymerases containing an external tag (e.g., an exogenous detectable label), which can be used to monitor the presence and interactions of the polymerase.
- intrinsic signals from the polymerase can be used to monitor their presence and interactions.
- the provided methods can include monitoring the interaction of the polymerase, nucleotide and template nucleic acid through detection of an intrinsic signal from the polymerase.
- the intrinsic signal is a light scattering signal.
- intrinsic signals include native fluorescence of certain amino acids such as tryptophan.
- a method disclosed herein may comprise using an unlabeled polymerase, and monitoring is performed in the absence of an exogenous detectable label associated with the polymerase.
- Some modified polymerases or naturally occurring polymerases, under specific reaction conditions, may incorporate only single nucleotides and may remain bound to the primer-template after the incorporation of the single nucleotide.
- a method disclosed herein may comprise using a polymerase unlabeled with an exogenous detectable label (e.g., a fluorescent label).
- the label can be chemically linked to the structure of the polymerase by a covalent bond after the polymerase has been at least partially purified using protein isolation techniques.
- the exogenous detectable label can be chemically linked to the polymerase using a free sulfhydryl or a free amine moiety of the polymerase. This can involve chemical linkage to the polymerase through the side chain of a cysteine residue, or through the free amino group of the N-terminus.
- a fluorescent label attached to the polymerase is useful for locating the polymerase, as may be important for determining whether or not the polymerase has localized to a spot on an array corresponding to immobilized primed template nucleic acid.
- the fluorescent signal need not, and in some embodiments does not change absorption or emission characteristics as the result of binding any nucleotide.
- the signal emitted by the labeled polymerase is maintained uniformly in the presence and absence of any nucleotide being investigated as a possible next correct nucleotide.
- polymerase and its variants also refer to fusion proteins comprising at least two portions linked to each other, for example, where one portion comprising a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand is linked to another portion that comprises a second moiety, such as a reporter enzyme or a processivity-modifying domain.
- T7 DNA polymerase comprises a nucleic acid polymerizing domain and a thioredoxin binding domain, wherein thioredoxin binding enhances the processivity of the polymerase.
- Absent thioredoxin binding, T7 DNA polymerase is a distributive polymerase with a processivity of only one to a few bases.
- Although DNA polymerases differ in detail, they have a similar overall shape of a hand with specific regions referred to as the fingers, the palm, and the thumb; and a similar overall structural transition, comprising the movement of the thumb and/or finger domains, during the synthesis of nucleic acids.
- DNA polymerases include, but are not limited to, bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases.
- Bacterial DNA polymerases include E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase, Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase.
- Eukaryotic DNA polymerases include DNA polymerases α, β, γ, δ, ε,
- Viral DNA polymerases include T4 DNA polymerase, phi-29 DNA polymerase, GA-1, phi-29-like DNA polymerases, PZA DNA polymerase, phi-15 DNA polymerase, Cp1 DNA polymerase, Cp7 DNA polymerase, T7 DNA polymerase, and T4 polymerase.
- DNA polymerases include thermostable and/or thermophilic DNA polymerases such as Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase and Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase, Pyrococcus sp. GB-D polymerase, Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp.
- modified versions of the extremely thermophilic marine archaea Thermococcus species 9° N can be used.
- Still other useful DNA polymerases, including the 3PDX polymerase, are disclosed in U.S. Patent No. 8,703,461, the disclosure of which is incorporated by reference in its entirety.
- RNA polymerases include, but are not limited to, viral RNA polymerases such as T7 RNA polymerase, T3 polymerase, SP6 polymerase, and K11 polymerase; eukaryotic RNA polymerases such as RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V; and archaeal RNA polymerase.
- Reverse transcriptases include, but are not limited to, HIV-1 reverse transcriptase from human immunodeficiency virus type 1 (PDB 1HMV), HIV-2 reverse transcriptase from human immunodeficiency virus type 2, M-MLV reverse transcriptase from the Moloney murine leukemia virus, AMV reverse transcriptase from the avian myeloblastosis virus, and Telomerase reverse transcriptase that maintains the telomeres of eukaryotic chromosomes.
- a first labeled nucleotide that has been incorporated is not deactivated (e.g., by removal and/or photobleaching of the label) prior to the introduction and/or incorporation of the next, second labeled nucleotide.
- the first and second labeled nucleotides can comprise the same base or different bases.
- the first and second labeled nucleotides can be introduced into a sequencing reaction mix simultaneously or at different time points in any order.
- each of the first and second labeled nucleotides can be introduced by itself (e.g., in a suitable solvent such as water) or in a mixture with another sequencing reagent, such as one or more other labeled nucleotides and/or one or more unlabeled nucleotides.
- the first and second labeled nucleotides can also comprise the same base or different bases.
- nucleotides that have not been incorporated at a residue corresponding to a base in the template nucleic acid are not removed from the sequencing reaction mix prior to the introduction and/or incorporation of the second labeled nucleotide.
- the first and second labeled nucleotides are provided in the same sequencing reaction mix, and the first, second, and optionally any subsequent labeled nucleotide(s) are incorporated sequentially in a continuous manner.
- some embodiments of the method disclosed herein use continuous introduction and/or incorporation of nucleotides (e.g., fluorescently labeled A, T, C, and/or G nucleotides) without the need for label deactivation (e.g., by cleaving and/or photobleaching the label) and/or wash steps between sequential incorporation events for a given template nucleic acid molecule to be sequenced.
- label deactivation of a first incorporated nucleotide may occur stochastically throughout the continuous nucleotide incorporation process, for instance, prior to, during, or after the incorporation of a second, third, fourth, or a subsequent labeled nucleotide.
- Nucleic acid sequencing reaction mixtures typically include reagents that are commonly present in polymerase based nucleic acid synthesis reactions.
- the reaction mixture can include other molecules including, but not limited to, enzymes.
- the reaction mixture comprises any reagents or biomolecules generally present in a nucleic acid polymerization reaction.
- Reaction components may include, but are not limited to, salts, buffers, small molecules, detergents, crowding agents, metals, and ions.
- properties of the reaction mixture may be manipulated, for example, electrically, magnetically, and/or with vibration.
- the provided methods herein may further comprise but do not require one or more wash steps; a temperature change; a mechanical vibration; a pH change; or an optical stimulation that is not dye illumination or photobleaching.
- the wash step comprises contacting the substrate and the nucleic acid molecule, the primer, and/or the polymerase with one or more buffers, detergents, protein denaturants, proteases, oxidizing agents, reducing agents, or other agents capable of crosslinking or releasing crosslinks, e.g., crosslinks within a polymerase or crosslinks between a polymerase and nucleic acid.
- Methods and compositions for nucleic acid sequencing are known, for example, as described in U.S. Patent Nos. 10,246,744 and 10,844,428, incorporated herein by reference in their entireties for all purposes.
- Reaction mixture reagents can include, but are not limited to, enzymes (e.g., polymerase), dNTPs, template nucleic acids, primer nucleic acids, salts, buffers, small molecules, co-factors, metals, and ions.
- the ions may be catalytic ions, divalent catalytic ions, non-catalytic ions, non-covalent metal ions, or a combination thereof.
- the reaction mixture can include salts, such as NaCl, KCl, potassium acetate, ammonium acetate, potassium glutamate, or NH4Cl or the like, that ionize in aqueous solution to yield monovalent cations.
- the reaction mixture can include a source of ions, such as Mg2+, Mn2+, Co2+, Cd2+, and/or Ba2+ ions.
- the reaction mixture can include tin, Ca2+, Zn2+, Cu2+, Co2+, Fe2+, and/or Ni2+, or other divalent non-catalytic metal cations.
- the reaction mixture can include metal cations that may inhibit formation of phosphodiester bonds between the primed template nucleic acid molecule and the cognate nucleotide.
- the metal cations can be used (e.g., at a suitable concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- the sequencing reaction conditions comprise contacting the nucleic acid molecule and the primer with a buffer that regulates osmotic pressure.
- the reaction mixture comprises a buffer that regulates osmotic pressure.
- the buffer is a high salt buffer that includes a monovalent ion, such as a monovalent metal ion (e.g., potassium ion or sodium ion) at a concentration of from about 50 to about 1,500 mM.
- the buffer further comprises a source of glutamate ions (e.g., potassium glutamate).
- the buffer comprises a stabilizing agent.
- the stabilizing agent is a non-catalytic metal ion (e.g., a divalent non-catalytic metal ion).
- Non-catalytic metal ions useful in this context include, but are not limited to, calcium, strontium, scandium, titanium, vanadium, chromium, iron, cobalt, nickel, copper, zinc, gallium, germanium, arsenic, selenium, rhodium, europium, and/or terbium.
- the non-catalytic metal ion is strontium, tin, or nickel.
- the sequencing reaction mixture comprises strontium chloride or nickel chloride.
- the stabilizing agent can be used (e.g., at a suitable concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- the buffer can include Tris, Tricine, HEPES, MOPS, ACES, MES, phosphate-based buffers, and acetate-based buffers.
- the reaction mixture can include chelating agents such as EDTA, EGTA, and the like.
- the reaction mixture includes cross-linking reagents.
- the interaction between the polymerase and template nucleic acid may be manipulated by modulating sequencing reaction parameters such as ionic strength, pH, temperature, or any combination thereof, or by the addition of a destabilizing agent to the reaction.
- the destabilizing agent can be used (e.g., at a suitable concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- pH changes are utilized to destabilize a complex between the polymerase and template nucleic acid.
- the reaction conditions favor the stabilization of a complex among the polymerase, the template nucleic acid, and a labeled nucleotide.
- the pH of the reaction mixture can be adjusted from 4.0 to 10.0 to favor the stabilization of a complex among the polymerase, the template nucleic acid, and a labeled nucleotide.
- the pH of the reaction mixture is from 4.0 to 6.0.
- the pH of the reaction mixture is 6.0 to 10.0.
- a suitable salt concentration and/or a suitable pH can be selected to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- the reaction mixture comprises a competitive inhibitor, where the competitive inhibitor may reduce the occurrence of multiple incorporation events in a detection window.
- the competitive inhibitor is a non-incorporable nucleotide.
- the competitive inhibitor is an aminoglycoside. The competitive inhibitor is capable of replacing either the nucleotide or the catalytic metal ion in the active site, such that the competitive inhibitor occupies the active site preventing or slowing down a nucleotide incorporation.
- both an incorporable nucleotide and a competitive inhibitor are introduced, such that the ratio of the incorporable nucleotide to the inhibitor can be adjusted to modulate the rate of incorporation of a single nucleotide at the 3'-end of the primer.
- the competitive inhibitor can be used (e.g., at a low concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
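The effect of the nucleotide-to-inhibitor ratio on incorporation rate can be illustrated with the textbook Michaelis-Menten expression for competitive inhibition. The sketch below is an illustration only: the disclosure does not specify a kinetic model, and the function name `incorporation_rate` and all parameter values are hypothetical.

```python
def incorporation_rate(v_max, k_m, nucleotide, inhibitor, k_i):
    """Textbook competitive-inhibition rate law (an assumed model).

    v = v_max * [S] / (K_m * (1 + [I] / K_i) + [S])

    All concentrations and constants must share the same units.
    """
    if min(v_max, nucleotide, inhibitor) < 0 or k_m <= 0 or k_i <= 0:
        raise ValueError("rates and concentrations must be non-negative; K_m and K_i positive")
    apparent_k_m = k_m * (1.0 + inhibitor / k_i)  # inhibitor raises the apparent K_m
    return v_max * nucleotide / (apparent_k_m + nucleotide)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for inhibitor in (0.0, 1.0, 10.0):
        rate = incorporation_rate(v_max=1.0, k_m=1.0, nucleotide=1.0,
                                  inhibitor=inhibitor, k_i=1.0)
        print(f"[I] = {inhibitor:5.1f} -> relative rate {rate:.3f}")
```

Under this assumed model, raising the inhibitor concentration relative to the nucleotide lowers the rate without driving it to zero, which matches the stated goal of slowing but not preventing incorporation.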
- the reaction mixture comprises at least one nucleotide molecule that is a non-incorporable nucleotide.
- the reaction mixture comprises one or more nucleotide molecules incapable of incorporation into the primer of the primed template nucleic acid molecule.
- nucleotides incapable of incorporation include, for example, monophosphate nucleotides.
- the nucleotide may contain modifications to the triphosphate group that make the nucleotide non-incorporable. Examples of non-incorporable nucleotides may be found in U.S. Pat. No. 7,482,120, which is incorporated by reference herein in its entirety.
- the primer may not contain a free hydroxyl group at its 3'-end, thereby rendering the primer incapable of incorporating any nucleotide, and, thus, making any nucleotide non-incorporable.
- the primer may be processed such that it contains a free hydroxyl group at its 3'-end to allow nucleotide incorporation.
- the non-incorporable nucleotide can be used (e.g., at a low concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- the reaction mixture comprises at least one nucleotide molecule that is incorporable but is incorporated at a slower rate compared to a corresponding naturally occurring nucleoside triphosphate (e.g., NTP or dNTP).
- nucleotides incorporable at a slower rate may include, for example, diphosphate nucleotides.
- the nucleotide may contain modifications to the triphosphate group that make the nucleotide incorporable at a slower rate.
- the nucleotide incorporable at a slower rate can be used to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
- the reaction mixture comprises a polymerase inhibitor.
- the polymerase inhibitor is a pyrophosphate analog.
- the polymerase inhibitor is an allosteric inhibitor.
- the polymerase inhibitor is a DNA or an RNA aptamer.
- the polymerase inhibitor competes with a catalytic-ion binding site in the polymerase.
- the polymerase inhibitor is a reverse transcriptase inhibitor.
- the polymerase inhibitor may be an HIV-1 reverse transcriptase inhibitor or an HIV-2 reverse transcriptase inhibitor.
- the HIV-1 reverse transcriptase inhibitor may be a (4/6-halogen/MeO/EtO-substituted benzo[d]thiazol-2-yl)thiazolidin-4-one.
- the polymerase inhibitor can be used (e.g., at a low concentration) to slow down but not completely inhibit or prevent nucleotide incorporation, thereby reducing multiple nucleotide incorporation events in a single detection window.
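Several of the preceding passages describe slowing incorporation so that fewer detection windows contain more than one incorporation event. A minimal sketch of why a lower rate helps is given below, under the assumption (not stated in the disclosure) that incorporation events at a single template follow a Poisson process. The function name and the example rates and window length are hypothetical.

```python
import math


def prob_multiple_incorporations(rate_per_s, window_s):
    """Probability of two or more events in one detection window.

    Assumes events follow a Poisson process with a constant rate:
    P(N >= 2) = 1 - exp(-m) * (1 + m), where m = rate * window.
    """
    if rate_per_s < 0 or window_s < 0:
        raise ValueError("rate and window must be non-negative")
    mean_events = rate_per_s * window_s
    return 1.0 - math.exp(-mean_events) * (1.0 + mean_events)


if __name__ == "__main__":
    window = 0.1  # seconds, hypothetical detection window
    for rate in (10.0, 1.0, 0.1):  # events per second, hypothetical
        p = prob_multiple_incorporations(rate, window)
        print(f"rate {rate:5.1f}/s -> P(>=2 events per window) = {p:.5f}")
```

In this model the probability of a multiple-incorporation window falls roughly with the square of the mean number of events per window, so even a modest slowdown reduces such windows substantially.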
- the contacting step is facilitated by the use of a chamber such as a flow cell.
- the methods and apparatus described herein may employ next generation sequencing technology (NGS), which allows massively parallel sequencing.
- single DNA molecules are sequenced in a massively parallel fashion within a reaction chamber.
- a flow cell may be used but is not necessary.
- Flowing liquid reagents through the flow cell, which contains an interior solid support surface (e.g., a planar surface), conveniently permits reagent exchange.
- Immobilized to the interior surface of the flow cell is one or more primed template nucleic acids to be sequenced or interrogated using the procedures described herein.
- Typical flow cells will include microfluidic valving that permits delivery of liquid reagents (e.g., components of the “reaction mixtures” discussed herein) to an entry port. Liquid reagents can be removed from the flow cell by exiting through an exit port.
- a reaction chamber disclosed herein can comprise a reagent wall, an imaging area, and optionally an outlet configured to remove molecules of one or more of the polymerase, the first detectably labeled nucleotide, the second detectably labeled nucleotide, and/or one or more other reagents from the imaging area.
- the device may comprise one or more vents but no outlet or exit port for the reaction mixture.
- a method disclosed herein does not comprise a step of removing liquid reagents through an outlet or exit port, e.g., from a reaction chamber such as a flow cell.
- the methods disclosed herein may but do not need to be used in combination with any NGS sequencing methods.
- the sequencing technologies of NGS include but are not limited to pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing.
- Nucleic acids such as DNA or RNA from individual samples can be sequenced individually (singleplex sequencing) or nucleic acids such as DNA or RNA from multiple samples can be pooled and sequenced as indexed genomic molecules (multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of sequences. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are further described here.
- sequencing-by-synthesis platforms from 454 Life Sciences (Branford, Conn.), Illumina/Solexa (Hayward, Calif.) and Helicos Biosciences (Cambridge, Mass.).
- Sanger sequencing, including automated Sanger sequencing, can also be employed in the methods described herein. Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging technologies, e.g., atomic force microscopy (AFM) or transmission electron microscopy (TEM).
- the disclosed methods may be used in combination with massively parallel sequencing of nucleic acid molecules using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry.
- a method disclosed herein can use a flow cell having a glass slide with lanes.
- sequence reads of predetermined length are localized by mapping (alignment) to a known reference sequence or genome (e.g., viral sequences or genomes).
- a number of computer algorithms are available for aligning sequences, including without limitation BLAST, BLITZ, FASTA, BOWTIE, or ELAND (Illumina, Inc., San Diego, Calif., USA).
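The idea of localizing a read by mapping it to a reference can be sketched with a toy ungapped search. This is not how BLAST, Bowtie, or the other named tools work internally; it is a brute-force illustration, and the function name `map_read`, the `max_mismatches` parameter, and the example sequences are all hypothetical.

```python
def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) for the best ungapped placement of
    `read` on `reference`, or None if no placement is within the limit.

    Brute force: every offset is scored by counting mismatched bases.
    Ties are resolved in favor of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"          # made-up reference sequence
    print(map_read("TTAGC", reference))  # exact match
    print(map_read("TTAGG", reference))  # one substitution tolerated
    print(map_read("GGGGG", reference))  # no acceptable placement
```

Production aligners use indexed data structures and handle gaps, quality scores, and multi-mapping reads, none of which this sketch attempts.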
- the provided sequencing methods disclosed herein may regulate polymerase interaction with the nucleotides and template nucleic acid (as well as rate of nucleotide incorporation) in a manner that reveals the identity of the next base while controlling the chemical addition of a nucleotide.
- the SBS reaction condition comprises a plurality of primed template nucleic acids, polymerases, nucleotides, or any combination thereof.
- the plurality of nucleotides comprises 1, 2, 3, 4, or more types of different nucleotides, for example dATP, dTTP (or dUTP), dGTP, and dCTP.
- the method can further comprise contacting the nucleic acid molecule with the substrate to immobilize the nucleic acid molecule.
- the nucleic acid molecule can be immobilized at a density of one molecule per at least about 250 nm2, at least about 200 nm2, at least about 150 nm2, at least about 100 nm2, at least about 90 nm2, at least about 80 nm2, at least about 70 nm2, at least about 60 nm2, at least about 50 nm2, at least about 40 nm2, at least about 30 nm2, at least about 20 nm2, at least about 10 nm2, at least about 5 nm2, or in between any two of the aforementioned values.
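The per-molecule areas listed above can be converted to a more familiar surface density with simple arithmetic, since 1 µm² equals 10^6 nm². The helper below is an illustrative sketch; the function name is hypothetical, and it treats the stated area as an exact footprint per molecule rather than the "at least about" bound used in the text.

```python
NM2_PER_UM2 = 1_000_000  # 1 um^2 = (1000 nm)^2 = 10^6 nm^2


def molecules_per_um2(area_per_molecule_nm2):
    """Convert 'one molecule per X nm^2' into molecules per square micrometer."""
    if area_per_molecule_nm2 <= 0:
        raise ValueError("area per molecule must be positive")
    return NM2_PER_UM2 / area_per_molecule_nm2


if __name__ == "__main__":
    for area in (250, 100, 50, 10, 5):  # nm^2 values taken from the list above
        print(f"one molecule per {area:3d} nm^2 -> "
              f"{molecules_per_um2(area):,.0f} molecules per um^2")
```

Because the text specifies one molecule per "at least about" a given area, each computed value is best read as an upper bound on density for that embodiment.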
- a subset of nucleic acid molecules (e.g., nucleic acid strands to be sequenced) on the substrate may be active at one or more time points.
- a first subset of nucleic acid molecules on the substrate is active (e.g., allowing nucleotide incorporation into a sequencing primer using a single-stranded sequence as template) while a second subset of nucleic acid molecules on the substrate is inactive (e.g., not allowing nucleotide incorporation into a sequencing primer using a single-stranded sequence as template).
- a first subset of nucleic acid molecules on the substrate is activated (e.g., by a first set of polymerase and/or primer molecules) for nucleotide incorporation, while a second subset of nucleic acid molecules on the substrate is not activated (e.g., by the first set of polymerase and/or primer molecules), thus only signals associated with the first subset of nucleic acid molecules are detected.
- the second subset of nucleic acid molecules on the substrate is activated (e.g., by a second set of polymerase and/or primer molecules) for nucleotide incorporation, while the first subset of nucleic acid molecules on the substrate is not activated (e.g., by the second set of polymerase and/or primer molecules), thus only signals associated with the second subset of nucleic acid molecules are detected.
- the first and second sets of polymerase and/or primer molecules can be introduced at different time points, e.g., in sequential cycles with optional washing steps between cycles (e.g., to remove a set of polymerase and/or primer molecules for SBS of a first subset of strands before introducing the next set of polymerase and/or primer molecules for SBS of a second subset of strands).
- the substrate can comprise a bead, a planar substrate, a solid surface, a flow cell, a semiconductor chip, a well, a pillar, a chamber, a channel, a through hole, a nanopore, or any combination thereof.
- the substrate can comprise a microwell, a micropillar, a microchamber, a microchannel, or any combination thereof.
- compositions and kits comprising one or more of the primers, nucleic acid molecules, substrates, nucleotides including detectably labeled nucleotides, polymerases, and reagents for performing the methods provided herein, for example reagents required for one or more steps comprising hybridization, ligation, amplification, detection, sequencing, and/or sample preparation as described herein, for example, in Section IV.
- kits may be present in separate containers or certain compatible components may be pre-combined into a single container.
- the kits further contain instructions for using the components of the kit to practice the provided methods.
- kits can contain reagents and/or consumables required for performing one or more steps of the provided methods.
- the kits contain reagents for sample processing, such as nucleic acid extraction, isolation, and/or purification, e.g., RNA extraction, isolation, and/or purification.
- the kits contain reagents, such as enzymes and buffers for ligation and/or amplification, such as ligases and/or polymerases.
- the kits contain reagents, such as enzymes and buffers for primer extension and/or nucleic acid sequencing, such as polymerases and/or transcriptases.
- the kit can also comprise any of the reagents described herein, e.g., buffer components for tuning the rate of nucleotide incorporation and/or for tuning the rate of signal deactivation (e.g., by photobleaching).
- the kits contain reagents for signal detection during sequencing, such as detectable labels and detectably labeled molecules.
- the kits optionally contain other components, for example nucleic acid primers, enzymes and reagents, buffers, nucleotides, modified nucleotides, and reagents for additional assays.
- the provided embodiments can be applied in analyzing nucleic acid sequences, such as DNA and/or RNA sequencing. In some aspects, the embodiments can be applied in an imaging or detection method for multiplexed nucleic acid analysis. In some aspects, the provided embodiments can be used to identify or detect regions of interest in target nucleic acids, such as viral DNA or RNA.
- the region of interest comprises one or more nucleotide residues, such as a single-nucleotide polymorphism (SNP), a single-nucleotide variant (SNV), substitutions such as a single-nucleotide substitution, mutations such as a point mutation, insertions such as a single-nucleotide insertion, deletions such as a single-nucleotide deletion, translocations, inversions, duplications, and/or other sequences of interest.
- the embodiments can be applied in investigative and/or diagnostic applications, for example, for characterization or assessment of a sample from a subject.
- Applications of the provided method can comprise biomedical research and clinical diagnostics.
- biomedical research applications comprise, but are not limited to, genetic and genomic analysis for biological investigation or drug screening.
- clinical diagnostics applications comprise, but are not limited to, detecting gene markers such as disease, immune responses, bacterial or viral DNA/RNA for patient samples, loss of genetic heterozygosity, the presence of gene alleles indicative of a predisposition towards disease or good health, likelihood of responsiveness to therapy, or in personalized medicine or ancestry.
- a method of determining a molecular composition comprising: determining a coding scheme, wherein the coding scheme comprises: possible signal codewords, wherein each possible signal codeword comprises one or more possible signals and each of the one or more possible signals comprises possible states, wherein a possible molecular unit of the molecular composition is encoded by a molecular unit-encoding signal codeword from the possible signal codewords, and an error buffer is encoded by an error buffer-encoding signal codeword from the possible signal codewords; receiving a signal codeword corresponding to a molecular unit of the molecular composition, the signal codeword comprising a signal, wherein each signal has a state, and each signal is susceptible to an error; and decoding from the signal codeword, based on the coding scheme, the molecular unit of the molecular composition.
- the molecular unit-encoding signal codeword corresponds to an error buffer-encoding signal codeword for each of the molecular unit-encoding signal codewords.
- nucleotide base-encoding signal codeword encodes for a possible nucleotide base of the nucleic acid molecule.
- a method of determining a molecular composition comprising: receiving a sample, wherein the sample comprises a molecule; detecting signals corresponding to a molecular unit of the molecule; determining a signal codeword corresponding to the molecular unit, the signal codeword comprising a detected signal from the detected signals, wherein each detected signal has a state, and each detected signal is susceptible to an error; and decoding from the signal codeword, based on a coding scheme, the molecular unit of the molecule, wherein the coding scheme comprises: possible signal codewords, wherein each possible signal codeword comprises one or more possible signals and each of the one or more possible signals comprises possible states, wherein a possible molecular unit of the molecule is encoded by a molecular unit-encoding signal codeword from the possible signal codewords, and an error buffer is encoded by an error buffer-encoding signal codeword from the possible signal codewords.
- liquid biopsy sample comprises blood, plasma, cerebrospinal fluid, sputum, stool, urine, or saliva.
- a system comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: determine a coding scheme, wherein the coding scheme comprises: possible signal codewords, wherein each possible signal codeword comprises one or more possible signals and each of the one or more possible signals comprises possible states, wherein a possible molecular unit of the molecular composition is encoded by a molecular unit-encoding signal codeword from the possible signal codewords, and an error buffer is encoded by an error buffer-encoding signal codeword from the possible signal codewords; receive a signal codeword corresponding to a molecular unit of the molecular composition, the signal codeword comprising a signal, wherein each signal has a state, and each signal is susceptible to an error; and decode from the signal codeword, based on the coding scheme, the molecular unit of the molecular composition.
- a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of a system, cause the system to: determine a coding scheme, wherein the coding scheme comprises: possible signal codewords, wherein each possible signal codeword comprises one or more possible signals and each of the one or more possible signals comprises possible states, wherein a possible molecular unit of the molecular composition is encoded by a molecular unit-encoding signal codeword from the possible signal codewords, and an error buffer is encoded by an error buffer-encoding signal codeword from the possible signal codewords; receive a signal codeword corresponding to a molecular unit of the molecular composition, the signal codeword comprising a signal, wherein each signal has a state, and each signal is susceptible to an error; and decode from the signal codeword, based on the coding scheme, the molecular unit of the molecular composition.
- Example 1 An assignment scheme for codewords comprising three labels and two states
- This section provides an example of one possible assignment scheme for codewords, where the assignment scheme comprises three possible labels — label 1, label 2, and/or label 3 — and two possible states — a state of 0 or 1.
- FIG. 3 provides a table showing an assignment scheme of all possible codewords, based on the three possible labels and two possible states. In general, the number of possible codewords in the assignment scheme can be determined by: $\text{number of possible codewords} = (\text{states}_{\max})^{\text{labels}_{\max}}$, where
- $\text{states}_{\max}$ refers to the maximum number of possible states, and
- $\text{labels}_{\max}$ refers to the maximum number of possible labels.
- FIG. 3 shows eight possible codewords, and each codeword is three labels long, where each label occupies a state of 0 or 1.
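The count of eight codewords can be checked by enumerating every combination of states across the labels. The following is a minimal sketch, assuming two states and three labels as in FIG. 3. The function name `enumerate_codewords` is hypothetical, and the enumeration order is not claimed to match the codeword numbering used in the figure.

```python
from itertools import product


def enumerate_codewords(num_states: int, num_labels: int) -> list[tuple[int, ...]]:
    """Return every possible codeword as a tuple of per-label states."""
    return list(product(range(num_states), repeat=num_labels))


# Two states (0 or 1) and three labels give 2**3 codewords.
codewords = enumerate_codewords(num_states=2, num_labels=3)
print(len(codewords))
```

Changing `num_states` or `num_labels` shows how quickly the codeword space grows when more states or labels are available.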
- FIG. 4A reiterates the table shown in FIG. 3, but with assignments, e.g., assigned bases, for each of the eight codewords.
- Codeword 2 is assigned a G nucleotide base
- codeword 3 is assigned an A nucleotide base
- codeword 4 is assigned a C nucleotide base
- codeword 8 is assigned a T nucleotide base.
- the remaining codewords remain unassigned to any base, and recording signal code sequences that match the unassigned codewords can indicate that a sequencing error has arisen.
- FIG. 4B shows how the assignment scheme in FIG. 4A can be implemented according to the incorporation of physical molecules.
- FIG. 4B shows four example polynucleotide templates.
- the four example polynucleotide templates can be allowed to be bound and incorporated by nucleotides based on complementarity to nucleotide residues in the polynucleotide templates.
- a G nucleotide base covalently bound to tag 1 can correspond to codeword 2 that is (1, 0, 0)
- a C nucleotide base covalently bound to tag 3 can correspond to codeword 4 that is (0, 0, 1)
- a T nucleotide base covalently bound to tag 1, tag 2, and tag 3 or a population of T nucleotide bases covalently bound to tag 1, tag 2, or tag 3 can correspond to codeword 8 that is (1, 1, 1)
- an A nucleotide base that is covalently bound to tag 2 can correspond to codeword 3 that is (0, 1, 0).
- the term tag can be equivalent to the term label.
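The tag-to-codeword correspondence described for FIG. 4B can be sketched as a small lookup. This is an illustrative sketch only: the names `codeword_from_tags`, `TAGS_BY_BASE`, and `CODEWORD_BY_BASE` are hypothetical, and the sketch assumes that state 1 for a label means the corresponding tag is present on the nucleotide (or on the population of nucleotides).

```python
def codeword_from_tags(tags: set[int], num_labels: int = 3) -> tuple[int, ...]:
    """Build a codeword with state 1 at each label whose tag is present."""
    return tuple(1 if label in tags else 0 for label in range(1, num_labels + 1))


# Tags carried by each base, as described for FIG. 4B.
TAGS_BY_BASE = {"G": {1}, "A": {2}, "C": {3}, "T": {1, 2, 3}}

CODEWORD_BY_BASE = {
    base: codeword_from_tags(tags) for base, tags in TAGS_BY_BASE.items()
}
print(CODEWORD_BY_BASE)
```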
- FIG. 5 reiterates the table shown in FIG. 3 and FIG. 4 but denotes the possible states that can switch for a given label, for an assigned nucleotide base.
- a state switch can result from a sequencing error.
- the codeword for the nucleotide base A can be subject to a single state switch in label 1, such that the assigned A nucleotide codeword (0, 1, 0) turns into codeword (1, 1, 0), which is unassigned, and may be a recorded signal code sequence that can be corrected using methods disclosed herein.
- the codeword for the nucleotide base A can be subject to a single state switch in label 2, such that the assigned A nucleotide codeword (0, 1, 0) turns into codeword (0, 0, 0), which is unassigned, and may be a recorded signal code sequence that can be corrected using methods disclosed herein.
- the codeword for the nucleotide base A can be subject to a single state switch in label 3, such that the assigned A nucleotide codeword (0, 1, 0) turns into codeword (0, 1, 1), which is unassigned, and may be a recorded signal code sequence that can be corrected using methods disclosed herein.
- when an unassigned codeword is recorded, the experimenter can determine that a sequencing error has occurred.
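The three single state switches described for the A codeword can be reproduced programmatically. The sketch below assumes the Example 1 assignment (G, A, C, and T on codewords (1, 0, 0), (0, 1, 0), (0, 0, 1), and (1, 1, 1)). The helper name `single_switches` is hypothetical.

```python
# Example 1 assignment: codeword -> base.
ASSIGNED = {(1, 0, 0): "G", (0, 1, 0): "A", (0, 0, 1): "C", (1, 1, 1): "T"}


def single_switches(codeword: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Return the codewords obtained by switching the state of one label at a time."""
    return [
        codeword[:pos] + (1 - codeword[pos],) + codeword[pos + 1:]
        for pos in range(len(codeword))
    ]


# Switch label 1, label 2, then label 3 of the A codeword (0, 1, 0).
switched = single_switches((0, 1, 0))
all_unassigned = all(cw not in ASSIGNED for cw in switched)
print(switched, all_unassigned)
```

Because every single switch of the A codeword lands on an unassigned codeword, any such error is detectable under this assignment.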
- if the signal code sequence (1, 0, 0) is recorded (e.g., at a cluster during sequencing), a G nucleotide base can be inferred from the sequencing method.
- if the signal code sequence (1, 1, 1) is recorded, a T nucleotide base can be inferred from the sequencing method.
- if the codeword (0, 0, 0) is recorded, the recorded signal code sequence is an unassigned codeword, and a sequencing error is detected.
- based on the assignment scheme, and assuming a single code error, the original codeword before being subject to the error can be inferred to correspond to A, G, or C. That is, the original codeword can be inferred to not correspond to T.
- the A, G, or C nucleotides can be assigned equal likelihoods of being the nucleotide represented by the original codeword.
- if the signal code sequence (1, 1, 0) is recorded, the recorded signal code sequence is an unassigned codeword, and a sequencing error is detected. Based on the assignment scheme, and assuming a single code error, the original codeword before being subject to the error can be inferred to correspond to A, G, or T. That is, the original codeword can be inferred to not correspond to C.
- the A, G, or T nucleotides can be assigned equal likelihoods of being the nucleotide represented by the original codeword.
- if the codeword (0, 0, 1) is recorded, a C nucleotide base can be inferred from the sequencing method.
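The decoding logic of Example 1 can be sketched as follows. This is a minimal illustration under the single code error assumption stated in the text, not a definitive implementation of the disclosed method. The function name `decode` and its return shape (an error flag plus a list of candidate bases) are hypothetical choices made for this sketch.

```python
Codeword = tuple[int, ...]

# Example 1 assignment: codeword -> base.
ASSIGNED: dict[Codeword, str] = {
    (1, 0, 0): "G",
    (0, 1, 0): "A",
    (0, 0, 1): "C",
    (1, 1, 1): "T",
}


def decode(recorded: Codeword) -> tuple[bool, list[str]]:
    """Return (error_detected, candidate_bases) for a recorded signal code sequence."""
    if recorded in ASSIGNED:
        # Assigned codeword: the base is inferred directly, no error is flagged.
        return False, [ASSIGNED[recorded]]
    # Unassigned codeword: assume one state switched and collect every
    # assigned codeword that is a single switch away.
    candidates = []
    for pos in range(len(recorded)):
        original = recorded[:pos] + (1 - recorded[pos],) + recorded[pos + 1:]
        if original in ASSIGNED:
            candidates.append(ASSIGNED[original])
    return True, sorted(candidates)


print(decode((1, 0, 0)))
print(decode((0, 0, 0)))
print(decode((1, 1, 0)))
```

In this sketch the candidates are returned unweighted, which mirrors the equal likelihoods described in the text.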
- Example 2 An alternative assignment scheme for codewords comprising three labels and two states
- This section provides an example of one possible assignment scheme for codewords, where the assignment scheme comprises three possible labels — label 1, label 2, and/or label 3 — and two possible states — a state of 0 or 1.
- FIG. 6 provides a table showing an assignment scheme of all possible codewords, based on the three possible labels and two possible states. The assignment table shown in FIG. 6 is distinct from the assignment table described in Example 1.
- FIG. 6 denotes the possible states that can switch for a given label, for an assigned nucleotide base.
- a state switch can result from a sequencing error, for instance, during detection of a signal associated with a detectable label.
- for example, a signal associated with one state (e.g., “1”) can be recorded as a signal associated with another state (e.g., “0”).
- the recorded signal code sequence for the nucleotide base A can contain a single state switch in label 1, such that the assigned A nucleotide codeword (1, 1, 0) is recorded as signal code sequence (0, 1, 0), which corresponds to an unassigned codeword.
- the recorded signal code sequence for the nucleotide base A can contain a single state switch in label 2, such that the assigned A nucleotide codeword (1, 1, 0) is recorded as signal code sequence (1, 0, 0), which is an unassigned codeword.
- the recorded signal code sequence for the nucleotide base A can contain a single state switch in label 3, such that the assigned A nucleotide codeword (1, 1, 0) is recorded as a signal code sequence (1, 1, 1), which is an unassigned codeword. When an unassigned codeword is recorded, the experimenter can determine that a sequencing error has occurred.
- the recorded signal code sequence is an unassigned codeword, and a sequencing error is detected.
- the original codeword before being subject to the error can be inferred to correspond to G, A, or C. That is, the original codeword can be inferred to not correspond to T.
- the A, G, or C nucleotides can be assigned equal likelihoods of being the nucleotide represented by the original codeword. If the signal code sequence (1, 1, 1) is recorded, the recorded signal code sequence is an unassigned codeword, and a sequencing error is detected.
- the original codeword before being subject to the error can be inferred to correspond to T, C, or A. That is, the original codeword can be inferred to not correspond to G.
- the T, C, or A nucleotides can be assigned equal likelihoods of being the nucleotide represented by the original codeword. If the signal code sequence (0, 0, 0) is recorded, a G nucleotide base can be inferred from the sequencing method. If the signal code sequence (0, 0, 1) is recorded, the recorded signal code sequence corresponds to an unassigned codeword, and a sequencing error is detected.
- the original codeword before being subject to the error can be inferred to correspond to C, T, or G. That is, the original codeword can be inferred to not correspond to A.
- the C, T, or G nucleotides can be assigned equal likelihoods of being the nucleotide represented by the original codeword.
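The error-detecting behavior of Example 2 can be summarized by the minimum Hamming distance between assigned codewords. The sketch below rests on an inference rather than on FIG. 6 itself: the text gives A as (1, 1, 0) and G as (0, 0, 0), and the candidate sets listed for (1, 1, 1) and (0, 0, 1) imply that C and T occupy (1, 0, 1) and (0, 1, 1) in some order. The order does not affect the distance calculation, so it is left unspecified. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Count the labels at which two codewords differ in state."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codewords: list[tuple[int, ...]]) -> int:
    """Return the smallest pairwise Hamming distance in a set of codewords."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))


# Assigned codewords inferred for Example 2 (which of the last two is C or T
# is an open assumption and is not needed here).
EXAMPLE_2_CODEWORDS = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

d_min = min_distance(EXAMPLE_2_CODEWORDS)
detectable = d_min - 1          # state switches guaranteed to be detected
correctable = (d_min - 1) // 2  # state switches guaranteed to be corrected
print(d_min, detectable, correctable)
```

A minimum distance of 2 is consistent with the text: every single state switch yields an unassigned codeword, but the original base can only be narrowed to three candidates rather than identified uniquely.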
- Example 3 An alternative assignment scheme for codewords comprising three labels and two states
- FIG. 7 provides a table showing an assignment scheme of all possible codewords, based on the three possible labels and two possible states.
- the assignment table shown in FIG. 7 is distinct from both the assignment table described in Example 1 and the assignment table described in Example 2.
- FIG. 7 denotes the possible states that can switch for a given label, for an assigned nucleotide base.
- a state switch can result from a sequencing error.
- the codeword for the nucleotide base C can be subject to a single state switch in label 1, such that the assigned C nucleotide codeword (1, 1, 0) turns into codeword (0, 1, 0), which corresponds to the assigned A nucleotide.
- a single code error to the (1, 1, 0) codeword for C would not be detected, because the result would exactly resemble the codeword for the A nucleotide as it would appear had no code error occurred.
- An unassigned codeword is not detected, and thus, an error is not detected.
- the codeword for the nucleotide base C can be subject to a single state switch in label 2, such that the assigned C nucleotide codeword (1, 1, 0) turns into codeword (1, 0, 0), which corresponds to the assigned G nucleotide.
- a single code error to the (1, 1, 0) codeword for C would not be detected, because it would exactly resemble the codeword for the G nucleotide, had a single code error not occurred.
- the codeword for the nucleotide base C can be subject to a single state switch in label 3, such that the assigned C nucleotide codeword (1, 1, 0) turns into codeword (1, 1, 1), which corresponds to the T nucleotide.
- if the signal code sequence (0, 0, 0) is recorded, the recorded signal code sequence is an unassigned codeword, and the correct signal code sequence prior to a single code error can be (1, 0, 0), or (0, 1, 0), or (0, 0, 1).
- because (0, 0, 1) is itself an unassigned codeword, the correct signal code sequence prior to the error must be (1, 0, 0), which corresponds to G, or (0, 1, 0), which corresponds to A, again assuming that the error is a single code error. If the signal code sequence (0, 0, 1) is recorded, the recorded signal code sequence is an unassigned codeword, and a sequencing error is detected.
- the correct signal code sequence prior to the error can be (1, 0, 1), or (0, 1, 1), or (0, 0, 0).
- (1, 0, 1), (0, 1, 1), and (0, 0, 0) each correspond to an unassigned codeword
- the correct signal code sequence cannot be determined — again, assuming that the error is a single code error. In other words, an error can be detected, but not corrected.
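The contrast between Example 3 and the earlier examples can be made concrete by classifying every single state switch of every assigned codeword. This is a sketch under the assignment stated in the text for FIG. 7 (C, A, G, and T on (1, 1, 0), (0, 1, 0), (1, 0, 0), and (1, 1, 1)). The function name `classify_single_switches` and the `"detected"` marker are hypothetical.

```python
# Example 3 assignment: codeword -> base.
ASSIGNED = {(1, 1, 0): "C", (0, 1, 0): "A", (1, 0, 0): "G", (1, 1, 1): "T"}


def classify_single_switches(assigned: dict) -> dict:
    """For each base, list the outcome of switching label 1, 2, then 3.

    An outcome is either the base that would be miscalled (undetected error)
    or the string "detected" when the switch yields an unassigned codeword.
    """
    outcome = {}
    for codeword, base in assigned.items():
        results = []
        for pos in range(len(codeword)):
            switched = codeword[:pos] + (1 - codeword[pos],) + codeword[pos + 1:]
            results.append(assigned.get(switched, "detected"))
        outcome[base] = results
    return outcome


outcomes = classify_single_switches(ASSIGNED)
undetected = sum(r != "detected" for results in outcomes.values() for r in results)
print(outcomes, undetected)
```

Under this assignment every single switch of the C codeword produces another assigned codeword, which is why such errors pass undetected. The other three bases each have one switch that lands on the C codeword.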
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Chemical & Material Sciences (AREA)
- Artificial Intelligence (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Provided are methods for analyzing a nucleic acid sequence, including methods for generating a sequencing coding scheme that tolerates errors during the sequencing process. The methods can comprise, for example, generating different codewords; assigning a different codeword to each of four different bases, wherein the different codewords adhere to designated Hamming distance rules; contacting different polynucleotide templates with nucleotides of the four different bases; allowing binding and optional incorporation of the nucleotides; imaging the different polynucleotide templates to determine recorded signal code sequences; and comparing recorded signal code sequences to the different codewords to identify bases corresponding to the recorded signal code sequences and/or to check for errors in the recorded signal code sequences.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463644408P | 2024-05-08 | 2024-05-08 | |
| US63/644,408 | 2024-05-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025235623A1 (fr) | 2025-11-13 |
Family
ID=96091263
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/028165 (Pending; WO2025235623A1 (fr)) | Methods and systems for nucleic acid sequencing using an error buffer coding scheme | 2024-05-08 | 2025-05-07 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025235623A1 (fr) |
-
2025
- 2025-05-07 WO PCT/US2025/028165 patent/WO2025235623A1/fr active Pending
Patent Citations (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
| US4683202B1 (fr) | 1985-03-28 | 1990-11-27 | Cetus Corp | |
| US4683195B1 (fr) | 1986-01-30 | 1990-11-27 | Cetus Corp | |
| US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
| US4800159A (en) | 1986-02-07 | 1989-01-24 | Cetus Corporation | Process for amplifying, detecting, and/or cloning nucleic acid sequences |
| US4965188A (en) | 1986-08-22 | 1990-10-23 | Cetus Corporation | Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme |
| US5512462A (en) | 1994-02-25 | 1996-04-30 | Hoffmann-La Roche Inc. | Methods and reagents for the polymerase chain reaction amplification of long DNA sequences |
| US20050042649A1 (en) | 1998-07-30 | 2005-02-24 | Shankar Balasubramanian | Arrayed biomolecules and their use in sequencing |
| US8071755B2 (en) | 2004-05-25 | 2011-12-06 | Helicos Biosciences Corporation | Nucleotide analogs |
| US7482120B2 (en) | 2005-01-28 | 2009-01-27 | Helicos Biosciences Corporation | Methods and compositions for improving fidelity in a nucleic acid synthesis reaction |
| US7544794B1 (en) | 2005-03-11 | 2009-06-09 | Steven Albert Benner | Method for sequencing DNA and RNA by synthesis |
| US7956171B2 (en) | 2007-05-18 | 2011-06-07 | Helicos Biosciences Corp. | Nucleotide analogs |
| US8034923B1 (en) | 2009-03-27 | 2011-10-11 | Steven Albert Benner | Reagents for reversibly terminating primer extension |
| US8703461B2 (en) | 2009-06-05 | 2014-04-22 | Life Technologies Corporation | Mutant RB69 DNA polymerase |
| US9399798B2 (en) | 2011-09-13 | 2016-07-26 | Lasergen, Inc. | 3′-OH unblocked, fast photocleavable terminating nucleotides and methods for nucleic acid sequencing |
| US8808989B1 (en) | 2013-04-02 | 2014-08-19 | Molecular Assemblies, Inc. | Methods and apparatus for synthesizing nucleic acids |
| US20220025442A1 (en) * | 2014-07-30 | 2022-01-27 | President And Fellows Of Harvard College | Systems and methods for determining nucleic acids |
| US10844428B2 (en) | 2015-04-28 | 2020-11-24 | Illumina, Inc. | Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIS) |
| US20180251831A1 (en) * | 2015-11-19 | 2018-09-06 | Peking University | Track one: methods for obtaining and correcting biological sequence information |
| US10246744B2 (en) | 2016-08-15 | 2019-04-02 | Omniome, Inc. | Method and system for sequencing nucleic acids |
| US20230151420A1 (en) * | 2017-04-25 | 2023-05-18 | Pacific Biosciences Of California, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| EP3702474A1 (fr) * | 2019-02-26 | 2020-09-02 | QIAGEN GmbH | Sequencing method and kit |