WO2004042078A1 - Nucleotide sequence analysis by quantification of mutagenesis - Google Patents
Nucleotide sequence analysis by quantification of mutagenesis
- Publication number
- WO2004042078A1 (PCT/AU2003/001459)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleotide
- species
- subunit
- sequence
- polynucleotides
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Definitions
- NUCLEOTIDE SEQUENCE ANALYSIS BY QUANTIFICATION
- THIS INVENTION relates generally to sequence analysis. More particularly, the present invention relates to a method for wholly or partially determining the sequence of a polymer of subunits. Even more particularly, the present invention relates to the construction of a plurality of secondary polymers, each varying from a primary polymer by the substitution of at least one subunit with a subunit of a different species, and to their use for inferring information about the primary polymer.
- DNA sequencing using DNA polymerase-based extension products is frequently impaired by sequence motifs present within the template DNA that form secondary structures or other structural forms that impede the processive extension of DNA polymerase-catalysed products through the region of the motif (Tabor and Richardson, 1987 Proc. Natl. Acad. Sci. USA 84:4767-4771; Donlin and Johnson, 1994 Biochemistry 33:14908-14917; Weinshenker et al., 1998 Biotechniques 25:68-72).
- motifs are typically sequence motifs such as CG-rich motifs which present high thermal and structural stability (Mizusawa et al., 1986 Nucleic Acids Research 14:1319-1324; McConlogue et al., 1988 Nucleic Acids Research 16:9869; Liu and Sommer, 1998 Biotechniques 25:1022-1028; Haqqi et al., 1988 Nucleic Acids Research 16:11844; Fernandez-Rachubinski et al., 1990 DNA Sequence 1:137-140; Perng et al., 1994 Journal of Virological Methods 46:111-116; Motz et al., 2000 Biotechniques 29:268-270).
- AT-rich sequence motifs (Quail, 2001 DNA Sequence 12:355-359; Glockner et al., 2002 Nature 418:79-85; Gardner et al., 1998 Science 282:1126-1132) and other repetitive sequence motifs (Baran, Lapidot & Manor, 1991 Proc. Natl. Acad. Sci. USA 88:507-511; Bieth et al., 1997 Gene 194:97-105; Razin et al., 2001 Journal of Molecular Biology 307:481-486; Voet et al., 1997 Yeast 13:177-182; Thoraval et al., 1996 Proc. Natl. Acad.
- incorporation of non-mutagenic nucleotide analogues has been employed widely to replace particular specific cognate native deoxyribonucleotides in order to effect a reduction in the thermal stability of the duplex DNA within a refractory sequence motif. Such incorporation has been found to improve the ability to sequence through particular types of nucleotide structure or sequence motif that is recalcitrant to sequencing.
- the nucleotide analogues used for these purposes have particular properties, which typically are: (i) the ability to reduce the stability of nucleotide-to-nucleotide base pairing across the DNA duplex; (ii) the ability to be efficiently incorporated into DNA by preferred DNA polymerases; (iii) the ability to be incorporated adjacent to particular cognate bases such that the sequence of the DNA duplex is not altered, except for the presence of the nucleotide analogue in place of a particular cognate base; and (iv) the ability of the nucleotide analogue to be "read" as the same nucleotide base as the base it replaces.
- the replacement of certain nucleotides with nucleotide analogues achieves a more uniform distribution of local thermal stability of DNA across the entire DNA region under investigation, particularly when local regions of CG-rich DNA are present.
- nucleotide analogue 2'-deoxyinosine triphosphate, which is incorporated efficiently into DNA by DNA polymerases, is used widely to alter the stability of DNA pairing (Kawase et al., 1986 Nucleic Acids Research 14:7727-7736; Bergstrom et al., 1997 Nucleic Acids Research 25:1935-1942; Strobel and Shetty, 1997 Proc. Natl. Acad. Sci. USA 94:2903-2908; Wong and McClelland, 1991 Nucleic Acids Research 19:1081-1085; Ikeda et al., 1992 Journal of Biological Chemistry 267:6291-6296).
- dITP: 2'-deoxyinosine triphosphate
- dITP induces only low-level mutation of DNA (at a frequency of about 4 × 10⁻³) after elevated cycles of PCR amplification (Spee et al., 1993 Nucleic Acids Research 21:777-778; Shibata, 1994 Nippon Rinsho 52:1665-1673; Kuipers, 1996 Methods in Molecular Biology 57:351-356) and effects mainly A → G and T → C transitions as well as infrequent C → G transversions.
- Although nucleotide analogues have heretofore been used in methods to reduce the thermal stability of difficult-to-sequence regions of DNA in order to improve their ability to be sequenced, these methods do not introduce mutations at an intensity sufficient to substantially alter the structural characteristics of these regions and are, therefore, often inefficient.
- a novel strategy for sequencing a target region of a polynucleotide was developed to deal with problematic structural features as, for example, described above.
- the strategy involves producing a plurality of variant polynucleotides that vary from the original polynucleotide in the target region by the substitution of at least one nucleotide with a nucleotide of a different species and quantifying each species of nucleotide at individual positions in the target region of all the variant polynucleotides to thereby determine the nucleotide species located at the same positions in the original polynucleotide.
- This novel strategy has been reduced to practice in methods for analysing multi-subunit sequences, including the analysis of molecules whose local sequence characteristics render them refractory to sequence analysis, as described hereinafter.
- the present invention provides methods that generally take advantage of altering a sequence of subunits selected from a finite set of possible subunit species to produce a multiplicity of secondary subunit sequences, from which the unaltered sequence can be determined.
- the present invention provides methods for wholly or partially determining the sequence of a target region of a primary polymer of subunits from a multiplicity of secondary polymers that vary from the primary polymer in the target region by the substitution of at least one subunit, including a first subunit at a first position, with a subunit of a different species, wherein each species of subunit at the first position correlates with a distinct detectable signal.
- These methods generally comprise analysing the detectable signals that correlate with the first position of all the secondary polymers, collectively, to determine the species of subunit, which is in higher abundance than other species of subunit at the first position, and which corresponds to the species of subunit at the first position in the target region of the primary polymer.
- individual secondary polymers vary from other secondary polymers at the position(s) of variation with the primary polymer.
- the polymers are suitably selected from nucleic acid polymers and amino acid polymers.
- the polymers are nucleic acid polymers.
- the secondary polymers are generated by mutagenesis, which is typically random.
- an individual secondary polymer is formed using the primary polymer as a template for polymerisation or using another secondary polymer that has been directly or indirectly formed using the primary polymer as a template for polymerisation.
- the subunit of a different species is a naturally-occurring subunit species.
- the naturally-occurring subunit species is incorporated into an individual secondary polymer by polymerising that polymer in the presence of another secondary polymer having a mutagenic subunit species at the first position, wherein the mutagenic subunit species serves as a template for incorporating at least two naturally-occurring subunit species at the first position of the individual secondary polymer.
- the mutagenic subunit species induces mutation at a frequency generally greater than about 1 × 10⁻².
- the secondary polymers vary from the primary polymer in the target region by the substitution of a second subunit at a second position with a subunit of a different species, wherein each species of subunit at the second position correlates with a distinct detectable signal.
- the methods further comprise analysing the detectable signals that correlate with the second position of all the secondary polymers, collectively, to determine the species of subunit, which is in higher abundance than other species of subunit at the second position, and which corresponds to the species of subunit at the second position in the target region of the primary polymer.
- the secondary polymers vary from the primary polymer in the target region by the substitution of subunits at a multiplicity of positions with subunits of different species, wherein each species of subunit at an individual position correlates with a distinct detectable signal.
- the methods further comprise analysing the detectable signals that correlate with an individual position of all the secondary polymers, collectively, to determine the species of subunit, which is in higher abundance than other species of subunit at the individual position, and which corresponds to the species of subunit at the individual position in the target region of the primary polymer.
- the detectable signals are analysed by: measuring for each species of subunit at an individual position at least one parameter that correlates with that subunit; and processing the measured parameter(s) to determine the abundance of each subunit species relative to other subunit species at the individual position.
- the measured parameters are further processed by comparing them to determine the species of subunit that is in higher abundance than the other species of subunit at the individual position.
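The signal analysis described in the two preceding items (measure a parameter for each subunit species at a position, convert the measurements to relative abundances, then compare them) can be sketched as follows. This is a minimal illustration only and not the implementation disclosed in the patent: the function name `call_subunit`, the choice of peak height as the measured parameter, and the example values are all assumptions.

```python
def call_subunit(signals):
    """Return the most abundant subunit species at one position.

    `signals` maps each subunit species to a measured, label-associated
    parameter (assumed here to be a fluorescence peak height).
    """
    total = sum(signals.values())
    if total <= 0:
        raise ValueError("no detectable signal at this position")
    # Relative abundance of each species at this position.
    abundance = {species: value / total for species, value in signals.items()}
    # The species in highest abundance is taken as the primary-polymer subunit.
    best = max(abundance, key=abundance.get)
    return best, abundance


# Hypothetical peak heights pooled over all secondary polymers at one position.
species, abundance = call_subunit({"A": 820.0, "G": 140.0, "C": 25.0, "T": 15.0})
```

Normalising before comparing is not strictly needed to find the maximum, but it yields the relative abundances that the text describes as an intermediate result.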
- the parameter is a label-associated parameter, which includes, but is not restricted to, parameters relating to fluorescence emission, luminescence, phosphorescence, infrared radiation, electromagnetic scattering including light and x-ray scattering, light transmittance, light absorbance, electrical impedance and molecular mass.
- the variant sequence of a secondary polymer is produced by mutagenesis of the target sequence of the primary polymer. In other embodiments, the variant sequence of a secondary polymer is produced by mutagenesis of the variant sequence of another secondary polymer.
- a parent target sequence is mutagenised to produce at least one variant sequence in which at least 2, 5, 10, 15, 20, 25, 30 or 35% of subunits are different than the parent target sequence.
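To make the percentages above concrete, the following sketch simulates random substitution of a parent sequence in silico and measures the fraction of positions that differ. It is an illustrative model only: real mutagenesis with nucleotide analogues is biased towards particular transitions, whereas this code assumes uniform substitution, and the names `mutagenise` and `fraction_different` are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def mutagenise(parent, rate, rng):
    """Return a variant in which each position is substituted with probability `rate`."""
    variant = []
    for base in parent:
        if rng.random() < rate:
            # Substitute with a nucleotide of a different species (uniform choice, an assumption).
            variant.append(rng.choice([n for n in NUCLEOTIDES if n != base]))
        else:
            variant.append(base)
    return "".join(variant)


def fraction_different(parent, variant):
    """Fraction of positions at which the variant differs from the parent."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(parent, variant)) / len(parent)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
parent = "GCGCGGCCGCGCGGCC" * 4
variant = mutagenise(parent, 0.20, rng)
observed = fraction_different(parent, variant)  # close to 0.20 on average
```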
- the mutagenesis of the parent target sequence is random.
- the invention encompasses the whole or partial sequence of a target region of a primary polymer, as determined by the methods broadly described above.
- the target region of the primary polymer is refractory to sequence analysis or repeat-length analysis and the variation in the corresponding regions of the secondary polymers is associated with the abrogation, inhibition or amelioration of the refractory behaviour.
- local sequence characteristics including inverted repeats or palindromes, which may be present in the target region of the primary nucleic acid polymer, may be modified in the secondary or variant nucleic acid polymers to change the structure of the target region in whole or in part such that formation of stem-and-loop structures, for example, is prevented, reduced or otherwise weakened. Sequencing of several sequence variants simultaneously as disclosed herein can permit the deduction of the whole or partial sequence of the target region.
- certain embodiments of the present invention relate to the mutagenesis of a target sequence of a parent polynucleotide.
- the target sequence is mutagenised using a repair deficient host, which is desirably a bacterium.
- the target sequence is mutagenised using a low fidelity nucleic acid amplification reaction and an error prone DNA polymerase, which is suitably thermostable.
- the target sequence is mutagenised using a nucleic acid amplification reaction and a DNA polymerase, which is suitably thermostable.
- the target sequence is mutagenised using an isothermal nucleic acid amplification reaction and a processive "rolling circle amplification" DNA polymerase.
- the target sequence is mutagenised using an isothermal nucleic acid amplification reaction and an error prone DNA polymerase, e.g., using a "sloppier-copier polymerase" or other "Y-family polymerase" in concert with a processive "rolling circle amplification" DNA polymerase.
- the target sequence is mutagenised using a nucleic acid amplification reaction and a RNA polymerase, wherein the template used for amplification is RNA.
- the target sequence is mutagenised using a nucleic acid amplification reaction and an error prone DNA polymerase, which is suitably a "Reverse Transcriptase" DNA polymerase.
- the target sequence is mutagenised by incorporation of mutagenic nucleotide analogues.
- the mutagenesis facilitates random replacement of nucleotides in the target sequence with at least one nucleotide analogue, which, through its adoption of different tautomeric forms that base pair with alternative nucleotides, results in transition and/or transversion mutagenesis of the target sequence to produce a mixture of randomly mutated polynucleotides (secondary polynucleotides) that vary from the parent polynucleotide in the target sequence by the substitution of at least one naturally-occurring nucleotide with a different naturally-occurring nucleotide.
- the mutagenesis suitably produces polynucleotides selected from: (i) a mixture of polynucleotides, the sequence of individual polynucleotides being mutated randomly with a single mutagenic nucleotide analogue;
- a mixture of greater than 5 polynucleotides, preferably a mixture of greater than 7 polynucleotides, more preferably a mixture of greater than 10 polynucleotides, more preferably a mixture of greater than 20 polynucleotides, even more preferably a mixture of greater than 50 polynucleotides and even more preferably a mixture of greater than 100 polynucleotides, the sequence of individual polynucleotides being mutated randomly with a single mutagenic nucleotide analogue;
- (v) a mixture of greater than 5 polynucleotides, preferably a mixture of greater than 7 polynucleotides, more preferably a mixture of greater than 10 polynucleotides, more preferably a mixture of greater than 20 polynucleotides, even more preferably a mixture of greater than 50 polynucleotides and even more preferably a mixture of greater than 100 polynucleotides;
- a mixture of greater than 5 polynucleotides, preferably a mixture of greater than 7 polynucleotides, more preferably a mixture of greater than 10 polynucleotides, more preferably a mixture of greater than 20 polynucleotides, even more preferably a mixture of greater than 50 polynucleotides and even more preferably a mixture of greater than 100 polynucleotides including a plurality of polynucleotide subsets, individual polynucleotides of each subset being mutated randomly and independently with a distinct mutagenic nucleotide analogue;
- (xiii) a mixture of polynucleotides, the sequence of individual polynucleotides being mutated randomly with a single mutagenic nucleotide analogue and further altered at one or more positions by the introduction of modified nucleotides which have increased chemical reactivity, examples of which include, but are not restricted to, 7-deaza-7-nitro-dATP, 7-deaza-7-nitro-dGTP, 5-methyl, 5-ethyl, 5-bromo or 5-iodo substitution for the 5-hydrogen of cytosine forming 2'-deoxycytidine 5'-(alpha-P-borano) triphosphates, 5-hydroxy-dCTP, 5-hydroxy-dUTP and dITP;
- polynucleotides more preferably a mixture of greater than 10 polynucleotides, more preferably a mixture of greater than 20 polynucleotides, even more preferably a mixture of greater than 50 polynucleotides and even more preferably a mixture of greater than 100 polynucleotides, the sequence of individual polynucleotides being mutated randomly with a single mutagenic nucleotide analogue and further altered at one or more positions by the introduction of modified nucleotides which have increased chemical reactivity;
- polynucleotides more preferably a mixture of greater than 10 polynucleotides, more preferably a mixture of greater than 20 polynucleotides, even more preferably a mixture of greater than 50 polynucleotides and even more preferably a mixture of greater than 100 polynucleotides including a plurality of polynucleotide subsets, individual polynucleotides of each subset being mutated randomly and independently by a distinct mutagenic nucleotide analogue and further altered at one or more positions by the introduction of modified nucleotides which have increased chemical reactivity.
- the position will vary between the progeny polynucleotides or between the sequencing polynucleotide fragments; most of which will contain at the individual position the correct nucleotide which is present in the unmutagenised (or parent) polynucleotide, whereas some will contain an incorrect or mutant nucleotide at the same position.
- Identification of the correct nucleotide at a specified position is predicated in part on the random incorporation of incorrect nucleotides in the secondary polynucleotides (e.g., progeny polynucleotides or sequencing polynucleotide fragments) at a frequency of no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5% or 1%.
- the frequency of incorporation of incorrect nucleotides is chosen so that, across the collection of secondary polynucleotides, there are more correct nucleotides at a specified position than incorrect nucleotides.
- quantification of each species of nucleotide at a particular position within a target sequence will reveal the nucleotide species which is in higher abundance than other species of nucleotides at that position and which corresponds to the correct nucleotide species in the target sequence of the parent polynucleotide.
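The idea that the most abundant nucleotide at each position corresponds to the parent nucleotide can be illustrated with a per-position majority vote over already-aligned secondary sequences. This is a simplified sketch under stated assumptions: the patent works from pooled detectable signals rather than from individually read sequences, all variants are assumed to be the same length with no insertions or deletions, and `consensus` is a hypothetical helper name.

```python
from collections import Counter


def consensus(variants):
    """Call the most frequent nucleotide at each position across aligned variants."""
    if not variants:
        raise ValueError("at least one variant is required")
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned and of equal length")
    called = []
    for column in zip(*variants):
        # Count each nucleotide species at this position and keep the commonest.
        species, _count = Counter(column).most_common(1)[0]
        called.append(species)
    return "".join(called)


# Four hypothetical variants of the parent "ACGT", each carrying at most one substitution.
parent_estimate = consensus(["ACGT", "ACGA", "GCGT", "ACTT"])
```

Ties are resolved in favour of the species seen first, which is arbitrary; a practical tool would more likely report such positions as ambiguous.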
- the invention provides methods for wholly or partially determining the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, including a first nucleotide at a first position, with a nucleotide of a different species, wherein each species of nucleotide at the first position correlates with a distinct detectable signal.
- These methods generally comprise analyzing the detectable signals that correlate with the first position of all the secondary polynucleotides, collectively, to determine the species of nucleotide, which is in higher abundance than other species of nucleotide at the first position, and which corresponds to the species of nucleotide at the first position in the target region of the primary polynucleotide.
- the different species of nucleotide is a naturally-occurring nucleotide species that is incorporated into an individual secondary polynucleotide using another secondary polynucleotide as a template for polymerisation, wherein the secondary polynucleotide comprises a mutagenic nucleotide analogue at the first position that complements the naturally-occurring nucleotide and at least one other naturally-occurring nucleotide.
- the invention contemplates methods for wholly or partially determining the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, including a first nucleotide at a first position, with a nucleotide of a different species.
- These methods generally comprise: (a) separating sequencing polynucleotide fragments formed from the secondary polynucleotides and having lengths indicative of the positions of the nucleotides within the target region, including fragments having lengths indicative of the first position, as a function of fragment length, wherein each species of nucleotide, whose position is indicated by a respective fragment, correlates with a distinct detectable signal; (b) detecting the detectable signals during, or at the completion of, the separation; (c) processing the detectable signals to produce a data set containing a plurality of peaks reflecting the positions and species of the nucleotides in the secondary polynucleotides, the plurality of peaks including a first group of peaks representing at least two species of nucleotide at the first position; and (d) processing the peaks of the first group, collectively, to determine the species of nucleotide, which is in higher abundance than the other species of nucleotide, at the first position, and which corresponds to
- the invention contemplates methods for wholly or partially determining the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, including a first nucleotide at a first position, with a nucleotide of a different species.
- These methods generally comprise (a) separating sequencing polynucleotide fragments formed from the secondary polynucleotides and having lengths indicative of the positions of the nucleotides within the target region, as a function of molecular mass, wherein each species of nucleotide, whose position is indicated by a respective fragment, correlates with a distinct detectable signal, wherein the fragments are typically separated by a mass spectroscopic technique and especially by differential time of flight; (b) detecting the detectable signals during, or at the completion of, the separation; (c) processing the detectable signals to produce a data set containing a plurality of peaks reflecting the positions and species of the nucleotides in the secondary polynucleotides, the plurality of peaks including a first group of peaks representing at least two species of nucleotide at the first position; and (d) processing the peaks of the first group, collectively, to determine the species of nucleotide, which is in higher abundance than the other species of nucleot
- the invention contemplates methods for determining the repeat-length of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that are formed directly or indirectly using the primary polynucleotide as a template for polymerisation and that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide with a nucleotide of a different species.
- These methods generally comprise: (a) fractionating the secondary polynucleotides according to their length, size or mass wherein each secondary polynucleotide correlates with a detectable signal; (b) detecting the detectable signals during, or at the completion of, the fractionation; and (c) processing the detectable signals to determine the repeat-length of the polynucleotide.
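Step (c) above does not say how the signals are processed, so the following is only one plausible sketch. It assumes that each detected signal has already been converted to a fragment length in nucleotides, that the most frequently observed length is taken as representative, and that the flanking length and repeat-unit length are known; the function `repeat_count` and its parameters are hypothetical.

```python
from collections import Counter


def repeat_count(fragment_lengths, flank_length, unit_length):
    """Estimate the number of repeat units from fractionated fragment lengths.

    fragment_lengths: lengths (nt) inferred from the detected signals
    flank_length:     total length (nt) of non-repeat sequence in each fragment (assumed known)
    unit_length:      length (nt) of one repeat unit (assumed known)
    """
    if not fragment_lengths or unit_length <= 0:
        raise ValueError("need at least one fragment and a positive unit length")
    # Take the most frequently observed length as the representative fragment length.
    modal_length, _count = Counter(fragment_lengths).most_common(1)[0]
    repeat_bases = modal_length - flank_length
    if repeat_bases < 0 or repeat_bases % unit_length:
        raise ValueError("modal length is inconsistent with the flank and unit lengths")
    return repeat_bases // unit_length


# Hypothetical trinucleotide repeat with 110 nt of flanking sequence.
units = repeat_count([152, 152, 149, 152, 155], flank_length=110, unit_length=3)
```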
- the secondary polynucleotides are generated in a nucleic acid amplification reaction.
- the secondary polynucleotides are fractionated using gel electrophoresis or mass spectrometry.
- the invention contemplates the use of a mutagenic nucleotide analogue in the manufacture of a kit for wholly or partially determining the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, as broadly described above.
- the invention contemplates the use of a mutagenic nucleotide analogue in the manufacture of a kit for wholly or partially determining the repeat-length of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that are formed directly or indirectly using the primary polynucleotide as a template for polymerisation and that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, as broadly described above.
- the invention provides computer program products for wholly or partially deducing the sequence of a target region of a primary polymer of subunits from a multiplicity of secondary polymers that vary from the primary polymer in the target region by the substitution of at least one subunit, including a first subunit at a first position, with a subunit of a different species, wherein each species of subunit at the first position correlates with a distinct detectable signal.
- These computer program products generally include computer executable code which when implemented on a suitable processing system causes the processing system to process the detectable signals that correlate with the first position of all the secondary polymers, collectively, to determine the species of subunit, which is in higher abundance than other species of subunit at the first position, and which corresponds to the species of subunit at the first position in the target region of the primary polymer.
- the invention provides processing systems for wholly or partially deducing the sequence of a target region of a primary polymer of subunits from a multiplicity of secondary polymers that vary from the primary polymer in the target region by the substitution of at least one subunit, including a first subunit at a first position, with a subunit of a different species, wherein each species of subunit at the first position correlates with a distinct detectable signal.
- These processing systems are generally adapted to process the detectable signals that correlate with the first position of all the secondary polymers, collectively, to determine the species of subunit, which is in higher abundance than other species of subunit at the first position, and which corresponds to the species of subunit at the first position in the target region of the primary polymer.
- the detectable signals are desirably processed to produce a data set containing a plurality of peaks reflecting the positions and species of the subunit in the secondary polymers, the plurality of peaks including a first group of peaks representing at least two species of subunit at the first position.
- the processing systems are further adapted to process the peaks of the first group, collectively, to determine the species of subunit, which is in higher abundance than the other species of subunit, at the first position, and which corresponds to the species of subunit at the first position in the target region of the primary polymer.
- the processing systems further comprise a store for storing the data.
- the processing systems are further adapted to generate an indication of the sequence of the target region of the primary polymer.
- the processing systems comprise a display, which displays the indication.
- the invention provides computer program products for wholly or partially deducing the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, including a first nucleotide at a first position, with a nucleotide of a different species, wherein each species of nucleotide at the first position correlates with a distinct detectable signal.
- These computer program products generally include computer executable code which when implemented on a suitable processing system causes the processing system to (a) process the detectable signals to produce a data set containing a plurality of peaks reflecting the positions and species of the nucleotides in the secondary polynucleotides, the plurality of peaks including a first group of peaks representing at least two species of nucleotide at the first position; and (b) process the peaks of the first group, collectively, to determine the species of nucleotide, which is in higher abundance than the other species of nucleotide, at the first position, and which corresponds to the species of nucleotide at the first position in the target region of the primary polynucleotide.
- the invention provides processing systems for wholly or partially deducing the sequence of a target region of a primary polynucleotide from a multiplicity of secondary polynucleotides that vary from the primary polynucleotide in the target region by the substitution of at least one nucleotide, including a first nucleotide at a first position, with a nucleotide of a different species, wherein each species of nucleotide at the first position correlates with a distinct detectable signal.
- These processing systems are generally adapted to (i) process the detectable signals to produce a data set containing a plurality of peaks reflecting the positions and species of the nucleotides in the secondary polynucleotides, the plurality of peaks including a first group of peaks representing at least two species of nucleotide at the first position; and (ii) process the peaks of the first group, collectively, to determine the species of nucleotide, which is in higher abundance than the other species of nucleotide, at the first position, and which corresponds to the species of nucleotide at the first position in the target region of the primary polynucleotide.
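The two processing steps described for these systems (build a data set of peaks, then process each group of peaks collectively) might be organised as in the sketch below. It is an assumption-laden illustration and not the disclosed software: the `Peak` record, its fields, and `deduce_sequence` are hypothetical, and peak height is assumed to be the quantity that reflects abundance.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    position: int   # position within the target region
    species: str    # nucleotide species indicated by the signal
    height: float   # assumed measure of abundance


def deduce_sequence(peaks):
    """Group peaks by position and call the species with the greatest total height."""
    by_position = defaultdict(lambda: defaultdict(float))
    for peak in peaks:
        by_position[peak.position][peak.species] += peak.height
    calls = []
    for position in sorted(by_position):
        group = by_position[position]  # all species observed at this position
        calls.append(max(group, key=group.get))
    return "".join(calls)


# Hypothetical data set: positions 1 and 2 each show two species, position 3 shows one.
peaks = [
    Peak(1, "A", 900.0), Peak(1, "G", 120.0),
    Peak(2, "C", 700.0), Peak(2, "T", 300.0),
    Peak(3, "G", 650.0),
]
sequence = deduce_sequence(peaks)
```

Positions with no peaks are simply absent from the result, which matches the "wholly or partially" wording but would need explicit gap handling in practice.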
- Figure 1 is a diagrammatic representation illustrating mutant configurations: (A) Star;
- Figure 2 is a graphical representation of an indicative calculated relative probability of miscalling of a nucleotide at any individual position varying in relationship with the number of different mutant polynucleotides in the range of 1 to 15 polynucleotides, wherein the polynucleotides are each mutated randomly at frequencies of 2%, 10%, 20%, 30% and 40%, and the relative probability is calculated on the assumption that 70% of the nucleotides at any position need be identical for detection.
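The source does not give the formula behind Figure 2. One simple model that matches the stated assumption is a binomial one: if each of n mutant polynucleotides independently carries an incorrect nucleotide at a given position with probability p, the position is miscalled whenever fewer than 70% of the copies carry the correct nucleotide. The sketch below computes that probability; the independence assumption, the function name `miscall_probability`, and the treatment of the 70% figure as an inclusive threshold are all assumptions.

```python
from math import comb


def miscall_probability(n, p, threshold_percent=70):
    """Probability that fewer than threshold_percent of n copies are correct at a position.

    n: number of different mutant polynucleotides in the mixture
    p: per-position mutation frequency of each polynucleotide (0 <= p <= 1)
    """
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("n must be >= 1 and p must lie in [0, 1]")
    # Smallest number of correct copies that meets the threshold (integer ceiling).
    needed = -(-threshold_percent * n // 100)
    # Binomial probability that at least `needed` copies are correct.
    p_called = sum(
        comb(n, k) * (1.0 - p) ** k * p ** (n - k)
        for k in range(needed, n + 1)
    )
    return 1.0 - p_called


# Tabulate the model over the ranges mentioned for Figure 2.
table = {
    rate: [miscall_probability(n, rate) for n in range(1, 16)]
    for rate in (0.02, 0.10, 0.20, 0.30, 0.40)
}
```

Because the number of copies required is rounded up to a whole number, the modelled probability does not fall smoothly as n grows; it steps each time the required count changes.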
- Figure 3 illustrates a pair of electropherograms representative of direct cycle sequencing reactions of specific "difficult to sequence" polynucleotide fragment W51.
- Figure 3 A represents an electropherogram of sequencing fragments derived from W51 using standard Applied Biosystems Incorporated BigDyeTM dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 3B represents an electropherogram of sequencing fragments derived from an individual copy of W51 mutated using PCR with dPTP and sequenced using the standard ABI BigDyeTM dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 4 illustrates a pair of electropherograms representative of direct cycle sequencing reactions of specific "difficult to sequence" polynucleotide fragment D36.
- Figure 4A represents an electropherogram of sequencing fragments derived from D36 using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 4B represents an electropherogram of sequencing fragments derived from an individual copy of D36 mutated using PCR with dPTP and sequenced using the standard ABI BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 5 illustrates a pair of electropherograms representative of cycle sequencing reactions of a "difficult-to-sequence" polynucleotide region from a fragment of the human RP11-167L9 BAC.
- The bar indicates the polyA motif.
- Figure 5A represents an electropherogram of sequencing fragments derived from the wild-type region using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 2.0.
- Figure 5B represents the sequence of the region mutated using TempliPhi™ and the analogue 8-oxo-dGTP and then sequenced using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 2.0.
- Figure 6 illustrates a quartet of electropherograms representative of direct cycle sequencing reactions of specific "difficult-to-sequence" polynucleotide fragment from human BAC RP11-167L9. The bar indicates the polyA motif.
- Figure 6A represents sequence derived from the wild type BAC RP11-167L9 fragment using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 6B represents the mean sequence read derived from a mixture of nine different 8-oxo-dGTP mutated copies of the BAC RP11-167L9 fragment, sequenced using the standard ABI BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figures 6C and 6D represent the mean or 'composite' sequence read derived from a mixture of eight different dPTP mutated copies of the BAC RP11-167L9 fragments, sequenced using the standard ABI BigDye™ dideoxy-terminator sequencing chemistries version 3.0 and version 3.1, respectively.
- Figure 7 is a schematic representation of a computer system useful in the practice of the present invention.
- Figure 8 illustrates an electropherogram representative of a cycle sequencing reaction of an individual copy of the specific polynucleotide fragment pTEST™ mutated using TempliPhi™ and the analogue 5-Br-dUTP and then sequenced using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- The dots represent individual base substitutions in the sequence.
- Figure 9 illustrates a pair of electropherograms representative of direct cycle sequencing reactions of a specific polynucleotide fragment from pTEST™.
- Figure 9A represents sequence derived from the wild type pTEST™ using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 9B represents the mean sequence read derived from a mixture of fifteen different dPTP mutated copies of pTEST™, sequenced using the standard ABI BigDye™ dideoxy-terminator sequencing chemistry, version 3.0.
- Figure 10 illustrates a quartet of electropherograms representative of direct cycle sequencing reactions of specific "difficult-to-sequence" polynucleotide fragment from D12.
- Figure 10A represents sequence derived from the wild type D12 using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.1.
- Figure 10B represents the mean sequence derived from a mixture of different dPTP mutated copies of D12 sequenced using the non-standard ABI BigDye™ dideoxy-terminator chemistry with the addition of 38 µM dPTP.
- Figures 10C and 10D represent the mean or 'composite' sequence of a mixture of different dPTP mutated copies of D12 sequenced using the non-standard ABI BigDye™ dideoxy-terminator chemistry with the addition of 76 µM and 150 µM dPTP, respectively.
- Figure 11 illustrates a pair of electropherograms representative of direct cycle sequencing reactions of specific "difficult-to-sequence" polynucleotide fragment from D4.
- Figure 11A represents sequence derived from the wild type D4 using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.1.
- Figure 11B represents the mean sequence derived from a mixture of different dPTP mutated copies of D4 sequenced using the non-standard ABI BigDye™ dideoxy-terminator chemistry with the addition of 38 µM dPTP.
- Figure 12 illustrates a pair of electropherograms representative of direct cycle sequencing reactions of a specific "difficult-to-sequence" polynucleotide fragment from human BAC clone RP11-167L9. The bar indicates the polyA motif.
- Figure 12A represents sequence derived from the wild type fragment using standard Applied Biosystems Incorporated BigDye™ dideoxy-terminator sequencing chemistry, version 3.0, whilst Figure 12B represents the mean sequence derived from a mixture of different dPTP mutated copies of RP11-167L9 sequenced using the non-standard ABI BigDye™ dideoxy-terminator chemistry, v 3.1, with the addition of 38 µM dPTP.
- Figure 13 represents the mean fragment repeat lengths of nine different Simple Tandem Repeat (STR) PCR products derived from a mixture of different 8-oxo-dGTP mutated copies of the Amelogenin (non STR marker, XY marker), DYS14, D21S11, D13S317, D13S258, D13S631, D18S51, D18S851 and D18S391 loci simultaneously amplified using non-standard PCR chemistry, with the addition of 76 µM 8-oxo-dGTP. All alleles were amplified; however, only seven genotype loci are visible here (D21S11, D13S317, D13S258, D13S631, D18S51, D18S851 and D18S391), as two fall below the cut-off intensity.
- Complementary refers to the topological capability or matching together of interacting surfaces of an oligonucleotide probe and its target oligonucleotide, which may be part of a larger polynucleotide.
- the target and its probe can be described as complementary, and furthermore, the contact surface characteristics are complementary to each other.
- Complementary includes base complementarity such as A is complementary to T or U, and C is complementary to G in the genetic code.
- This invention also encompasses situations in which there is non-traditional base-pairing such as Hoogsteen base pairing, which has been identified in certain transfer RNA molecules and postulated to exist in a triple helix.
- match and mismatch as used herein refer to the hybridisation potential of paired nucleotides in complementary nucleic acid strands. Matched nucleotides hybridise efficiently, such as the classical A-T and G-C base pair mentioned above. Mismatches are other combinations of nucleotides that hybridise less efficiently.
- mutagenic nucleotide analogue or “mutagenic analogue” is meant a nucleotide analogue that is incorporated by a polymerase into a nucleic acid polymer, wherein the analogue replaces a first naturally-occurring nucleotide with which it complements and wherein the analogue complements a second nucleotide, which is other than the first nucleotide, and which is suitably a naturally-occurring nucleotide (i.e., which occurs naturally in naturally-occurring nucleic acid polymers).
- modified nucleotide and "derivatised nucleotide” mean synthetic bases, i.e., non-naturally-occurring nucleotides and nucleosides, particularly modified or derivatised adenine, guanine, cytosine, thymidine, uracil and minor bases.
- As between "modified" and "derivatised", modification tends to relate broadly to any difference or alteration compared to a corresponding natural base, whereas derivatisation refers more specifically to the addition or presence of different chemical groups, i.e., modification by the addition of chemical groups, functional groups and/or molecules.
- Nucleotide analogue means a molecule that can be used in place of a naturally-occurring base in nucleic acid synthesis and processing, typically enzymatic as well as chemical synthesis and processing, particularly modified nucleotides capable of base pairing, including synthetic bases that do not comprise adenine, guanine, cytosine, thymidine, uracil or minor bases.
- modified nucleotide typically refers to congeners of adenine, guanine, cytosine, thymidine, uracil and minor bases
- nucleotide analogue further refers to synthetic bases that may not comprise adenine, guanine, cytosine, thymidine, uracil or minor bases, i.e., novel bases.
- Illustrative nucleotide analogues include nucleotides in which the pentose sugar and/or one or more of the phosphate esters is replaced with its respective analogue and includes modified and derivatised nucleotides.
- Exemplary pentose sugar analogues are those in which one or more of the carbon atoms are each independently substituted with one or more of the same or different -R, -OR, -NRR or halogen groups, where each R is independently hydrogen, (C1-C6) alkyl or (C5-C14) aryl.
- The pentose sugar may be saturated or unsaturated.
- pentose sugars include, but are not limited to, ribose, 2'-deoxyribose, 2'-(C1-C6)alkoxyribose, 2'-(C5-C14)aryloxyribose, 2',3'-dideoxyribose, 2',3'-didehydroribose, 2'-deoxy-3'-haloribose, 2'-deoxy-3'-fluororibose, 2'-deoxy-3'-chlororibose, 2'-deoxy-3'-aminoribose, 2'-deoxy-3'-(C1-C6)alkylribose, 2'-deoxy-3'-(C1-C6)alkoxyribose, 2'-deoxy-3'-(C5-C14)aryloxyribose, 2',3'-dideoxy-3'--
- Exemplary phosphate ester analogues include, but are not limited to, alkylphosphonates, methylphosphonates, phosphoramidates, phosphotriesters, phosphorothioates, phosphorodithioates, phosphoroselenoates, phosphorodiselenoates, phosphoroanilothioates, phosphoroanilidates, phosphoroamidates, boronophosphates, etc., including any associated counterions, if present.
- RNA contains the nucleobases adenine (A), guanine (G), cytosine (C) or uracil (U), and DNA contains the nucleobases adenine (A), guanine (G), cytosine (C) or thymine (T).
- Nucleotides which are complementary to one another are those that tend to form complementary hydrogen bonds between them and, specifically, the natural complement to A is U or T, the natural complement to T or U is A, the natural complement to C is G and the natural complement to G is C.
- nucleobase refers to a substituted or unsubstituted nitrogen-containing parent heteroaromatic ring of a type that is commonly found in polynucleotides. Typically, but not necessarily, the nucleobase is capable of forming Watson-Crick and/or Hoogsteen hydrogen bonds with an appropriately complementary nucleobase.
- nucleobases include, but are not limited to, purines such as 2-aminopurine, 2,6-diaminopurine, adenine (A), ethenoadenine, N6-Δ2-isopentenyladenine (6iA), N6-Δ2-isopentenyl-2-methylthioadenine (2ms6iA), N6-methyladenine, guanine (G), isoguanine, N2-dimethylguanine (dmG), 7-methylguanine (7mG), 2-thiopyrimidine, 6-thioguanine (6sG), hypoxanthine and O6-methylguanine; 7-deaza-purines such as 7-deazaadenine (7-deaza-A) and 7-deazaguanine (7-deaza-G); pyrimidines such as cytosine (C), 5-propynylcytosine, isocytosine, thymine (T), 4-
- nucleobases are purines, 7-deazapurines and pyrimidines.
- Particularly desirable nucleobases are the normal nucleobases, defined infra, and their common analogues, e.g., 2ms6iA, 6iA, 7-deaza-A,D, 2dmG, 7-deaza-G, 7mG, hypoxanthine, 4sT, 4sU and Y.
- a sample such as, for example, a polynucleotide extract is isolated from, or derived from, a particular source of the host.
- the extract can be obtained from a tissue or a biological fluid isolated directly from the host.
- oligonucleotide refers to a polymer composed of a multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof).
- oligonucleotide typically refers to a nucleotide polymer in which the nucleotide residues and linkages between them are naturally occurring, it will be understood that the term also includes within its scope various analogues including, but not restricted to, peptide nucleic acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl ribonucleic acids, and the like. The exact size of the molecule can vary depending on the particular application. An oligonucleotide is typically rather short in length, generally from about 8 to 30 nucleotides, more preferably from about 10 to
- Oligonucleotides may be prepared using any suitable method, such as, for example, the phosphotriester method as described in an article by Narang et al. (1979 Methods Enzymol. 68:90) and U.S. Patent No. 4,356,270. Alternatively, the phosphodiester method as described in Brown et al. (1979 Methods Enzymol. 68:109) may be used for such preparation.
- polymerase DNA polymerase
- RNA polymerase reverse transcriptase
- an enzyme of interest e.g., a single enzyme or group of enzymatic subunits
- a group of enzymes e.g., a family of polymerases
- the target polynucleotide can designate mRNA, RNA, cRNA, cDNA single strand DNA or double strand DNA.
- polymerase chain reaction or "PCR" as used herein designates DNA or RNA which is amplified by a method in which an oligonucleotide primer is hybridised to the 5' end of each complementary strand of the double-stranded target polynucleotide or nucleic acid as described in US Patent Nos. 4,683,195 and 4,683,202.
- the primers are extended from the 5' end forward in the 3' direction by a DNA polymerase which incorporates free nucleotides into a nucleic acid sequence complementary to each strand of the target nucleic acid. After dissociation of the extension products from the target nucleic acid strands, the extension products become target sequences for the next cycle of primer hybridisation and subsequent extension.
- polynucleotide or “nucleic acid” as used herein designates mRNA, RNA, cRNA, cDNA or DNA.
- the term typically refers to oligonucleotides greater than 30 nucleotides in length. Polynucleotides or nucleic acids are understood to encompass complementary strands as well as alternative backbones described herein.
- polynucleotide variant and “variant” refer to polynucleotides that are distinguished from a reference polynucleotide by the substitution, addition or deletion of at least one nucleotide.
- "Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid polymers.
- polypeptide variant refers to polypeptides that vary from a reference polypeptide by the substitution, addition or deletion of at least one amino acid residue.
- subsequence refers to a contiguous sequence of a particular unit, value, variable or entity, that exists in part or in whole within a larger contiguous sequence of that particular unit, value, variable or entity. In this context a subsequence can refer to a contiguous sequence of nucleotides or amino acid residues within, or that is part of, a larger contiguous sequence of nucleotides or amino acid residues, respectively.
- target region refers to at least a subsequence of a polymer of interest, which generally contains a structural element that interferes with sequence analysis of the subsequence.
- the present invention provides a new paradigm, designated SAQOM (Sequence Analysis via Quantification Of Mutation), for determining a sequence of subunits in a target region of a polymer of interest, which is suitably, but not exclusively, difficult or impossible to sequence by conventional means.
- SAQOM is predicated in part on the provision or generation of a multiplicity of variant (or mutant) polymers that vary from the polymer of interest (or parent polymer) in the target region by the substitution of at least one subunit with a subunit of a different species.
- these variant polymers are generated using the polymer of interest as template for their polymerisation. There are two reasons for providing or generating the variant polymers.
- the variants may contain fewer problem regions than the polymer of interest and should, therefore, be easier to sequence.
- genomic DNA is highly repetitive, so random mutation is more likely to destroy repeats than to create them.
- each variant polymer may contain a different pattern of problem regions.
- nucleic acid polymers which represents a specific embodiment of the present invention
- local sequence characteristics including inverted repeats or palindromes, which may be present in the target region of a primary nucleic acid polymer, may be modified in the variant nucleic acid polymers to change the structure of the target region such that formation of stem-and-loop structures, for example, is prevented, reduced or otherwise weakened.
- individual variant polymers will differ from other variant polymers in the target region at the position(s) of subunit substitution.
- the frequency of substitution is typically chosen so that the subunit species, which is located at a specified position within the target sequence of the primary polymer (i.e., the correct subunit species), is represented at that position by more variant polymers than polymers containing other species of subunit at the same position.
- the correct species of subunit must be more abundant than other subunit species at the specified position across all the variant polymers.
- the substituent (or mutant) frequency of individual variant polymers is in the range of between about 1% and about 70%.
- the substituent frequency of individual variant polymers is no more than 70%, more suitably no more than 65%, preferably no more than 60%, more preferably no more than 55%, even more preferably no more than 50%, even more preferably no more than 45%, even more preferably no more than 40%, even more preferably no more than 35%, even more preferably no more than 30%, even more preferably no more than 25%, even more preferably no more than 20%, even more preferably no more than 15%, and still even more preferably no more than 10%.
- the substituent frequency of individual variant polymers may be as low as 9%, 8%, 7%, 6%, 5%, 4%, 3% 2% or even 1%. In certain embodiments of this type, the substitution is random.
- each species of subunit in the variant polymers is detectably distinct from other species of subunit to permit interrogation of the identity of each species of subunit at individual interrogation positions in the target region.
- the variant polymers are analysed, collectively, at each interrogation position to determine the identity and quantity of each subunit species at that position and to thereby deduce the species of subunit which is in higher abundance relative to other species of subunit at the interrogated position.
- the highest abundant species so identified represents the species of subunit located at the same position in the target region of the parent polymer. Similar analyses of other interrogation positions will reveal a 'composite' sequence of highest abundant subunits, which corresponds to the sequence of at least a portion of the target region of the primary polymer.
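The position-by-position analysis described above can be sketched as a plurality vote over aligned variant sequences. This is an illustrative reading only: the function name `composite_sequence`, the use of "N" for a tied position and the toy sequences are assumptions, and a real analysis would work from quantified signals rather than discrete reads.

```python
from collections import Counter
from typing import List


def composite_sequence(variants: List[str], ambiguous: str = "N") -> str:
    """Call the most abundant subunit at each position across aligned variants."""
    if not variants:
        raise ValueError("at least one variant sequence is required")
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be aligned to the same length")
    calls = []
    for column in zip(*variants):
        ranked = Counter(column).most_common()
        # A tie for first place means no species is in higher abundance.
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        calls.append(ambiguous if tied else ranked[0][0])
    return "".join(calls)


# Three toy variants, each differing from the parent GATTACA at one position.
variants = ["GGTTACA", "GATCACA", "GATTGCA"]
print(composite_sequence(variants))  # GATTACA
```

No single variant matches the parent here, yet the composite does, because each substitution is outvoted at its own position.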
- the sequence information so obtained may also provide the means to identify local sequence characteristics (e.g., repeat sequences and palindromic sequences) underlying any refractory behaviour of the target region in the primary polymer to sequence analysis.
- the variant polymers may already exist and could, therefore, constitute naturally occurring variants (e.g., different alleles of a gene, different polymorphic forms of a polymorphic site, homologous or orthologous genes in different organisms). Alternatively, the variant polymers may be produced by mutagenesis techniques as for example described infra.
- the subunits of the polymers are selected from a finite series of possible subunit species.
- the subunits are selected from nucleic acid subunits, amino acid subunits or carbohydrate subunits, or combinations thereof.
- the subunits are selected from nucleic acid or amino acid subunits.
- the subunits are selected from nucleic acid subunits.
3. Variant or mutant configurations
- the variants or mutants may be related in various ways to the original or parent sequence G.
- the first is the star, in which each mutant is generated directly from the original sequence.
- the second is the path, in which each mutant is generated from the previous mutant.
- the octopus and the binary tree are two generalisations, combining features of both the star and the path. If the mutants are naturally occurring (in different lineages) then they may be derived from a common ancestor of unknown sequence.
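The difference between the star and the path configurations can be sketched in a few lines. This is a simplified illustration: `mutate`, `star` and `path` are hypothetical helper names, and the uniform random substitution used here ignores the base- and site-specificity that real mutagens show.

```python
import random
from typing import List

BASES = "ACGT"


def mutate(seq: str, intensity: float, rng: random.Random) -> str:
    """Substitute each position with probability `intensity` (uniform model)."""
    out = []
    for base in seq:
        if rng.random() < intensity:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def star(parent: str, n: int, intensity: float, rng: random.Random) -> List[str]:
    """Star: every mutant is generated directly from the original sequence."""
    return [mutate(parent, intensity, rng) for _ in range(n)]


def path(parent: str, n: int, intensity: float, rng: random.Random) -> List[str]:
    """Path: every mutant is generated from the previous mutant."""
    mutants, current = [], parent
    for _ in range(n):
        current = mutate(current, intensity, rng)
        mutants.append(current)
    return mutants
```

In the path configuration substitutions accumulate from one mutant to the next, so later mutants drift further from the parent than in the star configuration at the same per-step intensity.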
- Factors influencing the number of mutants required include, but are not restricted to: the intensity of mutation (proportion of positions or 'sites' affected); the base-specificity of mutation (e.g., some mutagens target a single type of subunit, others target all subunit types, but have varying preferences); the site-specificity of mutation (some mutagens target specific sites preferentially); the configuration of mutants (star, path, etc.); and the need for obtaining a composite sequence.
- $x_k = \binom{n}{k}\,p^{k}(1-p)^{n-k}$ and the probability that the majority of mutants have been modified at that site is the sum of $x_k$ over all k greater than or equal to n/2. As an example, if 20 mutants are generated, each with a mutation intensity of 0.1, then the probability that a majority of mutants are altered at any given site is less than 1 in 100,000. This means that an incorrect or uncertain composite sequence will be obtained at fewer than one site in 100,000 on average under these assumptions. Note that it is often possible to correctly determine the base appearing at a given site even when that site has been modified in a majority of mutants.
- the base that appears at that site in the greatest number of mutants will be typically the correct one, even if it appears in fewer than half of the mutants.
- the above method overestimates the number of mutants required to achieve an accurate composite sequence. Also note that it is easier to calculate the degree of accuracy given the number of mutants, than vice versa. To select the number of mutants, it might be desirable to create and use a table of composite sequence accuracies for various numbers of mutants and mutation intensities.
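The table suggested above could be produced along the following lines. This is a sketch under the stated binomial assumptions, reading n as the number of mutants, p as the mutation intensity and k as the number of mutants modified at a site; the function names and the particular values of n and p tabulated are illustrative choices.

```python
from math import comb


def site_probability(n: int, p: float, k: int) -> float:
    """Probability that exactly k of n mutants are modified at a given site."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def majority_modified(n: int, p: float) -> float:
    """Sum of the site probabilities over all k >= n/2."""
    k_min = (n + 1) // 2  # smallest integer k with k >= n/2
    return sum(site_probability(n, p, k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    intensities = (0.02, 0.1, 0.2, 0.3, 0.4)
    print("n".rjust(4) + "".join(f"p={p}".rjust(12) for p in intensities))
    for n in (5, 10, 15, 20):
        row = "".join(f"{majority_modified(n, p):12.2e}" for p in intensities)
        print(str(n).rjust(4) + row)
```

For 20 mutants at an intensity of 0.1 this evaluates to roughly 7 in a million, consistent with the "less than 1 in 100,000" figure quoted above. As the text notes, the value is a conservative bound, since the correct base can still be the most abundant one when fewer than half of the mutants retain it.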
- Figure 2 graphically displays a probability factor of miscalling (reading error) any nucleotide within the composite polynucleotide signal, here for illustration assuming that a shared identity of 70% of the nucleotides at each position is necessary for correct detection.
- the probability of miscalling for each unit mixture of polynucleotides teaches that maxima and minima for miscalling occur irrespective of the frequency of mutation in each polynucleotide in the mixture.
- Figure 2 also teaches that there is a trend for the decrease in probability of miscalling with increasing numbers of polynucleotides in the mixture, irrespective of the frequency of mutation in each polynucleotide in the mixture. Further, Figure 2 teaches that there is a higher probability of miscalling for each number of polynucleotides in a mixture as the frequency of mutation increases, indicating that at higher levels of mutation, greater numbers of polynucleotides must be present in a mixture to avoid miscalling of a nucleotide at any position.
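How the curves of Figure 2 were computed is not stated here. One plausible reading, offered only as an assumption, is that a position is miscalled whenever fewer than 70% of the polynucleotides in the mixture remain unmutated there, with each polynucleotide mutated independently at frequency p. The function name `miscall_probability` is hypothetical.

```python
from math import comb


def miscall_probability(n: int, p: float, threshold_percent: int = 70) -> float:
    """P(fewer than threshold_percent of n copies are unmutated at a position)."""
    # Integer ceiling of threshold_percent * n / 100, avoiding float rounding.
    required = -(-threshold_percent * n // 100)
    # Sum over u = number of unmutated copies, for every u below the requirement.
    return sum(comb(n, u) * (1 - p) ** u * p ** (n - u) for u in range(required))


if __name__ == "__main__":
    for n in range(1, 16):
        values = ", ".join(f"{miscall_probability(n, p):.4f}"
                           for p in (0.02, 0.1, 0.2, 0.3, 0.4))
        print(f"n={n:2d}: {values}")
```

Under this reading the required number of unmutated copies is rounded up to a whole number, so the probability rises and falls in steps as n increases (for example it is higher at n = 3 than at n = 4 for p = 0.1). This would account for local maxima and minima superimposed on an overall downward trend.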
- mutant polymers are nucleic acid polymers (or polynucleotides)
- the mutagenesis facilitates the random replacement of nucleotides in a target polynucleotide with nucleotide analogues.
- the nucleotide analogues will base pair with at least two species of conventional or naturally-occurring nucleotides, resulting in the random transition and/or transversion mutagenesis of the target polynucleotide.
- a mixture of randomly mutated polynucleotide molecules is thereby produced in which the sequence of each of the randomly mutated secondary polynucleotides is materially dissimilar to the sequence of the target polynucleotide.
- mutagenic nucleotide analogues may be introduced into a target polynucleotide using any suitable method known to persons of skill in the art. For example:
- (i) the analogue(s) may be introduced by a nucleic acid amplification process of the target polynucleotide, examples of which include, but are not restricted to, PCR-directed amplification or rolling-circle amplification (e.g., by using a φ29 DNA polymerase) of the target polynucleotide or by any other similar DNA polymerase directed DNA replication process.
- the molar ratio of nucleotide analogue to the corresponding conventional nucleotide in the amplification reaction is generally about 1:10, more usually about 1:5, more usually from about 1:3 to about 1:2 and preferably from about 1:1.5 to about 1:1.
- (ii) the analogue(s) may be introduced during a PCR-directed cycle sequencing amplification of the target polynucleotide or by any other DNA polymerase directed DNA sequencing process.
- the molar ratio of nucleotide analogue to the corresponding conventional nucleotide in the PCR reaction is generally about 1:10, more usually about 1:5, more usually from about 1:3 to about 1:2 and preferably from about 1:1.5 to about 1:1.
- (iii) the analogue(s) may be introduced by co-transformation of the target polynucleotide simultaneously with the nucleotide analogue(s) into host cells such as E. coli; and
- (iv) the analogues may be introduced by growth of host cells such as E. coli transformed with the target polynucleotide in the presence of nucleotide analogue(s).
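The molar ratios quoted above translate directly into analogue concentrations once the concentration of the corresponding conventional nucleotide is fixed. The helper below is a small worked example; the function name and the 200 µM nucleotide concentration are purely illustrative assumptions and are not taken from the specification.

```python
def analogue_concentration_um(nucleotide_um: float,
                              analogue_parts: float,
                              nucleotide_parts: float) -> float:
    """Analogue concentration (µM) for an analogue:nucleotide molar ratio."""
    if nucleotide_parts <= 0:
        raise ValueError("nucleotide_parts must be positive")
    return nucleotide_um * analogue_parts / nucleotide_parts


# Hypothetical reaction with the conventional nucleotide at 200 µM.
conventional_um = 200.0
for ratio in ((1, 10), (1, 5), (1, 3), (1, 2), (1, 1.5), (1, 1)):
    conc = analogue_concentration_um(conventional_um, *ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {conc:.1f} µM analogue")
```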
- Nucleotide analogues are compounds that may mimic natural nucleotides in structural associations with natural nucleotides in DNA or RNA. These analogues may possess the ability to (i) replace particular natural nucleotides within the DNA duplex and (ii) be introduced into the DNA molecule during DNA synthesis by DNA polymerases; (iii) replace particular natural nucleotides within the RNA polynucleotide and/or (iv) be introduced into the RNA molecule during RNA synthesis by RNA polymerases.
- Mutagenic nucleotide analogues have the additional property of replacing several natural nucleotides, rather than a single cognate nucleotide base. This has the effect of inducing transition and transversion mutations in subsequent rounds of DNA replication when the novel cognate base, initially introduced opposite the mutagenic nucleotide analogue, itself forms base pairs or complements with its natural cognate nucleotide.
- mutagenic properties of mutagenic nucleotide analogues are generally complex and dependent upon several factors.
- the mutagenic nucleotide analogue must be able to replace several natural nucleotides, rather than a single nucleotide.
- an individual nucleotide analogue mimics a naturally-occurring nucleotide to a major extent, firstly in terms of its selection by polymerases to be introduced as a cognate base opposite a particular natural nucleotide, and secondly in the selection and introduction of a particular cognate nucleotide by the polymerase opposite the analogue when the analogue occurs in the template strand of a replicating nucleic acid molecule.
- the property of the mutagenic nucleotide analogue to functionally mimic several natural nucleotides is believed to be the result of the physical structural state that the analogue may assume, either as a free nucleotide precursor in solution, or while constrained within a nucleic acid molecule.
- The effect of the analogues on sequence analysis of DNA is illustrated in Figures 3, 4 and 5.
- Figure 3 compares the sequence analysis of a "difficult-to-sequence" polynucleotide W51 with the sequence analysis of a mutated copy of that polynucleotide.
- Figure 4 compares the sequence analysis of a "difficult to sequence” polynucleotide D36 with the sequence analysis of a mutated copy of that polynucleotide.
- Figure 5 compares the sequence analysis of a "difficult to sequence" polynucleotide region from human BAC RP11-167L9 with the sequence analysis of a mutated copy of that polynucleotide.
- nucleotide-like analogues capable of introducing mutations into nucleic acid polymers are:
- C]oxazin-7-one triphosphate behaves as thymine in the majority, and as cytosine in a minority, of DNA copying events, inducing A→G and T→C transitions (Zaccolo et al., 1996 Journal of Molecular Biology 255:589-603; Hill et al., 1998b Proc. Natl. Acad. Sci. USA 95:4258-4263).
- the tautomerisation constant of P is assumed to be about 0.03 (Brown et al., 1968 Journal of the Chemical Society C:1925-1929; Kierdaszuk et al., 1983 FEBS Letters 158:128-130; Moore et al.,
- N6-methoxy-2,6-diaminopurine (dK) behaves as adenine in the majority, and as guanine in a minority, of DNA copying events, preferentially causing A→G and T→C transitions (Hill et al., 1998a Nucleic Acids Research 26:1144-1149; Hill et al., 1998b Proc. Natl. Acad. Sci. USA 95:4258-4263).
- N6-methoxyadenine (dZ) behaves as adenine in the majority, and as guanine in a minority, of DNA copying events, preferentially causing A→G and T→C transitions (Hill et al., 1998a Nucleic Acids Research 26:1144-1149; Hill et al., 1998b Proc. Natl. Acad. Sci. USA 95:4258-4263).
- 2'-deoxyuridine each behave as thymine in the majority of DNA copying events and as cytosine in a minority, principally inducing T→C and A→G transitions. These mutations can be influenced by the ionization state of the analogue and are enhanced at elevated pH values. The ratio of transitions to transversions is altered if DNA amplification is performed at elevated pH.
- N4-methoxycytosine induces purine transition mutagenesis.
- When methoxylated dCTP is incorporated into DNA it behaves as thymine in the majority of DNA copying events and as cytosine in a minority.
- methoxycytosine directs the incorporation of adenine above guanosine in the majority of cases (Brown et al., 1968 Journal of the Chemical Society Section C:1925-1929; Reeves and Beattie, 1985 Biochemistry 24:2262-2268; Hossain et al., 2001a Nucleic Acids Research 29:3949-3954; Hossain et al., 2001b Journal of Biochemistry 130:9-12).
- 5-formylcytosine (5-fC) an oxidation product of 5-methylcytosine (5-mC) are mutagenic, with mutation frequencies in double-stranded DNA of 0.03-0.28%.
- the mutation spectrum of 5-fC was broad, and included targeted (5-fC→G, 5-fC→A, and 5-fC→T) and untargeted mutations. These results suggest that the oxidation of 5-mC results in mutations at and around the modified sites (Kamiya et al., 2002 Journal of Biochemistry (Tokyo) 132:551-555).
- the triphosphate derivative (dYTP) of the analogue dY (1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide) is preferentially incorporated as dATP only with elevated dYTP and reduced dATP.
- dYTP can also be incorporated as a dGTP with elevated dYTP and reduced dGTP (Sala et al., 1996 Nucleic Acids Research 24:3302-3306; Strobel et al., 2002 Nucleic Acids Research 30:1869-1878).
- 5-OH-dCTP and 4Me 5-OH-dCTP can replace dCTP, and to a lesser extent dTTP.
- the analogues can template for particular nucleotides.
- dG is predominantly incorporated opposite 5-OH-dC, with low dA incorporation also seen.
- 5-OH-dCTP has the principal mutagenic potential for G→A transitions (Purmal et al., 1994 Nucleic Acids Research 22:72-78; Purmal et al., 1994 Nucleic Acids Research 22:3930-3935; Loeb et al., 1999 Proc. Natl. Acad. Sci. USA 96:1492-1497).
- nucleotide analogue N2,3-ethenoguanine causes virtually only G→A transition mutations (Cheng et al., 1991 Proc. Natl. Acad. Sci. USA 88:9974-9978).
- the mutagenic nucleotide analogues of the present invention exclude dITP.
- the stability of base pairs formed by preceding nucleotides affects the rate of insertion of mismatched nucleotide but does not protect the mismatched nucleotide from removal by the 3' to 5' exonuclease activity.
- the stability of a base pair formed by a following nucleotide determines whether a misincorporated nucleotide is extended or excised by affecting the ability of the enzyme to edit errors of incorporation.
- Pleiss et al., 1981 similarly showed that 2-amino-purine incorporation is disfavoured and that a greater bias exists with those polymerases containing an active 3'-exonuclease.
- At least one mixture of polynucleotides is mutated using at least one mutagenic nucleotide analogue, more preferably at least two mixtures of polynucleotides, wherein each mixture is mutated using different types of mutagenic nucleotide analogue for the mutagenesis, and even more preferably more than two mixtures of polynucleotides, wherein each mixture is mutated using different types of mutagenic nucleotide analogue for the mutagenesis.
- analogue types and the numbers of differently mutated polynucleotides in the mutant mixture may depend upon the nature of the target sequence and the frequency of mutation of the polynucleotide mutants within the mixture.
- the choice of mutagenic analogue types and conditions for optimisation of mutagenesis is well within the realm of the practitioner in the art.
- mutagenic analogues such as dPTP and others can be used in an analogous fashion to non-mutagenic analogues such as dITP, to improve the ability to sequence "difficult to sequence polynucleotides," using for example chain-terminating sequence analysis (e.g., PCR cycle sequencing).
- the present invention, therefore, also extends to the use of mutagenic analogues in combination with chain-extending nucleotides, chain-terminating nucleotides and a polymerase for sequencing a polynucleotide of interest.
- the molar ratio of mutagenic nucleotide analogue to the corresponding conventional chain-extending nucleotide in a sequencing reaction can be about 1:10, more usually about 1:5, more usually from about 1:3 to about 1:2 and preferably from about 1:1.5 to about 1:1.
- the mutagenic analogues induce mutation at a frequency generally greater than 1×10⁻², more usually at a frequency greater than 2×10⁻², 3×10⁻², 4×10⁻², 5×10⁻², 6×10⁻², 7×10⁻², 8×10⁻², 9×10⁻², 1×10⁻¹, 1.1×10⁻¹, 1.2×10⁻¹, 1.3×10⁻¹, 1.4×10⁻¹, 1.5×10⁻¹, 1.
- Any suitable mutagenesis technique for mutagenising polymers is contemplated for use in SAQOM.
- two general approaches are commonly used to mutate nucleic acids: low fidelity PCR amplification of a DNA element using conditions to promote mis-incorporation of nucleotides, and the chemically-induced mutagenesis of DNA followed by repair and recovery of mutants either by PCR or other polymerase catalysed polynucleotide synthesis, or by biological systems (reviewed Ling & Robinson, 1997 Analytical Biochemistry 254:157-178; Leppard, 1999 Mutagenesis of DNA virus genomes, in DNA Viruses: A Practical Approach Series 214 (ed., Alan J.
- mutagenesis schemes could potentially be used to produce suitable variant sequences for use in SAQOM.
- an original or parent polynucleotide can be mutated using random mutagenesis (e.g., PCR mediated mutagenesis) or oligonucleotide-mediated (or site-directed) mutagenesis.
- Oligonucleotide-mediated mutagenesis can be used for preparing suitable nucleotide substitution variants of a primary polynucleotide.
- This technique is well known in the art as, for example, described by Adelman et al. (1983 DNA 2:183-193). Briefly, a polynucleotide is altered by hybridising an oligonucleotide encoding the desired mutation to a template DNA, wherein the template is the single-stranded form of a plasmid or bacteriophage containing the unaltered or parent DNA sequence. After hybridisation, a DNA polymerase is used to synthesise an entire second complementary strand of the template that will thus incorporate the oligonucleotide primer, and will code for the selected alteration in the parent DNA sequence.
- oligonucleotides of at least 25 nucleotides in length are used.
- An optimal oligonucleotide will have 12 to 15 nucleotides that are completely complementary to the template on either side of the nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridise properly to the single-stranded DNA template molecule.
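The oligonucleotide layout described above (12 to 15 fully complementary nucleotides on either side of the mutated position, giving at least 25 nucleotides overall) can be sketched as follows. This is a minimal illustration only: the function name `design_mutagenic_oligo` and its parameters are hypothetical, a single-base substitution is assumed, and the oligonucleotide is written in the same sense as the input strand, so in practice it would anneal to the complementary single-stranded template.

```python
def design_mutagenic_oligo(sequence, pos, new_base, flank=15):
    """Return an oligo carrying new_base at 0-based index pos of sequence.

    Hypothetical helper: `flank` matching nucleotides are kept on each side
    of the substituted position (12 to 15, as stated in the text).
    """
    if not 12 <= flank <= 15:
        raise ValueError("flank must be between 12 and 15 nucleotides")
    if pos - flank < 0 or pos + flank >= len(sequence):
        raise ValueError("not enough flanking sequence around pos")
    if sequence[pos] == new_base:
        raise ValueError("new_base is identical to the parent base")
    left = sequence[pos - flank:pos]            # perfectly matched 5' arm
    right = sequence[pos + 1:pos + 1 + flank]   # perfectly matched 3' arm
    return left + new_base + right


parent = "ACGT" * 10                      # illustrative 40-nt parent sequence
oligo = design_mutagenic_oligo(parent, pos=20, new_base="G", flank=12)
print(len(oligo), oligo)
```

With `flank=12` the oligonucleotide is 25 nucleotides long, which matches the minimum length given above; `flank=15` gives 31 nucleotides.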
- the DNA template can be generated by those vectors that are either derived from bacteriophage M13 vectors, or those vectors that contain a single-stranded phage origin of replication as described by Vieira et al. (1987 Methods Enzymol. 153:3-11).
- the DNA that is to be mutated may be inserted into one of the vectors to generate single-stranded template. Production of single-stranded template is described, for example, in Sections 4.21-4.41 of Sambrook et al. MOLECULAR CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989).
- the single-stranded template may be generated by denaturing double-stranded plasmid (or other DNA) using standard techniques.
- the oligonucleotide is hybridised to the single-stranded template under suitable hybridisation conditions.
- a DNA polymerising enzyme, usually the Klenow fragment of DNA polymerase I, is then added to synthesise the complementary strand of the template using the oligonucleotide as a primer for synthesis.
- a heteroduplex molecule is thus formed such that one strand of DNA encodes the mutated form of the polypeptide or fragment under test, and the other strand (the original template) encodes the native unaltered sequence of the polypeptide or fragment under test.
- This heteroduplex molecule is then transformed into a suitable host cell, usually a prokaryote such as E. coli. After the cells are grown, they are plated onto agarose plates and screened using the oligonucleotide primer having a detectable label to identify the bacterial colonies having the mutated DNA. The resultant mutated DNA fragments are then cloned into suitable expression hosts such as E. coli using conventional technology and clones that retain the desired antigenic activity are detected. Where the clones have been derived using random mutagenesis techniques, positive clones would have to be sequenced in order to detect the mutation.
- linker-scanning mutagenesis of DNA may be used to introduce clusters of point mutations throughout a sequence of interest that has been cloned into a plasmid vector.
- a pair of complementary oligonucleotides is synthesised to fill in the gap in the sequence of interest between the linker at the deletion endpoint and the nearby restriction site.
- the linker sequence actually provides the desired clusters of point mutations as it is moved or "scanned" across the region by its position at the varied endpoints of the deletion mutation series.
- An alternate protocol is also described by Ausubel et al., supra, which makes use of site directed mutagenesis procedures to introduce small clusters of point mutations throughout the target region. Briefly, mutations are introduced into a sequence by annealing a synthetic oligonucleotide containing one or more mismatches to the sequence of interest cloned into a single-stranded M13 vector.
- This template is grown in an E. coli dut ung strain, which allows the incorporation of uracil into the template strand.
- the oligonucleotide is annealed to the template and extended with T4 DNA polymerase to create a double-stranded heteroduplex.
- the heteroduplex is introduced into a wild-type E. coli strain, which will prevent replication of the template strand due to the presence of apurinic sites (generated where uracil is incorporated), thereby resulting in plaques containing only mutated DNA.
- Methods for generating abundant mutations are advantageous. Examples of such methods are based on exposing an original or target polynucleotide to mutagenising chemicals.
- SAQOM can potentially be used to sequence fragments ranging in length from a few hundred bases up to an entire genome.
- Some of the above-mentioned mutagenesis techniques rely on PCR amplification, which is currently limited to DNA fragments of about 40 kb or shorter (Cheng et al., 1995; Fromenty et al., 2000). This is long enough to enable some exciting applications of SAQOM, but techniques suitable for longer fragments would greatly empower the technique, as for example described infra.
- Exemplary methods of mutagenesis for use in SAQOM include one or more of the following: (1) DNA replication with mutagenic nucleotide analogues and damaged nucleotides; (2) nucleic acid shuffling protocols based on in vitro or in vivo homologous recombination of pools of nucleic acid fragments or polynucleotides; (3) in vitro DNA replication with low fidelity polymerases and high processivity polymerases; (4) propagation of damaged DNA in repair-deficient E. coli hosts; (5) chemical mutagens and (6) Degenerate Oligonucleotide Primed PCR.
- these methods can be applied to two groups of DNA targets - small (1-10 kb) and large (>50 kb) DNA elements. The methods differ somewhat for the two targets, and are described infra:
- Small DNA elements can be mutated by the misincorporation of bases during a nucleic acid amplification reaction. Such reactions are well known to the skilled artisan, and include polymerase chain reaction (PCR) as for example described in Ausubel et al. (supra); isothermal strand displacement amplification (SDA) as for example described in U.S. Patent No 5,422,252; rolling circle replication (RCR) as for example described in Liu et al., (1996 and International application WO 92/01813), Laskins et al., (U.S. Patent No. 6,323,009), Auerbach et al., (U.S. Patent No.
- the polymerases used for these processes are suitably selected from Taq DNA polymerase, Pfo DNA polymerase, Pwo DNA polymerase, Tfl DNA polymerase, Tth DNA polymerase, Pfu DNA polymerase or Exo-Pfu DNA polymerase, Hot Tub DNA polymerase, Vent DNA polymerase or Deep Vent DNA polymerase, E. coli DNA polymerase, the Klenow fragment of E. coli DNA polymerase, T4 or T7 DNA polymerase, AmpliTaq™ DNA polymerase Stoffel fragment, AmpliTaq™ Gold DNA polymerase, Qβ DNA polymerase, φ29 DNA polymerase, E. coli DNA polymerase V, Y-family DNA polymerases, RNA polymerase and reverse transcriptase.
- the mutagenesis is carried out using PCR-directed mutagenesis.
- small DNA elements can be mutated efficiently (1-20%) by using non-standard base analogues (Kamiya et al., 1994 Nucleosides and Nucleotides 13:1483-1492; Zaccolo et al., 1996 Journal of Molecular Biology 255:589-603), or less efficiently by limiting the provision of some bases (Cline et al., 1996 Nucleic Acids Research 24:3546-3551), or by chemically reducing polymerase fidelity (Rice et al., 1992 Proc. Natl. Acad. Sci.
- dPTP [6-(2-deoxy-β-D-ribofuranosyl)-3,4-dihydro-8H-pyrimido-[4,5-C]oxazin-7-one triphosphate] induces A→G and T→C transitions, while 8-oxo-dGTP preferentially causes A→C and T→G transversions (Zaccolo et al., 1996 Journal of Molecular Biology 255:589-603).
- nucleoside analogues such as N6-methoxy-2,6-diaminopurine (dK) and N6-methoxyaminopurine (dZ) (Hill et al., 1998a, Nucleic Acids Research 26:1144-1149; Hill et al., 1998b, Proc. Natl. Acad. Sci. USA 95:4258-4263) also induce particular mutations. The inventors have found that small DNAs modified by analogues in vitro may be recovered with controlled frequency from 1-30% mutation.
- a mutant DNA polymerase with lowered fidelity for incorporation of correct complementary nucleotides during DNA synthesis, and which is desirably thermostable, is suitably employed in such nucleic acid amplification-directed mutagenesis protocols.
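The analogue-specific substitution patterns stated above (dPTP: A→G and T→C; 8-oxo-dGTP: A→C and T→G) can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of the chemistry: each susceptible base is substituted independently with a fixed probability, substitutions are recorded as they would appear on one strand, and the names `SPECTRA` and `mutate` are hypothetical.

```python
import random

# Substitution patterns taken from the text; the dictionary layout is illustrative.
SPECTRA = {
    "dPTP": {"A": "G", "T": "C"},        # transitions
    "8-oxo-dGTP": {"A": "C", "T": "G"},  # transversions
}


def mutate(sequence, analogue, rate, rng=None):
    """Substitute each susceptible base with probability `rate` (assumed independent)."""
    rng = rng or random.Random()
    table = SPECTRA[analogue]
    mutated = []
    for base in sequence:
        if base in table and rng.random() < rate:
            mutated.append(table[base])
        else:
            mutated.append(base)
    return "".join(mutated)


rng = random.Random(1)  # fixed seed so the pool is reproducible
pool = [mutate("ATGCATGCATGC", "dPTP", rate=0.2, rng=rng) for _ in range(5)]
print(pool)
```

Because each analogue alters only particular bases, mixtures mutated with different analogues carry different, predictable substitution patterns relative to the same parent sequence.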
- a mutant Taq polymerase has been found to produce significant levels of random mutation during PCR amplification (U.S. Patent No 6,329,178; Suzuki et al, 1997 Journal of Biological Chemistry 272:11228-11235).
- This mutant polymerase can also incorporate nucleotide analogues as efficiently as, or more efficiently than, native Taq polymerase.
- rounds of mutagenesis with the low-fidelity polymerase e.g., mutant Taq or Pfo
- mutagenic nucleotide analogues are used to effect modification in genomic sub-fragments and other small DNAs.
- superior performance is known for a family B-type DNA polymerase, VentR exo⁻, which is able to fully synthesise a 300-bp DNA product when all natural dNTPs are completely replaced by their biotin-labelled dNTP analogues (Tasara et al., 2003 Nucleic Acids Research 31:2636-2646).
- the length of DNA that can be mutated exhaustively is only limited by the PCR procedure, which can routinely amplify 10-20 kb fragments, aided by E. coli exonuclease III (Fromenty et al., 2000 Nucleic Acids Research 28:e50) and other protein factors (Motz et al., 2002 supra).
- (Low fidelity) PCR amplification can be carried out by methods which include, but are not restricted to, degenerate oligonucleotide primer PCR and shotgun mutagenesis.
- Bacterial strains which are deficient in enzymes of excision repair pathways that catalyse different steps in DNA sanitation, are suitably employed and these are well known to practitioners versed in the art. Examples include E.
- DNA and nucleotide analogues are co-transfected into repair-deficient bacteria, which results in increased levels of mutation, as mispaired bases are not thoroughly removed (Inoue et al., 1998 Journal of Biological Chemistry 273:11069-11074; Fujikawa et al., 1998 Nucleic Acids Research 26:4582-4587).
- the co-transfection of nucleotide analogues and DNAs into repair-deficient host strains can also be used to mutate random shotgun libraries at low mutation frequencies.
- Larger DNA elements can be mutated efficiently using mutagenic nucleotide analogues and repair-deficient bacteria.
- mutagenic nucleotide analogues and larger DNA elements such as BACs can be co-transfected into repair-deficient host strains to generate mutant BACs.
- the in vivo functionality of the modified BACs may be recovered efficiently in E. coli by homologous recombination (Nefedov et al., 2000 Nucleic Acids Research [Methods on Line] 28:e79).
- RCA rolling circle amplification
- RCA polymerases, including φ29 DNA polymerase and other polymerases, permit the synthesis of large circular double-stranded DNA molecules such as large plasmids and BACs (Dean et al., 2001 Genome Research 11:1095-1099; Amersham Biosciences, 2002 TempliPhi, Amersham Technical note; Zhang et al., 2001 Gene 274:209-216).
- the ability to replicate large DNAs in vitro permits mutation to higher levels, without the functional limits imposed by replication in bacterial hosts.
- this technique is used in concert with mutagenic nucleotide analogues as herein described and other deoxynucleotide triphosphate analogues to incorporate the mutagenic analogues directly into DNA templates.
- Clones harbouring mutant DNAs can then be recovered in a suitable host (e.g., E. coli) by homologous recombination, or by other in vitro recombinant techniques.
- An error-prone repair DNA polymerase with lowered fidelity for incorporation of correct complementary nucleotides during DNA synthesis is suitably employed in such nucleic acid amplification-directed mutagenesis protocols.
- E. coli DNA polymerase V has been found to produce significant levels of random mutation during DNA repair (Goodman, 2002 Annual Review of Biochemistry 71:17-50; Silvian et al., 2001 Nature Structural Biology 8:984-989).
- This variant polymerase can also incorporate nucleotide analogues as efficiently as replicative DNA polymerase.
- rounds of mutagenesis with the low-fidelity polymerase e.g., DNA polymerase V
- processive polymerase such as φ29 DNA polymerase and mutagenic nucleotide analogues
- the length of DNA that can be mutated exhaustively is only limited by the rolling circle procedure, which can routinely amplify 80-120 kb fragments (Liu et al., 1996; Dean et al., 2001 supra).
- Larger DNA elements can also be mutated advantageously using RNA polymerase amplifications.
- RNA polymerases and RNA reverse transcriptases can be used to amplify DNA fragments (Iwata et al., 2000 Bioorganic and Medical Chemistry 8:2185-2194; Bebenek et al., 1999 Mutation Research 429:149-158) with incorporation of ribo-nucleotide or deoxynucleotide analogues. Ribo-nucleotide analogues are mutagenic and some are incorporated into both RNA (U.S. Patent No 6,132,776; U.S. Patent No 5,512,431; Moriyama et al., 1998 Nucleic Acids Research 26:2105-2111; Moriyama et al., 2001 Nucleic Acids Research Supplement :255-256) and DNA (Müller et al., 1978 Journal of Molecular Biology 124:343-358), and deoxynucleotide analogues are incorporated by reverse transcriptase into DNA (Lutz et al., 1998 Bioorganic and Medical Chemical Letters 8:499-504; Bebenek et al., 1999 Mutation Research 429:149-158).
- RNA products incorporating ribonucleotide analogues can be copied from cloned DNAs residing in suitable plasmid vectors possessing RNA polymerase promoters.
- the RNA products can be used to create mutated cDNA, based on G→A hypermutation resulting from retroviral reverse transcription in the presence of highly biased dNTP concentrations (Martinez et al., 1995 Nucleic Acids Research 23:2573-2578), which are then subsequently subcloned and sequenced individually.
- Chemical mutagens can also be used advantageously to mutate larger DNA elements.
- the damaged DNA can be recovered by in vivo recovery of the target in plasmid or BAC vectors (Ling and Robinson, 1997 Analytical Biochemistry 254:157-178; Leppard, 1999 in DNA Viruses: A Practical Approach, vol 214, IRL Press) with low-level chemical modification.
- the present invention also contemplates whole genome mutagenesis.
- Several routes to random mutation of whole genomes are known, and these generally fall into two major categories: (i) induced-mutagenesis in biological systems or whole cell lines and (ii) (low fidelity) PCR amplification and replication of DNA elements using conditions to promote mis-incorporation of nucleotides, or analogues of nucleotides.
- Induced mutagenesis can be carried out by methods which include, but are not limited to, whole cell mutation, large cloned element mutagenesis, degenerate oligonucleotide primer PCR and shotgun mutagenesis.
- Whole cell mutation involves the induction of mutation in stable cell lines from an organism, or in hybrid cells lines that carry an individual chromosome from the organism under study within the cell of another organism.
- the advantage of this approach is the potential to isolate individual mutant cell lines that may be used as a recurrent source of a particular mutated DNA sequence, while retaining the larger chromosomal context of that sequence.
- Efficient in vivo incorporation of nucleotide analogues has been described for mammalian cell lines exposed to 5-bromo-2'-deoxyuridine (Bick and Davidson, 1974 Proc. Natl. Acad. Sci.
- brominated base analogues such as 8-bromo-2'-deoxypurines and 5-bromo-2'-deoxypyrimidines (Stewart et al., 1968 Experimental Cell Research 49:293-299).
- non-bromine base analogues that have been incorporated in this manner include 2-aminopurine (Glickman, 1985 Basic Life Sciences 31:353-79), 5-propynyloxy-2'-deoxyuridine, and 5-ethynyl-2'-deoxyuridine (Balzarini et al., 1984 Biochemistry Journal 217:245-252).
- Mutagenic nucleotide analogues might conveniently be introduced into large BAC or cosmid clones using nick translation, which uses Escherichia coli DNA polymerase I for the sequential addition of nucleotide residues to the 3'-hydroxyl terminus of a nick [created by pancreatic Deoxyribonuclease (DNAse) I], simultaneous with the elimination of nucleotides from the adjacent 5'-phosphoryl terminus of the nicked polynucleotide strand (Langer et al., 1981 Proc. Natl. Acad. Sci. USA 78:6633-6637; Holtke et al., 1990 Mol. Gen.
- DNA polymerase I efficiently fills in the strand breaks as rapidly as they are formed by DNAse I nuclease nicking, incorporating the desired nucleotides into the original strands and modifying both of the parent strands (Meinkoth and Wahl, 1987 Methods in Enzymology 152:91-94; Rigby et al., 1977 Journal of Molecular Biology 113:237-251).
- a minimum overlapping set of these large elements that represent the genome is then further subcloned into plasmids (1-3 kb inserts), forming and stabilising the mutations.
- the plasmid-contained mutated genomic DNA elements are then sequenced. Mapping and fingerprinting techniques, such as BAC insert end-sequencing, restriction fingerprinting, STR fingerprinting, hybridisations with cDNA and other cloned and sequenced DNA elements, as well as cross-hybridisation between BAC elements, are used to identify the genomic elements and to create contigs of the overlapping large cloned elements.
- Mutation of the genome can be performed segmentally on the large element clones preferably using methods for mutagenesis of large DNA elements, as for example described supra.
- Random amplification by degenerate oligonucleotide primer (DOP) PCR can be used to recover essentially random DNA fragments of 0.5 kb to 2 kb from limiting amounts of genomic DNA and from individual cytometric flow-sorted chromosomes (Zhou et al., 2000 Biotechniques 2:766-767; Hirose et al., 2001 Journal of Molecular Diagnostics 3:62-67). Nucleotide analogues are incorporated efficiently into fragments of these sizes by PCR, to as high as approximately 20% mutation.
- DOP-PCR can be used to amplify from whole genomes, individual chromosomes, or to amplify from large DNA fragments such as cloned BACs to limit the sequence complexity. Such random amplified mutant fragments can then be sub-cloned to form a representative mutant library.
- Shotgun cloning of entire genomes has also been used in sequencing the human genome (Venter et al., 2001 Science 291:1304-1351).
- the method involves the subcloning and DNA sequencing of a selection of randomly broken, short, overlapping DNA elements that collectively represent the original genome and the reconstruction of the original sequence by the computer-aided alignment of the resulting multiple overlapping sequence reads.
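The reconstruction of a sequence from overlapping reads, as described above, can be illustrated with a toy greedy assembler. This is only a sketch: production assemblers use far more elaborate alignment and error handling, the reads here are assumed to be error-free and all on the same strand, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))
```

Reads that share no overlap of at least `min_len` bases are returned as separate contigs rather than being forced together.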
- This principle could be applied to the cloning of chemically-modified DNA, in which nucleotide-damage internal to the random fragments will result in recovery of a mutated shotgun clone library.
- Chemical modification of DNA can be achieved by several different methods.
- random chemical modification of nucleotide bases can be used for shotgun-mutagenesis.
- chemical modification is combined with processes for efficient fragment end-repair and sub-cloning of the damaged DNAs.
- End repair enzymes such as E. coli endonuclease IV (Levin et al., 1988 Journal of Biological Chemistry 263:8066-8071; Demple and Harrison, 1994 Annual Review Biochemistry 63:915-948) and endonuclease III (Masson and Ramotar, 1997 Molecular Microbiology 24:711-721) are used to remove 3'-phosphoglycolates and different 3'-phosphates that may arise at the termini of chemically broken DNA fragments, and additionally conventional DNA polymerases (such as Klenow enzyme or T4 DNA polymerase) and polynucleotide kinases (e.g., T4 polynucleotide kinase) are used to 'fill-out' single strand fragment termini and to phosphorylate 5'-termini, respectively.
- Chemical modification of DNA can also be achieved by conventional shotgun cloning followed by subsequent mutation of the random genomic sub-fragments.
- the subsequent mutation may be carried out by library mutagenesis, or individual sub-clone mutagenesis, which has the advantage that subclones of genomic DNA that are created may be first created and cloned efficiently without chemical damage to the termini requiring particular repair steps.
- Library-mutagenesis is suitably achieved either by the above methods for small element mutagenesis in which the entire random representative library is subjected to a mutagenic procedure and subsequently, random mutant clones are chosen from the resultant library, or collections of clones from the random library, e.g., 96 clones, are collectively mutated by nucleotide analogue PCR and the resultant amplicons are re-cloned to make a sub-set mutant library that can be conveniently related back to the original individual unmodified 96 clones.
- the oligonucleotide primers for PCR are preferably complementary to vector sequences flanking the ligated elements and possess (rare) restriction sites that are either all C:G or A:T. Nucleotide analogues that preferentially target either A:T or C:G base pairs for sequence mutation are then employed, which leave one of the two types of restriction sites essentially unaltered and thus available for convenient regeneration of restriction termini for cloning of the amplicons into the new plasmid vectors. In this manner, a fully representative genome library could be efficiently mutated before passage through E. coli cells.
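The primer-site logic above can be made concrete with a short check of which restriction sites an analogue would be expected to leave intact. This is a sketch under stated assumptions: the helper names `site_class` and `unaffected_sites` are hypothetical, and the example recognition sequences (NotI GCGGCCGC, PacI TTAATTAA, EcoRI GAATTC) are chosen here for illustration and are not named in the source.

```python
def site_class(site):
    """Classify a recognition sequence as all-'GC', all-'AT', or 'mixed'."""
    bases = set(site.upper())
    if bases <= {"G", "C"}:
        return "GC"
    if bases <= {"A", "T"}:
        return "AT"
    return "mixed"


def unaffected_sites(sites, analogue_targets):
    """Sites containing no base pair of the type the analogue targets.

    analogue_targets is 'AT' or 'GC' (the base-pair type the analogue mutates).
    """
    survivor_class = "GC" if analogue_targets == "AT" else "AT"
    return [s for s in sites if site_class(s) == survivor_class]


# Illustrative recognition sequences (an assumption, not from the source).
sites = ["GCGGCCGC", "TTAATTAA", "GAATTC"]
print(unaffected_sites(sites, "AT"))  # analogue that mutates A:T pairs
```

A site of mixed composition is excluded in both cases, since an analogue of either type could alter it.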
- Desirable hosts and/or vectors for cloning parent or mutagenised sequences are those which have been engineered to ameliorate difficulties in cloning otherwise difficult-to-clone nucleic acid molecules.
- bacterial strains particularly strains of E. coli, and engineered plasmid vectors are known to practitioners in the art, which have been selected or engineered to overcome such difficulties.
- Illustrative strains for this purpose include, but are not restricted to: E. coli strains engineered to limit recombination of DNA, such as JM110 cells that accept repetitive DNA, as for example disclosed by Troester et al. (2000 Gene 258:95-108), E. coli strains engineered to be methylation-tolerant (mcrA⁻ mcrB⁻) that limit the restriction of unmethylated or 'incorrectly' methylated DNA and thus accept mammalian DNA-containing clones, as for example disclosed by Doherty et al. (1991 Gene 98:77-82) and Williamson et al. (1993 Gene 124:37-44; Stratagene Corp., SURE cells); and E.
- Suitable plasmids include, but are not limited to: plasmid vectors that have been engineered to prevent read-through transcription (e.g., the CloneSmart™ vector system from Lucigen Corporation, Middleton WI 53562, USA, which is a gap-free cloning system available for sequencing recalcitrant or unclonable DNA), low copy plasmids that replicate in E. coli hosts to 1-10 copies per cell in which repeat DNA elements may be maintained (e.g., pBRm and its derivatives as for example described by Mitchelson and Moss, 1987 Nucleic Acids Research 15:9577-9596; and pEV-vrf3 as for example described by Perng et al., 1994 Journal of Virological Methods 46:111-116).
- SAQOM can be applied to the sequencing of any 'problematic' polymer including, for example, polypeptides and carbohydrates.
- Variant or mutant polypeptides can be produced using any suitable technique.
- mutant polypeptides may be produced from mutant polynucleotides prepared by rational or random mutagenesis methods as, for example, described supra.
- Sequencing of a polypeptide may be performed by site-directed or random cleavage of the polypeptide using, for example endopeptidases or CNBr, to produce a set of polypeptide fragments and subsequent sequencing of the polypeptide fragments by, for example, Edman sequencing or mass spectrometry, as is known in the art.
- the polypeptide probes or polypeptide fragments could be sequenced by use of antibody probes as for example described by Fodor et al in U.S. Patent Serial No. 5,871,928. Briefly, such antibody probes specifically recognise particular subsequences (e.g., at least three contiguous amino acids) found on a polypeptide. Optimally, these antibodies would not recognise any sequences other than the specific desired subsequence and the binding affinity should be insensitive to flanking or remote sequences found on a target molecule.
- mutant polymers can be determined by any suitable technique that is capable of interrogating corresponding positions of a multiplicity of mutant polymers, collectively, for the identity and quantity of different species of subunit.
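Interrogating corresponding positions of many mutant polymers collectively, for both the identity and the quantity of each subunit species, amounts to building a per-position tally. The sketch below assumes the mutant sequences are already aligned and of equal length; the function names are hypothetical, and calling the most abundant species at each position is an illustrative simplification rather than the inference procedure of the method itself.

```python
from collections import Counter


def column_counts(aligned_sequences):
    """Return one Counter per position, tallying each subunit species."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(column) for column in zip(*aligned_sequences)]


def majority_call(aligned_sequences):
    """Most abundant species at each position (illustrative simplification)."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(aligned_sequences))


mutants = ["ACGT", "ACGT", "GCGT", "ACGC"]
for position, counts in enumerate(column_counts(mutants), start=1):
    print(position, dict(counts))
print(majority_call(mutants))
```

The per-position counts carry more information than the consensus alone: the minority species observed at a position reflects which substitutions the mutagenesis introduced there.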
- mutant polynucleotides may be sequenced using the chain termination method, which involves combining the target polynucleotide with a sequencing primer that hybridises with the target polynucleotide, and extending the sequencing primer in the presence of normal nucleotide precursors (dATP, dCTP, dGTP, and dTTP).
- a chain-terminating nucleotide such as a dideoxynucleotide triphosphate, of one particular base type (A, C, G, T) is added to the reaction, to effect a termination of DNA chains at random positions along the sequence.
- the nested series of DNA fragments produced in this reaction is then separated according to size (i.e., fragment length) typically by electrophoresis in a separation medium, to produce a series of bands in the profile of that medium.
- a set of four reactions (with chain termination occurring via ddA, ddC, ddG, ddT incorporation) is required for explicit determination of the positions of all four bases in the sequence.
- This process results in fragments of DNA of varying sizes that end with a different base (A, T, C, or G).
- the determination of DNA sequence in these methods depends on separating the DNA fragments produced by order of size and either by what base they contain (when each lane has only one reaction product) or by what tag is detected (e.g., a fluorescent or chromophoric tag) if all four reaction products are in one lane as in commercially available automated DNA sequencers. If the shortest fragment ends in A, then the first base in the sequence is A. If the next longest fragment ends in T, then the next base in the DNA sequence is T and so on. This is the basic algorithm for "base calling", i.e., determining the sequence of purine and pyrimidine bases in a strand of DNA.
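The base-calling rule just described (order the fragments by length and read off the terminal base of each) can be sketched as follows. This is an idealised illustration: it assumes exactly one fragment is detected for every length, with no noise or missing bands, and the function names are hypothetical.

```python
def chain_termination_fragments(sequence):
    """Map each terminator base to the lengths of the fragments ending in it.

    Idealised: every position yields exactly one terminated fragment.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes


def call_bases(lanes):
    """Sort all fragments by length and read the terminal base of each in turn."""
    fragments = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in fragments)


lanes = chain_termination_fragments("GATTACA")
print(lanes)
print(call_bases(lanes))
```

The dictionary keys play the role of the four lanes (or the four tags in a single lane); the sort by length corresponds to the size separation.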
- the sequencing procedure employs four different fluorescent tags, one for each sequencing reaction, as for example described in U.S. Pat. No. 5,171,534.
- fluorescent tags include, but are not restricted to, fluorescein-5-isothiocyanate (FITC), which has an emission or fluorescence peak at about 525 nm, Texas Red, which has a fluorescence peak at about 620 nm, Tetramethyl rhodamine isothiocyanate (TRITC), which has a fluorescence peak at about 580 nm, and 4-fluoro-7-nitro-benzofurazan (NBD-fluoride), which has a fluorescence peak at about 540 nm.
- chromophoric tags can be used to substitute for the fluorescent tags, wherein the chromophores have well resolved absorption maxima.
- the tags bind on the residual fragments in accordance with the exposed end base, if using dye terminator chemistry, or are attached to primers that are used to initiate the sequencing reaction, if using dye primer chemistry.
- with fluorescent tags, the sequence is read by causing the fluorescent markers to fluoresce.
- the four fluorescent tags generally are selected to have a strong fluorescence peak that is separated from the strong fluorescence peak of the remaining tags.
- An optical instrument detects the emitted fluorescence signals.
- the fragments developed in the A, G, C and T sequencing reactions are then recombined and introduced together onto a separation matrix.
- a system of optical filters is used to individually detect the fluorescent tags as they pass the detector.
- each sample is first divided into four aliquots which are combined with four sequencing reaction mixtures.
- Each sequencing reaction mixture contains a polymerase enzyme, a primer for hybridising with the target nucleic acid, nucleotide precursors and a different dideoxynucleotide. This results in the formation of an A-mixture, a G-mixture, a T-mixture and a C-mixture for each sample containing product oligonucleotide fragments of varying lengths.
- the product oligonucleotide fragments are labelled with fluorescent tags, and these tags will generally be the same for all four sequencing reactions for a sample.
- the fluorescent tags used for each sample are distinguishable one from the other on the basis of their excitation or emission spectra.
- the A-mixtures for each sample are combined to form a combined A mixture
- the G-mixtures are combined to form a combined G-mixture, and so on for all four mixtures.
- the combined mixtures are loaded onto a separation matrix at separate loading sites and an electric field is applied to cause the product oligonucleotide fragments to migrate within the separation matrix.
- the separated product oligonucleotide fragments having the different fluorescent tags are detected as they migrate within the separation matrix.
- one analogue is substituted for the corresponding nucleotide during PCR, generating amplicons that contain nucleotide analogues at each occurrence of the selected base throughout the target DNA except for the primer sequences.
- Subsequent chemical cleavage at each site of modification produces fragments of different lengths and/or molecular weights that may be analysed by mass spectrometry, which employs, for example, MALDI-TOF techniques and secondary post-source decay, to determine the mass-identity of nucleotides within the fragments of the polynucleotide (Abdi et al., 2002 Genome Research 12: 1135-1141). These data are then analysed to reconstruct the target polynucleotide sequence.
- Koster in U.S. 6,238,871, describes a method that assembles sequence information by analysing nested fragments obtained by base-specific chain termination according to their different molecular masses using mass spectrometry, as for example, MALDI or electrospray (ES) mass spectrometry.
- the molecular weights of the four specifically terminated fragment families can be determined simultaneously by MS, either by mixing the products of all four reactions run in at least two separate reaction vessels (i.e., all run separately, or two together, or three together) or by running one reaction having all four chain-terminating nucleotides (e.g., a reaction mixture comprising dTTP, ddTTP, dATP, ddATP, dCTP, ddCTP, dGTP, ddGTP) in one reaction vessel.
- the molecular weight values can, in effect, be interpolated. Comparison of the mass difference measured between fragments with the known masses of each chain-terminating nucleotide allows the assignment of sequence to be carried out.
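The mass-difference comparison can be illustrated with a small sketch: sort the fragment masses, take the difference between each consecutive pair, and assign the base whose reference mass is closest. The reference masses below are approximate average residue masses assumed for illustration only; they are not taken from the text, and a real analysis would use calibrated values for the specific terminators and instrument. The function name and the tolerance are likewise hypothetical.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# These values are assumed for illustration; the text does not list them.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def assign_from_mass_ladder(masses, tolerance=1.0):
    """Assign one base per step between consecutive fragment masses.

    Steps that match no reference mass within `tolerance` are called "N".
    """
    ordered = sorted(masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        base, ref = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(base if abs(ref - delta) <= tolerance else "N")
    return "".join(calls)


# Hypothetical ladder: a 5000 Da starting fragment extended by G, A, T, C
ladder = [5000.0]
for base in "GATC":
    ladder.append(ladder[-1] + RESIDUE_MASS[base])
print(assign_from_mass_ladder(ladder))
```

The closest pair of reference masses (A and T) differs by about 9 Da, so the tolerance must stay well below that for the assignment to be unambiguous.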
- Miniaturised chip CE systems with nano-channels ⁇ 1 µm allow analysis to be undertaken on limiting numbers of molecules held at pico- and nano-molar concentrations, with amplification and detection of signals from single template molecules (Krishnan et al., 2001 Current Opinion in Biotechnology 12:92-98; Koutny et al., 2000 Analytical Chemistry 72:3388-3391; Paegel et al., 2003 Current Opinion in Biotechnology 14:42-50; Chen et al., 2002 Analytical Chemistry 74:1772-1778).
- Single molecule sequencing with exonuclease comprises the serial digestion of a single DNA strand, which is attached to a solid surface or microchannel surface (Marziali and Akeson, 2001 Annual Review of Biomedical Engineering 3:195-223; Jett et al., 1989 Journal of Biological Structure & Dynamics 7:301-309). The fluorescent-tagged nucleotide subunits are sequentially released, then collected and detected.
- the methods demand highly efficient enzymatic incorporation of labelled analogues at each subunit nucleotide position such as with use of nick translation (Jett et al., 1995 US Patent 5,405,747; Gillam and Tener 1986 Analytical Biochemistry 157:199-207; Gebeyehu et al., 1987 Nucleic Acids Research 15:4513-34), or highly efficient in vivo incorporation as has been described for cell lines exposed to 5-bromo-2'-deoxypyrimidines (Bick and Davidson, 1974 Proc. Natl. Acad. Sci. USA 71:2082-2086). Error synthesis due to polymerase stutter, incorporation of an incorrect nucleotide, or a physical barrier to efficient synthesis could limit the technique, as could the efficiency of single nucleotide release reactions.
- nano-devices sort DNA molecules by use of openings or pores or forests of pillars that are less than the "radius of gyration" of the DNA fragments, just large enough for DNA molecules to run through in single file.
- SMS Single-molecule sequencing
- “Sequencing by Synthesis” is a method common to primer extension methods such as “single nucleotide primer extension” (SnuPE) and pyrosequencing (Ronaghi et al., 1999 Analytical Biochemistry 267:65-71; Nordstrom et al., 2000 Analytical Biochemistry 282:186-193).
- the strand extensions are continued for 30 nucleotides or more, and solid-phase parallel micro-array analysis is employed in which 10⁸ features (molecules) are sequenced simultaneously, coupled with a unitary base addition chemistry that allows single nucleotide additions to growing chains to be monitored on each feature.
- Oligonucleotides are anchored to glass slides at densities of up to 10⁸ molecules per cm² and used to capture complementary genomic DNA fragments.
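To give a feel for what a density of 10⁸ molecules per cm² means for detection, the following sketch converts a surface density into a mean centre-to-centre spacing. It assumes, purely for illustration, that the features sit on a regular square grid; anchored molecules on a real slide are randomly placed, so this is only an order-of-magnitude estimate.

```python
import math


def mean_spacing_um(density_per_cm2: float) -> float:
    """Mean centre-to-centre spacing (in µm) for features on a square grid."""
    features_per_cm = math.sqrt(density_per_cm2)  # features along a 1 cm edge
    return 1e4 / features_per_cm                  # 1 cm = 10^4 µm


print(mean_spacing_um(1e8))  # 1.0
```

At the quoted density the features are therefore roughly 1 µm apart, which is close to the resolution limit of conventional optical detection.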
- Primed molecules are attached to the array such that they can be efficiently extended by polymerases with the addition of 4 differentially labelled terminating nucleotides (Braslavsky et al., 2003 Proc. Natl. Acad. Sci. USA 100:3960-3964).
- the extended fragments are then all simultaneously detected following the addition of one nucleotide; the terminating moiety and fluorescent tag are then removed chemically from each attached nucleotide analogue, ready for the addition of the next nucleotide to each chain (Li et al., 2003 Proc. Natl. Acad. Sci.
- Polony technology involves a polymerase trapping technique which enables efficient nucleotide extension by DNA polymerase in a polyacrylamide matrix, and eliminates loss of enzyme during sequencing cycles. Novel types of reversibly dye-labelled nucleotide analogues that can be efficiently incorporated by DNA polymerase are used for each extension cycle; the dyes can be removed by thiol reduction or light exposure following nucleotide addition, permitting sequencing of multiple 'polonies' in parallel. A high density of polonies can be achieved with minimal overlap between adjacent polonies by limiting the
- nucleotide species at an individual interrogation position are each detectably labelled.
- each species of nucleotide is associated with the same label and is resolved in space or in time from other species of nucleotides as used, for example, in conventional chain terminating sequencing, in which DNA sequencing fragments are separated according to size and what base they contain (i.e., typically separating fragments in four lanes, each lane resolving fragments terminating in a single species of nucleotide).
- individual species of nucleotide are each distinctly labelled and are resolved according to their labels (i.e., typically separating four fragment subsets in a single lane, each fragment subset terminating in a distinct species of nucleotide).
- the parameters are suitably label-associated parameters, which include, but are not restricted to, parameters relating to fluorescence emission, luminescence, phosphorescence, infrared radiation, electromagnetic scattering including light and x-ray scattering, light transmittance, light absorbance, electrical impedance and molecular mass.
- the parameter is signal intensity (e.g., fluorescence intensity, light intensity, radiation intensity, etc).
- the measured parameters are typically compared to each other to determine the species of nucleotide that is in higher abundance than the other species of nucleotide at individual interrogation positions.
- the detectable signals associated with different species of nucleotide at individual interrogation positions are processed to produce a data set containing a plurality of peaks reflecting the positions and species of the nucleotides in the mutant polynucleotides which are the subject of the analysis.
- Examples of automatic sequencing apparatus and methods that can be used for this purpose include, but are not restricted to, U.S. Pat. No. 4,811,218 to Hunkapiller et al. and U.S. Pat. No. 5,556,790 to Pettit.
- a group of peaks will result for each interrogation position at which a plurality of different species of nucleotide reside.
- the peaks of each group are then processed collectively to determine the species of nucleotide, which is in higher abundance relative to the other species of nucleotide, at a respective interrogation position.
- This processing may suitably involve extracting a vector from one or more peak features for each peak, i.e., a vector may quantify such peak characteristics as peak height and area under the peak, and comparing the vector(s) derived for each peak with the vector(s) of other peaks to deduce the species of nucleotide that is in higher abundance at an interrogation position than other nucleotide species.
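The vector comparison described above can be sketched for a single interrogation position as follows. Each species has a feature vector of (peak height, peak area); each feature is normalised across the species so that heights and areas are comparable, and the species with the highest weighted score is taken as the most abundant. The equal weights, the normalisation and the function name are assumptions of this sketch, not details given in the text.

```python
def dominant_species(peaks, weights=(0.5, 0.5)):
    """Pick the species whose peak feature vector scores highest.

    peaks   : dict mapping species -> (peak_height, peak_area)
    weights : relative weight of each feature (an assumption of this sketch)
    """
    n = len(weights)
    # Normalise each feature across species so height and area are comparable
    totals = [sum(vec[i] for vec in peaks.values()) or 1.0 for i in range(n)]
    scores = {
        species: sum(w * vec[i] / totals[i] for i, w in enumerate(weights))
        for species, vec in peaks.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical group of peaks at one interrogation position
position = {"A": (820.0, 5100.0), "G": (190.0, 1200.0), "C": (15.0, 90.0), "T": (10.0, 60.0)}
best, scores = dominant_species(position)
print(best)  # A
```

Returning the full score table alongside the call keeps the minority signals available, which is useful because in a mixture of mutants the minor peaks are expected rather than being noise.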
- the detectable signals may be corrected for certain distortions such as peak clipping and contextual influences in order to more accurately identify and "base call" a distinct unique detectable signal from a composite signal.
- This operation is equivalent to noise removal as performed in speech recognition software programs. Peaks may then be detected on the corrected, if opted for, detectable signals.
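Peak detection on a (corrected) signal can be as simple as finding local maxima above a noise floor. The sketch below is a deliberately minimal stand-in for the peak detection mentioned above; the function name, the threshold parameter and the sample trace are all hypothetical, and production base callers use considerably more elaborate models.

```python
def find_peaks(trace, min_height=0.0):
    """Return indices of strict local maxima at or above `min_height`.

    A sample counts as a peak only if it is strictly greater than both
    neighbours, so flat-topped (clipped) peaks are not reported.
    """
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] >= min_height and trace[i - 1] < trace[i] > trace[i + 1]
    ]


# Hypothetical single-channel trace with two real peaks and one noise bump
trace = [0, 2, 9, 3, 1, 4, 12, 5, 0, 1, 0]
print(find_peaks(trace, min_height=3))  # [2, 6]
```

Because clipped peaks have a flat top, they are missed by this strict test, which illustrates why correcting for peak clipping before detection matters.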
- Figure 6 compares the sequence analysis of a "difficult-to-sequence" polynucleotide motif from human BAC RP11-167L9 with the sequence analysis of a mixture containing a plurality of mutants of that polynucleotide.
- a mixture containing multiple identical copies of the difficult-to-sequence motif was sequenced using standard ABI BigDye™ version 3.0 cycle sequencing chemistry and an ABI 3730xl capillary sequencer, resulting in a failed sequence determination of an internal polyA tract, which weakly reads as 21 deoxyadenosine subunits (A) and 3 indeterminate deoxynucleosides (N) within the tract.
- the present invention discloses methods for sequence analysis, which may be conveniently implemented by a processing system such as a computer system. These methods are predicated in part on the provision or detection of signals representing distinct species of subunit resolved as a function of subunit position in a plurality of variant sequences that vary from a target sequence by the substitution of at least one subunit with a subunit of a different species.
- the signals are generated by resolving sequencing products or fragments of varying length according to their size and tags which indicate the positions of the selected species of subunit within a common region of interest in the variant sequences.
- a detection means is suitably provided to detect the tags.
- an electromagnetic wave source may be used to induce the emission of electromagnetic energy (e.g., fluorescence) by the tags, and the emitted energy is detected by a detector to produce an analog signal.
- the analog signal is sampled and the sampled values transmitted to a data file, which typically represents a chromatogram that includes a plurality of peaks reflecting the positions and species of the subunits in the variant sequences that are the subject of the analysis.
- data representing a chromatogram may be stored in a data store, which preferably includes a database, for use by a processing system in operable communication with the data store.
- the data store may have stored therein the above described plurality of peaks including, for individual positions of the common region of interest, a group of peaks representing a plurality of different species of subunit.
- the processing system is adapted to process the data in the data store to generate a comparison of peak features.
- This processing typically involves extracting a vector from one or more peak features for each peak, wherein the vector may quantify such peak characteristics as peak height and area under the peak, and comparing the vector(s) derived for each peak with the vector(s) of other peaks to deduce the species of subunit that is in higher abundance at a respective position than other species of subunit.
- the processing may involve, prior to feature extraction, correction of the signals for certain distortions such as peak clipping and contextual influences.
- Any general or special purpose processing system includes, but is not limited to, a processor in operable (e.g., electrical) communication with both a memory and at least one input/output device, such as a keyboard and a display.
- Such a system may include, but is not limited to, personal computers, workstations or mainframes.
- the processor may be a general purpose processor or microprocessor or a specialised processor executing programs located in RAM memory.
- the programs may be placed in memory (e.g., RAM) from a storage device, such as a disk or pre-programmed ROM memory.
- the RAM memory in one embodiment is used both for data storage and program execution.
- Figure 7 is a schematic representation of a processing system (100) having in operable communication (101) with one another via, for example, an internal bus or external network, a processor (102), a memory (103), an input/output device (104) such as a keyboard and display and a data store (105), which typically includes a database (106).
- the data store may be in the form of an external storage device such as but not limited to a diskette, CD ROM, or magnetic tape.
- the processing system 100 may be formed from any suitable processing system, which is capable of operating applications software to enable the processing of the data, such as a suitably programmed personal computer.
- the processing system includes an interface (107), such as a network interface card, allowing the processing system to be connected to remote processing systems, such as via the Internet as will be described in more detail below.
- the processing system executes a sequence analysis program that includes computer executable code which when implemented on the processing system causes the system to receive data representing a chromatogram, as described above, which includes a plurality of peaks reflecting the positions and species of the subunits in the variant sequences that are the subject of the analysis.
- the chromatogram data may be obtained from a number of sources, such as manual input via the I/O device 104 or received from an external processing system via the interface 107; or by accessing subunit sequences stored in the database 106.
- the system is also caused to process the chromatogram data to generate a comparison of peak features to deduce the species of subunit that is in higher abundance at a respective position than other species of subunit.
- the system is caused (1) to extract a vector from one or more peak features for each peak, wherein the vector suitably quantifies peak features such as peak height and area under the peak, and (2) to compare the vector(s) derived for each peak with the vector(s) of other peaks to deduce the species of subunit that is in higher abundance at an individual position of the common region than other species of subunit.
- the system is further caused to effect this process for all positions in the common region to thereby deduce the species of subunit which is in highest abundance for each position of the corresponding region of the target sequence.
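Repeating the per-position deduction across the whole common region amounts to taking, at each position, the species in highest abundance. The sketch below illustrates this with aligned variant reads, where simple counts stand in for the peak intensities a sequencer would report for the mixture; that substitution, the function name and the example reads are assumptions made for illustration.

```python
from collections import Counter


def consensus(variants):
    """Most abundant species at each position of equal-length variant reads."""
    if not variants:
        return ""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*variants)
    )


# Hypothetical variants of a polyA tract, each carrying one A-to-G substitution
variants = ["AAAAAAAA", "AAGAAAAA", "AAAAGAAA", "GAAAAAAA", "AAAAAAAG"]
print(consensus(variants))  # AAAAAAAA
```

Each variant differs from the target at a different position, so the original species remains the most abundant at every position even though no single variant may be easy to read.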
- the processing means analyses base statistics at each site to determine the quality of the sequence so deduced, and to identify possible sites at which there are sequencing errors or polymorphisms.
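A simple form of per-site base statistics is the fraction of the signal carried by the most common species at each position, with low fractions flagged for inspection. The sketch below works on aligned variant reads; the 0.7 threshold, the function name and the example reads are hypothetical, and a suitable threshold would depend on the mutation frequency actually used.

```python
from collections import Counter


def site_statistics(variants, min_fraction=0.7):
    """Return (fractions, flagged) for aligned, equal-length variants.

    fractions : fraction of variants carrying the most common species,
                one value per position
    flagged   : positions whose top fraction falls below `min_fraction`
    """
    fractions = []
    for column in zip(*variants):
        _, count = Counter(column).most_common(1)[0]
        fractions.append(count / len(column))
    flagged = [i for i, f in enumerate(fractions) if f < min_fraction]
    return fractions, flagged


# Hypothetical reads in which position 1 is split 3:2 between C and T
reads = ["ACGT", "ACGT", "ATGT", "ATGT", "ACGT"]
fractions, flagged = site_statistics(reads)
print(flagged)  # [1]
```

A flagged site does not by itself distinguish a sequencing error from a polymorphism; it only marks where the evidence for a single species is weak.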
- the processing system is further adapted to generate an indication of the target sequence, which is suitably displayed by a display means that is part of the processing system.
- kits may also optionally include appropriate reagents for detection of labels, positive and negative controls, dilution buffers and the like.
- a nucleic acid-based SAQOM kit may include at least one, and preferably at least two, of the following: (i) a set of mutagenic nucleotide analogues such as dPTP, (ii) chain-elongating nucleotides, (iii) chain-terminating nucleotides, (iv) a polymerase such as Taq polymerase, (v) a polymerase such as Rolling Circle φ29 DNA polymerase or an error-prone DNA polymerase, (vi) buffer, (vii) adaptor polynucleotide primers, (viii) sequencing polynucleotide primers, (ix) adaptor restriction endonucleases and (x) a parent polynucleotide and a mixture of variant polynucleotides according to the invention, which may be used as a positive control.
- kits also generally will comprise, in suitable means, distinct containers for each individual reagent and instructions for use of
- the target sequence was an undefined DNA fragment pTEST of 1.5 kb in length cloned into pUC19.
- Amplification conditions were essentially as described by Zaccolo et al. (1996 supra). Using a concentration of 400 µM of each dATP, dCTP, dGTP, dTTP, and a mutagenic nucleotide analogue such as dPTP in a non-standard PCR reaction, mutations were incorporated up to a frequency of 1 in 5.
- Universal M13 primers FSP-21, FSP-40, RSP-26, RSP-48 were used for PCR amplification and sequencing. All amplification reactions described herein were performed with an Applied Biosystems GeneAmp™ PCR System 9700 or 9600 thermal cycler.
- the target sequence was an undefined DNA fragment pTEST™ of 1.5 kb in length cloned into pUC19. Amplification conditions were essentially as described by Zaccolo et al. (1996 supra). Using a concentration of 200 µM of each dATP, dCTP, dGTP, dTTP and universal M13 primers (FSP-21, FSP-40, RSP-26, RSP-48) the PCR amplified products could be used directly for sequencing, or could be cloned and individual clones could be used for sequencing.
- Reaction conditions: 2 ng DNA template, 1x AmpliTaq™ Gold buffer, 200 µM dNTPs, 2 mM magnesium chloride, 0.4 µM each primer, 1 unit of AmpliTaq™ Gold, in a total of 25 µL. Reactions were performed as follows: 1 cycle of 94° C for 15 min, 30 cycles of 94° C for 1 min, 50° C for 0.5 min, 72° C for 5 min, 1 cycle of 72° C for 10 min. This regimen yields PCR products that are not further mutated but comprise the same mutations incorporated in the amplification of Example 1.
- EXAMPLE 3: Standard DNA Sequencing by Cycle PCR Amplification (Big Dye version 3.0)
- the target sequence was typically an undefined DNA fragment cloned into pDRIVE™ or pUC19.
- Cycle sequencing amplification conditions were essentially as described by Applied Biosystems Incorporated (2002) for BigDye® Terminator v3.0 and v3.1 Cycle Sequencing Kits.
- dATP, dCTP, dGTP, dTTP and ABI proprietary concentrations of the dye terminators ddATP, ddCTP, ddGTP and ddTTP
- universal M13 primers (FSP-21, FSP-40, RSP-26, RSP-48)
- Viable mutated DNAs were recovered by re-amplification of 1 µL of a 1 in 1000-fold dilution of the nucleotide analogue modified PCR reaction products with nested primers and the four conventional dNTPs and standard PCR conditions.
- mutant PCR products were gel purified and then cloned into the pGEM T-EASY™ vector (Promega)
- pDRIVE™ (Promega)
- Plasmid DNA of individual clones was sequenced using standard sequencing conditions. The effect of the analogues on the sequence of DNA is illustrated in Figures 3B, 4B and 5B for individual mutant clones.
- the target sequence was typically 1 pg-10 ng of an undefined DNA fragment cloned into pDRIVE™ or pUC19.
- Isothermal amplification conditions were essentially as described by Amersham BioSciences PLC (2003) for TempliPhi™ DNA Amplification Kits and by Dean et al. (2001 Genome Research 11, 1095-1099).
- using an Amersham proprietary concentration of approximately 200 µM of each dATP, dCTP, dGTP, dTTP, and Amersham proprietary random hexamer primers, the amplified DNA products were used directly for sequencing.
- Add TempliPhi™ premix (with dNTPs, TempliPhi™ incubation buffer and φ29 DNA polymerase) to the cooled sample and mix briefly, then incubate at 30° C for 8-12 h (recommended range 4-18 h). Heat-inactivate the enzyme at 65° C for 10 min, then cool to 4° C. Dilute the amplified DNA approximately 1000-fold and use 10 ng DNA (typically 2-3 µL) as template for a Standard DNA Sequencing reaction.
- the target sequence was typically 1 pg-10 ng of an undefined DNA fragment cloned into pDRIVE™ or pUC19.
- Isothermal amplification conditions were essentially as described by Amersham BioSciences PLC (2003) for TempliPhi™ DNA Amplification Kits and GenomiPhi™ DNA Amplification Kits.
- using an Amersham proprietary concentration of approximately 200 µM of each dATP, dCTP, dGTP, dTTP, mutagenic nucleotide analogues and Amersham proprietary random hexamer primers, the amplified DNA products were used directly for sequencing.
- Amplification buffer: 200 µM dNTPs, 50 to 100 µM analogue dNTP, 2 mM magnesium chloride,
- IM analogue dNTP 10 µL of TempliPhi™ premix (with dNTPs, TempliPhi™ incubation buffer and φ29 DNA polymerase) to the cooled sample and mix briefly, then incubate at 30° C for 8-12 h (recommended range 4-18 h). Heat-inactivate the enzyme at 65° C for 10 min, then cool to 4° C. Dilute the amplified DNA approximately 5-fold and use 10 ng DNA (typically 2-3 µL) as template for a Standard DNA Sequencing reaction.
- Plasmid pTEST™ was mutated using the analogue 5-Br-dUTP with the modified alkaline TempliPhi™ reaction. 1 µL of a 1 in 1000-fold dilution of the reaction products was recovered by PCR re-amplification with nested primers, the four natural or conventional dNTPs and standard PCR conditions. PCR amplified fragments were cloned and individual clones were sequenced; the effect of the analogues on the sequence is illustrated in Figure 8 for an individual mutant pTEST™ clone, as described above. Here mutations were incorporated up to a frequency of 1 in 25. A further example is illustrated in Figure 5B for an individual mutant fragment of the previously unsequencable Human genomic BAC clone RP11-167L9 as described above.
- Amplification conditions were essentially as described by Zaccolo et al. (1996 supra). Using a concentration of 400 µM of each dATP, dCTP, dGTP, dTTP, and 400 µM of a mutagenic nucleotide analogue such as dPTP in a Nucleotide Analogue PCR Amplification, mutations were incorporated up to a frequency of 1 in 5. Universal M13 primers (FSP-21, FSP-40, RSP-26, RSP-48) were used for PCR amplification.
- Plasmids from individual clones were restricted with selected restriction endonucleases to determine the orientation of the cloned mutated DNA inserted in the plasmid. Plasmid DNA from selected individual clones with common insert orientation was combined in an approximately equimolar mixture and sequenced using Standard sequencing conditions.
- Mutated DNAs were recovered by PCR re-amplification of 1 µL of a 1 in 1000-fold dilution of the nucleotide analogue modified PCR reaction products with nested primers and the four natural dNTPs and standard PCR conditions.
- PCR recovered products of mutant TempliPhi™ amplification were gel purified and then cloned into the pGEM T-EASY™ vector (Promega) and transformed into E. coli. Plasmids from individual clones were restricted with selected restriction endonucleases to determine the orientation of the cloned mutated DNA inserted in the plasmid. Plasmid DNA from selected individual clones with common insert orientation was combined in an approximately equimolar mixture and sequenced using Standard sequencing conditions.
- Examples of mixed sequence reads are the nine mutated clones of the previously unsequencable human fragment RP11-167L9, all of the same insert orientation, illustrated in Figure 6B, and the eight mutated clones of the same fragment, all of the same insert orientation, illustrated in Figures 6C and 6D.
- a further example is the mixture of 15 different mutated pTEST™ clones of the same insert orientation that was sequenced directly using Standard DNA Sequencing by Cycle PCR Amplification (Big Dye version 3.0), illustrated in Figure 9B.
- mutations are typically incorporated in individual molecules up to a frequency of 1 in 5 (see EXAMPLE 1 above).
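A short calculation shows why a mixture of heavily mutated clones can still report the original base at a given position. The sketch below assumes, as simplifications not stated in the text, that each clone is mutated at that position independently with the same probability, and that all mutated copies carry the same substitute base (the worst case for a majority call). It then sums the binomial probabilities that fewer than half of the clones are mutated. The function name is hypothetical.

```python
from math import comb


def p_original_in_majority(n_variants: int, mutation_rate: float) -> float:
    """Probability that unmutated copies form a strict majority at one position."""
    max_mutants = (n_variants - 1) // 2  # largest mutant count still outvoted
    return sum(
        comb(n_variants, k)
        * mutation_rate**k
        * (1 - mutation_rate) ** (n_variants - k)
        for k in range(max_mutants + 1)
    )


# 15 clones, each mutated at this position with probability 1 in 5
print(round(p_original_in_majority(15, 0.2), 3))
```

Under these assumptions the original base is in the majority with probability above 0.99 for 15 clones at a mutation frequency of 1 in 5, and the probability rises as more clones are pooled.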
- the target sequence was typically an undefined DNA fragment cloned into pDRIVE™ or pUC19.
- Cycle sequencing amplification conditions were essentially as described by Applied Biosystems Incorporated (2002) for BigDye® Terminator v2.0, v3.0 and v3.1 Cycle Sequencing Kits.
- the annealing step can be eliminated.
- Tm of a primer is ⁇ 50° C
- PROTOCOL 5: The reaction comprises: 5 µL denaturing sample buffer, 1 µL 1/1000 diluted plasmid
- Such simple repeat motifs can be simultaneously genotyped by direct PCR amplification from human genomic DNA template using PCR amplification in the presence of a mutagenic nucleotide analogue and multiplex primer pairs. Single locus genotypes can also be determined using single locus primer pairs. The method is advantageous to eliminate Taq DNA polymerase "stutter products" from simple repeat motifs comprising homopolymer and dinucleotide sequences.
- Reaction conditions: 2 µg of human genomic DNA template, 1x AmpliTaq™ Gold buffer, 200 µM dNTPs, 2 mM magnesium chloride, 0.4 µM each primer (e.g., a pair of primers flanking a polymorphic microsatellite locus, sometimes referred to as simple tandem repeat (STR) primers), 1 unit of AmpliTaq™ Gold, and about 30 to 200 µM of a mutagenic nucleotide analogue such as 8-oxo-dGTP, in a Nucleotide Analogue PCR Amplification in a total of 25 µL.
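The final concentrations listed above translate into pipetting volumes through the dilution relation C1·V1 = C2·V2. The sketch below applies it to a 25 µL reaction; the stock concentrations (10 mM dNTPs, 25 mM magnesium chloride, 10 µM primer) are assumed for illustration and are not given in the text.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock (µL) giving `final_conc` in `total_volume_ul`.

    Uses C1 * V1 = C2 * V2; both concentrations must be in the same unit.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * total_volume_ul / stock_conc


TOTAL_UL = 25
# Stock concentrations below are hypothetical examples
print(stock_volume_ul(200, 10_000, TOTAL_UL))  # dNTPs: 200 µM from a 10 mM stock
print(stock_volume_ul(2, 25, TOTAL_UL))        # MgCl2: 2 mM from a 25 mM stock
print(stock_volume_ul(0.4, 10, TOTAL_UL))      # primer: 0.4 µM from a 10 µM stock
```

With these assumed stocks the volumes come to about 0.5 µL, 2 µL and 1 µL respectively, with the remainder of the 25 µL made up by buffer, template, enzyme and water.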
- Reactions were performed as follows: 1 cycle of 94° C for 15 min, 30 cycles of 94° C for 1 min, 50° C for 0.5 min, 72° C for 5 min, 1 cycle of 72° C for 10 min. This regimen yields PCR products incorporating analogue bases.
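The cycling regimen above can be written down as data and its total programmed hold time computed directly. The representation as `(cycles, [step minutes])` pairs and the function name are choices made for this sketch; ramp times between temperatures depend on the instrument and are not included.

```python
def total_hold_minutes(programme):
    """Sum of programmed hold times, in minutes.

    programme: list of (number_of_cycles, [minutes for each step in a cycle])
    """
    return sum(cycles * sum(steps) for cycles, steps in programme)


programme = [
    (1, [15]),          # 94 °C for 15 min
    (30, [1, 0.5, 5]),  # 94 °C 1 min, 50 °C 0.5 min, 72 °C 5 min
    (1, [10]),          # 72 °C for 10 min
]
print(total_hold_minutes(programme))  # 220.0
```

The regimen therefore holds for 220 minutes in total, most of it in the thirty 5-minute extension steps.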
- Original DNA template [plasmid or larger DNA fragment, e.g. BAC clone]
- Original DNA template e.g. genomic DNA template
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2003277984A AU2003277984A1 (en) | 2002-11-05 | 2003-11-05 | Nucleotide sequence analysis by quantification of mutagenesis |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US42404902P | 2002-11-05 | 2002-11-05 | |
| US60/424,049 | 2002-11-05 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2004042078A1 true WO2004042078A1 (fr) | 2004-05-21 |
Family
ID=32312745
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/AU2003/001459 Ceased WO2004042078A1 (fr) | 2002-11-05 | 2003-11-05 | Analyse de sequence nucleotidique par quantification de mutagenese |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2003277984A1 (fr) |
| WO (1) | WO2004042078A1 (fr) |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9074251B2 (en) | 2011-02-10 | 2015-07-07 | Illumina, Inc. | Linking sequence reads using paired code tags |
| WO2016057947A1 (fr) * | 2014-10-10 | 2016-04-14 | Cold Spring Harbor Laboratory | Mutation nucléotidique aléatoire pour dénombrement et assemblage de matrices nucléotidiques |
| US9683230B2 (en) | 2013-01-09 | 2017-06-20 | Illumina Cambridge Limited | Sample preparation on a solid support |
| US9977861B2 (en) | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
| US10246746B2 (en) | 2013-12-20 | 2019-04-02 | Illumina, Inc. | Preserving genomic connectivity information in fragmented genomic DNA samples |
| WO2019204585A1 (fr) * | 2018-04-19 | 2019-10-24 | Massachusetts Institute Of Technology | Détection de rupture simple brin dans un adn double brin |
| US10457936B2 (en) | 2011-02-02 | 2019-10-29 | University Of Washington Through Its Center For Commercialization | Massively parallel contiguity mapping |
| WO2020010495A1 (fr) * | 2018-07-09 | 2020-01-16 | 深圳华大智造极创科技有限公司 | Procédé de séquençage d'acides nucléiques |
| US10557133B2 (en) | 2013-03-13 | 2020-02-11 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US10577601B2 (en) | 2008-09-12 | 2020-03-03 | University Of Washington | Error detection in sequence tag directed subassemblies of short sequencing reads |
| US11873480B2 (en) | 2014-10-17 | 2024-01-16 | Illumina Cambridge Limited | Contiguity preserving transposition |
| WO2025024703A1 (fr) * | 2023-07-26 | 2025-01-30 | Bio-Rad Laboratories, Inc. | Dnaseq unicellulaire à double tagmentation |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2000056923A2 (fr) * | 1999-03-24 | 2000-09-28 | Clatterbridge Cancer Research Trust | Analyse genetique |
| WO2002079502A1 (fr) * | 2001-03-28 | 2002-10-10 | The University Of Queensland | Procede d'analyse des sequences d'acide nucleique |
-
2003
- 2003-11-05 AU AU2003277984A patent/AU2003277984A1/en not_active Abandoned
- 2003-11-05 WO PCT/AU2003/001459 patent/WO2004042078A1/fr not_active Ceased
Cited By (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12152236B2 (en) | 2008-09-12 | 2024-11-26 | University Of Washington | Sequence tag directed subassembly of short sequencing reads into long sequencing reads |
| US11505795B2 (en) | 2008-09-12 | 2022-11-22 | University Of Washington | Error detection in sequence tag directed sequencing reads |
| US10577601B2 (en) | 2008-09-12 | 2020-03-03 | University Of Washington | Error detection in sequence tag directed subassemblies of short sequencing reads |
| US11999951B2 (en) | 2011-02-02 | 2024-06-04 | University Of Washington Through Its Center For Commercialization | Massively parallel contiguity mapping |
| US10457936B2 (en) | 2011-02-02 | 2019-10-29 | University Of Washington Through Its Center For Commercialization | Massively parallel contiguity mapping |
| US11299730B2 (en) | 2011-02-02 | 2022-04-12 | University Of Washington Through Its Center For Commercialization | Massively parallel contiguity mapping |
| US9074251B2 (en) | 2011-02-10 | 2015-07-07 | Illumina, Inc. | Linking sequence reads using paired code tags |
| US11993772B2 (en) | 2011-02-10 | 2024-05-28 | Illumina, Inc. | Linking sequence reads using paired code tags |
| US10246705B2 (en) | 2011-02-10 | 2019-04-02 | Illumina, Inc. | Linking sequence reads using paired code tags |
| US9977861B2 (en) | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
| US11605446B2 (en) | 2012-07-18 | 2023-03-14 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
| US11257568B2 (en) | 2012-07-18 | 2022-02-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
| US10988760B2 (en) | 2013-01-09 | 2021-04-27 | Illumina Cambridge Limited | Sample preparation on a solid support |
| US11970695B2 (en) | 2013-01-09 | 2024-04-30 | Illumina Cambridge Limited | Sample preparation on a solid support |
| US9683230B2 (en) | 2013-01-09 | 2017-06-20 | Illumina Cambridge Limited | Sample preparation on a solid support |
| US10041066B2 (en) | 2013-01-09 | 2018-08-07 | Illumina Cambridge Limited | Sample preparation on a solid support |
| US10557133B2 (en) | 2013-03-13 | 2020-02-11 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US11319534B2 (en) | 2013-03-13 | 2022-05-03 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US10246746B2 (en) | 2013-12-20 | 2019-04-02 | Illumina, Inc. | Preserving genomic connectivity information in fragmented genomic DNA samples |
| US11149310B2 (en) | 2013-12-20 | 2021-10-19 | Illumina, Inc. | Preserving genomic connectivity information in fragmented genomic DNA samples |
| US11008606B2 (en) | 2014-10-10 | 2021-05-18 | Cold Spring Harbor Laboratory | Random nucleotide mutation for nucleotide template counting and assembly |
| WO2016057947A1 * | 2014-10-10 | 2016-04-14 | Cold Spring Harbor Laboratory | Random nucleotide mutation for nucleotide template counting and assembly |
| US11873480B2 (en) | 2014-10-17 | 2024-01-16 | Illumina Cambridge Limited | Contiguity preserving transposition |
| WO2019204585A1 * | 2018-04-19 | 2019-10-24 | Massachusetts Institute Of Technology | Single-stranded break detection in double-stranded DNA |
| US12338486B2 (en) | 2018-04-19 | 2025-06-24 | Massachusetts Institute Of Technology | Single-stranded break detection in double-stranded DNA |
| WO2020010495A1 * | 2018-07-09 | 2020-01-16 | 深圳华大智造极创科技有限公司 | Nucleic acid sequencing method |
| WO2025024703A1 * | 2023-07-26 | 2025-01-30 | Bio-Rad Laboratories, Inc. | Single-cell DNAseq with double tagmentation |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2003277984A1 (en) | 2004-06-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3626866B1 | | Next-generation sequencing libraries |
| US12139754B2 (en) | | Polynucleotide barcodes for long read sequencing |
| KR20210003795A | | Compositions and methods for cancer or neoplasia assessment |
| EP1831401B1 | | Methods, compositions and kits for forming self-complementary polynucleotides |
| CN101213311A | | Amplification and cloning of single DNA molecules using rolling circle amplification |
| WO2004042078A1 | | Nucleotide sequence analysis by quantification of mutagenesis |
| AU773447B2 (en) | | Template-dependent nucleic acid polymerization using oligonucleotide triphosphates building blocks |
| KR20210112350A | | Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation |
| US20230265501A1 (en) | | Phase protective reagent flow ordering |
| US12297490B2 (en) | | Methods for asymmetric DNA library generation and optionally integrated duplex sequencing |
| US9879318B2 (en) | | Methods and compositions for nucleic acid sample preparation |
| US20230357854A1 (en) | | Enhanced sequencing following random DNA ligation and repeat element amplification |
| US20250320474A1 (en) | | Modified enzymes and uses thereof |
| US20240052339A1 (en) | | RNA probe for mutation profiling and use thereof |
| WO2022272150A2 | | Sequencing of linked transcription products |
| EP4638776A1 | | Extraction of sequence-verified nucleic acid molecules |
| HK40062228A (en) | | Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation |
| JP2006141347A | | Nucleic acid base sequencing method and nucleic acid base sequencing kit using RecA protein derived from an extreme thermophile |
| HK1112490B (en) | | Amplification and cloning of single DNA molecules using rolling circle amplification |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
| | WPC | Withdrawal of priority claims after completion of the technical preparations for international publication | Ref country code: WO |
| | NENP | Non-entry into the national phase | Ref country code: JP |
| | WWW | WIPO information: withdrawn in national office | Country of ref document: JP |
| 122 | Ep: pct application non-entry in european phase |