AU2005201777A1

AU2005201777A1 - A Method for Direct Nucleic Acid Sequencing

Info

Publication number: AU2005201777A1
Application number: AU2005201777A
Authority: AU
Inventors: Niall Antony Armes; Derek Lyle Stemple
Original assignee: TwistDix Inc
Current assignee: TwistDix Inc
Priority date: 1999-03-10
Filing date: 2005-04-28
Publication date: 2005-05-19
Anticipated expiration: 2020-03-10
Also published as: AU2005201777B2

Description

AUSTRALIA

Patents Act 1990

COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name of Applicant: ASM Scientific, Inc.

Address for Service: CULLEN CO., Level 26, 239 George Street, Brisbane Qld 4000

Invention Title: A Method for Direct Nucleic Acid Sequencing

The following statement is a full description of the invention, including the best method of performing it, known to us:

FIELD OF THE INVENTION

The present invention relates to methods for sequencing nucleic acid samples. More specifically, the present invention relates to methods for sequencing without the need for amplification; prior knowledge of some of the nucleotide sequence to generate the sequencing primers; and the labor-intensive electrophoresis techniques.

BACKGROUND OF THE INVENTION

The sequencing of nucleic acid samples is an important analytical technique in modern molecular biology. The development of reliable methods for DNA sequencing has been crucial for understanding the function and control of genes and for applying many of the basic techniques of molecular biology. These methods have also become increasingly important as tools in genomic analysis and many non-research applications, such as genetic identification, forensic analysis, genetic counseling, medical diagnostics and many others. In these latter applications, both techniques providing partial sequence information, such as fingerprinting and sequence comparisons, and techniques providing full sequence determination have been employed. See, Gibbs et al., Proc. Natl. Acad. Sci USA 86: 1919-1923 (1989); Gyllensten et al., Proc. Natl. Acad. Sci USA 85: 7652-7656 (1988); Carrano et al., Genomics 4: 129-136 (1989); Caetano-Anollés et al., Mol. Gen. Genet. 235: 157-165 (1992); Brenner and Livak, Proc. Natl. Acad. Sci USA 86: 8902-8906 (1989); Green et al., PCR Methods and Applications 1: 77-90 (1991); and Versalovic et al., Nucleic Acid Res. 19: 6823-6831 (1991).

Most currently available DNA sequencing methods require the generation of a set of DNA fragments that are ordered by length according to nucleotide composition. The generation of this set of ordered fragments occurs in one of two ways: chemical degradation at specific nucleotides using the Maxam-Gilbert method or dideoxy nucleotide incorporation using the Sanger method. See Maxam and Gilbert, Proc Natl Acad Sci USA 74: 560-564 (1977); Sanger et al., Proc Natl Acad Sci USA 74: 5463-5467 (1977). The type and number of required steps inherently limit both the number of DNA segments that can be sequenced in parallel, and the amount of sequence that can be determined from a given site. Furthermore, both methods are prone to error due to the anomalous migration of DNA fragments in denaturing gels. Time and space limitations inherent in these gel-based methods have fueled the search for alternative methods.

In an effort to satisfy the current large-scale sequencing demands, improvements have been made to the Sanger method. For example, the use of fluorescent chain terminators simplifies detection of the nucleotides. The synthesis of longer DNA fragments and improved fragment resolution produces more sequence information from each experiment. Automated analysis of fragments in gels or capillaries has significantly reduced the labor involved in collecting and processing sequence information. See, Prober et al., Science 238: 336-341 (1987); Smith et al., Nature 321: 674-679 (1986); Luckey et al., Nucleic Acids Res 18: 4417-4421 (1990); Dovichi, Electrophoresis 18: 2393-2399 (1997).

However, current DNA sequencing technologies still suffer three major limitations. First, they require a large amount of identical DNA molecules, which are generally obtained either by molecular cloning or by polymerase chain reaction (PCR) amplification of DNA sequences. Current methods of detection are insensitive and thus require a minimum critical number of labeled oligonucleotides. Also, many identical copies of the oligonucleotide are needed to generate a sequence ladder. A second limitation is that current sequencing techniques depend on priming from sequence-specific oligodeoxynucleotides that must be synthesized prior to initiating the sequencing procedure. Sanger and Coulson, J. Mol. Biol. 94: 441-448 (1975). The need for multiple identical templates necessitates the synchronous priming of each copy from the same predetermined site. Third, current sequencing techniques depend on lengthy, labor-intensive electrophoresis techniques that are limited by the rate at which the fragments may be separated and are also limited by the number of bases that can be sequenced in a given experiment by the resolution obtainable on the gel.

In an effort to dispense with the need for electrophoresis techniques, a sequencing method was developed which uses chain terminators that can be uncaged, or deprotected, for further extension. See, U.S. Patent No. 5,302,509; Metzker et al., Nucleic Acids Res. 22: 4259-4267 (1994). This method involves repetitive cycles of base incorporation, detection of incorporation, and re-activation of the chain terminator to allow the next cycle of DNA synthesis. Thus, by detecting each added base while the DNA chain is growing, the need for size-fractionation is eliminated. This method is nevertheless still highly dependent on large amounts of nucleic acid to be sequenced and the use of known sequences for priming the initiation of chain growth. Moreover, this technique is plagued by any inefficiencies of incorporation and deprotection. Because incorporation and 3'-OH regeneration are not completely efficient, a pool of initially identical extending strands can rapidly become asynchronous and sequences cannot be resolved beyond a few limited initial additions.
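The loss of synchrony described above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that every strand in the pool completes a cycle (incorporation followed by 3'-OH regeneration) independently and with fixed probabilities; the function name and the efficiency values are hypothetical and are not taken from the cited references.

```python
def in_phase_fraction(incorporation_eff: float, deprotection_eff: float, cycles: int) -> float:
    """Fraction of strands that have completed every one of `cycles` cycles.

    Assumes each cycle succeeds independently with probability
    incorporation_eff * deprotection_eff (an illustrative simplification).
    """
    for eff in (incorporation_eff, deprotection_eff):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("efficiencies must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    per_cycle = incorporation_eff * deprotection_eff
    return per_cycle ** cycles


if __name__ == "__main__":
    # Hypothetical 90% efficiency for each step.
    for n in (1, 5, 10, 20):
        print(n, round(in_phase_fraction(0.9, 0.9, n), 3))
```

Under these assumed efficiencies, only a small minority of strands remains in phase after ten cycles, which is the behavior that limits ensemble methods to a few initial additions.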

Thus, a need still remains in the art for a rapid, cost effective, high throughput method for sequencing unknown nucleic acid samples that eliminates the need for amplification; prior knowledge of some of the nucleotide sequence to generate sequencing primers; and labor-intensive electrophoresis techniques.

SUMMARY OF THE INVENTION

The present invention provides rapid, cost effective, high throughput methods for sequencing unknown nucleic acid samples that eliminate the need for amplification; prior knowledge of some of the nucleotide sequence to generate sequencing primers; and labor-intensive electrophoresis techniques. The methods of the present invention permit direct nucleic acid sequencing (DNAS) of single nucleic acid molecules.

According to the methods of the present invention, a plurality of polymerase molecules is immobilized on a solid support through a covalent or non-covalent interaction. A nucleic acid sample and oligonucleotide primers are introduced to the reaction chamber in a buffered solution containing all four labeled-caged nucleoside triphosphate terminators. Template-driven elongation of a nucleic acid is mediated by the attached polymerases using the labeled-caged nucleoside triphosphate terminators. Reaction centers are monitored by the microscope system until a majority of sites contain immobilized polymerase bound to a nucleic acid template with a single incorporated labeled-caged nucleotide terminator. The reaction chamber is then flushed with a wash buffer. Specific nucleotide incorporation is then determined for each active reaction center. Following detection, the reaction chamber is irradiated to uncage the incorporated nucleotide and flushed with wash buffer once again. The presence of labeled-caged nucleotides is once again monitored before fresh reagents are added to reinitiate synthesis, to verify that reaction centers are successfully uncaged. A persistent failure of release or incorporation, however, indicates failure of a reaction center. A persistent failure of release or incorporation consists of 2-20 cycles, preferably 3-10 cycles, more preferably [...] cycles, wherein the presence of a labeled-caged nucleotide is detected during the second detection step, indicating that the reaction center was not successfully uncaged. The sequencing cycle outlined above is repeated until a large proportion of reaction centers fail.
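The per-cycle bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the method of the specification: the class, the function, and the threshold of 3 consecutive problem cycles are hypothetical (the text only gives a range of 2-20 cycles), and counting a failed incorporation in the same way as a failed release is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FAILURE_THRESHOLD = 3  # hypothetical value inside the 2-20 cycle range given in the text


@dataclass
class ReactionCenter:
    bases: List[str] = field(default_factory=list)  # bases called so far
    caged: bool = False              # True while a labeled terminator is still attached
    consecutive_failures: int = 0
    failed: bool = False


def record_cycle(center: ReactionCenter,
                 detected_base: Optional[str],
                 label_after_uncaging: bool,
                 threshold: int = FAILURE_THRESHOLD) -> ReactionCenter:
    """Update one reaction center after one incorporate/detect/uncage/re-detect cycle.

    detected_base: base seen in the first detection step, or None if no label was seen.
    label_after_uncaging: True if a label is still seen in the second detection step.
    """
    if center.failed:
        return center
    problem = False
    if not center.caged:
        if detected_base is None:
            problem = True                    # nothing was incorporated this cycle
        else:
            center.bases.append(detected_base)
            center.caged = True
    if center.caged:
        if label_after_uncaging:
            problem = True                    # terminator was not released
        else:
            center.caged = False
    center.consecutive_failures = center.consecutive_failures + 1 if problem else 0
    if center.consecutive_failures >= threshold:
        center.failed = True                  # persistent failure: retire this center
    return center
```

A base is appended only once, even when its label persists over several cycles, so a center that fails to uncage does not produce repeated calls of the same nucleotide.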

The differentially-labeled nucleotides used in the sequencing methods of the present invention have a detachable labeling group and are blocked at the 3' portion with a detachable blocking group. In a preferred embodiment, the labeling group is directly attached to the detachable 3' blocking group. Uncaging of the nucleotides can be accomplished enzymatically, chemically, or preferably photolytically, depending on the detachable linker used to link the labeling group and the 3' blocking group to the nucleotide.

In another preferred embodiment, the labeling group is attached to the base of each nucleotide with a detachable linker rather than to the detachable 3' blocking group. The labeling group and the 3' blocking group can be removed enzymatically, chemically, or photolytically. Alternatively, the labeling group can be removed by a different method than the 3' blocking group. For example, the labeling group can be removed enzymatically while the 3' blocking group is removed chemically, or by photochemical activation.

Many independent reactions occur simultaneously within the reaction chamber, each individual reaction center generating a few hundred, or thousands, of base pairs. This apparatus has the capacity to sequence in parallel thousands and possibly millions of separate templates from either specified or random sequence points. The combined sequence from each run is on the order of several million base-pairs of sequence and does not require amplification, prior knowledge of a portion of the target sequence, or resolution of fragments on gels or capillaries.
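As a rough check on the figures above, the combined yield of a run is simply the number of productive reaction centers multiplied by the read length obtained from each. The function name and the example numbers below are illustrative assumptions chosen from within the ranges stated in the text.

```python
def combined_yield_bp(active_centers: int, read_length_bp: int) -> int:
    """Total bases read in one run, assuming every active center yields the same read length."""
    if active_centers < 0 or read_length_bp < 0:
        raise ValueError("inputs must be non-negative")
    return active_centers * read_length_bp


if __name__ == "__main__":
    # Hypothetical run: 10,000 productive centers, 500 bases each.
    print(combined_yield_bp(10_000, 500))
```

With these assumed values the run yields five million bases, which is in line with the "several million base-pairs" stated above.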

Simple DNA preparations from any source can be sequenced with the apparatus and methods of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 (Panels A-C) is a schematic representation of labeled-caged terminator nucleotides for use in direct nucleic acid sequencing. Panel A depicts a deoxyadenosine triphosphate modified by attachment of a photolabile linker-fluorochrome conjugate to the 3' carbon of the ribose. Panel B depicts an alternative configuration, wherein the fluorochrome is attached to the base of the nucleotide by way of a photolabile linker. Panel C depicts the four different nucleotides each labeled with a fluorochrome with distinct spectral properties, which permits the four nucleotides to be distinguished during the detection phase of a direct nucleic acid sequencing reaction cycle.

FIG. 2 is a schematic representation of the steps of one cycle of direct nucleic acid sequencing, wherein step 1 illustrates the incorporation of a labeled-caged nucleotide, step 2 illustrates the detection of the label, and step 3 illustrates the unblocking of the 3'-OH cage.

FIG. 3 is a schematic representation of a reaction center depicting an immobilized polymerase and a nucleic acid sample being sequenced.

FIG. 4 is a schematic representation of the reaction chamber assembly that houses the array of DNAS reaction centers and mediates the exchange of reagents and buffer.

FIG. 5 is a schematic representation of a reaction center array. The left side panel (Microscope Field) depicts the view of an entire array as recorded by four successive detection events (one for each of the separate fluorochromes). The center panel depicts a magnified view of a part of the field showing the spacing of individual reaction centers. The far right panel depicts the camera's view of a single reaction center.

FIG. 6 is a schematic representation of the principle of the evanescent wave.

FIG. 7 is a schematic representation of a direct nucleic acid sequencing set up using total internal reflection fluorescence microscopy.

FIG. 8 is a schematic representation of an example of a data acquisition algorithm obtained from a 3x3 matrix.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a novel sequencing apparatus and a novel sequencing method. The method of the present invention, referred to herein as Direct Nucleic Acid Sequencing (DNAS), offers a rapid, cost effective, high throughput method by which nucleic acid molecules from any source can be readily sequenced without the need for prior amplification. DNAS can be used to determine the nucleotide sequence of numerous single nucleic acid molecules in parallel.

1. DNAS Reaction Center Array

Polymerases are attached to the solid support, spaced at regular intervals, in an array of reaction centers, present at a periodicity greater than the optical resolving power of the microscope system. Preferably, only one polymerase molecule is present in each reaction center, and each reaction center is located at an optically resolvable distance from the other reaction centers. Sequencing reactions preferably occur in a thin aqueous reaction chamber comprising a sealed cover slip and an optically transparent solid support.

Immobilization of polymerase molecules for use in nucleic acid sequencing has been disclosed by Densham in PCT application WO 99/05315. Densham describes the attachment of selected amino groups within the polymerase to a dextran or N-hydroxysuccinimide ester-activated surface. WO 99/05315; EP-A-0589867; Löfås et al., Biosens. Bioelectron. 10: 813-822 (1995). These techniques can be modified in the present invention to ensure that the activated area is small enough so that steric hindrance will prevent the attachment of more than one polymerase at any given spot in the array.

The array of reaction centers containing a single polymerase molecule is constructed using lithographic techniques commonly used in the construction of electronic integrated circuits. This methodology has been used in the art to construct microscopic arrays of oligodeoxynucleotides and arrays of single protein motors. See, Chee et al., Science 274: 610-614 (1996); Fodor et al., Nature 364: 555-556 (1993); Fodor et al., Science 251: 767-773 (1991); Gushin, et al., Anal. Biochem. 250: 203-211 (1997); Kinosita et al., Cell 93: 21-24 (1998); Kato-Yamada et al., J. Biol. Chem. 273: 19375-19377 (1998); and Yasuda et al., Cell 93: 1117-1124 (1998). Using techniques such as photolithography and/or electron beam lithography [Rai-Choudhury, Handbook of Microlithography, Micromachining, and Microfabrication, Volume I: Microlithography, Volume PM39, SPIE Press (1997); Service, Science 283: 27-28 (1999)], the substrate is sensitized with a linking group that allows attachment of a single modified protein. Alternatively, an array of sensitized sites can be generated using thin-film technology such as Langmuir-Blodgett. See, Zasadzinski et al., Science 263: 1726-1733 (1994).

The regular spacing of proteins is achieved by attachment of the protein to these sensitized sites on the substrate. Polymerases containing the appropriate tag are incubated with the sensitized substrate so that a single polymerase molecule attaches at each sensitized site. The attachment of the polymerase can be achieved via a covalent or non-covalent interaction. Examples of such linkages common in the art include Ni2+/hexahistidine, streptavidin/biotin, avidin/biotin, glutathione S-transferase (GST)/glutathione, monoclonal antibody/antigen, and maltose binding protein/maltose.

A schematic representation of a reaction center is presented in FIG. 3. A DNA polymerase (e.g., from Thermus aquaticus) is attached to a glass microscope slide. Attachment is mediated by a hexahistidine tag on the polymerase, bound by strong non-covalent interaction to a Ni2+ ion, which is, in turn, held to the glass by nitrilotriacetic acid and a linker molecule. The nitrilotriacetic acid is covalently linked to the glass by a linker attached by silane chemistry. The silane chemistry is limited to small diameter spots etched at evenly spaced intervals on the glass by electron beam lithography or photolithography. In addition to the attached polymerase, the reaction center includes the template DNA molecule and an oligonucleotide primer both bound to the polymerase. The glass slide constitutes the lower slide of the DNAS reaction chamber.

The reaction chamber assembly houses the array of DNAS reaction centers and mediates the exchange of reagents and buffer. An example of a DNAS reaction chamber assembly is illustrated in FIG. 4. The reaction chamber is a sealed compartment with transparent upper and lower slides. The slides are held in place by a metal or plastic housing, which may be assembled and disassembled to allow replacement of the slides. There are two ports that allow access to the chamber. One port allows the input of buffer (and reagents) and the other port allows buffer (and reaction products) to be withdrawn from the chamber. The lower slide carries the reaction center array. In addition, a prism is attached to the lower slide to direct laser light into the lower slide at such an angle as to produce total internal reflection of the laser light within the lower slide. This arrangement allows an evanescent wave to be generated over the reaction center array. A high numerical aperture objective lens is used to focus the image of the reaction center array onto the digital camera system. The reaction chamber housing can be fitted with heating and cooling elements, such as a Peltier device, to regulate the temperature of the reactions.

By fixing the site of nucleotide incorporation within the optical system, sequence information can be obtained from many distinct nucleic acid molecules simultaneously. A diagram of the DNAS reaction center array is given in FIG. 5. As described above, each reaction center is attached to the lower slide of the reaction chamber. Depicted in the left side panel (Microscope Field) is the view of an entire array as recorded by four successive detection events (one for each of the separate fluorochromes). The center panel is a magnified view of a part of the field showing the spacing of individual reaction centers. Finally, the far right panel depicts the camera's view of a single reaction center. Each reaction center is assigned 100 pixels to ensure that it is truly isolated. The imaging area of a single pixel relative to the 1 µm × 1 µm area allotted to each reaction center is shown. The density of reaction centers is limited by the optical resolution of the microscope system. Practically, this means that reaction centers must be separated by at least 0.2 µm to be detected as distinct sites.
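The geometry stated above (100 pixels per reaction center, a 1 µm × 1 µm area per center, and a minimum separation of 0.2 µm) can be turned into a short calculation. The sketch below assumes that the 100 pixels form a square 10 × 10 block and that centers sit on a square grid; the function names and the 100 µm field size in the example are hypothetical.

```python
import math

CENTER_PITCH_UM = 1.0      # area allotted to each reaction center is 1 um x 1 um (from the text)
PIXELS_PER_CENTER = 100    # pixels assigned to each reaction center (from the text)
MIN_SEPARATION_UM = 0.2    # minimum optically resolvable separation (from the text)


def pixel_side_um(pitch_um: float = CENTER_PITCH_UM,
                  pixels_per_center: int = PIXELS_PER_CENTER) -> float:
    """Side length of the area imaged by one pixel, assuming a square pixel block."""
    side_pixels = math.isqrt(pixels_per_center)
    if side_pixels * side_pixels != pixels_per_center:
        raise ValueError("pixels_per_center must be a perfect square")
    return pitch_um / side_pixels


def centers_in_field(width_um: float, height_um: float,
                     pitch_um: float = CENTER_PITCH_UM) -> int:
    """Number of reaction centers on a square grid that fit in a rectangular field."""
    if pitch_um < MIN_SEPARATION_UM:
        raise ValueError("pitch is below the optical resolution limit")
    return int(width_um // pitch_um) * int(height_um // pitch_um)


if __name__ == "__main__":
    print(pixel_side_um())                 # area imaged per pixel side, in micrometers
    print(centers_in_field(100.0, 100.0))  # hypothetical 100 um x 100 um field
```

Under these assumptions each pixel images a 0.1 µm × 0.1 µm area, and a 100 µm × 100 µm field holds 10,000 reaction centers, which would require one million camera pixels.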

2. Enzyme Selection

In general, any macromolecule which catalyzes formation of a polynucleotide sequence can be used as the polymerase. In some embodiments, the polymerase can be an enzymatic complex that: 1) promotes the association (e.g., by hydrogen bonding or base-pairing) of a tag (e.g., a normal or modified nucleotide, or any compound capable of specific association with complementary template nucleotides) with the complementary template nucleotide in the active site; 2) catalyzes the formation of a covalent linkage between the tag and the synthetic strand or primer; and 3) translates the active site to the next template nucleotide.

While the polymerases will typically be proteinaceous enzymes, it will be obvious to one of average skill in the art that the polymerase activity need not be associated with a proteinaceous enzyme. For example, the polymerase may be a nucleic acid itself, as in the case of ribozymes or DNA-based enzymes.

A large selection of proteinaceous enzymes is available for use in the present invention. For example, the polymerase can be an enzyme such as a DNA-directed DNA polymerase, an RNA-directed DNA polymerase, a DNA-directed RNA polymerase, or an RNA-directed RNA polymerase. Some polymerases are multi-subunit replication systems made up of a core enzyme and associated factors that enhance the activity of the core (e.g., they increase processivity or fidelity of the core subunit). The enzyme must be modified in order to link it to the support. The enzyme can be cloned by techniques well known in the art, to produce a recombinant protein with a suitable linkage tag. In a preferred embodiment, this linkage is a hexahistidine tag, which permits strong binding to nickel ions on the solid support. Preferred enzymes are highly processive, i.e., they remain associated with the template nucleotide sequence for a succession of nucleotide additions, and are able to maintain a polymerase-polynucleotide complex even when not actively synthesizing. Additionally, preferred polymerases are capable of incorporating 3'-modified nucleotides. Sufficient quantities of an enzyme are obtained using standard recombinant techniques known in the art. See, for example, Dabrowski and Kur, Protein Expr. Purif. 14: 131-138 (1998).

2.1 DNA Polymerase

In a preferred embodiment, sequencing is done with a DNA-dependent DNA polymerase. DNA-dependent DNA polymerases catalyze the polymerization of deoxynucleotides to form the complementary strand of a primed DNA template. Examples of DNA-dependent DNA polymerases include, but are not limited to, the DNA polymerase from Bacillus stearothermophilus (Bst), the E. coli DNA polymerase I Klenow fragment, E. coli DNA polymerase III holoenzyme, the bacteriophage T4 and T7 DNA polymerases, and those from Thermus aquaticus (Taq), Pyrococcus furiosus (Pfu), and Thermococcus litoralis (Vent). The polymerase from T7 gene 5 can also be used when complexed to thioredoxin. Tabor et al., J. Biol. Chem. 262: 1612-1623 (1987). The Bst DNA polymerase is preferred because it has been shown to efficiently incorporate 3'-O-(-2-Nitrobenzyl)-dATP into a growing DNA chain, is highly processive, very stable, and lacks exonuclease activity. The coding sequence of this enzyme has been determined. See U.S. Patent Nos. 5,830,714 and 5,814,506, incorporated herein by reference.

In an alternative preferred embodiment where RNA is used as template, the selected DNA-dependent DNA polymerase functions as an RNA-dependent DNA polymerase, or reverse transcriptase. For example, the DNA polymerase from Thermus thermophilus (Tth) has been reported to function as an RNA-dependent DNA polymerase, or reverse transcriptase, under certain conditions. See, Meyers and Gelfand, Biochem. 30: 7661-7666 (1991). Thus, the Tth DNA polymerase is linked to the substrate and the sequencing reaction is conducted under conditions where this enzyme will sequence an RNA template, thereby producing a complementary DNA strand.

In some embodiments, a polymerase subunit or fragment is attached to the support, and other necessary subunits or fragments are added as part of a complex with the sample to be sequenced. This approach is useful for polymerase systems that involve a number of different replication factors. For example, to use the bacteriophage T4 replication system for DNAS sequencing, the gp43 polymerase can be attached to the support. Other replication factors, such as the clamp loader (gp44/62) and sliding clamp (gp45), can be added with the nucleic acid template in order to increase the processivity of the replication system. A similar approach can be used with the E. coli polymerase III system, where the polymerase core is immobilized in the array and the β-dimer subunit (sliding clamp) and τ and γ subassembly (clamp loader) are added to the nucleic acid sample prior to DNAS sequencing. Additionally, this approach can be used with eukaryotic DNA polymerases (e.g., α or δ) and the corresponding PCNA (proliferating cell nuclear antigen). In some embodiments, the sliding clamp is the replication factor that is attached in the array and the polymerase moiety is added in conjunction with the nucleic acid sample.

2.2 Reverse Transcriptase

A reverse transcriptase is an RNA-dependent DNA polymerase (an enzyme that produces a DNA strand complementary to an RNA template). In an alternative preferred embodiment, a reverse transcriptase enzyme is attached to the support for use in sequencing RNA molecules. This permits the sequencing of RNAs taken directly from tissues, without prior reverse transcription. Examples of reverse transcriptases include, but are not limited to, reverse transcriptase from Avian Myeloblastosis Virus (AMV), Moloney Murine Leukemia Virus, and Human Immunodeficiency Virus-1 (HIV-1). HIV-1 reverse transcriptase is particularly preferred because it is well characterized both structurally and biochemically. See, e.g., Huang, et al., Science 282: 1669-1675 (1998).

In an alternative preferred embodiment, the immobilized reverse transcriptase functions as a DNA-dependent DNA polymerase, thereby producing a DNA copy of the sample or target DNA template strand.

2.3 RNA Polymerase

In yet another alternative preferred embodiment, a DNA-dependent RNA polymerase is attached to the support, and uses labeled-caged ribonucleotides to generate an RNA copy of the sample or target DNA strand being sequenced. Preferred examples of these enzymes include, but are not limited to, RNA polymerase from E. coli [Yin, et al., Science 270: 1653-1657 (1995)] and RNA polymerases from the bacteriophages T7, T3, and SP6. In an alternative preferred embodiment, a modified T7 RNA polymerase functions as a DNA-dependent DNA polymerase. This RNA polymerase is attached to the support and uses labeled-caged deoxyribonucleotides to generate a DNA copy of a DNA template. See, Izawa et al., J. Biol. Chem. 273: 14242-14246 (1998).

2.4 RNA Dependent RNA Polymerase

Many viruses employ RNA-dependent RNA polymerases in their life-cycles. In a preferred embodiment, an RNA-dependent RNA polymerase is attached to the support, and uses labeled-caged ribonucleotides to generate an RNA copy of a sample RNA strand being sequenced. Preferred examples of these enzymes include, but are not limited to, RNA-dependent RNA polymerases from the viral families: bromoviruses, tobamoviruses, tombusvirus, leviviruses, hepatitis C-like viruses, and picornaviruses. See, Huang et al., Science 282: 1668-1675 (1998); Lohmann et al., J. Virol. 71: 8416-8428 (1997); Lohmann et al., Virology 249: 108-118 (1998), and O'Reilly and Kao, Virology 252: 287-303 (1998).

3. Sample Preparation

The nucleic acid to be sequenced can be obtained from any source. Example nucleic acid samples to be sequenced include double-stranded DNA, single-stranded DNA, DNA from plasmid, first strand cDNA, total genomic DNA, RNA, cut/end-modified DNA (e.g., with RNA polymerase promoter), and in vitro transposon tagged DNA (e.g., random insertion of RNA polymerase promoter). The target or sample nucleic acid to be sequenced is preferably sheared (or cut) to a certain size, and annealed with oligodeoxynucleotide primers using techniques well known in the art. Preferably, the sample nucleic acid is denatured, neutralized and precipitated and then diluted to an appropriate concentration, mixed with oligodeoxynucleotide primers, heated to 65°C and then cooled to room temperature in a suitable buffer. The nucleic acid is then added to the reaction chamber after the polymerase has been immobilized on the support or, alternatively, is combined with the polymerase prior to the immobilization step.

3.1 In vitro transposon tagging of template DNA

In an alternative preferred embodiment, purified transposases and transposable element tags will be used to randomly insert specific sequences into template double stranded DNA. In one configuration the transposable element contains the promoter for a specific RNA polymerase. Alternatively, the inverted repeats of the transposable elements can be hybridized with complementary oligodeoxynucleotide primers for DNAS with DNA polymerases. Preferred examples of these transposases and transposable elements include, but are not limited to, Tc1 and Tc3A from C. elegans and the engineered teleost system Sleeping Beauty. See, Ivics et al., Cell 91: 501-510 (1997); Plasterk, Curr. Top. Microbiol. Immunol. 204: 125-143 (1996); van Luenen et al., EMBO J. 12: 2513-2520 (1993), and Vos et al., Genes Dev. 10: 755-761 (1996).

3.2 Double Stranded Template DNA

In yet another embodiment, double stranded DNA is sequenced by Bst DNA polymerase without the need for primer annealing. See, Lu et al., Chin. J. Biotechnol. 8: 29-32 (1992).

3.3 Primers

Various primers and promoters are known in the art and may be suitable for sequence extension in DNAS. Examples include random primers, anchor point primer libraries, single-stranded binding protein masking/primer library, and primase.

In a preferred embodiment, anchored primers are used instead of random primers. Anchor primers are oligonucleotide primers to previously identified sequences. Anchor primers can be used for rapid determination of specific sequences from whole genomic DNA, from cDNAs or RNAs. This will be of particular use for rapid genotyping, and/or for clinical screening to detect polymorphisms or mutations in previously identified disease-related genes or other genes of interest. Once genome projects, and other studies, have identified sequences of particular interest, oligonucleotides corresponding to various locations in and around that sequence can be designed for use in DNAS. This will maximize the quantity of useful data that can be obtained from a single sequencing run, which is particularly useful when complex DNA samples are used. For identification of mutated or polymorphic disease genes this technique will obviate the need to perform genotyping by any other means currently in use, including single strand conformation polymorphism (SSCP) [Orita et al., Genomics 5: 874-879 (1989)], PCR sequencing or DNA array hybridization technology [Hacia, Nat. Genet. 21: 42-47 (1999)]. Direct sequencing of disease genes is superior to SSCP and hybridization technologies because those technologies are relatively insensitive and may frequently misidentify mutations, either positively or negatively. Many anchor oligonucleotides can be mixed together so that hundreds or thousands of genes or sequences can be identified simultaneously. In essence every known or potential disease-related gene can be sequenced simultaneously from a given sample.
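Designing an anchor primer amounts to choosing an oligonucleotide that is the reverse complement of a known stretch of the template. The sketch below locates perfect-match annealing sites on a single-stranded template; it is only an illustration, the function names are hypothetical, and it ignores mismatches, melting temperature and secondary structure.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def annealing_sites(template: str, primer: str) -> List[int]:
    """0-based start positions on the template (5'->3') where the primer pairs perfectly."""
    target = reverse_complement(primer)
    strand = template.upper()
    sites = []
    start = strand.find(target)
    while start != -1:
        sites.append(start)
        start = strand.find(target, start + 1)  # allow overlapping matches
    return sites


if __name__ == "__main__":
    # Hypothetical template and anchor primer.
    print(annealing_sites("AAGGCCTTACGT", "TAAGG"))
```

A mixture of many anchor primers, as described above, would correspond to running this lookup once per primer against each template.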

4. Labeled-caged Terminating Nucleotides

To be useful as a chain-terminating substrate for the methods of the present invention, a nucleotide must contain a detectable label that distinguishes it from the other three nucleotides. Furthermore, a chain-terminating nucleotide must permit base incorporation, must terminate elongation upon incorporation, and must be capable of being uncaged to allow further chain elongation, thereby permitting repetitive cycles of incorporation, monitoring to identify incorporated bases, and uncaging to allow the next cycle of chain elongation. Uncaging of the nucleotides can be accomplished enzymatically, chemically, or preferably photolytically.

The basic molecule is an NTP with modification at the 3'-OH (R), the 2'-OH (R'), or the base (R"). In a standard dideoxy NTP, R=H, R'=H, and R"=H.

R=H, R'=OH, and R"=H is a chain terminator for RNA polymerases.

One set of useful chain-terminating nucleotides for the methods of the present invention is R = cage/label, R' = (H or OH), and R" = H. In a preferred embodiment, the modified nucleotide is a label (e.g., a fluorophore) linked to the sugar moiety by a 3'-O-(-2-Nitrobenzyl) group. The modified 3'-O-(-2-Nitrobenzyl)-dNTP is incorporated into the growing DNA chain by Bst DNA polymerase linked to a support. In order to resume chain elongation, the nucleotide is uncaged by removal of the 2-Nitrobenzyl group (with its corresponding detectable label) by exposure to light of the appropriate frequency. The modified nucleotide 3'-O-(-2-Nitrobenzyl)-dATP has previously been used in a single round of nucleotide incorporation and uncaging. Metzker et al., Nucleic Acids Res. 22: 4259-4267 (1994). See also Cheeseman, U.S. Patent No. 5,302,509, incorporated herein by reference.

An alternative set of useful chain-terminating nucleotides has the configuration R = cage, R' = (H or OH), and R" = cage/label. In a preferred embodiment, the detachable labeling group is a label (e.g., a fluorophore) linked to the base of the nucleotide by a 2-Nitrobenzyl group, and the detachable blocking group is a 3'-O-(2-Nitrobenzyl) group. The modified nucleotide is incorporated into the growing DNA chain by Bst DNA polymerase linked to a support. In order to resume chain elongation, the nucleotide is uncaged by removal of both the labeling group and the blocking group by exposure to light of the appropriate frequency.

In either of these configurations it may prove advantageous to place two labels (e.g., two fluorochromes) on each cage, as has been described in WO 98/33939.

For sequencing when the synthetic strand is RNA, labeled-caged ribonucleotides (R' = OH) are synthesized as modified nucleotides designed for incorporation by support-linked RNA polymerase.

4.1 Fluorescent labels

The use of fluorescent tags to identify nucleotides in nucleic acid sequencing is well known in the art. See, U.S. Patent Nos. 4,811,218; 5,405,747; 5,547,839 and 5,821,058, each incorporated herein by reference. Metzker and Gibbs have recently disclosed a family of fluorescently tagged nucleotides based on the Cy fluorophores with improved spectral characteristics. U.S. Patent No. 5,728,529, incorporated herein by reference. Alternative sets of fluorophores include: the rhodamine based fluorophores, TAMRA, ROX, JOE, and FAM; the BigDye® fluorophores (Applied Biosystems, Inc.); and the BODIPY® fluorophores (U.S. Patent No. 5,728,529).

In a preferred embodiment of the present invention, a fluorescent label is attached to the photolabile 3' blocking group (i.e., the cage). Examples of modified nucleotides for DNAS are schematically illustrated in FIG. 1 (Panels A-C). Panel A depicts a deoxyadenosine triphosphate modified by attachment of a photolabile linker-fluorochrome conjugate to the 3' carbon of the ribose. Photolysis of the linker by <360 nm light causes the fluorochrome to dissociate, leaving the 3'-OH group of the nucleotide intact. Panel B depicts an alternative configuration in which the fluorochrome is attached to the base of the nucleotide by way of a photolabile linker. The 3'-OH is blocked by a separate photolabile group. Modified nucleotides such as those depicted in Panels A and B are examples of labeled-caged deoxyribonucleotides for use in DNAS. A variety of fluorochromes and photolabile groups can be used in the synthesis of labeled-caged deoxyribonucleotides. Additionally, ribonucleotides can also be synthesized for use with RNA polymerases. Four fluorochromes with distinct spectral properties allow the four nucleotides to be distinguished during the detection phase of the DNAS reaction cycle. FIG. 1 (Panel C) provides a schematic representation of four different labeled-caged terminator nucleotides for use in direct nucleic acid sequencing.

After incorporation of the labeled-caged terminator nucleotides by the immobilized polymerase molecules, the fluorophores are illuminated to excite fluorescence in each of the four species of fluorophore. The emission at each point in the array is optically detected and recorded. Once the sequence information has been obtained, the photolabile linkers are removed by illumination with light at the uncaging wavelength (<360 nm).

Depicted in FIG. 2 is a single round of the reaction cycle: the incorporation of a labeled-caged nucleotide; the detection of the labeled nucleotide; and the unblocking of the caged nucleotide. It is through successive rounds of the DNAS reaction cycle that primary sequence information is deduced. In the first panel (Step 1) is an example single stranded template DNA (3'-AGCAGTCAG-5'); on the left side is a short primer sequence and a labeled-caged dGTP undergoing incorporation. In the middle panel (Step 2) the fluorochrome, BODIPY 564/570, is excited by YAG laser illumination at 532 nm. The fluorochrome emits light centered at a wavelength of 570 nm, which is detected by the microscope system. Finally, in Step 3, photolysis of the linker by illumination with <360 nm light simultaneously dissociates the fluorochrome label and releases the 3' block. As a result the primer is extended by one base and the 3'-OH is restored so that another nucleotide can be incorporated on the next cycle.
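
The three steps of FIG. 2 can be sketched in code as an idealized, error-free model of a single reaction center. This is an illustration only and not part of the disclosed apparatus: the function name `run_dnas_cycles`, the string representation of the strands, and the two-base primer used in the usage note are assumptions introduced here.

```python
# Illustrative sketch only: an idealized model of the
# incorporate / detect / uncage cycle at one reaction center.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def run_dnas_cycles(template_3to5, primer_5to3, n_cycles):
    """Return the bases read over at most n_cycles at one reaction center.

    template_3to5: template written 3'->5', as in FIG. 2.
    primer_5to3:   primer written 5'->3', annealed at the template's 3' end
                   (hypothetical input; pairing is assumed, not checked).
    """
    position = len(primer_5to3)  # index of the next unpaired template base
    reads = []
    for _ in range(n_cycles):
        if position >= len(template_3to5):
            break  # template exhausted
        # Step 1: one labeled-caged terminator is incorporated; the 3' cage
        # prevents any further extension during this cycle.
        base = COMPLEMENT[template_3to5[position]]
        # Step 2: the label is excited and its emission identifies the base.
        reads.append(base)
        # Step 3: photolysis removes label and cage, restoring the 3'-OH.
        position += 1
    return "".join(reads)
```

With the template of FIG. 2 and a hypothetical primer 5'-TC-3', `run_dnas_cycles("AGCAGTCAG", "TC", 3)` returns `"GTC"`, the first base being the G whose incorporation is shown in Step 1.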

4.2 Quantum dot labels

In an alternative preferred embodiment of the present invention, each of the caged terminators is labeled with a different type of quantum dot. Recently, highly luminescent semiconductor quantum dots (QDs) have been covalently coupled to biomolecules. Chan and Nie, Science 281: 2016-2018 (1998). These luminescent labels exhibit improved spectral characteristics over traditional organic dyes, and have been shown to allow sensitive detection with a confocal fluorescence microscope at the single dot level. In this embodiment, the caged quantum dot terminators are incorporated, detected, and uncaged in a manner similar to that described above for the fluorescent caged terminators.

4.3 Plasmon resonance particles

In a preferred embodiment, each of the caged terminators is labeled with a colloidal silver plasmon-resonant particle (PRP). Schultz et al., J. Clin. Ligand Assay 22: 214-216 (1999); Schultz et al., Proc. Natl. Acad. Sci. 97: 996-1001 (2000). PRPs are metallic nanoparticles, typically 40-100 nm in diameter, which can be engineered to efficiently scatter light anywhere in the visible range of the spectrum. These particles are bright enough to be used for single molecule detection. PRPs were shown to produce a scattering flux equivalent to that from million fluorescein molecules, and more than 10^5-fold greater than that from typical quantum dots. Schultz et al., Proc. Natl. Acad. Sci. 97: 996-1001 (2000). Furthermore, when imaged by a standard CCD, the spatial peak can be located to a precision of 10 Å, similar precision to that observed with imaging single fluorophores on gold nanoparticles. Denk and Webb, Appl. Opt. 29: 2382-2391 (1990). To facilitate detection, in certain embodiments, each different type of nucleotide is modified with a PRP of a different color. In order to resolve the signal from two PRPs incorporated into a sample at neighboring reaction centers, the reaction centers must at least be separated by a coherence length (approximately the wavelength of the illuminating light). Additionally, Raman scattering may be used to detect the PRPs. Nie and Emory, Science 275: 1102-1106 (1997).

5. Detection of Incorporated Nucleotides

Advances in microscopic techniques have allowed the spectroscopic detection of single molecules. See, Nie and Zare, Annu. Rev. Biophys. Biomol. Struct. 26: 567-596 (1997), and Keller et al., Appl. Spectrosc. 50: 12A-32A (1996). For example, single fluorescent molecules in aqueous solution can be visualized under total internal reflection fluorescence microscopy (TIRFM), confocal microscopy, fluorescence resonance energy transfer (FRET), or surface plasmon resonance spectroscopy (SPR). See, Dickson et al., Nature 388: 355-358 (1997); Dickson et al., Science 274: 966-969 (1996); Ishijima et al., Cell 92: 161-171 (1998); Iwane et al., FEBS Lett. 407: 235-238 (1997); Nie et al., Science 266: 1018-1021 (1994); Pierce et al., Nature 388: 338 (1997); Ha et al., Proc. Natl. Acad. Sci. USA 93: 6264-6268 (1996); Gordon et al., Biophys. J. 74: 2702-2713 (1998); and Yokota et al., Phys. Rev. Letts. 80: 4606-4609 (1998). Since single molecules can be detected spectroscopically, cloned nucleic acid samples are no longer necessary for sequencing. A single copy of template, contained within a reaction center, is a sufficient sample size. The apparatus and methods of the present invention allow the resolution of signals from single nucleotide tags within an optical plane and their subsequent conversion into digital information. Photons are collected from a thin plane roughly equivalent to the volume within which the enzyme and newly synthesized base reside.

5.1 TIRFM

When light is directed at a particular angle into a refractive medium of set width, such as a glass slide, total internal reflection (TIR) will result. Above the plane of the refractive medium an electromagnetic phenomenon known as an evanescent wave occurs. The principle of the evanescent wave is depicted in FIG. 6. The evanescent wave extends from the surface to a distance of the order of the wavelength of light. Importantly, an evanescent wave can be used to excite fluorochromes within this distance. When this phenomenon is used for microscopy it is called total internal reflection fluorescence microscopy (TIRFM). The arrangement of microscope slides, prism and laser beam depicted in this figure will lead to TIR within the lower slide, and thus an evanescent wave will be generated within approximately 150 nm of the upper surface of the lower slide. Fluorochrome molecules, such as those within DNAS reaction centers, will be excited and can be detected optically using the objective lens, microscope and camera system. A high signal-to-noise ratio is achieved using evanescent wave excitation because only those fluorochrome molecules within the evanescent wave are stimulated.

In a preferred embodiment TIRFM is used for detection. Depicted in FIG. 7 is the arrangement of equipment required to carry out DNAS using TIRFM. A standard laboratory microscope stand houses the reaction chamber assembly, objective lens, filter wheel, microchannel plate intensifier, and cooled CCD camera. Laser light is directed into the prism by dichroic mirrors and computer controlled shutters. Evanescent wave excitation is used to stimulate the sample.

Evanescent wave excitation is achieved by total internal reflection at the glass-liquid interface.

At this interface, the optical electromagnetic field does not abruptly drop to zero, but decays exponentially into the liquid phase. The rapidly decaying field (evanescent wave) can be used to excite fluorescent molecules in a thin layer of approximately 150 nm immediately next to this interface. See, PCT Patent Application WO 98/33939, incorporated herein by reference.
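
The exponential decay described above can be made concrete with the standard textbook expression for the evanescent field, I(z) = I(0) exp(-z/d), where d = λ / (4π sqrt(n1² sin²θ - n2²)). The sketch below is an illustration under stated assumptions rather than a description of the disclosed instrument: the refractive indices (1.52 for glass, 1.33 for an aqueous buffer) and the 65 degree incidence angle used in the usage note are assumed values, and the function names are introduced here.

```python
import math


def critical_angle_deg(n_glass, n_liquid):
    """Smallest incidence angle (degrees) giving total internal reflection."""
    return math.degrees(math.asin(n_liquid / n_glass))


def penetration_depth_nm(wavelength_nm, n_glass, n_liquid, incidence_deg):
    """1/e decay depth of the evanescent intensity, in nm."""
    theta = math.radians(incidence_deg)
    s = (n_glass * math.sin(theta)) ** 2 - n_liquid ** 2
    if s <= 0:
        raise ValueError("incidence angle is not above the critical angle")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


def relative_intensity(z_nm, depth_nm):
    """Evanescent intensity at height z above the interface, relative to z = 0."""
    return math.exp(-z_nm / depth_nm)
```

Under these assumed values, `penetration_depth_nm(532, 1.52, 1.33, 65)` gives a depth on the order of 100 nm, the same order as the approximately 150 nm layer cited above. The depth grows as the incidence angle approaches the critical angle.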

The sensitivity that allows single molecule detection arises from the small sample volume probed. One advantage of TIRFM is that the entire reaction center array can be imaged simultaneously. Images of the reaction center array are focused onto the face of the microchannel plate intensifier through barrier filters carried on the filter wheel. The microchannel plate intensifier amplifies the image and transfers it to the face of the cooled CCD camera. Image data are read from the CCD chip and processed on a microcomputer. A stimulating laser, or set of stimulating lasers, is directed to the specimen by way of an optical table. Another laser uncages the 3'-OH protecting group. Additional lasers may be required for optimal fluorochrome stimulation. A filter wheel is also included in the invention to change barrier filters so that the four different fluorochromes (each corresponding to a different type of labeled-caged nucleotide) are unambiguously distinguished.

As shown in FIG. 7, a prism is built onto the microscope slide to direct the laser into the slide from outside the microscope. Ishijima et al., Cell 92: 161-171 (1998). Alternatively, objective-type TIRFM can be used for fluorescence detection. Laser light is directed through an objective lens off-center such that the critical angle is achieved using the objective lens itself. See, Tokunaga et al., Biochem. Biophys. Res. Comm. 235: 47-53 (1997).

5.2 Confocal Microscopy

In an alternative preferred embodiment, confocal microscopy is used for detection. In confocal microscopy, a laser beam is brought to its diffraction-limited focus inside a sample using an oil immersion, high numerical-aperture (NA) objective lens. Single molecules have been detected in solution by multi-photon confocal fluorescence. Mertz et al., Opt. Lett. 20: 2532-2534 (1995). In one embodiment of this invention, the nucleotide labels are detected by scanning multi-photon confocal microscopy. Nie et al., Science 266: 1018-1021 (1994).

5.3 Fluorescence Resonance Energy Transfer (FRET)

In an alternative preferred embodiment, FRET technology is used for detection. Fluorescence resonance energy transfer is a distance-dependent interaction between the electronic excited states of two dye molecules in which excitation is transferred from a donor molecule to an acceptor molecule without emission of a photon. FRET is dependent on the inverse sixth power of the intermolecular separation, making it useful over distances comparable with the dimensions of biological macromolecules. Thus, FRET is an important technique for investigating a variety of biological phenomena that produce changes in molecular proximity.

This technique makes use of some unusual properties of dye molecules. In experiments that use fluorescent dyes, the dye molecule is typically excited at one wavelength of light and data are collected at a longer wavelength. However, when two different dye molecules are placed very close together, light can be absorbed by one molecule (the donor), and its emission can then be immediately captured by the adjacent molecule (the acceptor). Light at a still longer wavelength is then emitted from the acceptor. In most applications, the donor and acceptor dyes are different, in which case FRET can be detected by the appearance of sensitized fluorescence of the acceptor or by quenching of donor fluorescence. When the donor and acceptor are the same, FRET can be detected by the resulting fluorescence depolarization.

Donor and acceptor molecules must be in close proximity (typically 10-100 Å), the absorption spectrum of the acceptor must overlap the fluorescence emission spectrum of the donor, and the donor and acceptor transition dipole orientations must be approximately parallel.
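
The inverse sixth power dependence noted above is conventionally written as a transfer efficiency E = 1 / (1 + (r/R0)^6), where R0 (the Förster distance) is the separation at which half of the donor excitations are transferred. The sketch below evaluates this standard relation. The function name and the R0 of 50 Å used in the usage note are assumptions chosen for illustration, not values taken from this specification.

```python
def fret_efficiency(r_angstrom, r0_angstrom):
    """Fraction of donor excitations transferred to the acceptor.

    r_angstrom:  donor-acceptor separation.
    r0_angstrom: Forster distance of the dye pair (assumed known).
    """
    if r_angstrom < 0 or r0_angstrom <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)
```

For an assumed R0 of 50 Å, the efficiency is 0.5 at 50 Å, above 0.98 at 25 Å, and below 0.02 at 100 Å. This steep fall-off is why a donor on the polymerase would excite only a nucleotide held very close to it.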

FRET can be employed to increase signal-to-noise ratios. Additionally, FRET can be used in DNAS to avoid the need for a photolabile linker on the fluorochromes. FRET is commonly used to measure the distance between molecules or parts of them, or to detect transient molecular interactions. In practice, candidate molecules, or different parts of the same molecule, are modified with two different fluorescent groups. The solution is then excited by light corresponding to the shorter excitation wavelength of the two fluorochromes. When the second fluorochrome is in close proximity to the first, it will be excited by the emitted energy of the former and emit at its own characteristic wavelength. The efficiency (quantum yield) of the conversion is directly related to the physical distance between the two fluorochromes. For specific application to DNAS, polymerase molecules are tagged with a fluorochrome that behaves as a photon donor for the modified nucleotides. This would limit their excitation to the active site of the polymerase or any other appropriate part of the polymerase. Such an arrangement would significantly increase the signal-to-noise ratio of nucleotide detection. Moreover, because only nucleotides within the polymerase are excitable, FRET as applied to DNAS would render unnecessary the removal of previously incorporated fluorescent moieties.

FRET has been performed at the single molecule level as required for DNAS [Ha et al., Proc. Natl. Acad. Sci. USA 93: 6264-6268 (1996)], and has been optimized for quantification in fluorescence microscopy. Gordon et al., Biophys. J. 74: 2702-2713 (1998). Optimally the polymerase would be synthesized as a recombinant green fluorescent protein (GFP) fusion protein, as this would eliminate the need to derivatize the polymerase and, unlike most commonly used fluorochromes, GFP is substantially resistant to photobleaching. However, we may find that the optimal arrangement is a chemically modified polymerase to which a synthetic fluorochrome or quantum dot has been attached.

5.4 Surface Plasmon Resonance

In one embodiment, surface plasmon resonance (SPR) spectroscopy is used to detect the incorporation of label into the nucleic acid sample. SPR is used to measure the properties of a solution by detecting the differences in refractive index between the bulk phase of the solution and the evanescent wave region. SPR has recently been used for single molecule imaging of fluorescently labeled proteins on metal by surface plasmons in aqueous solution. Yokota et al., Phys. Rev. Letts. 80: 4606-4609 (1998). This technique involves coating the reaction chamber surface with a thin layer of metal in order to enhance the signal from fluorescently labeled nucleotides.

The DNAS Detector

The detector is a cooled CCD camera fitted with a microchannel plate intensifier. A block diagram of the instrument set-up is presented in FIG. 7. Recently available intensified-cooled CCD cameras have resolutions of at least 1000x1000 pixels. In a preferred embodiment of this invention, an array consists of 100x100 reaction centers. Thus, when the array is imaged onto the face of the camera, each reaction center is allotted approximately 10x10 pixels. DNAS uses a 63x 1.4 NA lens to image an array (100x100 µm grid) of regularly spaced reaction centers, depicted in FIG. 5. Information can be simultaneously recorded from 10,000 reaction centers. This expected resolution is comparable to that achieved in a recent report, whereby TIRFM was used to image a sample of nile red fluorophores, and produced images of a large number of single molecules. A single nile red molecule was unambiguously imaged in an 8x8 pixel square. Dickson et al., Nature 388: 355-358 (1997).

6. The Sequencing Cycle

Housing the array of DNAS reaction centers and mediating the exchange of reagents and buffer is the reaction chamber assembly. The reaction chamber is a sealed compartment with transparent upper and lower slides. The slides are held in place by a metal or plastic housing, which may be assembled and disassembled to allow replacement of the slides. There are two ports that allow access to the chamber. One port allows the input of buffer (and reagents) and the other port allows buffer (and reaction products) to be withdrawn from the chamber. The lower slide carries the reaction center array. In addition, a prism is attached to the lower slide to direct laser light into the lower slide at such an angle as to produce total internal reflection of the laser light within the lower slide. This arrangement allows an evanescent wave to be generated over the reaction center array. A high numerical aperture objective lens is used to focus the image of the reaction center array onto the digital camera system. The reaction chamber housing can be fitted with heating and cooling elements, such as a Peltier device, to regulate the temperature of the reactions. A nucleic acid sample is introduced to the reaction chamber in buffered solution containing all four labeled nucleoside triphosphate terminators.

A schematic representation of the reaction chamber assembly is presented in FIG. 4. Reaction centers are monitored by the microscope system until a majority of reaction centers contain immobilized polymerase bound to the template with a single incorporated labeled-caged terminator nucleotide. The reaction chamber is then flushed with a wash buffer. Specific nucleotide incorporation is then determined for each reaction center. Following detection, the reaction chamber is irradiated to uncage the incorporated nucleotide and flushed with wash buffer once again. The presence of labeled nucleotides is once again monitored before fresh reagents are added to reinitiate synthesis. This second detection verifies that a reaction center is successfully uncaged. The presence of a labeled nucleotide at a reaction center during this step indicates that the reaction center has not been uncaged. Accordingly, the subsequent reading from this reaction center during the next detection step of the cycle will be ignored. Thus, by ignoring the signals from reaction centers that are not successfully uncaged, the methods of the present invention avoid the problems caused by incomplete uncaging in sequencing methods of the prior art. The sequencing cycle outlined above is repeated until a large proportion of reaction centers persistently fail to incorporate or uncage additional nucleotides.
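
The rule for discarding readings after a failed uncaging can be sketched as simple bookkeeping over the cycles. The following is an illustrative sketch only: the data layout (one dictionary per cycle mapping a reaction center identifier to a detected base and a flag from the second, post-uncaging detection) and the function name `assemble_reads` are assumptions introduced here, not structures disclosed in this specification.

```python
def assemble_reads(cycles):
    """Build per-center reads, skipping readings that follow a failed uncaging.

    cycles: list of dicts, one per sequencing cycle, mapping a center id to
            (base, uncaged_ok). base is the nucleotide detected in that cycle
            (or None); uncaged_ok is False when a label was still seen at the
            center after the uncaging step.
    """
    reads = {}
    skip = set()  # centers whose reading in the current cycle must be ignored
    for cycle in cycles:
        skip_next = set()
        for center, (base, uncaged_ok) in cycle.items():
            if center not in skip and base is not None:
                reads.setdefault(center, []).append(base)
            if not uncaged_ok:
                # The old label is still attached, so the next detection at
                # this center would re-read it rather than a new base.
                skip_next.add(center)
        skip = skip_next
    return {center: "".join(bases) for center, bases in reads.items()}
```

For example, if center Y1 reads G in the first cycle but fails to uncage, the G seen again in the second cycle is dropped, and reading resumes in the third cycle provided the second uncaging succeeded.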

Methods for regulating the supply (and removal) of reagents to the reaction centers, as well as the environment of the reaction chamber (e.g., the temperature and oxidative environment), are incorporated into the reaction chamber using techniques common in the art. Examples of this technology are outlined in: Kricka, Clinical Chem. 44: 2008-2014 (1998); see also U.S. Patent No. 5,846,727.

7. Sequence Acquisition Software

The sequence acquisition software acquires and analyzes image data during the sequencing cycle. At the beginning of a sequencing experiment, a bin of pixels containing each reaction center is determined. During each sequencing cycle, four images of the entire array are produced, and each image corresponds to excitation of one of the four fluorescently labeled nucleotide bases (A, C, G, or T). For each reaction center bin, all four images are analyzed to determine which nucleotide species has been incorporated at that reaction center during that cycle. As described above, the reaction center bin corresponding to a certain reaction center contains a 10x10 array of pixels. The total number of photons produced by the single fluorophore in that reaction center is determined by the summation of each pixel value in the array. Typically, 500-1500 photons are emitted from a single fluorophore when excited for 100 milliseconds with a laser producing an intensity of 5 kW/cm² at the surface of the microscope slide. Dickson et al., Science 274: 966-969 (1996). The sums of the reaction center bins from each of the four images are compared, and the image that produces a significant sum corresponds to the newly incorporated base at that reaction center. The images are processed for each of the reaction centers and an array of incorporated nucleotides is recorded. An example of a data acquisition algorithm is provided in FIG. 8. Such processing is done in real time at low cost with modern image processing computers.
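
The bin summation and comparison described above can be sketched as follows. This is a minimal illustration, not the algorithm of FIG. 8: images are represented as plain nested lists of photon counts, and the function names and the significance threshold of 500 counts (taken here as the lower end of the 500-1500 photon range quoted above) are assumptions introduced for the example.

```python
def bin_sum(image, row0, col0, size=10):
    """Sum the pixel values in the size x size bin whose corner is (row0, col0)."""
    return sum(sum(row[col0:col0 + size]) for row in image[row0:row0 + size])


def call_base(images, row0, col0, size=10, threshold=500):
    """Return the base whose image gives the largest significant bin sum.

    images:    dict mapping "A", "C", "G", "T" to the image acquired with the
               matching excitation and barrier filter.
    threshold: assumed minimum photon count for a sum to be significant.
    Returns None when no image reaches the threshold (no incorporation seen).
    """
    sums = {base: bin_sum(img, row0, col0, size) for base, img in images.items()}
    best = max(sums, key=sums.get)
    return best if sums[best] >= threshold else None
```

Applying `call_base` to the bin of every reaction center in turn yields the array of incorporated nucleotides for one cycle. A practical implementation would also need background subtraction and a rule for bins in which two images are both significant, neither of which is attempted here.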

Multiple reads of the reaction center array may be necessary during the detection step to ensure that the four nucleotides are properly distinguished. Exposure times can be as low as 100 msec, and the readout time of the CCD chip can be as long as 250 msec. Thus, the maximum time needed for four complete reads of the array is 1.5 seconds. The total time for a given cycle, including reagent addition, removal, and washes, is certainly less than 10 seconds. Accordingly, a sequencing apparatus consisting of an array of 10,000 reaction centers is able to detect at least 360 bases per site per hour, or 3.6 Megabases per hour of total sequence, as a conservative estimate. This rate is significantly faster than those of traditional sequencing methodologies.
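
The throughput estimate follows directly from the stated timings, and the arithmetic can be checked with a short calculation. The function names below are introduced here for illustration; the input values are those given in the text.

```python
def read_time_ms(n_reads, exposure_ms, readout_ms):
    """Time to acquire n_reads images, each with one exposure and one readout."""
    return n_reads * (exposure_ms + readout_ms)


def throughput(cycle_seconds, n_centers):
    """Return (bases per site per hour, total bases per hour)."""
    per_site = 3600 / cycle_seconds
    return per_site, per_site * n_centers
```

Four reads at 100 msec exposure and 250 msec readout take `read_time_ms(4, 100, 250)` = 1400 msec, within the 1.5 second figure. A 10 second cycle over 10,000 centers gives `throughput(10, 10000)` = (360.0, 3600000.0), i.e. 360 bases per site per hour and 3.6 Megabases per hour in total.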

In addition to short sequencing times, the methods of the present invention do not require the time-consuming processes of sample amplification (cloning, or PCR), and gel electrophoresis. The lack of consumables necessary for sample amplification and electrophoresis, coupled with small reagent volumes (the reaction chamber volume is on the order of 10 microliters) and reduced manual labor requirements, drastically reduces the cost per nucleotide sequenced relative to traditional sequencing techniques.

8. Sequence Analysis Software

Depicted in FIG. 8 is an example of DNAS data acquisition using a 3x3 array of reaction centers. In a typical configuration, however, DNAS would utilize an array of 100x100 reaction centers. In this example, four cycles of DNAS are presented. For each cycle, four images of the array are produced. Each image corresponds to a specific excitation wavelength and barrier filter combination, and thus corresponds to the incorporation of a specific modified nucleotide.

Consider the upper left array (Cycle 1, A). In this case "A", when using the BODIPY set of modified nucleotides, is 3'-O-(DMNPE-(BODIPY493/503))-2'-deoxyATP. Thus the reaction center array is illuminated with 488 nm light from the Ar laser and the image focused through a 503 nm barrier filter. Each of the nine elements in the 3x3 matrix corresponds to a 10x10 pixel area of the CCD camera output. For each of the four images, each reaction center pixel group is analyzed to determine whether the given nucleotide has been incorporated. Thus we see in the example that in Cycle 1, A, modified deoxyATPs were incorporated at reaction centers X1 and Z1. Hence, in the table the first nucleotides recorded for reaction centers X1 and Z1 are A's. If we consider a given reaction center, e.g., reaction center X1, over the four cycles of DNAS, we see that in the first cycle the reaction center has incorporated an A, in the second cycle a C, in the third cycle a C, and in the fourth cycle a T. Hence the sequence fragment of the template DNA bound at reaction center X1 is the reverse complement of 5'-ACCT-3', which is 5'-AGGT-3' (equivalently, 3'-TGGA-5'). The primary sequence exists as an array of sequences, each derived from a single reaction center. The length of each reaction center sequence will depend upon the number of cycles a given center remains active in an experiment. Based on the processivity of cloned polymerases reported in the art, sequence lengths of several hundred to several thousand bases are expected.
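
Because the bases recorded at a reaction center belong to the newly synthesized strand, the template sequence is obtained by taking the reverse complement of the read. A minimal sketch of this conversion is given below; the function name is introduced here for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(read_5to3):
    """Return the template strand (5'->3') for a synthesized read (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(read_5to3.upper()))
```

For the read 5'-ACCT-3', `reverse_complement("ACCT")` returns `"AGGT"`, the template strand written 5' to 3'. The same strand written 3' to 5' reads TGGA.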

In one embodiment of the present invention, a nucleic acid sample is sheared prior to inclusion in a reaction center. Once these fragments have been sequenced, sequence analysis software is used to assemble their sequences into contiguous stretches. Many algorithms exist in the art that can compare sequences and deduce their correct overlap. New algorithms have recently been designed to process large amounts of sequence data from shotgun (random) sequencing approaches.
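
The idea of deducing overlaps between fragment sequences can be illustrated with a simple greedy merge. This sketch is not one of the algorithms cited in this specification. It assumes error-free reads that all come from the same strand, ignores repeats, and uses names (`overlap_length`, `greedy_assemble`, `min_overlap`) introduced here for illustration.

```python
def overlap_length(a, b, min_overlap=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pair overlaps sufficiently
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs
```

Given the hypothetical fragments `["ATGGCGT", "GCGTACC", "TACCTTA"]`, the function returns the single contig `["ATGGCGTACCTTA"]`. The pairwise comparison scales quadratically with the number of fragments, which is one reason production assemblers use indexed or graph-based methods instead.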

In one preferred embodiment, an algorithm initially reduces the amount of data to be processed by using only two smaller sequences derived from either end of the sequence deduced from a single reaction center in a given experiment. This approach has been proposed for use in shotgun sequencing of the human genome. Rawlinson et al., J. Virol. 70: 8833-8849 (1996); Venter et al., Science 280: 1540-1542 (1998). It employs algorithms developed at The Institute for Genomic Research (TIGR). Sutton et al., Genome Sci. Technol. 1: 9 (1995).

In an alternative preferred embodiment, raw data is compressed into a fingerprint of smaller words (e.g., hexanucleotide restriction enzyme sites) and these fingerprints can be compared and assembled into larger continuous blocks of sequence (contigs). This technique is similar to that used to deduce overlapping sequences after oligonucleotide hybridization. Idury and Waterman, J. Comput. Biol. 2: 291-306 (1995). Yet another embodiment uses existing sequence data, from genetic or physical linkage maps, to assist the assembly of new sequence data from whole genomes or large genomic pieces.
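
The compression of a read into a fingerprint of hexanucleotide words can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited reference: only three well-known six-base restriction sites are used (EcoRI, BamHI and HindIII), the fingerprint is reduced to the spacing between neighboring sites, and all function names are introduced here.

```python
# Recognition sequences of three common six-base restriction enzymes.
SITES = {"GAATTC": "EcoRI", "GGATCC": "BamHI", "AAGCTT": "HindIII"}


def fingerprint(seq):
    """Return [(enzyme, position), ...] for every site found, left to right."""
    hits = []
    for i in range(len(seq) - 5):
        name = SITES.get(seq[i:i + 6])
        if name:
            hits.append((name, i))
    return hits


def spacing_signature(hits):
    """Reduce a fingerprint to (enzyme, next enzyme, distance) triples."""
    return {(a, b, pb - pa) for (a, pa), (b, pb) in zip(hits, hits[1:])}


def shared_signature(seq1, seq2):
    """Triples common to both reads; a non-empty result suggests overlap."""
    return spacing_signature(fingerprint(seq1)) & spacing_signature(fingerprint(seq2))
```

Two reads that share an EcoRI site followed 10 bases later by a BamHI site will both contain the triple `("EcoRI", "BamHI", 10)`, flagging them as candidates for a full base-by-base comparison. Comparing small fingerprints first avoids aligning every pair of reads in full.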

9. Utility of DNAS

Clinical Applications

The importance of genetic diagnoses in medicine cannot be overstated. Most obvious is the use of techniques that can identify carriers of harmful genetic traits for pre-natal and neo-natal diagnosis. Currently, biochemical tests and karyotype analyses are the most commonly used techniques, but these have clear limitations. Biochemical tests are only useful when there is a change in the activity or levels of an enzyme or protein which has been associated with the disease state and for which a specific test has been determined. Even when a protein has been attributed to a disease state, the development of such reagents can be difficult, expensive and time consuming. Karyotypic analyses are only useful for identifying gross genetic disorders such as ploidy, translocations and large deletions. Although it is theoretically possible to determine whether individuals possess defective alleles of a given gene by current DNA techniques, effective screening programs are only currently practicable in cases in which a common mutation is associated with the disease and its presence can be determined by non-sequencing techniques.

The methods of the present invention permit large amounts of DNA sequence data to be determined from an individual patient with little technical effort, and without the need to clone patient DNA or amplify specific sequences by PCR. Single molecules can be sequenced directly from a simple DNA preparation from the patient's blood, tissue samples or from amniotic fluid. Accordingly, DNAS can be used for clinical diagnosis of genetic disorders, traits or other features predictable from primary DNA sequence information, such as prenatal, neo-natal and post-natal diagnoses or detection of congenital disorders; pathological analysis of somatic disease caused by genetic recombination and/or mutation; identification of loss of heterozygosity, point mutations, or other genetic changes associated with cancer, or present in pre-cancerous states.

The methods of the present invention can also be used to identify disease-causing pathogens (e.g., viral, bacterial, fungal) by direct sequencing of affected tissues.

Functional Gene Identification

Large scale genetic screens for genes involved in certain processes, for example during development, are now common and are applied to vertebrates with large genomes such as the zebrafish (Danio rerio) and the amphibian Xenopus tropicalis. Attempts to clone mutant genes in mouse and human have been lengthy and difficult, and even in more genetically amenable organisms like zebrafish it is still time consuming and difficult.

Since the methods of the present invention permit the sequencing of an entire genome the size of a mammal's in a short period of time, identification of mutant genes can be achieved by bulk sequence screening, i.e., sequencing whole genomes or large genomic segments of a carrier, and comparing them to the sequence of whole genomes or large genomic segments of different members of a given species.

Similarly, the methods of the present invention allow facile sequencing of entire bacterial genomes. Sequence information generated in this fashion can be used for rapid identification of genes encoding novel enzymes from a wide variety of organisms, including extremophilic bacteria.

In addition, the methods of the present invention can also be used for assessment of mutation rates in response to mutagens and radiation in any tissue or cell type. This technique is useful for optimization of protocols for future mutation screens.

Analysis of Genetic Alterations in Tumors

Many cancers, possibly all cancers, begin with specific alterations in the genome of a cell or a few cells, which then grow unchecked by the controls of normal growth. Much of the treatment of cancers is dependent upon the specific physiological response of these abnormal cells to particular agents.

The method of the present invention will allow the rapid generation of a genetic profile from individual tumors, allowing researchers to follow precisely what genetic changes accompany various stages of tumor progression. This information will also permit the design of specific agents to target cancer cells for tailor-made assaults on individual tumors.

Analysis of Genetic Variation

Many important physiological traits, such as control of blood pressure, are controlled by a multiplicity of genetic loci. Currently, these traits are analyzed by quantitative trait linkage (QTL) analysis. Generally, in QTL analysis a set of polymorphic genetic linkage markers is utilized on a group of subjects with a particular trait, such as familial chronic high blood pressure. Through an analysis of the linkage of the markers with the trait, a correlation is drawn between a set of particular loci and the trait. Usually a handful of loci contribute the majority of the trait and a larger group of loci will have minor effects on the trait.

The methods of the present invention permit rapid whole genome sequencing. Thus, using the methods of the present invention, QTL analysis is executed at a very fine scale and, with a large group of subjects, all of the major loci contributing to a given trait and most of the minor loci are easily identified.

Moreover, the method of the present invention can be used for constructing phylogenetic trees and/or kinship relationships by estimation of previous genomic recombinations (e.g., inversion, translocation, deletion, point mutation), or by previous meiotic recombination events affecting the distribution of polymorphic markers. The method of the present invention can be used to identify mutations or polymorphisms, with the aim of associating genotype with phenotype.

The method of the present invention can also be used to identify the sequence of those mutant or polymorphic genes resulting in a specific phenotype, or contributing to a polygenic trait.

Agricultural Applications

Agricultural efficiency and productivity are increased by generating breeds of plants and animals with optimal genetic characteristics. The methods of the present invention can be used, for example, to reveal genetic variation underlying both desirable and undesirable traits in agriculturally important plants and animals. Additionally, the methods of the present invention can be used to identify plant and animal pathogens, and to design methods of combating them.

Forensic Applications

The methods of the present invention can be used in criminal and forensic investigations, or for the purpose of paternity/maternity determination, by genetically identifying samples of blood, hair, skin and other tissues to unambiguously establish a link between a suspected individual and forensically relevant samples. The results obtained will be analogous to results obtained with current genetic fingerprinting techniques, but will provide far more detailed information and will be less likely to provide false positive identification. Moreover, the identity of individuals from a mixed sample can be determined.

Research Applications

The methods of the present invention can be used for several research applications, such as the sequencing of artificial DNA constructs to confirm/elicit their primary sequence, and/or to isolate specific mutant clones from random mutagenesis screens; the sequencing of cDNA from single cells, whole tissues or organisms from any developmental stage or environmental circumstance in order to determine the gene expression profile from that specimen; and the sequencing of PCR products and/or cloned DNA fragments of any size isolated from any source.

The methods of the present invention can also be used for the sequencing of DNA fragments generated by analytical techniques that probe higher order DNA structure by their differential sensitivity to enzymes, radiation or chemical treatment (e.g., partial DNase treatment of chromatin), or for the determination of the methylation status of DNA by comparing sequence generated from a given tissue with or without prior treatment with chemicals that convert methyl-cytosine to thymine (or other nucleotide) as the effective base recognized by the polymerase. Further, the methods of the present invention can be used to assay cellular physiology changes occurring during development or senescence at the level of primary sequence.
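The methylation comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the disclosed method: the function name `methylated_positions` is hypothetical, and it assumes the treated and untreated reads are already aligned, of equal length, and that the chemical treatment causes methylated cytosines to be read as thymine.

```python
from typing import List


def methylated_positions(untreated: str, treated: str) -> List[int]:
    """Return 0-based positions read as C without treatment and as T with it.

    Assumes both reads cover the same region and are already aligned.
    """
    if len(untreated) != len(treated):
        raise ValueError("reads must be aligned and of equal length")
    return [
        i
        for i, (u, t) in enumerate(zip(untreated.upper(), treated.upper()))
        if u == "C" and t == "T"
    ]


if __name__ == "__main__":
    # Hypothetical reads: only the cytosine at index 4 changes after treatment.
    print(methylated_positions("ACGTCCGA", "ACGTTCGA"))
```

In practice the two reads would first need alignment, and sequence differences arising from other causes would have to be excluded; neither step is modeled here.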

The methods of the present invention can also be used for the sequencing of whole genomes or large genomic segments of transformed cells to select individuals with the desired integration status. For example, DNAS can be used for the screening of transfected embryonic stem cell lines for correct integration of specific constructs, or for the screening of organisms such as Drosophila, zebrafish, mouse, or human tissues for specific integration events.

Additionally, the method of the present invention can be used to identify novel genes through the identification of conserved blocks of sequence or motifs from evolutionarily divergent organisms. The method of the present invention can also be used for identification of other genetic elements (e.g., regulatory sequences and protein binding sites) by sequence conservation and relative genetic location.

The details of one or more embodiments of the invention have been set forth in the accompanying description above. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. Other features, objects, and advantages of the invention will be apparent from the description and from the claims. In the specification and the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All patents and publications cited in this specification are incorporated by reference.

The following EXAMPLES are presented in order to more fully illustrate the preferred embodiments of the invention. These EXAMPLES should in no way be construed as limiting the scope of the invention, as defined by the appended claims.

Example 1: Reaction chamber substratum preparation, Nickel/chelator conjugate.

The fundamental unit of the DNAS methodology is the reaction center (FIG.). The reaction center comprises a polymerase molecule bound to a template nucleic acid molecule, and tethered to a fixed location on a transparent substrate via a high affinity interaction between groups attached to the polymerase and substrate respectively. In one configuration, DNAS reactions occur in a reaction chamber whose base, the substrate, is made of glass (SiO2) modified so that polymerase molecules can be attached in a regular array. Using electron beam lithography a square array of dimensions 100 µm × 100 µm is generated. Rai-Choudhury, Handbook of Microlithography, Micromachining, and Microfabrication, Volume I: Microlithography, Volume PM39, SPIE Press (1997). A small spot, <50 nm in diameter, is etched at every 1 µm interval in resist material covering the glass slide. This etching exposes the glass for subsequent derivatization in which a nitrilotriacetic acid group is covalently bound by way of silane chemistry. Schmid, et al., Anal Chem 69: 1979-1985 (1997). Each nitrilotriacetic acid group serves as a chelator for a Ni2+ ion. The coordinated Ni2+ ion can then be bound by hexahistidine moieties engineered into a variety of polymerase molecules. Thus an array of 10,000 polymerase molecules is generated in a 100 µm × 100 µm array, which will be observed in an optical microscope system. In an alternative configuration biotin is covalently attached to each spot by way of silane chemistry. The biotin is then bound by streptavidin moieties covalently linked to, or engineered into, the polymerase molecules.
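The figure of 10,000 reaction centers follows from the array geometry: a square 100 µm on a side with one spot per 1 µm interval gives 100 spots per side. The short Python sketch below reproduces that arithmetic. The function name `sites_per_array` and its parameters are introduced here for illustration only, and the count assumes exactly one spot per pitch interval along each side.

```python
def sites_per_array(side_um: float, pitch_um: float) -> int:
    """Count attachment spots on a square grid, one spot per pitch interval."""
    if side_um <= 0 or pitch_um <= 0:
        raise ValueError("side and pitch must be positive")
    per_side = round(side_um / pitch_um)  # spots along one edge of the square
    return per_side * per_side


if __name__ == "__main__":
    # Dimensions taken from the example: 100 um square, 1 um spacing.
    print(sites_per_array(100.0, 1.0))
```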

Example 2: Microfluidic reaction chamber allows rapid exchange of reactants, buffer and products.

The reaction chamber is a device that houses the array of reaction centers and regulates the environment. As described in Example 1, the substrate is a glass microscope slide prepared with a regular microscopic array of covalently attached moieties. A prism is attached to the slide on the surface opposite to the array. The prism directs laser light into the slide at such an angle that total internal reflection of the laser light is achieved within the slide. Under this condition an evanescent wave is generated over the array during the sequencing reaction cycle. The slide and prism are fixed into an assembly, which will generate a sealed chamber with a volume of 1-10 µl (FIG.). Reagents and buffer are pumped into and out of the chamber through microfluidic ports on either side of the chamber. Complete exchanges of volume take place within 1 second and are mediated by electronically controlled valves and pumps.

Example 3: Preparation of labeled-caged chain terminating nucleotides

Preparation of fluorochrome-photolabile linker conjugate

Fluorochrome-linked 2-nitrobenzyl derivatives are first generated as described by Anasawa, et al., WO 98/33939. Alternatively, a sensitized photolabile linker using DMNPE (caging kit, Catalog Number D-2516, Molecular Probes, Inc.) may be first attached to the 3' group of the dNTP as detailed below and then linked to a fluorochrome using succinimide chemistry or otherwise. It may prove optimal to use a linker of variable length between the fluorochrome and the caging group to reduce possible steric hindrance caused by large chemical groups. Brandis, et al., Biochemistry 35: 2189-2200 (1996).

Preparation of 3'-O-modified-2'-deoxynucleotide analogs

3'-O-modified-2'-deoxynucleotides are synthesized by esterification of the 3'-OH group of dATP, dCTP, dGTP and dTTP. This is accomplished by several general methods. Metzker, et al., Nucleic Acids Res 22: 4259-4267 (1994).

Method 1: First 2'-deoxy-5'-hydroxy-dNTPs are reacted with tert-butyldiphenylsilyl (TBDPS) in the presence of imidazole and dimethylformamide (DMF) producing deoxynucleotides. Then the resulting 2'-deoxy-5'-tert-butyldiphenylsilyl dNTP is dissolved in benzene and mixed with the halide derivative of the fluorochrome-photolabile linker conjugate in the presence of tetrabutylammonium hydroxide (TBAH) (and additionally NaOH in some cases) and stirred at 25°C for 16 hours. The organic layer is extracted with ethyl acetate and washed with deionized water, saturated NaCl, dried over Na2SO4, and purified by flash chromatography using a stepwise gradient (10% methanol/ethyl acetate to 5% methanol/ethyl acetate in 2% intervals).

Method 2: dNTPs prepared as detailed above are reacted directly with the acid anhydride of the fluorochrome-photolabile linker conjugate in dry pyridine in the presence of 4-dimethylaminopyridine (DMAP) at 25°C for 6 hours. The pyridine is then removed under vacuum, the residue is dissolved in deionized water, extracted in chloroform, washed with deionized water, with 10% HCl, saturated NaHCO3, saturated NaCl, dried over Na2SO4, and purified by flash chromatography.

Method 3: 2'-deoxy-5'-tert-butyldiphenylsilyl dNTPs are dried by repeated co-evaporation with pyridine, dissolved in hot DMF and cooled to 0°C in an ice bath. NaOH is dissolved in DMF after washing with dry benzene, then added to the dissolved 2'-deoxy-5'-tert-butyldiphenylsilyl dNTPs and stirred for 45 minutes. A halogenated derivative of the fluorochrome-photolabile linker conjugate in DMF is added and the reaction is stirred for a few hours. The reaction is then quenched with cold deionized water and stirred overnight. The solid obtained is filtered, dried, and recrystallized in ethanol.

Method 4: The 3'-caged NTPs can be prepared directly from the triphosphate according to Hiratsuka et al., Biochim Biophys Acta 742: 496-508 (1983).

In the case of methods 1-3, the resulting compounds are subsequently desilylated by the addition of 1.0 equivalents of tetrabutylammonium fluoride (Bu4NF). The reactions are monitored by thin layer chromatography and after completion (about 15 minutes), the reactions are quenched with 1 equivalent of glacial acetic acid. The solvent is removed, and the residues purified by silica column chromatography. The 5'-triphosphate derivatives of the compounds generated by methods 1-3 are synthesized by the following protocol. The 3'-modified nucleoside (1.0 equivalents) is dissolved in trimethylphosphate under a nitrogen atmosphere.

Phosphorus oxychloride (POCl3) (3.0 equivalents) is added and the reaction is stirred at for 4 hours. The reaction is quenched with a solution of tributylammonium triphosphate equivalents) in DMF and tributylamine. After stirring vigorously for 10 minutes, the reaction is quenched with TEAB pH 7.5. The solution is concentrated, and the triphosphate derivative is isolated by linear gradient (0.01 M to 0.5 M TEAB) using a DEAE cellulose (HCO3- form) column.

The final synthetic products are purified by HPLC, and may be further purified by enzymatic mop-up if necessary [Metzker, et al., Biotechniques 25: 814-817 (1998)], a technique which utilizes the extreme enzymatic preference of many polymerases for deoxynucleotides versus their 3'-blocked counterparts. This probably results from low efficiency of the catalytic formation of the phosphodiester bond when 3'-modified nucleotides are present in the enzyme active site, so that the enzyme tends to rapidly exhaust the normal contaminating deoxynucleotides first. Brandis, et al., Biochemistry 35: 2189-2200 (1996).

In an alternative configuration a photolabile group is attached to the 3'-OH using succinimide or other chemistry and a fluorochrome-photolabile linker conjugate is attached directly to the base of the nucleotide as described by Anasawa et al., WO 98/33939. The 3' attached photolabile group will serve as a reversible chain terminator [Metzker, et al., Nucleic Acids Res 22: 4259-4267 (1994)] and the base-attached fluorochrome-photolabile linker will serve as a removable label. In this configuration, with each cycle both photolabile groups will be removed by photolysis before further incorporation is allowed. Such a configuration may be preferred if it is found that steric hindrance of large fluorochrome groups attached to the 3'-OH of the nucleotide prevents the nucleotide from entering the polymerase.

Example 4: DNAS using a cloned hexahistidine-tagged DNA polymerase, random primed single-stranded DNA template and total internal reflection fluorescence microscopy.

There are two phases to the process.

Phase 1: The first phase is the set-up phase. Hexahistidine-tagged DNA polymerase is washed into the reaction chamber and allowed to attach to the Ni2+-nitrilotriacetic acid array. As an example, hexahistidine-tagged DNA polymerase from Thermus aquaticus might be used. Dabrowski, et al., Acta Biochim Pol 45: 661-667 (1998). Template DNA is prepared by shearing or restriction digestion, followed by denaturation at 95°C and annealing with a mixture of random oligodeoxynucleotide primers. The primed single-stranded DNA template is then pumped into the reaction chamber.

Phase 2: The second phase of the process is the main sequencing cycle. The cycle is as follows:

1. Reaction buffer containing labeled-caged chain-terminating deoxynucleoside triphosphates (dNTP*s) is pumped into the reaction chamber. Reaction buffer consists of: 10 mM Tris HCl, pH 8.3; 50 mM KCl; and 2.5 mM MgCl2. The dNTP*s are each at a concentration of 0.02-0.2 mM.

2. Reaction buffer without the dNTP*s is rinsed through the reaction chamber.

3. For each of the 10,000 reaction centers, the identity of the newly incorporated nucleotide is determined by total internal reflection fluorescence microscopy (TIRFM).

Multiple recordings of the reaction center array are made so that each of the four nucleotides is distinguished. The fluorochromes used have high extinction coefficients and/or high quantum yields for fluorescence. In addition, the fluorochromes have well resolved excitation and/or emission maxima. There are several fluorochrome families that will be used, for example, the BODIPY family of fluorochromes (Molecular Probes, Inc.). Using BODIPY fluorochromes and the photolabile linker 1-(4,5-dimethoxy-2-nitrophenyl)ethyl (DMNPE), the following set of nucleotide analogs can be employed for DNAS:

3'-O-(DMNPE-(BODIPY 493/503)) deoxy ATP
3'-O-(DMNPE-(BODIPY 530/550)) deoxy CTP
3'-O-(DMNPE-(BODIPY 564/570)) deoxy GTP
3'-O-(DMNPE-(BODIPY 581/591)) deoxy TTP

Thus incorporated 'A's are detected with 488 nm Argon-ion laser illumination and a barrier filter centered at 503 nm. Incorporated 'C's, 'G's and 'T's are detected with 532 nm YAG laser illumination and barrier filters centered at 550 nm, 570 nm, and 591 nm respectively.

For each of the separate illumination events an evanescent wave is generated in the reaction center array and the image of the array is focused through the microscope system onto the face of a micro-channel plate intensified cooled-CCD camera.

4. Newly incorporated nucleotides are optically uncaged by illumination with <360 nm light from another YAG laser. This causes dissociation of the DMNPE-BODIPY from the nascent nucleic acid strand, leaving it intact and prepared to incorporate the next nucleotide.

The removal of the fluorescent moiety is verified by TIRFM and the reaction cycle is repeated until nucleotides are no longer incorporated.
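The detection and repeat logic of the cycle above can be sketched for a single reaction center as follows. This Python sketch is illustrative only and is not the disclosed implementation. The laser and barrier filter wavelengths are those given in the text, reading the three 532 nm channels as C, G and T in that order; the names `CHANNELS`, `call_base` and `read_center`, the intensity values, and the `min_signal` threshold are assumptions introduced here.

```python
from typing import Dict, Iterable, Optional

# Detection channels from the text: base -> (laser nm, barrier filter center nm).
CHANNELS = {
    "A": (488, 503),
    "C": (532, 550),
    "G": (532, 570),
    "T": (532, 591),
}


def call_base(intensities: Dict[str, float], min_signal: float = 1.0) -> Optional[str]:
    """Return the base whose channel is brightest, or None if nothing was incorporated.

    `min_signal` is a hypothetical detection threshold in arbitrary units.
    """
    base, signal = max(intensities.items(), key=lambda item: item[1])
    return base if signal >= min_signal else None


def read_center(cycles: Iterable[Dict[str, float]], min_signal: float = 1.0) -> str:
    """Accumulate one base per cycle until no nucleotide is detected."""
    sequence = []
    for intensities in cycles:
        base = call_base(intensities, min_signal)
        if base is None:  # nucleotides are no longer incorporated
            break
        sequence.append(base)
    return "".join(sequence)


if __name__ == "__main__":
    # Hypothetical per-cycle channel intensities for one reaction center.
    recorded = [
        {"A": 9.0, "C": 0.2, "G": 0.1, "T": 0.3},
        {"A": 0.1, "C": 0.2, "G": 7.5, "T": 0.3},
        {"A": 0.1, "C": 0.1, "G": 0.2, "T": 0.1},
    ]
    print(read_center(recorded))
```

A real system would also check, after the uncaging step, that the signal has returned to background before starting the next cycle; that verification is omitted here.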

Typically, the exposure time for each fluorochrome is 100 msec. The readout time of the CCD chip is ~0.25 sec. Hence, the detection step for each cycle takes <1.5 secs. The total volume of the reaction chamber is 1-10 µl. Less than one second is taken to completely flush the reaction chamber. Hence the total time for a given cycle is less than 10 seconds. Therefore, at 10 seconds/cycle each of the 10,000 reaction centers of the DNAS machine is able to deduce at least 360 bases of sequence per hour, corresponding to 3.6 Mbase/hour of sequence deduced by the DNAS machine as a whole.
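The timing and throughput figures in this paragraph can be checked with a short calculation. The Python sketch below is illustrative only; the function names are introduced here, and the detection estimate assumes one exposure and one CCD readout per fluorochrome, which the text does not state explicitly.

```python
def detection_seconds(exposure_s: float, readout_s: float, fluorochromes: int = 4) -> float:
    """Estimate detection time, assuming one exposure and one readout per fluorochrome."""
    return fluorochromes * (exposure_s + readout_s)


def bases_per_hour(cycle_seconds: float, reaction_centers: int = 1) -> float:
    """One base is read per reaction center per cycle."""
    if cycle_seconds <= 0:
        raise ValueError("cycle time must be positive")
    return 3600.0 / cycle_seconds * reaction_centers


if __name__ == "__main__":
    print(detection_seconds(0.1, 0.25))   # stays under the 1.5 s quoted for detection
    print(bases_per_hour(10))             # per reaction center
    print(bases_per_hour(10, 10_000))     # whole 10,000-center array
```

Because the stated cycle time is an upper bound, the resulting 360 bases per hour per center and 3.6 Mbase per hour for the array are lower bounds on throughput.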

Shutters controlling laser illumination, filter wheels carrying the barrier filters and the CCD camera are all controlled by a microcomputer. Image collection and data analysis are all executed by the same microcomputer. Extracted sequence data and array images are stored permanently on CD ROM as they are collected.

EQUIVALENTS

From the foregoing detailed description of the specific embodiments of the invention, it should be apparent that a unique method and apparatus for nucleic acid sequencing has been described. Although particular embodiments have been disclosed herein in detail, this has been done by way of example for purposes of illustration only, and is not intended to be limiting with respect to the scope of the appended claims that follow. In particular, it is contemplated by the inventors that various substitutions, alterations, and modifications may be made to the invention without departing from the spirit and scope of the invention as defined by the claims.

For instance, the choice of the particular polymerase, the particular linkage of the polymerase to the solid support, or the particular nucleotide terminators is believed to be a matter of routine for a person of ordinary skill in the art with knowledge of the embodiments described herein.

Claims

2. The method of claim 1, wherein the 3' blocking group and the labeling group are separated from the incorporated nucleotide by photochemical activation.

ASM Scientific, Inc.
By their patent attorneys
CULLEN CO.
Date: 28 April 2005