WO2007136833A2 - Methods and compositions for aptamer production and uses thereof - Google Patents

Methods and compositions for aptamer production and uses thereof

Info

Publication number
WO2007136833A2
Authority
WO
WIPO (PCT)
Prior art keywords
rna
nucleic acid
oligonucleotides
sequence
assembly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2007/012075
Other languages
French (fr)
Other versions
WO2007136833A3 (en)
Inventor
George Church
Joseph M. Jacobson
Brian M. Baynes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Codon Devices Inc
Original Assignee
Codon Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Codon Devices Inc
Publication of WO2007136833A2
Publication of WO2007136833A3
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/115 Aptamers, i.e. nucleic acids binding a target molecule specifically and with high affinity without hybridising therewith; Nucleic acids binding to non-nucleic acids, e.g. aptamers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1048 SELEX
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111 General methods applicable to biologically active non-coding nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/16 Aptamers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2330/00 Production
    • C12N2330/30 Production chemically synthesised
    • C12N2330/31 Libraries, arrays

Definitions

  • aspects of the invention relate to methods and compositions for the production of aptamers and uses thereof.
  • nucleic acid aptamers involve in vitro binding protocols to select nucleic acid aptamers from random pools of starting nucleic acids. In vitro selection has yielded aptamers that bind to certain nucleic acids, proteins, and small organic compounds.
  • aspects of the invention relate to nucleic acid libraries and host cells that can be used to screen many different nucleic acids in vivo and identify rare nucleic acids that have predetermined structural and/or functional properties of interest. Certain aspects of the invention involve identifying RNA aptamers using in vivo selections or screens. In some embodiments, recombinant cells may include several different in vivo aptamers associated with different reporter readouts.
  • aspects of the invention take advantage of nucleic acid assembly technology that supports the production of any nucleic acid fragments (including large nucleic acid fragments) having a predetermined sequence of interest.
  • Technology described herein allows libraries of the invention to be designed and assembled to include many different predetermined sequences of interest. This assembly technology also allows the production of nucleic acids that can be used to modify host organisms as described herein.
  • RNA libraries that can be used to screen or select for RNA molecules with functional or structural properties in vivo (e.g., RNA aptamers).
  • Other aspects of the invention relate to libraries of RNA molecules having predetermined structural and/or functional properties.
  • aspects of the invention provide compositions and methods for expressing RNA libraries in vivo.
  • Further aspects of the invention provide modified host cells that are adapted to express RNA libraries of interest.
  • a host cell may express a specific polymerase for transcribing the RNA, a ribonuclease that can specifically cleave long RNA transcripts, an RNA polymerase that can incorporate modified nucleotides, or any combination thereof.
  • an RNA aptamer may be identified in an in vitro screen or selection.
  • a pool of RNA molecules may be provided wherein each RNA molecule contains a reporter domain (e.g., a riboregulator sequence) attached to a different unique RNA sequence (e.g., a random RNA sequence) that is an aptamer candidate.
  • the pool of RNA molecules may be screened or selected to identify RNA variants that bind to a molecule (e.g., ligand) of interest.
  • Screening or selection assays may involve different configurations of an assay wherein a candidate RNA that binds to the molecule of interest can be identified when it produces a configuration change in the reporter domain that can be detected in an in vitro assay.
  • the configuration change of a riboregulator domain attached to an aptamer can be detected using any suitable technique.
  • a configuration change that affects the binding properties of the riboregulator e.g., by releasing or sequestering one or more sequences of the riboregulator due to changes in hybridization patterns within the riboregulator domain due to the conformational change
  • candidate RNA molecules or candidate motifs that are identified in an in vitro screen may be tested in vivo using a suitable expression vector for expressing the one or more RNA molecules.
  • the candidate aptamer sequences (and/or variants thereof) may be tested in vivo in association with the same reporter domain that was used for the in vitro screen.
  • the candidate aptamer sequences may be connected to a different reporter domain to test for in vivo properties of interest.
  • the invention provides a method for identifying and producing an RNA aptamer having a property of interest.
  • an RNA aptamer may be obtained from an in vitro and/or in vivo screen or selection.
  • An RNA aptamer may be expressed in a cell thereby providing a novel functional and/or structural property to that cell.
  • the invention provides a method of producing a cell having an altered cell function by introducing into the cell a nucleic acid expressing one or more RNA aptamers having specific binding properties.
  • the method further comprises propagating the cell having an altered function. Accordingly, aspects of the invention relate to engineered cells expressing one or more identified aptamers of interest.
  • an aptamer sequence that was identified using a construct and an assay wherein the aptamer sequence was connected to a reporter domain subsequently may be synthesized, isolated, and/or expressed with or without the reporter domain.
  • the aptamer sequence may be synthesized, isolated, and/or expressed in association (e.g., fused) to one or more different reporter domains.
  • the target nucleic acid (e.g., a nucleic acid expressing one or more RNA molecules of interest) may be amplified, sequenced or cloned after it is made.
  • a host cell may be transformed with the assembled target nucleic acid.
  • the target nucleic acid may be integrated into the genome of the host cell.
  • the one or more RNA molecules of interest (e.g., including one or more aptamers of interest) may be expressed (e.g., under the control of an inducible promoter).
  • One or more expressed RNAs (e.g., RNA aptamers) may be isolated and/or purified (e.g., from a cell or cell lysate).
  • a cell transformed with an assembled nucleic acid may be stored, shipped, and/or propagated (e.g., grown in culture).
  • the invention provides methods of obtaining target nucleic acids that express one or more RNA molecules of interest (including RNA aptamers) by sending sequence information and delivery information to a remote site (e.g., inside or outside the United States).
  • the sequence may be analyzed at the remote site.
  • the starting nucleic acids may be designed and/or produced at the remote site.
  • the starting nucleic acids may be assembled in a reaction involving a combination of ligation and extension techniques at the remote site.
  • the starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled target nucleic acid may be shipped to the delivery address that was provided.
  • nucleic acid preparations of the invention may be screened or selected at a remote location to identify and/or isolate RNA molecules (e.g., aptamers) having structural and/or functional properties of interest.
  • RNA molecules or cells expressing the RNA molecules may be returned from the remote site and/or sent from the remote site to a specified location.
  • Yet further aspects of the invention relate to business methods of marketing one or more methods, systems, and/or automated procedures relating to RNA expression and/or RNA aptamers described herein.
  • FIG. 1 illustrates non-limiting aspects of an embodiment of a polymerase-based multiplex oligonucleotide assembly reaction
  • FIG. 2 illustrates non-limiting aspects of an embodiment of sequential assembly of a plurality of oligonucleotides in a polymerase-based multiplex assembly reaction
  • FIG. 3 illustrates a non-limiting embodiment of a ligase-based multiplex oligonucleotide assembly reaction
  • FIG. 4 illustrates several non-limiting embodiments of ligase-based multiplex oligonucleotide assembly reactions on supports
  • FIG. 5 illustrates a non-limiting embodiment of a decision tree for designing a nucleic acid assembly method
  • FIG. 6 illustrates non-limiting embodiments of nucleic acid constructs encoding a plurality of RNA molecules (e.g., aptamer or aptamer candidate molecules).
  • aspects of the invention relate to nucleic acid libraries and methods and compositions for preparing libraries containing very high numbers of nucleic acid regions. Aspects of the invention involve preparing a library comprising a plurality of cells, each transformed with one or more separate nucleic acid molecules, wherein each nucleic acid molecule comprises a plurality of nucleic acid regions, and wherein each nucleic acid region can be assayed to evaluate one or more structural and/or functional properties. Accordingly, aspects of the invention can be used to assay a large number of nucleic acid regions for the presence of one or more regions having structural and/or functional properties of interest (e.g., one or more nucleic acid aptamers having selective ligand-binding properties).
  • a ligand may be a biological ligand, for example an intracellular or extracellular ligand.
  • a ligand may be associated with disease, health, or other physiological condition. Accordingly, a ligand may be a metabolite, a protein, a nucleic acid, a lipid, a carbohydrate, or other molecule or any combination thereof.
  • ligands may be environmental or industrial molecules (e.g., toxins, minerals, chemical products, etc.) as described in more detail herein.
  • aspects of the invention may be used to identify aptamers that specifically bind to a target ligand.
  • other aptamers may be identified that bind less specifically to a range of ligands having certain common features. Accordingly, certain aptamers may be isolated to bind selectively to a ligand but also to cross-react with other ligands.
  • RNA aptamers may have different levels of specificity (e.g., high, medium, or low) for one or more ligands of interest.
  • aptamers may have affinities for their ligands ranging from nM to 10 μM (e.g., on the order of 1 nM, 10 nM, 100 nM, 500 nM, 1 μM, 10 μM or intermediate affinities). However, in some embodiments, aptamers may have higher or lower affinities for their ligands.
  • each nucleic acid fragment can transcribe an RNA molecule.
  • the RNA molecules can be assayed (e.g., in vivo or in vitro) to determine whether any of them have a structure or function of interest.
  • the invention provides in vivo libraries of transcribed RNA molecules that can be evaluated in vivo for the presence of one or more RNAs having structural and/or functional properties of interest (e.g., one or more RNA aptamers having selective ligand-binding properties under biological conditions).
  • the complexity of a library that comprises a plurality of different vectors wherein each vector encodes a plurality of different RNA molecules may be calculated as the number of transformants multiplied by the number of different RNA-encoding regions on each vector.
  • a library of the invention provides a large number of different RNA variants. Accordingly, methods of the invention can be useful to sample a large number of potential nucleic acid sequence variants.
  • methods of the invention can be useful for identifying one or more nucleic acids (e.g., RNAs) that have structural and/or functional properties of interest under biological conditions.
  • aptamers that are identified through in vitro aptamer screening and selection technology may not maintain their selective ligand-binding properties under biological conditions.
  • the invention provides different cell lines, each comprising a plurality of different aptamers that each recognizes a different ligand and provides a different readout (e.g., signal) when its ligand is present in vivo.
  • These cell lines, and the sets of aptamers that they contain, can be used in medicine, agriculture, industry, mining, or for other applications where the ability to detect and distinguish between different ligands can be very important.
  • a cell containing a plurality of different aptamers that can selectively bind to, and signal the presence of, different metabolic intermediates (e.g., intracellular metabolic intermediates) can be used to dissect and/or monitor metabolic pathways.
  • Such cells, and the sets of aptamers that they contain, also can be used as markers to select and/or screen for enzymes, enzyme variants, or combinations thereof, that can form novel or modified metabolic pathways.
  • aspects of the invention may be used to develop novel or modified metabolic pathways that may catalyze the conversion of a first compound to a second compound, that may degrade or modify certain compounds, that may synthesize certain compounds, or any combination thereof.
  • methods of the invention may be useful to develop pathways for degrading or modifying environmental contaminants to reduce their toxicity.
  • metabolic pathways for generating commercially useful compounds (e.g., ethanol and other commercially useful compounds) may be useful.
  • methods of the invention relate to in vivo aptamer identification and production.
  • a library of RNA molecules may be transcribed and individual RNA molecules with functional and/or structural properties of interest may be identified.
  • nucleic acid regions encoding different RNA molecules may be of any length.
  • a nucleic acid region and the encoded RNA variant may be at least 50 to at least 200 nucleotide bases long.
  • a transcribed variant RNA sequence may be at least 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotide bases long.
  • each of the variant RNA sequences is connected to (e.g., operably connected to) one or more reporter sequences (e.g., a sequence of a riboregulator, antisense RNA, or other reporter sequence).
  • each different variant RNA sequence e.g., each different aptamer or aptamer candidate sequence
  • different variant RNA sequences, or subsets of different variant sequences, each may be connected to different reporter sequences in some embodiments. Accordingly, the length of the synthesized, encoded, and/or transcribed RNAs may be longer than the length of the variant sequence described above since each RNA also may include the length of a reporter sequence attached to it, if present.
  • each vector may encode one or more separate RNA molecules.
  • a single vector encodes about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more RNA molecules.
  • the RNA sequences are all different. However, in some embodiments several identical copies of one or more RNA sequences may be transcribed from a single vector. The sequences encoding the separate RNA molecules may be arranged in a linear array.
  • transcription of one or more RNA molecules may be under the control of the same promoter. In certain embodiments, transcription of one or more RNA molecules may be under the control of separate promoters. In some embodiments, each RNA is transcribed from its own separate promoter. The separate promoters may be separate copies of the same promoter or different promoters. In some embodiments, one or more promoters may be inducible. In some embodiments, RNA transcription may involve transcription enzymes of the host cell.
  • nucleic acid regions encoding separate RNA molecules may be transcribed as a single RNA transcript.
  • a single RNA transcript may include 2 or more RNA molecules.
  • a single RNA transcript may include 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more RNA molecules.
  • the single RNA transcript may include one or more cleavage sites that can be acted on to release one or more individual RNAs from the RNA transcript.
  • one or more enzymes may cut the cleavage sites to release individual RNAs.
  • the cleavage sites may be autocatalytic RNA cleavage sites.
  • RNAs may be transcribed as individual transcripts.
  • a plurality of RNAs may be transcribed in a combination of individual RNA transcripts and RNA transcripts that include two or more RNAs.
  • a nucleic acid sequence encoding an RNA molecule and one or more regulatory sequences may be "operably" joined.
  • the nucleic acid sequence and one or more regulatory sequences may be covalently linked in such a way as to place the transcription of the coding nucleic acid sequence under the influence or control of the regulatory sequences.
  • a promoter region is operably joined to a coding nucleic acid sequence if the promoter region is capable of promoting transcription of that nucleic acid sequence such that the resulting transcript may be an RNA molecule of the invention.
  • 5' non-transcribed regulatory sequences may be used that include a promoter region having a promoter sequence for transcriptional control of the operably joined nucleic acid sequence. Regulatory sequences may also include enhancer sequences or upstream activator sequences as desired.
  • Transcription vectors containing all the necessary elements for transcription are commercially available and known to those skilled in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. Systems and promoters for nucleic acid transcription in mammalian cells are known to those of ordinary skill in the art and available commercially.
  • one or more transcribed RNA sequences may be identical. However, in order to maximize the number of different RNA sequences that may be sampled, each vector may encode a plurality of unique RNA sequences.
  • the vector inserts that encode the unique RNA sequences may be made in a nucleic acid assembly procedure that is designed to generate a linear array of unique sequences.
  • the nucleic acid assembly may be designed to produce a large number of different vector inserts each encoding a plurality of unique RNA sequences that are not repeated in any of the other different vector inserts.
  • multiple copies of each different vector insert may be produced in order to clone the inserts into the vectors and/or in order to transform the host cells.
  • the number of different vector inserts that are designed and assembled may be a function of the expected number of transformants. For example, if a host system can generate up to 10^10, 10^12, 10^14 or more different transformants, the number of different unique vector inserts should be similar or higher. It should be appreciated that if each insert encodes 100 unique RNA sequences, then a library will encode a number of different RNA molecules that is 100 times the number of transformants.
  • RNAs expressed on one vector may differ from each other by 1-5 nucleotide substitutions.
  • RNAs encoded on one DNA insert may have sequences that differ from each other by about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 or more nucleotide substitutions.
  • a library may not sample all different sequence variants that are possible for an RNA of a predetermined length.
  • the sequence variants that are assembled may be determined at the design stage based on one or more factors that could include design and assembly considerations and/or any information that may suggest that certain sequence variants are more likely to result in structural or functional properties of interest.
  • a library may be assembled to include a plurality of identical or similar RNA sequences, and additional sequence variation may be introduced using mutagenesis, error-prone PCR, or other suitable methods. However, such methods introduce sequence variations randomly and are unlikely to generate as much sequence variation as a procedure that involves a design stage at which each unique RNA sequence may be predetermined.
  • nucleic acids encoding RNA molecules may be cloned into vectors.
  • a vector may be any suitable vector.
  • a vector may be a plasmid, a cosmid, a phagemid, a BAC, a YAC, an F factor, or any other suitable prokaryotic, eukaryotic or viral vector.
  • a vector may include an origin of replication and/or one or more selectable markers (e.g., antibiotic resistant markers, etc.) and/or detectable markers (e.g., fluorescent markers, etc.).
  • a vector may be a shuttle vector that is functional in two or more different types (e.g., species) of host cells.
  • vectors or expression systems may be transfected or transformed into a cell or other system capable of transcribing the RNA molecules of the invention.
  • a host cell may be prokaryotic (e.g., bacterial such as E. coli or B. subtilis) or eukaryotic (for example a yeast, mammal, insect, or other eukaryotic cell).
  • host cells may be bacterial cells (e.g., Escherichia coli, Bacillus subtilis, Mycobacterium spp. , M.
  • yeast cells (for example, Saccharomyces spp., Pichia spp., Candida spp., or other suitable yeast species, e.g., S. cerevisiae, C. albicans, S. pombe, etc.)
  • Xenopus cells, mouse cells, monkey cells, human cells, insect cells (e.g., SF9 cells and Drosophila cells), worm cells (e.g., Caenorhabditis spp.), plant cells, or other suitable cells, including for example, transgenic or other recombinant cell lines.
  • a number of heterologous cell lines may be used, such as Chinese Hamster Ovary cells (CHO).
  • when integrating a nucleic acid into a eukaryotic genome (e.g., a mammalian genome), care should be taken to select sites that will allow sufficient expression (e.g., silenced regions of the genome should be avoided, whereas a site comprising an enhancer may be appropriate).
  • RNA polymerase that incorporates one or more modified ribonucleotides (e.g., 2'-O-methyl ribonucleotides) that may stabilize RNA molecules
  • a population of cells may be grown under conditions suitable for the expression of the RNA molecules of the invention. Such conditions may involve providing a suitable nutrient medium to allow growth and proliferation of the cells.
  • the nutrient medium may contain any of the following in an appropriate combination: isotonic saline, buffer, amino acids, serum or serum replacement, and other exogenously added factors.
  • the nutrient medium may contain one or more drugs, such as antibiotics, used for selection of a cell having a particular characteristic.
  • the nutrient medium is serum free. Nutrient medium is commercially available from sources such as Life Technologies (Gaithersburg, MD).
  • a nucleic acid encoding different RNA molecules may be integrated into the host cell genome.
  • a population of transformed host cells can produce many different unique RNA molecules.
  • at least 10^8, 10^10, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20 or more different unique RNA molecules may be transcribed (e.g., each having a unique variant sequence that is an aptamer or aptamer candidate sequence, and, optionally, a reporter sequence attached to the variant sequence).
  • aspects of the invention may involve one or more nucleic acid assembly reactions in order to make the sets of DNA molecules, RNA encoding fragments, aptamer constructs, modified host cells, and/or other nucleic acids that may be used to isolate and/or use RNA molecules having one or more functions of interest.
  • aspects of the invention may be used in conjunction with in vitro and/or in vivo nucleic acid assembly procedures. Non-limiting examples of techniques that may be used to assemble constructs of the invention are described herein and illustrated in FIGS. 1-5.
  • FIG. 6 provides non-limiting examples of nucleic acid constructs (e.g., DNA constructs) that can be used to express a plurality of RNA molecules for in vivo aptamer screening and/or selection.
  • FIG. 6 shows an example of a single construct encoding RNA variants 1, 2, 3, through n.
  • the variants all may be different. However, in some embodiments, two or more copies of one or more variants may be present on a single construct.
  • the RNAs may be random variant sequences that are all aptamer candidates. However, the RNAs may be variants of an aptamer that has a known binding affinity (low or high) for a ligand of interest. In some embodiments, the variants may be different aptamers having different binding affinities for different ligands. Other configurations of different unique RNAs also may be provided as the invention is not limited in this respect.
  • the encoded RNAs may be transcribed from separate promoters (identical, different, or a combination thereof) or transcribed as a single transcript from a single common promoter and processed (e.g., specifically cleaved) to generate the individual RNA molecules, or a combination of individual and common promoters.
  • n may be any integer (e.g., between 5 and 1,000, for example about 10 to about 100, about 50, etc., or smaller or larger).
  • FIG. 6B illustrates an embodiment wherein each RNA is operably associated with a reporter sequence (Reporter A). As illustrated, the same reporter sequence is fused to each different variant RNA.
  • reporter sequences may be fused to different variant RNAs and/or different reporter sequences may be fused to subsets of variant RNAs depending on the desired configuration.
  • FIG. 6B illustrates each reporter sequence located downstream from the variant RNA sequences.
  • the reporter sequences may be downstream, upstream, or a combination thereof when connected to different variant RNAs.
  • one or more variant RNAs may have reporter sequences at both their 5' and 3' ends.
  • a single reporter function may require both 5' and 3' sequences.
  • all of the reporter sequences may be 5' or 3' of the variant RNA sequences (e.g., of the unique variant RNA sequences) in one or more constructs.
  • a single construct may include 5' and 3' reporter sequences and/or a combination thereof.
  • a library of transcribed RNA molecules may be subjected to a screen or selection to identify one or more RNA molecules having a structural and/or functional property of interest.
  • the presence of an RNA of interest in an intracellular library of transcribed RNA molecules may be determined directly or indirectly.
  • the presence of an RNA of interest may be detected directly if the desired function can be directly screened or selected for.
  • a screen or selection may be based on the presence or absence of the enzymatic properties of interest.
  • Such an assay may be an in vivo assay. However, in some embodiments, an in vitro assay may be performed on cell extracts.
  • the presence of an RNA that binds to a ligand with high affinity and/or specificity may be detected directly if the binding to the ligand results in a detectable signal (e.g., an increase or decrease in fluorescence intensity).
  • an RNA aptamer bound to malachite green may fluoresce whereas the dye alone does not fluoresce.
  • a fluorescent ligand or effector may be used and the assay to detect an RNA aptamer that binds to the ligand or effector may involve detecting quenching of the fluorescent signal associated with aptamer binding.
  • the ligand or effector may be toxic and RNA aptamer binding may lower the toxicity.
  • an RNA that cleaves or modifies an effector molecule may be detected if cleavage or modification alters a detectable or selectable property of the ligand or effector.
  • RNA binding to a ligand may not be readily detectable using a direct detection technique.
  • RNA binding to a ligand may be detected indirectly if the candidate RNA is fused to a predetermined reporter RNA domain and binding of the candidate RNA to a ligand affects the structure and properties of the reporter domain to an extent that can be detected using one or more different readouts.
  • a reporter domain may be a riboregulator or switch domain that changes conformation to either expose or sequester an antisense sequence when a ligand binds to the candidate domain.
  • the readout could be any detectable or selectable phenotype that can be regulated by antisense technology.
  • any detectable or selectable phenotype may be used.
  • a readout may be drug resistance or susceptibility (e.g., antibiotic resistance or susceptibility), one or more detectable cell surface properties, a change in fluorescence intensity, auxotrophy, or one or more anabolic or catabolic phenotypes. It should be appreciated that a reporter domain may be fused to each candidate RNA transcribed in a library.
  • a DNA encoding the reporter RNA may be fused to each of the DNAs encoding the different RNA candidates in the library so that each candidate is transcribed along with a reporter domain.
  • a DNA encoding a reporter RNA domain may be fused at the 3' end or the 5' end of each DNA encoding a candidate RNA, and accordingly transcribed candidate RNAs may have a reporter RNA at either their 3' or 5' end.
  • a reporter RNA may be fused at both the 3' and 5' ends. The reporter domains fused at the 3' and 5' ends may control different readouts.
  • different groups of candidate RNAs in the transcribed RNA library may be fused to different reporter RNAs.
  • a reporter RNA domain may be an enzyme that can be disrupted by ligand binding to an adjacent aptamer domain.
  • a reporter RNA domain may be a protein binding domain that can be disrupted by ligand binding to an adjacent aptamer domain.
  • each nucleic acid sequence expressing an RNA molecule has a different reporter system.
  • two or more nucleic acid sequences have the same reporter system.
  • the reporter system is the system disclosed by Smolke et al. (2005, Nature Biotechnology, 23(3):337-343), the entire contents of which are incorporated herein by reference.
  • a ligand responsive riboregulator may be used to regulate the expression of any target transcript in response to any ligand.
  • An example of such a construct may be a riboregulator having an antisense domain that controls gene expression and an aptamer domain that recognizes specific effector ligands.
  • Ligand binding induces a conformational change in the molecule that allows the antisense domain to interact with a target mRNA and inhibit or reduce translation.
  • the aptamer may bind a xanthine derivative, theophylline, causing a conformational change allowing the antisense domain to interact with the mRNA encoding green fluorescent protein (GFP).
  • the reporter system may be a yeast three-hybrid system such as that disclosed by SenGupta D. J. et al. (1996, Proc. Natl. Acad. Sci. USA, 93:8496-8501), the entire contents of which are incorporated herein by reference.
  • a hybrid protein containing a DNA-binding domain (for example, LexA) with RNA-binding domain 1 localizes to the promoter of an appropriate reporter gene.
  • a second hybrid protein containing a transcriptional activation domain with RNA binding domain 2 activates transcription of the reporter gene when in close proximity to the gene's upstream regulatory sequences.
  • a reporter domain may be any domain that is sensitive to (e.g., can be disrupted by) a ligand binding to an aptamer sequence that is fused to the reporter domain.
  • the readout mediated by the reporter domain may involve any detectable or selectable direct or indirect phenotype.
  • the reporter may act via one or more protein, RNA, DNA, and/or other domains to produce a readout.
  • an RNA reporter domain may be a ribozyme, an RNA switch, an antisense RNA, an allosteric effector RNA, an RNA that regulates the expression or activity of another RNA molecule, or an RNA that binds to a detectable compound. Therefore, the reporter domain also may be an aptamer domain.
  • each cell contains only one type of RNA candidate molecule
  • the isolation of a cell that has a selected or screened for phenotype provides the identity of the RNA having a desired structure or function (e.g., enzymatic activity, binding affinity, etc.).
  • the nucleic acid encoding the transcribed RNA may be isolated and sequenced.
  • the isolation of a cell having a selected or screened for phenotype only narrows the identity of the targeted RNA down to one of the different RNAs that are transcribed in that cell.
  • the RNA with the desired structural and/or functional properties may be identified by independently testing each of the different RNAs that are transcribed in the cell.
  • the RNAs may be tested by cloning each one and transcribing them and assaying them individually in vivo.
  • individual RNAs may be synthesized or assembled and tested in vivo or in vitro. It should be appreciated that other techniques may be used to identify the RNA of interest.
  • a cell that is isolated as having a desired phenotype may contain a set of RNA coding sequences that is enriched for one or a few variants.
  • RNAs that have the desired properties may be selected from further rounds of selection and/or screening to enrich host cells for RNAs that have the desired properties.
  • Repeated selection and/or screening may favor cells that have more copies of the RNA of interest relative to other transcribed RNA variants (e.g., due to gene conversion or other process that results in the RNA of interest spreading across the set of transcribed RNAs).
  • a plurality of aptamers may be preselected by their ability to bind to one or more different molecules of interest (e.g., one or more different ligands or effector molecules).
  • a plurality of different aptamers may be transcribed by a single cell line.
  • each cell expresses at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more aptamers.
  • the transcribed aptamers are all different.
  • the transcribed aptamers may include one or more copies of the same aptamer.
  • the transcription of one or more aptamers may be under the control of the same promoter.
  • transcription of one or more aptamer molecules may be under the control of separate promoters.
  • the separate promoters may be separate copies of the same promoter or different promoters.
  • one or more promoters may be inducible.
  • aptamer transcription may involve transcription enzymes of the host cell.
  • transcribed aptamers may be of different lengths.
  • an aptamer may be at least 50 to at least 200 nucleotide bases long.
  • an aptamer may be at least 50, 55, 60, 65, 70, 75, 80, 85, 90, 95,
  • each transcribed aptamer may be of a different length.
  • certain aptamers may be transcribed as a single RNA chain. A single transcribed RNA may include two or more aptamers.
  • a single transcribed RNA may include 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 aptamers.
  • the single RNA transcript may include one or more cleavage sites that can be acted on to release one or more different aptamers from the RNA transcript.
  • one or more enzymes may cut the cleavage sites to release individual aptamers.
  • the cleavage sites may be autocatalytic RNA cleavage sites.
  • aptamers may be transcribed as individual transcripts.
  • a plurality of aptamers may be transcribed in a combination of individual aptamer transcripts and RNA transcripts that include two or more aptamers.
  • one or more aptamer coding sequences may be integrated into the genome of a host cell.
  • an aptamer may be transcribed fused to a reporter RNA.
  • the reporter RNA may produce a signal (either directly or indirectly) if the aptamer binds to its ligand.
  • an aptamer readout using a reporter RNA could be drug resistance or susceptibility, a cell surface property, a change in fluorescence intensity, auxotrophy, or other anabolic or catabolic phenotypes.
  • an RNA aptamer may be identified in an in vitro screen or selection.
  • a pool of RNA molecules may be provided wherein each different RNA molecule contains the same reporter RNA domain (e.g., a riboregulator sequence) attached to a different unique RNA sequence (e.g., a random RNA sequence) that is an aptamer candidate.
  • the pool of RNA molecules may be screened or selected to identify RNA variants that bind to a molecule of interest.
  • screening or selection assays may involve different configurations of an assay wherein a candidate RNA that binds to the molecule of interest can be identified when it produces a configuration change in the riboregulator.
  • the configuration change of the riboregulator can be detected using any suitable technique.
  • a configuration change that affects the binding properties of the riboregulator (e.g., by exposing or hiding one or more sequences of the riboregulator due to changes in hybridization patterns within the riboregulator domain due to the conformational change) may be detected using a binding assay.
  • the binding assay may involve a nucleic acid tag that is complementary to one or more of the riboregulator sequences that are either exposed or hidden (e.g., sequestered) as a result of a conformational change associated with binding or release of the ligand of interest by the aptamer portion of the RNA.
  • the complementary nucleic acid tag may be an RNA (e.g., an mRNA), DNA, PNA, or any other nucleic acid molecule that binds to a sequence on the riboregulator.
  • the tag may be attached to a solid support (e.g., solid phase) or in a liquid or in a gel.
  • the complementary nucleic acid tag may be immobilized before or after exposure to the pool of RNA molecules.
  • ligand e.g., theophylline
  • the pool of RNA molecules can be exposed to the tag in presence of the ligand.
  • After immobilization of the tag (and any bound RNA molecules), unbound or weakly bound molecules may be washed away. RNA molecules that bind then may be eluted by removing the ligand.
  • initial binding can be performed in the absence of ligand, and, after washing, bound aptamers can be eluted by adding ligand.
  • a combination of sequential binding and elution in the presence or absence of ligand may be performed.
  • a combination of bindings and elutions to the first and second nucleic acid tags in the absence and presence of ligand may be used to isolate aptamers of interest.
  • candidate RNA molecules or candidate motifs that are identified in an in vitro screen may be tested in vivo using a suitable expression vector for expressing the one or more RNA molecules.
  • the candidate aptamer sequences may be tested in vivo in association with the same reporter domain that was used for the in vitro screen.
  • the candidate aptamer sequences may be connected to a different reporter domain to test for in vivo properties of interest.
  • in vitro techniques may be used to screen variants of one or more aptamers identified in vitro.
  • a library containing a large number of variants of an RNA aptamer identified in vitro may be screened to identify variants that have higher affinity, that bind in vivo, or a combination thereof.
  • variants each may have one or more sequence changes (e.g., 1-10, 10-50, 50-100, etc.) relative to an initial RNA aptamer.
  • an in vivo technique of the invention may be used to screen variants of an RNA aptamer that was identified in a previous in vivo screen or selection. It should be appreciated that information obtained from a first in vitro or in vivo screen or selection may be used to identify important features for binding to a particular ligand.
  • a subsequent library may be designed to include variants at all other positions (and not the ones identified as important or required) in order to identify variants that increase the core binding associated with the important or required positions.
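By way of illustration only, the design of a follow-up library that varies every position except those identified as important may be sketched as follows. The function name, the example sequence, and the choice of single-base substitutions are assumptions made for this sketch and are not taken from the description.

```python
RNA_BASES = "ACGU"


def single_substitution_variants(aptamer, fixed_positions):
    """Return every single-base variant of `aptamer` that leaves the
    0-based positions in `fixed_positions` unchanged."""
    variants = []
    for i, base in enumerate(aptamer):
        if i in fixed_positions:
            continue  # position treated as important for binding; keep it
        for alt in RNA_BASES:
            if alt != base:
                variants.append(aptamer[:i] + alt + aptamer[i + 1:])
    return variants


# Hypothetical 8-nucleotide aptamer core with positions 2-4 held constant.
library = single_substitution_variants("GGCAUACC", fixed_positions={2, 3, 4})
```

A library with more than one change per variant could be built by applying the same function repeatedly, at the cost of a rapidly growing number of candidates.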
  • RNA pools or for in vitro or in vivo libraries of candidate RNA aptamers are described in the literature, for example in Carothers et al. (J. Am. Chem. Soc. 2004, 126, 5130-5137), the entire contents of which are incorporated herein by reference.
  • RNA aptamers with high affinity for large, complex, flexible, and/or hydrophobic molecules may be rarer than RNA aptamers for smaller and/or charged ligands such as GTP and/or theophylline.
  • aspects of the invention may be useful to screen large numbers of RNA molecules and identify RNA aptamers having affinity for any ligand of interest (e.g., any ligand described herein).
  • libraries containing relatively long RNA candidate molecules may be screened to identify RNA aptamers that bind to certain ligands (e.g., large, complex, flexible, hydrophobic, or any combination thereof).
  • relatively long RNA sequences may be important to allow for complex configurations that can bind (e.g., with high affinity) to certain
  • libraries of previously known aptamers may be prepared to include a variable length linker (on one or both sides of the original aptamers) so that more complex RNA including the original aptamers can be tested.
  • These libraries can be screened in vivo or in vitro as described herein (e.g., attached to a reporter domain such as a riboregulator domain).
  • aspects of the invention provide sets of aptamers that can detect the presence of one or more different ligands or effector molecules.
  • an aptamer set may be provided and transcribed in a host cell (e.g., from a transcription template that is in a vector or that is integrated into the genome of the host cell).
  • any additional RNAs and/or proteins that may be required for the different readouts may be transcribed in the host cell.
  • Aptamer sets of the invention may be used to detect the presence of any type of ligand, including for example, different analytes, metabolic intermediates and products, toxins, environmental contaminants and pollutants, and any other type of ligand and/or effector molecule.
  • an environmental pollutant may be a water, air, or soil pollutant.
  • Water pollutants may be compounds such as organic and inorganic chemicals, for example, heavy metals, petrochemicals, chloroform, and different types of bacteria. Water pollution also may occur in the form of thermal pollution and dissolved oxygen depletion.
  • Air pollutants may be compounds such as carbon monoxide, sulfur dioxide, chlorofluorocarbons (CFCs), and nitrogen oxides.
  • Soil pollutants may be compounds such as hydrocarbons, heavy metals, methyl tert-butyl ether (MTBE), herbicides, pesticides and chlorinated hydrocarbons, and others.
  • detection methods may be important for detecting changes in pollutants after natural disasters such as hurricanes or flooding.
  • Compositions and methods of the invention also may be useful to identify the presence of one or more metabolic intermediates and/or products.
  • detection may be performed in the natural cellular environment in a live cell rather than in a cellular extract.
  • metabolic pathways may be studied and individual steps may be identified by providing, in vivo, a plurality of different aptamers that are responsive to different intermediate compounds.
  • an aptamer set containing different aptamers that are responsive to different substrates, metabolic intermediates, and/or desired end products may be used as a reporter system (e.g., either on a plasmid or integrated into the genome of a host cell) in techniques designed to evolve or select novel biosynthetic pathways.
  • An aptamer set that is selected may include one or more copies of aptamers that are selective for intermediates or analytes that are expected to be produced in a novel biosynthetic pathway of interest.
  • an appropriate readout from an aptamer set may be used to indicate that a particular combination of enzymes and/or enzyme variants may have a metabolic effect that is desired.
  • a nucleic acid construct encoding an aptamer set of interest may be transcribed in vitro.
  • a set of RNA aptamers that are responsive to different ligands of interest may be assembled in vitro.
  • Sets of aptamers that bind specifically to a plurality of different ligands also may be used in vitro.
  • the aptamers may be used in an in vitro assay to detect any one or more of a plurality of different ligands (e.g., metabolic intermediates, toxins, environmental pollutants, contaminants, pathogens, analytes, etc.).
  • one or more stabilizing residues e.g., one or more 2'-O-methyl ribonucleotides or other stabilizing ribonucleotides
  • aspects of the invention may involve one or more nucleic acid assembly reactions in order to make the sets of DNA molecules.
  • FIG. 5 illustrates a method for assembling a nucleic acid in accordance with one embodiment of the invention.
  • sequence information may be the sequence of a predetermined target nucleic acid that is to be assembled.
  • the sequence may be received in the form of an order from a customer.
  • the order may be received electronically or on a paper copy.
  • the sequence may be received as a nucleic acid sequence (e.g., DNA or RNA).
  • the sequence may be received as a protein sequence.
  • the sequence may be converted into a DNA sequence. For example, if the sequence obtained in act 500 is an RNA sequence, the Us may be replaced with Ts to obtain the corresponding DNA sequence. If the sequence obtained in act 500 is a protein sequence, it may be converted into a DNA sequence using appropriate codons for the amino acids.
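The two conversions described above may be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the codon table lists one valid standard-code codon per amino acid chosen arbitrarily, so it does not reflect the codon bias of any organism.

```python
# Illustrative table only: one valid standard-genetic-code codon per amino
# acid, chosen arbitrarily. It is NOT a codon-usage table for any organism.
CODON_FOR = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def rna_to_dna(rna):
    """Replace each U with T to obtain the corresponding DNA sequence."""
    return rna.upper().replace("U", "T")


def protein_to_dna(protein):
    """Back-translate a protein sequence using one codon per amino acid."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())
```

In practice the single-codon table would be replaced by a selection step that weighs the factors discussed in the following paragraphs.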
  • In selecting codons for each amino acid, consideration may be given to one or more of the following factors: i) using codons that correspond to the codon bias in the organism in which the target nucleic acid may be expressed, ii) avoiding excessively high or low GC or AT contents in the target nucleic acid (for example, above 60% or below 40%; e.g., greater than 65%, 70%, 75%, 80%, 85%, or 90%; or less than 35%, 30%, 25%, 20%, 15%, or 10%), and iii) avoiding sequence features that may interfere with the assembly procedure (e.g., the presence of repeat sequences or stem loop structures). However, these factors may be ignored in some embodiments as the invention is not limited in this respect.
  • a DNA sequence determination may omit one or more steps relating to the analysis of the GC or AT content of the target nucleic acid sequence (e.g., the GC or AT content may be ignored in some embodiments) or one or more steps relating to the analysis of certain sequence features (e.g., sequence repeats, inverted repeats, etc.) that could interfere with an assembly reaction performed under standard conditions but may not interfere with an assembly reaction including one or more concerted assembly steps.
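As a rough sketch of the optional sequence checks discussed above, the following functions compute GC content against the 40% to 60% window mentioned in the text and flag exact direct repeats. The function names and the repeat length `k` are assumptions for illustration; inverted repeats and stem loop structures are not covered by this sketch.

```python
def gc_fraction(dna):
    """Fraction of G and C bases in a non-empty DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def gc_within_limits(dna, low=0.40, high=0.60):
    """True if GC content lies inside the window (default 40% to 60%)."""
    return low <= gc_fraction(dna) <= high


def has_direct_repeat(dna, k=8):
    """True if any k-mer occurs more than once (k is an assumed cutoff)."""
    seen = set()
    for i in range(len(dna) - k + 1):
        kmer = dna[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False
```

As noted in the description, some embodiments may skip these checks entirely, for example when concerted assembly steps tolerate such features.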
  • the sequence information may be analyzed to determine an assembly strategy. This may involve determining whether the target nucleic acid will be assembled as a single fragment or if several intermediate fragments will be assembled separately and then combined in one or more additional rounds of assembly to generate the target nucleic acid.
  • input nucleic acids e.g., oligonucleotides
  • the sizes and numbers of the input nucleic acids may be based in part on the type of assembly reaction (e.g., the type of polymerase-based assembly, ligase-based assembly, chemical assembly, or combination thereof) that is being used for each fragment.
  • the input nucleic acids also may be designed to avoid 5' and/or 3' regions that may cross-react incorrectly and be assembled to produce undesired nucleic acid fragments. Other structural and/or sequence factors also may be considered when designing the input nucleic acids. In certain embodiments, some of the input nucleic acids may be designed to incorporate one or more specific sequences (e.g., primer binding sequences, restriction enzyme sites, etc.) at one or both ends of the assembled nucleic acid fragment.
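One simple way to picture the design of input nucleic acids is to tile the target with overlapping oligonucleotides on alternating strands. The sketch below is an assumption-laden illustration, not the design procedure of the description: the function names, the default length of 50, and the default overlap of 20 are hypothetical, and no check for cross-reacting ends is performed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def tile_oligos(target, length=50, overlap=20):
    """Split `target` into oligos that overlap their neighbours by
    `overlap` bases. Odd-numbered oligos are reverse-complemented so that
    adjacent oligos can anneal through the shared region. The final oligo
    may be shorter than `length`."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and less than length")
    step = length - overlap
    oligos = []
    start = 0
    while True:
        piece = target[start:start + length]
        if len(oligos) % 2 == 1:
            piece = reverse_complement(piece)  # bottom-strand oligo
        oligos.append(piece)
        if start + length >= len(target):
            break
        start += step
    return oligos
```

The appropriate length and overlap would depend on the type of assembly reaction (polymerase-based, ligase-based, chemical, or a combination) chosen for each fragment.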
  • the input nucleic acids are obtained. These may be synthetic oligonucleotides that are synthesized on-site or obtained from a different site (e.g., from a commercial supplier). In some embodiments, one or more input nucleic acids may be amplification products (e.g., PCR products), restriction fragments, or other suitable nucleic acid molecules. Synthetic oligonucleotides may be synthesized using any appropriate technique as described in more detail herein. It should be appreciated that synthetic oligonucleotides often have sequence errors. Accordingly, oligonucleotide preparations may be selected or screened to remove error-containing molecules as described in more detail herein.
  • an assembly reaction may be performed for each nucleic acid fragment.
  • the input nucleic acids may be assembled using any appropriate assembly technique (e.g., a polymerase-based assembly, a ligase-based assembly, a chemical assembly, or any other multiplex nucleic acid assembly technique, or any combination thereof).
  • An assembly reaction may result in the assembly of a number of different nucleic acid products in addition to the predetermined nucleic acid fragment. Accordingly, in some embodiments, an assembly reaction may be processed to remove incorrectly assembled nucleic acids (e.g., by size fractionation) and/or to enrich correctly assembled nucleic acids (e.g., by amplification, optionally followed by size fractionation).
  • correctly assembled nucleic acids may be amplified (e.g., in a PCR reaction) using primers that bind to the ends of the predetermined nucleic acid fragment. It should be appreciated that act 530 may be repeated one or more times. For example, in a first round of assembly a first plurality of input nucleic acids (e.g., oligonucleotides) may be assembled to generate a first nucleic acid fragment. In a second round of assembly, the first nucleic acid fragment may be combined with one or more additional nucleic acid fragments and used as starting material for the assembly of a larger nucleic acid fragment.
  • this larger fragment may be combined with yet further nucleic acids and used as starting material for the assembly of yet a larger nucleic acid.
  • This procedure may be repeated as many times as needed for the synthesis of a target nucleic acid. Accordingly, progressively larger nucleic acids may be assembled.
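The repeated rounds of assembly described above may be illustrated by counting how many rounds are needed when a fixed number of nucleic acids is combined per reaction. The function name and the fixed fan-in per reaction are assumptions for this sketch; an actual assembly strategy may combine different numbers of fragments at each stage.

```python
import math


def assembly_rounds(n_inputs, per_reaction):
    """Number of successive assembly rounds needed to combine `n_inputs`
    nucleic acids into one product when each reaction joins at most
    `per_reaction` nucleic acids."""
    if n_inputs < 1 or per_reaction < 2:
        raise ValueError("need n_inputs >= 1 and per_reaction >= 2")
    rounds = 0
    while n_inputs > 1:
        # Each round groups the current fragments into larger fragments.
        n_inputs = math.ceil(n_inputs / per_reaction)
        rounds += 1
    return rounds
```

For instance, under this assumption 64 oligonucleotides joined 8 at a time would require two rounds: 8 intermediate fragments, then one final product.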
  • nucleic acids of different sizes may be combined.
  • the nucleic acids being combined may have been previously assembled in a multiplex assembly reaction. However, at each stage, one or more nucleic acids being combined may have been obtained from different sources (e.g., PCR amplification of genomic DNA or cDNA, restriction digestion of a plasmid or genomic DNA, or any other suitable source).
  • nucleic acids generated in each cycle of assembly may contain sequence errors if they incorporated one or more input nucleic acids with sequence error(s). Accordingly, a fidelity optimization procedure may be performed after a cycle of assembly in order to remove or correct sequence errors. It should be appreciated that fidelity optimization may be performed after each assembly reaction when several successive cycles of assembly are performed. However, in certain embodiments fidelity optimization may be performed only after a subset (e.g., 2 or more) of successive assembly reactions are complete. In some embodiments, no fidelity optimization is performed.
  • act 540 is an optional fidelity optimization procedure.
  • Act 540 may be used in some embodiments to remove nucleic acid fragments that seem to be correctly assembled (e.g., based on their size or restriction enzyme digestion pattern) but that may have incorporated input nucleic acids containing sequence errors as described herein. For example, since synthetic oligonucleotides may contain incorrect sequences due to errors introduced during oligonucleotide synthesis, it may be useful to remove nucleic acid fragments that have incorporated one or more error-containing oligonucleotides during assembly. In some embodiments, one or more assembled nucleic acid fragments may be sequenced to determine whether they contain the predetermined sequence or not. This procedure allows fragments with the correct sequence to be identified.
  • error-containing nucleic acids may be double-stranded homoduplexes having the error on both strands (i.e., incorrect complementary nucleotide(s), deletion(s), or addition(s) on both strands), because the assembly procedure may involve one or more rounds of polymerase extension (e.g., during assembly or after assembly to amplify the assembled product) during which an input nucleic acid containing an error may serve as a template thereby producing a complementary strand with the complementary error.
  • a preparation of double-stranded nucleic acid fragments may be suspected to contain a mixture of nucleic acids that have the correct sequence and nucleic acids that incorporated one or more sequence errors during assembly.
  • sequence errors may be removed using a technique that involves denaturing and reannealing the double-stranded nucleic acids.
  • single strands of nucleic acids that contain complementary errors may be unlikely to reanneal together if nucleic acids containing each individual error are present in the nucleic acid preparation at a lower frequency than nucleic acids having the correct sequence at the same position. Rather, error containing single strands may reanneal with a complementary strand that contains no errors or that contains one or more different errors.
  • error-containing strands may end up in the form of heteroduplex molecules in the reannealed reaction product.
  • Nucleic acid strands that are error-free may reanneal with error-containing strands or with other error-free strands.
  • Reannealed error-free strands form homoduplexes in the reannealed sample.
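The reasoning above can be made concrete with a toy model. Assuming complete denaturation and random re-pairing of strands (both simplifying assumptions, as are the function names), a strand carrying a given error finds its exactly matching partner with a probability equal to that error's frequency, so rare errors end up mostly in heteroduplexes.

```python
def reanneal(correct_fraction, error_fractions):
    """Return (correct homoduplex, error homoduplex, heteroduplex)
    fractions after random re-pairing. `error_fractions` lists the strand
    frequency of each distinct error; all fractions must sum to 1."""
    total = correct_fraction + sum(error_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    correct_homoduplex = correct_fraction ** 2
    # An error strand forms a homoduplex only with the matching error.
    error_homoduplex = sum(f ** 2 for f in error_fractions)
    heteroduplex = 1.0 - correct_homoduplex - error_homoduplex
    return correct_homoduplex, error_homoduplex, heteroduplex


def correct_after_heteroduplex_removal(correct_fraction, error_fractions):
    """Fraction of correct duplexes left if every heteroduplex is removed
    (an idealization; real removal is not 100% efficient)."""
    correct, error, _ = reanneal(correct_fraction, error_fractions)
    return correct / (correct + error)
```

With hypothetical inputs of 50% correct strands and five distinct errors at 10% each, this model leaves roughly 83% correct duplexes after one idealized round of heteroduplex removal.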
  • Any suitable method for removing heteroduplex molecules may be used, including chromatography, electrophoresis, selective binding of heteroduplex molecules, etc.
  • mismatch binding proteins that selectively (e.g., specifically) bind to heteroduplex nucleic acid molecules may be used.
  • One example includes using MutS, a MutS homolog, or a combination thereof to bind to heteroduplex molecules.
  • the MutS protein, which appears to function as a homodimer, serves as a mismatch recognition factor.
  • MSH MutS Homolog
  • the MSH2-MSH6 complex (also known as MutSα) recognizes base mismatches and single nucleotide insertion/deletion loops
  • the MSH2-MSH3 complex (also known as MutSβ) recognizes insertions/deletions of up to 12-16 nucleotides, although they exert substantially redundant functions.
  • a mismatch binding protein may be obtained from recombinant or natural sources.
  • a mismatch binding protein may be heat-stable.
  • a thermostable mismatch binding protein from a thermophilic organism may be used.
  • thermostable DNA mismatch binding proteins include, but are not limited to: Tth MutS (from Thermus thermophilus); Taq MutS (from Thermus aquaticus); Apy MutS (from Aquifex pyrophilus); Tma MutS (from Thermotoga maritima); any other suitable MutS; or any combination of two or more thereof.
  • protein-bound heteroduplex molecules may be removed from a sample using any suitable technique (binding to a column, a filter, a nitrocellulose filter, etc., or any combination thereof). It should be appreciated that this procedure may not be 100% efficient. Some errors may remain for at least one of the following reasons. Depending on the reaction conditions, not all of the double-stranded error-containing nucleic acids may be denatured. In addition, some of the denatured error-containing strands may reanneal with complementary error-containing strands to form an error-containing homoduplex.
  • the MutS/heteroduplex interaction and the MutS/heteroduplex removal procedures may not be 100% efficient. Accordingly, in some embodiments the fidelity optimization act 540 may be repeated one or more times after each assembly reaction. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more cycles of fidelity optimization may be performed after each assembly reaction.
  • the nucleic acid is amplified after each fidelity optimization procedure. It should be appreciated that each cycle of fidelity optimization will remove additional error-containing nucleic acid molecules. However, the proportion of correct sequences is expected to reach a saturation level after a few cycles of this procedure.
  • the size of an assembled nucleic acid that is fidelity optimized may be determined by the expected number of sequence errors that are suspected to be incorporated into the nucleic acid during assembly.
  • an assembled nucleic acid product should include error free nucleic acids prior to fidelity optimization in order to be able to enrich for the error free nucleic acids. Accordingly, error screening (e.g., using MutS or a MutS homolog) should be performed on shorter nucleic acid fragments when input nucleic acids have higher error rates.
  • one or more nucleic acid fragments of between about 200 and about 800 nucleotides are assembled prior to fidelity optimization. After assembly, the one or more fragments may be exposed to one or more rounds of fidelity optimization as described herein. In some embodiments, several assembled fragments may be ligated together (e.g., to produce a larger nucleic acid fragment of between about 1,000 and about 5,000 bases in length, or larger), and optionally cloned into a vector, prior to fidelity optimization as described herein.
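The relationship between input error rate and the fragment size suitable for fidelity optimization may be estimated with a simple independence model, in which each base is wrong with the same probability. The model, the function names, and the example error rates below are assumptions for illustration and are not figures from the description.

```python
import math


def error_free_fraction(per_base_error, length):
    """Expected fraction of assembled molecules with no error, assuming
    independent errors at a uniform per-base rate."""
    return (1.0 - per_base_error) ** length


def max_length_for_fraction(per_base_error, min_fraction):
    """Longest fragment for which at least `min_fraction` of molecules are
    expected to be error free (requires 0 < per_base_error < 1)."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - per_base_error))
```

Under this model the error-free fraction falls off exponentially with length, which is consistent with performing error screening on shorter fragments when the input nucleic acids have higher error rates.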
  • an output nucleic acid is obtained. As discussed herein, several rounds of act 530 and/or 540 may be performed to obtain the output nucleic acid, depending on the assembly strategy that is implemented.
  • the output nucleic acid may be amplified, cloned, stored, etc., for subsequent uses at act 560.
  • an output nucleic acid may be cloned with one or more other nucleic acids (e.g., other output nucleic acids) for subsequent applications. Subsequent applications may include one or more research, diagnostic, medical, clinical, industrial, therapeutic, environmental, agricultural, or other uses.
  • aspects of the invention may include automating one or more acts described herein. For example, sequence analysis, the identification of interfering sequence features, assembly strategy selection (including fragment design and selection, the choice of a particular combination of extension-based and ligation-based assembly reactions, etc.), fragment production, single-stranded overhang production, and/or concerted assembly may be automated in order to generate the desired product automatically. Acts of the invention may be automated using, for example, a computer system.
  • aspects of the invention may be used in conjunction with any suitable multiplex nucleic acid assembly procedure.
  • concerted assembly steps may be used in connection with one or more of the multiplex nucleic acid assembly procedures described below.
  • multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product.
  • multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule.
  • nucleic acids e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.
  • a multiplex assembly reaction e.g., along with one or more oligonucleotides
  • an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
  • a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof).
  • a target nucleic acid may have a sequence that is not naturally-occurring.
  • a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions.
  • a target nucleic acid may be designed to have an entirely novel sequence.
  • target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
  • multiplex assembly may be used to generate libraries of nucleic acids having different sequences.
  • a library may contain nucleic acids having random sequences.
  • a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
  • a target nucleic acid may include a functional sequence
  • a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only non-functional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof).
  • Certain target nucleic acids may include both functional and non-functional sequences.
  • a target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
  • different strategies may be used to produce a target nucleic acid having a predetermined sequence.
  • different starting nucleic acids e.g., different sets of predetermined nucleic acids
  • predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques.
  • nucleic acids e.g., overlapping nucleic acid fragments
  • an enzyme e.g., a ligase and/or a polymerase
  • a chemical reaction e.g., a chemical ligation
  • in vivo e.g., assembled in a host cell after transfection into the host cell
  • each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides.
  • a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process).
  • an in vitro assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
  • Multiplex oligonucleotide assembly A predetermined nucleic acid fragment may be assembled from a plurality of different starting nucleic acids (e.g., oligonucleotides) in a multiplex assembly reaction (e.g., a multiplex enzyme-mediated reaction, a multiplex chemical assembly reaction, or a combination thereof). Certain aspects of multiplex nucleic acid assembly reactions are illustrated by the following description of certain embodiments of multiplex oligonucleotide assembly reactions. It should be appreciated that the description of the assembly reactions in the context of oligonucleotides is not intended to be limiting.
  • the assembly reactions described herein may be performed using starting nucleic acids obtained from one or more different sources (e.g., synthetic or natural polynucleotides, nucleic acid amplification products, nucleic acid degradation products, oligonucleotides, etc.).
  • the starting nucleic acids may be referred to as assembly nucleic acids (e.g., assembly oligonucleotides).
  • an assembly nucleic acid has a sequence that is designed to be incorporated into the nucleic acid product generated during the assembly process.
  • the description of the assembly reactions in the context of single-stranded nucleic acids is not intended to be limiting.
  • one or more of the starting nucleic acids illustrated in the figures and described herein may be provided as double-stranded nucleic acids. Accordingly, it should be appreciated that where the figures and description illustrate the assembly of single-stranded nucleic acids, the presence of one or more complementary nucleic acids is contemplated. Accordingly, one or more double-stranded complementary nucleic acids may be included in a reaction that is described herein in the context of a single-stranded assembly nucleic acid. However, in some embodiments the presence of one or more complementary nucleic acids may interfere with an assembly reaction by competing for hybridization with one of the input assembly nucleic acids.
  • an assembly reaction may involve only single-stranded assembly nucleic acids (i.e., the assembly nucleic acids may be provided in a single-stranded form without their complementary strand) as described or illustrated herein.
  • the presence of one or more complementary nucleic acids may have no or little effect on the assembly reaction.
  • complementary nucleic acid(s) may be incorporated during one or more steps of an assembly.
  • assembly nucleic acids and their complementary strands may be assembled under the same assembly conditions via parallel assembly reactions in the same reaction mixture.
  • a nucleic acid product resulting from the assembly of a plurality of starting nucleic acids may be identical to the nucleic acid product that results from the assembly of nucleic acids that are complementary to the starting nucleic acids (e.g., in some embodiments where the assembly steps result in the production of a double-stranded nucleic acid product).
  • an oligonucleotide may be a nucleic acid molecule comprising at least two covalently bonded nucleotide residues. In some embodiments, an oligonucleotide may be between 10 and 1,000 nucleotides long.
  • an oligonucleotide may be between 10 and 500 nucleotides long, or between 500 and 1,000 nucleotides long. In some embodiments, an oligonucleotide may be between about 20 and about 100 nucleotides long (e.g., from about 30 to 90, 40 to 85, 50 to 80, 60 to 75, or about 65 or about 70 nucleotides long), between about 100 and about 200, between about 200 and about 300 nucleotides, between about 300 and about 400, or between about 400 and about 500 nucleotides long. However, shorter or longer oligonucleotides may be used. An oligonucleotide may be a single-stranded nucleic acid. However, in some embodiments a double-stranded oligonucleotide may be used as described herein. In certain embodiments, an oligonucleotide may be chemically synthesized as described in more detail below.
  • an input nucleic acid e.g., oligonucleotide
  • the resulting product may be double-stranded.
  • one of the strands of a double-stranded nucleic acid may be removed before use so that only a predetermined single strand is added to an assembly reaction.
  • each oligonucleotide may be designed to have a sequence that is identical to a different portion of the sequence of a predetermined target nucleic acid that is to be assembled. Accordingly, in some embodiments each oligonucleotide may have a sequence that is identical to a portion of one of the two strands of a double-stranded target nucleic acid.
  • the two complementary strands of a double stranded nucleic acid are referred to herein as the positive (P) and negative (N) strands. This designation is not intended to imply that the strands are sense and anti-sense strands of a coding sequence.
  • a P strand may be a sense strand of a coding sequence
  • a P strand may be an anti-sense strand of a coding sequence
  • a target nucleic acid may be either the P strand, the N strand, or a double-stranded nucleic acid comprising both the P and N strands.
  • oligonucleotides may be designed to have different lengths.
  • one or more different oligonucleotides may have overlapping sequence regions (e.g., overlapping 5' regions or overlapping 3' regions). Overlapping sequence regions may be identical (i.e., corresponding to the same strand of the nucleic acid fragment) or complementary (i.e., corresponding to complementary strands of the nucleic acid fragment).
  • the plurality of oligonucleotides may include one or more oligonucleotide pairs with overlapping identical sequence regions, one or more oligonucleotide pairs with overlapping complementary sequence regions, or a combination thereof. Overlapping sequences may be of any suitable length.
  • overlapping sequences may encompass the entire length of one or more nucleic acids used in an assembly reaction.
  • Overlapping sequences may be between about 5 and about 500 nucleotides long (e.g., between about 10 and 100, between about 10 and 75, between about 10 and 50, about 20, about 25, about 30, about 35, about 40, about 45, about 50, etc.) However, shorter, longer or intermediate overlapping lengths may be used. It should be appreciated that overlaps between different input nucleic acids used in an assembly reaction may have different lengths.
  • the combined sequences of the different oligonucleotides in the reaction may span the sequence of the entire nucleic acid fragment on either the positive strand, the negative strand, both strands, or a combination of portions of the positive strand and portions of the negative strand.
  • the plurality of different oligonucleotides may provide either positive sequences, negative sequences, or a combination of both positive and negative sequences corresponding to the entire sequence of the nucleic acid fragment to be assembled.
  • the plurality of oligonucleotides may include one or more oligonucleotides having sequences identical to one or more portions of the positive sequence, and one or more oligonucleotides having sequences that are identical to one or more portions of the negative sequence of the nucleic acid fragment.
  • One or more pairs of different oligonucleotides may include sequences that are identical to overlapping portions of the predetermined nucleic acid fragment sequence as described herein (e.g., overlapping sequence portions from the same or from complementary strands of the nucleic acid fragment).
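One way to picture a set of oligonucleotides whose sequences alternate between the positive and negative strands, with complementary overlaps between neighbors, is the following sketch. It is an illustration under stated assumptions rather than a design method from this description: the function names, the fixed oligonucleotide length, and the fixed overlap are hypothetical choices, and the input is assumed to be the positive-strand sequence written 5' to 3' using only A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target: str, oligo_len: int, overlap: int):
    """Split a positive-strand target into alternating P and N oligonucleotides.

    Each returned tuple is (strand, start, sequence), where `start` is the
    0-based position of the covered window on the positive strand and
    `sequence` is written 5'->3' on the strand the oligonucleotide belongs to.
    Consecutive oligonucleotides share `overlap` complementary bases.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(target))
        window = target[start:end]
        strand = "P" if index % 2 == 0 else "N"
        seq = window if strand == "P" else reverse_complement(window)
        oligos.append((strand, start, seq))
        if end == len(target):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    example_target = "ATGCATGCCGTAAGCTTAGC"  # arbitrary illustrative sequence
    for strand, start, seq in tile_oligos(example_target, oligo_len=8, overlap=4):
        print(strand, start, seq)
```

When the overlap is less than half the oligonucleotide length, consecutive oligonucleotides on the same strand are separated by a gap that is covered only by the central portion of an oligonucleotide from the other strand.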
  • the plurality of oligonucleotides includes a set of oligonucleotides having sequences that combine to span the entire positive sequence and a set of oligonucleotides having sequences that combine to span the entire negative sequence of the predetermined nucleic acid fragment.
  • the plurality of oligonucleotides may include one or more oligonucleotides with sequences that are identical to sequence portions on one strand (either the positive or negative strand) of the nucleic acid fragment, but no oligonucleotides with sequences that are complementary to those sequence portions.
  • a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the positive sequence of the predetermined nucleic acid fragment. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the negative sequence of the predetermined nucleic acid fragment. These oligonucleotides may be assembled by sequential ligation or in an extension-based reaction (e.g., if an oligonucleotide having a 3' region that is complementary to one of the plurality of oligonucleotides is added to the reaction).
  • a nucleic acid fragment may be assembled in a polymerase-mediated assembly reaction from a plurality of oligonucleotides that are combined and extended in one or more rounds of polymerase-mediated extensions.
  • a nucleic acid fragment may be assembled in a ligase-mediated reaction from a plurality of oligonucleotides that are combined and ligated in one or more rounds of ligase-mediated ligations.
  • a nucleic acid fragment may be assembled in a non-enzymatic reaction (e.g., a chemical reaction) from a plurality of oligonucleotides that are combined and assembled in one or more rounds of non-enzymatic reactions.
  • a nucleic acid fragment may be assembled using a combination of polymerase, ligase, and/or non-enzymatic reactions.
  • polymerase(s) and ligase(s) may be included in an assembly reaction mixture.
  • a nucleic acid may be assembled via coupled amplification and ligation or ligation during amplification.
  • the resulting nucleic acid fragment from each assembly technique may have a sequence that includes the sequences of each of the plurality of assembly oligonucleotides that were used as described herein.
  • polymerase-based assembly techniques may involve one or more suitable polymerase enzymes that can catalyze a template-based extension of a nucleic acid in a 5' to 3' direction in the presence of suitable nucleotides and an annealed template.
  • a polymerase may be thermostable.
  • a polymerase may be obtained from recombinant or natural sources.
  • thermostable polymerase from a thermophilic organism may be used.
  • a polymerase may include a 3'→5' exonuclease/proofreading activity.
  • a polymerase may have no, or little, proofreading activity (e.g., a polymerase may be a recombinant variant of a natural polymerase that has been modified to reduce its proofreading activity).
  • thermostable DNA polymerases include, but are not limited to: Taq (a heat-stable DNA polymerase from the bacterium Thermus aquaticus); Pfu (a thermophilic DNA polymerase with a 3'→5' exonuclease/proofreading activity from Pyrococcus furiosus, available from, for example, Promega); VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Thermococcus litoralis; also known as Tli polymerase); Deep VentR® DNA Polymerase and Deep VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Pyrococcus species GB-D; available from New England Biolabs); KOD HiFi (a recombinant Therm
  • coli DNA Polymerase I which retains polymerase activity, but has lost the 5'→3' exonuclease activity, available from, for example, Promega and NEB); Sequenase™ (T7 DNA polymerase deficient in 3'→5' exonuclease activity); Phi29 (bacteriophage 29 DNA polymerase, may be used for rolling circle amplification, for example, in a TempliPhi™ DNA Sequencing Template Amplification Kit, available from Amersham Biosciences); TopoTaq (a hybrid polymerase that combines hyperstable DNA binding domains and the DNA unlinking activity of Methanopyrus topoisomerase, with no exonuclease activity, available from Fidelity Systems); TopoTaq HiFi which incorporates a proofreading domain with exonuclease activity; Phusion™ (a Pyrococcus-like enzyme with a processivity-enhancing domain, available from New England Biolabs); any other suitable DNA poly
  • Ligase-based assembly techniques may involve one or more suitable ligase enzymes that can catalyze the covalent linking of adjacent 3' and 5' nucleic acid termini (e.g., a 5' phosphate and a 3' hydroxyl of nucleic acid(s) annealed on a complementary template nucleic acid such that the 3' terminus is immediately adjacent to the 5' terminus).
  • a ligase may catalyze a ligation reaction between the 5' phosphate of a first nucleic acid and the 3' hydroxyl of a second nucleic acid if the first and second nucleic acids are annealed next to each other on a template nucleic acid.
  • a ligase may be obtained from recombinant or natural sources.
  • a ligase may be a heat-stable ligase.
  • a thermostable ligase from a thermophilic organism may be used.
  • thermostable DNA ligases include, but are not limited to: Tth DNA ligase (from Thermus thermophilus, available from, for example, Eurogentec and GeneCraft); Pfu DNA ligase (a hyperthermophilic ligase from Pyrococcus furiosus); Taq ligase (from Thermus aquaticus); any other suitable heat-stable ligase; or any combination thereof.
  • one or more lower temperature ligases may be used (e.g., T4 DNA ligase).
  • a lower temperature ligase may be useful for shorter overhangs (e.g., about 3, about 4, about 5, or about 6 base overhangs) that may not be stable at higher temperatures.
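To give a sense of why short overhangs may not remain annealed at the temperatures used with heat-stable ligases, the sketch below applies the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C) to example overhang sequences. The rule is a coarse approximation for short oligonucleotides and is not part of this description; actual duplex stability depends on salt, strand concentration, and sequence context. The function name and example sequences are illustrative assumptions.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature estimate in degrees Celsius (Wallace rule).

    Tm ~ 2 * (A + T) + 4 * (G + C). Intended only as a coarse guide for
    short sequences; it ignores salt and concentration effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical overhang sequences of 3 to 6 bases.
    for overhang in ("AAT", "GATC", "AGCTT", "GGATCC"):
        print(overhang, wallace_tm(overhang), "deg C (rough estimate)")
```

Estimates of this kind for overhangs of about 3 to 6 bases fall far below the temperatures at which heat-stable ligases are typically used, which is consistent with preferring a lower temperature ligase for such substrates.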
  • Non-enzymatic techniques can be used to ligate nucleic acids.
  • a 5'-end e.g., the 5' phosphate group
  • a 3'-end e.g., the 3' hydroxyl
  • non-enzymatic techniques may offer certain advantages over enzyme-based ligations.
  • non-enzymatic techniques may have a high tolerance of non-natural nucleotide analogues in nucleic acid substrates, may be used to ligate short nucleic acid substrates, may be used to ligate RNA substrates, and/or may be cheaper and/or more suited to certain automated (e.g., high throughput) applications.
  • Non-enzymatic ligation may involve a chemical ligation.
  • nucleic acid termini of two or more different nucleic acids may be chemically ligated.
  • nucleic acid termini of a single nucleic acid may be chemically ligated (e.g., to circularize the nucleic acid).
  • both strands at a first double-stranded nucleic acid terminus may be chemically ligated to both strands at a second double-stranded nucleic acid terminus.
  • only one strand of a first nucleic acid terminus may be chemically ligated to a single strand of a second nucleic acid terminus.
  • the 5' end of one strand of a first nucleic acid terminus may be ligated to the 3' end of one strand of a second nucleic acid terminus without the ends of the complementary strands being chemically ligated.
  • a chemical ligation may be used to form a covalent linkage between a 5' terminus of a first nucleic acid end and a 3' terminus of a second nucleic acid end, wherein the first and second nucleic acid ends may be ends of a single nucleic acid or ends of separate nucleic acids.
  • chemical ligation may involve at least one nucleic acid substrate having a modified end (e.g., a modified 5' and/or 3' terminus) including one or more chemically reactive moieties that facilitate or promote linkage formation.
  • chemical ligation occurs when one or more nucleic acid termini are brought together in close proximity (e.g., when the termini are brought together due to annealing between complementary nucleic acid sequences). Accordingly, annealing between complementary 3' or 5' overhangs (e.g., overhangs generated by restriction enzyme cleavage of a double-stranded nucleic acid) or between any combination of complementary nucleic acids that results in a 3' terminus being brought into close proximity with a 5' terminus (e.g., the 3' and 5' termini are adjacent to each other when the nucleic acids are annealed to a complementary template nucleic acid) may promote a template-directed chemical ligation.
  • Examples of chemical reactions may include, but are not limited to, condensation, reduction, and/or photochemical ligation reactions. It should be appreciated that in some embodiments chemical ligation can be used to produce naturally-occurring phosphodiester internucleotide linkages, non-naturally-occurring phosphamide pyrophosphate internucleotide linkages, and/or other non-naturally-occurring internucleotide linkages.
  • the process of chemical ligation may involve one or more coupling agents to catalyze the ligation reaction.
  • a coupling agent may promote a ligation reaction between reactive groups in adjacent nucleic acids (e.g., between a 5'-reactive moiety and a 3'-reactive moiety at adjacent sites along a complementary template).
  • a coupling agent may be a reducing reagent (e.g., ferricyanide), a condensing reagent (e.g., cyanoimidazole, cyanogen bromide, carbodiimide, etc.), or irradiation (e.g., UV irradiation for photo-ligation).
  • a chemical ligation may be an autoligation reaction that does not involve a separate coupling agent.
  • in autoligation, the presence of a reactive group on one or more nucleic acids may be sufficient to catalyze a chemical ligation between nucleic acid termini without the addition of a coupling agent (see, for example, Xu Y & Kool ET, 1997, Tetrahedron Lett. 38:5595-8).
  • Non-limiting examples of these reagent-free ligation reactions may involve nucleophilic displacements of sulfur on bromoacetyl, tosyl, or iodo-nucleoside groups (see, for example, Xu Y et al., 2001, Nat Biotech 19:148-52).
  • Nucleic acids containing reactive groups suitable for autoligation can be prepared directly on automated synthesizers (see, for example, Xu Y & Kool ET, 1999, Nuc. Acids Res. 27:875-81).
  • a phosphorothioate at a 3' terminus may react with a leaving group (such as tosylate or iodide) on a thymidine at an adjacent 5' terminus.
  • two nucleic acid strands bound at adjacent sites on a complementary target strand may undergo auto-ligation by displacement of a 5'-end iodide moiety (or tosylate) with a 3'-end sulfur moiety.
  • the product of an autoligation may include a non-naturally-occurring internucleotide linkage (e.g., a single oxygen atom may be replaced with a sulfur atom in the ligated product).
  • a synthetic nucleic acid duplex can be assembled via chemical ligation in a one step reaction involving simultaneous chemical ligation of nucleic acids on both strands of the duplex.
  • a mixture of 5'-phosphorylated oligonucleotides corresponding to both strands of a target nucleic acid may be chemically ligated by a) exposure to heat (e.g., to 97 °C) and slow cooling to form a complex of annealed oligonucleotides, and b) exposure to cyanogen bromide or any other suitable coupling agent under conditions sufficient to chemically ligate adjacent 3' and 5' ends in the nucleic acid complex.
  • a synthetic nucleic acid duplex can be assembled via chemical ligation in a two step reaction involving separate chemical ligations for the complementary strands of the duplex.
  • each strand of a target nucleic acid may be ligated in a separate reaction containing phosphorylated oligonucleotides corresponding to the strand that is to be ligated and non-phosphorylated oligonucleotides corresponding to the complementary strand.
  • the non-phosphorylated oligonucleotides may serve as a template for the phosphorylated oligonucleotides during a chemical ligation (e.g. using cyanogen bromide).
  • the resulting single-stranded ligated nucleic acid may be purified and annealed to a complementary ligated single-stranded nucleic acid to form the target duplex nucleic acid (see, for example, Shabarova ZA et al., 1991, Nuc. Acids Res. 19:4247-51).
  • aspects of the invention may be used to enhance different types of nucleic acid assembly reactions (e.g., multiplex nucleic acid assembly reactions). Aspects of the invention may be used in combination with one or more assembly reactions described in, for example, Carr et al., 2004, Nucleic Acids Research, Vol. 32, No 20, e162 (9 pages); Richmond et al., 2004, Nucleic Acids Research, Vol. 32, No 17, pp. 5011-5018; Caruthers et al., 1972, J. Mol. Biol. 72, 475-492; Hecker et al., 1998, Biotechniques 24:256-260; Kodumal et al., 2004, PNAS Vol. 101, No. 44, pp.
  • synthesis and assembly methods described herein may be performed in any suitable format, including in a reaction tube, in a multi-well plate, on a surface, on a column, in a microfluidic device (e.g., a microfluidic tube), a capillary tube, etc.
  • a microfluidic device e.g., a microfluidic tube
  • FIG. 1 shows one embodiment of a plurality of oligonucleotides that may be assembled in a polymerase-based multiplex oligonucleotide assembly reaction.
  • FIG. 1A shows two groups of oligonucleotides (Group P and Group N) that have sequences of portions of the two complementary strands of a nucleic acid fragment to be assembled.
  • Group P includes oligonucleotides with positive strand sequences (P1, P2, ... Pn-1, Pn, Pn+1, ... PT, shown from 5'→3' on the positive strand).
  • Group N includes oligonucleotides with negative strand sequences (NT, ..., Nn+1, Nn, Nn-1, ..., N2, N1, shown from 5'→3' on the negative strand).
  • NT negative strand sequence
  • one or more of the oligonucleotides within the P or N group may overlap.
  • FIG. 1A shows gaps between consecutive oligonucleotides in Group P and gaps between consecutive oligonucleotides in Group N.
  • FIG. 1B shows a structure of an embodiment of a Group P or Group N oligonucleotide represented in FIG. 1A.
  • This oligonucleotide includes a 5' region that is complementary to a 5' region of a first oligonucleotide from the other group, a 3' region that is complementary to a 3' region of a second oligonucleotide from the other group, and a core or central region that is not complementary to any oligonucleotide sequence from the other group (or its own group).
  • This central region is illustrated as the B region in FIG. 1B.
  • the sequence of the B region may be different for each different oligonucleotide.
  • the B region of an oligonucleotide in one group corresponds to a gap between two consecutive oligonucleotides in the complementary group of oligonucleotides.
  • the 5'-most oligonucleotide in each group does not have a 5' region that is complementary to the 5' region of any other oligonucleotide in either group. Accordingly, the 5'-most oligonucleotides (P1 and NT) that are illustrated in FIG. 1A each have a 3' complementary region and a 5' non-complementary region (the B region of FIG. 1B), but no 5' complementary region.
  • any one or more of the oligonucleotides in Group P and/or Group N can be designed to have no B region.
  • a 5'-most oligonucleotide has only the 3' complementary region (meaning that the entire oligonucleotide is complementary to the 3' region of the 3'-most oligonucleotide from the other group (e.g., the 3' region of N1 or PT shown in FIG. 1A)).
  • one of the other oligonucleotides in either Group P or Group N has only a 5' complementary region and a 3' complementary region (meaning that the entire oligonucleotide is complementary to the 5' and 3' sequence regions of the two overlapping oligonucleotides from the complementary group).
  • only a subset of oligonucleotides in an assembly reaction may include B regions. It should be appreciated that the length of the 5', 3', and B regions may be different for each oligonucleotide.
  • a 3'-most oligonucleotide may be designed with a 3' region that extends beyond the 5' region of the 5'-most oligonucleotide.
  • an assembled product may include the 5' end of the 5'-most oligonucleotide, but not the 3' end of the 3'-most oligonucleotide that extends beyond it.
  • FIG. 1C illustrates a subset of the oligonucleotides from FIG. 1A, each oligonucleotide having a 5', a 3', and an optional B region.
  • Oligonucleotide Pn is shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Nn-1.
  • Oligonucleotide Pn also has a 3' region that is complementary to (and can anneal to) the 3' region of oligonucleotide Nn.
  • Nn is also shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Pn+1.
  • This pattern could be repeated for all of oligonucleotides P2 to PT and N1 to NT-1 (with the 5'-most oligonucleotides only having 3' complementary regions as discussed herein). If all of the oligonucleotides from Group P and Group N are mixed together under appropriate hybridization conditions, they may anneal to form a long chain such as the oligonucleotide complex illustrated in FIG. 1A. However, subsets of the oligonucleotides may form shorter chains and even oligonucleotide dimers with annealed 5' or 3' regions. It should be appreciated that many copies of each oligonucleotide are included in a typical reaction mixture.
  • the resulting hybridized reaction mixture may contain a distribution of different oligonucleotide dimers and complexes.
  • Polymerase-mediated extension of the hybridized oligonucleotides results in a template-based extension of the 3' ends of oligonucleotides that have annealed 3' regions. Accordingly, polymerase-mediated extension of the oligonucleotides shown in FIG. 1C would result in extension of the 3' ends only of oligonucleotides Pn and Nn, generating extended oligonucleotides containing sequences that are complementary to all the regions of Nn and Pn, respectively.
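The template-based extension of two oligonucleotides with annealed 3' regions can be sketched as a string operation. This is a simplified model under stated assumptions: both oligonucleotides are written 5' to 3', the annealed region is given as a known number of bases at each 3' end, extension runs to the end of the template, and no polymerase errors are modeled. The function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_pair(p_oligo: str, n_oligo: str, overlap: int):
    """Extend the 3' ends of two oligonucleotides annealed by their 3' regions.

    The last `overlap` bases of each oligonucleotide must be complementary.
    Each 3' end is extended using the other oligonucleotide as a template,
    so each product gains the complement of the partner's remaining 5' part.
    """
    if not 0 < overlap <= min(len(p_oligo), len(n_oligo)):
        raise ValueError("overlap must be positive and fit within both oligos")
    if reverse_complement(p_oligo[-overlap:]) != n_oligo[-overlap:]:
        raise ValueError("3' regions are not complementary")
    p_extended = p_oligo + reverse_complement(n_oligo[:-overlap])
    n_extended = n_oligo + reverse_complement(p_oligo[:-overlap])
    return p_extended, n_extended


if __name__ == "__main__":
    p_n = "ATGGCACGTT"  # hypothetical Pn, 3' region CGTT
    n_n = "TCAGGAAACG"  # hypothetical Nn, 3' region AACG anneals to CGTT
    extended_p, extended_n = extend_pair(p_n, n_n, overlap=4)
    print(extended_p)
    print(extended_n)
```

In this model the two extended products are exact reverse complements of each other, which corresponds to a double-stranded dimer product covering the sequences of both starting oligonucleotides.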
  • Extended oligonucleotide products with sequences complementary to all of N n- I and P n+ i would not be generated unless oligonucleotides P n . i and N n+ i were included in the reaction mixture. Accordingly, if all of the oligonucleotide sequences in a plurality of oligonucleotides are to be incorporated into an assembled nucleic acid fragment using a polymerase, the plurality of oligonucleotides should include 5 '-most oligonucleotides that are at least complementary to the entire 3' regions of the 3 '-most oligonucleotides.
  • the 5'-most oligonucleotides also may have 5' regions that extend beyond the 3' ends of the 3'-most oligonucleotides as illustrated in FIG. 1A.
  • a ligase also may be added to ligate adjacent 5' and 3' ends that may be formed upon 3' extension of annealed oligonucleotides in an oligonucleotide complex such as the one illustrated in FIG. 1A.
  • a single cycle of polymerase extension extends oligonucleotide pairs with annealed 3' regions.
  • a single molecule could be generated by ligating the extended oligonucleotide dimers.
  • a single molecule incorporating all of the oligonucleotide sequences may be generated by performing several polymerase extension cycles.
  • FIG. 1D illustrates two cycles of polymerase extension.
  • a minimal number of extension cycles for assembling a nucleic acid may be calculated as log₂ n, where n is the number of oligonucleotides being assembled.
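The log₂ n estimate above can be sketched numerically. This is a minimal illustration, not part of the original disclosure: the function name `min_extension_cycles` is hypothetical, and rounding up to a whole cycle when n is not a power of two is an assumption (the text states only log₂ n). The reasoning is that each cycle can at most double the number of oligonucleotide sequences joined in one product.

```python
def min_extension_cycles(n_oligos: int) -> int:
    """Smallest whole number of cycles c such that 2**c >= n_oligos.

    Assumption: each extension cycle can at most double the number of
    oligonucleotides incorporated into a single product, and fractional
    values of log2(n) are rounded up.
    """
    if n_oligos < 1:
        raise ValueError("n_oligos must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1,
    # and avoids floating-point rounding issues.
    return (n_oligos - 1).bit_length()


if __name__ == "__main__":
    for n in (2, 4, 8, 9, 16, 30):
        print(n, "oligonucleotides ->", min_extension_cycles(n), "cycles")
```

In practice more cycles than this lower bound may be needed, because not every strand finds its productive partner in every annealing step.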
  • progressive assembly of the nucleic acid may be achieved without using temperature cycles.
  • an enzyme capable of rolling circle amplification may be used (e.g., phi 29 polymerase) when a circularized nucleic acid (e.g., oligonucleotide) complex is used as a template to produce a large amount of circular product for subsequent processing using MutS or a MutS homolog as described herein.
  • annealed oligonucleotide pairs P n/N n and P n+1/N n+1 are extended to form oligonucleotide dimer products incorporating the sequences covered by the respective oligonucleotide pairs.
  • P n is extended to incorporate sequences that are complementary to the B and 5' regions of N n (indicated as N' n in FIG. 1D).
  • N n+1 is extended to incorporate sequences that are complementary to the 5' and B regions of P n+1 (indicated as P' n+1 in FIG. 1D).
  • These dimer products may be denatured and reannealed to form the starting material of step 2 where the 3' end of the extended P n oligonucleotide is annealed to the 3' end of the extended N n+1 oligonucleotide.
  • This product may be extended in a polymerase-mediated reaction to form a product that incorporates the sequences of the four oligonucleotides (P n, N n, P n+1, N n+1).
  • One strand of this extended product has a sequence that includes (in 5' to 3' order) the 5', B, and 3' regions of P n, the complement of the B region of N n, the 5', B, and 3' regions of P n+1, and the complements of the B and 5' regions of N n+1.
  • the other strand of this extended product has the complementary sequence. It should be appreciated that the 3' regions of P n and N n are complementary, the 5' regions of N n and P n+1 are complementary, and the 3' regions of P n+1 and N n+1 are complementary.
  • reaction products shown in FIG. 1D are a subset of the reaction products that would be obtained using all of the oligonucleotides of Group P and Group N.
  • a first polymerase extension reaction using all of the oligonucleotides would result in a plurality of overlapping oligonucleotide dimers from P 1/N 1 to P T/N T. Each of these may be denatured and at least one of the strands could then anneal to an overlapping complementary strand from an adjacent (either 3' or 5') oligonucleotide dimer and be extended in a second cycle of polymerase extension as shown in FIG. 1D.
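The extension of a pair of oligonucleotides with annealed 3' regions can be sketched in a few lines. This is an illustrative string model only, under stated assumptions: sequences are written 5' to 3' as plain DNA strings, annealing requires a perfect match over at least `min_overlap` bases, and the names `revcomp` and `extend_pair` are hypothetical helpers, not terms from the disclosure.

```python
# Watson-Crick complement table for plain DNA strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_pair(p: str, n: str, min_overlap: int = 4):
    """Extend oligonucleotides P and N that anneal through their 3' regions.

    Returns (extended_P, extended_N), both 5'->3', or None when the 3'
    regions are not complementary over at least min_overlap bases.
    """
    template_for_p = revcomp(n)  # N read along the P strand
    # Try the longest candidate overlap first.
    for k in range(min(len(p), len(n)), min_overlap - 1, -1):
        if p[-k:] == template_for_p[:k]:
            # The 3' end of each strand is filled in across the
            # single-stranded part of its partner.
            return p + template_for_p[k:], n + revcomp(p)[k:]
    return None


if __name__ == "__main__":
    top = "ATGCCGTAAGCTTGGACCA"   # hypothetical target strand
    p_n = top[:12]                # P oligonucleotide
    n_n = revcomp(top[7:])        # N oligonucleotide; 5-base 3' overlap
    print(extend_pair(p_n, n_n))
```

The two returned strands are reverse complements of each other, which corresponds to the oligonucleotide dimer product described for step 1 of FIG. 1D.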
  • FIG. 2 shows an embodiment of a plurality of oligonucleotides that may be assembled in a directional polymerase-based multiplex oligonucleotide assembly reaction.
  • only the 5 '-most oligonucleotide of Group P may be provided.
  • the remainder of the sequence of the predetermined nucleic acid fragment is provided by oligonucleotides of Group N.
  • the 3'-most oligonucleotide of Group N (N 1) has a 3' region that is complementary to the 3' region of P 1 as shown in FIG. 2B.
  • each Group N oligonucleotide (e.g., N n) overlaps with two adjacent oligonucleotides: one overlaps with the 3' region (N n-1) and one with the 5' region (N n+1), except for N 1 that overlaps with the 3' regions of P 1 (complementary overlap) and N 2 (non-complementary overlap), and N T that overlaps only with N T-1. It should be appreciated that all of the overlaps shown in FIG.
  • each oligonucleotide may have 3', B, and 5' regions of different lengths (including no B region in some embodiments). In some embodiments, none of the oligonucleotides may have B regions, meaning that the entire sequence of each oligonucleotide may overlap with the combined 5' and 3' region sequences of its two adjacent oligonucleotides.
  • Assembly of a predetermined nucleic acid fragment from the plurality of oligonucleotides shown in FIG. 2A may involve multiple cycles of polymerase-mediated extension. Each extension cycle may be separated by a denaturing and an annealing step.
  • FIG. 2C illustrates the first two steps in this assembly process.
  • In step 1, annealed oligonucleotides P 1 and N 1 are extended to form an oligonucleotide dimer.
  • P 1 is shown with a 5' region that is non-complementary to the 3' region of N 1 and extends beyond the 3' region of N 1 when the oligonucleotides are annealed.
  • P 1 may lack the 5' non-complementary region and include only sequences that overlap with the 3' region of N 1.
  • the product of P 1 extension is shown after step 1 containing an extended region that is complementary to the 5' end of N 1.
  • the single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension of P 1/N 1 in step 1.
  • the product of P 1 extension is shown annealed to the 3' region of N 2. This annealed complex may be extended in step 2 to generate an extended product that now includes sequences complementary to the B and 5' regions of N 2.
  • The extended single strand may be obtained by denaturing the oligonucleotide dimer that results from the extension reaction of step 2. Additional cycles of extension may be performed to further assemble a predetermined nucleic acid fragment. In each cycle, extension results in the addition of sequences complementary to the B and 5' regions of the next Group N oligonucleotide. Each cycle may include a denaturing and annealing step. However, the extension may occur under the annealing conditions. Accordingly, in one embodiment, cycles of extension may be obtained by alternating between denaturing conditions (e.g., a denaturing temperature) and annealing/extension conditions (e.g., an annealing/extension temperature).
  • T (the number of group N oligonucleotides) may determine the minimal number of temperature cycles used to assemble the oligonucleotides.
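The directional assembly described for FIG. 2C, in which one growing strand is extended across one Group N oligonucleotide per cycle, can be sketched as follows. This is a simplified model under stated assumptions: sequences are 5' to 3' DNA strings, annealing requires a perfect 3' overlap of at least `min_overlap` bases, every cycle goes to completion, and the names `revcomp` and `directional_assembly` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def directional_assembly(p1: str, n_oligos: list, min_overlap: int = 4):
    """Extend P1 across N1, N2, ..., NT, one Group N oligonucleotide per cycle.

    Returns (assembled_positive_strand, number_of_cycles).
    """
    growing = p1
    cycles = 0
    for n in n_oligos:
        template = revcomp(n)  # N oligonucleotide read along the growing strand
        overlap = None
        for k in range(min(len(growing), len(template)), min_overlap - 1, -1):
            if growing.endswith(template[:k]):
                overlap = k
                break
        if overlap is None:
            raise ValueError("3' end of the growing strand does not anneal")
        # Extension adds the complement of the B and 5' regions of this N oligo.
        growing += template[overlap:]
        cycles += 1  # one denature / anneal / extend cycle per N oligonucleotide
    return growing, cycles


if __name__ == "__main__":
    target = "ATGGCTAGCAAGGAGAAGAACTTTTCACTGGAGTTGTCCC"  # hypothetical
    p1 = target[:15]
    group_n = [revcomp(target[10:25]), revcomp(target[20:35]),
               revcomp(target[30:40])]
    print(directional_assembly(p1, group_n))
```

With T Group N oligonucleotides the loop runs T times, which mirrors the statement that T may determine the minimal number of temperature cycles.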
  • progressive extension may be achieved without temperature cycling.
  • an enzyme capable of promoting rolling circle amplification may be used (e.g., TempliPhi).
  • a reaction mixture containing an assembled predetermined nucleic acid fragment also may contain a distribution of shorter extension products that may result from incomplete extension during one or more of the cycles or may be the result of a P 1/N 1 extension that was initiated after the first cycle.
  • FIG. 2D illustrates an example of a sequential extension reaction where the 5'-most P 1 oligonucleotide is bound to a support and the Group N oligonucleotides are unbound.
  • the reaction steps are similar to those described for FIG. 2C.
  • an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most P 1 oligonucleotide.
  • the complementary strand (the negative strand) may readily be obtained by denaturing the bound fragment and releasing the negative strand.
  • the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the positive strand also may be released.
  • FIG. 2E illustrates an example of a sequential reaction where P 1 is unbound and the Group N oligonucleotides are bound to a support. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most N T oligonucleotide. Accordingly, the complementary strand (the positive strand) may readily be obtained by denaturing the bound fragment and releasing the positive strand.
  • the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the negative strand also may be released. Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained.
  • oligonucleotides may be used to assemble a nucleic acid via two or more cycles of polymerase-based extension. In many configurations, at least one pair of oligonucleotides has complementary 3' end regions.
  • FIG. 2F illustrates an example where an oligonucleotide pair with complementary 3' end regions is flanked on either side by a series of oligonucleotides with overlapping non-complementary sequences.
  • the oligonucleotides illustrated to the right of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of one strand of the target nucleic acid to be assembled.
  • the oligonucleotides illustrated to the left of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of the complementary strand of the target nucleic acid.
  • These oligonucleotides may be assembled via sequential polymerase-based extension reactions as described herein (see also, for example, Xiong et al., 2004, Nucleic Acids Research, Vol. 32, No. 12, e98, 10 pages, the disclosure of which is incorporated by reference herein). It should be appreciated that different numbers and/or lengths of oligonucleotides may be used on either side of the complementary pair.
  • FIG. 3 shows an embodiment of a plurality of oligonucleotides that may be assembled in a ligase reaction.
  • FIG. 3A illustrates the alignment of the oligonucleotides showing that they do not contain gaps (i.e., no B region as described herein).
  • the oligonucleotides may anneal to form a complex with no nucleotide gaps between the 3' and 5' ends of the annealed oligonucleotides in either Group P or Group N.
  • These oligonucleotides provide a suitable template for assembly using a ligase under appropriate reaction conditions.
  • these oligonucleotides also may be assembled using a polymerase-based assembly reaction as described herein.
  • FIG. 3B shows two individual ligation reactions. These reactions are illustrated in two steps.
  • ligation reactions may occur simultaneously or sequentially in any order and may occur as such in a reaction maintained under constant reaction conditions (e.g., with no temperature cycling) or in a reaction exposed to several temperature cycles.
  • the reaction illustrated in step 2 may occur before the reaction illustrated in step 1.
  • a Group N oligonucleotide is annealed to two adjacent Group P oligonucleotides (due to the complementary 5' and 3' regions between the P and N oligonucleotides), providing a template for ligation of the adjacent P oligonucleotides.
  • ligation of the N group oligonucleotides also may proceed in similar manner to assemble adjacent N oligonucleotides that are annealed to their complementary P oligonucleotide. Assembly of the predetermined nucleic acid fragment may be obtained through ligation of all of the oligonucleotides to generate a double stranded product. However, in some embodiments, a single stranded product of either the positive or negative strand may be obtained. In certain embodiments, a plurality of oligonucleotides may be designed to generate only single-stranded reaction products in a ligation reaction.
  • a first group of oligonucleotides may be provided to cover the entire sequence on one strand of the predetermined nucleic acid fragment (on either the positive or negative strand).
  • a second group of oligonucleotides may be designed to be long enough to anneal to complementary regions in the first group but not long enough to provide adjacent 5' and 3' ends between oligonucleotides in the second group. This provides substrates that are suitable for ligation of oligonucleotides from the first group but not the second group. The result is a single-stranded product having a sequence corresponding to the oligonucleotides in the first group.
  • a ligase reaction mixture that contains an assembled predetermined nucleic acid fragment also may contain a distribution of smaller fragments resulting from the assembly of a subset of the oligonucleotides.
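The templated ligation described above, including the design in which the second group is too short to be ligated itself, can be sketched as a simple model. The assumptions are loud ones: sequences are 5' to 3' DNA strings, a junction is ligated only when some Group N oligonucleotide is perfectly complementary across it with at least `min_arm` bases on each side, and the names `revcomp` and `ligate_group_p` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate_group_p(p_oligos: list, n_oligos: list, min_arm: int = 4) -> list:
    """Join adjacent Group P oligonucleotides wherever an N oligo bridges them.

    p_oligos are given in 5'->3' order along the positive strand with no
    gaps between them. Returns the list of ligation products, so a missing
    bridging oligonucleotide shows up as two shorter fragments.
    """
    templates = [revcomp(n) for n in n_oligos]

    def bridged(left: str, right: str) -> bool:
        joined, junction = left + right, len(left)
        for t in templates:
            start = joined.find(t)
            while start != -1:
                spans_left = start <= junction - min_arm
                spans_right = start + len(t) >= junction + min_arm
                if spans_left and spans_right:
                    return True
                start = joined.find(t, start + 1)
        return False

    products = [p_oligos[0]]
    for left, right in zip(p_oligos, p_oligos[1:]):
        if bridged(left, right):
            products[-1] += right      # ligase seals the nick
        else:
            products.append(right)     # no template: the fragments stay separate
    return products


if __name__ == "__main__":
    target = "ATGGCTAGCAAGGAGAAGAACTTTTCACTGGAGTTGTCCC"  # hypothetical
    group_p = [target[0:14], target[14:28], target[28:40]]
    group_n = [revcomp(target[8:20]), revcomp(target[22:34])]
    print(ligate_group_p(group_p, group_n))
    print(ligate_group_p(group_p, group_n[:1]))
```

Because the two Group N oligonucleotides in this example do not abut each other, only the Group P strand is ligated, which corresponds to the single-stranded product design; dropping one bridging oligonucleotide yields the kind of shorter fragments mentioned above.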
  • FIG. 4 shows an embodiment of a ligase-based assembly where one or more of the plurality of oligonucleotides is bound to a support.
  • In FIG. 4A, the 5'-most oligonucleotide of the P group oligonucleotides is bound to a support. Ligation of adjacent oligonucleotides in the 5' to 3' direction results in the assembly of a predetermined nucleic acid fragment.
  • N 2 may be in the form of a single oligonucleotide or it already may be ligated to one or more downstream oligonucleotides (N3, N4, etc.).
  • oligonucleotide may be bound to a support since the reaction can proceed in any direction.
  • a predetermined nucleic acid fragment may be assembled with a central oligonucleotide (i.e., neither the 5'-most nor the 3'-most) that is bound to a support provided that the attachment to the support does not interfere with ligation.
  • FIG. 4B illustrates an example where a plurality of N group oligonucleotides are bound to a support and a predetermined nucleic acid fragment is assembled from P group oligonucleotides that anneal to their complementary support-bound N group oligonucleotides.
  • FIG. 4B illustrates a sequential addition.
  • adjacent P group oligonucleotides may be ligated in any order.
  • the bound oligonucleotides may be attached at their 5' end, 3' end, or at any other position provided that the attachment does not interfere with their ability to bind to complementary 5' and 3' regions on the oligonucleotides that are being assembled.
  • This reaction may involve one or more reaction condition changes (e.g., temperature cycles) so that ligated oligonucleotides bound to one immobilized N group oligonucleotide can be dissociated from the support and bind to a different immobilized N group oligonucleotide to provide a substrate for ligation to another P group oligonucleotide.
  • support-bound ligase reactions that generate a full length predetermined nucleic acid fragment also may generate a distribution of smaller fragments resulting from the assembly of subsets of the oligonucleotides.
  • a support used in any of the assembly reactions described herein may include any suitable support medium.
  • a support may be solid, porous, a matrix, a gel, beads, beads in a gel, etc.
  • a support may be of any suitable size.
  • a solid support may be provided in any suitable configuration or shape (e.g., a chip, a bead, a gel, a microfluidic channel, a planar surface, a spherical shape, a column, etc.).
  • different oligonucleotide assembly reactions may be used to assemble a plurality of overlapping oligonucleotides (with overlaps that are either 5'/5', 3'/3', 5'/3', complementary, non-complementary, or a combination thereof).
  • oligonucleotides the pair including one oligonucleotide from a first group or P group of oligonucleotides and one oligonucleotide from a second group or N group of oligonucleotides
  • a predetermined nucleic acid may be assembled from non-overlapping oligonucleotides using blunt-ended ligation reactions.
  • the order of assembly of the non-overlapping oligonucleotides may be biased by selective phosphorylation of different 5' ends.
  • size purification may be used to select for the correct order of assembly.
  • the correct order of assembly may be promoted by sequentially adding appropriate oligonucleotide substrates into the reaction (e.g., the ligation reaction).
  • a purification step may be used to remove starting oligonucleotides and/or incompletely assembled fragments.
  • a purification step may involve chromatography, electrophoresis, or other physical size separation technique.
  • a purification step may involve amplifying the full length product. For example, a pair of amplification primers (e.g., PCR primers) that correspond to the predetermined 5' and 3' ends of the nucleic acid fragment being assembled will preferentially amplify full length product in an exponential fashion.
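The preferential amplification of full-length product can be illustrated with a short sketch. It assumes plain 5' to 3' DNA strings and perfect primer matches, ignores melting temperature and primer-dimer considerations, and uses hypothetical helper names (`revcomp`, `end_primers`, `amplifies_exponentially`) and a hypothetical primer length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def end_primers(target: str, length: int = 18):
    """Primers that correspond to the predetermined 5' and 3' ends of target.

    The forward primer is identical to the 5' end; the reverse primer is
    the reverse complement of the 3' end.
    """
    if len(target) < 2 * length:
        raise ValueError("target is too short for primers of this length")
    return target[:length], revcomp(target[-length:])


def amplifies_exponentially(fragment: str, forward: str, reverse: str) -> bool:
    """True when the fragment carries both primer sites.

    A fragment with only one site is copied linearly at best, so it is
    outgrown by full-length product over successive cycles.
    """
    return forward in fragment and revcomp(reverse) in fragment


if __name__ == "__main__":
    target = "ATGGCTAGCAAGGAGAAGAACTTTTCACTGGAGTTGTCCC"  # hypothetical
    fwd, rev = end_primers(target, length=12)
    print(amplifies_exponentially(target, fwd, rev))        # full length
    print(amplifies_exponentially(target[:30], fwd, rev))   # truncated
```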
  • the sequence of the predetermined fragment will be provided by the oligonucleotides as described herein.
  • the oligonucleotides may contain additional sequence information that may be removed during assembly or may be provided to assist in subsequent manipulations of the assembled nucleic acid fragment. Examples of additional sequences include, but are not limited to, primer recognition sequences for amplification (e.g., PCR primer recognition sequences), restriction enzyme recognition sequences, recombination sequences, other binding or recognition sequences, labeled sequences, etc.
  • one or more of the 5'-most oligonucleotides, one or more of the 3'-most oligonucleotides, or any combination thereof may contain one or more additional sequences.
  • the additional sequence information may be contained in two or more adjacent oligonucleotides on either strand of the predetermined nucleic acid sequence.
  • an assembled nucleic acid fragment may contain additional sequences that may be used to connect the assembled fragment to one or more additional nucleic acid fragments (e.g., one or more other assembled fragments, fragments obtained from other sources, vectors, etc.) via ligation, recombination, polymerase-mediated assembly, etc.
  • purification may involve cloning one or more assembled nucleic acid fragments. The cloned product may be screened (e.g., sequenced, analyzed for an insert of the expected size, etc.).
  • a nucleic acid fragment assembled from a plurality of oligonucleotides may be combined with one or more additional nucleic acid fragments using a polymerase-based and/or a ligase-based extension reaction similar to those described herein for oligonucleotide assembly. Accordingly, one or more overlapping nucleic acid fragments may be combined and assembled to produce a larger nucleic acid fragment as described herein. In certain embodiments, double-stranded overlapping oligonucleotide fragments may be combined. However, single-stranded fragments, or combinations of single-stranded and double-stranded fragments may be combined as described herein.
  • a nucleic acid fragment assembled from a plurality of oligonucleotides may be of any length depending on the number and length of the oligonucleotides used in the assembly reaction.
  • a nucleic acid fragment (either single-stranded or double-stranded) assembled from a plurality of oligonucleotides may be between 50 and 1,000 nucleotides long (for example, about 70 nucleotides long, between 100 and 500 nucleotides long, between 200 and 400 nucleotides long, about 200 nucleotides long, about 300 nucleotides long, about 400 nucleotides long, etc.).
  • One or more such nucleic acid fragments (e.g., with overlapping 3' and/or 5' ends) may be assembled to form a larger nucleic acid fragment (single-stranded or double-stranded) as described herein.
  • a full length product assembled from smaller nucleic acid fragments also may be isolated or purified as described herein (e.g., using a size selection, cloning, selective binding or other suitable purification procedure).
  • any assembled nucleic acid fragment (e.g., full-length nucleic acid fragment) described herein may be amplified (prior to, as part of, or after, a purification procedure) using appropriate 5' and 3' amplification primers.
  • P Group and N Group oligonucleotides are used herein for clarity purposes only, and to illustrate several embodiments of multiplex oligonucleotide assembly.
  • the Group P and Group N oligonucleotides described herein are interchangeable, and may be referred to as first and second groups of oligonucleotides corresponding to sequences on complementary strands of a target nucleic acid fragment.
  • Oligonucleotides may be synthesized using any suitable technique.
  • oligonucleotides may be synthesized on a column or other support (e.g., a chip).
  • chip-based synthesis techniques include techniques used in synthesis devices or methods available from Combimatrix, Agilent, Affymetrix, or other sources.
  • a synthetic oligonucleotide may be of any suitable size, for example between 10 and 1,000 nucleotides long (e.g., between 10 and 200, 200 and 500, 500 and 1,000 nucleotides long, or any combination thereof).
  • An assembly reaction may include a plurality of oligonucleotides, each of which independently may be between 10 and 200 nucleotides in length (e.g., between 20 and 150, between 30 and 100, 30 to 90, 30-80, 30-70, 30-60, 35-55, 40-50, or any intermediate number of nucleotides). However, one or more shorter or longer oligonucleotides may be used in certain embodiments. Oligonucleotides may be provided as single stranded synthetic products.
  • oligonucleotides may be provided as double-stranded preparations including an annealed complementary strand.
  • Oligonucleotides may be molecules of DNA, RNA, PNA, or any combination thereof.
  • a double-stranded oligonucleotide may be produced by amplifying a single-stranded synthetic oligonucleotide or other suitable template (e.g., a sequence in a nucleic acid preparation such as a nucleic acid vector or genomic nucleic acid).
  • a plurality of oligonucleotides designed to have the sequence features described herein may be provided as a plurality of single-stranded oligonucleotides having those features, or also may be provided along with complementary oligonucleotides.
  • an oligonucleotide may be phosphorylated (e.g., with a 5' phosphate).
  • an oligonucleotide may be non-phosphorylated.
  • an oligonucleotide may be amplified using an appropriate primer pair with one primer corresponding to each end of the oligonucleotide (e.g., one that is complementary to the 3' end of the oligonucleotide and one that is identical to the 5' end of the oligonucleotide).
  • an oligonucleotide may be designed to contain a central assembly sequence (designed to be incorporated into the target nucleic acid) flanked by a 5' amplification sequence (e.g., a 5' universal sequence) and a 3' amplification sequence (e.g., a 3' universal sequence).
  • Amplification primers corresponding to the flanking amplification sequences may be used to amplify the oligonucleotide (e.g., one primer may be complementary to the 3' amplification sequence and one primer may have the same sequence as the 5' amplification sequence).
  • the amplification sequences then may be removed from the amplified oligonucleotide using any suitable technique to produce an oligonucleotide that contains only the assembly sequence.
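The layout of an amplifiable oligonucleotide (5' amplification sequence, central assembly sequence, 3' amplification sequence) can be sketched as a string operation. This only illustrates the sequence bookkeeping, not any particular removal chemistry; the function name `strip_flanks` and the example sequences are hypothetical.

```python
def strip_flanks(amplified: str, five_prime_amp: str, three_prime_amp: str) -> str:
    """Return the central assembly sequence of an amplified oligonucleotide.

    Raises ValueError when the expected universal flanking sequences are
    not present at the two ends.
    """
    if len(amplified) < len(five_prime_amp) + len(three_prime_amp):
        raise ValueError("oligonucleotide is shorter than its flanking sequences")
    if not amplified.startswith(five_prime_amp):
        raise ValueError("5' amplification sequence not found")
    if not amplified.endswith(three_prime_amp):
        raise ValueError("3' amplification sequence not found")
    return amplified[len(five_prime_amp):len(amplified) - len(three_prime_amp)]


if __name__ == "__main__":
    five = "GACTGACTGACT"            # hypothetical 5' universal sequence
    three = "AGGTTAGGATAGGT"         # hypothetical 3' universal sequence
    assembly = "ATGGCTAGCAAGGAGAAGAAC"
    print(strip_flanks(five + assembly + three, five, three))
```

Because every oligonucleotide in a pool can share the same two flanking sequences, one primer pair is enough to amplify the whole pool before the flanks are removed.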
  • a plurality of different oligonucleotides e.g., about 5, 10,
  • a preparation of an oligonucleotide designed to have a certain sequence may include oligonucleotide molecules having the designed sequence in addition to oligonucleotide molecules that contain errors (e.g., that differ from the designed sequence at least at one position).
  • a sequence error may include one or more nucleotide deletions, additions, substitutions (e.g., transversion or transition), inversions, duplications, or any combination of two or more thereof.
  • Oligonucleotide errors may be generated during oligonucleotide synthesis. Different synthetic techniques may be prone to different error profiles and frequencies. In some embodiments, error rates may vary from 1/10 to 1/200 errors per base depending on the synthesis protocol that is used. However, in some embodiments lower error rates may be achieved. Also, the types of errors may depend on the synthetic techniques that are used. For example, in some embodiments chip-based oligonucleotide synthesis may result in relatively more deletions than column-based synthetic techniques. In some embodiments, one or more oligonucleotide preparations may be processed to remove (or reduce the frequency of) error-containing oligonucleotides. In some embodiments, a hybridization technique may be used wherein an oligonucleotide preparation is hybridized under stringent conditions one or more times to an immobilized oligonucleotide preparation designed to have a complementary sequence.
  • Oligonucleotides that do not bind may be removed in order to selectively or specifically remove oligonucleotides that contain errors that would destabilize hybridization under the conditions used. It should be appreciated that this processing may not remove all error-containing oligonucleotides since many have only one or two sequence errors and may still bind to the immobilized oligonucleotides with sufficient affinity for a fraction of them to remain bound through this selection processing procedure.
  • a nucleic acid binding protein or recombinase may be included in one or more of the oligonucleotide processing steps to improve the selection of error free oligonucleotides. For example, by preferentially promoting the hybridization of oligonucleotides that are completely complementary with the immobilized oligonucleotides, the amount of error containing oligonucleotides that are bound may be reduced.
  • this oligonucleotide processing procedure may remove more error-containing oligonucleotides and generate an oligonucleotide preparation that has a lower error frequency (e.g., with an error rate of less than 1/50, less than 1/100, less than 1/200, less than 1/300, less than 1/400, less than 1/500, less than 1/1,000, or less than 1/2,000 errors per base).
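To give a feel for what these per-base error rates mean for a whole oligonucleotide, the expected fraction of error-free molecules can be estimated as (1 - e)^L for error rate e and length L. This is a back-of-the-envelope sketch that assumes errors occur independently at each position with the same probability, which real synthesis chemistries only approximate; the function name and the 50-nucleotide example length are hypothetical.

```python
def error_free_fraction(error_rate_per_base: float, length: int) -> float:
    """Expected fraction of molecules with no error at any position.

    Assumption: each base is wrong independently with the same probability.
    """
    if not 0.0 <= error_rate_per_base <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - error_rate_per_base) ** length


if __name__ == "__main__":
    length = 50  # hypothetical oligonucleotide length, in nucleotides
    for denominator in (10, 50, 200, 1000, 2000):
        fraction = error_free_fraction(1.0 / denominator, length)
        print(f"1/{denominator} errors per base -> {fraction:.3f} error-free")
```

The same formula applied to an assembled fragment of several hundred nucleotides shows why lowering the per-base error rate before assembly matters.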
  • a plurality of oligonucleotides used in an assembly reaction may contain preparations of synthetic oligonucleotides, single-stranded oligonucleotides, double-stranded oligonucleotides, amplification products, oligonucleotides that are processed to remove (or reduce the frequency of) error-containing variants, etc., or any combination of two or more thereof.
  • a synthetic oligonucleotide may be amplified prior to use. Either strand of a double-stranded amplification product may be used as an assembly oligonucleotide and added to an assembly reaction as described herein.
  • a synthetic oligonucleotide may be amplified using a pair of amplification primers (e.g., a first primer that hybridizes to the 3' region of the oligonucleotide and a second primer that hybridizes to the 3' region of the complement of the oligonucleotide).
  • the oligonucleotide may be synthesized on a support such as a chip (e.g., using an ink-jet-based synthesis technology).
  • the oligonucleotide may be amplified while it is still attached to the support. In some embodiments, the oligonucleotide may be removed or cleaved from the support prior to amplification.
  • the two strands of a double-stranded amplification product may be separated and isolated using any suitable technique. In some embodiments, the two strands may be differentially labeled (e.g., using one or more different molecular weight, affinity, fluorescent, electrostatic, magnetic, and/or other suitable tags). The different labels may be used to purify and/or isolate one or both strands. In some embodiments, biotin may be used as a purification tag.
  • the strand that is to be used for assembly may be directly purified (e.g., using an affinity or other suitable tag).
  • the complementary strand is removed (e.g., using an affinity or other suitable tag) and the remaining strand is used for assembly.
  • a synthetic oligonucleotide may include a central assembly sequence flanked by 5' and 3' amplification sequences.
  • the central assembly sequence is designed for incorporation into an assembled nucleic acid.
  • the flanking sequences are designed for amplification and are not intended to be incorporated into the assembled nucleic acid.
  • the flanking amplification sequences may be used as universal primer sequences to amplify a plurality of different assembly oligonucleotides that share the same amplification sequences but have different central assembly sequences.
  • the flanking sequences are removed after amplification to produce an oligonucleotide that contains only the assembly sequence.
  • one of the two amplification primers may be biotinylated.
  • the nucleic acid strand that incorporates this biotinylated primer during amplification can be affinity purified using streptavidin (e.g., bound to a bead, column, or other surface).
  • the amplification primers also may be designed to include certain sequence features that can be used to remove the primer regions after amplification in order to produce a single-stranded assembly oligonucleotide that includes the assembly sequence without the flanking amplification sequences.
  • the non-biotinylated strand may be used for assembly.
  • the assembly oligonucleotide may be purified by removing the biotinylated complementary strand.
  • the amplification sequences may be removed if the non-biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence.
  • the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the biotinylated primer is removed. The biotinylated strand is then removed.
  • the remaining non-biotinylated strand is then treated with uracil-DNA glycosylase (UDG) to remove the non-biotinylated primer sequence.
  • This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
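The two trimming steps described above can be followed on a single strand with a small string model. It is a deliberately simplified sketch: it tracks only the strand kept for assembly, treats the editing activity as removing 3' nucleotides until the first occurrence of the supplied (fourth) nucleotide, and treats UDG treatment as releasing everything up to and including the dU. The function names and all sequences are hypothetical, and the model assumes the assembly sequence itself contains no dU.

```python
def editing_chew_back(strand: str, supplied_nt: str) -> str:
    """Model the 3' trimming by a proofreading polymerase.

    3' nucleotides are removed until the 3'-terminal base is the one
    nucleotide supplied in the reaction, where removal and re-addition
    balance. Returns an empty string if that base never occurs.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] != supplied_nt:
        end -= 1
    return strand[:end]


def udg_release_primer(strand: str) -> str:
    """Model primer removal at the dU placed at the 3' end of the primer.

    Everything up to and including the first U is discarded; a strand
    without U is returned unchanged.
    """
    cut = strand.find("U")
    return strand if cut == -1 else strand[cut + 1:]


if __name__ == "__main__":
    primer_with_du = "GACTGACTGACU"      # hypothetical primer, dU at its 3' end
    assembly = "ATGGCTAGCAAGGAGAAGAAC"   # ends in C at the junction
    three_prime_amp = "AGGTTAGGATAGGT"   # contains only A, G and T (no C)
    strand = primer_with_du + assembly + three_prime_amp

    trimmed = editing_chew_back(strand, supplied_nt="C")
    print(udg_release_primer(trimmed))
```

The design constraint in the text is visible here: the 3' amplification sequence must lack the supplied nucleotide entirely, and that nucleotide must sit at (or next to) the junction, or the trimming would stop in the wrong place.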
  • the biotinylated strand may be used for assembly.
  • the assembly oligonucleotide may be obtained directly by isolating the biotinylated strand.
  • the amplification sequences may be removed if the biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the non-biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence.
  • the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the non-biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the non-biotinylated primer is removed. The biotinylated strand is then isolated (and the non-biotinylated strand is removed).
  • the isolated biotinylated strand is then treated with UDG to remove the biotinylated primer sequence.
  • This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
  • biotinylated primer may be designed to anneal to either the synthetic oligonucleotide or to its complement for the amplification and purification reactions described above.
  • non-biotinylated primer may be designed to anneal to either strand provided it anneals to the strand that is complementary to the strand recognized by the biotinylated primer.
  • an oligonucleotide may be modified by incorporating a modified-base (e.g., a nucleotide analog) during synthesis, by modifying the oligonucleotide after synthesis, or any combination thereof.
  • modifications include, but are not limited to, one or more of the following: universal bases such as nitroindoles, dP and dK.
  • nucleic acid binding proteins or recombinases are preferably not included in a post-assembly fidelity optimization technique (e.g., a screening technique using a MutS or MutS homolog), because the optimization procedure involves removing error-containing nucleic acids via the production and removal of heteroduplexes. Accordingly, any nucleic acid binding proteins or recombinases (e.g., RecA) that were included in the assembly steps are preferably removed (e.g., by inactivation, column purification or other suitable technique) after assembly and prior to fidelity optimization.
  • the invention provides methods for assembling synthetic nucleic acids with increased efficiency.
  • the resulting assembled nucleic acids may be amplified in vitro (e.g., using PCR, LCR, or any suitable amplification technique), amplified in vivo (e.g., via cloning into a suitable vector), isolated and/or purified.
  • An assembled nucleic acid (alone or cloned into a vector) may be transformed into a host cell (e.g., a prokaryotic, eukaryotic, insect, mammalian, or other host cell).
  • the host cell may be used to propagate the nucleic acid.
  • the nucleic acid may be integrated into the genome of the host cell.
  • the nucleic acid may replace a corresponding nucleic acid region on the genome of the cell (e.g., via homologous recombination). Accordingly, nucleic acids may be used to produce recombinant organisms.
  • a target nucleic acid may be an entire genome or large fragments of a genome that are used to replace all or part of the genome of a host organism. Recombinant organisms also may be used for a variety of research, industrial, agricultural, and/or medical applications.
  • concerted assembly may be used to assemble oligonucleotide duplexes and nucleic acid fragments of less than 100 to more than 10,000 base pairs in length (e.g., 100 mers to 500 mers, 500 mers to 1,000 mers, 1,000 mers to 5,000 mers, 5,000 mers to 10,000 mers, 25,000 mers, 50,000 mers, 75,000 mers, 100,000 mers, etc.).
  • methods described herein may be used during the assembly of an entire genome (or a large fragment thereof, e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more) of an organism (e.g., of a viral, bacterial, yeast, or other prokaryotic or eukaryotic organism), optionally incorporating specific modifications into the sequence at one or more desired locations.
  • nucleic acid products e.g., including nucleic acids that are amplified, cloned, purified, isolated, etc.
  • any of the nucleic acid products may be packaged in any suitable format (e.g., in a stable buffer, lyophilized, etc.) for storage and/or shipping (e.g., for shipping to a distribution center or to a customer).
  • any of the host cells e.g., cells transformed with a vector or having a modified genome
  • cells may be prepared in a suitable buffer for storage and/or transport (e.g., for distribution to a customer).
  • cells may be frozen.
  • other stable cell preparations also may be used.
  • Host cells may be grown and expanded in culture.
  • Host cells may be used for expressing one or more RNAs or polypeptides of interest (e.g., therapeutic, industrial, agricultural, and/or medical proteins).
  • the expressed polypeptides may be natural polypeptides or non-natural polypeptides.
  • the polypeptides may be isolated or purified for subsequent use. Accordingly, nucleic acid molecules generated using methods of the invention can be incorporated into a vector.
  • the vector may be a cloning vector or an expression vector.
  • a vector may comprise an origin of replication and one or more selectable markers (e.g., antibiotic resistant markers, auxotrophic markers, etc.).
  • the vector may be a viral vector.
  • a viral vector may comprise nucleic acid sequences capable of infecting target cells.
  • a prokaryotic expression vector operably linked to an appropriate promoter system can be used to transform target cells.
  • a eukaryotic vector operably linked to an appropriate promoter system can be used to transfect target cells or tissues.
  • RNAs or polypeptides may be isolated or purified.
  • Nucleic acids of the invention also may be used to add detection and/or purification tags to expressed polypeptides or fragments thereof.
  • polypeptide-based fusions/tags include, but are not limited to, hexa-histidine (His6), Myc, and HA, and other polypeptides with utility, such as GFP, GST, MBP, chitin and the like.
  • polypeptides may comprise one or more unnatural amino acid residue(s).
  • antibodies can be made against polypeptides or fragment(s) thereof encoded by one or more synthetic nucleic acids.
  • synthetic nucleic acids may be provided as libraries for screening in research and development (e.g., to identify potential therapeutic proteins or peptides, to identify potential protein targets for drug development, etc.)
  • a synthetic nucleic acid may be used as a therapeutic (e.g., for gene therapy, or for gene regulation).
  • a synthetic nucleic acid may be administered to a patient in an amount sufficient to express a therapeutic amount of a protein.
  • a synthetic nucleic acid may be administered to a patient in an amount sufficient to regulate (e.g., down-regulate) the expression of a gene.
  • an assembly procedure may involve a combination of acts that are performed at one site (in the United States or outside the United States) and acts that
  • aspects of the invention may include automating one or more acts described herein.
  • a sequence analysis may be automated in order to generate a synthesis strategy automatically.
  • the synthesis strategy may include i) the design of the starting nucleic acids that are to be assembled into the target nucleic acid, ii) the choice of the assembly technique(s) to be used, iii) the number of rounds of assembly and error screening or sequencing steps to include, and/or decisions relating to subsequent processing of an assembled target nucleic acid.
  • one or more steps of an assembly reaction may be automated using one or more automated sample handling devices (e.g., one or more automated liquid or fluid handling devices).
  • starting nucleic acids e.g., oligonucleotides
  • a nucleic acid synthesizer and automated procedures.
  • Automated devices and procedures may be used to mix reaction reagents, including one or more of the following: starting nucleic acids, buffers, enzymes (e.g., one or more ligases and/or polymerases), nucleotides, nucleic acid binding proteins or recombinases, salts, and any other suitable agents such as stabilizing agents.
  • Automated devices and procedures also may be used to control the reaction conditions. For example, an automated thermal cycler may be used to control reaction temperatures and any temperature cycles that may be used.
  • a thermal cycler may be automated to provide one or more reaction temperatures or temperature cycles suitable for incubating nucleic acid fragments prior to transformation.
  • subsequent purification and analysis of assembled nucleic acid products may be automated.
  • fidelity optimization steps e.g., a MutS error screening procedure
  • Sequencing also may be automated using a sequencing device and automated sequencing protocols. Additional steps (e.g., amplification, cloning, etc.) also may be automated using one or more appropriate devices and related protocols. It should be appreciated that one or more of the devices or device components described herein may be combined in a system (e.g., a robotic system).
  • Assembly reaction mixtures may be transferred from one component of the system to another using automated devices and procedures (e.g., robotic manipulation and/or transfer of samples and/or sample containers, including automated pipetting devices, etc.).
  • the system and any components thereof may be controlled by a control system.
  • acts of the invention may be automated using, for example, a computer system (e.g., a computer controlled system).
  • a computer system on which aspects of the invention can be implemented may include a computer for any type of processing (e.g., sequence analysis and/or automated device control as described herein).
  • processing steps may be provided by one or more of the automated devices that are part of the assembly system.
  • a computer system may include two or more computers.
  • one computer may be coupled, via a network, to a second computer.
  • One computer may perform sequence analysis.
  • the second computer may control one or more of the automated synthesis and assembly devices in the system.
  • additional computers may be included in the network to control one or more of the analysis or processing acts.
  • Each computer may include a memory and processor.
  • the computers can take any form, as the aspects of the present invention are not limited to being implemented on any particular computer platform.
  • the network can take any form, including a private network or a public network (e.g., the Internet).
  • Display devices can be associated with one or more of the devices and computers.
  • a display device may be located at a remote site and connected for displaying the output of an analysis in accordance with the invention. Connections between the different components of the system may be via wire, wireless transmission, satellite transmission, any other suitable transmission, or any combination of two or more of the above.
  • sequence information e.g., a target sequence, a processed analysis of the target sequence, etc.
  • a public network such as the Internet
  • a remote location to be processed by computer to produce any of the various types of outputs discussed herein (e.g., in connection with oligonucleotide design).
  • the aspects of the present invention described herein are not limited in that respect, and that numerous other configurations are possible.
  • all of the analysis and processing described herein can alternatively be implemented on a computer that is attached locally to a device, an assembly system, or one or more components of an assembly system.
  • sequence information e.g., a target sequence, a processed analysis of the target sequence, etc.
  • a communication medium e.g., the network
  • the information can be loaded onto a computer readable medium that can then be physically transported to another computer for processing in the manners described herein.
  • a combination of two or more transmission/delivery techniques may be used.
  • computer implementable programs for performing a sequence analysis or controlling one or more of the devices, systems, or system components described herein also may be transmitted via a network or loaded onto a computer readable medium as described herein. Accordingly, aspects of the invention may involve performing one or more steps within the United States and additional steps outside the United States.
  • sequence information (e.g., a customer order) may be received at one location (e.g., in one country) and sent to a remote location for processing (e.g., in the same country or in a different country), for example, for sequence analysis to determine a synthesis strategy and/or design oligonucleotides.
  • a portion of the sequence analysis may be performed at one site (e.g., in one country) and another portion at another site (e.g., in the same country or in another country).
  • different steps in the sequence analysis may be performed at multiple sites (e.g., all in one country or in several different countries). The results of a sequence analysis then may be sent to a further site for synthesis.
  • different synthesis and quality control steps may be performed at more than one site (e.g., within one country or in two or more countries).
  • An assembled nucleic acid then may be shipped to a further site (e.g., either to a central shipping center or directly to a client).
  • each of the different aspects, embodiments, or acts of the present invention described herein can be independently automated and implemented in any of numerous ways.
  • each aspect, embodiment, or act can be independently implemented using hardware, software or a combination thereof.
  • the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions.
  • the one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
  • one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs one or more of the above-discussed functions of the present invention.
  • the computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer system resource to implement one or more functions of the present invention discussed herein.
  • the reference to a computer program which, when executed, performs the above-discussed functions is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention.
  • the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user). Accordingly, overall system-level control of the assembly devices or components described herein may be performed by a system controller which may provide control signals to the associated nucleic acid synthesizers, liquid handling devices, thermal cyclers, sequencing devices, associated robotic components, as well as other suitable systems for performing the desired input/output or other control functions.
  • the system controller along with any device controllers together form a controller that controls the operation of a nucleic acid assembly system.
  • the controller may include a general purpose data processing system, which can be a general purpose computer, or network of general purpose computers, and other associated devices, including communications devices, modems, and/or other circuitry or components necessary to perform the desired input/output or other functions.
  • the controller can also be implemented, at least in part, as a single special purpose integrated circuit (e.g., ASIC) or an array of ASICs, each having a main or central processor section for overall, system-level control, and separate sections dedicated to performing various different specific computations, functions and other processes under the control of the central processor section.
  • ASIC application specific integrated circuit
  • the controller can also be implemented using a plurality of separate dedicated programmable integrated or other electronic circuits or devices, e.g., hard wired electronic or logic circuits such as discrete element circuits or programmable logic devices.
  • the controller can also include any other components or devices, such as user input/output devices (monitors, displays, printers, a keyboard, a user pointing device, touch screen, or other user interface, etc.), data storage devices, drive motors, linkages, valve controllers, robotic devices, vacuum and other pumps, pressure sensors, detectors, power supplies, pulse sources, communication devices or other electronic circuitry or components, and so on.
  • the controller also may control operation of other portions of a system, such as automated client order processing, quality control, packaging, shipping, billing, etc., to perform other suitable functions known in the art but not described in detail herein.
  • aspects of the invention may be useful to generate nucleic acid libraries that represent very large numbers of nucleic acid sequence variants (e.g., RNA candidates for an aptamer screen) nucleic acid assembly reactions. Accordingly, aspects of the invention relate to marketing methods, compositions, kits, devices, and systems for generating nucleic acid libraries that represent very large numbers of nucleic acid sequence variants, methods and compositions for in vivo aptamer screening and selection, methods and compositions for identifying, monitoring, and generating metabolic pathways, and methods for designing and assembling libraries as described herein.
  • aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of synthetic nucleic acids, and/or related compositions. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling libraries and identifying aptamers in vivo as described herein. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving techniques and assays described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed.
  • Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein.
  • Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes.
  • Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
  • step (1) a primerless assembly of oligonucleotides is performed and in step (2) an assembled nucleic acid fragment is amplified in a primer-based amplification.
  • a 993 base long promoter>EGFP construct was assembled from 50-mer abutting oligonucleotides using a 2-step PCR assembly.
  • oligonucleotide pools were prepared as follows: 36 overlapping 50-mer oligonucleotides and two 5' terminal 59-mers were separated into 4 pools, each corresponding to overlapping 200-300 nucleotide segments of the final construct. The total oligonucleotide concentration in each pool was 5 µM.
  • a primerless PCR extension reaction was used to stitch (assemble) overlapping oligonucleotides in each pool.
  • the PCR extension reaction mixture was as follows: oligonucleotide pool (5 µM total), 1.0 µl (~25 nM final each); dNTP (10 mM each), 0.5 µl (250 µM final each)
  • primerless PCR product, 1.0 µl; primer 5' (1.2 µM), 5 µl (300 nM final); primer 3' (1.2 µM), 5 µl (300 nM final); dNTP (10 mM each), 0.5 µl (250 µM final each)
  • Pfu polymerase (2.5 U/µl), 0.5 µl. The following PCR cycle conditions were used: start, 2 min at 95°C
  • the amplified sub-segments were assembled using another round of primerless PCR as follows.
  • a diluted amplification product was prepared for each sub-segment by diluting each amplified sub-segment PCR product 1:10 (4 µl mix + 36 µl dH2O). This diluted mix was used as follows: diluted sub-segment mix*, 1.0 µl; dNTP (10 mM each), 0.5 µl (250 µM final each); Pfu buffer (10x), 2.0 µl
  • Pfu polymerase (2.5 U/µl), 0.5 µl; dH2O to 20 µl
  • the following PCR cycle conditions were used: start, 2 min at 95°C; 30 cycles of 95°C for 30 sec, 65°C for 30 sec, 72°C for 1 min; final extension step, 2 min at 72°C
  • the full-length 993 nucleotide long promoter>EGFP was amplified in the following PCR mix: assembled sub-segments, 1.0 µl; primer 5' (1.2 µM), 5 µl (300 nM final); primer 3' (1.2 µM), 5 µl (300 nM final); dNTP (10 mM each), 0.5 µl (250 µM final each)
  • the present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
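The final concentrations quoted in the PCR mixes of the assembly example above can be checked with simple dilution arithmetic. The following is a minimal sketch, not part of the original disclosure. It assumes a 20 µl total reaction volume for every mix (the text states "to 20 µl" only for the sub-segment assembly mix) and an even split of the 38 oligonucleotides across the 4 pools; the function name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc, added_volume_ul, total_volume_ul=20.0):
    """Concentration after dilution, in the same unit as stock_conc.

    The 20 µl default is an assumption taken from the "dH2O to 20 µl" entry.
    """
    return stock_conc * added_volume_ul / total_volume_ul


# Oligonucleotide pool: 1.0 µl of a 5 µM (5000 nM) total stock.
pool_total_nM = final_concentration(5000.0, 1.0)

# 36 50-mers plus two 59-mers split over 4 pools (even split is an assumption).
oligos_per_pool = (36 + 2) / 4
per_oligo_nM = pool_total_nM / oligos_per_pool

# dNTPs: 0.5 µl of a 10 mM (10000 µM) each stock.
dntp_each_uM = final_concentration(10000.0, 0.5)

# Amplification primers: 5 µl of a 1.2 µM (1200 nM) stock.
primer_nM = final_concentration(1200.0, 5.0)

print(f"pool total: {pool_total_nM:.0f} nM, per oligo: {per_oligo_nM:.1f} nM")
print(f"dNTP each: {dntp_each_uM:.0f} µM, primer: {primer_nM:.0f} nM")
```

Under these assumptions the dNTP and primer values reproduce the 250 µM and 300 nM figures given in the example, and the per-oligonucleotide value comes out close to the 25 nM figure quoted for the pool.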

Abstract

Certain aspects of the present invention provide methods for assembling nucleic acid libraries that represent large numbers of sequence variants. Some embodiments involve generating libraries of RNA molecules that can be subjected to in vivo screens or selections to identify one or more aptamers. Certain aspects of the invention provide cells that transcribe one or more different aptamers (e.g., 10) that each bind to a different ligand and are each fused to a different reporter RNA.

Description

METHODS AND COMPOSITIONS FOR APTAMER PRODUCTION AND USES THEREOF
FIELD OF THE INVENTION
Aspects of the invention relate to methods and compositions for the production of aptamers and uses thereof.
BACKGROUND OF THE INVENTION
Methods for identifying nucleic acid aptamers involve in vitro binding protocols to select nucleic acid aptamers from random pools of starting nucleic acids. In vitro selection has yielded aptamers that bind to certain nucleic acids, proteins, and small organic compounds.
SUMMARY OF THE INVENTION
Aspects of the invention relate to nucleic acid libraries and host cells that can be used to screen many different nucleic acids in vivo and identify rare nucleic acids that have predetermined structural and/or functional properties of interest. Certain aspects of the invention involve identifying RNA aptamers using in vivo selections or screens. In some embodiments, recombinant cells may include several different in vivo aptamers associated with different reporter readouts.
Aspects of the invention take advantage of nucleic acid assembly technology that supports the production of any nucleic acid fragments (including large nucleic acid fragments) having a predetermined sequence of interest. Technology described herein allows libraries of the invention to be designed and assembled to include many different predetermined sequences of interest. This assembly technology also allows the production of nucleic acids that can be used to modify host organisms as described herein.
Aspects of the invention relate to RNA libraries that can be used to screen or select for RNA molecules with functional or structural properties in vivo (e.g., RNA aptamers). Other aspects of the invention relate to libraries of RNA molecules having predetermined structural and/or functional properties. Aspects of the invention provide compositions and methods for expressing RNA libraries in vivo. Further aspects of the invention provide modified host cells that are adapted to express RNA libraries of interest. For example, a host cell may express a specific polymerase for transcribing the RNA, a ribonuclease that can specifically cleave long RNA transcripts, an RNA polymerase that can incorporate modified nucleotides, or any combination thereof.
In another aspect, an RNA aptamer may be identified in an in vitro screen or selection. In some embodiments, a pool of RNA molecules may be provided wherein each RNA molecule contains a reporter domain (e.g., a riboregulator sequence) attached to a different unique RNA sequence (e.g., a random RNA sequence) that is an aptamer candidate. The pool of RNA molecules may be screened or selected to identify RNA variants that bind to a molecule (e.g., ligand) of interest. Screening or selection assays may involve different configurations of an assay wherein a candidate RNA that binds to the molecule of interest can be identified when it produces a configuration change in the reporter domain that can be detected in an in vitro assay. In some embodiments, the configuration change of a riboregulator domain attached to an aptamer can be detected using any suitable technique. For example, a configuration change that affects the binding properties of the riboregulator (e.g., by releasing or sequestering one or more sequences of the riboregulator due to changes in hybridization patterns within the riboregulator domain due to the conformational change) can be detected using one or more different binding assays.
In some embodiments, candidate RNA molecules or candidate motifs that are identified in an in vitro screen may be tested in vivo using a suitable expression vector for expressing the one or more RNA molecules. The candidate aptamer sequences (and/or variants thereof) may be tested in vivo in association with the same reporter domain that was used for the in vitro screen. However, in some embodiments, the candidate aptamer sequences may be connected to a different reporter domain to test for in vivo properties of interest.
In another aspect, the invention provides a method for identifying and producing an RNA aptamer having a property of interest. In some embodiments, an RNA aptamer may be obtained from an in vitro and/or in vivo screen or selection. An RNA aptamer may be expressed in a cell thereby providing a novel functional and/or structural property to that cell. In another aspect, the invention provides a method of producing a cell having an altered cell function by introducing into the cell a nucleic acid expressing one or more RNA aptamers having specific binding properties. In some embodiments, the method further comprises propagating the cell having an altered function. Accordingly, aspects of the invention relate to engineered cells expressing one or more identified aptamers of interest. It should be appreciated that an aptamer sequence that was identified using a construct and an assay wherein the aptamer sequence was connected to a reporter domain, subsequently may be synthesized, isolated, and/or expressed with or without the reporter domain. In some embodiments, the aptamer sequence may be synthesized, isolated, and/or expressed in association (e.g., fused) to one or more different reporter domains.
In some embodiments, the target nucleic acid (e.g., a nucleic acid expressing one or more RNA molecules of interest) may be amplified, sequenced or cloned after it is made. In some embodiments, a host cell may be transformed with the assembled target nucleic acid. The target nucleic acid may be integrated into the genome of the host cell. In some embodiments, the one or more RNA molecules of interest (e.g., including one or more aptamers of interest) may be expressed (e.g., under the control of an inducible promoter). One or more expressed RNAs (e.g., RNA aptamers) may be isolated and/or purified (e.g., from a cell or cell lysate). A cell transformed with an assembled nucleic acid may be stored, shipped, and/or propagated (e.g., grown in culture).
In another aspect, the invention provides methods of obtaining target nucleic acids that express one or more RNA molecules of interest (including RNA aptamers) by sending sequence information and delivery information to a remote site (e.g., inside or outside the United States). The sequence may be analyzed at the remote site. The starting nucleic acids may be designed and/or produced at the remote site. The starting nucleic acids may be assembled in a reaction involving a combination of ligation and extension techniques at the remote site. In some embodiments, the starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled target nucleic acid may be shipped to the delivery address that was provided. Other aspects of the invention provide systems for designing starting nucleic acids and/or for assembling the starting nucleic acids to express one or more RNA molecules (including RNA aptamers) as described herein. In some embodiments, nucleic acid preparations of the invention may be screened or selected at a remote location to identify and/or isolate RNA molecules (e.g., aptamers) having structural and/or functional properties of interest. The RNA molecules or cells expressing the RNA molecules may be returned from the remote site and/or sent from the remote site to a specified location. Yet further aspects of the invention relate to business methods of marketing one or more methods, systems, and/or automated procedures relating to RNA expression and/or RNA aptamers described herein.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims. The claims provided below are hereby incorporated into this section by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates non-limiting aspects of an embodiment of a polymerase-based multiplex oligonucleotide assembly reaction;
FIG. 2 illustrates non-limiting aspects of an embodiment of sequential assembly of a plurality of oligonucleotides in a polymerase-based multiplex assembly reaction;
FIG. 3 illustrates a non-limiting embodiment of a ligase-based multiplex oligonucleotide assembly reaction;
FIG. 4 illustrates several non-limiting embodiments of ligase-based multiplex oligonucleotide assembly reactions on supports;
FIG. 5 illustrates a non-limiting embodiment of a decision tree for designing a nucleic acid assembly method; and
FIG. 6 illustrates non-limiting embodiments of nucleic acid constructs encoding a plurality of RNA molecules (e.g., aptamer or aptamer candidate molecules).
DETAILED DESCRIPTION OF THE INVENTION
Aspects of the invention relate to nucleic acid libraries and methods and compositions for preparing libraries containing very high numbers of nucleic acid regions. Aspects of the invention involve preparing a library comprising a plurality of cells, each transformed with one or more separate nucleic acid molecules, wherein each nucleic acid molecule comprises a plurality of nucleic acid regions, and wherein each nucleic acid region can be assayed to evaluate one or more structural and/or functional properties. Accordingly, aspects of the invention can be used to assay a large number of nucleic acid regions for the presence of one or more regions having structural and/or functional properties of interest (e.g., one or more nucleic acid aptamers having selective ligand-binding properties). It should be appreciated that the in vivo and in vitro assays described herein may be used to identify RNA aptamers that bind to any ligand of interest. A ligand may be a biological ligand, for example an intracellular or extracellular ligand. A ligand may be associated with disease, health, or other physiological condition. Accordingly, a ligand may be a metabolite, a protein, a nucleic acid, a lipid, a carbohydrate, or other molecule or any combination thereof. However, ligands may be environmental or industrial molecules (e.g., toxins, minerals, chemical products, etc.) as described in more detail herein.
It should be appreciated that aspects of the invention may be used to identify aptamers that specifically bind to a target ligand. However, other aptamers may be identified that bind less specifically to a range of ligands having certain common features. Accordingly, certain aptamers may be isolated to bind selectively to a ligand but also to cross-react with other ligands. RNA aptamers may have different levels of specificity (e.g., high, medium, or low) for one or more ligands of interest. For example, aptamers may have affinities for their ligands ranging from nM to 10 μM (e.g., on the order of 1 nM, 10 nM, 100 nM, 500 nM, 1 μM, 10 μM or intermediate affinities). However, in some embodiments, aptamers may have higher or lower affinities for their ligands.
In some embodiments, each nucleic acid fragment can transcribe an RNA molecule. The RNA molecules can be assayed (e.g., in vivo or in vitro) to determine whether any of them have a structure or function of interest. Accordingly, in one aspect the invention provides in vivo libraries of transcribed RNA molecules that can be evaluated in vivo for the presence of one or more RNAs having structural and/or functional properties of interest (e.g., one or more RNA aptamers having selective ligand-binding properties under biological conditions). In some embodiments, the complexity of a library that comprises a plurality of different vectors wherein each vector encodes a plurality of different RNA molecules may be calculated as the number of transformants multiplied by the number of different RNA-encoding regions on each vector. By using vectors that encode a large number of different RNA molecules (e.g., 10-100 or more), a library of the invention provides a large number of different RNA variants. Accordingly, methods of the invention can be useful to sample a large number of potential nucleic acid sequence variants. By providing a platform for in vivo selection or screening, methods of the invention can be useful for identifying one or more nucleic acids (e.g., RNAs) that have structural and/or functional properties of interest under biological conditions. In contrast, aptamers that are identified through in vitro aptamer screening and selection technology may not maintain their selective ligand-binding properties under biological conditions.
In some aspects, the invention provides different cell lines, each comprising a plurality of different aptamers that each recognizes a different ligand and provides a different readout (e.g., signal) when its ligand is present in vivo. These cell lines, and the sets of aptamers that they contain, can be used in medicine, agriculture, industry, mining, or for other applications where the ability to detect and distinguish between different ligands can be very important. In some embodiments, a cell containing a plurality of different aptamers that can selectively bind to, and signal the presence of, different metabolic intermediates (e.g., intracellular metabolic intermediates) can be used to dissect and/or monitor metabolic pathways. Such cells, and the sets of aptamers that they contain, also can be used as markers to select and/or screen for enzymes, enzyme variants, or combinations thereof, that can form novel or modified metabolic pathways. Accordingly, aspects of the invention may be used to develop novel or modified metabolic pathways that may catalyze the conversion of a first compound to a second compound, that may degrade or modify certain compounds, that may synthesize certain compounds, or any combination thereof. For example, methods of the invention may be useful to develop pathways for degrading or modifying environmental contaminants to reduce their toxicity. In some embodiments, metabolic pathways for generating commercially useful compounds (e.g., ethanol and other commercially useful compounds) may be developed. In some embodiments, methods of the invention relate to in vivo aptamer identification and production. A library of RNA molecules may be transcribed and individual RNA molecules with functional and/or structural properties of interest may be identified.
In aspects of the invention, nucleic acid regions encoding different RNA molecules (e.g., RNA aptamers or aptamer candidates) may be of any length. In some embodiments, a nucleic acid region and the encoded RNA variant may be at least 50 to at least 200 nucleotide bases long. In certain embodiments, a transcribed variant RNA sequence may be at least 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotide bases long. However, certain variant RNA sequences may be shorter than 50 bases long (e.g., between about 10 and about 50 bases long) or longer than 200 nucleotides long. In some embodiments of the invention, each of the variant RNA sequences is connected to (e.g., operably connected to) one or more reporter sequences (e.g., a sequence of a riboregulator, antisense RNA, or other reporter sequence). It should be appreciated that each different variant RNA sequence (e.g., each different aptamer or aptamer candidate sequence) may be connected to the same reporter sequences. However, different variant RNA sequences or subsets of different variant sequences each may be connected to different reporter sequences in some embodiments. Accordingly, the length of the synthesized, encoded, and/or transcribed RNAs may be longer than the length of the variant sequence described above since each RNA also may include the length of a reporter sequence attached to it, if present.
In aspects of the invention, each vector may encode one or more separate RNA molecules. In certain embodiments, a single vector encodes about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more RNA molecules. In some embodiments, the RNA sequences are all different. However, in some embodiments several identical copies of one or more RNA sequences may be transcribed from a single vector. The sequences encoding the separate RNA molecules may be arranged in a linear array.
In some embodiments, transcription of one or more RNA molecules may be under the control of the same promoter. In certain embodiments, transcription of one or more RNA molecules may be under the control of separate promoters. In some embodiments, each RNA is transcribed from its own separate promoter. The separate promoters may be separate copies of the same promoter or different promoters. In some embodiments, one or more promoters may be inducible. In some embodiments, RNA transcription may involve transcription enzymes of the host cell.
In some embodiments, nucleic acid regions encoding separate RNA molecules may be transcribed as a single RNA transcript. A single RNA transcript may include 2 or more RNA molecules. In some embodiments, a single RNA transcript may include 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more RNA molecules. The single RNA transcript may include one or more cleavage sites that can be acted on to release one or more individual RNAs from the RNA transcript. In some embodiments, one or more enzymes may cut the cleavage sites to release individual RNAs. In some embodiments, the cleavage sites may be autocatalytic RNA cleavage sites. In other embodiments, RNAs may be transcribed as individual transcripts. In certain embodiments, a plurality of RNAs may be transcribed in a combination of individual RNA transcripts and RNA transcripts that include two or more RNAs. In aspects of the invention, a nucleic acid sequence encoding an RNA molecule and one or more regulatory sequences may be "operably" joined. The nucleic acid sequence and one or more regulatory sequences may be covalently linked in such a way as to place the transcription of the coding nucleic acid sequence under the influence or control of the regulatory sequences. A promoter region is operably joined to a coding nucleic acid sequence if the promoter region is capable of promoting transcription of that nucleic acid sequence such that the resulting transcript may be an RNA molecule of the invention.
The precise nature of the regulatory sequences needed for transcription may vary between species or cell types. In some embodiments, a 5' non-transcribed regulatory sequence may be used that includes a promoter region having a promoter sequence for transcriptional control of the operably joined nucleic acid sequence. Regulatory sequences may also include enhancer sequences or upstream activator sequences as desired.
Transcription vectors containing all the necessary elements for transcription are commercially available and known to those skilled in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. Systems and promoters for nucleic acid transcription in mammalian cells are known to those of ordinary skill in the art and available commercially.
In some embodiments, one or more transcribed RNA sequences may be identical. However, in order to maximize the number of different RNA sequences that may be sampled, each vector may encode a plurality of unique RNA sequences. The vector inserts that encode the unique RNA sequences may be made in a nucleic acid assembly procedure that is designed to generate a linear array of unique sequences. In addition, the nucleic acid assembly may be designed to produce a large number of different vector inserts each encoding a plurality of unique RNA sequences that are not repeated in any of the other different vector inserts. However, it should be appreciated that multiple copies of each different vector insert may be produced in order to clone the inserts into the vectors and/or in order to transform the host cells. The number of different vector inserts that are designed and assembled may be a function of the expected number of transformants. For example, if a host system can generate up to 10^10, 10^12, 10^14 or more different transformants, the number of different unique vector inserts should be similar or higher. It should be appreciated that if each insert encodes 100 unique RNA sequences, then a library will encode a number of different RNA molecules that is 100 times the number of transformants.
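The library-size arithmetic described above (number of transformants multiplied by the number of unique RNA-encoding regions per insert) can be illustrated with a minimal sketch. The function name `library_complexity` is hypothetical and introduced only for illustration, and the calculation assumes the idealized case in which every transformant carries a different insert and no RNA sequence is repeated across inserts.

```python
def library_complexity(transformants: int, rna_regions_per_insert: int) -> int:
    """Upper bound on the number of different RNA molecules in a library.

    Assumption (idealized): each transformant carries a different insert and
    every RNA-encoding region on every insert is unique.
    """
    if transformants < 0 or rna_regions_per_insert < 0:
        raise ValueError("counts must be non-negative")
    return transformants * rna_regions_per_insert


if __name__ == "__main__":
    # Transformant counts taken from the example in the text; 100 regions per insert.
    for transformants in (10**10, 10**12, 10**14):
        total = library_complexity(transformants, 100)
        print(f"{transformants:.0e} transformants x 100 regions = {total:.0e} RNA molecules")
```

In practice the number of distinct sequences may be lower than this bound, for example when multiple transformants carry copies of the same insert.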
The distribution of different RNA sequences across the library may be random or systematic depending on the design. In some embodiments, the RNAs expressed on one vector may differ from each other by 1-5 nucleotide substitutions. However, in some embodiments, RNAs encoded on one DNA insert may have sequences that differ from each other by about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more nucleotide substitutions. A library may not sample all different sequence variants that are possible for an RNA of a predetermined length. The sequence variants that are assembled may be determined at the design stage based on one or more factors that could include design and assembly considerations and/or any information that may suggest that certain sequence variants are more likely to result in structural or functional properties of interest.
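As a non-limiting illustration of how the number of nucleotide substitutions between designed variants might be checked at the design stage, the following sketch counts position-by-position differences between equal-length RNA sequences. The function names and the example sequences are hypothetical; the sketch assumes that variants are compared without insertions or deletions.

```python
from itertools import combinations
from typing import Sequence


def substitution_count(rna_a: str, rna_b: str) -> int:
    """Number of positions at which two equal-length RNA sequences differ."""
    if len(rna_a) != len(rna_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(rna_a, rna_b) if a != b)


def min_pairwise_substitutions(variants: Sequence[str]) -> int:
    """Smallest substitution count over all pairs of variants on one insert."""
    if len(variants) < 2:
        raise ValueError("need at least two variants")
    return min(substitution_count(a, b) for a, b in combinations(variants, 2))


if __name__ == "__main__":
    # Hypothetical variant sequences, for illustration only.
    variants = ["GGAUCC", "GGAACC", "CCUAGG"]
    print(min_pairwise_substitutions(variants))
```

A design procedure could use such a check to confirm that the variants on one insert differ by at least a chosen number of substitutions.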
In some embodiments, a library may be assembled to include a plurality of identical or similar RNA sequences, and additional sequence variation may be introduced using mutagenesis, error-prone PCR, or other suitable methods. However, such methods introduce sequence variations randomly and are unlikely to generate as much sequence variation as a procedure that involves a design stage at which each unique RNA sequence may be predetermined. In aspects of the invention, nucleic acids encoding RNA molecules may be cloned into vectors. A vector may be any suitable vector. For example, a vector may be a plasmid, a cosmid, a phagemid, a BAC, a YAC, an F factor, or any other suitable prokaryotic, eukaryotic or viral vector. A vector may include an origin of replication and/or one or more selectable markers (e.g., antibiotic resistant markers, etc.) and/or detectable markers (e.g., fluorescent markers, etc.). In some embodiments, a vector may be a shuttle vector that is functional in two or more different types (e.g., species) of host cells.
In aspects of the invention, vectors or expression systems may be transfected or transformed into a cell or other system capable of transcribing the RNA molecules of the invention. A host cell may be prokaryotic (e.g., bacterial such as E. coli or B. subtilis) or eukaryotic (for example a yeast, mammal, insect, or other eukaryotic cell). For example, host cells may be bacterial cells (e.g., Escherichia coli, Bacillus subtilis, Mycobacterium spp., M. tuberculosis, or other suitable bacterial cells), yeast cells (for example, Saccharomyces spp., Pichia spp., Candida spp., or other suitable yeast species, e.g., S. cerevisiae, C. albicans, S. pombe, etc.), Xenopus cells, mouse cells, monkey cells, human cells, insect cells (e.g., Sf9 cells and Drosophila cells), worm cells (e.g., Caenorhabditis spp.), plant cells, or other suitable cells, including for example, transgenic or other recombinant cell lines. In addition, a number of heterologous cell lines may be used, such as Chinese Hamster Ovary cells (CHO). It should be appreciated that when integrating a nucleic acid into a eukaryotic genome (e.g., a mammalian genome) care should be taken to select sites that will allow sufficient expression (e.g., silenced regions of the genome should be avoided, whereas a site comprising an enhancer may be appropriate).
In aspects of the invention, a modified RNA polymerase that incorporates one or more modified ribonucleotides (e.g., 2'-O-methyl ribonucleotides) that may stabilize RNA molecules could be expressed in the host cell. In certain embodiments, a population of cells may be grown under conditions suitable for the expression of the RNA molecules of the invention. Such conditions may involve providing a suitable nutrient medium to allow growth and proliferation of the cells. The nutrient medium may contain any of the following in an appropriate combination: isotonic saline, buffer, amino acids, serum or serum replacement, and other exogenously added factors. In some embodiments, the nutrient medium may contain one or more drugs, such as antibiotics, used for selection of a cell having a particular characteristic. In some embodiments the nutrient medium is serum free. Nutrient medium is commercially available from sources such as Life Technologies (Gaithersburg, MD).
In certain embodiments, a nucleic acid encoding different RNA molecules may be integrated into the host cell genome.
In some embodiments, a population of transformed host cells can produce many different unique RNA molecules. In some embodiments, at least 10^8, 10^10, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20 or more different unique RNA molecules may be transcribed (e.g., each having a unique variant sequence that is an aptamer or aptamer candidate sequence, and, optionally, a reporter sequence attached to the variant sequence).
Aspects of the invention may involve one or more nucleic acid assembly reactions in order to make the sets of DNA molecules, RNA encoding fragments, aptamer constructs, modified host cells, and/or other nucleic acids that may be used to isolate and/or use RNA molecules having one or more functions of interest. Aspects of the invention may be used in conjunction with in vitro and/or in vivo nucleic acid assembly procedures. Non-limiting examples of techniques that may be used to assemble constructs of the invention are described herein and illustrated in FIGS. 1-5. FIG. 6 provides non-limiting examples of nucleic acid constructs (e.g., DNA constructs) that can be used to express a plurality of RNA molecules for in vivo aptamer screening and/or selection. FIG. 6A shows an example of a single construct encoding RNA variants 1, 2, 3, through n. As described herein, the variants all may be different. However, in some embodiments, two or more copies of one or more variants may be present on a single construct. The RNAs may be random variant sequences that are all aptamer candidates. However, the RNAs may be variants of an aptamer that has a known binding affinity (low or high) for a ligand of interest. In some embodiments, the variants may be different aptamers having different binding affinities for different ligands. Other configurations of different unique RNAs also may be provided as the invention is not limited in this respect. As described herein, the encoded RNAs may be transcribed from separate promoters (identical, different, or a combination thereof) or transcribed as a single transcript from a single common promoter and processed (e.g., specifically cleaved) to generate the individual RNA molecules, or a combination of individual and common promoters. It should be appreciated that n may be any integer (e.g., between 5 and 1,000, for example about 10 to about 100, about 50, etc., or smaller or larger).
FIG. 6B illustrates an embodiment wherein each RNA is operably associated with a reporter sequence (Reporter A). As illustrated, the same reporter sequence is fused to each different variant RNA. However, the invention is not limited in this respect and different reporter sequences may be fused to different variant RNAs and/or different reporter sequences may be fused to subsets of variant RNAs depending on the desired configuration. Also, FIG. 6B illustrates each reporter sequence located downstream from the variant RNA sequences. However, it should be appreciated that the reporter sequences may be downstream, upstream, or a combination thereof when connected to different variant RNAs. In some embodiments, one or more variant RNAs may have reporter sequences at both their 5' and 3' ends. In some embodiments, a single reporter function may require both 5' and 3' sequences. In some embodiments, all of the reporter sequences may be 5' or 3' of the variant RNA sequences (e.g., of the unique variant RNA sequences) in one or more constructs. However, in some embodiments, a single construct may include 5' and 3' reporter sequences and/or a combination thereof.
Selection and screening
In aspects of the invention, a library of transcribed RNA molecules may be subjected to a screen or selection to identify one or more RNA molecules having a structural and/or functional property of interest. The presence of an RNA of interest in an intracellular library of transcribed RNA molecules may be determined directly or indirectly. In some embodiments, the presence of an RNA of interest may be detected directly if the desired function can be directly screened or selected for. For example, if an enzymatic function is desired, a screen or selection may be based on the presence or absence of the enzymatic properties of interest. Such an assay may be an in vivo assay. However, in some embodiments, an in vitro assay may be performed on cell extracts. In some embodiments, the presence of an RNA that binds to a ligand with high affinity and/or specificity may be detected directly if the binding to the ligand results in a detectable signal (e.g., an increase or decrease in fluorescence intensity). For example, an RNA aptamer bound to malachite green may fluoresce whereas the dye alone does not fluoresce. In other embodiments, a fluorescent ligand or effector may be used and the assay to detect an RNA aptamer that binds to the ligand or effector may involve detecting quenching of the fluorescent signal associated with aptamer binding. In some embodiments, the ligand or effector may be toxic and RNA aptamer binding may lower the toxicity. In certain embodiments, an RNA that cleaves or modifies an effector molecule may be detected if cleavage or modification alters a detectable or selectable property of the ligand or effector.
In some aspects, RNA (e.g., an RNA aptamer) binding to a ligand may not be readily detectable using a direct detection technique. In some embodiments, RNA binding to a ligand may be detected indirectly if the candidate RNA is fused to a predetermined reporter RNA domain and binding of the candidate RNA to a ligand affects the structure and properties of the reporter domain to an extent that can be detected using one or more different readouts. A reporter domain may be a riboregulator or switch domain that changes conformation to either expose or sequester an antisense sequence when a ligand binds to the candidate domain. Accordingly, if the candidate RNA is an aptamer that specifically binds a ligand, the readout could be any detectable or selectable phenotype that can be regulated by antisense technology. According to the invention, any detectable or selectable phenotype may be used. For example, a readout may be drug resistance or susceptibility (e.g., antibiotic resistance or susceptibility), one or more detectable cell surface properties, a change in fluorescence intensity, auxotrophy, or one or more anabolic or catabolic phenotypes. It should be appreciated that a reporter domain may be fused to each candidate RNA transcribed in a library. In some embodiments, a DNA encoding the reporter RNA may be fused to each of the DNAs encoding the different RNA candidates in the library so that each candidate is transcribed along with a reporter domain. A DNA encoding a reporter RNA domain may be fused at the 3' end or the 5' end of each DNA encoding a candidate RNA, and accordingly transcribed candidate RNAs may have a reporter RNA at either their 3' or 5' end. In some embodiments, a reporter RNA may be fused at both the 3' and 5' ends. The reporter domains fused at the 3' and 5' ends may control different readouts. In some embodiments, different groups of candidate RNAs in the transcribed RNA library may be fused to different reporter RNAs. 
In some embodiments, a reporter RNA domain may be an enzyme that can be disrupted by ligand binding to an adjacent aptamer domain. In some embodiments, a reporter RNA domain may be a protein binding domain that can be disrupted by ligand binding to an adjacent aptamer domain.
Accordingly, in some embodiments each nucleic acid sequence expressing an RNA molecule has a different reporter system. In certain embodiments, two or more nucleic acid sequences have the same reporter system. In some embodiments, the reporter system is the system disclosed by Smolke et al. (2005, Nature Biotechnology, 23(3):337-343), the entire contents of which are incorporated herein by reference. For example, a ligand responsive riboregulator may be used to regulate the expression of any target transcript in response to any ligand. An example of such a construct may be a riboregulator having an antisense domain that controls gene expression and an aptamer domain that recognizes specific effector ligands. Ligand binding induces a conformational change in the molecule that allows the antisense domain to interact with a target mRNA and inhibit or reduce translation. As an example, the aptamer may bind a xanthine derivative, theophylline, causing a conformational change allowing the antisense domain to interact with the mRNA encoding green fluorescent protein (GFP).
In certain embodiments, the reporter system may be a yeast three-hybrid system such as that disclosed by SenGupta D. J. et al. (1996, Proc. Natl. Acad. Sci. USA, 93:8496-8501), the entire contents of which are incorporated herein by reference. For example, a hybrid protein containing a DNA-binding domain (for example LexA) with RNA-binding domain 1 localizes to the promoter of an appropriate reporter gene. A second hybrid protein containing a transcriptional activation domain with RNA-binding domain 2 activates transcription of the reporter gene when in close proximity to the gene's upstream regulatory sequences. A hybrid RNA containing sites recognized by the two RNA-binding proteins links the two hybrid proteins to one another and the complex results in detectable expression of the reporter gene. Accordingly, a reporter domain may be any domain that is sensitive to (e.g., can be disrupted by) a ligand binding to an aptamer sequence that is fused to the reporter domain. As a result of ligand binding, the readout mediated by the reporter domain may involve any detectable or selectable direct or indirect phenotype. The reporter may act via one or more protein, RNA, DNA, and/or other domains to produce a readout. Accordingly, an RNA reporter domain may be a ribozyme, an RNA switch, an antisense RNA, an allosteric effector RNA, an RNA that regulates the expression or activity of another RNA molecule, or an RNA that binds to a detectable compound. Therefore, the reporter domain also may be an aptamer domain.
RNA identification
In some embodiments, if each cell contains only one type of RNA candidate molecule, the isolation of a cell that has a selected or screened for phenotype provides the identity of the RNA having a desired structure or function (e.g., enzymatic activity, binding affinity, etc.). The nucleic acid encoding the transcribed RNA may be isolated and sequenced. However, in embodiments where each cell contains a plurality of different RNA candidates, the isolation of a cell having a selected or screened for phenotype only narrows the identity of the targeted RNA down to one of the different RNAs that are transcribed in that cell. In some embodiments, the RNA with the desired structural and/or functional properties may be identified by independently testing each of the different RNAs that are transcribed in the cell. The RNAs may be tested by cloning each one and transcribing them and assaying them individually in vivo. In some embodiments, individual RNAs may be synthesized or assembled and tested in vivo or in vitro. It should be appreciated that other techniques may be used to identify the RNA of interest. In some embodiments, a cell that is isolated as having a desired phenotype may contain a set of RNA coding sequences that is enriched for one or a few variants. In some embodiments, regardless of the number of different transcribed RNAs in the isolated cell, further rounds of selection and/or screening may be performed to enrich host cells for RNAs that have the desired properties. Repeated selection and/or screening may favor cells that have more copies of the RNA of interest relative to other transcribed RNA variants (e.g., due to gene conversion or other process that results in the RNA of interest spreading across the set of transcribed RNAs).
Aptamer expressing cells
Aspects of the invention relate to cells capable of transcribing a plurality of different aptamers. In some embodiments, a plurality of aptamers may be preselected by their ability to bind to one or more different molecules of interest (e.g., one or more different ligands or effector molecules).
In some embodiments, a plurality of different aptamers may be transcribed by a single cell line. In certain embodiments, each cell expresses at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more aptamers. In some embodiments, the transcribed aptamers are all different. In other embodiments, the transcribed aptamers may include one or more copies of the same aptamer.
In some embodiments, the transcription of one or more aptamers may be under the control of the same promoter. In certain embodiments, transcription of one or more aptamer molecules may be under the control of separate promoters. The separate promoters may be separate copies of the same promoter or different promoters. In some embodiments, one or more promoters may be inducible. In some embodiments, aptamer transcription may involve transcription enzymes of the host cell.
In some embodiments, transcribed aptamers may be of different lengths. In some embodiments, an aptamer may be at least 50 to at least 200 nucleotide bases long. In certain embodiments, an aptamer may be at least 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotide bases long or longer. However, certain aptamers may be shorter than 50 bases long (e.g., between about 10 and about 50 bases long). In some embodiments, each transcribed aptamer may be of a different length. In some embodiments, certain aptamers may be transcribed as a single RNA chain. A single transcribed RNA may include two or more aptamers. In some embodiments, a single transcribed RNA may include 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 aptamers. The single RNA transcript may include one or more cleavage sites that can be acted on to release one or more different aptamers from the RNA transcript. In some embodiments, one or more enzymes may cut the cleavage sites to release individual aptamers. In some embodiments, the cleavage sites may be autocatalytic RNA cleavage sites. In other embodiments, aptamers may be transcribed as individual transcripts. In certain embodiments, a plurality of aptamers may be transcribed in a combination of individual aptamer transcripts and RNA transcripts that include two or more aptamers.
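The release of individual aptamers from a single transcript at cleavage sites can be pictured with a simplified string model. This is a sketch only: the cleavage-site sequence `HYPOTHETICAL_SITE` is an arbitrary placeholder rather than a real enzymatic or autocatalytic site, and the model assumes that cleavage removes the site completely and leaves no flanking nucleotides on the released aptamers.

```python
from typing import List

# Placeholder marker standing in for a cleavage site; not a real ribozyme or
# nuclease recognition sequence (assumption for illustration only).
HYPOTHETICAL_SITE = "GUCGUC"


def release_aptamers(transcript: str, site: str = HYPOTHETICAL_SITE) -> List[str]:
    """Split a single RNA transcript into individual aptamers at each site.

    Simplification: the site is removed entirely and empty fragments (for
    example, from adjacent sites) are discarded.
    """
    if not site:
        raise ValueError("cleavage site must be a non-empty sequence")
    return [fragment for fragment in transcript.split(site) if fragment]


if __name__ == "__main__":
    # Three hypothetical aptamer sequences joined by the placeholder site.
    aptamers = ["AAAGGG", "CCCUUU", "GGGAAA"]
    transcript = HYPOTHETICAL_SITE.join(aptamers)
    print(release_aptamers(transcript))
```

A design step based on this picture would also need to confirm that the chosen site sequence does not occur inside any aptamer sequence, since that would produce unintended fragments.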
In certain embodiments, one or more aptamer coding sequences may be integrated into the genome of a host cell.
In aspects of the invention, an aptamer may be transcribed fused to a reporter RNA. The reporter RNA may produce a signal (either directly or indirectly) if the aptamer binds to its ligand. In aspects of the invention, an aptamer readout using a reporter RNA could be drug resistance or susceptibility, a cell surface property, a change in fluorescence intensity, auxotrophy, or other anabolic or catabolic phenotypes.
Identification of RNA aptamers using in vitro assays and combinations with in vivo assays
In another aspect, an RNA aptamer may be identified in an in vitro screen or selection. In some embodiments, a pool of RNA molecules may be provided wherein each different RNA molecule contains the same reporter RNA domain (e.g., a riboregulator sequence) attached to a different unique RNA sequence (e.g., a random RNA sequence) that is an aptamer candidate. The pool of RNA molecules may be screened or selected to identify RNA variants that bind to a molecule of interest. In some embodiments, screening or selection assays may involve different configurations of an assay wherein a candidate RNA that binds to the molecule of interest can be identified when it produces a configuration change in the riboregulator. The configuration change of the riboregulator can be detected using any suitable technique. For example, a configuration change that affects the binding properties of the riboregulator (e.g., by exposing or hiding one or more sequences of the riboregulator due to changes in hybridization patterns within the riboregulator domain due to the conformational change) can be detected using one or more different binding assays. The binding assay may involve a nucleic acid tag that is complementary to one or more of the riboregulator sequences that are either exposed or hidden (e.g., sequestered) as a result of a conformational change associated with binding or release of the ligand of interest by the aptamer portion of the RNA. The complementary nucleic acid tag may be an RNA (e.g., an mRNA), DNA, PNA, or any other nucleic acid molecule that binds to a sequence on the riboregulator. The tag may be attached to a solid support (e.g., solid phase) or in a liquid or in a gel. In some embodiments, the complementary nucleic acid tag may be immobilized before or after exposure to the pool of RNA molecules.
In some embodiments, to select for aptamers that bind to the complementary nucleic acid tag when bound to ligand (e.g., theophylline), the pool of RNA molecules can be exposed to the tag in the presence of the ligand. After immobilization of the tag (and any bound RNA molecules), unbound or weakly bound molecules may be washed away. RNA molecules that bind then may be eluted by removing the ligand. In contrast, to select for aptamers that bind to the tag when not bound to a ligand, initial binding can be performed in the absence of ligand, and, after washing, bound aptamers can be eluted by adding ligand. In some embodiments, a combination of sequential binding and elution in the presence or absence of ligand may be performed. In some embodiments, if the riboregulator binds to a first nucleic acid tag in a first configuration and a second different nucleic acid tag in a second configuration (wherein the configuration change can be promoted by ligand binding to the aptamer domain attached to the riboregulator domain), a combination of binding and elution steps on the first and second nucleic acid tags in the absence and presence of ligand may be used to isolate aptamers of interest.
In some embodiments, candidate RNA molecules or candidate motifs that are identified in an in vitro screen may be tested in vivo using a suitable expression vector for expressing the one or more RNA molecules. The candidate aptamer sequences may be tested in vivo in association with the same reporter domain that was used for the in vitro screen. However, in some embodiments, the candidate aptamer sequences may be connected to a different reporter domain to test for in vivo properties of interest. In some embodiments, in vitro techniques may be used to screen variants of one or more aptamers identified in vitro. For example, a library containing a large number of variants of an RNA aptamer identified in vitro may be screened to identify variants that have higher affinity, that bind in vivo, or a combination thereof. It should be appreciated that variants each may have one or more sequence changes (e.g., 1-10, 10-50, 50-100, etc.) relative to an initial RNA aptamer. Similarly, an in vivo technique of the invention may be used to screen variants of an RNA aptamer that was identified in a previous in vivo screen or selection. It should be appreciated that information obtained from a first in vitro or in vivo screen or selection may be used to identify important features for binding to a particular ligand. In some embodiments, a subsequent library may be designed to include variants at all other positions (and not the ones identified as important or required) in order to identify variants that increase the core binding associated with the important or required positions. Certain structural considerations that may be used to inform design criteria for
RNA pools or for in vitro or in vivo libraries of candidate RNA aptamers are described in the literature, for example in Carothers et al. (J. Am. Chem. Soc. 2004, 126, 5130-5137), the entire contents of which are incorporated herein by reference.
It should be appreciated that RNA aptamers with high affinity for large, complex, flexible, and/or hydrophobic molecules may be rarer than RNA aptamers for smaller and/or charged ligands such as GTP and/or theophylline. However, aspects of the invention may be useful to screen large numbers of RNA molecules and identify RNA aptamers having affinity for any ligand of interest (e.g., any ligand described herein). In some embodiments, libraries containing relatively long RNA candidate molecules (e.g., having aptamer domains that are longer than 100 nucleotides, longer than about 200 nucleotides, longer than about 300 nucleotides, about 300 to about 500 nucleotides, or longer than about 500 nucleotides) may be screened to identify RNA aptamers that bind to certain ligands (e.g., large, complex, flexible, hydrophobic, or any combination thereof). According to the invention, relatively long RNA sequences may be important to allow for complex configurations that can bind (e.g., with high affinity) to certain ligands. In some embodiments, libraries of previously known aptamers may be prepared to include a variable length linker (on one or both sides of the original aptamers) so that more complex RNA including the original aptamers can be tested. These libraries can be screened in vivo or in vitro as described herein (e.g., attached to a reporter domain such as a riboregulator domain).
Applications
As described above, aspects of the invention provide sets of aptamers that can detect the presence of one or more different ligands or effector molecules. In some embodiments, an aptamer set may be provided and transcribed in a host cell (e.g., from a transcription template that is in a vector or that is integrated into the genome of the host cell). In some embodiments, any additional RNAs and/or proteins that may be required for the different readouts may be transcribed in the host cell.
Aptamer sets of the invention may be used to detect the presence of any type of ligand, including, for example, different analytes, metabolic intermediates and products, toxins, environmental contaminants and pollutants, and any other type of ligand and/or effector molecule.
Accordingly, aptamer-containing cells (or isolated preparations of aptamer sets) may be used in medicine, biotechnology, industry, agriculture, environmental studies and remediation, mining, and any other application where one or more ligands may need to be detected. In aspects of the invention, an environmental pollutant may be a water, air, or soil pollutant. Water pollutants may be compounds such as organic and inorganic chemicals, for example, heavy metals, petrochemicals, chloroform, and different types of bacteria. Water pollution also may occur in the form of thermal pollution and dissolved oxygen depletion. Air pollutants may be compounds such as carbon monoxide, sulfur dioxide, chlorofluorocarbons (CFCs), and nitrogen oxides. Soil pollutants may be compounds such as hydrocarbons, heavy metals, methyl tert-butyl ether (MTBE), herbicides, pesticides and chlorinated hydrocarbons, and others. Such detection methods may be important for detecting changes in pollutants after natural disasters such as hurricanes or flooding. Compositions and methods of the invention also may be useful to identify the presence of one or more metabolic intermediates and/or products. In some embodiments, detection may be performed in the natural cellular environment in a live cell rather than in a cellular extract. In some embodiments, metabolic pathways may be studied and individual steps may be identified by providing, in vivo, a plurality of different aptamers that are responsive to different intermediate compounds. By determining which aptamers give a positive readout, the nature of the intermediate compounds can be determined and a metabolic pathway may be inferred.
In some embodiments, an aptamer set containing different aptamers that are responsive to different substrates, metabolic intermediates, and/or desired end products may be used as a reporter system (e.g., either on a plasmid or integrated into the genome of a host cell) in techniques designed to evolve or select novel biosynthetic pathways. An aptamer set that is selected may include one or more copies of aptamers that are selective for intermediates or analytes that are expected to be produced in a novel biosynthetic pathway of interest. In some embodiments, an appropriate readout from an aptamer set may be used to indicate that a particular combination of enzymes and/or enzyme variants may have a metabolic effect that is desired. It should be appreciated that a nucleic acid construct encoding an aptamer set of interest may be transcribed in vitro. Similarly, a set of RNA aptamers that are responsive to different ligands of interest may be assembled in vitro. Sets of aptamers that bind specifically to a plurality of different ligands also may be used in vitro. In some embodiments, the aptamers may be used in an in vitro assay to detect any one or more of a plurality of different ligands (e.g., metabolic intermediates, toxins, environmental pollutants, contaminants, pathogens, analytes, etc.). In some embodiments, one or more stabilizing residues (e.g., one or more 2'-O-methyl ribonucleotides or other stabilizing ribonucleotides) may be incorporated into aptamers that are synthesized in vitro and/or in vivo. Aspects of the invention may involve one or more nucleic acid assembly reactions in order to make the sets of DNA molecules, RNA encoding fragments, aptamer constructs, modified host cells, and/or other nucleic acids that may be used to isolate and/or use RNA molecules having one or more functions of interest. Aspects of the invention may be used in conjunction with in vitro and/or in vivo nucleic acid assembly procedures.
Non-limiting examples of extension-based and ligation-based assembly reactions are described herein and illustrated in FIGS. 1-4. FIG. 5 illustrates a method for assembling a nucleic acid in accordance with one embodiment of the invention. Initially, in act 500, sequence information is obtained. The sequence information may be the sequence of a predetermined target nucleic acid that is to be assembled. In some embodiments, the sequence may be received in the form of an order from a customer. The order may be received electronically or on a paper copy. In some embodiments, the sequence may be received as a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, the sequence may be received as a protein sequence. The sequence may be converted into a DNA sequence. For example, if the sequence obtained in act 500 is an RNA sequence, the Us may be replaced with Ts to obtain the corresponding DNA sequence. If the sequence obtained in act 500 is a protein sequence, it may be converted into a DNA sequence using appropriate codons for the amino acids. When choosing codons for each amino acid, consideration may be given to one or more of the following factors: i) using codons that correspond to the codon bias in the organism in which the target nucleic acid may be expressed, ii) avoiding excessively high or low GC or AT contents in the target nucleic acid (for example, above 60% or below 40%; e.g., greater than 65%, 70%, 75%, 80%, 85%, or 90%; or less than 35%, 30%, 25%, 20%, 15%, or 10%), and iii) avoiding sequence features that may interfere with the assembly procedure (e.g., the presence of repeat sequences or stem loop structures). However, these factors may be ignored in some embodiments as the invention is not limited in this respect. Also, aspects of the invention may be used to reduce errors caused by one or more of these factors. 
Accordingly, a DNA sequence determination (e.g., a sequence determination algorithm or an automated process for determining a target DNA sequence) may omit one or more steps relating to the analysis of the GC or AT content of the target nucleic acid sequence (e.g., the GC or AT content may be ignored in some embodiments) or one or more steps relating to the analysis of certain sequence features (e.g., sequence repeats, inverted repeats, etc.) that could interfere with an assembly reaction performed under standard conditions but may not interfere with an assembly reaction including one or more concerted assembly steps.
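The sequence conversion described for act 500 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the codon table is a deliberately tiny one-codon-per-amino-acid table (each entry is a valid standard-code codon, but it is not a measured codon-usage table for any organism), and the 40%/60% GC thresholds are taken from the example range mentioned above.

```python
# Hypothetical helpers illustrating act 500; all names are assumptions.

# Illustrative table only: one valid standard-code codon per residue.
# A real design would use a codon-usage table for the expression host.
EXAMPLE_CODONS = {
    "M": "ATG",  # methionine
    "K": "AAA",  # lysine
    "L": "CTG",  # leucine
    "S": "AGC",  # serine
    "*": "TAA",  # stop
}


def rna_to_dna(rna: str) -> str:
    """Replace each U with T to obtain the corresponding DNA sequence."""
    return rna.upper().replace("U", "T")


def back_translate(protein: str, codons: dict = EXAMPLE_CODONS) -> str:
    """Convert a protein sequence to DNA using one codon per amino acid.

    Raises KeyError for residues missing from the supplied table.
    """
    return "".join(codons[residue] for residue in protein.upper())


def gc_fraction(dna: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def gc_out_of_range(dna: str, low: float = 0.40, high: float = 0.60) -> bool:
    """Flag sequences whose GC content falls outside [low, high]."""
    fraction = gc_fraction(dna)
    return fraction < low or fraction > high
```

As the text notes, a given embodiment may skip the GC check entirely, so `gc_out_of_range` would be an optional filter rather than a required step.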
In act 510, the sequence information may be analyzed to determine an assembly strategy. This may involve determining whether the target nucleic acid will be assembled as a single fragment or if several intermediate fragments will be assembled separately and then combined in one or more additional rounds of assembly to generate the target nucleic acid. Once the overall assembly strategy has been determined, input nucleic acids (e.g., oligonucleotides) for assembling the one or more nucleic acid fragments may be designed. The sizes and numbers of the input nucleic acids may be based in part on the type of assembly reaction (e.g., the type of polymerase-based assembly, ligase-based assembly, chemical assembly, or combination thereof) that is being used for each fragment. The input nucleic acids also may be designed to avoid 5' and/or 3' regions that may cross-react incorrectly and be assembled to produce undesired nucleic acid fragments. Other structural and/or sequence factors also may be considered when designing the input nucleic acids. In certain embodiments, some of the input nucleic acids may be designed to incorporate one or more specific sequences (e.g., primer binding sequences, restriction enzyme sites, etc.) at one or both ends of the assembled nucleic acid fragment.
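One simple way to picture the input nucleic acid design in act 510 is to tile the target with overlapping oligonucleotides on alternating strands, as might be done for a polymerase-based assembly. The sketch below is an assumption-laden illustration, not the design procedure of the invention: the function names, the default oligonucleotide length of 60, and the default overlap of 20 are hypothetical, and no check for cross-reacting 5' or 3' regions is performed.

```python
# Hypothetical tiling sketch for act 510; names and defaults are assumptions.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def design_overlapping_oligos(target: str, oligo_len: int = 60, overlap: int = 20):
    """Tile `target` with oligos that overlap their neighbours by `overlap` bases.

    Returns a list of (start, strand, sequence) tuples. Even-numbered oligos
    are taken from the top strand ("+"); odd-numbered oligos are the reverse
    complement of the covered segment ("-"). The last oligo may be shorter
    than `oligo_len`.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and smaller than oligo_len")
    target = target.upper()
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        segment = target[start:start + oligo_len]
        strand = "+" if index % 2 == 0 else "-"
        sequence = segment if strand == "+" else reverse_complement(segment)
        oligos.append((start, strand, sequence))
        if start + oligo_len >= len(target):
            break  # this oligo reaches the end of the target
        start += step
        index += 1
    return oligos
```

A fuller design would also screen each overlap for repeats or secondary structure, which the text lists among the factors that may be considered.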
In act 520, the input nucleic acids are obtained. These may be synthetic oligonucleotides that are synthesized on-site or obtained from a different site (e.g., from a commercial supplier). In some embodiments, one or more input nucleic acids may be amplification products (e.g., PCR products), restriction fragments, or other suitable nucleic acid molecules. Synthetic oligonucleotides may be synthesized using any appropriate technique as described in more detail herein. It should be appreciated that synthetic oligonucleotides often have sequence errors. Accordingly, oligonucleotide preparations may be selected or screened to remove error-containing molecules as described in more detail herein.
In act 530, an assembly reaction may be performed for each nucleic acid fragment. For each fragment, the input nucleic acids may be assembled using any appropriate assembly technique (e.g., a polymerase-based assembly, a ligase-based assembly, a chemical assembly, or any other multiplex nucleic acid assembly technique, or any combination thereof). An assembly reaction may result in the assembly of a number of different nucleic acid products in addition to the predetermined nucleic acid fragment. Accordingly, in some embodiments, an assembly reaction may be processed to remove incorrectly assembled nucleic acids (e.g., by size fractionation) and/or to enrich correctly assembled nucleic acids (e.g., by amplification, optionally followed by size fractionation). In some embodiments, correctly assembled nucleic acids may be amplified (e.g., in a PCR reaction) using primers that bind to the ends of the predetermined nucleic acid fragment. It should be appreciated that act 530 may be repeated one or more times. For example, in a first round of assembly a first plurality of input nucleic acids (e.g., oligonucleotides) may be assembled to generate a first nucleic acid fragment. In a second round of assembly, the first nucleic acid fragment may be combined with one or more additional nucleic acid fragments and used as starting material for the assembly of a larger nucleic acid fragment. In a third round of assembly, this larger fragment may be combined with yet further nucleic acids and used as starting material for the assembly of yet a larger nucleic acid. This procedure may be repeated as many times as needed for the synthesis of a target nucleic acid. Accordingly, progressively larger nucleic acids may be assembled. At each stage, nucleic acids of different sizes may be combined. At each stage, the nucleic acids being combined may have been previously assembled in a multiplex assembly reaction. 
However, at each stage, one or more nucleic acids being combined may have been obtained from different sources (e.g., PCR amplification of genomic DNA or cDNA, restriction digestion of a plasmid or genomic DNA, or any other suitable source). It should be appreciated that nucleic acids generated in each cycle of assembly may contain sequence errors if they incorporated one or more input nucleic acids with sequence error(s). Accordingly, a fidelity optimization procedure may be performed after a cycle of assembly in order to remove or correct sequence errors. It should be appreciated that fidelity optimization may be performed after each assembly reaction when several successive cycles of assembly are performed. However, in certain embodiments fidelity optimization may be performed only after a subset (e.g., 2 or more) of successive assembly reactions are complete. In some embodiments, no fidelity optimization is performed.
Accordingly, act 540 is an optional fidelity optimization procedure. Act 540 may be used in some embodiments to remove nucleic acid fragments that seem to be correctly assembled (e.g., based on their size or restriction enzyme digestion pattern) but that may have incorporated input nucleic acids containing sequence errors as described herein. For example, since synthetic oligonucleotides may contain incorrect sequences due to errors introduced during oligonucleotide synthesis, it may be useful to remove nucleic acid fragments that have incorporated one or more error-containing oligonucleotides during assembly. In some embodiments, one or more assembled nucleic acid fragments may be sequenced to determine whether they contain the predetermined sequence or not. This procedure allows fragments with the correct sequence to be identified. However, in some embodiments, other techniques may be used to remove error-containing nucleic acid fragments. It should be appreciated that error-containing nucleic acids may be double-stranded homoduplexes having the error on both strands (i.e., incorrect complementary nucleotide(s), deletion(s), or addition(s) on both strands), because the assembly procedure may involve one or more rounds of polymerase extension (e.g., during assembly or after assembly to amplify the assembled product) during which an input nucleic acid containing an error may serve as a template thereby producing a complementary strand with the complementary error. In certain embodiments, a preparation of double-stranded nucleic acid fragments may be suspected to contain a mixture of nucleic acids that have the correct sequence and nucleic acids that incorporated one or more sequence errors during assembly. In some embodiments, sequence errors may be removed using a technique that involves denaturing and reannealing the double-stranded nucleic acids.
In some embodiments, single strands of nucleic acids that contain complementary errors may be unlikely to reanneal together if nucleic acids containing each individual error are present in the nucleic acid preparation at a lower frequency than nucleic acids having the correct sequence at the same position. Rather, error-containing single strands may reanneal with a complementary strand that contains no errors or that contains one or more different errors. As a result, error-containing strands may end up in the form of heteroduplex molecules in the reannealed reaction product. Nucleic acid strands that are error-free may reanneal with error-containing strands or with other error-free strands. Reannealed error-free strands form homoduplexes in the reannealed sample. Accordingly, by removing heteroduplex molecules from the reannealed preparation of nucleic acid fragments, the amount or frequency of error-containing nucleic acids may be reduced. Any suitable method for removing heteroduplex molecules may be used, including chromatography, electrophoresis, selective binding of heteroduplex molecules, etc. In some embodiments, mismatch binding proteins that selectively (e.g., specifically) bind to heteroduplex nucleic acid molecules may be used. One example includes using MutS, a MutS homolog, or a combination thereof to bind to heteroduplex molecules. In E. coli, the MutS protein, which appears to function as a homodimer, serves as a mismatch recognition factor. In eukaryotes, at least three MutS Homolog (MSH) proteins have been identified; namely, MSH2, MSH3, and MSH6, and they form heterodimers. For example, in the yeast Saccharomyces cerevisiae, the MSH2-MSH6 complex (also known as MutSα) recognizes base mismatches and single nucleotide insertion/deletion loops, while the MSH2-MSH3 complex (also known as MutSβ) recognizes insertions/deletions of up to 12-16 nucleotides, although they exert substantially redundant functions.
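The enrichment expected from denaturing, reannealing, and removing heteroduplexes can be illustrated with a toy calculation. The model below is an assumption introduced here for illustration and is not taken from the specification: strands are assumed to pair at random in proportion to their frequencies, each distinct error is assumed to form a homoduplex only with a strand carrying the same error, and heteroduplex removal is assumed to be complete. The function name is hypothetical.

```python
# Toy random-reannealing model; the model and names are assumptions.

def reannealing_outcome(correct_fraction: float, error_fractions: list) -> dict:
    """Estimate duplex populations after random reannealing.

    correct_fraction: fraction of duplexes with the correct sequence.
    error_fractions:  fraction of duplexes carrying each distinct error.
    The fractions must sum to 1.
    """
    total = correct_fraction + sum(error_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Two correct strands pair with probability correct_fraction squared.
    correct_homoduplex = correct_fraction ** 2
    # An error strand forms a homoduplex only with a strand carrying the
    # same error, so each error contributes its frequency squared.
    error_homoduplex = sum(f * f for f in error_fractions)
    # Everything else pairs mismatched strands.
    heteroduplex = 1.0 - correct_homoduplex - error_homoduplex

    # Assume all heteroduplexes are removed (idealized).
    remaining = correct_homoduplex + error_homoduplex
    return {
        "heteroduplex": heteroduplex,
        "correct_after_removal": correct_homoduplex / remaining,
    }
```

Under these assumptions, a preparation that is 50% correct with five distinct errors at 10% each would be about 83% correct after one idealized cycle, because rare errors seldom find a matching partner. Real procedures are less efficient, since denaturation and heteroduplex removal may both be incomplete.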
A mismatch binding protein may be obtained from recombinant or natural sources. A mismatch binding protein may be heat-stable. In some embodiments, a thermostable mismatch binding protein from a thermophilic organism may be used. Examples of thermostable DNA mismatch binding proteins include, but are not limited to: Tth MutS (from Thermus thermophilus); Taq MutS (from Thermus aquaticus); Apy MutS (from Aquifex pyrophilus); Tma MutS (from Thermotoga maritima); any other suitable MutS; or any combination of two or more thereof.
According to aspects of the invention, protein-bound heteroduplex molecules (e.g., heteroduplex molecules bound to one or more MutS proteins) may be removed from a sample using any suitable technique (binding to a column, a filter, a nitrocellulose filter, etc., or any combination thereof). It should be appreciated that this procedure may not be 100% efficient. Some errors may remain for at least one of the following reasons. Depending on the reaction conditions, not all of the double-stranded error-containing nucleic acids may be denatured. In addition, some of the denatured error-containing strands may reanneal with complementary error-containing strands to form an error-containing homoduplex. Also, the MutS/heteroduplex interaction and the MutS/heteroduplex removal procedures may not be 100% efficient. Accordingly, in some embodiments the fidelity optimization act 540 may be repeated one or more times after each assembly reaction. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more cycles of fidelity optimization may be performed after each assembly reaction. In some embodiments, the nucleic acid is amplified after each fidelity optimization procedure. It should be appreciated that each cycle of fidelity optimization will remove additional error-containing nucleic acid molecules. However, the proportion of correct sequences is expected to reach a saturation level after a few cycles of this procedure. In some embodiments, the size of an assembled nucleic acid that is fidelity optimized (e.g., using MutS or a MutS homolog) may be determined by the expected number of sequence errors that are suspected to be incorporated into the nucleic acid during assembly. For example, an assembled nucleic acid product should include error-free nucleic acids prior to fidelity optimization in order to be able to enrich for the error-free nucleic acids.
Accordingly, error screening (e.g., using MutS or a MutS homolog) should be performed on shorter nucleic acid fragments when input nucleic acids have higher error rates. In some embodiments, one or more nucleic acid fragments of between about 200 and about 800 nucleotides (e.g., about 200, about 300, about 400, about 500, about 600, about 700 or about 800 nucleotides in length) are assembled prior to fidelity optimization. After assembly, the one or more fragments may be exposed to one or more rounds of fidelity optimization as described herein. In some embodiments, several assembled fragments may be ligated together (e.g., to produce a larger nucleic acid fragment of between about 1,000 and about 5,000 bases in length, or larger), and optionally cloned into a vector, prior to fidelity optimization as described herein.
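The relationship between input error rate and a workable fragment length can be made concrete with a short calculation. The sketch assumes, purely for illustration, that errors occur independently at a uniform per-base rate, so the error-free fraction of a fragment of length L is (1 - rate)^L. Both function names are hypothetical and neither the model nor the example numbers come from the specification.

```python
# Illustrative calculation; the independence assumption and names are ours.
import math


def error_free_fraction(length: int, per_base_error_rate: float) -> float:
    """Fraction of molecules of `length` bases expected to carry no errors."""
    return (1.0 - per_base_error_rate) ** length


def max_length_for_fraction(min_fraction: float, per_base_error_rate: float) -> int:
    """Longest fragment whose expected error-free fraction is at least
    `min_fraction`, for 0 < min_fraction <= 1 and 0 < rate < 1."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - per_base_error_rate))
```

With a per-base error rate of 1 in 100, for example, this model puts the error-free fraction below one half for fragments longer than 68 bases, which is consistent with screening shorter fragments when the input oligonucleotides are more error-prone.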
At act 550, an output nucleic acid is obtained. As discussed herein, several rounds of act 530 and/or 540 may be performed to obtain the output nucleic acid, depending on the assembly strategy that is implemented. The output nucleic acid may be amplified, cloned, stored, etc., for subsequent uses at act 560. In some embodiments, an output nucleic acid may be cloned with one or more other nucleic acids (e.g., other output nucleic acids) for subsequent applications. Subsequent applications may include one or more research, diagnostic, medical, clinical, industrial, therapeutic, environmental, agricultural, or other uses.
Aspects of the invention may include automating one or more acts described herein. For example, sequence analysis, the identification of interfering sequence features, assembly strategy selection (including fragment design and selection, the choice of a particular combination of extension-based and ligation-based assembly reactions, etc.), fragment production, single-stranded overhang production, and/or concerted assembly may be automated in order to generate the desired product automatically. Acts of the invention may be automated using, for example, a computer system.
Aspects of the invention may be used in conjunction with any suitable multiplex nucleic acid assembly procedure. For example, concerted assembly steps may be used in connection with one or more of the multiplex nucleic acid assembly procedures described below.
Multiplex Nucleic Acid Assembly
In aspects of the invention, multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product. In one aspect, multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule. However, it should be appreciated that other nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) may be assembled or included in a multiplex assembly reaction (e.g., along with one or more oligonucleotides) in order to generate an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
In aspects of the invention, one or more multiplex assembly reactions may be used to generate target nucleic acids having predetermined sequences. In one aspect, a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof). In another aspect, a target nucleic acid may have a sequence that is not naturally-occurring. In one embodiment, a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions. In other embodiments, a target nucleic acid may be designed to have an entirely novel sequence. However, it should be appreciated that target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof. In one aspect of the invention, multiplex assembly may be used to generate libraries of nucleic acids having different sequences. In some embodiments, a library may contain nucleic acids having random sequences. In certain embodiments, a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions. In certain embodiments, a target nucleic acid may include a functional sequence
(e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof). However, some embodiments of a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only non-functional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof). Certain target nucleic acids may include both functional and non-functional sequences. These and other aspects of target nucleic acids and their uses are described in more detail herein.
A target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
Accordingly, different strategies may be used to produce a target nucleic acid having a predetermined sequence. For example, different starting nucleic acids (e.g., different sets of predetermined nucleic acids) may be assembled to produce the same predetermined target nucleic acid sequence. Also, predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques. For example, nucleic acids (e.g., overlapping nucleic acid fragments) may be assembled in an in vitro reaction using an enzyme (e.g., a ligase and/or a polymerase) or a chemical reaction (e.g., a chemical ligation) or in vivo (e.g., assembled in a host cell after transfection into the host cell), or a combination thereof. Similarly, each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides. Also, a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process). In addition, different in vitro assembly reactions may be used to produce a nucleic acid fragment. For example, an in vitro oligonucleotide assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
Multiplex oligonucleotide assembly
A predetermined nucleic acid fragment may be assembled from a plurality of different starting nucleic acids (e.g., oligonucleotides) in a multiplex assembly reaction (e.g., a multiplex enzyme-mediated reaction, a multiplex chemical assembly reaction, or a combination thereof). Certain aspects of multiplex nucleic acid assembly reactions are illustrated by the following description of certain embodiments of multiplex oligonucleotide assembly reactions. It should be appreciated that the description of the assembly reactions in the context of oligonucleotides is not intended to be limiting. The assembly reactions described herein may be performed using starting nucleic acids obtained from one or more different sources (e.g., synthetic or natural polynucleotides, nucleic acid amplification products, nucleic acid degradation products, oligonucleotides, etc.). The starting nucleic acids may be referred to as assembly nucleic acids (e.g., assembly oligonucleotides). As used herein, an assembly nucleic acid has a sequence that is designed to be incorporated into the nucleic acid product generated during the assembly process. However, it should be appreciated that the description of the assembly reactions in the context of single-stranded nucleic acids is not intended to be limiting. In some embodiments, one or more of the starting nucleic acids illustrated in the figures and described herein may be provided as double-stranded nucleic acids. Accordingly, it should be appreciated that where the figures and description illustrate the assembly of single-stranded nucleic acids, the presence of one or more complementary nucleic acids is contemplated. Accordingly, one or more double-stranded complementary nucleic acids may be included in a reaction that is described herein in the context of a single-stranded assembly nucleic acid.
However, in some embodiments the presence of one or more complementary nucleic acids may interfere with an assembly reaction by competing for hybridization with one of the input assembly nucleic acids. Accordingly, in some embodiments an assembly reaction may involve only single-stranded assembly nucleic acids (i.e., the assembly nucleic acids may be provided in a single-stranded form without their complementary strand) as described or illustrated herein. However, in certain embodiments the presence of one or more complementary nucleic acids may have no or little effect on the assembly reaction. In some embodiments, complementary nucleic acid(s) may be incorporated during one or more steps of an assembly. In yet further embodiments, assembly nucleic acids and their complementary strands may be assembled under the same assembly conditions via parallel assembly reactions in the same reaction mixture. In certain embodiments, a nucleic acid product resulting from the assembly of a plurality of starting nucleic acids may be identical to the nucleic acid product that results from the assembly of nucleic acids that are complementary to the starting nucleic acids (e.g., in some embodiments where the assembly steps result in the production of a double-stranded nucleic acid product).
As used herein, an oligonucleotide may be a nucleic acid molecule comprising at least two covalently bonded nucleotide residues. In some embodiments, an oligonucleotide may be between 10 and 1,000 nucleotides long. For example, an oligonucleotide may be between 10 and 500 nucleotides long, or between 500 and 1,000 nucleotides long. In some embodiments, an oligonucleotide may be between about 20 and about 100 nucleotides long (e.g., from about 30 to 90, 40 to 85, 50 to 80, 60 to 75, or about 65 or about 70 nucleotides long), between about 100 and about 200, between about 200 and about 300 nucleotides, between about 300 and about 400, or between about 400 and about 500 nucleotides long.
However, shorter or longer oligonucleotides may be used. An oligonucleotide may be a single-stranded nucleic acid. However, in some embodiments a double-stranded oligonucleotide may be used as described herein. In certain embodiments, an oligonucleotide may be chemically synthesized as described in more detail below.
In some embodiments, an input nucleic acid (e.g., oligonucleotide) may be amplified before use. The resulting product may be double-stranded. In some embodiments, one of the strands of a double-stranded nucleic acid may be removed before use so that only a predetermined single strand is added to an assembly reaction.
In certain embodiments, each oligonucleotide may be designed to have a sequence that is identical to a different portion of the sequence of a predetermined target nucleic acid that is to be assembled. Accordingly, in some embodiments each oligonucleotide may have a sequence that is identical to a portion of one of the two strands of a double-stranded target nucleic acid. For clarity, the two complementary strands of a double stranded nucleic acid are referred to herein as the positive (P) and negative (N) strands. This designation is not intended to imply that the strands are sense and anti-sense strands of a coding sequence. They refer only to the two complementary strands of a nucleic acid (e.g., a target nucleic acid, an intermediate nucleic acid fragment, etc.) regardless of the sequence or function of the nucleic acid. Accordingly, in some embodiments a P strand may be a sense strand of a coding sequence, whereas in other embodiments a P strand may be an anti-sense strand of a coding sequence. According to the invention, a target nucleic acid may be either the P strand, the N strand, or a double-stranded nucleic acid comprising both the P and N strands.
It should be appreciated that different oligonucleotides may be designed to have different lengths. In some embodiments, one or more different oligonucleotides may have overlapping sequence regions (e.g., overlapping 5' regions or overlapping 3' regions). Overlapping sequence regions may be identical (i.e., corresponding to the same strand of the nucleic acid fragment) or complementary (i.e., corresponding to complementary strands of the nucleic acid fragment). The plurality of oligonucleotides may include one or more oligonucleotide pairs with overlapping identical sequence regions, one or more oligonucleotide pairs with overlapping complementary sequence regions, or a combination thereof. Overlapping sequences may be of any suitable length. For example, overlapping sequences may encompass the entire length of one or more nucleic acids used in an assembly reaction. Overlapping sequences may be between about 5 and about 500 nucleotides long (e.g., between about 10 and 100, between about 10 and 75, between about 10 and 50, about 20, about 25, about 30, about 35, about 40, about 45, about 50, etc.). However, shorter, longer or intermediate overlapping lengths may be used. It should be appreciated that overlaps between different input nucleic acids used in an assembly reaction may have different lengths. In a multiplex oligonucleotide assembly reaction designed to generate a predetermined nucleic acid fragment, the combined sequences of the different oligonucleotides in the reaction may span the sequence of the entire nucleic acid fragment on either the positive strand, the negative strand, both strands, or a combination of portions of the positive strand and portions of the negative strand. The plurality of different oligonucleotides may provide either positive sequences, negative sequences, or a combination of both positive and negative sequences corresponding to the entire sequence of the nucleic acid fragment to be assembled.
In some embodiments, the plurality of oligonucleotides may include one or more oligonucleotides having sequences identical to one or more portions of the positive sequence, and one or more oligonucleotides having sequences that are identical to one or more portions of the negative sequence of the nucleic acid fragment. One or more pairs of different oligonucleotides may include sequences that are identical to overlapping portions of the predetermined nucleic acid fragment sequence as described herein (e.g., overlapping sequence portions from the same or from complementary strands of the nucleic acid fragment). In some embodiments, the plurality of oligonucleotides includes a set of oligonucleotides having sequences that combine to span the entire positive sequence and a set of oligonucleotides having sequences that combine to span the entire negative sequence of the predetermined nucleic acid fragment. However, in certain embodiments, the plurality of oligonucleotides may include one or more oligonucleotides with sequences that are identical to sequence portions on one strand (either the positive or negative strand) of the nucleic acid fragment, but no oligonucleotides with sequences that are complementary to those sequence portions. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the positive sequence of the predetermined nucleic acid fragment. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the negative sequence of the predetermined nucleic acid fragment. These oligonucleotides may be assembled by sequential ligation or in an extension-based reaction (e.g., if an oligonucleotide having a 3' region that is complementary to one of the plurality of oligonucleotides is added to the reaction).
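It should be appreciated that one of the designs described above, in which oligonucleotides alternate between positive and negative strand sequences and adjacent oligonucleotides share complementary overlaps, may be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not a method disclosed herein: the function names (reverse_complement, tile_oligos), the single fixed oligonucleotide length, the single fixed overlap length, and the example sequence are all hypothetical, whereas an actual design may use a different length for each oligonucleotide and each overlap.

```python
# Hypothetical sketch: tile a positive-strand target into alternating P and N
# oligonucleotides whose adjacent members share complementary overlaps.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(target, oligo_len, overlap):
    """Split `target` (positive strand, 5'->3') into alternating P/N oligos.

    Each window of `oligo_len` bases starts `oligo_len - overlap` bases after
    the previous one. Even-numbered windows are emitted as P oligos (same
    sequence as the target); odd-numbered windows are emitted as N oligos
    (reverse complement of the window). Returns (strand, start, sequence)
    tuples, with every sequence written 5'->3'.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        window = target[start:start + oligo_len]
        if index % 2 == 0:
            oligos.append(("P", start, window))
        else:
            oligos.append(("N", start, reverse_complement(window)))
        if start + oligo_len >= len(target):
            break  # this window reaches the end of the target
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    example_target = "ATGGCCTTAGGCTAACCGGTTAAGC"  # arbitrary 25-base example
    for strand, start, seq in tile_oligos(example_target, oligo_len=10, overlap=4):
        print(strand, start, seq)
```

In this sketch the first P oligonucleotide and the first N oligonucleotide anneal through their 3' regions, and that N oligonucleotide anneals to the next P oligonucleotide through their 5' regions. When the oligonucleotide length exceeds twice the overlap, each oligonucleotide also has a central region with no partner in the other group.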
In one aspect, a nucleic acid fragment may be assembled in a polymerase-mediated assembly reaction from a plurality of oligonucleotides that are combined and extended in one or more rounds of polymerase-mediated extensions. In another aspect, a nucleic acid fragment may be assembled in a ligase-mediated reaction from a plurality of oligonucleotides that are combined and ligated in one or more rounds of ligase-mediated ligations. In another aspect, a nucleic acid fragment may be assembled in a non-enzymatic reaction (e.g., a chemical reaction) from a plurality of oligonucleotides that are combined and assembled in one or more rounds of non-enzymatic reactions. In some embodiments, a nucleic acid fragment may be assembled using a combination of polymerase, ligase, and/or non-enzymatic reactions. For example, both polymerase(s) and ligase(s) may be included in an assembly reaction mixture. Accordingly, a nucleic acid may be assembled via coupled amplification and ligation or ligation during amplification. The resulting nucleic acid fragment from each assembly technique may have a sequence that includes the sequences of each of the plurality of assembly oligonucleotides that were used as described herein. These assembly reactions may be referred to as primerless assemblies, since the target nucleic acid is generated by assembling the input oligonucleotides rather than being generated in an amplification reaction where the oligonucleotides act as amplification primers to amplify a pre-existing template nucleic acid molecule corresponding to the target nucleic acid.
Polymerase-based assembly techniques may involve one or more suitable polymerase enzymes that can catalyze a template-based extension of a nucleic acid in a 5' to 3' direction in the presence of suitable nucleotides and an annealed template. A polymerase may be thermostable. A polymerase may be obtained from recombinant or natural sources.
In some embodiments, a thermostable polymerase from a thermophilic organism may be used. In some embodiments, a polymerase may include a 3'→5' exonuclease/proofreading activity. In some embodiments, a polymerase may have no, or little, proofreading activity (e.g., a polymerase may be a recombinant variant of a natural polymerase that has been modified to reduce its proofreading activity). Examples of thermostable DNA polymerases include, but are not limited to: Taq (a heat-stable DNA polymerase from the bacterium Thermus aquaticus); Pfu (a thermophilic DNA polymerase with a 3'→5' exonuclease/proofreading activity from Pyrococcus furiosus, available from, for example, Promega); VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Thermococcus litoralis; also known as Tli polymerase); Deep VentR® DNA Polymerase and Deep VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Pyrococcus species GB-D; available from New England Biolabs); KOD HiFi (a recombinant Thermococcus kodakaraensis KOD1 DNA polymerase with a 3'→5' exonuclease/proofreading activity, available from Novagen); BIO-X-ACT (a mix of polymerases that possesses 5'→3' DNA polymerase activity and 3'→5' proofreading activity); Klenow Fragment (an N-terminal truncation of E. coli DNA Polymerase I which retains polymerase activity, but has lost the 5'→3' exonuclease activity, available from, for example, Promega and NEB); Sequenase™ (T7 DNA polymerase deficient in 3'→5' exonuclease activity); Phi29 (bacteriophage phi29 DNA polymerase, may be used for rolling circle amplification, for example, in a TempliPhi™ DNA Sequencing Template Amplification Kit, available from Amersham Biosciences); TopoTaq (a hybrid polymerase that combines hyperstable DNA binding domains and the DNA unlinking activity of Methanopyrus topoisomerase, with no exonuclease activity, available from Fidelity Systems); TopoTaq HiFi, which incorporates a proofreading domain with exonuclease activity; Phusion™ (a Pyrococcus-like enzyme with a processivity-enhancing domain, available from New England Biolabs); any other suitable DNA polymerase, or any combination of two or more thereof.
Ligase-based assembly techniques may involve one or more suitable ligase enzymes that can catalyze the covalent linking of adjacent 3' and 5' nucleic acid termini (e.g., a 5' phosphate and a 3' hydroxyl of nucleic acid(s) annealed on a complementary template nucleic acid such that the 3' terminus is immediately adjacent to the 5' terminus). Accordingly, a ligase may catalyze a ligation reaction between the 5' phosphate of a first nucleic acid and the 3' hydroxyl of a second nucleic acid if the first and second nucleic acids are annealed next to each other on a template nucleic acid. A ligase may be obtained from recombinant or natural sources. A ligase may be a heat-stable ligase. In some embodiments, a thermostable ligase from a thermophilic organism may be used. Examples of thermostable DNA ligases include, but are not limited to: Tth DNA ligase (from Thermus thermophilus, available from, for example, Eurogentec and GeneCraft); Pfu DNA ligase (a hyperthermophilic ligase from Pyrococcus furiosus); Taq ligase (from Thermus aquaticus); any other suitable heat-stable ligase, or any combination thereof. In some embodiments, one or more lower temperature ligases may be used (e.g., T4 DNA ligase). A lower temperature ligase may be useful for shorter overhangs (e.g., about 3, about 4, about 5, or about 6 base overhangs) that may not be stable at higher temperatures.
Non-enzymatic techniques can be used to ligate nucleic acids. For example, a 5'-end (e.g., the 5' phosphate group) and a 3'-end (e.g., the 3' hydroxyl) of one or more nucleic acids may be covalently linked together without using enzymes (e.g., without using a ligase). In some embodiments, non-enzymatic techniques may offer certain advantages over enzyme-based ligations. For example, non-enzymatic techniques may have a high tolerance of non-natural nucleotide analogues in nucleic acid substrates, may be used to ligate short nucleic acid substrates, may be used to ligate RNA substrates, and/or may be cheaper and/or more suited to certain automated (e.g., high throughput) applications. Non-enzymatic ligation may involve a chemical ligation. In some embodiments, nucleic acid termini of two or more different nucleic acids may be chemically ligated. In some embodiments, nucleic acid termini of a single nucleic acid may be chemically ligated (e.g., to circularize the nucleic acid). It should be appreciated that both strands at a first double-stranded nucleic acid terminus may be chemically ligated to both strands at a second double-stranded nucleic acid terminus. However, in some embodiments only one strand of a first nucleic acid terminus may be chemically ligated to a single strand of a second nucleic acid terminus. For example, the 5' end of one strand of a first nucleic acid terminus may be ligated to the 3' end of one strand of a second nucleic acid terminus without the ends of the complementary strands being chemically ligated. Accordingly, a chemical ligation may be used to form a covalent linkage between a 5' terminus of a first nucleic acid end and a 3' terminus of a second nucleic acid end, wherein the first and second nucleic acid ends may be ends of a single nucleic acid or ends of separate nucleic acids.
In one aspect, chemical ligation may involve at least one nucleic acid substrate having a modified end (e.g., a modified 5' and/or 3' terminus) including one or more chemically reactive moieties that facilitate or promote linkage formation. In some embodiments, chemical ligation occurs when one or more nucleic acid termini are brought together in close proximity (e.g., when the termini are brought together due to annealing between complementary nucleic acid sequences). Accordingly, annealing between complementary 3' or 5' overhangs (e.g., overhangs generated by restriction enzyme cleavage of a double-stranded nucleic acid) or between any combination of complementary nucleic acids that results in a 3' terminus being brought into close proximity with a 5' terminus (e.g., the 3' and 5' termini are adjacent to each other when the nucleic acids are annealed to a complementary template nucleic acid) may promote a template-directed chemical ligation. Examples of chemical reactions may include, but are not limited to, condensation, reduction, and/or photochemical ligation reactions. It should be appreciated that in some embodiments chemical ligation can be used to produce naturally-occurring phosphodiester internucleotide linkages, non-naturally-occurring phosphamide pyrophosphate internucleotide linkages, and/or other non-naturally-occurring internucleotide linkages.
In some embodiments, the process of chemical ligation may involve one or more coupling agents to catalyze the ligation reaction. A coupling agent may promote a ligation reaction between reactive groups in adjacent nucleic acids (e.g., between a 5'-reactive moiety and a 3'-reactive moiety at adjacent sites along a complementary template). In some embodiments, a coupling agent may be a reducing reagent (e.g., ferricyanide), a condensing reagent (e.g., cyanoimidazole, cyanogen bromide, carbodiimide, etc.), or irradiation (e.g., UV irradiation for photo-ligation). In some embodiments, a chemical ligation may be an autoligation reaction that does not involve a separate coupling agent. In autoligation, the presence of a reactive group on one or more nucleic acids may be sufficient to catalyze a chemical ligation between nucleic acid termini without the addition of a coupling agent (see, for example, Xu Y & Kool ET, 1997, Tetrahedron Lett. 38:5595-8). Non-limiting examples of these reagent-free ligation reactions may involve nucleophilic displacements of sulfur on bromoacetyl, tosyl, or iodo-nucleoside groups (see, for example, Xu Y et al., 2001, Nat Biotech 19:148-52). Nucleic acids containing reactive groups suitable for autoligation can be prepared directly on automated synthesizers (see, for example, Xu Y & Kool ET, 1999, Nuc. Acids Res. 27:875-81). In some embodiments, a phosphorothioate at a 3' terminus may react with a leaving group (such as tosylate or iodide) on a thymidine at an adjacent 5' terminus. In some embodiments, two nucleic acid strands bound at adjacent sites on a complementary target strand may undergo autoligation by displacement of a 5'-end iodide moiety (or tosylate) with a 3'-end sulfur moiety. Accordingly, in some embodiments the product of an autoligation may include a non-naturally-occurring internucleotide linkage (e.g., a single oxygen atom may be replaced with a sulfur atom in the ligated product).
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a one step reaction involving simultaneous chemical ligation of nucleic acids on both strands of the duplex. For example, a mixture of 5'-phosphorylated oligonucleotides corresponding to both strands of a target nucleic acid may be chemically ligated by a) exposure to heat (e.g., to 97°C) and slow cooling to form a complex of annealed oligonucleotides, and b) exposure to cyanogen bromide or any other suitable coupling agent under conditions sufficient to chemically ligate adjacent 3' and 5' ends in the nucleic acid complex.
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a two step reaction involving separate chemical ligations for the complementary strands of the duplex. For example, each strand of a target nucleic acid may be ligated in a separate reaction containing phosphorylated oligonucleotides corresponding to the strand that is to be ligated and non-phosphorylated oligonucleotides corresponding to the complementary strand. The non-phosphorylated oligonucleotides may serve as a template for the phosphorylated oligonucleotides during a chemical ligation (e.g. using cyanogen bromide). The resulting single-stranded ligated nucleic acid may be purified and annealed to a complementary ligated single-stranded nucleic acid to form the target duplex nucleic acid (see, for example, Shabarova ZA et al., 1991, Nuc. Acids Res. 19:4247-51).
Aspects of the invention may be used to enhance different types of nucleic acid assembly reactions (e.g., multiplex nucleic acid assembly reactions). Aspects of the invention may be used in combination with one or more assembly reactions described in, for example, Carr et al., 2004, Nucleic Acids Research, Vol. 32, No. 20, e162 (9 pages); Richmond et al., 2004, Nucleic Acids Research, Vol. 32, No. 17, pp. 5011-5018; Caruthers et al., 1972, J. Mol. Biol. 72, 475-492; Hecker et al., 1998, Biotechniques 24:256-260; Kodumal et al., 2004, PNAS Vol. 101, No. 44, pp. 15573-15578; Tian et al., 2004, Nature, Vol. 432, pp. 1050-1054; and US Patent Nos. 6,008,031 and 5,922,539, the disclosures of which are incorporated herein by reference. Certain embodiments of multiplex nucleic acid assembly reactions for generating a predetermined nucleic acid fragment are illustrated with reference to FIGS. 1-4. It should be appreciated that synthesis and assembly methods described herein (including, for example, oligonucleotide synthesis, multiplex nucleic acid assembly, concerted assembly of nucleic acid fragments, or any combination thereof) may be performed in any suitable format, including in a reaction tube, in a multi-well plate, on a surface, on a column, in a microfluidic device (e.g., a microfluidic tube), a capillary tube, etc. It should be appreciated that the reference to complementary nucleic acids or complementary nucleic acid regions herein refers to nucleic acids or regions thereof that have sequences which are reverse complements of each other so that they can hybridize in an antiparallel fashion typical of natural DNA.
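It should be appreciated that the reverse-complement relationship referred to above may be expressed as a short Python sketch. The sketch is illustrative only: the function names (reverse_complement, are_complementary) are hypothetical, and only the four standard DNA bases are handled, which is a simplifying assumption.

```python
# Hypothetical sketch of the reverse-complement definition of complementarity.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse, so the result also reads 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_complementary(a, b):
    """True if `a` and `b` (both written 5'->3') can pair along their full
    length in an antiparallel orientation."""
    return a.upper() == reverse_complement(b).upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(are_complementary("GATTACA", "TGTAATC"))
```

Two sequences that are plain base-by-base complements written in the same direction are not complementary in this sense, because hybridization is antiparallel.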
FIG. 1 shows one embodiment of a plurality of oligonucleotides that may be assembled in a polymerase-based multiplex oligonucleotide assembly reaction. FIG. 1A shows two groups of oligonucleotides (Group P and Group N) that have sequences of portions of the two complementary strands of a nucleic acid fragment to be assembled. Group P includes oligonucleotides with positive strand sequences (P1, P2, ..., Pn-1, Pn, Pn+1, ..., PT, shown from 5'→3' on the positive strand). Group N includes oligonucleotides with negative strand sequences (NT, ..., Nn+1, Nn, Nn-1, ..., N2, N1, shown from 5'→3' on the negative strand). In this example, none of the P group oligonucleotides overlap with each other and none of the N group oligonucleotides overlap with each other. However, in some embodiments, one or more of the oligonucleotides within the P or N group may overlap. Furthermore, FIG. 1A shows gaps between consecutive oligonucleotides in Group P and gaps between consecutive oligonucleotides in Group N. However, each P group oligonucleotide (except for P1) and each N group oligonucleotide (except for NT) overlaps with complementary regions of two oligonucleotides from the complementary group of oligonucleotides. P1 and NT overlap with a complementary region of only one oligonucleotide from the other group (the complementary 3'-most oligonucleotides N1 and PT, respectively). FIG. 1B shows a structure of an embodiment of a Group P or Group N oligonucleotide represented in FIG. 1A. This oligonucleotide includes a 5' region that is complementary to a 5' region of a first oligonucleotide from the other group, a 3' region that is complementary to a 3' region of a second oligonucleotide from the other group, and a core or central region that is not complementary to any oligonucleotide sequence from the other group (or its own group). This central region is illustrated as the B region in FIG. 1B. The sequence of the B region may be different for each different oligonucleotide.
As defined herein, the B region of an oligonucleotide in one group corresponds to a gap between two consecutive oligonucleotides in the complementary group of oligonucleotides. It should be noted that the 5'-most oligonucleotide in each group (P1 in Group P and NT in Group N) does not have a 5' region that is complementary to the 5' region of any other oligonucleotide in either group. Accordingly, the 5'-most oligonucleotides (P1 and NT) that are illustrated in FIG. 1A each have a 3' complementary region and a 5' non-complementary region (the B region of FIG. 1B), but no 5' complementary region. However, it should be appreciated that any one or more of the oligonucleotides in Group P and/or Group N (including all of the oligonucleotides in Group P and/or Group N) can be designed to have no B region. In the absence of a B region, a 5'-most oligonucleotide has only the 3' complementary region (meaning that the entire oligonucleotide is complementary to the 3' region of the 3'-most oligonucleotide from the other group (e.g., the 3' region of N1 or PT shown in FIG. 1A)). In the absence of a B region, one of the other oligonucleotides in either Group P or Group N has only a 5' complementary region and a 3' complementary region (meaning that the entire oligonucleotide is complementary to the 5' and 3' sequence regions of the two overlapping oligonucleotides from the complementary group). In some embodiments, only a subset of oligonucleotides in an assembly reaction may include B regions. It should be appreciated that the length of the 5', 3', and B regions may be different for each oligonucleotide. However, for each oligonucleotide the length of the 5' region is the same as the length of the complementary 5' region in the 5' overlapping oligonucleotide from the other group. Similarly, the length of the 3' region is the same as the length of the complementary 3' region in the 3' overlapping oligonucleotide from the other group.
However, in certain embodiments a 3'-most oligonucleotide may be designed with a 3' region that extends beyond the 5' region of the 5'-most oligonucleotide. In this embodiment, an assembled product may include the 5' end of the 5'-most oligonucleotide, but not the 3' end of the 3'-most oligonucleotide that extends beyond it.
FIG. 1C illustrates a subset of the oligonucleotides from FIG. 1A, each oligonucleotide having a 5', a 3', and an optional B region. Oligonucleotide Pn is shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Nn-1. Oligonucleotide Pn also has a 3' region that is complementary to (and can anneal to) the 3' region of oligonucleotide Nn. Nn is also shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Pn+1. This pattern could be repeated for all of oligonucleotides P2 to PT and N1 to NT-1 (with the 5'-most oligonucleotides only having 3' complementary regions as discussed herein). If all of the oligonucleotides from Group P and Group N are mixed together under appropriate hybridization conditions, they may anneal to form a long chain such as the oligonucleotide complex illustrated in FIG. 1A. However, subsets of the oligonucleotides may form shorter chains and even oligonucleotide dimers with annealed 5' or 3' regions. It should be appreciated that many copies of each oligonucleotide are included in a typical reaction mixture. Accordingly, the resulting hybridized reaction mixture may contain a distribution of different oligonucleotide dimers and complexes. Polymerase-mediated extension of the hybridized oligonucleotides results in a template-based extension of the 3' ends of oligonucleotides that have annealed 3' regions. Accordingly, polymerase-mediated extension of the oligonucleotides shown in FIG. 1C would result in extension of the 3' ends only of oligonucleotides Pn and Nn, generating extended oligonucleotides containing sequences that are complementary to all the regions of Nn and Pn, respectively. Extended oligonucleotide products with sequences complementary to all of Nn-1 and Pn+1 would not be generated unless oligonucleotides Pn-1 and Nn+1 were included in the reaction mixture.
Accordingly, if all of the oligonucleotide sequences in a plurality of oligonucleotides are to be incorporated into an assembled nucleic acid fragment using a polymerase, the plurality of oligonucleotides should include 5'-most oligonucleotides that are at least complementary to the entire 3' regions of the 3'-most oligonucleotides. In some embodiments, the 5'-most oligonucleotides also may have 5' regions that extend beyond the 3' ends of the 3'-most oligonucleotides as illustrated in FIG. 1A. In some embodiments, a ligase also may be added to ligate adjacent 5' and 3' ends that may be formed upon 3' extension of annealed oligonucleotides in an oligonucleotide complex such as the one illustrated in FIG. 1A. When assembling a nucleic acid fragment using a polymerase, a single cycle of polymerase extension extends oligonucleotide pairs with annealed 3' regions. Accordingly, if a plurality of oligonucleotides were annealed to form an annealed complex such as the one illustrated in FIG. 1A, a single cycle of polymerase extension would result in the extension of the 3' ends of the P1/N1, P2/N2, ..., Pn-1/Nn-1, Pn/Nn, Pn+1/Nn+1, ..., PT/NT oligonucleotide pairs. In one embodiment, a single molecule could be generated by ligating the extended oligonucleotide dimers. In one embodiment, a single molecule incorporating all of the oligonucleotide sequences may be generated by performing several polymerase extension cycles.
In one embodiment, FIG. 1D illustrates two cycles of polymerase extension (separated by a denaturing step and an annealing step) and the resulting nucleic acid products. It should be appreciated that several cycles of polymerase extension may be required to assemble a single nucleic acid fragment containing all the sequences of an initial plurality of oligonucleotides. In one embodiment, a minimal number of extension cycles for assembling a nucleic acid may be calculated as log2 n, where n is the number of oligonucleotides being assembled. In some embodiments, progressive assembly of the nucleic acid may be achieved without using temperature cycles. For example, an enzyme capable of rolling circle amplification may be used (e.g., phi29 polymerase) when a circularized nucleic acid (e.g., oligonucleotide) complex is used as a template to produce a large amount of circular product for subsequent processing using MutS or a MutS homolog as described herein. In step 1 of FIG. 1D, annealed oligonucleotide pairs Pn/Nn and Pn+1/Nn+1 are extended to form oligonucleotide dimer products incorporating the sequences covered by the respective oligonucleotide pairs. For example, Pn is extended to incorporate sequences that are complementary to the B and 5' regions of Nn (indicated as N'n in FIG. 1D). Similarly, Nn+1 is extended to incorporate sequences that are complementary to the 5' and B regions of Pn+1 (indicated as P'n+1 in FIG. 1D). These dimer products may be denatured and reannealed to form the starting material of step 2 where the 3' end of the extended Pn oligonucleotide is annealed to the 3' end of the extended Nn+1 oligonucleotide. This product may be extended in a polymerase-mediated reaction to form a product that incorporates the sequences of the four oligonucleotides (Pn, Nn, Pn+1, Nn+1). One strand of this extended product has a sequence that includes (in 5' to 3' order) the 5', B, and 3' regions of Pn, the complement of the B region of Nn, the 5', B, and 3' regions of Pn+1, and the complements of the B and 5' regions of Nn+1.
The other strand of this extended product has the complementary sequence. It should be appreciated that the 3' regions of Pn and Nn are complementary, the 5' regions of Nn and Pn+1 are complementary, and the 3' regions of Pn+1 and Nn+1 are complementary. It also should be appreciated that the reaction products shown in FIG. 1D are a subset of the reaction products that would be obtained using all of the oligonucleotides of Group P and Group N. A first polymerase extension reaction using all of the oligonucleotides would result in a plurality of overlapping oligonucleotide dimers from P1/N1 to PT/NT. Each of these may be denatured and at least one of the strands could then anneal to an overlapping complementary strand from an adjacent (either 3' or 5') oligonucleotide dimer and be extended in a second cycle of polymerase extension as shown in FIG. 1D. Subsequent cycles of denaturing, annealing, and extension produce progressively larger products including a nucleic acid fragment that includes the sequences of all of the initial oligonucleotides. It should be appreciated that these subsequent rounds of extension also produce many nucleic acid products of intermediate length. The reaction product may be complex since not all of the 3' regions may be extended in each cycle. Accordingly, unextended oligonucleotides may be available in each cycle to anneal to other unextended oligonucleotides or to previously extended oligonucleotides. Similarly, extended products of different sizes may anneal to each other in each cycle. Accordingly, a mixture of extended products of different sizes covering different regions of the sequence may be generated along with the nucleic acid fragment covering the entire sequence. This mixture also may contain any remaining unextended oligonucleotides.
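As a rough illustration of the log2(n) estimate for the pairwise scheme of FIG. 1D, the following Python sketch computes a minimal cycle count. It assumes an idealized reaction in which the number of oligonucleotide sequences covered by the longest product doubles in every cycle, and it rounds a fractional value up to a whole cycle. The function name `min_extension_cycles` is hypothetical and is not part of the disclosure.

```python
def min_extension_cycles(n_oligos: int) -> int:
    """Smallest whole number of extension cycles c with 2**c >= n_oligos.

    Idealized assumption: every cycle doubles the number of oligonucleotide
    sequences covered by the longest product, so log2(n) is rounded up.
    """
    if n_oligos < 1:
        raise ValueError("at least one oligonucleotide is required")
    # For positive integers, (n - 1).bit_length() equals ceil(log2(n))
    # and avoids floating-point rounding.
    return (n_oligos - 1).bit_length()


if __name__ == "__main__":
    for n in (2, 4, 8, 20, 100):
        print(n, "oligonucleotides ->", min_extension_cycles(n), "cycles")
```

For the four oligonucleotides shown in FIG. 1D this gives two cycles, matching the two extension steps illustrated. In practice more cycles may be needed, since not all of the 3' regions may be extended in each cycle.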
FIG. 2 shows an embodiment of a plurality of oligonucleotides that may be assembled in a directional polymerase-based multiplex oligonucleotide assembly reaction. In this embodiment, only the 5'-most oligonucleotide of Group P may be provided. In contrast to the example shown in FIG. 1, the remainder of the sequence of the predetermined nucleic acid fragment is provided by oligonucleotides of Group N. The 3'-most oligonucleotide of Group N (N1) has a 3' region that is complementary to the 3' region of P1 as shown in FIG. 2B. However, the remainder of the oligonucleotides in Group N have overlapping (but non-complementary) 3' and 5' regions as illustrated in FIG. 2B for oligonucleotides N1-N3. Each Group N oligonucleotide (e.g., Nn) overlaps with two adjacent oligonucleotides: one overlaps with the 3' region (Nn-1) and one with the 5' region (Nn+1), except for N1 that overlaps with the 3' regions of P1 (complementary overlap) and N2 (non-complementary overlap), and NT that overlaps only with NT-1. It should be appreciated that all of the overlaps shown in FIG. 2A between adjacent oligonucleotides N2 to NT-1 are non-complementary overlaps between the 5' region of one oligonucleotide and the 3' region of the adjacent oligonucleotide illustrated in a 3' to 5' direction on the N strand of the predetermined nucleic acid fragment. It also should be appreciated that each oligonucleotide may have 3', B, and 5' regions of different lengths (including no B region in some embodiments). In some embodiments, none of the oligonucleotides may have B regions, meaning that the entire sequence of each oligonucleotide may overlap with the combined 5' and 3' region sequences of its two adjacent oligonucleotides. Assembly of a predetermined nucleic acid fragment from the plurality of oligonucleotides shown in FIG. 2A may involve multiple cycles of polymerase-mediated extension. Each extension cycle may be separated by a denaturing and an annealing step. FIG.
2C illustrates the first two steps in this assembly process. In step 1, annealed oligonucleotides P1 and N1 are extended to form an oligonucleotide dimer. P1 is shown with a 5' region that is non-complementary to the 3' region of N1 and extends beyond the 3' region of N1 when the oligonucleotides are annealed. However, in some embodiments, P1 may lack the 5' non-complementary region and include only sequences that overlap with the 3' region of N1. The product of P1 extension is shown after step 1 containing an extended region that is complementary to the 5' end of N1. The single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension of P1/N1 in step 1. The product of P1 extension is shown annealed to the 3' region of N2. This annealed complex may be extended in step 2 to generate an extended product that now includes sequences complementary to the B and 5' regions of N2. Again, the single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension reaction of step 2. Additional cycles of extension may be performed to further assemble a predetermined nucleic acid fragment. In each cycle, extension results in the addition of sequences complementary to the B and 5' regions of the next Group N oligonucleotide. Each cycle may include a denaturing and annealing step. However, the extension may occur under the annealing conditions. Accordingly, in one embodiment, cycles of extension may be obtained by alternating between denaturing conditions (e.g., a denaturing temperature) and annealing/extension conditions (e.g., an annealing/extension temperature). In one embodiment, T (the number of Group N oligonucleotides) may determine the minimal number of temperature cycles used to assemble the oligonucleotides. However, in some embodiments, progressive extension may be achieved without temperature cycling.
For example, an enzyme capable of promoting rolling circle amplification may be used (e.g., TempliPhi). It should be appreciated that a reaction mixture containing an assembled predetermined nucleic acid fragment also may contain a distribution of shorter extension products that may result from incomplete extension during one or more of the cycles or may be the result of a P1/N1 extension that was initiated after the first cycle. FIG. 2D illustrates an example of a sequential extension reaction where the 5'-most P1 oligonucleotide is bound to a support and the Group N oligonucleotides are unbound. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most P1 oligonucleotide. Accordingly, the complementary strand (the negative strand) may readily be obtained by denaturing the bound fragment and releasing the negative strand. In some embodiments, the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the positive strand also may be released. Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained. FIG. 2E illustrates an example of a sequential reaction where P1 is unbound and the Group N oligonucleotides are bound to a support. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most NT oligonucleotide. Accordingly, the complementary strand (the positive strand) may readily be obtained by denaturing the bound fragment and releasing the positive strand. In some embodiments, the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the negative strand also may be released.
Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained. It should be appreciated that other configurations of oligonucleotides may be used to assemble a nucleic acid via two or more cycles of polymerase-based extension. In many configurations, at least one pair of oligonucleotides has complementary 3' end regions. FIG. 2F illustrates an example where an oligonucleotide pair with complementary 3' end regions is flanked on either side by a series of oligonucleotides with overlapping non-complementary sequences. The oligonucleotides illustrated to the right of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of one strand of the target nucleic acid to be assembled. The oligonucleotides illustrated to the left of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of the complementary strand of the target nucleic acid. These oligonucleotides may be assembled via sequential polymerase-based extension reactions as described herein (see also, for example, Xiong et al., 2004, Nucleic Acids Research, Vol. 32, No. 12, e98, 10 pages, the disclosure of which is incorporated by reference herein). It should be appreciated that different numbers and/or lengths of oligonucleotides may be used on either side of the complementary pair. Accordingly, the illustration of the complementary pair as the central pair in FIG. 2F is not intended to be limiting, as other configurations of a complementary oligonucleotide pair flanked by a different number of non-complementary pairs on either side may be used according to methods of the invention. FIG. 3 shows an embodiment of a plurality of oligonucleotides that may be assembled in a ligase reaction.
FIG. 3A illustrates the alignment of the oligonucleotides showing that they do not contain gaps (i.e., no B region as described herein). Accordingly, the oligonucleotides may anneal to form a complex with no nucleotide gaps between the 3' and 5' ends of the annealed oligonucleotides in either Group P or Group N. These oligonucleotides provide a suitable template for assembly using a ligase under appropriate reaction conditions. However, it should be appreciated that these oligonucleotides also may be assembled using a polymerase-based assembly reaction as described herein. FIG. 3B shows two individual ligation reactions. These reactions are illustrated in two steps. However, it should be appreciated that these ligation reactions may occur simultaneously or sequentially in any order and may occur as such in a reaction maintained under constant reaction conditions (e.g., with no temperature cycling) or in a reaction exposed to several temperature cycles. For example, the reaction illustrated in step 2 may occur before the reaction illustrated in step 1. In each ligation reaction illustrated in FIG. 3B, a Group N oligonucleotide is annealed to two adjacent Group P oligonucleotides (due to the complementary 5' and 3' regions between the P and N oligonucleotides), providing a template for ligation of the adjacent P oligonucleotides. Although not illustrated, ligation of the N group oligonucleotides also may proceed in similar manner to assemble adjacent N oligonucleotides that are annealed to their complementary P oligonucleotide. Assembly of the predetermined nucleic acid fragment may be obtained through ligation of all of the oligonucleotides to generate a double stranded product. However, in some embodiments, a single stranded product of either the positive or negative strand may be obtained. In certain embodiments, a plurality of oligonucleotides may be designed to generate only single-stranded reaction products in a ligation reaction.
For example, a first group of oligonucleotides (of either Group P or Group N) may be provided to cover the entire sequence on one strand of the predetermined nucleic acid fragment (on either the positive or negative strand). In contrast, a second group of oligonucleotides (from the complementary group to the first group) may be designed to be long enough to anneal to complementary regions in the first group but not long enough to provide adjacent 5' and 3' ends between oligonucleotides in the second group. This provides substrates that are suitable for ligation of oligonucleotides from the first group but not the second group. The result is a single-stranded product having a sequence corresponding to the oligonucleotides in the first group. Again, as with other assembly reactions described herein, a ligase reaction mixture that contains an assembled predetermined nucleic acid fragment also may contain a distribution of smaller fragments resulting from the assembly of a subset of the oligonucleotides. FIG. 4 shows an embodiment of a ligase-based assembly where one or more of the plurality of oligonucleotides is bound to a support. In FIG. 4A, the 5'-most oligonucleotide of the P group oligonucleotides is bound to a support. Ligation of adjacent oligonucleotides in the 5' to 3' direction results in the assembly of a predetermined nucleic acid fragment. FIG. 4A illustrates an example where adjacent oligonucleotides P2 and P3 are added sequentially. However, the ligation of any two adjacent oligonucleotides from Group P may occur independently and in any order in a ligation reaction mixture. For example, when P1 is ligated to the 5' end of N2, N2 may be in the form of a single oligonucleotide or it already may be ligated to one or more downstream oligonucleotides (N3, N4, etc.).
It should be appreciated that for a ligation assembly bound to a support, either the 5'-most (e.g., P1 for Group P, or NT for Group N) or the 3'-most (e.g., PT for Group P, or N1 for Group N) oligonucleotide may be bound to a support since the reaction can proceed in any direction. In some embodiments, a predetermined nucleic acid fragment may be assembled with a central oligonucleotide (i.e., neither the 5'-most nor the 3'-most) that is bound to a support provided that the attachment to the support does not interfere with ligation.
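The gap-free arrangement described for FIG. 3A can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the design procedure of the disclosure: all oligonucleotides are given the same even length, Group N oligonucleotides are offset by exactly half an oligonucleotide, and both lists are ordered by position along the positive strand rather than by the P1...PT and N1...NT numbering used in the figures. The names `tile_for_ligation` and `reverse_complement` and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_for_ligation(target: str, oligo_len: int):
    """Split a positive-strand target into abutting Group P oligonucleotides
    and half-offset Group N oligonucleotides with no B regions (no gaps)."""
    if oligo_len % 2 or len(target) % oligo_len:
        raise ValueError("sketch assumes an even length that divides the target")
    half = oligo_len // 2
    # Group P: consecutive, abutting windows of the positive strand.
    group_p = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    # Group N: reverse complements of windows shifted by half an oligonucleotide,
    # so each one spans the junction between two adjacent Group P members.
    group_n = [
        reverse_complement(target[i:i + oligo_len])
        for i in range(half, len(target) - half, oligo_len)
    ]
    return group_p, group_n


if __name__ == "__main__":
    target = "ACGTTGCAAGGCTTAACCGGATCA"  # hypothetical 24-nt positive strand
    group_p, group_n = tile_for_ligation(target, 8)
    print("Group P:", group_p)
    print("Group N:", group_n)
    # Ligating the Group P members in order regenerates the target strand.
    print("".join(group_p) == target)
```

Each Group N member in this sketch anneals to the 3' half of one Group P oligonucleotide and the 5' half of the next, which is the templating role described for the ligation reactions of FIG. 3B.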
FIG. 4B illustrates an example where a plurality of N group oligonucleotides are bound to a support and a predetermined nucleic acid fragment is assembled from P group oligonucleotides that anneal to their complementary support-bound N group oligonucleotides. Again, FIG. 4B illustrates a sequential addition. However, adjacent P group oligonucleotides may be ligated in any order. Also, the bound oligonucleotides may be attached at their 5' end, 3' end, or at any other position provided that the attachment does not interfere with their ability to bind to complementary 5' and 3' regions on the oligonucleotides that are being assembled. This reaction may involve one or more reaction condition changes (e.g., temperature cycles) so that ligated oligonucleotides bound to one immobilized N group oligonucleotide can be dissociated from the support and bind to a different immobilized N group oligonucleotide to provide a substrate for ligation to another P group oligonucleotide.
As with other assembly reactions described herein, support-bound ligase reactions (e.g., those illustrated in FIG. 4B) that generate a full length predetermined nucleic acid fragment also may generate a distribution of smaller fragments resulting from the assembly of subsets of the oligonucleotides. A support used in any of the assembly reactions described herein (e.g., polymerase-based, ligase-based, or other assembly reaction) may include any suitable support medium. A support may be solid, porous, a matrix, a gel, beads, beads in a gel, etc. A support may be of any suitable size. A solid support may be provided in any suitable configuration or shape (e.g., a chip, a bead, a gel, a microfluidic channel, a planar surface, a spherical shape, a column, etc.). As illustrated herein, different oligonucleotide assembly reactions may be used to assemble a plurality of overlapping oligonucleotides (with overlaps that are either 5'/5', 3'/3', 5'/3', complementary, non-complementary, or a combination thereof). Many of these reactions include at least one pair of oligonucleotides (the pair including one oligonucleotide from a first group or P group of oligonucleotides and one oligonucleotide from a second group or N group of oligonucleotides) that have overlapping complementary 3' regions. However, in some embodiments, a predetermined nucleic acid may be assembled from non-overlapping oligonucleotides using blunt-ended ligation reactions. In some embodiments, the order of assembly of the non-overlapping oligonucleotides may be biased by selective phosphorylation of different 5' ends. In some embodiments, size purification may be used to select for the correct order of assembly. In some embodiments, the correct order of assembly may be promoted by sequentially adding appropriate oligonucleotide substrates into the reaction (e.g., the ligation reaction).
In order to obtain a full-length nucleic acid fragment from a multiplex oligonucleotide assembly reaction, a purification step may be used to remove starting oligonucleotides and/or incompletely assembled fragments. In some embodiments, a purification step may involve chromatography, electrophoresis, or other physical size separation technique. In certain embodiments, a purification step may involve amplifying the full length product. For example, a pair of amplification primers (e.g., PCR primers) that correspond to the predetermined 5' and 3' ends of the nucleic acid fragment being assembled will preferentially amplify full length product in an exponential fashion. It should be appreciated that smaller assembled products may be amplified if they contain the predetermined 5' and 3' ends. However, such smaller-than-expected products containing the predetermined 5' and 3' ends should only be generated if an error occurred during assembly (e.g., resulting in the deletion or omission of one or more regions of the target nucleic acid) and may be removed by size fractionation of the amplified product. Accordingly, a preparation containing a relatively high amount of full length product may be obtained directly by amplifying the product of an assembly reaction using primers that correspond to the predetermined 5' and 3' ends. In some embodiments, additional purification (e.g., size selection) techniques may be used to obtain a more purified preparation of amplified full-length nucleic acid fragment.
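The preferential amplification of full length product can be illustrated with simple arithmetic. The Python sketch below is an idealized model under stated assumptions: products carrying both predetermined ends are copied exponentially, while incomplete fragments carrying only one primer site are copied at most linearly, because their copies cannot themselves be primed again. The function name `copies_after_amplification` and the `efficiency` parameter are hypothetical.

```python
def copies_after_amplification(initial: float, cycles: int,
                               has_both_ends: bool,
                               efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of amplification cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling for templates with both primer sites).
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    if has_both_ends:
        # Both primer sites present: every copy is a template in later cycles.
        return initial * (1.0 + efficiency) ** cycles
    # One primer site only: just the original molecules are copied each cycle.
    return initial * (1.0 + efficiency * cycles)


if __name__ == "__main__":
    for c in (10, 20, 30):
        full = copies_after_amplification(1, c, has_both_ends=True)
        partial = copies_after_amplification(1, c, has_both_ends=False)
        print(f"{c} cycles: full length x{full:.0f}, one-ended fragment x{partial:.0f}")
```

Under these assumptions, ten ideal cycles enrich a full length molecule about a thousandfold while a one-ended fragment increases only about elevenfold, which is why amplification with end-specific primers can serve as a purification step.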
When designing a plurality of oligonucleotides to assemble a predetermined nucleic acid fragment, the sequence of the predetermined fragment will be provided by the oligonucleotides as described herein. However, the oligonucleotides may contain additional sequence information that may be removed during assembly or may be provided to assist in subsequent manipulations of the assembled nucleic acid fragment. Examples of additional sequences include, but are not limited to, primer recognition sequences for amplification (e.g., PCR primer recognition sequences), restriction enzyme recognition sequences, recombination sequences, other binding or recognition sequences, labeled sequences, etc. In some embodiments, one or more of the 5'-most oligonucleotides, one or more of the 3'-most oligonucleotides, or any combination thereof, may contain one or more additional sequences. In some embodiments, the additional sequence information may be contained in two or more adjacent oligonucleotides on either strand of the predetermined nucleic acid sequence. Accordingly, an assembled nucleic acid fragment may contain additional sequences that may be used to connect the assembled fragment to one or more additional nucleic acid fragments (e.g., one or more other assembled fragments, fragments obtained from other sources, vectors, etc.) via ligation, recombination, polymerase-mediated assembly, etc. In some embodiments, purification may involve cloning one or more assembled nucleic acid fragments. The cloned product may be screened (e.g., sequenced, analyzed for an insert of the expected size, etc.).
In some embodiments, a nucleic acid fragment assembled from a plurality of oligonucleotides may be combined with one or more additional nucleic acid fragments using a polymerase-based and/or a ligase-based extension reaction similar to those described herein for oligonucleotide assembly. Accordingly, one or more overlapping nucleic acid fragments may be combined and assembled to produce a larger nucleic acid fragment as described herein. In certain embodiments, double-stranded overlapping oligonucleotide fragments may be combined. However, single-stranded fragments, or combinations of single-stranded and double-stranded fragments may be combined as described herein. A nucleic acid fragment assembled from a plurality of oligonucleotides may be of any length depending on the number and length of the oligonucleotides used in the assembly reaction. For example, a nucleic acid fragment (either single-stranded or double-stranded) assembled from a plurality of oligonucleotides may be between 50 and 1,000 nucleotides long (for example, about 70 nucleotides long, between 100 and 500 nucleotides long, between 200 and 400 nucleotides long, about 200 nucleotides long, about 300 nucleotides long, about 400 nucleotides long, etc.). One or more such nucleic acid fragments (e.g., with overlapping 3' and/or 5' ends) may be assembled to form a larger nucleic acid fragment (single-stranded or double-stranded) as described herein.
A full length product assembled from smaller nucleic acid fragments also may be isolated or purified as described herein (e.g., using a size selection, cloning, selective binding or other suitable purification procedure). In addition, any assembled nucleic acid fragment (e.g., full-length nucleic acid fragment) described herein may be amplified (prior to, as part of, or after, a purification procedure) using appropriate 5' and 3' amplification primers.
Synthetic Oligonucleotides
It should be appreciated that the terms P Group and N Group oligonucleotides are used herein for clarity purposes only, and to illustrate several embodiments of multiplex oligonucleotide assembly. The Group P and Group N oligonucleotides described herein are interchangeable, and may be referred to as first and second groups of oligonucleotides corresponding to sequences on complementary strands of a target nucleic acid fragment.
Oligonucleotides may be synthesized using any suitable technique. For example, oligonucleotides may be synthesized on a column or other support (e.g., a chip). Examples of chip-based synthesis techniques include techniques used in synthesis devices or methods available from Combimatrix, Agilent, Affymetrix, or other sources. A synthetic oligonucleotide may be of any suitable size, for example between 10 and 1,000 nucleotides long (e.g., between 10 and 200, 200 and 500, 500 and 1,000 nucleotides long, or any combination thereof). An assembly reaction may include a plurality of oligonucleotides, each of which independently may be between 10 and 200 nucleotides in length (e.g., between 20 and 150, between 30 and 100, 30 to 90, 30-80, 30-70, 30-60, 35-55, 40-50, or any intermediate number of nucleotides). However, one or more shorter or longer oligonucleotides may be used in certain embodiments. Oligonucleotides may be provided as single stranded synthetic products.
However, in some embodiments, oligonucleotides may be provided as double-stranded preparations including an annealed complementary strand. Oligonucleotides may be molecules of DNA, RNA, PNA, or any combination thereof. A double-stranded oligonucleotide may be produced by amplifying a single-stranded synthetic oligonucleotide or other suitable template (e.g., a sequence in a nucleic acid preparation such as a nucleic acid vector or genomic nucleic acid). Accordingly, a plurality of oligonucleotides designed to have the sequence features described herein may be provided as a plurality of single-stranded oligonucleotides having those features, or also may be provided along with complementary oligonucleotides. In some embodiments, an oligonucleotide may be phosphorylated (e.g., with a 5' phosphate). In some embodiments, an oligonucleotide may be non-phosphorylated. In some embodiments, an oligonucleotide may be amplified using an appropriate primer pair with one primer corresponding to each end of the oligonucleotide (e.g., one that is complementary to the 3' end of the oligonucleotide and one that is identical to the 5' end of the oligonucleotide). In some embodiments, an oligonucleotide may be designed to contain a central assembly sequence (designed to be incorporated into the target nucleic acid) flanked by a 5' amplification sequence (e.g., a 5' universal sequence) and a 3' amplification sequence (e.g., a 3' universal sequence). Amplification primers (e.g., between 10 and 50 nucleotides long, between 15 and 45 nucleotides long, about 25 nucleotides long, etc.) corresponding to the flanking amplification sequences may be used to amplify the oligonucleotide (e.g., one primer may be complementary to the 3' amplification sequence and one primer may have the same sequence as the 5' amplification sequence).
The amplification sequences then may be removed from the amplified oligonucleotide using any suitable technique to produce an oligonucleotide that contains only the assembly sequence. In some embodiments, a plurality of different oligonucleotides (e.g., about 5, 10,
50, 100, or more) with different central assembly sequences may have identical 5' amplification sequences and identical 3' amplification sequences. These oligonucleotides can all be amplified in the same reaction using the same amplification primers. A preparation of an oligonucleotide designed to have a certain sequence may include oligonucleotide molecules having the designed sequence in addition to oligonucleotide molecules that contain errors (e.g., that differ from the designed sequence at least at one position). A sequence error may include one or more nucleotide deletions, additions, substitutions (e.g., transversion or transition), inversions, duplications, or any combination of two or more thereof. Oligonucleotide errors may be generated during oligonucleotide synthesis. Different synthetic techniques may be prone to different error profiles and frequencies. In some embodiments, error rates may vary from 1/10 to 1/200 errors per base depending on the synthesis protocol that is used. However, in some embodiments lower error rates may be achieved. Also, the types of errors may depend on the synthetic techniques that are used. For example, in some embodiments chip-based oligonucleotide synthesis may result in relatively more deletions than column-based synthetic techniques. In some embodiments, one or more oligonucleotide preparations may be processed to remove (or reduce the frequency of) error-containing oligonucleotides. In some embodiments, a hybridization technique may be used wherein an oligonucleotide preparation is hybridized under stringent conditions one or more times to an immobilized oligonucleotide preparation designed to have a complementary sequence.
Oligonucleotides that do not bind may be removed in order to selectively or specifically remove oligonucleotides that contain errors that would destabilize hybridization under the conditions used. It should be appreciated that this processing may not remove all error-containing oligonucleotides since many have only one or two sequence errors and may still bind to the immobilized oligonucleotides with sufficient affinity for a fraction of them to remain bound through this selection processing procedure.
In some embodiments, a nucleic acid binding protein or recombinase (e.g., RecA) may be included in one or more of the oligonucleotide processing steps to improve the selection of error-free oligonucleotides. For example, by preferentially promoting the hybridization of oligonucleotides that are completely complementary with the immobilized oligonucleotides, the amount of error-containing oligonucleotides that are bound may be reduced. As a result, this oligonucleotide processing procedure may remove more error-containing oligonucleotides and generate an oligonucleotide preparation that has a lower error frequency (e.g., with an error rate of less than 1/50, less than 1/100, less than 1/200, less than 1/300, less than 1/400, less than 1/500, less than 1/1,000, or less than 1/2,000 errors per base).
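To give a sense of what the per-base error rates mentioned above imply for a whole oligonucleotide, the following Python sketch estimates the fraction of molecules expected to be error-free. It assumes, purely for illustration, that errors occur independently at each position with the same probability, which real synthesis chemistries need not satisfy. The function name `error_free_fraction` is hypothetical.

```python
def error_free_fraction(error_rate_per_base: float, length: int) -> float:
    """Expected fraction of oligonucleotides with no sequence error.

    Assumes independent, identically distributed errors at each base,
    so the result is (1 - error rate) raised to the oligonucleotide length.
    """
    if not 0.0 <= error_rate_per_base <= 1.0 or length < 0:
        raise ValueError("error rate must be within [0, 1] and length >= 0")
    return (1.0 - error_rate_per_base) ** length


if __name__ == "__main__":
    for denominator in (10, 50, 200, 2000):
        fraction = error_free_fraction(1 / denominator, 50)
        print(f"1/{denominator} errors per base, 50-mer: {fraction:.4f} error-free")
```

Under this simplified model, lowering the error rate has a compounding effect on longer oligonucleotides, which is one way to see why error-removal processing before assembly can be worthwhile.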
A plurality of oligonucleotides used in an assembly reaction may contain preparations of synthetic oligonucleotides, single-stranded oligonucleotides, double-stranded oligonucleotides, amplification products, oligonucleotides that are processed to remove (or reduce the frequency of) error-containing variants, etc., or any combination of two or more thereof.
In some aspects, a synthetic oligonucleotide may be amplified prior to use. Either strand of a double-stranded amplification product may be used as an assembly oligonucleotide and added to an assembly reaction as described herein. A synthetic oligonucleotide may be amplified using a pair of amplification primers (e.g., a first primer that hybridizes to the 3' region of the oligonucleotide and a second primer that hybridizes to the 3' region of the complement of the oligonucleotide). The oligonucleotide may be synthesized on a support such as a chip (e.g., using an ink-jet-based synthesis technology). In some embodiments, the oligonucleotide may be amplified while it is still attached to the support. In some embodiments, the oligonucleotide may be removed or cleaved from the support prior to amplification. The two strands of a double-stranded amplification product may be separated and isolated using any suitable technique. In some embodiments, the two strands may be differentially labeled (e.g., using one or more different molecular weight, affinity, fluorescent, electrostatic, magnetic, and/or other suitable tags). The different labels may be used to purify and/or isolate one or both strands. In some embodiments, biotin may be used as a purification tag. In some embodiments, the strand that is to be used for assembly may be directly purified (e.g., using an affinity or other suitable tag). In some embodiments, the complementary strand is removed (e.g., using an affinity or other suitable tag) and the remaining strand is used for assembly.
In some embodiments, a synthetic oligonucleotide may include a central assembly sequence flanked by 5' and 3' amplification sequences. The central assembly sequence is designed for incorporation into an assembled nucleic acid. The flanking sequences are designed for amplification and are not intended to be incorporated into the assembled nucleic acid. The flanking amplification sequences may be used as universal primer sequences to amplify a plurality of different assembly oligonucleotides that share the same amplification sequences but have different central assembly sequences. In some embodiments, the flanking sequences are removed after amplification to produce an oligonucleotide that contains only the assembly sequence.
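The layout of an assembly sequence flanked by universal amplification sequences can be sketched in Python as follows. This is an illustrative sketch only: the flanking sequences and assembly sequences shown are made up, the function names are hypothetical, and the removal of the flanks is modeled as simple string trimming rather than by any particular enzymatic technique.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_amplifiable_oligo(assembly_seq: str, amp5: str, amp3: str) -> str:
    """Flank a central assembly sequence with 5' and 3' amplification sequences."""
    return amp5 + assembly_seq + amp3


def universal_primers(amp5: str, amp3: str):
    """One primer has the same sequence as the 5' amplification sequence;
    the other is complementary to the 3' amplification sequence."""
    return amp5, reverse_complement(amp3)


def strip_flanks(oligo: str, amp5: str, amp3: str) -> str:
    """Recover the assembly sequence by trimming both flanking sequences."""
    if not (oligo.startswith(amp5) and oligo.endswith(amp3)):
        raise ValueError("oligonucleotide lacks the expected flanking sequences")
    return oligo[len(amp5):len(oligo) - len(amp3)]


if __name__ == "__main__":
    amp5, amp3 = "ACTGGTCA", "TTGCCAGA"        # hypothetical universal flanks
    assembly_seqs = ["GATTACAGGC", "CCGTTAAGCT"]  # hypothetical assembly sequences
    forward, reverse = universal_primers(amp5, amp3)
    print("primers:", forward, reverse)
    for seq in assembly_seqs:
        oligo = build_amplifiable_oligo(seq, amp5, amp3)
        print(oligo, "->", strip_flanks(oligo, amp5, amp3))
```

Because both example oligonucleotides share the same flanks, the same primer pair applies to each of them, mirroring the use of universal primer sequences for a plurality of different assembly oligonucleotides.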
In some embodiments, one of the two amplification primers may be biotinylated. The nucleic acid strand that incorporates this biotinylated primer during amplification can be affinity purified using streptavidin (e.g., bound to a bead, column, or other surface). In some embodiments, the amplification primers also may be designed to include certain sequence features that can be used to remove the primer regions after amplification in order to produce a single-stranded assembly oligonucleotide that includes the assembly sequence without the flanking amplification sequences. In some embodiments, the non-biotinylated strand may be used for assembly.
The assembly oligonucleotide may be purified by removing the biotinylated complementary strand. In some embodiments, the amplification sequences may be removed if the non-biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the biotinylated primer is removed. The biotinylated strand is then removed. The remaining non-biotinylated strand is then treated with uracil-DNA glycosylase (UDG) to remove the non-biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above. In some embodiments, the biotinylated strand may be used for assembly. The assembly oligonucleotide may be obtained directly by isolating the biotinylated strand.
In some embodiments, the amplification sequences may be removed if the biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the non-biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the non-biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the non-biotinylated primer is removed. The biotinylated strand is then isolated (and the non-biotinylated strand is removed). The isolated biotinylated strand is then treated with UDG to remove the biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
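By way of non-limiting illustration, the sequence constraint described above (an amplification sequence built from at most three of the four nucleotides, with the fourth nucleotide at the junction) can be checked with a short script. The following Python sketch is a simplified string-level model only, not a description of enzyme kinetics; the function names and the example sequences are hypothetical and do not appear in the specification. It assumes that the strand is written 5' to 3' and that the polymerase removes 3' nucleotides until it reaches the first position holding the single supplied nucleotide.

```python
def absent_nucleotides(amplification_seq):
    """Return the nucleotides (of A, C, G, T) that never occur in the sequence."""
    return set("ACGT") - set(amplification_seq.upper())


def chew_back(strand, supplied_nt):
    """Model 3' trimming of a strand written 5'->3'.

    Nucleotides are removed from the 3' end until the 3'-terminal
    nucleotide equals the single supplied nucleotide (hypothetical
    simplification of the editing reaction described in the text).
    """
    s = strand.upper()
    end = len(s)
    while end > 0 and s[end - 1] != supplied_nt:
        end -= 1
    return s[:end]


# Hypothetical example: the amplification sequence contains only A, G and T,
# and the assembly sequence ends with C at the junction.
assembly_seq = "GATTACAGGC"
amplification_seq = "ATGGATTAGGTA"

missing = absent_nucleotides(amplification_seq)
trimmed = chew_back(assembly_seq + amplification_seq, "C")
print(sorted(missing), trimmed)
```

If `absent_nucleotides` returns an empty set, the amplification sequence uses all four nucleotides and this model would stop trimming inside the amplification sequence rather than at the junction.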
It should be appreciated that the biotinylated primer may be designed to anneal to either the synthetic oligonucleotide or to its complement for the amplification and purification reactions described above. Similarly, the non-biotinylated primer may be designed to anneal to either strand provided it anneals to the strand that is complementary to the strand recognized by the biotinylated primer.
In certain embodiments, it may be helpful to include one or more modified oligonucleotides in an assembly reaction. An oligonucleotide may be modified by incorporating a modified base (e.g., a nucleotide analog) during synthesis, by modifying the oligonucleotide after synthesis, or any combination thereof. Examples of modifications include, but are not limited to, one or more of the following: universal bases such as nitroindoles, dP and dK, inosine, uracil; halogenated bases such as BrdU; fluorescent labeled bases; non-radioactive labels such as biotin (as a derivative of dT) and digoxigenin (DIG); 2,4-dinitrophenyl (DNP); radioactive nucleotides; post-coupling modifications such as dR-NH2 (deoxyribose-NH2); acridine (6-chloro-2-methoxyacridine); and spacer phosphoramidites, which are used during synthesis to add a spacer 'arm' into the sequence, such as C3, C8 (octanediol), C9, C12, HEG (hexaethylene glycol) and C18.
It should be appreciated that one or more nucleic acid binding proteins or recombinases are preferably not included in a post-assembly fidelity optimization technique (e.g., a screening technique using a MutS or MutS homolog), because the optimization procedure involves removing error-containing nucleic acids via the production and removal of heteroduplexes. Accordingly, any nucleic acid binding proteins or recombinases (e.g., RecA) that were included in the assembly steps are preferably removed (e.g., by inactivation, column purification or other suitable technique) after assembly and prior to fidelity optimization.
Aspects of the invention may be useful for a range of applications involving the production and/or use of synthetic nucleic acids. As described herein, the invention provides methods for assembling synthetic nucleic acids with increased efficiency. The resulting assembled nucleic acids may be amplified in vitro (e.g., using PCR, LCR, or any suitable amplification technique), amplified in vivo (e.g., via cloning into a suitable vector), isolated and/or purified. An assembled nucleic acid (alone or cloned into a vector) may be transformed into a host cell (e.g., a prokaryotic, eukaryotic, insect, mammalian, or other host cell). In some embodiments, the host cell may be used to propagate the nucleic acid. In certain embodiments, the nucleic acid may be integrated into the genome of the host cell. In some embodiments, the nucleic acid may replace a corresponding nucleic acid region on the genome of the cell (e.g., via homologous recombination). Accordingly, nucleic acids may be used to produce recombinant organisms. In some embodiments, a target nucleic acid may be an entire genome or large fragments of a genome that are used to replace all or part of the genome of a host organism. Recombinant organisms also may be used for a variety of research, industrial, agricultural, and/or medical applications. Many of the techniques described herein can be used together to produce long nucleic acid molecules. For example, concerted assembly may be used to assemble oligonucleotide duplexes and nucleic acid fragments of less than 100 to more than 10,000 base pairs in length (e.g., 100 mers to 500 mers, 500 mers to 1,000 mers, 1,000 mers to 5,000 mers, 5,000 mers to 10,000 mers, 25,000 mers, 50,000 mers, 75,000 mers, 100,000 mers, etc.).
In an exemplary embodiment, methods described herein may be used during the assembly of an entire genome (or a large fragment thereof, e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more) of an organism (e.g., of a viral, bacterial, yeast, or other prokaryotic or eukaryotic organism), optionally incorporating specific modifications into the sequence at one or more desired locations. Any of the nucleic acid products (e.g., including nucleic acids that are amplified, cloned, purified, isolated, etc.) may be packaged in any suitable format (e.g., in a stable buffer, lyophilized, etc.) for storage and/or shipping (e.g., for shipping to a distribution center or to a customer). Similarly, any of the host cells (e.g., cells transformed with a vector or having a modified genome) may be prepared in a suitable buffer for storage and/or transport (e.g., for distribution to a customer). In some embodiments, cells may be frozen. However, other stable cell preparations also may be used. Host cells may be grown and expanded in culture. Host cells may be used for expressing one or more RNAs or polypeptides of interest (e.g., therapeutic, industrial, agricultural, and/or medical proteins). The expressed polypeptides may be natural polypeptides or non-natural polypeptides. The polypeptides may be isolated or purified for subsequent use. Accordingly, nucleic acid molecules generated using methods of the invention can be incorporated into a vector. The vector may be a cloning vector or an expression vector. A vector may comprise an origin of replication and one or more selectable markers (e.g., antibiotic resistance markers, auxotrophic markers, etc.). In some embodiments, the vector may be a viral vector. A viral vector may comprise nucleic acid sequences capable of infecting target cells. Similarly, in some embodiments, a prokaryotic expression vector operably linked to an appropriate promoter system can be used to transform target cells.
In other embodiments, a eukaryotic vector operably linked to an appropriate promoter system can be used to transfect target cells or tissues.
Transcription and/or translation of the constructs described herein may be carried out in vitro (i.e., using cell-free systems) or in vivo (i.e., expressed in cells). In some embodiments, cell lysates may be prepared. In certain embodiments, expressed RNAs or polypeptides may be isolated or purified. Nucleic acids of the invention also may be used to add detection and/or purification tags to expressed polypeptides or fragments thereof. Examples of polypeptide-based fusions/tags include, but are not limited to, hexa-histidine (His6), Myc and HA, and other polypeptides with utility, such as GFP, GST, MBP, chitin and the like. In some embodiments, polypeptides may comprise one or more unnatural amino acid residue(s).
In some embodiments, antibodies can be made against polypeptides or fragment(s) thereof encoded by one or more synthetic nucleic acids. In certain embodiments, synthetic nucleic acids may be provided as libraries for screening in research and development (e.g., to identify potential therapeutic proteins or peptides, to identify potential protein targets for drug development, etc.). In some embodiments, a synthetic nucleic acid may be used as a therapeutic (e.g., for gene therapy, or for gene regulation). For example, a synthetic nucleic acid may be administered to a patient in an amount sufficient to express a therapeutic amount of a protein. In other embodiments, a synthetic nucleic acid may be administered to a patient in an amount sufficient to regulate (e.g., down-regulate) the expression of a gene.
It should be appreciated that different acts or embodiments described herein may be performed independently and may be performed at different locations in the United States or outside the United States. For example, each of the acts of receiving an order for a target nucleic acid, analyzing a target nucleic acid sequence, identifying an assembly strategy, designing one or more starting nucleic acids (e.g., oligonucleotides), synthesizing starting nucleic acid(s), purifying starting nucleic acid(s), assembling starting nucleic acid(s), isolating assembled nucleic acid(s), confirming the sequence of assembled nucleic acid(s), manipulating assembled nucleic acid(s) (e.g., amplifying, cloning, inserting into a host genome, etc.), and any other acts or any parts of these acts may be performed independently either at one location or at different sites within the United States or outside the United States. In some embodiments, an assembly procedure may involve a combination of acts that are performed at one site (in the United States or outside the United States) and acts that are performed at one or more remote sites (within the United States or outside the United States).
Aspects of the invention may include automating one or more acts described herein. For example, a sequence analysis may be automated in order to generate a synthesis strategy automatically. The synthesis strategy may include i) the design of the starting nucleic acids that are to be assembled into the target nucleic acid, ii) the choice of the assembly technique(s) to be used, iii) the number of rounds of assembly and error screening or sequencing steps to include, and/or decisions relating to subsequent processing of an assembled target nucleic acid. Similarly, one or more steps of an assembly reaction may be automated using one or more automated sample handling devices (e.g., one or more automated liquid or fluid handling devices). For example, the synthesis and optional selection of starting nucleic acids (e.g., oligonucleotides) may be automated using a nucleic acid synthesizer and automated procedures. Automated devices and procedures may be used to mix reaction reagents, including one or more of the following: starting nucleic acids, buffers, enzymes (e.g., one or more ligases and/or polymerases), nucleotides, nucleic acid binding proteins or recombinases, salts, and any other suitable agents such as stabilizing agents. Automated devices and procedures also may be used to control the reaction conditions. For example, an automated thermal cycler may be used to control reaction temperatures and any temperature cycles that may be used. In some embodiments, a thermal cycler may be automated to provide one or more reaction temperatures or temperature cycles suitable for incubating nucleic acid fragments prior to transformation. Similarly, subsequent purification and analysis of assembled nucleic acid products may be automated. For example, fidelity optimization steps (e.g., a MutS error screening procedure) may be automated using appropriate sample processing devices and associated protocols. 
Sequencing also may be automated using a sequencing device and automated sequencing protocols. Additional steps (e.g., amplification, cloning, etc.) also may be automated using one or more appropriate devices and related protocols. It should be appreciated that one or more of the devices or device components described herein may be combined in a system (e.g., a robotic system). Assembly reaction mixtures (e.g., liquid reaction samples) may be transferred from one component of the system to another using automated devices and procedures (e.g., robotic manipulation and/or transfer of samples and/or sample containers, including automated pipetting devices, etc.). The system and any components thereof may be controlled by a control system.
Accordingly, acts of the invention may be automated using, for example, a computer system (e.g., a computer controlled system). A computer system on which aspects of the invention can be implemented may include a computer for any type of processing (e.g., sequence analysis and/or automated device control as described herein). However, it should be appreciated that certain processing steps may be provided by one or more of the automated devices that are part of the assembly system. In some embodiments, a computer system may include two or more computers. For example, one computer may be coupled, via a network, to a second computer. One computer may perform sequence analysis. The second computer may control one or more of the automated synthesis and assembly devices in the system. In other aspects, additional computers may be included in the network to control one or more of the analysis or processing acts. Each computer may include a memory and processor. The computers can take any form, as the aspects of the present invention are not limited to being implemented on any particular computer platform. Similarly, the network can take any form, including a private network or a public network (e.g., the Internet). Display devices can be associated with one or more of the devices and computers. Alternatively, or in addition, a display device may be located at a remote site and connected for displaying the output of an analysis in accordance with the invention. Connections between the different components of the system may be via wire, wireless transmission, satellite transmission, any other suitable transmission, or any combination of two or more of the above.
In accordance with one embodiment of the present invention for use on a computer system, it is contemplated that sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) can be obtained and then sent over a public network, such as the Internet, to a remote location to be processed by computer to produce any of the various types of outputs discussed herein (e.g., in connection with oligonucleotide design). However, it should be appreciated that the aspects of the present invention described herein are not limited in that respect, and that numerous other configurations are possible. For example, all of the analysis and processing described herein can alternatively be implemented on a computer that is attached locally to a device, an assembly system, or one or more components of an assembly system. As a further alternative, as opposed to transmitting sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) over a communication medium (e.g., the network), the information can be loaded onto a computer readable medium that can then be physically transported to another computer for processing in the manners described herein. In another embodiment, a combination of two or more transmission/delivery techniques may be used. It also should be appreciated that computer implementable programs for performing a sequence analysis or controlling one or more of the devices, systems, or system components described herein also may be transmitted via a network or loaded onto a computer readable medium as described herein. Accordingly, aspects of the invention may involve performing one or more steps within the United States and additional steps outside the United States. In some embodiments, sequence information (e.g., a customer order) may be received at one location (e.g., in one country) and sent to a remote location for processing (e.g., in the same country or in a different country), for example, for sequence analysis to determine a synthesis strategy and/or design oligonucleotides.
In certain embodiments, a portion of the sequence analysis may be performed at one site (e.g., in one country) and another portion at another site (e.g., in the same country or in another country). In some embodiments, different steps in the sequence analysis may be performed at multiple sites (e.g., all in one country or in several different countries). The results of a sequence analysis then may be sent to a further site for synthesis. However, in some embodiments, different synthesis and quality control steps may be performed at more than one site (e.g., within one country or in two or more countries). An assembled nucleic acid then may be shipped to a further site (e.g., either to a central shipping center or directly to a client).
Each of the different aspects, embodiments, or acts of the present invention described herein can be independently automated and implemented in any of numerous ways. For example, each aspect, embodiment, or act can be independently implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
In this respect, it should be appreciated that one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs one or more of the above-discussed functions of the present invention. The computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer system resource to implement one or more functions of the present invention discussed herein. In addition, it should be appreciated that the reference to a computer program which, when executed, performs the above-discussed functions, is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention.
It should be appreciated that in accordance with several embodiments of the present invention wherein processes are implemented in a computer readable medium, the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user). Accordingly, overall system-level control of the assembly devices or components described herein may be performed by a system controller which may provide control signals to the associated nucleic acid synthesizers, liquid handling devices, thermal cyclers, sequencing devices, associated robotic components, as well as other suitable systems for performing the desired input/output or other control functions. Thus, the system controller along with any device controllers together form a controller that controls the operation of a nucleic acid assembly system. The controller may include a general purpose data processing system, which can be a general purpose computer, or network of general purpose computers, and other associated devices, including communications devices, modems, and/or other circuitry or components necessary to perform the desired input/output or other functions. The controller can also be implemented, at least in part, as a single special purpose integrated circuit (e.g., ASIC) or an array of ASICs, each having a main or central processor section for overall, system-level control, and separate sections dedicated to performing various different specific computations, functions and other processes under the control of the central processor section. The controller can also be implemented using a plurality of separate dedicated programmable integrated or other electronic circuits or devices, e.g., hard wired electronic or logic circuits such as discrete element circuits or programmable logic devices.
The controller can also include any other components or devices, such as user input/output devices (monitors, displays, printers, a keyboard, a user pointing device, touch screen, or other user interface, etc.), data storage devices, drive motors, linkages, valve controllers, robotic devices, vacuum and other pumps, pressure sensors, detectors, power supplies, pulse sources, communication devices or other electronic circuitry or components, and so on. The controller also may control operation of other portions of a system, such as automated client order processing, quality control, packaging, shipping, billing, etc., to perform other suitable functions known in the art but not described in detail herein.
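By way of non-limiting illustration, the relationship described above between a system controller and individual device controllers may be sketched in software. The following Python sketch is one possible arrangement under stated assumptions: the class name `SystemController`, the device names, and the command strings are hypothetical, and the device controllers are stubs that only acknowledge commands rather than drive real instruments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SystemController:
    """Hypothetical system-level controller that routes commands to device controllers."""
    devices: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        # Each device controller is modeled as a callable taking a command string.
        self.devices[name] = handler

    def send(self, name: str, command: str) -> str:
        if name not in self.devices:
            raise KeyError(f"no device controller registered for {name!r}")
        result = self.devices[name](command)
        self.log.append(f"{name}: {command} -> {result}")
        return result

    def run(self, steps: List[Tuple[str, str]]) -> List[str]:
        # Execute (device, command) pairs in order and collect the replies.
        return [self.send(name, command) for name, command in steps]


def make_stub_device(name: str) -> Callable[[str], str]:
    """Return a stand-in device controller (assumption: real drivers differ)."""
    def handler(command: str) -> str:
        return f"{name} acknowledged {command}"
    return handler


controller = SystemController()
for device in ("synthesizer", "liquid_handler", "thermal_cycler", "sequencer"):
    controller.register(device, make_stub_device(device))

workflow = [
    ("synthesizer", "synthesize oligonucleotides"),
    ("liquid_handler", "mix assembly reaction"),
    ("thermal_cycler", "run assembly program"),
    ("sequencer", "confirm sequence"),
]
results = controller.run(workflow)
print("\n".join(controller.log))
```

Keeping each device behind a uniform callable is only one design choice; it lets the same workflow list be replayed against stubs, local devices, or devices reached over a network.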
Business applications
Aspects of the invention may be useful to generate nucleic acid libraries that represent very large numbers of nucleic acid sequence variants (e.g., RNA candidates for an aptamer screen) using nucleic acid assembly reactions. Accordingly, aspects of the invention relate to marketing methods, compositions, kits, devices, and systems for generating nucleic acid libraries that represent very large numbers of nucleic acid sequence variants, methods and compositions for in vivo aptamer screening and selection, methods and compositions for identifying, monitoring, and generating metabolic pathways, and methods for designing and assembling libraries as described herein.
Aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of synthetic nucleic acids, and/or related compositions. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling libraries and identifying aptamers in vivo as described herein. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving techniques and assays described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed.
Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein. Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes. Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
EXAMPLES
Example 1. Nucleic acid fragment assembly
Gene assembly via a 2-step PCR method: In step (1), a primerless assembly of oligonucleotides is performed and in step (2) an assembled nucleic acid fragment is amplified in a primer-based amplification.
A 993 base long promoter>EGFP construct was assembled from 50-mer abutting oligonucleotides using a 2-step PCR assembly.
Mixed oligonucleotide pools were prepared as follows: 36 overlapping 50-mer oligonucleotides and two 5' terminal 59-mers were separated into 4 pools, each corresponding to overlapping 200-300 nucleotide segments of the final construct. The total oligonucleotide concentration in each pool was 5 μM. A primerless PCR extension reaction was used to stitch (assemble) overlapping oligonucleotides in each pool. The PCR extension reaction mixture was as follows:
oligonucleotide pool (5 μM total)    1.0 μl (~25 nM final each)
dNTP (10 mM each)                    0.5 μl (250 μM final each)
Pfu buffer (10x)                     2.0 μl
Pfu polymerase (2.5 U/μl)            0.5 μl
dH2O                                 to 20 μl
Assembly was achieved by cycling this mixture through several rounds of denaturing, annealing, and extension reactions as follows:
start                    2 min. 95°C
30 cycles of             95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step     72°C 2 min.
The resulting product was exposed to amplification conditions to amplify the desired nucleic acid fragments (sub-segments of 200-300 nucleotides). The following PCR mix was used:
primerless PCR product       1.0 μl
primer 5' (1.2 μM)           5 μl (300 nM final)
primer 3' (1.2 μM)           5 μl (300 nM final)
dNTP (10 mM each)            0.5 μl (250 μM final each)
Pfu buffer (10x)             2.0 μl
Pfu polymerase (2.5 U/μl)    0.5 μl
The following PCR cycle conditions were used:
start                    2 min. 95°C
35 cycles of             95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step     72°C 2 min.
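By way of non-limiting illustration, the final concentrations stated for the reaction mixes in this example follow from the usual dilution relationship (stock concentration multiplied by volume added, divided by total reaction volume). The following Python sketch checks that arithmetic. The function name is hypothetical; the 20 μl total volume for the amplification mix and the figure of about ten oligonucleotides per pool are assumptions inferred from the stated final concentrations and from 38 oligonucleotides divided among 4 pools, not values stated in the text.

```python
def final_concentration(stock, volume_added_ul, total_volume_ul=20.0):
    """Return the final concentration, in the same unit as `stock`."""
    return stock * volume_added_ul / total_volume_ul


# dNTP: 10 mM stock, 0.5 ul added -> mM, converted to uM
dntp_um = final_concentration(10.0, 0.5) * 1000.0

# Primer: 1.2 uM stock, 5 ul added -> uM, converted to nM
primer_nm = final_concentration(1.2, 5.0) * 1000.0

# Oligonucleotide pool: 5 uM total stock, 1 ul added -> uM, converted to nM
pool_total_nm = final_concentration(5.0, 1.0) * 1000.0
oligos_per_pool = 10  # assumption: roughly 38 oligonucleotides over 4 pools
per_oligo_nm = pool_total_nm / oligos_per_pool

# Buffer: 10x stock, 2 ul added -> fold concentration
buffer_x = final_concentration(10.0, 2.0)

print(f"dNTP {dntp_um:.0f} uM each, primer {primer_nm:.0f} nM, "
      f"oligo about {per_oligo_nm:.0f} nM each, buffer {buffer_x:.0f}x")
```

Under these assumptions the computed values agree with the final concentrations listed in the mixes (250 μM dNTP, 300 nM primer, about 25 nM per oligonucleotide).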
The amplified sub-segments were assembled using another round of primerless PCR as follows. A diluted amplification product was prepared for each sub-segment by diluting each amplified sub-segment PCR product 1:10 (4 μl mix + 36 μl dH2O). This diluted mix was used as follows:
diluted sub-segment mix      1.0 μl
dNTP (10 mM each)            0.5 μl (250 μM final each)
Pfu buffer (10x)             2.0 μl
Pfu polymerase (2.5 U/μl)    0.5 μl
dH2O                         to 20 μl
The following PCR cycle conditions were used:
start                    2 min. 95°C
30 cycles of             95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step     72°C 2 min.
The full-length 993 nucleotide long promoter>EGFP was amplified in the following PCR mix:
assembled sub-segments       1.0 μl
primer 5' (1.2 μM)           5 μl (300 nM final)
primer 3' (1.2 μM)           5 μl (300 nM final)
dNTP (10 mM each)            0.5 μl (250 μM final each)
Pfu buffer (10x)             2.0 μl
Pfu polymerase (2.5 U/μl)    0.5 μl
dH2O                         to 20 μl
The following PCR cycle conditions were used:
start                    2 min. 95°C
35 cycles of             95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step     72°C 2 min.
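By way of non-limiting illustration, the cycling conditions used in this example may be represented as data, for instance when programming an automated thermal cycler. The following Python sketch encodes the 30-cycle primerless assembly program and the 35-cycle amplification program and sums their programmed hold times. The class and field names are hypothetical, and the totals exclude temperature ramp times, which depend on the instrument.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]
    cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total programmed hold time, ignoring ramping between temperatures."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# Denature 95 C 30 sec, anneal 65 C 30 sec, extend 72 C 1 min (as in the example)
CYCLE = (Step(95, 30), Step(65, 30), Step(72, 60))

assembly = Program(Step(95, 120), CYCLE, 30, Step(72, 120))
amplification = Program(Step(95, 120), CYCLE, 35, Step(72, 120))

for name, program in (("assembly", assembly), ("amplification", amplification)):
    print(f"{name}: {program.hold_seconds() / 60:.0f} min of programmed holds")
```

With these values the assembly program holds for 64 minutes and the amplification program for 74 minutes, so the four reactions of the 2-step method (two assemblies and two amplifications) total 276 minutes of programmed hold time.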
EQUIVALENTS
The present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
INCORPORATION BY REFERENCE
All publications, patents and sequence database entries mentioned herein, including those items listed below, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In addition, the disclosures of co-pending provisional applications serial number 60/801,842, filed May 19, 2006, and serial numbers 60/801,834 and 60/801,833, filed May 19, 2006, and the utility and PCT applications claiming priority thereto, are hereby incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.
We claim:

Claims

1. A nucleic acid library comprising: a plurality of unique nucleic acids, wherein each unique nucleic acid encodes a plurality of unique RNA variants.
2. The library of claim 1, comprising at least 10⁴ unique nucleic acids, wherein each unique nucleic acid encodes at least 10 unique RNA variants.
3. The library of claim 2, comprising at least 10⁵ unique nucleic acids, wherein each unique nucleic acid encodes about 100 unique RNA variants.
4. The library of claim 1, wherein each RNA variant is about 100 nucleotides long.
5. The library of claim 1, wherein each RNA variant comprises an RNA aptamer or an RNA aptamer candidate sequence.
6. The library of claim 1, wherein each RNA variant is operably associated with a reporter RNA.
7. The library of claim 6, wherein the reporter RNA operably associated with each RNA variant has the same reporter sequence.
8. The library of claim 1, wherein the unique variants are cloned into a vector.
9. The library of claim 8, wherein the vector is a plasmid.
10. A host cell composition comprising a nucleic acid library, wherein the nucleic acid library comprises a plurality of unique nucleic acids, and wherein each unique nucleic acid encodes a plurality of unique RNA variants.
11. The host cell composition of claim 10, wherein the host cell is a eukaryotic cell.
12. The host cell composition of claim 11, wherein the host cell is a mammalian cell.
13. The host cell composition of claim 10, wherein the cell is a bacterial cell.
14. The host cell composition of claim 10, wherein each of the plurality of RNA variants is transcribed from its own promoter.
15. The host cell composition of claim 10, wherein the plurality of RNA variants is transcribed from the same promoter.
16. The host cell composition of claim 10, wherein each of the plurality of RNA variants is fused to a first reporter RNA.
17. The host cell composition of claim 16, wherein the first reporter RNA is an antisense switch that inhibits expression of a second reporter RNA in the absence of ligand recognition by the RNA aptamer that is fused to the first reporter RNA.
18. The host cell composition of claim 16, wherein the first reporter RNA is an antisense switch that inhibits expression of a second reporter RNA in the presence of ligand recognition by the RNA aptamer that is fused to the first reporter RNA.
19. The host cell composition of claim 17 or 18, wherein the second reporter RNA generates a fluorescent signal when transcribed in the presence of malachite green.
20. The host cell composition of any one of claims 10-19, wherein the composition is a cell culture or a frozen cell preparation.
21. The host cell composition of any one of claims 10-20, wherein the RNA variants comprise RNA aptamers or RNA aptamer candidate sequences.
22. A host cell composition comprising a library of any one of claims 1-9.
23. A cell comprising a plurality of RNA aptamers, wherein each RNA aptamer is fused to a different reporter RNA and wherein each RNA aptamer binds to a different ligand.
24. A method of selecting for a cell that contains an RNA aptamer that binds to a ligand, the method comprising exposing a host cell preparation of any one of claims 10-22 to a ligand, and identifying a cell as containing an RNA aptamer that binds to the ligand if the cell exhibits a different property relative to other cells in the host cell culture in the presence of the ligand.
25. The method of claim 24, wherein the property is a fluorescence intensity, an enzyme activity, a growth property, or other functional property.
26. The method of claim 24 or 25, further comprising identifying the RNA variant in the cell that is an aptamer.
PCT/US2007/012075 2006-05-19 2007-05-19 Methods and compositions for aptamer production and uses thereof Ceased WO2007136833A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US80183406P 2006-05-19 2006-05-19
US60/801,834 2006-05-19

Publications (2)

Publication Number Publication Date
WO2007136833A2 (en) 2007-11-29
WO2007136833A3 (en) 2008-01-24

Family

ID=38514204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/012075 Ceased WO2007136833A2 (en) 2006-05-19 2007-05-19 Methods and compositions for aptamer production and uses thereof

Country Status (1)

Country Link
WO (1) WO2007136833A2 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5958672A (en) * 1995-07-18 1999-09-28 Diversa Corporation Protein activity screening of clones having DNA from uncultivated microorganisms
US6790605B1 (en) * 1995-12-07 2004-09-14 Diversa Corporation Methods for obtaining a desired bioactivity or biomolecule using DNA libraries from an environmental source
US6242211B1 (en) * 1996-04-24 2001-06-05 Terragen Discovery, Inc. Methods for generating and screening novel metabolic pathways
JP5101288B2 (en) * 2004-10-05 2012-12-19 カリフォルニア インスティテュート オブ テクノロジー Aptamer-regulated nucleic acids and uses thereof
AU2005295351A1 (en) * 2004-10-18 2006-04-27 Codon Devices, Inc. Methods for assembly of high fidelity synthetic polynucleotides

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10774325B2 (en) 2002-09-12 2020-09-15 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10640764B2 (en) 2002-09-12 2020-05-05 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US9051666B2 (en) 2002-09-12 2015-06-09 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10450560B2 (en) 2002-09-12 2019-10-22 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US9587273B2 (en) 2006-10-10 2017-03-07 Illumina, Inc. Compositions and methods for representational selection of nucleic acids from complex mixtures using hybridization
US9139826B2 (en) 2006-10-10 2015-09-22 Illumina, Inc. Compositions and methods for representational selection of nucleic acids from complex mixtures using hybridization
US8568979B2 (en) 2006-10-10 2013-10-29 Illumina, Inc. Compositions and methods for representational selection of nucleic acids from complex mixtures using hybridization
US8916350B2 (en) 2006-10-10 2014-12-23 Illumina, Inc. Compositions and methods for representational selection of nucleic acids from complex mixtures using hybridization
US10538759B2 (en) 2006-10-10 2020-01-21 Illumina, Inc. Compounds and method for representational selection of nucleic acids from complex mixtures using hybridization
US9340781B2 (en) 2006-10-10 2016-05-17 Illumina, Inc. Compositions and methods for representational selection of nucleic acids from complex mixtures using hybridization
WO2010005055A1 (en) * 2008-07-09 2010-01-14 公立大学法人大阪市立大学 Oligonucleotide structure, and method for regulation of gene expression
US10207240B2 (en) 2009-11-03 2019-02-19 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US20190143291A1 (en) * 2009-11-03 2019-05-16 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US9968902B2 (en) 2009-11-25 2018-05-15 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US9925510B2 (en) 2010-01-07 2018-03-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US11071963B2 (en) 2010-01-07 2021-07-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US11845054B2 (en) 2010-11-12 2023-12-19 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10982208B2 (en) 2010-11-12 2021-04-20 Gen9, Inc. Protein arrays and methods of using and making the same
US11702662B2 (en) 2011-08-26 2023-07-18 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10927369B2 (en) 2012-04-24 2021-02-23 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
US12241057B2 (en) 2012-06-25 2025-03-04 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing

Also Published As

Publication number Publication date
WO2007136833A3 (en) 2008-01-24

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07795109

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07795109

Country of ref document: EP

Kind code of ref document: A2