WO2025128162A1 - Degenerate dropsynth gene synthesis - Google Patents
Degenerate dropsynth gene synthesis
- Publication number
- WO2025128162A1 (PCT/US2024/037902)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acids
- target nucleic
- oligonucleotides
- nucleotides
- assembled
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
- C12P19/28—N-glycosides
- C12P19/30—Nucleotides
- C12P19/34—Polynucleotides, e.g. nucleic acids, oligoribonucleotides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07007—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
Definitions
- This disclosure relates to methods of synthesizing and assembling a plurality of target nucleic acids, particularly libraries of target nucleic acids with specifically designed variants.
- Sequence Listing is submitted as an XML file in the form of the file named “1505- 111303-02_Sequence_Listing” (56,594 bytes), which was created on July 12, 2024, which is incorporated by reference herein.
- Protein engineering seeks to design proteins with novel properties and functions by determining the sequences corresponding to a particular targeted function or property. Its potential applications are immense and span from drug development and diagnostics to biofuel production and environmental remediation.
- One important aspect of protein engineering is the creation of chimeric or hybrid proteins. These proteins, especially prevalent in cellular signaling, are constructed by combining different modules to create new or enhanced functionalities. This is particularly useful in the development of biomass saccharification, cellular signaling, and complex biosynthesis pathways, to name a few. Central to the success of these chimeric proteins are the design and engineering of fusion points and linkers.
- The choice of fusion point can significantly impact the function of the resulting chimeric protein, as it can influence the spatial arrangement of the protein domains and their ability to interact with each other and with other molecules.
- Linkers are connectors that provide flexibility and space between the domains, enabling proper folding and independent functioning. Their characteristics (length, composition, and conformation) significantly affect the activity, stability, and solubility of the chimeric protein.
- the provided methods utilize degenerate oligonucleotides and include methods for multiplex gene synthesis of a plurality of variant target nucleic acids in a single reaction.
- each oligonucleotide comprising: a subsequence of one or more target nucleic acids, and a predesigned unique subsequence, wherein at least one or more of the oligonucleotides are degenerate oligonucleotides comprising a common overlap with one or more of the set of oligonucleotides;
- the disclosed methods further include one or more of: cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and sequencing each of the plurality of assembled target nucleic acids.
- the plurality of degenerate oligonucleotides comprise 0, 2, 4, 6, or 8 degenerate oligonucleotides.
- the one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof, one or more variants of the target nucleic acids, and/or one or more chimeric nucleic acids.
- the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker.
- the one or more chimeric nucleic acids encode a functional chimeric protein.
- the substrate includes a plurality of beads, a microarray, a silicon chip, or a microfluidic chip.
- the beads are fractionated into one or more compartments.
- a cover is placed over the substrate in such a way that the one or more nucleic acid barcodes and hybridized oligonucleotides are isolated into separate compartments, optionally following a washing step to remove or substantially remove unhybridized oligonucleotides.
- the compartment is an emulsion droplet, a well, a chamber, a subcompartment of the substrate, a vesicle (such as a lipid or block copolymer vesicle), a liposome, or a polymersome.
- the well or chamber is a microfabricated well or chamber on a silicon chip, a compartment within a microfluidic chip, or a compartment within a PDMS or SU8 (or other patterned functional polymer) substrate bonded to glass.
- the oligonucleotides of the set of oligonucleotides are from 50 nucleotides to 1000 nucleotides long, such as 300 nucleotides long.
- the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as 1000 nucleotides long.
- the assembled target nucleic acids comprise two or more variable or degenerate regions.
- a combinatorial library encompassing all possible combinations through the assembly process can be created.
- FIGS. 3A-3B show protein coverage: the number of designed protein variants for which at least one perfect amino acid sequence is observed. These are shown both individually for each codon library and combined together.
- FIG. 3A The percentage observed relative to the number of variants designed.
- FIG. 3B The absolute numbers of variants observed, with the total designed variants shown in light gray. The percent coverage decreases less than the increase in degeneracy scale, resulting in overall increased numbers of variants assembled.
- FIG. 6 shows the absolute number of variants designed at each degeneracy level for each of the four libraries tested.
- FIG. 8 shows that when plotting the median percentage of perfects, there is a correlation with the total number of barcodes observed and the degeneracy levels which is much stronger with the 5 oligo libraries (top-row) compared to the 4 oligo libraries (bottom-row).
- FIG. 12 shows the fraction of barcodes observed, normalized by the total fraction of designs.
- the sum of all observed unique gene barcodes at a particular degeneracy level were determined, and it was divided by the total number of observed unique gene barcodes in the library, to calculate the fraction of observed barcodes.
- the fraction of designs at each degeneracy level was determined by dividing the total number of variants at that degeneracy level by the total number of variants in the entire library. There is a strong decay in the observed barcodes as the degeneracy level is increased.
- FIG. 14 is a model of PCR amplification applied to all four libraries.
- the y-axis values are log transformed barcodes observed per variant while the x-axis is the expected variant concentration given by the total number of barcoded beads with a given degeneracy level divided by the total number of variants at that level.
- FIG. 17 shows the folding energy of 4000 random 20 bp sequences determined using both seqfold and UNAFold, showing a high correlation of 0.84.
- FIG. 18 is a map of plasmid pHKGGl. The libraries are cloned into the N-terminal region of EnvZ. This plasmid is a derivative of plasmid pSR348.
- FIG. 19 shows an exemplary layout of barcoded beads.
- SEQ ID NO: 54 is the sequence of CP anchor 52bio skpp511F+R and captured oligo.
- SEQ ID NO: 55 is the sequence of CP liga- 5phos-skpp51 lFrc-3bio and CP bcoligo-cpl2mer-l.
- nucleic acid and amino acid sequences provided herein and listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and single letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand.
- SEQ ID NOs: 1-8 are the sequences of subpool amplification primers.
- SEQ ID NOs: 9-32 are the sequences of primers used to generate pHKGGl.
- SEQ ID NOs: 33-35 are the sequences of fragments used in Golden Gate to make libraries in plasmid pHKGGl.
- SEQ ID NO: 36 is the sequence of an end cloning site.
- SEQ ID NO: 37 is the sequence of a 24 basepair quasi-randomer barcode region.
- SEQ ID NO: 38 is the sequence of the constant regions flanking the barcode.
- SEQ ID NO: 39 is the sequence of a conserved region in the vector immediately flanking the cloning site.
- SEQ ID NOs: 40 and 41 are standard assembly primers 504F&R.
- SEQ ID NOs: 42 and 43 are modified versions of SEQ ID NOs: 40 and 41, respectively, with ITRs.
- SEQ ID NO: 44 is the sequence of a primer for single-primer suppression PCR.
- SEQ ID NOs: 45-53 are amino acid sequences of exemplary addition or removal of amino acid residues from N-terminal or C-terminal fragments on either side of a fusion point.
- SEQ ID NO: 54 is the sequence of exemplary CP anchor52bioskpp511F+R and captured oligo.
- SEQ ID NO: 55 is the sequence of exemplary CP lig-5phos-skpp511Frc-3bio and CP gcoligo-cp 12mer- 1.
- SEQ ID NO: 56 is the amino acid sequence of residues 232-237 of envZ.
- Design-Build-Test-Learn strategy could consist of: (1) designing large amounts of diverse relevant hybrids through metagenomic mining or rational computational approaches, (2) assembling large libraries of specifically designed variants spanning many diverse genes, (3) functionally characterizing the library using a multiplexed functional assay (4) feeding the resulting data into computational or machine learning (ML) models which can discern the underlying patterns, (5) repeating this process using computational or ML generated variants and feeding the results back into the model until some target threshold for accuracy is achieved.
- chimeric fusions were built in a region just below the HAMP signaling domain, a homodimeric four alpha-helix parallel coiled-coil region.
- the disclosed methods were used to create many phase variants for each sensor domain through the controlled addition or subtraction of amino acid residues on either the C or N terminal fragments of each chimera as shown in FIG. 1D.
- Any variant can be made as it is programmatically encoded on a corresponding assembly oligo, with the only requirement being that the variant sequence length can fit onto the corresponding oligo.
- a degenerate oligonucleotide refers to a mixture of sequences that differ in one or more position (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more positions).
- a degenerate oligonucleotide is a mixture of sequences that differ in one or more positions but encode the same amino acid sequence.
- a degenerate oligonucleotide is a mixture of sequences that differ in one or more positions and encode different amino acid sequences or result in frame-shifted amino acid sequences.
- a degenerate oligonucleotide can contain one or more conserved regions used to assemble multiple oligonucleotide fragments together (for example, used in Polymerase Cycling Assembly). These conserved regions may contain sequence overlaps to other oligonucleotides present in the assembly or other sequence features (e.g., restriction enzyme sites) that enable assembly of multiple fragments together.
- each of the oligonucleotides of the set of oligonucleotides are 300 nucleotides long.
- hybridizing the set of oligonucleotides with a substrate, wherein the substrate includes one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides includes incubating the oligonucleotides and the substrate with a ligase (such as Taq ligase) under conditions suitable for hybridization.
- the hybridization conditions are 3 hours at 50°C followed by a 0.1°C/min ramp to 40°C and a 3 hour incubation at 40°C, followed by a 0.1°C/min ramp to 30°C and a 3 hour incubation at 30°C.
- the methods further include amplifying the set of oligonucleotides and processing the oligonucleotides to expose the barcode as single-stranded DNA prior to the hybridization. In additional examples, the methods further include washing the substrate following hybridization to remove unbound oligonucleotides.
- the substrate is a plurality of beads, a microarray, a silicon chip, or a microfluidic chip.
- the substrate is a plurality of beads, such as a plurality of microbeads including or linked to one or more nucleic acid barcodes complementary to the predesigned unique subsequence(s) of the set of oligonucleotides.
- the plurality of beads are a plurality of magnetic beads and/or a plurality of streptavidin labeled beads.
- the beads are Dynabeads™ M-270 Streptavidin (Invitrogen); however, one of ordinary skill in the art can identify additional suitable beads for use in the disclosed methods.
- isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments includes isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of emulsion droplets, wells, chambers, subcompartments of the substrate, vesicles, liposomes, or polymersomes.
- the one or more nucleic acid barcodes and hybridized oligonucleotides are isolated into a plurality of emulsion droplets (such as at least 10,000 droplets, at least 100,000 droplets, at least 500,000 droplets, at least 1,000,000 droplets, at least 10,000,000 droplets, at least 50,000,000 droplets, at least 100,000,000 droplets, at least 150,000,000 droplets, at least 250,000,000 droplets, at least 500,000,000 droplets, at least 1,000,000,000 droplets, at least 100,000,000,000 droplets, at least 500,000,000,000 droplets, or at least 1,000,000,000,000 droplets).
- the methods include assembling the plurality of target nucleic acids utilizing a polymerase.
- the assembly is carried out using polymerase cycling assembly (PCA, also known as assembly PCR).
- PCA is a method for assembling larger oligonucleotides (or even entire genes) from shorter, overlapping oligonucleotides.
- the oligonucleotides anneal to complementary fragments via the overlaps and are filled in by a polymerase, building fragments of increasing length.
- the final product is obtained by a regular PCR reaction using primers that anneal to the 5’ and 3’ end of the full-length (e.g., target) sequence, which also eliminates shorter, incomplete fragments.
- the assembly is carried out using emulsion PCA (ePCA).
- the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as about 100 nucleotides to about 500 nucleotides, about 250 nucleotides to about 750 nucleotides, about 500 nucleotides to about 1000 nucleotides (for example about 100 nucleotides, about 150 nucleotides, about 200 nucleotides, about 250 nucleotides, about 300 nucleotides, about 350 nucleotides, about 400 nucleotides, about 450 nucleotides, about 500 nucleotides, about 550 nucleotides, about 600 nucleotides, about 650 nucleotides, about 700 nucleotides, about 750 nucleotides, about 800 nucleotides, about 850 nucleotides, about 900 nucleotides, about 950 nucleotides, or about 1000 nucleotides long)
- the assembled target nucleic acids vary in length (are not all the same length), and in some examples, vary by up to about 25 nucleotides in their length, vary by about 50 nucleotides in their length, vary by about 75 nucleotides in their length, vary by about 100 nucleotides in their length, or more.
- the methods further include one or more steps including cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and/or sequencing each of the plurality of assembled target nucleic acids.
- cleaving the hybridized oligonucleotides from the substrate is carried out by contacting the oligonucleotides hybridized to the substrate with a restriction enzyme that cleaves a restriction site included in the oligonucleotides.
- Any suitable restriction enzyme can be used; however, in one specific example, the restriction enzyme is BtsI.
- the restriction enzyme can be BsrDI, Bst6I, EarI, Bso31I, BsmAI, Bse3DI, BmrI, BmuI, BsaI, BsmBI, BbsI, PaqCI, or BspQI.
- a photo-cleavable linker can be integrated into the oligonucleotide and used to cleave the oligonucleotide from the substrate upon exposure to light.
- cloning each of the plurality of assembled target nucleic acids into a vector is carried out by any method known to one of ordinary skill in the art.
- the cloning is by Golden Gate assembly.
- cloning each of the plurality of assembled target nucleic acids into a vector is carried out using Gibson assembly, PCA (Polymerase Cycling Assembly), SLIC (Sequence and Ligation Independent Cloning), LIC (Ligation-independent cloning), LCR (Ligase Cycling Reaction), USER, InFusion, SliCE (Seamless Ligation Cloning Extract), Gateway, or CPEC (Circular polymerase extension cloning).
- the vector includes one or more regulatory elements, such as a promoter (such as a constitutive or inducible promoter), a selection marker (such as an antibiotic resistance gene), and/or an origin of replication.
- the vector may also include one or more cloning sites (such as one or more restriction sites or one or more homology sites) for introducing a sequence of interest, such as assembled target nucleic acids.
- the vector includes a lac repressor (such as LacI) and a P trc inducible promoter.
- the one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof.
- the plurality of target nucleic acids include one or more variants of the target nucleic acid.
- the one or more variants of the target nucleic acid encode one or more amino acid substitutions, insertions, and/or deletions.
- the plurality of target nucleic acids includes one or more chimeric nucleic acids.
- the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker.
- the one or more chimeric nucleic acids encode a functional chimeric protein.
- the assembled target nucleic acids include one or more variable or degenerate regions, or two or more variable or degenerate regions (such as 2, 3, 4, 5, 6, 7, 8, 9, 10, or more).
- An example of assembled target nucleic acids with one variable or degenerate region is shown schematically in FIG. 4.
- the variable or degenerate region is at the N-terminal of the assembled nucleic acids, internal to the assembled nucleic acids, or at the C-terminal of the assembled nucleic acids.
- An example of assembled target nucleic acids with two degenerate or variable regions is shown schematically in FIG. 5.
- one of the degenerate or variable regions is at the N-terminal of the assembled nucleic acids and one of the degenerate or variable regions is internal to the assembled nucleic acids. In other examples, one of the degenerate or variable regions is at the C-terminal of the assembled nucleic acids and one of the degenerate or variable regions is internal to the assembled nucleic acids. In further examples, both of the degenerate or variable regions are internal to the assembled nucleic acids. In still further examples, one of the degenerate or variable regions is at the N-terminal of the assembled nucleic acids and one of the degenerate or variable regions is at the C-terminal of the assembled nucleic acids.
- the assembled target nucleic acids may include one or more full-length genes or two or more full-length genes (such as 1-32, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 full-length genes).
- the assembled target nucleic acids include a set of genetic circuits.
- a set of genetic circuits may include a set of regulatory elements, promoters, ribosome binding sites (RBS), one or more gene coding regions (e.g., monocistronic, bicistronic, or multicistronic), transcriptional terminators, aptamers, introns, non-coding RNA elements, recombination sites, and cloning sites.
- the genetic circuits may encode logic gates.
- the genes may encode a biosynthetic pathway, for example, a biosynthetic pathway capable of producing a small-molecule of interest.
- the full-length genes may encode for proteins of interest.
- the full-length genes encode for protein domains, or protein fragments.
- the genetic circuits may include synthetic operons designed for coordinated expression of multiple genes. These operons may be used to engineer metabolic pathways to produce complex biomolecules, such as antibiotics, biofuels, or polymers.
- the circuits may encode gene-editing systems, such as CRISPR-Cas9 components, for targeted modifications of genomic DNA.
- the assembled nucleic acids could include reporter genes for monitoring gene expression, such as fluorescent or luminescent markers.
- the nucleic acids may incorporate elements for inducible gene expression, allowing for temporal control of gene activity in response to specific stimuli or environmental conditions.
- a method of synthesizing a plurality of target nucleic acids comprising:
- Aspect 2 The method of aspect 1, further comprising one or more of: cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and sequencing each of the plurality of assembled target nucleic acids.
- Aspect 4 The method of any one of aspects 1 to 3, wherein one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof.
- Aspect 6 The method of any one of aspects 1 to 5, wherein the plurality of target nucleic acids comprises one or more chimeric nucleic acids.
- Aspect 7 The method of aspect 6, wherein the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker.
- Barcode mapping of constructs over 500 bp requires long read sequencing, beyond what can be achieved with Illumina sequencing.
- Use of Oxford Nanopore sequencing as a viable alternative to PacBio MAS-Iso-seq was explored. Although the raw reads have a higher error rate, previous studies have shown this can be reduced significantly by collapsing reads on barcodes (Karst et al., Nat. Methods 18:165-169, 2021; Zurek et al., Nat. Commun. 11:6023, 2020).
- a read-based majority call was used to determine consensus for each barcode.
- In comparing the percentage perfects observed through each sequencing method there was consistently a higher rate for the PacBio data (median 7.8%, s.d. 4.2%) as shown in FIG. 16A-16B. This highlights the gap in error rates between the two platforms and suggests that more complex consensus calling approaches and likely higher sequencing depths are required to reach parity between the two methods.
- $N_x = N_0 (1 + E)^{x}$, where:
  i. $N_x$ is the final number of molecules after $x$ cycles
  ii. $N_0$ is the initial number of dsDNA molecules
  iii. $E$ is the PCR efficiency
  iv. If the template is ssDNA, use $(x - 1)$ in place of $x$
- NTC: no-template controls
- Saturation is due to competitive inhibition of DNA polymerase by dsDNA, not to depletion of primers or NTPs or to loss of polymerase activity.
- Annealing temp set to 60°C. Typically 12-24 cycles are needed.
- PCR protocol:
  i. 45 sec at 98°C, initial denaturation
  ii. 15 sec at 98°C, denaturation
  iii. 30 sec at 60°C, annealing
  iv. 15 sec at 72°C, extension
  v. Go to step ii; repeat based on the number of cycles determined by qPCR
  vi. 1 min at 72°C, final extension
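The exponential amplification model listed above (final molecule count equals the initial count times one plus the efficiency, raised to the cycle number, with one fewer effective cycle for an ssDNA template) can be sketched in Python as follows. This is only an illustrative sketch: the function names and the example input values are assumptions, not part of the disclosure, and the model ignores the saturation behavior noted above, so it applies only to the exponential phase.

```python
import math


def molecules_after_cycles(n0, efficiency, cycles, single_stranded=False):
    """Return N_x = N_0 * (1 + E)**x, using (x - 1) for an ssDNA template."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    # An ssDNA template spends its first cycle making the second strand.
    exponent = cycles - 1 if single_stranded else cycles
    return n0 * (1.0 + efficiency) ** max(exponent, 0)


def cycles_to_reach(n0, target, efficiency, single_stranded=False):
    """Smallest whole number of cycles that yields at least `target` molecules.

    Hypothetical helper: inverts the model above and ignores saturation.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    if target <= n0:
        return 0
    cycles = math.ceil(math.log(target / n0) / math.log(1.0 + efficiency))
    return cycles + 1 if single_stranded else cycles


# Hypothetical inputs: 1000 starting molecules, perfect doubling (E = 1.0).
print(molecules_after_cycles(1000, 1.0, 10))
print(molecules_after_cycles(1000, 1.0, 10, single_stranded=True))
print(cycles_to_reach(1000, 1_000_000, 1.0))
```

In practice the source determines the cycle number empirically by qPCR, so a calculation like this would serve at most as a rough sanity check on that measurement.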
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Microbiology (AREA)
- Analytical Chemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biomedical Technology (AREA)
- General Chemical & Material Sciences (AREA)
- Immunology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Methods of multiplex gene synthesis of a plurality of variant target nucleic acids in a single reaction are provided.
Description
DEGENERATE DROPSYNTH GENE SYNTHESIS
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Application No. 63/608,469, filed December 11, 2023, which is incorporated by reference in its entirety.
FIELD
This disclosure relates to methods of synthesizing and assembling a plurality of target nucleic acids, particularly libraries of target nucleic acids with specifically designed variants.
SEQUENCE LISTING INCORPORATION BY REFERENCE
The Sequence Listing is submitted as an XML file in the form of the file named “1505- 111303-02_Sequence_Listing” (56,594 bytes), which was created on July 12, 2024, which is incorporated by reference herein.
BACKGROUND
Protein engineering seeks to design proteins with novel properties and functions by determining the sequences corresponding to a particular targeted function or property. Its potential applications are immense and span from drug development and diagnostics to biofuel production and environmental remediation. One important aspect of protein engineering is the creation of chimeric or hybrid proteins. These proteins, especially prevalent in cellular signaling, are constructed by combining different modules to create new or enhanced functionalities. This is particularly useful in the development of biomass saccharification, cellular signaling, and complex biosynthesis pathways, to name a few. Central to the success of these chimeric proteins are the design and engineering of fusion points and linkers. The choice of fusion point can significantly impact the function of the resulting chimeric protein, as it can influence the spatial arrangement of the protein domains and their ability to interact with each other and with other molecules. Linkers, on the other hand, are connectors that provide flexibility and space between the domains, enabling proper folding and independent functioning. Their characteristics (length, composition, and conformation) significantly affect the activity, stability, and solubility of the chimeric protein. Thus, the strategic selection of fusion points and linkers is a key element in the field of protein engineering.
Previous methods to generate fusion or linker variants are unable to generate designed variants for a diverse library of gene fusions. Methods utilizing nuclease-mediated truncation (such as SHIPREC and ITCHY) create many fusions at random points. Although PCR primer-based methods such as PATCHY allow control over the fusion points, they are only applicable to single genes and do not work with diverse gene libraries, due to the lack of conserved sequence in the priming region among different library members. Thus, there remains a need to develop new methods to generate fusion or linker variants, particularly for multiplex methods.
SUMMARY
Provided herein are methods for synthesizing a plurality of nucleic acids of interest utilizing a modification of the DropSynth method and emulsion PCR (see, e.g., U.S. Pat. No. 10,202,628; Plesa et al. Science 359:343-347, 2018; Sidore et al., Nucl. Acids Res. 48:e95, 2020; all of which are incorporated herein by reference in their entirety). The provided methods utilize degenerate oligonucleotides and include methods for multiplex gene synthesis of a plurality of variant target nucleic acids in a single reaction.
Provided herein are methods of synthesizing a plurality of target nucleic acids, comprising:
(i) providing a set of oligonucleotides, each oligonucleotide comprising: a subsequence of one or more target nucleic acids, and a predesigned unique subsequence, wherein at least one or more of the oligonucleotides are degenerate oligonucleotides comprising a common overlap with one or more of the set of oligonucleotides;
(ii) hybridizing the set of oligonucleotides with a substrate, wherein the substrate comprises one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides;
(iii) isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments; and
(iv) assembling the plurality of target nucleic acids utilizing a polymerase.
In some examples, the disclosed methods further include one or more of: cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and sequencing each of the plurality of assembled target nucleic acids.
In some examples, the plurality of degenerate oligonucleotides comprise 0, 2, 4, 6, or 8 degenerate oligonucleotides.
In some examples, the one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof, one or more variants of the target nucleic acids, and/or one or more chimeric nucleic acids. In some examples, the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker. In some examples, the one or more chimeric nucleic acids encode a functional chimeric protein.
In some examples, the substrate includes a plurality of beads, a microarray, a silicon chip, or a microfluidic chip. In examples where the substrate is a plurality of beads, the beads are fractionated into one or more compartments. In examples where the substrate is an array, a silicon chip, or a microfluidic chip, a cover is placed over the substrate in such a way that the one or more nucleic acid barcodes and hybridized oligonucleotides are isolated into separate compartments, optionally following a washing step to remove or substantially remove unhybridized oligonucleotides.
In some examples, the compartment is an emulsion droplet, a well, a chamber, a subcompartment of the substrate, a vesicle (such as a lipid or block copolymer vesicle), a liposome, or a polymersome. In some examples, the well or chamber is a microfabricated well or chamber on a silicon chip, a compartment within a microfluidic chip, or a compartment within a PDMS or SU8 (or other patterned functional polymer) substrate bonded to glass.
In some examples, the oligonucleotides of the set of oligonucleotides are from 50 nucleotides to 1000 nucleotides long, such as 300 nucleotides long. In some examples, the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as 1000 nucleotides long.
In some examples, the assembled target nucleic acids comprise two or more variable or degenerate regions. In this example, a combinatorial library encompassing all possible combinations through the assembly process can be created.
In some examples, the assembled target nucleic acids comprise two or more full-length genes. In this example, overlaps between different genes in each group are designed and/or screened to minimize cross-hybridization, to allow independent assembly of each of the two or more genes within the compartment.
In other examples, the assembled target nucleic acids comprise a set of genetic circuits. The genetic circuits may include one or more of regulatory elements, promoters, ribosome binding sites, coding regions, terminators, aptamers, introns, non-coding RNA elements, recombination sites, and cloning sites.
The foregoing and other features of this disclosure will become more apparent from the following detailed description of several aspects which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE FIGURES
FIGS. 1A-1E illustrate degenerate DropSynth for rational fusion point engineering. FIG. 1A: Overview of the DropSynth protocol and cloning scheme. FIG. 1B: Within each barcoded microbead droplet, multiple versions of the last oligo are added. These degenerate oligos share a common overlap and can participate in the assembly reaction, with each encoding a different mutant variant. FIG. 1C: Domain architecture of a typical sensor histidine kinase. SHKs with diverse sensory and HAMP domains each linked to the C-terminal portion of the well characterized histidine kinase envZ were assembled. Fusion points were generated below the HAMP domain (bottom insert). AlphaFold2 predicted structure around the fusion site of a NarX-EnvZ chimera. Proper function of these chimeric sensors requires the correct phase orientation between the fused alpha helices. FIG. 1D: Alignment of phase variant sequences (SEQ ID NOs: 45-53). By adding or removing residues from the N-terminal or C-terminal fragments, the relative phase orientation between the alpha helices on either side of the fusion point is changed. FIG. 1E: Assembly reactions were tested where genes contained between 1 (no degeneracy) and 8 degenerate oligos encoding different phase variants to assess its impact on the gene assembly process.
FIGS. 2A-2D show 4x and 5x 300-mer oligo assemblies. Assembly of the two 4-oligo (FIG. 2A) and 5-oligo (FIG. 2B) libraries. FIG. 2C: The percentage of perfects observed for variants with at least 100 barcodes, shown both at the AA level (collapsed on synonymous mutations) and DNA level. FIG. 2D: The percentage of DNA perfects as a function of length, showing a consistent reduction from ~20% just over 600 bp down to ~8% by 1 kbp. Grey line is a smoothed GAM fit while the colored lines are fits for 4 and 5 oligos with the model described in the text.
FIGS. 3A-3B show protein coverage. The number of designed protein variants for which at least one perfect amino acid sequence is observed. These are shown both individually for each codon library as well as combined together. FIG. 3A: The percentage observed relative to the number of variants designed. FIG. 3B: The absolute numbers of variants observed, with the total designed variants shown in light gray. The percent coverage decreases less than the increase in degeneracy scale, resulting in overall increased numbers of variants assembled.
FIG. 4 illustrates how degenerate DropSynth could be used to introduce variation in many parts of an assembled gene by creating multiple oligos for gene fragments at 1) the start, 2) internally, or 3) at the end.
FIG. 5 illustrates introducing degeneracy in multiple fragments leads to combinatorial assemblies.
FIG. 6 shows the absolute number of variants designed at each degeneracy level for each of the four libraries tested.
FIG. 7 shows the number of barcodes used (out of 1536) for each degeneracy level in each of the four libraries tested.
FIG. 8 shows that when plotting the median percentage of perfects, there is a correlation with the total number of barcodes observed and the degeneracy levels which is much stronger with the 5 oligo libraries (top-row) compared to the 4 oligo libraries (bottom-row).
FIG. 9 shows the difference in percentage perfects observed at the protein amino acid level (synonymous mutations collapsed) and the DNA level. There is a relatively consistent difference of 4.0% with 4 oligos and 2.7% with 5 oligos.
FIG. 10 shows analysis of the CIGAR alignment strings produced by minimap2 reveals the relative error frequencies per kbp for different types of errors. There are equivalent rates of insertions and single base deletions among the four libraries and much higher rates of multi base deletions. In the 4-oligo libraries mismatch rates are comparable to insertions and single base deletions, while in the 5-oligo libraries mismatches are substantially higher.
FIG. 11A: A histogram of the length of deletions observed across all 4 libraries. While single-base deletions are dominant, a substantial number of long deletions are observed. FIG. 11B: Since many multi-base deletions are so long, at any given base in a deletion, only 23.6% are from single-deletions, while 28.3% are deletion of 5-49 bp in length, 32.7% are deletions of 50 to 99 bp in length, 23.1% are deletions of 100 to 149 bp in length, and 8.1% are 150 bp or longer.
FIG. 12 shows the fraction of barcodes observed, normalized by the total fraction of designs. For each library, the sum of all observed unique gene barcodes at a particular degeneracy level was determined and divided by the total number of observed unique gene barcodes in the library to calculate the fraction of observed barcodes. The fraction of designs at each degeneracy level was determined by dividing the total number of variants at that degeneracy level by the total number of variants in the entire library. There is a strong decay in the observed barcodes as the degeneracy level is increased.
FIG. 13 shows the fraction of barcodes observed, divided by the total fraction of designs, scaled by degeneracy level. In other words, the data from FIG. 12 on the fraction of observed barcodes were taken and multiplied by the degeneracy level. If it is assumed that the amount of DNA for variants at the end of assembly is inversely proportional to the degeneracy level, roughly similar numbers would be expected after this scaling (e.g., a variant from a degeneracy level of 8 has one-eighth the amount of DNA as a variant from a degeneracy level of 1). Since a strong decay is still observed, this implies that other factors such as PCR amplification contribute to the effect.
FIG. 14 is a model of PCR amplification applied to all four libraries. The y-axis values are log transformed barcodes observed per variant while the x-axis is the expected variant concentration given by the total number of barcoded beads with a given degeneracy level divided by the total number of variants at that level.
FIG. 15 shows the Gini coefficient (as determined using perfect gene assemblies) - a measure of the inequality among the distribution of library members. The values observed are consistent with previous libraries assembled with DropSynth.
FIG. 16A: The percentage of perfects determined from the PacBio data (y-axis) plotted against the percentage of perfects determined using Oxford Nanopore data (x-axis). Data using only degeneracy level of 1. FIG. 16B: The distribution of delta percentage perfects shows a consistent (median) 7.8% higher rate for PacBio data highlighting its lower error rate in sequencing.
FIG. 17 shows the folding energy of 4000 random 20 bp sequences determined using both seqfold and UNAFold, showing a high correlation (0.84).
FIG. 18 is a map of plasmid pHKGGl. The libraries are cloned into the N-terminal region of EnvZ. This plasmid is a derivative of plasmid pSR348.
FIG. 19 shows an exemplary layout of barcoded beads. SEQ ID NO: 54 is the sequence of CP anchor 52bio skpp511F+R and captured oligo. SEQ ID NO: 55 is the sequence of CP liga-5phos-skpp511Frc-3bio and CP bcoligo-cp12mer-1.
SEQUENCE LISTING
The nucleic acid and amino acid sequences provided herein and listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and single letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand.
In the accompanying sequence listing:
SEQ ID NOs: 1-8 are the sequences of subpool amplification primers.
SEQ ID NOs: 9-32 are the sequences of primers used to generate pHKGGl.
SEQ ID NOs: 33-35 are the sequences of fragments used in Golden Gate to make libraries in plasmid pHKGGl.
SEQ ID NO: 36 is the sequence of an end cloning site.
SEQ ID NO: 37 is the sequence of a 24 basepair quasi-randomer barcode region.
SEQ ID NO: 38 is the sequence of the constant regions flanking the barcode.
SEQ ID NO: 39 is the sequence of a conserved region in the vector immediately flanking the cloning site.
SEQ ID NOs: 40 and 41 are standard assembly primers 504F&R.
SEQ ID NOs: 42 and 43 are modified versions of SEQ ID NOs: 40 and 41, respectively, with ITRs.
SEQ ID NO: 44 is the sequence of a primer for single-primer suppression PCR.
SEQ ID NOs: 45-53 are amino acid sequences of exemplary addition or removal of amino acid residues from N-terminal or C-terminal fragments on either side of a fusion point.
SEQ ID NO: 54 is the sequence of exemplary CP anchor52bioskpp511F+R and captured oligo.
SEQ ID NO: 55 is the sequence of exemplary CP lig-5phos-skpp511Frc-3bio and CP gcoligo-cp12mer-1.
SEQ ID NO: 56 is the amino acid sequence of residues 232-237 of envZ.
DETAILED DESCRIPTION
I. Overview
Although most protein engineering efforts have focused on single chimeric proteins, an increasingly desirable approach would be to decipher the rules for engineering entire protein families rather than individual proteins. Towards this end, a potential Design-Build-Test-Learn strategy could consist of: (1) designing large numbers of diverse relevant hybrids through metagenomic mining or rational computational approaches, (2) assembling large libraries of specifically designed variants spanning many diverse genes, (3) functionally characterizing the library using a multiplexed functional assay, (4) feeding the resulting data into computational or machine learning (ML) models which can discern the underlying patterns, and (5) repeating this process using computational or ML generated variants and feeding the results back into the model until some target threshold for accuracy is achieved. This disclosure addresses the second step in this process and demonstrates how to build large libraries of designed local mutant variants for many diverse genes. As provided herein, Degenerate DropSynth significantly enhances the scope, length, and cost-efficiency of gene library construction, thereby facilitating the exploration and understanding of entire protein families through large-scale, programmable assembly.
A previously described multiplex gene synthesis method, DropSynth 2.0 (Sidore et al., Nucl. Acids Res. 48:e95, 2020), was shown to be capable of assembling 1,536 genes with a median length of 501 bp in a single reaction with 64% coverage and 25% perfects at the amino acid level (~15% at the nucleotide level). Although this approach increased the assembly scale significantly, in the context of protein engineering, given the immense number of variants possible, creating a large number of variants for many diverse genes would quickly saturate the 1,536 genes possible in a typical reaction. As such, this disclosure describes synthesizing multiple variants for the same parent sequence within each droplet. In this approach, the DropSynth method proceeds as before, with oligos processed to expose a 12-nt single-stranded barcoded overhang, which is then hybridized and ligated onto corresponding barcoded beads, followed by emulsification into droplets and emulsion polymerase cycling assembly (ePCA). The degenerate DropSynth approach described herein leverages the fact that any additional oligo targeted to a barcode which contains the same ePCA overhangs as the gene will also participate in the assembly reaction (FIG. 1B) and create a variant of the gene in a programmable manner. This approach provides several advantages. First, the marginal cost of each additional variant is only the cost of an additional oligo, as all other reagents remain unchanged. Second, this provides a simple path towards much higher scales without larger sets of barcoded beads. While background-isolation from complex pools is critical to prevent cross-hybridization, small numbers of genes can be successfully assembled together with minimal screening for orthogonality (Borovkov et al., Nucl. Acids Res. 38:e180, 2010).
Here, the degenerate DropSynth approach was applied to the creation of chimeric sensor histidine kinases (SHK) as a proof of concept. These homodimeric modular proteins typically contain an extracellular sensing domain, transmembrane domains, and signaling domains which help propagate the activation signal to a kinase domain (FIG. 1C). The phosphoryl group on a phosphorylated histidine residue is transferred to an aspartate residue on a specific response regulator protein, which can then dimerize and activate transcription of a target promoter. Despite their abundance, activating ligands are only known for a small number of SHKs since individual characterization is a slow and laborious process. Large scale characterization and deorphanization of this family could be achieved if the modular sensory domains could be swapped onto a well characterized kinase domain to allow for multiplexed testing of many receptors. Many challenges exist in such an approach. One issue is the selection of a fusion point due to the uncertainty in the exact domain boundaries as detected by HMMs or other methods, which makes it difficult to create functional chimeras, even if the signal transduction mechanism of both halves is the same. This uncertainty in the fusion boundaries makes it difficult to put the upstream (helix) portion into the proper phase orientation with the downstream (helix) portion. In this work, chimeric fusions were built in a region just below the HAMP signaling domain, a homodimeric four alpha-helix parallel coiled-coil region. The disclosed methods were used to create many phase variants for each sensor domain through the controlled addition or subtraction of amino acid residues on either the C- or N-terminal fragments of each chimera, as shown in FIG. 1D. Any variant can be made as it is programmatically encoded on a corresponding assembly oligo, with the only requirement being that the variant sequence length can fit onto the corresponding oligo.
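By way of illustration only, the controlled addition or subtraction of residues around a fusion point can be sketched in a few lines of Python. The function names, the extension arguments, and the short amino acid strings below are hypothetical placeholders chosen for this sketch; they are not taken from SEQ ID NOs: 45-53 and do not represent the design software used in the examples.

```python
def shift_n_fragment(n_frag, n_extension, k):
    """Shift the C-terminal end of the N-terminal fragment by k residues.

    k > 0 appends the next k parental residues (taken from n_extension),
    k < 0 removes |k| residues, k == 0 leaves the fragment unchanged.
    Returns None when the requested shift is not possible.
    """
    if k >= 0:
        return n_frag + n_extension[:k] if k <= len(n_extension) else None
    return n_frag[:k] if -k < len(n_frag) else None


def shift_c_fragment(c_frag, c_extension, k):
    """Shift the N-terminal end of the C-terminal fragment by k residues.

    c_extension holds the parental residues that precede c_frag.
    """
    if k >= 0:
        if k > len(c_extension):
            return None
        return c_extension[len(c_extension) - k:] + c_frag
    return c_frag[-k:] if -k < len(c_frag) else None


def phase_variants(n_frag, c_frag, n_extension="", c_extension="", max_shift=1):
    """Enumerate fusion variants keyed by (N-side shift, C-side shift)."""
    variants = {}
    for dn in range(-max_shift, max_shift + 1):
        n_part = shift_n_fragment(n_frag, n_extension, dn)
        for dc in range(-max_shift, max_shift + 1):
            c_part = shift_c_fragment(c_frag, c_extension, dc)
            if n_part is not None and c_part is not None:
                variants[(dn, dc)] = n_part + c_part
    return variants


# Hypothetical toy fragments; real designs would use the parent protein sequences.
variants = phase_variants("AAAL", "MEKT", n_extension="QR", c_extension="GS")
for shifts, seq in sorted(variants.items()):
    print(shifts, seq)
```

Each entry changes the junction length by the sum of the two shifts, which is what alters the relative phase of the helices on either side of the fusion point. In practice each variant would be back-translated and placed on its own assembly oligo.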
As a proof of concept, in each library the assembly of between 1 and 8 different phase variants was tested for each gene, as shown in FIG. 1E. In other words, in the same reaction, some droplets would only assemble one gene with no additional variants (degeneracy of 1), while at the other extreme droplets would assemble 8 different variants of the same gene. This allowed control for inter-reaction variability from factors such as the DNA recovery from the emulsions, PCR amplification, and processing yields. Two sets of proteins were designed based on their length: one with 1,530 proteins to be assembled with 4x 300-mer oligos and another with 1,531 proteins to be assembled with 5x 300-mer oligos. Two different codon libraries were made for each set for a total of four libraries with 6,122 total genes distributed among them. Using multiple codon versions increases the chances of a successful assembly. With the extra degenerate variants added, the four libraries encoded a total of 10,862 proteins and 21,724 genes, with the distribution of degeneracy levels shown in FIG. 6. In the two 4-oligo libraries, the microbead barcodes were distributed relatively uniformly among the different degeneracy levels (median 292 microbead barcodes, s.d. 22), while in the 5-oligo libraries over half of the microbead barcodes had no degeneracy (level 1) (FIG. 7), to ensure sufficient statistics for the baseline case with increased length. The use of 300-mer oligos allowed assembly lengths of 1 kbp with 5 oligos, effectively doubling the length demonstrated previously with DropSynth 2.0.
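The library sizes stated above follow from simple arithmetic, which can be checked with the short Python sketch below. The variable names are illustrative only, and the mean number of variants per parent protein is a derived quantity that is not stated in the text.

```python
# Counts taken from the description above.
PROTEINS_4_OLIGO = 1530      # proteins assembled from 4x 300-mer oligos
PROTEINS_5_OLIGO = 1531      # proteins assembled from 5x 300-mer oligos
CODON_VERSIONS = 2           # two codon libraries per protein set
TOTAL_PROTEIN_VARIANTS = 10862  # proteins encoded once degenerate variants are added

parent_proteins = PROTEINS_4_OLIGO + PROTEINS_5_OLIGO
parent_genes = parent_proteins * CODON_VERSIONS
total_genes = TOTAL_PROTEIN_VARIANTS * CODON_VERSIONS

# Derived value (an assumption of this sketch, not a figure from the text).
mean_variants_per_parent = TOTAL_PROTEIN_VARIANTS / parent_proteins

print(parent_genes, total_genes, round(mean_variants_per_parent, 2))
```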
II. Methods
Provided herein are methods of synthesizing a plurality of target nucleic acids. An overview of exemplary methods or portions thereof is shown schematically in FIGS. 1A-1C and FIGS. 4 and 5. In some aspects, the methods include:
(i) providing a set of oligonucleotides, each oligonucleotide comprising: a subsequence of one or more target nucleic acids, and a predesigned unique subsequence, wherein at least one or more of the oligonucleotides are degenerate oligonucleotides comprising a common overlap with one or more of the set of oligonucleotides;
(ii) hybridizing the set of oligonucleotides with a substrate, wherein the substrate comprises one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides;
(iii) isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments; and
(iv) assembling the plurality of target nucleic acids utilizing a polymerase.
In some aspects, at least one (such as at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 16, at least 32, at least 64, or more) of the plurality of oligonucleotides of the set is a degenerate oligonucleotide. In particular examples, the plurality of oligonucleotides includes 2, 4, 6, or 8 degenerate oligonucleotides. In other distinct examples, the plurality of oligonucleotides includes 0 degenerate oligonucleotides. A degenerate oligonucleotide refers to a mixture of sequences that differ in one or more positions (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more positions). In some examples, a degenerate oligonucleotide is a mixture of sequences that differ in one or more positions but encode the same amino acid sequence. In other examples, a degenerate oligonucleotide is a mixture of sequences that differ in one or more positions and encode different amino acid sequences or result in frame-shifted amino acid sequences. In some examples, a degenerate oligonucleotide can contain one or more conserved regions used to assemble multiple oligonucleotide fragments together (for example, used in Polymerase Cycling Assembly). These conserved regions may contain sequence overlaps to other oligonucleotides present in the assembly or other sequence features (e.g., restriction enzyme sites) that enable assembly of multiple fragments together.
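As a non-limiting sketch, a degenerate oligonucleotide set with a conserved assembly overlap can be represented as a list of sequences that share one region and differ in another. The helper names and the toy sequences below are assumptions made for illustration and are not sequences from the sequence listing.

```python
def build_degenerate_set(common_overlap, variable_regions, suffix=""):
    """Return one oligo per variable region, all sharing the same conserved overlap."""
    return [common_overlap + region + suffix for region in variable_regions]


def shares_common_overlap(oligos, overlap_len):
    """True if every oligo is long enough and starts with the same overlap_len bases."""
    if any(len(oligo) < overlap_len for oligo in oligos):
        return False
    return len({oligo[:overlap_len] for oligo in oligos}) == 1


# Hypothetical 20-nt overlap with the preceding oligo, followed by three
# alternative variable regions of different lengths.
overlap = "ACGTACGTACGTACGTACGT"
oligo_set = build_degenerate_set(overlap, ["GCTGCA", "GCTGCAGAA", "GCT"])

print(len(oligo_set), shares_common_overlap(oligo_set, len(overlap)))
```

A check of this kind could be run before ordering an oligo pool to confirm that every member of a degenerate set can anneal to the same neighboring fragment.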
In some aspects, each of the oligonucleotides of the set of oligonucleotides are from about 50 nucleotides to about 1000 nucleotides long, such as from 50 nucleotides to 200 nucleotides, from 100 nucleotides to 300 nucleotides, from 150 nucleotides to 250 nucleotides, from 300 nucleotides to 400 nucleotides, from 250 nucleotides to 500 nucleotides, from 350 nucleotides to 600 nucleotides, from 450 nucleotides to 700 nucleotides, from 550 nucleotides to 800 nucleotides, from 650 nucleotides to 900 nucleotides, or from 750 nucleotides to 1000 nucleotides (for example, about 50 nucleotides, about 100 nucleotides, about 150 nucleotides, about 200 nucleotides, about 250 nucleotides, about 300 nucleotides, about 350 nucleotides, about 400 nucleotides, about 450 nucleotides, about 500 nucleotides, about 550 nucleotides, about 600 nucleotides, about 650 nucleotides, about 700 nucleotides, about 750 nucleotides, about 800 nucleotides, about 850 nucleotides, about 900 nucleotides, about 950 nucleotides, or about 1000 nucleotides long). In some examples, each of the oligonucleotides in the set are the same length or about the same length. In other examples, the oligonucleotides in the set may vary by up to 30% in length from one another or from a particular length (such as 300 nucleotides +/- 30%). In one specific example, each of the oligonucleotides of the set of oligonucleotides are 300 nucleotides long.
In some aspects, hybridizing the set of oligonucleotides with a substrate, wherein the substrate includes one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides, includes incubating the oligonucleotides and the substrate with a ligase (such as Taq ligase) under conditions suitable for hybridization. In some examples, the hybridization conditions are 3 hours at 50°C followed by a 0.1°C/min ramp to 40°C and a 3-hour incubation at 40°C, followed by a 0.1°C/min ramp to 30°C and a 3-hour incubation at 30°C. In some examples, the methods further include amplifying the set of oligonucleotides and processing the oligonucleotides to expose the barcode as single-stranded DNA prior to the hybridization. In additional examples, the methods further include washing the substrate following hybridization to remove unbound oligonucleotides.
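For planning purposes, the exemplary hybridization profile above can be written as a list of hold and ramp steps and its total run time computed, as in the following Python sketch. The dictionary layout and function name are assumptions of this sketch and do not correspond to any particular thermocycler's programming format.

```python
# Exemplary profile from the text: three 3-hour holds joined by 0.1 °C/min ramps.
HYBRIDIZATION_PROGRAM = [
    {"kind": "hold", "temp_c": 50, "minutes": 180},
    {"kind": "ramp", "start_c": 50, "end_c": 40, "rate_c_per_min": 0.1},
    {"kind": "hold", "temp_c": 40, "minutes": 180},
    {"kind": "ramp", "start_c": 40, "end_c": 30, "rate_c_per_min": 0.1},
    {"kind": "hold", "temp_c": 30, "minutes": 180},
]


def program_minutes(steps):
    """Sum hold times and ramp times (temperature change divided by ramp rate)."""
    total = 0.0
    for step in steps:
        if step["kind"] == "hold":
            total += step["minutes"]
        else:
            total += abs(step["start_c"] - step["end_c"]) / step["rate_c_per_min"]
    return total


total_min = program_minutes(HYBRIDIZATION_PROGRAM)
print(f"{total_min:.0f} min total, {total_min / 60:.1f} h")
```

Each 10°C ramp at 0.1°C/min takes 100 minutes, so the full profile runs for 740 minutes, or 12 hours 20 minutes.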
In further aspects, the substrate is a plurality of beads, a microarray, a silicon chip, or a microfluidic chip. In one specific aspect, the substrate is a plurality of beads, such as a plurality of microbeads including or linked to one or more nucleic acid barcodes complementary to the predesigned unique subsequence(s) of the set of oligonucleotides. In some examples, the plurality of beads are a plurality of magnetic beads and/or a plurality of streptavidin labeled beads. In one example, the beads are Dynabeads™ M-270 Streptavidin (Invitrogen); however, one of ordinary skill in the art can identify additional suitable beads of use in the disclosed methods.
In other aspects, isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments includes isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of emulsion droplets, wells, chambers, subcompartments of the substrate, vesicles, liposomes, or polymersomes. In one specific example, the one or more nucleic acid barcodes and hybridized oligonucleotides are isolated into a plurality of emulsion droplets (such as at least 10,000 droplets, at least 100,000 droplets, at least 500,000 droplets, at least 1,000,000 droplets, at least 10,000,000 droplets, at least 50,000,000 droplets, at least 100,000,000 droplets, at least 150,000,000 droplets, at least 250,000,000 droplets, at least 500,000,000 droplets, at least 1,000,000,000 droplets, at least 100,000,000,000 droplets, at least 500,000,000,000 droplets, or at least 1,000,000,000,000 droplets).
In additional aspects, the methods include assembling the plurality of target nucleic acids utilizing a polymerase. In some examples, the assembly is carried out using polymerase cycling assembly (PCA, also known as assembly PCR). PCA is a method for assembling larger oligonucleotides (or even entire genes) from shorter, overlapping oligonucleotides. During the polymerase cycling, the oligonucleotides anneal to complementary fragments via the overlaps and are filled in by a polymerase, building fragments of increasing length. The final product is obtained by a regular PCR reaction using primers that anneal to the 5’ and 3’ ends of the full-length (e.g., target) sequence, which also eliminates shorter, incomplete fragments. In one example, the assembly is carried out using emulsion PCA (ePCA).
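The overlap-driven extension described above can be mimicked in silico with a simplified Python sketch. This model is an assumption made for illustration: it treats all fragments as segments of one strand, joins them wherever a suffix of the growing product matches a prefix of the next fragment, and ignores strand alternation, annealing thermodynamics, and polymerase errors. The function names and the toy template are hypothetical.

```python
def merge_by_overlap(left, right, min_overlap=15):
    """Join two sequences if a suffix of `left` equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def assemble(fragments, last_variants, min_overlap=15):
    """Extend through the shared fragments, then once per degenerate last fragment."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = merge_by_overlap(product, fragment, min_overlap)
        if product is None:
            raise ValueError("adjacent fragments do not overlap")
    assembled = []
    for variant in last_variants:
        full_length = merge_by_overlap(product, variant, min_overlap)
        if full_length is not None:
            assembled.append(full_length)
    return assembled


# Hypothetical 60-nt template split into two shared fragments with a 15-nt overlap.
template = "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGAC"
shared = [template[:30], template[15:60]]
# Two degenerate last fragments: same 15-nt overlap, different downstream sequence.
last = [template[45:60] + "AAATAA", template[45:60] + "GGCAAATAA"]

genes = assemble(shared, last)
print(len(genes), [len(g) for g in genes])
```

Because both last fragments carry the same overlap, each one yields its own full-length product from the same upstream fragments, which is the behavior the degenerate approach relies on within a single compartment.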
In some aspects, the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as about 100 nucleotides to about 500 nucleotides, about 250 nucleotides to about 750 nucleotides, about 500 nucleotides to about 1000 nucleotides (for example about 100 nucleotides, about 150 nucleotides, about 200 nucleotides, about 250 nucleotides, about 300 nucleotides, about 350 nucleotides, about 400 nucleotides, about 450 nucleotides, about 500 nucleotides, about 550 nucleotides, about 600 nucleotides, about 650 nucleotides, about 700 nucleotides, about 750 nucleotides, about 800 nucleotides, about 850 nucleotides, about 900 nucleotides, about 950 nucleotides, or about 1000 nucleotides long). In some non-limiting examples, the assembled target nucleic acids are about 1000 nucleotides long. In some examples, the assembled target nucleic acids vary in length (are not all the same length), and in some examples, vary by up to about 25 nucleotides in their length, vary by about 50 nucleotides in their length, vary by about 75 nucleotides in their length, vary by about 100 nucleotides in their length, or more.
In some aspects, the methods further include one or more steps including cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and/or sequencing each of the plurality of assembled target nucleic acids.
In some examples, cleaving the hybridized oligonucleotides from the substrate is carried out by contacting the oligonucleotides hybridized to the substrate with a restriction enzyme that cleaves a restriction site included in the oligonucleotides. Any suitable restriction enzyme can be used; however, in one specific example, the restriction enzyme is BtsI. In other examples, the restriction enzyme can be BsrDI, Bst6I, EarI, Bso31I, BsmAI, Bse3DI, BmrI, BmuI, BsaI, BsmBI, BbsI, PaqCI, or BspQI. In further examples, a photo-cleavable linker can be integrated into the oligonucleotide and used to cleave the oligonucleotide from the substrate upon exposure to light.
In some examples, recovering the plurality of assembled target nucleic acids includes purification of the assembled target nucleic acids away from other components, such as primers, enzymes, nucleotides, salts, or other reaction components. In some examples, this includes membrane-based purification, such as spin columns (e.g., miniprep or DNA purification columns commercially available from New England Biolabs or Qiagen, among other manufacturers). In other examples, this includes using Solid Phase Reversible Immobilization (SPRI) DNA purification beads (such as AMPure XP beads from Beckman Coulter). In other examples, recovering the plurality of assembled target nucleic acids includes or further includes gel-based purification, such as agarose gel purification of the assembled target nucleic acids.
In additional examples, cloning each of the plurality of assembled target nucleic acids into a vector is carried out by any method known to one of ordinary skill in the art. In one non-limiting example, the cloning is by Golden Gate assembly. In other examples, cloning each of the plurality of assembled target nucleic acids into a vector is carried out using Gibson assembly, PCA (Polymerase Cycling Assembly), SLIC (Sequence and Ligation Independent Cloning), LIC (Ligation-independent cloning), LCR (Ligase Cycling Reaction), USER, InFusion, SLiCE (Seamless Ligation Cloning Extract), Gateway, or CPEC (Circular polymerase extension cloning). In some examples, the vector includes one or more regulatory elements, such as a promoter (such as a constitutive or inducible promoter), a selection marker (such as an antibiotic resistance gene), and/or an origin of replication. The vector may also include one or more cloning sites (such as one or more restriction sites or one or more homology sites) for introducing a sequence of interest, such as assembled target nucleic acids. In some examples, the vector includes a lac repressor (such as LacI) and a Ptrc inducible promoter.
In further examples, sequencing each of the plurality of assembled target nucleic acids utilizes any sequencing method known in the art. In some examples, sequencing is carried out using multiplexed arrays sequencing (MAS-Seq), for example, commercially available from PacBio. In other examples, sequencing is carried out using nanopore sequencing, for example, commercially available from Oxford Nanopore.
In some aspects, the one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof. In some examples, the plurality of target nucleic acids include one or more variants of the target nucleic acid. In some examples, the one or more variants of the target nucleic acid encode one or more amino acid substitutions, insertions, and/or deletions. In other examples, the plurality of target nucleic acids includes one or more chimeric nucleic acids. In some examples, the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker. In certain examples, the one or more chimeric nucleic acids encode a functional chimeric protein.
In additional aspects, the assembled target nucleic acids include one or more variable or degenerate regions or two or more variable or degenerate regions (such as 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). An example of assembled target nucleic acids with one variable or degenerate region is shown schematically in FIG. 4. In some examples, the variable or degenerate region is at the N-terminal of the assembled nucleic acids, internal to the assembled nucleic acids, or at the C-terminal of the assembled nucleic acids. An example of assembled target nucleic acids with two degenerate or variable regions is shown schematically in FIG. 5. In some examples, one of the degenerate or variable regions is at the N-terminal of the assembled nucleic acids and one of the degenerate or variable regions is internal to the assembled nucleic acids. In other examples, one of the degenerate or variable regions is at the C-terminal of the assembled nucleic acids and one of the degenerate or variable regions is internal to the assembled nucleic acids. In further examples, both of the degenerate or variable regions are internal to the assembled nucleic acids. In still further examples, one of the degenerate or variable regions is at the N-terminal of the assembled nucleic acids and one of the degenerate or variable regions is at the C-terminal of the assembled nucleic acids.
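When two or more fragments are degenerate, the set of possible assemblies is the Cartesian product of the alternatives at each fragment position. The Python sketch below enumerates such a combinatorial library; the function name and the three-base placeholder fragments are hypothetical and chosen only to keep the example short.

```python
from itertools import product


def combinatorial_library(fragment_options):
    """Enumerate every assembly formed by picking one alternative per fragment.

    fragment_options is an ordered list; each element lists the alternative
    sequences available at that fragment position (one entry if not degenerate).
    """
    return ["".join(choice) for choice in product(*fragment_options)]


# Hypothetical layout: degenerate first fragment (2 versions), a single shared
# internal fragment, and a degenerate last fragment (3 versions).
options = [["AAA", "AAC"], ["GGG"], ["TTT", "TTC", "TTA"]]
library = combinatorial_library(options)

print(len(library))
```

The library size is the product of the number of alternatives at each position, so 2 and 3 alternatives at two positions give 6 assemblies. This sketch enumerates designs only and says nothing about how evenly the combinations would be represented after assembly.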
In additional aspects, the assembled target nucleic acids may include one or more full-length genes or two or more full-length genes (such as 1-32, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 full-length genes). In some examples, the assembled target nucleic acids include a set of genetic circuits. A set of genetic circuits may include a set of regulatory elements, promoters, ribosome binding sites (RBS), one or more gene coding regions (e.g., monocistronic, bicistronic, or multicistronic), transcriptional terminators, aptamers, introns, non-coding RNA elements, recombination sites, and cloning sites. In some examples, the genetic circuits may encode logic gates. In other examples, the genes may encode a biosynthetic pathway, for example, a biosynthetic pathway capable of producing a small molecule of interest. In other examples, the full-length genes may encode proteins of interest. In some examples, the full-length genes encode protein domains or protein fragments. In other examples, the genetic circuits may include synthetic operons designed for coordinated expression of multiple genes. These operons may be used to engineer metabolic pathways to produce complex biomolecules, such as antibiotics, biofuels, or polymers. In still further examples, the circuits may encode gene-editing systems, such as CRISPR-Cas9 components, for targeted modifications of genomic DNA. In some examples, the assembled nucleic acids could include reporter genes for monitoring gene expression, such as fluorescent or luminescent markers. Furthermore, in some examples, the nucleic acids may incorporate elements for inducible gene expression, allowing for temporal control of gene activity in response to specific stimuli or environmental conditions.
III. Aspects of the Disclosure
Aspect 1. A method of synthesizing a plurality of target nucleic acids, comprising:
(i) providing a set of oligonucleotides, each oligonucleotide comprising: a subsequence of one or more target nucleic acids, and a predesigned unique subsequence, wherein at least one or more of the oligonucleotides are degenerate oligonucleotides comprising a common overlap with one or more of the set of oligonucleotides;
(ii) hybridizing the set of oligonucleotides with a substrate, wherein the substrate comprises one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides;
(iii) isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments; and
(iv) assembling the plurality of target nucleic acids utilizing a polymerase.
Aspect 2. The method of aspect 1, further comprising one or more of: cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and sequencing each of the plurality of assembled target nucleic acids.
Aspect 3. The method of aspect 1 or aspect 2, wherein the plurality of oligonucleotides comprises 2, 4, 6, or 8 degenerate oligonucleotides.
Aspect 4. The method of any one of aspects 1 to 3, wherein one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof.
Aspect 5. The method of any one of aspects 1 to 4, wherein the plurality of target nucleic acids comprise one or more variants of the target nucleic acid.
Aspect 6. The method of any one of aspects 1 to 5, wherein the plurality of target nucleic acids comprises one or more chimeric nucleic acids.
Aspect 7. The method of aspect 6, wherein the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker.
Aspect 8. The method of aspect 6 or aspect 7, wherein the one or more chimeric nucleic acids encode a functional chimeric protein.
Aspect 9. The method of any one of aspects 1 to 8, wherein the substrate comprises a plurality of beads, a microarray, a silicon chip, or a microfluidic chip.
Aspect 10. The method of any one of aspects 1 to 9, wherein the compartment comprises an emulsion droplet, a well, a chamber, a subcompartment of the substrate, a vesicle, a liposome, or a polymersome.
Aspect 11. The method of any one of aspects 1 to 10, wherein the oligonucleotides of the set of oligonucleotides are from 50 nucleotides to 1000 nucleotides long, such as 300 nucleotides long.
Aspect 12. The method of any one of aspects 1 to 11, wherein the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as 1000 nucleotides long.
Aspect 13. The method of any one of aspects 1 to 12, wherein the assembled target nucleic acids comprise two or more variable or degenerate regions.
Aspect 14. The method of any one of aspects 1 to 13, wherein the assembled target nucleic acids comprise two or more full-length genes.
Aspect 15. The method of any one of aspects 1 to 14, wherein the assembled target nucleic acids comprise a set of genetic circuits.
EXAMPLES
The following examples are provided to illustrate particular features of certain aspects of the disclosure, but the scope of the claims should not be limited to those features exemplified.
Example 1 Methods
This example describes the methods used for the results described in Example 2.
Gene Design: All amino acid sequences (1,127,577 in total) containing a histidine kinase domain were obtained from UniProt release 2021_02. This dataset was then loaded into HMMER (version 3.2.1) and the domains were annotated for each sequence using the Pfam HMMs (version 33.1). Phobius (version 1.01) and TMHMM-py (version 1.3.1) were both used to define transmembrane regions (TMs); only TMs with an overlap consensus of at least nine residues were kept, with the boundaries determined by Phobius being used. Only proteins with 2 TM domains and a HAMP domain were kept, leaving 126,611 proteins. These were further filtered to proteins where the length from the N-terminal to the end of the HAMP domain could fit into a 4x or 5x 300-mer oligo DropSynth assembly.
DropSynth Oligo Design: The DropSynth oligos were designed using a series of custom scripts available at github.com/PlesaLab/DropSynth_code_2023. These scripts were significantly optimized compared to older versions of the design scripts. Briefly, the changes include: a switch to a “recipe”-based workflow with all parameters in a single file; the use of Lattice-Automation’s seqfold Python library for minimum free energy structure calculations instead of unafold (hybrid-ss-min) (see FIG. 11 for a comparison); a programmable database for handling all required restriction enzyme sites; the ability to split as many genes as necessary in the first step, with subsequent (384x, 1536x) library splitting; virtual assembly and translation to verify oligo designs; the option to do microbead barcode reversal between libraries to offset barcoded microbead effects; improved codon optimization with fewer split failures through the use of hardcoded rules; the ability to require certain sequences (controls) in each library; single oligo processing for very small genes; initial support for DNA (non-protein) constructs; and improved oligo junction length handling, with genes that fail due to length placed into a special file for input into higher oligo splits. As before, each subpool was given unique subpool amplification primers, as shown in Table 1.
Degenerate Oligo Design: An R script was written to generate all necessary degenerate oligos. Briefly, DropSynth oligos were initially designed using the +2 N fusion variants, since all other variants are as long or shorter. This ensured that all variants could fit on the last oligo. All DropSynth oligos were loaded into the R script and the payload sequence between the BtsI sites was determined. The overlap sequence between the last and second-to-last oligo payloads was then determined. Whether the degenerate mutation could be made was decided based on the distance between the end cloning site (GACGTGAGACC; SEQ ID NO: 36) and the end of the overlap. Since the overlap sequence could not be modified, degenerate oligos were designed only if there was sufficient length after the overlap to implement the desired mutation. If the length was sufficient, the script designed degenerate oligos by selectively removing or mutating codons for each phase variant and level of degeneracy (FIGS. 1D, 1E). If codons were removed, the overall length of the oligo was kept the same by adding random bases into the padding region between the end cloning site and the assembly amplification reverse primer site (skpp504R), checking to make sure no illegal restriction sites were introduced. If codons were changed, the padding was left unchanged, but the new sequence was still screened for illegal restriction sites. All successful degenerate oligo designs were combined with the full DropSynth oligo set.
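The padding step described above can be illustrated with a minimal Python sketch. It is not the R script referred to in the text: the site list, the sequences, and the function names (`count_sites`, `pad_oligo`) are hypothetical, and the sketch assumes that "no illegal restriction sites were introduced" means the padded oligo contains no more site occurrences than the unpadded flanks already did.

```python
import random

# Hypothetical screen list; the text does not enumerate the illegal sites.
ILLEGAL_SITES = ["GCAGTG", "CACTGC", "GGTCTC", "GAGACC"]

def count_sites(seq, sites):
    """Count (possibly overlapping) occurrences of any site in seq."""
    return sum(
        1
        for site in sites
        for i in range(len(seq) - len(site) + 1)
        if seq.startswith(site, i)
    )

def pad_oligo(left, right, target_len, sites, rng, max_tries=1000):
    """Insert random bases between left and right so the oligo keeps its
    original length, rejecting paddings that create a new site."""
    n_pad = target_len - len(left) - len(right)
    if n_pad < 0:
        raise ValueError("variant is longer than the target oligo length")
    baseline = count_sites(left, sites) + count_sites(right, sites)
    for _ in range(max_tries):
        pad = "".join(rng.choice("ACGT") for _ in range(n_pad))
        candidate = left + pad + right
        if count_sites(candidate, sites) == baseline:
            return candidate
    raise RuntimeError("no padding found that avoids new restriction sites")

# Toy example: drop one codon from a made-up payload ending in the end
# cloning site, then restore the length with screened random padding.
rng = random.Random(0)
parent_left = "ATGGCTAAAGGTCTGGAAGACGTGAGACC"
variant_left = parent_left[:9] + parent_left[12:]  # one codon removed
right = "TTCACGCAGGATCCAGTA"                       # made-up primer site
target_len = len(parent_left) + 6 + len(right)     # parent had 6 nt of padding
padded = pad_oligo(variant_left, right, target_len, ILLEGAL_SITES, rng)
```

The baseline comparison lets the end cloning site keep its own recognition sequence while still rejecting any padding that adds another one, including at the junctions.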
Oligo amplification and processing: Oligo designs were ordered as part of a pool of 58,500 300-mer oligos from Twist Bioscience. Processing followed the same protocol as detailed previously. Briefly, the OLS pool was resuspended to a concentration of 19 ng/µL. Subpools were amplified using 18-20 cycles (as first determined using qPCR) with Kapa HiFi. Bulk amplification of each subpool was then carried out using 8-11 cycles (determined by qPCR) with 0.5 to 1 ng of template. Between 7 and 9 µg of bulk amplified DNA was put into the Nt.BspQI nick processing, with a yield ranging from 4.2 to 6.2 µg (corresponding to a range of molar yields of 52-75%).
Emulsion assembly and Suppression PCR: Briefly, the DropSynth reaction was carried out using 1.3 µg of processed DNA. After emulsion breaking, the correct length assemblies were (blind) size selected with an agarose gel. DNA was extracted from gel slices with an NEB Monarch DNA Gel Extraction Kit and eluted in 30 µL. Of this, 1 µL was used as template in a suppression PCR reaction (first on a qPCR) using 25-28 cycles. These reactions were cleaned up, quantified by Qubit, and 0.36 pmol was subsequently used in each Golden Gate reaction as described below.
Plasmid design and cloning of pHKGGl: Plasmid pSR348 containing the complete NarX SHK under the LacI inducible promoter was received from Addgene (#124713) and sequence verified using full plasmid nanopore sequencing. Golden Gate Assembly was used to change the antibiotic selection marker from spectinomycin resistance to carbenicillin resistance, creating the plasmid pSR348_Carb. Several clones were isolated and their plasmids extracted and sequence verified in the same manner as pSR348. After swapping the selection marker, the NarX CDS was replaced with the wild type EnvZ SHK (amplified from E. coli MG1655) to generate the plasmid EnvZ_pSR348_Carb. Finally, site directed mutagenesis was used to remove a KpnI restriction site for downstream cloning purposes, creating the final plasmid, renamed pHKGGl. A complete plasmid map and all primers used to generate pHKGGl are found in FIG. 18 and Table 2, respectively. All transformations after each cloning step were done in 10-Beta cells (NEB) unless otherwise noted.
Assembly barcode sequencing (MAS Iso-seq) and analysis: A three fragment Golden Gate assembly was used to clone the DropSynth generated libraries. The first fragment (A) of the assembly consisted of a variable segment composed of HK sensory domains produced from the degenerate DropSynth protocol. The overhang CATA was used at the start, where the final adenine base is the start of the ATG codon of the gene. The overhang GACG was used at the end, where GAC encodes residue D232 in the EnvZ protein. The second fragment (B) contained a conserved portion of the EnvZ gene corresponding to amino acid residues D232-G450 and a 24 base pair quasi-randomer barcode region (NNBBDDBBVVHHDDBBVVHNDDNN; SEQ ID NO: 37) downstream of the stop codon. The final fragment (C) was derived from pHKGGl and contained the p15A Ori, Amp selection marker, lac repressor, and Ptrc inducible promoter. The fragment (B) containing the conserved portion of EnvZ and the 24 bp barcode, along with the pHKGGl backbone, were both generated via PCR and their respective primers are tabulated in Table 1. The sequences of the three Golden Gate fragments are provided in Table 3.
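As an illustration of the quasi-randomer barcode pattern, the following Python sketch expands the IUPAC degenerate codes in SEQ ID NO: 37 into a regular expression and counts the nominal number of barcodes the pattern allows. The helper names (`iupac_to_regex`, `design_space`) are hypothetical; only the pattern itself and the standard IUPAC code meanings are taken as given.

```python
import re

# Standard IUPAC nucleotide codes used in the barcode pattern.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
}

BARCODE_PATTERN = "NNBBDDBBVVHHDDBBVVHNDDNN"  # SEQ ID NO: 37

def iupac_to_regex(pattern):
    """Compile a degenerate pattern into a regex with one class per position."""
    return re.compile("".join("[" + IUPAC[code] + "]" for code in pattern))

def design_space(pattern):
    """Number of distinct sequences the degenerate pattern can encode."""
    total = 1
    for code in pattern:
        total *= len(IUPAC[code])
    return total

barcode_re = iupac_to_regex(BARCODE_PATTERN)
n_possible = design_space(BARCODE_PATTERN)  # equals 4**5 * 3**19
```

A candidate barcode extracted from a read can then be checked with `barcode_re.fullmatch(candidate)`, which returns `None` for sequences that fall outside the designed pattern.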
Table 3. The sequences of the three fragments used in Golden Gate to make libraries in plasmid pHKGGl. The BsaI-HFv2 recognition sequence is underlined and the overhang sequence is in bold.
For each Degenerate DropSynth library, three Golden Gate reactions were run using the following temperature program: 1) incubation at 37°C for 20 hours; 2) heat inactivation of both enzymes at 80°C for 20 minutes; 3) final hold at 12°C. The molar amounts used were 0.18 pmol for FrgC (backbone), 0.36 pmol for FrgB (conserved region of EnvZ and barcode), and 0.36 pmol for FrgA (degenerate DropSynth synthesized library), for a total of 470 ng of DNA in the reaction. The reagent volumes were 2.5 µL of 10x T4 DNA Ligase Buffer (NEB), 2.5 µL of 10 mM ATP (NEB), 0.25 µL T4 DNA Ligase (NEB), 0.75 µL BsaI-HFv2 (NEB), and water to bring the total volume to 25 µL. Assemblies were then pooled and cleaned using the Monarch PCR and DNA Cleanup Kit (NEB) and then drop dialyzed on a 0.05 µm membrane filter (Sigma Millipore) for a minimum of 30 minutes. Purified DNA was used as the input for two electroporations (BioRad MicroPulser), which were then combined and plated. Serial dilutions were used to calculate the total number of CFUs, which are listed in Table 4.
Table 4. The number of CFUs observed after transformation.
Assembly barcode sequencing (MAS Iso-seq) and analysis: Each library was PCR amplified with one of the HK_PB_O#_FWD+REV primer pairs using Q5 DNA polymerase and 10 ng of template (miniprepped plasmid) in a 50 µL reaction for 11-12 cycles. To submit samples for MAS Iso-seq, the four libraries were mixed into a single 30 µL sample containing 1.25 ng/µL per library (5 ng/µL total) and submitted to the University of Oregon GC3F core for library preparation and sequencing. Briefly, the PacBio MAS-Seq for 10x Single Cell 3’ kit (102-659-600) was used to generate arrays for sequencing on a Sequel II instrument, producing 120.20 Gbases of unique molecular data in 6.14 million raw reads. These samples were also sequenced by Plasmidsaurus using Oxford Nanopore sequencing with R10.4.1 flowcells and v14 chemistry, and basecalled with Guppy 6.5.7. This produced between 10,592,215 and 12,245,627 reads for each sample.
For the MAS Iso-seq data, skera (0.1.0) was used to split the MAS arrays within 1,268,200 CCS reads, producing 15,907,686 split segments. Lima was then used to demultiplex the split segments, resulting in 2,740,434 to 4,826,815 reads per library. For both MAS Iso-seq and Oxford Nanopore data, a custom Python script was used to first identify the constant regions flanking the barcode (GTCGCTGCCGAACAGC-24N-AGGAGAAGAGCGCACG; SEQ ID NO: 38), allowing up to 3 mismatches. Then each read was scanned for the presence of the NdeI (CATATG) site at the start codon and the conserved region in the vector immediately flanking the cloning site (GACGACCGCACGCTGCTG; SEQ ID NO: 39), which corresponds to residues 232-237 (DDRTLL; SEQ ID NO: 56) of envZ. The script outputs this variable region and each associated barcode, as well as the barcode counts. Barcode counts were input into starcode (1.4) for collapse with a distance of 1 using the sphere algorithm. A consensus call was made for each barcode using a simple majority call. Genes were aligned to their closest parent sequence using minimap2 with k-mers set to 10. To assign reads to degenerate variants, the last 16 bp of each variable region was taken. Perfect matches to designs were taken as is, and for the remainder the Levenshtein distance was calculated between each 16 bp end sequence and all degenerate variants for that parent sequence. The read was assigned based on the smallest Levenshtein distance. In case of a tie (rare), a random assignment was made. All subsequent analysis and plots were carried out in R (4.3.1).
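The variant assignment rule in the preceding paragraph (exact match first, otherwise smallest Levenshtein distance, random choice on ties) might be sketched in Python as follows. This is an illustration only, not the custom script used in the example; the function names and the toy design sequences are hypothetical.

```python
import random

def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def assign_variant(end_seq, designs, rng):
    """Assign a 16 bp end sequence to one degenerate variant of its parent.

    designs maps a variant name to its designed 16 bp end sequence.
    """
    for name, seq in designs.items():
        if seq == end_seq:                # perfect matches are taken as is
            return name
    dists = {name: levenshtein(end_seq, seq) for name, seq in designs.items()}
    best = min(dists.values())
    tied = sorted(name for name, d in dists.items() if d == best)
    return rng.choice(tied)               # random assignment on ties

# Toy designs for one parent (made-up sequences).
designs = {
    "deg0": "ACGTACGTACGTACGT",
    "deg1": "ACGTACGTTTTTACGT",
}
rng = random.Random(1)
exact = assign_variant("ACGTACGTTTTTACGT", designs, rng)
nearest = assign_variant("ACGTACGTACGAACGT", designs, rng)
```

Passing an explicit `random.Random` instance keeps the tie-breaking reproducible between runs.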
Example 2 Degenerate DropSynth Gene Synthesis
Increased levels of degenerate oligos have a minimal impact on the percentage of perfects among assembled genes. The four libraries were successfully assembled (FIGS. 2A-2B) using the standard DropSynth protocol, Golden Gate cloned into a randomly barcoded expression vector, and sequenced using both MAS Iso-seq (PacBio Sequel II) and Oxford Nanopore. Among genes with at least 100 barcodes, a median of 16.6% and 17.3% DNA perfects were found for the 4 oligo library codon 1 and codon 2, respectively (FIG. 2C, top). The impact of adding increasing numbers of degenerate oligos on the rates of perfects observed was explored. Comparing the rates of genes from the no degeneracy set to those with degeneracy levels of 2, 4, 6, or 8, no statistically significant differences were found except for Codon 2 degeneracy level 6, which showed a weakly significant decrease (p=0.014). These results suggest that, for the 4 oligo libraries, the presence of additional assembly oligos has little to no impact on the percentage of perfects. For the five oligo assemblies there was a median of 10.2% and 13.0% DNA perfects for codon 1 and codon 2, respectively (FIG. 2C, bottom). For the codon 1 library there were significant decreases for degeneracy levels of 4 (8.2%, p=0.02), 6 (7.6%, p=0.02), and 8 (6.4%, p=9E-6) relative to no degeneracy. A similar pattern was observed in the codon 2 library, with significant decreases for degeneracy levels of 4 (11.2%, p=0.0004), 6 (11.6%, p=0.0004), and 8 (11.6%, p=0.03) relative to the no degeneracy case. These decreases are attributed to the imbalance in the final representation between the variants from different degeneracy levels, which is exponentially increased by PCR amplification and can be observed when plotting the perfects rate against the number of barcodes observed (FIG. 8). Comparing the rates seen at the DNA level to those at the amino acid level, where synonymous mutations have been collapsed onto the parent sequence, there was a consistent 4.0% (s.d. 0.2%) higher rate at the protein level for 4 oligo assemblies and a 2.7% (s.d. 0.8%) increase for 5 oligo assemblies, as shown in FIG. 9. As expected, this suggests that the fraction of synonymous mutations decreases as length increases.
The relative error frequencies per kbp were determined for different types of errors by analyzing the CIGAR alignment strings. Equivalent frequencies for insertions (median 1.47, s.d. 0.22) and single base deletions (median 1.44, s.d. 0.25) were found, as shown in FIG. 10. The mismatch frequencies were also similar in the 4-oligo libraries (median 1.47), but much higher in the 5-oligo libraries (median 4.56). The frequency of multi-base deletions was high across all libraries (median 5.75, s.d. 2.08). The deletion lengths were further explored and, while single base deletions were by far the most common (FIG. 11A), the relatively large number of long deletions means that at any given base there is a higher probability that a deletion is part of a larger multi-base deletion rather than a single-base isolated deletion (FIG. 11B). The higher rates of deletions were consistent with previous reports using microarray derived oligos (Kosuri et al., Nat. Methods 11:499-507, 2014). Further investigation is required to determine whether the primary source of the long multi-base deletions is oligo synthesis or the ePCA. Note that these numbers were derived with minimap2, a general purpose aligner, as opposed to an exhaustive global alignment like Needleman-Wunsch.
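A minimal Python sketch of how error classes can be tallied from a CIGAR string is given below. It rests on several assumptions not stated in the text: that the alignment uses the extended `=`/`X` operators (with plain `M` operators, mismatches cannot be distinguished from matches), that insertions and deletions are counted as events rather than bases, and that frequencies are normalized by the reference-aligned length. The function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def error_counts(cigar):
    """Tally error events from one CIGAR string.

    Returns (counts, ref_len), where ref_len is the number of reference
    bases covered by the alignment.
    """
    counts = {"insertion": 0, "single_del": 0, "multi_del": 0, "mismatch": 0}
    ref_len = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            counts["insertion"] += 1       # one event, consumes no reference
        elif op == "D":
            key = "single_del" if n == 1 else "multi_del"
            counts[key] += 1
            ref_len += n
        elif op == "X":
            counts["mismatch"] += n        # each mismatched base counted
            ref_len += n
        elif op in ("M", "=", "N"):
            ref_len += n
        # S, H and P consume no reference bases and are ignored here.
    return counts, ref_len

def per_kbp(counts, ref_len):
    """Convert raw event counts into frequencies per 1000 reference bases."""
    return {k: 1000.0 * v / ref_len for k, v in counts.items()}

counts, ref_len = error_counts("100=1X50=2I30=1D20=5D44=")
freqs = per_kbp(counts, ref_len)
```

Deletion lengths can be collected in the same loop by storing `n` for every `D` operation, which is enough to build a length histogram of the kind described above.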
The percentage of perfects had a strong inverse relationship with length, as errors in the oligos propagate and combine through the assembly process. The genes from all degeneracy levels were plotted together as a function of length (FIG. 2D), and the median rate fell from 18.2% (s.d. 4.7%) for the 17 genes with lengths below 650 bp to 8.2% (s.d. 5.2%) for the 22 genes above 950 bp. A simple model was created to fit this data which combines errors due to oligo synthesis, PCR amplification, and ePCA assembly. With an error rate of 5.52E-6 errors per base per cycle, Kapa HiFi amplification of the oligos (27-29 cycles) and assembled genes (25-28 cycles) has a relatively modest impact on the percentage of perfects. For a 1 kbp gene, 68% of genes would be expected to remain perfect when accounting for all PCR amplification. For oligo synthesis with a 1 in 3000 bp error rate (based on vendor provided rates), a 61% probability is expected that all 5 oligos (300-mers) are perfect, which would reduce to 47% if the error rate increases to 1 in 2000 bp. This calculation is pessimistic, as errors in some oligo regions (primers, restriction sites, overlaps, etc.) will have a strong selection pressure against propagation during the assembly process. Combining the expected effects of PCR and oligo synthesis, an estimated 41% perfects rate is found. Since the observed perfects rate is around 8.2% at 1 kbp, this suggests that either the ePCA process is the largest contributor to errors, or the error rates used for PCR and oligo synthesis should be higher.
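The arithmetic behind these estimates can be sketched as follows. The text does not spell out how the terms are combined, so this Python sketch makes its own assumptions: errors are independent per base, PCR errors accumulate over the full length of all five 300-mer oligos during oligo amplification and over the 1 kbp gene during suppression PCR, and the midpoints of the quoted cycle ranges are used. Under those assumptions the results land close to the 68%, 61%, 47%, and 41% figures quoted above.

```python
# Values taken from the text.
PCR_ERROR = 5.52e-6   # errors per base per cycle (Kapa HiFi)
OLIGO_LEN = 300       # nt per oligo
N_OLIGOS = 5
GENE_LEN = 1000       # bp

# Assumed midpoints of the quoted cycle ranges.
OLIGO_CYCLES = 28     # 27-29 cycles of oligo amplification
GENE_CYCLES = 26.5    # 25-28 cycles of suppression PCR

def p_perfect(error_per_base, n_bases):
    """Probability that n_bases independent bases all remain error-free."""
    return (1.0 - error_per_base) ** n_bases

def pcr_perfect():
    """Fraction surviving all PCR steps, counted in base-cycles of exposure."""
    exposure = N_OLIGOS * OLIGO_LEN * OLIGO_CYCLES + GENE_LEN * GENE_CYCLES
    return p_perfect(PCR_ERROR, exposure)

def synthesis_perfect(bases_per_error):
    """Probability that all oligos are synthesized without error."""
    return p_perfect(1.0 / bases_per_error, N_OLIGOS * OLIGO_LEN)

pcr = pcr_perfect()                  # roughly 0.68-0.69
synth_3000 = synthesis_perfect(3000) # roughly 0.61
synth_2000 = synthesis_perfect(2000) # roughly 0.47
combined = pcr * synth_3000          # roughly 0.41-0.42
```

Dividing the observed perfects rate by `combined` gives a rough upper bound on the fraction of molecules that survive ePCA without acquiring an error, under the same independence assumption.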
Decreases in observed coverage are attributed to reduced representation at higher degeneracy levels. The coverage for all libraries and degeneracy levels was examined, where coverage is defined by the number of variants for which a perfect protein sequence is observed at least once. Both PacBio and Nanopore data were combined to maximize the sequencing depth of the data. For the 4 oligo libraries, 84% and 71% coverage was observed for a degeneracy of 1, which reduces to 63% and 45% by a degeneracy of 8, as shown in FIG. 3A. When both codon versions are combined over proteins, this improves to 93% for a degeneracy of 1 and 76% by degeneracy level 8. Although the percentage coverage fell, on average, to 0.69x of its degeneracy 1 value, this was far less than the 8-fold increase in scale due to the degeneracy. On an absolute basis, the total number of genes observed increases from 244 and 207 at a degeneracy level of 1 to 1448 and 1026 by a degeneracy level of 8, due to the increased scale, for the 4 oligo libraries, which have a roughly uniform distribution of microbead barcodes (median 292, s.d. 22) among different degeneracy levels (FIG. 7). For the 5 oligo libraries, 49% and 70% coverage was observed for a degeneracy of 1, which reduces to 20% and 50% by a degeneracy of 8. When combining the two codon versions, this improves to 82% for a degeneracy of 1 and 58% by degeneracy level 8.
We looked for trends that could explain this drop in coverage. In comparing the fraction of barcodes observed to the corresponding fraction of overall designs as a function of degeneracy level (FIG. 12), an exponential decrease was observed with increasing degeneracy. The decay still occurs when these numbers are scaled by the degeneracy level to account for lower amounts of DNA (FIG. 13). It is hypothesized that this is due to the suppression PCR amplification which occurs post assembly. Briefly, each of the barcoded beads has a limited loading capacity. As such, it is expected, on average, that the amount of DNA per variant after the DropSynth assembly reaction will be inversely proportional to the degeneracy level. In other words, a variant made in a droplet with a degeneracy level of 8 will have, on average, 8-fold less assembled DNA. The subsequent suppression PCR exponentially amplifies these differences. Indeed, a simple model of PCR amplification fits the log transformed relationship between barcodes observed per variant and expected variant concentration quite well, with R² values ranging from 0.945 to 0.998 (FIG. 14). Here the expected variant concentration is taken to be the total number of barcoded microbeads with a given degeneracy level divided by the total number of variants at that level. This effect highlights the importance of using the same level of degeneracy across all variants in a degenerate DropSynth reaction. With a full 1536x library entirely at a degeneracy level of 8 (12,288 variants), it is estimated that the protein coverage will be at least ~8,000 for a single library and ~9,000 when combined over two different codon versions for 4 oligos, and ~6,000 for a single library and ~7,000 when combined over two for 5 oligos, depending on sufficient sequencing depth.
This estimate is conservative as it does not account for the expected improvement when only a single consistent degeneracy is used among all barcodes in the library.
The uniformity in the representation of the variants was determined. Previous libraries synthesized with DropSynth showed a Gini coefficient (ranging from perfect equality at 0 to perfect inequality at 1) in the range of 0.69 to 0.94. The Gini coefficient at each degeneracy level for all four libraries is shown in FIG. 15. This ranged from 0.71 to 0.88 (median 0.81, s.d. 0.06) for the 4 oligo libraries with a slightly increasing trend as degeneracy increases. For the 5 oligo libraries, higher (less uniform) values were seen at a range of 0.83 to 0.93 (median 0.87, s.d. 0.04). These values are in line with previous observations suggesting the increased degeneracy has a minor effect relative to other factors such as PCR bias.
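For reference, a Gini coefficient can be computed from per-variant barcode counts with a few lines of Python. This sketch uses the standard sorted-rank formula; it is an illustration and not necessarily the exact calculation used for FIG. 15, and the function name and toy counts are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every variant is equally represented; values approach 1 as
    the counts become concentrated in a few variants.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

uniform = gini([5, 5, 5, 5])   # perfectly even library
skewed = gini([0, 0, 0, 10])   # all barcodes on a single variant
```

With this finite-sample form the maximum for n variants is (n - 1)/n, so the value only approaches 1 for large libraries. Variants with zero observed barcodes should be included as zeros, otherwise dropouts make the library look more uniform than it is.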
Barcode mapping of constructs over 500 bp requires long read sequencing, beyond what can be achieved with Illumina sequencing. Use of Oxford Nanopore sequencing as a viable alternative to PacBio MAS-Iso-seq was explored. Although the raw reads have a higher error rate, previous studies have shown this can be reduced significantly by collapsing reads on barcodes (Karst et al., Nat. Methods 18:165-169, 2021; Zurek et al., Nat. Commun. 11:6023, 2020). A read-based majority call was used to determine consensus for each barcode. In comparing the percentage perfects observed through each sequencing method, there was consistently a higher rate for the PacBio data (median 7.8%, s.d. 4.2%), as shown in FIGS. 16A-16B. This highlights the gap in error rates between the two platforms and suggests that more complex consensus calling approaches and likely higher sequencing depths are required to reach parity between the two methods.
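One possible reading of the read-based majority call is sketched below in Python: reads are grouped by barcode, and a consensus is emitted only when a single full-length sequence accounts for more than half of the reads for that barcode. The text does not state whether the call was made per read or per position, nor how ties were handled, so the strict-majority rule, the function name, and the toy reads are all assumptions.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads):
    """Collapse (barcode, sequence) pairs into one consensus per barcode.

    A barcode gets a consensus only if one sequence makes up a strict
    majority of its reads; otherwise it is left out of the result.
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        seq, n = Counter(seqs).most_common(1)[0]
        if 2 * n > len(seqs):
            consensus[barcode] = seq
    return consensus

# Toy reads: BC1 has a clear majority, BC2 is split evenly.
reads = [
    ("BC1", "ATGGCTAAA"),
    ("BC1", "ATGGCTAAA"),
    ("BC1", "ATGGCTAAT"),
    ("BC2", "ATGCCCAAA"),
    ("BC2", "ATGCCGAAA"),
]
calls = consensus_by_barcode(reads)
```

A per-position vote over aligned reads would tolerate scattered sequencing errors better than this whole-read vote, which is one form the "more complex consensus calling approaches" mentioned above could take.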
Example 3 Exemplary Protocol
Layout of the barcoded beads is shown in FIG. 19.
Assembly
Rehydrate the OLS Chip
• Resuspend OLS chip in 100 µL of either Qiagen EB (10 mM Tris-Cl, pH 8.5) or 100 µL 1x TE.
• ~10 pmol/100 µL = 100 nM
• Vortex & spin.
• Heat to 50°C if necessary.
• Check concentration with a ssDNA Qubit fluorometer kit.
• Make a 1/10 OLS dilution
• 10 µL OLS to 90 µL 10 mM Tris-Cl, pH 8.5
• 1/10 = 10 nM tube
qPCR Amplification
When amplifying libraries limit the number of cycles to prevent overamplification.
Library amplification is a two-step process:
1. Carry out a qPCR reaction to determine the number of cycles where you are still in exponential phase.
2. Do a regular PCR reaction with the EXACT same conditions (polymerase, primer and template amounts) using the number of cycles determined using qPCR.
Some important points about the qPCR:
1. If your number of cycles is very high, this could indicate a diversity bottleneck.
Recall that:
N_x = N_0 (1 + E)^x
i. Where N_x is the final number of molecules after x cycles
ii. N_0 is the initial number of dsDNA molecules
iii. E is the PCR efficiency
iv. If the template is ssDNA use (x-1)
2. Always run no-template controls (NTC) so that you can be sure you are seeing your library and not primer dimers.
3. Saturation is due to competitive inhibition of DNA polymerase by dsDNA, not depletion of primers, NTPs, or loss of polymerase activity.
3. Run the amplification on a gel to verify that it's not overamplified. Look for high-molecular weight products indicative of overamplification.
Begin by amplifying each library with a qPCR and determining the maximum number of cycles in the exponential stage. Annealing temp set to 60°C. Typically 12-24 cycles are needed.
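The amplification relation recalled above can be turned into a quick estimate of molecule numbers and cycle counts. The Python sketch below is for planning only; the function names are hypothetical, the efficiency is assumed constant across cycles (which holds only in exponential phase), and the cycle number used at the bench should still come from the qPCR trace.

```python
import math

def molecules_after(n0, efficiency, cycles, single_stranded=False):
    """N_x = N_0 * (1 + E)^x, using (x - 1) for an ssDNA template."""
    effective = cycles - 1 if single_stranded else cycles
    return n0 * (1.0 + efficiency) ** max(effective, 0)

def cycles_to_reach(n0, target, efficiency, single_stranded=False):
    """Smallest whole number of cycles giving at least `target` molecules."""
    cycles = math.ceil(math.log(target / n0) / math.log(1.0 + efficiency))
    cycles = max(cycles, 0)
    # An ssDNA template spends its first cycle making the second strand.
    return cycles + 1 if single_stranded else cycles

after_10 = molecules_after(1000, 1.0, 10)           # dsDNA template
after_10_ss = molecules_after(1000, 1.0, 10, True)  # ssDNA template
needed = cycles_to_reach(1000, 1e6, 1.0)
needed_ss = cycles_to_reach(1000, 1e6, 1.0, True)
```

If the qPCR requires many more cycles than such an estimate predicts for the expected template amount, that is consistent with the diversity bottleneck warning above.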
Subpool Amplification Protocol:
1. qPCR amplify each subpool: 2X tubes
a. qPCR protocol (70 sec/cycle + 2 min):
i. 45 sec 98°C initial denaturation
ii. 15 sec 98°C denaturation
iii. 30 sec 60°C annealing
iv. 15 sec 72°C extension
v. Plate read
vi. Go to step 2, repeat 35x
vii. 1 min 72°C final extension
b. Setup the reaction as follows:
i. 1 µL template (1/10 OLS pool dilution)
ii. 1.25 µL AmpF 10 µM
iii. 1.25 µL AmpR 10 µM
iv. 21.5 µL UltraPure Distilled Water (Invitrogen)
v. 25 µL Kapa Sybr Fast
vi. TOTAL: 50 µL
2. Amplify each subpool using Kapa HiFi.
a. 1 µL template (1/10 OLS pool dilution)
b. 1.25 µL subpool specific primer 10 µM ampF
c. 1.25 µL subpool specific primer 10 µM ampR
d. 21.5 µL UltraPure Distilled Water (Invitrogen)
e. 25 µL Kapa HiFi
f. TOTAL: 50 µL
PCR protocol:
i. 45 sec 98°C initial denaturation
ii. 15 sec 98°C denaturation
iii. 30 sec 60°C annealing
iv. 15 sec 72°C extension
v. Go to step 2, repeat based on the number of cycles determined by qPCR.
vi. 1 min 72°C final extension
3. Column purify amplified oligos.
4. Measure concentrations on qubit.
5. Run PCR products on gel. Look for higher MW products, indicative of overamplification. Excessive low MW products may indicate chip synthesis issues.
6. Size select, using gel extraction, if necessary.
7. Create 1 ng/µL dilutions of each amplified subpool.
Bulk Amplification
8. Bulk amplify subpools.
a. Run a second PCR using a biotinylated FWD amplification primer, with sufficient tubes to make 5 µg to 10 µg of PCR product. Run 24x tubes per lib.
i. 1 µL of 1 ng/µL subpool dilution
ii. 1.25 µL subpool specific primer mix 10 µM biotinylated ampF
iii. 1.25 µL subpool specific primer mix 10 µM (NOT biotinylated) ampR
iv. 21.5 µL UltraPure Distilled Water (Invitrogen)
v. 25 µL Kapa HiFi
vi. TOTAL: 50 µL
vii. PCR protocol:
1. 45 sec 98°C initial denaturation
2. 15 sec 98°C denaturation
3. 30 sec 60°C annealing
4. 15 sec 72°C extension
5. Go to step 2, 18X
6. 1 min 72°C final extension
b. Pool and column purify. Note how many tubes were mixed and the elution volume.
c. Qubit measure concentration.
Nicking
9. Nicking.
a. Nick the bulk amplified subpools. Split the following across multiple tubes depending on the amount of DNA to be processed. In each 1.5 mL tube add:
i. 15 µL Nt.BspQI (10 U/µL) (New England Biolabs) add last
ii. 5 to 10 µg of DNA
iii. 15 µL NEBuffer3.1 (New England Biolabs)
iv. UltraPure Distilled Water (Invitrogen) to 150 µL total
b. Leave at 50°C overnight with shaking > 1500 RPM.
Capture and remove the short biotinylated fragment.
o Wash 50 µL streptavidin M270 Dynabeads (Invitrogen) for each 1.5 mL tube in the nicking reaction, as per manufacturer’s instructions, and resuspend in 10 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA buffer. There is actually no need to use streptavidin M270 Dynabeads; any cheaper polydisperse streptavidin magnetic beads will do.
o Add 50 µL of washed beads to the 150 µL nicking reaction in each tube.
o Incubate at 55°C with 800 RPM shaking for at least 1 hour.
o Move all 1.5 mL tubes to a 55°C water bath.
o Place the tube so that the solution is just below the surface of the water. Hold a strong magnet underwater against the side of the tube to magnetically separate the Dynabeads. Pipette the supernatant, which contains the processed oligos, and save it in a new container. Remove the tube with the Dynabeads from the magnet.
o Add 100 µL of UltraPure Distilled Water (Invitrogen) to the tube and resuspend the beads. Incubate these at 55°C for another 30 min and then repeat the procedure to recover the supernatant again while leaving the Dynabeads behind.
o Repeat this procedure for all tubes as necessary.
o Pool processed oligos (supernatant) for each subpool and column purify.
Capture
2. Capture processed oligos with barcoded beads.
o Take 20 µL of the pooled barcoded beads. These are stored in 2X B&W buffer (high ionic concentration), which may interfere with the ligation reaction. Resuspend them in 20 µL UltraPure Distilled Water (Invitrogen).
o Mix the processed DNA with the barcoded beads:
■ 1.3 µg processed DNA (~12 pmol)
■ 20 µL pooled barcoded beads (~6 million beads, binding capacity 1.3 µg DNA)
■ 10 µL 10X Taq ligase buffer (New England Biolabs)
■ 4 µL Taq ligase (40 U/µL) (New England Biolabs)
■ UltraPure Distilled Water (Invitrogen) to 100 µL
o Overnight cycling (>2 hr incubation at each of the following temperatures) (9 hr); use shaking to prevent the beads from settling:
■ 3 hours @ 50°C
■ Ramp to 40°C for 3h, 0.1°C/min
■ Ramp to 30°C for 3h, 0.1°C/min
o Wash 6 times at RT using 2X B&W buffer (optionally with NP-40). This is important for removing unbound oligos in order to increase specificity.
o Resuspend in 100 µL Elution Buffer (Qiagen) (~60k beads/µL)
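The bead concentration and cycling time quoted in the capture step can be checked with a few lines of arithmetic. The Python sketch below does this under one stated assumption: each 3-hour block at 40°C and 30°C includes its 0.1°C/min ramp, which is the reading that agrees with the 9 hr total. The helper names are hypothetical.

```python
def beads_per_ul(n_beads, volume_ul):
    """Bead concentration after resuspending n_beads in volume_ul."""
    return n_beads / volume_ul


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to move between two temperatures at a fixed ramp rate."""
    return abs(start_c - end_c) / rate_c_per_min


def capture_schedule_hours(block_hours=3.0, n_blocks=3):
    """Total cycling time, assuming each block includes its ramp (assumption)."""
    return block_hours * n_blocks


if __name__ == "__main__":
    n_beads = 6_000_000  # ~6 million beads in 20 µL of pooled barcoded beads
    print("stock beads/µL:", beads_per_ul(n_beads, 20.0))
    print("after resuspension in 100 µL:", beads_per_ul(n_beads, 100.0))
    ramp = ramp_minutes(50.0, 40.0, 0.1)
    print("ramp 50 -> 40 °C (min):", ramp)
    print("hold left in a 3 h block (min):", 3 * 60 - ramp)
    print("total cycling (h):", capture_schedule_hours())
```

Six million beads in 100 µL gives 60,000 beads/µL, matching the ~60k beads/µL stated for the resuspended beads.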
DropSynth Reaction
3. Emulsion assembly (ePCA).
o Set up the emulsion. All of this procedure should be done on ice. FWD and REV assembly primers contain ITR overhangs which will be used for single-primer suppression PCR. Add BtsI-v2 only at the very last step. Try to minimize the time between adding the BtsI-v2 and vortexing the emulsion.
■ 40 µL of loaded beads (~500 ng DNA)
■ 0.5 µL 100 µM AsmF_40bpITR Suppression_501-503-504F
■ 0.5 µL 100 µM AsmR_40bpITR Suppression_501-503-504R
■ 50 µL KAPA HiFi 2X Mastermix (KAPA Biosystems)
■ 1 µL BSA (New England Biolabs)
■ 1 µL UltraPure Distilled Water (Invitrogen)
■ 7 µL BtsI-v2 (New England Biolabs) (add last)
■ TOTAL: 100 µL
o Mix at low speed in a vortexer to resuspend the beads.
o Add 600 µL Droplet Generation Oil for EvaGreen (Bio-Rad) (or a similar emulsion oil) to a 1.5 mL non-stick tube.
o Add the 100 µL aqueous phase to the bottom of the oil phase.
o Vortex at max speed for 3 minutes in a foam holder that has been taped down. If doing multiple emulsions, do this one at a time. We use a Vortex Genie 2 (Scientific Industries) at max speed.
o After vortexing all emulsions, transfer each emulsion into PCR tubes with 100 µL in each tube. Use a P1000 tip to avoid disturbing the emulsion. Most of the droplets will float to the top of the tube; try to get as much of this as possible and distribute it over multiple PCR tubes.
o PCR Cycling
■ 55°C for 90 min (allow BtsI-v2 to cleave DNA from the beads)
■ 94°C for 2 min (initial denaturing)
■ 94°C for 15 sec (denaturing)
■ 60°C for 20 sec (annealing)
■ 72°C for 45 sec (extension)
■ Go to step 3 for additional 60 cycles
■ 72°C for 5 min (final extension)
■ 4°C forever
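As a quick consistency check on the ePCA setup above, the following Python sketch sums the aqueous components, estimates how many 100 µL PCR tubes the emulsion fills, and adds up the programmed hold times. It assumes that "additional 60 cycles" means 61 passes through the three-step cycle in total, and it ignores instrument ramp times. All names are illustrative.

```python
import math

# Aqueous phase components in µL, as listed in the emulsion setup.
AQUEOUS_UL = {
    "loaded beads": 40.0,
    "AsmF primer (100 µM)": 0.5,
    "AsmR primer (100 µM)": 0.5,
    "KAPA HiFi 2X Mastermix": 50.0,
    "BSA": 1.0,
    "water": 1.0,
    "BtsI-v2 (add last)": 7.0,
}
OIL_UL = 600.0
TUBE_FILL_UL = 100.0


def aqueous_total_ul():
    """Sum of all aqueous components."""
    return sum(AQUEOUS_UL.values())


def nominal_pcr_tubes():
    """Upper bound on PCR tubes if no emulsion is lost during transfer."""
    return math.ceil((aqueous_total_ul() + OIL_UL) / TUBE_FILL_UL)


def programmed_seconds(extra_cycles=60):
    """Hold time of the program, excluding ramps and the final 4°C hold."""
    cleave = 90 * 60            # 55°C, BtsI-v2 cleavage
    initial_denature = 2 * 60   # 94°C
    one_cycle = 15 + 20 + 45    # denature + anneal + extend
    final_extension = 5 * 60    # 72°C
    total_cycles = 1 + extra_cycles  # assumption: first pass plus 60 repeats
    return cleave + initial_denature + total_cycles * one_cycle + final_extension


if __name__ == "__main__":
    print("aqueous total (µL):", aqueous_total_ul())
    print("nominal PCR tubes:", nominal_pcr_tubes())
    print("programmed time (min):", round(programmed_seconds() / 60, 1))
```

The nominal count of seven tubes agrees with the seven ePCA tubes referred to in the emulsion-breaking step, where the last tube is only partly filled.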
4. Break the emulsion using chloroform
o After ePCA, split the 7 ePCA tubes containing the DS reaction between two 1.5 mL microcentrifuge tubes (one of the 7 ePCA tubes should only have ~50 µL inside), and let them sit for 10 min to allow for phase separation. Using a P1000, extract nearly all of the oil (lower phase) from each microcentrifuge tube. This is done as described below:
Emulsion breaking protocol for Bio-Rad Droplet Generation Oil (adapted from pg 69 of the Bio-Rad Droplet Digital PCR Applications Guide):
1. Press a P1000 down to its first stop, push through the droplets to the bottom of the tube, press down to the second stop to expel any droplets, then wait several seconds for the droplets to float back up to the droplet layer, and finally aspirate out the oil. You do not need to remove every last bit of oil. It is better to leave a small amount to avoid removing any of the DNA.
2. Add 50 µL of TE buffer for each 100 µL of PCR reaction in the two 1.5 mL snap cap microcentrifuge tubes (should have ~350 µL in each tube).
3. In a fume hood, add 175 µL of chloroform for each PCR reaction in the tube. (If there are 4 PCR reactions in a tube, the contents will be: <400 µL PCR reactions, 200 µL TE, 700 µL chloroform.)
4. Vortex at maximum speed for 1 min.
5. Apply enough phase lock gel to the top of each microcentrifuge tube to cover the opening completely (the amount of gel should be slightly larger than the cap for an IDT oligo tube).
6. In a centrifuge, spin down at 15,500 x g for 10 min.
7. Remove the upper aqueous phase by pipetting, avoiding the chloroform phase. Transfer this to a clean 1.5 mL tube (should have ~300 µL of DNA solution).
8. Dispose of the chloroform phase appropriately as hazardous waste.
9. Clean the 300 µL of DNA by combining it with 600 µL of NEB DNA Binding Buffer and loading onto a Mini Prep Column. Wash twice with 200 µL of NEB DNA Wash Buffer. Elute in 30 µL of NEB Elution Buffer (heated to 50°C).
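The TE and chloroform volumes in the emulsion-breaking steps scale with the number of 100 µL PCR reactions combined in each microcentrifuge tube. A minimal Python sketch of that scaling is given below; the function name and the 1.5 mL capacity check are illustrative additions, not part of the Bio-Rad protocol.

```python
TE_UL_PER_100UL_PCR = 50.0      # step 2: 50 µL TE per 100 µL of PCR reaction
CHLOROFORM_UL_PER_PCR = 175.0   # step 3: 175 µL chloroform per PCR reaction
PCR_REACTION_UL = 100.0
TUBE_CAPACITY_UL = 1500.0       # 1.5 mL microcentrifuge tube


def breaking_volumes(n_reactions):
    """Volumes to add to one tube holding n_reactions PCR reactions.

    n_reactions may be fractional, e.g. 3.5 when seven tubes are split in two.
    The PCR volume is an upper bound because most of the oil is removed first.
    """
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    pcr_ul = n_reactions * PCR_REACTION_UL
    te_ul = pcr_ul / 100.0 * TE_UL_PER_100UL_PCR
    chloroform_ul = n_reactions * CHLOROFORM_UL_PER_PCR
    total_ul = pcr_ul + te_ul + chloroform_ul
    return {
        "max_pcr_ul": pcr_ul,
        "te_ul": te_ul,
        "chloroform_ul": chloroform_ul,
        "max_total_ul": total_ul,
        "fits_in_tube": total_ul <= TUBE_CAPACITY_UL,
    }


if __name__ == "__main__":
    # Reproduces the worked example in step 3 (4 PCR reactions in one tube).
    print(breaking_volumes(4))
    # Seven ePCA tubes split evenly between two microcentrifuge tubes.
    print(breaking_volumes(3.5))
```

For four reactions this returns 200 µL TE and 700 µL chloroform, the same figures as the example in step 3.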
Blind Extraction
1. Prepare a 2% agarose gel stained with Sybr Safe.
2. For each library, load 15 µL of cleaned ePCA reaction (from above) directly between two 100 bp DNA ladders (e.g., Lane 1: 100 bp ladder, Lane 2: sample, Lane 3: 100 bp ladder). ***Only 15 µL of the library is loaded, in case something downstream fails.***
3. Run the gel in FRESH TAE buffer at 100V for 1 hour and 15 minutes.
4. Extract the region of the gel where your expected library products should be. You should excise a region slightly larger than the precise mean length of your library (+/- 50 bp).
5. Follow the NEB Agarose Gel Extraction protocol. After adding the dissolving buffer, shaking at 50°C for 3 min, and loading onto a DNA Clean Up Column, do the first spin at 10,000 RCF (this helps with the binding efficiency). The centrifuge can then be ramped up to 16,000 RCF for the subsequent wash steps.
6. Elute in 12 µL.
7. Use eluted DNA as template for 2 qPCR reactions listed below using Kapa Sybr Fast DNA polymerase.
8. You may want to run qPCR products on a gel to confirm the desired library products are present.
Suppression PCR
5. Single-primer suppression PCR.
o In this technique, self-annealing of inverted terminal repeats (ITRs) flanking the assembled genes competes with the annealing of a single primer which aligns to part of the ITR3. Shorter by-products tend to self-anneal, while correct assembly products anneal to the primer, resulting in proper amplification.
■ 1 µL template
■ 4 µL 10 µM suppression primer
■ 25 µL Kapa HiFi
■ UltraPure Distilled Water (Invitrogen) to 50 µL
■ PCR protocol:
1. 3 min 95°C initial denaturation
2. 15 sec 98°C denaturation
3. 30 sec 58°C annealing
4. 15 sec 72°C extension
5. Go to step 2, determine cycles using qPCR.
6. 1 min 72°C final extension
Primer is
o Column purify using a DNA Clean & Concentrator.
o Check the size distribution on a gel or TapeStation.
o Quantify the DNA (Qubit) and proceed to downstream applications.
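The suppression PCR recipe above fixes the template, primer, and polymerase volumes and tops up with water to 50 µL. The short Python sketch below works out the water volume and the final primer concentration implied by those numbers. The function name and its parameters are hypothetical, and the default values are simply the ones listed in the recipe.

```python
def suppression_pcr_mix(total_ul=50.0, template_ul=1.0, primer_ul=4.0,
                        primer_stock_um=10.0, polymerase_mix_ul=25.0):
    """Return the water volume and final primer concentration for one reaction."""
    water_ul = total_ul - template_ul - primer_ul - polymerase_mix_ul
    if water_ul < 0:
        raise ValueError("component volumes exceed the total reaction volume")
    # Simple dilution: C_final = C_stock * V_stock / V_total
    final_primer_um = primer_stock_um * primer_ul / total_ul
    return {"water_ul": water_ul, "final_primer_um": final_primer_um}


if __name__ == "__main__":
    mix = suppression_pcr_mix()
    print(f"water: {mix['water_ul']} µL")
    print(f"final primer concentration: {mix['final_primer_um']} µM")
```

With the listed volumes, the reaction takes 20 µL of water and the single suppression primer ends up at 0.8 µM.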
It will be apparent that the precise details of the methods or compositions described may be varied or modified without departing from the spirit of the described aspects of the disclosure. We claim all such modifications and variations that fall within the scope and spirit of the claims below.
Claims
1. A method of synthesizing a plurality of target nucleic acids, comprising:
(i) providing a set of oligonucleotides, each oligonucleotide comprising: a subsequence of one or more target nucleic acids, and a predesigned unique subsequence, wherein at least one or more of the oligonucleotides are degenerate oligonucleotides comprising a common overlap with one or more of the set of oligonucleotides;
(ii) hybridizing the set of oligonucleotides with a substrate, wherein the substrate comprises one or more nucleic acid barcodes complementary to a predesigned unique subsequence of the set of oligonucleotides;
(iii) isolating the one or more nucleic acid barcodes and hybridized oligonucleotides into a plurality of compartments; and
(iv) assembling the plurality of target nucleic acids utilizing a polymerase.
2. The method of claim 1, further comprising one or more of: cleaving the hybridized oligonucleotides from the substrate; recovering the plurality of assembled target nucleic acids; purifying the plurality of assembled target nucleic acids; cloning each of the plurality of assembled target nucleic acids into a vector; and sequencing each of the plurality of assembled target nucleic acids.
3. The method of claim 1, wherein the plurality of oligonucleotides comprise 2, 4, 6, or 8 degenerate oligonucleotides.
4. The method of claim 1, wherein one or more of the plurality of target nucleic acids comprises a gene or a fragment thereof.
5. The method of claim 1, wherein the plurality of target nucleic acids comprise one or more variants of the target nucleic acid.
6. The method of claim 1, wherein the plurality of target nucleic acids comprises one or more chimeric nucleic acids.
7. The method of claim 6, wherein the one or more chimeric nucleic acids encode a first protein domain and a second protein domain linked at a fusion point or by a linker.
8. The method of claim 6, wherein the one or more chimeric nucleic acids encode a functional chimeric protein.
9. The method of claim 1, wherein the substrate comprises a plurality of beads, a microarray, a silicon chip, or a microfluidic chip.
10. The method of claim 1, wherein the compartment comprises an emulsion droplet, a well, a chamber, a subcompartment of the substrate, a vesicle, a liposome, or a polymersome.
11. The method of claim 1, wherein the oligonucleotides of the set of oligonucleotides are from 50 nucleotides to 1000 nucleotides long, such as 300 nucleotides long.
12. The method of claim 1, wherein the assembled target nucleic acids are about 100 nucleotides to about 3000 nucleotides long, such as 1000 nucleotides long.
13. The method of claim 1, wherein the assembled target nucleic acids comprise two or more variable or degenerate regions.
14. The method of claim 1, wherein the assembled target nucleic acids comprise two or more full-length genes.
15. The method of claim 1, wherein the assembled target nucleic acids comprise a set of genetic circuits.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363608469P | 2023-12-11 | 2023-12-11 | |
| US63/608,469 | 2023-12-11 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025128162A1 (en) | 2025-06-19 |
Family
ID=96058275
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/037902 (Pending; WO2025128162A1 (en)) | 2023-12-11 | 2024-07-12 | Degenerate dropsynth gene synthesis |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025128162A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100120092A1 (en) * | 2006-08-30 | 2010-05-13 | Hepgenics Pty Ltd. | Recombinant proteins and virus like particles comprising l and s polypeptides of avian hepadnaviridae and methods, nucleic acid constructs, vectors and host cells for producing same |
| US20110082055A1 (en) * | 2009-09-18 | 2011-04-07 | Codexis, Inc. | Reduced codon mutagenesis |
| US20150111256A1 (en) * | 2012-02-17 | 2015-04-23 | President And Fellows Of Harvard College | Assembly of Nucleic Acid Sequences in Emulsions |
| US20230303684A1 (en) * | 2021-03-17 | 2023-09-28 | Myeloid Therapeutics, Inc. | Engineered chimeric fusion protein compositions and methods of use thereof |
| US20230340139A1 (en) * | 2021-10-14 | 2023-10-26 | Arsenal Biosciences, Inc. | Immune cells having co-expressed shrnas and logic gate systems |
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100120092A1 (en) * | 2006-08-30 | 2010-05-13 | Hepgenics Pty Ltd. | Recombinant proteins and virus like particles comprising l and s polypeptides of avian hepadnaviridae and methods, nucleic acid constructs, vectors and host cells for producing same |
| US20110082055A1 (en) * | 2009-09-18 | 2011-04-07 | Codexis, Inc. | Reduced codon mutagenesis |
| US20150111256A1 (en) * | 2012-02-17 | 2015-04-23 | President And Fellows Of Harvard College | Assembly of Nucleic Acid Sequences in Emulsions |
| US10202628B2 (en) * | 2012-02-17 | 2019-02-12 | President And Fellows Of Harvard College | Assembly of nucleic acid sequences in emulsions |
| US20230303684A1 (en) * | 2021-03-17 | 2023-09-28 | Myeloid Therapeutics, Inc. | Engineered chimeric fusion protein compositions and methods of use thereof |
| US20230340139A1 (en) * | 2021-10-14 | 2023-10-26 | Arsenal Biosciences, Inc. | Immune cells having co-expressed shrnas and logic gate systems |
Non-Patent Citations (1)
| Title |
|---|
| HOLSTON ANDREW S., HINTON SAMUEL R., LINDLEY KYRA A., KEARNS NORA C., PLESA CALIN: "Degenerate DropSynth for Simultaneous Assembly of Diverse Gene Libraries and Local Designed Mutants", BIORXIV, 12 December 2023 (2023-12-12), pages 1 - 16, XP093327953, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2023.12.11.569291v1.full.pdf> [retrieved on 20240830], DOI: 10.1101/2023.12.11.569291 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24904576; Country of ref document: EP; Kind code of ref document: A1 |