WO2025072262A1 - Ultra-high-capacity DNA synthesis for functionally generative biological design - Google Patents
- Publication number
- WO2025072262A1 (PCT/US2024/048310)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polynucleotides
- polynucleotide
- aspects
- synthesis
- sequences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
- C12N15/1027—Mutagenizing nucleic acids by DNA shuffling, e.g. RSR, STEP, RPR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
- C12P19/28—N-glycosides
- C12P19/30—Nucleotides
- C12P19/34—Polynucleotides, e.g. nucleic acids, oligoribonucleotides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6855—Ligating adaptors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07007—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y301/00—Hydrolases acting on ester bonds (3.1)
- C12Y301/11—Exodeoxyribonucleases producing 5'-phosphomonoesters (3.1.11)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y301/00—Hydrolases acting on ester bonds (3.1)
- C12Y301/30—Endoribonucleases active with either ribo- or deoxyribonucleic acids and producing 5'-phosphomonoesters (3.1.30)
- C12Y301/30001—Aspergillus nuclease S1 (3.1.30.1)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y605/00—Ligases forming phosphoric ester bonds (6.5)
- C12Y605/01—Ligases forming phosphoric ester bonds (6.5) forming phosphoric ester bonds (6.5.1)
- C12Y605/01002—DNA ligase (NAD+) (6.5.1.2)
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/08—Liquid phase synthesis, i.e. wherein all library building blocks are in liquid phase or in solution during library creation; Particular methods of cleavage from the liquid support
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/10—Design of libraries
Definitions
- Foundation models for protein structure prediction represent a generational leap in capability.
- the next equivalent step in function will come from making foundation models effective co-pilots for biological design.
- Current models are trained on highly abundant protein sequence data, but this next step in capability will require models trained on large-scale protein functional data.
- the central barrier to collecting this functional data is the scalable synthesis of long and diverse DNA.
- the present disclosure generally relates to a method of generating a population of product polynucleotides, wherein the method comprises: a) pooling a plurality of polynucleotide fragments, wherein the polynucleotide fragments comprise sequences derived from at least two different genes; b) performing a synthesis reaction to produce a plurality of target polynucleotides; and c) selecting a population of product polynucleotides from the plurality of target polynucleotides.
- the plurality of polynucleotide fragments is generated by: i) providing a plurality of polynucleotides, wherein the polynucleotides comprise sequences derived from at least two different genes; ii) contacting the plurality of polynucleotides with a plurality of adapters to produce a reaction mixture; and iii) subjecting the reaction mixture to amplification reaction conditions to produce a plurality of polynucleotide fragments.
- the plurality of polynucleotides comprise partially double-stranded polynucleotides, double-stranded polynucleotides, or a mixture thereof.
- the plurality of polynucleotides is designed using a thermodynamic model, a machine learning model, and/or artificial intelligence.
- the plurality of polynucleotides comprises sequences derived from between about 2 and about 100,000 unique polynucleotide sequences.
- the plurality of polynucleotides comprises sequences derived from between 2 and about 100,000 different genes.
- the plurality of polynucleotides are from about 500 nucleotides to about 2000 nucleotides in length.
- the plurality of polynucleotide fragments are from about 500 nucleotides to about 2000 nucleotides in length.
- the plurality of adapters comprises a primer, a barcode, an unmasking site, and/or a cloning site.
- the plurality of adapters comprises at least one 5’ end-protected and at least one 3’ end-protected priming sequence for each product polynucleotide.
- the plurality of adapters comprises at least one pro-telomerase site for the production of covalently closed polynucleotide ends.
- the plurality of adapters comprises interacting sequences or modifications to produce a circularized polynucleotide in synthesis.
- interacting sequences or nucleotide modifications result in in-trans synthesis of a subset of polynucleotides followed by in-cis synthesis of a subset of polynucleotides to produce a circular polynucleotide.
- the method further comprises contacting the plurality of polynucleotide fragments with an unmasking agent prior to pooling the polynucleotide fragments.
- the unmasking agent comprises a nuclease.
- the nuclease is a Type IIS restriction enzyme.
- selecting the population of product polynucleotides comprises contacting the plurality of target polynucleotides with an exonuclease and/or a nuclease. In some aspects, the contacting results in removal of polynucleotides that are not product polynucleotides.
- the nuclease is S1 nuclease.
- the exonuclease is T5 exonuclease.
- the method further comprises unmasking at a non-natural nucleotide, a deoxyuracil, or a methylated nucleotide.
- the unmasking comprises cleavage at the non-natural nucleotide, deoxyuracil, or methylated nucleotide. In some aspects, the method further comprises unmasking via photocleavage. In some aspects, the product polynucleotides are at least 99.90% identical to the sequence of a desired product polynucleotide. In some aspects, the synthesis reaction comprises an overlap extension synthesis reaction. In some aspects, the synthesis reaction is performed under isothermal conditions. In some aspects, the synthesis reaction comprises contacting the plurality of target polynucleotides with an exonuclease, a polymerase, and a ligase.
- the exonuclease is T5 exonuclease, T7 exonuclease, and/or Lambda exonuclease.
- the population of product polynucleotides encodes for at least two different proteins.
- one or more synthesis reactions are performed in an iterative series to produce a plurality of target polynucleotide sequences.
- the population of product polynucleotides are used to generate a plurality of polynucleotide fragments.
- the plurality of polynucleotide fragments are used for step a) of a subsequent iteration of the method.
- the product polynucleotides are about 0.1 kb to about 10.0 kb in length. In some aspects, the product polynucleotides are linear. In some aspects, the product polynucleotides are circular. In some aspects, the sequences derived from at least two different genes are derived from human genes.
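The overlap extension step mentioned among the aspects above joins fragments through shared terminal sequence. As a purely in silico analogy, and not the chemistry or the procedure of the disclosure, the following minimal sketch merges fragments whose ends overlap exactly. The function names `merge_by_overlap` and `assemble`, the default minimum overlap, and the example sequences are all hypothetical.

```python
from functools import reduce


def merge_by_overlap(left: str, right: str, min_overlap: int = 4) -> str:
    """Join two fragments that share an exact terminal overlap.

    Hypothetical helper: looks for the longest suffix of `left` that equals
    a prefix of `right` and keeps that shared stretch only once.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no overlap of the required length")


def assemble(fragments: list[str], min_overlap: int = 4) -> str:
    """Merge an ordered list of fragments from left to right."""
    return reduce(lambda acc, frag: merge_by_overlap(acc, frag, min_overlap), fragments)


if __name__ == "__main__":
    # Illustrative fragments only; each shares four bases with its neighbour.
    print(assemble(["ATGCCGTA", "CGTAGGCT", "GGCTTTAA"]))
```

A real assembly design would also have to rule out unintended overlaps between non-adjacent fragments, which this sketch does not attempt.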
- the present disclosure generally relates to a method of generating a population of product polynucleotides, wherein the method comprises: a) pooling a plurality of polynucleotide fragments, wherein the polynucleotide fragments comprise sequences derived from at least two different regulatory sequences; b) performing a synthesis reaction to produce a plurality of target polynucleotides; and c) selecting a population of product polynucleotides from the plurality of target polynucleotides.
- the polynucleotide fragments are generated by: i) providing a plurality of polynucleotides, wherein the polynucleotides comprise sequences derived from at least two different regulatory sequences; ii) contacting the plurality of polynucleotides with a plurality of adapters to produce a reaction mixture; and iii) subjecting the reaction mixture to amplification reaction conditions to produce a plurality of polynucleotide fragments.
- each regulatory sequence independently comprises an enhancer, a silencer, an insulator, a promoter, an untranslated region (UTR), and/or an operator.
- FIG. 1 illustrates the circular IDT pool validation of pool 1.
- FIG. 2 illustrates the circular IDT pool validation of pool 2.
- FIG. 3 illustrates the linear IDT pool validation of pool 3.
- FIG. 4 illustrates the linear IDT pool validation of pool 4.
- FIG. 5 is a diagram illustrating a technical overview of pooled gene synthesis and synthesis powered by strong selection chemistry and rich learning data.
- FIG. 6 shows an example of successful pooled synthesis of polynucleotides.
- the term “complementary” generally refers to the ability of a single strand of a polynucleotide (or portion thereof) to hybridize to an anti-parallel polynucleotide strand (or portion thereof) by contiguous base-pairing between the nucleotides (that is not interrupted by any unpaired nucleotides) of the anti-parallel polynucleotide single strands, thereby forming a double-stranded polynucleotide between the complementary strands.
- a first polynucleotide is said to be “completely complementary” to a second polynucleotide strand if each and every nucleotide of the first polynucleotide forms a base pair with a nucleotide within the complementary region of the second polynucleotide.
- a first polynucleotide is not completely complementary (i.e., partially complementary) to the second polynucleotide if at least one nucleotide in the first polynucleotide does not base pair with the corresponding nucleotide in the second polynucleotide.
- the degree of complementarity between polynucleotide strands has significant effects on the efficiency and strength of annealing or hybridization between polynucleotide strands. This is of particular importance in amplification reactions, which depend upon binding between polynucleotide strands. It is well-known in the art that sequences need not be completely complementary in order for hybridization to occur.
- An oligonucleotide primer is “complementary” to a target polynucleotide if at least 50% (preferably 60%, more preferably 70% or 80%, still more preferably 90% or more) of the nucleotides of the primer form base pairs with nucleotides on the target polynucleotide.
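The percentage threshold in the primer definition above can be checked with a short calculation. The sketch below is illustrative only: it assumes Watson-Crick pairing between equal-length, ungapped DNA sequences written 5'→3', and the names `fraction_paired` and `is_complementary` are hypothetical rather than taken from the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def fraction_paired(primer: str, target_site: str) -> float:
    """Fraction of primer bases that pair with the anti-parallel target site.

    Assumes both sequences are written 5'->3', have equal length, and are
    compared without gaps (a simplification made for this sketch).
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have equal length")
    # A perfectly matching primer equals the reverse complement of its site.
    ideal_primer = reverse_complement(target_site)
    paired = sum(p == q for p, q in zip(primer, ideal_primer))
    return paired / len(primer)


def is_complementary(primer: str, target_site: str, threshold: float = 0.5) -> bool:
    """Apply the 'at least 50%' criterion; raise `threshold` for stricter tiers."""
    return fraction_paired(primer, target_site) >= threshold


if __name__ == "__main__":
    print(fraction_paired("ATGC", "GCAT"))  # every base pairs
    print(fraction_paired("ATGC", "GCAA"))  # one mismatch out of four
```

Passing `threshold=0.9` applies the strictest tier named in the definition.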
- hybridization means the pairing of complementary oligomeric compounds (e.g., a single strand of a polynucleotide pairing with an anti-parallel polynucleotide strand).
- the most common mechanism of pairing involves hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleoside or nucleotide bases (nucleobases).
- the natural base adenine is nucleobase complementary to the natural nucleobases thymidine and uracil, which pair through the formation of hydrogen bonds.
- the natural base guanine is nucleobase complementary to the natural bases cytosine and 5-methyl cytosine. Hybridization can occur under varying circumstances.
- the term “specifically hybridizes” refers to the ability of an oligomeric compound to hybridize to one nucleic acid site with greater affinity than it hybridizes to another nucleic acid site.
- the terms “adapter” and “adaptor” are used interchangeably and refer to a nucleic acid sequence that can be used for manipulation of a nucleic acid molecule.
- an adaptor comprises at least a portion of at least one barcode.
- an adaptor comprises at least one barcode.
- adapters are used during synthesis of two or more nucleic acid molecules, such as two or more polynucleotide fragments.
- adaptors are used for amplification of one or more target nucleic acids.
- adaptors are used in reactions for sequencing.
- an adaptor comprises, consists of, or consists essentially of at least one priming site.
- a nucleic acid molecule can be tagged with an adaptor by, e.g., an amplification reaction using a primer comprising the adaptor.
- the adaptor comprises an unmasking site.
- the adaptor comprises a cloning site.
- the adapter comprises a sequence that is included in a target polynucleotide.
- the adapter comprises a sequence that is used during synthesis of a target polynucleotide, but which sequence is not included in the target polynucleotide. Further characteristics of adapters are discussed elsewhere herein.
- an adapter comprises one or more non-natural nucleotides.
- an adapter comprises one or more deoxyuracils.
- an adapter comprises one or more methylation sites.
- an adapter comprises a photocleavable site.
- the terms “molecular barcode” and “barcode” refer to a nucleic acid sequence, or a combination of nucleic acid sequences, that can act as a ‘key’ to distinguish or separate a plurality of sequences in a sample.
- two nucleic acid molecules can each be tagged with a molecular barcode having a unique nucleic acid sequence, such that the two uniquely tagged nucleic acid molecules are distinguishable from one another based on their respective molecular barcodes during nucleic acid sequencing.
- each of two or more different nucleic acid molecules can be tagged with two or more molecular barcodes, wherein the combination of molecular barcodes used to tag each of the two or more different nucleic acid molecules distinguishes the different nucleic acid molecules.
- at least one molecular barcode is incorporated into the nucleotide sequence of at least one adaptor and/or at least one primer.
- at least one molecular barcode is used to tag at least one nucleic acid molecule.
- molecular barcodes are used for amplification of one or more target nucleic acids.
- the molecular barcodes are used in reactions for sequencing.
- a molecular barcode comprises, consists of, or consists essentially of at least one priming site.
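The barcode definitions above describe tagging molecules so that they can be told apart during sequencing. A minimal sketch of that separation step follows. It assumes exact-match barcodes of equal length at the 5' start of each read, which is a simplification chosen for illustration; the function name `demultiplex`, the barcode sequences, and the labels are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group reads by the barcode found at their 5' end.

    `barcodes` maps a barcode sequence to a label. Reads whose prefix matches
    no barcode are collected under "unassigned". The barcode is trimmed from
    each stored read.
    """
    lengths = {len(code) for code in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch expects barcodes of one common length")
    width = lengths.pop()
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        label = barcodes.get(read[:width], "unassigned")
        bins[label].append(read[width:])
    return dict(bins)


if __name__ == "__main__":
    codes = {"ACGT": "gene_A", "TTAG": "gene_B"}
    reads = ["ACGTGGGG", "TTAGCCCC", "ACGTAAAA", "GGGGTTTT"]
    for label, members in demultiplex(reads, codes).items():
        print(label, members)
```

Tolerating sequencing errors in the barcode, or combining two barcodes per molecule as described above, would need a lookup keyed on mismatch distance or on barcode pairs instead of a single exact prefix.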
- polynucleotide amplification refers to a reaction for generating a copy of a particular polynucleotide sequence or increasing the copy number or amount of a particular polynucleotide sequence.
- polynucleotide amplification may be a process using a polymerase and a pair of oligonucleotide primers for producing any particular polynucleotide sequence, i.e., the whole or a portion of a target polynucleotide sequence, in an amount that is greater than that initially present.
- Amplification may be accomplished by the in vitro methods of the polymerase chain reaction (PCR). See generally, PCR Technology: Principles and Applications for DNA Amplification (H. A.
- PCR Protocols A Guide to Methods and Applications (Innis et al., Eds.) Academic Press, San Diego, CA (1990); Mattila et al., Nucleic Acids Res. 19 : 4967 (1991); Eckert et al., PCR Methods and Applications 1 : 17 (1991); PCR (McPherson et al. Ed.), IRL Press, Oxford; and U. S. Patent Nos. 4,683,202 and 4,683,195.
- Other amplification methods include, but are not limited to: (a) ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4 : 560 (1989) and Landegren et al..
- NABSA nucleic acid based sequence amplification
- amplification reaction conditions refers to any mixture of reagents that promotes an amplification reaction.
- the term “encoding” refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom.
- a gene encodes a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system.
- Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and the non-coding strand, used as the template for transcription of a gene or cDNA, can be referred to as encoding the protein or other product of that gene or cDNA.
- expression is defined as the transcription and/or translation of a particular nucleotide sequence driven by its promoter.
- expression vector refers to a vector comprising a recombinant polynucleotide comprising expression control sequences operatively linked to a nucleotide sequence to be expressed.
- An expression vector comprises sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in an in vitro expression system.
- Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes) and viruses (e.g., Sendai viruses, lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the recombinant polynucleotide.
- oligonucleotide typically refers to short polynucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, C, G), this also includes an RNA sequence (i.e., A, U, C, G) in which “U” replaces “T.” Unless otherwise specified, a “nucleotide sequence encoding an amino acid sequence” includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. The phrase nucleotide sequence that encodes a protein or an RNA may also include introns to the extent that the nucleotide sequence encoding the protein may in some version contain an intron(s).
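Two points in the passage above lend themselves to a short illustration: an RNA sequence is the DNA sequence with “U” in place of “T”, and degenerate nucleotide sequences encode the same amino acid sequence. The sketch below uses a deliberately tiny excerpt of the standard genetic code, so `translate` will raise `KeyError` for any codon outside that excerpt; the function names and example sequences are hypothetical.

```python
# Excerpt of the standard genetic code (DNA codons), just enough for the example.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}


def dna_to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate a coding DNA sequence codon by codon using CODON_TABLE."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    first, second = "ATGGCTAAA", "ATGGCCAAG"
    print(dna_to_rna(first))
    # Different nucleotide sequences, same encoded peptide.
    print(translate(first), translate(second))
```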
- nucleic acids are polynucleotides, which can be hydrolyzed into the monomeric “nucleotides” and which comprise one or more “nucleotide sequence(s)”.
- monomeric nucleotides can be hydrolyzed into nucleosides.
- polynucleotides include, but are not limited to, all nucleic acid sequences (i.e.
- nucleotide sequences which are obtained by any means available in the art, including, without limitation, recombinant means, i.e., the cloning of nucleic acid sequences from a recombinant library or a cell genome, using ordinary cloning technology and PCR, and the like, and by synthetic means.
- the term “gene” typically refers to polynucleotides that are the target product of synthesis. These polynucleotides can contain any one or combination of coding, noncoding, or synthetic DNA sequences. As such, in some aspects, the gene can be derived from a natural source, such as a human gene, but can further include in some instances synthetic DNA sequences introduced into the human gene sequence. In some aspects, the gene comprises a variant of sequence from which it was derived. In some aspects, the variant sequence is designed, for example, by artificial intelligence. In some aspects, the variant sequence is designed by any form of user input.
- a “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell.
- vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.
- the term “vector” includes an autonomously replicating plasmid or a virus.
- the term should also be construed to include non-plasmid and non-viral compounds which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like.
- viral vectors include, but are not limited to, Sendai viral vectors, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, lentiviral vectors, and the like.
- conservative sequence modifications is intended to refer to amino acid modifications that do not significantly affect or alter the binding characteristics of the antibody containing the amino acid sequence. Such conservative modifications include amino acid substitutions, additions and deletions. Modifications can be introduced into an antibody of the invention by standard techniques known in the art, such as site-directed mutagenesis and PCR-mediated mutagenesis. Conservative amino acid substitutions are ones in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art.
- amino acids with basic side chains (e.g., lysine, arginine, histidine)
- acidic side chains (e.g., aspartic acid, glutamic acid)
- uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine, tryptophan)
- nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine)
- one or more amino acid residues within the CDR regions of an antibody can be replaced with other amino acid residues from the same side chain family and the altered antibody can be tested for the ability to bind antigens using the functional assays described herein.
- a “disease” is a state of health of an animal wherein the animal cannot maintain homeostasis, and wherein if the disease is not ameliorated then the animal’s health continues to deteriorate.
- a “disorder” in an animal is a state of health in which the animal is able to maintain homeostasis, but in which the animal’s state of health is less favorable than it would be in the absence of the disorder. Left untreated, a disorder does not necessarily cause a further decrease in the animal’s state of health.
- downregulation refers to the decrease or elimination of gene expression of one or more genes.
- the terms “effective amount” and “pharmaceutically effective amount” refer to a nontoxic but sufficient amount of an agent or drug to provide the desired biological result. That result can be reduction and/or alleviation of the signs, symptoms, or causes of a disease or disorder, imaging or monitoring of an in vitro or in vivo system (including a living organism), or any other desired alteration of a biological system.
- An appropriate effective amount in any individual case may be determined by one of ordinary skill in the art using routine experimentation.
- “Homologous” as used herein refers to the subunit sequence identity between two polymeric molecules, e.g., between two nucleic acid molecules, such as two DNA molecules or two RNA molecules, or between two polypeptide molecules. When a subunit position in both of the two molecules is occupied by the same monomeric subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then they are homologous at that position.
- the homology between two sequences is a direct function of the number of matching or homologous positions; e.g., if half (e.g., five positions in a polymer ten subunits in length) of the positions in two sequences are homologous, the two sequences are 50% homologous; if 90% of the positions (e.g., 9 of 10) are matched or homologous, the two sequences are 90% homologous.
- Identity refers to the subunit sequence identity between two polymeric molecules, particularly between two amino acid molecules, such as between two polypeptide molecules. When two amino acid sequences have the same residues at the same positions, e.g., if a position in each of two polypeptide molecules is occupied by an arginine, then they are identical at that position. The identity, or extent to which two amino acid sequences have the same residues at the same positions in an alignment, is often expressed as a percentage.
- the identity between two amino acid sequences is a direct function of the number of matching or identical positions; e.g., if half (e.g., five positions in a polymer ten amino acids in length) of the positions in two sequences are identical, the two sequences are 50% identical; if 90% of the positions (e.g., 9 of 10) are matched or identical, the two amino acid sequences are 90% identical.
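The homology and identity definitions above reduce to counting matching positions. The following minimal sketch reproduces the two worked figures in the text (5 of 10 and 9 of 10 positions). It assumes the two sequences are already aligned and of equal length, with no gap handling; the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions occupied by the same subunit.

    Assumes pre-aligned sequences of equal, non-zero length (a simplification
    for this sketch; real comparisons usually start from an alignment step).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Five of ten positions match.
    print(percent_identity("AAAAAGGGGG", "AAAAACCCCC"))
    # Nine of ten positions match.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

The same count works for nucleotide or amino acid sequences, since the definition depends only on which positions carry the same subunit.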
- An “individual”, “patient” or “subject”, as that term is used herein, includes a member of any animal species including, but not limited to, birds, humans and other primates, and other mammals including commercially relevant mammals such as cattle, pigs, horses, sheep, cats, and dogs.
- the subject is a human.
- “Instructional material,” as that term is used herein, includes a publication, a recording, a diagram, or any other medium of expression that can be used to communicate the usefulness of the composition and/or compound of the invention in a kit.
- the instructional material of the kit may, for example, be affixed to a container that contains the compound and/or composition of the invention or be shipped together with a container that contains the compound and/or composition. Alternatively, the instructional material may be shipped separately from the container with the intention that the recipient uses the instructional material and the compound cooperatively.
- Delivery of the instructional material may be, for example, by physical delivery of the publication or other medium of expression communicating the usefulness of the kit, or may alternatively be achieved by electronic transmission, for example by means of a computer, such as by electronic mail, or download from a website.
- isolated means altered or removed from the natural state through the actions of a human being.
- a nucleic acid or a protein naturally present in a living animal is not “isolated,” but the same nucleic acid or protein partially or completely separated from the coexisting materials of its natural state is “isolated.”
- An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.
- An “isolated nucleic acid” refers to a nucleic acid segment or fragment which has been separated from sequences which flank it in a naturally occurring state, i.e., a DNA fragment which has been removed from the sequences that are normally adjacent to the fragment, i.e., the sequences adjacent to the fragment in a genome in which it naturally occurs.
- the term also applies to nucleic acids that have been substantially purified from other components which naturally accompany the nucleic acid, i.e., RNA or DNA or proteins, which naturally accompany it in the cell.
- the term therefore includes, for example, a recombinant DNA that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (i.e., as a cDNA or a genomic or cDNA fragment produced by PCR or restriction enzyme digestion) independent of other sequences. It also includes a recombinant DNA that is part of a hybrid gene encoding additional polypeptide sequence.
- recombinant polypeptide as used herein is defined as a polypeptide produced by using recombinant DNA methods.
- recombinant DNA as used herein is defined as DNA produced by joining pieces of DNA from different sources.
- “Variant” as the term is used herein, is a nucleic acid sequence or a peptide sequence that differs in sequence from a reference nucleic acid sequence or peptide sequence respectively, but retains essential properties of the reference molecule. Changes in the sequence of a nucleic acid variant may not alter the amino acid sequence of a peptide encoded by the reference nucleic acid, or may result in amino acid substitutions, additions, deletions, fusions and truncations. Changes in the sequence of peptide variants are typically limited or conservative, so that the sequences of the reference peptide and the variant are closely similar overall and, in many regions, identical.
- a variant and reference peptide may differ in amino acid sequence by one or more substitutions, additions, or deletions in any combination.
- a variant of a nucleic acid or peptide may be naturally occurring, such as an allelic variant, or may be a variant that is not known to occur naturally.
- Non-naturally occurring variants of nucleic acids and peptides may be made by mutagenesis techniques or by direct synthesis.
- “Modified” means a changed state or structure of a molecule or cell of the invention.
- Molecules may be modified in many ways, including chemically, structurally, and functionally.
- Cells may be modified through the introduction of nucleic acids.
- “Modulating” means mediating a detectable increase or decrease in the level of a response in a subject compared with the level of a response in the subject in the absence of a treatment or compound, and/or compared with the level of a response in an otherwise identical but untreated subject.
- the term encompasses perturbing and/or affecting a native signal or response thereby mediating a beneficial therapeutic response in a subject, preferably, a human.
- Parenteral administration of an immunogenic composition includes, e.g., subcutaneous (s.c.), intravenous (i.v.), intramuscular (i.m.), or intrasternal injection, or infusion techniques.
- the term “pharmaceutically acceptable” refers to a material, such as a carrier or diluent, which does not abrogate the biological activity or properties of the compound, and is relatively nontoxic, i.e., the material may be administered to an individual without causing undesirable biological effects or interacting in a deleterious manner with any of the components of the composition in which it is contained.
- the term “pharmaceutically acceptable carrier” means a pharmaceutically acceptable material, composition or carrier, such as a liquid or solid filler, stabilizer, dispersing agent, suspending agent, diluent, excipient, thickening agent, solvent or encapsulating material, involved in carrying or transporting a compound useful within the invention within or to the patient such that it may perform its intended function.
- Such constructs are carried or transported from one organ, or portion of the body, to another organ, or portion of the body.
- Each carrier must be “acceptable” in the sense of being compatible with the other ingredients of the formulation, including the compound useful within the invention, and not injurious to the patient.
- materials that may serve as pharmaceutically acceptable carriers include: sugars, such as lactose, glucose and sucrose; starches, such as corn starch and potato starch; cellulose, and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; powdered tragacanth; malt; gelatin; talc; excipients, such as cocoa butter and suppository waxes; oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; glycols, such as propylene glycol; polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol; esters, such as ethyl oleate and ethyl laurate; agar; buffering agents, such as magnesium hydroxide and aluminum hydroxide; surface active agents; alginic acid; pyrogen-free water; isotonic s
- “pharmaceutically acceptable carrier” also includes any and all coatings, antibacterial and antifungal agents, and absorption delaying agents, and the like that are compatible with the activity of the compound useful within the invention, and are physiologically acceptable to the patient. Supplementary active compounds may also be incorporated into the compositions.
- the “pharmaceutically acceptable carrier” may further include a pharmaceutically acceptable salt of the compound useful within the invention.
- Other additional ingredients that can be included in the pharmaceutical compositions used in the practice of the invention are known in the art and described, for example, in Remington’s Pharmaceutical Sciences (Genaro, Ed., Mack Publishing Co., 1985, Easton, PA), which is incorporated herein by reference.
- the term “pharmaceutical composition” refers to a mixture of at least one compound of the invention with other chemical components, such as carriers, stabilizers, diluents, dispersing agents, suspending agents, thickening agents, and/or excipients.
- the pharmaceutical composition facilitates administration of the compound to an organism. Multiple techniques of administering a compound exist in the art including, but not limited to, intravenous, oral, aerosol, parenteral, ophthalmic, pulmonary and topical administration.
- As used herein, the terms “protein”, “peptide” and “polypeptide” are used interchangeably, and refer to a compound comprised of amino acid residues covalently linked by peptide bonds.
- peptide bond means a covalent amide linkage formed by loss of a molecule of water between the carboxyl group of one amino acid and the amino group of a second amino acid.
- a protein or peptide must contain at least two amino acids, and no limitation is placed on the maximum number of amino acids that may comprise the sequence of a protein or peptide.
- Polypeptides include any peptide or protein comprising two or more amino acids joined to each other by peptide bonds.
- Proteins include, for example, biologically active fragments, substantially homologous proteins, oligopeptides, homodimers, heterodimers, variants of proteins, modified proteins, derivatives, analogs, and fusion proteins, among others.
- the proteins include natural proteins, recombinant proteins, synthetic proteins, or a combination thereof.
- a protein may be a receptor or a non-receptor.
- an antibody which recognizes a specific antigen, but does not substantially recognize or bind other molecules in a sample.
- an antibody that specifically binds to an antigen from one species may also bind to that antigen from one or more species. But, such cross-species reactivity does not itself alter the classification of an antibody as specific.
- an antibody that specifically binds to an antigen may also bind to different allelic forms of the antigen. However, such cross reactivity does not itself alter the classification of an antibody as specific.
- the terms “specific binding” or “specifically binding,” can be used in reference to the interaction of an antibody, a protein, or a peptide with a second chemical species, to mean that the interaction is dependent upon the presence of a particular structure (e.g., an antigenic determinant or epitope) on the chemical species; for example, an antibody recognizes and binds to a specific protein structure rather than to proteins generally. If an antibody is specific for epitope “A”, the presence of a molecule containing epitope A (or free, unlabeled A), in a reaction containing labeled “A” and the antibody, will reduce the amount of labeled A bound to the antibody.
- substantially the same amino acid sequence is defined as a sequence with at least 70%, preferably at least about 80%, more preferably at least about 85%, more preferably at least about 90%, even more preferably at least about 95%, and most preferably at least 99% homology with another amino acid sequence, as determined by the FASTA search method in accordance with Pearson & Lipman, 1988, Proc. Natl. Acad. Sci. USA 85:2444-48.
- substantially purified cell is a cell that is essentially free of other cell types.
- a substantially purified cell also refers to a cell which has been separated from other cell types with which it is normally associated in its naturally occurring state.
- a population of substantially purified cells refers to a homogenous population of cells. In other instances, this term refers simply to cells that have been separated from the cells with which they are naturally associated in their natural state.
- the cells are cultured in vitro. In other embodiments, the cells are not cultured in vitro.
- “Therapeutic” as used herein means a treatment and/or prophylaxis.
- a therapeutic effect is obtained by suppression, remission, or eradication of a disease state.
- Ranges: throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
- the biggest single bottleneck for scaling the DBT cycle is DNA synthesis.
- the cost per nucleotide for a synthetic gene (or any DNA product larger than ~200 nucleotides) has remained static for the past 5-7 years.
- the cost of gene synthesis often makes large scale synthesis of gene diversity infeasible.
- Most protein discovery or engineering workflows synthesize only a handful of genes from vast spaces of metagenomic or generatively designed proteins as opposed to exploring this space at a scale that can facilitate learning. Redefining the productivity curve for gene synthesis is key to scaling the DBT loop to realize feedback-driven generative design models.
- the present disclosure generally relates to a method of generating a population of product polynucleotides.
- the method comprises: a. pooling a plurality of polynucleotide fragments.
- the polynucleotide fragments comprise sequences derived from at least two different genes.
- the polynucleotide fragments comprise sequences derived from at least 2 unique polynucleotide sequences.
- the method comprises: b. performing a synthesis reaction to produce a plurality of target polynucleotides. In some aspects, the method comprises: c. selecting a population of product polynucleotides from the plurality of target polynucleotides.
- the polynucleotide fragments are generated by: i. providing a plurality of polynucleotides. In some aspects, the plurality of polynucleotides comprises sequences derived from at least 2 unique polynucleotide sequences. In some aspects, the plurality of polynucleotides comprises sequences derived from at least two different genes. In some aspects, the polynucleotide fragments are generated by: ii. contacting the plurality of polynucleotides with a plurality of adapters to produce a reaction mixture. In some aspects, the polynucleotide fragments are generated by: iii.
- the plurality of polynucleotides comprise partially double-stranded polynucleotides, double-stranded polynucleotides, or a mixture thereof.
- polynucleotide design is based on the sequence of desired product polynucleotide sequence(s).
- the polynucleotides used in the generation of the desired product polynucleotides described herein are designed using a DNA encoding and synthesis model. For instance, such a model can take as input a pooled set of protein sequences and output a highly orthogonal set of DNA building blocks that can be used to generate product polynucleotides encoding for such protein sequences. The orthogonal building blocks can then be synthesized into a pool of DNA genes encoding the full set of proteins in the pool, as is described herein.
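The input/output shape of such a model can be illustrated with a minimal sketch. It is not the encoding and synthesis model of the disclosure: the codon table, the fixed block length, the fixed overlap length, and the names `back_translate`, `tile`, and `design_pool` are all assumptions made for illustration. The sketch back-translates each protein, cuts the DNA into overlapping building blocks, and reports junction sequences shared between genes, which is the simplest way a pool can fail to be orthogonal.

```python
# Hypothetical codon choices (one codon per amino acid); an assumption for
# illustration, not the encoding model of the disclosure.
CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}

def back_translate(protein):
    """Encode a protein as DNA using the fixed table above."""
    return "".join(CODON[aa] for aa in protein)

def tile(dna, block_len, overlap):
    """Cut dna into blocks that each share `overlap` nt with the next block."""
    step = block_len - overlap
    blocks, start = [], 0
    while True:
        blocks.append(dna[start:start + block_len])
        if start + block_len >= len(dna):
            return blocks
        start += step

def design_pool(proteins, block_len=30, overlap=9):
    """Return building blocks per gene plus junctions shared between genes."""
    pool, owner, clashes = {}, {}, []
    for name, protein in proteins.items():
        blocks = tile(back_translate(protein), block_len, overlap)
        pool[name] = blocks
        for block in blocks[:-1]:
            junction = block[-overlap:]
            # A junction already claimed by another gene is not orthogonal.
            if owner.setdefault(junction, name) != name:
                clashes.append((junction, owner[junction], name))
    return pool, clashes

proteins = {"p1": "MKTAYIAKQRQISFVK", "p2": "MKTAYIAKQRQISFVR"}
pool, clashes = design_pool(proteins)
```

Two near-identical proteins encoded with the same codons share their junction, so `clashes` is non-empty here. A real design step would resolve such clashes, for example by recoding synonymous codons or moving the junction.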
- deep long read sequencing of DNA synthesis products can allow the model to learn from every pool of sequences made and maximize the efficiency and uniform (or user-defined) representation of genes in the output pool in iterative cycles.
- the plurality of polynucleotides is designed using a thermodynamic model, a machine learning model, and/or artificial intelligence. In some aspects, the plurality of polynucleotides is designed using a thermodynamic model. In some aspects, the thermodynamic model is based on a graph optimization architecture aimed at maximizing the probability of correct polynucleotide interactions and minimizing the probability of incorrect internal or external polynucleotide interactions. In some aspects, design of the plurality of polynucleotides comprises the use of simulated annealing. In some aspects, design of the plurality of polynucleotides comprises the use of generative models.
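As a rough illustration of how simulated annealing could be applied to this kind of design problem, the sketch below slides one junction per sequence to reduce the number of near-identical overlaps in a pool. Several things here are assumptions rather than the disclosed method: Hamming distance stands in for a thermodynamic interaction score, each sequence has a single junction, and the function names, cooling schedule, and parameters are hypothetical.

```python
import math
import random

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def cost(overlaps, min_dist):
    """Number of overlap pairs with fewer than min_dist mismatches."""
    n = len(overlaps)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if hamming(overlaps[i], overlaps[j]) < min_dist
    )

def anneal_junctions(seqs, k=6, min_dist=3, steps=2000, t0=1.0, seed=0):
    """Slide one k-nt junction per sequence to reduce cross-talk between overlaps."""
    rng = random.Random(seed)

    def overlaps(positions):
        return [s[i:i + k] for s, i in zip(seqs, positions)]

    pos = [rng.randrange(len(s) - k + 1) for s in seqs]
    current = cost(overlaps(pos), min_dist)
    start_cost = current
    best_pos, best_cost = list(pos), current
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = t0 * (1 - step / steps) + 1e-9
        idx = rng.randrange(len(seqs))
        trial = list(pos)
        trial[idx] = rng.randrange(len(seqs[idx]) - k + 1)
        trial_cost = cost(overlaps(trial), min_dist)
        delta = trial_cost - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            pos, current = trial, trial_cost
            if current < best_cost:
                best_pos, best_cost = list(pos), current
    return best_pos, best_cost, start_cost
```

The function returns the best junction positions found, their cost, and the cost of the random starting point, so a caller can see whether annealing helped. Replacing `cost` with a free-energy-based score would bring the sketch closer to the thermodynamic model described above.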
- the plurality of polynucleotides comprises sequences derived from at least 2 unique polynucleotide sequences. In some aspects, the plurality of polynucleotides comprises sequences derived from between about 2 to about 100,000 unique polynucleotide sequences. In some aspects, the plurality of polynucleotides comprises sequences derived from at least about 2, at least about 5, at least about 10, at least about 25, at least about 50, at least about 75, at least about 100, at least about 250, at least about 500, at least about 750, at least about 1,000, at least about 50,000, or at least about 100,000 unique polynucleotide sequences, or any value therebetween.
- the plurality of polynucleotides comprises sequences generated from user input. In some aspects, the plurality of polynucleotides comprises sequences generated by artificial intelligence. In some aspects, the plurality of polynucleotides comprise sequences generated and/or derived from any source from which polynucleotide sequences can be generated and/or derived. In some aspects, the plurality of polynucleotides comprises sequences derived from different regions of the same gene. In some aspects, the plurality of polynucleotides comprises sequences derived from between about 2 to about 1000 different genes, such as from between about 2 to about 100 different genes.
- the plurality of polynucleotides comprises sequences derived from about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 500, about 1,000, about 5,000, about 10,000, about 25,000, about 50,000, about 75,000, about 100,000, about 250,000, about 500,000, or about 750,000 different genes or any value therebetween.
- the different gene sequences are, for instance, derived from two different versions of the same gene, such as a wild-type version of the gene and a version of the gene that contains at least one point mutation, substitution, and/or deletion.
- the gene sequences are derived from a mammalian gene, a viral gene, or a bacterial gene.
- the gene sequences are derived from a human gene.
- the plurality of polynucleotides encode for an entire genome of an organism.
- the gene sequences are derived from a diverse set of metagenomic sequences.
- the gene sequences are derived from a set of AI-generated sequences.
- the plurality of polynucleotides are from about 500 nucleotides to about 2000 nucleotides in length. In some aspects, the plurality of polynucleotides are from about 100 nucleotides to about 10,000 nucleotides in length, or any value therebetween. In some aspects, the plurality of polynucleotides are about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 1250, about 1500, about 1750, about 2000, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000 nucleotides in length.
- the plurality of polynucleotide fragments are from about 500 nucleotides to about 2000 nucleotides in length. In some aspects, the plurality of polynucleotide fragments are from about 100 nucleotides to about 10,000 nucleotides in length, or any value therebetween. In some aspects, the plurality of polynucleotide fragments are about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 1250, about 1500, about 1750, about 2000, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000 nucleotides in length.
- the adapters comprise a primer. In some aspects, the plurality of adapters comprises at least one 5’ and at least one 3’ end-protected priming sequence for each product polynucleotide. In some aspects, the end-protections comprise one or more 2’-O-methoxy-ethyl (MOE) bases. In some aspects, the end-protections comprise one or more phosphorothioated (PS) bases. In some aspects, the plurality of adapters comprise at least one protelomerase binding site or a partial binding site, such that treatment of the polynucleotide with protelomerase results in closed end-protected DNA.
- the adapters comprise interacting sequences or nucleotide modifications that will cause a linear DNA molecule to circularize during the synthesis reaction.
- the adapters comprise sequences or nucleotide modifications to facilitate some of the plurality of polynucleotides to synthesize in-trans and some to synthesize in-cis to produce a circular DNA polynucleotide.
- the plurality of adapters comprises at least one pro-telomerase site for the production of covalently closed polynucleotide ends.
- the plurality of adapters comprises interacting sequences or modifications to produce a circularized polynucleotide in synthesis.
- interacting sequences or nucleotide modifications result in in-trans synthesis of a subset of polynucleotides followed by in-cis synthesis of a subset of polynucleotides to produce a circular polynucleotide.
- the adapters comprise a primer comprising a barcode. In some aspects, the adapters comprise a barcode. In some aspects, the adapters comprise an unmasking site. In some aspects, the unmasking site comprises a type II-S restriction endonuclease site. In some aspects, the unmasking site comprises a type II-C restriction endonuclease site. In some aspects, the adapter comprises a cloning site.
- the polynucleotides and adapters are designed for linear synthesis of the product polynucleotides. In some aspects, the polynucleotides and adapters are designed for circular synthesis of the product polynucleotides. In some aspects, the polynucleotides and/or adapters comprise 5’ and/or 3’ end protection groups. In some aspects, the polynucleotides and/or adapters comprise a circular end protection group. In some aspects, the polynucleotides and/or adapters comprise at least one moiety for attachment to a support, e.g., a solid state support.
- the methods described herein further comprise contacting the plurality of polynucleotide fragments with an unmasking agent prior to pooling the polynucleotide fragments.
- unmasking comprises removing amplification primer sequences and/or otherwise exposing the polynucleotide sequences, e.g., gene sequences, e.g., adapter sequences, e.g., regulatory sequences, e.g., target polynucleotide sequences, that will participate in the synthesis reaction.
- the unmasking removes a restriction site and/or primer barcode to thereby expose the sequences, e.g., gene sequences, e.g., adapter sequences, e.g., regulatory sequences, e.g., target polynucleotide sequences.
- unmasking comprises generating a partially single stranded sequence at the end of the sequence, e.g., gene sequence, e.g., adapter sequence, e.g., regulatory sequence, e.g., target polynucleotide sequence.
- the partially single stranded sequence can either be cleaved or filled in to blunt or add chemical groups (e.g.
- deoxyuracil cleavage comprises use of a uracil-DNA glycosylase.
- unmasking comprises cleavage at a methylation site.
- the cleavage at the methylation site comprises use of a methylation dependent endonuclease.
- unmasking comprises cleavage at a photocleavable site.
- unmasking comprises use of a CRISPR endonuclease.
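The Type II-S unmasking described above can be sketched in sequence space. The example assumes BsaI purely for illustration (the text does not name a specific enzyme): BsaI recognizes GGTCTC and cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt 5’ overhang. The function name `unmask_left` and the example fragment are hypothetical.

```python
SITE = "GGTCTC"       # BsaI recognition sequence (assumed enzyme, for illustration)
TOP_OFFSET = 1        # top-strand cut: 1 nt downstream of the site
OVERHANG_LEN = 4      # bottom-strand cut is 5 nt downstream, leaving a 4-nt overhang

def unmask_left(seq):
    """Remove a 5' primer/barcode and its Type IIS site from the top strand.

    Returns the retained top-strand sequence and the 4-nt overhang it starts with.
    """
    i = seq.find(SITE)
    if i < 0:
        raise ValueError("no unmasking site found")
    top_cut = i + len(SITE) + TOP_OFFSET
    overhang = seq[top_cut:top_cut + OVERHANG_LEN]
    return seq[top_cut:], overhang

# primer/barcode + site + 1 spacer nt + overhang + payload
fragment = "ACGTACGT" + "GGTCTC" + "A" + "AATG" + "GCTAGCTAGC"
payload, overhang = unmask_left(fragment)
```

Because the enzyme cuts outside its recognition sequence, the site and the primer are both removed, and the exposed overhang (`AATG` here) is set entirely by the designer.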
- the synthesis reaction comprises an overlap extension synthesis reaction. For instance, in general, synthesis proceeds by 1) 5’ -> 3’ exonuclease removal of nucleotides at the gene fragment ends, 2) annealing of complementary 3’ overhang sequences, 3) polymerase fill in of annealed 3’ overlaps, and 4) ligation of filled in 3’ overlaps to create a durable synthesis of two gene fragments.
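The net effect of steps 1) to 4) on sequence can be sketched as joining fragments through their shared ends. The sketch models only the top strand and only the overlap logic; it does not model the exonuclease, polymerase, or ligase, and the names `join` and `assemble` and the minimum overlap of 8 nt are assumptions.

```python
def join(left, right, min_overlap=8):
    """Join two fragments if the end of `left` matches the start of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None  # no sufficiently long overlap, so no product

def assemble(fragments, min_overlap=8):
    """Join an ordered list of fragments into one product, or return None."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = join(product, fragment, min_overlap)
        if product is None:
            return None
    return product

a = "ATGGCTAGCTAGGATCCGTA"
b = "GATCCGTACCTTGAAGGC"
c = "TTGAAGGCAATTCCGG"
product = assemble([a, b, c])
```

In a pooled reaction the order is not given in advance: each fragment finds its partner through its overlap, which is why the overlaps need to be orthogonal across the pool.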
- the synthesis reaction is performed under isothermal conditions.
- the synthesis reaction is performed using temperature cycling.
- the synthesis reaction comprises contacting the plurality of target polynucleotides with an exonuclease, a polymerase, and a ligase.
- one or more synthesis reactions are performed in an iterative series to produce a plurality of target polynucleotide sequences.
- the exonuclease is T5 exonuclease, T7 exonuclease, Lambda exonuclease, or exonuclease V.
- the ligase is Taq ligase, Tth ligase, or 9°N ligase.
- the synthesis reaction comprises a single stranded binding protein, such as ET SSB.
- the polymerase used for the synthesis reaction is Q5, Phusion, or another high-fidelity polymerase.
- the polymerase used for clonal amplification is Phi29 or another linear isothermal amplification polymerase.
- the synthesis reaction comprises the use of end protection modifications, such as one or more 2’-O-methoxy-ethyl (MOE) bases, one or more phosphorothioate (PS) bonds, covalently closed DNA ends, or combinations thereof.
- the synthesis reaction produces a circular DNA polynucleotide.
- the interacting sequences or nucleotide modifications are designed to facilitate some of the plurality of polynucleotides to synthesize in-trans and some to synthesize in-cis to produce a circular DNA polynucleotide.
- the synthesis reaction produces a circular polynucleotide.
- the unmasking and synthesis reactions are performed as a one-pot reaction using either purified or unpurified unmasked amplified polynucleotides from the PCR step, by adding a Type IIS or Type IIC enzyme to the synthesis reaction.
- a one-pot reaction includes an initial 10-minute thermocycler step at 37°C in the synthesis protocol.
- a cycling or stepdown of temperature is used during synthesis to encourage ligation and/or to allow for less ligase to be used during the reaction.
- the unmasking and/or synthesis reactions comprise one or more crowding agents.
- the methods described herein comprise selecting a population of product polynucleotides from the plurality of target polynucleotides.
- the selecting comprises removal of polynucleotides that are not product polynucleotides, such as the removal of partially or incorrectly synthesized DNA strands.
- the removal is by chemical means.
- selection criteria are used so as to permit a degree of variability, such as about 0.1%, about 0.5%, about 1.0%, or about 5.0%, or any degree of variability desired by a practitioner, in the product polynucleotides. For instance, introducing small amounts of polyclonal errors into different copies of an individual gene in the pool can be advantageous. This type of error creates additional diversity which is unlikely to impede the functional testing of genes and may be beneficial for the functional testing of genes.
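To give a sense of scale for these percentages, the arithmetic below treats the degree of variability as an independent per-nucleotide error rate, which is an assumption made for illustration (the text does not define how variability is measured). The function names are hypothetical.

```python
def expected_errors(length_nt, error_rate):
    """Mean number of errors per copy for an independent per-base error rate."""
    return length_nt * error_rate

def fraction_error_free(length_nt, error_rate):
    """Probability that a copy of the given length carries no error at all."""
    return (1.0 - error_rate) ** length_nt

# A 1,000-nt gene at 0.1% variability: about one error per copy on average,
# while a bit over a third of copies remain error-free.
mean_errors = expected_errors(1000, 0.001)
perfect = fraction_error_free(1000, 0.001)
```

Under this assumption a pool made at 0.1% variability still contains many perfect copies of each gene alongside single-error variants, which is consistent with the idea that such errors add diversity without blocking functional testing.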
- selecting the population of product polynucleotides comprises contacting the plurality of target polynucleotides with an exonuclease and/or a nuclease.
- the nuclease is S1 nuclease.
- the exonuclease is T5 exonuclease.
- the exonuclease is exonuclease V (RecBCD).
- the contacting results in the removal of polynucleotides that are not product polynucleotides, such as removal of target polynucleotides and/or polynucleotide fragments that are partially synthesized versions of the desired product polynucleotide.
- selecting the population of product polynucleotides comprises nucleic acid sequencing.
- the nucleic acid sequencing comprises sequencing of a barcode, such as can be comprised by an adapter.
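A minimal sketch of barcode-based read assignment follows. It assumes the barcode sits at the very start of each read and must match exactly; real long-read data would normally need error-tolerant matching. The barcode sequences, product names, and the function name `demultiplex` are hypothetical.

```python
from collections import Counter

def demultiplex(reads, barcodes):
    """Count reads per product by exact barcode match at the read start.

    `barcodes` maps barcode sequence -> product name.
    Returns (counts per product, number of unassigned reads).
    """
    counts = Counter({name: 0 for name in barcodes.values()})
    unassigned = 0
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                counts[name] += 1
                break
        else:
            unassigned += 1  # no barcode matched this read
    return dict(counts), unassigned

barcodes = {"ACGTAC": "gene_1", "TTGCAA": "gene_2"}
reads = ["ACGTACGGGATTT", "TTGCAACCCGGG", "ACGTACAAAA", "GGGGGGTTTT"]
counts, unassigned = demultiplex(reads, barcodes)
```

The per-product counts can then drive selection, for example by amplifying only products whose barcodes were observed.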
- selecting the population of product polynucleotides comprises amplification of a target polynucleotide.
- the methods described herein comprise selecting a population of product polynucleotides from the plurality of target polynucleotides.
- the population of product polynucleotides can serve as the input polynucleotides for iterative synthesis of polynucleotides.
- the population of product polynucleotides are used to generate a plurality of polynucleotide fragments.
- the plurality of polynucleotide fragments are used for step a) of a subsequent iteration of the method.
- a product polynucleotide is any polynucleotide sequence that, within a pre-selected degree of variation, represents a polynucleotide sequence desired by a practitioner of the methods described herein.
- the product polynucleotides are at least 99.95% identical to the sequence of a desired product polynucleotide.
- the identity of the product polynucleotides to the sequence of a desired product polynucleotide is any such degree of identity as is desired by the practitioner.
- the product polynucleotides are at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 98.5%, at least 99%, at least 99.25%, at least 99.50%, at least 99.75%, at least 99.90%, at least 99.95%, at least 99.975%, or at least 99.99% identical to the desired product polynucleotide sequence. In some aspects, the product polynucleotides are 100% identical to the desired product polynucleotide sequence.
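An identity threshold of this kind can be applied as a simple filter. The sketch below compares equal-length sequences position by position, which is an assumption made for brevity; insertions and deletions would require a proper alignment. The names `percent_identity` and `select_products` are hypothetical.

```python
def percent_identity(candidate, reference):
    """Ungapped percent identity between two equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(candidate, reference))
    return 100.0 * matches / len(reference)

def select_products(candidates, reference, threshold=99.95):
    """Keep candidates that meet the identity threshold against the reference."""
    return [
        c for c in candidates
        if len(c) == len(reference) and percent_identity(c, reference) >= threshold
    ]

reference = "ACGT" * 500                 # 2,000 nt
one_error = "T" + reference[1:]          # 1 mismatch  -> 99.95 %
two_errors = "TT" + reference[2:]        # 2 mismatches -> 99.90 %
kept = select_products([reference, one_error, two_errors], reference)
```

At a 99.95% threshold a 2,000-nt product may carry at most one mismatch, so `kept` contains the perfect copy and the single-error copy but not the double-error copy.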
- the population of product polynucleotides comprises from between 2 to about 100,000 unique product polynucleotide sequences. In some aspects, the population of product polynucleotides comprises at least about 2, at least about 5, at least about 10, at least about 25, at least about 50, at least about 75, at least about 100, at least about 250, at least about 500, at least about 750, at least about 1,000, at least about 50,000, or at least about 100,000 unique product polynucleotide sequences, or any value therebetween. In some aspects, the population of product polynucleotides comprises product polynucleotide sequences generated from user input.
- the population of product polynucleotides comprises product polynucleotide sequences generated by artificial intelligence. In some aspects, the population of product polynucleotides comprises product polynucleotide sequences generated and/or derived from any source from which polynucleotide sequences can be generated and/or derived. In some aspects, the population of product polynucleotides comprises product polynucleotide sequences derived from between about 2 to about 1000 different genes, such as from between about 2 to about 100 different genes.
- the population of product polynucleotides comprises sequences derived from about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 500, about 1,000, about 5,000, about 10,000, about 25,000, about 50,000, about 75,000, about 100,000, about 250,000, about 500,000, or about 750,000 different genes or any value therebetween.
- the different gene sequences are, for instance, derived from two different versions of the same gene, such as a wild-type version of the gene and a version of the gene that contains at least one point mutation, substitution, and/or deletion.
- the gene sequences are derived from a mammalian gene, a viral gene, or a bacterial gene.
- the gene sequences are derived from a human gene.
- the population of product polynucleotides encode for an entire genome of an organism.
- the product polynucleotide comprises a regulatory sequence, as further described herein.
- the population of product polynucleotides encodes for at least two different proteins. In some aspects, the population of product polynucleotides encode for a large serine recombinase. In some aspects, the population of product polynucleotides encode for a non-LTR retrotransposase. In some aspects, using the methods described herein, product polynucleotides encoding a serine recombinase and/or a non-LTR retrotransposase are designed so as to overwrite a gene, e.g. a human gene, to correct a genetic defect and/or to treat a genetic disease. In some aspects, the population of product polynucleotides encode for a transcription factor.
- product polynucleotides encoding a transcription factor are precisely tuned for specific epigenetic editing or cellular reprogramming.
- the population of product polynucleotides encode for target(s) for high-yield biomanufacturing.
- product polynucleotides can encode enzymatic and/or regulatory component-based targets, which targets can enhance cellular manufacturing yields.
- the population of product polynucleotides are about 0.1 kb to about 10.0 kb.
- the population of product polynucleotides are linear.
- the population of product polynucleotides are circular.
- the population of product polynucleotides comprise a vector, e.g., an expression vector.
- the methods of the present disclosure can be used for the generation of product polynucleotides comprising regulatory sequences, e.g., promoter sequences.
- Such regulatory sequences can be designed to be tuned for temporally precise and cell type-specific expression.
- the present disclosure generally relates to a method of generating a population of product polynucleotides, wherein the method comprises: a. pooling a plurality of polynucleotide fragments, wherein the polynucleotide fragments comprise sequences derived from at least two different regulatory sequences; b. performing a synthesis reaction to produce a plurality of target polynucleotides; and c.
- the polynucleotide fragments are generated by: i. providing a plurality of polynucleotides, wherein the polynucleotides comprise sequences derived from at least two different regulatory sequences; ii. contacting the plurality of polynucleotides with a plurality of adapters to produce a reaction mixture; and iii. subjecting the reaction mixture to amplification reaction conditions to produce a plurality of polynucleotide fragments.
- each regulatory sequence independently comprises an enhancer, a silencer, an insulator, a promoter, an untranslated region (UTR), and/or an operator.
- the population of product polynucleotides further comprises sequences derived from at least two different genes and/or sequences derived from at least two unique polynucleotide sequences, as discussed herein.
- reaction conditions including but not limited to reaction times, reaction size/volume, and experimental reagents, such as solvents, catalysts, pressures, atmospheric conditions, e.g., nitrogen atmosphere, and reducing/oxi dizing agents, with art- recognized alternatives and using no more than routine experimentation, are within the scope of the present application.
- range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
- pooled synthesis of genes is not currently done. What is needed to enable pooled synthesis is a new architecture comprising three components: 1) an encoding & synthesis model to generate orthogonal DNA building blocks for a pool of target polynucleotides, 2) a DNA synthesis chemistry that has no boundary on the size and diversity of polynucleotides that can be synthesized within a pool and is highly selective for the generation of full length target polynucleotides with low error rates, and 3) long-read sequencing to facilitate comprehensive sequencing of synthesized polynucleotide products in output pools. Together, these three components form a virtuous feedback cycle, enabling the generation of a massive synthesis dataset that can be used to continuously refine the generative encoding & synthesis model to drive enhanced performance with every pool of polynucleotides synthesized.
- the methods disclosed herein focus on the pooled synthesis of high diversity sequences. This diversity of genes within a pool could vary from distributed point mutations to completely diverse genes. This will allow the creation of flexible gene pools that efficiently capture the sequence diversity necessary to represent different functional landscapes of proteins or genomic sequences.
- the methods of the present disclosure will, in some aspects, take as input a pooled set of protein sequences and output a highly orthogonal set of DNA building blocks.
- the orthogonal building blocks will then be synthesized into a pool of DNA genes encoding the full set of proteins in the pool.
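The encoding step described above (protein sequences in, orthogonal DNA building blocks out) can be illustrated with a minimal Python sketch. It is not the disclosed model: the one-codon-per-residue table, the fixed block length, and the use of Hamming distance between junction overlaps as a stand-in for "orthogonality" are all assumptions made for illustration, and every function name is hypothetical.

```python
from itertools import combinations

# Hypothetical one-codon-per-residue table; a real encoder would choose among
# synonymous codons to tune the orthogonality of the building blocks.
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein):
    """Encode a protein sequence as DNA using the fixed table above."""
    return "".join(CODON[aa] for aa in protein)


def split_with_overlaps(dna, block_len, overlap):
    """Cut dna into blocks; each block shares `overlap` bases with the next."""
    if not 0 <= overlap < block_len:
        raise ValueError("overlap must be non-negative and shorter than block_len")
    step = block_len - overlap
    blocks = []
    start = 0
    while True:
        blocks.append(dna[start:start + block_len])
        if start + block_len >= len(dna):
            break
        start += step
    return blocks


def junction_overlaps(blocks, overlap):
    """Return the shared sequence at each junction between adjacent blocks."""
    return [block[-overlap:] for block in blocks[:-1]]


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(seqs):
    """Smallest Hamming distance between any two sequences (None if fewer than two)."""
    return min((hamming(a, b) for a, b in combinations(seqs, 2)), default=None)
```

In this sketch, a pool would be scored by collecting the junction overlaps of every gene in the pool and checking that `min_pairwise_distance` stays above a chosen threshold; a low value flags junctions that could anneal to the wrong partner.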
- Deep long read sequencing of DNA synthesis products will allow the model to learn from every pool of sequences made and maximize the efficiency and uniform (or user-defined) representation of genes in the output pool.
- a generative DNA encoding & synthesis model will be developed which can drive gene synthesis efficiency.
- the DNA encoding and synthesis model will track all components of the synthesis process to provide continuity with generative protein design and maximize synthesis performance.
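The feedback from long-read sequencing to the model can be sketched as follows. This is an illustrative assumption, not the disclosed model: it supposes that each read has already been assigned to a gene identifier, and it uses a simple proportional correction (target fraction divided by observed fraction) to suggest input weights for the next pool. The function names and the `floor` parameter are hypothetical.

```python
from collections import Counter


def observed_fractions(read_gene_ids, expected_genes):
    """Fraction of assigned reads per expected gene in the output pool."""
    counts = Counter(read_gene_ids)
    total = sum(counts[g] for g in expected_genes)
    if total == 0:
        raise ValueError("no reads were assigned to the expected genes")
    return {g: counts[g] / total for g in expected_genes}


def rebalance_weights(observed, target=None, floor=1e-6):
    """Suggest input weights for the next pool.

    Genes that are under-represented relative to the target (uniform by
    default, or user-defined) receive proportionally more input.
    """
    genes = list(observed)
    if target is None:
        target = {g: 1.0 / len(genes) for g in genes}
    # `floor` avoids division by zero for genes with no reads at all.
    raw = {g: target[g] / max(observed[g], floor) for g in genes}
    scale = sum(raw.values())
    return {g: raw[g] / scale for g in genes}
```

For example, if gene `g1` accounts for 75% of reads and `g2` for 25%, a uniform target yields suggested input weights of 0.25 and 0.75 respectively.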
- the synthesis of polynucleotide input materials into genes consists of 5 steps detailed in FIG. 5.
- the process starts with pre-amplification of specific elements from a polynucleotide pool that are needed for synthesis of a specific gene pool (polynucleotide sub pool).
- Pre-amplification proceeds by linear amplification or exponential amplification.
- Exponential amplification via PCR utilizes (T)PBC_5_F and (T)PBC_3_R priming sites specific to the polynucleotide sub pool.
- PCR amplification is stopped while in the exponential phase of amplification to maintain an excess of unincorporated primers and prevent errant annealing of products.
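The reason for stopping PCR during the exponential phase can be illustrated with a toy calculation. The model below is a deliberate simplification and an assumption: each cycle creates `efficiency` new strands per existing strand, each new strand consumes one primer, and amplification is halted before the remaining primers fall below a chosen excess over the product. The function name, the parameters, and the default values are hypothetical.

```python
def cycles_in_exponential_phase(template, primers, efficiency=0.9,
                                min_primer_excess=10.0, max_cycles=40):
    """Return (cycles, product, primers_left) for a toy PCR model.

    Cycling stops before the cycle that would leave fewer than
    `min_primer_excess` unincorporated primers per product strand.
    """
    product = float(template)
    primers_left = float(primers)
    cycles = 0
    while cycles < max_cycles:
        new_strands = product * efficiency
        remaining = primers_left - new_strands
        if remaining < min_primer_excess * (product + new_strands):
            break  # primers would no longer be in sufficient excess
        product += new_strands
        primers_left = remaining
        cycles += 1
    return cycles, product, primers_left
```

With one template molecule, 1000 primers, perfect efficiency, and a tenfold required excess, this toy model stops after 6 cycles with 64 product strands and 937 primers remaining.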
- synthesis reactions may utilize exonuclease, polymerase, and ligase enzyme components.
- synthesis proceeds by 1) 5’ -> 3’ exonuclease removal of nucleotides at the gene fragment ends, 2) annealing of complementary 3’ overhang sequences, 3) polymerase fill-in of annealed 3’ overlaps, and 4) ligation of filled-in 3’ overlaps to create a durable synthesis of two gene fragments.
- This reaction typically takes place at a single isothermal temperature (typically 50 °C) with all enzymatic components acting in concert.
- Example enzymes typically used include T5 exonuclease.
- T5 exonuclease is not inhibited by some types of end protection, and must be replaced in the reaction components to ensure the integrity of the end protected termini of linearly synthesized gene products.
- the T5 exonuclease can be replaced by exonucleases, such as T7 or Lambda exonuclease, which are unable to act on protected ends.
- the T7 and Lambda exonucleases require 5’ phosphate groups to initiate removal of 5’ nucleotides. The required phosphate groups are retained on the 5’ termini of all gene sequences following unmasking cleavage.
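At the sequence level, the net effect of the exonuclease, annealing, fill-in, and ligation steps described above is that two fragments sharing a terminal overlap are joined into one product. The sketch below models only that outcome on single top-strand strings; it does not model the enzymes, strand chemistry, or end protection. The function names and the default minimum overlap of 15 bases are assumptions for illustration.

```python
def longest_terminal_overlap(left, right, min_overlap=15):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(left), len(right))
    for n in range(max_len, min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def join_fragments(left, right, min_overlap=15):
    """Join two fragments through their shared terminal overlap.

    Returns the joined sequence, or None if the fragments do not overlap.
    """
    n = longest_terminal_overlap(left, right, min_overlap)
    if n == 0:
        return None
    return left + right[n:]
```

For example, `join_fragments("AAACCCGGG", "CCGGGTTT", min_overlap=4)` finds the 5-base overlap `CCGGG` and returns `AAACCCGGGTTT`, while fragments with no shared end return `None`.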
- Synthesis is allowed to proceed for a period of approximately 15 minutes resulting in the generation of both fully synthesized genes with both ends protected, and partial gene synthesis products. In some instances, synthesis is allowed to proceed for a period of approximately 10 minutes, 30 minutes, or 16 hours.
- Partial synthesis products could include synthesis products that have synthesized correctly but have not formed a complete synthetic gene. Partial synthesis products could also include partial gene synthesis products containing errors in synthesis resulting from the incorrect interaction of partial gene sequences. Partial synthesis products may be shorter or longer than intended synthesis products. Both of these forms of partial synthesis need to be removed from the reaction to enrich full-length gene products.
- the selection step removes partial synthesis products immediately following pooled synthesis.
- an excess of exonuclease single or multiple enzymes
- the reaction converts these products to single nucleotides or short nucleotide oligomers.
- An example of an exonuclease that can efficiently perform this reaction is T5 exonuclease. It is possible to generate partial synthesis products that, due to errors in synthesis, are either circularized or protected at both ends. These can be removed by adding a nuclease that resolves (cleaves) bubbles resulting from the hybridization of two gene fragments that are not fully complementary.
- An example of a nuclease that can efficiently perform this reaction is S1 nuclease. Selection should result in a highly enriched pool of full-length gene products. Adaptive sequencing (see below) may be used for additional enrichment or isolation of specific subpools, genes, or clones.
- Output products from pooled synthesis are finally amplified and cleaned up to increase concentration before they are shipped to a customer.
- Additional selection or adaptive sequencing may be used for isolation of specific subpools, genes, or clones.
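The selection logic described above can be summarized as a small decision rule. The sketch below is a simplified model under stated assumptions: a product survives exonuclease treatment only if it is circular or protected at both ends, and any product containing a mismatch bubble is treated as cleaved by the bubble-resolving nuclease and then degraded. The class, its field names, and the two functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisProduct:
    name: str
    protected_5p: bool = False
    protected_3p: bool = False
    circular: bool = False
    mismatch_bubble: bool = False


def survives_selection(product):
    """Return True if the product is expected to remain after selection."""
    # Assumption: cleavage at a bubble exposes unprotected ends, so the
    # exonuclease can then digest the product.
    if product.mismatch_bubble:
        return False
    # A circular product without bubbles offers no free end to the exonuclease.
    if product.circular:
        return True
    # A linear product must be protected at both termini.
    return product.protected_5p and product.protected_3p


def select_products(pool):
    """Keep only the products expected to survive selection."""
    return [p for p in pool if survives_selection(p)]
```

In this model, a linear product protected at only one end (a partial synthesis product) is removed, as is an erroneously circularized product that carries a mismatch bubble.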
- Embodiment 1 A method of generating a population of product polynucleotides, wherein the method comprises: a) pooling a plurality of polynucleotide fragments, wherein the polynucleotide fragments comprise sequences derived from at least two different genes; b) performing a synthesis reaction to produce a plurality of target polynucleotides; and c) selecting a population of product polynucleotides from the plurality of target polynucleotides.
- Embodiment 2 The method of embodiment 1, wherein the plurality of polynucleotide fragments is generated by: i) providing a plurality of polynucleotides, wherein the polynucleotides comprise sequences derived from at least two different genes; ii) contacting the plurality of polynucleotides with a plurality of adapters to produce a reaction mixture; and iii) subjecting the reaction mixture to amplification reaction conditions to produce a plurality of polynucleotide fragments.
- Embodiment 3 The method of embodiment 2, wherein the plurality of polynucleotides comprise partially double-stranded polynucleotides, double-stranded polynucleotides, or a mixture thereof.
- Embodiment 4 The method of embodiment 2 or embodiment 3, wherein the plurality of polynucleotides is designed using a thermodynamic model, a machine learning model, and/or artificial intelligence.
- Embodiment 5 The method of any one of embodiments 2-4, wherein the plurality of polynucleotides comprises sequences derived from between about 2 to about 100,000 unique polynucleotide sequences.
- Embodiment 6 The method of any one of embodiments 2-5, wherein the plurality of polynucleotides comprises sequences derived from between 2 to about 100,000 different genes.
- Embodiment 7 The method of any one of embodiments 2-6, wherein the plurality of polynucleotides are from about 500 nucleotides to about 2000 nucleotides in length.
- Embodiment 8 The method of any one of embodiments 2-7, wherein the plurality of polynucleotide fragments are from about 500 nucleotides to about 2000 nucleotides in length.
- Embodiment 9 The method of any one of embodiments 2-8, wherein the plurality of adapters comprises a primer, a barcode, an unmasking site, and/or a cloning site.
- Embodiment 10 The method of any one of embodiments 2-9, wherein the plurality of adapters comprises at least one 5’ end-protected and at least one 3’ end-protected priming sequence for each product polynucleotide.
- Embodiment 11 The method of any one of embodiments 2-10, wherein the plurality of adapters comprises at least one pro-telomerase site for the production of covalently closed polynucleotide ends.
- Embodiment 12 The method of any one of embodiments 2-11, wherein the plurality of adapters comprises interacting sequences or modifications to produce a circularized polynucleotide in synthesis.
- Embodiment 13 The method of embodiment 12, wherein interacting sequences or nucleotide modifications result in in-trans synthesis of a subset of polynucleotides followed by in-cis synthesis of a subset of polynucleotides to produce a circular polynucleotide.
- Embodiment 18 The method of embodiment 17, wherein the contacting results in removal of polynucleotides that are not product polynucleotides.
- Embodiment 19 The method of embodiment 17 or 18, wherein the nuclease is S1 nuclease.
- Embodiment 20 The method of any one of embodiments 17-19, wherein the exonuclease is T5 exonuclease.
- Embodiment 21 The method of any one of embodiments 1-20, further comprising unmasking at a non-natural nucleotide, a deoxyuracil, or a methylated nucleotide.
- Embodiment 22 The method of embodiment 21, wherein the unmasking comprises cleavage at the non-natural nucleotide, deoxyuracil, or methylated nucleotide.
- Embodiment 23 The method of any one of embodiments 1-22, further comprising unmasking via photocleavage.
- Embodiment 24 The method of any one of embodiments 1-23, wherein the product polynucleotides are at least 99.90% identical to the sequence of a desired product polynucleotide.
- Embodiment 25 The method of any one of embodiments 1-24, wherein the synthesis reaction comprises an overlap extension synthesis reaction.
- Embodiment 26 The method of any one of embodiments 1-25, wherein the synthesis reaction is performed under isothermal conditions.
- Embodiment 27 The method of any one of embodiments 1-26, wherein the synthesis reaction comprises contacting the plurality of target polynucleotides with an exonuclease, a polymerase, and a ligase.
- Embodiment 28 The method of embodiment 27, wherein the exonuclease is T5 exonuclease, T7 exonuclease, and/or Lambda exonuclease.
- Embodiment 29 The method of any one of embodiments 1-28, wherein the population of product polynucleotides encodes for at least two different proteins.
- Embodiment 30 The method of any one of embodiments 1-29, wherein one or more synthesis reactions are performed in an iterative series to produce a plurality of target polynucleotide sequences.
- Embodiment 31 The method of any one of embodiments 1-30, wherein the population of product polynucleotides are used to generate a plurality of polynucleotide fragments.
- Embodiment 32 The method of embodiment 31, wherein the plurality of polynucleotide fragments are used for step a) of a subsequent iteration of the method.
- Embodiment 33 The method of any one of embodiments 1-32, wherein the product polynucleotides are about 0.1 kb to about 10.0 kb in length.
- Embodiment 34 The method of any one of embodiments 1-33, wherein the product polynucleotides are linear.
- Embodiment 35 The method of any one of embodiments 1-34, wherein the product polynucleotides are circular.
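The identity threshold recited in Embodiment 24 (at least 99.90% identical to the desired sequence) can be checked with a short calculation. The sketch below is an illustration under simplifying assumptions: the product and the desired sequence are already aligned and of equal length, so insertions and deletions are not handled. The function names are hypothetical, and the threshold is expressed in hundredths of a percent so that the comparison uses exact integer arithmetic.

```python
def count_matches(product, reference):
    """Number of identical positions between two aligned, equal-length sequences."""
    if len(product) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("reference sequence is empty")
    return sum(a == b for a, b in zip(product, reference))


def percent_identity(product, reference):
    """Percent identity of the product to the desired sequence."""
    return 100.0 * count_matches(product, reference) / len(reference)


def meets_identity(product, reference, min_percent_x100=9990):
    """True if identity is at least min_percent_x100 / 100 percent (default 99.90%)."""
    matches = count_matches(product, reference)
    return matches * 10000 >= min_percent_x100 * len(reference)
```

Under these assumptions, a 2000-nucleotide product passes the 99.90% threshold with up to 2 mismatches and fails with 3.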
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Microbiology (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biomedical Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Data Mining & Analysis (AREA)
- General Chemical & Material Sciences (AREA)
- Plant Pathology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Immunology (AREA)
- Medical Informatics (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Structural Engineering (AREA)
- Medicinal Chemistry (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
Abstract
The present invention relates to methods and compositions enabling very low-cost DNA synthesis to facilitate large-scale functional testing of diverse proteins. In some aspects, the present methods and compositions involve designing and synthesizing desired product polynucleotides.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363585355P | 2023-09-26 | 2023-09-26 | |
| US63/585,355 | 2023-09-26 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025072262A1 (fr) | 2025-04-03 |
Family
ID=95202221
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/048310 Pending WO2025072262A1 (fr) | Ultra-high-capacity DNA synthesis for functionally generative biological design | 2023-09-26 | 2024-09-25 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025072262A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070269870A1 (en) * | 2004-10-18 | 2007-11-22 | George Church | Methods for assembly of high fidelity synthetic polynucleotides |
| US20150353927A1 (en) * | 2013-01-10 | 2015-12-10 | Ge Healthcare Dharmacon, Inc. | Templates, Libraries, Kits and Methods for Generating Molecules |
| US20200199577A1 (en) * | 2018-12-19 | 2020-06-25 | New England Biolabs, Inc. | Target Enrichment |
| US20220106589A1 (en) * | 2020-10-02 | 2022-04-07 | Inscripta, Inc. | Methods and systems for modeling of design representation in a library of editing cassettes |
| US20230211308A1 (en) * | 2013-08-05 | 2023-07-06 | Twist Bioscience Corporation | De novo synthesized gene libraries |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Venter et al. | Synthetic chromosomes, genomes, viruses, and cells | |
| CN102803489B (zh) | Combinatorial automated parallel synthesis of polynucleotide variants | |
| CN104603286B (zh) | Method for sorting nucleic acids and multiplexed preparations in in vitro cloning | |
| Patten et al. | Applications of DNA shuffling to pharmaceuticals and vaccines | |
| CN103668472B (zh) | Method for constructing a eukaryotic gene knockout library using the CRISPR/Cas9 system | |
| KR101454886B1 (ko) | Method for producing nucleic acid molecules | |
| Xiong et al. | Chemical gene synthesis: strategies, softwares, error corrections, and applications | |
| JP2002537758A (ja) | Shuffling of codon-altered genes | |
| CN111748637B (zh) | SNP molecular marker combination, multiplex amplification primer set, kit, and method for kinship analysis and identification | |
| Poluri et al. | Protein engineering techniques: Gateways to synthetic protein universe | |
| WO2019005955A1 (fr) | Improved reverse transcriptase and methods of use thereof | |
| RU2377248C2 (ru) | Fungus-resistant plants and their use | |
| JP2025108504A (ja) | Phi29 DNA polymerase variants with improved primer recognition | |
| CN109486814A (zh) | gRNA for repairing a point mutation in the HBB1 gene, gene editing system, expression vector, and gene editing kit | |
| CN112458080B (zh) | Fishing method for obtaining siRNA targeting lncRNA LOC157273 | |
| JP3777158B2 (ja) | Method for producing a recombinant DNA library using unidirectional single-stranded DNA fragments | |
| US9834762B2 (en) | Modified polymerases for replication of threose nucleic acids | |
| CN116042573A (zh) | Method for improving the base editing efficiency of a prime editing system | |
| WO2025072262A1 (fr) | Ultra-high-capacity DNA synthesis for functionally generative biological design | |
| CA3206795A1 (fr) | Methods and systems for generating nucleic acid diversity | |
| CN108949763A (zh) | Precisely mutated LamR gene capable of effectively inhibiting classical swine fever virus infection, and use thereof | |
| US20210054451A1 (en) | Optimizing high-throughput sequencing capacity | |
| CN115703842A (zh) | High-efficiency, high-precision base editor for cytosine (C) to guanine (G) conversion | |
| WO2024119461A1 (fr) | Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation | |
| KR20210088615A (ko) | Multiplexed deterministic assembly of DNA libraries | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24873452; Country of ref document: EP; Kind code of ref document: A1 |