WO2025042766A1 - Shotgun genetic engineering - Google Patents
Shotgun genetic engineering
- Publication number
- WO2025042766A1 (PCT/US2024/042753)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- library
- targeting signal
- sequence
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1079—Screening libraries by altering the phenotype or phenotypic trait of the host
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1065—Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
- C12N15/86—Viral vectors
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2810/00—Vectors comprising a targeting moiety
Definitions
- This application relates to a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell.
- This application also relates to a method of genetically engineering a cell based on the identified transcriptional units.
- This application further relates to libraries involved in the methods described herein, for example, libraries of transcription units, libraries of vectors, and/or libraries of host cells.
- a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell comprising: a) providing (i) one or more gene sequences that encode gene product(s) associated with the one or more desired characteristics; and (ii) one or more regulatory sequences; b) generating one or more test transcription units, wherein each test transcription unit comprises one or more gene sequences which are placed under the control of a regulatory sequence, said gene sequence(s) and said regulatory sequence are independently selected from the one or more gene sequences and the one or more regulatory sequences provided in step (a), respectively; and wherein each test transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of said gene sequence(s) and a regulatory sequence barcode that uniquely identifies said regulatory sequence; c) generating one or more vectors, wherein each vector comprises a test transcriptional unit selected from the one or more test transcriptional units; d) introducing the one or more vectors into a population of host cells,
- the method further comprises g) determining copy number of the one or more transcriptional units identified in step (f), or the level of the product(s) encoded by the gene sequence(s) comprised within the one or more test transcriptional units identified in step (f) or corresponding RNAs.
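Steps (a) to (c) describe a combinatorial design: each selected gene set is paired with each regulatory sequence, and every part carries its own barcode. A minimal Python sketch of that enumeration follows. It assumes fixed-size gene sets, random 8-nt barcodes, and hypothetical part names (the ilv genes and promoters are borrowed from elsewhere in this document purely for illustration); it does not model the physical DNA, the vector, or the host cells.

```python
import itertools
import random

# Hypothetical part lists, used only for illustration.
GENES = ["ilvB", "ilvN", "ilvC", "ilvD"]
PROMOTERS = ["CMV", "EF1a", "PGK"]


def assign_barcodes(names, length, rng, used):
    """Assign each part a random DNA barcode that is unique across all parts."""
    codes = {}
    for name in names:
        while True:
            barcode = "".join(rng.choice("ACGT") for _ in range(length))
            if barcode not in used:  # keep barcodes unambiguous
                used.add(barcode)
                codes[name] = barcode
                break
    return codes


def build_test_units(genes, promoters, genes_per_unit=1, barcode_len=8, seed=0):
    """Enumerate every (gene set, promoter) pairing together with its barcodes."""
    rng = random.Random(seed)
    used = set()
    gene_bc = assign_barcodes(genes, barcode_len, rng, used)
    prom_bc = assign_barcodes(promoters, barcode_len, rng, used)
    units = []
    for promoter in promoters:
        for gene_set in itertools.combinations(genes, genes_per_unit):
            units.append({
                "promoter": promoter,
                "genes": gene_set,
                # One barcode per gene, followed by the regulatory sequence barcode.
                "barcodes": tuple(gene_bc[g] for g in gene_set) + (prom_bc[promoter],),
            })
    return units, gene_bc, prom_bc


units, gene_bc, prom_bc = build_test_units(GENES, PROMOTERS)
print(len(units))  # 3 promoters x 4 single genes = 12
```

Uniqueness is the only barcode constraint enforced here; a practical design would likely also impose a minimum Hamming distance between barcodes so that sequencing errors cannot convert one barcode into another.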
- one or more test transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs the gene product from the gene sequence to a target location in the host cell.
- the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
- the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
- step (f) further comprises detecting the organellar targeting signal barcode in DNA of the one or more host cells selected in step (e) and identifying the nucleotide sequence encoding the organellar targeting signal comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcode.
- each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
- the one or more gene sequences comprise at least one gene sequence that is not naturally present in the host cells. In some embodiments, the one or more gene sequences comprise at least one gene sequence that is the same as or derived from a gene sequence naturally present in the host cells.
- the one or more gene sequences comprise at least one gene sequence that is derived from a different species than the host cells.
- the one or more gene sequences comprise at least one gene sequence that is codon-optimized for expression in the host cells.
- the one or more gene sequences comprise at least one gene sequence encoding a genome engineering system and/or a component thereof.
- the genome engineering system is a CRISPR activation (CRISPRa) or CRISPR inhibition (CRISPRi) system.
- the one or more gene sequences comprise at least one gene sequence encoding a negative control.
- the negative control is a detectable marker.
- the detectable marker is a fluorescent protein.
- the one or more gene sequences comprise at least one gene sequence encoding a synthetic protein.
- the one or more gene sequences comprise about 1-10,000 gene sequences. In some embodiments, the one or more gene sequences comprise about 2-100 gene sequences. In some embodiments, the one or more gene sequences comprise about 4-20 gene sequences.
- the one or more regulatory sequences comprise about 1-10,000 regulatory sequences. In some embodiments, the one or more regulatory sequences comprise about 2-10 regulatory sequences. In some embodiments, the one or more regulatory sequences comprise about 2-5 regulatory sequences.
- the vector is a viral vector. In some embodiments, the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector. In some embodiments, the retroviral vector is a lentiviral vector.
- the vector is a non-viral vector.
- the non-viral vector is a transposon.
- the one or more test transcription units are generated using a Golden Gate cloning technique.
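Golden Gate cloning relies on a Type IIS enzyme, such as BsaI, that cuts outside its recognition site (GGTCTC) and leaves 4-nt overhangs, so parts can only ligate in the order their overhangs dictate. The sketch below models that ordering logic, assuming each part is already described by its left and right overhangs. BsaI is an assumed enzyme choice (the source does not name one), and all overhang and part sequences are hypothetical.

```python
BSAI_SITE = "GGTCTC"  # assumed Type IIS enzyme; the source does not specify one
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def has_internal_bsai(seq):
    """True if the sequence carries a BsaI site on either strand."""
    return BSAI_SITE in seq or reverse_complement(BSAI_SITE) in seq


def assemble(parts, start_overhang, end_overhang):
    """Return part names in the order dictated by matching 4-nt overhangs.

    parts maps name -> (left_overhang, body, right_overhang).
    """
    by_left = {}
    for name, (left, body, _right) in parts.items():
        if has_internal_bsai(body):  # an internal site would be cut during assembly
            raise ValueError(f"{name} contains an internal BsaI site")
        if left in by_left:
            raise ValueError(f"overhang {left} is shared by two parts")
        by_left[left] = name

    order, current = [], start_overhang
    while current != end_overhang:
        if current not in by_left:
            raise ValueError(f"no part begins with overhang {current}")
        order.append(by_left[current])
        current = parts[order[-1]][2]  # follow this part's right overhang
        if len(order) > len(parts):
            raise ValueError("overhangs form a cycle")
    return order


# Hypothetical parts for one test transcription unit.
parts = {
    "barcode_utr": ("GCTT", "TTTTCCCC", "CGCT"),
    "promoter": ("GGAG", "ACGTACGT", "AATG"),
    "gene": ("AATG", "ATGAAACCC", "GCTT"),
}
print(assemble(parts, "GGAG", "CGCT"))  # ['promoter', 'gene', 'barcode_utr']
```

Because the order is fixed by the overhangs rather than by the order in which parts are supplied, pools of interchangeable promoters or genes sharing the same overhangs can be assembled in a single reaction.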
- the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each test transcription unit.
- the 3’ UTR of each test transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
- detection of one or more barcodes is achieved using polymerase chain reaction (PCR) with one or more pairs of universal primers followed by sequencing of PCR product(s).
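Because the barcodes sit between a pair of universal primer binding sites, each amplicon read can be decoded by locating the flanking sites and splitting the region between them. A minimal sketch, assuming hypothetical 8-nt flanking sites, exactly two 8-nt barcodes (one gene barcode, then one regulatory sequence barcode), and exact matching only; a real pipeline would likely tolerate sequencing errors.

```python
from collections import Counter

# Hypothetical universal primer binding sites and barcode layout.
FWD_SITE = "GATCGATC"
REV_SITE = "CTAGCTAG"
BARCODE_LEN = 8


def extract_barcodes(read, n_barcodes=2):
    """Return the barcodes between the flanking sites, or None if the read is malformed."""
    start = read.find(FWD_SITE)
    if start < 0:
        return None
    start += len(FWD_SITE)
    end = read.find(REV_SITE, start)
    if end - start != BARCODE_LEN * n_barcodes:  # also rejects end == -1
        return None
    region = read[start:end]
    return tuple(region[i:i + BARCODE_LEN] for i in range(0, len(region), BARCODE_LEN))


def count_units(reads, gene_bc, prom_bc):
    """Tally (gene, regulatory sequence) pairs by exact barcode match."""
    gene_of = {bc: name for name, bc in gene_bc.items()}
    prom_of = {bc: name for name, bc in prom_bc.items()}
    counts = Counter()
    for read in reads:
        barcodes = extract_barcodes(read)
        if barcodes is None:
            continue
        gene, prom = gene_of.get(barcodes[0]), prom_of.get(barcodes[1])
        if gene is not None and prom is not None:
            counts[(gene, prom)] += 1
    return counts


gene_bc = {"ilvC": "AACCGGTT", "ilvD": "TTGGCCAA"}  # hypothetical barcodes
prom_bc = {"CMV": "ACGTACGT", "PGK": "TGCATGCA"}
reads = [
    "TTT" + FWD_SITE + "AACCGGTT" + "ACGTACGT" + REV_SITE + "TTT",
    "GG" + FWD_SITE + "AACCGGTT" + "ACGTACGT" + REV_SITE,
    "A" + FWD_SITE + "TTGGCCAA" + "TGCATGCA" + REV_SITE + "C",
    "NOT_AN_AMPLICON",
]
print(count_units(reads, gene_bc, prom_bc))
```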
- detection of one or more barcodes is achieved using a single-cell sequencing approach.
- the copy number of the one or more transcriptional units is determined using quantitative PCR, digital droplet PCR, or hybridization array.
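For quantitative PCR, copy number is commonly estimated with the delta-Ct method relative to a reference locus; for digital droplet PCR, the fraction of positive droplets is Poisson-corrected before taking the target-to-reference ratio. A minimal sketch of both calculations, assuming a reference locus present at `reference_copies` copies per cell (2 for a single-copy gene in a diploid genome, which is itself an assumption for any given cell line) and, for ddPCR, that target and reference are read from the same droplets.

```python
import math


def qpcr_copy_number(ct_target, ct_reference, reference_copies=2, efficiency=1.0):
    """Copies of the transcription unit per cell from Ct values (delta-Ct method).

    efficiency=1.0 means perfect doubling each cycle.
    """
    return reference_copies * (1.0 + efficiency) ** (ct_reference - ct_target)


def ddpcr_copy_number(pos_target, pos_reference, total_droplets, reference_copies=2):
    """Copies per cell from droplet counts, Poisson-corrected for each channel."""
    lam_target = -math.log(1 - pos_target / total_droplets)  # mean copies per droplet
    lam_reference = -math.log(1 - pos_reference / total_droplets)
    return reference_copies * lam_target / lam_reference


print(qpcr_copy_number(ct_target=24.0, ct_reference=25.0))  # 2 * 2**1 = 4.0
```

The Poisson correction matters when many droplets are positive, because some positive droplets then hold more than one template copy.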
- the method further comprises performing one or more single-cell omics analyses on the one or more host cells selected in step (e).
- the one or more single-cell omics analyses comprise single-cell RNA sequencing, single-cell proteomics, single-cell metabolomics, and/or single-cell epigenomics.
- step (e) comprises culturing the population of host cells generated in step (d) under conditions allowing for selection of the one or more host cells that exhibit the desired characteristic.
- the host cells are mammalian cells.
- a method of genetically engineering a cell comprising introducing into the cell the one or more transcriptional units identified according to the method described herein.
- the one or more desired characteristics is selected from protein composition, protein content, DNA composition, DNA content, tolerance to environmental stressor(s), doubling time, prototrophy, biosynthesis capability, cell surface marker(s), cell size, light absorbance, light reflection, fluorescence, light scatter, polarization, one or more electrical properties of the cells, one or more magnetic properties, one or more morphological properties, membrane permeability, membrane fluidity, and/or redox state.
- a library comprising a plurality of transcription units, wherein each transcription unit comprises one or more gene sequence(s) which are placed under the control of a regulatory sequence; and wherein each transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of said gene sequence(s) and a second barcode that uniquely identifies said regulatory sequence.
- one or more transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs a gene product from the gene sequence to a target location in a host cell.
- the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
- the one or more transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
- each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
- the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each transcription unit.
- the 3’ UTR of each transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
- the library comprises about 2-10,000 transcription units. In some embodiments, the library comprises about 2-1000 transcription units. In some embodiments, the library comprises about 8-100 transcription units.
- a library comprising a plurality of vectors, wherein each vector comprises a transcriptional unit selected from the library described herein.
- the vector is a viral vector.
- the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector.
- the retroviral vector is a lentiviral vector.
- the vector is a non-viral vector. In some embodiments, the non-viral vector is a transposon.
- a library comprising a plurality of host cells, wherein each host cell expresses one or more gene product(s) from one or more transcriptional unit(s) selected from the library comprising a plurality of transcription units described herein or comprises a vector from the library of vectors described herein.
- kits comprising a library described herein, and optionally, packaging and/or instructions for using the same.
- the kit further comprises one or more pairs of universal primers.
- Figs. 1A-1B show a schematic representation of the shotgun genetic engineering workflow.
- Fig. 2 is a schematic showing that current methods to engineer mammalian metabolism rely on testing one solution at a time.
- Fig. 3 shows a previously engineered valine biosynthesis solution (pMTIV) in Chinese hamster ovary (CHO) cells (Trolle et al., 2022).
- Fig. 4 shows that engineering valine biosynthesis in CHO enabled growth in valine-free conditions.
- Fig. 5 shows a schematic representation of minimal library design and assembly (LibJTR001).
- Fig. 6 shows that clones generated using shotgun genetic engineering with LibJTR001 outperformed the rationally designed solution engineered in previous work (pMTIV).
- Fig. 7 shows that Amplicon-seq revealed similar barcode signatures in well-performing prototrophic clones derived from LibJTR001 cells.
- Fig. 8 shows that qRT-PCR revealed expression signatures that characterize well-performing prototrophic clones derived from LibJTR001 cells.
- Fig. 9 shows the composition of transcription units in the LibJTR012 library, as determined by Amplicon-seq.
- Fig. 10 shows a valine-free growth assay featuring clones derived from LibJTR012 cells and identified by selection in low valine conditions.
- Fig. 11 shows barcodes contained within clone 012-Val-Dl, the LibJTR012 clone identified as growing the fastest in valine-free medium following low-valine selection. The expectation of finding ilvB, ilvC, ilvD, and ilvN is based on the experiment illustrated in Example 2, which showed that introducing these four genes is sufficient to confer valine prototrophy on CHO cells.
- Fig. 12 shows an isoleucine-free growth assay featuring clones derived from LibJTR012 cells and identified by selection in low-isoleucine conditions, illustrating the growth characteristics of clones exhibiting isoleucine prototrophy across multiple passaging events.
- Fig. 13 shows barcodes contained within clone 012-10p-Hl, a LibJTR012 clone identified as growing quickly in isoleucine-free medium following low isoleucine selection.
- Fig. 14 shows a comparison of the survival advantage of 7 clones, selected on low-isoleucine RPMI medium, in low-isoleucine and low-valine RPMI, respectively.
- the present disclosure provides a novel approach, termed “shotgun genetic engineering”, to engineering complex cellular behaviors in cells (e.g., prokaryotic cells or eukaryotic cells) that combines DNA synthesis, diversity sampling (e.g., phylogenetic or functional diversity sampling), multiplexed high-throughput screens, and evolutionary selection.
- the present approach tackles both the above-mentioned constraints, enabling the sampling of a high number of potential solutions to a cellular engineering problem in a pooled approach while requiring relatively little infrastructure.
- the present disclosure allows millions of potential solutions to be generated and tested at once. Diversity directly relevant to the desired phenotype can be generated along several dimensions, including gene sequence diversity, gene expression diversity, and localization (organellar compartmentalization) diversity. Each genetic solution generated via the present approach is expected to produce far more divergent phenotypic outcomes than the individual or few base-pair mutations introduced at a time by current state-of-the-art approaches. The present approach further enables quick identification of the genetic modifications responsible for the optimized cellular behavior. Altogether, the present approach can enable the engineering of complex cellular behaviors in cells.
- the development of the present technology can have applications in any industry involving the large-scale culturing of cells (e.g., prokaryotic cells or eukaryotic cells) or requiring complex behaviors to be encoded in cells (e.g., prokaryotic cells or eukaryotic cells). This includes, but is not restricted to, the development of cell therapies with complex behaviors, cultured meat products, and the production of biologics and viral vectors.
- Cellular behaviors that could be encoded in cells (e.g., prokaryotic cells or eukaryotic cells) to ease their culturing at scale include improved tolerance to various stress conditions such as shear stress, hypoxia, and variations in pH.
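The scale of the search space follows from simple combinatorics: if each transcription unit independently draws a gene set, a regulatory sequence, and a localization option, the number of distinct units is the product of those choices, and the number of distinct multi-unit genotypes grows much faster still. A minimal sketch with hypothetical sizes (20 genes and 5 regulatory sequences, within the ranges given above; 7 localization options, taken as the six listed targeting signals plus no signal), assuming each cell carries exactly 3 distinct units.

```python
from math import comb


def n_transcription_units(n_genes, n_regulatory, n_targeting=1, genes_per_unit=1):
    """Distinct transcription units when each element is chosen independently."""
    return comb(n_genes, genes_per_unit) * n_regulatory * n_targeting


def n_cell_genotypes(n_units, units_per_cell):
    """Distinct sets of units a single cell could carry (no repeats, order ignored)."""
    return comb(n_units, units_per_cell)


# Hypothetical sizing: 20 genes, 5 regulatory sequences, 7 localization options.
units = n_transcription_units(20, 5, n_targeting=7)
print(units)                       # 700
print(n_cell_genotypes(units, 3))  # 56921900
```

These figures count possible combinations only; how many are actually sampled depends on library coverage and on the number of host cells transduced.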
- Desired cell state and/or cell fate could be introduced in cells (e.g., prokaryotic cells or eukaryotic cells) to generate desired cell types for applications in biomedicine such as cell fate conversion for cell therapy.
- Other advantageous behaviors for culturing cells at scale include reducing the requirement for exogenous supplementation of essential nutrients such as amino acids and growth factors or adapting adherent cells for growth in suspension.
- the term “about” or “approximately” means within a statistically meaningful range of a value. Such a range can be within an order of magnitude, preferably within 50%, more preferably within 20%, still more preferably within 10%, and even more preferably within 5% of a given value or range.
- the allowable variation encompassed by the term “about” or “approximately” depends on the particular system under study, and can be readily appreciated by one of ordinary skill in the art.
- the term “characteristic” as used herein refers to any functional or phenotypic trait of a cell.
- exemplary characteristics of a cell include, but are not limited to, protein composition, protein content, DNA composition, DNA content, RNA composition, RNA content, fatty acid composition, fatty acid content, lipid composition, lipid content, sugar composition, sugar content, tolerance to environmental stressors, doubling time, prototrophy, biosynthesis capability, cell surface markers, cell size, light absorbance, light reflection, fluorescence, light scatter, polarization, electrical properties of the cells, magnetic properties, morphological properties, membrane permeability, membrane fluidity, and redox state.
- a method described herein may identify or engineer any of a number of alternative characteristics in a cell of interest, and that these alternative characteristics are readily amenable to exploitation in the described methods.
- the term “vector”, as used herein, means a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
- the vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome.
- the vector is a non-viral vector, such as a transposon or plasmid.
- the vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
- the vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
- certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as “recombinant expression vectors” (or simply, “expression vectors”).
- regulatory sequence means a nucleic acid sequence which can regulate expression of a gene product operably linked to the regulatory sequence.
- this sequence may be the core promoter sequence and in other instances, this sequence may also include other regulatory elements which are required for expression of the gene product, such as an enhancer, a silencer, an insulator, an operator, and a terminator.
- promoter as used herein is defined as a nucleic acid sequence recognized by the transcriptional machinery of the cell, or introduced transcriptional machinery, required to initiate the specific transcription of a polynucleotide sequence.
- a “constitutive” promoter is a nucleotide sequence which, when operably linked with a polynucleotide which encodes or specifies a gene product, causes the gene product to be produced in a cell under most or all physiological conditions of the cell.
- An “inducible” promoter is a nucleotide sequence which, when operably linked with a polynucleotide which encodes or specifies a gene product, causes the gene product to be produced in a cell substantially only when an inducer which corresponds to the promoter is present in the cell.
- “nucleic acid” and “nucleotide”, used interchangeably herein, encompass both DNA and RNA unless specified otherwise.
- a reference to a nucleotide sequence encompasses its complement unless otherwise specified.
- a reference to a nucleic acid having a particular sequence should be understood to encompass its complementary strand, with its complementary sequence.
- a “library” refers to an isolated collection of at least two elements that differ from one another in at least one aspect.
- a “library of transcription units” is a collection of at least two transcription units that may differ from one another by at least one element, such as a gene sequence, regulatory sequence (e.g., promoter), or organellar targeting signal.
- a “vector library” is a collection of vectors that may differ from one another by at least one element in the vector, such as a gene sequence, regulatory sequence (e.g., promoter), or organellar targeting signal included in a transcription unit comprised within the vector.
- the elements of the library are isolated from like type of elements that are not part of the library (e.g., vectors of a vector library are isolated from vectors that are not part of the library).
- the library may exist in vitro or ex vivo.
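The library definitions above amount to a simple data-structure rule: at least two members, and no two members identical in every element. A minimal sketch of that rule for transcription units, assuming a unit is fully described by its gene set, regulatory sequence, and optional targeting signal; the class and field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen makes units hashable, so duplicates can be detected
class TranscriptionUnit:
    genes: Tuple[str, ...]
    regulatory: str
    targeting: Optional[str] = None


def is_library(units):
    """At least two members, and no two members identical in every element."""
    return len(units) >= 2 and len(set(units)) == len(units)


def varying_elements(units):
    """Names of the elements that differ somewhere across the collection."""
    return [f.name for f in fields(TranscriptionUnit)
            if len({getattr(u, f.name) for u in units}) > 1]


lib = [
    TranscriptionUnit(("ilvC",), "CMV"),
    TranscriptionUnit(("ilvC",), "PGK"),
    TranscriptionUnit(("ilvC",), "PGK", "mitochondrial"),
]
print(is_library(lib), varying_elements(lib))  # True ['regulatory', 'targeting']
```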
- a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell comprising one or more of the following steps: a) providing (i) one or more gene sequences that encode gene product(s) associated with the one or more desired characteristics; and (ii) one or more regulatory sequences; b) generating one or more test transcription units, wherein each test transcription unit comprises one or more gene sequences which are placed under the control of a regulatory sequence, the gene sequence(s) and the regulatory sequence are independently selected from the one or more gene sequences and the one or more regulatory sequences provided in step (a), respectively; and wherein each test transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of the gene sequence(s) and a regulatory sequence barcode that uniquely identifies the regulatory sequence; c) generating one or more vectors, wherein each vector comprises a test transcriptional unit selected from the one or more test transcriptional units; d) introducing the one or more vectors into
- the selection of the population of host cells generated in step (d) can be performed using a cell culture method (e.g., by culturing host cells under condition(s) that allow growth of the host cells that can express desired gene product(s)), a cell sorting method (e.g., fluorescence-activated cell sorting based on a biosensor integrated in the target cells, or magnetic sorting), or an imaging method.
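Culture-based selection works because clones with faster growth under the selective condition take over the pool, which in turn enriches their barcodes. A minimal deterministic sketch of that enrichment, assuming pure exponential growth with no death, nutrient limitation, or genetic drift; the clone names and doubling times are hypothetical.

```python
import math


def select_by_growth(initial_counts, doubling_time_h, hours):
    """Clone frequencies after deterministic exponential growth for `hours`.

    A doubling time of math.inf models cells that cannot grow in the
    selective condition (e.g., non-prototrophs in valine-free medium).
    """
    grown = {
        clone: n * 2 ** (hours / doubling_time_h[clone])
        for clone, n in initial_counts.items()
    }
    total = sum(grown.values())
    return {clone: size / total for clone, size in grown.items()}


freqs = select_by_growth(
    {"fast": 100, "slow": 100, "non_grower": 100},
    {"fast": 24.0, "slow": 48.0, "non_grower": math.inf},
    hours=96,
)
print(freqs)
```

After four days, the 24-hour clone has grown 16-fold, the 48-hour clone 4-fold, and the non-growing clone not at all, so an initially even pool becomes dominated by the fastest grower.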
- the method further comprises a step (g): determining copy number of the one or more transcriptional units identified in step (f), or determining the level of the product(s) encoded by the gene sequence(s) comprised within the one or more test transcriptional units identified in step (f) or corresponding RNAs.
- the regulatory sequence described herein may comprise any expression control sequences including, for example, sequences necessary for appropriate transcription initiation and termination; promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and, when desired, sequences that enhance protein secretion.
- the nature of such regulatory sequences differs depending upon the host cell. For example, in prokaryotes, such regulatory sequences generally include promoter, ribosomal binding site, and transcription termination sequence; in eukaryotes, generally, such regulatory sequences include promoters and transcription termination sequence.
- a regulatory sequence described herein may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
- the regulatory sequence described herein comprises at least a promoter.
- the promoter may be a constitutive or inducible promoter.
- promoters suitable for use in the methods and compositions described herein include, but are not limited to, a cytomegalovirus (CMV) promoter, a simian virus 40 (SV40) promoter, a human elongation factor 1 alpha (EF1α) promoter, a phosphoglycerate kinase (PGK) promoter, a minimal promoter fragment derived from the CMV promoter (minCMV promoter), an RSV LTR, a MoMLV LTR, a CK6 promoter, a transthyretin promoter (TTR), a TK promoter, a tetracycline responsive promoter (TRE), an HBV promoter, an hAAT promoter, an LSP promoter, chimeric liver-specific promoters (LSPs), an E2F promoter, a telomerase (hTERT) promoter, a cytomegalovirus enhancer/chicken beta-actin/Rabbit P
- promoters used in the methods described herein may be an RNA polymerase II promoter. In some embodiments, promoters used in the methods described herein may be an RNA polymerase I promoter. In some embodiments, promoters used in the methods described herein may be an RNA polymerase III promoter.
- one or more test transcriptional units may further comprise a nucleotide sequence encoding an organellar targeting signal which directs the gene product from the gene sequence to a target location (e.g., an organellar or a sub-organellar compartment, cell surface) in the host cell.
- organellar targeting signals include a mitochondrial targeting signal (or a mitochondrial localization signal), an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
- the one or more test transcriptional units may further comprise an organellar targeting signal barcode that uniquely identifies the nucleotide sequence encoding the organellar targeting signal.
- the method may further comprise in step (f) detecting the organellar targeting signal barcode in DNA of one or more host cells selected in step (e) and identifying the nucleotide sequence encoding the organellar targeting signal comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcode.
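Where a targeting signal is placed matters: mitochondrial presequences and ER signal peptides act at the N-terminus, whereas the peroxisomal PTS1 signal is a C-terminal tripeptide such as SKL. A minimal sketch that builds such fusions at the protein level, assuming direct fusion without linkers. The default-terminus table covers only those three classes; for other classes (for example, a nuclear localization sequence such as the SV40 NLS PKKKRKV), the caller must choose the terminus explicitly. The protein sequence is hypothetical, and `str.removeprefix` requires Python 3.9 or later.

```python
# Conventional placement of some well-characterized signal classes (simplified).
DEFAULT_TERMINUS = {
    "mitochondrial": "N",  # N-terminal presequence
    "ER": "N",             # N-terminal signal peptide
    "peroxisome": "C",     # PTS1, a C-terminal tripeptide such as SKL
}


def add_targeting_signal(protein, signal, signal_class, terminus=None):
    """Fuse a targeting signal to a protein sequence (one-letter amino acid codes)."""
    terminus = terminus or DEFAULT_TERMINUS.get(signal_class)
    if terminus == "N":
        # Keep a single initiator methionine at the start of the fusion.
        return "M" + signal.removeprefix("M") + protein.removeprefix("M")
    if terminus == "C":
        return protein + signal
    raise ValueError(f"specify terminus='N' or 'C' for {signal_class!r}")


print(add_targeting_signal("MAGK", "SKL", "peroxisome"))                 # MAGKSKL
print(add_targeting_signal("MAGK", "PKKKRKV", "nuclear", terminus="N"))  # MPKKKRKVAGK
```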
- gene sequences used in the methods or compositions described herein may comprise at least one gene sequence that is not naturally present in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein may comprise at least one gene sequence that is the same as or derived from a gene sequence naturally present in the host cells.
- gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is derived from a different species than the host cells.
- gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is codon-optimized for expression in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is not codon-optimized for expression in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein comprise gene sequences that are not codon-optimized for expression in the host cells.
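One simple codon-optimization strategy back-translates each residue using the host's most frequent codon for that amino acid. The sketch below uses a small, hypothetical usage table covering only a few residues. Practical tools also balance GC content and avoid repeats and unwanted sequence motifs such as restriction sites used for cloning, so this should be read as an illustration of the idea rather than a complete optimizer.

```python
# Hypothetical codon-usage frequencies for a handful of amino acids; a real
# design would use a complete usage table for the actual host.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTG": 0.40, "CTC": 0.20, "TTG": 0.13, "CTT": 0.13, "TTA": 0.07, "CTA": 0.07},
    "S": {"AGC": 0.24, "TCC": 0.22, "TCT": 0.19, "TCA": 0.15, "AGT": 0.15, "TCG": 0.05},
    "*": {"TGA": 0.50, "TAA": 0.30, "TAG": 0.20},
}


def most_frequent_codon_design(protein, usage=CODON_USAGE):
    """Back-translate a protein using the most frequent codon for each residue."""
    codons = []
    for residue in protein:
        if residue not in usage:
            raise KeyError(f"no codon usage data for residue {residue!r}")
        table = usage[residue]
        codons.append(max(table, key=table.get))
    return "".join(codons)


print(most_frequent_codon_design("MKLS*"))  # ATGAAGCTGAGCTGA
```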
- gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a genome engineering system and/or a component (e.g., a gene editing nuclease or variants thereof, a guide nucleic acid) thereof.
- a genome engineering system can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), meganucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
- the genome engineering system used herein is a CRISPR activation (CRISPRa) or CRISPR inhibition (CRISPRi) system.
- the genome engineering systems used herein may be designed to specifically target promoters or coding sequences of genes (e.g., genes that encode gene products that may be associated with the one or more desired characteristics) to change their expression or sequence.
- gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a negative control.
- the negative control may be a detectable marker, such as a reporter protein.
- a reporter protein may produce a detectable signal which allows detection of the target cell or tissue.
- reporter proteins include, without limitation, β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent proteins, orange fluorescent proteins, chloramphenicol acetyltransferase (CAT), luciferase, membrane-bound proteins (for example, CD2, CD4, CD8, and the influenza hemagglutinin protein) for which high-affinity antibodies exist or can be produced by conventional means, and fusion proteins comprising a membrane-bound protein appropriately fused to an antigen tag domain from, among others, hemagglutinin or Myc.
- the detectable marker is a fluorescent protein, such as a green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein, orange fluorescent protein, and the like.
- gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a synthetic protein.
- the number of gene sequences used in the methods or compositions described herein can be any integral number. In some embodiments, the number of gene sequences used in the methods or compositions described herein is any integral number from 1 to about 10,000.
- the number of gene sequences used in the methods or compositions described herein is any number from 1 to 5000, from 1 to 2500, from 1 to 1000, from 1 to 500, from 1 to 250, from 2 to 100, from 2 to 80, from 2 to 50, from 2 to 30, from 2 to 20, from 2 to 10, from 3 to 100, from 3 to 80, from 3 to 50, from 3 to 30, from 3 to 20, from 3 to 10, from 4 to 100, from 4 to 80, from 4 to 50, from 4 to 30, from 4 to 20, from 4 to 10, from 5 to 100, from 5 to 80, from 5 to 50, from 5 to 30, from 5 to 20, from 5 to 15, or from 5 to 10.
- the number of gene sequences used in the methods or compositions described herein is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 250, about 500, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000.
- the number of regulatory sequences used in the methods or compositions described herein can be any integral number.
- the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 10,000. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 5000. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 2500. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 1000. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 500.
- the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 250. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 100. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number from 1 to about 50.
- the number of regulatory sequences used in the methods or compositions described herein is any number from 2 to 40, from 2 to 30, from 2 to 20, from 2 to 10, from 2 to 8, from 2 to 5, from 3 to 40, from 3 to 30, from 3 to 20, from 3 to 10, from 3 to 8, from 3 to 6, from 4 to 40, from 4 to 30, from 4 to 20, from 4 to 10, from 4 to 8, from 5 to 40, from 5 to 30, from 5 to 20, or from 5 to 10.
- the number of regulatory sequences used in the methods or compositions described herein is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 250, about 500, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000.
- the methods or compositions described herein involve the use of a viral vector.
- viral vectors suitable for use in the methods or compositions described herein include an adenoviral vector, a retroviral vector (e.g., lentiviral vector), a herpes viral vector, a baculoviral vector, and/or an adeno-associated viral (AAV) vector.
- the viral vector used herein is a lentiviral vector.
- the methods or compositions described herein may also include the use of a non-viral vector.
- the non-viral vector is a transposon.
- Exemplary transposons that may be used in the present methods include PiggyBac transposons and Sleeping Beauty transposons.
- cloning techniques may be used to generate the transcription units described herein.
- suitable techniques include, but are not limited to, Golden Gate cloning, Gibson assembly, Gateway cloning, In-Fusion cloning, yeast assembly, or restriction cloning.
- transcription units described herein are generated using a Golden Gate cloning technique.
- a library of vectors carrying barcoded regulatory sequences and/or nucleotide sequences encoding organellar targeting signals also carries Type IIS restriction sites, which enable the insertion of any member of a pool of barcoded gene sequences that are likewise flanked by Type IIS restriction sites.
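The Type IIS insertion step can be illustrated with a toy simulation at the level of sequence strings. This is a minimal sketch, not the method of this application. It assumes BsaI as the Type IIS enzyme (recognition site GGTCTC, which cuts 1 nt and 5 nt 3' of the site to leave a 4-nt overhang). It ignores the acceptor backbone and barcodes, and all part sequences, overhangs, and the helper names `digest` and `assemble` are hypothetical.

```python
BSAI_SITE = "GGTCTC"     # BsaI recognition site (top strand); assumed enzyme
BSAI_SITE_RC = "GAGACC"  # the same site in reverse orientation


def digest(part: str) -> tuple[str, str, str]:
    """Cut a part flanked by inward-facing BsaI sites.

    Expected layout: GGTCTC N [4-nt overhang] body [4-nt overhang] N GAGACC.
    Returns (left_overhang, body, right_overhang) as top-strand sequences.
    """
    i = part.find(BSAI_SITE)
    j = part.rfind(BSAI_SITE_RC)
    if i < 0 or j < 0 or j - 5 < i + 11:
        raise ValueError("part needs an inward-facing BsaI site pair")
    left = part[i + 7 : i + 11]   # overhang starts 1 nt after the forward site
    right = part[j - 5 : j - 1]   # overhang ends 1 nt before the reverse site
    body = part[i + 11 : j - 5]
    return left, body, right


def assemble(parts: list[str], start: str, end: str) -> str:
    """Join digested parts whose 4-nt overhangs match, from `start` to `end`."""
    by_left = {}
    for part in parts:
        left, body, right = digest(part)
        by_left[left] = (body, right)  # assumes each left overhang is unique
    product, overhang = start, start
    while overhang != end:
        body, overhang = by_left[overhang]  # KeyError if the chain is broken
        product += body + overhang
    return product


# Hypothetical parts: the promoter ends in the overhang that the CDS starts with.
promoter = "GGTCTCA" + "GGAG" + "TTGACA" + "AATG" + "A" + "GAGACC"
cds = "GGTCTCA" + "AATG" + "GCTAGC" + "GCTT" + "A" + "GAGACC"

print(assemble([cds, promoter], start="GGAG", end="GCTT"))
```

The junctions are set by the overhangs rather than by the order of the input parts. As a result, every promoter that shares a given pair of overhangs is interchangeable in a pooled reaction, which is what allows random promoter/CDS combinations to form.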
- the barcodes described herein may be present anywhere in the transcription unit.
- the barcodes may be present in a 3’ untranslated region (UTR) or 5’ UTR of each test transcription unit, or in an intron engineered into the transcription unit.
- the barcodes are present in the 3’ UTR of each test transcription unit.
- the transcription units further comprise a pair of universal primer binding sites flanking the barcodes.
- the detection of one or more barcodes may be achieved using polymerase chain reaction (PCR) with one or more pairs of universal primers followed by sequencing of PCR product(s).
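Barcode detection by PCR and sequencing comes down to locating the universal primer sites in each read and reading the barcodes between them. This is a minimal sketch under stated assumptions. The primer-site sequences, the 8-nt barcode length, the layout (regulatory barcode immediately followed by gene barcode), and the lookup tables are all hypothetical. The sketch also performs exact matching only, with no error correction.

```python
from collections import Counter
from typing import Optional

# Hypothetical universal primer-site sequences flanking the barcode region.
FWD_SITE = "CTACACGAC"
REV_SITE = "AGATCGGAAG"
BC_LEN = 8


def parse_read(read: str) -> Optional[tuple[str, str]]:
    """Return (regulatory_barcode, gene_barcode), or None for malformed reads.

    Assumes the layout FWD_SITE + regulatory barcode + gene barcode + REV_SITE.
    """
    start = read.find(FWD_SITE)
    if start < 0:
        return None
    start += len(FWD_SITE)
    end = read.find(REV_SITE, start)
    if end < 0 or end - start != 2 * BC_LEN:
        return None
    region = read[start:end]
    return region[:BC_LEN], region[BC_LEN:]


def count_tus(reads, reg_lookup: dict, gene_lookup: dict) -> Counter:
    """Tally reads per (regulatory sequence, gene) pair; unknown barcodes are skipped."""
    counts = Counter()
    for read in reads:
        parsed = parse_read(read)
        if parsed is None:
            continue
        reg_bc, gene_bc = parsed
        if reg_bc in reg_lookup and gene_bc in gene_lookup:
            counts[(reg_lookup[reg_bc], gene_lookup[gene_bc])] += 1
    return counts


reg_lookup = {"ACGTACGT": "promoter_1", "TTGGCCAA": "promoter_2"}  # hypothetical
gene_lookup = {"GATTACAG": "ilvD", "CCATGGTT": "ilvB"}            # hypothetical
reads = [
    "TT" + FWD_SITE + "ACGTACGT" + "GATTACAG" + REV_SITE + "GC",
    "TT" + FWD_SITE + "ACGTACGT" + "GATTACAG" + REV_SITE,
    "TT" + FWD_SITE + "TTGGCCAA" + "CCATGGTT" + REV_SITE,
    "TT" + FWD_SITE + "ACGTAC" + REV_SITE,  # truncated barcode region: dropped
]
print(count_tus(reads, reg_lookup, gene_lookup))
```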
- the detection of one or more barcodes may be achieved using direct whole genome sequencing.
- Copy number of transcriptional units may be determined using any methods known in the art, such as quantitative PCR, digital droplet PCR, or hybridization array.
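For digital droplet PCR, copy number follows from Poisson statistics. If a fraction p of droplets is positive, the mean number of copies per droplet is -ln(1 - p). The ratio of this value for a TU amplicon to the same value for a reference locus then estimates TU copies per genome. This is a minimal sketch, assuming duplex measurement in the same droplets. The default of 2 reference copies per genome is a hypothetical choice, since CHO and other cell lines are not uniformly diploid, and the droplet counts are illustrative.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean target copies per droplet from the Poisson zero term: -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1 - positive / total)


def transgene_copy_number(target_pos: int, ref_pos: int, total: int,
                          ref_copies_per_genome: float = 2.0) -> float:
    """Copies of a TU per genome, relative to a reference locus.

    Assumes target and reference are read in the same droplets (duplex) and
    that the reference locus is present at `ref_copies_per_genome` copies.
    """
    lam_target = copies_per_partition(target_pos, total)
    lam_ref = copies_per_partition(ref_pos, total)
    if lam_ref == 0:
        raise ValueError("reference has no positive droplets")
    return lam_target / lam_ref * ref_copies_per_genome


# Hypothetical counts from one duplex well of 15,000 droplets.
print(round(transgene_copy_number(6000, 3000, 15000), 2))
```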
- the method may further comprise performing one or more single-cell omics analyses on one or more host cells selected in step (e).
- the single-cell omics analyses may include single-cell RNA sequencing, single-cell proteomics, single-cell metabolomics, and/or single-cell epigenomics.
- a method of genetically engineering a cell by introducing into the cell the one or more transcriptional units identified according to the shotgun genetic engineering method described herein.
- Bacterial cells may be preferred prokaryotic cells in some circumstances.
- the bacterial cells may be a strain of E. coli, such as the E. coli strains DH5, RR1, Stbl4™, and NEB® Stable, or a strain of Vibrio natriegens.
- Non-limiting examples of eukaryotic cells include yeast, insect, plant, and mammalian cells (e.g., cells from mouse, rat, monkey, or human cell lines).
- yeast cells include, e.g., BY4741, YPH499, YPH500, and YPH501.
- Non-limiting examples of mammalian cells include Chinese hamster ovary (CHO) cells, NIH Swiss mouse embryo cells NIH/3T3, monkey kidney-derived COS-1 cells, and 293 cells, which are human embryonic kidney (HEK) cells.
- Examples of insect cells include Sf9 cells, which can be transfected with baculovirus expression vectors.
- the method described herein is used to engineer mammalian cells.
- the mammalian cells used in the methods described herein are Chinese hamster ovary (CHO) cells, human embryonic kidney (HEK) cells, or stem cells such as induced pluripotent stem cells (iPSCs) or mouse embryonic stem (ES) cells.
- the method described herein is used to engineer plant cells.
- the method described herein is used to engineer insect cells.
- Viral delivery mechanisms include, but are not limited to, retroviral vectors (e.g., lentiviral vectors), adenoviral vectors, adeno-associated viral (AAV) vectors, herpes viral vectors, and baculoviral vectors, as described above.
- Non-viral delivery mechanisms include transposon-mediated gene delivery, lipid-mediated transfection, liposomes, immunoliposomes, lipofection, cationic facial amphiphiles (CFAs), and combinations thereof.
- Successfully transformed or transduced cells, i.e., cells that contain a transcription unit of the present disclosure, can be identified by, for example, PCR. Alternatively, the presence of the gene products (e.g., proteins) in the supernatant can be detected using, for example, antibodies.
- the term “isolated” means that the referenced material (e.g., a cell or virus) is removed from its native environment. Thus, an isolated biological material can be free of some or all cellular components, i.e., components of the cells in which the native material occurs naturally (e.g., cytoplasmic or membrane components). A material shall be deemed isolated if it is present in a cell extract or supernatant.
- an isolated nucleic acid includes, without limitation, a PCR product, an isolated RNA (e.g., mRNA), a DNA (e.g., cDNA), or a restriction fragment.
- Isolated nucleic acid molecules include sequences inserted into plasmids, cosmids, artificial chromosomes, and the like, i.e., sequences that form part of a chimeric recombinant nucleic acid construct.
- a recombinant nucleic acid is an isolated nucleic acid.
- An isolated protein may be associated with other proteins or nucleic acids, or both, with which it associates in the cell, or with cellular membranes if it is a membrane-associated protein.
- An isolated organelle, cell, or tissue is removed from the anatomical site in which it is found in an organism.
- An isolated material may be, but need not be, purified.
- purified refers to material that has been isolated under conditions that reduce or eliminate the presence of unrelated materials, i.e., contaminants, including native materials from which the material is obtained.
- a purified virus is preferably substantially free of host cell or culture components, including tissue culture or egg proteins, non-specific pathogens, and the like.
- substantially free is used operationally, in the context of analytical testing of the material.
- purified material substantially free of contaminants is at least 50% pure; more preferably, at least 90% pure, and still more preferably at least 99% pure. Purity can be evaluated by chromatography, gel electrophoresis, immunoassay, composition analysis, biological assay, and other methods known in the art.
- Viral particles can be purified by ultrafiltration through sucrose cushions or by ultracentrifugation, preferably continuous centrifugation (see Furminger, In: Nicholson, Webster and May (eds.), Textbook of Influenza, Chapter 24, pp. 324-332). Other purification methods are possible and contemplated herein.
- a purified material may contain less than about 50%, preferably less than about 75%, and most preferably less than about 90%, of the cellular components, media, proteins, or other undesirable components or impurities (as context requires), with which it was originally associated.
- the term “substantially pure” indicates the highest degree of purity which can be achieved using conventional purification techniques known in the art.
- each transcription unit may comprise one or more gene sequence(s) which are placed under the control of a regulatory sequence. Further, each transcription unit may further comprise one or more gene sequence barcodes each of which uniquely identifies one of the gene sequence(s) and a second barcode that uniquely identifies the regulatory sequence.
- the regulatory sequence described herein may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator. In some embodiments, the regulatory sequence described herein comprises at least a promoter.
- one or more transcriptional units may further comprise a nucleotide sequence encoding an organellar targeting signal, which directs a gene product from the gene sequence to a target location (e.g., an organellar or sub-organellar compartment, or the cell surface) in a host cell.
- the organellar targeting signal may be selected from, for example, a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
- the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies the nucleotide sequence encoding the organellar targeting signal.
- the barcodes may be present anywhere in the transcription unit.
- the barcodes may be present in a 3’ untranslated region (UTR) or 5’ UTR of each test transcription unit, or in an intron engineered into the transcription unit.
- the barcodes are present in the 3’ UTR of each test transcription unit.
- each transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
- a library of transcriptional units may comprise about 2 to 10,000 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 5000 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 2500 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 1000 transcription units.
- the library comprises about 4 to 800, about 4 to 720, about 4 to 600, about 4 to 500, about 4 to 400, about 4 to 300, about 4 to 250, about 4 to 200, about 4 to 160, about 4 to 120, about 4 to 100, about 4 to 80, about 4 to 60, about 4 to 40, about 4 to 30, about 4 to 20, about 4 to 18, about 4 to 15, about 4 to 12, about 4 to 10, about 6 to 800, about 6 to 720, about 6 to 600, about 6 to 500, about 6 to 400, about 6 to 300, about 6 to 250, about 6 to 200, about 6 to 160, about 6 to 120, about 6 to 100, about 6 to 80, about 6 to 60, about 6 to 40, about 6 to 30, about 6 to 20, about 6 to 18, about 6 to 15, about 6 to 12, about 6 to 10, about 8 to 800, about 8 to 720, about 8 to 600, about 8 to 500, about 8 to 400, about 8 to 300, about 8 to 250, about 8 to 200, about 8 to 160, about 8 to 120, about 8 to 100, about 8 to 160,
- the library comprises about 4, about 6, about 8, about 10, about 12, about 15, about 18, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 120, about 150, about 160, about 180, about 200, about 250, about 300, about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 900, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000 transcription units.
- a library comprising a plurality of vectors, wherein each vector comprises a transcriptional unit selected from the library of transcription units described herein.
- vectors included in the vector library described herein are viral vectors.
- suitable viral vectors include an adenoviral vector, a retroviral vector (e.g., lentiviral vector), a herpes viral vector, a baculoviral vector, and/or an adeno-associated viral (AAV) vector.
- the viral vector used herein is a lentiviral vector.
- vectors included in the vector library described herein are non-viral vectors.
- the non-viral vector is a transposon.
- Exemplary transposons include PiggyBac transposons and Sleeping Beauty transposons.
- a library comprising a plurality of host cells, wherein each host cell expresses one or more gene product(s) from one or more transcriptional unit(s) selected from the library comprising a plurality of transcription units described herein or comprises a vector from the library of vectors described herein.
- the library comprising a plurality of host cells can be referred to as a “host cell library”.
- a “host cell library” is a collection of host cells that may differ from one another by at least one element in the host cells, such as a gene sequence, regulatory sequence (e.g., promoter), or organellar targeting signal expressed from the transcription unit comprised within the host cells.
- a library comprising a plurality of regulatory sequences, each of which is operably linked to a nucleotide sequence encoding an organellar targeting signal.
- the regulatory sequence may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
- the regulatory sequence described herein comprises at least a promoter, such as a promoter described herein.
- the organellar targeting signal may be selected from those known in the art and described herein, such as a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
- a kit comprising a library described herein, and optionally, packaging and/or instructions for using the same.
- the kit may further comprise one or more pairs of universal primers.
- the kit may further comprise enzymes or buffers that may be necessary to carry out one or more of the method steps. Kits may contain a single container that contains the compositions with or without other components (e.g., primers, enzymes, buffers) or may have a separate container for each component.
- a “shotgun” genetic engineering methodology employing a synthetic library screening approach was designed to dramatically expand the scale and speed with which the possible solution space can be sampled, testing “pools” of solutions rather than sampling individual solutions in parallel as in the current state of the art (Figs. 1A-1B).
- a set of coding sequences (CDSes) relevant to the desired cellular behavior to be engineered is identified from across the phylogenetic tree.
- a set of coding sequences (CDSes) relevant to multiple metabolic solutions that perform overlapping functions can be screened to add functional diversity. These are sourced from a divergent set of species spanning broad phylogenetic diversity in order to increase the solution space.
- the set of species may encompass any known species.
- the coding sequences may encompass any synthetic sequences, including sequences that may not exist in nature. Each of these sequences is codon-optimized for expression in the intended host cell and synthesized de novo.
- these genes can also include sequences that can be used to modify the existing cellular environment (e.g., CRISPRa/CRISPRi components).
- a set of genes that are not expected to contribute to the cellular behavior (e.g., GFP) is also included in the pool as negative controls.
- a set of regulatory sequences, such as promoters (e.g., strong, medium, or weak promoters) that drive varying levels of gene expression, is assembled.
- Each promoter will be encoded in the library as a stand-alone version as well as in versions with organellar targeting signals (OTSes) attached, which allows for the specific targeting of gene products to different organellar compartments within the cell including, for example, the nucleus (with a nuclear localization signal or sequence (NLS)), mitochondria (with a mitochondrial localization signal (MLS)), and the endoplasmic reticulum.
- each promoter/OTS combination and each gene product is associated with a unique 8 base-pair DNA barcode that facilitates downstream analysis.
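Unique 8-bp barcodes are easier to decode reliably when they also differ from one another at several positions. The source states only that the barcodes are unique. The Hamming-distance check and single-mismatch correction below are therefore a common design choice offered as an assumption, not part of the described method, and the barcode set is hypothetical. A set whose minimum pairwise distance is at least 3 allows any single substitution to be assigned unambiguously.

```python
from itertools import combinations
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correct(observed: str, barcodes: list[str], max_mismatch: int = 1) -> Optional[str]:
    """Return the single barcode within `max_mismatch` of `observed`, else None."""
    hits = [bc for bc in barcodes if hamming(observed, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


barcodes = ["ACGTACGT", "TGCATGCA", "GGCCAATT"]  # hypothetical 8-bp set
assert all(len(bc) == 8 for bc in barcodes)
print(min_pairwise_distance(barcodes))
print(correct("ACGTACGA", barcodes))
```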
- a Golden Gate cloning strategy is then employed to generate a library of Transcription Units (TUs) that encode random combinations of promoters/OTSes and CDSes. For instance, starting from an initial pool of 5 promoters, each available with 2 OTS options (none or mitochondrial), and 10 CDSes, 100 different TUs are generated in a pool (5 x 2 x 10).
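The size of the design space is the product of the part counts, which can be checked by enumerating every combination. This is a minimal sketch in which all part names are hypothetical. The `expected_coverage` helper estimates how many of the designs are expected to appear among a given number of independently sampled assemblies. It assumes every TU is equally represented, which pooled libraries only approximate, so representation is in practice confirmed by sequencing.

```python
from itertools import product


def design_space(promoters, otses, cdses):
    """All promoter/OTS/CDS combinations that a pooled assembly could produce."""
    return list(product(promoters, otses, cdses))


def expected_coverage(n_members: int, n_sampled: int) -> float:
    """Expected fraction of members seen at least once in n_sampled random draws,
    assuming every member is equally likely to be drawn."""
    return 1 - (1 - 1 / n_members) ** n_sampled


promoters = [f"P{i}" for i in range(1, 6)]    # 5 hypothetical promoters
otses = ["none", "mito"]                       # no OTS, or a mitochondrial OTS
cdses = [f"CDS{i:02d}" for i in range(1, 11)]  # 10 hypothetical CDSes

library = design_space(promoters, otses, cdses)
print(len(library))  # 5 x 2 x 10
print(expected_coverage(len(library), 1000))
```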
- the acceptor vector in this Golden Gate cloning strategy contains lentiviral sequences that allow lentiviral integration of each TU into the target cell type. Importantly, the cloning strategy is designed not to affect the expression of the TUs in cells.
- Scars such as the restriction enzyme sites that enable Golden Gate cloning and the barcodes that enable downstream analysis are located in the 3’ UTR of each TU.
- the 3’ UTR also contains universal primer binding sites for amplifying across the promoter and gene barcodes, so that the composition of each TU can be identified by deep next-generation sequencing of PCR products that are all the same size.
- a pool of lentivirus from this library of TUs is then generated and used to infect cells at varying multiplicity of infection (MOI). This leads to the random integration of varying combinations of TUs in each individual cell, thereby generating a pool of cells each bearing a unique combination of TUs.
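The effect of varying the MOI can be approximated with a Poisson model of independent infection events. This is a minimal sketch under that assumption. It treats the MOI as the mean number of integrated TUs per cell and each integration as an independent random draw from the pool. Real integration is also shaped by titer accuracy, cell state, and vector silencing, none of which are modeled here.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """P(a cell receives exactly k integrations) under a Poisson model."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_transduced(moi: float) -> float:
    """Fraction of cells with at least one integration: 1 - P(0)."""
    return 1 - math.exp(-moi)


def mean_tus_per_transduced_cell(moi: float) -> float:
    """Expected integrations per cell, conditioned on at least one."""
    return moi / fraction_transduced(moi)


for moi in (0.3, 1.0, 3.0):
    dist = [round(poisson_pmf(k, moi), 3) for k in range(5)]
    print(moi, round(fraction_transduced(moi), 3),
          round(mean_tus_per_transduced_cell(moi), 2), dist)
```

Raising the MOI increases the share of cells carrying several TUs at once, and that is the setting in which combinations of TUs can be tested in a single cell.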
- the pool of cells is then challenged with a selective condition such that only cells containing a functional program gain a growth advantage.
- DNA is then isolated from the resulting clones. PCR is performed using universal primer pairs, and the resulting product is deep sequenced. The sequencing reads indicate the identity and relative copy number of each TU present in a clone that confers the selective advantage.
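Because every amplicon is the same size, read counts within one clone can serve as a rough proxy for relative TU copy number. This is a minimal sketch, assuming equal amplification and sequencing efficiency across amplicons. The `min_reads` background threshold and the clone counts are hypothetical. Counts are scaled to the least abundant retained TU so that integer copy differences become visible.

```python
def relative_copy_number(counts: dict, min_reads: int = 10) -> dict:
    """Scale per-TU read counts in one clone to the least abundant retained TU.

    TUs below `min_reads` are treated as background and dropped. Assumes all
    amplicons amplify and sequence with equal efficiency.
    """
    kept = {tu: n for tu, n in counts.items() if n >= min_reads}
    if not kept:
        return {}
    floor = min(kept.values())
    return {tu: n / floor for tu, n in kept.items()}


clone_counts = {  # hypothetical read counts for one selected clone
    ("promoter_1", "ilvD"): 3000,
    ("promoter_1", "ilvB"): 1000,
    ("promoter_2", "ilvC"): 1020,
    ("promoter_2", "GFP"): 3,
}
for tu, ratio in relative_copy_number(clone_counts).items():
    print(tu, round(ratio, 2))
```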
- the present shotgun genetic engineering workflow can also be used to go beyond deducing the identity of the optimal solution.
- this approach can allow for the possibility of performing single-cell RNA sequencing (and other single cell omics readouts) on cells that grow well in the selective condition allowing for added layers of data integration that can inform the cellular engineering effort.
- Another possibility can be the incorporation of synthetic proteins into the present coding sequence libraries.
- An increasing number of efforts towards predictive protein engineering have shown success in recent years though no extensive tests have been performed to sample a large library of such proteins.
- Plasmids encoding these 4 transcription units were pooled and collectively transfected/packaged in HEK293Ts to generate a pool of lentivirus carrying all 4 TUs. This lentivirus was used to transduce CHO cells, which were then subjected to selection in reduced-valine medium. Most cells died during selection, but prototrophic colonies formed and were expanded over 32 days of selection. Three such colonies were picked and passaged in separate wells.
- qRT-PCR was performed to determine how the copy number variation of each gene in the pathway translates into mRNA expression levels, a first step towards reconstituting the ideal solution composition without the use of lentivirus (Fig. 8).
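The text does not state which qRT-PCR quantification model was used. One common option is the 2^-ΔΔCt method, which compares a target gene, normalized to a reference gene, between a sample and a calibrator. This is a minimal sketch of that method, assuming roughly 100% amplification efficiency. The calibrator could be, for example, the pMTIV reference levels referred to in this example. The reference gene and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target: float, ct_reference: float,
                     ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Relative expression by the 2^-ΔΔCt method (assumes ~100% PCR efficiency)."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: a selected clone vs. a calibrator population.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))
```

A value above 1 indicates higher expression in the clone than in the calibrator, and a value below 1 indicates lower expression.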
- These data confirmed that increased ilvD expression strongly supports improved valine prototrophic outcomes, and that increased ilvB expression does the same, though to a lesser degree.
- ilvC levels were down in selected prototrophic clones and populations relative to pMTIV ilvC levels.
- ilvN levels were either level with or down relative to pMTIV ilvN levels, suggesting that these proteins are less important for enabling valine prototrophy in the context of CHO metabolism.
- this data illustrated the utility of the “shotgun” approach to solution engineering. Rather than assembling and delivering large DNA cargos at low efficiency, a methodology was designed herein that allows for readout of functional solutions to encoding novel behaviors in target cells (e.g., prokaryotic cells or eukaryotic cells) by delivering smaller DNA cargos at high efficiency. Moreover, this data demonstrated how improved outcomes can be generated simply by varying the gene dosage of individual solution components.
- the complexity of the introduced library can be expanded from 4 members to 20 members (2 promoters x 2 localizations x 5 CDS) and eventually to 100 members (5 promoters x 2 localizations x 10 CDS), which will allow diversity to be sampled across promoters and in how CDSes are localized within the cell.
- Utilizing these higher complexity libraries is anticipated to demonstrate the engineering of solutions that have not been previously engineered, enabling functionalities that have not been engineered in target cells (e.g., prokaryotic cells or eukaryotic cells) before.
- LibJTR012, a 40-member library comprising 2 promoters, 2 organellar localization signals, and 10 CDSes (Table 1), was designed and assembled. This library was intended to confer both valine and isoleucine prototrophy on mammalian cells. Following assembly of the LibJTR012 DNA library, the barcodes contained within the library were sequenced, and each intended transcription unit was found to be represented (Fig. 9).
- Example 4: Engineering novel complex phenotypes with Shotgun Genetic Engineering
[00140] From LibJTR012 populations subjected to 15 days of low-isoleucine selection, 48 clones were picked and passaged, and all of them were subjected to a functional growth assay in isoleucine-free medium. 9 of the 48 clones exhibited growth across several passaging events in isoleucine-free medium. Of the 9 clones exhibiting isoleucine prototrophy, the fastest-growing clone in isoleucine-free conditions was clone 012-10p-G7, which exhibited a doubling time of 2.1 days (Fig. 12). For comparison, no isoleucine-prototrophic CHO cells have previously been engineered using rational design principles, despite many attempts.
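A doubling time such as the 2.1 days/doubling reported for clone 012-10p-G7 is conventionally derived from two cell counts under an exponential-growth assumption: t_d = t · ln 2 / ln(N_t / N_0). This is a minimal sketch of that formula. The counts below are hypothetical and are not data from this application.

```python
import math


def doubling_time(n0: float, nt: float, elapsed_days: float) -> float:
    """Days per doubling, assuming exponential growth between two cell counts."""
    if n0 <= 0 or nt <= n0:
        raise ValueError("need 0 < n0 < nt for a growing culture")
    return elapsed_days * math.log(2) / math.log(nt / n0)


# Hypothetical counts (not data from this application): 1e5 -> 4e5 cells in 5 days.
print(round(doubling_time(1e5, 4e5, 5.0), 2))
```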
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Plant Pathology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Microbiology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Virology (AREA)
- Medicinal Chemistry (AREA)
- General Chemical & Material Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
This application provides a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell. This application also provides a method of genetically engineering a cell based on the identified transcriptional units. This application further provides libraries involved in the methods described herein, for example, libraries of transcription units and/or libraries of vectors.
Description
SHOTGUN GENETIC ENGINEERING
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. Provisional Application No. 63/533,483, filed August 18, 2023, the disclosure of which is herein incorporated by reference in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under HG009491 awarded by National Institutes of Health. The government has certain rights in the invention.
FIELD OF THE INVENTION
[0003] This application relates to a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell. This application also relates to a method of genetically engineering a cell based on the identified transcriptional units. This application further relates to libraries involved in the methods described herein, for example, libraries of transcription units, libraries of vectors, and/or libraries of host cells.
BACKGROUND
[0004] The engineering of novel cellular states and behaviors requires the optimization of many parameters: (1) the identification of a set of instructions (enzymes, structural proteins, etc.) that can function together to produce the target behavior in the desired cell type, (2) the fine-tuning of relative expression levels and localization of these identified instructions to improve efficiency, and (3) the optimization of the cellular milieu to enable those gene products to perform optimally. Because many variables are involved in this process, complex behaviors requiring more gene products are difficult to encode. There is an almost infinite number of potential solutions to the problem but no efficient means to rank them. The result is a tedious and time-consuming workflow in which individual solutions are tested one at a time and then iteratively edited before a working solution is discovered.
[0005] One approach to tackling this “knowledge constraint” is directed evolution. Typically, in these approaches, an imperfect set of gene products and regulatory elements which control the expression of those gene products (e.g., a metabolic pathway) is assembled. This assembly is then transformed into a cell alongside a mutagenic agent, which generates a high number of mutated versions of the gene products and regulatory elements. An environmental stressor which favors growth of cells encoding the desired cellular behavior is then applied, allowing cells with advantageous mutations to outperform cells without them.
[0006] While these approaches can potentially generate more optimized cellular behaviors, they are generally constrained by two factors. (1) The mutagenic agents used in these approaches tend to make only small (e.g., single base-pair) modifications to the gene sequences, which limits the amount of DNA sequence diversity explored in any one experiment. (2) When a number of mutations collectively confer a more optimized cellular behavior, typically only a subset of them makes a meaningful difference to the performance of that behavior. It is difficult and time-consuming to identify which mutations make the meaningful difference and which are simply “passenger” mutations that happened to arise alongside them. Further, these methods have largely been applied only to behaviors encoded by one or two genes, limiting the complexity of engineering that can be performed.
SUMMARY
[0007] As explained in the Background section above, there is an unmet need for a method that allows millions of potential solutions to a cellular engineering problem to be rapidly generated and tested at once.
[0008] In one aspect, provided herein is a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell, comprising: a) providing (i) one or more gene sequences that encode gene product(s) associated with the one or more desired characteristics; and (ii) one or more regulatory sequences; b) generating one or more test transcription units, wherein each test transcription unit comprises one or more gene sequences which are placed under the control of a regulatory sequence, said gene sequence(s) and said regulatory sequence being independently selected from the one or more gene sequences and the one or more regulatory sequences provided in step (a), respectively; and wherein each test transcription unit further comprises one or more gene sequence barcodes, each of which uniquely identifies one of said gene sequence(s), and a regulatory sequence barcode that uniquely identifies said regulatory sequence; c) generating one or more vectors, wherein each vector comprises a test transcriptional unit selected from the one or more test transcriptional units; d) introducing the one or more vectors into a population of host cells, wherein the host cells are suitable for expression of gene product(s) from the test transcriptional unit(s); e) selecting from the population of host cells generated in step (d) one or more host cells that exhibit the one or more desired characteristics; and f) detecting one or more gene sequence and regulatory sequence barcodes in DNA of the one or more host cells selected in step (e) and identifying the gene sequence(s) and the regulatory sequence(s) comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcodes.
[0009] In some embodiments, the method further comprises g) determining the copy number of the one or more transcriptional units identified in step (f), or the level of the product(s) encoded by the gene sequence(s) comprised within the one or more test transcriptional units identified in step (f) or of the corresponding RNAs.
[0010] In some embodiments, one or more test transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs the gene product from the gene sequence to a target location in the host cell.
[0011] In some embodiments, the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
[0012] In some embodiments, the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
[0013] In some embodiments, step (f) further comprises detecting the organellar targeting signal barcode in DNA of the one or more host cells selected in step (e) and identifying the
nucleotide sequence encoding the organellar targeting signal comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcode.
[0014] In some embodiments, each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
[0015] In some embodiments, the one or more gene sequences comprise at least one gene sequence that is not naturally present in the host cells. In some embodiments, the one or more gene sequences comprise at least one gene sequence that is the same as or derived from a gene sequence naturally present in the host cells.
[0016] In some embodiments, the one or more gene sequences comprise at least one gene sequence that is derived from a different species than the host cells.
[0017] In some embodiments, the one or more gene sequences comprise at least one gene sequence that is codon-optimized for expression in the host cells.
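The simplest form of codon optimization, often called "one amino acid, one codon", back-translates a protein by always choosing a single preferred codon for each residue. The sketch below assumes a hypothetical preference table (`PREFERRED_CODON`); every codon in it is valid for its amino acid, but the choices are illustrative rather than drawn from a measured codon-usage table for any host. A real design for, e.g., CHO cells would typically use host codon-usage data and would also balance GC content and remove unwanted restriction sites.

```python
# Minimal sketch of "one amino acid, one codon" back-translation.
# PREFERRED_CODON is a hypothetical, illustrative table: each codon is a valid
# codon for its amino acid, but the choices are NOT measured host preferences.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence using one preferred codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKVL*"))  # ATGAAGGTGCTGTGA
```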
[0018] In some embodiments, the one or more gene sequences comprise at least one gene sequence encoding a genome engineering system and/or a component thereof. In some embodiments, the genome engineering system is a CRISPR activation (CRISPRa) or CRISPR inhibition (CRISPRi) system.
[0019] In some embodiments, the one or more gene sequences comprise at least one gene sequence encoding a negative control. In some embodiments, the negative control is a detectable marker. In some embodiments, the detectable marker is a fluorescent protein.
[0020] In some embodiments, the one or more gene sequences comprise at least one gene sequence encoding a synthetic protein.
[0021] In some embodiments, the one or more gene sequences comprise about 1-10,000 gene sequences. In some embodiments, the one or more gene sequences comprise about 2-100 gene sequences. In some embodiments, one or more gene sequences comprise about 4-20 gene sequences.
[0022] In some embodiments, the one or more regulatory sequences comprise about 1-10,000 regulatory sequences. In some embodiments, the one or more regulatory sequences comprise about 2-10 regulatory sequences. In some embodiments, the one or more regulatory sequences comprise about 2-5 regulatory sequences.
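As a rough illustration of how these ranges translate into library size, assume that each test transcription unit carries exactly one gene sequence under one regulatory sequence and, optionally, one of $T$ organellar targeting signals ($T = 1$ when no targeting signal is varied). Under that assumption, the number $N$ of distinct test transcription units is the product of the choices available at each position:

$$N = G \times R \times T$$

where $G$ and $R$ are the numbers of gene sequences and regulatory sequences provided. For example, 4 gene sequences and 2 regulatory sequences give $N = 4 \times 2 = 8$, and 20 gene sequences and 5 regulatory sequences give $N = 20 \times 5 = 100$, which coincides with the library size of about 8-100 transcription units described for some embodiments. Units carrying several gene sequences each would enlarge the design space further, since genes could then be combined with one another as well as paired with regulatory sequences.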
[0023] In some embodiments, the vector is a viral vector. In some embodiments, the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector. In some embodiments, the retroviral vector is a lentiviral vector.
[0024] In some embodiments, the vector is a non-viral vector. In some embodiments, the non-viral vector is a transposon.
[0025] In some embodiments, the one or more test transcription units are generated using a Golden Gate cloning technique.
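Golden Gate cloning uses Type IIS restriction enzymes (for example, BsaI) that cut outside their recognition site and leave short overhangs, typically 4 nt, that dictate the order in which parts ligate. The sketch below checks a proposed part order for a test transcription unit under simplifying assumptions: overhangs are written 5' to 3' on the top strand, each part's 3' overhang must match the next part's 5' overhang, and junction overhangs must be unique, non-palindromic, and not reverse complements of one another. The part names and overhang sequences are hypothetical.

```python
from typing import List, Tuple

# (part name, 5' overhang, 3' overhang), both written 5'->3' on the top strand.
Part = Tuple[str, str, str]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_assembly(parts: List[Part], vector_left: str, vector_right: str) -> List[str]:
    """Return a list of design problems; an empty list means none were found."""
    if not parts:
        return ["no parts supplied"]
    problems = []
    # The vector's left overhang must match the first part's 5' overhang.
    if parts[0][1] != vector_left:
        problems.append(f"{parts[0][0]}: 5' overhang does not match vector")
    junctions = [vector_left]
    # Each part's 3' overhang must match the next part's 5' overhang.
    for (name_a, _, right_a), (name_b, left_b, _) in zip(parts, parts[1:]):
        if right_a != left_b:
            problems.append(f"{name_a} -> {name_b}: overhang mismatch")
        junctions.append(right_a)
    # The last part's 3' overhang must match the vector's right overhang.
    if parts[-1][2] != vector_right:
        problems.append(f"{parts[-1][0]}: 3' overhang does not match vector")
    junctions.append(vector_right)
    # Duplicate, palindromic, or mutually complementary overhangs risk mis-ligation.
    if len(set(junctions)) != len(junctions):
        problems.append("junction overhangs are not unique")
    for i, oh in enumerate(junctions):
        if oh == revcomp(oh):
            problems.append(f"overhang {oh} is palindromic")
        for other in junctions[i + 1:]:
            if other != oh and other == revcomp(oh):
                problems.append(f"overhangs {oh} and {other} are complementary")
    return problems


if __name__ == "__main__":
    unit = [
        ("promoter", "GGAG", "AATG"),
        ("gene", "AATG", "GCTT"),
        ("utr_barcodes", "GCTT", "CGCT"),
    ]
    print(check_assembly(unit, "GGAG", "CGCT"))  # []
```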
[0026] In some embodiments, the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each test transcription unit. In some embodiments, the 3’ UTR of each test transcription unit further comprises a pair of universal primer binding sites flanking the barcodes. In some embodiments, detection of one or more barcodes is achieved using polymerase chain reaction (PCR) with one or more pairs of universal primers followed by sequencing of PCR product(s). In some embodiments, detection of one or more barcodes is achieved using a single-cell sequencing approach.
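A minimal sketch of the PCR-then-sequencing readout follows. It assumes that each amplicon read contains the barcode region between two universal primer binding sites and that this region is a concatenation of fixed-length barcodes. The primer-site sequences (`FWD_SITE`, `REV_SITE`), barcode length, and barcode-to-part mapping are hypothetical, and matching is exact; a production pipeline would also handle reverse-complement reads and tolerate sequencing errors.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical universal primer binding sites flanking the barcodes in the 3' UTR.
FWD_SITE = "ACGTACGTAC"
REV_SITE = "TGCATGCATG"


def extract_barcode_region(read: str) -> Optional[str]:
    """Return the sequence between the two universal sites, or None if absent."""
    start = read.find(FWD_SITE)
    if start == -1:
        return None
    start += len(FWD_SITE)
    end = read.find(REV_SITE, start)
    if end == -1:
        return None
    return read[start:end]


def decode_region(region: str, barcode_map: Dict[str, str],
                  bc_len: int) -> Optional[Tuple[str, ...]]:
    """Split a region into fixed-length barcodes and map each to its part name."""
    if not region or len(region) % bc_len != 0:
        return None
    chunks = [region[i:i + bc_len] for i in range(0, len(region), bc_len)]
    parts = tuple(barcode_map.get(chunk) for chunk in chunks)
    return None if None in parts else parts


def count_units(reads: Iterable[str], barcode_map: Dict[str, str],
                bc_len: int) -> Counter:
    """Count reads per decoded combination of parts (one combination per unit)."""
    counts: Counter = Counter()
    for read in reads:
        region = extract_barcode_region(read)
        if region is None:
            continue
        unit = decode_region(region, barcode_map, bc_len)
        if unit is not None:
            counts[unit] += 1
    return counts


if __name__ == "__main__":
    barcode_map = {"AAAAAA": "promoter_1", "CCCCCC": "gene_A", "GGGGGG": "gene_B"}
    reads = [
        "TTT" + FWD_SITE + "AAAAAACCCCCC" + REV_SITE,
        "TTT" + FWD_SITE + "AAAAAAGGGGGG" + REV_SITE,
        "TTT" + FWD_SITE + "AAAAAACCCCCC" + REV_SITE + "GG",
        "read without primer sites",
    ]
    print(count_units(reads, barcode_map, bc_len=6))
```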
[0027] In some embodiments, the copy number of the one or more transcriptional units is determined using quantitative PCR, digital droplet PCR, or hybridization array.
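For quantitative PCR, one common way to estimate relative copy number is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions that the target amplicon (for example, one within a transcription unit) and a reference amplicon from a single-copy host locus both amplify with close to 100% efficiency, and that a calibrator sample of known copy number is available. For digital droplet PCR, concentration is usually estimated with a Poisson correction, λ = -ln(1 - p), where p is the fraction of positive droplets. The function names and example Ct values are hypothetical.

```python
import math


def relative_copy_number(ct_target: float, ct_ref: float,
                         ct_target_cal: float, ct_ref_cal: float,
                         cal_copies: float = 1.0) -> float:
    """Comparative Ct (2^-ddCt) estimate of copy number relative to a calibrator.

    Assumes ~100% amplification efficiency for both amplicons, so that each
    cycle doubles the product.
    """
    delta_sample = ct_target - ct_ref        # normalize sample to reference locus
    delta_cal = ct_target_cal - ct_ref_cal   # same normalization for calibrator
    ddct = delta_sample - delta_cal
    return cal_copies * 2.0 ** (-ddct)


def ddpcr_copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet from droplet counts."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


if __name__ == "__main__":
    # Calibrator (known single-copy clone): target Ct 25.0, reference Ct 22.0.
    # Sample: target Ct 23.0, reference Ct 22.0, i.e. two cycles earlier.
    print(relative_copy_number(23.0, 22.0, 25.0, 22.0))  # 4.0
```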
[0028] In some embodiments, the method further comprises performing one or more single-cell omics analyses on the one or more host cells selected in step (e). In some embodiments, the one or more single-cell omics analyses comprise single-cell RNA sequencing, single-cell proteomics, single-cell metabolomics, and/or single-cell epigenomics.
[0029] In some embodiments, step (e) comprises culturing the population of host cells generated in step (d) under conditions allowing for selection of the one or more host cells that exhibit the desired characteristic.
[0030] In some embodiments, the host cells are mammalian cells.
[0031] In another aspect, provided herein is a method of genetically engineering a cell, comprising introducing into the cell the one or more transcriptional units identified according to the method described herein.
[0032] In some embodiments, the one or more desired characteristics is selected from protein composition, protein content, DNA composition, DNA content, tolerance to environmental stressor(s), doubling time, prototrophy, biosynthesis capability, cell surface marker(s), cell size,
light absorbance, light reflection, fluorescence, light scatter, polarization, one or more electrical properties of the cells, one or more magnetic properties, one or more morphological properties, membrane permeability, membrane fluidity, and/or redox state.
[0033] In another aspect, provided herein is a library comprising a plurality of transcription units, wherein each transcription unit comprises one or more gene sequence(s) which are placed under the control of a regulatory sequence; and wherein each transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of said gene sequence(s) and a second barcode that uniquely identifies said regulatory sequence.
[0034] In some embodiments, one or more transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs a gene product from the gene sequence to a target location in a host cell.
[0035] In some embodiments, the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
[0036] In some embodiments, the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
[0037] In some embodiments, each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
[0038] In some embodiments, the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each transcription unit. In some embodiments, the 3’ UTR of each transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
[0039] In some embodiments, the library comprises about 2-10,000 transcription units. In some embodiments, the library comprises about 2-1000 transcription units. In some embodiments, the library comprises about 8-100 transcription units.
[0040] In another aspect, provided herein is a library comprising a plurality of vectors, wherein each vector comprises a transcriptional unit selected from the library described herein. In some embodiments, the vector is a viral vector. In some embodiments, the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector. In some embodiments, the retroviral vector is a
lentiviral vector. In some embodiments, the vector is a non-viral vector. In some embodiments, the non-viral vector is a transposon.
[0041] In another aspect, provided herein is a library comprising a plurality of host cells, wherein each host cell expresses one or more gene product(s) from one or more transcriptional unit(s) selected from the library comprising a plurality of transcription units described herein or comprises a vector from the library of vectors described herein.
[0042] In another aspect, provided herein is a kit comprising a library described herein, and optionally, packaging and/or instructions for using the same. In some embodiments, the kit further comprises one or more pairs of universal primers.
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] Figs. 1A-1B show a schematic representation of the shotgun genetic engineering workflow.
[0044] Fig. 2 is a schematic showing that current methods to engineer mammalian metabolism rely on testing one solution at a time.
[0045] Fig. 3 shows a previously engineered valine biosynthesis solution (pMTIV) in Chinese hamster ovary (CHO) cells (Trolle et al., 2022).
[0046] Fig. 4 shows that engineering valine biosynthesis in CHO cells enabled growth in valine-free conditions.
[0047] Fig. 5 shows a schematic representation of minimal library design and assembly (LibJTR001).
[0048] Fig. 6 shows that clones generated using shotgun genetic engineering with LibJTR001 outperformed the rationally designed solution engineered in previous work (pMTIV).
[0049] Fig. 7 shows that Amplicon-seq revealed similar barcode signatures in well-performing prototrophic clones derived from LibJTR001 cells.
[0050] Fig. 8 shows that qRT-PCR revealed expression signatures that characterize well-performing prototrophic clones derived from LibJTR001 cells.
[0051] Fig. 9 shows the composition of transcription units in the LibJTR012 library, as determined by Amplicon-seq.
[0052] Fig. 10 shows a valine-free growth assay featuring clones derived from LibJTR012 cells and identified by selection in low-valine conditions.
[0053] Fig. 11 shows barcodes contained within clone 012-Val-Dl, the LibJTR012 clone identified as growing the fastest in valine-free medium following low-valine selection. The expectation of finding ilvB, ilvC, ilvD, and ilvN is based on the experiment illustrated in Example 2, which showed that the introduction of these 4 genes is sufficient to confer valine prototrophy to CHO cells.
[0054] Fig. 12 shows an isoleucine-free growth assay featuring clones derived from LibJTR012 cells and identified by selection in low-isoleucine conditions, including the growth characteristics of clones exhibiting isoleucine prototrophy across multiple passaging events.
[0055] Fig. 13 shows barcodes contained within clone 012-10p-Hl, a LibJTR012 clone identified as growing quickly in isoleucine-free medium following low-isoleucine selection.
[0056] Fig. 14 shows a comparison of the survival advantage of 7 clones, selected on low-isoleucine RPMI medium, when cultured in low-isoleucine RPMI and in low-valine RPMI.
DETAILED DESCRIPTION
[0057] The present disclosure provides a novel approach, termed “shotgun genetic engineering”, to engineering complex cellular behaviors in cells (e.g., prokaryotic cells or eukaryotic cells) that combines DNA synthesis, diversity sampling (e.g., phylogenetic or functional diversity sampling), multiplexed high-throughput screens, and evolutionary selection. The present approach addresses both of the above-mentioned constraints, enabling a large number of potential solutions to a cellular engineering problem to be sampled in a pooled format while requiring relatively little infrastructure.
[0058] The engineering of complex behaviors into cells (e.g., prokaryotic cells or eukaryotic cells) requires the fine-tuned and localized expression of many components. However, it is difficult to identify the exact combination of genes that, once expressed at the appropriate levels and correctly localized in cells, will function together to specify the desired behavior. Success requires the ability to generate and test a large number of potential solutions in terms of candidate genes, expression level, and localization. Currently, there are two major approaches to this problem: trial and error of single candidate solutions in series, or the directed evolution of a single candidate. The first approach is arduous, costly, and time-consuming, and it has a low probability of success given the small fraction of the solution space that is explored (Fig. 2). Existing directed evolution efforts, on the other hand, suffer from (1) an inability to generate sufficiently relevant diversity (in terms of gene sequences, expression level, and localization) to lead to optimized solutions, (2) an inability to distinguish relevant randomly generated genetic modifications from “passenger” mutations, and (3) a frequent focus on single genes, which precludes the optimization of multi-component solutions.
[0059] The present disclosure allows millions of potential solutions to be generated and tested rapidly and at once. Diversity directly relevant to the desired phenotype can be generated along several dimensions, including gene sequence diversity, gene expression diversity, and localization (organellar compartmentalization) diversity. Each genetic solution generated via the present approach is expected to produce phenotypic outcomes that differ far more widely than the outcomes produced by individual or few base-pair mutations at a time (the current state of the art). The present approach further enables quick identification of the genetic modifications responsible for the optimized cellular behavior. Altogether, the present approach can enable the engineering of complex cellular behaviors in cells.
[0060] The development of the present technology can have applications in any industry involving the large-scale culturing of cells (e.g., prokaryotic cells or eukaryotic cells) or requiring complex behaviors to be encoded in cells (e.g., prokaryotic cells or eukaryotic cells). This includes, but is not restricted to, the development of cell therapies with complex behaviors, cultured meat products, and the production of biologics and viral vectors.
[0061] Cellular behaviors that could be encoded in cells (e.g., prokaryotic cells or eukaryotic cells) to ease their culturing at scale include improved tolerance to various stress conditions such as shear stress, hypoxia, and variations in pH. A desired cell state and/or cell fate could be introduced in cells (e.g., prokaryotic cells or eukaryotic cells) to generate desired cell types for biomedical applications, such as cell fate conversion for cell therapy. Other advantageous behaviors for culturing cells at scale include a reduced requirement for exogenous supplementation of essential nutrients such as amino acids and growth factors, or the adaptation of adherent cells for growth in suspension. For applications involving culturing many cell lines in parallel, it might be further advantageous to work with cells engineered to be cryopreserved more efficiently, and for applications involving primary cell lines, it may be useful to work with cells engineered to be more easily immortalized and to resist contamination.
Definitions
[0062] When a list is presented, unless stated otherwise, it is to be understood that each individual element of that list, and every combination of that list, is a separate embodiment. For example, a list of embodiments presented as “A, B, or C” is to be interpreted as including the embodiments “A,” “B,” “C,” “A or B,” “A or C,” “B or C,” or “A, B, or C.”
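The reading rule above amounts to enumerating every non-empty combination of the listed elements, so a list of n elements stands for 2^n - 1 embodiments. A short sketch, in which the function name `list_embodiments` is illustrative:

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def list_embodiments(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every non-empty combination of items, from single elements to the full list."""
    return [combo
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


if __name__ == "__main__":
    for combo in list_embodiments(["A", "B", "C"]):
        print(" or ".join(combo))
```

Run on the list “A, B, or C”, this prints the seven embodiments A, B, C, A or B, A or C, B or C, and A or B or C.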
[0063] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the articles “a”, “an”, and “the” should be understood to include plural referents unless the context clearly indicates otherwise.
[0064] The term “about” or “approximately” means within a statistically meaningful range of a value. Such a range can be within an order of magnitude, preferably within 50%, more preferably within 20%, still more preferably within 10%, and even more preferably within 5% of a given value or range. The allowable variation encompassed by the term “about” or “approximately” depends on the particular system under study, and can be readily appreciated by one of ordinary skill in the art.
[0065] The term “characteristic” described herein refers to any functional or phenotypic trait of a cell. Exemplary characteristics of a cell include, but are not limited to, protein composition, protein content, DNA composition, DNA content, RNA composition, RNA content, fatty acid composition, fatty acid content, lipid composition, lipid content, sugar composition, sugar content, tolerance to environmental stressors, doubling time, prototrophy, biosynthesis capability, cell surface markers, cell size, light absorbance, light reflection, fluorescence, light scatter, polarization, electrical properties of the cells, magnetic properties, morphological properties, membrane permeability, membrane fluidity, and redox state. One of ordinary skill in the art will readily appreciate that a method described herein may identify or engineer any of a number of alternative characteristics in a cell of interest, and that these alternative characteristics are readily amenable to exploitation in the described methods.
[0066] The term “vector”, as used herein, means a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. In some embodiments, the vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome. In some embodiments, the vector is a non-viral vector, such as a transposon or plasmid. In some embodiments, the vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal
mammalian vectors). In other embodiments, the vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as “recombinant expression vectors” (or simply, “expression vectors”).
[0067] As used herein, the term “regulatory sequence” means a nucleic acid sequence which can regulate expression of a gene product operably linked to the regulatory sequence. In some instances, this sequence may be the core promoter sequence and in other instances, this sequence may also include other regulatory elements which are required for expression of the gene product, such as an enhancer, a silencer, an insulator, an operator, and a terminator.
[0068] The term “promoter” as used herein is defined as a nucleic acid sequence recognized by the transcriptional machinery of the cell, or introduced transcriptional machinery, required to initiate the specific transcription of a polynucleotide sequence.
[0069] A “constitutive” promoter is a nucleotide sequence which, when operably linked with a polynucleotide which encodes or specifies a gene product, causes the gene product to be produced in a cell under most or all physiological conditions of the cell.
[0070] An “inducible” promoter is a nucleotide sequence which, when operably linked with a polynucleotide which encodes or specifies a gene product, causes the gene product to be produced in a cell substantially only when an inducer which corresponds to the promoter is present in the cell.
[0071] The terms “nucleic acid” and “nucleotide”, used interchangeably herein, encompass both DNA and RNA unless specified otherwise. A reference to a nucleotide sequence encompasses its complement unless otherwise specified. Thus, a reference to a nucleic acid having a particular sequence should be understood to encompass its complementary strand, with its complementary sequence.
[0072] The term “library” refers to an isolated collection of at least two elements that differ from one another in at least one aspect. For example, a “library of transcription units” is a collection of at least two transcription units that may differ from one another by at least one element, such as a gene sequence, regulatory sequence (e.g., promoter), or organellar targeting signal. As another example, a “vector library” is a collection of vectors that may differ from one another by at least one element in the vector, such as a gene sequence, regulatory sequence (e.g.,
promoter), or organellar targeting signal included in a transcription unit comprised within the vector. The elements of the library are isolated from like type of elements that are not part of the library (e.g., vectors of a vector library are isolated from vectors that are not part of the library). The library may exist in vitro or ex vivo.
[0073] The methods and techniques of the present invention are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989) and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992), and Harlow and Lane Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990), which are incorporated herein by reference. Enzymatic reactions and purification techniques are performed according to manufacturer’s specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, chemical analyses, pharmaceutical preparation, formulation, and delivery, and treatment of patients.
Methods of the Invention
[0074] In one aspect, provided herein is a method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell, comprising one or more of the following steps: a) providing (i) one or more gene sequences that encode gene product(s) associated with the one or more desired characteristics; and (ii) one or more regulatory sequences; b) generating one or more test transcription units, wherein each test transcription unit comprises one or more gene sequences which are placed under the control of a regulatory sequence, the gene sequence(s) and the regulatory sequence are independently selected from the one or more gene sequences and the one or more regulatory sequences provided in step (a), respectively; and wherein each test transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of the gene sequence(s) and a regulatory sequence barcode that uniquely identifies the regulatory sequence; c) generating one
or more vectors, wherein each vector comprises a test transcriptional unit selected from the one or more test transcriptional units; d) introducing the one or more vectors into a population of host cells, wherein the host cells are suitable for expression of gene product(s) from the test transcriptional unit(s); e) selecting from the population of host cells generated in step (d) one or more host cells that exhibit the one or more desired characteristics; and f) detecting one or more gene sequence and regulatory sequence barcodes in DNA of the one or more host cells selected in step (e) and identifying the gene sequence(s) and the regulatory sequence(s) comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcodes.
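Steps (a) and (b) can be pictured as a combinatorial enumeration in which every regulatory sequence is paired with every gene sequence (and, optionally, every organellar targeting signal), and the corresponding barcodes are concatenated into the 3’ UTR barcode region. The sketch below assumes single-gene units and equal-length barcodes, so that each concatenated barcode region identifies exactly one unit; the class name, part names, and barcode sequences are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TranscriptionUnitDesign:
    regulatory: str
    targeting: Optional[str]
    gene: str
    barcode_region: str  # barcodes as they would sit in the 3' UTR


def build_library(regulatory: Dict[str, str],
                  genes: Dict[str, str],
                  targeting: Optional[Dict[str, str]] = None
                  ) -> List[TranscriptionUnitDesign]:
    """Enumerate all regulatory x (targeting) x gene combinations.

    Each dict maps a part name to its barcode. When no targeting signals are
    given, units carry no targeting sequence and no targeting barcode.
    """
    targeting_options = list(targeting.items()) if targeting else [(None, "")]
    library = []
    for (reg, reg_bc), (tgt, tgt_bc), (gene, gene_bc) in product(
            regulatory.items(), targeting_options, genes.items()):
        library.append(
            TranscriptionUnitDesign(reg, tgt, gene, reg_bc + tgt_bc + gene_bc))
    return library


if __name__ == "__main__":
    regulatory = {"promoter_1": "AAAA", "promoter_2": "CCCC"}
    genes = {"gene_A": "GGGG", "gene_B": "TTTT", "gene_C": "ACAC"}
    library = build_library(regulatory, genes)
    print(len(library))  # 6
    # Map barcode regions back to designs, as when decoding reads in step (f).
    lookup = {unit.barcode_region: unit for unit in library}
    print(lookup["CCCCTTTT"].gene)  # gene_B
```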
[0075] In some embodiments, the selection of host cells from the population generated in step (d) can be performed using a cell culture method (e.g., by culturing host cells under condition(s) that allow growth of host cells that can express the desired gene product(s)), a cell sorting method (e.g., fluorescence-activated cell sorting based on a biosensor integrated in the target cells, or magnetic sorting), or an imaging method.
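If barcodes are read out from the population both before and after selection, one simple way to rank candidate transcription units (an assumption here, not a step the method requires) is the log2 ratio of each barcode's normalized frequency after versus before selection. A pseudocount keeps barcodes that are absent from one sample from causing division by zero. The function name, barcode identifiers, and counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def enrichment(pre: Dict[str, int], post: Dict[str, int],
               pseudocount: float = 0.5) -> List[Tuple[str, float]]:
    """Rank barcodes by log2(post-selection frequency / pre-selection frequency)."""
    keys = set(pre) | set(post)
    pre_total = sum(pre.values()) + pseudocount * len(keys)
    post_total = sum(post.values()) + pseudocount * len(keys)
    scores = []
    for key in keys:
        f_pre = (pre.get(key, 0) + pseudocount) / pre_total
        f_post = (post.get(key, 0) + pseudocount) / post_total
        scores.append((key, math.log2(f_post / f_pre)))
    # Most enriched first; ties broken by name so the order is deterministic.
    scores.sort(key=lambda kv: (-kv[1], kv[0]))
    return scores


if __name__ == "__main__":
    before = {"unit_1": 100, "unit_2": 100, "unit_3": 100}
    after = {"unit_1": 250, "unit_2": 40, "unit_3": 10}
    for name, score in enrichment(before, after):
        print(f"{name}\t{score:+.2f}")
```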
[0076] In some embodiments, the method further comprises a step (g): determining copy number of the one or more transcriptional units identified in step (f), or determining the level of the product(s) encoded by the gene sequence(s) comprised within the one or more test transcriptional units identified in step (f) or corresponding RNAs.
[0077] The regulatory sequence described herein may comprise any expression control sequences including, for example, sequences necessary for appropriate transcription initiation and termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., a Kozak consensus sequence); sequences that enhance protein stability; and, when desired, sequences that enhance protein secretion. The nature of such regulatory sequences differs depending upon the host cell. For example, in prokaryotes, such regulatory sequences generally include a promoter, a ribosomal binding site, and a transcription termination sequence; in eukaryotes, such regulatory sequences generally include promoters and transcription termination sequences.
[0078] In various embodiments, a regulatory sequence described herein may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a
terminator. In some embodiments, the regulatory sequence described herein comprises at least a promoter. The promoter may be a constitutive or inducible promoter.
[0079] Examples of promoters suitable for use in the methods and compositions described herein include, but are not limited to, a cytomegalovirus (CMV) promoter, a simian virus 40 (SV40) promoter, a human elongation factor 1 alpha (EF1a) promoter, a phosphoglycerate kinase (PGK) promoter, a minimal promoter fragment derived from the CMV promoter (minCMV promoter), an RSV LTR, a MoMLV LTR, a CK6 promoter, a transthyretin promoter (TTR), a TK promoter, a tetracycline responsive promoter (TRE), an HBV promoter, an hAAT promoter, an LSP promoter, chimeric liver-specific promoters (LSPs), an E2F promoter, a telomerase (hTERT) promoter, a cytomegalovirus enhancer/chicken beta-actin/rabbit β-globin (CAG) promoter, a rod opsin promoter, a cone opsin promoter, a beta phosphodiesterase (PDE) promoter, a retinitis pigmentosa (RP1) promoter, a human U6 nuclear promoter, a Drosophila UAS promoter containing Gal4 binding sites, a T7 promoter, an SP6 promoter, a promoter from the lac operon, an araBAD promoter, a trp promoter, a Ptac promoter, an interphotoreceptor retinoid-binding protein gene (IRBP) promoter, or synthetic promoters.
[0080] In some embodiments, promoters used in the methods described herein may be RNA polymerase II promoters. In some embodiments, promoters used in the methods described herein may be RNA polymerase I promoters. In some embodiments, promoters used in the methods described herein may be RNA polymerase III promoters.
[0081] In some embodiments, one or more test transcriptional units may further comprise a nucleotide sequence encoding an organellar targeting signal which directs the gene product from the gene sequence to a target location (e.g., an organellar or a sub-organellar compartment, cell surface) in the host cell. Non-limiting examples of organellar targeting signals that may be used in the methods or compositions described herein include a mitochondrial targeting signal (or a mitochondrial localization signal), an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
[0082] In some embodiments where an organellar targeting signal is included, the one or more test transcriptional units may further comprise an organellar targeting signal barcode that uniquely identifies the nucleotide sequence encoding the organellar targeting signal. In some embodiments, the method may further comprise in step (f) detecting the organellar targeting
signal barcode in DNA of one or more host cells selected in step (e) and identifying the nucleotide sequence encoding the organellar targeting signal comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcode.
[0083] In some embodiments, gene sequences used in the methods or compositions described herein may comprise at least one gene sequence that is not naturally present in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein may comprise at least one gene sequence that is the same as or derived from a gene sequence naturally present in the host cells.
[0084] In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is derived from a different species than the host cells.
[0085] In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is codon-optimized for expression in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence that is not codon-optimized for expression in the host cells. In some embodiments, gene sequences used in the methods or compositions described herein comprise gene sequences that are not codon-optimized for expression in the host cells.
[0086] In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a genome engineering system and/or a component thereof (e.g., a gene editing nuclease or a variant thereof, or a guide nucleic acid). Any suitable genome engineering system can be used, including, but not limited to, CRISPR-associated protein (Cas) nucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), meganucleases, other endo- or exonucleases, variants thereof, fragments thereof, and combinations thereof. In one embodiment, the genome engineering system used herein is a CRISPR activation (CRISPRa) or CRISPR inhibition (CRISPRi) system. The genome engineering systems used herein may be designed to specifically target promoters or coding sequences of genes (e.g., genes that encode gene products that may be associated with the one or more desired characteristics) to change their expression or sequence.
[0087] In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a negative control. For example, the
negative control may be a detectable marker, such as a reporter protein. A reporter protein may produce a detectable signal which allows detection of the target cell or tissue. Examples of reporter proteins include, without limitation, β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent proteins, orange fluorescent proteins, chloramphenicol acetyltransferase (CAT), luciferase, membrane-bound proteins including, for example, CD2, CD4, CD8, and the influenza hemagglutinin protein, for which high-affinity antibodies exist or can be produced by conventional means, and fusion proteins comprising a membrane-bound protein appropriately fused to an antigen tag domain from, among others, hemagglutinin or Myc.
[0088] In some embodiments, the detectable marker is a fluorescent protein, such as a green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein, orange fluorescent protein, and the like.
[0089] In some embodiments, gene sequences used in the methods or compositions described herein comprise at least one gene sequence encoding a synthetic protein.
[0090] In some embodiments, the number of gene sequences used in the methods or compositions described herein can be any integral number. In some embodiments, the number of gene sequences used in the methods or compositions described herein is any integral number between 1 to about 10,000 sequences. In some embodiments, the number of gene sequences used in the methods or compositions described herein is any number between 1 to 5000, between 1 to 2500, between 1 to 1000, between 1 to 500, between 1 to 250, between 2 to 100, between 2 to 80, between 2 to 50, between 2 to 30, between 2 to 20, between 2 to 10, between 3 to 100, between 3 to 80, between 3 to 50, between 3 to 30, between 3 to 20, between 3 to 10, between 4 to 100, between 4 to 80, between 4 to 50, between 4 to 30, between 4 to 20, between 4 to 10, between 5 to 100, between 5 to 80, between 5 to 50, between 5 to 30, between 5 to 20, between 5 to 15, or between 5 to 10. In some embodiments, the number of gene sequences used in the methods or compositions described herein is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 250, about 500, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000.
[0091] In some embodiments, the number of regulatory sequences used in the methods or compositions described herein can be any integral number. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 10,000 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 5000 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 2500 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 1000 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 500 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 250 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 100 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any integral number between 1 to about 50 sequences. In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is any number between 2 to 40, between 2 to 30, between 2 to 20, between 2 to 10, between 2 to 8, between 2 to 5, between 3 to 40, between 3 to 30, between 3 to 20, between 3 to 10, between 3 to 8, between 3 to 6, between 4 to 40, between 4 to 30, between 4 to 20, between 4 to 10, between 4 to 8, between 5 to 40, between 5 to 30, between 5 to 20, or between 5 to 10. 
In some embodiments, the number of regulatory sequences used in the methods or compositions described herein is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 250, about 500, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000.
[0092] In some embodiments, the methods or compositions described herein involve the use of a viral vector. Non-limiting examples of viral vectors suitable for use in the methods or compositions described herein include an adenoviral vector, a retroviral vector (e.g., lentiviral vector), a herpes viral vector, a baculoviral vector, and/or an adeno-associated viral (AAV) vector. In one embodiment, the viral vector used herein is a lentiviral vector.
[0093] In other embodiments, the methods or compositions described herein may also include the use of a non-viral vector. In one embodiment, the non-viral vector is a transposon. Exemplary transposons that may be used in the present methods include piggyBac transposons and Sleeping Beauty transposons.
[0094] Various cloning techniques may be used to generate the transcription units described herein. For example, suitable techniques include, but are not limited to, Golden Gate cloning, Gibson assembly, Gateway cloning, In-Fusion cloning, yeast assembly, and restriction cloning. In one embodiment, the transcription units described herein are generated using a Golden Gate cloning technique. For example, in a Golden Gate cloning technique, a library of vectors carrying barcoded regulatory sequences and/or nucleotide sequences encoding organellar targeting signals also carries Type IIS restriction sites that enable the insertion of any member of a pool of barcoded gene sequences, which are themselves flanked by Type IIS restriction sites.
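The Golden Gate strategy described above can be illustrated with a minimal in silico sketch. It assumes BsaI as the Type IIS enzyme (the source does not name one) and parts flanked by inward-facing BsaI sites; the helper names digest and assemble and the toy sequences are hypothetical. BsaI cuts one nucleotide past GGTCTC on the recognition strand and five on the complementary strand, leaving 4-nt overhangs that dictate which parts can ligate to which.

```python
# Illustrative sketch only: BsaI is an assumed choice of Type IIS enzyme.
BSAI_FWD = "GGTCTC"  # BsaI recognition site, top strand
BSAI_REV = "GAGACC"  # the same site in reverse orientation


def digest(part: str) -> tuple[str, str, str]:
    """Cut a part flanked by inward-facing BsaI sites.

    BsaI cuts GGTCTC(N1)/(N5), leaving 4-nt 5' overhangs. Returns
    (left_overhang, core, right_overhang) in top-strand notation.
    """
    part = part.upper()
    fwd = part.find(BSAI_FWD)
    rev = part.rfind(BSAI_REV)
    if fwd == -1 or rev == -1 or rev - 5 < fwd + 11:
        raise ValueError("part must be flanked by GGTCTC ... GAGACC")
    left = part[fwd + 7 : fwd + 11]   # 4 nt after the 1-nt spacer of the forward site
    right = part[rev - 5 : rev - 1]   # 4 nt before the 1-nt spacer of the reverse site
    core = part[fwd + 11 : rev - 5]
    if BSAI_FWD in core or BSAI_REV in core:
        raise ValueError("internal BsaI site in core: part must be domesticated first")
    return left, core, right


def assemble(parts: list[tuple[str, str, str]]) -> str:
    """Ligate digested parts in order; adjacent overhangs must match exactly."""
    seq = parts[0][0]
    for i, (left, core, right) in enumerate(parts):
        if i > 0 and parts[i - 1][2] != left:
            raise ValueError(f"overhang mismatch between part {i - 1} and part {i}")
        seq += core + right
    return seq


# Hypothetical toy parts: a "promoter" ending in AATG and a "gene" starting with AATG.
promoter = "AA" + "GGTCTC" + "A" + "GGAG" + "CCCCCC" + "AATG" + "T" + "GAGACC" + "AA"
gene = "AA" + "GGTCTC" + "A" + "AATG" + "GGGGGG" + "GCTT" + "T" + "GAGACC" + "AA"
tu = assemble([digest(promoter), digest(gene)])
print(tu)
```

Because every gene part in such a pool would share the same pair of overhangs, any gene can ligate behind any promoter, which is what makes one-pot combinatorial assembly possible.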
[0095] The barcodes described herein, e.g., gene sequence barcodes, regulatory sequence barcodes, and/or organellar targeting signal barcodes, may be present anywhere in the transcription unit. For example, the barcodes may be present in a 3’ untranslated region (UTR) or 5’ UTR of each test transcription unit, or in an intron engineered into the transcription unit. In one embodiment, the barcodes are present in the 3’ UTR of each test transcription unit.
[0096] In some embodiments, the transcription units further comprise a pair of universal primer binding sites flanking the barcodes. In some embodiments, the detection of one or more barcodes may be achieved using polymerase chain reaction (PCR) with one or more pairs of universal primers followed by sequencing of PCR product(s). Alternatively, the detection of one or more barcodes may be achieved using direct whole genome sequencing.
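A minimal sketch of how barcodes might be pulled from amplicon reads and tallied. The universal primer site sequences below are hypothetical placeholders, and the layout (forward site, promoter/OTS barcode, gene barcode, reverse site, all adjacent) is an assumption; the source states only that universal primer binding sites flank the barcodes. This exact-match sketch does not tolerate sequencing errors, which a production pipeline would need to handle.

```python
from collections import Counter

# Hypothetical universal primer binding sites (top-strand orientation);
# the real sequences are not given in the source.
FWD_SITE = "ACGTACGTAC"
REV_SITE = "TTGGCCAATT"
BC_LEN = 8  # 8-bp barcodes, as described in Example 1

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def extract_barcodes(read, fwd=FWD_SITE, rev=REV_SITE, bc_len=BC_LEN):
    """Return (promoter_bc, gene_bc) if the read spans fwd | bc1 | bc2 | rev, else None.

    Tries the read as given, then its reverse complement, since amplicon
    reads can come from either strand.
    """
    for seq in (read.upper(), revcomp(read.upper())):
        start = seq.find(fwd)
        if start == -1:
            continue
        bc_start = start + len(fwd)
        bc_end = bc_start + 2 * bc_len
        if seq[bc_end : bc_end + len(rev)] == rev:
            return seq[bc_start : bc_start + bc_len], seq[bc_start + bc_len : bc_end]
    return None


def count_pairs(reads):
    """Tally (promoter_bc, gene_bc) pairs, skipping reads that do not parse."""
    counts = Counter()
    for read in reads:
        pair = extract_barcodes(read)
        if pair is not None:
            counts[pair] += 1
    return counts
```

Each tallied pair identifies one promoter/OTS and gene combination, so the counts serve as a proxy for the relative representation of each transcription unit in the sample.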
[0097] Copy number of transcriptional units may be determined using any methods known in the art, such as quantitative PCR, digital droplet PCR, or hybridization array.
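For digital droplet PCR, copy number is commonly estimated with a Poisson correction, because a positive droplet may hold more than one template: the mean number of copies per droplet is λ = -ln(1 - p), where p is the fraction of positive droplets. The sketch below assumes a duplex assay in which a TU amplicon and a reference locus of known copy number are read from the same droplets; the default of two reference copies is an assumption that depends on the host genome, and the function names are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated wells cannot be corrected)")
    return -math.log1p(-positive / total)


def tu_copy_number(tu_pos: int, ref_pos: int, total: int, ref_copies: float = 2.0) -> float:
    """TU copies per genome, scaled by a reference locus of assumed copy number."""
    return ref_copies * copies_per_partition(tu_pos, total) / copies_per_partition(ref_pos, total)
```

Scaling by a reference locus cancels out differences in how much genomic DNA was loaded into each reaction.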
[0098] In some embodiments, the method may further comprise performing one or more single-cell omics analyses on one or more host cells selected in step (e). The single-cell omics analyses may include single-cell RNA sequencing, single-cell proteomics, single-cell metabolomics, and/or single-cell epigenomics.
[0099] In another aspect, provided herein is a method of genetically engineering a cell by introducing into the cell the one or more transcriptional units identified according to the shotgun genetic engineering method described herein.
[00100] The methods described herein may be used to engineer any type of cell, including prokaryotic and eukaryotic cells. Bacterial cells may be preferred prokaryotic cells in some circumstances. For example, the bacterial cells may be a strain of E. coli, such as the E. coli strains DH5, RR1, Stbl4™, and NEB® Stable, or a strain of Vibrio natriegens. Non-limiting examples of eukaryotic cells include yeast, insect, plant, and mammalian cells (e.g., mouse, rat, monkey, or human cell lines). Non-limiting examples of yeast cells include BY4741, YPH499, YPH500, and YPH501. Non-limiting examples of mammalian cells include Chinese hamster ovary (CHO) cells, NIH Swiss mouse embryo NIH/3T3 cells, monkey kidney-derived COS-1 cells, and 293 cells, which are human embryonic kidney (HEK) cells. Examples of insect cells include Sf9 cells, which can be transfected with baculovirus expression vectors.
[00101] In some embodiments, the method described herein is used to engineer mammalian cells. In some embodiments, the mammalian cells used in the methods described herein are Chinese hamster ovary (CHO) cells, human embryonic kidney (HEK) cells, or stem cells such as induced pluripotent stem cells (iPSCs) or mouse embryonic stem (ES) cells.
[00102] In some embodiments, the method described herein is used to engineer plant cells.
[00103] In some embodiments, the method described herein is used to engineer insect cells.
[00104] In some embodiments, the method described herein is used to engineer microbes.
[00105] Delivery of one or more transcription units of the present disclosure into the target cellular hosts can be accomplished by well-known methods that typically depend on the type of vector used. Viral delivery mechanisms include, but are not limited to, retroviral vectors (e.g., lentiviral vectors), adenoviral vectors, adeno-associated viral (AAV) vectors, herpes viral vectors, and baculoviral vectors, as described above. Non-viral delivery mechanisms include transposon-mediated gene delivery, lipid-mediated transfection, liposomes, immunoliposomes, lipofection, cationic facial amphiphiles (CFAs), and combinations thereof. Successfully transformed or transduced cells, i.e., cells that contain a transcription unit of the present disclosure, can be identified by, for example, PCR. Alternatively, the presence of the gene products (e.g., proteins) in the supernatant can be detected using, for example, antibodies.
[00106] As used herein, the term “isolated” means that the referenced material (e.g., a cell or virus) is removed from its native environment. Thus, an isolated biological material can be free of some or all cellular components, i.e., components of the cells in which the native material occurs naturally (e.g., cytoplasmic or membrane components). A material shall be deemed isolated if it is present in a cell extract or supernatant. In the case of nucleic acid molecules, an isolated nucleic acid includes, without limitation, a PCR product, an isolated RNA (e.g., mRNA), a DNA (e.g., cDNA), or a restriction fragment. Isolated nucleic acid molecules include sequences inserted into plasmids, cosmids, artificial chromosomes, and the like, i.e., when they form part of a chimeric recombinant nucleic acid construct. Thus, in a specific embodiment, a recombinant nucleic acid is an isolated nucleic acid. An isolated protein may be associated with other proteins or nucleic acids, or both, with which it associates in the cell, or with cellular membranes if it is a membrane-associated protein. An isolated organelle, cell, or tissue is removed from the anatomical site in which it is found in an organism. An isolated material may be, but need not be, purified.
[00107] The term “purified” as used herein refers to material that has been isolated under conditions that reduce or eliminate the presence of unrelated materials, i.e., contaminants, including native materials from which the material is obtained. For example, a purified virus is preferably substantially free of host cell or culture components, including tissue culture or egg proteins, non-specific pathogens, and the like. As used herein, the term “substantially free” is used operationally, in the context of analytical testing of the material. Preferably, purified material substantially free of contaminants is at least 50% pure; more preferably, at least 90% pure, and still more preferably at least 99% pure. Purity can be evaluated by chromatography, gel electrophoresis, immunoassay, composition analysis, biological assay, and other methods known in the art.
[00108] Methods for purification are well known in the art. Viral particles can be purified by ultrafiltration through sucrose cushions or by ultracentrifugation, preferably continuous centrifugation (see Furminger, In: Nicholson, Webster and May (eds.), Textbook of Influenza, Chapter 24, pp. 324-332). Other purification methods are possible and contemplated herein. A purified material may be free of at least about 50%, preferably at least about 75%, and most preferably at least about 90%, of the cellular components, media, proteins, or other undesirable components or impurities (as context requires) with which it was originally associated. The term “substantially pure” indicates the highest degree of purity that can be achieved using conventional purification techniques known in the art.
Libraries and Kits
[00109] In one aspect, provided herein is a library comprising a plurality of transcription units. Each transcription unit may comprise one or more gene sequence(s) placed under the control of a regulatory sequence. Each transcription unit may further comprise one or more gene sequence barcodes, each of which uniquely identifies one of the gene sequence(s), and a second barcode that uniquely identifies the regulatory sequence.
[00110] In some embodiments, the regulatory sequence described herein may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator. In some embodiments, the regulatory sequence described herein comprises at least a promoter.
[00111] In some embodiments, one or more transcriptional units may further comprise a nucleotide sequence encoding an organellar targeting signal, which directs a gene product from the gene sequence to a target location (e.g., an organelle, a sub-organellar compartment, or the cell surface) in a host cell. The organellar targeting signal may be selected from, for example, a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
[00112] In some embodiments, the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies the nucleotide sequence encoding the organellar targeting signal.
[00113] In some embodiments, the barcodes, e.g., gene sequence barcodes, regulatory sequence barcodes, and/or organellar targeting signal barcodes, may be present anywhere in the transcription unit. For example, the barcodes may be present in a 3’ untranslated region (UTR) or 5’ UTR of each test transcription unit, or in an intron engineered into the transcription unit. In one embodiment, the barcodes are present in the 3’ UTR of each test transcription unit.
[00114] In some embodiments, each transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
[00115] In some embodiments, a library of transcriptional units may comprise about 2 to 10,000 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 5000 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 2500 transcription units. In some embodiments, a library of transcriptional units may comprise about 2 to 1000 transcription units. In some embodiments, the library comprises about 4 to 800, about 4 to 720, about 4 to 600, about 4 to 500, about 4 to 400, about 4 to 300, about 4 to 250, about 4 to 200, about 4 to 160, about 4 to 120, about 4 to 100, about 4 to 80, about 4 to 60, about 4 to 40, about 4 to 30, about 4 to 20, about 4 to 18, about 4 to 15, about 4 to 12, about 4 to 10, about 6 to 800, about 6 to 720, about 6 to 600, about 6 to 500, about 6 to 400, about 6 to 300, about 6 to 250, about 6 to 200, about 6 to 160, about 6 to 120, about 6 to 100, about 6 to 80, about 6 to 60, about 6 to 40, about 6 to 30, about 6 to 20, about 6 to 18, about 6 to 15, about 6 to 12, about 6 to 10, about 8 to 800, about 8 to 720, about 8 to 600, about 8 to 500, about 8 to 400, about 8 to 300, about 8 to 250, about 8 to 200, about 8 to 160, about 8 to 120, about 8 to 100, about 8 to 80, about 8 to 60, about 8 to 40, about 8 to 30, about 8 to 20, about 8 to 18, about 8 to 15, about 8 to 12, about 10 to 800, about 10 to 720, about 10 to 600, about 10 to 500, about 10 to 400, about 10 to 300, about 10 to 250, about 10 to 200, about 10 to 160, about 10 to 120, about 10 to 100, about 10 to 80, about 10 to 60, about 10 to 40, about 10 to 30, about 10 to 20, about 10 to 18, about 10 to 15, about 12 to 800, about 12 to 720, about 12 to 600, about 12 to 500, about 12 to 400, about 12 to 300, about 12 to 250, about 12 to 200, about 12 to 160, about 12 to 120, about 12 to 100, about 12 to 80, about 12 to 60, about 12 to 40, about 12 to 30, about 12 to 20, about 15 to 800, about 15 to 720, 
about 15 to 600, about 15 to 500, about 15 to 400, about 15 to 300, about 15 to 250, about 15 to 200, about 15 to 160, about 15 to 120, about 15 to 100, about 15 to 80, about 15 to 60, about 15 to 40, about 15 to 30, about 15 to 20, about 20 to 800, about 20 to 720, about 20 to 600, about 20 to 500, about 20 to 400, about 20 to 300, about 20 to 250, about 20 to 200, about 20 to 160, about 20 to 120, about 20 to 100, about 20 to 80, about 20 to 60, about 20 to 40, or about 20 to 30 transcription units. In some embodiments, the library comprises about 4, about 6, about 8, about 10, about 12, about 15, about 18, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 120, about 150, about 160, about 180, about 200, about 250, about 300, about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 900, about 1000, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, or about 10,000 transcription units.
[00116] In another aspect, provided herein is a library comprising a plurality of vectors, wherein each vector comprises a transcriptional unit selected from the library of transcription units described herein.
[00117] In some embodiments, vectors included in the vector library described herein are viral vectors. Non-limiting examples of suitable viral vectors include an adenoviral vector, a retroviral vector (e.g., lentiviral vector), a herpes viral vector, a baculoviral vector, and/or an adeno-associated viral (AAV) vector. In one embodiment, the viral vector used herein is a lentiviral vector.
[00118] In other embodiments, vectors included in the vector library described herein are non-viral vectors. In one embodiment, the non-viral vector is a transposon. Exemplary transposons include piggyBac transposons and Sleeping Beauty transposons.
[00119] In another aspect, provided herein is a library comprising a plurality of host cells, wherein each host cell expresses one or more gene product(s) from one or more transcriptional unit(s) selected from the library comprising a plurality of transcription units described herein or comprises a vector from the library of vectors described herein. The library comprising a plurality of host cells can be referred to as a “host cell library”. A “host cell library” is a collection of host cells that may differ from one another by at least one element in the host cells, such as a gene sequence, regulatory sequence (e.g., promoter), or organellar targeting signal expressed from the transcription unit comprised within the host cells.
[00120] In another aspect, provided herein is a library comprising a plurality of regulatory sequences, each of which is operably linked to a nucleotide sequence encoding an organellar targeting signal. The regulatory sequence may comprise one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator. In some embodiments, the regulatory sequence described herein comprises at least a promoter, such as a promoter described herein. The organellar targeting signal may be selected from those known in the art and described herein, such as a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
[00121] In another aspect, provided herein is a kit comprising a library described herein, and optionally, packaging and/or instructions for using the same. The kit may further comprise one or more pairs of universal primers. The kit may further comprise enzymes or buffers that may be necessary to carry out one or more of the method steps. Kits may contain a single container that contains the compositions with or without other components (e.g., primers, enzymes, buffers) or may have a separate container for each component.
EXAMPLES
[00122] The present invention is also described and demonstrated by way of the following examples. However, the use of these and other examples anywhere in the specification is illustrative only and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to any particular preferred embodiments described here. Indeed, many modifications and variations of the invention may be apparent to those skilled in the art upon reading this specification, and such variations can be made without departing from the invention in spirit or in scope. The invention is therefore to be limited only by the terms of the appended claims along with the full scope of equivalents to which those claims are entitled.
Example 1. Shotgun Genetic Engineering Workflow
[00123] A “shotgun” genetic engineering methodology employing a synthetic library screening approach was designed to dramatically expand the scale and speed with which the possible solution space can be sampled, by sampling “pools” of solutions rather than sampling individual solutions in parallel as in the current state of the art (Figs. 1A-1B).
[00124] First, a set of coding sequences (CDSes) from across the phylogenetic tree that are relevant to the desired cellular behavior to be engineered is identified. Multiple metabolic solutions that perform overlapping functions may exist within the same species, and CDSes relevant to such overlapping solutions can be screened to add functional diversity. These CDSes are sourced from a divergent set of species spanning broad phylogenetic diversity in order to increase the solution space. The set of species may encompass any known species, and the coding sequences may also encompass synthetic sequences, including sequences that may not exist in nature. Each of these sequences is codon-optimized for expression in the intended host cell and synthesized de novo. Importantly, these genes can also include sequences that can be used to modify the existing cellular environment (e.g., CRISPRa/CRISPRi components). A set of genes that are expected not to contribute to the cellular behavior (e.g., GFP) is also included in the pool as negative controls. Second, in order to sample diversity in expression level, a set of regulatory sequences, such as promoters (e.g., strong, medium, or weak promoters) that drive varying levels of gene expression, is assembled. Each promoter is encoded in the library both as a stand-alone version and in versions with organellar targeting signals (OTSes) attached, which allows gene products to be specifically targeted to different organellar compartments within the cell, including, for example, the nucleus (with a nuclear localization signal or sequence (NLS)), mitochondria (with a mitochondrial localization signal (MLS)), and the endoplasmic reticulum. Importantly, each promoter/OTS combination and each gene product is associated with a unique 8-base-pair DNA barcode that facilitates downstream analysis.
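One simple way to generate a set of unique 8-bp barcodes such as those described above is a greedy search that keeps barcodes at least three mismatches apart, so that a read carrying a single substitution error is still closer to its true barcode than to any other. The minimum distance, GC window, and homopolymer limit below are illustrative assumptions, not parameters stated in the source, and design_barcodes is a hypothetical name.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(n, length=8, min_dist=3, gc_range=(3, 5), max_run=3):
    """Greedily pick n barcodes with pairwise Hamming distance >= min_dist.

    Candidates are scanned in lexicographic order and filtered on GC count
    (inclusive range) and on homopolymer runs longer than max_run.
    """
    if n <= 0:
        return []
    chosen = []
    for letters in product("ACGT", repeat=length):
        bc = "".join(letters)
        gc = bc.count("G") + bc.count("C")
        if not gc_range[0] <= gc <= gc_range[1]:
            continue
        if any(base * (max_run + 1) in bc for base in "ACGT"):
            continue
        if all(hamming(bc, other) >= min_dist for other in chosen):
            chosen.append(bc)
            if len(chosen) == n:
                return chosen
    raise ValueError("could not find enough barcodes with these constraints")


barcodes = design_barcodes(24)
print(barcodes[:3])
```

A greedy scan is not optimal in the number of barcodes it can pack, but for libraries of tens to hundreds of parts it finds far more than enough.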
[00125] A Golden Gate cloning strategy is then employed to generate a library of transcription units (TUs) that encode random combinations of promoters/OTSes and CDSes. For instance, starting from an initial pool of 5 promoters, each available with 2 OTS options (none or mitochondrial), and 10 CDSes, 100 different TUs (5 x 2 x 10) are generated in a pool. The acceptor vector in this Golden Gate cloning strategy contains lentiviral sequences that allow lentiviral integration of each TU into the target cell type. Importantly, the cloning strategy is designed not to affect the expression of the TUs in cells: scars such as the restriction enzyme sites that enable Golden Gate cloning, together with the barcodes that enable downstream analysis, are located in the 3’ UTR of each TU. The 3’ UTR also contains universal primer binding sites for amplifying across the promoter and gene barcodes, so that the composition of each TU can be identified by deep next-generation sequencing of a PCR product of uniform size.
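The size of the combinatorial design space is the product of the part counts (5 x 2 x 10 = 100 here; 2 x 2 x 10 = 40 for LibJTR012 in Example 3). The sketch below enumerates that space using hypothetical part names. In a pooled Golden Gate reaction each combination forms stochastically rather than being built one by one, which is why library coverage is verified by sequencing the barcodes, as done for LibJTR012.

```python
from itertools import product


def enumerate_library(promoters, otses, cdses):
    """List every promoter/OTS/CDS combination as a candidate transcription unit."""
    return [
        {"promoter": p, "ots": o, "cds": c, "tu_id": f"{p}|{o}|{c}"}
        for p, o, c in product(promoters, otses, cdses)
    ]


# Hypothetical part names: 5 promoters x 2 OTS options x 10 CDSes.
promoters = [f"P{i}" for i in range(1, 6)]
otses = ["none", "mito"]
cdses = [f"CDS{i}" for i in range(1, 11)]

library = enumerate_library(promoters, otses, cdses)
print(len(library), library[0]["tu_id"])
```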
[00126] A pool of lentivirus is then generated from this library of TUs and used to infect cells at varying multiplicities of infection (MOI). This leads to the random integration of varying combinations of TUs in each individual cell, thereby generating a pool of cells, each bearing a unique combination of TUs. The pool of cells is then challenged with a selective condition such that only those cells that contain a functional program gain a growth advantage. DNA is then isolated from the resulting clones, PCR is performed using universal primer pairs, and the resulting product is deep sequenced. Sequencing reads reveal the identity and relative copy number of each TU present in a clone that has acquired the selective advantage. By integrating information on all the TUs present in cells with the selective advantage, the combination most likely to constitute a successful solution for enabling the desired cellular behavior in that cell type can then be deduced. Enormous diversity directly relevant to the desired cellular behavior is thus generated, yet only a small search space is left to interrogate for the components underlying that behavior. During this process, it is furthermore possible to modulate the host cell genome by targeting the expression levels of each of the host’s genes, allowing diversity to be sampled along yet another dimension.
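Transduction at varying MOI is often modeled as a Poisson process, in which the number of integrations per cell follows P(k) = e^(-m) m^k / k! for an MOI of m. This is a standard approximation rather than something stated in the source, but it shows how raising the MOI shifts the pool from singly toward multiply transduced cells, and therefore toward cells carrying combinations of TUs. The function names are hypothetical.

```python
import math


def integration_pmf(k: int, moi: float) -> float:
    """Poisson probability that a cell receives exactly k integrations at a given MOI."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def fraction_transduced(moi: float) -> float:
    """Fraction of cells with at least one integration: 1 - P(0)."""
    return 1 - math.exp(-moi)


# Share of cells with 0, 1, or >= 2 integrations at a few illustrative MOIs.
for moi in (0.5, 1.0, 3.0):
    p0 = integration_pmf(0, moi)
    p1 = integration_pmf(1, moi)
    print(f"MOI {moi}: none={p0:.2f}, one={p1:.2f}, two_or_more={1 - p0 - p1:.2f}")
```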
[00127] The present shotgun genetic engineering workflow can also be used to go beyond deducing the identity of the optimal solution. For example, this approach allows single-cell RNA sequencing (and other single-cell omics readouts) to be performed on cells that grow well in the selective condition, adding layers of data integration that can inform the cellular engineering effort. Another possibility is the incorporation of synthetic proteins into the present coding sequence libraries. An increasing number of predictive protein engineering efforts have shown success in recent years, although no extensive tests have been performed to sample a large library of such proteins.
Example 2. Applying Shotgun Genetic Engineering to Engineer the Valine Biosynthesis Pathway
[00128] The initial aim was to reconstitute a behavior that has previously been engineered in mammalian cells, namely valine biosynthesis in Chinese hamster ovary (CHO) cells (Julie Trolle, et al., 2022, Resurrecting essential amino acid biosynthesis in mammalian cells, eLife 11:e72847, doi.org/10.7554/eLife.72847). In previous work, CHO cells were found to be amenable to the introduction of the E. coli-derived valine biosynthesis pathway, consisting of proteins encoded by the ilvB, ilvC, ilvD, and ilvN genes. Expression of these genes at a 1:1:1:1 ratio, enabled by the use of 2A ribosome-skipping peptide sequences, allowed CHO cells to grow in valine-free medium (Figs. 3-4).
[00129] To reconstitute this known pathway, a “minimal” library was generated encoding only the ilvB, ilvC, ilvD, and ilvN genes, each under control of an EF1α promoter and localized to the cytoplasm of the cell (1 promoter x 1 localization x 4 CDS = a 4-member library) (Fig. 5).
Plasmids encoding these 4 transcription units were pooled and collectively transfected/packaged in HEK293T cells to generate a pool of lentivirus carrying all 4 TUs. This lentivirus was used to transduce CHO cells, which were then subjected to selection in reduced-valine medium. Most cells died during selection, but prototrophic colonies formed and were expanded over 32 days of selection. Three such colonies were picked and passaged in separate wells.
[00130] When grown in valine-free conditions, 2 of the 3 clones survived, exhibiting doubling times of 1.7 and 2.0 days, respectively. This is roughly 2.5 to 3 times faster than the doubling time of 5.0 days observed for the pMTIV cell line that was rationally designed and engineered in our previous work (Fig. 6).
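The growth figures in these examples are doubling times (days per doubling). Assuming exponential growth between two cell counts, doubling time is t · ln 2 / ln(N_t / N_0); the sketch below computes it with a hypothetical function name, and the cell counts in the usage lines are illustrative, not data from the examples.

```python
import math


def doubling_time(n0: float, nt: float, days: float) -> float:
    """Doubling time in days, assuming exponential growth from n0 to nt cells over `days`."""
    if n0 <= 0 or nt <= n0:
        raise ValueError("need 0 < n0 < nt for a growing population")
    return days * math.log(2) / math.log(nt / n0)


# Illustrative counts: a fourfold increase over 4 days is two doublings.
td = doubling_time(1e5, 4e5, 4.0)
# Ratio of the pMTIV doubling time to that of the fastest clone reported above.
speedup = 5.0 / 1.7  # ≈ 2.9
print(td, speedup)
```

Comparing doubling times as a ratio, as in the paragraph above, is equivalent to comparing specific growth rates (ln 2 divided by doubling time).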
[00131] Next, amplicon sequencing (amplicon-seq) was performed to sequence the barcodes contained in these 3 clones, as well as in the infected bulk population both before and after selection in reduced-valine conditions (Fig. 7). Selection on reduced-valine medium was found to increase the representation of ilvB and ilvD in both clones A7 and A9, as well as in the total pool of cells that survived reduced-valine selection.
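One common way to summarize barcode shifts between pre- and post-selection populations is a log2 fold change in relative barcode frequency, with a small pseudocount so that a barcode absent from one sample does not produce an infinite value. The source does not specify how representation was quantified, so the sketch below is an illustration under that assumption; the function name and the counts in the usage line are toy values, not data from Fig. 7.

```python
import math


def log2_enrichment(before: dict, after: dict, pseudocount: float = 0.5) -> dict:
    """log2(frequency after selection / frequency before) for each barcode.

    Frequencies are computed over the union of barcodes, with `pseudocount`
    added to every count (including barcodes missing from one sample).
    """
    keys = set(before) | set(after)
    total_before = sum(before.get(k, 0) + pseudocount for k in keys)
    total_after = sum(after.get(k, 0) + pseudocount for k in keys)
    return {
        k: math.log2(
            ((after.get(k, 0) + pseudocount) / total_after)
            / ((before.get(k, 0) + pseudocount) / total_before)
        )
        for k in keys
    }


# Toy counts only: a barcode rising from half to three quarters of reads.
scores = log2_enrichment({"ilvB": 100, "ilvC": 100}, {"ilvB": 300, "ilvC": 100})
print(scores)
```

Positive values indicate a TU that became more frequent under selection; negative values indicate depletion.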
[00132] qRT-PCR was then performed to determine how the copy number variation of each gene in the pathway translates into mRNA expression levels, a first step towards reconstituting the ideal solution composition without the use of lentivirus (Fig. 8). These data confirmed that increased ilvD expression strongly supports improved valine prototrophy, and that increased ilvB expression does so to a lesser degree. In contrast, ilvC levels were lower in the selected prototrophic clones and populations than in the pMTIV line. Similarly, ilvN levels were either similar to or lower than pMTIV ilvN levels, suggesting that these proteins are less important for enabling valine prototrophy in the context of CHO metabolism.
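The source does not state how the qRT-PCR data were normalized. A common choice is the Livak 2^-ΔΔCt method, sketched below with a reference (housekeeping) gene and a calibrator sample such as the pMTIV line; both choices, the function name, and the assumption of near-100% amplification efficiency (a base of 2) are illustrative.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal, efficiency=2.0):
    """Livak 2^-ddCt fold change of a target gene in a sample versus a calibrator.

    dCt = Ct(target) - Ct(reference gene), computed separately for the sample
    and the calibrator; ddCt = dCt(sample) - dCt(calibrator). `efficiency` is
    the assumed per-cycle amplification factor (2.0 = 100% efficiency).
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_cal = ct_target_cal - ct_ref_cal
    return efficiency ** -(d_ct_sample - d_ct_cal)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# (relative to the reference gene) in the clone than in the calibrator.
fold = relative_expression(20.0, 18.0, 22.0, 18.0)
print(fold)  # 4.0
```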
[00133] Altogether, these data illustrate the utility of the “shotgun” approach to solution engineering. Rather than assembling and delivering large DNA cargos at low efficiency, the methodology designed herein delivers smaller DNA cargos at high efficiency and allows functional solutions for encoding novel behaviors in target cells (e.g., prokaryotic cells or eukaryotic cells) to be read out. Moreover, these data demonstrate how improved outcomes can be generated simply by varying the gene dosage of individual solution components.
[00134] The complexity of the introduced library can be expanded from 4 members to 20 members (2 promoters x 2 localizations x 5 CDS) and eventually to 100 members (5 promoters x 2 localizations x 10 CDS), which will allow diversity to be sampled across promoters and in how CDSes are localized within the cell.
[00135] Utilizing these higher-complexity libraries is anticipated to enable the engineering of solutions, and thus functionalities, that have not previously been engineered in target cells (e.g., prokaryotic cells or eukaryotic cells).
Example 3. Application of Shotgun Genetic Engineering to achieve near full-speed growth rate in valine-free medium
[00136] LibJTR012, a 40-member library comprising 2 promoters, 2 organellar localization signals and 10 CDSes (Table 1) was designed and assembled. This library was designed with the intention to confer both valine and isoleucine prototrophy to mammalian cells. Following assembly of the LibJTR012 DNA library, the barcodes contained within the library were sequenced and each intended transcription unit was found to be represented (Fig. 9).
* Assumes 4 integration events of EF1α-driven ilvN/B/C/D. E.g., for the minimal library: 4/4*3/4*2/4*1/4 = 4!/4^4 = 9.38%.
** Assumes 4 integration events
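The arithmetic in the first footnote generalizes to a library of N equally represented TUs: if a cell receives k independent integrations, the chance that they hit k specified TUs exactly once each is k!/N^k. The sketch below computes this closed form and cross-checks it with a small Monte Carlo simulation. Uniform, independent sampling of TUs is the assumption carried over from the footnote, and the function names are hypothetical.

```python
import random
from math import factorial


def prob_all_once(k: int, n: int) -> float:
    """P(k uniform draws from n TUs hit k specified TUs exactly once each) = k!/n^k."""
    if k > n:
        return 0.0
    return factorial(k) / n**k


def simulate_all_once(k: int, n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    targets = list(range(k))  # label the k desired TUs 0..k-1
    hits = 0
    for _ in range(trials):
        draws = sorted(rng.randrange(n) for _ in range(k))
        hits += draws == targets
    return hits / trials


print(prob_all_once(4, 4))  # 0.09375, i.e. the 9.38% in the footnote
print(simulate_all_once(4, 4))
```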
[00137] Following viral packaging and transduction of LibJTR012 alongside a GFP control virus, transduced populations were selected on low-valine and low-isoleucine media. After 15 days of selection, cells infected with the GFP control virus had mostly died off under both conditions. In contrast, clearly proliferating clones could be detected in the LibJTR012 population under both the low-valine and low-isoleucine conditions. Clones that appeared to be proliferating in each selective condition were individually passaged and expanded.
[00138] From LibJTR012 populations subjected to 15 days of low-valine selection, 16 clones were picked and passaged, of which 10 were subjected to a functional growth assay in valine-free medium. Nine of the 10 clones proliferated across several passaging events in valine-free medium. One clone, 012-Val-D1, grew as quickly as 1.1 days/doubling, close to the 1.0 days/doubling of CHO cells in complete medium (Fig. 10). This is considerably faster than the fastest doubling time achieved with LibJTR001 (1.7 days/doubling), despite a considerably lower solution dosage. It is also faster than any CHO line previously engineered using rational design (4.3 days/doubling).
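Growth is reported in days per doubling. The source does not describe how doubling times were derived from passaging data. The sketch below assumes exponential growth within each passage and pools the doublings across passages as log2(cells harvested / cells seeded). The function name and the cell counts are hypothetical.

```python
import math


def doubling_time(passages):
    """Average days per doubling across passages, assuming exponential growth.

    passages: iterable of (cells_seeded, cells_harvested, days_in_culture).
    """
    passages = list(passages)
    total_days = sum(days for _, _, days in passages)
    total_doublings = sum(math.log2(harvested / seeded) for seeded, harvested, _ in passages)
    if total_doublings <= 0:
        raise ValueError("no net growth; doubling time is undefined")
    return total_days / total_doublings


# Hypothetical counts: 2 doublings (4-fold expansion) in 2.2 days per passage.
print(round(doubling_time([(1e5, 4e5, 2.2), (1e5, 4e5, 2.2)]), 2))  # 1.1
```

Pooling days and doublings across passages weights each passage by its length, which smooths out passage-to-passage noise.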
[00139] Next, the barcodes contained within clone 012-Val-D1, the fastest-growing clone in valine-free medium, were sequenced (Fig. 11). Significant representation was found for only 4 CDSes, namely ilvB, ilvC, ilvD and ilvN, which is consistent with previous valine biosynthetic engineering efforts. All 4 CDSes were found to be mitochondrially localized, suggesting that mitochondrial localization is beneficial for optimal functionality of the valine biosynthetic pathway. In addition to the 4 mitochondrially localized genes, cytosolically localized ilvC and ilvD were also represented.
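Claims 30-32 describe barcodes in the 3' UTR, flanked by universal primer binding sites and read out by PCR and sequencing. The sketch below shows how sequencing reads could be decoded into (promoter, localization, CDS) calls and tallied. Everything in it is hypothetical: the primer sequences, the barcode sequences, the layout of three fixed-length sub-barcodes in that order, and the 5% cutoff for "significant representation". The source does not define that cutoff. Only the CDS names come from the text.

```python
from collections import Counter

# All sequences below are hypothetical placeholders; the source does not disclose
# its primer binding sites, barcode layout or barcode sequences.
UP_FWD = "ACGTACGT"  # universal primer site upstream of the barcodes
UP_REV = "TTGGCCAA"  # universal primer site downstream of the barcodes
BC_LEN = 4           # assumed length of each sub-barcode

PROMOTER_BC = {"AAAA": "promoter1", "CCCC": "promoter2"}
LOCALIZATION_BC = {"GGGG": "cytosol", "TTTT": "mitochondria"}
CDS_BC = {"ACAC": "ilvB", "AGAG": "ilvC", "ATAT": "ilvD", "CACA": "ilvN"}


def decode_read(read):
    """Return (promoter, localization, CDS) for a read, or None if it does not parse."""
    start = read.find(UP_FWD)
    if start < 0:
        return None
    start += len(UP_FWD)
    end = read.find(UP_REV, start)
    if end != start + 3 * BC_LEN:
        return None
    region = read[start:end]
    promoter, loc, cds = (region[i:i + BC_LEN] for i in range(0, 3 * BC_LEN, BC_LEN))
    if promoter in PROMOTER_BC and loc in LOCALIZATION_BC and cds in CDS_BC:
        return PROMOTER_BC[promoter], LOCALIZATION_BC[loc], CDS_BC[cds]
    return None


def represented_units(reads, min_fraction=0.05):
    """Fraction of decoded reads per transcription unit, keeping those above a cutoff."""
    counts = Counter(tu for tu in map(decode_read, reads) if tu is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tu: n / total for tu, n in counts.items() if n / total >= min_fraction}


read = "NN" + UP_FWD + "CCCC" + "TTTT" + "ATAT" + UP_REV + "NN"
print(decode_read(read))  # ('promoter2', 'mitochondria', 'ilvD')
```

Reads that lack either primer site, or that contain an unknown barcode, are discarded rather than guessed.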
Example 4. Engineering novel complex phenotypes with Shotgun Genetic Engineering
[00140] From LibJTR012 populations subjected to 15 days of low-isoleucine selection, 48 clones were picked and passaged, all of which were subjected to a functional growth assay in isoleucine-free medium. Nine of the 48 clones grew across several passaging events in isoleucine-free medium. Of these 9 isoleucine-prototrophic clones, the fastest-growing in isoleucine-free conditions was clone 012-10p-G7, which grew at 2.1 days/doubling (Fig. 12). For comparison, no isoleucine-prototrophic CHO cells have previously been engineered using rational design principles, despite many attempts.
[00141] Two observations distinguish the isoleucine-prototrophic clones from the valine-prototrophic ones. Their maximum growth rate was slower (2.1 days/doubling for isoleucine vs 1.1 days/doubling for valine), and they were identified with lower efficiency (9 of 48 tested clones for isoleucine vs 9 of 10 for valine). Both are presumed to reflect the difference in complexity between the two pathways. This is consistent with a model in which optimal isoleucine prototrophy requires import of more transcription units than valine prototrophy does. For example, conferring isoleucine prototrophy on mammalian metabolism may require import of an additional enzymatic step to produce 2-oxobutanoate, an isoleucine-specific pathway substrate, on top of the 4 enzymatic steps shared with valine biosynthesis. Accordingly, the MOI can be increased at the time of transduction to obtain faster-growing clones in the isoleucine-free condition.
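The reasoning behind raising the MOI can be made concrete with a Poisson model of integrations per cell. This is a common approximation for viral transduction, but the source does not state it. The sketch assumes that the number of integrations per cell follows Poisson(MOI). It also assumes, hypothetically, that each enzymatic step needs one transcription unit, so isoleucine would need at least 5 units (4 shared steps plus 1 isoleucine-specific step). The model ignores which units a cell receives and so gives only an upper bound on carrying the right set. The function name and MOI values are hypothetical.

```python
import math


def p_at_least(moi: float, k: int) -> float:
    """P(a cell receives >= k integrations) if integrations per cell ~ Poisson(moi)."""
    return 1.0 - sum(math.exp(-moi) * moi ** i / math.factorial(i) for i in range(k))


# Hypothetical MOIs; 5 integrations as a stand-in for "4 shared steps + 1
# isoleucine-specific step" (one transcription unit per step is an assumption).
for moi in (1.0, 3.0, 5.0):
    print(moi, round(p_at_least(moi, 5), 3))
```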
[00142] The barcodes contained in clone 012-10p-H1, which grew relatively quickly in isoleucine-free medium (2.6 days/doubling), were sequenced. In this clone, significant representation of ilvB, ilvC, ilvD, ilvG, ilvM, and two variants of ilvA was found, suggesting that a combination of these transcription units supports the conferral of isoleucine biosynthesis on CHO cells (Fig. 13). All of these CDSes were found to be localized to the mitochondria, with ilvB also showing some presence in the cytosol, suggesting that mitochondrial localization of these transcription units is beneficial to the isoleucine-prototrophic phenotype.
Example 5. Stacking of complex phenotypes with Shotgun Genetic Engineering
[00143] Next, given the overlap in biosynthetic pathway steps, selected clones that had undergone low-isoleucine selection were tested for their ability to grow in valine-free medium. Clones prototrophic for isoleucine only, clones prototrophic for valine only, and clones prototrophic for both valine and isoleucine to varying degrees were found (Fig. 14).
[00144] This result highlights the key benefit of the shotgun genetic engineering approach: a diversity of phenotypes can be sampled in a single experiment. It further showcases the critical importance of CDS choice, expression level and genetic context in determining the success of a cellular engineering project. In this case, the end phenotype can range from no prototrophy at all to prototrophy that supports growth in the amino acid starvation condition at a rate on par with growth in complete medium. If just one metabolic solution were tested at a time, it would be nearly impossible to capture all of these different possibilities. Using the shotgun genetic engineering approach, more than 2.5 million different solutions have, in principle, been tested at once.
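The source does not state how the figure of more than 2.5 million solutions was derived. One count of that order can be obtained by borrowing the 4-integration-event assumption from the Table 1 footnotes. Under that assumption, the number of ordered sequences of 4 draws from the 40-member LibJTR012 library is 40^4 = 2,560,000. Counting only unordered combinations, with repeats allowed, gives far fewer. The sketch below computes both counts for comparison. The function names are hypothetical, and the choice between the two counts is an assumption, not the authors' stated method.

```python
from math import comb


def ordered_draws(library_size: int, integrations: int) -> int:
    """Ordered sequences of integrations (order of integration events counted)."""
    return library_size ** integrations


def unordered_draws(library_size: int, integrations: int) -> int:
    """Multisets of transcription units (order ignored, repeats allowed)."""
    return comb(library_size + integrations - 1, integrations)


print(ordered_draws(40, 4), unordered_draws(40, 4))  # 2560000 123410
```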
[00145] By combining isoleucine and valine prototrophy in the same cells, it has been demonstrated that the shotgun genetic engineering approach can be used to ‘stack’ multiple new phenotypes in target cells (e.g., prokaryotic cells or eukaryotic cells) at once.
* * *
[00146] The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are intended to fall within the scope of the appended claims.
[00147] All patents, applications, publications, test methods, literature, and other materials cited herein are hereby incorporated by reference in their entirety as if physically present in this specification.
Claims
1. A method of identifying one or more transcriptional units that confer one or more desired characteristics to a host cell, comprising: a) providing (i) one or more gene sequences that encode gene product(s) associated with the one or more desired characteristics; and (ii) one or more regulatory sequences; b) generating one or more test transcription units, wherein each test transcription unit comprises one or more gene sequences which are placed under the control of a regulatory sequence, said gene sequence(s) and said regulatory sequence are independently selected from the one or more gene sequences and the one or more regulatory sequences provided in step (a), respectively; and wherein each test transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of said gene sequence(s) and a regulatory sequence barcode that uniquely identifies said regulatory sequence; c) generating one or more vectors, wherein each vector comprises a test transcriptional unit selected from the one or more test transcriptional units; d) introducing the one or more vectors into a population of host cells, wherein the host cells are suitable for expression of gene product(s) from the test transcriptional unit(s); e) selecting from the population of host cells generated in step (d) one or more host cells that exhibit the one or more desired characteristics; and f) detecting one or more gene sequence and regulatory sequence barcodes in DNA of the one or more host cells selected in step (e) and identifying the gene sequence(s) and the regulatory sequence(s) comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcodes.
2. The method of claim 1, wherein the method further comprises g) determining copy number of the one or more transcriptional units identified in step (f), or the level of the product(s) encoded by the gene sequence(s) comprised within the one or more test transcriptional units identified in step (f) or corresponding RNAs.
3. The method of claim 1 or 2, wherein one or more test transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs the gene product from the gene sequence to a target location in the host cell.
4. The method of claim 3, wherein the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
5. The method of claim 3 or 4, wherein the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
6. The method of claim 5, wherein step (f) further comprises detecting the organellar targeting signal barcode in DNA of the one or more host cells selected in step (e) and identifying the nucleotide sequence encoding the organellar targeting signal comprised within the one or more test transcriptional units that confer the one or more desired characteristics to the host cell(s) based on the detected barcode.
7. The method of any one of claims 1-6, wherein each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
8. The method of any one of claims 1-7, wherein the one or more gene sequences comprise at least one gene sequence that is not naturally present in the host cells.
9. The method of any one of claims 1-7, wherein the one or more gene sequences comprise at least one gene sequence that is the same as or derived from a gene sequence naturally present in the host cells.
10. The method of any one of claims 1-9, wherein the one or more gene sequences comprise at least one gene sequence that is derived from a different species than the host cells.
11. The method of any one of claims 1-10, wherein the one or more gene sequences comprise at least one gene sequence that is codon-optimized for expression in the host cells.
12. The method of any one of claims 1-11, wherein the one or more gene sequences comprise at least one gene sequence encoding a genome engineering system and/or a component thereof.
13. The method of claim 12, wherein the genome engineering system is a CRISPR activation (CRISPRa) or CRISPR inhibition (CRISPRi) system.
14. The method of any one of claims 1-13, wherein the one or more gene sequences comprise at least one gene sequence encoding a negative control.
15. The method of claim 14, wherein the negative control is a detectable marker.
16. The method of claim 15, wherein the detectable marker is a fluorescent protein.
17. The method of any one of claims 1-16, wherein the one or more gene sequences comprise at least one gene sequence encoding a synthetic protein.
18. The method of any one of claims 1-17, wherein the one or more gene sequences comprise about 1-10,000 gene sequences.
19. The method of any one of claims 1-18, wherein the one or more gene sequences comprise about 2-100 gene sequences.
20. The method of any one of claims 1-19, wherein the one or more gene sequences comprise about 4-20 gene sequences.
21. The method of any one of claims 1-20, wherein the one or more regulatory sequences comprise about 1-10,000 regulatory sequences.
22. The method of any one of claims 1-21, wherein the one or more regulatory sequences comprise about 2-10 regulatory sequences.
23. The method of any one of claims 1-22, wherein the one or more regulatory sequences comprise about 2-5 regulatory sequences.
24. The method of any one of claims 1-23, wherein the vector is a viral vector.
25. The method of claim 24, wherein the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector.
26. The method of claim 25, wherein the retroviral vector is a lentiviral vector.
27. The method of any one of claims 1-23, wherein the vector is a non-viral vector.
28. The method of claim 27, wherein the non-viral vector is a transposon.
29. The method of any one of claims 1-28, wherein the one or more test transcription units are generated using a Golden Gate cloning technique.
30. The method of any one of claims 1-29, wherein the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each test transcription unit.
31. The method of claim 30, wherein the 3’ UTR of each test transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
32. The method of claim 31, wherein the detecting of one or more barcodes is achieved using polymerase chain reaction (PCR) with one or more pairs of universal primers followed by sequencing of PCR product(s).
33. The method of claim 2, wherein the copy number of the one or more transcriptional units is determined using quantitative PCR, digital droplet PCR, or hybridization array.
34. The method of any one of claims 1-33, wherein the method further comprises performing one or more single-cell omics analyses on the one or more host cells selected in step (e).
35. The method of claim 34, wherein the one or more single-cell omics analyses comprise single-cell RNA sequencing, single-cell proteomics, single-cell metabolomics, and/or single-cell epigenomics.
36. The method of any one of claims 1-35, wherein step (e) comprises culturing the population of host cells generated in step (d) under conditions allowing for selection of the one or more host cells that exhibit the desired characteristic.
37. The method of any one of claims 1-36, wherein the host cells are mammalian cells.
38. A method of genetically engineering a cell, comprising introducing into the cell the one or more transcriptional units identified according to the method of any one of claims 1-37.
39. The method of any one of claims 1-38, wherein the one or more desired characteristics is selected from protein composition, protein content, DNA composition, DNA content, RNA composition, RNA content, fatty acid composition, fatty acid content, lipid composition, lipid content, sugar composition, sugar content, tolerance to environmental stressor(s), doubling time, prototrophy, biosynthesis capability, cell surface marker(s), cell size, light absorbance, light reflection, fluorescence, light scatter, polarization, one or more electrical properties of the cells, one or more magnetic properties, one or more morphological properties, membrane permeability, membrane fluidity, and/or redox state.
40. A library comprising a plurality of transcription units, wherein each transcription unit comprises one or more gene sequence(s) which are placed under the control of a regulatory sequence; and wherein each transcription unit further comprises one or more gene sequence barcodes each of which uniquely identifies one of said gene sequence(s) and a second barcode that uniquely identifies said regulatory sequence.
41. The library of claim 40, wherein one or more transcriptional units further comprise a nucleotide sequence encoding an organellar targeting signal which directs a gene product from the gene sequence to a target location in a host cell.
42. The library of claim 41, wherein the organellar targeting signal is selected from a mitochondrial targeting signal, an endoplasmic reticulum (ER) targeting signal, a nuclear localization sequence, a peroxisome targeting signal, a lysosome targeting signal, and a membrane targeting signal.
43. The library of claim 40 or 41, wherein the one or more test transcriptional units further comprise an organellar targeting signal barcode that uniquely identifies said nucleotide sequence encoding the organellar targeting signal.
44. The library of any one of claims 40-43, wherein each regulatory sequence comprises one or more elements selected from a promoter, an enhancer, a silencer, an insulator, an operator, and a terminator.
45. The library of any one of claims 40-44, wherein the gene sequence barcode(s), regulatory sequence barcode, and/or organellar targeting signal barcode are present in a 3’ untranslated region (UTR) of each transcription unit.
46. The library of claim 45, wherein the 3’ UTR of each transcription unit further comprises a pair of universal primer binding sites flanking the barcodes.
47. The library of any one of claims 40-46, wherein the library comprises about 2-10,000 transcription units.
48. The library of any one of claims 40-47, wherein the library comprises about 2-1000 transcription units.
49. The library of any one of claims 40-48, wherein the library comprises about 8-100 transcription units.
50. A library comprising a plurality of vectors, wherein each vector comprises a transcriptional unit selected from the library of any one of claims 40-49.
51. The library of claim 50, wherein the vector is a viral vector.
52. The library of claim 51, wherein the viral vector is an adenoviral vector, retroviral vector, or herpes viral vector.
53. The library of claim 52, wherein the retroviral vector is a lentiviral vector.
54. The library of claim 50, wherein the vector is a non-viral vector.
55. The library of claim 54, wherein the non-viral vector is a transposon.
56. A library comprising a plurality of host cells, wherein each host cell expresses one or more gene product(s) from one or more transcriptional unit(s) selected from the library of any one of claims 40-49 or comprises a vector from the library of any one of claims 50-55.
57. A kit comprising a library of any one of claims 40-56, and optionally, packaging and/or instructions for using the same.
58. The kit of claim 57, further comprising one or more pairs of universal primers.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363533483P (US 63/533,483) | 2023-08-18 | 2023-08-18 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025042766A1 (en) | 2025-02-27 |
Family
ID=92672024
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/042753 (Pending; WO2025042766A1) | Shotgun genetic engineering | 2023-08-18 | 2024-08-16 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025042766A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013169917A1 (en) * | 2012-05-08 | 2013-11-14 | Cellecta, Inc. | Clonal analysis of functional genomic assays and compositions for practicing same |
| WO2016149422A1 (en) * | 2015-03-16 | 2016-09-22 | The Broad Institute, Inc. | Encoding of dna vector identity via iterative hybridization detection of a barcode transcript |
| WO2020139892A1 (en) * | 2018-12-28 | 2020-07-02 | University Of Pittsburgh - Of The Commonwealth System Of Higher Education | Methods and materials for single cell transcriptome-based development of aav vectors and promoters |
- 2024-08-16 WO PCT/US2024/042753 patent/WO2025042766A1/en active Pending
Non-Patent Citations (5)
| Title |
|---|
| "Textbook of Influenza", vol. 24, pages: 324 - 332 |
| AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 1992, GREENE PUBLISHING ASSOCIATES |
| HARLOW; LANE: "Antibodies: A Laboratory Manual", 1990, COLD SPRING HARBOR LABORATORY PRESS |
| LALANNE JEAN-BENOÎT ET AL: "Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters", BIORXIV, 10 December 2022 (2022-12-10), pages 1 - 46, XP093218158, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2022.12.10.519236v1.abstract> [retrieved on 20241025], DOI: 10.1101/2022.12.10.519236 * |
| SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Bieniossek et al. | MultiBac: expanding the research toolbox for multiprotein complexes | |
| EP3604527A1 (en) | Using programmable dna binding proteins to enhance targeted genome modification | |
| Mazzoni-Putman et al. | A plant biologist’s toolbox to study translation | |
| CN111133100A (en) | Multiplexed receptor-ligand interaction screening | |
| US11542633B2 (en) | Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells | |
| Schmeer et al. | Pharmaceutical grade large-scale plasmid DNA manufacturing process | |
| WO2002072783A2 (en) | Identification of cellular targets for biologically active molecules | |
| Flies et al. | Generation and testing of fluorescent adaptable simple theranostic (FAST) proteins | |
| WO2025042766A1 (en) | Shotgun genetic engineering | |
| Hopkins et al. | Optimizing transient recombinant protein expression in mammalian cells | |
| CN109957568A (en) | For targeting the gRNA and the HBB mutation detection methods based on C2c2, detection kit of HBB RNA | |
| Forstner et al. | Optimization of protein expression systems for modern drug discovery | |
| CN109402115A (en) | Target Rett mutated gene RNA gRNA and Rett mutated gene detection method, detection kit | |
| WO2021142394A1 (en) | Cell populations with rationally designed edits | |
| EP2692864B1 (en) | Vector for foreign gene introduction and method for producing vector to which foreign gene has been introduced | |
| CN109897852A (en) | The gRNA of tumour related mutation gene based on C2c2, detection method, detection kit | |
| CN109295055A (en) | The gRNA of tumour related mutation gene based on C2c2, detection method, detection kit | |
| WO2023064904A9 (en) | Method for profiling of cells from groups of cells | |
| Chandrashekar et al. | Selection, Screening, and Analysis of Recombinant Clones | |
| Geny et al. | Gene tagging with the CRISPR-Cas9 system to facilitate macromolecular complex purification | |
| RU2807690C1 (en) | Cell-free protein synthesis system based on gallus gallus embryonic cells and method of protein synthesis based on cell-free protein synthesis system in embryonic cells | |
| EP4296363B1 (en) | A cell surface tag exchange (cste) system for tracing and manipulation of cells during recombinase mediated cassette exchange integration of nucleic acid sequences to engineered receiver cells | |
| US20250043314A1 (en) | Targeted genomic barcoding for tracking of editing events | |
| US20250179484A1 (en) | Fusion proteins | |
| HK40106819A (en) | A cell surface tag exchange (cste) system for tracing and manipulation of cells during recombinase mediated cassette exchange integration of nucleic acid sequences to engineered receiver cells |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24765897; Country of ref document: EP; Kind code of ref document: A1 |