WO2024263652A1 - Compositions for and methods of tagging endogenously expressed proteins - Google Patents
Compositions for and methods of tagging endogenously expressed proteins
- Publication number
- WO2024263652A1 (application PCT/US2024/034636)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- cells
- cell
- acid sequence
- protein
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/907—Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/65—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression using markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/0004—Oxidoreductases (1.)
- C12N9/0008—Oxidoreductases (1.) acting on the aldehyde or oxo group of donors (1.2)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y102/00—Oxidoreductases acting on the aldehyde or oxo group of donors (1.2)
- C12Y102/01—Oxidoreductases acting on the aldehyde or oxo group of donors (1.2) with NAD+ or NADP+ as acceptor (1.2.1)
- C12Y102/01012—Glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) (1.2.1.12)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/20—Fusion polypeptide containing a tag with affinity for a non-protein ligand
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- the invention relates generally to compositions comprising homology directed repair (HDR) donor templates for tagging endogenous proteins by gene editing enzymes, such as CRISPR/Cas, for target protein purification from a biological sample containing heterogenous nucleic acid species.
- HDR homology directed repair
- PCT_SL.xml size: 954,220 bytes; and date of creation: June 19, 2024
- compositions that can provide all three HDR donor types from a single template and methods of their use for purifying proteins in their native states under different conditions from a biological sample, such as a cell culture sample containing heterogeneous nucleic acids.
- the present disclosure relates to a versatile strategy of tagging endogenous proteins, using a multicomponent HDR donor template that can be easily converted to all three HDR donor forms for parallelized knock-in experiments.
- This strategy provides additional structural and mechanistic insights that are not revealed by using recombinant proteins. While this disclosure focuses on structural characterization of endogenous proteins, it will be understood by those skilled in the art that the same strategy can be applied to other types of cell lines and used beyond structural biology.
- the present inventors generated a set of plasmids, with different variations containing a DNA fragment to be knocked-in to the targeted locus, flanked by two multiple cloning sites (MCS) for insertion of left and right homology arms (L- and R-arms) flanking the targeted locus.
- the plasmids also contain a bacterial origin of replication and a selectable antibiotic resistance marker for efficient amplification in bacterial cells.
- the inserted DNA fragment between two homology arms contains one or a plurality of affinity tags, whose choice can vary to match specific experimental goals, and one or a plurality of selectable markers for use in mammalian cells.
- a self-cleaving peptide is inserted between the affinity tag and the selection marker, and between the two selection markers.
- the two MCSs in the plasmid can be used as cleavage sites to convert the plasmid into a dsDNA HDR donor.
- a ssDNA HDR donor can be generated from dsDNA form.
- all three forms of HDR donor can be generated without having to make multiple constructs and can be used to tag any target endogenous protein in parallel.
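As a rough illustration of the conversion described above, the following Python sketch models a single plasmid template and derives the dsDNA and ssDNA donor forms from it. The class name, the two restriction-site strings, and the toy sequences are hypothetical placeholders rather than sequences or sites from the disclosure, and the real conversion is performed enzymatically, not by string slicing.

```python
from dataclasses import dataclass

# Hypothetical recognition sites standing in for the two multiple cloning sites.
MCS_LEFT = "GAATTC"   # assumed site upstream of the left homology arm
MCS_RIGHT = "GGATCC"  # assumed site downstream of the right homology arm

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class DonorPlasmid:
    """Toy model of one plasmid template (all fields are illustrative)."""
    backbone: str   # origin of replication plus bacterial resistance marker
    left_arm: str   # left homology arm (L-arm)
    insert: str     # tags, self-cleaving peptides, selection markers
    right_arm: str  # right homology arm (R-arm)

    def plasmid_sequence(self) -> str:
        """Form 1: the plasmid itself, written here as a linear string."""
        return (self.backbone + MCS_LEFT + self.left_arm
                + self.insert + self.right_arm + MCS_RIGHT)

    def to_dsdna(self) -> str:
        """Form 2: top strand of the linear donor released by cutting at both sites."""
        seq = self.plasmid_sequence()
        start = seq.index(MCS_LEFT) + len(MCS_LEFT)
        end = seq.rindex(MCS_RIGHT)
        return seq[start:end]

    def to_ssdna(self, keep_top_strand: bool = True) -> str:
        """Form 3: one strand of the dsDNA donor, written 5' to 3'."""
        top = self.to_dsdna()
        return top if keep_top_strand else reverse_complement(top)


# Toy sequences chosen only so the example runs; they are not real arms or tags.
donor = DonorPlasmid(backbone="TTTTCCCCAAAA", left_arm="ACGTAC",
                     insert="ATGGCC", right_arm="TTGACA")
ds_donor = donor.to_dsdna()
ss_donor = donor.to_ssdna(keep_top_strand=False)
```

The point of the sketch is that all three donor forms derive from one object, so homology arms are cloned once per target rather than once per donor type.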
- Genome-edited cells can be selected in two steps, i.e., antibiotic treatment followed by fluorescence activated cell sorting (FACS). Alternatively, in practice, a majority of un-edited cells are removed by antibiotic treatment. At this point, the knock-in result can be checked by western blot, sequencing, and/or fluorescent light microscopy. The enriched population of genome-edited cells makes the final sorting by FACS more efficient or even unnecessary.
- FACS fluorescence activated cell sorting
- the disclosure provides a composition comprising a nucleic acid molecule comprising: (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequence; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned in a 5’ to 3’ orientation between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain; (v) a first and a second multiple cloning site, wherein the first multiple clon
- the disclosure provides a cell comprising an endogenous nucleic acid sequence encoding an expressible amino acid, said endogenous nucleic acid sequence modified on its amino or carboxy terminus by an expressible exogenous modification element comprising: (i) at least a first and a second nucleic acid sequence that encode a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain; (ii) a first nucleic acid sequence encoding a cleavage site positioned between the amino or carboxy terminus and the first nucleic acid sequence that encodes a selection domain; and (iii) a second nucleic acid sequence encoding a cleavage site positioned between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain.
- the disclosure also provides a method of culturing a cell comprising: exposing the composition of the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell.
- the disclosure provides a method of editing endogenous DNA of one or a plurality of cells comprising exposing the nucleic acid molecule according to the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell.
- the disclosure provides a method of isolating a protein in a cell comprising exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell.
- the disclosure provides a method of endogenously labeling a protein in a cell comprising exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell.
- the disclosure provides a method of screening for a therapeutic agent in a cell comprising exposing any one or plurality of cells according to the second aspect to a pathogen.
- the disclosure also relates to a method of imaging a cell comprising exposing the nucleic acid molecule according to the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell such that at least one target gene is transfected into endogenous DNA of the cell, exposing the cell to microscopy, and stimulating the cell with light at a frequency sufficient to induce fluorescence of the endogenously expressed protein encoded by the nucleic acid molecule if the endogenously expressed protein comprises a fluorescent tag or label; and, optionally, inducing expression of the endogenously expressed nucleic acid sequence prior to stimulating the cell with light.
- the disclosure also relates to a method of manufacturing a single-stranded DNA and a double-stranded DNA simultaneously from a nucleic acid molecule disclosed herein, the method comprising exposing the nucleic acid molecule (e.g., a multicomponent HDR donor template) to an endonuclease specific for the multiple cloning site of the nucleic acid molecule, forming a cleaved DNA molecule encoding the target protein, exposing the resulting cleaved DNA to a pair of primers specific for the 3’ and 5’ ends of the endonuclease recognition site, and exposing the cleaved DNA to an exonuclease digestion to generate a population of linear single-stranded DNA comprising the nucleic acid sequence encoding the target protein and a linear double-stranded DNA comprising the nucleic acid sequence encoding the target protein.
- FIGS. 1A to 1D show HDR donor template design and an endogenous protein tagging scheme.
- FIG. 1A shows a schematic of pYC plasmid design. All elements are colored and labeled.
- changeable HDR which contains the changeable tags and selection markers (blue) flanked by corresponding left and right homology arm (yellow)
- the plasmid contains ColE1 origin of replication (red) and ampicillin resistance gene (grey) for routine amplification in bacteria.
- FIG. 1B shows a changeable HDR to tag N-terminus (above) or C-terminus (below) of a target protein (middle).
- FIG. 1C shows a workflow of converting a plasmid into dsDNA and ssDNA HDR donors.
- FIG. 1D shows agarose gels with SYBR gold showing the size of DNA corresponding to all forms of HDR donors.
- FIGS. 2A to 2G show tagging of endogenous proteins in HEK293T and Jurkat cells.
- FIG. 2A shows fluorescence microscopy images (left) and anti-FLAG western blot (right) of cells after puromycin treatment.
- FIG. 2B shows Knock-in efficiency for each HDR donor type across the six target genes. Knock-in experiments and validation processes were repeated three times.
- FIG. 2C shows fluorescence images showing knock-in results of five different genes in Jurkat cells (upper row). Brightfield image (bottom row) is used to identify all cells within field of view (yellow dotted line). The scale bar is 100 µm.
- FIG. 2D shows that tagging is validated by anti-FLAG western blot. All proteins show their theoretical size on SDS-PAGE, the same as HEK293T cells do in FIG. 2A.
- FIG. 2E shows that the size distribution of genome-edited cells is similar to the control cells.
- FIG. 2F shows histograms of integrated mNeonGreen fluorescence intensity from genome-edited cells. To obtain the control signal, wildtype Jurkat cells were used.
- FIG. 2G shows a graph representing the knock-in efficiency across the five different genes. The knock-in experiments and validation processes were repeated twice in Jurkat cells.
- FIGS. 3A to 3F show characterization of endogenous PCNA.
- FIG. 3A shows the SEC profile of 3×FLAG-tagged endogenous PCNA after affinity purification by anti-FLAG M2 resin. Colored labels (black and red line) above SEC profile match with the corresponding fractions in the western blot. The pink box on the profile indicates the theoretical elution volume of PCNA.
- FIG. 3B shows a negative stained EM micrograph and representative 2D class averages of purified endogenous PCNA.
- FIG. 3C shows a cryo-EM micrograph of purified endogenous PCNA.
- FIG. 3D shows a 3D reconstruction of endogenous trimeric PCNA (from left to right: top and side views of the density map, and docking of the atomic structure (PDB: 4D2G) into the cryo-EM density map) and representative 2D class averages of trimeric PCNA.
- FIG. 3E shows that in combination with ALFA-tag and GFP-conjugated ALFA nanobody (ALFANB-eGFP), different complexes of endogenous PCNA were observed (black line). The experiment was repeated three times.
- FIGS. 4A to 4K show structural changes of human endogenous GAPDH in response to prolonged oxidative stress.
- FIG. 4A shows two different views of human endogenous GAPDH cryo-EM map determined without oxidative stress. A non-protein density (yellow) is seen.
- FIG. 4B shows the electrostatic surface, showing the mixture of positively charged and hydrophobic surface surrounding the unknown density. The groove is composed mainly of the Cα atoms of the loop-lining amino acids. R13 and R16 provide a positively charged local environment for substrate recruitment.
- FIG. 4C shows a lipid strip assay showing that de-lipidated endogenous GAPDH preferentially binds to PIP3. The experiment is repeated twice independently.
- FIG. 4D shows that following oxidation, FLAG-tagged endogenous GAPDH translocates to the nucleus in a time-dependent manner.
- the experiment was repeated three times independently.
- FIGS. 4E and 4F show that upon prolonged oxidative stress, GAPDH enzymatic activity decreases.
- the OD450 curve is normalized to the maximum point of the 0-hour oxidation condition.
- the bar graph shows the amount of generated NADH, and statistical significance was tested by a two-tailed t-test. The p-values are shown on the graph. The experiment was repeated multiple times.
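The normalization mentioned for the OD450 curves can be sketched as follows. This is a minimal Python illustration that assumes each condition is stored as a list of OD450 readings and that every curve is divided by the maximum of the 0-hour curve. The function name, the dictionary keys, and the numbers are hypothetical toy values, not measurements from the disclosure.

```python
def normalize_to_reference_max(curves, reference_key):
    """Divide every reading by the maximum of the reference condition.

    curves: dict mapping a condition label to a list of OD450 readings.
    reference_key: label of the condition whose maximum defines 1.0.
    """
    reference_max = max(curves[reference_key])
    if reference_max == 0:
        raise ValueError("reference curve has no signal to normalize against")
    return {
        label: [reading / reference_max for reading in readings]
        for label, readings in curves.items()
    }


# Toy readings for illustration only (not data from the figures).
toy_curves = {
    "0h": [0.1, 0.4, 0.8],
    "8h": [0.1, 0.3, 0.6],
}
normalized = normalize_to_reference_max(toy_curves, reference_key="0h")
```

After normalization the 0-hour curve peaks at 1.0, so the other conditions read directly as fractions of the unstressed maximum.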
- FIG. 4G shows that the overall architecture of nuclear GAPDH after 8-hour oxidation is almost identical (RMSD: 0.270) with endogenous GAPDH from healthy cells.
- FIG. 4I shows that the catalytic site in nuclear GAPDH shows a different configuration after 8 hours of oxidation. Due to the oxidative modification on C152, the side chain of C152 shows a bulk density and loses the connection with H179; such a subunit is called the “inactive subunit”. This is present in both the 8-hour and 24-hour conditions.
- FIG. 4J shows that based on single subunit analysis, the number of “inactive subunits” in the tetrameric GAPDH complex increases during oxidative stress, and nuclear GAPDH is more damaged than cytosolic GAPDH.
- FIG. 4K shows that endogenous GAPDH is damaged by prolonged oxidative stress and loses its functional subunits. This would trigger other post-translational modifications (PTMs) and eventually nuclear translocation occurs.
- PTMs post-translational modifications
- FIGS. 5A to 5E: each panel shows a SEC profile of the tagged proteins purified by anti-FLAG M2 resin (top) with colored lines above marking fractions, anti-FLAG western blots (middle) from fractions marked by the same-colored line, and negative stain EM micrograph and 2D class averages (bottom) of the sample from fractions marked with pink shadow, which shows the theoretical elution position of the target protein.
- FIG. 5A is ACTB
- FIG. 5B is TKT
- FIG. 5C is VIM
- FIG. 5D is FASN
- FIG. 5E is GAPDH.
- FIGS. 6A, 6B, and 6C show an image processing workflow. The workflow of 200 kV cryo-EM datasets is illustrated. Reconstructions of FASN, methylosome, GAPDH, and PCNA are shown. Among them, methylosome is purified as a “contaminant” protein of FASN.
- FIGS. 7A, 7B, and 7C show the workflow of Krios dataset about endogenous GAPDH.
- Five different Krios datasets were collected to elucidate GAPDH changes under different oxidation stresses.
- the motion of all datasets was corrected by using Relion.
- To reach the optimal particle picking on home-made GO-amino grids, multiple picking methods were applied and evaluated manually in a micrograph-by-micrograph manner.
- By using 2D classification, junk particles were removed, and the remaining particles were subjected to 3D-based analysis.
- Relion was mainly used, while Cryosparc-based 3D classification was implemented on the nuclear GAPDH dataset after 24 hours of oxidation.
- the red boxed classes were used as the final particles, and iterative refinement with D2 symmetry, CTF refinement, and Bayesian polishing was implemented. The final resolutions are shown with the corresponding maps.
- D2 refinement and particle expansion
- G signal subtraction
- FIGS. 8A to 8E show cryo-EM structures of endogenous GAPDH.
- FIG. 8A shows, from left to right, angular distribution, local resolution and FSC curves of the endogenous GAPDH from 0-hour oxidation.
- FIGS. 8B and 8C show representative densities of human endogenous GAPDH.
- FIG. 8D shows three structures, two from cytoplasm and one from nucleus, after 8-hour oxidation, displayed with their angular distribution and local resolution.
- the FSC curves show the data quality of Cryo-EM density maps and corresponding atomic models.
- FIG. 8E shows post 24-hour oxidation GAPDH structures presented with their angular distribution and local resolution. FSC curves represent the data quality. All statistics were calculated by Relion.
- FIGS. 9A to 9D show the lipid-like ligand binding groove in endogenous GAPDH.
- FIG. 9A shows the hydrophobic surface charge of the groove in endogenous GAPDH.
- FIG. 9B shows the SEC profile of de-lipidation treated GAPDH.
- FIG. 9C shows lysates from mutant-expressing cells directly loaded onto SDS-PAGE to monitor the GAPDH expression level. R13 mutant reduces GAPDH expression significantly.
- FIG. 9D shows FSEC profiles of mutant recombinant GAPDH. Most mutant GAPDH proteins are detected in FSEC as monomer.
- FIGS. 10A to 10D show the enzymatic activity of endogenous GAPDH during oxidation stress.
- FIG. 10A shows a standard curve of NADH as a reference.
- FIG. 10B shows five independent measures of enzymatic activity during prolonged oxidation stress. Each data point is shown as a dot.
- FIG. 10C shows western blot images showing the amount of GAPDH in each independent experiment.
- FIGS. 11A, 11B-1, and 11B-2 show normalized maps of the catalytic site of endogenous GAPDH.
- FIG. 11A shows an enlarged view of the catalytic sites of the normalized cryo-EM density maps from cytoplasmic GAPDH without oxidation stress (left) and nuclear GAPDH after 8 hours oxidation (right) displayed at different thresholds. This shows the distribution between 0-hour oxidation (non-labelled) and 8-hours oxidation (red) by following the oxidation stress. In 0-hour oxidation, all classes show electron density between C152 and H179.
- FIGS. 11B-1 and 11B-2 show that, after symmetry expansion, the single subunits were subtracted for 3D classification without image alignment.
- FIGS. 12A to 12D show a summary of variations in different components of pYC for different applications.
- FIG. 12A shows a list of variations in each component of pYC, including protease cleavage site, purification tag, 2A self-cleavage site and antibiotic and fluorescent selection markers.
- FIG. 12B shows combinations of different components for protein purification.
- FIG. 12C shows cellular localization of target protein by fluorescence microscopy.
- FIG. 12D shows cellular localization of target protein by FSEC.
- FIGS. 13A to 13J show tagging of endogenous protein in HEK293T, Jurkat and MDA-MB-468 cells.
- FIG. 13A shows representative fluorescence microscopy of bright field (top row), mNeonGreen (second row), mApple (third row) and merged (bottom row) recorded on the indicated days after initial transfection.
- FIG. 13B shows fraction of cells showing Cas-mApple (magenta) or mNeonGreen (green) fluorescence over untransfected control cells on different days after initial transfection, validating enrichment of mNeonGreen-positive population over time from initial transfection through puromycin treatment (Mean+/-SD from two replicates).
- FIG. 13C shows that anti-FLAG western blot validates tagging of GAPDH.
- FIG. 13E shows that anti-FLAG western blots validate tagging of endogenous targets in Jurkat cells; all proteins show their expected size on SDS-PAGE.
- FIG. 13F shows that size distributions of genome-edited cells are comparable to untreated control cells. (Sample number, n, of ACTB, FASN, GAPDH, TKT and PCNA is 2385, 10647, 759, 1912 and 2804 cells, respectively.)
- FIG. 13H shows representative fluorescence images of MDA-MB-468 cells. From left to right are bright field image, fluorescence image of cells labeled by SPY650-DNA, fluorescence image of mNeonGreen revealing genome-edited cells, and the merged image.
- FIG. 13I shows that tagging in MDA-MB-468 is validated by anti-FLAG western blot.
- FIG. 13J shows knock-in efficiency for FASN and GAPDH in MDA-MB-468 cells measured as mNeonGreen signal higher than background signal in untreated control cells.
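The efficiency measure described here (fraction of cells whose mNeonGreen signal exceeds the background of untreated control cells) can be sketched in Python as below. How the background threshold is derived from the control cells is not stated in the text, so the nearest-rank percentile cut-off, the function names, and the toy intensities are all assumptions made for illustration.

```python
import math


def background_threshold(control_intensities, percentile=99.0):
    """Nearest-rank percentile of control-cell intensities (assumed cut-off rule)."""
    if not control_intensities:
        raise ValueError("need at least one control cell")
    ordered = sorted(control_intensities)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def knock_in_efficiency(edited_intensities, control_intensities, percentile=99.0):
    """Fraction of edited-population cells brighter than the control background."""
    if not edited_intensities:
        raise ValueError("need at least one cell from the edited population")
    threshold = background_threshold(control_intensities, percentile)
    positive = sum(1 for value in edited_intensities if value > threshold)
    return positive / len(edited_intensities)


# Toy per-cell integrated intensities (arbitrary units, illustration only).
toy_control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_edited = [5, 12, 15, 8, 20]
toy_efficiency = knock_in_efficiency(toy_edited, toy_control)
```

A stricter or looser percentile changes the reported efficiency, so the cut-off rule should be reported alongside the number.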
- FIGS. 14A to 14D show purification and negative staining EM of tagged endogenous proteins purified by anti-FLAG M2 resin (top) with colored lines above marking the fractions. Anti-FLAG western blots (middle) from each fraction are marked by the same-colored line or triangle. Pink shadow marks the fractions with the strongest western blot band. Negative stain EM micrograph and 2D class averages (bottom) are of the sample from the fractions marked by a colored line or triangle on the SEC profile.
- FIGS. 14A-E show, respectively, ACTB, TKT, VIM, FASN and GAPDH. Scale bar is 100 nm. The SEC profiles often show multiple peaks, indicating that the affinity pulldown captures different complexes associated with the target protein. Furthermore, affinity pulldown could also capture proteins that may not form stable complexes with the target proteins.
- FIGS. 15A, 15B, and 15C show an image processing workflow of 200 kV cryo-EM datasets. Reconstructions of FASN, GAPDH, methylosome and PCNA are shown.
- FIGS. 16A, 16B, and 16C show the workflow of cryo-EM data processing on analyzing GAPDH.
- FIGS. 17A to 17D show the lipid-like ligand binding groove in GAPDH.
- FIG. 17A shows hydrophobic surface charge of the groove in endogenous GAPDH. The surface potential is generated with default settings in ChimeraX.
- FIG. 17B shows SEC profile of affinity purified GAPDH after de-lipidation treatment. The shaded peak corresponds to the intact tetrameric GAPDH. Insets are anti-FLAG western blots of affinity purified GAPDH before de-lipidation treatment and from the shaded peak after de-lipidation.
- FIG. 17C shows GFP fluorescence image of SDS-PAGE gel of the lysates from the wild type and mutant GAPDH expressing cells. R13 mutant reduces GAPDH expression.
- FIG. 17D shows cell lysate FSEC profiles of wild type and mutant recombinant GAPDH.
- FSEC profile of cell lysate with wild type GAPDH black curve
- Colored FSEC profiles are from cell lysate of mutant GAPDH.
- the peak indicated by the red dashed line corresponds to non-intact GAPDH.
- FIGS. 18A and 18B show a comparison of selection strategies and HDR designs. Tree diagrams illustrate different strategies of selection markers and HDR template construction used in different studies or available from vendors.
- FIG. 18A shows the use of two common selection markers of CRISPR/Cas9 knock-in cells, fluorescent proteins (FP) and antibiotic resistance genes. The present study used double selection markers, both of which are separated from the targeted proteins by the 2A cleavage site.
- FIG. 18B shows construction of HDR template. In the present study multiple cloning sites (MCS) were used and the plasmid itself can be used as an HDR donor.
- MCS multiple cloning sites
- FIG. 19 illustrates the cBAF complex.
- FIG. 20 shows a western blot analysis for DPF2.
- FIG. 21 shows results of protein purification for 2xStrep-ALFA-GFP11-3xFLAG.
- FIGS. 22A and 22B illustrate mass spectrometry of the purification in FIG. 21.
- FIG. 23 illustrates a negative stain from the cBAF experiments.
- FIG. 24 shows a western blot for 2xStrep-ALFA-GFP11-3xFLAG-SNF2h (~133 kDa) (tags on the N-terminus of SNF2h).
- FIGS. 25A and 25B show Exportin-1 (gene name crm1 or xpo1) as tagged in HEK293 cells.
- FIG. 26 illustrates a western blot for FASN knock-in in HeLa cells.
- Various terms relating to the methods and other aspects of the present disclosure are used throughout the specification and claims. Such terms are to be given their ordinary meaning in the art unless otherwise indicated. Other specifically defined terms are to be construed in a manner consistent with the definition provided herein.
- The term “about” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.5%, or ±0.1% from the specified value, as such variations are appropriate to perform the disclosed methods.
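To make the tolerance bands in the definition of “about” concrete, the short Python sketch below checks whether a measured value falls within a chosen relative variation of a specified value. The function name and the default tolerance are illustrative assumptions; the definition itself lists several alternative bands and does not single one out.

```python
# Relative variations listed in the definition of "about".
ABOUT_TOLERANCES = (0.20, 0.10, 0.05, 0.01, 0.005, 0.001)


def is_about(value, specified, tolerance=0.10):
    """Return True if value lies within +/- tolerance (relative) of specified."""
    if tolerance not in ABOUT_TOLERANCES:
        raise ValueError("tolerance is not one of the listed variations")
    return abs(value - specified) <= tolerance * abs(specified)


# Example: 108 is within 10% of 100, while 125 is outside even the 20% band.
within_ten_percent = is_about(108, 100, tolerance=0.10)
outside_twenty_percent = is_about(125, 100, tolerance=0.20)
```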
- a reference to "A and/or B,” when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A without B (optionally including elements other than B); in another embodiment, to B without A (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- the terms “activate,” “stimulate,” “enhance” “increase” and/or “induce” are used interchangeably to generally refer to the act of improving or increasing, either directly or indirectly, a concentration, level, function, activity, or behavior relative to the natural, expected, or average, or relative to a control condition before the act is completed.
- “Activate” refers to a primary response induced by ligation or interaction between two molecules, such as a protein-protein interaction.
- such stimulation entails the interaction of a receptor and its ligand, resulting in a subsequent signal transduction event.
- the stimulation event may activate a cell and upregulate or downregulate expression or secretion of a molecule.
- ligation of cell surface moieties even in the absence of a direct signal transduction event, may result in the reorganization of cytoskeletal structures, or in the coalescing of cell surface moieties, each of which could serve to enhance, modify, or alter subsequent cellular responses by activation or stimulation.
- Cell type means the organism, organ, and/or tissue type from which the cell is derived or sourced, state of development, phenotype or any other categorization of a particular cell that appropriately forms the basis for defining it as "similar to” or “different from” another cell or cells.
- Coding sequence or "encoding nucleic acid” as used herein refers to the nucleic acid (RNA, DNA, or RNA/DNA hybrid molecule) that comprises a nucleotide sequence which encodes a protein.
- the coding sequence may further include initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of an individual or mammal to whom the nucleic acid is administered.
- nucleic acid may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
- fragment is meant to be a portion of a polypeptide or nucleic acid molecule, such as, but not limiting to, a truncation mutant. This portion contains, preferably, at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% of the entire length of the reference nucleic acid molecule or polypeptide.
- a fragment may contain about 5, about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 or more nucleotides or amino acids of a nucleotide or amino acid sequence, respectively, upon which it is based.
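The length criteria above can be illustrated with a small Python sketch that reports what fraction of a reference sequence a fragment covers. It assumes the fragment is a contiguous portion of the reference (as with a truncation mutant); the function names, the 5% default, and the toy sequence are hypothetical and chosen only for illustration.

```python
def fragment_fraction(fragment, reference):
    """Length of the fragment as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return len(fragment) / len(reference)


def is_fragment(fragment, reference, min_fraction=0.05):
    """True if fragment is a contiguous portion covering at least min_fraction."""
    return (
        bool(fragment)
        and fragment in reference
        and fragment_fraction(fragment, reference) >= min_fraction
    )


# Toy 100-residue reference; the first 20 residues act as a truncation mutant.
toy_reference = "M" + "A" * 99
toy_fragment = toy_reference[:20]
toy_fraction = fragment_fraction(toy_fragment, toy_reference)
```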
- a functional fragment means any portion or fragment of a polypeptide or nucleic acid sequence from which the respective full-length polypeptide or nucleic acid relates that is of a sufficient length and has a sufficient structure to confer a biological effect that is similar or substantially similar to the full-length polypeptide or nucleic acid upon which the fragment is based.
- a functional fragment is a portion of a full-length or wild-type nucleic acid sequence that encodes any one of the nucleic acid sequences disclosed herein, and said portion encodes a polypeptide of a certain length and/or structure that is less than full length but encodes a domain that is still biologically functional as compared to the full-length or wild-type protein.
- the functional fragment may have a reduced biological activity, about equivalent biological activity, or an enhanced biological activity as compared to the wild-type or full-length polypeptide sequence upon which the fragment is based.
- the functional fragment is derived from the sequence of an organism, such as a human.
- the functional fragment may retain about 99%, about 98%, about 97%, about 96%, about 95%, about 94%, about 93%, about 92%, about 91%, or about 90% sequence identity to the wild-type or given sequence upon which the sequence is derived.
- the functional fragment may retain about 85%, about 80%, about
- the term “genetic construct” is meant to refer to the DNA or RNA molecules that comprise a nucleotide sequence that encodes protein.
- the coding sequence includes initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of the individual to whom the nucleic acid molecule is administered.
- the term “host cell” as used herein is meant to refer to a cell that can be used to express a nucleic acid, e.g., a nucleic acid of the disclosure.
- the host cell can be, but is not limited to, a eukaryotic cell, a bacterial cell, an insect cell, or a human cell.
- Suitable eukaryotic cells include, but are not limited to, Vero cells, HeLa cells, COS cells, CHO cells, HEK293 cells, BHK cells and MDCKII cells.
- the host cell is a cell chosen from one listed in Table X or Table Y.
- the host cell is a cell chosen from a cell line, optionally stored at -80 degrees Celsius or -212 degrees Celsius.
- suitable insect cells include, but are not limited to, Sf9 cells.
- the phrase "recombinant host cell" can be used to denote a host cell that has been transformed or transfected with a nucleic acid to be expressed.
- a host cell also can be a cell that comprises the nucleic acid but does not express it at a desired level unless a regulatory sequence is introduced into the host cell such that it becomes operably linked with the nucleic acid. It is understood that the term host cell refers not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to, e.g., mutation or environmental influence, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
- hybridize as used herein is meant to pair to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a gene described herein), or portions thereof, under various conditions of stringency.
- complementary polynucleotide sequences e.g., a gene described herein
- stringency (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).
- isolated means that the nucleic acid molecule, polynucleotide or polypeptide or fragment, variant, or derivative thereof has been essentially removed from other biological materials with which it is naturally associated, or essentially free from other biological materials derived, e.g., from a recombinant host cell that has been genetically engineered to express the polypeptide of the disclosure.
- nucleic acid may not be the species listed.
- nucleic acid may incorporate the mutations above in combination with one or more other mutations listed or not listed, but the nucleic acid may not be defined as the single species containing the nucleic acid mutations listed.
- polypeptide encompasses two or more naturally or non-naturally-occurring amino acids joined by a covalent bond (e.g., an amide bond).
- Polypeptides as described herein include full-length proteins (e.g., fully processed pro-proteins or full-length synthetic polypeptides) as well as shorter amino acid sequences (e.g., fragments of naturally-occurring proteins or synthetic polypeptide fragments).
- polypeptide sequence associated with a cell line means any polypeptide or fragment thereof, modified or unmodified by any macromolecule (such as a sugar molecule or macromolecule) that is produced naturally by a cell or cell line.
- the cell line is originally from or derived from a multicellular organism.
- a polypeptide sequence associated with the hepatocyte is any polypeptide or fragment thereof, modified or unmodified by any macromolecule (such as a sugar molecule or macromolecule) that is produced naturally by the cell lines in culture.
- a polypeptide sequence associated with the cell is any polypeptide sequence comprising any one or plurality of the polypeptides disclosed in Table Y or a sequence that shares about 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with the polypeptides disclosed in Table Y or a functional fragment thereof.
- a polypeptide sequence associated with the extracellular matrix consists of any of the polypeptides disclosed in Table Y or a sequence that shares about 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with the polypeptides disclosed in Table Y.
- the term “purified” means that the polynucleotide or polypeptide or fragment, variant, or derivative thereof is substantially free of other biological material with which it is naturally associated, or free from other biological materials derived, e.g., from a recombinant host cell that has been genetically engineered to express the polypeptide. That is, e.g., a purified polypeptide is a polypeptide that is at least from about 70% to about 100% pure, i.e., the polypeptide is present in a composition wherein the polypeptide constitutes from about 70% to about 100% by weight of the total composition. In some embodiments, the purified polypeptide is from about 75% to about 99% by weight pure, from about 80% to about 99% by weight pure, from about 90 to about 99% by weight pure, or from about 95% to about 99% by weight pure.
- the terms “subject,” “individual,” “host,” and “patient,” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans. The methods described herein are applicable to both human therapy and veterinary applications. In some embodiments, the subject is a mammal, and in other embodiments the subject is a human.
- nucleic acid molecules e.g., cDNA or genomic DNA
- RNA molecules e.g., mRNA
- analogs of the DNA or RNA generated using nucleotide analogs e.g., peptide nucleic acids and non-naturally occurring nucleotide analogs
- hybrids thereof e.g., peptide nucleic acids and non-naturally occurring nucleotide analogs
- the nucleic acid molecule can be single-stranded or double-stranded.
- the nucleic acid molecules of the disclosure comprise a contiguous open reading frame encoding a Cas protein, or a fragment thereof, as described herein.
- Nucleic acid or “oligonucleotide” or “polynucleotide” as used herein may mean at least two nucleotides covalently linked together.
- the depiction of single strand also defines the sequence of the complementary strand.
- a nucleic acid also encompasses the complementary strand of a depicted single strand.
- Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid.
- a nucleic acid also encompasses substantially identical nucleic acids and complements thereof.
- a single strand provides a probe that may hybridize to a target sequence under stringent hybridization conditions.
- a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.
- Nucleic acids may be single stranded or double stranded, or may contain portions of both double stranded and single stranded sequence.
- the nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
- nucleic acid will generally contain phosphodiester bonds, although, in some embodiments, nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages.
- Other analog nucleic acids include those with positive backbones; non-ionic backbones, and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated by reference in their entireties.
- Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids.
- the modified nucleotide analog may be located for example at the 5'-end and/or the 3'-end of the nucleic acid molecule.
- Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that also nucleobase-modified ribonucleotides, i.e. ribonucleotides, containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase such as uridines or cytidines modified at the 5-position, e.g.
- the 2'-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, N2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I.
- Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al., Nature (Oct. 30, 2005), Soutschek et al., Nature 432:173-178 (2004), and U.S. Patent Publication No.
- Modified nucleotides and nucleic acids may also include locked nucleic acids (LNA), as described in U.S. Patent No. 20020115080, which is incorporated herein by reference. Additional modified nucleotides and nucleic acids are described in U.S. Patent Publication No. 20050182005, which is incorporated herein by reference in its entirety. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip.
- LNA locked nucleic acids
- nucleotide sequence encoding one or more Cas proteins is free of modified nucleotide analogs.
- nucleotide sequence encoding one or more antigens comprises from about 1 to about 20 nucleic acid modifications.
- nucleotide sequence encoding one or more Cas proteins comprises from about 1 to about 50 nucleic acid modifications.
- nucleotide sequence encoding one or more antigens independently comprise from about 1 to about 100 nucleic acid modifications.
- nucleic acid molecule comprises one or more nucleotide sequences that encode one or more proteins.
- a nucleic acid molecule comprises initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of the individual to whom the nucleic acid molecule is administered.
- the nucleic acid molecule also is a plasmid comprising one or more nucleotide sequences that encode one or a plurality of neoantigens.
- the disclosure relates to a pharmaceutical composition
- a pharmaceutical composition comprising a first, second, third or more nucleic acid molecules, each of which encoding one or a plurality of neoantigens and at least one of each plasmid comprising one or more of the Formulae disclosed herein.
- polypeptide refers to polymers of amino acids of any length.
- the polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-natural amino acids or chemical groups that are not amino acids.
- the terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
- amino acid includes natural and/or unnatural or synthetic amino acids, including glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics.
- “conservative” amino acid substitutions may be defined as set out in Tables A, B, or C below.
- the vaccines, compositions, pharmaceutical compositions and methods may comprise nucleic acid sequences comprising one or more conservative substitutions.
- the vaccines, compositions, pharmaceutical compositions and methods comprise nucleic acid sequences that retain from about 70% sequence identity to about 99% sequence identity to the sequence identification numbers disclosed herein but comprise one or more conservative substitutions.
- Conservative substitutions of the present disclosure include those wherein conservative substitutions (from either nucleic acid or amino acid sequences) have been introduced by modification of polynucleotides encoding polypeptides.
- Amino acids can be classified according to physical properties and contribution to secondary and tertiary protein structure.
- a conservative substitution is recognized in the art as a substitution of one amino acid for another amino acid that has similar properties.
- the conservative substitution is recognized in the art as a substitution of one nucleic acid for another nucleic acid that has similar properties, or, when encoded, has similar binding affinities to its target.
- the target is a cell comprising endogenous DNA to which plasmids of the disclosure initiate a recombination event. Exemplary conservative substitutions are set out in Table A.
- conservative amino acids can be grouped as described in Lehninger, (Biochemistry, Second Edition; Worth Publishers, Inc. NY, N.Y. (1975), pp. 71-77) as set forth in Table B.
- inhibitors described herein are intended to include nucleic acids and, when the nucleic acid sequences of the disclosure are encoded, include polypeptide, polypeptides bearing one or more insertions, deletions, or substitutions, or any combination thereof, of amino acid residues as well as modifications other than insertions, deletions, or substitutions of amino acid residues.
- “more than one” or “two or more” of the aforementioned amino acid substitutions means 2, 3, 4, 5, 6, 7, 8, 9, 10 or more of the recited amino acid or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3, 4, or 5 of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3, 4 or more of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3 or 4 of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2 or more of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2 of the recited amino acid substitutions or nucleic acid substitutions.
- the percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity.
- the residues of single sequence are included in the denominator but not the numerator of the calculation.
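The percent identity calculation described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: it assumes the two sequences have already been optimally aligned by some other tool, that gaps are written as `-`, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two already-aligned sequences of equal length.

    Assumption: '-' marks a gap. A column in which only one sequence has a
    residue counts toward the denominator but never toward the numerator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both sequences; they carry no residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned positions to compare")
    # After the filter above, a == b can only be true for two real residues.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT-A", "ACCTGA")` compares six positions, four of which match, giving roughly 66.7%.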
- BLAST high scoring sequence pair
- T is referred to as the neighborhood word score threshold (Altschul et al., 1997).
- These initial neighborhood word hits act as seeds for initiating searches to find HSPs containing them.
- the word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extension for the word hits in each direction are halted when: 1) the cumulative alignment score falls off by the quantity X from its maximum achieved value; 2) the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or 3) the end of either sequence is reached.
- the Blast algorithm parameters W, T and X determine the sensitivity and speed of the alignment.
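The three halting rules for word-hit extension can be illustrated with a simplified, ungapped extension in one direction. This sketch makes several assumptions that are not in the text above: the seed is an exact match of length `seed_len` (standing in for W), scoring uses a flat match/mismatch scheme rather than a substitution matrix, and only rightward extension is shown. The function name and default scores are hypothetical.

```python
def extend_right(query, subject, q_start, s_start, seed_len, x_drop,
                 match=1, mismatch=-3):
    """Extend an exact seed hit to the right without gaps.

    Returns (best_score, best_len), where best_len is the number of residues
    beyond the seed that belong to the best-scoring extension.
    """
    score = seed_len * match              # the seed itself is an exact match
    best_score, best_len = score, 0
    i, j, length = q_start + seed_len, s_start + seed_len, 0
    while i < len(query) and j < len(subject):   # rule 3: end of either sequence
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best_score:
            best_score, best_len = score, length
        if best_score - score >= x_drop:         # rule 1: fell off by X from the maximum
            break
        if score <= 0:                           # rule 2: cumulative score at or below zero
            break
        i += 1
        j += 1
    return best_score, best_len
```

A larger `x_drop` lets the extension tolerate more mismatches before halting, which trades speed for sensitivity.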
- the Blast program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff et al., Proc. Natl. Acad. Sci.
- BLAST algorithm Karlin et al., Proc. Natl. Acad. Sci. USA, 1993, 90, 5873-5787, which is incorporated herein by reference in its entirety
- Gapped BLAST performs a statistical analysis of the similarity between two sequences.
- One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide sequences would occur by chance.
- P(N) the smallest sum probability
- a nucleic acid is considered similar to another if the smallest sum probability in comparison of the test nucleic acid to the other nucleic acid is less than about 1, less than about 0.1, less than about 0.01, or less than about 0.001.
- Two single-stranded polynucleotides are “the complement” of each other if their sequences can be aligned in an anti-parallel orientation such that every nucleotide in one polynucleotide is opposite its complementary nucleotide in the other polynucleotide, without the introduction of gaps, and without unpaired nucleotides at the 5' or the 3' end of either sequence.
- a polynucleotide is "complementary" to another polynucleotide if the two polynucleotides can hybridize to one another under moderately stringent conditions.
- a polynucleotide can be complementary to another polynucleotide without being its complement.
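The strict definition of “the complement” given above can be checked directly from sequence. The sketch below assumes both strands are written 5' to 3', contain only A, C, G, T or U, and that U is treated as equivalent to T; the function names are hypothetical. The looser “complementary” definition depends on hybridization behavior and cannot be decided by this kind of string comparison.

```python
_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence written 5' to 3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_the_complement(strand_a: str, strand_b: str) -> bool:
    """True if the strands pair at every position in anti-parallel orientation,
    with no gaps and no unpaired nucleotides at either end."""
    a = strand_a.upper().replace("U", "T")   # assumption: treat RNA U as T
    b = strand_b.upper().replace("U", "T")
    return len(a) == len(b) and reverse_complement(a) == b
```

Equal length is required because an overhang at the 5' or 3' end would leave unpaired nucleotides.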
- stringent hybridization conditions or “stringent conditions” as used herein is meant to refer to conditions under which a nucleic acid molecule will hybridize to another nucleic acid molecule, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. Since the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.
- Tm thermal melting point
- stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes, primers or oligonucleotides (e.g., 10 to 50 nucleotides) and at least about 60°C for longer probes, primers or oligonucleotides.
- Stringent conditions may also be achieved with the addition of destabilizing agents, such as formamide.
- nucleic acid molecule or polypeptide exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein).
- a reference amino acid sequence for example, any one of the amino acid sequences described herein
- nucleic acid sequence for example, any one of the nucleic acid sequences described herein.
- such a sequence is at least about 60%, about 80% or about 85%, and about 90%, about 95% or about 99% identical at the amino acid level or nucleic acid to the sequence used for comparison.
- a nucleotide sequence is "operably linked" to a regulatory sequence if the regulatory sequence affects the expression (e.g., the level, timing, or location of expression) of the nucleotide sequence.
- a "regulatory sequence” is a nucleic acid that affects the expression (e.g., the level, timing, or location of expression) of a nucleic acid to which it is operably linked.
- the regulatory sequence can, for example, exert its effects directly on the regulated nucleic acid, or through the action of one or more other molecules (e.g., polypeptides that bind to the regulatory sequence and/or the nucleic acid).
- regulatory sequences include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Further examples of regulatory sequences are described in, for example, Goeddel, 1990, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif, and Baron et al., 1995, Nucleic Acids Res. 23:3605-06.
- “Operably linked” as used herein may mean that expression of a gene is under the control of a promoter with which it is spatially connected.
- a promoter may be positioned 5' (upstream) or 3' (downstream) of a gene under its control.
- the distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.
- Promoter may mean a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell.
- a promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same.
- a promoter may also comprise distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription.
- a promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals.
- a promoter may regulate the expression of a gene component constitutively, or differentially with respect to cell, the tissue or organ in which expression occurs or, with respect to the developmental stage at which expression occurs, or in response to external stimuli such as physiological stresses, pathogens, metal ions, or inducing agents.
- promoters include the bacteriophage T7 promoter, bacteriophage T3 promoter, SP6 promoter, lac operator-promoter, tac promoter, SV40 late promoter, SV40 early promoter, RSV-LTR promoter, and the CMV IE promoter.
- sample refers generally to a limited quantity of something which is intended to be similar to and represent a larger amount of that something.
- a sample is a collection, swab, brushing, scraping, biopsy, removed tissue, or surgical resection that is to be tested for the absence or presence of a transfected cell or an endogenously labeled cell.
- a sample believed to contain one or more transformed cells as compared to a “control sample” that is known to be free of one or more transformed cells, transfected cells or cells that are free of the plasmid or plasmid insert of the disclosure.
- the methods relate to the step of exposing a swab, brushing or other sample from an environment to a set of reagents sufficient to isolate and/or sequence the DNA and RNA of one or a plurality of cells in the sample.
- the methods relate to the step of exposing a swab, brushing or other sample of cells from a cell culture system to a set of reagents sufficient to isolate and/or sequence or observe the expression of one or a plurality of amino acids in the cells in the sample.
- Stringent hybridization conditions may mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm may be the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
- Tm thermal melting point
- Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., about 10-50 nucleotides) and at least about 60°C for long probes (e.g., greater than about 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal may be at least 2 to 10 times background hybridization.
- Exemplary stringent hybridization conditions include the following: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C.
- “Substantially complementary” as used herein may mean that a first sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to the complement of a second sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
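The numeric ranges given for stringent conditions can be expressed as a simple check. This is only a sketch: the text qualifies every limit with “about”, whereas the code below treats each limit as a hard threshold, and it covers only sodium concentration, pH, temperature and probe length (not formamide or other destabilizing agents). The function name and parameters are hypothetical.

```python
def meets_stringent_conditions(sodium_molar: float, ph: float,
                               temp_c: float, probe_len_nt: int) -> bool:
    """Check salt, pH and temperature against the stated stringency ranges.

    Assumption: probes of 50 nucleotides or fewer are 'short' (minimum 30 C);
    longer probes are 'long' (minimum 60 C).
    """
    salt_ok = sodium_molar < 1.0          # less than about 1.0 M sodium ion
    ph_ok = 7.0 <= ph <= 8.3              # pH 7.0 to 8.3
    min_temp_c = 30.0 if probe_len_nt <= 50 else 60.0
    return salt_ok and ph_ok and temp_c >= min_temp_c
```

Under these assumptions a 200-nucleotide probe at 42°C would fail the temperature test, while a 20-nucleotide probe at the same temperature would pass.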
- a first and second sequence are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
- nucleic acid means (i) a portion or fragment of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that is substantially identical to a referenced nucleic acid or the complement thereof; or (iv) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, complement thereof, or a sequence substantially identical thereto.
- Variant may be, in an embodiment, with respect to a peptide or polypeptide, a polypeptide that differs in amino acid sequence by the insertion, deletion, or conservative substitution of amino acids, but retains at least one biological activity.
- Variant may also mean a protein with an amino acid sequence that is substantially identical to a referenced protein with an amino acid sequence that retains at least one biological activity.
- a conservative substitution of an amino acid i.e., replacing an amino acid with a different amino acid of similar properties (e.g., hydrophilicity, degree and distribution of charged regions) is recognized in the art as typically involving a minor change. These minor changes can be identified, in part, by considering the hydropathic index of amino acids, as understood in the art.
- the hydropathic index of an amino acid is based on a consideration of its hydrophobicity and charge. It is known in the art that amino acids of similar hydropathic indexes can be substituted and still retain protein function. In one aspect, amino acids having hydropathic indexes of ±2 are substituted.
- the hydrophilicity of amino acids can also be used to reveal substitutions that would result in proteins retaining biological function. A consideration of the hydrophilicity of amino acids in the context of a peptide permits calculation of the greatest local average hydrophilicity of that peptide, a useful measure that has been reported to correlate well with antigenicity and immunogenicity. U.S. Patent No. 4,554,101, fully incorporated by reference herein. Substitution of amino acids having similar hydrophilicity values can result in peptides retaining biological activity, for example immunogenicity, as is understood in the art.
- substitutions may be performed with amino acids having hydrophilicity values within ±2 of each other. Both the hydrophobicity index and the hydrophilicity value of amino acids are influenced by the particular side chain of that amino acid. Consistent with that observation, amino acid substitutions that are compatible with biological function are understood to depend on the relative similarity of the amino acids, and particularly the side chains of those amino acids, as revealed by the hydrophobicity, hydrophilicity, charge, size, and other properties.
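A substitution screen based on hydropathic indexes within 2 of each other might look like the following. The text does not name a particular scale; this sketch assumes the widely used Kyte-Doolittle hydropathy values, and the function name is hypothetical. It considers the hydropathic index only, not charge, size or the hydrophilicity scale discussed above.

```python
# Assumption: Kyte-Doolittle hydropathy values, keyed by one-letter code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def within_hydropathy_window(original: str, replacement: str,
                             window: float = 2.0) -> bool:
    """True if the two residues' hydropathic indexes differ by at most `window`."""
    a = HYDROPATHY[original.upper()]
    b = HYDROPATHY[replacement.upper()]
    return abs(a - b) <= window
```

On this scale isoleucine to valine and lysine to arginine fall inside the window, while isoleucine to aspartate does not.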
- Nucleic acid molecules or nucleic acid sequences of the disclosure include those coding sequences comprising the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain, or functional fragments or variants thereof that possess no less than about 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity with the coding sequences of the selection domains disclosed herein.
- a “vector” is a nucleic acid molecule that can be used to introduce a nucleic acid sequence subcomponent linked to it into a cell.
- a vector is a "plasmid,” which refers to a linear or circular double stranded DNA molecule into which additional nucleic acid segments can be ligated.
- a viral vector e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses
- RNA, DNA or hybrid RNA/DNA molecule comprising viral genome promoter sequences are operably linked to the expressible nucleotide sequence.
- the expressible nucleotide sequence is introduced into a cellular genome.
- Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors comprising a bacterial origin of replication and episomal mammalian vectors).
- Other vectors e.g., non-episomal mammalian vectors
- An "expression vector” is a type of vector that can direct the expression of a chosen polynucleotide.
- the disclosure relates to any one or plurality of vectors that comprise nucleic acid sequences encoding any one or plurality of amino acid sequence disclosed herein.
- the present disclosure relates to a versatile strategy of tagging endogenous proteins, using a multicomponent HDR donor template that can be easily converted to all three HDR donor forms for parallelized knock-in experiments.
- This strategy provides additional structural and mechanistic insights that are not revealed by using recombinant proteins. While this disclosure focuses on structural characterization of endogenous proteins, it will be understood by those skilled in the art that the same strategy can be applied to other types of cell lines and used beyond structural biology.
- the present inventors generated a set of plasmids, with different variations containing a DNA fragment to be knocked-in to the targeted locus, flanked by two multiple cloning sites (MCS) for insertion of left (5’) and right (3’) homology arms (L- and R-arms) flanking the targeted locus.
- MCS multiple cloning sites
- the plasmids also contain a bacterial origin of replication and a selectable antibiotic resistance marker for efficient amplification in bacterial cells.
- the inserted DNA fragment between two homology arms contains one or a plurality of affinity tags, whose choice can vary to match specific experimental goals, and one or a plurality of selectable markers for use in mammalian cells.
- a self-cleaving peptide is inserted between the affinity tag and the selection marker, and between the two selection markers.
- the two MCSs in the plasmid can be used as cleavage sites to convert the plasmid into a dsDNA HDR donor.
- a ssDNA HDR donor can be generated from dsDNA form.
- all three forms of HDR donor can be generated without having to make multiple constructs and can be used to tag any target endogenous protein in parallel.
- Genome-edited cells can be selected in two steps, i.e., antibiotic treatment followed by fluorescence-activated cell sorting (FACS). Alternatively, in practice, a majority of un-edited cells are removed by antibiotic treatment. At this point, the knock-in result can be checked by western blot, sequencing, and/or fluorescence light microscopy. The enriched population of genome-edited cells makes the final sorting by FACS more efficient or even unnecessary. A major advantage of this approach is that the success rate of generating genome-edited cell lines is less dependent on the initial efficiency of CRISPR/Cas9 knock-in.
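- The enrichment argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every edited cell survives antibiotic treatment and that a fixed fraction of un-edited cells escapes it. The function name, parameter names, and example numbers are hypothetical and are not taken from the disclosure.

```python
def edited_fraction_after_selection(initial_edited: float,
                                    unedited_survival: float) -> float:
    """Fraction of surviving cells that carry the knock-in after antibiotic treatment.

    initial_edited    -- fraction of cells edited before selection (0..1)
    unedited_survival -- assumed fraction of un-edited cells that survive (0..1)
    Edited cells are assumed to survive completely (illustrative simplification).
    """
    if not 0.0 <= initial_edited <= 1.0 or not 0.0 <= unedited_survival <= 1.0:
        raise ValueError("fractions must lie between 0 and 1")
    edited = initial_edited
    unedited = (1.0 - initial_edited) * unedited_survival
    total = edited + unedited
    return edited / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical knock-in efficiencies spanning two orders of magnitude.
    for efficiency in (0.001, 0.01, 0.1):
        enriched = edited_fraction_after_selection(efficiency, unedited_survival=0.001)
        print(f"initial {efficiency:.3f} -> after selection {enriched:.3f}")
```

Under these assumptions, the post-selection population is dominated by edited cells whenever the escape rate of un-edited cells is comparable to or smaller than the initial knock-in efficiency, which is one way to read the statement that success depends less on that initial efficiency.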
- the disclosure relates to a composition
- a composition comprising a cell, the cell comprising a nucleic acid molecule disclosed herein.
- the cell comprises a protein encoded by a target gene, the protein comprising a label; and two exogenous nucleic acid sequences encoding a first and second selection domain.
- the selection domain is chosen from one or a combination of selection domains in Table Y.
- the first nucleic acid sequence encoding a selection domain encodes an amino acid that confers resistance to the presence of a chemical substance, such as an antibiotic.
- the first nucleic acid encoding a selection domain is a nucleic acid sequence whose presence in the cell confers resistance to a toxin, such as an antibiotic.
- the second nucleic acid sequence encoding a selection domain encodes a protein that is free of the amino acid structure of the target protein and is a protein that emits light when exposed to certain wavelengths.
- the second nucleic acid sequence encoding a selection domain comprises a nucleic acid sequence of Table Y or variants thereof that comprise from about 70% to about 99% sequence identity to the nucleic acid sequences in Table Y.
- the disclosure provides a composition comprising a nucleic acid molecule comprising: (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequences; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned, in a 5’ to 3’ orientation, between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain; (v) a first and a second multiple cloning site, wherein the first multiple clo
- the first and the second HDR regions comprise from about 10 to about 30 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 100 to about 900 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 50 to about 500 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 500 to about 900 base pairs in nucleic acid length.
- the first nucleic acid encoding a cleavage site, the first nucleic acid sequence encoding a selection domain, the second nucleic acid encoding a cleavage site, and the second nucleic acid sequence encoding a selection domain are positioned in a contiguous nucleic acid sequence in a 5’ to 3’ orientation.
- the first and second nucleic acids encoding a cleavage site encode a P2A cleavage site.
- the composition further comprises a protein tag domain positioned, in a 5’ to 3’ orientation, either: (a) between the first HDR region and the first nucleic acid encoding a cleavage site; or (b) between the second nucleic acid encoding a cleavage site and the second HDR region.
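- The ordering constraints recited above can be expressed as a small checker. This is a sketch only: the element labels (`HDR_left`, `cleavage_1`, and so on) are hypothetical names introduced here, and the multiple cloning sites are deliberately left out because their exact position is not fully recited in the text above.

```python
from typing import Sequence

# Hypothetical labels for the recited elements, listed 5' to 3'.
REQUIRED = ("HDR_left", "cleavage_1", "selection_1",
            "cleavage_2", "selection_2", "HDR_right")


def check_layout(elements: Sequence[str]) -> bool:
    """Return True if the 5'-to-3' element order satisfies the recited constraints."""
    pos = {name: i for i, name in enumerate(elements)}
    if any(name not in pos for name in REQUIRED):
        return False
    # Both selection domains lie between the HDR regions; the first cleavage
    # site precedes the first selection domain and the second cleavage site
    # sits between the two selection domains.
    core_ok = (pos["HDR_left"] < pos["cleavage_1"] < pos["selection_1"]
               < pos["cleavage_2"] < pos["selection_2"] < pos["HDR_right"])
    if "protein_tag" not in pos:
        return core_ok
    tag = pos["protein_tag"]
    # Option (a): tag between the first HDR region and the first cleavage site.
    # Option (b): tag between the second cleavage site and the second HDR region.
    tag_ok = (pos["HDR_left"] < tag < pos["cleavage_1"]
              or pos["cleavage_2"] < tag < pos["HDR_right"])
    return core_ok and tag_ok


if __name__ == "__main__":
    donor = ["HDR_left", "protein_tag", "cleavage_1", "selection_1",
             "cleavage_2", "selection_2", "HDR_right"]
    print(check_layout(donor))
```

Representing the donor as an ordered list of labels keeps the check independent of actual sequence content, so the same function applies whether the donor is a plasmid, a double stranded DNA, or a single stranded DNA.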
- the first nucleic acid sequence encoding a selection domain encodes a fluorescent protein.
- the fluorescent protein is chosen from a domain encoding the protein in Table Z.
- the second nucleic acid encoding a selection domain encodes a selection domain that confers resistance to exposure of a toxic chemical.
- the toxic chemical is an antibiotic.
- the antibiotic is chosen from the antibiotics in Table X, or the gene that confers resistance to the antibiotic is chosen from one or a combination of nucleic acid sequences disclosed in Table X or Table Y.
- the first multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:59.
- the second multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:2.
- the nucleic acid molecule further comprises an origin of replication comprising at least 70% sequence identity to SEQ ID NO:56.
- the nucleic acid molecule further comprises an origin of replication comprising at least 70% sequence identity to SEQ ID NO:57.
- elements (i) through (v) are positioned in a modification element, wherein the nucleic acid molecule further comprises a regulatory sequence operably linked to a third nucleic acid sequence encoding a selection domain that is positioned outside of the modification element. In some embodiments, the third nucleic acid sequence encoding a selection domain confers puromycin resistance. In some embodiments, elements (i) through (v) are positioned in a modification element, and wherein the nucleic acid molecule further comprises an origin of replication positioned outside of the modification element. In some embodiments, the disclosure relates to a composition comprising a nucleic acid molecule comprising at least a first and second origin of replication.
- the first and second origin of replication are a nucleic acid sequence comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:56 and SEQ ID NO:57, respectively.
- the nucleic acid molecule is a plasmid, a double stranded DNA or a single stranded DNA molecule.
- the composition further comprises a transfection reagent.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:34 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:34.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:35 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:35.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:36 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:36.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:37 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:37.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:38 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:38.
- the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:39 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:39.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:29 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:29.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:30 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:30.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:31 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:31.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:32 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:32.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:33 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:33.
- the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:63 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:63.
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:49 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:49.
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:50 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%,
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:51 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:52 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:52.
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:53 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:53.
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:54 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 54.
- the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:58 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:58.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a LoxP site comprising SEQ ID NO: 60 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:60.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a LoxP site comprising SEQ ID NO:61 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:61.
- the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:60 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:60; and SEQ ID NO:61 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:61.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:40 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:41 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:41.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:42 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:42.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:43 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:43.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:44 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:45 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:45.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:46 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:46.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:47 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:47.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:48 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:48.
- the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO: 59 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 59.
- the nucleic acid molecule of the disclosure comprises any two of the above-mentioned nucleic acid sequences that are multiple cloning sites.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:62 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:62.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:65 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:65.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:66 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:66.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:67 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:67.
- the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:68 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:68.
- the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:55 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%,
- the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:69 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:69, also known as pYC.
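- The percent-identity thresholds recited above can be checked with a short routine. The following is a minimal sketch that assumes the two sequences have already been aligned to equal length with `-` as the gap character, and that identity is counted as matching columns divided by compared columns. The text above does not specify an alignment method or identity convention, so both are assumptions, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, not any SEQ ID NO from the disclosure.
    reference = "ACGTACGTAC"
    variant = "ACGTACGTAA"
    print(percent_identity(reference, variant), meets_threshold(reference, variant))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the convention should be fixed before comparing a variant against a stated threshold.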
- the disclosure provides a cell or a plurality of cells comprising an endogenous nucleic acid sequence encoding an expressible amino acid, said endogenous nucleic acid sequence modified on its amino or carboxy terminus by an expressible exogenous modification element comprising: (i) at least a first and a second nucleic acid sequence that encode a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain; (ii) a first nucleic acid sequence encoding a cleavage site positioned between the amino or carboxy terminus and the first nucleic acid sequence that encodes a selection domain; and (iii) a second nucleic acid sequence encoding a cleavage site positioned between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain.
- the modification element further comprises a nucleic acid sequence encoding a protein tag.
- the protein tag is any amino acid sequence encoded by a sequence chosen from SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62, or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence encoded by SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62.
- the protein tag is any amino acid encoded by sequence identifier chosen from SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID:62, or variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%,
- the plurality of cells comprises one or a combination of compositions according to the first aspect of the invention.
- the cells comprise NB458 cells, 293T cells and/or Jurkat cells.
- at least about 30% of the cells comprise the one or a portion of the modification element in their endogenous DNA from about 7 to about 18 days in culture.
- the cells have a doubling time of about 4 days.
- the cell is the cell line identified in Table Z, comprises any of the nucleotide sequences disclosed herein, and has the doubling time identified in Table Z.
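- To relate the culture windows and doubling times recited above, a minimal sketch is given below. It assumes idealized exponential growth with no cell death or lag phase, which will not hold during antibiotic selection. The function names are hypothetical.

```python
def population_doublings(days_in_culture: float, doubling_time_days: float) -> float:
    """Number of doublings under idealized, uninterrupted exponential growth."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return days_in_culture / doubling_time_days


def fold_expansion(days_in_culture: float, doubling_time_days: float) -> float:
    """Fold increase in cell number, i.e. 2 raised to the number of doublings."""
    return 2.0 ** population_doublings(days_in_culture, doubling_time_days)


if __name__ == "__main__":
    # A doubling time of about 4 days over the recited 7 to 18 day window.
    for days in (7, 18):
        print(days, population_doublings(days, 4.0), round(fold_expansion(days, 4.0), 2))
```

With a doubling time of about 4 days, the 7 to 18 day window corresponds to fewer than five doublings under this idealized model, which gives a rough sense of how much expansion is available before a population is assayed.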
- the disclosure provides a method of culturing a cell comprising: exposing the composition of the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell.
- the plurality of cells comprise at least one NB458 cell and at least one non-cancerous cell.
- the method further comprises exposing the one or plurality of cells to a selection stimulus sensitive to the first or second nucleic acid sequence encoding the selection domain.
- the method further comprises exposing the one or plurality of cells to a selection stimulus sensitive to the first and second nucleic acid sequence encoding the selection domain.
- the selection stimulus is an antibiotic.
- the antibiotic is chosen from one or a combination of antibiotics from Table Y.
- the antibiotic is puromycin.
- the selection stimulus is exposure to light with a wavelength from about 500 nm to about 650 nm.
- the selection stimulus is exposure to light with a wavelength in the range of any of the stimulant wavelengths disclosed in Table Y if the fluorescent protein corresponding to that wavelength is expressed in the cell.
- the method further comprises a step of exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells. In some embodiments, the method further comprises a step of culturing the one or plurality of cells from about 10 to about 14 days. In some embodiments, the method further comprises a step of culturing the one or plurality of cells from about 10 to about 60 days or a sufficient time after exposure of the cell or plurality of cells to a selection agent, such as an antibiotic, for a period of time sufficient to kill cells in the culture that did not become transfected and/or whose endogenous DNA was not modified by the nucleic acid molecule of the disclosure.
- the disclosure provides a method of editing endogenous DNA of one or a plurality of cells comprising exposing the nucleic acid molecule according to the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell.
- the method further comprises exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to exposing the cells to nucleic acid molecule according to the first aspect, such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
- the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence. In some embodiments, the method further comprises culturing the one or plurality of cells for no less than about 7 days.
- In a fifth aspect, the disclosure provides a method of isolating a protein in a cell comprising (a) exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell.
- the method further comprises (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
- the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence.
- the method further comprises culturing the one or plurality of cells for no less than about 7 days.
- the method further comprises allowing the one or plurality of cells to express a protein modified at the target sequence with the modification element, wherein the modification element comprises a nucleic acid encoding a protein tag.
- the method further comprises isolating the protein by exposing the protein tag to one or a plurality of capture elements that associate with or bind to the protein tag.
- the method further comprises isolating or precipitating the capture element.
- the disclosure provides a method of endogenously labeling a protein in a cell comprising (a) exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell.
- the method further comprises (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
- the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence, such that the protein is expressed with the modification element at the target sequence.
- the method further comprises culturing the one or plurality of cells for no less than about 7 days.
- the step or steps of exposing the cell to the nucleic acid molecule and/or a Cas protein is free of exposure of the cells to a viral particle or a viral vector, whether that vector is replication-deficient or attenuated.
- the disclosure provides a method of screening for a therapeutic agent in a cell comprising (a) exposing any one or plurality of cells according to the second aspect to a pathogen.
- the pathogen is chosen from one or a combination of lentiviruses, hepatitis viruses, papillomaviruses, coronaviruses, influenza viruses and rotaviruses.
- the method further comprises exposing the one or plurality of cells to a library of agents.
- the step of exposing is performed in the presence or absence of a viral inhibitor.
- the pathogen is a bacterial cell.
- the pathogen is a fungal cell.
- the one or plurality of cells are human cells.
- Gene editing enzymes of the disclosure are chosen from meganucleases, transposases, and Cas proteins.
- Another aspect of the disclosure relates to a system comprising a CRISPR enzyme (or "Cas protein") or a nucleotide sequence encoding one or more Cas proteins; and the nucleic acid molecules of the disclosure. Any protein capable of enzymatic activity in cooperation with a guide sequence is a Cas protein.
- the disclosure relates to a system comprising a vector comprising a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein from the Cas family of enzymes.
- the disclosure relates to a system or composition comprising any one or plurality of Cas proteins either individually or in combination with one or a plurality of guide sequences.
- Compositions of one or a plurality of Cas proteins may be administered to a cell with any of the disclosed nucleic acid sequences sequentially or contemporaneously.
- Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, type V CRISPR-Cas systems, variants and fragments thereof, or modified versions thereof comprising at least 70% sequence identity to the sequences of Table C.
- the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2.
- the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9.
- the CRISPR enzyme is Cas9, and may be Cas9 from S. pyogenes or S. pneumoniae.
- the CRISPR enzyme directs cleavage of one or both strands of endogenous DNA in the disclosed cell or cells at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence.
- the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
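- The distance criterion recited above can be checked as follows. This is a sketch under assumed conventions: positions are 0-based coordinates on one strand, `target_start` and `target_end` are the first and last nucleotides of the target sequence (inclusive), and the function name is hypothetical.

```python
def cut_within_window(cut_pos: int, target_start: int, target_end: int,
                      window_bp: int) -> bool:
    """True if the cut lies within window_bp of the first or last target nucleotide."""
    if target_end < target_start:
        raise ValueError("target_end must not precede target_start")
    if window_bp < 0:
        raise ValueError("window_bp must be non-negative")
    distance_to_first = abs(cut_pos - target_start)
    distance_to_last = abs(cut_pos - target_end)
    return min(distance_to_first, distance_to_last) <= window_bp


if __name__ == "__main__":
    # Hypothetical 20-nt target occupying positions 100..119.
    for window in (1, 5, 10):
        print(window, cut_within_window(cut_pos=116, target_start=100,
                                        target_end=119, window_bp=window))
```

Measuring from whichever end of the target is nearer mirrors the wording "from the first or last nucleotide of a target sequence"; a strand-aware version would additionally need the orientation of the target.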
- a vector encodes a CRISPR enzyme or Cas protein that is mutated with respect to a corresponding wild-type enzyme, such that the CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence.
- a D10A aspartate-to-alanine substitution in Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand).
- Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A.
- a Cas9 nickase may be used in combination with guide sequence(s), e.g., two guide sequences, which target respectively sense and antisense strands of the DNA target.
- Other mutations may be useful; where the Cas9 or other CRISPR enzyme is from a species other than S. pyogenes, mutations in corresponding amino acids may be made to achieve similar effects.
- the composition of the disclosure comprises an amino acid sequence of at least about 70%, 80%, 90%, 95%, 96%
- NC_020246.1 NC_020246.1 ; NC_018224.1 ; NC_015943.1 ; NC_011138.3; NC_009778.1 ; NC_006834.1 ; NC_014228.1 ; NC_010002.1 ; NC_013892.1 ; NC_010296.1 ; NC_009615.1; NC_012632.1;
- NC_010482.1 NC_009776.1 ; NC_009776.1 ; NC_009033.1; NC_000916.1; NC_018015.1;
- NC_019943.1 NC_016023.1 ; NC_016023.1 ; NC_015416.1; NC_013722.1 ; NC_013722.1 ;
- NC_008599.1 NC_007796.1; NC_007796.1; NC_007796.1; NC_007355.1; NC_021082.1; NC_018001.1; NC_009785.1; NC_022084.1; NC_018092.1; NC_014804.1; NC_014147.1;
- NC_004829.2 NC_015516.1; NC_014374.1; NC_009033.1; NC_007681.1; NC_002689.2;
- NC_011529.1 NC_010482.1 ; NC_009515.1; NC_009440.1 ; NC_008942.1 ; NC_008054.1 ;
- NC_005085.1 NC_009613.3; NC_014334.1; NW_006726754.1 ; NC_002663.1 ; NC_003143.1 ;
- NC_007644.1 NC_017459.1; NC_015416.1; NC_013722.1; NC_007643.1; NC_007643.1;
- NC_017459.1 NC_015518.1; NC_014933.1 ; NC_007426.1 ; NC_003901.1; NC_003106.2;
- NC_021592.1 NC_015931.1; NC_015931.1; NC_010482.1 ; NC_009033.1; NC_000853.1 ;
- NC_007644.1 NC_000917.1; NC_003106.2; NC_011916.1; NC_007643.1; NC_006347.1;
- NC_017844.1 NW_006399893.1 ; NC_002695.1 ; NC_017634.1; NC_003143.1; NC_017941.2; NC_004605.1; NC_004605.1; NC_019411.1 ; NC_007164.1 ; NC_002932.3; NC_005085.1;
- NC_027207.1 NC_016452.1 ; NC_016112.1; NC_009784.1.
- Methods of the disclosure relate to a method of modifying endogenous DNA of a cell by exposing the cell to a first nucleic acid molecule comprising a nucleic acid sequence encoding a Cas protein and a second nucleic acid molecule comprising (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequence; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned in a 5’ to 3’ orientation between the first nucleic acid sequence that encodes a selection domain and
- Methods of the disclosure also relate to a method of manufacturing a cell or a method of labeling endogenous DNA of a cell comprising: (a) exposing the cell to any nucleic acid molecule disclosed herein for a time period sufficient to transfect the cell; (b) exposing the cell to a Cas protein for a time period sufficient to excise endogenous DNA in the cell; and (c) allowing a portion of the nucleic acid molecule to integrate into the endogenous DNA of the cell.
- the cell is an isolated cell.
- the cell is chosen from the cells in Table Z and has a doubling time disclosed in Table Z.
- the method is free of exposing the cell to a viral particle or a viral vector.
- the method is performed by transfection. In some embodiments, the method is performed by transfection such that the nucleic acid sequence of the disclosure (single strand or double stranded DNA) is positioned within the cell; and then the cell is exposed to a gene editing enzyme, such as a Cas protein or a nucleic acid sequence encoding a Cas protein, such that the gene editing enzyme cuts endogenous DNA of the cell and facilitates integration of the DNA positioned between the HDR regions of the disclosed nucleic acid molecules into the target domain.
- the disclosure also relates to a method of altering expression of at least one gene product in a cell comprising introducing into a cell an engineered, non-naturally occurring CRISPR associated (Cas) (CRISPR-Cas) system comprising: (a) a vector comprising a nucleotide sequence encoding any CRISPR enzyme disclosed herein, any mutated CRISPR enzyme having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence homology to any CRISPR enzyme disclosed herein (such as those in Table C), or functional fragment thereof; and (b) a nucleic acid sequence disclosed herein, wherein components (a) and (b) are located on the same or different vectors of the system; wherein the cell comprises endogenous DNA comprising a target domain and encoding a gene product; and wherein the CRISPR enzyme or functional fragment thereof cleaves the endogenous DNA molecule, whereby expression of the at
- the disclosure relates to a composition comprising a cell line comprising one or a plurality of cells disclosed herein.
- those cells comprise any one or combination of cells identified in Table Z and comprise a first and second nucleic acid sequence encoding a selection domain.
- a selection domain is chosen from the amino acid sequences disclosed in Table Y.
- the cell or cells comprise a nucleic acid molecule disclosed herein, a complementary sequence thereof and/or express a target protein with at least one protein tag.
- the protein tag is chosen from the tags mentioned above or from the amino acid sequences disclosed in Table Y.
- the cell or cells express one or a combination of amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y.
- the cell or cell line comprises a mutation in an endogenous DNA wherein a portion of the endogenous DNA encoding a target protein is modified on its 5’ or 3’ end to express the nucleic acid encoding the protein tag on the resulting amino or carboxy end of the encoded target protein.
- the same cell or cells are modified endogenously or transiently to simultaneously express at least one or two
- the expression of the independently regulated selection markers is free of the regulatory sequence operably linked to the target proteins.
- the selection markers of the regulatory sequence operably linked to the target proteins comprise a first nucleic acid sequence that confers antibiotic resistance to the cell or cells and the second nucleic acid sequence encodes expression of a physical protein, such as a fluorescent protein in the cell.
- the protein tag is not the same physical protein marker.
- the physical protein marker is a protein chosen from the amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y; and the protein tag is an amino acid sequence chosen from the Affinity tag sequence section of Table X or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table X.
- the disclosure relates to a composition comprising a nucleic acid molecule encoding one or a plurality of sequences chosen from the amino acid sequences identified in Table X.
- the nucleic acid molecule comprises: one or more protease cleavage sites, affinity tag sequences positioned between one or more homology regions, one or more self-cleavage sequences, one or more mammalian antibiotic selection sequences, at least one multiple cloning site, one or more bacterial antibiotic selection sequences, and one or more origin of replication sequences.
- the nucleic acid sequence comprises, consists of or consists essentially of SEQ ID NO:55 or 69, or variants thereof comprising about 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:55 or 69.
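The identity thresholds recited above (about 70% to 100%) presuppose some way of scoring a candidate variant against a reference sequence. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with `-` marking gaps. The function names and the convention of counting gap columns as mismatches are assumptions made for illustration; the disclosure does not specify an alignment program or scoring convention at this point.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumptions (not from the disclosure): inputs are pre-aligned and of
    equal length, '-' marks a gap, a column with a gap in one sequence
    counts as a mismatch, and columns that are gaps in both are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True when the variant reaches the chosen identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold(reference, variant, 95.0)` would check a variant against the 95% level; other conventions (for instance, ignoring gap columns) would give different percentages for the same pair.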
- nucleic acid molecules suitably cloned by the methods of the present invention may be DNA molecules (including cDNA molecules), RNA molecules (including polyadenylated RNA (polyA+ RNA), messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA) molecules) or DNA-RNA hybrid molecules, and may be single-stranded or double-stranded.
- nucleic acid molecules to be cloned according to the methods of the present invention may be prepared synthetically according to standard organic chemical synthesis methods that will be familiar to one of ordinary skill.
- the nucleic acid molecules may be obtained from natural sources, such as a variety of cells, tissues, organs or organisms.
- Cells that may be used as sources of nucleic acid molecules may be prokaryotic (bacterial cells, including those of species of the genera Escherichia, Bacillus, Serratia, Salmonella, Staphylococcus, Streptococcus, Clostridium, Chlamydia, Neisseria, Treponema, Mycoplasma, Borrelia, Legionella, Pseudomonas, Mycobacterium, Helicobacter, Erwinia, Agrobacterium, Rhizobium, and Streptomyces) or eukaryotic (including fungi (especially yeasts), plants, protozoans and other parasites, and animals including insects (particularly Drosophila spp. cells), nematodes (particularly Caenorhabditis elegans cells), and mammals (particularly human, rodent (rat or mouse), monkey, ape, canine, feline, equine, bovine and ovine cells, and most particularly human cells)).
- Mammalian somatic cells that may be used as sources of nucleic acids include blood cells (reticulocytes and leukocytes), endothelial cells, epithelial cells, neuronal cells (from the central or peripheral nervous systems), muscle cells (including myocytes and myoblasts from skeletal, smooth or cardiac muscle), connective tissue cells (including fibroblasts, adipocytes, chondrocytes, chondroblasts, osteocytes and osteoblasts) and other stromal cells (e.g., macrophages, dendritic cells, Schwann cells).
- Mammalian germ cells may also be used as sources of nucleic acids for use in the invention, as may the progenitors, precursors and stem cells that give rise to the above somatic and germ cells (e.g., embryonic stem cells).
- nucleic acid sources are mammalian tissues or organs such as those derived from brain, kidney, liver, pancreas, blood, bone marrow, muscle, nervous, skin, genitourinary, circulatory, lymphoid, gastrointestinal and connective tissue sources, as well as those derived from a mammalian (including human) embryo or fetus.
- prokaryotic or eukaryotic cells, tissues and organs may be normal, diseased, transformed, established, progenitors, precursors, fetal or embryonic.
- Diseased cells may, for example, include those involved in infectious diseases (caused by bacteria, fungi or yeast, viruses (including HIV) or parasites), in genetic or biochemical pathologies (e.g., cystic fibrosis, hemophilia, Alzheimer's disease, muscular dystrophy or multiple sclerosis) or in cancerous processes.
- Transformed or established animal cell lines may include, for example, COS cells, CHO cells, VERO cells, BHK cells, HeLa cells, HepG2 cells, K562 cells, F9 cells and the like.
- nucleic acid molecules and cDNA libraries may be obtained commercially, for example from Life Technologies, Inc. (Rockville, Maryland) and other commercial suppliers that will be familiar to the skilled artisan.
- nucleic acid molecules to be cloned are amplified nucleic acid molecules.
- Nucleic acid molecules may be amplified by a number of methods, which may comprise one or more steps. For example, one such method comprises
- amplification methods may be accomplished by any of a variety of techniques, including but not limited to use of the polymerase chain reaction (PCR; U.S. Patent Nos. 4,683,195 and 4,683,202), Strand Displacement Amplification (SDA; U.S. Patent No. 5,455,166), and Nucleic Acid Sequence-Based Amplification (NASBA; U.S. Patent No. 5,409,818).
- the method of manufacturing or preparing the disclosed nucleic acid molecule comprises performing PCR to clone individual components of the nucleic acid molecule into the nucleic acid molecule, such as SEQ ID NO:69 or functional variants thereof comprising at least about 75% sequence identity to SEQ ID NO:69.
- nucleic acid molecules to be cloned by the methods of the invention may be isolated by methods that are well-known in the art.
- the same cell or cells are modified endogenously or transiently to simultaneously express at least one or two different selection markers independent of the tag, such that expression of the target protein is not dependent or regulated by expression of the selection markers.
- the nucleic acid molecule encodes any amino acid sequence identified before in Table X or variants that comprise about 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids of Table X.
- Some embodiments of the disclosure also include a cell comprising the disclosed nucleic acid molecule.
- the cell or plurality of cells comprises any cell line identified in Table Z.
- Antibiotic resistance (Ampicillin) (SEQ ID NO:58)
- P2A Self-cleavage sequence (SEQ ID NO:63)
- Example 1 Plasmid YifanCheng (pYC) generation, homology directed repair (HDR) arm design, and single-guide RNA (sgRNA) generation
- the backbone of pYC was derived from a pEG plasmid with plasmid amplification and antibiotic resistance.
- the tag components, purification tags and selection markers, were synthesized by Integrated DNA Technologies and subcloned by PCR.
- the generated pYC plasmids were then amplified using DH5α competent E. coli cells.
- the designed multiple cloning sites were validated by double-restriction digestion and subsequent agarose gel analysis. The pYC sequences were confirmed (Elim BioPharma).
- the left and right HDR arms were designed based on GRCh38 (hg38, Homo sapiens) in Benchling (https://benchling.com/) and the NC_000012.12 Chromosome 12 Reference GRCh38.p14 Primary Assembly in the NCBI database (https://www.ncbi.nlm.nih.gov/gene). To achieve optimal knock-in and cost-efficiency, the length of the arms was maintained within a range of 500 - 990 bp, including sticky and blunt enzyme digestion sites. Both arms were subcloned into the pYC plasmid by PCR or by the digestion-ligation method.
- the HDR donor was first amplified by PCR using a pair of primers (Elim BioPharma) in which the 5’ end of the 3’ primer is modified by addition of a phosphoryl group.
- the PCR product was then incubated with 1 µL of Lambda exonuclease (NEB), which recognizes this phosphoryl group on the 5’ end of DNA and degrades one strand to leave behind ssDNA, at 37°C for 0.5 - 1 hour. Due to the lack of DNA secondary structure, SYBR Gold (ThermoFisher) was applied for the visualization.
- the digested samples were then purified with a DNA purification kit (Zymo Research) and diluted to the designated concentration.
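The arm-length window and the strand selection described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the 500 - 990 bp window is taken from the text, and the strand logic assumes that the exonuclease digests only the strand carrying the 5’ phosphate (the strand extended from the phosphorylated 3’ primer), leaving the other strand as the ssDNA donor.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_length_ok(arm: str, min_bp: int = 500, max_bp: int = 990) -> bool:
    """Check a homology arm against the 500 - 990 bp window from the text."""
    return min_bp <= len(arm) <= max_bp


def surviving_strand(top_strand: str, phosphorylated_primer: str = "reverse") -> str:
    """Return the strand (5'->3') expected to remain after digestion.

    Assumption: the strand built from the 5'-phosphorylated primer is
    degraded. With the reverse (3') primer phosphorylated, as in the
    text, the bottom strand is removed and the top strand survives.
    """
    if phosphorylated_primer == "reverse":
        return top_strand
    if phosphorylated_primer == "forward":
        return reverse_complement(top_strand)
    raise ValueError("phosphorylated_primer must be 'forward' or 'reverse'")
```

Swapping which primer carries the phosphate would yield the complementary ssDNA donor, which is why the choice of phosphorylated primer fixes the polarity of the final donor.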
- the sgRNAs were designed using GRCh38 (hg38, Homo sapiens) in Benchling software (https://benchling.com/). The higher-ranked sgRNA fragments were synthesized by Elim BioPharma and each subcloned into px458 as previously described (Ran, 2013).
- a pair of primers encoding the sgRNA and carrying the designed BbsI site (NEB)
- the annealing product was ligated into pre-linearized px458 using T4 ligase (NEB).
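The oligo pair that is annealed and ligated into BbsI-linearized px458 can be sketched as follows. The text does not give the overhang sequences; the `CACC` and `AAAC` 5’ overhangs and the optional leading `G` used here follow common practice for BbsI cloning into px458-type vectors and should be read as assumptions, as should the function names.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sgrna_oligos(guide: str) -> Tuple[str, str]:
    """Return (top, bottom) oligos, both written 5'->3'.

    Assumptions (not stated in the disclosure): a 20-nt guide, a G
    prepended when the guide does not already start with G, and
    CACC / AAAC overhangs matching the BbsI-cut vector.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("expected a 20-nt DNA guide sequence")
    insert = guide if guide.startswith("G") else "G" + guide
    top = "CACC" + insert
    bottom = "AAAC" + reverse_complement(insert)
    return top, bottom
```

After annealing, the two oligos pair over the guide region and leave the four-base 5’ overhangs single-stranded for ligation.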
- HEK293T cells were grown in DMEM media (Gibco) containing 10% fetal bovine serum (FBS) (v/v).
- cells were transfected with 300 ng of px458s and 500 ng of HDRs (plasmid, dsDNA, or ssDNA) using Lipofectamine 2000 (ThermoFisher). Two days later, 1 - 2 µg/mL of puromycin (ThermoFisher) was supplied. At 10 - 14 days post-selection, successful knock-in cells were confirmed by their dim GFP fluorescence, western blot, and cDNA sequencing.
- the cells were transferred to suspension cultures in Freestyle293 media (Gibco) with 1% FBS (v/v).
- the suspended cells were grown at 37°C with 5% CO2, shaking at 120 rpm.
- Puromycin was added to suspension cultures of up to 100 mL; however, cells were grown in the absence of puromycin in the 400 - 800 mL cultures used for protein purification.
- Jurkat cells were grown in in-house modified RPMI 1640 medium (Gibco) supplemented with 10% FBS (v/v), 100 U/mL of penicillin-streptomycin, 2 mM L-glutamine, 10 mM HEPES pH 7.4 and 1 mM sodium pyruvate.
- the same knock-in materials used for HEK293T cells (300 ng of px458s and 500 ng of HDR) were delivered to the cells using Lipofectamine 2000 (ThermoFisher).
- 1 µg/mL of puromycin was introduced for the knock-in selection, typically for 10 - 14 days.
- the dead cells were removed by using
- HEK293T and Jurkat cells were resuspended in the corresponding fresh media without puromycin and spun down at 1000g for 5 minutes to reach a density of 1 × 10⁷ cells/mL per vial.
- Cell pellets were resuspended in freezing media mixture (50% fresh media, 40% conditioned media and 10% DMSO (Sigma)) and transferred to cryogenic vials (Corning). The vials were slowly frozen in a Mr. Frosty container at -80°C for 8 - 12 hours and transferred to a liquid nitrogen dewar for long-term storage.
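The freezing step above reduces to simple proportional arithmetic. The sketch below assumes the stated target density applies to the final freezing mixture and that percentages are by volume; the function name and dictionary keys are hypothetical.

```python
def freezing_mix(total_cells: float, target_density: float = 1e7) -> dict:
    """Volumes (mL) of freezing mixture components for a cell harvest.

    Assumptions: target_density is cells per mL of final mixture
    (1 x 10^7 in the text), and the 50/40/10 split of fresh media,
    conditioned media and DMSO is by volume.
    """
    if total_cells <= 0 or target_density <= 0:
        raise ValueError("cell count and density must be positive")
    total_ml = total_cells / target_density
    return {
        "total_mL": total_ml,
        "fresh_media_mL": 0.50 * total_ml,
        "conditioned_media_mL": 0.40 * total_ml,
        "DMSO_mL": 0.10 * total_ml,
    }
```

A harvest of 5 × 10⁷ cells would thus call for 5 mL of mixture, of which 0.5 mL is DMSO.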
- the cell pellets were resuspended in 180 µL of ice-cold TBS buffer (20 mM Tris-HCl, pH 8.0, 150 mM NaCl), and then combined with 20 µL of 100 mM n-dodecyl-β-D-maltoside (DDM) and 20 mM cholesteryl hemisuccinate (CHS) to lyse the cells. The mixtures were rotated at 4°C for 45 minutes and 4xSDS-loading buffer was applied (Bio-Rad).
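The working detergent concentrations in the lysate follow from the dilution relation C1·V1 = C2·V2. The sketch below assumes the 20 µL addition is a single stock containing both 100 mM DDM and 20 mM CHS and ignores the volume of the cell pellet itself; the function name is hypothetical.

```python
def final_concentration(stock_mM: float, stock_uL: float, diluent_uL: float) -> float:
    """Concentration after mixing a stock into a diluent (C1*V1 = C2*V2)."""
    total_uL = stock_uL + diluent_uL
    if total_uL <= 0:
        raise ValueError("total volume must be positive")
    return stock_mM * stock_uL / total_uL


# 20 uL of stock added to 180 uL of TBS gives a 1:10 dilution.
ddm_final_mM = final_concentration(100.0, 20.0, 180.0)  # 10 mM DDM
chs_final_mM = final_concentration(20.0, 20.0, 180.0)   # 2 mM CHS
```

Under these assumptions the lysis mixture contains 10 mM DDM and 2 mM CHS in a 200 µL volume.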
- the samples were vortexed, boiled at 90°C, and spun down at 20,000g for 1 minute at 4°C using a bench-top centrifuge (Eppendorf 5810R) before being subjected to SDS-PAGE (Bio-Rad, 4 - 15% gradient gel).
- the proteins on SDS-PAGE were transferred to a 0.2 µm nitrocellulose membrane (Bio-Rad) using a Trans-Blot Turbo (Bio-Rad).
- the samples were immunoblotted using Anti-FLAG-peroxidase (HRP) (Sigma-Aldrich, A8592) to validate its size.
- the peroxidase signals were developed with SuperSignal (ThermoFisher) and imaged with a ChemiDoc MP (Bio-Rad).
- the knock-in cells were seeded into 8-well chambered #1.5 coverglasses (C8-1.5P, Cellvis) coated with poly-L-lysine (Sigma-Aldrich) or fibronectin (Corning).
- an LED light source (X-Cite XLED1, Excelitas) operated at excitation power densities of 0.7 W/cm² at 488 nm and 1.6 W/cm² at 640 nm at the sample plane was used for excitation of mNeonGreen and SPY650-DNA.
- an sCMOS camera (Orca Flash 4.0, Hamamatsu) with a back-projected pixel size of 108 was used to detect brightfield and fluorescence signals. The sample was maintained at 37 °C and 5% CO2 using a stage-top incubation chamber with environmental control unit (Tokai Hit).
- Emitted fluorescence was separated from excitation light using a 405/488/561/640 nm quadband dichroic, and the mNeonGreen and SPY650-DNA signals were further filtered using 525/50 and 700/75 nm bandpass filters. All microscope components were controlled using the Micro-Manager 1.4 software platform (Edelstein, 2014).
- The total RNA was extracted using the Monarch Total RNA Miniprep Kit (NEB) following the manufacturer's protocol. The concentration of extracted total RNA was measured with a NanoPhotometer NP80 (Implen), and 1 µg of total RNA was subjected to a reverse transcription reaction to generate cDNA using LunaScript RT SuperMix (NEB).
- telomere sequence was confirmed by Elim BioPharma.
- From HEK293T cells, the full-length cDNAs of ACTB, GAPDH, TKT, VIM, and PCNA were validated. The cDNAs of ACTB and PCNA from the Jurkat cells were also obtained.
- Full-length cDNA could not be obtained for FASN from the HEK293T knock-in sample or for FASN, GAPDH, and TKT from the knock-in Jurkat cells, but biochemical validation, including western blot and mass spectrometry data together with structural information, showed successful tagging.
- the cell debris was then discarded by two follow-up centrifugation steps, one at 8000g for 20 minutes at 4°C using the rotor JA-25.50 (Beckman Coulter), and then at 126,000g for 1 hour at 4°C using the rotor Ti45 or 50.2Ti (Beckman Coulter), sequentially.
- the final supernatant was applied to 1 mL of pre-equilibrated anti-FLAG M2 affinity gel (Sigma-Aldrich) and incubated for between 2 hours and overnight.
- the beads were then loaded into a Poly-Prep column (Bio-Rad) and extensively washed with 50 column volumes of 500 mM NaCl and 20 mM Tris-HCl pH 8.0, followed by 50 column volumes of TBS buffer.
- pelleted cells were initially resuspended in ice-cold 250 mM sucrose, 5 mM MgCl2, and 10 mM HEPES pH 7.4 (Fractionation Buffer) supplemented with protease inhibitor cocktail.
- the cells were mechanically homogenized with 10 - 20 up-and-down strokes in a tissue grinder (Wheaton).
- the homogenized samples were then spun down at 600g for 10 minutes at 4°C in the bench-top centrifuge (Eppendorf 5810R).
- the pellet mainly contains larger components, like unbroken cells and nuclei, while the supernatant contains lighter cellular fractions, like cytoplasm or endoplasmic reticulum.
- the resulting cytoplasmic and nuclear fractions were then purified according to the procedures described above.
- the cell pellets were harvested and resuspended in 180 µL of ice-cold TBS buffer, and then combined with 20 µL of 100 mM DDM and 20 mM CHS to lyse the cells. The mixtures were rotated at 4°C for 45 minutes, followed by a spin at 21,000g for 20 minutes at 4°C. The sample was injected onto a Superdex 200 Increase 10/300 GL column (Cytiva), pre-equilibrated with TBS buffer, at a flow rate of 0.5 mL/min. The GFP signal was collected by HPLC (Shimadzu) equipped with an RF-20A fluorescence detector (Shimadzu; excitation 488 nm, emission 508 nm).
- Example 6 Fluorescence size-exclusion chromatography (FSEC) and fluorescence light microscope
- the supernatant was incubated with 1 µL of ALFANB-eGFP, 0.1 mg/mL stock, for 2 hours at 4 °C, followed by injection onto a Superdex 200 Increase 10/300 GL column (Cytiva), pre-equilibrated with TBS buffer, at a flow rate of 0.5 mL/min.
- the GFP signal was collected by HPLC (Shimadzu) equipped with an RF-20A fluorescence detector (Shimadzu; excitation 488 nm, emission 508 nm).
- the proteins were transferred to a 0.2 µm nitrocellulose membrane (Bio-Rad) using a Trans-Blot Turbo (Bio-Rad).
- the samples were immunoblotted using the corresponding antibodies: Anti-FLAG-peroxidase (HRP) (Sigma-Aldrich, A8592) to label the GAPDH, anti-FASN-HRP (Abcam, EPR7466) to monitor the cytoplasmic fraction, and anti-H2A (BioVision, cat. 3621-100) with its secondary antibody rabbit-HRP (Bio-Rad, cat. 170-6516) to confirm the nuclear fraction.
- the peroxidase signals were developed with SuperSignal (ThermoFisher) and imaged with a ChemiDoc MP (Bio-Rad).
- GAPDH in the cytosolic compartment was also assessed for its activity. GAPDH levels were estimated by immunoblotting to keep amounts constant during the assay. Equal quantities of protein from the different oxidative stress conditions (none, 8 hours, and 24 hours) were added to Greiner 96-well flat transparent plates, and GAPDH enzymatic activity was detected with a GAPDH activity assay kit (Abcam, ab204732). Endogenous GAPDH converts glyceraldehyde-3-phosphate into 1,3-bisphosphoglycerate while reducing nicotinamide adenine dinucleotide (NAD+) to NADH.
- the chemicals provided in the kit react with the products and generate a color change, allowing a colorimetric assay read as absorbance at 450 nm (OD450).
- Using a SPARK 10M plate reader (Tecan), the samples were incubated at 37°C and agitated every 5 seconds, while OD450 was measured every minute for 30 minutes.
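A kinetic read of this kind is typically summarized as the rate of change of OD450 over time. The sketch below fits a least-squares line to evenly spaced readings and returns the slope. It is an illustration only: the function name is hypothetical, the one-minute spacing comes from the text, and converting the slope to enzyme units would require the kit's own standard curve, which is not reproduced here.

```python
from typing import Sequence


def od_rate_per_min(od_readings: Sequence[float], interval_min: float = 1.0) -> float:
    """Least-squares slope of OD450 versus time, in OD units per minute.

    Assumption: readings are evenly spaced, starting at time zero,
    and fall within the linear phase of the reaction.
    """
    n = len(od_readings)
    if n < 2:
        raise ValueError("at least two readings are required")
    times = [i * interval_min for i in range(n)]
    mean_t = sum(times) / n
    mean_od = sum(od_readings) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (od - mean_od) for t, od in zip(times, od_readings))
    return sxy / sxx
```

Comparing the slopes obtained for the untreated, 8-hour and 24-hour samples would then give a relative measure of activity under each condition, provided equal protein was loaded.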
- Example 8 De-lipidation of endogenous GAPDH and membrane lipid strip assay
- the beads were washed with 200 column volumes of TBS buffer to remove detergent, and GAPDH was eluted with 3 column volumes of TBS buffer supplemented with 0.25 mg/mL 3×FLAG peptide.
- the elution was further incubated with 50 mg of Bio-Beads SM2 (Bio-Rad) at 4°C for 5 hours to remove any remaining detergent micelles.
- the sample was then concentrated using a 50 kDa MWCO filter and run through a Superdex 200 Increase 10/300 GL column equilibrated with TBS buffer.
- the de-lipidated GAPDH peaks were pooled and used for lipid strip assays.
- the membrane lipid strip P-6002 (Echelon Bioscience) was used to identify the lipid(s) that interact with GAPDH.
- the membrane lipid strip was first blocked with TBST buffer (25mM Tris-HCl, pH 7.2, 150mM NaCl, 0.1% Tween-20 (v/v)) with 5% (w/v) milk for 1 hour at room temperature and then washed three times with TBST buffer for 10 minutes each round.
- the strip was then gently agitated in 10 mL containing 3 - 8 µg of purified GAPDH for 1 hour at room temperature.
- Conventional immunoblotting was used to visualize lipid-bound endogenous GAPDH using an anti-FLAG antibody further developed with SuperSignal and imaged with ChemiDoc MP.
- Example 10 Cryo-EM sample preparation and data collection
- the grids were screened on Talos Arctica and Glacios electron microscopes (ThermoFisher-FEI), operated at 200 kV and equipped with a Gatan K3 camera (Gatan, Inc.). Movies were acquired when suitable grids were identified. For the GAPDH sample, the final high-resolution dataset was acquired using the Titan Krios electron microscope operated at 300 kV and equipped with a Gatan K3 camera and BioQuantum energy filter. The energy selection slit was set to 20 eV. Movies were recorded in super-resolution mode at a nominal magnification of 105K, resulting in a super-resolution pixel size of 0.4175 Å/pixel (0.835 Å/pixel after 2x FT binning).
- Each movie stack was dose-fractionated into 80 frames using a total exposure time of 2 s at 0.025 seconds per frame. The total dose was 45.8 e⁻/Å². All image stacks were collected using SerialEM (Mastronarde, 2005). Defocus values varied from -0.8 to -1.5 µm.
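The acquisition parameters above are internally consistent, which a short calculation makes explicit. The function and key names below are hypothetical; the inputs are the values stated in the text.

```python
def exposure_summary(super_res_pixel_A: float, bin_factor: int,
                     n_frames: int, frame_time_s: float,
                     total_dose_e_per_A2: float) -> dict:
    """Derived quantities for a dose-fractionated movie.

    Inputs come from the data-collection description; the derived
    values are plain arithmetic, not additional measurements.
    """
    total_exposure_s = n_frames * frame_time_s
    return {
        "binned_pixel_A": super_res_pixel_A * bin_factor,
        "total_exposure_s": total_exposure_s,
        "dose_per_frame_e_per_A2": total_dose_e_per_A2 / n_frames,
        "dose_rate_e_per_A2_per_s": total_dose_e_per_A2 / total_exposure_s,
    }


summary = exposure_summary(0.4175, 2, 80, 0.025, 45.8)
```

With these inputs the binned pixel size is 0.835 Å, the 80 frames span 2 s, and each frame carries about 0.57 e⁻/Å² (22.9 e⁻/Å² per second).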
- FIG. 19 illustrates the cBAF complex.
- FIG. 20 illustrates a western blot analysis for DPF2.
- FIG. 21 illustrates protein purification results from a 2L culture of HEK293 cells with the 2xStrep-ALFA-GFP11-3xFLAG.
- FIG. 22 illustrates the mass spectrum of the purification. The data demonstrate that the cBAF complex was tagged.
- FIG. 23 illustrates a negative stain.
- FIG. 24 illustrates a western blot showing 2xStrep-ALFA-GFP11-3xFLAG-SNF2h (~133 kDa) (tags on the N-terminus of SNF2h). 2xStrep-ALFA-GFP11-3xFLAG was endogenously placed at the N-terminus of SNF2h (SMARCA5).
- Example 15 Tagging Exportin-1 (gene name CRM1 or XPO1) in HEK293 Cells
- FIG. 26 illustrates a western blot demonstrating success. Similar protocols as in the above examples were utilized. In brief, the same sgRNA sequence-containing px458 and homology arms from the pYC system were transiently transfected into HeLa cells using Lipofectamine 3000. Selection was carried out by supplementing with puromycin. A western blot was performed to confirm the knock-in result. The FASN knock-in HeLa cells present a 250 kDa anti-FLAG band, whereas wild-type HeLa cells do not. The anti-GAPDH western blot is used as a loading control.
Abstract
The disclosure relates to a series of multicomponent HDR donor plasmids and methods of using the same to stably transfect cells with a target protein fused to a selection domain. Cells comprising nucleic acid molecules expressing the target protein do so by modifying the endogenous DNA of the cell with a portion of the nucleic acid molecule.
Description
COMPOSITIONS FOR AND METHODS OF TAGGING ENDOGENOUSLY EXPRESSED PROTEINS
FIELD OF THE INVENTION
0001 The invention relates generally to compositions comprising homology directed repair (HDR) donor templates for tagging endogenous proteins by gene editing enzymes, such as CRISPR/Cas, for target protein purification from a biological sample containing heterogenous nucleic acid species.
CROSS-REFERENCE TO RELATED APPLICATIONS
0002 This Application claims the benefit of U.S. Application No. 63/509,021, filed on June 19, 2023, the contents of which are hereby incorporated by reference in their entirety.
SEQUENCE LISTING
0003 The contents of the electronic sequence listing attached herewith (UCAL-032-PCT_SL.xml; size: 954,220 bytes; and date of creation: June 19, 2024) are herein incorporated by reference in their entirety.
BACKGROUND
0004 Modern structural biology has benefitted from recombinant technology that allows overexpression of target genes from plasmids with a strong promoter within a heterologous expression system. Such recombinant technology has enabled efficient production of high-quality samples with protein quantities sufficient for structural characterization. It also enables efficient introduction and assay of various modifications to the target proteins, such as point mutations or domain deletions, for dissecting protein functions. However, producing functional large, dynamic, multi-subunit protein complexes that are folded and assembled properly remains challenging and often requires significant trial-and-error effort.
0005 Endogenous protein complexes are often properly assembled in their native environment and offer certain advantages for structural and functional characterization. With new nanotechnologies, such as single particle cryogenic electron microscopy (cryo-EM), the requirements for both quantity and purity of any target sample needed for structural analysis are much less stringent than before. Taking advantage of such technological advancements, capturing endogenous proteins for routine structural characterization has become not only feasible but also advantageous, as it offers opportunities to capture endogenous protein complexes at various stages during their assembly, and to determine changes under different cellular environments, such as under various stresses.
0006 A major challenge in studying mammalian endogenous proteins is their purification, which often requires sophisticated approaches that are designed and optimized for individual targets. When tagging endogenous proteins genetically with CRISPR/Cas, the choice of homology directed repair (HDR) donors plays a key role. Currently, there are three forms of HDR donors that are commonly used for knock-in approaches, i.e., plasmid, double-stranded and single-stranded DNA (dsDNA and ssDNA). Among them ssDNA is considered a promising HDR donor form, due to its higher efficiency and lower off-target rates. However, there is substantial variation from gene to gene in the efficiency of the different HDR donors.
SUMMARY OF ILLUSTRATIVE EMBODIMENTS
0007 This disclosure provides compositions that can provide all three HDR donor types from a single template and methods of their use for purifying proteins in their native states under different conditions from a biological sample, such as a cell culture sample containing heterogeneous nucleic acids.
0008 The present disclosure relates to a versatile strategy of tagging endogenous proteins, using a multicomponent HDR donor template that can be easily converted to all three HDR donor forms for parallelized knock-in experiments. This strategy provides additional structural and mechanistic insights that are not revealed by using recombinant proteins. While this disclosure focuses on structural characterization of endogenous proteins, it will be understood by those skilled in the art that the same strategy can be applied to other types of cell lines and used beyond structural biology.
0009 To facilitate efficient tagging by CRISPR/Cas9, the present inventors generated a set of plasmids, with different variations, each containing a DNA fragment to be knocked in to the targeted locus, flanked by two multiple cloning sites (MCS) for insertion of left and right homology arms (L- and R-arms) flanking the targeted locus. The plasmids also contain a bacterial origin of replication and a selectable antibiotic resistance marker for efficient amplification in bacterial cells. The inserted DNA fragment between the two homology arms contains one or a plurality of affinity tags, whose choice can vary to match specific experimental goals, and one or a plurality of selectable markers for use in mammalian cells. A self-cleaving peptide is inserted between the affinity tag and the selection marker, and between the two selection markers. The two MCSs in the plasmid can be used as cleavage sites to convert the plasmid into a dsDNA HDR donor. With a pair of primers, in which the 3’ primer contains a phosphorylation modification at the 5’ end, and the use of exonuclease digestion, a ssDNA HDR donor can be generated from the dsDNA form. Thus, all three forms of HDR donor can be generated without having to make multiple constructs and can be used to tag any target endogenous protein in parallel.
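As a purely illustrative aid, the conversion of one template into the three donor forms can be sketched in Python as a toy string model. The function names, the recognition-site strings and the example sequence are hypothetical and are not taken from this disclosure. The sketch assumes that cleavage removes both recognition sites entirely and that the strand made from the 5’-phosphorylated primer is the strand removed by the exonuclease.

```python
# Toy model: one template yields plasmid, dsDNA and ssDNA donor forms.
# All names and sequences here are hypothetical placeholders.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def excise_between_sites(plasmid, site_left, site_right):
    """Return the top-strand insert lying between the two cleavage sites.

    Simplification: the cut is modeled as dropping both recognition sites;
    real endonucleases cut within or near their recognition sites.
    """
    start = plasmid.index(site_left) + len(site_left)
    end = plasmid.index(site_right, start)
    return plasmid[start:end]


def donor_forms(plasmid, site_left, site_right):
    """Derive the three donor forms from a single plasmid sequence."""
    top = excise_between_sites(plasmid, site_left, site_right)
    bottom = reverse_complement(top)
    return {
        "plasmid": plasmid,      # circular donor, used as is
        "dsDNA": (top, bottom),  # linear duplex after cleavage at both sites
        # Assumption: the bottom strand carries the 5' phosphate and is the
        # strand digested, so the top strand remains as the ssDNA donor.
        "ssDNA": top,
    }
```

In this simplification a single sequence is the only input, which mirrors the stated goal of not having to build separate constructs for each donor form.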
00010 Once inserted at the target locus, this design produces three proteins: the target protein with the affinity tags, and two selection markers separated from the targeted protein. Genome-edited cells can be selected in two steps, i.e., antibiotic treatment followed by fluorescence activated cell sorting (FACS). In practice, a majority of un-edited cells are removed by antibiotic treatment. At this point, the knock-in result can be checked by western blot, sequencing, and/or fluorescence microscopy. The enriched population of genome-edited cells makes the final sorting by FACS more efficient or even unnecessary. A major advantage of this approach is that the success rate of generating genome-edited cell lines is less dependent on the initial efficiency of gene editing enzymes, such as CRISPR/Cas9 knock-in.
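The production of three separate proteins from one open reading frame can be illustrated with a minimal Python sketch. The function name and the toy sequences in any usage are hypothetical. The motif shown is the commonly cited P2A peptide and is used here as an assumption, and the break is modeled as occurring before the final proline of the motif.

```python
# Sketch: split one translated open reading frame at 2A self-cleaving motifs.
# The motif below is the commonly cited P2A peptide (an assumption here).

P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(polyprotein, motif=P2A):
    """Return the separate products obtained at each 2A motif.

    The upstream product keeps the motif up to its last glycine, and the
    downstream product starts with the final proline of the motif.
    """
    products = []
    start = 0
    while True:
        hit = polyprotein.find(motif, start)
        if hit == -1:
            break
        cut = hit + len(motif) - 1  # index of the final proline
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products
```

With two motifs in the reading frame the function returns three products, corresponding to the tagged target protein and the two selection markers.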
00011 In a first aspect, the disclosure provides a composition comprising a nucleic acid molecule comprising: (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequence; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned in a 5’ to 3’ orientation between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain; (v) a first and a second multiple cloning site, wherein the first multiple cloning site is positioned upstream of the first HDR region and the second multiple cloning site is positioned downstream of the second HDR region.
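The relative 5’ to 3’ ordering recited in elements (i) to (v) can be expressed, as a non-limiting sketch, by a small Python check. The element labels and the function name are hypothetical shorthand introduced only for this illustration. Additional elements such as affinity tags are allowed between the required ones.

```python
# Sketch: check that a list of element labels (5'->3') contains the required
# elements of the first aspect in the required relative order.
# The labels are hypothetical shorthand, not terms used in the claims.

REQUIRED_ORDER = [
    "MCS_1",        # first multiple cloning site, upstream of HDR_1
    "HDR_1",        # first homology region
    "cleavage_1",   # cleavage site between HDR_1 and selection_1
    "selection_1",  # first selection domain
    "cleavage_2",   # cleavage site between the two selection domains
    "selection_2",  # second selection domain
    "HDR_2",        # second homology region
    "MCS_2",        # second multiple cloning site, downstream of HDR_2
]


def follows_first_aspect(elements):
    """Return True if REQUIRED_ORDER is an ordered subsequence of elements."""
    remaining = iter(elements)
    # Each membership test consumes the iterator up to the matching element,
    # so later required elements must appear after earlier ones.
    return all(name in remaining for name in REQUIRED_ORDER)
```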
00012 In a second aspect, the disclosure provides a cell comprising an endogenous nucleic acid sequence encoding an expressible amino acid, said endogenous nucleic acid sequence modified on its amino or carboxy terminus by an expressible exogenous modification element comprising: (i) at least a first and a second nucleic acid sequence that encode a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain; (ii) a first nucleic acid sequence encoding a cleavage site positioned between the amino or carboxy terminus and the first nucleic acid sequence that encodes a selection domain; and (iii) a second nucleic acid sequence encoding a cleavage site positioned between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain.
00013 In a third aspect, the disclosure also provides a method of culturing a cell comprising: exposing the composition of the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell.
00014 In a fourth aspect, the disclosure provides a method of editing endogenous DNA of one or a plurality of cells comprising exposing the nucleic acid molecule according to the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell.
00015 In a fifth aspect, the disclosure provides a method of isolating a protein in a cell comprising exposing the nucleic acid molecule of the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell.
00016 In a sixth aspect, the disclosure provides a method of endogenously labeling a protein in a cell comprising exposing the nucleic acid molecule of the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell.
00017 In a seventh aspect, the disclosure provides a method of screening for a therapeutic agent in a cell comprising exposing any one or plurality of cells according to the second aspect to a pathogen.
00018 The disclosure also relates to a method of imaging a cell comprising exposing the nucleic acid molecule according to the first aspect to one or a plurality of cells for a period of time sufficient to transfect the cell such that at least one target gene is transfected into endogenous DNA of the cell, exposing the cell to microscopy, and stimulating the cell with light at a frequency sufficient to induce fluorescence of the endogenously expressed protein encoded by the nucleic acid molecule if the endogenously expressed protein comprises a fluorescent tag or label; and, optionally, inducing expression of the endogenously expressed nucleic acid sequence prior to stimulating the cell with light.
00019 The disclosure also relates to a method of manufacturing a single-stranded DNA and a double-stranded DNA simultaneously from a nucleic acid molecule disclosed herein, the method comprising exposing a nucleic acid molecule disclosed herein (e.g. a multicomponent HDR donor template) to an endonuclease specific for the multiple cloning site of the nucleic acid molecule, forming a cleaved DNA molecule encoding the target protein, exposing the resulting cleaved DNA to a pair of primers specific for the 3’ and 5’ ends of the endonuclease recognition site, and exposing the cleaved DNA to an exonuclease digestion to generate a population of linear single-stranded DNA comprising the nucleic acid sequence encoding the target protein and a linear double-stranded DNA comprising the nucleic acid sequence encoding the target protein.
BRIEF DESCRIPTION OF DRAWINGS
00020 FIGS. 1A to 1D show HDR donor template design and an endogenous protein tagging scheme. FIG. 1A shows a schematic of pYC plasmid design. All elements are colored and labeled. In addition to the changeable HDR, which contains the changeable tags and selection markers (blue) flanked by the corresponding left and right homology arms (yellow), the plasmid contains a ColE1 origin of replication (red) and an ampicillin resistance gene (grey) for routine amplification in bacteria. FIG. 1B shows a changeable HDR to tag the N-terminus (above) or C-terminus (below) of a target protein (middle). The affinity tags and selection markers (antibiotic selection and fluorescence) are integrated immediately after the start codon (for N-terminus tagging) or before the stop codon (for C-terminus tagging). Two P2A self-cleavage sites (red triangle) are located between two reporter markers and between the gene of interest and the nearby selection marker. FIG. 1C shows a workflow of converting a plasmid into dsDNA and ssDNA HDR donors. FIG. 1D shows agarose gels with SYBR Gold showing the size of DNA corresponding to all forms of HDR donors.
00021 FIGS. 2A to 2G show tagging of endogenous proteins in HEK293T and Jurkat cells. FIG. 2A shows fluorescence microscopy images (left) and anti-FLAG western blot (right) of cells after puromycin treatment. Six genes, ACTB, FASN, GAPDH, TKT, VIM and PCNA, were tagged by CRISPR/Cas9 using the indicated HDR donors. To locate cells, the non-covalent DNA stain SPY650-DNA was applied. Scale bar is 100 µm. FIG. 2B shows knock-in efficiency for each HDR donor type across the six target genes. Knock-in experiments and validation processes were repeated three times. FIG. 2C shows fluorescence images showing knock-in results of five different genes in Jurkat cells (upper row). The brightfield image (bottom row) is used to identify all cells within the field of view (yellow dotted line). The scale bar is 100 µm. FIG. 2D shows that tagging is validated by anti-FLAG western blot. All proteins show their theoretical size on SDS-PAGE, the same as in HEK293T cells in FIG. 2A. FIG. 2E shows that the size distribution of genome-edited cells is similar to that of the control cells. FIG. 2F shows histograms of integrated mNeonGreen fluorescence intensity from genome-edited cells. To obtain the control signal, wildtype Jurkat cells were used. FIG. 2G shows a graph representing the knock-in efficiency across the five different genes. The knock-in experiments and validation processes were repeated twice in Jurkat cells.
00022 FIGS. 3A to 3F show characterization of endogenous PCNA.
FIG. 3A shows the SEC profile of 3xFLAG-tagged endogenous PCNA after affinity purification by anti-FLAG M2 resin. Colored labels (black and red lines) above the SEC profile match the corresponding fractions in the western blot. The pink box on the profile indicates the theoretical elution volume of PCNA. FIG. 3B shows a negative-stain EM micrograph and representative 2D class averages of purified endogenous PCNA. FIG. 3C shows a cryo-EM micrograph of purified endogenous PCNA. FIG. 3D shows a 3D reconstruction of endogenous trimeric PCNA (from left to right: top and side views of the density map, and docking of the atomic structure (PDB: 4D2G) into the cryo-EM density map) and representative 2D class averages of trimeric PCNA. FIG. 3E shows that in combination with ALFA-tag and GFP-conjugated ALFA nanobody (ALFANB-eGFP), different complexes of endogenous PCNA were observed (black line). The experiment was repeated three times.
00023 FIGS. 4A to 4K show structural changes of human endogenous GAPDH in response to prolonged oxidative stress. FIG. 4A shows two different views of the human endogenous GAPDH cryo-EM map determined without oxidative stress. A non-protein density (yellow) is seen. FIG. 4B shows the electrostatic surface, showing the mixture of positively charged and hydrophobic surface surrounding the unknown density. The groove is composed mainly of Cα atoms of the loop-lining amino acids. R13 and R16 provide a positively charged local environment for substrate recruitment. FIG. 4C shows a lipid strip assay showing that de-lipidated endogenous GAPDH preferentially binds to PIP3. The experiment was repeated twice independently. FIG. 4D shows that following oxidation, FLAG-tagged endogenous GAPDH translocates to the nucleus in a time-dependent manner. The experiment was performed in triplicate independently. FIGS. 4E and 4F show that upon prolonged oxidative stress, GAPDH enzymatic activity decreases. The OD450 curve is normalized to the maximum point of the 0-hour oxidation condition. The bar graph shows the amount of generated NADH, and statistical significance was tested by a two-tailed t-test. The p-values are shown on the graph. The experiment was repeated multiple times. FIG. 4G shows that the overall architecture of nuclear GAPDH after 8-hour oxidation is almost identical (rmsd: 0.270) to that of endogenous GAPDH from healthy cells. FIG. 4H shows the catalytic triad without oxidative stress, showing a density connecting C152 and H179. No other substrate density is observed nearby. This is called the “active subunit”. FIG. 4I shows that the catalytic site of nuclear GAPDH adopts a different configuration at 8-hour oxidation. Due to the oxidative modification on C152, the side chain of C152 shows a bulky density and loses the connection with H179; this is called the “inactive subunit”. This is present in both the 8-hour and 24-hour conditions. FIG. 4J shows that, based on single subunit analysis, the number of “inactive subunits” in the tetrameric GAPDH complex increases during oxidative stress, and nuclear GAPDH is more damaged than cytosolic GAPDH. FIG. 4K shows that endogenous GAPDH is damaged by prolonged oxidative stress and loses its functional subunits. This would trigger other post-translational modifications (PTMs), and eventually nuclear translocation occurs.
00024 FIGS. 5A to 5E. Each panel shows a SEC profile of the tagged proteins purified by anti-FLAG M2 resin (top) with colored lines above marking fractions, anti-FLAG western blots (middle) from fractions marked by the same-colored line, and a negative stain EM micrograph and 2D class averages (bottom) of the sample from fractions marked with pink shadow, which shows the theoretical elution position of the target protein. FIG. 5A is ACTB, FIG. 5B is TKT, FIG. 5C is VIM, FIG. 5D is FASN and FIG. 5E is GAPDH.
00025 FIGS. 6A, 6B, and 6C show an image processing workflow. The workflow of 200 kV cryo-EM datasets is illustrated. Reconstructions of FASN, methylosome, GAPDH, and PCNA are shown. Among them, methylosome is purified as a “contaminant” protein of FASN.
00026 FIGS. 7A, 7B, and 7C show the workflow for the Krios datasets of endogenous GAPDH. Five different Krios datasets were collected to elucidate GAPDH changes under different oxidation stresses. Motion correction of all datasets was performed using Relion. To reach optimal particle picking on home-made GO-amino grids, multiple picking methods were applied and evaluated manually in a micrograph-by-micrograph manner. Junk particles were removed by 2D classification, and the remaining particles were subjected to 3D-based analysis. For 3D classification, Relion was mainly used, while Cryosparc-based 3D classification was implemented on the nuclear GAPDH dataset after 24 hours of oxidation. The red-boxed classes were used as the final particles, and iterative refinement with D2 symmetry, CTF refinement, and Bayesian polishing was implemented. The final resolutions are shown with the corresponding maps. To analyze single subunits of GAPDH, the particles underwent the following procedure: C1 refinement and particle expansion (D2) followed by signal subtraction (gray dotted line). To classify the conformational changes, 3D classification was implemented without image alignment and with a higher T value (T=5 to 15).
00027 FIGS. 8A to 8E show cryo-EM structures of endogenous GAPDH. FIG. 8A shows, from left to right, the angular distribution, local resolution and FSC curves of the endogenous GAPDH from 0-hour oxidation. FIGS. 8B and 8C show representative densities of human endogenous GAPDH. FIG. 8D shows three structures, two from cytoplasm and one from nucleus, after 8-hour oxidation, displayed with their angular distribution and local resolution. The FSC curves show the data quality of the cryo-EM density maps and corresponding atomic models. FIG. 8E shows post 24-hour oxidation GAPDH structures presented with their angular distribution and local resolution. FSC curves represent the data quality. All statistics were calculated by Relion.
00028 FIGS. 9A to 9D show the lipid-like ligand binding groove in endogenous GAPDH. FIG. 9A shows the hydrophobic surface charge of the groove in endogenous GAPDH. FIG. 9B shows the SEC profile of de-lipidation treated GAPDH. FIG. 9C shows lysates from mutant-expressing cells directly loaded onto SDS-PAGE to monitor the GAPDH expression level. The R13 mutant significantly reduces GAPDH expression. FIG. 9D shows FSEC profiles of mutant recombinant GAPDH. Most mutant GAPDH proteins are detected in FSEC as monomers.
00029 FIGS. 10A to 10D show the enzymatic activity of endogenous GAPDH during oxidation stress. FIG. 10A shows a standard curve of NADH as a reference. FIG. 10B shows five independent measures of enzymatic activity during prolonged oxidation stress. Each data point is shown as a dot. FIG. 10C shows western blot images showing the amount of GAPDH in each independent experiment.
00030 FIGS. 11A, 11B-1, and 11B-2 show normalized maps of the catalytic site of endogenous GAPDH. FIG. 11A shows an enlarged view of the catalytic sites of the normalized cryo-EM density maps from cytoplasmic GAPDH without oxidation stress (left) and nuclear GAPDH after 8 hours of oxidation (right) displayed at different thresholds. This shows the distribution between 0-hour oxidation (non-labelled) and 8-hour oxidation (red) following the oxidation stress. At 0-hour oxidation, all classes show electron density between C152 and H179. At 8-hour oxidation, the nuclear GAPDH contains approximately twice as many C152-H179 “inactive subunit” classes as cytosolic GAPDH. At 24-hour oxidation, the cytosolic and nuclear GAPDH show a similar population of the “inactive subunit” class. The analysis was done in Relion, and ambiguous classes were excluded when tracing subunits back to intact GAPDH to count how many “inactive subunit” classes exist in the tetrameric GAPDH during chronic oxidation stress. FIGS. 11B-1 and 11B-2 show, after symmetry expansion, the single subunits subtracted for 3D classification without image alignment.
00031 FIGS. 12A to 12D show a summary of variations in different components of pYC for different applications.
00032 FIG. 12A shows a list of variations in each component of pYC, including protease cleavage site, purification tag, 2A self-cleavage site and antibiotic and fluorescent selection markers. FIG. 12B shows combinations of different components for protein purification. FIG. 12C shows cellular localization of target protein by fluorescence microscopy. FIG. 12D shows cellular localization of target protein by FSEC.
00033 FIGS. 13A to 13J show tagging of endogenous protein in HEK293T, Jurkat and MDA-MB-468 cells.
00034 FIG. 13A shows representative fluorescence microscopy of bright field (top row), mNeonGreen (second row), mApple (third row) and merged (bottom row) images recorded on the indicated days after initial transfection. FIG. 13B shows the fraction of cells showing Cas-mApple (magenta) or mNeonGreen (green) fluorescence over untransfected control cells on different days after initial transfection, validating enrichment of the mNeonGreen-positive population over time from initial transfection through puromycin treatment (mean +/- SD from two replicates). FIG. 13C shows that anti-FLAG western blot validates tagging of GAPDH. FIG. 13D shows quantification of mNeonGreen intensity of single cells on different days after initial transfection. N=8563, 4424, 2191, 811, 282, 4835, 8723, 8426 and 10406 cells from two replicates for days 1-16. FIG. 13E shows that anti-FLAG western blots validate tagging of endogenous targets in Jurkat cells; all proteins show their expected size on SDS-PAGE. FIG. 13F shows that size distributions of genome-edited cells are comparable to untreated control cells. (Sample number, n, of ACTB, FASN, GAPDH, TKT and PCNA is 2385, 10647, 759, 1912 and 2804 cells, respectively.) FIG. 13G shows histograms of integrated mNeonGreen fluorescence from gene-edited cells (same sample number as for FIG. 13F). Wildtype Jurkat cells (n=3382) were used as control. FIG. 13H shows representative fluorescence images of MDA-MB-468 cells. From left to right are the bright field image, the fluorescence image of cells labeled by SPY650-DNA, the fluorescence image of mNeonGreen revealing genome-edited cells, and the merged image. FIG. 13I shows that tagging in MDA-MB-468 is validated by anti-FLAG western blot. FIG. 13J shows knock-in efficiency for FASN and GAPDH in MDA-MB-468 cells measured as mNeonGreen signal higher than background signal in untreated control cells.
00035 FIGS. 14A to 14D show purification and negative staining EM of tagged endogenous proteins. Each panel shows the SEC profile of the tagged protein purified by anti-FLAG M2 resin (top) with colored lines above marking the fractions. Anti-FLAG western blots (middle) from each fraction are marked by the same-colored line or triangle. Pink shadow marks the fractions with the strongest western blot band. The negative stain EM micrograph and 2D class averages (bottom) are of the sample from the fractions marked by a colored line or triangle on the SEC profile. FIGS. 14A-E show, respectively, ACTB, TKT, VIM, FASN and GAPDH. Scale bar is 100 nm. The SEC profiles often show multiple peaks, indicating that the affinity pulldown captures different complexes associated with the target protein. Furthermore, affinity pulldown could also capture proteins that may not form stable complexes with the target proteins.
00036 FIGS. 15A, 15B, and 15C show an image processing workflow of 200 kV cryo-EM datasets. Reconstructions of FASN, GAPDH, methylosome and PCNA are shown.
00037 FIGS. 16A, 16B, and 16C show the workflow of cryo-EM data processing for analyzing GAPDH. Five cryo-EM datasets of endogenous human GAPDH, purified at different time points from the cytosol and nucleus of cells after prolonged oxidative stress, were analyzed.
00038 FIGS. 17A to 17D show the lipid-like ligand binding groove in GAPDH. FIG. 17A shows the hydrophobic surface charge of the groove in endogenous GAPDH. The surface potential was generated with default settings in ChimeraX. FIG. 17B shows the SEC profile of affinity purified GAPDH after de-lipidation treatment. The shaded peak corresponds to the intact tetrameric GAPDH. Insets are anti-FLAG western blots of affinity purified GAPDH before de-lipidation treatment and from the shaded peak after de-lipidation. FIG. 17C shows a GFP fluorescence image of an SDS-PAGE gel of the lysates from the wild type and mutant GAPDH expressing cells. The R13 mutant reduces GAPDH expression. FIG. 17D shows cell lysate FSEC profiles of wild type and mutant recombinant GAPDH. The FSEC profile of cell lysate with wild type GAPDH (black curve) is used as a control showing the location of the intact tetrameric GAPDH (black dashed line). Colored FSEC profiles are from cell lysate of mutant GAPDH. The peak indicated by the red dashed line corresponds to non-intact GAPDH.
00039 FIGS. 18A and 18B show a comparison of selection strategies and HDR designs. Tree diagrams illustrate different strategies of selection markers and HDR template construction used in different studies or available from vendors. FIG. 18A shows the use of two common selection markers for CRISPR/Cas9 knock-in cells, fluorescent proteins (FP) and antibiotic resistance genes. The present study used double selection markers that are both separated from the targeted proteins by the 2A cleavage site. FIG. 18B shows construction of the HDR template. In the present study, multiple cloning sites (MCS) were used and the plasmid itself can be used as an HDR donor.
00040 FIG. 19 illustrates the cBAF complex.
00041 FIG. 20 shows a western blot analysis for DPF2.
00042 FIG. 21 shows results of protein purification for 2xStrep-ALFA-GFP11-3xFLAG.
00043 FIGS. 22A and 22B illustrate mass spectrometry of the purification in FIG. 21.
00044 FIG. 23 illustrates a negative stain from the cBAF experiments.
00045 FIG. 24 shows a western blot for 2xStrep-ALFA-GFP11-3xFLAG-SNF2h (~133 kDa) (tags on the N-terminus of SNF2h).
00046 FIGS. 25A and 25B show Exportin-1 (gene name crm1 or xpo1) as tagged in HEK293 cells.
00047 FIG. 26 illustrates a western blot for FASN knock-in in HeLa cells.
DETAILED DESCRIPTION OF EMBODIMENTS
00048 The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainsview, N.Y. (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999), for definitions and terms of the art. The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art.
00049 Various terms relating to the methods and other aspects of the present disclosure are used throughout the specification and claims. Such terms are to be given their ordinary meaning in the art unless otherwise indicated. Other specifically defined terms are to be construed in a manner consistent with the definition provided herein.
00050 The term “about” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.5%, or ±0.1% from the specified value, as such variations are appropriate to perform the disclosed methods.
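By way of a minimal sketch only, the tolerance reading of “about” can be written as a short Python check. The function name and its parameters are hypothetical and chosen for illustration; which of the listed percentages applies in a given context is not decided by this sketch.

```python
# Sketch: test whether a measured value lies within a stated percentage of a
# specified value. The function name and parameters are hypothetical.

def within_about(value, specified, tolerance_pct):
    """Return True if value is within +/- tolerance_pct percent of specified."""
    allowed = abs(specified) * tolerance_pct / 100.0
    return abs(value - specified) <= allowed
```

For instance, calling the function with a tolerance of 10 tests the ±10% variation listed above.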
00052 The indefinite articles "a" and "an," as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean "at least one."
00053 The phrase "and/or," as used herein in the specification and in the claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified unless clearly indicated to the contrary. Thus, as a nonlimiting example, a reference to "A and/or B," when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A without B (optionally including elements other than B); in another embodiment, to B without A (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
00055 As used herein in the specification and in the claims, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as "only one of" or "exactly one of," or, when used in the claims, "consisting of," will refer to the inclusion of exactly one element of a number or list of elements. In general, the term "or" as used herein shall only be interpreted as indicating exclusive alternatives (i.e. "one or the other but not both") when preceded by terms of exclusivity, such as "either," "one of," "only one of," or "exactly one of." "Consisting essentially of," when used in the claims, shall have its ordinary meaning as used in the field of patent law.
00056 As used herein, the terms “activate,” “stimulate,” “enhance,” “increase” and/or “induce” (and like terms) are used interchangeably to generally refer to the act of improving or increasing, either directly or indirectly, a concentration, level, function, activity, or behavior relative to the natural, expected, or average, or relative to a control condition before the act is completed. “Activate” refers to a primary response induced by ligation or interaction between two molecules, such as a protein-protein interaction. For example, in the context of receptors, such stimulation entails the interaction of a receptor and its ligand, resulting in a subsequent signal transduction event. Further, the stimulation event may activate a cell and upregulate or downregulate expression or secretion of a molecule. Thus, ligation of cell surface moieties, even in the absence of a direct signal transduction event, may result in the reorganization of cytoskeletal structures, or in the coalescing of cell surface moieties, each of which could serve to enhance, modify, or alter subsequent cellular responses by activation or stimulation.
00057 "Cell type" means the organism, organ, and/or tissue type from which the cell is derived or sourced, state of development, phenotype or any other categorization of a particular cell that appropriately forms the basis for defining it as "similar to" or "different from" another cell or cells.
00058 "Coding sequence" or "encoding nucleic acid" as used herein refers to the nucleic acid (RNA, DNA, or RNA/DNA hybrid molecule) that comprises a nucleotide sequence which encodes a protein. The coding sequence may further include initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of an individual or mammal to whom the nucleic acid is administered.
00059 "Complement" or "complementary" as used herein may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
00060 The term “fragment” is meant to be a portion of a polypeptide or nucleic acid molecule, such as, but not limited to, a truncation mutant. This portion contains, preferably, at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of the entire length of the reference nucleic acid molecule or polypeptide. A fragment may contain about 5, about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 or more nucleotides or amino acids of a nucleotide or amino acid sequence, respectively, upon which it is based.
00061 The term “functional fragment” means any portion or fragment of a polypeptide or nucleic acid sequence to which the respective full-length polypeptide or nucleic acid relates that is of a sufficient length and has a sufficient structure to confer a biological effect that is similar or substantially similar to the full-length polypeptide or nucleic acid upon which the fragment is based. In some embodiments, a functional fragment is a portion of a full-length or wild-type nucleic acid sequence that encodes any one of the nucleic acid sequences disclosed herein, and said portion encodes a polypeptide of a certain length and/or structure that is less than full length but encodes a domain that is still biologically functional as compared to the full-length or wild-type protein. In some embodiments, the functional fragment may have a reduced biological activity, about equivalent biological activity, or an enhanced biological activity as compared to the wild-type or full-length polypeptide sequence upon which the fragment is based. In some embodiments, the functional fragment is derived from the sequence of an organism, such as a human. In such embodiments, the functional fragment may retain about 99%, about 98%, about 97%, about 96%, about 95%, about 94%, about 93%, about 92%, about 91%, or about 90% sequence identity to the wild-type or given sequence from which the sequence is derived. In some embodiments, the functional fragment may retain about 85%, about 80%, about 75%, about 70%, about 65%, or about 60% sequence homology to the wild-type sequence from which the sequence is derived.
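For illustration only, a percentage of sequence identity between two sequences can be computed with the following Python sketch. The function name is hypothetical. The sketch assumes the two sequences are already aligned, of equal length, and free of gaps; it does not perform an alignment and is not asserted to be the method by which identity is determined in this disclosure.

```python
# Sketch: percent identity of two pre-aligned, equal-length sequences.
# Assumption: no gaps and no alignment step; the name is hypothetical.

def percent_identity(seq_a, seq_b):
    """Return the percentage of positions at which the sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

A returned value can then be compared against a stated threshold, such as about 90%, to decide whether a candidate sequence meets it.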
00063 As used herein, the term “genetic construct” is meant to refer to the DNA or RNA molecules that comprise a nucleotide sequence that encodes protein. The coding sequence includes initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of the individual to whom the nucleic acid molecule is administered.
00064 The term “host cell” as used herein is meant to refer to a cell that can be used to express a nucleic acid, e.g., a nucleic acid of the disclosure. The host cell can be, but is not limited to, a eukaryotic cell, a bacterial cell, an insect cell, or a human cell. Suitable eukaryotic cells include, but are not limited to, Vero cells, HeLa cells, COS cells, CHO cells, HEK293 cells, BHK cells and MDCKII cells. In some embodiments, the host cell is a cell chosen from one listed in Table X or Table Y. In some embodiments, the host cell is a cell chosen from a cell line, optionally stored at -80 degrees Celsius or -212 degrees Celsius. Suitable insect cells include, but are not limited to, Sf9 cells. The phrase "recombinant host cell" can be used to denote a host cell that has been transformed or transfected with a nucleic acid to be expressed. A host cell also can be a cell that comprises the nucleic acid but does not express it at a desired level unless a regulatory sequence is introduced into the host cell such that it becomes operably linked with the nucleic acid. It is understood that the term host cell refers not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to, e.g., mutation or environmental influence, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
00065 The term “hybridize” as used herein means to pair to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a gene described herein), or portions thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).
00066 The term “isolated” as used herein means that the nucleic acid molecule, polynucleotide or polypeptide or fragment, variant, or derivative thereof has been essentially removed from other biological materials with which it is naturally associated, or is essentially free from other biological materials derived, e.g., from a recombinant host cell that has been genetically engineered to express the polypeptide of the disclosure.
00067 The terms “in isolation” mean that, for purposes of this disclosure, the nucleic acid may not be the species listed. In other words, the nucleic acid may incorporate the mutations above in combination with one or more other mutations listed or not listed, but the nucleic acid may not be defined as the single species containing the nucleic acid mutations listed.
00068 The term “polypeptide” encompasses two or more naturally or non-naturally-occurring amino acids joined by a covalent bond (e.g., an amide bond). Polypeptides as described herein include full-length proteins (e.g., fully processed pro-proteins or full-length synthetic polypeptides) as well as shorter amino acid sequences (e.g., fragments of naturally-occurring proteins or synthetic polypeptide fragments).
00069 As used herein, the term "polypeptide sequence associated with a cell line" means any polypeptide or fragment thereof, modified or unmodified by any macromolecule (such as a sugar molecule or macromolecule) that is produced naturally by a cell or cell line. In some embodiments, the cell line is originally from or derived from a multicellular organism. In some embodiments, a polypeptide sequence associated with the hepatocyte is any polypeptide or fragment thereof, modified or unmodified by any macromolecule (such as a sugar molecule or macromolecule) that is produced naturally by the cell lines in culture. In some embodiments, a polypeptide sequence associated with the cell is any polypeptide sequence comprising any one or plurality of the polypeptides disclosed in Table Y or a sequence that shares about 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with the polypeptides disclosed in Table Y or a functional fragment thereof. In some embodiments, a polypeptide sequence associated with the extracellular matrix consists of any of the polypeptides disclosed in Table Y or a sequence that shares about 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with the polypeptides disclosed in Table Y.
00070 As used herein, the term “purified” means that the polynucleotide or polypeptide or fragment, variant, or derivative thereof is substantially free of other biological material with which it is naturally associated, or free from other biological materials derived, e.g., from a recombinant host cell that has been genetically engineered to express the polypeptide. That is, e.g., a purified polypeptide is a polypeptide that is from about 70% to about 100% pure, i.e., the polypeptide is present in a composition wherein the polypeptide constitutes from about 70% to about 100% by weight of the total composition. In some embodiments, the purified polypeptide is from about 75% to about 99% by weight pure, from about 80% to about 99% by weight pure, from about 90% to about 99% by weight pure, or from about 95% to about 99% by weight pure.
00071 As used herein, the terms “subject,” “individual,” “host,” and “patient” are used interchangeably and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans. The methods described herein are applicable to both human therapy and veterinary applications. In some embodiments, the subject is a mammal, and in other embodiments the subject is a human.
00072 The terms “polynucleotide,” “oligonucleotide” and “nucleic acid” are used interchangeably throughout and include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using nucleotide analogs (e.g., peptide nucleic acids and non-naturally occurring nucleotide analogs), and hybrids thereof. The nucleic acid molecule can be single-stranded or double-stranded. In some embodiments, the nucleic acid molecules of the disclosure comprise a contiguous open reading frame encoding a Cas protein, or a fragment thereof, as described herein. “Nucleic acid” or “oligonucleotide” or “polynucleotide” as used herein may mean at least two nucleotides covalently linked together. The depiction of a single strand also defines the sequence of the complementary strand. Thus, a nucleic acid also encompasses the complementary strand of a depicted single strand. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. A single strand provides a probe that may hybridize to a target sequence under stringent hybridization conditions. Thus, a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions. Nucleic acids may be single stranded or double stranded, or may contain portions of both double stranded and single stranded sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
00073 A nucleic acid will generally contain phosphodiester bonds, although, in some embodiments, nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones; non-ionic backbones, and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated by reference in their entireties. Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids. The modified nucleotide analog may be located for example at the 5'-end and/or the 3'-end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that also nucleobase-modified ribonucleotides, i.e. ribonucleotides, containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase such as uridines or cytidines modified at the 5-position, e.g. 5-(2-amino)propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g. 8-bromo guanosine; deaza nucleotides, e.g. 7-deaza-adenosine; O- and N-alkylated nucleotides, e.g. N6-methyl adenosine are suitable. The 2'-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, N2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al., Nature (Oct. 30, 2005), Soutschek et al., Nature 432:173-178 (2004), and U.S. Patent Publication No. 20050107325, which are incorporated herein by reference in their entireties. Modified nucleotides and nucleic acids may also include locked nucleic acids (LNA), as described in U.S. Patent No. 20020115080, which is incorporated herein by reference.
Additional modified nucleotides and nucleic acids are described in U.S. Patent Publication No. 20050182005, which is incorporated herein by reference in its entirety. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs may be made. In some embodiments, the nucleotide sequence encoding one or more Cas proteins is free of modified nucleotide analogs. In some embodiments, the nucleotide sequence encoding one or more antigens comprises from about 1 to about 20 nucleic acid modifications. In some embodiments, the nucleotide sequence encoding one or more Cas proteins comprises from about 1 to about 50 nucleic acid modifications. In some embodiments, the nucleotide sequence encoding one or more antigens independently comprises from about 1 to about 100 nucleic acid modifications.
00074 As used herein, the term “nucleic acid molecule” comprises one or more nucleotide sequences that encode one or more proteins. In some embodiments, a nucleic acid molecule comprises initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the cells of the individual to whom the nucleic acid molecule is administered. In some embodiments, the nucleic acid molecule also is a plasmid comprising one or more nucleotide sequences that encode one or a plurality of neoantigens. In some embodiments, the disclosure relates to a pharmaceutical composition comprising a first, second, third or more nucleic acid molecules, each of which encoding one or a plurality of neoantigens and at least one of each plasmid comprising one or more of the Formulae disclosed herein.
00075 The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-natural amino acids or chemical groups that are not amino acids. The terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component. As used herein the term “amino acid” includes natural and/or unnatural or synthetic amino acids, including glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics.
00076 As used herein, “conservative” amino acid substitutions may be defined as set out in Tables A, B, or C below. The vaccines, compositions, pharmaceutical compositions and methods may comprise nucleic acid sequences comprising one or more conservative substitutions. In some embodiments, the vaccines, compositions, pharmaceutical compositions and methods comprise nucleic acid sequences that retain from about 70% sequence identity to about 99% sequence identity to the sequence identification numbers disclosed herein but comprise one or more conservative substitutions. Conservative substitutions of the present disclosure include those wherein conservative substitutions (from either nucleic acid or amino acid sequences) have been introduced by modification of polynucleotides encoding polypeptides.
00077 Amino acids can be classified according to physical properties and contribution to secondary and tertiary protein structure. A conservative substitution is recognized in the art as a substitution of one amino acid for another amino acid that has similar properties. In some embodiments, the conservative substitution is recognized in the art as a substitution of one nucleic acid for another nucleic acid that has similar properties, or, when encoded, has similar binding affinities to its target. In some embodiments, the target is a cell comprising endogenous DNA to which plasmids of the disclosure initiate a recombination event. Exemplary conservative substitutions are set out in Table A.
Table A - Conservative Substitutions I
Side Chain Characteristics      Amino Acid
Aliphatic
  Non-polar                     G A P I L V F
  Polar - uncharged             C S T M N Q
  Polar - charged               D E K R
Aromatic                        H F W Y
Other                           N Q D E
Alternately, conservative amino acids can be grouped as described in Lehninger, (Biochemistry, Second Edition; Worth Publishers, Inc. NY, N.Y. (1975), pp. 71-77) as set forth in Table B.
Table B — Conservative Substitutions II
Side Chain Characteristic       Amino Acid
Non-polar (hydrophobic)
  Aliphatic:                    A L I V P
  Aromatic:                     F W Y
  Sulfur-containing:            M
  Borderline:                   G Y
Uncharged-polar
  Hydroxyl:                     S T Y
  Amides:                       N Q
  Sulfhydryl:                   C
  Borderline:                   G Y
Negatively Charged (Acidic):    D E
Alternately, exemplary conservative substitutions are set out in Table C.
Table C — Conservative Substitutions III
Original Residue    Exemplary Substitution
Ala (A)             Val Leu Ile Met
Arg (R)             Lys His
Asn (N)             Gln
Asp (D)             Glu
Cys (C)             Ser Thr
Gln (Q)             Asn
Glu (E)             Asp
Gly (G)             Ala Val Leu Pro
His (H)             Lys Arg
Ile (I)             Leu Val Met Ala Phe
Leu (L)             Ile Val Met Ala Phe
Lys (K)             Arg His
Met (M)             Leu Ile Val Ala
Phe (F)             Trp Tyr Ile
Pro (P)             Gly Ala Val Leu Ile
Ser (S)             Thr
Thr (T)             Ser
Trp (W)             Tyr Phe Ile
Tyr (Y)             Trp Phe Thr Ser
Val (V)             Ile Leu Met Ala
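By way of illustration only, the exemplary substitutions in the table above can be expressed as a lookup keyed by one-letter amino acid code (V for Val, I for Ile, Q for Gln, and so on). The names CONSERVATIVE_SUBSTITUTIONS and is_conservative are hypothetical and are not part of the disclosure; this is a minimal sketch of one way the table might be consulted.

```python
# Exemplary conservative substitutions, transcribed from the table above
# using one-letter codes. Names here are illustrative assumptions only.
CONSERVATIVE_SUBSTITUTIONS = {
    "A": "VLIM",
    "R": "KH",
    "N": "Q",
    "D": "E",
    "C": "ST",
    "Q": "N",
    "E": "D",
    "G": "AVLP",
    "H": "KR",
    "I": "LVMAF",
    "L": "IVMAF",
    "K": "RH",
    "M": "LIVA",
    "F": "WYI",
    "P": "GAVLI",
    "S": "T",
    "T": "S",
    "W": "YFI",
    "Y": "WFTS",
    "V": "ILMA",
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if `replacement` is listed as an exemplary substitution
    for `original` in the table; unknown residues return False."""
    allowed = CONSERVATIVE_SUBSTITUTIONS.get(original.upper(), "")
    return replacement.upper() in allowed
```

For example, is_conservative("A", "V") is true, whereas is_conservative("A", "D") is false. Note that the table is not symmetric: Ala is listed as a substitution for Gly, but Gly is not listed as a substitution for Ala.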
00078 It should be understood that the inhibitors described herein are intended to include nucleic acids and, when the nucleic acid sequences of the disclosure are encoded, include polypeptide, polypeptides bearing one or more insertions, deletions, or substitutions, or any combination thereof, of amino acid residues as well as modifications other than insertions, deletions, or substitutions of amino acid residues.
00079 As used herein, “more than one” or “two or more” of the aforementioned amino acid substitutions means 2, 3, 4, 5, 6, 7, 8, 9, 10 or more of the recited amino acid or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3, 4, or 5 of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3, 4 or more of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2, 3 or 4 of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2 or more of the recited amino acid substitutions or nucleic acid substitutions. In some embodiments, “more than one” means 2 of the recited amino acid substitutions or nucleic acid substitutions.
00080 The “percent identity” or "percent homology" of two polynucleotide or two polypeptide sequences is determined by comparing the sequences using the GAP computer program (a part of the GCG Wisconsin Package, version 10.3 (Accelrys, San Diego, Calif.)) using its default parameters. "Identical" or "identity" as used herein in the context of two or more nucleic acids or amino acid sequences, may mean that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity. In cases where the two sequences are of different lengths or the alignment produces one or more staggered ends and the specified region of comparison includes only a single sequence, the residues of the single sequence are included in the denominator but not the numerator of the calculation. When comparing DNA and RNA, thymine (T) and uracil (U) may be considered equivalent. The identity calculation may be performed manually or by using a computer sequence algorithm such as BLAST or BLAST 2.0. Briefly, the BLAST algorithm, which stands for Basic Local Alignment Search Tool, is suitable for determining sequence similarity. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., 1997). These initial neighborhood word hits act as seeds for initiating searches to find HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extension of the word hits in each direction is halted when: 1) the cumulative alignment score falls off by the quantity X from its maximum achieved value; 2) the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or 3) the end of either sequence is reached. The Blast algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The Blast program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff et al., Proc. Natl. Acad. Sci. USA, 1992, 89, 10915-10919, which is incorporated herein by reference in its entirety), alignments (B) of 50, expectation (E) of 10, M=5, N=4, and a comparison of both strands. The BLAST algorithm (Karlin et al., Proc. Natl. Acad. Sci. USA, 1993, 90, 5873-5787, which is incorporated herein by reference in its entirety) and Gapped BLAST perform a statistical analysis of the similarity between two sequences. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide sequences would occur by chance. For example, a nucleic acid is considered similar to another if the smallest sum probability in comparison of the test nucleic acid to the other nucleic acid is less than about 1, less than about 0.1, less than about 0.01, and less than about 0.001.
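The percent-identity arithmetic described in the preceding paragraph (matched positions divided by total positions in the specified region, multiplied by 100) can be sketched as follows. This sketch assumes the two sequences have already been aligned and padded to equal length, with "-" marking gaps or staggered ends; it does not perform the alignment itself and is not the GAP or BLAST program. The function name and the `nucleic` flag are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, nucleic: bool = True) -> float:
    """Percent identity over a pre-aligned region.

    Both inputs must have the same length; '-' marks a gap or an overhang
    where only one sequence is present. Such positions count toward the
    denominator but never toward the numerator. When `nucleic` is True,
    T and U are treated as equivalent (DNA versus RNA comparison).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned region must not be empty")

    def norm(residue: str) -> str:
        residue = residue.upper()
        # Only collapse U to T for nucleic acids; in proteins T is threonine.
        return "T" if nucleic and residue == "U" else residue

    matched = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x != "-" and y != "-" and norm(x) == norm(y)
    )
    return 100.0 * matched / len(aligned_a)
```

For instance, percent_identity("ACGT", "ACGU") gives 100.0 because T and U are treated as equivalent, and percent_identity("AAAA", "AATT") gives 50.0.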
00081 Two single-stranded polynucleotides are “the complement” of each other if their sequences can be aligned in an anti-parallel orientation such that every nucleotide in one polynucleotide is opposite its complementary nucleotide in the other polynucleotide, without the introduction of gaps, and without unpaired nucleotides at the 5' or the 3' end of either sequence. A polynucleotide is "complementary" to another polynucleotide if the two polynucleotides can hybridize to one another under moderately stringent conditions. Thus, a polynucleotide can be complementary to another polynucleotide without being its complement.
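The strict definition of “the complement” given above (anti-parallel alignment, every nucleotide paired, no gaps, no unpaired ends) can be illustrated with a short sketch. It assumes DNA sequences written 5' to 3' using only A, C, G and T; the function names are hypothetical. The looser notion of "complementary" depends on hybridization conditions and is not modeled here.

```python
# Watson-Crick pairing for DNA; RNA and ambiguity codes are out of scope
# for this illustrative sketch.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_the_complement(a: str, b: str) -> bool:
    """True only if `a` and `b` pair at every position in anti-parallel
    orientation, with no gaps and no unpaired nucleotides at either end."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()
```

Under this sketch, "ATGC" and "GCAT" are the complement of each other, while "ATGC" and "GCA" are not, because one nucleotide would be left unpaired.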
00082 The phrase “stringent hybridization conditions” or “stringent conditions” as used herein is meant to refer to conditions under which a nucleic acid molecule will hybridize to another nucleic acid molecule, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. Since the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium. Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes, primers or oligonucleotides (e.g., 10 to 50 nucleotides) and at least about 60°C for longer probes, primers or oligonucleotides. Stringent conditions may also be achieved with the addition of destabilizing agents, such as formamide.
00083 By “substantially identical” is meant a nucleic acid molecule (or polypeptide) exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). Preferably, such a sequence is at least about 60%, about 80%, about 85%, about 90%, about 95% or about 99% identical at the amino acid level or nucleic acid level to the sequence used for comparison.
00084 A nucleotide sequence is "operably linked" to a regulatory sequence if the regulatory sequence affects the expression (e.g., the level, timing, or location of expression) of the nucleotide sequence. A "regulatory sequence" is a nucleic acid that affects the expression (e.g., the level, timing, or location of expression) of a nucleic acid to which it is operably linked. The regulatory sequence can, for example, exert its effects directly on the regulated nucleic acid, or through the action of one or more other molecules (e.g., polypeptides that bind to the regulatory sequence and/or the nucleic acid). Examples of regulatory sequences include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Further examples of regulatory sequences are described in, for example, Goeddel, 1990, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif, and Baron et al., 1995, Nucleic Acids Res. 23:3605-06.
00085 "Operably linked" as used herein may mean that expression of a gene is under the control of a promoter with which it is spatially connected. A promoter may be positioned 5' (upstream) or 3' (downstream) of a gene under its control. The distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.
00086 "Promoter" as used herein may mean a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell. A promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same. A promoter may also comprise distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. A promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals. A promoter may regulate the expression of a gene component constitutively, or differentially with respect to the cell, tissue or organ in which expression occurs, or with respect to the developmental stage at which expression occurs, or in response to external stimuli such as physiological stresses, pathogens, metal ions, or inducing agents. Representative examples of promoters include the bacteriophage T7 promoter, bacteriophage T3 promoter, SP6 promoter, lac operator-promoter, tac promoter, SV40 late promoter, SV40 early promoter, RSV-LTR promoter, and CMV IE promoter.
00087 As used herein, the term “sample” refers generally to a limited quantity of something which is intended to be similar to and represent a larger amount of that something. In the present disclosure, a sample is a collection, swab, brushing, scraping, biopsy, removed tissue, or surgical resection that is to be tested for the absence or presence of a transfected cell or an endogenously labeled cell. In some embodiments, a sample believed to contain one or more transformed cells is compared to a “control sample” that is known to be free of one or more transformed cells, transfected cells or cells that are free of the plasmid or plasmid insert of the disclosure. This disclosure contemplates using any one or a plurality of disclosed samples herein to identify, detect, sequence and/or quantify the amount of plasmids or cells comprising plasmids (highly or minimally immunogenic) within a particular sample, which can then be used as an estimate for disclosing the number . In some embodiments, the methods relate to the step of exposing a swab, brushing or other sample from an environment to a set of reagents sufficient to isolate and/or sequence the DNA and RNA of one or a plurality of cells in the sample. In some embodiments, the methods relate to the step of exposing a swab, brushing or other sample of cells from a cell culture system to a set of reagents sufficient to isolate and/or sequence or observe the expression of one or a plurality of amino acids in the cells in the sample.
00088 "Stringent hybridization conditions" as used herein may mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm may be the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., about 10-50 nucleotides) and at least about 60°C for long probes (e.g., greater than about 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal may be at least 2 to 10 times background hybridization. Exemplary stringent hybridization conditions include the following: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C.
00089 "Substantially complementary" as used herein may mean that a first sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to the complement of a second sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides or amino acids, or that the two sequences hybridize under stringent hybridization conditions.
00093 "Substantially identical" as used herein may mean that, in respect to a first and a second sequence, a first and second sequence are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides or amino acids, or with respect to nucleic acids, if the first sequence is substantially complementary to the complement of the second sequence.
00094 "Variant" used herein with respect to a nucleic acid means (i) a portion or fragment of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that is substantially identical to a referenced nucleic acid or the complement thereof; or (iv) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, complement thereof, or a sequence substantially identical thereto.
00095 "Variant" may be in an embodiment, with respect to a peptide or polypeptide, a polypeptide that differs in amino acid sequence by the insertion, deletion, or conservative substitution of amino acids, but retains at least one biological activity. Variant may also mean a protein with an amino acid sequence that is substantially identical to a referenced protein with an amino acid sequence that retains at least one biological activity. A conservative substitution of an amino acid, i.e., replacing an amino acid with a different amino acid of similar properties (e.g., hydrophilicity, degree and distribution of charged regions) is recognized in the art as typically involving a minor change. These minor changes can be identified, in part, by considering the hydropathic index of amino acids, as understood in the art. Kyte et al., J. Mol. Biol. 157:105-132 (1982). The hydropathic index of an amino acid is based on a consideration of its hydrophobicity and charge. It is known in the art that amino acids of similar hydropathic indexes can be substituted and still retain protein function. In one aspect, amino acids having hydropathic indexes of ±2 are substituted. The hydrophilicity of amino acids can also be used to reveal substitutions that would result in proteins retaining biological function. A consideration of the hydrophilicity of amino acids in the context of a peptide permits calculation of the greatest local average hydrophilicity of that peptide, a useful measure that has been reported to correlate well with antigenicity and immunogenicity. U.S. Patent No. 4,554,101, fully incorporated by reference herein. Substitution of amino acids having similar hydrophilicity values can result in peptides retaining biological activity, for example immunogenicity, as is understood in the art.
00096 Substitutions may be performed with amino acids having hydrophilicity values within ±2 of each other. Both the hydrophobicity index and the hydrophilicity value of amino acids are influenced by the particular side chain of that amino acid. Consistent with that observation, amino acid substitutions that are compatible with biological function are understood to depend on the relative similarity of the amino acids, and particularly the side chains of those amino acids, as revealed by the hydrophobicity, hydrophilicity, charge, size, and other properties.
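As a non-limiting illustration of the ±2 hydropathic-index criterion discussed above, the following sketch compares two residues using the hydropathy values published by Kyte and Doolittle (1982). The values are taken from that publication rather than from this disclosure, and the function name and default window are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982),
# keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def within_hydropathy_window(original: str, replacement: str,
                             window: float = 2.0) -> bool:
    """True if the two residues' hydropathic indexes differ by no more
    than `window` (±2 in the aspect described above)."""
    a = KYTE_DOOLITTLE[original.upper()]
    b = KYTE_DOOLITTLE[replacement.upper()]
    return abs(a - b) <= window
```

Under these values, isoleucine and valine (4.5 and 4.2) fall within the window, whereas isoleucine and arginine (4.5 and -4.5) do not. Hydropathy is only one of the properties mentioned above; charge, size, and side-chain chemistry are not captured by this check.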
00097 Nucleic acid molecules or nucleic acid sequences of the disclosure include those coding sequences comprising the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain, or functional fragments or variants thereof that possess no less than about 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity with the coding sequences of the selection domains disclosed herein.
00098 The disclosure relates to a composition comprising a vector. A “vector” is a nucleic acid molecule that can be used to introduce a nucleic acid sequence subcomponent linked to it into a cell. One type of vector is a "plasmid," which refers to a linear or circular double stranded DNA molecule into which additional nucleic acid segments can be ligated. Another type of vector is a viral vector (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), in the form of an RNA, DNA or hybrid RNA/DNA molecule in which viral genome promoter sequences are operably linked to the expressible nucleotide sequence. In some embodiments, the expressible nucleotide sequence is introduced into a cellular genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors comprising a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. An "expression vector" is a type of vector that can direct the expression of a chosen polynucleotide. The disclosure relates to any one or plurality of vectors that comprise nucleic acid sequences encoding any one or plurality of amino acid sequences disclosed herein.
00099 The present disclosure relates to a versatile strategy of tagging endogenous proteins, using a multicomponent HDR donor template that can be easily converted to all three HDR donor forms for parallelized knock-in experiments. This strategy provides additional structural and mechanistic insights that are not revealed by using recombinant proteins. While this disclosure focuses on structural characterization of endogenous proteins, it will be understood by those skilled in the art that the same strategy can be applied to other types of cell lines and used beyond structural biology.
000100 To facilitate efficient tagging by CRISPR/Cas9, the present inventors generated a set of plasmids, with different variations containing a DNA fragment to be knocked-in to the targeted locus, flanked by two multiple cloning sites (MCS) for insertion of left (5’) and right (3’) homology arms (L- and R-arms) flanking the targeted locus. The plasmids also contain a bacterial origin of replication and a selectable antibiotic resistance marker for efficient amplification in bacterial cells. The inserted DNA fragment between two homology arms contains one or a plurality of affinity tags, whose choice can vary to match specific experimental goals, and one or a plurality of selectable markers for use in mammalian cells. A self-cleaving peptide is inserted between the affinity tag and the selection marker, and between the two selection markers. The two MCSs in the plasmid can be used as cleavage sites to convert the plasmid into a dsDNA HDR donor. With a pair of primers, in which the 3’ primer contains a phosphorylation modification at the 5’ end, and the use of exonuclease digestion, a ssDNA HDR donor can be generated from dsDNA form. Thus, all three forms of HDR donor can be generated without having to make multiple constructs and can be used to tag any target endogenous protein in parallel.
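By way of non-limiting illustration only, the ordering of elements in the donor described above may be represented as follows. The Python sketch assumes a carboxy-terminal tagging arrangement with one affinity tag and two selection markers; the element names, the function `build_donor_elements`, and all sequences shown are hypothetical placeholders and are not sequences of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    name: str
    sequence: str  # placeholder nucleotides, not real sequences


def build_donor_elements(left_arm: str, tag: str, cleavage: str,
                         marker_1: str, marker_2: str,
                         right_arm: str) -> List[Element]:
    """Order the donor elements 5' to 3'.

    ASSUMPTION: C-terminal tagging layout, i.e.
    L-arm, affinity tag, self-cleaving peptide, selection marker 1,
    self-cleaving peptide, selection marker 2, R-arm.
    """
    return [
        Element("L-arm", left_arm),
        Element("affinity tag", tag),
        Element("self-cleaving peptide", cleavage),
        Element("selection marker 1", marker_1),
        Element("self-cleaving peptide", cleavage),
        Element("selection marker 2", marker_2),
        Element("R-arm", right_arm),
    ]


def concatenate(elements: List[Element]) -> str:
    """Join element sequences into one contiguous donor sequence."""
    return "".join(element.sequence for element in elements)


if __name__ == "__main__":
    donor = build_donor_elements("AAAA", "CCC", "GG", "TTTT", "ACAC", "GTGT")
    print([element.name for element in donor])
    print(concatenate(donor))
```

The same ordered list could be flanked by the two multiple cloning sites to model the plasmid form, from which the dsDNA and ssDNA donor forms are derived as described above.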
000101 Once inserted at the target locus, this design produces three proteins: the target protein with the affinity tags, and two selection markers separated from the targeted protein. Genome-edited cells can be selected in two steps, i.e., antibiotic treatment followed by fluorescence activated cell sorting (FACS). Alternatively, in practice, a majority of un-edited cells are removed by antibiotic treatment. At this point, the knock-in result can be checked by western blot, sequencing, and/or fluorescent light microscopy. The enriched population of genome-edited cells makes the final sorting by FACS more efficient or even unnecessary. A major advantage of this approach is that the success rate of generating genome-edited cell lines is less dependent on the initial efficiency of CRISPR/Cas9 knock-in. In some embodiments, the disclosure relates to a composition comprising a cell, the cell comprising a nucleic acid molecule disclosed herein. In some embodiments, the cell comprises a protein encoded by a target gene, the protein comprising a label; and two exogenous nucleic acid sequences encoding a first and second selection domain. In some embodiments, the selection domain is chosen from one or a combination of selection domains in Table Y. In some embodiments, the first nucleic acid sequence encoding a selection domain encodes an amino acid sequence that confers resistance to the presence of a chemical substance, such as an antibiotic. In some embodiments, the first nucleic acid encoding a selection domain is a nucleic acid sequence whose presence in the cell confers resistance to a toxin, such as an antibiotic. In some embodiments, the second nucleic acid sequence encoding a selection domain encodes a protein that is free of the amino acid structure of the target protein and is a protein that emits light when exposed to certain wavelengths. In some embodiments, the second nucleic acid sequence encoding a selection domain comprises a nucleic acid sequence of Table Y or variants thereof that comprise from about 70% to about 99% sequence identity to the nucleic acid sequences in Table Y.
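By way of non-limiting illustration only, the production of three separate proteins from a single open reading frame may be modeled as follows. The Python sketch assumes the commonly cited P2A peptide ATNFSLLKQAGDVEENPGP, in which ribosomal skipping occurs between the final glycine and proline; this peptide is an assumption of the example and is not asserted to correspond to any SEQ ID NO of the disclosure. The function name and the toy protein strings are hypothetical.

```python
from typing import List

# ASSUMPTION: commonly cited P2A peptide; skipping occurs before the final P.
P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(polypeptide: str, motif: str = P2A) -> List[str]:
    """Split an encoded polypeptide at each 2A motif.

    The upstream product keeps the motif up to its final glycine; the
    downstream product begins with the motif's final proline.
    """
    products = []
    start = 0
    while True:
        index = polypeptide.find(motif, start)
        if index == -1:
            products.append(polypeptide[start:])
            return products
        cut = index + len(motif) - 1  # position of the final proline
        products.append(polypeptide[start:cut])
        start = cut


if __name__ == "__main__":
    # Toy strings standing in for tagged target and two selection markers.
    encoded = "TARGETTAG" + P2A + "MARKERA" + P2A + "MARKERW"
    for product in split_at_2a(encoded):
        print(product)
```

In this model the tagged target protein carries the residual 2A residues at its carboxy terminus, while each selection marker is released as a separate polypeptide beginning with proline.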
000102 In some aspects, the disclosure provides a composition comprising a nucleic acid molecule comprising: (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequences; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned, in a 5’ to 3’ orientation, between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain; (v) a first and a second multiple cloning site, wherein the first multiple cloning site is positioned upstream of the first HDR region and the second multiple cloning site is positioned downstream of the second HDR region.
000103 In some embodiments, the first and the second HDR regions comprise from about 10 to about 30 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 100 to about 900 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 50 to about 500 base pairs in nucleic acid length. In some embodiments, the first and the second HDR regions comprise from about 500 to about 900 base pairs in nucleic acid length. In some embodiments, the first nucleic acid encoding a cleavage site, the first nucleic acid sequence encoding a selection domain, the second nucleic acid encoding a cleavage site, and the second nucleic acid sequence encoding a selection domain are positioned in a contiguous nucleic acid sequence in a 5’ to 3’ orientation. In some embodiments, the first and second nucleic acids encoding a cleavage site encode a P2A cleavage site. In some embodiments, the composition further comprises a protein tag domain positioned, in a 5’ to 3’ orientation, either: (a) between the first HDR region and the first nucleic acid encoding a cleavage site; or (b) between the second nucleic acid encoding a cleavage site and the second HDR region.
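By way of non-limiting illustration only, a candidate homology arm length may be compared against the length ranges recited above. The Python sketch treats each "about" bound as a hard inclusive limit, which is a simplifying assumption of the example; the names `RECITED_ARM_RANGES_BP` and `matching_ranges` are hypothetical.

```python
from typing import List, Tuple

# Ranges recited above, in base pairs.
# ASSUMPTION: "about" is treated as an exact inclusive bound.
RECITED_ARM_RANGES_BP: List[Tuple[int, int]] = [
    (10, 30),
    (50, 500),
    (100, 900),
    (500, 900),
]


def matching_ranges(arm_length_bp: int) -> List[Tuple[int, int]]:
    """Return every recited range that contains the given arm length."""
    return [
        (low, high)
        for low, high in RECITED_ARM_RANGES_BP
        if low <= arm_length_bp <= high
    ]


if __name__ == "__main__":
    for length in (20, 40, 500):
        print(length, matching_ranges(length))
```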
000104 In some embodiments, the first nucleic acid sequence encoding a selection domain encodes a fluorescent protein. In some embodiments, the fluorescent protein is chosen from a domain encoding the protein in Table Z. In some embodiments, the second nucleic acid encoding a selection domain encodes a selection domain that confers resistance to exposure of a toxic chemical. In some embodiments, the toxic chemical is an antibiotic. In some embodiments, the antibiotic is chosen from Table X, or the gene that confers resistance to the antibiotic is chosen from one or a combination of nucleic acid sequences disclosed in Table X or Table Y.
000105 In some embodiments, the first multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:59. In some embodiments, the second multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:2. In some embodiments, the nucleic acid molecule further comprises an origin of replication comprising at least 70% sequence identity to SEQ ID NO:56. In some embodiments, the nucleic acid molecule further comprises an origin of replication comprising at least 70% sequence identity to SEQ ID NO:57. In some embodiments, elements (i) through (v) are positioned in a modification element, wherein the nucleic acid molecule further comprises a regulatory sequence operably linked to a third nucleic acid sequence encoding a selection domain that is positioned outside of the modification element. In some embodiments, the third nucleic acid sequence encoding a selection domain confers puromycin resistance. In some embodiments, elements (i) through (v) are positioned in a modification element, and wherein the nucleic acid molecule further comprises an origin of replication positioned outside of the modification element. In some embodiments, the disclosure relates to a composition comprising a nucleic acid molecule comprising at least a first and second origin of replication. In some embodiments, the first and second origin of replication are a nucleic acid sequence comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:56 and SEQ ID NO:57, respectively. In some embodiments, the nucleic acid molecule is a plasmid, a double stranded DNA or a single stranded DNA molecule. In some embodiments, the composition further comprises a transfection reagent. In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:34 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:34. In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:35 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:35.
000106 In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:36 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:36. In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:37 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:37. In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:38 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:38.
000107 In some embodiments, the nucleic acid molecule comprises a selection domain comprising SEQ ID NO:39 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:39. In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:29 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:29.
000108 In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:30 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:30.
000109 In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:31 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:31.
000110 In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:32 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:32.
000111 In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:33 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:33. In some embodiments, the nucleic acid molecule comprises a cleavage domain comprising SEQ ID NO:63 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:63.
000112 In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:49 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:49. In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:50 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:50.
000113 In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:51 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:51.
000114 In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:52 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:52. In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:53 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:53. In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:54 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:54.
000115 In some embodiments, the nucleic acid molecule comprises a nucleic acid conferring prokaryotic antibiotic resistance comprising SEQ ID NO:58 or a functional fragment thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:58. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a LoxP site comprising SEQ ID NO:60 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:60. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a LoxP site comprising SEQ ID NO:61 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:61. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:60 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:60; and SEQ ID NO:61 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:61.
000116 In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:40 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:40.
000117 In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:41 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:41. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:42 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:42. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:43 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:43. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:44 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:44.
000118 In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:45 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:45. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:46 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:46. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:47 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:47. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:48 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:48. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising a multiple cloning site comprising SEQ ID NO:59 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:59. In some embodiments, the nucleic acid molecule of the disclosure comprises any two of the above-mentioned nucleic acid sequences that are multiple cloning sites.
000119 In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:62 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:62. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:65 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:65. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:66 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:66. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:67 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:67. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence encoding a protein tag as a selection domain comprising SEQ ID NO:68 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:68.
000120 In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:55 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:55. In some embodiments, the nucleic acid molecule comprises a nucleic acid sequence comprising SEQ ID NO:69 or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:69, also known as pYC.
000121 In a second aspect, the disclosure provides a cell or a plurality of cells comprising an endogenous nucleic acid sequence encoding an expressible amino acid sequence, said endogenous nucleic acid sequence modified on its amino or carboxy terminus by an expressible exogenous modification element comprising: (i) at least a first and a second nucleic acid sequence that each encode a selection domain; (ii) a first nucleic acid sequence encoding a cleavage site positioned between the amino or carboxy terminus and the first nucleic acid sequence that encodes a selection domain; and (iii) a second nucleic acid sequence encoding a cleavage site positioned between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain. In some embodiments, the modification element further comprises a nucleic acid sequence encoding a protein tag. In some embodiments, the protein tag is any amino acid sequence encoded by a sequence chosen from SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62, or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence encoded by SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62. In some embodiments, the protein tag is any amino acid sequence encoded by a sequence chosen from SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62, or a variant thereof comprising 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence encoded by SEQ ID NO:65 through SEQ ID NO:68 or SEQ ID NO:62; and the cell also comprises one or more antibiotic selection domains chosen from one or a combination of SEQ ID NO:34 - 39, 49 - 54, 58, or 64.
000122 In some embodiments, the plurality of cells comprises one or a combination of compositions according to the first aspect of the invention. In some embodiments, the cells comprise NB458 cells, 293T cells and/or Jurkat cells. In some embodiments, at least about 30% of the cells comprise the modification element or a portion thereof in their endogenous DNA from about 7 to about 18 days in culture. In some embodiments, the cells have a doubling time of about 4 days. In some embodiments, the cell is the cell line identified in Table Z, comprises any of the nucleotide sequences disclosed herein and has the doubling time identified in Table Z.
000123 In a third aspect, the disclosure provides a method of culturing a cell comprising: exposing the composition of the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell. In some embodiments, the plurality of cells comprise at least one NB458 cell and at least one non-cancerous cell. In some embodiments, the method further comprises exposing the one or plurality of cells to a selection stimulus sensitive to the first or second nucleic acid sequence encoding the selection domain. In some embodiments, the method further comprises exposing the one or plurality of cells to a selection stimulus sensitive to the first and second nucleic acid sequence encoding the selection domain. In some embodiments, the selection stimulus is an antibiotic. In some embodiments, the antibiotic is chosen from one or a combination of antibiotics from Table Y. In some embodiments, the antibiotic is puromycin. In some embodiments, the selection stimulus is exposure to light with a wavelength from about 500 nm to about 650 nm. In some embodiments, the selection stimulus is exposure to light with a wavelength in the range of any of the stimulant wavelengths disclosed in Table Y if the fluorescent protein corresponding to that wavelength is expressed in the cell.
000124 In some embodiments, the method further comprises a step of exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells. In some embodiments, the method further comprises a step of culturing the one or plurality of cells from about 10 to about 14 days. In some embodiments, the method further comprises a step of culturing the one or plurality of cells from about 10 to about 60 days or a sufficient time after exposure of the cell or plurality of cells to a selection agent, such as an antibiotic, for a period of time to kill cells in the culture that did not become transfected and/or whose endogenous DNA was not modified by the nucleic acid molecule of the disclosure.
000125 In a fourth aspect, the disclosure provides a method of editing endogenous DNA of one or a plurality of cells comprising exposing the nucleic acid molecule according to the first aspect to one or plurality of cells for a period of time sufficient to transfect the cell. In some embodiments, the method further comprises exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to exposing the cells to the nucleic acid molecule according to the first aspect, such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence. In some embodiments, the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence. In some embodiments, the method further comprises culturing the one or plurality of cells for no less than about 7 days.
000126 In a fifth aspect, the disclosure provides a method of isolating a protein in a cell comprising (a) exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell. In some embodiments, the method further comprises (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence. In some embodiments, the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence. In some embodiments, the method further comprises culturing the one or plurality of cells for no less than about 7 days. In some embodiments, the method further comprises allowing the one or plurality of cells to express a protein modified at the target sequence with the modification element, wherein the modification element comprises a nucleic acid encoding a protein tag. In some embodiments, the method further comprises isolating the protein by exposing the protein tag to one or a plurality of capture elements that associate with or bind to the protein tag. In some embodiments, the method further comprises isolating or precipitating the capture element.
000127 In a sixth aspect, the disclosure provides a method of endogenously labeling a protein in a cell comprising (a) exposing the nucleic acid molecule of the first aspect to the one or plurality of cells for a period of time sufficient to transfect the cell. In some embodiments, the method further comprises (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence. In some embodiments, the method further comprises exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence, such that the protein is expressed with the modification element at the target sequence. In some embodiments, the method further comprises culturing the one or plurality of cells for no less than about 7 days. In some embodiments, the step or steps of exposing the cell to the nucleic acid molecule and/or a Cas protein is free of exposure of the cells to a viral particle or a viral vector, whether that vector is replication-deficient or attenuated.
000128 In a seventh aspect, the disclosure provides a method of screening for a therapeutic agent in a cell comprising (a) exposing any one or plurality of cells according to the second aspect to a pathogen. In some embodiments, the pathogen is chosen from one or a combination of lentiviruses, hepatitis viruses, papilloma viruses, coronaviruses, influenza viruses and rotaviruses. In some embodiments, the method further comprises exposing the one or plurality of cells to a library of agents. In some embodiments, the step of exposing is performed in the presence or absence of a viral inhibitor. In some embodiments, the pathogen is a bacterial cell. In some embodiments, the pathogen is a fungal cell. In some embodiments, the one or plurality of cells are human cells.
000129 Gene editing enzymes of the disclosure are chosen from meganucleases, transposases, and Cas proteins. Another aspect of the disclosure relates to a system comprising a CRISPR enzyme (or "Cas protein") or a nucleotide sequence encoding one or more Cas proteins; and the nucleic acid molecules of the disclosure. Any protein capable of enzymatic activity in cooperation with a guide sequence is a Cas protein. In some embodiments, the disclosure relates to a system comprising a vector comprising a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein from the Cas family of enzymes. In some embodiments, the disclosure relates to a system or composition comprising any one or plurality of Cas proteins either individually or in combination with one or a plurality of guide sequences. Compositions of one or a plurality of Cas proteins may be administered to a cell with any of the disclosed nucleic acid sequences sequentially or contemporaneously. Non-limiting examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, type V CRISPR-Cas systems, variants and fragments thereof, or modified versions thereof comprising at least 70% sequence identity to the sequences of Table C. These enzymes are known; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2. In some embodiments, the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9. In some embodiments the CRISPR enzyme is Cas9, and may be Cas9 from S. pyogenes or S. pneumoniae. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands of endogenous DNA in the disclosed cell or cells at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a CRISPR enzyme or Cas protein that is mutated with respect to a corresponding wild-type enzyme, such that the CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A. In some embodiments, a Cas9 nickase may be used in combination with guide sequence(s), e.g., two guide sequences, which target respectively sense and antisense strands of the DNA target. Other mutations may be useful; where the Cas9 or other CRISPR enzyme is from a species other than S. pyogenes, mutations in corresponding amino acids may be made to achieve similar effects. In some embodiments, the composition of the disclosure comprises an amino acid sequence of at least about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to Cas9 below:
TABLE C - Cas proteins
Accession Numbers of Cas proteins (or those related with Cas-like function) and Nucleic Acids encoding the same. All amino acid and nucleic acid sequences associated with the Accession Numbers below as of June 19, 2023, are incorporated by reference in their entireties. Any mutants or variants that comprise at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% sequence identity to the encoded nucleic acids or amino acids set forth in the Accession Numbers below are also incorporated by reference in their entireties. NC_014644.1 ; NC_002967.9;
NC_007929.1; NC_000913.3; NC_004547.2; NC_009380.1; NC_011661.1; NC_010175.1; NC_010175.1; NC_010175.1; NC_003413.1; NC_000917.1; NC_002939.5; NC_018227.2;
NC_004829.2; NC_021921.1; NC_014160.1; NC_011766.1; NC_007681.1; NC_021592.1; NC_021592.1 ; NC_021169.1 ; NC_020517.1; NC_018656.1; NC_018015.1; NC_018015.1 ; NC_017946.1; NC_017576.1; NC_017576.1; NC_015865.1; NC_015865.1; NC_015680.1; NC_015680.1; NC_015474.1; NC_015435.1; NC_013790.1; NC_013790.1; NC_012883.1; NC_012470.1 ; NC_016051.1; NC_010610.1 ; NC_009515.1; NC_008942.1 ; NC_007181.1; NC_007181.1; NC_006624.1; NC_006448.1; NC_002935.2; NC_002935.2; NC_002950.2;
NC_002950.2; NC_002663.1; NC_002663.1; NC_004557.1; NC_004557.1; NC_019943.1; NC_019943.1; NC_019943.1; NC_017459.1; NC_017459.1; NC_015518.1; NC_015460.1; NC_015416.1; NC_014933.1; NC_013961.1; NC_013202.1; NC_013158.1; NC_009464.1; NC_008508.1; NC_007426.1; NC_000917.1; NC_003901.1; NC_003901.1; NC_003106.2; NC_009434.1; NC_005085.1; NC_005085.1; NC_020247.1; NC_020247.1; NC_020246.1;
NC_020246.1 ; NC_018224.1 ; NC_015943.1 ; NC_011138.3; NC_009778.1 ; NC_006834.1 ; NC_014228.1 ; NC_010002.1 ; NC_013892.1 ; NC_010296.1 ; NC_009615.1; NC_012632.1;
NC_012632.1; NC_012588.1; NC_012588.1; NC_007643.1; NC_002939.5; NC_011296.1;
NC_011296.1; NC_018609.1; NC_021355.1; NC_021355.1; NC_020800.1 ; NC_019942.1 ;
NC_019792.1; NC_015958.1; NC_015678.1; NC_015636.1; NC_015562.1; NC_014222.1;
NC_014222.1; NC_014002.1; NC_013887.1; NC_013156.1; NC_011832.1; NC_009953.1; NC_009635.1; NC_009634.1; NC_008618.1; NC_007955.1; NC_007955.1; NC_007955.1; NC_007955.1; NC_007955.1; NC_007796.1; NC_002754.1; NC_002754.1; NC_011835.1; NC_013198.1; NC_000962.3; NC_002163.1; NC_017034.1; NC_009089.1; NC_008698.1;
NC_020419.1; NC_020419.1; NC_020419.1; NC_015847.1; NC_014374.1; NC_013520.1;
NC_010482.1 ; NC_009776.1 ; NC_009776.1 ; NC_009033.1; NC_000916.1; NC_018015.1;
NC_015518.1; NC_014537.1; NC_009440.1; NC_007644.1; NC_007644.1; NC_022246.1;
NC_019943.1 ; NC_016023.1 ; NC_016023.1 ; NC_015416.1; NC_013722.1 ; NC_013722.1 ;
NC_009464.1; NC_007643.1; NC_007643.1; NC_007643.1; NC_003106.2; NC_004342.2;
NC_018658.1; NC_017276.1; NC_017275.1; NC_016112.1; NC_016112.1; NC_003552.1;
NC_003197.1; NC_003198.1; NC_012726.1 ; NC_012623.1 ; NC_015964.1 ; NC_023069.1; NC_023044.1; NC_022777.1; NC_022777.1; NC_022777.1; NC_013769.1; NC_013769.1;
NC_011832.1; NC_011296.1; NC_009712.1; NC_009634.1; NC_009439.1; NC_009135.1;
NC_008599.1; NC_007796.1; NC_007796.1; NC_007796.1; NC_007355.1; NC_021082.1; NC_018001.1; NC_009785.1; NC_022084.1; NC_018092.1; NC_014804.1; NC_014147.1;
NC_009053.1; NC_000961.1; NC_000961.1; NC_021058.1; NC_018876.1; NC_018876.1;
NC_018081.1; NC_011567.1; NC_016901.1; NC_014500.1; NC_013715.1; NC_019977.1;
NC_019042.1; NC_017274.1; NC_015954.1; NC_015676.1; NC_015320.1; NC_014122.1;
NC_014122.1; NC_013407.1; NC_014961.1; NC_013926.1; NC_013926.1; NC_021353.1;
NC_008818.1; NC_021058.1; NC_015151.1; NC_013849.1; NC_009051.1; NC_018876.1;
NC_018876.1; NC_014507.1; NC_015574.1; NC_014500.1; NC_012622.1; NC_012589.1;
NC_009515.1; NC_017275.1 ; NC_000913.3; NC_017527.1; NC_018227.2; NC_007355.1;
NC_014106.1; NC_010610.1; NC_008054.1; NC_007164.1; NC_015760.1; NC_009953.1;
NC_010572.1; NC_009613.3; NC_014334.1; NC_008526.1; NC_026150.1; NC_015776.1;
NC_007116.6; NC_012779.2; NC_003901.1; NC_020892.1 ; NC_011832.1; NC_003143.1;
NC_003143.1; NC_008800.1; NC_011308.1; NC_008942.1; NC_007297.1; NC_005877.1;
NC_005877.1 ; NC_002689.2; NC_006085.1 ; NC_004116.1 ; NC_010397.1 ; NC_009917.1 ;
NC_012490.1 ; NC_006067.1 ; NW_004197518.1; NC_022777.1 ; NC_019042.1 ; NC_004547.2;
NC_002695.1; NC_017634.1; NC_003143.1; NC_002737.2; NC_002737.2; NC_000918.1;
NC_020913.1; NC_006448.1; NC_022093.1; NC_022093.1; NC_015680.1; NC_007297.1;
NC_004350.2; NC_004350.2; NC_004350.2; NC_004350.2; NC_003454.1; NC_000853.1;
NC_018876.1; NC_009440.1; NC_009009.1; NC_009009.1; NC_002932.3; NC_002932.3;
NC_026150.1; NC_003552.1; NC_025263.1; NC_016112.1; NC_011098.1; NC_007643.1;
NC_007643.1; NC_007643.1; NC_006347.1; NC_005140.1; NC_004342.2; NC_002945.3;
NW_007382731.1; NW_007381138.1; NC_024320.1; NW_005756335.1; NW_003384463.1;
NC_019977.1; NC_011296.1; NC_007929.1; NC_000913.3; NC_003413.1; NC_002754.1;
NC_010175.1; NC_010175.1; NC_010175.1; NC_011661.1; NC_014537.1; NC_012470.1;
NC_004829.2; NC_015516.1; NC_014374.1; NC_009033.1; NC_007681.1; NC_002689.2;
NC_006085.1; NC_021592.1; NC_021592.1; NC_021169.1; NC_020517.1; NC_018015.1;
NC_018015.1; NC_018015.1; NC_017946.1; NC_017946.1; NC_017576.1; NC_017576.1;
NC_015865.1; NC_015865.1; NC_015847.1; NC_015680.1; NC_015680.1; NC_015474.1;
NC_015435.1; NC_014106.1; NC_013790.1; NC_012883.1; NC_012804.1; NC_016051.1;
NC_011529.1 ; NC_010482.1 ; NC_009515.1; NC_009440.1 ; NC_008942.1 ; NC_008054.1 ;
NC_007181.1; NC_006624.1; NC_006448.1; NC_006448.1; NC_002935.2; NC_002935.2;
NC_002950.2; NC_002663.1; NC_019943.1; NC_019943.1; NC_017459.1; NC_017459.1;
NC_016023.1; NC_015518.1; NC_015460.1; NC_015460.1; NC_015416.1; NC_014933.1;
NC_013202.1; NC_013158.1; NC_009464.1; NC_008508.1; NC_003901.1; NC_009434.1;
NC_005085.1; NC_020247.1; NC_020246.1; NC_018224.1; NC_015943.1; NC_009380.1;
NC_006834.1; NC_003552.1; NC_017276.1; NC_017275.1; NC_010296.1; NC_009615.1;
NC_012632.1; NC_012632.1; NC_012623.1; NC_012588.1; NC_012588.1; NC_007181.1;
NC_002939.5; NC_020247.1; NC_020246.1; NC_011296.1; NC_011296.1; NC_011296.1;
NC_018609.1; NC_015964.1; NC_021355.1; NC_020800.1; NC_019942.1; NC_019792.1;
NC_015958.1; NC_015760.1; NC_015678.1; NC_015636.1; NC_015562.1; NC_014222.1;
NC_014222.1; NC_013887.1; NC_013769.1; NC_013156.1; NC_009953.1; NC_009635.1;
NC_009634.1; NC_009135.1; NC_008618.1; NC_008599.1; NC_007955.1; NC_007796.1;
NC_007355.1; NC_002754.1 ; NC_010572.1 ; NC_015151.1; NC_000962.3 ; NC_021921.1 ;
NC_002163.1; NC_017034.1; NC_009089.1; NC_008698.1; NC_020419.1; NC_020419.1;
NC_020419.1 ; NC_014160.1 ; NC_011766.1 ; NC_007681.1 ; NC_000916.1 ; NC_017527.1 ;
NC_013790.1; NC_013790.1; NC_000917.1; NC_000917.1; NC_004557.1; NC_004557.1;
NC_022246.1; NC_017384.1; NC_013722.1; NC_007643.1; NC_007643.1; NC_007643.1;
NC_007643.1; NC_007643.1; NC_002967.9; NC_004342.2; NC_016112.1; NC_016112.1;
NC_005140.1; NC_005140.1; NC_012726.1; NC_023069.1; NC_023044.1; NC_022777.1;
NC_022777.1; NC_011296.1; NC_021355.1; NC_009634.1; NC_007796.1; NC_007355.1;
NC_021082.1; NC_013926.1; NC_020913.1; NC_014961.1; NC_014658.1; NC_013198.1;
NC_005877.1; NC_009785.1; NC_022084.1; NC_018092.1; NC_014804.1; NC_000961.1;
NC_021058.1; NC_018081.1; NC_013849.1; NC_011567.1; NC_015574.1; NC_014500.1;
NC_012622.1; NC_012589.1; NC_012589.1; NC_019977.1; NC_019042.1; NC_017274.1;
NC_017274.1; NC_015954.1; NC_015676.1; NC_015320.1; NC_014122.1; NC_014122.1;
NC_013407.1; NC_011835.1; NC_021353.1; NC_018001.1; NC_008818.1; NC_000961.1;
NC_015931.1; NC_019042.1 ; NC_013961.1; NC_011138.3; NC_009778.1 ; NC_014228.1 ;
NC_013892.1; NC_011832.1; NC_009439.1; NC_007955.1; NC_007796.1; NC_013520.1;
NC_016070.1; NC_007426.1; NC_003106.2; NC_003106.2; NC_018227.2; NC_000913.3;
NC_005085.1 ; NC_009613.3; NC_014334.1; NW_006726754.1 ; NC_002663.1 ; NC_003143.1 ;
NC_003076.8; NC_015666.1; NC_014644.1; NC_004116.1; NC_003454.1; NC_011567.1;
NC_024905.1; NC_003295.1; NC_008526.1; NC_012871.1; NC_012871.1; NC_010682.1;
NC_002737.2; NC_002737.2; NC_017954.1; NC_009515.1; NC_007297.1; NC_007297.1;
NC_004350.2; NC_004350.2; NC_000853.1; NC_009009.1; NC_007644.1; NC_007644.1;
NC_002967.9; NC_002932.3; NC_002932.3; NC_007643.1; NC_007606.1; NC_006347.1;
NC_002945.3; NW_006804726.1; NW_006383769.1; NC_013769.1; NC_014644.1;
NC_000913.3; NC_019943.1; NC_019943.1; NC_011661.1; NC_010175.1; NC_002950.2;
NC_004547.2; NC_013887.1; NC_013156.1; NC_007426.1; NC_002939.5; NC_021169.1;
NC_020517.1; NC_018015.1; NC_017946.1; NC_015865.1; NC_015680.1; NC_009515.1;
NC_004557.1; NC_005085.1; NC_006834.1; NC_011296.1; NC_010175.1; NC_020800.1;
NC_015958.1; NC_009635.1; NC_008618.1; NC_007355.1; NC_009089.1; NC_020419.1;
NC_021592.1; NC_021592.1; NC_015847.1; NC_013790.1; NC_016051.1; NC_007644.1;
NC_007644.1; NC_017459.1; NC_015416.1; NC_013722.1; NC_007643.1; NC_007643.1;
NC_007643.1; NC_009434.1; NC_005085.1; NC_003552.1; NC_014318.1; NC_021355.1;
NC_014222.1; NC_014222.1; NC_011832.1; NC_009634.1; NC_009135.1; NC_021082.1;
NC_000961.1 ; NC_015574.1; NC_014228.1 ; NC_014122.1 ; NC_009439.1; NC_017459.1;
NC_015460.1; NC_011138.3; NC_009380.1; NC_017275.1; NC_013892.1; NC_021353.1;
NC_015676.1; NC_011296.1; NC_007955.1; NC_009953.1; NC_009953.1; NC_021921.1;
NC_014160.1; NC_010482.1; NC_009776.1; NC_009033.1; NC_016070.1; NC_016070.1;
NC_015435.1; NC_009440.1; NC_017384.1; NC_013722.1; NC_016112.1; NC_012726.1;
NC_022777.1; NC_008698.1; NC_008599.1; NC_007955.1; NC_007355.1; NC_014147.1;
NC_021058.1; NC_021058.1; NC_016901.1; NC_014500.1; NC_014500.1; NC_014961.1;
NC_018001.1; NC_015931.1; NC_015151.1; NC_013849.1; NC_013715.1; NC_011766.1;
NC_018001.1; NC_014644.1; NC_017034.1; NC_009033.1; NC_002754.1; NC_009089.1;
NC_002939.5; NC_014106.1; NC_010610.1; NC_008054.1; NC_003413.1; NC_009464.1;
NC_008526.1; NC_015474.1; NC_012804.1; NC_015518.1; NC_017276.1; NC_017275.1;
NC_012632.1; NC_012623.1; NC_012588.1; NC_015636.1; NC_015562.1; NC_013769.1;
NC_002754.1; NC_017634.1; NC_014160.1; NC_011766.1; NC_016070.1; NC_015435.1;
NC_009440.1; NC_009440.1; NC_012726.1; NC_012632.1; NC_012588.1; NC_013887.1;
NC_013156.1; NC_011296.1; NC_002754.1; NC_011835.1; NC_018092.1; NC_021058.1;
NC_012622.1; NC_012589.1; NC_015954.1; NC_013407.1; NC_018001.1; NC_013849.1;
NC_017274.1; NC_000913.3; NC_003413.1; NC_018092.1; NC_000961.1; NC_000918.1;
NC_007796.1; NC_000868.1; NC_022084.1; NC_018015.1; NC_015865.1; NC_015680.1;
NC_015474.1; NC_014804.1; NC_012470.1; NC_006624.1; NC_002663.1; NC_016023.1;
NC_013202.1; NC_013158.1; NC_008508.1; NC_000917.1; NC_015943.1; NC_019792.1;
NC_019042.1; NC_015760.1; NC_015678.1; NC_014122.1; NC_004119.1; NC_007681.1;
NC_007681.1; NC_007297.1; NC_002935.2; NC_002932.3; NC_003454.1; NC_014933.1;
NC_011567.1; NC_004342.2; NC_016112.1; NC_003197.1; NC_022777.1; NC_015320.1;
NC_002695.1; NC_003143.1; NC_002737.2; NC_012883.1; NC_010610.1; NC_000916.1;
NC_004350.2; NC_000853.1; NC_000917.1; NC_006347.1; NC_018658.1; NC_015870.2;
NC_011751.1; NC_013961.1; NC_009778.1; NC_020990.1; NC_016112.1; NC_000868.1;
NC_003413.1; NC_022084.1; NC_018092.1; NC_017946.1; NC_015680.1; NC_015680.1;
NC_015474.1; NC_014106.1; NC_012804.1; NC_009053.1; NC_008054.1; NC_006624.1;
NC_000961.1; NC_021058.1; NC_015518.1; NC_018224.1; NC_017276.1; NC_017276.1; NC_017275.1; NC_017275.1; NC_010296.1; NC_010296.1; NC_009615.1; NC_012632.1; NC_012632.1; NC_012632.1; NC_012623.1; NC_012623.1; NC_012622.1; NC_012622.1;
NC_012589.1 ; NC_012589.1 ; NC_012588.1 ; NC_012588.1 ; NC_012588.1 ; NC_020892.1 ;
NC_019792.1; NC_017970.1; NC_017274.1; NC_017274.1; NC_016159.1; NC_013887.1;
NC_013769.1; NC_013156.1; NC_002754.1; NC_002754.1; NC_002754.1; NC_003687.1;
NC_006814.3; NC_006814.3; NC_014418.1; NC_010152.1; NC_017946.1; NC_017954.1;
NC_009776.1 ; NC_008818.1 ; NC_008818.1; NC_000961.1; NC_000918.1; NC_015931.1;
NC_015931.1 ; NC_014537.1 ; NC_007181.1; NC_006624.1 ; NC_003106.2; NC_004342.2;
NC_020247.1; NC_020246.1; NC_018472.1; NC_012623.1; NC_012589.1; NC_006045.2;
NC_023069.1; NC_022777.1; NC_019942.1; NC_017274.1; NC_013769.1; NC_009953.1;
NC_008698.1; NC_007493.2; NC_002754.1; NC_002754.1; NC_005125.1; NC_021347.1;
NC_022093.1 ; NC_022093.1 ; NC_015931.1; NC_007164.1 ; NC_015416.1; NC_015151.1;
NC_000917.1; NC_003106.2; NC_002932.3; NC_014500.1; NC_004337.2; NC_007087.3;
NC_012726.1; NC_024314.1; NW_003120284.1; NW_003120529.1; NW_003126883.1;
NW_003384275.1; NC_023069.1; NC_016567.1; NC_009954.1; NC_000913.3; NC_000913.3;
NC_000913.3; NC_000913.3; NC_027204.1; NC_002754.1; NC_010175.1; NC_016070.1;
NC_000868.1; NC_003413.1; NC_017527.1; NC_002939.5; NC_018227.2; NC_007355.1;
NC_014205.1; NC_014160.1; NC_009033.1; NC_007681.1; NC_020517.1; NC_018015.1;
NC_016070.1; NC_015865.1; NC_015847.1; NC_015680.1; NC_015474.1; NC_015315.1;
NC_013790.1; NC_012883.1; NC_012470.1; NC_016051.1; NC_011529.1; NC_010610.1;
NC_009515.1; NC_009440.1 ; NC_007181.1; NC_006624.1 ; NC_019943.1 ; NC_019943.1 ;
NC_017459.1 ; NC_015518.1; NC_014933.1 ; NC_007426.1 ; NC_003901.1; NC_003106.2;
NC_003106.2; NC_009434.1; NC_005085.1; NC_020247.1; NC_020246.1; NC_006834.1;
NC_017276.1; NC_013158.1; NC_000917.1; NC_020247.1; NC_020246.1; NC_003552.1;
NC_017275.1; NC_010296.1; NC_012632.1; NC_012632.1; NC_012588.1; NC_012588.1;
NC_011296.1; NC_011296.1; NC_021355.1; NC_019792.1; NC_015958.1; NC_015636.1;
NC_015562.1; NC_013887.1; NC_013156.1; NC_007955.1; NC_007355.1; NC_002754.1;
NC_002754.1; NC_009778.1; NC_000962.3; NC_009089.1; NC_021592.1; NC_017946.1;
NC_015680.1; NC_018015.1; NC_014537.1; NC_014537.1; NC_012883.1; NC_012804.1;
NC_018224.1; NC_017459.1; NC_016023.1; NC_015943.1; NC_023069.1; NC_023044.1;
NC_019942.1; NC_014222.1; NC_008599.1; NC_002754.1; NC_022084.1; NC_022084.1;
NC_022084.1; NC_018092.1; NC_014804.1; NC_014804.1; NC_021058.1; NC_017274.1;
NC_015320.1; NC_014122.1 ; NC_013407.1; NC_014658.1; NC_000961.1; NC_000961.1;
NC_018092.1; NC_015151.1; NC_013849.1; NC_011567.1; NC_013926.1; NC_002754.1;
NC_013520.1; NC_013520.1; NC_007181.1; NC_007426.1; NC_003106.2; NC_020247.1;
NC_020246.1; NC_017276.1; NC_017275.1; NC_012632.1; NC_012623.1; NC_012588.1;
NC_011296.1; NC_011296.1; NC_013769.1; NC_013769.1; NC_012726.1 ; NC_021058.1;
NC_012622.1; NC_012589.1; NC_015151.1; NC_015954.1; NC_004547.2; NC_000913.3;
NC_010175.1; NC_016070.1; NC_002950.2; NC_009380.1; NC_009089.1; NC_009495.1;
NC_022777.1; NC_007796.1; NC_014106.1; NC_008054.1; NC_002663.1; NC_013961.1;
NC_011138.3; NC_014228.1; NC_020990.1; NC_013892.1; NC_015760.1; NC_009953.1;
NC_009953.1; NC_009439.1; NC_010572.1; NC_002971.3; NC_021353.1; NC_014644.1;
NC_010610.1; NC_002935.2; NC_013722.1; NC_009464.1; NC_007643.1; NC_002939.5;
NC_010002.1; NC_003198.1; NC_014318.1; NC_008526.1; NC_011832.1; NC_008701.1;
NC_007955.1; NC_007796.1; NC_014147.1; NC_009053.1; NC_014500.1; NC_016901.1;
NC_015676.1; NC_020418.1; NW_003613864.1; NC_000011.10; NW_006800487.1;
NC_006603.3; NW_003614246.1; NC_012602.1; NC_013790.1; NC_009515.1; NC_015574.1;
NC_007871.1; NC_006478.3; NC_004347.2; NC_009006.2; NT_078266.2; NC_003454.1;
NW_006212882.1; NC_011913.1; NC_009917.1; NW_007675828.1 ; NW_007370782.1;
NW_007248774.1; NC_023642.1; NW_006775074.1; NW_006730123.1; NW_006718075.1;
NW_006711808.1; NW_006400147.1 ; NW_006408681.1; NW_006384369.1 ;
NW_005882764.1; NW_006200097.1; NC_022285.1; NW_004209914.1; NC_018435.1;
NC_018732.2; NC_018165.1; NC_027879.1; NC_019830.1; NC_013906.1; NC_009150.2;
NC_000964.3; NC_002696.2; NC_017034.1; NC_007581.1; NC_018001.1; NC_017954.1;
NC_014961.1; NC_014961.1; NC_011766.1; NC_009776.1; NC_007681.1; NC_007681.1;
NC_005877.1; NC_005877.1; NC_005877.1; NC_002689.2; NC_000918.1; NC_000918.1;
NC_021592.1 ; NC_015931.1; NC_015931.1; NC_010482.1 ; NC_009033.1; NC_000853.1 ;
NC_015518.1; NC_015416.1; NC_013849.1; NC_009440.1; NC_009440.1; NC_009009.1;
NC_007644.1; NC_000917.1; NC_003106.2; NC_011916.1; NC_007643.1; NC_006347.1;
NC_004342.2; NC_002945.3; NC_012589.1; NC_012623.1; NC_011672.1; NC_016131.1;
NW_004454187.1; NC_019862.1; NC_010451.3; NC_015768.1; NC_020173.2; NC_017972.1;
NC_015320.1; NC_011832.1; NC_010175.1; NC_010175.1; NC_008599.1; NC_006461.1;
NC_015637.1; NC_009784.1; NC_016114.1; NC_001493.2; NC_008508.1; NC_003197.1;
NC_017844.1 ; NW_006399893.1 ; NC_002695.1 ; NC_017634.1; NC_003143.1; NC_017941.2;
NC_004605.1; NC_004605.1; NC_019411.1 ; NC_007164.1 ; NC_002932.3; NC_005085.1;
NC_027207.1 ; NC_016452.1 ; NC_016112.1; NC_009784.1.
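The sequence identity thresholds recited for the Table C sequences (at least 70% and upward) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it assumes identity is defined as identical residues divided by aligned columns; other definitions of the denominator exist, and the disclosure does not specify one. The function names percent_identity and meets_threshold are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity = identical residues / aligned columns,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap paired with a residue counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair meets or exceeds the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Illustrative ten-residue fragments only, not sequences from Table C.
    reference = "MDKKYSIGLD"
    variant = "MDKKYAIGLA"
    print(percent_identity(reference, variant))
    print(meets_threshold(reference, variant, 70.0))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how a threshold such as 70% would be applied once an alignment exists.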
000130 Methods of the disclosure relate to a method of modifying endogenous DNA of a cell by exposing the cell to a first nucleic acid molecule comprising a nucleic acid sequence encoding a Cas protein and a second nucleic acid molecule comprising (i) a first and a second homology donor (HDR) region complementary to a target domain; (ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequence; (iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain; (iv) a second nucleic acid sequence encoding a cleavage site positioned in a 5’ to 3’ orientation between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain; (v) a first and a second multiple cloning site, wherein the first multiple cloning site is positioned upstream of the first HDR region and the second multiple cloning site is positioned downstream of the second HDR region.
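The relative positions recited in items (i) through (v) above imply one linear 5’ to 3’ order for the second nucleic acid molecule. The following Python sketch models that order and checks a candidate layout against it. The element names, the coordinate convention (0-based start, exclusive end), and the example coordinates are hypothetical and chosen only for illustration; they are not taken from SEQ ID NO:55 or 69.

```python
from dataclasses import dataclass
from typing import List

# 5' to 3' order implied by items (i)-(v); the names are illustrative labels.
EXPECTED_ORDER = [
    "MCS_1",            # first multiple cloning site, upstream of HDR_1
    "HDR_1",            # first homology region
    "cleavage_site_1",  # between HDR_1 and selection_1
    "selection_1",      # first selection domain
    "cleavage_site_2",  # between selection_1 and selection_2
    "selection_2",      # second selection domain
    "HDR_2",            # second homology region
    "MCS_2",            # second multiple cloning site, downstream of HDR_2
]


@dataclass(frozen=True)
class Element:
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def check_layout(elements: List[Element]) -> List[str]:
    """Return a list of problems; an empty list means the order is satisfied."""
    by_name = {e.name: e for e in elements}
    missing = [n for n in EXPECTED_ORDER if n not in by_name]
    if missing:
        return [f"missing element: {n}" for n in missing]
    problems = []
    # Each element must end at or before the start of the next one.
    for left, right in zip(EXPECTED_ORDER, EXPECTED_ORDER[1:]):
        if by_name[left].end > by_name[right].start:
            problems.append(f"{left} must end before {right} starts")
    return problems


if __name__ == "__main__":
    # Hypothetical coordinates for a donor molecule.
    donor = [
        Element("MCS_1", 0, 40),
        Element("HDR_1", 40, 840),
        Element("cleavage_site_1", 840, 900),
        Element("selection_1", 900, 1500),
        Element("cleavage_site_2", 1500, 1560),
        Element("selection_2", 1560, 2280),
        Element("HDR_2", 2280, 3080),
        Element("MCS_2", 3080, 3120),
    ]
    print(check_layout(donor))
```

A check of this kind could be run on an annotated construct before cloning, but the annotation source and coordinates would have to come from the actual plasmid map.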
000131 Methods of the disclosure also relate to a method of manufacturing a cell or a method of labeling endogenous DNA of a cell comprising: (a) exposing the cell to any nucleic acid molecule disclosed herein for a time period sufficient to transfect the cell; (b) exposing the cell to a Cas protein for a time period sufficient to excise endogenous DNA in the cell; and (c) allowing a portion of the nucleic acid molecule to integrate into the endogenous DNA of the cell. In some embodiments, the cell is an isolated cell. In some embodiments, the cell is chosen from the cells in Table Z and has a doubling time disclosed in Table Z. In some embodiments, the method is free of exposing the cell to a viral particle or a viral vector. In some embodiments, the method is performed by transfection. In some embodiments, the method is performed by transfection such that the nucleic acid sequence of the disclosure (single-stranded or double-stranded DNA) is positioned within the cell; and then the cell is exposed to a gene editing enzyme, such as a Cas protein or a nucleic acid sequence encoding a Cas protein, such that the gene editing enzyme cuts endogenous DNA of the cell and facilitates integration of the DNA positioned between the HDR regions of the disclosed nucleic acid molecules into the target domain. The disclosure also relates to a method of altering expression of at least one gene product in a cell comprising introducing into a cell an engineered, non-naturally occurring CRISPR-associated (Cas) (CRISPR-Cas) system comprising: (a) a vector comprising a nucleotide sequence encoding any CRISPR enzyme disclosed herein, any mutated CRISPR enzyme having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence homology to any CRISPR enzyme disclosed herein (such as Table C), or functional fragment thereof; and (b) a nucleic acid sequence disclosed herein, wherein components (a) and (b) are located on the same or different vectors of the system; wherein the cell comprises endogenous DNA comprising a target domain and encoding a gene product; and wherein the CRISPR enzyme or functional fragment thereof cleaves the endogenous DNA molecule, whereby expression of the at least one gene product is altered. In some embodiments, the gene product is altered by integration of the DNA positioned between the HDR regions of the disclosed nucleic acid molecules into the target domain.
000132 In some embodiments, the disclosure relates to a composition comprising a cell line comprising one or a plurality of cells disclosed herein. In some embodiments, those cells comprise any one or combination of cells identified in Table Z and comprise a first and second nucleic acid sequence encoding a selection domain. In some embodiments, a selection domain is chosen from the amino acid sequences disclosed in Table Y. In some embodiments, the cell or cells comprise a nucleic acid molecule disclosed herein, a complementary sequence thereof and/or express a target protein with at least one protein tag. In some embodiments, the protein tag is chosen from the tags mentioned above or is chosen from those amino acid sequences disclosed in Table Y. In some embodiments, the cell or cells express one or a combination of amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y. In some embodiments, the cell or cell line comprises a mutation in an endogenous DNA wherein a portion of the endogenous DNA encoding a target protein is modified on its 5’ or 3’ end to express the nucleic acid encoding the protein tag on the resulting amino or carboxy end of the encoded target protein.
000133 In some embodiments, the same cell or cells are modified endogenously or transiently to simultaneously express at least one or two selection markers independent of the tag, such that expression of the target protein is not dependent on or regulated by expression of the selection markers. In some embodiments, the expression of the independently regulated selection markers is free of the regulatory sequence operably linked to the target proteins. In some embodiments, the selection markers of the regulatory sequence operably linked to the target proteins comprise a first nucleic acid sequence that confers antibiotic resistance to the cell or cells and the second nucleic acid sequence encodes expression of a physical protein, such as a fluorescent protein in the cell. In some embodiments, the protein tag is not the same physical protein marker. In some embodiments, the physical protein marker is a protein chosen from the amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y; and the protein tag is an amino acid sequence chosen from the section of Affinity tag sequence identified in Table X or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table X.
000134 In some embodiments, the disclosure relates to a composition comprising a nucleic acid molecule encoding one or a plurality of sequences chosen from the amino acid sequences identified in Table X. In some embodiments, the nucleic acid molecule comprises: one or more protease cleavage sites, affinity tag sequences positioned between one or more homology regions, one or more self-cleavage sequences, one or more mammalian antibiotic selection sequences, at least one multiple cloning site, one or more bacterial antibiotic selection sequences, and one or more origin of replication sequences. In some embodiments, the nucleic acid sequence comprises, consists of or consists essentially of SEQ ID NO:55 or 69, or variants thereof comprising about 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:55 or 69.
000135 Methods of the disclosure relate to a method of cloning or manufacturing a nucleic acid molecule disclosed herein. Using the methods of the invention, synthesized, amplified or digested nucleic acid molecules may be derived from a variety of sources. Nucleic acid molecules suitably cloned by the methods of the present invention may be DNA molecules (including cDNA molecules), RNA molecules (including polyadenylated RNA (polyA+ RNA), messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA) molecules) or DNA-RNA hybrid molecules, and may be single-stranded or double-stranded.
000136 The nucleic acid molecules to be cloned according to the methods of the present invention may be prepared synthetically according to standard organic chemical synthesis methods that will be familiar to one of ordinary skill. In some embodiments, the nucleic acid molecules may be obtained from natural sources, such as a variety of cells, tissues, organs or organisms. Cells that may be used as sources of nucleic acid molecules may be prokaryotic (bacterial cells, including those of species of the genera Escherichia, Bacillus, Serratia, Salmonella, Staphylococcus, Streptococcus, Clostridium, Chlamydia, Neisseria, Treponema, Mycoplasma, Borrelia, Legionella, Pseudomonas, Mycobacterium, Helicobacter, Erwinia, Agrobacterium, Rhizobium, and Streptomyces) or eukaryotic (including fungi (especially yeasts), plants, protozoans and other parasites, and animals including insects (particularly Drosophila spp. cells), nematodes (particularly Caenorhabditis elegans cells), and mammals (particularly human, rodent (rat or mouse), monkey, ape, canine, feline, equine, bovine and ovine cells, and most particularly human cells)).
000137 Mammalian somatic cells that may be used as sources of nucleic acids include blood cells (reticulocytes and leukocytes), endothelial cells, epithelial cells, neuronal cells (from the central or peripheral nervous systems), muscle cells (including myocytes and myoblasts from skeletal, smooth or cardiac muscle), connective tissue cells (including fibroblasts, adipocytes, chondrocytes, chondroblasts, osteocytes and osteoblasts) and other stromal cells (e.g., macrophages, dendritic cells, Schwann cells). Mammalian germ cells (spermatocytes and oocytes) may also be used as sources of nucleic acids for use in the invention, as may the progenitors, precursors and stem cells that give rise to the above somatic and germ cells (e.g., embryonic stem cells). Also suitable for use as nucleic acid sources are mammalian tissues or organs such as those derived from brain, kidney, liver, pancreas, blood, bone marrow, muscle, nervous, skin, genitourinary, circulatory, lymphoid, gastrointestinal and connective tissue sources, as well as those derived from a mammalian (including human) embryo or fetus. Any of the above prokaryotic or eukaryotic cells, tissues and organs may be normal, diseased, transformed, established, progenitors, precursors, fetal or embryonic. Diseased cells may, for example, include those involved in infectious diseases (caused by bacteria, fungi or yeast, viruses (including HIV) or parasites), in genetic or biochemical pathologies (e.g., cystic fibrosis, hemophilia, Alzheimer's disease, muscular dystrophy or multiple sclerosis) or in cancerous processes. Transformed or established animal cell lines may include, for example, COS cells, CHO cells, VERO cells, BHK cells, HeLa cells, HepG2 cells, K562 cells, F9 cells and the like. Other cells, cell lines, tissues, organs and organisms suitable as sources of nucleic acids for use in the present invention will be apparent to one of ordinary skill in the art. In addition, such nucleic acid molecules and cDNA libraries may be obtained commercially, for example from Life Technologies, Inc. (Rockville, Maryland) and other commercial suppliers that will be familiar to the skilled artisan.
000138 In some embodiments, the nucleic acid molecules to be cloned are amplified nucleic acid molecules. Nucleic acid molecules may be amplified by a number of methods, which may comprise one or more steps. For example, one such method comprises (a) contacting a first nucleic acid molecule, a first primer molecule which is complementary to a portion of the first nucleic acid molecule, a second nucleic acid molecule and a second primer molecule which is complementary to a portion of the second nucleic acid molecule, with one or more polypeptides having polymerase activity; (b) incubating the molecules and one or more polypeptides under conditions sufficient to form a third nucleic acid molecule complementary to all or a portion of the first nucleic acid molecule and a fourth nucleic acid molecule complementary to all or a portion of the second nucleic acid molecule; (c) denaturing the first and third and the second and fourth nucleic acid molecules; and (d) repeating steps (a) through (c) one or more times. Such amplification methods may be accomplished by any of a variety of techniques, including but not limited to use of the polymerase chain reaction (PCR; U.S. Patent Nos. 4,683,195 and 4,683,202), Strand Displacement Amplification (SDA; U.S. Patent No. 5,455,166), and Nucleic Acid Sequence-Based Amplification (NASBA; U.S. Patent No. 5,409,818). In some embodiments, the method of manufacturing or preparing the disclosed nucleic acid molecule comprises performing PCR to clone individual components of the nucleic acid molecule into the nucleic acid molecule, such as SEQ ID NO:69 or functional variants thereof comprising at least about 75% sequence identity to SEQ ID NO:69. Once the starting cells, tissues, organs, libraries or other samples are obtained, nucleic acid molecules to be cloned by the methods of the invention may be isolated by methods that are well-known in the art.
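The effect of repeating steps (a) through (c) can be illustrated with a short arithmetic sketch in Python. It assumes the common idealized model in which each cycle multiplies the number of template copies by (1 + E), where E is a per-cycle efficiency between 0 and 1, so that perfect efficiency doubles the copies every cycle. The function name pcr_copies and the efficiency parameter are hypothetical; the disclosure does not state an efficiency or a cycle number.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of amplification cycles.

    Assumed model: copies = initial_copies * (1 + efficiency) ** cycles,
    with efficiency = 1.0 meaning perfect doubling in every cycle.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One starting molecule, 30 cycles, perfect doubling: 2 ** 30 copies.
    print(pcr_copies(1, 30))
    # Same start with an assumed 90% per-cycle efficiency.
    print(pcr_copies(1, 30, efficiency=0.9))
```

Real reactions plateau as primers and nucleotides are consumed, so the model only applies to the early, exponential phase.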
000139 Methods of the disclosure also relate to a method of manufacturing a cell or a method of labeling endogenous DNA of a cell comprising: (a) exposing the cell to any nucleic acid molecule disclosed herein for a time period sufficient to transfect the cell; (b) exposing the cell to a Cas protein for a time period sufficient to excise endogenous DNA in the cell; and (c) allowing a portion of the nucleic acid molecule to integrate into the endogenous DNA of the cell. In some embodiments, the cell is an isolated cell. In some embodiments, the cell is chosen from the cells in Table Z and has a doubling time disclosed in Table Z. In some embodiments, the method is free of exposing the cell to a viral particle or a viral vector. In some embodiments, the disclosure relates to a composition comprising a cell line comprising one or a plurality of cells disclosed herein. In some embodiments, those cells comprise any one or combination of cells identified in Table Z and comprise a first and second nucleic acid sequence encoding a selection domain. In some embodiments, a selection domain is chosen from the amino acid sequences disclosed in Table Y. In some embodiments, the cell or cells comprise a nucleic acid molecule disclosed herein, a complementary sequence thereof and/or express a target protein with at least one protein tag. In some embodiments, the protein tag is chosen from the tags mentioned above or is chosen from those amino acid sequences disclosed in Table Y. In some embodiments, the cell or cells express one or a combination of amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y. In some embodiments, the cell or cell line comprises a mutation in an endogenous DNA wherein a portion of the endogenous DNA encoding a target protein is modified on its 5’ or 3’ end to express the nucleic acid encoding the protein tag on the resulting amino or carboxy end of the encoded target protein. In some embodiments, the same cell or cells are modified endogenously or transiently to simultaneously express at least one or two different selection markers independent of the tag, such that expression of the target protein is not dependent on or regulated by expression of the selection markers. In some embodiments, the expression of the independently regulated selection markers is free of the regulatory sequence operably linked to the target proteins. In some embodiments, the selection markers of the regulatory sequence operably linked to the target proteins comprise a first nucleic acid sequence that confers antibiotic resistance to the cell or cells and the second nucleic acid sequence encodes expression of a physical protein, such as a fluorescent protein in the cell. In some embodiments, the protein tag is not the same physical protein marker. In some embodiments, the physical protein marker is a protein chosen from the amino acid sequences disclosed in Table Y or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table Y; and the protein tag is an amino acid sequence chosen from the section of Affinity tag sequence identified in Table X or a variant thereof comprising about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids disclosed in Table X.
000140 In some embodiments, the disclosure relates to a composition comprising a nucleic acid molecule encoding one or a plurality of sequences chosen from the amino acid sequences identified in Table X. In some embodiments, the nucleic acid molecule comprises: one or more protease cleavage sites, affinity tag sequences positioned between one or more homology regions, one or more self-cleavage sequences, one or more mammalian antibiotic selection sequences, at least one multiple cloning site, one or more bacterial antibiotic selection sequences, and one or more origin of replication sequences. In some embodiments, the nucleic acid sequence comprises, consists of or consists essentially of SEQ ID NO:55 or 69, or variants thereof comprising about 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO:55 or 69. In some embodiments, the nucleic acid molecule encodes any amino acid sequence identified in Table X or variants that comprise about 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the amino acids of Table X.
000141 Some embodiments of the disclosure also include a cell comprising the disclosed nucleic acid molecule. In some embodiments, the cell or plurality of cells comprises any cell line identified in Table Z.
Table X - Plasmid Elements
pYC-N whole sequence (SEQ ID NO:55)
TTCTCTGTCACAGAATGAAAATTTTTCTGTCATCTCTTCGTTATTAATGTTTGTAATTGACTGAATATCAACGCTTATTT
GCAGCCTGAATGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGC
GTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCT
TTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAA
CTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCAC
GTTCTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGG
ATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTA
ACGCTTACAATTTAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAA
ATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAA
CATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAG
TAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGA
GAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTAT
TGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACA
GAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGG
CCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAAC
TCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCA
ATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGAT
GGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGA
GCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCT
ACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGC
ATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTA
GGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGT
AGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGC
TACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCA
GATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCT
CGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGAT
AGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCT
ACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGT
ATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTAT
AGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGA
AAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATC
CCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGC
AGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCAC
ACCGCATAGACCAGCCGCGTAACCTGGCAAAATCGGTTACGGTTGAGTAATAAATGGATGCCCTGCGTAAGCGGGT
GTGGGCGGACAATAAAGTCTTAAACTGAACAAAATAGATCTAAACTATGACAATAAAGTCTTAAACTAGACAGAAT
AGTTGTAAACTGAAATCAGTCCAGTTATGCTGTGAAAAAGCATACTGGACTTTTGTTATGGCTAAAGCAAACTCTTC
ATTTTCTGAAGTGCAAATTGCCCGTCGTATTAAAGAGGGGCGTGGCCAAGGGCATGGTAAAGACTATATTCGCGGC
GTTGTGACAATTTACCGAACAACTCGTTTAAACGCGGCCGCTTTCGAATCTAGAATGATAACTTCGTATAGCATACA
TTATACGAAGTTATCTATGGTGAGCAAGGGCGAGGAGGATAACATGGCCTCTCTCCCAGCGACACATGAGTTACAC
ATCTTTGGCTCCATCAACGGTGTGGACTTTGACATGGTGGGTCAGGGCACCGGCAATCCAAATGATGGTTATGAGG
AGTTAAACCTGAAGTCCACCAAGGGTGACCTCCAGTTCTCCCCCTGGATTCTGGTCCCTCATATCGGGTATGGCTTC
CATCAGTACCTGCCCTACCCTGACGGGATGTCGCCTTTCCAGGCCGCCATGGTAGATGGCTCCGGATACCAAGTCCA
TCGCACAATGCAGTTTGAAGATGGTGCCTCCCTTACTGTTAACTACCGCTACACCTACGAGGGAAGCCACATCAAAG
GAGAGGCCCAGGTGAAGGGGACTGGTTTCCCTGCTGACGGTCCTGTGATGACCAACTCGCTGACCGCTGCGGACT
GGTGCAGGTCGAAGAAGACTTACCCCAACGACAAAACCATCATCAGTACCTTTAAGTGGAGTTACACCACTGGAAA
TGGCAAGCGCTACCGGAGCACTGCGCGGACCACCTACACCTTTGCCAAGCCAATGGCGGCTAACTATCTGAAGAAC
CAGCCGATGTACGTGTTCCGTAAGACGGAGCTCAAGCACTCCAAGACCGAGCTCAACTTCAAGGAGTGGCAAAAG
GCCTTTACCGATGTGATGGGCATGGACGAGCTGTACAAGGCTACTAACTTCAGCCTGCTGAAGCAAGCTGGAGACG
TGGAGGAGAACCCTGGACCTATAACTTCGTATAGCATACATTATACGAAGTTATCTATGACAGAGTATAAACCAACG
GTTCGGCTCGCAACTCGCGACGATGTGCCCCGAGCAGTTAGGACTTTGGCCGCGGCGTTCGCAGACTATCCAGCGA
CGAGGCACACCGTAGATCCGGATAGGCATATTGAACGGGTCACCGAGCTTCAGGAACTTTTTCTCACGAGAGTTGG
CCTTGATATCGGAAAGGTATGGGTAGCGGACGACGGGGCTGCTGTAGCGGTCTGGACCACGCCAGAATCAGTGGA
AGCGGGGGCGGTATTCGCAGAAATTGGTCCGAGGATGGCGGAGTTGTCCGGGTCTCGACTGGCTGCCCAGCAACA
GATGGAGGGTCTTCTCGCTCCGCACCGACCAAAGGAACCGGCTTGGTTCCTGGCTACAGTTGGCGTTTCACCAGAT
CACCAAGGTAAAGGACTTGGAAGCGCAGTCGTCCTTCCGGGGGTTGAGGCAGCGGAACGGGCAGGTGTCCCCGC
GTTCTTGGAGACCAGTGCTCCTAGGAACCTCCCTTTCTACGAGCGACTTGGGTTCACGGTCACAGCTGATGTAGAG
GTTCCAGAGGGCCCCAGGACTTGGTGTATGACCAGGAAACCGGGTGCCACGAATTTCAGCCTGCTTAAGCAAGCCG
GTGACGTTGAAGAGAATCCAGGCCCCCCGGGTTCCTGGTCCCACCCCCAATTTGAAAAGGGTGGAGGAAGTGGAG
GTGGTTCCGGAGGTAGCGCGTGGAGTCACCCACAATTCGAGAAAGGGAGTGGACCGAGCCGCCTGGAAGAAGAA
CTGCGCCGCCGCCTGACCGAACCGGGTTCAGGATCCCGAGATCATATGGTGCTCCACGAATACGTGAATGCGGCAG
GCATCACAGGTTCCCCGGATTACAAAGACCATGACGGGGATTATAAAGACCACGATATTGATTATAAAGACGACGA
TGACAAATTGGTGCCGCGGGGCAGCGGTTCAGAATTCTAAGCTTCTCGAGCAATTGGTTTAAACAGATCCGAACCA
GATAAGTGAAATCTAGTTCCAAACTATTTTGTCATTTTTAATTTTCGTATTAGCTTACGACGCTACACCCAGTTCCCAT
CTATTTTGTCACTCTTCCCTAAATAATCCTTAAAAACTCCATTTCCACCCCTCCCAGTTCCCAACTATTTTGTCCGCCCA
CAGCGGGGCATTTTTCTTCCTGTTATGTTTTTAATCAAACATCCTGCCAACTCCATGTGACAAACCGTCATCTTCGGCT
ACTTT
Origin of replication ColE1 (SEQ ID NO:56)
ACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCG
CCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCG
GGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCAC
GTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCTTAATAGTGGACTCTTGTT
CCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTATTG
GTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGCTTACAATTT
Origin of replication F1 (SEQ ID NO:57)
TTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCC
GGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAG
TGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAG
TGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGC
GGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTAC
AGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTC
GGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCAC
CTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAA
Antibiotic resistance (Ampicillin) (SEQ ID NO:58)
ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAAC
GCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGG
TAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGG
TATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTAC
TCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTG
ATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGG
GGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCAC
GATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAAT
TAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGC
TGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGT
ATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCC
TCACTGATTAAGCATTGGTAA
Multiple cloning site (SEQ ID NO:59)
GTTTAAACGCGGCCGCTTTCGAATCTAGA
LoxP site 1 (SEQ ID NO:60)
ATGATAACTTCGTATAGCATACATTATACGAAGTTATCT
LoxP site 2 (SEQ ID NO:61)
ATAACTTCGTATAGCATACATTATACGAAGTTATCT
Fluorescence selection marker (mNeonGreen) (SEQ ID NO:62)
ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCTCTCTCCCAGCGACACATGAGTTACACATCTTTGGCTCCATCA
ACGGTGTGGACTTTGACATGGTGGGTCAGGGCACCGGCAATCCAAATGATGGTTATGAGGAGTTAAACCTGAAGT
CCACCAAGGGTGACCTCCAGTTCTCCCCCTGGATTCTGGTCCCTCATATCGGGTATGGCTTCCATCAGTACCTGCCCT
ACCCTGACGGGATGTCGCCTTTCCAGGCCGCCATGGTAGATGGCTCCGGATACCAAGTCCATCGCACAATGCAGTT
TGAAGATGGTGCCTCCCTTACTGTTAACTACCGCTACACCTACGAGGGAAGCCACATCAAAGGAGAGGCCCAGGTG
AAGGGGACTGGTTTCCCTGCTGACGGTCCTGTGATGACCAACTCGCTGACCGCTGCGGACTGGTGCAGGTCGAAG
AAGACTTACCCCAACGACAAAACCATCATCAGTACCTTTAAGTGGAGTTACACCACTGGAAATGGCAAGCGCTACC
GGAGCACTGCGCGGACCACCTACACCTTTGCCAAGCCAATGGCGGCTAACTATCTGAAGAACCAGCCGATGTACGT
GTTCCGTAAGACGGAGCTCAAGCACTCCAAGACCGAGCTCAACTTCAAGGAGTGGCAAAAGGCCTTTACCGATGTG
ATGGGCATGGACGAGCTGTACAAG
Self-cleavage sequence (P2A) (SEQ ID NO:63)
GCTACTAACTTCAGCCTGCTGAAGCAAGCTGGAGACGTGGAGGAGAACCCTGGACCT
Chemical selection (Puromycin) (SEQ ID NO:64)
ATGACAGAGTATAAACCAACGGTTCGGCTCGCAACTCGCGACGATGTGCCCCGAGCAGTTAGGACTTTGGCCGCGG
CGTTCGCAGACTATCCAGCGACGAGGCACACCGTAGATCCGGATAGGCATATTGAACGGGTCACCGAGCTTCAGG
AACTTTTTCTCACGAGAGTTGGCCTTGATATCGGAAAGGTATGGGTAGCGGACGACGGGGCTGCTGTAGCGGTCTG
GACCACGCCAGAATCAGTGGAAGCGGGGGCGGTATTCGCAGAAATTGGTCCGAGGATGGCGGAGTTGTCCGGGT
CTCGACTGGCTGCCCAGCAACAGATGGAGGGTCTTCTCGCTCCGCACCGACCAAAGGAACCGGCTTGGTTCCTGGC
TACAGTTGGCGTTTCACCAGATCACCAAGGTAAAGGACTTGGAAGCGCAGTCGTCCTTCCGGGGGTTGAGGCAGC
GGAACGGGCAGGTGTCCCCGCGTTCTTGGAGACCAGTGCTCCTAGGAACCTCCCTTTCTACGAGCGACTTGGGTTC
ACGGTCACAGCTGATGTAGAGGTTCCAGAGGGCCCCAGGACTTGGTGTATGACCAGGAAACCGGGT
Twin-StrepII tag (SEQ ID NO:65)
TGGTCCCACCCCCAATTTGAAAAGGGTGGAGGAAGTGGAGGTGGTTCCGGAGGTAGCGCGTGGAGTCACCCACAA
TTCGAGAAA
ALFA tag (SEQ ID NO:66)
AGCCGCCTGGAAGAAGAACTGCGCCGCCGCCTGACCGAA
GFP 11th strand (SEQ ID NO:67)
CGAGATCATATGGTGCTCCACGAATACGTGAATGCGGCAGGCATCACA
3XFLAG tag (SEQ ID NO:68)
GATTACAAAGACCATGACGGGGATTATAAAGACCACGATATTGATTATAAAGACGACGATGACAAA
pYC-C (SEQ ID NO:69)
TTCTCTGTCACAGAATGAAAATTTTTCTGTCATCTCTTCGTTATTAATGTTTGTAATTGACTGAATATCAACGCTTATTT
GCAGCCTGAATGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGC
GTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCT
TTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAA
CTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCAC
GTTCTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGG
ATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTA
ACGCTTACAATTTAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAA
ATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAA
CATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAG
TAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGA
GAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTAT
TGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACA
GAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGG
CCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAAC
TCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCA
ATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGAT
GGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGA
GCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCT
ACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGC
ATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTA
GGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGT
AGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGC
TACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCA
GATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCT
CGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGAT
AGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCT
ACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGT
ATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTAT
AGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGA
AAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATC
CCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGC
AGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCAC
ACCGCATAGACCAGCCGCGTAACCTGGCAAAATCGGTTACGGTTGAGTAATAAATGGATGCCCTGCGTAAGCGGGT
GTGGGCGGACAATAAAGTCTTAAACTGAACAAAATAGATCTAAACTATGACAATAAAGTCTTAAACTAGACAGAAT
AGTTGTAAACTGAAATCAGTCCAGTTATGCTGTGAAAAAGCATACTGGACTTTTGTTATGGCTAAAGCAAACTCTTC
ATTTTCTGAAGTGCAAATTGCCCGTCGTATTAAAGAGGGGCGTGGCCAAGGGCATGGTAAAGACTATATTCGCGGC
GTTGTGACAATTTACCGAACAACTCGTTTAAACGCGGCCGCTTTCGAATCTAGAATGTTGGTGCCGCGGGGCAGCG
GTTCAGATTACAAAGACCATGACGGGGATTATAAAGACCACGATATTGATTATAAAGACGACGATGACAAAGGTTC
AGGATCCCGAGATCATATGGTGCTCCACGAATACGTGAATGCGGCAGGCATCACAGGTTCCCCGAGCCGCCTGGAA
GAAGAACTGCGCCGCCGCCTGACCGAACCGGGTTCCTGGTCCCACCCCCAATTTGAAAAGGGTGGAGGAAGTGGA
GGTGGTTCCGGAGGTAGCGCGTGGAGTCACCCACAATTCGAGAAAGGGAGTGGAGCCACGAATTTCAGCCTGCTT
AAGCAAGCCGGTGACGTTGAAGAGAATCCAGGCCCCATGACAGAGTATAAACCAACGGTTCGGCTCGCAACTCGC
GACGATGTGCCCCGAGCAGTTAGGACTTTGGCCGCGGCGTTCGCAGACTATCCAGCGACGAGGCACACCGTAGAT
CCGGATAGGCATATTGAACGGGTCACCGAGCTTCAGGAACTTTTTCTCACGAGAGTTGGCCTTGATATCGGAAAGG
TATGGGTAGCGGACGACGGGGCTGCTGTAGCGGTCTGGACCACGCCAGAATCAGTGGAAGCGGGGGCGGTATTC
GCAGAAATTGGTCCGAGGATGGCGGAGTTGTCCGGGTCTCGACTGGCTGCCCAGCAACAGATGGAGGGTCTTCTC
GCTCCGCACCGACCAAAGGAACCGGCTTGGTTCCTGGCTACAGTTGGCGTTTCACCAGATCACCAAGGTAAAGGAC TTGGAAGCGCAGTCGTCCTTCCGGGGGTTGAGGCAGCGGAACGGGCAGGTGTCCCCGCGTTCTTGGAGACCAGTG CTCCTAGGAACCTCCCTTTCTACGAGCGACTTGGGTTCACGGTCACAGCTGATGTAGAGGTTCCAGAGGGCCCCAG GACTTGGTGTATGACCAGGAAACCGGGTATAACTTCGTATAGCATACATTATACGAAGTTATCTGCTACTAACTTCA GCCTGCTGAAGCAAGCTGGAGACGTGGAGGAGAACCCTGGACCTATGGTGAGCAAGGGCGAGGAGGATAACATG GCCTCTCTCCCAGCGACACATGAGTTACACATCTTTGGCTCCATCAACGGTGTGGACTTTGACATGGTGGGTCAGGG CACCGGCAATCCAAATGATGGTTATGAGGAGTTAAACCTGAAGTCCACCAAGGGTGACCTCCAGTTCTCCCCCTGG ATTCTGGTCCCTCATATCGGGTATGGCTTCCATCAGTACCTGCCCTACCCTGACGGGATGTCGCCTTTCCAGGCCGCC ATGGTAGATGGCTCCGGATACCAAGTCCATCGCACAATGCAGTTTGAAGATGGTGCCTCCCTTACTGTTAACTACCG CTACACCTACGAGGGAAGCCACATCAAAGGAGAGGCCCAGGTGAAGGGGACTGGTTTCCCTGCTGACGGTCCTGT GATGACCAACTCGCTGACCGCTGCGGACTGGTGCAGGTCGAAGAAGACTTACCCCAACGACAAAACCATCATCAGT ACCTTTAAGTGGAGTTACACCACTGGAAATGGCAAGCGCTACCGGAGCACTGCGCGGACCACCTACACCTTTGCCA AGCCAATGGCGGCTAACTATCTGAAGAACCAGCCGATGTACGTGTTCCGTAAGACGGAGCTCAAGCACTCCAAGAC CGAGCTCAACTTCAAGGAGTGGCAAAAGGCCTTTACCGATGTGATGGGCATGGACGAGCTGTACAAGATAACTTCG TATAGCATACATTATACGAAGTTATCTTGAGAATTCTAAGCTTCTCGAGCAATTGGTTTAAACAGATCCGAACCAGA TAAGTGAAATCTAGTTCCAAACTATTTTGTCATTTTTAATTTTCGTATTAGCTTACGACGCTACACCCAGTTCCCATCT ATTTTGTCACTCTTCCCTAAATAATCCTTAAAAACTCCATTTCCACCCCTCCCAGTTCCCAACTATTTTGTCCGCCCACA GCGGGGCATTTTTCTTCCTGTTATGTTTTTAATCAAACATCCTGCCAACTCCATGTGACAAACCGTCATCTTCGGCTAC TTT
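The short tag elements listed in Table X can be sanity-checked by translating them with the standard genetic code. The following is a minimal Python sketch, not part of the disclosed method: it copies the 3XFLAG (SEQ ID NO:68) and ALFA (SEQ ID NO:66) nucleotide sequences from Table X and assumes each is read in frame from its first base. The function and variable names are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Sequences copied from Table X (assumed to start in frame).
FLAG_3X = "GATTACAAAGACCATGACGGGGATTATAAAGACCACGATATTGATTATAAAGACGACGATGACAAA"
ALFA = "AGCCGCCTGGAAGAAGAACTGCGCCGCCGCCTGACCGAA"

if __name__ == "__main__":
    print(translate(FLAG_3X))  # DYKDHDGDYKDHDIDYKDDDDK
    print(translate(ALFA))     # SRLEEELRRRLTE
```

The same helper could be applied to the other coding elements of Table X, provided the reading frame of each element is known.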
000142 The following Examples are intended to further illustrate certain preferred embodiments of the invention and are not to be construed as limiting the scope of the invention in any way. Various publications, including patents, published applications, technical articles and scholarly articles are cited throughout the specification. Each of these cited publications is incorporated by reference herein, in its entirety.
Example 1: Plasmid YifanCheng (pYC) generation, homology directed repair (HDR) arm design, and single-guide RNA (sgRNA) generation
000143 The backbone of pYC was derived from a pEG plasmid carrying the elements for plasmid amplification and antibiotic resistance. The tag components (purification tags and selection markers) were synthesized by Integrated DNA Technologies and subcloned by PCR. The resulting pYC plasmids were then amplified in DH5α competent E. coli cells. The designed multiple cloning sites were validated by double restriction digestion and subsequent agarose gel analysis. The pYC sequences were confirmed by sequencing (Elim BioPharma).
000144 The left and right HDR arms were designed based on GRCh38 (hg38, Homo sapiens) in Benchling (https://benchling.com/) and NC_000012.12 Chromosome 12 Reference GRCh38.p14 Primary Assembly in the NCBI database (https://www.ncbi.nlm.nih.gov/gene). To achieve optimal knock-in and cost efficiency, the length of the arms was maintained within a range of 500 - 990 bp, including sticky and blunt enzyme digestion sites. Both arms were subcloned into the pYC plasmid by PCR or by the digestion-ligation method. To generate the double-stranded DNA form of the HDR, specific blunt cutter sites were introduced on the MCSs by PCR, and the reconstituted plasmid was digested at 37°C for 2 hours with the corresponding enzymes, followed by agarose gel purification (Qiagen). To obtain the single-stranded DNA form of the HDR, the HDR was first amplified by PCR using a pair of primers (Elim BioPharma) in which the 5’ end of the 3’ primer is modified by addition of a phosphoryl group. The PCR product was then incubated at 37°C for 0.5 - 1 hour with 1 µL of Lambda exonuclease (NEB), which recognizes this phosphoryl group on the 5’ end of DNA and degrades one strand to leave behind ssDNA. Because the ssDNA lacks DNA secondary structure, SYBR Gold (ThermoFisher) was used for visualization. The digested samples were then purified with a DNA purification kit (Zymo Research) and diluted to the designated concentration.
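The arm-length constraint described above can be expressed as a simple check. The sketch below is illustrative only: it assumes the 500 - 990 bp range applies to each arm individually and that the length is counted with any added enzyme digestion sites included. The function names and placeholder sequences are hypothetical.

```python
# Length window stated in the text for each HDR arm (in base pairs).
MIN_ARM_BP = 500
MAX_ARM_BP = 990


def arm_length_ok(arm: str) -> bool:
    """Return True if a single HDR arm falls within the stated window."""
    return MIN_ARM_BP <= len(arm) <= MAX_ARM_BP


def check_hdr_arms(left_arm: str, right_arm: str) -> dict:
    """Report the length and pass/fail status of both arms."""
    return {
        name: {"length_bp": len(seq), "ok": arm_length_ok(seq)}
        for name, seq in (("left", left_arm), ("right", right_arm))
    }


if __name__ == "__main__":
    # Placeholder sequences, not real homology arms.
    report = check_hdr_arms("A" * 750, "C" * 1200)
    print(report)
```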
000145 The sgRNAs were designed using GRCh38 (hg38, Homo sapiens) in Benchling software (https://benchling.com/). The higher-ranked sgRNA fragments were synthesized by Elim Biopharm and each subcloned into px458 as previously described (Ran, 2013). In brief, pairs of primers encoding the sgRNA and the designed BbsI sites (NEB) were synthesized (Elim BioPharma), and the primers of each pair were annealed using a thermocycler. The annealing product was ligated into pre-linearized px458 using T4 ligase (NEB).
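The primer-pair design for BbsI cloning can be illustrated with a short Python sketch. It is not taken from this disclosure: it assumes the commonly used overhang convention for BbsI-digested px458-type vectors (a 5’-CACC overhang on the top oligo, a 5’-AAAC overhang on the bottom oligo, and an extra 5’ G when the guide does not begin with G). The 20-nt guide in the usage example is a made-up placeholder.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def bbsi_oligos(guide: str) -> Tuple[str, str]:
    """Design a top/bottom oligo pair for a 20-nt guide.

    Assumed convention (not specified in the text): top oligo carries
    a 5'-CACC overhang, bottom oligo a 5'-AAAC overhang, and a G is
    prepended to the guide if it does not already start with G.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A/C/G/T")
    insert = guide if guide.startswith("G") else "G" + guide
    top = "CACC" + insert
    bottom = "AAAC" + reverse_complement(insert)
    return top, bottom


if __name__ == "__main__":
    top, bottom = bbsi_oligos("ACGTACGTACGTACGTACGT")  # placeholder guide
    print("top:   ", top)
    print("bottom:", bottom)
```

After annealing, such a pair leaves four-base overhangs on both ends, which is what allows directional ligation into the linearized vector.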
000146 All DNA materials were stored at -20°C until use.
Example 2: Stable cell line generation
000147 Adherent HEK293T cells were grown at 37°C with 8% CO2 in DMEM media (Gibco) containing 10% fetal bovine serum (FBS) (v/v). Upon reaching 50% confluence, cells were transfected with 300 ng of px458s and 500 ng of HDRs (plasmid, dsDNA, or ssDNA) using Lipofectamine 2000 (ThermoFisher). Two days later, 1 - 2 µg/mL of puromycin (ThermoFisher) was added. At 10 - 14 days post-selection, successful knock-in cells were confirmed by their dim GFP fluorescence, western blot, and cDNA sequencing. After confirmation, the cells were transferred to suspension cultures in Freestyle293 media (Gibco) with 1% FBS (v/v). The suspended cells were grown at 37°C with 5% CO2 with shaking at 120 rpm. Puromycin was added to suspension cultures of up to 100 mL; however, cells were grown in the absence of puromycin in the 400 - 800 mL cultures used for protein purification.
000148 Jurkat cells were grown in home-modified RPMI 1640 media (Gibco) supplemented with 10% FBS (v/v), 100 U/mL of penicillin-streptomycin, 2 mM L-glutamine, 10 mM HEPES pH 7.4 and 1 mM sodium pyruvate. When the cell density reached 1 - 1.5 × 10^6 cells/mL, the same knock-in materials as for HEK293T cells (300 ng of px458s and 500 ng of HDR) were delivered to the cells using Lipofectamine 2000 (ThermoFisher). After three days, 1 µg/mL of puromycin was introduced for knock-in selection, typically for 10 - 14 days. Dead cells were removed by centrifugation at 100g for 3 minutes at room temperature.
000149 To store genome-edited cells, HEK293T and Jurkat cells were resuspended in the corresponding fresh media without puromycin and spun down at 1000g for 5 minutes to reach a density of 1 × 10^7 cells/mL per vial. Cell pellets were resuspended in freezing media mixture (50% fresh media, 40% conditioned media and 10% DMSO (Sigma)) and transferred to cryogenic vials (Corning). The vials were slowly frozen in a Mr. Frosty container at -80°C for 8 - 12 hours and transferred to a liquid nitrogen dewar for long-term storage.
Example 3: Knock-in validation
000150 For western blot analysis, the cell pellets were resuspended in 180 µL of ice-cold TBS buffer (20 mM Tris-HCl, pH 8.0, 150 mM NaCl) and then combined with 20 µL of 100 mM n-dodecyl-β-D-maltoside (DDM) and 20 mM cholesteryl hemisuccinate (CHS) to lyse the cells. The mixtures were rotated at 4°C for 45 minutes and 4×SDS-loading buffer (Bio-Rad) was added. The samples were vortexed, boiled at 90°C and spun down at 20,000g for 1 minute at 4°C in a bench-top centrifuge (Eppendorf 5810R) before being loaded onto an SDS-PAGE gel (Bio-Rad, 4 - 15% gradient). The proteins on the SDS-PAGE gel were transferred to a 0.2 µm nitrocellulose membrane (Bio-Rad) using a Trans-Blot Turbo (Bio-Rad). The samples were immunoblotted using Anti-FLAG-peroxidase (HRP) (Sigma-Aldrich, A8592) to validate the size of the tagged protein. The peroxidase signals were developed with SuperSignal (ThermoFisher) and imaged with a ChemiDoc MP (Bio-Rad).
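The final detergent concentrations in the lysis mixture follow from simple dilution arithmetic. The sketch below assumes the 20 µL detergent stock is added to 180 µL of resuspended cells, that volumes are additive, and that the pellet volume is negligible. The function name is illustrative.

```python
def final_concentration_mM(stock_mM: float, stock_uL: float, sample_uL: float) -> float:
    """Concentration after adding stock_uL of stock to sample_uL of sample.

    Assumes additive volumes (C1 * V1 = C2 * (V1 + V2)).
    """
    total_uL = stock_uL + sample_uL
    if total_uL <= 0:
        raise ValueError("total volume must be positive")
    return stock_mM * stock_uL / total_uL


if __name__ == "__main__":
    ddm = final_concentration_mM(100.0, 20.0, 180.0)  # 100 mM DDM stock
    chs = final_concentration_mM(20.0, 20.0, 180.0)   # 20 mM CHS stock
    print(f"DDM: {ddm:.1f} mM, CHS: {chs:.1f} mM")
```

Under these assumptions the mixture contains 10 mM DDM and 2 mM CHS, a tenfold dilution of the stock.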
000151 For fluorescence imaging, the knock-in cells were seeded into 8-well chambered #1.5 coverglasses (C8-1.5P, Cellvis) coated with poly-L-lysine (Sigma-Aldrich) or fibronectin (Corning). Samples were imaged in DMEM FluoroBrite (Gibco) supplemented with 10% FBS (UCSF Cell culture facility) and SPY650-DNA (Cytoskeleton Inc.). Widefield fluorescence microscopy was performed on a custom-built microscope constructed around a Ti-E body (Nikon) equipped with a water immersion objective (CFI Plan Apochromat IR 60x WI NA 1.27, Nikon), an active focus stabilization system (PFS, Nikon) and a motorized stage (MS-2000, ASI). An LED light source (X-Cite XLED1, Excelitas) operated at excitation power densities of 0.7 W/cm2 at 488 nm and 1.6 W/cm2 at 640 nm at the sample plane was used for excitation of mNeonGreen and SPY650-DNA. An sCMOS camera (Orca Flash 4.0, Hamamatsu) with a back-projected pixel size of 108 nm was used to detect brightfield and fluorescence signals. The sample was maintained at 37 °C and 5% CO2 using a stage-top incubation chamber with environmental control unit (Tokai Hit). Emitted fluorescence was separated from excitation light using a 405/488/561/640 nm quadband dichroic, and the mNeonGreen and SPY650-DNA signals were further filtered using 525/50 and 700/75 nm bandpass filters, respectively. All microscope components were controlled using the Micro-Manager 1.4 software platform (Edelstein, 2014).
000152 Total RNA was extracted using the Monarch total RNA miniprep kit (NEB) according to the manufacturer's protocol. The concentration of extracted total RNA was measured with a NanoPhotometer NP80 (Implen), and 1 µg of total RNA was subjected to a reverse transcription reaction to generate cDNA using LunaScript RT SuperMix (NEB). The amplicons were isolated using a 1% agarose gel and the QIAquick PCR purification kit (Acros and Qiagen). The sequences of the purified amplicons were confirmed by Elim BioPharma. From HEK293T cells, the full-length cDNAs of ACTB, GAPDH, TKT, VIM, and PCNA were validated. The cDNAs of ACTB and PCNA were obtained from the Jurkat cells. Full-length cDNA could not be obtained for FASN from the HEK293T knock-in sample or for FASN, GAPDH, and TKT from knock-in Jurkat cells, but biochemical validation, including western blot and mass spectrometry together with structural information, showed successful tagging.
Example 4: Endogenous protein purification and subcellular fractionation
000153 Tagged endogenous proteins were purified only from HEK293T cells. The knocked-in cells were grown to a density of 3 - 4 × 10^6 cells/mL before being harvested by centrifugation at 3000 - 4000g for 10 minutes at 4°C in a bench-top centrifuge (Eppendorf 5810R) or floor centrifuge (Beckman Coulter rotor JLA-8.1). The cell pellet was resuspended in ice-cold TBS buffer supplemented with protease inhibitor cocktail (Sigma-Aldrich). The cells were broken by sonication in an ice-water mixture with gentle stirring to prevent the sample from overheating. The cell debris was then discarded by two sequential centrifugation steps, first at 8000g for 20 minutes at 4°C using the rotor JA-25.50 (Beckman Coulter), and then at 126,000g for 1 hour at 4°C using the rotor Ti45 or 50.2Ti (Beckman Coulter). The final supernatant was applied to 1 mL of pre-equilibrated anti-FLAG M2 affinity gel (Sigma-Aldrich) and incubated for between 2 hours and overnight. The beads were then loaded into a polyprep column (Bio-Rad) and extensively washed with 50 column volumes of 500 mM NaCl and 20 mM Tris-HCl pH 8.0, followed by 50 column volumes of TBS buffer. To elute the proteins, 2 column volumes of TBS buffer supplemented with 0.25 mg/mL of 3×FLAG peptide (Sigma-Aldrich) were applied twice. The eluted proteins were further purified by size-exclusion chromatography in TBS buffer on a Superose 6 Increase 10/300 GL or Superdex 200 Increase 10/300 GL column (Cytiva), depending on the size of the proteins and putative complexes of interest. The fractions containing the target proteins were identified by anti-FLAG immunoblot.
000154 To harvest GAPDH proteins from cytosol and nuclei separately, pelleted cells were initially resuspended in ice-cold 250 mM sucrose, 5 mM MgCl2, and 10 mM HEPES pH 7.4 (Fractionation Buffer) supplemented with protease inhibitor cocktail. The cells were mechanically homogenized with 10 - 20 up-and-down strokes in a tissue grinder (Wheaton). The homogenized samples were then spun down at 600g for 10 minutes at 4°C in the bench-top centrifuge (Eppendorf 5810R). The pellet mainly contains larger components, such as unbroken cells and nuclei, while the supernatant contains lighter cellular fractions, such as cytoplasm or endoplasmic reticulum. The resulting cytoplasmic and nuclear fractions were then purified according to the procedures described above.
Example 5: Recombinant GAPDH with point mutations
000155 Human GAPDH (Uniprot ID: P04406) was subcloned into a pEG plasmid containing C-terminal eGFP (Addgene ID: 160681, home-made). All mutations were generated by the overlapping PCR method and subcloned into the pEG C-terminal eGFP vector. The sequences were confirmed by Elim BioPharma. After sequence validation, the plasmids were transfected into HEK293T cells using Lipofectamine 2000, and the cells were placed at 37°C with 8% CO2. At 20 hours post-transfection, 10 mM sodium butyrate was added and the cells were transferred to 30°C with 5% CO2. After 20 - 24 hours, the cell pellets were harvested and resuspended in 180 µL of ice-cold TBS buffer, and then combined with 20 µL of 100 mM DDM and 20 mM CHS to lyse the cells. The mixtures were rotated at 4°C for 45 minutes and then spun down at 21,000g for 20 minutes at 4°C. The sample was injected onto a Superdex 200 Increase 10/300 GL column (Cytiva), pre-equilibrated with TBS buffer, at a flow rate of 0.5 mL/min. The GFP signal was recorded on an HPLC system (Shimadzu) equipped with an RF-20A fluorescence detector (Shimadzu), with excitation at 488 nm and emission at 508 nm.
Example 6: Fluorescence size-exclusion chromatography (FSEC) and fluorescence light microscopy
000156 Fluorescence size-exclusion chromatography (FSEC) is an efficient method to characterize protein behavior without purification (Kawate, 2006). FSEC is typically performed on proteins tagged with GFP. We demonstrated the use of FSEC to characterize PCNA without tagging it with GFP. Cell pellets were harvested from genome-edited cells grown in adherent plates and resuspended in 180 µL of ice-cold TBS buffer, and then combined with 20 µL of 100 mM DDM and 20 mM CHS to lyse the cells. The mixtures were rotated at 4°C for 45 minutes and then spun down at 21,000g for 20 minutes at 4°C. The supernatant was incubated with 1 µL of ALFANB-eGFP (0.1 mg/mL stock) for 2 hours at 4 °C and then injected onto a Superdex 200 Increase 10/300 GL column (Cytiva), pre-equilibrated with TBS buffer, at a flow rate of 0.5 mL/min. The GFP signal was recorded on an HPLC system (Shimadzu) equipped with an RF-20A fluorescence detector (Shimadzu), with excitation at 488 nm and emission at 508 nm.
000157 To visualize the cellular location of PCNA-GFP-11, 30 - 50 ng of superfolder GFP1-10 in pEG plasmid was transfected into the PCNA knocked-in cells without mNeonGreen. The cells were grown at 37 °C with 8% CO2 for 20 hours before imaging. The imaging method was modified from the method described above in the “Knock-in validation” section.
Example 7: Endogenous GAPDH translocation and enzymatic activity
000158 When the GAPDH knocked-in cells reached a density of 2.8 - 3 × 10^6 cells/mL, the cells were stressed by addition of 1 mM H2O2 and 10 mM L-Arg at pH 7.4 to the media at 37°C with 5% CO2 with shaking at 120 rpm. The cells were treated for 8 and 24 hours, and 50 mL of culture was harvested at each time point. The cells were spun down at 1000g for 10 minutes at 4°C using a bench-top centrifuge (Eppendorf 5810R), the supernatant was decanted, and the pellet was flash-frozen in liquid nitrogen before storage at -80°C for later use.
000159 The thawed pellet was resuspended in 10 mL of ice-cold Fractionation Buffer by gentle pipetting. Subcellular fractionation was then performed as described above. No further manipulation was done to the cytoplasmic fraction, but the nuclear fraction was rinsed twice with ice-cold TBS buffer before solubilization with 10 mM DDM and 2 mM CHS for an hour on a rotator at 4°C. Each sample was mixed with SDS-loading buffer containing β-mercaptoethanol and boiled at 90°C. The boiled samples were spun down and loaded onto an SDS-PAGE gel (Bio-Rad, 4 - 15% gradient). The proteins were transferred to a 0.2 µm nitrocellulose membrane (Bio-Rad) using a Trans-Blot Turbo (Bio-Rad). The samples were immunoblotted using the corresponding antibodies: Anti-FLAG-peroxidase (HRP) (Sigma-Aldrich, A8592) to label the GAPDH, anti-FASN-HRP (Abcam, EPR7466) to monitor the cytoplasmic fraction, and anti-H2A (BioVision, cat. 3621-100) with its secondary antibody rabbit-HRP (Bio-Rad, cat. 170-6516) to confirm the nuclear fraction. The peroxidase signals were developed with SuperSignal (ThermoFisher) and imaged with a ChemiDoc MP (Bio-Rad).
000160 GAPDH in the cytosolic compartment was also assessed for its activity. GAPDH levels were estimated by immunoblotting to keep amounts constant during the assay. Equal quantities of protein from the different oxidative stress conditions (none, 8 hours, and 24 hours) were added to Greiner 96-well flat transparent plates, and GAPDH enzymatic activity was detected with a GAPDH activity assay kit (Abcam, ab204732). Endogenous GAPDH catalyzes the conversion of glyceraldehyde-3-phosphate into 1,3-bisphosphoglycerate while converting nicotinamide adenine dinucleotide (NAD+) to NADH. The chemicals provided in the kit react with the products and generate a color change, which allows a colorimetric assay based on absorbance at 450 nm (OD450). Using a SPARK 10M plate reader (TECAN), the samples were incubated at 37°C and agitated every 5 seconds, while OD450 was measured every minute for 30 minutes.
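One common way to summarize such a kinetic read is the rate of change of OD450 over time. The following Python sketch fits a least-squares line to readings taken once per minute; it is an illustration only and does not reproduce the assay kit's own calculation (for example, any conversion to NADH amounts via a standard curve). The readings in the usage example are synthetic.

```python
from typing import Sequence


def linear_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: one reading per minute for 30 minutes.
    minutes = list(range(31))
    od450 = [0.10 + 0.02 * m for m in minutes]
    print(f"{linear_slope(minutes, od450):.4f} OD450 units per minute")
```

In practice the fit would be restricted to the linear portion of each curve before comparing the stress conditions.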
Example 8: De-lipidation of endogenous GAPDH and membrane lipid strip assay
000161 Cells with FLAG-tagged GAPDH were grown to a density of 3-4 × 10^6 cells per mL in a total culture volume of 400 mL. Initial purification steps were followed as described above. After sonication, the supernatant and pellet were separated through two rounds of centrifugation at 8,000g and 126,000g, respectively. The supernatant from the latter round of centrifugation was filtered through a 0.2 µm filter. For GAPDH de-lipidation, 10 mM lauryl maltose neopentyl glycol (LMNG) and 2 mM CHS were added to the clarified supernatant, and the mixture was incubated with 1 mL of pre-equilibrated anti-FLAG M2 affinity gel at 4°C overnight. The beads were washed with 200 column volumes of TBS buffer to remove detergent, and GAPDH was eluted with 3 column volumes of TBS buffer supplemented with 0.25 mg/mL 3×FLAG peptide. The eluate was further incubated with 50 mg of Bio-Beads SM2 (Bio-Rad) at 4°C for 5 hours to remove any remaining detergent micelles. The sample was then concentrated using a 50 kDa MWCO filter and run through a Superdex 200 Increase 10/300 GL column equilibrated with TBS buffer. The de-lipidated GAPDH peaks were pooled and used for lipid strip assays.
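The quantities in this purification scale with the culture volume and the resin bed volume. The arithmetic can be sketched as follows, assuming that one column volume equals the 1 mL of affinity gel stated above. The helper names are hypothetical and serve only to make the calculation explicit.

```python
def total_cells(density_per_ml, culture_ml):
    """Total number of cells in a culture of the given density and volume."""
    return density_per_ml * culture_ml


def buffer_volume_ml(column_volumes, bed_volume_ml):
    """Buffer volume corresponding to a number of column volumes."""
    return column_volumes * bed_volume_ml


CULTURE_ML = 400
BED_ML = 1.0  # assumption: one column volume = 1 mL of anti-FLAG M2 gel

cells_low = total_cells(3e6, CULTURE_ML)    # lower bound of the density range
cells_high = total_cells(4e6, CULTURE_ML)   # upper bound of the density range
wash_ml = buffer_volume_ml(200, BED_ML)     # detergent-removal wash
elution_ml = buffer_volume_ml(3, BED_ML)    # elution with 3xFLAG peptide
peptide_mg = 0.25 * elution_ml              # 0.25 mg/mL peptide in the eluent

print(f"cells: {cells_low:.1e} to {cells_high:.1e}")
print(f"wash: {wash_ml} mL, elution: {elution_ml} mL, peptide: {peptide_mg} mg")
```

Under these assumptions the culture contains on the order of 10^9 cells, and the wash consumes 200 mL of TBS buffer per preparation.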
000162 The membrane lipid strip P-6002 (Echelon Bioscience) was used to identify the lipid(s) that interact with GAPDH. The membrane lipid strip was first blocked with TBST buffer (25 mM Tris-HCl, pH 7.2, 150 mM NaCl, 0.1% Tween-20 (v/v)) with 5% (w/v) milk for 1 hour at room temperature and then washed three times with TBST buffer for 10 minutes each round. The strip was then gently agitated in 10 mL of solution containing 3-8 µg of purified GAPDH for 1 hour at room temperature. Conventional immunoblotting was used to visualize lipid-bound endogenous GAPDH using an anti-FLAG antibody, further developed with SuperSignal and imaged with a ChemiDoc MP.
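The TBST composition given above can be converted into amounts to weigh out for a chosen volume. The following is a minimal sketch, assuming the buffer is prepared from Tris base (121.14 g/mol) titrated to pH 7.2 with HCl and from NaCl (58.44 g/mol). The function name `tbst_recipe` is hypothetical, and a preparation starting from Tris-HCl salt would require a different molar mass.

```python
TRIS_BASE_G_PER_MOL = 121.14  # assumption: Tris base, pH adjusted with HCl
NACL_G_PER_MOL = 58.44


def tbst_recipe(volume_ml, with_milk=False):
    """Amounts for TBST: 25 mM Tris, 150 mM NaCl, 0.1% (v/v) Tween-20."""
    litres = volume_ml / 1000.0
    recipe = {
        "tris_base_g": 0.025 * litres * TRIS_BASE_G_PER_MOL,
        "nacl_g": 0.150 * litres * NACL_G_PER_MOL,
        "tween20_ml": 0.001 * volume_ml,  # 0.1% (v/v)
    }
    if with_milk:
        recipe["milk_powder_g"] = 0.05 * volume_ml  # 5% (w/v) blocking buffer
    return recipe


blocking = tbst_recipe(1000, with_milk=True)
for component, amount in blocking.items():
    print(f"{component}: {amount:.3f}")
```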
Example 9: Widefield fluorescence microscopy data processing
000163 To determine the fraction of cells exhibiting successful genomic integration for 6 different genes, image sets of 20-80 fields of view per condition were automatically processed using custom-written analysis routines implemented in ImageJ and Matlab. First, individual cells were segmented using Cellpose (Stringer, 2021) with custom-trained models for detection of nuclei labeled with SPY650-DNA (HEK293T cells) or entire cells in brightfield images (Jurkat cells). From the segmented regions, the mean mNeonGreen intensity and the segmented area were extracted for each cell. A cell was classified as expressing the target protein if its mean intensity exceeded the mean + 2 standard deviations of the respective wild-type cell population.
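The classification rule described above (mean intensity greater than the wild-type mean plus 2 standard deviations) can be sketched as follows. This is an illustrative sketch in Python rather than the ImageJ and Matlab routines used in the example. It assumes the sample standard deviation is meant, which the text does not specify, and the function names and intensity values are hypothetical.

```python
from statistics import mean, stdev


def positive_threshold(wild_type_intensities, n_sd=2.0):
    """Threshold = wild-type mean + n_sd * wild-type standard deviation."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return mean(wild_type_intensities) + n_sd * stdev(wild_type_intensities)


def fraction_positive(cell_intensities, threshold):
    """Fraction of cells whose mean intensity exceeds the threshold."""
    if not cell_intensities:
        raise ValueError("no cells to classify")
    positives = sum(1 for value in cell_intensities if value > threshold)
    return positives / len(cell_intensities)


# Hypothetical per-cell mean mNeonGreen intensities (arbitrary units).
wild_type = [100, 102, 98, 101, 99]
edited = [99, 150, 240, 101, 180, 104]

threshold = positive_threshold(wild_type)
fraction = fraction_positive(edited, threshold)
print(f"threshold = {threshold:.2f}, fraction positive = {fraction:.2f}")
```

A threshold derived from the wild-type population of the same cell line accounts for cell-type-specific autofluorescence, which is why the rule refers to the respective wild-type population.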
Example 10: Cryo-EM sample preparation and data collection
000164 Purified protein samples were concentrated to 0.025-0.15 mg/mL using a 50 kDa cut-off protein concentrator (ThermoFisher). 3 µL of the sample was then applied to graphene oxide grids with amine modification (made from gold Quantifoil grids, 1.2/1.3-µm size/hole space, 200 to 300 mesh, by applying ethylenediamine (Sigma)) (Wang, 2020), blotted for 4-6 s with 0 blotting force at 100% humidity using a Vitrobot Mark III, and flash-frozen in vitreous liquid ethane. The grids were screened on Talos Arctica and Glacios electron microscopes (ThermoFisher-FEI), operated at 200 kV and equipped with a Gatan K3 camera (Gatan, Inc.). Movies were acquired when suitable grids were identified. For the GAPDH sample, the final high-resolution dataset was acquired using a Titan Krios electron microscope operated at 300 kV and equipped with a Gatan K3 camera and a BioQuantum energy filter. The energy selection slit was set to 20 eV. Movies were recorded in super-resolution mode at a nominal magnification of 105K, resulting in a super-resolution pixel size of 0.4175 Å/pixel (0.835 Å/pixel after 2x FT binning). Each movie stack was dose-fractionated into 80 frames using a total exposure time of 2 s at 0.025 seconds per frame. The total dose was 45.8 e-/Å^2. All image stacks were collected using SerialEM (Mastronarde, 2005). Defocus values varied from -0.8 to -1.5 µm.
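The acquisition parameters stated above are mutually consistent, which can be checked with a few lines of arithmetic. The following sketch derives the exposure time, the dose per frame, the dose rate, and the binned pixel size from the stated values. The variable names are illustrative, and the derived per-frame dose and dose rate are computed here rather than reported in the example.

```python
FRAMES = 80
FRAME_TIME_S = 0.025
TOTAL_DOSE_E_PER_A2 = 45.8        # electrons per square angstrom
SUPER_RES_PIXEL_A = 0.4175        # angstrom per super-resolution pixel
BINNING = 2                       # 2x Fourier binning

exposure_s = FRAMES * FRAME_TIME_S                 # total exposure time
dose_per_frame = TOTAL_DOSE_E_PER_A2 / FRAMES      # electrons/A^2 per frame
dose_rate = TOTAL_DOSE_E_PER_A2 / exposure_s       # electrons/A^2 per second
binned_pixel_a = SUPER_RES_PIXEL_A * BINNING       # pixel size after binning

print(f"exposure: {exposure_s:.3f} s")
print(f"dose per frame: {dose_per_frame:.4f} e-/A^2")
print(f"dose rate: {dose_rate:.2f} e-/A^2/s")
print(f"binned pixel size: {binned_pixel_a:.3f} A/pixel")
```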
Example 11: Cryo-EM data processing
000165 For the 200 kV datasets, all movies were motion-corrected on the fly by MotionCor2 (Zheng, 2017) implemented in Scipion (de la Rosa-Trevin, 2016). Motion-corrected micrographs were further processed in cryoSPARC (Punjani, 2017), including CTF estimation and particle picking. The picked particles were subjected to 2D classification, and reasonable classes were pooled for ab initio reconstruction, followed by heterogeneous refinement. Final datasets were subjected to iterative non-uniform refinement followed by global CTF refinement.
000166 For the GAPDH Krios dataset, motion correction and dose weighting were performed using MotionCor2 implemented in Relion (Kimanius, 2021). CTF was estimated by CTFFIND-4.1 (Zhang, 2016). Using the Relion star handler, micrographs with resolution lower than a preset value were triaged. Particle picking was first performed on 50 randomly selected micrographs. A GAPDH reconstruction determined from a Glacios dataset was used as a template for particle picking. Particle picking was performed either by using Topaz (for 0-hour oxidative stress) or by template-based particle picking (8-hour and 24-hour oxidation).
000167 Extracted particles were first binned 4-fold and subjected to two rounds of 2D classification to remove junk particles. The remaining particles were re-extracted with 2-fold binning and subjected to 3D classification analysis with two different regularization values, 4 and 15, in Relion. The classification results were monitored by visualizing maps and the corresponding angular distributions in UCSF Chimera or ChimeraX (Pettersen, 2004; Pettersen, 2021). The low-resolution GAPDH structure from the Glacios dataset was used as a reference map, and C1 symmetry was applied. Several rounds of CTF refinement, anisotropic and CTF fitting, were performed, followed by Bayesian polishing. Both C1 and D2 symmetries were applied in parallel. Symmetry-expanded and background-subtracted particles were subjected to 3D classification without image alignment using the parameters T=15 and k=5 in Relion.
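Symmetry expansion under D2, as mentioned above, replicates each particle orientation once per symmetry operator, so the expanded set contains four entries per particle. The following is a conceptual sketch only and is not Relion's implementation. It assumes rotation matrices with the three two-fold axes aligned to x, y, and z, and the order in which the operator is composed with the particle orientation is likewise an assumption, since the convention depends on the software.

```python
# The four operators of point group D2: identity and 180-degree rotations
# about the x, y, and z axes (assumed axis convention).
D2_OPERATORS = [
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    ((1, 0, 0), (0, -1, 0), (0, 0, -1)),
    ((-1, 0, 0), (0, 1, 0), (0, 0, -1)),
    ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def symmetry_expand(orientations, operators=D2_OPERATORS):
    """Return one orientation per (particle, symmetry operator) pair."""
    return [matmul3(r, s) for r in orientations for s in operators]


# Three hypothetical particles, all at the identity orientation.
identity = D2_OPERATORS[0]
particles = [identity] * 3
expanded = symmetry_expand(particles)
print(len(particles), "particles ->", len(expanded), "expanded entries")
```

Classification without image alignment then operates on the expanded set, so that each subunit position of the tetramer can be examined separately.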
Example 12: Model building
000168 The crystal structure of GAPDH (PDB: 4WNC) was determined in UCSF Chimera and manually modified using Coot (Emsley, 2010), followed by several rounds of real-space refinement in Phenix (Afonine, 2012) to improve the model accuracy. Multiple rounds of real-space refinement in Phenix and manual editing in Coot were conducted, and finally all GAPDH models were edited and validated in Coot and with MolProbity (Chen, 2010) once the parameters reached a reasonable range.
Example 13: Tagging DPF2
000169 The FLAG tag was endogenously placed at the N-terminus of DPF2, which is one core subunit of the cBAF complex. Similar protocols as in the above examples were utilized. 2xStrep-ALFA-GFP11-3xFLAG was endogenously placed at the N-terminus of DPF2. In brief, we chose 500 bp upstream of the start codon of the DPF2 gene as the left homology arm and 500 bp downstream of the start codon of the DPF2 gene as the right homology arm, and then performed molecular cloning to insert the fragments into pYC, which serves as the HDR template. After transfecting the HEK293 cells with px458 and the HDR template, puromycin selection and FACS were performed to obtain the positive cells. Western blotting and mass spectrometry were used to validate the results.
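The choice of homology arms described above can be sketched as a simple sequence-slicing step. This is a minimal illustration, assuming a forward-strand genomic sequence in which the position of the start codon is already known, and assuming that the left arm ends immediately before the ATG and the right arm begins immediately after it. The text does not state how the start codon itself is handled, and the function name `homology_arms` and the toy sequence are hypothetical.

```python
def homology_arms(genomic_seq, atg_index, arm_length=500):
    """Return (left_arm, right_arm) flanking the start codon at atg_index."""
    if genomic_seq[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("no ATG at the given index")
    left_start = atg_index - arm_length
    right_start = atg_index + 3          # assumption: arm starts after the ATG
    right_end = right_start + arm_length
    if left_start < 0 or right_end > len(genomic_seq):
        raise ValueError("sequence too short for the requested arm length")
    left_arm = genomic_seq[left_start:atg_index]
    right_arm = genomic_seq[right_start:right_end]
    return left_arm, right_arm


# Toy sequence: 500 upstream bases, a start codon, then 500 downstream bases.
toy_locus = "A" * 500 + "ATG" + "C" * 500
left, right = homology_arms(toy_locus, atg_index=500)
print(len(left), len(right))
```

For a gene on the reverse strand, the sequence would first need to be reverse-complemented so that the same slicing applies.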
000170 FIG. 19 illustrates the cBAF complex. FIG. 20 illustrates a western blot analysis for DPF2. FIG. 21 illustrates protein purification results from a 2 L culture of HEK293 cells with the 2xStrep-ALFA-GFP11-3xFLAG tag. FIG. 22 illustrates the mass spectrum of the purification. The data demonstrate that the cBAF complex was tagged. FIG. 23 illustrates a negative stain.
Example 14: Knock In Data for SNF2h in HEK293 Cells
000171 Endogenous SNF2h (SMARCA5) was tagged in HEK293 cells. Similar protocols as in the above examples were utilized. 2xStrep-ALFA-GFP11-3xFLAG was endogenously placed at the N-terminus of SNF2h (SMARCA5). In brief, we chose 500 bp upstream of the start codon of the SMARCA5 gene as the left homology arm and 500 bp downstream of the start codon of the SMARCA5 gene as the right homology arm, and then performed molecular cloning to insert the fragments into pYC, which serves as the HDR template. After transfecting the HEK293 cells with px458 and the HDR template, puromycin selection and FACS were performed to obtain the positive cells. FIG. 24 illustrates a western blot showing 2xStrep-ALFA-GFP11-3xFLAG-SNF2h (~133 kDa) (tags on the N-terminus of SNF2h).
Example 15: Tagging Exportin-1 (gene name crm1 or xpo1) in HEK293 Cells
000172 Exportin-1 (gene name crm1 or xpo1) was tagged in HEK293 cells. Similar protocols as in the above examples were utilized. See FIGS. 25A and 25B.
Example 16: HeLa Cells
000173 FASN (fatty acid synthase) was successfully tagged by knock-in in HeLa cells, and FIG. 26 illustrates a western blot demonstrating success. Similar protocols as in the above examples were utilized. In brief, px458 containing the same sgRNA sequence and the homology arms in the pYC system were transiently transfected into HeLa cells using Lipofectamine 3000. Selection was carried out by supplementing the medium with puromycin. A western blot was performed to confirm the knock-in result. The FASN knock-in HeLa cells present 250 kDa anti-FLAG bands, while wild-type HeLa cells do not. The anti-GAPDH western blot is used as a loading control.
Claims
1. A composition comprising a nucleic acid molecule comprising:
(i) a first and a second homology donor (HDR) region complementary to a target domain;
(ii) at least a first and a second nucleic acid sequence that encodes a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain are positioned in a 5’ to 3’ orientation between the first and second HDR sequence;
(iii) a first nucleic acid sequence encoding a cleavage site positioned between the first HDR sequence and the first nucleic acid sequence that encodes a selection domain;
(iv) a second nucleic acid sequence encoding a cleavage site positioned in a 5’ to 3’ orientation between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain;
(v) a first and a second multiple cloning site, wherein the first multiple cloning site is positioned upstream of the first HDR region and the second multiple cloning site is positioned downstream of the second HDR region.
2. The composition of claim 1, wherein the first and the second HDR regions comprise from about 10 to about 30 base pairs in nucleic acid length.
3. The composition of claim 1, wherein the first nucleic acid encoding a cleavage site, the first nucleic acid sequence encoding a selection domain, the second nucleic acid encoding a cleavage site, and the second nucleic acid sequence encoding a cleavage site are positioned in a contiguous nucleic acid sequence in a 5’ to 3’ orientation.
4. The composition of any of claims 1 through 3, wherein the first and second nucleic acids encoding a cleavage site encode a P2A cleavage site.
5. The composition of any of claims 1 through 4 further comprising a protein tag domain positioned, in a 5’ to 3’ orientation, either: (a) between the first HDR region and the first nucleic acid encoding a cleavage site; or (b) between the second nucleic acid encoding a cleavage site and the second HDR region.
6. The composition of any of claims 1 through 5, wherein the first nucleic acid sequence encoding a selection domain encodes a fluorescent protein.
7. The composition of any of claims 1 through 6, wherein the fluorescent protein is chosen from Table X.
8. The composition of any of claims 1 through 7, wherein the second nucleic acid encoding a selection domain encodes a selection domain that confers resistance to exposure of a toxic chemical.
9. The composition of claim 8, wherein the toxic chemical is an antibiotic.
10. The composition of claim 9, wherein the antibiotic is chosen from any antibiotic chosen from Table Y.
11. The composition of any of claims 1 through 10, wherein the first multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:1 [SEQ ID NO:1 is the MCS 1 of pYC].
12. The composition of any of claims 1 through 11, wherein the second multiple cloning site comprises at least about 70% sequence identity to SEQ ID NO:2 [SEQ ID NO:2 is the MCS 2 of pYC].
13. The composition of any of claims 1 through 12, wherein the nucleic acid molecule further comprises an origin of replication comprising at least 70% sequence identity to SEQ ID NO:3 (ori of pYC).
14. The composition of any of claims 1 through 13, wherein (i) through (v) are positioned in a modification element, and wherein the nucleic acid molecule further comprises a regulatory sequence operably linked to a third nucleic acid sequence encoding a selection domain that is positioned outside of the modification element.
15. The composition of claim 14, wherein the third nucleic acid sequence encoding a selection domain confers puromycin resistance.
16. The composition of any of claims 1 through 13, wherein (i) through (v) are positioned in a modification element, and wherein the nucleic acid molecule further comprises an origin of replication positioned outside of the modification element.
17. The composition of claim 16, wherein the origin of replication comprises at least 70% sequence identity to SEQ ID NO:3 (ori of pYC).
18. The composition of any of claims 1 through 17 further comprising a transfection reagent.
19. The composition of any of claims 1 through 18, wherein the nucleic acid molecule is a plasmid, a double stranded DNA or a single stranded DNA molecule.
20. A cell comprising the composition of any of claims 1 through 19.
21. A cell comprising an endogenous nucleic acid sequence encoding an expressible amino acid, said endogenous nucleic acid sequence modified on its amino or carboxy terminus by an expressible exogenous modification element comprising:
(i) at least a first and a second nucleic acid sequence that encode a selection domain, wherein each of the first and second nucleic acid sequences that encode a selection domain;
(ii) a first nucleic acid sequence encoding a cleavage site positioned between the amino or carboxy terminus and the first nucleic acid sequence that encodes a selection domain; and
(iii) a second nucleic acid sequence encoding a cleavage site positioned between the first nucleic acid sequence that encodes a selection domain and the second nucleic acid sequence that encodes a selection domain.
22. The cell of claim 21, wherein the modification element further comprises a nucleic acid sequence encoding a protein tag.
23. A method of culturing a cell comprising:
(a) exposing the composition of any of claims 1 through 19 to the one or plurality of cells for a period of time sufficient to transfect the cell.
24. The method of claim 23 further comprising:
(b) exposing the one or plurality of cells to a selection stimulus sensitive to the first or second nucleic acid sequence encoding the selection domain.
25. The method of claim 23 further comprising
(b) exposing the one or plurality of cells to a selection stimulus sensitive to the first and second nucleic acid sequence encoding the selection domain.
26. The method of any of claims 23 through 25, wherein the selection stimulus is an antibiotic.
27. The method of claim 26, wherein the antibiotic is chosen from one or a combination of antibiotics from Table Y.
28. The method of claim 27, wherein the antibiotic is puromycin.
29. The method of any of claims 23 through 28, wherein the selection stimulus is exposure to light with a wavelength from about 500 nm to about 650 nm.
30. The method of any of claims 23 through 29 further comprising a step of exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells.
31. The method of any of claims 23 through 30 further comprising a step of culturing the one or plurality of cells from about 10 to about 14 days.
32. A plurality of cells comprising one or a combination of compositions chosen from any of claims 1 through 19.
33. The cells of claim 32, wherein the cells comprise NB458 cells, 293T cells and/or Jurkat cells.
34. The cells of claim 32 or 33, wherein the one or plurality of cells comprise NB458 cell and at least one non-cancerous cell.
35. The cells of claims 32 through 34, wherein at least about 30% of the cells comprise the one or a portion of the modification element in their endogenous DNA from about 7 to about 18 days in culture.
36. The cells of any of claims 32 through 35, wherein the cells doubling time is about 4 days.
37. The cells of claims 32 through 34, wherein at least about 50% of the cells comprise the one or a portion of the modification element in their endogenous DNA from about 14 to about 18 days in culture.
38. A method of editing endogenous DNA of one or a plurality of cells comprising:
(a) exposing the nucleic acid molecule of any of claims 1 through 19 to the one or plurality of cells for a period of time sufficient to transfect the cell.
39. The method of claim 38 further comprising (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
40. The method of claim 39, further comprising exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence.
41. The method of claim 40 further comprising culturing the one or plurality of cells for no less than about 7 days.
42. A method of isolating a protein in a cell comprising:
(a) exposing the nucleic acid molecule of any of claims 1 through 9 to the one or plurality of cells for a period of time sufficient to transfect the cell.
43. The method of claim 42 further comprising (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
44. The method of claim 43, further comprising exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence.
45. The method of claim 44 further comprising culturing the one or plurality of cells for no less than about 7 days.
46. The method of claim 44 further comprising allowing the one or plurality of cells to express a protein modified at the target sequence with the modification element, wherein the modification element comprises a nucleic acid encoding a protein tag.
47. The method of claim 46 further comprising isolating the protein by exposing the protein tag to one or a plurality of capture elements that associate with or bind to the protein tag.
48. The method of claim 47 further comprising isolating or precipitating the capture element.
49. A method of endogenously labeling a protein in a cell comprising:
(a) exposing the nucleic acid molecule of any of claims 1 through 19 to the one or plurality of cells for a period of time sufficient to transfect the cell.
50. The method of claim 49 further comprising (b) exposing the plurality of cells to a Cas protein for a period of time sufficient to cleave at least one region of endogenous DNA of a cell in the plurality of cells prior to step (a), such that endogenous DNA of the one or plurality of cells is cleaved at a target sequence.
51. The method of claim 50, further comprising exposing the nucleic acid molecule to the cleaved endogenous DNA for a time period sufficient for the modification element to integrate into the endogenous DNA of the one or plurality of cells at the target sequence, such that the protein is expressed with the modification element at the target sequence.
52. The method of claim 40 further comprising culturing the one or plurality of cells for no less than about 7 days.
53. A method of screening for therapeutic agent in a cell comprising
(a) exposing any one or plurality of cells of claims 20 through 22 to a pathogen.
54. The method of claim 53, wherein the pathogen is chosen from one or a combination of lentiviruses, hepatitis viruses, papilloma viruses, corona viruses, influenza viruses, rotaviruses.
55. The method of either of claims 53 and 54 further comprising exposing the one or plurality of cells to a library of therapeutic agents.
56. The method of claim 54, wherein the step of exposing is performed in the presence or absence of a viral inhibitor.
57. The method of claim 53 wherein the pathogen is a bacterial cell.
58. The method of claim 53, wherein the pathogen is a fungal cell.
59. The method of claim 53, wherein the one or plurality of cells are human cells.
60. The method of claim 53, wherein the one or plurality of cells are chosen from a cell line in Table Z.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363509021P | 2023-06-19 | 2023-06-19 | |
| US63/509,021 | 2023-06-19 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024263652A1 (en) | 2024-12-26 |
Family
ID=93936333
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/034636 Pending WO2024263652A1 (en) | 2023-06-19 | 2024-06-19 | Compositions for and methods of tagging endogenously expressed proteins |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024263652A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180305700A1 (en) * | 2015-10-13 | 2018-10-25 | The Research Foundation For The State University Of New York | Cleavable fusion tag for protein overexpression and purification |
| WO2019018534A1 (en) * | 2017-07-18 | 2019-01-24 | The Board Of Trustees Of Leland Stanford Junior University | Scarless genome editing through two-step homology directed repair |
| US11091780B2 (en) * | 2015-09-18 | 2021-08-17 | The Regents Of The University Of California | Methods for autocatalytic genome editing and neutralizing autocatalytic genome editing and compositions thereof |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11091780B2 (en) * | 2015-09-18 | 2021-08-17 | The Regents Of The University Of California | Methods for autocatalytic genome editing and neutralizing autocatalytic genome editing and compositions thereof |
| US20180305700A1 (en) * | 2015-10-13 | 2018-10-25 | The Research Foundation For The State University Of New York | Cleavable fusion tag for protein overexpression and purification |
| WO2019018534A1 (en) * | 2017-07-18 | 2019-01-24 | The Board Of Trustees Of Leland Stanford Junior University | Scarless genome editing through two-step homology directed repair |
Non-Patent Citations (1)
| Title |
|---|
| WOOYOUNG CHOI, HAO WU, KLAUS YSERENTANT, BO HUANG, YIFAN CHENG: "Efficient tagging of endogenous proteins in human cell lines for structural studies by single-particle cryo-EM", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 120, no. 31, 24 July 2023 (2023-07-24), pages 1 - 10, XP093256673 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7700079B2 (en) | Systems and methods for genome editing | |
| US11946040B2 (en) | Adenine DNA base editor variants with reduced off-target RNA editing | |
| EP4021945A2 (en) | Combinatorial adenine and cytosine dna base editors | |
| JP2020521451A (en) | Use of split deaminase to limit unwanted off-target base editor deamination | |
| US20160362667A1 (en) | CRISPR-Cas Compositions and Methods | |
| US20240035010A1 (en) | Compositions comprising a variant polypeptide and uses thereof | |
| CA3211223A1 (en) | Compositions comprising a variant polypeptide and uses thereof | |
| CN118995664A (en) | Pilot editing system based on PERV reverse transcriptase | |
| CA3230609A1 (en) | Compositions comprising a crispr nuclease and uses thereof | |
| EP4558625A1 (en) | Compositions comprising a variant nuclease and uses thereof | |
| CN115975986B (en) | Mutant Cas12j protein and its application | |
| US11946045B2 (en) | Compositions comprising a variant polypeptide and uses thereof | |
| WO2023143150A1 (en) | Novel cas enzyme and system and use | |
| WO2023019243A1 (en) | Compositions comprising a variant cas12i3 polypeptide and uses thereof | |
| WO2024263652A1 (en) | Compositions for and methods of tagging endogenously expressed proteins | |
| WO2023165613A1 (en) | Use of 5'→3' exonuclease in gene editing system, and gene editing system and gene editing method | |
| KR20230051688A (en) | Nuclease-mediated nucleic acid modification | |
| US20250223576A1 (en) | Optimized cas protein and use thereof | |
| CN113661247A (en) | Cell penetrating transposase | |
| EP4209589A1 (en) | Miniaturized cytidine deaminase-containing complex for modifying double-stranded dna | |
| WO2024235991A1 (en) | Rna-guided nucleases and nucleic acid targeting systems comprising such rna-guided nucleases | |
| WO2024042168A1 (en) | Novel rna-guided nucleases and nucleic acid targeting systems comprising such rna-guided nucleases | |
| WO2025254779A1 (en) | Engineered epigenetic effectors | |
| CN117043326A (en) | Compositions comprising variant polypeptides and uses thereof | |
| HK40092826A (en) | Miniaturized cytidine deaminase-containing complex for modifying double-stranded dna |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24826573; Country of ref document: EP; Kind code of ref document: A1 |