WO2025076000A2 - Compositions and methods for modulating chromatin state - Google Patents
- Publication number
- WO2025076000A2 (PCT/US2024/049478)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tet2
- rna
- mbd6
- aspects
- mutations
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/46—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
- C07K14/47—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
- C07K14/4701—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
- C07K14/4702—Regulators; Modulating activity
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K31/00—Medicinal preparations containing organic active ingredients
- A61K31/70—Carbohydrates; Sugars; Derivatives thereof
- A61K31/7088—Compounds having three or more nucleosides or nucleotides
Definitions
- aspects of this invention relate to at least the field of molecular biology and medicine. More particularly, aspects concern at least compositions and methods for modifying, detecting, mapping, and/or evaluating chromatin state for therapeutic purposes.
- TET2 Ten-eleven translocation enzyme 2
- TET2 protein pathway associated genes are frequently disrupted in various diseases (1-3), such as human cancers, and have been shown to drive myeloid malignancy initiation and progression. TET2 deficiency has been shown to result in globally opened chromatin and activation of genes contributing to aberrant hematopoietic stem cell self-renewal (4,5).
- the inventors have characterized upstream writers (e.g., NSUN1 and/or NSUN2), readers (e.g., MBD5 and/or MBD6), and erasers (e.g., TET2) involved in m5C regulation in caRNA.
- upstream writers e.g., NSUN1 and/or NSUN2
- readers e.g., MBD5 and/or MBD6
- erasers e.g., TET2
- genes associated with TET2 pathways such as isocitrate dehydrogenase 1 (IDH1) and/or isocitrate dehydrogenase 2 (IDH2) are mutated in ~77% of low-grade glioma and ~30% of cholangiocarcinoma.
- IDH1/2 mutations are known to produce the oncometabolite R-2HG, which can inhibit TET2 activities.
- the disease comprises cancer of the lung, brain, breast, blood, skin, pancreas, liver, colon, head and neck, kidney, thyroid, stomach, spleen, gallbladder, bone, ovary, testes, endometrium, prostate, rectum, anus, and/or cervix.
- the disease comprises clonal hematopoiesis of indeterminate potential (CHIP).
- the disease is characterized by atherosclerosis, myocardial fibrosis, and/or heart failure.
- the disease comprises a blood cancer.
- the disease comprises a leukemia.
- the disease comprises a myeloid malignancy.
- diseases for treatment utilizing methods provided here comprise cells with one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich Splicing Factor 2 (SRSF2).
- TET2 ten-eleven translocation
- ASXL1 ASXL transcriptional regulator 1
- IDH1 isocitrate dehydrogenase 1
- administering the one or more MBD6 inhibitors decreases m5C levels in one or more chromatin associated RNA (caRNA) and/or decreases association of the one or more caRNA with a PR-DUB complex in one or more diseased cells in the individual, relative to a control non-diseased cell.
- administering the one or more MBD6 inhibitors increases oxidation of m5C in a caRNA and/or inhibits installation of m5C in the one or more caRNA in one or more diseased cells in the individual.
- one or more caRNA comprises or consists essentially of Long Terminal Repeat (LTR) RNAs.
- one or more caRNA comprises one or more of the caRNAs described in Table 1.
- the polynucleotide comprises RNA with one or more phosphorothioate bonds. In some aspects, the polynucleotide is comprised within a lentiviral particle and/or nanoparticle. In some aspects, the inhibitor of MBD6 comprises more than one polynucleotide. In some aspects, one or more inhibitors of MBD6 comprise a proteolysis targeting chimera.
- [0016] Also provided herein are methods of promoting histone ubiquitination in a cell comprising contacting the cell with one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6). In some aspects, promoting histone ubiquitination comprises or consists of promoting ubiquitination on H2A.
- MBD6 methyl-CpG-binding domain protein 6
- Aspect 2 is the method of aspect 1, wherein the disease comprises cancer of the lung, brain, breast, blood, skin, pancreas, liver, colon, head and neck, kidney, thyroid, stomach, spleen, gallbladder, bone, ovary, testes, endometrium, prostate, rectum, anus, and/or cervix.
- Aspect 3 is the method of aspect 1, wherein the disease comprises clonal hematopoiesis of indeterminate potential (CHIP).
- Aspect 4 is the method of aspect 3, wherein the disease is characterized by atherosclerosis, myocardial fibrosis, and/or heart failure.
- Aspect 11 is the method of aspect 10, wherein the glioma comprises glioblastoma.
- Aspect 12 is the method of any one of aspects 1 to 11, comprising reducing proliferation of a cancer and/or pre-cancerous cell.
- Aspect 18 is the method of aspect 17, wherein the one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprises one or more gain of function mutations.
- Aspect 19 is the method of aspect 13, wherein the disease is associated with diseased cells with one or more mutations in a TET2 encoding gene.
- Aspect 20 is the method of aspect 19, wherein the one or more mutations in a TET2 encoding gene comprises one or more loss of function mutations.
- Aspect 21 is the method of aspect 20, wherein the disease is associated with diseased cells comprising one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2.
- Aspect 22 is the method of any one of aspects 1 to 21, wherein the individual is administered an additional therapy.
- Aspect 23 is the method of aspect 22, wherein the additional therapy is surgery, radiation, chemotherapy, hormone therapy, and/or immunotherapy.
- Aspect 24 is the method of aspect 22 or 23, wherein the additional therapy comprises administration of an inhibitor of TET2.
- Aspect 30 is the method of any one of aspects 1 to 29, wherein the disease is associated with diseased cells characterized as comprising an open chromatin state relative to non-diseased cells of the same developmental lineage.
- Aspect 31 is the method of any one of aspects 1 to 30, wherein administering the one or more MBD6 inhibitors results in promotion of a closed chromatin state in one or more diseased cells in the individual.
- Aspect 32 is the method of any one of aspects 1 to 31, wherein the disease is characterized as pro-inflammatory.
- Aspect 33 is the method of any one of aspects 1 to 32, comprising inhibition of diseased cell proliferation.
- Aspect 34 is the method of any one of aspects 1 to 33, wherein administering the one or more MBD6 inhibitors decreases m5C levels in one or more chromatin associated RNA (caRNA) and/or decreases association of one or more caRNA with a PR-DUB complex in one or more diseased cells in the individual.
- Aspect 35 is the method of aspect 34, wherein administering the one or more MBD6 inhibitors increases oxidation of m5C in the one or more caRNA and/or inhibits installation of m5C in the one or more caRNA in one or more diseased cells in the individual.
- Aspect 39 is the method of any one of aspects 34 to 38, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 100-614.
- Aspect 40 is the method of any one of aspects 34 to 39, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs 104 or 107.
- FIG.2B Proportion of m5C-marked peaks on repeat RNA and non-repeat RNA regions in WT mESCs (2,951 in non-Repeat RNAs; and 2,673 (47.5%) in Repeat RNAs).
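The percentage quoted for FIG. 2B can be checked from the two peak counts given in the description. The short sketch below only repeats that arithmetic; the variable names are illustrative and are not taken from the source.

```python
# Peak counts as stated in the FIG. 2B description.
non_repeat_peaks = 2951
repeat_peaks = 2673

# Share of each class among all m5C-marked peaks.
total_peaks = non_repeat_peaks + repeat_peaks
repeat_fraction = repeat_peaks / total_peaks
non_repeat_fraction = non_repeat_peaks / total_peaks

print(f"total peaks: {total_peaks}")
print(f"repeat RNA share: {repeat_fraction:.1%}")
print(f"non-repeat RNA share: {non_repeat_fraction:.1%}")
```

The repeat RNA share rounds to the 47.5% stated in the description.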
- FIG.2C Average profile and heatmap depicting the m5C level in WT mESCs, along with the corresponding ATAC-seq signal in WT and Tet2 KO mESC cells. The regions analyzed were m5C-marked peaks located within repeat RNA regions.
- FIG. 2D Dot plot displaying m5C enrichment in various repeat RNA families (e.g., Long Terminal Repeat (LTR), Long Interspersed Nuclear Element (LINE), and Short Interspersed Nuclear Element (SINE)).
- LTR Long Terminal Repeat
- LINE Long Interspersed Nuclear Element
- SINE Short Interspersed Nuclear Element
- FIG.2E The relative m5C methylation level of the Intracisternal A-particle (IAP) and L1 RNAs in WT, Tet2 KO and Pspc1 KO mESCs. P values were determined using unpaired Student’s t test with Welch’s correction.
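For readers unfamiliar with the unpaired t test with Welch's correction named above, the following is a minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom. The function name and the sample values are hypothetical and are not data from the source; the P value itself needs a t-distribution CDF from a statistics library and is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for an unpaired t test with Welch's correction.

    Uses the sample variances (n - 1 denominator) and does not assume
    equal variances between the two groups.
    """
    vx = variance(x) / len(x)  # squared standard error of group x
    vy = variance(y) / len(y)  # squared standard error of group y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical relative methylation levels for two groups of replicates.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.3f}")
```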
- FIG. 3C Depicts the average ATAC signals in WT versus Tet2 KO, and WT versus Pspc1 KO mESCs at DNA hypermethylated regions (detected in Tet2 KO mESCs).
- FIG. 3D Displays bar graphs illustrating the overlapping ratios (Jaccard index) of the genomic binding sites of PSPC1, TET2, and CXXC5 with DNA hypomethylated regions (detected in Tet2 knockout mESCs).
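The overlapping ratio (Jaccard index) in FIG. 3D compares two sets of genomic regions. The source does not say how the index was computed; the sketch below assumes a base-pair definition (length of the intersection divided by length of the union) over half-open `(chrom, start, end)` intervals. The function names and coordinates are hypothetical.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged


def total_length(merged):
    """Total number of bases covered by a merged interval set."""
    return sum(end - start for spans in merged.values() for start, end in spans)


def jaccard(set_a, set_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    a, b = merge(set_a), merge(set_b)
    shared = 0
    for chrom in a.keys() & b.keys():
        i = j = 0
        xa, xb = a[chrom], b[chrom]
        # Sweep both sorted lists, accumulating overlap lengths.
        while i < len(xa) and j < len(xb):
            lo = max(xa[i][0], xb[j][0])
            hi = min(xa[i][1], xb[j][1])
            if lo < hi:
                shared += hi - lo
            if xa[i][1] < xb[j][1]:
                i += 1
            else:
                j += 1
    union = total_length(a) + total_length(b) - shared
    return shared / union if union else 0.0


# Hypothetical binding sites and differentially methylated regions.
binding_sites = [("chr1", 100, 200), ("chr1", 300, 400)]
methylation_regions = [("chr1", 150, 350)]
print(jaccard(binding_sites, methylation_regions))
```

In this toy example the two sets share 100 bases out of a 300-base union, so the index is one third.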
- FIG.3E Measurements of the average histone modification levels, as well as input signals, in WT and Tet2 KO mESCs at DNA hypomethylated regions (detected in Tet2 KO mESCs).
- DOX doxycycline
- FIG.4G Volcano plot depicting the log2FC of the H2AK119ub levels comparing Tet2 -/- with WT mLK cells at repeat regions, with IAPEz-int regions showing an overall decreased H2AK119ub level upon Tet2 KO in mLK cells.
- FIG.4H Depicts results of colony formation of replating assays of WT and Tet2 -/- mLK cells transfected with control (shNC) or Mbd6 shRNA (shMbd6) plasmids, and then cultured in the presence of indicated cytokines (see Methods). Colony counts were scored every 7 days. Numbers of colonies after each replating were quantified.
- FIG.4I Flow cytometry analyses of WT and Tet2 -/- mLKs transfected with shNC or shMbd6 plasmids.
- FIG.5A Proliferation curves showing that MBD6 knockdown significantly attenuated the proliferation of the TET2 mutant cell line SKM-1.
- MBD6 knockdown by two individual siRNAs siMBD6-1 and siMBD6-2, respectively
- THP-1, K-562, and SKM-1 Three different human leukemia cells
- siNC Non-targeting siRNA
- FIG. 5F The average profile of H2AK119ub around H2AK119ub peak center and flanking 2.5 kb regions in control versus MBD6 KD, and WT versus TET2 KO K-562 cells.
- FIG.5G Depicts a cumulative curve of the log2FC of H2AK119ub between TET2 KO versus WT, and between MBD6 KD versus control K-562 cells at repeat loci. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test.
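The quantities named in this figure description (a log2 fold change, a cumulative curve, and a Wilcoxon-Mann-Whitney comparison) can be illustrated with a small sketch. The pseudocount, the function names, and the numbers are assumptions made for illustration and do not come from the source. Only the U statistic is computed; a P value would normally come from a statistics library.

```python
from math import log2


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two signal values; the pseudocount (an assumption) avoids division by zero."""
    return log2((treated + pseudocount) / (control + pseudocount))


def ecdf(values):
    """Points of an empirical cumulative curve: (value, fraction of values <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def mann_whitney_u(x, y):
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical per-locus H2AK119ub signals (knockout, wild type).
loci = [(3.0, 7.0), (1.0, 3.0), (5.0, 5.0)]
fold_changes = [log2_fold_change(ko, wt) for ko, wt in loci]
for value, fraction in ecdf(fold_changes):
    print(f"log2FC <= {value:+.2f}: {fraction:.2f}")
```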
- FIG. 5H Dot plot displaying the m5C enrichment at various repeat families. The size of each dot corresponded to the number of loci that exhibited m5C methylation in WT K-562 cells.
- FIG. 7C Average profile and heatmap depicting the m5C levels in WT mESCs, along with the ATAC-seq signal in WT and Tet2 KO mESC cells. The regions analyzed were m5C-marked peaks.
- FIG.10A Provided are schematics showing the design of a CRISPR tethering system of the TET2 catalytic domain (TET2(CD)) to IAP RNA (e.g., as denoted in FIG.10A).
- TET2(CD) TET2 catalytic domain
- IAP RNA Intracisternal A-particle RNA
- mESCs stably expressed fusion proteins dCas13b-TET2(CD) or a catalytically dead version (CD with mutations in the HxD motif) (dCas13b-TET2(CD)-HxDmut) thereof, and a guide RNA sequence downstream of a Tet operator (TetO)-controlled H1 promoter (H1-2O2).
- TetO Tet operator
- FIG.12B Volcano plot depicting the log2FC of H2AK119ub in Tet2 KO versus WT mESCs at the repeat family loci, with corresponding statistical significance. P values for the comparison of expression levels between two groups were obtained using the Wald test by DESeq2.
- FIG.12C H2AK119ub at the IAP loci showed faster response to TET inhibition than H3K27me3. H2AK119ub and H3K27me3 chromatin bindings at IAP (left), MERVL (middle), and LINE (right) loci were measured through the CUT&Tag procedure followed by qPCR with loci-specific primers.
- FIG. 14E RT-qPCR analysis of the IAP RNA stability upon Tet2 KO or Mbd6 knockdown in mESCs.
- FIG. 18C The correlation of gene expression log2FC between Tet2 KO versus WT and ASO targeting IAP versus control in mLK cells. The genes were associated with various signaling pathways, including MAPK signaling pathways (top) and C-type lectin receptor signaling pathways (bottom).
- FIGs. 19A-19E MBD6 knockdown potently inhibited TET2-deficient leukemia cells.
- FIG. 19A Proliferation curves showing that MBD6 knockdown attenuated leukemia cell proliferation.
- siMBD6-1 and siMBD6-2 Knockdown of MBD6 by two individual siRNAs (siMBD6-1 and siMBD6-2) was performed in human leukemia cells (TF-1 and OCI-AML3). Cell proliferation was monitored with an MTS assay at different time points post viral transduction (24, 48, 72, and 96 hours). MTS signals at each time point were normalized to those at 24 hours to yield relative MTS signals.
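The normalization described for the MTS assay divides each reading by the 24-hour reading of the same sample. A minimal sketch follows; the function name and the absorbance values are hypothetical and are not measurements from the source.

```python
def relative_mts(signals_by_hour, baseline_hour=24):
    """Divide each MTS signal by the signal at the baseline time point."""
    baseline = signals_by_hour[baseline_hour]
    if baseline == 0:
        raise ValueError("baseline signal must be non-zero")
    return {hour: signal / baseline for hour, signal in sorted(signals_by_hour.items())}


# Hypothetical raw MTS readings for one sample, keyed by hours post transduction.
raw_signals = {24: 0.5, 48: 1.0, 72: 1.5, 96: 2.0}
print(relative_mts(raw_signals))
```

By construction the 24-hour value becomes 1.0, so curves from different samples start from the same point and differ only in growth.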
- FIGs.20A-20C MBD6 knockdown attenuated leukemia progression in mouse PDX models regardless of leukemia cell TET2 mutational status.
- FIG. 20A NSG mice were intravenously injected with K-562 (FIG. 20B) or THP-1 (FIG.
- FIG.21D Shows the negative correlation of repeat RNAs abundance log2FC when comparing TET2 KO with WT K-562 cells and when comparing MBD6 KD with control in TET2 KO K-562 cells.
- FIG. 21E Shows a scatter plot demonstrating the positive correlation between repeat RNAs abundance log2FC and their target gene expression log2FC when comparing TET2 KO versus WT in K-562 cells.
- FIG.21F Boxplots display the log2FC of carRNA abundance between TET2 KO versus WT, and between MBD6 KD versus control in TET2 KO K-562 cells.
- FIG.22C Displays cumulative curves of the log2FC of H2AK119ub between TET2 KO versus WT, and between MBD6 KD versus control K-562 cells at different genomic loci including enhancers (bottom) and promoters (top). P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test.
- FIG. 22D Shows the correlation between log2FC of repeat RNAs abundance (TET2 KO vs. WT) and log2FC of H2AK119ub at the corresponding genomic loci (TET2 KO vs. WT, top; MBD6 KD vs. control, bottom) in K-562 cells.
- FIG.22E Shows the correlation between the number of m5C methylated repeat RNAs and their respective m5C methylation levels in WT K-562 cells.
- LTR repeat RNAs HERVH-int ranked top in terms of methylation enrichment, while LTR12C had the highest levels of methylation.
- FIGs. 23A-23C MBD6 deficiency resulted in downregulation of leukemia-related genes that were generally upregulated in TET2 depletion models.
- FIG.23A Venn diagram illustrating the overlap of differentially expressed genes upon TET2 KO in control K-562 cells, and upon MBD6 KD in TET2 KO K-562 cells.
- FIG.24A Heatmap showing the proliferation of various cancer cell lines with shRNA-based knockdown of TET2 and/or MBD6.
- H3K27me3 did not show a global change upon TET2 knockout (left panel). H3K27me3 peaks were then categorized into two subgroups according to whether or not they were overlapped with H2AK119ub (middle and right panels respectively). The results showed a dramatic reduction in H3K27me3 modification level upon TET2 knockout when these peaks were co-occupied with H2AK119ub (right panel). Similar changes were not observed at genomic regions exclusively marked with H3K27me3 modifications.
- [0127] FIG. 26 Exemplary model of TET2, NSUN2, MBD6, and PR-DUB interactions with H2A.
- DETAILED DESCRIPTION I.
- compositions and methods for their use can “comprise,” “consist essentially of,” or “consist of” any of the ingredients or steps disclosed throughout the specification.
- compositions and methods “consisting essentially of” any of the ingredients or steps disclosed limits the scope of the claim to the specified materials or steps which do not materially affect the basic and novel characteristic of the claimed invention.
- sequence as used herein in reference to a polynucleotide refers to the nucleotide sequence such as “A” for adenosine, “G” for guanine, “C” for cytosine, “T” for thymine, “U” for uracil, “I” for inosine, and “N” for “A”/“C”/“U”/“T”/“G”/“I”.
- a protein is specifically mentioned herein, it is in general a reference to a native (wild-type) or recombinant (modified) protein or, optionally, a protein in which any signal sequence has been removed.
- the protein may be isolated directly from the organism of which it is native, produced by recombinant DNA/exogenous expression methods, or produced by solid phase peptide synthesis (SPPS) or other in vitro methods.
- SPPS solid phase peptide synthesis
- nucleic acid segments and recombinant vectors incorporating nucleic acid sequences that encode a polypeptide e.g., an enzymatic domain, such as a deaminase domain, or a fragment thereof.
- polynucleotide oligonucleotides (e.g., nucleic acids typically 200 residues or less, or 100 residues or less in length), recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like.
- Polynucleotides include, in certain aspects, regulatory sequences, isolated substantially away from their naturally occurring genes or protein encoding sequences. Polynucleotides may be single-stranded (coding or antisense) or double-stranded, and may be RNA, DNA (genomic, cDNA or synthetic), analogs thereof, or a combination thereof.
- CRISPR/Cas system ancillary components linking elements, targeting RNAs, etc.
- polypeptide and/or functional RNA species may be encoded by nucleic acids containing variations having slightly different nucleic acid sequences but, nonetheless, encode the same or substantially similar protein and/or RNA species.
- polynucleotide variants having substantial identity to the sequences disclosed herein; those comprising at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or higher sequence identity, including all values and ranges there between, compared to a polynucleotide sequence provided herein using the methods known in the art and/or described herein (e.g., BLAST analysis using standard parameters).
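To make the percent-identity thresholds above concrete, the sketch below scores two sequences that have already been aligned to equal length, with `-` marking gaps. This is a simplification and not the BLAST procedure mentioned in the text: BLAST finds the alignment itself and reports identity over the aligned region. The function names and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Both inputs must be pre-aligned to the same length; '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=90.0):
    """True if the identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-nucleotide RNA sequences differing at one position.
reference = "ACGUACGUAC"
variant = "ACGUACGUAG"
print(percent_identity(reference, variant), meets_threshold(reference, variant))
```

How gap columns are counted changes the result, so any real comparison should state the alignment tool and parameters used.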
- nucleic acid fragments of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant nucleic acid protocol.
- a nucleic acid sequence may encode a polypeptide sequence with additional heterologous coding sequences, for example to allow for purification of the polypeptide, transport, secretion, post-translational modification, or for therapeutic benefits such as targeting or efficacy.
- a tag or other heterologous polypeptide may be added to the modified polypeptide-encoding sequence, wherein “heterologous” refers to a polypeptide that is not the same as the modified polypeptide.
- the one or more genetic mutations result in an abnormal phenotype relative to cells that do not comprise the one or more mutations (e.g. "healthy cells").
- Abnormal phenotypes may include reduced or increased fitness, such as phenotypes associated with cancerous cells.
- a diseased cell comprises one or more mutations in one or more oncogenes and/or tumor suppressor genes.
- compositions and kits can be used to achieve methods disclosed herein.
- Any method in the context of a therapeutic, diagnostic, or physiologic purpose or effect may also be described in “use” claim language such as “Use of” any compound, composition, or agent discussed herein for achieving or implementing a described therapeutic, diagnostic, or physiologic purpose or effect.
- II. Overview
- [0149] Chromatin-associated regulatory RNAs can offer a platform for dynamic chromatin regulation.
- an epigenetic regulatory pathway that can comprise, but is not limited to, chromatin associated RNA (caRNA), Ten-eleven translocation enzyme 2 (TET2), NOP2/Sun RNA methyltransferase 2 (NSUN2), methyl-CpG binding domain protein 6 (MBD6), and the Polycomb repressive complexes (e.g., comprising PRC1, PRC2, and/or PR-DUB), which components can interact to influence chromatin state through histone modifications (see e.g., FIG. 26), and as such, influence the transcriptome and proteome of cells.
- caRNA chromatin associated RNA
- TET2 Ten-eleven translocation enzyme 2
- NSUN2 NOP2/Sun RNA methyltransferase 2
- MBD6 methyl-CpG binding domain protein 6
- Polycomb repressive complexes e.g., comprising PRC1, PRC2, and/or PR-DUB
- TET2 deficiency has also been described as leading to a globally opened chromatin state (e.g., euchromatin) and aberrant activation of genes, contributing to aberrant cell self-renewal (e.g., aberrant hematopoietic stem cell (HSC) self-renewal).
- chromatin state e.g., euchromatin
- HSC hematopoietic stem cell
- HPSCs hematopoietic progenitor and stem cells
- chromatin-associated RNA e.g., chromatin-associated retrotransposon RNA
- m5C 5-methylcytosine
- MBD6 methyl-CpG-binding domain protein 6
- TET2 can oxidize m5C and antagonize this caRNA m5C associated MBD6-dependent H2AK119ub deubiquitylation.
- TET2 depletion can lead to globally decreased H2AK119ub, more open chromatin, and increased/aberrant transcription in cells, such as stem cells.
- methods of treatment of diseases associated with diseased cells with TET2 mutations can comprise inhibition of MBD6.
- TET2 mutant diseases e.g., TET2 mutant human leukemias
- TET2 and MBD6 can result in a synergistically lethal effect upon diseased cells (e.g., cancer cells).
- inhibiting MBD6 protein activity can selectively inhibit proliferation of TET2 mutant cells.
- technologies provided herein can facilitate greater than or equal to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%
- the polypeptides of the disclosure may include at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 11
- the substitution is with an alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, or valine.
- the protein or polypeptide, or polynucleotide encoding the same may comprise amino acids or nucleotides 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- the protein, polypeptide, or nucleic acid may comprise, comprise at least, or comprise at most 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108
- the polypeptide, protein, or nucleic acid may comprise at least, at most, or exactly 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
- MPNs myeloproliferative neoplasms
- MDSs myelodysplastic syndromes
- AML acute myeloid leukemia
- somatic alterations in genes that regulate the epigenetic state of hematopoietic cells are a common pathogenetic event in leukemogenesis.
- methods of treating diseases associated with somatic alterations in genes that regulate the epigenetic state of hematopoietic cells such as but not limited to hematopoietic stem cells.
- CHIP has also been reported to be a significant risk factor for cardiovascular diseases, related in part to hyper-inflammatory progeny macrophages that carry TET2 inactivating mutations. Therefore, somatic TET2 mutations, and mutations in TET2 pathways, can contribute to myeloid expansion and innate immune dysregulation with age and can contribute to prevalent diseases in the developed world (e.g., cancer and/or cardiovascular disease).
- MBD5 is a member of the methyl-CpG-binding domain (MBD) family.
- the MBD domain consists of approximately 70 residues (e.g., ~65 to ~75 residues) and is the minimal region required for a methyl-CpG-binding protein binding specifically to methylated DNA.
- MBD5 protein contains a PWWP domain (Pro-Trp-Trp-Pro motif), which consists of 100-150 amino acids and is found in numerous proteins that are involved in cell division, growth and differentiation. Mutations in this gene have been reported to result in an autosomal dominant type of cognitive disability.
- the MBD5 protein has been reported to interact with the polycomb repressive complex PR-DUB which catalyzes the deubiquitination of a lysine residue of histone 2A.
- MBD6 has been reported to enable chromatin binding activity, and has been found to be located in the chromocenter, the fibrillar center, and the nucleoplasm. Mutations in MBD6 have been reported to be implicated in autism spectrum disorder.
- MBD6 functions as an m5C RNA reader protein and/or reader complex component (e.g., as a guide/reader for the PR-DUB complex).
- Exemplary human wildtype MBD6 protein isoforms or polypeptides derived therefrom, and encoding gene are provided as SEQ ID NOs: 1-4. Additional exemplary human wildtype MBD6 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 114785.
- methods disclosed herein comprise, consist essentially of, or consist of inhibition of MBD6.
- methods disclosed herein comprise administering one or more inhibitors of MBD6.
- the one or more inhibitors of MBD6 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding MBD6 (e.g., a short hairpin RNA and/or small interfering RNA).
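As an illustration of what "at least partially complementary" can mean for an inhibitory RNA, the sketch below computes the fraction of positions at which a guide strand forms Watson-Crick pairs with an equal-length target site read in the antiparallel direction. The function names and sequences are hypothetical and are not derived from MBD6 or any sequence in the source, and G-U wobble pairing is ignored for simplicity.

```python
# Watson-Crick pairing partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def complementarity(guide, target_site):
    """Fraction of guide positions that pair with the target site (antiparallel)."""
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have equal length")
    if not guide:
        return 0.0
    fully_paired_guide = reverse_complement(target_site)
    paired = sum(1 for g, p in zip(guide.upper(), fully_paired_guide) if g == p)
    return paired / len(guide)


# Hypothetical 4-nucleotide target site and two candidate guides.
target = "AUGC"
print(reverse_complement(target))       # guide that pairs at every position
print(complementarity("GCAU", target))  # fully complementary
print(complementarity("GCAA", target))  # one mismatch
```

A value of 1.0 corresponds to full complementarity, and anything between 0 and 1 to partial complementarity under these simplified pairing rules.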
- the one or more inhibitors of MBD6 comprise a proteolysis targeting chimera (PROTAC) targeting MBD6.
- PROTAC proteolysis targeting chimera
- methods disclosed herein comprise, consist essentially of, or consist of inhibition of NSUN1.
- methods disclosed herein comprise administering one or more inhibitors of NSUN1.
- the one or more inhibitors of NSUN1 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding NSUN1 (e.g., a short hairpin RNA and/or small interfering RNA).
- the one or more inhibitors of NSUN1 comprise a proteolysis targeting chimera (PROTAC) targeting NSUN1.
- Exemplary human wildtype NSUN2 protein isoforms and encoding genes are provided as SEQ ID NOs: 11-14. Additional exemplary human wildtype NSUN2 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 54888.
- methods disclosed herein comprise, consist essentially of, or consist of inhibition of NSUN2.
- methods disclosed herein comprise administering one or more inhibitors of NSUN2.
- the one or more inhibitors of NSUN2 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding NSUN2 (e.g., a short hairpin RNA and/or small interfering RNA).
- the one or more inhibitors of NSUN2 comprise a proteolysis targeting chimera (PROTAC) targeting NSUN2.
- TET2 is a methylcytosine dioxygenase that catalyzes the conversion of methylcytosine to 5-hydroxymethylcytosine.
- the TET2 protein has been reported to be involved in myelopoiesis, and defects in this gene have been reported to be associated with several myeloproliferative disorders. Two variants encoding different isoforms have been found for this gene.
- TET2 functions as an m5C RNA eraser protein and/or eraser complex component.
- Exemplary human wildtype TET2 protein isoforms, and polypeptides derived therefrom, and encoding genes are provided as SEQ ID NOs: 15-20. Additional exemplary human wildtype TET2 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 54790.
- methods disclosed herein comprise, consist essentially of, or consist of inhibition of TET2.
- methods disclosed herein comprise administering one or more inhibitors of TET2.
- the one or more inhibitors of TET2 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding TET2 (e.g., a short hairpin RNA and/or small interfering RNA).
- RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1.
- polypeptides may be mutated by truncation, rendering them shorter than their corresponding wild-type form. Also or alternatively, they may be altered by fusing or conjugating a heterologous protein or polypeptide sequence with a particular function (e.g., for targeting or localization, for enhanced immunogenicity, for purification purposes, etc.).
- proteins capable of RNA m 5 C installation, reading/binding, and/or erasing, or polypeptides derived therefrom are guided to a target RNA molecule (such as but not limited to, an RNA molecule comprising a m 5 C RNA feature) by a targeting element comprising a protein, polypeptide, and/or RNA molecule.
- a targeting element comprising a protein and/or polypeptide can be guided to a target RNA element by a complementary, or at least partially complementary, RNA molecule.
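The notion of a targeting RNA that is complementary, or at least partially complementary, to a target RNA can be illustrated with a minimal Python sketch. The function names and the example sequences are hypothetical and are not taken from the disclosure; the sketch assumes simple Watson-Crick pairing between antiparallel strands of equal length and ignores wobble pairs and secondary structure.

```python
# Minimal sketch (hypothetical helper names): score how complementary a
# guide RNA is to a target RNA site, assuming Watson-Crick pairing only.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def fraction_complementary(guide: str, target_site: str) -> float:
    """Fraction of guide positions that pair with the antiparallel target site."""
    if not guide or len(guide) != len(target_site):
        raise ValueError("guide and target site must be non-empty and equal in length")
    ideal_guide = reverse_complement(target_site)
    matches = sum(g == i for g, i in zip(guide.upper(), ideal_guide))
    return matches / len(guide)


target = "AUGGCUACGU"                        # hypothetical target site
perfect_guide = reverse_complement(target)   # fully complementary guide
partial_guide = perfect_guide[:-1] + "A"     # one mismatch at the last position
```

A fully complementary guide scores 1.0 in this sketch, and a guide with a single mismatch over ten positions scores 0.9, which is one way of expressing "at least partially complementary" as a number.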
- a targeting element comprises a Cas protein.
- a Cas protein can be, or can be derived from, a type I, type II, type III, type IV, type V, and/or type VI CRISPR system.
- a targeting element such as a targeting protein, can be linked to a protein (or polypeptide derived therefrom) capable of RNA m 5 C installation, reading (e.g., binding), and/or erasing (e.g., oxidation).
- SEQ ID NO: 25 Exemplary catalytically dead Cas13d (dCas13d) amino acid sequence NIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLWFHPVMSHLYNAKN GYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLK MYRDLTNAYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRNMNERYGYKTEDLAFIQDK RFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRL PIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELFTTLSA EKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHVNMGKLRYLLKADKT
- one or more (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) features are used to classify (e.g., diagnose) a disease state and/or identify one or more effective treatment options for a patient with a disease associated with aberrant transcription (e.g., aberrant transcription associated with increased euchromatin).
- one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with cancer.
- one or more features (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a blood cancer.
- one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a leukemia. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a myeloid malignancy. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with acute myeloid leukemia. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with chronic myelomonocytic leukemia.
- one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a glioma. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with glioblastoma.
- one or more (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient diagnosed with a pre-cancerous condition, such as but not limited to, clonal hematopoiesis of indeterminate potential (CHIP).
- m 5 C RNA modification features comprise, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
- caRNAs of particular interest as m 5 C RNA features are described herein in Table 1 as SEQ ID NOs: 100-614 (e.g., SEQ ID NOs: 104 and 107 in Table 2).
- caRNAs of interest as m 5 C RNA features comprise, consist essentially of, or consist of HERVH-int repeats.
- caRNAs of interest as m 5 C RNA features comprise, consist essentially of, or consist of HERVH-int repeats identified in Table 1 (shaded rows) on Chr1, Chr2, Chr3, Chr4, Chr8, Chr11, Chr212, and ChrX.
- caRNAs of interest as m 5 C RNA features in leukemia cells comprise, consist essentially of, or consist of any one or more of HERVH-int repeats: chr1:22997913-23003991;-;HERVH-int,LTR,ERV1;2,7713,(0)
- technologies described herein may comprise increasing one or more m 5 C marks at one or more loci, such as but not limited to, increasing m 5 C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein,
- technologies described herein may comprise decreasing one or more m 5 C marks at one or more loci, such as but not limited to, decreasing m 5 C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein,
- technologies described herein may comprise decreasing recognition (e.g., reading) of one or more m 5 C marks at one or more loci by one or more complexes comprising deubiquitination activity (e.g., PR-DUB, PRC1, PRC2, etc.), such as but not limited to, decreasing recognition of m 5 C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400
- RNAs that may guide a protein and/or ribonucleoprotein complex to a target RNA of interest.
- a targeting RNA is driven by a promoter comprising a Polymerase III promoter (i.e., a promoter that can drive Pol III mediated transcription).
- transcription of a targeting RNA is driven by one or more U6 promoters.
- a targeting RNA is transcribed by Polymerase III.
- a targeting RNA is driven by a promoter comprising a Polymerase II promoter (i.e., a promoter that can drive Pol II mediated transcription).
- RNA species that mediate greater than or equal to about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 7
- Vectors comprising polynucleotide constructs according to the present disclosure include all those known in the art, including cosmids, plasmids (e.g., naked or contained in liposomes) and viral constructs (e.g., lentiviral, retroviral, adenoviral, and adeno-associated viral constructs) that incorporate a polynucleotide encoding a polypeptide with m 5 C RNA writer, reader, and/or eraser functionality and/or targeting element described herein, or characteristic portions thereof (e.g., as utilized herein, a "characteristic portion thereof" refers to the portion of said protein required to perform the desired function, e.g., it comprises the ability to impact m 5 C activity and/or levels, for example, a polypeptide with m 5 C RNA writer, reader, and/or eraser functionality, or the ability to inhibit the same, in a site-specific or non-site-specific manner).
- a construct is a plasmid (i.e., a circular DNA molecule that can autonomously replicate inside a cell).
- a construct can be a cosmid (e.g., pWE or sCos series).
- a construct is a viral construct.
- a viral construct is a lentivirus, retrovirus, adenovirus, or adeno-associated virus construct.
- a construct is an adeno-associated virus (AAV) construct (see, e.g., Asokan et al., Mol. Ther. 20:699-708, 2012, which is incorporated herein by reference for the purposes described herein).
- a viral construct is an adenovirus construct.
- a viral construct may also be based on or derived from an alphavirus.
- a construct is a plasmid and can have a total length in a range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 1 kb to about 11 kb, about 1 kb to about 12 kb, about 1 kb to about 13 kb, about 1 kb to about 14 kb, or about 1 kb to about 15 kb.
- a construct is a viral construct and can have a total number of nucleotides of up to 10 kb. In some aspects, a viral construct can have a total number of nucleotides in the range of about 4.5 kb to 5 kb, or about 4.7 kb.
- a viral construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 2 kb to about 9 kb, about 2 kb to about 10 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 3 kb to
- a construct is a lentivirus construct and can have a total number of nucleotides of up to 8 kb.
- a lentivirus construct can have a total number of nucleotides of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 2 kb to about 6
- a construct is an adenovirus construct and can have a total number of nucleotides of up to 8 kb.
- an adenovirus construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 2 kb
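The upper size bounds recited above (plasmid ranges reaching about 15 kb, viral constructs up to 10 kb, lentivirus and adenovirus constructs up to 8 kb) can be sketched as a simple length check in Python. The dictionary and function names are hypothetical, and treating the largest recited value as a hard limit is an assumption made only for illustration.

```python
# Minimal sketch: compare a construct length against the largest size
# recited in the text for each construct type. Names are hypothetical and
# the "hard limit" interpretation is an assumption.
MAX_LENGTH_KB = {
    "plasmid": 15.0,      # ranges recited up to about 15 kb
    "viral": 10.0,        # "up to 10 kb"
    "lentivirus": 8.0,    # "up to 8 kb"
    "adenovirus": 8.0,    # "up to 8 kb"
}


def fits_construct(construct_type: str, length_bp: int) -> bool:
    """Return True if length_bp does not exceed the recited bound for the type."""
    if construct_type not in MAX_LENGTH_KB:
        raise KeyError(f"no recited bound for construct type: {construct_type}")
    return length_bp <= MAX_LENGTH_KB[construct_type] * 1000
```

For example, a 7,500-nucleotide insert would pass the lentivirus check in this sketch, whereas an 8,500-nucleotide insert would fail the adenovirus check.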
- any of the constructs described herein can further include a control sequence, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or additional untranslated regions which may house pre- or post-transcriptional regulatory and/or control elements.
- a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter.
- control sequences are described herein.
- an AAV particle may be described as a pseudotype, wherein the capsid and construct are derived from different AAV strains; for example, AAV2/9 refers to an AAV particle that comprises a construct utilizing the AAV2 ITRs and an AAV9 capsid. Additional examples of pseudotyped AAV vectors include, but are not limited to, AAV2/1, AAV2/2, AAV2/3, AAV2/4, AAV2/5, AAV2/6, AAV2/7, AAV2/8 and AAV2/9. In some aspects, AAV particles suitable for use according to the present disclosure may comprise or be derived from any natural or recombinant AAV serotype.
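The pseudotype naming convention described above (ITR serotype before the slash, capsid serotype after it) can be sketched as a small Python parser. The class and function names are hypothetical, and the sketch only handles the simple numeric "AAVx/y" form used in the examples.

```python
# Minimal sketch: split a pseudotype label such as "AAV2/9" into the
# serotype supplying the ITRs and the serotype supplying the capsid.
# Names are hypothetical; only the numeric "AAVx/y" form is handled.
import re
from typing import NamedTuple


class Pseudotype(NamedTuple):
    itr_serotype: str
    capsid_serotype: str


_PSEUDOTYPE_PATTERN = re.compile(r"^AAV(\d+)/(\d+)$")


def parse_pseudotype(label: str) -> Pseudotype:
    """Parse 'AAV2/9' into Pseudotype('AAV2', 'AAV9')."""
    match = _PSEUDOTYPE_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognized pseudotype label: {label!r}")
    return Pseudotype(f"AAV{match.group(1)}", f"AAV{match.group(2)}")
```

Under this convention, `parse_pseudotype("AAV2/9")` yields AAV2 as the ITR source and AAV9 as the capsid source.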
- an AAV according to the present disclosure can be selected from natural serotypes such as AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, and AAV12; or pseudotypes, chimeras, and variants thereof.
- the term "chimera" when referring to an AAV vector, or a "chimeric AAV vector", refers to an AAV vector which comprises a capsid containing VP1, VP2 and VP3 proteins from at least two different AAV serotypes; or alternatively, which comprises VP1, VP2 and VP3 proteins, at least one of which comprises at least a portion from another AAV serotype.
- an AAV serotype and/or pseudotype according to the present invention is selected from the group comprising or consisting of AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV106.1/hu.37, AAV114.3/hu.40, AAV127.2/hu.41, AAV127.5/hu.42, AAV128.1/hu.43, AAV128.3/hu.44, AAV130.4/hu.48, AAV145.1/hu.53, AAV145.5/hu.54, AAV145.6/hu.55, AAV16.12/hu.11, AAV16.3, A
- the term "surface-bound", when referring to the at least one saccharide, means that said at least one saccharide is bound to and exposed at the outer surface of the AAV vector.
- Suitable examples of saccharides include, but are not limited to, monosaccharides, oligosaccharides, polysaccharides, and derivatives thereof.
- AAV constructs
- the present disclosure provides polynucleotide vectors (e.g., polynucleotide constructs) that comprise a nucleotide sequence encoding a polypeptide with m 5 C RNA writer, reader, and/or eraser functionality and/or targeting element.
- a polynucleotide vector comprising a nucleotide sequence encoding a polypeptide with m 5 C RNA writer, reader, and/or eraser functionality and/or targeting element can be comprised in an AAV capsid to produce an AAV particle (e.g., an AAV particle comprises an AAV construct comprised in an AAV capsid).
- a polynucleotide construct comprises one or more components derived from or modified from a naturally occurring AAV genomic construct.
- Typical AAV2-derived ITR sequences are approximately 145 nucleotides in length. In some aspects, at least or exactly 80% of a typical ITR sequence (e.g., at least or exactly 85%, at least or exactly 90%, at least or exactly 95%, or at least or exactly 100%, etc.) is incorporated into a construct provided herein. The ability to modify these ITR sequences is within the skill of the art. (See, e.g., texts such as Sambrook et al., "Molecular Cloning.
- any of the coding sequences and/or constructs described herein are flanked by 5' and 3' AAV ITR sequences.
- the AAV ITR sequences may be obtained from any known AAV, including presently identified AAV types.
- polynucleotide constructs described in accordance with this disclosure and in a pattern known to the art (see, e.g., Asokan et al., Mol.
- provided constructs comprise an additional optional coding sequence that is a nucleic acid sequence (e.g., inhibitory nucleic acid sequence), heterologous to the construct sequences, which encodes a polypeptide, protein, functional RNA molecule (e.g., miRNA, miRNA inhibitor) or other gene product, of interest.
- a nucleic acid coding sequence is operatively linked to regulatory and/or control components in a manner that permits coding sequence transcription, translation, and/or expression in a cell of a target tissue.
- an unmodified AAV endogenous genome includes two open reading frames, "cap" and "rep," which are flanked by ITRs.
- an AAV construct optionally comprises a promoter, an enhancer, an untranslated region (e.g., a 5' UTR, 3' UTR), a Kozak sequence, an internal ribosomal entry site (IRES), splicing sites (e.g., an acceptor site, a donor site), a polyadenylation site, or any combination thereof.
- a construct is an AAV construct.
- any of the constructs described herein can further include regulatory and/or control sequences, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or any combination thereof.
- a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter.
- Non-limiting examples of control sequences are described herein and others are known in the art.
- promoter refers to a DNA sequence recognized by enzymes/proteins that can promote and/or initiate transcription of an operably linked gene.
- a promoter typically refers to, e.g., a nucleotide sequence to which an RNA polymerase and/or any associated factor binds and from which it can initiate transcription.
- a promoter will generally be one that is able to promote transcription in a mammalian cell.
- a variety of promoters are known in the art, which in some aspects, can be used herein.
- Nonlimiting examples of promoters that can be used herein in some aspects include: human EF1α, human cytomegalovirus (CMV) (US Patent No. 5,168,062, which is incorporated herein by reference for the purposes described herein), human ubiquitin C (UBC), mouse phosphoglycerate kinase 1, polyoma adenovirus, simian virus 40 (SV40), β-globin, β-actin, α-fetoprotein, γ-globin, β-interferon, γ-glutamyl transferase, mouse mammary tumor virus (MMTV), Rous sarcoma virus, rat insulin, glyceraldehyde-3-phosphate dehydrogenase, metallothionein II (MT II), am
- a promoter is the CMV immediate early promoter.
- the promoter is a CAG promoter and/or a CAG/CBA promoter.
- Enhancers
- a construct can include an enhancer sequence.
- the term "enhancer" as used herein refers to a nucleotide sequence that can increase the level of transcription of a nucleic acid encoding a protein and/or RNA molecule of interest (e.g., a polypeptide with m 5 C RNA writer, reader, and/or eraser functionality and/or targeting element), and/or increase or modify the translational efficiency of a transcript following transcription.
- enhancer sequences generally 50-1500 bp in length
- transcription-associated proteins e.g., transcription factors
- an enhancer sequence is found within an intronic sequence. In some aspects, an enhancer sequence is found in a 3' and/or 5' UTR. In some aspects, an enhancer region is found downstream of a coding sequence comprising a transgene and proximal to a polyadenylation sequence. Unlike promoter sequences, enhancer sequences can act at a much larger distance from the transcription start site.
- Non-limiting examples of enhancers include a woodchuck hepatitis virus post-transcriptional regulatory element (WPRE), an RSV enhancer, a CMV enhancer, and/or an SV40 enhancer.
- any of the constructs described herein can include an untranslated region (UTR), such as a 5' UTR or a 3' UTR.
- UTRs of a gene are transcribed but not translated.
- a 5' UTR starts at the transcription start site and continues to the start codon but does not include the start codon.
- a 3' UTR starts immediately following the stop codon and continues until the transcriptional termination signal.
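The UTR boundaries defined above can be sketched as string slicing in Python. The function name, the index convention (0-based, transcript beginning at the transcription start site and ending at the termination signal), and the example transcript are assumptions introduced only for illustration.

```python
# Minimal sketch: split a transcript into 5' UTR, coding sequence, and
# 3' UTR using the boundary definitions in the text. Indices are 0-based;
# the transcript is assumed to begin at the transcription start site.
from typing import Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(transcript: str, start_codon_index: int,
                     stop_codon_end: int) -> Tuple[str, str, str]:
    """Return (five_prime_utr, cds, three_prime_utr).

    start_codon_index: position of the A of the AUG start codon.
    stop_codon_end: position immediately after the stop codon.
    """
    seq = transcript.upper().replace("T", "U")
    cds = seq[start_codon_index:stop_codon_end]
    if not cds.startswith("AUG") or cds[-3:] not in STOP_CODONS:
        raise ValueError("coordinates do not delimit an AUG...stop coding sequence")
    five_prime_utr = seq[:start_codon_index]   # excludes the start codon
    three_prime_utr = seq[stop_codon_end:]     # begins right after the stop codon
    return five_prime_utr, cds, three_prime_utr


example = "GGACC" + "AUGGCUUAA" + "CUUAG"      # hypothetical transcript
utr5, cds, utr3 = split_transcript(example, 5, 14)
```

In this hypothetical transcript the 5' UTR is the five nucleotides before AUG and the 3' UTR is the five nucleotides after the UAA stop codon.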
- the regulatory and/or control features of a UTR can be incorporated into any of the constructs, particles, polynucleotides, compositions, kits, or methods as described herein to enhance or otherwise modulate the expression of a gene.
- Natural 5' UTRs include a sequence that plays a role in translation initiation.
- a 5' UTR can comprise sequences, like Kozak sequences, which are commonly known to be involved in the process by which the ribosome initiates translation of many genes.
- Kozak sequences have the consensus sequence CCR(A/G)CCAUGG, where R is a purine (A or G) three bases upstream of the start codon (AUG), and the start codon is followed by another “G”.
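The consensus described above can be sketched as a regular-expression search in Python. This sketch assumes that "R(A/G)" denotes a single purine position three bases upstream of AUG, so the pattern searched is CC[A/G]CCAUGG; the function name is hypothetical, and real Kozak contexts tolerate more variation than a strict match allows.

```python
# Minimal sketch: locate start codons embedded in a strict match to the
# consensus CC(A/G)CCAUGG. Reading "R(A/G)" as one purine position is an
# assumption; the function name is hypothetical.
import re
from typing import List

KOZAK_PATTERN = re.compile(r"CC[AG]CCAUGG")


def find_kozak_start_codons(mrna: str) -> List[int]:
    """Return 0-based positions of the A in each AUG found in a Kozak context."""
    seq = mrna.upper().replace("T", "U")
    # The AUG begins five bases after the start of the matched consensus.
    return [match.start() + 5 for match in KOZAK_PATTERN.finditer(seq)]
```

For instance, in the hypothetical sequence `GGCCACCAUGGCU` the AUG at position 7 sits in a consensus context, whereas replacing the purine at position -3 with U (`GGCCUCCAUGGCU`) yields no match in this strict sketch.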
- 5' UTRs also form secondary structures that are involved in elongation factor binding.
- a 5' UTR is included in any of the constructs described herein.
- AU-rich elements can be separated into three classes (see e.g., Chen et al., Mol. Cell. Biol. 15:5777-5788, 1995; Chen et al., Mol. Cell Biol. 15:2010-2018, 1995, each of which is incorporated herein by reference for the purposes described herein): Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions. For example, c-Myc and MyoD mRNAs contain class I AREs. Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers.
- GM-CSF and TNF-alpha mRNAs are examples that contain class II AREs.
- Class III AREs are less well defined. These U-rich regions do not contain an AUUUA motif; two well-studied examples of this class are c-Jun and myogenin mRNAs.
- Most proteins binding to the AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA.
- HuR binds to AREs of all the three classes. Engineering the HuR specific binding sites into the 3' UTR of nucleic acid molecules may lead to HuR binding and thus, stabilization of the message in vivo.
- the introduction, removal, or modification of 3' UTR AREs can be used to modulate the stability of an mRNA encoding a gene of interest.
- AREs can be removed or mutated to increase the intracellular stability and thus increase translation and production of a protein of interest.
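The three ARE classes described above can be approximated with a small Python classifier. This is a heuristic sketch only: the function name, the requirement of at least two AUUUA copies for class I, and the 0.6 uridine-fraction cutoff for "U-rich" are assumptions not found in the text, and the class I test does not check that the motifs lie within U-rich regions.

```python
# Heuristic sketch: assign a 3' UTR sequence to ARE class I, II, or III
# from motif content. Thresholds and names are illustrative assumptions.
import re
from typing import Optional


def classify_are(utr: str, min_class_i_copies: int = 2,
                 min_u_fraction: float = 0.6) -> Optional[str]:
    seq = utr.upper().replace("T", "U")
    if not seq:
        return None
    # Class II: two or more overlapping UUAUUUA(U/A)(U/A) nonamers.
    starts = [m.start() for m in re.finditer(r"(?=UUAUUUA[UA][UA])", seq)]
    if any(later - earlier < 9 for earlier, later in zip(starts, starts[1:])):
        return "class II"
    # Class I: several copies of the AUUUA motif (threshold is an assumption).
    pentamer_count = len(re.findall(r"(?=AUUUA)", seq))
    if pentamer_count >= min_class_i_copies:
        return "class I"
    # Class III: U-rich but without any AUUUA motif.
    if pentamer_count == 0 and seq.count("U") / len(seq) >= min_u_fraction:
        return "class III"
    return None
```

A classifier of this kind could be run on a candidate 3' UTR before and after removing or mutating AREs to confirm that the intended motifs are gone; it does not predict mRNA stability.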
- non-ARE sequences may be incorporated into the 5' or 3' UTRs.
- introns or portions of intron sequences may be incorporated into the flanking regions of the polynucleotides in any of the constructs, particles, polynucleotides, compositions, kits, and methods provided herein. Incorporation of intronic sequences may increase protein production as well as mRNA levels.
- IRES sequences known to those skilled in the art, including those from, e.g., foot and mouth disease virus (FMDV), encephalomyocarditis virus (EMCV), human rhinovirus (HRV), cricket paralysis virus, human immunodeficiency virus (HIV), hepatitis A virus (HAV), hepatitis C virus (HCV), and poliovirus (PV) (see e.g., Alberts, Molecular Biology of the Cell, Garland Science, 2002; and Hellen et al., Genes Dev. 15(13):1593-612, 2001, each of which is incorporated herein by reference for the purposes described herein).
- an IRES sequence that is incorporated into a construct described herein is the foot and mouth disease virus (FMDV) 2A sequence.
- the Foot and Mouth Disease Virus 2A sequence is a small peptide (approximately 18 amino acids in length) that has been shown to mediate the cleavage of polyproteins (see e.g., Ryan, MD et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; and Halpin et al., Plant Journal 4:453-459, 1999, each of which is incorporated herein by reference for the purposes described herein).
- the cleavage activity of the 2A sequence has previously been demonstrated in artificial systems including plasmids and gene therapy constructs (e.g., AAV and retroviruses) (see e.g., Ryan et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; and Halpin et al., Plant Journal 4:453-459, 1999; de Felipe et al., Gene Therapy 6:198-208, 1999; de Felipe et al., Human Gene Therapy 11:1921-1931, 2000; and Klump et al., Gene Therapy 8:811-817, 2001, each of which is incorporated herein by reference for the purposes described herein).
- a construct provided herein can include a polyadenylation (poly(A)) signal sequence.
- a poly(A) tail confers mRNA stability and transferability (see e.g., Molecular Biology of the Cell, Third Edition by B.
- polyadenylation refers to the covalent linkage of a polyadenylyl moiety, or its modified variant, to a messenger RNA molecule. In eukaryotic organisms, most messenger RNA (mRNA) molecules are polyadenylated at the 3' end.
- a "poly(A) signal sequence" or "polyadenylation signal sequence" is a sequence that triggers the endonuclease cleavage of an mRNA and the addition of a series of adenosines to the 3' end of the cleaved mRNA.
- There are several poly(A) signal sequences that can be used in some aspects, including those derived from bovine growth hormone (bGH) (Woychik et al., Proc. Natl. Acad. Sci. U.S.A. 81(13):3944-3948, 1984; U.S. Patent No. 5,122,458, each of which is incorporated herein by reference for the purposes described herein)
- other poly(A) signal sequences that can be used include those derived from mouse β-globin, mouse α-globin, human collagen, polyoma virus, the Herpes simplex virus thymidine kinase gene (HSV TK), an IgG heavy-chain gene polyadenylation signal (US 2006/0040354, which is incorporated herein by reference for the purposes described herein), human growth hormone (hGH), and an SV40 poly(A) site, such as the SV40 late and early poly(A) site (see e.g., Schek et al., Mol Cell Biol.
- the poly(A) signal sequence can be AATAAA.
- the AATAAA sequence may be substituted with other hexanucleotide sequences with homology to AATAAA and that are capable of signaling polyadenylation, including ATTAAA, AGTAAA, CATAAA, TATAAA, GATAAA, ACTAAA, AATATA, AAGAAA, AATAAT, AAAAAA, AATGAA, AATCAA, AACAAA, AATCAA, AATAAC, AATAGA, AATTAA, or AATAAG (see, e.g., WO 06/12414, which is incorporated herein by reference for the purposes described herein).
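The hexanucleotide signals listed above can be located in a DNA sequence with a sliding-window scan, sketched here in Python. The function name and the example sequences are hypothetical; the variant set is copied from the list above, and finding a hexamer in a sequence does not by itself show that it functions as a polyadenylation signal.

```python
# Minimal sketch: report positions of the canonical AATAAA hexamer and of
# the variant hexamers listed in the text. Names are hypothetical.
from typing import List, Tuple

CANONICAL_SIGNAL = "AATAAA"
VARIANT_SIGNALS = {
    "ATTAAA", "AGTAAA", "CATAAA", "TATAAA", "GATAAA", "ACTAAA",
    "AATATA", "AAGAAA", "AATAAT", "AAAAAA", "AATGAA", "AATCAA",
    "AACAAA", "AATAAC", "AATAGA", "AATTAA", "AATAAG",
}


def find_polya_signals(dna: str) -> List[Tuple[int, str, bool]]:
    """Return (0-based position, hexamer, is_canonical) for each hit."""
    seq = dna.upper()
    hits = []
    for i in range(len(seq) - 5):          # every 6-nt window, overlaps allowed
        hexamer = seq[i:i + 6]
        if hexamer == CANONICAL_SIGNAL or hexamer in VARIANT_SIGNALS:
            hits.append((i, hexamer, hexamer == CANONICAL_SIGNAL))
    return hits
```

Scanning the hypothetical sequence `GCGCAATAAAGCGC` reports a single canonical hit at position 4, and `GCGCATTAAAGCGC` reports the ATTAAA variant at the same position.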
- a poly(A) signal sequence can be a synthetic polyadenylation site (see, e.g., the pCI-neo expression construct of Promega that is based on Levitt et al., Genes Dev. 3(7):1019-1025, 1989, which is incorporated herein by reference for the purposes described herein).
- Additional sequences
- constructs of the present disclosure may comprise a 2A element or sequence.
- constructs of the present disclosure may include one or more cloning sites. In some such aspects, cloning sites may not be fully removed prior to manufacturing for administration to a subject.
- cloning sites may have functional roles including as linker sequences, or as portions of a Kozak site. As will be appreciated by those skilled in the art, cloning sites may vary significantly in primary sequence while retaining their desired function.
- a 2A element is a T2A, P2A, E2A, and/or F2A element.
- a 2A sequence may comprise an optional 5' linker sequence, such as but not limited to GSG (e.g., Glycine, Serine, Glycine).
- any of the constructs provided herein can optionally include a sequence encoding a destabilizing domain ("a destabilizing sequence") for temporal and/or spatial control of protein expression.
- destabilizing sequences include sequences encoding an FK506 sequence, a dihydrofolate reductase (DHFR) sequence, or other exemplary destabilizing sequences.
- protein degradation is inhibited, thereby allowing the protein sequence operatively linked to the destabilizing sequence to be actively expressed.
- protein expression can be detected by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry).
- the destabilizing sequence is a FK506- and rapamycin-binding protein (FKBP12) sequence
- the stabilizing ligand is Shield-1 (Shld1)
- a destabilizing sequence is a DHFR sequence
- a stabilizing ligand is trimethoprim (TMP) (see e.g., Iwamoto et al., (2010) Chem Biol 17:981-988, which is incorporated herein by reference for the purposes described herein).
- constructs provided herein can optionally include a sequence encoding a reporter polypeptide and/or protein ("a reporter sequence").
- reporter sequences include DNA sequences encoding: a beta-lactamase, a beta-galactosidase (LacZ), an alkaline phosphatase, a thymidine kinase, a green fluorescent protein (GFP), a red fluorescent protein, an mCherry fluorescent protein, a yellow fluorescent protein, a chloramphenicol acetyltransferase (CAT), and a luciferase. Additional examples of reporter sequences are known in the art.
- when associated with control elements that drive its expression, the reporter sequence can provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry).
- a reporter sequence is a FLAG tag (e.g., a 3xFLAG tag), and the presence of a construct carrying the FLAG tag in a cell is detected by protein binding or detection assays (e.g., Western blots, immunohistochemistry, radioimmunoassay (RIA), mass spectrometry).
- a reporter sequence can be used to verify tissue-specific targeting capabilities and/or tissue-specific promoter regulatory and/or control activity of any of the constructs described herein.
- a therapeutic agent which can be any of the therapeutic agents disclosed herein (for example but not limited to, inhibitors of NSUN1, NSUN2, MBD5, MBD6, and/or TET2, and/or fusion proteins described herein).
- the nanoparticle compositions may encapsulate therapeutic agents, which may be but are not limited to: engineered protein compositions, polynucleotides, and/or small molecules.
- a nanoparticle composition confers water solubility to hydrophobic agents, to combinations of hydrophobic agents, and/or to combinations of hydrophobic and hydrophilic agents.
- a nanoparticle composition comprises a liposomal and/or nano-emulsion composition of a therapeutic agent.
- a nanoparticle composition e.g., a mixed micelle composition, a liposomal composition, solid lipid particles, oil-in-water emulsions, water-in-oil-in-water emulsions, water-in-oil emulsions, oil-in-water-in-oil emulsions, etc.
- the dry weight % of one or more therapeutic agents present in the nanoparticle compositions is equal to, is equal to at least, or is equal to at most: 0.1%, 0.5%, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 22.5%, 25%, 27.5%, 30%, 32.5%, 35%, 37.5%, 40%, 42.5%, 45%, 47.5%, 50%, 52.5%, 55%, 57.5%, 60%, or any range derivable therein.
- the therapeutic agents are provided in an aqueous composition.
- the one or more therapeutic agents are present in the aqueous composition at a concentration of greater than or equal to about: 150 mg/mL, 100 mg/mL, 75 mg/mL, 50 mg/mL, 25 mg/mL, 20 mg/mL, 10 mg/mL, 5 mg/mL, 2.5 mg/mL, 2 mg/mL, 1.5 mg/mL, 1 mg/mL, 0.5 mg/mL, 0.1 mg/mL, 0.05 mg/mL, 0.01 mg/mL, or ranges including and/or spanning the aforementioned values.
- the one or more therapeutic agents, collectively or individually are present in the composition at a dry wt.
- a long chain triglyceride comprises a fatty acid greater than 12 carbons in length (e.g., greater than or equal to 13, 14, 15, 16, 17, 18, 19, or 20 carbons in length, or ranges including and/or spanning the aforementioned values).
- a co-emulsifier component is a single lipid.
- a co-emulsifier component is highly pure.
- a co-emulsifier component has a purity by weight % of equal to or greater than about: 90%, 95%, 97%, 98%, 99%, 100%, or ranges including and/or spanning the aforementioned values.
- a co-emulsifier component is present in the nanoparticle composition at dry weight % of equal to or greater than about: 10%, 20%, 30%, 35%, 40%, 45%, 50%, or ranges including and/or spanning the aforementioned values.
- a nanoparticle composition comprises one or more sterols.
- a nanoparticle composition does not comprise a sterol.
- one or more sterols comprises one or more cholesterols, ergosterols, hopanoids, hydroxysteroids, phytosterols (e.g., vegapure), ecdysteroids, and/or steroids.
- a sterol comprises a cholesterol.
- sorbates and benzoates may be used in acidic pH formulations.
- one or more preservatives are present in the composition at a dry wt. % of equal to or at less than about: 0.01%, 0.1%, 0.25%, 0.5%, 1%, 5%, 7.5%, 10%, 15%, 20%, 25%, or ranges including and/or spanning the aforementioned values.
- one or more preservatives are present in the composition at a wet wt.
- the composition is aqueous, while in others it has been dried into a powder.
- the composition is aqueous (wet), while in others it has been dried into a powder (dry).
- preservatives inhibit or prevent growth of mold, bacteria, and/or fungus.
- a nanoparticle composition comprises a metal.
- a metal may be zinc.
- a nanoparticle composition does not comprise a metal.
- one or more metals are present in the composition at a dry wt.
- one or more metals are present in the composition at a wet wt. % of equal to or less than about: 0.001%, 0.01%, 0.025%, 0.05%, 0.1%, 0.5%, 0.75%, 1.0%, 1.5%, 2.0%, 2.5%, 5%, or ranges including and/or spanning the aforementioned values.
- the size of the particle can be measured using Scanning Electron Microscopy (SEM). In several aspects, the size of the particle can be measured using cryogenic SEM (cryo-SEM). Where the size of a nanoparticle is disclosed elsewhere herein, any one or more of these instruments or methods may be used to measure such sizes.
- a nanoparticle composition may comprise nanoparticles having an average size of less than or equal to about: 10 nm, 25 nm, 40 nm, 50 nm, 100 nm, 250 nm, 500 nm, 1000 nm, or ranges including and/or spanning the aforementioned values.
- the size distribution of the nanoparticles for at least 90% of the particles present is equal to or less than about: 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, or ranges including and/or spanning the aforementioned nm values.
- the D90 of the particles present is equal to or less than about: 80 nm, 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, 300 nm, 400 nm, 500 nm, or ranges including and/or spanning the aforementioned values.
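The D90 criterion above can be illustrated with a minimal Python sketch. It assumes particle diameters are available as a plain list of per-particle measurements in nm and uses a number-weighted, nearest-rank percentile; the function name `d90` and the sample values are hypothetical, and sizing instruments often report intensity- or volume-weighted values instead.

```python
def d90(diameters_nm):
    """Diameter at or below which 90% of the measured particles fall (nearest-rank)."""
    if not diameters_nm:
        raise ValueError("at least one measurement is required")
    ordered = sorted(diameters_nm)
    # Nearest-rank percentile: rank = ceil(0.9 * n), computed with integers
    # to avoid floating-point rounding.
    rank = -(-9 * len(ordered) // 10)
    return ordered[rank - 1]


# Hypothetical measurements in nm (not data from the disclosure).
sizes = [85, 92, 97, 101, 104, 110, 118, 125, 140, 180]
meets_limit = d90(sizes) <= 200  # compare against one of the listed D90 limits
print(d90(sizes), meets_limit)
```

A weighted percentile would be needed to reproduce an instrument-reported D90; the nearest-rank form is used here only because it is the simplest unambiguous definition.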
- nanoparticles prepared by methods disclosed herein have a particle size of between about 90 nm and about 150 nm (e.g., as measured by zeta sizing (e.g., refractive index)). In several aspects, maintaining consistency in size allows predictable delivery to subjects. In several aspects, the D90 particle size measurement varies between 150 and 500 nm. In several aspects, the average size of the nanoparticles of a composition as disclosed herein may be substantially constant and/or does not change significantly over time (e.g., it is a stable nanoparticle).
- the polydispersity index (PDI) of the nanoparticles of a composition as disclosed herein is less than or equal to about: 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, or ranges including and/or spanning the aforementioned values.
- the size distribution of the nanoparticles is highly monodisperse with a polydispersity index of less than or equal to about: 0.05, 0.10, 0.15, 0.20, 0.25, or ranges including and/or spanning the aforementioned values.
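As a rough illustration of the polydispersity index thresholds above, the following Python sketch uses the common approximation PDI ≈ (standard deviation / mean diameter)². This is an assumption made for illustration: light-scattering instruments derive PDI from a cumulants fit of the correlation function, not from individual particle diameters, and the function name `approx_pdi` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def approx_pdi(diameters_nm):
    """Approximate PDI as (population standard deviation / mean diameter) squared."""
    if not diameters_nm:
        raise ValueError("at least one measurement is required")
    average = mean(diameters_nm)
    return (pstdev(diameters_nm) / average) ** 2


# Hypothetical measurements in nm (not data from the disclosure).
sample = [95, 100, 105, 110, 90, 100]
is_monodisperse = approx_pdi(sample) <= 0.25  # one of the listed monodispersity limits
print(round(approx_pdi(sample), 4), is_monodisperse)
```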
- the zeta potential of the nanoparticles of a composition as disclosed herein is less than or equal to about: 1 mV, 3 mV, 4 mV, 5 mV, 6 mV, 7 mV, 8 mV, 10 mV, 20 mV, or ranges including and/or spanning the aforementioned values.
- the zeta potential of the nanoparticles is greater than or equal to about: -3 mV, -1 mV, 0 mV, 1 mV, 3 mV, 4 mV, 5 mV, 6 mV, 7 mV, 8 mV, 10 mV, 20 mV, or ranges including and/or spanning the aforementioned values.
- the zeta potential and/or diameter of the particles is acquired using a zetasizer (e.g., a Malvern ZS90 or similar instrument).
- a nanoparticle composition is an oil-in-water emulsion, water-in-oil emulsion, water-in-oil-in-water emulsion, oil-in-water-in-oil emulsion, liposome, solid lipid particles formulation, etc.
- these may just be referred to as the composition.
- a nanoparticle composition can be processed to comprise one or more of solid lipid nanoparticles, liposomes (and variants including multilamellar, double liposome preparations, etc.), niosomes, ethosomes, electrostatic particulates, microemulsions, nanoemulsions, microsuspensions, nanosuspensions, or combinations thereof.
- polymeric nanoparticles may be formed.
- cyclodextrin is added.
- a solid lipid nanoparticle composition comprises a lipid core matrix.
- the lipid core matrix is solid.
- the solid lipid comprises one or more ingredients as disclosed elsewhere herein.
- the core of the solid lipid comprises one or more lipids, surfactants, active ingredients, etc.
- the surfactant acts as an emulsifier.
- emulsifiers can be used to stabilize the lipid dispersion (with respect to charge and molecular weight).
- the core ingredients e.g., the components of the core
- the core ingredients and/or the emulsifiers are present in the composition at a wet wt. % of equal to or less than about: 0.5%, 1.0%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 30%, 40%, 60%, or ranges including and/or spanning the aforementioned values.
- a nanoparticle composition (e.g., when in water or dried) comprises multilamellar nanoparticle vesicles, unilamellar nanoparticle vesicles, multivesicular nanoparticles, emulsion particles, irregular particles with lamellar structures and bridges, partial emulsion particles, combined lamellar and emulsion particles, and/or combinations thereof.
- the nanoparticle compositions do not comprise multilamellar nanoparticle vesicles, unilamellar nanoparticle vesicles, multivesicular nanoparticles, emulsion particles, irregular particles with lamellar structures and bridges, partial emulsion particles, combined lamellar and emulsion particles, and/or combinations thereof.
- the composition is characterized by having multiple types of particles (e.g., lamellar, emulsion, irregular, etc.).
- a majority of the particles present are emulsion particles.
- a majority of the particles present are lamellar (multilamellar and/or unilamellar).
- a majority of the particles present are irregular particles.
- a minority of the particles present are emulsion particles.
- a minority of the particles present are lamellar (multilamellar and/or unilamellar).
- a minority of the particles present are irregular particles.
- multilamellar nanoparticles comprise equal to or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- unilamellar nanoparticles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 20%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- between about 10% and about 15% of the particles present are unilamellar.
- emulsion particles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). For example, in some aspects, between about 60% to about 75% of the particles present are emulsion particles.
- micelle particles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- liposomes comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- combined lamellar and emulsion particles comprise equal to, at most, or at least about 5%, 6%, 7%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- mixed-micelle particles comprise equal to, at most, or at least about 5%, 6%, 7%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition).
- the nanoparticle compositions can comprise, but are not limited to, combinations of multilamellar nanoparticles, unilamellar nanoparticles, emulsion nanoparticles, micelle nanoparticles, irregular particles, and/or liposomes.
- the percentages and/or concentrations of particles present in the composition may be purposefully modified. In some aspects, the percentage and/or concentration of the particles present in the composition are tailored to the active compound and/or the liquid comprising the particles. Such tailoring may lead to more homogenization and/or dispersion in the liquid. The tailoring may stabilize dispersion in the liquid.
- the formulations and/or compositions disclosed herein are stable during sterilization.
- the nanoparticle compositions (including after stabilization) disclosed herein have a shelf life of equal to or greater than 1 month, 3 months, 6 months, 12 months, 14 months, 16 months, 18 months, 19 months, or ranges including and/or spanning the aforementioned values.
- the shelf-life can be determined as the period of time in which there is 95% confidence that at least 50% of the response (active agent(s) concentration or particle size) is within the specification limit. This refers to a 95% confidence interval and when linear regression predicts that at least 50% of the response is within the set specification limit.
- compositions and methods for modifying histone marks such as but not limited to, histone ubiquitination and/or histone methylation.
- histone modifications can occur via inhibition of NSUN2, MBD6, and/or TET2.
- histone modifications can occur via contact of a histone-associated caRNA with a polypeptide with m⁵C RNA writer, reader, and/or eraser functionality.
- histone modifications can occur via site-specific targeting of caRNA molecules associated with certain genetic loci.
- technologies described herein may comprise increasing one or more histone marks at one or more loci, such as but not limited to, increasing histone marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control histone mark level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, his
- technologies described herein may comprise increasing one or more H2AK119ub mark at one or more loci, such as but not limited to, increasing histone H2AK119ub by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to control H2AK119ub levels, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000,
- a target RNA comprises, consists essentially of, or consists of one or more (e.g., at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) caRNA described in Table 1 and annotated as SEQ ID NOs: 100-614.
- a target RNA comprises, consists essentially of, or consists of one or more caRNA described in Table 1 and annotated as SEQ ID NOs: 104 or 107.
- constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be comprised in a formulation wherein the formulation comprises pharmaceutically acceptable excipients.
- constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be administered to a cell in an in vitro environment.
- a cell may be derived from a subject.
- a cell is an immune cell, a stem cell, an induced pluripotent stem cell, a precursor cell, and/or a terminally differentiated cell.
- administration regimens comprising constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein comprise administering of more than one composition, such as 2 compositions, 3 compositions, 4 compositions, or more than 4 compositions.
- constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions of the disclosure may be administered by the same route of administration or by different routes of administration.
- constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein, or those otherwise known in the art may be used in treating a disease or disorder, wherein the disease or disorder is a neurodegenerative disease, an inflammatory disease, an autoimmune disease, a metabolic syndrome, a cancer, a vascular disease, a fibrotic disease, a viral infection, a bacterial infection, a fungal infection, a parasitic infection, a musculoskeletal disease (such as a myopathy), an ocular disease, or a genetic disorder.
- the disease or disorder is a cancer.
- the cancer of secretory cells is non-Hodgkin’s lymphoma, Burkitt’s lymphoma, chronic lymphocytic leukemia, monoclonal gammopathy of undetermined significance (MGUS), plasmacytoma, lymphoplasmacytic lymphoma or acute lymphoblastic leukemia.
- methods described herein comprise treatment of CHIP and/or one or more disease states associated therewith, comprising inhibition of NSUN2.
- a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich S
- a disease or disorder for treatment comprising technologies described herein is associated with a diseased cell comprising one or more mutations in a TET2 encoding gene (e.g., one or more loss of function mutations).
- a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in TET2, IDH1, IDH2, DNMT3A, ASXL1, PPM1D, TP53, JAK2, SF3B1, and/or SRSF2.
- a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2.
- a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in TET2, IDH1, IDH2, DNMT3A, ASXL1, PPM1D, TP53, JAK2, SF3B1, and/or SRSF2, wherein the one or more mutations renders the diseased cells susceptible to synthetic lethality induced by inhibition of MBD5, MBD6, NSUN1, and/or NSUN2.
- the diseased cell comprising one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprises one or more gain of function mutations in those genes.
- a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in OGT.
- a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in KDM1B.
- a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in FOXK1.
- the disease or disorder is an autoimmune disease.
- the autoimmune disease is, or expressly is not, systemic lupus erythematosus, type 1 diabetes, multiple sclerosis, psoriasis/psoriatic arthritis, inflammatory bowel disease, Addison’s disease, Graves’ disease, Sjogren’s syndrome, Hashimoto’s thyroiditis, Myasthenia gravis, autoimmune vasculitis, pernicious anemia, celiac disease, or rheumatoid arthritis.
- the disease or disorder is a metabolic syndrome.
- the metabolic syndrome is, or expressly is not, acute pancreatitis, chronic pancreatitis, alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance, hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes.
- the metabolic syndrome is alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance, hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes.
- the disease or disorder is a musculoskeletal disease (such as a myopathy).
- the musculoskeletal disease (such as the skeletal muscle atrophy) is triggered by ageing, chronic diseases, stroke, malnutrition, bedrest, orthopedic injury, bone fracture, cachexia, starvation, heart failure, obstructive lung disease, renal failure, Acquired Immunodeficiency Syndrome (AIDS), sepsis, an immune disorder, a cancer, ALS, a burn injury, denervation, diabetes, muscle disuse, limb immobilization, mechanical unload, myositis, or a dystrophy.
- the disease or disorder is, or expressly is not, a musculoskeletal disease.
- skeletal muscle mass, quality and/or strength are increased.
- the disease or disorder is, or expressly is not, a vascular disease.
- the vascular disease is atherosclerosis, abdominal aortic aneurysm, carotid artery disease, deep vein thrombosis, Buerger’s disease, chronic venous hypertension, vascular calcification, telangiectasia or lymphoedema.
- the disease or disorder is a genetic disorder.
- suitable carriers for parenteral delivery via injectable, infusion or irrigation and topical delivery include distilled water, physiological phosphate-buffered saline, normal or lactated Ringer's solutions, dextrose solution, Hank's solution, or propanediol.
- sterile, fixed oils may be employed as a solvent or suspending medium.
- any biocompatible oil may be employed including synthetic mono- or diglycerides.
- fatty acids such as oleic acid find use in the preparation of injectables.
- the carrier and agent may be compounded as a liquid, suspension, polymerizable or non-polymerizable gel, paste or salve.
- the carrier may also comprise a delivery vehicle to sustain (i.e., extend, delay or regulate) the delivery of the agent(s) or to enhance the delivery, uptake, stability or pharmacokinetics of the therapeutic agent(s).
- the composition may contain 10 mg or less, 25 mg, 50 mg or up to about 100 mg of human serum albumin per milliliter of phosphate buffered saline.
- Other pharmaceutically acceptable carriers include aqueous solutions, non-toxic excipients, including salts, preservatives, buffers and the like.
- non-limiting examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oil and injectable organic esters such as ethyl oleate.
- non-limiting examples of aqueous carriers include water, alcoholic/aqueous solutions, saline solutions, parenteral vehicles such as sodium chloride, Ringer's dextrose, etc.
- intravenous vehicles include fluid and nutrient replenishers.
- Preservatives include antimicrobial agents, antifungal agents, anti-oxidants, chelating agents and inert gases.
- the pH and exact concentration of the various components of the pharmaceutical composition are adjusted according to well-known parameters.
- formulations comprising constructs described herein and/or co-administered formulations may be suitable for oral administration.
- oral formulations include such typical excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate and the like.
- the compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders.
- An effective amount of the pharmaceutical composition is determined based on the intended goal.
- unit dose or “dosage” refers to physically discrete units suitable for use in a subject, each unit containing a predetermined quantity of the pharmaceutical composition calculated to produce the desired responses discussed above in association with its administration, i.e., the appropriate route and treatment regimen.
- the quantity to be administered, both according to number of treatments and unit dose, depends on the protection or effect desired.
- Precise amounts of the pharmaceutical composition also depend on the judgment of the practitioner and are peculiar to each individual.
- factors affecting the dose include the physical and clinical state of the patient, the route of administration, the intended goal of treatment (e.g., alleviation of symptoms versus cure) and the potency, stability and toxicity of the particular therapeutic substance.
- kits containing compositions of the disclosure or compositions to implement methods disclosed herein.
- kits may comprise a number of agents for assessing differential m⁵C RNA levels and/or modifying m⁵C RNA levels in any feature described herein, for example, in features listed in Table 1, in particular, features listed in Table 1 and annotated as SEQ ID NOs: 100-614.
- a kit may comprise reagents for detection of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 and/or 25 features.
- a kit may comprise reagents for detection and/or modification of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
- kits may comprise reagents for detection and/or modification of features whose sequence characteristics are at least, exactly, or about 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent identical to the specified features found in Table 1 or any subset thereof.
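The percent-identity language above can be made concrete with a short Python sketch. It assumes the two sequences have already been aligned to equal length, that gap positions (written as `-`) count as mismatches, and that identity is reported over the full alignment length; other conventions exist, and the function name `percent_identity` and the example sequences are hypothetical rather than taken from Table 1.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # A column counts as a match only if residues agree and neither is a gap.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-nt RNA fragments differing at the last position.
reference = "ACGUACGUAC"
candidate = "ACGUACGUAG"
identity = percent_identity(reference, candidate)
print(identity, identity >= 80)
```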
- Kits may comprise components, which may be individually packaged or placed in a container, such as a tube, bottle, vial, syringe, or other suitable container means.
- individual components may also be provided in a kit in concentrated amounts; in some aspects, a component is provided individually in the same concentration as it would be in a solution with other components. Concentrations of components may be provided as 1x, 2x, 5x, 10x, or 20x or more.
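For components supplied as concentrated stocks (e.g., 10x), the volume of stock needed for a working solution follows from the standard dilution relation C1·V1 = C2·V2. The Python sketch below is illustrative only; the function name `stock_volume_ul` and the example volumes are hypothetical and are not specified in the disclosure.

```python
def stock_volume_ul(stock_x, final_x, final_volume_ul):
    """Volume of concentrated stock (in µL) needed to reach final_x in final_volume_ul."""
    if stock_x <= 0 or final_x <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_x > stock_x:
        raise ValueError("cannot dilute a stock to a higher concentration")
    # C1 * V1 = C2 * V2, solved for V1.
    return final_x * final_volume_ul / stock_x


# Hypothetical example: prepare 100 µL at 1x from a 10x stock.
stock_ul = stock_volume_ul(10, 1, 100)
diluent_ul = 100 - stock_ul
print(stock_ul, diluent_ul)
```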
- negative and/or positive control nucleic acids, probes, and inhibitors are included in some kit aspects.
- a kit may include a sample that is a negative or positive control; for example, a nucleic acid that does not comprise an m⁵C mark may be included as a negative control and a nucleic acid that does comprise an m⁵C mark may be included as a positive control.
- kits of the present disclosure may exclude any one or more of the described components in certain aspects.
- Examples. The following examples are included to demonstrate certain aspects of inventions disclosed herein. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of inventions disclosed herein, and thus can be considered to constitute certain modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific aspects which are disclosed and still obtain a like or similar result without departing from the spirit and scope of inventions described herein.
- Xenotransplantation of human leukemia cells. 1 × 10⁶ K-562 cells were injected intravenously via the tail vein into adult NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice (6-8 weeks old) pretreated with 280 cGy whole-body irradiation.
- NSG NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ mice
- PB peripheral blood
- BM bone marrow
- the colonies were imaged by STEMvision™ (STEMCELL Technologies) and scored on day 7, then these colonies were sequentially replated every 7 days for the replating assay. Colony cells were also harvested and analyzed for expression of stem and progenitor markers and myeloid lineage markers by flow cytometry.
- the LK cells were also incubated in suspension culture containing 30% FBS and 2% BSA in complete RPMI-1640 medium supplemented with 100 ng/mL mSCF, 10 ng/mL mIL-3, 50 ng/mL mTPO and 10 ng/mL mGM-CSF. Cells were harvested and analyzed for expression of stem/progenitor markers and myeloid lineage markers by flow cytometry on day 7.
- THP-1, K-562, SKM-1, and OCI-AML3 cells were kept in RPMI-1640 (Gibco, 61870036) with 10% fetal bovine serum (FBS, Gibco 26140079) at 37 °C and 5% CO₂.
- FBS fetal bovine serum
- TF-1 was kept in RPMI-1640 (Gibco, 61870036) with 10% FBS (Gibco 26140079) and 2 ng/mL recombinant GM-CSF (Peprotech, 300-03) at 37 °C and 5% CO₂.
- Hep G2, HeLa, HCT 116, A549 and A-375 cells were kept in DMEM (Gibco, 11995065) supplemented with 10% FBS (Gibco 26140079). All cell types were kept at 37 °C and 5% CO₂.
- shNC and shMBD6 THP-1 and K-562 cell lines were constructed by lentivirus transduction with TransDux™ MAX Lentivirus Transduction Reagent (System Biosciences, LV860A-1).
- LK cells were plated in triplicate in methylcellulose medium (MethoCult, M3134) supplemented with mouse stem cell factor (mSCF; 100 ng/mL), interleukin-3 (mIL-3; 10 ng/mL), thrombopoietin (mTPO; 50 ng/mL), granulocyte-macrophage colony-stimulating factor (mGM-CSF; 10 ng/mL), human erythropoietin (hEPO; 4 U/mL), and interleukin-6 (hIL-6; 50 ng/mL, PeproTech) (see e.g., FIGs. 4 and 15).
- mSCF mouse stem cell factor
- mIL-3 interleukin-3
- mTPO thrombopoietin
- mGM-CSF granulocyte-macrophage colony-stimulating factor
- hEPO human erythropoietin
- siRNA Small interfering RNA
- plasmid transfection. Two or three individual siRNAs, or a pool of four siRNAs targeting different regions of the same transcript (Dharmacon and/or Qiagen siRNA, Human NSUN2: Dharmacon L-018217-01-0005; Mouse Nsun2: Qiagen SI01331687; Human MBD5: Dharmacon L-027190-01-0005; Mouse Mbd5: Qiagen SI04942448; Human MBD6: Dharmacon L-015157-01-0005; and Mouse Mbd6: Dharmacon L-049319-01-0005; and Broad Institute Human TET2: TRCN0000418976 or Human MBD6: TRCN0000038787) were used for knockdown of human or mouse transcripts.
- Plasmid transfections in mESCs or HEK293T cells were performed with Lipofectamine™ 3000 Transfection Reagent (Invitrogen, L3000015) according to the manufacturer’s instructions.
- Cell proliferation assay. Cell proliferation assays for adherent and suspension cells were performed similarly. Cells were seeded in 96-well plates before assaying in 100 µL settings with CellTiter 96® Aqueous One Solution Cell Proliferation Assay (Promega, G3582) following the manufacturer’s instructions. 2,000-10,000 cells were seeded per well at day 0, and cell proliferation was monitored every 24 hours by incubating the cell suspension with MTS reagent at 37 °C for 1 hour.
- DNase I–TUNEL assay. For cell line samples, mESCs were reseeded to 10-cm cell culture dishes 12 hours prior to small interfering RNA (siRNA) transfection. The DNase I–TUNEL assay was performed using the DeadEnd Fluorometric TUNEL System (Promega, G3250) according to the manufacturer’s instructions, after cell fixation with paraformaldehyde and permeabilization with Triton X-100. Two independent experiments were performed. Cells were treated with 1 U/mL of DNase I (Thermo Scientific, EN0521) for 5 minutes at 37 °C before rTdT labeling.
- DNase I Thermo Scientific, EN0521
- Recombinant protein purification [0378] Standard molecular cloning strategies were used to generate C-terminally MBP- 6 ⁇ His tagged MBD domain of MBD6 (residues 1-100). Human MBD6 coding sequence was obtained from Origene (Origene, #SC324058). Full-length coding sequence was cloned using PrimeSTAR ® GXL DNA Polymerase (TaKaRa Bio, R050B). Recombinant proteins were expressed in E. coli BL21 (DE3) grown to OD600 of 0.6 in LB medium. The expression was induced with 0.6 mM IPTG at 16 °C for 20 hours and cells were harvested via centrifugation.
- Protein samples were prepared from respective cells by lysis in RIPA buffer (Thermo Scientific, 89900) containing 1× Halt™ Protease and Phosphatase Inhibitor Cocktail (Thermo Scientific, 78441). Protein concentration was measured by NanoDrop 8000 Spectrophotometer (Thermo Scientific). Lysates of equal total protein concentration were heated at 90 °C in 1× loading buffer (Bio-Rad, 1610747) for ten minutes. Denatured protein was loaded into 4-12% NuPAGE Bis-Tris gels (Invitrogen, NP0335BOX) and transferred to PVDF membranes (Thermo Scientific, 88585).
- Membranes were blocked in Tris-Buffered Saline, 0.1% Tween® 20 (TBST) with 3% BSA (MilliporeSigma, A7030) for 30 minutes at room temperature, incubated in a diluted primary antibody solution at 4 °C overnight, then washed and incubated in a dilution of HRP-conjugated secondary antibody for 1 hour at room temperature. Protein bands were detected using the SuperSignal West Dura Extended Duration Substrate kit (ThermoFisher, 34075) with a FluorChem R (ProteinSimple). Blot intensities were quantified with the Fiji (ImageJ) Analyze-Gel module.
- RNA immunoprecipitation with spike-in 5-methylcytosine (m5C) modified or unmodified mRNA spike-ins were in vitro transcribed from firefly luciferase or renilla luciferase coding sequences with the mMESSAGE mMACHINE™ T7 Transcription Kit (Invitrogen, AM1344) and manually reconstituted dNTP mixes with a 2% m5CTP/CTP ratio. 5-methylcytidine-5-triphosphate was obtained from Trilink (#N-101405). The resulting RNA was purified using the standard protocol of RNA Clean & Concentrator-5 (Zymo Research, R1013).
- RNA samples were then applied to RNA before fragmentation.
- Total RNAs from whole cell or the chromatin-associated fractions were randomly fragmented by incubation at 94 °C for 4 minutes using 1× fragmentation buffer (NEB, E6186A). Fragmentation was stopped by adding 1× Stop Solution. Spike-in RNAs were added to each sample.
- RNA samples were added to the bead-antibody complexes and incubated with 1 µL SUPERase•In™ RNase Inhibitor (Invitrogen, AM2694) overnight at 4 °C on a rotating wheel. After several washes with IP buffer, RNA was incubated in 100 µL elution buffer (5 mM Tris-HCl pH 7.5, 1 mM EDTA, 0.05% SDS, and 200 µg Proteinase K (Invitrogen, 25530049)) for 1 hour at 50 °C.
- Methyl-DNA immunoprecipitation (MeDIP) [0390] Genomic DNA was extracted from cultured cells with Monarch® Genomic DNA Purification Kit (New England Biolabs, T3010S). Unmethylated lambda DNA (Promega, D1521) was spiked at a 0.5% ratio for quality control of the immunoprecipitation. DNAs were then fragmented to 200-1000 base pairs (bps) with NEBNext® dsDNA Fragmentase® (New England Biolabs, M0348S) by a 22-minute incubation. The fragmented DNA was then denatured at 95 °C for 5 minutes and immediately cooled on ice for another 5 minutes. Input samples were removed and saved on ice for later use.
- mESCs were seeded in 6-cm dishes at the same density in three replicates. After 42 hours, cells were treated with 1 mM 5-ethynyl uridine (5-EU) for 10 minutes, 20 minutes, and 40 minutes before RNA harvest with TRIzol™ Reagent (Invitrogen, 15596026). Ribosomal RNA was depleted from total RNA preps prior to click reaction with biotin azide (PEG4 carboxamide-6-azidohexanyl biotin). Biotinylated RNA was enriched by Dynabeads™ MyOne™ Streptavidin T1 (Invitrogen, 65601).
- ERCC RNA Spike-In Mix (Invitrogen, 4456740) was added to the eluted RNA of each sample, in an amount proportional to its total RNA, before rRNA depletion.
- Spiked-RNAs were used as input for RNA-seq library constructions with SMARTer® Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (TaKaRa Bio, 634411) according to the manufacturer’s instructions. Libraries were sequenced on a NovaSeq 6000 sequencer. Cleavage Under Targets and Tagmentation (CUT&Tag) [0392] CUT&Tag was performed using the CUT&Tag-IT™ Assay Kit (Active Motif, 53160) following the manufacturer’s instructions.
- CUT&Tag Cleavage Under Targets and Tagmentation
- 0.2 million cells were used as input for one replicate and washed with 1× Wash buffer. Washed cells were conjugated to concanavalin A beads and permeabilized with Digitonin-containing buffer before primary antibody (anti-H3K27me3, anti-H2AK119ub or normal rabbit IgG) incubation. Pre-assembled protein A-Tn5 transposome enabled DNA tagmentation was performed after secondary antibody conjugation. Tagged DNA was extracted by proteinase K digestion and amplified by PCR with indexed primers to yield DNA libraries. DNA libraries were subjected to qPCR analysis with gene-specific primers or high-throughput sequencing on a NovaSeq 6000 sequencer.
- Chromatin immunoprecipitation was performed using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling Technology, 9003) following the manufacturer’s instructions. Briefly, 10 million mESCs per replicate were crosslinked with 1% paraformaldehyde for 10 minutes at room temperature. Cell pellets were scraped and snap-frozen in liquid nitrogen before being stored at -80 °C. Nuclei were prepared from thawed cell pellets by hypotonic treatment before enzymatic digestion with MNase.
- dCas13b protein fusions with the catalytic domain of mouse TET2 (TET2CD) or a catalytically dead mutant were first constructed from WT mESCs.
- The coding sequence of dCas13b was cloned from plasmid pCMV-dCas13-M3nls, which was a gift from David Liu (Addgene plasmid #155366; http://n2t.net/addgene:155366; RRID:Addgene_155366).
- TET2CD The coding sequence of TET2CD was cloned from plasmid pcDNA3-FLAG-mTET2 (CD), which was a gift from Yue Xiong (Addgene plasmid #89736; http://n2t.net/addgene:89736; RRID:Addgene_89736), and the catalytic dead mutant was cloned from plasmid pcDNA3-Flag-Tet2 CD Mut, which was a gift from Yi Zhang (Addgene plasmid #72220; http://n2t.net/addgene:72220; RRID:Addgene_72220).
- dCas13b The coding sequences of TET2CD (or mutant) and dCas13b were fused, separated by an EASLSRPDPALAALGGGGSGGGGSGGGGS (SEQ ID NO: 33) linker.
- The fusion protein was delivered to mESCs with a lentiviral system, and cells were selected with Hygromycin B (Gibco, 10687010).
- Sequences expressing guide RNA for dCas13b were cloned into a plasmid expressing a Tet operator controlled H1 operator (H1-2O2) (53).
- This tet-pLKO-sgRNA-puro plasmid was a gift from Nathanael Gray (Addgene plasmid #104321; http://n2t.net/addgene:104321; RRID:Addgene_104321).
- the guide-RNA expression plasmid was delivered into TET2CD fusion protein expressing mESCs by lentivirus. Resulting cell lines were selected with puromycin (Gibco, A1113803).
- ASO Antisense Oligonucleotide
- The steric-blocking ASOs (Integrated DNA Technologies) targeted to the hypermethylated motifs were fully modified with 2’-O-methoxyethyl (2’MOE) bases and phosphorothioate bonds, and carried a Cy5 fluorescent dye at the 3’ end to monitor transfection efficiency.
- The NC5 ASO, which does not target the human or mouse genome, was used as a negative control.
- IAPEz-int 2’MOE AGTTGAATCCTTCTTAACAGTCTGCTTTACGGGAAC (SEQ ID NO: 89), [0397] Specifically, SEQ ID NO: 89 modified as follows: /52MOErA/*/i2MOErG/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErA/*/i2MOErT/*/i2MOErC/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErA/*/i2MOErA/*/i2MOErG/*/i2MOErT/*/i2MOErT/*/i2MOErG/*/i2MOErG/*/i2MO
- SEQ ID NO: 88 modified as follows: /52MOErA/*/i2MOErC/*/i2MOErC/*/i2MOErA/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErC/*/i2MOErT/*/i2MOErG/*/i2MOErG/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErA/*/i2MOErT/*/i2MOErA/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErT//3Cy5Sp/.
- NC5 2'MOE GCGACTATACGCGCAATATG (SEQ ID NO: 90), [0401] Specifically, SEQ ID NO: 90 modified as follows: /52MOErG/*/i2MOErC/*/i2MOErA/*/i2MOErC/*/i2MOErT/*/i2MOErA/*/i2MOErT/*/i2MOErA/*/i2MOErA/*/i2MOErC/*/i2MOErG/*/i2MOErC/*/i2MOErG/*/i2MOErC/*/i2MOErA/*/i2MOErA/*/i2MOErT/*/i2MOErT/*/i2MOErG//3Cy5Sp/.
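The modification strings above follow a regular token pattern: a 5′ 2’MOE base (`/52MOEr<base>/`), internal 2’MOE bases (`/i2MOEr<base>/`) joined by `*` for phosphorothioate bonds, and a 3′ Cy5 tag (`/3Cy5Sp/`). The following is a minimal Python sketch of that pattern only. The function name `to_2moe_ps_string` is hypothetical, and whether a vendor ordering system accepts exactly these tokens is an assumption that should be checked against the vendor's own documentation.

```python
def to_2moe_ps_string(seq: str, three_prime_mod: str = "/3Cy5Sp/") -> str:
    """Spell an ASO as fully 2'MOE-modified bases joined by phosphorothioate bonds.

    Token shapes are taken from the listings in the text; the helper itself
    is illustrative and not part of any vendor API.
    """
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    # First base carries the 5' code, every later base the internal code.
    tokens = [f"/52MOEr{seq[0]}/"] + [f"/i2MOEr{base}/" for base in seq[1:]]
    # '*' marks a phosphorothioate bond between neighbouring bases;
    # the 3' dye is appended directly after the last base, as in the text.
    return "*".join(tokens) + three_prime_mod


# NC5 negative-control sequence as given for SEQ ID NO: 90.
nc5 = to_2moe_ps_string("GCGACTATACGCGCAATATG")
print(nc5)
```

Generating the string from the plain sequence makes it easy to confirm that the number of modified bases equals the sequence length before an oligo is ordered.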
- RNA bisulfite sequencing results was custom-synthesized and cloned into pLentiRNAGuide_002-hU6-RfxCas13d-DR-BsmBI-EFS-EGFP:P2A:Puro-WPRE vector.
- The catalytic domain of mouse TET2 (mTET2CD) or a catalytically dead mutant TET2 H1304Y/D1306A (mTET2CDHxDCD) was cloned into pLV[Exp]-[EF-1sc>[NLS-RfxCas13d]:[Linker]:P2A:mCherry(ns):T2A:Bsd vector. All these plasmids were synthesized, constructed and confirmed by VectorBuilder Inc.
- ASO Antisense oligo transfections
- The inventors designed antisense oligos targeting the primary m5C sites on IAPEz or MERVL sequences based on their RNA m5C sequencing results.
- ASO transfections in mESCs were performed using Lipofectamine™ RNAiMAX Transfection Reagent (Invitrogen, 13778075) according to the manufacturer’s instructions.
- CLIP Crosslinking and Immunoprecipitation
- Pellets were thawed on ice and resuspended in 3 volumes of ice-cold CLIP lysis buffer (50 mM HEPES pH 7.5, 150 mM KCl, 2 mM EDTA, 0.5% (v/v) NP-40, 0.5 mM DTT, 1× Halt™ Protease and Phosphatase Inhibitor Cocktail (Thermo Scientific, 78442), 1× RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen, 10777019)). Pellets were lysed by rotating at 4 °C for 15 minutes after passing through a 26 G needle (BD Biosciences).
- Embryo suspensions were sonicated on a Bioruptor (Diagenode) with 30 s on/30 s off for 5 cycles. Lysates were cleared by centrifugation at 21,000 g for 15 minutes at 4 °C on a benchtop centrifuge. Supernatants were applied to Flag antibody (Abcam, ab205606)-conjugated protein A beads (Invitrogen, 1001D) and left overnight at 4 °C on an end-to-end rotor.
- coli BL21 (DE3). Different concentrations of proteins were mixed with 100 nM FAM-labeled oligo probes in 1× binding buffer (20 mM HEPES pH 7.5, 40 mM KCl, 10 mM MgCl2, 0.1% Triton X-100, 10% glycerol and 1× RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen, 10777019)). The probe-protein mixture was incubated on ice for 30 minutes. The mixtures were loaded onto a 10% Novex™ TBE Gel (Invitrogen, EC62755BOX).
- Ribosomal RNA was removed using the RiboMinus™ Eukaryote System v2 (Invitrogen, A15026) with purification and size-selection using RNA Clean & Concentrator-5 (Zymo Research, R1013). Recovered RNAs were subjected to digestion and MS-Spec analysis. Biotinylation of immunoprecipitated RNAs [0408] Biotin labeling of immunoprecipitated RNA was performed according to a published protocol (https://www.protocols.io/view/biotin-labelling-of-immunoprecipitated-rna-v1pre-kqdg354kpv25/) without any modifications.
- Chambers were washed 3 times with 0.05% Triton X-100 in DPBS, then 1:1000 diluted goat anti-rabbit IgG-AF568 conjugate (Invitrogen, A-11011) in blocking solution was added to each well and chambers were incubated at room temperature for one hour. Chambers were then washed three times with 0.05% Triton X-100 in DPBS, fixed with 4% PFA in DPBS for thirty minutes at room temperature, and washed three times with DPBS. Nuclei were counterstained with 2 µg/ml Hoechst 33342 (Abcam, ab145597) in DPBS at room temperature for 20 minutes, then washed 3 times with DPBS.
- RNA spike-in control (Thermo Fisher Scientific) using HISAT2 (version 2.2.1) (57).
- Annotation files (version M19 for mouse) were obtained from the GENCODE database (https://www.gencodegenes.org/) (58). Reads on each GENCODE annotated gene were counted using HTSeq (version 0.12.4) (59) and then normalized to counts per million (CPM) using the edgeR package in R (60). CPM was converted to attomole by linear fitting of the RNA ERCC spike-in. RNA level and EU labeling time were fitted with a linear model, and the slope was taken as the estimated transcription rate of the RNA.
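The two fitting steps described above (spike-in calibration of CPM to attomole, then a linear fit of RNA amount against EU labeling time) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the inventors' pipeline: the fits are assumed to be ordinary least squares in linear (not log) space, the function names are hypothetical, and all numbers are made up for demonstration. Only the 10, 20, and 40 minute labeling times come from the text.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cpm_to_attomole(ercc_cpm, ercc_attomole, gene_cpm):
    """Calibrate on the ERCC spike-ins, then convert gene CPM values."""
    slope, intercept = linear_fit(ercc_cpm, ercc_attomole)
    return [slope * cpm + intercept for cpm in gene_cpm]


def transcription_rate(eu_minutes, attomole):
    """Slope of labeled RNA amount versus EU labeling time."""
    slope, _ = linear_fit(eu_minutes, attomole)
    return slope


# Hypothetical example values (not data from the disclosure).
ercc_cpm = [10.0, 100.0, 1000.0]       # observed CPM of spike-in species
ercc_amol = [0.5, 5.0, 50.0]           # known input amounts in attomole
gene_cpm_by_time = [20.0, 40.0, 80.0]  # one gene at 10, 20, 40 min of EU
gene_amol = cpm_to_attomole(ercc_cpm, ercc_amol, gene_cpm_by_time)
rate = transcription_rate([10, 20, 40], gene_amol)
print(f"estimated transcription rate: {rate:.3f} amol/min")
```

Calibrating against the spike-ins first puts samples with different sequencing depth on a common absolute scale, so the slopes can be compared between conditions such as WT and Tet2 KO.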
- Annotation files (version M19 for mouse, and version v29 for human in gtf format) were downloaded from the GENCODE database (https://www.gencodegenes.org/) (58). Mapped reads were separated by strand with samtools (version 1.16.1) (62), and m5C peaks on each strand were called separately using MACS2 (version 2) (63) with the parameters ‘--nomodel --keep-dup all -g 7e8 --extsize 150’. The genome size was estimated based on read coverage obtained from input samples. Significant peaks with q < 0.01 identified by MACS2 were considered.
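As a sketch of how the per-strand peak calling could be scripted, the Python helper below assembles a `macs2 callpeak` argument list with the parameters stated above. The helper name, BAM file names, and output prefixes are hypothetical. Passing the q-value cutoff through `-q` is an assumption, since the text only states that peaks with q < 0.01 were considered and the filter may have been applied afterwards.

```python
import shlex


def macs2_callpeak_cmd(ip_bam, input_bam, name,
                       genome_size="7e8", extsize=150, qvalue=0.01):
    """Build the argument list for one strand-specific MACS2 run."""
    return [
        "macs2", "callpeak",
        "-t", ip_bam,             # m5C MeRIP reads for one strand
        "-c", input_bam,          # matching input reads for the same strand
        "-n", name,               # output file prefix
        "-g", str(genome_size),   # effective genome size, as stated in the text
        "--nomodel",              # skip fragment-size model building
        "--extsize", str(extsize),
        "--keep-dup", "all",      # keep duplicate reads
        "-q", str(qvalue),        # assumed: cutoff applied at call time
    ]


# Hypothetical file names for reads already split by strand with samtools.
for strand in ("plus", "minus"):
    cmd = macs2_callpeak_cmd(f"ip.{strand}.bam", f"input.{strand}.bam",
                             f"m5C_{strand}")
    print(shlex.join(cmd))
    # To execute, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems and keeps the two strand-specific runs identical apart from their file names.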
- PerCP-Cy™5.5 mouse lineage antibody cocktail (BD Biosciences, 561317); PE Rat anti-mouse CD117 (BD Biosciences, 553869); Brilliant Violet 421™ (BV421) anti-mouse/human CD11b (Mac-1) (BioLegend, 101236); PE-Cy™7 Rat anti-mouse Ly-6G and Ly-6C (Gr-1) antibodies (BD Biosciences, 552985); PE Mouse anti-human CD33 (BD Biosciences, 561816) and PE-Cy™7 Rat anti-mouse CD45 (BD Biosciences, 552848). All antibodies were applied at the dilution suggested by the manufacturer for the specific use unless specified in the methods section.
- TET methylcytosine dioxygenases (TET1, TET2, and TET3) mediate oxidation of DNA 5-methylcytosine (5mC) in mammals. This pathway has been shown to impact gene expression regulation in a wide range of different biological systems (6-11). Among the three TET enzymes, TET2 is unique in that this gene distinctly exhibits high mutation ratios (10%–40%) in myeloid malignancies (see e.g., FIG.
- TET2 isocitrate dehydrogenase
- Example 3 TET2 catalytically mediated oxidation of m5C on RNAs (e.g., chromatin associated RNAs) and regulated chromatin state
- UHPLC-MS/MS ultra-high-performance liquid chromatography-tandem mass spectrometry
- Tet2 KO led to a notable increase of the caRNA m5C level, accompanied with a decreased level of the oxidation product 5-hydroxymethylcytosine (hm5C) when compared with those from WT mESCs (FIG. 2A).
- Tet2 KO also resulted in a widespread increase in chromatin-associated regulatory RNA (carRNA) abundance, encompassing enhancer RNAs, promoter-associated RNAs, and repeat RNAs (FIG. 7A). By performing m5C MeRIP enrichment followed by sequencing, it was found that when these carRNAs were marked with m5C methylation, their abundance showed even greater increases upon either Tet2 or Pspc1 KO (FIG. 7B).
- L1MdA_I/II and IAPEz-int/RLTR10 were closely examined as respective representatives, as they were the top ranked subfamilies in the m5C MeRIP enrichment dataset.
- IAP displayed an increase in local chromatin accessibility upon either Tet2 or Pspc1 KO (FIG. 8); the increased m5C on IAP RNAs was confirmed by using m5C methylated RNA immunoprecipitation followed by quantitative reverse transcription qPCR (m5C-MeRIP-qPCR) (FIG. 2E).
- m5C-marked IAP displayed a greater increase in carRNA abundance when compared to unmethylated repeats upon Tet2 KO (FIG. 2F).
- NOP2/Sun RNA methyltransferase 2 (NSUN2) and DNA methyltransferase 2 (TRDMT1) were promising candidates as both were known to localize in the cell nucleus and mediate RNA m5C methylation (31, 32). While differences upon TRDMT1 depletion were minimal, NSUN2 depletion using siRNA resulted in an approximately 70% decrease in caRNA (rRNA depleted) m5C abundance (FIG. 9A). Therefore, NSUN2 appeared to be the main caRNA m5C writer protein in mESCs.
- transcriptome-wide alterations caused by Nsun2 depletion exhibited patterns that contrasted with the gene expression changes caused by Tet2 or Pspc1 depletion in mESCs (FIG.9B).
- TET2 protein can mediate oxidation of RNA m5C to hm5C (22, 33). Consistent with these findings, the UHPLC-MS/MS measurements reported here from TET2 depleted cells also indicated oxidation of m5C on chromatin-associated RNA by TET2 in mESCs (FIG. 2A), with LTR RNAs as a notable example (FIG. 2D).
- RNAs were investigated further as these RNAs had the highest levels of m5C methylation among all LTR RNAs in mESCs (FIG. 2D).
- Gene transcription rates were analyzed by enriching and sequencing EU labeled nascent transcripts with and without TET2 depletion. TET2 depletion led to increased chromatin-associated IAP RNA m5C level (FIG. 2E), accompanied with increased local chromatin accessibility (FIG. 8), and elevated levels of its target RNAs (FIG. 2F), suggesting a correlation between the more open chromatin caused by TET2 deficiency and the accumulation of m5C on the target repeat RNAs.
- IAP methylation (or its potential oxidation by TET2) was blocked with a designed anti-sense oligo (ASO) that annealed with the main IAP m5C peak in mESCs, located near the 5′-end of the RNA molecule (FIG. 8).
- ASO anti-sense oligo
- IAP ASO blocking of m5C installation was validated with qPCR analysis using primers targeting individual regions (FIG. 10A); a slight decrease in the IAP RNA level was observed (FIG. 10B).
- the inventors also observed corresponding closed local chromatin at IAP loci (FIG.10C).
- loci-specific RNA targeting systems were generated by fusing dCas13b (SEQ ID NO: 25) (34) with the TET2 catalytic domain (TET2CD) (SEQ ID NO: 21).
- Guide RNAs targeting the primary m5C site proximal to the 5′-end of IAP RNAs were generated (SEQ ID NO: 91).
- the dCas13b-TET2(CD) fusion protein (SEQ ID NO: 27) was stably expressed in mESCs with the guide RNA under the control of a doxycycline (DOX)-responsive Tet operator (35) (FIG. 10D).
- DOX doxycycline
- TET2 could mediate either DNA or RNA 5-methylcytosine oxidation.
- TET2 was directed to repeat RNAs (e.g., LTR RNAs in particular) and mediated RNA m5C oxidation. It was the RNA m5C oxidation that dominated chromatin and transcriptional regulation in the mESC system. However, it is possible that TET2 oxidation of DNA 5mC could dominate chromatin regulation in other systems.
- Example 5 – MBD5 and MBD6 were RNA-binding proteins that preferentially recognize and bind RNA m5C; furthermore, MBD6 binds RNA m5C to recruit PR-DUB.
- MBD5 and MBD6 both possess a conserved but structurally distinct methyl-binding domain (MBD), but as confirmed herein, do not bind to DNA (FIG.13A and FIG.13C) (46).
- The inventors hypothesized that these two proteins may bind RNA m5C, which may then recruit PR-DUB to mediate H2AK119ub deubiquitylation at the m5C-methylated caRNA (e.g., LTR loci) for transcriptional activation.
- MBD5 and MBD6 were RNA-binding proteins that preferentially recognized and bound to RNA m5C-modified nucleosides.
- the RNA-binding targets of MBD5 and MBD6 were found to significantly overlap with each other (FIG. 14A).
- MBD6 appeared to affect the levels of histone H2AK119ub and repeat RNA expression more dominantly relative to MBD5, as knockdown of MBD6 was sufficient to reverse elevated expression of LTRs (e.g., IAP, MERVL and MusD) caused by Tet2 KO in mESCs, whereas MBD5 knockdown failed to do so (FIG. 14B).
- the global H2AK119ub level also significantly increased only upon Mbd6 knockdown (FIG.14C). Both Mbd6 knockdown and Nsun2 knockdown were found to partially suppress the global chromatin openness caused by Tet2 KO in mESCs (FIG.4E).
- TET2-deficient LK cells (Lin-c-Kit+ cells, capturing HSPCs) are their enhanced self-renewal capacity, and their skewed propensity towards differentiation into granulocytic/monocytic lineages in vitro (4).
- LK cells Lin-c-Kit+ cells, capturing HSPCs
- FIG. 15B Knockdown of Mbd6 significantly reduced the replating potential of Tet2 KO LK cells (FIG. 4H and FIG. 15C). Mbd6 knockdown disrupted the TET2-loss-induced prolonged maintenance of stem/progenitor cells and promoted differentiation of HSPCs toward myeloid lineages in vitro (assayed on day 7 and 14, respectively) (FIG.4I and FIG.15D).
- IAP blockade disrupted the prolonged maintenance of stem/progenitor cells and enhanced differentiation of HSPCs toward myeloid lineages (CD11b+) in vitro (FIG. 17C-17D).
- caRNA targeting e.g., MERVL targeting or IAP targeting
- ASO blockade was also utilized to sterically block m5C sites of IAP or MERVL RNA in LK cells, and the cells were then analyzed with total RNA-seq. The results showed that genes upregulated by Tet2 KO exhibited a higher overlap with the genes downregulated by IAP ASO treatment, rather than those with increased expression (FIG. 4J-4K and FIG.
- Example 7 The m5C-TET2-LTR-MBD6 axis impacted leukemia and glioma cell fitness, and MBD6 depletion selectively impaired TET2-deficient leukemia growth in vitro and in vivo
- As the m5C-TET2-LTR-MBD6 axis was found to be important for HSPC function, the inventors then studied the role of the axis in leukemia cell fitness. The inventors’ proposed mechanism predicted that TET2 loss of function would lead to more MBD6 binding to caRNAs (e.g., LTR RNAs) and subsequent activation of genes critical to leukemogenesis.
- Mice receiving WT/MBD6 KD or TET2 KO/MBD6 KD cells exhibited dramatically decelerated leukemogenesis relative to controls, particularly those animals transplanted with TET2 KO/MBD6 KD cells, which survived significantly longer (125-135 days or 163-220 days, respectively) (FIG. 5C). Consistent with these results, WT or TET2 KO recipient mice had markedly higher human CD33+ cell chimerism in bone marrow (BM) and peripheral blood (PB) after 26 days when compared with animals that received WT/MBD6 KD or TET2 KO/MBD6 KD cells (FIG. 20B).
- BM bone marrow
- PB peripheral blood
- WT or TET2 KO cell recipient mice showed dramatically higher human CD33+CD45+ cell chimerism in BM and PB after 20 days when compared to WT/MBD6 KD or TET2 KO/MBD6 KD cell recipient mice (FIG. 20C).
- MBD6 KD markedly attenuated leukemic progression in vivo, in the absence or in the presence of functional TET2, albeit with greater efficacy in the absence of functional TET2.
- MBD6 appeared to exert a specific synergistic effect on caRNA in TET2-deficient cells to attenuate cell proliferation, as MBD6 knockdown reversed (suppressed) the excessive caRNA expression observed upon TET2 KO, but did not significantly reduce whole-cell RNA (e.g., SNRP70 and/or mRNA) (FIG. 21A-21B).
- TET2 depletion in leukemia cells resulted in excessive expression of carRNAs (including paRNAs, eRNAs and repeat RNAs), consistent with observations reported above in mESCs and mLK cells. Wild-type-like levels of carRNAs could be largely rescued by MBD6 knockdown (FIG. 5D and FIG.
- Upon further examination of the m5C methylation levels on repeat RNAs in K-562 cells, the inventors found that the LTR class, particularly the ERV1 family with LTR12 and HERVH-int as representative subfamilies, exhibited higher m5C methylation levels (FIG. 5H and FIG. 22E; and Table 1). Notably, the nearby H2AK119ub levels were also upregulated by MBD6 knockdown and downregulated by TET2 KO (FIG. 5I).
- RNA m5C methylation on carRNAs, in particular LTR family RNAs, regulated chromatin state and transcription.
- The m5C on these repeat RNAs could be recognized by a newly identified RNA m5C binding protein MBD6, which recruited the BAP1 complex to mediate H2AK119ub deubiquitylation and gene activation.
- TET2 oxidized the m5C on these RNAs and antagonized gene activation through the m5C-MBD6-H2AK119ub deubiquitylation axis.
- MBD6 ligands are covalently linked to a molecule that binds E3 ubiquitin ligase using methods described in the art.
- the PROTAC will be tested in vitro for degradation of MBD6 in cultured cells before administering in vivo (e.g., in a cancer mouse model).
- m5C levels in one or more chromatin associated RNA (caRNA) are decreased and/or association of one or more caRNA with a PR-DUB complex is decreased in one or more diseased cells (e.g., a tumor cell) in the mouse.
- caRNA chromatin associated RNA
- TET2-mediated mRNA demethylation regulates leukemia stem cell homing and self-renewal. Cell Stem Cell 30, 1072-1090 e1010, doi:10.1016/j.stem.2023.07.001 (2023).
- Tet2 is required to resolve inflammation by recruiting Hdac2 to specifically repress IL-6. Nature 525, 389-393, doi:10.1038/nature15252 (2015). [0489] 38 Chrysanthou, S. et al. The DNA dioxygenase Tet1 regulates H3K27 modification and embryonic stem cell biology independent of its catalytic activity. Nucleic Acids Res 50, 3169-3189, doi:10.1093/nar/gkac089 (2022). [0490] 39 Singh, A. K. et al. Selective targeting of TET catalytic domain promotes somatic cell reprogramming.
- Jarid2 binds mono-ubiquitylated H2A lysine 119 to mediate crosstalk between Polycomb complexes PRC1 and PRC2. Nat Commun 7, 13661, doi:10.1038/ncomms13661 (2016). [0494] 43 de Napoles, M. et al. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Dev Cell 7, 663-676, doi:10.1016/j.devcel.2004.10.005 (2004). [0495] 44 Scheuermann, J. C. et al. Histone H2A deubiquitinase activity of the Polycomb repressive complex PR-DUB.
- MBD5 and MBD6 interact with the human PR-DUB complex through their methyl-CpG-binding domain. Proteomics 14, 2179-2189, doi:10.1002/pmic.201400013 (2014). [0499] 48 Li, Z. et al. Deletion of Tet2 in mice leads to dysregulated hematopoietic stem cells and subsequent development of myeloid malignancies. Blood 118, 4509-4518, doi:10.1182/blood-2010-12-325241 (2011). [0500] 49 Cluzeau, T. et al. Phenotypic and genotypic characterization of azacitidine- sensitive and resistant SKM1 myeloid cell lines.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Medicinal Chemistry (AREA)
- Organic Chemistry (AREA)
- Veterinary Medicine (AREA)
- Biochemistry (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Toxicology (AREA)
- Zoology (AREA)
- Gastroenterology & Hepatology (AREA)
- Animal Behavior & Ethology (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Pharmacology & Pharmacy (AREA)
- Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)
- Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
Abstract
Aspects of the present disclosure are directed to at least methods and compositions for diagnosis and/or treatment of diseases associated with aberrant levels of m5C RNA, including but not limited to chromatin-associated RNA (caRNA). The disease may comprise cancer and/or pre-cancerous cells. The disease may be associated with mutations in a TET2 pathway, including but not limited to mutations in TET2, IDH1, IDH2, and/or ASXL1 encoding genes. Treatment of a disease may comprise administration of one or more inhibitors of MBD6, TET2, and/or NSUN2. Also provided herein are methods of treatment of a disease in an individual comprising administering one or more inhibitors of MBD6, TET2, and/or NSUN2 to the individual determined to have aberrant m5C RNA modifications.
Description
COMPOSITIONS AND METHODS FOR MODULATING CHROMATIN STATE CROSS REFERENCE TO RELATED APPLICATIONS [0001] This application claims the benefit of priority to U.S. Provisional Patent Application Serial No. 63/587,365, filed October 2, 2023, which is incorporated by reference herein in its entirety. SEQUENCE LISTING [0002] The instant application contains a Sequence Listing which has been submitted in ST26 format and is hereby incorporated by reference in its entirety. Said ST26 copy, created on October 2, 2023, is named ARCD_P0795USP1_Sequence_Listing.xml and is 2,726,784 bytes in size. BACKGROUND I. Field [0003] Aspects of this invention relate to at least the field of molecular biology and medicine. More particularly, aspects concern at least compositions and methods for modifying, detecting, mapping, and/or evaluating chromatin state for therapeutic purposes. II. Background [0004] The gene Ten-eleven translocation enzyme 2 (TET2) and TET2 protein pathway associated genes are frequently disrupted in various diseases (1-3), such as human cancers, and have been shown to drive myeloid malignancy initiation and progression. TET2 deficiency has been shown to result in globally opened chromatin and activation of genes contributing to aberrant hematopoietic stem cell self-renewal (4,5). However, the open chromatin consistently observed in TET2-deficient mouse embryonic stem cells, leukemic cells, and hematopoietic progenitor and stem cells is inconsistent with TET2’s described role of DNA 5mC oxidation. Additionally, suitable biomarkers for determining efficacy of treatment options associated with diseases comprising TET2 mutations and/or TET pathway associated mutations are lacking. Finally, there exists a need for methods and compositions for treating TET2 and/or TET2 pathway deficient diseases, such as cancer and/or pre-cancers. 
SUMMARY [0005] The present disclosure addresses certain needs outlined above by providing at least methods, compositions, and kits for treatment of diseases associated with chromatin state
misregulation, in particular, chromatin state misregulation associated with mutations in TET2 and/or TET2 pathway associated genes. [0006] TET2 and associated pathways are frequently disturbed in various diseases, such as human cancers. As reported herein, the inventors have identified a novel pathway through which TET2 acts to alter chromatin state and gene expression through oxidative demethylation of m5C on chromatin-associated RNAs (caRNA), including but not limited to chromatin associated repeat RNA and chromatin associated regulatory RNA (carRNA). As described herein, the inventors have characterized upstream writers (e.g., NSUN1 and/or NSUN2), readers (e.g., MBD5 and/or MBD6), and erasers (e.g., TET2) involved in m5C regulation in caRNA. In addition, genes associated with TET2 pathways, such as isocitrate dehydrogenase 1 (IDH1) and/or isocitrate dehydrogenase 2 (IDH2), are mutated in ~77% of low-grade glioma and ~30% of cholangiocarcinoma. IDH1/2 mutations are known to produce the oncometabolite R-2HG, which can inhibit TET2 activities. TET2 and IDH1/2 mutants are known to be mutually exclusive in leukemia, indicating they work along the same pathway in disease pathogenesis (e.g., TET pathways). Thus, biological systems with TET2 pathway deficiencies are prevalent and highly problematic, for example, oncogenic. Collectively, as described herein, the inventors have identified a new mode of chromatin-associated RNA m5C mediated chromatin regulation. For example, in certain cases, NSUN2 can install m5C in caRNA, and TET2 can mediate oxidative demethylation thereof. MBD5 and/or MBD6 can read m5C-modified RNA on chromatin and can mediate active deubiquitylation of major repressive marks, such as but not limited to H2AK119ub. H2AK119ub can be installed by the Polycomb Repressive Complex 1 (PRC1) to silence chromatin (e.g., promote heterochromatin). 
Perturbations of TET2 pathway, and therapeutic interventions to mitigate the same, can be broadly applied in contexts including but not limited to: 1) human diseases (e.g., cancers, etc.) with TET2 mutation and/or low expression; 2) human diseases (e.g., cancers, etc.) comprising IDH1/2 mutations; and/or 3) human diseases (e.g., cancers, etc.) comprising PRC1 or PR-DUB mutations and/or altered expression. [0007] In some aspects, are methods for treating a disease in an individual, comprising the step of administering one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6) to an individual in need thereof. In some aspects, the disease comprises cancer of the lung, brain, breast, blood, skin, pancreas, liver, colon, head and neck, kidney, thyroid, stomach, spleen, gallbladder, bone, ovary, testes, endometrium, prostate, rectum, anus, and/or cervix. In some aspects, the disease comprises clonal hematopoiesis of indeterminate potential (CHIP). In some aspects, the disease is characterized by atherosclerosis, myocardial fibrosis, and/or
heart failure. In some aspects, the disease comprises a blood cancer. In some aspects, the disease comprises a leukemia. In some aspects, the disease comprises a myeloid malignancy. In some aspects, the disease comprises acute myeloid leukemia. In some aspects, the disease comprises chronic myelomonocytic leukemia. In some aspects, the disease comprises a glioma. In some aspects, the disease comprises glioblastoma. [0008] In some aspects, methods provided herein comprise reducing proliferation of a cancer and/or pre-cancerous cell. In some aspects, diseases for treatment utilizing methods provided herein comprise cells with one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich Splicing Factor 2 (SRSF2). [0009] In some aspects, methods of treatment described herein comprise treatment of a disease associated with diseased cells with one or more mutations in one or more genes encoding components of a canonical and/or non-canonical Polycomb Repressive Complex (PRC). In some aspects, one or more mutations in one or more genes encoding components of PRC can comprise one or more loss of function mutations. In some aspects, one or more mutations in one or more genes encoding components of PRC comprises, or expressly does not comprise, one or more mutations in E3 Ubiquitin Ligase RING1A/B, Polycomb Group Ring Finger 1 (PCGF1), Polycomb Group Ring Finger 2 (PCGF2), Polycomb Group Ring Finger 3 (PCGF3), Polycomb Group Ring Finger 4 (PCGF4), Polycomb Group Ring Finger 5 (PCGF5), and/or Polycomb Group Ring Finger 6 (PCGF6). 
In some aspects, the disease is associated with cells with one or more mutations in one or more genes encoding Polycomb Repressive-Deubiquitinase (PR-DUB) complex associated components O-linked N-acetylglucosamine Transferase (OGT), Lysine Demethylase 1B (KDM1B), Forkhead Box K1 (FOXK1), Forkhead Box K2 (FOXK2), BRCA1 Associated Protein 1 (BAP1), ASXL Transcriptional Regulator 1 (ASXL1), ASXL Transcriptional Regulator 2 (ASXL2), ASXL Transcription Regulator 3 (ASXL3), and/or Host Cell Factor C1 (HCFC1). In some aspects, the one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprises, or expressly does not comprise, one or more gain of function mutations. In some aspects, the disease is associated with cells with one or more mutations in a TET2 encoding gene. In some aspects, the one or more mutations in a TET2 encoding gene comprises, or expressly does not comprise, one or more
loss of function mutations. In some aspects, the disease is associated with cells comprising one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2. [0010] In some aspects, also provided herein are methods comprising administration of one or more additional therapies to an individual in need thereof. In some aspects, an additional therapy is surgery, radiation, chemotherapy, hormone therapy, and/or immunotherapy. In some aspects, an additional therapy comprises administration of an inhibitor of TET2. In some aspects, the inhibitor of TET2 comprises C35 and/or TETi76. In some aspects, an additional therapy comprises administration of an inhibitor of NSUN1 and/or NSUN2. In some aspects, an additional therapy comprises administration of an inhibitor of NSUN2. In some aspects, an additional therapy comprises 5-AzaC. [0011] In some aspects, methods described herein further comprise a step of diagnosing the disease in the individual. In some aspects, a disease is associated with diseased cells characterized as comprising an open chromatin state relative to non-diseased cells of the same developmental lineage. [0012] In some aspects, administering the one or more MBD6 inhibitors to the individual results in promotion of a closed chromatin state in one or more diseased cells in the individual. In some aspects, a disease is characterized as pro-inflammatory. In some aspects, methods provided herein comprise inhibition of diseased cell proliferation. [0013] In some aspects, administering the one or more MBD6 inhibitors decreases m5C levels in one or more chromatin associated RNA (caRNA) and/or decreases association of the one or more caRNA with a PR-DUB complex in one or more diseased cells in the individual, relative to a control non-diseased cell. 
In some aspects, administering the one or more MBD6 inhibitors increases oxidation of m5C in a caRNA and/or inhibits installation of m5C in the one or more caRNA in one or more diseased cells in the individual. In some aspects, one or more caRNA comprises or consists essentially of Long Terminal Repeat (LTR) RNAs. In some aspects, one or more caRNA comprises one or more of the caRNAs described in Table 1. In some aspects, one or more caRNA comprises one or more of the caRNAs described in Table 2. In some aspects, one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 100-614. In some aspects, one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 104 or 107. In some aspects, methods described herein comprise utilization of between 20 and 50 caRNAs. In some aspects, one or more caRNAs comprise a chromatin associated regulatory
RNA (carRNA) and/or a chromatin associated repeat RNA sequence. In some aspects, one or more caRNAs comprise or consist essentially of endogenous retroviral RNAs. In some aspects, the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK). [0014] Also provided herein are methods of treatment, wherein histone ubiquitination is increased (e.g., in one or more diseased cells in an individual following administration of one or more MBD6 inhibitors). In some aspects, the histone ubiquitination comprises or consists of ubiquitination on H2A. In some aspects, the histone ubiquitination occurs at a site comprising or consisting essentially of H2AK119. In some aspects, methods described herein further comprise modifying histone methylation (e.g., in one or more diseased cells in an individual following administration of one or more MBD6 inhibitors). In some aspects, a histone that comprises modified methylation comprises or consists essentially of H3. In some aspects, the histone comprises modified methylation on a methylation site comprising or consisting essentially of H3K27me3. In some aspects, the histone that comprises modified methylation is localized near or at (e.g., within less than 4.5 to 5.5 kb, for example 5 kb) a genetic locus overlapping with a histone that comprises modified histone ubiquitination. [0015] In some aspects, one or more inhibitors of MBD6 comprise a polynucleotide at least partially complementary to a gene encoding MBD6. In some aspects, the at least partially complementary polynucleotide comprises a short hairpin RNA and/or small interfering RNA. In some aspects, the at least partially complementary polynucleotide comprises a sequence at least 80% complementary to at least 15, 20, 25, or more than 25 contiguous nucleotides of any one or more of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20, or an anti-sense sequence thereof. In some aspects, the polynucleotide comprises modified RNA phosphoramidites. 
In some aspects, the polynucleotide comprises RNA with one or more 2ʹ-O-Methyl (2ʹ-OMe) or 2ʹ-O-Methoxyethyl (2ʹ-MOE) modifications. In some aspects, the polynucleotide comprises RNA with every nucleotide comprising a 2ʹ-MOE modification. In some aspects, the polynucleotide comprises RNA with one or more phosphorothioate bonds. In some aspects, the polynucleotide is comprised within a lentiviral particle and/or nanoparticle. In some aspects, the inhibitor of MBD6 comprises more than one polynucleotide. In some aspects, one or more inhibitors of MBD6 comprise a proteolysis targeting chimera. [0016] Also provided herein are methods of promoting histone ubiquitination in a cell comprising contacting the cell with one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6). In some aspects, promoting histone ubiquitination comprises or consists of promoting ubiquitination on H2A. In some aspects, the histone ubiquitination site comprises
or consists essentially of H2AK119. In some aspects, the cell targeted for promotion of histone ubiquitination comprises, or expressly does not comprise, one or more mutations in one or more genes encoding TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2. In some aspects, the cell comprises, or expressly does not comprise, one or more mutations in one or more genes encoding RING1A/B, PCGF1, PCGF2, PCGF3, PCGF4, PCGF5, PCGF6, OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1. [0017] Also provided herein are methods of decreasing m5C levels in chromatin associated RNA (caRNA) in a cell, comprising contacting the cell with one or more inhibitors of MBD6. In some aspects, the caRNA comprise or consist essentially of LTR RNAs. In some aspects, the caRNA comprise or consist essentially of endogenous retroviral RNAs. In some aspects, the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK). [0018] Also provided herein are manufactured articles for use in performing any method disclosed herein. Also provided are kits comprising means for performing any method disclosed herein. [0019] Also provided herein are methods of treating a disease in an individual, wherein the disease is characterized by cells comprising one or more mutations in TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2 encoding genes, the method comprising administering one or more inhibitors of MBD6, TET2, and/or NSUN2 to the individual in need thereof. [0020] Also provided herein are compositions including polynucleotides and/or proteins. In some aspects, provided herein are fusion proteins comprising a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one or more of SEQ ID NOs: 1-4, 11-14, or 15-20. In some aspects, a fusion protein comprises a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one of SEQ ID NOs: 19-20. 
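The percent-identity thresholds recited above (e.g., at least 80%, 85%, 90%, 95%, or 100% identical) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length and computes identity as matching positions divided by aligned length; the disclosure may define identity differently (for example, through a particular alignment algorithm), and the function names `percent_identity` and `meets_threshold` and the example sequences are hypothetical, not taken from the sequence listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match.

    Assumption: identity = matching positions / aligned length. This is one
    common convention, not necessarily the definition used in the disclosure.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 80.0) -> bool:
    """True if the two sequences are at least `threshold` percent identical."""
    return percent_identity(a, b) >= threshold


# Hypothetical 10-residue sequences differing at one position.
query = "ACGUACGUAC"
reference = "ACGUACGUAA"
print(percent_identity(query, reference))
print(meets_threshold(query, reference, 90.0))
```

A sequence satisfying a higher threshold also satisfies every lower one, which is why the thresholds are recited as nested alternatives.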
[0021] Also provided herein are compositions and/or kits that include one or more components associated with a method described herein. In some aspects, such compositions and/or kits can be utilized in a method of treating a disease associated with aberrant m5C RNA methylation in an individual, for example, by administering a fusion protein and/or composition described herein. [0022] Certain aspects of the present disclosure are characterized through the following enumerated aspects.
[0023] Aspect 1 is a method of treating a disease in an individual, comprising the step of administering one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6) to an individual in need thereof. [0024] Aspect 2 is the method of aspect 1, wherein the disease comprises cancer of the lung, brain, breast, blood, skin, pancreas, liver, colon, head and neck, kidney, thyroid, stomach, spleen, gallbladder, bone, ovary, testes, endometrium, prostate, rectum, anus, and/or cervix. [0025] Aspect 3 is the method of aspect 1, wherein the disease comprises clonal hematopoiesis of indeterminate potential (CHIP). [0026] Aspect 4 is the method of aspect 3, wherein the disease is characterized by atherosclerosis, myocardial fibrosis, and/or heart failure. [0027] Aspect 5 is the method of aspect 2, wherein the cancer comprises a blood cancer. [0028] Aspect 6 is the method of aspect 5, wherein the blood cancer comprises a leukemia. [0029] Aspect 7 is the method of aspect 5, wherein the blood cancer comprises a myeloid malignancy. [0030] Aspect 8 is the method of aspect 7, wherein the myeloid malignancy comprises acute myeloid leukemia. [0031] Aspect 9 is the method of aspect 7, wherein the myeloid malignancy comprises chronic myelomonocytic leukemia. [0032] Aspect 10 is the method of aspect 2, wherein the cancer comprises a glioma. [0033] Aspect 11 is the method of aspect 10, wherein the glioma comprises glioblastoma. [0034] Aspect 12 is the method of any one of aspects 1 to 11, comprising reducing proliferation of a cancer and/or pre-cancerous cell. 
[0035] Aspect 13 is the method of any one of aspects 1 to 12, wherein the disease is associated with diseased cells comprising one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich Splicing Factor 2 (SRSF2). [0036] Aspect 14 is the method of any one of aspects 1 to 13, wherein the disease is associated with diseased cells with one or more mutations in one or more genes encoding components of a canonical and/or non-canonical Polycomb Repressive Complex (PRC). [0037] Aspect 15 is the method of aspect 14, wherein the one or more mutations in one or more genes encoding components of PRC comprises one or more loss of function mutations.
[0038] Aspect 16 is the method of aspect 14 or 15, wherein the one or more mutations in one or more genes encoding components of PRC comprises one or more mutations in E3 Ubiquitin Ligase RING1A/B, Polycomb Group Ring Finger 1 (PCGF1), Polycomb Group Ring Finger 2 (PCGF2), Polycomb Group Ring Finger 3 (PCGF3), Polycomb Group Ring Finger 4 (PCGF4), Polycomb Group Ring Finger 5 (PCGF5), and/or Polycomb Group Ring Finger 6 (PCGF6). [0039] Aspect 17 is the method of any one of aspects 1 to 16, wherein the disease is associated with diseased cells with one or more mutations in one or more genes encoding Polycomb Repressive-Deubiquitinase (PR-DUB) complex associated components O-linked N-acetylglucosamine Transferase (OGT), Lysine Demethylase 1B (KDM1B), Forkhead Box K1 (FOXK1), Forkhead Box K2 (FOXK2), BRCA1 Associated Protein 1 (BAP1), ASXL Transcriptional Regulator 1 (ASXL1), ASXL Transcriptional Regulator 2 (ASXL2), ASXL Transcription Regulator 3 (ASXL3), and/or Host Cell Factor C1 (HCFC1). [0040] Aspect 18 is the method of aspect 17, wherein the one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprises one or more gain of function mutations. [0041] Aspect 19 is the method of aspect 13, wherein the disease is associated with diseased cells with one or more mutations in a TET2 encoding gene. [0042] Aspect 20 is the method of aspect 19, wherein the one or more mutations in a TET2 encoding gene comprises one or more loss of function mutations. [0043] Aspect 21 is the method of aspect 20, wherein the disease is associated with diseased cells comprising one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2. [0044] Aspect 22 is the method of any one of aspects 1 to 21, wherein the individual is administered an additional therapy. 
[0045] Aspect 23 is the method of aspect 22, wherein the additional therapy is surgery, radiation, chemotherapy, hormone therapy, and/or immunotherapy. [0046] Aspect 24 is the method of aspect 22 or 23, wherein the additional therapy comprises administration of an inhibitor of TET2. [0047] Aspect 25 is the method of aspect 24, wherein the inhibitor of TET2 comprises C35 and/or TETi76. [0048] Aspect 26 is the method of any one of aspects 22 to 25, wherein the additional therapy comprises administration of an inhibitor of NSUN1 and/or NSUN2.
[0049] Aspect 27 is the method of any one of aspects 22 to 26, wherein the additional therapy comprises administration of an inhibitor of NSUN2. [0050] Aspect 28 is the method of any one of aspects 22 to 27, further comprising administration of 5-AzaC. [0051] Aspect 29 is the method of any one of aspects 1 to 27, further comprising a step of diagnosing the disease in the individual. [0052] Aspect 30 is the method of any one of aspects 1 to 29, wherein the disease is associated with diseased cells characterized as comprising an open chromatin state relative to non-diseased cells of the same developmental lineage. [0053] Aspect 31 is the method of any one of aspects 1 to 30, wherein administering the one or more MBD6 inhibitors results in promotion of a closed chromatin state in one or more diseased cells in the individual. [0054] Aspect 32 is the method of any one of aspects 1 to 31, wherein the disease is characterized as pro-inflammatory. [0055] Aspect 33 is the method of any one of aspects 1 to 32, comprising inhibition of diseased cell proliferation. [0056] Aspect 34 is the method of any one of aspects 1 to 33, wherein administering the one or more MBD6 inhibitors decreases m5C levels in one or more chromatin associated RNA (caRNA) and/or decreases association of one or more caRNA with a PR-DUB complex in one or more diseased cells in the individual. [0057] Aspect 35 is the method of aspect 34, wherein administering the one or more MBD6 inhibitors increases oxidation of m5C in the one or more caRNA and/or inhibits installation of m5C in the one or more caRNA in one or more diseased cells in the individual. [0058] Aspect 36 is the method of aspect 34 or 35, wherein the one or more caRNA comprises or consists essentially of Long Terminal Repeat (LTR) RNAs. [0059] Aspect 37 is the method of any one of aspects 34 to 36, wherein the one or more caRNA comprises one or more of the caRNAs described in Table 1. 
[0060] Aspect 38 is the method of any one of aspects 34 to 37, wherein the one or more caRNA comprises one or more of the caRNAs described in Table 2. [0061] Aspect 39 is the method of any one of aspects 34 to 38, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 100-614.
[0062] Aspect 40 is the method of any one of aspects 34 to 39, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 104 or 107. [0063] Aspect 41 is the method of any one of aspects 34 to 40, wherein the one or more caRNAs comprise between 20-50 caRNAs. [0064] Aspect 42 is the method of any one of aspects 34 to 41, wherein the one or more caRNAs comprise a chromatin associated regulatory RNA (carRNA) and/or a chromatin associated repeat RNA sequence. [0065] Aspect 43 is the method of any one of aspects 34 to 42, wherein the one or more caRNAs comprise or consist essentially of endogenous retroviral RNAs. [0066] Aspect 44 is the method of aspect 43, wherein the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK). [0067] Aspect 45 is the method of any one of aspects 1 to 44, wherein histone ubiquitination is increased in one or more diseased cells after administration of the one or more MBD6 inhibitors. [0068] Aspect 46 is the method of aspect 45, wherein the histone ubiquitination comprises or consists of ubiquitination on H2A. [0069] Aspect 47 is the method of aspect 46, wherein the histone ubiquitination occurs at a site comprising or consisting essentially of H2AK119. [0070] Aspect 48 is the method of any one of aspects 1 to 46, wherein administering the one or more MBD6 inhibitors modifies histone methylation in one or more diseased cells in the individual. [0071] Aspect 49 is the method of aspect 48, wherein a histone that comprises modified methylation comprises or consists essentially of H3. [0072] Aspect 50 is the method of aspect 49, wherein the histone comprises modified methylation on a methylation site comprising or consisting essentially of H3K27me3. 
[0073] Aspect 51 is the method of any one of aspects 48 to 50, wherein the histone that comprises modified methylation is localized near or at a genetic locus overlapping with a histone that comprises modified histone ubiquitination. [0074] Aspect 52 is the method of any one of aspects 1 to 51, wherein the one or more inhibitors of MBD6 comprise a polynucleotide at least partially complementary to a gene encoding MBD6. [0075] Aspect 53 is the method of aspect 52, wherein the polynucleotide comprises a short hairpin RNA and/or small interfering RNA.
[0076] Aspect 54 is the method of aspect 52, wherein the polynucleotide comprises a sequence at least 80% complementary to at least 15, 20, 25, or more than 25 contiguous nucleotides of any one or more of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20. [0077] Aspect 55 is the method of any one of aspects 52 to 54, wherein the polynucleotide comprises modified RNA phosphoramidites. [0078] Aspect 56 is the method of any one of aspects 52 to 55, wherein the polynucleotide comprises RNA with one or more 2ʹ-O-Methyl (2ʹ-OMe) or 2ʹ-O-Methoxyethyl (2ʹ-MOE) modifications. [0079] Aspect 57 is the method of aspect 56, wherein the polynucleotide comprises RNA with every nucleotide comprising a 2ʹ-MOE modification. [0080] Aspect 58 is the method of any one of aspects 52 to 57, wherein the polynucleotide comprises RNA with one or more phosphorothioate bonds. [0081] Aspect 59 is the method of any one of aspects 52 to 58, wherein the polynucleotide is comprised within a lentiviral particle and/or nanoparticle. [0082] Aspect 60 is the method of any one of aspects 52 to 59, wherein the inhibitor of MBD6 comprises more than one polynucleotide. [0083] Aspect 61 is the method of any one of aspects 1 to 51, wherein the one or more inhibitors of MBD6 comprise a proteolysis targeting chimera. [0084] Aspect 62 is a method of promoting histone ubiquitination in a cell comprising contacting the cell with one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6). [0085] Aspect 63 is the method of aspect 62, wherein promoting histone ubiquitination comprises or consists of promoting ubiquitination on H2A. [0086] Aspect 64 is the method of aspect 63, wherein the histone ubiquitination site comprises or consists essentially of H2AK119. [0087] Aspect 65 is the method of any one of aspects 62 to 64, wherein the cell comprises one or more mutations in one or more genes encoding TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2. 
[0088] Aspect 66 is the method of any one of aspects 62 to 65, wherein the cell comprises one or more mutations in one or more genes encoding RING1A/B, PCGF1, PCGF2, PCGF3, PCGF4, PCGF5, PCGF6, OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1. [0089] Aspect 67 is a method of decreasing m5C levels in chromatin associated RNA (caRNA) in a cell, comprising contacting the cell with one or more inhibitors of MBD6.
[0090] Aspect 68 is the method of aspect 67, wherein the caRNA comprise or consist essentially of LTR RNAs. [0091] Aspect 69 is the method of aspect 67 or 68, wherein the caRNA comprise or consist essentially of endogenous retroviral RNAs. [0092] Aspect 70 is the method of aspect 69, wherein the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK). [0093] Aspect 71 is a method of treating a disease in an individual, wherein the disease is characterized by cells comprising one or more mutations in TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2 encoding genes, the method comprising administering one or more inhibitors of MBD6, TET2, and/or NSUN2 to the individual in need thereof. [0094] Aspect 72 is a manufactured article for use in performing the method of any one of aspects 1 to 71. [0095] Aspect 73 is a kit comprising means for performing the method according to any one of aspects 1 to 71. [0096] Aspect 74 is a fusion protein comprising a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one or more of SEQ ID NOs: 1-4, 11-14, or 15-20. [0097] Aspect 75 is the fusion protein of aspect 74, wherein the fusion protein comprises a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one of SEQ ID NOs: 19-20. [0098] Aspect 76 is a composition comprising the fusion protein of aspect 74 or 75. [0099] Aspect 77 is a method of treating a disease associated with aberrant m5C RNA methylation in an individual, comprising administering the fusion protein and/or composition of any one of aspects 74 to 76. [0100] Other objects, features and advantages of the present invention will become apparent from the following detailed description. 
It should be understood, however, that the detailed description and the specific examples, while indicating specific aspects of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description. BRIEF DESCRIPTION OF THE DRAWINGS [0101] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. Inventions described herein may
be better understood by reference to one or more of these drawings in combination with the detailed description of specific aspects presented herein. [0102] FIGs. 1A-1E, Elevated chromatin accessibility following Tet methylcytosine dioxygenase 2 (TET2) depletion was facilitated by Paraspeckle Component 1 (PSPC1) through its RNA binding activity in mouse embryonic stem cells (mESCs). FIGs. 1A and 1B: Comparisons of chromatin accessibility (FIG. 1A) and global nascent RNA synthesis rate (FIG. 1B) of WT and Tet2 knockout (KO) mESCs. Representative images were selected from three independent experiments. FIG. 1A, Left: assay of transposase-accessible chromatin with visualization (ATAC-see) visualizing global chromatin accessibility. Scale bar: 20 µm. The nucleus was counterstained with Hoechst 33342 (Hoechst). Right: quantification of relative fluorescence intensity of ATAC-see signal. FIG. 1B, Left: 5-ethynyl uridine (EU) incorporation followed by fluorescence imaging visualizing RNA synthesis rates. Scale bar: 20 µm. The nucleus was counterstained with Hoechst 33342 (Hoechst). Right: quantification of relative fluorescence intensity of EU signal. P values were determined using a Mann Whitney Wilcoxon Test. FIG. 1C: PSPC1 facilitated RNA binding of TET2 to repress DNA transcription. RNA synthesis rates of WT and Pspc1 KO mESCs were assessed by EU incorporation followed by fluorescence imaging. RNA synthesis rates of Pspc1 KO mESCs transfected with empty vector (EV), or expression vectors encoding wild-type mouse PSPC1, or PSPC1 mutant with impaired RNA binding abilities were measured with EU incorporation. P values were determined using a Mann Whitney Wilcoxon Test. FIG. 1D: GSEA enrichment analysis was performed with genes upregulated (upDEGs) upon Pspc1 depletion in mESCs as the gene list and expression log2 foldchange (log2FC) upon Tet2 knockout in mESCs as the rank list. 
Top: gene expression was quantified with total RNA-seq (GSE103269 and GSE48518); Bottom: gene expression was quantified with chromatin associated RNA-seq (caRNA-seq; this study). FIG. 1E: Box plot illustrating gene expression log2FC (quantified with caRNA-seq, this study) upon Pspc1 knockout or Tet2 knockout. Genes were categorized into two groups based on whether or not they were targeted by CXXC5 or PSPC1 in mESCs. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. [0103] FIGs. 2A-2G, TET2 depletion led to elevated chromatin associated RNA (caRNA) m5C methylation and abundance, resulting in an upregulation of downstream gene expression in mESCs. FIG. 2A: TET2 oxidized RNA 5-methylcytosine to 5-hydroxylmethylcytosine. RNA 5-methylcytosine levels (m5C/C) (left) and 5-hydroxylmethylcytosine levels (hm5C/C) (right) in the chromatin-associated fraction were measured by mass spectrometry. P values were determined using unpaired Student’s t test with
Welch’s correction. *P = 0.038 and **P = 0.0033 for comparison between WT and Tet2 KO for m5C and hm5C, respectively. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 2B: Proportion of m5C-marked peaks on repeat RNA and non-repeat RNA regions in WT mESCs (2,951 in non-Repeat RNAs; and 2,673 (47.5%) in Repeat RNAs). FIG. 2C: Average profile and heatmap depicting the m5C level in WT mESCs, along with the corresponding ATAC-seq signal in WT and Tet2 KO mESCs. The regions analyzed were m5C-marked peaks located within repeat RNA regions. FIG. 2D: Dot plot displaying m5C enrichment in various repeat RNA families (e.g., Long Terminal Repeat (LTR), Long Interspersed Nuclear Element (LINE), and Short Interspersed Nuclear Element (SINE)). The size of each dot corresponds to the number of loci that were m5C methylated in WT mESCs. The results showed both L1 in LINE and ERVK in the LTR class exhibited a significant enrichment of m5C methylation and that these m5C-marked repeat RNAs were associated with increased local chromatin accessibility. FIG. 2E: The relative m5C methylation level of the Intracisternal A-particle (IAP) and L1 RNAs in WT, Tet2 KO and Pspc1 KO mESCs. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.04 and *P = 0.01 for comparison between WT and Tet2 KO for LINE-1 and IAP, respectively; *P = 0.04 and **P = 0.003 for comparison between WT and Pspc1 KO for LINE-1 and IAP, respectively. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 2F: Cumulative curve showing the log2FC in the caRNA abundance of IAPEz-int repeats family between Tet2 KO and WT mESCs. Loci within the IAPEz-int repeats family were categorized into groups according to whether or not they were m5C methylated. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. 
FIG. 2G: Tethering of the catalytic domain (CD) of TET2 (SEQ ID NO: 21) to the IAP RNAs led to RNA m5C hypomethylation and DNA 5mC hypermethylation. mESCs stably expressing dCas13b-TET2(CD) (shown in the figure as WT; SEQ ID NO: 27) or a catalytically dead (HxD mutation) version (dCas13b-TET2(CD)HxD (SEQ ID NO: 29), shown in the figure as HxD) fusion protein and guide RNA sequence downstream of a Tet operator (TetO)-controlled H1 promoter (H1-2O2) were used for IAP RNA targeting. DNA 5mC methylation at IAP loci (bottom) and RNA m5C methylation (top) on IAP transcripts were measured using MeDIP and MeRIP, respectively, followed by qPCR at indicated time points post doxycycline (DOX) treatment in each cell line. P values were determined using paired Student’s t test by comparing values at corresponding time points with values at “0 hour”, individually. DNA methylation: for dCas13b-TET2(CD)-HxD, **P = 0.0087, *P = 0.0164, *P = 0.0346, *P = 0.0338, **P = 0.0048, *P = 0.027 for 8h, 12h, 16h, 20h, 36h, and 72h versus
0h, respectively; for dCas13b-TET2(CD), *P = 0.037, *P = 0.016 and **P = 0.0012 for 36h, 48h and 72h versus 0h, respectively. RNA methylation: for dCas13b-TET2(CD)-HxD, **P = 0.0029 for 12h versus 0h; for dCas13b-TET2(CD), ***P = 0.0003, ***P = 0.0010, **P = 0.0022, **P = 0.0026, **P = 0.0023, **P = 0.0019, **P = 0.0017, **P = 0.0017, and **P = 0.0022 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. NS denotes not significant (P > 0.05). Error bars represent mean ± standard deviation (S.D.) for three independent experiments. [0104] FIGs. 3A-3G, The activity of TET2 on RNA dictated the increased chromatin accessibility caused by TET2 depletion in mESCs. FIG. 3A: Bar graphs illustrating the overlapping ratios (Jaccard index) between enhancer, promoter, or repeats regions, respectively, with DNA hypermethylated or hypomethylated regions in Tet2 knockout mESCs. FIG. 3B: Measurements of the average histone modification levels, as well as input signals, in WT and Tet2 KO mESCs at DNA hypermethylated regions (detected in Tet2 KO mESCs). FIG. 3C: Depicts the average ATAC signals in WT versus Tet2 KO, and WT versus Pspc1 KO mESCs at DNA hypermethylated regions (detected in Tet2 KO mESCs). FIG. 3D: Displays bar graphs illustrating the overlapping ratios (Jaccard index) of the genomic binding sites of PSPC1, TET2, and CXXC5 with DNA hypomethylated regions (detected in Tet2 knockout mESCs). FIG. 3E: Measurements of the average histone modification levels, as well as input signals, in WT and Tet2 KO mESCs at DNA hypomethylated regions (detected in Tet2 KO mESCs). FIG. 3F: Depicts the average ATAC signals in WT versus Tet2 KO, and WT versus Pspc1 KO mESCs at DNA hypomethylated regions (detected in Tet2 KO mESCs). FIG. 3G: Displays bar graphs illustrating the overlapping ratios (Jaccard index) of RNA m5C peaks with the genomic binding sites of PSPC1, TET2, and CXXC5 in WT mESCs. DNA hypermethylated and hypomethylated regions were defined by Hon, Gary C et al. 
(GSE48519). PSPC1 ChIP-seq (GSE150399), TET2 ChIP-seq (GSE115964) and CXXC5 ChIP-seq (GSE132025). [0105] FIGs. 4A-4L, Tet2 knockout enhanced MBD6 binding and induced H2AK119ub deubiquitylation, leading to elevated downstream gene expression in mESCs and mLK cells (mouse Hematopoietic Progenitor Cells (HPCs) that can give rise to myeloid or erythroid lineages, e.g., Lin-c-kit+). FIG. 4A: Bar graphs illustrating the overlapping ratios (Jaccard index) of various histone modification marked regions with DNA hypomethylated regions (detected with Tet2 KO) in mESCs. FIG. 4B: TET2 tethering increased the H2AK119ub level at IAP loci. H2AK119ub levels at IAP loci were measured by CUT&Tag followed by qPCR analysis in mESCs. Tethering of the catalytic domain of TET2
(TET2(CD)) fusion protein with dCas13b (dCas13b-TET2(CD), shown in the figure as WT; SEQ ID NO: 27) or a catalytically dead (HxD mutation) version (dCas13b-TET2(CD)-HxD, shown in the figure as HxD; SEQ ID NO: 29) to IAP RNAs was induced by doxycycline (DOX) addition to the culture medium. Cells were harvested at eight hours post DOX treatment. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values with values under condition “-Dox”, individually. *P = 0.010 for IAP +Dox for dCas13b-TET2(CD). NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG.4C: MBD6 enriched m5C-modified RNA in mESCs. rRNA-depleted RNAs crosslinked (CLIP) to Flag-tagged MBD6 with ultraviolet light (254 nm) were purified and digested to single nucleosides prior to mass spectrometry analysis. P values were determined using unpaired Student’s t test with Welch’s correction. ***P = 0.0008. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 4D: MBD6 preferentially bound m5C-modified oligonucleotide probe. EMSA showing the binding preferences of the purified MBD domain of MBD6 fused to maltose binding protein (MBP) with C, m5C, or hm5C-containing single-stranded oligonucleotide probes (100 nM). Sequences of the oligonucleotide probes (SEQ ID NOs: 95-99) were designed according to the motif identified by CLIP-Seq in mESCs, with X representing C, m5C, or hm5C. Gradient for MBP-MBD protein was 0, 0.1, 0.5, 2.5, 12, 50, and 100 μM. FIG. 4E: Mbd6 or Nsun2 knockdown rescued global chromatin accessibility caused by TET2 deficiency. DNase-TUNEL experiments were performed with accessible chromatin sites labeled by Alexa Fluor 488 fluorophore and quantified by flow cytometry. FIG. 4F: Volcano plot depicting the gene expression log2FC comparing Tet2-/- with WT mLK cells. 
Gene expression levels were quantified using RNA-seq in WT and Tet2-/- mLK cells. FIG.4G: Volcano plot depicting the log2FC of the H2AK119ub levels comparing Tet2-/- with WT mLK cells at repeat regions, with IAPEz-int regions showing an overall decreased H2AK119ub level upon Tet2 KO in mLK cells. FIG.4H: Depicts results of colony formation of replating assays of WT and Tet2-/- mLK cells transfected with control (shNC) or Mbd6 shRNA (shMbd6) plasmids, and then cultured in the presence of indicated cytokines (see Methods). Colony counts were scored every 7 days. Numbers of colonies after each replating were quantified. Statistical analyses were conducted by comparing number of colonies between shNC versus shMbd6 in WT or Tet2-/- groups, respectively. P values were determined using unpaired Student’s t test with Welch’s correction. ****P < 0.0001 for 2nd replating, ****P < 0.0001 for 3rd replating, and ****P < 0.0001 for 4th replating. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG.4I:
Flow cytometry analyses of WT and Tet2-/- mLKs transfected with shNC or shMbd6 plasmids. mLKs were isolated from total BM cells of 6–8 weeks old mice and transfected with shNC or shMbd6 by electroporation 7 days before analyses. Frequency of hematopoietic stem and progenitor cells (HSPCs) was quantified by fractions of LK cells in the pool. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values between shNC and shMbd6 groups. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. Differentiation towards monocyte/macrophages and granulocytes was quantified by fractions of CD11b+ cells in the pool. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.021 and *P = 0.023 for Tet2-/- shNC versus shMbd6, respectively. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 4J: Venn diagram illustrating the overlap between the upregulated genes (upDEGs) by Tet2 knockout and the downregulated genes (downDEGs) by the treatment of IAP-targeting anti-sense oligonucleotide (ASO; SEQ ID NO: 89) in mLK cells. Two-tailed P values were calculated using the Fisher’s Exact Test. FIG.4K: GSEA enrichment analysis was conducted using genes downregulated (downDEGs) upon the treatment of ASO in mLKs as the gene list and expression log2FC upon Tet2 knockout in mLK cells as the rank list. The gene expression levels were quantified using total RNA-seq in WT and Tet2-/- mLK cells. FIG.4L: The correlation of gene expression log2FC between Tet2-/- versus WT and ASO targeting IAP versus control ASO (NC ASO; SEQ ID NO: 90) in mLK cells. Altered genes were associated with transcription misregulation in cancer. [0106] FIGs. 5A-5K, MBD6 exhibited synergistic effects with TET2 deficiency in transcriptional regulation through the m5C-H2AK119ub axis in leukemia cells. 
FIG.5A: Proliferation curves showing that MBD6 knockdown significantly attenuated the proliferation of the TET2 mutant cell line SKM-1. MBD6 knockdown by two individual siRNAs (siMBD6-1 and siMBD6-2, respectively) was performed in three different human leukemia cell lines (THP-1, K-562, and SKM-1). Non-targeting siRNA (siNC) was used as a control. P values were determined using unpaired Student’s t test with Welch’s correction by comparing MTS signals at 96 hours of siMBD6-1 or siMBD6-2 versus siNC groups, respectively. ****P < 0.0001 and ***P = 0.0002 for THP-1; **P = 0.0044 and ***P = 0.0004 for K-562; ****P < 0.0001 and ****P < 0.0001 for SKM-1. Line plots represent mean ± standard deviation (S.D.) for six independent experiments. FIG. 5B: Bar plots summarizing proliferation of WT or TET2 KO K-562 or THP-1 cells with or without MBD6 knockdown. Cells stably expressing non-targeting shRNA (shNC) or shRNA targeting MBD6 RNA (shMBD6) were constructed via lentiviral
transduction. Cell proliferation was monitored with MTS assay at 96-hour post dilution. P values were determined using unpaired Student’s t test with Welch’s correction by comparing MTS signals at 96 hours of WT versus TET2 KO groups with MBD6 knockdown. ***P = 0.0008 for THP-1 cells and ****P < 0.0001 for K-562 cells. Line plots represent mean ± standard deviation (S.D.) for four independent experiments. FIG.5C: NOD scid gamma (NSG) mice were transplanted with K-562 (left) or THP-1 (right) cells as indicated, and the animals’ overall survival was measured and is depicted with the Kaplan-Meier estimator (WT shNC, n = 5; WT shMBD6, n = 5; TET2 KO shNC, n = 5; TET2 KO shMBD6, n = 5). **P = 0.0054 for WT, shMBD6 versus TET2 KO, shMBD6 K-562 cells and *P = 0.013 for WT, shMBD6 versus TET2 KO, shMBD6 THP-1 cells were determined by a log-rank Mantel–Cox test. FIG.5D: A negative correlation in repeat RNA abundance log2FC between TET2 KO versus WT and between MBD6 KD versus control in TET2 KO K-562 cells was observed. FIG.5E: Boxplot displaying the log2FC of repeat RNA abundance between TET2 KO versus WT, and between MBD6 KD versus control in TET2 KO K-562 cells. The repeat RNAs were categorized into two groups based on whether or not they were m5C methylated. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. FIG. 5F: The average profile of H2AK119ub around H2AK119ub peak center and flanking 2.5 kb regions in control versus MBD6 KD, and WT versus TET2 KO K-562 cells. FIG.5G: Depicts a cumulative curve of the log2FC of H2AK119ub between TET2 KO versus WT, and between MBD6 KD versus control K-562 cells at repeat loci. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. FIG. 5H: Dot plot displaying the m5C enrichment at various repeat families. The size of each dot corresponded to the number of loci that exhibited m5C methylation in WT K-562 cells. 
FIG.5I: Depicts average H2AK119ub modification levels in control versus MBD6 KD, and WT versus TET2 KO K-562 cells around LTR12C/ERV1/LTR loci. FIG.5J: Correlation of gene expression log2FC between TET2 KO versus WT, and MBD6 KD versus control in TET2 KO K-562 cells. The genes were associated with various signaling pathways, including apoptosis and cell cycle. FIG. 5K: Schematic representations showing how the RNA m5C oxidation activity of TET2 on chromatin-associated RNA regulated the chromatin state through H2AK119ub deubiquitylation and its contribution to the progression of leukemia. [0107] FIGs. 6A-6E, TET2 depletion led to global transcriptional activation in mESCs. FIG.6A: The mutation profiles of TET family proteins (TET1, TET2, and TET3) in different diseases were obtained from cbioportal.org. MDS denotes Myelodysplastic Syndromes (MDS IWG, IPSSM, NEJM Evidence 2022); MPN denotes Myeloproliferative
Neoplasms (CIMR, NEJM 2013); AML denotes Acute Myeloid Leukemia (OHSU, Nature 2018). FIG. 6B: TET2 protein is not covalently linked to a DNA-binding CXXC domain. Canonical amino acid sequences of human TET1-3 proteins and annotated domain structures were plotted (individual UniProt IDs: TET1 (Q8NFU7), TET2 (Q6N021) and TET3 (O43151)). The CXXC DNA binding domains and catalytic domains near the C-terminus were annotated. FIG.6C: Cumulative curve showing the nascent RNA transcription rate in WT and Tet2 KO mESCs. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. FIG.6D: Purities of different subcellular fractions of mESCs were validated with western blot. GAPDH was used as cytosolic (Cyt) protein marker, SNRP70 was used as nuclear soluble (Nuc) protein marker, and H3 was used as chromatin (Chr) marker. Equal aliquots of each isolated fraction were loaded onto each lane to enable direct comparison. The image shown is representative of three independent experiments. FIG. 6E: GSEA enrichment analysis was performed with genes downregulated (downDEGs) upon Pspc1 depletion in mESCs as the gene list and expression log2FC upon Tet2 KO in mESCs as the rank list. Left: gene expression was quantified with total RNA-seq (GSE103269 and GSE48518), Right: gene expression was quantified with caRNA-seq (this study). [0108] FIGs.7A-7C, TET2 depletion activated caRNA expression in mESCs. FIG.7A: Volcano plot illustrating the chromatin associated regulatory RNA (carRNA) abundance log2FC comparing Tet2 KO with WT mESCs, along with the statistical significance of their changes. carRNAs were categorized into eRNAs, paRNAs, and repeat RNAs. P values for the comparison of expression levels between two groups were obtained using the Wald test by DESeq2. FIG.7B: Boxplots displaying the log2FC of carRNA abundance comparing Tet2 KO with WT, and Pspc1 KO with WT mESCs. 
The different groups of carRNAs, including eRNAs, paRNAs, and repeat RNAs, were further categorized into two subgroups based on whether or not they were marked with RNA m5C methylation. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. FIG. 7C: Average profile and heatmap depicting the m5C levels in WT mESCs, along with the ATAC-seq signal in WT and Tet2 KO mESCs. The regions analyzed were m5C-marked peaks. [0109] FIGs. 8A-8D, TET2 and PSPC1 regulated chromatin-associated LTR RNA m5C methylation and local chromatin state. FIG.8: Average profile and heatmap illustrating the m5C levels of different families of repeat RNAs in WT mESCs, along with the corresponding ATAC-seq signals in Tet2 KO versus WT, and Pspc1 KO versus WT mESCs. The regions analyzed belong to the IAPEz-int (FIG.8A), RLTR10 (FIG.8B), L1MdA_I (FIG.8C), and L1MdA_II (FIG.8D) repeat families, respectively.
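The nonparametric two-tailed Wilcoxon-Mann-Whitney comparison cited for FIG. 7B (for example, log2FC of m5C-marked versus unmarked carRNAs) can be illustrated with the following minimal sketch. It is not the procedure used to generate the figures, which is not specified in this description: it assumes a normal approximation with tie correction and no continuity correction, and the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-tailed P value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_tailed


# Hypothetical log2FC values for m5C-marked and unmarked carRNAs.
marked = [1.2, 0.9, 1.5, 1.1, 0.8]
unmarked = [0.1, -0.2, 0.3, 0.0, 0.4]
u_stat, p_value = mann_whitney_u(marked, unmarked)
print(f"U = {u_stat}, two-tailed P ~ {p_value:.4f}")
```

For small groups such as those in the example, an exact test would generally be preferred over the normal approximation; the approximation is used here only to keep the sketch dependency-free.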
[0110] FIGs.9A-9B, NSUN2 installed m5C in caRNAs. FIG.9A: NSUN2 was the main methyltransferase for chromatin-associated RNA m5C installation. Chromatin-associated RNAs from WT mESCs transfected with non-targeting siRNA (siNC), and siRNAs targeting Nsun2, or Trdmt1 were measured by mass spectrometry. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. ***P = 0.0005 by comparing values between siNsun2 and siNC. NS denotes not significant (P > 0.05). FIG.9B: Volcano plots illustrating the gene expression level log2FC comparing Tet2 KO versus WT, Pspc1 KO versus WT, and Nsun2 KD versus control in mESCs, along with the statistical significance of their changes. The data presented were obtained using caRNA-seq. P values for the comparison of expression levels between two groups were obtained using the Wald test by DESeq2. [0111] FIGs. 10A-10E, Results of targeted blockade and demethylation of IAP RNAs. FIG. 10A: qPCR analysis of IAP RNA fragments revealed main m5C sites near the 5′ end. Metagene profiles of the IAP RNA m5C landscape in WT mESCs are shown; qPCR primers for different regions were designed, and individual amplicons are indicated. Provided is a zoom-in showing the 5′ end sequence (SEQ ID NO: 615) that contained multiple m5C sites as revealed by bisulfite sequencing (m5C sites indicated by asterisks). IAP-targeting ASO (SEQ ID NO: 89) was designed to interfere with m5C deposition near the IAP 5′ end (left). MeRIP-qPCR with primers targeting different amplicons enabled region-specific analysis of the IAP m5C level upon IAP ASO treatment in mESCs (see Methods, Table 2, for listing of qPCR primers). A non-targeting ASO (NC ASO; SEQ ID NO: 90) was used as control. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values of IAP-targeting ASO with NC ASO by individual amplicons. 
****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ***P = 0.00086, and **P = 0.0046 for amplicon 1, 2, 3, 4, and 5, respectively. FIG.10B: IAP ASO treatment slightly downregulated IAP RNA levels in mESCs. P value was determined using unpaired Student’s t test with Welch’s correction by comparing values of IAP ASO with NC ASO, **P < 0.01. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 10C: qPCR analysis of chromatin accessibility at IAP loci following ASO treatment. The Hoxd13 locus was used as a control. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values of IAP ASO with NC ASO for different loci. *P = 0.018 for IAP loci, IAP ASO versus NC ASO. NS denotes not significant (P > 0.05). FIG.10D: Presents a schematic of experiments comprising tethering of the catalytic domain of TET2 to the IAP RNA with the Tet-On system. Provided are schematics showing the design of CRISPR tethering system of the TET2 catalytic domain (TET2(CD)) to
IAP RNA (e.g., as denoted in FIG.10A). For RNA targeting, mESCs stably expressed fusion proteins dCas13b-TET2(CD) or a catalytically dead version (CD with mutations in the HxD motif) (dCas13b-TET2(CD)-HxDmut) thereof, and a guide RNA sequence downstream of a Tet operator (TetO)-controlled H1 promoter (H1-2O2). Validation of the tethering system of dCas13b-TET2(CD)-HxDmut (bottom left) or dCas13b-TET2(CD) (bottom right) to IAP. Doxycycline (DOX)-induced expression of the tethered TET2(CD) fusion protein to IAP RNAs could be observed starting at 4 hours of DOX treatment. Binding of the TET2(CD) fusion protein to IAP RNAs was measured by RIP-qPCR at different time points after DOX treatment. Relative enrichment values were obtained by calculating the recovery rate (IP divided by input) and normalizing all time points to 0 h. P values were determined using paired Student’s t test by comparing values at the indicated time points with values at “0 hour”, individually. Left: **P = 0.0049, **P = 0.0028, **P = 0.0055, *P = 0.011, *P = 0.024, **P = 0.0083, ***P = 0.0003, **P = 0.0016, and **P = 0.0087 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. Right: ***P = 0.00025, ***P < 0.00021, **P = 0.0038, *P = 0.019, ****P < 0.0001, **P = 0.0028, ***P = 0.00078, ***P = 0.00024, and ****P < 0.0001 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. Dots and error bars represent mean ± standard deviation (S.D.) for three individual experiments. FIG. 10E: Acute broad-spectrum TET enzyme inhibition resulted in a caRNA m5C hypermethylation response earlier than a gDNA 5mC hypermethylation response. 
Abundance of 5mC in genomic DNA (top) or of m5C in chromatin-associated RNA (caRNA) (bottom) was measured in WT mESCs at indicated time points (0 hour, 4 hour, 8 hour, 12 hour, 16 hour, 20 hour, 24 hour, 36 hour, 48 hour, and 72 hour) post application of a small-molecule inhibitor of all TET enzymes (TETi-C35) in the medium at the final concentration of 5 μM. The medium was refreshed every eight hours. P values were determined using unpaired Student’s t test with Welch’s correction. *, P < 0.05, **, P < 0.01, ***, P < 0.001, NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. Top: *P = 0.0406, **P = 0.0088, and **P = 0.0014 for 36h, 48h, and 72h versus 0h, respectively. Bottom: *P = 0.0196, ***P = 0.0004, **P = 0.0016, ***P = 0.0002, ***P = 0.0003, ***P = 0.0002, ***P = 0.0004, and **P = 0.0027 for 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. [0112] FIGs. 11A-11B, The increased chromatin accessibility associated with TET2 depletion in mESCs was caused by the oxidative activity of TET2 on RNA, rather than DNA. FIG. 11A: Shows the correlation between DNA methylation differences in Tet2 KO versus WT mESCs and changes in downstream gene transcription. The carRNAs were categorized into
different groups, including eRNA, paRNA, and repeat RNAs. Within each group of carRNAs, they were further divided into 50 bins based on the ranked DNA methylation differences upon Tet2 KO in mESCs. FIG. 11B: Bar graphs illustrating the overlapping ratios (Jaccard index) of the genomic binding sites of PSPC1, TET2, and CXXC5 with DNA hypermethylated regions (detected in the Tet2 KO mESCs). [0113] FIGs.12A-12C, Decreased H2AK119ub and H3K27me3 were observed at the IAP loci following TET2 inhibition. FIG.12A: Bar graphs illustrating the overlapping ratios (Jaccard index) of regions marked by different histone modifications with DNA hypermethylated regions (detected in the Tet2 KO mESCs). FIG.12B: Volcano plot depicting the log2FC of H2AK119ub in Tet2 KO versus WT mESCs at the repeat family loci, with corresponding statistical significance. P values for the comparison of expression levels between two groups were obtained using the Wald test by DESeq2. FIG.12C: H2AK119ub at the IAP loci showed faster response to TET inhibition than H3K27me3. H2AK119ub and H3K27me3 chromatin bindings at IAP (left), MERVL (middle), and LINE (right) loci were measured through the CUT&Tag procedure followed by qPCR with loci-specific primers. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values at the indicated time points with values at “0 hour”, individually. *, P < 0.05, **, P < 0.01, ***, P < 0.001, NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. 
Left: for H2AK119ub, ****P < 0.0001, ****P < 0.0001, ***P = 0.00014, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, and ****P < 0.0001 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively; for H3K27me3, **P = 0.0044, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, and ****P < 0.0001 for 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. Middle: for H2AK119ub, *P = 0.020, ***P = 0.00015, ***P = 0.00056, ****P < 0.0001, ***P = 0.00067, ***P = 0.00069, ***P = 0.00045, **P = 0.0012, and **P = 0.0011 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively; for H3K27me3, **P = 0.0056, *P = 0.010, ***P = 0.0003, **P = 0.007, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, ****P < 0.0001, and ****P < 0.0001 for 4h, 8h, 12h, 16h, 20h, 24h, 36h, 48h and 72h versus 0h, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. [0114] FIGs. 13A-13E, MBD5 and MBD6 preferentially bound m5C-modified RNA. FIG. 13A: The MBD domains of human MBD5 and MBD6 were found to have distinct characteristics within the MBD family proteins. Sequence alignment of human MBD family proteins MBD domains (MBD1 (SEQ ID NO: 620), MBD2 (SEQ ID NO: 622), MBD3 (SEQ
ID NO: 619), MBD4 (SEQ ID NO: 621), MBD5 (SEQ ID NO: 617), MBD6 (SEQ ID NO: 616), and MeCP2 (SEQ ID NO: 618)) showed differences in the MBD region. Conservation scores were calculated by Jalview and conserved residues among analyzed proteins were colored. FIG. 13B: Validation of Flag-MBD5 and Flag-MBD6 overexpressed mESCs. Averaged transcript expression was indicated on top of each column. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values in different cell lines with WT mESCs, individually. For Mbd5,
0.0001, and ***P = 0.00011 for Flag-Mbd5 and Flag-Mbd6 versus WT, respectively; for Mbd6, ****P < 0.0001 for Flag-Mbd6 versus WT. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 13C: MBD5 and MBD6 bound significantly more RNA relative to DNA in mESCs. Nucleic acids crosslinked to MBD5 or MBD6 by UV (254 nm) were end-labeled with biotin and imaged with ECL. Samples were digested by RNase A/T1 or DNase I. The image shown is representative of three independent experiments. Western blot was quantified with ImageJ, Analyze gel module. P values were determined using unpaired Student’s t test with Welch’s correction by comparing values at the indicated time points with values of the control with no RNase or DNase treatment, individually. For Flag-Mbd5, ***P = 0.00088 and ***P = 0.00077 for RNase only and RNase + DNase, respectively; for Flag-Mbd6, **P = 0.0017 and **P = 0.0015 for RNase only and RNase + DNase, respectively. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG.13D: MBD5 enriched m5C-modified RNA in mESCs. rRNA-depleted (non-Ribo) RNAs crosslinked (CLIP) to Flag-tagged MBD5 with ultraviolet light (254 nm) were purified and digested to single nucleosides prior to mass spectrometry analysis. Abundance of m5C in RNA was obtained by normalizing the m5C level by C level in the sample. P values were determined using unpaired Student’s t test with Welch’s correction. ***P = 0.0008 for comparisons between CLIP and input samples. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 13E: EMSA showing the binding preferences of the purified MBD domain of MBD6 (SEQ ID NO: 3) fused to the maltose binding protein (MBP) with C, m5C, or hm5C-containing double-stranded oligonucleotide probes. 
The MBP-MBD6(MBD) protein was incubated with 100 nM fluorescently labeled single-stranded or double-stranded probes. Free unbound probes were separated from RNA-protein complexes by electrophoresis. Individual dissociation constants (KD) were determined as described in the exemplary methods. Sequences of the oligonucleotide probes were designed according to the motif identified by CLIP in mESCs. The purity of the single-stranded probe (ss probe) and double-stranded probe (ds probe) was validated by
electrophoresis. The fluorescently labeled probes were visualized by in-gel fluorescence with FAM. [0115] FIGs. 14A-14E, MBD6 regulated the level of H2AK119ub and the half-life of m5C-modified RNAs. FIG. 14A: Venn diagram illustrating the overlap (3558 shared sites, 45%) between the RNA binding sites of MBD5 (401 MBD5 sites, 5%) and MBD6 (3917 MBD6 sites, 50%) (CLIP-seq, this study). FIG. 14B: Depicts qPCR results showing repeat RNA expression levels upon Mbd5 or Mbd6 knockdown. Transcript expression levels were obtained by normalizing target RNA levels to Gapdh. P values were determined using unpaired Student’s t test with Welch’s correction. For quantification of IAP levels, *P = 0.045 and *P = 0.010 for siMbd5 and siMbd6 versus siNC; for quantification of MERVL levels, **P = 0.0098 and ***P = 0.0008 for siMbd5 and siMbd6 versus siNC; for quantification of MusD levels, *P = 0.028 for siMbd6 versus siNC. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG. 14C: Is a representative Western blot showing that MBD6 significantly regulated H2AK119ub levels in mESCs. The relative level of H2AK119ub was obtained by normalizing the intensity of the H2AK119ub blot to the intensity of the GAPDH blot. Western blots were quantified by ImageJ analyze gel module. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.028 for siMbd6 versus siNC. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 14D: MBD6 stabilized m5C-modified chromatin-associated RNAs. rRNA-depleted chromatin-associated RNAs were purified and digested to single nucleosides prior to mass spectrometry analysis. The abundance of m5C in RNA was obtained by normalizing m5C level to C in the sample. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.020 for comparisons between siMbd6 versus siNC. 
NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG. 14E: RT-qPCR analysis of the IAP RNA stability upon Tet2 KO or Mbd6 knockdown in mESCs. P values were determined using unpaired Student’s t test with Welch’s correction by comparing WT versus Tet2 KO or WT mESCs transfected with negative control siRNA (siNC) and siRNA targeting Mbd6 (siMbd6). Tet2 knockout and WT, *P = 0.015 and *P = 0.014 for 3 hour and 6 hour; siMbd6 and siNC, **P = 0.0082 for 3 hour. NS denotes not significant (P > 0.05). Dot plots represent mean ± standard deviation (S.D.) for three independent experiments. [0116] FIGs. 15A-15D, Mbd6 knockdown suppressed TET2-loss-mediated abnormal HSPC proliferation in mouse LK cells. FIG.15A: Shows average ATAC-seq profiles at the gene level in WT and Tet2-/- mLK cells. FIG.15B: Are qPCR results validating efficiency of
shRNA-enabled Mbd6 knockdown in mLK cells. Transcript expression levels were obtained by normalizing target RNA levels to that of Actb. P values were determined using unpaired Student’s t test with Welch’s correction. ****P < 0.0001 and ****P < 0.0001 for shMbd6 versus shNC in WT and Tet2-/-, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG. 15C: Are representative photographs showing colony formation during replating assays of mLK cells in WT or Tet2-/- mLK cells upon Mbd6 knockdown or control shRNA administration (shNC). FIG. 15D: Show representative flow cytometry results characterizing HSPC frequency in suspension cultures in the presence of indicated cytokines (see Methods) (top, c-Kit and Lineage as markers) and differentiation (bottom, CD11b as marker) in WT or Tet2-/- mLK cells upon Mbd6 knockdown or control shRNA administration (shNC). [0117] FIGs.16A-16D, Targeted m5C oxidation on IAP RNA significantly suppressed the TET2-loss-mediated abnormal HSPC proliferation and differentiation of mLK cells. FIG. 16A: Flow cytometry analyses of dCas13d-TET2CD fusion protein and guide RNA transfection efficiency in mLK cells. EGFP was used to visualize guide RNA expression. mCherry was used to visualize dCas13d-TET2CD protein. Comparable transfection efficiencies were obtained for all conditions. Representative flow cytometry analyses were shown here as density plots and 10,000 gated single cells were used for each analysis. FIG. 16B: Replating assay of WT and Tet2-/- mLK cells transfected with guide RNA targeting the IAP RNA using either the RfxdCas13d fusion protein with TET2 catalytic domain (TET2CD) or the inactivated mutant (TET2HxDCD). Tet2-/- mLK cells were isolated from total BM cells of 6–8 weeks old mice prior to electroporation. Colony counts were scored every 7 days. Representative images of the 2nd replating were shown. Numbers of colonies after each replating were quantified. 
Statistical analyses were conducted by comparing number of colonies between Tet2-/- TET2CD versus Tet2-/- TET2HxDCD at different replating, respectively. P values were determined using unpaired Student’s t test with Welch’s correction. ****P < 0.0001 for 2nd replating, ****P < 0.0001 for 3rd replating, and ****P < 0.0001 for 4th replating. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG.16C: Flow cytometry analyses of HSPC frequency of WT and Tet2-/- mLK cells transfected with guide RNA targeting the IAP RNA using the RfxdCas13d fusion protein with TET2CD or TET2HxDCD. Lin-c-Kit+ mouse HSPCs (mLK cells) were isolated from total BM cells of 6–8 weeks old mice prior to transfection by electroporation. HSPC frequency was quantified by fractions of Lin-c-Kit+ cells in the pool. Representative flow cytometry analyses were shown here as density plots and 30,000 gated single cells were used for each analysis. P
values were determined using unpaired Student’s t test with Welch’s correction. **P = 0.0019 for Tet2-/- TET2HxDCD versus WT TET2HxDCD and *P = 0.021 for Tet2-/- TET2HxDCD versus Tet2-/- TET2CD, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG.16D: Flow cytometry analyses of differentiation of WT and Tet2-/- mLK cells transfected with guide RNA targeting the IAP RNA using the RfxdCas13d fusion protein with TET2CD or TET2HxDCD. Lin-c-Kit+ mouse HSPCs were isolated from total BM cells of 6–8 weeks old mice prior to transfection by electroporation. Differentiation towards monocyte/macrophages and granulocytes was quantified by fractions of CD11b+ cells in the pool. Representative flow cytometry analyses were shown here as density plots and 30,000 gated single cells were used for each analysis. P values were determined using unpaired Student’s t test with Welch’s correction. ****P < 0.0001 for Tet2-/- TET2HxDCD versus WT TET2HxDCD and *P = 0.027 for Tet2-/- TET2HxDCD versus Tet2-/- TET2CD, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. [0118] FIGs. 17A-17D, Targeted ASO blockade of IAP RNA suppressed the TET2-loss-mediated abnormal HSPC proliferation and differentiation in mLK cells. FIG.17A: Flow cytometry analyses of ASO transfection efficiency in mLK cells. ASOs labeled by APC fluorophores (SEQ ID NOs: 88-90) were visualized by the corresponding laser channel. Comparable transfection efficiencies were obtained for different ASOs. Representative flow cytometry analyses were shown here as density plots and 10,000 gated single cells were used for each analysis. FIG. 17B: Replating assay of WT and Tet2-/- mLK cells transfected with control (NC ASO; SEQ ID NO: 90), MERVL-targeting (MERVL ASO; SEQ ID NO: 88) or IAP-targeting (IAP ASO; SEQ ID NO: 89) steric antisense oligos. 
Tet2-/- mLK cells were isolated from total BM cells of 6–8 weeks old mice prior to ASO transfection by electroporation. Colony counts were scored every 7 days. Numbers of colonies after each replating were quantified. Statistical analyses were conducted by comparing number of colonies between Tet2-/- NC ASO versus Tet2-/- MERVL ASO or Tet2-/- IAP ASO, respectively. P values were determined using unpaired Student’s t test with Welch’s correction. **P = 0.0055 for 1st replating, **P = 0.0048 and ***P = 0.00048 for 2nd replating, ***P = 0.00015 and ****P < 0.0001 for 3rd replating, ****P < 0.0001 and ****P < 0.0001 for 4th replating. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG.17C: Flow cytometry analyses of the HSPC frequency of WT and Tet2-/- mLK cells transfected with control (NC ASO), MERVL-targeting (MERVL ASO) or IAP-targeting (IAP ASO) antisense oligos. mLK cells were isolated from total BM cells of 6–8 weeks old mice prior to ASO transfection by electroporation. The frequency of HSPCs was
quantified by fractions of Lin-c-Kit+ cells in the pool. Representative flow cytometry analyses were shown here as density plots and 30,000 gated single cells were used for each analysis. P values were determined using unpaired Student’s t test with Welch’s correction. **P = 0.0057 for Tet2-/- NC ASO versus WT NC ASO. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG.17D: Flow cytometry analyses of differentiation of WT and Tet2-/- mLK cells transfected with control (NC ASO), MERVL-targeting (MERVL ASO) or IAP-targeting (IAP ASO) antisense oligos. mLK cells were isolated from total BM cells of 6–8 weeks old mice prior to ASO transfection by electroporation. Differentiation towards monocyte/macrophages and granulocytes was quantified by fractions of CD11b+ cells in the pool. Representative flow cytometry analyses were shown here as density plots and 30,000 gated single cells were used for each analysis. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.019 for Tet2-/- NC ASO versus WT NC ASO. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. [0119] FIGs. 18A-18C, IAP RNA blockade suppressed gene expression profiles caused by TET2 deficiency in mLK cells. FIG.18A: Venn diagram displaying the overlap among differentially expressed genes upon Tet2 KO or the treatment with ASO targeting IAP RNA in mLK cells (total RNA-seq). Barplot depicting the odds ratio of upDEGs following Tet2 KO, in comparison to both upDEGs and downDEGs after treatment with ASO targeting IAP RNA in mLK cells. Two-tailed P values were calculated using the Fisher’s Exact Test. FIG.18B: KEGG pathway enrichment analysis of genes that were up-regulated by Tet2 KO and down-regulated by the treatment with ASO targeting IAP RNA in mLK cells. 
One-sided P values were calculated using Fisher's Exact test, and the size of the circle represents the level of significance, with larger circles indicating greater significance and smaller circles indicating lower significance. FIG. 18C: The correlation of gene expression log2FC between Tet2 KO versus WT and ASO targeting IAP versus control in mLK cells. The genes were associated with various signaling pathways, including MAPK signaling pathways (top) and C-type lectin receptor signaling pathways (bottom). [0120] FIGs. 19A-19E, MBD6 knockdown potently inhibited TET2-deficient leukemia cells. FIG. 19A: Proliferation curves showing that MBD6 knockdown attenuated leukemia cell proliferation. Knockdown of MBD6 by two individual siRNAs (siMBD6-1 and siMBD6-2) in human leukemia cells (TF-1 and OCI-AML3) was performed. Cell proliferation was monitored with MTS assay at different time points post viral transduction (24 hours, 48 hours, 72 hours and 96 hours). MTS signals at different time points were normalized
to those at 24 hours to yield relative MTS signals. Non-targeting siRNA (siNC) was used as a control. P values were determined using unpaired Student’s t test with Welch’s correction by comparing MTS signals at 96 hours of siMBD6-1 or siMBD6-2 versus siNC groups, respectively. ***P = 0.0002 and ****P < 0.0001 for TF-1. NS denotes not significant (P > 0.05). Line plots represent mean ± standard deviation (S.D.) for six independent experiments. FIG. 19B: (Left) Western blot analysis of TET2 levels in WT and TET2 KO K-562 and THP-1 cells. ACTB was used as a loading control. A representative image of three individual experiments is shown. (Right) qPCR results validating efficiency of shRNA-enabled MBD6 knockdown in K-562 or THP-1 cells. Transcript expression levels were obtained by normalizing target RNA levels to that of GAPDH. P values were determined using unpaired Student’s t test with Welch’s correction. ***P = 0.0092 and ****P < 0.0001 for shMBD6 versus shNC in WT and TET2 KO K-562 cells, respectively; ***P = 0.0018 and ***P = 0.0001 for shMBD6 versus shNC in WT and TET2 KO THP-1 cells, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. FIG. 19C: Western blots assaying the global H2AK119ub level or cleaved Poly[ADP-ribose] Polymerase 1 (PARP1) upon MBD6 knockdown. Statistical tests were performed using unpaired Student’s t test with Welch’s correction by comparing WT versus TET2 KO groups in shNC or shMBD6 samples. For cleaved PARP1 levels, *P = 0.049 for WT versus TET2 KO of shMBD6 samples. For H2AK119ub levels, *P = 0.048 and *P = 0.036 for WT versus TET2 KO of shNC and shMBD6 samples, respectively. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 19D: RT-qPCR validation of NSUN2 knockdown in K-562 and THP-1 cells. Relative transcript abundance was quantified by normalizing to that of GAPDH.
P values were determined using unpaired Student’s t test with Welch’s correction by comparing siNSUN2 with siNC. *P = 0.010 for K-562 and *P = 0.011 for THP-1. Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG.19E: Proliferation curves showing that NSUN2 knockdown attenuated TET2 knockout mediated cell growth in K-562 and THP-1 cells. Cell proliferation was monitored with MTS assay at different time points post dilution (24 hour, 48 hour, 72 hour and 96 hour). MTS signals at different time points were normalized to those at 24 hour to yield relative MTS signals. P values were determined using unpaired Student’s t test with Welch’s correction by comparing MTS signals at 96 hours of WT versus TET2 KO groups. ***P = 0.0003 for K-562 cells; **P = 0.0026 for THP-1 cells. NS denotes not significant (P > 0.05). Line plots represent mean ± standard deviation (S.D.) for four independent experiments.
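The normalization and comparison described in the legends above can be illustrated with a minimal sketch. It assumes hypothetical replicate readings (not data from this application) and uses only the Python standard library; the function names are illustrative and are not part of the disclosure. The sketch divides each time point by the 24-hour signal and then computes the Welch t statistic with the Welch-Satterthwaite degrees of freedom. Converting that statistic into a P value requires the t distribution and is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def normalize_to_baseline(signals):
    """Divide every time point by the first (24-hour) signal."""
    baseline = signals[0]
    return [s / baseline for s in signals]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Uses the sample variance of each group, so the two groups are not
    assumed to share a common variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical raw MTS readings at 24, 48, 72 and 96 hours for one replicate.
relative = normalize_to_baseline([2.0, 4.0, 6.0, 8.0])

# Hypothetical relative 96-hour signals for a knockdown and a control group.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A negative t statistic here indicates that the first group has the lower mean signal.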
[0121] FIGs. 20A-20C, MBD6 knockdown attenuated leukemia progression in mouse PDX models regardless of leukemia cell TET2 mutational status. FIG. 20A: NSG mice were intravenously injected with K-562 (FIG. 20B) or THP-1 (FIG. 20C) cells, and the animals’ overall survival was monitored and analyzed using the Kaplan-Meier estimator (WT shNC, n = 5; WT shMBD6, n = 5; TET2 KO shNC, n = 5; TET2 KO shMBD6, n = 5; see FIG. 5C). After transplantation, PB was harvested from the submandibular vein, and BM was isolated from the tibias and femurs for human CD33+ (for K-562) or CD33+/CD45+ (for THP-1) chimerism detection and quantifications. FIG. 20B: Fractions of CD33+ cells in BM or PB isolated from NSG mice xenotransplanted with tumor cells (K-562) were quantified by flow cytometry analyses. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.017 and *P = 0.047 for TET2 KO shNC versus TET2 KO shMBD6 in BM and PB, respectively. NS denotes not significant (P > 0.05). Bar plots represent mean ± standard deviation (S.D.) for three independent experiments. FIG. 20C: Fractions of CD33+/CD45+ cells in BM or PB isolated from NSG mice xenotransplanted with tumor cells (THP-1) were quantified by flow cytometry analyses. P values were determined using unpaired Student’s t test with Welch’s correction. *P = 0.017 and *P = 0.021 for WT shNC versus WT shMBD6 and TET2 KO shNC versus TET2 KO shMBD6 in BM; **P = 0.0043 and *P = 0.045 for WT shNC versus WT shMBD6 and TET2-/- shNC versus TET2-/- shMBD6 in PB, respectively. Bar plots represent mean ± standard deviation (S.D.) for three individual experiments. [0122] FIGs. 21A-21F, Chromatin-associated LTR RNAs were upregulated in TET2-deficient leukemia cells. FIG. 21A: Purities of different subcellular fractions of K-562 cells were validated with western blot. GAPDH was used as cytosolic (Cyt) protein marker, SNRP70 was used as nuclear soluble (Nuc) protein marker, and H3 was used as chromatin (Chr) marker.
Equal aliquots of each isolated fraction were loaded onto each lane to enable direct comparison. FIG. 21B: Bar plots showing the changes of caRNA abundance (left) and total RNA abundance (right) relative to spike-in ERCC RNAs in WT, TET2 KO and TET2 KO & MBD6 KD K-562 cells (n = 3). FIG. 21C: Volcano plots depicting the log2FC of carRNA abundance, comparing TET2 KO with WT K-562 cells, and comparing MBD6 KD with control in TET2 KO K-562. The statistical significance of the changes was also determined and is indicated on the y-axis. carRNAs were categorized into eRNAs, paRNAs, and repeat RNAs. P values for the comparison of expression levels between two groups were obtained using the Wald test by DESeq2. FIG. 21D: Shows the negative correlation of repeat RNA abundance log2FC when comparing TET2 KO with WT K-562 cells and when comparing MBD6 KD with
control in TET2 KO K-562 cells. FIG. 21E: Shows a scatter plot demonstrating the positive correlation between repeat RNA abundance log2FC and their target gene expression log2FC when comparing TET2 KO versus WT in K-562 cells. FIG. 21F: Boxplots display the log2FC of carRNA abundance between TET2 KO versus WT, and between MBD6 KD versus control in TET2 KO K-562 cells. The carRNAs, including eRNAs, paRNAs and repeat RNAs, were categorized into two groups based on whether or not they were m5C methylated. P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. [0123] FIGs. 22A-22E, MBD6 enhanced the expression of TET2-regulated LTRs through H2AK119ub deubiquitylation. FIG. 22A: Venn diagram illustrating the overlap of H2AK119ub peaks in control and MBD6 KD K-562 cells (4,245 unique siNC peaks; 42,041 overlapping peaks; and 30,462 unique siMBD6 peaks), as well as in WT and TET2 KO K-562 cells (28,466 unique WT peaks; 36,776 overlapping peaks; and 7,214 unique TET2 KO peaks). FIG. 22B: Displays H2AK119ub modification levels at the gene level in control and MBD6 KD K-562 cells, as well as in WT and TET2 KO K-562 cells. FIG. 22C: Displays cumulative curves of the log2FC of H2AK119ub between TET2 KO versus WT, and between MBD6 KD versus control K-562 cells at different genomic loci including enhancers (bottom) and promoters (top). P values were calculated by a nonparametric two-tailed Wilcoxon-Mann-Whitney test. FIG. 22D: Shows the correlation between log2FC of repeat RNA abundance (TET2 KO vs. WT) and log2FC of H2AK119ub at the corresponding genomic loci (TET2 KO vs. WT, top; MBD6 KD vs. control, bottom) in K-562 cells. FIG. 22E: Shows the correlation between the number of m5C methylated repeat RNAs and their respective m5C methylation levels in WT K-562 cells. Among LTR repeat RNAs, HERVH-int ranked top in terms of methylation enrichment, while LTR12C had the highest levels of methylation. [0124] FIGs.
23A-23C, MBD6 deficiency resulted in downregulation of leukemia-related genes that were generally upregulated in TET2 depletion models. FIG. 23A: Venn diagram illustrating the overlap of differentially expressed genes upon TET2 KO in control K-562 cells, and upon MBD6 KD in TET2 KO K-562 cells. Barplot depicting the odds ratio of upDEGs following TET2 KO, in comparison to both upDEGs and downDEGs upon MBD6 KD in TET2 KO K-562 cells. Two-tailed P values were calculated using the Fisher’s Exact Test. FIG. 23B: KEGG pathway enrichment analysis was performed on genes that were upregulated in TET2 KO in control K-562 cells, and downregulated in MBD6 KD in TET2 KO K-562 cells. One-sided P values were calculated using Fisher's Exact test, and the size of the circle represents the level of significance, with larger circles indicating greater significance and smaller circles indicating lower significance. FIG. 23C: Shows the correlations of gene
expression log2FC between TET2 KO versus WT, and MBD6 KD versus control in TET2 KO K-562 cells. The genes were found to be associated with various signaling pathways, including Th1 and Th2 cell differentiation, NF-kappa B signaling pathway, and C-type lectin receptor signaling pathways. [0125] FIGs. 24A-24B, TET2 inhibition and MBD6 inhibition acted synergistically to reduce cancer cell proliferative capacity. FIG. 24A: Heatmap showing the proliferation of various cancer cell lines with shRNA-based knockdown of TET2 and/or MBD6. Expression plasmids for shRNAs were delivered into cells via lentiviral transduction with hygromycin B or puromycin resistance, respectively. Cell proliferation was measured by MTS assay (Promega) at 96 hours after transduction. FIG. 24B: Heatmap showing the proliferation of various leukemic cells with shRNA-based knockdown of TET2 and/or MBD6. The results showed that knockdown of TET2 and MBD6 exerted synergistic proliferation inhibition effects, particularly in leukemic cells and glioma cells. [0126] FIG. 25, TET2 knockout reduces H3K27me3 modifications at sites co-occupied with H2AK119ub. In human leukemia cells, H3K27me3 did not show a global change upon TET2 knockout (left panel). H3K27me3 peaks were then categorized into two subgroups according to whether or not they overlapped with H2AK119ub (middle and right panels respectively). The results showed a dramatic reduction in H3K27me3 modification level upon TET2 knockout when these peaks were co-occupied with H2AK119ub (right panel). Similar changes were not observed at genomic regions exclusively marked with H3K27me3 modifications. [0127] FIG. 26, Exemplary model of TET2, NSUN2, MBD6, and PR-DUB interactions with H2A. DETAILED DESCRIPTION I. Terminology [0128] Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the measurement or quantitation method.
[0129] The use of the word “a” or “an” when used in conjunction with the term “comprising” may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” [0130] The phrase “and/or” means “and” or “or”. To illustrate, A, B, and/or C includes: A alone, B alone, C alone, a combination of A and B, a combination of A and C, a combination
of B and C, or a combination of A, B, and C. In other words, “and/or” operates as an inclusive or. It is specifically contemplated that A, B, or C may be specifically excluded from an aspect. [0131] The words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. [0132] The compositions and methods for their use can “comprise,” “consist essentially of,” or “consist of” any of the ingredients or steps disclosed throughout the specification. Compositions and methods “consisting essentially of” any of the ingredients or steps disclosed limit the scope of the claim to the specified materials or steps which do not materially affect the basic and novel characteristic of the claimed invention. [0133] The term “sequence” as used herein in reference to a polynucleotide refers to the nucleotide sequence such as “A” for adenosine, “G” for guanine, “C” for cytosine, “T” for thymine, “U” for uracil, “I” for inosine, and “N” for “A”/“C”/“U”/“T”/“G”/“I”. [0134] As used herein, the terms “individual,” “subject,” and “patient” are used interchangeably and can refer to a human or non-human. [0135] As used herein, a “protein,” “peptide,” or “polypeptide” refers to a molecule comprising at least five amino acid residues. As used herein, the term “wild-type” refers to the endogenous version of a molecule that occurs naturally in an organism. In some aspects, wild-type versions of a protein or polypeptide are employed; however, in many aspects of the disclosure, a modified protein or polypeptide is employed. The terms described above may be used interchangeably.
A “modified protein” or “modified polypeptide” or “engineered protein” or “engineered polypeptide” or a “variant” refers to a protein or polypeptide whose chemical structure, particularly its amino acid sequence, is altered with respect to the wild-type protein or polypeptide. In some aspects, a modified/variant protein or polypeptide has at least one modified activity or function (recognizing that proteins or polypeptides may have multiple activities or functions). It is specifically contemplated that a modified/variant protein or polypeptide may be altered with respect to one activity or function yet retain a wild-type activity or function in other respects, such as catalytic activity, RNA-binding activity, etc. [0136] Where a protein is specifically mentioned herein, it is in general a reference to a native (wild-type) or recombinant (modified) protein or, optionally, a protein in which any signal sequence has been removed. The protein may be isolated directly from the organism of which it is native, produced by recombinant DNA/exogenous expression methods, or produced
by solid phase peptide synthesis (SPPS) or other in vitro methods. In particular aspects, there are isolated nucleic acid segments and recombinant vectors incorporating nucleic acid sequences that encode a polypeptide (e.g., an enzymatic domain, such as a deaminase domain, or a fragment thereof). The term “recombinant” may be used in conjunction with a polypeptide or the name of a specific polypeptide, and this generally refers to a polypeptide produced from a nucleic acid molecule that has been manipulated in vitro or that is a replication product of such a molecule. [0137] In certain aspects the size of a protein or polypeptide (wild-type or modified) may comprise, but is not limited to, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 275, 300, 325, 350, 375, 400, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1400, 1600, 1800, or 2000 amino acid residues or nucleic acid residues or greater, and any range derivable therein, or derivative of a corresponding amino sequence described or referenced herein. It is contemplated that polypeptides may be mutated by truncation, rendering them shorter than their corresponding wild-type form, also, they might be altered by fusing or conjugating a heterologous protein or polypeptide sequence with a particular function (e.g., for targeting, localization, linking, etc.). 
[0138] In certain aspects, nucleic acid sequences can exist in a variety of instances such as: isolated segments and recombinant vectors of incorporated sequences or recombinant polynucleotides encoding an enzyme, or a fragment, derivative, or variant thereof, polynucleotides sufficient for targeting a complex to specific loci, polynucleotides sufficient to mediate caRNA methylation (e.g., m5C) level modifications, polynucleotides sufficient for use as hybridization probes, PCR primers or sequencing primers for identifying, analyzing, mutating or amplifying a polynucleotide encoding a polypeptide, anti-sense nucleic acids for inhibiting expression of a polynucleotide, ancillary components of CRISPR/Cas systems, functional oligonucleotides, donor constructs, rescue constructs, and complementary sequences of the foregoing described herein. Nucleic acids encoding fusion proteins that include the proteins/polypeptides described herein are also contemplated. The nucleic acids can be single-stranded or double-stranded and can comprise RNA and/or DNA nucleotides and artificial variants thereof (e.g., peptide nucleic acids, etc.).
[0139] The term “polynucleotide” refers to a nucleic acid molecule that either is recombinant or has been isolated from total genomic nucleic acid. Included within the term “polynucleotide” are oligonucleotides (e.g., nucleic acids typically 200 residues or less, or 100 residues or less in length), recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like. Polynucleotides include, in certain aspects, regulatory sequences, isolated substantially away from their naturally occurring genes or protein encoding sequences. Polynucleotides may be single-stranded (coding or antisense) or double-stranded, and may be RNA, DNA (genomic, cDNA or synthetic), analogs thereof, or a combination thereof. Additional coding or non-coding sequences may, but need not, be present within a polynucleotide. [0140] In certain respects, the term “gene,” “polynucleotide,” or “nucleic acid” is used to refer to a nucleic acid that encodes a protein, polypeptide, or peptide (including any sequences required for proper transcription, post-translational modification, or localization), or a functional RNA species, such as but not limited to, CRISPR/Cas system ancillary components, linking elements, targeting RNAs, etc. As will be understood by those in the art, this term encompasses genomic sequences, expression cassettes, cDNA sequences, and smaller engineered nucleic acid segments that express, or may be adapted to express, proteins, polypeptides, domains, peptides, fusion proteins, mutants, and functional RNA species. A nucleic acid encoding all or part of a polypeptide and/or functional RNA species may contain a contiguous nucleic acid sequence encoding all or a portion of such a polypeptide and/or functional RNA species.
It also is contemplated that a particular polypeptide and/or functional RNA species may be encoded by nucleic acids containing variations having slightly different nucleic acid sequences but, nonetheless, encode the same or substantially similar protein and/or RNA species. [0141] In certain aspects, there are polynucleotide variants having substantial identity to the sequences disclosed herein; those comprising at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or higher sequence identity, including all values and ranges there between, compared to a polynucleotide sequence provided herein using the methods known in the art and/or described herein (e.g., BLAST analysis using standard parameters). In certain aspects, the isolated polynucleotide will comprise a nucleotide sequence encoding a polypeptide and/or functional RNA species that has at least 90%, 95%, or above, identity to an amino acid sequence and/or RNA sequence described herein, over the entire length of the sequence; or a nucleotide sequence complementary to said isolated polynucleotide.
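As an illustration of the sequence identity thresholds recited above, the following minimal sketch computes percent identity for two sequences that have already been aligned to equal length. It is an assumption-laden simplification and not the BLAST analysis referenced in the text: the function name, the use of "-" as the gap character, and the choice to count every alignment column except double-gap columns in the denominator are all illustrative choices, and other identity conventions exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ("-") are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned sequences differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
meets_90_percent = identity >= 90.0
```

Under this convention, a variant is compared against a recited threshold (for example, at least 90%) simply by testing the returned value.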
[0142] Nucleic acid segments, regardless of the length of the coding sequence itself, may be combined with other nucleic acid sequences, such as promoters, polyadenylation signals, enhancers, destabilization sites, restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. The nucleic acids can be any length. They can be, for example, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 125, 175, 200, 250, 300, 350, 400, 450, 500, 750, 1000, 1500, 3000, 5000 or more nucleotides in length, and/or can comprise one or more additional sequences, for example, regulatory sequences, and/or be a part of a larger nucleic acid, for example, a vector. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant nucleic acid protocol. In some cases, a nucleic acid sequence may encode a polypeptide sequence with additional heterologous coding sequences, for example to allow for purification of the polypeptide, transport, secretion, post-translational modification, or for therapeutic benefits such as targeting or efficacy. As discussed above, a tag or other heterologous polypeptide may be added to the modified polypeptide-encoding sequence, wherein “heterologous” refers to a polypeptide that is not the same as the modified polypeptide. [0143] The term “near or at”, particularly when used in the context of chromosomal distance, as used herein generally refers to a distance of less than 4.5-5.5 kb (e.g., less than 5.5 kb, less than 5 kb, less than 4.5 kb, less than 4 kb, less than 3 kb, less than 2 kb, or less than 1 kb, or any range derivable therein) between two locations.
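The “near or at” definition above amounts to a distance comparison, which can be sketched as follows. The function name, the default threshold of 5,000 bp (one of the exemplary values listed), and the assumption that both positions are base-pair coordinates on the same chromosome are illustrative choices rather than requirements of the disclosure.

```python
def is_near_or_at(pos_a, pos_b, max_distance_bp=5000):
    """Return True when two same-chromosome positions are closer than the threshold.

    The comparison is strict ("less than"), mirroring the wording of the
    definition; the threshold is configurable (e.g., 1000 to 5500 bp).
    """
    return abs(pos_a - pos_b) < max_distance_bp


# Hypothetical coordinates: 4,200 bp apart, so "near" at the 5 kb threshold.
near_default = is_near_or_at(10_000, 14_200)
# The same pair is not "near" under a stricter 1 kb threshold.
near_strict = is_near_or_at(10_000, 14_200, max_distance_bp=1000)
```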
[0144] A "diseased cell" as used herein generally refers to a cell associated with a disease or disorder comprising one or more genetic mutations relative to a healthy cell in the same individual. The one or more genetic mutations (including but not limited to, large or small truncations, point mutations, indels, complex fusions, etc.,) result in an abnormal phenotype relative to cells that do not comprise the one or more mutations (e.g. "healthy cells"). Abnormal phenotypes may include reduced or increased fitness, such as phenotypes associated with cancerous cells. In certain aspects, a diseased cell comprises one or more mutations in one or more oncogenes and/or tumor suppressor genes. In some aspects, a diseased cell may comprise one or more mutations in one or more genes encoding ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), ASXL Transcriptional Regulator 2 (ASXL2), ASXL Transcription Regulator 3 (ASXL3), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), Serine and Arginine
Rich Splicing Factor 2 (SRSF2), a component of a canonical and/or non-canonical Polycomb Repressive Complex (PRC) (e.g., E3 Ubiquitin Ligase RING1A/B, Polycomb Group Ring Finger 1 (PCGF1), Polycomb Group Ring Finger 2 (PCGF2), Polycomb Group Ring Finger 3 (PCGF3), Polycomb Group Ring Finger 4 (PCGF4), Polycomb Group Ring Finger 5 (PCGF5), and/or Polycomb Group Ring Finger 6 (PCGF6)), Polycomb Repressive-Deubiquitinase (PR-DUB) complex associated components O-linked N-acetylglucosamine Transferase (OGT), Lysine Demethylase 1B (KDM1B), Forkhead Box K1 (FOXK1), Forkhead Box K2 (FOXK2), BRCA1 Associated Protein 1 (BAP1), and/or Host Cell Factor C1 (HCFC1). [0145] The term “proteolysis targeting chimera” or “PROTAC” as used herein refers to a chimeric molecule comprising two protein binding molecules: one capable of engaging an E3 ubiquitin ligase and another that binds to a target protein meant for degradation. Recruitment of the E3 ligase to the target protein results in ubiquitination and subsequent degradation of the target protein via the proteasome. In some aspects, a PROTAC disclosed herein may target an MBD5 protein, an MBD6 protein, a TET2 protein, an NSUN1 protein, an NSUN2 protein, or any combination thereof. In some aspects, a PROTAC disclosed herein may target an MBD6 protein. [0146] It is specifically contemplated that any limitation discussed with respect to one aspect of the invention may apply to any other aspect of the invention. Furthermore, any composition of the invention may be used in any method of the invention, and any method of the invention may be used to produce or to utilize any composition of the invention. Aspects of an aspect set forth in the Examples are also aspects that may be implemented in the context of aspects discussed elsewhere in a different Example or elsewhere in the application, such as in the Summary, Detailed Description, Claims, and Brief Description of the Drawings.
[0147] A variety of aspects are discussed throughout this application. Any aspect discussed with respect to one aspect applies to other aspects as well and vice versa. Each aspect described herein is understood to be aspects that are applicable to all aspects. It is contemplated that any aspect discussed herein can be implemented with respect to any method or composition, and vice versa. Furthermore, compositions and kits can be used to achieve methods disclosed herein. [0148] Any method in the context of a therapeutic, diagnostic, or physiologic purpose or effect may also be described in “use” claim language such as “Use of” any compound, composition, or agent discussed herein for achieving or implementing a described therapeutic, diagnostic, or physiologic purpose or effect.
II. Overview [0149] Chromatin-associated regulatory RNAs can offer a platform for dynamic chromatin regulation. Recent studies have shown that the N6-methyladenosine (m6A) modification on carRNAs can play critical roles in global and local chromatin state in mESCs and during mouse early development, with the methyltransferase METTL3 serving as a writer (e.g., installer), with binding proteins such as YTHDC1 serving as readers, and with enzymes such as FTO serving as erasers to reversibly and dynamically guide transcriptional regulation. As described herein, the inventors have discovered that other RNA modifications may be present on caRNAs, such as carRNAs, and that these modifications can also dynamically and reversibly affect chromatin regulation. [0150] Ten-eleven translocation enzyme 2 (TET2) is known to mediate DNA 5mC oxidation and has been established as a tumor suppressor, particularly in blood malignancies. However, TET2 mutations are known to result in global DNA hypomethylation, instead of hypermethylation. This apparent inconsistency, where loss of a DNA methylation marker oxidizer (TET2, a DNA demethylase) results in DNA hypomethylation, is a puzzle that was heretofore not explained. As described herein, the inventors have discovered that TET2 mediates RNA 5-methylcytosine (m5C) oxidation, particularly on caRNAs, such as carRNAs (and in particular, in the LTR family RNAs), to regulate chromatin state and transcription. Further, as described herein, the inventors have discovered that methyl-CpG binding domain protein 6 (MBD6) preferentially recognizes RNA m5C over DNA 5mC, and particularly in repeat RNA and/or regulatory RNA sequences. MBD6 can then recruit BRCA1 associated protein-1 (BAP1) associated complexes to mediate H2AK119ub deubiquitylation for increased gene activation/open chromatin state. Relatedly, m5C oxidation by TET2 can act on these caRNAs to antagonize gene activation through the m5C-MBD6-BAP1 deubiquitylation axis.
Loss of TET2 can lead to caRNA m5C hypermethylation, more open chromatin, and/or widespread DNA hypomethylation that can result in activation of genes critical to disease states (e.g., critical to leukemogenesis), helping to explain accelerated malignancies associated with TET2 inactivation (e.g., accelerated myeloid malignancy, see FIG. 5K). [0151] The results provided herein also reveal a potential bimodal function for TET2: TET2 can bind CXXC4/5 to mediate DNA 5mC oxidation at enhancers, but, when recruited by PSPC1, TET2 can mediate RNA m5C oxidation (such as, but not limited to, on repeat RNA, caRNA, and/or carRNA). This RNA m5C oxidation activity by TET2 can dictate global chromatin regulation in cells, such as but not limited to mESCs, HPSCs, and leukemia cells. As described herein, the inventors have discovered a new NSUN2-TET2-MBD6-BAP1 axis in
chromatin and transcription regulation that functions through caRNA m5C (e.g., repeat carRNA m5C). Practically, the results provided herein reveal targets for therapeutic intervention in disease states associated with mutations in TET2 and/or TET2 pathways (e.g., that function upstream and/or downstream of TET2 on similar and/or the same substrates, for example but not limited to, NSUN1, NSUN2, MBD5, MBD6, ASXL1, ASXL2, ASXL3, IDH1, IDH2, etc.). [0152] 5-methylcytosine (m5C) is one of the most prevalent modifications of RNA, playing important roles in RNA metabolism, nuclear export, and translation. However, the potential role of RNA m5C methylation in chromatin regulation has hitherto remained elusive. Herein, the inventors have elucidated an epigenetic regulatory pathway that can comprise, but is not limited to, chromatin associated RNA (caRNA), Ten-eleven translocation enzyme 2 (TET2), NOP2/Sun RNA methyltransferase 2 (NSUN2), methyl-CpG binding domain protein 6 (MBD6), and the Polycomb repressive complexes (e.g., comprising PRC1, PRC2, and/or PR-DUB), which components can interact to influence chromatin state through histone modifications (see e.g., FIG. 26), and as such, influence the transcriptome and proteome of cells. As provided herein, the inventors show that this epigenetic regulatory pathway can be perturbed in certain diseases (for example but not limited to, cancers, in particular, leukemias), and show how modifications and/or manipulations of this epigenetic regulatory pathway can treat and/or mitigate certain diseases and symptoms/phenotypes associated therewith. [0153] As shown herein, in some aspects, TET2 mediates chromatin-associated retrotransposon RNA m5C oxidation to suppress deubiquitylation of histone H2AK119ub and gene expression in at least mESCs, HSCs, and leukemic cells. [0154] Mutation of TET2 has been described as driving myeloid malignancy initiation and progression.
TET2 deficiency has also been described as leading to a globally opened chromatin state (e.g., euchromatin) and aberrant activation of genes, contributing to aberrant cell self-renewal (e.g., aberrant hematopoietic stem cell (HSC) self-renewal). However, the open chromatin consistently observed in TET2-deficient mouse embryonic stem cells, leukemic cells, and hematopoietic progenitor and stem cells (HPSCs) is inconsistent with TET2’s described role in DNA 5mC oxidation. As described herein, the inventors have shown that chromatin-associated RNA (caRNA, e.g., chromatin-associated retrotransposon RNA) 5-methylcytosine (m5C) can be recognized by the methyl-CpG-binding domain protein MBD6, which can guide deubiquitylation of nearby histone H2AK119ub to promote an open chromatin state. TET2 can oxidize m5C and antagonize this caRNA m5C-associated, MBD6-dependent H2AK119ub deubiquitylation. Thereby, TET2 depletion can lead to globally decreased H2AK119ub, more open chromatin, and increased/aberrant transcription in cells, such as stem
cells. As such, in certain aspects, methods of treatment of diseases associated with diseased cells with TET2 mutations, such as inactivating mutations, can comprise inhibition of MBD6. In certain cases, TET2 mutant diseases (e.g., TET2 mutant human leukemias) can become dependent on this gene activation pathway. In certain cases where TET2 is not mutated in human leukemias, inhibition of TET2 and MBD6 can result in a synergistically lethal effect upon diseased cells (e.g., cancer cells). In certain aspects, inhibiting MBD6 protein activity (including silencing expression of MBD6) can selectively inhibit proliferation of TET2 mutant cells. As shown herein, in some aspects, inhibiting MBD6 protein activity (including silencing expression of MBD6) can selectively inhibit proliferation of TET2 mutant leukemic cells in vitro and in vivo. Together, findings provided herein at least reveal novel chromatin regulation pathways that comprise TET2 interaction with caRNA m5C marks, and oxidation thereof (and lack thereof in mutant TET2 pathway conditions). Furthermore, as provided herein, the downstream reader protein influencing chromatin regulation has been identified as MBD5 and/or MBD6. In some aspects, MBD5 and/or MBD6 are targeted for inhibition and therapeutic intervention against mutant TET2 and/or mutant TET2 pathway associated malignancies.
[0155] In certain aspects, technologies provided herein can comprise a catalytic domain fused to a targeting element (e.g., dCas13d-TET2(CD); dCas13d-MBD6(MBD domain); dCas13d-NSUN2, etc.). In certain aspects, such fusion proteins can be utilized to provide site-directed modifications to RNA, such as but not limited to, caRNA.
[0156] In some aspects, technologies provided herein (e.g., compositions, methods, etc.) function broadly in mammalian cells, such as human cells. 
In some aspects, technologies provided herein can facilitate greater than or equal to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any range derivable therein, RNA m5C installation at user-specified or non-specified RNA loci, in a population of RNA molecules. In some aspects, technologies provided herein can facilitate greater than or equal to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%,
64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any range derivable therein, RNA m5C removal at user-specified or non-specified RNA loci in a population of RNA molecules.
[0157] In some aspects, technologies provided herein can facilitate site-specific RNA m5C modification (e.g., installation, reading, or removal) activity while maintaining low off-target RNA m5C modification activity. In some aspects, technologies provided herein can facilitate site-specific RNA m5C modification with low levels of sequence motif bias.
[0158] In some aspects, the current disclosure provides technologies including at least polynucleotides, proteins, polypeptides, and/or vectors, and further provides methods and/or compositions comprising any one or more of the aforementioned components. In certain aspects, polynucleotides may encode sequences comprising RNA m5C modifying catalytic domains (including isolated domains, polypeptides, and/or proteins), engineered small nucleolar RNA (snoRNA), CRISPR/Cas proteins/polypeptides, and/or ancillary RNA components (e.g., CRISPR RNA (crRNA), trans-activating CRISPR RNA (tracrRNA), single guide RNA (sgRNA), etc.), antisense oligonucleotides (ASO), etc. In certain aspects, one or more of the polynucleotides and/or polypeptides can be engineered.
[0159] In certain aspects, polynucleotides, proteins, polypeptides, and/or peptide sequences for wild type or mutant versions of various genes, such as site-specific target genes, have been previously disclosed, and may be found in the recognized computerized databases. In certain aspects, polynucleotides, proteins, polypeptides, and/or peptide sequences for wild type versions of various effector proteins and/or RNA molecules, have been previously disclosed, and may be found in the recognized computerized databases. 
Two commonly used databases are the National Center for Biotechnology Information’s GenBank and GenPept databases (on the World Wide Web at ncbi.nlm.nih.gov/) and The Universal Protein Resource (UniProt; on the World Wide Web at uniprot.org). The coding regions for these genes may be amplified and/or expressed using the techniques disclosed herein or as would be known to those of ordinary skill in the art.
[0160] It is contemplated that in compositions of the disclosure, there can be between about 0.001 mg and about 10 mg of total polypeptide, peptide, and/or protein per ml. The concentration of protein in a composition can be about, at least about or at most about 0.001, 0.010, 0.050, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0 mg/ml or more (or any range derivable therein).
[0161] In certain aspects, provided herein are methods, compositions, polynucleotides, polypeptides, and/or vectors comprising engineered RNA m5C modifying (e.g., installing, reading, or erasing) catalytic domains (e.g., including complete enzymes and/or components of enzymes), engineered Cas proteins, and/or guide oligonucleotides.
[0162] The oligonucleotides, polypeptides, proteins, or polynucleotides encoding such polypeptides or proteins of the disclosure may include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 (or any derivable range therein) or more variant amino acids or nucleic acid substitutions or be at least 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% (or any derivable range therein) similar, identical, or homologous to at least, exactly, or at most 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 
204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750,
775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1266, 1400, 1600, 1800, or 2000 or more contiguous amino acids or nucleic acids, or any range derivable therein, of SEQ ID NOs: 1-625. In specific aspects, the nucleic acid encoding the peptide or polypeptide is codon optimized for expression in a mammal. In certain aspects, the peptide or polypeptide is not naturally occurring and/or is in a combination of peptides or polypeptides.
[0163] The polypeptides of the disclosure may include at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 
318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, or 422 substitutions (or any range derivable therein). In some aspects, the substitution is with an alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, or valine. [0164] In some aspects, the polypeptide comprises one or more substitutions at one or more amino acid positions in any one of the proteins or polynucleotides encoding the same identified in SEQ ID NOs: 1-625, wherein each substitution is independently chosen from an amino acid
selected from alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, or valine; and wherein the polypeptide or polynucleotide encoding the same has or has at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% (or any derivable range therein) sequence identity to one of SEQ ID NOs: 1-625.
[0165] In some aspects, the protein or polypeptide, or polynucleotide encoding the same, may comprise amino acids or nucleotides 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 
291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530,
531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 
931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139,
1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275, 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288, 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 
1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522, 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1559, 1560, 1561, 1562, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1570, 1571, 1572, 1573, 1574, 1575, 1576, 1577, 1578, 1579, 1580, 1581, 1582, 1583, 1584, 1585, 1586, 1587, 1588, 1589, 1590, 1591, 1592, 1593, 1594, 1595, 1596, 1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611, 1612, 1613, 1614, 1615, 1616, 1617, 1618, 1619, 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649,
1650, 1651, 1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684, 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1711, 1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 1730, 1731, 1732, 1733, 1734, 1735, 1736, 1737, 1738, 1739, 1740, 1741, 1742, 1743, 1744, 1745, 1746, 1747, 1748, 1749, 1750, 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781, 1782, 1783, 1784, 1785, 1786, 1787, 1788, 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797, 1798, 1799, 1800, 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047, 2048, 2049, 2050, 2051, 2052, 2053, 2054, 2055, 2056, 2057, 2058, 2059, or 2060 of SEQ ID NOs: 1-625, or any other amino acid or nucleotide noted therein, and have or have at least, at most, or exactly 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% (or any derivable range therein) sequence identity to one of SEQ ID NOs: 1-625.
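By way of non-limiting illustration only, the percent sequence identity recited above can be sketched as follows. The function name percent_identity and the two toy sequences are hypothetical and are not taken from SEQ ID NOs: 1-625. The sketch assumes two pre-aligned, ungapped sequences of equal length; the disclosure does not specify an alignment method, and in practice an alignment tool would typically be applied first.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions carrying identical residues.

    Assumption: both sequences are already aligned, ungapped,
    and of equal length (no alignment is performed here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Count positions where the residues match, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-residue reference and a variant with one substitution
# (I -> L at position 6); neither is a sequence from the disclosure.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
identity = percent_identity(reference, variant)
```

Under these assumptions, a variant can be compared against a recited threshold (e.g., at least 75%) by testing whether the returned value meets or exceeds that threshold.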
[0166] In some aspects, the protein, polypeptide, or nucleic acid may comprise, comprise at least, or comprise at most 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 
398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1266, 1400, 1600, 1800, or 2000, or more (or any derivable range therein) contiguous amino acids or nucleic acids of SEQ ID NOs: 1-625. [0167] In some aspects, the polypeptide, protein, or nucleic acid may comprise at least, at most, or exactly 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174,
175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1266, 1400, 1600, 1800, or 2000, or more (or any derivable range therein) contiguous amino acids or nucleic acids of SEQ ID NOs: 1-625 that are at least, at most, or exactly 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% (or any derivable range therein) similar, identical, or homologous to one of SEQ ID NOs: 1-625. 
[0168] In some aspects there is a nucleic acid molecule or polypeptide starting at position 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273,
274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, or 422, or any other nucleotide or amino acid of any of SEQ ID NOs: 1-625 and comprising at least, at most, or exactly 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 
250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1266, 1400, 1600, 1800, or 2000, or greater than 2000 (or any derivable range therein) contiguous amino acids or nucleic acids of any of SEQ ID NOs: 1-625.
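For illustration only, the fragment language of paragraph [0168] (a start position plus a number of contiguous residues) can be read as a simple slice over a sequence. The sketch below assumes 1-based positions; the function name `contiguous_fragment` and the short example string are hypothetical and are not part of the disclosure.

```python
def contiguous_fragment(sequence: str, start: int, length: int) -> str:
    """Return `length` contiguous residues of `sequence`, beginning at the
    1-based position `start` (assumed convention, for illustration)."""
    if start < 1 or length < 1:
        raise ValueError("start and length must be positive integers")
    end = start - 1 + length
    if end > len(sequence):
        raise ValueError("fragment extends past the end of the sequence")
    return sequence[start - 1:end]


# First 20 residues of SEQ ID NO: 1 as listed in this document.
example = "MNGGNESSGADRAGGPVATS"

# A fragment starting at position 3 and comprising exactly 5 contiguous residues.
fragment = contiguous_fragment(example, start=3, length=5)
print(fragment)
```

The same slice applies to nucleotide sequences, since the function only treats the input as a string of residues.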
III. TET2 pathway, mutations and diseases associated therewith
[0169] Aspects of the present disclosure are directed to compositions and methods for treatment of diseases associated with abnormal m5C RNA, such as abnormal m5C RNA feature levels described herein.
[0170] Somatic loss-of-function mutations in the ten-eleven translocation 2 (TET2) gene occur in a significant proportion of patients with myeloid malignancies. Genetic studies of patients with myeloid malignancies have identified recurrent somatic alterations in the majority of patients with myeloproliferative neoplasms (MPNs), myelodysplastic syndromes (MDSs), and acute myeloid leukemia (AML). In some aspects, provided herein are methods of treating myeloproliferative neoplasms (MPNs), myelodysplastic syndromes (MDSs), and/or acute myeloid leukemia (AML). Several studies have identified recurrent mutations of known and putative epigenetic modifiers in patients with myeloid malignancies. These include somatic mutations in chromatin-modifying enzymes and in DNA methyltransferases. In addition, biologic studies of recurrent chromosomal translocations, including MLL fusions, have shown that leukemogenic fusion proteins alter epigenetic regulation in hematopoietic cells, resulting in changes in chromatin state at specific loci. In certain aspects, provided herein are methods of treating MLL. Taken together, these data suggest that somatic alterations in genes that regulate the epigenetic state of hematopoietic cells are a common pathogenetic event in leukemogenesis. In certain aspects, provided herein are methods of treating diseases associated with somatic alterations in genes that regulate the epigenetic state of hematopoietic cells, such as but not limited to hematopoietic stem cells. 
[0171] Somatic deletions and loss-of-function mutations in the ten-eleven translocation 2 (TET2) gene were identified in 10%–20% of patients with MDS and MPN; subsequent studies identified recurrent TET2 mutations in patients with chronic myelomonocytic leukemia (CMML) and AML, and demonstrated that TET2 mutations were associated with adverse outcome in intermediate-risk AML. Within the TET family of proteins, TET1, TET2, and TET3 have been shown to modify DNA by hydroxylating 5-methylcytosine (5mC), and the TET2 mutant proteins observed in these patients with myeloid malignancies have been shown to be deficient in this enzymatic function. Mutations that impact 5-hydroxymethylation represent a mechanism of transformation in myeloid malignancies (see, e.g., Kelly Moran-Crusio et al., Tet2 Loss Leads to Increased Hematopoietic Stem Cell Self-Renewal and Myeloid Transformation. Cancer Cell 2011; which is incorporated herein by reference in its entirety for the purposes described herein).
[0172] Additionally, inactivating mutations in Tet methylcytosine dioxygenase 2 (TET2) have been detected in peripheral blood cells of a remarkable 5%-10% of adults greater than 65 years of age. The TET2 inactivating mutations have been reported to impart a hematopoietic stem cell advantage and resultant clonal hematopoiesis of indeterminate potential (CHIP) with skewed myelomonocytic differentiation. CHIP has been reported to be associated with an overall increased risk of transformation to a hematological malignancy, especially myeloproliferative and myelodysplastic neoplasms (MPN, MDS) and acute myeloid leukemia (AML), of approximately 0.5% to 1% per year. However, it is becoming increasingly possible to identify individuals at greatest risk, based on CHIP mutational characteristics. CHIP, and particularly TET2-mutant CHIP, has also been reported to be a significant risk factor for cardiovascular diseases, related in part to hyper-inflammatory progeny macrophages that carry TET2 inactivating mutations. Therefore, somatic TET2 mutations, and mutations in TET2 pathways, can contribute to myeloid expansion and innate immune dysregulation with age and can contribute to prevalent diseases in the developed world (e.g., cancer and/or cardiovascular disease).
[0173] MBD5 is a member of the methyl-CpG-binding domain (MBD) family. The MBD domain consists of approximately 70 residues (e.g., ~65 to ~75 residues) and is the minimal region required for a methyl-CpG-binding protein to bind specifically to methylated DNA. In addition to the MBD domain, MBD5 protein contains a PWWP domain (Pro-Trp-Trp-Pro motif), which consists of 100-150 amino acids and is found in numerous proteins that are involved in cell division, growth and differentiation. Mutations in this gene have been reported to result in an autosomal dominant type of cognitive disability. 
The MBD5 protein has been reported to interact with the polycomb repressive complex PR-DUB which catalyzes the deubiquitination of a lysine residue of histone 2A. Haploinsufficiency of the MBD5 encoding gene has been reported to be associated with a syndrome involving microcephaly, intellectual disabilities, severe speech impairment, and seizures. In some aspects, MBD5 functions as an m5C RNA reader protein and/or reader complex component (e.g., as a guide/reader for the PR-DUB complex). Exemplary human wildtype MBD5 protein isoforms and encoding genes are provided as SEQ ID NOs: 5-8. Additional exemplary human wildtype MBD5 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 55777. In some aspects, methods disclosed herein comprise, consist essentially of, or consist of inhibition of MBD5. In some aspects, methods disclosed herein comprise administering one or more inhibitors of MBD5. The one or more inhibitors of MBD5 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding MBD5 (e.g., a short hairpin RNA and/or small interfering RNA). In some aspects, the one or more inhibitors of MBD5 comprise a proteolysis targeting chimera (PROTAC) targeting MBD5.
[0174] MBD6 is a member of the methyl-CpG-binding domain (MBD) family. The human MBD6 gene and encoded protein are poorly characterized relative to other members of the MBD family. MBD6 has been reported to enable chromatin binding activity, and has been found to be located in the chromocenter, the fibrillar center, and the nucleoplasm. Mutations in MBD6 have been reported to be implicated in autism spectrum disorder. In some aspects, MBD6 functions as an m5C RNA reader protein and/or reader complex component (e.g., as a guide/reader for the PR-DUB complex). Exemplary human wildtype MBD6 protein isoforms, or polypeptides derived therefrom, and encoding genes are provided as SEQ ID NOs: 1-4. Additional exemplary human wildtype MBD6 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 114785. In some aspects, methods disclosed herein comprise, consist essentially of, or consist of inhibition of MBD6. In some aspects, methods disclosed herein comprise administering one or more inhibitors of MBD6. The one or more inhibitors of MBD6 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding MBD6 (e.g., a short hairpin RNA and/or small interfering RNA). In some aspects, the one or more inhibitors of MBD6 comprise a proteolysis targeting chimera (PROTAC) targeting MBD6.
[0175] NSUN1 (NOP2 nucleolar protein) has been reported to enable RNA binding activity, and has been reported to be involved in positive regulation of cell population proliferation; regulation of signal transduction by p53 class mediator; and ribosomal large subunit assembly. NSUN1 has been reported to localize to the nucleolus. 
In some aspects, NSUN1 functions as an m5C RNA writer protein and/or writer complex component. Exemplary human wildtype NSUN1 protein isoforms and encoding genes are provided as SEQ ID NOs: 9-10. Additional exemplary human wildtype NSUN1 protein isoforms and encoding genes can be found in the NCBI database under Gene ID: 4839. In some aspects, methods disclosed herein comprise, consist essentially of, or consist of inhibition of NSUN1. In some aspects, methods disclosed herein comprise administering one or more inhibitors of NSUN1. The one or more inhibitors of NSUN1 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding NSUN1 (e.g., a short hairpin RNA and/or small interfering RNA). In some aspects, the one or more inhibitors of NSUN1 comprise a proteolysis targeting chimera (PROTAC) targeting NSUN1.
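As a hedged illustration of the "at least partially complementary" polynucleotide language used for the inhibitors above, the sketch below computes a reverse complement and the fraction of positions at which an equal-length oligonucleotide pairs with a target site. It assumes simple Watson-Crick pairing and a DNA-alphabet readout; the function names are hypothetical, and the code is not a design method described in this disclosure.

```python
# Translation table for Watson-Crick complements; U is treated like T so that
# RNA input is accepted, and output is reported in the DNA alphabet (assumption).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_complementary(oligo: str, target_site: str) -> float:
    """Fraction of positions at which `oligo` pairs with `target_site`.

    Both strings must have the same length; the oligo is compared against the
    perfect reverse complement of the target site, position by position.
    """
    if not oligo or len(oligo) != len(target_site):
        raise ValueError("oligo and target site must be non-empty and equal in length")
    perfect = reverse_complement(target_site).upper()
    matches = sum(1 for a, b in zip(oligo.upper(), perfect) if a == b)
    return matches / len(oligo)


# First 21 nucleotides of SEQ ID NO: 2 as listed in this document.
target = "ATGAATGGGGGCAATGAGAGC"
antisense = reverse_complement(target)
print(antisense, fraction_complementary(antisense, target))
```

A value below 1.0 corresponds to an oligonucleotide that is only partially complementary to the chosen site.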
[0176] NSUN2 (NOP2/Sun RNA methyltransferase 2) is a methyltransferase that has been reported to catalyze the methylation of cytosine to 5-methylcytosine (m5C) in tRNAs, such as at position 34 of intron-containing tRNA(Leu)(CAA) precursors. The m5C modification in tRNA(Leu)(CAA) is necessary to stabilize the anticodon-codon pairing and correctly translate target mRNA. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. In some aspects, NSUN2 functions as an m5C RNA writer protein and/or writer complex component. Exemplary human wildtype NSUN2 protein isoforms and encoding genes are provided as SEQ ID NOs: 11-14. Additional exemplary human wildtype NSUN2 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 54888. In some aspects, methods disclosed herein comprise, consist essentially of, or consist of inhibition of NSUN2. In some aspects, methods disclosed herein comprise administering one or more inhibitors of NSUN2. The one or more inhibitors of NSUN2 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding NSUN2 (e.g., a short hairpin RNA and/or small interfering RNA). In some aspects, the one or more inhibitors of NSUN2 comprise a proteolysis targeting chimera (PROTAC) targeting NSUN2.
[0177] TET2 is a methylcytosine dioxygenase that catalyzes the conversion of methylcytosine to 5-hydroxymethylcytosine. The TET2 protein has been reported to be involved in myelopoiesis, and defects in this gene have been reported to be associated with several myeloproliferative disorders. Two variants encoding different isoforms have been found for this gene. In some aspects, TET2 functions as an m5C RNA eraser protein and/or eraser complex component. Exemplary human wildtype TET2 protein isoforms, and polypeptides derived therefrom, and encoding genes are provided as SEQ ID NOs: 15-20. 
Additional exemplary human wildtype TET2 protein isoforms and encoding genes, and information related thereto, can be found in the NCBI database under Gene ID: 54790. In some aspects, methods disclosed herein comprise, consist essentially of, or consist of inhibition of TET2. In some aspects, methods disclosed herein comprise administering one or more inhibitors of TET2. The one or more inhibitors of TET2 may, in some aspects, comprise a polynucleotide at least partially complementary to a gene encoding TET2 (e.g., a short hairpin RNA and/or small interfering RNA). In some aspects, the one or more inhibitors of TET2 comprise a proteolysis targeting chimera (PROTAC) targeting TET2.
[0178] In some aspects, provided herein are technologies for site-specific and/or non-site-specific RNA m5C modification activity (e.g., m5C installation, reading/binding, and/or erasing). In some aspects, technologies provided herein can be utilized to increase levels of
m5C installation at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1. In some aspects, technologies provided herein can be utilized to decrease levels of m5C installation at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1. In some aspects, technologies provided herein can be utilized to increase levels of m5C recognition (e.g., reading) at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1. In some aspects, technologies provided herein can be utilized to decrease levels of m5C recognition (e.g., reading) at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1. 
In some aspects, technologies provided herein can be utilized to increase levels of m5C erasing (e.g., oxidation) at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1. In some aspects, technologies provided herein can be utilized to decrease levels of m5C erasing (e.g., oxidation) at one or more RNAs (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein), such as but not limited to RNAs identified in Table 1.
[0179] In certain aspects, the size of a protein or polypeptide (wild-type or modified; including m5C writers, readers, and/or erasers, and/or other proteins/polypeptides described herein) may comprise, but is not limited to, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200, 3400, 3600, 3800, or 4000, or more amino acid residues or nucleic acid residues or greater, and any range derivable therein, or derivative of a corresponding amino sequence. It is contemplated that polypeptides may be mutated by truncation, rendering them shorter than their corresponding wild-type form; also or alternatively, they might be altered by fusing or conjugating a heterologous protein or polypeptide sequence with a particular function (e.g., for targeting or localization, for enhanced immunogenicity, for purification purposes, etc.).
[0180] In some aspects, a protein comprises, consists essentially of, or consists of an amino acid sequence or is encoded by a polynucleotide sequence, with about, exactly, or at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to any one of SEQ ID NOs: 1-32. 
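The percent identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with `-` marking gaps) and defines identity as matching positions divided by the number of alignment columns, which is only one of several conventions in use; the disclosure does not specify an alignment tool or denominator here, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Gap characters ('-') never count as matches. The denominator is the full
    alignment length (an assumed convention for this illustration).
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# First 10 residues of SEQ ID NO: 1 (MBD6) and SEQ ID NO: 5 (MBD5) as listed
# in this document, compared without gaps.
mbd6_start = "MNGGNESSGA"
mbd5_start = "MNGGKECDGG"
print(percent_identity(mbd6_start, mbd5_start))
```

In practice the alignment itself would come from a dedicated alignment program; this sketch only covers the final counting step.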
[0181] In some aspects, an inhibitor of one or more proteins (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) described herein comprises, consists essentially of, or consists of a polynucleotide sequence with about, exactly, or at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to a contiguous stretch of at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides of any one of SEQ ID NOs: 1-32. SEQ ID NO: 1 – Wild type human MBD6 amino acid sequence MNGGNESSGADRAGGPVATSVPIGWQRCVREGAVLYISPSGTELSSLEQTRSYLLSDGTCKC GLECPLNVPKVFNFDPLAPVTPGGAGVGPASEEDMTKLCNHRRKAVAMATLYRSMETTCSHS SPGEGASPQMFHTVSPGPPSARPPCRVPPTTPLNGGPGSLPPEPPSVSQAFPTLAGPGGLFP PRLADPVPSGGSSSPRFLPRGNAPSPAPPPPPAISLNAPSYNWGAALRSSLVPSDLGSPPAP HASSSPPSDPPLFHCSDALTPPPLPPSNNLPAHPGPASQPPVSSATMHLPLVLGPLGGAPTV EGPGAPPFLASSLLSAAAKAQHPPLPPPSTLQGRRPRAQAPSASHSSSLRPSQRRPRRPPTV FRLLEGRGPQTPRRSRPRAPAPVPQPFSLPEPSQPILPSVLSLLGLPTPGPSHSDGSFNLLG SDAHLPPPPTLSSGSPPQPRHPIQPSLPGTTSGSLSSVPGAPAPPAASKAPVVPSPVLQSPS EGLGMGAGPACPLPPLAGGEAFPFPSPEQGLALSGAGFPGMLGALPLPLSLGQPPPSPLLNH SLFGVLTGGGGQPPPEPLLPPPGGPGPPLAPGEPEGPSLLVASLLPPPPSDLLPPPSAPPSN LLASFLPLLALGPTAGDGEGSAEGAGGPSGEPFSGLGDLSPLLFPPLSAPPTLIALNSALLA ATLDPPSGTPPQPCVLSAPQPGPPTSSVTTATTDPGASSLGKAPSNSGRPPQLLSPLLGASL
LGDLSSLTSSPGALPSLLQPPGPLLSGQLGLQLLPGGGAPPPLSEASSPLACLLQSLQIPPE QPEAPCLPPESPASALEPEPARPPLSALAPPHGSPDPPVPELLTGRGSGKRGRRGGGGLRGI NGEARPARGRKPGSRREPGRLALKWGTRGGFNGQMERSPRRTHHWQHNGELAEGGAEPKDPP PPGPHSEDLKVPPGVVRKSRRGRRRKYNPTRNSNSSRQDITLEPSPTARAAVPLPPRARPGR PAKNKRRKLAP (SEQ ID NO: 1) SEQ ID NO: 2 – Wild type human MBD6 polynucleotide coding sequence ATGAATGGGGGCAATGAGAGCAGTGGAGCAGACAGAGCTGGGGGCCCTGTGGCCACATCTGT CCCCATCGGCTGGCAGCGCTGTGTGCGAGAGGGTGCTGTGCTCTACATCAGTCCAAGTGGCA CAGAGCTGTCTTCCTTGGAGCAAACCCGGAGCTACCTCCTCAGCGATGGGACCTGCAAGTGC GGTCTGGAGTGTCCACTTAATGTCCCCAAGGTTTTCAACTTTGACCCTTTGGCCCCGGTGAC CCCGGGTGGGGCTGGGGTGGGGCCAGCATCAGAGGAGGACATGACCAAGCTGTGCAACCACC GGCGGAAAGCTGTTGCTATGGCAACTCTGTACCGCAGCATGGAGACCACCTGCTCACACTCT TCTCCTGGAGAGGGAGCGAGCCCCCAAATGTTCCACACTGTGTCCCCAGGGCCCCCCTCTGC CCGCCCTCCCTGTCGAGTTCCTCCTACAACTCCACTTAATGGGGGTCCTGGCTCCCTTCCCC CAGAACCACCCTCAGTTTCCCAGGCCTTTCCCACTCTAGCAGGCCCTGGGGGGCTTTTCCCC CCAAGGCTTGCTGACCCAGTCCCTTCTGGGGGCAGTAGCAGCCCCCGTTTCCTCCCAAGGGG CAATGCCCCCTCTCCAGCCCCACCTCCTCCACCTGCTATCAGCCTCAATGCTCCCTCATACA ACTGGGGAGCTGCCCTCAGATCCAGCCTGGTGCCCTCTGACCTGGGCTCTCCTCCGGCCCCT CATGCCTCCTCCTCACCACCTTCAGACCCTCCTCTCTTCCACTGTAGTGATGCCTTAACACC CCCTCCCCTGCCCCCGAGCAATAATCTCCCCGCCCACCCTGGTCCTGCCTCTCAGCCACCAG TGTCTTCAGCCACTATGCACCTGCCCCTGGTCCTGGGGCCCCTGGGAGGGGCCCCCACGGTG GAGGGGCCTGGGGCACCCCCCTTCCTTGCTAGCAGCCTACTCTCTGCAGCGGCCAAGGCACA GCATCCCCCACTACCCCCTCCCAGCACTTTACAGGGCCGAAGGCCCCGTGCCCAGGCACCCT CAGCTTCCCACTCCTCATCACTTCGTCCCTCTCAGCGTCGTCCCCGCAGACCCCCTACTGTA TTTCGATTGCTAGAAGGGAGAGGCCCTCAAACCCCTAGACGGAGCCGTCCTCGGGCCCCTGC TCCTGTCCCCCAACCCTTTTCTCTCCCGGAGCCATCCCAACCAATTCTCCCTTCTGTGCTGT CCCTGCTGGGACTCCCCACCCCTGGCCCTTCCCACTCTGATGGAAGCTTTAACCTTTTGGGG TCAGATGCACACCTTCCTCCTCCCCCAACCCTCTCCTCAGGGAGCCCTCCCCAGCCCAGGCA CCCCATCCAGCCCTCCCTGCCTGGGACCACCAGTGGCAGCCTCAGCAGTGTGCCAGGTGCCC CTGCCCCACCAGCTGCCTCCAAAGCCCCAGTAGTCCCCAGCCCTGTGCTTCAAAGCCCATCC GAAGGACTGGGGATGGGGGCAGGCCCGGCCTGCCCTCTGCCTCCCCTGGCTGGTGGAGAGGC TTTCCCTTTCCCCAGCCCTGAGCAGGGCCTGGCACTGAGTGGAGCTGGCTTCCCTGGGATGC 
TTGGGGCCTTGCCTCTCCCTCTGAGTCTGGGGCAGCCTCCACCTTCTCCATTGCTCAACCAC AGTTTATTTGGTGTGCTGACTGGGGGAGGAGGACAACCTCCCCCTGAGCCCCTGCTACCCCC ACCAGGAGGACCTGGTCCTCCCCTAGCCCCAGGAGAGCCTGAAGGGCCTTCGCTTTTGGTGG CTTCCTTGCTTCCTCCACCACCCTCAGACCTTCTTCCACCTCCTTCAGCACCTCCCAGCAAC CTCCTTGCCTCTTTCCTGCCCCTGTTGGCTCTGGGCCCCACAGCTGGGGATGGGGAGGGATC TGCAGAGGGAGCCGGGGGTCCAAGTGGGGAGCCATTTTCAGGCTTGGGAGACCTGTCCCCCC TACTTTTCCCCCCACTTTCAGCCCCCCCTACCCTCATAGCTTTAAATTCTGCGCTGCTGGCT GCCACCCTGGATCCCCCCTCGGGGACACCCCCCCAGCCCTGTGTCCTGAGTGCCCCCCAACC TGGACCACCTACCTCCAGTGTCACCACGGCAACTACTGACCCGGGGGCCTCCTCTCTGGGCA AGGCCCCCTCCAACTCAGGGAGACCCCCCCAACTCCTTAGCCCTCTGCTGGGTGCCAGCCTG CTGGGTGACCTGTCTTCACTGACCAGCAGCCCTGGAGCCCTCCCCAGCCTGTTGCAGCCTCC TGGCCCTCTTCTCTCTGGCCAGTTGGGGCTGCAGCTCCTCCCTGGGGGGGGAGCTCCTCCAC CCCTCTCAGAGGCTTCTAGTCCCCTAGCCTGCCTGCTACAGAGTCTCCAGATCCCTCCAGAG CAGCCAGAAGCCCCCTGTCTACCCCCCGAGAGCCCTGCCTCAGCCCTCGAACCAGAGCCTGC CAGGCCTCCCCTCAGTGCCTTAGCCCCACCCCATGGTTCTCCCGACCCCCCAGTCCCTGAGC TGCTCACTGGGAGGGGGTCAGGGAAACGGGGCCGGAGGGGAGGAGGGGGACTTAGGGGCATT AATGGTGAGGCCAGGCCAGCCCGGGGCCGAAAGCCTGGCAGCCGGCGGGAGCCTGGCCGACT GGCCCTCAAATGGGGGACACGTGGTGGCTTCAATGGACAAATGGAAAGGTCCCCAAGAAGAA
CCCACCATTGGCAGCATAATGGGGAGCTGGCTGAAGGGGGTGCTGAGCCCAAGGATCCACCC CCTCCCGGGCCCCATTCTGAGGACCTTAAGGTGCCCCCGGGAGTAGTCAGAAAGTCTCGTCG TGGCCGTAGGAGAAAATACAACCCTACCCGGAACAGCAATAGCTCCCGCCAGGACATTACCT TGGAACCCAGCCCTACAGCCCGAGCAGCTGTCCCTCTGCCTCCCCGGGCCCGCCCTGGCCGT CCTGCCAAAAACAAGAGGAGGAAACTGGCCCCATAG (SEQ ID NO: 2) SEQ ID NO: 3 – Wild type human MBD6 MBD catalytic domain amino acid sequence DRAGGPVATSVPIGWQRCVREGAVLYISPSGTELSSLEQTRSYLLSDGTCKCGLECPLNVPK VFNFDPLAP (SEQ ID NO: 3) SEQ ID NO: 4 – Wild type human MBD6 MBD catalytic domain polynucleotide coding sequence GCAGACAGAGCTGGGGGCCCTGTGGCCACATCTGTCCCCATCGGCTGGCAGCGCTGTGTGCG AGAGGGTGCTGTGCTCTACATCAGTCCAAGTGGCACAGAGCTGTCTTCCTTGGAGCAAACCC GGAGCTACCTCCTCAGCGATGGGACCTGCAAGTGCGGTCTGGAGTGTCCACTTAATGTCCCC AAGGTTTTCAACTTTGACCCTTTGGCCCCG (SEQ ID NO: 4) SEQ ID NO: 5 – Wild type human MBD5 isoform 1 amino acid sequence MNGGKECDGGDKEGGLPAIQVPVGWQRRVDQNGVLYVSPSGSLLSCLEQVKTYLLTDGTCKC GLECPLILPKVFNFDPGAAVKQRTAEDVKADEDVTKLCIHKRKIIAVATLHKSMEAPHPSLV LTSPGGGTNATPVVPSRAATPRSVRNKSHEGITNSVMPECKNPFKLMIGSSNAMGRLYVQEL PGSQQQELHPVYPRQRLGSSEHGQKSPFRGSHGGLPSPASSGSQIYGDGSISPRTDPLGSPD VFTRSNPGFHGAPNSSPIHLNRTPLSPPSVMLHGSPVQSSCAMAGRTNIPLSPTLTTKSPVM KKPMCNFSTNMEIPRAMFHHKPPQGPPPPPPPSCALQKKPLTSEKDPLGILDPIPSKPVNQN PVIINPTSFHSNVHSQVPMMNVSMPPAVVPLPSNLPLPTVKPGHMNHGSHVQRVQHSASTSL SPSPVTSPVHMMGTGIGRIEASPQRSRSSSTSSDHGNFMMPPVGPQATSSGIKVPPRSPRST IGSPRPSMPSSPSTKSDGHHQYKDIPNPLIAGISNVLNTPSSAAFPTASAGSSSVKSQPGLL GMPLNQILNQHNAASFPASSLLSAAAKAQLANQNKLAGNNSSSSSNSGAVAGSGNTEGHSTL NTMFPPTANMLLPTGEGQSGRAALRDKLMSQQKDALRKRKQPPTTVLSLLRQSQMDSSAVPK PGPDLLRKQGQGSFPISSMSQLLQSMSCQSSHLSSNSTPGCGASNTALPCSANQLHFTDPSM NSSVLQNIPLRGEAVHCHNANTNFVHSNSPVPNHHLAGLINQIQASGNCGMLSQSGMALGNS LHPNPPQSRISTSSTPVIPNSIVSSYNQTSSEAGGSGPSSSIAIAGTNHPAITKTTSVLQDG VIVTTAAGNPLQSQLPIGSDFPFVGQEHALHFPSNSTSNNHLPHPLNPSLLSSLPISLPVNQ QHLLNQNLLNILQPSAGEGKSEINLHPLGFLNPNVNAALAFLSSDMDGQVLQPVHFQLLAAL LQNQAQAAAMLPLPSFNLTISDLLQQQNTPLPSLTQMTAPPDHLPSNQSDNSRAETLLTSPL GNPLPSFAGSDTTFNPLFLPAVNGASGLMTLNPQLLGGVLNSASANTANHPEVSIATSSQAT 
TTTTTTSSAVAALTVSTLGGTAVVSMAETLLNISNNAGNTPGPAKLNSNSVVPQLLNPLLGT GLLGDMSSINNTLSNHQLTHLQSLLNNNQMFPPNQQQQQLLQGYQNLQAFQGQSTIPCPANN NPMACLFQNFQVRMQEDAALLNKRISTQPGLTALPENPNTTLPPFQDTPCELQPRIDPSLGQ QVKDGLVVGGPGDASVDAIYKAVVDAASKGMQVVITTAVNSTTQISPIPALSAMSAFTASIG DPLNLSSAVSAVIHGRNMGGVDHDGRLRNSRGARLPKNLDHGKNVNEGDGFEYFKSASCHTS KKQWDGEQSPRGERNRWKYEEFLDHPGHIHSSPCHERPNNVSTLPFLPGEQHPILLPPRNCP GDKILEENFRYNNYKRTMMSFKERLENTVERCAHINGNRPRQSRGFGELLSTAKQDLVLEEQ SPSSSNSLENSLVKDYIHYNGDFNAKSVNGCVPSPSDAKSISSEDDLRNPDSPSSNELIHYR PRTFNVGDLVWGQIKGLTSWPGKLVREDDVHNSCQQSPEEGKVEPEKLKTLTEGLEAYSRVR KRNRKSGKLNNHLEAAIHEAMSELDKMSGTVHQIPQGDRQMRPPKPKRRKISR (SEQ ID NO: 5)
SEQ ID NO: 6 – Wild type human MBD5 isoform 1 polynucleotide coding sequence ATGAATGGAGGCAAAGAGTGTGACGGAGGGGACAAGGAAGGAGGTCTTCCAGCTATACAAGT TCCTGTGGGTTGGCAGCGTCGTGTGGATCAAAATGGAGTGCTTTATGTCAGTCCCAGTGGGT CTTTGTTATCTTGCTTGGAGCAGGTTAAAACATACCTGCTTACTGATGGAACATGCAAGTGT GGCTTGGAATGTCCTCTTATTCTTCCCAAGGTATTTAATTTTGATCCTGGAGCTGCTGTGAA ACAGAGAACCGCAGAAGATGTTAAGGCAGATGAAGATGTCACAAAGCTATGCATACATAAAA GAAAAATTATTGCAGTGGCCACACTTCATAAAAGCATGGAAGCCCCACATCCTTCTCTGGTG CTCACCAGTCCCGGAGGAGGAACAAATGCAACTCCAGTAGTACCTTCTCGGGCAGCAACTCC AAGATCAGTAAGAAATAAGTCTCATGAAGGAATTACAAATTCTGTAATGCCTGAATGTAAGA ATCCTTTCAAGTTAATGATTGGATCATCAAATGCCATGGGAAGGCTATATGTACAAGAACTG CCTGGAAGCCAACAACAAGAACTCCACCCTGTCTACCCCCGACAGAGATTGGGCAGCAGTGA ACATGGACAGAAATCTCCATTCCGTGGCAGCCATGGAGGCCTGCCCAGCCCAGCGTCATCAG GTTCCCAGATATATGGAGATGGTTCAATCTCTCCAAGGACTGACCCACTTGGAAGTCCTGAT GTTTTCACAAGAAGTAATCCTGGTTTTCATGGAGCTCCCAATTCTAGTCCTATTCACCTGAA TAGGACTCCTCTTTCTCCACCTTCAGTAATGCTACATGGTTCTCCTGTACAGTCATCCTGTG CAATGGCTGGAAGGACTAATATACCTCTTTCCCCAACCTTGACTACAAAGAGTCCAGTAATG AAAAAACCAATGTGTAATTTTTCAACTAATATGGAAATACCACGAGCAATGTTCCACCACAA ACCACCCCAAGGCCCACCTCCCCCTCCTCCACCTTCTTGTGCTCTTCAGAAAAAGCCATTAA CATCTGAGAAAGATCCACTTGGCATTCTTGACCCTATTCCTAGTAAACCAGTGAATCAGAAC CCTGTTATCATTAATCCAACCAGTTTCCATTCAAATGTCCACTCTCAGGTACCTATGATGAA TGTAAGCATGCCTCCTGCTGTTGTTCCTTTGCCAAGTAATCTCCCATTGCCAACTGTAAAAC CTGGTCACATGAATCATGGGAGTCATGTACAAAGAGTTCAGCATTCAGCTTCAACCTCCCTG TCCCCTTCTCCAGTGACATCCCCCGTGCACATGATGGGGACTGGAATTGGAAGGATTGAGGC ATCGCCCCAAAGATCACGCTCATCTTCCACATCATCAGATCATGGAAATTTCATGATGCCAC CTGTAGGACCCCAGGCCACTTCTAGTGGTATTAAGGTTCCACCCAGGTCACCAAGGTCAACA ATAGGGTCCCCAAGGCCATCAATGCCATCAAGCCCTTCTACCAAGTCCGATGGACATCATCA GTACAAGGATATCCCTAACCCATTAATTGCTGGAATAAGTAATGTACTAAATACCCCAAGCA GTGCAGCTTTTCCTACTGCATCTGCCGGAAGTAGTTCTGTAAAGAGTCAGCCTGGTTTGCTG GGAATGCCTTTAAATCAGATCTTGAACCAGCACAATGCTGCCTCCTTTCCAGCAAGTAGTTT ACTCTCAGCAGCAGCCAAAGCACAGCTAGCAAATCAAAACAAACTTGCTGGTAACAACAGTA GCAGCAGTAGCAATTCTGGAGCTGTTGCCGGCAGTGGCAACACTGAAGGACATAGCACTTTA 
AACACCATGTTCCCTCCTACTGCCAACATGCTTCTCCCAACAGGTGAAGGGCAAAGTGGTCG AGCAGCACTAAGAGATAAGCTGATGTCTCAGCAAAAAGACGCATTGCGGAAAAGAAAACAAC CACCTACGACAGTGTTGAGTTTGCTCAGACAGTCTCAAATGGATAGTTCTGCAGTTCCTAAA CCTGGACCTGACTTGCTAAGGAAGCAGGGTCAGGGTTCATTTCCCATCAGTTCAATGTCTCA GTTACTACAGTCTATGAGTTGTCAAAGCTCTCACTTGAGTAGCAATAGTACCCCGGGTTGTG GGGCCTCAAATACTGCTTTGCCTTGCTCTGCTAACCAGCTGCATTTTACAGATCCCAGTATG AACTCTAGTGTTCTTCAGAACATACCTTTAAGAGGGGAAGCCGTGCACTGCCACAATGCAAA CACTAACTTTGTTCACAGTAACAGTCCAGTCCCCAACCACCATCTTGCAGGTTTAATAAATC AGATTCAGGCTAGCGGGAACTGTGGGATGCTCAGTCAGTCGGGCATGGCTTTAGGAAATTCC TTACATCCCAATCCACCTCAGTCAAGAATTTCAACGTCCTCCACTCCAGTGATACCAAACAG CATTGTTAGCAGCTATAATCAAACAAGTTCTGAAGCAGGCGGTTCAGGACCATCATCCTCCA TAGCCATAGCGGGCACCAACCACCCTGCCATCACAAAGACAACATCTGTTCTTCAAGATGGC GTCATAGTCACCACTGCAGCTGGAAACCCACTGCAGAGTCAGCTACCCATTGGGAGTGATTT TCCTTTTGTTGGCCAGGAGCACGCACTTCATTTTCCATCCAACAGCACTTCAAACAACCATC TTCCACACCCCTTGAACCCCAGCCTCCTCAGTTCTCTACCTATCTCTTTGCCAGTGAATCAA CAGCATCTCCTAAACCAGAATCTATTAAATATCCTCCAGCCTTCAGCAGGAGAAGGCAAGTC TGAGATCAACCTCCACCCTTTAGGTTTTCTCAACCCGAATGTAAACGCTGCTTTAGCTTTTC TCTCCAGTGACATGGATGGGCAGGTATTGCAGCCTGTTCACTTTCAGCTCTTAGCAGCCTTG CTTCAGAACCAAGCCCAAGCAGCTGCCATGCTTCCCCTGCCATCTTTCAATCTGACCATCTC AGATCTTTTGCAACAGCAAAATACCCCTTTACCCTCATTAACACAGATGACAGCCCCACCAG
ACCATTTGCCAAGCAATCAGTCAGACAACAGCCGAGCTGAGACCCTTTTAACCAGCCCCCTG GGGAACCCTTTACCAAGCTTTGCAGGCAGTGACACTACTTTTAACCCCCTGTTCCTCCCAGC TGTCAATGGGGCCTCAGGATTAATGACCTTGAATCCCCAGCTGTTGGGAGGTGTCCTGAACT CGGCATCGGCCAACACCGCTAATCATCCAGAGGTTTCCATAGCAACCTCCTCCCAGGCAACC ACTACCACAACCACTACATCATCAGCAGTGGCAGCACTGACTGTCTCAACACTTGGTGGGAC AGCAGTGGTGTCAATGGCAGAAACATTGCTGAATATATCTAATAATGCTGGGAATACACCTG GTCCAGCTAAACTCAACAGTAACTCTGTGGTGCCACAGCTACTTAACCCTCTACTGGGGACA GGTCTACTTGGTGATATGTCATCAATAAACAATACTTTGAGTAACCATCAACTGACTCATCT ACAGTCGCTGTTAAACAACAATCAGATGTTTCCTCCAAATCAGCAACAGCAGCAACTTCTCC AGGGGTACCAGAATCTCCAGGCGTTCCAAGGACAGTCCACAATTCCTTGCCCAGCTAACAAT AACCCCATGGCTTGTCTGTTTCAGAACTTTCAGGTGAGAATGCAGGAAGATGCAGCTCTCCT AAACAAAAGAATAAGCACTCAGCCTGGGCTCACAGCACTTCCTGAGAATCCAAACACTACAC TTCCACCTTTTCAAGATACACCTTGTGAGTTGCAACCGAGGATTGACCCATCTCTTGGTCAA CAGGTGAAGGATGGCCTCGTTGTGGGTGGCCCAGGTGATGCTTCCGTAGATGCCATTTACAA AGCAGTTGTCGATGCAGCCAGCAAAGGAATGCAGGTTGTCATCACCACTGCAGTCAACAGTA CAACTCAGATCAGCCCCATTCCAGCTCTGAGTGCCATGAGTGCCTTCACTGCCTCAATTGGT GACCCATTAAATCTCTCCAGTGCTGTCAGTGCGGTCATTCATGGACGGAACATGGGAGGTGT TGATCATGATGGTAGGCTGAGGAATTCAAGAGGGGCTCGGCTGCCCAAGAATCTAGACCATG GGAAAAATGTGAACGAAGGAGATGGGTTTGAATATTTCAAGTCAGCAAGTTGCCACACATCC AAAAAACAGTGGGACGGGGAGCAAAGCCCCAGAGGGGAGCGAAACAGGTGGAAGTACGAGGA ATTTTTAGATCATCCAGGCCATATCCACAGTAGTCCTTGTCATGAAAGGCCCAACAATGTCT CTACACTGCCATTTCTGCCTGGGGAACAGCACCCAATACTGTTACCACCAAGAAACTGTCCA GGGGATAAAATTCTAGAGGAAAATTTCAGGTATAATAACTACAAAAGAACTATGATGAGTTT TAAGGAGAGACTAGAGAACACTGTGGAAAGATGTGCACACATAAATGGGAATAGACCTCGAC AGAGTCGGGGATTTGGAGAGCTGCTAAGCACTGCAAAGCAAGACCTGGTCCTAGAGGAGCAG TCTCCAAGTTCCTCAAATAGTTTGGAAAATTCTCTGGTCAAAGACTACATCCATTACAATGG AGACTTTAATGCCAAAAGCGTTAATGGGTGTGTGCCTAGCCCTTCAGATGCTAAAAGCATTA GTAGTGAAGATGACCTAAGGAACCCAGACTCCCCCTCTTCAAATGAATTGATACATTATAGA CCAAGGACGTTCAATGTTGGCGACTTGGTCTGGGGCCAAATCAAAGGACTGACTTCCTGGCC TGGAAAATTAGTAAGAGAAGACGACGTTCACAATTCATGTCAACAAAGCCCCGAGGAAGGGA AGGTGGAGCCCGAGAAGTTGAAGACACTAACAGAAGGTTTGGAAGCCTACAGCCGTGTCCGG 
AAAAGGAACAGAAAAAGTGGAAAGCTAAATAACCATTTAGAAGCTGCTATTCATGAGGCCAT GAGTGAACTGGACAAAATGTCTGGGACTGTACACCAAATCCCACAGGGTGACAGACAAATGA GACCCCCCAAACCCAAGAGGAGGAAGATCTCCAGATAA (SEQ ID NO: 6) SEQ ID NO: 7 – Wild type human MBD5 isoform 2 amino acid sequence MNGGKECDGGDKEGGLPAIQVPVGWQRRVDQNGVLYVSPSGSLLSCLEQVKTYLLTDGTCKC GLECPLILPKVFNFDPGAAVKQRTAEDVKADEDVTKLCIHKRKIIAVATLHKSMEAPHPSLV LTSPGGGTNATPVVPSRAATPRSVRNKSHEGITNSVMPECKNPFKLMIGSSNAMGRLYVQEL PGSQQQELHPVYPRQRLGSSEHGQKSPFRGSHGGLPSPASSGSQIYGDGSISPRTDPLGSPD VFTRSNPGFHGAPNSSPIHLNRTPLSPPSVMLHGSPVQSSCAMAGRTNIPLSPTLTTKSPVM KKPMCNFSTNMEIPRAMFHHKPPQGPPPPPPPSCALQKKPLTSEKDPLGILDPIPSKPVNQN PVIINPTSFHSNVHSQVPMMNVSMPPAVVPLPSNLPLPTVKPGHMNHGSHVQRVQHSASTSL SPSPVTSPVHMMGTGIGRIEASPQRSRSSSTSSDHGNFMMPPVGPQATSSGIKVPPRSPRST IGSPRPSMPSSPSTKSDGHHQYKDIPNPLIAGISNVLNTPSSAAFPTASAGSSSVKSQPGLL GMPLNQILNQHNAASFPASSLLSAAAKAQLANQNKLAGNNSSSSSNSGAVAGSGNTEGHSTL NTMFPPTANMLLPTGEGQSGRAALRDKLMSQQKDALRKRKQPPTTVLSLLRQSQMDSSAVPK PGPDLLRKQGQGSFPISSMSQLLQSMSCQSSHLSSNSTPGCGASNTALPCSANQLHFTDPSM NSSVLQNIPLRGEAVHCHNANTNFVHSNSPVPNHHLAGLINQIQASGNCGMLSQSGMALGNS LHPNPPQSRISTSSTPVIPNSIVSSYNQTSSEAGGSGPSSSIAIAGTNHPAITKTTSVLQDG VIVTTAAGNPLQSQLPIGSDFPFVGQEHALHFPSNSTSNNHLPHPLNPSLLSSLPISLPVNQ
QHLLNQNLLNILQPSAGEGDMSSINNTLSNHQLTHLQSLLNNNQMFPPNQQQQQLLQGYQNL QAFQGQSTIPCPANNNPMACLFQNFQVRMQEDAALLNKRISTQPGLTALPENPNTTLPPFQD TPCELQPRIDPSLGQQVKDGLVVGGPGDASVDAIYKAVVDAASKGMQVVITTAVNSTTQISP IPALSAMSAFTASIGDPLNLSSAVSAVIHGRNMGGVDHDGRLRNSRGARLPKNLDHGKNVNE GDGFEYFKSASCHTSKKQWDGEQSPRGERNRWKYEEFLDHPGHIHSSPCHERPNNVSTLPFL PGEQHPILLPPRNCPGDKILEENFRYNNYKRTMMSFKERLENTVERCAHINGNRPRQSRGFG ELLSTAKQDLVLEEQSPSSSNSLENSLVKDYIHYNGDFNAKSVNGCVPSPSDAKSISSEDDL RNPDSPSSNELIHYRPRTFNVGDLVWGQIKGLTSWPGKLVREDDVHNSCQQSPEEGKVEPEK LKTLTEGLEAYSRVRKRNRKSGKLNNHLEAAIHEAMSELDKMSGTVHQIPQGDRQMRPPKPK RRKISR (SEQ ID NO: 7) SEQ ID NO: 8 – Wild type human MBD5 isoform 2 polynucleotide coding sequence ATGAATGGAGGCAAAGAGTGTGACGGAGGGGACAAGGAAGGAGGTCTTCCAGCTATACAAGT TCCTGTGGGTTGGCAGCGTCGTGTGGATCAAAATGGAGTGCTTTATGTCAGTCCCAGTGGGT CTTTGTTATCTTGCTTGGAGCAGGTTAAAACATACCTGCTTACTGATGGAACATGCAAGTGT GGCTTGGAATGTCCTCTTATTCTTCCCAAGGTATTTAATTTTGATCCTGGAGCTGCTGTGAA ACAGAGAACCGCAGAAGATGTTAAGGCAGATGAAGATGTCACAAAGCTATGCATACATAAAA GAAAAATTATTGCAGTGGCCACACTTCATAAAAGCATGGAAGCCCCACATCCTTCTCTGGTG CTCACCAGTCCCGGAGGAGGAACAAATGCAACTCCAGTAGTACCTTCTCGGGCAGCAACTCC AAGATCAGTAAGAAATAAGTCTCATGAAGGAATTACAAATTCTGTAATGCCTGAATGTAAGA ATCCTTTCAAGTTAATGATTGGATCATCAAATGCCATGGGAAGGCTATATGTACAAGAACTG CCTGGAAGCCAACAACAAGAACTCCACCCTGTCTACCCCCGACAGAGATTGGGCAGCAGTGA ACATGGACAGAAATCTCCATTCCGTGGCAGCCATGGAGGCCTGCCCAGCCCAGCGTCATCAG GTTCCCAGATATATGGAGATGGTTCAATCTCTCCAAGGACTGACCCACTTGGAAGTCCTGAT GTTTTCACAAGAAGTAATCCTGGTTTTCATGGAGCTCCCAATTCTAGTCCTATTCACCTGAA TAGGACTCCTCTTTCTCCACCTTCAGTAATGCTACATGGTTCTCCTGTACAGTCATCCTGTG CAATGGCTGGAAGGACTAATATACCTCTTTCCCCAACCTTGACTACAAAGAGTCCAGTAATG AAAAAACCAATGTGTAATTTTTCAACTAATATGGAAATACCACGAGCAATGTTCCACCACAA ACCACCCCAAGGCCCACCTCCCCCTCCTCCACCTTCTTGTGCTCTTCAGAAAAAGCCATTAA CATCTGAGAAAGATCCACTTGGCATTCTTGACCCTATTCCTAGTAAACCAGTGAATCAGAAC CCTGTTATCATTAATCCAACCAGTTTCCATTCAAATGTCCACTCTCAGGTACCTATGATGAA TGTAAGCATGCCTCCTGCTGTTGTTCCTTTGCCAAGTAATCTCCCATTGCCAACTGTAAAAC CTGGTCACATGAATCATGGGAGTCATGTACAAAGAGTTCAGCATTCAGCTTCAACCTCCCTG 
TCCCCTTCTCCAGTGACATCCCCCGTGCACATGATGGGGACTGGAATTGGAAGGATTGAGGC ATCGCCCCAAAGATCACGCTCATCTTCCACATCATCAGATCATGGAAATTTCATGATGCCAC CTGTAGGACCCCAGGCCACTTCTAGTGGTATTAAGGTTCCACCCAGGTCACCAAGGTCAACA ATAGGGTCCCCAAGGCCATCAATGCCATCAAGCCCTTCTACCAAGTCCGATGGACATCATCA GTACAAGGATATCCCTAACCCATTAATTGCTGGAATAAGTAATGTACTAAATACCCCAAGCA GTGCAGCTTTTCCTACTGCATCTGCCGGAAGTAGTTCTGTAAAGAGTCAGCCTGGTTTGCTG GGAATGCCTTTAAATCAGATCTTGAACCAGCACAATGCTGCCTCCTTTCCAGCAAGTAGTTT ACTCTCAGCAGCAGCCAAAGCACAGCTAGCAAATCAAAACAAACTTGCTGGTAACAACAGTA GCAGCAGTAGCAATTCTGGAGCTGTTGCCGGCAGTGGCAACACTGAAGGACATAGCACTTTA AACACCATGTTCCCTCCTACTGCCAACATGCTTCTCCCAACAGGTGAAGGGCAAAGTGGTCG AGCAGCACTAAGAGATAAGCTGATGTCTCAGCAAAAAGACGCATTGCGGAAAAGAAAACAAC CACCTACGACAGTGTTGAGTTTGCTCAGACAGTCTCAAATGGATAGTTCTGCAGTTCCTAAA CCTGGACCTGACTTGCTAAGGAAGCAGGGTCAGGGTTCATTTCCCATCAGTTCAATGTCTCA GTTACTACAGTCTATGAGTTGTCAAAGCTCTCACTTGAGTAGCAATAGTACCCCGGGTTGTG GGGCCTCAAATACTGCTTTGCCTTGCTCTGCTAACCAGCTGCATTTTACAGATCCCAGTATG AACTCTAGTGTTCTTCAGAACATACCTTTAAGAGGGGAAGCCGTGCACTGCCACAATGCAAA CACTAACTTTGTTCACAGTAACAGTCCAGTCCCCAACCACCATCTTGCAGGTTTAATAAATC AGATTCAGGCTAGCGGGAACTGTGGGATGCTCAGTCAGTCGGGCATGGCTTTAGGAAATTCC
TTACATCCCAATCCACCTCAGTCAAGAATTTCAACGTCCTCCACTCCAGTGATACCAAACAG CATTGTTAGCAGCTATAATCAAACAAGTTCTGAAGCAGGCGGTTCAGGACCATCATCCTCCA TAGCCATAGCGGGCACCAACCACCCTGCCATCACAAAGACAACATCTGTTCTTCAAGATGGC GTCATAGTCACCACTGCAGCTGGAAACCCACTGCAGAGTCAGCTACCCATTGGGAGTGATTT TCCTTTTGTTGGCCAGGAGCACGCACTTCATTTTCCATCCAACAGCACTTCAAACAACCATC TTCCACACCCCTTGAACCCCAGCCTCCTCAGTTCTCTACCTATCTCTTTGCCAGTGAATCAA CAGCATCTCCTAAACCAGAATCTATTAAATATCCTCCAGCCTTCAGCAGGAGAAGGTGATAT GTCATCAATAAACAATACTTTGAGTAACCATCAACTGACTCATCTACAGTCGCTGTTAAACA ACAATCAGATGTTTCCTCCAAATCAGCAACAGCAGCAACTTCTCCAGGGGTACCAGAATCTC CAGGCGTTCCAAGGACAGTCCACAATTCCTTGCCCAGCTAACAATAACCCCATGGCTTGTCT GTTTCAGAACTTTCAGGTGAGAATGCAGGAAGATGCAGCTCTCCTAAACAAAAGAATAAGCA CTCAGCCTGGGCTCACAGCACTTCCTGAGAATCCAAACACTACACTTCCACCTTTTCAAGAT ACACCTTGTGAGTTGCAACCGAGGATTGACCCATCTCTTGGTCAACAGGTGAAGGATGGCCT CGTTGTGGGTGGCCCAGGTGATGCTTCCGTAGATGCCATTTACAAAGCAGTTGTCGATGCAG CCAGCAAAGGAATGCAGGTTGTCATCACCACTGCAGTCAACAGTACAACTCAGATCAGCCCC ATTCCAGCTCTGAGTGCCATGAGTGCCTTCACTGCCTCAATTGGTGACCCATTAAATCTCTC CAGTGCTGTCAGTGCGGTCATTCATGGACGGAACATGGGAGGTGTTGATCATGATGGTAGGC TGAGGAATTCAAGAGGGGCTCGGCTGCCCAAGAATCTAGACCATGGGAAAAATGTGAACGAA GGAGATGGGTTTGAATATTTCAAGTCAGCAAGTTGCCACACATCCAAAAAACAGTGGGACGG GGAGCAAAGCCCCAGAGGGGAGCGAAACAGGTGGAAGTACGAGGAATTTTTAGATCATCCAG GCCATATCCACAGTAGTCCTTGTCATGAAAGGCCCAACAATGTCTCTACACTGCCATTTCTG CCTGGGGAACAGCACCCAATACTGTTACCACCAAGAAACTGTCCAGGGGATAAAATTCTAGA GGAAAATTTCAGGTATAATAACTACAAAAGAACTATGATGAGTTTTAAGGAGAGACTAGAGA ACACTGTGGAAAGATGTGCACACATAAATGGGAATAGACCTCGACAGAGTCGGGGATTTGGA GAGCTGCTAAGCACTGCAAAGCAAGACCTGGTCCTAGAGGAGCAGTCTCCAAGTTCCTCAAA TAGTTTGGAAAATTCTCTGGTCAAAGACTACATCCATTACAATGGAGACTTTAATGCCAAAA GCGTTAATGGGTGTGTGCCTAGCCCTTCAGATGCTAAAAGCATTAGTAGTGAAGATGACCTA AGGAACCCAGACTCCCCCTCTTCAAATGAATTGATACATTATAGACCAAGGACGTTCAATGT TGGCGACTTGGTCTGGGGCCAAATCAAAGGACTGACTTCCTGGCCTGGAAAATTAGTAAGAG AAGACGACGTTCACAATTCATGTCAACAAAGCCCCGAGGAAGGGAAGGTGGAGCCCGAGAAG TTGAAGACACTAACAGAAGGTTTGGAAGCCTACAGCCGTGTCCGGAAAAGGAACAGAAAAAG 
TGGAAAGCTAAATAACCATTTAGAAGCTGCTATTCATGAGGCCATGAGTGAACTGGACAAAA TGTCTGGGACTGTACACCAAATCCCACAGGGTGACAGACAAATGAGACCCCCCAAACCCAAG AGGAGGAAGATCTCCAGATAA (SEQ ID NO: 8) SEQ ID NO: 9 – Wild type human NSUN1 isoform 1 amino acid sequence MGRKLDPTKEKRGPGRKARKQKGAETELVRFLPAVSDENSKRLSSRARKRAAKRRLGSVEAP KTNKSPEAKPLPGKLPKGAVQTAGKKGPQSLFNAPRGKKRPAPGSDEEEEEEDSEEDGMVNH GDLWGSEDDADTVDDYGADSNSEDEEEGEALLPIERAARKQKAREAAAGIQWSEEETEDEEE EKEVTPESGPPKVEEADGGLQINVDEEPFVLPPAGEMEQDAQAPDLQRVHKRIQDIVGILRD FGAQREEGRSRSEYLNRLKKDLAIYYSYGDFLLGKLMDLFPLSELVEFLEANEVPRPVTLRT NTLKTRRRDLAQALINRGVNLDPLGKWSKTGLVVYDSSVPIGATPEYLAGHYMLQGASSMLP VMALAPQEHERILDMCCAPGGKTSYMAQLMKNTGVILANDANAERLKSVVGNLHRLGVTNTI ISHYDGRQFPKVVGGFDRVLLDAPCSGTGVISKDPAVKTNKDEKDILRCAHLQKELLLSAID SVNATSKTGGYLVYCTCSITVEENEWVVDYALKKRNVRLVPTGLDFGQEGFTRFRERRFHPS LRSTRRFYPHTHNMDGFFIAKFKKFSNSIPQSQTGNSETATPTNVDLPQVIPKSENSSQPAK KAKGAAKTKQQLQKQQHPKKASFQKLNGISKGADSELSTVPSVTKTQASSSFQDSSQPAGKA EGIREPKVTGKLKQRSPKLQSSKKVAFLRQNAPPKGTDTQTPAVLSPSKTQATLKPKDHHQP LGRAKGVEKQQLPEQPFEKAAFQKQNDTPKGPQPPTVSPIRSSRPPPAKRKKSQSRGNSQLL LS (SEQ ID NO: 9)
SEQ ID NO: 10 – Wild type human NSUN1 isoform 1 polynucleotide coding sequence ATGGGGCGCAAGTTGGACCCTACGAAGGAGAAGCGGGGGCCAGGCCGAAAGGCCCGGAAGCA GAAGGGTGCCGAGACAGAACTCGTCAGATTCTTGCCTGCAGTAAGTGACGAAAATTCCAAGA GGCTGTCTAGTCGTGCTCGAAAGAGGGCAGCCAAGAGGAGATTGGGCTCTGTTGAAGCCCCT AAGACAAATAAGTCTCCTGAGGCCAAACCATTGCCTGGAAAGCTACCAAAAGGAGCTGTCCA GACAGCTGGTAAGAAGGGACCCCAGTCCCTATTTAATGCTCCTCGAGGCAAGAAGCGCCCAG CACCTGGCAGTGATGAGGAAGAGGAGGAGGAAGACTCTGAAGAAGATGGTATGGTGAACCAC GGGGACCTCTGGGGCTCCGAGGACGATGCTGATACGGTAGATGACTATGGAGCTGACTCCAA CTCTGAGGATGAGGAGGAAGGTGAAGCGTTGCTGCCCATTGAAAGAGCTGCTCGGAAGCAGA AGGCCCGGGAAGCTGCTGCTGGGATCCAGTGGAGTGAAGAGGAGACCGAGGACGAGGAGGAA GAGAAAGAAGTGACCCCTGAGTCAGGCCCCCCAAAGGTGGAAGAGGCAGATGGGGGCCTGCA GATCAATGTGGATGAGGAACCATTTGTGCTGCCCCCTGCTGGGGAGATGGAGCAGGATGCCC AGGCTCCAGACCTGCAACGAGTTCACAAGCGGATCCAGGATATTGTGGGAATTCTGCGTGAT TTTGGGGCTCAGCGGGAGGAAGGGCGGTCTCGTTCTGAATACCTGAACCGGCTCAAGAAGGA TCTGGCCATTTACTACTCCTATGGAGACTTCCTGCTTGGCAAGCTCATGGACCTCTTCCCTC TGTCTGAGCTGGTGGAGTTCTTAGAAGCTAATGAGGTGCCTCGGCCCGTCACCCTCCGGACC AATACCTTGAAAACCCGACGCCGAGACCTTGCACAGGCTCTAATCAATCGTGGGGTTAACCT GGATCCCCTGGGCAAGTGGTCAAAGACTGGACTAGTGGTGTATGATTCTTCTGTGCCCATTG GTGCTACCCCCGAGTACCTGGCTGGGCACTACATGCTGCAGGGAGCCTCCAGCATGTTGCCC GTCATGGCCTTGGCACCCCAGGAACATGAGCGGATCCTGGACATGTGTTGTGCCCCTGGAGG AAAGACCAGCTACATGGCCCAGCTGATGAAGAACACGGGTGTGATCCTTGCCAATGACGCCA ATGCTGAGCGGCTCAAGAGTGTTGTGGGCAACTTGCATCGGCTGGGAGTCACCAACACCATT ATCAGCCACTATGATGGGCGCCAGTTCCCCAAGGTGGTGGGGGGCTTTGACCGAGTACTGCT GGATGCTCCCTGCAGTGGCACTGGGGTCATCTCCAAGGATCCAGCCGTGAAGACTAACAAGG ATGAGAAGGACATCCTGCGCTGTGCTCACCTCCAGAAGGAGTTGCTCCTGAGTGCTATTGAC TCTGTCAATGCGACCTCCAAGACAGGAGGCTACCTGGTTTACTGCACCTGTTCTATCACAGT AGAAGAGAATGAGTGGGTGGTAGACTATGCTCTGAAAAAGAGGAATGTGCGACTGGTGCCCA CGGGCCTAGACTTTGGCCAGGAAGGTTTTACCCGCTTTCGAGAAAGGCGCTTCCACCCCAGT CTGCGTTCTACCCGACGCTTCTACCCTCATACCCACAATATGGATGGGTTCTTCATTGCCAA GTTCAAGAAATTTTCCAATTCTATCCCTCAGTCCCAGACAGGAAATTCTGAAACAGCCACAC CTACAAATGTAGACTTGCCTCAGGTCATCCCCAAGTCTGAGAACAGCAGCCAGCCAGCCAAG 
AAAGCCAAGGGGGCTGCAAAGACAAAGCAGCAGCTGCAGAAACAGCAACATCCCAAGAAGGC CTCCTTCCAGAAGCTGAATGGCATCTCCAAAGGGGCAGACTCAGAATTGTCCACTGTACCTT CTGTCACAAAGACCCAAGCTTCCTCCAGCTTCCAGGATAGCAGTCAGCCAGCTGGAAAAGCC GAAGGGATCAGGGAGCCAAAGGTGACTGGGAAGCTAAAGCAACGATCACCTAAATTACAGTC CTCCAAGAAAGTTGCTTTCCTCAGGCAGAATGCCCCTCCCAAGGGCACAGACACACAAACAC CGGCTGTGTTATCCCCATCCAAGACTCAGGCCACCCTGAAACCTAAGGACCATCATCAGCCC CTTGGAAGGGCCAAGGGGGTTGAGAAGCAGCAGTTGCCAGAGCAGCCTTTTGAGAAAGCTGC CTTCCAGAAACAGAATGATACCCCCAAGGGGCCTCAGCCTCCCACTGTGTCTCCCATCCGTT CCAGCCGCCCCCCACCAGCAAAGAGGAAGAAATCTCAGTCCAGGGGCAACAGCCAGCTGCTG CTATCTTAG (SEQ ID NO: 10) SEQ ID NO: 11 – Wild type human NSUN2 isoform 1 amino acid sequence MGRRSRGRRLQQQQRPEDAEDGAEGGGKRGEAGWEGGYPEIVKENKLFEHYYQELKIVPEGE WGQFMDALREPLPATLRITGYKSHAKEILHCLKNKYFKELEDLEVDGQKVEVPQPLSWYPEE LAWHTNLSRKILRKSPHLEKFHQFLVSETESGNISRQEAVSMIPPLLLNVRPHHKILDMCAA PGSKTTQLIEMLHADMNVPFPEGFVIANDVDNKRCYLLVHQAKRLSSPCIMVVNHDASSIPR LQIDVDGRKEILFYDRILCDVPCSGDGTMRKNIDVWKKWTTLNSLQLHGLQLRIATRGAEQL AEGGRMVYSTCSLNPIEDEAVIASLLEKSEGALELADVSNELPGLKWMPGITQWKVMTKDGQ WFTDWDAVPHSRHTQIRPTMFPPKDPEKLQAMHLERCLRILPHHQNTGGFFVAVLVKKSSMP WNKRQPKLQGKSAETRESTQLSPADLTEGKPTDPSKLESPSFTGTGDTEIAHATEDLENNGS
KKDGVCGPPPSKKMKLFGFKEDPFVFIPEDDPLFPPIEKFYALDPSFPRMNLLTRTTEGKKR QLYMVSKELRNVLLNNSEKMKVINTGIKVWCRNNSGEEFDCAFRLAQEGIYTLYPFINSRII TVSMEDVKILLTQENPFFRKLSSETYSQAKDLAKGSIVLKYEPDSANPDALQCPIVLCGWRG KASIRTFVPKNERLHYLRMMGLEVLGEKKKEGVILTNESAASTGQPDNDVTEGQRAGEPNSP DAEEANSPDVTAGCDPAGVHPPR (SEQ ID NO: 11) SEQ ID NO: 12 - Wild type human NSUN2 isoform 1 polynucleotide coding sequence ATGGGGCGGCGGTCGCGGGGTCGGCGGCTCCAGCAACAGCAGCGGCCGGAGGACGCGGAGGA TGGCGCCGAGGGTGGTGGAAAGCGCGGCGAGGCGGGCTGGGAAGGAGGCTACCCCGAGATCG TCAAGGAGAACAAGCTGTTCGAGCACTACTACCAGGAGCTCAAGATCGTGCCCGAGGGCGAG TGGGGCCAGTTCATGGACGCTCTCAGGGAGCCGCTCCCGGCCACTTTAAGAATTACTGGTTA CAAAAGCCACGCAAAAGAGATTCTCCATTGCTTAAAGAACAAATATTTTAAGGAATTGGAGG ACCTGGAGGTGGACGGTCAGAAAGTTGAAGTTCCACAGCCACTGAGTTGGTATCCTGAAGAA CTTGCCTGGCACACAAATTTAAGTCGAAAAATCTTGAGAAAATCGCCACACTTGGAAAAGTT TCATCAGTTTCTAGTTAGTGAAACAGAATCTGGAAATATTAGTCGTCAAGAAGCTGTTAGCA TGATCCCACCACTGCTCCTCAACGTGCGGCCTCATCATAAGATCTTAGATATGTGTGCAGCA CCTGGCTCAAAGACCACACAGTTAATTGAAATGCTACATGCCGACATGAATGTCCCCTTTCC AGAGGGATTTGTTATTGCGAATGATGTGGACAACAAGCGCTGCTACCTGCTCGTCCATCAAG CCAAGAGGCTGAGCAGCCCCTGCATCATGGTGGTCAACCATGATGCCTCCAGCATACCCAGG CTCCAGATAGATGTGGACGGCAGGAAAGAGATCCTCTTCTATGATCGAATTTTATGTGATGT CCCTTGCAGTGGAGACGGCACTATGAGAAAAAACATTGATGTTTGGAAAAAGTGGACCACCT TAAATAGCTTGCAGCTACATGGCTTACAGCTGCGGATTGCAACACGCGGGGCTGAACAGCTG GCTGAAGGTGGAAGGATGGTGTATTCCACGTGTTCACTAAACCCTATTGAGGATGAAGCAGT CATAGCATCTTTACTGGAAAAAAGTGAAGGTGCTTTGGAGCTTGCTGATGTGTCTAATGAAC TGCCAGGGCTGAAGTGGATGCCTGGAATCACACAGTGGAAGGTAATGACGAAAGATGGGCAG TGGTTTACAGACTGGGACGCTGTTCCTCACAGCAGACACACCCAGATCCGACCTACCATGTT CCCTCCGAAGGACCCAGAAAAGCTGCAGGCCATGCACCTGGAGCGATGCCTTAGGATATTAC CCCATCATCAGAATACTGGAGGGTTTTTTGTGGCAGTATTGGTGAAAAAATCTTCAATGCCG TGGAATAAACGTCAGCCAAAGCTTCAGGGTAAATCTGCAGAGACCAGAGAAAGCACACAGCT GAGCCCTGCAGATCTCACAGAAGGGAAACCCACAGATCCCTCTAAGCTGGAAAGTCCGTCAT TCACAGGAACTGGTGACACAGAAATAGCTCATGCAACTGAGGATTTAGAGAATAATGGCAGT AAGAAAGATGGCGTGTGTGGTCCTCCTCCATCAAAGAAAATGAAGTTATTTGGATTTAAAGA 
AGATCCATTTGTATTTATTCCTGAAGATGACCCATTATTTCCACCTATTGAGAAATTTTATG CTTTGGATCCTTCATTCCCAAGGATGAATTTGTTAACTCGGACTACAGAAGGGAAGAAAAGG CAGCTCTACATGGTTTCTAAGGAGTTGCGGAATGTGCTGCTGAATAACAGTGAGAAGATGAA GGTTATTAACACGGGGATCAAAGTCTGGTGTAGAAATAACAGCGGTGAAGAGTTTGACTGTG CTTTCCGGCTGGCACAGGAGGGAATATATACATTGTATCCATTTATTAACTCAAGAATTATT ACTGTATCAATGGAAGATGTTAAGATACTGTTGACCCAGGAAAATCCCTTTTTTAGAAAACT CAGCAGTGAGACCTACAGTCAAGCAAAGGACCTGGCAAAGGGAAGCATCGTGCTGAAGTATG AACCAGATTCTGCGAATCCAGACGCTCTGCAGTGTCCCATCGTCTTATGCGGATGGCGGGGA AAGGCCTCCATTCGAACTTTTGTGCCCAAGAATGAACGGCTTCATTATCTCAGGATGATGGG GCTGGAGGTATTGGGAGAAAAGAAGAAGGAAGGGGTTATCCTCACAAATGAGAGTGCAGCCA GCACCGGACAGCCAGACAATGACGTGACTGAGGGACAGAGAGCAGGAGAGCCCAACAGCCCA GATGCAGAAGAGGCCAACAGTCCAGACGTGACAGCAGGCTGTGACCCGGCGGGGGTCCATCC ACCCCGGTGA (SEQ ID NO: 12) SEQ ID NO: 13 – Wild type human NSUN2 isoform 2 amino acid sequence MGRRSRGRRLQQQQRPEDAEDGAEGGGKRGEAGWEGGYPEIVKENKLFEHYYQELKIVPEGE WGQFMDALREPLPATLRITGYKRYPEELAWHTNLSRKILRKSPHLEKFHQFLVSETESGNIS RQEAVSMIPPLLLNVRPHHKILDMCAAPGSKTTQLIEMLHADMNVPFPEGFVIANDVDNKRC YLLVHQAKRLSSPCIMVVNHDASSIPRLQIDVDGRKEILFYDRILCDVPCSGDGTMRKNIDV
WKKWTTLNSLQLHGLQLRIATRGAEQLAEGGRMVYSTCSLNPIEDEAVIASLLEKSEGALEL ADVSNELPGLKWMPGITQWKVMTKDGQWFTDWDAVPHSRHTQIRPTMFPPKDPEKLQAMHLE RCLRILPHHQNTGGFFVAVLVKKSSMPWNKRQPKLQGKSAETRESTQLSPADLTEGKPTDPS KLESPSFTGTGDTEIAHATEDLENNGSKKDGVCGPPPSKKMKLFGFKEDPFVFIPEDDPLFP PIEKFYALDPSFPRMNLLTRTTEGKKRQLYMVSKELRNVLLNNSEKMKVINTGIKVWCRNNS GEEFDCAFRLAQEGIYTLYPFINSRIITVSMEDVKILLTQENPFFRKLSSETYSQAKDLAKG SIVLKYEPDSANPDALQCPIVLCGWRGKASIRTFVPKNERLHYLRMMGLEVLGEKKKEGVIL TNESAASTGQPDNDVTEGQRAGEPNSPDAEEANSPDVTAGCDPAGVHPPR (SEQ ID NO: 13) SEQ ID NO: 14 - Wild type human NSUN2 isoform 2 polynucleotide coding sequence ATGGGGCGGCGGTCGCGGGGTCGGCGGCTCCAGCAACAGCAGCGGCCGGAGGACGCGGAGGA TGGCGCCGAGGGTGGTGGAAAGCGCGGCGAGGCGGGCTGGGAAGGAGGCTACCCCGAGATCG TCAAGGAGAACAAGCTGTTCGAGCACTACTACCAGGAGCTCAAGATCGTGCCCGAGGGCGAG TGGGGCCAGTTCATGGACGCTCTCAGGGAGCCGCTCCCGGCCACTTTAAGAATTACTGGTTA CAAAAGGTATCCTGAAGAACTTGCCTGGCACACAAATTTAAGTCGAAAAATCTTGAGAAAAT CGCCACACTTGGAAAAGTTTCATCAGTTTCTAGTTAGTGAAACAGAATCTGGAAATATTAGT CGTCAAGAAGCTGTTAGCATGATCCCACCACTGCTCCTCAACGTGCGGCCTCATCATAAGAT CTTAGATATGTGTGCAGCACCTGGCTCAAAGACCACACAGTTAATTGAAATGCTACATGCCG ACATGAATGTCCCCTTTCCAGAGGGATTTGTTATTGCGAATGATGTGGACAACAAGCGCTGC TACCTGCTCGTCCATCAAGCCAAGAGGCTGAGCAGCCCCTGCATCATGGTGGTCAACCATGA TGCCTCCAGCATACCCAGGCTCCAGATAGATGTGGACGGCAGGAAAGAGATCCTCTTCTATG ATCGAATTTTATGTGATGTCCCTTGCAGTGGAGACGGCACTATGAGAAAAAACATTGATGTT TGGAAAAAGTGGACCACCTTAAATAGCTTGCAGCTACATGGCTTACAGCTGCGGATTGCAAC ACGCGGGGCTGAACAGCTGGCTGAAGGTGGAAGGATGGTGTATTCCACGTGTTCACTAAACC CTATTGAGGATGAAGCAGTCATAGCATCTTTACTGGAAAAAAGTGAAGGTGCTTTGGAGCTT GCTGATGTGTCTAATGAACTGCCAGGGCTGAAGTGGATGCCTGGAATCACACAGTGGAAGGT AATGACGAAAGATGGGCAGTGGTTTACAGACTGGGACGCTGTTCCTCACAGCAGACACACCC AGATCCGACCTACCATGTTCCCTCCGAAGGACCCAGAAAAGCTGCAGGCCATGCACCTGGAG CGATGCCTTAGGATATTACCCCATCATCAGAATACTGGAGGGTTTTTTGTGGCAGTATTGGT GAAAAAATCTTCAATGCCGTGGAATAAACGTCAGCCAAAGCTTCAGGGTAAATCTGCAGAGA CCAGAGAAAGCACACAGCTGAGCCCTGCAGATCTCACAGAAGGGAAACCCACAGATCCCTCT AAGCTGGAAAGTCCGTCATTCACAGGAACTGGTGACACAGAAATAGCTCATGCAACTGAGGA 
TTTAGAGAATAATGGCAGTAAGAAAGATGGCGTGTGTGGTCCTCCTCCATCAAAGAAAATGA AGTTATTTGGATTTAAAGAAGATCCATTTGTATTTATTCCTGAAGATGACCCATTATTTCCA CCTATTGAGAAATTTTATGCTTTGGATCCTTCATTCCCAAGGATGAATTTGTTAACTCGGAC TACAGAAGGGAAGAAAAGGCAGCTCTACATGGTTTCTAAGGAGTTGCGGAATGTGCTGCTGA ATAACAGTGAGAAGATGAAGGTTATTAACACGGGGATCAAAGTCTGGTGTAGAAATAACAGC GGTGAAGAGTTTGACTGTGCTTTCCGGCTGGCACAGGAGGGAATATATACATTGTATCCATT TATTAACTCAAGAATTATTACTGTATCAATGGAAGATGTTAAGATACTGTTGACCCAGGAAA ATCCCTTTTTTAGAAAACTCAGCAGTGAGACCTACAGTCAAGCAAAGGACCTGGCAAAGGGA AGCATCGTGCTGAAGTATGAACCAGATTCTGCGAATCCAGACGCTCTGCAGTGTCCCATCGT CTTATGCGGATGGCGGGGAAAGGCCTCCATTCGAACTTTTGTGCCCAAGAATGAACGGCTTC ATTATCTCAGGATGATGGGGCTGGAGGTATTGGGAGAAAAGAAGAAGGAAGGGGTTATCCTC ACAAATGAGAGTGCAGCCAGCACCGGACAGCCAGACAATGACGTGACTGAGGGACAGAGAGC AGGAGAGCCCAACAGCCCAGATGCAGAAGAGGCCAACAGTCCAGACGTGACAGCAGGCTGTG ACCCGGCGGGGGTCCATCCACCCCGGTGA (SEQ ID NO: 14) SEQ ID NO: 15 – Wild type human TET2 isoform A amino acid sequence MEQDRTNHVEGNRLSPFLIPSPPICQTEPLATKLQNGSPLPERAHPEVNGDTKWHSFKSYYG IPCMKGSQNSRVSPDFTQESRGYSKCLQNGGIKRTVSEPSLSGLLQIKKLKQDQKANGERRN
FGVSQERNPGESSQPNVSDLSDKKESVSSVAQENAVKDFTSFSTHNCSGPENPELQILNEQE GKSANYHDKNIVLLKNKAVLMPNGATVSASSVEHTHGELLEKTLSQYYPDCVSIAVQKTTSH INAINSQATNELSCEITHPSHTSGQINSAQTSNSELPPKPAAVVSEACDADDADNASKLAAM LNTCSFQKPEQLQQQKSVFEICPSPAENNIQGTTKLASGEEFCSGSSSNLQAPGGSSERYLK QNEMNGAYFKQSSVFTKDSFSATTTPPPPSQLLLSPPPPLPQVPQLPSEGKSTLNGGVLEEH HHYPNQSNTTLLREVKIEGKPEAPPSQSPNPSTHVCSPSPMLSERPQNNCVNRNDIQTAGTM TVPLCSEKTRPMSEHLKHNPPIFGSSGELQDNCQQLMRNKEQEILKGRDKEQTRDLVPPTQH YLKPGWIELKAPRFHQAESHLKRNEASLPSILQYQPNLSNQMTSKQYTGNSNMPGGLPRQAY TQKTTQLEHKSQMYQVEMNQGQSQGTVDQHLQFQKPSHQVHFSKTDHLPKAHVQSLCGTRFH FQQRADSQTEKLMSPVLKQHLNQQASETEPFSNSHLLQHKPHKQAAQTQPSQSSHLPQNQQQ QQKLQIKNKEEILQTFPHPQSNNDQQREGSFFGQTKVEECFHGENQYSKSSEFETHNVQMGL EEVQNINRRNSPYSQTMKSSACKIQVSCSNNTHLVSENKEQTTHPELFAGNKTQNLHHMQYF PNNVIPKQDLLHRCFQEQEQKSQQASVLQGYKNRNQDMSGQQAAQLAQQRYLIHNHANVFPV PDQGGSHTQTPPQKDTQKHAALRWHLLQKQEQQQTQQPQTESCHSQMHRPIKVEPGCKPHAC MHTAPPENKTWKKVTKQENPPASCDNVQQKSIIETMEQHLKQFHAKSLFDHKALTLKSQKQV KVEMSGPVTVLTRQTTAAELDSHTPALEQQTTSSEKTPTKRTAASVLNNFIESPSKLLDTPI KNLLDTPVKTQYDFPSCRCVEQIIEKDEGPFYTHLGAGPNVAAIREIMEERFGQKGKAIRIE RVIYTGKEGKSSQGCPIAKWVVRRSSSEEKLLCLVRERAGHTCEAAVIVILILVWEGIPLSL ADKLYSELTETLRKYGTLTNRRCALNEERTCACQGLDPETCGASFSFGCSWSMYYNGCKFAR SKIPRKFKLLGDDPKEEEKLESHLQNLSTLMAPTYKKLAPDAYNNQIEYEHRAPECRLGLKE GRPFSGVTACLDFCAHAHRDLHNMQNGSTLVCTLTREDNREFGGKPEDEQLHVLPLYKVSDV DEFGSVEAQEEKKRSGAIQVLSSFRRKVRMLAEPVKTCRQRKLEAKKAAAEKLSSLENSSNK NEKEKSAPSRTKQTENASQAKQLAELLRLSGPVMQQSQQPQPLQKQPPQPQQQQRPQQQQPH HPQTESVNSYSASGSTNPYMRRPNPVSPYPNSSHTSDIYGSTSPMNFYSTSSQAAGSYLNSS NPMNPYPGLLNQNTQYPSYQCNGNLSVDNCSPYLGSYSPQSQPMDLYRYPSQDPLSKLSLPP IHTLYQPRFGNSQSFTSKYLGYGNQNMQGDGFSSCTIRPNVHHVGKLPPYPTHEMDGHFMGA TSRLPPNLSNPNMDYKNGEHHSPSHIIHNYSAAPGMFNSSLHALHLQNKENDMLSHTANGLS KMLPALNHDRTACVQGGLHKLSDANGQEKQPLALVQGVASGAEDNDEVWSDSEQSFLDPDIG GVAVAPTHGSILIECAKRELHATTPLKNPNRNHPTRISLVFYQHKSMNEPKHGLALWEAKMA EKAREKEEECEKYGPDYVPQKSHGKKVKREPAEPHETSEPTYLRFIKSLAERTMSVTTDSTV TTSPYAFTRVTGPYNRYI (SEQ ID NO: 15) SEQ ID NO: 16 – Wild type human TET2 isoform A polynucleotide coding 
sequence ATGGAACAGGATAGAACCAACCATGTTGAGGGCAACAGACTAAGTCCATTCCTGATACCATC ACCTCCCATTTGCCAGACAGAACCTCTGGCTACAAAGCTCCAGAATGGAAGCCCACTGCCTG AGAGAGCTCATCCAGAAGTAAATGGAGACACCAAGTGGCACTCTTTCAAAAGTTATTATGGA ATACCCTGTATGAAGGGAAGCCAGAATAGTCGTGTGAGTCCTGACTTTACACAAGAAAGTAG AGGGTATTCCAAGTGTTTGCAAAATGGAGGAATAAAACGCACAGTTAGTGAACCTTCTCTCT CTGGGCTCCTTCAGATCAAGAAATTGAAACAAGACCAAAAGGCTAATGGAGAAAGACGTAAC TTCGGGGTAAGCCAAGAAAGAAATCCAGGTGAAAGCAGTCAACCAAATGTCTCCGATTTGAG TGATAAGAAAGAATCTGTGAGTTCTGTAGCCCAAGAAAATGCAGTTAAAGATTTCACCAGTT TTTCAACACATAACTGCAGTGGGCCTGAAAATCCAGAGCTTCAGATTCTGAATGAGCAGGAG GGGAAAAGTGCTAATTACCATGACAAGAACATTGTATTACTTAAAAACAAGGCAGTGCTAAT GCCTAATGGTGCTACAGTTTCTGCCTCTTCCGTGGAACACACACATGGTGAACTCCTGGAAA AAACACTGTCTCAATATTATCCAGATTGTGTTTCCATTGCGGTGCAGAAAACCACATCTCAC ATAAATGCCATTAACAGTCAGGCTACTAATGAGTTGTCCTGTGAGATCACTCACCCATCGCA TACCTCAGGGCAGATCAATTCCGCACAGACCTCTAACTCTGAGCTGCCTCCAAAGCCAGCTG CAGTGGTGAGTGAGGCCTGTGATGCTGATGATGCTGATAATGCCAGTAAACTAGCTGCAATG CTAAATACCTGTTCCTTTCAGAAACCAGAACAACTACAACAACAAAAATCAGTTTTTGAGAT ATGCCCATCTCCTGCAGAAAATAACATCCAGGGAACCACAAAGCTAGCGTCTGGTGAAGAAT TCTGTTCAGGTTCCAGCAGCAATTTGCAAGCTCCTGGTGGCAGCTCTGAACGGTATTTAAAA
CAAAATGAAATGAATGGTGCTTACTTCAAGCAAAGCTCAGTGTTCACTAAGGATTCCTTTTC TGCCACTACCACACCACCACCACCATCACAATTGCTTCTTTCTCCCCCTCCTCCTCTTCCAC AGGTTCCTCAGCTTCCTTCAGAAGGAAAAAGCACTCTGAATGGTGGAGTTTTAGAAGAACAC CACCACTACCCCAACCAAAGTAACACAACACTTTTAAGGGAAGTGAAAATAGAGGGTAAACC TGAGGCACCACCTTCCCAGAGTCCTAATCCATCTACACATGTATGCAGCCCTTCTCCGATGC TTTCTGAAAGGCCTCAGAATAATTGTGTGAACAGGAATGACATACAGACTGCAGGGACAATG ACTGTTCCATTGTGTTCTGAGAAAACAAGACCAATGTCAGAACACCTCAAGCATAACCCACC AATTTTTGGTAGCAGTGGAGAGCTACAGGACAACTGCCAGCAGTTGATGAGAAACAAAGAGC AAGAGATTCTGAAGGGTCGAGACAAGGAGCAAACACGAGATCTTGTGCCCCCAACACAGCAC TATCTGAAACCAGGATGGATTGAATTGAAGGCCCCTCGTTTTCACCAAGCGGAATCCCATCT AAAACGTAATGAGGCATCACTGCCATCAATTCTTCAGTATCAACCCAATCTCTCCAATCAAA TGACCTCCAAACAATACACTGGAAATTCCAACATGCCTGGGGGGCTCCCAAGGCAAGCTTAC ACCCAGAAAACAACACAGCTGGAGCACAAGTCACAAATGTACCAAGTTGAAATGAATCAAGG GCAGTCCCAAGGTACAGTGGACCAACATCTCCAGTTCCAAAAACCCTCACACCAGGTGCACT TCTCCAAAACAGACCATTTACCAAAAGCTCATGTGCAGTCACTGTGTGGCACTAGATTTCAT TTTCAACAAAGAGCAGATTCCCAAACTGAAAAACTTATGTCCCCAGTGTTGAAACAGCACTT GAATCAACAGGCTTCAGAGACTGAGCCATTTTCAAACTCACACCTTTTGCAACATAAGCCTC ATAAACAGGCAGCACAAACACAACCATCCCAGAGTTCACATCTCCCTCAAAACCAGCAACAG CAGCAAAAATTACAAATAAAGAATAAAGAGGAAATACTCCAGACTTTTCCTCACCCCCAAAG CAACAATGATCAGCAAAGAGAAGGATCATTCTTTGGCCAGACTAAAGTGGAAGAATGTTTTC ATGGTGAAAATCAGTATTCAAAATCAAGCGAGTTCGAGACTCATAATGTCCAAATGGGACTG GAGGAAGTACAGAATATAAATCGTAGAAATTCCCCTTATAGTCAGACCATGAAATCAAGTGC ATGCAAAATACAGGTTTCTTGTTCAAACAATACACACCTAGTTTCAGAGAATAAAGAACAGA CTACACATCCTGAACTTTTTGCAGGAAACAAGACCCAAAACTTGCATCACATGCAATATTTT CCAAATAATGTGATCCCAAAGCAAGATCTTCTTCACAGGTGCTTTCAAGAACAGGAGCAGAA GTCACAACAAGCTTCAGTTCTACAGGGATATAAAAATAGAAACCAAGATATGTCTGGTCAAC AAGCTGCGCAACTTGCTCAGCAAAGGTACTTGATACATAACCATGCAAATGTTTTTCCTGTG CCTGACCAGGGAGGAAGTCACACTCAGACCCCTCCCCAGAAGGACACTCAAAAGCATGCTGC TCTAAGGTGGCATCTCTTACAGAAGCAAGAACAGCAGCAAACACAGCAACCCCAAACTGAGT CTTGCCATAGTCAGATGCACAGGCCAATTAAGGTGGAACCTGGATGCAAGCCACATGCCTGT ATGCACACAGCACCACCAGAAAACAAAACATGGAAAAAGGTAACTAAGCAAGAGAATCCACC 
TGCAAGCTGTGATAATGTGCAGCAAAAGAGCATCATTGAGACCATGGAGCAGCATCTGAAGC AGTTTCACGCCAAGTCGTTATTTGACCATAAGGCTCTTACTCTCAAATCACAGAAGCAAGTA AAAGTTGAAATGTCAGGGCCAGTCACAGTTTTGACTAGACAAACCACTGCTGCAGAACTTGA TAGCCACACCCCAGCTTTAGAGCAGCAAACAACTTCTTCAGAAAAGACACCAACCAAAAGAA CAGCTGCTTCTGTTCTCAATAATTTTATAGAGTCACCTTCCAAATTACTAGATACTCCTATA AAAAATTTATTGGATACACCTGTCAAGACTCAATATGATTTCCCATCTTGCAGATGTGTAGA GCAAATTATTGAAAAAGATGAAGGTCCTTTTTATACCCATCTAGGAGCAGGTCCTAATGTGG CAGCTATTAGAGAAATCATGGAAGAAAGGTTTGGACAGAAGGGTAAAGCTATTAGGATTGAA AGAGTCATCTATACTGGTAAAGAAGGCAAAAGTTCTCAGGGATGTCCTATTGCTAAGTGGGT GGTTCGCAGAAGCAGCAGTGAAGAGAAGCTACTGTGTTTGGTGCGGGAGCGAGCTGGCCACA CCTGTGAGGCTGCAGTGATTGTGATTCTCATCCTGGTGTGGGAAGGAATCCCGCTGTCTCTG GCTGACAAACTCTACTCGGAGCTTACCGAGACGCTGAGGAAATACGGCACGCTCACCAATCG CCGGTGTGCCTTGAATGAAGAGAGAACTTGCGCCTGTCAGGGGCTGGATCCAGAAACCTGTG GTGCCTCCTTCTCTTTTGGTTGTTCATGGAGCATGTACTACAATGGATGTAAGTTTGCCAGA AGCAAGATCCCAAGGAAGTTTAAGCTGCTTGGGGATGACCCAAAAGAGGAAGAGAAACTGGA GTCTCATTTGCAAAACCTGTCCACTCTTATGGCACCAACATATAAGAAACTTGCACCTGATG CATATAATAATCAGATTGAATATGAACACAGAGCACCAGAGTGCCGTCTGGGTCTGAAGGAA GGCCGTCCATTCTCAGGGGTCACTGCATGTTTGGACTTCTGTGCTCATGCCCACAGAGACTT GCACAACATGCAGAATGGCAGCACATTGGTATGCACTCTCACTAGAGAAGACAATCGAGAAT TTGGAGGAAAACCTGAGGATGAGCAGCTTCACGTTCTGCCTTTATACAAAGTCTCTGACGTG
GATGAGTTTGGGAGTGTGGAAGCTCAGGAGGAGAAAAAACGGAGTGGTGCCATTCAGGTACT GAGTTCTTTTCGGCGAAAAGTCAGGATGTTAGCAGAGCCAGTCAAGACTTGCCGACAAAGGA AACTAGAAGCCAAGAAAGCTGCAGCTGAAAAGCTTTCCTCCCTGGAGAACAGCTCAAATAAA AATGAAAAGGAAAAGTCAGCCCCATCACGTACAAAACAAACTGAAAACGCAAGCCAGGCTAA ACAGTTGGCAGAACTTTTGCGACTTTCAGGACCAGTCATGCAGCAGTCCCAGCAGCCCCAGC CTCTACAGAAGCAGCCACCACAGCCCCAGCAGCAGCAGAGACCCCAGCAGCAGCAGCCACAT CACCCTCAGACAGAGTCTGTCAACTCTTATTCTGCTTCTGGATCCACCAATCCATACATGAG ACGGCCCAATCCAGTTAGTCCTTATCCAAACTCTTCACACACTTCAGATATCTATGGAAGCA CCAGCCCTATGAACTTCTATTCCACCTCATCTCAAGCTGCAGGTTCATATTTGAATTCTTCT AATCCCATGAACCCTTACCCTGGGCTTTTGAATCAGAATACCCAATATCCATCATATCAATG CAATGGAAACCTATCAGTGGACAACTGCTCCCCATATCTGGGTTCCTATTCTCCCCAGTCTC AGCCGATGGATCTGTATAGGTATCCAAGCCAAGACCCTCTGTCTAAGCTCAGTCTACCACCC ATCCATACACTTTACCAGCCAAGGTTTGGAAATAGCCAGAGTTTTACATCTAAATACTTAGG TTATGGAAACCAAAATATGCAGGGAGATGGTTTCAGCAGTTGTACCATTAGACCAAATGTAC ATCATGTAGGGAAATTGCCTCCTTATCCCACTCATGAGATGGATGGCCACTTCATGGGAGCC ACCTCTAGATTACCACCCAATCTGAGCAATCCAAACATGGACTATAAAAATGGTGAACATCA TTCACCTTCTCACATAATCCATAACTACAGTGCAGCTCCGGGCATGTTCAACAGCTCTCTTC ATGCCCTGCATCTCCAAAACAAGGAGAATGACATGCTTTCCCACACAGCTAATGGGTTATCA AAGATGCTTCCAGCTCTTAACCATGATAGAACTGCTTGTGTCCAAGGAGGCTTACACAAATT AAGTGATGCTAATGGTCAGGAAAAGCAGCCATTGGCACTAGTCCAGGGTGTGGCTTCTGGTG CAGAGGACAACGATGAGGTCTGGTCAGACAGCGAGCAGAGCTTTCTGGATCCTGACATTGGG GGAGTGGCCGTGGCTCCAACTCATGGGTCAATTCTCATTGAGTGTGCAAAGCGTGAGCTGCA TGCCACAACCCCTTTAAAGAATCCCAATAGGAATCACCCCACCAGGATCTCCCTCGTCTTTT ACCAGCATAAGAGCATGAATGAGCCAAAACATGGCTTGGCTCTTTGGGAAGCCAAAATGGCT GAAAAAGCCCGTGAGAAAGAGGAAGAGTGTGAAAAGTATGGCCCAGACTATGTGCCTCAGAA ATCCCATGGCAAAAAAGTGAAACGGGAGCCTGCTGAGCCACATGAAACTTCAGAGCCCACTT ACCTGCGTTTCATCAAGTCTCTTGCCGAAAGGACCATGTCCGTGACCACAGACTCCACAGTA ACTACATCTCCATATGCCTTCACTCGGGTCACAGGGCCTTACAACAGATATATATGA (SEQ ID NO: 16) SEQ ID NO: 17 – Wild type human TET2 isoform B amino acid sequence MEQDRTNHVEGNRLSPFLIPSPPICQTEPLATKLQNGSPLPERAHPEVNGDTKWHSFKSYYG IPCMKGSQNSRVSPDFTQESRGYSKCLQNGGIKRTVSEPSLSGLLQIKKLKQDQKANGERRN 
FGVSQERNPGESSQPNVSDLSDKKESVSSVAQENAVKDFTSFSTHNCSGPENPELQILNEQE GKSANYHDKNIVLLKNKAVLMPNGATVSASSVEHTHGELLEKTLSQYYPDCVSIAVQKTTSH INAINSQATNELSCEITHPSHTSGQINSAQTSNSELPPKPAAVVSEACDADDADNASKLAAM LNTCSFQKPEQLQQQKSVFEICPSPAENNIQGTTKLASGEEFCSGSSSNLQAPGGSSERYLK QNEMNGAYFKQSSVFTKDSFSATTTPPPPSQLLLSPPPPLPQVPQLPSEGKSTLNGGVLEEH HHYPNQSNTTLLREVKIEGKPEAPPSQSPNPSTHVCSPSPMLSERPQNNCVNRNDIQTAGTM TVPLCSEKTRPMSEHLKHNPPIFGSSGELQDNCQQLMRNKEQEILKGRDKEQTRDLVPPTQH YLKPGWIELKAPRFHQAESHLKRNEASLPSILQYQPNLSNQMTSKQYTGNSNMPGGLPRQAY TQKTTQLEHKSQMYQVEMNQGQSQGTVDQHLQFQKPSHQVHFSKTDHLPKAHVQSLCGTRFH FQQRADSQTEKLMSPVLKQHLNQQASETEPFSNSHLLQHKPHKQAAQTQPSQSSHLPQNQQQ QQKLQIKNKEEILQTFPHPQSNNDQQREGSFFGQTKVEECFHGENQYSKSSEFETHNVQMGL EEVQNINRRNSPYSQTMKSSACKIQVSCSNNTHLVSENKEQTTHPELFAGNKTQNLHHMQYF PNNVIPKQDLLHRCFQEQEQKSQQASVLQGYKNRNQDMSGQQAAQLAQQRYLIHNHANVFPV PDQGGSHTQTPPQKDTQKHAALRWHLLQKQEQQQTQQPQTESCHSQMHRPIKVEPGCKPHAC MHTAPPENKTWKKVTKQENPPASCDNVQQKSIIETMEQHLKQFHAKSLFDHKALTLKSQKQV KVEMSGPVTVLTRQTTAAELDSHTPALEQQTTSSEKTPTKRTAASVLNNFIESPSKLLDTPI KNLLDTPVKTQYDFPSCRCVGKCQKCTETHGVYPELANLSSDMGFSFFF (SEQ ID NO: 17)
SEQ ID NO: 18 – Wild type human TET2 isoform B polynucleotide coding sequence ATGGAACAGGATAGAACCAACCATGTTGAGGGCAACAGACTAAGTCCATTCCTGATACCATC ACCTCCCATTTGCCAGACAGAACCTCTGGCTACAAAGCTCCAGAATGGAAGCCCACTGCCTG AGAGAGCTCATCCAGAAGTAAATGGAGACACCAAGTGGCACTCTTTCAAAAGTTATTATGGA ATACCCTGTATGAAGGGAAGCCAGAATAGTCGTGTGAGTCCTGACTTTACACAAGAAAGTAG AGGGTATTCCAAGTGTTTGCAAAATGGAGGAATAAAACGCACAGTTAGTGAACCTTCTCTCT CTGGGCTCCTTCAGATCAAGAAATTGAAACAAGACCAAAAGGCTAATGGAGAAAGACGTAAC TTCGGGGTAAGCCAAGAAAGAAATCCAGGTGAAAGCAGTCAACCAAATGTCTCCGATTTGAG TGATAAGAAAGAATCTGTGAGTTCTGTAGCCCAAGAAAATGCAGTTAAAGATTTCACCAGTT TTTCAACACATAACTGCAGTGGGCCTGAAAATCCAGAGCTTCAGATTCTGAATGAGCAGGAG GGGAAAAGTGCTAATTACCATGACAAGAACATTGTATTACTTAAAAACAAGGCAGTGCTAAT GCCTAATGGTGCTACAGTTTCTGCCTCTTCCGTGGAACACACACATGGTGAACTCCTGGAAA AAACACTGTCTCAATATTATCCAGATTGTGTTTCCATTGCGGTGCAGAAAACCACATCTCAC ATAAATGCCATTAACAGTCAGGCTACTAATGAGTTGTCCTGTGAGATCACTCACCCATCGCA TACCTCAGGGCAGATCAATTCCGCACAGACCTCTAACTCTGAGCTGCCTCCAAAGCCAGCTG CAGTGGTGAGTGAGGCCTGTGATGCTGATGATGCTGATAATGCCAGTAAACTAGCTGCAATG CTAAATACCTGTTCCTTTCAGAAACCAGAACAACTACAACAACAAAAATCAGTTTTTGAGAT ATGCCCATCTCCTGCAGAAAATAACATCCAGGGAACCACAAAGCTAGCGTCTGGTGAAGAAT TCTGTTCAGGTTCCAGCAGCAATTTGCAAGCTCCTGGTGGCAGCTCTGAACGGTATTTAAAA CAAAATGAAATGAATGGTGCTTACTTCAAGCAAAGCTCAGTGTTCACTAAGGATTCCTTTTC TGCCACTACCACACCACCACCACCATCACAATTGCTTCTTTCTCCCCCTCCTCCTCTTCCAC AGGTTCCTCAGCTTCCTTCAGAAGGAAAAAGCACTCTGAATGGTGGAGTTTTAGAAGAACAC CACCACTACCCCAACCAAAGTAACACAACACTTTTAAGGGAAGTGAAAATAGAGGGTAAACC TGAGGCACCACCTTCCCAGAGTCCTAATCCATCTACACATGTATGCAGCCCTTCTCCGATGC TTTCTGAAAGGCCTCAGAATAATTGTGTGAACAGGAATGACATACAGACTGCAGGGACAATG ACTGTTCCATTGTGTTCTGAGAAAACAAGACCAATGTCAGAACACCTCAAGCATAACCCACC AATTTTTGGTAGCAGTGGAGAGCTACAGGACAACTGCCAGCAGTTGATGAGAAACAAAGAGC AAGAGATTCTGAAGGGTCGAGACAAGGAGCAAACACGAGATCTTGTGCCCCCAACACAGCAC TATCTGAAACCAGGATGGATTGAATTGAAGGCCCCTCGTTTTCACCAAGCGGAATCCCATCT AAAACGTAATGAGGCATCACTGCCATCAATTCTTCAGTATCAACCCAATCTCTCCAATCAAA TGACCTCCAAACAATACACTGGAAATTCCAACATGCCTGGGGGGCTCCCAAGGCAAGCTTAC 
ACCCAGAAAACAACACAGCTGGAGCACAAGTCACAAATGTACCAAGTTGAAATGAATCAAGG GCAGTCCCAAGGTACAGTGGACCAACATCTCCAGTTCCAAAAACCCTCACACCAGGTGCACT TCTCCAAAACAGACCATTTACCAAAAGCTCATGTGCAGTCACTGTGTGGCACTAGATTTCAT TTTCAACAAAGAGCAGATTCCCAAACTGAAAAACTTATGTCCCCAGTGTTGAAACAGCACTT GAATCAACAGGCTTCAGAGACTGAGCCATTTTCAAACTCACACCTTTTGCAACATAAGCCTC ATAAACAGGCAGCACAAACACAACCATCCCAGAGTTCACATCTCCCTCAAAACCAGCAACAG CAGCAAAAATTACAAATAAAGAATAAAGAGGAAATACTCCAGACTTTTCCTCACCCCCAAAG CAACAATGATCAGCAAAGAGAAGGATCATTCTTTGGCCAGACTAAAGTGGAAGAATGTTTTC ATGGTGAAAATCAGTATTCAAAATCAAGCGAGTTCGAGACTCATAATGTCCAAATGGGACTG GAGGAAGTACAGAATATAAATCGTAGAAATTCCCCTTATAGTCAGACCATGAAATCAAGTGC ATGCAAAATACAGGTTTCTTGTTCAAACAATACACACCTAGTTTCAGAGAATAAAGAACAGA CTACACATCCTGAACTTTTTGCAGGAAACAAGACCCAAAACTTGCATCACATGCAATATTTT CCAAATAATGTGATCCCAAAGCAAGATCTTCTTCACAGGTGCTTTCAAGAACAGGAGCAGAA GTCACAACAAGCTTCAGTTCTACAGGGATATAAAAATAGAAACCAAGATATGTCTGGTCAAC AAGCTGCGCAACTTGCTCAGCAAAGGTACTTGATACATAACCATGCAAATGTTTTTCCTGTG CCTGACCAGGGAGGAAGTCACACTCAGACCCCTCCCCAGAAGGACACTCAAAAGCATGCTGC TCTAAGGTGGCATCTCTTACAGAAGCAAGAACAGCAGCAAACACAGCAACCCCAAACTGAGT CTTGCCATAGTCAGATGCACAGGCCAATTAAGGTGGAACCTGGATGCAAGCCACATGCCTGT ATGCACACAGCACCACCAGAAAACAAAACATGGAAAAAGGTAACTAAGCAAGAGAATCCACC
TGCAAGCTGTGATAATGTGCAGCAAAAGAGCATCATTGAGACCATGGAGCAGCATCTGAAGC AGTTTCACGCCAAGTCGTTATTTGACCATAAGGCTCTTACTCTCAAATCACAGAAGCAAGTA AAAGTTGAAATGTCAGGGCCAGTCACAGTTTTGACTAGACAAACCACTGCTGCAGAACTTGA TAGCCACACCCCAGCTTTAGAGCAGCAAACAACTTCTTCAGAAAAGACACCAACCAAAAGAA CAGCTGCTTCTGTTCTCAATAATTTTATAGAGTCACCTTCCAAATTACTAGATACTCCTATA AAAAATTTATTGGATACACCTGTCAAGACTCAATATGATTTCCCATCTTGCAGATGTGTAGG TAAGTGCCAGAAATGTACTGAGACACATGGCGTTTATCCAGAATTAGCAAATTTATCTTCAG ATATGGGATTTTCCTTCTTTTTTTAA (SEQ ID NO: 18) SEQ ID NO: 19 - Human TET2 isoform A catalytic domain amino acid sequence SVLNNFIESPSKLLDTPIKNLLDTPVKTQYDFPSCRCVEQIIEKDEGPFYTHLGAGPNVAAI REIMEERFGQKGKAIRIERVIYTGKEGKSSQGCPIAKWVVRRSSSEEKLLCLVRERAGHTCE AAVIVILILVWEGIPLSLADKLYSELTETLRKYGTLTNRRCALNEERTCACQGLDPETCGAS FSFGCSWSMYYNGCKFARSKIPRKFKLLGDDPKEEEKLESHLQNLSTLMAPTYKKLAPDAYN NQIEYEHRAPECRLGLKEGRPFSGVTACLDFCAHAHRDLHNMQNGSTLVCTLTREDNREFGG KPEDEQLHVLPLYKVSDVDEFGSVEAQEEKKRSGAIQVLSSFRRKVRMLAEPVKTCRQRKLE AKKAAAEKLSSLENSSNKNEKEKSAPSRTKQTENASQAKQLAELLRLSGPVMQQSQQPQPLQ KQPPQPQQQQRPQQQQPHHPQTESVNSYSASGSTNPYMRRPNPVSPYPNSSHTSDIYGSTSP MNFYSTSSQAAGSYLNSSNPMNPYPGLLNQNTQYPSYQCNGNLSVDNCSPYLGSYSPQSQPM DLYRYPSQDPLSKLSLPPIHTLYQPRFGNSQSFTSKYLGYGNQNMQGDGFSSCTIRPNVHHV GKLPPYPTHEMDGHFMGATSRLPPNLSNPNMDYKNGEHHSPSHIIHNYSAAPGMFNSSLHAL HLQNKENDMLSHTANGLSKMLPALNHDRTACVQGGLHKLSDANGQEKQPLALVQGVASGAED NDEVWSDSEQSFLDPDIGGVAVAPTHGSILIECAKRELHATTPLKNPNRNHPTRISLVFYQH KSMNEPKHGLALWEAKMAEKAREKEEECEKY (SEQ ID NO: 19) SEQ ID NO: 20 – Human TET2 isoform A catalytic domain polynucleotide coding sequence TCTGTTCTCAATAATTTTATAGAGTCACCTTCCAAATTACTAGATACTCCTATAAAAAATTT ATTGGATACACCTGTCAAGACTCAATATGATTTCCCATCTTGCAGATGTGTAGAGCAAATTA TTGAAAAAGATGAAGGTCCTTTTTATACCCATCTAGGAGCAGGTCCTAATGTGGCAGCTATT AGAGAAATCATGGAAGAAAGGTTTGGACAGAAGGGTAAAGCTATTAGGATTGAAAGAGTCAT CTATACTGGTAAAGAAGGCAAAAGTTCTCAGGGATGTCCTATTGCTAAGTGGGTGGTTCGCA GAAGCAGCAGTGAAGAGAAGCTACTGTGTTTGGTGCGGGAGCGAGCTGGCCACACCTGTGAG GCTGCAGTGATTGTGATTCTCATCCTGGTGTGGGAAGGAATCCCGCTGTCTCTGGCTGACAA 
ACTCTACTCGGAGCTTACCGAGACGCTGAGGAAATACGGCACGCTCACCAATCGCCGGTGTG CCTTGAATGAAGAGAGAACTTGCGCCTGTCAGGGGCTGGATCCAGAAACCTGTGGTGCCTCC TTCTCTTTTGGTTGTTCATGGAGCATGTACTACAATGGATGTAAGTTTGCCAGAAGCAAGAT CCCAAGGAAGTTTAAGCTGCTTGGGGATGACCCAAAAGAGGAAGAGAAACTGGAGTCTCATT TGCAAAACCTGTCCACTCTTATGGCACCAACATATAAGAAACTTGCACCTGATGCATATAAT AATCAGATTGAATATGAACACAGAGCACCAGAGTGCCGTCTGGGTCTGAAGGAAGGCCGTCC ATTCTCAGGGGTCACTGCATGTTTGGACTTCTGTGCTCATGCCCACAGAGACTTGCACAACA TGCAGAATGGCAGCACATTGGTATGCACTCTCACTAGAGAAGACAATCGAGAATTTGGAGGA AAACCTGAGGATGAGCAGCTTCACGTTCTGCCTTTATACAAAGTCTCTGACGTGGATGAGTT TGGGAGTGTGGAAGCTCAGGAGGAGAAAAAACGGAGTGGTGCCATTCAGGTACTGAGTTCTT TTCGGCGAAAAGTCAGGATGTTAGCAGAGCCAGTCAAGACTTGCCGACAAAGGAAACTAGAA GCCAAGAAAGCTGCAGCTGAAAAGCTTTCCTCCCTGGAGAACAGCTCAAATAAAAATGAAAA GGAAAAGTCAGCCCCATCACGTACAAAACAAACTGAAAACGCAAGCCAGGCTAAACAGTTGG CAGAACTTTTGCGACTTTCAGGACCAGTCATGCAGCAGTCCCAGCAGCCCCAGCCTCTACAG AAGCAGCCACCACAGCCCCAGCAGCAGCAGAGACCCCAGCAGCAGCAGCCACATCACCCTCA GACAGAGTCTGTCAACTCTTATTCTGCTTCTGGATCCACCAATCCATACATGAGACGGCCCA ATCCAGTTAGTCCTTATCCAAACTCTTCACACACTTCAGATATCTATGGAAGCACCAGCCCT
ATGAACTTCTATTCCACCTCATCTCAAGCTGCAGGTTCATATTTGAATTCTTCTAATCCCAT GAACCCTTACCCTGGGCTTTTGAATCAGAATACCCAATATCCATCATATCAATGCAATGGAA ACCTATCAGTGGACAACTGCTCCCCATATCTGGGTTCCTATTCTCCCCAGTCTCAGCCGATG GATCTGTATAGGTATCCAAGCCAAGACCCTCTGTCTAAGCTCAGTCTACCACCCATCCATAC ACTTTACCAGCCAAGGTTTGGAAATAGCCAGAGTTTTACATCTAAATACTTAGGTTATGGAA ACCAAAATATGCAGGGAGATGGTTTCAGCAGTTGTACCATTAGACCAAATGTACATCATGTA GGGAAATTGCCTCCTTATCCCACTCATGAGATGGATGGCCACTTCATGGGAGCCACCTCTAG ATTACCACCCAATCTGAGCAATCCAAACATGGACTATAAAAATGGTGAACATCATTCACCTT CTCACATAATCCATAACTACAGTGCAGCTCCGGGCATGTTCAACAGCTCTCTTCATGCCCTG CATCTCCAAAACAAGGAGAATGACATGCTTTCCCACACAGCTAATGGGTTATCAAAGATGCT TCCAGCTCTTAACCATGATAGAACTGCTTGTGTCCAAGGAGGCTTACACAAATTAAGTGATG CTAATGGTCAGGAAAAGCAGCCATTGGCACTAGTCCAGGGTGTGGCTTCTGGTGCAGAGGAC AACGATGAGGTCTGGTCAGACAGCGAGCAGAGCTTTCTGGATCCTGACATTGGGGGAGTGGC CGTGGCTCCAACTCATGGGTCAATTCTCATTGAGTGTGCAAAGCGTGAGCTGCATGCCACAA CCCCTTTAAAGAATCCCAATAGGAATCACCCCACCAGGATCTCCCTCGTCTTTTACCAGCAT AAGAGCATGAATGAGCCAAAACATGGCTTGGCTCTTTGGGAAGCCAAAATGGCTGAAAAAGC CCGTGAGAAAGAGGAAGAGTGTGAAAAGTATGGCCCAGACTATGTGCCTCAGAAATCCCATG GCAAAAAAGTGAAACGGGAGCCTGCTGAGCCACATGAAACTTCAGAGCCCACTTACCTGCGT TTCATCAAGTCTCTTGCCGAAAGGACCATGTCCGTGACCACAGACTCCACAGTAACTACATC TCCATATGCCTTCACTCGGGTCACAGGGCCTTACAACAGATATATATGA (SEQ ID NO: 20) SEQ ID NO: 21 – Mouse TET2 catalytic domain amino acid sequence GSTSNGRQCAGIRPLQSQNGKCEGCNPDKDEAPYYTHLGAGPDVAAIRTLMEERYGEKGKAI RIEKVIYTGKEGKSSQGCPIAKWVYRRSSEEEKLLCLVRVRPNHTCETAVMVIAIMLWDGIP KLLASELYSELTDILGKCGICTNRRCSQNETKKKQSPPRNCCCQGENPETCGASFSFGCSWS MYYNGCKFARSKKPRKFRLHGAEPKEEERLGSHLQNLATVIAPIYKKLAPDAYNNQVEFEHQ APDCCLGLKEGRPFSGVTACLDFSAHSHRDQQNMPNGSTVVVTLNREDNREVGAKPEDEQFH VLPMYIIAPEDEFGSTEGQEKKIRMGSIEVLQSFRRRRVIRIGELPKSCKKKAEPKKAKTKK AARKHSSLENCSSRTEKGKSSSHTKLMENASHMKQMTAQPQLSGPVIRQPPTLQRHLQQGQR PQQPQPPQPQPQTTPQPQPQPQHIMPGNSQSVGSHCSGSTSVYTRQPTPHSPYPSSAHTSDI YGDTNHVNFYPTSSHASGSYLNPSNYMNPYLGLLNQNNQYAPFPYNGSVPVDNGSPFLGSYS PQAQSRDLHRYPNQDHLTNQNLPPIHTLHQQTFGDSPSKYLSYGNQNMQRDAFTTNSTLKPN 
VHHLATFSPYPTPKMDSHFMGAASRSPYSHPHTDYKTSEHHLPSHTVYSYTAAASGSSSSHA FHNKENDNIANGLSRVLPGFNHDRTASAQELLYSLTGSSQEKQPEVSGQDAAAVQEIEYWSD SEHNFQDPCIGGVAIAPTHGSILIECAKCEVHATTKVNDPDRNHPTRISLVLYRHKNLFLPK HCLALWEAKMAEKARKEEECGKNGSDHVSQKNHGKQEKREPTGPQEPSYLRFIQSLAENTGS VTTDSTVTTSPYAFTQVTGPYNTFV (SEQ ID NO: 21) SEQ ID NO: 22 – Mouse TET2 catalytic domain polynucleotide coding sequence ggatccactagtaacggccgccagtgtgctggaattcgccctttacaaagtcagaatggcaa atgtgaaggatgcaatccagacaaagatgaagctccttattatacccatctgggagctggtc ctgatgtggcagctattagaacactcatggaagaaaggtatggagagaagggtaaagctatt aggattgaaaaagtcatatatactggtaaagaaggcaagagctctcagggatgtcctattgc taaatgggtatatcggagatcgagtgaggaggagaaactactgtgtttggtacgagtgcgac ctaaccacacatgtgagacggcggtgatggtaattgccatcatgttgtgggacggaatccca aagctactcgcatcagaactctactcagaacttacagatatcttgggcaagtgtggcatatg caccaaccgtcgctgttctcagaatgaaacgaagaaaaagcaatcaccacccagaaactgtt gttgtcagggtgagaatccagagacctgtggtgcctccttttcttttggttgttcttggagc atgtactataatggatgtaagtttgccagaagcaagaaaccaaggaaatttaggctacatgg agctgagccaaaagaggaagagagactaggttctcatttgcaaaacctggctactgtcattg
ctccaatatacaagaagcttgcacccgatgcatacaataatcaggttgaatttgaacaccaa gccccagactgctgtttgggtctgaaggaaggccggccattctcaggagtcactgcatgttt ggacttctctgctcattcccacagagaccagcagaacatgccaaatggcagtacagtggtgg tcaccctcaatagagaagacaatcgagaagtcggagctaagcctgaggatgagcagttccac gtgctgcctatgtacatcatcgcccctgaggatgagtttgggagtacggaaggccaggagaa gaagatacggatggggtccattgaggttctgcagtcatttcggaggagaagggtcataagga taggagagctgcccaagagttgcaagaagaaagcggagcccaagaaagccaagaccaagaaa gcagctcgaaagcattcctctctggagaactgctccagtaggactgagaagggaaagtcttc ctcacatacaaagctgatggaaaatgcaagccatatgaaacaaatgacagcacaaccgcagc tttcgggcccggtcatccggcagccaccaacactccagaggcaccttcagcaagggcagagg ccacagcagccgcagccacctcagccgcagccgcagacgacacctcagccacagccacagcc acagcatatcatgcccggtaactctcagtctgttggttctcattgttctggatccaccagtg tctacacgagacagcctactcctcacagtccttatcccagctcagcacacacctcagatatt tatggagataccaaccatgtgaacttttaccccacttcatctcatgcctcgggttcatattt gaatccttctaattacatgaacccctaccttgggcttttgaatcagaataaccaatatgcac cttttccatacaatgggagtgtgccagtggacaatggttcccctttcttaggttcttattcc ccccaggctcagtccagggatctacatagatatccaaaccaggaccatctcaccaatcagaa cttaccacccatccacacccttcaccaacagacgtttggggacagtccctctaagtacttaa gttatggaaaccaaaatatgcagagagatgccttcactactaactccaccctaaaaccaaat gtacaccacctagcaacgttttctccttaccccacccccaagatggatagtcatttcatggg agctgcctccagatcaccatacagccacccacacactgactacaaaaccagtgagcatcatc taccctctcacacggtctacagctacacggcagcagcttcggggagcagttccagccacgcc ttccacaacaaggagaatgacaacatagccaatgggctctcaagagtgcttccagggtttaa tcatgatagaactgcttctgcccaagaactattatacagtctgactggcagcagtcaggaga agcagcctgaggtgtcaggccaggatgcagctgctgtgcaggaaattgagtattggtcagat agtgagcacaactttcaggatccttgcattggaggggtggctatagccccaactcatgggtc aattcttattgagtgtgcaaagtgtgaggttcatgccacaaccaaagtaaacgatcccgacc ggaatcaccccaccaggatctcacttgtactgtataggcataagaatttgtttctaccaaaa cattgtttggctctctgggaagccaaaatggctgaaaaggcccggaaagaggaagagtgcgg aaagaatggatcagaccacgtgtctcagaaaaatcatggcaaacaggaaaagcgtgagccca cagggccacaggaacccagttacctgcgtttcatccagtctcttgctgagaacacagggtct 
gtgactacggattctaccgtgactacatcaccatatgctttcactcaggtcacagggcctta caacacatttgta (SEQ ID NO: 22) SEQ ID NO: 23 – Mouse TET2 catalytic domain (dead) amino acid sequence GSTSNGRQCAGIRPLQSQNGKCEGCNPDKDEAPYYTHLGAGPDVAAIRTLMEERYGEKGKAI RIEKVIYTGKEGKSSQGCPIAKWVYRRSSEEEKLLCLVRVRPNHTCETAVMVIAIMLWDGIP KLLASELYSELTDILGKCGICTNRRCSQNETKKKQSPPRNCCCQGENPETCGASFSFGCSWS MYYNGCKFARSKKPRKFRLHGAEPKEEERLGSHLQNLATVIAPIYKKLAPDAYNNQVEFEHQ APDCCLGLKEGRPFSGVTACLDFSAHSYRAQQNMPNGSTVVVTLNREDNREVGAKPEDEQFH VLPMYIIAPEDEFGSTEGQEKKIRMGSIEVLQSFRRRRVIRIGELPKSCKKKAEPKKAKTKK AARKHSSLENCSSRTEKGKSSSHTKLMENASHMKQMTAQPQLSGPVIRQPPTLQRHLQQGQR PQQPQPPQPQPQTTPQPQPQPQHIMPGNSQSVGSHCSGSTSVYTRQPTPHSPYPSSAHTSDI YGDTNHVNFYPTSSHASGSYLNPSNYMNPYLGLLNQNNQYAPFPYNGSVPVDNGSPFLGSYS PQAQSRDLHRYPNQDHLTNQNLPPIHTLHQQTFGDSPSKYLSYGNQNMQRDAFTTNSTLKPN VHHLATFSPYPTPKMDSHFMGAASRSPYSHPHTDYKTSEHHLPSHTVYSYTAAASGSSSSHA FHNKENDNIANGLSRVLPGFNHDRTASAQELLYSLTGSSQEKQPEVSGQDAAAVQEIEYWSD SEHNFQDPCIGGVAIAPTHGSILIECAKCEVHATTKVNDPDRNHPTRISLVLYRHKNLFLPK HCLALWEAKMAEKARKEEECGKNGSDHVSQKNHGKQEKREPTGPQEPSYLRFIQSLAENTGS VTTDSTVTTSPYAFTQVTGPYNTFV (SEQ ID NO: 23)
SEQ ID NO: 24 – Mouse TET2 catalytic domain (dead) polynucleotide coding sequence GGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGCCCTTTACAAAGTCAGAATGGCAA ATGTGAAGGATGCAATCCAGACAAAGATGAAGCTCCTTATTATACCCATCTGGGAGCTGGTC CTGATGTGGCAGCTATTAGAACACTCATGGAAGAAAGGTATGGAGAGAAGGGTAAAGCTATT AGGATTGAAAAAGTCATATATACTGGTAAAGAAGGCAAGAGCTCTCAGGGATGTCCTATTGC TAAATGGGTATATCGGAGATCGAGTGAGGAGGAGAAACTACTGTGTTTGGTACGAGTGCGAC CTAACCACACATGTGAGACGGCGGTGATGGTAATTGCCATCATGTTGTGGGACGGAATCCCA AAGCTACTCGCATCAGAACTCTACTCAGAACTTACAGATATCTTGGGCAAGTGTGGCATATG CACCAACCGTCGCTGTTCTCAGAATGAAACGAAGAAAAAGCAATCACCACCCAGAAACTGTT GTTGTCAGGGTGAGAATCCAGAGACCTGTGGTGCCTCCTTTTCTTTTGGTTGTTCTTGGAGC ATGTACTATAATGGATGTAAGTTTGCCAGAAGCAAGAAACCAAGGAAATTTAGGCTACATGG AGCTGAGCCAAAAGAGGAAGAGAGACTAGGTTCTCATTTGCAAAACCTGGCTACTGTCATTG CTCCAATATACAAGAAGCTTGCACCCGATGCATACAATAATCAGGTTGAATTTGAACACCAA GCCCCAGACTGCTGTTTGGGTCTGAAGGAAGGCCGGCCATTCTCAGGAGTCACTGCATGTTT GGACTTCTCTGCTCATTCCTACAGAGCCCAGCAGAACATGCCAAATGGCAGTACAGTGGTGG TCACCCTCAATAGAGAAGACAATCGAGAAGTCGGAGCTAAGCCTGAGGATGAGCAGTTCCAC GTGCTGCCTATGTACATCATCGCCCCTGAGGATGAGTTTGGGAGTACGGAAGGCCAGGAGAA GAAGATACGGATGGGGTCCATTGAGGTTCTGCAGTCATTTCGGAGGAGAAGGGTCATAAGGA TAGGAGAGCTGCCCAAGAGTTGCAAGAAGAAAGCGGAGCCCAAGAAAGCCAAGACCAAGAAA GCAGCTCGAAAGCATTCCTCTCTGGAGAACTGCTCCAGTAGGACTGAGAAGGGAAAGTCTTC CTCACATACAAAGCTGATGGAAAATGCAAGCCATATGAAACAAATGACAGCACAACCGCAGC TTTCGGGCCCGGTCATCCGGCAGCCACCAACACTCCAGAGGCACCTTCAGCAAGGGCAGAGG CCACAGCAGCCGCAGCCACCTCAGCCGCAGCCGCAGACGACACCTCAGCCACAGCCACAGCC ACAGCATATCATGCCCGGTAACTCTCAGTCTGTTGGTTCTCATTGTTCTGGATCCACCAGTG TCTACACGAGACAGCCTACTCCTCACAGTCCTTATCCCAGCTCAGCACACACCTCAGATATT TATGGAGATACCAACCATGTGAACTTTTACCCCACTTCATCTCATGCCTCGGGTTCATATTT GAATCCTTCTAATTACATGAACCCCTACCTTGGGCTTTTGAATCAGAATAACCAATATGCAC CTTTTCCATACAATGGGAGTGTGCCAGTGGACAATGGTTCCCCTTTCTTAGGTTCTTATTCC CCCCAGGCTCAGTCCAGGGATCTACATAGATATCCAAACCAGGACCATCTCACCAATCAGAA CTTACCACCCATCCACACCCTTCACCAACAGACGTTTGGGGACAGTCCCTCTAAGTACTTAA GTTATGGAAACCAAAATATGCAGAGAGATGCCTTCACTACTAACTCCACCCTAAAACCAAAT 
GTACACCACCTAGCAACGTTTTCTCCTTACCCCACCCCCAAGATGGATAGTCATTTCATGGG AGCTGCCTCCAGATCACCATACAGCCACCCACACACTGACTACAAAACCAGTGAGCATCATC TACCCTCTCACACGGTCTACAGCTACACGGCAGCAGCTTCGGGGAGCAGTTCCAGCCACGCC TTCCACAACAAGGAGAATGACAACATAGCCAATGGGCTCTCAAGAGTGCTTCCAGGGTTTAA TCATGATAGAACTGCTTCTGCCCAAGAACTATTATACAGTCTGACTGGCAGCAGTCAGGAGA AGCAGCCTGAGGTGTCAGGCCAGGATGCAGCTGCTGTGCAGGAAATTGAGTATTGGTCAGAT AGTGAGCACAACTTTCAGGATCCTTGCATTGGAGGGGTGGCTATAGCCCCAACTCATGGGTC AATTCTTATTGAGTGTGCAAAGTGTGAGGTTCATGCCACAACCAAAGTAAACGATCCCGACC GGAATCACCCCACCAGGATCTCACTTGTACTGTATAGGCATAAGAATTTGTTTCTACCAAAA CATTGTTTGGCTCTCTGGGAAGCCAAAATGGCTGAAAAGGCCCGGAAAGAGGAAGAGTGCGG AAAGAATGGATCAGACCACGTGTCTCAGAAAAATCATGGCAAACAGGAAAAGCGTGAGCCCA CAGGGCCACAGGAACCCAGTTACCTGCGTTTCATCCAGTCTCTTGCTGAGAACACAGGGTCT GTGACTACGGATTCTACCGTGACTACATCACCATATGCTTTCACTCAGGTCACAGGGCCTTA CAACACATTTGTATGA (SEQ ID NO: 24) A. Targeting Proteins and Polypeptides [0182] In certain aspects, proteins capable of RNA m5C installation, reading/binding, and/or erasing, or polypeptides derived therefrom, are guided to a target RNA molecule (such
as, but not limited to, an RNA molecule comprising an m5C RNA feature) by a targeting element comprising a protein, polypeptide, and/or RNA molecule. In some aspects, a targeting element comprising a protein and/or polypeptide can be guided to a target RNA element by a complementary, or at least partially complementary, RNA molecule.
[0183] In some aspects, a targeting element comprises a Cas protein. In some aspects, a targeting element comprises an engineered Cas protein with reduced and/or absent gene-editing activity relative to non-engineered Cas proteins; such proteins may be referred to as “dead” Cas (dCas) proteins. In some aspects, dCas proteins, such as but not limited to dCas13d, can have less than or equal to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12, or any range derivable therein, less potent gene-editing activity relative to non-engineered parental Cas proteins. In some aspects, a Cas protein can be, but is not limited to, a wild-type and/or modified variant of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cas13a, Cas13b, Cas13c, Cas13d, homologs thereof, or modified versions thereof. These enzymes are known; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2. In some aspects, a Cas protein can be, or can be derived from, a type I, type II, type III, type IV, type V, and/or type VI CRISPR system.
[0184] In some aspects, a targeting element, such as a targeting protein, can be linked to a protein (or polypeptide derived therefrom) capable of RNA m5C installation, reading (e.g., binding), and/or erasing (e.g., oxidation).
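As an illustration only, the arrangement of optional tags around a targeting protein and an effector domain can be located programmatically in a fusion sequence such as SEQ ID NO: 27, where the underlining that marks the tags is not preserved in plain text. The following is a minimal sketch, not part of the disclosed methods. The motif strings are the commonly used HA tag, SV40 NLS, and Flag tag sequences and are assumed here to correspond to the optional tags of SEQ ID NO: 27; the function name `locate_motifs` and the toy sequence are hypothetical.

```python
# Commonly used tag/NLS sequences (assumed to match the optional tags
# described for SEQ ID NO: 27; not stated explicitly in the text).
MOTIFS = {
    "HA tag": "YPYDVPDYA",
    "SV40 NLS": "PKKKRKV",
    "Flag tag": "DYKDDDDK",
}


def locate_motifs(sequence, motifs=MOTIFS):
    """Return {name: (start, end)} for the first occurrence of each motif.

    Positions are 1-based and inclusive, as is conventional for residue
    numbering. Whitespace in the input (e.g., from a wrapped sequence
    listing) is removed before searching. Motifs that are absent are
    omitted from the result.
    """
    seq = "".join(sequence.split()).upper()
    found = {}
    for name, motif in motifs.items():
        idx = seq.find(motif)
        if idx != -1:
            found[name] = (idx + 1, idx + len(motif))
    return found


# Toy fusion for demonstration: the first 18 residues of SEQ ID NO: 27,
# followed by short placeholder fragments standing in for the full
# dCas13d and TET2 catalytic domain sequences (truncated on purpose).
toy_fusion = "MYPYDVPDYASPKKKRKV" + "NIPALVENQK" + "DYKDDDDK" + "GSTSNGRQCA"

for name, (start, end) in locate_motifs(toy_fusion).items():
    print(f"{name}: residues {start}-{end}")
```

Applied to a full-length fusion sequence, the same search would report where each optional tag sits relative to the targeting protein and the effector domain; only the first occurrence of each motif is reported.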
In some aspects, a targeting protein can be directly fused to a protein (or polypeptide derived therefrom) capable of RNA m5C installation, reading (e.g., binding), and/or erasing (e.g., oxidation). In some aspects, a targeting protein can be linked, via a linker, to a protein (or polypeptide derived therefrom) capable of RNA m5C installation, reading (e.g., binding), and/or erasing (e.g., oxidation). In some aspects, a linker comprises a polypeptide. In some aspects, a linker polypeptide comprises or is a flexible polypeptide. In some aspects, a linker polypeptide comprises a sequence that is, is about, or is at least about 80%, 85%, 90%, 95%, 98%, 99%, 100%, or any range derivable therein, identical to SEQ ID NO: 33.
[0185] In some aspects, a targeting element comprises, consists essentially of, or consists of an amino acid sequence, or is encoded by a polynucleotide sequence, that has, has about, or has at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%,
90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to any one of SEQ ID NOs: 25-30. SEQ ID NO: 25 - Exemplary catalytically dead Cas13d (dCas13d) amino acid sequence NIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLWFHPVMSHLYNAKN GYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLK MYRDLTNAYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRNMNERYGYKTEDLAFIQDK RFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRL PIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELFTTLSA EKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHVNMGKLRYLLKADKTCI DGQTRVRVIEQPLNGFGRLEEAETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVDT YTHYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMFLFGSKK TEKLIVDVHNRYKRLFQAMQKEEVTAENIASFGIAESDLPQKILDLISGNAHGKDVDAFIRL TVDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLFQPSVNDGE NKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGKGTTEPHPFLYKVFARSIPAN AVEFYERYLIERKFYLTGLSNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELP RQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLDDDFQTFYQWNRNYRYMDMLKG EYDRKGSLQHCFTSVEEREGLWKERASRTERYRKQASNKIRSNRQMRNASSEEIETILDKRL SNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEIMPDAEKGILSEIMP MSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGNLLELVGSDIVSKEDGSKRTADGSE F (SEQ ID NO: 25) SEQ ID NO: 26 - Exemplary catalytically dead Cas13d (dCas13d) polynucleotide sequence aacatccccgctctggtggaaaaccagaagaagtactttggcacctacagcgtgatggccat gctgaacgctcagaccgtgctggaccacatccagaaggtggccgatattgagggcgagcaga acgagaacaacgagaatctgtggtttcaccccgtgatgagccacctgtacaacgccaagaac ggctacgacaagcagcccgagaaaaccatgttcatcatcgagcggctgcagagctacttccc attcctgaagatcatggccgagaaccagagagagtacagcaacggcaagtacaagcagaacc gcgtggaagtgaacagcaacgacatcttcgaggtgctgaagcgcgccttcggcgtgctgaag atgtacagggacctgaccaacgcatacaagacctacgaggaaaagctgaacgacggctgcga gttcctgaccagcacagagcaacctctgagcggcatgatcaacaactactacacagtggccc tgcggaacatgaacgagagatacggctacaagacagaggacctggccttcatccaggacaag cggttcaagttcgtgaaggacgcctacggcaagaaaaagtcccaagtgaataccggattctt 
cctgagcctgcaggactacaacggcgacacacagaagaagctgcacctgagcggagtgggaa tcgccctgctgatctgcctgttcctggacaagcagtacatcaacatctttctgagcaggctg cccatcttctccagctacaatgcccagagcgaggaacggcggatcatcatcagatccttcgg catcaacagcatcaagctgcccaaggaccggatccacagcgagaagtccaacaagagcgtgg ccatggatatgctcaacgaagtgaagcggtgccccgacgagctgttcacaacactgtctgcc gagaagcagtcccggttcagaatcatcagcgacgaccacaatgaagtgctgatgaagcggag cagcgacagattcgtgcctctgctgctgcagtatatcgattacggcaagctgttcgaccaca tcaggttccacgtgaacatgggcaagctgagatacctgctgaaggccgacaagacctgcatc gacggccagaccagagtcagagtgatcgagcagcccctgaacggcttcggcagactggaaga ggccgagacaatgcggaagcaagagaacggcaccttcggcaacagcggcatccggatcagag acttcgagaacatgaagcgggacgacgccaatcctgccaactatccctacatcgtggacacc tacacacactacatcctggaaaacaacaaggtcgagatgtttatcaacgacaaagaggacag cgccccactgctgcccgtgatcgaggatgatagatacgtggtcaagacaatccccagctgcc ggatgagcaccctggaaattccagccatggccttccacatgtttctgttcggcagcaagaaa accgagaagctgatcgtggacgtgcacaaccggtacaagagactgttccaggccatgcagaa agaagaagtgaccgccgagaatatcgccagcttcggaatcgccgagagcgacctgcctcaga
agatcctggatctgatcagcggcaatgcccacggcaaggatgtggacgccttcatcagactg accgtggacgacatgctgaccgacaccgagcggagaatcaagagattcaaggacgaccggaa gtccattcggagcgccgacaacaagatgggaaagagaggcttcaagcagatctccacaggca agctggccgacttcctggccaaggacatcgtgctgtttcagcccagcgtgaacgatggcgag aacaagatcaccggcctgaactaccggatcatgcagagcgccattgccgtgtacgatagcgg cgacgattacgaggccaagcagcagttcaagctgatgttcgagaaggcccggctgatcggca agggcacaacagagcctcatccatttctgtacaaggtgttcgcccgcagcatccccgccaat gccgtcgagttctacgagcgctacctgatcgagcggaagttctacctgaccggcctgtccaa cgagatcaagaaaggcaacagagtggatgtgcccttcatccggcgggaccagaacaagtgga aaacacccgccatgaagaccctgggcagaatctacagcgaggatctgcccgtggaactgccc agacagatgttcgacaatgagatcaagtcccacctgaagtccctgccacagatggaaggcat cgacttcaacaatgccaacgtgacctatctgatcgccgagtacatgaagagagtgctggacg acgacttccagaccttctaccagtggaaccgcaactaccggtacatggacatgcttaagggc gagtacgacagaaagggctccctgcagcactgcttcaccagcgtggaagagagagaaggcct ctggaaagagcgggcctccagaacagagcggtacagaaagcaggccagcaacaagatccgca gcaaccggcagatgagaaacgccagcagcgaagagatcgagacaatcctggataagcggctg agcaacagccggaacgagtaccagaaaagcgagaaagtgatccggcgctacagagtgcagga tgccctgctgtttctgctggccaaaaagaccctgaccgaactggccgatttcgacggcgaga ggttcaaactgaaagaaatcatgcccgacgccgagaagggaatcctgagcgagatcatgccc atgagcttcaccttcgagaaaggcggcaagaagtacaccatcaccagcgagggcatgaagct gaagaactacggcgacttctttgtgctggctagcgacaagaggatcggcaacctgctggaac tcgtgggcagcgacatcgtgtccaaagaggatggatccaaaagaaccgccgacggcagcgaa ttc (SEQ ID NO: 26) SEQ ID NO: 27 - Exemplary catalytically dead Cas13d (dCas13d) fused to mouse TET2 catalytic domain (TET(CD)) amino acid (shown with optional HA tag, SV40 NLS, and Flag Tag, each of which are underlined) MYPYDVPDYASPKKKRKVNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNEN NENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVE VNSNDIFEVLKRAFGVLKMYRDLTNAYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRN MNERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIAL LICLFLDKQYINIFLSRLPIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMD MLNEVKRCPDELFTTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRF 
HVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEEAETMRKQENGTFGNSGIRIRDFE NMKRDDANPANYPYIVDTYTHYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMS TLEIPAMAFHMFLFGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAENIASFGIAESDLPQKIL DLISGNAHGKDVDAFIRLTVDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLA DFLAKDIVLFQPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGKGT TEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTGLSNEIKKGNRVDVPFIRRDQNKWKTP AMKTLGRIYSEDLPVELPRQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLDDDF QTFYQWNRNYRYMDMLKGEYDRKGSLQHCFTSVEEREGLWKERASRTERYRKQASNKIRSNR QMRNASSEEIETILDKRLSNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFK LKEIMPDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGNLLELVG SDIVSKEDGSKRTADGSEFDYKDDDDKGSTSNGRQCAGIRPLQSQNGKCEGCNPDKDEAPYY THLGAGPDVAAIRTLMEERYGEKGKAIRIEKVIYTGKEGKSSQGCPIAKWVYRRSSEEEKLL CLVRVRPNHTCETAVMVIAIMLWDGIPKLLASELYSELTDILGKCGICTNRRCSQNETKKKQ SPPRNCCCQGENPETCGASFSFGCSWSMYYNGCKFARSKKPRKFRLHGAEPKEEERLGSHLQ NLATVIAPIYKKLAPDAYNNQVEFEHQAPDCCLGLKEGRPFSGVTACLDFSAHSHRDQQNMP NGSTVVVTLNREDNREVGAKPEDEQFHVLPMYIIAPEDEFGSTEGQEKKIRMGSIEVLQSFR RRRVIRIGELPKSCKKKAEPKKAKTKKAARKHSSLENCSSRTEKGKSSSHTKLMENASHMKQ MTAQPQLSGPVIRQPPTLQRHLQQGQRPQQPQPPQPQPQTTPQPQPQPQHIMPGNSQSVGSH
CSGSTSVYTRQPTPHSPYPSSAHTSDIYGDTNHVNFYPTSSHASGSYLNPSNYMNPYLGLLN QNNQYAPFPYNGSVPVDNGSPFLGSYSPQAQSRDLHRYPNQDHLTNQNLPPIHTLHQQTFGD SPSKYLSYGNQNMQRDAFTTNSTLKPNVHHLATFSPYPTPKMDSHFMGAASRSPYSHPHTDY KTSEHHLPSHTVYSYTAAASGSSSSHAFHNKENDNIANGLSRVLPGFNHDRTASAQELLYSL TGSSQEKQPEVSGQDAAAVQEIEYWSDSEHNFQDPCIGGVAIAPTHGSILIECAKCEVHATT KVNDPDRNHPTRISLVLYRHKNLFLPKHCLALWEAKMAEKARKEEECGKNGSDHVSQKNHGK QEKREPTGPQEPSYLRFIQSLAENTGSVTTDSTVTTSPYAFTQVTGPYNTFVV (SEQ ID NO: 27) SEQ ID NO: 28 - Exemplary catalytically dead Cas13d (dCas13d) fused to mouse TET2 catalytic domain (TET(CD)) polynucleotide coding sequence (shown with optional HA tag, SV40 NLS, and Flag Tag, each of which are underlined) ATGtacccatacgatgttccagattacgcttcgccaaagaagaagcggaaagtcaacatccc cgctctggtggaaaaccagaagaagtactttggcacctacagcgtgatggccatgctgaacg ctcagaccgtgctggaccacatccagaaggtggccgatattgagggcgagcagaacgagaac aacgagaatctgtggtttcaccccgtgatgagccacctgtacaacgccaagaacggctacga caagcagcccgagaaaaccatgttcatcatcgagcggctgcagagctacttcccattcctga agatcatggccgagaaccagagagagtacagcaacggcaagtacaagcagaaccgcgtggaa gtgaacagcaacgacatcttcgaggtgctgaagcgcgccttcggcgtgctgaagatgtacag ggacctgaccaacgcatacaagacctacgaggaaaagctgaacgacggctgcgagttcctga ccagcacagagcaacctctgagcggcatgatcaacaactactacacagtggccctgcggaac atgaacgagagatacggctacaagacagaggacctggccttcatccaggacaagcggttcaa gttcgtgaaggacgcctacggcaagaaaaagtcccaagtgaataccggattcttcctgagcc tgcaggactacaacggcgacacacagaagaagctgcacctgagcggagtgggaatcgccctg ctgatctgcctgttcctggacaagcagtacatcaacatctttctgagcaggctgcccatctt ctccagctacaatgcccagagcgaggaacggcggatcatcatcagatccttcggcatcaaca gcatcaagctgcccaaggaccggatccacagcgagaagtccaacaagagcgtggccatggat atgctcaacgaagtgaagcggtgccccgacgagctgttcacaacactgtctgccgagaagca gtcccggttcagaatcatcagcgacgaccacaatgaagtgctgatgaagcggagcagcgaca gattcgtgcctctgctgctgcagtatatcgattacggcaagctgttcgaccacatcaggttc cacgtgaacatgggcaagctgagatacctgctgaaggccgacaagacctgcatcgacggcca gaccagagtcagagtgatcgagcagcccctgaacggcttcggcagactggaagaggccgaga caatgcggaagcaagagaacggcaccttcggcaacagcggcatccggatcagagacttcgag 
aacatgaagcgggacgacgccaatcctgccaactatccctacatcgtggacacctacacaca ctacatcctggaaaacaacaaggtcgagatgtttatcaacgacaaagaggacagcgccccac tgctgcccgtgatcgaggatgatagatacgtggtcaagacaatccccagctgccggatgagc accctggaaattccagccatggccttccacatgtttctgttcggcagcaagaaaaccgagaa gctgatcgtggacgtgcacaaccggtacaagagactgttccaggccatgcagaaagaagaag tgaccgccgagaatatcgccagcttcggaatcgccgagagcgacctgcctcagaagatcctg gatctgatcagcggcaatgcccacggcaaggatgtggacgccttcatcagactgaccgtgga cgacatgctgaccgacaccgagcggagaatcaagagattcaaggacgaccggaagtccattc ggagcgccgacaacaagatgggaaagagaggcttcaagcagatctccacaggcaagctggcc gacttcctggccaaggacatcgtgctgtttcagcccagcgtgaacgatggcgagaacaagat caccggcctgaactaccggatcatgcagagcgccattgccgtgtacgatagcggcgacgatt acgaggccaagcagcagttcaagctgatgttcgagaaggcccggctgatcggcaagggcaca acagagcctcatccatttctgtacaaggtgttcgcccgcagcatccccgccaatgccgtcga gttctacgagcgctacctgatcgagcggaagttctacctgaccggcctgtccaacgagatca agaaaggcaacagagtggatgtgcccttcatccggcgggaccagaacaagtggaaaacaccc gccatgaagaccctgggcagaatctacagcgaggatctgcccgtggaactgcccagacagat gttcgacaatgagatcaagtcccacctgaagtccctgccacagatggaaggcatcgacttca acaatgccaacgtgacctatctgatcgccgagtacatgaagagagtgctggacgacgacttc
cagaccttctaccagtggaaccgcaactaccggtacatggacatgcttaagggcgagtacga cagaaagggctccctgcagcactgcttcaccagcgtggaagagagagaaggcctctggaaag agcgggcctccagaacagagcggtacagaaagcaggccagcaacaagatccgcagcaaccgg cagatgagaaacgccagcagcgaagagatcgagacaatcctggataagcggctgagcaacag ccggaacgagtaccagaaaagcgagaaagtgatccggcgctacagagtgcaggatgccctgc tgtttctgctggccaaaaagaccctgaccgaactggccgatttcgacggcgagaggttcaaa ctgaaagaaatcatgcccgacgccgagaagggaatcctgagcgagatcatgcccatgagctt caccttcgagaaaggcggcaagaagtacaccatcaccagcgagggcatgaagctgaagaact acggcgacttctttgtgctggctagcgacaagaggatcggcaacctgctggaactcgtgggc agcgacatcgtgtccaaagaggatggatccaaaagaaccgccgacggcagcgaattcgacta caaagacgatgacgacaagggatccactagtaacggccgccagtgtgctggaattcgccctt tacaaagtcagaatggcaaatgtgaaggatgcaatccagacaaagatgaagctccttattat acccatctgggagctggtcctgatgtggcagctattagaacactcatggaagaaaggtatgg agagaagggtaaagctattaggattgaaaaagtcatatatactggtaaagaaggcaagagct ctcagggatgtcctattgctaaatgggtatatcggagatcgagtgaggaggagaaactactg tgtttggtacgagtgcgacctaaccacacatgtgagacggcggtgatggtaattgccatcat gttgtgggacggaatcccaaagctactcgcatcagaactctactcagaacttacagatatct tgggcaagtgtggcatatgcaccaaccgtcgctgttctcagaatgaaacgaagaaaaagcaa tcaccacccagaaactgttgttgtcagggtgagaatccagagacctgtggtgcctccttttc ttttggttgttcttggagcatgtactataatggatgtaagtttgccagaagcaagaaaccaa ggaaatttaggctacatggagctgagccaaaagaggaagagagactaggttctcatttgcaa aacctggctactgtcattgctccaatatacaagaagcttgcacccgatgcatacaataatca ggttgaatttgaacaccaagccccagactgctgtttgggtctgaaggaaggccggccattct caggagtcactgcatgtttggacttctctgctcattcccacagagaccagcagaacatgcca aatggcagtacagtggtggtcaccctcaatagagaagacaatcgagaagtcggagctaagcc tgaggatgagcagttccacgtgctgcctatgtacatcatcgcccctgaggatgagtttggga gtacggaaggccaggagaagaagatacggatggggtccattgaggttctgcagtcatttcgg aggagaagggtcataaggataggagagctgcccaagagttgcaagaagaaagcggagcccaa gaaagccaagaccaagaaagcagctcgaaagcattcctctctggagaactgctccagtagga ctgagaagggaaagtcttcctcacatacaaagctgatggaaaatgcaagccatatgaaacaa atgacagcacaaccgcagctttcgggcccggtcatccggcagccaccaacactccagaggca 
ccttcagcaagggcagaggccacagcagccgcagccacctcagccgcagccgcagacgacac ctcagccacagccacagccacagcatatcatgcccggtaactctcagtctgttggttctcat tgttctggatccaccagtgtctacacgagacagcctactcctcacagtccttatcccagctc agcacacacctcagatatttatggagataccaaccatgtgaacttttaccccacttcatctc atgcctcgggttcatatttgaatccttctaattacatgaacccctaccttgggcttttgaat cagaataaccaatatgcaccttttccatacaatgggagtgtgccagtggacaatggttcccc tttcttaggttcttattccccccaggctcagtccagggatctacatagatatccaaaccagg accatctcaccaatcagaacttaccacccatccacacccttcaccaacagacgtttggggac agtccctctaagtacttaagttatggaaaccaaaatatgcagagagatgccttcactactaa ctccaccctaaaaccaaatgtacaccacctagcaacgttttctccttaccccacccccaaga tggatagtcatttcatgggagctgcctccagatcaccatacagccacccacacactgactac aaaaccagtgagcatcatctaccctctcacacggtctacagctacacggcagcagcttcggg gagcagttccagccacgccttccacaacaaggagaatgacaacatagccaatgggctctcaa gagtgcttccagggtttaatcatgatagaactgcttctgcccaagaactattatacagtctg actggcagcagtcaggagaagcagcctgaggtgtcaggccaggatgcagctgctgtgcagga aattgagtattggtcagatagtgagcacaactttcaggatccttgcattggaggggtggcta tagccccaactcatgggtcaattcttattgagtgtgcaaagtgtgaggttcatgccacaacc aaagtaaacgatcccgaccggaatcaccccaccaggatctcacttgtactgtataggcataa gaatttgtttctaccaaaacattgtttggctctctgggaagccaaaatggctgaaaaggccc ggaaagaggaagagtgcggaaagaatggatcagaccacgtgtctcagaaaaatcatggcaaa
caggaaaagcgtgagcccacagggccacaggaacccagttacctgcgtttcatccagtctct tgctgagaacacagggtctgtgactacggattctaccgtgactacatcaccatatgctttca ctcaggtcacagggccttacaacacatttgtagtttaa (SEQ ID NO: 28) SEQ ID NO: 29 - Exemplary catalytically dead Cas13d (dCas13d) fused to mouse TET2 catalytic domain dead (TET(HxD)) amino acid (shown with optional HA tag, SV40 NLS, and Flag Tag, each of which are underlined) MYPYDVPDYASPKKKRKVNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNEN NENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVE VNSNDIFEVLKRAFGVLKMYRDLTNAYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRN MNERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIAL LICLFLDKQYINIFLSRLPIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMD MLNEVKRCPDELFTTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRF HVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEEAETMRKQENGTFGNSGIRIRDFE NMKRDDANPANYPYIVDTYTHYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMS TLEIPAMAFHMFLFGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAENIASFGIAESDLPQKIL DLISGNAHGKDVDAFIRLTVDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLA DFLAKDIVLFQPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGKGT TEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTGLSNEIKKGNRVDVPFIRRDQNKWKTP AMKTLGRIYSEDLPVELPRQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLDDDF QTFYQWNRNYRYMDMLKGEYDRKGSLQHCFTSVEEREGLWKERASRTERYRKQASNKIRSNR QMRNASSEEIETILDKRLSNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFK LKEIMPDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGNLLELVG SDIVSKEDGSKRTADGSEFDYKDDDDKGSTSNGRQCAGIRPLQSQNGKCEGCNPDKDEAPYY THLGAGPDVAAIRTLMEERYGEKGKAIRIEKVIYTGKEGKSSQGCPIAKWVYRRSSEEEKLL CLVRVRPNHTCETAVMVIAIMLWDGIPKLLASELYSELTDILGKCGICTNRRCSQNETKKKQ SPPRNCCCQGENPETCGASFSFGCSWSMYYNGCKFARSKKPRKFRLHGAEPKEEERLGSHLQ NLATVIAPIYKKLAPDAYNNQVEFEHQAPDCCLGLKEGRPFSGVTACLDFSAHSYRAQQNMP NGSTVVVTLNREDNREVGAKPEDEQFHVLPMYIIAPEDEFGSTEGQEKKIRMGSIEVLQSFR RRRVIRIGELPKSCKKKAEPKKAKTKKAARKHSSLENCSSRTEKGKSSSHTKLMENASHMKQ MTAQPQLSGPVIRQPPTLQRHLQQGQRPQQPQPPQPQPQTTPQPQPQPQHIMPGNSQSVGSH CSGSTSVYTRQPTPHSPYPSSAHTSDIYGDTNHVNFYPTSSHASGSYLNPSNYMNPYLGLLN 
QNNQYAPFPYNGSVPVDNGSPFLGSYSPQAQSRDLHRYPNQDHLTNQNLPPIHTLHQQTFGD SPSKYLSYGNQNMQRDAFTTNSTLKPNVHHLATFSPYPTPKMDSHFMGAASRSPYSHPHTDY KTSEHHLPSHTVYSYTAAASGSSSSHAFHNKENDNIANGLSRVLPGFNHDRTASAQELLYSL TGSSQEKQPEVSGQDAAAVQEIEYWSDSEHNFQDPCIGGVAIAPTHGSILIECAKCEVHATT KVNDPDRNHPTRISLVLYRHKNLFLPKHCLALWEAKMAEKARKEEECGKNGSDHVSQKNHGK QEKREPTGPQEPSYLRFIQSLAENTGSVTTDSTVTTSPYAFTQVTGPYNTFV (SEQ ID NO: 29) SEQ ID NO: 30 - Exemplary catalytically dead Cas13d (dCas13d) fused to mouse TET2 catalytic domain dead (TET(HxD)) polynucleotide coding sequence (shown with optional HA tag, SV40 NLS, and Flag Tag, each of which are underlined) ATGtacccatacgatgttccagattacgcttcgccaaagaagaagcggaaagtcaacatccc cgctctggtggaaaaccagaagaagtactttggcacctacagcgtgatggccatgctgaacg ctcagaccgtgctggaccacatccagaaggtggccgatattgagggcgagcagaacgagaac aacgagaatctgtggtttcaccccgtgatgagccacctgtacaacgccaagaacggctacga caagcagcccgagaaaaccatgttcatcatcgagcggctgcagagctacttcccattcctga agatcatggccgagaaccagagagagtacagcaacggcaagtacaagcagaaccgcgtggaa gtgaacagcaacgacatcttcgaggtgctgaagcgcgccttcggcgtgctgaagatgtacag ggacctgaccaacgcatacaagacctacgaggaaaagctgaacgacggctgcgagttcctga
ccagcacagagcaacctctgagcggcatgatcaacaactactacacagtggccctgcggaac atgaacgagagatacggctacaagacagaggacctggccttcatccaggacaagcggttcaa gttcgtgaaggacgcctacggcaagaaaaagtcccaagtgaataccggattcttcctgagcc tgcaggactacaacggcgacacacagaagaagctgcacctgagcggagtgggaatcgccctg ctgatctgcctgttcctggacaagcagtacatcaacatctttctgagcaggctgcccatctt ctccagctacaatgcccagagcgaggaacggcggatcatcatcagatccttcggcatcaaca gcatcaagctgcccaaggaccggatccacagcgagaagtccaacaagagcgtggccatggat atgctcaacgaagtgaagcggtgccccgacgagctgttcacaacactgtctgccgagaagca gtcccggttcagaatcatcagcgacgaccacaatgaagtgctgatgaagcggagcagcgaca gattcgtgcctctgctgctgcagtatatcgattacggcaagctgttcgaccacatcaggttc cacgtgaacatgggcaagctgagatacctgctgaaggccgacaagacctgcatcgacggcca gaccagagtcagagtgatcgagcagcccctgaacggcttcggcagactggaagaggccgaga caatgcggaagcaagagaacggcaccttcggcaacagcggcatccggatcagagacttcgag aacatgaagcgggacgacgccaatcctgccaactatccctacatcgtggacacctacacaca ctacatcctggaaaacaacaaggtcgagatgtttatcaacgacaaagaggacagcgccccac tgctgcccgtgatcgaggatgatagatacgtggtcaagacaatccccagctgccggatgagc accctggaaattccagccatggccttccacatgtttctgttcggcagcaagaaaaccgagaa gctgatcgtggacgtgcacaaccggtacaagagactgttccaggccatgcagaaagaagaag tgaccgccgagaatatcgccagcttcggaatcgccgagagcgacctgcctcagaagatcctg gatctgatcagcggcaatgcccacggcaaggatgtggacgccttcatcagactgaccgtgga cgacatgctgaccgacaccgagcggagaatcaagagattcaaggacgaccggaagtccattc ggagcgccgacaacaagatgggaaagagaggcttcaagcagatctccacaggcaagctggcc gacttcctggccaaggacatcgtgctgtttcagcccagcgtgaacgatggcgagaacaagat caccggcctgaactaccggatcatgcagagcgccattgccgtgtacgatagcggcgacgatt acgaggccaagcagcagttcaagctgatgttcgagaaggcccggctgatcggcaagggcaca acagagcctcatccatttctgtacaaggtgttcgcccgcagcatccccgccaatgccgtcga gttctacgagcgctacctgatcgagcggaagttctacctgaccggcctgtccaacgagatca agaaaggcaacagagtggatgtgcccttcatccggcgggaccagaacaagtggaaaacaccc gccatgaagaccctgggcagaatctacagcgaggatctgcccgtggaactgcccagacagat gttcgacaatgagatcaagtcccacctgaagtccctgccacagatggaaggcatcgacttca acaatgccaacgtgacctatctgatcgccgagtacatgaagagagtgctggacgacgacttc 
cagaccttctaccagtggaaccgcaactaccggtacatggacatgcttaagggcgagtacga cagaaagggctccctgcagcactgcttcaccagcgtggaagagagagaaggcctctggaaag agcgggcctccagaacagagcggtacagaaagcaggccagcaacaagatccgcagcaaccgg cagatgagaaacgccagcagcgaagagatcgagacaatcctggataagcggctgagcaacag ccggaacgagtaccagaaaagcgagaaagtgatccggcgctacagagtgcaggatgccctgc tgtttctgctggccaaaaagaccctgaccgaactggccgatttcgacggcgagaggttcaaa ctgaaagaaatcatgcccgacgccgagaagggaatcctgagcgagatcatgcccatgagctt caccttcgagaaaggcggcaagaagtacaccatcaccagcgagggcatgaagctgaagaact acggcgacttctttgtgctggctagcgacaagaggatcggcaacctgctggaactcgtgggc agcgacatcgtgtccaaagaggatggatccaaaagaaccgccgacggcagcgaattcgacta caaagacgatgacgacaagggatccactagtaacggccgccagtgtgctggaattcgccctt tacaaagtcagaatggcaaatgtgaaggatgcaatccagacaaagatgaagctccttattat acccatctgggagctggtcctgatgtggcagctattagaacactcatggaagaaaggtatgg agagaagggtaaagctattaggattgaaaaagtcatatatactggtaaagaaggcaagagct ctcagggatgtcctattgctaaatgggtatatcggagatcgagtgaggaggagaaactactg tgtttggtacgagtgcgacctaaccacacatgtgagacggcggtgatggtaattgccatcat gttgtgggacggaatcccaaagctactcgcatcagaactctactcagaacttacagatatct tgggcaagtgtggcatatgcaccaaccgtcgctgttctcagaatgaaacgaagaaaaagcaa tcaccacccagaaactgttgttgtcagggtgagaatccagagacctgtggtgcctccttttc ttttggttgttcttggagcatgtactataatggatgtaagtttgccagaagcaagaaaccaa
ggaaatttaggctacatggagctgagccaaaagaggaagagagactaggttctcatttgcaa aacctggctactgtcattgctccaatatacaagaagcttgcacccgatgcatacaataatca ggttgaatttgaacaccaagccccagactgctgtttgggtctgaaggaaggccggccattct caggagtcactgcatgtttggacttctctgctcattcctacagagcccagcagaacatgcca aatggcagtacagtggtggtcaccctcaatagagaagacaatcgagaagtcggagctaagcc tgaggatgagcagttccacgtgctgcctatgtacatcatcgcccctgaggatgagtttggga gtacggaaggccaggagaagaagatacggatggggtccattgaggttctgcagtcatttcgg aggagaagggtcataaggataggagagctgcccaagagttgcaagaagaaagcggagcccaa gaaagccaagaccaagaaagcagctcgaaagcattcctctctggagaactgctccagtagga ctgagaagggaaagtcttcctcacatacaaagctgatggaaaatgcaagccatatgaaacaa atgacagcacaaccgcagctttcgggcccggtcatccggcagccaccaacactccagaggca ccttcagcaagggcagaggccacagcagccgcagccacctcagccgcagccgcagacgacac ctcagccacagccacagccacagcatatcatgcccggtaactctcagtctgttggttctcat tgttctggatccaccagtgtctacacgagacagcctactcctcacagtccttatcccagctc agcacacacctcagatatttatggagataccaaccatgtgaacttttaccccacttcatctc atgcctcgggttcatatttgaatccttctaattacatgaacccctaccttgggcttttgaat cagaataaccaatatgcaccttttccatacaatgggagtgtgccagtggacaatggttcccc tttcttaggttcttattccccccaggctcagtccagggatctacatagatatccaaaccagg accatctcaccaatcagaacttaccacccatccacacccttcaccaacagacgtttggggac agtccctctaagtacttaagttatggaaaccaaaatatgcagagagatgccttcactactaa ctccaccctaaaaccaaatgtacaccacctagcaacgttttctccttaccccacccccaaga tggatagtcatttcatgggagctgcctccagatcaccatacagccacccacacactgactac aaaaccagtgagcatcatctaccctctcacacggtctacagctacacggcagcagcttcggg gagcagttccagccacgccttccacaacaaggagaatgacaacatagccaatgggctctcaa gagtgcttccagggtttaatcatgatagaactgcttctgcccaagaactattatacagtctg actggcagcagtcaggagaagcagcctgaggtgtcaggccaggatgcagctgctgtgcagga aattgagtattggtcagatagtgagcacaactttcaggatccttgcattggaggggtggcta tagccccaactcatgggtcaattcttattgagtgtgcaaagtgtgaggttcatgccacaacc aaagtaaacgatcccgaccggaatcaccccaccaggatctcacttgtactgtataggcataa gaatttgtttctaccaaaacattgtttggctctctgggaagccaaaatggctgaaaaggccc ggaaagaggaagagtgcggaaagaatggatcagaccacgtgtctcagaaaaatcatggcaaa 
caggaaaagcgtgagcccacagggccacaggaacccagttacctgcgtttcatccagtctct tgctgagaacacagggtctgtgactacggattctaccgtgactacatcaccatatgctttca ctcaggtcacagggccttacaacacatttgtatga (SEQ ID NO: 30) SEQ ID NO: 31 - Exemplary dCas13d fused to human TET2 isoform A catalytic domain amino acid NIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLWFHPVMSHLYNAKN GYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLK MYRDLTNAYKTYEEKLNDGCEFLTSTEQPLSGMINNYYTVALRNMNERYGYKTEDLAFIQDK RFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRL PIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELFTTLSA EKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHVNMGKLRYLLKADKTCI DGQTRVRVIEQPLNGFGRLEEAETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVDT YTHYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMFLFGSKK TEKLIVDVHNRYKRLFQAMQKEEVTAENIASFGIAESDLPQKILDLISGNAHGKDVDAFIRL TVDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLFQPSVNDGE NKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGKGTTEPHPFLYKVFARSIPAN AVEFYERYLIERKFYLTGLSNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELP RQMFDNEIKSHLKSLPQMEGIDFNNANVTYLIAEYMKRVLDDDFQTFYQWNRNYRYMDMLKG EYDRKGSLQHCFTSVEEREGLWKERASRTERYRKQASNKIRSNRQMRNASSEEIETILDKRL
SNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEIMPDAEKGILSEIMP MSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGNLLELVGSDIVSKEDGSKRTADGSE FSVLNNFIESPSKLLDTPIKNLLDTPVKTQYDFPSCRCVEQIIEKDEGPFYTHLGAGPNVAA IREIMEERFGQKGKAIRIERVIYTGKEGKSSQGCPIAKWVVRRSSSEEKLLCLVRERAGHTC EAAVIVILILVWEGIPLSLADKLYSELTETLRKYGTLTNRRCALNEERTCACQGLDPETCGA SFSFGCSWSMYYNGCKFARSKIPRKFKLLGDDPKEEEKLESHLQNLSTLMAPTYKKLAPDAY NNQIEYEHRAPECRLGLKEGRPFSGVTACLDFCAHAHRDLHNMQNGSTLVCTLTREDNREFG GKPEDEQLHVLPLYKVSDVDEFGSVEAQEEKKRSGAIQVLSSFRRKVRMLAEPVKTCRQRKL EAKKAAAEKLSSLENSSNKNEKEKSAPSRTKQTENASQAKQLAELLRLSGPVMQQSQQPQPL QKQPPQPQQQQRPQQQQPHHPQTESVNSYSASGSTNPYMRRPNPVSPYPNSSHTSDIYGSTS PMNFYSTSSQAAGSYLNSSNPMNPYPGLLNQNTQYPSYQCNGNLSVDNCSPYLGSYSPQSQP MDLYRYPSQDPLSKLSLPPIHTLYQPRFGNSQSFTSKYLGYGNQNMQGDGFSSCTIRPNVHH VGKLPPYPTHEMDGHFMGATSRLPPNLSNPNMDYKNGEHHSPSHIIHNYSAAPGMFNSSLHA LHLQNKENDMLSHTANGLSKMLPALNHDRTACVQGGLHKLSDANGQEKQPLALVQGVASGAE DNDEVWSDSEQSFLDPDIGGVAVAPTHGSILIECAKRELHATTPLKNPNRNHPTRISLVFYQ HKSMNEPKHGLALWEAKMAEKAREKEEECEKY (SEQ ID NO: 31) SEQ ID NO: 32 - Exemplary dCas13d fused to human TET2 isoform A catalytic domain polynucleotide coding sequence aacatccccgctctggtggaaaaccagaagaagtactttggcacctacagcgtgatggccat gctgaacgctcagaccgtgctggaccacatccagaaggtggccgatattgagggcgagcaga acgagaacaacgagaatctgtggtttcaccccgtgatgagccacctgtacaacgccaagaac ggctacgacaagcagcccgagaaaaccatgttcatcatcgagcggctgcagagctacttccc attcctgaagatcatggccgagaaccagagagagtacagcaacggcaagtacaagcagaacc gcgtggaagtgaacagcaacgacatcttcgaggtgctgaagcgcgccttcggcgtgctgaag atgtacagggacctgaccaacgcatacaagacctacgaggaaaagctgaacgacggctgcga gttcctgaccagcacagagcaacctctgagcggcatgatcaacaactactacacagtggccc tgcggaacatgaacgagagatacggctacaagacagaggacctggccttcatccaggacaag cggttcaagttcgtgaaggacgcctacggcaagaaaaagtcccaagtgaataccggattctt cctgagcctgcaggactacaacggcgacacacagaagaagctgcacctgagcggagtgggaa tcgccctgctgatctgcctgttcctggacaagcagtacatcaacatctttctgagcaggctg cccatcttctccagctacaatgcccagagcgaggaacggcggatcatcatcagatccttcgg catcaacagcatcaagctgcccaaggaccggatccacagcgagaagtccaacaagagcgtgg 
ccatggatatgctcaacgaagtgaagcggtgccccgacgagctgttcacaacactgtctgcc gagaagcagtcccggttcagaatcatcagcgacgaccacaatgaagtgctgatgaagcggag cagcgacagattcgtgcctctgctgctgcagtatatcgattacggcaagctgttcgaccaca tcaggttccacgtgaacatgggcaagctgagatacctgctgaaggccgacaagacctgcatc gacggccagaccagagtcagagtgatcgagcagcccctgaacggcttcggcagactggaaga ggccgagacaatgcggaagcaagagaacggcaccttcggcaacagcggcatccggatcagag acttcgagaacatgaagcgggacgacgccaatcctgccaactatccctacatcgtggacacc tacacacactacatcctggaaaacaacaaggtcgagatgtttatcaacgacaaagaggacag cgccccactgctgcccgtgatcgaggatgatagatacgtggtcaagacaatccccagctgcc ggatgagcaccctggaaattccagccatggccttccacatgtttctgttcggcagcaagaaa accgagaagctgatcgtggacgtgcacaaccggtacaagagactgttccaggccatgcagaa agaagaagtgaccgccgagaatatcgccagcttcggaatcgccgagagcgacctgcctcaga agatcctggatctgatcagcggcaatgcccacggcaaggatgtggacgccttcatcagactg accgtggacgacatgctgaccgacaccgagcggagaatcaagagattcaaggacgaccggaa gtccattcggagcgccgacaacaagatgggaaagagaggcttcaagcagatctccacaggca agctggccgacttcctggccaaggacatcgtgctgtttcagcccagcgtgaacgatggcgag aacaagatcaccggcctgaactaccggatcatgcagagcgccattgccgtgtacgatagcgg cgacgattacgaggccaagcagcagttcaagctgatgttcgagaaggcccggctgatcggca
agggcacaacagagcctcatccatttctgtacaaggtgttcgcccgcagcatccccgccaat gccgtcgagttctacgagcgctacctgatcgagcggaagttctacctgaccggcctgtccaa cgagatcaagaaaggcaacagagtggatgtgcccttcatccggcgggaccagaacaagtgga aaacacccgccatgaagaccctgggcagaatctacagcgaggatctgcccgtggaactgccc agacagatgttcgacaatgagatcaagtcccacctgaagtccctgccacagatggaaggcat cgacttcaacaatgccaacgtgacctatctgatcgccgagtacatgaagagagtgctggacg acgacttccagaccttctaccagtggaaccgcaactaccggtacatggacatgcttaagggc gagtacgacagaaagggctccctgcagcactgcttcaccagcgtggaagagagagaaggcct ctggaaagagcgggcctccagaacagagcggtacagaaagcaggccagcaacaagatccgca gcaaccggcagatgagaaacgccagcagcgaagagatcgagacaatcctggataagcggctg agcaacagccggaacgagtaccagaaaagcgagaaagtgatccggcgctacagagtgcagga tgccctgctgtttctgctggccaaaaagaccctgaccgaactggccgatttcgacggcgaga ggttcaaactgaaagaaatcatgcccgacgccgagaagggaatcctgagcgagatcatgccc atgagcttcaccttcgagaaaggcggcaagaagtacaccatcaccagcgagggcatgaagct gaagaactacggcgacttctttgtgctggctagcgacaagaggatcggcaacctgctggaac tcgtgggcagcgacatcgtgtccaaagaggatggatccaaaagaaccgccgacggcagcgaa ttcTCTGTTCTCAATAATTTTATAGAGTCACCTTCCAAATTACTAGATACTCCTATAAAAAA TTTATTGGATACACCTGTCAAGACTCAATATGATTTCCCATCTTGCAGATGTGTAGAGCAAA TTATTGAAAAAGATGAAGGTCCTTTTTATACCCATCTAGGAGCAGGTCCTAATGTGGCAGCT ATTAGAGAAATCATGGAAGAAAGGTTTGGACAGAAGGGTAAAGCTATTAGGATTGAAAGAGT CATCTATACTGGTAAAGAAGGCAAAAGTTCTCAGGGATGTCCTATTGCTAAGTGGGTGGTTC GCAGAAGCAGCAGTGAAGAGAAGCTACTGTGTTTGGTGCGGGAGCGAGCTGGCCACACCTGT GAGGCTGCAGTGATTGTGATTCTCATCCTGGTGTGGGAAGGAATCCCGCTGTCTCTGGCTGA CAAACTCTACTCGGAGCTTACCGAGACGCTGAGGAAATACGGCACGCTCACCAATCGCCGGT GTGCCTTGAATGAAGAGAGAACTTGCGCCTGTCAGGGGCTGGATCCAGAAACCTGTGGTGCC TCCTTCTCTTTTGGTTGTTCATGGAGCATGTACTACAATGGATGTAAGTTTGCCAGAAGCAA GATCCCAAGGAAGTTTAAGCTGCTTGGGGATGACCCAAAAGAGGAAGAGAAACTGGAGTCTC ATTTGCAAAACCTGTCCACTCTTATGGCACCAACATATAAGAAACTTGCACCTGATGCATAT AATAATCAGATTGAATATGAACACAGAGCACCAGAGTGCCGTCTGGGTCTGAAGGAAGGCCG TCCATTCTCAGGGGTCACTGCATGTTTGGACTTCTGTGCTCATGCCCACAGAGACTTGCACA ACATGCAGAATGGCAGCACATTGGTATGCACTCTCACTAGAGAAGACAATCGAGAATTTGGA 
GGAAAACCTGAGGATGAGCAGCTTCACGTTCTGCCTTTATACAAAGTCTCTGACGTGGATGA GTTTGGGAGTGTGGAAGCTCAGGAGGAGAAAAAACGGAGTGGTGCCATTCAGGTACTGAGTT CTTTTCGGCGAAAAGTCAGGATGTTAGCAGAGCCAGTCAAGACTTGCCGACAAAGGAAACTA GAAGCCAAGAAAGCTGCAGCTGAAAAGCTTTCCTCCCTGGAGAACAGCTCAAATAAAAATGA AAAGGAAAAGTCAGCCCCATCACGTACAAAACAAACTGAAAACGCAAGCCAGGCTAAACAGT TGGCAGAACTTTTGCGACTTTCAGGACCAGTCATGCAGCAGTCCCAGCAGCCCCAGCCTCTA CAGAAGCAGCCACCACAGCCCCAGCAGCAGCAGAGACCCCAGCAGCAGCAGCCACATCACCC TCAGACAGAGTCTGTCAACTCTTATTCTGCTTCTGGATCCACCAATCCATACATGAGACGGC CCAATCCAGTTAGTCCTTATCCAAACTCTTCACACACTTCAGATATCTATGGAAGCACCAGC CCTATGAACTTCTATTCCACCTCATCTCAAGCTGCAGGTTCATATTTGAATTCTTCTAATCC CATGAACCCTTACCCTGGGCTTTTGAATCAGAATACCCAATATCCATCATATCAATGCAATG GAAACCTATCAGTGGACAACTGCTCCCCATATCTGGGTTCCTATTCTCCCCAGTCTCAGCCG ATGGATCTGTATAGGTATCCAAGCCAAGACCCTCTGTCTAAGCTCAGTCTACCACCCATCCA TACACTTTACCAGCCAAGGTTTGGAAATAGCCAGAGTTTTACATCTAAATACTTAGGTTATG GAAACCAAAATATGCAGGGAGATGGTTTCAGCAGTTGTACCATTAGACCAAATGTACATCAT GTAGGGAAATTGCCTCCTTATCCCACTCATGAGATGGATGGCCACTTCATGGGAGCCACCTC TAGATTACCACCCAATCTGAGCAATCCAAACATGGACTATAAAAATGGTGAACATCATTCAC CTTCTCACATAATCCATAACTACAGTGCAGCTCCGGGCATGTTCAACAGCTCTCTTCATGCC CTGCATCTCCAAAACAAGGAGAATGACATGCTTTCCCACACAGCTAATGGGTTATCAAAGAT GCTTCCAGCTCTTAACCATGATAGAACTGCTTGTGTCCAAGGAGGCTTACACAAATTAAGTG
ATGCTAATGGTCAGGAAAAGCAGCCATTGGCACTAGTCCAGGGTGTGGCTTCTGGTGCAGAG GACAACGATGAGGTCTGGTCAGACAGCGAGCAGAGCTTTCTGGATCCTGACATTGGGGGAGT GGCCGTGGCTCCAACTCATGGGTCAATTCTCATTGAGTGTGCAAAGCGTGAGCTGCATGCCA CAACCCCTTTAAAGAATCCCAATAGGAATCACCCCACCAGGATCTCCCTCGTCTTTTACCAG CATAAGAGCATGAATGAGCCAAAACATGGCTTGGCTCTTTGGGAAGCCAAAATGGCTGAAAA AGCCCGTGAGAAAGAGGAAGAGTGTGAAAAGTATGGCCCAGACTATGTGCCTCAGAAATCCC ATGGCAAAAAAGTGAAACGGGAGCCTGCTGAGCCACATGAAACTTCAGAGCCCACTTACCTG CGTTTCATCAAGTCTCTTGCCGAAAGGACCATGTCCGTGACCACAGACTCCACAGTAACTAC ATCTCCATATGCCTTCACTCGGGTCACAGGGCCTTACAACAGATATATATGA (SEQ ID NO: 32) IV. m5C marked RNA as disease biomarkers and/or targets for therapeutic intervention [0186] In certain aspects, provided herein are methods of diagnosis and/or treatment comprising analysis and/or modification of RNA, such as but not limited to chromatin associated RNA (caRNA), including at least chromatin associated regulatory RNA (carRNA). In some aspects, methods of diagnosis and/or treatment comprising analysis and/or modification of RNA may be loci and/or sequence specific, or may be loci and/or sequence non-specific. [0187] In certain aspects, provided herein are methods of identifying features, (e.g., such as biomarker features), and evaluating the presence, absence, or levels of said features in a sample. In some aspects, one or more (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) features are used to classify (e.g., diagnose) a disease state and/or identify one or more effective treatment options for a patient with a disease associated with aberrant transcription (e.g., aberrant transcription associated with increased euchromatin). 
In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with cancer. [0188] In some aspects, one or more features (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a blood cancer. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a leukemia. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient
with a myeloid malignancy. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with acute myeloid leukemia. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with chronic myelomonocytic leukemia. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with a glioma. In some aspects, one or more features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient with glioblastoma. [0189] In some aspects, one or more (for example but not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) features are used to diagnose a disease state and/or identify one or more effective treatment options for a patient diagnosed with a pre-cancerous condition, such as but not limited to, clonal hematopoiesis of indeterminate potential (CHIP). [0190] In some aspects, a feature comprises one or more m5C RNA modifications in one or more RNA populations. In some aspects, m5C RNA modification features are determined in caRNA, carRNA, and/or other RNA species. In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of one or more caRNA described in Table 1. In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of one or more caRNA described in Table 2. 
[0191] In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270,
271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 1000, 1200, 1266, 1400, 1500, or more, or any range derivable therein, caRNA described in Table 1. [0192] In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 
207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875,
900, 925, 950, 1000, 1200, 1266, 1400, 1500, or more, or any range derivable therein, caRNA described in Table 2. [0193] In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 
378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, or more than 500 features, or any range derivable therein, that are comprised in a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, or 100% (or any range derivable therein) complementary to at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length relative of any one or more of SEQ ID NOs: 100-614. [0194] In some aspects, m5C RNA modification features are comprised, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 425, 450, 475, 500, or more than 500 features that are comprised in a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, or 100% (or any range derivable therein) complementary to at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length relative of any one or more of 
SEQ ID NOs: 104 or 107. [0195] In certain aspects, provided herein are methods of treatment comprising analyzing m5C RNA modification levels in suspected diseased cells, determining if m5C RNA modification levels are aberrant, and if m5C RNA modification levels are aberrant, contacting the cells with one or more TET2, NSUN2, and/or MBD6 inhibitors. In certain aspects, provided herein are methods of treatment comprising analyzing m5C RNA modification levels in suspected diseased cells, determining if m5C RNA modification levels are aberrant, and if m5C RNA modification levels are aberrant, contacting the cells with one or more site-specific caRNA targeting agents, such as but not limited to, dCas13b::TET2(CD) and/or dCas13b::MBD6 fusion proteins that may be guided by a polynucleotide sequence at least partially complementary to an m5C RNA feature described in Table 1 or Table 2.
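By way of illustration only, the percent-complementarity thresholds recited above (e.g., at least 70% to 100% complementarity over a stretch of 15 to 50 nucleotides) can be evaluated computationally. The following is a minimal sketch, assuming simple Watson-Crick pairing without wobble pairs or gaps; the function names and the example sequences are hypothetical and are not taken from Table 1, Table 2, or any SEQ ID NO of the present disclosure.

```python
# Illustrative sketch only: names and sequences are hypothetical, and pairing
# is restricted to Watson-Crick base pairs (no G-U wobble, no gaps).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and treat DNA-style T as RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the antiparallel Watson-Crick complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(rna)))


def percent_complementarity(guide: str, window: str) -> float:
    """Percent of guide positions that pair with an equal-length target window."""
    guide = normalize(guide)
    if len(guide) != len(window):
        raise ValueError("guide and window must have the same length")
    perfect_guide = reverse_complement(window)
    matches = sum(g == p for g, p in zip(guide, perfect_guide))
    return 100.0 * matches / len(guide)


def best_site(guide: str, target: str) -> tuple[int, float]:
    """Scan the target and return (0-based offset, percent) of the best window."""
    n = len(guide)
    scores = [
        (offset, percent_complementarity(guide, target[offset:offset + n]))
        for offset in range(len(target) - n + 1)
    ]
    # max() keeps the first window when several windows tie.
    return max(scores, key=lambda item: item[1])


def meets_threshold(guide: str, target: str, min_percent: float = 70.0) -> bool:
    """True if a 15-50 nt guide reaches min_percent at its best target site."""
    if not 15 <= len(guide) <= 50:
        return False
    return best_site(guide, target)[1] >= min_percent
```

In such a sketch, a candidate guide would be scanned against a transcript of interest with `best_site`, and `meets_threshold` would report whether the guide satisfies a chosen percentage within the recited length range.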
A. Chromatin associated RNA (caRNA) [0196] In some aspects, m5C RNA features are comprised within one or more chromatin associated RNA (caRNA) species. In certain aspects, a caRNA may comprise caRNAs with potential regulatory functions, such as but not limited to, promoter-associated RNA (paRNA), enhancer RNA (eRNA), and/or RNA transcribed from transposable elements (repeat RNA); these caRNA species can be referred to herein as chromatin-associated regulatory RNAs (carRNAs). [0197] In certain aspects, caRNAs of interest as m5C RNA features are described herein in Table 1, and in particular, Table 2. In certain aspects, caRNAs of particular interest as m5C RNA features are described herein in Table 1 as SEQ ID NOs: 100-614 (e.g., SEQ ID NOs: 104 and 107 in Table 2). In certain aspects, caRNAs of interest as m5C RNA features comprise, consist essentially of, or consist of HERVH-int repeats. In certain aspects, caRNAs of interest as m5C RNA features comprise, consist essentially of, or consist of HERVH-int repeats identified in Table 1 (shaded rows) on Chr1, Chr2, Chr3, Chr4, Chr8, Chr11, Chr12, and ChrX. 
In certain aspects, caRNAs of interest as m5C RNA features in leukemia cells comprise, consist essentially of, or consist of any one or more of the following HERVH-int repeats: chr1:22997913-23003991;-;HERVH-int,LTR,ERV1;2,7713,(0) || chr1:35132025-35137611;-;HERVH-int,LTR,ERV1;1,7680,(33) || chr11:67842303-67856609;-;HERVH-int,LTR,ERV1;1,7713,(0) || chr11:71737927-71752300;+;HERVH-int,LTR,ERV1;1,7713,(0) || chr12:8279422-8293739;-;HERVH-int,LTR,ERV1;1,7713,(0) || chr2:64252898-64257196;+;HERVH-int,LTR,ERV1;1,7713,(0) || chr2:170977859-170982983;-;HERVH-int,LTR,ERV1;1,7713,(0) || chr3:44345027-44350026;-;HERVH-int,LTR,ERV1;1,7713,(0) || chr3:130138040-130152338;-;HERVH-int,LTR,ERV1;1,7713,(0) || chr4:71031921-71036834;+;HERVH-int,LTR,ERV1;1,7713,(0) || chr8:8169077-8183461;+;HERVH-int,LTR,ERV1;1,7713,(0) || chr8:12482202-12496589;-;HERVH-int,LTR,ERV1;1,7713,(0) || chrX:71264927-71272291;+;HERVH-int,LTR,ERV1;1,7713,(0). [0198] In some aspects, technologies described herein may comprise increasing one or more m5C marks at one or more loci, such as but not limited to, increasing m5C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, m5C marks.
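For illustration, the HERVH-int locus identifiers listed in paragraph [0197] follow a regular delimited layout (coordinates; strand; repeat name, class, and family; followed by three trailing values) and can be parsed into structured records. The following is a minimal sketch; the class and function names are hypothetical, and the three trailing values are kept as an uninterpreted string because their meaning is not stated in the passage.

```python
# Illustrative sketch only: class and function names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatLocus:
    chrom: str
    start: int
    end: int
    strand: str
    name: str
    repeat_class: str
    family: str
    trailing: str  # e.g. "1,7713,(0)"; kept verbatim, meaning not stated above


def parse_locus(entry: str) -> RepeatLocus:
    """Parse one identifier such as 'chr8:8169077-8183461;+;HERVH-int,LTR,ERV1;1,7713,(0)'."""
    entry = "".join(entry.split())  # drop any whitespace left by line wrapping
    coords, strand, annotation, trailing = entry.split(";")
    chrom, span = coords.split(":")
    start, end = span.split("-")
    name, repeat_class, family = annotation.split(",")
    return RepeatLocus(chrom, int(start), int(end), strand,
                       name, repeat_class, family, trailing)


def parse_loci(text: str) -> list[RepeatLocus]:
    """Parse a '||'-separated list of identifiers."""
    return [parse_locus(part) for part in text.split("||") if part.strip()]


loci = parse_loci(
    "chr1:22997913-23003991;-;HERVH-int,LTR,ERV1;2,7713,(0) || "
    "chr8:8169077-8183461;+;HERVH-int,LTR,ERV1;1,7713,(0)"
)
for locus in loci:
    print(locus.chrom, locus.strand, locus.end - locus.start)
```

Records parsed in this way could then be grouped by chromosome or intersected with guide-design coordinates; any such downstream use is outside what the passage itself describes.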
[0199] In some aspects, technologies described herein may comprise decreasing one or more m5C marks at one or more loci, such as but not limited to, decreasing m5C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, m5C marks. [0200] In some aspects, technologies described herein may comprise decreasing recognition (e.g., reading) of one or more m5C marks at one or more loci by one or more complexes comprising deubiquitination activity (e.g., PR-DUB, PRC1, PRC2, etc.), such as but not limited to, decreasing recognition of m5C marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, m5C marks.
B. Targeting RNAs [0201] In certain aspects, provided herein are targeting RNAs that may guide a protein and/or ribonucleoprotein complex to a target RNA of interest. [0202] In certain aspects, transcription of a targeting RNA is driven by a promoter comprising a Polymerase III promoter (i.e., a promoter that can drive Pol III mediated transcription). In certain aspects, transcription of a targeting RNA is driven by one or more U6 promoters. In certain aspects, a targeting RNA is transcribed by Polymerase III. In certain aspects, transcription of a targeting RNA is driven by a promoter comprising a Polymerase II promoter (i.e., a promoter that can drive Pol II mediated transcription). In some aspects, a targeting RNA is designed to be comprised in an intronic sequence. In some aspects, a targeting RNA is created ex vivo. In some aspects, a targeting RNA is created in vivo. In some aspects, a targeting RNA can guide an endogenous protein and/or ribonucleoprotein complex to a target RNA of interest. [0203] In some aspects, provided herein are CRISPR/Cas system ancillary components, such as functional targeting RNA species, for example but not limited to, CRISPR RNA, trans-activating CRISPR RNA (tracrRNA), and/or gRNA. [0204] In some aspects, provided herein are targeting RNA species that mediate greater than or equal to about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any range derivable therein, RNA m5C installation or erasure in a target polynucleotide population. In some aspects, targeting RNA species comprise gRNA molecules. 
In some aspects, a gRNA molecule is engineered to provide improved functionality relative to a non-engineered gRNA. [0205] In some aspects, an RNA targeting element comprises a gRNA sequence, which may comprise or consist of a polynucleotide sequence, with about, exactly, or at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to any one of SEQ ID NOs: 100-614 or to any sequence complementary thereto that is
at least, exactly, or 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length. V. Vectors and Nanoparticles A. Vectors [0206] Among other things, the present disclosure provides that in some aspects, polypeptides and/or oligonucleotides described herein are encoded by a polynucleotide, such as a vector comprising a polynucleotide (e.g., a polynucleotide construct). Vectors comprising polynucleotide constructs according to the present disclosure include all those known in the art, including cosmids, plasmids (e.g., naked or contained in liposomes) and viral constructs (e.g., lentiviral, retroviral, adenoviral, and adeno-associated viral constructs) that incorporate a polynucleotide encoding a polypeptide with m5C RNA writer, reader, and/or eraser functionality and/or targeting element described herein, or characteristic portions thereof (e.g., as utilized herein, a “characteristic portion thereof” refers to the portion of said protein required to perform the desired function, e.g., it comprises the ability to impact m5C activity and/or levels, for example, a polypeptide with m5C RNA writer, reader, and/or eraser functionality, or the ability to inhibit the same, in a site-specific or non-site specific manner). Those of skill in the art will be capable of selecting suitable constructs, as well as cells, for making any of the polynucleotides described herein. In some aspects, a construct is a plasmid (i.e., a circular DNA molecule that can autonomously replicate inside a cell). In some aspects, a construct can be a cosmid (e.g., pWE or sCos series). [0207] In some aspects, a construct is a viral construct. In some aspects, a viral construct is a lentivirus, retrovirus, adenovirus, or adeno-associated virus construct. In some aspects, a construct is an adeno-associated virus (AAV) construct (see, e.g., Asokan et al., Mol. 
Ther. 20: 699-708, 2012, which is incorporated herein by reference for the purposes described herein). In some aspects, a viral construct is an adenovirus construct. In some aspects, a viral construct may also be based on or derived from an alphavirus. Alphaviruses include, but are not limited to, Sindbis (and VEEV) virus, Aura virus, Babanki virus, Barmah Forest virus, Bebaru virus, Cabassou virus, Chikungunya virus, Eastern equine encephalitis virus, Everglades virus, Fort Morgan virus, Getah virus, Highlands J virus, Kyzylagach virus, Mayaro virus, Me Tri virus, Middelburg virus, Mosso das Pedras virus, Mucambo virus, Ndumu virus, O'nyong-nyong virus, Pixuna virus, Rio Negro virus, Ross River virus, Salmon pancreas disease virus, Semliki Forest virus, Southern elephant seal virus, Tonate virus, Trocara virus, Una virus, Venezuelan equine encephalitis virus, Western equine encephalitis virus, and Whataroa virus. Generally, the genome of such viruses encodes nonstructural (e.g., replicon) and structural proteins (e.g.,
capsid and envelope) that can be translated in the cytoplasm of the host cell. Ross River virus, Sindbis virus, Semliki Forest virus (SFV), and Venezuelan equine encephalitis virus (VEEV) have all been used to develop viral constructs for coding sequence delivery. Pseudotyped viruses may be formed by combining alphaviral envelope glycoproteins and retroviral capsids. Examples of alphaviral constructs can be found in U.S. Publication Nos. 20150050243, 20090305344, and 20060177819; constructs and methods of their making are incorporated herein by reference for the purposes described herein. [0208] In some aspects, constructs provided herein can be of different sizes. In some aspects, a construct is a plasmid and can include a total length of up to about 1 kb, up to about 2 kb, up to about 3 kb, up to about 4 kb, up to about 5 kb, up to about 6 kb, up to about 7 kb, up to about 8 kb, up to about 9 kb, up to about 10 kb, up to about 11 kb, up to about 12 kb, up to about 13 kb, up to about 14 kb, or up to about 15 kb. In some aspects, a construct is a plasmid and can have a total length in a range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 1 kb to about 11 kb, about 1 kb to about 12 kb, about 1 kb to about 13 kb, about 1 kb to about 14 kb, or about 1 kb to about 15 kb. [0209] In some aspects, a construct is a viral construct and can have a total number of nucleotides of up to 10 kb. In some aspects, a viral construct can have a total number of nucleotides in the range of about 4.5 kb to 5 kb, or about 4.7 kb. 
In some aspects, a viral construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 2 kb to about 9 kb, about 2 kb to about 10 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 3 kb to about 6 kb, about 3 kb to about 7 kb, about 3 kb to about 8 kb, about 3 kb to about 9 kb, about 3 kb to about 10 kb, about 4 kb to about 5 kb, about 4 kb to about 6 kb, about 4 kb to about 7 kb, about 4 kb to about 8 kb, about 4 kb to about 9 kb, about 4 kb to about 10 kb, about 5 kb to about 6 kb, about 5 kb to about 7 kb, about 5 kb to about 8 kb, about 5 kb to about 9 kb, about 5 kb to about 10 kb, about 6 kb to about 7 kb, about 6 kb to about 8 kb, about 6 kb to about 9 kb, about 6 kb to about 10 kb, about 7 kb to about 8 kb, about 7 kb to about 9 kb, about 7 kb to about 10 kb, about 8 kb to about 9 kb, about 8 kb to about 10 kb, or about 9 kb to about 10 kb.
[0210] In some aspects, a construct is a lentivirus construct and can have a total number of nucleotides of up to 8 kb. In some examples, a lentivirus construct can have a total number of nucleotides of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 3 kb to about 6 kb, about 3 kb to about 7 kb, about 3 kb to about 8 kb, about 4 kb to about 5 kb, about 4 kb to about 6 kb, about 4 kb to about 7 kb, about 4 kb to about 8 kb, about 5 kb to about 6 kb, about 5 kb to about 7 kb, about 5 kb to about 8 kb, about 6 kb to about 8 kb, about 6 kb to about 7 kb, or about 7 kb to about 8 kb. [0211] In some aspects, a construct is an adenovirus construct and can have a total number of nucleotides of up to 8 kb. In some aspects, an adenovirus construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 3 kb to about 6 kb, about 3 kb to about 7 kb, about 3 kb to about 8 kb, about 4 kb to about 5 kb, about 4 kb to about 6 kb, about 4 kb to about 7 kb, about 4 kb to about 8 kb, about 5 kb to about 6 kb, about 5 kb to about 7 kb, about 5 kb to about 8 kb, about 6 kb to about 7 kb, about 6 kb to about 8 kb, or about 7 kb to about 8 kb. 
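As a purely illustrative aid, the upper size bounds recited in paragraphs [0208] to [0211] (plasmid up to about 15 kb; viral construct up to 10 kb; lentivirus and adenovirus constructs up to 8 kb) can be collected into a lookup table for checking a candidate construct length. The following is a minimal sketch; the dictionary keys and function names are hypothetical, the bounds are simply the figures recited above rather than measured packaging limits, and the qualifier "about" is treated as an exact cutoff for simplicity.

```python
# Illustrative sketch only: keys and function names are hypothetical. The
# values are the upper bounds recited in the passage above, in kilobases.
MAX_LENGTH_KB = {
    "plasmid": 15.0,
    "viral": 10.0,
    "lentivirus": 8.0,
    "adenovirus": 8.0,
}


def fits_recited_bound(construct_type: str, length_bp: int) -> bool:
    """True if a construct of length_bp is within the recited upper bound."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return length_bp / 1000 <= MAX_LENGTH_KB[construct_type]


def compatible_types(length_bp: int) -> list[str]:
    """Alphabetical list of construct types whose recited bound admits length_bp."""
    return sorted(
        name for name in MAX_LENGTH_KB if fits_recited_bound(name, length_bp)
    )


print(compatible_types(9000))
```

A check of this kind only compares a length against the recited figures; it says nothing about whether a particular payload would actually package or express in a given system.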
[0212] Any of the constructs described herein can further include a control sequence, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or additional untranslated regions which may house pre- or post-transcriptional regulatory and/or control elements. In some aspects, a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter. Non-limiting examples of control sequences are described herein. 1. AAV particles [0213] Among other things, the present disclosure provides AAV particles that comprise a polynucleotide construct encoding a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element, and an AAV capsid. In some aspects, AAV particles can be described as having a serotype, which is a description of the construct strain and the capsid strain. For example, in some aspects an AAV particle may be described as AAV2,
wherein the particle has an AAV2 capsid and a construct that comprises characteristic AAV2 Inverted Terminal Repeats (ITRs). In some aspects, an AAV particle may be described as a pseudotype, wherein the capsid and construct are derived from different AAV strains, for example, AAV2/9 would refer to an AAV particle that comprises a construct utilizing the AAV2 ITRs and an AAV9 capsid. Additional examples of pseudotyped AAV vectors include, but are not limited to, AAV2/1, AAV2/2, AAV2/3, AAV2/4, AAV2/5, AAV2/6, AAV2/7, AAV2/8 and AAV2/9. [0214] In some aspects, AAV particles suitable for use according to the present disclosure may comprise or be derived from any natural or recombinant AAV serotype. In some aspects, an AAV according to the present disclosure can be selected from natural serotypes such as AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, and AAV12; or pseudotypes, chimeras, and variants thereof. [0215] As used herein, the term "chimera" when referring to an AAV vector, or a "chimeric AAV vector", refers to an AAV vector which comprises a capsid containing VP1, VP2 and VP3 proteins from at least two different AAV serotypes; or alternatively, which comprises VP1, VP2 and VP3 proteins, at least one of which comprises at least a portion from another AAV serotype. Examples of chimeric AAV vectors include, but are not limited to, AAV-DJ, AAV-DJ/8, AAV2G9, AAV2i8, AAV2i8G9, AAV8G9, and AAV9i1. 
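As a hedged illustration of the naming convention described in paragraph [0213], the following Python sketch splits a designation such as AAV2/9 into the strain that supplies the ITRs and the strain that supplies the capsid. The names AavDesignation, parse_aav_designation, and is_pseudotype are hypothetical and are not part of the disclosure. The sketch assumes only the simple "AAVx" and "AAVx/y" forms; it is not intended for isolate names such as AAV106.1/hu.37, in which the slash has a different meaning.

```python
from typing import NamedTuple


class AavDesignation(NamedTuple):
    """Hypothetical container for the two parts of an AAV designation."""
    itr_serotype: str     # strain supplying the construct ITRs
    capsid_serotype: str  # strain supplying the capsid


def parse_aav_designation(name: str) -> AavDesignation:
    """Parse 'AAV2' or 'AAV2/9' into ITR and capsid serotypes.

    Assumes the simple convention in which the part before the slash
    names the ITR source and the part after it names the capsid.
    """
    name = name.strip()
    if not name.upper().startswith("AAV"):
        raise ValueError(f"not an AAV designation: {name!r}")
    body = name[3:]
    if "/" in body:
        itr, capsid = body.split("/", 1)
    else:
        # A plain serotype: ITRs and capsid come from the same strain.
        itr = capsid = body
    if not itr or not capsid:
        raise ValueError(f"incomplete AAV designation: {name!r}")
    return AavDesignation("AAV" + itr, "AAV" + capsid)


def is_pseudotype(designation: AavDesignation) -> bool:
    """True when the ITRs and the capsid derive from different strains."""
    return designation.itr_serotype != designation.capsid_serotype
```

Under these assumptions, parse_aav_designation("AAV2/9") yields an ITR source of AAV2 and a capsid source of AAV9, which is_pseudotype reports as a pseudotype, whereas "AAV2" is not.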
[0216] In some aspects, an AAV serotype and/or pseudotype according to the present invention is selected from the group comprising or consisting of AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV106.1/hu.37, AAV114.3/hu.40, AAV127.2/hu.41, AAV127.5/hu.42, AAV128.1/hu.43, AAV128.3/hu.44, AAV130.4/hu.48, AAV145.1/hu.53, AAV145.5/hu.54, AAV145.6/hu.55, AAV16.12/hu.11, AAV16.3, AAV16.8/hu.10, AAV161.10/hu.60, AAV161.6/hu.61, AAV1-7/rh.48, AAV1-8/rh.49, AAV2i8, AAV2i8G9, AAV2-15/rh.62, AAV223.1, AAV223.2, AAV223.4, AAV223.5, AAV223.6, AAV223.7, AAV2-3/rh.61, AAV24.1, AAV2-4/rh.50, AAV2-5/rh.51, AAV2.5T, AAV27.3, AAV29.3/bb.1, AAV29.5/bb.2, AAV2G9, AAV3B, AAV3.1/hu.6, AAV3.1/hu.9, AAV3-11/rh.53, AAV3-3, AAV33.12/hu.17, AAV33.4/hu.15, AAV33.8/hu.16, AAV3-9/rh.52, AAV3a, AAV3b, AAV4-19/rh.55, AAV42.12, AAV42-10, AAV42-11, AAV42-12, AAV42-13, AAV42-15, AAV42-1b, AAV42-2, AAV42-3a, AAV42-3b, AAV42-4, AAV42-5a, AAV42-5b, AAV42-6b, AAV42-8, AAV42-aa, AAV43-1, AAV43-12, AAV43-20, AAV43-21, AAV43-23, AAV43-25, AAV43-5, AAV4-4, AAV44.1, AAV44.2, AAV44.5, AAV46.2/hu.28, AAV46.6/hu.29, AAV4-8/rh.64, AAV4-9/rh.54, AAV52.1/hu.20, AAV52/hu.19, AAV5-22/rh.58, AAV5-3/rh.57, AAV54.1/hu.21,
AAV54.2/hu.22, AAV54.4R/hu.27, AAV54.5/hu.23, AAV54.7/hu.24, AAV58.2/hu.25, AAV6.1, AAV6.1.2, AAV6.2, AAV7m8, AAV7.2, AAV7.3/hu.7, AAV-8b, AAV8G9, AAV-8h, AAV9i1, AAV9.11, AAV9.13, AAV9.16, AAV9.24, AAV9.45, AAV9.47, AAV9.61, AAV9.68, AAV9.84, AAV9.9, AAVcy.2, AAVcy.3, AAVcy.4, AAVcy.5, AAVcy.5R1, AAVcy.5R2, AAVcy.5R3, AAVcy.5R4, AAVcy.6, AAVhu.1, AAVhu.2, AAVhu.3, AAVhu.4, AAVhu.5, AAVhu.6, AAVhu.7, AAVhu.8, AAVhu.9, AAVhu.10, AAVhu.11, AAVhu.12, AAVhu.13, AAVhu.14/9, AAVhu.15, AAVhu.16, AAVhu.17, AAVhu.18, AAVhu.19, AAVhu.20, AAVhu.21, AAVhu.22, AAVhu.23.2, AAVhu.24, AAVhu.25, AAVhu.27, AAVhu.28, AAVhu.29, AAVhu.29R, AAVhu.31, AAVhu.32, AAVhu.34, AAVhu.35, AAVhu.37, AAVhu.39, AAVhu.40, AAVhu.41, AAVhu.42, AAVhu.43, AAVhu.44, AAVhu.44R1, AAVhu.44R2, AAVhu.44R3, AAVhu.45, AAVhu.46, AAVhu.47, AAVhu.48, AAVhu.48R1, AAVhu.48R2, AAVhu.48R3, AAVhu.49, AAVhu.51, AAVhu.52, AAVhu.53, AAVhu.54, AAVhu.55, AAVhu.56, AAVhu.57, AAVhu.58, AAVhu.60, AAVhu.61, AAVhu.63, AAVhu.64, AAVhu.66, AAVhu.67, AAVpi.1, AAVpi.2, AAVpi.3, AAVrh.2, AAVrh.2R, AAVrh.8, AAVrh.8R, AAVrh8R R533A mutant, AAVrh8R A586R mutant, AAVrh.10, AAVrh.12, AAVrh.13, AAVrh.13R, AAVrh.14, AAVrh.17, AAVrh.18, AAVrh.19, AAVrh.20, AAVrh.21, AAVrh.22, AAVrh.23, AAVrh.24, AAVrh.25, AAVrh.31, AAVrh.32, AAVrh.33, AAVrh.34, AAVrh.35, AAVrh.36, AAVrh.37, AAVrh.37R2, AAVrh.38, AAVrh.39, AAVrh.40, AAVrh.43, AAVrh.44, AAVrh.45, AAVrh.46, AAVrh.47, AAVrh.48, AAVrh.48.1, AAVrh.48.1.2, AAVrh.48.2, AAVrh.49, AAVrh.50, AAVrh.51, AAVrh.52, AAVrh.53, AAVrh.54, AAVrh.55, AAVrh.56, AAVrh.57, AAVrh.58, AAVrh.59, AAVrh.60, AAVrh.61, AAVrh.62, AAVrh.64, AAVrh.64R1, AAVrh.64R2, AAVrh.65, AAVrh.67, AAVrh.68, AAVrh.69, AAVrh.70, AAVrh.72, AAVrh.73, AAVrh.74, AAV-PHP.B, AAVPHP.A, AAV-G2B-26, AAV-G2B-13, AAV-TH1.1-32, AAVTH1.1-35, AAV-PHP.B2, AAV-PHP.B3, AAV-PHP.N/PHP.B-DGT, AAV-PHP.B-EST, AAV-PHP.B-GGT, AAV-PHP.BATP, AAV-PHP.B-ATT-T, AAV-PHP.B-DGT-T, AAV-PHP.B-GGT-T, AAV-PHP.B-SGS, AAV-PHP.B-AQP, AAV-PHP.B-QQP, AAV-PHP.B-SNP(3), 
AAV-PHP.B-SNP, AAV-PHP.B-QGT, AAV-PHP.B-NQT, AAV-PHP.B-EGS, AAV-PHP.BSGN, AAV-PHP.B-EGT, AAV-PHP.B-DST, AAV-PHP.BDST, AAV-PHP.B-STP, AAV-PHP.B-PQP, AAV-PHP.BSQP, AAV-PHP.B-Q1P, AAV-PHP.B-TMP, AAV-PHP.BTTP, AAV-PHP.S/G2A12, AAV-G2A15/G2A3, AAV-G2B4, AAV-G2B5, PHP.S, AAAV, AAV A3.3, AAV A3.4, AAV A3.5, AAV A3.7, AAV CBr-7.3, AAV CBr-7.1, AAV CBr-7.10, AAV CBr-7.2, AAV CBr-7.4, AAV CBr-7.5, AAV CBr-7.7, AAV CBr-7.8, AAV CBr-B7.3, AAV CBr-B7.4, AAV CBr-E1, AAV CBr-E2, AAV CBr-E3, AAV CBr-E4, AAV CBr-E5,
AAV CBr-e5, AAV CBr-E6, AAV CBr-E7, AAV CBr-E8, AAV CHt-1, AAV CHt-2, AAV CHt-3, AAV CHt-6.1, AAV CHt-6.10, AAV CHt-6.5, AAV CHt-6.6, AAV CHt-6.7, AAV CHt-6.8, AAV CHt-P1, AAV CHt-P2, AAV CHt-P5, AAV CHt-P6, AAV CHt-P8, AAV CHt-P9, AAV CKd-N4, AAV CKd-1, AAV CKd-10, AAV CKd-2, AAV CKd-3, AAV CKd-4, AAV CKd-6, AAV CKd-7, AAV CKd-8, AAV CKd-B1, AAV CKd-B2, AAV CKd-B3, AAV CKdB4, AAV CKd-B5, AAV CKd-B6, AAV CKd-B7, AAV CKd-B8, AAV CKd-H1, AAV CKd-H2, AAV CKd-H3, AAV CKd-H4, AAV CKd-H5, AAV CKd-H6, AAV CKd-N3, AAV CKd-N9, AAV CLg-F1, AAV CLg-F2, AAV CLg-F3, AAV CLg-F4, AAV CLg-F5, AAV CLg-F6, AAV CLg-F7, AAV CLg-F8, AAV CLv-M9, AAV CLv-R6, AAV CLv-1, AAV CLv1-1, AAV CLv1-10, AAV CLv1-2, AAV CLv-12, AAV CLv1-3, AAV CLv-13, AAV CLv1-4, AAV CLv1-7, AAV CLv1-8, AAV CLv1-9, AAV CLv-2, AAV CLv-3, AAV CLv-4, AAV CLv-6, AAV CLv-8, AAV CLv-D1, AAV CLv-D2, AAV CLv-D3, AAV CLv-D4, AAV CLv-D5, AAV CLv-D6, AAV CLv-D7, AAV CLv-D8, AAV CLv-E1, AAV CLv-K1, AAV CLv-K3, AAV CLv-K6, AAV CLv-L4, AAV CLv-L5, AAV CLv-L6, AAV CLv-M1, AAV CLv-M11, AAV CLv-M2, AAV CLv-M5, AAV CLv-M6, AAV CLvM7, AAV CLv-M8, AAV CLv-R1, AAV CLv-R2, AAV CLv-R3, AAV CLv-R4, AAV CLv-R5, AAV CLv-R7, AAV CLv-R8, AAV CLv-R9, AAV CSp-8.10, AAV CSp-1, AAV CSp-10, AAV CSp-11, AAV CSp-2, AAV CSp-3, AAV CSp-4, AAV CSp-6, AAV CSp-7, AAV CSp-8, AAV CSp-8.2, AAV CSp-8.4, AAV CSp-8.5, AAV CSp-8.6, AAV CSp-8.7, AAV CSp-8.8, AAV CSp-8.9, AAV CSp-9, AAVLK08, AAV-LK15, AAV Shuffle 100-1, AAV Shuffle 100-2, AAV Shuffle 100-3, AAV Shuffle 100-7, AAV Shuffle 10-2, AAV Shuffle 10-6, AAV Shuffle 10-8, AAV SM 100-10, AAV SM 100-3, AAV SM 10-1, AAV SM 10-2, AAV SM 10-8, AAV.VR-355, AAV-b, AAVC1, AAVC2, AAVC5, AAVCh.5, AAVCh.5R1, AAV-DJ, AAV-DJ8, AAVF1/HSC1, AAVF11/HSC11, AAVF12/HSC12, AAVF13/HSC13, AAVF14/HSC14, AAVF15/HSC15, AAVF16/HSC16, AAVF17/HSC17, AAVF2/HSC2, AAVF3, AAVF3/HSC3, AAVF4/HSC4, AAVF5, AAVF5/HSC5, AAVF6/HSC6, AAVF7/HSC7, AAVF8/HSC8, AAVF9/HSC9, AAV-h, AAVH-1/hu.1, AAVH2, AAVH-5/hu.3, AAVH6, AAVhE1.1, AAVhEr1.14, AAVhEr1.16, 
AAVhEr1.18, AAVhER1.23, AAVhEr1.35, AAVhEr1.36, AAVhEr1.5, AAVhEr1.7, AAVhEr1.8, AAVhEr2.16, AAVhEr2.29, AAVhEr2.30, AAVhEr2.31, AAVhEr2.36, AAVhEr2.4, AAVhEr3.1, AAVLG-10/rh.40, AAVLG-4/rh.38, AAVLG-9/hu.39, AAV-LK01, AAV-LK02, AAV-LK03, AAV-LK04, AAV-LK05, AAV-LK06, AAVLK07, AAV-LK09, AAV-LK10, AAV-LK11, AAV-LK12, AAV-LK13, AAV-LK14, AAV-LK16, AAV-LK17, AAVLK18, AAV-LK19, AAVN721-8/rh.43, AAV-PAEC, AAVPAEC12, AAV-PAEC11, AAV-PAEC2, AAV-PAEC4,
AAVPAEC6, AAV-PAEC7, AAV-PAECS, Anc80, Anc80L65, Anc81, Anc82, Anc83, Anc84, Anc94, Anc110, Anc113, Anc126, Anc127, BAAV, BNP61 AAV, BNP62 AAV, BNP63 AAV, bovine AAV, caprine AAV, Japanese AAV10 serotype, UPENN AAV10, VOY101, and VOY201. [0217] In some aspects, an AAV is an AAV variant that has been genetically modified, e.g., by substitution, deletion or addition of one or several amino acid residues in one or more capsid proteins. Examples of such variants include, but are not limited to, AAV2 with one or more of Y444F, Y500F, Y730F and/or S662V mutations; AAV3 with one or more of Y705F, Y731F and/or T492V mutations; and AAV6 with one or more of S663V and/or T492V mutations. [0218] In some aspects, an AAV capsid is modified to comprise at least one surface-bound saccharide or a derivative thereof. As used herein, the term "surface-bound", when referring to the at least one saccharide, means that said at least one saccharide is bound to and exposed at the outer surface of the AAV vector. Suitable examples of saccharides include, but are not limited to, monosaccharides, oligosaccharides, polysaccharides, and derivatives thereof. 2. AAV constructs [0219] In some aspects, the present disclosure provides polynucleotide vectors (e.g., polynucleotide constructs) that comprise a nucleotide sequence encoding a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element. In some aspects described herein, a polynucleotide vector comprising a nucleotide sequence encoding a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element, can be comprised in an AAV capsid to produce an AAV particle (e.g., an AAV particle comprises an AAV construct comprised in an AAV capsid). [0220] In some aspects, a polynucleotide construct comprises one or more components derived from or modified from a naturally occurring AAV genomic construct. 
In some aspects, a sequence derived from an AAV construct is an AAV1 construct, an AAV2 construct, an AAV3 construct, an AAV4 construct, an AAV5 construct, an AAV6 construct, an AAV7 construct, an AAV8 construct, an AAV DJ/8 construct, an AAV9 construct, an AAV2.7m8 construct, an AAV8BP2 construct, an AAV293 construct, an AAVPhp.B construct, or AAVPhp.eB construct (see, e.g., Chan et al., 2017). Additional exemplary AAV constructs that can be used herein are known in the art. See, e.g., Kanaan et al., Mol. Ther. Nucleic Acids 8: 184-197, 2017; Li et al., Mol. Ther. 16(7): 1252-1260, 2008; Adachi et al., Nat. Commun. 5: 3075, 2014; Isgrig et al., Nat. Commun. 10(1): 427, 2019; and Gao et al., J. Virol. 78(12):
6381-6388, 2004; each of which is incorporated herein by reference for the purposes described herein. [0221] In some aspects, AAV derived sequences (e.g., which are comprised in a polynucleotide construct) typically include the cis-acting 5' and 3' ITR sequences (see, e.g., B. J. Carter, in "Handbook of Parvoviruses," ed., P. Tijssen, CRC Press, pp. 155-168, 1990, which is incorporated herein by reference for the purposes described herein). Typical AAV2-derived ITR sequences are approximately 145 nucleotides in length. In some aspects, at least or exactly 80% of a typical ITR sequence (e.g., at least or exactly 85%, at least or exactly 90%, at least or exactly 95%, or exactly 100%, etc.) is incorporated into a construct provided herein. The ability to modify these ITR sequences is within the skill of the art (see, e.g., texts such as Sambrook et al., "Molecular Cloning. A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory, New York, 1989; and K. Fisher et al., J. Virol. 70:520-532, 1996, each of which is incorporated herein by reference for the purposes described herein). In some aspects, any of the coding sequences and/or constructs described herein are flanked by 5' and 3' AAV ITR sequences. The AAV ITR sequences may be obtained from any known AAV, including presently identified AAV types. [0222] In some aspects, polynucleotide constructs described in accordance with this disclosure and in a pattern known to the art (see, e.g., Asokan et al., Mol. Ther. 20: 699-708, 2012, which is incorporated herein by reference for the purposes described herein) typically comprise a coding sequence or a portion thereof, at least one regulatory and/or control sequence, and optionally 5' and 3' AAV inverted terminal repeats (ITRs). In some aspects, provided constructs can be packaged into a capsid to create an AAV particle. An AAV particle may be delivered to a selected target cell. 
In some aspects, provided constructs comprise an additional optional coding sequence that is a nucleic acid sequence (e.g., inhibitory nucleic acid sequence), heterologous to the construct sequences, which encodes a polypeptide, protein, functional RNA molecule (e.g., miRNA, miRNA inhibitor) or other gene product of interest. In some aspects, a nucleic acid coding sequence is operatively linked to regulatory and/or control components in a manner that permits coding sequence transcription, translation, and/or expression in a cell of a target tissue. [0223] In some aspects, an unmodified AAV endogenous genome includes two open reading frames, "cap" and "rep," which are flanked by ITRs. In some aspects, recombinant AAV constructs similarly comprise one or more open reading frames flanked by ITR sequences. In some aspects, an AAV construct also comprises conventional control elements that are operably linked to the coding sequence in a manner that permits its transcription,
translation and/or expression in a cell transfected with the polynucleotide construct or infected with a virus particle produced by the disclosure. In some aspects, an AAV construct optionally comprises a promoter, an enhancer, an untranslated region (e.g., a 5' UTR, 3' UTR), a Kozak sequence, an internal ribosomal entry site (IRES), splicing sites (e.g., an acceptor site, a donor site), a polyadenylation site, or any combination thereof. [0224] In some aspects, a construct is an AAV construct. In some aspects, an AAV construct can include at least 500 bp, at least 1 kb, at least 1.5 kb, at least 2 kb, at least 2.5 kb, at least 3 kb, at least 3.5 kb, at least 4 kb, at least 4.5 kb, or at least 4.7 kb. In some aspects, an AAV construct can include at most 7.5 kb, at most 7 kb, at most 6.5 kb, at most 6 kb, at most 5.5 kb, at most 5 kb, at most 4.5 kb, at most 4 kb, at most 3.5 kb, at most 3 kb, or at most 2.5 kb. In some aspects, an AAV construct can include about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, or about 4 kb to about 5 kb. [0225] Any of the constructs described herein can further include regulatory and/or control sequences, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or any combination thereof. In some aspects, a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter. Non-limiting examples of control sequences are described herein and others are known in the art. 3. 
AAV capsids [0226] In some aspects, the present disclosure provides one or more polynucleotide constructs packaged into an AAV capsid. In some aspects, an AAV capsid is from or is derived from an AAV capsid of an AAV2, 3, 4, 5, 6, 7, 8, 9, 10, rh8, rh10, rh39, rh43 or ancestral serotype, or one or more hybrids thereof. In some aspects, an AAV capsid is from an AAV ancestral serotype. In some aspects, an AAV capsid is an ancestral (Anc) AAV capsid. An Anc capsid is created from a construct sequence that is constructed using evolutionary probabilities and evolutionary modeling to determine a probable ancestral sequence. Thus, an Anc capsid/construct sequence is not known to have existed in nature. As provided herein, in some aspects, any combination of AAV capsids and AAV constructs (e.g., comprising AAV ITRs) may be used in recombinant AAV particles of the present disclosure. 4. Exemplary AAV construct components
a. Inverted Terminal Repeat Sequences (ITRs) [0227] AAV derived sequences of a construct typically comprise the cis-acting 5' and 3' ITRs (see, e.g., B. J. Carter, in "Handbook of Parvoviruses", ed., P. Tijssen, CRC Press, pp. 155-168 (1990), which is incorporated herein by reference for the purposes described herein). Generally, ITRs are able to form a hairpin. The ability to form a hairpin can contribute to an ITR's ability to self-prime, allowing primase-independent synthesis of a second DNA strand. ITRs can also aid in efficient encapsidation of an AAV construct in an AAV particle. [0228] An AAV particle of the present disclosure can comprise an AAV construct comprising a coding sequence (e.g., encoding a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element) and associated elements flanked by 5' and 3' AAV ITR sequences. In some aspects, an ITR is or comprises approximately 130 nucleotides. In some aspects, an ITR is or comprises approximately 145 nucleotides. In some aspects, an ITR is or comprises 125 to 150 nucleotides. In some aspects, all or substantially all of a sequence encoding an ITR is used. In some aspects, an AAV ITR sequence may be obtained from any known AAV, including presently identified mammalian AAV types. In some aspects, an ITR is an AAV2 ITR. In some aspects, an ITR is an AAV9 ITR. [0229] A non-limiting example of a polynucleotide construct of the present disclosure is a "cis-acting" construct comprising a coding sequence, in which said sequence and any associated regulatory elements are flanked by 5' or "left" and 3' or "right" AAV ITR sequences. 5' and left designations refer to a position of an ITR sequence relative to an entire construct, read left to right, in a sense direction. 
For example, in some aspects, a 5' or left ITR is an ITR that is closest to a promoter (e.g., as opposed to a polyadenylation sequence) for a given construct, when a construct is depicted in a sense orientation, linearly. Likewise, 3' and right designations refer to a position of an ITR sequence relative to an entire construct, read left to right, in a sense direction. For example, in some aspects, a 3' or right ITR is an ITR that is closest to a polyadenylation sequence and/or stop codon (e.g., as opposed to a promoter sequence) for a given construct, when a construct is depicted in a sense orientation, linearly. In general, ITRs as provided herein are depicted in 5' to 3' order in accordance with a sense strand. Accordingly, one of skill in the art will appreciate that a 5' or "left" orientation ITR can also be depicted as a 3' or "right" ITR when converting from sense to antisense direction. Further, it is well within the ability of one of skill in the art to transform a given sense ITR sequence (e.g., a 5'/left AAV ITR) into an antisense sequence (e.g., 3'/right ITR sequence). One of ordinary skill in the art would understand how to modify a given ITR sequence for use as either a 5'/left or 3'/right ITR, or an antisense version thereof.
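The sense-to-antisense conversion mentioned above amounts to taking the reverse complement of a DNA sequence. The following Python sketch illustrates that operation under simple assumptions: the input contains only unambiguous A, C, G, and T bases, and the function name reverse_complement and the six-base example are illustrative placeholders, not an actual ITR sequence from this disclosure.

```python
# Translation table mapping each DNA base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand, written 5' to 3'."""
    invalid = set(seq) - set("ACGTacgt")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


# Toy placeholder sequence (not a real ITR).
sense = "AAGGCT"
antisense = reverse_complement(sense)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when converting a 5'/left ITR depiction into a 3'/right depiction and back.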
b. Promoters [0230] In some aspects, a construct (e.g., an AAV construct) comprises a promoter. The term "promoter" refers to a DNA sequence recognized by enzymes/proteins that can promote and/or initiate transcription of an operably linked gene. For example, a promoter typically refers to, e.g., a nucleotide sequence to which an RNA polymerase and/or any associated factor binds and from which it can initiate transcription. Thus, in some aspects, a construct (e.g., an AAV construct) comprises a coding sequence operably linked to one of the non-limiting example promoters described herein. [0231] In some aspects, a promoter is an inducible promoter, a constitutive promoter, a mammalian cell promoter, a viral promoter, a chimeric promoter, an engineered promoter, a tissue-specific promoter, or any other type of promoter known in the art. In some aspects, a promoter is an RNA polymerase II promoter, such as a mammalian RNA polymerase II promoter. In some aspects, a promoter is an RNA polymerase III promoter, including, but not limited to, an H1 promoter, a human U6 promoter, a mouse U6 promoter, or a swine U6 promoter. A promoter will generally be one that is able to promote transcription in a mammalian cell. [0232] A variety of promoters are known in the art, which in some aspects, can be used herein. 
Non-limiting examples of promoters that can be used herein in some aspects include: human EF1α, human cytomegalovirus (CMV) (US Patent No. 5,168,062, which is incorporated herein by reference for the purposes described herein), human ubiquitin C (UBC), mouse phosphoglycerate kinase 1, polyoma adenovirus, simian virus 40 (SV40), β-globin, β-actin, α-fetoprotein, γ-globin, β-interferon, γ-glutamyl transferase, mouse mammary tumor virus (MMTV), Rous sarcoma virus, rat insulin, glyceraldehyde-3-phosphate dehydrogenase, metallothionein II (MT II), amylase, cathepsin, M1 muscarinic receptor, retroviral LTR (e.g., human T-cell leukemia virus HTLV), AAV ITR, interleukin-2, collagenase, platelet-derived growth factor, adenovirus 5 E2, stromelysin, murine MX gene, glucose regulated proteins (GRP78 and GRP94), α-2-macroglobulin, vimentin, MHC class I gene H-2Kb, HSP70, proliferin, tumor necrosis factor, thyroid stimulating hormone α gene, immunoglobulin light chain, T-cell receptor, HLA DQα and DQβ, interleukin-2 receptor, MHC class II, MHC class II HLA-DRα, muscle creatine kinase, prealbumin (transthyretin), elastase I, albumin gene, c-fos, c-HA-ras, neural cell adhesion molecule (NCAM), H2B (TH2B) histone, rat growth hormone, human serum amyloid (SAA), troponin I (TN I), Duchenne muscular dystrophy, human immunodeficiency virus, and Gibbon Ape Leukemia Virus (GALV) promoters. Additional examples of promoters are known in the art. See, e.g., Lodish, Molecular Cell Biology,
Freeman and Company, New York 2007, which is incorporated herein by reference for the purposes described herein. In some aspects, a promoter is the CMV immediate early promoter. In some aspects, the promoter is a CAG promoter and/or a CAG/CBA promoter. [0233] The term "constitutive" promoter refers to a nucleotide sequence that, when operably linked with a nucleic acid encoding a gene (e.g., encoding a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element), causes RNA to be transcribed from the nucleic acid in a cell under most or all physiological conditions. Examples of constitutive promoters include, without limitation, the retroviral Rous sarcoma virus (RSV) LTR promoter, the cytomegalovirus (CMV) promoter (see, e.g., Boshart et al., Cell 41:521-530, 1985, which is incorporated herein by reference for the purposes described herein), the SV40 promoter, the dihydrofolate reductase promoter, the beta-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1-alpha promoter (Invitrogen). [0234] Inducible promoters allow regulation of gene expression and can be regulated by exogenously supplied compounds, environmental factors such as temperature, or the presence of a specific physiological state, e.g., acute phase, a particular differentiation state of the cell, or in replicating cells only. Inducible promoters and inducible systems are available from a variety of commercial sources, including, without limitation, Invitrogen, Clontech, and Ariad. Additional examples of inducible promoters are known in the art. 
Examples of inducible promoters regulated by exogenously supplied compounds include the zinc-inducible sheep metallothionein (MT) promoter, the dexamethasone (Dex) inducible mouse mammary tumor virus (MMTV) promoter, the T7 polymerase promoter system (see, e.g., WO 98/10088, which is incorporated herein by reference for the purposes described herein), the ecdysone insect promoter (see, e.g., No et al., Proc. Natl. Acad. Sci. U.S.A. 93:3346-3351, 1996, which is incorporated herein by reference for the purposes described herein), the tetracycline-repressible system (see, e.g., Gossen et al., Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551, 1992, which is incorporated herein by reference for the purposes described herein), the tetracycline-inducible system (see, e.g., Gossen et al., Science 268:1766-1769, 1995; see also Harvey et al., Curr. Opin. Chem. Biol. 2:512-518, 1998, each of which is incorporated herein by reference for the purposes described herein), the RU486-inducible system (see, e.g., Wang et al., Nat. Biotech. 15:239-243, 1997, and Wang et al., Gene Ther. 4:432-441, 1997, each of which is incorporated herein by reference for the purposes described herein), and the rapamycin-inducible system (see, e.g., Magari et al., J. Clin. Invest. 100:2865-2872, 1997, which is incorporated herein by reference for the purposes described herein).
[0235] The term "tissue-specific" promoter refers to a promoter that is active only in certain specific cell types and/or tissues (e.g., transcription of a specific gene occurs only within cells expressing transcription regulatory and/or control proteins that bind to the tissue-specific promoter). In some aspects, regulatory and/or control sequences impart tissue-specific gene expression capabilities. In some cases, tissue-specific regulatory and/or control sequences bind tissue-specific transcription factors that induce transcription in a tissue-specific manner. In some aspects, a tissue-specific promoter is a neuron-specific promoter. In some aspects, a tissue-specific promoter is a hematopoietic lineage cell-specific promoter. In some aspects, a tissue-specific promoter is an immune cell-specific promoter. c. Enhancers [0236] In some aspects, a construct can include an enhancer sequence. The term "enhancer" as used herein refers to a nucleotide sequence that can increase the level of transcription of a nucleic acid encoding a protein and/or RNA molecule of interest (e.g., a polypeptide with m5C RNA writer, reader, and/or erasor functionality and/or targeting element), and/or increase or modify the translational efficiency of a transcript following transcription. In some aspects, enhancer sequences (generally 50-1500 bp in length) increase the level of transcription by providing additional binding sites for transcription-associated proteins (e.g., transcription factors), and/or stabilize or modify post-transcriptional regulatory machinery. In some aspects, an enhancer sequence is found within an intronic sequence. In some aspects, an enhancer sequence is found in a 3ʹ and/or 5ʹ UTR. In some aspects, an enhancer region is found downstream of a coding sequence comprising a transgene and proximal to a polyadenylation sequence. 
Unlike promoter sequences, enhancer sequences can act at a much larger distance from the transcription start site. Non-limiting examples of enhancers include a woodchuck hepatitis virus post-transcriptional regulatory element (WPRE), an RSV enhancer, a CMV enhancer, and/or an SV40 enhancer. d. Flanking untranslated regions, 5ʹ UTR and 3ʹ UTR [0237] In some aspects, any of the constructs described herein can include an untranslated region (UTR), such as a 5' UTR or a 3' UTR. UTRs of a gene are transcribed but not translated. A 5' UTR starts at the transcription start site and continues to the start codon but does not include the start codon. A 3' UTR starts immediately following the stop codon and continues until the transcriptional termination signal. The regulatory and/or control features of a UTR can be incorporated into any of the constructs, particles, polynucleotides, compositions, kits, or methods as described herein to enhance or otherwise modulate the expression of a gene.
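The UTR boundaries defined in paragraph [0237] can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the input string is taken to begin at the transcription start site and end at the transcriptional termination point, the first AUG is assumed to be the start codon, and the first in-frame stop codon ends the coding sequence. The function name split_transcript and the example sequence are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(mrna: str) -> Tuple[str, str, str]:
    """Split a transcript into (5' UTR, coding sequence, 3' UTR).

    The 5' UTR excludes the start codon; the 3' UTR begins immediately
    after the stop codon. DNA input is accepted and read as RNA.
    """
    rna = mrna.upper().replace("T", "U")
    start = rna.find("AUG")  # assumption: first AUG is the start codon
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return rna[:start], rna[start:end], rna[end:]
    raise ValueError("no in-frame stop codon found")


# Toy example sequence (illustrative only).
five_utr, cds, three_utr = split_transcript("GGCACCAUGGCUUAAUUUAUU")
```

For the toy input, the sketch returns "GGCACC" as the 5' UTR, "AUGGCUUAA" as the coding sequence including the stop codon, and "UUUAUU" as the 3' UTR.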
[0238] Natural 5' UTRs include a sequence that plays a role in translation initiation. In some aspects, a 5' UTR can comprise sequences, like Kozak sequences, which are commonly known to be involved in the process by which the ribosome initiates translation of many genes. Kozak sequences have the consensus sequence CCR(A/G)CCAUGG, where R is a purine (A or G) three bases upstream of the start codon (AUG), and the start codon is followed by another “G”. In some aspects, 5' UTRs also form secondary structures that are involved in elongation factor binding. In some aspects, a 5' UTR is included in any of the constructs described herein. Non-limiting examples of 5' UTRs, including those from the following genes: albumin, serum amyloid A, Apolipoprotein A/B/E, transferrin, alpha fetoprotein, erythropoietin, and Factor VIII, can be used to enhance expression of a nucleic acid molecule, such as an mRNA. [0239] 3' UTRs are known to have stretches of adenosines and uridines (in the RNA form) or thymidines (in the DNA form) embedded in them. These AU-rich signatures are particularly prevalent in genes with high rates of turnover. Based on their sequence features and functional properties, the AU-rich elements (AREs) can be separated into three classes (see, e.g., Chen et al., Mol. Cell. Biol. 15:5777-5788, 1995; Chen et al., Mol. Cell. Biol. 15:2010-2018, 1995, each of which is incorporated herein by reference for the purposes described herein): Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions. For example, c-Myc and MyoD mRNAs contain class I AREs. Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers. GM-CSF and TNF-alpha mRNAs are examples that contain class II AREs. Class III AREs are less well defined. These U-rich regions do not contain an AUUUA motif; two well-studied examples of this class are c-Jun and myogenin mRNAs. 
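The sequence features described in paragraphs [0238] and [0239] lend themselves to a simple computational check. The Python sketch below is illustrative only: kozak_features reports the two positions the text singles out (a purine three bases upstream of the AUG and a G immediately after it), and classify_are applies a rough reading of the three ARE classes. The function names, the threshold of two AUUUA copies for "several", and the 50% uridine cutoff for "U-rich" are assumptions introduced here, not definitions from this disclosure.

```python
import re
from typing import Dict

PURINES = {"A", "G"}
# Lookahead patterns so that overlapping motif occurrences are all found.
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")
PENTAMER = re.compile(r"(?=AUUUA)")


def kozak_features(mrna: str, start: int) -> Dict[str, bool]:
    """Check the -3 purine and +4 G around the AUG located at `start`."""
    rna = mrna.upper().replace("T", "U")
    if rna[start:start + 3] != "AUG":
        raise ValueError("no AUG at the given position")
    minus3 = rna[start - 3] if start >= 3 else None
    plus4 = rna[start + 3] if start + 3 < len(rna) else None
    return {"purine_at_minus3": minus3 in PURINES, "g_at_plus4": plus4 == "G"}


def classify_are(utr3: str, u_rich_fraction: float = 0.5) -> str:
    """Assign a 3' UTR to ARE class I, II, or III (heuristic sketch)."""
    rna = utr3.upper().replace("T", "U")
    starts = [m.start() for m in NONAMER.finditer(rna)]
    # Class II: at least two nonamers whose 9-nt spans overlap.
    if any(b - a < 9 for a, b in zip(starts, starts[1:])):
        return "class II"
    pentamers = len(PENTAMER.findall(rna))
    # Class I: "several" AUUUA copies (assumed here to mean two or more).
    if pentamers >= 2:
        return "class I"
    # Class III: U-rich region with no AUUUA motif (cutoff is an assumption).
    if rna and pentamers == 0 and rna.count("U") / len(rna) >= u_rich_fraction:
        return "class III"
    return "unclassified"
```

A real classifier would also need to confirm that class I pentamers sit within U-rich regions; this sketch omits that check for brevity.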
[0240] Most proteins binding to the AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA. HuR binds to AREs of all three classes. Engineering HuR-specific binding sites into the 3' UTR of nucleic acid molecules may lead to HuR binding and thus stabilization of the message in vivo. [0241] In some aspects, the introduction, removal, or modification of 3' UTR AREs can be used to modulate the stability of an mRNA encoding a gene of interest. In other aspects, AREs can be removed or mutated to increase the intracellular stability and thus increase translation and production of a protein of interest. [0242] In some aspects, non-ARE sequences may be incorporated into the 5' or 3' UTRs. In some aspects, introns or portions of intron sequences may be incorporated into the flanking regions of the polynucleotides in any of the constructs, particles, polynucleotides,
compositions, kits, and methods provided herein. Incorporation of intronic sequences may increase protein production as well as mRNA levels. e. Internal Ribosome Entry Sites (IRES) [0243] In some aspects, a construct described herein can include an internal ribosome entry site (IRES). An IRES forms a complex secondary structure that allows translation initiation to occur from any position within an mRNA immediately downstream from where the IRES is located (see, e.g., Pelletier and Sonenberg, Mol. Cell. Biol. 8(3): 1103-1112, 1988, which is incorporated herein by reference for the purposes described herein). There are several IRES sequences known to those skilled in the art, including those from, e.g., foot and mouth disease virus (FMDV), encephalomyocarditis virus (EMCV), human rhinovirus (HRV), cricket paralysis virus, human immunodeficiency virus (HIV), hepatitis A virus (HAV), hepatitis C virus (HCV), and poliovirus (PV) (see, e.g., Alberts, Molecular Biology of the Cell, Garland Science, 2002; and Hellen et al., Genes Dev. 15(13):1593-612, 2001, each of which is incorporated herein by reference for the purposes described herein). [0244] In some aspects, an IRES sequence that is incorporated into a construct described herein is the foot and mouth disease virus (FMDV) 2A sequence. The Foot and Mouth Disease Virus 2A sequence is a small peptide (approximately 18 amino acids in length) that has been shown to mediate the cleavage of polyproteins (see, e.g., Ryan, MD et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; and Halpin et al., Plant Journal 4:453-459, 1999, each of which is incorporated herein by reference for the purposes described herein). 
The cleavage activity of the 2A sequence has previously been demonstrated in artificial systems including plasmids and gene therapy constructs (e.g., AAV and retroviruses) (see e.g., Ryan et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; Halpin et al., Plant Journal 4:453-459, 1999; de Felipe et al., Gene Therapy 6:198-208, 1999; de Felipe et al., Human Gene Therapy 11:1921-1931, 2000; and Klump et al., Gene Therapy 8:811-817, 2001, each of which is incorporated herein by reference for the purposes described herein). [0245] In some aspects, an IRES can be utilized in an AAV construct. In some aspects, a construct can include a polynucleotide internal ribosome entry site (IRES). In some aspects, an IRES can be part of a composition comprising more than one construct. In some aspects, an IRES is used to produce more than one polypeptide from a single gene transcript.
f. Splice sites [0246] In some aspects, any of the constructs provided herein can include splice donor and/or splice acceptor sequences, which are functional during RNA processing occurring during transcription. In some aspects, splice sites are involved in trans-splicing. g. Polyadenylation sequences [0247] In some aspects, a construct provided herein can include a polyadenylation (poly(A)) signal sequence. Most nascent eukaryotic mRNAs possess a poly(A) tail at their 3' end, which is added during a complex process that includes cleavage of the primary transcript and a coupled polyadenylation reaction driven by the poly(A) signal sequence (see, e.g., Proudfoot et al., Cell 108:501-512, 2002, which is incorporated herein by reference for the purposes described herein). A poly(A) tail confers mRNA stability and transferability (see e.g., Molecular Biology of the Cell, Third Edition by B. Alberts et al., Garland Publishing, 1994, which is incorporated herein by reference for the purposes described herein). In some aspects, a poly(A) signal sequence is positioned 3' to a coding sequence. [0248] As used herein, "polyadenylation" refers to the covalent linkage of a polyadenylyl moiety, or its modified variant, to a messenger RNA molecule. In eukaryotic organisms, most messenger RNA (mRNA) molecules are polyadenylated at the 3' end. A 3' poly(A) tail is a long sequence of adenine nucleotides (e.g., 50, 60, 70, 100, 200, 500, 1000, 2000, 3000, 4000, or 5000) added to the pre-mRNA through the action of an enzyme, polyadenylate polymerase. In some aspects, a poly(A) tail is added onto transcripts that contain a specific sequence, e.g., a poly(A) signal. A poly(A) tail and associated proteins aid in protecting mRNA from degradation by exonucleases. Polyadenylation also plays a role in transcription termination, export of the mRNA from the nucleus, and translation. 
Polyadenylation typically occurs in the nucleus immediately after transcription of DNA into RNA, but also can occur later in the cytoplasm. After transcription has been terminated, an mRNA chain is cleaved through the action of an endonuclease complex associated with RNA polymerase. A cleavage site is usually characterized by the presence of the base sequence AAUAAA near the cleavage site. After the mRNA has been cleaved, adenosine residues are added to the free 3' end at the cleavage site. [0249] As used herein, a "poly(A) signal sequence" or "polyadenylation signal sequence" is a sequence that triggers the endonuclease cleavage of an mRNA and the addition of a series of adenosines to the 3' end of the cleaved mRNA. [0250] There are several poly(A) signal sequences that can be used in some aspects, including those derived from bovine growth hormone (bGH) (Woychik et al., Proc. Natl. Acad Sci. U.S.A.81(13):3944-3948, 1984; U.S. Patent No.5,122,458, each of which is incorporated
herein by reference for the purposes described herein), mouse-β-globin, mouse-α-globin (Orkin et al., EMBO J 4(2):453-456, 1985; Thein et al., Blood 71(2):313-319, 1988, each of which is incorporated herein by reference for the purposes described herein), human collagen, polyoma virus (Batt et al., Mol. Cell Biol. 15(9):4783-4790, 1995, which is incorporated herein by reference for the purposes described herein), the Herpes simplex virus thymidine kinase gene (HSV TK), IgG heavy-chain gene polyadenylation signal (US 2006/0040354, which is incorporated herein by reference for the purposes described herein), human growth hormone (hGH) (Szymanski et al., Mol Therapy 15(7):1340-1347, 2007, which is incorporated herein by reference for the purposes described herein), and/or an SV40 poly(A) site, such as the SV40 late or early poly(A) site (see e.g., Schek et al., Mol Cell Biol. 12(12):5386-5393, 1992, which is incorporated herein by reference for the purposes described herein). [0251] In some aspects, the poly(A) signal sequence can be AATAAA. The AATAAA sequence may be substituted with other hexanucleotide sequences that have homology to AATAAA and are capable of signaling polyadenylation, including ATTAAA, AGTAAA, CATAAA, TATAAA, GATAAA, ACTAAA, AATATA, AAGAAA, AATAAT, AAAAAA, AATGAA, AATCAA, AACAAA, AATAAC, AATAGA, AATTAA, or AATAAG (see, e.g., WO 06/12414, which is incorporated herein by reference for the purposes described herein). In some aspects, a poly(A) signal sequence can be a synthetic polyadenylation site (see, e.g., the pCI-neo expression construct of Promega that is based on Levitt et al., Genes Dev. 3(7):1019-1025, 1989, which is incorporated herein by reference for the purposes described herein). h. Additional sequences [0252] In some aspects, constructs of the present disclosure may comprise a 2A element or sequence. In some aspects, constructs of the present disclosure may include one or more cloning sites. 
In some such aspects, cloning sites may not be fully removed prior to manufacturing for administration to a subject. In some aspects, cloning sites may have functional roles including as linker sequences, or as portions of a Kozak site. As will be appreciated by those skilled in the art, cloning sites may vary significantly in primary sequence while retaining their desired function. [0253] In some aspects, a 2A element is a T2A, P2A, E2A, and/or F2A element. In some aspects, a 2A sequence may comprise an optional 5ʹ linker sequence, such as but not limited to GSG (e.g., Glycine, Serine, Glycine).
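As an illustration of the poly(A) signal hexanucleotides discussed in paragraph [0251], the following is a minimal sketch, not part of the disclosed constructs, of how a DNA or RNA sequence could be scanned for such signals. The function name `find_polya_signals` and the short default hexamer tuple are hypothetical choices made for this example; the tuple can be extended with any of the other hexanucleotides listed in paragraph [0251].

```python
import re

# Canonical signal plus two of the variants named in paragraph [0251].
# This short tuple is an illustrative assumption; extend it as needed.
POLYA_HEXAMERS = ("AATAAA", "ATTAAA", "AGTAAA")


def find_polya_signals(sequence, hexamers=POLYA_HEXAMERS):
    """Return sorted (0-based position, hexamer) pairs for every match.

    Overlapping matches are reported. RNA spelling (U) is accepted and
    treated as T, so AAUAAA is found as AATAAA.
    """
    seq = sequence.upper().replace("U", "T")
    hits = []
    for hexamer in hexamers:
        # A lookahead makes each match zero-width, so overlaps are kept.
        for match in re.finditer("(?=" + hexamer + ")", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


if __name__ == "__main__":
    print(find_polya_signals("GGGAATAAACCCATTAAAGG"))
```

A hit only indicates that a candidate hexanucleotide is present; whether it functions as a polyadenylation signal depends on sequence context that this sketch does not model.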
i. Destabilization domains [0254] In some aspects, any of the constructs provided herein can optionally include a sequence encoding a destabilizing domain ("a destabilizing sequence") for temporal and/or spatial control of protein expression. Non-limiting examples of destabilizing sequences include sequences encoding an FK506 sequence, a dihydrofolate reductase (DHFR) sequence, or other exemplary destabilizing sequences. [0255] In the absence of a stabilizing ligand, a protein sequence operatively linked to a destabilizing sequence is degraded by ubiquitination. In contrast, in the presence of a stabilizing ligand, protein degradation is inhibited, thereby allowing the protein sequence operatively linked to the destabilizing sequence to be actively expressed. As a positive control for stabilization of protein expression, protein expression can be detected by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry). [0256] Additional examples of destabilizing sequences are known in the art. In some aspects, the destabilizing sequence is an FK506- and rapamycin-binding protein (FKBP12) sequence, and the stabilizing ligand is Shield-1 (Shld1) (see e.g., Banaszynski et al. (2006) Cell 126(5):995-1004, which is incorporated herein by reference for the purposes described herein). In some aspects, a destabilizing sequence is a DHFR sequence, and a stabilizing ligand is trimethoprim (TMP) (see e.g., Iwamoto et al., (2010) Chem Biol 17:981-988, which is incorporated herein by reference for the purposes described herein). j. Reporter Sequences or Elements [0257] In some aspects, constructs provided herein can optionally include a sequence encoding a reporter polypeptide and/or protein ("a reporter sequence"). 
Non-limiting examples of reporter sequences include DNA sequences encoding: a beta-lactamase, a beta-galactosidase (LacZ), an alkaline phosphatase, a thymidine kinase, a green fluorescent protein (GFP), a red fluorescent protein, an mCherry fluorescent protein, a yellow fluorescent protein, a chloramphenicol acetyltransferase (CAT), and a luciferase. Additional examples of reporter sequences are known in the art. When associated with control elements which drive their expression, the reporter sequence can provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry). In some aspects, a reporter sequence is a FLAG tag (e.g., a 3xFLAG tag), and the presence of
a construct carrying the FLAG tag in a cell is detected by protein binding or detection assays (e.g., Western blots, immunohistochemistry, radioimmunoassay (RIA), mass spectrometry). [0258] In some aspects, a reporter sequence is the LacZ gene, and the presence of a construct carrying the LacZ gene in a cell is detected by assays for beta-galactosidase activity. In some aspects, a reporter sequence is a fluorescent protein (e.g., green fluorescent protein (GFP)) or luciferase. In aspects where a reporter sequence is a fluorescent protein or luciferase, the presence of a construct carrying the fluorescent protein or luciferase in a cell may be measured by fluorescence imaging techniques (e.g., fluorescence microscopy or FACS) or light production in a luminometer (e.g., a spectrophotometer or an IVIS imaging instrument). In some aspects, a reporter sequence can be used to verify tissue-specific targeting capabilities and/or tissue-specific promoter regulatory and/or control activity of any of the constructs described herein. B. Nanoparticles [0259] Certain aspects disclosed herein concern compositions comprising a nanoparticle, which may encapsulate a therapeutic agent, which can be any of the therapeutic agents disclosed herein (for example but not limited to, inhibitors of NSUN1, NSUN2, MBD5, MBD6, and/or TET2, and/or fusion proteins described herein). The nanoparticle compositions may encapsulate therapeutic agents, which may be but are not limited to: engineered protein compositions, polynucleotides, and/or small molecules. In some aspects, engineered protein compositions, polynucleotides, and/or small molecules formulated in the nanoparticles can have improved pharmacokinetic and/or pharmacodynamic properties. In some aspects, nanoparticles comprising engineered protein compositions, polynucleotides, and/or small molecules are better tolerated by a patient, including a cancer patient. 
In some aspects, engineered protein compositions, polynucleotides, and/or small molecules are formulated in the nanoparticles, and in some aspects, are more effectively delivered to cells to effect their function, such as effecting transcriptional changes, than naked engineered proteins, polynucleotides, and/or small molecules. In some aspects, a nanoparticle composition confers water solubility to hydrophobic agents, to combinations of hydrophobic agents, and/or to combinations of hydrophobic and hydrophilic agents. In some aspects, a nanoparticle composition comprises a liposomal and/or nano-emulsion composition of a therapeutic agent. [0260] In several aspects, as disclosed elsewhere herein, a nanoparticle composition (e.g., a mixed micelle composition, a liposomal composition, solid lipid particles, oil-in-water emulsions, water-in-oil-in-water emulsions, water-in-oil emulsions, oil-in-water-in-oil
emulsions, etc.) is provided to aid in the delivery of therapeutic agents. In some aspects, the nanoparticles comprise one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) therapeutic agents. In some aspects, a composition comprising the nanoparticles disclosed herein comprises a therapeutically effective amount of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) therapeutic agents. [0261] In several aspects, when formulated, the dry weight % of one or more therapeutic agents present in the nanoparticle compositions is equal to, is equal to at least, or is equal to at most: 0.1%, 0.5%, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 22.5%, 25%, 27.5%, 30%, 32.5%, 35%, 37.5%, 40%, 42.5%, 45%, 47.5%, 50%, 52.5%, 55%, 57.5%, 60%, or any range derivable therein. In several aspects, the therapeutic agents are provided in an aqueous composition. In several aspects, the wet weight % of one or more therapeutic agents present in the composition (with water included) is equal to or at least about: 0.5%, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 22.5%, 25%, 27.5%, 30%, or any range derivable therein. In several aspects, the one or more therapeutic agents may be provided in the wet composition at a concentration of greater than or equal to about: 0.001 mg/mL, 0.005 mg/mL, 0.01 mg/mL, 0.05 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1 mg/mL, 5 mg/mL, 20 mg/mL, 30 mg/mL, 50 mg/mL, 100 mg/mL, or any range derivable therein. [0262] In several aspects, the therapeutic agents, collectively or individually, are present in the aqueous nanoparticle composition at a concentration of less than or equal to about: 150 mg/mL, 100 mg/mL, 75 mg/mL, 50 mg/mL, 25 mg/mL, 20 mg/mL, 10 mg/mL, 5 mg/mL, 2.5 mg/mL, 2 mg/mL, 1.5 mg/mL, 1 mg/mL, 0.5 mg/mL, 0.1 mg/mL, 0.05 mg/mL, 0.01 mg/mL, or ranges including and/or spanning the aforementioned values. 
In several aspects, the one or more therapeutic agents, collectively or individually, are present in the aqueous composition at a concentration of greater than or equal to about: 150 mg/mL, 100 mg/mL, 75 mg/mL, 50 mg/mL, 25 mg/mL, 20 mg/mL, 10 mg/mL, 5 mg/mL, 2.5 mg/mL, 2 mg/mL, 1.5 mg/mL, 1 mg/mL, 0.5 mg/mL, 0.1 mg/mL, 0.05 mg/mL, 0.01 mg/mL, or ranges including and/or spanning the aforementioned values. In several aspects, the one or more therapeutic agents, collectively or individually, are present in the composition at a dry wt. % of equal to, at most, or at least about: 0.1%, 0.5%, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 22.5%, 25%, 27.5%, 30%, 32.5%, 35%, 37.5%, 40%, 42.5%, 45%, 47.5%, 50%, 52.5%, 55%, 57.5%, 60%, or ranges including and/or spanning the aforementioned values. In several aspects, the one or more therapeutic agents, collectively or individually, are present in the composition at a wet wt. % of equal to, or at most, or at least about: 0.1%, 0.5%, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 22.5%, 25%, 27.5%, 30%, 32.5%, 35%, 37.5%, 40%, 42.5%, 45%, 47.5%, 50%,
52.5%, 55%, 57.5%, 60%, or ranges including and/or spanning the aforementioned values. In several aspects, as disclosed elsewhere herein, the composition is aqueous, while in others it has been dried into a powder (that is free of or substantially free of water). In several aspects, where the composition has been dried, it comprises a water content of less than or equal to 20%, 15%, 10%, 7.5%, 5%, 2.5%, 1%, or ranges including and/or spanning the aforementioned values. [0263] In several aspects, as disclosed elsewhere herein, the composition is aqueous (e.g., contains water) while in other aspects, the composition is dry (lacks water or substantially lacks water). In several aspects, the composition has been dried (e.g., has been subjected to a process to remove most or substantially all water). In several aspects, the composition comprises nanoparticles in water (e.g., as a solution, suspension, or emulsion). In other aspects, the composition is provided as a powder (e.g., that may be constituted or reconstituted in water). In several aspects, as disclosed elsewhere herein, the water content (in wt. %) of the composition is less than or equal to about: 30%, 20%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%, 0%, or ranges including and/or spanning the aforementioned values. In several aspects, as disclosed elsewhere herein, the water content (in wt. %) of the composition is greater than or equal to about: 50%, 60%, 70%, 80%, 85%, 90%, 92.5%, 95%, 97.5%, or ranges including and/or spanning the aforementioned values. In several aspects, the water is nanopure, deionized, USP grade, WFI, and/or combinations of the foregoing. In some aspects, the composition is a dried composition comprising a nanoparticle having weight ratios of a first therapeutic agent: a nanoscale coordination polymer NCP: optionally a lipid source, and optionally a surfactant of 1 to 50:1 to 50:1 to 50:0 to 17.5. 
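The dry wt. %, wet wt. %, and mg/mL figures recited in paragraphs [0261] to [0263] are related by simple mass balances. The following is a minimal sketch of that arithmetic under stated assumptions: the function name `composition_metrics` and its parameters are hypothetical, masses are in milligrams, "dry" mass is taken to be everything except water, and the final volume of the aqueous composition is supplied by the user rather than derived from density.

```python
def composition_metrics(agent_mg, other_solids_mg, water_mg, volume_ml):
    """Relate dry wt. %, wet wt. %, water content, and mg/mL for one agent.

    Assumptions for illustration only:
      - dry mass  = agent + all other non-water components
      - wet mass  = dry mass + water
      - volume_ml = measured final volume of the aqueous composition
    """
    dry_total = agent_mg + other_solids_mg
    wet_total = dry_total + water_mg
    return {
        "dry_wt_pct": 100.0 * agent_mg / dry_total,
        "wet_wt_pct": 100.0 * agent_mg / wet_total,
        "water_wt_pct": 100.0 * water_mg / wet_total,
        "mg_per_ml": agent_mg / volume_ml,
    }


if __name__ == "__main__":
    # Hypothetical formulation: 10 mg agent, 90 mg lipid and other
    # excipients, 900 mg water, made up to a final volume of 1.0 mL.
    for name, value in composition_metrics(10, 90, 900, 1.0).items():
        print(f"{name}: {value:.2f}")
```

In this hypothetical formulation the same agent is at 10% by dry weight but only 1% by wet weight, which is why the dry and wet ranges in the paragraphs above are stated separately.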
[0264] In some aspects, a nanoparticle composition provides an oil-in-water emulsion (e.g., a nanoemulsion), water-in-oil emulsion, a water-in-oil-in-water emulsion, an oil-in-water-in-oil emulsion, a liposome (and variants including multi-lamellar, double liposome preparations, etc.), micelle, and/or solid lipid particles. Any one of these structures may be provided as a nanoparticle or microparticle. [0265] In some aspects, the nanoparticle composition comprises a lipid source. In some aspects, the lipid source comprises a charged lipid, which can impart a charge to the nanoparticle. In some aspects, the lipid source comprises a neutral lipid. In some aspects, the lipid source comprises one or more phospholipids. In some aspects, the one or more phospholipids comprises one or more of phosphatidic acid, phosphatidylethanolamine, phosphatidylcholine, phosphatidylserine, phosphatidylinositol, phosphatidylinositol phosphate, phosphatidylinositol bisphosphate, phosphatidylinositol trisphosphate, lipoid H
100-3, phospholipon 90H, phospholipon 80H, lipoid 100-3, lipoid P75-3, 1,2-dioleoyl-sn-glycero-3-phosphate (DOPA), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[amino(polyethylene glycol)2000] (DSPE-PEG2000), or any combination of the foregoing. In some aspects, the lipid source is a phosphatidylcholine. In some aspects, the one or more lipid source lipid(s) (collectively or individually) are present in the composition at a dry wt. % of equal to or less than about: 0%, 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, or ranges including and/or spanning the aforementioned values. In some aspects, the one or more lipid source lipid(s) (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0%, 0.1%, 0.5%, 1.0%, 2.5%, 4%, 5%, 6%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or ranges including and/or spanning the aforementioned values. In some aspects, the one or more lipid source lipid(s) (collectively or individually) are present in the composition at a wet w/v of equal to or less than about: 0 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1.0 mg/mL, 2.5 mg/mL, 4 mg/mL, 5 mg/mL, 6 mg/mL, 7.5 mg/mL, 10 mg/mL, 12.5 mg/mL, 15 mg/mL, 17.5 mg/mL, 20 mg/mL, 25 mg/mL, 30 mg/mL, 35 mg/mL, 40 mg/mL, 45 mg/mL, 50 mg/mL, or ranges including and/or spanning the aforementioned values. In several aspects, the composition is aqueous (wet), while in others it has been dried into a powder (dry). In some aspects, the one or more lipid(s) of the lipid source are synthetic or are derived from sunflower, soy, egg, or mixtures thereof. In some aspects, the one or more lipids of the lipid source can be hydrogenated or non-hydrogenated. In some aspects, the lipid source exceeds requirements of the United States Pharmacopeia (is USP grade) and/or is National Formulary (NF) grade. 
[0266] In some aspects, the one or more lipids of the lipid source has a purity of greater than or equal to about: 92.5%, 95%, 96%, 96.3%, 98%, 99%, 100%, or ranges including and/or spanning the aforementioned values. In several aspects, the one or more lipids of the lipid source has a total % impurity content by weight of less than or equal to about: 8.5%, 5%, 4%, 3.7%, 2%, 1%, 0%, or ranges including and/or spanning the aforementioned values. [0267] In some aspects, a nanoparticle composition comprises a surfactant. In certain aspects, a nanoparticle composition does not comprise a surfactant. In some aspects, the surfactant is a pharmaceutically acceptable surfactant. In some aspects, the surfactant is a food surfactant. In some aspects, the surfactant comprises one or more of a polyoxyethylene sorbitan esters (e.g., polysorbates/tweens, including polysorbate 80, polysorbate 20, etc.), cremophor (e.g., a non-ionic solubilizer and emulsifier that is made by reacting ethylene oxide with castor
oil), propylene oxide-modified polymethylsiloxane, dodecyl betaine, lauramidopropyl betaine, cocoamido-2-hydroxypropyl sulfobetaine, sodium stearate (or other stearate salts), polyoxyethylene alcohol, lecithins, mono- and diglycerides of fatty acids (MDG), acetic acid esters of MDG, lactic acid esters of MDG, citric acid esters of MDG, mono- and diacetyl tartaric acid esters of MDG, sucrose esters of fatty acids, polyglycerol esters of fatty acids (e.g., polyglycerol esters), polyglycerol polyricinoleate, propane-1,2-diol esters of fatty acids, propylene glycol esters, sodium stearoyl-2-lactylate, calcium stearoyl-2-lactylate, sorbitan fatty acid esters, quillaja extract surfactant, yucca extract surfactant, saponins, silicone emulsifiers, sorbitan trioleate, soya lecithin, dioctyl sodium sulfosuccinate, dioctyl sodium sulfonate, polyoxyethylene, hydrogenated castor oil, sucrose fatty acid ester, or combinations of any of the foregoing. Natural or synthetic surfactants can be used, including polyethylene glycol and dextrans, such as cyclodextran. In some aspects, the one or more surfactants are present in a nanoparticle composition (collectively or individually) at a dry wt. % of equal to or less than about: 0%, 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, or ranges including and/or spanning the aforementioned values. In some aspects, surfactants can include cationic, anionic, non-ionic, and zwitterionic surfactants. In some aspects, one or more surfactants (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0%, 0.1%, 0.5%, 1.0%, 2.5%, 4%, 5%, 6%, 7.5%, 10%, 12.5%, 15%, 17.5%, or ranges including and/or spanning the aforementioned values. 
In some aspects, one or more surfactants (collectively or individually) are present in the composition at a wet w/v of equal to or less than about: 0 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1.0 mg/mL, 2.5 mg/mL, 4 mg/mL, 5 mg/mL, 6 mg/mL, 7.5 mg/mL, 10 mg/mL, 12.5 mg/mL, 15 mg/mL, 17.5 mg/mL, or ranges including and/or spanning the aforementioned values. In some aspects, a surfactant exceeds requirements of the United States Pharmacopeia (is USP grade) and/or is National Formulary (NF) grade. [0268] In several aspects, one or more co-emulsifiers are used. In certain aspects, a nanoparticle composition does not comprise a co-emulsifier. In some aspects, a co-emulsifier is a pharmaceutically acceptable co-emulsifier. In some aspects, a co-emulsifier is selected from the group consisting of oleic acid, miglyol 812N (all versions), cetearyl olivate, isopropyl myristate, celluloses, polysaccharides (e.g., methylcellulose, propylmethylcellulose, hydroxypropyl methylcellulose, xanthan gum, etc.), capric acid, caprylic acid, triglycerides (e.g., triglycerides of oleic acid, capric acid, caprylic acid (Captex 8000, Captex GTO, Captex 1000)), glycerol monooleate, glyceryl stearate, glycerol monostearate (Geleol™ Mono and Diglyceride NF), omega-3 fatty acids (α-linolenic acid (ALA), eicosapentaenoic acid (EPA), docosahexaenoic acid (DHA), Tonalin, Pronova Pure® 46:38, free fatty acid Tonalin FFA 80),
conjugated linoleic acid (CLA), alpha glycerylphosphorylcholine (alpha GPC), palmitoylethanolamide (PEA), cetyl alcohol, or emulsifying wax and/or combinations of any of the foregoing. In some aspects, one or more co-emulsifiers are present in the nanoparticle composition (collectively or individually) at a dry wt. % of equal to or less than about: 0%, 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, or ranges including and/or spanning the aforementioned values. In some aspects, one or more co-emulsifiers (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0%, 0.1%, 0.5%, 1.0%, 2.5%, 4%, 5%, 6%, 7.5%, 10%, 12.5%, 15%, 17.5%, or ranges including and/or spanning the aforementioned values. In some aspects, one or more co-emulsifiers (collectively or individually) are present in the composition at a wet w/v of equal to or less than about: 0 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1.0 mg/mL, 2.5 mg/mL, 4 mg/mL, 5 mg/mL, 6 mg/mL, 7.5 mg/mL, 10 mg/mL, 12.5 mg/mL, 15 mg/mL, 17.5 mg/mL, or ranges including and/or spanning the aforementioned values. In some aspects, a co-emulsifier exceeds requirements of the United States Pharmacopeia (is USP grade) and/or is National Formulary (NF) grade. In some aspects, a co-emulsifier component comprises a medium chain triglyceride (MCT). In some aspects, a medium chain triglyceride comprises (e.g., is formed from) a fatty acid selected from one or more of caproic acid, octanoic acid, capric acid, caprylic acid, and/or lauric acid. In some aspects, a medium chain triglyceride comprises a fatty acid 6-12 carbons in length (e.g., 6, 7, 8, 9, 10, 11, or 12). In some aspects, a co-emulsifier component comprises a long chain triglyceride (LCT). In some aspects, a long chain triglyceride comprises a fatty acid greater than 12 carbons in length (e.g., greater than or equal to 13, 14, 15, 16, 17, 18, 19, or 20 carbons in length, or ranges including and/or spanning the aforementioned values). 
In some aspects, a co-emulsifier component is a single lipid. In some aspects, a co-emulsifier component is highly pure. In some aspects, a co-emulsifier component has a purity by weight % of equal to or greater than about: 90%, 95%, 97%, 98%, 99%, 100%, or ranges including and/or spanning the aforementioned values. In some aspects, a co-emulsifier component is present in the nanoparticle composition at dry weight % of equal to or greater than about: 10%, 20%, 30%, 35%, 40%, 45%, 50%, or ranges including and/or spanning the aforementioned values. [0269] In some aspects, a nanoparticle composition comprises one or more sterols. In certain aspects, a nanoparticle composition does not comprise a sterol. In some aspects, one or more sterols comprises one or more cholesterols, ergosterols, hopanoids, hydroxysteroids, phytosterols (e.g., vegapure), ecdysteroids, and/or steroids. In some aspects, a sterol comprises a cholesterol. In some aspects, a sterol component is a single sterol. In some aspects, a sterol
component is cholesterol. In some aspects, a cholesterol (or other sterol) is highly pure. In some aspects, one or more sterol(s) (e.g., cholesterol, and/or other sterols), collectively or individually, are present in the aqueous composition at a concentration of less than or equal to about: 50 mg/mL, 40 mg/mL, 20 mg/mL, 10 mg/mL, 5 mg/mL, or ranges including and/or spanning the aforementioned values. In some aspects, one or more sterol(s) are present in the composition at a dry wt. % of equal to or less than about: 0.25%, 0.5%, 1%, 5%, 7.5%, 10%, 15%, 20%, 25%, or ranges including and/or spanning the aforementioned values. In some aspects, one or more sterol(s) (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0.1%, 0.25%, 0.5%, 1%, 2%, 3%, 4%, 5%, 7.5%, 10%, or ranges including and/or spanning the aforementioned values. In some aspects, cholesterol used in the composition comprises cholesterol from one or more of sheep’s wool, synthetic cholesterol, or semisynthetic cholesterol from plant origin. In some aspects, a sterol has a purity of greater than or equal to about: 92.5%, 95%, 96%, 98%, 99%, 99.9%, 100.0%, or ranges including and/or spanning the aforementioned values. In some aspects, a sterol has a total % impurity content by weight of less than or equal to about: 8.5%, 5%, 4%, 3.7%, 2%, 1%, 0%, or ranges including and/or spanning the aforementioned values. In some aspects, a sterol is not cholesterol. [0270] In some aspects, a nanoparticle composition comprises a preservative. In certain aspects, a nanoparticle composition does not comprise a preservative. 
In several aspects, a preservative includes one or more benzoates (such as sodium benzoate or potassium benzoate), nitrites (such as sodium nitrite), sulfites (such as sulfur dioxide, sodium or potassium sulphite, bisulphite or metabisulphite), sorbates (such as sodium sorbate, potassium sorbate), ethylenediaminetetraacetic acid (EDTA) (and/or the disodium salt thereof), polyphosphates, organic acids (e.g., citric, succinic, malic, tartaric, benzoic, lactic and propionic acids), and/or antioxidants (e.g., vitamins such as vitamin E and/or vitamin C, butylated hydroxytoluene). In several aspects, sorbates and benzoates may be used in acidic pH formulations. In several aspects, one or more preservatives (collectively or individually) are present in the composition at a dry wt. % of equal to or at less than about: 0.01%, 0.1%, 0.25%, 0.5%, 1%, 5%, 7.5%, 10%, 15%, 20%, 25%, or ranges including and/or spanning the aforementioned values. In several aspects, one or more preservatives (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0.001%, 0.01%, 0.025%, 0.05%, 0.1%, 0.5%, 0.75%, 1.0%, 1.5%, 2.0%, 2.5%, 5%, or ranges including and/or spanning the aforementioned values. In several aspects, one or more preservatives (collectively or individually) are present in the composition at a wet w/v of equal to or less than about: 0
mg/mL, 0.001 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1.0 mg/mL, 2.5 mg/mL, 4 mg/mL, 5 mg/mL, or ranges including and/or spanning the aforementioned values. In several aspects, as disclosed elsewhere herein, the composition is aqueous, while in others it has been dried into a powder. For instance, as disclosed elsewhere herein, in several aspects, the composition is aqueous (wet), while in others it has been dried into a powder (dry). In several aspects, preservatives inhibit or prevent growth of mold, bacteria, and/or fungus. [0271] In some aspects, a nanoparticle composition comprises a metal. In some aspects, a metal may be zinc. In certain aspects, a nanoparticle composition does not comprise a metal. In several aspects, one or more metals (collectively or individually) are present in the composition at a dry wt. % of equal to or less than about: 0.01%, 0.1%, 0.25%, 0.5%, 1%, 5%, 7.5%, 10%, 15%, 20%, 25%, or ranges including and/or spanning the aforementioned values. In several aspects, one or more metals (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0.001%, 0.01%, 0.025%, 0.05%, 0.1%, 0.5%, 0.75%, 1.0%, 1.5%, 2.0%, 2.5%, 5%, or ranges including and/or spanning the aforementioned values. In several aspects, one or more metals (collectively or individually) are present in the composition at a wet w/v of equal to or less than about: 0 mg/mL, 0.001 mg/mL, 0.1 mg/mL, 0.5 mg/mL, 1.0 mg/mL, 2.5 mg/mL, 4 mg/mL, 5 mg/mL, or ranges including and/or spanning the aforementioned values. In some aspects, metals may be combined with metalloligands to generate a nanoscale coordination polymer. In some aspects, nanoscale coordination polymers may comprise metal-connecting points and organic bridging ligands. In some aspects, a nanoscale coordination polymer (NCP) may self-assemble into nanoparticles. 
[0272] In several aspects, as disclosed elsewhere herein, a nanoparticle composition provides particles in the nano-measurement range. In several aspects, a nanoparticle is spherical or substantially spherical. In several aspects, a solid lipid nanoparticle possesses a solid lipid core matrix that can solubilize lipophilic molecules. In several aspects, a lipid core is stabilized by surfactants and/or emulsifiers as disclosed elsewhere herein, while in other aspects, surfactants are absent. In several aspects, the size of the particle is measured as a mean diameter. In several aspects, the size of the particle can be measured by dynamic light scattering. In several aspects, the size of the particle can be measured using a zeta-sizer. In several aspects, the size of the particle can be measured using Scanning Electron Microscopy (SEM). In several aspects, the size of the particle can be measured using a cryogenic SEM (cryo-SEM). Where the size of a nanoparticle is disclosed elsewhere herein, any one or more of these instruments or methods may be used to measure such sizes.
[0273] In several aspects, a nanoparticle composition may comprise nanoparticles having an average size of less than or equal to about: 10 nm, 25 nm, 40 nm, 50 nm, 100 nm, 250 nm, 500 nm, 1000 nm, or ranges including and/or spanning the aforementioned values. In several aspects, a composition comprises nanoparticles having an average size of between about 50 nm and 150 nm or between about 50 and about 250 nm. In several aspects, the size distribution of the nanoparticles for at least, or at most 50%, 75%, 80%, 90% (or ranges including and/or spanning the aforementioned percentages) of the particles present is equal to or less than about: 20 nm, 40 nm, 60 nm, 80 nm, 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, 300 nm, 400 nm, 500 nm, or ranges including and/or spanning the aforementioned nm values. In several aspects, a composition comprises nanoparticles having an average size of less than or equal to about: 10 nm, 50 nm, 100 nm, 250 nm, 500 nm, 1000 nm, or ranges including and/or spanning the aforementioned values. In several aspects, the size distribution of the nanoparticles for at least 90% of the particles present is equal to or less than about: 20 nm, 40 nm, 60 nm, 80 nm, 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, 300 nm, 400 nm, 500 nm, or ranges including and/or spanning the aforementioned nm values. In several aspects, the size distribution of the nanoparticles for at least 90% of the particles present is equal to or less than about: 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, or ranges including and/or spanning the aforementioned nm values. In several aspects, the D90 of the particles present is equal to or less than about: 80 nm, 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 160 nm, 180 nm, 200 nm, 300 nm, 400 nm, 500 nm, or ranges including and/or spanning the aforementioned values. 
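As a purely illustrative aid, the size metrics recited above (average size, D90, and the fraction of particles at or below a size limit) could be computed from a set of measured diameters roughly as sketched below. The sketch assumes a number-weighted list of diameters in nanometers and a linear-interpolation percentile; instruments such as dynamic light scattering systems may instead report intensity- or volume-weighted distributions. The function names, the 200 nm limit, and the sample values are hypothetical and do not come from the disclosure.

```python
import math
import statistics


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of pre-sorted values (fraction in [0, 1])."""
    if not sorted_values:
        raise ValueError("no diameters supplied")
    position = fraction * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarize_sizes(diameters_nm, limit_nm=200.0):
    """Return mean diameter, D90, and fraction of particles at or below limit_nm.

    Assumption: diameters_nm is a number-weighted list of particle diameters.
    """
    ordered = sorted(diameters_nm)
    at_or_below = sum(1 for d in ordered if d <= limit_nm)
    return {
        "mean_nm": statistics.fmean(ordered),
        "d90_nm": percentile(ordered, 0.90),
        "fraction_at_or_below_limit": at_or_below / len(ordered),
    }


# Hypothetical diameters (nm), for illustration only.
example = [80, 95, 100, 105, 110, 120, 125, 130, 140, 190]
summary = summarize_sizes(example)
print(summary)
```

Under these assumptions, a composition would satisfy a recitation such as "the D90 of the particles present is equal to or less than about 200 nm" when `d90_nm` does not exceed that limit.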
In several aspects, the size of the nanoparticle is the diameter of the nanoparticle as measured using any of the techniques as disclosed elsewhere herein. For instance, in some aspects, the size of the nanoparticle is measured using dynamic light scattering. In several aspects, the size of the nanoparticle is measured using a zeta sizer. In several aspects, consistency in size over time, or within a sample, allows predictable stability for the active agent encapsulated therein.
[0274] In several aspects, over 50%, 75%, 95% (or ranges spanning and/or including the aforementioned values) of nanoparticles prepared by methods disclosed herein have a particle size of between about 20 nm and about 500 nm (e.g., as measured by zeta sizing (e.g., refractive index)). In several aspects, over 50%, 75%, 95% (or ranges spanning and/or including the aforementioned values) of nanoparticles prepared by methods disclosed herein have a particle size of between about 50 nm and about 200 nm (e.g., as measured by zeta sizing (e.g., refractive index)). In several aspects, over 50%, 75%, 95% (or ranges spanning and/or including the aforementioned values) of nanoparticles prepared by methods disclosed herein have a particle size of between about 90 nm and about 150 nm (e.g., as measured by zeta sizing (e.g., refractive index)). In several aspects, maintaining consistency in size allows predictable delivery to subjects. In several aspects, the D90 particle size measurement varies between 150 and 500 nm.
[0275] In several aspects, the average size of the nanoparticles of a composition as disclosed herein may be substantially constant and/or does not change significantly over time (e.g., it is a stable nanoparticle). In several aspects, after formulation and storage for a period of at least about 1 month (30 days), about 3 months (90 days), or about 6 months (180 days) (e.g., at ambient conditions, at 25 °C with 60% relative humidity, or under the other testing conditions disclosed elsewhere herein), the average size of nanoparticles comprising the composition changes by less than or equal to about: 1%, 5%, 10%, 20%, or ranges including and/or spanning the aforementioned values.
[0276] In several aspects, the polydispersity index (PDI) of the nanoparticles of a composition as disclosed herein is less than or equal to about: 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, or ranges including and/or spanning the aforementioned values. In several aspects, the size distribution of the nanoparticles is highly monodisperse with a polydispersity index of less than or equal to about: 0.05, 0.10, 0.15, 0.20, 0.25, or ranges including and/or spanning the aforementioned values.
[0277] In several aspects, the zeta potential of the nanoparticles of a composition as disclosed herein is less than or equal to about: 1 mV, 3 mV, 4 mV, 5 mV, 6 mV, 7 mV, 8 mV, 10 mV, 20 mV, or ranges including and/or spanning the aforementioned values. In several aspects, the zeta potential of the nanoparticles is greater than or equal to about: -3 mV, -1 mV, 0 mV, 1 mV, 3 mV, 4 mV, 5 mV, 6 mV, 7 mV, 8 mV, 10 mV, 20 mV, or ranges including and/or spanning the aforementioned values.
In several aspects, the zeta potential and/or diameter of the particles (e.g., measured using dynamic light scattering) is acquired using a zetasizer (e.g., a Malvern ZS90 or similar instrument).
[0278] In several aspects, as disclosed elsewhere herein, a nanoparticle composition is an oil-in-water emulsion, water-in-oil emulsion, water-in-oil-in-water emulsion, oil-in-water-in-oil emulsion, liposome, solid lipid particle formulation, etc. For brevity, these may just be referred to as the composition. In several aspects, a nanoparticle composition can be processed to comprise one or more of solid lipid nanoparticles, liposomes (and variants including multilamellar, double liposome preparations, etc.), niosomes, ethosomes, electrostatic particulates, microemulsions, nanoemulsions, microsuspensions, nanosuspensions, or combinations
thereof. In several aspects, polymeric nanoparticles may be formed. In several aspects, cyclodextrin is added.
[0279] In several aspects, a solid lipid nanoparticle composition comprises a lipid core matrix. In several aspects, the lipid core matrix is solid. In several aspects, the solid lipid comprises one or more ingredients as disclosed elsewhere herein. In several aspects, the core of the solid lipid comprises one or more lipids, surfactants, active ingredients, etc. In several aspects, the surfactant acts as an emulsifier. In several aspects, emulsifiers can be used to stabilize the lipid dispersion (with respect to charge and molecular weight). In several aspects, the core ingredients (e.g., the components of the core) are present in the composition (collectively or individually) at a dry wt. % of equal to or less than about: 0.5%, 1.0%, 2.5%, 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 80% or ranges including and/or spanning the aforementioned values. In several aspects, the core ingredients and/or the emulsifiers (collectively or individually) are present in the composition at a wet wt. % of equal to or less than about: 0.5%, 1.0%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 30%, 40%, 60% or ranges including and/or spanning the aforementioned values.
[0280] In several aspects, a nanoparticle composition (e.g., when in water or dried) comprises multilamellar nanoparticle vesicles, unilamellar nanoparticle vesicles, multivesicular nanoparticles, emulsion particles, irregular particles with lamellar structures and bridges, partial emulsion particles, combined lamellar and emulsion particles, and/or combinations thereof. In certain aspects, the nanoparticle compositions do not comprise multilamellar nanoparticle vesicles, unilamellar nanoparticle vesicles, multivesicular nanoparticles, emulsion particles, irregular particles with lamellar structures and bridges, partial emulsion particles, combined lamellar and emulsion particles, and/or combinations thereof.
In several aspects, the composition is characterized by having multiple types of particles (e.g., lamellar, emulsion, irregular, etc.). In other aspects, a majority of the particles present are emulsion particles. In several aspects, a majority of the particles present are lamellar (multilamellar and/or unilamellar). In other aspects, a majority of the particles present are irregular particles. In still other aspects, a minority of the particles present are emulsion particles. In several aspects, a minority of the particles present are lamellar (multilamellar and/or unilamellar). In other aspects, a minority of the particles present are irregular particles. [0281] In several aspects, multilamellar nanoparticles comprise equal to or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). For example, in some aspects, between about 5% and about 10% of the
particles present are multilamellar. In several aspects, unilamellar nanoparticles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 20%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). For example, in some aspects, between about 10% and about 15% of the particles present are unilamellar. [0282] In several aspects, emulsion particles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). For example, in some aspects, between about 60% to about 75% of the particles present are emulsion particles. [0283] In several aspects, micelle particles comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). In several aspects, liposomes comprise equal to, at most, or at least about 5%, 8%, 9%, 10%, 15%, 25%, 50%, 60%, 65%, 70%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). In several aspects, irregular particles (including particles with lamellar structures and/or bridges) comprise equal to, at most, or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). For example, in some aspects, between about 1% to about 5% of the particles present are irregular particles. 
In several aspects, combined lamellar and emulsion particles comprise equal to, at most, or at least about 5%, 6%, 7%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). In several aspects, mixed-micelle particles comprise equal to, at most, or at least about 5%, 6%, 7%, 8%, 9%, 10%, 15%, 25%, 50%, 75%, 85%, 95%, or 100% (or ranges spanning and/or including the aforementioned values) of the particles present in the composition (e.g., the aqueous composition). [0284] The nanoparticle compositions can comprise, but are not limited to, combinations of multilamellar nanoparticles, unilamellar nanoparticles, emulsion nanoparticles, micelle nanoparticles, irregular particles, and/or liposomes. [0285] The percentages and/or concentrations of particles present in the composition may be purposefully modified. In some aspects, the percentage and/or concentration of the particles
present in the composition are tailored to the active compound and/or the liquid comprising the particles. Such tailoring may lead to more homogenization and/or dispersion in the liquid. The tailoring may stabilize dispersion in the liquid. [0286] In several aspects, the formulations and/or compositions disclosed herein are stable during sterilization. In several aspects, the sterilization may include one or more of ozonation, UV treatment, and/or heat treatment. In several aspects, the particle size and/or PDI after sterilization (e.g., exposure to techniques that allow sterilization of the composition) varies by less than or equal to about: 1%, 5%, 10%, 20%, 30%, or ranges including and/or spanning the aforementioned values. In several aspects, the therapeutic agent concentration after sterilization (e.g., exposure to techniques that allow sterilization of the composition) varies (e.g., drops) by less than or equal to about: 1%, 5%, 10%, 15%, or ranges including and/or spanning the aforementioned values. [0287] In several aspects, the nanoparticle compositions (including after stabilization) disclosed herein have a shelf life of equal to or greater than 1 month, 3 months, 6 months, 12 months, 14 months, 16 months, 18 months, 19 months, or ranges including and/or spanning the aforementioned values. The shelf-life can be determined as the period of time in which there is 95% confidence that at least 50% of the response (active agent(s) concentration or particle size) is within the specification limit. This refers to a 95% confidence interval and when linear regression predicts that at least 50% of the response is within the set specification limit. VI. Histone Modifications [0288] In certain aspects, provided herein are compositions and methods for modifying histone marks, such as but not limited to, histone ubiquitination and/or histone methylation. In some aspects, histone modifications can occur via inhibition of NSUN2, MBD6, and/or TET2. 
In some aspects, histone modifications can occur via contact of a histone-associated caRNA with a polypeptide with m5C RNA writer, reader, and/or eraser functionality. In some aspects, histone modifications can occur via site-specific targeting of caRNA molecules associated with certain genetic loci. In some aspects, a polypeptide with m5C RNA writer, reader, and/or eraser functionality is coupled with a targeting element to increase and/or decrease certain histone marks, such as but not limited to: H3K4me2, H3K4me1, H2AK119ub, H3K27me3, H3K9me3, H3K27ac, H3K36me3, and/or H3K4me3. In some aspects, inhibitors of TET2 pathway-associated proteins, such as but not limited to, TET2, MBD5, MBD6, NSUN1, and/or NSUN2 can be utilized to alter (decrease and/or increase) histone modifications including but not
limited to: H3K4me2, H3K4me1, H2AK119ub, H3K27me3, H3K9me3, H3K27ac, H3K36me3, and/or H3K4me3. [0289] In some aspects, technologies described herein may comprise increasing one or more histone marks at one or more loci, such as but not limited to, increasing histone marks by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to a control histone mark level, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, histones. [0290] In some aspects, technologies described herein may comprise increasing one or more H2AK119ub mark at one or more loci, such as but not limited to, increasing histone H2AK119ub by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to control H2AK119ub levels, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, histones. 
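For illustration only, the "relative to a control histone mark level" comparison recited above could be evaluated per locus along the lines of the following sketch. It assumes that a recited value such as 2.0x is read as the ratio of the treated level to the control level, and that each locus has a single normalized, positive signal value (for example, from a ChIP-type assay). The locus names, signal values, and function names are hypothetical and are not taken from the disclosure.

```python
def fold_changes(treated, control):
    """Per-locus ratio of treated to control histone mark level.

    Assumption: both inputs map locus name -> normalized, positive signal.
    """
    changes = {}
    for locus, control_level in control.items():
        if control_level <= 0:
            raise ValueError(f"control level for {locus} must be positive")
        changes[locus] = treated[locus] / control_level
    return changes


def count_loci_at_least(changes, threshold):
    """Number of loci whose fold change meets or exceeds the threshold."""
    return sum(1 for value in changes.values() if value >= threshold)


# Hypothetical normalized H2AK119ub signal at two loci.
control = {"locus_a": 10.0, "locus_b": 4.0}
treated = {"locus_a": 25.0, "locus_b": 4.0}

changes = fold_changes(treated, control)
print(changes)                            # per-locus fold change
print(count_loci_at_least(changes, 2.0))  # loci increased at least 2.0x
```

A decrease could be assessed the same way by counting loci whose ratio falls at or below a chosen threshold.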
[0291] In some aspects, technologies described herein may comprise decreasing one or more H2AK119ub mark at one or more loci, such as but not limited to, decreasing histone H2AK119ub by 0.1x, 0.2x, 0.3x, 0.4x, 0.5x, 0.6x, 0.7x, 0.8x, 0.9x, 1.0x, 2.0x, 3.0x, 4.0x, 5.0x, 6.0x, 7.0x, 8.0x, 9.0x, 10.0x, 15.0x, 20.0x, 25.0x, 50.0x, 100.0x, or greater than 100x (or any range derivable therein) relative to control H2AK119ub levels, at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein, histones. VII. Methods of Use [0292] In some aspects, provided herein are methods of using compositions and technologies disclosed herein, and/or new uses of additional compositions and/or methods known in the art. [0293] In certain aspects, provided herein are methods of treating diseases and/or disorders associated with aberrant m5C RNA levels. In certain aspects, provided herein are methods of
treating diseases and/or disorders associated with aberrant m5C levels in caRNA and/or carRNA.
[0294] In certain aspects, provided herein are methods of determining appropriate treatment regimens and/or treating diseases and/or disorders such as but not limited to: acute myeloid leukemias (AML), including at least AML with a translocation between chromosomes 8 and 21 [t(8;21)], AML with a translocation or inversion in chromosome 16 [t(16;16) or inv(16)], AML with a translocation between chromosomes 6 and 9 [t(6;9)], AML with a translocation or inversion in chromosome 3 [t(3;3) or inv(3)], AML (megakaryoblastic) with a translocation between chromosomes 1 and 22 [t(1;22)], AML with the BCR-ABL1 (BCR-ABL) fusion gene, AML with mutated NPM1 gene, AML with biallelic mutations of the CEBPA gene, AML with mutated RUNX1 gene, AML with myelodysplasia-related changes, AML related to previous chemotherapy or radiation, AML not otherwise specified, AML with minimal differentiation, AML without maturation, AML with maturation, acute myelomonocytic leukemia, acute monoblastic/monocytic leukemia, pure erythroid leukemia, acute megakaryoblastic leukemia, acute basophilic leukemia, acute panmyelosis with fibrosis, myeloid sarcoma, myeloid proliferations related to Down syndrome, undifferentiated and biphenotypic acute leukemias, and/or AML with mutations in TET2, TP53, RUNX1, and/or ASXL1.
[0295] In certain aspects, provided herein are methods of determining appropriate treatment regimens and/or treating diseases and/or disorders such as but not limited to chronic myeloid leukemia (CML) and/or chronic myelomonocytic leukemia.
[0296] In certain aspects, provided herein are methods of determining appropriate treatment regimens and/or treating diseases and/or disorders such as but not limited to gliomas such as but not limited to astrocytomas (including glioblastoma multiforme), brain stem gliomas, ependymomas, mixed gliomas, oligodendrogliomas, and/or optic pathway gliomas. [0297] In certain aspects, provided herein are methods of determining appropriate treatment regimens and/or treating diseases and/or disorders such as but not limited to clonal hematopoiesis of indeterminate potential (CHIP) or disease states associated therewith (such as but not limited to atherosclerosis, myocardial fibrosis, and/or heart failure). A. Target RNA [0298] In some aspects, methods provided herein comprise site specific modification of one or more (e.g., at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50,
60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) target RNA molecules. In some aspects, a target RNA is a messenger RNA (mRNA). In some aspects, a target RNA is not a mRNA. In some aspects, a target RNA is a ribosomal RNA (rRNA). In some aspects, a target RNA is not a ribosomal RNA (rRNA). In some aspects, a target RNA is a tRNA. In some aspects, a target RNA is not a tRNA. In some aspects, a target RNA is a chromatin associated RNA (caRNA). In some aspects, a target RNA is a chromatin associated regulatory RNA (carRNA). In some aspects, a target RNA is a retroviral RNA. In some aspects, a target RNA is a long-terminal repeat (LTR) RNA. In some aspects, a target RNA is a promoter RNA. In some aspects, a target RNA is not a promoter RNA. In some aspects, a target RNA is an enhancer RNA. In some aspects, a target RNA is not an enhancer RNA. In some aspects, the target RNA is in a diseased cell (e.g., a diseased cell associated with a disease in an individual). [0299] In some aspects, a target RNA comprises, consists essentially of, or consists of one or more (e.g., at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) caRNA described in Table 1. In some aspects, a target RNA comprises, consists essentially of, or consists of one or more caRNA described in Table 2. 
In some aspects, a target RNA comprises, consists essentially of, or consists of one or more (e.g., at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000, or more than 1000, or any range derivable therein) caRNA described in Table 1 and annotated as SEQ ID NOs: 100-614. In some aspects, a target RNA comprises, consists essentially of, or consists of one or more caRNA described in Table 1 and annotated as SEQ ID NOs: 104 or 107.
[0300] In some aspects, a target RNA comprises, consists essentially of, or consists of one or more RNA molecules encoding an oncogene or a tumor suppressor gene. In some aspects, a target RNA comprises, consists essentially of, or consists of one or more RNA molecules encoding NSUN1, NSUN2, MBD5, MBD6, and/or TET2.
B. Methods of Treatment
[0301] The present disclosure additionally provides methods of detecting, methods of measuring, methods of diagnosing, and/or methods of ameliorating and/or treating diseases. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be comprised in a formulation with one or more additional
therapeutic agents. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be comprised in a formulation wherein the formulation comprises pharmaceutically acceptable excipients. [0302] In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be administered to a cell in an in vitro environment. In some aspects, a cell may be derived from a subject. In some aspects, a cell is an immune cell, a stem cell, an induced pluripotent stem cell, a precursor cell, and/or a terminally differentiated cell. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be administered to a cell in vivo via administration to a subject. [0303] In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein are administered to a subject in need thereof. In some aspects, a subject may have, may be diagnosed with, or may be susceptible to a disease, such as an infectious disease, a genetic disorder, an autoimmune disease, and/or cancer. [0304] In some aspects, a subject is a mammal. In some aspects, a subject is a domestic animal. In some aspects, a subject is a farm animal. In some aspects, a subject is a zoo animal. In some aspects, a subject is a dog or a cat. In some aspects, a subject is a cow, a horse, a sheep, or a goat. In some aspects, a subject can be but is not limited to, a dog, cat, ferret, rabbit, cow, duck, pig, goat, chicken, horse, llama, camel, ostrich, deer, turkey, dove, sheep, goose, oxen, and/or reindeer. In some aspects, a subject is a human. 
In some aspects, a subject is equal to, less than, or greater than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 years of age. [0305] In some aspects, administration regimens comprising constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein comprise administering of more than one composition, such as 2 compositions, 3 compositions, 4 compositions, or more than 4 compositions. Various combinations of the agents may be employed. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions of the disclosure may be administered by the same route of administration or by different routes of administration. In some aspects, agents described herein and/or additional therapeutic agents are administered intravenously, intramuscularly, subcutaneously, topically, orally, transdermally, intraperitoneally, intraorbitally, by implantation, by inhalation, intrathecally,
intraventricularly, or intranasally. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein and/or additional therapeutic agents are administered intravenously, intramuscularly, subcutaneously, topically, orally, transdermally, intraperitoneally, intraorbitally, by implantation, by inhalation, intrathecally, intraventricularly, or intranasally. In some aspects, an appropriate dosage may be determined based on the type of disease to be treated and/or prevented, the severity and/or course of the disease, the clinical condition of the individual, the individual's clinical history and response to the treatment, and/or at the discretion of the attending physician.
[0306] In some aspects, administration to a subject may include various “unit doses.” A unit dose is defined as containing a predetermined quantity of the therapeutic composition. The quantity to be administered, and the particular route and formulation, is within the skill of determination of those in the clinical arts. A unit dose need not be administered as a single injection but may comprise continuous infusion over a set period of time. In some aspects, a unit dose comprises a single administrable dose.
[0307] In some aspects, the quantity to be administered, both according to number of treatments and unit dose, depends on the treatment effect desired. An effective dose is understood to refer to an amount necessary to achieve a particular effect. In the practice of certain aspects, it is contemplated that doses in the range from 0.10 mg/kg to 200 mg/kg can affect the functionality of the described agents. In certain aspects, it is contemplated that doses may comprise a composition comprising an AAV particle in a concentration of about 10^8 to about 10^14 viral genomes per ml. Furthermore, such doses can be administered at multiple times during a day, and/or on multiple days, weeks, or months.
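By way of a non-limiting arithmetic illustration, the weight-based and concentration-based dose figures above could be converted to absolute amounts as sketched below. The sketch reads the AAV concentration range as about 10^8 to about 10^14 viral genomes per ml, and it treats the recited ranges only as contemplated bounds to check against, not as limits. The body weight, dose, volume, and function names are hypothetical.

```python
# Contemplated ranges as read from the paragraph above (assumed reading
# of the AAV range: 10^8 to 10^14 viral genomes per ml).
CONTEMPLATED_MG_PER_KG = (0.10, 200.0)
CONTEMPLATED_VG_PER_ML = (1e8, 1e14)


def total_dose_mg(dose_mg_per_kg, body_weight_kg):
    """Absolute dose in mg for a weight-based dose."""
    return dose_mg_per_kg * body_weight_kg


def total_viral_genomes(concentration_vg_per_ml, volume_ml):
    """Total viral genomes delivered in a given administered volume."""
    return concentration_vg_per_ml * volume_ml


def within(value, bounds):
    """True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


# Hypothetical example: 2 mg/kg for a 70 kg subject; 5 ml at 1e12 vg/ml.
print(total_dose_mg(2.0, 70.0))
print(total_viral_genomes(1e12, 5.0))
print(within(2.0, CONTEMPLATED_MG_PER_KG), within(1e12, CONTEMPLATED_VG_PER_ML))
```

Actual dosing remains a matter for the practitioner, as the following paragraphs explain.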
[0308] In certain aspects, precise amounts of the therapeutic composition also depend on the judgment of the practitioner and are peculiar to each individual. Factors affecting dose include physical and clinical state of the patient, the route of administration, the intended goal of treatment (alleviation of symptoms versus cure) and the potency, stability and toxicity of the particular therapeutic substance or other therapies a subject may be undergoing. [0309] It is also understood that uptake is species and organ/tissue dependent. The applicable conversion factors and physiological assumptions to be made concerning uptake and concentration measurement are well-known and would permit those of skill in the art to convert one concentration measurement to another and make reasonable comparisons and conclusions regarding the doses, efficacies and results described herein. [0310] In certain instances, it will be desirable to have multiple administrations of the composition, e.g., 2, 3, 4, 5, 6 or more administrations. The administrations can be at 1, 2, 3,
4, 5, 6, 7, 8, to 5, 6, 7, 8, 9, 10, 11, 12 week, or more than 12 week intervals, including all ranges there between. [0311] The phrases “pharmaceutically acceptable” or “pharmacologically acceptable” refer to molecular entities and compositions that do not produce an adverse, allergic, or other untoward reaction when administered to an animal or human. As used herein, “pharmaceutically acceptable carrier” includes any and all solvents, dispersion media, coatings, anti-bacterial and anti-fungal agents, isotonic and absorption delaying agents, and the like. The use of such media and agents for pharmaceutical active substances is well known in the art. Except insofar as any conventional media or agent is incompatible with the active ingredients, its use in immunogenic and therapeutic compositions is contemplated. Supplementary active ingredients, such as other anti-infective agents and vaccines, can also be incorporated into the compositions. [0312] The active compounds can be formulated for parenteral administration, e.g., formulated for injection via the intravenous, intramuscular, subcutaneous, or intraperitoneal routes. Typically, such compositions can be prepared as either liquid solutions or suspensions; solid forms suitable for use to prepare solutions or suspensions upon the addition of a liquid prior to injection can also be prepared; and, the preparations can also be emulsified. [0313] The pharmaceutical forms suitable for injectable use include sterile aqueous solutions or dispersions; formulations including, for example, aqueous propylene glycol; and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. In all cases the form must be sterile and must be fluid to the extent that it may be easily injected. It also should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms, such as bacteria and fungi. 
[0314] In some aspects, wherein a composition is proteinaceous, the proteinaceous compositions may be formulated into a neutral or salt form. Pharmaceutically acceptable salts, include the acid addition salts (formed with the free amino groups of the protein) and which are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, histidine, procaine and the like. [0315] In some aspects, a pharmaceutical composition can include a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene
glycol, and liquid polyethylene glycol, and the like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating, such as lecithin, by the maintenance of the required particle size in the case of dispersion, and by the use of surfactants. The prevention of the action of microorganisms can be brought about by various anti-bacterial and anti-fungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the injectable compositions can be brought about by the use in the compositions of agents delaying absorption, for example, aluminum monostearate and gelatin. [0316] In some aspects, sterile injectable solutions are prepared by incorporating the active compounds in the required amount in the appropriate solvent with various other ingredients enumerated above, as required, followed by filtered sterilization or an equivalent procedure. Generally, dispersions are prepared by incorporating the various sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum-drying and freeze-drying techniques, which yield a powder of the active ingredient, plus any additional desired ingredient from a previously sterile-filtered solution thereof. [0317] In some aspects, upon formulation, compositions described herein may be administered in a manner compatible with the dosage formulation and in such amount as is therapeutically or prophylactically effective. In some aspects, formulations are administered in a variety of dosage forms, such as the type of injectable solutions described above. 1. 
Diseases or disorders [0318] In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein, or otherwise known in the art, may be used in a method of preventing, treating, reducing the progression of, and/or reducing the risk of a disease or disorder associated with aberrant m5C RNA levels. In some aspects, constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein, or those otherwise known in the art, may be used in treating a disease or disorder, wherein the disease or disorder is a neurodegenerative disease, an inflammatory disease, an autoimmune disease, a metabolic syndrome, a cancer, a vascular disease, a fibrotic disease, a viral infection, a bacterial infection, a fungal infection, a parasitic infection, a musculoskeletal disease (such as a myopathy), an ocular disease, or a genetic disorder.
[0319] In some aspects, the disease or disorder is a cancer. In some aspects, the cancer is pancreatic cancer, breast cancer, kidney cancer, bladder cancer, prostate cancer, testicular cancer, urothelial cancer, endometrial cancer, ovarian cancer, cervical cancer, renal cancer, esophageal cancer, gastrointestinal stromal tumor (GIST), multiple myeloma, cancer of secretory cells, thyroid cancer, gastrointestinal carcinoma, chronic myeloid leukemia, hepatocellular carcinoma, colon cancer, melanoma, malignant glioma, glioblastoma, glioblastoma multiforme, astrocytoma, dysplastic gangliocytoma of the cerebellum, Ewing’s sarcoma, rhabdomyosarcoma, ependymoma, medulloblastoma, ductal adenocarcinoma, adenosquamous carcinoma, nephroblastoma, acinar cell carcinoma, neuroblastoma, or lung cancer. In some aspects, the cancer of secretory cells is non-Hodgkin’s lymphoma, Burkitt’s lymphoma, chronic lymphocytic leukemia, monoclonal gammopathy of undetermined significance (MGUS), plasmacytoma, lymphoplasmacytic lymphoma or acute lymphoblastic leukemia. 
[0320] In some aspects, a cancer comprises, consists essentially of, consists of, or expressly does not comprise, an acute myeloid leukemia (AML), including at least AML with a translocation between chromosomes 8 and 21 [t(8;21)], AML with a translocation or inversion in chromosome 16 [t(16;16) or inv(16)], AML with a translocation between chromosomes 6 and 9 [t(6;9)], AML with a translocation or inversion in chromosome 3 [t(3;3) or inv(3)], AML (megakaryoblastic) with a translocation between chromosomes 1 and 22 [t(1;22)], AML with the BCR-ABL1 (BCR-ABL) fusion gene, AML with mutated NPM1 gene, AML with biallelic mutations of the CEBPA gene, AML with mutated RUNX1 gene, AML with myelodysplasia-related changes, AML related to previous chemotherapy or radiation, AML not otherwise specified, AML with minimal differentiation, AML without maturation, AML with maturation, acute myelomonocytic leukemia, acute monoblastic/monocytic leukemia, pure erythroid leukemia, acute megakaryoblastic leukemia, acute basophilic leukemia, acute panmyelosis with fibrosis, myeloid sarcoma, myeloid proliferations related to Down syndrome, undifferentiated and biphenotypic acute leukemias, and/or AML with mutations in TET2, TP53, RUNX1, and/or ASXL. In certain aspects, one or more of the aforementioned AML are expressly excluded from AMLs subject to treatment with methods and/or compositions described herein. [0321] In some aspects, a cancer comprises, consists essentially of, consists of, or expressly does not comprise, a glioma, including at least astrocytomas (including glioblastoma
multiforme), brain stem gliomas, ependymomas, mixed gliomas, oligodendrogliomas, and/or optic pathway gliomas. [0322] In some aspects, the disease or disorder is a pre-cancerous disease or disorder. In some aspects, the disease or disorder comprises, consists essentially of, consists of, or expressly does not comprise, clonal hematopoiesis of indeterminate potential (CHIP) or disease states associated therewith (such as but not limited to atherosclerosis, myocardial fibrosis, and/or heart failure). In some aspects, methods described herein comprise treatment of CHIP and/or one or more disease states associated therewith, comprising inhibition of NSUN1, NSUN2, MBD5, MBD6, and/or TET2. In some aspects, methods described herein comprise treatment of CHIP and/or one or more disease states associated therewith, comprising inhibition of MBD6. In some aspects, methods described herein comprise treatment of CHIP and/or one or more disease states associated therewith, comprising inhibition of NSUN2. [0323] In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich Splicing Factor 2 (SRSF2). In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in TET2, IDH1, IDH2, DNMT3A, ASXL1, PPM1D, TP53, JAK2, SF3B1, and/or SRSF2. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in TET2. 
In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in IDH1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in IDH2. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in DNMT3A. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in ASXL1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in PPM1D. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in TP53. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in JAK2. In some aspects, a disease or disorder for treatment expressly does not comprise
diseased cells with one or more mutations in SF3B1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in SRSF2. [0324] In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with a diseased cell comprising one or more mutations in a TET2 encoding gene (e.g., one or more loss of function mutations). In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in TET2, IDH1, IDH2, DNMT3A, ASXL1, PPM1D, TP53, JAK2, SF3B1, and/or SRSF2. In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2. [0325] In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in TET2, IDH1, IDH2, DNMT3A, ASXL1, PPM1D, TP53, JAK2, SF3B1, and/or SRSF2, wherein the one or more mutations renders the diseased cells susceptible to synthetic lethality induced by inhibition of MBD5, MBD6, NSUN1, and/or NSUN2. In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells with one or more mutations in TET2, IDH1, and/or IDH2, wherein the one or more mutations renders the diseased cells susceptible to synthetic lethality induced by inhibition of MBD5, MBD6, NSUN1, and/or NSUN2. [0326] In some aspects, a disease or disorder for treatment comprising technologies described herein is associated with diseased cells comprising one or more mutations in one or more genes encoding components of a canonical and/or non-canonical Polycomb Repressive Complex (PRC). 
In some aspects, the one or more mutations in one or more genes encoding components of PRC comprises one or more loss of function mutations. In some aspects, a diseased cell comprising one or more mutations in one or more genes encoding components of PRC comprises one or more mutations in E3 Ubiquitin Ligase RING1A/B, Polycomb Group Ring Finger 1 (PCGF1), Polycomb Group Ring Finger 2 (PCGF2), Polycomb Group Ring Finger 3 (PCGF3), Polycomb Group Ring Finger 4 (PCGF4), Polycomb Group Ring Finger 5 (PCGF5), and/or Polycomb Group Ring Finger 6 (PCGF6). In some aspects, the diseased cell comprises one or more mutations in one or more genes encoding Polycomb Repressive-Deubiquitinase (PR-DUB) complex associated components O-linked N-acetylglucosamine Transferase (OGT), Lysine Demethylase 1B (KDM1B), Forkhead Box K1 (FOXK1), Forkhead Box K2 (FOXK2), BRCA1 Associated Protein 1 (BAP1), ASXL Transcriptional Regulator 1 (ASXL1), ASXL Transcriptional Regulator 2 (ASXL2), ASXL Transcriptional
Regulator 3 (ASXL3), and/or Host Cell Factor C1 (HCFC1). For instance, in some aspects, the diseased cell comprising one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprises one or more gain of function mutations in those genes. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in OGT. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in KDM1B. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in FOXK1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in FOXK2. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in BAP1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in ASXL1. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in ASXL2. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in ASXL3. In some aspects, a disease or disorder for treatment expressly does not comprise diseased cells with one or more mutations in HCFC1. [0327] In some aspects, the disease or disorder is an inflammatory disease. In some aspects, the inflammatory disease is, or expressly is not, arthritis, psoriatic arthritis, psoriasis, juvenile idiopathic arthritis, asthma, allergic asthma, bronchial asthma, tuberculosis, chronic airway disorder, cystic fibrosis, glomerulonephritis, membranous nephropathy, sarcoidosis, vasculitis, ichthyosis, transplant rejection, interstitial cystitis, atopic dermatitis, or inflammatory bowel disease. 
In some aspects, the inflammatory bowel disease is Crohn’s disease, ulcerative colitis, inflammatory bowel disease, or celiac disease. [0328] In some aspects, the disease or disorder is an autoimmune disease. In some aspects, the autoimmune disease is, or expressly is not, systemic lupus erythematosus, type 1 diabetes, multiple sclerosis, psoriasis/psoriatic arthritis, inflammatory bowel disease, Addison’s disease, Graves’ disease, Sjogren’s syndrome, Hashimoto’s thyroiditis, Myasthenia gravis, autoimmune vasculitis, pernicious anemia, celiac disease, or rheumatoid arthritis. [0329] In some aspects, the disease or disorder is a metabolic syndrome. In some aspects, the metabolic syndrome is, or expressly is not, acute pancreatitis, chronic pancreatitis, alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance, hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes. In some aspects, the metabolic syndrome is alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance,
hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes. [0330] In some aspects, the disease or disorder is a musculoskeletal disease (such as a myopathy). In some aspects, the musculoskeletal disease is, or expressly is not, a myopathy, a muscular dystrophy, a muscular atrophy, a muscular wasting, or sarcopenia. In some aspects, the muscular dystrophy is, or expressly is not, Duchenne muscular dystrophy (DMD), Becker’s disease, myotonic dystrophy, X-linked dilated cardiomyopathy, spinal muscular atrophy (SMA), or metaphyseal chondrodysplasia, Schmid type (MCDS). In some aspects, the myopathy is a skeletal muscle atrophy. In some aspects, the musculoskeletal disease (such as the skeletal muscle atrophy) is triggered by ageing, chronic diseases, stroke, malnutrition, bedrest, orthopedic injury, bone fracture, cachexia, starvation, heart failure, obstructive lung disease, renal failure, Acquired Immunodeficiency Syndrome (AIDS), sepsis, an immune disorder, a cancer, ALS, a burn injury, denervation, diabetes, muscle disuse, limb immobilization, mechanical unload, myositis, or a dystrophy. [0331] In some aspects, the disease or disorder is, or expressly is not, a musculoskeletal disease. In some aspects, skeletal muscle mass, quality and/or strength are increased. In some aspects, synthesis of muscle proteins is increased. In some aspects, skeletal muscle fiber atrophy is inhibited. [0332] In some aspects, the disease or disorder is, or expressly is not, a vascular disease. In some aspects, the vascular disease is atherosclerosis, abdominal aortic aneurysm, carotid artery disease, deep vein thrombosis, Buerger’s disease, chronic venous hypertension, vascular calcification, telangiectasia or lymphoedema. [0333] In some aspects, the disease or disorder is a genetic disorder. 
In some aspects, a genetic disorder is, or expressly is not, arrhythmogenic right ventricular dysplasia/cardiomyopathy, Brugada Syndrome, Charcot-Marie-Tooth Disease, Cleft Lip and Palate, Cleidocranial Dysplasia, Cystic Fibrosis, Familial Adenomatous Polyposis, Hirschsprung’s Disease, Huntington’s Disease, Klinefelter Syndrome, Kniest Syndrome, Marfan Syndrome, Mucopolysaccharidoses, Muscular Dystrophy, Sickle Cell Disease, Von Hippel-Lindau Syndrome, Congenital Deafness, Familial Hypercholesterolemia, Hemochromatosis, Neurofibromatosis type 1, Tay-Sachs Disease, Usher Syndrome, AA amyloidosis, Adrenoleukodystrophy, Ehlers-Danlos Syndrome, Lysosomal disorders, and/or Mitochondrial disorders. [0334] In some aspects, the disease or disorder is, or expressly is not, an ocular disease. In some aspects, the ocular disease is, or expressly is not, glaucoma, age-related macular
degeneration, inflammatory retinal disease, retinal vascular disease, diabetic retinopathy, uveitis, rosacea, Sjogren´s syndrome, retinitis pigmentosa, retinoschisis, Stargardt disease, Leber congenital amaurosis, or neovascularization in proliferative retinopathy. 2. Pharmaceutical compositions [0335] In certain aspects, the constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions (collectively described as “agents”) for use in the methods, such as methods of m5C RNA writing, reading, and/or erasing, and/or inhibition of NSUN1, NSUN2, MBD5, MBD6, and/or TET2, are suitably contained in a pharmaceutically acceptable carrier. In some aspects, the carrier is non-toxic, biocompatible and is selected so as not to detrimentally affect the biological activity of the agent. In some aspects, agents may be formulated into preparations for local delivery (i.e. to a specific location of the body, such as brain tissues, muscle tissue, fat tissue, etc.) or systemic delivery, in solid, semi-solid, gel, liquid or gaseous forms such as tablets, capsules, powders, granules, ointments, solutions, depositories, inhalants and injections allowing for oral, parenteral or surgical administration. Certain aspects of the disclosure also contemplate local administration of the compositions by coating medical devices and the like. [0336] In some aspects, suitable carriers for parenteral delivery via injectable, infusion or irrigation and topical delivery include distilled water, physiological phosphate-buffered saline, normal or lactated Ringer's solutions, dextrose solution, Hank's solution, or propanediol. In addition, sterile, fixed oils may be employed as a solvent or suspending medium. For this purpose any biocompatible oil may be employed including synthetic mono- or diglycerides. In addition, fatty acids such as oleic acid find use in the preparation of injectables. 
The carrier and agent may be compounded as a liquid, suspension, polymerizable or non-polymerizable gel, paste or salve. [0337] In certain aspects, the carrier may also comprise a delivery vehicle to sustain (i.e., extend, delay or regulate) the delivery of the agent(s) or to enhance the delivery, uptake, stability or pharmacokinetics of the therapeutic agent(s). Such a delivery vehicle may include, by way of non-limiting examples, microparticles, microspheres, nanospheres or nanoparticles composed of proteins, liposomes, carbohydrates, synthetic organic compounds, inorganic compounds, polymeric or copolymeric hydrogels and polymeric micelles. [0338] In certain aspects, the actual dosage amount of a composition administered to a patient or subject can be determined by physical and physiological factors such as body weight, severity of condition, the type of disease being treated, previous or concurrent therapeutic
interventions, idiopathy of the patient and on the route of administration. The practitioner responsible for administration will, in any event, determine the concentration of active ingredient(s) in a composition and appropriate dose(s) for the individual subject. [0339] In some aspects, solutions of pharmaceutical compositions can be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. Dispersions also can be prepared in glycerol, liquid polyethylene glycols, mixtures thereof and in oils. Under ordinary conditions of storage and use, these preparations contain a preservative to prevent the growth of microorganisms. [0340] In certain aspects, the pharmaceutical compositions are advantageously administered in the form of injectable compositions either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection may also be prepared. These preparations also may be emulsified. A typical composition for such purpose comprises a pharmaceutically acceptable carrier. For instance, the composition may contain 10 mg or less, 25 mg, 50 mg or up to about 100 mg of human serum albumin per milliliter of phosphate buffered saline. Other pharmaceutically acceptable carriers include aqueous solutions, non-toxic excipients, including salts, preservatives, buffers and the like. [0341] In some aspects, non-limiting examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oil and injectable organic esters such as ethyl oleate. In some aspects, non-limiting examples of aqueous carriers include water, alcoholic/aqueous solutions, saline solutions, parenteral vehicles such as sodium chloride, Ringer's dextrose, etc. In some aspects, intravenous vehicles include fluid and nutrient replenishers. Preservatives include antimicrobial agents, antifungal agents, anti-oxidants, chelating agents and inert gases. 
The pH and exact concentration of the various components of the pharmaceutical composition are adjusted according to well-known parameters. [0342] In certain aspects, formulations comprising constructs described herein and/or co-administered formulations may be suitable for oral administration. In some aspects, oral formulations include such typical excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate and the like. The compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders. [0343] An effective amount of the pharmaceutical composition is determined based on the intended goal. The term “unit dose” or “dosage” refers to physically discrete units suitable for use in a subject, each unit containing a predetermined quantity of the pharmaceutical composition calculated to produce the desired responses discussed above in association with
its administration, i.e., the appropriate route and treatment regimen. The quantity to be administered, both according to number of treatments and unit dose, depends on the protection or effect desired. [0344] Precise amounts of the pharmaceutical composition also depend on the judgment of the practitioner and are peculiar to each individual. Factors affecting the dose include the physical and clinical state of the patient, the route of administration, the intended goal of treatment (e.g., alleviation of symptoms versus cure) and the potency, stability and toxicity of the particular therapeutic substance. VIII. Kits [0345] Certain aspects of the present disclosure also concern kits containing compositions of the disclosure or compositions to implement methods disclosed herein. In some aspects, disclosed are kits that can be used to detect and/or modify m5C in one or more RNAs, such as caRNAs. In some aspects, disclosed are kits that can be used to quantify, detect, and/or modify m5C in a target RNA. In some aspects, a kit may also include additional components that are useful for purifying, amplifying, or sequencing RNA or DNA, or for other applications of the present disclosure as described herein. [0346] The kit may optionally provide additional components that are useful in a procedure. These optional components include buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information. In some aspects, a kit can be used to detect, for example, the absence, presence, and/or level of one or more features described herein. 
In certain aspects, a kit contains, contains at least, or contains at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, or more than 1,000, probes, primers or primer sets, synthetic molecules or inhibitors, or any value or range and combination derivable therein. [0347] In some aspects, a kit may comprise a number of agents for assessing differential m5C RNA levels and/or modifying m5C RNA levels in any feature described herein, for example, in features listed in Table 1, in particular, features listed in Table 1 and annotated as SEQ ID NOs: 100-614. [0348] In some aspects, a kit may comprise reagents for detection of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 and/or 25 features. In some aspects, a kit may comprise reagents for detection and/or modification of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, and/or 100 features. In some aspects, a kit may comprise reagents for detection and/or modification of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, and/or 200 or more features. In some aspects, a kit may comprise reagents for detection and/or modification of features whose sequence characteristics are at least, exactly, or about 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent identical to the specified features found in Table 1 or any subset thereof. [0349] Kits may comprise components, which may be individually packaged or placed in a container, such as a tube, bottle, vial, syringe, or other suitable container means. 
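The percent-identity thresholds recited in [0348] can be illustrated with a minimal sketch. It assumes two sequences that are already aligned and of equal length, and it counts matching positions only; practical identity calculations normally rely on a sequence-alignment tool, which the disclosure does not specify. The function names and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Simplifying assumption: no gaps and no alignment step are handled here.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold_percent: float = 80.0) -> bool:
    """True if identity is at least the chosen threshold (80 to 100 in [0348])."""
    return percent_identity(seq_a, seq_b) >= threshold_percent


# Hypothetical 10-nt RNA fragments differing at one position.
reference = "ACGUACGUAC"
candidate = "ACGUACGUAA"
print(percent_identity(reference, candidate))
print(meets_threshold(reference, candidate, 95.0))
```

With this sketch, a single mismatch in a 10-nucleotide fragment already drops identity below a 95 percent threshold, which shows why the threshold chosen matters more for short features than for long ones.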
[0350] Individual components may also be provided in a kit in concentrated amounts; in some aspects, a component is provided individually in the same concentration as it would be in a solution with other components. Concentrations of components may be provided as 1x, 2x, 5x, 10x, or 20x or more. [0351] In certain aspects, negative and/or positive control nucleic acids, probes, and inhibitors are included in some kit aspects. In addition, a kit may include a sample that is a negative or positive control, for example a nucleic acid that does not comprise an m5C mark may be included as a negative control and a nucleic acid that does comprise an m5C mark may be included as a positive control. [0352] It is specifically contemplated that a kit of the present disclosure may exclude any one or more of the described components in certain aspects.
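As an illustration of the concentrated stocks mentioned in [0350], the following minimal sketch computes how much of an Nx stock is needed to prepare a working solution, using the standard relation C1·V1 = C2·V2. The function name, the 1x target, and the 100 µL final volume are hypothetical choices for illustration and are not taken from the disclosure.

```python
def dilution_volumes(stock_x: float, final_x: float, final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in microliters.

    Applies C1 * V1 = C2 * V2; all inputs are illustrative assumptions.
    """
    if stock_x <= 0 or final_x <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_x > stock_x:
        raise ValueError("cannot dilute to a concentration above the stock")
    stock_volume = final_x * final_volume_ul / stock_x
    return stock_volume, final_volume_ul - stock_volume


# Stock strengths listed in [0350], each diluted to 1x in a 100 uL final volume.
for factor in (1, 2, 5, 10, 20):
    stock, diluent = dilution_volumes(factor, 1, 100.0)
    print(f"{factor}x stock: {stock:.1f} uL stock + {diluent:.1f} uL diluent")
```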
Examples [0353] The following examples are included to demonstrate certain aspects of inventions disclosed herein. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of inventions disclosed herein, and thus can be considered to constitute certain modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific aspects which are disclosed and still obtain a like or similar result without departing from the spirit and scope of inventions described herein. Example 1 – Materials and Methods [0354] Unless otherwise stated, assays and experiments described in the following examples were performed as described here. Animals and tissues [0355] Tet2-/- mice were generated as described (48). The mice used in this study were backcrossed for more than 6 generations with C57BL/6 mice. WT C57BL/6 and Tet2-/- mice of both sexes, 6–8 weeks old, were used throughout this study. All animal studies were performed with the approval from the Institutional Animal Care and Use Committee (IACUC) at The University of Texas Health Science Center at San Antonio (UTHSCSA) and conducted in accordance with the institutional and national guidelines and regulations. Xenotransplantation of human leukemia cells [0356] For in vivo xenotransplantation study procedures, 1×10^6 K-562 cells were injected intravenously via the tail vein into adult NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice (6-8 weeks old) pretreated with 280 cGy whole body irradiation. At 28-39 days after transplantation, peripheral blood (PB) was harvested from the submandibular vein, and bone marrow (BM) was isolated from the tibias and femurs. Human CD33+ chimerism in BM and PB cells was analyzed by BD FACSCelesta™ flow cytometer (BD Biosciences). 
[0357] 2×10^4 THP-1 cells were injected intravenously via the tail vein into adult NSG mice (6-8 weeks old) pretreated with 280 cGy whole body irradiation. At 20-22 days after transplantation, human CD33+ CD45+ chimerism in BM and PB cells was analyzed by BD FACSCelesta™ flow cytometer.
Hematopoietic stem and progenitor cell sorting, colony assay and in vitro differentiation assay [0358] For hematopoietic stem and progenitor Lin-c-Kit+ (LK) cell selection, magnetic-activated cell sorting was applied with autoMACS® Pro Separator (Miltenyi Biotec). Briefly, the lineage-positive cells (Lin+) were depleted from total BM cells of 6–8 weeks old mice using the Direct Lineage Cell Depletion Kit (Miltenyi Biotec, 130-110-470), and then the lineage-negative cells (Lin-) were sorted with c-Kit (CD117) MicroBeads (Miltenyi Biotec, 130-091-224). The purity of selected cells was analyzed by flow cytometry. [0359] For colony assay, LK cells (Lin-c-Kit+ cells) were plated in triplicate in methylcellulose medium (MethoCult, M3134) supplemented with mouse stem cell factor (mSCF; 100 ng/mL), interleukin-3 (mIL-3; 10 ng/mL), thrombopoietin (mTPO; 50 ng/mL), granulocyte-macrophage colony-stimulating factor (mGM-CSF; 10 ng/mL), human erythropoietin (hEPO; 4 U/mL), and interleukin-6 (hIL-6; 50 ng/mL, PeproTech). The colonies were imaged by STEMvision™ (STEMCELL Technologies) and scored on day 7, then these colonies were sequentially replated every 7 days for replating assay. Colony cells were also harvested and analyzed for expression of stem and progenitor markers and myeloid lineage markers by flow cytometry. [0360] The LK cells were also incubated in suspension culture containing 30% FBS and 2% BSA in complete RPMI-1640 medium supplemented with 100 ng/mL mSCF, 10 ng/mL mIL-3, 50 ng/mL mTPO and 10 ng/mL mGM-CSF. Cells were harvested and analyzed for expression of stem/progenitor markers and myeloid lineage markers by flow cytometry on day 7. Flow cytometry analysis [0361] Cells were stained with PerCP-Cy™5.5 mouse lineage antibody cocktail (BD Biosciences, 561317) and PE Rat anti-mouse CD117 (BD Biosciences, 553869) antibody for hematopoietic stem and progenitor cells analysis. 
Brilliant Violet 421™ (BV421) anti-mouse/human CD11b (Mac-1) (BioLegend, 101236) and PE-Cy™7 Rat anti-mouse Ly-6G and Ly-6C (Gr-1) antibodies (BD Biosciences, 552985) were used to analyze the myeloid lineage.
[0362] Human CD33 chimerism was analyzed with PE Mouse anti-human CD33 (BD Biosciences, 561816) and PE-Cy™7 Rat anti-mouse CD45 (BD Biosciences, 552848) in peripheral blood (PB) and bone marrow (BM) cells from NSG mice xenotransplanted with K-562 or THP-1 cells. All flow cytometry data were analyzed with FlowJo-V10 software (TreeStar).
Cell culture
[0363] WT and Tet2-/- mouse embryonic stem cells (mESCs) were gifts from the laboratory of Dr. Bing Ren (15, 50). The control and knockout ESCs have been shown to be pluripotent by chimera formation assay. WT and Pspc1-/- mESCs were gifts from Dr. Jianlong Wang (20). All mESCs were kept in DMEM (Gibco, 11995065) supplemented with 15% Stem Cell Qualified Fetal Bovine Serum, Heat Inactivated (Gemini Bio Products, 100-525), 1 × L-glutamine (Gibco, 25030081), NEAA (Gibco, 25030081), LIF (MilliporeSigma, ESG1107), 1 × β-mercaptoethanol (Gibco, 21985023), 3 μM CHIR99021 (STEMCELL Technologies, 72052) and 1 μM PD0325901 (STEMCELL Technologies, 72182) at 37 °C and 5% CO2. The medium was replaced every 24 hours. ES cells were passaged on gelatin-coated plates twice to clear feeder cells before experiments.
[0364] Wild-type THP-1, K-562, and TF-1 cells were obtained from the American Type Culture Collection (ATCC). The SKM-1 cell line was obtained from DSMZ (German Collection of Microorganisms and Cell Cultures GmbH). Wild-type OCI-AML3 cells were a gift from Dr. Lucy Godley at the University of Chicago. Wild-type and TET2-/- K-562 and THP-1 cells, generated as previously described (51), were gifts from Dr. Babal K. Jha at Cleveland Clinic. THP-1, K-562, SKM-1, and OCI-AML3 cells were kept in RPMI-1640 (Gibco, 61870036) with 10% fetal bovine serum (FBS, Gibco 26140079) at 37 °C and 5% CO2. TF-1 cells were kept in RPMI-1640 (Gibco, 61870036) with 10% FBS (Gibco 26140079) and 2 ng/mL recombinant GM-CSF (PeproTech, 300-03) at 37 °C and 5% CO2. U-87 MG (HTB-14™), LN-229 (CRL-2611™), Hep G2 (HB-8065™), HeLa (CCL-2™), HCT 116 (CCL-247™), A549 (CCL-185™) and A-375 (CRL-1619™) cells were obtained from the American Type Culture Collection (ATCC). U-87 MG and LN-229 cells were kept in ATCC-formulated Eagle's Minimum Essential Medium (ATCC, 30-2003) supplemented with 10% FBS (Gibco, 26140079) and 5% FBS (Gibco, 26140079), respectively.
Hep G2, HeLa, HCT 116, A549 and A-375 cells were kept in DMEM (Gibco, 11995065) supplemented with 10% FBS (Gibco 26140079). All cell types were kept at 37 °C and 5% CO2.
[0365] shNC and shMBD6 THP-1 and K-562 cell lines were constructed by lentivirus transduction with TransDux™ MAX Lentivirus Transduction Reagent (System Biosciences, LV860A-1). Lentiviral particles were prepared by using HEK293T cells and lentiviral packaging plasmids pCMV-VSV-G and pCMV-dR8.2 (pCMV-VSV-G and pCMV-dR8.2 were gifts from Bob Weinberg (Addgene plasmid #8454; http://n2t.net/addgene:8454; RRID:Addgene_8454 and Addgene plasmid #8455; http://n2t.net/addgene:8455; RRID:Addgene_8455)) and short hairpin RNA (shRNA) plasmid pLKO.1-shC002
(MilliporeSigma, SHC002) or pLKO.1-shMBD6 (MilliporeSigma, TRCN0000038787). Forty-eight hours after transfection, lentiviral particles were precipitated with PEG-it Virus Precipitation Solution (System Biosciences, LV810-1). shNC and shMBD6 THP-1 and K-562 cell lines were kept in RPMI-1640 (Gibco, 61870036) with 10% fetal bovine serum (FBS, Gibco) and 1 μg/mL puromycin (Gibco, A1113803) at 37 °C and 5% CO2.
[0366] The TET2 KO THP-1 cell line for the PDX model was generated using the CRISPR-Cas9 system. Single-guide RNAs were designed with the CRISPick tool (https://portals.broadinstitute.org/gppx/crispick/public) and then cloned into the LentiCRISPR V2-GFP vector by Synbio Technologies. THP-1 cells were infected with lentiviral particles for 72 hours, followed by selection of GFP-positive cells using a BD FACSMelody™ Cell Sorter (BD Biosciences). Knockout efficiency was verified by Western blot.
[0367] LK cells were plated in triplicate in methylcellulose medium (MethoCult, M3134) supplemented with mouse stem cell factor (mSCF; 100 ng/mL), interleukin-3 (mIL-3; 10 ng/mL), thrombopoietin (mTPO; 50 ng/mL), granulocyte-macrophage colony-stimulating factor (mGM-CSF; 10 ng/mL), human erythropoietin (hEPO; 4 U/mL), and interleukin-6 (hIL-6; 50 ng/mL, PeproTech) (see e.g., FIGs. 4 and 15).
Small interfering RNA (siRNA) and plasmid transfection
[0368] Two or three individual siRNAs, or a pool of four siRNAs targeting different regions of the same transcript (Dharmacon and/or Qiagen siRNA, Human NSUN2: Dharmacon L-018217-01-0005; Mouse Nsun2: Qiagen SI01331687; Human MBD5: Dharmacon L-027190-01-0005; Mouse Mbd5: Qiagen SI04942448; Human MBD6: Dharmacon L-015157-01-0005; and Mouse Mbd6: Dharmacon L-049319-01-0005; and Broad Institute Human TET2: TRCN0000418976 or Human MBD6: TRCN0000038787) were used for knockdown of human or mouse transcripts.
siRNA transfections in mESCs and other adherent cell lines were performed with Lipofectamine™ RNAiMAX Transfection Reagent (Invitrogen, 13778075) according to the manufacturer’s instructions. Transfections in human leukemia cells (THP-1, TF-1, OCI-AML3, SKM-1) were performed by electroporation with the SG Cell Line 4D-Nucleofector™ X Kit L (Lonza Bioscience, V4XC-3024) using program FF-100. Transfections in K-562 cells were performed with the SF Cell Line 4D-Nucleofector™ X Kit L (Lonza Bioscience, V4XC-2012) using program FF-120.
[0369] Plasmid transfections in mESCs or HEK293T cells were performed with Lipofectamine™ 3000 Transfection Reagent (Invitrogen, L3000015) according to the manufacturer’s instructions.
Cell proliferation assay
[0370] Cell proliferation assays for adherent and suspension cells were performed similarly. Cells were seeded in 96-well plates before being assayed in 100 μL volumes with the CellTiter 96® Aqueous One Solution Cell Proliferation Assay (Promega, G3582) following the manufacturer’s instructions. 2000-10,000 cells were seeded per well at day 0 and cell proliferation was monitored every 24 hours by incubating the cell suspension with MTS reagent at 37 °C for 1 hour.
DNase I–TUNEL assay
[0371] For cell line samples, mESCs were reseeded to 10-cm cell culture dishes 12 hours prior to small interfering RNA (siRNA) transfection. The DNase I–TUNEL assay was performed using the DeadEnd Fluorometric TUNEL System (Promega, G3250) according to the manufacturer’s instructions after cell fixation with paraformaldehyde and permeabilization with Triton X-100. Two independent experiments were performed. Cells were treated with 1 U/mL of DNase I (Thermo Scientific, EN0521) for 5 minutes at 37 °C before rTdT labeling. Flow cytometry was performed on a BD Fortessa (BD Biosciences), and data were analyzed using FlowJo (TreeStar).
Nascent RNA imaging assay
[0372] mESCs were reseeded in Nunc Lab-Tek II Chambered Coverglass (Thermo Scientific, 155409) 12 hours prior to treatment. The nascent RNA synthesis assay was performed using the Click-iT™ RNA Alexa Fluor™ 488 Imaging Kit (Invitrogen, C10329) following the manufacturer’s instructions. 5-EU incubation was performed for one hour before the 5-EU was washed away with cell medium. Cell nuclei were counterstained with Hoechst 33342 (Abcam, ab228551). Samples were imaged on a Leica SP8 laser scanning confocal microscope at the University of Chicago. Fluorescence intensity across different samples was quantified with CellProfiler 3.0 using a custom workflow. Total RNA synthesis rate was obtained by multiplying the average intensity in each cell by the area of that cell.
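As a minimal sketch of the per-cell calculation described for the nascent RNA imaging assay, the following Python snippet multiplies each cell's mean fluorescence intensity by its segmented area to give an integrated signal, then averages that value over the cells of one sample. The `CellMeasurement` record, its field names, and the example values are hypothetical stand-ins for measurements exported from an image-analysis workflow; they are not taken from the custom CellProfiler workflow referenced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CellMeasurement:
    """Hypothetical per-cell record exported from an image-analysis workflow."""
    mean_intensity: float  # average EU (Alexa Fluor 488) signal per pixel
    area_px: float         # segmented cell area in pixels


def total_rna_signal(cell: CellMeasurement) -> float:
    """Integrated signal for one cell: mean intensity multiplied by area."""
    return cell.mean_intensity * cell.area_px


def sample_mean_signal(cells: list[CellMeasurement]) -> float:
    """Average integrated signal across all cells in one sample."""
    if not cells:
        raise ValueError("no cells measured")
    return mean(total_rna_signal(c) for c in cells)


# Illustrative values only, not measured data.
example_cells = [CellMeasurement(2.0, 50.0), CellMeasurement(4.0, 25.0)]
example_mean = sample_mean_signal(example_cells)
```

Multiplying mean intensity by area recovers the summed pixel intensity of the cell, so a large dim cell and a small bright cell can report the same total signal.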
Assay of transposase-accessible chromatin with visualization (ATAC-see)
[0373] ATAC-see of mESCs was performed as described in the original report (30). ATTO-590 labeled imaging oligos were purchased from Integrated DNA Technologies (IDT) and the oligonucleotide sequences were as follows:
[0374] 5′-[phos]CTGTCTCTTATACACATCT-3′ (Tn5MErev; SEQ ID NO: 92).
[0375] 5′-/5ATTO590/TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′ (Tn5ME-A-ATTO590; SEQ ID NO: 93).
[0376] 5′-/ATTO590/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′ (Tn5ME-B-ATTO590; SEQ ID NO: 94).
[0377] The oligos were assembled with recombinant Tn5 transposase (Active Motif, 81286) to produce the Tn5 transposome. Cell fixation, permeabilization, and labeling were performed as described in the original report (30).
Recombinant protein purification
[0378] Standard molecular cloning strategies were used to generate the C-terminally MBP-6×His tagged MBD domain of MBD6 (residues 1-100). Human MBD6 coding sequence was obtained from Origene (Origene, #SC324058). Full-length coding sequence was cloned using PrimeSTAR® GXL DNA Polymerase (TaKaRa Bio, R050B). Recombinant proteins were expressed in E. coli BL21 (DE3) grown to OD600 of 0.6 in LB medium. Expression was induced with 0.6 mM IPTG at 16 °C for 20 hours and cells were harvested via centrifugation.
[0379] For purification of the MBP-tagged MBD domain of MBD6, the bacterial pellet was resuspended in a lysis buffer containing 25 mM Tris-HCl (pH 7.5), 500 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol (β-ME), and protease inhibitors (Ethylenediaminetetraacetic acid-free protease inhibitor cocktail tablet, MilliporeSigma 4693132001) and disrupted by sonication for 3 minutes. Cell lysates were clarified via centrifugation at 26,000 g for 30 minutes, and the supernatant was applied to Ni2+-NTA resin (Thermo Scientific, 88221) and washed with lysis buffer; bound proteins were eluted with lysis buffer supplemented with 250 mM imidazole. The eluted protein was bound to Amylose resin (NEB, E8021S) before washing with lysis buffer. The bound protein was eluted with 1% maltose in lysis buffer. The eluted protein was analyzed by SDS-PAGE and concentrated by centrifugal filtration (Amicon Ultra-15). The final concentrated protein was aliquoted, flash frozen, and stored at -80 °C for future use.
Quantitative RT-PCR (RT-qPCR)
[0380] To quantify expression levels of transcripts, total RNA was reverse transcribed using PrimeScript™ RT Master Mix (TaKaRa Bio, RR0361) with oligo dT primer and random hexamers as primers. The cDNA was then subjected to real-time PCR (LightCycler 96 system, Roche) using FastStart Essential DNA Green Master (Roche, 06402712001) with gene-specific primers. Relative changes in expression were calculated using the ΔΔCt method.
Western blot
[0381] Protein samples were prepared from respective cells by lysis in RIPA buffer (Thermo Scientific, 89900) containing 1 × Halt™ Protease and Phosphatase Inhibitor Cocktail (Thermo Scientific 78441). Protein concentration was measured by NanoDrop 8000
Spectrophotometer (Thermo Scientific). Lysates of equal total protein concentration were heated at 90 °C in 1 × loading buffer (Bio-Rad, 1610747) for ten minutes. Denatured protein was loaded into 4-12% NuPAGE Bis-Tris gels (Invitrogen, NP0335BOX) and transferred to PVDF membranes (Thermo Scientific, 88585). Membranes were blocked in Tris-Buffered Saline, 0.1% Tween® 20 (TBST) with 3% BSA (MilliporeSigma, A7030) for 30 minutes at room temperature, incubated in a diluted primary antibody solution at 4 °C overnight, then washed and incubated in a dilution of secondary antibody conjugated to HRP for 1 hour at room temperature. Protein bands were detected using the SuperSignal West Dura Extended Duration Substrate kit (ThermoFisher, 34075) with a FluorChem R (ProteinSimple). Blot intensities were quantified with the Fiji (ImageJ) Analyze-Gel module.
Cell fractionation
[0382] Fractionation of mESCs, K-562 or THP-1 cells was performed following the published protocol (52) with the optimized concentration of NP-40 (MilliporeSigma, 492018) for each cell line. In brief, 5 × 10⁶ to 1 × 10⁷ cells were harvested and washed with 1 mL cold PBS/1 mM EDTA buffer, then centrifuged at 4 °C at 500 g to collect the cell pellet. 200 μL ice-cold lysis buffer (10 mM Tris-HCl, pH = 7.4, 0.05% NP-40, 150 mM NaCl) was added to the cell pellet and incubated on ice for 5 minutes; the cell lysate was then gently pipetted over 2.5 volumes of chilled sucrose cushion (24% RNase-free sucrose in lysis buffer) and centrifuged at 4 °C at 15,000 g for 10 minutes. All the supernatant was collected as the cytoplasmic fraction, and the nuclei pellet was washed once by gently adding 200 μL ice-cold PBS/1 mM EDTA without dislodging the pellet. The nuclei pellet was resuspended in 200 μL prechilled glycerol buffer (20 mM Tris-HCl, pH = 7.4, 75 mM NaCl, 0.5 mM EDTA, 0.85 mM DTT, 0.125 mM PMSF, 50% glycerol) with gentle flicking of the tube.
Then an equal volume of cold nuclei lysis buffer (10 mM HEPES, pH = 7.6, 1 mM DTT, 7.5 mM MgCl2, 0.2 mM EDTA, 0.3 M NaCl, 1 M urea, 1% NP-40) was added, followed by vigorous vortexing for 5 seconds twice. The nuclei pellet mixtures were incubated for 2 minutes on ice, then centrifuged at 4 °C at 15,000 g for 2 minutes. The supernatant was collected as the soluble nuclear fraction (nucleoplasm). The pellet was gently rinsed with cold PBS/1 mM EDTA without dislodging and then collected as the chromosome-associated fraction.
[0383] Fractionation of LK cells was performed similarly to that of ES cells with minor modifications. Briefly, LK cells were cultured in vitro for 2 hours following sorting on the autoMACS® Pro Separator, and then ice-cold lysis buffer (10 mM Tris-HCl, pH = 7.4, 0.15% IGEPAL® CA-630, 75 mM NaCl) was used to separate the cytoplasmic fraction. The procedures
for isolating the nuclear fraction and chromosome-associated fraction were the same as those for ES cells.
Quantitative analysis of m5C and hm5C levels via UHPLC-MS/MS
[0384] 75 ng ribo-depleted RNA was digested by nuclease P1 (MilliporeSigma, N8630) in 20 μL buffer containing 20 mM ammonium acetate (NH4OAc) at pH 5.3 for 2 hours at 42 °C. Then, 1 unit of FastAP Thermosensitive Alkaline Phosphatase (Thermo Scientific, EF0651) was added to the reaction and FastAP buffer was added to a 1× final concentration before incubation for 2 hours at 37 °C. The samples were diluted, filtered (0.22 μm, Millipore) and injected into a C18 reverse-phase column coupled online to an Agilent 6460 LC-MS/MS spectrometer in positive electrospray ionization mode. The nucleosides were quantified using retention time and the nucleoside-to-base ion mass transitions (268 to 136 for A; 284 to 152 for G; 258 to 126 for m5C and 274 to 142 for hm5C). Quantification was performed by comparison with the standard curve obtained from pure nucleoside standards run in the same batch as the samples.
Chromatin-associated RNA sequencing
[0385] Chromatin-associated RNA sequencing for mESCs, K-562 and HSPCs was performed similarly. Ribosomal RNA was depleted from isolated chromatin-associated RNA with RiboMinus™ Eukaryote System v2 (Invitrogen, A15026) followed by size selection using the standard protocol of RNA Clean & Concentrator-5 (RCC-5, Zymo Research, R1013). RNA libraries were constructed with SMARTer® Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (TaKaRa Bio, 634411) according to the manufacturer’s instructions. Three replicates were performed for each condition. Libraries were sequenced on a NovaSeq 6000 sequencer.
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq)
[0386] ATAC-seq was performed using the ATAC-Seq Kit (Active Motif, 53150) according to the manufacturer's instructions. In brief, 50,000 to 100,000 cells were aliquoted for each replicate.
Cells were then permeabilized with buffer containing 0.1% Tween-20 and 0.01% Digitonin, both supplied with the kit. Accessible chromatin regions were tagged with pre-assembled Tn5 transposome. Tagged genomic DNA was extracted from cells and DNA libraries were obtained by PCR amplification. Pooled libraries were sequenced on a NovaSeq 6000 sequencer.
m5C methylated RNA immunoprecipitation with spike-in
[0387] 5-methylcytosine (m5C) modified or unmodified mRNA spike-ins were in vitro transcribed from firefly luciferase or Renilla luciferase coding sequences with mMESSAGE
mMACHINE™ T7 Transcription Kit (Invitrogen, AM1344) and manually reconstituted dNTP mixes with 2% m5CTP/CTP ratio. 5-methylcytidine-5-triphosphate was obtained from Trilink (#N-101405). The resulting RNA was purified using the standard protocol of RNA Clean & Concentrator-5 (Zymo Research, R1013). The spike-in RNA mixes were then applied to RNA before fragmentation.
[0388] Total RNAs from whole cells or the chromatin-associated fractions were randomly fragmented by incubation at 94 °C for 4 minutes using 1× fragmentation buffer (NEB, E6186A). Fragmentation was stopped by adding 1× Stop Solution. Spike-in RNAs were added to each sample. Four micrograms of anti-m5C antibody (Diagenode, MAb-081-100) were conjugated with 30 μL of protein G beads (Invitrogen, 1003D) in 300 μL IP buffer (10 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.05% Triton X-100 (v/v)) for two hours at 4 °C on a rotating wheel. The same procedure was performed for a control reaction using mouse IgG isotype control (Abcam, ab37355). Bead-antibody complexes were washed three times with IP buffer and finally brought to 250 μL with IP buffer. After heat denaturation and a quick chill on ice, 10-μg samples of RNA were added to the bead-antibody complexes and incubated with 1 μL SUPERase•In™ RNase Inhibitor (Invitrogen, AM2694) overnight at 4 °C on a rotating wheel. After several washes with IP buffer, RNA was incubated in 100 μL elution buffer (5 mM Tris-HCl pH 7.5, 1 mM EDTA, 0.05% SDS, and 200 μg Proteinase K (Invitrogen, 25530049)) for 1 hour at 50 °C. Beads were removed by centrifugation in a microcentrifuge, and the supernatant was purified with RCC-5 without size selection. Immunoprecipitated RNAs were eluted in water and then subjected to RT-qPCR analysis.
[0389] For next-generation sequencing, the immunoprecipitated RNAs were used as inputs for library construction with SMARTer® Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (TaKaRa Bio, 634411) according to the manufacturer’s instructions.
Libraries were sequenced on a NovaSeq 6000 sequencer.
Methyl-DNA immunoprecipitation (MeDIP)
[0390] Genomic DNA was extracted from cultured cells with Monarch® Genomic DNA Purification Kit (New England Biolabs, T3010S). Unmethylated lambda DNA (Promega, D1521) was spiked in at a 0.5% ratio for quality control of the immunoprecipitation. DNAs were then fragmented to 200–1000 base pairs (bp) with NEBNext® dsDNA Fragmentase® (New England Biolabs, M0348S) by a 22-minute incubation. The fragmented DNA was then denatured at 95 °C for 5 minutes and immediately cooled on ice for another 5 minutes. Input samples were removed and saved on ice for later use. The reaction was conducted in IP buffer (150 mM NaCl, 10 mM Tris-HCl, pH 7.5, 0.1% NP-40) at 4 °C overnight. Beads were then
washed three times with IP buffer, followed by three washes with high salt wash buffer (500 mM NaCl, 10 mM Tris-HCl, pH 7.5, 0.1% NP-40). Immunoprecipitated DNA was extracted by proteinase K digestion (Invitrogen, 25530049) prior to qPCR analysis.
RNA synthesis rate assay
[0391] RNA synthesis rate was measured with a procedure modified from the protocol of the Click-iT™ Nascent RNA Capture Kit, for gene expression analysis (Invitrogen, C10365). mESCs were seeded in 6-cm dishes at the same density in three replicates. After 42 hours, cells were treated with 1 mM 5-ethynyl uridine (5-EU) for 10 minutes, 20 minutes, and 40 minutes before RNA harvest with TRIzol™ Reagent (Invitrogen, 15596026). Ribosomal RNA was depleted from total RNA preps prior to the click reaction with biotin azide (PEG4 carboxamide-6-azidohexanyl biotin). Biotinylated RNA was enriched with Dynabeads™ MyOne™ Streptavidin T1 (Invitrogen, 65601). ERCC RNA Spike-In Mix (Invitrogen, 4456740) was added to the eluted RNA of each sample in an amount proportional to the total RNA of that sample before rRNA depletion. Spiked RNAs were used as input for RNA-seq library construction with SMARTer® Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (TaKaRa Bio, 634411) according to the manufacturer’s instructions. Libraries were sequenced on a NovaSeq 6000 sequencer.
Cleavage Under Targets and Tagmentation (CUT&Tag)
[0392] CUT&Tag was performed using the CUT&Tag-IT™ Assay Kit (Active Motif, 53160) following the manufacturer’s instructions. Briefly, 0.2 million cells were used as input for one replicate and washed with 1× Wash buffer. Washed cells were conjugated to concanavalin A beads and permeabilized with Digitonin-containing buffer before primary antibody (anti-H3K27me3, anti-H2AK119ub or normal rabbit IgG) incubation. DNA tagmentation with pre-assembled protein A-Tn5 transposome was performed after secondary antibody conjugation.
Tagged DNA was extracted by proteinase K digestion and amplified by PCR with indexed primers to yield DNA libraries. DNA libraries were subjected to qPCR analysis with gene-specific primers or high-throughput sequencing on a NovaSeq 6000 sequencer.
Chromatin immunoprecipitation
[0393] Chromatin immunoprecipitation (ChIP) was performed using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling Technology, 9003) following the manufacturer’s instructions. Briefly, 10 million mESCs per replicate were crosslinked with 1% paraformaldehyde for 10 minutes at room temperature. The cell pellet was scraped and snap-frozen in liquid nitrogen before being stored at -80 °C. Nuclei were prepared from the thawed cell pellet by hypotonic treatment before enzymatic digestion by MNase. After condition optimization to obtain chromatin fragments of 100–900 base pairs (validated by agarose gel electrophoresis),
immunoprecipitation reactions were set up and recovered DNAs were de-crosslinked with NaCl. Immunoprecipitated DNAs were then used as inputs for qPCR or DNA library construction using the NEBNext® Ultra™ II DNA Library Prep with Sample Purification Beads kit (NEB, E7103S). ChIP-seq libraries were sequenced on a NovaSeq 6000 sequencer.
Construction of induced tethering mESC cell lines
[0394] Cell lines stably expressing a fusion of dCas13b protein with the catalytic domain of mouse TET2 (TET2CD) or catalytically dead mutants were constructed first from WT mESCs. The coding sequence of dCas13b was cloned from plasmid pCMV-dCas13-M3nls, which was a gift from David Liu (Addgene plasmid #155366; http://n2t.net/addgene:155366; RRID:Addgene_155366). The coding sequence of TET2CD was cloned from plasmid pcDNA3-FLAG-mTET2 (CD), which was a gift from Yue Xiong (Addgene plasmid #89736; http://n2t.net/addgene:89736; RRID:Addgene_89736), and the catalytically dead mutant was cloned from plasmid pcDNA3-Flag-Tet2 CD Mut, which was a gift from Yi Zhang (Addgene plasmid #72220; http://n2t.net/addgene:72220; RRID:Addgene_72220). The coding sequences of TET2CD (or mutant) and dCas13b were fused, separated by an EASLSRPDPALAALGGGGSGGGGSGGGGS (SEQ ID NO: 33) linker. The fusion protein was delivered to mESCs with a lentiviral system and selected with Hygromycin B (Gibco, 10687010). Sequences expressing guide RNA for dCas13b were cloned into a plasmid carrying a Tet operator-controlled H1 promoter (H1-2O2) (53). This tet-pLKO-sgRNA-puro plasmid was a gift from Nathanael Gray (Addgene plasmid #104321; http://n2t.net/addgene:104321; RRID:Addgene_104321). The guide-RNA expression plasmid was delivered into TET2CD fusion protein-expressing mESCs by lentivirus. The resulting cell lines were selected with puromycin (Gibco, A1113803).
Antisense Oligonucleotide (ASO) and plasmid transfection in LK cells
[0395] The steric-blocking ASOs (Integrated DNA Technologies) targeting the hyper-methylated motifs were fully modified with 2'-O-methoxyethyl (2'MOE) bases and phosphorothioate bonds, and carried a Cy5 fluorescent dye at the 3' end to monitor transfection efficiency. The NC5 ASO, which does not target the human or mouse genome, was used as a negative control.
[0396] IAPEz-int 2'MOE: AGTTGAATCCTTCTTAACAGTCTGCTTTACGGGAAC (SEQ ID NO: 89),
[0397] Specifically, SEQ ID NO: 89 modified as follows: /52MOErA/*/i2MOErG/*/i2MOErT/*/i2MOErT/*/i2MOErG/*/i2MOErA/*/i2MOErA/*/i2MOErT/*/i2MOErC/*/i2MOErC/*/i2MOErT/*/i2MOErT/*/i2MOErC/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErA/*/i2MOErC/*/i2MOErA/*/i2MOErG/*/i2MOErT/*/i2MOErC/*/i2MOErT/*/i2MOErG/*/i2MOErC/*/i2MOErT/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErC/*/i2MOErG/*/i2MOErG/*/i2MOErG/*/i2MOErA/*/i2MOErA/*/i2MOErC//3Cy5Sp/.
[0398] MERVL 2'MOE: ACCATTACTGGGTATGTTAT (SEQ ID NO: 88),
[0399] Specifically, SEQ ID NO: 88 modified as follows: /52MOErA/*/i2MOErC/*/i2MOErC/*/i2MOErA/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErC/*/i2MOErT/*/i2MOErG/*/i2MOErG/*/i2MOErG/*/i2MOErT/*/i2MOErA/*/i2MOErT/*/i2MOErG/*/i2MOErT/*/i2MOErT/*/i2MOErA/*/i2MOErT//3Cy5Sp/.
[0400] NC5 2'MOE: GCGACTATACGCGCAATATG (SEQ ID NO: 90),
[0401] Specifically, SEQ ID NO: 90 modified as follows: /52MOErG/*/i2MOErC/*/i2MOErG/*/i2MOErA/*/i2MOErC/*/i2MOErT/*/i2MOErA/*/i2MOErT/*/i2MOErA/*/i2MOErC/*/i2MOErG/*/i2MOErC/*/i2MOErG/*/i2MOErC/*/i2MOErA/*/i2MOErA/*/i2MOErT/*/i2MOErA/*/i2MOErT/*/i2MOErG//3Cy5Sp/.
[0402] The crRNA targeting the primary m5C sites on the IAPEz sequence, based on the inventors' RNA bisulfite sequencing results, was custom-synthesized and cloned into the pLentiRNAGuide_002-hU6-RfxCas13d-DR-BsmBI-EFS-EGFP:P2A:Puro-WPRE vector. The catalytic domain of mouse TET2 (mTET2CD) or a catalytically dead mutant TET2H1304Y/D1306A (mTET2CDHxDCD) was cloned into the pLV[Exp]-[EF-1sc>[NLS-RfxCas13d]:[Linker]:P2A:mCherry(ns):T2A:Bsd vector. All these plasmids were synthesized, constructed and confirmed by VectorBuilder Inc.
[0403] All the ASOs and plasmids were transfected into LK cells by electroporation with the P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza Bioscience, Cat#V4XP-3032) using program CV-137.
Antisense oligo (ASO) transfections
[0404] The inventors designed antisense oligos targeting the primary m5C sites on IAPEz or MERVL sequences based on their RNA m5C sequencing results. ASO transfections in mESCs were performed using Lipofectamine™ RNAiMAX Transfection Reagent (Invitrogen, 13778075) according to the manufacturer’s instructions.
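The modification strings listed for SEQ ID NOs: 88-90 follow a regular pattern: a 5′ 2'MOE unit, internal 2'MOE units, a `*` (phosphorothioate) between adjacent units, and a 3′ Cy5 tag appended directly after the last unit. As a rough sketch of that pattern, the Python helper below rebuilds such a string from a plain base sequence. The function name `encode_2moe_aso` and its default argument are illustrative assumptions, not part of any vendor tool, and the output should be checked against the supplier's current ordering syntax before use.

```python
def encode_2moe_aso(sequence: str, three_prime_mod: str = "/3Cy5Sp/") -> str:
    """Build a fully 2'MOE, phosphorothioate-linked oligo string.

    Hypothetical helper that mirrors the pattern of the modification
    strings listed for SEQ ID NOs: 88-90.
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string of A, C, G, T")
    # The first base uses the 5' code; every later base uses the internal code.
    units = [f"/52MOEr{seq[0]}/"] + [f"/i2MOEr{base}/" for base in seq[1:]]
    # '*' marks a phosphorothioate bond between adjacent units; the 3' tag
    # follows the last unit without a '*'.
    return "*".join(units) + three_prime_mod


# MERVL ASO base sequence (SEQ ID NO: 88) taken from the text above.
mervl_encoded = encode_2moe_aso("ACCATTACTGGGTATGTTAT")
```

Generating the string from the base sequence avoids transcription slips in long hand-typed modification strings, and a 20-nt oligo should always yield 19 `*` separators.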
Crosslinking and Immunoprecipitation (CLIP)
[0405] Cultured mESCs or human leukemia cells (K-562, SKM-1, WT and TET2-/- THP-1) were UV crosslinked twice at 254 nm with a Stratalinker (Stratagene) to achieve a 4,500 J/m² UV flux and flash-frozen in liquid nitrogen. Pellets were thawed on ice and resuspended in 3 volumes of ice-cold CLIP lysis buffer (50 mM HEPES pH 7.5, 150 mM KCl, 2 mM EDTA, 0.5% (v/v) NP-40, 0.5 mM DTT, 1 × Halt™ Protease and Phosphatase Inhibitor
Cocktail (Thermo Scientific, 78442), 1 × RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen, 10777019)). Pellets were lysed by rotating at 4 °C for 15 minutes after passing through a 26 G needle (BD Biosciences). Cell suspensions were sonicated on a Bioruptor (Diagenode) with 30 s on/30 s off for 5 cycles. Lysates were cleared by centrifugation at 21,000 g for 15 minutes at 4 °C on a benchtop centrifuge. Supernatants were applied to Flag antibody (Abcam, ab205606) conjugated protein A beads (Invitrogen, 1001D) and left overnight at 4 °C on an end-to-end rotor. Beads were washed extensively 5 times with 1 mL wash buffer (50 mM HEPES pH 7.5, 300 mM KCl, 0.05% (v/v) NP-40, 1 × Halt™ Protease and Phosphatase Inhibitor Cocktail, 1 × RNaseOUT Recombinant Ribonuclease Inhibitor) at 4 °C. Protein-RNA complexes conjugated to the beads were treated with 8 U/μL RNase T1 (Thermo Scientific, EN0541) at 22 °C for 10 minutes with shaking. Input samples were digested in parallel. Input and IP samples were then separated on an SDS-PAGE gel, and gel slices at the corresponding size ranges were treated with proteinase K (Invitrogen, 25530049) for elution. RNA was recovered with TRIzol reagent (Invitrogen, 15596026). T4 PNK (Thermo Scientific, EK0031) end repair was then performed on the purified RNA before library construction with NEBNext® Small RNA Library Prep Set for Illumina® (NEB, E7330S). Libraries were pooled and sequenced on a NovaSeq 6000 sequencer.
Electrophoretic mobility shift assay (EMSA)
[0406] Recombinant MBD6MBD-MBP-His protein was purified from E. coli BL21 (DE3). Different concentrations of protein were mixed with 100 nM FAM-labeled oligo probes in 1 × binding buffer (20 mM HEPES pH 7.5, 40 mM KCl, 10 mM MgCl2, 0.1% Triton X-100, 10% glycerol and 1 × RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen, 10777019)). The probe-protein mixture was incubated on ice for 30 minutes. The mixtures were loaded onto a 10% Novex™ TBE Gel (Invitrogen, EC62755BOX).
After running at 4 °C in 0.5 × TBE for 2 hours, the gel was washed twice in 0.5 × TBE for five minutes. The washed gel was imaged with a GelDoc imaging system (Bio-Rad) with channel “FAM”. Individual KD values were determined from the regression equation Y = Max × [P]/(KD + [P]). Y is the fraction of probe bound at each protein concentration. Fraction bound is determined from the background-subtracted signal intensities using the expression: bound/(bound + unbound). [P] is the protein concentration in each sample. Max is the band intensity of unbound probe with protein concentration 0.
Quantitative analysis of RNA modification levels of CLIP RNA
[0407] Cultured mESCs were washed twice with DPBS before UV crosslinking at 254 nm with a Stratalinker (Stratagene) and flash-frozen in liquid nitrogen. Pellets were thawed on ice
and resuspended in 3 volumes of ice-cold CLIP lysis buffer (50 mM HEPES pH 7.5, 150 mM KCl, 2 mM EDTA, 0.5% (v/v) NP-40, 0.5 mM DTT, 1 × Halt™ Protease and Phosphatase Inhibitor Cocktail (Thermo Scientific, 78442), 1 × RNaseOUT Recombinant Ribonuclease Inhibitor (Invitrogen, 10777019)). Pellets were lysed by rotating at 4 °C for 15 minutes after passing through a 26 G needle (BD Biosciences). Cell suspensions were sonicated on a Bioruptor (Diagenode) with 30 s on/30 s off for 5 cycles. Lysates were cleared by centrifugation at 21,000 g for 15 minutes at 4 °C on a benchtop centrifuge. Supernatants were applied to Flag antibody (Abcam, ab205606) conjugated protein A beads (Invitrogen, 1001D) and left overnight at 4 °C on an end-to-end rotor. Beads were washed extensively 5 times with 1 mL wash buffer (50 mM HEPES pH 7.5, 300 mM KCl, 0.05% (v/v) NP-40, 1 × Halt™ Protease and Phosphatase Inhibitor Cocktail, 1 × RNaseOUT Recombinant Ribonuclease Inhibitor) at 4 °C. Input and IP samples were then treated with proteinase K (Invitrogen, 25530049) to release crosslinked RNA. RNA was recovered with TRIzol reagent (Invitrogen, 15596026). Ribosomal RNA was then removed using RiboMinus™ Eukaryote System v2 (Invitrogen, A15026), with purification and size selection using RNA Clean & Concentrator-5 (Zymo Research, R1013). Recovered RNAs were subjected to digestion and mass spectrometry analysis.
Biotinylation of immunoprecipitated RNAs
[0408] Biotin labeling of immunoprecipitated RNA was performed according to a published protocol (https://www.protocols.io/view/biotin-labelling-of-immunoprecipitated-rna-v1pre-kqdg354kpv25/) without any modifications.
Fluorescence microscopy
[0409] For immunolabeling, cells were fixed with 4% PFA in DPBS at 37 °C for five minutes, permeabilized with MeOH at -20 °C for eight minutes, dried at room temperature for ten minutes, then washed three times with DPBS at room temperature.
Chambers were blocked in blocking buffer (DPBS, 0.5% BSA, 0.05% Triton X-100, 1:100 SUPERase•In™ (Invitrogen, AM2694)) for one hour at room temperature, and primary antibodies were diluted in blocking solution at the dilution suggested by the manufacturer and incubated at room temperature for one hour. Chambers were washed 3 times with 0.05% Triton X-100 in DPBS, then 1:1000 diluted goat anti-rabbit IgG-AF568 conjugate (Invitrogen, A-11011) in blocking solution was added to each well and chambers were incubated at room temperature for one hour. Chambers were then washed three times with 0.05% Triton X-100 in DPBS, fixed with 4% PFA in DPBS for thirty minutes at room temperature and washed three times with DPBS. Nuclei were counterstained with 2 µg/mL Hoechst 33342 (Abcam, ab145597) in DPBS at room temperature for 20 minutes and washed 3 times with DPBS. Chambers were
stored at 4 °C before proceeding to imaging on a Leica SP8 laser scanning confocal microscope at University of Chicago. Lifetime profiling [0410] Transcription inhibitor actinomycin D (Act D, Abcam ab141058) was applied to a final concentration of 2.5 μM in mESC medium to cultured mESCs. Act D treatment started at 48 hours post siRNA transfection (if any). RNAs were extracted from cells at different time points after Act D treatment (10 minutes, 3 hours, and 6 hours). Custom spike-in RNA (in vitro transcribed from firefly luciferase coding sequence) was added proportional to the yield of total RNA for different samples for RNA quantifications. RNA abundance was normalized to the value at 10 minutes for each condition. DNA-seq data analysis [0411] Raw reads were trimmed with Trimmomatic (version 0.39) (54) and then mapped to mouse genome (mm10) or human genome (hg38) using bowtie2 (version 2.4.1) (55) with default mode, where multiple alignments are searched and the best one is reported. [0412] For ATAC-seq, reads mapped to the mitochondrial genome were discarded and then the remaining reads were deduplicated using a Picard tool ‘MarkDuplicates’ (version 2.26.2) (http://broadinstitute.github.io/picard/). [0413] For CUT&Tag-seq, peaks were called using HOMER (version 4.9) (56) in histone mode with reads deduplicated by using ‘-tbp 1’ parameter. Peaks identified in at least two biological replicates were retained for subsequent downstream analysis. Nascent RNA-seq data analysis [0414] Raw reads were trimmed with Trimmomatic (version 0.39) (54), and then aligned to mouse genome and transcriptome (mm10, version M19) as well as external RNA Control Consortium (ERCC) RNA spike-in control (Thermo Fisher Scientific) using HISAT2 (version 2.2.1) (57). Annotation files (version M19 for mouse) were obtained from GENCODE database (https://www.gencodegenes.org/) (58). 
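The normalization described above for lifetime profiling (spike-in correction, then scaling to the 10-minute value) can be sketched as follows. This is a minimal illustration, not the inventors' analysis code: the function names are hypothetical, and the half-life estimate assumes simple first-order decay, which the text does not state.

```python
import math

def relative_abundance(target, spike_in):
    """Normalize target RNA signal to the spike-in, then to the earliest time point.

    target, spike_in: dicts mapping time after Act D (hours) to measured signal.
    Returns a dict of abundance relative to the earliest time point (10 min here).
    """
    ratios = {t: target[t] / spike_in[t] for t in target}
    t0 = min(ratios)
    return {t: r / ratios[t0] for t, r in ratios.items()}

def half_life_hours(rel):
    """Estimate half-life from relative abundances.

    ASSUMPTION (not from the source): first-order decay, so
    ln(rel) = -k * (t - t0); k is fitted by least squares through the origin.
    """
    t0 = min(rel)
    xs = [t - t0 for t in rel]
    ys = [math.log(v) for v in rel.values()]
    k = -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    if k <= 0:
        raise ValueError("no measurable decay over the time course")
    return math.log(2) / k
```

With time points at 10 minutes, 3 hours, and 6 hours, the keys would be `1/6`, `3.0`, and `6.0`.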
Reads on each GENCODE-annotated gene were counted using HTSeq (version 0.12.4) (59) and then normalized to counts per million (CPM) using the edgeR package in R (60). CPM was converted to attomoles by linear fitting of the ERCC RNA spike-in. RNA level was fitted against EU labeling time using a linear model, and the slope was taken as the estimated transcription rate of the RNA. CLIP-seq data analysis [0415] Low-quality reads were filtered using ‘fastq_quality_filter’, adapters were clipped using ‘fastx_clipper’, adapter-free reads were then collapsed to remove PCR duplicates using ‘fastx_collapser’, and finally reads longer than 15 nt were retained for further analysis
(http://hannonlab.cshl.edu/fastx_toolkit/). Reads from rRNA were removed. The preprocessed reads were mapped to mouse genome (mm10) using bowtie (version 1.0.0) (61) with ‘-v 3 -m 10 -k 1 --best --strata’ parameters. Mapped reads were separated by strand with samtools (version 1.16.1) (62), and peaks on each strand were called separately using MACS2 (version 2) (63) with parameters ‘--nomodel --keep-dup 5 -g 1.3e8 --extsize 150’. Significant peaks with q < 0.01 identified by MACS2 were considered. Peaks identified in at least two biological replicates were merged using bedtools (version 2.31.0) (62) and were used in the following analyses. RNA-seq data analysis [0416] Raw reads were trimmed with Trimmomatic (version 0.39) (54), then aligned to mouse (mm10) or human (hg38) genome and their corresponding transcriptome using HISAT2 (version 2.2.1) (57). Annotation files (version M19 for mouse, and version v29 for human in gtf format) were obtained from GENCODE database (https://www.gencodegenes.org/) (58). Reads were counted for each GENCODE-annotated gene using HTSeq (version 0.12.4) (59) and then differentially expressed genes were called using the DESeq2 package in R (64) with P-value < 0.05. m5C MeRIP-seq data analysis [0417] Raw reads were trimmed with Trimmomatic (version 0.39) (54), then aligned to mouse (mm10) or human (hg38) genome and transcriptome using HISAT2 (version 2.1.0) (57). Annotation files (version M19 for mouse, and version v29 for human in gtf format) were downloaded from GENCODE database (https://www.gencodegenes.org/) (58). Mapped reads were separated by strand with samtools (version 1.16.1) (62), and m5C peaks on each strand were called separately using MACS2 (version 2) (63) with parameters ‘--nomodel --keep-dup all -g 7e8 --extsize 150’. In this regard, the genome size was estimated based on read coverage obtained from input samples. Significant peaks with q < 0.01 identified by MACS2 were considered. 
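The two peak filters used in the CLIP-seq and MeRIP-seq analyses (q < 0.01, then support from at least two biological replicates before merging) can be illustrated with a short sketch. This is not the pipeline used in the study, which relied on MACS2 and bedtools; the function names are hypothetical, and the replicate rule shown here (a merged region must contain peaks from at least two replicates) is one reasonable reading of the text. The only format fact relied on is that column 9 of a MACS2 narrowPeak file holds -log10(q).

```python
import math

def parse_narrowpeak(lines, q_cutoff=0.01):
    """Return (chrom, start, end) for peaks with q < q_cutoff.

    narrowPeak column 9 stores -log10(q-value).
    """
    threshold = -math.log10(q_cutoff)
    peaks = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or header lines
        if float(fields[8]) > threshold:
            peaks.append((fields[0], int(fields[1]), int(fields[2])))
    return peaks

def reproducible_peaks(replicates, min_reps=2):
    """Merge overlapping peaks across replicates and keep merged regions
    that contain peaks from at least min_reps distinct replicates.

    replicates: list of peak lists, one per biological replicate.
    """
    tagged = sorted(
        (chrom, start, end, rep)
        for rep, peaks in enumerate(replicates)
        for chrom, start, end in peaks
    )
    merged, current = [], None  # current = [chrom, start, end, {replicate ids}]
    for chrom, start, end, rep in tagged:
        if current and chrom == current[0] and start <= current[2]:
            current[2] = max(current[2], end)
            current[3].add(rep)
        else:
            if current and len(current[3]) >= min_reps:
                merged.append((current[0], current[1], current[2]))
            current = [chrom, start, end, {rep}]
    if current and len(current[3]) >= min_reps:
        merged.append((current[0], current[1], current[2]))
    return merged
```

In a strand-specific analysis such as the one described, the filter would be applied to each strand separately.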
Peaks identified in at least two biological replicates were merged using bedtools (version 2.31.0) (20) and were used in the following analysis. Antibodies [0418] The antibodies used in this study are summarized below: rabbit polyclonal anti- H3K27ac antibody (Abcam, ab4729); mouse monoclonal anti-H3K27me3 antibody (Abcam, ab6002); rabbit polyclonal anti-H3K4me1 antibody (Abcam, ab8895); rabbit polyclonal anti- H3K4me3 antibody (Abcam, ab8580); rabbit polyclonal anti-H3K79me2 antibody (Abcam, ab3564); rabbit polyclonal anti-H3K9me3 antibody (Abcam, ab8898); rabbit monoclonal anti- H2AK119ub antibody (Cell Signaling Technology, 8240S); rabbit monoclonal anti-H3
antibody (Cell Signaling Technology, 4499S); mouse monoclonal anti-TET2 antibody (MilliporeSigma, MABE462); mouse monoclonal anti-PSPC1 antibody (Santa Cruz, sc-374181); rabbit monoclonal anti-GAPDH antibody, HRP conjugate (Cell Signaling Technology, 8884S); rabbit monoclonal anti-DDDDK tag antibody (Abcam, ab205606); rabbit polyclonal anti-SNRP70/U1-70K antibody (Abcam, ab83306); rabbit polyclonal anti-tubulin antibody (Cell Signaling Technology, 2144S); mouse monoclonal anti-5-methylcytosine antibody (Diagenode, C15200081-100); rabbit monoclonal anti-H3K27me3 antibody (Cell Signaling Technology, 9733S, only for CUT&Tag experiments); rabbit polyclonal anti-NSUN2 antibody (Proteintech, 20854-1-AP). Goat anti-rabbit IgG, HRP-conjugated antibody (Cell Signaling Technology, 7074S) and horse anti-mouse IgG, HRP-conjugated antibody (Cell Signaling Technology, 7076S) were used as secondary antibodies. Mouse IgG-Isotype Control (Abcam, ab37355) and rabbit IgG-Isotype Control (Abcam, ab37415) were used as normal IgG controls. The following fluorophore-conjugated antibodies were also used: PerCP-Cy™5.5 mouse lineage antibody cocktail (BD Biosciences, 561317); PE Rat anti-mouse CD117 (BD Biosciences, 553869); Brilliant Violet 421™ (BV421) anti-mouse/human CD11b (Mac-1) (BioLegend, 101236); PE-Cy™7 Rat anti-mouse Ly-6G and Ly-6C (Gr-1) antibodies (BD Biosciences, 552985); PE Mouse anti-human CD33 (BD Biosciences, 561816); and PE-Cy™7 Rat anti-mouse CD45 (BD Biosciences, 552848). All antibodies were applied at the dilution suggested by the manufacturer for the specific use unless otherwise specified in the methods section. Select Sequences Table 3 – Polynucleotide and Amino Acid sequences
Example 2 – Tet2 KO mESCs exhibited more open chromatin and active global transcription compared with WT mESCs. [0419] TET methylcytosine dioxygenases (TET1, TET2, and TET3) mediate oxidation of DNA 5-methylcytosine (5mC) in mammals. This pathway has been shown to impact gene expression regulation in a wide range of different biological systems (6-11). Among the three TET enzymes, TET2 is unique in that this gene distinctly exhibits high mutation ratios (10%– 40%) in myeloid malignancies (see e.g., FIG. 6A), with frequent isocitrate dehydrogenase (IDH) mutations observed in human cancers also thought to mainly act through TET2 inhibition (12-14). Intriguingly, TET2 deficiency can lead to global genomic DNA hypomethylation (15), which is inconsistent with a DNA demethylation function. Structurally, TET2 is also unique among TET enzymes in that it is not covalently linked to the zinc finger CXXC domain proteins CXXC4 or CXXC5 (16-18); the non-covalent binding between TET2 and CXXC4/5 is critical to DNA binding by TET2 (19) (see e.g., FIG. 6B). In mouse embryonic stem cells (mESCs), it has been previously shown that TET2 predominantly binds PSPC1, an RNA-binding protein, and exhibits an interesting function of RNA 5- methylcytosine (m5C) oxidation (20). Other studies have also indicated RNA m5C oxidation by TET2 or drosophila TET homologue (25-29). [0420] The inventors, and others, have recently reported chromatin regulation through reversible N6-methyladenosine (m6A) modification on chromatin-associated RNA (caRNA) (25-29) (see e.g., Jun Liu et al., N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science, 2020; which is incorporated herein by reference in its entirety for the purposes described herein). These advances, together with previous discoveries of TET2 oxidation of RNA m5C, prompted the inventors to explore potential chromatin regulation through TET2-mediated caRNA m5C oxidation. 
[0421] The inventors initiated studies on wild-type (WT) and Tet2 knockout (KO) mESCs, as TET2 was previously shown to bind Paraspeckle Component 1 (PSPC1) and mediate RNA m5C oxidation in mESCs (20). Additionally, extensive genomic data on the TET proteins have been reported in this cell model (15). The inventors found that Tet2 KO mESCs exhibited more
open chromatin and active global transcription when compared to WT mESCs, as revealed by assay of transposase-accessible chromatin with visualization (ATAC-see) (FIG.1A) (30) and metabolic labeling with 5-ethynyluridine (EU) followed by imaging or sequencing (FIG.1B and FIG.6C). The more open chromatin state agreed well with the previously reported global DNA hypomethylation induced by TET2 deficiency (15). However, as noted above, this finding appears to be inconsistent with TET2’s predominant DNA 5mC oxidation function, which should lead to global DNA hypermethylation with TET2 deficiency. [0422] The inventors then examined Pspc1 KO mESCs, in which RNA targeting of TET2 via PSPC1 should be impaired. PSPC1 depletion phenocopied Tet2 KO mESCs, showing the same chromatin openness and global transcription upregulation when compared with WT mESCs (FIG.1C, left panel). This effect was dependent on RNA binding by PSPC1, as RNA binding-null PSPC1 mutants failed to rescue the chromatin change caused by PSPC1 depletion (FIG.1C, right panel). Remarkably, when Pspc1 was knocked out in mESCs, the upregulation of caRNAs (FIG.6D), rather than messenger RNAs (mRNAs), exhibited a stronger correlation with the changes observed upon Tet2 KO (FIG.1D and FIG.6E). Gene activation caused by TET2 depletion correlated best with PSPC1 chromatin localization in mESCs, but not that of CXXC4/5 (FIG.1E). While TET2 can act on either DNA or RNA when binding with different partner proteins, surprisingly, the TET2-mediated gene repression changes appeared to be associated with TET2 RNA targeting activity. Example 3 – TET2 catalytically mediated oxidation of m5C on RNAs (e.g., chromatin associated RNAs) and regulated chromatin state [0423] The inventors next investigated whether m5C on chromatin-associated RNA was a substrate of TET2 in mESCs. 
By employing ultra-high-performance liquid chromatography- tandem mass spectrometry (UHPLC-MS/MS) measurements, ribosomal RNA (rRNA)- depleted caRNA were found to contain m5C (FIG.2A). Importantly, Tet2 KO led to a notable increase of the caRNA m5C level, accompanied with a decreased level of the oxidation product 5-hydroxymethylcytosine (hm5C) when compared with those from WT mESCs (FIG.2A). [0424] Tet2 KO also resulted in a widespread increase in chromatin-associated regulatory RNA (carRNA) abundance, encompassing enhancer RNAs, promoter-associated RNAs, and repeat RNAs (FIG. 7A). By performing m5C MeRIP enrichment followed by sequencing, it was found that when these carRNAs were marked with m5C methylation, their abundance showed even greater increases upon either Tet2 or Pspc1 KO (FIG. 7B). Interestingly, approximately half of the m5C-marked peaks were located in repeat RNA (FIG. 2B).
Moreover, these m5C-marked repeat RNAs were associated with increased local chromatin accessibility, as quantified by ATAC-seq (FIG.2C and FIG.7C). [0425] Among the repeat RNAs, both L1 in LINE and ERVK in the LTR family exhibited a significant enrichment of m5C methylation (FIG. 2D). However, ERVK and L1 responded differently to Tet2 or Pspc1 knockout in terms of local chromatin accessibility changes (FIG. 8). L1MdA_I/II and IAPEz-int/RLTR10 were closely examined as respective representatives, as they were the top ranked subfamilies in the m5C MeRIP enrichment dataset. IAP displayed an increase in local chromatin accessibility upon either Tet2 or Pspc1 KO (FIG. 8); the increased m5C on IAP RNAs was confirmed by using m5C methylated RNA immunoprecipitation followed by quantitative reverse transcription qPCR (m5C-MeRIP- qPCR) (FIG. 2E). Moreover, m5C-marked IAP displayed a greater increase in carRNA abundance when compared to unmethylated repeats upon Tet2 KO (FIG.2F). Therefore, the TET2-mediated oxidation of m5C on IAP RNAs in mESCs led to their reduced abundances and decreased local chromatin accessibility. On the other hand, the L1-associated chromatin accessibility increased only upon Tet2 KO but not PSPC1 depletion (FIG.8), suggesting that L1-associated chromatin changes depended on TET2 in a PSPC1-independent manner. This conclusion was confirmed by m5C-MeRIP-qPCR of L1 RNAs compared with that on IAP RNAs using Tet2 or Pspc1 KO cells (FIG.2E). [0426] Bolstered by these findings, the inventors further examined potential writer proteins that may install chromatin-associated RNA m5C. NOP2/Sun RNA methyltransferase 2 (NSUN2) and DNA methyltransferase 2 (TRDMT1) were promising candidates as both were known to localize in the cell nucleus and mediate RNA m5C methylation (31, 32). While differences upon TRDMT1 depletion were minimal, NSUN2 depletion using siRNA resulted in an approximately 70% decrease in caRNA (rRNA depleted) m5C abundance (FIG. 9A). 
Therefore, NSUN2 appeared to be the main caRNA m5C writer protein in mESCs. Additionally, transcriptome-wide alterations caused by Nsun2 depletion exhibited patterns that contrasted with the gene expression changes caused by Tet2 or Pspc1 depletion in mESCs (FIG.9B). [0427] Biochemically, it has been shown by several groups that purified TET2 protein can mediate oxidation of RNA m5C to hm5C (22, 33). Consistent with these findings, the UHPLC- MS/MS measurements reported here from TET2 depleted cells also indicated oxidation of m5C on chromatin-associated RNA by TET2 in mESCs (FIG. 2A), with LTR RNAs as a notable example (FIG.2D). Mouse IAP RNAs were investigated further as these RNAs had the highest levels of m5C methylation among all LTR RNAs in mESCs (FIG.2D).
[0428] Gene transcription rates were analyzed by enriching and sequencing EU-labeled nascent transcripts with and without TET2 depletion. TET2 depletion led to increased chromatin-associated IAP RNA m5C level (FIG. 2E), accompanied by increased local chromatin accessibility (FIG.8) and elevated levels of its target RNAs (FIG.2F), suggesting a correlation between the more open chromatin caused by TET2 deficiency and the accumulation of m5C on the target repeat RNAs. [0429] Next, IAP methylation (or its potential oxidation by TET2) was blocked with a designed anti-sense oligo (ASO) that annealed with the main IAP m5C peak in mESCs, located near the 5′-end of the RNA molecule (FIG. 8). IAP ASO blocking of m5C installation was validated with qPCR analysis using primers targeting individual regions (FIG.10A), and a slight decrease in the IAP RNA level was observed (FIG. 10B). The inventors also observed corresponding closed local chromatin at IAP loci (FIG.10C). These results indicated that the m5C methylation on chromatin-associated IAP affected local chromatin state and gene expression. [0430] Third, to establish the key causal relationship, loci-specific RNA targeting systems were generated by fusing dCas13b (SEQ ID NO: 25) (34) with the TET2 catalytic domain (TET2CD) (SEQ ID NO: 21). Guide RNAs targeting the primary m5C site proximal to the 5′-end of IAP RNAs (FIG.8) were generated (SEQ ID NO: 91). The dCas13b-TET2(CD) fusion protein (SEQ ID NO: 27) was stably expressed in mESCs with the guide RNA under the control of a doxycycline (DOX)-responsive Tet operator (35) (FIG. 10D). Acute expression of the guide RNA resulted in rapid TET2CD fusion protein recruitment and reduction of RNA m5C methylation on IAP transcripts within four hours. This was followed by increased DNA 5mC methylation at the IAP loci (FIG. 2G) and downregulation of nearby gene transcription, consistent with the gene suppression role of m5C oxidation by TET2. 
In sharp contrast, tethering of a TET2 catalytic-dead mutant (36) did not alter DNA or RNA methylation levels, demonstrating that this effect was dependent on TET2 oxidation activities, but not on TET2’s protein scaffolding effects (37, 38). These observations confirmed the direct effect of IAP RNA m5C oxidation by TET2 on transcriptional regulation. The increased DNA 5mC methylation caused by TET2 targeting to RNA was inconsistent with a main role of TET2, that of DNA 5mC oxidation, but agreed with the widespread DNA hypomethylation following TET2 inactivation frequently observed in embryonic stem cells, hematopoietic stem cells and cancer cells (5). Interestingly, these results demonstrated that the global chromatin and transcriptional regulation effects of TET2 were most likely mediated through RNA m5C oxidation.
[0431] Lastly, to further confirm the enzymatic activity-dependent regulation, the inventors employed a small molecule inhibitor developed previously against all TET enzymes (39). Treatment with this inhibitor resulted in significant caRNA m5C increases at an early stage (FIG.10E, bottom, significant starting from eight hours), followed by significant gDNA 5mC increases at a later stage (FIG.10E, top, significant starting from 36 hours). Example 4 – TET2 can be recruited to target either DNA 5mC or RNA m5C when engaging different nucleic acid binding protein partners [0432] To facilitate differentiation of the effects of TET2 on DNA versus RNA, the inventors further analyzed DNA 5mC changes caused by Tet2 KO at different genomic regions, including enhancers, promoters, and repeat RNAs. A negative correlation between changes in enhancer transcription and DNA methylation resulting from Tet2 KO in mESCs was observed, which contrasted with the pattern observed for repeat RNAs (FIG. 11A). Furthermore, these DNA hypermethylated regions resulting from Tet2 knockout were enriched at enhancer regions while being depleted in repeats (FIG.3A). Correspondingly, chromatin was closed near those regions upon Tet2 KO (FIGs.3B-3C), suggesting that DNA 5mC oxidation by TET2 in these enhancer regions was leading to local transcription activation. [0433] Aside from the enhancer regions that tended to be DNA hypermethylated, PSPC1-bound regions exhibited notable enrichment at DNA hypomethylated regions upon TET2 depletion (FIG. 3D and FIG. 11B). Additionally, these DNA hypomethylated regions were more accessible upon Tet2 or Pspc1 KO in mESCs (FIGs. 3E-3F), contrasting with the behavior of DNA hypermethylated regions. These observations suggested that the elevated chromatin accessibility resulting from TET2 depletion could not be attributed to the 5mC oxidation activity of TET2 in DNA, but rather was attributable to TET2’s oxidation activity on RNA m5C. 
Additionally and consistent with the above findings, m5C methylated sites were predominantly enriched at the PSPC1 chromatin-binding regions, as opposed to being enriched at CXXC5-bound regions (FIG. 3G). Therefore, the inventors concluded that TET2 could mediate either DNA or RNA 5-methylcytosine oxidation. When bound by PSPC1, TET2 was directed to repeat RNAs (e.g., LTR RNAs in particular) and mediated RNA m5C oxidation. It was the RNA m5C oxidation that dominated chromatin and transcriptional regulation in the mESC system. However, it is possible that TET2 oxidation of DNA 5mC could dominate chromatin regulation in other systems.
Example 5 – MBD5 and MBD6 were RNA-binding proteins that preferentially recognize and bind RNA m5C; furthermore, MBD6 binds RNA m5C to recruit PR-DUB. [0434] After establishing the role of chromatin-associated RNA m5C oxidation by TET2 in local chromatin state regulation, the inventors computationally analyzed histone modifications that correlated best with the activities of TET2 on RNAs. It was found that H2AK119ub, a well-known repressive chromatin mark installed by polycomb repressive complex 1 (PRC1) (40), ranked as the most significantly correlated mark (FIG. 4A and FIG.12A). H2AK119ub changes were profiled with and without TET2 depletion, and a decrease in H2AK119ub at the IAP loci was observed when TET2 was depleted (FIG.12B). Consistent with these observations, tethering of dCas13b-TET2(CD) (SEQ ID NO: 27), but not the catalytic-dead mutant (SEQ ID NO: 29), caused a significant increase in H2AK119ub at the IAP loci (FIG. 4B). Time-lapse tracking of H2AK119ub and H3K27me3 marks with CUT&Tag (41) at IAP and control LINE loci following acute TET inhibition showed an early response of H2AK119ub depletion upon TET inhibition (FIG.12C). The H3K27me3 level at the IAP loci also decreased upon TET inhibition (FIG. 12C), which can be attributed to the known crosstalk between PRC1 and PRC2 (42). Collectively, these results elucidated the H2AK119ub-depleted, open chromatin state observed upon TET2 inactivation. [0435] H2AK119ub is installed by PRC1 and can be erased by the polycomb repressive deubiquitylase (PR-DUB) complexes (40, 43-45). Studies have identified methyl-CpG binding domain protein 5 (MBD5) and methyl-CpG binding domain protein 6 (MBD6) as partner proteins of PR-DUB, and their localization to heterochromatin appeared to be independent of DNA 5mC (46, 47). MBD5 and MBD6 both possess a conserved but structurally distinct methyl-binding domain (MBD), but as confirmed herein, do not bind to DNA (FIG.13A and FIG.13C) (46). 
When considering the above elucidation of RNA m5C oxidation by TET2, the inventors hypothesized that these two proteins may bind RNA m5C, which may then recruit PR-DUB to mediate H2AK119ub deubiquitylation at the m5C-methylated caRNA (e.g., LTR loci) for transcriptional activation. [0436] UV-crosslinking and immunoprecipitation (CLIP) of both MBD5 and MBD6 with bound nucleic acids in mESCs (FIG.13B) was performed. The results showed that MBD5 and MBD6 crosslinked with RNA, but not with DNA, as an RNase treatment almost completely abolished nucleic acid immunoprecipitated by MBD5 or MBD6, while the effect of a DNase treatment was minor (FIG. 13C). UHPLC-MS/MS analysis of digested crosslinked product RNAs revealed an enrichment of m5C-modified nucleosides (FIG. 4C and FIG. 13D). Biochemically, the purified MBD domain (SEQ ID NO: 3) of MBD6 was found to
preferentially bind to a single-stranded oligonucleotide probe containing m5C relative to unmethylated or hm5C-modified probes in electrophoretic mobility shift assays (EMSA) (FIG. 4D and FIG. 13E). Therefore, the data revealed that MBD5 and MBD6 were RNA-binding proteins that preferentially recognized and bound to RNA m5C-modified nucleosides. [0437] The RNA-binding targets of MBD5 and MBD6 were found to significantly overlap with each other (FIG. 14A). However, MBD6 appeared to affect the levels of histone H2AK119ub and repeat RNA expression more dominantly relative to MBD5, as knockdown of MBD6 was sufficient to reverse elevated expression of LTRs (e.g., IAP, MERVL and MusD) caused by Tet2 KO in mESCs, whereas MBD5 knockdown failed to do so (FIG. 14B). The global H2AK119ub level also significantly increased only upon Mbd6 knockdown (FIG.14C). Both Mbd6 knockdown and Nsun2 knockdown were found to partially suppress the global chromatin openness caused by Tet2 KO in mESCs (FIG.4E). Furthermore, the inventors found that knockdown of Mbd6 caused a global decrease in the caRNA m5C level (FIG.14D). In line with these findings, IAP RNAs were found to be stabilized in Tet2 KO mESCs, and were found to be destabilized upon Mbd6 knockdown (FIG. 14E). These results all indicated that m5C methylation stabilized caRNAs (e.g., LTR RNAs) and that this effect was mediated largely through MBD6. Therefore, the following studies focused on MBD6, although MBD5 may play an equally important role in other cell types and/or disease states.
Example 6 – MBD6 depletion reversed aberrant HSPC self-renewal and differentiation caused by TET2 deficiency [0438] Some of the most dramatic phenotypes associated with TET2 loss can be observed in myeloid cells. TET2 deficiency in hematopoietic stem and progenitor cells (HSPCs) has been shown to cause disease states characterized by open chromatin and genome instability, which can culminate in myeloid malignancies (48). Potentially the most important features of TET2- deficient LK cells (Lin-c-Kit+ cells, capturing HSPCs) are their enhanced self-renewal capacity, and their skewed propensity towards differentiation into granulocytic/monocytic lineages in vitro (4). [0439] Consistent with the above observations found in mESCs, a global increase in transcription (FIG. 4F) and chromatin accessibility (FIG. 15A), along with a decrease in H2AK119ub levels at IAP loci was observed in LK cells upon Tet2 KO (FIG.4G). As caRNA m5C regulated the local chromatin state around the IAP loci through MBD6, the inventors investigated the role of MBD6 in HSPC function (FIG. 15B). Knockdown of Mbd6 significantly reduced the replating potential of Tet2 KO LK cells (FIG. 4H and FIG. 15C). Mbd6 knockdown disrupted the TET2-loss-induced prolonged maintenance of stem/progenitor cells and promoted differentiation of HSPCs toward myeloid lineages in vitro (assayed on day 7 and 14, respectively) (FIG.4I and FIG.15D). [0440] Along the same investigatory axis, the inventors explored the functional outcomes of targeted IAP RNA m5C oxidation using dCas13d-TET2(CD) fusion constructs in HSPCs (FIG. 16A). In a serial replating assay, targeted IAP m5C oxidation was found to partially suppress the enhanced self-renewal ability of HSPCs that was associated with TET2 loss (FIG. 16B). The normal expression level of stem/progenitor markers (Lin-c-Kit+, FIG. 
16C) and differentiation marker CD11b (FIG.16D) could be restored by targeted oxidation of IAP RNA m5C by ectopic expression of a dCas13d-TET2(CD) fusion protein, but not by the catalytic-dead mutant (TET2HxDCD) in Tet2 KO LK cells. [0441] Consistent with the above results, in another serial replating assay, steric blockade of m5C sites in IAP RNA using ASOs targeting MERVL or IAP was sufficient to significantly suppress the enhanced TET2 loss-of-function associated self-renewal abilities in HSPCs when compared to HSPCs receiving control ASO (NC ASO) (FIG.17A-17B). Mechanistically, IAP blockade disrupted the prolonged maintenance of stem/progenitor cells and enhanced differentiation of HSPCs toward myeloid lineages (CD11b+) in vitro (FIG.17C-17D). [0442] To further investigate the underlying mechanism, caRNA targeting (e.g., MERVL targeting or IAP targeting) ASO blockade was also utilized to sterically block m5C sites of IAP
or MERVL RNA in LK cells, and the cells were then analyzed with total RNA-seq. The results showed that genes upregulated by Tet2 KO exhibited a higher overlap with the genes downregulated by IAP ASO treatment than with those whose expression increased (FIG.4J-4K and FIG. 18A). This finding was consistent with the role of IAP RNA as a substrate of TET2 involved in regulating chromatin accessibility through the m5C/MBD6/H2AK119ub axis. Further functional analysis of genes that were upregulated by Tet2 KO and downregulated by IAP ASO treatment revealed enrichments in pathways including transcriptional pathways misregulated in cancer cells (e.g., Th1 and Th2 cell differentiation, Th17 cell differentiation, etc.) (FIG. 18B). Additionally, a strong anticorrelation was observed between the cancer-associated pathway expression changes caused by Tet2 knockout and those caused by IAP ASO treatment (e.g., general transcriptional misregulation in cancer, MAPK signaling, or C-type lectin receptor signaling) (FIG.4L and FIG.18C). These results showed that IAP ASO blockade could reverse (suppress) the transcriptional changes and phenotypes induced by Tet2 KO in LK cells. Example 7 – The m5C-TET2-LTR-MBD6 axis impacted leukemia and glioma cell fitness, and MBD6 depletion selectively impaired TET2-deficient leukemia growth in vitro and in vivo [0443] As the m5C-TET2-LTR-MBD6 axis was found to be important for HSPC function, the inventors then studied the role of the axis in leukemia cell fitness. The inventors’ proposed mechanism predicted that TET2 loss of function would lead to more MBD6 binding to caRNAs (e.g., LTR RNAs) and subsequent activation of genes critical to leukemogenesis. If the mechanism proved accurate, MBD6 depletion or inhibition would specifically inhibit proliferation of TET2 mutant or TET2 depleted leukemia cells. To test this, the inventors transiently knocked down MBD6 in a panel of human leukemia cells with two individual siRNAs. 
While modest inhibition of proliferation was observed in TET2 WT cell lines, almost complete proliferation blockade was observed for SKM-1 cells (FIG. 5A and FIG. 19A), a human AML cell line bearing a TET2 frameshift mutation (49). To further confirm this synergistic lethal effect, as these leukemia cells appeared to have become addicted to (dependent upon) the MBD6-mediated gene activation pathway for proliferation, the inventors compared proliferation of WT and TET2 KO K-562 and THP-1 cells with (siMBD6 or shMBD6) or without (siNC or shNC) MBD6 depletion (FIG. 19B). Marked and significant attenuation of cell proliferation was again observed by MBD6 knockdown in TET2 KO cells when compared with the controls (FIG.5B), along with increased levels of global H2AK119ub
and PARP1 cleavage (FIG. 19C). Additionally, when targeting the writer NSUN2, although the experiments only resulted in ~40% knockdown of NSUN2 in these leukemia cells using siRNAs (FIG. 19D), a synergistic inhibition of proliferation in TET2 KO K-562 and THP-1 cells (FIG. 19E) was also observed, confirming the dependence of these TET2 KO cells on m5C caRNA methylation for proliferative capacity. [0444] To test whether MBD6 loss affected leukemogenesis in vivo, especially in the absence of TET2, the inventors transplanted 1 × 10^6 WT, TET2 KO, WT/MBD6 KD, or TET2 KO/MBD6 KD K-562 cells into adult NOD.Cg-Prkdc^scid Il2rg^tm1Wjl/SzJ (NSG) mice (FIG. 20A). Mice receiving WT or TET2 KO cells died between 28-39 days or 35-53 days after transplantation, respectively. Mice receiving WT/MBD6 KD or TET2 KO/MBD6 KD cells exhibited dramatically decelerated leukemogenesis relative to controls, particularly those animals transplanted with TET2 KO/MBD6 KD cells, which survived significantly longer (125-135 days or 163-220 days, respectively) (FIG.5C). Consistent with these results, WT or TET2 KO recipient mice had markedly higher human CD33+ cell chimerism in bone marrow (BM) and peripheral blood (PB) after 26 days when compared with animals that received WT/MBD6 KD or TET2 KO/MBD6 KD cells (FIG.20B). Similar results were also observed in an in vivo xenotransplantation study with WT, TET2 KO, WT/MBD6 KD, or TET2 KO/MBD6 KD THP-1 cells (FIG.5C). Mice transplanted with WT or TET2 KO cells died at 20-23 days after transplant, while in comparison, MBD6 KD significantly prolonged survival in mice receiving WT/MBD6 KD cells or TET2 KO/MBD6 KD cells (survival for 51-117 days or 110-160 days after transplant, respectively). WT or TET2 KO cell recipient mice showed dramatically higher human CD33+CD45+ cell chimerism in BM and PB after 20 days when compared to WT/MBD6 KD or TET2 KO/MBD6 KD cell recipient mice (FIG. 20C). 
Thus, MBD6 KD markedly attenuated leukemic progression in vivo, in the absence or in the presence of functional TET2, albeit with greater efficacy in the absence of functional TET2. [0445] Surprisingly, MBD6 appeared to exert a specific synergistic effect on caRNA in TET2-deficient cells to attenuate cell proliferation, as MBD6 knockdown reversed (suppressed) the excessive caRNA expression observed upon TET2 KO, but did not significantly reduce whole-cell RNA (e.g., SNRP70 and/or mRNA) (FIG. 21A-21B). Specifically, TET2 depletion in leukemia cells resulted in excessive expression of carRNAs (including paRNAs, eRNAs and repeat RNAs), consistent with observations reported above in mESCs and mLK cells. Wild-type-like levels of carRNAs were largely restored by MBD6 knockdown (FIG.5D and FIG. 21C-21E) in TET2-depleted backgrounds. Furthermore, this rescue (suppression of TET2 LOF phenotypes) was predominantly observed on m5C-marked LTR repeat regions (FIG.5E
and FIG.21F), providing additional support of the dependence of the m5C RNA methylation for the observed synergistic effects. [0446] To evaluate whether the reversal of TET2 KO phenotypes by MBD6 knockdown was mediated by MBD6’s engagement in cellular DUB (deubiquitinase) activities, the inventors performed H2AK119ub CUT&Tag experiments. Consistent with the results reported herein, MBD6 knockdown resulted in a global increase in H2AK119ub levels, while TET2 KO reduced the overall H2AK119ub levels in K-562 cells (FIG.5F and FIG.22A-22B). Among carRNA groups, these observations mainly occurred in repeat RNAs (FIG.5G and FIG.22C). By focusing on repeat RNAs, the inventors observed a negative correlation between changes in the RNA abundances of LTR RNAs and decreased nearby H2AK119ub levels upon TET2 KO, while a positive correlation was observed between the LTR RNAs abundance changed upon TET2 KO and nearby H2AK119ub changes upon MBD6 knockdown (FIG.22D). Upon further examination of the m5C methylation levels on repeat RNAs in K-562 cells, the inventors found that the LTR class, particularly the ERV1 family with LTR12 and HERVH-int as representative subfamilies, exhibited higher m5C methylation levels (FIG.5H and FIG.22E; and Table 1). Notably, the nearby H2AK119ub levels were also upregulated by MBD6 knockdown and downregulated by TET2 KO (FIG.5I). Collectively, these results supported the proposed mechanism comprising a pathway in which TET2 oxidation of m5C in repeat RNAs leads to decreased MBD6 binding and subsequently increased nearby H2AK119ub levels due to reduced deubiquitylation mediated through MBD6 m5C caRNA binding. [0447] Finally, the inventors investigated the downstream signaling pathways involved in this m5C-TET2-MBD6-H2AK119ub axis in leukemia cells. The results showed a significant overlap between the genes upregulated by TET2 KO and those downregulated by MBD6 knockdown in TET2 depleted K-562 cells (FIG.23A). 
The enriched gene ontology terms that overlapped between the tested conditions were highly conserved, similar to what was observed in Tet2 KO compared to IAP ASO treatment in mouse LK cells (FIG. 23B-23C). Additionally, expression of TET2 KO-associated genes involved in signaling pathways, including apoptosis and cell cycle, was reversed (suppressed) by MBD6 knockdown (FIG. 5J), providing further support that these two proteins acted in the same pathway, and that depletion of both could result in synergistic inhibition of tumor cell growth (FIG. 5B).
[0448] Together, the findings provided herein elucidated a new pathway describing how RNA m5C methylation on carRNAs, in particular LTR family RNAs, regulated chromatin state and transcription. The m5C on these repeat RNAs could be recognized by a newly identified RNA m5C binding protein, MBD6, which recruited the BAP1 complex to mediate H2AK119ub deubiquitylation and gene activation. TET2 oxidized the m5C on these RNAs and antagonized gene activation through the m5C-MBD6-H2AK119ub deubiquitylation axis. Loss of TET2 (or associated TET2 pathway components) led to carRNA m5C hypermethylation, reduced histone ubiquitination, more widespread DNA hypomethylation, and open chromatin, culminating in activation of genes critical to disease progression (e.g., leukemogenesis). These findings can, at least in significant part, explain the accelerated myeloid malignancy that is induced by TET2 inactivation (FIG. 5K). These findings also revealed synergistic killing of TET2-deficient malignant cells with MBD6 and/or NSUN2 inhibition/inactivation, providing motivation for targeted therapies against these proteins and diseases associated with loss of functional TET2 pathways, caRNA m5C hypermethylation, DNA hypomethylation, low histone ubiquitination, aberrant open chromatin, and/or aberrant gene activation.
Example 8 – Design and testing of MBD6 targeting proteolysis targeting chimera
[0449] As a method to inhibit MBD6, a proteolysis targeting chimera (PROTAC) is designed and tested. A chemical library is screened for affinity to MBD6 to identify a ligand binding domain. MBD6 ligands are covalently linked to a molecule that binds E3 ubiquitin ligase using methods described in the art. The PROTAC will be tested in vitro for degradation of MBD6 in cultured cells before being administered in vivo (e.g., in a cancer mouse model). Following administration of the MBD6 targeting PROTAC, m5C levels in one or more chromatin associated RNA (caRNA) are decreased and/or association of one or more caRNA with a PR-DUB complex is decreased in one or more diseased cells (e.g., a tumor cell) in the mouse.
* * *
[0450] All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure.
While the compositions and methods of this invention have been described in terms of certain aspects, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
REFERENCES
[0451] All references cited herein, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
[0452] 1 Tefferi, A., Lim, K. H. & Levine, R. Mutation in TET2 in myeloid cancers. N Engl J Med 361, 1117; author reply 1117-1118, doi:10.1056/NEJMc091348 (2009).
[0453] 2 Jankowska, A. M. et al. Loss of heterozygosity 4q24 and TET2 mutations associated with myelodysplastic/myeloproliferative neoplasms. Blood 113, 6403-6410, doi:10.1182/blood-2009-02-205690 (2009).
[0454] 3 Langemeijer, S. M. et al. Acquired mutations in TET2 are common in myelodysplastic syndromes. Nat Genet 41, 838-842, doi:10.1038/ng.391 (2009).
[0455] 4 Moran-Crusio, K. et al. Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid transformation. Cancer Cell 20, 11-24, doi:10.1016/j.ccr.2011.06.001 (2011).
[0456] 5 Lopez-Moyado, I. F. et al. Paradoxical association of TET loss of function with genome-wide DNA hypomethylation. Proc Natl Acad Sci U S A 116, 16933-16942, doi:10.1073/pnas.1903059116 (2019).
[0457] 6 Richards, E. J. & Elgin, S. C. Epigenetic codes for heterochromatin formation and silencing: rounding up the usual suspects. Cell 108, 489-500, doi:10.1016/s0092-8674(02)00644-x (2002).
[0458] 7 Kong, L. et al. A primary role of TET proteins in establishment and maintenance of De Novo bivalency at CpG islands. Nucleic Acids Res 44, 8682-8692, doi:10.1093/nar/gkw529 (2016).
[0459] 8 Wu, D. et al. Glucose-regulated phosphorylation of TET2 by AMPK reveals a pathway linking diabetes to cancer. Nature 559, 637-641, doi:10.1038/s41586-018-0350-5 (2018).
[0460] 9 Gu, T. P. et al. The role of Tet3 DNA dioxygenase in epigenetic reprogramming by oocytes. Nature 477, 606-610, doi:10.1038/nature10443 (2011).
[0461] 10 Wu, X. & Zhang, Y. TET-mediated active DNA demethylation: mechanism, function and beyond. Nat Rev Genet 18, 517-534, doi:10.1038/nrg.2017.33 (2017).
[0462] 11 Huang, Y. & Rao, A. Connections between TET proteins and aberrant DNA modification in cancer. Trends Genet 30, 464-474, doi:10.1016/j.tig.2014.07.005 (2014).
[0463] 12 Figueroa, M. E. et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 18, 553-567, doi:10.1016/j.ccr.2010.11.015 (2010).
[0464] 13 Delhommeau, F. et al. Mutation in TET2 in myeloid cancers. N Engl J Med 360, 2289-2301, doi:10.1056/NEJMoa0810069 (2009).
[0465] 14 Ward, P. S. et al. The common feature of leukemia-associated IDH1 and IDH2 mutations is a neomorphic enzyme activity converting alpha-ketoglutarate to 2-hydroxyglutarate. Cancer Cell 17, 225-234, doi:10.1016/j.ccr.2010.01.020 (2010).
[0466] 15 Hon, G. C. et al. 5mC oxidation by Tet2 modulates enhancer activity and timing of transcriptome reprogramming during differentiation. Mol Cell 56, 286-297, doi:10.1016/j.molcel.2014.08.026 (2014).
[0467] 16 Ko, M. et al. Modulation of TET2 expression and 5-methylcytosine oxidation by the CXXC domain protein IDAX. Nature 497, 122-126, doi:10.1038/nature12052 (2013).
[0468] 17 Abou-Jaoude, A. et al. Idax and Rinf facilitate expression of Tet enzymes to promote neural and suppress trophectodermal programs during differentiation of embryonic stem cells. Stem Cell Res 61, 102770, doi:10.1016/j.scr.2022.102770 (2022).
[0469] 18 Ravichandran, M. et al. Rinf Regulates Pluripotency Network Genes and Tet Enzymes in Embryonic Stem Cells. Cell Rep 28, 1993-2003 e1995, doi:10.1016/j.celrep.2019.07.080 (2019).
[0470] 19 Pastor, W. A., Aravind, L. & Rao, A. TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat Rev Mol Cell Biol 14, 341-356, doi:10.1038/nrm3589 (2013).
[0471] 20 Guallar, D. et al. RNA-dependent chromatin targeting of TET2 for endogenous retrovirus control in pluripotent stem cells. Nat Genet 50, 443-451, doi:10.1038/s41588-018-0060-9 (2018).
[0472] 21 Delatte, B. et al. RNA biochemistry. Transcriptome-wide distribution and function of RNA hydroxymethylcytosine. Science 351, 282-285, doi:10.1126/science.aac5253 (2016).
[0473] 22 Fu, L. et al. Tet-mediated formation of 5-hydroxymethylcytosine in RNA. J Am Chem Soc 136, 11582-11585, doi:10.1021/ja505305z (2014).
[0474] 23 Shen, Q. et al. Tet2 promotes pathogen infection-induced myelopoiesis through mRNA oxidation. Nature 554, 123-127, doi:10.1038/nature25434 (2018).
[0475] 24 Li, Y. et al. TET2-mediated mRNA demethylation regulates leukemia stem cell homing and self-renewal. Cell Stem Cell 30, 1072-1090 e1010, doi:10.1016/j.stem.2023.07.001 (2023).
[0476] 25 Liu, J. et al. N(6)-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science 367, 580-586, doi:10.1126/science.aay6018 (2020).
[0477] 26 Wei, J. et al. FTO mediates LINE1 m(6)A demethylation and chromatin regulation in mESCs and mouse development. Science, eabe9582, doi:10.1126/science.abe9582 (2022).
[0478] 27 Xu, W. et al. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317-321, doi:10.1038/s41586-021-03210-1 (2021).
[0479] 28 Liu, J. et al. The RNA m(6)A reader YTHDC1 silences retrotransposons and guards ES cell identity. Nature 591, 322-326, doi:10.1038/s41586-021-03313-9 (2021).
[0480] 29 Chelmicki, T. et al. m(6)A RNA methylation regulates the fate of endogenous retroviruses. Nature 591, 312-316, doi:10.1038/s41586-020-03135-1 (2021).
[0481] 30 Chen, X. et al. ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing. Nat Methods 13, 1013-1020, doi:10.1038/nmeth.4031 (2016).
[0482] 31 Chen, H. et al. m(5)C modification of mRNA serves a DNA damage code to promote homologous recombination. Nat Commun 11, 2834, doi:10.1038/s41467-020-16722-7 (2020).
[0483] 32 Yang, X. et al. 5-methylcytosine promotes mRNA export - NSUN2 as the methyltransferase and ALYREF as an m(5)C reader. Cell Res 27, 606-625, doi:10.1038/cr.2017.55 (2017).
[0484] 33 Shen, H. et al. TET-mediated 5-methylcytosine oxidation in tRNA promotes translation. J Biol Chem 296, 100087, doi:10.1074/jbc.RA120.014226 (2021).
[0485] 34 Wilson, C., Chen, P. J., Miao, Z. & Liu, D. R. Programmable m(6)A modification of cellular RNAs with a Cas13-directed methyltransferase. Nat Biotechnol 38, 1431-1440, doi:10.1038/s41587-020-0572-6 (2020).
[0486] 35 Das, A. T., Tenenbaum, L. & Berkhout, B. Tet-On Systems For Doxycycline-inducible Gene Expression. Curr Gene Ther 16, 156-167, doi:10.2174/1566523216666160524144041 (2016).
[0487] 36 Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300-1303, doi:10.1126/science.1210597 (2011).
[0488] 37 Zhang, Q. et al. Tet2 is required to resolve inflammation by recruiting Hdac2 to specifically repress IL-6. Nature 525, 389-393, doi:10.1038/nature15252 (2015).
[0489] 38 Chrysanthou, S. et al. The DNA dioxygenase Tet1 regulates H3K27 modification and embryonic stem cell biology independent of its catalytic activity. Nucleic Acids Res 50, 3169-3189, doi:10.1093/nar/gkac089 (2022).
[0490] 39 Singh, A. K. et al. Selective targeting of TET catalytic domain promotes somatic cell reprogramming. Proc Natl Acad Sci U S A 117, 3621-3626, doi:10.1073/pnas.1910702117 (2020).
[0491] 40 Wang, H. et al. Role of histone H2A ubiquitination in Polycomb silencing. Nature 431, 873-878, doi:10.1038/nature02985 (2004).
[0492] 41 Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10, 1930, doi:10.1038/s41467-019-09982-5 (2019).
[0493] 42 Cooper, S. et al. Jarid2 binds mono-ubiquitylated H2A lysine 119 to mediate crosstalk between Polycomb complexes PRC1 and PRC2. Nat Commun 7, 13661, doi:10.1038/ncomms13661 (2016).
[0494] 43 de Napoles, M. et al. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Dev Cell 7, 663-676, doi:10.1016/j.devcel.2004.10.005 (2004).
[0495] 44 Scheuermann, J. C. et al. Histone H2A deubiquitinase activity of the Polycomb repressive complex PR-DUB. Nature 465, 243-247, doi:10.1038/nature08966 (2010).
[0496] 45 Daou, S. et al. The BAP1/ASXL2 Histone H2A Deubiquitinase Complex Regulates Cell Proliferation and Is Disrupted in Cancer. J Biol Chem 290, 28643-28663, doi:10.1074/jbc.M115.661553 (2015).
[0497] 46 Laget, S. et al. The human proteins MBD5 and MBD6 associate with heterochromatin but they do not bind methylated DNA. PLoS One 5, e11982, doi:10.1371/journal.pone.0011982 (2010).
[0498] 47 Baymaz, H. I. et al. MBD5 and MBD6 interact with the human PR-DUB complex through their methyl-CpG-binding domain. Proteomics 14, 2179-2189, doi:10.1002/pmic.201400013 (2014).
[0499] 48 Li, Z. et al. Deletion of Tet2 in mice leads to dysregulated hematopoietic stem cells and subsequent development of myeloid malignancies. Blood 118, 4509-4518, doi:10.1182/blood-2010-12-325241 (2011).
[0500] 49 Cluzeau, T. et al. Phenotypic and genotypic characterization of azacitidine-sensitive and resistant SKM1 myeloid cell lines. Oncotarget 5, 4384-4391, doi:10.18632/oncotarget.2024 (2014).
[0501] 50 Zhang, R. R. et al. Tet1 regulates adult hippocampal neurogenesis and cognition. Cell Stem Cell 13, 237-245, doi:10.1016/j.stem.2013.05.006 (2013).
[0502] 51 Guan, Y. et al. A Therapeutic Strategy for Preferential Targeting of TET2 Mutant and TET-dioxygenase Deficient Cells in Myeloid Neoplasms. Blood Cancer Discov 2, 146-161, doi:10.1158/2643-3230.BCD-20-0173 (2021).
[0503] 52 Wuarin, J. & Schibler, U. Physical isolation of nascent RNA chains transcribed by RNA polymerase II: evidence for cotranscriptional splicing. Mol Cell Biol 14, 7219-7225, doi:10.1128/mcb.14.11.7219-7225.1994 (1994).
[0504] 53 Henriksen, J. R. et al. Comparison of RNAi efficiency mediated by tetracycline-responsive H1 and U6 promoter variants in mammalian cell lines. Nucleic Acids Res 35, e67, doi:10.1093/nar/gkm193 (2007).
[0505] 54 Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120, doi:10.1093/bioinformatics/btu170 (2014).
[0506] 55 Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359, doi:10.1038/nmeth.1923 (2012).
[0507] 56 Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589, doi:10.1016/j.molcel.2010.05.004 (2010).
[0508] 57 Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907-915, doi:10.1038/s41587-019-0201-4 (2019).
[0509] 58 Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 47, D766-D773, doi:10.1093/nar/gky955 (2019).
[0510] 59 Anders, S., Pyl, P. T. & Huber, W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166-169, doi:10.1093/bioinformatics/btu638 (2015).
[0511] 60 Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140, doi:10.1093/bioinformatics/btp616 (2010).
[0512] 61 Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25, doi:10.1186/gb-2009-10-3-r25 (2009).
[0513] 62 Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, doi:10.1093/gigascience/giab008 (2021).
[0514] 63 Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137, doi:10.1186/gb-2008-9-9-r137 (2008).
[0515] 64 Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550, doi:10.1186/s13059-014-0550-8 (2014).
Claims
CLAIMS
What is claimed is:
1. A method of treating a disease in an individual, comprising the step of administering one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6) to an individual in need thereof.
2. The method of claim 1, wherein the disease comprises cancer of the blood, lung, brain, breast, skin, pancreas, liver, colon, head and neck, kidney, thyroid, stomach, spleen, gallbladder, bone, ovary, testes, endometrium, prostate, rectum, anus, and/or cervix.
3. The method of claim 1, wherein the disease comprises clonal hematopoiesis of indeterminate potential (CHIP).
4. The method of claim 3, wherein the disease is characterized by atherosclerosis, myocardial fibrosis, and/or heart failure.
5. The method of claim 2, wherein the cancer comprises a blood cancer.
6. The method of claim 5, wherein the blood cancer comprises a leukemia.
7. The method of claim 5, wherein the blood cancer comprises a myeloid malignancy.
8. The method of claim 7, wherein the myeloid malignancy comprises acute myeloid leukemia.
9. The method of claim 7, wherein the myeloid malignancy comprises chronic myelomonocytic leukemia.
10. The method of claim 2, wherein the cancer comprises a glioma.
11. The method of claim 10, wherein the glioma comprises glioblastoma.
12. The method of claim 1, comprising reducing proliferation of a cancer and/or pre-cancerous cell.
13. The method of claim 1, wherein the disease is associated with diseased cells comprising one or more mutations in one or more genes encoding a ten-eleven translocation (tet) methylcytosine dioxygenase 2 (TET2), ASXL transcriptional regulator 1 (ASXL1), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), tumor protein p53 (p53), DNA (cytosine-5-)-methyltransferase 3A (DNMT3A), Janus kinase 2 (JAK2), Protein Phosphatase Mn2+/Mg2+-Dependent 1D (PPM1D), Spliceosome Factor 3b1 (SF3B1), and/or Serine and Arginine Rich Splicing Factor 2 (SRSF2).
14. The method of claim 1, wherein the disease is associated with diseased cells with one or more mutations in one or more genes encoding components of a canonical and/or non-canonical Polycomb Repressive Complex (PRC).
15. The method of claim 14, wherein the one or more mutations in one or more genes encoding components of PRC comprise one or more loss of function mutations.
16. The method of claim 14, wherein the one or more mutations in one or more genes encoding components of PRC comprise one or more mutations in E3 Ubiquitin Ligase RING1A/B, Polycomb Group Ring Finger 1 (PCGF1), Polycomb Group Ring Finger 2 (PCGF2), Polycomb Group Ring Finger 3 (PCGF3), Polycomb Group Ring Finger 4 (PCGF4), Polycomb Group Ring Finger 5 (PCGF5), and/or Polycomb Group Ring Finger 6 (PCGF6).
17. The method of claim 1, wherein the disease is associated with diseased cells with one or more mutations in one or more genes encoding Polycomb Repressive-Deubiquitinase (PR-DUB) complex associated components O-linked N-acetylglucosamine Transferase (OGT), Lysine Demethylase 1B (KDM1B), Forkhead Box K1 (FOXK1), Forkhead Box K2 (FOXK2), BRCA1 Associated Protein 1 (BAP1), ASXL Transcriptional Regulator 1 (ASXL1), ASXL Transcriptional Regulator 2 (ASXL2), ASXL Transcriptional Regulator 3 (ASXL3), and/or Host Cell Factor C1 (HCFC1).
18. The method of claim 17, wherein the one or more mutations in PR-DUB complex associated components OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1 comprise one or more gain of function mutations.
19. The method of claim 13, wherein the disease is associated with diseased cells with one or more mutations in a TET2 encoding gene.
20. The method of claim 19, wherein the one or more mutations in a TET2 encoding gene comprise one or more loss of function mutations.
21. The method of claim 20, wherein the disease is associated with diseased cells comprising one or more mutations in one or more genes encoding ASXL1, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2.
22. The method of claim 1, wherein the individual is administered an additional therapy.
23. The method of claim 22, wherein the additional therapy is surgery, radiation, chemotherapy, hormone therapy, and/or immunotherapy.
24. The method of claim 22, wherein the additional therapy comprises administration of an inhibitor of TET2.
25. The method of claim 24, wherein the inhibitor of TET2 comprises C35 and/or TETi76.
26. The method of claim 22, wherein the additional therapy comprises administration of an inhibitor of NSUN1 and/or NSUN2.
27. The method of claim 26, wherein the additional therapy comprises administration of an inhibitor of NSUN2.
28. The method of claim 22, further comprising administration of 5-AzaC.
29. The method of claim 1, further comprising a step of diagnosing the disease in the individual.
30. The method of claim 1, wherein the disease is associated with diseased cells characterized as comprising an open chromatin state relative to non-diseased cells of the same developmental lineage.
31. The method of claim 30, wherein administering the one or more MBD6 inhibitors results in promotion of a closed chromatin state in one or more diseased cells in the individual.
32. The method of claim 1, wherein the disease is characterized as pro-inflammatory.
33. The method of claim 1, comprising inhibition of diseased cell proliferation.
34. The method of claim 1, wherein administering the one or more MBD6 inhibitors decreases m5C levels in one or more chromatin associated RNA (caRNA) and/or decreases association of one or more caRNA with a PR-DUB complex in one or more diseased cells in the individual.
35. The method of claim 34, wherein administering the one or more MBD6 inhibitors increases oxidation of m5C in the one or more caRNA and/or inhibits installation of m5C in the one or more caRNA in one or more diseased cells in the individual.
36. The method of claim 34, wherein the one or more caRNA comprises or consists essentially of Long Terminal Repeat (LTR) RNAs.
37. The method of claim 34, wherein the one or more caRNA comprises one or more of the caRNAs described in Table 1.
38. The method of claim 34, wherein the one or more caRNA comprises one or more of the caRNAs described in Table 2.
39. The method of claim 34, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 100-614.
40. The method of claim 34, wherein the one or more caRNA comprise a sequence at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one or more of SEQ ID NOs: 104 or 107.
41. The method of claim 34, wherein the one or more caRNAs comprise between 20-50 caRNAs.
42. The method of claim 34, wherein the one or more caRNAs comprise a chromatin associated regulatory RNA (carRNA) and/or a chromatin associated repeat RNA sequence.
43. The method of claim 34, wherein the one or more caRNAs comprise or consist essentially of endogenous retroviral RNAs.
44. The method of claim 43, wherein the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK).
45. The method of claim 1, wherein histone ubiquitination is increased in one or more diseased cells after administration of the one or more MBD6 inhibitors.
46. The method of claim 45, wherein the histone ubiquitination comprises or consists of ubiquitination on H2A.
47. The method of claim 46, wherein the histone ubiquitination occurs at a site comprising or consisting essentially of H2AK119.
48. The method of claim 1, wherein administering the one or more MBD6 inhibitors modifies histone methylation in one or more diseased cells in the individual.
49. The method of claim 48, wherein a histone that comprises modified methylation comprises or consists essentially of H3.
50. The method of claim 49, wherein the histone comprises modified methylation on a methylation site comprising or consisting essentially of H3K27me3.
51. The method of claim 48, wherein the histone that comprises modified methylation is localized near or at a genetic locus overlapping with a histone that comprises modified histone ubiquitination.
52. The method of claim 1, wherein the one or more inhibitors of MBD6 comprise a polynucleotide at least partially complementary to a gene encoding MBD6.
53. The method of claim 52, wherein the polynucleotide comprises a short hairpin RNA and/or small interfering RNA.
54. The method of claim 52, wherein the polynucleotide comprises a sequence at least 80% complementary to at least 15, 20, 25, or more than 25 contiguous nucleotides of any one or more of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20.
55. The method of claim 52, wherein the polynucleotide comprises modified RNA phosphoramidites.
56. The method of claim 52, wherein the polynucleotide comprises RNA with one or more 2ʹ-O-Methyl (2ʹ-OMe) or 2ʹ-O-Methoxyethyl (2ʹ-MOE) modifications.
57. The method of claim 56, wherein the polynucleotide comprises RNA with every nucleotide comprising a 2ʹ-MOE modification.
58. The method of claim 52, wherein the polynucleotide comprises RNA with one or more phosphorothioate bonds.
59. The method of claim 52, wherein the polynucleotide is comprised within a lentiviral particle and/or nanoparticle.
60. The method of claim 52, wherein the inhibitor of MBD6 comprises more than one polynucleotide.
61. The method of claim 1, wherein the one or more inhibitors of MBD6 comprise a proteolysis targeting chimera.
62. A method of promoting histone ubiquitination in a cell comprising contacting the cell with one or more inhibitors of methyl-CpG-binding domain protein 6 (MBD6).
63. The method of claim 62, wherein promoting histone ubiquitination comprises or consists essentially of promoting ubiquitination on H2A.
64. The method of claim 63, wherein the histone ubiquitination site comprises or consists essentially of H2AK119.
65. The method of claim 62, wherein the cell comprises one or more mutations in one or more genes encoding TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2.
66. The method of claim 62, wherein the cell comprises one or more mutations in one or more genes encoding RING1A/B, PCGF1, PCGF2, PCGF3, PCGF4, PCGF5, PCGF6, OGT, KDM1B, FOXK1, FOXK2, BAP1, ASXL1, ASXL2, ASXL3, and/or HCFC1.
67. A method of decreasing m5C levels in chromatin associated RNA (caRNA) in a cell, comprising contacting the cell with one or more inhibitors of MBD6.
68. The method of claim 67, wherein the caRNA comprises or consists essentially of LTR RNAs.
69. The method of claim 67, wherein the caRNA comprises or consists essentially of endogenous retroviral RNAs.
70. The method of claim 69, wherein the endogenous retroviral RNAs comprise or consist essentially of endogenous retrovirus-K (ERVK).
71. A method of treating a disease in an individual, wherein the disease is characterized by cells comprising one or more mutations in TET2, ASXL1, IDH1, IDH2, p53, DNMT3A, JAK2, PPM1D, SF3B1, and/or SRSF2 encoding genes, the method comprising administering one or more inhibitors of MBD6, TET2, and/or NSUN2 to the individual in need thereof.
72. A manufactured article for use in performing the method of any one of claims 1-71.
73. A kit comprising means for performing the method according to any one of claims 1-71.
74. A fusion protein comprising a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one or more of SEQ ID NOs: 1-4, 11-14, or 15-20.
75. The fusion protein of claim 74, wherein the fusion protein comprises a sequence at least 80%, 85%, 90%, 95%, or 100% identical to any one of SEQ ID NOs: 19-20.
76. A composition comprising the fusion protein of claim 74.
77. A method of treating a disease associated with aberrant m5C RNA methylation in an individual, comprising administering the fusion protein and/or composition of any one of claims 74-76.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363587365P | 2023-10-02 | 2023-10-02 | |
| US63/587,365 | 2023-10-02 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2025076000A2 (en) | 2025-04-10 |
| WO2025076000A3 (en) | 2025-05-08 |
Family
ID=95282826
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/049478 Pending WO2025076000A2 (en) | 2023-10-02 | 2024-10-01 | Compositions and methods for modulating chromatin state |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025076000A2 (en) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101801419A (en) * | 2007-06-08 | 2010-08-11 | Mirna Therapeutics, Inc. | MiR-34 regulated genes and pathways as targets for therapeutic intervention |
| US10233495B2 (en) * | 2012-09-27 | 2019-03-19 | The Hospital For Sick Children | Methods and compositions for screening and treating developmental disorders |
| US10900036B2 (en) * | 2015-03-17 | 2021-01-26 | The General Hospital Corporation | RNA interactome of polycomb repressive complex 1 (PRC1) |
-
2024
- 2024-10-01 WO PCT/US2024/049478 patent/WO2025076000A2/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025076000A3 (en) | 2025-05-08 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Lavenniah et al. | Engineered circular RNA sponges act as miRNA inhibitors to attenuate pressure overload-induced cardiac hypertrophy | |
| JP7720623B2 (en) | RNA and DNA base editing via recruitment of engineered ADARs | |
| US20240417706A1 (en) | Systems and methods for the treatment of hemoglobinopathies | |
| US11492670B2 (en) | Compositions and methods for targeting cancer-specific sequence variations | |
| JP2022023229A (en) | Peptides and nanoparticles for intracellular delivery of genome-editing molecules | |
| JP2025084803A (en) | Long-lasting pain relief via targeted in vivo epigenetic suppression | |
| JP2024504608A (en) | Editing targeting RNA by leveraging endogenous ADAR using genetically engineered RNA | |
| Cardinali et al. | Time-controlled and muscle-specific CRISPR/Cas9-mediated deletion of CTG-repeat expansion in the DMPK gene | |
| US20230096378A1 (en) | Composition for inducing apoptosis of cells having genomic sequence variation and method for inducing apoptosis of cells by using composition | |
| IL297113A (en) | CRISPR inhibition for facial-laparo-brachial muscular dystrophy | |
| Li et al. | CRISPR/Cas systems usher in a new era of disease treatment and diagnosis | |
| JP7755272B2 (en) | Engineered ADAR-recruiting RNAs and methods of use thereof | |
| Tasca et al. | High-capacity adenovector delivery of forced CRISPR-Cas9 heterodimers fosters precise chromosomal deletions in human cells | |
| WO2025076000A2 (en) | Compositions and methods for modulating chromatin state | |
| US20250288692A1 (en) | Compositions and methods for treating kcnq4-associated hearing loss | |
| Rovai | Developing and improving genome editing as a therapeutic tool for hereditary hemochromatosis | |
| Dacey | A Gene Based Solution: Examining the Role of Small RNA Post-transcriptional Regulation on Non-viral Gene Delivery Systems | |
| US20230212606A1 (en) | Compositions and methods for treating kcnq4-associated hearing loss | |
| Hanson | Development of molecular medicine approaches for neuromuscular disease | |
| Li | Development of CRISPR/Cas-mediated gene editing in the retina | |
| CN118696054A (en) | Compositions and methods for treating KCNQ4-related hearing loss | |
| KR20250158061A (en) | PAH-modulating systems and methods | |
| Stevanovic | Optimizing an AAV-CRISPR system for retinal gene therapy | |
| Stephanou | Advancing Lentiviral Gene Therapy Vectors for Β-thalassaemia |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24875259; Country of ref document: EP; Kind code of ref document: A2 |