WO2025039972A1 - Tls-based gene editing systems - Google Patents
- Publication number
- WO2025039972A1 (PCT/CN2024/112328)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rna
- protein
- sequence
- binding
- deaminase
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/30—Chemical structure
- C12N2310/35—Nature of the modification
- C12N2310/351—Conjugate
- C12N2310/3519—Fusion with another nucleic acid
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2330/00—Production
- C12N2330/50—Biochemical production, i.e. in a transformed host cell
- C12N2330/51—Specially adapted vectors
Definitions
- the present disclosure generally relates to tRNA-like structure (TLS) and RNA constructs comprising tRNA-like structure. Also disclosed are DNA constructs, vectors, systems, compositions, and methods involving the tRNA-like structure.
- TLS tRNA-like structure
- Gene editing is a cutting-edge technique that enables precise modification of specific target genes in an organism (such as gene knockout, repair, or addition).
- Frequently used gene editing tools include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins (CRISPR/Cas).
- ZFNs Zinc finger nucleases
- TALENs transcription activator-like effector nucleases
- CRISPR/Cas clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins
- gRNA guide RNA
- CRISPR/Cas has been widely applied in biological research, biopharmaceuticals, agricultural breeding, and other fields.
- the CRISPR/Cas system recognizes the target site in the genome and creates double-stranded DNA breaks (DSBs).
- the DSBs are mainly repaired via non-homologous end joining (NHEJ), which enables knockout of the target gene.
- NHEJ non-homologous end joining
- the DSBs can also be repaired by recombination with foreign DNA through homology-directed repair (HDR), thereby achieving repair of the target site or addition of foreign fragments.
- HDR homology-directed repair
- the disadvantage of the CRISPR/Cas system is that it creates DSBs, which may cause safety risks such as large DNA fragment deletions, chromosomal translocations, and chromothripsis.
- Base editing is regarded as the next generation gene editing technology, which modifies specific base pairs in the genome.
- Base editing tools include adenine base editors (ABEs) and cytosine base editors (CBEs).
- ABEs adenine base editors
- CBEs cytosine base editors
- the key component of ABE is a fusion protein composed of nicked Cas9 (nCas9) and an adenine deaminase.
- the fusion protein targets specific genomic sites together with the gRNA and converts A•T to G•C base pairs.
- the key component of CBE is a fusion protein composed of nicked Cas9 (nCas9) and a cytosine deaminase.
- the fusion protein targets specific genomic sites together with the gRNA and converts C•G to T•A base pairs.
- base editing enables accurate gene modifications without causing DSBs, and therefore, has fewer safety issues.
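The two base-editing outcomes described above can be illustrated with a toy model. This is a minimal sketch rather than the behavior of any particular editor: the editing window (protospacer positions 4-8, counted from the PAM-distal end), the function name `apply_base_editor`, and the example protospacer are assumptions chosen for illustration. The model converts every target base inside the window, whereas real editors act with position-dependent efficiency.

```python
# Toy model of base editing. The editing window (protospacer positions 4-8,
# counted 1-based from the PAM-distal 5' end) is an assumed, illustrative value.
EDITING_WINDOW = range(4, 9)

# Base change on the protospacer (non-target) strand for each editor type.
CONVERSIONS = {
    "ABE": ("A", "G"),  # A•T pair becomes G•C
    "CBE": ("C", "T"),  # C•G pair becomes T•A
}


def apply_base_editor(protospacer: str, editor: str, window=EDITING_WINDOW) -> str:
    """Return the protospacer sequence (5'->3') after a simulated edit."""
    if editor not in CONVERSIONS:
        raise ValueError(f"unknown editor type: {editor!r}")
    source, target = CONVERSIONS[editor]
    bases = list(protospacer.upper())
    for position in window:
        index = position - 1  # 1-based window position -> 0-based index
        if index < len(bases) and bases[index] == source:
            bases[index] = target
    return "".join(bases)


site = "TTACAACGATTCAGGATCCA"  # hypothetical 20-nt protospacer
print(apply_base_editor(site, "ABE"))  # TTACGGCGATTCAGGATCCA
print(apply_base_editor(site, "CBE"))  # TTATAATGATTCAGGATCCA
```

Bases outside the assumed window (for example the A at position 3) are left unchanged, which is the property that distinguishes base editing windows from whole-protospacer changes.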
- the gene editors may be applied to cells or organisms in the form of plasmids, viral vectors, ribonucleoproteins (RNPs), or mRNAs/gRNAs.
- in plasmids and viral vectors, transcription of gene editing proteins and gRNAs is usually driven by separate promoters.
- Cas9 transcription is initiated by RNA Pol II promoters (e.g., CMV promoters).
- gRNA transcription is initiated by RNA Pol III promoters (e.g., U3 promoters).
- RNPs consisting of gene editing proteins and gRNAs have been widely used for ex vivo gene editing (e.g., electroporation of hematopoietic stem cells).
- gene editing proteins can be delivered to cells or organisms in the form of mRNAs, which bypasses the expression and purification of challenging recombinant proteins.
- mRNA/gRNA can also be packaged into lipid nanoparticles (LNPs) and delivered to livers or other organs.
- mRNA can be produced at large scale by in vitro transcription (IVT) using plasmids as the template.
- IVT in vitro transcription
- gRNAs are typically 80-150 nt in length. Similar to mRNA, gRNAs can be synthesized by IVT. However, IVT gRNAs have poor stability and editing efficiency in cells.
- gRNAs are generally produced by solid-phase synthesis, which makes it possible to introduce modifications to gRNA at specific positions.
- this novel design can be applied to gene editing systems.
- the present disclosure provides a novel way to express a gene editing protein and one or more gRNAs with high efficiency by combining an mRNA (encoding the gene editing protein) and the gRNAs in one transcript and separating them with one or more tRNA-like structures. After cleavage by intracellular RNase P and/or RNase Z, the full-length precursor transcript releases one mRNA molecule encoding the gene editing protein and one or more gRNAs (Fig. 1). Because the mRNA and gRNA(s) are linked in one single transcript, the gRNA(s) do not need to be driven by separate promoters in plasmids or viral vectors.
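The processing step can be pictured as a simple string operation. The sketch below assumes that each TLS is excised completely by RNase P and RNase Z cuts at its 5' and 3' boundaries, and that a TLS can be located by exact sequence match; both are simplifications. `process_precursor` and all sequences are hypothetical placeholders, not sequences from this disclosure.

```python
from typing import List


def process_precursor(transcript: str, tls: str) -> List[str]:
    """Excise every copy of `tls` and return the released RNAs 5'->3'.

    Toy model: RNase P is assumed to cut at the 5' boundary of each TLS and
    RNase Z at its 3' boundary, so each TLS is removed and the flanking
    segments are released. Real processing recognizes the folded structure,
    not an exact sequence match.
    """
    if not tls:
        raise ValueError("tls must be a non-empty sequence")
    # Drop empty strings produced by adjacent TLSs or a TLS at either end.
    return [segment for segment in transcript.split(tls) if segment]


# Hypothetical placeholder sequences.
mrna, tls = "AUGAAAUAA", "GCCGGGUUU"
grna_1, grna_2 = "GUUUUAGA", "CCAAGGUU"
precursor = mrna + tls + grna_1 + tls + grna_2
print(process_precursor(precursor, tls))  # ['AUGAAAUAA', 'GUUUUAGA', 'CCAAGGUU']
```

The first released segment corresponds to the mRNA and the rest to the gRNAs, mirroring the one-transcript, many-products idea without separate promoters.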
- this novel design simplifies the process to produce mRNA-gRNA fusions by IVT, thus avoiding chemical synthesis of long gRNAs.
- This new design is particularly suitable for gene editing of multiple targets, which requires two or more gRNAs.
- this novel design can be applied to transformer base editor systems.
- a main guide RNA (mgRNA) and/or a helper guide RNA (hgRNA), together with an mRNA encoding a Cas protein or a base editor protein, are encoded by one polynucleotide and separated by one or more tRNA-like structures.
- this novel design can be applied to gene editing systems for editing mammalian genes.
- the present disclosure provides an RNA comprising a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence.
- the RNA comprises, from the 5’-end to the 3’-end, the protein coding sequence, the tRNA-like structure (TLS), and the non-coding RNA sequence.
- the non-coding RNA sequence is a guide RNA (gRNA) sequence.
- gRNA guide RNA
- the guide RNA comprises a spacer sequence and a scaffold sequence, wherein the scaffold sequence is capable of binding to the protein encoded by the protein coding sequence, and wherein the spacer sequence targets a target gene.
- the target gene is a mammalian gene. In some embodiments, the target gene is a human gene.
- the spacer sequence in the gRNA is selected from SEQ ID NOs: 204-260.
- the scaffold sequence comprises at least one protein-binding motif, wherein the protein-binding motif is an RNA aptamer motif or a variant thereof.
- the RNA further comprises an mRNA-stabilizing sequence between the protein coding sequence and the tRNA-like structure (TLS).
- the mRNA-stabilizing sequence is a triple helix sequence, a poly(A) sequence, or a histone stem-loop.
- the triple helix sequence is derived from a long non-coding RNA (lncRNA) gene.
- the lncRNA gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1).
- the RNA further comprises a 5’-untranslated region (5’-UTR) at the 5’-end of the RNA and/or a 3’-untranslated region (3’-UTR) at the 3’-end of the RNA.
- 5’-UTR 5’-untranslated region
- 3’-UTR 3’-untranslated region
- the RNA further comprises a poly(A) sequence at the 3’-end of the RNA and/or a 5’-cap structure at the 5’-end of the RNA.
- the RNA further comprises a sequence encoding a nuclear localization signal (NLS), wherein the sequence encoding the NLS is located at the 5’-end and/or the 3’-end of the protein coding sequence.
- NLS nuclear localization signal
- the RNA comprises more than one sequence, each encoding a nuclear localization signal (NLS), wherein the encoded nuclear localization signals are the same or different.
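Placing NLS-coding sequences at the 5'-end and/or 3'-end of the protein coding sequence only works if the fusion stays in frame and contains no premature stop codon. A minimal sketch of that check follows, assuming the CDS body is supplied without its own start and stop codons. `build_nls_orf` is a hypothetical helper, and the example NLS (encoding the SV40-type peptide PKKKRKV) and CDS fragment are illustrative only.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna: str) -> List[str]:
    """Split an in-frame DNA string into consecutive triplets."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def build_nls_orf(cds_body: str, nls_5: str = "", nls_3: str = "") -> str:
    """Return ATG + 5' NLS + CDS body + 3' NLS + TAA, checking the frame.

    Assumes cds_body excludes its own start and stop codons.
    """
    for name, part in (("nls_5", nls_5), ("cds_body", cds_body), ("nls_3", nls_3)):
        if len(part) % 3:
            raise ValueError(f"{name} length is not a multiple of 3")
    inner = (nls_5 + cds_body + nls_3).upper()
    if any(codon in STOP_CODONS for codon in split_codons(inner)):
        raise ValueError("fusion contains an in-frame stop codon")
    return "ATG" + inner + "TAA"


sv40_nls = "CCAAAGAAGAAGCGGAAGGTC"  # encodes PKKKRKV (SV40-type NLS)
print(build_nls_orf("GCTGGC", nls_5=sv40_nls, nls_3=sv40_nls))
```

Passing the same NLS for both ends, or two different NLS sequences, corresponds to the "same or different" options above.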
- the protein coding sequence encodes an RNA binding protein.
- the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9).
- the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR Cas9, EQR Cas9, VRER Cas9, Cas9-NG, xCas9, eCas9, SpCas9-HF1, HypaCas9, HiFiCas9, sniper-Cas9, SpG, SpRY, KKH SaCas9, CjCas9, Cas9-NRRH, Cas9-NRCH, Cas9-NRTH, SsCpf1, PcCpf1, BpCpf1, LiCpf1, PmCpf1, Lb2Cpf1, PbCpf1, PeCpf1, PdCpf1, MbCpf1, EeC
- the protein having enzymatic activity is a cytidine deaminase.
- the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- the protein having enzymatic activity is an adenosine deaminase.
- the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA)
- the protein coding sequence encodes a base editor protein.
- the base editor protein is a cytidine base editor (CBE) protein.
- the CBE protein is selected from BE3, YE1-BE3, YEE-BE3, BE4, eBE, hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-Y132D, eA3A-BE3, SaKKH-BE3, Target-AID, dCas12a-BE, BEACON1, BEACON2, enAsBE, PBE, and A3A-PBE.
- the base editor protein is an adenosine base editor (ABE) protein.
- the ABE protein is selected from ABE7.10, ABE8e, ABE8e-V106W, LbABE8e, STEME-1, ABE-P1, ABE-P2, and rBE14.
- the protein having enzymatic activity is a methylase or a reverse transcriptase.
- the tRNA-like structure comprises an acceptor stem, a D-loop arm, and a TΨC-loop arm.
- the tRNA-like structure comprises one or more cleavage sites for RNase P, RNase Z, and/or RNase E.
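Whether a candidate TLS can form its acceptor stem can be screened by checking base pairing between the stem's 5' and 3' strands. The sketch below is a hypothetical helper, not a method from this disclosure: it assumes the two stem strands have already been identified (canonical tRNAs have a 7-bp acceptor stem), accepts Watson-Crick and G-U wobble pairs, and ignores the D-loop and TΨC-loop arms entirely.

```python
# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_mismatches(strand_5: str, strand_3: str) -> int:
    """Count unpaired positions in an antiparallel RNA stem.

    Both strands are written 5'->3'; base i of strand_5 is paired with base
    i counted from the 3' end of strand_3.
    """
    if len(strand_5) != len(strand_3):
        raise ValueError("stem strands must have the same length")
    partner = strand_3.upper()[::-1]  # read the 3' strand from its 3' end
    return sum((a, b) not in ALLOWED_PAIRS for a, b in zip(strand_5.upper(), partner))


print(stem_mismatches("GCCGGAU", "AUCCGGC"))  # 0 (fully paired 7-bp stem)
print(stem_mismatches("GCCGGAU", "AUCCGGA"))  # 1 (G opposite A)
```

A count of zero only says the stem can pair; recognition by RNase P and RNase Z also depends on the rest of the fold.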
- the tRNA-like structure is derived from a tRNA gene or a long non-coding RNA (lncRNA) gene.
- the long non-coding RNA (lncRNA) gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1).
- the tRNA-like structure is derived from a eukaryotic organism.
- the eukaryotic organism is selected from the group consisting of Saccharomyces cerevisiae, Arabidopsis thaliana, Oryza sativa, Homo sapiens, Macaca mulatta, Macaca fascicularis, Sus scrofa domestica, Canis lupus familiaris, Rattus norvegicus, and Mus musculus.
- the tRNA-like structure is encoded by any one of SEQ ID NOs: 4-7.
- the RNA comprises more than one non-coding RNA sequence, and the non-coding RNA sequences are the same or different.
- the RNA comprises more than one non-coding RNA sequence, each being a guide RNA (gRNA), wherein the gRNAs are the same or different.
- gRNAs guide RNAs
- the RNA comprises more than one non-coding RNA sequence, and the RNA comprises a tRNA-like structure (TLS) between the protein coding sequence and the nearest non-coding RNA sequence, and between each pair of adjacent non-coding RNA sequences.
- the protein coding sequence is located upstream relative to all non-coding RNA sequences.
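The arrangement described above can be expressed as a small layout function: a TLS between the protein coding sequence and the nearest non-coding RNA, a TLS between adjacent non-coding RNAs, and the coding sequence upstream of all non-coding RNAs. `assemble_tls_transcript` and the bracketed placeholder segments are hypothetical. The optional stabilizer stands in for the mRNA-stabilizing sequence; UTRs, the cap, and the poly(A) tail are omitted for brevity.

```python
from typing import Sequence


def assemble_tls_transcript(cds: str, ncrnas: Sequence[str], tls: str,
                            stabilizer: str = "") -> str:
    """Lay out 5'-CDS [stabilizer] TLS ncRNA1 [TLS ncRNA2 ...]-3'.

    A TLS sits between the CDS and the nearest ncRNA and between every
    pair of adjacent ncRNAs; the CDS stays upstream of all ncRNAs.
    """
    if not ncrnas:
        raise ValueError("at least one non-coding RNA is required")
    return cds + stabilizer + "".join(tls + ncrna for ncrna in ncrnas)


layout = assemble_tls_transcript("[CDS]", ["[gRNA1]", "[gRNA2]"], "[TLS]", "[TRIPLEX]")
print(layout)  # [CDS][TRIPLEX][TLS][gRNA1][TLS][gRNA2]
```

Adding a gRNA to the list adds exactly one more TLS, so the number of TLSs always equals the number of non-coding RNAs in this layout.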
- the RNA comprises at least one modified nucleotide.
- the present disclosure provides a DNA encoding any of the RNAs disclosed herein.
- the DNA further comprises an RNA polymerase promoter at the 5’ -end.
- the RNA polymerase promoter is a eukaryotic RNA polymerase II promoter.
- the present disclosure provides a vector comprising any of the DNAs disclosed herein.
- the vector is a viral vector or a plasmid.
- the present disclosure provides a system comprising any of the RNAs disclosed herein, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
- RNase P ribonuclease P
- RNase Z ribonuclease Z
- the present disclosure provides a system comprising any of the DNAs disclosed herein and/or any of the vectors disclosed herein, an RNA polymerase II or a polynucleotide encoding the same, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
- the present disclosure provides a composition comprising (1) any of the RNAs disclosed herein, any of the DNAs disclosed herein, and/or any of the vectors disclosed herein; and (2) a carrier.
- the carrier is selected from lipid nanoparticles, liposomes, cationic nanoemulsions, dendrimer-based lipid nanoparticles, cationic polymers, and polysaccharide particles.
- the present disclosure provides a gene editing system comprising
- an hgRNA comprising a first CRISPR motif, an hgRNA spacer, and a first protein-binding motif, or a DNA polynucleotide encoding the hgRNA
- an mgRNA comprising a second CRISPR motif and an mgRNA spacer, or a DNA polynucleotide encoding the mgRNA, wherein the mgRNA spacer targets a target gene,
- a first CRISPR-associated protein (Cas protein), or a polynucleotide encoding the first Cas protein, wherein the first Cas protein binds to the first CRISPR motif
- a second Cas protein or a polynucleotide encoding the second Cas protein, wherein the second Cas protein binds to the second CRISPR motif
- a first fusion protein comprising a nucleobase deaminase or a catalytic domain thereof and a first RNA binding domain, or a polynucleotide encoding the first fusion protein, wherein the nucleobase deaminase or the catalytic domain thereof and the first RNA binding domain are optionally connected by a linker, and wherein the first RNA binding domain binds to the first protein-binding motif.
- the first Cas protein and the second Cas protein are the same or different
- the gene editing system comprises a TLS-containing RNA or a DNA encoding the TLS-containing RNA
- the TLS-containing RNA comprises a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, and the first fusion protein
- the non-coding RNA sequence is the hgRNA or the mgRNA.
- a “TLS-containing RNA” refers to an RNA comprising at least one tRNA-like structure.
- a TLS-containing RNA comprises an mRNA and at least one non-coding RNA, wherein at least one tRNA-like structure is located between the mRNA and the non-coding RNA, and/or between the non-coding RNAs.
- the gene editing system further comprises
- a nucleobase deaminase inhibitor domain or a polynucleotide encoding the same
- the nucleobase deaminase inhibitor domain is connected, optionally by a linker, to the nucleobase deaminase or the catalytic domain thereof in the first fusion protein, and a cleavage site for the protease lies between the nucleobase deaminase inhibitor domain and the nucleobase deaminase or the catalytic domain thereof.
- the gene editing system further comprises a second fusion protein comprising the protease and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
- the protease and the second RNA binding domain are optionally connected by a linker
- the mgRNA further comprises a second protein-binding motif
- the second RNA binding domain binds to the second protein-binding motif
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- the protease is split into a first protease fragment and a second protease fragment, wherein the first and/or second protease fragment alone is not able to cleave the cleavage site.
- the gene editing system comprises
- a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker, and
- a third fusion protein comprising the second protease fragment and a third RNA binding domain, or a polynucleotide encoding the third fusion protein, wherein the second protease fragment and the third RNA binding domain are optionally connected by a linker,
- the mgRNA further comprises a second protein-binding motif and a third protein-binding motif
- the second RNA binding domain binds to the second protein-binding motif
- the third RNA binding domain binds to the third protein-binding motif
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, the second fusion protein, and the third fusion protein.
- the second and third RNA binding domains are the same or different, and the second and third protein-binding motifs are the same or different.
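The split-protease arrangement amounts to a co-localization rule: the inhibitor domain is removed only when both protease fragments are recruited to the same mgRNA, since neither fragment can cleave alone. The toy model below encodes just that rule. The motif labels, `MOTIF_TO_FUSION`, and `inhibitor_cleaved` are hypothetical names, and recruitment of the deaminase fusion itself is not modeled.

```python
from typing import Dict, Iterable, Set

# Hypothetical labels: the fusion protein each mgRNA motif recruits.
MOTIF_TO_FUSION: Dict[str, str] = {
    "second_motif": "protease_fragment_1",  # via the second RNA binding domain
    "third_motif": "protease_fragment_2",   # via the third RNA binding domain
}
REQUIRED_FRAGMENTS = {"protease_fragment_1", "protease_fragment_2"}


def recruited_fragments(mgrna_motifs: Iterable[str]) -> Set[str]:
    """Protease fragments brought to the mgRNA by its protein-binding motifs."""
    return {MOTIF_TO_FUSION[m] for m in mgrna_motifs if m in MOTIF_TO_FUSION}


def inhibitor_cleaved(mgrna_motifs: Iterable[str]) -> bool:
    """True only when both fragments co-localize, since neither cleaves alone."""
    return REQUIRED_FRAGMENTS <= recruited_fragments(mgrna_motifs)


for motifs in (["second_motif", "third_motif"], ["second_motif"], []):
    print(motifs, inhibitor_cleaved(motifs))
```

The design choice captured here is that activity requires an AND of two recruitment events, which is what restricts inhibitor removal to the site bound by the mgRNA.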
- the gene editing system further comprises
- a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein
- the first protease fragment and the second RNA binding domain are optionally connected by a linker
- the mgRNA further comprises a second protein-binding motif
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- the protease is a TEV protease, a TuMV protease, a PPV protease, a PVY protease, a ZIKV protease, or a WNV protease.
- the protease is a TEV protease comprising a sequence of SEQ ID NO: 261.
- the first TEV protease fragment comprises a sequence of SEQ ID NO: 262 or 263.
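For a TEV protease embodiment, the cleavage site between the inhibitor domain and the deaminase can be located in a protein sequence by pattern matching. This sketch assumes the commonly used TEV recognition site ENLYFQ followed by G or S, with cleavage after the Q; the exact site used in the embodiments may differ. `tev_cleave` is a hypothetical helper, and the fusion sequence in the example is a placeholder.

```python
import re
from typing import List

# Assumed canonical TEV protease site E-N-L-Y-F-Q-(G/S); cut after the Q.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(protein: str) -> List[str]:
    """Split a one-letter protein sequence after each canonical TEV site."""
    fragments, start = [], 0
    for match in TEV_SITE.finditer(protein):
        fragments.append(protein[start:match.end()])
        start = match.end()
    fragments.append(protein[start:])
    return fragments


# Hypothetical fusion: inhibitor placeholder, TEV site, deaminase placeholder.
fusion = "IIIII" + "ENLYFQG" + "DDDDD"
print(tev_cleave(fusion))  # ['IIIIIENLYFQ', 'GDDDDD']
```

The lookahead keeps the G or S on the downstream fragment, matching the assumed cut position between Q and the following residue.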
- the nucleotide deaminase is a cytidine deaminase.
- the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
- the nucleotide deaminase is an adenosine deaminase.
- the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA)
- TadA tRNA-specific adenosine deaminase
- the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
- the first fusion protein further comprises a uracil glycosylase inhibitor (UGI).
- UGI uracil glycosylase inhibitor
- the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9) selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, SpCas9-NG, xSpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1,
- nCas9 Cas9 nickase
- the first protein-binding RNA motif and the first RNA binding domain, the second protein-binding RNA motif and the second RNA binding domain, and the third protein-binding RNA motif and the third RNA binding domain are each independently selected from the group consisting of an MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof,
- MCP MS2 coat protein
- telomerase Ku binding motif and Ku protein or an RNA-binding section thereof
- telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof
- PP7 phage operator stem-loop and PP7 coat protein (PCP)
- a non-natural RNA aptamer and the corresponding aptamer ligand or an RNA-binding section thereof.
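Because each RNA binding domain must find its cognate motif, a design can be sanity-checked by matching motifs to domains. The mapping below covers the stem-loop and telomerase motif pairs listed above (aptamer-ligand pairs are omitted), and the motif labels and `unmatched_domains` are hypothetical names used only for illustration.

```python
from typing import Iterable, List

# Motif -> cognate RNA binding protein, following the pairs listed above.
COGNATE_RBD = {
    "MS2_stem_loop": "MCP",
    "PP7_stem_loop": "PCP",
    "Ku_binding_motif": "Ku",
    "Sm7_binding_motif": "Sm7",
}


def unmatched_domains(guide_motifs: Iterable[str], rbds: Iterable[str]) -> List[str]:
    """Return RNA binding domains that have no cognate motif on the guide."""
    available = {COGNATE_RBD[m] for m in guide_motifs if m in COGNATE_RBD}
    return [rbd for rbd in rbds if rbd not in available]


print(unmatched_domains(["MS2_stem_loop"], ["MCP"]))         # []
print(unmatched_domains(["PP7_stem_loop"], ["PCP", "MCP"]))  # ['MCP']
```

Using different pairs on the hgRNA and mgRNA (for example MS2/MCP on one and PP7/PCP on the other) is one way to keep each fusion protein on its intended guide; when the same pair is chosen twice, both guides can recruit the same fusion.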
- the mgRNA and/or the hgRNA comprise a dual-RNA structure.
- the dual-RNA structure is formed by a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA), wherein the crRNA comprises the spacer.
- crRNA CRISPR RNA
- tracrRNA trans-activating crRNA
- the mgRNA comprises a mcrRNA and a first tracrRNA
- the mcrRNA comprises the mgRNA spacer
- the hgRNA comprises a hcrRNA and a second tracrRNA
- the hcrRNA comprises the hgRNA spacer
- the first tracrRNA and the second tracrRNA are the same or different.
- the TLS-containing RNA comprises more than one non-coding RNA sequences.
- the target gene is a mammalian gene.
- the mgRNA spacer and the hgRNA spacer are respectively:
- the present disclosure provides a method for gene editing in a subject, comprising administering to the subject (1) any of the RNAs disclosed herein; and/or (2) any of the DNAs disclosed herein; and/or (3) any of the vectors disclosed herein; and/or (4) any of the systems disclosed herein; and/or (5) any of the compositions disclosed herein; and/or (6) any of the gene editing systems disclosed herein.
- the subject is a mammal. In some embodiments, the subject is a human.
- Fig. 1 is an illustration of polynucleotide constructs comprising one or more tRNA-like structures.
- Fig. 1A illustrates the transcription, RNase cleavage, and translation process of an mRNA linked with one single guide RNA (sgRNA).
- the coding sequence (CDS) of the gene editing protein is linked to an sgRNA via a tRNA-like structure.
- the transcript is cleaved at the tRNA-like structure by RNase P and RNase Z, which releases one mRNA encoding the gene editing protein and one sgRNA.
- the triplex sequence is used to stabilize the mRNA and to enhance its translation.
- FIG. 1B illustrates the transcription, RNase cleavage, and translation process of an mRNA linked with two or more identical or different sgRNAs.
- FIG. 1C illustrates the transcription, RNase cleavage, and translation process of a tBE system comprising the tRNA-like structures.
- a main sgRNA (msgRNA) and a helper sgRNA (hsgRNA) are linked via tRNA-like structures to the CDS of a tBE locator (e.g., nCas9) or a tBE effector & key (e.g., a nucleotide deaminase).
- a tBE locator e.g., nCas9
- a tBE effector & key e.g., a nucleotide deaminase
- Fig. 2 shows an application of tRNA-like structure in a CRISPR/Cas9 system as described in Example 1.
- Fig. 2A illustrates plasmid constructs used in the cell transfection.
- Fig. 2B shows gene editing efficiency at the BCL11A and KLF1 loci in 293FT cells.
- nucleic acids are written left to right in the 5' to 3' orientation.
- upstream means toward the 5' end
- downstream means toward the 3’ end.
- a first sequence is located upstream of a second sequence means that the first sequence is closer to the 5’ end than the second sequence.
- a first sequence is located downstream of a second sequence means that the first sequence is closer to the 3’ end than the second sequence.
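Because sequences are written left to right from 5' to 3', these orientation conventions reduce to comparing positions in a string. A minimal sketch follows, using a hypothetical helper `is_upstream` that compares the first occurrence of each subsequence in an illustrative transcript.

```python
def is_upstream(seq: str, first: str, second: str) -> bool:
    """True if `first` begins closer to the 5' end of `seq` than `second`.

    `seq` is written left to right, 5'->3'; the first occurrence of each
    subsequence is used.
    """
    i, j = seq.find(first), seq.find(second)
    if i < 0 or j < 0:
        raise ValueError("both subsequences must occur in seq")
    return i < j


transcript = "AUGCCCUAAGGGCCUU"  # hypothetical
print(is_upstream(transcript, "AUG", "GGG"))  # True
print(is_upstream(transcript, "GGG", "AUG"))  # False
```

"Downstream" is simply the converse: `second` is downstream of `first` exactly when `is_upstream(seq, first, second)` is true.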
- amino acid sequences are written left to right in the amino-to-carboxy orientation.
- “percent identity” and “% identity,” as applied to nucleic acid or polynucleotide sequences, refer to the percentage of residue matches between at least two nucleic acid or polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
- Percent identity between nucleic acid or polynucleotide sequences may be determined using a suite of commonly used and freely available sequence comparison algorithms provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, Md., and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/.
- NCBI National Center for Biotechnology Information
- BLAST Basic Local Alignment Search Tool
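Once an alignment exists, percent identity is a simple ratio. The sketch below assumes a pairwise alignment has already been produced (for example by BLAST) and applies one common convention: identical columns divided by aligned columns, with columns containing a gap in one row counted as non-identical. Tools differ in the denominator they use, so `percent_identity` is illustrative and will not necessarily reproduce a given program's reported value.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns with identical residues.

    Columns where either row has a gap count as non-identical; columns where
    both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


print(round(percent_identity("ACGT-A", "ACCTTA"), 2))  # 66.67
```

In the example, 4 of 6 columns are identical; the gap column counts against identity, which is how gaps inserted by the alignment algorithm are penalized under this convention.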
- Nucleic acid or polynucleotide sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al. (1991) Nucleic Acids Res 19:5081; Ohtsuka et al.
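The effect of codon degeneracy can be checked directly against the standard genetic code. In this sketch, `translate` and `synonymous_codons` are hypothetical helper names; the two example sequences differ at every third codon position yet encode the same tripeptide.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks stop codons)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(amino_acid: str) -> List[str]:
    """All codons that encode `amino_acid` under the standard code."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


# Two sequences that differ at every third position encode the same peptide.
print(translate("GCTAAAGGT"), translate("GCCAAGGGC"))  # AKG AKG
print(synonymous_codons("K"))  # ['AAA', 'AAG']
```

Amino acids with several synonymous codons (six for leucine, for instance) are where third-position substitutions leave the encoded protein unchanged.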
- nucleic acid refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides which have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides.
- nucleic acid is used interchangeably with polynucleotide, and (in appropriate contexts) gene, cDNA, and mRNA encoded by a gene.
- percent (%) amino acid sequence identity with respect to a peptide, polypeptide or protein sequence is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in another peptide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Percent amino acid sequence identity in the current disclosure is measured using BLAST software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
- amino acid substitution refers to the replacement of one amino acid in a polypeptide with another amino acid.
- Amino acid substitutions can be conservative or non-conservative substitutions.
- a conservative replacement (also called a conservative mutation or a conservative substitution) is an amino acid replacement in a protein that changes a given amino acid to a different amino acid with similar biochemical properties (e.g., charge, hydrophobicity, and size).
- Exemplary substitutions are shown in Table 1. Amino acid substitutions may be introduced into a protein of interest and the products screened for a desired activity, for example, retained/improved biological activity.
- Amino acids may be grouped according to common side-chain properties:
- polypeptide is intended to encompass a singular “polypeptide” as well as plural “polypeptides,” and refers to a molecule composed of monomers (amino acids) linearly linked by amide bonds (also known as peptide bonds).
- polypeptide refers to any chain or chains of two or more amino acids, and does not refer to a specific length of the product.
- “peptides,” “protein,” or any other term used to refer to a chain or chains of two or more amino acids are included within the definition of “polypeptide,” and the term “polypeptide” may be used instead of, or interchangeably with, any of these terms.
- polypeptide is also intended to refer to the products of post-expression modifications of the polypeptide, including without limitation glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, or modification by non-naturally occurring amino acids.
- a polypeptide may be derived from a natural biological source or produced by recombinant technology, but is not necessarily translated from a designated nucleic acid sequence. It may be generated in any manner, including by chemical synthesis.
- “encode” or “encoding,” as applied to polynucleotides, refers to a polynucleotide which is said to “encode” a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for the polypeptide and/or a fragment thereof.
- the antisense strand is the complement of such a nucleic acid, and the encoding sequence can be deduced therefrom.
- a “single guide RNA” refers to a synthetic or expressed RNA sequence that comprises a CRISPR binding motif and a spacer.
- a “spacer” is a DNA-targeting motif, that is, a sequence that is complementary to a specific target DNA region.
- the CRISPR binding motif of a guide RNA can bind to a Cas protein, and the DNA-targeting motif of the gRNA can guide the complex to a specific target location on a DNA.
- a guide RNA may further comprise one or more protein-binding motifs.
- a “fusion protein” is a protein comprising at least two domains that are encoded by separate genes that have been joined a single polypeptide.
- a fusion protein can comprise two domains that are encoded by separate genes that have been joined so that they are transcribed and translated as a single unit, producing a single polypeptide.
- the at least two domains are fused together directly.
- the domains are connected by one or more linkers.
- “Genetic modification” and its grammatical equivalents, as used herein, can refer to one or more alterations of a nucleic acid, e.g., the nucleic acid within an organism's genome.
- Genetic modification can refer to alterations, additions, and/or deletions of genes, portions of genes, or other nucleic acid sequences.
- A genetically modified cell can also refer to a cell with an added, deleted, and/or altered gene or portion of a gene.
- A genetically modified cell can also refer to a cell with an added nucleic acid sequence that is not a gene or gene portion.
- Genetic modifications include, for example, transient knock-in or knock-down mechanisms as well as mechanisms that result in permanent knock-in, knock-down, or knock-out of target genes, portions of genes, or other nucleic acid sequences. Genetic modifications also include, for example, reduced or increased transcription, reduced or increased mRNA stability, reduced or increased translation, and reduced or increased protein stability.
- “Composition” refers to any mixture of two or more products, substances, or compounds, including cells.
- “RNA” refers to a biomolecule composed of a chain of ribonucleotides, which are molecules made of a nitrogenous base, a ribose sugar, and a phosphate group.
- RNA molecules include, but are not limited to, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small non-coding RNA (ncRNA) such as microRNA (miRNA) and small interfering RNA (siRNA).
- “mRNA” or “messenger RNA” refers to an RNA molecule that comprises a protein-encoding sequence, which can be translated by ribosomes in the process of protein synthesis.
- “Non-coding RNA” refers to an RNA molecule that does not encode a protein and is not translated into a protein.
- “DNA” or “deoxyribonucleic acid” refers to a biomolecule composed of a chain of deoxyribonucleotides, which are molecules made of a nitrogenous base, a deoxyribose sugar, and a phosphate group.
- The term “variant,” in the context of a protein or polypeptide, refers to a protein or polypeptide that differs from the parent protein but retains essentially the same biological function or activity.
- In some embodiments, the parent protein is a wild-type protein.
- In some embodiments, the variant is a mutant.
- A “protein-binding RNA motif” refers to a sequence in an RNA molecule that is capable of binding to proteins.
- In some embodiments, the protein-binding RNA motif is capable of binding to a specific protein with high affinity and specificity.
- In some embodiments, the protein-binding RNA motif is an RNA aptamer or a variant thereof.
- “Enzymatic activity” refers to catalytic properties.
- A protein with enzymatic activity acts on substrate molecules and lowers the activation energy required for a chemical reaction by stabilizing the transition state. This stabilization speeds up the reaction so that it proceeds at a physiologically significant rate.
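The link between a lower activation energy and a faster reaction can be made concrete with the Arrhenius relation. The sketch below assumes Arrhenius behaviour with the same pre-exponential factor for the catalysed and uncatalysed reactions, so the rate enhancement is exp(ΔEa/RT); the function name is illustrative only.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def rate_enhancement(delta_ea_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold increase in the rate constant when Ea is lowered by delta_ea.

    Assumes k = A * exp(-Ea / (R * T)) with the same A for the catalysed and
    uncatalysed reactions, so k_cat / k_uncat = exp(delta_Ea / (R * T)).
    """
    return math.exp(delta_ea_kj_per_mol * 1000.0 / (R * temperature_k))

# At 25 °C, lowering the barrier by ~5.7 kJ/mol gives roughly a 10-fold speed-up;
# the effect compounds, so each further ~5.7 kJ/mol adds another factor of ~10.
print(f"{rate_enhancement(5.7):.1f}")  # 10.0
```

Because the dependence is exponential, even modest transition-state stabilization translates into large rate increases.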
- In some embodiments, a protein having enzymatic activity is an enzyme, a functional domain of the enzyme, or a variant thereof.
- Biological equivalents thereof are also provided.
- In some embodiments, the biological equivalents have at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with the reference protein.
- In some embodiments, the biological equivalents retain the desired activity of the reference protein.
- In some embodiments, the biological equivalents are derived by introducing one, two, three, four, five, or more amino acid additions, deletions, substitutions, or combinations thereof.
- In some embodiments, the substitution is a conservative amino acid substitution.
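Percent identity and the conservative/non-conservative distinction can be illustrated for the simplest case of an ungapped, equal-length comparison. This is a hypothetical sketch: real identity values are normally computed from a proper alignment, and the side-chain grouping used here is a commonly cited convention assumed for illustration, not a reproduction of Table 1.

```python
# Assumed side-chain grouping (a common convention, not taken from Table 1).
SIDE_CHAIN_GROUPS = {
    "hydrophobic": set("MAVLI"),
    "neutral_hydrophilic": set("CSTNQ"),
    "acidic": set("DE"),
    "basic": set("HKR"),
    "chain_orientation": set("GP"),
    "aromatic": set("WYF"),
}

def group_of(residue: str) -> str:
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")

def percent_identity(reference: str, variant: str) -> float:
    """Identity over an ungapped, equal-length comparison (substitutions only)."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(reference, variant))
    return 100.0 * matches / len(reference)

def substitutions(reference: str, variant: str) -> list[tuple[str, int, str, bool]]:
    """List (reference residue, 1-based position, new residue, is_conservative)."""
    return [
        (a, i, b, group_of(a) == group_of(b))
        for i, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b
    ]

ref = "MKTAYIAKQR"   # hypothetical reference fragment
var = "MKTAFIAKQK"   # hypothetical variant carrying Y5F and R10K
print(percent_identity(ref, var))   # 80.0
print(substitutions(ref, var))      # [('Y', 5, 'F', True), ('R', 10, 'K', True)]
```

An 80% identity with two within-group changes would fall in the lowest identity tier listed above while involving only conservative substitutions.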
- Transfer RNA (tRNA) is an essential part of the protein biosynthesis machinery, mediating the transfer of amino acids to ribosomes.
- An aminoacyl-tRNA synthetase (aaRS) catalyzes attachment of an amino acid to the 3’-CCA end of a tRNA.
- The charged tRNA enters the ribosome, where its anticodon interacts with one of the codons of a messenger RNA (mRNA).
- The majority of tRNAs are single polynucleotides with a cloverleaf secondary structure.
- The cloverleaf structure consists of an acceptor stem, a D-arm, an anticodon stem-loop, and a T-arm. It typically folds into an L-shaped three-dimensional structure.
- tRNA-like structures (TLSs) were first identified in the positive-strand RNA genome of turnip yellow mosaic virus (TYMV) in 1970 (Pierre et al., 1970). Since then, many more TLSs have been reported not only in plant viruses but also in other viruses, bacteria, and eukaryotes (Sherlock et al., 2021; Mans et al., 1992).
- “tRNA-like structure” refers to any RNA structure that can serve as a substrate for a tRNA-specific enzyme such as RNase P, RNase Z, or RNase E. In some embodiments, the tRNA-like structure can serve as a substrate for both RNase P and RNase Z.
- Ribonuclease P (RNase P) activity is ubiquitously required in all three domains of life (Archaea, Bacteria, and Eukarya) for the processing and maturation of tRNAs, together with RNase Z, which processes the 3’ ends of tRNA precursors (Altman et al., 1999; Gopalan et al., 2002).
- RNase P and RNase Z also process RNA targets other than tRNAs during their maturation, including two long non-coding RNAs (lncRNAs): metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) (Ji et al., 2003) and nuclear paraspeckle assembly transcript 1 (NEAT1) (Sunwoo et al., 2009).
- The MALAT1 gene is transcribed by RNA polymerase II, generating a precursor RNA transcript after cleavage/polyadenylation (Jeremy et al., 2008).
- This precursor RNA transcript is further cleaved by RNase P to form the mature MALAT1 transcript, which is located mainly in the nucleus (Jeremy et al., 2008).
- Although the mature MALAT1 transcript is regarded as non-coding, the triple helix structure at its 3’ terminus has been shown to support translation in a manner similar to a poly(A) tail (Jeremy et al., 2012; Jessica et al., 2014).
- The 3’ end product of RNase P cleavage is then cut by RNase Z (Jeremy et al., 2008).
- A CCA tail is then added to the 5’ end product of RNase Z cleavage, forming a mature 61-nt mascRNA (MALAT1-associated small cytoplasmic RNA), which is exported to the cytoplasm (Jeremy et al., 2008).
- mascRNA structurally resembles tRNA, and this tRNA-like structure is sufficient for RNase P and RNase Z cleavage (Jeremy et al., 2008).
- RNase E is the largest RNase in E. coli (118 kDa), comprising 1,061 amino acids, and is encoded by the E. coli rne gene. RNase E consists of two functionally distinct domains: the globular N-terminal half (NTH; residues 1-529) and the C-terminal half (CTH; residues 530-1,061). The RNase E-NTH contains the catalytic domain for RNA cleavage, including a specific RNA-binding site and a cleavage site. Because the NTH carries the RNA-processing activities, such as RNA recognition and degradation, recombinant RNase E comprising only amino acids 1-529 has been shown to be sufficient for catalytic activity.
- The CTH contains an arginine-rich domain, which is commonly involved in protein binding.
- The RNase E-CTH provides a scaffolding core for polynucleotide phosphorylase (PNPase), RNA helicase B, enolase, polyphosphate kinase, poly(A) polymerase, GroEL, and DnaK, which cooperate to direct RNA toward the degradosome (Baek et al., 2019).
- The present disclosure provides a novel approach of combining an mRNA and one or more non-coding RNAs in one transcript (the “precursor transcript”) and separating the components with one or more tRNA-like structures.
- The presence of the tRNA-like structure(s) makes this precursor transcript subject to cleavage by intracellular RNases such as RNase P and RNase Z. Upon cleavage, the tRNA-like structure(s) are cut off, and the full-length precursor transcript releases the mRNA molecule and the non-coding RNA(s).
- This novel design allows the non-coding RNAs to be transcribed from the same promoter as the mRNA, thus simplifying production of the non-coding RNA(s).
- This novel design can be applied in gene editing systems.
- The present disclosure provides a novel way to express a gene editing protein and a gRNA with high efficiency by combining the mRNA and the gRNA in one transcript and separating them with a tRNA-like structure. After cleavage by intracellular RNase P and RNase Z, the full-length precursor transcript releases one mRNA molecule encoding the gene editing protein and one or more gRNAs (Fig. 1). Because the mRNA and gRNA are linked in a single transcript, the gRNA does not need to be driven by a separate promoter in plasmids or viral vectors.
- In some embodiments, the expression cassette in plasmids or viral vectors comprises (a) a eukaryotic RNA Pol II promoter, (b) a coding sequence of a gene editing protein, (c) one or more triple helix structures (such as from lncRNA genes), (d) one or more tRNA-like structures (such as from lncRNA genes), (e) one or more gRNA sequences, and (f) a poly(A) signal.
- This novel design can be applied to transformer base editor systems.
- In some embodiments, a main guide RNA (mgRNA) and/or a helper guide RNA (hgRNA), and an mRNA encoding a Cas protein or a base editor protein, are encoded by one polynucleotide and separated by one or more tRNA-like structures.
- This novel design can be applied to gene editing systems for editing mammalian genes.
- The present disclosure provides an RNA comprising a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence.
- The tRNA-like structure connects the mRNA and the non-coding RNA sequences. Cleavage of the tRNA-like structure by RNase P and RNase Z releases the mRNA and the one or more non-coding RNA sequences.
- In some embodiments, the tRNA-like structure is encoded by a polynucleotide sequence of any one of SEQ ID NOs: 4-7.
- In some embodiments, the RNA comprises, from 5’-end to 3’-end, the protein coding sequence, the tRNA-like structure (TLS), and the non-coding RNA sequence.
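The processing of such a precursor transcript can be modelled as simple string surgery. The sketch below is an idealized illustration, not the disclosed method: it assumes the RNase P-like cut falls exactly at the 5' end of each TLS and the RNase Z-like cut exactly at its 3' end, ignores CCA addition, capping and polyadenylation, and uses placeholder sequences (the `Segment` type and its labels are hypothetical).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str  # hypothetical labels: "cds", "triple_helix", "tls", "grna"
    seq: str   # RNA sequence written 5'->3'

def process_precursor(segments: list[Segment]) -> list[str]:
    """Excise every TLS and return the released RNAs in 5'->3' order.

    Idealized model: an RNase P-like cut at the 5' boundary and an RNase Z-like
    cut at the 3' boundary of each TLS, so each TLS is simply dropped and the
    sequences between TLSs are released as separate molecules.
    """
    products: list[str] = []
    current: list[str] = []
    for seg in segments:
        if seg.kind == "tls":
            if current:                      # release the piece 5' of this TLS
                products.append("".join(current))
            current = []
        else:
            current.append(seg.seq)
    if current:                              # piece 3' of the last TLS
        products.append("".join(current))
    return products

# Placeholder sequences only; none of these are the disclosed SEQ ID NOs.
precursor = [
    Segment("cds", "AUGGCUAAAUAA"),
    Segment("triple_helix", "GAAAGG"),
    Segment("tls", "GCGCCGAGG"),
    Segment("grna", "GUCACCUCCAAUGACUAGGG"),
    Segment("tls", "GCGCCGAGG"),
    Segment("grna", "ACGUACGUACGUACGUACGU"),
]
print(process_precursor(precursor))
# ['AUGGCUAAAUAAGAAAGG', 'GUCACCUCCAAUGACUAGGG', 'ACGUACGUACGUACGUACGU']
```

In this model the released mRNA keeps its downstream triple helix, while each gRNA between or after the TLSs is released as a separate molecule.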
- In some embodiments, the non-coding RNA sequence is a guide RNA (gRNA) sequence.
- In some embodiments, the guide RNA (gRNA) comprises a spacer sequence and a scaffold sequence, wherein the scaffold sequence is capable of binding to the protein encoded by the protein coding sequence, and wherein the spacer sequence targets a target gene.
- In some embodiments, the spacer sequence is 20 nucleotides (nt) long.
- In some embodiments, the spacer sequence is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nt long. The spacer sequence can be changed for different targets.
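A spacer is typically derived from the protospacer on the non-target DNA strand (T replaced by U), which makes it complementary to the target strand, and is then joined to a protein-specific scaffold. The sketch below checks the 8-30 nt range listed above; the function names, the spacer-first default orientation, and the scaffold fragment are assumptions for illustration.

```python
VALID_RNA = set("ACGU")

def spacer_from_protospacer(protospacer_dna: str) -> str:
    """Convert a protospacer (non-target strand, DNA, 5'->3') to spacer RNA.

    The spacer shares the protospacer sequence (T -> U) and is therefore
    complementary to the target strand.
    """
    return protospacer_dna.upper().replace("T", "U")

def build_grna(spacer: str, scaffold: str, spacer_at_5prime: bool = True,
               min_len: int = 8, max_len: int = 30) -> str:
    """Join a spacer and a scaffold after checking the 8-30 nt length range."""
    if not set(spacer) <= VALID_RNA:
        raise ValueError("spacer must contain only A, C, G and U")
    if not min_len <= len(spacer) <= max_len:
        raise ValueError(f"spacer is {len(spacer)} nt; expected {min_len}-{max_len} nt")
    # Whether the spacer sits 5' or 3' of the scaffold depends on the editing
    # protein, so it is left as a parameter in this sketch.
    return spacer + scaffold if spacer_at_5prime else scaffold + spacer

SCAFFOLD = "GUUUUAGAGCUAGAAAUAGC"  # placeholder fragment, not a complete scaffold
spacer = spacer_from_protospacer("GACGCATAAAGATGAGACGC")  # hypothetical target
grna = build_grna(spacer, SCAFFOLD)
print(spacer)      # GACGCAUAAAGAUGAGACGC
print(len(grna))   # 40
```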
- The scaffold sequence is designed according to the specific gene editing protein.
- In some embodiments, the target gene is a mammalian gene. In some embodiments, the target gene is a human gene.
- In some embodiments, the spacer sequence in the gRNA is selected from SEQ ID NOs: 204-260.
- In some embodiments, the scaffold sequence comprises at least one protein-binding motif, wherein the protein-binding motif is an RNA aptamer motif or a variant thereof.
- Aptamers are single-stranded oligonucleotides that fold into defined architectures and selectively bind to a specific target, including proteins, peptides, carbohydrates, small molecules, toxins, and even live cells.
- In some embodiments, the protein-binding motif is selected from MS2, PP7, boxB, the SfMu hairpin motif, the telomerase Ku binding motif, and the telomerase Sm7 binding motif, or a variant thereof.
- The MS2 phage operator stem-loop binds to the MS2 coat protein (MCP).
- The boxB stem-loop binds to the N22 peptide (N22p).
- The telomerase Ku binding motif binds to the Ku protein.
- The telomerase Sm7 binding motif binds to the Sm7 protein.
- The PP7 phage operator stem-loop binds to the PP7 coat protein (PCP).
- The SfMu phage Com stem-loop binds to the Com RNA-binding protein (Table 2).
- In some embodiments, the RNA further comprises an mRNA-stabilizing sequence between the protein coding sequence and the tRNA-like structure (TLS).
- An “mRNA-stabilizing sequence” refers to a sequence that enhances mRNA stability. In some embodiments, the mRNA-stabilizing sequence lowers the in vivo degradation rate of the mRNA. In some embodiments, the mRNA-stabilizing sequence enhances mRNA translation.
- In some embodiments, the mRNA-stabilizing sequence is a triple helix sequence, a poly(A), or a histone stem-loop.
- “RNA triple helix” refers to a specific RNA tertiary interaction in which double-stranded RNA stems make hydrogen-bond contacts with a third strand of RNA.
- An RNA triple helix consists of three strands: a Watson-Crick RNA double helix whose major groove forms hydrogen bonds with the so-called “third strand” (Brown et al., 2020).
- The triple helix structure located downstream of the protein coding sequence is used to stabilize the mRNA and promote translation.
- A poly(A) sequence is a long chain of adenine nucleotides. Poly(A) is described in detail below.
- A histone stem-loop is a stem-loop structure in the 3’-UTR of histone mRNAs, for example, metazoan histone mRNAs. The histone stem-loop can be bound by the 31 kDa stem-loop binding protein (SLBP).
- In some embodiments, the triple helix sequence is derived from a long non-coding RNA (lncRNA) gene.
- Long non-coding RNAs are a type of RNA generally defined as transcripts longer than 200 nucleotides that are not translated into protein.
- Long non-coding RNAs include, for example, intergenic lncRNAs (lincRNAs), intronic ncRNAs, and sense and antisense lncRNAs.
- In some embodiments, the lncRNA gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1).
- In some embodiments, the triple helix is a MALAT1 triple helix of SEQ ID NO: 1 or 3.
- In some embodiments, the triple helix is a NEAT1 triple helix of SEQ ID NO: 2.
- In some embodiments, the RNA further comprises a 5’-untranslated region (5’-UTR) at the 5’-end of the RNA and/or a 3’-untranslated region (3’-UTR) at the 3’-end of the RNA.
- In some embodiments, the RNA further comprises a poly(A) sequence at the 3’-end of the RNA and/or a 5’-cap structure at the 5’-end of the RNA.
- An untranslated region (UTR) is a regulatory region situated at the 5’ or 3’ end of a coding region.
- The 5’ UTR lies directly upstream of the initiation codon of the protein coding sequence.
- In some embodiments, a 5’ cap structure is further added to the 5’ UTR.
- The 3’ UTR immediately follows the translation termination codon of the protein coding sequence.
- The 3’ UTR often contains regions for post-transcriptional regulation, such as polyadenylation, localization, and stability of the mRNA.
- In some embodiments, the 3’ UTR is further followed by a poly(A) tail.
- The 5’-UTR is mainly involved in translation of its downstream open reading frame, while the 3’-UTR mainly functions to maintain mRNA stability.
- In some embodiments, the RNA sequence further comprises a 5’ cap at its 5’ end.
- A “5’ cap” refers to a specially altered nucleotide on the 5’ end of the RNA sequence.
- Three main cap structures are possible: cap 0, cap 1, and cap 2.
- The cap 0 structure is the most elementary, namely m7GpppNp; however, a cap 0 mRNA is likely to be recognized as exogenous RNA by the host, which could stimulate the host innate immune response and ultimately trigger inflammatory responses.
- The cap 1 structure (m7GpppN1mp) has a methylated 2’-OH on the first nucleotide adjacent to the cap at the 5’ end of the mRNA. Because the cap 1 structure has so far been described only in eukaryotic mRNAs, it can serve as a signature of self-RNA, thus reducing activation of pattern recognition receptors (PRRs) and consequently improving the translation efficiency of the mRNA in vivo.
- The cap 2 structure (m7GpppN1mpN2mp) has a methylated 2’-OH on both the first and second nucleotides adjacent to the cap, and this methylation improves mRNA translation efficiency.
- In some embodiments, the 5’ cap has a cap 1 structure.
- In some embodiments, the 5’ cap is methylated, e.g., m7GpppN, wherein N is the 5’ terminal nucleotide of the nucleic acid carrying the 5’ cap.
- In some embodiments, the 5’ cap structure is selected from glyceryl, inverted deoxy abasic residue, 4’,5’-methylene nucleotide, 1-(beta-D-erythrofuranosyl) nucleotide, 4’-thio nucleotide, carbocyclic nucleotide, 1,5-anhydrohexitol nucleotide, L-nucleotides, alpha-nucleotide, modified base nucleotide, threo-pentofuranosyl nucleotide, acyclic 3’,4’-seco nucleotide, acyclic 3,4-dihydroxybutyl nucleotide, acyclic 3,5-dihydroxypentyl nucleotide, 3’-3’-inverted nucleotide moiety, 3’-3’-inverted abasic moiety, 3’-2’-inverted nucleotide moiety
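- Cap formation involves RNA 5’-triphosphatase (RTPase), which removes the γ-phosphate from the 5’-triphosphate end of the nascent RNA, and guanylyltransferase (GTase), which transfers guanosine monophosphate (GMP) from GTP to the resulting 5’-diphosphate end to form GpppN.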
- The guanosine moiety is then methylated by a cap-specific S-adenosylmethionine (AdoMet)-dependent (guanine-N7) methyltransferase (N7MTase), forming a cap 0 structure (m7GpppNp).
- The cap 0 structure can be further modified to cap 1 (m7GpppN1mp) by 2’-O-methyltransferase (2’-O-MTase).
- In some embodiments, the mRNA further comprises a poly(A) tail at its 3’ end.
- In some embodiments, the poly(A) signal is located downstream of all the non-coding RNA sequences and is used to terminate transcription from the RNA Pol II promoter.
- Frequently used poly(A) signals include, but are not limited to, BGHpA, TKpA, hGHpA, and SV40pA.
- The poly(A) tail plays an important role in maintaining mRNA stability and translation efficiency; it improves mRNA stability by inhibiting exonuclease-mediated mRNA degradation.
- The poly(A) tail can also bind multiple poly(A)-binding proteins (PABPs), which work synergistically with the 5’ m7G cap to regulate translational efficiency.
- Polyadenylation can be achieved either by traditional enzymatic polyadenylation, which adds the poly(A) tail to the 3’ end of the mRNA, or by designing a fixed-length poly(A) sequence on the DNA template and transcribing it into a poly(A) tail of controlled length.
- The poly(A) tail is a long chain of adenine nucleotides added to the 3’ end of an mRNA molecule.
- In some embodiments, the length of the poly(A) tail is at least 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, or 500 nucleotides. In some embodiments, the length of the poly(A) tail is adjusted to control the stability of the mRNA molecule disclosed herein. For example, because the length of the poly(A) tail can influence the half-life of the mRNA molecule, it can be adjusted to modify the resistance of the mRNA to nucleases and thereby control the time course of protein expression.
- In some embodiments, the RNA further comprises a sequence encoding a nuclear localization signal (NLS), wherein the sequence encoding the NLS is located at the 5’-end and/or 3’-end of the protein coding sequence.
- In some embodiments, the RNA comprises more than one sequence each encoding a nuclear localization signal (NLS), wherein the encoded nuclear localization signals are the same or different.
- Nuclear localization signals are generally short peptides that act as signals mediating the transport of proteins from the cytoplasm into the nucleus.
- The NLS is recognized by the corresponding nuclear transport receptors, which interact with nucleoporins to help NLS-containing proteins enter the nucleus through nuclear pore complexes.
- Multiple NLSs have been identified in the art (Lu et al., 2021).
- Classical NLSs include monopartite NLSs (MP NLSs) and bipartite NLSs (BP NLSs).
- An MP NLS is a single cluster of 4-8 basic amino acids, which generally contains 4 or more positively charged residues, that is, arginine (R) or lysine (K).
- The characteristic motif of an MP NLS is usually defined as K(K/R)X(K/R), where X can be any residue.
- For example, the NLS of the SV40 large T-antigen is PKKKRKV (residues 126-132), with five consecutive positively charged amino acids (KKKRK).
- BP NLSs are characterized by two clusters of 2-3 positively charged amino acids separated by a 9-12-amino-acid linker region, which contains several proline (P) residues [16].
- The consensus sequence can be expressed as R/K(X)10-12KRXK.
- The upstream and downstream clusters of basic amino acids are interdependent and indispensable, and together determine the localization of the protein in the cell.
- For example, the BP NLS at the C-terminus of nucleoplasmin, KRPAATKKAGQAKKKK (residues 155-170), can guide the protein into the nucleus.
- There are also non-classical NLSs, such as the proline-tyrosine category, named PY-NLS.
- A PY-NLS spans 20-30 amino acids that adopt a disordered structure, consisting of N-terminal hydrophobic or basic motifs and a C-terminal R/K/H(X)2-5PY motif (where (X)2-5 is any sequence of 2-5 residues).
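The consensus motifs above can be scanned with regular expressions to flag candidate NLSs. This is a simplified sketch assuming one-letter amino-acid strings; it encodes only the MP core K(K/R)X(K/R) and the PY-NLS C-terminal R/K/H(X)2-5PY motif, a hit is a candidate rather than evidence of nuclear import, and the test peptide GGRSSGPYGG is hypothetical.

```python
import re

# Simplified patterns transcribed from the motif descriptions:
#   monopartite (MP) NLS core:  K(K/R)X(K/R)
#   PY-NLS C-terminal motif:    R/K/H(X)2-5PY
# The zero-width lookahead lets overlapping occurrences all be reported.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")
PY_C_TERMINAL = re.compile(r"(?=([RKH].{2,5}PY))")

def find_motifs(pattern: re.Pattern, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]

print(find_motifs(MONOPARTITE, "PKKKRKV"))       # [(2, 'KKKR'), (3, 'KKRK')]
print(find_motifs(PY_C_TERMINAL, "GGRSSGPYGG"))  # [(3, 'RSSGPY')]
```

Positions are relative to the query string, so the SV40 hits at positions 2 and 3 correspond to residues 127 and 128 of the large T-antigen.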
- In some embodiments, the protein coding sequence encodes a nucleotide binding protein.
- In some embodiments, the nucleotide binding protein is a DNA binding protein.
- In some embodiments, the nucleotide binding protein is an RNA binding protein.
- A “nucleotide binding protein” refers to a protein capable of binding to a single-stranded or double-stranded polynucleotide, for example, DNA or RNA.
- An “RNA binding protein” refers to a protein capable of binding to double- or single-stranded RNA to form a ribonucleoprotein complex.
- Classic RNA binding proteins (RBPs) are characterized by the presence of one or more RNA-binding domains (RBDs).
- Most RBDs have defined 3D structures or features that make them computationally predictable.
- Classic RBDs include the prevalent RNA recognition motif (RRM), the K-homology (KH) domain, DEAD/DEAH-box helicase domains, zinc-finger domains, and around 30 other domains of lesser abundance.
- Recent unbiased RNA interactome approaches have revealed additional unconventional RBPs that lack discernible RBDs but frequently contain intrinsically disordered regions or mononucleotide- and dinucleotide-binding domains that directly engage in RNA binding (Gebauer et al., 2021).
- In some embodiments, the RNA binding protein is an RNA-guided gene editing protein.
- RNA-guided systems, which use complementarity between a guide RNA and target nucleic acid sequences to recognize genetic elements, have a central role in biological processes in both prokaryotes and eukaryotes.
- CRISPR-Cas systems are well-studied prokaryotic RNA-guided systems.
- OMEGA (obligate mobile element-guided activity) is a newly identified class of RNA-guided systems. It comprises an RNA-guided endonuclease protein (for example, TnpB, IscB, or IsrB) and a non-coding RNA (ncRNA) transcribed from the transposon end region (called ωRNA).
- OMEGA systems are thought to be the ancestors of CRISPR-Cas systems, and TnpB is thought to have evolved into the single RNA-guided endonuclease Cas12. TnpB also has remote homology with the Fanzor protein.
- In some embodiments, the RNA binding protein is an RNA-guided endonuclease protein in the CRISPR-Cas system. In some embodiments, the RNA binding protein is an RNA-guided endonuclease protein in the OMEGA system.
- In some embodiments, the RNA binding protein is an IscB protein or a variant thereof.
- In some embodiments, the IscB protein is OgeuIscB or AwaIscB.
- In some embodiments, the IscB protein or variant thereof has an amino acid sequence of SEQ ID NO: 202 or 203.
- Transposon-encoded IscB family proteins are RNA-guided nucleases in the OMEGA system and likely ancestors of the RNA-guided nuclease Cas9 in the type II CRISPR-Cas adaptive immune system.
- IscB associates with its cognate ωRNA to form a ribonucleoprotein complex that cleaves double-stranded DNA targets complementary to an ωRNA guide segment.
- IscB (insertion sequences Cas9-like OrfB) proteins are encoded in a distinct family of IS200/IS605 transposons.
- IscB and Cas9 share the RuvC-like nuclease domain containing three conserved catalytic motifs (RuvC-I to RuvC-III), with an inserted Arg-rich segment known as the bridge helix (BH), and the HNH nuclease domain. IscB (~400 residues) is much smaller than Cas9 (~1,000-1,400 residues), mainly because it lacks the α-helical recognition (REC) lobe. Unlike Cas9, IscB contains an amino-terminal PLMP domain (named after its distinctive amino-acid motif).
- IscB associates with a ~200-400-nt non-coding RNA (referred to as ωRNA), which is substantially larger than the ~100-nt crRNA:tracrRNA guides of Cas9, to form a ribonucleoprotein complex that cleaves dsDNA targets complementary to a 5’ guide sequence in the ωRNA.
- IscB requires a target adjacent motif (TAM) for target DNA recognition, although its carboxy-terminal region lacks detectable sequence similarity to the equivalent PAM-interacting (PI) carboxy-terminal domain of Cas9.
- In some embodiments, the RNA binding protein is a TnpB or a variant thereof.
- TnpB protein, encoded by the tnpB gene, is a programmable RNA-guided DNA endonuclease. TnpB cleaves double- and single-stranded DNA substrates in an RNA-guided manner (Karvelis et al., 2021).
- In some embodiments, the RNA binding protein is an IsrB or a variant thereof.
- IsrB is a nickase homologous to IscB that lacks the HNH nuclease domain.
- IsrB protein is encoded in the IS200/IS605 superfamily of transposons. IsrB consists of only around 350 amino acids, but its small size is counterbalanced by a relatively large RNA guide (roughly 300-nt ωRNA) (Hirano et al., 2022).
- In some embodiments, the RNA binding protein is a Fanzor (Fz) protein or a variant thereof.
- Fanzor (Fz) is a eukaryotic TnpB-IS200/IS605-like protein encoded by transposable elements. It was initially suggested that Fz proteins (and prokaryotic TnpBs) regulate transposable element activity, possibly through methyltransferase activity. It has since been reported that Fanzor can cleave DNA in an RNA-guided manner (Saito et al., 2023).
- In some embodiments, the RNA binding protein is a Cas protein or a variant thereof.
- In some embodiments, the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9).
- Cas9, a programmable RNA-guided DNA endonuclease, is the effector component of type II CRISPR-Cas adaptive immune systems.
- Cas9 associates with a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) (or a synthetic single-guide RNA) and, using its RuvC and HNH nuclease domains, cleaves double-stranded DNA (dsDNA) targets complementary to the ~20-nucleotide (nt) crRNA guide segment derived from CRISPR spacers.
- Cas9 requires a specific nucleotide motif adjacent to the target sequence, the protospacer adjacent motif (PAM), for DNA recognition.
- Cas9 proteins, such as Streptococcus pyogenes Cas9 (SpCas9), exhibit robust DNA cleavage activity in mammalian cells and have been harnessed for a variety of molecular technologies, including genome editing, base editing, and transcriptional regulation.
- In some embodiments, the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR Cas9, EQR Cas9, VRER Cas9, Cas9-NG, xCas9, eCas9, SpCas9-HF1, HypaCas9, HiFiCas9, sniper-Cas9, SpG, SpRY, KKH SaCas9, CjCas9, Cas9-NRRH, Cas9-NRCH, Cas9-NRTH, SsCpf1, PcCpf1, BpCpf1, LiCpf1, PmCpf1, Lb2Cpf1, PbCpf1, PeCpf1, PdCpf1, MbCpf1, EeC
- “Cas protein” or “clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein” refers to RNA-guided DNA endonuclease enzymes associated with the CRISPR adaptive immunity system in Streptococcus pyogenes, as well as in other bacteria.
- Cas proteins include Cas9 proteins, Cas12a (Cpf1) proteins, Cas12b (formerly known as C2c1) proteins, Cas13 proteins, and various engineered counterparts. Table 3 lists exemplary Cas proteins.
- In some embodiments, the protein having enzymatic activity is a cytidine deaminase or a variant thereof.
- “Cytidine deaminase” refers to enzymes that catalyze the hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. Cytidine deaminases maintain the cellular pyrimidine pool.
- One family of cytidine deaminases is APOBEC (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like”). Members of this family are C-to-U editing enzymes.
- APOBEC family members have two domains: one is the catalytic domain, and the other is a pseudocatalytic domain. More specifically, the catalytic domain is a zinc-dependent cytidine deaminase domain and is important for cytidine deamination.
- RNA editing by APOBEC-1 requires homodimerization, and this complex interacts with RNA binding proteins to form the editosome.
- Non-limiting examples of APOBEC proteins include APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced (cytidine) deaminase (AID).
- Mutants of APOBEC proteins that confer different editing characteristics on base editors are also known.
- Such mutants include, e.g., W98Y, Y130F, Y132D, W104A, D131Y, and P134Y.
- The term APOBEC, and each of its family members, also encompasses variants and mutants that have a certain level (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%) of sequence identity to the corresponding wild-type APOBEC protein or its catalytic domain and retain cytidine deaminase activity.
- The variants and mutants can be derived with amino acid additions, deletions, and/or substitutions. Such substitutions, in some embodiments, are conservative substitutions.
- In some embodiments, the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- In some embodiments, the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
- In some embodiments, the cytidine deaminase is a naturally occurring cytidine deaminase, an engineered cytidine deaminase, an evolved cytidine deaminase, or an adenosine deaminase that possesses cytidine deaminase activity.
- In some embodiments, the cytidine deaminase is a human or mouse cytidine deaminase.
- In some embodiments, the protein having enzymatic activity is an adenosine deaminase or a variant thereof.
- “Adenosine deaminase” refers to an enzyme of purine metabolism that catalyzes the irreversible deamination of adenosine and deoxyadenosine to inosine and deoxyinosine, respectively.
- In some embodiments, the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA)
- In some embodiments, the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
- In some embodiments, the adenosine deaminase is a naturally occurring adenosine deaminase, an engineered adenosine deaminase, an evolved adenosine deaminase, or an adenosine deaminase that possesses adenosine deaminase activity.
- In some embodiments, the adenosine deaminase is a human or mouse adenosine deaminase.
- In some embodiments, the protein coding sequence encodes a base editor protein.
- “Base editor protein” refers to a class of gene editing enzymes comprising an RNA-guided gene editing protein fused or linked to a nucleotide deaminase or a catalytic domain thereof.
- In some embodiments, a base editor protein comprises a Cas9 nickase fused to a cytidine deaminase or an adenosine deaminase.
- In some embodiments, the base editor protein is a cytidine base editor (CBE) protein.
- In some embodiments, the CBE protein is selected from BE3, YE1-BE3, YEE-BE3, BE4, eBE, hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-Y132D, eA3A-BE3, SaKKH-BE3, Target-AID, dCas12a-BE, BEACON1, BEACON2, enAsBE, PBE, and A3A-PBE.
- In some embodiments, the base editor protein is an adenosine base editor (ABE) protein.
- the ABE protein is selected from ABE7.10, ABE8e, ABE8e-V106W, LbABE8e, STEME-1, ABE-P1, ABE-P2, and rBE14.
- the protein having enzymatic activity is a methylase or a reverse transcriptase.
- Methylase, also called methyltransferase, adds methyl groups (-CH3) to adenine or cytosine bases within its recognition sequence, which is thus modified and protected from cleavage by the corresponding restriction endonuclease.
- the methylase is selected from DNMT1, DNMT3a1, DNMT3a2, and DNMT3b.
- Reverse transcriptase is an RNA-dependent DNA polymerase.
- the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (MMLV-RT), or a functional variant thereof.
- the tRNA-like structure comprises an acceptor stem, a D-loop arm, and a TΨC-loop arm.
- the acceptor stem is a 7- to 9-base-pair (bp) stem formed by base pairing of the 5’-terminal nucleotides with the 3’-terminal nucleotides (the 3’ end contains the CCA 3’-terminal group used to attach the amino acid).
- the acceptor stem may contain non-Watson-Crick base pairs.
- the D arm is a 4- to 6-bp stem ending in a loop (the D loop) that often contains dihydrouridine.
- the TΨC loop (generally called the T loop) contains thymine, a base usually found in DNA, and pseudouridine (Ψ).
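The acceptor-stem description, including its allowance for non-Watson-Crick pairs, can be checked on a candidate sequence by pairing the 5’-terminal nucleotides with the 3’-terminal ones. The Python sketch below makes two assumptions chosen to match a canonical tRNA layout. First, the stem is 7 bp. Second, the last four nucleotides (a discriminator base followed by CCA) are unpaired and are excluded from the stem. Only G·U is counted as a tolerated non-Watson-Crick pair. The example sequence is synthetic.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}  # tolerated non-Watson-Crick pair (assumption)


def acceptor_stem_pairs(trna: str, stem_len: int = 7, tail_len: int = 4) -> dict:
    """Classify the base pairs of an acceptor stem.

    Assumes the last `tail_len` nucleotides (discriminator base + CCA) are unpaired
    and that the first `stem_len` nucleotides pair with the `stem_len` nucleotides
    immediately upstream of that tail.
    """
    seq = trna.upper().replace("T", "U")
    if len(seq) < 2 * stem_len + tail_len:
        raise ValueError("sequence too short for the requested stem")
    five_prime = seq[:stem_len]
    end = len(seq) - tail_len
    three_prime = seq[end - stem_len:end]
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    # The 5' strand read 5'->3' pairs with the 3' strand read 3'->5'.
    for a, b in zip(five_prime, reversed(three_prime)):
        if (a, b) in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif (a, b) in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Synthetic example: 7-bp stem, arbitrary interior, discriminator A + CCA tail.
example = "GCGGAUU" + "A" * 60 + "AAUCCGC" + "ACCA"
print(acceptor_stem_pairs(example))  # {'watson_crick': 7, 'wobble': 0, 'mismatch': 0}
```

Changing one C on the 3’ strand to U turns a G·C pair into a G·U wobble. Under this scheme, that still counts as a paired position.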
- the tRNA-like structure comprises a cleavage site for one or more of RNase P, RNase Z, and RNase E. In some embodiments, the tRNA-like structure comprises cleavage sites for both RNase P and RNase Z.
- the tRNA-like structure is derived from a tRNA gene or a long non-coding RNA (lncRNA) gene.
- the long non-coding RNA (lncRNA) gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1) .
- the tRNA-like structure is derived from a eukaryotic organism.
- the eukaryotic organism is selected from the group consisting of Saccharomyces cerevisiae, Arabidopsis thaliana, Oryza sativa, Homo sapiens, Macaca mulatta, Macaca fascicularis, Sus scrofa domestica, Canis lupus familiaris, Rattus norvegicus, and Mus musculus.
- the tRNA-like structure is encoded by any one of SEQ ID NOs: 4-7.
- the tRNA-like structure has a nucleotide sequence with at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the sequence encoded by any one of SEQ ID NOs: 4-7.
- the RNA comprises one or more protein coding sequences.
- the RNA comprises more than one non-coding RNA sequence, and the non-coding RNA sequences are the same or different. In some embodiments of the RNA disclosed herein, the RNA comprises one non-coding RNA sequence. In some embodiments of the RNA disclosed herein, the RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 non-coding RNA sequences.
- the RNA comprises more than one non-coding RNA sequence, wherein the non-coding RNA sequences are guide RNAs (gRNAs) that are the same or different.
- the RNA comprises one gRNA.
- the RNA comprises more than one gRNAs.
- the RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 gRNAs.
- the RNA comprises more than one non-coding RNA sequence and comprises a tRNA-like structure (TLS) between the protein coding sequence and the nearest non-coding RNA sequence, and between each pair of adjacent non-coding RNA sequences.
- the protein coding sequence is located upstream relative to all non-coding RNA sequences.
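The arrangement described above can be sketched in Python. In this arrangement, the protein coding sequence comes first, one TLS separates it from the nearest non-coding RNA, and one TLS separates each pair of adjacent non-coding RNAs. The model is idealized. It assumes, for illustration only, that RNase P cuts exactly at the 5’ boundary of each TLS and RNase Z exactly at its 3’ boundary, so that every flanking segment is released as a separate molecule. Real processing sites may differ. The `Segment` class, the function names, and the short sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    seq: str
    is_tls: bool = False


def build_transcript(cds: Segment, ncrnas: list[Segment], tls: Segment) -> list[Segment]:
    """CDS first, then a TLS before every ncRNA (CDS|TLS|nc1|TLS|nc2 ...)."""
    layout = [cds]
    for ncrna in ncrnas:
        layout.extend([tls, ncrna])
    return layout


def tls_cut_sites(layout: list[Segment]) -> list[tuple[int, int]]:
    """(RNase P site, RNase Z site) for each TLS, as 0-based transcript positions.

    Idealized: RNase P cuts at the TLS 5' boundary, RNase Z at its 3' boundary.
    """
    sites, pos = [], 0
    for seg in layout:
        if seg.is_tls:
            sites.append((pos, pos + len(seg.seq)))
        pos += len(seg.seq)
    return sites


def released_fragments(layout: list[Segment]) -> list[str]:
    """Non-TLS pieces left after both cuts at every TLS."""
    transcript = "".join(seg.seq for seg in layout)
    fragments, start = [], 0
    for p_site, z_site in tls_cut_sites(layout):
        fragments.append(transcript[start:p_site])  # upstream piece ends at the RNase P cut
        start = z_site  # downstream piece begins after the RNase Z cut
    fragments.append(transcript[start:])
    return fragments


cds = Segment("cds", "AUGAAAUAA")
tls = Segment("tls", "GGGG", is_tls=True)
grnas = [Segment("gRNA1", "CCCC"), Segment("gRNA2", "UUUU")]
print(released_fragments(build_transcript(cds, grnas, tls)))  # ['AUGAAAUAA', 'CCCC', 'UUUU']
```

Placing a TLS before every non-coding RNA, rather than only after the coding sequence, is what lets each gRNA come out as its own molecule in this model.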
- the RNA comprises at least one modified nucleotide.
- the modification is selected from 2’-O-alkyl (such as 2’-O-methyl), 2’-substituted alkoxy, 2’-substituted alkyl, 2’-halo (such as 2’-fluoro), 3’-phosphorothioate, bridged nucleic acid (BNA), and locked nucleic acid (LNA).
- the RNA is codon-optimized.
- the present disclosure provides a DNA encoding any of the RNA disclosed herein.
- the DNA further comprises an RNA polymerase promoter at the 5’ end.
- the RNA polymerase promoter is a eukaryotic RNA polymerase II promoter.
- the RNA polymerase promoter is selected from human cytomegalovirus immediate early enhancer/promoter (CMV promoter), human eukaryotic translation elongation factor 1 α1 promoter (EF1a promoter), CMV early enhancer fused to modified chicken β-actin promoter (CAG promoter), Simian virus 40 enhancer/early promoter (SV40 promoter), and human or mouse phosphoglycerate kinase 1 promoter (PGK promoter).
- the present disclosure provides a system comprising any of the RNA disclosed herein, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
- the present disclosure provides a system comprising any of the DNA and/or vectors disclosed herein, an RNA polymerase II or a polynucleotide encoding the same, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
- the present disclosure provides a vector comprising any of the DNA disclosed herein.
- the vector is a viral vector or a plasmid.
- any methods known in the art for the insertion of DNA fragments into a vector can be used to construct expression vectors comprising a polynucleotide disclosed herein. These methods can include in vitro recombinant DNA and synthetic techniques and in vivo (genetic) recombination.
- the polynucleotide disclosed herein can be operably linked to control sequences in the expression vector (s) to ensure protein expression.
- control sequences may include, but are not limited to, leader or signal sequences, promoters (e.g., naturally associated or heterologous promoters) , ribosomal binding sites, enhancer or activator elements, translational start and termination sequences, and transcription start and termination sequences, and are chosen to be compatible with the host cell chosen to express the proteins.
- the promoters may be either naturally occurring promoters, hybrid promoters that combine elements of more than one promoter, or synthetic promoters.
- An expression construct may be present in a cell on an episome, such as a plasmid, or the expression construct may be inserted in a chromosome such as in a gene locus.
- the expression vector includes a selectable marker gene to allow the selection of transformed host cells.
- the vector is an expression vector comprising a nucleotide sequence encoding a variant polypeptide operably linked to at least one regulatory control sequence. Regulatory control sequences for use herein include promoters, enhancers, and other expression control elements.
- the design of the expression vector depends on factors such as the choice of host cell to be transformed, the particular variant polypeptide to be expressed, the vector's copy number, the ability to control that copy number, and/or the expression of any other protein encoded by the vector, such as antibiotic markers.
- the vector can include, but is not limited to, viral vectors and plasmid DNA.
- Viral vectors can include, but are not limited to, adenoviral vectors, lentiviral vectors, retroviral vectors, and adeno-associated viral vectors.
- expression vectors contain selection markers such as ampicillin-resistance, hygromycin-resistance, tetracycline resistance, kanamycin resistance, or neomycin resistance to permit detection of those cells transformed with the desired DNA sequences.
- Suitable vectors, promoter, and enhancer elements are known in the art; many are commercially available for generating subject recombinant constructs.
- the vector is a polycistronic vector.
- the vector is a bicistronic vector or a tricistronic vector.
- Bicistronic or polycistronic expression vectors may include (1) multiple promoters fused to each of the open reading frames; (2) insertion of splicing signals between genes; (3) fusion of genes whose expressions are driven by a single promoter; and (4) insertion of proteolytic cleavage sites between genes (self-cleavage peptide) or insertion of internal ribosomal entry sites (IRESs) between genes.
- a polycistronic vector is used to co-express multiple genes in the same cell.
- Two strategies are most commonly used to construct a multicistronic vector.
- an Internal Ribosome Entry Site (IRES) element is typically used for bi-cistronic vectors.
- the IRES element acts as an additional ribosome recruitment site, allowing translation to initiate from an internal region of the mRNA. Thus, two proteins are translated from one mRNA.
- IRES elements are quite large (usually 500-600 bp) (Pelletier et al., 1988; Jang et al., 1988) .
- the engineered CD47 proteins disclosed herein are smaller than wild-type full-length human CD47, and thus could be used with an IRES element in multicistronic vectors that have limited packaging capacity.
- the present disclosure provides a composition comprising (1) any of the RNA disclosed herein, any of the DNA disclosed herein, and/or any of the vector disclosed herein; and (2) a carrier.
- the carrier is selected from lipid nanoparticles, liposomes, cationic nanoemulsions, dendrimer-based lipid nanoparticles, cationic polymers, and polysaccharide particles.
- the term “carrier” refers to compounds or compositions that are used for delivery of the polypeptide or the polynucleotide into a subject.
- the carrier enhances effectiveness and/or safety of the delivery.
- the carrier is capable of delivering large nucleic acid sequences (e.g., nucleic acids of at least 1 kDa, 1.5 kDa, 2 kDa, 2.5 kDa, 5 kDa, 10 kDa, 12 kDa, 15 kDa, 20 kDa, 25 kDa, 30 kDa, or more) .
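Nucleic acid size is usually quoted in nucleotides rather than kilodaltons. The Python sketch below converts between the two using a widely used approximation for single-stranded RNA: MW ≈ (number of nucleotides × 320.5) + 159.0 Da. These constants come from common supplier calculators and are an assumption here, not values from this disclosure. Actual masses depend on base composition and modifications.

```python
import math

NT_MASS_DA = 320.5         # assumed average mass per ribonucleotide in a strand
END_CORRECTION_DA = 159.0  # assumed end correction (commonly attributed to a 5'-triphosphate)


def ssrna_mass_da(n_nt: int) -> float:
    """Approximate molecular weight of a single-stranded RNA of n_nt nucleotides."""
    return n_nt * NT_MASS_DA + END_CORRECTION_DA


def min_length_for_mass_kda(kda: float) -> int:
    """Smallest ssRNA length (nt) whose approximate mass reaches `kda` kilodaltons."""
    return max(1, math.ceil((kda * 1000.0 - END_CORRECTION_DA) / NT_MASS_DA))


for threshold in (1, 5, 10, 30):
    print(threshold, "kDa ->", min_length_for_mass_kda(threshold), "nt")
```

Under this approximation, the 1 kDa and 30 kDa thresholds correspond to only about 3 and 94 nucleotides. By comparison, a 4,000-nt transcript is roughly 1,280 kDa.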
- the nucleic acids can be formulated with one or more acceptable reagents, which provide a vehicle for delivering such nucleic acids to target cells.
- Appropriate reagents are generally selected with regard to a number of factors, which include, among other things, the biological or chemical properties of the nucleic acids (e.g., charge), the intended route of administration, the anticipated biological environment to which such nucleic acids will be exposed, and the specific properties of the intended target cells.
- Lipid nanoparticles are nanoparticles made of one or more types of lipids.
- lipid nanoparticles comprise ionizable lipids, which are positively charged at low pH (enabling RNA complexation) and neutral at physiological pH (reducing potential toxic effects compared with permanently positively charged lipids, such as those used in cationic liposomes).
- lipid nanoparticles are taken up by cells via endocytosis, and the ionizability of the lipids at low pH (likely) enables endosomal escape, which allows release of the cargo into the cytoplasm.
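The pH-dependent charge of an ionizable lipid can be illustrated with the Henderson-Hasselbalch relation for a basic amine head group. In the Python sketch below, the apparent pKa of 6.5, the acidic pH of 5.0, and the physiological pH of 7.4 are illustrative assumptions, not values from this disclosure. Real ionizable lipids differ in apparent pKa, and the value measured in a formulated particle can differ from that of the free lipid.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a basic amine in its protonated (cationic) form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_PKA = 6.5  # illustrative apparent pKa, not a value from this disclosure

for ph in (5.0, 7.4):
    print(f"pH {ph}: {fraction_protonated(ph, ASSUMED_PKA):.0%} protonated")
```

With these inputs, roughly 97% of head groups are protonated at pH 5.0 and about 11% at pH 7.4. This matches the behavior described above: a lipid that is cationic at low pH but close to neutral at physiological pH.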
- the lipid nanoparticles comprise cationic lipids, which have a head group with permanent positive charges.
- lipid nanoparticles usually contain a helper lipid, for example, phospholipid, to promote cell binding, cholesterol to fill the gaps between the lipids, and a polyethylene glycol (PEG) to reduce opsonization by serum proteins and reticuloendothelial clearance.
- the relative amounts of ionizable lipid, helper lipid, cholesterol and PEG can vary (See Hou et al., Nature Reviews Materials, 2021) .
- Liposomes are spherical vesicles composed of one or more phospholipid bilayers. Liposomes are most often composed of phospholipids, especially phosphatidylcholine, and cholesterol, but may also include other lipids, such as phosphatidylethanolamine, as long as they are compatible with the lipid bilayer structure. The lipid bilayer of a liposome can fuse with other bilayers, such as the cell membrane, thereby delivering the liposome contents. Generally, liposomes are defined as spherical vesicles with particle sizes ranging from 30 nm to several micrometers.
- Liposomes consist of lipid bilayers surrounding aqueous compartments, with the polar head groups oriented toward the interior and exterior aqueous phases.
- The self-aggregation of polar lipids is not limited to conventional bilayer structures; depending on molecular shape, temperature, and environmental and preparation conditions, polar lipids may self-assemble into various types of colloidal particles (See Akbarzadeh, Nanoscale Res Lett., 2013).
- Cationic nanoemulsions (CNEs) are mainly composed of two parts: the cationic lipid DOTAP (1,2-dioleoyl-3-trimethylammonium-propane), which is added to the oil phase to bind the mRNA electrostatically, and the emulsion adjuvant MF59, an oil-in-water emulsion consisting of squalene and surfactants.
- CNEs are usually fabricated by the probe sonication method (Brito et al., A cationic nanoemulsion for the delivery of next-generation RNA vaccines, 2014) .
- Dendrimer-based lipid nanoparticles are nanoparticles made of lipids and dendrimers, which are highly ordered, branched polymeric molecules. Dendrimers are composed of three distinct structural components: (1) the core, (2) the repetitive branching layers (also referred to as “generation” ) , and (3) the abundant terminal groups. These precisely controlled dendritic structures harbor multivalent cooperativity and can exploit membrane-fusion-based endosome release by mimicking lipid vectors, while simultaneously retaining the “proton-sponge” -mediated endosome release of polymer vectors (See Chen et al., Amphiphilic Dendrimer Vectors for RNA Delivery: State-of-the-Art and Future Perspective, 2022) .
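The number of terminal groups grows with each generation because of the branching structure. A common idealized count is Z = N_c × N_b^G, where N_c is the core multiplicity, N_b the branch-juncture multiplicity, and G the generation. The Python sketch below assumes a PAMAM-like dendrimer with N_c = 4 and N_b = 2, with generation numbering starting at G0. The text does not name a specific dendrimer, so these values and the numbering convention are assumptions, and real dendrimers may contain structural defects.

```python
def terminal_groups(core_multiplicity: int, branch_multiplicity: int, generation: int) -> int:
    """Idealized number of surface groups, Z = Nc * Nb**G (defect-free dendrimer)."""
    if generation < 0:
        raise ValueError("generation must be non-negative")
    return core_multiplicity * branch_multiplicity ** generation


# Assumed PAMAM-like parameters: 4-functional core, 1 -> 2 branching, G0-based numbering.
for g in range(5):
    print(f"G{g}: {terminal_groups(4, 2, g)} terminal groups")
```

Under these assumptions, the surface group count doubles with each generation: 4, 8, 16, 32, and 64 for G0 to G4.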
- Cationic polymer is another viable RNA carrier.
- Exemplary cationic polymers are poly(ethyleneimine) and its derivatives.
- Polyethyleneimine (PEI) is among the earliest and most widely studied cationic polymers for gene delivery, including the delivery of RNA. It has high gene transfection efficiency and is often referred to as the gold standard for non-viral gene transfection (Lungwitz et al., 2005) .
- PEI can be in either linear or branched structures and its positive charge is conferred by numerous amine groups separated by short alkyl spacers, which lead to very high positive charge density within its structure (Jiang et al., Polymeric nanoparticles for RNA delivery, 2021) .
- Polysaccharides are a complex collection of biopolymers isolated from plant, animal, microbial and algal sources that are built from monosaccharides linked by O-glycosidic linkages.
- An exemplary polysaccharide that can be used for RNA delivery is chitosan, a polysaccharide derived by deacetylation of chitin, which is contained in the cell walls of fungi and in the shells of arthropods such as crustaceans; chitosan consists of a linear chain of 2-amino-2-deoxy-β-D-glucopyranose and 2-acetylamino-2-deoxy-β-D-glucopyranose units connected through β-1,4 linkages (Bodnar, Hartmann & Borbely, 2005; Barclay et al., Review of polysaccharide particle-based functional drug delivery, 2020).
- the composition includes, but is not limited to, a pharmaceutical composition.
- a “pharmaceutical composition” refers to an active pharmaceutical agent formulated in pharmaceutically acceptable or physiologically acceptable solutions for administration to a cell or an animal, either alone, or in combination with one or more other modalities of therapy. It will also be understood that, if desired, the compositions of the disclosure may be administered in combination with other agents, such as, e.g., cytokines, growth factors, hormones, small molecules, chemotherapeutics, pro-drugs, drugs, antibodies, or other various pharmaceutically active agents. There is virtually no limit to other components that may also be included in the compositions, provided that the additional agents do not adversely affect the ability of the composition to deliver the intended therapy.
- the phrase “pharmaceutically acceptable” is used herein to refer to those compounds, materials, compositions, and/or dosage forms that are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
- compositions may also comprise a pharmaceutically acceptable carrier, diluent, or excipient.
- pharmaceutically acceptable carrier, diluent, or excipient includes, without limitation, any adjuvant, carrier, excipient, glidant, sweetening agent, diluent, preservative, dye/colorant, flavor enhancer, surfactant, wetting agent, dispersing agent, suspending agent, stabilizer, isotonic agent, solvent, or emulsifier that has been approved by the United States Food and Drug Administration as acceptable for use in humans or domestic animals.
- Exemplary pharmaceutically acceptable carriers include, but are not limited to, sugars, such as lactose, glucose, and sucrose; starches, such as corn starch and potato starch; cellulose and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose, and cellulose acetate; tragacanth; malt; gelatin; talc; cocoa butter; waxes; animal and vegetable fats; paraffins; silicones; bentonites; silicic acid; zinc oxide; oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; glycols, such as propylene glycol; polyols, such as glycerin, sorbitol, mannitol, and polyethylene glycol; esters, such as ethyl oleate and ethyl laurate; agar; buffering agents, such as magnesium hydroxide and aluminum hydroxide; alginic acid;
- the liquid pharmaceutical compositions may include one or more of the following: sterile diluents such as water for injection, saline solution (preferably physiological saline), Ringer's solution, and isotonic sodium chloride; fixed oils such as synthetic mono- or diglycerides, which may serve as the solvent or suspending medium; polyethylene glycols; glycerin; propylene glycol or other solvents; antibacterial agents, such as benzyl alcohol or methyl paraben; antioxidants, such as ascorbic acid or sodium bisulfite; chelating agents, such as ethylenediaminetetraacetic acid; buffers, such as acetates, citrates, or phosphates; and agents for the adjustment of tonicity, such as sodium chloride or dextrose.
- the parenteral preparation can be enclosed in ampoules, disposable syringes, or multiple dose vials made of glass or plastic.
- An injectable pharmaceutical composition is preferably sterile.
- composition may be suitably developed for intravenous, intratumoral, oral, rectal, vaginal, parenteral, topical, pulmonary, intranasal, buccal, ophthalmic, or another route of administration.
- the present disclosure provides gene editing systems that are constructed with tRNA-like structures.
- the gene editing system is a CRISPR-Cas system.
- the gene editing system is a base editor (BE) system.
- the gene editing system is a transformer base editor system (tBE) .
- a transformer base editor is a CRISPR-based gene editing system which can edit cytosine or adenosine in target regions with high specificity, preferably with no observable off-target mutations.
- the transformer base editor (tBE) system comprises a CRISPR-associated protein (Cas protein) fused with a deaminase, a deaminase inhibitor domain, and a split-TEV protease.
- a tBE system described by Wang et al. uses one main gRNA (mgRNA) that binds at the target genomic site and one helper gRNA (hgRNA) that binds at a nearby region (preferably upstream of the target genomic site).
- the binding of the two gRNAs can guide the components of tBE system to correctly assemble at the target genomic site for base editing.
- the mgRNA is a main sgRNA (msgRNA) .
- the hgRNA is a helper sgRNA (hsgRNA) .
- the present disclosure provides a gene editing system comprising:
- an hgRNA comprising a first CRISPR motif, an hgRNA spacer, and a first protein-binding motif, or a DNA polynucleotide encoding the hgRNA;
- an mgRNA comprising a second CRISPR motif and an mgRNA spacer, or a DNA polynucleotide encoding the mgRNA, wherein the mgRNA spacer targets a target gene;
- a first CRISPR-associated protein (Cas protein), or a polynucleotide encoding the first Cas protein, wherein the first Cas protein binds to the first CRISPR motif;
- a second Cas protein, or a polynucleotide encoding the second Cas protein, wherein the second Cas protein binds to the second CRISPR motif; and
- a first fusion protein comprising a nucleobase deaminase or a catalytic domain thereof and a first RNA binding domain, or a polynucleotide encoding the first fusion protein, wherein the nucleobase deaminase or the catalytic domain thereof and the first RNA binding domain are optionally connected by a linker, and wherein the first RNA binding domain binds to the first protein-binding motif.
- the first Cas protein and the second Cas protein are the same or different.
- the gene editing system comprises a TLS-containing RNA or a DNA encoding the TLS-containing RNA, wherein
- the TLS-containing RNA comprises a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence,
- the protein encoded by the protein coding sequence is selected from the group consisting of the first Cas protein, the second Cas protein, and the first fusion protein, and
- the non-coding RNA sequence is the hgRNA or the mgRNA.
- a “TLS-containing RNA” refers to an RNA comprising at least one tRNA-like structure.
- a TLS-containing RNA comprises an mRNA and at least one non-coding RNA, wherein at least one tRNA-like structure is located between the mRNA and the non-coding RNA, and/or between the non-coding RNAs.
- the gene editing system further comprises:
- a nucleobase deaminase inhibitor domain or a polynucleotide encoding the same,
- wherein the nucleobase deaminase inhibitor domain is connected to the nucleobase deaminase or the catalytic domain thereof in the first fusion protein, optionally by a linker, and wherein there is a cleavage site for the protease between the nucleobase deaminase inhibitor domain and the nucleobase deaminase or the catalytic domain thereof.
- the gene editing system further comprises a second fusion protein comprising the protease and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
- wherein the protease and the second RNA binding domain are optionally connected by a linker,
- the mgRNA further comprises a second protein-binding motif, and
- the second RNA binding domain binds to the second protein-binding motif.
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- the protease is split into a first protease fragment and a second protease fragment, wherein the first and/or second protease fragment alone is not able to cleave the cleavage site.
- the gene editing system comprises
- a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker, and
- a third fusion protein comprising the second protease fragment and a third RNA binding domain, or a polynucleotide encoding the third fusion protein, wherein the second protease fragment and the third RNA binding domain are optionally connected by a linker,
- wherein the mgRNA further comprises a second protein-binding motif and a third protein-binding motif,
- the second RNA binding domain binds to the second protein-binding motif, and
- the third RNA binding domain binds to the third protein-binding motif.
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, the second fusion protein, and the third fusion protein.
- the second and third RNA binding domains are the same or different, and the second and third protein-binding motifs are the same or different.
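Two features of this configuration together make deaminase activation an AND condition over the recruited components. First, neither protease fragment can cleave on its own. Second, recruitment runs through the protein-binding motifs on the hgRNA and mgRNA. The Python sketch below encodes that logic for the configuration with a second and a third fusion protein. It assumes the first fusion protein also carries the nucleobase deaminase inhibitor domain linked through the protease cleavage site. It is a boolean abstraction of the claim language, not a kinetic or structural model, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LocusState:
    """Which claimed components are present at one locus (hypothetical model)."""
    hgrna_complex_bound: bool    # hgRNA + first Cas protein bound near the target
    mgrna_complex_bound: bool    # mgRNA + second Cas protein bound at the target
    deaminase_fusion: bool       # first fusion protein (deaminase-inhibitor + first RBD)
    protease_frag1_fusion: bool  # second fusion protein (first protease fragment + second RBD)
    protease_frag2_fusion: bool  # third fusion protein (second protease fragment + third RBD)


def deaminase_recruited(s: LocusState) -> bool:
    # The first RBD binds the first protein-binding motif carried by the hgRNA.
    return s.hgrna_complex_bound and s.deaminase_fusion


def protease_reconstituted(s: LocusState) -> bool:
    # Neither fragment cleaves alone; both are recruited via the mgRNA's second and third motifs.
    return s.mgrna_complex_bound and s.protease_frag1_fusion and s.protease_frag2_fusion


def deaminase_active(s: LocusState) -> bool:
    # The inhibitor is cleaved off only when a reconstituted protease meets a recruited deaminase.
    return deaminase_recruited(s) and protease_reconstituted(s)


full = LocusState(True, True, True, True, True)
print(deaminase_active(full))  # True
print(deaminase_active(replace(full, mgrna_complex_bound=False)))  # False
```

In this abstraction, a site where only one of the two gRNA complexes binds leaves the deaminase inhibited. That is the intuition behind using a helper gRNA to restrict editing to the intended locus.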
- the gene editing system further comprises
- a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein
- the first protease fragment and the second RNA binding domain are optionally connected by a linker, and
- the mgRNA further comprises a second protein-binding motif.
- the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- the protease is a TEV protease, a TuMV protease, a PPV protease, a PVY protease, a ZIKV protease, or a WNV protease.
- the protease is a TEV protease comprising a sequence of SEQ ID NO: 261.
- the first TEV protease fragment comprises a sequence of SEQ ID NO: 262 or 263.
- a “protease” refers to an enzyme that catalyzes proteolysis.
- a “cleavage site for a protease” refers to a short peptide that the protease recognizes, and within the short peptide creates a proteolytic cleavage.
- Non-limiting examples of proteases include TEV protease, TuMV protease, PPV protease, PVY protease, ZIKV protease, and WNV protease.
- the protein sequences of example proteases and their corresponding cleavage sites are provided in Table 4.
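A protease cleavage site can be located in a protein sequence by simple pattern matching. The Python sketch below does not use the Table 4 sequences. Instead, it assumes the commonly cited TEV protease consensus ENLYFQ followed by G or S, with cleavage between the Q and the following residue. It ignores the less efficient site variants that TEV also tolerates. The test sequences are made up.

```python
import re

# Assumed TEV consensus: ENLYFQ followed by G or S; cut between Q (P1) and G/S (P1').
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(protein: str) -> list[str]:
    """Split a protein sequence at every assumed TEV cleavage site."""
    fragments, start = [], 0
    for match in TEV_SITE.finditer(protein.upper()):
        fragments.append(protein[start:match.end()])
        start = match.end()
    fragments.append(protein[start:])
    return fragments


print(tev_cleave("MAAENLYFQGMKK"))  # ['MAAENLYFQ', 'GMKK']
```

The lookahead keeps the G or S residue on the downstream fragment, which mirrors where the cut falls in the consensus site.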
- the protease cleavage site is a self-cleaving peptide, such as the 2A peptides.
- 2A peptides are 18-22 amino-acid-long viral oligopeptides that mediate “cleavage” of polypeptides during translation in eukaryotic cells.
- the designation “2A” refers to a specific region of the viral genome and different viral 2As have generally been named after the virus they were derived from.
- the first discovered 2A was F2A (foot-and-mouth disease virus) , after which E2A (equine rhinitis A virus) , P2A (porcine teschovirus-1 2A) , and T2A (thosea asigna virus 2A) were also identified.
- the first and/or the second TEV protease fragment is not able to cleave the TEV cleavage site on its own. However, in the presence of the remaining portion of the TEV protease, this fragment will be able to effectuate the cleavage.
- the TEV fragment may be the TEV N-terminal domain (e.g., SEQ ID NO: 262) or the TEV C-terminal domain (e.g., SEQ ID NO: 263) .
- the first TEV protease fragment comprises a sequence of SEQ ID NO: 262.
- the first TEV protease fragment comprises a sequence of SEQ ID NO: 263.
- the nucleotide deaminase is a cytidine deaminase.
- the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
- the nucleotide deaminase is an adenosine deaminase.
- the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), and adenosine deaminase (ADA).
- the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
- the first fusion protein further comprises a uracil glycosylase inhibitor (UGI).
- the UGI has a sequence of SEQ ID NO: 275.
- the Uracil Glycosylase Inhibitor (UGI), which can be prepared from Bacillus subtilis bacteriophage PBS1, is a small protein (9.5 kDa) that inhibits E. coli uracil-DNA glycosylase (UDG) as well as UDG from other species. Inhibition of UDG occurs by reversible protein binding with a 1:1 UDG:UGI stoichiometry. UGI is capable of dissociating UDG-DNA complexes.
- UGI is found in Bacillus phage AR9 (YP_009283008.1) .
- the UGI comprises the amino acid sequence of SEQ ID NO: 275 or has at least 70%, 75%, 80%, 85%, 90%, or 95% sequence identity to SEQ ID NO: 275 and retains the uracil glycosylase inhibition activity.
- the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9) selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, SpCas9-NG, xSpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1,
- the first protein-binding RNA motif and the first RNA binding domain, the second protein-binding RNA motif and the second RNA binding domain, and the third protein-binding RNA motif and the third RNA binding domain are each independently selected from the group consisting of:
- an MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof;
- a telomerase Ku binding motif and Ku protein or an RNA-binding section thereof;
- a telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof;
- a PP7 phage operator stem-loop and PP7 coat protein (PCP); and
- a non-natural RNA aptamer and the corresponding aptamer ligand or an RNA-binding section thereof.
- the mgRNA and/or the hgRNA comprise a dual-RNA structure.
- the dual-RNA structure is formed by a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) , wherein the crRNA comprises the spacer.
- the mgRNA comprises an mcrRNA and a first tracrRNA.
- the mcrRNA comprises the mgRNA spacer.
- the hgRNA comprises an hcrRNA and a second tracrRNA.
- the hcrRNA comprises the hgRNA spacer.
- the first tracrRNA and the second tracrRNA are the same or different.
- the TLS-containing RNA comprises more than one non-coding RNA sequences.
- the target gene is a mammalian gene.
- the target gene is TRAC, CD52, B2M, PDCD1, CTLA4, CD7, HBG, CD33, CD123 or CLL-1.
- the target gene is a hepatitis B virus (HBV) gene.
- the mgRNA spacer and the hgRNA spacer are respectively:
- the present disclosure provides a method for gene editing in a subject, comprising administering to the subject (1) any of the RNA disclosed herein; and/or (2) any of the DNA disclosed herein; and/or (3) any of the vector disclosed herein; and/or (4) any of the system disclosed herein; and/or (5) any of the composition disclosed herein; and/or (6) any of the gene editing system disclosed herein.
- the subject is a mammal. In some embodiments, the subject is a human.
- RNA sequences and/or polynucleotides described herein can be prepared by any available technique, including, but not limited to, chemical synthesis; enzymatic synthesis, which is generally termed in vitro transcription (IVT); and enzymatic or chemical cleavage of a longer precursor.
- Methods of synthesizing RNAs are known in the art (see, e.g., Gait, M.J. (ed. ) Oligonucleotide synthesis: a practical approach, Oxford [Oxfordshire] , Washington, D.C.: IRL Press, 1984; and Herdewijn, P. (ed.
- For in vitro transcription, DNA encoding the RNA of interest is first synthesized with methods known in the art, such as column-based oligonucleotide synthesis and microarray-based oligonucleotide synthesis.
- the DNA construct is then cloned into a plasmid vector.
- the vector comprises a T7 promoter.
- the vector is then amplified, isolated, and purified using methods known in the art such as, but not limited to, a maxi prep using the Invitrogen PURELINK™ HiPure Maxiprep Kit (Carlsbad, Calif.).
- the plasmid may then be linearized using methods known in the art such as, but not limited to, the use of restriction enzymes and buffers.
- the linearization reaction may be purified using methods including, for example, Invitrogen's PURELINK™ PCR Micro Kit and standard PURELINK™ PCR Kit (Carlsbad, Calif.), and HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC).
- the purification method may be modified depending on the size of the linearization reaction which was conducted.
- the linearized plasmid is then used to generate cDNA for in vitro transcription (IVT) reactions.
- a cDNA template may be synthesized by having a linearized plasmid undergo polymerase chain reaction (PCR) .
- PCR: polymerase chain reaction
- the cDNA may be submitted for sequencing analysis before undergoing transcription.
- the cDNA produced in the previous step may be transcribed using an in vitro transcription (IVT) system.
- the system typically comprises a transcription buffer, nucleoside triphosphates (NTPs), an RNase inhibitor, and a polymerase.
- NTPs may be manufactured in house, may be selected from a supplier, or may be synthesized as described herein.
- the NTPs may be selected from, but are not limited to, those described herein including natural and unnatural (modified) NTPs.
- the polymerase may be selected from, but is not limited to, T7 RNA polymerase, T3 RNA polymerase, and mutant polymerases such as, but not limited to, polymerases able to incorporate modified nucleic acids.
- RNA clean-up may also include a purification method such as, but not limited to, a system from Beckman Coulter (Danvers, Mass.), or HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC).
- MEN epsilon/beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles, Genome Research.
- A triple helix stabilizes the 3’ ends of long noncoding RNAs that lack poly(A) tails, Genes & Development, 26.
- Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599.7886 (2021): 692-696.
- Plasmid “SpCas9” includes only an SpCas9-coding sequence but not an sgRNA sequence.
- In plasmids “SpCas9-sgRNA1” and “SpCas9-sgRNA2”, the SpCas9-encoding sequence and the sgRNA sequence are connected directly or via a tRNA-like structure derived from the hMALAT1 gene, respectively.
- Compared with plasmid “SpCas9-sgRNA2”, plasmid “SpCas9-sgRNA3” contains an additional triple helix sequence of the hMALAT1 gene after the SpCas9 sequence.
- the EF1a promoter was used to drive transcription initiation, with BGHpA as the termination and polyadenylation signal. Plasmids were prepared using plasmid extraction kits, quantitated with a NanoDrop One C (Thermo Fisher), and then used for cell transfection.
- 293FT cells were maintained in DMEM supplemented with 10%FBS and regularly tested to exclude mycoplasma contamination.
- 293FT cells were seeded in a 24-well plate at a density of 1×10⁵ cells per well and transfected with 250 μl serum-free Opti-MEM containing 2.5 μl LIPOFECTAMINE LTX, 1 μl LIPOFECTAMINE PLUS, 0.95 μg SpCas9 or SpCas9-sgRNA plasmid, and 0.1 μg puromycin screening plasmid. After 24 h, puromycin was added to the medium at a final concentration of 4 μg/ml.
- genomic DNA was extracted from the cells using QuickExtract DNA Extraction Solution for subsequent sequencing analysis.
- Target genomic sequences were PCR-amplified using the high-fidelity DNA polymerase PrimeSTAR HS with primer sets flanking the examined sgRNA target sites. Indel frequency at each target site was calculated with Synthego ICE analysis (see https://ice.synthego.com).
- The SpCas9 and SpCas9-sgRNA1 plasmids induced little gene editing (presented as “Indel percentage”) at both the BCL11A and KLF1 loci in 293FT cells (Fig. 2B).
- When the hMALAT1 tRNA-like sequence was used to link the SpCas9 and sgRNA sequences, significantly more gene editing (over 10%) was produced (Fig. 2B, SpCas9-sgRNA2).
- Inclusion of the hMALAT1 triple helix structure further increased the gene editing efficiency (Fig. 2B, SpCas9-sgRNA3) .
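The editing results above are reported as an indel percentage computed by Synthego ICE, which deconvolves Sanger traces; that algorithm is not reproduced here. As a minimal sketch of what the metric means, assuming you instead have amplicon reads already pairwise-aligned to the reference with `-` marking gaps, the indel percentage is the fraction of reads carrying at least one insertion or deletion. The function names are hypothetical, and substitutions are deliberately not counted as indels.

```python
from typing import Iterable, Tuple


def has_indel(ref_aln: str, read_aln: str) -> bool:
    """A read carries an indel if its alignment to the reference has a gap on either side."""
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    # '-' in the reference row = insertion in the read; '-' in the read row = deletion
    return "-" in ref_aln or "-" in read_aln


def indel_percentage(alignments: Iterable[Tuple[str, str]]) -> float:
    """Percentage of aligned reads that contain at least one insertion or deletion."""
    pairs = list(alignments)
    if not pairs:
        raise ValueError("no reads supplied")
    edited = sum(has_indel(ref, read) for ref, read in pairs)
    return 100.0 * edited / len(pairs)


reads = [
    ("ACGTACGT", "ACGTACGT"),    # unedited
    ("ACGTACGT", "ACG-ACGT"),    # 1-nt deletion
    ("ACGT-ACGT", "ACGTTACGT"),  # 1-nt insertion
    ("ACGTACGT", "ACGAACGT"),    # substitution only, not an indel
]
print(indel_percentage(reads))  # 50.0
```

Real pipelines usually also restrict the count to a window around the predicted cut site, which this sketch omits.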
Abstract
Provided is an RNA comprising a tRNA-like structure. Also provided are DNA constructs, vectors, systems, and compositions involving the tRNA-like structure.
Description
FIELD OF DISCLOSURE
The present disclosure generally relates to tRNA-like structure (TLS) and RNA constructs comprising tRNA-like structure. Also disclosed are DNA constructs, vectors, systems, compositions, and methods involving the tRNA-like structure.
SEQUENCE LISTING
This application contains a Sequence Listing electronically submitted as an XML file entitled “CU3247CST33CN-sql.xml” having a size of 360,854 bytes and created on August 18, 2023. The information contained in the Sequence Listing is incorporated by reference herein.
Gene editing is a cutting-edge technique that enables precise modification of specific target genes in an organism (such as gene knockout, repair, or addition). Frequently used gene editing tools include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins (CRISPR/Cas). CRISPR/Cas is an acquired immune system in prokaryotes, which resists the invasion of exogenous genetic elements such as bacteriophages or plasmids by cutting specific DNA sites. Unlike ZFNs and TALENs, a CRISPR/Cas system requires short RNA sequences (guide RNA, gRNA) for specific site recognition. CRISPR/Cas is currently widely applied in biological research, biopharmaceuticals, agricultural breeding, and other fields.
The CRISPR/Cas system recognizes the target site in the genome and generates double-stranded DNA breaks (DSBs). The DSBs are mainly repaired via non-homologous end joining (NHEJ), which enables knockout of the target gene. In the presence of foreign homologous DNA fragments, the DSBs can be repaired using the foreign DNA through homology-directed repair (HDR), thereby achieving repair of the target site or insertion of foreign fragments. In clinical applications, a disadvantage of the CRISPR/Cas system is that it creates DSBs, which may cause safety risks such as large DNA fragment deletions, chromosomal translocations, and chromothripsis.
Base editing is regarded as the next-generation gene editing technology, which modifies specific base pairs in the genome. Base editing tools include adenine base editors (ABEs) and cytosine base editors (CBEs). The key component of an ABE is a fusion protein composed of a Cas9 nickase (nCas9) and an adenine deaminase. The fusion protein targets specific genomic sites together with a gRNA and converts A·T to G·C base pairs. Similarly, the key component of a CBE is a fusion protein composed of nCas9 and a cytosine deaminase. The fusion protein targets specific genomic sites together with a gRNA and converts C·G to T·A base pairs. Compared with the regular CRISPR/Cas system, base editing enables accurate gene modifications without causing DSBs and therefore has fewer safety issues.
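The ABE and CBE conversions described above can be illustrated with a short sketch that predicts the edited protospacer (written 5'→3' on the non-target strand, so A·T→G·C appears as A→G and C·G→T·A appears as C→T). The editing window of protospacer positions 4-8, counted from the PAM-distal end, is an assumption commonly quoted for first-generation editors and is not specified in this disclosure; the function name and example protospacer are hypothetical.

```python
# Hypothetical helper: which base each editor converts on the protospacer strand.
CONVERSIONS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def predict_edit(protospacer: str, editor: str, window=(4, 8)) -> str:
    """Return the protospacer after converting every target base inside the window.

    Positions are 1-based from the PAM-distal end; window=(4, 8) is an assumed
    default, not a value taken from the patent.
    """
    if editor not in CONVERSIONS:
        raise ValueError(f"unknown editor {editor!r}; expected ABE or CBE")
    src, dst = CONVERSIONS[editor]
    lo, hi = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        edited.append(dst if lo <= pos <= hi and base == src else base)
    return "".join(edited)


spacer = "GAACACAAAGCATAGACTGC"  # hypothetical 20-nt protospacer
print(predict_edit(spacer, "ABE"))  # GAACGCGGAGCATAGACTGC (A5, A7, A8 -> G)
print(predict_edit(spacer, "CBE"))  # GAATATAAAGCATAGACTGC (C4, C6 -> T)
```

Bystander edits (several target bases in one window) fall out naturally from this model, which is one reason window position matters when choosing a spacer.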
To accomplish modification of target genes, gene editors may be applied to cells or organisms in the form of plasmids, viral vectors, ribonucleoproteins (RNPs), or mRNAs/gRNAs. For plasmids and viral vectors, transcription of the gene editing proteins and the gRNAs is usually driven by separate promoters. In a typical CRISPR/Cas9 system, Cas9 transcription is initiated by RNA Pol II promoters (e.g., CMV promoters), while gRNA transcription is initiated by RNA Pol III promoters (e.g., U3 promoters). RNPs consisting of gene editing proteins and gRNAs have been widely used for ex vivo gene editing (e.g., electroporation of hematopoietic stem cells). Alternatively, gene editing proteins can be delivered to cells or organisms in the form of mRNAs, which bypasses the expression and purification of challenging recombinant proteins. Besides ex vivo gene editing, mRNA/gRNA can also be packaged into lipid nanoparticles (LNPs) and delivered to the liver or other organs.
mRNA can be produced at large scale by in vitro transcription (IVT) using plasmids as the template. The success of COVID-19 mRNA vaccines has paved the way for mRNA applications in gene editing. Despite the popularity of mRNA/gRNA in gene editing, one major obstacle for clinical applications is the production of gRNA. gRNAs are typically 80-150 nt in length. Like mRNA, gRNAs can be synthesized by IVT; however, IVT gRNAs have poor stability and editing efficiency in cells. Currently, gRNAs are generally produced by solid-phase synthesis, which makes it possible to introduce modifications at specific positions of the gRNA. These modifications, such as 2′-O-methyl and phosphorothioate, substantially increase the stability and editing efficiency of gRNAs. However, it is challenging to generate high-purity gRNAs longer than 100 nt by solid-phase synthesis, and manufacturing gRNA at large scale and low cost remains an unresolved problem.
The present disclosure provides a novel approach that combines an mRNA and one or more non-coding RNAs in one transcript (the “precursor transcript”) and separates each component with a tRNA-like structure. The presence of the tRNA-like structure makes the precursor transcript a substrate for cleavage by intracellular RNase P and RNase Z. After cleavage, the tRNA-like structures are excised, and the full-length precursor transcript releases the mRNA molecule and the non-coding RNAs (see Fig. 1). This design allows the non-coding RNAs to be transcribed from the same promoter as the mRNA, thus simplifying production of the non-coding RNAs.
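The processing scheme above can be modeled as string operations: assemble the precursor 5'→3' with a TLS before each non-coding RNA, then cut at the 5' end of every TLS (modeling RNase P) and at its 3' end (modeling RNase Z), discarding the excised TLS. This is a minimal sketch under simplifying assumptions: cleavage is treated as exact and complete, any stabilizing sequence is folded into the mRNA segment, and all sequences and function names are hypothetical placeholders, not the SEQ ID NO sequences of the disclosure.

```python
from typing import List, Tuple


def assemble_precursor(mrna: str, ncrnas: List[str], tls: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Join mRNA and ncRNAs 5'->3' with one TLS before each ncRNA.

    Returns the precursor sequence and 0-based half-open (start, end) spans of each TLS.
    """
    parts, spans, pos = [mrna], [], len(mrna)
    for nc in ncrnas:
        spans.append((pos, pos + len(tls)))
        parts.extend([tls, nc])
        pos += len(tls) + len(nc)
    return "".join(parts), spans


def cleave_precursor(precursor: str, tls_spans: List[Tuple[int, int]]) -> List[str]:
    """Release the segments between TLS elements, dropping the TLS pieces themselves."""
    products, cursor = [], 0
    for start, end in sorted(tls_spans):
        if not (cursor <= start < end <= len(precursor)):
            raise ValueError("TLS spans must be in range and non-overlapping")
        products.append(precursor[cursor:start])  # modeled RNase P cut at the TLS 5' end
        cursor = end                               # modeled RNase Z cut at the TLS 3' end
    products.append(precursor[cursor:])
    return [p for p in products if p]


seq, spans = assemble_precursor("AUGGCCUAA", ["GUUUUAGAGC", "GGGACCUUCG"], "GCGGAUUUAG")
print(spans)                          # [(9, 19), (29, 39)]
print(cleave_precursor(seq, spans))   # ['AUGGCCUAA', 'GUUUUAGAGC', 'GGGACCUUCG']
```

The round trip (assemble, then cleave) recovering exactly the input mRNA and ncRNAs is the property the single-transcript design relies on.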
In some embodiments, this novel design can be applied to gene editing systems. The present disclosure provides a novel way to express a gene editing protein and one or more gRNAs with high efficiency by combining an mRNA (encoding the gene editing protein) and the gRNAs in one transcript and separating them with one or more tRNA-like structures. After cleavage by intracellular RNase P and/or RNase Z, the full-length precursor transcript releases one mRNA molecule encoding the gene editing protein, and one or more gRNAs (Fig. 1). Because the mRNA and gRNA(s) are linked in one single transcript, the gRNA(s) do not need to be driven by separate promoters in plasmids or viral vectors. When in vitro-synthesized mRNA/gRNAs are used to mediate gene editing, this design simplifies production of mRNA-gRNA fusions by IVT, thus avoiding chemical synthesis of long gRNAs. This design is particularly suitable for editing multiple targets, which requires two or more gRNAs.
In some embodiments, this novel design can be applied to transformer base editor (tBE) systems. For example, in some embodiments, a main guide RNA (mgRNA) and/or a helper guide RNA (hgRNA), and an mRNA encoding a Cas protein or a base editor protein are encoded by one polynucleotide and separated by one or more tRNA-like structures.
In some embodiments, this novel design can be applied to gene editing systems for editing mammalian genes.
In an aspect, the present disclosure provides an RNA comprising a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence.
In some embodiments of the RNA disclosed herein, the RNA comprises from 5’ -end to 3’ -end the protein coding sequence, the tRNA-like structure (TLS) , and the non-coding RNA sequence.
In some embodiments of the RNA disclosed herein, the non-coding RNA sequence is a guide RNA (gRNA) sequence.
In some embodiments of the RNA disclosed herein, the guide RNA (gRNA) comprises a spacer sequence and a scaffold sequence, wherein the scaffold sequence is capable of binding to the protein encoded by the protein coding sequence, and wherein the spacer sequence targets a target gene.
In some embodiments, the target gene is a mammalian gene. In some embodiments, the target gene is a human gene.
In some embodiments, the spacer sequence in the gRNA is selected from SEQ ID NOs: 204-260.
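The spacer directs the Cas protein to its target gene. For SpCas9, a target site is a 20-nt protospacer immediately followed by an NGG PAM on the same strand. The sketch below scans both strands of a DNA sequence for such sites; it is a minimal illustration only, assuming SpCas9's canonical NGG PAM and a 20-nt spacer. It does not reproduce SEQ ID NOs: 204-260, other Cas proteins listed herein use different PAMs, and the function names and example sequence are hypothetical.

```python
import re
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_spacers(dna: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, start, spacer, pam) for each protospacer followed by an NGG PAM.

    For '-' strand hits, start is a coordinate in the reverse-complemented sequence.
    """
    dna = dna.upper()
    # zero-width lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in pattern.finditer(seq):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


target = "AAAAA" + "GACGCATAAAGATGAGACGC" + "TGG" + "AAAAA"
print(find_spcas9_spacers(target))
# [('+', 5, 'GACGCATAAAGATGAGACGC', 'TGG')]
```

In a guide RNA the spacer is the RNA version of the reported protospacer (T replaced by U), placed 5' of the scaffold.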
In some embodiments of the RNA disclosed herein, the scaffold sequence comprises at least one protein-binding motif, wherein the protein-binding motif is an RNA aptamer motif or a variant thereof.
In some embodiments of the RNA disclosed herein, the RNA further comprises an mRNA-stabilizing sequence between the protein coding sequence and the tRNA-like structure (TLS) .
In some embodiments of the RNA disclosed herein, the mRNA-stabilizing sequence is a triple helix sequence, a poly (A) , or a histone stem-loop.
In some embodiments of the RNA disclosed herein, the triple helix sequence is derived from a long non-coding RNA (lncRNA) gene.
In some embodiments of the RNA disclosed herein, the lncRNA gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1) .
In some embodiments of the RNA disclosed herein, the RNA further comprises a 5’ -untranslated region (5’ -UTR) at the 5’ -end of the RNA, and/or a 3’ -untranslated region (3’ -UTR) at the 3’ -end of the RNA.
In some embodiments of the RNA disclosed herein, the RNA further comprises a poly-A sequence at the 3’ -end of the RNA and/or a 5’ -Cap structure at the 5’ -end of the RNA.
In some embodiments of the RNA disclosed herein, the RNA further comprises a sequence encoding a nuclear localization signal (NLS) , wherein the sequence encoding the NLS is located at the 5’ -end and/or 3’ -end of the protein coding sequence.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one sequence each encoding a nuclear localization signal (NLS), wherein the nuclear localization signals encoded by the sequences are the same or different.
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes an RNA binding protein.
In some embodiments of the RNA disclosed herein, the RNA binding protein is an IscB protein or a variant thereof.
In some embodiments of the RNA disclosed herein, the RNA binding protein is a Cas protein or a variant thereof.
In some embodiments of the RNA disclosed herein, the Cas protein is a Cas9, a dead Cas9 (dCas9) , or a Cas9 nickase (nCas9) .
In some embodiments of the RNA disclosed herein, the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR Cas9, EQR Cas9, VRER Cas9, Cas9-NG, xCas9, eCas9, SpCas9-HF1, HypaCas9, HiFiCas9, sniper-Cas9, SpG, SpRY, KKH SaCas9, CjCas9, Cas9-NRRH, Cas9-NRCH, Cas9-NRTH, SsCpf1, PcCpf1, BpCpf1, LiCpf1, PmCpf1, Lb2Cpf1, PbCpf1, PbCpf1, PeCpf1, PdCpf1, MbCpf1, EeCpf1, CmtCpf1, BsCpf1, BhCas12b, AkCas12b, BsCas12b, AmCas12b, AaCas12b, RfxCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b.
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes a Cas protein fused with or linked to a protein having enzymatic activity.
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is a cytidine deaminase.
In some embodiments of the RNA disclosed herein, the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is an adenosine deaminase.
In some embodiments of the RNA disclosed herein, the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA) , adenosine deaminase tRNA specific 1 (ADAT1) , adenosine deaminase tRNA specific 2 (ADAT2) , adenosine deaminase tRNA specific 3 (ADAT3) , adenosine deaminase RNA specific B1 (ADARB1) , adenosine deaminase RNA specific B2 (ADARB2) , adenosine monophosphate deaminase 1 (AMPD1) , adenosine monophosphate deaminase 2 (AMPD2) , adenosine monophosphate deaminase 3 (AMPD3) , adenosine deaminase (ADA) , adenosine deaminase 2 (ADA2) , adenosine deaminase like (ADAL) , adenosine deaminase domain containing 1 (ADAD1) , adenosine deaminase domain containing 2 (ADAD2) , and adenosine deaminase RNA specific (ADAR) .
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes a base editor protein.
In some embodiments of the RNA disclosed herein, the base editor protein is a cytidine base editor (CBE) protein.
In some embodiments of the RNA disclosed herein, the CBE protein is selected from BE3, YE1-BE3, YEE-BE3, BE4, eBE, hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-Y132D, eA3A-BE3, SaKKH-BE3, Target-AID, dCas12a-BE, BEACON1, BEACON2, enAsBE, PBE, and A3A-PBE.
In some embodiments of the RNA disclosed herein, the base editor protein is an adenosine base editor (ABE) protein.
In some embodiments of the RNA disclosed herein, the ABE protein is selected from ABE7.10, ABE8e, ABE8e-V106W, LbABE8e, STEME-1, ABE-P1, ABE-P2, and rBE14.
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is a methylase or a reverse transcriptase.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) comprises an acceptor stem, a D-loop arm, and a TΨC-loop arm.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) comprises a cleavage site for one or more RNase P, RNase Z, and/or RNase E.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is derived from a tRNA gene or a long non-coding RNA (lncRNA) gene.
In some embodiments of the RNA disclosed herein, the long non-coding RNA (lncRNA) gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1) .
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is derived from a eukaryotic organism.
In some embodiments of the RNA disclosed herein, the eukaryotic organism is selected from the group consisting of Saccharomyces cerevisiae, Arabidopsis thaliana, Oryza sativa, Homo sapiens, Macaca mulatta, Macaca fascicularis, Sus scrofa domestica, Canis lupus familiaris, Rattus norvegicus, and Mus musculus.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is encoded by any one of SEQ ID NOs: 4-7.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence, and the non-coding RNA sequences are the same or different.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence that is a guide RNA (gRNA), wherein the gRNAs are the same or different.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence, and the RNA comprises a tRNA-like structure (TLS) between the protein coding sequence and the nearest non-coding RNA sequence, and between each pair of adjacent non-coding RNA sequences.
In some embodiments of the RNA disclosed herein, the protein coding sequence is located upstream relative to all non-coding RNA sequences.
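The two layout embodiments above (a TLS between the protein coding sequence and the nearest non-coding RNA and between adjacent non-coding RNAs, with the coding sequence upstream of all non-coding RNAs) can be expressed as a checklist over a 5'→3' list of element labels. This is a hypothetical design-check sketch: the labels `CDS`, `TLS`, and `ncRNA` are invented for illustration, exactly one CDS is assumed, and any other label (for example a UTR or a triple helix stabilizing sequence) is ignored.

```python
from typing import List

PAYLOAD = {"CDS", "ncRNA"}  # hypothetical labels for the released products


def validate_layout(elements: List[str]) -> bool:
    """Check a 5'->3' element list against the single-CDS, TLS-separated layout."""
    if elements.count("CDS") != 1:
        return False
    seen_cds = False
    tls_since_last_payload = True
    for label in elements:
        if label == "TLS":
            tls_since_last_payload = True
        elif label in PAYLOAD:
            if label == "ncRNA" and not seen_cds:
                return False  # an ncRNA lies upstream of the CDS
            if seen_cds and not tls_since_last_payload:
                return False  # two payloads with no TLS between them
            seen_cds = seen_cds or label == "CDS"
            tls_since_last_payload = False
        # any other label (UTR, triple helix, ...) is passed over
    return True


print(validate_layout(["5'UTR", "CDS", "triplex", "TLS", "ncRNA", "TLS", "ncRNA"]))  # True
print(validate_layout(["CDS", "TLS", "ncRNA", "ncRNA"]))                             # False
```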
In some embodiments of the RNA disclosed herein, the RNA comprises at least one modified nucleotide.
In an aspect, the present disclosure provides a DNA encoding any of the RNA disclosed herein.
In some embodiments of the DNA disclosed herein, the DNA further comprises an RNA polymerase promoter at the 5’ -end.
In some embodiments of the DNA disclosed herein, the RNA polymerase promoter is a eukaryotic RNA polymerase II promoter.
In some embodiments of the DNA disclosed herein, the RNA polymerase promoter is selected from human cytomegalovirus immediate early enhancer/promoter (CMV promoter) , human eukaryotic translation elongation factor 1 α1 promoter (EF1a promoter) , CMV early enhancer fused to modified chicken β-actin promoter (CAG promoter) , Simian virus 40 enhancer/early promoter (SV40 promoter) , and human or mouse phosphoglycerate kinase 1 promoter (PGK promoter) .
In an aspect, the present disclosure provides a vector comprising any of the DNA disclosed herein.
In some embodiments of the vector disclosed herein, the vector is a viral vector or a plasmid.
In an aspect, the present disclosure provides a system comprising any of the RNA disclosed herein, a ribonuclease P (RNase P) or a polynucleotide encoding thereof, and a ribonuclease Z (RNase Z) or a polynucleotide encoding thereof.
In an aspect, the present disclosure provides a system comprising any of the DNA disclosed herein and/or any of the vector disclosed herein, an RNA polymerase II or a polynucleotide encoding thereof, a ribonuclease P (RNase P) or a polynucleotide encoding thereof, and a ribonuclease Z (RNase Z) or a polynucleotide encoding thereof.
In an aspect, the present disclosure provides a composition comprising (1) any of the RNA disclosed herein, any of the DNA disclosed herein, and/or any of the vector disclosed herein; and (2) a carrier.
In some embodiments of the composition disclosed herein, the carrier is selected from lipid nanoparticles, liposomes, cationic nanoemulsions, dendrimer-based lipid nanoparticles, cationic polymers, and polysaccharide particles.
In an aspect, the present disclosure provides a gene editing system comprising
a. an hgRNA comprising a first CRISPR motif, an hgRNA spacer, and a first protein-binding motif, or a DNA polynucleotide encoding the hgRNA,
b. an mgRNA comprising a second CRISPR motif and an mgRNA spacer, or a DNA polynucleotide encoding the mgRNA, wherein the mgRNA spacer targets a target gene,
c. a first CRISPR-associated protein (Cas protein) , or a polynucleotide encoding the first Cas protein, wherein the first Cas protein binds to the first CRISPR motif,
d. a second Cas protein, or a polynucleotide encoding the second Cas protein, wherein the second Cas protein binds to the second CRISPR motif,
e. a first fusion protein comprising a nucleobase deaminase or a catalytic domain thereof and a first RNA binding domain, or a polynucleotide encoding the first fusion protein, wherein the nucleobase deaminase or the catalytic domain thereof and the first RNA binding domain are optionally connected by a linker, and wherein the first RNA binding domain binds to the first protein-binding motif,
wherein the first Cas protein and second Cas protein are the same or different,
wherein the gene editing system comprises a TLS-containing RNA or a DNA encoding the TLS-containing RNA, wherein the TLS-containing RNA comprises a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence; wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, and the first fusion protein; and wherein the non-coding RNA sequence is the hgRNA or the mgRNA.
As used herein, a “TLS-containing RNA” refers to an RNA comprising at least one tRNA-like structure. In some embodiments, a TLS-containing RNA comprises an mRNA and at least one non-coding RNA, wherein at least one tRNA-like structure is located between the mRNA and the non-coding RNA, and/or between the non-coding RNAs.
In some embodiments, the gene editing system further comprises
a. a protease, or a polynucleotide encoding thereof, and
b. a nucleobase deaminase inhibitor domain or a polynucleotide encoding thereof,
wherein the nucleobase deaminase inhibitor domain is connected to the nucleobase deaminase or the catalytic domain thereof in the first fusion protein optionally by a linker, and wherein there is a cleavage site for the protease between the nucleobase deaminase inhibitor domain and the nucleobase deaminase or the catalytic domain thereof.
In some embodiments, the gene editing system further comprises a second fusion protein comprising the protease and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
wherein the protease and the second RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif;
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
In some embodiments, the protease is split into a first protease fragment and a second protease fragment, wherein the first and/or second protease fragment alone is not able to cleave the cleavage site.
In some embodiments, the gene editing system comprises
a. a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker, and
b. a third fusion protein comprising the second protease fragment and a third RNA binding domain, or a polynucleotide encoding the third fusion protein, wherein the second protease fragment and the third RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif and a third protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif, wherein the third RNA binding domain binds to the third protein-binding motif,
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, the second fusion protein, and the third fusion protein.
In some embodiments, the second and third RNA binding domains are the same or different, and the second and third protein-binding motifs are the same or different.
In some embodiments, the gene editing system further comprises
a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif,
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
In some embodiments of the gene editing system described herein, the protease is a TEV protease, a TuMV protease, a PPV protease, a PVY protease, a ZIKV protease, or a WNV protease.
In some embodiments of the gene editing system described herein, the protease is a TEV protease comprising a sequence of SEQ ID NO: 261.
In some embodiments of the gene editing system described herein, the first TEV protease fragment comprises a sequence of SEQ ID NO: 262 or 263.
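In the protease-gated embodiments, the fusion protein carries a protease cleavage site between the deaminase inhibitor domain and the deaminase, so protease activity releases the deaminase. For TEV protease, the canonical recognition sequence is ENLYFQ followed by G or S, with cleavage between Q and G/S. The sketch below applies only that sequence rule; it assumes complete cleavage, ignores kinetics and the split-protease reconstitution, and uses hypothetical peptide placeholders rather than SEQ ID NOs: 261-263 or any deaminase sequence from the disclosure.

```python
import re
from typing import List

# Canonical TEV site ENLYFQ-(G|S); the lookahead keeps G/S on the C-terminal product.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(protein: str) -> List[str]:
    """Return the fragments produced by cutting after Q at every ENLYFQ-(G/S) site."""
    fragments, prev = [], 0
    for m in TEV_SITE.finditer(protein):
        fragments.append(protein[prev:m.end()])
        prev = m.end()
    fragments.append(protein[prev:])
    return fragments


inhibitor = "MKTAYIAKQR"   # hypothetical inhibitor-domain placeholder
linker_site = "GSGENLYFQ"  # hypothetical linker ending in the TEV site
deaminase = "SGMSEVEFSH"   # hypothetical deaminase N-terminus placeholder
print(tev_cleave(inhibitor + linker_site + deaminase))
# ['MKTAYIAKQRGSGENLYFQ', 'SGMSEVEFSH']
```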
In some embodiments of the gene editing system described herein, the nucleobase deaminase is a cytidine deaminase.
In some embodiments of the gene editing system described herein, the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
In some embodiments of the gene editing system described herein, the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
In some embodiments of the gene editing system described herein, the nucleobase deaminase is an adenosine deaminase.
In some embodiments of the gene editing system described herein, the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA) , adenosine deaminase tRNA specific 1 (ADAT1) , adenosine deaminase tRNA specific 2 (ADAT2) , adenosine deaminase tRNA specific 3 (ADAT3) , adenosine deaminase RNA specific B1 (ADARB1) , adenosine deaminase RNA specific B2 (ADARB2) , adenosine monophosphate deaminase 1 (AMPD1) , adenosine monophosphate deaminase 2 (AMPD2) , adenosine monophosphate deaminase 3 (AMPD3) , adenosine deaminase (ADA) , adenosine deaminase 2 (ADA2) , adenosine deaminase like (ADAL) , adenosine deaminase domain containing 1 (ADAD1) , adenosine deaminase domain containing 2 (ADAD2) , and adenosine deaminase RNA specific (ADAR) .
In some embodiments of the gene editing system described herein, the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
In some embodiments of the gene editing system described herein, the first fusion protein further comprises an uracil glycosylase inhibitor (UGI) .
In some embodiments of the gene editing system described herein, the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9) selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, SpCas9-NG, xSpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b.
In some embodiments of the gene editing system described herein, the first protein-binding RNA motif and the first RNA binding domain, the second protein-binding RNA motif and the second RNA binding domain, and the third protein-binding RNA motif and the third RNA binding domain, are each independently selected from the group consisting of an MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof,
a BoxB and N22P or an RNA-binding section thereof,
a telomerase Ku binding motif and Ku protein or an RNA-binding section thereof,
a telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof,
a PP7 phage operator stem-loop and PP7 coat protein (PCP) or an RNA-binding section thereof,
a SfMu phage Com stem-loop and Com RNA binding protein or an RNA-binding section thereof, and
a non-natural RNA aptamer and corresponding aptamer ligand or an RNA-binding section thereof.
In some embodiments of the gene editing system described herein, the mgRNA and/or the hgRNA comprise a dual-RNA structure.
In some embodiments of the gene editing system described herein, the dual-RNA structure is formed by a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) , wherein the crRNA comprises the spacer.
In some embodiments of the gene editing system described herein, the mgRNA comprises a mcrRNA and a first tracrRNA, and the mcrRNA comprises the mgRNA spacer, wherein the hgRNA comprises a hcrRNA and a second tracrRNA, and the hcrRNA comprises the hgRNA spacer, and wherein the first tracrRNA and the second tracrRNA are the same or different.
In some embodiments of the gene editing system described herein, the TLS-containing RNA comprises more than one non-coding RNA sequence.
In some embodiments of the gene editing system described herein, the target gene is a mammalian gene.
In some embodiments of the gene editing system described herein, the mgRNA spacer and the hgRNA spacer are respectively:
In an aspect, the present disclosure provides a method for gene editing in a subject, comprising administering to the subject (1) any of the RNAs disclosed herein; and/or (2) any of the DNAs disclosed herein; and/or (3) any of the vectors disclosed herein; and/or (4) any of the systems disclosed herein; and/or (5) any of the compositions disclosed herein; and/or (6) any of the gene editing systems disclosed herein. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.
BRIEF DESCRIPTION OF FIGURES
Fig. 1 is an illustration of polynucleotide constructs comprising one or more tRNA-like structures. Fig. 1A illustrates the transcription, RNase cleavage, and translation of an mRNA linked to one single guide RNA (sgRNA). The coding sequence (CDS) of a gene editing protein is linked to an sgRNA via a tRNA-like structure. The transcript is cleaved at the tRNA-like structure by RNase P and RNase Z, which releases one mRNA encoding the gene editing protein and one sgRNA. The triplex sequence is used to stabilize the mRNA and to enhance its translation. Fig. 1B illustrates the transcription, RNase cleavage, and translation of an mRNA linked to two or more identical or different sgRNAs. Fig. 1C illustrates the transcription, RNase cleavage, and translation of a tBE system comprising the tRNA-like structures. A main sgRNA (msgRNA) and a helper sgRNA (hsgRNA) are linked via a tRNA-like structure to the CDS of a tBE locator (e.g., nCas9) or a tBE effector & key (e.g., a nucleotide deaminase).
Fig. 2 shows an application of the tRNA-like structure in a CRISPR/Cas9 system as described in Example 1. Fig. 2A illustrates the plasmid constructs used in the cell transfection. Fig. 2B shows gene editing efficiency at the BCL11A and KLF1 loci in 293FT cells.
Definition
In the present disclosure, unless otherwise specified, the scientific and technical terms used herein have the meanings generally understood by a person skilled in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice of the present disclosure, the preferred methods and materials are described herein. Accordingly, the terms defined herein are more fully described by reference to the Specification as a whole.
All publications, including but not limited to patents and patent applications, cited in this specification are herein incorporated by reference as though fully set forth. If certain content of a publication cited herein contradicts or is inconsistent with the present disclosure, the present disclosure controls.
As used herein, the singular terms “a,” “an,” and “the” include the plural reference unless the context clearly indicates otherwise.
As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”). Moreover, the present invention also contemplates that in some embodiments of the invention, any feature or combination of features set forth herein can be excluded or omitted.
Unless the context requires otherwise, the terms “comprise,” “comprises,” and “comprising,” or similar terms, are intended to mean a non-exclusive inclusion, such that a recited list of elements or features includes not only those stated or listed elements but may also include other elements or features that are not listed or stated.
Unless otherwise indicated, nucleic acids are written left to right in the 5’ to 3’ orientation. Thus, “upstream” means toward the 5’ end, and “downstream” means toward the 3’ end. For example, a first sequence located upstream of a second sequence is closer to the 5’ end than the second sequence, and a first sequence located downstream of a second sequence is closer to the 3’ end than the second sequence.
Unless otherwise indicated, amino acid sequences are written left to right in the amino-to-carboxy orientation.
It is to be understood that this disclosure is not limited to the particular methodology, protocols, and reagents described, as these may vary, depending upon the context in which they are used by those skilled in the art.
As used herein, the terms “percent identity” and “% identity,” as applied to nucleic acid or polynucleotide sequences, refer to the percentage of residue matches between at least two nucleic acid or polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
Percent identity between nucleic acid or polynucleotide sequences may be determined using a suite of commonly used and freely available sequence comparison algorithms provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215: 403-410), which is available from several sources, including the NCBI, Bethesda, Md., and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/.
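Once two sequences have been aligned (for example by BLAST), percent identity reduces to counting identical aligned positions. The following is a minimal sketch, assuming the alignment is already available as two equal-length strings with gaps written as “-”; the function name is hypothetical, and dividing by the full alignment length (including gap columns) is an assumed convention, since tools differ in the denominator they use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A column counts as a match only if both residues are identical and
    neither is a gap. The denominator is the full alignment length,
    including gap columns (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical example: 8 identical columns in a 10-column alignment.
print(percent_identity("ACGTACGT--", "ACGTACGTAA"))  # → 80.0
```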
Nucleic acid or polynucleotide sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al. (1991) Nucleic Acids Res 19: 5081; Ohtsuka et al. (1985) J Biol Chem 260: 2605-2608; Cassol et al. (1992); Rossolini et al. (1994) Mol Cell Probes 8: 91-98). The term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides which have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. The term nucleic acid is used interchangeably with polynucleotide, and (in appropriate contexts) gene, cDNA, and mRNA encoded by a gene.
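The degeneracy of the genetic code can be illustrated with a short translation sketch. It builds the standard genetic code and shows that two DNA sequences, chosen here for illustration, both encode the SV40 large T-antigen NLS peptide PKKKRKV while differing at 8 of 21 positions. This sketch uses explicit synonymous codons rather than the mixed-base or deoxyinosine residues mentioned above, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...),
# with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence (length divisible by 3)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two synonymous encodings of PKKKRKV that differ mostly at third codon positions.
seq_1 = "CCTAAGAAGAAACGTAAGGTT"
seq_2 = "CCAAAAAAAAAGAGAAAAGTC"
mismatches = sum(a != b for a, b in zip(seq_1, seq_2))

print(translate(seq_1), translate(seq_2))  # → PKKKRKV PKKKRKV
print(mismatches, "of", len(seq_1))        # → 8 of 21
```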
As used herein, “percent (%) amino acid sequence identity” with respect to a peptide, polypeptide or protein sequence is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in another peptide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Percent amino acid sequence identity in the current disclosure is measured using BLAST software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
An amino acid substitution refers to the replacement of one amino acid in a polypeptide with another amino acid. Amino acid substitutions can be conservative or non-conservative substitutions. A conservative replacement (also called a conservative mutation or a conservative substitution) is an amino acid replacement in a protein that changes a given amino acid to a different amino acid with similar biochemical properties (e.g., charge, hydrophobicity, and size) . Exemplary substitutions are shown in Table 1. Amino acid substitutions may be introduced into a protein of interest and the products screened for a desired activity, for example, retained/improved biological activity.
Table 1 Exemplary Substitutions
Amino acids may be grouped according to common side-chain properties:
(1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile;
(2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln;
(3) acidic: Asp, Glu;
(4) basic: His, Lys, Arg;
(5) residues that influence chain orientation: Gly, Pro;
(6) aromatic: Trp, Tyr, Phe.
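Because the contents of Table 1 are not reproduced here, the following sketch uses only the six side-chain groups listed above as a proxy: a substitution is flagged as conservative when both residues fall in the same group. This is a simplifying assumption, since published substitution tables (and Table 1) may allow conservative exchanges across these groups; Norleucine is omitted because it has no standard one-letter code, and the function names are hypothetical.

```python
# Side-chain groups (1)-(6) listed above, using one-letter codes.
# Norleucine is omitted because it has no standard one-letter code.
SIDE_CHAIN_GROUPS = {
    "hydrophobic": set("MAVLI"),
    "neutral hydrophilic": set("CSTNQ"),
    "acidic": set("DE"),
    "basic": set("HKR"),
    "chain orientation": set("GP"),
    "aromatic": set("WYF"),
}


def side_chain_group(residue: str) -> str:
    """Return the group name for a one-letter amino acid code."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue.upper() in members:
            return name
    raise KeyError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """Treat a substitution as conservative if both residues share a group
    (a simplifying proxy, not the contents of Table 1)."""
    return side_chain_group(original) == side_chain_group(replacement)


print(is_conservative("K", "R"))  # → True  (both basic)
print(is_conservative("D", "K"))  # → False (acidic vs basic)
```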
As used herein, the term “polypeptide” is intended to encompass a singular “polypeptide” as well as plural “polypeptides,” and refers to a molecule composed of monomers (amino acids) linearly linked by amide bonds (also known as peptide bonds). The term “polypeptide” refers to any chain or chains of two or more amino acids and does not refer to a specific length of the product. Thus, “peptides,” “protein,” or any other term used to refer to a chain or chains of two or more amino acids, are included within the definition of “polypeptide,” and the term “polypeptide” may be used instead of, or interchangeably with, any of these terms. The term “polypeptide” is also intended to refer to the products of post-expression modifications of the polypeptide, including without limitation glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, or modification by non-naturally occurring amino acids. A polypeptide may be derived from a natural biological source or produced by recombinant technology, but is not necessarily translated from a designated nucleic acid sequence. It may be generated in any manner, including by chemical synthesis.
As used herein, the term “encode” or “encoding,” as applied to polynucleotides, means that a polynucleotide is said to “encode” a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for the polypeptide and/or a fragment thereof. The antisense strand is the complement of such a nucleic acid, and the encoding sequence can be deduced therefrom.
As used herein, a “single guide RNA” (sgRNA) refers to a synthetic or expressed RNA sequence that comprises a CRISPR binding motif and a spacer. A “spacer” is a DNA-targeting motif, i.e., a sequence that is complementary to a specific target DNA region. The CRISPR binding motif of a guide RNA can bind to a Cas protein, and the DNA-targeting motif of the gRNA can guide the complex to a specific target location on a DNA. A guide RNA may further comprise one or more protein-binding motifs.
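How a spacer picks out its target can be sketched as a string search on both DNA strands. This sketch assumes an SpCas9-type system that requires a 5’-NGG-3’ protospacer adjacent motif (PAM) directly 3’ of the protospacer; that PAM requirement is not part of the definition above and differs between Cas proteins. The spacer, the toy genome, and the function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(spacer: str, genome: str, pam: str = "NGG"):
    """Return (strand, start) for each protospacer that matches the spacer
    and is immediately followed by the PAM on the same strand.

    The spacer is written as DNA (T instead of U). It has the same sequence
    as the protospacer and base-pairs with the opposite (target) strand.
    'start' is a 0-based coordinate on the forward strand.
    """
    spacer, genome = spacer.upper(), genome.upper()
    # Translate IUPAC 'N' in the PAM into a regex wildcard; the lookahead
    # allows overlapping hits to be reported.
    pattern = re.compile("(?=" + re.escape(spacer) + pam.replace("N", "[ACGT]") + ")")
    hits = [("+", m.start()) for m in pattern.finditer(genome)]
    rc = reverse_complement(genome)
    for m in pattern.finditer(rc):
        # Map the protospacer on the reverse strand back to forward coordinates.
        hits.append(("-", len(genome) - m.start() - len(spacer)))
    return hits


spacer = "GCTAGCTTAGGCATCGATCC"  # hypothetical 20-nt spacer
genome = "TT" + spacer + "AGG" + "TTTT"
print(find_target_sites(spacer, genome))  # → [('+', 2)]
```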
As used herein, a “fusion protein” is a protein comprising at least two domains that are encoded by separate genes that have been joined into a single polypeptide. For example, a fusion protein can comprise two domains that are encoded by separate genes that have been joined so that they are transcribed and translated as a single unit, producing a single polypeptide. In some embodiments, the at least two domains are fused together directly. In some embodiments, the domains are connected by one or more linkers.
The term “genetic modification” and its grammatical equivalents as used herein can refer to one or more alterations of a nucleic acid, e.g., the nucleic acid within an organism's genome. For example, genetic modification can refer to alterations, additions, and/or deletions of genes or portions of genes or other nucleic acid sequences. A genetically modified cell can also refer to a cell with an added, deleted, and/or altered gene or portion of a gene. A genetically modified cell can also refer to a cell with an added nucleic acid sequence that is not a gene or gene portion. Genetic modifications include, for example, both transient knock-in or knock-down mechanisms and mechanisms that result in permanent knock-in, knock-down, or knock-out of target genes or portions of genes or nucleic acid sequences. Genetic modifications also include, for example, both transient and permanent knock-in of nucleic acid sequences. Genetic modifications also include, for example, reduced or increased transcription, reduced or increased mRNA stability, reduced or increased translation, and reduced or increased protein stability.
As used herein, a composition refers to any mixture of two or more products, substances, or compounds, including cells.
As used herein, the term “RNA” or “ribonucleic acid” refers to a biomolecule composed of a chain of ribonucleotides, which are molecules made of a nitrogenous base, a sugar, and a phosphate group. RNA molecules include, but are not limited to, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small non-coding RNA (ncRNA) such as microRNA (miRNA) and small interfering RNA (siRNA). As used herein, “mRNA” or “messenger RNA” refers to an RNA molecule that comprises a protein-encoding sequence, which can be translated by ribosomes during protein synthesis. As used herein, “non-coding RNA” refers to an RNA molecule that does not encode a protein and is not translated into a protein.
As used herein, the term “DNA” or “deoxyribonucleic acid” refers to a biomolecule composed of a chain of deoxyribonucleotides, which are molecules made of a nitrogenous base, a deoxyribose sugar, and a phosphate group.
As used herein, the term “variant” in the context of protein or polypeptide refers to a protein or polypeptide that differs from the parent protein, but retains essentially the same biological function or activity. In some embodiments, the parent protein is a wild-type protein. In some embodiments, the variant is a mutant.
As used herein, a “protein-binding RNA motif” refers to a sequence in an RNA molecule that is capable of binding to proteins. In some embodiments, the protein-binding RNA motif is capable of binding to a specific protein with high affinity and specificity. In some embodiments, the protein-binding RNA motif is an RNA aptamer or a variant thereof.
As used herein, “enzymatic activity” refers to catalytic properties. In some embodiments, a protein with enzymatic activity acts upon substrate molecules and decreases the activation energy necessary for a chemical reaction to occur by stabilizing the transition state. This stabilization speeds up the reaction so that it proceeds at a physiologically significant rate. In some embodiments, a protein having enzymatic activity is an enzyme, a functional domain of the enzyme, or a variant thereof.
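The link between transition-state stabilization and reaction rate can be made quantitative with transition-state (Eyring) theory. The relation below is a standard textbook result included for illustration, not a method of this disclosure; it assumes the enzyme leaves the pre-exponential factor unchanged, with k_B the Boltzmann constant, h the Planck constant, R the gas constant, and T the absolute temperature.

```latex
k = \frac{k_B T}{h}\, e^{-\Delta G^{\ddagger}/RT}
\qquad\Longrightarrow\qquad
\frac{k_{\mathrm{cat}}}{k_{\mathrm{uncat}}}
  = e^{\left(\Delta G^{\ddagger}_{\mathrm{uncat}} - \Delta G^{\ddagger}_{\mathrm{cat}}\right)/RT}
  = e^{\Delta\Delta G^{\ddagger}/RT}
```

For example, at 310 K (RT ≈ 2.58 kJ/mol), lowering the barrier by 10 kJ/mol gives a rate enhancement of about e^3.88 ≈ 48; because the dependence is exponential, each further 10 kJ/mol multiplies the rate by roughly the same factor again.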
For any protein of the present disclosure, biological equivalents thereof are also provided. In some embodiments, the biological equivalents have at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with the reference protein. Preferably, the biological equivalents retain the desired activity of the reference protein. In some embodiments, the biological equivalents are derived by including one, two, three, four, five, or more amino acid additions, deletions, substitutions, or combinations thereof. In some embodiments, the substitution is a conservative amino acid substitution.
tRNA-like structure
Transfer RNA (tRNA) is an essential part of the protein biosynthesis machinery, mediating the transfer of amino acids to ribosomes. During translation, an aminoacyl-tRNA synthetase (aaRS) catalyzes attachment of an amino acid to the 3’-CCA end of a tRNA. The charged tRNA enters the ribosome, where its anticodon interacts with one of the codons of a messenger RNA (mRNA). The majority of tRNAs are single polynucleotides with a cloverleaf secondary structure. The cloverleaf structure consists of an acceptor stem, a D-arm, an anticodon stem-loop, and a T-arm, and typically folds into an L-shaped three-dimensional structure (Wu et al., 2021).
tRNA-like structures (TLSs) were first identified in the positive-strand RNA genome of turnip yellow mosaic virus (TYMV) in 1970 (Pierre et al., 1970). Since their discovery, many more TLSs have been reported not only in plant viruses but also in other viruses, bacteria, and eukaryotes (Sherlock et al., 2021; Mans et al., 1992).
For the purpose of the present disclosure, the term “tRNA-like structure” refers to any RNA structure that has the ability to serve as a substrate for a tRNA-specific enzyme such as RNase P, RNase Z, and RNase E. In some embodiments, the tRNA-like structure has the ability to serve as a substrate for both RNase P and RNase Z.
Ribonuclease P (RNase P) activity is ubiquitously required in all three domains of life (Archaea, Eubacteria, and Eukarya) for the processing and maturation of tRNAs, together with RNase Z, which processes the 3’ termini of tRNA precursors (Altman et al., 1999; Gopalan et al., 2002). Recent studies have also identified additional RNA targets that are cleaved by RNase P and RNase Z during their maturation, including two long non-coding RNAs (lncRNAs), metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) (Ji et al., 2003) and nuclear paraspeckle assembly transcript 1 (NEAT1) (Sunwoo et al., 2009). The MALAT1 gene is transcribed by RNA polymerase II and generates a precursor RNA transcript after cleavage/polyadenylation (Jeremy et al., 2008). This precursor RNA transcript is further cleaved by RNase P to form the mature MALAT1 transcript, which is mainly located in the nucleus (Jeremy et al., 2008). Although the mature MALAT1 transcript is regarded as non-coding, the triplex structure at its 3’ end has been shown to support translation in a manner similar to a poly(A) tail (Jeremy et al., 2012; Jessica et al., 2014). The 3’ end product of RNase P cleavage is then cut by RNase Z (Jeremy et al., 2008). The 5’ end product of RNase Z cleavage then receives a CCA tail and forms a mature 61-nt mascRNA (MALAT1-associated small cytoplasmic RNA), which is exported to the cytoplasm (Jeremy et al., 2008). Mechanistically, the mascRNA structurally resembles a tRNA, and this resemblance is sufficient for RNase P and RNase Z cleavage (Jeremy et al., 2008).
RNase E is the largest RNase (118 kDa), comprising 1,061 amino acids. It is encoded by the E. coli rne gene. RNase E consists of two functionally distinct domains: the globular N-terminal half (NTH; residues 1-529) and the C-terminal half (CTH; residues 530-1,061). The RNase E-NTH has been shown to contain a catalytic domain for RNA cleavage, including a specific RNA binding site and a cleavage site. Because the RNase E-NTH carries the RNA-processing activity, such as RNA recognition and degradation, recombinant RNase E comprising amino acids 1-529 has been shown to be sufficient for catalytic activity. In contrast, the CTH contains an arginine-rich domain, which is commonly involved in protein binding. The RNase E-CTH provides a scaffolding core for polynucleotide phosphorylase (PNPase), RNA helicase B, enolase, polyphosphate kinase, poly(A) polymerase, GroEL, and DnaK, which cooperate to direct RNA toward the degradosome (Baek et al., 2019).
Application of tRNA-like structure
The present disclosure provides a novel idea of combining mRNA and one or more non-coding RNAs in one transcript (“precursor transcript”) and separating each component by one or more tRNA-like structures. The presence of the tRNA-like structure(s) makes this precursor transcript subject to cleavage by intracellular RNases such as RNase P and RNase Z. After the cleavage, the tRNA-like structure(s) are cut off, and the full-length precursor transcript releases the mRNA molecule and the non-coding RNA(s) (see Fig. 1). This novel design allows the non-coding RNAs to be transcribed from the same promoter as the mRNA, thus simplifying production of the non-coding RNA(s).
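The release of separate RNAs from one precursor transcript can be sketched as string processing. In this simplified model, which is an assumption for illustration, RNase P cuts at the 5’ boundary and RNase Z at the 3’ boundary of every tRNA-like structure, and the TLS itself is discarded; exact cleavage positions, CCA addition, and further trimming are not modelled. The function name and all sequences are hypothetical placeholders, not the TLS of SEQ ID NOs: 4-7.

```python
def process_precursor(transcript: str, tls: str) -> list[str]:
    """Simulate RNase P (5') and RNase Z (3') cleavage at every TLS copy.

    Returns the released fragments in 5'->3' order with each TLS excised.
    """
    fragments = []
    start = 0
    while True:
        site = transcript.find(tls, start)
        if site == -1:
            break
        fragments.append(transcript[start:site])  # RNase P cut at the TLS 5' end
        start = site + len(tls)                   # RNase Z cut at the TLS 3' end
    fragments.append(transcript[start:])
    return [f for f in fragments if f]  # drop empty pieces (e.g. adjacent TLSs)


# Hypothetical placeholders; real constructs would use full-length sequences.
MRNA = "AUGGCCAAAUAA"   # stand-in for CDS + mRNA-stabilizing sequence
TLS = "GCCGGGUAGCUCA"   # stand-in for a tRNA-like structure
GRNA_1, GRNA_2 = "GUUUUAGAGCUA", "GUUUAAGAGCUA"

precursor = MRNA + TLS + GRNA_1 + TLS + GRNA_2
print(process_precursor(precursor, TLS))
# → ['AUGGCCAAAUAA', 'GUUUUAGAGCUA', 'GUUUAAGAGCUA']
```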
In one embodiment, this novel design can be applied in gene editing systems. The present disclosure provides a novel way to express the gene editing protein and the gRNA with high efficiency by combining the mRNA and gRNA in one transcript and separating them with the tRNA-like structure. After cleavage by intracellular RNase P and RNase Z, the full-length precursor transcript releases one mRNA molecule encoding the gene editing protein(s) and one or more gRNAs (Fig. 1). Because the mRNA and gRNA are linked in one transcript, the gRNA does not need to be driven by a separate promoter in plasmids or viral vectors. When in vitro-synthesized mRNA/gRNAs are used to mediate gene editing, mRNA-gRNA fusions can be produced simply by in vitro transcription (IVT), avoiding the chemical synthesis of long gRNAs. This design is particularly suitable for editing multiple targets, which requires two or more gRNAs.
In some embodiments, the expression cassette in plasmids or viral vectors comprises (a) a eukaryotic RNA Pol II promoter, (b) a coding sequence of gene editing protein, (c) one or more triple helix structures (such as from lncRNA genes) , (d) one or more tRNA-like structures (such as from lncRNA genes) , (e) one or more gRNA sequences, and (f) a poly (A) signal. (Fig. 1)
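The cassette layout in (a)-(f) can be represented as an ordered list of parts. This is a hypothetical sketch: the `CassettePart` and `assemble_cassette` names are illustrative, the part sequences are placeholders, only one triple helix is included, and placing one TLS copy in front of every gRNA is an assumption modelled on the multi-sgRNA construct of Fig. 1B rather than a stated requirement.

```python
from dataclasses import dataclass


@dataclass
class CassettePart:
    label: str     # hypothetical part label, e.g. "promoter", "CDS", "TLS"
    sequence: str  # DNA sequence of the part (placeholders in the example)


def assemble_cassette(promoter, cds, triplex, tls, grnas, polya_signal):
    """Order parts (a)-(f): promoter, CDS, triple helix, then one
    TLS + gRNA unit per guide, then the poly(A) signal."""
    if not grnas:
        raise ValueError("at least one gRNA sequence is required")
    parts = [
        CassettePart("promoter", promoter),
        CassettePart("CDS", cds),
        CassettePart("triplex", triplex),
    ]
    for grna in grnas:
        # Assumed layout: each guide is preceded by its own TLS copy.
        parts += [CassettePart("TLS", tls), CassettePart("gRNA", grna)]
    parts.append(CassettePart("polyA", polya_signal))
    return parts


cassette = assemble_cassette("TATA", "ATGTAA", "TTTT", "GGGG", ["AAAA", "CCCC"], "AATAAA")
print([part.label for part in cassette])
# → ['promoter', 'CDS', 'triplex', 'TLS', 'gRNA', 'TLS', 'gRNA', 'polyA']
```

Keeping the parts as labelled objects rather than one concatenated string makes it easy to check the order before the sequences are joined into a DNA template.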
In some embodiments, this novel design can be applied to transformer base editor systems. For example, in some embodiments, a main guide RNA (mgRNA) and/or a helper guide RNA (hgRNA), and an mRNA encoding a Cas protein or a base editor protein, are encoded by one polynucleotide and separated by one or more tRNA-like structures.
In some embodiments, this novel design can be applied to gene editing systems for editing mammalian genes.
In an aspect, the present disclosure provides an RNA comprising a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence. The tRNA-like structure is used to connect the mRNA and the non-coding RNA sequences. Cleavage of the tRNA-like structure by RNase P and RNase Z releases the mRNA and the one or more non-coding RNA sequences. In some embodiments, the tRNA-like structure is encoded by a polynucleotide sequence of any one of SEQ ID NOs: 4-7.
In some embodiments of the RNA disclosed herein, the RNA comprises from 5’ -end to 3’ -end the protein coding sequence, the tRNA-like structure (TLS) , and the non-coding RNA sequence.
In some embodiments of the RNA disclosed herein, the non-coding RNA sequence is a guide RNA (gRNA) sequence. In some embodiments of the RNA disclosed herein, the guide RNA (gRNA) comprises a spacer sequence and a scaffold sequence, wherein the scaffold sequence is capable of binding to the protein encoded by the protein coding sequence, and wherein the spacer sequence targets a target gene. In some embodiments, the spacer sequence is 20 nucleotides (nt) long. In some embodiments, the spacer sequence is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nt long. The spacer sequence can be changed for different targets. The scaffold sequence is designed according to the specific gene editing protein.
In some embodiments, the target gene is a mammalian gene. In some embodiments, the target gene is a human gene.
In some embodiments, the spacer sequence in the gRNA is selected from SEQ ID NOs: 204-260.
In some embodiments of the RNA disclosed herein, the scaffold sequence comprises at least one protein-binding motif, wherein the protein-binding motif is an RNA aptamer motif or a variant thereof. Aptamers are single-stranded oligonucleotides that fold into defined architectures and selectively bind to a specific target, including proteins, peptides, carbohydrates, small molecules, toxins, and even live cells.
In some embodiments, the protein binding motif is selected from MS2, PP7, boxB, SfMu hairpin motif, telomerase Ku, and Sm7 binding motif, or a variant thereof. The MS2 phage operator stem-loop binds to the MS2 coat protein (MCP) . The boxB binds to the N22p. The telomerase Ku binding motif binds to the Ku. The telomerase Sm7 binding motif binds to the Sm7 protein. The PP7 phage operator stem-loop binds to the PP7 coat protein (PCP) . The SfMu phage Com stem-loop binds to the Com RNA binding protein. (Table 2)
Table 2 Protein binding motif
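The motif/binder pairs listed above can be held in a lookup table and used to check a design, for instance that every protein-binding motif placed on a guide scaffold has its cognate RNA-binding domain present in one of the fusion proteins. This is a hypothetical design-check sketch; the dictionary keys and the `unmatched_motifs` helper are illustrative and do not reproduce Table 2.

```python
# Motif -> cognate RNA-binding protein, as listed in the paragraph above.
COGNATE_BINDER = {
    "MS2 stem-loop": "MCP",
    "PP7 stem-loop": "PCP",
    "boxB": "N22p",
    "telomerase Ku binding motif": "Ku",
    "telomerase Sm7 binding motif": "Sm7",
    "SfMu Com stem-loop": "Com",
}


def unmatched_motifs(scaffold_motifs, fused_binders):
    """Return scaffold motifs whose cognate binder is absent from the
    RNA-binding domains available in the fusion proteins."""
    available = set(fused_binders)
    return [m for m in scaffold_motifs if COGNATE_BINDER[m] not in available]


# Hypothetical design: a guide carrying MS2 and boxB, but only MCP is fused.
print(unmatched_motifs(["MS2 stem-loop", "boxB"], ["MCP"]))  # → ['boxB']
```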
In some embodiments of the RNA disclosed herein, the RNA further comprises an mRNA-stabilizing sequence between the protein coding sequence and the tRNA-like structure (TLS) .
As used herein, an “mRNA-stabilizing sequence” refers to a sequence that enhances mRNA stability. In some embodiments, the mRNA-stabilizing sequence lowers the in vivo degradation rate of the mRNA. In some embodiments, the mRNA-stabilizing sequence enhances mRNA translation.
In some embodiments of the RNA disclosed herein, the mRNA-stabilizing sequence is a triple helix sequence, a poly(A), or a histone stem-loop. An RNA triple helix is a specific RNA tertiary interaction in which double-stranded RNA stems make hydrogen-bond contacts with a third strand of RNA. An RNA triple helix consists of three strands: a Watson-Crick RNA double helix whose major groove forms hydrogen bonds with the so-called “third strand” (Brown et al., 2020). The triple helix structure located downstream of the protein coding sequence is used to stabilize the mRNA and promote translation. A poly(A) sequence is a long chain of adenine nucleotides; poly(A) is described in detail below. A histone stem-loop is a stem-loop structure in the 3’-UTR of histone mRNAs, for example, metazoan histone mRNAs. The histone stem-loop can be bound by a 31 kDa stem-loop binding protein (SLBP).
In some embodiments of the RNA disclosed herein, the triple helix sequence is derived from a long non-coding RNA (lncRNA) gene. Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA generally defined as transcripts longer than 200 nucleotides that are not translated into protein. Long non-coding RNAs include, for example, intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs. In some embodiments of the RNA disclosed herein, the lncRNA gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1). In some embodiments, the triple helix is a MALAT1 triple helix of SEQ ID NO: 1 or 3. In some embodiments, the triple helix is a NEAT1 triple helix of SEQ ID NO: 2.
In some embodiments of the RNA disclosed herein, the RNA further comprises a 5’ -untranslated region (5’ -UTR) at the 5’ -end of the RNA, and/or a 3’ -untranslated region (3’ -UTR) at the 3’ -end of the RNA. In some embodiments of the RNA disclosed herein, the RNA further comprises a poly-A sequence at the 3’ -end of the RNA and/or a 5’ -Cap structure at the 5’ -end of the RNA.
The untranslated region (UTR) is a regulatory region situated at the 5’ or 3’ end of a coding region. The 5’ UTR is directly upstream of the initiation codon of the protein coding sequence. In some embodiments, a 5’ cap structure is further added to the 5’ UTR. The 3’ UTR immediately follows the translation termination codon of the protein coding sequence. The 3’ UTR often contains regions for post-transcriptional regulation, such as polyadenylation, localization, and stability of the mRNA. In some embodiments, the 3’ UTR is further connected to a poly(A) tail. The 5’ UTR is mainly involved in translation of its downstream open reading frame, while the 3’ UTR mainly functions to maintain mRNA stability.
In some embodiments of the RNA sequence described herein, the RNA sequence further comprises a 5’ cap at the 5’ end of the RNA sequence. A 5’ cap refers to a specially altered nucleotide on the 5’ end of the RNA sequence. According to the degree of methylation, three main cap structures are possible: cap 0, cap 1, and cap 2. A cap 0 structure is the most elementary, namely m7GpppNp; however, an mRNA with cap 0 is likely to be recognized as exogenous RNA by the host, which could stimulate the host innate immune response and ultimately trigger inflammatory responses. A cap 1 structure (m7GpppN1mp) has a methylated 2’-OH on the first nucleotide connecting the 5’ end of the mRNA to the cap. Because the cap 1 structure has to date been described only in eukaryotic mRNAs, it can serve as a signature of self-RNA, thus reducing the activation of pattern recognition receptors (PRRs) and consequently improving the translation efficiency of mRNA in vivo. Lastly, cap 2 (m7GpppN1mpN2mp) has a methylated 2’-OH on both the first and second nucleotides that connect the 5’ end of the mRNA to the cap, and this methylation improves mRNA translation efficiency. In some embodiments, the 5’ cap has a cap 1 structure. In some embodiments, the 5’ cap is methylated, e.g., m7GpppN, wherein N is the 5’ terminal nucleotide of the nucleic acid carrying the 5’ cap. In some embodiments, the 5’ cap structure is selected from glyceryl, inverted deoxy abasic residue, 4’,5’-methylene nucleotide, 1-(beta-D-erythrofuranosyl) nucleotide, 4’-thio nucleotide, carbocyclic nucleotide, 1,5-anhydrohexitol nucleotide, L-nucleotides, alpha-nucleotide, modified base nucleotide, threo-pentofuranosyl nucleotide, acyclic 3’,4’-seco nucleotide, acyclic 3,4-dihydroxybutyl nucleotide, acyclic 3,5-dihydroxypentyl nucleotide, 3’-3’-inverted nucleotide moiety, 3’-3’-inverted abasic moiety, 3’-2’-inverted nucleotide moiety, 3’-2’-inverted abasic moiety, 1,4-butanediol phosphate, 3’-phosphoramidate, hexylphosphate, aminohexyl phosphate, 3’-phosphate, 3’-phosphorothioate, phosphorodithioate, bridging methylphosphonate moiety, and non-bridging methylphosphonate moiety.
Any method known in the art can be used to add the 5’ cap. For example, enzymatic capping can be used: RNA 5’-triphosphatase (RTPase) hydrolyzes the 5’ γ-phosphate of the RNA, leaving a 5’-diphosphate RNA, and guanylyltransferase (GTase) then transfers guanosine monophosphate (GMP) to the resulting 5’-end β-phosphate to form GpppNp-RNA. Finally, the guanosine moiety is methylated by a cap-specific S-adenosylmethionine (AdoMet)-dependent (guanine-N7) methyltransferase (N7MTase), forming a cap 0 structure (m7GpppNp). The cap 0 structure can be further modified to cap 1 (m7GpppN1mp) by a 2’-O-methyltransferase (2’-O-MTase).
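The enzymatic capping sequence above is an ordered pathway in which each enzyme needs the product of the previous one. The sketch below models it as a simple state machine over the 5’-end structure; the simplified notation (“pppN”, “m7GpppNm” for cap 1, dropping the 3’ phosphate), the step table, and the `cap_rna` helper are hypothetical conveniences, not a description of any capping kit.

```python
# Each step: (enzyme, 5'-end structure before, structure after).
CAPPING_STEPS = [
    ("RTPase", "pppN", "ppN"),               # remove the 5' gamma-phosphate
    ("GTase", "ppN", "GpppN"),               # transfer GMP onto the 5'-diphosphate end
    ("N7MTase", "GpppN", "m7GpppN"),         # N7-methylate the cap guanosine (cap 0)
    ("2'-O-MTase", "m7GpppN", "m7GpppNm"),   # 2'-O-methylate nucleotide 1 (cap 1)
]


def cap_rna(five_prime_end: str, enzymes: list[str]) -> str:
    """Apply capping enzymes in order; each must find its expected substrate."""
    step_by_enzyme = {enzyme: (before, after) for enzyme, before, after in CAPPING_STEPS}
    state = five_prime_end
    for enzyme in enzymes:
        before, after = step_by_enzyme[enzyme]
        if state != before:
            raise ValueError(f"{enzyme} cannot act on {state}-RNA")
        state = after
    return state


all_enzymes = [enzyme for enzyme, _, _ in CAPPING_STEPS]
print(cap_rna("pppN", all_enzymes[:3]))  # → m7GpppN  (cap 0)
print(cap_rna("pppN", all_enzymes))      # → m7GpppNm (cap 1)
```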
In some embodiments, the mRNA further comprises a poly(A) tail at its 3’ end. In some embodiments, the poly(A) tail is located downstream of all the non-coding RNA sequences, and the corresponding poly(A) signal is used to terminate transcription from the RNA Pol II promoter. Frequently used poly(A) signals include, but are not limited to, BGHpA, TKpA, hGHpA, and SV40pA. The poly(A) tail plays an important role in maintaining mRNA stability and translation efficiency: it improves mRNA stability by inhibiting exonuclease-mediated mRNA degradation, and it can bind multiple poly(A)-binding proteins (PABPs) while working synergistically with the 5’ m7G cap to regulate translational efficiency. Polyadenylation can be done by traditional enzymatic polyadenylation, adding the poly(A) tail to the 3’ end of the mRNA, or by designing a fixed-length poly(A) sequence on a DNA template and transcribing the resulting length-controllable poly(A) tail. The poly(A) tail is a long chain of adenine nucleotides that is added to the 3’ end of an mRNA molecule. In some embodiments, the length of the poly(A) tail is at least 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, or 500 nucleotides. In some embodiments, the length of the poly(A) tail is adjusted to control the stability of the mRNA molecule disclosed herein. For example, since the length of the poly(A) tail can influence the half-life of the mRNA molecule, it can be adjusted to modify the resistance of the mRNA to nucleases and thereby control the time course of protein expression.
In some embodiments of the RNA disclosed herein, the RNA further comprises a sequence encoding a nuclear localization signal (NLS), wherein the sequence encoding the NLS is located at the 5’-end and/or 3’-end of the protein coding sequence. In some embodiments of the RNA disclosed herein, the RNA comprises more than one sequence that each encodes an NLS, wherein the encoded NLSs are the same or different.
Nuclear localization signals (NLSs) are generally short peptides that mediate the transport of proteins from the cytoplasm into the nucleus. An NLS is recognized by the corresponding nuclear transport receptors, which interact with nucleoporins to help NLS-containing proteins enter the nucleus through nuclear pore complexes. Multiple types of NLS have been identified in the art (Lu et al., 2021). For example, classical NLSs include monopartite NLSs (MP NLSs) and bipartite NLSs (BP NLSs). An MP NLS is a single cluster of 4-8 basic amino acids that generally contains 4 or more positively charged residues, that is, arginine (R) or lysine (K). The characteristic motif of MP NLSs is usually defined as K(K/R)X(K/R), where X can be any residue. For example, the NLS of the SV40 large T-antigen is PKKKRKV (residues 126-132), with five consecutive positively charged amino acids (KKKRK). BP NLSs are characterized by two clusters of 2-3 positively charged amino acids separated by a 9-12 amino acid linker region, which contains several proline (P) residues [16]. The consensus sequence can be expressed as R/K(X)10-12KRXK. Notably, in BP NLSs, the upstream and downstream clusters are interdependent and indispensable, and jointly determine the localization of the protein in the cell. For instance, the BP NLS at the C-terminus of nucleoplasmin, KRPAATKKAGQAKKKK (residues 155-170), can direct the protein into the nucleus. There are also non-classical NLSs, such as the “proline-tyrosine” category, named PY-NLS. A PY-NLS spans 20-30 amino acids that adopt a disordered structure, consisting of an N-terminal hydrophobic or basic motif and a C-terminal R/K/H(X)2-5PY motif (where X2-5 is any sequence of 2-5 residues).
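The motifs described above can be written as regular expressions. The sketch below, with hypothetical names, scans a protein sequence for the monopartite motif K(K/R)X(K/R) and for the C-terminal R/K/H(X)2-5PY motif of PY-NLSs. Hits are only candidates: the sketch does not check the upstream hydrophobic or basic motif or the disordered structure that a PY-NLS also requires, and the PY test sequence is synthetic.

```python
import re
from typing import List, Tuple

# Motifs transcribed from the text above; "." stands for X (any residue).
# Lookaheads let overlapping candidate motifs all be reported.
MP_NLS = re.compile(r"(?=(K[KR].[KR]))")        # K(K/R)X(K/R)
PY_NLS_C = re.compile(r"(?=([RKH].{2,5}?PY))")  # R/K/H(X)2-5PY, shortest spacer


def scan(pattern: "re.Pattern[str]", protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched peptide) for every candidate motif."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


print(scan(MP_NLS, "PKKKRKV"))        # SV40 large T-antigen NLS: [(1, 'KKKR'), (2, 'KKRK')]
print(scan(PY_NLS_C, "AAARGGSPYAA"))  # synthetic example: [(3, 'RGGSPY')]
```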
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes a nucleotide binding protein. In some embodiments, the nucleotide binding protein is a DNA binding protein. In some embodiments, the nucleotide binding protein is an RNA binding protein. As used herein, “nucleotide binding protein” refers to a protein that is capable of binding to a single-stranded or double-stranded polynucleotide, for example, DNA or RNA. As used herein, “RNA binding protein” (RBP) refers to a protein that is capable of binding to double-stranded or single-stranded RNA to form a ribonucleoprotein complex. Classic RBPs are characterized by the presence of one or more RNA-binding domains (RBDs). Most RBDs have defined 3D structures or features that make them computationally predictable. Classic RBDs include the prevalent RNA recognition motif (RRM), the K-homology (KH), DEAD/DEAH helicase and zinc-finger domains, and around 30 other domains of lesser abundance. Recent unbiased RNA interactome approaches have revealed additional unconventional RBPs that lack discernible RBDs but frequently contain intrinsically disordered regions or mononucleotide and dinucleotide binding domains that directly engage in RNA binding (Gebauer et al., 2021). In some embodiments, the RNA binding protein is an RNA-guided gene editing protein.
RNA-guided systems, which use complementarity between a guide RNA and target nucleic acid sequences to recognize genetic elements, have a central role in biological processes in both prokaryotes and eukaryotes. For example, CRISPR-Cas systems are well-studied prokaryotic RNA-guided systems. OMEGA (Obligate Mobile Element-guided Activity) is a recently identified class of RNA-guided systems. It encompasses an RNA-guided endonuclease protein (for example, TnpB, IscB or IsrB) and a non-coding RNA (ncRNA) transcribed from the transposon end region (called ωRNA). OMEGA systems are considered the likely ancestors of CRISPR-Cas systems, and TnpB is thought to have evolved into the single RNA-guided endonuclease Cas12. TnpB also shares remote homology with Fanzor proteins.
In some embodiments, the RNA binding protein is an RNA-guided endonuclease protein in the CRISPR-Cas system. In some embodiments, the RNA binding protein is an RNA-guided endonuclease protein in the OMEGA system.
In some embodiments of the RNA disclosed herein, the RNA binding protein is an IscB protein or a variant thereof. In some embodiments, the IscB protein is OgeuIscB or AwaIscB. In some embodiments, the IscB protein or a variant thereof has the amino acid sequence of SEQ ID NO: 202 or 203. Transposon-encoded IscB family proteins are RNA-guided nucleases in the OMEGA (obligate mobile element-guided activity) system, and likely ancestors of the RNA-guided nuclease Cas9 in the type II CRISPR-Cas adaptive immune system. IscB associates with its cognate ωRNA to form a ribonucleoprotein complex that cleaves double-stranded DNA targets complementary to an ωRNA guide segment. IscB (insertion sequences Cas9-like OrfB) proteins are encoded in a distinct family of IS200/IS605 transposons. While IscB and Cas9 share the RuvC-like nuclease domain, which contains three conserved catalytic motifs (RuvC-I-III) with an inserted Arg-rich segment known as the bridge helix (BH), and the HNH nuclease domain, IscB (~400 residues) is much smaller than Cas9 (~1000-1400 residues), mainly due to the lack of the α-helical recognition (REC) lobe. Unlike Cas9, IscB contains an amino-terminal PLMP domain (named according to its distinct amino acid motif). IscB associates with a ~200-400-nt non-coding RNA (referred to as ωRNA), which is substantially larger than the ~100-nt crRNA:tracrRNA guides of Cas9, to form a ribonucleoprotein complex that cleaves dsDNA targets complementary to a 5’ guide sequence in the ωRNA. IscB requires a target adjacent motif (TAM) for target DNA recognition, although its carboxy-terminal region lacks detectable sequence similarity with the equivalent PAM-interacting (PI) carboxy-terminal domain of Cas9. Among the diverse IscB orthologs, an IscB protein derived from the human gut metagenome (OgeuIscB) exhibits DNA cleavage activity in human cells and could potentially be used as a new genome-editing tool (Kato et al., 2022).
In some embodiments of the RNA disclosed herein, the RNA binding protein is a TnpB protein or a variant thereof. TnpB, encoded by the tnpB gene, is a programmable RNA-guided DNA endonuclease that cleaves double- and single-stranded DNA substrates in an RNA-guided manner (Karvelis et al., 2021).
In some embodiments of the RNA disclosed herein, the RNA binding protein is an IsrB protein or a variant thereof. IsrB is a nickase homologous to IscB that lacks the HNH nuclease domain. IsrB proteins are encoded in the IS200/IS605 superfamily of transposons. IsrB consists of only around 350 amino acids, but its small size is counterbalanced by a relatively large RNA guide (an ωRNA of roughly 300 nt) (Hirano et al., 2022).
In some embodiments of the RNA disclosed herein, the RNA binding protein is a Fanzor (Fz) protein or a variant thereof. Fanzor (Fz) is a eukaryotic TnpB-IS200/IS605-like protein encoded by transposable elements; it was initially suggested that Fz proteins (and prokaryotic TnpBs) regulate transposable element activity, possibly through methyltransferase activity. It has since been reported that Fanzor proteins can cleave DNA (Saito et al., 2023).
In some embodiments of the RNA disclosed herein, the RNA binding protein is a Cas protein or a variant thereof. In some embodiments of the RNA disclosed herein, the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9). Cas9, a programmable RNA-guided DNA endonuclease, is the effector component of type II CRISPR-Cas adaptive immune systems. Cas9 associates with a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) (or a synthetic single-guide RNA) and, using its RuvC and HNH nuclease domains, cleaves double-stranded DNA (dsDNA) targets complementary to the ~20-nucleotide (nt) crRNA guide segment derived from CRISPR spacers. In addition to guide-target complementarity, Cas9 requires a specific nucleotide motif adjacent to the target sequence, the protospacer adjacent motif (PAM), for DNA recognition. Some Cas9 proteins, such as Streptococcus pyogenes Cas9 (SpCas9), exhibit robust DNA cleavage activity in mammalian cells and have been harnessed for a variety of molecular technologies, including genome editing, base editing, and transcriptional regulation.
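Target recognition by Cas9 combines guide complementarity with an adjacent PAM. As an illustration, assuming SpCas9 with its commonly used NGG PAM and a 20-nt protospacer, candidate target sites on one DNA strand can be listed as below. The function name is hypothetical; only the given strand is scanned, and off-target, chromatin, and guide-efficiency effects are not considered.

```python
import re
from typing import List, Tuple


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each NGG PAM on the given strand.

    Only the supplied strand is scanned; a complete search would also scan the
    reverse complement. Non-ACGT characters (e.g. N) break a candidate site.
    """
    dna = dna.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


protospacer = "GACGTTACCGGATCAACGTA"  # arbitrary 20-nt example
print(find_spcas9_sites(protospacer + "TGG" + "AC"))
# [(0, 'GACGTTACCGGATCAACGTA', 'TGG')]
```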
In some embodiments of the RNA disclosed herein, the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR Cas9, EQR Cas9, VRER Cas9, Cas9-NG, xCas9, eCas9, SpCas9-HF1, HypaCas9, HiFiCas9, sniper-Cas9, SpG, SpRY, KKH SaCas9, CjCas9, Cas9-NRRH, Cas9-NRCH, Cas9-NRTH, SsCpf1, PcCpf1, BpCpf1, LiCpf1, PmCpf1, Lb2Cpf1, PbCpf1, PeCpf1, PdCpf1, MbCpf1, EeCpf1, CmtCpf1, BsCpf1, BhCas12b, AkCas12b, BsCas12b, AmCas12b, AaCas12b, RfxCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b. The term “Cas protein” or “clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein” refers to RNA-guided nucleases associated with the CRISPR adaptive immunity system in Streptococcus pyogenes, as well as in other bacteria. Cas proteins include Cas9 proteins, Cas12a (Cpf1) proteins, Cas12b (formerly known as C2c1) proteins, Cas13 proteins, and various engineered counterparts. Table 3 lists exemplary Cas proteins.
Table 3 Exemplary Cas Proteins
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes a Cas protein fused or linked to a protein having enzymatic activity. In some embodiments, the protein having enzymatic activity is a nucleobase deaminase. As used herein, the term “nucleobase deaminase” refers to a group of enzymes that catalyze the hydrolytic deamination of nucleobases, for example in cytidine, deoxycytidine, adenosine and deoxyadenosine. Non-limiting examples of nucleobase deaminases include cytidine deaminases and adenosine deaminases.
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is a cytidine deaminase or a variant thereof. “Cytidine deaminase” refers to enzymes that catalyze the hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. Cytidine deaminases help maintain the cellular pyrimidine pool. One family of cytidine deaminases is APOBEC (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like”), whose members are C-to-U editing enzymes. Some APOBEC family members have two domains: a catalytic domain and a pseudocatalytic domain. More specifically, the catalytic domain is a zinc-dependent cytidine deaminase domain that is important for cytidine deamination. RNA editing by APOBEC-1 requires homodimerization, and this complex interacts with RNA binding proteins to form the editosome.
Non-limiting examples of APOBEC proteins include APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced (cytidine) deaminase (AID).
Various mutants of APOBEC proteins are also known that confer different editing characteristics on base editors. For instance, certain mutants of human APOBEC3A (e.g., W98Y, Y130F, Y132D, W104A, D131Y and P134Y) have been reported to outperform wild-type human APOBEC3A in editing efficiency or editing window. Accordingly, the term APOBEC and each of its family members also encompass variants and mutants that have a certain level (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%) of sequence identity to the corresponding wild-type APOBEC protein or its catalytic domain and that retain cytidine deaminase activity. The variants and mutants can be derived by amino acid additions, deletions and/or substitutions. In some embodiments, such substitutions are conservative substitutions.
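Variants such as Y130F are named by the wild-type residue, its 1-based position, and the substituted residue, and are often characterized by their percent identity to the wild-type sequence. The sketch below uses hypothetical helper names and a toy peptide rather than any APOBEC sequence. It computes ungapped identity over sequences assumed to be already aligned and of equal length, whereas identity values in practice are usually derived from a gapped alignment.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution written as <wt><1-based position><new>, e.g. 'Y130F'."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError("position out of range")
    if protein[pos - 1] != wt:
        # Guards against applying a mutation to the wrong reference sequence.
        raise ValueError(f"expected {wt} at {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + new + protein[pos:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


toy_wt = "ACDEFGHIKL"  # toy 10-residue peptide
toy_mut = apply_substitution(toy_wt, "F5Y")
print(toy_mut, percent_identity(toy_wt, toy_mut))  # ACDEYGHIKL 90.0
```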
In some embodiments of the RNA disclosed herein, the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
In some embodiments of the gene editing system described herein, the cytidine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 166-201.
In some embodiments of the gene editing system described herein, the cytidine deaminase is a naturally occurring cytidine deaminase, an engineered cytidine deaminase, an evolved cytidine deaminase, or an adenosine deaminase that possesses cytidine deaminase activity.
In some embodiments of the gene editing system described herein, the cytidine deaminase is a human or mouse cytidine deaminase.
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is an adenosine deaminase or a variant thereof. “Adenosine deaminase” refers to an enzyme of purine metabolism that catalyzes the irreversible deamination of adenosine and deoxyadenosine to inosine and deoxyinosine, respectively.
In some embodiments of the RNA disclosed herein, the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA), adenosine deaminase 2 (ADA2), adenosine deaminase like (ADAL), adenosine deaminase domain containing 1 (ADAD1), adenosine deaminase domain containing 2 (ADAD2), and adenosine deaminase RNA specific (ADAR).
In some embodiments of the gene editing system described herein, the adenosine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 73-165.
In some embodiments of the gene editing system described herein, the adenosine deaminase is a naturally occurring adenosine deaminase, an engineered adenosine deaminase, an evolved adenosine deaminase, or a cytidine deaminase that possesses adenosine deaminase activity.
In some embodiments of the gene editing system described herein, the adenosine deaminase is a human or mouse adenosine deaminase.
In some embodiments of the RNA disclosed herein, the protein coding sequence encodes a base editor protein. A base editor protein refers to a class of gene editing enzymes comprising an RNA-guided gene editing protein fused or linked to a nucleobase deaminase or a catalytic domain thereof. In some embodiments, a base editor protein comprises a Cas9 nickase fused to a cytidine deaminase or an adenosine deaminase.
In some embodiments of the RNA disclosed herein, the base editor protein is a cytidine base editor (CBE) protein. In some embodiments of the RNA disclosed herein, the CBE protein is selected from BE3, YE1-BE3, YEE-BE3, BE4, eBE, hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-Y132D, eA3A-BE3, SaKKH-BE3, Target-AID, dCas12a-BE, BEACON1, BEACON2, enAsBE, PBE, and A3A-PBE.
In some embodiments of the RNA disclosed herein, the base editor protein is an adenosine base editor (ABE) protein. In some embodiments of the RNA disclosed herein, the ABE protein is selected from ABE7.10, ABE8e, ABE8e-V106W, LbABE8e, STEME-1, ABE-P1, ABE-P2, and rBE14.
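The net effect of a base editor on its target can be approximated in a few lines. This sketch assumes an editing window of protospacer positions 4-8 (counting the PAM-distal nucleotide as position 1), a window commonly reported for BE3-type editors; actual windows differ between the editors listed above, and the model converts every target base in the window even though real editing is incomplete and position-dependent. The function name is hypothetical.

```python
from typing import Tuple


def simulate_base_edit(protospacer: str, editor: str,
                       window: Tuple[int, int] = (4, 8)) -> str:
    """Convert every target base inside the window (1-based, PAM-distal = 1).

    CBE: C -> T (cytidine deaminated to uridine, read as thymidine).
    ABE: A -> G (adenosine deaminated to inosine, read as guanosine).
    """
    conversions = {"CBE": ("C", "T"), "ABE": ("A", "G")}
    if editor not in conversions:
        raise ValueError("editor must be 'CBE' or 'ABE'")
    target, product = conversions[editor]
    start, end = window
    out = list(protospacer.upper())
    for i in range(start - 1, min(end, len(out))):
        if out[i] == target:
            out[i] = product
    return "".join(out)


site = "GAACCCAAGGTTCCAAGGTT"  # arbitrary 20-nt protospacer
print(simulate_base_edit(site, "CBE"))  # GAATTTAAGGTTCCAAGGTT
print(simulate_base_edit(site, "ABE"))  # GAACCCGGGGTTCCAAGGTT
```

Bases outside the assumed window (for example the A residues at positions 2-3 and 15-16) are left unchanged.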
In some embodiments of the RNA disclosed herein, the protein having enzymatic activity is a methylase or a reverse transcriptase. A methylase, also called a methyltransferase, adds methyl groups (-CH3) to adenine or cytosine bases; in bacterial restriction-modification systems, methylation within the recognition sequence protects the DNA from cleavage by the cognate restriction endonuclease. For example, the methylase is DNMT1, DNMT3a1, DNMT3a2, or DNMT3b. Reverse transcriptase is an RNA-dependent DNA polymerase. In some embodiments, the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (MMLV-RT), or a functional variant thereof.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) comprises an acceptor stem, a D-loop arm, and a TΨC-loop arm. The acceptor stem is a 7- to 9-base-pair (bp) stem formed by base pairing of the 5’-terminal nucleotides with the 3’-terminal nucleotides (the 3’ end carries the CCA terminal group used to attach the amino acid). The acceptor stem may contain non-Watson-Crick base pairs. The D-loop arm is a 4- to 6-bp stem ending in a loop that often contains dihydrouridine. The TΨC-loop (generally called the T-loop) contains ribothymidine, whose base thymine is usually found in DNA, and pseudouridine (Ψ).
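Base pairing in the acceptor stem can be checked programmatically. The sketch below assumes the canonical tRNA layout, in which the 3’ strand of the stem is followed by a single unpaired discriminator nucleotide and the terminal CCA, and classifies each pair as Watson-Crick, G·U wobble, or mismatch. Function and label names are hypothetical, and the input is a synthetic sequence, not one of the disclosed TLS sequences.

```python
from typing import List, Tuple

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def acceptor_stem(trna: str, stem_len: int = 7) -> List[Tuple[str, str, str]]:
    """Pair the first stem_len 5' nucleotides with their assumed 3' partners.

    Assumed layout: ...[3' stem strand][discriminator][C][C][A]-3'.
    """
    seq = trna.upper().replace("T", "U")  # accept DNA-style input
    if not seq.endswith("CCA"):
        raise ValueError("expected a 3'-terminal CCA")
    if len(seq) < 2 * stem_len + 4:
        raise ValueError("sequence too short for the requested stem")
    last = len(seq) - 5  # partner of the 5'-terminal nucleotide
    pairs = []
    for i in range(stem_len):
        five, three = seq[i], seq[last - i]
        if (five, three) in WATSON_CRICK:
            kind = "WC"
        elif (five, three) in WOBBLE:
            kind = "wobble"
        else:
            kind = "mismatch"
        pairs.append((five, three, kind))
    return pairs


# Synthetic example: 5' strand, filler, complementary 3' strand, discriminator, CCA.
toy = "GCGGAUU" + "A" * 10 + "AAUCCGC" + "A" + "CCA"
print(all(kind == "WC" for _, _, kind in acceptor_stem(toy)))  # True
```

The `stem_len` parameter allows the 7- to 9-bp stems described above to be checked.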
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) comprises a cleavage site for one or more of RNase P, RNase Z, and RNase E. In some embodiments, the tRNA-like structure comprises cleavage sites for both RNase P and RNase Z.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is derived from a tRNA gene or a long non-coding RNA (lncRNA) gene. In some embodiments of the RNA disclosed herein, the long non-coding RNA (lncRNA) gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1).
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is derived from a eukaryotic organism. In some embodiments of the RNA disclosed herein, the eukaryotic organism is selected from the group consisting of Saccharomyces cerevisiae, Arabidopsis thaliana, Oryza sativa, Homo sapiens, Macaca mulatta, Macaca fascicularis, Sus scrofa domestica, Canis lupus familiaris, Rattus norvegicus, and Mus musculus.
In some embodiments of the RNA disclosed herein, the tRNA-like structure (TLS) is encoded by any one of SEQ ID NOs: 4-7. In some embodiments, the tRNA-like structure has a nucleotide sequence with at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the sequence encoded by any one of SEQ ID NOs: 4-7.
In some embodiments of the RNA disclosed herein, the RNA comprises one or more protein coding sequences.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence, wherein the non-coding RNA sequences are the same or different. In some embodiments of the RNA disclosed herein, the RNA comprises one non-coding RNA sequence. In some embodiments of the RNA disclosed herein, the RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 non-coding RNA sequences.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence that is a guide RNA (gRNA), wherein the gRNAs are the same or different. In some embodiments of the RNA disclosed herein, the RNA comprises one gRNA. In some embodiments of the RNA disclosed herein, the RNA comprises more than one gRNA. In some embodiments of the RNA disclosed herein, the RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 gRNAs.
In some embodiments of the RNA disclosed herein, the RNA comprises more than one non-coding RNA sequence, and the RNA comprises a tRNA-like structure (TLS) between the protein coding sequence and the nearest non-coding RNA sequence, and between each pair of adjacent non-coding RNA sequences (Fig. 1B).
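The arrangement of Fig. 1B, and the processing it relies on, can be illustrated with a simple string model. In this sketch, which uses placeholder sequences and hypothetical function names, RNase P is assumed to cut exactly at the 5’ end of each TLS and RNase Z exactly at its 3’ end, so that the protein coding RNA and each non-coding RNA are released as separate fragments; RNA folding, CCA addition, and cap or tail effects are not modeled.

```python
from typing import List, Tuple


def build_transcript(cds: str, tls: str,
                     ncrnas: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Lay out CDS, then TLS + ncRNA for each ncRNA (arrangement as in Fig. 1B).

    Returns the transcript and the [start, end) coordinates of every TLS copy.
    """
    if not tls:
        raise ValueError("TLS sequence must be non-empty")
    parts = [cds]
    tls_spans = []
    pos = len(cds)
    for nc in ncrnas:
        tls_spans.append((pos, pos + len(tls)))
        parts.extend([tls, nc])
        pos += len(tls) + len(nc)
    return "".join(parts), tls_spans


def process(transcript: str, tls_spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Cut 5' of each TLS (assumed RNase P site) and 3' of it (assumed RNase Z site)."""
    cuts = sorted({0, len(transcript)} | {p for span in tls_spans for p in span})
    spans = set(tls_spans)
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        kind = "TLS" if (start, end) in spans else "product"
        fragments.append((kind, transcript[start:end]))
    return fragments


# "GGGGCCCC" is a placeholder string standing in for a real TLS.
rna, spans = build_transcript("AUGGCCUAA", "GGGGCCCC", ["GUUUUAGA", "GCUAGCUA"])
products = [seq for kind, seq in process(rna, spans) if kind == "product"]
print(products)  # ['AUGGCCUAA', 'GUUUUAGA', 'GCUAGCUA']
```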
In some embodiments of the RNA disclosed herein, the protein coding sequence is located upstream relative to all non-coding RNA sequences.
In some embodiments of the RNA disclosed herein, the RNA comprises at least one modified nucleotide. In some embodiments, the modification is selected from 2’-O-alkyl (such as 2’-O-methyl), 2’-substituted alkoxy, 2’-substituted alkyl, 2’-halo (such as 2’-fluoro), 3’-phosphorothioate, bridged nucleic acid (BNA), and locked nucleic acid (LNA).
In some embodiments, the RNA is codon-optimized.
In an aspect, the present disclosure provides a DNA encoding any of the RNA disclosed herein.
In some embodiments of the DNA disclosed herein, the DNA further comprises an RNA polymerase promoter at the 5’ -end.
In some embodiments of the DNA disclosed herein, the RNA polymerase promoter is a eukaryotic RNA polymerase II promoter.
In some embodiments of the DNA disclosed herein, the RNA polymerase promoter is selected from human cytomegalovirus immediate early enhancer/promoter (CMV promoter), human eukaryotic translation elongation factor 1 α1 promoter (EF1a promoter), CMV early enhancer fused to modified chicken β-actin promoter (CAG promoter), Simian virus 40 enhancer/early promoter (SV40 promoter), and human or mouse phosphoglycerate kinase 1 promoter (PGK promoter).
In an aspect, the present disclosure provides a system comprising any of the RNA disclosed herein, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
In an aspect, the present disclosure provides a system comprising any of the DNA disclosed herein and/or any of the vectors disclosed herein, an RNA polymerase II or a polynucleotide encoding the same, a ribonuclease P (RNase P) or a polynucleotide encoding the same, and a ribonuclease Z (RNase Z) or a polynucleotide encoding the same.
Vectors
In an aspect, the present disclosure provides a vector comprising any of the DNA disclosed herein.
In some embodiments of the vector disclosed herein, the vector is a viral vector or a plasmid.
Any method known in the art for inserting DNA fragments into a vector can be used to construct expression vectors comprising a polynucleotide disclosed herein. These methods can include in vitro recombinant DNA and synthetic techniques and in vivo (genetic) recombination. The polynucleotide disclosed herein can be operably linked to control sequences in the expression vector(s) to ensure protein expression. Such control sequences may include, but are not limited to, leader or signal sequences, promoters (e.g., naturally associated or heterologous promoters), ribosomal binding sites, enhancer or activator elements, translational start and termination sequences, and transcription start and termination sequences, and are chosen to be compatible with the host cell chosen to express the proteins. Constitutive or inducible promoters as known in the art are also contemplated. The promoters may be naturally occurring promoters, hybrid promoters that combine elements of more than one promoter, or synthetic promoters. An expression construct may be present in a cell on an episome, such as a plasmid, or may be inserted in a chromosome, such as at a gene locus. In some embodiments, the expression vector includes a selectable marker gene to allow the selection of transformed host cells. In some embodiments, the vector is an expression vector comprising a nucleotide sequence encoding a variant polypeptide operably linked to at least one regulatory control sequence. Regulatory control sequences for use herein include promoters, enhancers, and other expression control elements. In some embodiments, the expression vector is designed according to the host cell to be transformed, the particular variant polypeptide to be expressed, the vector's copy number, the ability to control that copy number, and/or the expression of any other protein encoded by the vector, such as antibiotic markers.
The vector can include, but is not limited to, viral vectors and plasmid DNA. Viral vectors can include, but are not limited to, adenoviral vectors, lentiviral vectors, retroviral vectors, and adeno-associated viral vectors. Commonly, expression vectors contain selection markers such as ampicillin-resistance, hygromycin-resistance, tetracycline resistance, kanamycin resistance, or neomycin resistance to permit detection of those cells transformed with the desired DNA sequences. Suitable vectors, promoter, and enhancer elements are known in the art; many are commercially available for generating subject recombinant constructs. In some embodiments, the vector is a polycistronic vector. In some embodiments, the vector is a bicistronic vector or a
tricistronic vector. Bicistronic or polycistronic expression vectors may include (1) multiple promoters fused to each of the open reading frames; (2) insertion of splicing signals between genes; (3) fusion of genes whose expressions are driven by a single promoter; and (4) insertion of proteolytic cleavage sites between genes (self-cleavage peptide) or insertion of internal ribosomal entry sites (IRESs) between genes.
A polycistronic vector is used to co-express multiple genes in the same cell. Two strategies are most commonly used to construct a multicistronic vector. First, an internal ribosome entry site (IRES) element is typically used for bicistronic vectors. The IRES element acts as an additional ribosome recruitment site, allowing translation to initiate from an internal region of the mRNA, so that two proteins are translated from one mRNA. IRES elements are quite large (usually 500-600 bp) (Pelletier et al., 1988; Jang et al., 1988). Second, as noted above, a self-cleaving peptide sequence can be inserted between the coding sequences so that a single transcript yields separate proteins. The engineered CD47 proteins disclosed herein have a smaller size compared to the wild-type full-length human CD47, and thus could be used with an IRES element in multicistronic vectors having limited packaging capacity.
Compositions
In an aspect, the present disclosure provides a composition comprising (1) any of the RNA disclosed herein, any of the DNA disclosed herein, and/or any of the vector disclosed herein; and (2) a carrier.
In some embodiments of the composition disclosed herein, the carrier is selected from lipid nanoparticles, liposomes, cationic nanoemulsions, dendrimer-based lipid nanoparticles, cationic polymers, and polysaccharide particles.
As used herein, the term “carrier” refers to compounds or compositions that are used for delivery of the polypeptide or the polynucleotide into a subject. Preferably, the carrier enhances the effectiveness and/or safety of the delivery. In some embodiments, the carrier is capable of delivering large nucleic acid sequences (e.g., nucleic acids of at least 1 kDa, 1.5 kDa, 2 kDa, 2.5 kDa, 5 kDa, 10 kDa, 12 kDa, 15 kDa, 20 kDa, 25 kDa, 30 kDa, or more). The nucleic acids can be formulated with one or more acceptable reagents, which provide a vehicle for delivering such nucleic acids to target cells. Appropriate reagents are generally selected with regard to a number of factors, which include, among other things, the biological or chemical properties of the nucleic acids (e.g., charge), the intended route of administration, the anticipated biological environment to which such nucleic acids will be exposed, and the specific properties of the intended target cells.
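Because carrier capacity is stated in mass units, it can help to convert between kDa and nucleotide length. The sketch assumes an average of about 320.5 Da per ribonucleotide plus about 159 Da for the 5’ end, a widely used approximation for single-stranded RNA; exact masses depend on base composition and modifications. The function names are hypothetical.

```python
AVG_NT_DA = 320.5     # assumed average mass of one ribonucleotide residue (Da)
END_CORRECTION = 159.0  # assumed additional mass for the 5'-end phosphates (Da)


def rna_mass_kda(length_nt: int) -> float:
    """Approximate mass (kDa) of a single-stranded RNA of the given length."""
    return (length_nt * AVG_NT_DA + END_CORRECTION) / 1000.0


def rna_length_nt(mass_kda: float) -> int:
    """Approximate length (nt) of a single-stranded RNA of the given mass."""
    return max(0, round((mass_kda * 1000.0 - END_CORRECTION) / AVG_NT_DA))


print(rna_length_nt(30))            # 93
print(round(rna_mass_kda(4500), 1))  # 1442.4
```

Under these assumptions, 30 kDa corresponds to only about 93 nt, whereas a 4,500-nt mRNA is roughly 1.44 MDa.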
Lipid nanoparticles (LNPs) are nanoparticles made of one or more types of lipids. In some embodiments, lipid nanoparticles comprise ionizable lipids, which are positively charged at low pH (enabling RNA complexation) and neutral at physiological pH (reducing potential toxic effects compared with permanently positively charged lipids, such as those used in some liposome formulations). Owing to their size and properties, lipid nanoparticles are taken up by cells via endocytosis, and the ionizability of the lipids at low pH likely enables endosomal escape, which allows release of the cargo into the cytoplasm. In some embodiments, the lipid nanoparticles comprise cationic lipids, which have a head group with a permanent positive charge. In addition, lipid nanoparticles usually contain a helper lipid (for example, a phospholipid) to promote cell binding, cholesterol to fill the gaps between the lipids, and a polyethylene glycol (PEG)-lipid to reduce opsonization by serum proteins and reticuloendothelial clearance. The relative amounts of ionizable lipid, helper lipid, cholesterol and PEG-lipid can vary (See Hou et al., Nature Reviews Materials, 2021).
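The pH-dependent charge of an ionizable lipid can be estimated with the Henderson-Hasselbalch relation for a basic amine, fraction protonated = 1 / (1 + 10^(pH - pKa)). The apparent pKa of 6.5 and the three pH values used below are assumed illustrative values, not properties of any particular lipid or formulation disclosed herein; the relation also treats the LNP as a single titratable species, which is a simplification.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a basic amine in its cationic form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


APPARENT_PKA = 6.5  # assumed illustrative value

for label, ph in [("formulation buffer", 4.0), ("endosome", 5.5), ("blood", 7.4)]:
    print(f"{label} (pH {ph}): {fraction_protonated(ph, APPARENT_PKA):.3f}")
# formulation buffer (pH 4.0): 0.997
# endosome (pH 5.5): 0.909
# blood (pH 7.4): 0.112
```

The steep change around the pKa is what allows the lipid to be largely cationic during RNA complexation and in acidifying endosomes, yet mostly neutral in circulation.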
Liposomes are spherical vesicles composed of one or more phospholipid bilayers. Liposomes are most often composed of phospholipids, especially phosphatidylcholine, and cholesterol, but may also include other lipids, such as phosphatidylethanolamine, as long as they are compatible with the lipid bilayer structure. The lipid bilayer of a liposome can fuse with other bilayers, such as the cell membrane, thereby delivering the liposome contents. Generally, liposomes are defined as spherical vesicles with particle sizes ranging from 30 nm to several micrometers. They consist of one or more lipid bilayers surrounding aqueous compartments, with the polar head groups oriented toward the interior and exterior aqueous phases. However, the self-aggregation of polar lipids is not limited to conventional bilayer structures; depending on molecular shape, temperature, and environmental and preparation conditions, polar lipids may self-assemble into various types of colloidal particles (See Akbarzadeh, Nanoscale Res Lett., 2013).
Cationic nanoemulsions (CNEs) are mainly composed of two parts: one is the cationic lipid DOTAP (1,2-dioleoyl-3-trimethylammonium-propane), which can be added to the oil phase to bind the mRNA electrostatically; the other is the emulsion adjuvant MF59, an oil-in-water emulsion consisting of squalene and surfactants. CNEs are usually fabricated by the probe sonication method (Brito et al., A cationic nanoemulsion for the delivery of next-generation RNA vaccines, 2014).
Dendrimer-based lipid nanoparticles are nanoparticles made of lipids and dendrimers, which are highly ordered, branched polymeric molecules. Dendrimers are composed of three distinct structural components: (1) the core, (2) the repetitive branching layers (also referred to as “generations”), and (3) the abundant terminal groups. These precisely controlled dendritic structures harbor multivalent cooperativity and can exploit membrane-fusion-based endosome release by mimicking lipid vectors, while simultaneously retaining the “proton-sponge”-mediated endosome release of polymer vectors (See Chen et al., Amphiphilic Dendrimer Vectors for RNA Delivery: State-of-the-Art and Future Perspective, 2022).
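The geometric growth of terminal groups with each generation can be expressed as Nc × Nb^G, where Nc is the core multiplicity, Nb the branching multiplicity, and G the generation number. The PAMAM-style values used below (Nc = 4, Nb = 2, generations counted from G0) are assumptions chosen for illustration, and the function name is hypothetical.

```python
def terminal_groups(core_multiplicity: int, branch_multiplicity: int, generation: int) -> int:
    """Number of surface (terminal) groups of an ideal, defect-free dendrimer."""
    if core_multiplicity < 1 or branch_multiplicity < 1 or generation < 0:
        raise ValueError("multiplicities must be >= 1 and generation >= 0")
    return core_multiplicity * branch_multiplicity ** generation


# Assumed PAMAM-like parameters: 4-armed core, each branch point splits in two.
print([terminal_groups(4, 2, g) for g in range(6)])  # [4, 8, 16, 32, 64, 128]
```

Real dendrimers contain structural defects, so measured surface-group counts can fall below these ideal values.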
Cationic polymers are another viable class of RNA carrier. An exemplary cationic polymer is poly(ethyleneimine) and its derivatives. Polyethyleneimine (PEI) is among the earliest and most widely studied cationic polymers for gene delivery, including the delivery of RNA. It has high gene transfection efficiency and is often referred to as the gold standard for non-viral gene transfection (Lungwitz et al., 2005). PEI can have either a linear or a branched structure, and its positive charge is conferred by numerous amine groups separated by short alkyl spacers, which leads to a very high positive charge density within its structure (Jiang et al., Polymeric nanoparticles for RNA delivery, 2021).
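PEI doses are commonly expressed as an N/P ratio, the molar ratio of PEI nitrogens to RNA phosphates. The sketch assumes about 43.07 g/mol per ethyleneimine repeat unit (one nitrogen each) and about 320.5 g/mol per RNA nucleotide (one phosphate each). It counts all nitrogens, even though only a fraction of PEI amines are protonated at a given pH, and the function names are hypothetical.

```python
PEI_N_G_PER_MOL = 43.07  # assumed: one -CH2CH2NH- repeat unit per nitrogen
RNA_P_G_PER_MOL = 320.5  # assumed: average ribonucleotide, one phosphate each


def pei_mass_ug(rna_ug: float, np_ratio: float) -> float:
    """Mass of PEI (ug) giving the requested N/P ratio for a given RNA mass (ug)."""
    phosphate_umol = rna_ug / RNA_P_G_PER_MOL  # ug / (g/mol) = umol
    return phosphate_umol * np_ratio * PEI_N_G_PER_MOL


def np_ratio(rna_ug: float, pei_ug: float) -> float:
    """N/P ratio for given masses (ug) of RNA and PEI."""
    return (pei_ug / PEI_N_G_PER_MOL) / (rna_ug / RNA_P_G_PER_MOL)


print(round(pei_mass_ug(10.0, 10.0), 2))  # 13.44 ug PEI for 10 ug RNA at N/P 10
```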
Polysaccharides are a complex collection of biopolymers isolated from plant, animal, microbial and algal sources that are built from monosaccharides linked by O-glycosidic linkages. An exemplary polysaccharide that can be used for RNA delivery is chitosan, a polysaccharide obtained by deacetylation of chitin, which is found in the cell walls of fungi and in the shells of arthropods such as crustaceans. Chitosan consists of a linear chain of 2-amino-2-deoxy-β-D-glucopyranose and 2-acetamido-2-deoxy-β-D-glucopyranose units connected through β-1,4 linkages (Bodnar, Hartmann & Borbely, 2005; Barclay et al., Review of polysaccharide particle-based functional drug delivery, 2020).
As used herein, the term “composition” includes, but is not limited to, a pharmaceutical composition. A “pharmaceutical composition” refers to an active pharmaceutical agent formulated in pharmaceutically acceptable or physiologically acceptable solutions for administration to a cell or an animal, either alone, or in combination with one or more other modalities of therapy. It will also be understood that, if desired, the compositions of the disclosure may be administered in combination with other agents, such as, e.g., cytokines, growth factors, hormones, small molecules, chemotherapeutics, pro-drugs, drugs, antibodies, or other various pharmaceutically active agents. There is virtually no limit to other components that may also be included in the compositions, provided that the additional agents do not adversely affect the ability of the composition to deliver the intended therapy. The phrase “pharmaceutically acceptable” is used herein to refer to those compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
The compositions may also comprise a pharmaceutically acceptable carrier, diluent, or excipient. As used herein, “pharmaceutically acceptable carrier, diluent, or excipient” includes, without limitation, any adjuvant, carrier, excipient, glidant, sweetening agent, diluent, preservative, dye/colorant, flavor enhancer, surfactant, wetting agent, dispersing agent, suspending agent, stabilizer, isotonic agent, solvent, or emulsifier which has been approved by the United States Food and Drug Administration as being acceptable for use in humans or domestic animals. Exemplary pharmaceutically acceptable carriers include, but are not limited to, sugars, such as lactose, glucose, and sucrose; starches, such as corn starch and potato starch; cellulose and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose, and cellulose acetate; tragacanth; malt; gelatin; talc; cocoa butter; waxes; animal and vegetable fats; paraffins; silicones; bentonites; silicic acid; zinc oxide; oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; glycols, such as propylene glycol; polyols, such as glycerin, sorbitol, mannitol, and polyethylene glycol; esters, such as ethyl oleate and ethyl laurate; agar; buffering agents, such as magnesium hydroxide and aluminum hydroxide; alginic acid; pyrogen-free water; isotonic saline; Ringer's solution; ethyl alcohol; phosphate buffer solutions; and any other compatible substances employed in pharmaceutical formulations.
The liquid pharmaceutical compositions, whether they be solutions, suspensions or other like form, may include one or more of the following: sterile diluents such as water for injection, saline solution, preferably physiological saline; Ringer's solution; isotonic sodium chloride; fixed oils such as synthetic mono- or diglycerides which may serve as the solvent or suspending medium; polyethylene glycols; glycerin; propylene glycol or other solvents; antibacterial agents, such as benzyl alcohol or methyl paraben; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents, such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates, or phosphates; and agents for the adjustment of tonicity, such as sodium chloride or dextrose. The parenteral preparation can be enclosed in ampoules, disposable syringes, or multiple dose vials made of glass or plastic. An injectable pharmaceutical composition is preferably sterile.
The composition may be suitably developed for intravenous, intratumoral, oral, rectal, vaginal, parenteral, topical, pulmonary, intranasal, buccal, ophthalmic, or another route of administration.
TLS-based gene editing systems
In an aspect, the present disclosure provides gene editing systems that are constructed with tRNA-like structures. In some embodiments, the gene editing system is a CRISPR-Cas system. In some embodiments, the gene editing system is a base editor (BE) system. In some embodiments, the gene editing system is a transformer base editor (tBE) system.
A transformer base editor (tBE) is a CRISPR-based gene editing system that can edit cytosine or adenosine in target regions with high specificity, preferably with no observable off-target mutations. In some embodiments, the transformer base editor (tBE) system comprises a CRISPR-associated protein (Cas protein) fused with a deaminase, a deaminase inhibitor domain, and a split-TEV protease. Because the deaminase inhibitor domain is attached through a cleavable fusion, tBE remains inactive at off-target sites, which eliminates unintended off-target mutations. Only when bound at on-target sites is tBE transformed: the deaminase inhibitor domain is cleaved off, and the deaminase catalyzes targeted deamination for precise editing. The tBE system described by Wang et al. (2021) uses one main gRNA (mgRNA) to bind at the target genomic site and one helper gRNA (hgRNA) to bind at a nearby region (preferably upstream of the target genomic site). Binding of the two gRNAs guides the components of the tBE system to assemble correctly at the target genomic site for base editing. In some embodiments, the mgRNA is a main sgRNA (msgRNA). In some embodiments, the hgRNA is a helper sgRNA (hsgRNA).
In an aspect, the present disclosure provides a gene editing system comprising
a. an hgRNA comprising a first CRISPR motif, an hgRNA spacer, and a first protein-binding motif, or a DNA polynucleotide encoding the hgRNA,
b. an mgRNA comprising a second CRISPR motif and an mgRNA spacer, or a DNA polynucleotide encoding the mgRNA, wherein the mgRNA spacer targets a target gene,
c. a first CRISPR-associated protein (Cas protein) , or a polynucleotide encoding the first Cas protein, wherein the first Cas protein binds to the first CRISPR motif,
d. a second Cas protein, or a polynucleotide encoding the second Cas protein, wherein the second Cas protein binds to the second CRISPR motif,
e. a first fusion protein comprising a nucleobase deaminase or a catalytic domain thereof and a first RNA binding domain, or a polynucleotide encoding the first fusion protein, wherein the nucleobase deaminase or the catalytic domain thereof and the first RNA binding domain are optionally connected by a linker, and wherein the first RNA binding domain binds to the first protein-binding motif,
wherein the first Cas protein and second Cas protein are the same or different,
wherein the gene editing system comprises a TLS-containing RNA or a DNA encoding the TLS-containing RNA, wherein the TLS-containing RNA comprises a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence; wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, and the first fusion protein; and wherein the non-coding RNA sequence is the hgRNA or the mgRNA.
As used herein, a “TLS-containing RNA” refers to an RNA comprising at least one tRNA-like structure. In some embodiments, a TLS-containing RNA comprises an mRNA and at least one non-coding RNA, wherein at least one tRNA-like structure is located between the mRNA and the non-coding RNA, and/or between the non-coding RNAs.
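The segment order of a TLS-containing RNA, and the separate molecules it is expected to yield after processing, can be illustrated with a toy model. This is a sketch under stated assumptions, not the disclosed method: the transcript is treated as an ordered list of named segments, and, by analogy with the MALAT1 tRNA-like structure (Wilusz et al., 2008), RNase P is assumed to cut at the 5′ end and RNase Z at the 3′ end of every TLS, so each TLS is excised and the flanking mRNA and non-coding RNAs are released separately. The `Segment` class, its `kind` labels, and `process_tls` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    kind: str  # hypothetical labels: "coding", "stabilizer", "tls" or "noncoding"


def process_tls(transcript: List[Segment]) -> List[List[Segment]]:
    """Split a transcript at each TLS.

    Assumes RNase P cuts at the 5' end and RNase Z at the 3' end of every TLS,
    so each TLS becomes its own small RNA and its neighbours are released.
    """
    products: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in transcript:
        if seg.kind == "tls":
            if current:
                products.append(current)  # assumed RNase P cut releases the upstream RNA
            products.append([seg])        # excised TLS (cf. MALAT1-associated small cytoplasmic RNA)
            current = []                  # assumed RNase Z cut starts the downstream RNA
        else:
            current.append(seg)
    if current:
        products.append(current)
    return products


# Hypothetical layout: Cas9 ORF, a triple helix, then a TLS in front of each gRNA.
EXAMPLE = [
    Segment("SpCas9 ORF", "coding"),
    Segment("MALAT1 triple helix", "stabilizer"),
    Segment("TLS", "tls"),
    Segment("sgRNA1", "noncoding"),
    Segment("TLS", "tls"),
    Segment("sgRNA2", "noncoding"),
]
```

Under these assumptions, `process_tls(EXAMPLE)` yields the Cas9 mRNA ending in the triple helix, each sgRNA as a free molecule, and the excised TLS copies, which is why a triple helix placed upstream of the TLS can protect the newly exposed mRNA 3′ end.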
In some embodiments, the gene editing system further comprises
a. a protease, or a polynucleotide encoding the protease, and
b. a nucleobase deaminase inhibitor domain, or a polynucleotide encoding the nucleobase deaminase inhibitor domain,
wherein the nucleobase deaminase inhibitor domain is connected to the nucleobase deaminase or the catalytic domain thereof in the first fusion protein, optionally by a linker, and wherein there is a cleavage site for the protease between the nucleobase deaminase inhibitor domain and the nucleobase deaminase or the catalytic domain thereof.
In some embodiments, the gene editing system further comprises a second fusion protein comprising the protease and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
wherein the protease and the second RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif;
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
In some embodiments, the protease is split into a first protease fragment and a second protease fragment, wherein the first and/or second protease fragment alone is not able to cleave the cleavage site.
In some embodiments, the gene editing system comprises
a. a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker, and
b. a third fusion protein comprising the second protease fragment and a third RNA binding domain, or a polynucleotide encoding the third fusion protein, wherein the second protease fragment and the third RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif and a third protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif, wherein the third RNA binding domain binds to the third protein-binding motif,
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, the second fusion protein, and the third fusion protein.
In some embodiments, the second and third RNA binding domains are the same or different, and the second and third protein-binding motifs are the same or different.
In some embodiments, the gene editing system further comprises
a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein,
wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker,
wherein the mgRNA further comprises a second protein-binding motif,
wherein the second RNA binding domain binds to the second protein-binding motif,
and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
In some embodiments of the gene editing system described herein, the protease is a TEV protease, a TuMV protease, a PPV protease, a PVY protease, a ZIKV protease, or a WNV protease.
In some embodiments of the gene editing system described herein, the protease is a TEV protease comprising a sequence of SEQ ID NO: 261.
In some embodiments of the gene editing system described herein, the first TEV protease fragment comprises a sequence of SEQ ID NO: 262 or 263.
A “protease” refers to an enzyme that catalyzes proteolysis. A “cleavage site for a protease” refers to a short peptide that the protease recognizes and within which it creates a proteolytic cleavage. Non-limiting examples of proteases include TEV protease, TuMV protease, PPV protease, PVY protease, ZIKV protease, and WNV protease. The protein sequences of example proteases and their corresponding cleavage sites are provided in Table 4.
Table 4 Exemplary proteases and their cleavage sites
In some embodiments, the protease cleavage site is a self-cleaving peptide, such as the 2A peptides. “2A peptides” are 18-22 amino-acid-long viral oligopeptides that mediate “cleavage” of polypeptides during translation in eukaryotic cells. The designation “2A” refers to a specific region of the viral genome and different viral 2As have generally been named after the virus they were derived from. The first discovered 2A was F2A (foot-and-mouth disease virus) , after which E2A (equine rhinitis A virus) , P2A (porcine teschovirus-1 2A) , and T2A (thosea asigna virus 2A) were also identified. A few non-limiting examples of 2A peptides are provided in SEQ ID NOs: 276-278.
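The “cleavage” mediated by a 2A peptide is a ribosomal skip rather than proteolysis: the peptide bond between the glycine and the final proline of the conserved C-terminal NPGP motif is not formed, so the upstream product keeps the 2A residues and the downstream product begins with proline. The sketch below predicts the products of a polyprotein under two simplifying assumptions: skipping is complete, and the loose consensus D-x-E-x-N-P-G-P stands in for the full sequence requirements of a functional 2A. The function name `ribosomal_skip` is hypothetical, and the P2A and T2A strings in the usage line are commonly used sequences, not necessarily those of SEQ ID NOs: 276-278.

```python
import re
from typing import List

# Loose consensus for the C-terminus of 2A peptides (assumption):
# D-x-E-x-N-P-G-P, with the skip occurring between G and the final P.
TWO_A_MOTIF = re.compile(r"D.E.NPGP")


def ribosomal_skip(polyprotein: str) -> List[str]:
    """Return the separate products of a polyprotein containing 2A peptides."""
    products: List[str] = []
    start = 0
    for match in TWO_A_MOTIF.finditer(polyprotein):
        cut = match.end() - 1  # index of the final proline, which starts the next product
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products


# Usage with commonly cited P2A and T2A sequences (assumed, for illustration only).
P2A = "ATNFSLLKQAGDVEENPGP"
T2A = "EGRGSLLTCGDVEENPGP"
print(ribosomal_skip("MCAS" + P2A + "KKK" + T2A + "LLL"))
```

In practice, skipping is not fully efficient, so a fraction of the uncleaved fusion usually remains; this model ignores that.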
In some embodiments, the first and/or the second TEV protease fragment is not able to cleave the TEV cleavage site on its own. However, in the presence of the remaining portion of the TEV protease, this fragment will be able to effectuate the cleavage. The TEV fragment may be the TEV N-terminal domain (e.g., SEQ ID NO: 262) or the TEV C-terminal domain (e.g., SEQ ID NO: 263) . In some embodiments, the first TEV protease fragment comprises a sequence of SEQ ID NO: 262. In some embodiments, the first TEV protease fragment comprises a sequence of SEQ ID NO: 263.
In some embodiments of the gene editing system described herein, the nucleobase deaminase is a cytidine deaminase.
In some embodiments of the gene editing system described herein, the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
In some embodiments of the gene editing system described herein, the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
In some embodiments of the gene editing system described herein, the nucleobase deaminase is an adenosine deaminase.
In some embodiments of the gene editing system described herein, the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA), adenosine deaminase 2 (ADA2), adenosine deaminase like (ADAL), adenosine deaminase domain containing 1 (ADAD1), adenosine deaminase domain containing 2 (ADAD2), and adenosine deaminase RNA specific (ADAR).
In some embodiments of the gene editing system described herein, the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
In some embodiments of the gene editing system described herein, the first fusion protein further comprises a uracil glycosylase inhibitor (UGI). In some embodiments, the UGI has a sequence of SEQ ID NO: 275. The “Uracil Glycosylase Inhibitor” (UGI), which can be prepared from Bacillus subtilis bacteriophage PBS1, is a small protein (9.5 kDa) that inhibits E. coli uracil-DNA glycosylase (UDG) as well as UDG from other species. Inhibition of UDG occurs by reversible protein binding with a 1:1 UDG:UGI stoichiometry. UGI is capable of dissociating UDG-DNA complexes. A non-limiting example of UGI is found in Bacillus phage AR9 (YP_009283008.1). In some embodiments, the UGI comprises the amino acid sequence of SEQ ID NO: 275 or has at least 70%, 75%, 80%, 85%, 90% or 95% sequence identity to SEQ ID NO: 275 and retains the uracil glycosylase inhibition activity.
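Percent-identity thresholds such as those above depend on how identity is computed. As one minimal illustration, assuming a unit-cost global (Needleman-Wunsch) alignment and defining identity as identical aligned positions divided by alignment length including gaps, it could be calculated as below. Routine analyses typically use tools such as BLAST or EMBOSS Needle with substitution matrices and affine gap penalties, so their values can differ from this sketch; `global_align` and `percent_identity` are hypothetical helper names.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Unit-cost Needleman-Wunsch alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions / alignment length (gaps included), in percent."""
    if not query or not reference:
        raise ValueError("sequences must be non-empty")
    qa, ra = global_align(query.upper(), reference.upper())
    matches = sum(1 for x, y in zip(qa, ra) if x == y and x != "-")
    return 100.0 * matches / len(qa)
```

A candidate variant would then be checked with, for example, `percent_identity(candidate, reference) >= 70.0`, keeping in mind that the chosen denominator (alignment length here) is itself a convention.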
In some embodiments of the gene editing system described herein, the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9) selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, SpCas9-NG, xSpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b. In some embodiments, the Cas protein has an amino acid sequence of any one of SEQ ID NOs: 8-59.
In some embodiments of the gene editing system described herein, the first protein-binding RNA motif and the first RNA binding domain, the second protein-binding RNA motif and the second RNA binding domain, and the third protein-binding RNA motif and the third RNA binding domain, are each independently selected from the group consisting of a MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof,
a BoxB and N22P or an RNA-binding section thereof,
a telomerase Ku binding motif and Ku protein or an RNA-binding section thereof,
a telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof,
a PP7 phage operator stem-loop and PP7 coat protein (PCP) or an RNA-binding section thereof,
a SfMu phage Com stem-loop and Com RNA binding protein or an RNA-binding section thereof, and
a non-natural RNA aptamer and corresponding aptamer ligand or an RNA-binding section thereof.
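The pairing logic above, in which each RNA binding domain recognizes its cognate protein-binding motif on a gRNA, can be sketched as a lookup table plus a recruitment check. The model is a hypothetical illustration rather than an experimental design: it assumes the deaminase fusion carries MCP while the hgRNA carries an MS2 stem-loop, the two split-TEV fragment fusions carry PCP and N22P while the mgRNA carries PP7 and BoxB motifs, and the inhibitor domain is removed only when all three fusions are recruited by gRNAs bound at the same locus. Names such as `TEV_N`, `recruited`, and `deaminase_released` are placeholders.

```python
from typing import Dict, FrozenSet, Set

# Cognate protein-binding RNA motif -> RNA binding domain (subset of the pairs above).
COGNATE: Dict[str, str] = {
    "MS2": "MCP",
    "PP7": "PCP",
    "BoxB": "N22P",
    "Com": "Com",
}


def recruited(grna_motifs: Dict[str, Set[str]],
              fusion_rbd: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each gRNA, the fusion proteins whose RBD binds one of its motifs."""
    return {
        grna: {fusion for fusion, rbd in fusion_rbd.items()
               if any(COGNATE.get(motif) == rbd for motif in motifs)}
        for grna, motifs in grna_motifs.items()
    }


def deaminase_released(bound_grnas: Set[str],
                       grna_motifs: Dict[str, Set[str]],
                       fusion_rbd: Dict[str, str],
                       required: FrozenSet[str] = frozenset({"deaminase", "TEV_N", "TEV_C"})) -> bool:
    """Hypothetical gate: cleavage needs the deaminase fusion and both
    split-protease fusions to be recruited by gRNAs bound at one locus."""
    local = {g: grna_motifs[g] for g in bound_grnas}
    present: Set[str] = set().union(*recruited(local, fusion_rbd).values())
    return required <= present


# Hypothetical configuration matching the assumptions in the lead-in.
EXAMPLE_GRNA_MOTIFS = {"hgRNA": {"MS2"}, "mgRNA": {"PP7", "BoxB"}}
EXAMPLE_FUSION_RBD = {"deaminase": "MCP", "TEV_N": "PCP", "TEV_C": "N22P"}
```

With this configuration, a site bound by both gRNAs returns `True`, while a site bound by only the mgRNA or only the hgRNA returns `False`, mirroring the idea that the two-gRNA arrangement restricts deaminase activation to the intended locus.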
In some embodiments of the gene editing system described herein, the mgRNA and/or the hgRNA comprise a dual-RNA structure.
In some embodiments of the gene editing system described herein, the dual-RNA structure is formed by a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) , wherein the crRNA comprises the spacer.
In some embodiments of the gene editing system described herein, the mgRNA comprises an mcrRNA and a first tracrRNA, and the mcrRNA comprises the mgRNA spacer, wherein the hgRNA comprises an hcrRNA and a second tracrRNA, and the hcrRNA comprises the hgRNA spacer, and wherein the first tracrRNA and the second tracrRNA are the same or different.
In some embodiments of the gene editing system described herein, the TLS-containing RNA comprises more than one non-coding RNA sequence.
In some embodiments of the gene editing system described herein, the target gene is a mammalian gene. In some embodiments, the target gene is TRAC, CD52, B2M, PDCD1, CTLA4, CD7, HBG, CD33, CD123 or CLL-1.
In some embodiments, the target gene is an HBV gene.
In some embodiments of the gene editing system described herein, the mgRNA spacer and the hgRNA spacer are respectively:
Methods
In an aspect, the present disclosure provides a method for gene editing in a subject, comprising administering to the subject (1) any of the RNAs disclosed herein; and/or (2) any of the DNAs disclosed herein; and/or (3) any of the vectors disclosed herein; and/or (4) any of the systems disclosed herein; and/or (5) any of the compositions disclosed herein; and/or (6) any of the gene editing systems disclosed herein. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.
The RNA sequences described herein and/or the polynucleotides described herein can be prepared according to any available technique including, for example, but not limited to chemical synthesis, enzymatic synthesis, which is generally termed in vitro transcription (IVT) or enzymatic or chemical cleavage of a longer precursor, etc. Methods of synthesizing RNAs are known in the art (see, e.g., Gait, M.J. (ed. ) Oligonucleotide synthesis: a practical approach, Oxford [Oxfordshire] , Washington, D.C.: IRL Press, 1984; and Herdewijn, P. (ed. ) Oligonucleotide synthesis: methods and applications, Methods in Molecular Biology, v. 288 (Clifton, N.J. ) Totowa, N.J.: Humana Press, 2005; both of which are incorporated herein by reference) .
For in vitro transcription, DNA encoding the RNA of interest is first synthesized with methods known in the art, such as column-based oligonucleotide synthesis and microarray-based oligonucleotide synthesis (Hughes RA, Ellington AD. Synthetic DNA Synthesis and Assembly: Putting the Synthetic in Synthetic Biology. Cold Spring Harb Perspect Biol. 2017 Jan 3; 9(1): a023812. doi: 10.1101/cshperspect.a023812. PMID: 28049645; PMCID: PMC5204324).
The DNA construct is then cloned into a plasmid vector. In some embodiments, the vector comprises a T7 promoter. The vector is then amplified, isolated and purified using methods known in the art such as, but not limited to, a maxi prep using the Invitrogen PURELINK™ HiPure Maxiprep Kit (Carlsbad, Calif.).
The plasmid may then be linearized using methods known in the art such as, but not limited to, the use of restriction enzymes and buffers. The linearization reaction may be purified using methods including, for example, Invitrogen's PURELINK™ PCR Micro Kit (Carlsbad, Calif.), Invitrogen's standard PURELINK™ PCR Kit (Carlsbad, Calif.), and HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC). The purification method may be modified depending on the size of the linearization reaction. The linearized plasmid is then used to generate cDNA for in vitro transcription (IVT) reactions.
A cDNA template may be synthesized by polymerase chain reaction (PCR) amplification of the linearized plasmid. In one embodiment, the cDNA may be submitted for sequencing analysis before undergoing transcription.
The cDNA produced in the previous step may be transcribed using an in vitro transcription (IVT) system. The system typically comprises a transcription buffer, nucleotide triphosphates (NTPs), an RNase inhibitor and a polymerase. The NTPs may be manufactured in house, purchased from a supplier, or synthesized as described herein. The NTPs may be selected from, but are not limited to, those described herein, including natural and unnatural (modified) NTPs. The polymerase may be selected from, but is not limited to, T7 RNA polymerase, T3 RNA polymerase and mutant polymerases such as, but not limited to, polymerases able to incorporate modified nucleotides.
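Run-off transcription is the reason the template is linearized before IVT: T7 RNA polymerase transcribes from its promoter until it runs off the end of the template. A minimal sketch, assuming the sense strand is supplied 5′ to 3′, the 17-nt T7 promoter core TAATACGACTCACTATA (positions -17 to -1) immediately precedes a +1 G, and the template ends at the linearization site, predicts the transcript as follows. `runoff_transcript` is a hypothetical helper; real IVT products can also carry non-templated 3′ nucleotides, which this ignores.

```python
T7_PROMOTER_CORE = "TAATACGACTCACTATA"  # positions -17 to -1 of the T7 promoter


def runoff_transcript(sense_template: str) -> str:
    """Predict the run-off RNA from a linearized template (sense strand, 5'->3').

    Assumes transcription starts at the G directly downstream of the promoter
    core (+1) and continues to the last template nucleotide.
    """
    seq = sense_template.upper()
    idx = seq.find(T7_PROMOTER_CORE)
    if idx == -1:
        raise ValueError("no T7 promoter found in template")
    start = idx + len(T7_PROMOTER_CORE)
    if not seq.startswith("G", start):
        raise ValueError("expected a G at position +1 downstream of the T7 promoter")
    # RNA has the sense-strand sequence with U in place of T.
    return seq[start:].replace("T", "U")
```

For instance, a template ending just after `...TATAGGGAGACCATG` would give the transcript `GGGAGACCAUG`; a template left circular has no defined end, so this model does not apply to it.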
The cDNA template may be removed using methods known in the art such as, but not limited to, treatment with Deoxyribonuclease I (DNase I). RNA clean-up may also include a purification method such as, but not limited to, a system from Beckman Coulter (Danvers, Mass.) or HPLC-based purification methods such as, but not limited to, strong anion exchange HPLC, weak anion exchange HPLC, reverse phase HPLC (RP-HPLC), and hydrophobic interaction HPLC (HIC-HPLC).
Table 5 Sequences involved in the present disclosure
REFERENCES
Altman, S., and L. A. Kirsebom. "Ribonuclease P." The RNA World, 2nd ed. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press (1999): 351-380.
Gopalan, Venkat, Agustin Vioque, and Sidney Altman. "RNase P: variations and uses." Journal of Biological Chemistry 277.9 (2002): 6759-6762.
Ji, Ping, Sven Diederichs, Wenbing Wang, et al. "MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer." Oncogene 22 (2003): 8031-8041.
Sunwoo, Hongjae, Marcel E. Dinger, Jeremy E. Wilusz, et al. "MEN epsilon/beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles." Genome Research (2009).
Wilusz, Jeremy E., Susan M. Freier, and David L. Spector. "3' end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA." Cell 135 (2008): 919-932.
Wilusz, Jeremy E., Courtney K. JnBaptiste, Laura Y. Lu, et al. "A triple helix stabilizes the 3' ends of long noncoding RNAs that lack poly(A) tails." Genes & Development 26 (2012).
Brown, Jessica A., David Bulkley, Jimin Wang, et al. "Structural insights into the stabilization of MALAT1 noncoding RNA by a bipartite triple helix." Nature Structural & Molecular Biology 21.7 (2014).
Wu, Sipeng, Xiang Li, and Geng Wang. "tRNA-like structures and their functions." The FEBS Journal 289.17 (2022): 5089-5099.
Yot, Pierre, et al. "Valine-specific tRNA-like structure in turnip yellow mosaic virus RNA." Proceedings of the National Academy of Sciences 67.3 (1970): 1345-1352.
Sherlock, Madeline E., et al. "Structural diversity and phylogenetic distribution of valyl tRNA-like structures in viruses." RNA 27.1 (2021): 27-39.
Mans, Ruud WM, Cornelis WA Pleij, and Leendert Bosch. "tRNA-like structures: structure, function and evolutionary significance." EJB Reviews 1991 (1992): 199-220.
Lu, Juane, et al. "Types of nuclear localization signals and mechanisms of protein import into the nucleus." Cell Communication and Signaling 19.1 (2021): 1-10.
Gebauer, Fátima, et al. "RNA-binding proteins in human genetic disease." Nature Reviews Genetics 22.3 (2021): 185-198.
Kato, Kazuki, et al. "Structure of the IscB-ωRNA ribonucleoprotein complex, the likely ancestor of CRISPR-Cas9." Nature Communications 13.1 (2022): 6719.
Baek, Yu Mi, et al. "The bacterial endoribonuclease RNase E can cleave RNA in the absence of the RNA chaperone Hfq." Journal of Biological Chemistry 294.44 (2019): 16465-16478.
Karvelis, Tautvydas, et al. "Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease." Nature 599.7886 (2021): 692-696.
Hirano, Seiichi, et al. "Structure of the OMEGA nickase IsrB in complex with ωRNA and target DNA." Nature 610.7932 (2022): 575-581.
Saito, Makoto, et al. "Fanzor is a eukaryotic programmable RNA-guided endonuclease." Nature (2023): 1-3.
Wang, L., et al. "Eliminating base-editor-induced genome-wide and transcriptome-wide off-target mutations." Nature Cell Biology 23 (2021): 552-563.
EXAMPLES
Example 1 Application of tRNA-like Structure in CRISPR/Cas9 System
Plasmid construction
Different architectures of plasmids were constructed as shown in Fig. 2A. Plasmid “SpCas9” includes only an SpCas9-coding sequence but not an sgRNA sequence. In plasmids “SpCas9-sgRNA1” and “SpCas9-sgRNA2”, the SpCas9-encoding sequences and sgRNA sequences are connected directly, or via a tRNA-like structure derived from the hMALAT1 gene, respectively. Compared with plasmid “SpCas9-sgRNA2”, “SpCas9-sgRNA3” contains an additional triple helix sequence of the hMALAT1 gene after the SpCas9 sequence. In all four plasmids, the EF1a promoter was used to drive transcription initiation, with BGHpA as the termination and polyadenylation signal. Plasmids were prepared using plasmid extraction kits, quantitated with a NanoDrop One C (Thermo Fisher), and then used for cell transfection.
Cell culture and transfection
293FT cells were maintained in DMEM supplemented with 10% FBS and regularly tested to exclude mycoplasma contamination. For gene editing with plasmids, 293FT cells were seeded in a 24-well plate at a density of 1 × 10⁵ cells per well and transfected with 250 μl serum-free Opti-MEM containing 2.5 μl LIPOFECTAMINE LTX, 1 μl LIPOFECTAMINE PLUS, 0.95 μg SpCas9 or SpCas9-sgRNA plasmids, and 0.1 μg puromycin screening plasmids. After 24 h, puromycin was added to the medium at a final concentration of 4 μg/ml. After another 48 h, genomic DNA was extracted from the cells using QuickExtract DNA Extraction Solution for subsequent sequencing analysis. Target genomic sequences were PCR-amplified using the high-fidelity DNA polymerase PrimeSTAR HS with primer sets flanking the examined sgRNA target sites. Indel frequency at each target site was calculated by Synthego ICE analysis (https://ice.synthego.com).
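The per-well transfection amounts above scale linearly when a master mix is prepared for several wells. The sketch below assumes a 10% pipetting overage, plasmid stocks quantified in ng/μl (as a NanoDrop reports), and a puromycin stock in mg/ml; the overage, the stock concentrations, and any medium volume passed in are illustrative assumptions, not values from this example. The class and function names are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class PerWell:
    """Per-well amounts for one 24-well transfection, taken from the protocol above."""
    optimem_ul: float = 250.0
    ltx_ul: float = 2.5
    plus_ul: float = 1.0
    editing_plasmid_ug: float = 0.95
    puro_plasmid_ug: float = 0.1


def master_mix(n_wells: int, overage: float = 0.1,
               per_well: PerWell = PerWell()) -> Dict[str, float]:
    """Scale every per-well amount by the number of wells plus an assumed overage."""
    factor = n_wells * (1.0 + overage)
    return {name: round(amount * factor, 3) for name, amount in asdict(per_well).items()}


def plasmid_volume_ul(mass_ug: float, stock_ng_per_ul: float) -> float:
    """Volume of a plasmid stock (ng/ul) that delivers mass_ug micrograms."""
    return mass_ug * 1000.0 / stock_ng_per_ul


def puromycin_stock_ul(final_ug_per_ml: float, medium_ml: float,
                       stock_mg_per_ml: float) -> float:
    """C1V1 = C2V2; a stock in mg/ml equals ug/ul, so the result is in ul."""
    return final_ug_per_ml * medium_ml / stock_mg_per_ml
```

For example, `master_mix(6)` scales the 2.5 μl of LTX to 16.5 μl, and reaching 4 μg/ml puromycin in an assumed 0.5 ml of medium from an assumed 10 mg/ml stock needs `puromycin_stock_ul(4, 0.5, 10)`, or 0.2 μl, which in practice would be pipetted from a pre-diluted stock.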
Experimental results
As shown in Fig. 2B, the “SpCas9” and “SpCas9-sgRNA1” plasmids induced little gene editing (presented as “Indel percentage”) at both the BCL11A and KLF1 loci in 293FT cells. When the hMALAT1 tRNA-like sequence was used to link the SpCas9 and sgRNA sequences, significantly more gene editing (over 10%) was produced (Fig. 2B, SpCas9-sgRNA2). Inclusion of the hMALAT1 triple helix structure further increased the gene editing efficiency (Fig. 2B, SpCas9-sgRNA3).
Claims (82)
- An RNA comprising a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence.
- The RNA of claim 1, wherein the RNA comprises from 5’-end to 3’-end the protein coding sequence, the tRNA-like structure (TLS) , and the non-coding RNA sequence.
- The RNA of any one of claims 1 and 2, wherein the non-coding RNA sequence is a guide RNA (gRNA) sequence.
- The RNA of claim 3, wherein the guide RNA (gRNA) comprises a spacer sequence and a scaffold sequence, wherein the scaffold sequence is capable of binding to the protein encoded by the protein coding sequence, and wherein the spacer sequence targets a target gene.
- The RNA of claim 4, wherein the target gene is a mammalian gene.
- The RNA of claim 4, wherein the spacer sequence is selected from SEQ ID NOs: 204-260.
- The RNA of claim 4, wherein the scaffold sequence comprises at least one protein-binding motif, wherein the protein-binding motif is an RNA aptamer motif or a variant thereof.
- The RNA of any one of claims 1-7, wherein the RNA further comprises an mRNA-stabilizing sequence between the protein coding sequence and the tRNA-like structure (TLS) .
- The RNA of claim 8, wherein the mRNA-stabilizing sequence is a triple helix sequence, a poly (A) , or a histone stem-loop.
- The RNA of claim 9, wherein the triple helix is derived from a long non-coding RNA (lncRNA) gene.
- The RNA of claim 10, wherein the lncRNA gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1) .
- The RNA of any one of claims 1-11, wherein the RNA further comprises a 5’-untranslated region (5’-UTR) at the 5’-end of the RNA, and/or a 3’-untranslated region (3’-UTR) at the 3’-end of the RNA.
- The RNA of any one of claims 1-12, wherein the RNA further comprises a poly-A sequence at the 3’-end of the RNA and/or a 5’-Cap structure at the 5’-end of the RNA.
- The RNA of any one of claims 1-13, wherein the RNA further comprises a sequence encoding a nuclear localization signal (NLS) , wherein the sequence encoding the NLS is located at the 5’-end and/or 3’-end of the protein coding sequence.
- The RNA of claim 14, wherein the RNA comprises more than one sequence, each encoding a nuclear localization signal (NLS), wherein the nuclear localization signals encoded by the sequences are the same or different.
- The RNA of any one of claims 1-15, wherein the protein coding sequence encodes a nucleotide binding protein.
- The RNA of claim 16, wherein the nucleotide binding protein is a Fanzor (Fz) protein or a variant thereof.
- The RNA of claim 16, wherein the nucleotide binding protein is a TnpB or a variant thereof, an IscB or a variant thereof, or an IsrB protein or a variant thereof.
- The RNA of claim 16, wherein the nucleotide binding protein is a Cas protein or a variant thereof.
- The RNA of claim 19, wherein the Cas protein is a Cas9, a dead Cas9 (dCas9) , or a Cas9 nickase (nCas9) .
- The RNA of claim 19 or 20, wherein the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR Cas9, EQR Cas9, VRER Cas9, Cas9-NG, xCas9, eCas9, SpCas9-HF1, HypaCas9, HiFiCas9, sniper-Cas9, SpG, SpRY, KKH SaCas9, CjCas9, Cas9-NRRH, Cas9-NRCH, Cas9-NRTH, SsCpf1, PcCpf1, BpCpf1, LiCpf1, PmCpf1, Lb2Cpf1, PbCpf1, PbCpf1, PeCpf1, PdCpf1, MbCpf1, EeCpf1, CmtCpf1, BsCpf1, BhCas12b, AkCas12b, BsCas12b, AmCas12b, AaCas12b, RfxCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b.
- The RNA of any one of claims 1-15, wherein the protein coding sequence encodes a Cas protein fused with or linked to a protein having enzymatic activity.
- The RNA of claim 22, wherein the protein having enzymatic activity is a cytidine deaminase.
- The RNA of claim 23, wherein the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- The RNA of claim 22, wherein the protein having enzymatic activity is an adenosine deaminase.
- The RNA of claim 25, wherein the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA) , adenosine deaminase tRNA specific 1 (ADAT1) , adenosine deaminase tRNA specific 2 (ADAT2) , adenosine deaminase tRNA specific 3 (ADAT3) , adenosine deaminase RNA specific B1 (ADARB1) , adenosine deaminase RNA specific B2 (ADARB2) , adenosine monophosphate deaminase 1 (AMPD1) , adenosine monophosphate deaminase 2 (AMPD2) , adenosine monophosphate deaminase 3 (AMPD3) , adenosine deaminase (ADA) , adenosine deaminase 2 (ADA2) , adenosine deaminase like (ADAL) , adenosine deaminase domain containing 1 (ADAD1) , adenosine deaminase domain containing 2 (ADAD2) , and adenosine deaminase RNA specific (ADAR) .
- The RNA of claim 22, wherein the protein coding sequence encodes a base editor protein.
- The RNA of claim 27, wherein the base editor protein is a cytidine base editor (CBE) protein.
- The RNA of claim 28, wherein the CBE protein is selected from BE3, YE1-BE3, YEE-BE3, BE4, eBE, hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-Y132D, eA3A-BE3, SaKKH-BE3, Target-AID, dCas12a-BE, BEACON1, BEACON2, enAsBE, PBE, and A3A-PBE.
- The RNA of claim 27, wherein the base editor protein is an adenosine base editor (ABE) protein.
- The RNA of claim 30, wherein the ABE protein is selected from ABE7.10, ABE8e, ABE8e-V106W, LbABE8e, STEME-1, ABE-P1, ABE-P2, and rBE14.
- The RNA of claim 22, wherein the protein having enzymatic activity is a methylase or a reverse transcriptase.
- The RNA of any one of claims 1-32, wherein the tRNA-like structure (TLS) comprises an acceptor stem, a D-loop arm, and a TΨC-loop arm.
- The RNA of any one of claims 1-33, wherein the tRNA-like structure (TLS) comprises a cleavage site for one or more RNase P, RNase Z, and/or RNase E.
- The RNA of any one of claims 1-34, wherein the tRNA-like structure (TLS) is derived from a tRNA gene or a long non-coding RNA (lncRNA) gene.
- The RNA of claim 35, wherein the long non-coding RNA (lncRNA) gene is metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) or nuclear paraspeckle assembly transcript 1 (NEAT1).
- The RNA of any one of claims 1-36, wherein the tRNA-like structure (TLS) is derived from a eukaryotic organism.
- The RNA of claim 37, wherein the eukaryotic organism is selected from the group consisting of Saccharomyces cerevisiae, Arabidopsis thaliana, Oryza sativa, Homo sapiens, Macaca mulatta, Macaca fascicularis, Sus scrofa domestica, Canis lupus familiaris, Rattus norvegicus, and Mus musculus.
- The RNA of any one of claims 1-38, wherein the tRNA-like structure (TLS) is encoded by any one of SEQ ID NOs: 4-7.
- The RNA of any one of claims 1-39, wherein the RNA comprises two or more non-coding RNA sequences, wherein the non-coding RNA sequences are the same or different.
- The RNA of any one of claims 1-40, wherein the RNA comprises two or more non-coding RNA sequences that are guide RNAs (gRNAs), wherein the gRNAs are the same or different.
- The RNA of any one of claims 1-41, wherein the RNA comprises two or more non-coding RNA sequences, and wherein the RNA comprises a tRNA-like structure (TLS) between the protein coding sequence and the nearest non-coding RNA sequence, and between each pair of adjacent non-coding RNA sequences.
- The RNA of any one of claims 1-42, wherein the protein coding sequence is located upstream relative to all the non-coding RNA sequences.
- The RNA of any one of claims 1-43, wherein the RNA comprises at least one modified nucleotide.
- A DNA encoding the RNA of any one of claims 1-44.
- The DNA of claim 45, further comprising an RNA polymerase promoter at the 5’-end.
- The DNA of claim 46, wherein the RNA polymerase promoter is a eukaryotic RNA polymerase II promoter.
- The DNA of claim 47, wherein the RNA polymerase promoter is selected from human cytomegalovirus immediate early enhancer/promoter (CMV promoter), human eukaryotic translation elongation factor 1 α1 promoter (EF1a promoter), CMV early enhancer fused to modified chicken β-actin promoter (CAG promoter), Simian virus 40 enhancer/early promoter (SV40 promoter), and human or mouse phosphoglycerate kinase 1 promoter (PGK promoter).
- A vector comprising the DNA of any one of claims 45-48.
- The vector of claim 49, wherein the vector is a viral vector or a plasmid.
- A composition comprising: a. the RNA of any one of claims 1-44, the DNA of any one of claims 45-48, and/or the vector of any one of claims 49-50; and b. a carrier.
- The composition of claim 51, wherein the carrier is selected from lipid nanoparticles, liposomes, cationic nanoemulsions, dendrimer-based lipid nanoparticles, cationic polymers, and polysaccharide particles.
- A system comprising the RNA of any one of claims 1-44, a ribonuclease P (RNase P) or a polynucleotide encoding thereof, and a ribonuclease Z (RNase Z) or a polynucleotide encoding thereof.
- A system comprising the DNA of any one of claims 45-48 and/or the vector of any one of claims 49-50, an RNA polymerase II or a polynucleotide encoding thereof, a ribonuclease P (RNase P) or a polynucleotide encoding thereof, and a ribonuclease Z (RNase Z) or a polynucleotide encoding thereof.
- A gene editing system comprising: a. an hgRNA comprising a first CRISPR motif, an hgRNA spacer, and a first protein-binding motif, or a DNA polynucleotide encoding the hgRNA; b. an mgRNA comprising a second CRISPR motif and an mgRNA spacer, or a DNA polynucleotide encoding the mgRNA, wherein the mgRNA spacer targets a target gene; c. a first CRISPR-associated protein (Cas protein), or a polynucleotide encoding the first Cas protein, wherein the first Cas protein binds to the first CRISPR motif; d. a second Cas protein, or a polynucleotide encoding the second Cas protein, wherein the second Cas protein binds to the second CRISPR motif; and e. a first fusion protein comprising a nucleobase deaminase or a catalytic domain thereof and a first RNA binding domain, or a polynucleotide encoding the first fusion protein, wherein the nucleobase deaminase or the catalytic domain thereof and the first RNA binding domain are optionally connected by a linker, and wherein the first RNA binding domain binds to the first protein-binding motif; wherein the first Cas protein and the second Cas protein are the same or different; wherein the gene editing system comprises a TLS-containing RNA or a DNA encoding the TLS-containing RNA, wherein the TLS-containing RNA comprises a protein coding sequence, a non-coding RNA sequence, and a tRNA-like structure (TLS) between the protein coding sequence and the non-coding RNA sequence; wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, and the first fusion protein; and wherein the non-coding RNA sequence is the hgRNA or the mgRNA.
- The gene editing system of claim 55, further comprising: a. a protease, or a polynucleotide encoding thereof, and b. a nucleobase deaminase inhibitor domain or a polynucleotide encoding thereof, wherein the nucleobase deaminase inhibitor domain is connected to the nucleobase deaminase or the catalytic domain thereof in the first fusion protein optionally by a linker, and wherein there is a cleavage site for the protease between the nucleobase deaminase inhibitor domain and the nucleobase deaminase or the catalytic domain thereof.
- The gene editing system of claim 56, further comprising a second fusion protein comprising the protease and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the protease and the second RNA binding domain are optionally connected by a linker, wherein the mgRNA further comprises a second protein-binding motif, wherein the second RNA binding domain binds to the second protein-binding motif, and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- The gene editing system of claim 56, wherein the protease is split into a first protease fragment and a second protease fragment, wherein the first and/or second protease fragment alone is not able to cleave the cleavage site.
- The gene editing system of claim 58, further comprising: a. a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker; and b. a third fusion protein comprising the second protease fragment and a third RNA binding domain, or a polynucleotide encoding the third fusion protein, wherein the second protease fragment and the third RNA binding domain are optionally connected by a linker; wherein the mgRNA further comprises a second protein-binding motif and a third protein-binding motif, wherein the second RNA binding domain binds to the second protein-binding motif, wherein the third RNA binding domain binds to the third protein-binding motif, and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, the second fusion protein, and the third fusion protein.
- The gene editing system of claim 59, wherein the second and third RNA binding domains are the same or different, and the second and third protein-binding motifs are the same or different.
- The gene editing system of claim 58, further comprising a second fusion protein comprising the first protease fragment and a second RNA binding domain, or a polynucleotide encoding the second fusion protein, wherein the first protease fragment and the second RNA binding domain are optionally connected by a linker, wherein the mgRNA further comprises a second protein-binding motif, wherein the second RNA binding domain binds to the second protein-binding motif, and wherein the protein encoded by the protein coding sequence is selected from a group consisting of the first Cas protein, the second Cas protein, the first fusion protein, and the second fusion protein.
- The gene editing system of any one of claims 56-61, wherein the protease is a TEV protease, a TuMV protease, a PPV protease, a PVY protease, a ZIKV protease, or a WNV protease.
- The gene editing system of claim 62, wherein the protease is a TEV protease comprising a sequence of SEQ ID NO: 261.
- The gene editing system of claim 63, wherein the first TEV protease fragment comprises a sequence of SEQ ID NO: 262 or 263.
- The gene editing system of any one of claims 55-64, wherein the nucleotide deaminase is a cytidine deaminase.
- The gene editing system of claim 65, wherein the cytidine deaminase is selected from the group consisting of APOBEC3B (A3B), APOBEC3C (A3C), APOBEC3D (A3D), APOBEC3F (A3F), APOBEC3G (A3G), APOBEC3H (A3H), APOBEC1 (A1), APOBEC3 (A3), APOBEC2 (A2), APOBEC4 (A4), and AICDA (AID).
- The gene editing system of any one of claims 65-66, wherein the cytidine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 166-201.
- The gene editing system of any one of claims 55-67, wherein the nucleotide deaminase is an adenosine deaminase.
- The gene editing system of claim 68, wherein the adenosine deaminase is selected from the group consisting of tRNA-specific adenosine deaminase (TadA), adenosine deaminase tRNA specific 1 (ADAT1), adenosine deaminase tRNA specific 2 (ADAT2), adenosine deaminase tRNA specific 3 (ADAT3), adenosine deaminase RNA specific B1 (ADARB1), adenosine deaminase RNA specific B2 (ADARB2), adenosine monophosphate deaminase 1 (AMPD1), adenosine monophosphate deaminase 2 (AMPD2), adenosine monophosphate deaminase 3 (AMPD3), adenosine deaminase (ADA), adenosine deaminase 2 (ADA2), adenosine deaminase like (ADAL), adenosine deaminase domain containing 1 (ADAD1), adenosine deaminase domain containing 2 (ADAD2), and adenosine deaminase RNA specific (ADAR).
- The gene editing system of any one of claims 68-69, wherein the adenosine deaminase comprises an amino acid sequence of any one of SEQ ID NOs: 73-165.
- The gene editing system of any one of claims 55-70, wherein the first fusion protein further comprises a uracil glycosylase inhibitor (UGI).
- The gene editing system of any one of claims 55-71, wherein the Cas protein is a Cas9, a dead Cas9 (dCas9), or a Cas9 nickase (nCas9) selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, SpCas9-NG, xSpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, and RanCas13b.
- The gene editing system of any one of claims 55-72, wherein the first protein-binding RNA motif and the first RNA binding domain, the second protein-binding RNA motif and the second RNA binding domain, and the third protein-binding RNA motif and the third RNA binding domain, are each independently selected from the group consisting of: an MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof; a BoxB and N22P or an RNA-binding section thereof; a telomerase Ku binding motif and Ku protein or an RNA-binding section thereof; a telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof; a PP7 phage operator stem-loop and PP7 coat protein (PCP) or an RNA-binding section thereof; a SfMu phage Com stem-loop and Com RNA binding protein or an RNA-binding section thereof; and a non-natural RNA aptamer and corresponding aptamer ligand or an RNA-binding section thereof.
- The gene editing system of any one of claims 55-73, wherein the mgRNA and/or the hgRNA comprise a dual-RNA structure.
- The gene editing system of claim 74, wherein the dual-RNA structure is formed by a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA), wherein the crRNA comprises the spacer.
- The gene editing system of claim 74 or 75, wherein the mgRNA comprises a mcrRNA and a first tracrRNA, and the mcrRNA comprises the mgRNA spacer, wherein the hgRNA comprises a hcrRNA and a second tracrRNA, and the hcrRNA comprises the hgRNA spacer, and wherein the first tracrRNA and the second tracrRNA are the same or different.
- The gene editing system of any one of claims 55-76, wherein the TLS-containing RNA comprises two or more non-coding RNA sequences.
- The gene editing system of any one of claims 55-77, wherein the target gene is a mammalian gene.
- The gene editing system of any one of claims 55-78, wherein the mgRNA spacer and the hgRNA spacer are respectively:
- A method for gene editing in a subject, comprising administering to the subject: a. the RNA of any one of claims 1-44; and/or b. the DNA of any one of claims 45-48; and/or c. the vector of any one of claims 49-50; and/or d. the composition of any one of claims 51-52; and/or e. the system of any one of claims 53-54; and/or f. the gene editing system of any one of claims 55-79.
- The method of claim 80, wherein the subject is a mammal.
- The method of claim 81, wherein the subject is a human.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNPCT/CN2023/113793 | 2023-08-18 | | |
| CN2023113793 | 2023-08-18 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2025039972A1 (en) | 2025-02-27 |
| WO2025039972A9 (en) | 2025-05-22 |
Family ID: 94731374
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/112328 (Pending; published as WO2025039972A1) | Tls-based gene editing systems | 2023-08-18 | 2024-08-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025039972A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160264981A1 (en) * | 2014-10-17 | 2016-09-15 | The Penn State Research Foundation | Methods and compositions for multiplex rna guided genome editing and other rna technologies |
| US20190330643A1 (en) * | 2018-04-25 | 2019-10-31 | The Catholic University Of America | Engineering of bacteriophages by genome editing using the crispr-cas9 system |
| WO2021129895A2 (en) * | 2019-12-23 | 2021-07-01 | 浙江大学 | Infectious plant rhabdovirus vector and method for non-transgenic, site-directed editing of plant genome |
| CN113493803A (en) * | 2021-08-03 | 2021-10-12 | 中国农业大学 | Alfalfa CRISPR/Cas9 genome editing system and application thereof |
| WO2023018938A1 (en) * | 2021-08-12 | 2023-02-16 | The J. David Gladstone Institutes, A Testamentary Trust Established Under The Will Of J. David Gladstone | Methods for generation of precise rna transcripts |
- 2024-08-15: PCT/CN2024/112328 (WO2025039972A1), status: Pending
Non-Patent Citations (1)
| Title |
|---|
| JIANG,C.Q. ET AL.: "Multiplexed Gene Engineering Based on dCas9 and gRNA-tRNA Array Encoded on Single Transcript", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 24, 10 May 2023 (2023-05-10), XP093130116, DOI: 10.3390/ijms24108535 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025039972A9 (en) | 2025-05-22 |
Similar Documents
| Publication | Title |
|---|---|
| US20240132877A1 | Genome editing systems comprising repair-modulating enzyme molecules and methods of their use |
| CN116497067B | Compositions and methods for treating hemoglobinopathies |
| US20230227857A1 | Class ii, type v crispr systems |
| EP3461894B1 | Engineered crispr-cas9 compositions and methods of use |
| EP3178935B1 | Genome editing using campylobacter jejuni crispr/cas system-derived rgen |
| JP2023543803A | Prime Editing Guide RNA, its composition, and its uses |
| JP2023134529A | Novel minimal utr sequences |
| CN119280261A | Methods for editing disease-related genes using adenosine deaminase base editors, including treatments for genetic diseases |
| CA3077086A1 | Systems, methods, and compositions for targeted nucleic acid editing |
| KR20160089530A | Delivery, use and therapeutic applications of the crispr-cas systems and compositions for hbv and viral diseases and disorders |
| JP2020534795A | Methods and Compositions for Evolving Base Editing Factors Using Phage-Supported Continuous Evolution (PACE) |
| HK1247238A1 | Engineered crispr-cas9 compositions and methods of use |
| CN107922949A | Compounds and methods for the genome editor based on CRISPR/CAS by homologous recombination |
| JP2024504981A | Novel engineered and chimeric nucleases |
| US20240218339A1 | Class ii, type v crispr systems |
| TW202421785A | Compositions and methods for epigenetic regulation of hbv gene expression |
| KR20180128864A | Gene editing composition comprising sgRNAs with matched 5' nucleotide and gene editing method using the same |
| WO2025039972A1 | Tls-based gene editing systems |
| WO2025201481A1 | Crispr-cas systems |
| US20250059568A1 | Class ii, type v crispr systems |
| WO2025081042A1 | Nickase-retron template-based precision editing system and methods of use |
| CN108977442B | Systems for DNA editing and applications thereof |
| EP4658669A1 | Chimeric pseudotyped recombinant rabies virus |
| HK40006002A | Engineered crispr-cas9 compositions and methods of use |
| HK40006002B | Engineered crispr-cas9 compositions and methods of use |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24855697; Country of ref document: EP; Kind code of ref document: A1 |